AWS re:Invent - New services and features

Amazon Web Services announced many new services and features at the recently held AWS re:Invent 2016. In this blog post I am writing about the new announcements related to storage and data migration.

AWS Snowmobile:

AWS Snowmobile is nothing but a beast that carries petabytes of your data to the AWS cloud on a truck. Yes, on a truck, I am not kidding 🙂


This secure data truck stores up to 100 PB of data and can help you move exabytes to AWS in a matter of weeks. Snowmobile attaches to your data center network and appears as a local NFS-mounted volume. It includes a network cable connected to a high-speed switch capable of supporting 1 Tb/sec of data transfer spread across multiple 40 Gb/sec connections. As per the official AWS blog, Snowmobile is available in all AWS Regions. You need to contact the AWS Sales team to use this service.
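To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 PB = 10^15 bytes) and a link that runs at the full quoted 1 Tb/sec the whole time, which a real transfer would probably not sustain. The function name `transfer_days` is my own, not anything from AWS.

```python
def transfer_days(capacity_bytes: float, link_bits_per_sec: float,
                  utilization: float = 1.0) -> float:
    """Idealized time (in days) to push capacity_bytes over a link."""
    seconds = capacity_bytes * 8 / (link_bits_per_sec * utilization)
    return seconds / 86_400  # seconds in a day

PB = 10**15      # assumption: decimal petabyte
TBPS = 10**12    # 1 terabit per second

# Filling a 100 PB Snowmobile over the quoted 1 Tb/sec switch
days = transfer_days(100 * PB, 1 * TBPS)
print(f"{days:.1f} days")
```

Lowering the `utilization` argument shows how quickly the estimate grows when the link is shared with other traffic.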

AWS Snowball Edge:

The new Snowball Edge appliance has all the features of its twin brother Snowball, which was launched in 2015.


AWS Snowball Edge is a petabyte-scale data transport appliance with on-board storage and compute. It arrives with your S3 buckets, Lambda code, and clustering configuration pre-installed. You can execute AWS Lambda functions and process data locally on the AWS Snowball Edge. You can order Snowball Edge with just a few clicks in the AWS Management Console.

Amazon EFS – New Feature

Amazon Elastic File System (Amazon EFS) provides simple file storage for use with Amazon EC2 instances in the AWS Cloud. With this new feature we can mount Amazon EFS file systems on our on-premises data center servers when they are connected to an Amazon VPC through AWS Direct Connect.

S3 Storage Management with new features

S3 Object Tagging:

S3 object tags are key-value pairs applied to S3 objects. They can be created, updated or deleted at any time during the lifetime of the object. With these, users have the ability to create IAM policies, set up lifecycle policies, and customize storage metrics.
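As a small illustration of what tagging looks like from code, here is a sketch using the boto3 `put_object_tagging` call. The bucket name, key and tag values are made up for the example, and the helper names `build_tagging` and `tag_object` are my own. The boto3 import is kept inside the function so the payload helper can be tried without AWS credentials.

```python
def build_tagging(tags: dict) -> dict:
    """Convert a plain dict into the TagSet structure S3 expects."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

def tag_object(bucket: str, key: str, tags: dict) -> None:
    """Replace the tag set on an existing S3 object (needs boto3 and credentials)."""
    import boto3  # imported lazily so build_tagging works without boto3 installed
    s3 = boto3.client("s3")
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging=build_tagging(tags))

# Hypothetical tags, only for illustration
payload = build_tagging({"project": "migration", "retention": "1y"})
print(payload)

# tag_object("my-example-bucket", "backups/db.dump", {"project": "migration"})
```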

S3 Analytics, Storage Class Analysis:

This new S3 Analytics feature automatically identifies the optimal lifecycle policy to move less frequently accessed storage to S3 Standard - Infrequent Access. Users can configure a storage class analysis policy to monitor an entire bucket, a prefix, or an object tag. Once an infrequent access pattern is observed, we can easily create a new lifecycle age policy based on the results.

S3 CloudWatch Metrics:

This helps you understand and improve the performance of applications that use Amazon S3 by monitoring and alarming on 13 new S3 CloudWatch metrics. Users can receive 1-minute CloudWatch metrics, set alarms, and access dashboards to view real-time operations and performance, such as bytes downloaded from their Amazon S3 storage.

You can read my blog post on data migration to AWS by clicking the link below. Thanks and happy reading 🙂

Cloud data migration with AWS

 


Cloud data migration with AWS

Data migration is a key challenge in any cloud migration, and as a storage admin I have always been fascinated by the effort it takes to migrate petabytes of data to the public cloud. In this post I will try to give a brief outline of 3 out of 8 ways in which we can migrate data to Amazon Web Services.

AWS Direct Connect: With AWS Direct Connect you have a dedicated network connection from your data center premises to AWS. With the high speeds available, you can either copy data directly from any of your servers to an S3 bucket using CLI commands, or do a host-based migration to an EC2 instance with a sufficient number of EBS volumes.
Multiple connections can be used simultaneously for increased bandwidth or redundancy. We can also use the AWS Partner Network in case an AWS Direct Connect location is not available near your data centers.

[Image: AWS Direct Connect]

 

AWS Import/Export Snowball: AWS Import/Export Snowball is a petabyte-scale storage migration solution. AWS will ship a storage device, as shown below, to your data center, which can hold 50 or 80 TB of data.

[Image: Snowball]

Once you receive a Snowball, you plug it in, connect it to your network, configure the IP address, and install the AWS Snowball client. Use the client to identify the directories you want to copy. Data is encrypted while it is copied to the Snowball and decrypted when AWS offloads it to S3. As per AWS, it takes 21 hours to copy 80 TB of data from your data source to a Snowball over a 10 Gbps connection at 80 percent network utilization. AWS has also shown a use case where a customer was able to migrate 1 PB of data in 1 week using multiple Snowballs in parallel.

AWS Storage Gateway: Storage Gateway is installed on a local host in your data center. It runs as an on-premises virtual appliance that provides seamless and secure integration between your on-premises applications and AWS’s storage infrastructure. It can create iSCSI volumes that store all data, or just recently accessed data, on premises for faster response while asynchronously uploading this data to Amazon S3 in the background.

[Image: AWS Storage Gateway]

A combination of AWS Snowball with either Direct Connect or Storage Gateway will help make migration much faster and easier. We can do a one-time migration of data using Snowball and later send differential data updates using Direct Connect or Storage Gateway. Hope this has given you some basic idea of migrations with AWS solutions. Thanks for reading.

You can also read my blog post on a NextGen SSD interface, NVMe. Click on the link below. Happy reading 🙂

https://sskanth.com/2016/04/20/nvme-nextgen-sd-interface/

 

 

 

NVMe - NextGen SSD Interface

What is NVMe?

Let me first start with NVM. NVM is non-volatile memory, which covers all the flash drives and SSDs that have revolutionized our storage world. NVMe is a protocol for reading and writing data on NVM. As of now NVMe is promoted by a group of companies that includes Cisco, Dell, EMC, HGST, Intel, Micron, Microsoft, NetApp, Oracle, PMC-Sierra, Samsung, SanDisk and Seagate.

Why did you say it is the next-generation SSD interface? What are we using right now?

The Small Computer System Interface (SCSI) has been the most widely used standard for physically connecting and transferring data for hard disk drives and tape drives for well over a decade. We still use the same SCSI interface even for flash drives, which is becoming a bottleneck in utilizing flash to its full potential. The SCSI protocol suits HDDs well, but it is losing steam when it comes to SSDs. That is why we are looking at NVMe, which was developed exclusively for flash technology.

Can you explain more on how NVMe is different from SCSI?

Sure. First let us try to understand the difference between HDD and SSD.

HDD: A hard drive stores data on a series of spinning magnetic disks, called platters. There’s an actuator arm with read/write heads attached to it. This arm positions the read/write heads over the correct area of the drive to read or write information. Because the drive heads must align over an area of the disk in order to read or write data (and the disk is constantly spinning), there’s a wait time before data can be accessed. The drive may need to read from multiple locations in order to launch a program or load a file, which means it may have to wait for the platters to spin into the proper position multiple times before it can complete the command. If a drive is asleep or in a low-power state, it can take several seconds more for the disk to spin up to full power and begin operating.


SSD:

Solid-state drives are called that specifically because they don’t rely on moving parts or spinning disks. Instead, data is saved to a pool of NAND flash. Because SSDs have no moving parts, they can operate at speeds far above those of a typical HDD.


So to access data on an HDD we use the SCSI protocol. SCSI sends one command at a time and waits for the platter to rotate under the actuator arm and fetch the data back. We are using the same SCSI protocol for SSDs as well, which is holding back their performance. An SSD can serve many IOs at the same time because it has no rotating components, but the SCSI commands we use are processed one at a time. Here comes NVMe. NVMe parallelizes instructions. It is designed to have up to 64 thousand queues, and each of those queues in turn can hold up to 64 thousand commands simultaneously. In short, NVMe was developed exclusively to leverage SSD technology.
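To put the queue numbers in perspective, here is a quick arithmetic sketch in Python. It assumes "64 thousand" means 64 x 1024 for both the queue count and the queue depth, which is my rounding for illustration and not a figure taken from the specification text.

```python
# Assumption: "64 thousand" is taken as 64 * 1024 for both values
QUEUES = 64 * 1024
COMMANDS_PER_QUEUE = 64 * 1024

# Theoretical upper bound on commands that could be outstanding at once
max_outstanding = QUEUES * COMMANDS_PER_QUEUE
print(f"{max_outstanding:,} commands in flight at most")
```

No real drive or host comes anywhere near this ceiling. The point is only that the protocol itself stops being the limit.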

Do you have any metrics to support your claims about NVMe?

Yes, you can see the graphs below, published by SNIA (Storage Networking Industry Association), which show how NVMe compares with the SAS and SATA protocols for random and sequential workloads.

[Images: SNIA graphs for random and sequential workloads]

Where can I get more information about NVMe?

You can get more information about the latest developments in NVMe from the official site http://www.nvmexpress.org/.

Anything else?

This may be the first time you are hearing about NVMe, but I bet it won’t be the last. NVMe is for sure here to stay as flash technology comes to realize its full potential. Hope you enjoyed reading about it 🙂

 

What is Software Defined Storage?


Anyone following trends in IT will definitely see a new approach in data centers towards a software-defined model. The new buzzword for this is SDDC, the software-defined data center. In an SDDC all infrastructure components are virtualized and delivered as a service. A similar trend can be observed in storage, which is the heart of the DC, and it is called SDS, software-defined storage.

The exact definition of SDS is still evolving, but the generally accepted definition is that software defined storage is “where the management and intelligence of the storage system is decoupled from the underlying physical hardware.”


The following are the areas where implementing SDS will make a difference.

Administration: As a storage admin I follow different processes to achieve the same task (provisioning, reclamation) on arrays from different vendors like EMC, NetApp and Hitachi. With SDS we can manage all the storage infrastructure in the data center from a single pane and follow the same steps for a task irrespective of the manufacturer. SDS makes extensive use of APIs to communicate with the arrays. In data centers with a private cloud implementation, SDS will definitely help in improving automation and orchestration. An example would be EMC ViPR.

Use of commodity hardware: With every new storage array we buy, we end up buying licenses for a similar set of features like snapshots, cloning, replication, data mobility, encryption, and thin provisioning. In SDS, since the intelligence of the storage system is decoupled from the underlying physical hardware, we can save the cost of these repeated features. We can also turn any x86 commodity hardware into robust enterprise storage with the help of SDS solutions like DataCore and Nexenta.

Cloud integration: Any new software or hardware solution is not complete without integration with the public cloud, and the same is the case for SDS. SDS can be used to pool resources from the cloud and also to manage both in-house and public cloud assets under a single pane. Since SDS abstracts storage from the underlying physical hardware, it is useful for seamless data transfer from private to public cloud and vice versa.

In short, software-defined storage solutions are a fundamental component of the software-defined data center, providing a range of scale-out solutions to meet rapidly growing and changing data demands.

 

 

An AWSomeday

The AWS event created a lot of buzz among IT folks in Hyderabad. Almost 400 IT engineers turned up for this event on Nov 17, 2015 at ITC Kakatiya, Begumpet, Hyderabad. I know that most folks registered for the event, but delegates were invited based on some selection process by AWS. This post gives a glimpse of how the event went and the takeaways from it.

The event started with a welcome note from Chandra Balani, Head of Business Development. He gave us a quick view of AWS history. Chandra said that AWS has been available to the public since 2006, but prior to that they used this technology to run the Amazon.com site for almost a decade. So AWS has a total of 20 years of experience in the cloud industry.


The tech part started with Harshith taking charge. He is a great guy. I remember him from last year’s AWSomeday event. He carries the entire event on his shoulders with all the technical stuff. He gave a deeper understanding of AWS core and application services, and showed how to deploy and automate your infrastructure on the AWS Cloud. We were given a student guide with information on AWS storage, compute, network and applications.

 

The event had its fun part too. There were lots of contests on Twitter with the hashtag #AWSomday. There were stalls by AWS partners and experts. The AWS experts were really nice in answering most of the questions from delegates. The event ended with goodies presented to lucky winners and participation certificates for the delegates.

 

 

 

Learning about all flash arrays

I started learning about flash arrays because all flash arrays have started occupying data center floors, and the trend is going to continue for the next 5 to 10 years. Gartner predicts in a report that “By 2019, 20% of traditional high-end storage arrays will be replaced by dedicated solid-state arrays (SSA).” Let me start this flash series by explaining the new glossary that is mostly used by all flash array (AFA) vendors.

1) PE cycles
2) Different types of flash available now
3) Over provisioning
4) Compression and dedupe
5) Garbage collection

PE cycles: The life expectancy of a flash drive is expressed in program/erase (PE) cycles. Flash cells wear out a little every time they are erased or programmed. This is similar to rubbing the same spot on a sheet of paper with an eraser multiple times, which may end up tearing the paper.

Different types of flash available now: Below is a picture of the different types of flash available as of now and their differences.

[Image: comparison of the different flash types]

Over provisioning: This is the inclusion of extra storage capacity in a solid-state drive. That extra capacity is not visible to the host as available storage. It is like under-promising and over-delivering. The vendor gives you extra hidden capacity, which helps distribute the total number of writes and erases over a larger number of flash cells. This increases the life expectancy of the drive.

Compression and Dedupe: There is a thin line of difference between dedupe and compression which most people fail to understand (including me. It took me reading multiple websites to understand it 🙂 ).

Dedupe: In its simplest form, dedupe works at the file level. Suppose you send a mail with a 1 MB attachment to 10 people. The Exchange server will save only one copy of the attachment and mark the rest as duplicates. This saves a lot of space, as we use just 1 MB instead of 10 MB.

Compression: With compression you are using some algorithm or other to reduce the size of a particular file by eliminating  redundant bits. But if your users or applications have stored the same file multiple times, then no matter how good your compression method is your storage will end up with multiple copies of the compressed files.
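The difference is easy to see in a few lines of Python. This is only a toy sketch: the "attachment" is a small repeated byte string standing in for the 1 MB file, the dedupe store is a plain dictionary keyed by a SHA-256 content hash, and zlib stands in for whatever compression algorithm a real array would use.

```python
import hashlib
import zlib

attachment = b"quarterly report " * 1000   # stand-in for the 1 MB attachment
mailboxes = [attachment] * 10               # same file sent to 10 people
raw_bytes = sum(len(d) for d in mailboxes)

# Dedupe: store each unique file once, keyed by its content hash
dedupe_store = {}
for data in mailboxes:
    dedupe_store.setdefault(hashlib.sha256(data).hexdigest(), data)
dedupe_bytes = sum(len(d) for d in dedupe_store.values())

# Compression only: every copy shrinks, but all 10 copies are still stored
compressed_copies = [zlib.compress(data) for data in mailboxes]
compressed_bytes = sum(len(c) for c in compressed_copies)

print("raw:", raw_bytes)
print("dedupe only:", dedupe_bytes, "bytes in", len(dedupe_store), "copy")
print("compression only:", compressed_bytes, "bytes in", len(compressed_copies), "copies")
```

In practice arrays combine both: dedupe first so only one copy is kept, then compress that single copy.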

Garbage collection: It took me some time to understand this. I will put it my way and give you links to the blogs that helped me understand it better.

Flash writes data in a different way. When we update data, instead of rewriting the old cells, it writes to new free cells. This makes the data in the old cells invalid or stale. Garbage collection is the process by which this stale data is erased and the stale cells are made available for the next use. You can find a better diagrammatic explanation in the wiki and in some tech blogs at the links below.

Link to understand garbage collection
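The write-to-a-new-cell behaviour described above can be sketched with a toy model in Python. This is a heavy simplification under my own assumptions: the class `FlashBlock` is hypothetical, pages are reclaimed one by one, and there is no copying of valid data between blocks, which real flash controllers have to do because erases happen on whole blocks.

```python
class FlashBlock:
    """Toy model: each page is 'free', 'valid', or 'stale'."""

    def __init__(self, pages: int):
        self.state = ["free"] * pages
        self.data = [None] * pages
        self.where = {}  # logical address -> page holding the current copy

    def write(self, lba: int, value: str) -> None:
        page = self.state.index("free")   # always write into a free page
        if lba in self.where:             # an update leaves the old copy stale
            self.state[self.where[lba]] = "stale"
        self.state[page], self.data[page] = "valid", value
        self.where[lba] = page

    def garbage_collect(self) -> int:
        """Erase stale pages so they can be written again."""
        reclaimed = 0
        for i, s in enumerate(self.state):
            if s == "stale":
                self.state[i], self.data[i] = "free", None
                reclaimed += 1
        return reclaimed

block = FlashBlock(pages=4)
for version in ("v1", "v2", "v3"):      # update the same logical address 3 times
    block.write(0, version)
stale_before = block.state.count("stale")
reclaimed = block.garbage_collect()
print(stale_before, "stale pages,", reclaimed, "reclaimed, state:", block.state)
```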

Why does the WWN of a front-end director port not change after replacement in Symmetrix?


 

 

 

This question used to bother me for a long time. I used to tell myself that there must be some mechanism EMC uses to give the new director the same WWN as the old director port after replacement. Recently I found a primus which helped me understand the FA WWN in depth.

I will follow primus emc223285 and try to decode Symmetrix World Wide Names (WWNs) on a VMAX.

We will be using the following WWN as an example: 50000972081349AD

5000097: these hex digits are assigned by IEEE and are the vendor unique ID of the Symmetrix V-Max, so these digits are the same for any FA WWN of a VMAX array.

Now let us break down the remaining part of the WWN, 2081349AD.
Start by converting the hexadecimal digits of the WWN into binary.

2    0    8    1    3    4    9    A    D
0010 0000 1000 0001 0011 0100 1001 1010 1101

bit 35 <---------------------------------- bit 0

Starting from left to right, number the bits from 35 on down to 0.

Follow the screenshot below along with the description to understand it well.

[Image: WWN bit layout]

 

Bit arrangement description:

Bits 35 through 33: Bits 35 through 33 deal with the build location of the array. For any given WWN, one of the 3 bits will be set; the other two bits will not be set. If bit 35 is set, the array was made in China and the Serial Number starts with CN49xxxxxxx. If bit 34 is set, the array was made in Europe and the Serial Number starts with CK29xxxxxxx. Lastly, if bit 33 is set, the array was made in the USA and the Serial Number starts with HK19xxxxxxx. In the example provided, bit 33 is set, which indicates the example WWN has a Serial Number starting with HK19xxxxxxx.

Bits 32 through 26: Bits 32 through 26 deal with the Symmetrix Model Type.
Refer to the following chart for a breakdown of the bits.

[Image: chart of bits 32 through 26 and Symmetrix model types]

 

 

 

 

 

In the example WWN, the breakdown of bits 32 through 26 is 0 0 0 0 0 1 0, which indicates the Serial Number is HK1926xxxxx.

Bits 25 through 10: Bits 25 through 10 encode the last 5 digits of the Symmetrix Serial Number. Take those bits and place them into a scientific calculator set to binary, convert the binary into decimal, and you will receive the last 5 digits of the Symmetrix Serial Number. In the example, bits 25 through 10 come out to be 00 0001 0011 0100 10. Placing this into a calculator and converting from binary to decimal yields 1234. If the result comes out to be a 4-digit number, front pad the number with a 0 to make 5 digits (i.e., 01234). The full Serial Number for the Symmetrix in the example WWN is HK192601234.


 

 

 

 

 

Bits 9 through 6: Bits 9 through 6 hold the encoding for the processor letter (CPU letter) of the director. Match the bit values against the chart below.

[Image: chart of bits 9 through 6 and processor letters]

 

Bits 5 through 2: Bits 5 through 2 hold the encoding for the director number. Match the bit values against the chart below.

[Image: chart of bits 5 through 2 and director numbers]

 

The last two bits: Lastly, the last two bits, bits 1 and 0, hold the encoding for the director port. If the bits are 00, then it is the 0 or A port. If the bits are 01, then it is the 1 or B port. In the example, it is port 1 or B.

So, finally, WWN 50000972081349AD decodes to VMAX / VMAX 20K HK192601234 director 12g port 1/B 🙂
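The bit slicing above is easy to automate. Below is a rough Python sketch that pulls out each field from the example WWN. The function name `decode_vmax_wwn` and the dictionary keys are my own. It only translates the fields whose meaning is spelled out in the text (build location, serial digits, port); the model, processor and director fields are returned as raw bits because their lookup charts are images that are not reproduced here.

```python
def decode_vmax_wwn(wwn: str) -> dict:
    """Split a VMAX FA WWN into the bit fields described in the primus."""
    wwn = wwn.replace(":", "").upper()
    if len(wwn) != 16 or not wwn.startswith("5000097"):
        raise ValueError("expected a 16-digit WWN starting with 5000097")
    bits = int(wwn[7:], 16)  # remaining 9 hex digits = 36 bits (bit 35 .. bit 0)

    def field(high: int, low: int) -> int:
        """Return bits high..low (inclusive) as an integer."""
        width = high - low + 1
        return (bits >> low) & ((1 << width) - 1)

    # Exactly one of bits 35..33 is expected to be set
    prefix = {0b100: "CN49", 0b010: "CK29", 0b001: "HK19"}.get(field(35, 33), "unknown")
    return {
        "serial_prefix": prefix,
        "model_bits": format(field(32, 26), "07b"),      # look up in the model chart
        "serial_last5": f"{field(25, 10):05d}",          # zero padded to 5 digits
        "processor_bits": format(field(9, 6), "04b"),    # look up in the CPU letter chart
        "director_bits": format(field(5, 2), "04b"),     # look up in the director chart
        "port": field(1, 0),
    }

decoded = decode_vmax_wwn("50000972081349AD")
print(decoded)
```

For the example WWN this gives the HK19 prefix, model bits 0000010, serial digits 01234 and port 1, which lines up with the manual decoding above.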

 

What is Inquiry utility (INQ)?


 

 

So what is INQ?

Inquiry utility (INQ) is a command-line troubleshooting utility that displays information on storage devices, typically Symmetrix. By default, INQ returns the device name, Symmetrix ID, Symmetrix LUN, and capacity. This utility operates independently of any other EMC software. Use the INQ utility to collect system information to provide to EMC Global Services for problem troubleshooting.

But don’t we generally use the EMC Grab report for that?

Yes, and INQ is also one of several tools bundled and run as part of the host grab utilities (EMC Grab and EMCReports).

 Can we analyze INQ output?

Luckily we can analyze INQ output. Below is the process. Generally we see two types of INQ output, depending on the Enginuity level.

For older versions, INQ output will be in the format below.

When running inq or syminq, you’ll see a column titled Ser Num. This column has quite a bit of information hiding in it.

Device                       Product                    Device
----------------  ---------------------------  ------------------
Name              Type  Vendor  ID         Rev   Ser Num   Cap(KB)
----------------  ----  ------  ---------  ----  --------  -------
/dev/dsk/c1t0d0         EMC     SYMMETRIX  5265  73009150  459840
/dev/dsk/c1t4d0   BCV   EMC     SYMMETRIX  5265  73010150  459840
/dev/dsk/c1t5d0   GK    EMC     SYMMETRIX  5265  73019150  2880
/dev/dsk/c2t6d0   GK    EMC     SYMMETRIX  5265  7301A281  2880

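Using the first and last serial numbers as examples, the serial number is broken out as follows:

73009150:
73      Last two digits of the Symmetrix serial number
009     Symmetrix device number
15      Symmetrix director number. If <= 16, using the A processor
0       Port number on the director

7301A281:
73      Last two digits of the Symmetrix serial number
01A     Symmetrix device number
28      Symmetrix director number. Greater than 16, so using the B processor
1       Port number on the director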

 

If Device Serial Number = “71018000”, Legend = SSVVVDDP

SS = Last 2 digits of system S/N
VVV = Volume number (000 – FFF)
DD = Director number (01 – 16 is A director, 17 – 32 is B director)
P = Port (0 – 3)
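The SSVVVDDP legend maps directly onto string slicing. Here is a minimal Python sketch, assuming the director number is written in decimal as the 01 to 32 range in the legend suggests. The function name `decode_inq_serial` and the key names are my own.

```python
def decode_inq_serial(ser_num: str) -> dict:
    """Break an 8-character Ser Num from older inq/syminq output (SSVVVDDP)."""
    if len(ser_num) != 8:
        raise ValueError("expected 8 characters in SSVVVDDP form")
    director = int(ser_num[5:7])          # assumption: DD is decimal
    return {
        "array_sn_last2": ser_num[0:2],   # SS
        "device": ser_num[2:5],           # VVV, a hex volume number
        "director": director,             # DD
        "processor": "A" if director <= 16 else "B",
        "port": int(ser_num[7]),          # P
    }

first = decode_inq_serial("73009150")
last = decode_inq_serial("7301A281")
print(first)
print(last)
```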

 

In newer INQ outputs we can generally find direct columns for the array serial number, device ID and device WWN (I don’t have INQ output from a newer version, so I am not posting it here).

 

What if I don’t have the INQ utility and can get only multipathing output?

If the multipathing output is from PowerPath, then we can directly get most of the details.

 

#powermt display dev=all ====>  Display All Attached LUNs

 

We mostly run this powermt command, which displays all the logical devices attached to the server.

 

Pseudo name=disk915
Symmetrix ID=000290103691
Logical device ID=06B8
state=alive; policy=SymmOpt; priority=0; queued-IOs=0;
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode    State   Q-IOs Errors
==============================================================================
3 0/4/0/0/0/1.0x5006048c52a862e7.0x40a6000000000000 c14t4d6   FA  8cB   active  alive       0      2
3 0/4/0/0/0/1.0x5006048c52a862f7.0x40a6000000000000 c15t4d6   FA  8dB   active  alive       0      2
5 0/5/0/0/0/1.0x5006048c52a862e8.0x40a6000000000000 c16t4d6   FA  9cB   active  alive       0      2
5 0/5/0/0/0/1.0x5006048c52a862f8.0x40a6000000000000 c17t4d6   FA  9dB   active  alive       0      2

 

What if I have only native multipathing?

Now we have to use a bit of technique to decode the device WWN, which is part of the CTD addressing.

General EMC device WWN: 60000970000192605542533030363338

You can break the above WWN up this way: 60000970000 192605542 5330 30363338

192605542 - serial number of the array in decimal

The last 8 digits, 30363338, are the Symm device in ASCII

Hex to ASCII

30=0

36=6

33=3

38=8

So the device is 0638

So now we can say the device is 0638 from array 5542
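The same breakdown can be done in a couple of lines of Python. This sketch assumes the 32-digit WWN form that matches the 11 + 9 + 4 + 8 split shown above. The function name `decode_symm_device_wwn` is my own.

```python
def decode_symm_device_wwn(wwn: str) -> dict:
    """Split a Symmetrix device WWN using the 11 + 9 + 4 + 8 digit layout."""
    wwn = wwn.strip().lower()
    if len(wwn) != 32:
        raise ValueError("expected a 32-digit device WWN")
    serial = wwn[11:20]                                   # array serial digits
    device = bytes.fromhex(wwn[24:32]).decode("ascii")    # hex pairs -> ASCII chars
    return {"array_serial": serial, "array_last4": serial[-4:], "device": device}

info = decode_symm_device_wwn("60000970000192605542533030363338")
print(info)
```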

Where exactly can I use this INQ or multipathing output?

These outputs help us know which storage devices the server is able to see from the SAN side. This can help us resolve missing-device or path-down tickets.

Where do I get the INQ utility?

Below is the FTP link

INQ utility

What else?

Yup, I am done :). If you have more info or any corrections, please feel free to mail me.

SAN QUES SET 01


1) What is an FC controller, a disk controller and a disk array controller?
A) An FC controller is nothing but an HBA (Host Bus Adapter).
The disk controller is the circuit which enables the CPU to communicate with a hard disk, floppy disk or other kind of disk drive.
A disk array controller is a device which manages the physical disk drives and presents them to the computer as logical units. It almost always implements hardware RAID, thus it is sometimes referred to as a RAID controller. It also often provides additional disk cache.

A disk array controller name is often improperly shortened to a disk controller. The two should not be confused as they provide very different functionality.

2) Difference and usage of RAID 0+1 and RAID 1+0
A) I don’t want to write the answer again. There are already so many blogs explaining the different RAID levels and their performance.

3) Default ID for SCSI HBA?
A) In a typical parallel SCSI subsystem, each device is assigned a unique numerical ID. As a rule, the host adapter appears as SCSI ID 7, which gives it the highest priority on the SCSI bus (priority descends as the SCSI ID descends; on a 16-bit or “wide” bus, ID 8 has the lowest priority, a feature that maintains compatibility with the priority scheme of the 8-bit or “narrow” bus). So the answer is 7.

4) Highest and lowest priority in SCSI?
A) Each SCSI device is addressed on the bus via a specific number. For narrow SCSI (which allows up to 8 total devices), these are numbered 0 through 7; for wide SCSI (16 devices) the numbering is 0 through 15. The priority that a device has on the SCSI bus is based on its ID number. For the first 8 IDs, higher numbers have higher priority, so 7 is the highest and 0 the lowest. For wide SCSI, the additional IDs from 8 to 15 again have the highest number as the highest priority, but the entire sequence is lower priority than the numbers from 0 to 7. So the overall priority sequence for wide SCSI is 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8.

5) SAN topology?
A) We have different types of topologies, like core-edge and mesh topology.

6) What is the minimum number of disks for RAID 5?
A) A minimum of 3 disks is needed for RAID 5.

7) Can a hot spare be assigned for RAID 0?
A) RAID 0 does not have any parity, so there is no point in assigning a hot spare since we cannot rebuild data from a failed disk.

8) What is HA?
A) HA means High Availability. We do a lot of things to maintain high availability in a SAN, like maintaining redundant fabrics, RAID systems and hot spares.
All these things help us avoid single points of failure and allow the system to handle faults or failures.
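Going back to question 4, the SCSI priority sequence is easy to generate instead of memorizing it. This is a small Python sketch; the function name `scsi_priority_order` is my own.

```python
def scsi_priority_order(wide: bool = False) -> list:
    """SCSI IDs from highest to lowest bus priority."""
    narrow = list(range(7, -1, -1))          # 7 is highest, 0 is lowest
    if not wide:
        return narrow
    # On a wide bus, IDs 15..8 follow, all below the narrow range
    return narrow + list(range(15, 7, -1))

print(scsi_priority_order())           # narrow (8-bit) bus
print(scsi_priority_order(wide=True))  # wide (16-bit) bus
```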