You are currently browsing the tag archive for the ‘a-sis’ tag.
This past week, during EMC World 2010 in Boston, EMC made several announcements of updates to the Celerra and CLARiiON midrange platforms. Some of the most impressive were new capabilities coming to CLARiiON FLARE in just a couple short months. Major updates to Celerra DART will coincide with the FLARE updates and if you are already running CLARiiON CX4 hardware, or are evaluating CX4 (or Celerra), you will want to check these new features out. They will be available to existing CX4(120,240,480,960)/NS(120,480,960) systems as part of a software update.
Here’s a list of key changes in FLARE 30:
- Unified management for midrange storage platforms including CLARiiON and Celerra today, plus RecoverPoint, Replication Manager and more in the future. This is a true single pane of glass for monitoring AND managing SAN, NAS, and data protection and it’s built in to the platform. “EMC Unisphere” replaces Navisphere Manager and Celerra Manager and supports multiple storage systems simultaneously in a single window. (Video Demo)
- Extremely large cache (ie: FASTCache) – Up to 2TB of additional read/write cache in CLARiiON using SSDs (Video Demo)
- Block level Fully Automated Storage Tiering (ie: sub-LUN FAST) – Fully automated assignment of data across multiple disk types
- Block Level Compression – Compress LUNs in the CLARiiON to reduce disk space requirements
- VAAI Support – Integrate with vSphere ESX for improved performance
These features are in addition to existing features like:
- Seamless and non-disruptive mobility of LUNs within a storage array – (via Virtual LUNs)
- Non-Disruptive Data Migration – (via PowerPath Migration Enabler)
- VMWare Aware Storage Management – (Navisphere, Unisphere, and vSphere Plugins giving complete visibility and self-service provisioning for VMWare admins (Video Demo) AND Storage Admins
- CIFS and NFS Compression – Compress production data on Celerra to reduce disk space requirements including VMs
- Dynamic SAN path load balancing – (via PowerPath)
- At-Rest-Encryption – (via PowerPath w/RSA)
- SSD, FC, and SATA drives in the same system – Balance performance and capacity as needed for your application
- Local and Remote replication with array level consistency – (SnapView, MirrorView, etc)
- Hot-swap, Hot-Add, Hot-Upgrade IO Modules – Upgrade connectivity for FC, FCoE, and iSCSI with no downtime
- Scale to 1.8PB of storage in a single system
- Simultaneously provide FC, iSCSI, MPFS, NFS, and CIFS access
All together, this is an impressive list of features for a single platform. In fact, while many of EMC’s competitors have similar features, none of them have all of them in the same platform, or leverage them all simultaneously to gain efficiency. When CLARiiON CX4 and Celerra NS are integrated and managed as a single Unified storage system with EMC Unisphere there is tremendous value as I’ll point out below…
Improve Performance easily…
- Install a couple SSD drives into a CLARiiON and enable FASTCache to increase the array’s read/write cache from the industry competive 4GB-32GB up to 2TB of array based non-volatile Read AND Write cache available to ALL applications including NAS data hosted by the array.
- Install PowerPath on Windows, Linux, Solaris, AND VMWare ESX hosts to automatically balance IO across all available paths to storage. PowerPath detects latency and queuing occuring on each path and adjusts automatically, improving performance at the storage array AND for your hosts. This is a huge benefit in VMWare environments especially.
- When VMWare releases the updated version of vSphere ESX that supports VAAI, ESX will be able to leverage VAAI support in the CLARiiON to reduce the amount of IO required to do many tasks, improving performance across the environment again.
- Upgrade from 1gbe iSCSI to 10gbe iSCSI, or from 4gbe FiberChannel to 8gbe FiberChannel, without a screwdriver or downtime.
- Provide NAS shared file access with block-level performance for any application using EMC’s MPFS protocol.
Improve Efficiency and cost easily…
- Create a single pool of storage containing some SSD, some FC, and some SATA drives, that automatically monitors and moves portions of data to the appropriate disk type to both improve performance AND decrease cost simultaneously.
- Non-disruptively compress volumes and/or files with a single click to save 50% of your disk space in many cases.
- Convert traditional LUNs to more efficient Thin-LUNs non-disruptively using PowerPath Migration Enabler, saving more disk space.
Increase and Manage Capacity easily…
- Add additional storage non-disruptively with SSD, FC, and SATA drives in any mix up to 1.8PB of raw storage in a single CLARiiON CX4.
- Using FASTCache, iSCSI, FC, and FCoE connectivity simultaneously does not reduce total capacity of the system.
- Expanding LUNs, RAID Groups, and Storage Pools is non-disruptive.
- Migrating LUNs between RAID groups and/or Storage Pools is non-disruptive using built-in CLARiiON LUN Migration, as is migrating data to a different storage array (using PowerPath Migration Enabler)!
- Balancing workload between storage processors is non-disruptive and at individual LUN granularity.
Protect your data easily…
- Snapshot, Clone, and Replicate any of the data to anywhere with built in array tools that can maintain complete data consistency across a single, or multiple applications without installing software.
- Maintain application consistency for Exchange, SQL, Oracle, SAP, and much more, even within VMWare VMs, while replicating to anywhere with a single pane-of-glass.
- Encrypt sensitive data seamlessly using PowerPath Encryption w/RSA.
- While you can do all of these things quickly and simply, you still have the flexibility to create traditional RAID sets using RAID 0, 1, 5, 6, and 10 where you need highly predicable performance, or tune read and write cache at the array and LUN level for specific workloads. Do you want read/write snapshots? How about full copy clones on completely separate disks for workload isolation and failure protection? What about the ability to rollback data to different points in time using snapshots without deleting any other snapshots? EMC Storage arrays have been able to do this for a long time and that hasn’t changed.
There are few manufacturers aside from EMC that can provide all of these capabilities, let alone provide them within a single platform. That’s the definition of simple, efficient, Unified Storage in my opinion.
This is the 3rd part of a multi-part discussion on capacity vs performance in SAN environments. My previous post discussed the use of thin provisioning to increase storage utilization. Today we are going to focus on a newer technology called Data De-Duplication.
Data De-Duplication can be likened to an advanced form of compression. It is a way to store large amounts of data with the least amount of physical disk possible.
De-duplication technology was originally targeted at lowering the cost of disk-based backup. DataDomain (recently acquired by EMC Corp) was a pioneer in this space. Each vendor has their own implementation of de-duplication technology but they are generally similar in that they take raw data, look for similarities in relatively small chunks of that data and remove the duplicates. The diagram below is the simplest one I could find on the web. You can see that where there were multiple C, D, B, etc blocks in the original data, the final “de-duplicated” data has only one of each. The system then stores metadata (essentially pointers) to track what the original data looked like for later reconstruction.
The first and most widely used implementations of de-dupe technology were in the backup space where much of the same data is being backed up during each backup cycle and many days of history (retention) must be maintained. Compression ratios using de-duplication alone can easily exceed 10:1 in backup systems. The neat thing here is that when the de-duplication technology works at the block-level (rather than file-level) duplicate data is found across completely disparate data-sets. There is commonality between Exchange email, Microsoft Word, and SQL data for example. In a 24 hour period, backing up 5.7TB of data to disk, the de-dupe ratio in my own backup environment is 19.2X plus an additional 1.7X of standard compression on the post de-dupe’d data, consuming only 173.9GB of physical disk space. The entire set of backed up data, totaling 106TB currently stored on disk, consumes only 7.5TB of physical disk. The benefits are pretty obvious as you can see how we can store large amounts of data in much less physical disk space.
There are numerous de-duplication systems available for backup applications — DataDomain, Quantum DXi, EMC DL3D, NetApp VTL, IBM Diligent, and several others. Most of these are “target-based” de-duplication systems because they do all the work at the storage layer with the primary benefit being better use of disk space. They also integrate easily into most traditional backup environments. There are also “source-based” de-duplication systems — EMC Avamar and Symantec PureDisk are two primary examples. These systems actually replace your existing backup application entirely and perform their work on the client machine that is being backed up. They save disk space just like the other systems but also reduce bandwidth usage during the backup which is extremely useful when trying to get backups of data across a slow network connection like a WAN.
So now you know why de-duplication is good, and how it helps in a backup environment.. But what about using it for primary storage like NAS/SAN environments? Well it turns out several vendors are playing in that space as well. NetApp was the first major vendor to add de-duplication to primary storage with their A-SIS (Advanced Single Instance Storage) product. EMC followed with their own implementation of de-duplication on Celerra NAS. They are entirely different in their implementation but attempt to address the same problem of ballooning storage requirements.
EMC Celerra de-dupe performs file-level single-instancing to eliminate duplicate files in a filesystem, and then uses a proprietary compression engine to reduce the size of the files themselves. Celerra does not deal with portions of files. In practice, this feature can significantly reduce the storage requirements for a NAS volume. In a test I performed recently for storing large amounts of Syslog data, Celerra de-dupe easily saved 90% of the disk space consumed by the logs and it hadn’t actually touched all of the files yet.
NetApp’s A-SIS works at a 4KB block size and compares all data within a filesystem regardless of the data type. Besides NAS shares, A-SIS also works on block volumes (ie: FiberChannel and iSCSI LUNs) where EMC’s implementation does not. Celerra has an advantage when working with files which contain high amounts of duplication in very small block sizes (like 50 bytes) since NetApp looks at 4KB chunks. Celerra’s use of a more traditional compression engine saves more space in the syslog scenario but NetApp’s block level approach could save more space than Celerra when dealing with lots of large files.
The ability to work on traditional LUNs presents some interesting opportunities, especially in a VMWare/Hyper-V environment. As I mentioned in my previous post, virtual environments have lots of redundant data since there are many systems potentially running the same operating system sharing the disk subsystem. If you put 10 Windows virtual machines on the same LUN, de-duplication will likely save you tons of disk space on that LUN. There are limitations to this that prevent the full benefits from being realized. VMWare best practices require you to limit the number of virtual machine disks sharing the same SAN lun for performance reasons (VMFS locking issues) and A-SIS can only de-dupe data within a LUN but not across multiple LUNs. So in a large environment your savings are limited. NetApp’s recommendation is to use NFS NAS volumes for VMWare instead of FC or iSCSI LUNs because you can eliminate the VMFS locking issue and place many VMs on a single very large NFS volume which can then be de-duplicated. Unfortunately there are limits on certain VMWare features when using NFS so this may not be an option for some applications or environments. Specifically, VMWare Site Recovery Manager which coordinates site-to-site replication and failover of entire VMWare environments does not currently support NFS as of this writing.
When it comes to de-duplication’s impact on performance it’s kind of all over the map. In backup applications, most target based systems either perform the work in memory while the data is coming in or as a post-process job that runs when the backups for that day have completed. In either case, the hardware is designed for high throughput and performance is not really a problem. For primary data, both EMC and NetApp’s implementations are post-process and do not generally impact write performance. However, EMC has limitations on the size of files that can be de-duplicated before a modification to a compressed file causes a significant delay. Since they also limit de-duplication to files that have not been accessed or modified for some period of time, the problem is minimal in most environments. NetApp claims to have little performance impact to either read or writes when using A-SIS. This has much to do with the architecture of the NetApp WAFL filesystem and how A-SIS interacts with it but it would take an entirely new post to describe how that all works. Suffice it to say that NetApp A-SIS is useful in more situations than EMC’s Celerra De-duplication.
Where I do see potential problems with performance regardless of the vendor is in the same situation as thin provisioning. If your application requires 1000 IOPS but you’ve only got 2 disks in the array because of the disk space savings from thin-provisioning and/or de-duplication, the application performance will suffer. You still need to service the IOPS and each disk has a finite number of IOPS (100-200 generally for FC/SCSI). Flash/SSD changes the situation dramatically however.
Right now I believe that de-duplication is extremely useful for backups, but not quite ready for prime-time when it comes to primary storage. There are just too many caveats to make any general recommendations. If you happen to purchase an EMC Celerra or NetApp FAS/IBM nSeries that supports de-duplication, make sure to read all the best-practices documentation from the vendor and make a decision on whether your environment can use de-duplication effectively, then experiment with it in a lab or dev/test environment. It could save you tons of disk space and money or it could be more trouble than it’s worth. The good thing is that it’s pretty much a free option from EMC and NetApp depending on the hardware you own/purchase and your maintenance agreements.