You are currently browsing the tag archive for the ‘thin provisioning’ tag.
I have a customer who recently upgraded their EMC Celerra NS480 Unified Storage Array (based on the Clariion CX4-480) to FLARE30, enabled FASTCache across the array, and turned on FASTVP automated tiering for a large portion of their block data. Now that it's configured, and the customer has performed a large number of non-disruptive migrations from older RAID groups and VP pools into the new FASTVP pool (including thick-to-thin conversions), I was able to get some performance data from their array and thought I'd share the results.
This is Real-World data
This is NOT some edge case where the customer's workload is perfect for FASTCache and FASTVP, and it's NOT a crazy configuration that would cost an arm and a leg. This is a real production system running in a customer datacenter, with a few EFDs split between FASTCache and FASTVP and some SATA to augment capacity in the pool alongside their existing FC-based LUNs. These are REAL results that show how FASTVP has distributed the IO workload across all available disks, and how a relatively small amount of FASTCache is absorbing a decent percentage of the total array workload.
This NS480 array has nearly 480 drives in total, approximately 28TB of block data (I only counted consumed data on the thin LUNs), and about 100TB of NAS data. Of the 28TB of block LUNs, 20TB is in Virtual Pools, 14TB of which is in a single FASTVP pool. This array supports the customer's ERP application, entire VMWare environment, SQL databases, and NAS shares simultaneously.
In this case FASTCache has been configured with just 183GB of usable capacity (4 x 100GB EFD disks) for the entire storage array (128TB of data) and is enabled for all LUNs and pools. The graphs here cover a 4-hour window after the very FIRST FASTVP re-allocation completed, using only about one day's worth of statistics. Subsequent re-allocations in the FASTVP pool will tune the array even more.
First, let's take a look at the array as a whole. Here you can see that the array is processing approximately 10,000 IOPS throughout the entire interval.
FASTCache is handling about 25% of the entire workload with just 4 disks. I didn't graph it here, but the total array IO response time through this window averages 2.5 ms. The pools and RAID Groups on this array are almost all RAID5, and the read/write ratio averages 60/40, which is a bit write-heavy for RAID5 environments, generally speaking.
If you’ve done any reading about EMC FASTCache, you probably know that it is a read/write cache. Let’s take a look at the write load of the array and see how much of that write load FASTCache is handling. In the following graph you can see that out of the ~10,000 total IOPS, the array is averaging about 2500-3500 write IOPS with FASTCache handling about 1500 of that total.
That means FASTCache is reducing the back-end writes to disk by about 50% on this system. On the NS480/CX4-480, FASTCache can be configured with up to 800GB usable capacity, so this array could see higher overall performance if needed by augmenting FASTCache further. Installing and upgrading FASTCache is non-disruptive so you can start with a small amount and upgrade later if needed.
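To put those cache percentages in perspective, here's a quick back-of-the-envelope check in Python. The figures are the approximate numbers read off the graphs above (a midpoint of 3,000 is assumed for the 2,500-3,500 write-IOPS range); the exact counters on any given array will differ.

```python
# Rough check of the FAST Cache numbers cited above. These values are
# eyeballed from the graphs in this post, not exact Analyzer counters.
total_iops = 10_000           # whole-array throughput
fastcache_iops = 2_500        # IOPS absorbed by FAST Cache (~25% of total)
total_write_iops = 3_000      # midpoint of the 2,500-3,500 write range
fastcache_write_iops = 1_500  # writes absorbed by FAST Cache

print(f"FAST Cache share of all IO: {fastcache_iops / total_iops:.0%}")
print(f"Writes kept off the back-end disks: {fastcache_write_iops / total_write_iops:.0%}")
```

That second ratio is where the "reducing back-end writes by about 50%" claim comes from.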
FASTVP and FASTCache Together
Next, we’ll drill down to the FASTVP pool which contains 190 total disks (5 x EFD, 170 x FC, and 15 x SATA). There is no maximum number of drives in a Virtual Pool on FLARE30 so this pool could easily be much larger if desired. I’ve graphed the IOPS-per-tier as well as the FASTCache IOPS associated with just this pool in a stacked graph to give an idea of total throughput for the pool as well as the individual tiers.
The pool is servicing between 5,000 and 8,000 IOPS on average, which is about half of the total array workload. In case you didn't already know, FASTVP and FASTCache work together to make sure that data is not duplicated in EFDs. If data has been promoted to the EFD tier in a pool, it will not be promoted to FASTCache, and vice versa. As a result of this intelligence, FASTCache acceleration is additive to an EFD-enabled FASTVP pool. Here you can see that the EFD tier and FASTCache combined are servicing about 25-40% of the total workload, the FC tier another 40-50%, and the SATA tier services the remaining IOPS. Keep in mind that FASTCache is accelerating IO for other pools and RAID Group LUNs in addition to this one, so it's not dedicated to just this pool (although that is configurable).
FASTVP IO Distribution
Lastly, to illustrate FASTVP's effect on IO distribution at the physical disk layer, I've broken down IOPS-per-spindle-per-tier for this pool as well. You can see that the FC disks are servicing relatively low IO and have plenty of headroom available, while the EFD disks, also not being stretched to their limits, are servicing vastly more IOPS per spindle, as expected. The other thing you may have noticed here is that the EFDs are seeing the majority of the workload's volatility, while the FC and SATA disks have a pretty flat workload over time. This illustrates that FASTVP has placed the more bursty workloads on EFD, where they can be serviced more effectively.
Hopefully you can see here how a very small amount of EFDs used with both FASTCache and FASTVP can relieve a significant portion of the workload from the rest of the disks. FASTCache on this system adds up to only 0.14% of the total data set size and the EFD tier in the FASTVP pool only accounts for 2.6% of the total dataset in that pool.
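For anyone wanting to verify the 0.14% figure, the arithmetic is simple (sizes as cited in this post; I'm treating 128TB as 128 x 1024 GB):

```python
# Sanity-checking "FAST Cache is only 0.14% of the total data set".
fastcache_gb = 183          # usable FAST Cache capacity (4 x 100GB EFDs)
dataset_gb = 128 * 1024     # ~128 TB total data on the array, in GB

fraction = fastcache_gb / dataset_gb
print(f"FAST Cache is {fraction:.2%} of the data set")  # ~0.14%
```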
What do you think of these results? Have you added FASTCache and/or FASTVP to your array? If so, what were your results?
Yesterday, in his blog post entitled "Myth Busting: Storage Guarantees", Vaughn Stewart from NetApp blogged about the EMC 20% Guarantee and posted a chart of storage efficiency features from EMC and NetApp platforms to illustrate his point. Chuck Hollis from EMC called it "chartsmithing" in a comment but didn't elaborate specifically on the chart's deficiencies. Well, allow me to take that ball…
As presented, Vaughn’s chart (below) is technically factual (with one exception which I’ll note), but it plays on the human emotion of Good vs Bad (Green vs Red) by attempting to show more Red on EMC products than there should be.
The first and biggest problem is the chart compares EMC Symmetrix and EMC Clariion dedicated-block storage arrays with NetApp FAS, EMC Celerra, and NetApp vSeries which are all Unified storage systems or gateways. Rather than put n/a or leave the field blank for NAS features on the block-only arrays, the chart shows a resounding and red NO, leading the reader to assume that the feature should be there but somehow EMC left it out.
As far as keeping things factual, some of the EMC and NetApp features in this chart are not necessarily shipping today (very soon though, and since it affects both vendors I’ll allow it here). And I must make a correction with respect to EMC Symmetrix and Space Reclamation, which IS available on Symm today.
I’ve taken the liberty of massaging Vaughn’s chart to provide a more balanced view of the feature comparison. I’ve also added EMC Celerra gateway on Symmetrix to the comparison as well as an additional data point which I felt was important to include.
1.) I removed the block only EMC configuration devices because the NetApp devices in the comparison are Unified systems.
2.) I removed the SAN data row for Single Instance storage because Single Instance (identical file) data reduction technology is inherently NAS related.
3.) Zero Space Reclamation is a feature available in Symmetrix storage. In Clariion, the Compression feature can provide a similar result since zero pages are compressible.
I left the 3 different data reduction techniques as individually listed even though the goal of all of them is to save disk space. Depending on the data types, each method has strengths and weaknesses.
One question, if a bug in OnTap causes a vSeries to lose access to the disk on a Symmetrix during an online Enginuity upgrade, who do you call? How would you know ahead of time if EMC hasn’t validated vSeries on Symmetrix like EMC does with many other operating systems/hosts/applications in eLab?
The goal of my post here really is to show how the same data can be presented in different ways to give readers a different impression. I won't get too far into the technical differences between the products, like how comparing FAS to Symmetrix is like comparing a box truck to a freight train, or how fronting an N+1 loosely coupled clustered, global cached, high-end storage array with a midrange dual-controller gateway for block data might not be in a customer's best interest.
What do you think?
A comment about HDS’s Zero Page Reclaim on one of my previous posts got me thinking about the effectiveness of thin provisioning in general. In that previous post, I talked about the trade-offs between increased storage utilization through the use of thin-provisioning and the potential performance problems associated with it.
There are intrinsic benefits that come with the use of thin provisioning. First, new storage can be provisioned for applications without nearly as much planning. Next, application owners get what they want, while storage admins can show they are utilizing the storage systems effectively. Also, rather than managing the growth of data in individual applications, storage admins are able to manage the growth of data across the enterprise as a whole.
Thin provisioning can also provide performance benefits… For example, consider a set of virtual Windows servers running across several LUNs contained in the same RAID group. Each Windows VM stores its OS files in the first few GB of their respective VMDK files. Each VMDK file is stored in order in each LUN, with some free space at the end. In essence, we have a whole bunch of OS sections separated by gaps of no data. If all VMs were booting at approximately the same time, the disk heads would have to move continuously across the entire disk, increasing disk latency.
Now take the same disks, configured as a thin pool, and create the same LUNs (as thin LUNs) and the same VMs. Because thin-provisioning in general only writes data to the physical disks as it’s being written by the application, starting from the beginning of the disk, all of those Windows VMs’ OS files will be placed at the beginning of the disks. This increased data locality will reduce IO latency across all of the VMs. The effect is probably minor, but reduced disk latency translates to possibly higher IOPS from the same set of physical disks. And the only change is the use of thin-provisioning.
So back to HDS Zero Page Reclaim. The biggest problem with thin provisioning is that it doesn't stay thin for long. Windows NTFS, for example, is particularly NOT thin-friendly since it favors previously untouched disk space for new writes rather than overwriting deleted files. This activity eventually causes a thin-LUN to grow to its maximum size over time, even though the actual amount of data stored in the LUN may not change. And Windows isn't the only one with this problem. This means that thin provisioning may make provisioning easier, or possibly improve IO latency, but it might not actually save you any money on disk. This is where HDS's Zero Page Reclaim can help. Hitachi's Dynamic Provisioning (with ZPR) can scan a LUN for sections where all the bytes are zero and reclaim that space for other thin LUNs. This is particularly useful for converting thick LUNs into thin LUNs. But it can only see blocks of zeros, so it won't necessarily see space freed up by deleting files. Hitachi's own documentation points out that many file systems are not thin-friendly, and ZPR won't help with the long-term growth of thin LUNs caused by actively writing and then deleting data.
Although there are ways to script the writing of zeros to free space on a server so that ZPR can reclaim that space, you would need to run that script on all of your servers, requiring a unique tool for each operating system in your environment. The script would also have to run periodically, since the file system will grow again afterward.
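On Windows, tools like Microsoft's sdelete can zero free space for you. Purely to illustrate what such a script does on a Unix-like host, here's a minimal Python sketch; the filename, chunk size, and free-space headroom are arbitrary choices of mine, and `os.statvfs` is Unix-only. This is a sketch, not something to point at a production file system.

```python
import os

def zero_fill_free_space(directory, chunk_mb=64, keep_free_mb=256):
    """Write a file of zeros into `directory` until only `keep_free_mb`
    of free space remains, then delete it. Once the freed blocks have
    been overwritten with zeros, an array-side zero-page-reclaim pass
    can hand them back to the thin pool. The keep_free_mb headroom
    avoids filling the file system completely."""
    path = os.path.join(directory, "zerofill.tmp")
    chunk = b"\x00" * (chunk_mb * 1024 * 1024)
    try:
        with open(path, "wb") as f:
            while True:
                stats = os.statvfs(directory)
                free_mb = stats.f_bavail * stats.f_frsize // (1024 * 1024)
                if free_mb <= keep_free_mb + chunk_mb:
                    break
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # force the zeros onto disk
    finally:
        if os.path.exists(path):
            os.remove(path)  # deleting the file frees the (now zeroed) blocks
```

You'd still need an equivalent for every OS in the shop, and a schedule to re-run it, which is exactly the maintenance burden described above.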
NetApp’s SnapDrive tool for Windows can scan an NTFS file system, detect deleted files, then report the associated blocks back to the Filer to be added back to the aggregate for use by other volumes/LUNs. The Space Reclamation scan can be run as needed, and I believe it can be scheduled; but, it appears to be Windows only. Again, this will have to be done periodically.
But what if you could solve the problem across most or all of your systems, regardless of operating system, regardless of application, with real-time reclamation? And what if you could simultaneously solve other problems? Enter Symantec’s Storage Foundation with Thin-Reclamation API. Storage Foundation consists of VxFS, VxVM, DMP, and some other tools that together provide dynamic grow/shrink, snapshots, replication, thin-friendly volume usage, and dynamic SAN multipathing across multiple operating systems. Storage Foundation’s Thin-Reclamation API is to thin-provisioning what OST is to Backup Deduplication. Storage vendors can now add near-real-time zero page reclaim for customers that are willing to deploy VxFS/VxVM on their servers. For EMC customers, DMP can replace PowerPath, thereby offsetting the cost.
As far as I know, 3PAR is the first and only storage vendor to write to Symantec’s thin-API, which means they now have the most dynamic, non-disruptive, zero-page-reclaim feature set on the market. As a storage engineer myself, I have often wondered if VxVM/VxFS could make management of application data storage in our diverse environment easier and more dynamic. Adding Thin-Reclamation to the mix makes it even more attractive. I’d like to see more storage vendors follow 3PAR’s lead and write to Symantec’s API. I’d also like to see Symantec open up both OST and the Thin-Reclamation API for others to use, but I doubt that will happen.
Do you have an EMC Clariion CX4 with Virtual Provisioning (thin provisioning)? Have you tried to expand the host-visible (ie: maximum) size of a thin-LUN and can't figure out how? Well, you aren't alone… Despite there being extremely little information from EMC one way or the other, I finally figured out that you actually can't expand a thin-LUN yet. This was a surprise to me, since I had just assumed the capability would be there. Thin-LUNs are essentially virtual LUNs, so they don't have any direct block mapping to a RAID group that has to be maintained, and traditional LUNs can already be expanded using MetaLUN striping or concatenation. Nevertheless, the host-visible size of a thin-LUN cannot be changed after the LUN has been created.
But there IS a workaround. It's not perfect, but it's all we have right now. Using the Clariion's built-in LUN Migration technology you can expand a thin-LUN in two steps.
Step 1: Migrate the thin-LUN to a thick-LUN of the same maximum size.
Step 2: Migrate the thick-LUN to a new thin-LUN that was created with the new larger size you want. After migration, the thin-LUN will consume disk space equivalent to the old thin-LUN’s maximum size, but will have a new, higher host maximum visible size.
This requires that you have a RAID group outside of your thin pools that has enough usable free space to fit the temporary thick-LUN, so it’s not a perfect solution. You’d think that migrating the old thin-LUN directly into a new, larger thin-LUN would work, but to use the additional disk space after a LUN migration, you have to edit the LUN size which, again, can’t be done on thin-LUNs. I haven’t actually tested this, but this is based on all of the documentation I could find from EMC on the topic.
I’m looking into other methods that might be better, but so far it seems that certain restrictions on SnapView Clones and SANCopy might preclude those from being used. The ability to expand thin-LUNs will come in a later FLARE release for those that are willing to wait.
This is the 3rd part of a multi-part discussion on capacity vs performance in SAN environments. My previous post discussed the use of thin provisioning to increase storage utilization. Today we are going to focus on a newer technology called Data De-Duplication.
Data De-Duplication can be likened to an advanced form of compression. It is a way to store large amounts of data with the least amount of physical disk possible.
De-duplication technology was originally targeted at lowering the cost of disk-based backup. DataDomain (recently acquired by EMC Corp) was a pioneer in this space. Each vendor has their own implementation of de-duplication technology but they are generally similar in that they take raw data, look for similarities in relatively small chunks of that data and remove the duplicates. The diagram below is the simplest one I could find on the web. You can see that where there were multiple C, D, B, etc blocks in the original data, the final “de-duplicated” data has only one of each. The system then stores metadata (essentially pointers) to track what the original data looked like for later reconstruction.
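To make the diagram concrete, here's a toy sketch of fixed-block de-duplication in Python. Real products use much larger blocks (or variable-length chunks), persistent indexes, and collision handling; the 4-byte block size here is purely so the repeated blocks are visible in a short string.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size; real systems chunk at 4KB and up

def deduplicate(data: bytes):
    """Split data into fixed-size blocks, keep one copy of each unique
    block, and record a pointer (here, a hash) per original position."""
    store = {}      # hash -> the single stored copy of that block
    pointers = []   # metadata describing the original block sequence
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only first occurrence is stored
        pointers.append(digest)
    return store, pointers

def reconstruct(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[h] for h in pointers)

data = b"AAAABBBBAAAACCCCBBBBAAAA"  # six blocks, only three unique
store, pointers = deduplicate(data)
assert reconstruct(store, pointers) == data
unique_bytes = sum(len(b) for b in store.values())
print(f"{len(data)} bytes stored as {unique_bytes} unique bytes "
      f"plus {len(pointers)} pointers")
```

The pointers are the metadata mentioned above: they cost some space and lookup overhead, which is part of why de-dupe performance characteristics vary so much between implementations.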
The first and most widely used implementations of de-dupe technology were in the backup space where much of the same data is being backed up during each backup cycle and many days of history (retention) must be maintained. Compression ratios using de-duplication alone can easily exceed 10:1 in backup systems. The neat thing here is that when the de-duplication technology works at the block-level (rather than file-level) duplicate data is found across completely disparate data-sets. There is commonality between Exchange email, Microsoft Word, and SQL data for example. In a 24 hour period, backing up 5.7TB of data to disk, the de-dupe ratio in my own backup environment is 19.2X plus an additional 1.7X of standard compression on the post de-dupe’d data, consuming only 173.9GB of physical disk space. The entire set of backed up data, totaling 106TB currently stored on disk, consumes only 7.5TB of physical disk. The benefits are pretty obvious as you can see how we can store large amounts of data in much less physical disk space.
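The combined reduction from those two stages multiplies out, which is worth checking. Using the figures from my environment quoted above (small discrepancies against the 173.9GB number come from rounding in the reported ratios):

```python
# Verifying the backup-savings figures cited above.
daily_tb = 5.7          # logical data backed up in a 24-hour period
dedupe_ratio = 19.2     # de-duplication alone
compress_ratio = 1.7    # additional compression on the post-dedupe data

effective_ratio = dedupe_ratio * compress_ratio  # ~32.6x combined
stored_gb = daily_tb * 1024 / effective_ratio
print(f"Combined reduction: {effective_ratio:.1f}x")
print(f"Physical space for the daily {daily_tb}TB: ~{stored_gb:.0f} GB")

# Long-term retention: 106TB of logical backups held in 7.5TB on disk
print(f"Overall retention ratio: {106 / 7.5:.1f}x")
```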
There are numerous de-duplication systems available for backup applications — DataDomain, Quantum DXi, EMC DL3D, NetApp VTL, IBM Diligent, and several others. Most of these are “target-based” de-duplication systems because they do all the work at the storage layer with the primary benefit being better use of disk space. They also integrate easily into most traditional backup environments. There are also “source-based” de-duplication systems — EMC Avamar and Symantec PureDisk are two primary examples. These systems actually replace your existing backup application entirely and perform their work on the client machine that is being backed up. They save disk space just like the other systems but also reduce bandwidth usage during the backup which is extremely useful when trying to get backups of data across a slow network connection like a WAN.
So now you know why de-duplication is good, and how it helps in a backup environment.. But what about using it for primary storage like NAS/SAN environments? Well it turns out several vendors are playing in that space as well. NetApp was the first major vendor to add de-duplication to primary storage with their A-SIS (Advanced Single Instance Storage) product. EMC followed with their own implementation of de-duplication on Celerra NAS. They are entirely different in their implementation but attempt to address the same problem of ballooning storage requirements.
EMC Celerra de-dupe performs file-level single-instancing to eliminate duplicate files in a filesystem, and then uses a proprietary compression engine to reduce the size of the files themselves. Celerra does not deal with portions of files. In practice, this feature can significantly reduce the storage requirements for a NAS volume. In a test I performed recently for storing large amounts of Syslog data, Celerra de-dupe easily saved 90% of the disk space consumed by the logs and it hadn’t actually touched all of the files yet.
NetApp’s A-SIS works at a 4KB block size and compares all data within a filesystem regardless of the data type. Besides NAS shares, A-SIS also works on block volumes (ie: FiberChannel and iSCSI LUNs) where EMC’s implementation does not. Celerra has an advantage when working with files which contain high amounts of duplication in very small block sizes (like 50 bytes) since NetApp looks at 4KB chunks. Celerra’s use of a more traditional compression engine saves more space in the syslog scenario but NetApp’s block level approach could save more space than Celerra when dealing with lots of large files.
The ability to work on traditional LUNs presents some interesting opportunities, especially in a VMWare/Hyper-V environment. As I mentioned in my previous post, virtual environments have lots of redundant data since there are many systems potentially running the same operating system sharing the disk subsystem. If you put 10 Windows virtual machines on the same LUN, de-duplication will likely save you tons of disk space on that LUN. There are limitations that prevent the full benefits from being realized, though. VMWare best practices require you to limit the number of virtual machine disks sharing the same SAN LUN for performance reasons (VMFS locking issues), and A-SIS can only de-dupe data within a LUN, not across multiple LUNs. So in a large environment your savings are limited. NetApp's recommendation is to use NFS NAS volumes for VMWare instead of FC or iSCSI LUNs, because you can eliminate the VMFS locking issue and place many VMs on a single very large NFS volume which can then be de-duplicated. Unfortunately there are limits on certain VMWare features when using NFS, so this may not be an option for some applications or environments. Specifically, VMWare Site Recovery Manager, which coordinates site-to-site replication and failover of entire VMWare environments, does not support NFS as of this writing.
When it comes to de-duplication's impact on performance, it's kind of all over the map. In backup applications, most target-based systems either perform the work in memory while the data is coming in or as a post-process job that runs when the backups for that day have completed. In either case, the hardware is designed for high throughput and performance is not really a problem. For primary data, both EMC and NetApp's implementations are post-process and do not generally impact write performance. However, EMC has limitations on the size of files that can be de-duplicated before a modification to a compressed file causes a significant delay. Since they also limit de-duplication to files that have not been accessed or modified for some period of time, the problem is minimal in most environments. NetApp claims to have little performance impact to either reads or writes when using A-SIS. This has much to do with the architecture of the NetApp WAFL filesystem and how A-SIS interacts with it, but it would take an entirely new post to describe how that all works. Suffice it to say that NetApp A-SIS is useful in more situations than EMC's Celerra de-duplication.
Where I do see potential problems with performance regardless of the vendor is in the same situation as thin provisioning. If your application requires 1000 IOPS but you’ve only got 2 disks in the array because of the disk space savings from thin-provisioning and/or de-duplication, the application performance will suffer. You still need to service the IOPS and each disk has a finite number of IOPS (100-200 generally for FC/SCSI). Flash/SSD changes the situation dramatically however.
Right now I believe that de-duplication is extremely useful for backups, but not quite ready for prime-time when it comes to primary storage. There are just too many caveats to make any general recommendations. If you happen to purchase an EMC Celerra or NetApp FAS/IBM nSeries that supports de-duplication, make sure to read all the best-practices documentation from the vendor and make a decision on whether your environment can use de-duplication effectively, then experiment with it in a lab or dev/test environment. It could save you tons of disk space and money or it could be more trouble than it’s worth. The good thing is that it’s pretty much a free option from EMC and NetApp depending on the hardware you own/purchase and your maintenance agreements.
In my previous post, where I discussed the problem of unusable (or slack) disk space on a SAN, I promised a follow-up with techniques on how to increase storage utilization. I realized that I should discuss some related technologies first and then follow that up with how to put it all together. So today I start by talking about Thin Provisioning. I will then follow up with an explanation of De-Duplication and finally talk about how to use multiple technologies together to get the most use out of your storage.
So what is Thin Provisioning? It is a technology that allows you to create LUNs or Volumes on a storage device such that the LUN/Volume(s) appear to the host or client to be larger than they actually are. In general, NAS clients and SAN attached hosts see “Thin Provisioned” LUNs just as they see any other LUN but the actual amount of disk space used on the storage device can be significantly smaller than the provisioned size. How does this help increase storage utilization? Well, with thin provisioning you provide applications with exactly the storage they want and/or need but you don’t have to purchase all of the disk capacity up front.
Let’s start with a comparison of using standard LUNs vs thin LUNs with a theoretical application set:
Say we have 3 servers, each running Windows Server. The operating system partition is on local disk and application data drives are on SAN. Each server runs an application that collects and stores data over time and the application owner expects that over the next year or so the data will grow to 1TB on each server. In this particular case we also know that the application’s performance requirements are relatively low.
With traditional provisioning we might create 3 LUNs that are 1TB each and present them to the servers. This provides the application with room for the expected growth. Using 300GB FC disks we can carve out three 4+1 RAID5 sets, create one LUN in each and it would work fine. Alternatively we could use wide striping (ie: a MetaLUN on EMC Clariion) and put all three LUNs on the same 15 disks. Either way we’ve just burned 15 disks on the storage array based on uncertain future requirements. If we were stingier with storage we could create smaller LUNs (500GB for example) and use LUN expansion technology to increase the size when the application data fills the disk to that capacity.
In the Thin Provisioning world we still create three 1TB LUNs but they would start out by taking no space. The pool of disk that the LUNs get provisioned from doesn’t even need to have 3TB of capacity. As the application data grows over the next 12 months or longer the pool size only needs to grow to accommodate the actual amount of data stored. Depending on the storage array, we can add disks to the pool one at a time. So on day one we start with 3 disks in the pool, and then add additional disks one by one throughout the year. We can then create additional LUNs for other applications without adding disks. As we add disks to the pool, we expand the capacity available for all of the LUNs to grow (up to each LUN’s maximum size) and we increase performance for ALL of the LUNs in the pool since we are adding spindles. The real-world benefits come as we consolidate numerous LUNs into a single disk pool.
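A quick comparison of the two models with made-up growth numbers (the 300GB-per-spindle figure and the written-data amounts are hypothetical, matching the scenario above):

```python
# Thick vs thin: three 1TB LUNs presented either way; only the physical
# disk purchased up front differs. All growth figures are illustrative.
disk_gb = 300                  # usable GB per FC spindle (approximation)
thick_disks = 3 * 5            # three 4+1 RAID5 sets, as described above

written_gb = [200, 350, 150]   # actual data per LUN after a few months

thin_pool_needed_gb = sum(written_gb)
thin_disks = -(-thin_pool_needed_gb // disk_gb)  # ceiling division

print(f"Thick provisioning: {thick_disks} disks bought on day one")
print(f"Thin provisioning:  {thin_disks} disks cover the "
      f"{thin_pool_needed_gb} GB actually written so far")
```

Fifteen spindles versus three for the same presented capacity at this point in the growth curve; the rest get purchased only as the data actually arrives (keeping in mind the performance caveats discussed below).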
The nice thing about this approach is that we stop managing the size of individual LUNs and just manage the underlying disk pool as a whole. And the cost-per-GB for SAN disk constantly goes down so we can spend only what we have to today, and when we add more later it will likely be a little cheaper. Disk capacity utilization will be much higher in a thin model compared with the traditional/thick model.
The story gets even better in a virtual server environment such as with MS Hyper-V or VMWare ESX. First, the virtual server OS drives are on the SAN in addition to the application data, and there can be multiple virtual disks on the same LUN. Whether physical or virtual, we need to maintain some free space in the disks to keep applications running, plus with virtual systems we need some free space on the LUN for features of the virtualization technology like snapshots. The net effect is that in a virtualized environment, disk utilization never gets much above 50% when slack space at both the virtual layer and inside the virtual servers is considered. With thin provisioning we could potentially store twice the number of virtual servers on the same physical disks.
There are caveats of course. Maintaining performance is the primary concern. Whether used in a thick LUN or thin LUN, each disk has a specific amount of performance. Thin provisioning has no effect on the amount of IOPS or bandwidth the application requires nor the amount of IOPS the physical disk can handle. So even if thin provisioning saves 50% disk space in your environment, you may not be able to use all of that reclaimed space before running into performance bottlenecks. If the storage array has QOS features (ie: EMC Clariion NQM) it is possible to prioritize the more important applications in your disk pool to maintain performance where it matters.
Other problems that you may encounter have to do with interoperability. For starters, some applications are not “thin-friendly”; ie: they write data in such a way as to negate any benefit that thin provisioning provides. Also, while many storage arrays support thin provisioning, each has different rules about the use of thin LUNs. For example, in some scenarios you can’t replicate thin LUNs using native array tools. It pays to do your homework before choosing a new storage array or implementing thin provisioning.
I didn’t cover thin provisioning in NAS environments directly but the feature works in the same manner. Thin volumes are provisioned from pools of storage and users/clients see a large amount of available disk space even if the disk pool itself is very small. Since NAS is traditionally used for user home directories and departmental shares, absolute performance is usually not as much of a concern so thin provisioning is much easier to implement and in many cases is the default behavior or simply a check box on NAS appliances like EMC Celerra or NetApp FAS.
Thin provisioning is a powerful technology when used where it makes sense. In my next post I’ll explain de-duplication technology and then talk about how these technologies can be used together plus some workarounds for the caveats that I’ve mentioned.