You are currently browsing the tag archive for the ‘thin’ tag.
Some customers are afraid of thin provisioning…
Practically every week I have discussions with customers about leveraging thin provisioning to reduce their storage costs and just as often the customer pushes back worried that some day, some number of applications, for some reason, will suddenly consume all of their allocated space in a short period of time and cause the storage pool to run out of space. If this was to happen, every application using that storage pool will essentially experience an outage and resolving the problem requires allocating more space to the pool, migrating data, and/or deleting data, each of which would take precious time and/or money. In my opinion, this fear is the primary gating factor to customers using thin provisioning. Exacerbating the issue, most large organizations have a complex procurement process that forces them to buy storage many months in advance of needing it, further reducing the usefulness of thin provisioning. The IT organization for one of my customers can only purchase new storage AFTER a business unit requests it and approved by senior management; and they batch those requests before approving a storage purchase. This means that the business unit may have to wait months to get the storage they requested.
This same customer recently purchased a Symmetrix VMAX with FASTVP and will be leveraging sub-LUN tiering with SSD, FC, and SATA disks totaling over 600TB of usable capacity in this single system. As we began design work for the storage array the topic of thin provisioning came up and the same fear of running out of space in the pool was voiced. To prevent this, the customer fully allocates all LUNs in the pool up front which prevents oversubscription. It’s an effective way to guarantee performance and availability but it means that any free space not used by application owners is locked up by the application server and not available to other applications. If you take their entire environment into account with approximately 3PB of usable storage and NO thin provisioning, there is probably close to $1 million in storage not being used and not available for applications. If you weigh the risk of an outage causing the loss of several million dollars per hour of revenue, the customer has decided the risks outweigh the potential savings. I’ve seen this decision made time and again in various IT shops.
Sub-LUN Tiering pushes the costs for growth down
I previously blogged about using cloud storage for block storage in the form of Cirtas BlueJet and how it would not be to much of a stretch to add this functionality to sub-LUN tiering software like EMC’s FASTVP to leverage cloud storage as a block storage tier as shown in this diagram.
Let’s first assume the customer is already using FASTVP for automated sub-LUN tiering on a VMAX. FASTVP is already identifying the hot and cold data and moving it to the appropriate tier, and as a result the lowest tier is likely seeing the least amount of IOPS per GB. In a VMAX, each tier consists of one or more virtual provisioned pools, and as the amount of data stored on the array grows FASTVP will continually adjust, pushing the hot data up to higher tiers and cold data down to the lower tiers The cold data is more likely to be old data as well so in many cases the data sort of ages down the tiers over time and its the old/least used portion of the data that grows. Conceptually, the only tier you may have to expand is the lowest (ie: SATA) when you need more space. This reduces the long term cost of data growth which is great. But you still need to monitor the pools and expand them before they run out of space, or an outage may occur. Most storage arrays have alerts and other methods to let you know that you will soon run out of space.
Risk-Free Thin Provisioning
What if the storage array had the ability automatically expand itself into a cloud storage provider, such as AT&T Synaptic, to prevent itself from running out of space? Technically this is not much different from using the cloud as a tier all it’s own but I’m thinking about temporary use of a cloud provider versus long term. The cloud provider becomes a buffer for times when the procurement process takes too long, or unexpected growth of data in the pool occurs. With an automated tiering solution, this becomes relatively easy to do with fairly low impact on production performance. In fact, I’d argue that you MUST have automated tiering to do this or the array wouldn’t have any method for determining what data it should move to the cloud. Without that level of intelligence, you’d likely be moving hot data to the cloud which could heavily impact performance of the applications.
Once the customer is able to physically add storage to the pool to deal with the added data, the array would auto-adjust by bringing the data back from the cloud freeing up that space. The cloud provider would only charge for the transfer of data in/out and the temporary use of space. Storage reduction technologies like compression and de-duplication could be added to the cloud interface to improve performance for data stored in the cloud and reduce costs. Zero detect and reclaim technologies could also be leveraged to keep LUNs thin over time as well as prevent the movement of zero’d blocks to the cloud.
Using cloud storage as a buffer for thin provisioning in this way could reduce the risk of using thin provisioning, increasing the utilization rate of the storage, and reducing the overall cost to store data.
What do you think? Would you feel better about oversubscribing storage pools if you had a fully automated buffer, even if that buffer cost some amount of money in the event it was used?
A comment about HDS’s Zero Page Reclaim on one of my previous posts got me thinking about the effectiveness of thin provisioning in general. In that previous post, I talked about the trade-offs between increased storage utilization through the use of thin-provisioning and the potential performance problems associated with it.
There are intrinsic benefits that come with the use of thin provisioning. First, new storage can be provisioned for applications without nearly as much planning. Next, application owners get what they want, while storage admins can show they are utilizing the storage systems effectively. Also, rather than managing the growth of data in individual applications, storage admins are able to manage the growth of data across the enterprise as a whole.
Thin provisioning can also provide performance benefits… For example, consider a set of virtual Windows servers running across several LUNs contained in the same RAID group. Each Windows VM stores its OS files in the first few GB of their respective VMDK files. Each VMDK file is stored in order in each LUN, with some free space at the end. In essence, we have a whole bunch of OS sections separated by gaps of no data. If all VMs were booting at approximately the same time, the disk heads would have to move continuously across the entire disk, increasing disk latency.
Now take the same disks, configured as a thin pool, and create the same LUNs (as thin LUNs) and the same VMs. Because thin-provisioning in general only writes data to the physical disks as it’s being written by the application, starting from the beginning of the disk, all of those Windows VMs’ OS files will be placed at the beginning of the disks. This increased data locality will reduce IO latency across all of the VMs. The effect is probably minor, but reduced disk latency translates to possibly higher IOPS from the same set of physical disks. And the only change is the use of thin-provisioning.
So back to HDS Zero Page Reclaim. The biggest problem with thin provisioning is that it doesn’t stay thin for long. Windows NTFS, for example, is particularly NOT thin-friendly since it favors previously untouched disk space for new writes rather than overwriting deleted files. This activity eventually causes a thin-LUN to grow to it’s maximum size over time, even though the actual amount of data stored in the LUN may not change. And Windows isn’t the only one with the problem. This means that thin provisioning may make provisioning easier, or possibly improve IO latency, but it might not actually save you any money on disk. This is where HDS’s Zero Page Reclaim can help. Hitachi’s Dynamic Provisioning (with ZPR) can scan a LUN for sections where all the bytes are zero and reclaim that space for other thin LUNs. This is particularly useful for converting thick LUNs into thin LUNs. But, it can only see blocks of zeros, and so it won’t necessarily see space freed up by deleting files. Hitachi’s own documentation points out that many file systems are not-thin friendly, and ZPR won’t help with long-term growth of thin LUNs caused by actively writing and then deleting data.
Although there are ways to script the writing of zeros to free space on a server so that ZPI can reclaim that space, you would need to run that script on all of your servers, requiring a unique tool for each operating system in your environment. The script would also have to run periodically, since the file system will grow again afterward.
NetApp’s SnapDrive tool for Windows can scan an NTFS file system, detect deleted files, then report the associated blocks back to the Filer to be added back to the aggregate for use by other volumes/LUNs. The Space Reclamation scan can be run as needed, and I believe it can be scheduled; but, it appears to be Windows only. Again, this will have to be done periodically.
But what if you could solve the problem across most or all of your systems, regardless of operating system, regardless of application, with real-time reclamation? And what if you could simultaneously solve other problems? Enter Symantec’s Storage Foundation with Thin-Reclamation API. Storage Foundation consists of VxFS, VxVM, DMP, and some other tools that together provide dynamic grow/shrink, snapshots, replication, thin-friendly volume usage, and dynamic SAN multipathing across multiple operating systems. Storage Foundation’s Thin-Reclamation API is to thin-provisioning what OST is to Backup Deduplication. Storage vendors can now add near-real-time zero page reclaim for customers that are willing to deploy VxFS/VxVM on their servers. For EMC customers, DMP can replace PowerPath, thereby offsetting the cost.
As far as I know, 3PAR is the first and only storage vendor to write to Symantec’s thin-API, which means they now have the most dynamic, non-disruptive, zero-page-reclaim feature set on the market. As a storage engineer myself, I have often wondered if VxVM/VxFS could make management of application data storage in our diverse environment easier and more dynamic. Adding Thin-Reclamation to the mix makes it even more attractive. I’d like to see more storage vendors follow 3PAR’s lead and write to Symantec’s API. I’d also like to see Symantec open up both OST and the Thin-Reclamation API for others to use, but I doubt that will happen.



