The more I talk with customers, the more I find that the technical details of how something works are much less important than the business outcome it achieves. When it comes to storage, most customers just want a device that will provide the capacity and performance they need, at a price they can afford, and it had better not be too complicated. Pretty much any vendor trying to sell something will attempt to make their solution fit your needs even if they don’t really have the right products. It’s a fact of life: sell what you have. Along these lines, there has been a lot of back and forth between vendors about dedup vs. compression technology and which one solves customer problems best.
After snapshots and thin provisioning, data reduction technology in storage arrays has become a big focus in storage efficiency lately, and there are two primary methods of data reduction: compression and deduplication.
While EMC has been marketing compression technology for block and file data in Celerra, Unified, and Clariion storage systems, NetApp has been marketing deduplication as the technology of choice for block and file storage savings. But which one is the best choice? The short answer is: it depends. Some data types benefit most from deduplication, while others get better savings with compression.
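To make the “it depends” concrete, here is a small Python experiment. It is an illustration only: fixed 4 KiB chunking with SHA-256 stands in for deduplication and zlib stands in for compression. Neither is how EMC or NetApp actually implements these features, and both datasets are synthetic.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096  # assumed fixed chunk size; real arrays use their own block sizes


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Logical size divided by the bytes needed to store only the unique blocks."""
    unique_blocks = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique_blocks[hashlib.sha256(block).digest()] = len(block)
    return len(data) / sum(unique_blocks.values())


def compression_ratio(data: bytes) -> float:
    """Logical size divided by the zlib-compressed size (level 6)."""
    return len(data) / len(zlib.compress(data, 6))


def sample_datasets() -> dict:
    # Ten identical copies of a 64 KiB random "golden image": every block is a
    # duplicate, but each copy is incompressible and repeats farther apart than
    # zlib's 32 KiB window, so compression cannot exploit the repetition.
    golden_image = os.urandom(64 * 1024)
    cloned_images = golden_image * 10
    # Log-style text: very repetitive within a block, but no two blocks match.
    logs = "".join(
        f"2010-11-{i % 28 + 1:02d} INFO request {i} served in {i % 97} ms\n"
        for i in range(20000)
    ).encode()
    return {"cloned images": cloned_images, "application logs": logs}


if __name__ == "__main__":
    for name, data in sample_datasets().items():
        print(f"{name}: dedup {dedup_ratio(data):.1f}x, "
              f"compression {compression_ratio(data):.1f}x")
```

The cloned images dedup 10x but do not compress at all, while the logs compress well but have no duplicate blocks: the same trade-off the vendors argue about, in miniature.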
Currently, EMC supports file compression on all EMC Celerra NS20, 40, 80, 120, 480, 960, VG2, and VG8 systems running DART 5.6.47.x+ and block compression on all CX4 based arrays running FLARE30.x+. In all cases, compression is enabled on a volume/LUN level with a simple check box and processing can be paused, resumed, and disabled completely, uncompressing the data if desired. Data is compressed out-of-band and has no impact on writes, with minimal overhead on reads. Any or all LUN(s) and/or Filesystem(s) can be compressed if desired even if they existed prior to upgrading the array to newer code levels.
With the release of OnTap 8.0.1, NetApp has added support for in-line compression within their FAS arrays. It is enabled per FlexVol and, as far as I have been able to determine, cannot be disabled later (I’m sure Vaughn or another NetApp representative will correct me if I’m wrong here). Compression requires 64-bit aggregates, which are new in OnTap 8, so FlexVols that existed prior to an upgrade to 8.x cannot be compressed without a data migration, which could be disruptive. Since compression is inline, it creates overhead in the FAS controller and could impact the performance of reads and writes to the data.
Vaughn Stewart, of NetApp, expertly blogged today about the new compression feature, including some of the caveats involved, and to me the most interesting part of the post was the following graphic he included showing the space savings of compression vs. dedup for various data types.
Apart from “The Cloud”, “Unified Storage” is the other big buzzword in the storage industry of late. But what exactly is Unified Storage?
Merriam-Webster defines unify as “to make into a unit or coherent whole.”
So how does this apply to storage systems? If you look at marketing messages by EMC, NetApp, and other vendors you’ll find that they all use the term in different ways in order to fit nicely with the products they have. Based on what I see, there are generally two different approaches.
Single HW/SW Stack Approach:
Some vendors want you to believe that the only way it can be called Unified Storage is if the same physical box and software stack provides all protocols and features, even if management of the single system is not perfectly cohesive.
NetApp’s FAS storage systems are an example of this strategy. A single filer provides all services, whether SAN or NAS, IP or Fibre Channel. However, a single HA cluster is actually managed as two separate systems: each cluster node is managed independently through its own FilerView instance, and there are separate tools (NetApp System Manager, Operations Manager, Provisioning Manager, Protection Manager) that can bring all of the filer heads into one view. Disks are captive to a specific filer head in a cluster, and moving disks and/or volumes between filer heads is not seamless.
Single Point of Management Approach:
Others approach it more holistically and figure that as long as the customer manages it as a single system, it qualifies as “Unified”, even if there may be disparate hardware and software components providing the different services. After all, once it’s installed you don’t really go in the datacenter to physically look at the hardware very often.
EMC’s Unified Storage (which is a combination of Celerra NAS and Clariion Block storage systems) is an example of this. In a best-of-breed approach, EMC allows the Clariion backend to do what it does best, block storage via FC or IP, while the Celerra, which is purpose built for NAS, provides CIFS/NFS services while leveraging the disk capacity, processors, cache, and other features of the Clariion as a kind of offload engine. Regardless of which services you use, all parts of the solution are managed from a single Unisphere instance, including other Clariions and/or Celerras in the environment. Unisphere launches from any Clariion or Celerra management port, and regardless of which device you launch it from, all systems are manageable together.
Which approach is better?
I see advantages and disadvantages to both approaches. As a former admin of both NetApp and EMC storage, I feel that while NetApp’s hardware and software stack is unified, their management stack is decidedly un-unified. EMC’s Unified storage is physically “integrated” to work together as a system, but the unifying feature is the management infrastructure built in with Unisphere.
There are other advantages to EMC’s approach as well. For example, if a particular workload seems to hammer the CPUs on the NAS but the backend is not a bottleneck, more Celerra datamovers can be added to take advantage of the same backend disks and improve front-end performance. Likewise, the backend can be augmented as needed to improve performance, increase capacity, etc., without having to scale up the front-end NAS head. With the NetApp approach, if your CPU or cache is stressed, you need to deploy more FAS systems (in pairs for HA) along with any disks that the new systems require to store data.
Both approaches work, and both have their merits, but what do customers really want?
In my opinion, most customers don’t really care *how* the hardware works, so long as it DOES WORK, and is easy to manage. In the grand scheme of things, if I, as an admin, can provision, replicate, snapshot, and clone storage across my entire environment, regardless of protocol, from a “single pane of glass”, that is a strong positive.
EMC Unisphere makes it easy to do just that, and it launches right from the array with no separate installation or servers required. Unisphere can authenticate against Active Directory or LDAP and has role-based administration built in. And since Unisphere launches from any Clariion storage processor or Celerra Control Station, there’s no single point of failure for storage management either.
So what do you think customers want? If you are a customer, what do YOU want?
(Warning: This is a long post…)
You have a critical application that you can’t afford to lose:
So you want to replicate your critical applications because they are, well, critical. And you are looking at the top midrange storage vendors for a solution. NetApp touts awesome efficiency, awesome snapshots, etc., while EMC is throwing considerable weight behind its 20% Efficiency Guarantee. While EMC guarantees to be 20% more efficient in any unified storage solution, there is perhaps no better scenario than a replication solution to prove it.
I’m going to describe a real-world scenario using Microsoft Exchange as the example application and show why the EMC Unified platform requires less storage and less WAN bandwidth for replication, while maintaining the same or better application availability vs. a NetApp FAS solution. The example will use a single Microsoft Exchange 2007 SP2 server with ten 100GB mail databases connected via Fibre Channel to the storage array. A second storage array exists in a remote site, connected via IP to the primary site, and a standby Exchange server is attached to that array.
- 100GB per database, 1 database per storage group, 1 storage group per LUN, 130GB LUNs
- 50GB Log LUNs, ensure enough space for extra log creation during maintenance, etc
- 10% change rate per day average
- Nightly backup truncates logs as required
- Best Practices followed by all vendors
- 1500 users (heavy users at 0.4 IOPS each); 10% of users leverage BlackBerry (BES users generate 4X the IOPS of a regular user)
- Approximate IOPS requirement for Exchange: 780IOPS for this server.
- EMC Solution: 2 x EMC Unified Storage systems with SnapView/SANCopy and Replication Manager
- NetApp Solution: 2 x NetApp FAS Storage systems with SnapMirror and SnapManager for Exchange
- RPO: 4 hours (remote site replication update frequency)
Based on those assumptions we have 10 x 130GB DB LUNs and 10 x 50GB Log LUNs and we need approximately 780 host IOPS 50/50 read/write from the backend storage array.
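The 780 IOPS figure follows from the user assumptions if each BlackBerry user is counted at 4X the heavy-user rate, which is my reading of the BES assumption: 1,350 × 0.4 + 150 × 0.4 × 4 = 540 + 240 = 780. As a quick sketch:

```python
def exchange_host_iops(users: int, iops_per_user: float,
                       blackberry_fraction: float, bes_multiplier: float) -> float:
    """Host IOPS when BlackBerry users generate a multiple of the normal per-user load."""
    blackberry_users = users * blackberry_fraction
    regular_users = users - blackberry_users
    return (regular_users * iops_per_user
            + blackberry_users * iops_per_user * bes_multiplier)


if __name__ == "__main__":
    total = exchange_host_iops(1500, 0.4, 0.10, 4)
    reads = writes = total / 2  # 50/50 read/write mix from the assumptions
    print(f"{total:.0f} host IOPS ({reads:.0f} reads, {writes:.0f} writes)")
```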
Disk IOPS calculation: (50/50 read/write)
- RAID10, 780 host IOPS translates to 1170 disk IOPS (r+w*2)
- RAID5, 780 host IOPS translates to 1950 disk IOPS (r+w*4)
- RAIDDP is essentially RAID6 so we have about 2730 disk IOPS (r + w*6)
Note: NetApp can create sequential stripes on writes to improve write performance for RAIDDP but that advantage drops significantly as the volumes fill up and free space becomes fragmented which is extremely likely to happen after a few months or less of activity.
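The write-penalty arithmetic is simple enough to express directly. This sketch uses the post’s worst-case penalties; as the note says, RAID-DP’s real penalty depends on how well full stripes can be written, so 6 is a conservative assumption rather than a constant.

```python
# Back-end (disk) I/Os generated per host write, as assumed in this post.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6/RAID-DP": 6}


def backend_iops(host_iops: float, read_fraction: float, write_penalty: int) -> float:
    """Reads pass through 1:1; each host write costs `write_penalty` disk I/Os."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + writes * write_penalty


if __name__ == "__main__":
    for raid, penalty in WRITE_PENALTY.items():
        print(f"{raid}: {backend_iops(780, 0.5, penalty):.0f} disk IOPS")
```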
Assuming 15K Fibre Channel drives can deliver 180 IOPS with reasonable latencies for a database, we’d need:
- RAID10: 6.5 disks (round up to 8), using 450GB 15K drives = 1.7TB usable (1 x 4+4)
- RAID5: 10.8 disks (round up to 12), using 300GB 15K drives = 2.8TB usable (2 x 5+1)
- RAID6/DP: 15.2 disks (round up to 16), using 300GB 15K drives = 3.9TB usable (1 x 14+2)
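Rounding the spindle counts up to whole RAID groups can be sketched as follows. It assumes the 180 IOPS-per-drive figure and the RAID group geometries listed above, and it covers only the IOPS side of sizing, not capacity.

```python
import math


def disks_needed(backend_iops: float, iops_per_disk: float, raid_group_size: int):
    """Return (raw disk count, count rounded up to whole RAID groups)."""
    raw = backend_iops / iops_per_disk
    groups = math.ceil(raw / raid_group_size)
    return raw, groups * raid_group_size


if __name__ == "__main__":
    # (back-end IOPS, disks per RAID group) from the calculations above
    layouts = {
        "RAID10 (4+4)": (1170, 8),
        "RAID5 (5+1)": (1950, 6),
        "RAID-DP (14+2)": (2730, 16),
    }
    for name, (iops, group) in layouts.items():
        raw, total = disks_needed(iops, 180, group)
        print(f"{name}: {raw:.1f} disks -> {total} disks")
```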
Log writes are highly cacheable, so we generally need fewer disks; for both the RAID10 and RAID5 EMC options we’ll use a single RAID1 1+1 RAID group with 2 x 600GB 15K drives. Since we can’t do RAID1 or RAID10 on NetApp, we’ll have to use at least 3 disks (1 data and 2 parity) for the 500GB worth of Log LUNs, but we’ll actually need more than that.
Picking a RAID Configuration and Sizing for snapshots:
For EMC, the RAID10 solution uses fewer disks and provides the most appropriate amount of disk space for LUNs vs. the RAID5 solution. With the NetApp solution there really isn’t another alternative so we’ll stick with the 16 disk RAID-DP config. We have loads of free space but we need some of that for snapshots which we’ll see next. We also need to allocate more space to the Log disks for those snapshots.
Since we expect about 10% change per day in the databases (about 10GB per database) we’ll double that to be safe and plan for 20GB of changes per day per LUN (DB and Log).
NetApp arrays store snapshot data in the same volume (FlexVol) as the application data/LUN, so you need to size the FlexVols and aggregates appropriately. We need 200GB for the DB LUNs and 200GB for the Log LUNs to cover our daily change rate, but we’re doubling that to 400GB each to cover our 2-day contingency. In the case of the DB LUNs, the aggregate has more than enough space for the 400GB of snapshot data we are planning for, but we need to add 400GB to the Log aggregate as well, so we need 4 x 600GB 15K drives to cover the Exchange logs and snapshot data.
EMC Unified arrays store snapshot data for all LUNs in a centralized location called the Reserve LUN Pool (RLP). The RLP actually consists of a number of LUNs that can be used and released as needed by snapshot operations occurring across the entire array. The RLP LUNs can be created on any number of disks, using any RAID type, to handle various IO loads, and sizing an RLP is based on the total change rate of all simultaneously active snapshots across the array. Since we need 400GB of space in the Reserve LUN Pool for one day of changes, we’ll again be safe by doubling that to 800GB, which we’ll provide with 6 dedicated 300GB 15K drives in RAID10.
At this point we have 20 disks on the NetApp array and 16 disks on the EMC array. We have loads of free space in the primary database aggregate on the NetApp, but we can’t use that free space because the aggregate’s spindle count is dictated by the IOPS workload we expect from the Exchange server.
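The snapshot sizing and the resulting disk tally can be checked with a few lines of Python. The figures are this post’s assumptions, not vendor sizing guidance.

```python
LUN_COUNT = 10
DAILY_CHANGE_PER_LUN_GB = 20  # ~10GB/day per database, doubled for safety
CONTINGENCY_DAYS = 2          # doubled again to cover a 2-day contingency

# NetApp: snapshot space is provisioned separately on the DB side and the Log side.
netapp_snapshot_gb = {
    "DB": LUN_COUNT * DAILY_CHANGE_PER_LUN_GB * CONTINGENCY_DAYS,
    "Log": LUN_COUNT * DAILY_CHANGE_PER_LUN_GB * CONTINGENCY_DAYS,
}

# EMC: a single Reserve LUN Pool absorbs changes from both DB and Log LUNs.
emc_rlp_gb = 2 * LUN_COUNT * DAILY_CHANGE_PER_LUN_GB * CONTINGENCY_DAYS

netapp_disks = {"DB aggregate, RAID-DP 14+2": 16, "Log aggregate, RAID-DP": 4}
emc_disks = {"DB RAID10 4+4": 8, "Log RAID1 1+1": 2, "Reserve LUN Pool RAID10": 6}

if __name__ == "__main__":
    print("NetApp snapshot space (GB):", netapp_snapshot_gb,
          "->", sum(netapp_disks.values()), "disks")
    print("EMC Reserve LUN Pool (GB):", emc_rlp_gb,
          "->", sum(emc_disks.values()), "disks")
```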
In order to replicate this data to an alternate site, we’ll configure the appropriate tools. For the EMC solution:
- Install Replication Manager on a server and deploy an agent to each Exchange server
- Configure SANCopy connectivity between the two arrays over the IP ports built-in to each array
- In Replication Manager, configure a job that quiesces Exchange, then uses SANCopy to incrementally update a copy of the database and log LUNs on the remote array, and schedule it to run every 4 hours using RM’s built-in scheduler.
For the NetApp solution:
- Install SnapManager for Exchange on each Exchange server
- Configure SnapMirror connectivity between the two arrays over the IP ports built-in to each array
- In SnapManager, configure a backup job that quiesces Exchange and takes a Snapshot of the Exchange DBs and Logs, then starts a SnapMirror session to replicate the updated FlexVol (including the snapshot) to the remote array. Configure a schedule in Windows Task Scheduler to run the backup job every 4 hours.
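Because both jobs run on a fixed 4-hour schedule, the recovery point you actually get also depends on how long each update takes to cross the WAN. Here is a rough model, assuming a hypothetical 100 Mbps link running at 80% efficiency and that each update finishes before the next one starts; the link speed and efficiency are illustrative, not part of the scenario.

```python
def transfer_hours(changed_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to push `changed_gb` (decimal GB) over a link at the given utilization."""
    bits = changed_gb * 8 * 1000**3
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def worst_case_rpo_hours(interval_hours: float, changed_gb: float,
                         link_mbps: float, efficiency: float = 0.8) -> float:
    """Worst-case data loss: a disaster just before an update completes leaves the
    remote copy at the previous point, interval + transfer time ago."""
    transfer = transfer_hours(changed_gb, link_mbps, efficiency)
    if transfer >= interval_hours:
        raise ValueError("updates would overlap; replication cannot keep up")
    return interval_hours + transfer


if __name__ == "__main__":
    # Hypothetical 100 Mbps WAN link, 4-hour schedule from the scenario.
    for gb in (36, 72):
        print(f"{gb} GB changed: worst-case RPO ~ {worst_case_rpo_hours(4, gb, 100):.1f} h")
```

In this model, halving the amount of data each update has to send halves the transfer component of the RPO.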
Both the EMC and NetApp solutions run on schedule, create remote copies, and everything runs fine, until...
Tuesday night during the weekly maintenance window, the Exchange admins decide to migrate half of the users from DB1 to DB2 and DB3, and half of the users from DB4 to DB5 and DB6. About 80GB of data is moved (25GB to each of the target DBs). The transaction logs on DB1 and DB4 jump to almost 50GB, and to 35GB each on DB2, DB3, DB5, and DB6.
On the NetApp array, the 50GB log LUNs already have about 10GB of snapshot data stored, and as the migration is happening, new snapshot data is tracked on all 6 of the affected DB and Log LUNs. The 25GB of new data plus the 10GB of existing data exceeds the 20GB of free space in the FlexVol that each LUN is contained in, and guess what… Exchange chokes because it can no longer write to the LUNs.
There are workarounds. First, you enable automatic volume expansion for the FlexVols, with automatic Snapshot deletion as a secondary fallback. In the above scenario, the 6 affected FlexVols autoextend to approximately 100GB each, equaling 300GB of snapshot data for those 6 LUNs and another 40GB for the remaining 4 LUNs. There is only 60GB free in the aggregate for any additional snapshot data across all 10 LUNs. Now SnapMirror struggles to push the 1200GB of new data (application data + snapshot data) across the WAN link, and as it falls behind, more data changes on the production LUNs, increasing the amount of snapshot data until the aggregate runs out of space. By default, SnapMirror snapshots are not included in the “automatically delete snapshots” option, so Exchange goes down. You can set a flag to allow SnapMirror-owned snapshots to be automatically deleted, but then you have to resync the databases from scratch. To prevent this problem from ever occurring, you need to size the aggregate to handle >100% change, which means more disks.
Consider how the EMC array handles this same scenario using SANCopy. The same changes occur to the databases and approximately 600GB of data is changed across 12 LUNs (6 DB and 6 Log). When the Replication Manager job starts, SANCopy takes a new snapshot of all of the blocks that just changed for purposes of the current update and begins to copy those changed blocks across the WAN.
- SANCopy/Inc is not tracking the changes that occur AS they occur, only while an update is in process so the Reserve LUN Pool is actually empty before the update job starts. If you want additional snapshots on top of the ones used for replication, that will increase the amount of data in the Reserve LUN Pool for tracking changes, but snapshots are created on both arrays independently and the snapshot data is NOT replicated. This nuance allows you to have different snapshot schedules in production vs. disaster recovery for example.
- Because SANCopy/Inc replicates only the blocks that have changed on the production LUNs, NOT the snapshot data, it copies only half as much data across the WAN as SnapMirror, which reduces the time out of sync. This translates to lower WAN utilization AND a better RPO.
- IF an update was occurring when the maintenance took place, the amount of data put in the Reserve LUN Pool would be approximately 600GB (leaving 200GB free for more changed data). The result is more efficient use of the snapshot pool and more flexibility.
- IF the Reserve LUN Pool ran out of space, the SANCopy update would fail but the production LUNs ARE NEVER AFFECTED. Higher availability for the critical application that you devoted time and money to replicate.
- Less spinning disk on the EMC array vs. the NetApp.
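The difference between per-volume snapshot space and a shared pool can be illustrated with a toy model. It deliberately ignores FlexVol autogrow and snapshot autodelete and uses made-up numbers; the point is only that, for the same total capacity, a burst on a few LUNs exhausts a fixed per-volume reserve long before it exhausts a shared pool. What happens on overflow (production writes failing vs. a replication update failing) is the separate behavioral difference described in the list above.

```python
from typing import Dict, List


def per_volume_overflows(snapshot_demand_gb: Dict[str, float],
                         reserve_per_volume_gb: float) -> List[str]:
    """Volumes whose snapshot data exceeds their own fixed reserve.
    Space reserved for other volumes cannot help them."""
    return [vol for vol, need in snapshot_demand_gb.items()
            if need > reserve_per_volume_gb]


def shared_pool_overflows(snapshot_demand_gb: Dict[str, float], pool_gb: float) -> bool:
    """A shared pool only runs out when total demand exceeds its total size."""
    return sum(snapshot_demand_gb.values()) > pool_gb


if __name__ == "__main__":
    # Hypothetical: 200 GB total either way, but two LUNs take a burst of changes.
    demand = {f"LUN{i}": (35 if i < 2 else 10) for i in range(10)}
    print("per-volume overflow:", per_volume_overflows(demand, 20))
    print("shared pool overflow:", shared_pool_overflows(demand, 200))
```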
EMC has several replication products available that each act differently. I used SANCopy because, combined with Replication Manager, it provides similar functionality to NetApp SnapMirror and SnapManager. MirrorView/Async has the same advantages as SANCopy/Incremental in these scenarios and can replicate Exchange, SQL, and other applications without any host involvement.
Higher application availability, lower WAN utilization, better RPO, and fewer spinning disks, without even leveraging advanced features for even better efficiency and performance.
Yesterday, in his blog post entitled “Myth Busting: Storage Guarantees”, Vaughn Stewart from NetApp blogged about the EMC 20% Guarantee and posted a chart of storage efficiency features from EMC and NetApp platforms to illustrate his point. Chuck Hollis from EMC called it “chartsmithing” in a comment but didn’t elaborate on the chart’s specific deficiencies. Well, allow me to take that ball…
As presented, Vaughn’s chart (below) is technically factual (with one exception which I’ll note), but it plays on the human emotion of Good vs Bad (Green vs Red) by attempting to show more Red on EMC products than there should be.
The first and biggest problem is the chart compares EMC Symmetrix and EMC Clariion dedicated-block storage arrays with NetApp FAS, EMC Celerra, and NetApp vSeries which are all Unified storage systems or gateways. Rather than put n/a or leave the field blank for NAS features on the block-only arrays, the chart shows a resounding and red NO, leading the reader to assume that the feature should be there but somehow EMC left it out.
As far as keeping things factual, some of the EMC and NetApp features in this chart are not necessarily shipping today (very soon though, and since it affects both vendors I’ll allow it here). And I must make a correction with respect to EMC Symmetrix and Space Reclamation, which IS available on Symm today.
I’ve taken the liberty of massaging Vaughn’s chart to provide a more balanced view of the feature comparison. I’ve also added EMC Celerra gateway on Symmetrix to the comparison as well as an additional data point which I felt was important to include.
1.) I removed the block-only EMC devices because the NetApp devices in the comparison are Unified systems.
2.) I removed the SAN data row for Single Instance storage because Single Instance (identical file) data reduction technology is inherently NAS related.
3.) Zero Space Reclamation is a feature available in Symmetrix storage. In Clariion, the Compression feature can provide a similar result since zero pages are compressible.
I left the 3 different data reduction techniques as individually listed even though the goal of all of them is to save disk space. Depending on the data types, each method has strengths and weaknesses.
One question: if a bug in OnTap causes a vSeries to lose access to the disk on a Symmetrix during an online Enginuity upgrade, who do you call? How would you know ahead of time, if EMC hasn’t validated vSeries on Symmetrix the way EMC does with many other operating systems/hosts/applications in eLab?
The goal of my post here really is to show how the same data can be presented in different ways to give readers a different impression. I won’t get into too much as far as technical differences between the products, like how comparing FAS to Symmetrix is like comparing a box truck to a freight train, or how fronting an N+1 loosely coupled clustered, global cached, high-end storage array with a midrange dual-controller gateway for block data might not be in a customer’s best interest.
What do you think?
This is a follow up to my recent post NetApp and EMC: Replication Management Tools Comparison, in which I discussed the differences between EMC Replication Manager and NetApp SnapManager.
As a former customer of both NetApp and EMC, and now as an employee of EMC, I noticed a big difference between NetApp and EMC in how they market their replication management tools. When I was a customer, EMC talked about Replication Manager several times, and we purchased and deployed it. NetApp made SnapManager a very central part of their sales campaign, sometimes skipping any discussion of the underlying storage in favor of showing off SnapManager functionality. This is an extremely effective sales technique, and NetApp sales teams are so good at it that many people don’t even realize that other vendors have similar (and, in my opinion, in EMC’s case better) functionality. One of the reasons for this difference in marketing strategy is that NetApp users NEED SnapManager, while EMC users do not always need Replication Manager.
The reason why is both simple and complex…
EMC storage arrays (Clariion, Symmetrix, RecoverPoint, Invista) all have one technology in common that NetApp Filers do not: Consistency Groups. A consistency group allows the storage system to take a snapshot of multiple LUNs simultaneously, so simultaneous in fact that all of the snapshots are at the exact same point in time, down to the individual write. This means that, without taking any applications offline and without any orchestration software, EMC storage arrays can create crash-consistent copies of nearly any kind of data at any time.
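To see why a single cut point matters, here is a toy model of dependent writes in Python. It is a conceptual sketch, not how CLARiiON or Symmetrix implement consistency groups: a consistency group is modeled as one cut point applied to every LUN, while independent snapshots get a separate cut point per LUN. Because a dependent write is only issued after the write it depends on has completed, any single cut that includes a write also includes its prerequisite.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Write:
    seq: int                          # global order in which writes completed
    lun: str
    depends_on: Optional[int] = None  # seq of a write that had to complete first


def capture(writes: List[Write], cut_per_lun: Dict[str, int]) -> Set[int]:
    """Writes included in a point-in-time image, given each LUN's cut point."""
    return {w.seq for w in writes if w.seq <= cut_per_lun[w.lun]}


def dependent_write_consistent(writes: List[Write], captured: Set[int]) -> bool:
    """True if no captured write is missing a write it depended on."""
    return all(w.depends_on in captured
               for w in writes
               if w.seq in captured and w.depends_on is not None)


# Database pattern: log record first, then the data page that relies on it.
WRITES = [
    Write(1, "log"), Write(2, "data", depends_on=1),
    Write(3, "log"), Write(4, "data", depends_on=3),
]

if __name__ == "__main__":
    group = capture(WRITES, {"log": 3, "data": 3})        # one cut for all LUNs
    independent = capture(WRITES, {"log": 2, "data": 4})  # log LUN snapped first
    print("consistency group:", dependent_write_consistent(WRITES, group))
    print("independent snaps:", dependent_write_consistent(WRITES, independent))
```

The independent snapshots capture data write 4 without the log write 3 it depends on, which is exactly the kind of image an application sees as corrupt on recovery.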
The EMC Whitepaper “EMC CLARiiON Database Storage Solutions: Oracle 10g/11g with CLARiiON Storage Replication Consistency” downloadable from EMC’s website has the following explanation of consistency groups in general…
“…Consistent replication operates on multiple LUNs as a set such that if the replication action fails for one member in the set, replication for all other members of the set are canceled or stopped. Thus the contents of all replicated LUNs in the set are guaranteed to be identical point-in-time replicas of their source and dependent-write consistency is maintained…”
“…With consistent replication, the database does not have to be shut down or put into “hot backup mode.” Replicates created with SnapView or MV/S (or MV/A, Timefinder, SRDF, Recoverpoint, etc) consistency operations, without first quiescing or halting the application, are restartable point-in-time replicas of the production data and guaranteed to be dependent-write consistent.”
Consistency is important for any application that is writing to multiple LUNs at the same time such as SQL database and log volumes. SnapManager and Replication Manager actually prepare the application by quiescing the database during the snapshot creation process. This process creates “application-consistent” copies which are technically better for recovery compared with “storage-consistent” copies (also known as crash-consistent copies).
So, while I will acknowledge that quiescing the database during a snapshot/replication operation provides the best possible recovery image, that may not be realistic in some scenarios. The first issue is that the actual operation of quiescing, snapping, checking the image, then pushing an update to a remote storage array takes some time. Depending on the size of the dataset, this operation can take from several minutes to several hours to complete. If you have a Recovery Point Objective (RPO) of 5 minutes or less, using either of these tools is pretty much a non-starter.
Another issue is one of application support. While EMC Replication Manager and NetApp SnapManager have very wide support for the most popular operating systems, filesystems, databases, and applications, they certainly don’t support every application. A very simple example is a Novell NetWare file server with an NSS pool/volume spanning multiple LUNs. Neither NetApp nor EMC supports Novell NetWare in their replication management tools. While you can certainly replicate all of the LUNs with NetApp SnapManager, SnapManager has no consistency technology built in to keep the LUNs write-order consistent, and the secondary copy will appear completely corrupt to the NetWare server if a recovery is attempted. Through the use of consistency groups with MirrorView/Async, the replication of each LUN is tracked as a group and all of the LUNs are write-order consistent with each other, keeping the filesystem itself consistent. You would need either array-level consistency technology or support for NetWare in the replication management tool in order to replicate such a server. Unfortunately, NetApp provides neither.
You may have complex applications that consist of Oracle and SQL databases, NTFS filesystems, and application servers running as VMs. Using array-based consistency groups, you can replicate all of these components simultaneously and keep them all consistent with each other. This way you won’t have transactions that normally affect two databases end up missing in one of the two after a recovery operation, even if those databases are different technologies (Oracle and MySQL, or PostgreSQL for example).
EMC Storage arrays provide consistency group technology for Snapshots and Replication in Clariion and Symmetrix storage arrays. In fact, with Symmetrix, consistency groups can span multiple arrays without any host software. By comparison, NetApp Filers do not have consistency group technology in the array. Snapshots are taken (for local replicas and for SnapMirror) at the FlexVolume level. Two FlexVolumes cannot be snapped consistently with each other without SnapManager.
There are a couple of workarounds for NetApp users: you can snapshot an aggregate, but that is not recommended by NetApp for most customers, or you can put multiple LUNs in the same FlexVol, but that still limits you to 16TB of data including snapshot reserve space. Both options also violate the database design best practice of keeping data and logs on separate spindles for recovery. Even with these workarounds, you cannot gain LUN consistency across the two controllers in an HA Filer pair, something the CLARiiON does natively and which can help with load balancing IO across the storage processors.
In general, I recommend that EMC customers use EMC Replication Manager and NetApp customers use SnapManager for the applications that are supported, and for most scenarios. But when RPOs are short, or the environment falls outside the support matrix for those tools, consistency groups become the best or only option.
Incidentally, with EMC RecoverPoint, you get the best of both worlds. CDP or near-CDP replication of data using consistency groups for zero or near-zero RPOs plus application-consistent bookmarks made anytime the database is quiesced. Recovery is done from the up-to-the-second version of the data, but if that data is not good for any reason, you can roll back to another point in time, including a point-in-time when the database was quiesced (a bookmark).
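The rollback choice described above, any point in time versus the most recent application-consistent bookmark, can be sketched with a simple journal model. `JournalPoint` and `pick_recovery_point` are hypothetical names for illustration; this is not RecoverPoint’s API or internal data model.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JournalPoint:
    timestamp: float                # seconds; journal is kept in time order
    bookmark: Optional[str] = None  # set when the application was quiesced


def pick_recovery_point(journal: List[JournalPoint], not_after: float,
                        app_consistent_only: bool = False) -> Optional[JournalPoint]:
    """Latest point at or before `not_after`, optionally only bookmarked points."""
    timestamps = [p.timestamp for p in journal]
    end = bisect.bisect_right(timestamps, not_after)
    for point in reversed(journal[:end]):
        if not app_consistent_only or point.bookmark is not None:
            return point
    return None


if __name__ == "__main__":
    journal = [JournalPoint(0, "quiesced-0"), JournalPoint(10), JournalPoint(20),
               JournalPoint(30, "quiesced-30"), JournalPoint(40)]
    print("crash-consistent:", pick_recovery_point(journal, 45))
    print("app-consistent:  ", pick_recovery_point(journal, 45, app_consistent_only=True))
```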
So, while EMC has, in Replication Manager, an equivalent offering to NetApp’s SnapManager, EMC customers are not required to use it, and in some cases they can achieve better results using array-based consistency technologies.
I started this post before I started working for EMC and got sidetracked with other topics. Recent discussions I’ve had with people have got me thinking more about orchestration of data protection, replication, and disaster recovery, so it was time to finish this one up…
Before I came to work for EMC, I was working on a project to leverage NetApp and EMC storage simultaneously for redundancy. I had a chance to put various tools from EMC and NetApp into production and have been able to make some observations with respect to some of the differences. This is a follow-up to my previous NetApp and EMC posts…
Specifically this post is a comparison between NetApp SnapManager 5.x and EMC Replication Manager 5.x. First, here’s a quick background on both tools based on my personal experience using them.
EMC Replication Manager (RM) is a single application that runs on a dedicated “Replication Manager Server.” RM agents are deployed to the hosts of applications that will be replicated. RM supports local and remote replication features in EMC’s Clariion storage array, Celerra Unified NAS, Symmetrix DMX/V-Max, Invista, and RecoverPoint products. With a single interface, Replication Manager lets you schedule, modify, and monitor snapshot, clone, and replication jobs for Exchange, SQL, Oracle, SharePoint, VMware, Hyper-V, etc. RM supports role-based authentication, so application owners can have access to jobs for their own applications for monitoring and managing replication. RM can manage jobs across all of the supported applications, array types, and replication technologies simultaneously. RM is licensed by storage array type and host count. No specific license is required to support the various applications.
NetApp SnapManager is actually a series of applications, one for each application that NetApp supports. There are versions of SnapManager for Exchange, SQL, SharePoint, SAP, Oracle, VMware, and Hyper-V. The SnapManager application is installed on each host of an application that will be replicated, and jobs are scheduled on each specific host using Windows Task Scheduler. Each version of SnapManager is licensed by application and host count. I believe you can also license SnapManager per array instead of per host, which could make financial sense if you have lots of hosts.
EMC Replication Manager and NetApp SnapManager products tackle the same customer problem–provide guaranteed recoverability of an application, in the primary or a secondary datacenter, using array-based replication technologies. Both products leverage array-based snapshot and replication technology while layering application-consistency intelligence to perform their duties. In general, they automate local and remote protection of data. Both applications have extensive CLI support for those that want that.
- Architecture
  - EMC RM – Replication Manager is a client-server application installed on a control server. Agents are deployed to the protected servers.
  - NetApp SM – SnapManager is several applications that are installed directly on the servers that host the applications being protected.
- Job Management
  - EMC RM – All job creation, management, and monitoring is done from the central GUI. Replication Manager has a Java-based GUI.
  - NetApp SM – Job creation and monitoring is done via the SnapManager GUI on the server being protected. SnapManager uses an MMC-based GUI.
- Job Scheduling
  - EMC RM – Replication Manager has a central scheduler built into the product that runs on the RM Server. Jobs are initiated and controlled by the RM Server; the agent on the protected server performs the necessary tasks as required.
  - NetApp SM – SnapManager jobs are scheduled with Windows Task Scheduler after creation. The SnapManager GUI creates the initial scheduled task when a job is created through the wizard. Modifications are made by editing the scheduled task in Windows Task Scheduler.
So while the tools essentially perform the same function, you can see that there are clear architectural differences, and that’s where the rubber meets the road. Being a centrally managed client-server application, EMC Replication Manager has advantages for many customers.
Simple Comparison Example: Exchange 2007 CCR cluster
(snapshot and replicate one of the two copies of Exchange data)
With NetApp SnapManager, the application is installed on both cluster nodes. An administrator must then log on to the console of the node that hosts the copy you want to replicate and create two jobs that run on the same schedule: Job A is configured to run when the node is active, and Job B when the node is passive. Due to differences in the job settings, I was unable to configure a single job that ran successfully regardless of whether the node was active or passive. If you want to modify the settings, you either have to edit the command-line options in the scheduled task, or create a new job from scratch and delete the old one.
With EMC Replication Manager, you deploy the agent to both cluster nodes, then, in the RM GUI, create a job against the cluster virtual name, not the individual node. You define which server in the cluster the job should run on, and whether it should run when that node is passive, active, or both. All logging, monitoring, and scheduling is handled in the same RM GUI, even if you have 50 Exchange clusters, or SQL and Oracle for that matter. Modifying the job is done by right-clicking on it and editing its properties, and the schedule is modified the same way.
So as the number of servers and clusters increases in your environment, having a central UI to manage and monitor all jobs across the enterprise really helps. But here’s where having a centrally managed application really shines…
But what if it gets complicated?
Let’s say you have a multi-tier application like IBM FileNet, EMC Documentum, or OpenText and you need to replicate multiple servers, multiple databases, and multiple file systems that are all related to that single application. Not only does EMC Replication Manager support SQL and Filesystems in the same GUI, you can tie the jobs together and make them dependent on each other for both failure reporting and scheduling. For example, you can snapshot a database and a filesystem, then replicate both of them without worrying about how long the first job takes to complete. Jobs can start other jobs on completely independent systems as necessary.
Without this job dependence functionality, you'd generally have to create scheduled tasks on each server and have dependent jobs start after a delay that is long enough to allow the first job to complete, yet as short as possible to keep the parts of the application from drifting too far out of sync. Sometimes the first job takes longer than usual, and the dependent jobs then start before it has finished, producing an inconsistent copy. This is where Replication Manager shows its muscle, with its ability to orchestrate complex data protection strategies across the entire enterprise, with your choice of protection technologies (CDP, Snapshot, Clone, Bulk Copy, Async, Sync), from a single central user interface.
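To make the timing problem concrete, here is a minimal, hypothetical sketch comparing fixed-delay scheduling (separate scheduled tasks on each server) with dependency chaining. The `Job` class, job names, and durations are illustrative assumptions; this is not Replication Manager's API or internal logic.

```python
"""Hypothetical sketch: fixed-delay scheduling vs. dependency chaining.

Job names and durations are illustrative assumptions, not Replication
Manager's API or internals.
"""
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    duration: int                                   # minutes this run actually takes
    depends_on: list = field(default_factory=list)  # names of prerequisite jobs


def delay_schedule(jobs, delays):
    """Start each job at a fixed offset, like independent scheduled tasks.

    Returns ({name: (start, end)}, [jobs that started before a job they
    depend on had finished]).
    """
    times = {j.name: (delays[j.name], delays[j.name] + j.duration) for j in jobs}
    broken = [j.name for j in jobs
              if any(times[d][1] > times[j.name][0] for d in j.depends_on)]
    return times, broken


def dependency_schedule(jobs):
    """Start each job only when all of its prerequisites have completed.

    Assumes jobs are listed in dependency order (prerequisites first).
    """
    times = {}
    for j in jobs:
        start = max((times[d][1] for d in j.depends_on), default=0)
        times[j.name] = (start, start + j.duration)
    return times


if __name__ == "__main__":
    jobs = [
        Job("snap_db", duration=45),                 # normally ~20 min, ran long today
        Job("snap_fs", duration=10, depends_on=["snap_db"]),
        Job("replicate", duration=30, depends_on=["snap_db", "snap_fs"]),
    ]
    # Delays tuned for the "normal" 20-minute database snapshot.
    _, broken = delay_schedule(jobs, {"snap_db": 0, "snap_fs": 25, "replicate": 40})
    print("started too early:", broken)
    print("chained schedule:", dependency_schedule(jobs))
```

With fixed delays, one slow database snapshot silently breaks both downstream jobs; with chaining, the downstream jobs simply start later.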
I've been having some fun discussions with one of my customers recently about how to tackle various application problems within the storage environment, and it got me thinking about the value of having "options". This customer has an EMC Celerra Unified Storage Array with the Fibre Channel, iSCSI, NFS, and CIFS protocols enabled. This single storage system supports VMware, SQL, Web, Business Intelligence, and many custom applications.
The discussion was specifically centered on ensuring adequate storage performance for several different applications, each with a different type of workload…
1.) Web Servers – Primarily VMs with general-purpose IO loads and low write ratios.
2.) SQL Servers – Physical and Virtual machines with 30-40% write ratios and low latency requirements.
3.) Custom Application – A custom application database with a 100% random-read profile running across 50 servers.
The EMC Unified solution:
EMC Storage already sports virtual provisioning in order to provision LUNs from large pools of disk to improve overall performance and reduce complexity. In addition, QoS features in the array can be used to provide guaranteed levels of performance for specific datasets by specifying minimum and maximum bandwidth, response time, and IO requirements on a per-LUN basis. This can help alleviate disk contention when many LUNs share the same disks, as in a virtual pool. Enterprise Flash Drives (EFD) are also available for EMC Storage arrays to provide extremely high performance to applications that require it and they can coexist with FC and SATA drives in the same array. Read and write cache can also be tuned at an array and LUN level to help with specific workloads. With the updates to the EMC Unified Platform that I discussed previously, Sub-LUN FAST (auto tiering), and FAST Cache (EFD used as array cache) will be available to existing customers after a simple, non-disruptive, microcode upgrade, providing two new ways to tackle these issues.
So which feature should my customer use to address their 3 different applications?
Sub-LUN FAST (Fully Automated Storage Tiering)
Put all of the data into large Virtual Provisioning pools on the array, add a few EFD (SSD) and SATA disks to the mix, and enable FAST to automatically move the blocks to the appropriate tier of storage. Over time the workload would even out across the various tiers and performance would increase for all of the workloads with far fewer drives, saving on power, floor space, cooling, and potentially disk cost depending on the configuration. This happens non-disruptively in the background. Seems like a no-brainer, right?
For this customer, FAST helps the web server VMs and the general-purpose SQL databases, where the workload is predominantly read and much of the same data is accessed repeatedly (high locality of reference). As long as the blocks being accessed most often are generally the same day to day, automated tiering (FAST) is a great solution. But what if the workload is much more random? FAST would want to push all of the data into EFD, which generally wouldn't be possible due to capacity requirements. Okay, so tiering won't solve all of their problems. What about FAST Cache?
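Why locality of reference matters so much for tiering can be shown with a toy model. This sketch assumes a LUN split into equal-size slices, a flash tier that holds 5% of them, and a simple "promote the hottest slices from the last period" policy. It is a generic frequency-based model written for illustration, not EMC's actual FAST algorithm.

```python
"""Toy model of frequency-based sub-LUN tiering (NOT EMC's FAST algorithm).

Assumptions: a LUN is split into equal-size slices, the flash tier holds a
fixed number of slices, and the policy promotes the slices that were
accessed most often during the previous period.
"""
import random
from collections import Counter


def flash_hit_ratio(accesses_prev, accesses_next, flash_slices):
    """Promote the hottest slices from one period, then return the fraction
    of the next period's IO that lands on flash."""
    hot = {s for s, _ in Counter(accesses_prev).most_common(flash_slices)}
    return sum(1 for s in accesses_next if s in hot) / len(accesses_next)


def skewed(n_slices, n_io, rng):
    """High locality: ~90% of IO hits the same 5% of slices every period."""
    hot = max(1, n_slices // 20)
    return [rng.randrange(hot) if rng.random() < 0.9 else rng.randrange(n_slices)
            for _ in range(n_io)]


def uniform(n_slices, n_io, rng):
    """100% random IO spread evenly across the whole LUN."""
    return [rng.randrange(n_slices) for _ in range(n_io)]


if __name__ == "__main__":
    rng = random.Random(42)
    slices, ios, flash = 1000, 50_000, 50   # flash tier holds 5% of capacity
    for name, gen in (("skewed", skewed), ("uniform", uniform)):
        ratio = flash_hit_ratio(gen(slices, ios, rng), gen(slices, ios, rng), flash)
        print(f"{name}: {ratio:.0%} of IO served from flash")
```

With the skewed workload, roughly 90% of IO lands on the small flash tier; with the uniformly random workload, the flash tier serves only about its share of capacity (around 5%), which is why promoting "hot" data does little for the 100% random-read application.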
Greatly increase the size of the storage array's read AND write cache with EFD (SSD) disks. This would improve performance across the entire array for all "cache friendly" applications.
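What "cache friendly" means can be sketched with a generic LRU read cache. This is an illustrative model only (block counts and the 90/1 skew are assumptions), not how FAST Cache or the CLARiiON SP cache decide what to keep. For uniformly random reads, the hit rate settles near cache size divided by working-set size, so even a very large cache helps a fully random 20TB dataset only in proportion to how much of it fits.

```python
"""Sketch: LRU read-cache hit rate under skewed vs. uniformly random reads.

A generic LRU model for illustration only; it is not the actual caching
policy of FAST Cache or the CLARiiON storage processor cache.
"""
import random
from collections import OrderedDict


def lru_hit_rate(reads, capacity):
    """Replay a sequence of block reads through an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in reads:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / len(reads)


if __name__ == "__main__":
    rng = random.Random(1)
    blocks, n_reads = 100_000, 200_000
    uniform = [rng.randrange(blocks) for _ in range(n_reads)]
    # Assumed skew: 90% of reads go to 1% of the blocks.
    skewed = [rng.randrange(1_000) if rng.random() < 0.9 else rng.randrange(blocks)
              for _ in range(n_reads)]
    for size in (10_000, 20_000):
        print(f"cache={size}: uniform {lru_hit_rate(uniform, size):.0%}, "
              f"skewed {lru_hit_rate(skewed, size):.0%}")
```

Doubling the cache roughly doubles the hit rate for the random workload (from about 10% to about 20% here), while the skewed workload is already served almost entirely from cache.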
For this customer, increasing the size of write cache definitely helps performance for SQL (50% increase in TPM, 50% better response time as an example) but what about their custom database that is 100% random read? Increasing the size of read cache will help get more data into cache and reduce the need to go to disk for reads, but the more random the data, the less useful cache is. Okay, so very large caches won’t solve all of their problems. EFDs must be the answer right?
Forget SATA and FC disks; just use EFD for everything and it will be super fast!! EFD has extremely high random read/write performance, low latency at high loads, and very high bandwidth. You will even save money on power and cooling.
The total amount of data this customer is dealing with in these three applications alone exceeds 20TB. To store that much in EFD would be cost prohibitive to say the least. So, while EFD can solve all of this customer’s technical problems, they couldn’t afford to acquire enough EFD for the capacity requirements.
But wait, it’s not OR, it’s AND
The beauty of the EMC Unified solution is that you can use all of these technologies, together, on the same array, simultaneously.
In this customer’s case, we put FC and SATA into a virtual pool with FAST enabled and provision the web and general-purpose SQL servers from it. FAST will eventually migrate the least used blocks to SATA, freeing the FC disks for the more demanding blocks.
Next, we extend the array cache using a couple of EFDs and FAST Cache to help with random reads, sequential pre-fetching, and bursty writes across the whole array.
Finally, for the custom 100% random read database, we dedicate a few EFDs to just that application, snapshot the DB and present copies to each server. We disable read and write cache for the EFD backed volumes which leaves more cache available to the rest of the applications on the array, further improving total system performance.
Now, if and when the customer starts to see disk contention in the virtual pool that might affect performance of the general-purpose SQL databases, QoS can be tuned to ensure low response times on just the SQL volumes ensuring consistent performance. If the disks become saturated to the point where QoS cannot maintain the response time or the other LUNs are suffering from load generated by SQL, any of the volumes can be migrated (non-disruptively) to a different virtual pool in the array to reduce disk contention.
If you look at offerings from the various storage vendors, many promote large virtual pools, some also promote large caches of some kind, others promote block level tiering, and a few promote EFD (aka SSDs) to solve performance problems. But, when you are consolidating multiple workloads into a single platform, you will discover that there are weaknesses in every one of those features and you are going to wish you had the option to use most or all of those features together.
You have that option on EMC Unified.
This past week, during EMC World 2010 in Boston, EMC made several announcements of updates to the Celerra and CLARiiON midrange platforms. Some of the most impressive were new capabilities coming to CLARiiON FLARE in just a couple of short months. Major updates to Celerra DART will coincide with the FLARE updates, and if you are already running CLARiiON CX4 hardware, or are evaluating CX4 (or Celerra), you will want to check these new features out. They will be available to existing CX4 (120, 240, 480, 960) and NS (120, 480, 960) systems as part of a software update.
Here’s a list of key changes in FLARE 30:
- Unified management for midrange storage platforms including CLARiiON and Celerra today, plus RecoverPoint, Replication Manager and more in the future. This is a true single pane of glass for monitoring AND managing SAN, NAS, and data protection, and it's built into the platform. "EMC Unisphere" replaces Navisphere Manager and Celerra Manager and supports multiple storage systems simultaneously in a single window.
- Extremely large cache (i.e., FAST Cache) – Up to 2TB of additional read/write cache in CLARiiON using SSDs
- Block-level Fully Automated Storage Tiering (i.e., sub-LUN FAST) – Fully automated placement of data across multiple disk types
- Block Level Compression – Compress LUNs in the CLARiiON to reduce disk space requirements
- VAAI Support – Integrate with vSphere ESX for improved performance
These features are in addition to existing features like:
- Seamless and non-disruptive mobility of LUNs within a storage array – (via Virtual LUNs)
- Non-Disruptive Data Migration – (via PowerPath Migration Enabler)
- VMware-Aware Storage Management – (Navisphere, Unisphere, and vSphere plugins giving complete visibility and self-service provisioning to VMware admins AND storage admins)
- CIFS and NFS Compression – Compress production data on Celerra, including VMs, to reduce disk space requirements
- Dynamic SAN path load balancing – (via PowerPath)
- At-Rest-Encryption – (via PowerPath w/RSA)
- SSD, FC, and SATA drives in the same system – Balance performance and capacity as needed for your application
- Local and Remote replication with array level consistency – (SnapView, MirrorView, etc)
- Hot-swap, Hot-Add, Hot-Upgrade IO Modules – Upgrade connectivity for FC, FCoE, and iSCSI with no downtime
- Scale to 1.8PB of storage in a single system
- Simultaneously provide FC, iSCSI, MPFS, NFS, and CIFS access
All together, this is an impressive list of features for a single platform. In fact, while many of EMC's competitors have similar features, none of them offer all of them in the same platform, or leverage them all simultaneously to gain efficiency. When CLARiiON CX4 and Celerra NS are integrated and managed as a single Unified storage system with EMC Unisphere, there is tremendous value, as I'll point out below…
Improve Performance easily…
- Install a couple of SSD drives into a CLARiiON and enable FAST Cache to increase the array's read/write cache from the industry-competitive 4GB-32GB up to 2TB of array-based, non-volatile Read AND Write cache available to ALL applications, including NAS data hosted by the array.
- Install PowerPath on Windows, Linux, Solaris, AND VMware ESX hosts to automatically balance IO across all available paths to storage. PowerPath detects latency and queuing occurring on each path and adjusts automatically, improving performance at the storage array AND for your hosts. This is a huge benefit in VMware environments especially.
- When VMware releases the updated version of vSphere ESX that supports VAAI, ESX will be able to leverage VAAI support in the CLARiiON to reduce the amount of IO required for many tasks, improving performance across the environment again.
- Upgrade from 1GbE iSCSI to 10GbE iSCSI, or from 4Gb Fibre Channel to 8Gb Fibre Channel, without a screwdriver or downtime.
- Provide NAS shared file access with block-level performance for any application using EMC’s MPFS protocol.
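The PowerPath point above, sending IO down whichever path is least congested, can be illustrated with a generic adaptive path selector. PowerPath's actual load-balancing algorithms are proprietary; this sketch assumes a simple "expected wait = (queued IOs + 1) × recent average latency" score, and the path names and latencies are made up.

```python
"""Hypothetical adaptive multipath selector: send each IO down the path with
the lowest expected wait (queue depth x recent average latency).

A generic least-queue, latency-weighted policy for illustration only;
PowerPath's real algorithms are proprietary and not shown here.
"""
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    outstanding: int = 0           # IOs currently queued on this path
    avg_latency_ms: float = 1.0    # exponentially weighted moving average

    def expected_wait(self) -> float:
        return (self.outstanding + 1) * self.avg_latency_ms

    def record_completion(self, latency_ms: float, alpha: float = 0.2) -> None:
        """Dequeue one IO and fold its observed latency into the average."""
        self.outstanding -= 1
        self.avg_latency_ms = (1 - alpha) * self.avg_latency_ms + alpha * latency_ms


def pick_path(paths):
    """Choose the path with the lowest expected wait and queue the IO on it."""
    best = min(paths, key=Path.expected_wait)
    best.outstanding += 1
    return best


if __name__ == "__main__":
    # Made-up example: one healthy path and one congested path.
    paths = [Path("hba0_spa", avg_latency_ms=2.0), Path("hba1_spb", avg_latency_ms=8.0)]
    sent = [pick_path(paths).name for _ in range(10)]
    print({p.name: sent.count(p.name) for p in paths})
```

Unlike plain round-robin, which would split the ten IOs 5/5, the congested path only receives work once the healthy path's queue makes it the slower choice.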
Improve Efficiency and cost easily…
- Create a single pool of storage containing some SSD, some FC, and some SATA drives, that automatically monitors and moves portions of data to the appropriate disk type to both improve performance AND decrease cost simultaneously.
- Non-disruptively compress volumes and/or files with a single click to save 50% of your disk space in many cases.
- Convert traditional LUNs to more efficient Thin-LUNs non-disruptively using PowerPath Migration Enabler, saving more disk space.
Increase and Manage Capacity easily…
- Add additional storage non-disruptively with SSD, FC, and SATA drives in any mix up to 1.8PB of raw storage in a single CLARiiON CX4.
- Using FAST Cache and iSCSI, FC, and FCoE connectivity simultaneously does not reduce the total capacity of the system.
- Expanding LUNs, RAID Groups, and Storage Pools is non-disruptive.
- Migrating LUNs between RAID groups and/or Storage Pools is non-disruptive using built-in CLARiiON LUN Migration, as is migrating data to a different storage array (using PowerPath Migration Enabler)!
- Balancing workload between storage processors is non-disruptive and at individual LUN granularity.
Protect your data easily…
- Snapshot, Clone, and Replicate any of the data to anywhere with built in array tools that can maintain complete data consistency across a single, or multiple applications without installing software.
- Maintain application consistency for Exchange, SQL, Oracle, SAP, and much more, even within VMWare VMs, while replicating to anywhere with a single pane-of-glass.
- Encrypt sensitive data seamlessly using PowerPath Encryption w/RSA.
- While you can do all of these things quickly and simply, you still have the flexibility to create traditional RAID sets using RAID 0, 1, 5, 6, and 10 where you need highly predictable performance, or to tune read and write cache at the array and LUN level for specific workloads. Do you want read/write snapshots? How about full-copy clones on completely separate disks for workload isolation and failure protection? What about the ability to roll back data to different points in time using snapshots without deleting any other snapshots? EMC Storage arrays have been able to do this for a long time, and that hasn't changed.
There are few manufacturers aside from EMC that can provide all of these capabilities, let alone provide them within a single platform. That’s the definition of simple, efficient, Unified Storage in my opinion.
In the last post, I talked about a project I am involved in right now to deploy NetApp storage alongside EMC for SAN and NAS. Today, I’m going to talk about my first impressions of the NetApp during deployment and initial configuration.
I'm going to be pretty blunt: I have been working with EMC hardware and software for a while now, and I'm generally happy with the usability of their GUIs. Over that time, I've used several major revisions of Navisphere Manager and Celerra Manager, and even more minor revisions, and I've never actually found a UI bug. To be clear, EMC, IBM, NetApp, HDS, and every other vendor have bugs in their software, and they all do what they can to find and fix them quickly, but I just haven't personally seen one in the EMC UIs despite using every feature offered by those systems. (I have come across bugs in the firmware.)
Contrast that with the first day using the new NetApp, running the latest ONTAP code available at the time, when we discovered a UI problem within the first 10 minutes. When attempting to add disks to an aggregate in FilerView, we could not select FC disks to add. We could, however, add SATA disks to the FC aggregate. The only way around the issue was to use the CLI via SSH. As I mentioned in my previous post, our NetApp is actually an IBM nSeries, and IBM claims it performs additional QC before its customers get new NetApp code.
Shortly after that, we found a second UI issue in FilerView. When creating a new Initiator group, FilerView populates the initiator list with the WWNs that have logged in to it. Auto-populating is nice but the problem is that FilerView was incorrectly parsing the WWN of the server HBAs and populating the list with NodeWWNs rather than PortWWNs. We spent several hours trying to figure out why the ESX servers didn’t see any LUNs before we realized that the WWNs in the Initiator group were incorrect. Editing the 2nd digit on each one fixed the problem.
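A quick sanity check would have caught the node-WWN problem before any time was spent troubleshooting ESX. This sketch assumes you have already collected, by hand or with your own tooling, the igroup members from the filer and the port and node WWNs from each host's HBAs. The helper names and the example WWNs are hypothetical, and nothing here calls a NetApp or HBA vendor API.

```python
"""Sketch: verify that an initiator group contains the hosts' HBA *port* WWNs.

Inputs are plain lists gathered separately (e.g. from the HBA tools on each
ESX host and from the filer's igroup listing). Helper names are hypothetical
and no NetApp API is called.
"""
import re

_HEX16 = re.compile(r"^[0-9a-f]{16}$")


def normalize_wwn(wwn: str) -> str:
    """Lower-case and strip separators, e.g. '21:00:00:E0:8B:01:02:03' -> '210000e08b010203'."""
    cleaned = re.sub(r"[:\-\s]", "", wwn).lower()
    if not _HEX16.match(cleaned):
        raise ValueError(f"not a 64-bit WWN: {wwn!r}")
    return cleaned


def audit_igroup(igroup_members, host_port_wwns, host_node_wwns=()):
    """Report port WWNs missing from the igroup and node WWNs wrongly added to it."""
    members = {normalize_wwn(w) for w in igroup_members}
    ports = {normalize_wwn(w) for w in host_port_wwns}
    nodes = {normalize_wwn(w) for w in host_node_wwns}
    return {
        "missing_ports": sorted(ports - members),
        "node_wwns_in_igroup": sorted(members & nodes),
        "unknown": sorted(members - ports - nodes),
    }


if __name__ == "__main__":
    # Made-up WWNs for a single HBA port.
    report = audit_igroup(
        igroup_members=["20:00:00:e0:8b:01:02:03"],   # what got auto-populated
        host_port_wwns=["21:00:00:e0:8b:01:02:03"],
        host_node_wwns=["20:00:00:e0:8b:01:02:03"],
    )
    print(report)
```

Any non-empty `node_wwns_in_igroup` or `missing_ports` entry means the hosts will not see their LUNs, which is exactly the symptom described above.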
I find it interesting that these issues, which seemed easy to discover, made it through the QC processes of two organizations. ONTAP 7.3.2RC1 is available now, but I don't know whether these issues were addressed.
As far as FilerView goes, it is generally easy to use once you know how NetApp systems are provisioned. The biggest drawback in an HA-Filer setup is that you have to open FilerView separately for each Filer and configure each one as a separate storage system. Two HA-Filer pairs? Four FilerView windows. If you include the initial launch page that comes up before you get to the actual FilerView window, you double the number of browser windows open to manage your systems. NetApp likes to mention that they have unified management for NAS and SAN, whereas EMC has two separate platforms, each with its own management tools. EMC treats the two storage processors (SPs) in a Clariion in a much more unified manner, and provisioning is done against the entire Clariion, not per SP. Further, Navisphere can manage many Clariions in the same UI, and Celerra Manager acts similarly for EMC NAS. Six of one, half a dozen of the other, some say, except that I generally provision NAS storage and SAN storage at different times, and I'd rather have all of the controllers/filers in the same window than NAS and SAN in the same window. Just my preference.
I should mention that NetApp recently released System Manager 1.0 as a free download. This new admin tool does present all of the controllers in one view and may end up being a much better tool than FilerView. For now, it's missing too many features to be used 100% of the time, and it's Windows-only since it's based on MMC. Which brings me to my other problem with managing the NetApp: neither FilerView nor System Manager can actually do everything you might need to do, and that means you end up in the CLI, FREQUENTLY. I'm comfortable with CLIs, and they are extremely powerful for troubleshooting problems and especially for scripting batch changes, but I don't like to be forced into the CLI for general administration. GUI-based management helps prevent potentially crippling typos and can make visualizing your environment easier. During deployment, we kept going back and forth between FilerView and the CLI to configure different things. Further, since we were using MultiStore (vFilers) for CIFS shares and disaster recovery, we were stuck in the CLI almost entirely, because System Manager can't even see vFilers and FilerView can only create them and attach volumes.
Had I not been managing Celerra and Clariion for so long, I probably wouldn't have noticed the above problems. After several years of configuring CIFS, NFS, iSCSI, Virtual DataMovers, IP Interfaces, Snapshots, Replication, DR Failover, etc. on Celerra, as well as literally thousands of LUNs for hundreds of servers on Clariion, I don't recall EVER being forced to use the CLI. CelerraCLI and NaviCLI are very powerful, I have written many scripts leveraging them, and I'll use the CLI when troubleshooting an issue. But every single feature I've ever used on the Celerra or Clariion, I was able to configure completely, from start to finish, using the GUI. Installing a Celerra from scratch even uses a GUI-based installation wizard. Comparing Clariion Storage Groups with NetApp Initiator groups and LUN maps isn't even fair. For MS Exchange, I mapped about 50 LUNs to the ESX cluster, which took about 30 minutes in FilerView. On the Clariion, the same operation is done by editing the Storage Group and checking each LUN, taking only a couple of minutes for the entire process.
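When FilerView is the bottleneck, one workaround is to generate the mapping commands and paste them into an SSH session. This is a minimal sketch assuming the Data ONTAP 7-mode syntax `lun map <lun_path> <igroup> [<lun_id>]`; the volume names, LUN names, igroup name, and starting LUN ID are hypothetical, so review the output against your own filer's documentation before running anything.

```python
"""Sketch: build a batch of Data ONTAP 7-mode `lun map` commands instead of
mapping ~50 LUNs one at a time in FilerView.

Assumptions: 7-mode syntax `lun map <lun_path> <igroup> [<lun_id>]`; all
volume, LUN, and igroup names below are hypothetical. Review the output
before pasting it into an SSH session on the filer.
"""


def lun_map_commands(lun_paths, igroup, first_lun_id=0):
    """Return one `lun map` command per LUN, assigning sequential LUN IDs."""
    if not igroup or any(c.isspace() for c in igroup):
        raise ValueError("igroup name must be a single non-empty token")
    return [f"lun map {path} {igroup} {first_lun_id + i}"
            for i, path in enumerate(lun_paths)]


if __name__ == "__main__":
    # Hypothetical layout: 25 Exchange storage groups, each with a DB and a log LUN.
    paths = []
    for sg in range(1, 26):
        paths.append(f"/vol/exch_db{sg:02d}/db.lun")
        paths.append(f"/vol/exch_log{sg:02d}/log.lun")
    for cmd in lun_map_commands(paths, "esx_cluster01", first_lun_id=10):
        print(cmd)
```

Generating the list (rather than typing each command) also guards against the kind of typo that makes GUI-based management attractive in the first place.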
Now, all of the above commentary has to do with the management tools, UIs, and to some degree personal preferences, and does not have any bearing on the equipment or underlying functionality. There are, of course, optional management tools like Operations Manager, Provisioning Manager, and Protection Manager available from NetApp, just as there is Control Center from EMC (which incidentally can monitor the NetApp) or Command Central from Symantec. Depending on your overall needs, you may want to look at optional management tools; or, FilerView may be perfectly fine.
In the next post, I’ll get into more specifics about how the Exchange 2007 CCR cluster turned out in this new environment, along with some notes on making CCR truly redundant. I’ve also been working on the NAS side of the project, so I’ll also post about that some time soon.
I've recently been tasked with a project to increase the availability of applications through the use of multiple, disparate storage systems. This environment has invested heavily in EMC Clariion and Celerra storage systems over the past few years and needed a non-EMC platform from which to build the second half of a redundant storage environment. For various reasons I won't go into here, we chose IBM nSeries as that second platform. (Since the IBM system is a rebranded NetApp FAS, I will refer to it as a NetApp filer.) I've been working on implementing the new equipment as well as integrating it into the Business Continuity strategy.
The overall strategy is to continue to use the EMC Clariion/Celerra systems for production and disaster recovery replication and split applications between and across the two storage platforms for local redundancy. The NetApp will also perform disaster recovery replication for some of the applications. Here’s a really simple diagram that might help if the description is confusing:
Now this may sound easy, but it is, in fact, NOT straightforward. This strategy requires close coordination with application owners and careful planning. As we move forward on this project, I'll talk about various idiosyncrasies, caveats, and problems we've faced and how we got around them, and I'll also talk a lot about the differences between the Clariion/Celerra and NetApp platforms' features and functionality, application support, and manageability. These comparisons will include using both systems with Fibre Channel connections as well as CIFS/NFS NAS, all in conjunction with DR replication and failover.
To start off, I figure we should compare some of the terminology between EMC and NetApp systems. Some terms don't directly translate, but I matched them up as closely as I could and noted where there is no equivalent. Below are two tables: one for Block Storage, and the other for NAS Storage.
In the next update, I'll start talking about the deployment itself. The point of these articles is to discuss the differences, advantages, and disadvantages of each platform so that you can understand how each one might work in your environment. I do not intend to disparage either platform or vendor. I will try to be as vendor-agnostic as possible, and I do feel that I'm in a somewhat unique position to compare new and recent hardware and firmware from both vendors, in the same production capacities, simultaneously, in the same environment. I am NOT comparing old ONTAP code to new FLARE/DART code or vice versa, nor am I comparing old Clariion CX hardware to new NetApp/IBM hardware, etc.