Wednesday, August 17, 2011

vSphere Replication 1.0


With another problem comes another opportunity...

I have been working on upgrading our vSphere host hardware and migrating VMs from our old EMC Celerra NS350 to a newer HP EVA4400 (that in and of itself is worthy of its own blog post).  We had purchased the EVA two years ago to host our ERP data and it has run with a single host accessing it ever since.

So we ordered additional disks and shelves for the EVAs at both primary and DR datacenters.  I installed the HBAs in the hosts, added them to the fabric, created zones, created the LUNs, masked them off, etc., etc.  Everything was going great until...

I went to set up replication.  I set up array-based replication (ABR) for the first LUN - no problem.  Storage vMotioned a VM over to it and it replicated without issue.  Tried to set up replication for the second LUN - major obstacle time.  HP's replication mechanism for the EVA, Continuous Access (CA), is licensed by capacity.  And of course, we had licensed 1TB but needed more like 12TB.  Great.  Meanwhile, there are grumblings and doubts from others on the IT team about whether CA would even be the right choice for replicating this data.

Come on HP, really? Does any vendor license replication by capacity anymore?  You don't do this with LeftHand/P4000 or 3PAR arrays.  Frustrating...

Now I'll be the first to tell you that I hate, hate, hate vendor lock-in.  Technology changes so fast that whatever you're using today probably isn't what you'll be using 3, 5, or 10 years from now.  Again, a good topic that deserves its own post.  This is one reason that, as a vSphere and storage engineer, I've become a fan of host-based replication (HBR).  There are third-party products that provide this capability for virtual machines today: Veeam Backup and Replication and Quest vReplicator, just to name a couple.

But here come vSphere 5 and SRM 5.  We'll be entitled to both when they're released.  As part of the upgrade we'll get the capability to replicate VMs using vSphere Replication 1.0 for free.  I've started setting up a testing environment and will post my experiences with this new feature.  One thing I'm really curious about is how the bits actually get replicated.  Different arrays handle this differently.  I will have my investigative hat on at VMworld and will ask the storage vendors for all the gory details.  I'll follow up with another article detailing how different vendors implement their replication (geesh, I've got a lot of writing to do!).

In the meantime, I've gathered some information on vSphere Replication 1.0, all of which is publicly available.  Exciting stuff!  Here are the details:
  • This feature is included with all editions of SRM 5
  • VMs can be replicated from any storage to any storage, including local disk
    • Replicated disks can be placed on any ESXi-compatible disk/filesystem
    • Breaks storage vendor lock-in
  • Replication is an attribute of the VM (not the LUN or some other element)
  • You can choose which VMDKs to replicate within the VM
    • In some cases you may not want to replicate the system drive/VMDK, only the data drive/VMDK
  • Disks are replicated in a "group consistent" manner
  • Does not use CBT to track and replicate deltas.  Instead, VMware developed another technology that tracks I/O changes to VMDKs and captures them in a "PSF" or persistent state file.  It does not use VM snapshots
    • I'm not sure why they didn't leverage existing CBT technology - more details to follow
  • Initial "seed" copy can be made in advance by FTP, external disk/sneaker net, etc.
    • Saves bandwidth - great if you have a slower WAN connection and/or a large number of VMs to replicate
  • RPO can be set on a per-VM basis
    • 5 minutes to ?
    • If you need an RPO smaller than 5 minutes, you've got other challenges to face!
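
To make the change-tracking bullet above a little more concrete, here is a toy sketch of the general idea: remember which blocks of a disk were written, then ship only those blocks at the next sync.  This is NOT VMware's implementation (I don't have those details yet).  The class name, the 4 KB block size, and the in-memory set are all assumptions made up for the example; a real persistent state file would, as the name implies, live on disk.

```python
class DirtyBlockTracker:
    """Toy model: remember which fixed-size blocks of a disk were written.

    Hypothetical illustration only; not how vSphere Replication works internally.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed block size for the example
        self.dirty = set()            # indices of blocks changed since last sync

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def drain(self):
        """Return the dirty block indices in order and reset the tracker."""
        blocks = sorted(self.dirty)
        self.dirty.clear()
        return blocks


def sync(source, replica, tracker):
    """Copy only the dirty blocks from source to replica (equal-size bytearrays).

    Returns the number of bytes copied.
    """
    bs = tracker.block_size
    copied = 0
    for block in tracker.drain():
        start = block * bs
        end = min(start + bs, len(source))
        replica[start:end] = source[start:end]
        copied += end - start
    return copied
```

With a 16 KB "disk" that has already been seeded, a 200-byte write at offset 4000 straddles blocks 0 and 1, so the next sync copies 8192 bytes instead of the whole disk, and a second sync with no new writes copies nothing.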

Some limitations:
  • VM must be powered on
    • My guess is that the thinking here is that if it's powered off it must not be critical enough to recover in a DR scenario.  I hope VMware reconsiders this one.  I don't have any of these today, but I can see the possibility of it in the future.
  • Will not replicate swap, logs, dumps
  • Will replicate VMs with snapshots.  However, snapshots will not be replicated.  Instead, the I/O from the source snapshot is written to the destination VM, effectively making the destination VM look like the source VM after collapsing the snapshot.
  • No FT VMs, linked clones, templates, physical RDMs, ISOs or floppies
  • Requires VM hardware version 7 or later
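
The limitations above lend themselves to a quick pre-flight check before you plan which VMs to replicate.  Here is a minimal sketch, assuming you've already collected the relevant facts about each VM into a simple record.  `VmInfo` and `replication_blockers` are hypothetical names for illustration, not part of any VMware API, and the rules only mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class VmInfo:
    """Hypothetical record of the VM facts that matter for the checks below."""
    name: str
    powered_on: bool
    hardware_version: int
    fault_tolerant: bool = False
    linked_clone: bool = False
    template: bool = False
    physical_rdm: bool = False


def replication_blockers(vm):
    """Return the reasons (from the limitations list) a VM can't be replicated."""
    reasons = []
    if not vm.powered_on:
        reasons.append("VM must be powered on")
    if vm.hardware_version < 7:
        reasons.append("VM hardware version 7 or later is required")
    if vm.fault_tolerant:
        reasons.append("FT VMs are not supported")
    if vm.linked_clone:
        reasons.append("linked clones are not supported")
    if vm.template:
        reasons.append("templates are not supported")
    if vm.physical_rdm:
        reasons.append("physical RDMs are not supported")
    return reasons


if __name__ == "__main__":
    for vm in (VmInfo("erp01", True, 8), VmInfo("legacy01", False, 4, template=True)):
        blockers = replication_blockers(vm)
        print(vm.name, "OK" if not blockers else "; ".join(blockers))
```

An empty list means nothing in the list above rules the VM out; it says nothing about other requirements that may turn up once the product ships.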

That wasn't too painful.  Here's what the (high-level) architecture looks like:
  • vRMS - vSphere Replication Management Server
    • Required at both sites
    • This is a virtual appliance (VA) imported into vCenter
  • vRA - vSphere Replication Agent
    • Required at the protected site
    • Runs on the ESXi 5 hosts
  • vRS - vSphere Replication Server
    • Runs on the recovery site
    • This too is a VA imported into the vCenter at the recovery site
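
If it helps to see the component list as a deployment checklist, here is a small sketch that maps each piece to the site(s) it belongs at, using only what is in the list above.  The dictionary layout and the `components_for` helper are my own invention for illustration.

```python
# Where each vSphere Replication 1.0 component lives, per the list above.
# The structure and key names are hypothetical, chosen just for this example.
COMPONENTS = {
    "vRMS": {"sites": {"protected", "recovery"}, "form": "virtual appliance"},
    "vRA": {"sites": {"protected"}, "form": "runs on the ESXi 5 hosts"},
    "vRS": {"sites": {"recovery"}, "form": "virtual appliance"},
}


def components_for(site):
    """Return the component names needed at a given site, sorted by name."""
    return sorted(name for name, info in COMPONENTS.items() if site in info["sites"])


if __name__ == "__main__":
    for site in ("protected", "recovery"):
        print(site + ":")
        for name in components_for(site):
            print("  " + name + " (" + COMPONENTS[name]["form"] + ")")
```

So the protected site needs vRA and vRMS, while the recovery site needs vRMS and vRS.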

Scalability info:
  • VM totals = 500 replicated (1000 total for SRM)
    • If you need to protect more than 500 VMs, not only do you have a large environment, you'll need to use ABR or find an alternative HBR solution that can scale higher (if one exists).  With that size of an environment I'd recommend working with your VMware account representative and/or storage vendor.

For a storage geek like me this is pretty exciting stuff.  I think a lot of VMware customers, from the small SMB to the mid-sized and even some larger companies, are going to benefit from this new feature.

Time to kick the tires, stay tuned!