Tuesday, January 17, 2012

It's Time: vSphere 5 Upgrade

In my opinion, the technical benefits of upgrading from VMware vSphere 4.x to vSphere 5.x are not as great as those of upgrading from VMware Virtual Infrastructure 3.x to vSphere 4.x. I don't have the numbers, but I suspect adoption has been slower as a result.

The latest XtraVirt poll posed this question with the following results:
What is your company's timescale for vSphere 5 deployment?
35%   6-12 Months
27%   1-3 Months
15%   3-6 Months
15%   1-2 Years
8%     No Current Plan

More than three quarters of the responding admins (77%) plan to upgrade within a year. That's not bad. I am in the "1-3 Months" category and could argue that I've already started (planning and testing are complete). In my case the driver is not technology or features but cost avoidance.
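For anyone who wants to check the "within a year" figure, here is a small sketch that simply adds up the poll buckets listed above. The only assumption is which buckets count as "within a year" (1-3, 3-6 and 6-12 months).

```python
# XtraVirt poll results, as percentages of respondents.
poll = {
    "1-3 Months": 27,
    "3-6 Months": 15,
    "6-12 Months": 35,
    "1-2 Years": 15,
    "No Current Plan": 8,
}

# Buckets that end within 12 months of the poll (our assumption).
WITHIN_A_YEAR = ("1-3 Months", "3-6 Months", "6-12 Months")

within_a_year = sum(poll[bucket] for bucket in WITHIN_A_YEAR)
total = sum(poll.values())

print(f"Planning to upgrade within a year: {within_a_year}%")  # 77%
print(f"All buckets combined: {total}%")  # 100%
```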

HP EVA SAN replication is licensed by capacity. I don't want to rant about how utterly wrong this is, but I'm not about to pay HP thousands of dollars more for the privilege of replicating my data, thank you. And now that SRM 5.0 includes replication at no extra charge, I don't have to (see previous post).

I've been planning our vSphere upgrade over the last month. I've successfully installed and upgraded to vSphere 5 in the lab, and now it's time to upgrade production. I think VMware has finally got the upgrade process right. It took them a while, although the vSphere 4.x upgrade was fairly smooth. If you upgraded Virtual Infrastructure 2 to 3, you know what I mean!

I've pasted my upgrade plan below (with company-specific details removed). I also included some notes from the lab tests. On to the show!

Phase 1: Planning

VMware vSphere Upgrade and Install Community Forum

(Note: I watched the forums and the blogosphere in general to gauge the stability of the release. This is always good practice, but especially with a ".0" release: let others test it out.)

As of 9/15/2011, no real "deal-killer" or systemic errors had been reported during or after the upgrade. Here's the list of reported issues:
1. Port group names must be fewer than 19 characters (no impact to us)
2. A vCenter license must be provided during installation or vCenter will start in expired mode (no impact to us)
3. One admin got a "downloading metadata failed" error when trying to upgrade ESXi via the CLI (no impact to us: we're not using the CLI to upgrade ESXi)
4. One admin had a corrupted esx.conf file that prevented him from upgrading ESXi (no impact to us)
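To check your own port group names against the first issue, a minimal sketch follows. It assumes you have already exported the names into a plain list (the names below are made up) and applies the under-19-character limit as reported in the forums; it does not connect to vCenter or the hosts.

```python
# Limit as reported in the community forum (names must be fewer than
# 19 characters). Taken from the issue list, not independently verified.
MAX_PORTGROUP_NAME_LEN = 19

def long_portgroup_names(names):
    """Return the names that are 19 characters or longer."""
    return [name for name in names if len(name) >= MAX_PORTGROUP_NAME_LEN]

# Hypothetical port group names; replace with an export from your hosts.
portgroups = ["VM Network", "vMotion", "Production-VLAN-100-Web"]

for name in long_portgroup_names(portgroups):
    print(f"Too long ({len(name)} characters): {name}")
```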

Check Pre-requisites:

1. Check VMware Infrastructure Licenses
   a. vCenter 5.0 (available)
   b. ESXi 5.0 (available)
   c. Site Recovery Manager 5.0 (available November 19, 2011)
2. Check Hardware Compatibility
   a. Server Platform
   b. SAN Platform
   c. HBA cards
   d. NIC cards
3. Check Software Compatibility
   a. Review Guest OS Support
   b. Review 3rd-party Product Support
      i. Backup Software
      ii. Management Software
      iii. Custom Scripts
   c. Review Interoperability Matrix
      i. http://www.vmware.com/resources/compatibility/sim/interop_matrix.php
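If you like to track the checklist somewhere other than a document, here is one way to sketch it: each (category, item) pair maps to done or not done, and the milestone is only reached when nothing is left open. The True/False values are placeholders for illustration, not our actual status.

```python
# Phase 1 prerequisite checklist: (category, item) -> done?
# The True/False values are placeholders, not real status.
checklist = {
    ("Licenses", "vCenter 5.0"): True,
    ("Licenses", "ESXi 5.0"): True,
    ("Licenses", "Site Recovery Manager 5.0"): False,
    ("Hardware", "Server Platform"): True,
    ("Hardware", "SAN Platform"): True,
    ("Hardware", "HBA cards"): True,
    ("Hardware", "NIC cards"): True,
    ("Software", "Guest OS Support"): True,
    ("Software", "Backup Software"): True,
    ("Software", "Management Software"): True,
    ("Software", "Custom Scripts"): True,
    ("Software", "Interoperability Matrix"): True,
}

def open_items(checklist):
    """Return the items that still need attention, in checklist order."""
    return [f"{category}: {item}"
            for (category, item), done in checklist.items() if not done]

remaining = open_items(checklist)
if remaining:
    print("Planning not complete; open items:")
    for entry in remaining:
        print("  -", entry)
else:
    print("PROJECT MILESTONE: Planning Complete")
```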

PROJECT MILESTONE: Planning Complete

Phase 2: Testing

Set Up Test Environment

Once the above steps are complete, it’s time to start testing.
(Note: The general idea here is to configure an environment that matches the current production environment as closely as possible. I've done this in the past using cloned VMs with "mostly" successful results; that is the way to go if you can keep the test and production networks isolated. For this upgrade, I decided to build new VMs instead. In my experience this still gives valid test results and is faster to get up and running than cloning. If you have the time, you may want to go the cloned-VM route.)

Step 1: Set Up ESXi Hosts

1. Set up 3 servers as ESXi hosts in Datacenter1
2. Set up 3 servers as ESXi hosts in Datacenter2
3. Install vSphere 4.1 (or whatever your current version is) on each host per the COMPANY Installation Guide. (You do have this documented, right?)

Step 2: Set Up SAN LUNs and Replication

1. Create one LUN in Datacenter1 for the vCenter and SQL VMs
2. Create one LUN in Datacenter2 for the vCenter and SQL VMs
3. Create one read-only LUN in Datacenter2 to host the replicated LUN
4. Configure the Datacenter1 LUN for replication to Datacenter2.

Step 3: Set Up Test Virtual Machines

1. Install one base Windows Server 2008 R2 image on one host at each site, to be used as the vCenter/SRM server
2. Clone the images for use as the SQL Server VMs that host the vCenter databases
3. Configure the images as stand-alone VMs.

Step 4: Configure SQL Server

1. Install the same SQL Server version used in production on both SQL VMs
   a. Use the production configuration as reference.

Step 5: Configure vCenter

1. Install the SQL Native Client on both vCenter VMs
2. Install vCenter on both vCenter VMs
   a. Use the production configuration as reference
   b. Configure vCenter with production licenses during installation or it will start in expired mode
3. Add hosts to vCenter as appropriate
4. Use vUM to apply the same updates to the ESXi hosts that are present in production.
   a. TEST: This tests using vUM to upgrade hosts
   b. TEST: This also tests vMotion capability
   c. TEST: This also tests the ESXi host configuration (networks, storage, etc.).

Step 6: Configure SRM

1. Install SRM on each vCenter VM
2. Configure SRM for the replicated LUN
3. TEST: Perform a test SRM recovery to ensure the system is functional.

PROJECT MILESTONE: Fully Functional Test Environment Complete

Upgrade to vCenter 5.0/SRM 5.0

Now with a fully functional test environment in place, it’s time to start upgrading.
The upgrade to 5.0 is supposed to be the least intense VI/vSphere upgrade yet. The steps are fairly straightforward:
1. Upgrade vCenter/Update Manager (vUM)
2. Upgrade the ESXi hosts
3. Upgrade the VMware Tools of the VMs, and the virtual hardware of the VMs to version 8
4. Upgrade the VMFS datastores to version 5
5. Upgrade SRM to version 5
6. Upgrade any 3rd-party tools and scripts.
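Because the order matters (vCenter before hosts, hosts before VM hardware, and so on), here is a small sketch that treats the six steps as a strict sequence and refuses to report a next step if something was done out of order. The step labels are my own shorthand for the list above.

```python
# The upgrade sequence from the list above, in order (shorthand labels).
UPGRADE_ORDER = [
    "vCenter and vUM",
    "ESXi hosts",
    "VMware Tools and virtual hardware (version 8)",
    "VMFS datastores (version 5)",
    "SRM (version 5)",
    "3rd-party tools and scripts",
]

def next_step(completed):
    """Return the next step to run, or None when every step is done.

    Raises ValueError if `completed` is not an in-order prefix of
    UPGRADE_ORDER (for example, hosts upgraded before vCenter).
    """
    completed = list(completed)
    if completed != UPGRADE_ORDER[: len(completed)]:
        raise ValueError("Steps completed out of order: " + ", ".join(completed))
    if len(completed) == len(UPGRADE_ORDER):
        return None
    return UPGRADE_ORDER[len(completed)]

print(next_step(["vCenter and vUM"]))  # ESXi hosts
```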

Step 1: Upgrade vCenter

1. Back up the vCenter database using SQL Server Management Studio
2. Back up the SSL certificates (%allusersprofile%\Application Data\VMware\VMware VirtualCenter)
3. Stop all vCenter services
4. Install JDK 1.6
5. Using the vCenter ISO, upgrade vCenter to version 5
   a. Upgrade the recovery site first
   b. Run the vCenter host agent pre-upgrade checker
6. Configure the new vSphere 5 licenses
7. Upgrade the vSphere Client
8. From the recovery site, rejoin the Linked Mode group:
   From the Start menu, select All Programs > VMware > vCenter Server Linked Mode Configuration
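The certificate backup in step 2 is easy to forget, so here is a minimal sketch that copies the folder into a timestamped backup directory. It assumes Python is available on the vCenter server; the function name, the "vcenter-ssl" prefix and the backup share are hypothetical. On Windows, os.path.expandvars expands %ALLUSERSPROFILE%.

```python
import os
import shutil
import time

# Certificate folder from step 2 above. os.path.expandvars expands
# %ALLUSERSPROFILE% when this runs on Windows.
VCENTER_SSL_DIR = r"%ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter"

def backup_folder(source, backup_root, prefix="vcenter-ssl"):
    """Copy `source` into a new timestamped folder under `backup_root`.

    Returns the path of the copy. Raises FileNotFoundError if `source`
    does not exist, so a typo in the path fails loudly instead of
    silently producing no backup.
    """
    source = os.path.expandvars(source)
    if not os.path.isdir(source):
        raise FileNotFoundError(f"Source folder not found: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_root, f"{prefix}-{stamp}")
    shutil.copytree(source, target)  # raises if target already exists
    return target

# Example (hypothetical backup share):
# backup_folder(VCENTER_SSL_DIR, r"\\backupserver\vcenter")
```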

Step 2: Upgrade Update Manager (vUM)

1. Back up the vUM database using SQL Server Management Studio
2. Stop all vUM services
3. Install JDK 1.6
4. Using the vCenter ISO, upgrade vUM to version 5
   a. Upgrade the recovery site first
5. Upgrade the vUM plug-in for the vSphere Client

Step 3: Upgrade the ESXi hosts

There are two ways to upgrade the hosts: via vUM, or with a clean install from the OEM's custom ESXi 5 image/ISO. For testing purposes, we'll try both methods.
1. Using vUM, upgrade the first host of each cluster
   a. Test vMotioning VMs between 4.1 and 5.0 hosts
   (Lab note: Remediation had to be forced by enabling "remove incompatible packages".)
2. Using the ISO, perform a clean install of the last host in each cluster.
   (Note: I skipped step 2. vUM worked so well that I decided to go with that method. HP provides an image that can be imported directly into vUM (very nice!), and it worked great.)
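The rolling approach above (upgrade one host per cluster, then test vMotion while the cluster runs both 4.1 and 5.0) can be sketched as a couple of helpers over an inventory. The cluster and host names and versions below are hypothetical; in practice you would pull them from vCenter.

```python
# Hypothetical lab inventory: cluster -> list of (host, ESXi version).
clusters = {
    "Datacenter1-Cluster": [("esx01", "4.1"), ("esx02", "4.1"), ("esx03", "4.1")],
    "Datacenter2-Cluster": [("esx04", "5.0"), ("esx05", "4.1"), ("esx06", "4.1")],
}

def next_host_to_upgrade(hosts, target="5.0"):
    """Return the first host not yet on the target version, or None."""
    for name, version in hosts:
        if version != target:
            return name
    return None

def is_mixed(hosts):
    """True while the cluster runs more than one ESXi version,
    i.e. the window for testing vMotion between 4.1 and 5.0 hosts."""
    return len({version for _, version in hosts}) > 1

for cluster, hosts in clusters.items():
    print(f"{cluster}: next host = {next_host_to_upgrade(hosts)}, "
          f"mixed versions = {is_mixed(hosts)}")
```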

DECISION POINT: Use vUM Upgrade or Clean Install from ISO

Step 4: Upgrade VMware Tools and Virtual Hardware

1. Using vUM, upgrade the VMware Tools component of all VMs
2. Using vUM, upgrade the virtual hardware of all VMs.
(Note: The vUM scan skips the vCenter and SQL VMs. You'll need to go back and upgrade these manually.)
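Since vUM skipped the vCenter and SQL VMs in the lab, it helps to split the "needs virtual hardware version 8" list into what vUM will handle and what you must do by hand. A sketch follows; the inventory, the role labels and the function name are all hypothetical.

```python
# Hypothetical VM inventory: (name, role, virtual hardware version).
vms = [
    ("app01", "application", 7),
    ("vcenter01", "vcenter", 7),
    ("sql01", "sql", 7),
    ("web01", "application", 8),
]

# Roles the vUM scan skipped in our lab, per the note above.
SKIPPED_BY_VUM = {"vcenter", "sql"}
TARGET_HW_VERSION = 8

def plan_hardware_upgrades(vms):
    """Split VMs below the target hardware version into (via vUM, manual)."""
    via_vum, manual = [], []
    for name, role, hw_version in vms:
        if hw_version >= TARGET_HW_VERSION:
            continue  # already on version 8
        (manual if role in SKIPPED_BY_VUM else via_vum).append(name)
    return via_vum, manual

via_vum, manual = plan_hardware_upgrades(vms)
print("Upgrade with vUM:", via_vum)
print("Upgrade manually:", manual)
```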

Step 5: Upgrade the VMFS Datastore

The VMFS datastores can be upgraded in place while the VMs are running.

Step 6: Upgrade SRM

1. Snapshot the vCenter (VC) and SQL VMs
2. Upgrade SRM to 5.0 on the recovery site vCenter VM, then on the protected site vCenter VM
   a. Stop SRM
   b. Uninstall the SAN SRA
   c. Uninstall the SRM plug-in
   d. Upgrade SRM
   e. Install the 5.0 SRA (if using array-based replication)
   f. Restart SRM
3. Configure SRM
   a. Remove the Array Manager
   b. Add a new Array Manager
   c. Reconfigure the protection group
   d. Reconfigure the recovery plan.
4. Install and configure vSphere Replication (if not using array-based replication)
   a. At the PROTECTED SITE, deploy the vSphere Replication Management Server (vRMS)
      i. Assign a static IP address
      ii. Register it with the protected site's vCenter instance
   b. At the RECOVERY SITE, deploy the vSphere Replication Management Server (vRMS)
      i. Assign a static IP address
      ii. Register it with the recovery site's vCenter instance
   c. At the RECOVERY SITE, deploy the vSphere Replication Server (vRS)
   d. Configure VMs for replication via the vSphere Client
5. TEST: Perform a test SRM recovery to ensure the system is functional
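The SRM upgrade repeats the same sub-steps at each site, recovery site first. Here is a sketch that expands them into one ordered runbook and drops the new SRA install when you are moving to vSphere Replication instead of array-based replication. The function name and task wording are my own; the steps themselves come from the list above.

```python
# (task, only needed for array-based replication) for each site.
SRM_SITE_TASKS = [
    ("Stop SRM", False),
    ("Uninstall the SAN SRA", False),
    ("Uninstall the SRM plug-in", False),
    ("Upgrade SRM", False),
    ("Install the 5.0 SRA", True),
    ("Restart SRM", False),
]

# Recovery site first, then the protected site, as in step 2 above.
SITE_ORDER = ("recovery", "protected")

def srm_upgrade_runbook(array_based_replication):
    """Build the ordered task list for the SRM 5.0 upgrade."""
    steps = ["Snapshot the vCenter and SQL VMs"]
    for site in SITE_ORDER:
        for task, array_only in SRM_SITE_TASKS:
            if array_only and not array_based_replication:
                continue  # vSphere Replication users skip the new SRA
            steps.append(f"[{site}] {task}")
    steps.append("TEST: Perform a test SRM recovery")
    return steps

for number, step in enumerate(srm_upgrade_runbook(array_based_replication=False), start=1):
    print(f"{number}. {step}")
```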

Step 7: Configure Network Monitoring Tool

Note: In our case, we configured 2 "applications" in our NetFlow-based network traffic monitoring system:
1. vSphere Initial: port 31031 (used for the initial "seed" copy of the VM)
2. vSphere Ongoing: port 44046 (used to replicate changes after the initial replication completes)
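The two "applications" boil down to a port-to-label mapping. The sketch below shows that mapping and totals bytes per application, assuming the flow records are keyed by destination port. The flow records themselves are made up, and real NetFlow exports carry many more fields.

```python
# vSphere Replication ports as configured in our monitoring tool.
VR_PORTS = {
    31031: "vSphere Initial",  # initial "seed" copy of the VM
    44046: "vSphere Ongoing",  # changes replicated after the initial copy
}

def classify_flow(dst_port):
    """Map a destination port to an application label."""
    return VR_PORTS.get(dst_port, "other")

def bytes_by_application(flows):
    """Sum bytes per application for (destination port, bytes) records."""
    totals = {}
    for dst_port, nbytes in flows:
        app = classify_flow(dst_port)
        totals[app] = totals.get(app, 0) + nbytes
    return totals

# Hypothetical flow records: (destination port, bytes).
flows = [(31031, 5_000_000), (44046, 120_000), (443, 2_000), (44046, 80_000)]
print(bytes_by_application(flows))
```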

Step 8: Upgrade 3rd-Party Tools

There’s little need to test tools that vendors have certified for vSphere 5.0.  However, custom scripts will need to be tested.

PROJECT MILESTONE: vCenter and Site Recovery Manager Upgrade Complete

Phase 3: Production Upgrade

Follow the steps in Phase 2 to upgrade production.

Conclusion 

That's it! Depending on the outcome of your tests in Phase 2, this may become an iterative process. Also note that you could test everything 100 times and still run into an issue while upgrading production; it's the nature of the beast. However, having already performed the upgrade in a test environment gives you a leg up in troubleshooting. And, having worked with VMware support numerous times, I can recommend calling them without hesitation.

Finally, always remember to document things along the way.  Follow these steps and you will be in good shape.