Thursday, March 15, 2012

vSphere 5 Upgrade: SRM - PART 1

Per Its Time vSphere 5 Upgrade, time to upgrade SRM.  Well, I got to step 6.2 and things went downhill from there.  The following is the description I used to open an SR with VMware support:
Upgrading SRM from 4.1.2 to 5.0.  I get the error: "failed to create database tables".  It doesn't appear to make any changes to the database.
I've double-checked settings per KB1015436 and several community postings.
I've tried re-installing SRM 4.1.2 (successfully), performing a repair, then another upgrade but it still fails with the same error.
The first support tech went down the 'invalid permissions' path, but this was not the problem.  Turns out, the SRM 5.0 upgrade does not support upgrading from SRM 4.1.2!  Interesting, because the only documentation I can find on the subject clearly states that you can upgrade from SRM 4.1(!).  Looks like VMware needs to do a better job documenting these requirements.

I was then informed that I could wait until the next minor/point release of SRM 5, which would support upgrading from 4.1.2, but I didn't have that kind of time (and who knows when they'll actually release it).  So no upgrade for me, full install from scratch instead!  Besides having to reconfigure mappings, protection groups (which I was going to have to do anyway), etc., the biggest downside is losing the previous DR test results.  Yes, I saved those off as separate Excel files, but it would have been nice to have had all of the results right there in SRM from the beginning.


But wait, there's more!  Now that I have a brand new, freshly installed SRM up and running, it's time to set up vSphere Replication.  Did that go problem-free, you ask?  Ummm, no.  The following is the description I used to open yet another SR with VMware support:
The VRMS servers at both sites fail to connect.  I have unregistered the server, powered down/deleted the appliance VM, re-initialized the VRMS database, repaired SRM, redeployed the VRMS servers and configured them with the same vCenter FQDN per KB2007463 but still have the same problem.
Between the support tech and me, it took several hours to figure this one out.  The short of it is that it's a vCenter certificate problem.  What clued me into this was the error I got when registering the VRMS instance.

That "unacceptable signature algorithm" message is not your typical self-signed cert warning!  Turns out, my vCenter self-signed certs had expired.  This hadn't caused a problem until installing vSphere Replication - it wants at least a current/non-expired cert.  I checked the vCenter cert and sure enough, it had expired in 2010.  It was created in 2008 and was valid for only 2 years!

Now I bet you're wondering, how does one fix this cert problem?  Well that's easy, reinstall vCenter!  And a repair won't work either, so you have to uninstall the current vCenter instance and install a new one.  Luckily, most settings are maintained in the vCenter database, so this wasn't nearly as painful as it could have been.

While I was at it I checked the new vCenter cert and VMware apparently decided to make this one valid for 10 years.  Now that's more like it!

But wait, there's more!  Look for PART 2 of this adventure in a near-future post.  A little hint - the fun ain't over yet.


vSphere 5 Upgrade: VMFS Datastores

Per Its Time vSphere 5 Upgrade, time to upgrade VMFS datastores.  Like the last couple of steps, this one completed w/o issue.  Note that it's better to create new VMFS-5 datastores than to upgrade existing ones: a newly created datastore gets the unified 1MB block size regardless of its size (optimizing disk space), while an upgraded datastore keeps its original block size.  Compare this to VMFS-3 on ESXi 4.1 and earlier, where you had to pick a 1, 2, 4 or 8MB block size up front based on the largest virtual disk file you needed to support.

All of my datastores now report VMFS version 5.54.
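If you want to verify this across the board, here's a minimal pyVmomi sketch that prints each VMFS datastore's filesystem version and block size.  The connection details are placeholders and the pyVmomi package is assumed:

    # Print each VMFS datastore's version and block size via the vSphere API.
    # Minimal sketch: host/user/password are placeholders; assumes pyVmomi.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()   # vCenter uses a self-signed cert
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.Datastore], True)
        for ds in view.view:
            if ds.summary.type == "VMFS":
                vmfs = ds.info.vmfs          # HostVmfsVolume
                print("%s: VMFS %s, %s MB block size"
                      % (ds.name, vmfs.version, vmfs.blockSizeMb))
    finally:
        Disconnect(si)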


vSphere 5 Upgrade: VM Tools and Virtual Hardware

Per Its Time vSphere 5 Upgrade, time to upgrade VMware Tools and Virtual Hardware.  I'm happy to report that both of these steps completed w/o issue.  After the previous 4.1 upgrade, the VMware Tools upgrade corrupted one of my Windows 2000 VMs.  VMware claimed it was a Microsoft problem; Microsoft claimed it was a VMware problem.  Love the finger-pointing, guys!  Anyway, I made sure this VM was upgraded (actually replaced) with a Windows 2008 R2 server, and we updated the application while we were at it - a win/win in my book.

No problems this time (phew!).

The only other thing I did differently compared to my previous upgrade was to upgrade the SQL and vCenter servers first.  vUpdate Manager will skip these VMs anyway.
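If you want a quick inventory of which VMs still need attention before (or after) this step, here's a minimal pyVmomi sketch along the same lines as the one above - connection details are placeholders - that lists each VM's virtual hardware version and Tools status:

    # List each VM's virtual hardware version and VMware Tools status.
    # Minimal sketch: host/user/password are placeholders; assumes pyVmomi.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            if vm.config is None:            # skip inaccessible VMs
                continue
            # config.version is e.g. "vmx-07" (4.x) or "vmx-08" (5.0);
            # toolsVersionStatus2 reports e.g. "guestToolsNeedUpgrade"
            print("%s: hardware %s, tools %s"
                  % (vm.name, vm.config.version, vm.guest.toolsVersionStatus2))
    finally:
        Disconnect(si)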

vSphere 5 Upgrade: ESXi Hosts

Per Its Time vSphere 5 Upgrade, time for the ESXi host upgrades.  Not much to report here.  The upgrades completed w/o issue per vUpdate Manager.

Note that I had to force remediation; then I enabled "remove incompatible packages".  I didn't seem to lose anything after the upgrade.  Maybe it removed the HP bundles that were installed for ESXi 4.1?
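A quick way to confirm every host actually took the upgrade is to pull each host's product string after remediation.  Another minimal pyVmomi sketch with placeholder connection details:

    # Print each ESXi host's product name and build after remediation.
    # Minimal sketch: host/user/password are placeholders; assumes pyVmomi.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            about = host.config.product      # vim.AboutInfo
            print("%s: %s (build %s)" % (host.name, about.fullName, about.build))
    finally:
        Disconnect(si)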

ERROR: ExtPart - Cannot find C: Drive

ExtPart threw this error on a Windows Server 2003 R2 64-bit system that had just had logical disk 0 extended.
ExtPart does support 32-bit and 64-bit Windows Server 2003 (original release and R2); the download just needs to be extracted on 32-bit Windows or with a tool such as 7-Zip.

My guess was that the command interpreter was not "seeing" the newly extended local disk space.  Sure enough, a reboot fixed 'er right up!

I highly recommend both tools be a part of any administrator's toolbag:
Dell's ExtPart Tool
7-zip

Friday, March 2, 2012

Removing a VMFS Datastore

To remove VMFS datastores in the past, I always made sure there were no VMs left on the datastore, right-clicked, and chose "Delete".  Well, apparently that's the wrong way to do it!  I must have been lucky, as VMware claims this method could result in an APD (All Paths Down) state.  If you don't know what that is, let me tell you it's bad (I have experienced it, but for a different reason): your host(s) will lose access to storage.

I stumbled upon this vSphere blog post that has the procedure to remove a datastore the right way: Best Practice: How to correctly remove a LUN from an ESX host

UPDATE: For ESXi 5.0, I found it better to follow the KB mentioned in that blog post:
Unpresenting a LUN in ESXi 5.x

For ESXi 5.0, here's how I do a slightly modified version of the procedure listed in the post above, with more information on items such as HA (there's also a scripted sketch of the unmount/detach steps at the end of this post):
  1. Make sure all VMs are evacuated from the datastore/LUN.
  2. Using Datastore Browser, make sure there aren't any left-over directories or files.  If there are, delete them (make sure they can be safely deleted first, of course).  Exception: the HA directory - you'll remove that a different way later.  Also note that you won't be able to delete a file if it's in use.
  3. Make sure all vSphere features are disabled for the datastore (e.g. SIOC, Storage DRS).
    1. HA Datastore Heartbeating:  There may be some cases where HA has chosen the datastore you're trying to remove.  In this case, edit cluster settings and change Datastore Heartbeating to "select only from my preferred datastores", then select at least three datastores other than the one you're trying to remove.  I highly recommend changing this setting back to "Select any of the cluster datastores" after you're finished removing this one.
  4. Next, for each host, go to Configuration\Storage\Datastores View, right-click on the datastore and choose "Unmount":
    1. The "Unmount Datastore Wizard" appears. Make sure all hosts are selected.
    2. Click "Next".  Hopefully the prerequisite checks on the next screen all come back clean.
    3. Click "Next" then "Finish".  After a minute the datastore will be grayed-out and italicized.
  5. Then go to Configuration\Storage\Devices View, right-click on the device backing the datastore and choose "Detach".
    1. You get a confirmation pop-up dialog.  Choose "OK".  The device will be grayed-out and italicized after a few seconds.
  6. Go into your SAN and unpresent the LUN from the ESXi hosts (remove the LUN masking/mapping).
  7. Finally, right-click on your cluster and choose "Rescan for Datastores".  You may get a StorageConnectivityAlarm alert for every host in your cluster unless you disable this alert first.

I like to go into every host and check the storage adapters to make sure the LUN is really gone, but I haven't found a problem with this procedure yet.
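And if you have a lot of datastores to retire, steps 4 and 5 can be scripted against the vSphere API.  Here's a minimal pyVmomi sketch - the datastore name and connection details are placeholders, error handling is skipped, and you still do the SAN unpresent and rescan steps yourself:

    # Unmount a VMFS datastore and detach its backing device on every host
    # that has it mounted (roughly steps 4 and 5 above), via the vSphere API.
    # Minimal sketch: names and credentials are placeholders; assumes pyVmomi.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    DATASTORE = "old-datastore"  # placeholder - the datastore being removed

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.Datastore], True)
        ds = next(d for d in view.view if d.name == DATASTORE)
        vmfs_uuid = ds.info.vmfs.uuid
        naa = ds.info.vmfs.extent[0].diskName   # backing device, e.g. naa.60...

        for mount in ds.host:                   # every host with this datastore
            storage = mount.key.configManager.storageSystem
            storage.UnmountVmfsVolume(vmfs_uuid)         # step 4: unmount
            lun = next(l for l in storage.storageDeviceInfo.scsiLun
                       if l.canonicalName == naa)
            storage.DetachScsiLun(lun.uuid)              # step 5: detach the device
    finally:
        Disconnect(si)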