Monday, January 30, 2012

Milestone Reached: 92% Virtualized!

My datacenter reached 92% virtualization in Q4 2011.

The "% virtualized by server" metric is something I recommend every administrator track.  I have been doing this at least once per year but per quarter is probably a better frequency.  I also track "% virtualized by application".  We reached 90% for this metric in the same time-frame.

Note that this is a ratio of virtualized to virtualizable servers (if that wasn't a word, it is now!).  I don't consider servers that can't be virtualized - for example, an HP-UX OS running on Itanium hardware.  These servers aren't important because come upgrade time, your standard platform is vSphere so the rest will take care of itself over time.
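As a rough illustration of the ratio described above, here is a minimal Python sketch. The `Server` class, the field names and the sample inventory are all made up for the example; only the rule (virtualized divided by virtualizable, ignoring servers that can't be virtualized) comes from the post.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    virtualized: bool
    virtualizable: bool = True  # False for platforms that can't be virtualized


def percent_virtualized(servers):
    """Return virtualized / virtualizable servers as a percentage."""
    eligible = [s for s in servers if s.virtualizable]
    if not eligible:
        return 0.0
    done = sum(1 for s in eligible if s.virtualized)
    return 100.0 * done / len(eligible)


# Hypothetical inventory: 23 VMs, 2 physical servers still to migrate,
# and 1 server that is excluded because it can't be virtualized.
inventory = (
    [Server(f"vm{i}", virtualized=True) for i in range(23)]
    + [Server(f"phys{i}", virtualized=False) for i in range(2)]
    + [Server("hpux1", virtualized=False, virtualizable=False)]
)

print(round(percent_virtualized(inventory)))  # 92
```

The same function could be run per application instead of per server to get the second metric.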

Tracking these metrics will give you a good indication of progress over time. It's very interesting to see where things stood 1, 2 or 3 years ago.  It would have been nice to compare this to energy costs, but unfortunately either we haven't tracked these costs or I don't have access to them.

Overall, this is a good indicator of where you're at in your virtualization journey.

A final note: that last 10% is the hardest!

Tuesday, January 24, 2012

ConfigMgr Query: Office 2007 Version

(Note: Another one of my many IT hats is "Systems Management guru".  In my career I have completed 3 different SMS/ConfigMgr implementations.  I started with SMS 1.2 (which was very painful!).  I certified on SMS 2.0 as part of my MCSE.  My previous employer hired me to complete a nation-wide roll-out of SMS 2003.  And, while my current employer hired me to implement and maintain a VMware VI/vSphere environment, they purchased ConfigMgr 2007 but hadn't implemented it yet so I helped them with that (I can't seem to get away from it!).  So from time-to-time you'll see me post Systems Management and ConfigMgr topics such as inventory, queries, software distributions and even Web Remote Tools.)

Turns out that to use some of the new features of Exchange 2010 you must be running Office 2007 SP3.  There's even better integration with Office 2010 of course.

The Office version can be found in Add/Remove Programs.  Here's Microsoft's KB on Office 2007 versions.  At the time of this writing they had not updated it for SP3, which is version 12.0.6612.1000.

To determine which users aren't running Office 2007 SP3, I used the following ConfigMgr 2007 query:

select distinct SMS_R_System.Name, SMS_R_System.ADSiteName, SMS_R_System.IPSubnets, SMS_R_System.LastLogonUserName, SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName, SMS_G_System_ADD_REMOVE_PROGRAMS.Version
from SMS_R_System
 inner join SMS_G_System_ADD_REMOVE_PROGRAMS
 on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId
where (SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Basic 2007"
 or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Standard 2007"
 or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Professional 2007"
 or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Professional Plus 2007"
 or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Ultimate 2007"
 or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName = "Microsoft Office Enterprise 2007")
 and SMS_G_System_ADD_REMOVE_PROGRAMS.Version != "12.0.6612.1000"
order by SMS_R_System.ADSiteName
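Note that the query flags every version that is not exactly the SP3 build, which would also include any later build.  If you export the results and want to flag only machines below SP3, a numeric comparison of the dotted version is one option.  The Python below is a sketch of that idea, not part of ConfigMgr; the function names and the sample versions in the usage lines are made up for illustration.

```python
SP3_VERSION = "12.0.6612.1000"  # SP3 version number given in the post


def parse_version(text):
    """Turn '12.0.6612.1000' into (12, 0, 6612, 1000) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def is_below_sp3(version):
    """True when the reported Office 2007 version is older than SP3."""
    return parse_version(version) < parse_version(SP3_VERSION)


# Hypothetical versions as they might appear in exported query results
print(is_below_sp3("12.0.6425.1000"))  # True
print(is_below_sp3("12.0.6612.1000"))  # False
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "12.0.10000.1" would sort before "12.0.6612.1000".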

Tuesday, January 17, 2012

It's Time: vSphere 5 Upgrade

The technological benefits of upgrading from VMware vSphere 4.x to vSphere 5.x are not as great as upgrading VMware Virtual Infrastructure 3.x to vSphere 4.x in my opinion.  I don't know what the numbers are but I'm guessing that adoption hasn't been as fast as a consequence.

The latest XtraVirt poll posed this question with the following results:
What is your company's timescale for vSphere 5 deployment?
35%   6-12 Months
27%   1-3 Months
15%   3-6 Months
15%   1-2 Years
8%     No Current Plan

Well over half of admins responding will be upgrading within a year.  That's not bad.  I am in the "1-3 months" category and could argue that I've already started (planning and testing are complete).  The driver in my case is not technology/features-based, but is cost avoidance.
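For the record, the "well over half" figure is just the sum of the three buckets that fall inside one year.  A quick Python check using the poll numbers quoted above (the variable names are arbitrary):

```python
# Poll results as quoted above (percent of respondents)
poll = {
    "6-12 Months": 35,
    "1-3 Months": 27,
    "3-6 Months": 15,
    "1-2 Years": 15,
    "No Current Plan": 8,
}

# Buckets that mean "upgrading within a year"
within_a_year = sum(poll[k] for k in ("1-3 Months", "3-6 Months", "6-12 Months"))

print(within_a_year)       # 77
print(sum(poll.values()))  # 100
```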

HP EVA SAN replication is licensed by capacity.  I don't want to rant about how utterly wrong this is but I'm not about to pay HP additional thousands of dollars for the privilege of replicating my data, thank you.  And now that SRM 5.0 provides replication for free, I don't have to (see previous post).

I've been planning our vSphere upgrade over the last month.  I've successfully installed and upgraded to vSphere 5 in the lab; now it's time to upgrade production.  I think VMware has finally got the upgrade process right.  It took them a while, although the vSphere 4.x upgrade was fairly smooth.  If you upgraded Virtual Infrastructure 2 to 3 you know what I mean!

I've pasted my upgrade plan below (taking out company-specifics).  I also included some notes from the lab tests. On to the show!

Phase 1: Planning

VMware vSphere Upgrade and Install Community Forum

(Note: I watched the forums and blogosphere in general to determine the stability of the release.  This is always a good practice but especially with a ".0" release - let others test it out.)

As of 9/15/2011, no real “deal-killer”/systemic errors found during or after upgrade.  Here’s the list:
1.       Port group names must be less than 19 characters (no impact to us)
2.       vCenter license must be provided during installation or it will start in expired mode (no impact to us)
3.       One admin gets an error when trying to upgrade ESXi via CLI – “downloading metadata failed” (no impact to us – we’re not using CLI to upgrade ESXi)
4.       One admin had a corrupted esx.conf file that prevented him from upgrading ESXi (no impact to us)

Check Pre-requisites:

1.       Check VMware Infrastructure Licenses
a.       vCenter 5.0  (available)
b.      ESXi 5.0  (available)
c.   Site Recovery Manager 5.0 (available November 19, 2011)
2.       Check Hardware Compatibility
a.       Server Platform
b.   SAN Platform
c.   HBA cards
d.   NIC cards
3.       Check Software Compatibility
a.       Review Guest OS Support
b.      Review 3rd-party Product Support
                                                   i.      Backup Software
                                                 ii.      Management Software
                                                iii.      Custom Scripts
c.       Review Interoperability matrix

PROJECT MILESTONE: Planning Complete

Phase 2: Testing

Setup Test Environment

Once the above steps are complete, it’s time to start testing.
(Note: The general idea here is to configure an environment that matches the current production environment as closely as possible.  I've done this in the past using cloned VMs with "mostly" successful results.  This is the way to go if you can keep the test and production networks isolated.  For this upgrade, I decided to configure new VMs - this should still give me valid tests and is faster to get up and running than using cloned VMs based on past experience.  If you have the time, you may want to go the cloned VMs route.)

Step 1: Setup ESXi Hosts

1.       Setup 3 servers as ESXi hosts in Datacenter1
2.       Setup 3 servers as ESXi hosts in Datacenter2
3.       Install vSphere 4.1 (or whatever your current version is) on each host per the COMPANY Installation Guide.  (You do have this documented, right?)

Step 2: Setup SAN LUNs and Replication

1.       Create one LUN in Datacenter1 for vCenter and SQL VMs
2.       Create one LUN in Datacenter2 for vCenter and SQL VMs
3.       Create one read-only LUN in Datacenter2 to host the replicated LUN
4.       Configure the Datacenter1 LUN for replication to Datacenter2.

Step 3: Setup Test Virtual Machines

1.       Install one base Windows Server 2008 R2 image on one host at each site to be used as the vCenter/SRM server
2.       Clone images for use as SQL server to host the vCenter databases
3.       Configure images as stand-alone VMs.

Step 4: Configure SQL Server

1.       Install the same SQL Server version used in production on both SQL VMs
a.       Use the production configuration as reference.

Step 5: Configure vCenter

1.       Install the SQL Native Client on both vCenter VMs
2.       Install vCenter on both vCenter VMs
a.       Use the production configuration as reference
b.      Configure vCenter with production licenses during installation or it will start in expired mode
3.       Add hosts to vCenter as appropriate
4.       Use vUM to apply the same updates to ESXi hosts as are present in production.
a.       TEST: This will test vUM to upgrade hosts
b.      TEST: This will also test vMotion capability
c.       TEST: This will also test the ESXi host configuration (networks, storage, etc.).

Step 6: Configure SRM

1.       Install SRM on each vCenter VM
2.       Configure SRM for the replicated LUN
3.       TEST: Perform a test SRM recovery to ensure system is functional.

PROJECT MILESTONE: Fully Functional Test Environment Complete

Upgrade to vCenter 5.0/SRM 5.0

Now with a fully functional test environment in place, it’s time to start upgrading.
The upgrade to 5.0 is supposed to be the least intense VI/vSphere upgrade yet.  The steps are fairly straight-forward:
  1.     Upgrade vCenter/Update Manager (vUM)
  2.     Upgrade the ESXi hosts
  3.     Upgrade VMware Tools of the VMs and the Virtual Hardware of the VMs to version 8
  4.     Upgrade the VMFS datastores to version 5
  5.     Upgrade the SRM to version 5
  6.     Upgrade any 3rd-party tools and scripts.
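If you want to track progress against this sequence in a script, one simple approach is an ordered checklist.  The Python below is only a sketch: the step labels paraphrase the list above, the function name is made up, and treating the order as strictly enforced is an assumption made for the example.

```python
# Step labels paraphrase the list above; the order is taken from the post.
UPGRADE_ORDER = [
    "vCenter/vUM",
    "ESXi hosts",
    "VMware Tools and virtual hardware",
    "VMFS datastores",
    "SRM",
    "3rd-party tools and scripts",
]


def next_step(completed):
    """Return the next step to run, or None when all steps are done.

    Raises ValueError if a later step was completed before an earlier one.
    """
    for index, step in enumerate(UPGRADE_ORDER):
        if step not in completed:
            skipped_ahead = [s for s in UPGRADE_ORDER[index + 1:] if s in completed]
            if skipped_ahead:
                raise ValueError(f"{skipped_ahead[0]!r} was done before {step!r}")
            return step
    return None


print(next_step({"vCenter/vUM"}))  # ESXi hosts
```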

Step 1: Upgrade vCenter

1.       Back up the vCenter database using SQL Server Management Studio
2.       Back up the SSL certificates (%allusersprofile%\Application Data\VMware\VMware VirtualCenter)
3.      Stop all vCenter services
4.       Install JDK 1.6
5.       Using the vCenter ISO, upgrade vCenter to version 5
a.       Upgrade the recovery site first
b.      Run the vCenter host agent pre-upgrade checker
6.       Configure the new vSphere 5 licenses
7.       Upgrade the vSphere Client
8.       From the recovery site, rejoin the site via Linked Mode:
From the Start menu, select All Programs > VMware > vCenter Server Linked Mode Configuration
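The certificate backup in step 2 is just a folder copy.  As a minimal sketch, assuming Python is available on the vCenter server, something like the following would copy a folder into a timestamped backup location.  The function name and the backup destination are hypothetical; only the source path comes from the step above.

```python
import os
import shutil
from datetime import datetime


def backup_directory(source, backup_root):
    """Copy source into a new timestamped folder under backup_root.

    Returns the path of the folder that was created.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(source))
    target = os.path.join(backup_root, f"{name}-{stamp}")
    shutil.copytree(source, target)  # fails if target already exists
    return target


# Example call on the vCenter server (D:\Backups is a made-up destination):
# source = os.path.expandvars(
#     r"%allusersprofile%\Application Data\VMware\VMware VirtualCenter")
# backup_directory(source, r"D:\Backups")
```

The timestamp in the folder name keeps repeated backups from overwriting each other.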

Step 2: Upgrade Update Manager (vUM)

1.       Back up the vUM database using SQL Server Management Studio
2.       Stop all vUM services
3.       Install JDK 1.6
4.       Using the vCenter ISO, upgrade vUM to version 5
a.       Upgrade the recovery site first
5.       Upgrade the vSphere Client vUM plug-in

Step 3: Upgrade the ESXi hosts

There are two options to upgrade the hosts: via vUM or doing a clean install with the OEM custom ESXi 5 image/ISO.  For testing purposes, we’ll try both methods.
1.       Using vUM, upgrade the first host of each cluster
a.       Test vMotioning VMs between 4.1 and 5.0 hosts
(Lab note: Need to force remediation.  Enable "remove incompatible packages".)
2.       Using the ISO, perform a clean install of the last host in each cluster.
(Note: I skipped step 2 - vUM worked so well I decided to go with that method.  HP provides an image that can be "imported" directly into vUM (very nice!) which worked great.)

DECISION POINT: Use vUM Upgrade or Clean Install from ISO

Step 4: Upgrade VMware Tools and Virtual Hardware

1.       Using vUM, upgrade the VMware Tools component of all VMs
2.       Using vUM, upgrade the virtual hardware of all VMs.
(Note: vUM scan skips vCenter and SQL VMs.  You'll need to go back and upgrade these manually.)

Step 5: Upgrade the VMFS Datastore

The VMFS datastores can be upgraded in place while the VMs are running.

Step 6: Upgrade SRM

1.       Snapshot VC and SQL VMs
2.       Upgrade SRM to 5.0 on the recovery site vCenter VM first, then on the protected site vCenter VM
a.       Stop SRM
b.      Uninstall SAN SRA
c.       Uninstall the SRM plug-in
d.      Upgrade SRM
e.      Install the 5.0 SRA (if using array-based replication)
f.        Restart SRM
3.       Configure SRM
a.       Remove Array Manager
b.      Add new Array Manager
c.       Reconfigure protection group
d.      Reconfigure recovery plan.
4.       Install and Configure vSphere Replication (if not using array-based replication)
a.       At the PROTECTED SITE deploy the vSphere Replication Management Server (vRMS)
                                                   i.      Assign a static IP address
                                                 ii.      Register with the protected site’s vCenter instance
b.      At the RECOVERY SITE deploy the vSphere Replication Management Server (vRMS)
                                                   i.      Assign a static IP address
                                                 ii.      Register with the recovery site’s vCenter instance
c.       At the RECOVERY Site deploy the vSphere Replication Server (vRS)
d.      Configure VMs for replication via the vSphere client
5.       TEST: Perform a test SRM recovery to ensure system is functional

Step 7: Configure Network Monitoring Tool

Note: In our case, we configured 2 "applications" in our NetFlow-based network traffic monitoring system:
  1. vSphere Initial: port 31031 (port used to do the initial "seed" copy of the VM)
  2. vSphere Ongoing: port 44046 (port used to replicate changes after initial replication completes)
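If your monitoring tool can export flow records, the same two "applications" can be reproduced with a simple port lookup.  The Python below is a sketch under that assumption: the record layout (destination port, byte count) and the function names are invented for the example, while the two port numbers and labels are the ones listed above.

```python
# Port-to-application mapping taken from the two entries above
REPLICATION_PORTS = {
    31031: "vSphere Initial",  # initial "seed" copy of the VM
    44046: "vSphere Ongoing",  # ongoing replication of changes
}


def classify_flow(dst_port):
    """Return the application label for a destination port."""
    return REPLICATION_PORTS.get(dst_port, "other")


def bytes_by_application(flows):
    """Sum byte counts per application for (dst_port, byte_count) records."""
    totals = {}
    for dst_port, byte_count in flows:
        app = classify_flow(dst_port)
        totals[app] = totals.get(app, 0) + byte_count
    return totals


# Hypothetical flow records: (destination port, bytes)
sample = [(31031, 1000), (44046, 200), (443, 50), (44046, 300)]
print(bytes_by_application(sample))
```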

Step 8: Upgrade 3rd-Party Tools

There’s little need to test tools that vendors have certified for vSphere 5.0.  However, custom scripts will need to be tested.

PROJECT MILESTONE: vCenter and Site Recovery Manager Upgrade Complete

Phase 3: Production Upgrade

Follow steps in Phase 2 to upgrade production.  


That's it!  Depending on the outcome of your tests in Phase 2, this may become an iterative process.  Also note that there's always the chance that you've tested everything 100 times and still encounter an issue while upgrading production.  It's the nature of the beast.  However, having the experience of performing the upgrade in a test environment will give you a leg up in troubleshooting problems.  And, having worked with VMware support numerous times, I can recommend calling them without hesitation.

Finally, always remember to document things along the way.  Follow these steps and you will be in good shape.

Thursday, January 12, 2012

vSphere Memory Utilization

Determining actual host memory utilization can present a challenge.  The issue boils down to when the host breaks large memory pages into small pages which can then be shared (aka Transparent Page Sharing).  TPS only works with small pages.

To determine actual host memory usage, especially once the host is using a significant amount (65-70%+), you have to monitor an esxtop parameter called COWH.  For example, you may get to 70% utilization and find that adding another host to the cluster doesn’t decrease utilization, but adding another VM or two does(!).  This is because at some point the host will break some of the large pages into small pages, and then TPS kicks in.
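To see why sharing small pages lowers the memory actually consumed, here is a toy model in Python.  It is not how ESXi implements TPS; the hashing approach, the function name and the two example "VMs" are assumptions made purely to illustrate that identical small pages only need to be stored once.

```python
import hashlib

SMALL_PAGE = 4 * 1024  # 4KB small page


def shared_savings(pages):
    """Return (total_pages, unique_pages) for a list of page contents."""
    unique = {hashlib.sha1(page).digest() for page in pages}
    return len(pages), len(unique)


# Two hypothetical VMs, each with three zero-filled pages and one distinct page
zero = bytes(SMALL_PAGE)
vm1 = [zero, zero, zero, b"a" * SMALL_PAGE]
vm2 = [zero, zero, zero, b"b" * SMALL_PAGE]

total, unique = shared_savings(vm1 + vm2)
print(total, unique)  # 8 3
```

In this toy case eight small pages collapse to three once duplicates are shared.  While memory is still backed by large pages, none of that sharing happens, which is why the reported utilization can overstate what the host really needs.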

For more in depth reading see:

(Not so) Random Thought of The Day #3

Obamacare was modeled after Romneycare in MA.  If Mitt wins the Republican nomination and runs against Obama for president, will there be a real choice?

Changes are a Comin'

Changes are always coming!  There are few things as consistent in IT as change.  The first "system" that I fancied myself an "expert" on was Novell NetWare 3.11.  Then LANtastic, then Windows NT 3.51.  Then... well you get the picture.  IT technologies, strategies, directions all change over time.  I've turned down job offers when I thought it might lock me into any one product or technology for any length of time.

So this year will be no different.  When I read the feature list for Microsoft Hyper-V 3.0 a couple of months ago, I started telling my peers to keep their eyes open.  For 2012, VMware will remain the market and technology leader.  I have no doubt about that.  But competitors are catching up.  It happens all of the time (e.g. IE vs. Netscape).  The march of innovation will continue to force change on all of us.  That's one of the things I enjoy about IT and technology in general.

I was inspired to write a few thoughts here after reading Mr. Ruben's article on his virtualization landscape predictions for 2012.  Check it out here to read more.

Monday, January 9, 2012

Snapshotting Large Files on Large Datastores

The error:

File <unspecified filename> is larger than the maximum size supported by datastore '<unspecified datastore>

The problem:
The VM has 2 virtual disks, one for the primary system OS on datastore1 formatted with a 2MB block size, the other for data on datastore2 formatted with an 8MB block size.

In vSphere 4.1, when a snap is created on a VM with the above configuration it will put the change files in the same datastore/directory as the VMX file (datastore1 in this case).  So it will try to create a VMDK with the same provisioned size as the original VMDK, although the actual size will typically be much smaller since this file holds only the changes since the snap was created.

Consider a VM with VMDK1 on Datastore1 with a size of 30GB and VMDK2 on Datastore2 with a size of around 512GB.  Datastore1 was formatted with a 2MB block size, which limits files to 512GB, and Datastore2 was formatted with a 4MB block size, which allows files up to 1TB.  As long as VMDK2 is less than 512GB, the snap will complete successfully.  However, if VMDK2 is over 512GB, the snap will fail with the error listed above, because the change file is created on Datastore1 with VMDK2's provisioned size, even though Datastore2 is capable of hosting virtual disks/VMDKs up to 1TB in size.
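The rule above can be turned into a quick pre-check.  The Python below is a sketch, not a VMware tool: the function name is made up, the 512GB, 1TB and 2TB limits are the ones discussed in this post, and the 256GB limit for 1MB blocks is the standard VMFS-3 figure added to complete the mapping.  It follows the post's wording that a disk must be smaller than the limit of the datastore holding the VMX file.

```python
# VMFS-3 maximum file size in GB, keyed by block size in MB.
# The 8MB entry is really 2TB minus 512 bytes; 2048 is close enough here.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}


def snapshot_will_fit(vmx_datastore_block_mb, vmdk_sizes_gb):
    """True if every disk is below the file size limit of the VMX datastore.

    In vSphere 4.1 the snapshot change files are created next to the VMX
    file, so that datastore's block size is the one that matters.
    """
    limit = VMFS3_MAX_FILE_GB[vmx_datastore_block_mb]
    return all(size < limit for size in vmdk_sizes_gb)


# VMX on a 2MB-block datastore, disks of 30GB and 500GB
print(snapshot_will_fit(2, [30, 500]))  # True
# Same VM with a 600GB data disk
print(snapshot_will_fit(2, [30, 600]))  # False
```

Running this against an inventory export before scheduling snapshot-based backups would flag the VMs that need the Independent/Persistent workaround or a datastore move.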

I haven't tried this with vSphere 5 yet but will update this post as soon as I have completed testing.  I would expect the same results, especially for upgraded datastores.

The solution:
The best practice is to format all datastores with the same block size (a non-issue for vSphere 5). This will allow you to avoid the problem altogether.

You can also exclude the VMDK on the datastore formatted with the larger block size by changing the virtual disk mode to Independent/Persistent.  The main downside is you won't be able to capture image/VMDK-based backups.

Also consider this chart from VMware KB1003565:

Block Size   Largest virtual disk on VMFS-2   Largest virtual disk on VMFS-3   Largest virtual disk on VMFS-5
1MB          (value missing)                  256GB                            2TB minus 512B
2MB          (value missing)                  512GB                            Valid if upgraded from VMFS-3
4MB          (value missing)                  1TB                              Valid if upgraded from VMFS-3
8MB          (value missing)                  2TB minus 512B                   Valid if upgraded from VMFS-3
16MB         (value missing)                  Invalid block size               Invalid block size
32MB         (value missing)                  Invalid block size               Invalid block size
64MB         (value missing)                  Invalid block size               Invalid block size