Monday, August 1, 2011

ERROR: Cannot login vi-admin00@IPADDRESS

Don't you love it when, during a standard log review of your vSphere environment, you find an error like this that zaps the next four hours of your time?  Not!  Maybe this will save you some time.

Scenario
I had the ESXi 4.1 hosts in my vSphere cluster set up to send their syslog data to the VMware vMA appliance per Simon's excellent instructions:  Using vMA as Your ESXi Syslog Server
I recently upgraded our vSphere cluster hardware, which included a fresh installation of ESXi.
With that in mind, after recently reviewing tasks and events in vCenter, I noticed the error message "Cannot login vi-admin00@IPADDRESS" where IPADDRESS was the IP of the vMA system.  I found this error on all of the hosts' local events and it occurred often.

Troubleshooting
Reading through the comments of the above post, I noticed someone else had the same problem, but there were no responses.  I made the "chown" change on the syslog directory, but this did not solve the problem.

I then ran the following command directly on the vMA appliance:
vilogger list --server SERVERNAME
Per the results, I found that the host was "enabled" but it had an "Authentication Failure".  This got me wondering about that vi-admin00 account in the original error message. The vMA has a "vi-admin" local account, but what is "vi-admin00"?  I fired up the vSphere client and logged directly in to one of the hosts.  Sure enough, the account didn't exist.

Solution
A little more investigation (er, Google searching), and I found the answer here:
How to Remove Stale Targets from vMA
Apparently, rebuilding/replacing the hosts wiped out the accounts that vMA creates on each host when it is added as a target, including vi-admin00!

The first step is to remove the server from vMA.  I did not need to use the "force" parameter:
sudo vifp removeserver SERVERNAME

Then add the server back in:
sudo vifp addserver SERVERNAME

Finally, re-register the host with vilogger:
vilogger enable --server SERVERNAME --numrotation 20 --maxfilesize 10 --collectionperiod 10

You'll know it worked if you get the green "Enabled" result messages.

To verify:
vilogger list --server SERVERNAME

You should see each of the three logs listed as "Enabled" and "Collecting".  I also WinSCP'ed to the system and made sure the logs were updating with new data.
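
If you'd rather check from the vMA shell than with WinSCP, something like the sketch below may help. It assumes the collected logs live under /var/log/vmware/SERVERNAME, which is where I'd expect vilogger to put them but is not something confirmed above, so adjust LOG_ROOT to match your appliance. The recent_logs function name and the 15-minute window are just illustrative choices.

```shell
# Assumed location of the collected logs; adjust to match your vMA.
LOG_ROOT="${LOG_ROOT:-/var/log/vmware}"

# Hypothetical helper: print the log files for a host that were
# modified within the last N minutes (default 15).
recent_logs() {
    host="$1"
    minutes="${2:-15}"
    find "$LOG_ROOT/$host" -type f -mmin "-$minutes" 2>/dev/null
}

# Report whether SERVERNAME has written anything recently.
if [ -n "$(recent_logs SERVERNAME 15)" ]; then
    echo "SERVERNAME: logs are updating"
else
    echo "SERVERNAME: no recent log activity"
fi
```

An empty result right after re-enabling isn't necessarily a failure; give it at least one collection period before worrying.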

Conclusion
Do this for all hosts in your cluster and you'll be back in business.  And don't forget to add this to the host rebuild/replace checklist.  It's always the little things, isn't it?
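
If you have more than a couple of hosts, the same steps can be wrapped in a small loop. This is only a sketch: the host names are placeholders, the run and reregister_host helpers are my own, and it defaults to a dry run that just prints the commands so you can review them first. Set DRY_RUN=0 to actually execute them, and expect vifp addserver to prompt you for each host's root password.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: repeat the remove/add/enable/list steps for one host.
reregister_host() {
    host="$1"
    run sudo vifp removeserver "$host"
    run sudo vifp addserver "$host"
    run vilogger enable --server "$host" --numrotation 20 --maxfilesize 10 --collectionperiod 10
    run vilogger list --server "$host"
}

# Placeholder host names; replace with the hosts in your cluster.
for h in esx01.example.com esx02.example.com; do
    reregister_host "$h"
done
```

The vilogger values are the same ones used above; change them in one place if you use different rotation settings.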