Unable to login as a user on a 4.1 ESX server

By default, an ESX 4.1 server denies logins from standard users, while root access via SSH works without problems. This changed from 4.0 and has caused many headaches on systems upgraded to 4.1.

Obviously, this is a security problem and something we do not want.

To protect your ESX server and restore standard user access, you have to replace the system-auth config file. In this case, an older 4.0 version of the file will do the job. Always remember to make a backup first in case something goes wrong (if it does and you don’t notice, you’re screwed, so pay attention).
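A minimal sketch of the backup step, assuming a plain copy next to the original is good enough. The helper name backup_file and the .bak suffix are my own choices for illustration, not something ESX provides:

```shell
# Hypothetical helper: copy a file to <file>.bak, keeping mode and timestamps.
backup_file() {
    src="$1"
    cp -p "$src" "$src.bak" && echo "backup saved as $src.bak"
}

# On the ESX service console you would call it like this:
#   backup_file /etc/pam.d/system-auth
```

If the new configuration locks you out, copying the .bak file back over system-auth from a still-open root session restores the previous behaviour.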

# vi /etc/pam.d/system-auth

Paste this content into the file:

#%PAM-1.0
# Autogenerated by esxcfg-auth

account    required    /lib/security/$ISA/pam_unix.so

auth          required    /lib/security/$ISA/pam_env.so
auth          sufficient           /lib/security/$ISA/pam_unix.so        likeauth nullok
auth          required    /lib/security/$ISA/pam_deny.so

password    requisite     pam_cracklib.so try_first_pass retry=3 dcredit=-1 ucredit=0  ocredit=-1 lcredit=-1 minlen=8
password           required    /lib/security/$ISA/pam_cracklib.so            retry=3
password           sufficient           /lib/security/$ISA/pam_unix.so        nullok use_authtok md5 shadow
password           required    /lib/security/$ISA/pam_deny.so

session      required    /lib/security/$ISA/pam_limits.so
session      required    /lib/security/$ISA/pam_unix.so

You can now login to your 4.1 ESX server using standard login. Now go and harden your server!

Release the lock from a hung VM on VMware

After an HA event or a network/storage outage with VMware ESX servers (3.5 and 4.1 alike), you may end up with a VM that is down and cannot be powered on, even if you try to migrate it or to unregister and re-register it in the Virtual Center.
On closer inspection, you might notice that the vswp file is still in the VM folder (a sign the VM might still be active somewhere), yet you cannot delete the file because it is “locked”. Actually, one of the ESX hosts in the cluster owns the lock, even if the VM is not running.
So, with several hosts in the cluster, how do you find the one to deal with? Let’s find out.
First of all, we have to know which ESX host is preventing the power-on.
Log in to any ESX host and run:

tail -f /var/log/vmkernel &

Now go to the locked VM's folder on the datastore and try to run:

cat vmname.vmdk

You should get some errors referring to the lock, but, more importantly, some vmkernel logs, such as:

Apr 5 09:45:26 Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Lock [type 10c00001 offset 13058048 v 20, hb offset 3499520
Apr 5 09:45:26 Hostname vmkernel: gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 mtime 1174669462]
Apr 5 09:45:26 Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Addr <4, 136, 2>, gen 19, links 1, type reg, flags 0x0, uid 0, gid 0, mode 600
Apr 5 09:45:26 Hostname vmkernel: 17:00:38:46.977 cpu1:1033)len 297795584, nb 142 tbz 0, zla 1, bs 2097152
Apr 5 09:45:26 Hostname vmkernel: 17:00:38:46.977 cpu1:1033)FS3: 132:

Now, the owner field identifies the host locking the file. Its last segment (00137266e200 in this example) is nothing but the MAC address of the ESX host!
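To avoid retyping the address by hand, the last segment of the owner field can be turned into the usual colon-separated form. This is only a sketch; the function name owner_to_mac is made up for this example:

```shell
# Hypothetical helper: extract the MAC address from a VMFS lock owner ID.
owner_to_mac() {
    last="${1##*-}"                          # keep everything after the final dash
    echo "$last" | sed 's/../&:/g; s/:$//'   # add a colon every two characters, drop the trailing one
}

owner_to_mac 45feb537-9c52009b-e812-00137266e200
# prints 00:13:72:66:e2:00
```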
Now, to the boring part: you have to log in to every ESX host in the cluster and check whether any network card matches this MAC:

/sbin/ifconfig -a |grep -i 00:13:72:66:e2:00
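With many hosts, the same check can be looped over SSH from one machine. The sketch below assumes you can SSH to each host and run ifconfig there; the host names esx01, esx02 and esx03 are placeholders, and find_lock_owner is a name invented for this example:

```shell
# Hypothetical helper: print every host whose ifconfig output contains the MAC.
# Usage: find_lock_owner <mac> <host> [<host> ...]
# SSH can be overridden (e.g. SSH="ssh -l root"); it defaults to plain ssh.
find_lock_owner() {
    mac="$1"
    shift
    for host in "$@"; do
        if ${SSH:-ssh} "$host" "/sbin/ifconfig -a" 2>/dev/null | grep -qi "$mac"; then
            echo "$host"
        fi
    done
}

# Example call (placeholder host names):
#   find_lock_owner 00:13:72:66:e2:00 esx01 esx02 esx03
```

The function prints nothing if no host matches, in which case the lock owner is probably a host outside the list you passed in.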

Once identified, the host should be placed in maintenance mode from the Virtual Center (DRS should do all the work of migrating the virtual machines) and then rebooted. This will release any lock and allow the VM to be finally powered on.