RHEV guest migration: how to troubleshoot generic errors

As you know, Red Hat Enterprise Virtualization (RHEV) supports the so-called “Live Migration” between two hypervisors belonging to the same cluster. This feature allows to migrate a virtual machine from an hypervisor to another without the need to power down all the VMs executed by that host.

After the first installation could happen that the RHEV Manager interface shows a generic alert “Migration failed due to Error: Fatal error during migration“.

The first thing to do is to login to your management server and analyze your /etc/rhevm/rhevm.log in order to achieve a little bit more information:

2012-11-13 14:46:27,030 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-11-thread-50) Error code migrateErr and error message VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration mMessage Fatal error during migration

Damn, this isn’t useful. It’s better to go deep analyzing more specific logs on the source hypervisor.

I suggest to tail the /var/log/libvirt.log and launch the migration. You will see a log like this:

2012-11-13 14:42:17.914+0000: 24249: debug : virDomainMigrateToURI:5594 : Using peer2peer migration
2012-11-13 14:42:17.914+0000: 24249: debug : virDomainMigratePeer2Peer:4969 : dom=0x7f97fc0e34b0, (VM: name=vm_name, uuid=2548e682-7a9a-4520-bbb4-3e79923d03a9), xmlin=(null), flags=3, dname=(null), dconnuri=qemu+tls://destination_hypervisor/system, uri=(null), bandwidth=0
2012-11-13 14:42:17.915+0000: 24249: debug : virDomainMigratePeer2Peer:4997 : Using migration protocol 3
2012-11-13 14:42:17.964+0000: 24249: error : virNetClientProgramDispatchError:174 : internal error Attempt to migrate guest to the same host 00020003-0004-0005-0006-000700080009

The same error appears on vdsm.log:

Thread-358::ERROR::2012-11-13 14:45:09,869::vm::177::vm.Vm::(_recover) vmId=`2548e682-7a9a-4520-bbb4-3e79923d03a9`::internal error Attempt to migrate guest to the same host 00020003-0004-0005-0006-000700080009
Thread-358::ERROR::2012-11-13 14:45:10,833::vm::232::vm.Vm::(run) vmId=`2548e682-7a9a-4520-bbb4-3e79923d03a9`::Traceback (most recent call last):
File “/usr/share/vdsm/vm.py”, line 224, in run
File “/usr/share/vdsm/libvirtvm.py”, line 423, in _startUnderlyingMigration
File “/usr/share/vdsm/libvirtvm.py”, line 445, in f
File “/usr/share/vdsm/libvirtconnection.py”, line 63, in wrapper
File “/usr/lib64/python2.6/site-packages/libvirt.py”, line 1039, in migrateToURI
libvirtError: internal error Attempt to migrate guest to the same host 00020003-0004-0005-0006-000700080009

What does it means? I’ve found that libvirt uses the UUID (universally unique identifier) instead of IP address or hostname but… what is an UUID? Take a look at the Wikipedia page.

For some strange reason my two hypervisors had the same UUID. How to solve this situation? You can simply generate a new UUID by executing this command:

# uuidgen

Then you have to edit the libvirtd configuration file…

# vi /etc/libvirt/libvirtd.conf

…edit the commented option host_uuid (uncomment and substitute the example uuid)…

#host_uuid = “00000000-0000-0000-0000-000000000000”

…and lastly restart the vdsmd service

# service vdsmd restart

If you’re using oVirt instead of RHEV you have to restart the libvirtd service.

If the service restart doesn’t solve the issue try to reboot the hypervisor.

Don’t forget that hypervisors need to reach each other on different ways. Check your iptables configuration for these rules:

# vdsm
-A INPUT -p tcp –dport 54321 -j ACCEPT
# libvirt tls
-A INPUT -p tcp –dport 16514 -j ACCEPT
-A INPUT -p tcp –dport 22 -j ACCEPT
# guest consoles
-A INPUT -p tcp -m multiport –dports 5634:6166 -j ACCEPT
# migration
-A INPUT -p tcp -m multiport –dports 49152:49216 -j ACCEPT

If you forget them you will see this error on libvirt.log:

Thread-3798::ERROR::2012-09-20 09:42:56,977::vm::240::vm.Vm::(run) vmId=`2bf3e6eb-49e4-42c7-8188-fc2aeeae2e86`::Failed to migrate
Traceback (most recent call last):
File “/usr/share/vdsm/vm.py”, line 223, in run
File “/usr/share/vdsm/libvirtvm.py”, line 451, in _startUnderlyingMigration
File “/usr/share/vdsm/libvirtvm.py”, line 491, in f
File “/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py”, line 82, in wrapper
File “/usr/lib64/python2.7/site-packages/libvirt.py”, line 1034, in migrateToURI2
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://ipaddress/system

Now enjoy your RHEV Live Migration! 😉

2 thoughts on “RHEV guest migration: how to troubleshoot generic errors

  1. Pingback: How to manage duplicated UUID on RHEV 3.1 « Va una spada!

Lascia un commento