When using Ubuntu 12.04 (Precise) as the base for a Xen host, you need to do a number of things to make it work. I am writing this first of all for myself, as a mental post-it, and second because I want others to know about a few problems I came across. In brief, here are the requirements for my installation and the issues I ran into. I recommend using this guide as a cross-reference while doing the actual install.

Required:

  • Running the VMs’ VHDs on LVM (this gives us more features, explained later).
  • Ability to snapshot VMs.
  • Use the stock xcp-xapi (though it needs some fixing to make it work).
  • Use simple bridging.
  • An Internet gateway on the host, so the VMs can download packages.

Issues:

  • Kernel 3.2.0-40-generic behaves oddly: it failed to boot completely on the machines I tried it on. 3.2.0-39-generic seemed stable. This matters because, with security updates enabled, the system would have upgraded to the -40 kernel.
  • Booting the Xen kernel on a machine with 16 cores fails; you need to limit dom0’s CPUs in the GRUB configuration.
  • Open vSwitch doesn’t work for me, but bridging does. A bridge you configure manually in advance is left untouched by Xen, but a bug in the install script does not honour your choice of bridging when it asks for it, so you’ll have to correct that manually.
  • blktap. Depending on when you run apt-get update / upgrade (right after the initial install, or after installing xcp-xapi), you can end up without the blktap kernel module. When you then run xe host-list, you’ll see that dom0 isn’t running. The error in the logs is about as cryptic as it gets, but starting xend manually will show you the real error. So whenever you install a new kernel, verify that the Xen blktap modules are being built for it:
lsmod | grep blk
blktap                 25553  0
xen_blkback            23363  0 [permanent]
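The lsmod check can be wrapped in a small function, for instance to run after every kernel upgrade. A minimal sketch, assuming bash and awk; blktap_loaded is a hypothetical helper name, and it reads lsmod-style text on stdin so it can also be fed saved output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if a module named exactly "blktap"
# appears in the first column of lsmod-style input read from stdin.
blktap_loaded() {
  awk '$1 == "blktap" { found = 1 } END { exit !found }'
}

# Usage:
# if lsmod | blktap_loaded; then
#   echo "blktap is loaded"
# else
#   echo "blktap missing: install headers for $(uname -r), rebuild blktap-dkms" >&2
# fi
```

Matching the first column exactly avoids a false positive from xen_blkback, which grep blk would also show.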

Install guide

The short guide, using the Precise netboot ISO:

  • Boot from the CD/DVD
  • Install with LVM (on the whole disk). When the installer asks how much of the volume group to use, don’t use it all: you need room for the guests’ logical volumes later on. I usually give dom0 about 80 to 90 GB, enough to hold a few VM snapshot dumps. Annoyingly, the swap space is about the same size as your physical memory, so with 24 GB of RAM that comes out of the 90 GB you selected; compensate for it. It’s possible to shrink the swap volume later and even grow the root filesystem online. I did it once without LVM; with LVM it’s even easier. That’s outside the scope here, though. You can reclaim the space later, but doing so remotely is not for the faint of heart.
  • Don’t install any of the server task selections (like ‘LAMP’ or ‘virtualisation’), except the SSH server if you need it. Keep the base system clean and finish it off by picking packages with apt-get.
  • Enter your network details. You’ll rework them later for the bridge, but for now they let you run apt-get update and upgrade.
  • Continue the installation; it will ask some more questions. Once you’ve booted from the disk, bring the system up to date.

Update to the latest version

  • sudo apt-get update
  • sudo apt-get upgrade
  • sudo apt-get dist-upgrade

Make sure you boot into the latest working kernel now, before installing xcp-xapi; later on we’ll modify GRUB to boot a Xen-aware kernel. If you are in my situation, you’ll need to remove the ‘bad’ kernel first: if it happens to be the newest, it is the one that gets booted. So I do:

apt-get remove linux-image-3.2.0-40-generic

If you happen to be missing the blktap drivers, chances are you don’t have the Linux headers for your kernel installed, so do something like this (substitute your version number!):

apt-get install linux-headers-3.2.0-39-generic

and then, once xcp-xapi is installed, rebuild the module:

dpkg-reconfigure blktap-dkms
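Since the headers have to match the kernel the module is built for, the version can also be taken from uname -r instead of typed by hand. A sketch under that assumption (headers_pkg and rebuild_blktap are hypothetical helper names); run it as root while booted into the kernel you want the module for, or pass the version explicitly:

```shell
#!/usr/bin/env bash
# Hypothetical helper: name of the headers package for a kernel version,
# defaulting to the running kernel when no (or an empty) argument is given.
headers_pkg() {
  printf 'linux-headers-%s\n' "${1:-$(uname -r)}"
}

# Hypothetical helper: install those headers, then let DKMS rebuild blktap.
rebuild_blktap() {
  apt-get install -y "$(headers_pkg "$1")" &&
    dpkg-reconfigure blktap-dkms
}

# Usage: rebuild_blktap 3.2.0-39-generic
```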

Selecting the right kernel

Merge/replace the content of /etc/default/grub with this:

GRUB_DEFAULT="Xen 4.1-amd64"
#GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="apparmor=0"
GRUB_CMDLINE_XEN="dom0_mem=1G,max:1G dom0_max_vcpus=2 dom0_vcpus_pin=1"

The last line is the most interesting: it tells Xen to give dom0 a fixed 1 GB of RAM and to assign and pin 2 VCPUs to it. From what I’ve read, this is enough to run quite a few VMs (some say up to 70); with just 5 guests I have no issues or resource shortages. If you have 8 or more cores, chances are that 12.04 will not boot properly, or will hang, so make sure you set this limit before trying to boot the Xen kernel.
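If you prefer not to edit /etc/default/grub by hand, a sed-based helper can set one variable at a time and is safe to re-run. A minimal sketch; set_conf_var is a hypothetical name, and it assumes the value contains no |, & or backslash characters, which sed would interpret:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY="VALUE" in a shell-style config file,
# replacing an existing KEY= line or appending one if there is none.
set_conf_var() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    # '|' is the sed delimiter, so VALUE must not contain '|', '&' or '\'
    sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
  else
    printf '%s="%s"\n' "$key" "$value" >> "$file"
  fi
}

# Usage:
# set_conf_var /etc/default/grub GRUB_CMDLINE_XEN "dom0_mem=1G,max:1G dom0_max_vcpus=2 dom0_vcpus_pin=1"
# set_conf_var /etc/default/grub GRUB_DEFAULT "Xen 4.1-amd64"
```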

After setting the GRUB options, apply them with:

update-grub

You will see some output as your GRUB configuration is rebuilt. Before rebooting into Xen, set up the bridged network; doing it now will save you some head-scratching later. Create a simple bridge setup. Assuming your primary interface is named eth0, here is an interfaces example that works for me. You might have to install the bridge utilities first if they are not already there, so optionally:

apt-get install openvswitch-brcompat

Modify /etc/network/interfaces into a bridged setup:

auto lo xenbr0
iface lo inet loopback

iface xenbr0 inet static
        address 192.168.128.99
        netmask 255.255.255.128
        network 192.168.128.0
        broadcast 192.168.128.127
        gateway 192.168.128.126
        dns-nameservers 8.8.8.8
        dns-search internal.com
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        #bridge_hello 2
        #bridge_maxage 12

iface eth0 inet manual
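The network and broadcast lines in that file follow from the address and netmask: network = address AND netmask, broadcast = network OR the inverted netmask. A small bash sketch (the helper names are hypothetical, IPv4 only) to double-check the values before rebooting:

```shell
#!/usr/bin/env bash
# Dotted quad -> 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1   # split "a.b.c.d" into four positional parameters
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# 32-bit integer -> dotted quad.
int2ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# network = address AND netmask
network_of() {
  int2ip $(( $(ip2int "$1") & $(ip2int "$2") ))
}

# broadcast = network OR (inverted netmask, kept to 32 bits)
broadcast_of() {
  local mask
  mask=$(ip2int "$2")
  int2ip $(( ($(ip2int "$1") & mask) | (~mask & 0xFFFFFFFF) ))
}

# Usage:
# network_of 192.168.128.99 255.255.255.128    # 192.168.128.0
# broadcast_of 192.168.128.99 255.255.255.128  # 192.168.128.127
```

With the example values, usable host addresses run from .1 to .126, so the gateway 192.168.128.126 is the last of them.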

Also, add these settings to your /etc/sysctl.conf file

net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.eth0.proxy_arp = 1
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Apply them with sysctl -p. Mainly, this makes sure iptables doesn’t pick up and filter bridged traffic: the bridge-nf-call settings exclude that traffic from iptables, ip6tables and arptables. I also enable IP forwarding so that each Xen host can act as the default gateway for its VMs, mostly so they can run apt-get update and the like. You may not need proxy ARP; I found I needed it when building load-balancer nodes on VMs (haproxy / keepalived). If you don’t, just leave out the proxy_arp line. You can now set up forwarding on the Xen host and use its IP as the VMs’ gateway, which is very handy during VM installations:

iptables -P FORWARD ACCEPT
iptables --table nat -A POSTROUTING -o xenbr0 -j MASQUERADE
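To confirm that the sysctl values took effect, you can compare them with what the kernel reports under /proc/sys, where each dot in a key becomes a directory separator. A sketch, assuming bash; check_sysctl is a hypothetical helper, and SYSCTL_ROOT exists only so it can be tested against a scratch directory. The net.bridge.* keys only appear once the bridge module is loaded, which is a common reason for sysctl -p to complain about them:

```shell
#!/usr/bin/env bash
# Hypothetical override; defaults to the real sysctl tree.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

# Compare the live value of a sysctl key with the expected one.
# Note: keys with a dot inside an interface name (eth0.100) are not handled.
check_sysctl() {
  local key=$1 want=$2 path have
  path="$SYSCTL_ROOT/$(printf '%s' "$key" | tr . /)"
  if [ ! -r "$path" ]; then
    echo "missing: $key (net.bridge.* needs the bridge module loaded)" >&2
    return 1
  fi
  have=$(cat "$path")
  if [ "$have" != "$want" ]; then
    echo "mismatch: $key is $have, expected $want" >&2
    return 1
  fi
}

# Usage:
# check_sysctl net.ipv4.ip_forward 1
# check_sysctl net.bridge.bridge-nf-call-iptables 0
```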

You’re now about ready to install xcp-xapi on the machine.

apt-get install xcp-xapi

Once xcp-xapi has been installed, verify your package list. These packages seem to be about as minimal as you can get:

# dpkg --list | egrep 'xen|xcp|vswi'
ii  libxen-4.1                       4.1.2-2ubuntu2.6             Public libs for Xen
ii  libxenstore3.0                   4.1.2-2ubuntu2.6             Xenstore communications library for Xen
ii  openvswitch-brcompat             1.4.0-1ubuntu1.5             Open vSwitch bridge compatibility support
ii  openvswitch-common               1.4.0-1ubuntu1.5             Open vSwitch common components
ii  openvswitch-datapath-dkms        1.4.0-1ubuntu1.5             Open vSwitch datapath module source - DKMS version
ii  openvswitch-switch               1.4.0-1ubuntu1.5             Open vSwitch switch implementations
ii  python-xenapi                    1.3.2-5ubuntu0.1             Xen Cloud Platform - XenAPI Python libraries
ii  xcp-eliloader                    0.1-4                        XenAPI bootloader for EL-based guests
ii  xcp-fe                           0.5.2-3                      Fork-and-exec daemon for xapi
ii  xcp-guest-templates              0.1-3                        Guest template generator for XCP
ii  xcp-networkd                     1.3.2-5ubuntu0.1             Xen Cloud Platform - network configuration daemon
ii  xcp-squeezed                     1.3.2-5ubuntu0.1             Xen Cloud Platform - memory ballooning daemon
ii  xcp-storage-managers             0.1.1-2ubuntu1               storage backends for XCP
ii  xcp-v6d                          1.3.2-5ubuntu0.1             Xen Cloud Platform - feature daemon
ii  xcp-vncterm                      0.1-2                        Provides VNC service for XCP guest VMs
ii  xcp-xapi                         1.3.2-5ubuntu0.1             Xen Cloud Platform - XenAPI server
ii  xcp-xe                           1.3.2-5ubuntu0.1             Xen Cloud Platform - command-line utilities
ii  xen-hypervisor-4.1-amd64         4.1.2-2ubuntu2.6             Xen Hypervisor on AMD64
ii  xen-utils-4.1                    4.1.2-2ubuntu2.6             XEN administrative tools
ii  xen-utils-common                 4.1.2-1ubuntu1               XEN administrative tools - common files
ii  xenstore-utils                   4.1.2-2ubuntu2.6             Xenstore utilities for Xen
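Rather than eyeballing the listing, you can let awk report which of the packages you expect are not fully installed (state ii). A sketch; missing_pkgs is a hypothetical helper, and the package names in the usage line are taken from the listing above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dpkg --list` output on stdin and print every
# package named in the arguments that is not in state "ii".
missing_pkgs() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " ") }
    $1 == "ii" { name = $2; sub(/:.*/, "", name); ok[name] = 1 }  # drop any :arch suffix
    END { for (i = 1; i <= n; i++) if (!(w[i] in ok)) print w[i] }
  '
}

# Usage:
# dpkg --list | missing_pkgs xcp-xapi xcp-xe xcp-networkd xen-hypervisor-4.1-amd64 xen-utils-4.1
```

No output means everything on the list is installed.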

Now it’s time to fix a few things. During the install you were probably asked what to use for networking, openvswitch or bridge. Even if you selected bridge, that choice does not end up in the config file, so you’ll need to fix it manually. Before rebooting, modify the following:

Adjust “/etc/default/xen”

TOOLSTACK=xapi

Verify that “bridge” is set in “/etc/xcp/network.conf”; chances are it’s wrong. Then add a workaround for the XAPI/xend conflict:

cd /etc
sed -i -e 's/xend_start$/#xend_start/' -e 's/xend_stop$/#xend_stop/' /etc/init.d/xend
update-rc.d xendomains disable
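Both config fixes can be scripted so they are safe to re-run after package upgrades. A minimal sketch, assuming /etc/xcp/network.conf holds nothing but the backend name (bridge or openvswitch); fix_xcp_config is a hypothetical helper whose optional argument is a root directory, defaulting to /:

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply TOOLSTACK=xapi and the bridge backend under ROOT.
fix_xcp_config() {
  local root=${1:-}
  local xen_default="$root/etc/default/xen" netconf="$root/etc/xcp/network.conf"
  # Replace an existing TOOLSTACK line, or append one if there is none
  if grep -q '^TOOLSTACK=' "$xen_default" 2>/dev/null; then
    sed -i 's/^TOOLSTACK=.*/TOOLSTACK=xapi/' "$xen_default"
  else
    echo 'TOOLSTACK=xapi' >> "$xen_default"
  fi
  # Assumption: the file contains only the backend name
  echo bridge > "$netconf"
}

# Usage (as root): fix_xcp_config
```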

Now reboot into the Xen kernel. To see how far you’ve got, check whether xcp-xapi works by looking for dom0:

xe host-list

uuid ( RO)                : 301c187b-28b6-16bf-a2a3-2b23573663a9
          name-label ( RW): serverx
    name-description ( RW): Default install of XenServer

If you get this, your kernel booted fine and xcp-xapi has started correctly and is responding to this simple request. xcp-xapi may also fail to start with some cryptic error; after a fresh install this usually means the blktap module isn’t being loaded, so install the missing headers, rebuild the module and run update-grub.

At this point we could start deploying virtual machines, but if you need to deploy using the VHD format there is still a problem with the xcp-xapi storage drivers versus the LVM version and capabilities in 12.04. The Ubuntu package maintainer for xcp-xapi decided to leave those drivers out because they weren’t fully tested. Therefore you’ll need to download and modify the XCP drivers yourself: as shipped, they use LVM options that the LVM in 12.04 LTS doesn’t support, so they fail miserably. As long as you don’t deploy on shared storage (SAN/NAS) and don’t cluster Xen hosts, you’re OK doing this.

Let’s demonstrate what this achieves. Suppose you start creating virtual machines at this point. That works, but at some point you’ll want to copy or clone a running VM from a snapshot. Dang! This isn’t possible, because the functionality has been (kind of) stripped from the official xcp-xapi packages. Why?

Because an SR backed by the stock LVM driver is limited. You should use a modified driver instead; we’ll explain how later. For comparison, here are the capabilities of both SRs, starting with one on the stock driver:

root@X:~# xe sr-param-list uuid=f6e2cee8-d678-c424-134c-90a3f5ecba98
uuid ( RO)                    : f6e2cee8-d678-c424-134c-90a3f5ecba98
              name-label ( RW): sr_demo
                    host ( RO): X
      allowed-operations (SRO): forget; VDI.create; plug; destroy; VDI.destroy; scan; VDI.clone;
                                               VDI.resize; unplug
      current-operations (SRO):
                    VDIs (SRO): a8f71c34-1605-40fa-a7d6-cb6846c29fd7
                    PBDs (SRO): 7a5df802-b2db-d43e-1c9f-8eff52dbef66
      virtual-allocation ( RO): 21474836480
    physical-utilisation ( RO): 21474836480
           physical-size ( RO): 107361599488
                    type ( RO): lvm
                  shared ( RW): false

An SR with more capabilities, using LVM + VHDs:

root@Y:/etc# xe sr-param-list uuid=0e41aa8d-0e89-19f6-ec35-d1792e4b4454
uuid ( RO)                    : 0e41aa8d-0e89-19f6-ec35-d1792e4b4454
              name-label ( RW): sr_demo
                    host ( RO): Y
      allowed-operations (SRO): forget; VDI.create; VDI.snapshot; plug; update; destroy;
                                              VDI.destroy; scan; VDI.clone; VDI.resize; unplug
      current-operations (SRO):
                    VDIs (SRO): 9e67ecb3-ac1a-43f7-9108-21dca8a2bb0f
                    PBDs (SRO): e05f732f-51bc-b0cd-958d-83ce6c9fc662
      virtual-allocation ( RO): 63350767616
    physical-utilisation ( RO): 63484985344
           physical-size ( RO): 64399343616
                    type ( RO): lvm
                  shared ( RW): false
               sm-config (MRO): allocation: thick; use_vhd: true; devserial:

Notice that machine X lacks VDI.snapshot, which is pretty much essential if you have backups and/or fast deployment in mind. To get it, you’ll need to dive into the xcp-xapi driver code; luckily this is Python, which is quite readable most of the time.

Navigate to /usr/lib/xcp/sm. The driver we want to make available is LVHDSR.
You can get it from the source package: download xcp-storage-managers-0.1.1 and extract LVHDSR.py from its drivers directory. You also need the lvutil.py file from the same directory, since we’ll be modifying it. Both files go into the current directory, and a symbolic link goes in place too. The commands below assume you unpacked the source in /usr/local/src. You might have to remove LVMSR first if that file already exists.

cd /usr/lib/xcp/sm
cp /usr/local/src/xcp-storage-managers-0.1.1/drivers/LVHDSR.py /usr/lib/xcp/sm/
cp /usr/local/src/xcp-storage-managers-0.1.1/drivers/lvutil.py /usr/lib/xcp/sm/
ln -s LVHDSR.py LVMSR

Let’s adapt the lvutil.py file. Essentially, the LVM version that ships with Precise doesn’t accept the --master option used in it, so we need to remove it. We also need to adapt the LVM binary path in the file copied from the source package; in other words, ‘debianify’ it.

Change these lines:

line 34:  LVM_BIN = "/sbin"
line 302: cmd = [CMD_VGCHANGE, "-an", "--master", vgname] -> cmd = [CMD_VGCHANGE, "-an", vgname]
line 344: cmd = [CMD_VGCHANGE, "-a" + val, "--master", path] -> cmd = [CMD_VGCHANGE, "-a" + val, path]
line 354: cmd.extend(["--inactive", "--zero=n"]) -> cmd.extend(["--zero=n"])
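Line numbers can shift between versions, so matching on the statements themselves is more robust than editing by line number. A sketch using GNU sed; patch_lvutil is a hypothetical helper, it assumes the statements are written as listed above, and it keeps the untouched file as lvutil.py.orig:

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the three lvutil.py edits by pattern (GNU sed -i).
patch_lvutil() {
  local f=${1:-/usr/lib/xcp/sm/lvutil.py}
  sed -i.orig \
    -e 's|^LVM_BIN[[:space:]]*=.*|LVM_BIN = "/sbin"|' \
    -e 's/"--master", //g' \
    -e 's/"--inactive", "--zero=n"/"--zero=n"/' \
    "$f"
}

# Usage (as root): patch_lvutil /usr/lib/xcp/sm/lvutil.py
```

Running it a second time overwrites the .orig backup with the already patched file, so copy the original elsewhere if you want to keep it.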

This fixes the errors when creating SRs on the LVM backend and gives us the ability to snapshot a running VM and take a file backup (.xva) from that snapshot, all while keeping the original machine running.

At this point you should have a running dom0 with the appropriate kernel. If you still have trouble starting xend or xcp-xapi, check the log files SMlog and xcp-xapi.log in /var/log; they will give you clues as to what is holding you back. You have also modified and introduced the LVHD driver. xe supports tab completion on options, which will confirm whether your modifications were done right. Let’s try to create an SR; to do this, we first have to create a logical volume.

Go ahead and create a logical volume:

lvcreate --size 50G --name sr_guest1 serverx

Then type this command, but instead of pressing Enter at the end, try tab completion:

xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=vm1 type=<TAB>

You should see a choice of types there:

dummy   ext     file    iso     nfs

If lvm is not among them, you probably haven’t restarted xcp-xapi yet. Restart it old-school style and try again:

/etc/init.d/xcp-xapi restart
xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=vm1 type=<TAB>

Now it should show the lvm option; select lvm and hit Enter:

dummy   ext     file    iso     lvm     nfs
xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=sr_1 type=lvm
cc7537c0-fba9-374a-7997-b72b65d1f7c1

It responds with the new SR’s UUID. Check the capabilities of that SR like this (only the most interesting fields are shown):

xe sr-param-list uuid=cc7537c0-fba9-374a-7997-b72b65d1f7c1
...
    allowed-operations (SRO): forget; VDI.create; VDI.snapshot; plug; update; destroy; VDI.destroy; scan; VDI.clone; VDI.resize; unplug
               sm-config (MRO): allocation: thick; use_vhd: true; devserial:
...
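The logical volume and SR steps can be combined into one helper that hands back the new SR’s UUID, so you don’t have to copy it from the terminal. A sketch, assuming the modified LVHD driver is in place; make_lvm_sr is a hypothetical name, and lvcreate’s own messages are sent to stderr so only the UUID is captured:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an LV in VG and an lvm-type SR on top of it.
# Prints whatever xe sr-create prints (the new SR uuid).
make_lvm_sr() {
  local vg=$1 lv=$2 size=$3 label=$4
  lvcreate --size "$size" --name "$lv" "$vg" >&2 || return 1
  xe sr-create device-config:device="/dev/$vg/$lv" name-label="$label" type=lvm
}

# Usage:
# SR=$(make_lvm_sr serverx sr_guest1 50G sr_1)
# xe sr-param-list uuid="$SR" | grep -E 'allowed-operations|sm-config'
```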

To double-check, see what XCP did to the LVM layout by running lvs. You should see a new volume group containing a logical volume named MGT. The relevant entries are:

  MGT       VG_XenStorage-cc7537c0-fba9-374a-7997-b72b65d1f7c1 -wi-a-  4.00m
  sr_guest1 serverx                                            -wi-ao 50.00g

I guess MGT stands for management. This is a good sign that it’s working as expected; you can dive deeper and find out how it trickles down the chain:

root@x:/dev/VG_XenStorage-cc7537c0-fba9-374a-7997-b72b65d1f7c1# ls -altr
lrwxrwxrwx  1 root root    7 Apr 29 15:50 MGT -> ../dm-3
root@x: dmsetup ls
root@x: dmsetup info VG_XenStorage--cc7537c0--fba9--374a--7997--b72b65d1f7c1-MGT
Name:              VG_XenStorage--cc7537c0--fba9--374a--7997--b72b65d1f7c1-MGT
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      252, 3
Number of targets: 1
UUID: LVM-Bnj36apP5qtLHdOBd7cGWCXhwGEgkBlJfJoJpcMvXccR5kzIEft18onS3KA1xyHm
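The doubled hyphens in that name come from device-mapper’s naming rule: the volume group and logical volume names are joined with a single hyphen, and every hyphen inside either name is doubled. A bash sketch (dm_name is a hypothetical helper) that builds the name dmsetup expects:

```shell
#!/usr/bin/env bash
# Hypothetical helper: device-mapper name for VG/LV, with every '-' inside
# each name doubled and a single '-' joining the two.
dm_name() {
  local vg=$1 lv=$2
  printf '%s-%s\n' "${vg//-/--}" "${lv//-/--}"
}

# Usage:
# dmsetup info "$(dm_name VG_XenStorage-cc7537c0-fba9-374a-7997-b72b65d1f7c1 MGT)"
```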

Nothing else is on there yet, since we haven’t created any VMs on this SR. Let’s create a custom template for installing 12.04 LTS as a guest.

Creating a 12.04 install template

TEMPLATE_UUID=`xe template-list name-label="Ubuntu Lucid Lynx 10.04 (64-bit)" params=uuid --minimal`
NEW_TEMPLATE_UUID=`xe vm-clone uuid=$TEMPLATE_UUID new-name-label="Ubuntu Precise Pangolin 12.04 (64-bit)"`
xe template-param-set other-config:default_template=true other-config:debian-release=precise uuid=$NEW_TEMPLATE_UUID

Preparing the VM with a network interface (VIF)

VM=$(xe vm-install new-name-label=vm01 template="Ubuntu Precise Pangolin 12.04 (64-bit)" sr-name-label=sr_1)
xe vm-param-set uuid=$VM other-config:install-repository="http://archive.ubuntu.com/ubuntu/"
NETWORK=$(xe network-list bridge=xenbr0 --minimal)
xe vif-create vm-uuid=$VM network-uuid=$NETWORK mac=random device=0

Now is the time to set some limits to your liking. The standard disk size is 8 GB; if you want a bigger machine, you’ll have to resize the VDI before starting the VM.

Resizing the standard-sized VDI in halted state

You had better do this before starting the newly installed VM; otherwise you end up having to resize filesystems, volumes and so on. Figuring out which VDI belongs to which VM and SR gets complicated when you have lots of machines, but these commands should help:

xe sr-param-list uuid=cc7537c0-fba9-374a-7997-b72b65d1f7c1   # pick the VDI uuid from this list and use it below
xe vdi-param-list uuid=c95ee4bf-dbf2-4cfd-93c5-ef1b7070d8d5
xe vdi-resize uuid=c95ee4bf-dbf2-4cfd-93c5-ef1b7070d8d5 disk-size=49GiB

Substitute your own UUIDs, of course. Pay attention to the way the units are written; use the built-in help (xe help vdi-resize) for more info.
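Finding the VDI through the SR gets tedious with many machines; going through the VM’s block devices (VBDs) is more direct. A sketch, assuming the xe vbd-list filter fields vm-uuid and type and the vdi-uuid parameter; vm_disk_vdis is a hypothetical helper, and type=Disk leaves out CD drives:

```shell
#!/usr/bin/env bash
# Hypothetical helper: uuids of the VDIs behind a VM's disk VBDs.
# With --minimal, xe prints multiple uuids as one comma-separated line.
vm_disk_vdis() {
  xe vbd-list vm-uuid="$1" type=Disk params=vdi-uuid --minimal
}

# Usage, while the VM is halted:
# for vdi in $(vm_disk_vdis "$VM" | tr , ' '); do
#   xe vdi-param-list uuid="$vdi" | grep -E 'name-label|virtual-size'
# done
# xe vdi-resize uuid="$vdi" disk-size=49GiB
```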

Assigning system resources to a VM

xe template-param-set uuid=$NEW_TEMPLATE_UUID VCPUs-max=4
xe template-param-set uuid=$NEW_TEMPLATE_UUID VCPUs-at-startup=4
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-min=805306368
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-max=1610612736
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-dynamic-max=1610612736
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-dynamic-min=805306368

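And so forth; there are plenty of options. Note that these commands change the template, so they apply to VMs you install from it afterwards; for a VM that already exists, use xe vm-param-set uuid=$VM with the same parameters. If you make an error, you can clear a parameter like this: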

xe template-param-remove uuid=$NEW_TEMPLATE_UUID param-name=other-config param-key=disks:provision:size
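The memory parameters are plain byte counts: 805306368 is 768 MiB (768 * 1024 * 1024) and 1610612736 is 1536 MiB, i.e. 1.5 GiB. A one-line helper (mib is a hypothetical name) saves doing this by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: mebibytes -> bytes.
mib() { echo $(( $1 * 1024 * 1024 )); }

# Usage (same values as above):
# xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-min=$(mib 768)
# xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-max=$(mib 1536)
```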

Performing an initial VM start and installation

Before we start the VM, we want console access to it. We can use the old-school xl utility for that; set it up as a small shell function (an alias can’t take arguments, so a $1 inside one does not refer to the VM name):

console() { /usr/lib/xen-4.1/bin/xl console "$1"; }

This way you can attach to a VM’s console by its name. To bail out of the console, use the old-school terminal shortcut (Ctrl + ]). VNC might also get started; you may have to sort that out, because taking over the console of a running VM behaves awkwardly in ways you don’t immediately notice.
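To get to the installer, the VM has to be started first, and starting and attaching can be done in one step. A sketch, assuming the VM’s UUID is still in $VM from the vm-install step; start_and_attach is a hypothetical helper, and XL is a variable only so the xl path can be swapped out:

```shell
#!/usr/bin/env bash
# Path of xl as used in this setup; overridable.
XL=${XL:-/usr/lib/xen-4.1/bin/xl}

# Hypothetical helper: start a VM by uuid, then attach to its console by name.
start_and_attach() {
  local uuid=$1 name=$2
  xe vm-start uuid="$uuid" && "$XL" console "$name"
}

# Usage: start_and_attach "$VM" vm01
```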

If all went well, you should see an Ubuntu 12.04 install dialog asking for the language. From there on it’s like any other Ubuntu installation.

Some hints when installing VM’s

  • Write down the disk UUIDs inside your VM and what each one is used for. These will come in handy later to solve boot issues after redeploying VMs from backups. This is very important!
  • When updating VM kernel

Restoring an exported VM using our additional features

This system is now capable of some very interesting things. For instance, you can snapshot a running VM (provided it’s also Ubuntu 12.04), dump the VM from that snapshot to a file, and import it again. Let’s start with importing a previously exported machine.

Let’s import one into the SR we created earlier:

root@x:/home/dumps# xe vm-import filename=solr_base.xva preserve=false sr-uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
Operation failed. Error: Connection reset by peer

A nasty error by the looks of it, but as always, check the xcp-xapi log; it usually gives a more detailed message:

[20130430T19:47:39.808Z|debug|x|816 UNIX /var/lib/xcp/xapi||cli] Xapi_cli.exception_handler: Got exception SR_BACKEND_FAILURE_44: [ ; There is insufficient space;  ]

So apparently our SR is too small to hold the VM we are importing. We could resize the SR, but let’s just remove it, remove the logical volume and create everything again:

xe sr-destroy uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
The SR is still connected to a host via a PBD. It cannot be destroyed.
sr: 342d0647-2ae4-11b1-290f-4003b4d32d96 (sr_1)

So we need to identify the right PBD and get rid of it:

xe pbd-list sr-uuid=342d0647-2ae4-11b1-290f-4003b4d32d96   # look for the PBD uuid
xe pbd-unplug uuid=34047f96-f0f9-d319-85ec-11eaf80aa94d

If the unplug fails, complaining that it can’t detach the SR, run it a second time; that has always worked for me. Now destroy it. There are probably other ways to do this, but this one works for me:

xe pbd-destroy uuid=34047f96-f0f9-d319-85ec-11eaf80aa94d
xe sr-destroy uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
xe sr-forget uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
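The PBD lookup can also be done by filtering pbd-list on the SR’s UUID, which avoids picking the wrong one on a host with several SRs. A sketch (sr_unplug_pbds is a hypothetical helper) that unplugs every PBD of an SR and retries once, mirroring the “run it a second time” experience above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: unplug all PBDs that connect the given SR.
sr_unplug_pbds() {
  local sr=$1 pbds pbd
  pbds=$(xe pbd-list sr-uuid="$sr" params=uuid --minimal) || return 1
  # --minimal prints the uuids as one comma-separated line
  for pbd in ${pbds//,/ }; do
    # second attempt, since the first unplug sometimes fails to detach the SR
    xe pbd-unplug uuid="$pbd" || xe pbd-unplug uuid="$pbd" || return 1
  done
}

# Usage, before the pbd-destroy / sr-destroy / sr-forget steps:
# sr_unplug_pbds 342d0647-2ae4-11b1-290f-4003b4d32d96
```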

Everything is now gone; let’s remove the logical volume:

lvremove serverx/sr_guest1

And create a larger one

lvcreate --size 90G --name sr_guest1 serverx

Then repeat the earlier steps up to the point where we can try the import again, which isn’t much:

xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=sr_1 type=lvm
xe vm-import filename=solr_base.xva preserve=false sr-uuid=3fefe239-eea2-de85-ebe2-d0d9fad09867

The last command uses the UUID of the new SR, which of course differs from before. Press Enter and go make some coffee, or go to the bathroom, or both, depending on how fast your disks are. It will take some time.
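To avoid copying the new SR’s UUID by hand, you can capture the output of sr-create and pass it straight to the import. A sketch; reimport is a hypothetical helper, the arguments in the usage line are the names used in this walkthrough, and the assumption is that vm-import prints the UUID of the imported VM:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an lvm SR on DEVICE and import FILE into it.
reimport() {
  local file=$1 device=$2 label=$3 sr
  sr=$(xe sr-create device-config:device="$device" name-label="$label" type=lvm) || return 1
  xe vm-import filename="$file" preserve=false sr-uuid="$sr"
}

# Usage:
# VM=$(reimport solr_base.xva /dev/serverx/sr_guest1 sr_1)
```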

When this is done, you probably want to rename the VM, as it still carries the old name. It’s always wise to disable networking inside the VM before exporting it (comment out the network config), because once the copy boots, it will try to use the IP of the original, which is probably still in use. We’ll cover snapshotting later.

xe vm-param-set uuid=a89d8255-3c55-9c5f-6847-c8abf0659b34 name-label='demo'
xe vm-start uuid=a89d8255-3c55-9c5f-6847-c8abf0659b34
console demo

If you rename a running virtual machine, the console command from earlier won’t accept the new name, but the old name still works. If you rename it while it is halted, as we did here, the new name works right away. The machine should be booting now. Note that you didn’t have to create a VIF manually for this VM; that is all taken care of. Importing only requires an SR (if you don’t have a default SR set up).

Now that we have this machine imported, let’s go through the steps needed to create such an XVA export, starting from the currently running VM.

Taking a snapshot of a VM

xe vm-list is-control-domain=false
xe vm-snapshot uuid=a89d8255-3c55-9c5f-6847-c8abf0659b34 new-name-label=snapshot_demo

This will NOT be a consistent snapshot. It is more as if you froze the machine in time and copied all its files, open and closed, at once; the snapshot is created in that state. Starting from it is like recovering from a power failure: your VM will have to check its filesystems, but will probably start up again. A smart person would therefore shut down just about every service on the machine first. A consistent snapshot requires the machine to be halted.

Also note that the VM’s operating system matters when doing this on a running machine: some kernel support is needed for this to work as expected.
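The snapshot itself is not yet a file. One common way to turn it into an .xva, not necessarily the only one, is to clear the snapshot’s template flag (snapshots are stored as templates), export it with vm-export and then remove it. A sketch under those assumptions; snapshot_export is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot a VM, export the snapshot to FILE, clean up.
snapshot_export() {
  local vm=$1 file=$2 snap rc
  # vm-snapshot prints the uuid of the new snapshot
  snap=$(xe vm-snapshot uuid="$vm" new-name-label="export_$(date +%Y%m%d)") || return 1
  # Clear the template flag so the snapshot can be exported like a VM
  xe template-param-set is-a-template=false uuid="$snap" &&
    xe vm-export uuid="$snap" filename="$file"
  rc=$?
  # Remove the snapshot and its disks whether or not the export succeeded
  xe vm-uninstall uuid="$snap" force=true
  return "$rc"
}

# Usage: snapshot_export a89d8255-3c55-9c5f-6847-c8abf0659b34 /home/dumps/demo.xva
```

The resulting file is what xe vm-import reads back in, as in the restore steps above.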