When using Ubuntu 12.04 (Precise) as the base for a XEN host, you need to do a number of things to make it work. I write this in the first place for myself, as a mental post-it. In the second place, I want others to know about a few problems I came across. In brief, here are the requirements for my installation and the issues that I had. I recommend using this guide as a cross reference while doing the actual install.
Required:
- Running the VMs’ VHDs on LVM (this gives us more features, explained later).
- Ability to snapshot VMs
- Use stock xcp-xapi (but it needs some fixing to make it work).
- Use simple bridging.
- Internet gateway for package download
Issues:
- Kernel ‘3.2.0-40-generic’ acts weird, failing to fully boot on the machines this was tried on. 3.2.0-39-generic seemed to be stable. This is important because with security updates enabled, the machine would have updated to the failing kernel.
- Booting the Xen kernel with 16 cores will fail, you need to limit dom0 in the grub configuration.
- openvswitch doesn’t work for me, bridging does. A bridge you configured manually in advance will not be touched by xen, but a bug in the openvswitch install means your choice for bridging, when asked by the install script, is not honoured. You’ll have to correct this manually.
- blktap. You can end up without this kernel module after installing, depending on when you do apt-get update / upgrade (right after the initial install, or after you install xcp-xapi). When doing xe host-list you’ll then see that the dom0 isn’t running. The error in the logs is about as cryptic as it gets, but manually starting xend will show you the error. So when installing a new kernel, you have to verify that the xen blktap modules are being generated. When they are loaded, the module list shows:
blktap 25553 0
xen_blkback 23363 0 [permanent]
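A minimal sketch for checking this after every kernel change, assuming the two lines above come from lsmod. The function name has_xen_block_modules is made up for this example; it only looks at the first column of whatever is piped into it.

```shell
# Succeeds only if both module names from the listing above appear
# in lsmod-style output read from stdin (first column = module name).
has_xen_block_modules() {
    modules=" $(awk '{print $1}' | tr '\n' ' ') "
    case $modules in
        *" blktap "*) ;;
        *) return 1 ;;
    esac
    case $modules in
        *" xen_blkback "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the XEN host:
#   lsmod | has_xen_block_modules || echo "blktap or xen_blkback is missing"
```

If the check fails after a kernel update, the headers / rebuild steps further down are the first thing to try.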
Install guide
The short guide using the precise netboot iso:
- Boot from the CD/DVD
- Install LVM (on the whole disk). When it asks how much to use, don’t use it all: you need room to create logical volumes for the clients later on. I usually use about 80 to 90 GB for dom0, enough to contain a few VM snapshot dumps. Annoyingly, the swap space is made about the same size as your physical memory. With 24 GB of RAM, that is taken out of the 90 GB you selected earlier, so you’ll want to compensate. It’s possible to shrink the swap partition later and even grow the root partition online. I did it once without LVM, and with LVM it’s even easier, but that’s outside the scope here. Reclaiming the space remotely is not for the faint of heart, but it can be done.
- Don’t install any of the server type selections (like ‘lamp’ or ‘virtualisation’ and so on) except SSH if you need to. Just keep it base, clean and finish it off with apt-get package picking.
- Enter your network details, you’ll rework them later for bridge use. For now they let you run apt-get update and upgrade.
- Continue the installation, it will ask some more questions. Once you’ve booted from the disk, you’ll need to get updated.
Update to latest version
- sudo apt-get update
- sudo apt-get upgrade
- sudo apt-get dist-upgrade
Make sure you boot into the latest kernel version now, before installing xcp-xapi. After this point we’ll be modding grub to boot a XEN aware kernel. If you are in my situation, you’ll need to remove the ‘bad’ kernel first: if it happens to be the latest, it will be the one booted. For me that meant removing the 3.2.0-40-generic kernel package.
If you happen to be missing the blktap drivers, chances are you don’t have the linux headers / source for your installation, so you’ll have to install the headers that match your kernel (substitute your version number!) and then rebuild the module once xcp-xapi is installed.
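As a rough sketch of the headers / rebuild step, here is a helper that only prints the commands for a given kernel version so you can review them before running anything. The function name blktap_rebuild_plan is made up, and it is my assumption that the module source comes from the blktap-dkms package and that dpkg-reconfigure triggers the rebuild; check which package provides blktap on your machine.

```shell
# Print (do not run) the commands to get the blktap module built for a kernel.
# Assumption: the module is provided by the blktap-dkms package.
blktap_rebuild_plan() {
    kver=${1:-$(uname -r)}
    echo "sudo apt-get install linux-headers-$kver"
    echo "sudo dpkg-reconfigure blktap-dkms"
    echo "sudo update-grub"
}

# Example: review the plan for the kernel that was stable for me.
#   blktap_rebuild_plan 3.2.0-39-generic
```

Printing instead of executing is deliberate: the version number differs per machine, and a wrong headers package is easy to spot on screen.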
Selecting the right kernel
Merge/replace the content of /etc/default/grub with this:
#GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="apparmor=0"
GRUB_CMDLINE_XEN="dom0_mem=1G,max:1G dom0_max_vcpus=2 dom0_vcpus_pin=1"
The last line is the most interesting: it tells XEN to assign and pin 2 VCPUs to dom0, together with 1G of fixed RAM. From what I’ve researched this is enough to run quite a few VMs. Some say it’s enough for 70 of them, and with just 5 clients I have no issues or lack of resources. If you have 8 cores or more, chances are that 12.04 will not boot decently, or will hang. So make sure you set this limit before trying a XEN kernel boot.
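Since a missing limit can leave you with a machine that hangs at boot, a small pre-reboot check may help. This is only a sketch: dom0_vcpu_limit is a made-up helper that reads the GRUB_CMDLINE_XEN line from a grub defaults file and prints the dom0_max_vcpus value, failing if there is none.

```shell
# Print the dom0_max_vcpus value set in GRUB_CMDLINE_XEN, or fail if absent.
dom0_vcpu_limit() {
    file=${1:-/etc/default/grub}
    line=$(grep '^GRUB_CMDLINE_XEN=' "$file") || return 1
    case $line in
        *dom0_max_vcpus=*) ;;
        *) return 1 ;;
    esac
    # Keep what follows "dom0_max_vcpus=" and cut at the first non-digit.
    value=${line##*dom0_max_vcpus=}
    value=${value%%[!0-9]*}
    [ -n "$value" ] && echo "$value"
}

# Example:
#   dom0_vcpu_limit || echo "no dom0 vcpu limit set, do not reboot into XEN yet"
```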
After setting the grub options, apply them with update-grub.
You will see some action indicating your grub config is being built. Before rebooting into xen, set up the bridged network; doing it now will save you some head-scratching later. Create a simple bridge setup. Assuming your primary interface is now named eth0, here’s an interfaces example of what works for me. You might have to install the bridge-utils package first if it’s not already on there.
Modify the interfaces file into a bridged setup.
iface lo inet loopback
iface xenbr0 inet static
address 192.168.128.99
netmask 255.255.255.128
network 192.168.128.0
broadcast 192.168.128.127
gateway 192.168.128.126
dns-nameservers 8.8.8.8
dns-search internal.com
bridge_ports eth0
bridge_stp off
bridge_fd 0
#bridge_hello 2
#bridge_maxage 12
iface eth0 inet manual
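The network and broadcast values in that file have to agree with the address and netmask, and a /25 like mine is easy to get wrong by hand. Here is a small sketch that derives both with shell arithmetic; net_and_broadcast is a made-up name and it only handles plain dotted-quad IPv4 input.

```shell
# Print "network broadcast" for a dotted-quad address ($1) and netmask ($2).
# The body runs in a subshell so the IFS change does not leak out.
net_and_broadcast() (
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))" \
         "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

# With the values from the interfaces file above:
#   net_and_broadcast 192.168.128.99 255.255.255.128
# prints: 192.168.128.0 192.168.128.127
```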
Also, adapt your sysctl.conf file with these
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.eth0.proxy_arp = 1
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
and apply with sysctl -p. This does a few things. Mainly it makes sure iptables isn’t picking up and firewalling bridged traffic: you exclude this traffic from being filtered. I also enable forwarding so every host can be the default gateway of the VMs, mainly for being able to do apt-get update and so forth. You may not need proxy ARP. I found the need for it when building loadbalancer nodes on VMs (haproxy / keepalived), so you could just leave out that proxy_arp line. You can now set up the forwarding on the XEN host and use its IP as your gateway, very handy for VM installations.
iptables --table nat -A POSTROUTING -o xenbr0 -j MASQUERADE
You’re about ready now to get xcp-xapi installed on the machine.
Once xcp-xapi has been installed, verify your package list. This looks to be about as minimal as you can get:
ii libxen-4.1 4.1.2-2ubuntu2.6 Public libs for Xen
ii libxenstore3.0 4.1.2-2ubuntu2.6 Xenstore communications library for Xen
ii openvswitch-brcompat 1.4.0-1ubuntu1.5 Open vSwitch bridge compatibility support
ii openvswitch-common 1.4.0-1ubuntu1.5 Open vSwitch common components
ii openvswitch-datapath-dkms 1.4.0-1ubuntu1.5 Open vSwitch datapath module source - DKMS version
ii openvswitch-switch 1.4.0-1ubuntu1.5 Open vSwitch switch implementations
ii python-xenapi 1.3.2-5ubuntu0.1 Xen Cloud Platform - XenAPI Python libraries
ii xcp-eliloader 0.1-4 XenAPI bootloader for EL-based guests
ii xcp-fe 0.5.2-3 Fork-and-exec daemon for xapi
ii xcp-guest-templates 0.1-3 Guest template generator for XCP
ii xcp-networkd 1.3.2-5ubuntu0.1 Xen Cloud Platform - network configuration daemon
ii xcp-squeezed 1.3.2-5ubuntu0.1 Xen Cloud Platform - memory ballooning daemon
ii xcp-storage-managers 0.1.1-2ubuntu1 storage backends for XCP
ii xcp-v6d 1.3.2-5ubuntu0.1 Xen Cloud Platform - feature daemon
ii xcp-vncterm 0.1-2 Provides VNC service for XCP guest VMs
ii xcp-xapi 1.3.2-5ubuntu0.1 Xen Cloud Platform - XenAPI server
ii xcp-xe 1.3.2-5ubuntu0.1 Xen Cloud Platform - command-line utilities
ii xen-hypervisor-4.1-amd64 4.1.2-2ubuntu2.6 Xen Hypervisor on AMD64
ii xen-utils-4.1 4.1.2-2ubuntu2.6 XEN administrative tools
ii xen-utils-common 4.1.2-1ubuntu1 XEN administrative tools - common files
ii xenstore-utils 4.1.2-2ubuntu2.6 Xenstore utilities for Xen
Now it’s time to fix a few things. You were probably asked what to use for networking, either openvswitch or bridge. Even when you select bridge, it will not end up in the config file, so you’ll need to fix this manually. Before rebooting, modify the following:
Adjust “/etc/default/xen”
Verify “bridge” is set in “/etc/xcp/network.conf”, chances are, it’s wrong.
Add workaround for XAPI/XEND-conflict:
sed -i -e 's/xend_start$/#xend_start/' -e 's/xend_stop$/#xend_stop/' /etc/init.d/xend
update-rc.d xendomains disable
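The network.conf check is easy to forget before the reboot, so here is a sketch of it as two small helpers. The names uses_bridge_backend and force_bridge_backend are made up, and I’m assuming the file holds nothing but the single backend keyword, which is how it looks on my install.

```shell
# Succeeds if the XCP network backend file selects plain bridging.
# Assumption: the file contains only one word, e.g. "bridge" or "openvswitch".
uses_bridge_backend() {
    conf=${1:-/etc/xcp/network.conf}
    [ "$(tr -d '[:space:]' < "$conf")" = "bridge" ]
}

# Overwrite the backend choice with "bridge" (needs root on the real file).
force_bridge_backend() {
    conf=${1:-/etc/xcp/network.conf}
    echo bridge > "$conf"
}

# Example:
#   uses_bridge_backend || sudo sh -c 'echo bridge > /etc/xcp/network.conf'
```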
Now, reboot to the XEN kernel. To see how far you’ve got, you can check whether xcp-xapi works by running xe host-list and verifying that it shows dom0.
uuid ( RO) : 301c187b-28b6-16bf-a2a3-2b23573663a9
name-label ( RW): serverx
name-description ( RW): Default install of XenServer
If you get this, it means your kernel booted fine and that xcp-xapi seems to have started correctly and is responding to this simple request. It’s possible that xcp-xapi fails to start with some cryptic error. Right after a fresh install this usually means the blktap module isn’t getting loaded, so install the missing headers / rebuild the module / update-grub.
At this point we could start deploying virtual machines, but if you need to deploy using the VHD format there is still a problem with the xcp-xapi drivers versus the lvm version and its capabilities. The xcp-xapi package maintainer for ubuntu decided to keep those drivers out since they weren’t fully tested. Therefore, you’ll need to download and modify the xcp drivers. They use options that the lvm shipped with 12.04 LTS doesn’t support, so they fail miserably. If you don’t deploy on some shared filesystem (SAN/NAS) and don’t cluster xen hosts, you’re ok doing so.
So let’s demonstrate what we will accomplish by doing this. Suppose at this point you start creating virtual machines. It will work, but at some point you’ll want to copy or clone a running VM from a snapshot. Dang! This is not possible, since the functionality for this has been (kind of) stripped from the official xcp-xapi packages. Why?
Because your SR with LVM backing (xcp driver) has a limit. You should use a modded one, we’ll explain later how. For comparison, see the capabilities of both SRs.
uuid ( RO) : f6e2cee8-d678-c424-134c-90a3f5ecba98
name-label ( RW): sr_demo
host ( RO): X
allowed-operations (SRO): forget; VDI.create; plug; destroy; VDI.destroy; scan; VDI.clone;
VDI.resize; unplug
current-operations (SRO):
VDIs (SRO): a8f71c34-1605-40fa-a7d6-cb6846c29fd7
PBDs (SRO): 7a5df802-b2db-d43e-1c9f-8eff52dbef66
virtual-allocation ( RO): 21474836480
physical-utilisation ( RO): 21474836480
physical-size ( RO): 107361599488
type ( RO): lvm
shared ( RW): false
An SR with more capabilities, using LVM + VHDs:
uuid ( RO) : 0e41aa8d-0e89-19f6-ec35-d1792e4b4454
name-label ( RW): sr_demo
host ( RO): Y
allowed-operations (SRO): forget; VDI.create; VDI.snapshot; plug; update; destroy;
VDI.destroy; scan; VDI.clone; VDI.resize; unplug
current-operations (SRO):
VDIs (SRO): 9e67ecb3-ac1a-43f7-9108-21dca8a2bb0f
PBDs (SRO): e05f732f-51bc-b0cd-958d-83ce6c9fc662
virtual-allocation ( RO): 63350767616
physical-utilisation ( RO): 63484985344
physical-size ( RO): 64399343616
type ( RO): lvm
shared ( RW): false
sm-config (MRO): allocation: thick; use_vhd: true; devserial:
You’ll notice that machine X doesn’t have VDI.snapshot, which is kind of essential if you have backups and/or fast deployment in mind. To get it, you’ll need to dive into the xcp-xapi driver code. Luckily this is python, which is quite readable most of the time.
Navigate to /usr/lib/xcp/sm. The driver we want to make available is: LVHDSR
You can get it from the source package (download xcp-storage-managers-0.1.1 and extract this file from the drivers directory). You also need to take the lvutil.py file, since we’ll be modding that one. The LVHDSR file should be copied to the current directory, with a symbolic link in place too. Suppose you unpacked the source in /usr/local/src. You might have to remove LVMSR first if that file already exists.
cp /usr/local/src/xcp-storage-managers-0.1.1/drivers/lvutil.py /usr/lib/xcp/sm/
ln -s LVHDSR.py LVMSR
Let’s adapt the lvutil.py file. Essentially, the version of LVM that comes with Precise doesn’t like the --master option used in it (nor --inactive), so we need to remove these. We also need to adapt the path when copied from the source package, essentially ‘debianify’ it.
Change these lines:
line 302: cmd = [CMD_VGCHANGE, "-an", "--master", vgname] -> cmd = [CMD_VGCHANGE, "-an", vgname]
line 344: cmd = [CMD_VGCHANGE, "-a" + val, "--master", path] -> cmd = [CMD_VGCHANGE, "-a" + val, path]
line 354: cmd.extend(["--inactive", "--zero=n"]) -> cmd.extend(["--zero=n"])
This will fix any errors while creating SRs on the LVM backend, give us the ability to snapshot a running VM, and take a file backup (.xva) from that snapshot, all while keeping the original machine running.
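If you’d rather not edit by hand, the three edits above can be sketched as one sed call. It matches on the line content rather than on the line numbers, so it should survive small differences between source versions, but treat it as a sketch: patch_lvutil is a made-up name, it relies on GNU sed’s -i option, and it keeps a .orig copy in case the result isn’t what you expect.

```shell
# Remove the LVM options that the LVM shipped with Precise rejects.
# A backup of the untouched file is kept next to it as <file>.orig.
patch_lvutil() {
    file=${1:-/usr/lib/xcp/sm/lvutil.py}
    cp "$file" "$file.orig" &&
    sed -i \
        -e 's/\[CMD_VGCHANGE, "-an", "--master", vgname\]/[CMD_VGCHANGE, "-an", vgname]/' \
        -e 's/\[CMD_VGCHANGE, "-a" + val, "--master", path\]/[CMD_VGCHANGE, "-a" + val, path]/' \
        -e 's/cmd.extend(\["--inactive", "--zero=n"\])/cmd.extend(["--zero=n"])/' \
        "$file"
}

# Example (as root), then confirm nothing is left behind:
#   patch_lvutil && grep -n -e '--master' -e '--inactive' /usr/lib/xcp/sm/lvutil.py
```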
So at this point, you should have a running dom0 with the appropriate kernel. In case you still have trouble starting xend or xcp-xapi, check the log files in /var/log called SMlog and xcp-xapi.log. They will give you clues as to what is holding you back. You also modified and introduced the LVHD driver. In xe you can use tab completion on options, and this will confirm whether your mods have been done right. Try to create an SR. To do this, we’ll have to create a logical volume first.
Go ahead and create a logical volume with lvcreate.
Then type this command but don’t press enter at the end, try tab-completion:
xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=vm1 type=<TAB>
You should see a choice of types there.
If lvm is not among them, you probably haven’t restarted xcp-xapi yet. Restart it old school style and try again.
Now it should show you the lvm option, select lvm and hit enter:
xe sr-create device-config:device=/dev/serverx/sr_guest1 name-label=sr_1 type=lvm
cc7537c0-fba9-374a-7997-b72b65d1f7c1
It will respond with a UUID. Check the capabilities of that SR with xe sr-param-list; the most interesting fields are:
...
allowed-operations (SRO): forget; VDI.create; VDI.snapshot; plug; update; destroy; VDI.destroy; scan; VDI.clone; VDI.resize; unplug
sm-config (MRO): allocation: thick; use_vhd: true; devserial:
...
To double-check, see what xcp did to the LVM layout by issuing lvs. You should see a new volume group with a logical volume named MGT. Relevant entries are:
sr_guest1 server19 -wi-ao 50.00g
I guess MGT stands for management. This is a good sign that it’s working as expected. You can dive deeper and find out how it trickles down the chain.
lrwxrwxrwx 1 root root 7 Apr 29 15:50 MGT -> ../dm-3
root@x: dmsetup ls
root@x: dmsetup info VG_XenStorage--cc7537c0--fba9--374a--7997--b72b65d1f7c1-MGT
Name: VG_XenStorage--cc7537c0--fba9--374a--7997--b72b65d1f7c1-MGT
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 3
Number of targets: 1
UUID: LVM-Bnj36apP5qtLHdOBd7cGWCXhwGEgkBlJfJoJpcMvXccR5kzIEft18onS3KA1xyHm
No files are on there since we haven’t created any VMs on this SR. Let’s create a custom template for installing 12.04 LTS as a guest.
Creating a 12.04 install template
NEW_TEMPLATE_UUID=`xe vm-clone uuid=$TEMPLATE_UUID new-name-label="Ubuntu Precise Pangolin 12.04 (64-bit)"`
xe template-param-set other-config:default_template=true other-config:debian-release=precise uuid=$NEW_TEMPLATE_UUID
Preparing the VM with Network interface (vif)
xe vm-param-set uuid=$VM other-config:install-repository="http://archive.ubuntu.net/ubuntu/"
NETWORK="`sudo xe network-list bridge=xenbr0 --minimal`"
xe vif-create vm-uuid=$VM network-uuid=$NETWORK mac=random device=0
Right about now is the time to set some limits according to your wishes. The standard disk size is 8GB. If you want a bigger machine, you’ll have to mod the VDI before starting the VM.
Resize the standard sized VDI in halted state
You had better do this before starting the newly installed VM. Otherwise you end up having to resize filesystems, volumes and so on. Finding out which VDI belongs to which VM and SR gets complicated when you have lots of machines, but these commands should help:
xe vdi-param-list uuid=c95ee4bf-dbf2-4cfd-93c5-ef1b7070d8d5
xe vdi-resize uuid=c95ee4bf-dbf2-4cfd-93c5-ef1b7070d8d5 disk-size=49GiB
Substitute the uuid’s with yours of course. Pay attention to the way units are written, use the built in help for more info.
Assigning system resources to a VM
xe template-param-set uuid=$NEW_TEMPLATE_UUID VCPUs-at-startup=4
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-min=805306368
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-max=1610612736
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-dynamic-max=1610612736
xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-dynamic-min=805306368
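The memory values above are plain byte counts: 805306368 is 768 MiB and 1610612736 is 1536 MiB. A tiny helper (mib_to_bytes is a made-up name) saves you from doing that multiplication by hand:

```shell
# Convert a size in MiB to the byte count that the xe memory parameters take.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example:
#   xe template-param-set uuid=$NEW_TEMPLATE_UUID memory-static-max=$(mib_to_bytes 1536)
```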
And so forth. Plenty of options there. If you do make an error, you can clear the parameter like this:
Performing an initial VM start and installation
Before we start a VM, we want to have console access to it. We can use the old school xl utility for that, set up as an alias.
This way you can attach to the console of a VM by using its name. If you want to bail out of the console, use the old school terminal shortcut (CTRL + ]). It’s possible that VNC gets started as well. You might have to sort that out, since taking over the console of a running VM behaves awkwardly in a way you don’t immediately notice.
If all went well you should see an Ubuntu 12.04 install dialog, asking for the language. From there on it’s like any other Ubuntu OS installation.
Some hints when installing VM’s
- Write down your disk UUID’s inside your VM and their function, these will come in handy later, to solve some issues booting your VM’s after redeploying backups, this is very important!
- When updating VM kernel
Restoring an exported VM using our additional features
Right now, this system is capable of some very interesting things. For instance, you can now snapshot a running VM (if in fact it’s also Ubuntu 12.04). You can then dump the VM from that snapshot to a file and import it. Let’s start with importing a previously exported machine.
Let’s try one in the SR we created earlier:
Operation failed. Error: Connection reset by peer
Nasty error by the looks of it, but as always, go check the xcp-xapi logs, they usually give out a more detailed message.
So apparently our SR is too small to contain the VM we are about to import. We could resize the SR, but let’s just remove it, remove the logical volume and create it all again.
The SR is still connected to a host via a PBD. It cannot be destroyed.
sr: 342d0647-2ae4-11b1-290f-4003b4d32d96 (sr_1)
So we need to identify the correct one and get rid of the pbd.
xe pbd-unplug uuid=34047f96-f0f9-d319-85ec-11eaf80aa94d
If the unplug fails, run it a second time. That has always worked for me, even though it complains it can’t detach the SR. Now destroy the SR. I guess there are other ways to do this, but this one works for me.
xe sr-destroy uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
xe sr-forget uuid=342d0647-2ae4-11b1-290f-4003b4d32d96
Everything is now gone, let’s remove the logical volume
And create a larger one
And do all the steps we did before again, up to the point where we can try the import again, which is not a lot…
xe vm-import filename=solr_base.xva preserve=false sr-uuid=3fefe239-eea2-de85-ebe2-d0d9fad09867
The last command is using a different sr-uuid than before of course. Press enter and go make some coffee or go to the bathroom or both, depending on how fast your disks are. It will take some time.
When this is done, you probably want to rename the VM as it will still carry the old name. It’s always wise to disable networking inside the VM (put the network config in comments), because once it boots, it will try to use the IP of the base, which is probably in use. We’ll cover snapshotting later.
xe vm-start uuid=a89d8255-3c55-9c5f-6847-c8abf0659b34
console demo
When you rename a running virtual machine, the console alias from earlier will not work with the new name yet, but the old name will still work. When you rename it in a halted state, the new name works right away. The machine should be booting now. Note that you didn’t have to create a vif manually for this VM, it’s all taken care of. Importing only requires an SR (if you don’t have a default SR set up).
Now that we have this machine imported, let’s explain the steps needed to create such an xva export. Let’s try one from the currently running VM.
Taking a snapshot from VM’s
xe vm-snapshot uuid=a89d8255-3c55-9c5f-6847-c8abf0659b34 new-name-label=snapshot_demo
This will NOT be a consistent snapshot. It is more like freezing the machine in time and copying all files, open and closed, at once. Your snapshot is created in that state. The result behaves as if the power failed at that moment, so a smart person would shut down about every service on the machine first. Your VM will have to check its filesystems but will probably start up again. A consistent snapshot would require the machine to be halted.
Also note that the VM’s operating system matters when doing this on a running machine: some kernel support is needed for this to work as expected.