Monday, August 4, 2008

VirtualBox and custom kernels

I've been using VirtualBox successfully for a few months now. On my laptop, with Linux as the host operating system, the virtual Windows XP install runs faster than the real XP install on another partition, even though I've only given it 512 MB of RAM to work with. Very interesting program.

But on my main desktop system I run a custom kernel, and the same VirtualBox install didn't work there. The kernel module it loads just failed. You can get that module to rebuild and reinstall using

/etc/init.d/vboxdrv setup

Or at least you're supposed to be able to. The error message it gave me in /var/log/vbox-install.log suggested I needed to set KERN_DIR to point to the root of my kernel source. I did that, but it didn't help, and afterwards there was nothing useful in that log file. The module was there; it just didn't work. Running "modprobe vboxdrv" by hand said it was missing symbols.
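A minimal sketch of that rebuild step, for reference. The function names are made up for illustration, and it assumes the custom kernel was installed with "make modules_install", which leaves a "build" symlink under /lib/modules/<release> pointing at the tree the kernel was built from. If your source lives somewhere else, set KERN_DIR to that path instead. It needs to run as root.

```shell
# Hypothetical helper: the conventional build-tree path for a kernel release.
# Assumes the /lib/modules/<release>/build symlink exists.
kern_dir_for() {
    printf '/lib/modules/%s/build\n' "$1"
}

# Hypothetical helper: point KERN_DIR at the running kernel's build tree,
# re-run the module setup, and show the end of the install log on failure.
rebuild_vboxdrv() {
    KERN_DIR="$(kern_dir_for "$(uname -r)")"
    export KERN_DIR
    /etc/init.d/vboxdrv setup || tail -n 20 /var/log/vbox-install.log
}
```

Calling rebuild_vboxdrv and then "modprobe vboxdrv" repeats the sequence described above. It does not fix missing symbols by itself.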

To track this down, I started by reading the source for the vboxdrv init script. Under the hood, it runs /usr/share/virtualbox/src/build_in_tmp to recompile the module. You can run this by hand to see what's really going on. Annoyingly, there's a crummy test that generates this bogus warning:

test -e include/linux/autoconf.h -a -e include/config/auto.conf || ( \
echo; \
echo " ERROR: Kernel configuration is invalid."; \
echo " include/linux/autoconf.h or include/config/auto.conf are missing."; \
echo " Run 'make oldconfig && make prepare' on kernel src to fix it."; \

This is a lie: both files are there, and it prints this even in the final working configuration after I fixed the problem. I also saw some warnings about deprecated symbols in the middle of the build, and the linker complained about missing things, just like when I tried to load the module.
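One way to convince yourself the warning is bogus is to run the same file check by hand against the kernel tree. This is only a sketch: the function name is hypothetical, and the two paths are the ones named in the warning text.

```shell
# Hypothetical helper: succeed only if both files named in the warning
# exist under the kernel source tree given as $1.
kernel_config_ok() {
    [ -e "$1/include/linux/autoconf.h" ] && [ -e "$1/include/config/auto.conf" ]
}

# Example use, with KERN_DIR already set to the kernel source root:
#   kernel_config_ok "$KERN_DIR" && echo "both files present"
```

If this succeeds while the build still prints the error, the check is probably being run from a different directory than the one KERN_DIR points to.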

I found some notes about using VirtualBox with a custom kernel in an Ubuntu forum, which were interesting background but no help. Another nice link that I want to remember for future use shows how to repackage VirtualBox using dpkg to support a custom kernel. But since this was on a Red Hat system, that wasn't going to help, and ultimately I suspected that the same compilation problem would kick in.

After some more searching, I found the root cause here. Some of the kernel interfaces that existed in the original 2.6.18 kernel I was using the last time this worked were changed by the 2.6.25 kernel I was running now, which is why the symbols the module was looking for were missing. A Gentoo bug report goes over the ugly details and suggests that VirtualBox versions after 1.6.2 fix the problem (I had been running 1.5.6). Sure enough, I downloaded the latest version, installed it, and the module now rebuilds itself just fine as long as KERN_DIR is set correctly.
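The version reasoning above can be turned into a rough pre-flight check. This is a sketch under loud assumptions: the function names are hypothetical, and the two thresholds (kernel 2.6.25, VirtualBox 1.6.2) are just the versions mentioned in this post, not the exact releases where the interface changed or was fixed.

```shell
# Return 0 when dotted numeric version $1 is >= $2.
# Strip any suffix such as "-custom" before calling, e.g. "${rel%%-*}".
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}                 # leading field of each version
        [ "$a" = "$x" ] && a= || a=${a#*.}     # drop that field from a
        [ "$b" = "$y" ] && b= || b=${b#*.}     # drop that field from b
        x=${x:-0}; y=${y:-0}                   # missing fields count as 0
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
    done
    return 0
}

# Assumed rule of thumb taken from the post: kernel >= 2.6.25 combined with
# VirtualBox <= 1.6.2 is likely to hit the missing-symbol problem.
vbox_likely_affected() {
    version_ge "$1" 2.6.25 && version_ge 1.6.2 "$2"
}

# Example: vbox_likely_affected 2.6.25 1.5.6 && echo "upgrade VirtualBox first"
```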

This is one of those interesting bits to debate: the reason this problem was introduced was that an API was changed "to fix bugs and add a really important feature". Linux benefits from being free to make such enhancements as necessary. But at the same time, the complete lack of concern for backward compatibility in kernel modules can really make things difficult for regular users of Linux software that must link to the kernel. The obligatory reading here for the developer's side of this argument is Stable API Nonsense.

1 comment:

Greg Smith said...

Today I noted there was a really tricky change in 2.6.23 that disabled VMware. Similarly, 2.6.24 broke it again; see vmware+hardy and the Ubuntu forums for more information.