Tuesday, 17 June 2008

virt-ps and the Red Hat Summit

Tomorrow (Wednesday) through to Friday is the Red Hat Summit in Boston. If you're coming, please make sure to see my talk with Dan Berrange on the "Virtualization Toolbox", which covers all the little, useful tools we've been writing to help you manage your virt systems. That talk is tomorrow, Wednesday 18th June, some time after 11am.

As I mentioned previously on this blog, I'm working on deep inspection of the internals of running virtual machines, and dressing this up as familiar, easy-to-use command line tools such as virt-df and virt-dmesg. I'll be talking a lot more about those tomorrow, so I don't want to spoil the surprises.

The real question is whether I'll get virt-ps (process listings) working today. Getting the process listing out of a stuck virtual machine is immensely useful for finding out what's going on with the machine. For example, did it blow up because there are too many Apache processes? Or is some other daemon causing trouble? I had an initial implementation of this working, but it was rather slow and unsatisfactory because of all the guessing and heuristics it had to do. In the meantime, I discovered that getting the Linux kernel version is quite easy, and once you know the kernel version you immediately reduce the number of heuristics you need by a large factor. So the new implementation should be much faster.

Faster, but it doesn't work at the moment. Today is the final push on this: can I get virt-ps working in time for the demo tomorrow?

2 comments:

Jeremy said...

Of course, the kernel version isn't definitive given vendor patching... unless you're going to have a table with how various vendors patch things.

Richard Jones said...

Jeremy:

Yes, and CONFIG_* makes a difference too. We still use heuristics to determine the precise location of pointers within the structure. The interesting pointers are the ones which point to (list_heads of) other task_structs, so one can make assertions about where those pointers lie, test them, and backtrack if they turn out not to be correct. If that gets too slow, then we have the full utsname, so using tables for each vendor is a possibility too, with automation being the key to making that feasible.
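
To make the assert, test and backtrack idea concrete, here is a minimal sketch in Python. It is not the virt-ps code: the memory image is synthetic, and every name in it (read_ptr, walk_list, find_tasks_offset, build_image), the 64-byte structure size and the little-endian 64-bit pointer layout are assumptions made for illustration. It guesses an offset for the list_head inside the first task structure, checks that following next pointers gives a consistent circular doubly linked list (each next->prev must point back), and moves on to the next candidate offset when a check fails.

```python
import struct

PTR = struct.Struct("<Q")  # assumption: little-endian 64-bit guest pointers


def read_ptr(mem, base, addr):
    """Read one pointer at guest address addr, or None if out of range."""
    off = addr - base
    if off < 0 or off + PTR.size > len(mem):
        return None
    return PTR.unpack_from(mem, off)[0]


def walk_list(mem, base, head, limit=1024):
    """Follow next pointers starting at a candidate list_head.

    Returns the visited list_head addresses if they form a consistent
    circular doubly linked list, otherwise None.
    """
    seen = [head]
    cur = head
    for _ in range(limit):
        nxt = read_ptr(mem, base, cur)            # candidate list_head.next
        if nxt is None:
            return None
        # Assertion under test: next->prev must point back at us.
        if read_ptr(mem, base, nxt + PTR.size) != cur:
            return None
        if nxt == head:                           # the ring closed
            return seen
        seen.append(nxt)
        cur = nxt
    return None                                   # too long or never closes


def find_tasks_offset(mem, base, first_task, struct_size, min_tasks=2):
    """Try each pointer-aligned offset; backtrack when the walk fails."""
    last = struct_size - 2 * PTR.size
    for offset in range(0, last + 1, PTR.size):
        chain = walk_list(mem, base, first_task + offset)
        # Empty list_heads (pointing at themselves) pass the walk, so also
        # require more than one task before accepting the guess.
        if chain is not None and len(chain) >= min_tasks:
            return offset, [addr - offset for addr in chain]
    return None


def build_image(base, struct_size, true_offset, n_tasks):
    """Build a fake memory image of n_tasks linked structures (test data)."""
    mem = bytearray(struct_size * n_tasks)
    addrs = [base + i * struct_size for i in range(n_tasks)]
    for i, addr in enumerate(addrs):
        pos = addr - base
        nxt = addrs[(i + 1) % n_tasks] + true_offset
        prv = addrs[(i - 1) % n_tasks] + true_offset
        PTR.pack_into(mem, pos + true_offset, nxt)
        PTR.pack_into(mem, pos + true_offset + PTR.size, prv)
        # Decoy: an empty list_head at offset 8 that points at itself.
        PTR.pack_into(mem, pos + 8, addr + 8)
        PTR.pack_into(mem, pos + 16, addr + 8)
    return bytes(mem), addrs


mem, addrs = build_image(0x10000000, 64, 40, 4)
offset, tasks = find_tasks_offset(mem, 0x10000000, addrs[0], 64)
print(offset, len(tasks))  # prints "40 4"
```

In this toy image the search rejects offsets 0 through 32 (null pointers, a mismatched prev, or the self-pointing decoy) before settling on 40. Knowing the kernel version, or having a per-vendor table, would amount to shrinking the range of candidate offsets that the loop has to try.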