The OpenStack Summit in Atlanta ended a few days ago. I was fortunate enough to be there with several of my colleagues from Cloudwatt. Now it's time for a quick summary of some of the sessions I attended.
Starting with Icehouse, it's now possible to upgrade the Nova control plane (Nova API, Nova scheduler, Nova conductor) before the Nova compute nodes, without any interruption. This makes it possible to upgrade the possibly numerous compute nodes without rushing: in the meantime, a new VM can still be launched on "old" compute nodes. From an operational point of view this feature was very much welcomed, though it's only a first step toward OpenStack live upgrades because, at the moment, only Nova supports it.
This technical session explained the two main mechanisms that make this live upgrade possible:
The versioning of all RPC calls: every RPC call toward nova-compute now carries a version number, so that nova-compute knows whether it will be able to successfully service the call. For instance, if a Havana nova-compute only understands RPC calls up to version 2.17 and receives an RPC call in version 3, it will politely decline or take the appropriate action.
The development of nova-conductor as a central authority for DB and RPC calls. If a nova-compute receives an RPC call that it doesn't understand, it can redirect the call, along with the highest version number it does understand, to nova-conductor. In turn, nova-conductor will try to "cast" (translate) the RPC call into another, older RPC call understandable by the original nova-compute. That's how a Havana nova-compute is still able to understand an order issued by an Icehouse Nova API.
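To make the idea concrete, here is a minimal, purely illustrative Python sketch of version-checked RPC dispatch with a conductor-style downgrade. All names and structures below are mine, not Nova's actual API:

```python
# Illustrative only: names and message layout are hypothetical, not Nova code.

SUPPORTED_VERSION = (2, 17)  # highest RPC version this "compute" understands


def parse_version(version):
    """Turn a version string like '3.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def handle_rpc(call):
    """Service the call if its version is supported, otherwise decline."""
    if parse_version(call["version"]) <= SUPPORTED_VERSION:
        return "handled %(method)s at version %(version)s" % call
    # Politely decline: the caller must route through the conductor.
    return "unsupported version " + call["version"]


def conductor_downgrade(call, target_version):
    """Conductor-side translation: re-emit the call at an older version
    that the destination nova-compute advertises it understands."""
    older_call = dict(call)
    older_call["version"] = target_version
    return older_call
```

In the real flow, the translation step can also mean dropping or renaming arguments that the older version does not know about, not just rewriting the version field.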
At the end of the session, the speaker asked the audience if anyone had already performed the migration from Havana to Icehouse in a real-life environment. No one raised their hand :)
In this session, Boden Russell talked in depth about Linux Containers, a.k.a. LXC. Boden ran a lot of benchmarks, which confirmed that LXC is at least as performant as KVM, with a much lower overhead in terms of CPU and memory consumption. The speaker insisted on how incredibly fast a container can boot.
Though LXC is not yet a silver bullet (concerns have been raised around security and some networking aspects), it is under active development. If you have a recent kernel, you really should give LXC a try.
New at the OpenStack Summit in Atlanta was the OPS track, a track dedicated to operations people :) On Friday morning there was a Q&A session on operating a database for OpenStack. Some thoughts from this session:
In OpenStack, "rootwrap" is the piece of Python code that is called each time a component has to perform an administrative operation on the operating system. In other words, "rootwrap" is used to ask for privilege elevation. The problem is that rootwrap is called very frequently, especially by Neutron, and rootwrap is slow (it's written in Python, so its start-up time is high, and it imports several Python modules).
This design session introduced a "rootwrap standalone daemon". The rootwrap daemon will be persistent, spawned by each OpenStack component the first time root privilege is needed. On the implementation side, the Python multiprocessing module will be used, which allows secure communication between a parent (the OpenStack component, say Neutron) and its child (the rootwrap daemon). As the rootwrap daemon will run as root, its security is of the highest importance, so a lot of code review has to be done. Feel free to have a look at the code proposal. First benchmarks show that the rootwrap daemon could be ten times faster than spawning rootwrap on the fly.
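The parent/child pattern described above can be sketched with the multiprocessing module. This is a hedged toy version, not the actual proposal: the daemon below merely echoes commands instead of executing them as root, and all names are mine:

```python
import multiprocessing


def rootwrap_daemon(conn):
    """Persistent daemon loop. In the real proposal this process would run
    as root and execute commands matched by the rootwrap filters; here it
    only echoes the command back, to keep the sketch harmless."""
    while True:
        cmd = conn.recv()
        if cmd is None:  # shutdown sentinel
            break
        conn.send("would run: " + " ".join(cmd))


class RootwrapClient:
    """Spawn the daemon once, then reuse it for every privileged call,
    avoiding a costly Python start-up per command."""

    def __init__(self):
        # Pipe() gives a private two-way channel between parent and child.
        self.conn, child_conn = multiprocessing.Pipe()
        self.daemon = multiprocessing.Process(target=rootwrap_daemon,
                                              args=(child_conn,))
        self.daemon.start()

    def execute(self, *cmd):
        self.conn.send(list(cmd))
        return self.conn.recv()

    def close(self):
        self.conn.send(None)
        self.daemon.join()
```

The point of the design is visible even in this toy: the expensive part (starting a Python process) happens once per component, and every subsequent privileged call is just a message over the pipe.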
During the design session, Thierry Carrez also complained that the current rootwrap filters are too wide open. For instance, they allow unrestricted usage of the commands chown, chmod, dd and cat, which basically means that every OpenStack component runs as root. A complete audit of the rootwrap filters is required but, so far, no one has stepped up.
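To illustrate what "wide open" means, here is what a permissive filter line looks like in a rootwrap filter file, next to a more restrictive style. These entries are illustrative examples, not a copy of any shipped filter file:

```ini
[Filters]
# Permissive: any chown invocation, on any file, is allowed as root.
chown: CommandFilter, chown, root

# More restrictive style: only match one specific, expected command line.
read_dnsmasq_pid: RegExpFilter, cat, root, cat, /var/run/dnsmasq\.pid
```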
The last session I attended, on Friday at 5pm :) This session was a kind of Q&A, where the audience enjoyed the presence of several Nova core developers. I can remember three issues that were raised:
During the session, I learned that when using Galera with OpenStack, it is recommended to use keepalived or HAProxy with all but one server marked as backup, in order to avoid excessive locking when "SELECT … FOR UPDATE" statements are executed (and they are, mainly when dealing with resource quotas).
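As a hedged example (the addresses and server names are invented), the HAProxy side of such a setup could look like the following, with one active Galera node and the others as backups:

```
listen galera
    bind 192.168.0.10:3306
    mode tcp
    balance roundrobin
    server db1 192.168.0.11:3306 check
    server db2 192.168.0.12:3306 check backup
    server db3 192.168.0.13:3306 check backup
```

With the `backup` keyword, db2 and db3 only receive traffic if db1 fails its health check, so all "SELECT … FOR UPDATE" statements hit a single Galera node instead of conflicting across the cluster.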
The summit was great :) One piece of advice though: if you are a developer, be sure to attend as many design sessions as possible; the "general" sessions are a bit light technical-wise.