OpenVZ is an implementation of containers for Linux. It consists of a modified Linux kernel and some user-level tools.
The OpenVZ kernel provides virtualization/isolation, resource management, and checkpointing.
Virtualization and isolation
Each container is a separate entity, and from the point of view of its owner it looks like a real physical server. So each has its own:
- Files: system libraries, applications, virtualized /proc and /sys, virtualized locks, etc.
- Users and groups: each container has its own root user, as well as other users and groups.
- Process tree: a container only sees its own processes (starting from init). PIDs are virtualized, so that the init PID is 1, as it should be.
- Network: a virtual network device, which allows a container to have its own IP addresses, as well as its own set of netfilter (iptables) and routing rules.
- Devices: many devices are virtualized. In addition, if needed, any container can be granted access to real devices such as network interfaces, serial ports, and disk partitions.
- IPC objects: shared memory, semaphores, messages.
Resource management
As all the containers use the same kernel, resource management is of paramount importance. Each container should stay within its boundaries and not affect other containers in any way, and this is what resource management ensures.
OpenVZ resource management consists of three components: two-level disk quota, fair CPU scheduler, and user beancounters. Note that all of these resource settings can be changed while a container is running; there is no need to reboot. For example, if a container needs to be given less memory, one just changes the appropriate parameters on the fly.
Two-level disk quota
The host system (OpenVZ) owner (root) can set up per-container disk quotas, in terms of disk blocks and inodes. This is the first level of disk quota. In addition to that, a container owner (root) can use the usual quota tools inside their own container to set standard UNIX per-user and per-group disk quotas.
If one wants to give more disk space to a container, one just increases its disk quota. There is no need to resize disk partitions.
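The interaction of the two quota levels can be illustrated with a small model. This is only a sketch of the idea, not how the OpenVZ kernel implements it: the Quota and Container classes, their field names, and the block counts are all hypothetical. A write succeeds only if both the per-container quota (first level) and the per-user quota (second level) allow it.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """A block quota: a limit and the amount currently used."""
    limit_blocks: int
    used_blocks: int = 0

    def can_allocate(self, blocks: int) -> bool:
        return self.used_blocks + blocks <= self.limit_blocks


@dataclass
class Container:
    """Hypothetical model: one first-level quota plus per-user quotas."""
    quota: Quota
    user_quotas: dict = field(default_factory=dict)  # user name -> Quota

    def write(self, user: str, blocks: int) -> bool:
        """Charge `blocks` to `user`; succeed only if both levels allow it."""
        user_quota = self.user_quotas.get(user)
        if not self.quota.can_allocate(blocks):
            return False  # first level: per-container quota set by host root
        if user_quota is not None and not user_quota.can_allocate(blocks):
            return False  # second level: per-user quota set by container root
        self.quota.used_blocks += blocks
        if user_quota is not None:
            user_quota.used_blocks += blocks
        return True


ct = Container(quota=Quota(limit_blocks=1000),
               user_quotas={"alice": Quota(limit_blocks=300)})
results = [
    ct.write("alice", 250),  # fits both levels
    ct.write("alice", 100),  # would exceed alice's user quota
    ct.write("bob", 700),    # bob has no user quota; container has room
    ct.write("bob", 100),    # would exceed the container quota
]
# Giving the container more space is a plain limit change, no repartitioning.
ct.quota.limit_blocks = 2000
results.append(ct.write("bob", 100))
print(results)
```

In this model, raising the container's limit is enough for the last write to succeed, which mirrors the point that no partition resize is involved.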
Fair CPU scheduler
The CPU scheduler in OpenVZ is a two-level implementation of the fair-share scheduling strategy.
On the first level, the scheduler decides which container to give the CPU time slice to, based on per-container cpuunits values. On the second level, the standard Linux scheduler decides which process to run in that container, using standard Linux process priorities. This is implemented using per-container runqueues.
The OpenVZ administrator can set different cpuunits values for different containers, and CPU time is distributed among them in proportion to those values.
There is also a way to limit CPU time, for example to restrict a container to 10% of the available CPU time.
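As a rough illustration of proportional sharing, the following sketch computes the fraction of CPU time each container would receive if all containers were busy at once. The function names, container IDs, and cpuunits values are made up for the example, and the real scheduler works on time slices and runqueues rather than on precomputed fractions.

```python
def cpu_shares(cpuunits):
    """Map container ID -> fraction of CPU time, proportional to cpuunits.

    Assumes every container is competing for the CPU at the same time.
    """
    total = sum(cpuunits.values())
    return {ctid: units / total for ctid, units in cpuunits.items()}


def apply_cpu_limit(share, limit_fraction):
    """Cap a container's share at a hard limit (0.10 means 10%)."""
    return min(share, limit_fraction)


# Hypothetical containers 101..103 with illustrative cpuunits values.
shares = cpu_shares({101: 1000, 102: 2000, 103: 1000})
capped_102 = apply_cpu_limit(shares[102], 0.10)
print(shares, capped_102)
```

With these values, container 102 has twice the cpuunits of each of the others and so gets half of the CPU time, unless a 10% limit caps it.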
User beancounters
User beancounters are a set of per-container counters, limits, and guarantees. About 20 parameters are carefully chosen to cover all aspects of container operation, so that no single container can abuse any resource which is limited for the whole node and thus harm other containers.
The resources accounted and controlled are mainly memory and various in-kernel objects such as IPC shared memory segments and network buffers. Each resource can be seen in /proc/user_beancounters and has five values associated with it: current usage, maximum usage (for the lifetime of a container), barrier, limit, and fail counter. The meaning of barrier and limit is parameter-dependent; in short, they can be thought of as a soft limit and a hard limit. If any resource hits the limit, its fail counter is increased, so the container root can see whether something bad is happening by analyzing the output of /proc/user_beancounters in their container.
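Checking the fail counters can be automated. The sketch below parses text in the layout of /proc/user_beancounters and reports resources whose fail counter is non-zero. The column layout (a container ID followed by a colon on the first row, then resource name and five numbers per row) is an assumption about the file format, and the sample values are invented for illustration. On a real system the text would come from reading the file itself.

```python
from typing import NamedTuple


class Beancounter(NamedTuple):
    uid: str        # container ID the row belongs to
    resource: str
    held: int       # current usage
    maxheld: int    # maximum usage over the container's lifetime
    barrier: int
    limit: int
    failcnt: int    # fail counter


def parse_beancounters(text):
    """Parse beancounter rows; header and version lines are skipped."""
    counters = []
    uid = None
    for line in text.splitlines():
        tokens = line.split()
        # The first row of each container starts with "<id>:".
        if len(tokens) == 7 and tokens[0].endswith(":"):
            uid = tokens[0][:-1]
            tokens = tokens[1:]
        if len(tokens) != 6 or uid is None:
            continue
        try:
            values = [int(t) for t in tokens[1:]]
        except ValueError:
            continue  # not a data row
        counters.append(Beancounter(uid, tokens[0], *values))
    return counters


def failed_resources(counters):
    """Return the rows whose fail counter is greater than zero."""
    return [c for c in counters if c.failcnt > 0]


# Illustrative sample only; not real measurements.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2128282    2409655   11055923   11377049          0
            lockedpages           0          0        256        256          0
            privvmpages       65520      65536      65536      69632          3
            numproc              24         31        240        240          0
"""

counters = parse_beancounters(SAMPLE)
failed = failed_resources(counters)
for c in failed:
    print(f"container {c.uid}: {c.resource} failed {c.failcnt} time(s)")
```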
Checkpointing and live migration
A live migration and checkpointing feature was released for OpenVZ in the middle of April 2006. It allows a container to be migrated from one physical server to another without shutting down or restarting the container. The process is known as checkpointing: a container is frozen and its whole state is saved to a file on disk. This file can then be transferred to another machine, and the container can be restored there.
Since every piece of container state, including open network connections, is saved, from the user's perspective the migration looks like a delay in response: say, one database transaction takes longer than usual, and when it continues as normal the user does not notice that the database is already running on another machine.
This feature makes possible scenarios such as upgrading your server without any need to reboot it: if your database needs more memory or CPU resources, you just buy a newer, better server, live migrate your container to it, and then increase its limits. If you want to add more RAM to your server, you migrate all containers to another one, shut the server down, add memory, start it again, and migrate all containers back.
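The freeze, transfer, and restore steps described above can be written out as a command plan. The sketch below only builds the command lines and does not run them. It assumes that vzctl provides chkpnt and restore subcommands accepting a --dumpfile option, that scp and ssh are available between the two hosts, and that the container's files and configuration are already present on the destination. The container ID, host name, and file path are placeholders.

```python
def migration_plan(ctid, dest_host, dumpfile):
    """Return the commands for a manual checkpoint, copy, and restore.

    Assumes the container's files and configuration already exist on
    dest_host; only the saved state file is transferred here.
    """
    ctid = str(ctid)
    return [
        # Freeze the container and save its whole state to a file.
        ["vzctl", "chkpnt", ctid, "--dumpfile", dumpfile],
        # Transfer the state file to the other machine.
        ["scp", dumpfile, f"{dest_host}:{dumpfile}"],
        # Restore the container from the state file on the destination.
        ["ssh", dest_host, "vzctl", "restore", ctid, "--dumpfile", dumpfile],
    ]


# Placeholder values for illustration.
plan = migration_plan(101, "node2.example.com", "/tmp/ct101.dump")
for command in plan:
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review before anything is executed, which seems prudent for an operation that freezes a running container.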
OpenVZ comes with command-line tools to manage containers (vzctl), as well as tools to manage software in containers (vzpkg).
Compared to other virtualization approaches, such as paravirtualization, containers have a few advantages.
As OpenVZ employs a single kernel model, it is as scalable as the 2.6 Linux kernel; that is, it supports many CPUs and many gigabytes of RAM. A single container can scale up to the whole physical box, i.e. use all the CPUs and all the RAM.
Indeed, some people use OpenVZ with a single container (also called a Virtual Environment, or VE). While this may seem strange, it can in fact be the natural choice in many scenarios, because a single VE can use all of the hardware resources with native performance, and there are added benefits such as hardware independence, resource management and live migration.
OpenVZ is able to host hundreds of containers on decent hardware (the main limitations are RAM and CPU). A rough figure is up to 150 containers running Apache, sendmail, sshd and other usual system services per 1 GB of RAM.
The owner (root) of an OpenVZ physical server can see all the containers' processes and files. That makes mass management scenarios possible. Consider VMware or Xen used for server consolidation: in order to apply a security update to your 10 virtual servers, you have to log in to each one and run an update procedure, the same as you would do with ten real physical servers.
In the OpenVZ case, you can run a simple shell script which will update all (or just some selected) containers at once.
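The same idea can be sketched in Python instead of shell. The example assumes that `vzlist -H -o ctid` prints the IDs of running containers and that `vzctl exec` runs a command inside a container. The update command shown is a placeholder that depends on the distribution inside each container. The demo at the bottom uses stand-in functions, so it runs without OpenVZ installed.

```python
import subprocess


def list_container_ids():
    """Return IDs of running containers (assumes `vzlist -H -o ctid`)."""
    out = subprocess.run(["vzlist", "-H", "-o", "ctid"],
                         check=True, capture_output=True, text=True).stdout
    return [int(token) for token in out.split()]


def exec_in_container(ctid, command):
    """Run a command inside one container (assumes `vzctl exec`)."""
    return subprocess.run(["vzctl", "exec", str(ctid), command]).returncode


def update_containers(command, selected=None,
                      list_ids=list_container_ids, run=exec_in_container):
    """Run `command` in all containers, or only in those in `selected`.

    Returns a dict mapping container ID -> exit status.
    """
    results = {}
    for ctid in list_ids():
        if selected is not None and ctid not in selected:
            continue
        results[ctid] = run(ctid, command)
    return results


# Demo with stand-ins so the sketch runs on any machine.
log = []


def fake_list_ids():
    return [101, 102, 103]


def fake_run(ctid, command):
    log.append((ctid, command))
    return 0


results = update_containers("yum -y update", selected={101, 103},
                            list_ids=fake_list_ids, run=fake_run)
print(results)
```

On a real host node, calling update_containers with only the command argument would use the vzlist and vzctl wrappers instead of the stand-ins.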