Sep 1, 2014

Linux Containers - Server Virtualization 2.0: A Game Changer!

Summer Greetings!

This blog is about Linux Containers and how they will once again change server computing by taking us back to the roots of running a single OS per server! First, let me quickly walk through the history of Linux Containers and how the technology evolved over time.

Linux Container History:

  1. 1979: chroot was introduced in Version 7 Unix
  2. 1982: chroot added to BSD by Bill Joy
  3. 2000: chroot expands into the jail command in FreeBSD
  4. 2005: Solaris introduces Zones
  5. 2008: Linux introduces LXC (a user-space tool to create containers)
  6. 2013: namespace support completed in Linux kernel 3.8
  7. 2014: containers gain popularity; toolkits like Docker, built on top of LXC, make it easy to template/automate apps inside containers

1) Why Server Virtualization? (2000: Unix)

Let us first take a look at why servers are virtualized using hypervisors like ESX (VMware) or Xen/KVM. The primary reason was "under-utilization" of servers. If I remember correctly from my Sun days, utilization was around 30-40%! The secondary reason was the entry of the Windows Server OS and its memory leak problems on servers.

The hypervisors helped to fix both problems:
a) Run many guest OSes per server, thereby running many applications
b) If there is a problem with the Windows Server OS, safely reboot it without affecting other applications

Sun did have an answer for under-utilization: Solaris Zones. Unfortunately, Sun lost its way, and Linux had a very poor implementation of containment features around 2005. So hypervisors became the only way to increase server utilization, which resulted in huge success for VMware, followed by EC2 at AWS.

2) Server Virtualization 1.0 (2004: Hypervisors)

VMware did a good job of increasing server utilization by enabling many OSes to run per server, fully utilizing the server hardware through hypervisor technology (ESX). This success resulted in other hypervisor products such as Xen, KVM and Hyper-V.

Here is an overview of server virtualization using a hypervisor:

OK. Now let us look at the issues with this approach.

a) CPU
It's slow.

Execution goes through two OS layers before apps reach the CPU. Naturally, products like app servers and databases are not suitable for this type of environment. To fix this, a technique called "paravirtualization" was invented to run OS system calls directly on the hardware. Here is the problem: the OS needs to be modified to support the underlying hypervisor. This creates OS incompatibility: we were handling three flavors of RedHat 6.4 (bare metal, full virtualization and paravirtualization)!

Then there are fundamental problems like clock issues in RedHat running on EC2 in paravirtualization mode! Once, EC2-S3 communication failed because of a clock/NTP problem. Can you believe this type of clock issue in 2014? But that's the reality of running too many abstractions!

b) File System
It's slow. IOPS are terrible in a hypervisor environment. In this masala-curry type environment, we need to create distributed file systems to share data across multiple VMs, which adds another level of complexity!

c) Network
Web products are fundamentally distributed systems. They speak many protocols like HTTP, JDBC, AMQP, etc. over TCP to form a single application. This type of super-abstraction naturally slows down packet flows!

Finally, for each VM we duplicate libraries! You can see this in the picture above. Each VM is fat, in the range of a few GBs, so moving a VM around is also slow. More importantly, since VMs are fat, we can pack fewer applications into a server.

Sure, from the CIO's perspective, we got maximum server utilization. But how? By running an OS on top of an OS and duplicating libraries!!! It's simply a low-performance, high-cost environment.

Note: Hypervisors are based on emulating virtual hardware

3) Server Virtualization 2.0 (2014: Linux Containers)

Remember Sun trying to improve utilization using Zones? The same concept is now available in the Linux kernel, mainly thanks to all the parties, including Google, agreeing to create a single set of containment features inside the kernel (unlike the KVM/Xen split). It is now officially called Linux Containers. A Linux container basically allows many applications to run in a single OS on a server. Server utilization and isolation/security are provided with high performance and less abstraction!

Container Ecosystem:
Docker (toolkit) -> LXC (user space) -> Linux Container (namespaces + cgroups)


The two primary Linux containment features are cgroups (resource limits) and namespaces (isolation).
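To make the cgroups half concrete, here is a minimal sketch in Go (my own illustration, not part of any toolkit): it caps a group of processes at 64 MB of memory by writing to the cgroup filesystem. The group name "demo" is hypothetical, and the paths assume the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, as on a typical 2014-era distro; it needs root.

package main

import (
    "os"
    "path/filepath"
    "strconv"
)

func main() {
    // Hypothetical cgroup named "demo" under the v1 memory controller.
    cg := "/sys/fs/cgroup/memory/demo"
    if err := os.MkdirAll(cg, 0755); err != nil {
        panic(err)
    }
    // Cap every process in this group at 64 MB of memory.
    if err := os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"),
        []byte("67108864"), 0644); err != nil {
        panic(err)
    }
    // Move the current process into the group; the kernel now
    // enforces the limit on it and on all of its children.
    pid := strconv.Itoa(os.Getpid())
    if err := os.WriteFile(filepath.Join(cg, "tasks"),
        []byte(pid), 0644); err != nil {
        panic(err)
    }
}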

In simple terms, Linux containers are about grouping processes and assigning them their own network and file system view. This enables many applications to run inside the same server using a single OS. It completely removes hypervisors, dual OSes and duplicated app libraries, and more importantly allows apps to run directly on the hardware instead of going through system call translations.
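And here is the namespaces half, again as a hedged sketch in Go rather than anything Docker-specific: it starts a shell in fresh UTS (hostname), PID and mount namespaces, which is exactly the "grouping of processes with their own view of the system" described above. Linux-only, and it needs root.

package main

import (
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("/bin/sh")
    cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
    // Ask the kernel for fresh namespaces: hostname (UTS),
    // process IDs (PID) and mount points (NS).
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS |
            syscall.CLONE_NEWPID |
            syscall.CLONE_NEWNS,
    }
    if err := cmd.Run(); err != nil {
        panic(err)
    }
}

Inside that shell, echo $$ prints 1 (it is PID 1 of its own namespace), and changing the hostname is invisible to the rest of the machine. Toolkits like LXC and Docker wrap exactly these kernel features with image management and automation on top.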

Developers can develop, test and run in the same environment. There are no OS incompatibility issues, which makes for faster software development.

Here is the overview:

You can see it's simple! One OS and one set of libraries for all the applications. So we can pack more apps, with higher performance and lower cost.

Does it make sense? Now the question is how real Linux Containers are. The short answer is that everything at Google runs on Linux containers: the search engine, Gmail, etc. Google engineers say that 2 billion containers are created per week! The following picture illustrates the evolution of containers at Google:

Linux containers are already in production. The Linux community has agreed to create one container technology inside the kernel. Toolkits like Docker are helping to create containers. It's real.

Note: Containers are based on sharing the operating system kernel

Summary

Simplicity always wins in the computer industry. Linux containers will go mainstream in the next few years. We will see more cloud providers switching from the hypervisor model to a Linux container based model. It's all about high performance and low cost. The Linux container paradigm has just started: design your systems specifically to take advantage of containers!

There is no question that legacy hypervisor-based virtual machines will exist for a while; we haven't even seen the end of life for mainframes! The growth of hypervisor-based virtual machines will stall first and then decline. You can already see that VMware has started supporting containers, and it will be interesting to see how they position them against ESX, their cash cow.

High Performance Technical Computing and High Performance Business Computing will jump to container technology first, as it helps big time! (Google running everything on containers is a good example.)