This is a basic introduction to Docker, based on my personal experience and on what I have learned from online research.
What is Docker?
To understand what Docker is, you first need to understand what containerization is. I am going to use an example to explain some concepts first. Let's say you are relocating to a new place. You start to pack your stuff into different boxes, or containers: one container holds plates, another coffee mugs, another electrical appliances, and so forth. Once all the packing is done, you put these containers in a vehicle and transport your stuff to the new place.
The act of putting stuff in containers and transporting them is known as containerization. You cannot mix your plates with your electrical appliances, but you can put them in separate containers and transport them all at once in one vehicle.
The same applies to operating systems: two of them cannot both boot at the same time when a computer is powered on. With virtualization, however, you can keep Windows as your base (host) operating system and run Linux inside a virtual machine, so a single laptop or computer runs two operating systems. Back to our moving example, let's say one of the containers is a cooling container for keeping meat fresh. For this container to work, it must draw some of its fuel from the vehicle transporting the containers, so the cooling container is sharing resources with the vehicle. This is not efficient, and the same is true of virtual machines (VMs). Each VM runs a complete guest operating system that takes its share of the host's CPU, hard drive space and RAM, which is inefficient, especially if the VM needs to run heavy processes.
Now imagine that the cooling container has its own source of fuel and all it needs is to be placed on the transporting vehicle. In the end you have the plates container, the electrical container and the cooling container all being transported by one vehicle.
From the above example we can establish that a container is used to enclose and hold something, whether for storage, packaging or transportation, and that containerization is a system of transporting containers. With that in mind, think of application containerization as enclosing or packaging the files and libraries needed to run a piece of software. This brings us to Docker. In a nutshell, Docker is software used to containerize other software or applications. It is developed by Docker, Inc. and is open source.
How Docker works
Using our delivery truck example, the image below shows how Docker works in a nutshell:
The image below is more technical and shows how Docker works on a real machine:
As shown in the diagram above, Docker shares the host's Linux kernel, which lets all the containers run on that same kernel. Using our delivery analogy, think of the Docker container for HDP Sandbox as the cooling container that requires its own fuel to cool the meat. The Docker container for HDP Sandbox runs CentOS and includes Java (OpenJDK) and other applications such as Hive, Ambari, Spark and YARN, to mention just a few. Its own operating system is CentOS, while the base operating system is Ubuntu 16.04.4 LTS; the container brings CentOS's libraries and tools but not a kernel of its own. In our hypothetical analogy, the cooling container needs petrol for its cooling engine, whereas the delivery truck runs on diesel. Similarly, the Docker containers for Elasticsearch and Spark have their own needs, separate from and independent of those of the HDP Sandbox.
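One way to see the "own operating system" idea for yourself is to compare the host's `/etc/os-release` file with the one inside a container. The Python sketch below is a minimal illustration, not part of the setup described above: it assumes the `docker` command-line tool is installed and can reach a running daemon, and it uses the public `alpine` image only as an illustrative default (a CentOS-based image would work the same way). The helper names `parse_os_release` and `container_os_release` are hypothetical.

```python
import subprocess


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip("\"'")
    return info


def container_os_release(image: str = "alpine") -> dict[str, str]:
    """Read /etc/os-release from a throwaway container (assumes the docker CLI)."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, "cat", "/etc/os-release"],
        capture_output=True,
        text=True,
        check=True,
        timeout=120,
    )
    return parse_os_release(result.stdout)


if __name__ == "__main__":
    try:
        # /etc/os-release exists on most modern Linux hosts, not on Mac/Windows.
        with open("/etc/os-release") as fh:
            host = parse_os_release(fh.read())
        inside = container_os_release()
    except (OSError, subprocess.SubprocessError) as err:
        print(f"could not compare: {err}")
    else:
        print("host OS:     ", host.get("PRETTY_NAME", "unknown"))
        print("container OS:", inside.get("PRETTY_NAME", "unknown"))
```

On an Ubuntu host this would typically print an Ubuntu name for the host and an Alpine name for the container, even though both run on the same kernel.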
So Docker has containerized these applications down to the basics: share the host's kernel and bring your own operating system. Because each container needs relatively few resources, more containers can be added to the same machine. In addition, because the Docker engine lets applications run directly on the host's kernel instead of booting a separate one, the applications start faster than they would in a VM. Lastly, because these applications carry their own operating system, they can run anywhere as long as Docker is installed on the host (base) operating system. Again, using the delivery analogy, if the new place you are moving to is abroad, the delivery truck delivers the containers to an airport, and from there they are transported by airplane.
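To check the claim that containers share the host's kernel, you can compare the kernel release reported on the host with the one reported inside a container. This Python sketch is illustrative only: it assumes the `docker` CLI is installed with a reachable daemon, uses `alpine` as an arbitrary small image, and the helper names `container_kernel` and `shares_host_kernel` are hypothetical.

```python
import platform
import subprocess


def container_kernel(image: str = "alpine") -> str:
    """Return `uname -r` as seen inside a throwaway container (assumes the docker CLI)."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, "uname", "-r"],
        capture_output=True,
        text=True,
        check=True,
        timeout=120,
    )
    return result.stdout.strip()


def shares_host_kernel(host_release: str, container_release: str) -> bool:
    """True when the host and the container report the same kernel release."""
    return host_release.strip() == container_release.strip()


if __name__ == "__main__":
    host = platform.release()  # same value as `uname -r` on the host
    try:
        inside = container_kernel()
    except (OSError, subprocess.SubprocessError) as err:
        print(f"could not start a container: {err}")
    else:
        print(f"host kernel:      {host}")
        print(f"container kernel: {inside}")
        print("same kernel" if shares_host_kernel(host, inside) else "different kernel")
```

On a Linux host the two lines should match, because the container has no kernel of its own. On a Mac or Windows machine they normally differ: Docker runs Linux containers inside a lightweight Linux virtual machine, so the container reports that VM's kernel rather than the host's.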
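If you put the Docker containers for Spark, HDP Sandbox and Elasticsearch on a Mac or Windows machine, they will run the same way they would on a Linux machine. On those hosts, Docker runs the containers inside a lightweight Linux virtual machine that supplies the Linux kernel they need.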
In conclusion, Docker is software used for application containerization. It is not the only containerization software: CoreOS released Rocket, and Microsoft is working on its own containerization software called Drawbridge.
You might then be wondering how these containers are built and, once built, how they are managed. In future articles I will look at how I use Docker and what I use to manage the containers. Back to our moving analogy: at the airport, planes are managed so that they land on the right runway, avoid collisions and take off at the right time. Docker containers are similar: they need to be managed so that those that need to communicate with each other can do so. Therefore, in future articles I will also look at Kubernetes, a tool I use to manage Docker containers.