
nschoe's labs - Docker: Taming the Beast - Part I


Docker is one of those things that, one day, started popping up in forums, in IRC channel discussions and on Hacker News, making a buzz; yet I never took the time to find out what it was really about.
Of course I don't live in a cave: I had heard of it, and whenever I read something that talked about Docker, ideas like software containers, sandboxes and virtual machines lit up in my mind, but that was about it.
And then one day I wanted to learn Docker, and I immediately faced a problem: everybody knew about Docker, everybody talked about Docker, but nobody was actually explaining Docker anymore. It was as if it was so obvious that the only topics left were Docker implementations, people arguing that Docker did not bring anything new, that containers had been around for decades already, etc.
Furthermore, the Docker landing page itself was not much more verbose:

This is very abstract, and for somebody who missed the train, it was not that easy to find some useful explanations. The only thing I knew was that Docker dealt with containers, which, in my mind, was assimilated to a sandbox (in particular, I come from the Haskell world, so the closest analogy I had was the cabal sandboxes).
All of that sounded like magic to me: run a software in a completely enclosed environment (whatever that means), without affecting the computer. The idea seemed very attractive to me, even more so because I had recently switched to nixOS, a Linux distribution built around the nix package manager, where this concept of isolation is paramount.

In other words, my goal in this first post is to spare you the trouble of fishing for information about Docker, to help you understand it, as well as to share what I learned about Docker, of course! Five days a week I'm hanging out on the #docker IRC chan (I'm nschoe by the way, if you want to come and say hello :-)) and many times I see newcomers asking questions that clearly indicate they are missing some very important base notions: if you're in that case, I'm here to help you in this series of articles. Let's fix that together!

First, I just want to clarify that Docker (and more generally software containers) is a complex topic, one that goes all the way down to the Linux kernel. I'm not a Docker developer and I'm not an expert in the field. I just like to understand, and I work better if I really get what's going on under the hood.
I'll do my best to transmit what I understand of all of this, but if you spot any mistakes, please don't hesitate to shoot me an email to let me know, and I'll fix it.
Likewise, if you're a newcomer and think I missed a point or left a gray area, tell me so I can fill in the gap!

In this first post, we won't play with Docker directly, we won't issue any docker commands, but we will understand the basics, the concepts at work under the hood. Don't worry, we will start playing with Docker right from part 2!
Trust me: Docker is not an easy thing, and you will benefit from reading some theory about it.


It is always better to have an idea, however vague it may be, when you want to learn something. So let me give you some intuition about Docker right now, so that we have the rest of the post to actually explain it.
But before I give you a sentence (which would not make much more sense than the previous quote), let me give you some facts, in no particular order:

- Docker is free and open source (which is awesome); if you missed it before, the website is here.
- Once you have an idea of what Docker is, its documentation is fairly well written and understandable; it's available here.
- Docker can mean many things; if you check the documentation's home page, you will see:
  - Docker Engine
  - Docker Swarm
  - Docker Compose
  - Docker Hub
  - Docker Trusted Registry
  - Docker Cloud
  - Docker Machine
  - Docker Toolbox

  That's a lot! Don't worry, we'll deal with each of them in our articles.
  What you are interested in, and what Docker should mean to you right now, is Docker Engine. Unless stated otherwise (or plainly obvious!), from now on, when we talk about Docker, we mean the Docker Engine.
- Docker is a software, meaning it's not a library, it's not a web app. It's a command-line software.
- Docker allows you to create containers (the divine word!); as unsatisfactory as it sounds, at the moment, think about containers as software environments: an isolated (we will see how) environment (i.e. set of tools, files, environment variables, etc.) in which your software lives.
- This will be the only time when I will tell you this, but for now (and only for now), you can think of containers as a sort of virtual machine. But this analogy is dangerous because people tend to stick to it, and Docker is not a virtual machine, at all. So keep this analogy fresh, and don't store it in your long-term memory.

And so, now for the sentence; even if it doesn't clarify much, we can say that:

That's it for now; let's keep reading to make sense of all of this.

Docker is a complex tool, and it can be useful in several scenarios. Let's quickly examine some of them (I don't pretend to list them all) and see if you find some similarities with your situation.

Easy Portability

This is easily one of the most common use cases I can find for Docker.
Docker allows you to set up (we'll even say build) your environment as you wish, and this makes it very portable. To save testing and development time, people can focus on a specific environment, and distribute their binaries with the Docker image (we'll see what this is a bit later; for the moment, think of an image as the environment we keep talking about).
This way, the target customer only needs to install Docker, and the image will run smoothly (whether the customer runs the same Linux distribution as you, Windows or OS X).
Keep in mind that Docker runs on the Linux kernel, so it's a simple matter of installing Docker if your target runs Linux, but it's currently a bit more complex under Windows and OS X (namely, the customer installs Docker inside a virtual machine running Linux).

Anyway, that's a common use case for Docker.

Testing Your Software Under Several Environments

Another popular Docker use case is testing.
This is very popular because Docker images can be very easily instantiated and disposed of. When developing a software, it's a very delicate task to make sure that it will work on your customer's system configuration. Does he use the same library versions? Does he have the same tools as you? Does your Web App still work with PHP4 and PHP5? Does your Haskell project compile with older GHC versions? Etc.

Docker is very useful here because you can create an environment with PHP 4 or GHC 6 installed and test your installation inside it, without messing with your system.

I know, I know.
I anticipate that some advanced users may be thinking right now that I missed an important point about Docker. The use case I just described is somewhat controversial: some people think that Docker is not the best tool for this.
There are other solutions for this, such as the nix package manager, on which the nixOS Linux distribution is based.
I know. It just happens that there are a lot of people using Docker for this, and despite being a huge fan of nix and nixOS, I still think it is worth mentioning.


Continuous Integration and Continuous Delivery

I know that C.I. and C.D. are not the same thing, but they can be grouped together for this section's sake. If you plan on setting up a C.I. development environment, Docker can be very easily combined with Jenkins (see here for quick access).
You can set up git hooks that will start your Jenkins pipeline inside a Docker container and push your code if the tests pass.

Save Money on Infrastructure Without Compromising Security

If you're in the SaaS (Software As A Service) business, you have several clients running the same service you offer. And if you're not too bad, you care (a lot) about security, and in particular about making sure your clients' instances are well separated and as independent as possible. For instance (this is just an example), you might spawn several servers of your custom app, and if one crashes or gets compromised, you'd like to reduce the risk of your other clients being infected.

Full Separation of Services

Typically, your service is based on several softwares: you may have an nginx running as an HTTP server to serve your Web App files, you may have a WebSockets server, you may have a postgreSQL database storing your clients' information. All of these are separate entities that may crash or need to be updated separately.
Docker allows you to containerize each of these services, making them isolated and independent, while controlling precisely the way they communicate (understand: choose and control which ports and/or UNIX sockets they talk with one another).

Docker, through the use of Docker Swarm (this is for another post), can quite easily help you set up a High Availability server/database, to prevent one crashed node from ruining your entire setup.
Or you can quickly and easily set up a load balancer between several nginx instances to relieve your over-crowded server, and/or smoothly take down a server while replacing it with another, possibly updated one.
Or you can set up a computing cluster and use Docker (Swarm) to easily spread your computing power.

As you see, Docker has many, many possible use cases, and surely I am still missing some very interesting applications. Anyway, before we dive deeper into Docker, it's important to keep our head cool and not mistake Docker for what it's not.

Now that we have seen some cool Docker features, let's give some warning words and see what Docker is not.

Docker is not a virtual machine.


This one may be the most difficult concept to grasp for newcomers: Docker-is-not-a-virtual-machine. It doesn't do virtualization.

The problem is that every article that talks about Docker also talks about virtual machines and makes analogies. In this article, I will try to avoid talking about virtual machines as much as I can, and make you forget about that parallel.
A bit later in the article, we'll see some differences between the two.

Docker is not magic.

Too often I see newcomers come on the #docker IRC chan asking for quick instructions to achieve X, Y and Z. Docker makes it relatively easy to use very complex notions such as overlay file systems and control groups, but that doesn't mean it's trivial.
One particular point on which I insist is that Docker does not replace system comprehension. If you find something difficult to understand, do not think that Docker is the solution. More often than not, Docker will actually add a layer of complexity, and it might not be clear whether the problem comes from the application or from the fact that it's dockerized.
Two days don't pass without someone coming on the #docker IRC chan and asking why his postgreSQL data were lost when he recreated the container, or why the Docker image he is building takes half his disk space.
In a word, you need to understand the underlying concepts before using Docker. I have spent hours in the Docker documentation, and I still have doubts every time I use Docker for something new: am I really good enough in XYZ to replace the common method with Docker containers?

In case that was not concrete enough: don't try to containerize a postgreSQL database if you're not familiar with postgreSQL (can you do common administration tasks with psql? Do you know how to create, list, alter and delete tables? Do you know how to save and backup your postgreSQL data?).
Do not try to set up a containerized front-end nginx if you don't know the basics of nginx configuration. I've paid this particular price myself and was lucky enough to find someone on #nginx to help me with that.
Don't try to isolate your compilation chain if you can't write basic Makefiles.

Well, you get the idea :-)

While this part is technically not needed to start using Docker, I will talk about some fundamental concepts about Docker and how it works.

We'll talk about what's under the hood of Docker and how it makes the magic happen.

In order to get rid of false ideas and our intuition (which, in this case, is most likely playing us), we'll first talk about how Docker does not work.


As I said before, Docker is not a virtual machine; let's see how one roughly works, then.

A virtual machine, as its name implies, is like a real machine, only it's virtual, as in inside another machine. It behaves very much like a real, fully-fledged machine (computer).

When using a virtual machine, you generally create a virtual hard disk, which is a big file on the host's filesystem; you allocate some of the host's RAM and video memory to the virtual machine, by making it accessible through special, low-level routines.
On the virtual hard drive, you install a complete operating system (Windows, Linux, etc.) from scratch. It means that from the point of view of the installer inside the virtual machine, it's like it's really writing to a disk (only it's a file on the host, but it -the installer- doesn't know it). But that doesn't change much: you still partition it, you still create filesystems on the partitions (FAT, ext3, ext4, etc.).
Then you typically write the MBR in the first few sectors of this file (or now you write your UEFI loader in the correct partition), and when you start your virtual machine, it reads the host's file as its hard drive, reading the partitions, the OS bootsector, etc.
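To make the "virtual disk is just a big file" point concrete, here is an illustrative sketch with QEMU (one common hypervisor; this is my example, not something the article prescribes, and installer.iso is a placeholder for whatever OS image you have):

```
# Create a 20 GB virtual disk: on the host, it is just a (sparse) file.
qemu-img create -f qcow2 disk.qcow2 20G

# Boot a VM with 2 GB of RAM, using that file as its hard drive and an
# installer ISO as its CD-ROM; the guest OS partitions and formats the
# "disk" without ever knowing it is a file.
qemu-system-x86_64 -m 2048 -drive file=disk.qcow2,format=qcow2 -cdrom installer.iso
```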

Since your virtual machine is only reading the hard drive's data and executing its instructions, it can basically run anything, and in particular it doesn't matter what OS you install and run. This is the big strength of virtual machines. And with CPUs' virtualization features, you can even run different processor architectures.

Let's come back quickly to the virtual hard drive: as I said before, the virtual machine uses a (big) file on the host's filesystem. When the OS inside the virtual machine writes data to disk (e.g. creates a file), it calls low-level kernel routines (drivers) to write the data. In a real machine, these kernel routines would call the hard drive driver and actually write the data to the physical disk.
In a virtual machine, this process is sort of hijacked: when the OS calls the low-level routines to write to disk, the virtual machine software catches these calls, writes the data to the host's (big) file serving as the virtual hard drive, and sends back the appropriate answer to the virtualized OS.

This is very simplified and I'm sure some specialists are hating me right now, but I think it's enough to understand the concept. And most importantly, to understand the differences with Docker containers.

So everything we just saw is roughly how a virtual machine works, and it's really not how Docker works. This is important: you have to get this, or at least remember that this is different.

Now we are getting serious; Docker relies on several features to make the magic happen, and most of them have to do with the kernel.
To summarize quickly, the kernel is the core of Linux (and of any OS, for that matter, but Docker runs on Linux, so for now I'll mostly talk about Linux; if you are on OS X or Windows, don't be sad and keep reading, all of this still applies. We'll see how Docker works for these OSes).
By core I mean that very low-level routines are implemented in the kernel: drivers to communicate with external peripherals, process scheduling, filesystems, etc. Everything that makes the OS work is in the kernel.
As you can see, this is an important, crucial and heavy part of an OS.

In a running OS, there are generally two lands, or spaces: the kernel space (a.k.a. kernel land) and the user space (a.k.a. user land). You can see these spaces as two levels -or rings- of privileges.
There are a small number of programs running in privileged mode (i.e. in kernel space); these are called system calls: stat, splice, etc., as well as device drivers. These programs have complete, unrestricted access to everything: this is why there are only a very few of them.

All other programs -the ones you use: web browsers, terminal emulators, mail clients, etc.- are all running in the user land, with unprivileged, restricted access. These are the user programs; this is why this is called the user land.

Most of the magic behind Docker relies on this notion of user lands. To put it simply, each Docker container creates its own user land, separated from the host's and from the other containers'. A bit as if you booted your computer a second time, on the same kernel.
We'll explore how this is achieved in the following sections.

So what are those namespaces? We keep hearing about them, but what are they?
To answer this question, we need to have a little understanding of how the Linux kernel works. Especially about processes.

Processes are instances of a program. When you run a program, it creates a process. Now there are some programs that create several processes; you can think of it as if the program itself launched other small programs. Note that I am not talking about threads; threads are another beast. A single process can spawn several threads to make use of multithreading, to parallelize operations. Here, I am talking about processes, programs.

So at any given time, there are a lot of processes running on your computer. To get an idea, try running ps -A | wc -l. It will return a number that is close to the number of processes running at that time. Right now, I have 149 processes running.

It is important to have an understanding of how these processes interact with each other.

Let's play a little bit. Launch your favorite terminal; it should present you with a shell. For the vast majority of people, this will be bash (I use zsh myself, but the principle will be exactly the same).
Now that you are in your shell, run touch test to create a file named test. Then run tail -f test. What this does is launch the program tail in follow mode, which means that it keeps watching the content of test (currently empty) for new output.


We don't really care about that; all that we care about is that tail won't terminate: it will keep running.

Now open another terminal; we will try to see what's happening. As you probably already know, ps is what we can use to see the running processes. We will format its output a bit so that it is more readable. Run ps -Ao ppid,pid,user,fname. This launches ps to print a snapshot of the currently running processes, and formats the output to display, in this order: the parent PID, the PID, the user who executed the process, and then the process name.
It should return a pretty long list, but toward the end, you should see something like this:

```
  ...  7379 nschoe   zsh
 7379  8308 nschoe   tail
  ...  8862 nschoe   ps
```

Remember that the left-most column is the parent PID, and the second is the PID. Here we see something interesting: the zsh process has PID 7379, and the tail process has parent PID 7379. Your numbers will be different from mine, but you will still have those two numbers equal.
This is a very basic and very important notion of processes: a process can have child processes, and processes (the children) can have parent processes. This is pretty easy to understand: when we launched tail from our bash/zsh shell, it created a new, child process.
That's one important concept. Let's see a new one immediately: go back to your first terminal, the one from which you ran tail, and hit CTRL + C.
Now launch this command: tail -f test &. The & sign that we appended means that the command we launched, tail, runs in the background. Indeed, you can see that your terminal is now available, even though tail is still running.
Run ps -Ao ppid,pid,user,fname again:

```
   ...  12242 nschoe   zsh
 12242  13267 nschoe   tail
   ...  13346 nschoe   ps
```

Now from that terminal, hit CTRL + D. It is possible that it answers with something along the lines of:

zsh: you have running jobs

in which case, hit CTRL + D again. It should quit the terminal. Now that it has exited, let's run our ps command again (in a new terminal):

```
   ...  15131 nschoe   ps
```

There will be plenty of output of course, but toward the end, where we usually saw tail and bash/zsh, we don't see them anymore. Let's grep to make sure: run ps -Ao ppid,pid,user,fname | grep tail — you should see nothing.

This is a second, very important concept of processes: when you kill the parent (which we did by killing bash/zsh), the child process is generally killed too. How does it work and how can this be possible?
To create the child process, the parent process generally calls fork, which creates the child process (this is a basic summary, but it will be enough to understand Docker). fork returns the PID of the child process.
When we kill the parent process by sending it a SIGNAL, the parent can (and should) forward that SIGNAL to its child(ren): this is how the parent process can kill its child process.
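You can watch this parent/child bookkeeping from a shell without writing any C; a minimal sketch (my illustration, and the PIDs will of course differ on your machine):

```
# The shell fork()s a child to run sleep in the background;
# $$ expands to the shell's own PID, $! to the PID of the child
# that fork just returned.
sleep 300 &
echo "shell (parent) PID: $$"
echo "child PID returned to the parent: $!"
ps -o ppid,pid,user,fname -p $!   # the child's PPID is the shell's PID
```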

When a child process dies (either because its parent forwarded it a SIGNAL or because it received a SIGNAL itself), what really happens is that its return code is set to a special code, called EXIT_ZOMBIE (this is the real, actual, official name!). At that point, the process still technically exists (it takes a slot in the maximum number of processes, etc.), and a signal called SIGCHLD (for SIGnal CHiLD) is sent to the parent. This signal basically tells the parent process that its child just died, and that it should do something about it. The parent then must reap the dead child process. Then, and only then, does the child process cease to exist.
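You can even watch a zombie appear with a small shell trick (my own illustration, not one of the article's experiments): create a child whose parent stays alive but never reaps it:

```
# The subshell forks 'sleep 1', then replaces itself with 'sleep 30'
# via exec. sleep never calls wait(), so when 'sleep 1' exits after a
# second, it sits in the EXIT_ZOMBIE state until 'sleep 30' dies.
(sleep 1 & exec sleep 30) &
sleep 2
ps -Ao pid,ppid,stat,fname | awk '$3 ~ /^Z/'   # STAT 'Z' marks zombies
```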

What if the parent never gets a chance to reap the child process? Well, we will emulate this behavior: open a terminal, and run this command: nohup tail -f test &. As before, the & will make tail run in the background. The nohup directive here prevents the parent (bash/zsh) from forwarding the SIGNAL to its child.

Now run ps -Ao ppid,pid,user,fname:

```
   ...  16127 nschoe   zsh
 16127  16950 nschoe   tail
   ...  16959 nschoe   ps
```

We're getting used to it: we can see that zsh has PID 16127, and that tail has PID 16950 and parent PID 16127, which is zsh. Classic.


Now hit CTRL + D. Your terminal will most likely complain with something like zsh: you have jobs running. Ignore that and hit CTRL + D again; it should work this time, and your terminal should exit, as before.

Now, let's see what happened to tail, with ps -Ao ppid,pid,user,fname:

```
     1  16950 nschoe   tail
   ...  17003 nschoe   ps
```

First, we see that zsh doesn't appear anymore, which is normal, because we killed it. But interestingly, we can see that tail still exists! It was not killed. We know it's the same tail process, because it has the same PID (and even though PID numbers can be re-used, in this case it really is the same one!). Even more interestingly, we can see that tail's parent PID is now 1. And this time, you should have 1 too.

This is another key concept of how Linux processes work: there really is one process to control them all. In Linux, there is always a top-most, parent-most process, called the init process. It used to really be called init, but it's very likely yours is called systemd now. You can see it with ps -o ppid,pid,user,fname -p 1 (be sure to remove the A). It should return something like:

```
    0      1 root     systemd
```

Your first three columns should be identical: we displayed the process with PID 1. This process is launched by root and has no parent (hence the 0 parent PID). What's susceptible to change is the name of the process. Most likely you should have systemd, but it's still possible that you have init. Anyway, the very first process in a Linux system is always the init process. And there can be only one.

This process is the first one launched when the system boots, and the last one killed when the system shuts down. This is its first role.
Its second role is precisely what we've just seen: it becomes the parent process of children who do not have a parent anymore. This is what happened to our tail: we killed bash/zsh (and prevented the forwarding of the SIGNAL), so tail became orphaned (this is also the correct, official term); systemd/init adopted it and became its parent.

We can always go back to the first process, the init process: run ps -Ao ppid,pid,user,fname again and pick a PID, whichever you want; for me, it will be PID 6194. It has parent PID 5318. Now I'll display information about the parent process: ps -o ppid,pid,user,fname -p 5318 (use your parent PID).
It returns 1067 5318 nschoe zsh. So the parent was zsh. This zsh has parent PID 1067; let's print its information: ps -o ppid,pid,user,fname -p 1067. It returns 1 1067 nschoe urxvtd.
And here we are! The parent process is urxvtd (this is my terminal emulator; yours might be gnome-terminal or konsole for instance), and this time, its parent is PID 1: the init process.
All processes have systemd/init as a distant parent, be it the direct parent, the grandparent, the great-grandparent, etc.

What Does it Have to do With Namespaces?

You thought I had forgotten?
We haven't been avoiding namespaces; actually, we have been laying the bricks to understand them. Keep in mind everything we have seen about child and parent processes, and about the init process, as it will be useful in a minute.

Now, there is something you and I have been doing for some time now, which will be crucial to understanding Docker containers and isolation: we have launched several terminals and several programs (like tail for instance). Then we have run ps, which allowed us to observe (spy on, really) other processes. And with kill we have, well, killed other processes.

And believe it or not, this is the key to understanding all that: we have made processes interact with each other. Which is fabulous, because it allowed us to do everything; and which is a disaster, because it means that if we have some process that we want to keep running, others could kill it, or inspect it: the opposite of isolation!

Well, all of this is possible because all of these processes run in the same namespace. To put it simply, we can consider that a namespace is an isolated process tree. Here, we have one init process, which is the (more or less distant) parent of every other process running: it defines one namespace. One key concept of Docker, and of process containerization in general, is to create separate namespaces.

A typical namespace, like the one you have right now on your computer, looks like this:

```
1 init (systemd)
|-- 6728 zsh
|    |-- 6729 tail
|
|-- 7839 firefox
     |-- 7840 firefox (tab 1)
     |-- 7841 firefox (tab 2)
```


The top-most process has PID 1 and is the init process (most likely called systemd on your machine). This init process has direct children: here, zsh with PID 6728 and firefox with PID 7839.
zsh and firefox have children of their own, as you can see. The figure above forms a tree.

Now what happens with containerization and Docker? Well, if you want to run isolated processes, the first thing you need is for these processes not to be able to do what we have been doing up until now, i.e. spy on other processes and interact with them. You need to completely isolate them.
The way it is done is that we create a second init process tree, i.e. a second namespace.
Let's say we want to containerize nginx, a web server. Nginx is started from a shell, bash for instance. We'd like bash and nginx to be isolated from the rest of the system, so we have to make them believe they are in their own namespace. So based on what we've seen so far, they need their own init process. In this case, bash can be the PID 1 init process, and it will be nginx's parent process.

But of course, we actually have only one machine (the host) and one operating system (our Linux distribution), because we are not running a virtual machine. So whatever programs we launch (that includes bash and nginx), they will be child processes of the real PID 1 init process, the one running on our system, i.e. systemd. Here is how the process tree will look:

```
1 init (systemd)
|-- 6728 zsh
|    |-- 6729 tail
|
|-- 7839 firefox
|    |-- 7840 firefox (tab 1)
|    |-- 7841 firefox (tab 2)
|
|___________________________________________
|                                           |
|  isolated process tree                    |
|                                           |
|  |-- 8937 (1) bash                        |
|  |    |-- 8938 (4539) nginx               |
|  |         |-- 8939 (4540) nginx (worker) |
|  |                                        |
|___________________________________________|
```

We recognize the first items of the tree: we have our machine's PID 1 process, init. It started zsh and firefox as it previously did, and they have started child processes themselves.
Now the new part: we have our isolated process tree -or namespace- which I have artistically ASCII-art-decorated. In this isolated tree, bash is the init process, with PID 1 (the number enclosed in parentheses). This bash started another process, nginx, which has PID 4539. Nginx is usually comprised of a core process which reads its configuration and creates children as needed to handle requests; here it created a child -called a worker- whose PID is 4540.

When we are more advanced in Docker, we'll come back to this and actually see it for ourselves; but for now, believe me when I say that if we logged into this isolated environment and ran ps, we would only see this.

But the facts are: all these bash and nginx processes (everything that is part of the isolated process tree) actually run on the host Linux system, right? They are classical processes, and they must have PIDs. This is the number I wrote before the parentheses. This extremely important and useful feature, which allows a process to have several PIDs, was introduced in version 2.6.24 of the Linux kernel, in 2008!

So this is what we are talking about when we mention namespaces: nested sets of process trees from which processes can't escape. From inside the isolated process tree, you cannot observe processes outside of it by running ps, and you definitely can't kill them with kill. This is the first step of program isolation, which Docker uses.
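You can actually get a taste of this right now, without Docker, using the unshare tool from util-linux (my illustration; the article itself defers hands-on experiments to part 2):

```
# Create a new PID namespace and run bash inside it (needs root).
# --fork makes unshare fork before exec'ing bash, so bash gets PID 1;
# --mount-proc remounts /proc so ps only sees the new namespace.
sudo unshare --pid --fork --mount-proc bash

# Then, inside that shell:
ps -Ao ppid,pid,user,fname   # shows only bash (PID 1) and ps itself
```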

Why the first step? Why isn't it enough? Well, there still are plenty of ways these isolated processes can interact with the host system: we haven't protected the filesystem, so they can read/write the host's files; they can run very expensive computing operations and take all the CPU and RAM; etc. For now, we have isolated the processes from seeing and directly interacting with each other, but Docker goes even further.

Let's take a little break and enjoy the fact that we can finally put something concrete behind the notions of namespace and isolation.


Control Groups (a.k.a. cgroups) are another feature of the Linux kernel that Docker uses for providing isolation. They solve the problem that we just introduced about computer resources.
Basically, cgroups are a tool to allocate and monitor the resources that a group of processes (for instance, our isolated namespace) can use.

If you are running several isolated process trees, you'd like to control their resource usage: for instance, the first group of processes may be running a critical stack of softwares, so you might want to allocate 75% of the CPU and 80% of the RAM; while your second process tree might be expendable, so limit it to 10% of these resources.

All of this is possible thanks to cgroups.
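To make this slightly more concrete, here is a hedged sketch using the cgroup v2 filesystem interface (my illustration: the article predates cgroup v2, and mount points and controller files vary between distributions):

```
# Create a control group under the cgroup v2 hierarchy.
sudo mkdir /sys/fs/cgroup/demo

# Allow at most 10 ms of CPU time per 100 ms period (~10% of one CPU).
echo "10000 100000" | sudo tee /sys/fs/cgroup/demo/cpu.max

# Cap the group's memory at 256 MiB.
echo "268435456" | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell (and thus its future children) into the group.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```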

I won't go into much detail about cgroups, because they are less essential to understanding Docker than namespaces are. Usually when using Docker you create containers and care about how to make them communicate, etc. Only when you begin running serious stacks of containers do you care about controlling the resources.
But it was still important to talk about cgroups, because when we use the docker stats command later, it will rely on cgroups.

Layers are the other magic component of Docker, and they solve the other problem we talked about: processes can still read and write on the filesystem, somehow breaking the isolation.

Talking about layers with Docker without talking about images and containers would be a challenge, and a pretty useless one in my opinion.
However, I'd like to avoid talking too much about Docker images and containers in this post, because the second article will be about them.
So try to focus on the meaning and not on the details for this part. I hate articles that say "trust me" and obscure things, but in this case I don't really have a choice, otherwise this article would grow in size.
I'll go into detail about images and containers in the next article, and I'll even clarify this notion of layers; but for the moment, focus on the meaning.

If we want to present a truly isolated environment to our processes, we must hide the host filesystem from them and expose a clean, predictable one.

What does this mean exactly?

It means that for our containerized process, the filesystem should look like a fresh install: there must be no /home/nschoe, there must not be files that are used by the host's applications, etc. Conversely, it means that when the containerized process writes a file in /etc/something.conf, it should not appear in the host's filesystem.
The containerized process should even be able to run rm -rf /* without impacting the host's filesystem at all.
How is this magic possible?

Docker makes use of union filesystems.

Union filesystems? What's that?

A union filesystem is a filesystem that makes use of union mounts.

Union mounts?!

Wait wait wait! Put down that weapon, I'm just about to explain that!

Do you know what a filesystem is? Examples of filesystems are ext2,3,4, fat, ntfs, reiserFS, etc. A filesystem is a way to manage data on the hard drive: it handles the notions of files, directories, symbolic/hard links, etc. This might seem trivial, but it's really not.
When you have a big file on your computer, like 10GB big, how do you store it? Do you find a slot on your hard drive big enough to fit your 10GB, or do you break up the file into ten smaller 1GB parts? But then, how do you keep track of the locations of each of these parts? When your OS asks for the size of your file, do you have it stored as meta-data, or do you compute the size by iterating through the blocks each time? Etc.

All of these are handled by the filesystem. If you're running a classic Linux installation, you're very very likely to have either ext3 or ext4 as your filesystem; unless you manually specified another, in which case you probably know what you're doing.

Let's get back to our topic. A union filesystem is not a filesystem in the same sense as the ones I cited previously. It rather relies on one of these, and then implements union mounts. Union mounts are conceptually simple yet very useful: they take two or more directories and present a unified view of them at a specified mount point.
Let's take a simple example: suppose we have two directories, dir1/ and dir2/, each containing files, as such:

```
dir1/
|-- file1.txt
|-- file2.ods
|-- file3.iso

dir2/
|-- file4.mp3
|-- file5.txt
|-- file2.ods
|-- file6.jpg
```

Well, a union mount of dir1/ and dir2/ at mount point /path/to/mnt/ would give:

```
/path/to/mnt/
|-- file1.txt
|-- file2.ods
|-- file3.iso
|-- file4.mp3
|-- file5.txt
|-- file6.jpg
```


You transparently see the contents of both dir1/ and dir2/ at location /path/to/mnt/.
A union mount brings two good features to the table:
- transparently making several directories' contents available at a single, unified mount point (directory)
- shadowing files

This second notion probably answers the question on your lips right now: what about file2.ods?!
If you look closely at the example above, you can see that both dir1/ and dir2/ have a file named file2.ods. So what happens?
Well, it depends on the union filesystem, but most of the time, there is a notion of precedence. In OverlayFS for instance, when unifying two directories, you have the upper and lower directories, and the upper directory takes precedence.
So when there are two files with the same name, the one in the upper directory takes precedence.

Without going into more details, this precedence thing solves one problem (what do we do when two files are named the same?), but it raises another, more subtle problem: what happens if I want to delete file2.ods? Simply removing it won't work, because that would remove it from the upper directory only; there would then be no more name conflict between the two directories, and file2.ods would still be visible, only this time it would be the one from the lower directory. To solve this problem, union filesystems generally use a whiteout file (the implementations can vary). Basically, this means that when you delete a file, rather than physically deleting it, the filesystem simply adds a third layer, which takes precedence, and masks the file to delete.
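If you are curious, you can see both the precedence and the whiteout mechanism in action with OverlayFS (my illustration; it requires root and a kernel with overlay support):

```
# Set up a lower and an upper directory with a conflicting file name.
mkdir -p lower upper work mnt
echo "from lower" > lower/file2.ods
echo "from upper" > upper/file2.ods

sudo mount -t overlay overlay \
     -o lowerdir=lower,upperdir=upper,workdir=work mnt

cat mnt/file2.ods      # prints "from upper": the upper layer wins
sudo rm mnt/file2.ods  # writes a whiteout entry in upper/ ...
ls mnt/                # ...so the lower copy stays masked
```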

Anyway, these are implementation details that we are not yet ready to dive into.

Why Did I Talk About Layers in the First Place?

Because it's very very important to Docker, and is arguably the core feature (well, not really, process isolation is too). When you create a docker container, as I said before, it has to run in its own place, with its own set of processes (we've already covered that) and its own filesystem.

The union filesystems (along with union mounts) are how it's done: when you create a new container, Docker makes use of union mounts to mount the essential files and directories (this is why/how you do get a classic Linux filesystem architecture inside your containers: /, /etc, etc.) and, by making extensive use of shadowing, it can effectively delete or mask everything that's related to the host. This is why you do get an /etc directory in your container, but you don't get your specific, host-related /etc/file.conf.
In the same sense, this is how it allows you to write files in your container without polluting your host environment and other containers.

Actually, union filesystems are used extensively in another place in Docker: containers and images. But this is a topic of its own, which is very often confused, so I'd like to take some proper time to explain it. In another post.

Depending on your personality and expectations, you might be frustrated after reading this first article, because I did not use any docker commands. Too often I see people coming on the #docker IRC chan and asking questions which show that they are trying to use Docker like they would a new text editor. Docker is not simple. The tools on which Docker relies are not trivial. Docker does a wonderful job of packaging complex, low-level notions into a simple, beautiful API (and it has very well-written documentation). But it still remains that the overall system is a beautifully complex one.

It is relatively easy to get a few containers running with Docker, because it does such a beautiful job of abstracting the complexity away, but if you don't take the time to analyze, document and understand what is going on underneath, you will quickly run into walls: you won't understand the notion of images vs. containers (topic of the next post!), you won't be able to share your environment, you will have your disk space used up unnecessarily, etc.

On the contrary, you don't need to be an expert in every one of the details we saw: I am not myself. This is why I did not go into too much detail. Feel free to document yourself more on the topics you want, and if there are topics you'd like me to talk about more in-depth, email me.

I hope I was clear enough and that I shed some light on concepts that were obscure; if you still have gray areas, feel free to email me or ping me (nschoe) on IRC (#docker).
In the next article, I will relieve some of the frustration and we will begin playing with real docker commands. In the meantime, you don't need to install anything specific; I'll begin the next article with the installation instructions.

Part II is available to read here!
