Why do we need to use a Dockerfile?
A Dockerfile is not yet another shell. It has a special mission: automating the creation of Docker images.
Once you write the build instructions into a Dockerfile, you can build the same image again with just the docker build command.
A Dockerfile is also a useful way to tell other people what job the container does. Your teammates can tell what the container is supposed to do just by reading the Dockerfile. They don’t need to log in to the container and figure out what it is doing with the ps command.
For these reasons, you should always use a Dockerfile when you build images. However, writing a Dockerfile is sometimes painful. In this post, I will share a few tips and gotchas for writing Dockerfiles so that you come to love the tool.
ADD and understanding context in Dockerfile
ADD is the instruction that adds local files to a Docker image. The basic usage is very simple. Suppose you want to add a local file called myfile.txt, which sits in the same directory as your Dockerfile, to /myfile.txt in the image.
Then your Dockerfile needs only one instruction for it: ADD myfile.txt /.
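As a minimal sketch of this simple case (the ubuntu base image is my assumption here; any base image would do):

```dockerfile
# Assumed base image, chosen only for illustration.
FROM ubuntu

# myfile.txt is resolved relative to the build context (the directory
# you pass to docker build) and lands at /myfile.txt in the image.
ADD myfile.txt /
```

Running docker build . from the directory that holds both files should then produce an image containing /myfile.txt.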
Very simple. However, if you want to add /home/vagrant/myfile.txt by its absolute path, for example with ADD /home/vagrant/myfile.txt /, you can’t: the build fails.
You get a no such file or directory error even though the file exists. Why? Because /home/vagrant/myfile.txt is not part of the build context. The context is the set of files and directories available to the Dockerfile instructions, and only files and directories in the context can be added during a build. The files and subdirectories under the directory you pass to docker build (usually the current directory) are what gets added to the context. You can see this when you run the build command: its output includes a line saying that it is uploading the context.
What’s happening here is that the Docker client makes a tarball of the entries under the current directory and sends it to the Docker daemon. This is required because your Docker daemon may be running on a remote machine. That’s why the build command says Uploading.
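To make the failure and one possible workaround concrete, here is a small sketch. The ubuntu base image and the idea of copying the file next to the Dockerfile first are my assumptions, not something taken from the original listing.

```dockerfile
FROM ubuntu

# Does NOT work: the source path is looked up inside the build context,
# not on the host filesystem, so the build reports that the file is missing.
# ADD /home/vagrant/myfile.txt /

# Works once the file has been copied into the build directory, e.g.
#   cp /home/vagrant/myfile.txt .
ADD myfile.txt /
```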
There is a pitfall, though. Since everything under the current directory is added to the context automatically, Docker tries to upload huge files too, and the build takes longer even if you never add those files to the image.
So the best practice is to keep only the files and directories that you need to add to the image under the directory you build from.
Treat your container like a binary with CMD
By using the CMD instruction in your Dockerfile, you make your container act like a single executable binary. Suppose your Dockerfile adds a script to the image at /usr/local/bin/run.sh and names it in a CMD instruction.
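Such a Dockerfile might look like the following sketch. The ubuntu base image and the local run.sh file are assumptions for illustration; the original listing is not reproduced here.

```dockerfile
FROM ubuntu

# run.sh is a hypothetical script that sits next to this Dockerfile.
# It should be executable and start with a shebang line.
ADD run.sh /usr/local/bin/run.sh

# Default command, used when docker run is given no command of its own.
CMD ["/usr/local/bin/run.sh"]
```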
When you build an image from this Dockerfile and run it with docker run -i run_image, the container runs the /usr/local/bin/run.sh script and exits.
If you don’t use CMD, you always have to pass the command as an argument: docker run -i run_image /usr/local/bin/run.sh.
This is not just cumbersome, it is also considered bad practice from an operations perspective.
If you have a CMD instruction, the purpose of the container is explicit: all the container wants to do is run that command.
Without it, anybody other than the person who made the container has to rely on external documentation to know how to run the container properly.
So, in general, you should have a CMD instruction in your Dockerfile.
Difference between CMD and ENTRYPOINT
CMD and ENTRYPOINT are easy to confuse.
Every command, whether passed as an argument to docker run or specified with the CMD instruction, is passed as an argument to the binary specified in ENTRYPOINT.
/bin/sh -c is the default entrypoint. So if you specify CMD date without specifying an entrypoint, Docker executes it as /bin/sh -c date.
By setting your own entrypoint, you can change the behaviour of your container at run time, which makes container operation a bit more flexible.
With an entrypoint that runs the date command, for example, the container prints the current date, and each run can use a different format depending on the arguments you pass.
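A minimal sketch of such an entrypoint follows. The base image, the /bin/date path, and the date_image tag in the comments are assumptions for illustration.

```dockerfile
FROM ubuntu

# Exec-form ENTRYPOINT: arguments given to docker run are appended to it.
ENTRYPOINT ["/bin/date"]

# Hypothetical usage, assuming the image is tagged date_image:
#   docker run date_image              runs /bin/date
#   docker run date_image +%Y-%m-%d    runs /bin/date +%Y-%m-%d
```

The same image can therefore print the date in different formats without being rebuilt; only the arguments to docker run change.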
exec format error
There is one caveat with the default entrypoint. Suppose you want to execute a shell script, /usr/local/bin/run.sh, that contains nothing but a command printing hello, world, and your Dockerfile adds the script to the image and runs it with CMD.
When you run the container, you expect it to print hello, world. However, what you get is an error message that doesn’t make much sense: exec format error.
You see this message when your script has no shebang line. Without it, nothing says which interpreter should run the script, so it cannot be executed directly when the container starts.
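To fix this, you can either add a shebang such as #!/bin/sh as the first line of /usr/local/bin/run.sh,
or you can name the interpreter explicitly on the command line, running /bin/sh /usr/local/bin/run.sh as the container’s command.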
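A sketch of the shebang fix follows. To keep it self-contained, the script is written with RUN and printf rather than added from a local file as in the post; that substitution, the ubuntu base image, and the exact script body are my assumptions.

```dockerfile
FROM ubuntu

# Create the script inside the image. Its first line is the shebang that
# says the file should be run with /bin/sh.
RUN printf '#!/bin/sh\necho "hello, world"\n' > /usr/local/bin/run.sh \
 && chmod +x /usr/local/bin/run.sh

# Exec form: the script is executed directly, so the shebang is required.
CMD ["/usr/local/bin/run.sh"]
```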
Build caching: what invalidates the cache and what doesn’t?
Docker creates a commit for each instruction line in the Dockerfile. As long as you don’t change an instruction, Docker assumes it doesn’t need to change the image, so it reuses the cached image, which the next instruction then uses as its parent image. This is why docker build takes a long time the first time but finishes almost immediately the second time.
However, when the cache is used and what invalidates it is sometimes not very clear. Here are a few cases that I found worth noting.
Cache invalidation at one instruction invalidates the cache of all subsequent instructions
This is the basic rule of caching. If you cause cache invalidation at one instruction, the subsequent instructions don’t use the cache.
For example, suppose you add a RUN apt-get update instruction in the middle of a Dockerfile you have already built. All instructions after it have to be run from scratch even though they have not changed. This is inevitable because Docker uses the image built by the previous instruction as the parent image for executing the next instruction. So, if you insert an instruction that creates a new parent image, none of the subsequent instructions can use the cache because their parent image now differs.
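The rule can be pictured with a sketch like the one below. Apart from RUN apt-get update, the instructions are hypothetical and chosen only to show which steps are rebuilt.

```dockerfile
FROM ubuntu

# Unchanged and above the edit: still served from the cache.
RUN mkdir -p /opt/app

# Newly inserted instruction: creates a new parent image for everything below.
RUN apt-get update

# Unchanged text, but the parent image differs, so these run again.
RUN apt-get install -y vim
RUN apt-get install -y curl
```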
Cache is invalidated even when you add commands that don’t do anything
Adding a no-op to an instruction invalidates the cache. For example, if you tack the true command onto an existing instruction, Docker invalidates the cache even though true doesn’t change anything in the image.
Cache is invalidated when you add spaces between a command and its arguments inside an instruction
Adding extra spaces between the command and its arguments invalidates the cache.
Cache is used when you add spaces around commands inside an instruction
The cache stays valid even if you add spaces before or after the whole command.
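The three cases above can be summarized in one sketch. The exact lines are my reconstruction, not the original listings, and the outcomes in the comments are the ones this post reports for the Docker version of the time; other versions may behave differently.

```dockerfile
FROM ubuntu

# Baseline instruction that has already been built and cached.
RUN apt-get update

# Variants of the line above, and what the post reports for each:
#   RUN apt-get update && true     cache invalidated (no-op added)
#   RUN apt-get    update          cache invalidated (spaces inside the command)
#   RUN    apt-get update          cache used (spaces around the command)
```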
Cache is used for non-idempotent instructions
This is something of a pitfall of build caching. By non-idempotent instructions I mean commands that may return a different result each time they are executed. For example, apt-get update is not idempotent because the content of the updates changes as time goes by.
Suppose you write a Dockerfile that runs apt-get update on top of an Ubuntu base image and build an image from it. Three months later, Ubuntu publishes some security updates to their repository, so you rebuild the image from the same Dockerfile, hoping the new image includes them. However, the rebuild doesn’t pick up the updates. Since no instructions or files have changed, Docker uses the cache and skips apt-get update.
If you don’t want to use the cache, just pass the --no-cache option to docker build (older Docker releases spelled it -no-cache).
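As a sketch of the situation (the upgrade step and the my_image tag are assumptions added for illustration):

```dockerfile
FROM ubuntu

# The result depends on when this runs, but Docker only looks at the
# instruction itself, so a rebuild months later reuses the old cached layer.
RUN apt-get update && apt-get -y upgrade

# To force every instruction to run again:
#   docker build --no-cache -t my_image .
```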
Instructions after ADD are never cached (only versions prior to 0.7.3)
If you use a Docker version before 0.7.3, watch out!
If your Dockerfile has an ADD instruction followed by RUN apt-get update and RUN apt-get install openssh-server, those RUN instructions will never be cached.
The behavior changed in 0.7.3. Instructions after ADD are now cached, but the cache is invalidated if the content of the added file changes.
For example, if you change the rock.you file that the Dockerfile adds, the instructions after ADD don’t use the cache.
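The Dockerfile discussed in this section might look like the sketch below. The base image, the destination path for rock.you, and the -y flag (added so the install does not wait for confirmation) are assumptions.

```dockerfile
FROM ubuntu

# Before 0.7.3: everything after this line is rebuilt on every build.
# From 0.7.3 on: the steps below are cached until the content of rock.you changes.
ADD rock.you /

RUN apt-get update
RUN apt-get install -y openssh-server
```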
Hack to run a container in the background
If you want to simplify the way you run containers, run them in the background with docker run -d image your-command. I recommend -d over docker run -i -t image your-command because you can start your container with just one command and you don’t need to detach from the container’s terminal by pressing Ctrl+P followed by Ctrl+Q.
However, there is a problem with the -d option. Your container stops immediately unless its command keeps running in the foreground.
Let me explain this with the case where you want to run the apache service in a container. The intuitive way of doing this is to pass apachectl start as the command to docker run -d.
However, the container stops immediately after it starts. This is because apachectl exits once it has detached the apache daemon.
Docker doesn’t like this. Docker requires your command to keep running in the foreground. Otherwise, it thinks that your application has stopped and shuts down the container.
You can solve this by running the apache executable directly with its foreground option, after setting the environment variables that apachectl would normally set up.
Here we are manually doing what apachectl does for us and then running the apache executable ourselves. With this approach, apache keeps running in the foreground.
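A rough sketch of this approach is shown below. It assumes the Debian/Ubuntu apache2 package layout; the variable names and paths mirror what that package’s envvars file usually defines, and they may need adjusting for a particular release.

```dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y apache2

# Settings that apachectl would normally load for us (assumed values).
ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2
ENV APACHE_RUN_DIR /var/run/apache2
ENV APACHE_LOCK_DIR /var/lock/apache2
ENV APACHE_PID_FILE /var/run/apache2/apache2.pid

# Make sure the runtime directories exist before apache starts.
RUN mkdir -p /var/run/apache2 /var/lock/apache2

# Run the apache binary directly and keep it in the foreground.
CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
```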
The problem is that some applications cannot run in the foreground. Also, we have to do extra work ourselves, such as exporting environment variables. How can we make this easier?
In this situation, you can append tail -f /dev/null to your command. That way, even if your main command runs in the background, your container doesn’t stop because tail keeps running in the foreground. We can use this technique in the apache case by running apachectl start followed by tail -f /dev/null.
Much better, right? Since tail -f /dev/null doesn’t do any harm, you can use this hack with any application.
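Baked into a Dockerfile, the hack might look like this sketch (the base image and the package installation step are assumptions):

```dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y apache2

# Shell form, so /bin/sh -c runs the whole chain: apachectl starts the
# daemon and returns, then tail blocks forever and keeps the container alive.
CMD apachectl start && tail -f /dev/null
```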






