Optimizing a Docker Image
This post picks up where Building a Docker Image left off. We look at how to optimize the process of building a Docker image. The optimization can aim at reducing the size of the resulting image or at shortening the time it takes to build it.
A clever person solves a problem. A wise person avoids it.
Optimizing for image size
We have a multitude of options for taking control over the resulting image size.
First we can make a decision regarding the base image we start from.
- Use a lightweight distribution (such as alpine)
- Use the slim variant of a distribution (e.g. Debian jessie versus jessie-slim)
- Use distroless images: https://github.com/GoogleContainerTools/distroless
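To compare these options in practice, the base image can be turned into a build argument. The following is a minimal sketch: the argument name BASE_IMAGE and the default tag alpine:3.19 are assumptions chosen for illustration.

```dockerfile
# An ARG declared before FROM may be referenced in the FROM line.
# BASE_IMAGE is a hypothetical name; the default tag is an assumption.
ARG BASE_IMAGE=alpine:3.19
FROM ${BASE_IMAGE}

# Keep everything below identical for every candidate, so that only the
# base image differs between the builds being compared.
```

Building once per candidate (for example with --build-arg BASE_IMAGE=debian:jessie-slim) and comparing the sizes reported by docker image ls shows how much the base image alone contributes.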
Next we can pay attention to the content we add to the image
- Install only required packages through RUN
- COPY only required files (.dockerignore will exclude content from the build context only)
- Remove temporary files (e.g. created by package managers)
- Follow the single responsibility principle.
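As a sketch of these points, assume a Debian-based image and a hypothetical application that needs only git and a single configuration file. The base tag, the package choice, and the paths are illustrative assumptions.

```dockerfile
FROM debian:bookworm-slim

# Install only what is needed, skip recommended extras, and remove the
# package index in the same RUN so it never ends up in a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*

# Copy the one file the image needs instead of the whole build context.
# config/app.conf is a hypothetical path.
COPY config/app.conf /etc/app/app.conf
```

The cleanup has to happen in the same RUN instruction as the installation: files deleted by a later instruction still occupy space in the earlier layer.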
Finally we can try reducing the number of layers
- Only the instructions RUN, COPY, and ADD create layers. Other instructions create temporary intermediate images and do not increase the size of the image.
- Combine multiple RUN instructions to squash layers.
# Prefer command chaining
RUN command1 && command2
# over executing individual commands with separate RUN instructions
RUN command3
RUN command4
With earlier versions of Docker, images were managed by the AUFS storage driver. AUFS used to impose a hard limit of 42 layers, later raised to 127. Current versions of Docker use the overlay/overlay2 storage driver.
Optimizing for image build duration
Exclude content from the Docker context
First we can use a .dockerignore file that excludes files from being considered for addition. This works almost like a .gitignore file in Git.
- The docker image build command sends the build context from the client machine to the Docker daemon.
- Due to the client/server architecture, sending the context is almost always an expensive remote operation (socket, http, …).
- A .dockerignore file can exclude files from the context and speed up the context transfer.
The build context is typically some local file system content to be added to a Docker image. The docker image build command packages the build context into an archive and sends it to the daemon.
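The effect of a .dockerignore file can be sketched next to a Dockerfile that copies the whole context. The ignore patterns, the alpine:3.19 tag, and the target path are hypothetical and only serve as an illustration.

```dockerfile
# Assumed .dockerignore, placed in the root of the build context
# (hypothetical entries, one pattern per line):
#
#   .git
#   target/
#   **/*.log
#
FROM alpine:3.19

# With the patterns above, the .git directory, the target/ directory and
# log files at any depth are neither sent to the daemon nor picked up by
# this COPY.
COPY . /opt/app/
```

Without the ignore file, the same COPY would transfer and add everything in the directory, including the version control history.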
Use the build cache
Next we are advised to utilize the build cache. But in order to do so, we have to understand how the build cache works.
ADD and COPY instructions
- A checksum of the content of each added file is calculated.
- The last-modified and last-accessed times of the file(s) are not considered.
- During the cache lookup, the checksum is compared against the checksum in the existing images/layers.
- If anything has changed in the file(s), such as the contents and metadata, the cache is invalidated and the layer and all downstream layers are rebuilt.
RUN instructions
- Cache checking does not look at the files (content) manipulated by a RUN instruction.
- The command string itself is used to find a cache match.
- Modifying a RUN instruction invalidates the layer built by the instruction and every layer after it.
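The rules above can be illustrated with a short Dockerfile. This is a sketch under assumptions: the base tag alpine:3.19, the file app.conf, and the target paths are hypothetical.

```dockerfile
FROM alpine:3.19

# RUN: the cache key is the command string. An unchanged string is a cache
# hit, even if the package repository offers newer versions by now.
RUN apk add --no-cache git

# COPY: the cache key includes a checksum of app.conf (hypothetical file).
# Editing the file invalidates this layer ...
COPY app.conf /etc/app/app.conf

# ... and therefore this one too, although its command string is unchanged.
RUN mkdir -p /var/log/app
```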
With this understanding of the cache, we can apply the knowledge in a first experiment.
RUN apk update (1)
RUN apk add --no-cache git bash (2)
(1) Updating the package index is a required step before adding packages.
(2) If only this line is changed, the cached layer from (1) is reused, and package installation may break because the package manager works with outdated package info.
RUN apk update && \
apk add --no-cache git bash (1)
(1) With command chaining, either both commands are served from the cache or neither is.
Next, let's apply another pattern for optimizing cache utilization.
Arrange (re-order) the Dockerfile instructions for efficient cache usage. The Dockerfile below comes with the following assumptions:
- The software to be packaged as an image will change frequently.
- The RUN instruction changes with each release to download a newer version.
FROM openjdk:8-jre-alpine
RUN mkdir /opt/app
ARG VERSION=latest
ARG ARTIFACT_BASE_URL=https://dl.bintray.com/software-craftsmen/continuousdelivery/at/software-craftsmen/continuousdelivery
RUN wget -q -O /opt/app/app.jar \
${ARTIFACT_BASE_URL}/${VERSION}/continuousdelivery-${VERSION}-exec.jar (1)
CMD java -jar /opt/app/app.jar
(1) Move the RUN wget instruction to the bottom to allow a cache hit for CMD.
The example above shows how to rearrange instructions: move those that change frequently into the topmost layers, that is, to the bottom of the Dockerfile. This prevents layers from being rebuilt when nothing they depend on has changed.
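Applied to the Dockerfile above, the rearrangement could look as follows. This is a sketch that only reorders the original instructions; the download URL is taken as-is from the example.

```dockerfile
FROM openjdk:8-jre-alpine
RUN mkdir /opt/app

# Stable instructions first: CMD does not change between releases.
CMD java -jar /opt/app/app.jar

# Frequently changing instructions last: a new VERSION only invalidates
# the layer created by the RUN instruction below.
ARG VERSION=latest
ARG ARTIFACT_BASE_URL=https://dl.bintray.com/software-craftsmen/continuousdelivery/at/software-craftsmen/continuousdelivery
RUN wget -q -O /opt/app/app.jar \
    ${ARTIFACT_BASE_URL}/${VERSION}/continuousdelivery-${VERSION}-exec.jar
```

Passing a different value with --build-arg VERSION=... causes a cache miss at the first instruction that uses the argument, here the RUN wget, while all layers before it are still served from the cache.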