Optimizing a Docker Image

2019-08-20 4 min read Martin

This post continues where we left off in Building a Docker Image. We look into how to optimize the process of building a Docker image, either by reducing the size of the resulting image or by reducing the time it takes to build it.

A clever person solves a problem. A wise person avoids it.

— Albert Einstein (misattributed)

Optimizing for image size

We have several options for controlling the resulting image size.

First, we can choose the base image we start from:

Use a lightweight distribution (such as Alpine)
Use the slim variant of a distribution (e.g. Debian jessie versus jessie-slim)
Use distroless images (https://github.com/GoogleContainerTools/distroless)
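Since the base image is a single FROM line, trying one of these options is a one-line change. A minimal sketch; the concrete tags (alpine:3.10, debian:jessie-slim, gcr.io/distroless/base) are examples chosen for illustration, not recommendations from the original post:

```dockerfile
# Pick exactly ONE base image; the alternatives are kept as comments.

# Full distribution image:
# FROM debian:jessie

# Slim variant of the same release (omits files rarely needed in containers,
# such as man pages and documentation):
# FROM debian:jessie-slim

# Distroless image: only runtime dependencies, no shell and no package manager,
# so RUN instructions that need a shell will not work on top of it:
# FROM gcr.io/distroless/base

# Lightweight distribution (Alpine, based on musl libc and BusyBox):
FROM alpine:3.10
CMD ["echo", "hello from a small base image"]
```

Building the same Dockerfile with each base image in turn and comparing the SIZE column of docker image ls is a simple way to see the difference.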

Next, we can pay attention to the content we add to the image:

Install only required packages through RUN
COPY only required files (.dockerignore will exclude content from the build context only)
Remove temporary files (e.g. those created by package managers) within the same RUN instruction that created them
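A sketch of these points on an Alpine base, assuming a hypothetical application that only needs curl at runtime and a single script, app.sh, from the build context:

```dockerfile
FROM alpine:3.10

# Install only the package the application needs at runtime.
# --no-cache tells apk not to keep the downloaded package index in the image,
# so no package-manager leftovers end up in this layer.
RUN apk add --no-cache curl

# COPY only the required file (app.sh is a hypothetical script)
# instead of the whole build context ("COPY . /opt/app").
COPY app.sh /opt/app/app.sh
CMD ["sh", "/opt/app/app.sh"]
```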

Follow the single responsibility principle (one concern per container).

Finally, we can try to reduce the number of layers:

Only the instructions RUN, COPY, and ADD create layers.
Other instructions create temporary intermediate images and do not increase the size of the image.
Combine multiple RUN instructions into one to reduce the number of layers.

# Prefer command chaining within a single RUN instruction (one layer)
RUN command1 && command2
# over executing the same commands with separate RUN instructions (two layers)
RUN command1
RUN command2
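Combining instructions matters most when a step cleans up after an earlier one: a file deleted in a later RUN is hidden from the final file system, but it still takes up space in the layer that created it. A sketch with a hypothetical archive URL (https://example.com/app.tar.gz):

```dockerfile
FROM alpine:3.10

# Anti-pattern (commented out): the archive is deleted in the second layer,
# but the first layer still contains it, so the image does not get smaller.
# RUN wget -q -O /tmp/app.tar.gz https://example.com/app.tar.gz
# RUN tar -xzf /tmp/app.tar.gz -C /opt && rm /tmp/app.tar.gz

# Chained: download, extract and clean up within one RUN instruction (one layer),
# so the archive is never stored in any layer.
RUN wget -q -O /tmp/app.tar.gz https://example.com/app.tar.gz \
    && tar -xzf /tmp/app.tar.gz -C /opt \
    && rm /tmp/app.tar.gz
```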

In earlier versions of Docker, images were commonly managed by the AUFS storage driver, which imposed a hard limit of 42 or 127 layers, depending on the version. Current versions of Docker use the overlay2 storage driver by default.

Optimizing for image build duration

Exclude content from the Docker context

First, we can use a .dockerignore file to exclude files and directories from the build context. It works much like a .gitignore file in Git.

The docker image build command sends the build context from the client machine to the Docker daemon.
Due to Docker's client/server architecture, sending the context is a transfer to the daemon (via a socket, HTTP, …), which gets expensive for large contexts.
A .dockerignore file can exclude files from the context and thereby speed up the transfer.

The build context is typically a local directory whose content is to be added to the Docker image. The docker image build command packs the build context into a tar archive (gzip-compressed only when --compress is given) and sends it to the daemon.
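A sketch of how the two files work together, with the .dockerignore content shown as comments so everything stays in one Dockerfile listing. The ignore patterns and the target/app.jar path are hypothetical, assuming a Java project whose build tool writes its output to target/:

```dockerfile
# Hypothetical .dockerignore placed next to this Dockerfile:
#   .git
#   **/*.log
#   target/*
#   !target/app.jar
# Everything matched there (except the re-included target/app.jar) is left out
# of the build context and is therefore never sent to the daemon.

FROM openjdk:8-jre-alpine
# The only file this image needs is still part of the context.
COPY target/app.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]
```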

Use the build cache

Next, we should make use of the build cache. To do so, we first have to understand how the build cache works.

Cache behaviour of ADD and COPY instructions

A checksum of the content of each added file is calculated.
The last-modified and last-accessed times of the file(s) are not considered.
During the cache lookup, this checksum is compared against the checksums in the existing images/layers.
If anything in the file(s) has changed, such as their contents or metadata, the cache is invalidated, and the layer and all downstream layers are rebuilt.
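A sketch of what this means for the order of instructions, using a hypothetical configuration file config/app.conf: placing a frequently edited COPY after stable instructions means an edit to the file only invalidates the COPY layer and what follows it.

```dockerfile
FROM alpine:3.10

# Stable instruction first: it is only rebuilt when its instruction string
# (or the base image) changes.
RUN apk add --no-cache bash

# The cache lookup for this COPY uses the checksum of config/app.conf; its
# modification and access times are ignored, so merely touching the file keeps
# the cache. Editing its contents invalidates this layer and all subsequent
# instructions, but not the RUN instruction above.
COPY config/app.conf /etc/app/app.conf
```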

Cache behaviour of RUN

The cache check does not look at the files (content) created or modified by a RUN instruction.
The command string itself is used to find a cache match.
Modifying a RUN instruction invalidates the layer built by that instruction and all layers built after it.
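A sketch of the consequence, using hypothetical URLs under https://example.com: because only the instruction string is compared, a RUN that downloads a moving target keeps returning the cached result. Changing the string, for example through a build argument, forces the step to run again.

```dockerfile
FROM alpine:3.10

# Cache hit on every rebuild as long as this string is unchanged, even if the
# remote file has changed in the meantime: the cached layer (with the old file) is reused.
RUN wget -q -O /tmp/latest.json https://example.com/data/latest.json

# Build arguments are used implicitly by the RUN instructions that follow them:
# a different value causes a cache miss at the first RUN after this ARG.
ARG DATA_VERSION=1.0.0
RUN wget -q -O /tmp/data.json https://example.com/data/${DATA_VERSION}.json
```

Building with docker image build --build-arg DATA_VERSION=1.1.0 . would re-run only the second download, while --no-cache disables the cache for the whole build.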

With this understanding of the cache, we can apply it in a first experiment.

Non-optimized

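# (1)
RUN apk update
# (2)
RUN apk add --no-cache git bash

(1) Updating the package index is a prerequisite for adding packages with most package managers (apk add --no-cache already fetches a fresh index itself, so with apk this step is redundant; the pattern matters for package managers such as apt-get).
(2) If only this line changes (e.g. to add a package), the cached layer of (1) is reused, so the installation may run against an outdated package index and break.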

Optimized

# (1)
RUN apk update && \
    apk add --no-cache git bash

(1) Thanks to command chaining, either both commands are served from the cache or neither is.
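The chaining pattern is especially relevant for apt-get on Debian-based images, where installing requires a package index refreshed by a separate apt-get update. A sketch; git and bash simply mirror the packages from the example above:

```dockerfile
FROM debian:stable-slim

# update and install are chained into one RUN instruction: adding a package to
# the install list changes the instruction string, so the index is refreshed too.
# Removing the package lists in the same instruction keeps them out of the layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git bash \
    && rm -rf /var/lib/apt/lists/*
```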

Next, let's apply another pattern for optimizing cache utilization.

Let's re-order the instructions of the following Dockerfile for efficient cache usage, under these assumptions:

The software packaged in the image changes frequently.
The RUN instruction changes with each release to download a newer version.

FROM openjdk:8-jre-alpine
RUN mkdir /opt/app
ARG VERSION=latest
ARG ARTIFACT_BASE_URL=https://dl.bintray.com/software-craftsmen/continuousdelivery/at/software-craftsmen/continuousdelivery
# (1)
RUN wget -q -O /opt/app/app.jar \
    ${ARTIFACT_BASE_URL}/${VERSION}/continuousdelivery-${VERSION}-exec.jar
CMD java -jar /opt/app/app.jar

(1) Move the `RUN wget` instruction to the bottom to allow a cache hit for `CMD`.

The example above demonstrates how we rearrange instructions by moving those that change frequently to a top layer (i.e. to the bottom of the Dockerfile). This prevents layers from being rebuilt when they do not need to be.
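Applied to the example above, the reordered Dockerfile could look like the following sketch. It keeps the original instructions and only changes their order; CMD is also moved above the build arguments so that nothing depending on VERSION precedes it. The artifact URL is taken unchanged from the example; the Bintray service behind it has since been retired, so a real build would need a different artifact host.

```dockerfile
FROM openjdk:8-jre-alpine
RUN mkdir /opt/app

# CMD only records the default command in the image configuration; placed above
# the download, it is served from the cache when only VERSION changes.
CMD java -jar /opt/app/app.jar

# Frequently changing part last: a new VERSION invalidates only the layers from here on.
ARG VERSION=latest
ARG ARTIFACT_BASE_URL=https://dl.bintray.com/software-craftsmen/continuousdelivery/at/software-craftsmen/continuousdelivery
RUN wget -q -O /opt/app/app.jar \
    ${ARTIFACT_BASE_URL}/${VERSION}/continuousdelivery-${VERSION}-exec.jar
```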