Martin Ahrer

Thinking outside the box

Docker image build pipeline

2019-08-30 4 min read martin

Let’s recap what we have covered in this blog post series so far. In Building a Docker Image we had a look at the anatomy of a Docker image. And Optimizing a Docker Image shed some light on optimizations when building an image.

In part 3 of this series about building Docker images, we look at a typical Dockerfile that builds a binary artifact and packages it as a Docker image.

A simple pipeline

The build script has two major blocks:

  • Building the binary artifact. In this case Gradle, a popular build tool for Java developers, builds an executable Java JAR.

  • Packaging the artifact into a Docker image

Dockerfile
FROM openjdk:8-jdk-alpine
WORKDIR /project
RUN apk update && \
    apk add --no-cache git bash
RUN git clone https://github.com/MartinAhrer/continuousdelivery.git ./
RUN ./gradlew assemble --no-daemon

LABEL maintainer='Martin Ahrer <this@martinahrer.at>'
RUN mkdir -p /opt/app
EXPOSE 8080
CMD java -jar /opt/app/app.jar
RUN cp /project/build/libs/continuousdelivery-0.1.jar /opt/app/app.jar && \
    rm -rf /project

Before multistage builds (which we will cover a bit later) existed, developers created all-in-one Dockerfile scripts for compiling, packaging and dockerizing an application. Those kinds of build scripts are helpful when working with simple build environments such as Docker Hub for creating Docker images. But this style comes with a major problem regarding the image size. The final Docker image contains all the tools required for compiling the source code. It is based on an openjdk:8-jdk-alpine image that includes the compiler and various other tools that are only required at build time, not at runtime.

Building the image
docker build -t continuousdelivery:pre-multistage .
docker image ls \
  --filter reference=continuousdelivery:pre-multistage \
  --format "Size of {{.Repository}}:{{.Tag}} is {{.Size}}"

At the time of writing the image size was 684MB.
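
To see where this size comes from, the layers of the image can be listed with docker history. The following is only a minimal sketch; the helper name layer_sizes is hypothetical and not part of the build above. Keep in mind that the rm -rf /project in the last RUN instruction does not shrink the image, because the removed files are still stored in the earlier layers.

```shell
# Hypothetical helper (not part of the original build): print the size and
# the creating instruction of every layer of the given image.
layer_sizes() {
  docker history --format '{{.Size}}: {{.CreatedBy}}' "$1"
}
```

With a running Docker daemon and the image built as above, it could be called as layer_sizes continuousdelivery:pre-multistage.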

So we have multiple shortcomings with such an approach:

  • Produces images larger than required (including build tools)

  • Images potentially include secrets (e.g. when git clone is using SSH)

  • Caching of temporary files is impossible (e.g. Maven, Gradle, npm)

  • Counter-productive caching (e.g. git clone instruction is cached)

Multistage build

So let’s improve the above build script by using a multistage build.

Dockerfile
# (1)
FROM openjdk:8-jdk-alpine AS builder
WORKDIR /project
RUN apk update && apk add --no-cache git bash
# The TIMESTAMP argument invalidates the cache and triggers a rebuild of the following layers
ARG TIMESTAMP
RUN git clone https://github.com/MartinAhrer/continuousdelivery.git ./
RUN ./gradlew assemble --no-daemon

# (2)
FROM openjdk:8-jre-alpine AS release
LABEL maintainer='Martin Ahrer <this@martinahrer.at>'
RUN mkdir -p /opt/app
EXPOSE 8080
CMD java -jar /opt/app/app.jar
COPY --from=builder /project/build/libs/continuousdelivery-0.1.jar /opt/app/app.jar

(1) The first stage is the builder stage responsible for building the binary artifact.
(2) The second stage, release, starts from a fresh openjdk:8-jre-alpine image that only contains the binaries required at runtime and copies the artifact built by the builder stage.

Building the image
docker build -f Dockerfile.multistage -t continuousdelivery:multistage .
docker image ls \
  --filter reference=continuousdelivery:multistage \
  --format "Size of {{.Repository}}:{{.Tag}} is {{.Size}}"

At the time of writing the image size was only 131MB.
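
To put the two sizes reported in this post side by side, the saving can be worked out with plain shell arithmetic. The variable names are chosen for illustration only, and the percentage is rounded down because the shell only does integer arithmetic.

```shell
# Image sizes in MB as reported in this post.
single_stage_mb=684
multi_stage_mb=131

# Integer arithmetic: the percentage is rounded down to a whole number.
saved_mb=$(( single_stage_mb - multi_stage_mb ))
saved_percent=$(( saved_mb * 100 / single_stage_mb ))

echo "Multistage build saves ${saved_mb}MB (${saved_percent}%)"
# prints: Multistage build saves 553MB (80%)
```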

Caching issues

I already mentioned earlier that the above pipeline implementation suffers from some problems related to the Docker build cache. We can get around that by adding a build-time argument that forces a cache invalidation. By passing a different value for the argument on each build (here a timestamp), we make Docker rebuild the layers that follow the ARG instruction instead of reusing the cached ones.

Dockerfile
# The TIMESTAMP argument invalidates the cache and triggers a rebuild of the following layers
ARG TIMESTAMP
RUN git clone https://github.com/MartinAhrer/continuousdelivery.git ./
RUN ./gradlew assemble --no-daemon

Building the image with the build argument
docker image build --build-arg TIMESTAMP=$(date +%Y%m%d-%H%M%S) .

My personal opinion is that this is a hack and should be avoided.
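
If the build argument is used anyway, a timestamp forces a full rebuild on every run, even when the repository has not changed. One possible refinement, sketched here under the assumption that the commit hash of the remote HEAD is a good enough cache key, is to pass the hash reported by git ls-remote instead. The function names revision_from_ls_remote and build_image are hypothetical; the argument keeps its TIMESTAMP name so that the Dockerfile above stays unchanged.

```shell
REPOSITORY=https://github.com/MartinAhrer/continuousdelivery.git

# git ls-remote prints "<commit hash><TAB><ref name>" per line;
# keep only the hash of the first line read from stdin.
revision_from_ls_remote() {
  head -n 1 | cut -f 1
}

# Not called automatically: needs git, network access and a Docker daemon.
build_image() {
  revision=$(git ls-remote "$REPOSITORY" HEAD | revision_from_ls_remote)
  docker image build --build-arg TIMESTAMP="$revision" .
}
```

This is still the same workaround. It only avoids rebuilding when nothing has changed upstream.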

Still, we have to deal with caching the files required by most popular build tools:

  • Downloaded tools

  • Downloaded dependencies

  • …​

In the next post in this series we will look into BuildKit, which is not enabled by default and is an experimental feature at the time of writing this post.