
TDD for infrastructure with Vagrant

2020-08-01 · Martin Ahrer

Introduction

For many years (almost decades) developers have been practicing test-driven development, writing tests to gain confidence that their code works.

[Figure: application test cycle]

However, with the rise of infrastructure as code, where DevOps engineers describe infrastructure provisioning in code, this practice is not generally observed. In many situations I have encountered infrastructure that

  • is not ephemeral

  • is not set up with automation tools

  • is not complete

  • is buggy from the start

  • fails on the first deployment

So wouldn’t we love to see infrastructure development follow the test-driven development paradigm:

  • Write infrastructure tests first

  • Code the infrastructure

  • Repeat that cycle until we are done

[Figure: infrastructure test cycle]

In this post and in follow-up posts I will look at tools for

  • Infrastructure Testing

  • Creating Virtual Machines (and discussing why this may fail with continuous delivery/deployment)

  • Machine Provisioning

  • Modeling (and testing) a pipeline

While looking at those tools I will demonstrate how to do test-driven development for setting up a cluster of 3 nodes, each with a Docker engine and the ntp service installed. Finally I will try to scale that to n or more nodes and demonstrate a fully automated build for the infrastructure using a build engine.

Test driven development

Test specifications

While there are numerous test frameworks available to support TDD for application code, we have only a few candidates when it comes to infrastructure testing. There may be more tools, but a handful have proven most helpful for different tasks.

Two of them I judge as first-class tools for testing infrastructure, while a third is perfect for testing shell scripts. So we could say the former are good for integration testing while the latter allows decent unit testing.

For the upcoming experiments I will be using Chef InSpec, which provides a DSL based on Ruby.

Target infrastructure

In order to test the provisioning scripts we need some target nodes on which to run the provisioning scripts, and against which to then run the test specifications.

We can start off with virtual machines backed by hypervisors such as VirtualBox, Hyper-V, VMware, etc. This is the traditional approach, and eventually we will find that provisioning virtual machines is quite slow and does not give a nice TDD experience. So in some follow-up posts I will move on to containers (like Docker containers) or lightning-fast KVM/Firecracker-based machines using Weaveworks’ footloose and ignite.

VirtualBox and Vagrant

I’m going to demonstrate a full TDD cycle of writing tests, writing the infrastructure code, and running the tests until everything works as specified. VirtualBox is going to be the provider for the virtual machines, and HashiCorp’s Vagrant will do the work of setting up the machines.

Write some tests first

The Chef InSpec DSL is pretty much self-explanatory.

Test specification ntp.rb
describe service('ntp') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

For executing the spec we need a target host to which we connect through SSH (using a private/public key pair). We will create this host and the SSH keys in just a moment.

inspec exec ntp.rb \
    --target ssh://vagrant@172.28.127.100:22 \
    --key-files ../vagrant/.vagrant/machines/vagrant-node-0/virtualbox/private_key
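
Where do the target address and key file come from? Once the machines exist, Vagrant can print the generated SSH connection settings for a machine:

vagrant ssh-config vagrant-node-0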

Build the execution environment (hosts)

Now Vagrant comes into play for creating and running the target host(s). Vagrant provides a Ruby-based DSL to configure VirtualBox and the hosts. For connecting to the hosts, Vagrant also automatically creates a pair of SSH private/public key files for each node. Below is the Vagrantfile (the base box shown is an assumption; any recent Debian/Ubuntu box will do).

Vagrant.configure("2") do |config|
  # assumed base box; any Debian/Ubuntu box with apt works here
  config.vm.box = "ubuntu/bionic64"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = ENV['BOX_MEMORY'] || "256"
    # linked clones build machines from a master snapshot of the box,
    # which speeds up creating several machines
    vb.linked_clone = true
  end

  (0..2).each do |i|
    config.vm.define "vagrant-node-#{i}" do |machine|
      machine.vm.hostname = "vagrant-node-#{i}"
      # assign a well known IP address for ansible inventory
      machine.vm.network "private_network", ip: "172.28.127.#{100+i}"
    end
  end
end

To start the hosts we run the command below and also take some timing metrics.

time vagrant up

Vagrant firing up the hosts for the first time takes about 1 minute, 51 seconds on my decent MacBook Pro. This is way too slow given that we periodically have to tear down and recreate the machines while testing the provisioning scripts.
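
Each iteration against fresh machines means tearing everything down and rebuilding (the -f flag skips the confirmation prompt):

vagrant destroy -f && time vagrant up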

But let’s get the next task done, writing some infrastructure code.

Infrastructure as code with Ansible

I have decided to use Ansible as it is simple to set up and use. But even more importantly, it can run agent-less. I’m not going to explain how Ansible works; that is beyond the scope of this post. All of the setup is in the main/ansible folder, which contains the Ansible playbooks and inventory.

These are the playbooks for installing the required services.

main/ansible/ntp.yml
-   hosts: all
    become: true
    roles:
        - role: geerlingguy.ntp
          vars:
              ntp_manage_config: true
main/ansible/docker.yml
-   hosts: cluster_nodes
    become: true
    roles:
        -   role: geerlingguy.docker
            vars:
                docker_install_compose: false
                docker_users:
                    - "{{inventory_hostname_short}}_ansible"

The Ansible inventory describes the set of hosts and how to connect to them.

main/ansible/inventory/vagrant/hosts.yml
all:
    hosts:
    children:
        cluster_nodes:
            hosts:
                vagrant-node-0:
                vagrant-node-1:
                vagrant-node-2:

Host or host group specifics are as follows.

main/ansible/inventory/vagrant/group_vars/hosts.yml
ansible_user: 'vagrant'
ansible_port: '22'
ansible_become: true
# the following ansible_ssh_common_args variable is important so that no local ~/.ssh/config interferes with the inventory
ansible_ssh_common_args: '-F /dev/null'
main/ansible/inventory/vagrant/host_vars/vagrant-node-0.yml
ansible_host: '172.28.127.100'
ansible_ssh_private_key_file: '../vagrant/.vagrant/machines/vagrant-node-0/virtualbox/private_key'
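
The other nodes get analogous host_vars files, following the 100+i addressing scheme from the Vagrantfile. For example:

main/ansible/inventory/vagrant/host_vars/vagrant-node-1.yml
ansible_host: '172.28.127.101'
ansible_ssh_private_key_file: '../vagrant/.vagrant/machines/vagrant-node-1/virtualbox/private_key'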

Provision the hosts

Ansible target hosts require Python to be installed before we can run playbooks. Ansible’s raw module runs commands over plain SSH without needing Python on the target, so we use it to bootstrap:

cd ansible
ansible --become -m raw \
    -a "ln -sf /bin/bash /bin/sh && apt-get update && apt-get install -y python3 gnupg2" \
    --inventory=inventory/vagrant/hosts.yml all

Now we run the playbooks for the inventory (all vagrant hosts).

ansible-playbook -i inventory/vagrant/hosts.yml ntp.yml docker.yml

Test the hosts

With the hosts provisioned we can now run the test specifications.
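
The docker.rb spec follows the same pattern as ntp.rb. A minimal sketch that verifies just the service (the docker_users setting from the playbook could be checked too, but I keep it simple):

Test specification docker.rb
describe service('docker') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end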

cd inspec
inspec exec ntp.rb docker.rb \
    --target ssh://vagrant@172.28.127.100:22 \
    --key-files ../vagrant/.vagrant/machines/vagrant-node-0/virtualbox/private_key
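
This verifies only vagrant-node-0. Since the node IPs follow the 100+i scheme from the Vagrantfile, a small shell loop runs the same specs against all three nodes:

for i in 0 1 2; do
    inspec exec ntp.rb docker.rb \
        --target ssh://vagrant@172.28.127.10${i}:22 \
        --key-files ../vagrant/.vagrant/machines/vagrant-node-${i}/virtualbox/private_key
done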

We are rewarded with a green test indicator for each test, hopefully looking like the output below:

Profile: tests from ntp.rb (tests from ntp.rb)
Version: (not specified)
Target:  ssh://vagrant@172.28.127.100:22

  Service ntp
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ✔  is expected to be running

Test Summary: 3 successful, 0 failures, 0 skipped

Wrapup

With VirtualBox (and many other virtualization products) and Vagrant for managing virtual hosts, and Chef InSpec for testing, I have successfully exercised test-driven development for infrastructure.

It is obvious that with virtual machines a full TDD cycle takes more time than we would like, so this is not a nice user experience. We also have to question whether this approach can be implemented in a continuous integration pipeline, as that would mean a build service has to create virtual machines just to test the provisioning.

In the next post in this series I will look into the option of running the hosts as containers.