Early on at Narrative Science, we were keenly aware of the need for an automated deployment system. We wanted to take advantage of the cloud's modern benefits, including flexible resource management and autoscaling.
When we initially implemented our system, there were two popular players in this space: Puppet and Chef. After a short dalliance with Puppet, we converted to Chef, which we’ve been using for the past few years.
Puppet and Chef are both deployment systems configured through declarative specifications. These systems promise to streamline server administration through predictable, consistent configuration.
However, as we grew, we found these systems both suffered from certain problems:
- Installing updates on running services without impacting quality of service was a challenge
- Rollbacks were difficult because of layered configuration dependencies
- Failed deployments while auto-scaling at peak hours left us under-resourced, degrading quality of service
- Failed deployments could leave machines in broken or unknown states
- Build times grew very long as the complexity of our configuration increased
- Over time, machine configuration became poorly understood as outdated assets and orphaned configuration piled up
- Declarative configuration was difficult to debug and obscured underlying system changes
Finally, we were frustrated because there didn’t seem to be a clear division between doing development on the deployment system and using the deployment system.
Enter Immutable Hardware
To deal with these issues, our engineering team has focused on an immutable hardware strategy. Immutable hardware is based on the idea that it is cheap and easy to create and destroy new instances in a virtualized environment. In this scenario, a deployment is simply a matter of shipping an existing virtual machine image to our computation environment.
We refer to the generation of virtual images as machine builds, to distinguish them from machine deployments.
Immutable hardware directly speaks to many of the issues we suffered from with previous build systems:
- Autoscaling and upgrades are fast and reliable
- Builds are not done on in-service hardware, so broken builds never break production
- Rollbacks are straightforward deployments of stored builds
- Machine configuration is well understood and does not drift over time
For our first foray into virtualization, we decided to leverage Amazon Machine Images (AMIs) within our AWS environment. In the future, we will likely investigate other kinds of virtualization, such as Docker containers.
We use a tool called Packer to manage the machine build process. Packer automates the process of provisioning and capturing several types of virtual images, including AMIs, Docker containers, and more.
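To make this concrete, a Packer build is driven by a template that names a builder (which source image to start from and what to capture) and one or more provisioners (what to run on the instance before the image is snapshotted). The sketch below uses Packer's JSON template format with an `amazon-ebs` builder; the region, source AMI ID, and `provision.sh` script name are illustrative placeholders, not values from our actual setup:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-0123456789abcdef0",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "app-build-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "provision.sh"
    }
  ]
}
```

Running `packer build` against a template like this launches a temporary instance from the source AMI, runs the provisioning script, captures the result as a new AMI, and terminates the instance. The timestamped `ami_name` gives each build a distinct, stored artifact, which is what makes rollbacks a matter of redeploying an earlier image.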
With that decided, we needed to choose a provisioning tool. We considered writing raw shell scripts, but felt something a little more modern would be desirable.
In the end, we settled on Ansible. Ansible has several benefits over our previous systems. Its configuration language is slightly lower-level and procedural, so it is very easy to understand the exact actions being taken on a machine. In fact, directives generally translate directly into shell commands.
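A minimal playbook sketch illustrates this directness; the package name and file paths here are illustrative, not taken from our actual configuration:

```yaml
- hosts: all
  become: true
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Copy application config
      copy:
        src: files/app.conf
        dest: /etc/app/app.conf
```

Each task maps closely to a shell command you could run by hand: the `apt` task corresponds roughly to `apt-get install nginx`, and the `copy` task to a `cp` onto the target machine. Because the tasks run top to bottom, reading the playbook tells you the exact sequence of changes made during a build.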
Results that Scale
As a result of the switch to our immutable hardware strategy, we have been able to dramatically reduce deployment times, from as high as 45 minutes in some cases to less than a minute.
Nearly all of this efficiency comes from shifting most of our configuration from deployment time to build time. By pre-installing our software and configuration on machine images, we remove the vast bulk of configuration management done during deployment. This also provides fantastic stability, since deployment of these images is highly repeatable.
We are currently working to deploy all of our cloud stacks in this manner. Once we finish, we will be able to drive down computing costs through better resource management and auto-scaled deployments.