Using Terraform to Manage Infrastructure

Large applications are often a mix of code your team has written and third-party applications your team needs to manage. These third-party applications could be things like AWS or Docker. In my team’s case, it’s Twilio TaskRouter.

The configuration of these services may not change as often as your app code does, but when it does, the process is fraught with the potential for errors. This is because there is no way to write tests for the changes or easily roll them back, two things we depend on as developers when shipping our application code.

Terraform improves infrastructure management by letting you apply engineering best practices to what would otherwise be a GUI with no accountability, tests, or revision history.

On the Conversations team, we recently implemented Terraform to manage a piece of our infrastructure to great success. Let’s take a deeper look at why we did it, and how.

My team builds Shopify’s contact center. When a merchant or partner interacts with an agent, they are likely going through a tool we’ve built. Our app suite contains applications we’ve built in-house and third-party tools. One of these tools is Twilio TaskRouter.

TaskRouter is a multi-channel skill-based task routing API. It handles creating tasks (voice, chat, etc.) and routing them to the most appropriate agent, based on a set of routing rules and agent skills that we configure.

As our business grows and becomes more complex, we often need to make changes to how merchants are routed to the appropriate agent.

To make these changes, someone needs to go into our Twilio console and use the graphical user interface (GUI) to update the configuration. This process is fairly straightforward and works well for getting off the ground quickly. However, the configuration quickly becomes too complex for one person to understand in its entirety.

In addition, the GUI doesn’t provide a clear history of changes or a way to roll them back.

As developers, we are used to viewing a commit history, reading PR descriptions and tests to understand why changes happened, and rolling back changes that are not working as expected. When working with Twilio TaskRouter, we had none of these.

Using Terraform to Configure Infrastructure

Terraform is an open source tool for configuring infrastructure as code.

It is a state machine for infrastructure that brings the engineering best practices listed above to infrastructure that was previously only manageable via a GUI.

Terraform requires three things to work:

  1. A reliable API. When using Terraform, we stop using the GUI and rely on Terraform to make our changes for us via the API. Anything you can’t change with the API, you won’t be able to manage with Terraform.
  2. A Go client library. Terraform is written in Go and requires a Go client library for the API you’re targeting. The client library makes HTTP(S) calls to your target app.
  3. A Terraform provider. The core Terraform software uses a provider to interact with the target API. Providers are written in Go using the Terraform Plugin SDK.

With these three pieces, you can manage just about any application with Terraform!
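To make the second requirement concrete, here is a minimal sketch of what a Go client library does: it wraps HTTP calls to the target API in typed functions. Everything in it is an assumption for illustration, including the Client and Activity types, the /activities path, and the local test server that stands in for the real service. It is not Twilio’s API or the client we wrote.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// Activity is a hypothetical, trimmed-down resource returned by the API.
type Activity struct {
	Sid          string `json:"sid"`
	FriendlyName string `json:"friendly_name"`
}

// Client is a hypothetical API client: a base URL plus an HTTP client.
type Client struct {
	BaseURL string
	HTTP    *http.Client
}

// CreateActivity posts form-encoded parameters and decodes the JSON reply.
func (c *Client) CreateActivity(friendlyName string) (*Activity, error) {
	form := url.Values{"friendly_name": {friendlyName}}
	resp, err := c.HTTP.Post(c.BaseURL+"/activities",
		"application/x-www-form-urlencoded", strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var a Activity
	if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
		return nil, err
	}
	return &a, nil
}

// newFakeAPI starts a local server that stands in for the real service.
func newFakeAPI() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/activities" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		// Echo the submitted name back with a made-up identifier.
		json.NewEncoder(w).Encode(Activity{
			Sid:          "fake-sid-1",
			FriendlyName: r.FormValue("friendly_name"),
		})
	}))
}

func main() {
	srv := newFakeAPI()
	defer srv.Close()

	client := &Client{BaseURL: srv.URL, HTTP: srv.Client()}
	a, err := client.CreateActivity("Offline")
	if err != nil {
		panic(err)
	}
	fmt.Println(a.Sid, a.FriendlyName)
}
```

A provider then calls functions like CreateActivity rather than building HTTP requests itself, which keeps the provider code focused on Terraform’s concerns.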

Image from: https://learn.hashicorp.com/img/terraform/providers/core-plugins-api.png

A Terraform provider adds a set of resources Terraform can manage. Providers are not part of Terraform’s code. They are created separately to manage a specific application. Twilio did not have a provider when we started this project, so we made our own.

Since launching this project, Twilio has developed its own Terraform provider.

At its core, a provider enables Terraform to perform CRUD operations on a set of resources. Armed with a provider, Terraform can manage the state of the application.

Creating a Provider

Note: If you are interested in setting up Terraform for a service that already has a provider, you can skip to the next section.

Here is the basic structure of a Terraform provider:

This folder structure contains your Go dependencies, a Makefile for running commands, an example file for local development, and a directory called twilio. This is where our provider lives.

A provider must contain a resource file for every type of resource you want to manage. Each resource file contains a set of CRUD instructions for Terraform to follow. In effect, you’re telling Terraform how to manage this resource.

Here is the function defining what an activity resource is in our provider:

Note: Go is a strongly typed language, so the syntax might look unusual if you’re not familiar with it. Luckily you do not need to be a Go expert to write your own provider!

This file defines what Terraform needs to do to create, read, update, and destroy activities in TaskRouter. Each of these operations is defined by a function in the same file.

The file also defines an Importer function, a special type of function that allows Terraform to import existing infrastructure. This is very handy if you already have infrastructure running and want to start using Terraform to manage it.

Finally, the function defines a schema: the parameters provided by the API for performing CRUD operations. In the case of TaskRouter activities, the parameters are friendly_name, available, and workspace_sid.
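As a rough illustration of the shape described above, here is a dependency-free Go sketch of a resource definition: four CRUD functions, an importer, and a schema. The Field, Data, and Resource types are simplified stand-ins written for this example, not the Terraform Plugin SDK’s types, and which parameters are marked required is an assumption.

```go
package main

import "fmt"

// Field is a simplified stand-in for one schema entry.
type Field struct {
	Type     string // "string" or "bool" in this sketch
	Required bool
}

// Data holds one resource instance: its ID and attribute values.
type Data struct {
	ID    string
	Attrs map[string]interface{}
}

// OpFunc is the shape shared by every CRUD operation in this sketch.
type OpFunc func(d *Data) error

// Resource bundles the CRUD functions, an importer, and the schema.
type Resource struct {
	Create, Read, Update, Delete OpFunc
	Importer                     func(id string) (*Data, error)
	Schema                       map[string]Field
}

// resourceActivity plays the role of resourceTwilioActivity: it declares
// how an activity is managed and which parameters it accepts.
func resourceActivity() *Resource {
	placeholder := func(d *Data) error { return nil } // real bodies call the API
	return &Resource{
		Create: placeholder,
		Read:   placeholder,
		Update: placeholder,
		Delete: placeholder,
		Importer: func(id string) (*Data, error) {
			// Adopt an existing resource by ID; a later Read fills in attributes.
			return &Data{ID: id, Attrs: map[string]interface{}{}}, nil
		},
		Schema: map[string]Field{
			"friendly_name": {Type: "string", Required: true},
			"available":     {Type: "bool", Required: false}, // assumption
			"workspace_sid": {Type: "string", Required: true},
		},
	}
}

// Validate checks attribute values against the schema before any API call.
func (r *Resource) Validate(attrs map[string]interface{}) error {
	for name, f := range r.Schema {
		v, present := attrs[name]
		if !present {
			if f.Required {
				return fmt.Errorf("missing required attribute %q", name)
			}
			continue
		}
		switch f.Type {
		case "string":
			if _, ok := v.(string); !ok {
				return fmt.Errorf("attribute %q must be a string", name)
			}
		case "bool":
			if _, ok := v.(bool); !ok {
				return fmt.Errorf("attribute %q must be a bool", name)
			}
		}
	}
	for name := range attrs {
		if _, known := r.Schema[name]; !known {
			return fmt.Errorf("unknown attribute %q", name)
		}
	}
	return nil
}

func main() {
	r := resourceActivity()
	good := map[string]interface{}{
		"friendly_name": "Offline",
		"available":     false,
		"workspace_sid": "fake-workspace-sid",
	}
	fmt.Println(r.Validate(good))
	fmt.Println(r.Validate(map[string]interface{}{"friendly_name": "Offline"}))
}
```

The point of declaring a schema is that bad input can be rejected before any request reaches the API, which is what Validate models here.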

To round out the example, let’s look at the create function we wrote:

Note: Most of this code is boilerplate Terraform provider code, which you can find in the Terraform docs.

The function is passed a context, the resource’s schema data, and an empty interface.

We instantiate the Twilio API client and find our workspace (TaskRouter activities all exist under a single workspace).

Then we format our parameters (defined in our Schema in the resourceTwilioActivity function) and pass them into the create method provided to us by our API client library.

Because this function creates a new resource, we set the id (SetId) to the sid of the result of our API call. In Twilio, a sid is a unique identifier for a resource. Now Terraform is aware of the newly created resource and its unique identifier, which means it can make changes to the resource.
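The steps above can be sketched in plain Go. This is a simplified model under stated assumptions, not our provider code. ResourceData and fakeClient are hypothetical stand-ins for the SDK’s resource data and the Go API client. The function returns a plain error rather than the SDK’s diagnostics type, and the workspace lookup is reduced to reading workspace_sid.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ResourceData is a hypothetical stand-in for the SDK object that carries a
// resource's configured attributes and its ID.
type ResourceData struct {
	id    string
	attrs map[string]interface{}
}

func (d *ResourceData) Get(key string) interface{} { return d.attrs[key] }
func (d *ResourceData) SetId(id string)            { d.id = id }
func (d *ResourceData) Id() string                 { return d.id }

// Activity is a trimmed-down API response for a created activity.
type Activity struct {
	Sid          string
	FriendlyName string
	Available    bool
}

// fakeClient stands in for the Go API client; it hands out made-up sids.
type fakeClient struct{ created int }

func (c *fakeClient) CreateActivity(ctx context.Context, workspaceSid, friendlyName string, available bool) (*Activity, error) {
	if workspaceSid == "" {
		return nil, errors.New("workspace sid is required")
	}
	c.created++
	return &Activity{
		Sid:          fmt.Sprintf("fake-activity-%d", c.created),
		FriendlyName: friendlyName,
		Available:    available,
	}, nil
}

// resourceActivityCreate reads the schema values, calls the API client, and
// records the returned sid as the resource ID.
func resourceActivityCreate(ctx context.Context, d *ResourceData, meta interface{}) error {
	client, ok := meta.(*fakeClient)
	if !ok {
		return errors.New("unexpected provider configuration type")
	}

	// Format the parameters declared in the schema.
	friendlyName, _ := d.Get("friendly_name").(string)
	available, _ := d.Get("available").(bool)
	workspaceSid, _ := d.Get("workspace_sid").(string)

	activity, err := client.CreateActivity(ctx, workspaceSid, friendlyName, available)
	if err != nil {
		return err
	}

	// From here on, the resource can be tracked by its sid.
	d.SetId(activity.Sid)
	return nil
}

// createExample wires the pieces together for a single create call.
func createExample(workspaceSid string) (string, error) {
	d := &ResourceData{attrs: map[string]interface{}{
		"friendly_name": "Offline",
		"available":     false,
		"workspace_sid": workspaceSid,
	}}
	if err := resourceActivityCreate(context.Background(), d, &fakeClient{}); err != nil {
		return "", err
	}
	return d.Id(), nil
}

func main() {
	id, err := createExample("fake-workspace-sid")
	fmt.Println(id, err)
}
```

Note that the ID is only set after the API call succeeds, so a failed create leaves nothing behind for Terraform to track.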

Using Terraform

Once you have created your provider or are managing an app that already has a provider, you’re ready to start using Terraform.

Terraform uses a DSL for managing resources. The good news is that this DSL is more straightforward than the Go code that powers the provider.

The DSL is simple enough that, with some instruction, non-developers should be able to make changes to your infrastructure safely. More on that later.

Here is the code for defining a new TaskRouter activity:

Yup, that’s it!

We create a block declaring the resource type and what we want to call it. In that block, we pass the variables defined in the Schema block of our resourceTwilioActivity, and any resources that it depends on. In this case, activities need to exist within a workspace. So we pass in the workspace resource in the depends_on array. Terraform knows it needs this resource to exist or to create it before attempting to create the activity.
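To show why declaring dependencies matters, here is a toy Go sketch of dependency ordering: given which resources depend on which, it works out a safe creation order. This is a simplified illustration of the idea, not Terraform’s actual graph code, and the resource addresses in it are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// creationOrder returns resource names so that every resource appears after
// the resources it depends on. It reports an error on a dependency cycle.
func creationOrder(dependsOn map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int)
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle involving %q", name)
		}
		state[name] = visiting
		// Dependencies are placed in the order before the resource itself.
		for _, dep := range dependsOn[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Visit names in sorted order so the result is deterministic.
	names := make([]string, 0, len(dependsOn))
	for name := range dependsOn {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	// Hypothetical resource addresses: the activity depends on the workspace.
	order, err := creationOrder(map[string][]string{
		"twilio_activity.offline": {"twilio_workspace.main"},
		"twilio_workspace.main":   nil,
	})
	if err != nil {
		panic(err)
	}
	for i, name := range order {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

In this sketch the workspace is listed before the activity, mirroring what depends_on expresses in the resource block.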

Now that you have defined your resource, you’re ready to start seeing the benefits of Terraform.

Terraform has a few commands, but plan and apply are most common. Plan will print out a text-based representation of the changes you’re about to make:

Terraform makes visualizing the changes to your infrastructure very easy. At this planning step you may uncover unintended changes. For example, if there was already an offline activity, the plan step would show you an update instead of a create. In that case, all you need to do is change your resource block’s name and run terraform plan again.
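The idea behind the plan step can be sketched in a few lines of Go: compare what Terraform already tracks with what the configuration asks for, and list the differences without applying them. This is a toy model written for this post, not Terraform’s implementation, and the Attrs and Change types are assumptions.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Attrs holds the attribute values of one resource.
type Attrs map[string]interface{}

// Change is one planned action against a named resource.
type Change struct {
	Action string // "create", "update", or "delete"
	Name   string
}

// plan compares tracked state with the desired configuration and returns
// the changes needed to reconcile them. It does not apply anything.
func plan(state, desired map[string]Attrs) []Change {
	var changes []Change
	for name, want := range desired {
		have, tracked := state[name]
		switch {
		case !tracked:
			changes = append(changes, Change{Action: "create", Name: name})
		case !reflect.DeepEqual(have, want):
			changes = append(changes, Change{Action: "update", Name: name})
		}
	}
	for name := range state {
		if _, stillWanted := desired[name]; !stillWanted {
			changes = append(changes, Change{Action: "delete", Name: name})
		}
	}
	// Sort by name so the plan reads the same on every run.
	sort.Slice(changes, func(i, j int) bool { return changes[i].Name < changes[j].Name })
	return changes
}

func main() {
	state := map[string]Attrs{
		"offline": {"friendly_name": "Offline", "available": false},
	}
	desired := map[string]Attrs{
		"offline": {"friendly_name": "Offline", "available": true},
		"break":   {"friendly_name": "Break", "available": false},
	}
	for _, c := range plan(state, desired) {
		fmt.Printf("%-6s %s\n", c.Action, c.Name)
	}
}
```

Here the existing offline entry shows up as an update rather than a create, which is the kind of surprise the plan step is meant to surface before anything changes.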

When you are satisfied with your changes, run terraform apply to make the changes to your infrastructure. Now Terraform will know about the newly created resource, and its generated id, allowing you to manage it exclusively through Terraform moving forward.

To get the full benefit of Terraform (PRs, reviews, etc.), we use an additional tool called Atlantis to manage our GitHub integration.

This allows people to make pull requests with changes to resource files, and have Atlantis add a comment to the PR with the output of terraform plan. Once the review process is done, we comment atlantis apply -p terraform to make the change. Then the PR is merged.

We have come a long way from managing our infrastructure with a GUI in a web app! We have a Terraform provider communicating via a Go API client to manage our infrastructure as code. With Atlantis plugged into our team’s GitHub, we now have many of the best practices we rely on when writing software: reviewable PRs that are easy to understand and roll back if necessary, with a clear history that can be scanned with a git blame.

How was Terraform Received by Other Teams?

The most rewarding part of this project was how it was received by other teams. Instead of business and support teams making requests and waiting for developers to change Twilio workflows, Terraform empowered them to do it themselves. In fact, some people’s first PRs were changes to our Terraform infrastructure!

Along with freeing up developer time and making the business teams more independent, Terraform provides visibility into infrastructure changes over time. The plan output shows the impact of each change, and searching GitHub for previous PRs makes it easy to understand the history of what our teams have changed.

Building great tools will often require maintaining third-party infrastructure. In my team’s case, this means managing Twilio TaskRouter to route tasks to support agents properly.

As the needs of your team grow, the way you configure your infrastructure will likely change as well. Tracking these changes and being confident in making them is very important but can be difficult.

Terraform makes these changes more predictable and empowers developers and non-developers alike to use software engineering best practices when making these changes.

Jeremy Cobb is a developer at Shopify. He is passionate about solving problems with code and improving his serve on the tennis court.
