Reliving Your Happiest HTTP Interactions with Ruby’s VCR Gem

by Stephen Prater

Jan 5, 2023

VCR is a Ruby library that records HTTP interactions and plays them back to your test suite, verifying input and returning predictable output.

In Ruby apps it's most frequently used as a testing tool, but having it in your toolbox provides you with a rich set of organizational and debugging tools, even if you choose not to use its popular “automocking” feature.

This post will be helpful to you if:

You’re having trouble managing a complex set of WebMock stubs.
You need to debug or test a complicated or involved set of remote API operations.
You’re working in an application that uses VCR.

What’s the Difference Between VCR and WebMock?

It's important to understand that VCR and WebMock do essentially the same thing. The main difference is the level at which you need to reason about the external API.

When using WebMock, you create “mocks” for a few common endpoints and return a known set of data for those endpoints. You’ll need to explore and understand the behavior of the API with your meat-brain, and determine yourself if the data you construct is valid. This is easier to change, but it requires that you don't change the mock in a way that's incompatible with the actual API.

When using VCR, you “record” an interaction with a live copy of the API. VCR matches these recordings to requests, and “plays them back” when the same request is made. The API itself provides data to your application, and VCR verifies requests at a configurable level of specificity. You’ll need to ensure that your collaborator API provides responses that give you the test cases you want. This is harder and more complicated to change, but is guaranteed to be an accurate representation of the behavior of the API at the time the “cassette” is recorded.

So…why not just use mocks/stubs?

“Client” Mocks Have a Different Use Case

Mocks have a slightly different primary use case, which is to verify the attributes of an outgoing message that passes a system boundary. They shouldn’t be used solely to provide difficult to arrange data, unless that data is verified against the source in a different test.

Mocks aren't stubs. A stub is more appropriate for returning dummy data, but note that a stub normally only responds to a limited set of calls with a limited set of data. There’s no guarantee that you've accurately reproduced the behavior of a troublesome API with a stub.

Stubs Require Deep Mocking and a Detailed Understanding of the System

Web Service stubs require you to have an internal knowledge of the system. You must understand at a low level how the HTTP layer in a given area is implemented, and then you need to reach deep inside that area, and reimplement part of it.

An often-raised objection to VCR is that it results in lots of very similar HTTP recordings when a single WebMock stub would work. The problem is common, but it points to a design smell in the application rather than a flaw in VCR.

It is almost always a better design to split these collaborators, so that data from an external system is injected into a class that you control and can be easily unit tested.

For example:
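A minimal sketch of such a split. The class names A, AClient, and AResult come from the discussion that follows; the method names, the JSON shape, and the api.example.com endpoint are hypothetical and only illustrate the idea.

```ruby
require "json"
require "net/http"
require "uri"

# Talks HTTP and nothing else. The endpoint is a hypothetical placeholder.
class AClient
  def initialize(base_url = "https://api.example.com")
    @base_url = base_url
  end

  # Returns the raw response body for one record.
  def fetch(id)
    Net::HTTP.get(URI("#{@base_url}/things/#{id}"))
  end
end

# Pure data transformation: no HTTP, so it is cheap to unit test.
class AResult
  attr_reader :name

  def initialize(raw_json)
    data = JSON.parse(raw_json)
    @name = data.fetch("name").strip.capitalize
  end
end

# Coordinates the two collaborators; the client is injected.
class A
  def initialize(client: AClient.new)
    @client = client
  end

  def call(id)
    AResult.new(@client.fetch(id))
  end
end
```

Because the client is injected, a unit test can hand A any object that responds to fetch, and AResult can be exercised with plain JSON strings.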

With this design, you're able to write more and faster unit tests around the AResult object that encompass all of the different logic involved in transforming one representation of data to a different representation.

AClient is easily verifiable with “mocks as external verifiers.”

But what about A itself? If it's more than a toy example, chances are we need to test it as well. VCR is excellent for this use case, because it doesn't require you to set up a bunch of different states on the server, but merely to verify the happy path.

Your Frail Human Eyes Need Rest—You Should Write a Test!

Noodling around in the console to understand how an API works is slow and error prone. If you find yourself running code in the console, or doing lots of puts statements to discover what is happening, you should consider an exploratory test. VCR speeds up this process because it captures the API call with PerfectRobot knowledge.

Random hot take: EVERYONE DOES TDD ALL THE TIME. It's just a question of whether you’re doing the tedious bits by hand or having the computer do them. You know how to write a test and you know what it should say. Otherwise, how are you writing a feature? What does the feature do? How will you know when it works?

If you mess about with GraphQL until you figure out what the API returns, then copy that return value into your WebMock stub, you’re doing the exact same thing as VCR, just with extra steps and meat involved.

VCR Is Not a Magic Hammer That Shoots Silver Bullets

Sometimes, a hand-rolled stub is still the right approach. VCR depends not only on running your own codebase, but on having a ready copy of the API you’re integrating with. That can mean running “test” against production endpoints, or standing up your own copy of the API in question. The answer to the question of whether or not you should use VCR should seem familiar to most developers—“Well, it depends.”

It requires you to have a running copy of the API you're interacting with, and you may need to perform some setup to harness it for testing.

For simple, single-call GET operations, it's a lot of moving parts, and a WebMock stub might serve you faster.

If the API changes, the “auto-mocks” generated by VCR do not change. It shares this limitation with WebMock.

When Should I Use VCR?

You should definitely consider VCR when:

The API call sequence or timing is not predictable.
You don’t have a good understanding of the behavior of the API.
There are multiple API calls involved in a single logical domain action.
The API is slow, unreliable or obnoxious to use.

When Shouldn’t I Use VCR?

VCR is probably overkill for:

Single API calls with well-formed and understood schemas
APIs that you control and that you’re working on in parallel with the client
An API that you understand very well, which has very well-defined behavior
Mutations that cause a state change that's difficult to roll back, for example deleting a record you can't easily recreate

When Should I Mix The Two?

VCR is best for large, integration-y tests. For specific behavior related to specific API state, it's often better to use VCR only for integration tests, and then use WebMock or decompose to unit tests for “difficult to reproduce, but functionally possible” return values.

Convinced yet? Great. Here’s how to use it.

Add it to your Gemfile, and configure it in your test helper.

WebMock or a similar HTTP-stubbing library is required, because VCR hooks into it to intercept requests.
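As a sketch of that setup, assuming you've added `gem "vcr"` and `gem "webmock"` to the test group of your Gemfile, the test helper might look roughly like this. The cassette directory is an assumed location; pick whatever suits your project.

```ruby
# test/test_helper.rb (or spec/spec_helper.rb)
require "vcr"

VCR.configure do |config|
  # Where the recorded YAML cassettes are stored (assumed path).
  config.cassette_library_dir = "test/cassettes"
  # Intercept HTTP traffic through WebMock.
  config.hook_into :webmock
end
```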

When you want to use it in a test, wrap your callsite in a VCR.use_cassette block.

Note that you can configure what is considered a “matching” request by passing the match_requests_on option to use_cassette. Be cautious about letting VCR auto-title and record your cassettes based on your test name, since it will then re-record cassettes whenever the description of your test changes.
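A minimal sketch of a test that names its cassette explicitly and tightens the matching rules. It assumes VCR is already configured in the test helper; the test class, cassette name, and endpoint are hypothetical. By default VCR matches on method and URI, and the body is added here purely for illustration.

```ruby
require "minitest/autorun"
require "net/http"
require "vcr"

class ForecastTest < Minitest::Test
  def test_fetches_forecast
    # An explicit cassette name survives renaming the test.
    VCR.use_cassette("weather/forecast", match_requests_on: [:method, :uri, :body]) do
      # Hypothetical endpoint: recorded on the first run, replayed afterwards.
      response = Net::HTTP.get_response(URI("https://api.example.com/forecast"))
      assert_kind_of Net::HTTPSuccess, response
    end
  end
end
```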

Work against the live API as much as possible.

Don't record interactions until you're at least somewhat sure it's going to work.

To have VCR not record anything while you're experimenting:
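One way to do this is VCR.turned_off, sketched below. It assumes no cassette is currently inserted, and that WebMock would otherwise block real connections, which is why they are allowed explicitly; depending on your setup that line may be unnecessary. The endpoint is hypothetical.

```ruby
require "net/http"
require "vcr"
require "webmock"

# Assumption: WebMock is blocking real connections, so allow them explicitly.
WebMock.allow_net_connect!

VCR.turned_off do
  # Nothing in this block is recorded or played back.
  puts Net::HTTP.get(URI("https://api.example.com/forecast")) # hypothetical endpoint
end

# Restore the safety net once you're done exploring.
WebMock.disable_net_connect!
```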

VCR creates YAML files with the HTTP Interactions recorded in it. You should delete these liberally. When in doubt, delete it!

Don't use `:new_episodes`

The :new_episodes mode silently records any request that it can't match with an existing recording, so a changed request adds a new interaction instead of failing the test. It isn't the default in current VCR releases (that's :once); it's optimized for the “interact with many different APIs in an integration test” use case.

Unless you’re using that feature, stick with :once, which records only when the cassette file doesn't exist yet, or use :none, which never records and doesn't allow new HTTP requests.
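As a sketch, the record mode can be set globally and overridden per cassette. The cassette name is hypothetical, and choosing :none as the global default is an assumption about how strict you want to be.

```ruby
require "vcr"

VCR.configure do |config|
  # Strict default: replay only, never hit the network.
  config.default_cassette_options = { record: :none }
end

# Opt in to recording for one cassette; :once records only if the file is missing.
VCR.use_cassette("weather/forecast", record: :once) do
  # make the HTTP calls under test here
end
```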

The exception to this is when splitting requests to a single API. Since one of the main motivations for organizing cassettes this way is to capture API interactions with non-deterministic behavior, recording interactions outside the context of a single test run can be useful.

Don't leave it on though.

Use a `before_http_request` hook to log all requests.

This is an extremely handy feature, even if you don't use VCR to capture requests. It logs reasonably complete information about each outgoing request generated by your app. The hook receives a VCR::Request, which exposes the method, URI, headers, and body, and you can drop into pry or another debugger from inside the hook to inspect it further.
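A minimal sketch of such a logging hook, assuming VCR's before_http_request configuration hook and writing to standard error. The log format is made up for illustration.

```ruby
require "vcr"

VCR.configure do |config|
  config.before_http_request do |req|
    # req responds to method, uri, headers, and body.
    warn "[http] #{req.method.to_s.upcase} #{req.uri}"
  end
end
```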

Use `filter_sensitive_data` to replace sensitive data and values which might change per-environment.

This method replaces data in cassettes with static placeholders. If you have “dynamic data” in cassettes, use this method to replace it at runtime.

It's also super useful for not checking-in credentials that you might need to access the API. Any string that can be evaluated at runtime can be replaced with an easily identifiable placeholder.
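A short sketch of both uses. The environment variable names and placeholder strings are assumptions; the block's return value is what gets swapped for the placeholder when recording, and swapped back in on playback.

```ruby
require "vcr"

VCR.configure do |config|
  # Keep the credential out of the cassette (hypothetical variable name).
  config.filter_sensitive_data("<API_TOKEN>") { ENV.fetch("WEATHER_API_TOKEN", "") }
  # Normalize a value that differs per environment.
  config.filter_sensitive_data("<API_HOST>") { ENV.fetch("WEATHER_API_HOST", "api.example.com") }
end
```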

Advanced Technique: Split out troublesome Collaborators

Some APIs can be particularly hard to characterize. Certain authentication APIs can be called a random number of times by their clients, at random intervals, depending on when the client's internal cache expires. In order to keep these random (but vital!) calls out of cassettes for other services, you can use request hooks to isolate them to their own cassette.

This keeps those requests isolated, but can hide errors since every HTTP interaction generated by your tests no longer lives in one YAML file.
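One way to sketch this is with VCR's around_http_request hook, which can wrap matching requests in their own cassette. The auth.example.com host and the cassette name are hypothetical, and the cassette options are one reasonable choice rather than a requirement.

```ruby
require "uri"
require "vcr"

VCR.configure do |config|
  # Only requests to the (hypothetical) auth host are redirected.
  auth_request = ->(req) { URI(req.uri).host == "auth.example.com" }

  config.around_http_request(auth_request) do |request|
    # The innermost cassette handles the request, so auth traffic lands here
    # instead of in whichever cassette the test itself inserted.
    VCR.use_cassette("auth", record: :new_episodes, allow_playback_repeats: true, &request)
  end
end
```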

Advanced Technique: Inspect your YAML files

Even if you don't intend to use the auto-mocks that VCR generates, it's still helpful to have a record of the HTTP interactions for debugging. You can compare these against other interactions, highlight weird arguments or responses, and even modify them directly to mock difficult to reproduce situations.

You can even use ERB in the YAML files to provide dynamic content!
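As a sketch, passing a hash to the erb option makes its keys available inside the cassette. The cassette name, the city variable, and the YAML line in the comment are hypothetical.

```ruby
require "vcr"

# In the cassette YAML, a hand-edited line might read:
#   string: '{"city": "<%= city %>"}'
VCR.use_cassette("weather/forecast", erb: { city: "Ottawa" }) do
  # Requests made here play back with "Ottawa" substituted for <%= city %>.
end
```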

Now, Go Forth and Record an API without the Express Written Permission of Major League Baseball!

VCR is a powerful tool for systematizing your interactions with HTTP APIs.

Is this tool right for your use case? The answer, like always, is “it depends”.

But, if you're struggling with difficult to maintain mocks, misbehaving APIs or complex multi-step interactions and would like tests that are more reliable, faster, and easier to debug, VCR can help you get there.

Prater (he/him) is a Staff Engineer at Shopify. He’s been doing Ruby mad-science since 2008 and wants to see the wackiest code you’ve ever written. Connect with him on Twitter or Mastodon.
