Caching is critical to how Rails applications work. At every layer, whether it be in page rendering, database querying, or external data retrieval, the cache is what ensures that no single bottleneck brings down an entire application.

But caching has a dirty secret, and that secret’s name is Marshal.

Marshal is Ruby’s ultimate sharp knife, able to transform almost any object into a binary blob and back. This makes it a natural match for the diverse needs of a cache, particularly the cache of a complex web framework like Rails. From actions, to pages, to partials, to queries—you name it, if Rails is touching it, Marshal is probably caching it.

Marshal’s magic, however, comes with risks.

A couple of years ago, these risks became very real for us. It started innocently enough. A developer at Shopify, in an attempt to clean up some code in our core monolith, shipped a PR refactoring some key classes around beta flags. The refactor got the thumbs up in review and passed all tests and other checks.

As it went out to production, though, it became clear something was very wrong. A flood of exceptions triggered an incident, and the refactor was quickly rolled back and reverted. We were lucky to escape so easily.

The incident was a wake-up call for us. Nothing in our set of continuous integration (CI) checks had flagged the change. Indeed, even in retrospect, there was nothing wrong with the code change at all. The issue wasn’t the code, but the fact that the code had changed.

The problem, of course, was Marshal. Being so widely used, beta flags were being cached. Marshal serializes an object’s class along with its other data, so many of the classes that were part of the refactor were also hardcoded in entries of the cache. When the newly deployed code began inserting beta flag instances with the new classes into the cache, the old code—which was still running as the deploy was proceeding—began choking on class names and methods that it had never seen before.

As a member of Shopify’s Ruby and Rails Infrastructure team, I was involved in the follow-up for this incident. The incident was troubling to us because there were really only two ways to mitigate the risk of the same incident happening again, and neither was acceptable. The first is simply to put less things into the cache, or less variety of things; this decreases the likelihood of cached objects conflicting with future code changes. But this defeats the purpose of having a cache in the first place.

The other way to mitigate the risk is to change code less, because it’s code changes that ultimately trigger cache collisions. But this was even less acceptable: our team is all about making code cleaner, and that requires changes. Asking developers to stop refactoring their code goes against everything we were trying to do at Shopify.

So we decided to take a deeper look and fix the root problem: Marshal. We reasoned that if we could use a different serialization format—one that wouldn’t cache any arbitrary object the way Marshal does, one that we could control and extend—then maybe we could make the cache safe by default.

The format that did this for us is MessagePack. MessagePack is a binary serialization format that’s much more compact than Marshal, with stricter typing and less magic. In this two-part series (based on a RailsConf talk by the same name), I’ll pry Marshal open to show how it works, delve into how we replaced it, and describe the specific challenges posed by Shopify’s scale.

But to start, let’s talk about caching and how Marshal fits into that.

You Can’t Always Cache What You Want

Caching in Rails is easy. Out of the box, Rails provides caching features that cover the common requirements of a typical web application. The Rails Guides provide details on how these features work, and how to use them to speed up your Rails application. So far, so good.

What you won’t find in the guides is information on what you can and can’t put into the cache. The low-level caching section of the caching guide simply states: “Rails’ caching mechanism works great for storing any kind of information.” (original emphasis) If that sounds too good to be true, that’s because it is.

Under the hood, all types of cache in Rails are backed by a common interface of two methods, read and write, on the cache instance returned by Rails.cache. While there are a variety of cache backends—in our core monolith we use Memcached, but you can also cache to file, memory, or Redis, for example—they all serialize and deserialize data the same way, by calling Marshal.load and Marshal.dump on the cached object.

A diagram showing the differences between the cache encoding format between Rails 6 and Rail 7 — Cache encoding format in Rails 6 and Rails 7

If you actually take a peek at what these cache backends put into the cache, you might find that things have changed in Rails 7 for the better. This is thanks to work by Jean Boussier, who’s also in the Ruby and Rails Infrastructure team at Shopify, and who I worked with on the cache project. Jean recently improved cache space allocation by more efficiently serializing a wrapper class named ActiveSupport::Cache::Entry. The result is a more space-efficient cache that stores cached objects and their metadata without any redundant wrapper.

Unfortunately, that work doesn’t help us when it comes to the dangers of Marshal as a serialization format: while the cache is slightly more space efficient, all those issues still exist in Rails 7. To fix the problems with Marshal, we need to replace it.

Let’s Talk About Marshal

But before we can replace Marshal, we need to understand it. And unfortunately, there aren’t a lot of good resources explaining what Marshal actually does.

To figure that out, let’s start with a simple Post record, which we will assume has a title column in the database:

We can create an instance of this record and pass it to Marshal.dump:

This is what we get back:

This is a string of around 1,600 bytes, and as you can see, a lot is going on in there. There are constants corresponding to various Rails classes like ActiveRecord, ActiveModel and ActiveSupport. There are also instance variables, which you can identify by the @ symbol before their names. And finally there are many values, including the name of the post, Caching Without Marshal, which appears three times in the payload.

The magic of Marshal, of course, is that if we take this mysterious bytestring and pass it to Marshal.load, we get back exactly the Post record we started with.

You can do this a day from now, a week from now, a year from now, whenever you want—you will get the exact same object back. This is what makes Marshal so powerful.

And this is all possible because Marshal encodes the universe. It recursively crawls objects and their references, extracts all the information it needs, and dumps the result to the payload.

But what is actually going on in that payload? To figure that out, we’ll need to dig deeper and go to the ultimate source of truth in Ruby: the C source code. Marshal’s code lives in a file called marshal.c. At the top of the file, you’ll find a bunch of constants that correspond to the types Marshal uses when encoding data.

Marshal types defined in marshal.c — Marshal types defined in `marshal.c`

Top in that list are MARSHAL_MAJOR and MARSHAL_MINOR, the major and minor versions of Marshal, not to be confused with the version of Ruby. This is what comes first in any Marshal payload. The Marshal version hasn’t changed in years and can pretty much be treated as a constant.

Next in the file are several types I will refer to here as “atomic”, meaning types which can’t contain other objects inside themself. These are the things you probably expect: nil, true, false, numbers, floats, symbols, and also classes and modules.

Next, there are types I will refer to as “composite” that can contain other objects inside them. Most of these are unsurprising: array, hash, struct, and object, for example. But this group also includes two you might not expect: string and regex. We’ll return to this later in this article.

Finally, there are several types toward the end of the list whose meaning is probably not very obvious at all. We will return to these later as well.

Objects

Let’s first start with the most basic type of thing that Marshal serializes: objects. Marshal encodes objects using a type called TYPE_OBJECT, represented by a small character o.

Here’s the Marshal-encoded bytestring for the example Post we saw earlier, converted to make it a bit easier to parse.

The first thing we can see in the payload is the Marshal version (0408), followed by an object, represented by an ‘o’ (6f). Then comes the name of the object’s class, represented as a symbol: a colon (3a) followed by the symbol’s length (09) and name as an ASCII string (Post). (Small numbers are stored by Marshal in an optimized format—09 translates to a length of 4.) Then there’s an integer representing the number of instance variables, followed by the instance variables themselves as pairs of names and values.

You can see that a payload like this, with each variable itself containing an object with further instance variables of its own, can get very big, very fast.

Instance Variables

As mentioned earlier, Marshal encodes instance variables in objects as part of its object type. But it also encodes instance variables in other things that, although seemingly object-like (subclassing the Object class), aren’t in fact implemented as such. There are four of these, which I will refer to as core types, in this article: String, Regex, Array, and Hash. Since Ruby implements these types in a special, optimized way, Marshal has to encode them in a special way as well.

Consider what happens if you assign an instance variable to a string, like this:

This may not be something you do every day, but it’s something you can do. And you may ask: does Marshal handle this correctly?

The answer is: yes it does.

It does this using a special type called TYPE_IVAR to encode instance variables on things that aren’t strictly implemented as objects, represented by a variable name and its value. TYPE_IVAR wraps the original type (String in this case), adding a list of instance variable names and values. It’s also used to encode instance variables in hashes, arrays, and regexes in the same way.

Circularity

Another interesting problem is circularity: what happens when an object contains references to itself. Records, for example, can have associations that have inverses pointing back to the original record. How does Marshal handle this?

Take a minimal example: an array which contains a single element, the array itself:

What happens if we run this through Marshal? Does it segmentation fault on the self-reference?

As it turns out, it doesn’t. You can confirm yourself by passing the array through Marshal.dump and Marshal.load:

Marshal does this thanks to an interesting type called the link type, referred to in marshal.c as TYPE_LINK.

The way Marshall does this is quite efficient. Let’s look at the payload: 0408 5b06 4000. It starts with an open square bracket (5b) representing the array type, and the length of the array (as noted earlier, small numbers are stored in an optimized format, so 06 translates to a length of 1). The circularity is represented by a @ (40) symbol for the link type, followed by an index of the element in the encoded object the link is pointing to, in this case 00 for the first element (the array itself).

In short, Marshal handles circularity out of the box. That’s important to note because when we deal with this ourselves, we’re going to have to reimplement this process.

Core Type Subclasses

I mentioned earlier that there are a number of core types that Ruby implements in a special way, and that Marshal also needs to handle in a way that’s distinct from other objects. Specifically, these are: String, Regex, Array, and Hash.

One interesting edge case is what happens when you subclass one of these classes, like this:

If you create an instance of this class, you’ll see that while it looks like a hash, it’s, indeed, an instance of the subclass:

So what happens if you encode this with Marshal? If you do, you’ll find that it actually captures the correct class:

Marshal does this because it has a special type called TYPE_UCLASS. To the usual data for the type (hash data in this case), TYPE_UCLASS adds the name of the class, allowing it to correctly decode the object when loading it back. It uses the same type to encode subclasses of strings, arrays, and regexes (the other core types).

The Magic of Marshal

We’ve looked at how Marshal encodes several different types of objects in Ruby. You might be wondering at this point why all this information is relevant to you.

The answer is because—whether you realize it or not—if you’re running a Rails application, you most likely rely on it. And if you decide, like we did, to take Marshal’s magic out of your application, you’ll find that it’s exactly these things that break. So before doing that, it’s a good idea to figure out how to replace each one of them.

That’s what we did, with a little help from a format called MessagePack. In the next part of this series, we’ll take a look at the steps we took to migrate our cache to MessagePack. This includes re-implementing some of the key Marshal features, such as circularity and core type subclasses, explored in this article, as well as a deep dive into our algorithm for encoding records and their associations.

Chris Salzberg is a Staff Developer on the Ruby and Rails Infra team at Shopify. He is based in Hakodate in the north of Japan.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.