Media at Scale: Callbacks vs pipelines

The Shopify admin is a Ruby on Rails monolith that hundreds of developers work on, with a continuous deployment cycle used by millions of people.

Shopify recently launched the ability to add videos and 3d models to products as a native feature within the Admin section. This feature expands a merchant's ability to showcase their products through media using videos and 3d models in addition to the already existing images. Images have been around since the beginning of Shopify, and there are currently over 7 billion images on the platform.

A requirement of this project was to support the legacy image infrastructure while adding the new capabilities for all our merchants. In a large system like Shopify, it is critical to have control over the code and safe database operations. For this reason, at Shopify, transactions are used to make database writes safe.

One of the challenges we faced during the design of this project was deciding whether to use callbacks or a pipeline approach. The first approach we can take to implement the feature is a complete rails approach with callbacks and dependencies. Callbacks, as rails describes them, allow you to trigger logic during an object's life cycle.

“Callbacks are methods that get called at certain moments of an object's life cycle. With callbacks, it is possible to write code that will run whenever an Active Record object is created, saved, updated, deleted, validated, or loaded from the database.”

Media via Callbacks

Callbacks are quick and easy to set up. It is fast to hit the ground running. Let’s look at what this means for our project. For the simplest case of only one media type, i.e., image, we have an image object and a media object. The image object has a polymorphic association to media. Video and 3d models can be other media types. The code looks something like this:

With this setup, the creation process of an Image will look something like this:

We have to add a transaction during creation since we need both image and media records to exist. To add a bit more complexity to this example, we can add another media type, video, and expand beyond create. Video has unique aspects, such as the concept of a thumbnail that doesn’t exist for an image. This adds some conditionals to the models.

From the two previous examples, it is clear that getting started with callbacks is quick when the application is simple; however, as we add more logic, the code becomes more complex. We see that, even for a simple example like this one, our media object has granular information on the specifics of video and images. As we add more media types, we end up with more conditional statements and business logic spread across multiple models. In our case, using callbacks made it hard to keep track of the object’s lifecycle and the order in which callbacks are triggered during each state. As a result, it became challenging to maintain and debug the code.

Media via Pipelines

In an attempt to have better control over the code, we decided not to use callbacks and instead try an approach using pipelines. Pipeline is the concept where the output of one element is the input to the next. In our case, the output of one class is the input to the next. There is a separate class that is responsible for only one operation of a single media type.

Let’s imagine the whole system as a restaurant kitchen. The Entrypoint class is like the kitchen’s head chef. All the orders come to this person. In our case, the user input comes into this service (Product Create Media Service). Next, the head chef assigns the orders to her sous chef. This is the media create handler. The sous chef looks at the input and decides which of the kitchen staff get’s the order. She assigns the order for a key lime pie to the pastry chef and the order for roasted chicken to the roast chef. Similarly, the media create handler assigns the task to create a video to the video create handler and the task to create an image to the image create handler. Each of these individuals specializes in their tasks and are not aware of the details of others. The video create handler is only responsible for creating a media of type video. It has no information about image or 3d models.

All of our individual classes have the same structure but are responsible for different media types. Each class has three methods:
  • before_transaction
  • during_transaction
  • after_transaction

As the name suggests, these methods have to be called in that specific order. Going back to our analogy, the before transaction method is responsible for all the prep work that goes in before we create the food. The during transaction is everything involved in creating the dish, which involves cooking the dish and plating it. For rails, this is the method that is responsible for persisting the data to the database. Finally, after_transaction is the clean up that is required after each dish is created.

Let's look at what the methods do in our case. For example, the video create handler will look like this:

Similarly, if we move a step up, we can look at the media create handler. This handler will also follow a similar pattern with three methods. Each of these methods in turn calls the handler for the respective media type and creates a cascading effect.

Media Create Handler

The logic for each media type remains confined to its specific class. Each class is only aware of its operation, like how the example above is only concerned with creating a video. This creates a separation of concerns. Let's take a look at the product create media service. The service is unaware of the media type, and it’s only responsibility is to call the media handler.

Product Create Media Service

The product create media service also has a public entry point, which is used to call the service.

The caller of the service has a single entry point and is completely unaware of the specifics of how each media type is created. Like in a restaurant, we want to make sure that the food for an entire table is delivered together. Similarly, in our code, we can manage that interdependent objects are created together using a transaction. This approach gives us the following features:

  • Different components of the system can create media and manage their own transactions.
  • The system components no longer have access to the media models but can interact with them using the service entry point.
  • The media callbacks don't get fired with those of the caller, making it easier to follow the code. When developers new to rails use callbacks, it requires a lot of knowledge of the framework and hides away the implementation details.
  • This approach makes it easier to follow and debug the code. The cognitive load on the reader is low, they are all ruby objects, and it is easy to understand.
  • It also gives us control over the order in which objects are created, deleted, updated.

From the code example, we see that the methods of implementation using callbacks is quick and easy to set up. Ruby on rails can speed up the development process by abstracting away the implementation details, and it is a great feature to use when working with a simple use case. However, as the code evolves and grows more complex, It can be hard to maintain a large production application with callbacks. As we saw in the example above, we had conditionals spread across the active record models.

An alternate approach can better serve the purpose of long-term maintenance and understandability of the code. In our case, pipelines helped achieve this. We separated the business logic in separate files, enabling us to understand the implementation details better. It also avoided having conditionals spread across the active record models. The most significant advantage of the approach is that it created a clear separation of concerns and different parts of the application do not know the particulars of the media component.

When designing a pipeline it is important to make sure that there is a single entry point that can be used by the consumer. The pipeline should only perform the actions it is expected to and not have side effects. For example, our pipeline is responsible for creating media and no other action, the client does not expect any other product attribute to be modified. Pipelines are designed to make it easy for the caller to perform certain tasks and so we hide away the implementation details of creating media from the caller. And finally having several steps that perform smaller subtasks can create a clear separation of concern within the pipeline.

We're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions.