Shopify is currently going through the process of updating from version 2 to version 3 of the Apollo GraphQL client. The Apollo client is a library used to query your GraphQL services from the frontend, and its feature for caching objects and queries you’ve already made is the focus of this post. While migrating the client, we started to discuss bugs we’d run into in the past, and a common thread among them was misuse, or more likely misunderstanding, of the cache.
This was the catalyst for me diving further into the cache and exploring how it fetches, transforms, and stores the data we previously queried. Having worked with the Apollo client for the vast majority of my career, I still couldn’t say I understood exactly what was happening in the cache internally, so I felt compelled to find out. In this post, I’ll focus on the Apollo client cache and the lifecycle of the objects cached within it. You’ll learn:
- What the cache is.
- Where it gets data from.
- What data looks like within it.
- How it changes over time.
- How we get rid of it, if at all.
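The query that the next paragraph discusses appears to be missing from this copy of the post. Based on the details given later (the metas, productMetas, metaData, values, and slug fields), it likely had roughly the following shape. Treat every name here as a reconstruction, not the original code:

```typescript
// Hypothetical reconstruction of the problematic query. Both selections below
// resolve to the same MetaData object, but only one of the two `values`
// selections asks for an `id`.
const METAS_QUERY = /* GraphQL */ `
  query Metas {
    metas {
      productMetas {
        id
        metaData {
          id
          values {
            id
            name
          }
        }
      }
      metaData {
        id
        values {
          slug
        }
      }
    }
  }
`;
```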
The query above isn’t returning the proper data for the metas.metaData.values object: it comes back with nothing for the slug field. Do you see what’s going wrong here? I certainly didn’t understand this before diving into my cache research, but we’ll circle back to it in a bit, after we explore the cache a little more and see if we can unearth what’s going on.
What exactly is the Apollo cache? It’s an InMemoryCache where the data from your network queries is stored in memory. This means that when you leave your browser session (by closing or reloading the tab), the data in the InMemoryCache is not persisted. The cache isn’t stored in local storage or anywhere else that persists between sessions; it’s only available during the session it was created in. The other thing to note is that it isn’t quite your data, it’s a representation of your data (we’ll circle back to that concept in a bit).
Fetch Policies: Where Does the Data Come From?
The first step to understanding the cache is knowing when we actually use data from it versus retrieving data over the network. This is where fetch policies come into play. A fetch policy defines where to get data from: the network, the cache, or a mixture of the two. I won’t go too deep into fetch policies, as Apollo has a great resource on their website. If you don’t explicitly set a fetch policy for your GraphQL calls (like from useQuery), the Apollo client defaults to the cache-first policy. With this policy, Apollo looks in the cache, and if all the data you requested is there, it’s returned from the cache. Otherwise Apollo goes to the network, saves the new data in the cache, and returns that data to you. Understanding when to use the various fetch policies (the others are variations on going to the network first, the cache only, or the network only) saves you considerable headaches when debugging.
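As a mental model, cache-first behaves roughly like the sketch below. This is a simplification in plain TypeScript, not Apollo’s actual implementation, and the cache is stood in for by a plain map keyed by query:

```typescript
// Rough sketch of the cache-first decision.
type Cache = Map<string, unknown>;

function cacheFirst(
  cache: Cache,
  queryKey: string,
  fetchFromNetwork: () => unknown
): unknown {
  // If the cache can fully satisfy the query, no request is made.
  if (cache.has(queryKey)) {
    return cache.get(queryKey);
  }
  // Otherwise go to the network, save the result in the cache, and return it.
  const data = fetchFromNetwork();
  cache.set(queryKey, data);
  return data;
}
```

The real client also has to decide whether every requested field is present in the cache, which is where normalization comes in.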
Data Normalization: How Is the Data Stored?
Now that we know where we’re getting our data from, we can delve into how only a representation of your data is stored. That’s where normalization comes into play. Whenever queried data is written into the cache, it goes through a process called normalization that can be broken down into three steps.
The first step of the normalization process is to split your queried data into individual objects. The cache tries to split the objects up as granularly as possible, using ID fields as a cue for when to do so. These IDs also need to be unique, but that falls under the next step of the normalization flow.
The second step is to take each of the objects that have been broken out and assign it a globally unique cache identifier. These identifiers are normally created by appending the object’s id field to its __typename field. The key word here is normally. As you can probably imagine, your graph may contain types that can only be given globally unique identifiers by using fields other than an id field they may not have. That’s where the keyFields API comes into play: it allows you to define a set of fields other than __typename and id to build cache keys from. What makes a good cache key is that it’s stable and reproducible, so whenever the cache looks up objects it’s consistent.
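For example, here’s a sketch using Apollo client 3’s typePolicies option (the Book type and isbn field are assumptions for illustration, borrowed from Apollo’s own documentation):

```typescript
import { InMemoryCache } from "@apollo/client";

const cache = new InMemoryCache({
  typePolicies: {
    Book: {
      // Books in this hypothetical schema have no `id`, but an ISBN is
      // stable and reproducible, so it makes a good cache key.
      keyFields: ["isbn"],
    },
  },
});
```

With a policy like this, a book is stored under a key built from its typename and ISBN rather than `Book:<id>`.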
Speaking of lookups, we come to the final step of normalization: taking those broken-out objects with their unique identifiers and putting them into a flattened data structure. The reason the Apollo cache uses a flattened data structure (essentially a hash map) is that it has the fastest lookup time for those objects. That’s why the keys are so key (pun intended) to the process: they allow the cache to consistently and quickly return objects when they’re looked up. This also ensures that any duplicate objects are stored in the same location in the cache, keeping it as small as possible.
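To make the three steps concrete, here’s a minimal, self-contained sketch of normalization in plain TypeScript. This is not Apollo’s internal code, though the `__ref` pointer shape mirrors the reference format its normalized cache uses (arrays are omitted for brevity):

```typescript
type Entity = { __typename: string; id?: string; [field: string]: unknown };
type Ref = { __ref: string };
type NormalizedCache = Record<string, Record<string, unknown>>;

// Step 2: build a globally unique cache identifier, `Typename:id` by default.
function cacheId(obj: Entity): string | null {
  return obj.id != null ? `${obj.__typename}:${obj.id}` : null;
}

// Steps 1 and 3: split nested objects out and store them in a flat map,
// replacing each with a { __ref } pointer to its cache entry.
function normalize(obj: Entity, store: NormalizedCache): Ref | Entity {
  const fields: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    fields[key] =
      value !== null && typeof value === "object" && "__typename" in (value as object)
        ? normalize(value as Entity, store)
        : value;
  }
  const id = cacheId(obj);
  if (id === null) return fields as Entity; // no id: stays embedded in its parent
  store[id] = { ...(store[id] ?? {}), ...fields }; // duplicates merge into one entry
  return { __ref: id };
}
```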
Automatic Updates: Merging and Adding Data into Our Cache
After data is stored in our cache from our first query, you may be wondering what happens when new data comes in. This is where things get a little closer to the reality of working with the cache. It’s a sticking point for many (myself included): when cache updates happen automatically, as they usually do, it feels like magic, but when you expect an automatic update in your UI and instead nothing happens, it becomes a huge frustration. So let’s delve into these (not so) automatic updates that happen when we query for new data or receive data from mutation responses. Whenever we query for data (or a mutation responds with updates), and our fetch policy is one that lets us interact with the cache, one of two things happens with the new data and the existing data. The new data’s cache IDs are calculated; then either they’re found to exist in the current cache and that data is updated, or they’re new data objects and are added to the cache. This is, theoretically, the last step in the object lifecycle: as long as the same structure is used, these objects are continually overwritten and updated with new data.
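Sketched in code, the update-or-add decision looks roughly like this (again a simplification, not Apollo’s actual merge logic):

```typescript
type StoreObject = Record<string, unknown>;
type Store = Record<string, StoreObject>;

// What happens when a new object arrives for the normalized cache.
function writeObject(store: Store, cacheId: string, incoming: StoreObject): void {
  const existing = store[cacheId];
  store[cacheId] = existing
    ? { ...existing, ...incoming } // known ID: merge, incoming fields win
    : { ...incoming };             // unknown ID: add a brand new entry
}
```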
So knowing this, when the cache is automatically updating the UI, we understand that to be a merge. The following are the two situations where you can expect your data to be merged and updated in the UI automatically.
1. You are editing a single entity and returning the same type in your response
For example, you’ve got a product and you favorite that product. You likely fire a mutation with the product’s ID, but you must have that mutation return the product as its return type, with the ID of the favorited product and at least the field that determines its favorite status. When this data returns, the cache calculates the internal cache ID and determines there’s already an object with that ID in the cache. It then merges your incoming object (preferring the fields from the incoming object) with the one that’s found in the cache. Finally, it broadcasts an update to any queries that had queried this object previously, and they receive the updated data, re-rendering those components.
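In code, such a mutation might look like the following sketch. The mutation and field names are assumptions for this example, not from the original post:

```typescript
// Returning the product's id plus the changed field is what lets the cache
// locate the existing Product entry and merge the new value into it.
const FAVORITE_PRODUCT = /* GraphQL */ `
  mutation FavoriteProduct($id: ID!) {
    favoriteProduct(id: $id) {
      id          # used to compute the cache ID (e.g. Product:123)
      isFavorited # the field the UI re-renders from
    }
  }
`;
```

The client normally adds __typename to selections automatically, which supplies the other half of the cache ID.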
2. You’re editing entities in a collection and returning all entries of that collection, with the same type, in your response
This is very similar to the first situation, except that this automatic update behavior also works with collections of objects. The only caveat is that all the objects in that collection must be returned in order for an automatic update to occur as the cache doesn’t know what your intentions are with any missing or added objects.
Now for the more frustrating part of automatic updates: when the cache won’t automatically update for you. The following are the four situations you’ll face.
1. Your query response data isn’t related to the changes you want to happen in the cache
This one is straightforward: if you want a response to change cached data that wasn’t included in it, you need to write an update function to apply that side effect for you. This really comes into play when you want to change things that are related to the response data but aren’t directly that data. For example, extending our favoriting scenario from before: if the favoriting action succeeds and you want a count of favorited products to update, that requires either an update function written for that data or a refetch of a “number of favorited products” query.
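A sketch of such an update function, using Apollo client 3’s cache.modify API. The favoritedProductsCount field and the mutation names here are hypothetical, invented for this example:

```typescript
import { gql, useMutation } from "@apollo/client";

const FAVORITE_PRODUCT = gql`
  mutation FavoriteProduct($id: ID!) {
    favoriteProduct(id: $id) {
      id
      isFavorited
    }
  }
`;

function useFavoriteProduct() {
  return useMutation(FAVORITE_PRODUCT, {
    update(cache) {
      cache.modify({
        fields: {
          // The count isn't in the mutation response, so adjust it by hand.
          favoritedProductsCount(existingCount: number = 0) {
            return existingCount + 1;
          },
        },
      });
    },
  });
}
```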
2. You’re unable to return an entire set of changed objects
This expands on the point above about returning entire collections: if you change multiple entities in a mutation, for example, and want those changes reflected in the UI automatically, your mutation must return the original list in its entirety, with all the objects and their corresponding IDs. This is because the cache doesn’t infer what you want to do with missing objects, whether they should be removed from the list or something else. So you, as the developer, must be explicit with your return data.
3. The order of the response set is different from the currently cached one
For example, say you’re changing the order of a list of todos (very original, I know). If you fire a mutation to change the order and get a response, you’ll notice that the UI isn’t automatically updated, even though you returned all the todos and their IDs. This is because the cache, again, doesn’t infer the meaning of changes like ordering, so to reflect an order change, an update function needs to be written.
4. The response data has an added or removed item in it
This is similar to #2: the cache can’t reason that an item has been added to or removed from a list unless the whole list is returned. Take the favoriting situation: if the same page has a list of favorites, and we unfavorite a product outside this list, its removal from the list isn’t immediate, as we likely only returned the removed object’s ID. In this scenario, we also need to write an update function for that list of favorited objects to remove the object we’re operating on.
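Such a removal could be handled with an update function like the sketch below, again using cache.modify. The favorites field and mutation names are assumptions for this example:

```typescript
import { gql, useMutation, Reference } from "@apollo/client";

const UNFAVORITE_PRODUCT = gql`
  mutation UnfavoriteProduct($id: ID!) {
    unfavoriteProduct(id: $id) {
      id
      isFavorited
    }
  }
`;

function useUnfavoriteProduct(productId: string) {
  return useMutation(UNFAVORITE_PRODUCT, {
    update(cache) {
      cache.modify({
        fields: {
          // Drop the reference to the unfavorited product from the cached
          // list, since the response alone can't tell the cache to remove it.
          favorites(existingRefs: Reference[] = [], { readField }) {
            return existingRefs.filter(
              (ref) => readField("id", ref) !== productId
            );
          },
        },
      });
    },
  });
}
```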
… I Did Say We Would Circle Back to That Original Query
Now that we’ve got a handle on how automatic updates (merging) and normalization work, let’s circle back to that query that isn’t returning the proper data. In the query above, the productMetas and metaData objects return the same type, MetaData, and in this example they both had the same ID, so the cache normalized them into a single object. The issue came to light during that normalization process, as the cache tried to normalize the values object on each of them into a single value. However, you’ll notice only one of the values objects has an id field; the other just returns a slug. The cache is unable to normalize that second values object correctly because it has no matching id, and it therefore appears to be “losing” the data. But the data isn’t lost, it’s just not included in the normalized MetaData.values object. The solution is relatively simple: we just need to return the id for the second values object so the cache can recognize them as the same object and merge them correctly.
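The fix, sketched with reconstructed field names (the exact shape of the original query is an assumption):

```typescript
const FIXED_METAS_QUERY = /* GraphQL */ `
  query Metas {
    metas {
      productMetas {
        id
        metaData {
          id
          values {
            id
            name
          }
        }
      }
      metaData {
        id
        values {
          id   # with a matching id, both values selections merge cleanly
          slug
        }
      }
    }
  }
`;
```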
In the cached object lifecycle this is essentially the end: without further interference, objects will live in your normalized cache indefinitely as you update them or add new ones. There are situations, however, where you might want to remove unused objects from your cache, especially when your application is long lived and has a lot of data coming into it. For example, if you have a mapping application where you pan the map across a bunch of points of interest, the points of interest you moved away from will sit in the cache but are essentially useless, taking up memory. Over time you’ll notice the application get slower as the cache takes up more memory. So how can we mitigate this?
Garbage Collection: Cleaning Up After Ourselves
We have reached the final step of an object's lifecycle in the cache, Garbage Collection and Eviction.
The best way to deal with that leftover data is the garbage collector built into the Apollo client. New in client 3, it’s a tool for clearing out the cache: a simple call to the cache.gc() method removes unreachable items from the cache and returns the list of IDs that were removed. Garbage collection isn’t run automatically, however, so it’s up to the developer to run this method themselves. Now let’s explore how these unreachable items are created.
Below is a sample app (available here). In this app I have a pixel representation of a Pikachu (painstakingly recreated in code by yours truly), and I’m printing out the cache to the right of it. You’ll notice a counter that says “Cache Size: 212”. This counts the number of top-level keys in the normalized cache, to give a rough idea of the cache’s size.
Behind this frontend application is a backend GraphQL server with a few mutations set up. All these pixels are delivered from a PixelImage query. There’s also a mutation where you can send a new color to change the Pikachu’s main body pixels, giving you the shiny version of Pikachu. So I’m going to fire that mutation and take a look at the size of the cache below:
Notice that the cache is now 420 keys large. It essentially doubled in size because the pixels all have unique identifiers that changed when we changed the Pikachu’s colors. The new data came in after our mutation and replaced the old data, but the old pixel objects for our regular Pikachu weren’t deleted. In fact, they’re still rolling around in the cache; they just aren’t reachable. This is how we orphan objects in our cache: by re-querying the same data with new identifiers. And this (contrived example) is why we might need the garbage collector. Let’s take a look at a representation of the cache below, where the red outlines are the garbage collector traversing the tree of cached objects. On the left are our new, reachable objects: the Root is the root of our GraphQL queries, and the garbage collector is able to go from object to object, determining that they’re reachable in the cache. On the right is our original query’s data, which is no longer reachable from the root, and this is how the garbage collector determines that these objects are to be removed from memory.
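That traversal is a classic mark-and-sweep. Here’s a minimal, self-contained sketch of the idea over a flattened store (the shapes are assumptions for illustration, not Apollo’s actual gc code):

```typescript
type Ref = { __ref: string };
type Store = Record<string, Record<string, unknown>>;

function isRef(value: unknown): value is Ref {
  return typeof value === "object" && value !== null && "__ref" in value;
}

// Mark: walk from the root, following { __ref } pointers (and arrays of them),
// collecting the IDs of every object we can reach.
function reachableIds(store: Store, rootId = "ROOT_QUERY"): Set<string> {
  const seen = new Set<string>();
  const visit = (id: string): void => {
    if (seen.has(id) || !(id in store)) return;
    seen.add(id);
    for (const value of Object.values(store[id])) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (isRef(item)) visit(item.__ref);
      }
    }
  };
  visit(rootId);
  return seen;
}

// Sweep: anything the walk never reached is an orphan and gets deleted.
function gc(store: Store): string[] {
  const live = reachableIds(store);
  const removed = Object.keys(store).filter((id) => !live.has(id));
  for (const id of removed) delete store[id];
  return removed;
}
```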
The garbage collector removing objects essentially finishes the lifecycle of an object in the cache. Thinking of any field requested from your GraphQL server as part of an object that lives and updates in the cache over time has made many of the interactions I run into in my applications much clearer. For example, whenever I query for things with IDs, I keep in mind that I may get automatic updates for those objects when I mutate state, like changing whether something is pinned or favorited, leading to components that are designed around the GraphQL data updates. When the GraphQL data determines state updates purely by its values, we don’t end up duplicating server-side data into client-side state management, a step that often adds further complexity to our applications. Hopefully this peeling back of the caching layers gets you thinking about how you query for objects and how you can take advantage of some of the free updates available through the cache. I encourage you to take a look at the demo applications (however crude) below to see the cache updating on screen in real time as you perform different interactions, and to add the raw form of the cache representation to your mental model of frontend development with the Apollo client.
Just fork these two projects. In the server project, once it has completed initialization, take the “url” displayed and update the frontend project’s ApolloClient setup with that url so you can make those queries.
Raman is a senior developer at Shopify. He's had an unhealthy obsession with all things GraphQL throughout his career so far and plans to keep digging into it more. He's impatiently waiting for winter to get out snowboarding again and spends an embarrassing amount of time talking about and making food.
Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our career page to find out about our open positions and learn about Digital by Design.