As more clients rely on GraphQL to query data, we witness performance and scalability issues emerging. Queries are getting bigger and slower, and net-new roll-outs are challenging. The web & mobile development teams working on Orders & Fulfillments spent some time exploring and documenting our approaches. On mobile, our goal was to consistently achieve a sub one second page load on a reliable network. After two years of scaling up our Order screen in terms of features, it was time to re-think the foundation on which we were operating to achieve our goal. We ran a few experiments in mobile and web clients to develop strategies around those pain points. These strategies are still a very open conversation internally, but we wanted to share what we’ve learned and encourage more developers to play with GraphQL at scale in their web and mobile clients. In this post, I’ll go through some of those strategies based on an example query and build upon it to scale it up.

1. Designing Our Base Query

Let’s take the case of a client loading a list of products. To power our list screen we use the following query:

Using this query, we can load the first 100 products and their details (name, price, and image). This might work great, as long as we have fewer than 100 products. As our app grows we need to consider scalability:

How can we prepare for the transition to a paginated list?
How can we roll out experiments and new features?
How can we make this query faster as it grows?

2. Loading Multiple Product Pages

Good news, our products endpoint is paginated on Shopify’s back-end side and can now implement the change on our clients! The main concern on the client side is to find the right page size because it could also have UX and Product implications. The right page size will likely change from one platform to another because we’re likely to display fewer products at the same time on the mobile client (due to less space). This weighs on the performances as the query grows.

In this step, a good strategy is to set performance tripwires, that is create some kind of score (based on loading times) to monitor our paginated query. Implementing pagination within our query immediately reduces the load on the back-end and front-end side if we opt for a lower number than the initial 100 products:

We add two parameters to control the page size and index. We also need to know if the next page is available to show, hence the hasNextPage field. Now that we have support for an unlimited amount of products in our query, we can focus on how we roll out new fields.

3. Controlling New Field Rollouts

Our product list is growing in terms of features, and we run multiple projects at the same time. To make sure we have control on how changes are rolled out in our ProductList query we use @include and @skip tags to make optional some of the net-new fields we’re rolling out. It looks like this:

In the example above the description field is hidden behind the $featureFlag parameter. It becomes optional, and you need to unwrap its value when parsing the response. If the value of $featureFlag is false, the response will return it as null.

The @include and @skip tags require any new field to keep the same naming and level as renaming or deleting those fields will likely result in breaking the query. A way around this problem is to dynamically build the query at runtime based on the feature flag value.

Other rollout strategies can involve duplicating queries and running a specific query based on feature flags or working off a side branch until rollout and deployment. Those strategies are likely project and platform specific and come with more trade-offs like complexity, redundant code, and scalability.

The @include and @skip tags solution is handy for flags on hand, but what about for conditional loading based on remote flags? Let’s have a look at chained queries!

4. Chaining Queries

From time to time you’ll need to chain multiple queries. A few scenarios where this might happen are

Your query relies on a remote flag that comes from another query. This makes rolling out features easier as you control the feature release remotely. On mobile clients with many versions in production, this is useful.
A part of your query relies on a remote parameter. Similar to the scenario above, you need the value of a remote parameter to power your field. This is usually tied to back-end limitations.
You’re running into pagination limitations with your UX. You need to load all pages on screen load and chain your queries until you reach the last page. This mostly happens in clients where the current UX doesn’t allow for pagination and is out of sync with the back-end updates. In this specific case solve the problem at a UX level if possible.

We transform our local feature flag into a remote flag and this is what our query looks like:

In the example above, the RemoteDescriptionFlag query is executed first, and we wait for its results to start the ProductsList query. The descriptionEnabled (aliased to remoteFlag) powers the @include inside our ProductsList query. This means we’re now waiting for two queries at every page or screen load to complete before we can display our list of products. It significantly slows down our performance. A way to work around this scenario is to move the remote flag query outside of this context, probably at an app-wide level.

The TL;DR of chained queries: only do it if necessary.

5. Using Parallel Queries

Our products list query is growing significantly with new features:

We added search filters, user permission, and banners. Those three parts aren’t tied to the products list pagination because if they were included in the ProductsList query, we have to re-query those three endpoints every time we ask for a new page. It slows down performance and gives redundant information. This doesn’t scale well with new features and endpoints, so this sounds like a good time to leverage parallel querying!

Parallel querying is exactly what it sounds like: running multiple queries at the same time. By splitting the query into scalable parts and leaving aside the “core” query of the screen, it brings the benefits to our client:

Faster screen load: since we’re querying those endpoints separately, the load is transferred to the back-end side instead. Fragments are resolved and queried simultaneously instead of being queued on the server-side. It’s also easier to scale server-side than client-side in this scenario.
Easier to contribute as the team grows: by having one endpoint per query, we diminish the risk of code conflict (for example, fixtures) and flag overlapping for new features. It also makes it easier to remove some endpoints.
Easier to introduce the possibility of incremental and partial rendering: As queries are completed, you can start to render content to create the illusion of a faster page load for users.
Removes the redundant querying by leaving our paginated endpoint in its own query: we only query for product pages after the initial query cycle.

Here’s an example of what our parallel queries look like:

Whenever one of those queries becomes too big, we apply the same principles and split again to accommodate for logic and performances. What’s too big? As a client developer, it’s up to you to answer this question by setting up goals and tripwires. Creating some kind of trackable score for loading time can help you make the decision on when to cut the query in multiple parts. This way the GraphQL growth in our products list is more organic ( an outcome that looks at scalability and developer happiness) and doesn't impact performance: each query can grow independently and reduces the amount of potential roll-out & code merge conflicts.

Just a warning when using parallel queries, when transferring the load server-side, make sure you set tripwires to avoid overloading your server. Consult with site reliability experts (SREs or at Shopify, production engineers), and back-end developers, they can help monitor the performances server-side when using parallel querying.

Another challenge tied to parallel queries, is to plug the partial data responses into the screen state’s. This is likely to require some refactor into the existing implementation. It could be a good opportunity to support partial rendering at the same time.

Over the past four years, I have worked on shipping and scaling features in the Orders mobile space at Shopify. Being at the core of our Merchants workflow gave me the opportunity to develop empathy for their daily work. Empowering them with better features meant that we had to scale our solutions. I have been using those patterns to achieve that, and I’m still discovering new ones. I love how flexible GraphQL is on the client-side! I hope you’ll use some of these querying tricks in your own apps. If you do, please reach out to us, we want to hear how you scale your GraphQL queries!

Additional Information on GraphQL

Théo Ben Hassen is a development manager on the Merchandising mobile team. His focus is on enabling mobile to reach its maximum impact through mobile capabilities. He's interested about anything related to mobile analytics and user experience.