Earlier this year I took the train from Ottawa to Toronto. While I was waiting in line in the main hall of the station, I noticed a police officer with a detection dog. The police officer was giving the dog plenty of time at each bag or person as they worked and weaved their way back and forth along the lines. The dog would look to his handler for direction, receiving it with the wave of a hand or gesture towards the next target. That’s about the moment I began asking myself a number of questions about dogs… and APIs.
To understand why, you have to appreciate that the Canadian government recently legalized cannabis. Watching this incredibly well-trained dog work his way up and down the lines made me wonder: how did they “update” the dogs once the legislation changed? Can you really retrain or un-train a dog? How easy is it to implement this change, and how long does it take to roll out? So when the officer ended up next to me I couldn’t help but ask,
ME: “Excuse me, I have a question about your dog if that’s alright with you?”
OFFICER: “Sure, what’s on your mind?”
ME: “How did you retrain the dogs after the legalization of cannabis?”
OFFICER: “We didn’t. We had to retire them all and train new ones. You really can’t teach an old dog new tricks.”
ME: “Wow, seriously? How long did that take?”
OFFICER: “Yep, we needed a full THREE YEARS to retire the previous group and introduce a new generation. It was a ton of work.”
Sitting on the train, I found myself thinking about how easy it would have been for one layer of government, plotting out the changes, to completely underestimate the downstream impact on the K9 units of the police services. To anyone who didn’t understand the system (dogs), the change sounds simple: detect substances in a set that is now n-1 in size. In reality, because of the way this dog-dependent system works, the change required significant time and effort, and a three-year program to migrate from the old system to the new.
How We Handle API Versioning
At Shopify, we have tens of thousands of partners building on our APIs who depend on us to ensure our merchants can run their businesses every day. In April of this year, we released the first official version of our API. All consumers of our APIs require stability and predictability. Our API versioning scheme allows us to continue developing the platform while providing apps with stable API behavior and predictable timelines for adopting changes.
Figure: Growth of our API RPM (requests per minute) quarter over quarter since 2017, overlaid with growth in active API clients
To ensure that we provide a stable and predictable API, Shopify releases a new API version every three months at the beginning of the quarter. Version names are date-based to be meaningful and semantically unambiguous (for example, 2020-01).
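As a rough illustration of the naming scheme, here is a minimal Ruby sketch that derives a date-based version name from a calendar date. It assumes the name is the year plus the first month of the quarter, which matches the 2020-01 example above; the helper names quarterly_version_name and next_version_name are hypothetical and not part of any Shopify library.

```ruby
require "date"

# Hypothetical helper: name of the quarterly version released at the start
# of the quarter containing `date` (assumed format: YYYY-MM).
def quarterly_version_name(date)
  quarter_start_month = ((date.month - 1) / 3) * 3 + 1
  format("%04d-%02d", date.year, quarter_start_month)
end

# Hypothetical helper: the version name that follows `name` three months later.
def next_version_name(name)
  year, month = name.split("-").map(&:to_i)
  following = Date.new(year, month, 1) >> 3 # advance three months
  quarterly_version_name(following)
end

puts quarterly_version_name(Date.new(2020, 2, 14)) # 2020-01
puts next_version_name("2020-10")                  # 2021-01
```

One side effect of this format is that version names sort chronologically when compared as plain strings, which keeps "is this version newer?" checks simple.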
Although the Platform team is responsible for building the infrastructure, tooling, and systems that enforce our API versioning strategy at Shopify, there are 1000+ engineers working across Shopify, each with the ability to ship code that can ultimately affect any of our APIs. So how do we think about versioning, and help manage changes to our APIs at scale?
Our general rule of thumb about versioning is this:
API versioning is a powerful tool that comes with added responsibility. Break the API contract with the ecosystem only when there are no alternatives or it’s uneconomical to do otherwise.
API versions and changes are represented in our monolith through new frozen records, one file for versions and one for changes. API changes are packaged together and shipped as part of a distinct version. Changes are initially introduced to the unstable version and can optionally have a beta flag associated with them, which prevents the change from being visible publicly. At runtime, our code can check whether a given change is in effect through an ApiChange.in_effect? construct. I’ll show you how this and other methods of the ApiChange module are used in an example later on.
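To make the moving parts concrete, here is a rough sketch of how version and change records could drive such a check. This is an illustrative stand-in, not Shopify’s ApiChange module: the ApiChangeSketch name, the record layout, the keyword arguments, and the experimental_change entry are all assumptions made for the example.

```ruby
# Illustrative stand-in only; not Shopify's ApiChange module.
module ApiChangeSketch
  UNSTABLE = "unstable"

  # Hypothetical "versions" records, oldest first; unstable is always last.
  VERSIONS = ["2019-10", "2020-01", "2020-04", UNSTABLE].freeze

  # Hypothetical "changes" records: change name => the version it ships in,
  # plus an optional beta flag that gates visibility.
  CHANGES = {
    really_big_change: { version: "2020-01", beta_flag: nil },
    experimental_change: { version: UNSTABLE, beta_flag: :experimental_beta },
  }.freeze

  def self.position(version)
    VERSIONS.index(version) || raise(ArgumentError, "unknown version: #{version}")
  end

  # A change is in effect when the requested version is at or after the
  # version the change shipped in, and any beta flag is enabled for the caller.
  def self.in_effect?(change_name, requested_version:, enabled_flags: [])
    change = CHANGES.fetch(change_name)
    return false if change[:beta_flag] && !enabled_flags.include?(change[:beta_flag])

    position(requested_version) >= position(change[:version])
  end
end

puts ApiChangeSketch.in_effect?(:really_big_change, requested_version: "2019-10")    # false
puts ApiChangeSketch.in_effect?(:really_big_change, requested_version: "2020-04")    # true
puts ApiChangeSketch.in_effect?(:experimental_change, requested_version: "unstable") # false, beta flag not enabled
```

The real check presumably reads the requested version from the request context, which is why it takes only the change name; the sketch passes it explicitly to stay self-contained.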
Dealing With Breaking and Non-breaking Changes
As we continue to improve our platform, changes are necessary and can be split into two broad categories: breaking and non-breaking.
Breaking changes are more problematic and require a great deal of planning, care, and go-to-market effort to ensure we support the ecosystem and provide a stable commerce platform for merchants. Ultimately, a breaking change is any change that requires a third-party developer to do migration work to maintain the existing functionality of their application. Some examples of breaking changes are:
- adding a new or modifying an existing validation to an existing resource
- requiring a parameter that wasn’t required before
- changing existing error response codes/messages
- modifying the expected payload of webhooks and async callbacks
- changing the data type of an existing field
- changing supported filtering on existing endpoints
- renaming a field or endpoint
- adding a new feature that will change the meaning of a field
- removing an existing field or endpoint
- changing the URL structure of an existing endpoint.
Teams inside Shopify considering a breaking change conduct an impact analysis. They put themselves into the shoes of a third-party developer using the API and think through the changes that might be required. If there is ambiguity, our developer advocacy team can reach out to our partners to gain additional insight and gauge the impact of proposed changes.
A change is non-breaking, on the other hand, only if it passes our forward compatibility test. Forward compatible changes are those that can be adopted and used by any merchant, without limitation, regardless of whether shops have been migrated or any other conditions have been met.
Forward compatible changes can be freely adopted without worrying about whether there is a new user experience, whether the merchant’s data has been adapted to work with the change, and so on. Teams keep these changes in the unstable API version and, if forward compatibility cannot be met, keep access limited by protecting the change with a beta flag.
Every change is named in the changes frozen record mentioned above, to track and manage the change, and can be referenced by its name, for example,
ApiChange.in_effect?(:really_big_change)
Analyzing the Impact of Breaking Changes
If a proposed change is identified as a breaking change, and there is agreement amongst the stakeholders that it’s necessary, the next step is to enable our teams to figure out just how big the change’s impact is.
Within the core monolith, teams use our API change tooling methods mark_breaking and mark_possibly_breaking to measure the impact of a potential breaking change. These methods work by capturing request metadata and context specific to the breaking code path, then emitting it into our event pipeline, Monorail, which places the events into our data warehouse.
The mark_breaking method is called when the request would break if everything else was kept the same, while mark_possibly_breaking is used when we aren’t sure whether the call would have an adverse effect on the calling application. An example would be the case where a property of the response has been renamed or removed entirely:
ApiChange.mark_breaking(:really_big_change)
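To give a feel for what this kind of instrumentation involves, here is a minimal sketch that records an event each time a marked code path is hit. It is not Shopify’s tooling: ImpactTrackerSketch is a hypothetical module, an in-memory array stands in for the Monorail pipeline, and the context fields (app_id, shop_id, api_version) are assumed for illustration.

```ruby
# Illustrative stand-in only; not Shopify's tooling or the Monorail client.
module ImpactTrackerSketch
  Event = Struct.new(:change, :severity, :app_id, :shop_id, :api_version, keyword_init: true)

  @events = [] # in-memory sink standing in for an event pipeline

  class << self
    attr_reader :events

    # Record a request that would definitely break under the change.
    def mark_breaking(change, context)
      record(change, :breaking, context)
    end

    # Record a request that might break; the server cannot tell for sure.
    def mark_possibly_breaking(change, context)
      record(change, :possibly_breaking, context)
    end

    private

    def record(change, severity, context)
      @events << Event.new(change: change, severity: severity, **context)
    end
  end
end

# Hypothetical request context; the field names are assumptions.
context = { app_id: 42, shop_id: 7, api_version: "2019-10" }
ImpactTrackerSketch.mark_breaking(:really_big_change, context)
ImpactTrackerSketch.mark_possibly_breaking(:really_big_change, context.merge(app_id: 43))

puts ImpactTrackerSketch.events.map(&:severity).inspect
```

Keeping the two severities as separate event types means a later report can distinguish certain breakage from calls that merely need a closer look.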
Once shipped to production, teams can use a prebuilt impact assessment report to see the potential impact of their changes across a number of dimensions.
Measuring and Managing API Adoption
Once the change has shipped as a part of an official API version, we’re able to make use of the data emitted from mark_breaking and mark_possibly_breaking to measure adoption and identify shops and apps that are still at risk. Our teams use the ApiChange.in_effect? method (made available by our API change tooling) to create conditionals and manage support for the old and new behaviour in our API. A trivial example might look something like this:
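As a minimal sketch (not Shopify’s actual code), assume a hypothetical change that renames a product’s title field to name. The change_in_effect? helper and the RELEASED_IN table below are stand-ins for ApiChange.in_effect?, and the payload shape is invented for illustration:

```ruby
# Hypothetical release versions for named changes (assumption).
RELEASED_IN = { really_big_change: "2020-01" }.freeze

# Stand-in for ApiChange.in_effect?. Date-based version names compare
# correctly as plain strings; special versions such as "unstable" are
# not handled in this sketch.
def change_in_effect?(change, requested_version)
  requested_version >= RELEASED_IN.fetch(change)
end

# Serialize a product, exposing `name` instead of `title` only for callers
# on a version that includes the change.
def product_payload(product, requested_version)
  if change_in_effect?(:really_big_change, requested_version)
    { id: product[:id], name: product[:title] }  # new behaviour
  else
    { id: product[:id], title: product[:title] } # old behaviour
  end
end

product = { id: 1, title: "Dog treats" }
p product_payload(product, "2019-10") # old shape, with :title
p product_payload(product, "2020-04") # new shape, with :name
```

Apps pinned to an older version keep seeing the old field until they opt in to a newer version, which is the stability guarantee versioning is meant to provide.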
The ApiChange module and the automated instrumentation it drives allow teams at Shopify to assess the current risk to the platform based on the proportion of API calls still on the breaking path, and assist in communicating these risks to affected developers.
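The proportion itself is simple arithmetic. Here is a small sketch, assuming hypothetical event rows with an app_id and a breaking_path flag; the row shape and the breaking_share_by_app helper are invented for the example and say nothing about how Shopify’s reports are actually built.

```ruby
# Hypothetical event rows, as they might be read back from a warehouse query.
calls = [
  { app_id: 42, breaking_path: true },
  { app_id: 42, breaking_path: false },
  { app_id: 42, breaking_path: true },
  { app_id: 43, breaking_path: false },
]

# Share of each app's calls that still hit the breaking code path.
def breaking_share_by_app(calls)
  calls.group_by { |call| call[:app_id] }.transform_values do |rows|
    rows.count { |call| call[:breaking_path] }.fdiv(rows.size)
  end
end

shares = breaking_share_by_app(calls)
at_risk = shares.select { |_app, share| share > 0 }.keys

puts shares.inspect  # per-app share of calls on the breaking path
puts at_risk.inspect # apps that still need to migrate
```

Tracking this share over time shows whether adoption is progressing and which developers still need a nudge before the old behaviour is removed.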
At Shopify, our ecosystem’s applications depend on the predictable nature of our APIs. The functionality these applications provide can be critical for merchants’ businesses to function correctly on Shopify. To build and maintain trust with our ecosystem, we consider any proposed breaking change thoroughly and gauge the impact of our decisions. By providing the tooling to mark and analyze API calls, we empower teams at Shopify to assess the impact of proposed changes, and build a culture that respects the impact our decisions have on our ecosystem. There are real people out there building software for our merchants, and we want to avoid ever having to ask them to replace all the dogs at once!