In October 2022, Shopify released ShopifyQL Notebooks, a first-party app that lets merchants analyze their shop data to make better decisions. It puts the power of ShopifyQL into merchants’ hands with a guided code editing experience. In order to provide a first-class editing experience, we turned to CodeMirror, a code editor framework built for the web. Out of the box, CodeMirror didn’t have support for ShopifyQL–here’s how we built it.
ShopifyQL is an accessible, commerce-focused querying language used on both the client and server. The language is defined by an ANTLR grammar, which is used to generate code for multiple targets (currently Go and TypeScript). This lets us share the same grammar definition between the client and server despite differences in runtime language. As an added benefit, our types are written in Protobuf so that they can be shared between targets as well.
All the ShopifyQL language features on the front end are encapsulated in a TypeScript language server, which is built on top of the ANTLR TypeScript target. It conforms to Microsoft's Language Server Protocol (LSP) in order to keep a clear separation of concerns between the language server and a code editor. LSP defines the shape of common language features like tokenization, parsing, completion, hover tooltips, and linting.
When code editors and language servers both conform to LSP, they become interoperable because they speak a common language. For more information about LSP, read the VSCode Language Server Extension Guide.
Connecting The ShopifyQL Language Server To CodeMirror
CodeMirror has its own grammar & parser engine called Lezer. Lezer is used within CodeMirror to generate parse trees, and those trees power many of the editor features. Lezer has support for common languages, but no Lezer grammar exists for ShopifyQL. Lezer also doesn’t conform to LSP. Because ShopifyQL’s grammar and language server had already been written in ANTLR, it didn’t make sense to rewrite what we had as a Lezer grammar. Instead, we decided to create an adapter that would conform to LSP and integrate with Lezer. This allowed us to pass a ShopifyQL query to the language server, adapt the response, and return a Lezer parse tree.
Lezer supports creating a tree in one of two ways:
- Manually creating a tree by creating nodes and attaching them in the correct tree shape
- Generating a tree from a buffer of tokens
The ShopifyQL language server can create a stream of tokens from a document, so it made sense to re-shape that stream into a buffer that Lezer understands.
Converting A ShopifyQL Query Into A Lezer Tree
In order to transform a ShopifyQL query into a Lezer parse tree, the following steps occur:
- Lezer initiates the creation of a parse tree. This happens when the document is first loaded and any time the document changes.
- Our custom adapter takes the ShopifyQL query and passes it to the language server.
- The language server returns a stream of tokens that describe the ShopifyQL query.
- The adapter takes those tokens and transforms them into Lezer node types.
- The Lezer node types are used to create a buffer that describes the document.
- The buffer is used to build a Lezer tree.
- Finally, it returns the tree back to Lezer and completes the parse cycle.
Understanding ShopifyQL’s Token Offset
One of the biggest obstacles to transforming the language server’s token stream into a Lezer buffer was the format of the tokens. Within the ShopifyQL Language Server, the tokens come back as integers in chunks of five, and the position of each integer within a chunk has a distinct meaning. In order, the five integers are the line, start character, length, token type, and token modifier.
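To make the chunked format concrete, here is a minimal sketch of splitting such a flat integer array into one record per token. The interface and function names are illustrative assumptions rather than the language server's own types, and the token type ids in the example are placeholders.

```typescript
// Hypothetical record for one chunk of five integers.
interface RelativeToken {
  line: number;          // line value, relative to the previous token
  startChar: number;     // start character value
  length: number;        // token length in characters
  tokenType: number;     // numeric token type
  tokenModifier: number; // numeric token modifier
}

// Split a flat array of integers into one RelativeToken per chunk of five.
function chunkTokens(data: readonly number[]): RelativeToken[] {
  if (data.length % 5 !== 0) {
    throw new Error(`expected a multiple of 5 integers, got ${data.length}`);
  }
  const result: RelativeToken[] = [];
  for (let i = 0; i < data.length; i += 5) {
    const [line = 0, startChar = 0, length = 0, tokenType = 0, tokenModifier = 0] =
      data.slice(i, i + 5);
    result.push({ line, startChar, length, tokenType, tokenModifier });
  }
  return result;
}

// Two made-up tokens; the type ids 7 and 9 are placeholders.
console.log(chunkTokens([1, 0, 4, 7, 0, 0, 5, 13, 9, 0]).length); // 2
```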
In this context, length, token type, and token modifier were fairly straightforward to use. However, the behavior of line and start character was more difficult to understand. Imagine a simple two-line ShopifyQL query whose second line is SHOW product_title. In the stream of tokens for this query, even though product_title is on line 1 (using zero-based indexes), the value for its line integer is zero! This is because the tokenization happens incrementally and each computed offset value is always relative to the previous token. This becomes more confusing when you factor in whitespace. Let’s say that we add five spaces before the word SHOW.
In the tokens for this query, only the start character for SHOW changed! It changed from 0 to 5 after adding five spaces before the SHOW keyword. However, product_title’s values remain unchanged. This is because the values are relative to the previous token, and the space between SHOW and product_title didn’t change.
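The relative encoding can be reproduced with a short sketch. Everything here is an assumption made for illustration: the two-line query is a stand-in, the whitespace splitter is a toy replacement for the real lexer, and the rule that a start character is relative to the previous token only when both tokens share a line is borrowed from the LSP semantic-token convention. Only the first three integers of each chunk (line, start character, length) are produced.

```typescript
// Absolute, zero-based token position as an editor would describe it.
interface AbsoluteToken { line: number; char: number; length: number; }

// Toy tokenizer: every run of non-whitespace characters is one token.
function tokenize(query: string): AbsoluteToken[] {
  const found: AbsoluteToken[] = [];
  query.split("\n").forEach((text, line) => {
    const word = /\S+/g;
    let match: RegExpExecArray | null;
    while ((match = word.exec(text)) !== null) {
      found.push({ line, char: match.index, length: match[0].length });
    }
  });
  return found;
}

// Express each position relative to the previous token.
function encodeRelative(tokens: readonly AbsoluteToken[]): number[][] {
  let prevLine = 0;
  let prevChar = 0;
  return tokens.map(({ line, char, length }) => {
    const deltaLine = line - prevLine;
    // On a new line the start character counts from the line start.
    const deltaChar = deltaLine === 0 ? char - prevChar : char;
    prevLine = line;
    prevChar = char;
    return [deltaLine, deltaChar, length];
  });
}

const original = encodeRelative(tokenize("FROM products\nSHOW product_title"));
const indented = encodeRelative(tokenize("FROM products\n     SHOW product_title"));
console.log(original); // [[0,0,4],[0,5,8],[1,0,4],[0,5,13]]
console.log(indented); // [[0,0,4],[0,5,8],[1,5,4],[0,5,13]]
```

In both runs the chunk for product_title is [0, 5, 13]: its line value is zero because it shares a line with SHOW, and its start character is unaffected by the indentation because it is measured from SHOW.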
This becomes especially confusing when you use certain language features that are parsed out of order. For example, in some ANTLR grammars, comments are not parsed as part of the default channel; they are parsed after everything in the main channel is parsed. Let’s add a comment to the first line. In the resulting stream, the comment’s token comes after the tokens from the main channel. Before the parser parses the comment, it points at product_title, which is two lines after the comment. When the parser finishes with the main channel and begins parsing the channel that contains the comment, the pointer needs to move two lines up to tokenize the comment, hence the value of -2 for the comment’s line integer.
Adapting ShopifyQL’s Token Offset To Work With CodeMirror
CodeMirror treats offset values much more simply than ANTLR does. In CodeMirror, everything is relative to the top of the document, which is treated as one long string of text. This means that newlines and whitespace are meaningful to CodeMirror and affect the start offset of a token.
So to adapt the values from ANTLR to work with CodeMirror, we need to convert each token’s relative line and start character values into absolute start and end offsets measured from the top of the document.
The solution? A custom TokenIterator that could follow the “directions” of the Language Server’s offsets and convert them along the way. The final implementation of this class was fairly simple, but arriving at this solution was the hard part.
At a high level, the TokenIterator:
- Takes in the document and derives the length of each line. This means that trailing whitespace is properly represented.
- Internally tracks the current line and character that the iterator points to.
- Ingests the ANTLR-style line, character, and token length descriptors and moves the current line and character to the appropriate place.
- Uses the current line, current character, and line lengths to compute the CodeMirror-style start offset.
- Uses the start offset combined with the token length to compute the end offset.
Here’s what the code looks like:
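What follows is a minimal sketch reconstructed from the steps listed above, not the production class. The method name next, the OffsetToken shape, the assumption of \n-only line endings, and the rule that a start character is relative to the previous token only when both share a line (the LSP semantic-token convention) are all assumptions. The example document, including its -- comment syntax, is a hypothetical stand-in.

```typescript
// CodeMirror-style offsets, counted from the top of the document.
interface OffsetToken { from: number; to: number; }

class TokenIterator {
  private readonly lineStarts: number[] = [];
  private line = 0; // line the iterator currently points to
  private char = 0; // character the iterator currently points to

  constructor(doc: string) {
    // Derive each line's length (trailing whitespace included) and record
    // the absolute offset at which every line starts.
    let offset = 0;
    for (const text of doc.split("\n")) {
      this.lineStarts.push(offset);
      offset += text.length + 1; // +1 for the newline character
    }
  }

  // Follow one relative instruction and return absolute offsets.
  next(deltaLine: number, deltaChar: number, length: number): OffsetToken {
    if (deltaLine !== 0) {
      this.line += deltaLine; // may be negative for out-of-order tokens
      this.char = deltaChar;  // new line: count from the line start
    } else {
      this.char += deltaChar; // same line: count from the previous token
    }
    const lineStart = this.lineStarts[this.line];
    if (lineStart === undefined) {
      throw new RangeError(`line ${this.line} is outside the document`);
    }
    const from = lineStart + this.char;
    return { from, to: from + length };
  }
}

// Hypothetical three-line document; the comment token arrives last.
const iterator = new TokenIterator("-- note\nFROM products\nSHOW product_title");
const stream = [[1, 0, 4], [0, 5, 8], [1, 0, 4], [0, 5, 13], [-2, 0, 7]];
const offsets = stream.map(([l = 0, c = 0, n = 0]) => iterator.next(l, c, n));
console.log(offsets);
// [{from:8,to:12},{from:13,to:21},{from:22,to:26},{from:27,to:40},{from:0,to:7}]
```

The final chunk carries a line value of -2, which moves the iterator from product_title on line 2 back up to the comment on line 0.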
Building A Parse Tree
Now that we have a clear way to convert an ANTLR token stream into a Lezer buffer, we’re ready to build our tree! To build it, we follow the steps mentioned previously–we take in a ShopifyQL query, use the language server to convert it to a token stream, transform that stream into a buffer of nodes, and then build a tree from that buffer.
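As a rough sketch of the buffer step, the function below flattens tokens that already carry absolute offsets into the array layout that Lezer's Tree.build accepts: four numbers per node (node type id, start offset, end offset, and the number of array slots the node occupies, which is 4 for a leaf). The TypedToken shape, the type ids, and the decision to treat every token as a leaf directly under the top node are assumptions. The tokens are sorted by start offset first, because out-of-order tokens such as comments arrive late in the stream while the buffer has to follow document order.

```typescript
// Hypothetical token shape: a Lezer node type id plus absolute offsets.
interface TypedToken { typeId: number; from: number; to: number; }

// Flatten tokens into [typeId, from, to, size] quadruples in document order.
function toLezerBuffer(tokens: readonly TypedToken[]): number[] {
  const ordered = tokens.slice().sort((a, b) => a.from - b.from);
  const buffer: number[] = [];
  for (const { typeId, from, to } of ordered) {
    buffer.push(typeId, from, to, 4); // a leaf node occupies four slots
  }
  return buffer;
}

// Placeholder type ids: 1 = comment, 2 = keyword, 3 = identifier.
const lezerBuffer = toLezerBuffer([
  { typeId: 2, from: 8, to: 12 },
  { typeId: 3, from: 13, to: 21 },
  { typeId: 2, from: 22, to: 26 },
  { typeId: 3, from: 27, to: 40 },
  { typeId: 1, from: 0, to: 7 }, // the comment, emitted last by the lexer
]);
console.log(lezerBuffer.slice(0, 4)); // [1, 0, 7, 4]
```

In a real integration, the resulting array would be handed to Tree.build from @lezer/common together with a NodeSet and the id of the top node; that call is left out here to keep the sketch free of dependencies.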
Once the parse tree is generated, CodeMirror then “understands” ShopifyQL and provides useful language features such as syntax highlighting.
Providing Additional Language Features
By this point, CodeMirror can talk to the ShopifyQL Language Server and build a parse tree that describes the ShopifyQL code. However, the language server offers other useful features like code completion, linting, and tooltips. As mentioned above, Lezer/CodeMirror doesn’t conform to LSP, but it does offer many plugins that let us provide a connector between our language server and CodeMirror. In order to provide these features, we adapted the language server’s doValidate with CodeMirror’s linting plugin, the language server’s doComplete with CodeMirror’s autocomplete plugin, and the language server’s doHover with CodeMirror’s hover tooltip plugin.
Once we connect those features, our ShopifyQL code editor is fully powered up, and we get an assistive, delightful code editing experience.
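To illustrate what such a connector involves, here is a sketch for the linting case. It assumes that doValidate returns LSP-shaped diagnostics (a line/character range, a numeric severity, and a message), which is not confirmed above, and converts them into objects with the from, to, severity, and message fields that CodeMirror's lint package expects. The interface names are hypothetical, a missing severity is treated as an error, and \n-only line endings are assumed.

```typescript
interface LspPosition { line: number; character: number; }
interface LspDiagnostic {
  range: { start: LspPosition; end: LspPosition };
  severity?: 1 | 2 | 3 | 4; // LSP: 1 error, 2 warning, 3 information, 4 hint
  message: string;
}
// Mirrors the core fields of a CodeMirror lint diagnostic.
interface EditorDiagnostic {
  from: number;
  to: number;
  severity: "error" | "warning" | "info";
  message: string;
}

// Convert a zero-based line/character pair into a document offset.
function positionToOffset(doc: string, pos: LspPosition): number {
  const lines = doc.split("\n");
  let offset = 0;
  for (let i = 0; i < pos.line && i < lines.length; i++) {
    offset += (lines[i] ?? "").length + 1; // +1 for the newline
  }
  return Math.min(offset + pos.character, doc.length);
}

function toSeverity(severity: LspDiagnostic["severity"]): EditorDiagnostic["severity"] {
  if (severity === 2) return "warning";
  if (severity === 3 || severity === 4) return "info";
  return "error"; // assumption: treat a missing severity as an error
}

function toEditorDiagnostics(doc: string, diagnostics: readonly LspDiagnostic[]): EditorDiagnostic[] {
  return diagnostics.map((d): EditorDiagnostic => ({
    from: positionToOffset(doc, d.range.start),
    to: positionToOffset(doc, d.range.end),
    severity: toSeverity(d.severity),
    message: d.message,
  }));
}

// Hypothetical diagnostic covering the word "nope" on the second line.
console.log(toEditorDiagnostics("FROM products\nSHOW nope", [
  { range: { start: { line: 1, character: 5 }, end: { line: 1, character: 9 } }, severity: 1, message: "Unknown column" },
]));
// [{ from: 19, to: 23, severity: "error", message: "Unknown column" }]
```

The completion and hover connectors would follow the same pattern: translate the editor's offset into a line/character position for the language server, then translate the response back into the structure the CodeMirror plugin expects.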
This approach enabled us to provide ShopifyQL features to CodeMirror while continuing to maintain a grammar that serves both client and server. The custom adapter we created allows us to pass a ShopifyQL query to the language server, adapt the response, and return a Lezer parse tree to CodeMirror, making it possible to provide features like syntax highlighting, code completion, linting, and tooltips. Because our solution utilizes CodeMirror’s internal parse tree, we are able to make better decisions in the code and craft a stronger editing experience. The ShopifyQL code editor helps merchants write ShopifyQL and get access to their data in new and delightful ways.
This post was written by Trevor Harmon, a Senior Developer working to make reporting and analytics experiences richer and more informative for merchants. When he isn't writing code, he spends time writing music, volunteering at his church, and hanging out with his wife and daughter. You can find more articles on topics like this one on his blog at thetrevorharmon.com, or follow him on GitHub and Twitter.