Ruby 3.2’s YJIT is Production-Ready

Shopify and YJIT

Back in July 2020, I joined the Ruby & Rails Infrastructure (R&RI) team at Shopify. Our team focuses on making sure that Ruby as well as Ruby on Rails, central to the infrastructure behind all Shopify stores and much of the modern web, run as smoothly and efficiently as possible.

As part of the R&RI team, I got to meet skilled engineers that were doing open source work, directly contributing patches to CRuby itself. Since my background is in compiler design, I started to discuss with my manager the possibility that we could build a relatively simple Just-In-Time (JIT) compiler for Ruby. To my surprise, my manager and two colleagues were immediately on board with this idea, and what would become the YJIT project was born.

Building YJIT was hard work. There were many long, intense debugging sessions involved, but within just over a year, we’d managed to deliver roughly 20% speedups on railsbench. Following that, the CRuby core contributors invited us to upstream YJIT, and so YJIT was released as an official part of Ruby 3.1 in December of 2021. Upstreaming YJIT had been an aspirational goal for the team from the start, but we had never thought it would happen this fast. I’ll take this opportunity to say that I’m very thankful to Shopify for letting us take on some risks, and to the Ruby community for being so open-minded.

Major YJIT Improvements in Ruby 3.2

A lot has happened for YJIT in 2022. For one thing, we’ve expanded the team. We wrote about job openings on the YJIT team on this blog last year, and we were flooded with applications from people excited to work on a Ruby JIT, all of them with impressive CVs and long lists of systems programming skills. We ended up recruiting three skilled engineers who became part of the YJIT dream team. One of these new recruits is none other than Takashi Kokubun, long-time CRuby core member and maintainer of MJIT.

The YJIT team has made multiple improvements to YJIT which are now available as part of Ruby 3.2. The good news is that, as you might expect, the new version of YJIT brings better performance, both on benchmarks and on real workloads, but I would say that the broader theme for 3.2 has been to make YJIT more robust, more maintainable, and generally more production-ready.

Rewriting YJIT to Rust

To start 2022, we decided to port YJIT from C99 to Rust. The motivation for this was twofold. First, Rust provides safety guarantees that C doesn’t, which matters when doing low-level systems programming with many constraints, as in a JIT compiler. Second, we felt that, as YJIT’s complexity grows, we needed better tools to manage that complexity. In C, we had to resort to implementing our own dynamic arrays with C macros, which felt both unsafe and awkward, whereas Rust provides a much richer standard library and many nice, fast abstractions. It took Noah Gibbs, Alan Wu, and me about three months to port YJIT to Rust, and I’m happy to say that the new Rust codebase does feel much easier to maintain.

Improved Memory Usage

One of the challenges with JIT compilers is that they always incur some amount of memory overhead compared to interpreters. At the most basic level, a JIT compiler needs to generate executable machine code, which an interpreter doesn’t, so JIT compilers must use more memory than interpreters. On top of that, JIT compilers also need to allocate memory for auxiliary data structures (metadata), which can add quite a bit of extra overhead.

We were unhappy with how much extra memory YJIT used in Ruby 3.1. We felt that the amount of memory needed back then made it difficult to deploy in production at Shopify, and so we’ve made multiple improvements to reduce memory usage. The good news is that, thanks to the hard work of Alan and Takashi, the overhead has been cut down to approximately one third of what it was for 3.1, which helps make YJIT a lot more usable in production. To achieve this, we’ve optimized how much space our metadata takes, we’ve implemented a garbage collector for machine code that is no longer used, and we’ve made it so YJIT will lazily allocate memory pages for machine code as opposed to allocating and initializing a large block of memory up front.
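
If you’re curious how much memory YJIT’s generated code takes up in your own application, a rough way to peek at it is through RubyVM::YJIT.runtime_stats when running with the --yjit-stats flag. This is just a sketch: the exact stat names vary between Ruby versions, so it simply filters for anything size-related.

```ruby
# Rough sketch: print YJIT's size-related runtime statistics.
# Run with:  ruby --yjit --yjit-stats your_script.rb
# The exact keys vary between Ruby versions, so we just filter for "size".

stats = RubyVM::YJIT.runtime_stats || {}
stats.each do |key, value|
  puts "#{key}: #{value}" if key.to_s.include?("size")
end
```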

Improved Performance

YJIT’s performance vs the interpreter in Ruby 3.2 (higher is better). Image source: speed.yjit.org

YJIT 3.2 doesn’t just use less memory, though; it’s also faster. We now speed up railsbench by about 38% over the interpreter, and this is on top of the Ruby 3.2 interpreter, which is already faster than the interpreter from Ruby 3.1. According to the numbers gathered by Takashi, the cumulative improvement makes YJIT 57% faster than the Ruby 3.1.3 interpreter. And it’s not just our numbers that show that the new YJIT delivers great performance: the Ruby community has done their own benchmarking as well.

Source: @rafael_falco on Twitter.

ARM64 Support

Another major change in YJIT 3.2 is that we now have a new backend that can generate machine code for multiple CPU platforms, which enables us to support ARM64 CPUs. In 3.1, we only supported x86-64 on Mac and Linux. With developers at Shopify migrating to Apple M1/M2 laptops, we found ourselves in the awkward situation where we could only run YJIT locally through emulation with Rosetta. With Ruby 3.2, it’s now possible to run YJIT natively on Apple M1 & M2, AWS Graviton 1 & 2, and even on Raspberry Pis! Interestingly, YJIT gets an even bigger speedup on Mac M1 hardware than it does on Intel x86-64 CPUs. We hope that this will encourage people to try out YJIT locally on their development machines.

Additional Improvements

Ruby 3.2 also includes another major change that has been in the works for a while. Jemma Issroff and Aaron Patterson have done an impressive amount of work in order to reimplement Ruby’s internal representation for objects, which is now based on the concept of object shapes. This allows both the interpreter and YJIT to benefit from faster instance variable accesses.
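
To give a rough sense of what this means for everyday Ruby code (an illustrative sketch, not a walkthrough of the internals): objects that set the same instance variables in the same order end up sharing a shape, which lets instance variable reads be served from inline caches.

```ruby
# Illustrative only: with object shapes, instances that define the same
# instance variables in the same order share a single shape, so reads
# like @x and @y below can be served from an inline cache.

class Point
  def initialize(x, y)
    @x = x  # every Point sets @x first...
    @y = y  # ...then @y, so all Point instances share one shape
  end

  def norm
    Math.sqrt(@x * @x + @y * @y)  # fast, cache-friendly ivar reads
  end
end

# By contrast, setting instance variables conditionally or in varying
# orders produces more shapes and more cache misses:
class Sloppy
  def initialize(flag)
    @a = 1 if flag  # some instances have @a, some don't
    @b = 2
  end
end
```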

In addition to this, Eileen Uchitelle implemented a tool to trace YJIT exits, Jimmy Miller worked on improving YJIT support for various types of Ruby method calls, and Kevin Newton implemented a finer-grained constant cache invalidation mechanism. This change addresses a situation we had seen in production where redefining constants would cause YJIT to recompile a lot of code.
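
To illustrate roughly why this matters (a sketch, not an exact description of the implementation): previously, redefining any constant could invalidate the inline caches for every constant reference in the program, whereas with the finer-grained mechanism only caches for the redefined constant name need to be thrown away.

```ruby
# Illustrative sketch of finer-grained constant cache invalidation.
# Both methods read a constant through an inline cache.

TAX_RATE = 0.15
API_HOST = "example.com"

def tax(amount) = amount * TAX_RATE
def api_url(path) = "https://#{API_HOST}/#{path}"

# Redefining one constant used to invalidate every constant cache,
# forcing YJIT to recompile code that never touched it. With
# per-constant invalidation, only code referencing TAX_RATE is affected.
Object.send(:remove_const, :TAX_RATE)
TAX_RATE = 0.2
```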

Last but not least, Peter Zhu and Matthew Valentine-House have made improvements to Ruby 3.2’s garbage collector and made it possible to allocate variable-sized objects. This improves Ruby’s memory usage and significantly improves the interpreter’s performance, since larger objects can now be allocated in a way that is more cache-friendly.

Running YJIT in Production

The main reason why Shopify chose to invest in the development of YJIT is, of course, that Shopify runs a large amount of infrastructure built on top of Ruby and Ruby on Rails: multiple large clusters of servers distributed across the world, capable of serving over 75 million requests per minute. From the start, the objective was to eventually be able to use YJIT to improve the efficiency of Shopify’s Storefront Renderer (SFR).

Given that YJIT in Ruby 3.1 had significant memory overhead and was still marked as experimental, we didn’t want to deploy it globally right away. However, about a year ago, we started to run a few SFR nodes using YJIT. This has been extremely valuable to us, because it’s enabled us to gather statistics and see how YJIT and our codebase behave under a real-world deployment with real traffic, which has exposed some performance issues we couldn’t see on benchmarks.

This year, with Ruby 3.2, YJIT has improved enough that we’ve deemed it production-ready, and Shopify has proceeded to deploy it globally on its entire SFR infrastructure. We’re able to measure real speedups ranging from 5% to 10% (depending on time of day) on our total end-to-end request completion time measurements.

YJIT speedup over the Ruby 3.2 interpreter on our SFR deployment.

I want to be honest and say that YJIT is still not perfect. It still has some memory overhead, but we think it’s worth the speedups, and of course, we intend to improve the situation further. One of YJIT’s key advantages is its very fast compilation times. At Shopify, we deploy continuously, often multiple times every day, sometimes multiple times in a single hour. That means YJIT has to be able to compile code very quickly; otherwise, some Shopify customers might see their requests time out whenever a deployment occurs. It’s not just the speed of the code we compile that matters, it’s also how fast we can compile that code.

We’ve successfully deployed YJIT in production at Shopify, but the YJIT team has relatively little visibility into how many people are using YJIT in practice outside of interacting with people on Twitter or at conferences. If you’re using YJIT in production, for your dev environment, or even for a hobby project, please let us know and share your feedback! We’d love to hear your YJIT success stories (or pain points, for that matter).

Future Plans

The year 2023 has just begun and we already have a long list of new improvements we want to bring to YJIT. Since we’ve just deployed YJIT, I think it’s important that we remain grounded and use statistics from our real-world deployment to address the biggest pain points. YJIT’s biggest flaw is still its memory footprint, and this is something we will keep working to improve.

In terms of the biggest opportunities for speedups, Ruby is method calls all the way down. That is, loop iteration as well as most basic operations in Ruby are method calls, and typical Ruby code contains many calls to small Ruby methods. As such, the most obvious area for potential improvements would be to make method calls faster. There are a few avenues we're exploring to achieve this, such as potentially implementing a more efficient calling convention, and also inlining method calls.
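
To make the “method calls all the way down” point concrete, here is a tiny example: even a simple loop like the one below involves a method call (plus a block invocation) per iteration, and the arithmetic inside it is method calls too.

```ruby
# Even this innocuous-looking snippet is method calls all the way down:
total = 0
[1, 2, 3].each do |n|  # Array#each, plus one block invocation per element
  total += n * n       # Integer#* and Integer#+ are method calls as well
end
puts total
```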

In addition to optimizing the performance of method calls, we’d also like to better optimize the machine code that YJIT generates. We still don’t have a proper register allocator, and we don’t really optimize across basic blocks. Finally, we may also want to optimize the way YJIT and CRuby perform various hash and string operations, as these are very common in web workloads.

More About YJIT

If you’re interested in trying out Ruby 3.2, the release notes and tarball packages can be found here. It’s also possible to install Ruby 3.2 directly via brew if you’re on macOS, or with the ruby-install tool. To make sure that YJIT is available, you just need to have rustc 1.58.0 or newer (the Rust toolchain) installed on your machine before you install/build Ruby using your favorite tool (brew, ruby-build, ruby-install, etc.). You can then run Ruby with YJIT enabled by passing the --yjit command-line flag to Ruby, or by setting the RUBY_YJIT_ENABLE environment variable.
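
Once Ruby 3.2 is installed, a quick sanity check that YJIT is actually active looks like this (RUBY_DESCRIPTION and RubyVM::YJIT.enabled? ship with CRuby; the file name is just a placeholder):

```ruby
# Quick check that YJIT is enabled.
# Run with:  ruby --yjit check_yjit.rb
# or:        RUBY_YJIT_ENABLE=1 ruby check_yjit.rb

puts RUBY_DESCRIPTION                                  # includes "+YJIT" when the JIT is on
puts defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?   # => true when YJIT is compiling code
```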

For more information on YJIT’s design or how to use it, you can check out our documentation, or one of the resources below.

I’d like to conclude with a big thank you to the YJIT team, and everyone that has contributed to this project’s success, including: Alan Wu, Aaron Patterson, Jemma Issroff, Eileen Uchitelle, Kevin Newton, Noah Gibbs, Jimmy Miller, Takashi Kokubun, Ufuk Kayserilioglu, Mike Dalessio, Jean Boussier, John Hawthorn, Rafael França, and more!

Maxime Chevalier-Boisvert obtained a PhD in compiler design at the University of Montreal in 2016, where she developed Basic Block Versioning (BBV), a JIT compiler architecture optimized for dynamically-typed programming languages. She leads the YJIT project at Shopify.

Open source software plays a vital and integral part at Shopify. If being a part of an Engineering organization that’s committed to the support and stewardship of open source software sounds exciting to you, visit our Engineering career page to find out about our open positions and learn about Digital by Design.