GitHub Does My Operations Homework: A Ruby Speed Story

Hey, folks! Some of you may remember me from Rails Ruby Bench, Rebuilding Rails, or a lot of writing about Ruby 3 and Ruby/Rails performance. I’ve joined Shopify, on the YJIT team. YJIT is an open-source Just-In-Time Compiler we’re building as a fork of CRuby. By the time you read this, it should be merged into prerelease CRuby in plenty of time for this year’s Christmas Ruby release.

I’ve built a benchmarking harness for YJIT and a big site of graphs and benchmarks, updated twice a day.

I’d love to tell you more about that. But before I do, I know you’re asking, How fast is YJIT? That’s why the top of our new public page of YJIT results looks like this:

Overall, YJIT is 20.6% faster than interpreted CRuby, or 17.4% faster than MJIT! On Railsbench specifically, YJIT is 18.5% faster than CRuby, 21.0% faster than MJIT!

After that, there are lots of graphs and, if you click through, giant tables of data. I love giant tables of data.

And I hate doing constant ops work. So let’s talk about a low-effort way to make GitHub do all your operational work for a constantly-updated website, shall we?

I’ll talk benchmarks along the way, because I am still me.

By the way, I work on the YJIT team at Shopify. We’re building a Ruby optimizer to make Ruby faster for everybody. That means I’ll be writing about it. If you want to keep up, this blog has a subscribe thing (down below). Fancy, right?

The Bare Necessities

I’ve built a few YJIT benchmarks. So have a lot of other folks. We grabbed some existing public benchmarks and custom-built others. The benchmarks are all open-source so please, have a look. If there’s a type of benchmark you wish we had, we take pull requests!

When I joined the YJIT team, that repo had a perfectly serviceable runner script that would run benchmarks and print the results to console. (That script still exists, but isn’t used as much anymore.) But I wanted to compare speed between different Ruby configurations and do more reporting. Also, where do all those reports get stored? That’s where my laziness kicked in.

GitHub Pages is a great way to have GitHub host your website for free. A custom Jekyll config is a great way to get full control of the HTML. Once we had results to post, I could just commit them to Git, push them, and let GitHub take care of the rest.

But Jekyll won’t keep it all up to date. That needs GitHub Actions. Between them, the final result is benchmarks run automatically, the site updates automatically, and it won’t email me unless something fails.

Perfect.

Want to see the gritty details?

Setting up Jekyll

GitHub Pages runs on Jekyll. You can use something else, but then you have to run it on every commit. If you use Jekyll, GitHub runs it for you and tells you when things break. But you’d like to customise how Jekyll runs and test locally with bundle exec jekyll serve. So you need to set up _config.yml in a way that makes all that happen. GitHub has a pretty good setup guide for that. And here’s _config.yml for speed.yjit.org.

Of course, customising the CSS is hard when it’s in a theme. You need to copy all the parts of the theme into your local repo, like I did, if you want to change how they work (like not supporting <b> for bold and requiring <strong>, I’m looking at you, Slate).

But once you have that set up, GitHub will happily build for you. And it’s easy! No problem! Nothing can go wrong!

Oh, uh, I should mention, maybe, hypothetically, there might be something you want to put in more than one place. Like, say, a graph that can go on the front page and on a details page, something like that. You might be interested to know that Jekyll requires anything you include to live under _includes or the current subdirectory, so you have to generate your graph in there. Jekyll makes it really hard to get around the “has to be under _includes” rule. And once you’ve put the file under _includes, if you want to put it onto a page with its own URL, you should probably research Jekyll collections. And an item in a collection gets one page, not one page per graph… Basically, your continuous reporting code, like mine, is going to need to know more about Jekyll than you might wish.

A snippet of Jekyll _config.yml that adds a collection of benchmark objects which should be output as individual pages
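
Here’s a minimal sketch of the reporting side of that, assuming a collection called benchmarks that _config.yml declares with output set to true. It writes one graph fragment under _includes and one collection item that pulls the fragment in. The helper name write_benchmark_page and the directory names (_includes/reports and _benchmarks) are made up for illustration. They aren’t necessarily what speed.yjit.org uses.

```ruby
require "fileutils"
require "yaml"

# Hypothetical helper: writes one graph fragment under _includes and one
# collection item that includes it. All paths and names are illustrative.
def write_benchmark_page(site_root, bench_name, svg_markup)
  # Jekyll only includes files from _includes (or the current directory),
  # so the generated graph has to land there.
  include_rel = File.join("reports", "#{bench_name}.svg")
  include_path = File.join(site_root, "_includes", include_rel)
  FileUtils.mkdir_p(File.dirname(include_path))
  File.write(include_path, svg_markup)

  # A collection item is a file with YAML front matter, stored in a
  # directory named after the collection with a leading underscore.
  front_matter = { "layout" => "default", "title" => bench_name }
  page_path = File.join(site_root, "_benchmarks", "#{bench_name}.html")
  FileUtils.mkdir_p(File.dirname(page_path))
  File.write(page_path, <<~PAGE)
    #{front_matter.to_yaml.strip}
    ---
    {% include #{include_rel} %}
  PAGE
  page_path
end
```

Each call produces one page per benchmark. If a page needs several graphs, the item would carry several include lines rather than becoming several pages.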

But once you’ve set Jekyll up, you can have your reporting code run the benchmarks, and then you have nice up-to-date data files. You’ll have to generate graphs and reports there too. You can pre-run jekyll build to see if everything looks good. And as a side benefit, since you’re going to need to give that code GitHub credentials to check in its data files, you can have it tell you if the performance of any benchmark drops too much.

AWS and GitHub Actions, a Match Made In… Somewhere

GitHub Actions are pretty straightforward, and you can set one to run regularly, like a cron job. So I did that. And it works with barely a hiccup! It was easy! Nothing could go wrong.

Of course, if you’re benchmarking, you don’t want to run your benchmarks in GitHub Actions. You want to do it where you can control the performance of the machine it runs on. Like an AWS instance! Nothing could go wrong.

I just needed to set up some repo secrets for logging into the AWS instance. Like a private key, passed in an environment variable and put into an SSH identity file, that really has to end with a newline or everything breaks. But it’s fine. Nothing could go wrong!
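
A minimal sketch of the key-handling step, assuming the private key arrives in an environment variable. The helper name write_identity_file, the variable name BENCHMARK_SSH_KEY, and the file path are all hypothetical.

```ruby
require "fileutils"

# Hypothetical helper: copies a private key (for example, from a repo
# secret exposed as an environment variable) into an SSH identity file.
def write_identity_file(key_text, path)
  # Per the text above, the key has to end with a newline. Secrets can
  # lose their trailing newline along the way, so put it back if needed.
  key_text = "#{key_text}\n" unless key_text.end_with?("\n")
  FileUtils.mkdir_p(File.dirname(path))
  # Create the file owner-only from the start, then chmod in case the
  # file already existed with looser permissions.
  File.open(path, "w", 0o600) { |f| f.write(key_text) }
  File.chmod(0o600, path)
  path
end

# Usage sketch (variable name and path are assumptions):
#   write_identity_file(ENV.fetch("BENCHMARK_SSH_KEY"),
#                       File.expand_path("~/.ssh/benchmark_id"))
```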

Hey, did you know that SSH remembers the host SSH key from any previous time you SSH’d there? And that GitHub Actions uses a shared known_hosts file for those keys? And AWS re-uses old public IP addresses? So there’s actually a pretty good chance GitHub Actions will refuse to SSH to your AWS instance unless you tell it -oStrictHostKeyChecking=no. Also, SSH doesn’t usually pass environment variables through, so you’re going to need to assign them on its command line.
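
Both workarounds fit in one small command builder. This is a sketch, not our actual script: the helper name ssh_command, the host, the variable name, and the remote script are all made up.

```ruby
require "shellwords"

# Hypothetical helper: builds an ssh argument list that skips host-key
# checking and passes environment variables by assigning them at the
# start of the remote command line.
def ssh_command(host:, identity_file:, env:, remote_command:)
  # Escape each value so spaces and quotes survive the remote shell.
  assignments = env.map { |name, value| "#{name}=#{Shellwords.escape(value)}" }
  remote = (assignments + [remote_command]).join(" ")
  [
    "ssh",
    "-i", identity_file,
    "-oStrictHostKeyChecking=no", # gives up host verification
    host,
    remote
  ]
end

# Usage sketch (host, variable, and script names are assumptions):
#   cmd = ssh_command(host: "ubuntu@bench.example.com",
#                     identity_file: "benchmark_id",
#                     env: { "BENCH_TOKEN" => ENV.fetch("BENCH_TOKEN") },
#                     remote_command: "./run_benchmarks.sh")
#   system(*cmd) or abort("remote benchmark run failed")
```

Passing the argument list to system as separate strings keeps the local shell out of the picture, so only the remote shell ever parses the assignments.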

So, I mean, okay, maybe something could go wrong.

If you want to SSH into an AWS instance from GitHub Actions, you may want to steal our code, is what I’m saying.

For the Love of Graphs

Of course, none of this gets you those lovely graphs. We all want graphs, right? How does that work? Any way you want, of course. But we did a couple of things you might want to look at.


A line graph of how four benchmarks’ results have changed over time, with ‘whiskers’ at each point to show the uncertainty of the measurement.

For the big performance over time graph on the front page, I generated a D3.js graph from Erb. If you’ve used Rails, generating HTML and JS from Ruby should sound pretty reasonable. I’ve had good luck with it for several different projects. D3 is great for auto-generating your X and Y axes, even on small graphs, and there’s lots of great example code out there.
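
To make that concrete, here’s a small sketch of the idea, not the real speed.yjit.org template. It assumes the page already loads D3 (version 4 or later), that each data point is a hash with a numeric t and a time in ms, and that a plain linear scale is good enough for the time axis. The names TIMELINE_TEMPLATE and render_timeline are invented for the example.

```ruby
require "erb"
require "json"

# Illustrative template: the element id, data shape, and chart size are
# assumptions. The surrounding page is expected to load D3 itself.
TIMELINE_TEMPLATE = ERB.new(<<~HTML)
  <svg id="<%= chart_id %>" width="600" height="300"></svg>
  <script>
  {
    const data = <%= JSON.generate(points) %>;
    const svg = d3.select("#<%= chart_id %>");
    const x = d3.scaleLinear()
      .domain(d3.extent(data, d => d.t)).range([40, 580]);
    const y = d3.scaleLinear()
      .domain([0, d3.max(data, d => d.ms)]).range([260, 20]);
    // D3 generates both axes, ticks included, from the scales.
    svg.append("g").attr("transform", "translate(0,260)").call(d3.axisBottom(x));
    svg.append("g").attr("transform", "translate(40,0)").call(d3.axisLeft(y));
    svg.append("path")
      .datum(data)
      .attr("fill", "none")
      .attr("stroke", "steelblue")
      .attr("d", d3.line().x(d => x(d.t)).y(d => y(d.ms)));
  }
  </script>
HTML

# Ruby supplies the data as JSON; the browser does the drawing.
def render_timeline(chart_id, points)
  TIMELINE_TEMPLATE.result_with_hash(chart_id: chart_id, points: points)
end
```

Calling render_timeline("timeline", [{ t: 1, ms: 120.5 }, { t: 2, ms: 118.0 }]) returns an HTML fragment that a reporting script could write under _includes.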

If you want to embed your results, you can generate static SVGs from Ruby. That takes more code, and you’ll probably have more trouble with finicky bits like the X and Y axes or the legend. Embeddable graphs are hard in general since you can’t really use CSS and everything has to be styled inline, plus you don’t know the styling for the containing page. Avoid it if you can, frankly, or use an iframe to embed. But it’s nice that it’s an option.
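
For a feel of what “more code” means, here’s a bare-bones sketch of a static bar chart with every style set inline. The helper name bar_chart_svg, the sizes, and the colour are assumptions, and it draws no axes or legend at all, which are exactly the finicky bits. It also assumes at least one value is greater than zero.

```ruby
require "cgi"

# Hypothetical helper: renders {name => value} as a bar chart. Every
# attribute is inline because an embedded SVG can't rely on page CSS.
def bar_chart_svg(values, width: 400, height: 200, bar_color: "#4477aa")
  max = values.values.max.to_f
  slot = width.to_f / values.size # horizontal space per bar
  label_space = 20                # pixels reserved under the bars
  bars = values.each_with_index.map do |(name, value), i|
    bar_h = (height - label_space) * value / max
    left = i * slot
    <<~SVG
      <rect x="#{(left + slot * 0.1).round(1)}" y="#{(height - label_space - bar_h).round(1)}" width="#{(slot * 0.8).round(1)}" height="#{bar_h.round(1)}" fill="#{bar_color}"/>
      <text x="#{(left + slot / 2).round(1)}" y="#{height - 5}" font-family="sans-serif" font-size="12" text-anchor="middle">#{CGI.escapeHTML(name)}</text>
    SVG
  end
  <<~SVG
    <svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}" viewBox="0 0 #{width} #{height}">
    #{bars.join}</svg>
  SVG
end
```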


A large bar graph of benchmark results with simpler axis markings and labels.

Both SVG approaches, D3 and raw SVG, allow you to do fun things with JavaScript like mouseover (we do that on speed.yjit.org) or hiding and showing benchmarks dynamically (like we do on the timeline deep-dive). I wouldn’t try that for embeddable graphs, since they need more JavaScript that may not run inside a random page. It’s more enjoyable to implement interesting features with D3 instead of raw SVG.


A blocky, larger-font bar graph generated using matplotlib.

If fixed-sized images work for you, matplotlib also works great. We don’t currently use that for speed.yjit.org, but we have for other YJIT projects.

Reporting Isn’t Just Graphs

Although it saddens my withered heart, reporting isn’t just generating pretty graphs and giant, imposing tables. You also need a certain amount of English text designed to be read by “human beings.”

That big block up top that says how fast YJIT is? It’s generated from an Erb template, of course. It’s a report, just like the graphs underneath it. In fact, even the way we watch for performance drops is calculated from two JSON files that are both generated as reports. Each tripwire report is just a list of how fast every benchmark was at a specific time, and an issue gets filed automatically if any benchmark’s speed drops too much between the two.
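
A tripwire comparison along those lines can be sketched in a few lines of Ruby. The report shape (benchmark name mapped to mean seconds per iteration), the 5% threshold, and the names slowdowns and load_report are assumptions for illustration. The real reports and thresholds may differ.

```ruby
require "json"

# Assumed report shape: {"benchmark_name" => mean_seconds_per_iteration}.
# Returns {name => fractional slowdown} for every benchmark that got
# slower by more than max_slowdown. The 5% default is illustrative.
def slowdowns(previous, current, max_slowdown: 0.05)
  current.each_with_object({}) do |(name, seconds), found|
    before = previous[name]
    # A benchmark missing from the older report has nothing to compare to.
    next if before.nil? || before <= 0
    change = (seconds - before) / before.to_f
    found[name] = change if change > max_slowdown
  end
end

def load_report(path)
  JSON.parse(File.read(path))
end

# Usage sketch (file names are made up):
#   drops = slowdowns(load_report("tripwire_old.json"),
#                     load_report("tripwire_new.json"))
#   drops.each { |name, pct| puts "#{name} is #{(pct * 100).round(1)}% slower" }
```

A non-empty result is the point where an automated job would file an issue.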

So What’s the Upshot?

There’s a lot of text up there. Here’s what I hope you take away:

GitHub Actions and GitHub Pages do a lot for you if you’re running a batch-updated dynamic site. There are a few weird subtleties, and it helps to copy somebody else’s code where you can.

YJIT is pretty fast. Watch this space for more YJIT content in the future. You can subscribe below.

Graphs are awesome. Obviously.

Noah Gibbs wrote the ebook Rebuilding Rails and then a lot about how fast Ruby is at various tasks. Despite being a grumpy old programmer in Inverness, Scotland, Noah believes that some day, somehow, there will be a second game as good as Stuart Smith’s Adventure Construction Set for the Apple IIe. Follow Noah on Twitter and GitHub.

