Merchants can sell their products on Shopify in many ways. In addition to their own website, merchants can integrate with some of the biggest social media channels on the internet. Over 140,000 merchants participate in our Facebook channel, where they can display and sell their products on Facebook.
We launched the Facebook channel in 2015 with just 5,000 merchants. As the number of participating merchants grew, so did the need to scale the service. Recently, we rolled out an optimization to this channel app that enabled us to horizontally scale the backend of our app while keeping synchronization between Facebook and Shopify fast.
We had to do this to ensure we could keep up with the growth of Shopify merchants, now numbering over 500,000. We learned a lot from diagnosing what we needed for the future, and in this post I want to describe how we achieved horizontal scalability.
Our sync system has two distinct parts.
At the front of this pipeline, we accept all product update requests, in the form of webhooks, and mark the associated products as needing to be updated on Facebook.
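To make that concrete, here's a minimal sketch of what the intake step could look like. The controller, the `SyncStatus` model, and `mark_for_sync!` are illustrative names, not our actual code:

```ruby
# Hypothetical intake endpoint: a product update webhook arrives and
# we only record that the product needs a Facebook sync. SyncStatus
# and mark_for_sync! are illustrative stand-ins.
class ProductWebhooksController < ApplicationController
  def update
    # Shopify webhooks identify the shop via this request header.
    shop = Shop.find_by!(domain: request.headers['X-Shopify-Shop-Domain'])

    # Cheap write now; the heavy lifting happens later in the back
    # part of the pipeline.
    SyncStatus.mark_for_sync!(shop: shop, product_id: params[:id])

    head :ok
  end
end
```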
The back part of the pipeline does all the heavy lifting. It pulls the latest product data from the Shopify API, checks whether the update affects how the product is presented on Facebook, and validates the product's data against various validation rules. If any of these checks fail, the product is not synced to Facebook, and the merchant is notified through the UI.
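A sketch of that flow might look like the following. The helpers (`facebook_relevant?`, `validation_errors`, `notify_merchant`, `push_to_facebook`) are hypothetical stand-ins for the real app's internals:

```ruby
# Illustrative sketch of the back part of the pipeline; helper
# methods are assumptions, not the channel app's actual code.
class ProductSync
  def perform(shop, product_id)
    # Pull the latest product data from the Shopify API (assumes an
    # API session for this shop is already active).
    product = ShopifyAPI::Product.find(product_id)

    # Ignore updates that don't change how the product is presented
    # on Facebook (e.g. an inventory-only change).
    return unless facebook_relevant?(shop, product)

    errors = validation_errors(product)
    if errors.any?
      # Never push an invalid product; surface the errors in the UI.
      notify_merchant(shop, product, errors)
    else
      push_to_facebook(shop, product)
    end
  end
end
```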
The team focused its attention on the serial nature of the back part of the pipeline. When our app found a shop's products that needed syncing, it would take the first 100 products from this data set and attempt to synchronize them serially. Because individual product syncs aren't cross-dependent, the most obvious solution was to parallelize the sync operation at both the shop and the product level.
Part of this work was already in place. We run a scheduler process that enqueues a certain set of jobs at regular intervals, ranging from every few minutes to once a day. The shop synchronization scheduled job is configured to retrieve 1,000 shops at random from the list of all shops with products needing sync, and then enqueue a job per shop that attempts to sync that shop's product set. We decided to replace that serial product sync path with a concurrent one to speed things up considerably.
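The scheduled fan-out could be sketched like this. The job class names and the `with_products_needing_sync` scope are illustrative:

```ruby
# Sketch of the scheduled per-shop fan-out. The scheduler runs this
# at a fixed interval; it samples 1,000 shops that have products
# waiting and enqueues one per-shop sync job for each.
class ShopSyncSchedulerJob
  include Sidekiq::Worker

  def perform
    # with_products_needing_sync is an assumed scope over shops that
    # have at least one product flagged for sync.
    shop_ids = Shop.with_products_needing_sync.pluck(:id).sample(1_000)
    shop_ids.each { |shop_id| ShopSyncJob.perform_async(shop_id) }
  end
end
```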
To manage the lifetime of concurrent product sync jobs, we leveraged Sidekiq Pro's batching functionality. The per-shop sync job was changed to act as a supervisor that enqueues a batch of product sync jobs. These product sync jobs run concurrently, and the supervisor waits for them all to complete. We required this behavior because the supervisor job holds a lock on the shop's domain, ensuring that we do not create race conditions at the product record level.
After the individual product sync jobs complete, the supervisor checks to see if there are any more products that need syncing for its shop. If so, the supervisor re-enqueues a copy of itself and completes. This immediate re-enqueue greatly increases the utilization of our sync workers, as we are no longer waiting for the scheduler to select this shop on its next polling interval.
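Putting that together, a supervisor built on Sidekiq Pro's `Sidekiq::Batch` API might look like the sketch below. The `with_sync_lock` helper and `products_needing_sync` scope are assumptions standing in for our actual implementation, and the polling wait is one way to hold the lock until the batch finishes:

```ruby
# Hypothetical supervisor using Sidekiq Pro's batch API: it holds a
# per-shop lock, fans out concurrent product sync jobs, waits for the
# batch, then immediately re-enqueues itself if work remains.
class ShopSyncJob
  include Sidekiq::Worker

  BATCH_SIZE = 100

  def perform(shop_id)
    shop = Shop.find(shop_id)

    shop.with_sync_lock do # lock keyed on the shop's domain (assumed helper)
      ids = shop.products_needing_sync.limit(BATCH_SIZE).pluck(:id)
      return if ids.empty?

      # Fan out one concurrent job per product.
      batch = Sidekiq::Batch.new
      batch.description = "Product sync for #{shop.domain}"
      batch.jobs do
        ids.each { |id| ProductSyncJob.perform_async(shop.id, id) }
      end

      # Hold the lock until every product job in the batch has run,
      # so concurrent supervisors can't race on product records.
      sleep(1) until Sidekiq::Batch::Status.new(batch.bid).complete?
    end

    # Immediate re-enqueue: if the shop still has products waiting,
    # don't wait for the scheduler's next polling interval.
    ShopSyncJob.perform_async(shop_id) if shop.products_needing_sync.exists?
  end
end
```

The key design point is that the supervisor, not the scheduler, decides when more work exists for its shop, which keeps workers busy as long as there is a backlog.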
Concurrent product sync combined with immediate re-enqueueing returned impressive results. Prior to the change, the backlog of products needing sync grew by 1,000 products per hour on average. After the full rollout, the system stays ahead most of the time, shrinking the backlog by 2,000 products per hour. Most importantly, syncing is now horizontally scalable: we can increase product sync throughput simply by adding more servers, which was not possible with serial product sync.
Finding ways to horizontally scale applications written in Ruby can be tricky, but it is possible. Architecting Rails apps so that the majority of intensive work is offloaded to background jobs opens up the possibility of leveraging the wealth of libraries in the Ruby community to make apps run fast in highly concurrent environments.