The Evolution of Kit: Automating Marketing Using Machine Learning

For many Shopify business owners, whether they’ve just started their entrepreneurial journey or already run an established business, marketing is essential to building an audience and driving sales to their stores. At Shopify, we offer various tools to help them with marketing. One of them is Kit, a virtual employee that can create Instagram and Facebook ads, automate email marketing, and offer many other skills through powerful app integrations. I’ll talk about the engineering decisions my team made to transform Kit from a rule-based system into an artificially intelligent assistant that looks at a business owner’s products, visitors, and customers to make informed recommendations for the next best marketing move.

As a virtual assistant, Kit interacts with business owners through messages over various interfaces including Shopify Ping and SMS. Designing the user experience for messaging is challenging, especially when creating marketing campaigns that can involve multiple steps, such as picking products, choosing an audience, and selecting a budget. Kit not only builds a user experience that reduces the friction business owners face in creating ads, but also goes a step further and helps them create more effective ads through marketing recommendations.

Simplifying Marketing Using Heuristic Rules

Marketing can be daunting, especially when the number of different configurations in the Facebook Ads Manager can easily overwhelm its users.

Facebook Ads Manager screenshot

There is a long list of settings that need configuring, including objective, budget, schedule, audience, ad format, and creative. For many business owners who are new to marketing, understanding all these concepts is already time-consuming, let alone making the correct decision at every decision point in order to create an effective marketing campaign.

Kit simplifies the flow by only asking for the necessary information and configuring the rest behind the scenes. The following is a typical flow showing how a business owner interacts with Kit to start a Facebook ad.

Screenshot of the conversation flow on how a business owner interacts with Kit to start a Facebook ad

Kit simplifies the workflow into two steps: 1) pick products as ad creative and 2) choose a budget. We use heuristic rules based on our domain knowledge and give business owners limited options to guide them through the workflow. For products, we identify several popular categories that they want to market. For budget, we offer a specific range based on the spending behavior of the business owners we want to help. For the rest of the configuration, Kit defaults to best practices, removing the need to make decisions that require expertise.
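The rule-based flow can be pictured as a small lookup: the owner answers two questions and every other setting comes from fixed defaults. The following is a minimal sketch, not Kit's actual code. The category names, budget options, default values, and the `build_campaign` helper are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the article does not list the real options.
PRODUCT_CATEGORIES = ("new products", "best sellers", "discounted products")
BUDGET_OPTIONS_USD = (10, 20, 30, 50)

# Settings the owner is never asked about. These "best practice" defaults
# are placeholders, not Kit's real configuration.
DEFAULTS = {
    "objective": "conversions",
    "schedule_days": 7,
    "ad_format": "carousel",
    "audience": "lookalike",
}


@dataclass(frozen=True)
class AdCampaign:
    product_category: str
    budget_usd: int
    objective: str
    schedule_days: int
    ad_format: str
    audience: str


def build_campaign(product_category: str, budget_usd: int) -> AdCampaign:
    """Build a full campaign from the only two answers the owner gives."""
    if product_category not in PRODUCT_CATEGORIES:
        raise ValueError(f"unknown product category: {product_category!r}")
    if budget_usd not in BUDGET_OPTIONS_USD:
        raise ValueError(f"budget must be one of {BUDGET_OPTIONS_USD}")
    # Everything else is filled in from the fixed defaults.
    return AdCampaign(product_category, budget_usd, **DEFAULTS)


print(build_campaign("new products", 20))
```

Because the options are constants in the code, every business owner sees the same choices regardless of the state of their store.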

The first version of Kit was a standalone application that communicated with Shopify to extract information such as orders and products to make product suggestions and interacted with different messaging channels to deliver recommendations conversationally.

System interaction diagram for heuristic rules based recommendation. There are two major systems that Kit interacts with: Shopify for product suggestions; messaging channels for communication with business owners

Building Machine Learning Driven Recommendation

One major limitation of the heuristic rules-based implementation is that the budget range is hardcoded into the application, so every business owner chooses from the same options. A static list of budgets may not fit their needs; more established business owners with store traffic and sales may want to spend more. In addition, for the many business owners without much marketing experience, it’s tough to choose the amount that generates the optimal return.

Kit strives to automate marketing by reducing steps when creating campaigns. We found that budgeting is one of the most impactful criteria in contributing to successful campaigns. By eliminating the budget decision from the configuration flow, we reduced the friction for business owners to get started. In addition, we eliminated the first step of picking products by generating a proactive recommendation for a specific category such as new products. With both changes, Kit can generate a recommendation similar to the following:

Screenshot of a one-step marketing recommendation

To generate this recommendation, there are two major decisions Kit has to make:

  1. How much is the business owner willing to spend?
  2. Given the current state of the business owner, will the budget be enough for them to successfully generate sales?

Kit decided that business owner Cheryl should spend about $40 for the best chance of making sales, given the new products marketing opportunity. From a data science perspective, the decision breaks down into two types of machine learning problems:

  1. Regression: given a business owner’s historic spending behavior, predict the budget range that they’re likely to spend.
  2. Classification: given the budget a business owner has with store attributes such as existing traffic and sales that can measure the state of their stores, predict the likelihood of making sales.
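One plausible way to combine the two models is to let the regression output bound the search and let the classifier pick a budget inside that range. The sketch below assumes this combination; the article does not spell out how Kit merges the two predictions. Both model functions are hypothetical stand-ins with made-up coefficients, and `recommend_budget`, `min_probability`, and `step` are illustrative names.

```python
import math
from typing import Optional, Tuple


def predict_budget_range(avg_spend_30d: float) -> Tuple[float, float]:
    """Stand-in for the regression model: a range around past spend."""
    return (0.5 * avg_spend_30d, 1.5 * avg_spend_30d)


def predict_sale_probability(budget: float, visitors_30d: int, orders_30d: int) -> float:
    """Stand-in for the classifier: a logistic function with made-up weights."""
    score = -3.0 + 0.05 * budget + 0.002 * visitors_30d + 0.1 * orders_30d
    return 1.0 / (1.0 + math.exp(-score))


def recommend_budget(
    avg_spend_30d: float,
    visitors_30d: int,
    orders_30d: int,
    min_probability: float = 0.5,
    step: float = 5.0,
) -> Optional[float]:
    """Return the smallest budget in the predicted range that is likely to
    produce sales, or None if no budget in the range clears the bar."""
    low, high = predict_budget_range(avg_spend_30d)
    budget = math.ceil(low / step) * step  # first multiple of `step` at or above `low`
    while budget <= high:
        if predict_sale_probability(budget, visitors_30d, orders_30d) >= min_probability:
            return budget
        budget += step
    return None


print(recommend_budget(avg_spend_30d=40.0, visitors_30d=300, orders_30d=2))
print(recommend_budget(avg_spend_30d=10.0, visitors_30d=0, orders_30d=0))
```

Returning `None` when no budget in the range is likely to succeed leaves room for a different recommendation, rather than suggesting an amount the owner is unlikely to spend.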

The heuristic rules-based system allowed Kit to collect enough data to make solving these machine learning problems possible. Using what we learned from that data, Kit can generate actionable marketing recommendations that give business owners the best chance of making sales based on their budget range and the state of their stores.

The second version of Kit was its first major engineering revision: it implemented proactive marketing recommendations in the app using a machine learning architecture on Google Cloud Platform:

Flow diagram on generating proactive machine learning driven recommendation in Kit

There are two distinct flows in this architecture:

Training flow: Training is responsible for building the regression and classification models that are used in the prediction flow.

  1. Aggregate all relevant features. This includes the historic Facebook marketing campaigns created by business owners through Shopify, and the store state (e.g. traffic and sales) at the time when they create the marketing campaign.
  2. Perform feature engineering, a process that uses domain knowledge to extract useful features from the source data to train the machine learning models. For historic marketing features, we derive features such as past 30 days average ad spend and past 30 days marketing sales. For shop state features, we derive features such as past 30 days unique visitors and past 30 days total orders. We take advantage of Apache Spark’s distributed computation capability to tackle the large-scale Shopify dataset.
  3. Train the machine learning models using Google Cloud’s ML Engine. ML Engine allows us to train models using various popular frameworks including scikit-learn and TensorFlow.
  4. Monitor the model metrics. Model metrics evaluate the performance of a machine learning model by comparing its predicted values against the ground truth. Monitoring validates the integrity of feature engineering and model training by comparing the current model metrics against their historic values. The source features in Step 1 can sometimes be broken, leading to inaccurate feature engineering results. Even when the feature pipeline is intact, the underlying data distribution can change due to unexpected new user behavior, leading to deteriorating model performance. A monitoring process keeps track of historic metrics and ensures the model performs as expected before it’s made available for use. We employed two monitoring strategies: 1) threshold: alert when a model metric is beyond a defined threshold; 2) outlier detection: alert when a model metric deviates from its normal distribution. We use the z-score to detect outliers.
  5. Persist the models for the prediction flow.
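The two monitoring strategies in step 4 can be sketched in a few lines. This is an illustrative example only: it assumes a higher-is-better metric (such as AUC), a z-score cutoff of 3, and made-up metric values. The function names and the `minimum` and `max_abs_z` parameters are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def threshold_alert(metric: float, minimum: float) -> bool:
    """Threshold strategy: alert when the metric falls below a fixed floor."""
    return metric < minimum


def zscore_alert(metric: float, history: Sequence[float], max_abs_z: float = 3.0) -> bool:
    """Outlier strategy: alert when the metric is far from its historic mean,
    measured in standard deviations (the z-score)."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    sigma = stdev(history)
    if sigma == 0:
        return metric != history[0]  # any change from a constant history is unusual
    z = (metric - mean(history)) / sigma
    return abs(z) > max_abs_z


def check_model(
    metric: float, history: Sequence[float], minimum: float, max_abs_z: float = 3.0
) -> List[str]:
    """Run both checks and return the names of the alerts that fired."""
    alerts = []
    if threshold_alert(metric, minimum):
        alerts.append("threshold")
    if zscore_alert(metric, history, max_abs_z):
        alerts.append("outlier")
    return alerts


# Made-up metric values from earlier training runs.
history = [0.81, 0.80, 0.82, 0.79, 0.81]
for new_metric in (0.80, 0.76, 0.70):
    print(new_metric, check_model(new_metric, history, minimum=0.75))
```

The two checks complement each other: a metric can stay above the fixed floor and still be a clear outlier against its own history, which is often the first sign of a broken upstream feature.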

Prediction flow: Prediction is responsible for generating the marketing recommendation by optimizing for the budget and determining whether or not the ad will generate sales given the existing store state.

  1. Generate marketing recommendations by making predictions using the features and models prepared in the training flow.
  2. Send recommendations to Kit through Apache Kafka.
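Both flows depend on the features derived in step 2 of the training flow. The sketch below illustrates only the 30-day windowing in plain Python; the article's pipeline runs on Apache Spark, and the `DailyStats` record, the feature names, and the choice to average ad spend per day are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, NamedTuple

WINDOW_DAYS = 30


class DailyStats(NamedTuple):
    """Hypothetical per-shop daily record."""
    day: date
    ad_spend: float
    marketing_sales: float
    unique_visitors: int
    orders: int


def features_as_of(rows: Iterable[DailyStats], as_of: date) -> Dict[str, float]:
    """Aggregate the 30 days before `as_of`. The `as_of` day itself is excluded
    so a feature never includes data from after the campaign was created."""
    start = as_of - timedelta(days=WINDOW_DAYS)
    window = [r for r in rows if start <= r.day < as_of]
    return {
        "avg_ad_spend_30d": sum(r.ad_spend for r in window) / WINDOW_DAYS,
        "marketing_sales_30d": sum(r.marketing_sales for r in window),
        # Simplification: summing daily uniques can count a returning visitor twice.
        "unique_visitors_30d": sum(r.unique_visitors for r in window),
        "total_orders_30d": sum(r.orders for r in window),
    }


rows = [
    DailyStats(date(2019, 1, 15), 99.0, 500.0, 900, 9),  # outside the window
    DailyStats(date(2019, 2, 27), 10.0, 50.0, 40, 2),
    DailyStats(date(2019, 2, 28), 20.0, 0.0, 60, 1),
    DailyStats(date(2019, 3, 1), 30.0, 70.0, 80, 3),  # the as_of day, excluded
]
print(features_as_of(rows, as_of=date(2019, 3, 1)))
```

Computing features "as of" the campaign's creation date mirrors the article's point that the store state is captured at the time the campaign is created.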

At Shopify, we have a data platform engineering team to maintain the data services required to implement both the training and prediction flows. This allows the product team to focus on building the domain-specific machine learning pipelines, prove product value, and iterate quickly.

Moving to Real Time Prediction Architecture

Looking back at our example featuring business owner Cheryl, Kit decided that she can spend $40 for the best chance of making sales. In marketing, making sales is often not the first step in a business owner’s journey, especially when they don’t have any existing traffic to their store. Acquiring new visitors to the store in order to build lookalike audiences that are more relevant to the business owner is a crucial step to expand the audience size in order to create more successful marketing campaigns afterward. For this type of business owner, Kit evaluates the budget based on a different goal and suggests a more appropriate amount to acquire enough new visitors in order to build the lookalike audience. This is how the recommendation looks:

Screenshot of a recommendation to build lookalike audience

To generate this recommendation, there are three major decisions Kit has to make:

  1. How many new visitors does the business owner need in order to create lookalike audiences?
  2. How much are they willing to spend?
  3. Given the current state of the business owner, will the budget be enough for them to acquire those visitors?

Decisions two and three are solved using the same machine learning architecture described previously. However, this recommendation adds a new complexity: decision one needs to determine the number of new visitors required to build lookalike audiences. Since the traffic to a store can change in real time, the prediction flow needs to process the request at the time the recommendation is delivered to the business owner.

One major limitation of the Spark-based prediction flow is that recommendations are generated in batches rather than on demand, i.e., the prediction flow is triggered from Spark on a schedule rather than from Kit at the time the recommendation is delivered to business owners. With the Spark batch setting, the budget recommendation may already be stale by the time it’s delivered to the business owner. To solve that problem, we built a real time prediction service to replace the Spark prediction flow.
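A small worked example shows why staleness matters for this recommendation. It assumes the number of visitors still needed is simply the gap between a required audience size and the store's current visitor count; the article does not give Kit's actual rule, and the numbers and the `visitors_needed` helper are made up.

```python
def visitors_needed(current_visitors: int, required_audience_size: int) -> int:
    """Visitors still missing before a lookalike audience can be built."""
    return max(0, required_audience_size - current_visitors)


REQUIRED_AUDIENCE_SIZE = 1000  # hypothetical requirement

batch_snapshot = 200  # visitor count when the scheduled batch job ran
live_count = 650      # visitor count when Kit delivers the recommendation

stale_target = visitors_needed(batch_snapshot, REQUIRED_AUDIENCE_SIZE)
fresh_target = visitors_needed(live_count, REQUIRED_AUDIENCE_SIZE)
print(stale_target, fresh_target)
```

In this example a budget sized for the stale target would pay for hundreds of visitors the store has already acquired, which is the kind of mismatch an on-demand prediction avoids.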

Flow diagram on generating real time recommendation in Kit

One major distinction from the previous Spark-based prediction flow is that Kit itself calls the real time prediction service to generate the recommendation.

  1. Based on the business owner’s store state, Kit decides that their marketing objective should be building lookalike audiences. Kit sends a request to the prediction service to generate a budget recommendation; the request reaches an HTTP API exposed by the web container component.
  2. Similar to the batch prediction flow in Spark, the web container generates marketing recommendations by making predictions using the features and models prepared in the training flow. However, there are several design considerations:
    1. We need to ensure efficient access to the features to minimize prediction request latency. Therefore, once features are generated during the feature engineering stage, they are immediately loaded into a key value store using Google Cloud’s Bigtable.
    2. Model prediction can be computationally expensive, especially when the model architecture is complex. We use Google’s TensorFlow Serving, a flexible, high-performance serving system for machine learning models that’s designed for production environments. TensorFlow Serving also integrates with TensorFlow models out of the box, so it can directly consume the models generated by the training flow with minimal configuration.
    3. Since the heaviest CPU/GPU-bound model prediction operations are delegated to TensorFlow Serving, the web container remains a lightweight application that holds the business logic to generate recommendations. We chose Tornado as the Python web framework. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections for model predictions.
  3. Model predictions are delegated to the TensorFlow Serving container.
  4. The TensorFlow Serving container preloads the machine learning models generated during the training flow and uses them to perform model predictions on request.
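The request path above can be sketched with the standard library alone. In this illustrative example, plain `asyncio` stands in for Tornado's non-blocking I/O, a dictionary stands in for Bigtable, and `fake_model_server` stands in for the TensorFlow Serving container. The request body uses the `{"instances": [...]}` row format of TensorFlow Serving's REST predict API; whether Kit uses REST or gRPC is not stated in the article, so that choice is an assumption, as are the shop ID, the feature values, and all function names.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, List

# Stand-in for the key-value feature store: shop id -> precomputed features.
FEATURE_STORE: Dict[str, List[float]] = {
    "shop-123": [12.5, 340.0, 4.0],
}


def build_predict_request(features: List[float]) -> str:
    """JSON body in the row format of TensorFlow Serving's REST predict API."""
    return json.dumps({"instances": [features]})


async def fake_model_server(body: str) -> str:
    """Stand-in for the model server. It returns twice the first feature
    so the output is predictable; a real model would run here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    instances = json.loads(body)["instances"]
    return json.dumps({"predictions": [[row[0] * 2] for row in instances]})


async def recommend(
    shop_id: str,
    call_model: Callable[[str], Awaitable[str]] = fake_model_server,
) -> Dict[str, Any]:
    """Web-container logic: look up features, delegate prediction, build the reply."""
    features = FEATURE_STORE.get(shop_id)
    if features is None:
        return {"shop_id": shop_id, "error": "no features available"}
    response = await call_model(build_predict_request(features))
    budget = json.loads(response)["predictions"][0][0]
    return {
        "shop_id": shop_id,
        "objective": "build_lookalike_audience",
        "budget_usd": budget,
    }


print(asyncio.run(recommend("shop-123")))
```

Passing the model call in as a parameter keeps the business logic testable without a running model server, and the `await` marks the point where the web container is free to handle other requests while the prediction is in flight.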

Powering One Third of All Kit Marketing Campaigns

Kit started as a heuristic rules-based application that uses common best practices to simplify and automate marketing for Shopify’s business owners. We progressively improved the user experience by building machine learning driven recommendations to further reduce user friction and to optimize budgets, giving business owners a higher chance of creating a successful campaign. By first using a well-established Spark-based prediction process (that’s well supported within Shopify), we showed the value of machine learning in driving user engagement and marketing results. This also allowed us to focus on productionalizing an end-to-end machine learning pipeline with both training and prediction flows that serve tens of thousands of business owners.

We learned that having a proper monitoring component in place is crucial to ensure the integrity of the overall machine learning system. We moved to a real time prediction architecture to solve use cases that required time-sensitive recommendations. Although the real time prediction service introduced two additional containers (web and TensorFlow Serving) to maintain, we delegated the heaviest model prediction work to TensorFlow Serving, which is well supported by Google and integrates easily with Shopify’s existing cloud infrastructure. This ease of use allowed us to focus on defining and implementing the core business logic to generate marketing recommendations in the web container.

Moving to a machine learning driven implementation has proven valuable. One third of the marketing campaigns in Kit are powered by machine learning driven recommendations. Kit will continue to improve its marketing automation skills by optimizing for different marketing tactics and objectives in order to support business owners’ diverse needs.

