Winning AI4TSP: Solving the Travelling Salesperson Problem with Self-programming Machines

Running a business requires making a lot of decisions. To be competitive, those decisions have to be good. There are two complications, though:

  1. Some problems are computationally very hard to solve.
  2. In reality, we are dealing with uncertainty, so we do not even know what exact problem setting we should optimize for.

The AI4TSP Competition fosters research on the intersection of optimization (addressing the first issue of efficient computation for hard problems) and artificial intelligence (addressing the second issue of handling uncertainty). Shopify optimization expert Meinolf Sellmann collaborated with his former colleagues Tapan Shah at GE Research, Kevin Tierney, Fynn Schmitt-Ulms, and Andre Hottung from the University of Bielefeld to compete and win first prize in both tracks of the competition. The type of problem studied in this competition matters to Shopify as the optimization of our fulfillment system requires making decisions based on estimated data.

The Travelling Salesperson Problem

The AI4TSP Competition focuses on the Travelling Salesperson Problem (TSP), one of the most studied routing problems in the optimization community. The task is to determine the order to visit a given set of locations, starting from, and returning to, a given home location, so that the total travel time is minimized. In its original form, the travel times between all locations are known upfront. In the competition, these times weren’t known in advance but were sampled according to a probability distribution. The objective was to visit as many locations as possible within a given period of time, whereby each location was associated with a specific reward. To complicate matters further, the locations visited on the tour had to be reached within fixed time windows. Arriving too early meant having to wait until the location opened; arriving too late incurred a penalty.
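The scoring rules just described can be sketched in code. The following is a minimal, illustrative evaluator for one sampled travel-time scenario; all names, and the exact reward and penalty rules, are assumptions for illustration rather than the competition's official scoring:

```python
def evaluate_tour(tour, travel_time, rewards, windows, t_max, penalty=1.0):
    """Score one tour under a single sampled travel-time scenario.

    tour:        sequence of locations, starting and ending at home (index 0)
    travel_time: travel_time[i][j] is the sampled time from i to j
    rewards:     reward collected at each location
    windows:     (open, close) time window per location
    t_max:       total time budget for the tour
    """
    t, score = 0.0, 0.0
    for prev, loc in zip(tour, tour[1:]):
        t += travel_time[prev][loc]
        open_t, close_t = windows[loc]
        if t < open_t:            # arriving early: wait until the window opens
            t = open_t
        if t > close_t:           # arriving late: pay a penalty, collect no reward
            score -= penalty
        elif t <= t_max:          # on time and within budget: collect the reward
            score += rewards[loc]
    return score
```

Averaging this score over many sampled travel-time matrices gives an estimate of how a fixed tour performs in expectation.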

Two solutions to the same TSP instance (home location in black)

When travel times are known, the problem looks innocent enough. However, consider this: the number of possible tours grows faster than exponentially; for a problem instance with n locations it is n! = 1 * 2 * 3 * … * n (n factorial). Even if we could:

  1. evaluate, in parallel, one potential tour for every atom in the universe,
  2. have each atomic processor evaluate one tour per Planck time (the shortest unit of time in which anything can be measured), and
  3. run that computer from the Big Bang until today,

we still wouldn’t have enumerated all the solutions for a single TSP instance with just 91 locations. The biggest problems at the competition had 200 locations, with over 10^374 potential solutions.
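The back-of-the-envelope claim above can be checked directly. The constants below are rough, order-of-magnitude estimates; in particular we take a low-end figure of 10^78 atoms in the observable universe (published estimates range up to roughly 10^82), so the exact crossover point depends on the assumptions:

```python
import math

ATOMS = 10**78                                   # atoms in the observable universe (low-end estimate)
PLANCK_TIME = 5.39e-44                           # seconds
AGE_OF_UNIVERSE = 13.8e9 * 365.25 * 24 * 3600    # seconds since the Big Bang

# one tour evaluated per atom per Planck time, running for the age of the universe
evaluations = ATOMS * (AGE_OF_UNIVERSE / PLANCK_TIME)
tours_91 = math.factorial(91)                    # possible orderings of 91 locations

print(f"evaluations possible: ~1e{math.floor(math.log10(evaluations))}")
print(f"tours to enumerate:   ~1e{math.floor(math.log10(tours_91))}")
assert tours_91 > evaluations                    # 91! already exceeds the machine's capacity
```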

The Competition Tracks

The competition consisted of two tracks. In the first, participants had to determine a tour for a given TSP instance that would work well in expectation, averaged over all possible travel time scenarios. A tour had to be chosen upfront, and participants had to stick to it no matter how the travel times turned out during execution. The results were then averaged ten times over 100,000 scenarios to determine the winner.

Final Comparison of Front Runners in Track 1 (Shopify and Friends’ Team “Convexers”)

In the second track, participants were allowed to build the tour on the fly. At every location, they could choose which location to visit next, taking into account how much time had already elapsed. The policy that determined how to choose the next location was evaluated on 100 travel time realizations for each of 1,000 different TSP instances to determine the winner.
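To make the idea of an on-the-fly policy concrete, here is a deliberately naive baseline, not the team's learned policy (which came out of deep reinforcement learning, as described below): from the current location, greedily pick the still-reachable unvisited location with the best reward per unit of travel time. All names and parameters are illustrative assumptions.

```python
def greedy_next(current, unvisited, t_elapsed, travel_time, rewards, windows, t_max):
    """Pick the next location greedily by reward per unit of travel time,
    skipping locations whose time window (or the overall budget) we'd miss."""
    best, best_ratio = None, 0.0
    for loc in unvisited:
        arrival = t_elapsed + travel_time[current][loc]
        open_t, close_t = windows[loc]
        arrival = max(arrival, open_t)          # wait if we arrive early
        if arrival > close_t or arrival > t_max:
            continue                            # can't serve this location in time
        ratio = rewards[loc] / max(travel_time[current][loc], 1e-9)
        if ratio > best_ratio:
            best, best_ratio = loc, ratio
    return best                                 # None means: head home
```

A learned policy replaces this hand-made rule with one tuned on many sampled scenarios, which is what makes it robust to the travel-time uncertainty.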

Optimal Decisions Under Uncertainty

For hard problems like the TSP, optimization requires searching. This search can be systematic, whereby we search in such a manner that we can efficiently keep a record of the solutions that have already been looked at. Alternatively, we can search heuristically, which generally refers to search methods that work non-systematically and may revisit the same candidate solution multiple times during the search. This is a drawback of heuristic search, but it offers much more flexibility: the search controller can opportunistically guide where to go next and isn’t bound to exploring spaces that fit neatly into an existing search record. However, we need to deploy techniques that allow the search to escape local regions of attraction, so that we don’t explore the same basin over and over.
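To make that distinction concrete, here is a minimal heuristic-search sketch for the deterministic TSP: a 2-opt local search that runs until it reaches a local optimum, wrapped in random perturbations so the search can escape its current basin of attraction. This is a generic iterated local search for illustration only, not the dialectic search or reinforcement learning methods the team actually used.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour (returning to the start)."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def two_opt(tour, dist):
    """Improve a tour by reversing segments until no single reversal
    helps any more, i.e. until a local optimum is reached."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

def iterated_local_search(dist, restarts=20, seed=0):
    """Escape local optima by perturbing the best tour (swapping two
    random cities) and re-running the local search from there."""
    rng = random.Random(seed)
    n = len(dist)
    best = two_opt(list(range(n)), dist)
    for _ in range(restarts):
        cand = best[:]
        i, j = sorted(rng.sample(range(n), 2))
        cand[i], cand[j] = cand[j], cand[i]     # perturbation step
        cand = two_opt(cand, dist)
        if tour_length(cand, dist) < tour_length(best, dist):
            best = cand
    return best
```

The perturbation step is what keeps the search from revisiting the same basin over and over; without it, every restart of 2-opt from the same tour would return the same local optimum.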

For the solution to both tracks, Shopify and friends used heuristic search, albeit in two very different flavors. For the first track, the team applied a search paradigm called dialectic search. For the second track, they used what’s known in machine learning as deep reinforcement learning.

The Age of Self-Programming Machines

Key to making both approaches work is to allow the machine to learn from prior experience and to adjust the program automatically. If the ongoing machine learning revolution had to be summarized in just one sentence, it would be:

If, for a given task, we fail to develop algorithms with sufficient performance, then shift the focus to building an algorithm that can build this algorithm for us, automatically.

A recent prominent example where this revolution has led to success is AlphaFold, DeepMind’s self-taught algorithm for predicting the 3D structure of proteins. Humans tried to build algorithms that could predict this structure for decades, but were unable to reach sufficient accuracy to be practically useful. The same was demonstrated for tasks like machine vision, playing board games, and optimization. At another international programming competition, the MaxSAT Evaluation 2016, Meinolf and his team entered a self-tuned dialectic search approach which won four out of nine categories and ten medals overall. 

These examples show that machine-generated algorithms can vastly outperform human-generated approaches. Particularly when problems become hard to conceptualize in a concise theory and hunches or guesses must be made during the execution of the algorithm, allowing the machine to learn and improve based on prior experience is the modern way to go.

Meinolf Sellmann, Director for Network Optimization at Shopify, is best known for algorithmic research, with a special focus on self-improving algorithms, automatic algorithm configuration and algorithm portfolios based on artificial intelligence, combinatorial optimization, and the hybridization thereof. He received a doctorate degree (Dr. rer. nat.) from Paderborn University (Germany). Prior to this he was Technical Operations Leader for machine learning and knowledge discovery at General Electric, senior manager for data curation at IBM, Assistant Professor at Brown University, and Postdoctoral Scholar at Cornell University.
His honors include the Prize of the Faculty of the University of Paderborn (Germany) for his doctoral thesis, an NSF Career Award in 2007, over 20 Gold Medals at international SAT and MaxSAT Competitions, and first places at both tracks of the 2021 AI for TSP Competition. Meinolf has also been invited as keynote speaker and program chair of many international conferences like AAAI, IAAI, Informs, LION and CP.


Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Default.