One of the biggest challenges most managers face (in any industry) is trying to assign their reports work in an efficient and effective way. But as data science leaders—especially those in an embedded model—we’re often faced with managing teams with responsibilities that traverse multiple areas of a business. This juggling act often involves different streams of work, areas of specialization, and stakeholders. For instance, my team serves five product areas, plus two business areas. Without a strategy for dealing with these stakeholders and related areas of work, we risk operational inefficiency and chaotic outcomes.

There are many frameworks out there that suggest the optimal way to structure a team for success. Below, we’ll review these frameworks and their positives and negatives when applied to a data science team. We’ll also share the framework that’s worked best for empowering our data science teams to drive impact.

An example of the number of product and business areas my data team supports at Shopify

First, Some Guiding Principles

Before looking at frameworks for managing these complex team structures, I’ll first describe some effective guiding principles we should use when organizing workflows and teams:

Efficiency: Any structure must provide an ability to get work done in an efficient and effective manner.
Influence: Structures must be created in such a way that your data science team continues to have influence on business and product strategies. Data scientists often have input that is critical to business and product success, and we want to create an environment where that input can be given and received.
Stakeholder clarity: We need to create a structure where stakeholders know exactly who to contact to get work done and who to ask for help and advice.
Stability: Some team structures can create instability for reports, which leads to a whole host of other problems.
Growth: If we create structures where reports only deal with stakeholders and reactive issues, it may be difficult for them to develop professionally. We want to ensure reports have time to tackle work that enables them to acquire a depth of knowledge in specific areas.
Flexibility: Life happens. People quit, need change, or move on. Our team structures need to recognize that change is inevitable and be able to absorb it.

Traditional Frameworks for Organizing Data Teams

Alright, now let’s look at some of the more popular frameworks used to organize data teams. While they’re not the only ways to structure teams and align work, these frameworks cover most of the major aspects in organizational strategy.

Swim Lanes

You’ve likely heard of this framework before, and maybe even cringed when someone has told you or your report to "stay in your swim lanes". This framework involves assigning someone to very strictly defined areas of responsibility. Looking at the product and business areas my own team supports as an example, we have seven different groups to support. According to the swim lane framework, I would assign one data scientist to each group. Each data scientist would work only within their assigned product or business group, and their work would never cross lanes.

In this framework, there's little expected help or cross-training, and everyone is allowed to operate within their own fiefdom. I once worked in an environment like this. We were a group of tenured data scientists who didn’t really know what the others were doing. It worked for a while, but when change occurred (new projects, resignations, retirements) it all seemed to fall apart.

Let’s look at this framework’s advantages:

Distinct areas of responsibility. In this framework, everyone has their own area of responsibility. As a manager, I know exactly who to assign work to and where certain tasks should go on our board. I can be somewhat removed from the process of workload balancing.
High levels of individual ownership. Reports own an area of responsibility and have a stake in its success. They also know that their reputation and job are on the line for the success or failure of that area.
The point of contact is obvious to stakeholders. Ownership is very clear to stakeholders, so they always know who to go to. This model also fosters long-term relationships.

And the disadvantages:

Lack of cross-training. Individual reports will have very little knowledge of the work or codebase of their peers. This becomes an issue when life happens and we need to react to change.
Reports can be left on an island. Reports can be left alone, which tends to matter more when times are tough. This is a problem both for new reports who are trying to onboard and learn new systems, and for tenured reports who may suddenly endure a higher workload. Help may not be coming.
Fails under high-change environments. For the reasons mentioned above, this system fails under high-change environments. It also creates a team-level rigidity that means when general organizational changes happen, it’s difficult to react and pivot.

Referring back to our guiding principles when considering how to effectively organize a data team, this framework hits our stakeholder clarity and efficiency principles, but only in stable environments. Swim lanes often fail in conditions of change or when the team needs to pivot to new responsibilities, which is something most teams should expect.

Stochastic Process

As data scientists, we often study stochastic processes, and this framework resembles one. As a refresher, a stochastic process describes a system that evolves randomly over time. Here, the randomness lies in the assignment: which area or category a report works on next is close to random.

Likewise, in this framework each report takes the next project that pops up, resembling a random assignment of work. However, projects are prioritized and when an employee finishes one project, they take on the next, highest priority project.
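For readers who think in code, the assignment rule described above can be sketched as a simple simulation. This is only an illustration under assumptions of my own: the function name `assign_next_available`, the project tuples, and the durations are all hypothetical, and a lower priority number is taken to mean more urgent.

```python
import heapq


def assign_next_available(projects, reports):
    """Simulate 'whoever frees up next takes the highest-priority project'.

    projects: list of (priority, name, duration) tuples.
              Assumption: a lower priority number means more urgent.
    reports:  list of report names.
    Returns a list of (report, project_name, start_time) in assignment order.
    """
    backlog = list(projects)
    heapq.heapify(backlog)  # most urgent project sits at the top

    # Heap of (time_free, report): everyone is free at time 0.
    free_at = [(0, name) for name in reports]
    heapq.heapify(free_at)

    schedule = []
    while backlog:
        _priority, project, duration = heapq.heappop(backlog)
        time_free, report = heapq.heappop(free_at)  # next person to free up
        schedule.append((report, project, time_free))
        heapq.heappush(free_at, (time_free + duration, report))
    return schedule


# Hypothetical backlog and team, for illustration only.
projects = [
    (2, "churn model", 3),
    (1, "pricing test", 2),
    (3, "dashboard", 1),
    (4, "forecast", 2),
]
schedule = assign_next_available(projects, ["A", "B"])
for report, project, start in schedule:
    print(f"t={start}: {report} starts {project}")
```

Notice that nothing in the rule looks at who worked on what before, which is exactly why reports end up hopping between unrelated projects.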

This may sound overly random as a system, but I’ve worked on a team like this before. We were a newly set up team, and no one had any specific experience with any of the work we were doing. The system worked well for about six months, but over the course of a year, we felt like we'd been put through the wringer and as though no one had any deep knowledge of what we were working on.

The advantages of this framework are:

High levels of team collaboration. Everyone is constantly working on each other’s code and projects, so a high level of collaboration tends to develop.
Reports feel like there is always help. Since the next available person always picks up the highest-priority work, anyone struggling with a high-priority task can simply ask the next available teammate for help.
Extremely flexible under high levels of change. Your organization decides to reorg to align to new areas of the business? No problem! You weren’t aligned to any specific groups of stakeholders to begin with. Someone quits? Again, no problem. Just hire someone new and get them into the rotation.

And the disadvantages:

Can feel like whiplash. As reports are asked to move constantly from one unrelated project to the next, they can develop feelings of instability and uncertainty (aka whiplash). Additionally, as stakeholders work with a new resource on each project, this can limit the ability to develop rapport.
Inability to go deep on specialized subject matters. It’s often advantageous for data scientists to dive deep into one area of the business or product. This enables them to develop deep subject area knowledge in order to build better models. If we’re expecting them to move from project to project, this is unlikely to occur.
Extremely high management inputs. As data scientists become more like cogs in a wheel in this type of framework, management ends up owning most stakeholder relationships and business knowledge. This increases demands on individual managers.

Looking at the advantages and disadvantages of this framework, and measuring them against our guiding principles, this framework only hits two of our principles: flexibility and efficiency. While this framework can work in very specific circumstances (like brand new teams), the lack of stakeholder clarity, relationship building, and growth opportunity will result in the failure of this framework to sufficiently serve the needs of the team and stakeholders.

A New Framework: Diamond Defense

Luckily, we’ve created a third way to organize data teams and work. I like to compare this framework to the concept of diamond defense in basketball. In diamond defense, players have general areas (zones) of responsibility. However, once play starts, the defense focuses on trapping (sending extra resources) to the toughest problems, while helping out areas in the defense that might be left with fewer resources than needed.

This same defense method can be used to structure data teams to be highly effective. In this framework, you loosely assign reports to your product or business areas, but make sure to rotate resources to tough projects and wherever help is needed.

Referring back to the product and business areas my team supports, you can see how I use this framework to organize my team:

An example of how I use the diamond defense framework to structure my data team and align them to zones of work

Each data scientist is assigned to a zone. I then aligned our additional business areas (Finance and Marketing) to a product group, and assigned resources to these groupings. Finance and Marketing are aligned differently here because they are not supported by a team of Software Engineers. Instead, I aligned them to the product group that most closely resembles their work in terms of data accessed and models built. Currently, Marketing has the highest number of requests for our team, so I added more resources to support this group.

You’ll notice on the chart that I keep myself and an additional data scientist in a bullpen. This is key to the diamond defense as it ensures we always have additional resources to help out where needed. Let’s dive into some examples of how we may use resources in the bullpen:

DS2 is under-utilized. We simultaneously find out that DS1 is overwhelmed by the work of their product area, so we tap DS2 to help out.
SR DS1 quits. In this case, we rotate DS4 into their place, and proceed to hire a backfill.
SR DS2 takes a leave of absence. In this situation, I as the manager slide in to manage SR DS2’s stakeholders. I would then tap DS4 to help out, while the intern who is also assigned to the same area continues to focus on getting their work done with help from DS4.
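The rotations above can be sketched as a small data structure: zones with loosely assigned reports, plus a bullpen to draw from. This is a minimal sketch rather than how we actually track assignments. The `Team` class, its method names, and the zone names "Zone A" and "Zone B" are hypothetical; only the report labels come from the examples above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Team:
    # Zone name -> reports loosely assigned to that zone.
    zones: Dict[str, List[str]]
    # People held back to help wherever they are needed.
    bullpen: List[str] = field(default_factory=list)

    def send_help(self, zone: str) -> Optional[str]:
        """Move the first person in the bullpen into a zone that needs help."""
        if not self.bullpen:
            return None  # nobody left to rotate in
        helper = self.bullpen.pop(0)
        self.zones[zone].append(helper)
        return helper

    def handle_departure(self, report: str) -> Optional[str]:
        """Remove a departing report and cover their zone from the bullpen."""
        for zone, members in self.zones.items():
            if report in members:
                members.remove(report)
                return self.send_help(zone)
        return None  # report was not assigned to any zone


# Hypothetical layout, for illustration only.
team = Team(
    zones={"Zone A": ["SR DS1"], "Zone B": ["DS1"]},
    bullpen=["DS4", "Manager"],
)

backfill = team.handle_departure("SR DS1")  # SR DS1 quits, DS4 rotates in
extra = team.send_help("Zone B")            # DS1 is overwhelmed, manager helps
print(backfill, extra, team.zones, team.bullpen)
```

Once the bullpen is empty, `send_help` returns `None`, which is the code-level version of the over-rotation risk: with no slack left, every new problem has to be covered by someone who already has a zone.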

This framework has several advantages:

Everyone has dedicated areas to cover and specialize in. As each report is loosely assigned to a zone (specific product or business area), they can go deep and develop specialized skills.
Able to quickly jump on problems that pop up. Loose assignment to zones gives teams the flexibility to move resources to the highest-priority areas or toughest problems.
Reports can get the help they need. If a report is struggling with the workload, you can immediately send more resources towards that person to lighten their load.

And the disadvantages:

Over-rotation. In certain high-change circumstances, a situation can develop where data scientists spend most of their time covering for other people. This can create very volatile and high-risk situations, including turnover.

This framework hits all of our guiding principles. It provides the flexibility and stability needed when dealing with change, and it enables teams to tackle problems efficiently. Focus areas enable report growth and stakeholder clarity, and the relationships between reports and their stakeholders improve the team's ability to influence policies and outcomes.

Conclusion

There are many ways to organize data teams around different business or product areas, stakeholders, and bodies of work. While the traditional frameworks we discussed above can work in the short term, they tend to over-focus either on rigid areas of responsibility or on everyone being able to take on any project.

If you use one of these frameworks and you’re noticing that your team isn’t working as effectively as you know they can, give our diamond defense framework a try. This hybrid framework addresses all the gaps of the traditional frameworks, and ensures:

Reports have focus areas and growth opportunity
Stakeholders have clarity on who to go to
Resources are available to handle any change
Your data team is set up for long-term success and impact

Every business and team is different, so we encourage you to play around with this framework and identify how you can make it work for your team. Just remember to reference our guiding principles for complex team structures.

Levi manages the Banking and Accounting data team at Shopify. He enjoys finding elegant solutions to real-world business problems using math, machine learning, and elegant data models. In his spare time he enjoys running, spending time with his wife and daughters, and farming. Levi can be reached via LinkedIn.


How to Structure Your Data Team for Maximum Influence
