Over the past few years, as I’ve been managing multiple developer infrastructure teams at once, I’ve found some tensions that are hard to resolve. In my current mental model, I have found that there are three poles that have a natural tension and are thus tricky to balance: management support, system and domain expertise, and road maps. I’m going to discuss the details of these poles and some strategies I’ve tried to manage them.
What’s Special About Developer Infrastructure Teams?
Although this model likely can apply to any software development team, the nature of developer infrastructure (Dev Infra) makes this situation particularly acute for managers in our field. These are some of the specific challenges faced in Dev Infra:
- Engineering managers have a lot on their plates. For whatever reason, infrastructure teams usually lack dedicated product managers, so we often have to step in to fill that gap. Similarly, we’re responsible for tasks that usually fall to UX experts, such as doing user research.
- There’s a lot of maintenance and support. Teams are responsible for keeping multiple critical systems online with hundreds or thousands of users, usually with only six to eight developers. In addition, we often get a lot of support requests, which is part of the cost of developing in-house software that has no extended community outside the company.
- As teams tend to organize around particular phases in the development workflow, or sometimes specific technologies, there’s a high degree of domain expertise that’s developed over time by all its members. This expertise allows the team to improve their systems and informs the team’s road map.
What Are The Three Poles?
The Dev Infra management poles I’ve modelled are tensions, much like that between product and engineering. They can’t, I don’t believe, all be solved at the same time—and perhaps they shouldn’t be. We, Dev Infra managers, balance them according to current needs and context and adapt as necessary. For this balancing act, it behooves us to make sure we understand the nature of these poles.
1. Management Support
Supporting developers in their career growth is an important function of any engineering manager. Direct involvement in team projects allows the tightest feedback loops between manager and report, and thus the highest-quality coaching and mentorship. We also want to maximize the number of reports per manager. Good managers are hard to find, and even the best manager adds a bit of overhead to a team’s impact.
We want the manager to be as involved in their reports’ work as possible, and we want the highest number of reports per manager that they can handle. Where this gets complicated is balancing the scope and domain of individual Dev Infra teams and of the whole Dev Infra organization. This tension is a direct result of the need for specific system and domain expertise on Dev Infra teams.
2. System and Domain Expertise
As mentioned above, in Dev Infra we tend to build teams around domains that represent phases in the development workflow, or occasionally around specific critical technologies. It’s important that each team has both domain knowledge and expertise in the specific systems involved. Despite this focus, the scope of and opportunities in a given area can be quite broad, and the associated systems grow in size and complexity.
Expertise in a team’s systems is crucial just to keep everything humming along. As with any long-running software application, dependencies need to be managed, underlying infrastructure has to be occasionally migrated, and incidents must be investigated and root causes solved. Furthermore, at any large organization, Dev Infra services can have many users relative to the size of the teams responsible for them. Some teams will require on-call schedules in case a critical system breaks during an emergency (finding out the deployment tool is down when you’re trying to ship a security fix is, let’s just say, not a great experience for anyone).
A larger team means less individual on-call time and more hands for support, maintenance, and project work. As teams expand their domain knowledge, more opportunities are discovered for increasing the impact of the team’s services. The team will naturally be driven to constantly improve the developer experience in their area of expertise. This drive, however, risks a disconnect with the greatest opportunities for impact across Dev Infra as a whole.
3. Road Maps
Specializing Dev Infra teams in particular domains is crucial for both maintenance and future investments. Team road maps and visions improve and expand upon existing offerings: smoothing interfaces, expanding functionality, scaling up existing solutions, and looking for new opportunities to impact development in their domain. They can make a big difference to developers during particular phases of their workflow like providing automation and feedback while writing code, speeding up continuous integration (CI) execution, avoiding deployment backlogs, and monitoring services more effectively.
As a whole Dev Infra department, however, the biggest impact we can have on development at any given time changes. When Dev Infra teams are first created, there’s usually a lot of low- hanging fruit—obvious friction at different points in the development workflow—so multiple teams can broadly improve the developer experience in parallel. At some point, however, some aspects of the workflow will be much smoother than others. Maybe CI times have finally dropped to five minutes. Maybe deploys rarely need attention after being initiated. At a large organization, there will always be edge cases, bugs, and special requirements in every area, but their impact will be increasingly limited when compared to the needs of the engineering department as a whole.
At this point, there may be an opportunity for a large new initiative that will radically impact development in a particular way. There may be a few, but it’s unlikely that there will be the need for radical changes across all domains. Furthermore, there may be unexplored opportunities and domains for which no team has been assembled. These can be hard to spot if the majority of developers and managers are focused on existing well-defined scopes.
How to Maintain the Balancing Act
Here’s the part where I confess that I don’t have a single amazing solution to balance management support, system maintenance and expertise, and high-level goals. Likely there are a variety of solutions that can be applied and none are perfect. Here are three ideas I’ve thought about and experimented with.
1. Temporarily Assign People from One Team to a Project on Another
If leadership has decided that the best impact for our organization at this moment is concentrated in the work of a particular team, call it Team A, and if Team A’s manager can’t effectively handle any more reports, then a direct way to get more stuff done is to take a few people from another team (Team B) and assign them to the Team A’s projects. This has some other benefits as well: it increases the number of people with familiarity in Team A’s systems, and people sometimes like to change up what they’re working on.
When we tried this, the immediate question was “should the people on loan to Team A stay on the support rotations for their ‘home’ team?” From a technical expertise view, they’re important to keep the lights on in the systems they’re familiar with. Leaving them on such rotations prevents total focus on Team A, however, and at a minimum extends the onboarding time. There are a few factors to consider: the length of the project(s), the size of Team B, and the existing maintenance burden on Team B. Favour removing the reassigned people from their home rotations, but know that this will slow down Team A’s work even more as they pick up the extra work.
The other problem we ran into is that the manager of Team B is disconnected from the work their reassigned reports are now working on. Because the main problem is that Team A’s manager doesn’t have enough bandwidth to have more reports, there’s less management support for the people on loan, in terms of mentoring, performance management, and prioritization. The individual contributor (IC) can end up feeling disconnected from both their home team and the new one.
2. Have a Whole Team Contribute to Another Team’s Goals
We can mitigate at least the problem of ICs feeling isolated in their new team if we have the entire team (continuing the above nomenclature, Team B) work on the systems that another team (Team A) owns. This allows members of Team B to leverage their existing working relationships with each other, and Team B’s manager doesn’t have to split their attention between two teams. This arrangement can work well if there is a focused project in Team A’s domain that somehow involves some of Team B’s domain expertise.
This is, of course, a very blunt instrument, in that no project work will get done on Team B’s systems, which themselves still need to be maintained. There’s also a risk of demotivating the members of Team B, who may feel that their domain and systems aren’t important, although this can be mitigated to some extent if the project benefits or requires their domain expertise. We’ve had success here in exactly that way in an ongoing project done by our Test Infrastructure team to add data from our CI systems to Services DB, our application-catalog app stewarded by another team, Production Excellence. Their domain expertise allowed them to understand how to expose the data in the most intuitive and useful way, and they were able to more rapidly learn Services DB’s codebase by working together.
3. Tiger Team
A third option we’ve tried out in Dev Infra is a tiger team: “a specialized, cross-functional team brought together to solve or investigate a specific problem or critical issue.” People from multiple teams form a new, temporary team for a single project, often prototyping a new idea. Usually the team operates in a fast-paced, autonomous way towards a very specific goal, so management oversight is fairly limited. By definition, most people on a tiger team don’t usually work together, so the home and new team dichotomy is sidestepped, or at least very deliberately managed. The focus of the team means that members put aside maintenance, support, and other duties from their home team for the duration of the team’s existence.
The very first proof of concept for Spin was built this way over about a month. At that time, the value was sufficiently clear that we then formed a whole team around Spin and staffed it up to tackle the challenge of turning it into a proper product. We’ve learned a lot since then, but that first prototype was crucial in getting the whole project off the ground!
No Perfect Solutions
From thinking about and experimenting with team structures during my decade of management experience, there doesn’t seem to be a perfect solution to balance the three poles of management support, system maintenance and domain expertise, and high-level goals. Each situation is unique, and trade-offs have to be judged and taken deliberately. I would love to hear other stories of such balancing acts! Find me on Twitter and LinkedIn.
Mark Côté is the Director of Engineering, Developer Infrastructure, at Shopify. He's been in the developer-experience space for over a decade and loves thinking about infrastructure-as-product and using mental models in his strategies.
Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.