Do we need to read code before editing it?
The idea isn’t as wild as it sounds. In order to safely fix a bug or update a feature, we may need to learn some things about the code. However, we’d prefer to learn only that information. Not only does extra reading waste time, it overcomplicates our mental model. As our model grows, we’re more likely to get confused and lose track of critical info.
But can we really get away with reading nothing? Spoiler: no. However, we can get closer by skipping over areas that we know the computer is checking, saving our focus for areas that are susceptible to human error. In doing so, we’ll learn how to identify and eliminate those danger areas, so the next person can get away with reading even less.
Let’s give it a try.
Find the Entrypoint
If we’re refactoring code, we already know where we need to edit. Otherwise, we’re changing a behavior that has side effects. In a backend context, these behaviors would usually be exposed APIs. On the frontend, this would usually be something that’s displayed on the screen. For the sake of example, we’ll imagine a mobile application using React Native and Typescript, but this process generalizes to other contexts (as long as they have some concept of build or test errors; more on this later).
If our goal was to read a lot of code, we might search for all hits on RelevantFeatureName
. But we don’t want to do that. Even if we weren’t trying to minimize reading, we’ll run into problems if the code we need to modify is called AlternateFeatureName
, SubfeatureName
, or LegacyFeatureNameNoOneRemembersAnymore
.
Instead, we’ll look for something external: the user-visible strings (including accessibility labels—we did remember to add those, right?) on the screen we’re interested in. We search various combinations of string fragments, quotation marks, and UI inspectors until we find the matching string, either in the application code or in a language localization file. If we’re in a localization file, the localization key leads us to the application code that we’re interested in.
Tip
If we’re dealing with a regression, there’s an easier option: git bisect. When git bisect works, we really don’t need to read the code. In fact, we can skip most of the following steps. Because this is such a dramatic shortcut, always keep track of which bugs are regressions from previously working code.
Make the First Edit
If we’ve come in to make a simple copy edit, we’re done. If not, we’re looking for a component that ultimately gets populated by the server, disk, or user. We can no longer use exact strings, but we do have several read-minimizing strategies for zeroing in on the component:
- Where is this component on the screen, relative to our known piece of text?
- What type of standard component is it using? Is it a button? Text input? Text?
- Does it have some unusual style parameter that’s easy to search? Color? Corner radius? Shadow?
- Which button launches this UI? Does the button have searchable user-facing text?
These strategies all work regardless of naming conventions and code structure. Previous developers would have a hard time making our life harder without making the code nonfunctional. However, they may be able to make our life easier with better structure.
For example, if we’re using strategy #1, well-abstracted code helps us quickly rule out large areas of the screen. If we’re looking for some text near the bottom of the screen, it’s much easier to hit the right Text
item if we can leverage a grouping like this:
<SomeHeader />
<SomeContent />
<SomeFooter />
rather than being stuck searching through something like this:
// Header
<StaticImage />
<Text />
<Text />
<Button />
<Text />
...
// Content
...
// Footer
...
where we’ll have to step over many irrelevant hits.
Abstraction helps even if the previous developer chose wacky names for header, content, or footer, because we only care about the broad order of elements on the screen. We’re not really reading the code. We’re looking for objective cues like positioning. If we’re still unsure, we can comment out chunks of the screen, starting with larger or highly-abstracted components first, until the specific item we care about disappears.
Once we’ve found the exact component that needs to behave differently, we can make the breaking change right now, as if we’ve already finished updating the code. For example, if we’re making a new component that displays data newText
, we add that parameter to its parent’s input arguments, breaking the build.
If we’re fixing a bug, we can also start by adjusting an argument list. For example, the condition “we shouldn’t be displaying x
if y
is present” could be represented with the tagged union {mode: 'x', x: XType} | {mode: 'y'; y: YType}
, so it’s physically impossible to pass in x
and y
at the same time. This will also trigger some build errors.
Tagged Unions
Tagged unions go by a variety of different names and syntaxes depending on language. They’re most commonly referred to as discriminated unions, enums with associated values, or sum types.
Climb Up the Callstack
We now go up the callstack until the build errors go away. At each stage, we edit the caller as if we’ll get the right input, triggering the next round of build errors. Notice that we’re still not reading the code here—we’re reading the build errors. Unless a previous developer has done something that breaks the chain of build errors (for example, accepting any
instead of a strict type), their choices don’t have any effect on us.
Once we get to the top of the chain, we adjust the business logic to grab newText
or modify the conditional that was incorrectly sending x
. At this point, we might be done. But often, our change could or should affect the behavior of other features that we may not have thought about. We need to sweep back down through the callstack to apply any remaining adjustments.
On the downswing, previous developers’ choices start to matter. In the worst case, we’ll need to comb through the code ourselves, hoping that we catch all the related areas. But if the existing code is well structured, we’ll have contextual recommendations guiding us along the way: “because you changed this code, you might also like…”
Update Recommended Code
As we begin the downswing, our first line of defense is the linter. If we’ve used a deprecated library, or a pattern that creates non-obvious edge cases, the linter may be able to flag it for us. If previous developers forgot to update the linter, we’ll have to figure this out manually. Are other areas in the codebase calling the same library? Is this pattern discouraged in documentation?
After the linter, we may get additional build errors. Maybe we changed a function to return a new type, and now some other consumer of that output raises a type error. We can then update that other consumer’s logic as needed. If we added more cases to an enum, perhaps we get errors from other exhaustive switches that use the enum, reminding us that we may need to add handling for the new case. All this depends on how much the previous developers leaned on the type system. If they didn’t, we’ll have to find these related sites manually. One trick is to temporarily change the types we’re emitting, so all consumers of our output will error out, and we can check if they need updates.
Exhaustiveness
An exhaustive switch statement handles every possible enum case. Most environments don’t enforce exhaustiveness out of the box. For example, in Typescript, we need to have strictNullChecks
turned on, and ensure that the switch statement has a defined return type. Once exhaustiveness is enforced, we can remove default cases, so we’ll get notified (with build errors) whenever the enum changes, reminding us that we need to reassess this switch statement.
Our final wave of recommendations comes from unit test failures. At this point, we may also run into UI and integration tests. These involve a lot more reading than we’d prefer; since these tests require heavy mocking, much of the text is just noise. Also, they often fail for unimportant reasons, like timing issues and incomplete mocks. On the other hand, unit tests sometimes get a bad rap for requiring code restructures, usually into more or smaller abstraction layers. At first glance, it can seem like they make the application code more complex. But we didn’t need to read the application code at all! For us, it’s best if previous developers optimized for simple, easy-to-interpret unit tests. If they didn’t, we’ll have to find these issues manually. One strategy is to check git blame
on the lines we changed. Maybe the commit message, ticket, or pull request text will explain why the feature was previously written that way, and any regressions we might cause if we change it.
At no point in this process are comments useful to us. We may have passed some on the upswing, noting them down to address later. Any comments that are supposed to flag problems on the downswing are totally invisible—we aren’t guaranteed to find those areas unless they’re already flagged by an error or test failure. And whether we found comments on the upswing or through manual checking, they could be stale. We can’t know if they’re still valid without reading the code underneath them. If something is important enough to be protected with a comment, it should be protected with unit tests, build errors, or lint errors instead. That way it gets noticed regardless of how attentive future readers are, and it’s better protected against staleness. This approach also saves mental bandwidth when people are touching nearby code. Unlike standard comments, test assertions only pop when the code they’re explaining has changed. When they’re not needed, they stay out of the way.
Clean Up
Having mostly skipped the reading phase, we now have plenty of time to polish up our code. This is also an opportunity to revisit areas that gave us trouble on the downswing. If we had to read through any code manually, now’s the time to fix that for future (non)readers.
Update the Linter
If we need to enforce a standard practice, such as using a specific library or a shared pattern, codify it in the linter so future developers don’t have to find it themselves. This can trigger larger-scale refactors, so it may be worth spinning off into a separate changeset.
Lean on the Type System
Wherever practical, we turn primitive types (bools, numbers, and strings) into custom types, so future developers know which methods will give them valid outputs to feed into a given input. A primitive like timeInMilliseconds: number
is more vulnerable to mistakes than time: MillisecondsType
, which will raise a build error if it receives a value in SecondsType
. When using enums, we enforce exhaustive switches, so a build error will appear any time a new case may need to be handled.
We also check methods for any non-independent arguments:
- Argument A must always be null if Argument B is non-null, and vice versa (for example,
error/response
). - If Argument A is passed in, Argument B must also be passed in (for example,
eventId/eventTimestamp
). - If Flag A is off, Flag B can’t possibly be on (for example,
visible/highlighted
).
If these arguments are kept separate, future developers will need to think about whether they’re passing in a valid combination of arguments. Instead, we combine them, so the type system will only allow valid combinations:
- If one argument must be null when the other is non-null, combine them into a tagged union:
{type: 'failure'; error: ErrorType} | {type: 'success'; response: ResponseType}
. - If two arguments must be passed in together, nest them into a single object:
event: {id: IDType; timestamp: TimestampType}
. - If two flags don’t vary independently, combine them into a single enum:
'hidden'|'visible'|'highlighted'
.
Optimize for Simple Unit Tests
When testing, avoid entanglement with UI, disk or database access, the network, async code, current date and time, or shared state. All of these factors produce or consume side effects, clogging up the tests with setup and teardown. Not only does this spike the rate of false positives, it forces future developers to learn lots of context in order to interpret a real failure.
Instead, we want to structure our code so that we can write simple tests. As we saw, people can often skip reading our application code. When test failures appear, they have to interact with them. If they can understand the failure quickly, they’re more likely to pay attention to it, rather than adjusting the failing assertion and moving on. If a test is starting to get complicated, go back to the application code and break it into smaller pieces. Move any what code (code that decides which side effects should happen) into pure functions, separate from the how code (code that actually performs the side effects). Once we’re done, the how code won’t contain any nontrivial logic, and the what code can be tested—and therefore documented—without complex mocks.
Trivial vs. Nontrivial Logic
Trivial logic would be something like if (shouldShow) show()
. Something like if (newUser) show()
is nontrivial (business) logic, because it’s specific to our application or feature. We can’t be sure it’s correct unless we already know the expected behavior.
Whenever we feel an urge to write a comment, that’s a signal to add more tests. Split the logic out into its own unit tested function so the “comment” will appear automatically, regardless of how carefully the next developer is reading our code.
We can also add UI and integration tests, if desired. However, be cautious of the impulse to replace unit tests with other kinds of tests. That usually means our code requires too much reading. If we can’t figure out a way to run our code without lengthy setup or mocks, humans will need to do a similar amount of mental setup to run our code in their heads. Rather than avoiding unit tests, we need to chunk our code into smaller pieces until the unit tests become easy.
Confirm
Once we’ve finished polishing our code, we manually test it for any issues. This may seem late, but we’ve converted many runtime bugs into lint, build, and test errors. Surprisingly often, we’ll find that we’ve already handled all the edge cases, even if we’re running the code for the first time.
If not, we can do a couple more passes to address the lingering issues… adjusting the code for better “unread”-ability as we go.
Tip
Sometimes, our end goal really is to read the code. For example, we might be reviewing someone else’s code, verifying the current behavior, or ruling out bugs. We can still pose our questions as writes:
- Could a developer have done this accidentally, or does the linter block it when we try?
- Is it possible to pass this bad combination of arguments, or would that be rejected at build time?
- If we hardcode this value, which features (represented by unit tests) would stop working?
JM Neri is a senior mobile developer on the Shop Pay team, working out of Colorado. When not busy writing unit tests or adapting components for larger text sizes, JM is usually playing in or planning for a TTRPG campaign.
Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.