Double Trouble: Why We Decided Against Mocking

Published in Talkdesk Engineering, Feb 24, 2020

“So what if We Mock?”

Source: Wikimedia — (Image retrieved from https://bit.ly/2P2jPHd)

There’s a well-known science joke about a dairy farmer who asks a theoretical physicist to help him solve his farm’s low milk production problem.

The physicist returns with an answer that has a caveat: “it only works for spherical cows in a vacuum.”

This illustrates an essential part of both physicists’ and programmers’ jobs: the concept of abstraction.

All software building, and the good practices that come with it, hinges on correctly and progressively abstracting away the details of a given problem.

And so it stands to reason that we should also abstract from the implementation details in our tests. A good unit test should be repeatable and easy to implement, so why should we care about the gritty details that reality casts upon us?

That’s what test doubles are for: if we were, say, testing a class that milks cows, we could, instead of using an actual cow, just use a sphere that simulates the ability to be milked. It would be much simpler, with far fewer chances of the cow kicking you violently, and no one would have to condone testing on animals.

The flip side is, of course, you have no guarantee an actual cow won’t kick the milk bucket in production — and there’s no use crying over spilled milk.

Some Cowntext

We are sorry if we misled anyone: this is not an article about cattle, spherical or otherwise, but about software testing. In particular, it is about unit testing and the over-reliance on test doubles, topics that arose within the Atlas team at Talkdesk.

Atlas is a platform we’re building to make our development quicker and more flexible by using a micro-frontend approach, as well as providing a more customizable and easily extensible experience to our end customer.

We generally use an OO paradigm and most of our codebase is written in plain (if you consider ES6 plain) JavaScript, atop a React frontend.

Even though we would strongly argue that the underlying rationale of this article remains valid regardless, keep in mind our starting point: JavaScript is a dynamic, loosely typed language. And dynamic, loosely typed non-cows are the hardest to manage. But let’s get to business.

SUT up: Some Unit Test Concepts

So what is a unit? The answer is not as simple as it may seem but, if it helps, you can define it as the smallest testable piece of software in the System Under Test (SUT).

In the context of OO programming, it often refers to the class under test, and for the purposes of this post, it’s enough to think of it as such.

Test doubles consist of any code that stands in for the real implementation in a test. Just like movie stunt doubles, they take the fall for you.

Though the word “mock” is often used interchangeably with “test double”, mocks are only one of several flavors. Mocks provide a given behavior depending on a set of expectations, but there are alternatives. To name a couple: stubs (functions that provide a canned answer regardless of the arguments they are called with; you can read more on the difference between these and mocks from the inevitable Martin Fowler) and spies (let’s define these simply as function wrappers that record information about function calls).
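To make the three flavors concrete, here is a minimal hand-rolled sketch in plain JavaScript. It is not taken from the Atlas codebase: `Milker`, `spyOn`, `createBucketMock` and the cow and bucket objects are hypothetical names invented for illustration, and a real project would more likely get these doubles from its test framework.

```javascript
// Hypothetical unit under test: asks a cow for milk and pours it into a bucket.
class Milker {
  constructor(bucket) {
    this.bucket = bucket;
  }

  milk(cow) {
    const liters = cow.giveMilk();
    this.bucket.add(liters);
    return liters;
  }
}

// Stub: returns a canned answer, whatever the arguments.
const cowStub = { giveMilk: () => 2 };

// Spy: wraps a function and records how it was called.
function spyOn(fn) {
  const spy = (...args) => {
    spy.calls.push(args);
    return fn(...args);
  };
  spy.calls = [];
  return spy;
}

// Mock: carries its own expectation and fails when it is not met.
function createBucketMock(expectedLiters) {
  let received = null;
  return {
    add(liters) {
      received = liters;
    },
    verify() {
      if (received !== expectedLiters) {
        throw new Error(`expected add(${expectedLiters}), got add(${received})`);
      }
    },
  };
}

// Stub plus spy: inspect the recorded calls after the fact.
const bucket = { add: spyOn(() => {}) };
const liters = new Milker(bucket).milk(cowStub);
console.log(liters, bucket.add.calls); // liters is 2, calls is [[2]]

// Stub plus mock: the expectation is declared up front and verified at the end.
const bucketMock = createBucketMock(2);
new Milker(bucketMock).milk(cowStub);
bucketMock.verify(); // throws if add(2) was not the last call received
```

Note how the stub and the spy make no judgment of their own, while the mock decides whether the test passes. That difference is why the mock is the most compromising of the three.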

Many traits separate the wheat from the chaff in unit tests, and many articles pinpoint them, saying the same thing in different words. We can sum it all up: unit tests need to be reliable, maintainable and readable. And while mocks and other test doubles may look helpful, they can also make those goals harder to reach as your codebase, your number of unit tests and your tech debt grow.

Their initial convenience made us over-dependent on them. We were building Atlas from scratch and while testing new code we gladly leaned on mocks.

Only when we decided to do a big refactor did we realize that we had made such extensive use of test doubles that our tests had become unreliable. When you refactor you alter the code, but the mocks are impervious to that change and keep the deprecated interfaces. Our tests lost relevance because they didn’t depend on the actual code, but on a test double version of it: they worked for spherical cows in a vacuum.
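A small sketch of how that drift can happen, using hypothetical `Cow` and `Milker` classes rather than anything from Atlas. We assume a refactor renamed `giveMilk()` to `produce()` on the real class while the hand-written double was left behind.

```javascript
// Hypothetical dependency AFTER a refactor: `giveMilk()` was renamed to
// `produce()`, and the old method is gone.
class Cow {
  produce() {
    return 2;
  }
}

// Hypothetical unit under test, still written against the old interface.
class Milker {
  milk(cow) {
    return cow.giveMilk(); // stale call that nobody updated
  }
}

// Returns null when milking works, or the error message when it blows up.
function tryMilking(cow) {
  try {
    new Milker().milk(cow);
    return null;
  } catch (error) {
    return error.message;
  }
}

// The test double was never updated, so it still exposes `giveMilk`.
const sphericalCow = { giveMilk: () => 2 };

const withDouble = tryMilking(sphericalCow); // null: the test stays green
const withRealCow = tryMilking(new Cow()); // a TypeError message: the breakage shows

console.log({ withDouble, withRealCow });
```

The test that uses the double keeps passing against an interface that no longer exists. Only the run against the real `Cow` reveals that `Milker` is broken.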

That’s when we agreed to do something about it, and also when we found out we were embroiled in a long-standing battle, a centuries-old feud that makes the conflict between Lannisters and Starks look like child’s play. We can say it is the dispute between the Capulets and the Montagues of Test Driven Development (TDD).

The Classicists vs. the Mockists

Ok, we may be overselling the rivalry a bit, but there are two schools of TDD, usually known by the cities where they were supposedly created: London and Detroit.

The London style is top-down: you start building your software from the most generic component and then refine the abstractions you depend on.

If you develop using the London TDD style, you have no choice but to mock your dependencies in your tests: what you’re testing depends on code that has not been written yet.

Meet the contender representing London, the Mockist Sandi “Ruby” Metz!

In a talk at 2013’s RubyConf, Sandi gave several simple, straightforward pointers on how to write good unit tests.

One such aphorism we particularly like is “be a minimalist.” We recommend watching the talk if you haven’t before, but in case you’re in a rush, it can be summarized by this table:

(Image retrieved from https://youtu.be/URSWYvyc42M)

What Sandi means by minimalism is: don’t test what you don’t need to be testing.

From an OO perspective, objects communicate with each other via messages. A Mockist would say: “only care for the messages! Test the message, not the messenger!”

Sandi says you can ignore any of the messages that do not have an impact outside of your unit. Because remember, your unit is already the smallest testable part of your code.

For everything else, you can replace your dependencies with mocks because all you care about are the messages they exchange, i.e. that they respect their defined interface.
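As a rough illustration of testing the message rather than the messenger, here is a sketch in plain JavaScript. `MilkingSession`, the ledger and its `append` method are hypothetical names; the ledger deliberately has no real implementation, as would be the case in a top-down workflow.

```javascript
// Hypothetical unit under test. A real ledger does not need to exist yet:
// the test only pins down the message that MilkingSession sends to it.
class MilkingSession {
  constructor(ledger) {
    this.ledger = ledger;
    this.total = 0;
  }

  // Incoming command: changes state and sends an outgoing command.
  record(cowId, liters) {
    this.total += liters;
    this.ledger.append({ cowId, liters });
  }

  // Incoming query: returns a value and has no side effects.
  totalLiters() {
    return this.total;
  }
}

// Minimal hand-rolled double that records the messages it receives.
function createLedgerMock() {
  const received = [];
  return {
    received,
    append(entry) {
      received.push(entry);
    },
  };
}

const ledger = createLedgerMock();
const session = new MilkingSession(ledger);
session.record("daisy", 3);

// Outgoing command: check that the message was sent, with the right payload.
const sent = ledger.received;
// Incoming query: check the returned value and nothing else.
const total = session.totalLiters();

console.log(sent, total);
```

The test never asks what the ledger does with the entry. From the Mockist point of view, that belongs to the ledger's own tests.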

But could we be minimalistic with our mocks? The Detroit style of TDD works bottom-up: you start building and testing the smallest components and then develop more complex components from those smaller parts.

Unlike the London style, your dependencies are already there when you start testing, so you don’t need to mock them. In fact, there’s no need to mock anything! Right? Riiiight?

More or less. Meet the contender representing Detroit, the Classicist Robert “Uncle Bob” Martin!

What Uncle Bob suggests is to mock across “architecturally significant boundaries.” This quote is the knockout move when it comes to the use of test doubles.

Tests should be repeatable and quick to run so you should never depend on a faulty server connection or access to a database.

But you shouldn’t need to mock every class you depend on either, especially if you’re responsible for maintaining both and they are tightly coupled.
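One way to read that advice, sketched with hypothetical names: `MilkQuote` and `PriceCalculator` are classes we own, so the test uses them for real, while `httpGet` and the `/prices/milk` path stand in for a network boundary and are replaced by a stub. None of these names come from the original article.

```javascript
// Hypothetical collaborator we own: used for real in the test.
class PriceCalculator {
  total(liters, pricePerLiter) {
    return liters * pricePerLiter;
  }
}

// Hypothetical unit under test. `httpGet` is the architectural boundary,
// injected so that tests never touch the network.
class MilkQuote {
  constructor(httpGet, calculator) {
    this.httpGet = httpGet;
    this.calculator = calculator;
  }

  async quote(liters) {
    const { pricePerLiter } = await this.httpGet("/prices/milk");
    return this.calculator.total(liters, pricePerLiter);
  }
}

// Stub for the boundary only: a canned response instead of a real request.
const httpGetStub = async () => ({ pricePerLiter: 2 });

const quotePromise = new MilkQuote(httpGetStub, new PriceCalculator()).quote(5);
quotePromise.then((total) => console.log(total)); // logs 10
```

If `PriceCalculator` changes its interface, this test breaks along with it. In the spirit of this article, that is a feature rather than a nuisance.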

Clean up Your Act

What we found out, as we rewrote our tests to make less extensive use of test doubles (and we’re hardly the first ones to point this out), is that the need for mocks is actually a code smell.

Mocks represent the dependencies of your unit. So if the reason you want to write mocks is that your unit has a lot of dependencies, or your unit’s dependencies have a lot of dependencies, then mocking is not the way to go.

If you want to simplify your tests, don’t use doubles — rewrite your code instead.

Ever since we decided to adopt an “avoid mocks” mindset, we have been finding opportunities to make our code better.

We’ll give you a recent example, in which we were adding a Menu class to an already messy bundle of interdependent units:

An arrow represents the dependency; in yellow the last class to be added (diagram by Talkdesk)

To test the Menu class in a mock-free way, we would have had to instantiate not only Navigation but all its dependencies.

That is a hard test to build (the diagram is already a simplification of reality), and an easy solution would be just to mock Navigation — the mock can hide all the other dependencies, and the Menu itself only depended on Navigation.

However, that would be sweeping tech debt under the rug. Bad code is rarely the result of one day of bad programming by a single coder. More often it is the bastard child of small increments by a growing team of well-intentioned programmers who progressively lose track of the overall picture.

It may be because they don’t feel confident rewriting the existing code or because the problem was not apparent.

But avoiding test doubles will make the problem visible and give you more confidence to refactor old code.

Instead of mocking Navigation, we decided to figure out what those classes had in common and use an independent state store to manage it.

So, in the end, our dependency diagram looked like this:

Now, hello handsome (diagram by Talkdesk)

Now the only class we had to mock, no matter which unit we were testing, was the Location State Store. It’s a pretty dumb class that has only one job: to be the source of truth of the common location state.
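The article does not show the store's code, so the following is only a guess at its general shape. The method names `get`, `set` and `subscribe`, and the way `Menu` and `Navigation` use them, are assumptions made for illustration.

```javascript
// Hypothetical sketch of a "dumb" store: it holds the location and notifies
// subscribers, nothing else.
class LocationStateStore {
  constructor(initialPath = "/") {
    this.path = initialPath;
    this.listeners = new Set();
  }

  get() {
    return this.path;
  }

  set(path) {
    this.path = path;
    this.listeners.forEach((listener) => listener(path));
  }

  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener); // call to unsubscribe
  }
}

// Menu and Navigation no longer know about each other, only about the store.
class Menu {
  constructor(store) {
    this.store = store;
  }

  select(path) {
    this.store.set(path);
  }
}

class Navigation {
  constructor(store) {
    this.current = store.get();
    store.subscribe((path) => {
      this.current = path;
    });
  }
}

// Testing Menu alone needs only a one-method stand-in for the store.
const recorded = [];
new Menu({ set: (path) => recorded.push(path) }).select("/reports");

// The store is also simple enough to wire up for real.
const store = new LocationStateStore();
const navigation = new Navigation(store);
new Menu(store).select("/reports");
console.log(recorded, navigation.current);
```

Because the store has a single job, its double is a single function. Compare that with a `Navigation` mock that would have had to hide a whole graph of dependencies.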

Immediately our tests became a lot simpler. And, mocks aside, this code will be easier to extend, debug and maintain in the long run.

Avoiding test doubles forces our units to have a single and well-defined responsibility — no test doubles means bad code will be twice as hard to test.

It also makes us feel safer when refactoring, because when we change any given unit, the tests of the other units that depend on it will also break until we fix them as well.

It is an art

And by art we mean, there’s no strict rule that can replace experience.

Here at the Atlas team, we are fans of Kent C. Dodds’ React Testing Library. He popularized Guillermo Rauch’s maxim: “write tests, not too many, mostly integration.”

The sentiment that drove the decision to replace Enzyme with Kent C. Dodds’ library is the same that motivated this article. By avoiding mocks, we are indeed blurring, if ever so slightly, the line between integration and unit tests.

With Enzyme’s shallow rendering, you’re testing isolated bits of React rather than the actual output that React cobbles together and your application runs on.

In the same way, when using mocks, you’re making your tests hinge on a mere reflection of the actual code they depend on.

You can change the behavior or the interface of a class, but if you forget to keep all the mocks consistent, its dependents will still pass their tests, because they are being tested against a mock. You get false positives.

We are not, however, about to dismiss all mocks: we’re striving for balance.

With experience, meaning art, Classicists and Mockists will come to the same conclusion. Test doubles sometimes are necessary, sometimes they are a pain.

You can read Kent C. Dodds advocating for mocks, reminding us that “when you mock something, you’re making a trade-off”, and you can watch Sandi Metz (in the aforementioned talk) sharing how hard it is to maintain mocks and avoid API drift. Whatever side you end up choosing, be smart about it.

Keep this in Mind

We prescribe reduced use of test doubles while recognizing they are important. We’ll stress again Uncle Bob’s suggestion of mocking across “architecturally significant boundaries.”

Take this into account:

Use the full range of test doubles: not only mocks but also stubs and, very often, spies. Use the simplest and least compromising test double that does the trick.
Mock network calls, database accesses and external libraries whose behavior you don’t want your SUT to depend strictly on. Mock anything that is unreliable or will make your tests run for too long.
If your test is getting too complicated to write, it may be a code smell. Take a look at your code and check whether it is at fault (whether it adheres to SOLID principles, for example), and refactor if it is. But then, if you’re convinced the problem is not architectural, use test doubles.
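As one example of picking the simplest double for something slow or unreliable, here is a sketch where the system clock is the dependency being replaced. `PriceCache` and its `now` parameter are hypothetical; the only real API involved is `Date.now`.

```javascript
// Hypothetical unit under test: a cache whose entries expire after `ttlMs`.
// `now` is injected so that tests do not depend on the real clock.
class PriceCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.storedAt = null;
    this.value = null;
  }

  put(value) {
    this.value = value;
    this.storedAt = this.now();
  }

  get() {
    const expired =
      this.storedAt === null || this.now() - this.storedAt > this.ttlMs;
    return expired ? null : this.value;
  }
}

// A stub clock is enough here: no mock, no expectations, no waiting.
let fakeTime = 0;
const cache = new PriceCache(1000, () => fakeTime);

cache.put(2);
const fresh = cache.get(); // 2: still within the time to live

fakeTime = 1001;
const stale = cache.get(); // null: the entry has expired

console.log({ fresh, stale });
```

The test runs instantly and gives the same result every time, without asserting anything about how or how often the clock was read.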

The cost of writing tests should not dwarf what you get from them, and hopefully, unit tests are only a layer of the stack of tests you’re running. Just be wary of spherical cows.