Why I avoid test stubs in TypeScript
stub /stʌb/ (verb)
- To replace a function on an object or class with a test-only implementation. See sinon.js for a well-written example.
- To leave an impediment to future engineers that they are likely to "stub" themselves on. Related topics: footgun, anti-pattern, spies, testing.
The etymology of "stub" is disputed, but a common theory is that it was from "stubbing one's toe," and that the reaction of many engineers to encountering one involves a similar release of vile imprecation.
— Precepts, Principles, Paradigms, and Parlance of Programming for Plaudited Pairing Partners, Third Edition
I've been burned by test stubs enough times that I have a visceral dislike of them in JS/TS, and I try to avoid them whenever I can. They are neither legible nor explicit; when you have stubbed code from within a test, there's no indication in source code that your function has been stubbed.[1]
Starting off, I think it's illustrative to think about how a stub actually works. In JavaScript, stubbing a function requires mutating an object:
```js
function stub (obj, fnName, stubFn) {
  const originalFn = obj[fnName];
  obj[fnName] = stubFn;
  const undoStub = () => obj[fnName] = originalFn;
  return undoStub;
}

function doThing (userId) {
  return OtherModel.fn(userId);
}
const MyModel = { doThing };

it("does the thing", async () => {
  const restore = stub(MyModel, "doThing", (userId) => "hello, world!");
  try {
    doTestStuff();
  } finally {
    restore(); // in a real test suite, this would be in a global hook that runs after every test
  }
});
```
One common source of confusion is that the stub will only work if you call your function by referencing the object that has the stubbed function. If you have a function, `doThing`, that is sometimes called directly and sometimes as `Model.doThing`, stubbing the module as a whole will not replace all of the calls with your stubbed function. A stub does not replace a function; it mutates an object to point to a new function. (Spies do essentially the same thing.)
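To make that pitfall concrete, here's a self-contained sketch (the `greet` function and `MyModel` object are hypothetical) showing that stubbing the object property leaves direct calls untouched:

```typescript
// Hypothetical module: `greet` is callable directly and via MyModel.
function greet(name: string): string {
  return `hello, ${name}`;
}

const MyModel = { greet };

// Stub only the object property, the way sinon.stub(MyModel, "greet") would:
const originalGreet = MyModel.greet;
MyModel.greet = () => "stubbed!";

// Calls through the object see the stub...
const viaObject = MyModel.greet("ada"); // "stubbed!"
// ...but direct calls still hit the original function.
const direct = greet("ada"); // "hello, ada"

MyModel.greet = originalGreet; // restore
```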
More pernicious is that there's no indication in source code that that object has been temporarily mutated by tests. Whenever you mutate a long-lived object, it makes code hard to understand, and stubbing is no exception; I've spent more time than I like to admit trying to trace through why a particular function was returning a strange value for a test before realizing the function was stubbed. And I'm not alone—when a codebase uses stubs, I've often found myself helping other engineers debug similar issues. Careless use of stubs that doesn't specify the exact arguments for which the stub should provide an override can make things even worse.
In the cases where a stub is unavoidable, I advocate for "explicit stubbing": adding source code to replace the logic for tests. For our `doThing` function, it might look something like this:
```js
let testOnlyOverrides = {};

export function setTestOnlyOverrides (obj) {
  testOnlyOverrides = obj;
}

function doThing (userId) {
  if (isTesting() && testOnlyOverrides[userId]) {
    return testOnlyOverrides[userId];
  }
  return OtherModel.fn(userId);
}
```
At first glance, this looks terrible: we took our clear source code and polluted it with test-specific logic. Why on earth would I think that the manual stub is better?[2]
- It is legible and explicit. It's immediately obvious to someone looking at this code that it might be stubbed during a test.
- `doThing` will always be stubbed, even if it is called directly. `stub(MyModel, "doThing", fn)` only stubs `MyModel.doThing()` and not `doThing()`.
I'm happy to have test-specific logic in my source code if it makes testing easier and more explicit, so when I do need to use a stub, that's how I do it. And in cases where a stub is required, I'll often look to see if it makes sense to turn that stub into a general testing utility.
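Putting the pieces together, a test using this explicit-stub pattern might look like the sketch below; `isTesting` and `OtherModel` are stand-ins for whatever your codebase actually uses:

```typescript
// Stand-ins (hypothetical): isTesting would check the environment, and
// OtherModel.fn would be the real implementation.
const isTesting = () => true;
const OtherModel = { fn: (userId: string) => `real result for ${userId}` };

let testOnlyOverrides: Record<string, string> = {};

function setTestOnlyOverrides(obj: Record<string, string>): void {
  testOnlyOverrides = obj;
}

function doThing(userId: string): string {
  if (isTesting() && testOnlyOverrides[userId]) {
    return testOnlyOverrides[userId];
  }
  return OtherModel.fn(userId);
}

// In a test: override only the user we care about, then reset afterwards
// (in a real suite, the reset would live in a global after-each hook).
setTestOnlyOverrides({ "user-1": "hello, world!" });
const overridden = doThing("user-1"); // "hello, world!"
const untouched = doThing("user-2");  // falls through to the real logic
setTestOnlyOverrides({});
```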
Should you stub database calls?
One common place where people regularly use a test stub or mock is to replace a database in tests, and I believe that to be a huge mistake. The database is a key piece in a backend system, and one of the main things I want to get out of my tests is surety that my queries actually work.[3] Comprehensive fixture data makes both development and testing faster, safer, and easier. Stubbing database queries is a spot where I believe stubs aren't necessary and are actively harmful.
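As a sketch of what fixture data can look like, here's a hypothetical `createTestUser` helper; a `Map` stands in for the test database so the example is self-contained, but in a real suite the helper would insert rows into an actual test database so queries exercise real query logic:

```typescript
// Hypothetical fixture helper. In a real suite this would be async and
// would write to a real test database rather than a Map.
type User = { id: string; email: string; isActive: boolean };

const testDb = new Map<string, User>();

function createTestUser(overrides: Partial<User> = {}): User {
  const user: User = {
    id: `user-${testDb.size + 1}`,
    email: "test@example.com",
    isActive: true,
    ...overrides,
  };
  testDb.set(user.id, user);
  return user;
}

// The query under test runs against fixture data rather than a stubbed
// return value, so the test verifies the actual query logic.
function findActiveUsers(): User[] {
  return Array.from(testDb.values()).filter((u) => u.isActive);
}

createTestUser();
createTestUser({ isActive: false });
const active = findActiveUsers(); // only the active fixture user
```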
Stubs often point to areas where we should build test utilities
One of the main reasons I find stubs in testing code frustrating is that they are often used in situations where a shared testing utility would be more appropriate. An engineer will use a test to solve a "one-off" problem of setting a feature toggle value, checking whether a push is sent appropriately, or forcing a third-party dependency to return the value that we'd like. But many of these situations will crop up over and over again in a code-base, so we miss out on an opportunity to simplify our testing setup. Let's talk through a few cases.
One common case I see is controlling a feature toggle[4] deep within the bowels of a function:
```js
async function doThing (userId) {
  await someLogic();
  if (!(await userMatchesCriteria(userId))) {
    return;
  }
  if (await getFeatureToggleValue("my_toggle", userId) === "test") {
    sendEmailTest(userId);
  } else {
    sendEmail(userId);
  }
  await someMoreLogic();
}
```
We have a few different options here:
- We could make the feature-toggle value an optional argument of the `doThing` function: `const featureToggleValue = shouldSendEmail ?? await getFeatureToggleValue("my_toggle", userId)`. This works, but adds an extra option to the function that callers need to worry about.[5]
- We could use the "explicit stubbing" pattern I discussed above within `getFeatureToggleValue` to create a reusable utility that makes it simple to set a feature toggle value for any particular user for the duration of the test: `setFeatureToggleForTests("my_toggle", userId, "test")`.
- We could even build a utility to mutate the feature toggle for the duration of the test. This will marginally slow down the test because we're adding two database writes to set and reset the feature toggle, but it makes our test behavior more closely match our production behavior. We may even discover that the user that we're testing with doesn't meet some of the criteria to be included in the audience for this test, which could surface a problem that we might face in production.
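The second option might look something like this sketch; all of the names are hypothetical, and it's synchronous here to keep the sketch self-contained (a real `getFeatureToggleValue` would be async and read from the toggle store):

```typescript
// Hypothetical reusable toggle-override utility for tests.
type ToggleValue = "test" | "control" | "not_in_audience";

const toggleOverrides = new Map<string, ToggleValue>();
const isTesting = () => true; // stand-in for a real environment check

function setFeatureToggleForTests(toggle: string, userId: string, value: ToggleValue): void {
  toggleOverrides.set(`${toggle}:${userId}`, value);
}

function clearFeatureToggleOverridesForTests(): void {
  toggleOverrides.clear(); // call from a global after-each hook
}

function getFeatureToggleValue(toggle: string, userId: string): ToggleValue {
  const override = toggleOverrides.get(`${toggle}:${userId}`);
  if (isTesting() && override) {
    return override;
  }
  return "not_in_audience"; // stand-in for the real toggle lookup
}

// In a test:
setFeatureToggleForTests("my_toggle", "user-1", "test");
const forUser1 = getFeatureToggleValue("my_toggle", "user-1"); // overridden
const forUser2 = getFeatureToggleValue("my_toggle", "user-2"); // falls through
```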
Another common use-case for stubs or spies is to test side-effects of a function: email sends, pushes, work that we enqueued, or some other side-effect:
- We can set up automated snapshots for side-effects. We'll still want to write tests to exercise the side-effects in some cases, but having automated snapshots for emails or pushes can highlight inadvertent changes to logic and simplify the test code that we need to write.
- In cases where we have complex logic for side-effects, I often like to separate out the function that decides what email template or job to enqueue from the function that actually does the work. Rather than having a function like `do_thing_and_then_send_one_of_three_emails`, I'd much rather be able to test a `get_email_to_send` function.
- In some cases, it may make sense to have a function return a description of the work that it enqueued.
- And in some cases, it may make sense to set up explicit stubbing/spying of the function the way that I discussed above. But in this case, I'd want to create a nice, reusable utility: `expectSentEmailsToBe([{ templateName: "my_email", email: "foo@bar.com" }])`.
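That last utility might be sketched like this; the names are hypothetical, and a real implementation would only record sends when running under tests:

```typescript
// Hypothetical email-assertion utility: in test mode, the sender records
// sends instead of hitting a real provider.
type SentEmail = { templateName: string; email: string };

const sentEmails: SentEmail[] = [];

function sendEmail(templateName: string, email: string): void {
  // Record the send; production code would call the email provider here.
  sentEmails.push({ templateName, email });
}

function expectSentEmailsToBe(expected: SentEmail[]): void {
  const actual = JSON.stringify(sentEmails);
  const wanted = JSON.stringify(expected);
  if (actual !== wanted) {
    throw new Error(`expected sent emails ${wanted}, got ${actual}`);
  }
}

// In a test:
sendEmail("my_email", "foo@bar.com");
expectSentEmailsToBe([{ templateName: "my_email", email: "foo@bar.com" }]);
```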
These shared utilities will make it easier to write and maintain our tests, so when I see `sinon.stub(model, "method", fn)`, it jumps out to me as a sign that a test might be hard to maintain for the long haul. It means that the test has hidden logic that won't be visible within source code, and it gives me an indication that the codebase is missing important testing utilities that will make it easier to write future tests.
For the purposes of this post, I'm talking about the stubbing pattern of mutating an object/class/export to replace one function with a different one. I have less of a problem with "stubs" that live in source code—for the purposes of this post, I'm calling those stubs "explicit stubs." ↩︎
You can use stubbing or mocking libraries for these manual stubs; my objection isn't about the libraries, but about the practice of not giving in-source indications that a reader needs to be aware that a particular function call will sometimes be stubbed. Sinon is one of the main JS libraries that people use for stubs/spies, and it seems like a well-written library. The docs are clear and there are appropriate functions for everything that I'd want a library like that to do. If you do use sinon, I'd recommend using `.callsFake` or `withArgs` to be explicit about which call you're replacing. That way, you won't unintentionally override a code path that you didn't intend to. ↩︎
I wrote up some notes about how we keep this style of testing fast in 18,957 tests in under 6 minutes: ClassDojo's approach to backend testing. ↩︎
As an aside, I strongly recommend against boolean feature toggles because as soon as you start wanting to run experiments, you need to be able to distinguish between a minimum of three values: test, control, and not-in-audience/not-turned-on-yet. ↩︎
In general, I think optional options are a huge anti-pattern. They make it too easy for a user of a function to make incorrect assumptions about the behavior of the function they're calling because they haven't looked through all of the options. ↩︎