If you're writing a CRUD app and mocking your database calls instead of just starting an actual Postgres instance before running the tests, you're probably using mocking wrong.
If you're writing a custom frontend for GitHub using the GitHub API and don't bother writing a decent set of mocks for how you expect the GitHub API to behave, your app will quickly require either full manual QA at best or become untestable at worst. Some APIs are very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.
> If you're writing a custom frontend for GitHub using the GitHub API and don't bother writing a decent set of mocks for how you expect the GitHub API to behave, your app will quickly require either full manual QA at best or become untestable at worst. Some APIs are very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.
I've been doing E2E testing using 3rd-party APIs for a decade now, and this has yet to be a significant problem. The majority of my APIs had a dedicated sandbox environment to avoid "rate limiting, bans, and other anti-abuse mechanisms". The remainder were simple enough that the provider didn't care about users exploring on the live API, and were usually read-only as well.
Did I run into the occasional flaky failure, or API stability issues? Sure. But it was very rare and easy to work around. It never devolved into becoming "untestable" or requiring "full manual QA".
My other teams that relied on mocks suffered from far worse problems - a ton of time spent on manual QA, and bugs that leaked into production because of mock-reality mismatches.
There are plenty of libraries out there, like VCR, that can record the real responses during a test run and replay them in future runs. You don't really have to renew the recordings that often either.
That was always the go-to for me when testing against 3rd party services, especially because the tests would then survive the offboarding of the engineer who set them up with their personal credentials.
If your test suite relies on live GitHub PATs or user-specific OAuth access tokens, then you can either figure out how to manage some kind of service account with a 'bot' user, or live with things breaking every time someone leaves the org.
Services that incur a per-request charge, or consume account credits, are another problem. Especially if they don't have sandboxes.
I have a custom frontend for GitHub using the GitHub API (https://github.com/fastai/ghapi/) and don't use mocks - I test using the real API. I've had very occasional failures, but not enough to ever cause any real issues.
I don't find mocks for this kind of thing very helpful, because what you're really testing for are things like how an API changes over time -- you need real API calls to see this.
Yeah, even if there's no sandbox mode, a separate sandbox account will usually do. Sometimes this catches misuse that would've caused rate-limiting in prod. And if a service makes this hard, maybe you shouldn't use it in prod either.
When you write tests with mocks you almost always at some point end up with tests that test your mocks lol, and tests that test that you wrote the tests you think you wrote -- not the software itself.
I’ve never been thrilled by tests that rely on mocking — it usually means you need to re-express your module interface boundary.
Mocks for me fall into the class of software I affectionately call “load-bearing paint.” It’s basically universally the wrong tool for any given job but that really doesn’t stop people. Putting in a data class or similar model object and a delegate is usually sufficient and a much better tool.
> It’s basically universally the wrong tool for any given job but that really doesn’t stop people.
I find mocks useful for testing conditions that are on the periphery and would be a decent amount of trouble to set up. For instance, if I have a REST controller that has a catch all for exceptions that maps everything to a 500 response, I want a test that will cause the DAO layer to throw an exception and test that the rest of the "real stack" will do the translation correctly. A mock is the easiest way to accomplish that.
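A minimal sketch of that pattern in Python, assuming a Flask stack - the `UserDao`, route, and error handler here are made-up stand-ins, not anything from a real codebase:

from unittest.mock import patch
from flask import Flask, jsonify

app = Flask(__name__)

class UserDao:
    def get_user(self, user_id):
        raise NotImplementedError  # the real one would talk to the database

dao = UserDao()

@app.errorhandler(Exception)
def handle_unexpected(exc):
    # catch-all: translate any uncaught exception into a 500 response
    return jsonify(error="internal error"), 500

@app.get("/users/<int:user_id>")
def get_user(user_id):
    return jsonify(dao.get_user(user_id))

def test_dao_failure_maps_to_500():
    # force the DAO layer to blow up; everything above it stays real
    with patch.object(UserDao, "get_user", side_effect=RuntimeError("boom")):
        response = app.test_client().get("/users/1")
    assert response.status_code == 500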
I agree. I will mock 3rd party APIs sometimes so I can test that my system correctly handles failures. For example, what if I get a 500 response from the API? With my mock I can easily make that happen. If I was using the actual API, I would have no way of forcing a 500 to happen.
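For HTTP specifically, a library like `responses` makes this easy - a rough sketch, with a made-up endpoint and client function:

import requests
import responses

def fetch_widget(widget_id):
    # hypothetical client code under test
    resp = requests.get(f"https://api.example.com/widgets/{widget_id}")
    if resp.status_code >= 500:
        return None  # or raise a domain-specific error
    return resp.json()

@responses.activate
def test_server_error_is_handled():
    # register a canned 500 so the client code's failure path actually runs
    responses.add(
        responses.GET,
        "https://api.example.com/widgets/42",
        json={"error": "internal"},
        status=500,
    )
    assert fetch_widget(42) is None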
I agree that if you need to write mocks, it's likely that your interfaces are poorly defined. This is one of the claimed benefits of test driven development - writing the tests first forces you to design the code in a way that cleanly separates modules so they can be tested.
That's… not true? No matter how you define your dependencies to inject, if you want to mock the dependencies you inject you have to mock them (it's almost tautological), whether you use dependency inversion or not.
Maybe you mean "less surface to mock", which is irrelevant if you generate your mocks automatically from the interface
> We can say that a Mock is a kind of spy, a spy is a kind of stub, and a stub is a kind of dummy. But a fake isn’t a kind of any of them. It’s a completely different kind of test double.
If you want. Replace mock with fake in my post and you get the same thing. It means I'm using fakes too when I'm not doing dependency inversion, so no mocks either.
For what it’s worth, I find the distinction between mocks, fakes, spies, stubs, dummies and whatever completely useless in practice; the whole point is to control some data flows so you can test in isolation, and that's really all that matters.
Funny thing, this kind of bikeshedding comes from Bob Martin, who famously has nothing worthwhile to show as an example of his actual work. Every time I read something from him, I get more and more convinced he's a total fraud.
If words don't mean anything then of course there is no difference. You might as well program in brainf*ck -- hey, it is Turing complete, so same diff.
Personally, I like seeing a "dummy" name passed into a function and understanding just from the name that no methods are expected to be called on it during the test. Or seeing "fake_something" and understanding that a manual (perhaps test-specific) implementation is used. It is a minor point but such chunking is universally useful (there is enough stuff to keep track of).
Re. postgres, this is actually something I have always struggled with, so would love to learn how others do it.
I’ve only ever worked in very small teams, where we didn’t really have the resources to maintain nice developer experiences and testing infrastructure. Even just maintaining representative testing data to seed a test DB as schemas (rapidly) evolve has been hard.
So how do you
- operate this? Do you spin up a new postgres DB for each unit test?
- maintain this, eg have good, representative testing data lying around?
Docker Compose is a super easy way to run Postgres, Redis, etc. alongside your tests, and most CI platforms can either use a Compose file directly or have a similar way of running service containers alongside your tests. Example: https://docs.github.com/en/actions/using-containerized-servi...
Typically you'd keep the database container itself alive, and you would run the schema migrations once at startup. Then your test runner would apply fixtures for each test class, which should set up and tear down any data they need to run or that they create while running. Restarting the database server between each test can be very slow.
The test data is a harder problem to solve. For unit tests, you should probably be creating specific test data for each "unit" and cleaning up in between each test using whatever "fixture" mechanism your test runner supports. However, this can get really nasty if there's a lot of dependencies between your tables. (That in and of itself may be a sign of something wrong, but sometimes you can't avoid it or prioritize changing it.)
You can attempt to anonymize production data, but obviously that can go very wrong. You can also try to create some data by using the app in a dev environment and then use a dump of that database in your tests. However, that's going to be very fragile, and if you really need hundreds of tables to be populated to run one test, you've got some big problems to fix.
Property-based testing is an interesting alternative, where you basically generate random data subject to some constraints and run your tests repeatedly until you've covered a representative subset of the range of possible values. But this can be complicated to set up, and if your tests aren't fast, the full run can take a very long time.
I think at the end of the day, the best thing you can do is decouple the components of your application as much as possible so you can test each one without needing giant, complicated test data.
We use TestContainers for this, and it's superb. It's a full instance of the DB, started for each unit test, running inside a docker container. TC does smart things to make sure it doesn't slow the suite too much.
We have the same strategy for testing against Kafka, etc.
Where we care about data, we seed the db with data for a specific group of tests. Otherwise, we just nuke the db between each test.
Prior to doing this, we'd use an in-memory db for tests and a real db at runtime, using JPA / Hibernate to make things transferable. But this was leaky, and some things would pass in tests then fail at runtime (or vice versa).
TestContainers has been so much better, as we're running against a real version of the database, so much smaller chance of test and runtime diverging.
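For reference, the Python equivalent of this looks roughly like the sketch below, using the testcontainers and SQLAlchemy packages (a sketch, not the poster's actual Java setup):

from sqlalchemy import create_engine, text
from testcontainers.postgres import PostgresContainer

def test_round_trip():
    # throwaway Postgres in Docker, torn down when the block exits
    with PostgresContainer("postgres:16") as pg:
        engine = create_engine(pg.get_connection_url())
        with engine.begin() as conn:
            conn.execute(text("CREATE TABLE users (id serial PRIMARY KEY, name text)"))
            conn.execute(text("INSERT INTO users (name) VALUES ('alice')"))
            name = conn.execute(text("SELECT name FROM users")).scalar_one()
        assert name == "alice"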
TestContainers, or just assume there is a postgres running locally.
> - maintain this, eg have good, representative testing data lying around?
This can be tricky, but usually my advice is to never even begin trying to write seed data in the database unless it's very static. It just gets annoying to maintain and will often break. Try to work out a clean way to set up state in your tests using code, and do not rely on magic auto increment ids. One of the more effective approaches I have found is to, for example, have every test create a fresh customer, then the test does work on that customer. Avoid tests assuming that the first object you create will get id == 1; that makes them very annoying to maintain.
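A sketch of that idea (names are illustrative; `conn` is assumed to be a test fixture, and `place_order`/`list_orders` stand in for the code under test):

from sqlalchemy import text
import uuid

def create_customer(conn, name=None):
    # every test creates its own customer and only uses the id it got back -
    # never assume the first row gets id == 1
    name = name or f"customer-{uuid.uuid4().hex[:8]}"
    return conn.execute(
        text("INSERT INTO customers (name) VALUES (:name) RETURNING id"),
        {"name": name},
    ).scalar_one()

def test_orders_are_scoped_to_customer(conn):
    customer_id = create_customer(conn)
    other_id = create_customer(conn)
    place_order(conn, customer_id, amount=10)   # hypothetical code under test
    assert list_orders(conn, other_id) == []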
> operate this? Do you spin up a new postgres DB for each unit test?
Generally I've seen a new database (schema in other dbs?) in postgres that is for testing, i.e "development_test" vs "development". The big thing is to wrap each of your tests in a transaction which gets rolled back after each test.
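With pytest and SQLAlchemy, the rollback-per-test wrapper can be as small as this (a sketch; the `development_test` database name is just the convention mentioned above):

import pytest
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://localhost/development_test")

@pytest.fixture
def db():
    # open a connection, start a transaction, hand it to the test,
    # then roll everything back so the next test sees a clean database
    with engine.connect() as conn:
        tx = conn.begin()
        try:
            yield conn
        finally:
            tx.rollback()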
> maintain this, eg have good, representative testing data lying around
This is much harder. Maintaining good seed data - data that covers all the edge cases - is a large amount of work. It's generally easier to leave it up to each test to setup data specific to their test case, generalizing that data when possible (i.e if you're testing login endpoints, you have all your login test cases inherit from some logic specific data setup, and they can tweak as needed from there). You will end up with duplicated test setup logic. It's not that bad, and often you don't really want to DRY this data anyways.
That being said, if you have the time and resources to maintain seed data it's absolutely a better way to go about it. It's also beneficial outside of tests.
if random users have creds to touch the prod database at all, much less delete data / drop tables, you had a big problem before you were running tests.
You've made multiple assumptions here that couldn't be further from reality. You don't have to use production for testing or share test credentials between environments for automated tests to exploit an unintended environment, you just have to forget to clean up a `.env` file after testing a hotfix.
... is that good? Hell no. But it's a much more common version of reality than you're assuming is happening.
What you’re saying is tautological: your assumption is that tests were running in production, and your evidence is that somebody accidentally ran tests in production. It reads a bit like “Oh my, an accident? Why’ve you done that.”
Anyway, the missed point is that you can’t just do anything in tests and just expect the best of intentions to ensure it doesn’t backfire. One must also consider the security situation, infrastructure, secrets management, shared environments, and more. It’s not as simple as just plopping down a test database and expecting everything to go smoothly. You wouldn’t be that careless with other things, so don’t do it with tests, and don’t rely on “don’t run tests in production” as your only safeguard.
Do not delete development_test in your tests; it's supposed to be stable on your machine.
But, the one important thing is, do not give people direct access to production. And for the few that must have it, it should not be easy to connect to it.
I have tried various approaches and here's what worked best, assuming that there is some natural way to partition most of the data (e.g. per account):
1. Init the DB with some "default" data - configuration, lookup tables, etc
2. Each test in the test suite owns its data. It creates a new account and inserts new records only for that account. It can for example create users on this account, new entities, etc. It can run multiple transactions, can do rollbacks if needed. It is important to only touch the account(s) created by the test and to avoid touching the initial configuration. There's no need to clean up the data after the test finishes. These tests can run concurrently.
3. Create a separate integration test suite which runs sequentially and can do anything with the database. Running sequentially means that these tests can do anything - e.g. test cross-account functionality, global config changes or data migrations. In practice there aren't that many of those, most tests can be scoped to an account. These tests have to clean up after themselves so the next one starts in a good state.
Other approaches had tons of issues. For example, if each test is wrapped in a transaction which is later rolled back, then testing is very limited - tests cannot use transactions of their own. Savepoints have a similar issue.
At several places I worked at, we would snapshot the production DB, and use that for testing. You cannot get more "real-world" than that. We would also record real requests, and replay them (optionally at increased speed) for load testing.
Obviously, there are some caveats, e.g.:
* While this approach works perfectly for some tests (load testing, performance testing, …), it does not work for others (e.g. unit testing).
* You have to be careful about PII, and sanitize your data.
I run a replicated copy of the production database on top of zfs and snapshot it before starting tests. PostgreSQL takes a few seconds to start on the snapshot and then you're off to the races with real production data. When the test suite finishes, the snapshot is discarded. This also ensures that migrations apply correctly to the production db before an actual prod is used.
I feel that trying to maintain "representative testing data" is generally not a good idea; set up the data you want/need in the test instead.
Just run PostgreSQL on your local machine, connect to that, and set up a new schema for every test (fairly cheap-ish) inside a test database.
def Test1():
    setupdb()
    obj1 = createObj1()
    obj2 = createObj2()
    have = doStuff(obj1, obj2)
    if have != want: ...

def Test2():
    setupdb()
    obj = createObj1()
    have = doOtherStuff(obj)
    if have != want: ...
Creating reasonably scoped reasonably contained "unit-y tests" like this means you will actually be able to understand what is going on. Too often have I seen people set up huge wads of "mock data" and then run all their tests on this. Then Test1 does something Test2 doesn't expect and you're screwed. Or worse: Test42 does something that screws Test185. Good luck with that. Or you introduce a regression somewhere and now you've got tons of data to understand.
2. Your creation functions are well tested so that the rest of your tests can rely on them.
If you have spotty coverage or just poorly defined creation semantics, or it's a bunch of calls to functions all over the place just to set up your test data, then this doesn't work.
But the solution typically isn't "write a bunch of JSON mock test data", it's to solve those problems.
The ideal experience is that you anonymize prod and sync it locally. Whether it's for testing or debugging, it's the only way to get representative data.
When you write mock data, you almost always write "happy path" data that usually just works. But prod data is messy and chaotic which is really hard to replicate manually.
This is actually exactly what we do at Neosync (https://github.com/nucleuscloud/neosync). We help you anonymize your prod data and then sync it across environments. You can also generate synthetic data as well. We take care of all of the orchestration. And Neosync is open source.
I don't understand why everyone doesn't just do this unless they are working with really large volumes of test data. It literally takes a fraction of a second to mkdir, call initdb, and open a postgres socket.
idk if you've solved this, but PG doesn't like to bind to port 0, so you have to manage ports. And I've had issues with processes sticking around if the test driver has crashed (I don't currently, but I'm turning off setsid in postgres).
Great answers below (test containers for example).
However, it’s not always possible.
For example:
- you use oracle db (takes minutes to start, license, hope the containers run on ARM fine, etc.)
- sometimes an in memory fake is just much faster, and can be an official db on its own for people to try the product
- your storage might be only available through a library by a third party provider that is not available locally.
I've been on teams where we've done this (very successfully in my opinion!) by creating helper code that automates creating a separate Postgres schema for each test, running all migrations, then running your test function before tearing it all down again. This all runs on CI/CD and developer machines, no credentials to any actual environments.
A major benefit of doing separate schemas for each test is that you can run them in parallel. In my experience, unless you have a metric ton of migrations to run for each test, the fact that your database tests can now run in parallel makes up (by a lot!) for the time you have to spend running the migrations for each test.
EDIT: usually we also make utilities to generate entities with random values, so that it's easy to make a test that e.g. tests that when you search for 5 entities among a set of 50, you only get the 5 that you know happen to match the search criteria.
Running all migrations before every test can take you a surprisingly long way.
Once that gets a bit too slow, running migrations once before every suite and then deleting all data before each test works really well. It's pretty easy to make the deleting dynamic by querying the names of all tables and constructing one statement to clear the data, which avoids referential integrity issues. Surprisingly, `TRUNCATE` is measurably slower than `DELETE FROM`.
Another nice touch is that turning off `fsync` in postgres makes it noticeably faster, while maintaining all transactional semantics.
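The dynamic clearing can be done by asking Postgres for the table names and issuing a single statement. A sketch, assuming a SQLAlchemy connection and a `public` schema (this uses `TRUNCATE` of all tables at once, which sidesteps foreign-key ordering; the same name list can drive per-table `DELETE FROM` statements instead, which the comment above found faster on small test datasets):

from sqlalchemy import text

def clear_all_data(conn):
    # every table in the schema, skipping the migrations bookkeeping table
    tables = conn.execute(text(
        "SELECT tablename FROM pg_tables "
        "WHERE schemaname = 'public' AND tablename <> 'schema_migrations'"
    )).scalars().all()
    if tables:
        # clearing them all in one statement keeps foreign keys happy
        conn.execute(text("TRUNCATE " + ", ".join(tables) + " CASCADE"))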
My integration tests expect the db to run. If I need fixture data, those are sql and read in at the start of the suite. Each test uses its own temp db/tables and/or clears potentially old data before running.
Firstly, I'm seeing all these answers that say spin up a new server, and I have to wonder "WTF?"
No need to spin up a new server, not in a container, not in a new directory, not at all. It's pointless busywork with too many extra points of failure.
Nothing is stopping you using an existing server and creating a new DB, which takes about 1/100th the time that starting up a new server (whether in Docker or otherwise) takes.
Secondly, I don't actually do unit-testing on the database layer - there's little point in it. I test workflows against the database, not units!
What I do is create multiple 'packages' of tests, each with multiple tests. A single 'package' creates a temp db, runs its tests sequentially and then drops the temp db. Each package will setup itself with SQL statements.
This lets the tests perform tests of actual workflows, instead of testing in isolation that an object can be (de)serialised. IOW, I can test that the sequence of `addUser(); setProfilePassword(); signIn(); viewInfo(); signOut();` work as expected, and that `removeUser(); signIn();` fail with the correct error.
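As a sketch, one such package might look like this - every function name here is illustrative, including the temp-db helpers and the AuthError type:

import pytest

# one temp database per package of workflow tests
def setup_module():
    create_temp_db("app_workflow_test")     # hypothetical helper: create the package's temp db
    load_sql_fixtures("base_schema.sql")    # hypothetical helper: set it up with plain SQL

def teardown_module():
    drop_temp_db("app_workflow_test")

def test_signup_signin_workflow():
    addUser("alice")
    setProfilePassword("alice", "s3cret")
    assert signIn("alice", "s3cret")
    assert viewInfo("alice")["name"] == "alice"
    signOut("alice")

def test_removed_user_cannot_sign_in():
    addUser("bob")
    removeUser("bob")
    with pytest.raises(AuthError):
        signIn("bob", "whatever")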
The way we did this was basically to separate readonly and read/write tests. All the readonly tests would use the same instance with seeded data in parallel, and the read/write tests would get their own databases per test.
In my tests spinning up a PG instance (ultimately just an `initdb` and `createdb` invocation, loading a schema and test data (`psql`), running the test, and tearing down the PG instance is quite fast.
> So how do you
> ...
> - maintain this, eg have good, representative testing data lying around?
This one can be very easy, depending on the kind of data you're working with. Many places simply dump a part (or the whole, if it's not too big) of the production DB into dev and pre-prod environments.
Now if there is sensitive, non-encrypted data that even the devs cannot see, then it can get tricky (but then arguably they cannot see the logs in the clear either, etc.).
But yeah: a recent dump of the prod DB is good, representative data.
I've worked at places where pre-prod had a daily dump of the prod DB. Simple.
> If you're writing a CRUD app and mocking your database calls instead of just starting an actual Postgres instance before running the tests,
Actually that's wrong too. The production database will be different than the "testing Postgres instance", leading to bugs.
It turns out that whatever testing solution you use, if it's not the actual production instance and you're not using real production data, there will be bugs. Even then there's still bugs.
This is the simple truth: you can't catch all the bugs. Just put in Good Enough testing for what you're doing and what you need, and get on with life. Otherwise you will spend 99% of your time just on testing.
> The production database will be different than the "testing Postgres instance", leading to bugs.
It never happened to me, to be honest. This reads like an argument for "if you can't do perfect, just do it badly", but it's nonsense. Running tests against a local Postgres instance with the same major.minor version and same extensions as your prod instance WILL work.
And testing your storage layer against the database is probably the most reliable safety net you can add to an app.
> Running tests against a local Postgres instance with the same major.minor version and same extensions as your prod instance WILL work.
A team I worked with recently said the same thing. But, as I predicted, they ran into bugs because the CloudSQL Postgres was different than their Dockerized Postgres, even though it was the same core version.
There will always be testing problems you can't anticipate. Especially with systems that are not your own code. Just be ready to adapt your testing when it doesn't work as expected, and don't invest too much in the testing if it's not worth it.
I’ve been ready for 5 years as it’s the duration I’ve been testing my storage layers with 90% integration tests (the 10% being the hard to reproduce error cases, these tests have low value but are easy to test with mocks so I still test them). The only issue I’ve encountered was with time zones (shocking), and it made me ensure I got full control of the time zones in my app, in my deployment, in my local Postgres and in my prod Postgres, so net benefit in the end.
I don't remember them all off the top of my head, but I do remember:
- Access to different parts of the database are limited in CloudSQL since it's a managed database. This makes some features of automated tooling (like migrations) not work on CloudSQL (i'm not saying migrations don't work, i'm saying some features do work, some don't). Sometimes elevated permissions can fix it, but some aspects are just walled off.
- There are schema differences from the stock Postgres (I don't remember specifics). No support for custom tablespaces.
- Import/export can often lock up a CloudSQL instance, whereas it might work fine on a local instance.
- You only get Read Committed transaction isolation.
- Operation is different (load, logging, replication, nodes, etc) and this affects performance, and performance can lead to bugs if there's an expectation of a certain run time or performance that doesn't match up with the development experience. Often times some job will have run faster on a laptop than in the cloud, and that leads to weird corner cases where production has an issue due to weird performance and it has to be triaged in the cloud and a fix applied to avoid it. Performance issues don't sound like a bug, but if you have to change your code to make it work in prod, it's a bug.
- To access CloudSQL they want you to use a proxy app, and that can have inconsistent results compared to a dev connecting directly to a Postgres instance. Even something as simple as handling reconnects is often not considered when all you have is a local instance that never needs reconnecting.
- There's a limited selection of Postgres extensions supported. And I could be wrong but I think the version of the extensions is pinned for each version of Postgres core used in CloudSQL.
To all of this you might reply "well none of that affects me", and that's fine... but it does affect other people, and that's important to note when you're telling people on the internet there will be no problems.
One of the nice things about the .NET ORM EntityFramework is that you can swap a mocked in-memory database for your prod DB with dependency injection, so without modifying your code at all and theoretically without affecting the behavior of the ORM. Which is to say, you're right, it's about using the right tools. Those tools of course vary by ecosystem and so in some cases mocking the database is in fact the correct decision.
Probably the single most obnoxious production defect I ever found related to a database would never have made it into production if we had been using a real database instead of a test double. It happened because the test double failed to replicate a key detail in the database's transaction isolation rules.
After figuring it out, I swapped us over to running all the tests that hit the database against the real database, in a testcontainer, with a RAM disk for minimizing query latency. It was about a day's worth of work, and turned up a few other bugs that hadn't bitten us in production yet, too - all of them also sailing past our test suite because the test double failed to accurately replicate the behavior in question.
Total time to run CI went up by about 10 seconds. (For local development you could chop that way down by not starting a fresh server instance for every test run.) Given how many person-hours we spent on diagnosing, resolving, and cleaning up after just that first defect, I estimated the nominally slower non-mocked tests are still a net time saver if amortized over anything less than about 50,000 CI runs, and even then we should probably only count the ones where an engineer is actually blocking on waiting for the tests to complete.
That said, there was a time when I thought test doubles for databases was the most practical option because testing against real databases while maintaining test isolation was an unholy PITA. But that time was 5 or 6 years ago, before I had really learned how to use Docker properly.
I simply don't think that I will ever be able to come up with anything even vaguely as comprehensive as the test coverage that Microsoft already has for ensuring their ORM behaves consistently across database providers. In my over 10 years of using EF, I have never once encountered a database bug like you describe. If I were to discover such a bug (which I'll admit does occasionally happen even though it hasn't happened to me), it would be easier and better by far to submit an issue to the EF team and let them figure out a fix (including the appropriate tests) than it would be to rework my own test infrastructure. I am not in the business of developing requirements or code for databases, and building an elaborate test model for what I consider the essential requirements for a database would be a distraction from developing code that is more valuable to the business.
The same logic does not apply to all ORMs, of course, which do not all benefit from the same quality of professional dev support that EF receives from MS. But that's my main point from above: the correct design decision depends on the context. For services written in other languages with other ORMs or raw SQL, I absolutely will spin up a full Postgres test container because it is indeed trivial (have one running in the background on my laptop right now in fact). It just isn't necessary in the specific context of EntityFramework code.
I can think of a few things that will upset this particular apple cart, chief amongst them is the behaviour of different databases and sorting / collation which might not be generally categorised as the kind of bug a test suite will uncover, but certainly creates production bugs / issues.
I love EntityFramework, it's easily the best ORM I have ever used but it has a few cross-platform footguns that require testing against the actual database service you're using.
If you need to test collation or some database implementation detail, that is the exception where it may be good to use a real database but certainly is not the rule.
Using an in-memory database does not increase my confidence in my software either. I also started using dockerised dependencies in tests a couple of years ago.
Can you please explain what you did with a RAM disk to speed them up?
Configure the container with a tmpfs volume (https://docs.docker.com/storage/tmpfs/), and configure the database server to store all its files in that volume's directory.
To test against a real 3rd-party HTTP API from time to time, and to generate mocks automatically for the same tests, you could use VCR: https://pypi.org/project/vcrpy/
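Basic vcrpy usage looks roughly like this (the GitHub endpoint is just an example): the first run hits the real API and records a cassette file; later runs replay it without touching the network.

import requests
import vcr

@vcr.use_cassette("fixtures/github_user.yaml")
def test_fetch_user():
    # recorded on the first run, replayed from the cassette afterwards
    resp = requests.get("https://api.github.com/users/octocat")
    assert resp.status_code == 200
    assert resp.json()["login"] == "octocat"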
> Some APIs are very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.
Those rate limits, bans, and other anti-abuse mechanisms are things that would be good to uncover and account for during tests. Better for the test suite to detect those potential failures than the production deployment :)
And if you have to mock, at least try to have somebody else write the mock. Testing your understanding of GitHub's API against your understanding of GitHub's API isn't useful. Testing your interpretation of the API behavior against somebody else's interpretation provides a lot more value, even if it isn't nearly as good as testing against the actual API.
Clearly mocking the DB is a footgun, and it's not that hard to set up E2E tests.
Use TestContainers or Docker on a random port, and run your API on a random port.
Every test seeds all the data it needs to run (user, org, token). It requires an initial setup, but then you just reuse it everywhere, and voila: no side effects, no mocks to maintain, and it also tests your auth and permissions, almost 1:1 with prod.
Tests are a tool for you, the developer. They have good effects for other people, but developers are the people that directly interact with them. When something fails, it's a developer that has to figure out what change they wrote introduced a regression. They're just tools, not some magic incantation that protects you from bugs.
I think the author might be conflating good tests with good enough tests. If IOService is handled by a different team, I expect them to assure IOService behaves how it should, probably using tests. The reason we're mocking IOService is because it's a variable that I can remove, that makes the errors I get from a test run MUCH easier to read. We're just looking at the logic in one module/class/method/function. It's less conceptually good to mock things in tests, since I'm not testing the entire app that we actually ship, but full no-mocks E2E tests are harder to write and interpret when something goes wrong. I think that makes them a less useful tool.
The thing I do agree on is that mocks shouldn't only model the happy path. I'd say if something can throw an exception, you should at least include that in a mock (as a stubbed method that always throws), but making the burden of reimplementing your dependencies mandatory, or relying on them in tests, is going to mean you write fewer tests and get worse failure messages.
This 100%. I'm not sure how the author managed to create consistent failure cases using real service dependencies, but in my code I find mocks to be the easiest way to test error scenarios.
With I/O in general, I've observed that socket, protocol, and serialization logic are often tightly coupled.
If they're decoupled, there's no need to mock protocol or serialization.
There's some cliché wrt "don't call me, I'll call you" as advice for how to flip the call stack. The gist is to avoid nested calls, flattening the code paths. Less like a Russian doll, more like Lego instructions.
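Roughly, the shape is something like this (all names are made up; the point is only where the IO call sits):

# Nested ("Russian doll"): the logic reaches down into IO itself,
# so testing it drags the database along (or a mock of it).
def report_nested(user_id):
    user = fetch_user_from_db(user_id)   # hypothetical IO call, buried inside
    return f"{user['name']}: {len(user['orders'])} orders"

# Flattened ("Lego instructions"): the caller does the IO once, up top,
# and the logic is a pure function over plain data - no mock needed.
def report(user):
    return f"{user['name']}: {len(user['orders'])} orders"

def main(user_id):
    user = fetch_user_from_db(user_id)
    print(report(user))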
In defense of mocks, IoC frameworks like Spring pretty much necessitate doing the wrong thing.
> E2E tests are harder to write and interpret when something goes wrong.
If the test is hard to debug when it goes wrong, then I assume the system is hard to debug when something goes wrong. Investing in making that debugging easier unlocks more productivity. Of course it depends on how often bugs show up, how often the system changes, the risks of system failure on the business, etc.; the productivity boost of a debuggable system may not always be worth the cost. In my cases, it usually is worth it.
I think it's always going to be harder to debug 1 thing, versus everything, regardless of how a system is built. If you're not mocking anything, then anything could have gone wrong anywhere.
But also, if you're able to fix things effectively from E2E test results due to a focus on debug-ability, then that's great! I think it's just the framing of the article I have trouble with. It's not an all or nothing thing. It's whatever effectively helps the devs involved understand and fix regressions. I haven't seen a case where going all in on E2E tests has made that easier, but I haven't worked everywhere!
I don't think mocking is an anti-pattern. Using only unit tests and then mocking everything probably is.
Mocks have a perfectly viable place in testing. They help establish boundaries and avoid side effects that are not pertinent to the logic being tested.
I would reference the testing pyramid when thinking about where to be spending time in unit tests vs. integration tests vs. end to end tests. What introduces risk is if we're mocking behaviors that aren't being tested further up the pyramid.
I like the testing pyramid specifically because it captures the tradeoffs between the different kinds of tests. Mocks can come in handy, but like anything else can be abused. We need a "Mock this, not that" kind of guide.
I used to love mocks, once upon a time. Nowadays, though, I internally sigh when I see them.
I've come to the opinion that test doubles of any kind should be used as a last resort. They're a very useful tool for hacking testability into legacy code that's not particularly testable. But in a newer codebase they should be treated as a code smell. Code that needs mocks to be tested tends to be code that is overly stateful (read: temporally coupled), or that doesn't obey the Law of Demeter, or that does a poor job of pushing I/O to the edge where it belongs. And those are all design elements that make code brittle in ways that mocking can't actually fix; it can only sweep it under the carpet.
At some point, you need to interact with something that looks like I/O or an external service. Not handling failures from them is a source of a lot of bugs.
Even if pushed to the periphery, how do you test the wrapper you built to hide these failures from the rest of your code base? If you don’t hide these failures in some wrapper, how do you test that your system handles them properly?
I think the answer, like most things, is "it depends". Specifically it depends on the complexity of the thing you're mocking. Mocking a database is a bad idea because there's a ton of complexity incumbent in Postgres that your mocks are masking, so a test that mocks a database isn't actually giving you much confidence that your thing works, but if your interface is a "FooStore" (even one that is backed by a database), you can probably mock that just fine so long as your concrete implementation has "unit tests" with the database in the loop.
Additionally, mocking/faking is often the only way to simulate error conditions. If you are testing a client that calls to a remote service, you will have to handle I/O errors or unexpected responses, and that requires mocking or faking the remote service (or rather, the client side transport stack).
But yeah, I definitely think mocks should be used judiciously, and I _really_ think monkeypatch-based mocking is a travesty (one of the best parts about testing is that it pushes you toward writing maintainable, composable code, and monkey patching removes that incentive--it's also just a lot harder to do correctly).
Fully agree with you; mocks are a shoehorn to fit unit tests into stateful monoliths with messed-up dependencies that cross several stacks and are reused in multiple modules.
With better separation of concerns, and separation of compute from IO, one should not need mocks.
Seconded. I shudder when I think back to the "test driven development" days when I wrote so much throwaway test code. Later, you try to refactor the app and it's another 50% of effort to update all the tests. The solution is to avoid it in the way you described.
Some of the advice is good, like decoupling I/O and logic where that makes sense. But the general idea of mocking being an anti-pattern is overreach.
This kind of thinking is overly rigid/idealistic:
> And with Postgres you can easily copy a test database with a random name from a template for each test. So there is your easy setup.
> You need to test reality. Instead of mocking, invest in end-to-end (E2E) testing.
"Easily" is like "just." The ease or difficulty is relative to skill, time, team size, infrastructure, and so on.
As for testing reality, sure. But there's also a place for unit tests and partial integration tests.
In some situations, mocking makes sense. In others, full E2E testing is better. Sometimes both might make sense in the same project. Use the right tool for the job.
I've worked with a lot of pro-mocking engineers, and the time they spent on mocks easily outstripped the time a good build engineer would have spent creating a fast reusable test framework using real databases/dummy services/etc. The mocks won not because they were better or more efficient, but because of lack of deeper engineering skill and cargo culting.
If I didn't have the team to properly do a good test build I would tell all my engineers to design their code to not need mocks if at all possible and call it a day at 85% coverage. That's very doable with dependency injection and modular function libraries. The time spent not chasing full coverage could be better spent improving CI/integration testing/deployment.
It depends. The type of project affects the decision too.
If it's a small, standalone CRUD app on a SQLite database, mocks would probably be a bad option, sure.
On the other hand, it could be an integration platform that integrates with many third-party services. Some of them may not have test environments. Or some of the integrations may be written by third-party contractors, and we can't expose service credentials because of poor permissions granularity. Mocks are a good option there.
That engineer may have spent a couple of dozen hours on their mock. But the engineers who spent time on a test framework that uses real databases will soak up thousands of developer hours in CI time over the next decade.
Spinning up DB instances is a lot faster now than it used to be. There are even modules for in-memory instances of certain databases. The speed gap between a unit test and one that uses an actual database is now small enough that using a real database is a realistic option.
That being said, of course, "it depends" on your use case. But I've found setting up this sort of test environment quite a bit easier now than writing database mocks, a lot less time-and-maintenance intensive, and relatively quick to run in any environment.
(Also, in a decade, I'm pretty confident this gap will get even smaller, while the time to maintain mocks will stay constant)
Each mock needs to be maintained and sanity-checked against the latest behavior of the actual dependency. And CI costs hardware, not developer time, as long as the developer has something else to work on.
The only challenge I have encountered with Postgres template [0] databases over the years is getting the test framework to come up with a random name for the database and then inject that random name into the connection URL. Always found a solution though.
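The template trick itself is one statement; the fiddly part is wiring the random name back into the connection URL. A sketch with psycopg2 - the DSN and template name are assumptions:

import uuid
import psycopg2

def create_test_db(admin_dsn="dbname=postgres", template="myapp_template"):
    # copy the template under a random name and hand the name back so the
    # test framework can splice it into the connection URL
    name = f"test_{uuid.uuid4().hex[:12]}"
    conn = psycopg2.connect(admin_dsn)
    conn.autocommit = True   # CREATE DATABASE refuses to run inside a transaction
    with conn.cursor() as cur:
        cur.execute(f'CREATE DATABASE "{name}" TEMPLATE "{template}"')
    conn.close()
    return name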
For some reason this article gives me flashbacks to the new CTO who comes in and declares 'micro-services' or 'containers' as the perfect solution for some problem that no one has actually run into. The article's author has had their pain points, but it doesn't mean all mocking is bad everywhere in every use case.
I wrote some code recently that detects cycle errors in objects with inheritance and I mocked the DB calls.
- Did I test for DB failures? No, but that's not the goal of the tests.
- Could I have refactored the code to not rely on DB calls? Yes, but every refactor risks the introduction of more bugs.
- Could I have launched a temporary DB instance and used that instead? Yes, but there's no obvious reason that would have been easier and cleaner than mocking DB calls.
In Python it wasn't hard to implement. It was the first time I'd used the mock library, so naturally there was learning overhead, but that's unavoidable - any solution would have learning overhead.
> Modelling the happy path is great for refactoring - even a necessity, but doesn’t help with finding bugs.
This is a common misconception (one that I also initially held). Unit tests aren't meant to find bugs, they're meant to protect against regressions, and in doing so, act as a documentation of how a component is supposed to behave in response to different input.
> Unit tests aren't meant to find bugs, they're meant to protect against regressions
That hasn't been the general consensus on unit tests for at least 30 years now. Regression tests are a small subset of tests, typically named for an ID in some bug tracker, and are about validating a fix. The majority of unit tests catch issues before a bug is even opened, and pretty much any random developer you talk to will consider that to be the point.
> Regression tests are a small subset of tests, typically named for an ID in some bug tracker, and are about validating a fix.
This is how I also tend to think of them, but it's not how the phrase is generally used. The general meaning of regression tests is to ensure known correct functionality doesn't break with a future change. There's no actual requirement that it be tied to a known bug.
They do not "find" bugs in the way that exploratory testing or user operation might (or even in the way that broader integration tests might), that is they don't find bugs that are not in the known problem space. But they are very good at proving a method works correctly and covers the known execution permutations.
1. Changes often require changing the functionality of a component, which means many of the current unit tests are bunk and need to be updated. Even changes that are pure refactoring and should retain the same behavior often require updating or rewriting the tests, which again often means significant refactoring of the existing tests.
2. Small isolated changes usually require testing everything which in a big org is very time consuming and slows down builds and deploys unnecessarily.
3. A lot of false confidence is instilled by passing unit tests. The tests passed, were good! Most of the production bugs I've seen are things you'd never catch in a unit test.
I really can't imagine a large refactor where we wouldn't end up rewriting all the tests. Integration tests are much better for that imo, "units" should be flexible.
Yes changing contracts implies updating tests. They should.
Refactoring under the same contract should not lead to refactoring of tests. Unless of course you introduce a new dependency you have to mock? That's just one example.
If your tests change a lot, it has nothing to do with tests being hard to change. It has to do with the code they test changing too often. Poor contracts, perhaps.
And just like the parent comment said: tests are not about finding or solving bugs, they are about regressions and making sure your contracts are correctly implemented.
If your refactoring includes changes to interfaces, different abstractions, logical changes, business logic, then most of your tests need to be effectively rewritten.
The only part where I see unit tests being useful for refactoring is making changes to the internals of a single unit. It's always been more trouble than it's worth for me.
In some cases it makes sense, like testing small units that are heavy in logic (a function that calculates order prices, for example, or scientific computing, etc.). But unit testing every single piece of code has always seemed dogmatic to me ("unit tests are good engineering, write unit tests always, everywhere"). Everything has tradeoffs, and as engineers I think our job is to understand the pros and cons and apply them effectively.
I think writing tests as a form of documentation is a waste of time. If I'm using a component I don't want to read unit tests to figure out what it should do.
Unit tests are most often used to cover a few more lines that need coverage. That's the value they provide.
A well designed API will generally allow users to understand usage without any additional documentation, sure. However, those who modify the API in the future will want to know every last detail that you knew when you were writing it originally. That must be documented to ensure that they don't get something wrong and break things – and for their general sanity. That is, unless you hate future developers for some reason.
You could do it in Word instead, I suppose, but if you write it in code then a computer can validate that the documentation you wrote is true. That brings tremendous value.
Nothing is left undefined. Absent of documentation, most likely something will end up defined by inference. Which is not a good place to be as a developer as you have lost the nuance that went into it originally.
You don’t change what is already defined (even if only by inference). Change is only by amendment. Will you successfully amend the changes without also changing what was previously defined if all you have is inference to go on? Probably not.
That’s assuming change is even necessary. Oftentimes you only need to modify the implementation, which doesn’t change what is defined. A change in implementation has no impact on the outside, at least as long as you have properly covered your bases, which should you should be able to do as long as you have proper documentation. Without documentation, good luck to you.
Unit testing proves correctness in regard to the test written (not necessarily the correctness of the application itself). They're similar in that they are both typically fast to run, and that they check an aspect of the program for correctness.
They typically can only prove correctness for specific input data, and then there’s often still some runtime or environment-dependent chance involved which may cause some fraction of the invocations to fail. Is it correct or not if a single invocation succeeds? How can you be sure?
Maybe I am missing something, but how else would I test various exception handling paths?
There is a whole world of errors that can occur during IO. What happens if I get a 500 from that web service call? How does my code handle a timeout? What if the file isn't found?
It is often only possible to simulate these scenarios using a mock or similar. These are also code paths you really want to understand.
Put a small data interface around your IO, have it return DATA | NOT_FOUND etc.
Then your tests don't need behavioral mocks or DI, they just need the different shapes of data and you test your own code instead of whatever your IO dependency is or some simulation thereof.
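A sketch of that shape in Python - the types and functions are invented for illustration; the IO call lives at the edge, and the logic under test only ever sees plain data:

from dataclasses import dataclass
from enum import Enum
from typing import Union

class NotFound(Enum):
    NOT_FOUND = "not_found"

@dataclass
class UserData:
    id: int
    name: str

FetchResult = Union[UserData, NotFound]    # what the IO layer hands back

def greeting(result: FetchResult) -> str:
    # pure logic under test: no behavioral mocks, just different shapes of data
    if isinstance(result, NotFound):
        return "Who are you?"
    return f"Hello, {result.name}!"

def test_greeting_handles_both_shapes():
    assert greeting(NotFound.NOT_FOUND) == "Who are you?"
    assert greeting(UserData(id=7, name="Ada")) == "Hello, Ada!"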
Sure. This is a good practice for multiple reasons. However, the code that glues my interface to the underlying I/O is still there and needs testing, right?
I agree with you in general. But it always feels like there are spots where a mock of some kind is the only way to cover certain things.
You could call the data you generate for the tests "mocks".
But they really aren't "mocks" in the sense of behavioral mocks via IoC/DI and you don't need to manipulate them via some kind of interface in order to put them into the right state for your particular tests.
There are some extra steps, but you get extremely simple and reliable tests in return.
In many(!) cases you already have a data interface, especially with HTTP/REST APIs. All you need to do is simply not bury the IO call down the stack and maybe describe the failure conditions as plain data in your signature and voila.
(This is not a replacement for higher order testing like, manual, E2E or integration tests. But it certainly beats unit testing with mocks IMO.)
I don't think there's a disagreement; the author states "Whenever I look at mocks, they mostly have the same problem as all unit tests that I see, they only model the happy path". So by corollary their opinion of the correct usage of mocking would also include modelling brokenness.
Don’t really care about advice from the guy who invented an incredibly non pragmatic programming language. I also honestly had to look up who he was. So his sage advice hasn’t brought him much fame.
Testing both your code and the message system is exactly what you want, since if the message system is broken in a way that upstream didn't catch, you want to learn about it during testing and not production, if possible.
I’m still mad about the time I was told to mock a payment gateway in tests even though they had a testing environment and then got a slew of bug reports from people whose company names had punctuation (and thus failed the name validation the payment gateway was secretly running).
You should be unit testing components that interact with the payment gateway. This could involve dozens or even hundreds of tests, where the gateway should be mocked. These tests should be fast and reliable. In addition, you should have a small suite of integration/E2E tests against a real test instance of the gateway. These tests may be slow and unreliable (because of the real network involved) but catch those hairy issues you failed to find in unit tests.
Also, when your integration suite (or customer reports) discovers that the payment gateway fails on punctuation, another unit test should be added with a mock that responds the same way, and an E2E test added with punctuation in the input data and a failure expectation.
What makes you so certain you would have included punctuation in the input data if the test had not mocked the gateway?
That reminds me that Stripe actually maintains (or used to) their own mock for their Ruby package. This puts the burden on maintaining the mock on the library owner, where it is more likely that they would implement the mock correctly, edge cases and all.
Keep in mind that there are different kinds of testing. What Beck called unit tests and integration tests.
Unit tests are really for purposes of documentation. They show future programmers the intent and usage of a function/interface so that others can figure out what you were trying to do. Mocking is fine here as future programmers are not looking to learn about the message system here. They will refer to the message system's own documentation when they need to know something about the message system.
Integration tests are for the more classical view on testing. Beck suggested this is done by another team using different tools (e.g. UI control software), but regardless of specifics it is done as a whole system. This is where you would look for such failure points.
Unit tests as a form of example-code-based documentation is where I could see unit tests complementing documentation, yes.
However, depending on the industry, code coverage is a valuable tool to gauge the maturity of the software baseline and burning down software execution risk. One example of this is Airworthiness or Safety Critical Code.
Of course, there is no single code coverage metric. Code covered by unit tests does not count towards code covered by integration tests. They are completely separate systems. And, at least in Beck's opinion, should be carried out by completely different teams.
It is so cringe to see bad advice like this being given. Yes, you can write mocks incorrectly. You should not model them after the "happy path"; you should make sure they cover the most common use-cases, both good and bad. I have been a senior or principal engineer on teams that did both of these approaches, and the non-mocking approach is terrible because you end up with separate tests that have colliding data. It's slower using a real database back-end, it becomes a mess, and it leads to your database being heavily coupled to your test code, which is the real anti-pattern. Then a year or two later when you want to change databases or database architectures you're screwed, because you have to go into a bunch of tests manually and change things. The whole point of the mocks is that they make everything modular.
This is a problem with the test authors, not mocks.
“All the bugs are when talking to an actual database.”
Databases have rules that need to be followed, and a lot of those can be tested very quickly with mocks. The combined system can have bugs, so don't only use mocks. Mocks and unit tests are not a substitute for all the other tests you need to do.
How this person can claim to be a CTO I have no idea.
Then he should have said that. Is not clear communication a requirement for CTO these days?
Everything you are describing is about actually testing the database. A database is a complex server, and things like db triggers and stored procedures should be tested in isolation too. And then you have integration tests too.
My team just found a bug that wasn’t covered in a unit test. We found it in a long running API test. And so we added a unit test for the specific low level miss, and a quick integration test too.
How can one write an article about testing that doesn't even mention the invariants you're trying to validate (by construction or testing)? That's the minimum context for addressing any QA solution.
The GoF pattern book did list patterns, but it primarily argued for a simple language about patterns: context, problem, solution, limitations. It's clear.
The blog-o-sphere recipe of click-bait, straw-man, glib advice designed not to guide practice but to project authority (and promise career advancement) is the exact opposite, because it obfuscates.
The point of writing is to give people tools they can apply in their proximal situation.
Are you really testing if your solutions start by refactoring the code to be more testable? That's more like design if not architecture -- excellent, but well beyond scope (and clearly in the CTO's bailiwick).
And as for mocks: they're typically designed to represent subsystems at integration points (not responses to functions or IO/persistence subsystems). How hard is that to say?
The CTO's way is not to win the argument but to lead organizations by teaching applicable principles, providing guardrails, and motivating people to do the right thing.
Sorry to be exasperated and formulaic, but I think we can do better.
The problem is when IOService has edge cases. When building the mock, does it address the edge cases? When you want to find bugs by testing, the tests need to test the real world. So to work, the mock for IOService needs to model the edge cases of IOService. Does it? Does the developer know they need to model the edge cases, or will the mock end up not helping with finding bugs? Do you even know the edge cases of IOService? When IOService is a database service, does your mock work for records that do not exist? Or for queries that return more than one record?
It depends. Mocks are used to remove variables from the experiment you are running (the tests) and see if it behaves under very specific conditions. If you want to test how the code behaves when a specific row is returned by the database, but instead the dependency returns something else, then you are not testing that use case anymore. Reproducibility also has its values. But yes, you can definitely make your mocks return errors and fail in a myriad of ways.
Not to say you should mock everything. Of course having proper integration tests is also important, but articles like these will rarely tell you to have a good balance between them, and will instead tell you that something is correct and something else is wrong. You should do what makes sense for that specific case and exercise your abilities to make the right choice, and not blindly follow instructions you read in a blog post.
I totally agree, there is a balance between what makes sense to mock, and what needs proper integration tests.
Additionally, just using integration tests does not guarantee that edge cases are covered, and you can just as easily write integration tests for happy path, without thinking about the rest.
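Whichever kind of test double you use, the IOService-style edge cases mentioned above (no record, more than one record) can at least be written down explicitly in the double. A minimal Python sketch, with all names hypothetical:

```python
from unittest import mock

class OrderRepo:
    """Hypothetical repository the code under test depends on."""
    def find_by_ref(self, ref: str) -> list[dict]: ...

def order_status(repo: OrderRepo, ref: str) -> str:
    rows = repo.find_by_ref(ref)
    if not rows:
        return "unknown order"            # record does not exist
    if len(rows) > 1:
        return "ambiguous reference"      # more than one record came back
    return rows[0]["status"]

def test_zero_one_and_many_rows():
    repo = mock.Mock(spec=OrderRepo)

    repo.find_by_ref.return_value = []
    assert order_status(repo, "A1") == "unknown order"

    repo.find_by_ref.return_value = [{"status": "shipped"}]
    assert order_status(repo, "A1") == "shipped"

    repo.find_by_ref.return_value = [{"status": "shipped"}, {"status": "returned"}]
    assert order_status(repo, "A1") == "ambiguous reference"
```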
I prefer dependency injection instead of mocking. Not only is injecting a "mock" service better than monkey patch mocks in pretty much all cases, but it's an actually useful architectural feature beyond testing.
That's the only way to mock in some languages/testing frameworks. In C++ monkey patching would be quite difficult, but DI is simple. googlemock works this way.
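To make the contrast concrete, here is a minimal Python sketch of constructor injection with a hand-rolled fake, as opposed to monkey-patching module internals (all names hypothetical):

```python
class SmtpMailer:
    """Real implementation; talks to an SMTP server (omitted in this sketch)."""
    def send(self, to: str, body: str) -> None:
        raise NotImplementedError("network call elided")

class FakeMailer:
    """Test double injected in place of SmtpMailer."""
    def __init__(self):
        self.sent = []
    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))

class SignupService:
    def __init__(self, mailer):          # dependency injected, not imported
        self._mailer = mailer
    def register(self, email: str) -> None:
        self._mailer.send(email, "Welcome!")

def test_register_sends_welcome_mail():
    mailer = FakeMailer()
    SignupService(mailer).register("ada@example.com")
    assert mailer.sent == [("ada@example.com", "Welcome!")]
```

The injection point is also an architectural seam: swapping the mailer for a batching or logging implementation later needs no changes to SignupService.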
Many comments are about the danger of over mocking, which is right.
But I’ve also suffered the opposite: having to use a lib that assumes it only runs in production and always initialises some context no matter what (up to assuming only a specific VM would ever be used, never anywhere else, and especially not locally).
In the wild, I’ve rarely (if ever) seen code that was too testable. Too complex for no reason? Yes.
The golden rule was to only mock your own code. Make a facade around the framework class using an interface and mock that if needed to decouple your tests. Then write integration tests against your implementation of the interface. The moment you mock other people’s code you have brittle tests.
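A rough Python sketch of that golden rule (the third-party client and its methods are hypothetical): the app depends on a small facade you own, unit tests stub the facade, and a separate integration test exercises the real adapter against the real service.

```python
from typing import Protocol

class RepoGateway(Protocol):
    """The facade you own; the only thing unit tests ever stub."""
    def open_issue_count(self, repo: str) -> int: ...

class ThirdPartyRepoGateway:
    """Adapter over the third-party client; covered by integration tests."""
    def __init__(self, client):
        self._client = client                      # hypothetical API client
    def open_issue_count(self, repo: str) -> int:
        return len(self._client.get_issues(repo, state="open"))

def needs_triage(gateway: RepoGateway, repo: str) -> bool:
    return gateway.open_issue_count(repo) > 50

class StubGateway:
    def __init__(self, count):
        self._count = count
    def open_issue_count(self, repo: str) -> int:
        return self._count

def test_needs_triage():
    assert needs_triage(StubGateway(51), "org/repo")
    assert not needs_triage(StubGateway(3), "org/repo")
```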
Every time I read this kind of argument against mocks, it is usually down to a misunderstanding of why unit tests with mocking exist in the first place. The underlying assumption is that tests are a quality assurance tool, but I think that is true only for E2E tests (possibly in production). In outside-in TDD, unit tests are used as a design tool, not a QA one, and mocking is a convenient way to do that quickly without having to implement the next layer. The mock usually doesn't replace an implemented service that does IO; it implements a no-op that throws, so your E2E tests won't pass until you implement that layer.
The problem is in the name: unit tests should be called implementation specs, or in-code documentation.
Each layer of testing has its roles and serves a different purpose.
As a Java (mostly Spring) dev, I use mocks a lot to separate different components from each other if I only want to test one of them. If your code only contains tests that mock other things, you're missing something, as others have pointed out. But just because you have good coverage from integration testing doesn't mean that writing isolated unit tests is bad. I find it much easier to diagnose a problem in a tight unit test than in an integration test that covers half the project.
Some criticism of the article:
The "more unit testing" section reminds me of junior devs asking why they can't test private methods in Java. If I'm testing a unit, I want to test the contract it promises (in this case, a method that does some checks and then sends something). That the behavior is split between multiple methods is an implementation detail, and writing tests around that makes changes harder (now I can't refactor the methods without also having to update the tests, even if the contract doesn't change) and it doesn't even test the contract! (There's nothing that makes sure that the mail is actually sent - we could be testing methods that aren't used by anything but the test code)
For the "easier to test IO" section: just don't. Your tests now depend on some in-memory implementation that will behave differently than the real thing. That's just mocking with extra steps, you still don't know whether your application will work. If you want to do io, do the real io
"Separation of logic and IO": this is in general the right thing to do, but the way it's described is weird. First, it does the same as in the "more unit testing" section with the same problems. Then, the code is refactored until it's barely understandable and the article even admits it with the Greenspan quote. In the end, the production code is worse, just to ... Not test whether there's actually some code doing the IO.
I actually think there are some good ideas in there: separating the logic from the IO (and treating them as separate units) is important, not just for better testability, but also for easier refactoring and (if done with care) to be easier to reason about. In the end, you will need both unit and integration tests (and if your system is large enough, e2e tests). Whether you're using mocks for your unit tests or not, doesn't make much of a difference in the grand picture.
Just don't mock stuff in integration or e2e tests unless you absolutely can't avoid it.
The catch-22 with refactoring to be able to write unit tests is that refactoring introduces risk as you are changing code, and you need tests to help reduce that risk. But you can't easily write tests without refactoring. This has been a very difficult problem for the team I'm currently on.
The only strategy I'm aware of is described in `Working Effectively With Legacy Code`, where you start by writing throwaway unit or E2E tests that give you "cover" for being able to refactor. These tests depend on the implementation or may use mocking just to get started. Then you refactor, and write better unit tests. Then get rid of the throwaway tests.
Why get rid of working e2e tests? IMO they are more useful than unit tests at finding the kinds of problems that stop a release/deployment.
You can attack from both directions: e2e tests make sure that certain processes work in fairly ordinary situations, then look for little things that you can unit test without huge refactoring. When you've pushed these as far as you can, section off some area and start refactoring it. Do your best to limit your refactoring to single aspects or areas so that you are never biting off more than you can chew. Don't expect everything to become wonderful in one PR.
Your e2e tests will catch some errors and when you look at what those commonly are then you can see how to best improve your tests to catch them earlier and save yourself time. In python I had stupid errors often - syntax errors in try-catch blocks or other things like that. If I used a linter first then I caught many of those errors very quickly.
I was working on a build system so I mocked the build - created a much simpler and shorter build - so I could catch dumb errors fast, before I ran the longer e2e test on the full build.
IMO you need to progress to your vision but trying to reach it in one step is very dangerous. Make life better piece by piece.
You can even do PRs where you only add comments to the existing files and classes (not too much detail, but answering questions like "why is this file/class here"). This helps to make sure you really understand what the current system is doing before you change it.
I once added type hints everywhere to a legacy python program - it wasn't as helpful as I'd hoped but it did prevent some issues while I was refactoring.
Mocks can be useful when there is a standard protocol and you want to document and test that your code follows the protocol exactly, doing the same steps, independently of whether some other component also follows the protocol. It tests something different from whether or not two components work together after you change both of them.
It takes time to come up with good protocols that will remain stable and it might not be worth the effort to test it when the protocol design is new and still in flux, and you don’t have alternative implementations anyway. This is often the case for two internal modules in the same system. If you ever want to change the interface, you can change both of them, so an integration test will be a better way to ensure that functionality survives protocol changes.
Database access tends to be a bad thing to mock because the interface is very wide: “you can run any SQL transaction here.” You don’t want to make changing the SQL harder to do. Any equivalent SQL transaction should be allowed if it reads or writes the same data.
Compare with testing serialization: do you want to make sure the format remains stable and you can load old saves, or do you just want a round trip test? It would be premature to test backwards compatibility when you haven’t shipped and don’t have any data you want to preserve yet.
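As a concrete illustration of that choice, assuming a simple `dumps`/`loads` pair: a round-trip test is enough while the format is still in flux, and a pinned-fixture test is what you add once old saves must stay loadable.

```python
import json

def dumps(state: dict) -> str:
    return json.dumps(state, sort_keys=True)

def loads(blob: str) -> dict:
    return json.loads(blob)

def test_round_trip():
    # Enough while the format is unreleased: whatever we write, we can read back.
    state = {"level": 3, "inventory": ["sword", "rope"]}
    assert loads(dumps(state)) == state

def test_can_load_v1_save():
    # Only worth adding once real saves exist: pin a known old payload.
    v1_blob = '{"inventory": ["sword"], "level": 1}'
    assert loads(v1_blob)["level"] == 1
```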
This is a solid article. So many mocks, at the end, only verify that you set up your mock and that 1=1.
One paragraph I think is missing: error handling. You want units to be able to error so you can validate error handling which is _very_ hard on E2E tests. You can simulate disk full or db errors and make sure things fall back or log as expected. This can be done with fakes. Mocks are a specific type of test double that I have very little use of.
Anyone who is overly zealous about anything is always wrong in the end. Including testing.
"Why would people mock everything? Why not stand up a real test db and test on it?" Because the test zealous have explicitly declared that EACH test should be atomic. Yes you can find these people at major tech conferences. Each test should mock its own db, web service, etc. Every single time. And it should do that in no more than a few milliseconds, so that the entire project compiles in no more than 2mins, even for the largest and most complex corporate projects. And these tests should be fully end-to-end, even for complex microservices across complex networking architecture.
Some of you may be rolling on the floor laughing at how naive and time-consuming such a project would be.
We all agree such testing is a noble goal. But you need a team of absolute geniuses who do nothing but write "clever" code all day to get there in any sizeable project.
My organization won't hire or pay those people, no matter what they say about having 100% coverage. We just do the best we can, cheat, and lower the targets as necessary.
Let's not forget how long it would take to spin up an enterprise database, even in memory; there are hundreds (or thousands) of tables. Also, there can be multiple databases with their own schema, and each requires a fair amount of data in some of those tables just to do anything.
Wow, some great examples in here for how to use mocks wrong. I get the impression the author has just never seen tests that use mocks properly, honestly. The various refactorings contained in here are fine, of course, but I see no reason to call the entire use of mocks an anti-pattern. They're a tool, and they need to be used properly. Let's not throw the baby out with the bath water.
Mocking is indeed an anti-pattern ... when dealing with tests that pretend to be unit tests but are not actually unit tests (e.g. needing to be aware of IO edge-cases, to quote the article).
But tests that are not actually unit tests masquerading as unit tests (and vice versa) are arguably the bigger problem here. Not mocking per se.
If you inherited a project with no tests at all, mocking is a lifesaver. It allows you to only worry about specific aspects of the application so you can start writing and running tests. I agree though that if not done properly, it can be overused and can make your tests practically worthless.
A radical point of view. And as such it is of course wrong ;).
First of all, there are languages where dry-running your code with all parameters mocked is still a valid test run. Python, js, and Perl for instance make it very simple to have a stupid error in the routine that crashes every run.
But more importantly, a unit test usually executes inside the same process as the code. That gives you tremendous introspection capabilities and control over the execution flow.
Testing for a specific path or scenario is exactly what you should do there.
Finally, what, if not mocks, are in-memory filesystems or databases? They, too, won't show all the behaviors that the real thing will. And so do test containers or even full dedicated environments. It's all going to be an approximation.
I can foresee the "not what I mean" answers to everything. Oh, sure it's a fake DB but that's not a mock. Oh, yeah, you need to test with something that always makes an error but that's not a mock.
Eventually, what they mean is that if it sucks, it's what they're talking about, and you should never do that. If it was really useful, it's not a mock.
When I implemented the test suite for my JS framework [1], I realized that there was a ton of cruft and noise in most test setups. The solution? Just start a mirror of the app [2] and its database(s) on different ports and run the tests against that.
Do away with mocks/stubs in favor of just calling the code you're testing, intentionally using a test-only settings file (e.g., so you can use a dev account for third-party APIs). You can easily write clean up code in your test this way and be certain what you've built works.
The normal way of testing, passing some parameters in and checking what the code returns, is what I'm going to call "outside-in" testing.
Then mocking is "inside-out" testing. You check that your code is passing the right params/request to some dependency and reacting correctly to the output/response.
It's really the same thing, and you can flip between them by "inverting".
Sometimes mocking just makes much more sense, and sometimes just passing parameters to a function directly does. The end goal is the same: test some unit of code's behaviour against some specific state/situation.
They have their place but like all testing should be layered with other types of test to "test in depth".
Mocks aren't an anti-pattern. Anti-patterns are a "common response to a recurring problem, usually ineffective, risking being highly counterproductive". On the contrary, mocks are a common response to a recurring problem which are often effective and have no greater risk than a great many alternative testing methodologies. They do solve problems and they are useful. But like literally anything else in the universe: it depends, and they don't solve every problem.
You wanna know how to test without mocking? Use any kind of test. Seriously, just make a test. I don't care what kind of test it is, just have one. When you notice a problem your testing doesn't catch, improve your testing. Rinse, repeat. I don't care what kind of 10x rockstar uber-genius you think you are, you're going to be doing this anyway no matter what super amazing testing strategy you come up with, so just start on it now. Are there some ways of testing that are more effective than others? Yes, but it depends. If testing were simple, easy, straightforward and universal we wouldn't be debating how to do it.
(about 99% of the time I'm disappointed in these clickbait blog posts upvoted on HN. they are shallow and brief (it's a blog post, not a book), yet quite often dismissive of perfectly reasonable alternatives, and in the absence of any other information, misleading. it would be better to just describe the problem and how the author solved it, and leave out the clickbaity sweeping generalizations and proclamations)
Go ahead. Don't mock that external service that you rely on for an API. Now you need to have multiple keys, one for each developer, or share keys separate from various environments? Does it not offer dev/test/staging/prod keys? Well, now you need to share those keys. Does it only offer Prod keys? Now you are stuck sharing that. API request limits? Now you are eating through that just to run tests.
And let's not forget that testing things locally means you are mocking the network, or lack-thereof. "Mocking is an anti-pattern" is a great sentiment if you ignore costs or restrictions in the real world.
That is a fairly good reason for trying to use external systems/tools that make testing easy/cheap to do.
So a good approach would be to have tests where you can run with the mock and then run the same tests with the real system. Anything you catch with the mock saves you from using the costly system but you still get real testing.
If your dependencies are unstable then that is very important to know! If it means you have to add forms of resilience then that's good for your code perhaps?
Maybe you’re writing them incorrectly then? I’ve written several that were for core app features used in 30ish test cases on a team with 7 engineers and they’ve worked flawlessly for over two years.
No, it is that mocks can hide interface changes. So if you have a mock, then you need to test that the interface works without the mock. And if you are doing that, why not just skip the mock?
foo calls x(user, date)
foo's tests mock x, so the tests pass
x changes to x(user, time)
but the tests for foo do not change; they still pass, and you get runtime errors.
If you have static/strong typing the compiler will pick this up – but for dynamic languages you have a problem.
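In Python, for example, `unittest.mock.create_autospec` (or `patch(..., autospec=True)`) narrows this gap: the generated mock copies the real signature, so the drift described above fails at test time rather than at runtime. A small sketch:

```python
from unittest import mock
import pytest

def x(user, time):                       # the dependency after its signature changed
    ...

def test_autospec_rejects_stale_call():
    spec_x = mock.create_autospec(x)
    spec_x("alice", time="12:00")        # matches the real signature: accepted
    with pytest.raises(TypeError):
        spec_x("alice", date="2024-01-01")   # a bare Mock() would accept this silently
```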
In general I have a fake IO object and a real IO object, then run the same bunch of tests against both to make sure their behaviour matches. That verifies the fake behaves the same as the real thing.
I then run unit tests against the fake IO object. I don't mock internals, only boundaries. If for whatever reason I want to test against the real db, I can simply swap out the fake for the real object.
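One way to run the same tests against both the fake and the real object is a parametrized pytest fixture; a sketch with a hypothetical key-value store (the real-backend factory and the env var are placeholders):

```python
import os
import pytest

class InMemoryStore:
    """The fake used for fast unit tests."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def make_real_store(url):
    """Hypothetical factory returning a store backed by the real database."""
    raise NotImplementedError("elided in this sketch")

@pytest.fixture(params=["fake", "real"])
def store(request):
    if request.param == "fake":
        return InMemoryStore()
    if not os.environ.get("TEST_DATABASE_URL"):   # hypothetical env var
        pytest.skip("real backend not configured")
    return make_real_store(os.environ["TEST_DATABASE_URL"])

# The same contract tests run against both implementations.
def test_put_then_get(store):
    store.put("k", 1)
    assert store.get("k") == 1

def test_missing_key_returns_none(store):
    assert store.get("nope") is None
```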
In some languages A might free a memory allocation e.g. after communicating with some server.
If B also frees that memory then there is a bug. Presumably this means B's tests are wrong/incomplete. If B was mocking A to avoid the IO, you might not find out.
Most IO nowadays in my context is to call some REST API. I prefer to use nock (https://github.com/nock/nock) With that I can create an environment for my test to run in without changing anything about the implementation.
The article does not seem to bring up this way to do it.
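For what it's worth, the same intercept-at-the-HTTP-layer idea exists in Python via the `responses` package; a sketch (the endpoint and payload here are made up):

```python
import requests
import responses

def fetch_repo_name(repo: str) -> str:
    r = requests.get(f"https://api.github.com/repos/{repo}", timeout=10)
    r.raise_for_status()
    return r.json()["name"]

@responses.activate
def test_fetch_repo_name():
    # Intercepts the HTTP call; the implementation under test is unchanged.
    responses.add(
        responses.GET,
        "https://api.github.com/repos/octocat/hello-world",
        json={"name": "hello-world"},
        status=200,
    )
    assert fetch_repo_name("octocat/hello-world") == "hello-world"
```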
Mocking is useful for testing small parts of your programs/libraries. For full-scale testing you really need to not emulate because any emulation will be woefully incomplete, so you're going to have to spin up a virtual network with all the services you need including DNS.
> When you add UI-driven tests (and you should have some),
I disagree. If you want to send your test suite into the toilet, add a headless browser driver and nondeterministic assertions based on it. Most output that becomes UI can be tested; the rest can be checked by a quick QA.
I've mocked a lot in my past. Last 2 years I've been using fakes explicitly, although it has an overhead, I like it as there is less maintenance and refactoring with tests.
I don't get it. If I am taking a dependency on a database or another class and I mock it using its interface, what is the harm in it?
Essentially I have tested that, given my dependencies working correctly, my class would also work as expected.
Almost -- you're testing that, assuming your mock implementation perfectly mirrors what the dependency would do for the inputs you exercised, your functions produce the correct outputs (and hopefully you also verified the side effects).
The article is stating that almost nobody goes through the trouble of implementing a mock database perfectly, they just do something like make a single call return some hard-coded data. While this works a bit, it means that if the database ever changes its interface you have to remember to notice and implement that change as well.
In fact, Mocking is an essential tool for writing _unit_ tests; you know, testing exactly one thing (a 'unit') at a time. In Java for instance, a 'unit' would be a single static method, or a single class. Other languages will have different definitions of these terms, but the essential point would be "smallest reasonable grouping of code that can be executed, preferably deterministically"
The problem is people conflate the various levels of integration tests. You actually should* have both: full unit test coverage + an integration test to prove all of the pieces work together successfully. Small unit tests with mocks will point you _very quickly_ to exactly where a problem is in a codebase by pointing out the effects of contract changes. Large integration tests prove your product meets requirements, and also that individual components (often written by different teams) work together. They are two different things with two different goals.
* Important caveat on the word 'should': testing de-risks a build. However, if your business product is a risk itself (let's say you're hedging a startup on NFTs going wild), then your testing should reflect the amount of risk you're willing to spend money reducing. Unit testing in general speeds up development cycles, but takes time to develop. A good software engineering leader recognizes the risks on both the business side and the development side and finds a balance. As a product matures, so should the thoroughness of its testing.
If you take the article's advice to move everything that's not IO into pure, testable code (which is good), what's left is code that does IO. What are you even testing when you call such a procedure? At that point, it's mostly calls into other people's code. Maybe that's a good place to draw the line on testing things?
For car crash tests, we should always use full humans. A test dummy might have a lot of sensors and be constructed to behave like a human in a crash, but you'll never get the full crash details with a doll.
Notice the problem here? This argument does not consider the costs and risks associated with each approach.
For testing, IO is very expensive. It leads to huge CI setups and testsuites that take multiple hours to run. There is no way around this except using some kind of test double.
I've been doing this stuff (software) for a very long time and if it hadn't been invented by others, I'd never have thought of Mocking. It's that stupid of an idea. When I first came across it used in anger in a large project it took me a while to get my head around what was going on. When the penny dropped I remember a feeling of doom, like I had realized I was in The Matrix. Don't work there, and don't work with mock-people any more.
I don't like mocking either, but there are periodically situations where I've found it useful. Sometimes there is a complex system (whether of your own design or not) that isn't amenable to integration/e2e testing, and the interesting parts can't be easily unit tested due to external or tightly coupled dependencies.
Of course you can always pick it apart and refactor so it can be unit tested, but sometimes the effort required makes mocking look pretty appealing.
With containerization it’s very quick to spin up test dependencies as well as part of your CICD. Why mock calls to a datastore when it’s super easy to spin up an ephemeral postgresql instance to test on?
> Why mock calls to a datastore when it’s super easy to spin up an ephemeral postgresql instance to test on?
It's actually super hard to get Postgres to fail, which is what you will be most interested in testing. Granted, you would probably use stubbing for that instead.
Use the right tools to solve your problems.
You also need to test the mocks against the real thing separately.
What you're describing sounds like a fake to me.
It's important to have a contract testing layer in place to make sure your test doubles are still behaving like the real thing, though.
I thought interest mostly fizzled out over the years after the initial hype died down.
I find mocks useful for testing conditions that are on the periphery and would be a decent amount of trouble to set up. For instance, if I have a REST controller that has a catch all for exceptions that maps everything to a 500 response, I want a test that will cause the DAO layer to throw an exception and test that the rest of the "real stack" will do the translation correctly. A mock is the easiest way to accomplish that.
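A framework-free Python sketch of that kind of test (the handler and DAO here are hypothetical stand-ins for the real stack):

```python
from unittest import mock

class UserDao:
    def load(self, user_id: int) -> dict: ...

def get_user_handler(dao: UserDao, user_id: int) -> tuple[int, dict]:
    """Catch-all translation of unexpected errors into a 500 response."""
    try:
        return 200, dao.load(user_id)
    except Exception:
        return 500, {"error": "internal error"}

def test_dao_failure_maps_to_500():
    dao = mock.Mock(spec=UserDao)
    dao.load.side_effect = RuntimeError("connection reset")
    status, body = get_user_handler(dao, 42)
    assert status == 500
    assert body["error"] == "internal error"
```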
Your way means you only ever have siblings, with an orchestrator pulling results out of one module and pushing them into another.
Maybe you mean "less surface to mock", which is irrelevant if you generate your mocks automatically from the interface
You define fakes in this case, not mocks https://blog.cleancoder.com/uncle-bob/2014/05/14/TheLittleMo...
For what it’s worth, I find the distinction between mocks, fakes, spies, stubs, dummies and whatever completely useless in practice, the whole point is to control some data flows to test in isolation, it’s really all that matters.
Fun thing this kind of bikeshedding comes from Bob Martin, who famously has nothing worthwhile to show as an example of his actual work. Every time I read something from him, I get more and more convinced he’s a total fraud.
Personally, I like seeing a "dummy" name passed into a function and understanding just from the name that no methods are expected to be called on it during the test. Or seeing "fake_something" and understanding that a manual (perhaps test-specific) implementation is used. It is a minor point, but such chunking is universally useful (there is enough stuff to keep track of).
My original point was that the architecture can facilitate so called classical unit tests over mocks (make the latter less necessary) https://martinfowler.com/articles/mocksArentStubs.html#Class...
And then you mock the delegate?
I’ve only ever worked in very small teams, where we didn’t really have the resources to maintain nice developer experiences and testing infrastructure. Even just maintaining representative testing data to seed a test DB as schemas (rapidly) evolve has been hard.
So how do you:
- operate this? Do you spin up a new Postgres DB for each unit test?
- maintain this, e.g. have good, representative testing data lying around?
Typically you'd keep the database container itself alive, and you would run the schema migrations once at startup. Then your test runner would apply fixtures for each test class, which should set up and tear down any data they need to run or that they create while running. Restarting the database server between each test can be very slow.
The test data is a harder problem to solve. For unit tests, you should probably be creating specific test data for each "unit" and cleaning up in between each test using whatever "fixture" mechanism your test runner supports. However, this can get really nasty if there's a lot of dependencies between your tables. (That in and of itself may be a sign of something wrong, but sometimes you can't avoid it or prioritize changing it.)
You can attempt to anonymize production data, but obviously that can go very wrong. You can also try to create some data by using the app in a dev environment and then use a dump of that database in your tests. However, that's going to be very fragile, and if you really need hundreds of tables to be populated to run one test, you've got some big problems to fix.
Property-based testing is an interesting alternative, where you basically generate random data subject to some constraints and run your tests repeatedly until you've covered a representative subset of the range of possible values. But this can be complicated to set up, and if your tests aren't fast, your tests can take a very long time to run.
I think at the end of the day, the best thing you can do is decouple the components of your application as much as possible so you can test each one without needing giant, complicated test data.
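A sketch of that shape with pytest, SQLAlchemy and testcontainers (the migration helper and table names are hypothetical): the container and migrations are session-scoped, while per-test cleanup lives in a function-scoped fixture.

```python
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

def run_migrations(engine):
    """Hypothetical helper that applies the schema once per session."""
    ...

@pytest.fixture(scope="session")
def engine():
    # One container and one migration run for the whole test session.
    with PostgresContainer("postgres:16") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        run_migrations(engine)
        yield engine

@pytest.fixture
def db(engine):
    with engine.connect() as conn:
        yield conn
        # Per-test cleanup: wipe whatever this test created.
        conn.execute(sqlalchemy.text("TRUNCATE users, orders CASCADE"))  # hypothetical tables
        conn.commit()
```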
We have the same strategy for testing against Kafka, etc.
where we care about data, we seed the db with data for a specific group of tests. Otherwise, we just nuke the db between each test.
Prior to doing this, we'd use an in-memory db for tests and a real db at runtime, using JPA / Hibernate to make things transferable. But this was leaky, and some things would pass in tests then fail at runtime (or vice versa).
TestContainers has been so much better, as we're running against a real version of the database, so much smaller chance of test and runtime diverging.
> - maintain this, eg have good, representative testing data lying around?
This can be tricky, but usually my advice is to never even try to write seed data into the database unless it's very static. It just gets annoying to maintain and will often break. Try to work out a clean way to set up state in your tests using code, and do not rely on magic auto-increment ids. One of the more effective ways I have found is to e.g. have every test create a fresh customer, and then the test does its work on that customer. Avoid tests assuming that the first object you create will get id == 1; that makes them very annoying to maintain.
There's times when a big test fixture can provide value, but it's very context dependent and almost never for smaller/micro tests.
Generally I've seen a new database (schema in other dbs?) in postgres that is for testing, i.e "development_test" vs "development". The big thing is to wrap each of your tests in a transaction which gets rolled back after each test.
> maintain this, eg have good, representative testing data lying around
This is much harder. Maintaining good seed data - data that covers all the edge cases - is a large amount of work. It's generally easier to leave it up to each test to set up data specific to its test case, generalizing that data when possible (e.g. if you're testing login endpoints, you have all your login test cases inherit from some login-specific data setup, and they can tweak as needed from there). You will end up with duplicated test setup logic. It's not that bad, and often you don't really want to DRY this data anyway.
That being said, if you have the time and resources to maintain seed data it's absolutely a better way to go about it. It's also beneficial outside of tests.
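A minimal pytest/SQLAlchemy sketch of the wrap-each-test-in-a-transaction approach mentioned above (an `engine` fixture pointing at the dedicated test database is assumed):

```python
import pytest

@pytest.fixture
def db(engine):
    conn = engine.connect()
    tx = conn.begin()
    yield conn          # the test does all of its work on this connection
    tx.rollback()       # everything the test wrote disappears
    conn.close()
```

The trade-off is that the code under test has to run on this connection and cannot freely manage its own top-level transactions.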
Every place I've ever worked which tried this has managed to get a production database deleted by somebody running tests.
Why not?
That's just a different way of saying "Every place I've ever worked at used production for testing" :-/
TBH, any place using the same credentials for test and for production have bigger problems than would ever be fixed by mocking.
... is that good? Hell no. But it's a much more common version of reality than you're assuming is happening.
Sure I've made an assumption, but in my defence it's a single reasonable assumption: that one wouldn't be running tests in production.
If you have a leftover `.env` that contains production credentials, you were running your test in production.
Anyway, the missed point is that you can’t just do anything in tests and just expect the best of intentions to ensure it doesn’t backfire. One must also consider the security situation, infrastructure, secrets management, shared environments, and more. It’s not as simple as just plopping down a test database and expecting everything to go smoothly. You wouldn’t be that careless with other things, so don’t do it with tests, and don’t rely on “don’t run tests in production” as your only safeguard.
But, the one important thing is, do not give people direct access to production. And for the few that must have it, it should not be easy to connect to it.
1. Init the DB with some "default" data - configuration, lookup tables, etc
2. Each test in the test suite owns its data. It creates a new account and inserts new records only for that account. It can for example create users on this account, new entities, etc. It can run multiple transactions, can do rollbacks if needed. It is important to only touch the account(s) created by the test and to avoid touching the initial configuration. There's no need to clean up the data after the test finishes. These tests can run concurrently.
3. Create a separate integration test suite which runs sequentially and can do anything with the database. Running sequentially means that these tests can do anything - e.g. test cross-account functionality, global config changes or data migrations. In practice there aren't that many of those, most tests can be scoped to an account. These tests have to clean up after themselves so the next one starts in a good state.
Other approaches had tons of issues. For example, if each test is wrapped in a transaction which is later rolled back, then testing is very limited - tests cannot use transactions of their own. Savepoints have a similar issue.
Obviously, there are some caveats, e.g.:
* While this approach works perfectly for some tests (load testing, performance testing, …), it does not work for others (e.g. unit testing).
* You have to be careful about PII, and sanitize your data.
Just run PostgreSQL on your local machine, connect to that, setup a new schema for every test (fairly cheap-ish) inside a test database.
Creating reasonably scoped, reasonably contained "unit-y tests" like this means you will actually be able to understand what is going on. Too often have I seen people set up huge wads of "mock data" and then run all their tests on this. Then Test1 does something Test2 doesn't expect and you're screwed. Or worse: Test42 does something that screws Test185. Good luck with that. Or you introduce a regression somewhere and now you've got tons of data to understand.

This works when:

1. It's easy to create the objects you need
2. Your creation functions are well tested so that the rest of your tests can rely on them.
If you have spotty coverage or just poorly defined creation semantics, or it's a bunch of calls to functions all over the place just to set up your test data, then this doesn't work.
But the solution typically isn't "write a bunch of JSON mock test data", it's to solve those problems.
When you write mock data, you almost always write "happy path" data that usually just works. But prod data is messy and chaotic which is really hard to replicate manually.
This is actually exactly what we do at Neosync (https://github.com/nucleuscloud/neosync). We help you anonymize your prod data and then sync it across environments. You can also generate synthetic data as well. We take care of all of the orchestration. And Neosync is open source.
(for transparency: I'm one of the co-founders)
idk if you've solved this, but PG doesn't like to bind to port 0, so you have to manage ports. And I've had issues with processes sticking around if the test driver has crashed (I don't currently, but I'm turning off setsid in postgres).
Start it once across a bunch of suites, and have each suite manage its DB state. Done deal.
Doesn't solve your testing-data question, but it'll save you from spinning up a new DB.
https://github.com/electric-sql/pglite/
https://github.com/electric-sql/postgres-wasm
However, it’s not always possible.
For example:
- you use Oracle DB (takes minutes to start, license required, hope the containers run fine on ARM, etc.); sometimes an in-memory fake is just much faster, and can be an official db on its own for people to try the product
- your storage might only be available through a library from a third-party provider that is not available locally.
A major benefit of doing separate schemas for each test is that you can run them in parallel. In my experience, unless you have a metric ton of migrations to run for each test, the fact that your database tests can now run in parallel makes up (by a lot!) for the time you have to spend running the migrations for each test.
EDIT: usually we also make utilities to generate entities with random values, so that it's easy to make a test that e.g. tests that when you search for 5 entities among a set of 50, you only get the 5 that you know happen to match the search criteria.
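Those generator utilities can be tiny; a hypothetical sketch:

```python
import uuid

def make_entity(**overrides):
    """Unique-by-default fields, overridable per test (all fields hypothetical)."""
    entity = {
        "id": str(uuid.uuid4()),
        "name": f"entity-{uuid.uuid4().hex[:8]}",
        "tags": [],
    }
    entity.update(overrides)
    return entity

# e.g. 5 entities that match the search criterion under test, among 45 that don't
matching = [make_entity(tags=["urgent"]) for _ in range(5)]
noise = [make_entity() for _ in range(45)]
```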
Once that gets a bit too slow, running migrations once before every suite and then deleting all data before each test works really well. It's pretty easy to make the deleting dynamic by querying the names of all tables and constructing one statement to clear the data, which avoids referential integrity issues. Surprisingly, `TRUNCATE` is measurably slower than `DELETE FROM`.
Another nice touch is that turning off `fsync` in postgres makes it noticeably faster, while maintaining all transactional semantics.
No need to spin up a new server, not in a container, not in a new directory, not at all. It's pointless busywork with too many extra points of failure.
Nothing is stopping you using an existing server and creating a new DB, which takes about 1/100th the time that starting up a new server (whether in Docker or otherwise) takes.
Secondly, I don't actually do unit-testing on the database layer - there's little point in it. I test workflows against the database, not units!
What I do is create multiple 'packages' of tests, each with multiple tests. A single 'package' creates a temp db, runs its tests sequentially and then drops the temp db. Each package will setup itself with SQL statements.
This lets the tests perform tests of actual workflows, instead of testing in isolation that an object can be (de)serialised. IOW, I can test that the sequence of `addUser(); setProfilePassword(); signIn(); viewInfo(); signOut();` work as expected, and that `removeUser(); signIn();` fail with the correct error.
Yes.
This one can be very easy, depending on the kind of data you're working with. Many places shall simply dump a part (or the whole if it's not too big) of the production DB into dev and pre-prod environments.
Now if there is sensitive, non-encrypted data that even the devs cannot see, then it can get tricky (but then arguably they cannot see the logs in the clear either, etc.).
But yeah: a recent dump of the prod DB is good, representative data.
I've worked at places where pre-prod had a daily dump of the prod DB. Simple.
Actually that's wrong too. The production database will be different than the "testing Postgres instance", leading to bugs.
It turns out that whatever testing solution you use, if it's not the actual production instance and you're not using real production data, there will be bugs. Even then there's still bugs.
This is the simple truth: you can't catch all the bugs. Just put in Good Enough testing for what you're doing and what you need, and get on with life. Otherwise you will spend 99% of your time just on testing.
It never happened to me, to be honest. This reads like an argument for "if you can't do it perfectly, just do it badly", but that's nonsense. Running tests against a local Postgres instance with the same major.minor version and same extensions as your prod instance WILL work.
And testing your storage layer against the database is probably the most reliable safety net you can add to an app.
A team I worked with recently said the same thing. But, as I predicted, they ran into bugs because the CloudSQL Postgres was different than their Dockerized Postgres, even though it was the same core version.
There will always be testing problems you can't anticipate. Especially with systems that are not your own code. Just be ready to adapt your testing when it doesn't work as expected, and don't invest too much in the testing if it's not worth it.
I’ve been ready for 5 years as it’s the duration I’ve been testing my storage layers with 90% integration tests (the 10% being the hard to reproduce error cases, these tests have low value but are easy to test with mocks so I still test them). The only issue I’ve encountered was with time zones (shocking), and it made me ensure I got full control of the time zones in my app, in my deployment, in my local Postgres and in my prod Postgres, so net benefit in the end.
- Access to different parts of the database are limited in CloudSQL since it's a managed database. This makes some features of automated tooling (like migrations) not work on CloudSQL (i'm not saying migrations don't work, i'm saying some features do work, some don't). Sometimes elevated permissions can fix it, but some aspects are just walled off.
- There are schema differences from the stock Postgres (I don't remember specifics). No support for custom tablespaces.
- Import/export can often lock up a CloudSQL instance, whereas it might work fine on a local instance.
- You only get Read Committed transaction isolation.
- Operation is different (load, logging, replication, nodes, etc) and this affects performance, and performance can lead to bugs if there's an expectation of a certain run time or performance that doesn't match up with the development experience. Often times some job will have run faster on a laptop than in the cloud, and that leads to weird corner cases where production has an issue due to weird performance and it has to be triaged in the cloud and a fix applied to avoid it. Performance issues don't sound like a bug, but if you have to change your code to make it work in prod, it's a bug.
- To access CloudSQL they want you to use a proxy app, and that can have inconsistent results compared to a dev connecting directly to a Postgres instance. Even something as simple as handling reconnects is often not considered when all you have is a local instance that never needs reconnecting.
- There's a limited selection of Postgres extensions supported. And I could be wrong but I think the version of the extensions is pinned for each version of Postgres core used in CloudSQL.
To all of this you might reply "well none of that affects me", and that's fine... but it does affect other people, and that's important to note when you're telling people on the internet there will be no problems.
If my prod were using CloudSQL, I'd use CloudSQL for tests too. Haven't noticed so many differences between Heroku Postgres and stock.
After figuring it out, I swapped us over to running all the tests that hit the database against the real database, in a testcontainer, with a RAM disk for minimizing query latency. It was about a day's worth of work, and it turned up a few other bugs that hadn't bitten us in production yet, too - bugs that had also sailed past our test suite because the test double failed to accurately replicate the behavior in question.
Total time to run CI went up by about 10 seconds. (For local development you could chop that way down by not starting a fresh server instance for every test run.) Given how many person-hours we spent on diagnosing, resolving, and cleaning up after just that first defect, I estimated the nominally slower non-mocked tests are still a net time saver if amortized over anything less than about 50,000 CI runs, and even then we should probably only count the ones where an engineer is actually blocking on waiting for the tests to complete.
That said, there was a time when I thought test doubles for databases was the most practical option because testing against real databases while maintaining test isolation was an unholy PITA. But that time was 5 or 6 years ago, before I had really learned how to use Docker properly.
The same logic does not apply to all ORMs, of course, which do not all benefit from the same quality of professional dev support that EF receives from MS. But that's my main point from above: the correct design decision depends on the context. For services written in other languages with other ORMs or raw SQL, I absolutely will spin up a full Postgres test container because it is indeed trivial (have one running in the background on my laptop right now in fact). It just isn't necessary in the specific context of EntityFramework code.
I love EntityFramework, it's easily the best ORM I have ever used but it has a few cross-platform footguns that require testing against the actual database service you're using.
I ended up a few times with discrepancies in the format (Excel, .NET and Windows) because someone changed it.
Can you please explain what you did with a RAM disk to speed them up?
Those rate limits, bans, and other anti-abuse mechanisms are things that would be good to uncover and account for during tests. Better for the test suite to detect those potential failures than the production deployment :)
Every test seeds all the data needed to run (user, org, token). It requires an initial setup, but then you just reuse it everywhere, and voila. No side effects, no mock to maintain, and it also tests your auth and permissions, almost 1:1 with prod.
Can also be used to test version updates of your DB.
https://github.com/dolthub/dolt
We use the reset functionality to speed up our tests.
https://www.dolthub.com/blog/2022-06-10-enginetest-perf/
I think the author might be conflating good tests with good enough tests. If IOService is handled by a different team, I expect them to assure IOService behaves how it should, probably using tests. The reason we're mocking IOService is because it's a variable that I can remove, that makes the errors I get from a test run MUCH easier to read. We're just looking at the logic in one module/class/method/function. It's less conceptually good to mock things in tests, since I'm not testing the entire app that we actually ship, but full no-mocks E2E tests are harder to write and interpret when something goes wrong. I think that makes them a less useful tool.
The thing I do agree on is that your mocks shouldn't only model the happy path. I'd say if something can throw an exception, you should at least include that in a mock (as a stubbed method that always throws). But making it mandatory to reimplement your dependencies, or relying on the real ones in tests, is going to mean you write fewer tests and get worse failure messages.
Like everything, it depends eh?
If they're decoupled, there's no need to mock protocol or serialization.
There's some cliché wrt "don't call me, I'll call you" as advice on how to flip the call stack. Sorry, no example handy (on mobile). But the gist is to avoid nested calls, flattening the code paths. Less like a Russian doll, more like Lego instructions.
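Since no example was handy, here is a rough Python sketch of the idea (all names hypothetical): the nested version buries the IO call at the bottom of the stack, while the flattened version has a thin orchestrator do the calling and pass plain values between pure steps.

```python
# Russian doll: each layer calls the next one itself, with IO buried at the bottom.
def build_report_nested(db):
    rows = db.query("SELECT amount FROM sales")
    total = sum(r["amount"] for r in rows)
    return f"total: {total}"

# Lego instructions: a thin orchestrator does the calling; the middle steps are
# pure functions that take values in and return values out.
def summarize(rows: list) -> int:
    return sum(r["amount"] for r in rows)

def render(total: int) -> str:
    return f"total: {total}"

def build_report_flat(db) -> str:
    rows = db.query("SELECT amount FROM sales")   # the only step touching IO
    return render(summarize(rows))

def test_pure_steps_need_no_doubles():
    assert render(summarize([{"amount": 2}, {"amount": 3}])) == "total: 5"
```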
In defense of mocks, IoC frameworks like Spring pretty much necessitate doing the wrong thing.
If the test is hard to debug when it goes wrong, then I assume the system is hard to debug when something goes wrong. Investing in making that debugging easy/easier unlocks more productivity. Of course it matters on how often bugs show up, how often the system changes, the risks of system failure on the business, etc. it may not be worth the productivity boost to have a debuggable system. In my cases, it usually is worth it.
But also, if you're able to fix things effectively from E2E test results due to a focus on debug-ability, then that's great! I think it's just the framing of the article I have trouble with. It's not an all or nothing thing. It's whatever effectively helps the devs involved understand and fix regressions. I haven't seen a case where going all in on E2E tests has made that easier, but I haven't worked everywhere!
Mocks have a perfectly viable place in testing. They help establish boundaries and avoid side effects that are not pertinent to the logic being tested.
I would reference the testing pyramid when thinking about where to be spending time in unit tests vs. integration tests vs. end to end tests. What introduces risk is if we're mocking behaviors that aren't being tested further up the pyramid.
I've come to the opinion that test doubles of any kind should be used as a last resort. They're a very useful tool for hacking testability into legacy code that's not particularly testable. But in a newer codebase they should be treated as a code smell. Code that needs mocks to be tested tends to be code that is overly stateful (read: temporally coupled), or that doesn't obey the Law of Demeter, or that does a poor job of pushing I/O to the edge where it belongs. And those are all design elements that make code brittle in ways that mocking can't actually fix; it can only sweep it under the carpet.
Even if pushed to the periphery, how do you test the wrapper you built to hide these failures from the rest of your code base? If you don’t hide these failures in some wrapper, how do you test that your system handles them properly?
Additionally, mocking/faking is often the only way to simulate error conditions. If you are testing a client that calls to a remote service, you will have to handle I/O errors or unexpected responses, and that requires mocking or faking the remote service (or rather, the client side transport stack).
But yeah, I definitely think mocks should be used judiciously, and I _really_ think monkeypatch-based mocking is a travesty (one of the best parts about testing is that it pushes you toward writing maintainable, composable code, and monkey patching removes that incentive--it's also just a lot harder to do correctly).
With better separation of concerns and separation of compute from IO, one should not need mocks.
Only unit tests + integration/e2e tests.
One of his key points is that throwaway test code is only a problem if you neglect to throw it away.
Some of the advice is good, like decoupling I/O and logic where that makes sense. But the general idea of mocking being an anti-pattern is overreach.
This kind of thinking is overly rigid/idealistic:
> And with Postgres you can easily copy a test database with a random name from a template for each test. So there is your easy setup.
> You need to test reality. Instead of mocking, invest in end-to-end (E2E) testing.
"Easily" is like "just." The ease or difficulty is relative to skill, time, team size, infrastructure, and so on.
As for testing reality, sure. But there's also a place for unit tests and partial integration tests.
In some situations, mocking makes sense. In others, full E2E testing is better. Sometimes both might make sense in the same project. Use the right tool for the job.
This goes back to team size and skills. Not all teams have build engineers. And not all mocks are so complicated that they take up that much time.
Again, it depends on the scope and the resources. The article goes too far by calling mocking an anti-pattern. It simply isn't.
If it's a small, standalone CRUD app on a SQLite database, mocks would probably be a bad option, sure.
On the other hand, it could be an integration platform that integrates with many third-party services. Some of them may not have test environments. Or some of the integrations may be written by third-party contractors, and we can't expose service credentials because of poor permissions granularity. Mocks are a good option there.
That being said, of course, "it depends" on your use case. But I've found setting up this sort of test environment quite a bit easier now than writing database mocks, a lot less time-and-maintenance intensive, and relatively quick to run in any environment.
(Also, in a decade, I'm pretty confident this gap will get even smaller, while the time to maintain mocks will stay constant)
[0] https://www.postgresql.org/docs/current/manage-ag-templatedb...
I wrote some code recently that detects cycle errors in objects with inheritance and I mocked the DB calls.
- Did I test for DB failures? No, but that's not the goal of the tests.
- Could I have refactored the code to not rely on DB calls? Yes, but every refactor risks the introduction of more bugs.
- Could I have launched a temporary DB instance and used that instead? Yes, but there's no obvious reason that would have been easier and cleaner than mocking DB calls.
In python it wasn't hard to implement. It was the first time I'd used the mock library so naturally there was learning overhead but that's unavoidable - any solution would have learning overhead.
This is a common misconception (one that I also initially held). Unit tests aren't meant to find bugs, they're meant to protect against regressions, and in doing so, act as a documentation of how a component is supposed to behave in response to different input.
That hasn't been the general consensus on unit tests for at least 30 years now. Regression tests are a small subset of tests, typically named for an ID in some bug tracker, and are about validating a fix. The majority of unit tests catch issues before a bug is even opened, and pretty much any random developer you talk to will consider that to be the point.
This is how I also tend to think of them, but it's not how the phrase is generally used. The general meaning of regression tests it to ensure known correct functionality doesn't break with a future change. There's no actual requirement it be tied to a known bug.
The "issue" that is being caught is the bug the parent is talking about, not a "bug" in JIRA or something.
1. Changes often require changing the functionality of a component, which means many of the current unit tests are bunk and need to be updated. Changes that are simply refactoring but should retain the same behavior, need to update/rewrite the tests, in which case again often requires significant refactoring of the existing tests.
2. Small isolated changes usually require testing everything which in a big org is very time consuming and slows down builds and deploys unnecessarily.
3. A lot of false confidence is instilled by passing unit tests. The tests passed, were good! Most of the production bugs I've seen are things you'd never catch in a unit test.
I really can't imagine a large refactor where we wouldn't end up rewriting all the tests. Integration tests are much better for that imo, "units" should be flexible.
Refactoring under the same contract should not lead to refactoring of tests. Unless of course you introduce a new dependency you have to mock? That's just one example.
If your code changes a lot it has nothing to do with tests being hard to change. It has to do with the code it tests changes too often. Poor contracts perhaps.
And just like the parent comment. Tests are not about finding or solving bugs, they are about regressions and making sure your contracts are correctly implemented.
The only part where I see unit tests being useful for refactoring is making changes to the internals of a single unit. It's always been more trouble than it's worth for me.
In some cases it makes sense, like testing small units that are heavy in logic (a function that calculates order prices, for example, or scientific computing). But unit testing every single piece of code has always seemed dogmatic to me ("unit tests are good engineering, write unit tests always, everywhere"). Everything has tradeoffs, and as engineers I think our job is to understand the pros and cons and apply them effectively.
Unit tests are most often used to cover a few more lines that need coverage. That's the value they provide.
You could do it in Word instead, I suppose, but if you write it in code then a computer can validate that the documentation you wrote is true. That brings tremendous value.
That's assuming change is even necessary. Oftentimes you only need to modify the implementation, which doesn't change what is defined. A change in implementation has no impact on the outside, at least as long as you have properly covered your bases - which you should be able to do as long as you have proper documentation. Without documentation, good luck to you.
There is a whole world of errors that can occur during IO. What happens if I get a 500 from that web service call? How does my code handle a timeout? What if the file isn't found?
It is often only possible to simulate these scenarios using a mock or similar. These are also code paths you really want to understand.
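For concreteness, here is a hedged sketch of driving those failure paths with `unittest.mock`; the `myapp.client` module, `fetch_report`, `sync_profile`, and `SyncError` are invented stand-ins for whatever wraps the web service call in your project:

```python
from unittest.mock import patch

import pytest
import requests

from myapp.client import fetch_report, sync_profile, SyncError  # hypothetical

def test_handles_http_500():
    resp = requests.Response()
    resp.status_code = 500
    with patch("myapp.client.requests.get", return_value=resp):
        assert fetch_report("acct-1") is None  # graceful degradation expected

def test_handles_timeout():
    with patch("myapp.client.requests.get", side_effect=requests.Timeout):
        with pytest.raises(SyncError):         # hypothetical domain error
            sync_profile("acct-1")
```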
Then your tests don't need behavioral mocks or DI, they just need the different shapes of data and you test your own code instead of whatever your IO dependency is or some simulation thereof.
I agree with you in general. But it always feels like there are spots where a mock of some kind is the only way to cover certain things.
But they really aren't "mocks" in the sense of behavioral mocks via IoC/DI and you don't need to manipulate them via some kind of interface in order to put them into the right state for your particular tests.
There are some extra steps, but you get extremely simple and reliable tests in return.
In many(!) cases you already have a data interface, especially with HTTP/REST APIs. All you need to do is simply not bury the IO call down the stack and maybe describe the failure conditions as plain data in your signature and voila.
(This is not a replacement for higher order testing like, manual, E2E or integration tests. But it certainly beats unit testing with mocks IMO.)
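A minimal sketch of that shape (names invented, not from the thread): the decision logic consumes already-fetched data as plain values, so its unit tests need no mocks at all, and the thin IO wrapper is left to integration tests.

```python
from dataclasses import dataclass

@dataclass
class FetchResult:
    # Plain data describing what the IO layer saw, including failure conditions.
    status: int
    body: dict | None

def decide_retry(result: FetchResult) -> bool:
    # Pure function: retry on server errors, give up on client errors.
    return result.status >= 500

def test_retries_on_server_error():
    assert decide_retry(FetchResult(status=503, body=None)) is True

def test_gives_up_on_client_error():
    assert decide_retry(FetchResult(status=404, body=None)) is False
```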
~ Rich Hickey
For example, when making a wrapper around a message system: if you don't mock, you end up testing both your code and the message system.
However, the overhead of keeping the mocking system up to date is a pain in the balls.
Also, when your integration suite (or customer reports) discovers that the payment gateway fails on punctuation, another unit test should be added with a mock that responds the same way, and an E2E test added with punctuation in the input data and a failure expectation.
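Sketching that pair of additions (all names hypothetical): the unit test's mock simply reproduces the observed gateway behaviour, while the E2E test feeds the same punctuation-laden input to the real thing.

```python
from unittest.mock import Mock

# Hypothetical names: submit_payment wraps the gateway call; GatewayError is
# the error type the real gateway was observed to raise on punctuation.
from myapp.payments import submit_payment, GatewayError  # hypothetical

def test_charge_fails_cleanly_on_punctuation():
    gateway = Mock()
    gateway.charge.side_effect = GatewayError("invalid character in name")
    result = submit_payment(gateway, name="O'Brien & Sons", amount=100)
    assert result.failed
    assert "invalid character" in result.reason
```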
What makes you so certain you would have included punctuation in the input data if the test had not mocked the gateway?
You can get the best of both.
Unit tests are really for purposes of documentation. They show future programmers the intent and usage of a function/interface so that others can figure out what you were trying to do. Mocking is fine here, as future programmers are not looking to learn about the message system. They will refer to the message system's own documentation when they need to know something about it.
Integration tests are for the more classical view on testing. Beck suggested this is done by another team using different tools (e.g. UI control software), but regardless of specifics it is done as a whole system. This is where you would look for such failure points.
However, depending on the industry, code coverage is a valuable tool to gauge the maturity of the software baseline and burning down software execution risk. One example of this is Airworthiness or Safety Critical Code.
This is a problem with the test authors, not mocks.
“All the bugs are when talking to an actual database.”
Databases have rules that need to be followed, and a lot of those can be tested very quickly with mocks. The combined system can have bugs, so don't only use mocks. Mocks and unit tests are not a substitute for all the other tests you need to do.
How this person can claim to be a CTO I have no idea.
Try mocking DB triggers, views, access rules, etc. and you will know why most teams don't bother mocking but use the real thing instead.
And about the comment about him being a CTO: well, he is a CTO - and you?
Everything you are describing is about actually testing the database. A database is a complex server, and things like DB triggers and stored procedures should be tested in isolation too. And then you have integration tests on top of that.
My team just found a bug that wasn’t covered in a unit test. We found it in a long running API test. And so we added a unit test for the specific low level miss, and a quick integration test too.
This article did not change my opinion on the subject.
The word anti-pattern is confusing in itself. "Anti" is usually a prefix for something that battles or goes against the word that it prefixes.
In my opinion, a better term would be a hostile or adverse pattern.
The GoF pattern book did list patterns, but it primarily argued for a simple language about patterns: context, problem, solution, limitations. It's clear.
The blog-o-sphere recipe of click-bait, straw-man, glib advice designed not to guide practice but to project authority (and promise career advancement) is the exact opposite, because it obfuscates.
The point of writing is to give people tools they can apply in their proximal situation.
Are you really testing if your solutions start by refactoring the code to be more testable? That's more like design if not architecture -- excellent, but well beyond scope (and clearly in the CTO's bailiwick).
And as for mocks: they're typically designed to represent subsystems at integration points (not responses to functions or IO/persistence subsystems). How hard is that to say?
The CTO's way is not to win the argument but to lead organizations by teaching applicable principles, providing guardrails, and motivating people to do the right thing.
Sorry to be exasperated and formulaic, but I think we can do better.
Not to say you should mock everything. Of course having proper integration tests is also important, but articles like these will rarely tell you to have a good balance between them, and will instead tell you that something is correct and something else is wrong. You should do what makes sense for that specific case and exercise your abilities to make the right choice, and not blindly follow instructions you read in a blog post.
Additionally, just using integration tests does not guarantee that edge cases are covered, and you can just as easily write integration tests for happy path, without thinking about the rest.
Context always matters.
But I've also suffered the opposite: having to use a lib that assumes it only runs in production and always initialises some context no matter what (up to assuming only a specific VM would ever be used - never anywhere else, and especially not locally).
In the wild, I've rarely (if ever) seen code that was too testable. Too complex for no reason? Yes.
The problem is in the name: "unit test" should really be called an implementation spec or in-code documentation.
Each layer of testing has its roles and serves a different purpose.
Some criticism to the article:
The "more unit testing" section reminds me of junior devs asking why they can't test private methods in Java. If I'm testing a unit, I want to test the contract it promises (in this case, a method that does some checks and then sends something). That the behavior is split between multiple methods is an implementation detail, and writing tests around that makes changes harder (now I can't refactor the methods without also having to update the tests, even if the contract doesn't change) and it doesn't even test the contract! (There's nothing that makes sure that the mail is actually sent - we could be testing methods that aren't used by anything but the test code)
For the "easier to test IO" section: just don't. Your tests now depend on some in-memory implementation that will behave differently than the real thing. That's just mocking with extra steps, you still don't know whether your application will work. If you want to do io, do the real io
"Separation of logic and IO": this is in general the right thing to do, but the way it's described is weird. First, it does the same as in the "more unit testing" section with the same problems. Then, the code is refactored until it's barely understandable and the article even admits it with the Greenspan quote. In the end, the production code is worse, just to ... Not test whether there's actually some code doing the IO.
I actually think there are some good ideas in there: separating the logic from the IO (and treating them as separate units) is important, not just for better testability, but also for easier refactoring and (if done with care) to be easier to reason about. In the end, you will need both unit and integration tests (and if your system is large enough, e2e tests). Whether you're using mocks for your unit tests or not, doesn't make much of a difference in the grand picture.
Just don't mock stuff in integration or e2e tests unless you absolutely can't avoid it.
The only strategy I'm aware of is described in `Working Effectively With Legacy Code`, where you start by writing throwaway unit or E2E tests that give you "cover" for being able to refactor. These tests depend on the implementation or may use mocking just to get started. Then you refactor, and write better unit tests. Then get rid of the throwaway tests.
You can attack from both directions: e2e tests make sure that certain processes work in fairly ordinary situations, then look for little things that you can unit test without huge refactoring. When you've pushed these as far as you can, section off some area and start refactoring it. Do your best to limit your refactoring to single aspects or areas so that you are never biting off more than you can chew. Don't expect everything to become wonderful in one PR.
Your e2e tests will catch some errors, and when you look at what those commonly are, you can see how best to improve your tests to catch them earlier and save yourself time. In Python I often had stupid errors - syntax errors in try/except blocks and other things like that. If I used a linter first, I caught many of those errors very quickly.
I was working on a build system so I mocked the build - created a much simpler and shorter build - so I could catch dumb errors fast, before I ran the longer e2e test on the full build.
IMO you need to progress to your vision but trying to reach it in one step is very dangerous. Make life better piece by piece.
You can even do PRs where you only add comments to the existing files and classes (not too much detail, but answering questions like "why is this file/class here?"). This helps make sure you really understand what the current system is doing before you change it.
I once added type hints everywhere to a legacy python program - it wasn't as helpful as I'd hoped but it did prevent some issues while I was refactoring.
It takes time to come up with good protocols that will remain stable and it might not be worth the effort to test it when the protocol design is new and still in flux, and you don’t have alternative implementations anyway. This is often the case for two internal modules in the same system. If you ever want to change the interface, you can change both of them, so an integration test will be a better way to ensure that functionality survives protocol changes.
Database access tends to be a bad thing to mock because the interface is very wide: “you can run any SQL transaction here.” You don’t want to make changing the SQL harder to do. Any equivalent SQL transaction should be allowed if it reads or writes the same data.
Compare with testing serialization: do you want to make sure the format remains stable and you can load old saves, or do you just want a round trip test? It would be premature to test backwards compatibility when you haven’t shipped and don’t have any data you want to preserve yet.
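A round-trip test in that sense is tiny: it asserts only that dump and load are inverses, and says nothing about format stability. A self-contained sketch (the `GameState` type and JSON encoding are invented here):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class GameState:
    level: int
    inventory: list[str]

def dump_state(state: GameState) -> str:
    return json.dumps(asdict(state))

def load_state(blob: str) -> GameState:
    return GameState(**json.loads(blob))

def test_round_trip():
    # Only checks that save/load are inverses - not that old saves keep loading.
    original = GameState(level=3, inventory=["sword", "potion"])
    assert load_state(dump_state(original)) == original
```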
One paragraph I think is missing: error handling. You want units to be able to error so you can validate error handling which is _very_ hard on E2E tests. You can simulate disk full or db errors and make sure things fall back or log as expected. This can be done with fakes. Mocks are a specific type of test double that I have very little use of.
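For instance, a hand-written fake at the writer boundary can simulate a full disk far more easily than any E2E setup could. A hedged sketch (names invented; `caplog` is pytest's log-capture fixture, and `export_report` is assumed to skip unwritable rows and log the failure):

```python
import errno

from myapp.export import export_report  # hypothetical unit under test

class FullDiskWriter:
    """Stands in for the real writer and always reports a full disk."""
    def write(self, path: str, data: bytes) -> None:
        raise OSError(errno.ENOSPC, "No space left on device", path)

def test_export_logs_and_keeps_going_when_disk_is_full(caplog):
    report = export_report(writer=FullDiskWriter(), rows=[{"id": 1}])
    assert report.skipped == 1
    assert "No space left" in caplog.text
```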
"Why would people mock everything? Why not stand up a real test db and test on it?" Because the test zealous have explicitly declared that EACH test should be atomic. Yes you can find these people at major tech conferences. Each test should mock its own db, web service, etc. Every single time. And it should do that in no more than a few milliseconds, so that the entire project compiles in no more than 2mins, even for the largest and most complex corporate projects. And these tests should be fully end-to-end, even for complex microservices across complex networking architecture.
Some of you may be rolling on the floor laughing at how naive and time-consuming such a project would be.
We all agree such testing is a noble goal. But you need a team of absolute geniuses who do nothing but write "clever" code all day to get there in any sizeable project.
My organization won't hire or pay those people, no matter what they say about having 100% coverage. We just do the best we can, cheat, and lower the targets as necessary.
But tests that are not actually unit tests masquerading as unit tests and vice versa is arguably the bigger problem here. Not mocking per se.
First of all, there are languages where dry-running your code with all parameters mocked is still a valid test run. Python, JS, and Perl, for instance, make it very easy to end up with a stupid error in a routine that crashes every run.
But more importantly, a unit test usually executes inside the same process as the code. That gives you tremendous introspection capabilities and control over the execution flow. Testing for a specific path or scenario is exactly what you should do there.
Finally, what are in-memory filesystems or databases, if not mocks? They, too, won't show all the behaviors that the real thing does. And neither will test containers or even full dedicated environments. It's all going to be an approximation.
Eventually, what they mean is that if it sucks, it's what they're talking about, and you should never do that. If it was really useful, it's not a mock.
Do away with mocks/stubs in favor of just calling the code you're testing, intentionally using a test-only settings file (e.g., so you can use a dev account for third-party APIs). You can easily write clean up code in your test this way and be certain what you've built works.
[1] https://cheatcode.co/joystick
[2] A mirror of the app/db creates a worry-free test env that can easily be reset without messing up your dev env.
Then mocking is "inside-out" testing. You check that your code is passing the right params/request to some dependency and reacting correctly to the output/response.
Its really the same thing and you can flip between them by "inverting".
Sometimes mocking just makes much more sense, and sometimes just passing parameters to a function directly does. The end goal is the same: test some unit of code's behaviour against some specific state/situation.
They have their place but like all testing should be layered with other types of test to "test in depth".
You wanna know how to test without mocking? Use any kind of test. Seriously, just make a test. I don't care what kind of test it is, just have one. When you notice a problem your testing doesn't catch, improve your testing. Rinse, repeat. I don't care what kind of 10x rockstar uber-genius you think you are, you're going to be doing this anyway no matter what super amazing testing strategy you come up with, so just start on it now. Are there some ways of testing that are more effective than others? Yes, but it depends. If testing were simple, easy, straightforward and universal we wouldn't be debating how to do it.
(about 99% of the time I'm disappointed in these clickbait blog posts upvoted on HN. they are shallow and brief (it's a blog post, not a book), yet quite often dismissive of perfectly reasonable alternatives, and in the absence of any other information, misleading. it would be better to just describe the problem and how the author solved it, and leave out the clickbaity sweeping generalizations and proclamations)
And let's not forget that testing things locally means you are mocking the network, or lack-thereof. "Mocking is an anti-pattern" is a great sentiment if you ignore costs or restrictions in the real world.
So a good approach would be to have tests where you can run with the mock and then run the same tests with the real system. Anything you catch with the mock saves you from using the costly system but you still get real testing.
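One way to get that "same tests, two backends" setup in Python is a parametrised pytest fixture. A hedged sketch, where `RealStorage` and the `REAL_STORAGE_URL` variable are invented stand-ins for the costly system:

```python
import os
import pytest

class FakeStorage:
    """Cheap in-memory stand-in used on every run."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

@pytest.fixture(params=["fake", "real"])
def storage(request):
    if request.param == "fake":
        return FakeStorage()
    url = os.environ.get("REAL_STORAGE_URL")
    if not url:
        pytest.skip("real backend not configured")
    from myapp.storage import RealStorage  # hypothetical wrapper for the real system
    return RealStorage(url)

def test_put_then_get(storage):
    # The same assertion runs against the fake and, when configured, the real thing.
    storage.put("greeting", b"hello")
    assert storage.get("greeting") == b"hello"
```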
No I don't really charge em – but it gets the idea across that mocks have costs that you don't always see up front.
foo calls x(user, date); x is mocked in foo's tests # tests pass
x changes to x(user, time)
but the tests for foo do not change, tests still pass, runtime errors.
If you have static/strong typing the compiler will pick this up – but for dynamic languages you have a problem.
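In Python specifically, `unittest.mock`'s `autospec` is one mitigation for exactly this drift: the mock then enforces the real function's signature, so the stale call blows up in the test instead of at runtime. A hedged sketch with invented module names:

```python
from unittest.mock import patch

# Hypothetical layout: myapp.reports.x changed from x(user, date) to
# x(user, time), and foo still calls it the old way.
from myapp.service import foo  # hypothetical caller under test

def test_foo_calls_x_with_the_current_signature():
    with patch("myapp.reports.x", autospec=True) as fake_x:
        # With autospec, foo's stale call x(user, date=...) raises TypeError
        # right here, instead of passing silently and failing in production.
        foo("alice")
        fake_x.assert_called_once()
```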
I then run unit tests against the fake io object. I don't mock internals, only boundaries. If for whatever reason i want to test it against the real db i can simply swap out the fake for the real object.
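A minimal, self-contained sketch of that boundary-fake pattern (names invented): the fake repo holds state in a dict, and the real repository could be swapped in without touching the test body.

```python
class FakeUserRepo:
    """In-memory stand-in for the real repository at the IO boundary."""
    def __init__(self):
        self._rows = {}
    def save(self, user_id, data):
        self._rows[user_id] = data
    def get(self, user_id):
        return self._rows.get(user_id)

def rename_user(repo, user_id, new_name):
    # The unit under test: pure orchestration against the repo boundary.
    data = repo.get(user_id)
    data["name"] = new_name
    repo.save(user_id, data)

def test_rename_persists_new_name():
    repo = FakeUserRepo()  # swap in the real repo object to hit the actual DB
    repo.save(1, {"name": "old"})
    rename_user(repo, 1, "new")
    assert repo.get(1)["name"] == "new"
```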
If B also frees that memory then there is a bug. Presumably this means B's tests are wrong/incomplete. If B was mocking A to avoid the IO, you might not find out.
The article does not seem to bring up this way to do it.
I disagree. If you want to send your test suite into the toilet, add a headless browser driver and nondeterministic assertions based on it. Most output that becomes UI can be tested; the rest can be checked by a quick QA.
should be more like
"Enhance your use of Mocks with better unit tests and integration tests".
The listed complaints sound more like problems with sloppy/lazy coding practices than actual problems with mocks.
Not sure how this will solve edge cases problems described at the beginning of the article
From memory the HTTP client 404s every request in testing mode
The article is stating that almost nobody goes through the trouble of implementing a mock database perfectly, they just do something like make a single call return some hard-coded data. While this works a bit, it means that if the database ever changes its interface you have to remember to notice and implement that change as well.
In fact, Mocking is an essential tool for writing _unit_ tests; you know, testing exactly one thing (a 'unit') at a time. In Java for instance, a 'unit' would be a single static method, or a single class. Other languages will have different definitions of these terms, but the essential point would be "smallest reasonable grouping of code that can be executed, preferably deterministically"
The problem is people conflate the various levels of integration tests. You actually should* have both: Full unit test coverage + an integration test to prove all of the pieces work together successfully. Small unit tests with mocks will point you _very quickly_ to exactly where a problem is a codebase by pointing out the effects of contract changes. Large integration tests prove your product meets requirements, and also that individual components (often written by different teams) work together. They are two different things with two different goals.
* Important caveat on the word 'should': testing de-risks a build. However, if your business product is a risk itself (let's say you're betting a startup on NFTs going wild), then your testing should reflect how much of that risk you're willing to spend money reducing. Unit testing in general speeds up development cycles, but takes time to develop. A good software engineering leader recognizes the risks on both the business side and the development side and finds a balance. As a product matures, so should the thoroughness of its testing.
For car crash tests, we should always use full humans. A test dummy might have a lot of sensors and be constructed to behave like a human in a crash, but you'll never get the full crash details with a doll.
Notice the problem here? This argument does not consider the costs and risks associated with each approach.
For testing, IO is very expensive. It leads to huge CI setups and test suites that take multiple hours to run. There is no way around this except using some kind of test double.
Of course you can always pick it apart and refactor so it can be unit tested, but sometimes the effort required makes mocking look pretty appealing.
It's actually super hard to get Postgres to fail, which is what you will be most interested in testing. Granted, you would probably use stubbing for that instead.