As "observable state goes from a to b" is much closer to a business/functional requirement that will still/always be true, regardless of refactorings.
Refactoring in codebases with state-based tests is a pleasure; in codebases with mock-based tests it's tedious, constantly updating tests when no semantic behavior was supposed to change.
Also, mocking via module hacks like in the article (and in the JS world) is scary; modules are basically global variables so it's a very coarse grained slice point. Dependency injection is almost always better.
Dependency injection leads to elaborate, brittle fixtures that become more elaborate and arcane over time.
And the sunk cost fallacy has people working for hours (and I’ve witnessed pairs spend two days, that’s over three man days!) trying to maintain them.
I use mocks to arrange state A without going through all the business rules involved in creating state A. But you have to expose the states to do it, which can make for pretty descriptive code but is outside of some people’s experience and so they resist it.
Earlier tests have already verified all routes to State A, so the next batch of tests takes it as a given. This controls test explosion by utilizing transitivity: A->B and B->C implies A->C. Tons of unit tests for A->B and B->C, and then you only need a couple of functional tests (say, one negative and one positive) of A->C. You're just checking the plumbing isn't broken.
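For concreteness, here's a minimal pytest-style sketch of that layering (all names are made up): focused tests per transition, a helper that jumps straight to state A, and one plumbing test across the whole chain.

    def make_state_a():
        # Test helper: jump straight to state A without replaying all the
        # business rules that normally produce it.
        return {"stage": "A"}

    def advance_a_to_b(state):
        return {"stage": "B", "from": state}      # placeholder business rule

    def advance_b_to_c(state):
        return {"stage": "C", "from": state}      # placeholder business rule

    def test_a_to_b():
        assert advance_a_to_b(make_state_a())["stage"] == "B"

    def test_b_to_c():
        b = {"stage": "B"}        # taken as a given; A->B is verified above
        assert advance_b_to_c(b)["stage"] == "C"

    def test_a_to_c_plumbing():
        # One happy-path check that the pieces are actually wired together.
        c = advance_b_to_c(advance_a_to_b(make_state_a()))
        assert c["stage"] == "C"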
Otherwise, you get Cartesian products. You end up with elaborate, (often custom), mocks that couple all of the tests together in hard to maintain ways. You end up with tests that accidentally test the mocks/fixtures instead of the code.
Some of my current coworkers do this too. I don't know where this pattern comes from. It's often easier to replace their two-page fixtures with two or three lines of mocks per test. It's often almost the same amount of code, but so what: each test can be read, each test can be run individually, and behavioral changes to the code affect mostly the tests you would expect. Those tests can be fixed, rewritten, or just removed as they disagree with the new requirements.
The absolute worst is fixtures with asynchronous code. Those break constantly, and often invisibly. What’s the point of a fire alarm if the damned thing doesn’t work? It’s almost worse than nothing at all.
> The absolute worst is fixtures with asynchronous code
I concur that setting up a test for async code is a complete pain in the arse. It's extra fun when the caller doesn't get a reply, so there's no response that you can assert as valid or invalid. It's extra, extra fun when the tests in your test suite need to run in parallel without stepping on each other.
Here's an example. Say you've got PipelineService123 that receives a message with a chunk of data, transforms that data, and sends it elsewhere. Try thinking about dynamically setting up an instance of PipelineService123, wiring up an input message sender, giving that sender the dynamic address of the instance of PipelineService123, wiring up an output message receiver, making sure the instance of PipelineService123 knows how to reach the receiver, all to test whether or not everything is wired up correctly & that the data is being sent, transformed, and received properly, running this test in parallel with all of the other test cases, and keeping it reasonably easy to reason about. Good luck with that! I'm not sure I even stayed on top of my attempt to describe it :P
I've noticed that my tests are a lot less tortured when the language supports async/await.
But I do wonder if await should be the default behavior, since most async code is still sequential within the operation, at least until we start composing async operations. In which case having the keywords point out the composition probably improves reading comprehension.
In my previous work I had to work on a big asynchronous codebase in JavaScript and I had zero issues with asynchronicity at all, so maybe it's an ecosystem/language-dependent issue?
It depends on what you mean by parallel: Node.js is single-threaded, so every CPU-bound task runs sequentially. But all the asynchronous tests are run concurrently by the test runner (Mocha[1] in my case).
> Dependency injection leads to elaborate, brittle fixtures
I believe that can happen, but personally (N=1) haven't seen it. If anything, (well-done) DI is supposed to prevent that, b/c fixtures are isolated instead of depending on random global behavior.
That said, I also don't like Spring/Guice/Dagger auto-wiring DI (somewhat guessing that is what caused your headaches), and instead just create an "AppContext" type with all of the application's "singletons" and pass that around.
Granted, it's still global-ish, so maybe I'm cheating, but it's nicer IMO than module hacks.
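For illustration, a minimal sketch of that hand-rolled AppContext pattern (all names made up): production code would build it with real implementations in main(), tests with in-memory fakes.

    from dataclasses import dataclass

    @dataclass
    class AppContext:
        user_store: object      # a DB-backed store in production
        mailer: object          # an SMTP client in production

    def register_user(ctx: AppContext, email: str) -> None:
        ctx.user_store.save(email)
        ctx.mailer.send(email, "welcome!")

    class FakeStore:
        def __init__(self): self.saved = []
        def save(self, email): self.saved.append(email)

    class FakeMailer:
        def __init__(self): self.sent = []
        def send(self, to, body): self.sent.append((to, body))

    def test_register_user():
        ctx = AppContext(user_store=FakeStore(), mailer=FakeMailer())
        register_user(ctx, "a@example.com")
        assert ctx.user_store.saved == ["a@example.com"]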
> I use mocks to arrange state A without going through
> all the business rules involved in creating state A
I like tests being able to immediately jump to state A, but fwiw don't see why mocks would be needed to do so.
I do agree re avoiding test explosion/transitivity, and a few functional happy/sad plumbing tests, but again seems orthogonal to state/mock.
> You end up with elaborate, (often custom), mocks that
> couple all of the tests together in hard to maintain ways
Totally agreed. Another -1 for mocks. :-) I.e. with state-based testing you should be able to test "end-to-end" (with state-based / in-memory stubs for your input/output data stores, etc.) without any of the "oh right, copy/paste these 5 lines of 'when method X return result Y' for some 2-5 mocked-out calls" busywork.
(And, to be pedantic, if you mean something other than 'when method X return result Y' for the term 'mock', then we're probably talking about different things.)
> That said, I also don't like Spring/Guice/Dagger auto-wiring DI (somewhat guessing that is what caused your headaches), and instead just create an "AppContext" type with all of the application's "singletons" and pass that around.
> Granted, it's still global-ish, so maybe I'm cheating, but it's nicer IMO than module hacks.
I agree with you. I would like better support for (a) intersection types and (b) excluding global names.
With intersection types, you avoid the global-ish nature, since you can say "this method takes a PersistentUserStore", "this method takes a LoggerInterface", and "the parent method takes a PersistentUserStore&LoggerInterface" which can be passed to both. Lo and behold, the fact that it's global-ish disappears, since at any given point you only depend on the narrow set of capabilities you've actually declared.
Unfortunately, as with all things, PhpStorm does this exactly wrong - so that an intersection type is inspected as tho it were a union type and it gives me no particular benefit. Jetbrains seems less interested in helping me avoid bugs and focuses on helping me avoid reading the php.net manual. I appreciate it when I don't need to check out the manual, but I need to avoid bugs.
As for excluding global names - if there were some way to make it illegal to say `new Foo`, except in the top of the code, for certain values of "Foo", it would be a serious help in keeping my tests of business logic free of implementation details. I'm not entirely sure how to restrict those "certain values of Foo" though.
> I use mocks to arrange state A without going through all the business rules involved in creating state A.
An issue is that unless the types are really nailed down you can easily create invalid or incomplete states, or create a state which is correct at one point and then becomes invalid / incomplete at another.
> Earlier tests have already verified all routes to State A
Unless they also assert the complete state at that point and that assertion is tied into the next state[0] then what they've verified is a series of steps.
Seems risky to assume it matches what an unrelated test starts with, unless you've implemented state restoration, at which point you'd have to add cross-test dependencies such that the result of the tests validating section A can be saved and restored for every test needing a state A to test section B.
[0] that is, ∅ -> A tests end with an identity check for state A (that the entire system has no more and no less than expected) and A -> B tests start with one, right after they've generated their fake state
I don't see how you decided that DI requires some kind of special fixtures that are more complex than mocks.
Mocks are generalized fixtures—just a reusable way to create them. And you can use them with DI. Only the method of delivery is changed, from an implicit backdoor to an explicit argument.
DI puts new requirements on production code and tests that aren't related to a specific mock, that much is true. At least if there's no automatic way to create ‘default’ dependencies.
That was an empirical statement. Inasmuch as it was a value judgement, it was a condemnation of hand-rolling your own mocks. Which people seem to do with an alarming regularity, and with a fairly consistently appalling degree of skill.
Except that with singletons, ‘context objects’ or similar quasi-global state, the effect might be non-isolated, and effects may be silently introduced that the calling code doesn't know about.
A situation in which I've used mocking well: testing parsing of data coming over a serial port. I have had several situations where I needed to parse common data formats coming from hardware over a wire, and mocking that data with recorded streams is super helpful for making sure your parser is working right.
It sounds to me like you've got two things: some hardware code, and some data code. You're not testing the hardware code, you are testing the data code.
If you split them into two places, you could get the same effect, with no mocks. And be more clear about what parts are tested and what aren't.
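For example, with a tiny made-up frame format, the split looks like this: the parser only ever sees bytes, so a recorded byte string is all the test needs -- no serial port, no mock.

    def parse_temperature(frame: bytes) -> float:
        # Pure data code: bytes in, value out. No knowledge of serial ports.
        body = frame.decode("ascii").strip().lstrip("$").rstrip("*")
        label, value = body.split(",")
        if label != "TEMP":
            raise ValueError(f"unexpected label: {label}")
        return float(value)

    def test_parse_recorded_frame():
        recorded = b"$TEMP,23.5*\r\n"   # captured once from real hardware
        assert parse_temperature(recorded) == 23.5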
I really don't like that the topic title is a general statement, but the article under it talks strictly about the Python landscape as an example and doesn't discuss the idea itself.
Mocking by itself works fine. It's a good idea that works if used correctly. Misusing it or bad usage leads to issues - duh.
I'm familiar with the narrative that anything that is too hard (to get right at the first try) in our field means that it is bad - but I don't agree with it at all. We are at the point where craftsmanship should be a good metric to distinguish mediocre developers from experts.
I had the same disappointed reaction. I wanted to read something general about mocking. But then I remembered that Ned writes a lot about Python, so I could've guessed it would be about Python. Most of his readers would know too.
Not everything is written to score points on HN. If the title is confusing here, that's our problem, not the author's.
What title would you prefer? "Your Python mock might not work, but it could still be a good idea if you do it right, and here I will explain how"? :)
I didn't mean to imply that mocking is bad. Is that what you took from it? Why would I explain how to get mocks to work if I thought people shouldn't use mocks?
I’ve recently learned that some actually do consider mocking harmful.
Where I work right now, there's a really outdated and unfashionable fight over the benefit of unit testing in general. Existing engineers don't see value in test-driven development, exhaustive testing, or unit testing, and mocks/spies get lumped in with all of that... and we're a Python shop. I'm utterly confused. I, too, grew concerned after reading this: will I be hearing this cited/twisted as further evidence against investing in our dreadful testing situation?
I'm a dyed-in-the-wool TDDer, but I actually think that mocking unnecessarily is usually harmful. I use it as a technique of last resort.
I should define my terms before I explain, because many people use the term "mock" to mean things it didn't originally mean. Test objects that are used in place of ___domain level objects were traditionally known as "fakes". A fake that represented a fixed known value was called a "stub". A fake that included an assertion that a function was called (or that collected data on function calling) was called a "mock". It's a bit confusing for me that many people use the term "mock" to mean "fake". I found it weird that the original article pointed to an article on faking and then used the term "mock" without referring to the original meaning of that word (which makes me wonder what they mean when they say "fake").
Anyway, usually you want a fake when you don't have access to some part of the system to test it directly. Sometimes that's because it's a completely different service. You can fake out that service so that you can see if the code that interacts with that service is working, without having to actually set up the service.
A stub is useful in situations where you need to know that your code is working with specific values of data inputs. So you might have an object that you pass to a function and you want to know what happens if one of the properties on the object is null. It might be hard to set that up, so you can stub it out.
A mock on the other hand, basically tests if a function is called. A good example where you might legitimately need a mock is where you pass an object to a function and you are expecting that a callback on that object will be called. It's really hard to test that without a mock.
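A tiny sketch with Python's unittest.mock (names made up) shows the distinction: the repo double is a stub (canned return value), while the callback double is a mock in the traditional sense, because the assertion is about the call itself.

    from unittest import mock

    def greet(repo, user_id, on_done):
        user = repo.find_user(user_id)
        on_done(f"hello {user['name']}")

    def test_greet():
        # Stub: canned data, we only care about the value it hands back.
        repo = mock.Mock()
        repo.find_user.return_value = {"id": 1, "name": "Ada"}

        # Mock (traditional sense): the assertion is about the interaction.
        callback = mock.Mock()

        greet(repo, 1, callback)
        callback.assert_called_once_with("hello Ada")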
Where mocks can be dangerous is when you completely mock out any interfaces and stub the return values. You pass a fake object as a collaborator to your function and you test that your function works. The problem is that your fake object may not necessarily represent a real object in the system.
If you ever want to refactor the code, your tests will no longer tell you that a property is missing, or that a function is missing because all of your test code is using fake objects with mocked and stubbed methods. Ideally a unit test that uses an interface should fail when you change that interface. This allows you simply to change an interface somewhere and have your tests tell you exactly what you need to do to make that change work.
Where you end up getting a lot of conflict WRT testing strategy is that some people believe very strongly that unit tests should test things in isolation. Secondly, people believe that unit testing should be a black box testing strategy. So you should test through your public interfaces only, and any collaborator that adheres to the interface contract should work as expected.
In this style of testing, you are often encouraged to mock anything and everything at the interface boundary. This has many advantages. First, it means that your test objects can be very simple, so writing tests is very quick -- even if the code in the system is complex (because you aren't using any of that code). Second, because you are testing the public interface only and there is minimal setup, your tests become documentation of the interface contracts. Third, because it is black box, if you change the implementation of your "unit", you don't have to change your tests.
Despite these benefits, I'm not a big fan of this style. I like white box testing using real collaborators. My goal is not to define interfaces and nail them up -- quite the opposite. I want to be able to change interfaces fluidly. I value ease of refactoring over just about anything else. Second, I want to use real collaborators almost because it is painful. If your collaborator is awkward and brittle to set up in tests, it is also awkward and brittle to set up in production code. My goal is to remove that and to simplify the code. Again, my highest value is my ability to refactor the code. I want the code to become easier to work with over time, not harder and more complex. Finally, I want the code to break at a "white box" level, not a "black box" level when I change behaviour. Ideally, I want my tests to say, "On the third line of that function, we're going to have a problem because that function is different now". I don't want to be aware of problems at a larger scope: "Somewhere in function A that calls function B which calls function C and D there is something wrong because it does something weird".
In the end I write small functions that are tested directly with real collaborators. I avoid private functions because they hide my implementation details. I test at a low level so that I avoid test complexity from excess branching. I get incredible specificity from failing tests, which end up essentially giving me a TODO list for what I need to do when refactoring code.
Hope that gives you some idea of at least why one person avoids mocking -- although, you do need it sometimes. And to be fair, sometimes I'll do a London School, outside in, mock the world implementation if I'm not sure what I'm building. However, I throw away all my mocks and re-TDD once I know what I'm building.
> some people believe very strongly that unit tests should test things in isolation. Secondly, people believe that unit testing should be a black box testing strategy. So you should test through your public interfaces only, and any collaborator that adheres to the interface contract should work as expected.
I think much of the confusion and talking-past-each-other comes from ambiguous language. I actually agree with all the things in the above quote (isolation, black-box, public-only, relying only on specified interfaces). Where I've differed from co-workers is that I consider the appropriate "unit" to be a feature/piece-of-functionality (e.g. "logging in"), whereas they consider the appropriate "unit" to be a piece of code (e.g. a method or class).
I think Michael Feathers explained it the best. He likened unit testing to clamping a piece of woodwork while you are working on it. The bits you are working on need to be in motion because you are working on them. The bits you are not working on need to be clamped in place -- you don't want those things moving while you are working on some other bit. A "unit" is anything you might want to be clamped in place. It can be a function. It can be an object. It can be a subsystem. You want to unit test at different levels of abstraction so that you can "clamp" those levels of abstraction down.
One of the things I've found people get confused with is that they see unit testing and integration testing as orthogonal. They think a unit test should exercise a small piece of code in isolation and an integration test should test examples of real collaborators. Frequently they mock out all their unit tests and write a few integration tests. Then their unit tests become brittle and annoying and so they delete them, leaving only a few integration tests. This leads a lot of people with the impression that only integration tests are useful. If we can back up and redefine "unit test", then the problem disappears.
I read your "rant". Don't even get me started on BDD :-) Originally people had problems understanding the purpose of TDD because the word "test" had them confused. They would think, "I need to write tests to ensure that this is working". They didn't think about it in terms of clamping the behaviour so that it doesn't change when you are working on another part of the system. For that reason, a lot of people discussed changing the word "test" to something else that truly embodied what TDD was all about. Many people hit on the word "behavior" -- you want to document the current behaviour of your "units" (at different levels of abstraction). Somehow this got totally confused with automated acceptance testing! Now we have things like cucumber (which I don't actually hate, but it accomplishes a completely different goal than TDD!)
What really frustrates me is when I talk to people about this stuff and they think I'm a complete lunatic :-)
Ian Cooper reminds us of Kent's original proposition for TDD and the misunderstandings that occurred along the way, and suggests a better approach to TDD, one that supports development rather than impeding it.
I'm not sure what your objection is. People try to use mocks, and they don't work because they've mocked the wrong name. I explain this in the article. The article is about why their mock didn't work. How is this a misleading title? How is this a title I don't agree with?
Are we talking about the same piece and the same title? You seem to be under the impression that I am trying to tell people not to use mocks. Have you read the piece?
I notice that the title on HN is "Why a mock doesn't work," which could be interpreted as "Mocks don't work." My title was (and still is) "Why your mock doesn't work." I don't know if that is the source of the confusion.
Now you seem to be willfully misunderstanding. He states clearly that your article is specifically about mocking in python. But mocking as a concept need not relate to python at all. Hence the confusion.
A better title for your article would be "Why a mock doesn't work in python"
I can see why adding "Python" would help here on Hacker News. The original complaint seemed to go deeper than that, I think because the title here is different than the title on my site.
Personally I've actually come to prefer such titles—where the thesis is in the title, and I can judge at once whether I should read for details. I'd aim for about 200 characters. HN is already full of uninformative links.
Since we're in this discussion, it'd be fun to change the title to “Why Python tests may fail to mock imported module functions, and a better approach to doing that”.
For a person who does not code very large programs in Python, this looks scary, and hardly like "only one way to do it" where the most elegant solution gives you what you want. How I import things (or the libraries I depend on!) affects how I can write my tests? Really?
My own Python scripts are typically single-screen in length. And one-off stuff, where they either work or don't, basically.
Is this stuff really what developers of large-scale Python programs have to take into account? Or is this blog post misinformed, because there is an obviously better and standard way?
No, this blog post is absolutely on point and correct. It can be extremely inconvenient and complicated to get around using mocks in many situations where you want to test things, so we want to use mocks when appropriate. Then, you definitely want to mock at the most specific level possible.
While mocking in a way that is specific to how modules are imported is technically "fragile", in that it is deeply dependent on the structure of the code being tested, this is not an issue in practice, because the mocks are present in our test suites that run for every code change. If a code change moves around module assignments, our test will fail, and we know that we have to adjust the test to accommodate the change. With appropriate continuous integration and code review practices, a broken mock-oriented test can't be inadvertently pushed into a repository.
This does mean that when using mocks, you need to make sure the mocks were called in the way that was expected, for those cases where the code might silently stop using the mock in a way that wouldn't otherwise be detectable. The "os.listdir()" example in this blog is a pretty common case: using mocks to test code that works with filesystems, where you don't need or want to get involved with actually creating filesystems, which may be a complex and expensive process, especially if the test suite runs concurrent processes. If you mock the behavior of "os.listdir" to return a series of results, then after the test code has been exercised, you usually want to assert via the mock's mock_calls that the test code did in fact call the functions expected, unless it's clear in some other way that the test code definitely used that information.
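For concreteness, a minimal sketch of that pattern (myapp and collect_filenames are made-up names; this assumes myapp does "import os" and calls os.listdir):

    from unittest import mock
    import myapp   # hypothetical module that does "import os" and calls os.listdir()

    def test_collect_filenames():
        with mock.patch("myapp.os.listdir", return_value=["a.txt", "b.txt"]) as ld:
            result = myapp.collect_filenames("/data")
        assert result == ["a.txt", "b.txt"]
        # Verify the code under test actually consulted the fake filesystem:
        assert ld.mock_calls == [mock.call("/data")]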
An example of this kind of code that I just helped someone with can be seen here: https://github.com/sqlalchemy/alembic/commit/02a1bf3454acb7b... The Alembic test suite has a lot of test cases that go through all the trouble to build up real directory structures to test things, but that's a lot more work than just using a mock, so I use them where I can get away with this simpler approach.
> The "os.listdir()" example in this blog is a pretty common case, using mocks to test code that works with filesystems, where you don't need or want to get involved with actually creating filesystems which may be a complex and expensive process, especially if the test suite runs concurrent processes.
Alternative: have an abstract base class that describes a file system API. Have one implementation that builds on top of OS primitives and have another one that is a mock (potentially auto-generated through some mocking framework). That way there is no need to monkey-patch standard library functions at runtime.
I did that within a project of mine, written in Go:
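A minimal Python sketch of that abstract-filesystem idea (class and function names here are made up, not taken from the linked project):

    import abc, os

    class FileSystem(abc.ABC):
        @abc.abstractmethod
        def listdir(self, path: str) -> list: ...

    class RealFileSystem(FileSystem):
        def listdir(self, path):
            return os.listdir(path)

    class FakeFileSystem(FileSystem):
        def __init__(self, entries):
            self.entries = entries
        def listdir(self, path):
            return self.entries.get(path, [])

    def count_logs(fs: FileSystem, path: str) -> int:
        return sum(1 for name in fs.listdir(path) if name.endswith(".log"))

    def test_count_logs():
        fs = FakeFileSystem({"/var/log": ["a.log", "b.txt"]})
        assert count_logs(fs, "/var/log") == 1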
Sure but then I have to write my real library code using an abstraction, making my code more difficult to read and maintain; a dependency injection system is then necessary in order to have the correct concrete implementations set up at runtime.
In this sense, mocks are solving the problem of having code that is full of dependency-injected AbstractFooBarFileSystemWithExtraPickles style of code, which is considered to be pretty un-Pythonic. I spent many years with Java and Spring so I can attest to both sides of this equation.
Those of us using Python are using it because it is an interpreted, dynamic scripting language. If I'm coding in something more rigid like C or Go, then I'd expect to have a more complex architecture in order to achieve things that are fairly simple in a scripting language.
>Sure but then I have to write my real library code using an abstraction, making my code more difficult to read and maintain
Or easier - as the abstraction can simplify the interface, localize the logic for directory work, make it easier to port, and so on. That's the reason most of us use "requests" and not urllib, too, for example, or Joda-Time and not Java's legacy time mess.
>a dependency injection system is then necessary in order to have the correct concrete implementations set up at runtime.
You can write alternative implementations (whether with a base class, interface, traits, or what have you) and use them for production code and for testing without any dependency injection.
>Those of us using Python are using it because it is an interpreted, dynamic scripting language.
The abstractions mentioned come from Smalltalk, which arguably is the same or even more dynamic than Python. Having a class/interface/protocol/trait and a test implementation is by no means a static typing/Java thing, or counter to a dynamic language, as implied here...
> Sure but then I have to write my real library code using an abstraction, making my code more difficult to read and maintain;
It depends, right? If you suddenly wanted to let your existing set of classes read their inputs not from disk, but from some other kind of storage (e.g., files embedded in a Zip file), you'd only need to write one extra class and you're good to go. That would be a lot harder if your code called os.* directly.
> a dependency injection system is then necessary in order to have the correct concrete implementations set up at runtime.
If by dependency injection system you mean invoking one extra constructor in, say, main() and pass the object along as a handle, sure.
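I.e. the entire "DI system" can be this small (hypothetical names, sketching the idea):

    import os

    class RealFileSystem:
        def listdir(self, path):
            return os.listdir(path)

    def count_logs(fs, path):
        return sum(1 for name in fs.listdir(path) if name.endswith(".log"))

    def main():
        fs = RealFileSystem()          # the one "extra constructor"
        print(count_logs(fs, "/var/log"))

    if __name__ == "__main__":
        main()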
well again, I grew up on GOF programming and once I grokked how mocks in Python worked, I was very glad to embrace their approach, which has allowed me to write much simpler code that is more thoroughly tested; I of course still use abstractions to a great degree, but I no longer have to build out an abstraction system when I just want to make sure some fairly straightforward code is fully tested. I no longer have to build out everything as an abstraction when such a system is generally YAGNI, I can use the Python standard library directly.
This allows me to do less work, write and maintain less code, and have better test coverage. It allows my code to be more fully tested even when it has not yet been abstracted, if that's what's in store for it. Patching local imports within the scope of two lines of code is a non-issue thanks to Python context managers. I have much more complicated examples of code that was already plenty complicated and mocks allowed me to get it tested quickly and effectively, instead of having to break it out into even more complexity.
Basically mocks have been all productivity and no downside for me whatsoever, using the Python standard library mock which is extremely well designed.
Yes - totally agree. Sure, mocking everything is a bad idea. And mocking can make changes brittle. But I would much rather just mock HTTP requests, dates, and filesystem stuff than spend a day figuring out how to write some wrapper around HTTP requests to inject in .NET core.
I think they are asking a question about Python, not about mocks.
I was also very surprised Python does it this way. Normally I would expect "import" statements in a language to just rearrange the namespace and have no other effects at all. The author explains that it doesn't work that way:
> “from mod import val” means, import mod, and then do the assignment “val = mod.val”.
In other words, I would expect an import to alter a symbol table. I would not expect it to create a new variable and then perform an assignment.
How is "altering a symbol table" different than "creating a new variable"? Imports have to create a new variable of some sort: their entire point is to provide a name that you can use.
"import mod" defines mod, which didn't exist in your module before.
"from mod import val" defines val, which didn't exist in your module before.
For a moment, let's talk in general terms, not about how Python chose to do it.
Most languages have scopes. Things may exist in your scope, so you can access them. Or they may exist but in some other scope not visible to you where you currently are. Symbol tables track what is in a scope.
A variable is a name associated with a storage ___location. The ___location could be a stack, heap, register, or something way more abstract as long as it behaves as a "place" you can store a value into and read from.
When a variable is created, a name is defined and some storage is arranged for. But that name is also added to a scope. (If it weren't, it would have a name, but nobody could see it.)
There's no reason, once the variable has been added to one scope, that it can't be added to another scope. That is what "import" statements do in a lot of languages.
Python has apparently chosen a very different approach for imports, which is to create a new variable. This isn't necessarily wrong or right, it's just not something I had ever seen a language do before.
they likely are referring to how import mechanisms in other languages operate outside the scope of imperative execution, like Perl's "use" statement, or in the way that a compiled language like Java handles imports. It is exactly the vast confusion that Perl's "use" caused me, even after I used Perl in a professional setting for almost ten years, that allowed Python's "first class object" approach to imports to be one of the most liberating breaths of fresh air I've ever had in my programming career. Of course, imports being imperatively executed causes the nasty problem of import dependency cycles in code but it doesn't even bother me.
It's not specific to compiled languages. An interpreter has access to the symbol table while interpreting your code in the same way that a compiler has access to it while compiling it.
Among the alternative approaches cited by the author, Itamar Turner-Trauring's article [0] presents an example that is more representative (IMO) of what people do for "mocking".
> How I import things (or the libraries I depend on!) affects how I can write my tests? Really?
Well, no. Everything that needs to be understood is explained in the first section, "A quick aside about assignment". Everything else is just a discussion of the logical consequences of this behavior.
If you do an assignment (and "from foo import bar" is an assignment) you're just creating another name for a value: in this case, you're creating a name bar in your namespace (presumably, but not necessarily, a module) that points at whatever the name bar in module foo happened to point at at the moment you created it. If the original (or should I say, one of the previous) names for the value is reassigned, that does not affect what your name is pointing at.
This is really all there is to know, and once you've grokked that, everything in Python falls into place.
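A tiny illustration, assuming a hypothetical module mod containing just "val = 1":

    # mod.py (hypothetical)
    val = 1

    # elsewhere
    import mod
    from mod import val    # effectively: val = mod.val

    mod.val = 2            # rebinds the name inside mod...
    print(val)             # 1 -- the local name still points at the old value
    print(mod.val)         # 2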
>My own Python scripts are typically single-screen in length. And one-off stuff, where they either work or don't, basically. (...) Is this stuff really what developers of large-scale Python programs have to take into account?
Not the only way to go about it, but yes, the underlying concerns are absolutely stuff that developers of large-scale Python programs have to take into account.
At "single screen" length you can do anything you want. Really, anything goes -- and you could ever just stare at the code a lot to find all or almost all bugs.
At 1000s of lines or more (which can get into the hundreds of thousands, or even a million, lines), and especially on growing, actively worked-on code (that gets new features, refactors, etc.), you need to follow other practices to ensure it all works.
If you find yourself fighting with mocks, ask yourself: is there a deeper design problem with my code? I often find that things I can't test easily have crappy design.
Couldn't agree more. I've done a few "coding dojo" sessions where my team and I start from scratch and write a new set of mock-based tests for a piece of existing code, and when it starts to get gnarly, it's always been because of an inconsistent interface, or confusing API of the code under test.
Bob Martin talks about this a lot; your UTs should be thought of as first-class clients of your objects' APIs. If something is hard to test, it's probably hard to use, or abstracted at the wrong level.
I think Ned strikes the right balance, showing risks associated with mock objects without condemning them outright. There is no doubt that people sometimes go overboard with mocking and there is no doubt that there are situations where it is really helpful.
The author doesn't have a problem with mocks; he has a problem with monkey patching. You can use a mock along with dependency injection and never run into the problems the author describes.
Mocking is hard, really hard. In order to mock something you need to imitate its functionality and interface. This means the mock is inherently tightly coupled to the implementation, which is now another dependency in your system.
After trying to work with mock databases and file systems, I've personally found that there's no substitute for the real thing. There's much less maintenance and greater reliability in spinning up a test environment with the exact implementation that will be used in the production environment.
There are cases where mocks are the only practical solution, (embedded systems, distributed systems) but mocking is surely the last resort...
I've wrestled with this problem several times. My conclusion was that mocks are just fine, but this is a wart in that there isn't "one way to do it".
For the most part I can get by with one rule: always mock the module.
with mock.patch('os.listdir'):
will always work, even if it doesn't accomplish what you want.
with mock.patch('mymodule.os.listdir'):
will fail if that module does not explicitly import os and instead does something like from os import listdir (perhaps because a later dev did not realize importing os directly was actually a requirement for the test and changed the code).
The rule is not perfect though. In the above case, the error will actually be
ImportError: No module named os
This can be fixed with e.g.,
assert hasattr(module, 'os'), "os module is not explicitly imported"
as a preamble to your test but ... it is not perfect by any means.
I don't understand what you mean by "will always work, even if it doesn't accomplish what you want." Mocking os.listdir will be useless if your product code's imports don't match it. How is this "one rule" to use?
The os module ships with python, so the function call mocking os.listdir will always succeed even if the code being tested does not use os.listdir
By mocking mymodule.os.listdir you add a requirement that mymodule actually import the os module and take advantage of mock.patch failing loudly if it does not.
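Roughly, the rule in test form (mymodule and its filenames function are hypothetical; this assumes mymodule does "import os" internally):

    from unittest import mock
    import mymodule   # hypothetical; expected to do "import os" internally

    def test_filenames_uses_listdir():
        # Fails loudly at patch time if mymodule stops doing "import os",
        # which is exactly the signal this rule relies on.
        with mock.patch("mymodule.os.listdir", return_value=["x"]):
            assert mymodule.filenames("/tmp") == ["x"]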
I've always been dubious of using mock for tests which involve external API calls; at best they require you to reimplement the API according to the documentation (of which there may be none, or which the API doesn't follow exactly for edge cases (i.e. the things you're meant to be testing)). At worst you're implementing a very small subset of the behaviour of the API and not testing how your code responds to the other behaviour. But I haven't come across other solutions (not that I have a heap of experience here, just contributions to a couple of oss projects).
Mocking inputs from APIs is actually great if someone besides the original developer picks up the codebase, whether at the system level or down to isolated function stubs. This is because it gives insight into the inputs the original developer(s) had been expecting and can elucidate the source under test.
Integration tests fill the hole you're pointing out. You can even have integration tests designed to validate the mock inputs in many cases.
Ned's implicit definition of a mock is narrower than the generally accepted one. He actually described a stub created by monkey-patching. A mock allows for call verifications as well.
There are 3 main categories of techniques for managing dependent components used these days:
You're talking about mocking database calls though. In my line of work (insurance brokerage), we use lots of insurance APIs and they are sometimes very slow (+20 seconds / call) or completely down at random hours. There is simply no way around mocking those API calls if you want a fast and reliable testsuite.
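As a sketch of that kind of test (quoting, fetch_quotes and best_quote are hypothetical names for a thin wrapper around the insurer API):

    from unittest import mock
    import quoting   # hypothetical thin wrapper around the insurer's HTTP API

    def test_best_quote_without_network():
        canned = [{"insurer": "Acme", "premium": 120},
                  {"insurer": "Zen", "premium": 95}]
        # Replace the slow/flaky network call with a canned response.
        with mock.patch("quoting.fetch_quotes", return_value=canned):
            best = quoting.best_quote("policy-123")
        assert best["insurer"] == "Zen"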
Also describes "dinosaur payment company" API sandboxes (stage).
So we end up testing against the production API with a staff member's credit card - well if we want to deploy any time soon.
Or I guess you could mock and cross your fingers that they haven't changed the API recently without telling you. Payment APIs are the most solid, but that's a low bar considering the state of third-party APIs in the real world.
If it wasn't hard for enterprises to build and manage APIs, then Google Apigee and Mulesoft wouldn't be worth billions.
var oldMen = from p in context.Persons where p.Sex == "M" && p.Age > 65 select p;
"context" could be an Entity Framework context that would generate SQL at runtime, an IMongoQueryable that would generate a Mongo query at runtime, or an in-memory List<Person> that would effectively run the equivalent foreach loop with an if condition (overly simplified) while running your tests.
Beautiful explanation for something that tripped me up in early days of using mock/patch. Summary:
(1) Variables in Python are names that refer to values.
(2) (For that reason) Mock an object where it is used, not where it is defined.
The Python mock module is one of those modules I would like to see rewritten from scratch. You won't get it right from first principles; you always need to go to the documentation. That's a sign that something is not right, IMO.
https://martinfowler.com/articles/mocksArentStubs.html
As "observable state goes from a to b" is much closer to a business/functional requirement that will still/always be true, regardless of refactorings.
Refactoring in codebases with state-based tests is a pleasure; in codebases with mock-based tests it's tedious, constantly updating tests when no semantic behavior was supposed to change.
Also, mocking via module hacks like in the article (and in the JS world) is scary; modules are basically global variables so it's a very coarse grained slice point. Dependency injection is almost always better.