If it can cause a regression, it's not internal. My rule of thumb is "test for regression directly", meaning a good test is one that only breaks if there's a real regression. I should only ever be changing my unit tests if the expected behavior of the unit changes, and in proportion to those changes.
A well-known case is the Timsort bug, discovered by a program verification tool. Also well known is the JDK binary search bug that had been present for many years. (This paper discusses the Timsort bug and references the binary search bug: http://envisage-project.eu/proving-android-java-and-python-s...)
In both cases, you have an extremely simple API, and a test that depends on detailed knowledge of the implementation, revealing an underlying bug. Obviously, these test cases, when coded, reveal a regression. Equally obviously, the test cases do test internals. You would have no reason to come up with these test cases without an incredibly deep understanding of the implementations. And these tests would not be useful in testing other implementations of the same interfaces (well, the binary search bug test case might be).
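For the curious: the binary search bug was the classic midpoint overflow. `(low + high) / 2` goes negative once low + high exceeds Integer.MAX_VALUE, which only happens with arrays of more than about 2^30 elements. A minimal sketch of the arithmetic (not the actual JDK code):

    final class Midpoint {
        // Broken: low + high wraps past Integer.MAX_VALUE for arrays
        // larger than ~2^30 elements, yielding a negative index.
        static int brokenMid(int low, int high) {
            return (low + high) / 2;
        }

        // Fixed: the unsigned shift reads the wrapped sum as a positive
        // 32-bit value, so the midpoint stays in range.
        static int fixedMid(int low, int high) {
            return (low + high) >>> 1;
        }

        public static void main(String[] args) {
            int low = 1 << 30, high = (1 << 30) + 2;
            System.out.println(brokenMid(low, high)); // negative (overflow)
            System.out.println(fixedMid(low, high));  // 1073741825, as intended
        }
    }

Which is exactly the point: you would never think to test with a billion-element array unless you knew the implementation computed its midpoint in 32-bit ints.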
In general, I do not believe that you can do a good job of testing an interface without a good understanding of the implementation being tested. You don't know what corner cases to probe.
Using implementation to guide your test generation ("I think my code might fail on long strings") is fine, even expected. Testing private implementation details ("if I give it this string, does the internal state machine go through seventeen steps?") is completely different.
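To make the distinction concrete, here's a sketch of the first kind (JUnit 5, with a hypothetical `truncate` helper): the implementation hunch picks the input, but the assertion only checks observable output.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class TruncateTest {
        // Hypothetical unit under test: caps a string at n characters.
        static String truncate(String s, int n) {
            return s.length() <= n ? s : s.substring(0, n);
        }

        @Test
        void handlesLongInputs() {
            // Implementation knowledge ("might fail on long strings")
            // chooses the input...
            String big = "x".repeat(1_000_000);
            // ...but the assertion looks only at externally visible
            // output, never at how truncate() computed it.
            assertEquals(10, truncate(big, 10).length());
        }
    }

The second kind would instead reach into private state and assert on things no caller can observe; that's the kind that breaks on every refactor.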
That's not what he's saying. He's saying the test should measure an externally visible detail. In this case that would be "is the list sorted". This way the test will still pass without maintenance if the sorting algorithm is switched again in the future. You can still consider the implementation to create antagonistic test cases.
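A minimal sketch of such a test, assuming JUnit 5 and using `Arrays.sort` as a stand-in for whatever sort is under test; nothing in it has to change if the algorithm is swapped:

    import java.util.Arrays;
    import java.util.Random;
    import static org.junit.jupiter.api.Assertions.assertTrue;
    import org.junit.jupiter.api.Test;

    class SortBehaviorTest {
        @Test
        void outputIsSorted() {
            int[] xs = new Random(42).ints(10_000).toArray();
            Arrays.sort(xs); // swap in any algorithm; the test is unchanged

            // Assert only the externally visible property: every element
            // is no smaller than its predecessor.
            for (int i = 1; i < xs.length; i++) {
                assertTrue(xs[i - 1] <= xs[i]);
            }
        }
    }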
One of my colleagues helped find the Timsort bug and recently another such bug (might be the Java binary search, don't remember).
Demonstrating a straightforward version of that recent bug with a concrete edge case basically required a supercomputer. The artifact evaluation committee even complained.
So you can try to test for that based only on output. But it's gigantically more efficient to test with knowledge of the internals.
This sounds like a case where no amount of unit testing would ever have found the bug. Someone found the bug either by reasoning about the implementation or by using formal methods, and then wrote a test to demonstrate it. You could spend your entire life writing unit tests for this function and chances are you would never find out there was an issue. I'd say this is more of an argument for formal methods than for any particular approach to testing.
One doesn't need detailed knowledge of the implementation: if some initial state produces invalid output, we can write a test for that. Though yes, knowing the implementation helps you find the state that produces the invalid result.
Fair enough. And how do you know, before causing a regression, whether your test could detect one? In other words, how can you tell beforehand whether your test checks something internal or external?
"External" functionality will be behavior visible to other code units or to users. If you have a sorting function, the sorted list is external. The sorting algorithm is internal. Regression tests are often used in the context of enhancements and refactorings. You want to test that the rest of the program still behaves correctly. Knowing what behavior to test is specific to the ___domain and to the technologies used. You can ask yourself, "how do I know that this thing actually works?"
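For the sorting example, "how do I know it actually works?" has a sharper answer than "the output is sorted": an empty result is sorted too, so the output must also be a permutation of the input. A sketch that checks both at the external boundary (JUnit 5):

    import java.util.Arrays;
    import static org.junit.jupiter.api.Assertions.assertArrayEquals;
    import org.junit.jupiter.api.Test;

    class SortIsCorrectTest {
        @Test
        void outputIsASortedPermutationOfTheInput() {
            int[] input    = {5, 3, 3, 9, 1};
            int[] expected = {1, 3, 3, 5, 9}; // worked out by hand

            int[] actual = input.clone();
            Arrays.sort(actual); // the unit under test

            // Comparing against a hand-written expected array verifies
            // sortedness AND that no element was dropped or duplicated,
            // without looking at the algorithm at all.
            assertArrayEquals(expected, actual);
        }
    }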
Isn’t the point that internal functions often have a much smaller state space than external functions, so it’s often easier to be sure that the edge cases of the internal functions are covered than that the edge cases of the external function are covered?
So, having detailed tests of internal functions will generally improve the chances that your test will catch a regression.
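For example (a hypothetical sketch): the merge step inside a merge sort only ever sees two already-sorted runs, so its edge cases are easy to enumerate, while the full sort's input space is every possible array.

    import static org.junit.jupiter.api.Assertions.assertArrayEquals;
    import org.junit.jupiter.api.Test;

    class MergeStepTest {
        // Hypothetical internal helper: merges two sorted arrays.
        static int[] merge(int[] a, int[] b) {
            int[] out = new int[a.length + b.length];
            int i = 0, j = 0, k = 0;
            while (i < a.length && j < b.length)
                out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
            while (i < a.length) out[k++] = a[i++];
            while (j < b.length) out[k++] = b[j++];
            return out;
        }

        @Test
        void smallStateSpaceIsEasyToCover() {
            // The precondition (both runs sorted) shrinks the state space:
            // empty run, equal keys, one run exhausting before the other.
            assertArrayEquals(new int[] {}, merge(new int[] {}, new int[] {}));
            assertArrayEquals(new int[] {1, 1}, merge(new int[] {1}, new int[] {1}));
            assertArrayEquals(new int[] {1, 2, 3}, merge(new int[] {2}, new int[] {1, 3}));
        }
    }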
> Isn’t the point that internal functions often have a much smaller state space than external functions
That's the general theory, and it's why people recommend unit tests instead of only the broadest possible integration tests. But things are not that simple.
Interfaces do not only add data; they add constraints too, and constraints reduce your state space. You want to cut your software along the interfaces with the smallest possible complexity and test those pieces; they are what people originally called "units". You don't want to test any high-complexity interface: those tests will harm development and almost never give you any useful information.
It's not even rare that your units turn out to be vertical cuts through your software, so you end up with only integration tests.
The good news is that this kind of partition is also optimal for understanding and writing code, so people have been practicing it for ages.
I agree that they would help in the regression testing process, especially in diagnosing the cause. However, I think those are usually just called "unit" tests, not "regression" tests. For instance, the internal implementation of a feature might change, requiring a new, internal unit test. The regression test would be used to compare the output of the new implementation of the feature versus the old implementation of the feature.
Worth noting that performance is an externally visible feature. You shouldn't be testing for little performance variations, but you probably should check for pathological cases (e.g. takes a full minute to sort this particular list of only 1000 elements).
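A sketch of that kind of guard, using JUnit 5's assertTimeout; the bound is deliberately generous so it only trips on pathological blowups, not on ordinary timing noise:

    import java.time.Duration;
    import java.util.Arrays;
    import static org.junit.jupiter.api.Assertions.assertTimeout;
    import org.junit.jupiter.api.Test;

    class PathologicalPerformanceTest {
        @Test
        void adversarialInputSortsQuickly() {
            // Hypothetical adversarial input: a descending run, the sort
            // of case that can push a naive quicksort to quadratic time.
            int[] xs = new int[1000];
            for (int i = 0; i < xs.length; i++) xs[i] = xs.length - i;

            // Generous bound: catch "takes a full minute", not jitter.
            assertTimeout(Duration.ofSeconds(1), () -> Arrays.sort(xs));
        }
    }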
For features, I need to take the time to think of required behavior. If I just focus on the implementation, the tests add no documentation and I'm not forced through the exercise of thinking about what matters.