What is property based testing?

santoriv · on April 2, 2017

Here's a question that's tangentially related to the article in that it deals with property based testing. I'm curious what the software veterans thing about this:

Should you write property based tests of a third party custom api if you are on a short term contract and pretty sure that no one is going to maintain the testing suite? On my last contract the product quote api and the product order customization UI I was writing were being co-developed at the same time by two different parties. I found a lot of errors in the schema/third party endpoint by randomly generating lots of product orders and tossing them at the endpoint. While this helped a lot in development, it took a fair amount of time to write. Knowing the parties involved I'm 95% certain that the tests were never run again and probably ignored after I left. Is it more cost conscious to just not write the tests and slog through to the hand-off?

I realize that this may be a little hard to answer without knowing the project intimately. I just sometimes wonder if I write tests too often.

AdieuToLogic · on April 2, 2017

Over the years, I've run across this type of concern when authoring test suites which validate external system interaction, not just property-based ones.

The situation you describe appears to be ideal for having test suites which exercise the third-party API in ways the system your team expects to use it, especially if those suites are viewed as executable documentation. Even if others in the organization do not value automated testing (and, yes, this has happened a lot more than I care to remember), time spent on them bring value by having a reference in-house. Key here, IMHO, is regarding these test suites as verification of what the third-party claims in their documentation (if any).

It's important to limit time spent on making these tests, as one wants to avoid making a brittle suite which causes non-trivial maintenance costs for the organization. IOW, try not to go too deep down the rabbit hole.

MaulingMonkey · on April 3, 2017

> I just sometimes wonder if I write tests too often.

I try to do an extremely rough guesstimate as to the cost of not testing:

- Debugging sessions to reproduce latent bugs - Debugging sessions to track down refactoring mistakes - More thorough code reviews because you can't trust the testing suite to catch problems - QA time to catch/reproduce bugs - Costs of downtime when the bugs take out a critical system - End user, customer support, and/or I.T. staff overhead in dealing with the damage caused by uncaught bugs, or working around unsolved bugs

My usual conclusion is that nobody I work with (nor myself) does enough automated testing.

seanwilson · on April 2, 2017

> Should you write property based tests of a third party custom api if you are on a short term contract and pretty sure that no one is going to maintain the testing suite?

That sounds like you're doing their work for them. If I'm accepting risk in a contract, my major assumption would be the third party libraries I'm using are fit for purpose. You surely can't be held liable if the other contractors aren't doing their work to an acceptable standard.

nickpsecurity · on April 2, 2017

It's not quite that simple. Much of the problems of Windows reliability was from 3rd party drivers and apps (esp killer apps). They couldn't ditch the hardware or app vendors because their customers demanded them. So, Windows source was littered with all kinds of provisions to basically work around all that. They eventually forced driver analysis and safer language onto the ecosystem when that was too much. But a key part of their path to dominance and backward compatibility was working around known issues in 3rd party code.

Avoid this when possible obviously. However, it's sometimes a good idea when it's for tie-in to ecosystem that has benefits worth it.

seanwilson · on April 2, 2017

Maintaining Windows is a very complex and special case though. I mean for most contract work you can safely assume the libraries you're going to be using are mostly bug free.

fineline · on April 3, 2017

"Safely assume" is an oxymoron. Assumptions are a form of risk taking. The upside reward is that you save the cost of testing (and thus eliminating) your assumption. The (potential) downside is the complete failure of everything resting on that assumption. In many cases, the upside is small and knowable, the downside is hard to predict or contain - i.e. the worst kind of asymetric risk. Thus as a rule of thumb, assumptions are very bad practice, and should always be called out as liabilities in contract terms - e.g. "The cost of repair or rework due to any bugs found in the provided libraries shall be borne by X."

seanwilson · on April 3, 2017

Do you have any concrete examples where a third party library turned out to be the major factor of a project failing? I've done plenty of projects that rely on a large number of at least moderately popular open source components for example and I've never once had a major problem. Obviously all assumptions carry risk, you should have good contract terms etc. but from experience this element is not a high risk at all. Clients tend to understand the elements that are out of your control. Changing requirements is a vastly higher risk for example.

hibikir · on April 2, 2017

If you are contracting, and the libraries are also custom, and built by different contractors, what you'll get out of your tests is just a tool to accurately know who to blame: I'd much rather have that be done by whoever is in charge of the entire project, but hey, if you have to, you have to.

However, when you are in charge of a project, and it's your ultimate responsibility whether it success or fails, testing libraries and services can be crucial. I once worked on a large distributed system that was built around a third party distributed database. Said database had a bunch of features past what a typical distributed database promised, and a good percentage of them were part of the design.

The people in charge of the project at the time never bothered testing those features thoroughly, and spent a lot of man-years building. When it came time to have everything working together, a whole lot of those special database features just didn't work reliably or quickly, there was no observability to be able to identify the problems, so the project was an unmitigated disaster, and the reasons for this weren't even clear. I came later to try to rescue the project, built a bunch of testing, and got a good picture of which features worked, which didn't, and evaluated the workable uses cases. Then came a redesign around what worked, with me reimplementing some of those broken features in ways that compensated around the database's practical limitations.

Property based testing and load testing can be ignored for toy projects, but you'll be happy you've done your homework early before you build something that will be used in anger.

I just wish property based testing libraries were built by people with more industry focus, instead of just those interested in thought experiments and academic proofs: The ones I've used often needed extensions to make them have industry-level ergonomics, instead of behaving like someone's Ph.D thesis.

platz · on April 2, 2017

What in your opinion is the most important missing extension at a high level?

dorian-graph · on April 2, 2017

I've thought about this same thing before, also in the context of not contractual work, but longterm.

At least 1 thing is that if something goes wrong later on, you can point to the tests to show that _your work_ is no the culprit. I'm curious to hear what the more experienced people have to say on your question.

platz · on April 2, 2017

referencing fuzzing so heavily in the definition places a lot of faith in understanding that there are those subtle varieties of fuzzers that can generate structured data/inputs.

I didn't know fuzzers did that, so I'd guess that others would miss the nuance there too; and it's a rather large part of property based testing

This is a definition that aims to be more "categorically correct" without providing a lot of intuitive explanatory power. (E.g. A monad is a monoid in the category of endofunctors). Maybe that's ok in that was the aim and PBT can't be reduced to a single sentence like a lot of abstract concepts

It also assumes a well-defined definition of a fuzzer

I was also surprised to not see the word 'invariants' mentioned once. Maybe that's very close to "property", but property has other connotations so it still seems useful

bbcbasic · on April 2, 2017

To me property based testing is just checking that a function obeys a certain law by throwing lots of random data at it.

That means you have to define the law in terms of the input and output. Rather than fix the input like you'd do on a regular test and just assert the output.

The hard part is trying to figure out what laws you should expect from your functions.

michaelfeathers · on April 2, 2017

I look at property-based testing the same way I look at Design by Contract. It can be used to find errors in existing systems, but it's also valuable when you are designing.

Just asking yourself what laws a function should adhere to as you are considering writing it function puts you in a different frame of mind and you can end up with simpler software.

nickpsecurity · on April 2, 2017

Yeah, fuzzing is just throwing random data. Property-based is supposed to be more like DbC where it's guided to test the righg things.

paulddraper · on April 2, 2017

Maybe you should write a function that does the right thing, and then use it to tell you the correct output :/

dbcurtis · on April 2, 2017

Not totally silly. Just last Friday I was writing a Python hypothesis-based test using constrained-random stimulus generation. The target function under test is rather complicated because it is coded to be space and time efficient. It is not so easy to read and reason about, surprise, surprise. But there is a straight-forward, easy to ready, while-loops-and-simple-if's implementation that is slow and inefficient, but it's not too hard to convince yourself it produces the correct answer. So I use that in my test driver to generated expected results. It's certainly possible for both implementations to contain bugs, but extremely unlikely for them both to report the same wrong answer for the same stimulus.

di4na · on April 3, 2017

Just nitpicking about "extremely unlikely for them both to report the same wrong answer for the same stimulus".

This assumption is what are "blind implementation voting systems" were based on, and it was proved to not survive that well in practice. It may totally work in your case, but you may be interested in looking at the work on Nancy Leveson at the MIT (all links are from her website)

http://sunnyday.mit.edu/papers/nver-tse.pdf

http://sunnyday.mit.edu/critics.pdf

http://sunnyday.mit.edu/papers/nver2.pdf

http://sunnyday.mit.edu/papers/consistent-comp.pdf

This body of work launched quite a controversy, but well, it is still here...

llimllib · on April 3, 2017

Here's an example I wrote where I test the fast API against the (probably correct) brute-force version: https://github.com/llimllib/ckmeans/blob/master/ckmeans.py#L...

hyperpape · on April 2, 2017

You may be joking, but this can be a good technique for cases where you have a simple but inefficient solution and a complex optimized solution. Ditto for anything that has a special case that can be easily answered.

platz · on April 2, 2017

Also known as an 'oracle'

paulddraper · on April 2, 2017

Half-joking. It can be helpful to verify an optimized implementation with a naive one. That's a small minority of testing though.

Special casing is effectively traditional testing: input is 1, output is 2.

nickpsecurity · on April 2, 2017

I did that for high-assurance systems as an equivalence check to ensure optimizing compiler didn't break things. It's also standard in hardware verification where they equivalence check each step from highest-level form to lowest-level form.

seanwilson · on April 2, 2017

I'm really surprised QuickCheck and similar tools aren't used much outside of languages like Haskell. QuickCheck is easy to implement, simple to learn, the test code is concise and it's really good at finding obscure bugs.

_ibu9 · on April 2, 2017

By the time you discover QuickCheck, you might as well just use Haskell :)

seanwilson · on April 2, 2017

Haskell isn't practical for a lot of projects though just based on where your code has to run and what existing libraries you want to use.

grumpyprole · on April 3, 2017

Yes. But this is more the problem of vendor lock-in than anything to do with Haskell.

seanwilson · on April 3, 2017

Yeah, not knocking the language. It's frustrating how it takes so long for good programming language ideas to defuse into the mainstream. It's crazy how strong type systems have been around for literally decades and we're still using weakly typed dynamic languages for example. TypeScript is a decent compromise for me at the moment despite being a fan of actual functional languages.

KurtMueller · on April 2, 2017

Anybody here do any property-based testing with Ruby? If so, what do you use and what do you test with it?

hyperpape · on April 2, 2017

I don't do Ruby, but the author of this piece has an article on property testing in many different languages, including Ruby: http://hypothesis.works/articles/quickcheck-in-every-languag....

vram22 · on April 2, 2017

Is property-based testing related to equivalence (class) partitioning?

https://en.wikipedia.org/wiki/Equivalence_partitioning

kristianp · on April 3, 2017

The article mentions that smallcheck doesn't qualify as property based testing. However it's hard to see what it is from its description. Any ideas what smallcheck does?

> SmallCheck is a testing library that allows to verify properties for all test cases up to some depth.

https://hackage.haskell.org/package/smallcheck

paradoja · on April 3, 2017

It's like QuickCheck, but instead of testing random values, it tests all values in a certain range.

For example, if the function takes an array of 128 bit values, it's test all the possible arrays of length 0, 1, 2... up to the length you want to test for.

(Actually, not necessarily all, but all that an exhaustive generator generates; the generator would normally go through all values, but it potentially could only test some values, but would not select them at random).