Type I and Type II Errors: The Inevitable Errors in Optimization Experiments

jhayward · on April 9, 2020

I'm reminded of this apt tweet [1]

> If a Type I error is a false positive, a Type II error is a false negative, and a Type III error is getting the right answer to the wrong question, is a Type IV error GIVING IMPORTANT CONCEPTS NUMBERS INSTEAD OF NAMES FOR NO GODDAMN REASON AND CONFUSING GENERATIONS OF STUDENTS

Also, I find the Confusion Matrix[2] to be a helpful reference, especially the multi-colored table with formulae for each condition.

[1] https://twitter.com/mjskay/status/1201380151356989440

[2] https://en.wikipedia.org/wiki/Confusion_matrix

avz · on April 9, 2020

To highlight the persistence of the confusion, compare the confusion matrices in two Wikipedia articles: [1] and [2].

[1] https://en.wikipedia.org/wiki/Confusion_matrix

[2] https://en.wikipedia.org/wiki/Pre-_and_post-test_probability...

(In case edits are made one way or the other: both articles display similar confusion matrices but disagree on the names of the off-diagonal elements. In [1] false positives are called "type I errors" and false negatives are called "type II errors". In [2] false positives are called "type II errors" and false negatives are called "type I errors".)

(Edit: layout)
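For reference, here is a minimal sketch of the naming used in [1], which is also the usual convention in statistics textbooks: rejecting a true null hypothesis (a false positive) is a Type I error, and failing to reject a false null (a false negative) is a Type II error. The function name `classify` and its parameters are made up for illustration.

```python
def classify(predicted_positive: bool, actually_positive: bool) -> str:
    """Name the outcome of one binary test result.

    Assumes the common convention: Type I = false positive,
    Type II = false negative. `classify` is a hypothetical helper.
    """
    if predicted_positive and actually_positive:
        return "true positive"
    if predicted_positive and not actually_positive:
        return "false positive (Type I error)"
    if not predicted_positive and actually_positive:
        return "false negative (Type II error)"
    return "true negative"


# The test says "positive" but the condition is absent.
print(classify(predicted_positive=True, actually_positive=False))
# The test says "negative" but the condition is present.
print(classify(predicted_positive=False, actually_positive=True))
```

The off-diagonal cells of the confusion matrix are exactly the two branches that return an error label.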

ImaCake · on April 9, 2020

I hate the number naming system. I have no idea why anyone would find it preferable to use them for literally any reason. Specificity and Sensitivity already almost mean the right things in common language; they are perfect words for this. The only use for the numbered names is to confuse people so they don't realise you have no idea what you are talking about.

I think some people struggle to grasp that these are actually strict mathematical concepts. But personally, that makes me feel comfortable with them. It is a lot easier to get a math concept consistently right than some of the fuzzier concepts in epidemiology.

oarabbus_ · on April 9, 2020

I particularly liked this:

> I like that "Type 1 error" and "Type 2 error" are both 4 syllables, just like "false negative" and "false positive". So they aren't briefer than the thing they describe (at least out loud, and even by char count it's not a huge win).

henrikeh · on April 9, 2020

Remember Peter and the wolf. The villagers committed several Type I errors first and then a Type II error second.

But, yes, a prime example of terrible naming.

andrewla · on April 9, 2020

Not "Peter and the wolf", you mean "The Boy Who Cried Wolf". There are no villagers in Peter and the Wolf, and no statistical errors committed.

petervm · on April 10, 2020

I believe in Portuguese, at least, that boy is also named Peter.

henrikeh · on April 10, 2020

Yes! You are correct! Somehow I got them mixed up!

earthboundkid · on April 9, 2020

One of the greatest terrible technical names of all time has got to be the philosophers who knew that "intention" was already a word but decided that the opposite of "extension" should be "intension" because WHY SHOULDN'T WE JUST USE HOMOPHONES FOR SIMILAR BUT SUBTLY DIFFERENT CONCEPTS FML

yters · on April 9, 2020

So that's what that means! I remember reading a philosophy article using that word and not being able to make heads or tails of it, constantly thinking "how is intension related to intention?"

yourpalkeith · on April 9, 2020

Thank you for this! May it be shared far and wide until we can stop using these silly codewords.

Scea91 · on April 9, 2020

It is funny that although I have published peer-reviewed papers about evaluation of ML models and could probably discuss many nuances of the process with you if you woke me at 3 AM, I always find it difficult to remember which of [Type I, Type II] is a false positive and which is a false negative. If I have this problem, I suppose almost everyone has it.

I wish people stopped using [Type I, Type II] when we can use clearly superior terminology. This feels to me very similar to amateur software engineers using non-descriptive variable names.

pfortuny · on April 9, 2020

This is exactly my experience. I have never known what type the error I was talking about was...

Imagine calling (for instance) type I groups those which are abelian and type II those which are not... And then not being a professional mathematician. So is xy=yx here? Mmmmhhh type I says yes or was it no?

laichzeit0 · on April 9, 2020

I have the same problem with specificity and sensitivity. Always have to look up the damned terms even though I’ve used them countless times.

ImaCake · on April 9, 2020

They mean what they would mean in common English.

Specificity because "how specific is the test? Does it measure the true positives correctly without getting a whole heap of false positives as well?" You might ask someone to be "more specific" about something so they aren't including irrelevant things in their discussion.

Sensitivity because "how sensitive is the test? Does it detect the needle in the haystack you need to find? How many does it miss?" In common English you might complain that your car brakes are too sensitive - they are too quick to register the pressure from your foot.

Don't forget these are actually strict mathematical concepts. Hope this helps clarify :)
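Since these are strict mathematical concepts, here is a rough sketch of the standard definitions in code: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The counts below are invented purely for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that the test detects: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, not taken from any real test.
tp, fn = 90, 10    # 100 people who actually have the condition
tn, fp = 950, 50   # 1000 people who actually do not

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A test that misses many real cases has low sensitivity (lots of false negatives); a test that flags many healthy cases has low specificity (lots of false positives).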

parekhnish · on April 10, 2020

I think the issue people have with these names is that their English meanings (as you described) make sense when the positive class is not as prevalent as the negative class. If they are equally probable (or worse, if it's the opposite), then the English meanings quickly stop fitting the context.

ashfromconvert · on April 14, 2020

Thank you for your clear explanation. Indeed, Specificity instead of Type I errors, and Sensitivity for Type II errors, make a lot more sense!

twanvl · on April 9, 2020

Imagine mathematicians calling commutative groups abelian? How do you remember if xy=yx there?

jerf · on April 9, 2020

"Abelian" is at least a fresh new concept to hang your own associations off of, with no previous interference, and without interference from the similarly-named "Adelian" groups or something equally stupid.

The problem isn't just that the term is something you've never heard before, but that "I" and "II" is not a very good concept to try to hang them off of. This is relevant to software engineering naming too: In general, you should not use naming schemes that imply properties that don't actually exist in your values. I and II have all sorts of properties that don't apply to the terms in question, most noticeably, they have an order. But, which is "first", false positives or false negatives? They don't have a natural order. Using numbers to name them just gets in the way.

(Especially when there are perfectly serviceable words.)

Math jargon isn't perfect by any means. But it does at least avoid naming things by sheer numbers most of the time, unless there isn't really a choice because it needs a few hundred names right now.

(Similarly, pop quiz: In Kahneman's classification scheme, is System 1 the fast or the slow system? Odds are, even if you get that right, it's because the book title "Thinking, Fast and Slow" is something that stuck and it happens to be in order. It probably wasn't because you remember them by number.)

thaumasiotes · on April 9, 2020

> Imagine mathematicians calling commutative groups abelian? How do you remember if xy=yx there?

Actually, even if you ignore jerf's response, this is different in an important way from the "Type I" / "Type II" terminology.

Group in which the group operation is commutative: "Abelian group".

Group with no guarantees except the group axioms: "group".

The special one is marked and the non-special one is unmarked. In contrast, the designations "Type I" and "Type II" are parallel; it's not at all obvious which one is the default and which one deviates from the default.

beagle3 · on April 9, 2020

Maybe I can help you with that ... the mnemonic I developed is:

Type I error, with probability often denoted by α (alpha), which is the first (I) letter of the alphabet. AL-PH-A stands for ALlegedly PHalse Alarm. Or just fALse-Positive-HA!

(yes, it's stupid, and weird, but has been helping me remember it for nearly three decades now).

andrewla · on April 9, 2020

As another commenter noted, the story of the Boy Who Cried Wolf is an easy mnemonic -- the villagers committed a series of Type I errors, and then a Type II error.

jackallis · on April 9, 2020

what are those "superior terminology"?

amluto · on April 9, 2020

“False positive” and “false negative” are pretty good.

twanvl · on April 9, 2020

They are certainly better than Type I and Type II, but the terminology is still potentially ambiguous (at least to a non-native speaker). What makes a "false positive" false? Is it called a false positive because it is actually a negative, or is it called a false positive because it is a positive for which you made the error of calling it negative?

setr · on April 9, 2020

That much is fine -- it's ambiguous if you don't know the general idea in the first place -- it's true of most things. The problem with type 1/2 is that it's so utterly devoid of memory hooks that even if you recognize it, and know the idea, you can't confidently identify which is which.

thaumasiotes · on April 9, 2020

> or is it called a false positive because it is a positive for which you made the error of calling it negative?

Not to pick on the non-nativeness of the problem here, but that's not really a way you can use "false". I'd be a lot more comfortable calling an underlying positive that tested negative an "unidentified positive" or a "misdiagnosed positive" [this one really is ambiguous in exactly the way you suggest] or anything else that suggested that the positive was there and an error occurred in noticing it, as opposed to suggesting that the positive wasn't there in the first place.

So, it's a fair complaint for non-native speakers, but you just can't choose all your terminology to meet their needs. :/

jdashg · on April 9, 2020

It's a false positive when the test comes back positive, but it was wrong. "Test falsely reported positive"

naasking · on April 9, 2020

Substitute "incorrect" for "false", and it's all clear, ie. "incorrect positive result", and "incorrect negative result".

RosanaAnaDana · on April 9, 2020

Even reading this thread, I feel like I need to get a tattoo that says type I : false positive ; type II : false negative

downshun · on April 9, 2020

Mistaken positive and mistaken negative may sound more intuitive.

Too many negativey terms being used: null (hypothesis), false, negative, error

yters · on April 9, 2020

If I accidentally swap "Type I" and "Type II" in my mind, is that a Type I or Type II error?

ashfromconvert · on April 14, 2020

You've just made a Type I error because you've lacked 'specificity' in your mind by swapping the two! :O

selimthegrim · on April 11, 2020

Just to mix things up:

https://statmodeling.stat.columbia.edu/2019/03/12/r-package-...

Volt · on April 10, 2020

What's with the shortened URL?