Berkson's Paradox

alexpetralia · on March 24, 2021

My favorite article on this: https://erikbern.com/2020/01/13/how-to-hire-smarter-than-the...

Arech · on March 27, 2021

Excellent article, thank you very much for sharing! Infinitely better than everything else I read on the topic.

dang · on March 24, 2021

If curious, past threads:

Berkson's Paradox - https://news.ycombinator.com/item?id=18667423 - Dec 2018 (21 comments)

Berkson's Paradox - https://news.ycombinator.com/item?id=8264252 - Sept 2014 (20 comments)

amelius · on March 24, 2021

Am I the only one who dislikes this form of presentation, i.e. as a series of tweets?

rrmm · on March 24, 2021

The series of tweets for me just wasn't illuminating and I didn't get what the actual 'paradox' was given the graphs. But my issue was more that the graphs weren't clear in pointing out what I should be looking at.

Wikipedia was much clearer for me, https://en.wikipedia.org/wiki/Berkson's_paradox , but ymmv of course.

Another good statistical foible to be aware of along with Simpson's.

billynomates111 · on March 25, 2021

If anyone's impatient like me, this example from wikipedia helped me get it:

> For example, a person may observe from their experience that fast food restaurants in their area which serve good hamburgers tend to serve bad fries and vice versa; but because they would likely not eat anywhere where both were bad, they fail to allow for the large number of restaurants in this category which would weaken or even flip the correlation.

klyrs · on March 25, 2021

Having selected for interesting content: quality of presentation is inversely correlated to length of a tweet thread?

anitil · on March 24, 2021

I always go to threadreaderapp.

In this case it's https://threadreaderapp.com/thread/1373266475230789633.html

Edit to add: In this case I'd recommend wikipedia, the thread is quite short and light on details

Dylan16807 · on March 25, 2021

> In this case I'd recommend wikipedia, the thread is quite short and light on details

To each their own, I guess. Sometimes a short explanation is plenty.

anonymousiam · on March 24, 2021

No, you aren't the only one. It has become even worse now that Twitter will not render without JavaScript enabled. Unfortunately, I still do not know what Berkson's Paradox is because I will not enable JavaScript for Twitter.

anonymousiam · on March 24, 2021

Okay, I googled it. A non-hostile site hosts a definition here: https://en.wikipedia.org/wiki/Berkson%27s_paradox

junippor · on March 24, 2021

Thank you. Does anyone understand the difference between this and Simpson's paradox?

kgwgk · on March 24, 2021

The latter appears when analyzing subgroups gives a different result than analyzing the pooled data.

The former is about correlations that appear in samples which are not representative of the general population, due to the way that those samples are selected.

junippor · on March 24, 2021

> The latter appears when analyzing subgroups gives a different result than analyzing the pooled data.

> The former is about correlations that appear in samples which are not representative of the general population, due to the way that those samples are selected.

You just said the same thing twice. Think about it.

For one you used terms like "subgroups" and "pooled data" and for the other "samples" and "general population". Those are the same things.

Then you used "[the effect] appears in" and in the other "correlations". Well, Simpson's paradox can also manifest itself in correlations. So you just said the same thing twice.

FeepingCreature · on March 25, 2021

Simpson's paradox: analyzing trends per subgroup can give a different result than pooled data.

Berkson's paradox: analyzing a single subgroup selected with a function aggregating two traits (additively?) will indicate an anticorrelation between the traits.

Simpson's paradox says you can't judge group trends from subgroup trends. Berkson's paradox says given a group selected in a specific way, it will have a certain property in itself. They're just different statements.
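If it helps, here is a rough sketch of the "selected additively" case. It assumes two independent standard-normal traits and a made-up cutoff on their sum (`THRESHOLD` and the sample size are my choices, nothing canonical), and uses a hand-rolled `pearson` helper so it runs on the standard library alone.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Assumption: the two traits are independent standard normals.
people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]

THRESHOLD = 1.0  # hypothetical cutoff on the summed score

# Selection looks only at the sum of the two traits (a diagonal cut).
selected = [p for p in people if p[0] + p[1] > THRESHOLD]
rejected = [p for p in people if p[0] + p[1] <= THRESHOLD]

corr_all = pearson(*zip(*people))
corr_selected = pearson(*zip(*selected))
corr_rejected = pearson(*zip(*rejected))

print(f"everyone: {corr_all:+.2f}")
print(f"selected: {corr_selected:+.2f}")
print(f"rejected: {corr_rejected:+.2f}")
```

With this diagonal cut the whole population should show a correlation near zero while the selected group is clearly anticorrelated. The rejected group also comes out negative here, but that depends on the shape of the cut rather than being a general property.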

Pyramus · on March 25, 2021

Yes and no.

Berkson's paradox is a special case of Simpson's for the two subgroups selected and non-selected.

The difference is that Berkson's paradox involves selecting the subgroup a posteriori and in a particular way, Simpson's paradox assumes a selection a priori.

kgwgk · on March 25, 2021

Another difference is that Simpson's "paradox" involves all the subgroups that the full population is partitioned into, unlike Berkson's "paradox".

junippor · on March 25, 2021

I like how you put paradox in quotes. It also annoys me when people call these things paradoxes. They're more properly called counter-intuitive phenomena. I wonder if there's a single-word name for that.

kgwgk · on March 25, 2021

paradox :-)

junippor · on March 25, 2021

So I actually checked and... turns out you're right.

According to Wikipedia, "paradox" can either mean "logically self-contradictory statement" or a "statement that runs contrary to one's expectation". I always thought that it meant the former only.

These two concepts should really really have separate words.

getlawgdon · on March 25, 2021

You "checked Wikipedia," is that it? You're done now?

junippor · on March 25, 2021

You sound like you're trying to make a point. Make a point.

doubleunplussed · on March 24, 2021

Eh. There is intentional splitting into subgroups, and there is accidental selection bias. I think that's the difference.

caddemon · on March 25, 2021

I think Berkson's paradox is more specific than just correlations arising from non-random sample selection. Correlations that are not representative of the general population could still be useful, if it's a meaningful correlation within some subgroup of interest. The problem is when the features you are correlating relate too closely to the features that were used for sample selection - then you can end up with a trivial result.

I've always learned of Simpson's paradox as relating more to different sample sizes when partitioning data, which can happen entirely arbitrarily - for example a baseball player getting injured part way through the season.

The fact that one player's at bats get partitioned differently than another's is not caused by the on field performance, so there's no "double dipping" going on like I would imagine with Berkson's. Conversely I'm having trouble fitting a Berkson's example into the framework of Simpson's paradox, since there's no reason the poorly-selected subpopulation can't theoretically be exactly half of the general population. And if all of the samples are of equal size Simpson's paradox doesn't exist anymore (because with equal bin sizes the mean of means is equivalent to the overall mean).
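To make the at-bats point concrete, here is a small sketch with made-up numbers (the two players and their hit counts are hypothetical, not real statistics). It covers only the averages/proportions version of Simpson's paradox: unequal at-bats allow a reversal, and with equal at-bats per half the mean of means equals the pooled average.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) for each half of a season; numbers are invented.
player_a = [(4, 10), (25, 100)]   # few at-bats in the first half
player_b = [(35, 100), (2, 10)]   # few at-bats in the second half

def rates(halves):
    """Batting average within each half, as exact fractions."""
    return [Fraction(hits, at_bats) for hits, at_bats in halves]

def overall(halves):
    """Batting average over the pooled at-bats."""
    total_hits = sum(hits for hits, _ in halves)
    total_at_bats = sum(at_bats for _, at_bats in halves)
    return Fraction(total_hits, total_at_bats)

# A beats B in each half, yet B beats A over the whole season.
a_wins_each_half = all(a > b for a, b in zip(rates(player_a), rates(player_b)))
b_wins_overall = overall(player_b) > overall(player_a)
print(a_wins_each_half, b_wins_overall)  # True True

# With equal at-bats per half, the mean of the half averages is the pooled average.
equal_halves = [(40, 100), (25, 100)]
mean_of_means = sum(rates(equal_halves)) / len(equal_halves)
print(mean_of_means == overall(equal_halves))  # True
```

Exact fractions are used so the comparisons are not affected by floating-point rounding.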

kgwgk · on March 25, 2021

> I've always learned of Simpson's paradox as relating more to different sample sizes when partitioning data

When you look at proportions based on binary outcomes it may be related to imbalanced groups but it's more general than that.

In the context discussed here of correlations between continuous variables the groups can be of similar size.

See for example the chart here: https://towardsdatascience.com/simpsons-paradox-d2f4d8f08d42
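A minimal sketch of that situation, not taken from the linked article: two equal-sized groups built so that y falls as x rises inside each group, while the second group sits higher on both axes. The group offsets, noise level and sizes are assumptions chosen only to make the effect visible.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so reruns give the same numbers

def make_group(offset, n=500):
    """Points with a downward trend, shifted by `offset` on both axes."""
    points = []
    for _ in range(n):
        dx = rng.uniform(0, 1)
        x = offset + dx
        y = offset - dx + rng.gauss(0, 0.1)  # y decreases as x increases
        points.append((x, y))
    return points

group_a = make_group(0.0)
group_b = make_group(3.0)   # same size, shifted up and to the right
pooled = group_a + group_b

corr_a = pearson(*zip(*group_a))
corr_b = pearson(*zip(*group_b))
corr_pooled = pearson(*zip(*pooled))

print(f"group A: {corr_a:+.2f}")
print(f"group B: {corr_b:+.2f}")
print(f"pooled:  {corr_pooled:+.2f}")
```

Both groups have 500 points, so the sign flip between the per-group and pooled correlations here does not come from imbalanced group sizes.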

caddemon · on March 25, 2021

Interesting, I only ever heard of Simpson's paradox in the context of comparing overall averages versus subgroup averages.

I guess this paradox could then be thought of as a special case of Simpson's paradox? Since the out group will exclude people with both traits there should also be a negative correlation there, which disappears in the overall population. But in Berkson's case it seems they're implying the subgroup correlation is spurious whereas with Simpson's it could go either way.

kgwgk · on March 25, 2021

> Since the out group will exclude people with both traits there should also be a negative correlation there

Not necessarily. Imagine the traits are distributed uniformly and independently in [-1, 1]. There is no correlation:

    ******
    ******
    ******
    ******
    ******
    ******

If you select people with at least one positive trait you will find negative correlation in the group + but the correlation will still be zero in the group -.

    ++++++
    ++++++
    ++++++
    ---+++
    ---+++
    ---+++
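The picture above can be checked numerically. This is only a sketch: the sample size and seed are arbitrary, and `pearson` is a hand-rolled helper so nothing beyond the standard library is needed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so reruns give the same numbers

# Traits distributed uniformly and independently in [-1, 1].
people = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40_000)]

# Group "+": at least one positive trait. Group "-": everyone else.
plus = [p for p in people if p[0] > 0 or p[1] > 0]
minus = [p for p in people if not (p[0] > 0 or p[1] > 0)]

corr_all = pearson(*zip(*people))
corr_plus = pearson(*zip(*plus))
corr_minus = pearson(*zip(*minus))

print(f"everyone: {corr_all:+.2f}")
print(f"group +:  {corr_plus:+.2f}")
print(f"group -:  {corr_minus:+.2f}")
```

For this L-shaped region the correlation in group + works out to -4/11 (about -0.36) if you do the integrals by hand, so the simulated value should land close to that, while group - and the full square stay near zero.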

caddemon · on March 25, 2021

Makes sense, I was picturing more of a diagonal boundary but you're right the paradox doesn't specify the shape of the boundary. Thanks!

kgwgk · on March 25, 2021

I don't think so.

In the first one you have a partition in subgroups A and B (or more than two) which show similar correlations, different from the correlation seen in A+B.

In the second one you have only a subgroup A (the implicit complement notA is not observed) where the correlation is not the same as in the (unobserved) full population A+notA. Nothing is said about the correlation in notA. It could be on either side of the correlation in the full population, while in Simpson’s paradox both subgroups are on the same side.

Edit: and I also mention "due to the way that those samples are selected" for Berkson's paradox where the selection is based on the variables of interest while in Simpson's paradox the subgroups are "external" (but influence the correlation between those variables).

1vuio0pswjnm7 · on March 29, 2021

You do not need to enable Javascript, you only need to change your User-Agent header to one that is acceptable.

lurquer · on March 24, 2021

Of the tweets I bother to read, I’ve found that the more interesting the tweet, the more likely it is to be poorly formatted.

/meta

SilasX · on March 24, 2021

You're not alone. I think it caught on because a long article (even with pictures) might seem like too much of an investment to a lot of people but a self-contained tweet that keeps getting extended is less intimidating.

TBH, I'd say it's less that I dislike this form of presentation than that I hate all the anti-pattern bloat that Twitter adds, like clickable items not being detectable by extensions and previews being cut off.

rkagerer · on March 25, 2021

Yes it's one of the reasons I hate Twitter. It was designed with aversion to substance. Personally, I find older fashioned forums (with small communities of experts) more illuminating.

tejtm · on March 24, 2021

From the Wikipedia page, it seems to be a generalization of how sampling below the Nyquist rate can lead to an incorrect interpretation of waveforms, but in more dimensions.

smitty1e · on March 24, 2021

I don't understand why a good book/good movie are even included here.

Two different media for (occasionally) related work.

Calling whatever inverse relation was somehow crafted a "paradox" seems tendentious.

reactchain · on March 24, 2021

The argument here is that "smart people are worse looking" is actually a case of "_of the people you encounter_, smart people are worse looking", but that overall there is no correlation. This makes sense, but I think it's more complex. If you took the entire population, I think you could still conclude that "smart people are worse looking" if you define smart to include non-innate, learned behaviour, for the simple reason that good looking people have an easier time in life (getting jobs and so forth) and are therefore less compelled to spend time and effort becoming "smart". So there's a self-balancing aspect that produces these correlations in the general population as well.

Kranar · on March 25, 2021

This doesn't make much sense and I think may actually be another instance of this paradox. For example, why would having an easier time in life dissuade someone from putting in effort to become smart (by your definition of smart)?

Do you think people who have a hard time in life are compelled to study hard and succeed, as if somehow people living in poverty or in third world countries are putting in significant amounts of effort to become smart? Of course not, not because people in poverty don't want to be smart of course, but because they are compelled to deal with time consuming hardships.

People who have it easy in life are far more compelled to study, to the point that the word "scholar" derives from the Greek word for "leisure" (scholē).

I wouldn't be surprised if drawing out two axes, one measuring an individual's hardship in life and one measuring how "smart" they are, revealed how paradoxical your statement is. The overall population would show that hardship places a huge burden that inhibits one's ability to learn and pursue intellectual endeavors while having an easier time in life facilitates it... and yet if you then filtered out the bottom left group (hard life and low "smart" score), you'd see the exact inverse correlation that Berkson's Paradox is all about.

reactspa · on March 25, 2021

As a Taleb follower, this concept seems similar to Survivorship Bias or Selection Bias.

caddemon · on March 25, 2021

I'd say it's a type of selection bias, but yeah seems very closely related to survivorship bias.

Also "survival" taken literally is kinda interesting to think about in this framework. Like say there was some disaster so that the vast majority of people surviving would be either athletic or smart. This subset would likely have a negative correlation between athleticism and intelligence, even if they correlated positively in the general population. Except in this scenario the subset IS your new population.

So I wonder if there are real life traits that correlate negatively across all modern humans, but had no such correlation among our ancestors. Or is there too much regression to the mean with reproduction? Particularly if "opposites attract" is true.
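A toy version of the disaster scenario, under loud assumptions: the two traits are bivariate normal with a population correlation of 0.3, and a person survives only if at least one trait exceeds a cutoff. `RHO`, `CUTOFF` and the sample size are invented for illustration, and "vast majority" is simplified to "all".

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(7)  # fixed seed so reruns give the same numbers

RHO = 0.3      # assumed positive correlation in the general population
CUTOFF = 1.0   # hypothetical bar for counting as "athletic" or "smart"

def person():
    """One (athleticism, intelligence) pair with correlation RHO."""
    athleticism = rng.gauss(0, 1)
    noise = rng.gauss(0, 1)
    intelligence = RHO * athleticism + (1 - RHO ** 2) ** 0.5 * noise
    return athleticism, intelligence

population = [person() for _ in range(20_000)]

# Simplification: you survive only if at least one trait clears the cutoff.
survivors = [p for p in population if p[0] > CUTOFF or p[1] > CUTOFF]

corr_population = pearson(*zip(*population))
corr_survivors = pearson(*zip(*survivors))

print(f"before the disaster: {corr_population:+.2f}")
print(f"among survivors:     {corr_survivors:+.2f}")
```

Under these assumptions the sign should flip from positive before the disaster to negative among the survivors. Whether that negative correlation would persist across generations is a separate question the sketch says nothing about.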