It would be interesting to know which age group(s) most HN readers fall into.
Voluntary response poll results will not provide reliable data on this question. You may have seen this FAQ with the previous polls. (See also pg's comment,
https://news.ycombinator.com/item?id=5537023
which I imagine will continue to be the top comment in this thread.)
As I commented previously when we had a poll on the ages of HNers, the data can't be relied on to make such an inference. That's because the data are not from a random sample of the relevant population. One professor of statistics, a co-author of a highly regarded AP statistics textbook, has tried to popularize the phrase "voluntary response data are worthless" to go along with the phrase "correlation does not imply causation." Other statistics teachers are gradually picking up the phrase.
-----Original Message----- From: Paul Velleman [[email protected]] Sent: Wednesday, January 14, 1998 5:10 PM To: [email protected]; Kim Robinson Cc: [email protected] Subject: Re: qualtiative study
Sorry Kim, but it just aint so. Voluntary response data are worthless. One excellent example is the books by Shere Hite. She collected many responses from biased lists with voluntary response and drew conclusions that are roundly contradicted by all responsible studies. She claimed to be doing only qualitative work, but what she got was just plain garbage. Another famous example is the Literary Digest "poll". All you learn from voluntary response is what is said by those who choose to respond. Unless the respondents are a substantially large fraction of the population, they are very likely to be a biased -- possibly a very biased -- subset. Anecdotes tell you nothing at all about the state of the world. They can't be "used only as a description" because they describe nothing but themselves.
http://mathforum.org/kb/thread.jspa?threadID=194473&tsta...
For more on the distinction between statistics and mathematics, see "Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks"
http://statland.org/MyPapers/MAAFIXED.PDF
and "The Introductory Statistics Course: A Ptolemaic Curriculum?"
http://escholarship.org/uc/item/6hb3k0nz
I think Professor Velleman promotes "Voluntary response data are worthless" as a slogan for the same reason an earlier generation of statisticians taught their students the slogan "correlation does not imply causation": common human cognitive errors run strongly in one direction on each issue, so the slogan has to take the cognitive error head-on. Of course, a distinct pattern in voluntary responses tells us SOMETHING (perhaps about what kind of people come forward to respond), just as a correlation tells us SOMETHING (perhaps about a lurking variable correlated with both things we observe), but it doesn't tell us enough to warrant a firm conclusion about facts of the world. The Literary Digest poll
http://historymatters.gmu.edu/d/5168/
http://www.math.uah.edu/stat/data/LiteraryDigest.pdf
is a spectacular historical example of a voluntary response poll whose HUGE sample size still didn't give a correct picture of reality at all.
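To make the mechanism concrete, here is a toy Python simulation (all numbers are invented for illustration; these are not the actual 1936 figures). When one group is simply more willing to respond, hundreds of thousands of voluntary responses still miss a truth that a random sample of a thousand people captures:

    import random

    random.seed(42)

    # Hypothetical population: 30% hold opinion A, 70% hold opinion B.
    # Suppose A-holders are three times as likely to answer a voluntary poll.
    POP_SIZE = 1_000_000
    TRUE_SHARE_A = 0.30
    RESPONSE_RATE = {"A": 0.60, "B": 0.20}  # invented differential response rates

    population = ["A" if random.random() < TRUE_SHARE_A else "B"
                  for _ in range(POP_SIZE)]

    # Voluntary response "poll": everyone is invited, only some choose to respond.
    responses = [p for p in population if random.random() < RESPONSE_RATE[p]]
    voluntary_estimate = sum(r == "A" for r in responses) / len(responses)

    # Simple random sample of just 1,000 people, all of whom answer.
    srs = random.sample(population, 1_000)
    srs_estimate = sum(p == "A" for p in srs) / len(srs)

    print(f"True share of A:              {TRUE_SHARE_A:.2f}")
    print(f"Voluntary poll ({len(responses):,} responses): {voluntary_estimate:.2f}")
    print(f"Random sample (1,000 people): {srs_estimate:.2f}")

The voluntary estimate settles near 0.56 rather than 0.30 no matter how large the population gets; a bigger biased sample just puts tighter error bars around the wrong answer.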
When I have brought up this issue before, some other HNers have replied that there are some statistical tools for correcting for response-bias effects, IF one can obtain a simple random sample of the population of interest and evaluate what kinds of people respond. But we can't do that here on HN.
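For readers curious what such a correction looks like, here is a minimal sketch of one standard tool, post-stratification weighting, with invented numbers. It presupposes known population shares (say, from a census) for each group of respondents, which is precisely what an HN poll can't supply:

    from collections import Counter

    # Hypothetical known population shares by age bracket (e.g. from a census).
    population_shares = {"18-24": 0.15, "25-34": 0.25, "35-54": 0.35, "55+": 0.25}

    # Hypothetical poll respondents, heavily skewed toward younger brackets.
    respondents = (["18-24"] * 400 + ["25-34"] * 350 +
                   ["35-54"] * 200 + ["55+"] * 50)

    counts = Counter(respondents)
    n = len(respondents)

    # Weight each respondent by (population share) / (sample share), so that
    # over-represented groups count for less and under-represented ones for more.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

    for group in sorted(weights):
        print(f"{group}: sample share {counts[group] / n:.2f}, "
              f"weight {weights[group]:.2f}")

Even then, the reweighting only helps if respondents in each bracket resemble the non-respondents in that bracket, an assumption a self-selected poll gives us no way to check.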
Another reply I frequently see when I bring up this issue is that the public relies on voluntary response data all the time to draw conclusions about reality. To that I refer careful readers to what Professor Velleman is quoted as saying above (the general public often believes statements that are baloney) and to what Google's director of research, Peter Norvig, says about research conducted with better data,
http://norvig.com/experiment-design.html
namely that even good data (and Norvig would not generally characterize voluntary response data as good data) can lead to wrong conclusions if there isn't careful thinking behind a study design. Again, human beings have strong predilections to believe certain kinds of wrong data and wrong conclusions. We are not neutral evaluators of data and conclusions; we have predispositions (cognitive illusions) that lead to mistakes without careful training and thought.
Another frequently seen reply is that a "convenience sample" (the common term among statisticians for a sample that can't be counted on to be a random sample) offers just that, convenience, and should not be rejected on that basis alone. But the most thoughtful version of that reply I have seen in online discussion correctly pointed out that if we know from the get-go that the sample was not drawn in a statistically sound way, then even if we are confident (enough) that HN participants are young, we wouldn't want to extrapolate from that to conclude that the users of any technology site are young, or that users of the Internet as a whole are young.
For my part, I wildly guess that most HNers are younger than I am, partly because this kind of poll recurs so often on HN. The preoccupations of younger rather than older people also make up frequent topics here, and I've looked for signs of large hidden numbers of older participants without finding many.
So you are saying that almost all studies in the social sciences (e.g. psychology) are invalid, because almost all of them require participants to voluntarily sign up for the studies... Also, most psychology studies are done with college students, hardly a representative sample of the population...
This is not new information to social science researchers. Responsible research will attempt to correct for these biases and/or simply acknowledge them upfront and not generalize its results beyond the demographics included in the study.
The second part is true. The first part is partially negated through trickery; what is actually being studied is almost never what the respondents believe is being studied.