Aren't false positives acceptable in this situation? I'm assuming a human (paper author, journal editor, peer reviewer, etc.) is reviewing the errors these tools are identifying. If there is a 10% false positive rate, then the only cost is the wasted time of whoever needs to identify that it's a false positive.
I guess this is a bad idea if these tools replace peer reviewers altogether, and papers get published if they can get past the error checker. But I haven't seen that proposed.
You'd win that bet. Most journal reviewers don't do more than check that data exists as part of the peer review process—the equivalent of typing `ls` and looking at the directory metadata. They pretty much never run their own analyses to double check the paper. When I say "pretty much never", I mean that when I interviewed reviewers and asked them if they had ever done it, none of them said yes, and when I interviewed journal editors—from significant journals—only one of them said their policy was to even ask reviewers to do it, and that it was still optional. He said he couldn't remember if anyone had ever claimed to do it during his tenure. So yeah, if you get good odds on it, take that bet!
Note that the section with that heading also discusses several other negative features.
The only false positive rate mentioned in the article is more like 30%, and the true positives in that sample were mostly trivial mistakes (as in, having no effect on the validity of the paper's message). That sample was also preprints that had not yet been peer reviewed, so one would expect the false positive rate to be much worse after peer review (the true positives would decrease while the false positives remain).
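To make that arithmetic concrete, here's a rough back-of-the-envelope sketch. The numbers are made up purely to illustrate the direction of the effect, not taken from the article:

```python
# Illustrative numbers only -- not from the article.
flags_preprint = 100                              # total issues flagged on a preprint
false_pos = 30                                    # ~30% false positive rate on preprints
true_pos_preprint = flags_preprint - false_pos    # 70 flags are real (mostly trivial) errors

# Peer review fixes many of the real errors, but the spurious flags don't go away.
true_pos_published = 20                            # assume most real errors got caught and corrected
flags_published = true_pos_published + false_pos   # 50 flags remain on the published version

fp_rate_published = false_pos / flags_published    # 30 / 50 = 0.6
print(f"False positive rate after peer review: {fp_rate_published:.0%}")  # 60%
```

In other words, even if the tool itself doesn't change at all, the share of its flags that are false positives goes up once the real errors have been weeded out.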
And every indication both from the rhetoric of the people developing this and from recent history is that it would almost never be applied in good faith, and instead would empower ideologically motivated bad actors to claim that facts they disapprove of are inadequately supported, or that people they disapprove of should be punished. That kind of user does not care if the "errors" are false positives or trivial.
Other comments have made good points about some of the other downsides.
People keep offering this hypothetical 10% acceptable false positive rate, but the article says it’s more like 35%. Imagine if your workplace implemented AI and it created 35% more unfruitful work for you. It might not seem like an “unqualified good” as it’s been referred to elsewhere.
It depends on whether you do stuff that matters or not. If your job is meaningless, then detecting errors with a 35% false positive rate would just be extra work. On the other hand, if the quality of your output matters, 35% seems like an incredibly small price to pay if it also detects real issues.
Lots to unpack here but I'll just say that I think it would probably matter to a lot of people if they were forced to use something that increased their pointless work by 35%, regardless of whether their work mattered to you or not.
> is reviewing the errors these tools are identifying.
Unfortunately, no one has the incentives or the resources to do doubly or triply thorough fine-tooth combing: no reviewer or editor is getting paid; tenure-track researchers who need the service-to-the-discipline check mark in their tenure portfolios also need to churn out research…