Aren't false positives acceptable in this situation? I'm assuming a human (paper author, journal editor, peer reviewer, etc.) is reviewing the errors these tools are identifying. If there is a 10% false positive rate, then the only cost is the wasted time of whoever needs to identify that it's a false positive.
I guess this is a bad idea if these tools replace peer reviewers altogether, and papers get published if they can get past the error checker. But I haven't seen that proposed.
You'd win that bet. Most journal reviewers don't do more than check that data exists as part of the peer review process—the equivalent of typing `ls` and looking at the directory metadata. They pretty much never run their own analyses to double check the paper. When I say "pretty much never", I mean that when I interviewed reviewers and asked them if they had ever done it, none of them said yes, and when I interviewed journal editors—from significant journals—only one of them said their policy was to even ask reviewers to do it, and that it was still optional. He said he couldn't remember if anyone had ever claimed to do it during his tenure. So yeah, if you get good odds on it, take that bet!
Note that the section with that heading also discusses several other negative features.
The only false positive rate mentioned in the article is more like 30%, and the true positives in that sample were mostly trivial mistakes (as in, having no effect on the validity of the paper's message). That sample was also preprints that had not yet been peer reviewed, so one would expect the false positive rate to be much worse after peer review (the true positives would decrease while the false positives remain).
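To make that arithmetic concrete, here's a rough back-of-the-envelope sketch. The numbers are made up purely to illustrate the direction of the effect, not taken from the article:

```python
# Illustrative numbers only -- not from the article.
flags_preprint = 100                              # total issues flagged on a preprint
false_pos = 30                                    # ~30% false positive rate on preprints
true_pos_preprint = flags_preprint - false_pos    # 70 flags are real (mostly trivial) errors

# Peer review fixes many of the real errors, but the spurious flags don't go away.
true_pos_published = 20                            # assume most real errors got caught and corrected
flags_published = true_pos_published + false_pos   # 50 flags remain on the published version

fp_rate_published = false_pos / flags_published    # 30 / 50 = 0.6
print(f"False positive rate after peer review: {fp_rate_published:.0%}")  # 60%
```

In other words, even if the tool itself doesn't change at all, the share of its flags that are false positives goes up once the real errors have been weeded out.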
And every indication both from the rhetoric of the people developing this and from recent history is that it would almost never be applied in good faith, and instead would empower ideologically motivated bad actors to claim that facts they disapprove of are inadequately supported, or that people they disapprove of should be punished. That kind of user does not care if the "errors" are false positives or trivial.
Other comments have made good points about some of the other downsides.
People keep offering this hypothetical 10% acceptable false positive rate, but the article says it’s more like 35%. Imagine if your workplace implemented AI and it created 35% more unfruitful work for you. It might not seem like an “unqualified good” as it’s been referred to elsewhere.
It depends on whether you do stuff that matters or not. If your job is meaningless, then detecting errors with a 35% false positive rate would just be extra work. On the other hand, if the quality of your output matters, 35% seems like an incredibly small price to pay if it also detects real issues.
Lots to unpack here but I'll just say that I think it would probably matter to a lot of people if they were forced to use something that increased their pointless work by 35%, regardless of whether their work mattered to you or not.
> is reviewing the errors these tools are identifying.
Unfortunately, no one has the incentives or the resources to do doubly or triply thorough fine-tooth combing: no reviewer or editor is getting paid; tenure-track researchers who need the service-to-the-discipline check mark in their tenure portfolios also need to churn out research…