The actual claim of this paper is that any watermark can be erased, which it justifies by proving the existence of an Erase function that removes the watermark without degrading quality. Therefore, the argument goes, since any watermark can be erased, we might as well use the simplest one that is visually identical to unmarked text.
This is true, of course, but also vacuous. The problem is that there is no sense of the computational complexity or difficulty of implementing the Erase function. The proof holds even if the watermark can only be removed in O(e^n) or some similarly absurd time span. A good watermark, like a good encryption scheme or a good password, is one that can be verified quickly but must be reversed slowly.
The paper's stance is no different from saying "since any watermark can be reversed, none of them matter, and we should just use THIS WAS WRITTEN BY CHATGPT".
Which is why you should never assume that a paper makes sense merely because it is technically correct.
> That's no different from saying "since any watermark can be reversed, none of them matter, and we should just use THIS WAS WRITTEN BY CHATGPT".
I’m not sure we shouldn’t do this; at least people wouldn’t put false stock in watermark detection methods. As it stands, an undetected watermark only tells you that the text wasn’t copy-pasted directly from an LLM, but what lay people will take away is “this was not produced by an LLM.” That’s a dangerous muddying of the waters, in my opinion.
You can make watermarks that survive manual transcription. But then you have to get a lot more clever than replacing whitespace with different whitespace, and instead encode information in parts a human would preserve. Word choice, spelling choices, punctuation, etc. provide a lot of space to encode a signal. Basically anything used by traditional authorship detection/stylometry.
A few lifetimes back, when I was in high school, I developed a text steganography program that encoded this way. It relied entirely on minor changes to grammar that didn't make the text ungrammatical.
That code is long lost, and I was a kid, so I never characterized it in any meaningful way. But it was interesting and seemed to work. The downside was that you needed a fairly lengthy text to encode anything but very short messages.
I mean, if the detection method is open for people to use, they can perturb the text until it isn't detected anymore; and if it isn't open, how do we know it isn't full of false positives?
The approach proposed in this paper is to watermark LLM generated text using character-substitution from various simple characters (normal whitespace, normal letters, etc) to semantically equivalent Unicode code points (such as U+2004 THREE-PER-EM SPACE instead of normal spaces, or replacing specific character sequences with equivalent ligatures).
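For concreteness, here is a minimal sketch of that kind of substitution (the one-bit-per-space mapping is my own simplification for illustration, not the paper's exact scheme):

```python
# Hypothetical sketch of a whitespace-substitution watermark; the
# paper's actual mapping may differ. Each space carries one payload
# bit: 0 -> ordinary U+0020, 1 -> U+2004 THREE-PER-EM SPACE.
PLAIN_SPACE = "\u0020"
MARK_SPACE = "\u2004"  # renders almost identically to a normal space

def embed(text: str, bits: str) -> str:
    """Swap ordinary spaces for look-alike code points to hide bits."""
    out, i = [], 0
    for ch in text:
        if ch == PLAIN_SPACE and i < len(bits):
            out.append(MARK_SPACE if bits[i] == "1" else PLAIN_SPACE)
            i += 1
        else:
            out.append(ch)
    return "".join(out)

def extract(text: str) -> str:
    """Read the bit pattern back out of the spacing."""
    return "".join("1" if ch == MARK_SPACE else "0"
                   for ch in text if ch in (PLAIN_SPACE, MARK_SPACE))
```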
The authors appear to be entirely aware that this sort of substitution can be trivially stripped out by normalizing down to a simplified character set ("The critical limitation of Whitemark is that it can be bypassed by replacing all whitespaces with the basic whitespace U+0020, then the validator can no longer detect the watermark"), but believe that it still has value because the typical student using an LLM to write their essay won't know anything about Unicode.
This seems a bit naive to me. Implementing the necessary "watermark remover" normalization as a simple webapp would be an easy afternoon project for most of us here, and if this approach reached any sort of widespread use there would be many such sites. Students who intend to cheat by using an LLM to write their essays are entirely capable of learning "there's some secret data hidden in the text so copy-paste it through this other site to strip that out before turning it in". Even without access to such a tool they could simply...retype the text themselves?
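That afternoon project is barely an exaggeration. A naive normalizer along these lines (NFKC folds ligatures and exotic code points back to their plain equivalents; a whitespace collapse handles the space variants) would already defeat the scheme:

```python
import re
import unicodedata

def strip_watermark(text: str) -> str:
    # NFKC compatibility normalization maps ligatures (e.g. U+FB01 "fi")
    # and exotic spaces like U+2004 back to their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any remaining Unicode whitespace down to U+0020. Crude
    # (it also flattens line breaks), but it erases the signal.
    return re.sub(r"\s+", " ", text)
```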
Arguably this still has some value. In most contexts there is minimal downside to watermarking the generated text in this way, and a slight possibility of catching some cases in which people lazily present LLM generated text as human written. However this might give people a misplaced belief that the absence of such a watermark means the text is authentically human authored, which might outweigh the benefits of catching the occasional lazy or ignorant user.
> Students who intend to cheat by using an LLM to write their essays are entirely capable of learning "there's some secret data hidden in the text so copy-paste it through this other site to strip that out before turning it in"
In fact there is precedent for this. When I was at school a lot of kids would start writing an essay by copy and pasting the most relevant Wikipedia article into Microsoft Word, and then edit it to sound different, but this resulted in a subtle light-blue background being inserted into the resulting printed page, which made it very obvious that they had copied from Wikipedia. They quickly learnt that they had to paste it through Notepad or similar first to get rid of the background colour.
Has anyone ever actually wanted a paste from an HTML source into a word processor to drag all the random formatting along for the ride? I still don't understand why that is the default. I see about one email per day with mismatched styles because people are pasting from various documents, each with its own slightly different formatting. No one really cares, presumably, but it's ugly. Give me plain text any day (except that plain text is verboten by the server now, since the company logo gets inserted as an image).
It's usually Ctrl-Shift-V to not include formatting (or get a menu of options, of which that's one), by the way.
I like it when Excel parses an HTML table and recreates it as a spreadsheet. You could paste it as plain text and split it into columns, but I doubt that would work as well.
Reminds me of the time I was in college and a friend of mine in CS 101 wanted my help in stealing somebody else's programming assignment. We had no trouble stealing one from an account which had open permissions but the program had bugs and we had to fix it.
I could hardly comprehend, at that time, how much this was preparation for a career in software development.
Isn't that the key takeaway? Legal plagiarism is rampant in the corporate world. Trying to root it out at the university level, now that LLMs have trivialized content stealing, seems like a waste of time.
This is so very true... Academia has not fully adapted to the reality of a world with the internet and AI, or to the realities of real-world use cases, work environments, and human nature in relation to them.
We're still in the early stages of it, but AI is and will continue to force us to re-explore our relationships with work, productivity, authenticity and what really matters to us about the "human element" in anything.
At least as far as a career in software engineering goes in the United States, the field moves so fast and is so thoroughly differentiated that it seems like it might make more sense from both the student's and the employer's perspective to replace the current university-to-job pipeline with an apprenticeship program of sorts. Given the emphasis placed on internships in college Computer Science programs, this seems to already be implicitly understood and inefficiently implemented on some level.
Come to think of it, the current socioeconomic equilibrium where students take out loans (or pull on their parents' purse strings) to fund their own education to provide more value to future employers than they ultimately get back seems woefully inefficient, not just for software engineering, but for most academic and industrial fields more generally.
Why not run application cycles or even scout students directly out of high school and enroll them in professional programs run by the organizations themselves in exchange for some number of months or years of discounted labor? Obviously, this isn't happening because it transfers risk from individuals to organizations, but it also seems obvious that, were it subsidized or enforced in some way (insurance?), it might lead to better, more equitable outcomes.
Has anyone else had similar thoughts? Or thoughts to the contrary?
Universities are not meant as incubators for office bees. They are meant to let you be curious about whatever field you signed up for. They are meant to be places for critical discussion. In some countries that works fine; in others, not so much. Anyways, universities are not schools.
As popular as ChatGPT is, I'm sure it will only take a few weeks before TikToks are widely circulating instructing users how to un-flag text if this was adopted. It would be so widely known that even non tech savvy students would be searching or asking friends how to get away with using it. Either a web app or saving as ASCII text in Notepad will probably be the preferred approach.
There are so many ways you could catch leakers of sensitive information this way. Look at how often government agencies redact information in PDFs by drawing black blocks over the text.
Note it could be used for authentication in the opposite direction, only accepting text with the unusual spaces in it.
As far as catching the indolent and the ignorant goes, making an example here or there works wonders.
I wonder if it is possible to implement a watermark via patterns of Oxford comma occurrences and similar linguistic styles that would fit into a simple character set.
It's possible to implement a purely informational watermark lots of ways, but no such watermark long survives wide awareness of its use. To know how it works is to know how to defeat it.
This specific scheme is also not remotely novel; I once saw it implemented, something like six or eight years back, in an effort to quell leaks to an industry rag with a habit of posting paragraph-length excerpts verbatim. The rag went on to post some of the watermarked emails too, having stripped the watermarking whitespace before publication.
> I wonder if it is possible to implement a watermark via patterns of Oxford comma occurrences
This would be a glaring stylistic inconsistency in every text produced with a watermark. You could just as well implement a watermark by doing automated thesaurus replacements on certain of the words and using the index of the selected entry as a code.
A watermark that deeply unnerves everyone who reads the text can carry information, but it tends to render the tool itself unfit for purpose.
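For concreteness, a toy version of that thesaurus-index scheme (the synonym lists are invented for illustration) shows just how mechanical the substitutions would read:

```python
# Toy thesaurus-index watermark: each watermarkable word encodes one
# bit via which synonym gets used. Word lists invented for the example.
SYNONYMS = {
    "big":  ["big", "large"],
    "fast": ["fast", "quick"],
    "said": ["said", "stated"],
}
CANON = {w: key for key, words in SYNONYMS.items() for w in words}

def embed(words: list[str], bits: str) -> list[str]:
    out, i = [], 0
    for w in words:
        key = CANON.get(w.lower())
        if key and i < len(bits):
            out.append(SYNONYMS[key][int(bits[i])])  # drops capitalization
            i += 1
        else:
            out.append(w)
    return out

def extract(words: list[str]) -> str:
    return "".join(str(SYNONYMS[CANON[w.lower()]].index(w.lower()))
                   for w in words if w.lower() in CANON)
```

Every forced "stated" where the author would naturally write "said" is exactly the stylistic inconsistency described above.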
> Even without access to such a tool they could simply...retype the text themselves?
There was a story I remember hearing, I think from an older student during high school, or in college from someone recounting their own high school, where some kid was cheating by copying a hand-written paper from another student, and the turned-in paper had two names on it. They had put their own name in the corner and then just blindly copied all the text from the other paper, including the other student's name.
Yeah, someone will certainly make a tool to strip that out. This doesn't seem as useful for cheating, but more as a notice for content that was AI generated. That could be useful just as an indicator for general website content. Presumably there are also trivial watermarks for images/video/audio. Of course it can be easily foiled, so it's more of a disclaimer out of politeness.
Just a fun rejoinder. One of the most important 'papers' [1] in the history of quantum mechanics was a short 'letter to the editor' by Bohr, in which he quickly plugged some numbers into a formula and compared with a recent experiment. Probably didn't take him more than a few hours to write it.
That observation told us that we were on the right track of both predicting and explaining the spectra of atoms.
Of course, this paper is likely nothing like Bohr's work. But sometimes very simple ideas have far reaching consequences.
I'm surprised it took this long for someone to write it up as a paper. I assume there's plenty of other somewhat obvious things that could be written up and published, if they haven't already.
To the comments about this being easy to defeat: when it comes to detecting whether a person submitted a document containing LLM generated text (whether a law document, school essay, work document etc) the real value in a technique like this is high precision, not necessarily high recall.
Yes many people can circumvent this simple watermark technique but for those who don't, it is essentially guaranteed that they used a LLM if their text has clearly atypical unicode marks (Whether U+2004, ligatures, or variant selectors). Thus an organization can feel confident in taking action against the individual who submitted the document.
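The check itself is nearly trivial, which is part of the appeal. A sketch of such a detector (my list of suspicious code points is illustrative, not exhaustive):

```python
# High-precision, low-recall check: flag only code points that almost
# never occur in honestly typed text. Illustrative list, not exhaustive.
SUSPICIOUS = {
    "\u2004",  # THREE-PER-EM SPACE
    "\ufb01",  # LATIN SMALL LIGATURE FI
    "\ufe00",  # VARIATION SELECTOR-1
}

def looks_watermarked(text: str) -> bool:
    """True only when an atypical code point is present, so false
    positives are rare; anyone who normalizes the text evades it."""
    return any(ch in SUSPICIOUS for ch in text)
```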
Whereas right now there are a bunch of dubious "LLM detector" models that output a confidence score that may or may not correspond to whether the person used an LLM. This low precision technique leads to people getting incorrectly accused of using LLM content.
So in my opinion, a world of high precision (but potentially low recall) LLM watermarks using simple techniques is way better than this current high-noise low precision black box world of low quality "LLM detector models"
What I don't get is who will apply the watermark? Certainly not the company running the LLM (why would they degrade the output of a commercial product?). The people submitting text generated by the LLM have no interest in doing so. The only people with an interest in this watermark are the recipients, but they aren't involved in the production.
>why would they degrade the output of a commercial product?
For PR? It's not a degradation for legitimate uses of AI. It only degrades output being used in an attempt to mislead people. Someone using an LLM to e.g. translate would usually be fine admitting they used it. I'm working under the assumption this isn't intended for something like a code model where it would break things, but only for output being used as readable text.
Instead of adding in secret whitespace unicode characters, wouldn't an approach similar to a Bloom filter work pretty well?
Just spitballing here.
1. Identify the ~N most common tokens (let's say N is 5), and call this set S.
2. Restrict the model so that every T tokens it emits, the Tth token must be from the set S.
Maybe you can be clever and say T must be a prime number or something.
Anyway, the quality of the output should suffer minimally, since even though you are constraining the model to pick "sub-par" tokens every T tokens, it still gets to pick from the N most common ones anyway.
And to validate, you simply scan the text and see if every T tokens is always from the set S. If yes, there's a high probability it has the watermark (similar to a Bloom filter, adjust the values to adjust the probability). If no, then it's 100% guaranteed to not have the watermark.
Of course, there are pitfalls. What if you ask the model to generate code, which is full of uncommon symbols? If you happen to get unlucky, then maybe the only token that makes sense at a given position is '}', and if you force the model to select from ['the', 'a', 'not'] etc., it simply cannot produce good output. Still, the approach is interesting if you ask me.
Another pitfall: this is easily circumvented by the end user generating a long text and then randomly adding/removing a few words here and there. This could be solved by changing the simple check of "every Tth token belongs to S" to something like "the average distance between subsequent S tokens is very close to T".
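A rough sketch of the validator side of that idea, treating whitespace-split words as tokens for simplicity (S, T, and the tolerance are all invented parameters):

```python
# Toy validator for the "every T-th token from S" scheme. Treats
# whitespace-split words as tokens; a real system would use the
# model's tokenizer, and would have to account for words from S
# occurring naturally in between, which this toy ignores.
S = {"the", "a", "of", "to", "and"}  # the N most common tokens
T = 5                                # watermark period

def has_watermark(text: str, tolerance: float = 0.5) -> bool:
    tokens = text.lower().split()
    positions = [i for i, tok in enumerate(tokens) if tok in S]
    if len(positions) < 2:
        return False
    # Relaxed rule: the average gap between S-tokens should sit near
    # T, which survives a few inserted or deleted words.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return abs(sum(gaps) / len(gaps) - T) <= tolerance
```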
Also embarrassingly easy to defeat, but I do think it probably still has a place in catching low effort plagiarism especially from people who don't have a good feedback loop for getting caught (think bloggers more than students).
<i>Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts.</i>
If they cannot be distinguished, then there is no need to distinguish them. Seriously.
Worried about someone having an LLM do their work? Why? As long as the work is good, does it matter?
Teachers worried that students will have LLMs do their assignments? You need different assignments and better tests.
LLMs are a tool. There were typewriters, then word processors, then spelling and grammar checkers. Now we have LLMs. Progress is great!
Paying someone on fiver.com to write your essay for you is also difficult to distinguish from completing your assignment honestly, but that's still cheating, and that distinction should be drawn nevertheless. "First there were typewriters, now there's fiver.com, c'mon, it's just another tool"—call it a tool if you want, but it's a tool students aren't allowed to use when being assessed, because it prevents accurate assessment.
It's interesting: I remember an article explaining how Genius.com watermarked their lyrics by embedding a pattern of straight and curly apostrophes (reportedly spelling out "RED HANDED" in Morse code). That approach seems more robust against "text cleaning", which is common when you scrape text; the method in the paper discussed here uses Unicode characters that can be easily cleaned away.
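From what I remember of the reporting, the mechanics were roughly this (a sketch of my own, not Genius's actual code):

```python
# Sketch of the Genius-style watermark: straight vs curly apostrophes
# act as Morse dots and dashes. Details assumed from public reporting.
STRAIGHT, CURLY = "'", "\u2019"

def embed(lyrics: str, morse: str) -> str:
    """morse is a string of '.' and '-' to hide in the apostrophes."""
    out, i = [], 0
    for ch in lyrics:
        if ch in (STRAIGHT, CURLY) and i < len(morse):
            out.append(STRAIGHT if morse[i] == "." else CURLY)
            i += 1
        else:
            out.append(ch)
    return "".join(out)
```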
Not only can the watermarks be deleted easily; they might also get stripped accidentally by post-processing such as tokenization for grammar and spell checking.
Also, I think 3.1 and the proof that follows are pseudo-formalism. A simple sentence explaining the reasoning would be enough.