Somewhat off topic, but I'm thoroughly convinced that there is a very high probability something is AI generated when I see em dashes. Anyone else noticing this?
ChatGPT, for example, almost always uses them. I'm sure they are more common in academic writing, but it's now super common on boards like Reddit.
I've been employing em-dashes extensively since I went on a JD Salinger binge circa 2002. Also, "incidentally", for the same reason. I use "Nb" a lot, from reading a bunch of DFW years ago. Oh, and that very-precise construction he does with "which" all the time, I stole that.
Before LLMs, I think em-dashes mostly signaled that you read books and paid attention to details, to the extent they signaled anything.
To generalize your point: A lot of the "brown m&ms" that we've walked around with for detecting a writer's status, education, etc., are less useful in an age of LLMs.[1]
We might even be entering some waves of counter-signaling.
[1] They'll never totally nail all of DFW's mannerisms, though.
Depending on the context of a given text, the use of certain words or punctuation can very well indicate AI use.
The "original" example was delve. There is no doubt that AI used (and maybe still uses) this word at a significantly higher frequency than the average person. I would say the same about em dashes.
When browsing a Reddit thread about a video game, if you encounter numerous comments written perfectly, especially those containing indicators like em dashes, the word delve, or similar language, it certainly can raise the question: am I genuinely seeing comments from users who write this way in this specific context, or is this content more likely produced by an LLM?
It depends. Em dashes in news articles and written publications? Definitely expected. Em dashes on social media or Reddit? Either someone who works in typesetting, or an LLM. Most likely an LLM, given the dying nature of printed media.
Only typography nerds and professional printers care about things like these. Popular media, even modern professional media, hasn't been paying all that much attention.
I’m not sure the same happened with “delve.” I saw an analysis of paper abstracts showing a clear uptick of “delve” starting with the mass-adoption of ChatGPT. Maybe it suddenly became a trendy word — especially in paper abstracts — or maybe more paper abstracts were edited by ChatGPT.
Combining the various "tells" of an LLM (em dashes, delve, grammatical signs etc) with the context (Reddit comments vs professional setting), you could establish a rough probability it was AI generated. At this point, it's the best we can hope for.
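For what it's worth, here is a rough Python sketch of that "tells plus context" idea. Everything in it is an assumption made up for illustration: the names `count_tells` and `rough_ai_score`, the weights, the base offset, and the context labels are hypothetical and not measured from any data. It only shows the shape of the calculation, not a working detector.

```python
import math
import re

# Hypothetical weights, chosen only for illustration (not fitted to data).
TELL_WEIGHTS = {"em_dash": 1.0, "delve": 1.5}

# Hypothetical context offsets: tells are more surprising in casual venues
# and expected in edited, professional writing.
CONTEXT_BIAS = {"reddit": 0.5, "publication": -2.0}

BASE_LOGIT = -2.0  # assumed prior: most text is treated as human-written


def count_tells(text):
    """Count occurrences of each tell in the text."""
    return {
        "em_dash": text.count("\u2014"),
        "delve": len(
            re.findall(r"\bdelv(?:e|es|ed|ing)\b", text, flags=re.IGNORECASE)
        ),
    }


def rough_ai_score(text, context):
    """Combine tells and context into a rough score between 0 and 1."""
    logit = BASE_LOGIT + CONTEXT_BIAS.get(context, 0.0)
    for name, n in count_tells(text).items():
        # Cap each count at 3 so one long comment does not dominate.
        logit += TELL_WEIGHTS[name] * min(n, 3)
    return 1.0 / (1.0 + math.exp(-logit))


sample = "Let's delve into this\u2014it matters."
print(rough_ai_score(sample, "reddit"))
print(rough_ai_score(sample, "publication"))
```

The same comment scores higher in the "reddit" context than in the "publication" one, which is the whole point: the tells mean little without knowing where the text appeared.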
There are regular folk who tend to be pedantic with their writing. I'm not sure this is a good test of whether text is generated by an LLM. Consider that some may use LLMs to correct spelling or grammar, and the LLMs may often change an en dash to an em dash.
To be clear, it's essentially impossible to know if a given text is autonomously LLM generated (a bot on social media, for example) or is the result of revision of real human effort.
To what extent that distinction matters, I'm not sure.
I've encountered and used em dashes regularly for the last 20 years. If most of your reading and writing are associated with social media, I could see the trend you're describing appearing real within that limited context. But em dashes are not new and have been a feature of high quality writing for many decades.
Yes, several of the most popular models (and even less popular but newly open-sourced ones such as Gemma 3 27b) overuse em dashes. Even when prompted not to use dashes, they almost can't help themselves and include them occasionally anyway, as it must be part of their learned stylometry. It's just not a common symbol to use at all, as most people generally use commas for the same purpose. I can't even remember learning about em dashes in my college English classes.
I submitted an application which I typeset using LaTeX, and some people thought it was AI-generated because of en and em dashes. I have been using these since forever.
If it's posted through a publishing platform (not just a comment on one or on a public site), it's very possible they do an automatic conversion of some of the common cases. That could also be filtering down to comment boxes and stuff, I'm not sure.
That's not to say that generated content doesn't use them, just that using them as an indicator might require a bit of nuance based on where you're seeing them.
There is a special kind of irony in the fact that habits that used to set one apart from the unwashed masses (like the proper use of punctuation) now serve as a signal for being non-human.
I saw a reel the other day where some Young People(tm) were talking about "the ChatGPT hyphen" (an em-dash.) There was much wailing and gnashing of (false) teeth from Old People(tm) in the comments.
Everyone I know who writes a lot, especially for copy or product design, seems to use em dashes more heavily. I've even seen a Drake format meme where he is shaking his head at parentheses, commas, and colons but—finally—nodding in approval at the em dash.
Em and en dash usage is officially part of style guides such as The Chicago Manual of Style [1], so it's often a work requirement for many writers and editors to use them in writing. This is why these kinds of dashes are everywhere in newspaper and magazine articles.
Eventually, people learn to include them out of habit—especially as most people see them as aesthetically nicer than a simple hyphen (-).
Exactly. If I see an Em/En dash in a publication of really any kind, I don't think twice. Because that's the traditional context for them. Professional writing.
I saw this comment a day ago but it only clicked today. The way we tell it's AI is the use of too formal grammar. I think that means they now pass the Turing test. Or at most a hair's breadth from passing.
Yep, definitely been noticing it, especially on Reddit. It almost always makes me navigate away from the post, unless the author mentions that they’re using AI.
Hold on, I'm coming back to this thread, I think I've cracked it guys. Some real alpha for you right here:
If the em dash has spaces around it -- as seen in AP style -- it was probably written by a real human, because that's how it comes out most conveniently on a word processor.
But if the em dash has no spaces around it--Chicago style--there's a good chance you're looking at LLM slop.
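If anyone wants to try that spacing theory on real text, a minimal sketch could look like the following. The function name `dash_spacing` is made up here, and treating a typed `--` the same as a real em dash (U+2014) is an assumption. It only counts the two styles; it says nothing about whether the theory holds.

```python
import re

# Matches an em dash (U+2014) or a typed "--", plus one optional space
# on each side so the spacing style can be inspected.
DASH = re.compile(r"( ?)(?:\u2014|--)( ?)")


def dash_spacing(text):
    """Count dashes with spaces on both sides vs. no spaces at all."""
    counts = {"spaced": 0, "closed": 0}
    for match in DASH.finditer(text):
        before, after = match.group(1), match.group(2)
        if before and after:
            counts["spaced"] += 1   # AP-style: word -- word
        elif not before and not after:
            counts["closed"] += 1   # Chicago-style: word--word
        # A space on only one side is ambiguous and is ignored.
    return counts


print(dash_spacing("spaced -- like this, closed--like this, or\u2014this"))
```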
The only people still using em-dashes are those who think it's somehow a signal of high intellect rather than being (extremely) behind the times. Case in point: this exact comment section where you see it with ~10000x the frequency of standard human writing, or even the average HN thread.
Just makes me roll my eyes really, seeing a human use an em-dash. We're in the age of informality, and at least for me personally I've definitely filed the em-dash away as "a near guarantee the text was written by a machine". No matter how much, and perhaps especially because, HN commenters are coming out of the woodwork to insist they've been using it daily for years.