Hacker News

Somewhat off topic, however, I'm thoroughly convinced that there is a very high probability something is AI generated when I see Em dashes. Anyone else noticing this?

ChatGPT, for example, almost always uses them. I'm sure they are more common in academic writing, but it's now super common on boards like Reddit.




I've been employing em-dashes extensively since I went on a JD Salinger binge circa 2002. Also, "incidentally", for the same reason. I use "Nb" a lot, from reading a bunch of DFW years ago. Oh, and that very-precise construction he does with "which" all the time, I stole that.

Before LLMs, I think em-dashes mostly signaled that you read books and paid attention to details, to the extent they signaled anything.


To generalize your point: A lot of the "brown m&ms" that we've walked around with for detecting a writer's status, education, etc., are less useful in an age of LLMs.[1]

We might even be entering some waves of counter-signaling.

[1] They'll never totally nail all of DFW's mannerisms, though.


What is this very precise construction?


Something like, “the monks wore brown habits, which habits were made from wool”.

The slight ambiguity if you don’t do that now irks me, having seen a way to eliminate it.


So you're saying that when you see an Em dash in someone's prose, it's a big minus?


As I said in another comment, it depends highly on the context and previous / alternative knowledge of the source.


(How about when you see a pun in an HN thread?)

:)


It’s largely the Baader-Meinhof phenomenon. You’ve started noticing it because you just learned about it.


I feel this is a broad oversimplification.

When looking at the context of a given text, use of certain words or punctuation can very well indicate AI use.

The "original" example was delve. There is no doubt that AI (did, or still does) use this word at a significantly higher frequency than the average person. I would say the same about em dashes.

When browsing a Reddit thread about a video game, if you encounter numerous comments written perfectly, especially those containing indicators like em dashes, the word delve, or similar language, it certainly can raise the question: am I genuinely seeing comments from users who write this way in this specific context, or is this content more likely produced by an LLM?


It sucks that people understanding their own language marks them as possibly AI.


No, it's not. AI uses em dashes far more frequently than the average human.


Why is this getting downvoted? ChatGPT is completely obsessed with em dashes. I don't even know how to make it on my keyboard.


Yeah, people are saying "well you didn't know about em dashes before LLMs".

No, I learned about em dashes in school, I just literally don't know how to type them on my keyboard and I'm too stubborn to learn how to.


It depends. Em dashes in news articles and written publications? Definitely expected. Em dashes on social media or Reddit? Either someone who works in typesetting, or an LLM. Most likely an LLM, given the dying nature of printed media.

Only typography nerds and professional printers care about things like these. Popular media, even modern professional media, hasn't been paying all that much attention.


Plausible. But apparently per TFA it's actually spelled Baader–Meinhof, with an en-dash not a hyphen.


yep. been using them for years. others have too. it’s not weird

same thing happened with “delve” — these are just words and grammar, people use them

there is no accurate way to tell whether text came out of a neural network or not


I’m not sure the same happened with “delve.” I saw an analysis of paper abstracts showing a clear uptick of “delve” starting with the mass-adoption of ChatGPT. Maybe it suddenly became a trendy word — especially in paper abstracts — or maybe more paper abstracts were edited by ChatGPT.


Combining the various "tells" of an LLM (em dashes, delve, grammatical signs etc) with the context (Reddit comments vs professional setting), you could establish a rough probability it was AI generated. At this point, it's the best we can hope for.
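
A minimal sketch of what "combining tells with context" could look like, assuming a naive odds update: start from a prior that depends on where the text appeared, then multiply the odds by a ratio for each tell found. Every number and name here (`TELL_RATIOS`, `CONTEXT_PRIORS`, `rough_llm_probability`) is hypothetical and chosen for illustration only; none of it is measured data or a validated detector.

```python
import math
import re

# Hypothetical likelihood ratios: how much more often each tell is ASSUMED to
# appear in LLM text than in human text. Illustrative values, not measurements.
TELL_RATIOS = {
    "em_dash": 3.0,
    "delve": 5.0,
    "its_important_to": 2.0,
}

# Hypothetical prior probability of LLM authorship for each context.
CONTEXT_PRIORS = {
    "reddit_comment": 0.2,
    "professional_publication": 0.05,
}


def find_tells(text: str) -> set:
    """Return the names of the tells present in the text."""
    found = set()
    if "\u2014" in text:  # U+2014 is the em dash character
        found.add("em_dash")
    if re.search(r"\bdelv(e|es|ed|ing)\b", text, re.IGNORECASE):
        found.add("delve")
    if re.search(r"\bit['\u2019]s important to\b", text, re.IGNORECASE):
        found.add("its_important_to")
    return found


def rough_llm_probability(text: str, context: str) -> float:
    """Combine the context prior with each detected tell in log-odds space."""
    prior = CONTEXT_PRIORS[context]
    log_odds = math.log(prior / (1 - prior))
    for tell in find_tells(text):
        log_odds += math.log(TELL_RATIOS[tell])
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    sample = "Let's delve into this \u2014 it's important to be careful."
    print(round(rough_llm_probability(sample, "reddit_comment"), 3))
```

Treating the tells as independent is itself an assumption, and the output is only as good as the made-up ratios, which is consistent with "rough probability" being the best one can hope for.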


Gemini is in love with the phrase "It's important to..."

Whenever I see that at the start of a paragraph I know that there's an 80% chance it was written by Gemini.


There are regular folk who tend to be pedantic with their writing. I'm not sure this is a good test of whether text is generated by an LLM. Consider that some may use LLMs to correct spelling or grammar, and the LLMs may often edit an en dash to an em dash.


To be clear, it's essentially impossible to know if a given text is autonomously LLM generated (a bot on social media for example) or is the result of revision of real human effort.

To what extent that distinction matters, I'm not sure.


I've encountered and used em dashes regularly for the last 20 years. If most of your reading and writing are associated with social media, I could see the trend you're describing appearing real within that limited context. But em dashes are not new and have been a feature of high quality writing for many decades.


Yes, several of the most popular models (and even less popular but newly open-sourced ones such as Gemma 3 27b) overuse em dashes. Even when prompted not to use dashes, they almost can't help themselves and include them occasionally anyways, as it must be part of their learned stylometry. It's just not a common symbol to use at all, as most people generally use commas for the same purpose. I can't even remember learning about em dashes in my college English classes.


I submitted an application which I typeset using LaTeX, and some people thought it was AI-generated because of en and em dashes. I have been using these since forever.


If it's posted through a publishing platform (not just a comment on one or on a public site), it's very possible they do an automatic conversion of some of the common cases. That could also be filtering down to comment boxes and stuff, I'm not sure.

That's not to say that generated content doesn't use them, just that using them as an indicator might require a bit of nuance based on where you're seeing them.


There is a special kind of irony in the fact that habits that used to set one apart from the unwashed masses (like the proper use of punctuation) now serve as a signal for being non-human.


I’ve noticed this, too. ChatGPT especially overuses them relative to other models. It’s an easy tell-sign that something is probably LLM-written.


I saw a reel the other day where some Young People(tm) were talking about "the ChatGPT hyphen" (an em-dash.) There was much wailing and gnashing of (false) teeth from Old People(tm) in the comments.


Everyone I know who writes a lot, especially for copy or product design, seems to use em dashes more heavily. I've even seen a Drake format meme where he is shaking his head at parentheses, commas, and colons but—finally—nodding in approval at the em dash.

I wonder if it's a more recent phenomenon.


Em and en dash usage is officially part of style guides such as The Chicago Manual of Style [1], so it's often a work requirement for many writers and editors to use them in writing. This is why these kinds of dashes are everywhere in newspaper and magazine articles.

Eventually, people learn to include them out of habit—especially as most people see them as aesthetically nicer than a simple hyphen (-).

[1] https://www.chicagomanualofstyle.org/qanda/data/faq/topics/H...


Exactly. If I see an Em/En dash in a publication of really any kind, I don't think twice. Because that's the traditional context for them. Professional writing.


I saw this comment a day ago but it only clicked today. The way we tell it's AI is the use of too formal grammar. I think that means they now pass the Turing test. Or at most a hair's breadth from passing.


Yep, definitely been noticing it, especially on Reddit. It almost always makes me navigate away from the post, unless the author mentions that they’re using AI.


I’m bored with y’alls keyboard habits.

Not all though. Many people on HN use em-dashes and other proper punctuation.


Hold on, I'm coming back to this thread, I think I've cracked it guys. Some real alpha for you right here:

If the em dash has spaces around it -- as seen in AP style -- it was probably written by a real human, because that's how it comes out most conveniently on a word processor.

But if the em dash has no spaces around it--Chicago style--there's a good chance you're looking at LLM slop.
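
For anyone who wants to try this spacing heuristic on real text, here is a small sketch that counts spaced versus unspaced dashes. It assumes a "dash" is either a true em dash (U+2014) or the double-hyphen stand-in used in this comment; the function name `dash_spacing_counts` is made up for illustration, and whether the counts say anything about authorship is the commenter's conjecture, not an established result.

```python
import re

# A "dash" here is either a real em dash (U+2014) or the "--" stand-in.
DASH = r"(?:\u2014|--)"

# Spaced style: non-space, one space, dash, one space, non-space.
SPACED = re.compile(r"(?<=\S) " + DASH + r" (?=\S)")
# Unspaced style: the dash sits directly between two word characters.
UNSPACED = re.compile(r"(?<=\w)" + DASH + r"(?=\w)")


def dash_spacing_counts(text: str) -> dict:
    """Count how many dashes appear in each spacing style."""
    return {
        "spaced": len(SPACED.findall(text)),
        "unspaced": len(UNSPACED.findall(text)),
    }


if __name__ == "__main__":
    print(dash_spacing_counts("spaces around it -- as seen here -- are counted"))
    print(dash_spacing_counts("no spaces around it--like this--are counted too"))
```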


The only people still using em-dashes are those who think it's somehow a signal of high intellect rather than being (extremely) behind the times. Case in point: this exact comment section where you see it with ~10000x the frequency of standard human writing, or even the average HN thread.

Just makes me roll my eyes really seeing a human use an em-dash. We're in the age of informality, and at least for me personally I've definitely filed the em-dash away as "a near guarantee the text was written by a machine". No matter how much, and perhaps especially because, HN commentators are coming out of the woodwork to insist they've been using it daily for years.


This level of thinly veiled insecurity is just projection on your part.


Maybe you're projecting? Not everyone has an agenda beyond just thinking it looks good.


Yes! It's a tell-tale sign something is written by AI.


it is not



