
For a while, em dashes were really popular among LLM enthusiasts, on the theory that using them would encourage the LLM to draw on training data that contained em dashes, which was typically higher-quality text written by a professional writer or by somebody with a professional editor. Subjectively, I think it worked. I suspect that the LLMs trained for use as chatbots were fine-tuned to use the em dash liberally for that reason. Now, after a few generations of these models, I think the em dash is starting to have the opposite effect: drawing on "slop" training data written by other LLMs rather than on well-written human text.


