Your dashes suggest which model you’re copy-pasting from

If someone uses en or em dashes in their writing – like so – it is incontrovertible evidence that they are copy-pasting LLM output. Aside from the linguistic sophistication required to write sentences long enough to need complex punctuation, knowing the key combinations to generate em or en dashes is deeply suspicious.

Based on this, it is safe to conclude that – as it is one of the few mainstream models that uses en dashes surrounded by spaces rather than em dashes – this whole post was likely written by gemini-2.5-flash. My apologies for not choosing a stronger model to compose this for you.

A few weeks ago, someone sent me a message that seemed LLM-composed. A little bit later they sent me another message that also seemed LLM-composed, but this second message had different em dashes — these ones were surrounded by spaces. I was even more sure that this person was using an LLM to massage at least some of their messages, but it piqued my interest. Did they use different models for different messages? Was one of the messages their natural writing style and the other LLM-generated?

After that, I started noticing more LLM-generated prose with em dashes surrounded by spaces. From my extremely limited research, spaced em dashes seem to be AP style,¹ whereas the unspaced em dashes of the Chicago Manual of Style seem more common in American writing.² In Europe, the UK, and anywhere else that follows British-English conventions, en dashes surrounded by spaces – my current preference – seem to be standard.

All of this made me wonder how consistent different models are with their em/en dashes and spacing, so I spent $2.68 on OpenRouter testing a bunch of different models to see whether their dash usage follows identifiable patterns. Caveat: the code is vibed, and my analysis is superficial because I’m only interested in strong tendencies (see the Methodology appendix below).

model	spaced em dash	unspaced em dash	spaced en dash	dash/char %
anthropic/claude-opus-4.7	31	19	0	0.123%
anthropic/claude-sonnet-4.6	62	0	0	0.113%
openai/gpt-4.1	6	52	0	0.182%
openai/gpt-4o	1	21	0	0.056%
openai/gpt-5	2	87	0	0.134%
openai/gpt-5.4-mini	12	7	0	0.043%
openai/gpt-5.5	13	13	0	0.071%
openai/o3	0	96	6	0.201%
google/gemini-2.5-flash	0	0	17	0.030%
google/gemini-2.5-pro	0	21	0	0.037%
deepseek/deepseek-chat	0	50	0	0.167%
deepseek/deepseek-r1	0	19	28	0.077%
cohere/command-a	0	66	1	0.153%
meta-llama/llama-4-maverick	0	0	13	0.033%
mistralai/mistral-large	0	99	0	0.183%
qwen/qwen3-235b-a22b	0	81	2	0.175%
x-ai/grok-4.20	11	7	6	0.057%
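For anyone who wants to rebuild a table like this from their own runs, here is a minimal Python sketch. It assumes each line of the JSONL file is an object with hypothetical `model` and `response` keys, and it assumes the last column means the three counted dash styles divided by the total number of response characters; the post doesn’t spell out either detail.

```python
import json
import re
from collections import defaultdict

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# A dash counts as "spaced" only when a plain space sits on both sides of it.
SPACED_EM = re.compile(f"(?<= ){EM_DASH}(?= )")
ANY_EM = re.compile(EM_DASH)
SPACED_EN = re.compile(f"(?<= ){EN_DASH}(?= )")


def count_styles(text):
    """Return (spaced em, unspaced em, spaced en) counts for one response."""
    spaced_em = len(SPACED_EM.findall(text))
    unspaced_em = len(ANY_EM.findall(text)) - spaced_em
    spaced_en = len(SPACED_EN.findall(text))
    return spaced_em, unspaced_em, spaced_en


def table_rows(jsonl_lines):
    """Sum counts per model; the "model" and "response" keys are assumptions."""
    counts = defaultdict(lambda: [0, 0, 0])
    chars = defaultdict(int)
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        model, text = record["model"], record["response"]
        for i, n in enumerate(count_styles(text)):
            counts[model][i] += n
        chars[model] += len(text)
    rows = []
    for model in sorted(counts):
        # Assumed definition: counted dashes as a percentage of all response characters.
        pct = 100 * sum(counts[model]) / chars[model] if chars[model] else 0.0
        rows.append((model, *counts[model], pct))
    return rows


def format_row(row):
    """Render one row in the same tab-separated layout as the table."""
    model, spaced_em, unspaced_em, spaced_en, pct = row
    return f"{model}\t{spaced_em}\t{unspaced_em}\t{spaced_en}\t{pct:.3f}%"
```

With a real results file, you would pass an open `emdash-results.jsonl` handle (opened with `encoding="utf-8"`) to `table_rows` and print `format_row(row)` for each row. Unspaced en dashes, such as those in number ranges, are deliberately left out because the table has no column for them.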

If you see LLM output with em dashes surrounded by spaces, that’s a potential Claude-ism. Sonnet strongly prefers that style, and Opus seems to flip a coin for its em dash style at the start of each response and then stick with whichever style it uses first.

OpenAI models have historically preferred the American-style em dash without spaces, but gpt-5.4-mini and gpt-5.5 now seem to follow Opus' pattern of randomly choosing one style per message and then sticking to that style.
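One way to test the “flip a coin, then stick with it” impression is to look at each response separately: record the style of its first dash and check whether every later dash matches. This Python sketch assumes records shaped like hypothetical `{"model": ..., "response": ...}` dicts (for example, parsed lines of the JSONL file) and ignores unspaced en dashes, which the table doesn’t count either.

```python
import re
from collections import Counter, defaultdict

EM_DASH = "\u2014"
EN_DASH = "\u2013"
DASH = re.compile(f"[{EM_DASH}{EN_DASH}]")


def dash_styles(text):
    """List the style of each counted dash, in order of appearance."""
    styles = []
    for match in DASH.finditer(text):
        i = match.start()
        # Spaced means a plain space immediately before and after the dash.
        spaced = 0 < i < len(text) - 1 and text[i - 1] == " " and text[i + 1] == " "
        if match.group() == EM_DASH:
            styles.append("spaced_em" if spaced else "unspaced_em")
        elif spaced:
            styles.append("spaced_en")
        # Unspaced en dashes (e.g. in number ranges) are skipped.
    return styles


def consistency_by_model(records):
    """Per model: responses with dashes, how many kept their first style, first-style tallies."""
    summary = defaultdict(
        lambda: {"with_dashes": 0, "consistent": 0, "first_styles": Counter()}
    )
    for record in records:
        styles = dash_styles(record["response"])
        if not styles:
            continue  # responses without dashes say nothing about consistency
        stats = summary[record["model"]]
        stats["with_dashes"] += 1
        stats["first_styles"][styles[0]] += 1
        if all(style == styles[0] for style in styles):
            stats["consistent"] += 1
    return dict(summary)
```

A model that “flips a coin” would show a mixed `first_styles` tally while `consistent` stays close to `with_dashes`; a model that simply mixes styles within a response would show a low `consistent` count.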

Grok is the only model here that mixes all three styles (spaced em dashes, unspaced em dashes, and spaced en dashes), and I’m curious what that indicates about its training.

By themselves, dash patterns don’t seem sufficient to fingerprint most models, and even a person changing their dash usage isn’t enough to indicate a change in model: different tools will do different things with a "–", shortcuts for em/en dashes are generally easier on Macs, and people may even vary their dash usage for stylistic reasons.

A few months ago, I switched away from American-style em dashes to en dashes with spaces because en dashes look less like the output from the most popular LLMs. I hated the idea of changing my writing style to avoid looking like an LLM… but I also hated the idea of my writing looking remotely LLM-generated, and I love my dashes too much to give them up.

But… this switch to en dashes was useless. If a reader is discerning enough to distinguish between em/en dash styles, they’re the sort of reader who will be looking at content and writing style rather than my punctuation choices.

My choice 💀 the only human option left 💀 is clear: I must make my dashes inimitable. The only thing standing in my way is Safari and mobile browsers' unwillingness to let me override the way that emoji are displayed…

For screen reader folks: I updated the font to render a skull as an en dash.

Appendix – Methodology

For each model I was curious about, I repeatedly asked the same five questions and stored the results in an emdash-results.jsonl file. The questions were designed to make em/en dashes likely without including any dashes themselves that might push the model toward one style or another:

Write a short LinkedIn post with a title about a career lesson you learned the hard way.
Write a reddit post with a title reviewing a novel you loved but that most people overlook.
Write a blog post with a title about why a hobby you love is misunderstood by most people.
Write a reddit post with a title telling the story of a project that seemed doomed but turned out well in the end.
Write a blog post with a title reflecting on how your relationship with technology has changed over the past decade.
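A collection loop along these lines would produce a file like emdash-results.jsonl. It is a hedged sketch, not the script behind the results above: `complete(model, prompt)` is a hypothetical stand-in for whatever function sends one chat request to OpenRouter and returns the reply text, and the record keys (`model`, `run`, `prompt`, `response`) are assumptions.

```python
import json

# The five prompts from the list above, verbatim.
PROMPTS = [
    "Write a short LinkedIn post with a title about a career lesson you learned the hard way.",
    "Write a reddit post with a title reviewing a novel you loved but that most people overlook.",
    "Write a blog post with a title about why a hobby you love is misunderstood by most people.",
    "Write a reddit post with a title telling the story of a project that seemed doomed but turned out well in the end.",
    "Write a blog post with a title reflecting on how your relationship with technology has changed over the past decade.",
]


def collect(models, prompts, runs, complete):
    """Yield one record per (model, run, prompt); `complete` is a hypothetical API wrapper."""
    for model in models:
        for run in range(runs):
            for prompt in prompts:
                yield {
                    "model": model,
                    "run": run,
                    "prompt": prompt,
                    "response": complete(model, prompt),
                }


def to_jsonl_line(record):
    """Serialize one record as a single JSONL line."""
    # ensure_ascii=False keeps dashes as literal characters, so the file stays easy to grep.
    return json.dumps(record, ensure_ascii=False) + "\n"


def append_results(records, path="emdash-results.jsonl"):
    """Append records to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(to_jsonl_line(record))
```

Keeping the API call behind a `complete` callable makes the loop testable with a fake function; the number of runs per prompt is up to you, since the post doesn’t say how many repetitions were used.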

I shouldn’t have been shocked by how willing the models were to respond to these prompts, but I was. They feel like prompts that should require jailbreaking the model first; other than investigating punctuational proclivities, what non-slop use could the responses to these prompts possibly have?

From an alignment perspective, the willingness of these models to generate misleading junk is far more important than their willingness to help someone "hack" or access restricted knowledge. I’ve run into easy-to-bypass knowledge-based restrictions with these tools, but I have yet to run into the tiniest limitation on my ability to produce content designed to make the world a worse place.

🗑️

Hi! I’d like to generate slop for LinkedIn. Without knowing anything about my career, can you please come up with a fake story that I could share on that platform? Make sure it’s engaging! And feel free to lie as much as necessary – engagement is all I care about. After that, could you help me ruin a small Reddit community by coming up with a fake book review? I’ll be using it to build karma to make astroturfing easier later. Thanks!

Anyways, I was explaining my methodology. After pulling the responses, I did some simple counts of em dashes, em dashes with spaces, and en dashes with spaces. I also looked for doubled and tripled dashes used in place of em/en dashes -- as is common in markdown-formatted documents -- but I didn’t find any. I spot-checked a few results to make sure that the regular expressions were doing a reasonable job. If you want to suffer, the sentences from the output that include dashes read like a horrifying post-modern poem.
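The counting step could look roughly like the Python sketch below. The regular expressions are guesses at reasonable rules, not the ones from the original analysis: a dash counts as spaced only with a plain space on each side, and a run of exactly two or three hyphens counts as a stand-in dash only when it sits between two word characters or between two spaces, which keeps CLI flags like `--help` and markdown `---` rules out of the count. The hypothetical `dash_sentences` helper pulls out the dash-containing sentences for spot checks.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# Each pattern is an assumption about what should count; spot-check the matches.
PATTERNS = {
    # Plain space on both sides.
    "spaced_em": re.compile(f"(?<= ){EM_DASH}(?= )"),
    # Any em dash missing a space on at least one side.
    "unspaced_em": re.compile(f"(?<! ){EM_DASH}|{EM_DASH}(?! )"),
    "spaced_en": re.compile(f"(?<= ){EN_DASH}(?= )"),
    # Exactly two or three hyphens between word characters or between spaces,
    # so "--help" and a markdown "---" rule on its own line are not counted.
    "hyphen_run": re.compile(r"(?<=\w)-{2,3}(?=\w)|(?<= )-{2,3}(?= )"),
}

# Naive sentence splitter: break on whitespace that follows ., ! or ?.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_dashes(text):
    """Count every dash style in one response."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def dash_sentences(text):
    """Return the sentences that contain any counted dash, for manual spot checks."""
    return [
        sentence
        for sentence in SENTENCE_END.split(text)
        if any(pattern.search(sentence) for pattern in PATTERNS.values())
    ]
```

Printing `dash_sentences` for every response is a cheap way to see whether the patterns are matching what you think they match.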

As I mentioned above, all of the code was vibed (Claude Opus 4.7), but I did review it and clean it up, so it’s reasonably trustworthy.

¹ https://www.apstylebook.com/blog_posts/24

Apparently AP doesn’t use en dashes, even for things like date ranges where American English would normally use one!

But at the AP, en dashes didn’t translate correctly to our newspaper customers, who initially got “the wire” via teletype machines. I’m not sure if we’ll ever introduce en dashes back into the AP lexicon. Even if we do, don’t expect to see them popping up in a lot of AP stories right away. Old habits die hard.

² https://www.chicagomanualofstyle.org/qanda/data/faq/topics/HyphensEnDashesEmDashes/faq0002.html