> In order to predict "An" instead of “A”, you need to know that you're going to say something that starts with a vowel next. So you're incentivized to figure out one word ahead, and indeed, Claude realizes it's going to say astronomer and works backwards.
Is there actually evidence of working backwards? From a next-token point of view, predicting the token after "An" is going to heavily favor a vowel-initial word, and predicting the token after "A" is going to heavily favor a word that doesn't start with a vowel.
Firstly, there is behavioral evidence. This is, to me, the less compelling kind, but it's important to understand. You are of course correct that, once Claude has said "An", it will be inclined to follow with something starting with a vowel. But the real mystery is why, in setups like these, Claude is much more likely to say "An" than "A" in the first place. Regardless of what the underlying mechanism is -- and you could imagine ways in which it might just "pattern match" without planning here -- "An" is preferred precisely because, in situations like this, you need to say "An" so that "astronomer" can follow.
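One way to see why the first-token choice already encodes something about the planned continuation: an optimal next-token predictor's probability for "An" is the marginal over all completions that begin with "An". A minimal sketch, using entirely made-up probabilities for illustration:

```python
# Hypothetical distribution over two-word completions in a context where
# an astronomer is the most likely subject. The numbers are invented for
# illustration, not measured from any model.
completions = {
    ("An", "astronomer"): 0.6,
    ("A", "telescope"): 0.3,
    ("A", "star"): 0.1,
}

def first_token_marginal(completions):
    """P(first token) = sum of P(completion) over completions starting with it."""
    marginal = {}
    for (first, _rest), p in completions.items():
        marginal[first] = marginal.get(first, 0.0) + p
    return marginal

marginal = first_token_marginal(completions)
# "An" ends up with 0.6 and "A" with 0.3 + 0.1: the preference for "An"
# exists only because "astronomer" dominates the planned continuations.
```

The point of the sketch is that a predictor matching this marginal behaves *as if* it knows "astronomer" is coming, whether or not it represents that plan explicitly -- which is why the behavioral evidence alone is suggestive but not conclusive.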
But now we also have mechanistic evidence. If you make an attribution graph, you can literally see an "astronomer" feature fire and cause the model to say "An".