> But LLMs only have the capability to statistically associate strings of words. That's all they are. There is no other capability possible there.
The first sentence is as reductive as saying that a computer can only do logical comparisons on 1s and 0s, and by extension the third sentence is just as false.
> So what is the point of trying to have a "security" team hammer secret keeping into them? It doesn't make sense.
Keep secret != Remove capability
If you remove all knowledge of chemistry from the model, it can't help you design chemicals.
If you instead let it keep the chemistry knowledge but train it not to reveal it, that information can still be extracted: analyse the weights, find the component that functions as a "keep secret" switch, and turn it off (a rough sketch of this follows below).
This is a thing I know about because… AI safety researchers told me about it.
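For the curious, here's roughly what that looks like in practice. AI safety researchers have published work showing that, in some open-weight models, refusal behaviour is mediated largely by a single activation direction, which can be estimated and then projected out of the weights ("refusal direction ablation"). The sketch below is mine, not theirs: it uses toy random tensors in place of a real model, and names like `harmful_acts` and the dimensions are purely illustrative, not any real library's API.

```python
# Minimal toy sketch of refusal-direction ablation, assuming the published
# finding that refusal is mediated by one activation direction. All tensors
# here are random stand-ins for a real model's activations and weights.
import torch

hidden = 64  # toy hidden size

# Pretend these are mean residual-stream activations collected by running
# the model on prompts it refuses vs. prompts it answers normally.
harmful_acts = torch.randn(100, hidden)
harmless_acts = torch.randn(100, hidden)

# The candidate "keep secret" switch: the difference of means, normalised.
refusal_dir = harmful_acts.mean(0) - harmless_acts.mean(0)
refusal_dir = refusal_dir / refusal_dir.norm()

# Ablate: project that direction out of a weight matrix's output space,
# so the model can no longer write to it.
W = torch.randn(hidden, hidden)  # stand-in for an MLP/attention output weight
P = torch.eye(hidden) - torch.outer(refusal_dir, refusal_dir)
W_ablated = P @ W

# Everything orthogonal to the refusal direction is untouched; only the
# component along it is gone. This check prints True.
print(torch.allclose(refusal_dir @ W_ablated, torch.zeros(hidden), atol=1e-5))
```

The point being: the chemistry knowledge spread throughout the weights survives this surgery untouched; only the narrow "don't say it" component is deleted. Which is exactly why training a model to keep a secret is not the same as removing the capability.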
> Which is fundamentally incoherent.
𐀀𐀩𐀏𐀭𐀅𐀨