"But any trick prompt like this is going to start giving expected results once it gets well-known enough."
Which makes it difficult to fairly evaluate whether the models have actually gotten better at the feather/iron problem, or whether they've just seen enough trick questions to learn the pattern, either naturally from the internet or fed to them as part of the training data. I am fairly certain "trick questions" like this have been added to the training data, because, I mean, why wouldn't they be?
I have noticed in my playing with image AIs that they do seem more prone than LLMs to getting dragged into local maxima in cases where a human would know what the prompt is getting at. Perhaps it's all the additional data in an image that reveals it.