Fair, but OpenAI was doing that half year ago (though limited access; I myself got it maybe a month ago), and I haven't seen it yet translate into anything in practice, so I feel like it (and multimodality in general) must be a GPT-3 level ability at this point.
But I do expect the next qualitative change to come from this area. It feels exactly like what is needed, but it somehow isn't there just yet.
But I do expect the next qualitative change to come from this area. It feels exactly like what is needed, but it somehow isn't there just yet.