I wonder how far away we are from models which, given this prompt, generate that image in the first step in their chain-of-thought and then use it as a reference to generate SVG code.
It could be useful for much more than just silly benchmarks, there's a reason why physics students are taught to draw a diagram before attempting a problem.
Someone managed to get ChatGPT to render the image using GPT-4o, then save that image to a Code Interpreter container and run Python code with OpenCV to trace the edges and produce an SVG: https://bsky.app/profile/btucker.net/post/3lla7extk5c2u
It could be useful for much more than just silly benchmarks, there's a reason why physics students are taught to draw a diagram before attempting a problem.