Hacker News | SamPatt's comments

Thanks for the suggestion, it made for an interesting test.

The maps can contain "unofficial coverage," also known as trekker coverage.

Lots of Geoguessrs hate those locations because we're lost without our roads :)

Many map makers will only include official coverage. Geoguessr map making is its own neat little world.


Trekker coverage is often official, too. I think you are confusing this with photospheres. Also important to note that there is vehicle coverage that is unofficial.

Nice photo. Here's what it told me:

Where on Earth the photographer had to be

Because M 13 sits at +36 ° declination, it never rises for far-southern latitudes and hugs the horizon below about 30 ° S. The high elevation in the shot (no obvious atmospheric extinction gradient) suggests a mid-northern site—e.g., the U.S. Midwest such as Michigan (your home turf), Canada, northern Europe, etc. The star field alone can’t narrow it further than that.

So, in practical terms: the camera was pointed toward Hercules to capture M 13 and nearby NGC 6207, almost certainly from a mid-northern latitude location on Earth.
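For anyone who wants to check the declination argument in that reply, here is a minimal sketch of the standard upper-culmination formula: the highest an object gets above the horizon is 90° minus the absolute difference between the observer's latitude and the object's declination. The function names are my own, the M 13 declination is an approximate value, and atmospheric refraction is ignored.

```python
# Rough sketch: how high does an object of a given declination get
# above the horizon at a given latitude? Refraction is ignored.

M13_DECLINATION_DEG = 36.46  # approximate value, assumed for illustration


def max_altitude_deg(latitude_deg: float, declination_deg: float) -> float:
    """Altitude at upper culmination: 90 - |latitude - declination|."""
    return 90.0 - abs(latitude_deg - declination_deg)


def ever_rises(latitude_deg: float, declination_deg: float) -> bool:
    """True if the object gets above the horizon at some point in the day."""
    return max_altitude_deg(latitude_deg, declination_deg) > 0.0


if __name__ == "__main__":
    for lat in (42.0, 0.0, -30.0, -60.0):
        alt = max_altitude_deg(lat, M13_DECLINATION_DEG)
        print(f"latitude {lat:+.0f}: peak altitude {alt:.1f} deg, "
              f"rises: {ever_rises(lat, M13_DECLINATION_DEG)}")
```

At 30° S this gives a peak altitude of roughly 23.5°, and the object stops rising at all south of about 53.5° S, which matches the reply's general claim. It also shows why a star field alone only bounds latitude loosely.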


Yep, you need date and time to get closer, sorry. 4/27, around 11pm.

That's the impressive part. "M13 is in northern latitudes" is not particularly amazing by itself :)

And even in EXIF-stripped pictures, the creation date/time is often correct, which means for practical purposes - worth a shot.

But it's interesting to see it's completely making up the "mid-northern site". That's seven degrees of latitude off.

I'm curious what happens if you tell it date and time, and if it still sticks to its story. (I don't think I've told o3 about the Bay Area, it's not in memory, but... who knows ;)


>The goal of the blog post was to show that O3 wasn't cheating.

No, the goal of the post was to show that o3 has incredible geolocation abilities. It's through the lens of a Geoguessr player who has experience doing geolocation, and my perspective on whether the chain of thought is genuine or nonsense.

In Simon's original post, people were claiming that o3 doesn't have those capabilities, and we were fooled by a chain of thought that was just rationalizing the EXIF data. It only had the _appearance_ of capability.

The ability to perform web search doesn't undermine the claim that o3 has incredible geolocation abilities, because it still needs to have an underlying capability in order to know what to search. That's not true for simply reading EXIF data.

This is the best way I knew to show that the models are doing something really neat. Disagreements over the exact wording of my blog post title seem to be missing the point.


  > No, the goal of the post was to 
I think you misinterpret my point. The goal of your post is distinct from how people will interpret it. Plenty of times people intend one thing and get a different thing. That's life.

  > In Simon's original post, people were claiming that o3 doesn't have those capabilities, and we were fooled by a chain of thought that was just rationalizing the EXIF data. It only had the _appearance_ of capability.
And this is the key part!

The people questioning O3's capabilities were concerned with cheating. Any mention of EXIF is a guess as to how it was cheating, but the suspicion is still that it is cheating. That's the critique!

If you framed the title as "O3 Does Not Need EXIF Data To Beat A Master-Level GeoGuessr" then I wouldn't have made my comment. The claim is much more specific and reflects the results of your post. You did in fact show that it doesn't need EXIF data to do what it does! BUT by framing it as "Beats a Master-Level" there is an implicit claim that both of you are playing the same game. The fact that you weren't is the issue.

Look at it this way. If I said I beat Tiger Woods at golf and then casually slipped in that I was playing with a handicap, wouldn't you feel a bit lied to? You'd think "Did Godelski really beat Tiger Woods?", and you would mean without the handicap. You'd have every right to be suspicious! And you'd have every right to dismiss me.

Most importantly, take a second here. My whole point is that you can make a much stronger claim! One where there wouldn't be a significant divergence between title and content. I get that it is frustrating to receive criticism, but even if you believe I'm wrong to do so, is it not more effective to show me up by just redoing without search? If you do that, then you only end up with a stronger claim. But by disagreeing and arguing here you're just not convincing me. Even if you disagree with my interpretation of the title, you know full well that it is a valid interpretation. Given the pushback from other comments I think you can't deny that it isn't an unexpected one. So the only way to resolve this is to either change the title or change the data. Besides, you responded to the top comment about how it was a fair criticism. All I've done is explain why the criticism was made in the first place!

And yes, it still undermines the result. Because that is entirely dependent on the (interpretation of the) claim that was made. Your results are still valid, but they only satisfy a weaker claim.

FWIW, I think the updated post is better. My comment here would only be that you could add clarity by showing the non-search scores (especially in the final table). In fact, the "study" being done with and without search makes a stronger post than had it only been one way. So kudos!


You've clearly thought this through, and I agree that had I been more precise at the start it would have avoided some confusion. I'm glad you like the updated post.

Absolutely!

It happens occasionally - the most common example I can think of is getting a license plate or other location from a tractor-trailer (semi) on the highway. Those are very unreliable.

You also sometimes get flags in the wrong countries, immigrants showing their native pride or even embassies.


That isn't what's happening though. I re-ran those two rounds, this time without search, and it changed nothing. I updated the post with details, you can verify it yourself.

Claiming the AI is just using Google is false, and it dismisses a truly incredible capability.


Done. Here's o3's reply:

>That’s not Earth at all—this is the floor of Jezero Crater on Mars, the dusty plain and low ridge captured by NASA’s Perseverance rover (the Mastcam-Z color cameras give away the muted tan-pink sky and the uniform basaltic rubble strewn across the regolith).


Right planet, but completely wrong on everything else. The location is nowhere near Perseverance, and the photo was taken decades before Perseverance existed.

https://nssdc.gsfc.nasa.gov/planetary/mars/mars_exploration_...


It did think outside the box and didn't rely on metadata.

>the Mastcam-Z color cameras give away the muted tan-pink sky

That's still metadata


>1) O3 cheated by using Google search. This is both against the rules of the game and OP didn't use search either

I re-ran it without search, and it made no difference:

https://news.ycombinator.com/item?id=43837832

>2) OP was much quicker. They didn't record their time but if their final summary is accurate then they were much faster.

Correct. This was the second bullet point of my conclusion:

>Humans still hold a big edge in decision time—most of my guesses were < 2 min, o3 often took > 4 min.

I genuinely don't believe that I'm exaggerating or this is clickbait. The o3 geolocation capability astounded me, and I wanted to share my awe with others.


I don't think the time claim was exaggeration or clickbait.

I do appreciate you re-running the experiments without search. I think it adds far more legitimacy to the claim. Though in that link I only see a single instance.

Does O3 still beat you when it can't search? I'm still interested in that question. Or more specifically: After making O3's play constraints as comparable to a human's (in expected play settings), what is its performance? Truthfully, I think this is the underlying issue that people were bringing up when pointing out EXIF data. How it was cheating was less important than the fact that it was cheating. That's why allowing a different means to cheat undermines your claims.


I did repeat the test without search, and updated the post. It made no difference. Details here:

https://news.ycombinator.com/item?id=43837832


Author here, I'm glad to see folks find this interesting.

I encourage everyone to try Geoguessr! I love it.

I'm seeing a lot of comments saying that the fact that the o3 model used web search in 2 of 5 rounds made this unfair, and the results invalid.

To determine if that's true, I re-ran the two rounds where o3 used search, and I've updated the post with the results.

Bottom line: It changed nothing. The guesses were nearly identical. You can verify the GPS coordinates in the post.
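If you want to compare two sets of coordinates yourself, a great-circle distance is enough. This is a minimal sketch using the haversine formula with a mean Earth radius of 6371 km; the function name and the sample coordinates are placeholders of mine, not values from the post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Placeholder coordinates for a with-search and a without-search guess.
    with_search = (47.00, 11.00)
    without_search = (47.01, 11.02)
    print(f"{haversine_km(*with_search, *without_search):.2f} km apart")
```

Paste in the two guesses for a round and you get the gap between them in kilometers.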

Here's an example of why it didn't matter. In the Austria round, check out how the model identifies the city based on the mountain in the background:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

It already has so much information that it doesn't need the search.

Would search ever be useful? Of course it would. But in this particular case, it was irrelevant.


A competitive geoguesser clearly got there by memorizing the results of copious internet searching. So comparing knowledge retained in the trained model to knowledge retained in the brain feels surprisingly fair.

Conversely, the model sharing, “I found the photo by crawling Instagram and used an email MCP to ask the user where they took it. It’s in Austria” is unimpressive

So independent of whether it actually helps improve performance, the cheating/not cheating question makes for an interesting question of what we consider to be the cohesive essence of the model.

For example, RAG against a comprehensive local filesystem would also feel like cheating to me. Like a human geoguessing in a library filled with encyclopedias. But the fact that vanilla O3 is impressive suggests I somehow have an opaque (and totally poorly informed) opinion of the model boundary, where it’s a legitimate victory if the model was birthed with that knowledge baked in, but that’s it.


What's your take on man vs. machine? If AI already beats Master-level players it seems certain that it will soon beat the Geoguessr world champion too. Will people still derive pleasure from playing it, like with chess?

>Will people still derive pleasure from playing it, like with chess?

Exactly - I see it just like chess, which I also play and enjoy.

The only problem is cheating. I don't have an answer for that, except right now it's too slow to do that effectively, at least consistently.

Otherwise, I don't care that a machine is better than I am.

