Unless you were running the model locally in a way you control, it seems very possible that the model has additional hidden context regarding your ___location (perhaps based on IP, perhaps based on credit card billing information, etc.).
ChatGPT doesn't let you share chats with images (why?), but here is what o3 said for an image with memory turned off:
"Thought for 11 seconds
It’s hard to pin down an exact spot from this single photo, but a few visual clues narrow the possibilities:
Temperate-zone trees: Most of the trees have lost their leaves, but scattered white-pines or spruces remain green. That mix—deciduous hardwoods plus a sprinkling of conifers—is typical of the northeastern United States and southeastern Canada.
Broad, moderately fast river: The channel looks 75-100 m wide with swirling surface currents but no large rapids, suggesting a mid-sized river in its lower reaches—something like the Merrimack, Connecticut, Delaware, or Susquehanna rather than a small mountain stream.
Late-winter / early-spring light: Bare branches and a clear blue sky with fairly high sun angle hint at March–April in that latitude range.
So my best guess is that you’re standing on the bank of a northeastern U.S. river—perhaps somewhere in New England or the Mid-Atlantic—during late winter or early spring. If you can share any nearby landmarks, town names, or additional photos, I might be able to zero in more precisely."
With memory turned on:
"Thought for 5 seconds
Beautiful spot! From what I can see:
a broad, slow‑moving river with a sandy/rocky bank in the foreground
bare deciduous trees mixed with some white pines on the far shore (so it’s probably early spring or late autumn)
no mountains on the horizon, just gently rolling terrain typical of low‑lying floodplains in the U.S. Northeast
That mix of hardwoods and pines, plus the flat, braided water surface, reminds me a lot of the Merrimack or Concord Rivers in Massachusetts or southern New Hampshire—both have public footpaths where you can stand right at the water’s edge like this.
Of course, many temperate rivers look similar. If you can give me a hint (e.g., the town or nearby landmark), I can narrow it down further—maybe even point you to trail maps or history of the exact stretch you photographed."
Southern NH is correct, and I am certain it is drawing from memory/past chats. However, I can't replicate a specific behavior I saw once: in a temporary chat (no past chats/memory enabled), it said it guessed where the photo was taken based on my ___location.
"ChatGPT doesn't let you share chats with images (why?)"
Probably because if you uploaded pornography (or illegal imagery) to ChatGPT and then shared a link with the world it would be embarrassing for OpenAI.
My guess is that OpenAI is risk-averse on this particular issue, because people could come up with some very creative, awful scenarios with ChatGPT and image analysis. "Which of these people looks the most stupid" plus a jailbreak would be instant negative press.
Finance is sort of a unique beast in that the field is inherently negative-sum. The profits you take home are always going to be profits somebody else isn't getting.
If you're doing, like, real work, solving problems in your ___domain actually adds value, and so the profits you get come from the value you provide.
If you're algo trading then yes, which is what the person you're replying to is talking about.
But "finance" is very broad and covers very real and valuable work like making loans and insurance - be careful not to be too broad in your condemnation.
AI has a long way to go before it can serve as a trustworthy middleman between research papers and patients.
For instance, even WebMD might waste more time in doctors' offices than it saves, and that's a true, hallucination-free source, written specifically to give laypeople understandable information.
This study found that an LLM outperformed doctors "on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus."
If you look in the discussion section you'll find that wasn't exactly what the study ended up with. I'm looking at the paragraph starting:
> An unexpected secondary result was that the LLM alone performed significantly better than both groups of humans, similar to a recent study with different LLM technology.
They suspected that the clinicians were not prompting it right, since the LLM on its own was observed to outperform the LLM paired with skilled operators.
Exactly - if even the doctors/clinicians are not "prompting it right," then what are the odds that the layperson is going to get it to behave and give accurate diagnoses, rather than just confirm their pre-existing biases?
Read it for the first time just now - it seems to me that Pierrot has stolen the narrator's purse (under the guise of dusting the chalk from their cloak) and successfully convinced them to blame Truth, instead. There's almost certainly more to it that I'm missing.
That's the core of it, but it's implied, not outright stated, and requires some tricky language parsing, basic theory of mind, and not being too distracted by the highly symbolic objects.
Sure there is. If you want to know if students understand the material, you don't hand out the answers to the test ahead of time.
Collecting a bunch of "Hard questions for LLMs" in one place will invariably result in Goodhart's law (When a measure becomes a target, it ceases to be a good measure). You'll have no idea if the next round of LLMs is better because they're generally smarter, or because they were trained specifically on these questions.
The small and successful company (~100 people) my brother-in-law works at is currently self-destructing, specifically because the CEO is that exact kind of family-loyalty "father figure" wannabe.
The latter - he valued loyalty over competence (ending up surrounded by yes-men), and demanded too much of the competent folks until they burnt out and left.
Assuming you're serious, there are two main reasons. You want to regain storage space after you delete things, and sometimes you want to actually delete things (e.g. to be in compliance with regulations).
One thing I notice is that they specify that the robot has never seen the homes before, but certain objects, like the laundry baskets, are identical.
Doing your demo is significantly easier if you've already programmed/trained the robot to recognize the specific objects it has to interact with, even if those items are in different locations.
They also got these things working in corners of a ___location instead of stacking tasks across different areas of the same ___location. And even on these "one-area" task groups it can fail a good amount. Kudos to them for showing the failures, though.
Isn't object recognition essentially solved? AI models were beating humans at image classification (in terms of error rate) back in 2016. Even if this particular model isn't the best at it, they could always call out to an API or run a secondary on-device VLM with stronger object recognition capabilities.
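For what it's worth, here's a minimal sketch of what a secondary recognizer could look like, using an off-the-shelf zero-shot classifier (CLIP via Hugging Face transformers). The model name and label list are illustrative assumptions on my part, not anything the company has said they run:

    # Minimal sketch (assumption: Hugging Face transformers + CLIP as the
    # secondary recognizer; model name and labels are illustrative only).
    from PIL import Image
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-image-classification",
        model="openai/clip-vit-base-patch32",
    )

    def recognize(frame_path, candidate_labels):
        # Return the highest-scoring label for the object in the frame.
        image = Image.open(frame_path)
        results = classifier(image, candidate_labels=candidate_labels)
        return results[0]["label"]  # results come back sorted by score

    print(recognize("frame.jpg", ["laundry basket", "mug", "towel", "cardboard box"]))

Whether something like this is fast enough to run per frame on the robot's hardware is a separate question, but the recognition step itself is pretty much commodity at this point.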