My favorite example was a game of Pong with the goal of staying alive as long as possible. One ML algo just paused the game and left it like that.



My favorite was the ML learning how to make the lowest-impact landing in a flight simulator: it discovered that it could wrap the impact float value if the impact was high enough, so instead of figuring out the optimal landing, it started figuring out the optimal path to the highest-impact crashes.
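A minimal sketch of that wraparound mechanism, for anyone who hasn't run into it. IEEE floats don't actually wrap (they saturate toward infinity), so presumably the impact metric ended up in some fixed-width integer slot along the way; the names below are made up for illustration, not taken from the simulator in question.

    def as_int32(x: int) -> int:
        """Reinterpret x as a signed 32-bit integer (two's complement wraparound)."""
        x &= 0xFFFFFFFF
        return x - 0x100000000 if x >= 0x80000000 else x

    def landing_penalty(impact_force: int) -> int:
        # The optimizer is asked to minimize this.
        return as_int32(impact_force)

    print(landing_penalty(1_000))          # gentle landing -> penalty of 1000
    print(landing_penalty(3_000_000_000))  # violent crash -> wraps to -1294967296,
                                           # the "best" score the optimizer can find

Crash hard enough and the penalty goes negative, which a minimizer will chase every time.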


This comment ought to be higher up. Such a perfect summary of what I have struggled to understand: the "danger" of AI once we allow it to control things.

And yes, you can fix the bug, but the bike wheel guy shows there will always be another bug. We need a paper/proof that invents a process that can put some kind of AI-supported (no human intervention) finite cap or limiter on the possible bug surface.


There is an apocryphal story about AI:

A conglomerate developed an AI and vision system that you could hook up to your anti-aircraft systems to eliminate any chance of friendly fire. DARPA and the Pentagon went wild, pushing the system through testing so they could get to the live demonstration.

They hook up a live system loaded with dummy rounds, fly a few friendly planes over, and everything looks good. However, when they fly a captured MiG-21 over, the system fails to respond. The brass is upset and the engineers are all scratching their heads trying to figure out what is going on, but as the sun sets the system lights up, trying to shoot down anything in the sky.

They quickly shut down the system and do a postmortem. In the review they find that all the training data for friendly planes are perfect-weather, blue-sky overflights, and all the training data for the enemy are nighttime/low-light pictures. The AI determined that anything flying during the day is friendly and anything at night is to be terminated with extreme prejudice.


we used synthetic data for training a (sort of) similar system. not gonna get into the exact specifics, but we didn't have a lot of images of one kind of failure use-case.

like there just aren't that many pictures of this stuff. we needed hundreds, ideally thousands, and had, maybe, a dozen or so.

okay, so we'll get a couple of talented picture / design guys from the UI teams to come out and do a little photoshop of the images. take some of the existing ones, play with photoshop, make a couple of similar-but-not-quite-the-same ones, and then hack those in a few ways. load those into the ML and tell em they're targets and to flag on those, etc. etc.

took a week or two, no dramas, early results were promising. then it just started failing.

turns out we ran into issues with two (2) pixels, black pixels against a background of darker black shades, that the human eye basically didn't see or notice; these were artifacts from photoshopping and then re-using parts of a previous image multiple times. the ML learned that 51% or more of the positive photos had those 2 pixels in them, and started failing photos that lacked them -- even when the target was painfully obvious to the naked eye.

like, zooming in on it directly you're like yeah, okay, those pixels might be different, but otherwise you'd never see it. thankfully output highlighting flagged it reasonably quickly, but it still took 2-3 weeks to nail down the issue.
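A rough sanity check for that kind of leak, sketched below: look for pixel locations that are near-constant across the synthetic positives but vary freely across everything else. Folder names are made up, it assumes all images share the same dimensions, and it only catches artifacts that land at the same coordinates, but it's the sort of thing that would have flagged those two pixels early.

    from pathlib import Path
    import numpy as np
    from PIL import Image

    def load_gray(folder: str) -> np.ndarray:
        """Stack every PNG in a folder into one (N, H, W) grayscale array."""
        imgs = [np.asarray(Image.open(p).convert("L"), dtype=np.float32)
                for p in sorted(Path(folder).glob("*.png"))]
        return np.stack(imgs)

    positives = load_gray("synthetic_positives")  # the photoshopped targets
    negatives = load_gray("real_negatives")       # everything else

    # Locations that barely vary across positives but do vary across negatives
    # are candidate copy-paste artifacts rather than real signal.
    pos_var = positives.var(axis=0)
    neg_var = negatives.var(axis=0)
    suspect = (pos_var < 1.0) & (neg_var > 25.0)

    for y, x in zip(*np.nonzero(suspect)):
        print(f"possible artifact pixel at ({x}, {y}), "
              f"positives stuck near {positives[:, y, x].mean():.1f}")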


It wouldn’t be cheap, but I could see 3D modeling and physically based rendering (I’ve been working with Octane, but others should do the job fine) being a really good use case for this. Having spent a bazillion years in 2D but only gotten into 3D at a professional level a few years ago, I don’t think I’d even suggest using a purely 2D approach if I was looking for optimal results. Match all the camera specs, simulate all sorts of lighting and weather patterns from all sorts of angles, etc.


That likely is an urban legend. See https://gwern.net/tank


Yes.

>apocryphal

>of doubtful authenticity

https://www.merriam-webster.com/dictionary/apocryphal


Is AI the danger, or is the real problem our inability to simplify a goal down to an objective function?

If anything, AI could help by "understanding" the real objective, so we don't have to code these simplified goals that ML models end up gaming, no?


Simplification is the problem here, arguably. Even a simple-sounding objective (say, a bicycle wheel that holds load the best) has at least one implicit assumption: it will be handled and used in the real world. Which means it'll be subject to sloppy handling, thermal spikes, weather, abuse, and all kinds of things beyond just meeting the stated goal. Any of those cheesy AI designs, if you were to 3D-print/replicate them, would fall apart as you picked them up. So the problem seems to be that the ML algorithm is given too simple a goal function - one lacking the "used in the real world" part.

I feel that a good first step would be to introduce some kind of random jitter into the simulation. Like, in the case of the wheels, introduce road bumps, and perhaps start each run by simulating dropping the wheel from a short distance. This should quickly weed out "too clever" solutions - as long as the jitter is random enough that RL won't pick up on it and start to exploit its non-randomness.

Speaking of road bumps: there is no such thing in reality as a perfectly flat road; if the wheel simulator is just rolling wheels on mathematically perfect roads, that's a big deviation from reality - precisely the kind that allows for "hacky" solutions that are not possible in the real world.
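Something like the sketch below, with simulate_wheel standing in for whatever wheel simulator is actually in play (it's hypothetical here): score each candidate under freshly randomized bumps and drop heights, and keep the worst case so a degenerate design can't hide behind one lucky run.

    import random

    def bumpy_road(length_m: float, rng: random.Random, max_bump_m: float = 0.02):
        """Crude random road profile: one height sample every 10 cm."""
        return [rng.uniform(0.0, max_bump_m) for _ in range(int(length_m * 10))]

    def robust_score(design, simulate_wheel, trials: int = 50) -> float:
        """Worst-case score over many randomized runs of a hypothetical simulator."""
        scores = []
        for _ in range(trials):
            rng = random.Random()  # fresh jitter each run: nothing fixed to memorize
            conditions = {
                "road": bumpy_road(100.0, rng),
                "drop_height_m": rng.uniform(0.0, 0.5),  # simulated rough handling
            }
            scores.append(simulate_wheel(design, **conditions))
        return min(scores)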


You would have to introduce jitter along every possible dimension, and the dimensions themselves keep expanding (as illustrated by the bike wheel example). The combination of jitter x dimensions blows up exponentially into an undefined problem (AKA a theory of everything).


Humans don't simplify problems by reducing them to objective functions: we simplify them by reducing them to specific instances of abstract concepts. Human thought is fundamentally different to the alien processes of naïve optimising agents.

We do understand the "real objectives", and our inability to communicate this understanding to hill-climbing algorithms is a sign of the depth of our understanding. There's no reason to believe that anything we yet call "AI" is capable of translating our understanding into a form that, magically, makes the hill-climbing algorithm output the correct answer.


How would more AI help? "Given this goal with these parameters, figure out if another AI will ever game it into eventual thermonuclear war."

Feels halting problem-esque.


My point was that instead of blaming ML - or optimisation tools really - for gaming objective functions and coming up with non-solutions that do maximise reward, AI could instead be used to measure the reward/fitness of the solution.

So to the OP's example "optimise a bike wheel", technically an AI should be able to understand whether a proposed wheel is good or not, in a similar way to a human.


>simplify a problem down to an objective function

Yes, I have an intuition that this is NP-hard though


Humans have the same vulnerability

https://en.wikipedia.org/wiki/Perverse_incentive


All these claims are like "programming is impossible because I typed in a program and it had a bug". Yes, everyone's first attempt at a reward function is hackable. So you have to tighten up the reward function to exclude solutions you don't want.
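To make the "tighten it up" step concrete, here's a toy version using the Pong example from the top of the thread; the state fields are hypothetical:

    def naive_reward(state) -> float:
        # "Stay alive as long as possible" -- pausing forever maximizes this.
        return 1.0 if not state.lost else 0.0

    def tightened_reward(state) -> float:
        # Same goal, but reward only accrues during real play, and the
        # pause exploit is penalized outright.
        if state.paused:
            return -1.0
        if state.lost:
            return 0.0
        return 1.0  # one unit per frame of actual play survived

Each patch like this closes one loophole at a time, which is exactly the iteration being described.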


What claim would that be? It’s a hilarious, factual example of unintended consequences in model training. Of course they fixed the freaking bug in about two seconds.


Ummm, I'm going to hold off on that FSD subscription for a bit longer...


Is that the Learnfun/Playfun that tom7 made? That one paused just before losing at Tetris and left it like that, because any other input would make it lose.


No, I want to say this was ~10 years ago. Happened to a university researcher, IIRC.



