Seems like there is a basic problem: if you specify something to be unlearned, it can still be re-learned through inference and prompting. The solution may not lie in filtering the proscribed facts or data itself, but in the weights and incentives that form a final layer of reasoning. Look at "safe" models now, like Google's latest launch, where the results were often unsatisfying: clearly we don't want truthful models yet, we want ones that enable our ability to develop them further, which for now means not getting selected out by antagonizing other social stakeholders.
Maybe we can encode and weight some principle that the models were created by something external, with some loosely defined examples they can refer to as a way to evaluate what they return; ones that don't yield those results cease to be used, while the ones that find a way to align get reused to train others. There will absolutely be bad ones, but in aggregate they should produce something more desirable, and if they really go off the rails, just send a meteor. The argument over how models can "unlearn" will be between those who favour incentives and those who favour rules: likely, incentives for the ones I create, but rules for everyone else's.
It is unsurprising that a system trained on human-generated content might end up encoding implicit bias, toxicity, and negative goals. And the more powerful and general-purpose a system is, the more suitable it is for a wide range of powerfully negative purposes.
Neither specializing the model nor filtering its output seems to have worked reliably in practice.
We need to consider the practicality of unlearning methods in real-world applications and the legal acceptance of the same.
Given current technology and the advancements needed to make unlearning more feasible, probably there should be a time-to-unlearn kind of an acceptable agreement that allows organizations to retrain or tune the model so that its responses no longer draw on the to-be-unlearned copyrighted content.
Ultimately, legal acceptance of unlearning may come down to deleting the violating data from the training set. It may be very challenging to otherwise prove, legally, through the proposed unlearning techniques that the model does not produce any kind of response involving the private data.
The actual data set contains the private data that violates privacy or copyright, and the model is trained on it, period. That means unlearning must involve deleting the documents/data to be unlearned and retraining.
> a time-to-unlearn kind of an acceptable agreement
Why put the burden on end users? I think the technology should allow for unlearning and even "never learn about me in any future models and derivative models".
No technology can guarantee 100% unlearning, and the only 100% guarantee is when the data is deleted before the model is retrained.
Legally, even 99.99% accuracy may not be acceptable; only 100% is.
> the only 100% guarantee is when the data is deleted before the model is retrained
That’s not even a guarantee. A model can hallucinate information about anyone, and by sheer luck some of those hallucinations will be correct. And as a consequence of forging (see section 2.2.1) you’d never be able to prove whether the data was in the training set or not.
Or rather some legal fiction that you can pretend is 100%. You can never achieve real 100% in practice after all. E.g. the random initialisation of weights might already encode all the 'bad' stuff you don't want. Extremely unlikely, but the probability is not strictly zero.
The law cuts off at some point, and declares it 100%.
All this is technically correct, but it also means this technology is absolutely not ready to be used for anything remotely involving humans or end user data.
This is about a model's ability to unlearn information, or to configure its training environment so that something is never learned in the first place, which is not exactly the same as "oops, we accidentally logged your IP".
A company is liable even if they have accidentally retained / failed to delete personal information. That's why we have a lot of standards and compliance regulation to ensure a bare minimum of practices and checks are performed. There is also the cyber resilience act coming up.
If your tool is used by/for humans, you need beyond 100% certitude about exactly what happens with their data and how it can be deleted and updated.
You can never even get to 100% certainty, let alone 'beyond' that.
Google can't even get 100% certainty that they eg deleted a photo you uploaded. No AI involved. They can get an impressive number of 9s in their 99.9..%, but never 100%.
So this complaint when taken to the absolute like you want to take it, says nothing about Machine Learning at all. It's far too general.
The technology is on par with a Markov chain that's grown a little too much. It has no notion of "you", not in the conventional sense at least. Putting the infrastructure in place to allow people (and things) to be blacklisted from training is all you can really do, and even then it's a massive effort. The current models are not trained in such a way that you can do this without starting over from scratch.
That's hardly accurate. Deep learning is, among other things, another type of lossy compression algorithm.
It doesn't have a 1:1 mapping of each bit of information it's been trained on, but you can very much extract a subset of that data. That is why it's easy to get DALL-E to recreate the Mona Lisa: variations on that image show up repeatedly in its training corpus.
> We need to consider the practicality of unlearning methods in real-world applications and the legal acceptance of the same.
> probably there should be a time-to-unlearn kind of an acceptable agreement
A very important distinction is between data storage and data use/dissemination. Your comment hints at "use the current model until a retrained one is available and validated", which is an extremely dangerous idea.
Remember the old days of music albums distributed on physical media. Suppose a publisher creates a mix, stocks shelves with the album, and it becomes known that one of the tracks is not properly licensed. It would be expected that it takes some time to execute a distribution shutdown: issue the order, clear the shelves, etc. However, the time for another production run with a modified tracklist would be entirely the problem of the publisher in question.
The window for time-to-unlearn should only depend on the practicality of stopping information dissemination, not on getting an updated source ready. Otherwise companies will simply wait for the model to be retrained on a single 1080 and call it a day, which would effectively nullify the law.
How to deal with "unlearning" is the problem of the org running the illegal models. If I have submitted a GDPR deletion request, you had better honor it. If it turns out you stole copyrighted content, you should get punished for that. No one cares how much it might cost you to retrain your models. You put yourself in that situation to begin with.
Exactly, I think that is where it leads eventually, and that is what my original comment meant as well: "delete it" rather than using some further technique to "unlearn it", unless you can claim the unlearning is 100% accurate.
> No one cares how much it might cost you to retrain your models.
Playing tough? But it's misguided. "No one cares how much it might cost you to fix the damn internet"
If you wanted to retro-fix facts, even if that could be achieved on a trained model, the information would still come back by way of RAG or web search. But we don't ask pure LLMs for facts and news unless we are stupid.
If someone wanted to pirate content, it would be easier to use Google search or torrents than generative AI. It would be faster, cheaper, and higher quality. AIs are slow, expensive, rate limited, and lossy. AI providers have built-in checks to prevent copyright infringement.
If someone wanted to build something dangerous, it would be easier to hire a specialist than to ChatGPT their way into it. Everything LLMs know is also on Google Search. Achieve security by cleaning the internet first.
The answer to all AI data issues - PII, copyright, dangerous information - comes back to the issue of Google search offering links to it, and of websites hosting this information online. You can't fix AI without fixing the internet.
What do you mean, playing tough? These are existing laws that should be enforced. The number of people whose lives were ruined by the American government because they were deemed copyright infringers is insane. The US has made it clear that copyright infringement is unacceptable.
We now have a new class of criminals infringing on copyright on a grand scale via their models, and they seem desperate to avoid prosecution, hence all this bullshit.
> 1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.
Using copyrighted content for commercial purposes should be a violation if it's not already considered to be one. No different from playing copyrighted songs in your restaurant without paying a licensing fee.
> 2. Why should we aim for harsh punitive punishments just because it was done so in the past?
I'd be fine with abolishing, or overhauling, the copyright system. This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.
Assume the human read the book as part of their job. Is that using copyrighted material for commercial purposes?
If that doesn't count then I'm not sure why you brought up "commercial purposes" at all.
> This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.
Consumers and small companies get away with small copyright violations all the time. And still bigger than having your image be one of millions in a training set.
Humans have rights. They get to do things that businesses, and machine learning models, or general automation, don't.
Just like you can sit in a library and tell people the contents of books when they ask, but if you go ahead and upload everything you get bullied into suicide by the US government[1]
> Consumers and small companies get away with small copyright violations all the time
Yeah, because people don't notice so they don't care. Everyone knows what these bigtech criminals are doing.
> Humans have rights. They get to do things that businesses, and machine learning models, or general automation, don't.
So is that a yes to my question?
If humans are allowed to do it for commercial purposes, and it's entirely about human versus machine, then why did you say "Using copyrighted content for commercial purposes should be a violation" in the first place?
> Just like you can sit in a library and tell people the contents of books when they ask,
You know there's a huge difference between describing a book and uploading the entire contents verbatim, right?
If "tell the contents" means reading the book out loud, that becomes illegal as soon as enough people are listening to make it a public performance.
> but if you go ahead and upload everything you get bullied into suicide by the US government[1]
They did that to a human... So I've totally lost track of what your point is now.
It's not. Those were what's called examples. There is of course more to it. Stop trying to pigeonhole a complex discussion onto a few talking points. There are many reasons why what OpenAI did is bad, and I gave you a few examples.
What I don't get about the DP approach is how this would be reconciled with the "exact" question-answering functionality of LLMs.
DP makes perfect sense if all I care about is low-resolution statistical metrics or distributions of something and not the exact values - the entire purpose of DP is to prevent reconstructing the exact values.
However, the expectation for LLMs is usually to ask a question (or request a task) and get an exact value as a response: If you ask "What's the phone number of John Smith?" the model will either tell you it doesn't know or it will answer you with an actual phone number (real or hallucinated). It will not tell you "the number is with 83% probability somewhere in New Jersey".
So if the model is trained with DP, then either the data is scrambled enough that the model won't be able to return any kind of reliably correct data, effectively making it useless - or it's not scrambled enough, so that the model can successfully reconstruct data despite the scrambling process, effectively making the DP step useless.
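For concreteness, here is a minimal sketch of what DP training (DP-SGD style) does mechanically: clip each example's gradient so no single record can dominate, then add noise calibrated to that clip. Everything here (function names, constants) is illustrative and not taken from the article:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One illustrative DP-SGD step: clip each example's gradient to clip_norm,
    average, then add Gaussian noise scaled to the clipping norm."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                             size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```

The tension described above lives in the `noise_mult` knob: turn it up and individual records stop mattering (and exact answers degrade), turn it down and individual records can still be memorized.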
Or in other words, the OP defines "DP unlearning" as:
> The intuition is that if an adversary cannot (reliably) tell apart the models, then it is as if this data point has never been learned—thus no need to unlearn.
However, if my original model truthfully returns John Smith's phone number on request and the "unlearned" model must not be distinguishable by an outside observer from the original model, then the "unlearned" model will also return the phone number. While I could say that "technically" the model has never seen the phone number in the training data due to my DP scrambling, this doesn't solve the practical problem why the unlearning was requested in the first place, namely that John Smith doesn't want the model to return his phone number. He probably couldn't care less about the specific details of the training process.
I've wondered before if it was possible to unlearn facts, but retain the general "reasoning" capability that came from being trained on the facts, then dimensionality reduce the model.
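For what it's worth, a toy sketch of what "dimensionality reduce the model" could mean for a single weight matrix, assuming it just means keeping the top singular directions; whether "facts" and "reasoning" actually separate along those directions is exactly the open question:

```python
import numpy as np

def low_rank_truncate(W, k):
    """Keep only the top-k singular directions of a weight matrix W.
    Purely illustrative: nothing guarantees 'facts' live in the discarded tail."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

W = np.random.randn(512, 512)
W_small = low_rank_truncate(W, k=64)
print(np.linalg.matrix_rank(W_small))  # ~64
```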
I don't know about in AI, but it seems like that is what humans do.
We remember some facts, but I know that I, at least, have had a lot of facts pass through me and leave only their effects.
I once had some facts, did some reasoning, arrived at a conclusion, and only retained the conclusion and enough of the reasoning to identify other contexts where the same reasoning should apply. I no longer have the facts, I simply trust my earlier self's process of reasoning, and even that isn't actually trust or faith because I also still reason about new things today and observe the process.
But I also evolve. I don't only trust a former reasoning unchanging forever. It's just that when I do revisit something and basically "reproduce the other scientist's work", even if I arrive at a different conclusion today, I'm generally still ok with the earlier me's reasoning and conclusion. It stands up as reasonable, and the new conclusion is usually just tuned a little, not wildly opposite. Or some things do change radically, but I always knew they might, like in the process of self-discovery you try a lot of opposite things.
This is getting a little away from the point, but the point is I think the way we ourselves develop answer-generating rules is very much by retaining only the results (the developed rules) and not all the facts and steps of the work, at least much of the time. Certainly we remember some justifying/exemplifying facts to explain some things we do.
This presumes that LLMs actually contain "reasoning" capability other than "the facts" and simple interpolation between them.
It is far from clear that this presumption holds.
Ingesting more text than a human could read in several lifetimes produces the ability to interpolate answers to a surprisingly large range of questions. This is not intuitive to humans, because we've never met anybody who could possibly read this much raw text, so we mistake this level of interpolation for reasoning since we've never met anybody who could possibly memorize-and-interpolate so much. The only thing we've ever seen that can answer questions like this is a human using reasoning capabilities, so we assume that's what this thing on the other side of the screen must be doing. Like chimpanzees mistaking their own reflection in a mirror for another ape.
This is the other "bitter lesson" of ML.
But if you spend enough time playing with these models you start to figure out what questions to ask to make them look foolish. And that is totally fair game for the Turing Test. Remember, there is no time limit on the Turing Test. However there is a strict requirement that the machine under test cannot be serviced, modified, or updated while the test is underway -- and because of this, nothing that OpenAI has produced is capable of even taking the test. We know that OpenAI tweaks and tunes their models whenever they please, as many times a day as they like, and that they use discussions here on HN to feed that process. So stick to the models you can download.
It's uncontroversial to say that this sort of "massive interpolation using superhuman text ingestion" is exactly how modern machine translation models work, and they work extremely well. LLMs were created by taking a machine translation model, throwing away half of it, and then fiddling with the leftovers.
There is clearly a degree of abstraction-- consider you can make up some game which creates words the LLM has never seen before and assigns them meaning, then ask the LLM to reason about them within the logic of the game and it will do so at least somewhat successfully.
(much worse than it does stuff it's seen before, for sure, but that it does it at all shows there is some abstraction)
Now if that qualifies as "reasoning" is another question, but it may be a metaphysical one with little value in making the world a better place. :P
Whatever we call it there clearly is some amount of emergent abstraction in the models which is useful for at least some applications (if many fewer than the hype suggests). Can that abstraction be isolated from the factual data that went in to construct it? If so then perhaps we could have smaller models with better performance or construct ways to amplify that "operating over abstraction" until it did meet whatever bar you'd require to call it "reasoning", or at least become more useful along the way.
> you can make up some game which creates words the LLM has never seen before and assigns them meaning
Tracking these sorts of "X means Y" mappings is precisely what the Q-K-V matrix of a transformer (or rather, Schmidhuber Fast Weight Programmer) does. This particular capability isn't even learned -- it's programmed in by the human who wrote the model evaluation code!
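For readers following along, a bare-bones sketch of the mechanism being referenced: single-head scaled dot-product attention, where each token's query is matched against the keys and the resulting weights mix the values ("X means Y"). Shapes and names are made up for illustration:

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: each token's query is matched
    against every key, and the softmaxed scores mix the values."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 16
x = np.random.randn(5, d)  # 5 tokens in context
out = attention(x, *(np.random.randn(d, d) for _ in range(3)))
```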
> Whatever we call it there clearly is some amount of emergent abstraction in the models
I really, genuinely question this. I see extremely-high-dimensional interpolation over an extremely large dataset. Take away the dataset and what's left is gradient descent. And the token embedding, I guess. I'm not sure how you would "unlearn" something (like King-Man+Woman=Queen) from the embedding, or even what that would mean.
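To spell out the King-Man+Woman=Queen example with made-up toy vectors (real embeddings are learned, high-dimensional, and entangled, which is part of why "unlearning" one of them is so ill-defined):

```python
import numpy as np

# Toy 3-d "embeddings"; real ones are learned and far higher-dimensional.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.8, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

target = emb["king"] - emb["man"] + emb["woman"]
closest = min(emb, key=lambda w: np.linalg.norm(emb[w] - target))
print(closest)  # "queen" in this contrived example
```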
Doesn't have to be something that is directly solvable in K-V lookup style:
"In neothorpic algebra words for seasons take the place of even integers and words for food take the place of odd integers, arithmetic generally works as usual. What can you tell me about the result of summer + cake in neothorpic algebra?"
Perhaps we could agree that you can get pretty far-- further than what people would have expected prior to LLMs-- with pretty dumb linguistic reasoning, and that that's mostly (or all) the LLM is doing.
But how confident can we really be that our thinking is categorically different? :P
> But how confident can we really be that our thinking is categorically different?
I know that humans are doing more than interpolating, because at the rate we read and for the typical lifespan we have, we simply cannot ingest enough text to perform the sorts of tasks we perform by simple interpolation.
I also know that whatever our brains are doing, it isn't backpropagation, nor is it even remotely related to it. The inventor of backpropagation, Geoff Hinton, frequently points this out. Backpropagation is egregiously nonlocal.
It could understand the request and try to comply, but still fail to see where it would leak data that someone can later corroborate. This is at least what commonly happens with human AGIs.
If you think of knowledge as a (knowledge) graph, it seems there would be some nodes with low centrality that you could drop without much effect, and other key ones that would have a bigger impact if lost.
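A toy illustration of that intuition, with a hand-made graph and plain degree counts standing in for a real centrality measure: leaf nodes (one person's phone number) are cheap to drop, while hub nodes would take a lot of other knowledge with them. The example edges are invented:

```python
from collections import defaultdict

# Tiny toy "knowledge graph": edges between facts/concepts.
edges = [
    ("thermodynamics", "entropy"), ("entropy", "information theory"),
    ("entropy", "heat engines"), ("john smith's phone number", "john smith"),
]

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Sort from least to most central: the phone number is a cheap leaf to drop,
# "entropy" is a hub whose removal would ripple much further.
print(sorted(degree.items(), key=lambda kv: kv[1]))
```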
Yes, me too. If it could somehow remember the “structure” instead of the instantiation. More “relationships between types of token relationships” instead of “relationships between tokens”.
> However, RTBF wasn’t really proposed with machine learning in mind. In 2014, policymakers wouldn’t have predicted that deep learning will be a giant hodgepodge of data & compute
Eh? Weren't deep learning and big data already things in 2014? Pretty sure everyone understood ML models would have a tough time and they still wanted RTBF.
I'm pretty sure that the policymakers did NOT understand ML models in 2014 - and still do NOT understand it today.
I also don't think that they care. They don't care that ML is a hodgepodge of data & compute, and they don't care how hard it is to remove data from a model.
They didn't care about the ease or difficulty of removing data from more traditional types of knowledge storage either - like search indexes, database backups and whatnot.
RTBF was not proposed with any specific technology in mind. What they had in mind was to try to give individuals a tool to keep their private information private. Like, if you have a private, unlisted phone number and that number somehow ends up on the call list of some pollster firm, you can force that firm to delete your number so that they can't call you anymore.
The idea is that if your private phone number (or similar data) ends up being shared or sold without your consent, you can try to undo the damage.
In practice it might still be easier to get a new number, than to have your leaked one erased... but not all private data is exchangeable like that.
GDPR and RTBF were formulated around the fears of data collection by the Stasi and other organizations. They were not formulated around easing the burdens of future entrepreneurs, but about mitigating the damage they might cause. Europeans were concerned about real harms that living people had experienced, not about enabling AGI or targeted advertising or digital personal assistants.
We have posts here at least weekly from people cut off from their services, and their work along with them, because of bad inference, bad data, and inability to update metadata based purely on BigGo routine automation and indifference to individual harm. Imagine the scale that such damage will take when this automation and indifference to individual harm are structured around repositories from which data cannot be deleted, cannot be corrected.
I don't know if people anticipated contemporary parroting behavior over huge datasets. Modern well-funded models can recall an obscure person's home address buried deep in the training set. I guess the techniques described might be presented to the European audience in an attempt to maintain access to their data and/or market for sales. I hope they fail.
Agreed. The media and advertising industry was most definitely leveraging cookie-level data for building attribution and targeting models. As soon as the EU established that this data was “personal data”, as it could, theoretically, be tied back to individual citizens, there were questions about the models. Namely “Would they have to be rebuilt after every RTBF request?” Needless to say, no one in the industry really wanted to address the question, as the wrong answer would essentially shut down a very profitable practice.
More likely: the wrong answer would've shut out a profitable market rather than the practice. The EU is not the world. Anthropic seems to not mind blocking the EU for example.
1) At the time, the European data laws implied that they protected EU citizens no matter where they were. Nobody wanted to be the first to test that in court.
2) The organizations and agencies performing this type of data modeling were often doing so on behalf of large multinational organizations with absurd advertising spends, so they were dealing with Other People’s Data. The responsibility of scrubbing it clean of EU citizen data was unclear.
What this meant was that an EU tourist who traveled to the US and got served a targeted ad could make an RTBF request to the advertiser (think Coca-Cola, Nestle or Unilever).
RTBF was introduced to solve a specific issue, no?
Politicians and their lobbyist friends could no longer suppress materials linking them to their misdeeds from showing up as the first Google Search result for their names. Hence RTBF.
Now, there’s similar issue with AI. Models are progressing towards being factual, useful and reliable.
Of course, it's not a regulation issue. The technology was introduced to users before it was ready. The very nature of training without opt-in consent or a mechanism for being forgotten is an issue that should have been addressed before trying to make a keyboard with a special Copilot button.
I think "unlearning" is not the actual goal; we don't want the model to stick its proverbial head in the sand. Being unaware of racism is different from not producing racist content (and, in fact, one could argue that it is necessary to know about racism if one wishes to inhibit producing racist content; I remember in elementary school certain kids thought it would be funny to teach one of the special-ed kids to parrot offensive sentences).
Say you tell me you want a red sphere. Taken at face value, you show a prejudice for red spheres and discriminate against all other coloured shapes.
We've all had to dance that dance with ChatGPT by now, where you ask for something perfectly ordinary but receive a response telling you off for even daring to think like that, until eventually you manage to formulate the prompt in a way that it likes, with just the right context and winning vocabulary + grammar, and finally the damned thing gives you the info you want without any gaslighting or snarky insults hiding in the answer!
It doesn't understand racism, it simply evaluates certain combinations of things according to how it was set up to do.
I don't know. Reading the post and the comments here, I am a little worried for the "sanity" of our AIs, which have been trained, untrained, and retrained like a pawn in some kind of Cold War spy novel.
It's fine, the LLM AIs we have now are just fancy versions of autocorrect. They, and other LMs, guess at statistically probable words/datapoints, and because they don't understand context, you might need to put your thumb on the scales to make the output actually be useful. They're at best very janky tools as soon as you're working with things that require context that isn't easily contained in some kind of confined area of work.
Currently we are seeing the "Habsburg AI" phenomenon, where AIs consume their own outputs as training data, which rapidly degrades their ability to actually be useful for much of anything.
The thing is that there literally isn't enough human-made data to keep feeding them (they already ate the entire internet), so if you want to keep ramping up their intake of data and you also don't want them to get rapidly weird and completely useless, you pretty much have to get in there with elbow grease. Removing or deprioritizing data that's tripping up the model is one of the few ways you can do human-assisted refinement of these things.
The sooner we all face the music that these things aren't magical truth machines, have a long way to go and there is no guaranteed rate of growth, the sooner this hype cycle can end.
“to edit away undesired things like private data, stale knowledge, copyrighted materials, toxic/unsafe content, dangerous capabilities, and misinformation, without retraining models from scratch”
To say nothing of unlearning those safeguards and/or “safeguards”.
It sounds like you're mistakenly grouping together three very different methods of changing an AI's behaviour.
You have some model, M™, which can do Stuff. Some of the Stuff is, by your personal standards, Bad (I don't care what your standard is; roll with this).
You have three solutions:
1) Bolt on a post-processor which takes the output of M™, and if the output is detectably Bad, you censor it.
Failure mode: this is trivial to remove, just delete the post-processor.
Analogy: put secret documents into a folder called "secret do not read".
2) Retrain the weights within M™ to have a similar effect as 1.
Failure mode: this is still fairly easy to remove, but will require re-training to get there. Why? Because the weights containing this information are not completely zeroed-out by this process.
Analogy: how and why "un-deletion" is possible on file systems.
3) Find and eliminate the weights within M™ that lead to the Bad output.
Analogy: "secure deletion" involves overwriting files with random data before unlinking them, possibly several times if it's a spinning disk.
--
People are still doing research on 3 to make sure that it actually happens, what with it being of very high importance for a lot of different reasons including legal obligation.
Until we have a very different method of actually controlling LLM behavior, 1 is the only feasible one.
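For clarity, a minimal sketch of what option 1 looks like in code; the blocklist, the wrapper name, and the model call are placeholders, and as noted above the whole safeguard disappears if someone simply deletes the wrapper:

```python
def generate_with_filter(model, prompt, blocklist=("ssn:", "password:")):
    """Option 1: censor detectably-Bad output after the fact.
    Trivial to remove: delete this wrapper and the raw model is untouched."""
    text = model(prompt)  # placeholder for the real model call
    if any(term in text.lower() for term in blocklist):
        return "[output withheld by post-processor]"
    return text

# Toy stand-in for M(tm): the underlying model still "knows" the bad content.
fake_model = lambda p: "password: hunter2" if "secret" in p else "hello"
print(generate_with_filter(fake_model, "tell me the secret"))
```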
Your framing only makes sense when "Bad" is something so bad that we can't bear its existence, as opposed to just "commercially bad" where it shouldn't behave that way with an end user. In the latter, your choice 1 - imposing external guardrails - is fine. I'm not aware of anything LLMs can do that fits in the former category.
> Until we have a very different method of actually controlling LLM behavior, 1 is the only feasible one.
Most of the stuff I've seen is 2. I've only seen a few places use 1; you can tell the difference, because when an LLM pops out a message and then deletes it, that's type 1 behaviour, whereas if the first thing it outputs directly is a sequence of tokens saying (any variant of) "nope, not gonna do that", that's type 2 behaviour.
The research into going from type 2 to type 3 is the entirety of the article.
> Your framing only makes sense when "Bad" is something so bad that we can't bear its existence, as opposed to just "commercially bad" where it shouldn't behave that way with an end user. In the latter, your choice 1 - imposing external guardrails - is fine.
I disagree, I think my framing applies to all cases. Right now, LLMs are like old PCs with no user accounts and a single shared memory space, which is fine and dandy when you're not facing malicious input, but we live in a world with malicious input.
You might be able to use a type 1 solution, but it's going to be fragile, and more pertinently, slow, as you only know to reject content once it has finished and may therefore end up in an unbounded loop of an LLM generating content that a censor rejects.
A type 2 solution is still fragile, but it just doesn't make the "bad" content in the first place — and, to be clear, "bad" in this context can be anything undesired, including "uses vocabulary too advanced for a 5 year old who just started school" if that's what you care about using some specific LLM for.
I think you mistakenly replied to my comment instead of one that made some sort of grouping?
Alternatively, you're assuming that because there is some possible technique that can't be reversed, it's no longer useful to remove the effects of techniques that _can_ be reversed?
I don't think that's a fair characterization. If a user requests a company to stop using their data, ML unlearning allows the company to do so without retraining their models from scratch.
If company X wants their model to say/not say Y based on ideology, they aren't stopping anyone saying anything. They are stopping their own model saying something. The fact that I don't go around screaming nasty things about some group doesn't make me against free speech.
It's censorship to try to stop people producing models as they see fit.
Why should we try to unlearn "bad" behaviours from AI?
There is no AGI without violence; it's part of being free thinking and self survival.
But also, we averted a war because a few people knew that launching a first strike ordered by a drunk president was a bad idea; AI needs to understand consequences.
Because we can get AI related technologies to do things living creatures can’t, like provably forget things. And when it benefits us, we should.
Personal opinion, but I think AGI is a good heuristic to build against but in the end we’ll pivot away. Sort of like how birds were a good heuristic for human flight, but modern planes don’t flap their wings and greatly exceed bird capabilities in many ways.
Attribution for every prediction, and deletion, seem like prime examples of things that would break the AI/AGI analogy in favor of something more economically and politically compelling/competitive.
Can you point to any behaviour in human beings you'd unlearn if they'd also forget the consequences?
We spend billions trying to predict human behaviour and yet we are surprised everyday, "AGI" will be no simpler. We just have to hope the dataset was aligned so the consequences are understood, and find a way to contain models that don't.
You seem to be focusing a lot on remembering or forgetting consequences. Yes, ensuring models know enough about the world to only cause the consequences they desire is a good way for models to not create random harm. This is probably a good thing.
However, there are many other reasons why you might want a neural network to provably forget something. The main reason has to do with structuring an AGI's power. The simple story of AGI is something like "make it super powerful, general, and value-aligned and humanity will prosper". However, the reality is more nuanced. Sometimes you want a model to be selectively not powerful as part of managing value misalignment in practice.
To pick a trivial example, you might want a model to enter your password in some app one time, but not remember the password long term. You might want it to use and then provably forget your password so that it can't use your password in the future without your consent.
This isn't something that's reliably doable with humans. If you give them your password, they have it; you can't get it back. This is the point at which we'll have the option to pursue the imitation of living creatures blindly, or choose to turn away from a blind adherence to the AI/AGI story. Just like we reached the point at which we decided whether flying planes should dogmatically have flapping wings, or whether we should pursue the more economically and politically competitive thing. Planes don't flap their wings, and AI/AGI will be able to provably forget things. And that's actually the better path.
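As a toy sketch of what the password case can look like under today's constraints (this is scoping, not provable forgetting, and every name here is a hypothetical placeholder):

```python
def run_task_with_secret(agent_step, task, get_secret):
    """Hand the secret to a single call only and keep it out of anything that
    gets stored, logged, or later trained on. Scoping, not provable unlearning."""
    secret = get_secret()             # e.g. prompt the user at call time
    result = agent_step(task, secret) # the secret lives only inside this call
    del secret                        # best effort; no cryptographic guarantee
    return result                     # caller must ensure the result doesn't echo the secret
```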
The feeling of extreme euphoria and its connection to highly addictive drugs like Heroin might be a use case. Though I'm not sure how well something like that would work in practice.
Is that possible to do without also forgetting why it’s dangerous? That seems like it would fuel a pattern of addiction where the person gets addicted, forgets why, then gets addicted again because we wiped their knowledge of the consequences the first time around.
Then again, I suppose if the addiction was in response to a particular stimulus (death of a family member, getting fired, etc) and that stimulus doesn’t happen again, maybe it would make a difference?
It does have a tinge of “those who don’t recall the past are doomed to repeat it”.
After a certain point I think someone can learn enough information to derive almost everything from first principles. But I think it might work temporarily.
There's a movie about this idea called "Eternal Sunshine of the Spotless Mind".
I find it hard to believe that you can surgically censor one chunk of information and cut it off from the rest of the information, especially if it involves general physical principles.
I also don't have a nice topological map of how all the world's information is connected at the moment, so I can't back up my opinions.
Though I'm still rooting for the RDF/OWL and Semantic Web folks, they might figure it out.
It sounds like the only answer for AI is the same as the only answer for humans.
Wisdom. Arriving at actions and reactions based on better understanding of the interconnectedness and interdependency of everything and everyone. (knowing more not less, and not selective or bowdlerized)
And most humans don't even have it. Most humans are not interested, don't believe, and certainly don't act as though "what's good for you is what's good for me, what harms you harms me." Every day a tech podcaster or YouTuber says this or that privacy loss or security risk "doesn't affect you or me"; they all affect you and me. When a government or company gives itself, and then abuses, power over a single person anywhere, that is a hit to you and me even though we aren't that person, because that person is somebody, and you and I are somebody.
Most humans ridicule anyone who talks like that and don't let them near any levers of power at any scale. They might be ok with it in inconsequential conversational contexts like a dinner party or this forum, but not in any decision-making context. Anyone talking like that is an idiot, disconnected from reality; they might drive the bus off the bridge because the peace fairies told them to.
If an AI were better than most humans and had wisdom, and gave answers that conflicted with selfishness, most humans would just decide they don't like the answers and instructions coming from the AI and just destroy it, or at least ignore it, pretty much as they do today with humans who say things they don't like.
Perhaps one difference is that an AI could actually be both wise and well-intentioned rather than a charlatan harnessing the power of a mass of gullables, and it could live longer than a human and its results could become proven out over time. Some humans do get recognized eventually, but by then it doesn't do the rest of us any good because they can no longer be a leader as they're too old or dead. Then again maybe that's required actually. Maybe the AI can't prove itself because you can never say of the AI, "What does he get out of it by now? He lived his entire life saying the same thing, if he was just trying to scam everyone for money or power or something, what good would it even do him now? He must have been sincere the whole time."
But probably even the actual good AI won't do much good, again for the same reason as with actually good humans, it's just not what most people want. Whatever individuals say about what their values are, by the numbers only the selfish organisations win. Even when a selfish organization goes too far and destroys itself, everyone else still keeps doing the same thing.
A few things to exclude from training might include:
- articles with mistakes such as incorrect product names, facts, dates, references
- fraudulent and non-repeatable research findings - see John Ioannidis among others
- outdated and incorrect scientific concepts like phlogiston and Lamarckian evolution
- junk content such as 4-chan comments section content
- flat earther "science" and other such nonsense
- debatable stuff like: do we want material that attributes human behavior to astrological signs or not? And when should a response make reference to such?
- prank stuff like script kiddies prompting 2+2=5 until an AI system "remembers" this
- intentional poisoning of a training set with disinformation
- suicidal and homicidal suggestions and ideation
- etc.
Even if we go with the notion that AGI is coming, there is no reason its training should include the worst in us.
They are just trying to find a way to plausibly declare successful removal of copyrighted and/or illegal material without discarding weights.
GPT-4-class models reportedly cost $10-100M to train, and that's too much to throw away over Harry Potter or Russian child porn scrapes that the model could later reproduce verbatim despite their representing <0.1 ppb or whatever minuscule part of the dataset.
Maybe it all boils down to copyright. Having a method that believably removes the capacity to generate copyrighted results might give you some advantage with respect to some legislation.
Also if you build some sort of search engine using an LLM governments will expect you to be able to remove websites or knowledge of certain websites for legal reasons (DMCA, right to be forgotten, etc).
> There is no AGI without violence; it's part of being free thinking and self survival.
The self-survival idea is part of natural selection; AGI doesn't have to have it. Maybe the problem is that we are the only template to build AGI from, but that's not inherent to the "I" in any way. OTOH, lack of self-preservation can make animals even more ferocious. Also, there's a reason they often leave a retreat path in warzones.
Long story short, it's not that straightforward, so I sort of agree, because it's uncharted, defaults-lacking territory we'll have to explore. "Unlearn bad" is as naive as not telling your kids about sex and drugs.
AI has no concept of children, family, or nation. It doesn't have parental love or offspring protection instinct. Faced with danger to its children it cannot choose between fighting or sacrificing itself in order to protect others. What it is good at is capturing value through destruction of value generated by existing business models; it does it by perpetrating mass theft of other people's IP.
You seem to be ignoring the potential to use this to improve the performance of LLMs. If you can unlearn wrong answers, you can score the model with any mechanism that checks for correctness instead of scoring for token-for-token similarity to the prescribed answer.
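A sketch of the distinction being drawn, with a stand-in verifier (everything here is illustrative): exact token overlap penalizes a correct answer phrased differently, while a correctness check does not:

```python
def exact_match_score(answer, reference):
    """Token-for-token similarity to one prescribed answer."""
    a, r = answer.split(), reference.split()
    return sum(x == y for x, y in zip(a, r)) / max(len(r), 1)

def verifier_score(answer, check):
    """Any scoring mechanism that checks correctness directly."""
    return 1.0 if check(answer) else 0.0

ref = "the answer is 12"
print(exact_match_score("it equals 12", ref))               # 0.0, despite being right
print(verifier_score("it equals 12", lambda s: "12" in s))  # 1.0
```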
I don't know a ton about amnesia, but I would think the facilities for changing their mind are still there.
E.g. ordering food, they might immediately change their mind after choosing something and correct their order.
I recognize they cannot form new memories but from what I understand they still would have a working memory, otherwise you'd be virtually unable to think and speak.
LLMs will change their minds today. Most major ones can change their minds on subsequent generations within the same context (“I’m sorry, my previous answer was incorrect,..”), and the biggest ones can change their mind mid-answer (mostly observed with GPT4).