I object to this paper's title. The idea of unlearning is already well known; the abstract itself mentions benchmarks and previous results. The paper introduces a new model that performs well, but that doesn't entitle it to be named after the whole field.
I wonder: is it the usual case that users' behavior or other data is used as input for ML projects without their consent? Do users always have to opt out, rather than the data being prepared in a way that doesn't disregard their privacy in the first place? Or, if testing for a relevant outcome invariably transgresses on users' privacy, I wonder whether this kind of ML work isn't a bit unethical as a whole.
In many cases, a company's terms of service include provisions that product usage data will be analyzed for research and experimentation purposes, which are often considered necessary for the health of the business (and thus can even meet GDPR requirements for this data capture and use).
For example, a company couldn't remain competitive, serve customers, or continue existing if it couldn't A/B test new features or changes, or look at descriptive statistics about which types of customers use which products. Building statistical models to answer these questions, or personalizing aspects of a product based on these data, is a routine matter of business operation. Rightly or wrongly, the terms of service are usually enough to allow the business to use data this way, and they often label it as critical to the operation of the business.
I don't see a problem with creating ML models to improve the same company's services, and I can't really imagine that requiring much explicit customer consent. If it's for the same company, clearance would go to the researchers the same way it went to the statisticians who did evaluations for companies in the past. And if it's just about optimizing your business, do you even need ML? People were asking questions of statistical data long before the current age of big-data statistical self-betrayal.
The way I see it, people are starting to build businesses around ML itself, basically ML as a service. The catch is that the ordering businesses' data, which is really their customers' data, ends up in a big mess of aggregated, weakly correlated data, from which they then try to derive the models that are supposed to make their money. At no point in that chain can I, as a customer of company A, be sure whether I'm being correctly or incorrectly correlated in those models. The need to delete me from these evaluations arises from my wish to protect not just my own individuality from Brazil-like misinterpretations, but also the companies asking the questions for their businesses.
I don't know about you, but to me this casts doubt on the utility of non-specific ML: an arbitrary interpretation of unspecific data that is as useless to me as it is to my competitors seems like jack shit, really. You want to solve a problem? Solve it by bringing the consumer and the producer closer together, which counts for any business out there, especially insurance and policy, and stop ramming another PC-driven layer of middle-management ML between them.
I usually have a lot of trouble figuring out which of the many algorithms being released will end up having a significant impact on a 5-year horizon. But about this one I have a rare hunch that it could be quite significant (at least the direction it's trying to push in).
But if the point of machine learning is to generalise a given dataset, wouldn't a particular pattern (the one one wishes to forget) be (unintentionally) relearned from other unrelated and/or similar patterns?
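For what it's worth, that's easy to see with a minimal synthetic sketch (the data and the choice of a scikit-learn logistic regression are purely illustrative, nothing from the paper): drop one record from training, and a model fit on the rest still predicts it, because plenty of similar records carry the same underlying pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic population: the label is driven by the first feature.
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0).astype(int)

# The record we would like the model to "forget".
x_forget, y_forget = X[0], y[0]

# Train without that record at all.
model = LogisticRegression().fit(X[1:], y[1:])

# The model still gets the forgotten record right, because the pattern it
# embodies is re-learned from the hundreds of similar remaining records.
print(model.predict(x_forget.reshape(1, -1))[0] == y_forget)
```

Which is exactly why "forgetting" a specific record and "forgetting" the pattern it contributed to are two different problems.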
Anonymising data is surprisingly difficult. I'm wondering if there's a "falsehoods programmers believe about ..." list for this, like there are for topics such as names, addresses, time, etc.?
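The classic illustration is the linkage attack: even with names stripped, a handful of quasi-identifiers is usually enough to re-identify people by joining against some public dataset. A toy sketch with made-up records (the column names and values here are invented for illustration):

```python
import pandas as pd

# "Anonymized" records: names removed, quasi-identifiers kept.
anonymized = pd.DataFrame({
    "zip": ["02139", "02139", "90210"],
    "birth_date": ["1965-07-31", "1972-01-15", "1980-03-02"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "asthma", "diabetes"],
})

# A public dataset (voter roll, social profile, etc.) with names attached.
public_roll = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Lee"],
    "zip": ["02139", "02139", "90210"],
    "birth_date": ["1965-07-31", "1972-01-15", "1980-03-02"],
    "sex": ["F", "M", "F"],
})

# One join on the quasi-identifiers re-attaches names to diagnoses.
reidentified = anonymized.merge(public_roll, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

Most of the entries on such a "falsehoods" list would probably boil down to underestimating how few attributes it takes to make a record unique.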
All human-produced data is inherently non-anonymous.
The only respectful mechanical way to use it is through information-theoretic guarantees, e.g. using it for zero-knowledge proofs and then burning the data.
> “Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it,”
Yikes, that's a really draconian, scare-tactic way to frame it. It's clearly meant to exacerbate misunderstandings of how statistical modeling actually works.
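To be concrete about what the sentence is technically claiming, here's a minimal synthetic sketch (an unpruned scikit-learn decision tree on made-up data, nothing to do with the paper's actual models): a high-capacity model can fit a label that only one specific training record supports, so information about that record is recoverable from the trained model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data where the label follows a simple rule...
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)

# ...except for one individual, whose label contradicts the rule.
y[0] = 1 - y[0]

# An unpruned tree fits the training data exactly, outliers included.
tree = DecisionTreeClassifier().fit(X, y)

# The contradictory label is reproduced verbatim: that single record has
# effectively been stored in the model.
print(tree.predict(X[:1])[0] == y[0])
```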