To be honest, this is more like "requesting the data to be deleted". There's nothing that guarantees that the personal information will be physically wiped out of the hard drives used to store them.
Sure, you can sue the hollowed-out shell of a bankrupt limited liability corporation that will soon have no assets for a court to seize for whatever paltry damages a court finds.
The current US presidential administration is in the process of completely hollowing out the US judicial system. Nothing in the courts can be trusted if the SCOTUS can't be trusted.
But even with sane leadership, there is nothing preventing the data from being leaked or sold.
Unless you can physically verify the shred-it guys destroying the disks, it's out there.
> > this is more like "requesting the data to be deleted".
> Ideally the court can compel the purchaser(s) to destroy the data.
I suppose a court order holds more weight than checking a box on a web site, but I'm not sure how much I'd trust the eventual purchaser.
Worse yet, if the seller drops their guard (due to lack of funding for proper security), will someone steal the data? At that point, web requests and court orders are moot.
I guess I'm seeing a black hat behind every bush. I'm not sure if this kind of data has any value on the black market.
Good point. There is no way to confirm the data was actually deleted beyond making the request. And there is nobody to go after: the new owner does not have to respect the request unless a court orders them to. Even then, going after the new owner appears to be time critical. Can users stop the sale or transfer of data before any court could block it?
The correct approach would have been for 23andMe to offer an opt-out blocking the sale or transfer of their data, along with confirmation of deletion if requested. Since none of that existed, the options for users are quite limited. In fact, many participated in 23andMe Research, so their data was likely already flowing elsewhere long before.
Of course. And I'm not saying they would do it out of malice.
All I'm suggesting is that tapping some pixels on your backlit rectangular glass won't necessarily translate into pulses of electrons that'll eradicate the 0s and 1s representing your data.
I'm sure that corner of the codebase is one of the least visited parts, so bugs or misconfigurations may lurk there.
How is your lawyer going to prove the data was not deleted?
And what damages are you going to claim in court?
Lawyers are not cheap; no lawyer will take a case for less than even $1k. My only hope is donating to privacy-fighting organizations like the EFF that file class actions.
Also, what prevents new owners from restoring from backups because "we were hacked" or any other reason for retrieving backup data for something that is currently "deleted"?
For those curious about what data they actually record: they use the Infinium Global Screening Array, which captures about 650-750k SNPs (single-nucleotide polymorphisms).
Obviously this data can be used to infer heritage, disease risk, and familial relations. It could be used for discrimination, surveillance, and potentially poisoning.
Everyone should request their data to be deleted, but this is an engineering forum, and we know what that means in practice. Every company like this has hundreds of copies of the data, and has shared it with dozens of providers.
Like Tevye said, you can't put the feathers back into an opened pillow.
Their array of SNPs in ASCII letters is under 10MB compressed, probably well under that using a specialized SNP format/compression algorithm. Less than a complex Microsoft Office file.
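That size claim is easy to sanity-check. Here is a rough sketch that builds a synthetic export at roughly the 23andMe scale and gzips it; the line format, rsid numbering, and SNP count are assumptions for illustration, not their actual file layout:

```python
import gzip
import random

random.seed(0)

# Synthetic stand-in for a raw SNP-array export:
# one tab-separated line per SNP (rsid, chromosome, position, genotype).
n_snps = 650_000
genotypes = ["AA", "AG", "GG", "CT", "CC", "TT", "--"]
lines = (
    f"rs{1000000 + i}\t{random.randint(1, 22)}\t"
    f"{random.randint(1, 2_500_000)}\t{random.choice(genotypes)}\n"
    for i in range(n_snps)
)
raw = "".join(lines).encode("ascii")

compressed = gzip.compress(raw, compresslevel=9)
print(f"raw: {len(raw) / 1e6:.1f} MB, gzipped: {len(compressed) / 1e6:.1f} MB")
```

Even with plain gzip and random positions, the result lands in the single-digit-MB range; a purpose-built SNP encoding (two bits per genotype plus a shared site list) would be far smaller still.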
Yeah, I can imagine they have a few dozen copies strewn over various backup media/blob buckets. There probably isn't much effort from what's left of their IT team to track them all down to delete.
It’s not about the size. Every task will have made a copy and a derivative. I doubt the company ever cared to build a dependency tree for removal, and it is certainly not managing copies given to partners.
Now that the company is bankrupt, this is the last thing on their task list.
Poisoning, really? All these risks are pretty sci-fi. On the other hand, it's pretty easy to harm someone without bothering to analyze the SNP data, if that is your intention.
This feels as hopeless as trying to keep your email/contacts from social media sites. Even if you are vigilant about never allowing an app/service to download your contacts, your friends will share theirs and it is trivial to recreate your contact list. If I keep my DNA from these companies, my relatives will share theirs and they basically have my DNA.
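The reconstruction described above is trivial to sketch: even if you never upload your own contacts, the union of what your friends upload rebuilds your circle. (The names and uploads here are invented for illustration; the same logic applies to relatives' DNA.)

```python
# Each friend uploads their own address book; "you" never upload anything.
uploads = {
    "alice": {"you", "bob", "carol"},
    "bob": {"you", "alice", "dave"},
    "carol": {"alice", "eve"},
}

# The service can still infer your contact list: anyone whose upload
# contains "you" is almost certainly in your circle.
inferred_contacts = {who for who, book in uploads.items() if "you" in book}
print(sorted(inferred_contacts))  # → ['alice', 'bob']
```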
The distinction isn't super important, but 23andMe doesn't have your whole genome, just some specific locations from it: roughly 750k positions.
The Genetic Information Nondiscrimination Act makes it illegal to adjust health (but not life) insurance premiums or discriminate for employment based on genetic information. Couples who do genetic testing before having kids have the same protections and they're very effective.
To the best of my knowledge, use of genetic data is illegal in the USA and several other countries. It has been operationally banned (self-imposed) by the life insurance industries in the UK and Australia. This was a hot topic in the late 1990s. Here we are 25 years later with few if any known abuses by the life insurance industry. They have MUCH bigger fish to fry: Do you smoke? What are your income, age, and sex, and perhaps your blood pressure and blood chemistry? Each of those is worth 10X your genotype.
And let me flip this situation: are there any laws that prevent advertisers from looking at genetic data to target cohorts? If I were an unethical advertiser, I'd want to advertise to customers with less risk aversion, higher neuroticism, higher sense of FOMO. You could do some truly sickening stuff. Target higher mortality groups, certain personality types, cross reference with familial mortality data and have a field day...
There are untold ways this could be abused that I'm almost certain the law doesn't fully protect against.
I don't think your reply is arguing in good faith (unless the parent was edited), since you basically ignored the implied examples in that comment and made 3 strawman points.
I'm not sure if there are any current laws that prevent this, but there are quite a few laws that would prevent an advertising company from getting this information. Like most things, if we're going by a company that doesn't work within the law, we're already going to lose.
What's to stop someone within an advertisement company from reaching out to someone in healthcare IT, and offering a large amount of money for this information? Trying to link this physical data to an online presence is probably not worth the risk and amount of money and time (at this current point in time).
All my searching currently shows that there are only laws to protect against using genetics in employment and insurance, within the US. It doesn't look like there are any other protections in the US, other than unrelated laws like HIPAA compliance. I wouldn't even try to pretend to be able to figure out other countries' laws around this (and probably don't understand US law any further than not being able to find information easily available with search tools).
> What's to stop someone within an advertisement company from reaching out to someone in healthcare IT, and offering a large amount of money for this information? Trying to link this physical data to an online presence is probably not worth the risk and amount of money and time (at this current point in time).
HIPAA works because it comes with personal liability. Anyone who sells/leaks/loses HIPAA data gets hit with a $1000 or so fine per person. So if you sell 100 patients' data, you're personally on the hook for $100,000. Your employer pays another cool $10,000/person on top.
More of these laws should come with personal liability. HIPAA is the only one I've ever seen people take seriously.
HIPAA isn’t really a personal privacy regulation at all ...
Like other privacy regulation, it’s there to protect the industry and their business/commercial interests.
Barriers to access mean less controversy, fewer lawsuits, fewer investigative news stories, fewer insurance disputes.
I’d say it’s also designed to reduce contamination or adulteration of data: if every facility needs to do new testing and new evaluation then they can be sure they got the results they need, instead of taking some rando’s word for it.
HIPAA isn’t the most onerous barrier to personal access to records, but it’s a huge hassle for someone who wants it opened up for family, friends, and other entities because those forms are onerous. With good transparency in patient portals, authorized users can manage a lot on their own.
Also, good luck reading anything but textual notes, because imaging and other medical data is often distributed in proprietary file formats that don’t simply import into Gimp!
> HIPAA isn’t really a personal privacy regulation at all ...
HIPAA as a whole is not.
The HIPAA Privacy and Security Rules, which are enforced by a different entity than the rest of HIPAA, are. (The bulk of HIPAA is insurance administration rules enforced by the Centers for Medicare and Medicaid Services; the Privacy and Security Rules are personal privacy and information security rules enforced by the HHS Office for Civil Rights.)
HIPAA is a stepping-stone to single-payer and socialized medicine.
I once joined a health sharing ministry where reviews said "it requires an Olympic-class athlete in paperwork and bureaucracy". Being "not insurance" it was completely DIY and "self-pay" and begging for reimbursements after the fact.
I've also attempted to visit independent PCPs. An independent PCP who isn't part of a major health system, when they refer you out, refers you to some other independent specialist with their own process, their own IT tooling and portal, and their own claims/billing services. Now multiply those specialists by the number of your conditions, or simply the multiplicity of organs in your body, and all the fiefdoms commanded by different medical boards.
I sincerely pity any sane family of 4 or 5, because speaking for myself as an insane family of 1, the process is mind-blowing, byzantine, and frustrating by design, and the gatekeeping is exhausting but, obviously, necessary. Dealing with doctors arguably did not drive me insane, but it certainly helps keep me that way.
Gatekeeping doesn't end with single-payer and socialization, but all this back-and-forth and multiple independent systems should ideally be coalesced into one monolithic Brazil/12 Monkeys sized system.
I pity parents with sick children the most, I suppose. I mean, it's bad enough for elderly parents and adult children to handle when they don't love their parents enough. But for parents to care for a sick child enough to funnel them into endless medical appointments, drugs, invasive therapies and even experimental Herr Mengele shit because it's cheap or free, feels like cruelty and exploitation being visited on that family, rather than mercy or healing. I came across the Karen Ann Quinlan case (I suppose I was too young to remember when it hit the news before Terri Schiavo) and found her parents' attitude and comments quite poignant. It's called a "right-to-die" milestone, but I consider that the parents advocated for her right to be free from pain and distress associated with unnecessary medical treatment.
HIPAA is a fuckin' bugaboo when you're trying to coordinate care among payors, providers, billers, HIMS admins, family and friends, because all of these parties I mention are compartmentalized and the compartmentalization is nearly as fierce as military/espionage systems, except there's usually not a guy sitting next to the curtain wielding a semiautomatic rifle.
Sure they do, but it's very hard to market illegal activities in a b2b context. How does 23andMe go about selling data allowing insurers to discriminate, without saying that's what they do?
I can think of a few things you could try, maybe. But judging from 23andMe's present state, it's clear that whatever they tried in order to monetize customers' DNA info didn't work out well for them.
> have job offers rescinded, or be targeted by scams
Can you expand on this?
I understand the insurance thing due to genetic diseases and so on, but which jobs would I be denied for based on genetic information which wouldn’t be checked anyways?
I can only come up with stuff like colorblindness but that would probably be checked anyways if it were a strict requirement for the job so keeping the DNA secret wouldn’t help.
I see most comments concentrated on employment. For a scam, think of someone that has been told they have a specific genetic disease, and that information is available in their DNA "data". As a scammer, I can start to send you information about alternative health treatments specific to your disease, that have no scientific backing to them. Since I'm a scammer, I can write anything I want to, like stating that the information is backed by FDA approval and even put statements like that in the fine print to build up my credibility. You could also try and sell fake services that wipe your released DNA information from databases online. There's a lot of potential for scams if you can link what people think is private (DNA), and their email/personal information.
When I was younger, I read a lot of ethics course material, and spent a lot of time thinking about how someone could get around existing laws or technology, and most of it boils down to most people believing what they're told with a bit of coaxing (building that credibility; social engineering). Luckily, I never went ahead using this information, and have actually turned down projects where my morals were put into question, but I think it prepared me to be more conscious of scams and shady advertising. I work for a digital advertising agency, and use an adblocker during my development work so I can see how a site is useful or mostly worthless when someone turns ad networks/tracking off. One of the benefits of working for a smaller company.
Why not do all of this without the data? It saves the scammer a lot of money up front. Scammers are pros at making up plausible stories. And yet here we are, 15 years into 23andMe: have you ever heard of a genetics scam? I have not.
Well, I have, actually. There are heir-hunter scams. You're contacted by someone claiming to be from a lost heir hunting company. These companies claim to track down the closest relatives to people who died without known heirs, in return they get a share of the inheritance. So you can't bypass them, they won't tell you who you're supposedly the heir of. They promise they're not asking for money, only a share... until they do ask for money, of course.
This scam doesn't use your actual DNA data though, just the fact that you have a profile on a DNA site.
>>> And what’s the scam angle when the DNA is known?
A person with apparent authority, telling people something about themselves, that they believed to be hidden, is a tactic for gaining psychological control. A strong-minded person should be able to withstand it under normal circumstances, but we're not all strong-minded under all circumstances. Hence the power of things like personality tests, police interrogations, and so forth.
This would be wholly illegal, but companies could screen candidates prior to extending offers to them. After they get your primary details and history, they can look you up in the gene database. They could look for a whole host of genetic markers, including but not limited to:
- Markers like ADHD and other neurodivergence and performance signals
- Disease likelihoods to reduce their insurance burden. Cardiovascular, cancer, neurodegeneration, etc.
- Markers for intelligence and tenacity. Personality type. Conversely, dishonesty, neuroticism, etc.
They could screen for literally any hypothetical condition that could in theory impact performance, risk, cost, etc. By excluding candidates with "low genetic scores", they might think they're saving margin.
There is a ton of literature beyond what 23andMe is legally allowed to report on with respect to the SNP data they collect. These studies report on a wide range of phenotypical states and behaviors that could impact job performance. The stack of research is deep.
> And what’s the scam angle when the DNA is known?
Look for any markers that indicate IQ, agreeableness, neurodegeneration, schizophrenia, personality type, etc. It gives scammers a hypothetically better hit rate.
And again, they don't need your DNA to do this. Just a relative's.
> 1.3x to 11.5x Increased risk of autoimmune thyroid disease
> 1.3x higher risk of ER+ breast cancer
> 2 - 3x higher prostate cancer risk if routinely exposed to the pesticide fonofos
> 1.5x - 2x increased risk for cervical cancer, HNSCC, and breast cancer
> 2x risk of Alzheimer's disease
> Lack of empathy? You have a SNP in the oxytocin receptor which may make you less empathetic than other people.
> Increased risk of Multiple Sclerosis.
> HLA-DRB1*1501 carrier; higher multiple sclerosis risk. Rs3135391(C;T) is highly correlated with the HLA-DRB1*1501 allele. There is a 3x higher risk of multiple sclerosis associated with the (C;T) genotype.
> 1.4x increased risk of lupus (systemic lupus erythematosus).
(And on and on...)
This is stuff that 23andme can't legally show you, and many of the studies are small and inconclusive. But many of the disease markers are noteworthy.
None of those are particularly useful on an individual level. E.g.:
- 1.42x risk of Autism
Okay, great, the population incidence is about 1 in 36, so 1.42x risk is about 1 in 25. What possible actionable use is this? It's not even particularly useful input to "should I follow up with some kind of actual assessment".
But even that and the other not-particularly-useful numeric risk multipliers are better than:
- You have a SNP in the oxytocin receptor which may make you less empathetic than other people.
At this level of specificity, you may as well be consulting a magic 8-ball.
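The arithmetic in the autism example above is easy to verify; the baseline incidence and multiplier are just the figures quoted in the comment:

```python
baseline = 1 / 36        # quoted population incidence of autism
relative_risk = 1.42     # quoted risk multiplier for the marker
absolute = baseline * relative_risk

# 36 / 1.42 ≈ 25.4, i.e. "about 1 in 25"
print(f"1 in {1 / absolute:.1f}")  # → 1 in 25.4
```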
Yes, those are tiny relative risk scores for large, diverse (messy) POPULATIONS. They are absolutely NOT individual predictions. Even the most sophisticated polygenic risk scores are jokes for most traits, particularly psychosocial traits.
You want actionable information? A 30-minute interview.
A 15-minute interview will give them 100X more data than a VCF file or even a 30X whole genome. The list of traits you enumerated is definitely not well predicted by a VCF file.
Question: How are they going to link the DNA to people?
Some will be easier than others, sure. I'm trying to decide how "safe" my data is, since I created a single-use gmail account, with fictitious name, and paid for it with a gift card. I was afraid that some information in there might lead to being uninsurable, so I decided to row away from the rocks. Thankfully, my genetics didn't pop up any red flags, knock on wood.
I guess if you signed up using your normal e-mail address and your real name and used your credit card, you can still take the Shaggy defense ("It wasn't me"), but I suppose at that point they could ask you to prove it. I mean, most businesses aren't obligated to do business with you, for any or no reason at all.
23andMe does not operate as a laboratory itself but contracts with U.S.-based labs that are certified under CLIA and accredited by the College of American Pathologists (CAP). According to their website, all saliva samples are processed in CLIA-certified and CAP-accredited labs, ensuring compliance with federal standards for accuracy and reliability. This certification is crucial, as it aligns with FDA requirements for certain health-related genetic tests. This distinction is significant, as CLIA primarily regulates labs, not the companies that contract them, potentially affecting the applicability of retention requirements to 23andMe’s broader operations.
CLIA’s record retention requirements, per Section 493.1105, state that labs must retain test requisitions, authorizations, and reports for at least 2 years, with longer periods for specific materials (e.g., 10 years for pathology slides).
CLIA Laboratory Record Retention Requirements:
- Test requisitions and authorizations: 2 years minimum.
- Test reports: 2 years minimum, 10 years for pathology reports.
- Cytology slide preparations: 5 years.
- Histopathology slides: 10 years.
- Pathology specimen blocks: 2 years.
- Tissue: Until diagnosis is made.
Notably, these requirements focus on test-related records, such as requisitions (which may include patient details like date of birth and sex) and reports (which for genetic tests would include interpreted results). However, there is no explicit mention of retaining raw genetic data, such as the full genotype data, in the CLIA regulations. This raises questions about whether 23andMe’s assertion to retain raw genetic information is strictly required by CLIA or if it extends beyond the regulation for other reasons, such as research or quality control.
Here's a great post by a lawyer, linked to further down in that thread: https://bourniquelaw.com/2024/10/09/data-23-and-me/ It suggests a way to challenge them on their assertions that they must keep your data and samples.
I'm sorry but this lawyer has absolutely no idea what he is talking about with regards to CLIA compliance. And he even admits as much, but keeps talking anyway.
CLIA is one of the excuses 23andMe uses to explain why they retain your genetic information, date of birth, and sex. The author cites the code sections he believes 23andMe are referencing to make this claim, then explains why he believes it doesn't apply. As a CLIA expert, do you mind explaining what he's getting wrong for our benefit?
The data has already been sold off to the real customers (i.e., not you and me) [1]. You can (and should) request a deletion, but the damage has already been done.
This is false, we've sold data with PII to no one. Or it is misleading: the page you linked to even says, "It is selling de-identified, aggregate data for research, if you give them consent."
To what extent, and using what method, is it "de-identified"? Plenty of such schemes are very easy to circumvent, especially with a large enough pool of data. Given the nature of genetics in particular, positively identifying a single case can be used to unmask whole families. And depending on the anonymization, this is a task very well suited to 'AI'.
Basically, if you imagine this as a table of "user's name, date of birth, and address" keys mapping to genomic and other data, the key was replaced with a random identifier that could not be trivially joined to recover the user name, date of birth, and address.
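A minimal sketch of that key-replacement scheme, with all names and fields invented for illustration (the real pipeline is surely more involved):

```python
import uuid

# "Identified" table: PII key -> genomic payload.
identified = {
    ("Ada Lovelace", "1815-12-10", "12 St James Sq"): {"rs53576": "AG"},
    ("Alan Turing", "1912-06-23", "2 Adlington Rd"): {"rs1815739": "CC"},
}

link_table = {}      # PII key -> token; kept separately, under access control
deidentified = {}    # token -> genomic data; this is what gets shared

for pii_key, genome in identified.items():
    token = str(uuid.uuid4())   # random identifier replaces the PII key
    link_table[pii_key] = token
    deidentified[token] = genome

# The shared dataset contains no names, DOBs, or addresses, but anyone
# holding the link table (or an identified genome to match against)
# can still re-join it.
print(len(deidentified), "de-identified records")
```

The sketch shows why "de-identified" is not the same as "anonymous": the genotype itself remains a stable, uniquely distinguishing key.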
These systems are not robust against motivated and capitalized adversaries.
I can go to a data broker and purchase access to de-identified EMR data for most of the U.S. population. There are much more useful de-identified datasets around than ours, if someone is motivated to try to re-identify those datasets. That data is all bought and sold without anyone's consent and this is all fine under HIPAA.
I wasn't trying to convince anybody otherwise. I think the noise about 23&Me's data is pretty uninteresting. I published my own genome (through PGP) for anybody to download, and I know that people have identified me from my post https://news.ycombinator.com/item?id=7641201 and other comments.
That's more or less what I expected. Ah well, the odds that this becomes something of significance to most people seems remote, but either way you can't unring the bell.
Here "de-identified" means stripped of PII (name, address, phone number, email, etc). You are correct that genetic information is intrinsically identifiABLE (in the sense that it is stable and uniquely distinguishing for individuals). When we've shared individual-level data with a partner, it was with consent of the participants involved, and under a contract that prohibits re-identification.
I would not argue with you on that it is "selling your data". But I also think there are meaningful differences in harm levels for different kinds of "selling your data", and fully identified data has more potential harms than de-identified data where you have to assume that an adversary is willing to violate contracts and/or the law to learn about particular individuals.
There is considerable confusion about the distinction between aggregated data and de-identified individual-level data. I would say that I don't consider sufficiently aggregated data to be "your data" in a particularly meaningful personal sense of "your", even though there are still some re-identification risks from these types of datasets.
I was contesting the statement that "The data has already been sold... [and] the damage is already done" which I still think is highly misleading.
I don't think 23andme has been casual or callous with people's data; they are probably a step above the average firm that handles this sort of data. The consent process is well-documented.
My complaint about 23&Me has always been Anne Wojcicki's naivete about the utility of genomic data for health treatment, as well as whether her company needed to work with the government (she wrote a useful retrospective that helped shed light: https://hbr.org/2020/09/23andmes-ceo-on-the-struggle-to-get-...).
Most of us who worked in genomics at the time were sort of dumbfounded by her approach and wanted to know what magic she had that let her get as far as she did with the company and its product.
I don't have any problem with the family history side of the product; that's how my dad found out that he had a number of unexpected children (IVF through donated sperm) who were able to connect with him years after their conception. And I really wish disease genetics had turned out to be far more straightforward as I've long been fascinated with how complex phenotypes arise from genomes.
The top-of-thread linked to an article that was specifically about aggregated data sharing, not individual-level data sharing. The consent document you're linking to did not exist when that article was written; our general research consent only covers aggregated data sharing. It was only in 2018 that we added the second-level consent for individual-level data sharing.
I think we've generally been pretty careful to present only scientifically well supported results, which has not helped the perceived utility of our health product. There are certainly valid arguments to be had about the business model.
Indeed, but here "re-identification" generally means the sort of attack where you have an aggregated genomic dataset, and you already have access to full genomic data for a target individual, and you use the genomic dataset to infer something about that target that you didn't know, like whether or not they participated in that study. Not to entirely minimize this sort of attack, but the NIH decided it was a sufficiently low risk that most of the sorts of datasets it applies to (like GWAS) are routinely shared with no access controls.
Are you asking about methods to improve privacy of aggregated datasets? They seem to be not super popular with people in the field, I think because they sharply curtail how data can be used compared to having access to datasets with no strong privacy guarantees. I think the more impactful recent shift is toward "trusted research environments", where you get to work with a particular dataset only in a controlled setting with actively monitored egress.
Homomorphic encryption enables standard GWAS workflows (not just summary stats) while “sharing” all genotypes and phenotypes. Richard Mott and colleagues have a paper on this method.
I know I'm in the vast minority here, but I honestly don't really care what is done with my DNA data as long as it's not used against me for healthcare & insurance purposes (which I believe is already illegal.) If someone wants to use it to make new drugs, research, etc, I just don't care.
I'm sure there’s a very real possibility that some shady data brokers will ultimately gain access to this treasure trove of data before long, and I'm sure that in our capitalistic world healthcare and insurance companies would happily pay for access.
In the USA. In Europe and such, I'm not saying the chance is zero, but it's extremely unlikely, as Europeans don't rely on private insurance for healthcare; we use the government for that :)
(The exception being if you want private insurance because the government-provided coverage isn't enough.)
I never made an account there or uploaded anything DNA related but what happens if a relative did? Is there the concept of a "ghost" account that gets filled in for people who didn't sign up yet but is likely related? Can this be deleted without making an account?
Yeah, they teamed up with the Mormons mapping out the entire human family tree and have been able to predict every possible child DNA sequence in order to eventually create Paul Atreides.
Oh, the scams I have to show you, such sights to behold:
"Dear Sir/Madam,
while reading a recently acquired db, i came across your brother-in-law, who has disease xy. Now, me being a decent person, i kept things quiet for now. No need to rattle potential love interests, your children, or the community with news about this genetic curse. If you want this silence to last, just subscribe by donating 0.01 bitcoin per year to the Mammal & Animal Foundation for Integrity Agency."
Has anyone tried to export their ancestry data? I've noticed the PDF summary lists broad regions, but the data in the app shows more detail (e.g. specific counties within countries). Does anyone know how to export that data? I'll just take some screenshots, but maybe this info is somewhere else in the export.
Is anyone having trouble downloading their data? Some data, such as "Raw Data", "Imputed Genotype Data R6", and "Phased Genotype Data" need to be requested, but I haven't heard back from them yet.
Would like to download everything first before requesting deletion.
If I had to guess, they are really behind right now. Everyone I know is doing this. I checked my email and, thank goodness, I had emailed mine to myself in 2018. Told them to destroy my sample today and delete my data (don’t know if it will be done, but at least I did it).
Not sure if others are having this problem, but my login no longer works, and password-reset emails never arrive (tried three times on three days without success).
I just can't understand why anyone in their right mind, who is even remotely concerned about the privacy of their personal information, would upload their own DNA to a business. An internet-based one at that. Lemmings, I tell ya.
Hah, yes. I used to work at a company that sold a popular DNA product. I built part of the GDPR data deletion pipeline. During my last week there, I submitted a request to delete my data from the systems. The final integration test!