
>It is selling de-identified, aggregate data

Just a note that re-identifying aggregate data is a whole field of study that is decently successful.




Indeed, but here "re-identification" generally means the sort of attack where you have an aggregated genomic dataset, and you already have access to full genomic data for a target individual, and you use the genomic dataset to infer something about that target that you didn't know, like whether or not they participated in that study. Not to entirely minimize this sort of attack, but the NIH decided it was a sufficiently low risk that most of the sorts of datasets it applies to (like GWAS) are routinely shared with no access controls.
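To make the attack described above concrete, here is a minimal sketch of a membership test in the spirit of the distance statistic from Homer et al. (2008). It is an illustration under assumptions, not anyone's production method: the function name `membership_score`, the toy frequencies, and the two example genotypes are all made up for this example. The idea is that for each SNP you ask whether the target's allele dosage sits closer to the study's published frequency or to a reference population's frequency, then sum over SNPs. A clearly positive sum hints that the target was in the study. Real attacks use many thousands of SNPs and a proper significance test.

```python
from typing import Sequence


def membership_score(
    target: Sequence[float],
    study_freq: Sequence[float],
    reference_freq: Sequence[float],
) -> float:
    """Sum over SNPs of |y - reference| - |y - study|.

    target:         the target's allele dosage per SNP, scaled to 0.0, 0.5 or 1.0
    study_freq:     allele frequencies published for the study cohort
    reference_freq: allele frequencies from an unrelated reference population

    Positive values mean the target looks more like the study cohort
    than like the reference population.
    """
    if not (len(target) == len(study_freq) == len(reference_freq)):
        raise ValueError("all inputs must cover the same SNPs")
    return sum(
        abs(y - ref) - abs(y - study)
        for y, study, ref in zip(target, study_freq, reference_freq)
    )


# Hypothetical toy data: three SNPs only, chosen by hand for illustration.
study_freq = [0.1, 0.9, 0.5]
reference_freq = [0.3, 0.6, 0.3]

participant_like = [0.0, 1.0, 0.5]  # tracks the study frequencies
outsider_like = [1.0, 0.0, 0.0]     # does not track the study frequencies

print(membership_score(participant_like, study_freq, reference_freq))
print(membership_score(outsider_like, study_freq, reference_freq))
```

With this toy input the first score is positive and the second is negative. Note that the attacker needs the target's genotype up front, which is why this is usually treated as a lower risk than recovering genotypes from scratch.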


“In principle,” but in 20 years do we have any cautionary tales? I don’t know of any, and I follow this area somewhat. Homomorphic encryption is pretty hard to crack.


Are you asking about methods to improve the privacy of aggregated datasets? They seem not to be very popular with people in the field, I think because they sharply curtail how data can be used compared to having access to datasets with no strong privacy guarantees. I think the more impactful recent shift is toward "trusted research environments", where you get to work with a particular dataset only in a controlled setting with actively monitored egress.


Yes, that is the new UK Biobank approach.

Homomorphic encryption enables standard GWAS workflows (not just summary stats) while “sharing” all genotypes and phenotypes. Richard Mott and colleagues have a paper on this method:

https://pubmed.ncbi.nlm.nih.gov/32327562/



