Anonymouth – Document Anonymization Tool (github.com/psal)
69 points by q-_-p on Aug 20, 2016 | 10 comments



Someone should make something like this that could be given an arbitrary file and simply removes metadata from it. For example, remove GPS and camera data from pics or author license key info from Word docs and PDFs.


exiftool [1] can remove all metadata (the -all= option, e.g. exiftool -all= file.jpg) from a whole lot of file types

[1] http://www.sno.phy.queensu.ca/%7Ephil/exiftool/, seems to be down currently though

https://web.archive.org/web/20160816014908/http://www.sno.ph...


I have found rm -f to work great for that! /s

Wouldn't such a tool have to be based on easily outdated or broken blacklists of parts of the file? Complex file formats like DOC can leak data in an immense number of ways.
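
To make the blacklist concern concrete, here is a minimal sketch of the "drop the known-bad parts" approach for one format, JPEG. It assumes a plain JPEG whose header segments all carry a length field. The function name strip_jpeg_segments and the DROP set are made up for illustration; this is not how exiftool or Anonymouth work internally.

```python
import struct

# Marker bytes to drop: APP1..APP15 (EXIF, XMP, ICC, vendor blocks) and COM.
# APP0 (the JFIF header) is kept. This set is the "blacklist": an assumption
# of this sketch, not a complete inventory of where metadata can hide.
DROP = set(range(0xE1, 0xF0)) | {0xFE}


def strip_jpeg_segments(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without the blacklisted segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2 or pos + 2 + length > len(data):
            raise ValueError("corrupt segment length at offset %d" % pos)
        if marker not in DROP:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Anything not on the list survives untouched (data appended after the image, for example), and dropping colour-related segments such as an ICC profile can change how the picture renders. That is the fragility being described, and formats like DOC have far more places to look.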


rm only removes OS "metadata" -- it removes the directory entry that links a name to the file. The file's data is not touched.

Here's a stupid but effective way I came up with to delete the data in files when they are too large to be stored in RAM:

Use dd on the device where the file is stored.

Files intended for eventual deletion can be stored on their own dedicated virtual block devices, or "file-backed virtual disks".

Unless things have changed, on OpenBSD these virtual block devices can be created from /dev/vnd.

To create a location to store the file(s) at, create an empty "backing" file with dd, associate it with a vnd, newfs the vnd and mount it.

To delete all the files on the mounted vnd, either umount and then run dd if=/dev/zero of=/dev/vnd{no}d, or run dd if=/dev/zero of=rvnd{no}d and then umount.

One can also configure a cryptographic disk device over the vnd using a random throwaway password.
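
For comparison, a file-level version of the same idea can be sketched in a few lines of Python. It is a different and weaker technique than zeroing the whole device as described above: it overwrites one file's existing bytes in fixed-size chunks, so the file never has to fit in RAM. The name zero_fill and the 1 MiB default chunk size are assumptions made for illustration.

```python
import os


def zero_fill(path: str, chunk_size: int = 1 << 20) -> int:
    """Overwrite a file's existing bytes with zeros, one chunk at a time.

    Returns the number of bytes overwritten. The file is not deleted.
    """
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)
    # "r+b" opens for in-place update without truncating the file first.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        # Push the zeros out of Python's buffer and ask the OS to write them.
        f.flush()
        os.fsync(f.fileno())
    return size
```

One would call zero_fill(path) and then os.remove(path). On copy-on-write or journaling filesystems, and on SSDs with wear leveling, the old blocks may still survive an in-place overwrite, which is the argument for putting such files on their own block device and wiping all of it.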



Very cool! Thanks for sharing.


Here's one (of mine) that removes IP addresses from academic papers:

https://github.com/kanzure/pdfparanoia


Last change is 3 years old. That means it is quite dead. Also, so many libraries but no Maven :(


It's so far an academic tool created by a research lab and not used much in the wild [?]. So it needs more funding, or someone to adopt it for regular real-world usage:

Some related things are discussed here: https://github.com/psal/anonymouth/issues/6


I tried to get this working a few weeks ago and the outcome was rather sad. See https://news.ycombinator.com/item?id=12194720

Would definitely love to see progress be made on this or a similar tool!



