Internet Archive: Security breach alert

Springtime · 2024-10-10T03:48:47 1728532127

Just in terms of privacy, it's worth noting that anyone who has uploaded something on IA already has their email address publicly viewable.

This isn't something that's commonly known (even judging by comments here), but the publicly viewable metadata of every upload contains the uploader's IA account email address. So from a security perspective it's bad but from a privacy perspective a lot of users probably weren't aware of this detail if they've uploaded anything.

hunter2_ · 2024-10-10T04:04:39 1728533079

This raises an interesting question: should email addresses be private? Addresses of buildings aren't private, and they're somewhat analogous as with many computing concepts. (Aside: Before spam filters were quite good, it was typical to avoid scraping of addresses by mild obfuscation, but I think those days are gone, and this is distinct from privacy anyway.)

If someone wants to upload and never be found out, then they need to use a throwaway address in any case, lest they be providing their "private" address to the administrators of the service without explicitly forbidding further disclosure. If I say something to Alice without demanding that Alice keep it from Bob, then I implicitly don't mind if Alice tells Bob what I said.

tjoff · 2024-10-10T05:39:39 1728538779

Whether the email is considered private or not is completely orthogonal to whether you are allowed / should tie an action to your email. And then again completely orthogonal whether you can/should make that connection public.

Even if your email is public information and even if what is uploaded is public information that doesn't imply that the email address behind the account that uploaded that information should be public.

nerdponx · 2024-10-10T15:20:28 1728573628

The same exact thing goes for physical addresses too. The fact that I live at my address is public knowledge. But the presence of my address in any particular database, mailing list, etc. is not and should not be public knowledge.

stefs · 2024-10-10T10:27:02 1728556022

i agree. if "user contacting another user" is a feature, there should be the option to (optionally) supply a different email address than your account email or use an online form that keeps your account email hidden.

emidoots · 2024-10-10T04:42:19 1728535339

There is software which is intended to e.g. locate the GitHub profiles of people working at companies, then scrape all public repositories they've contributed to for their email address and the emails of their coworkers - to enable targeted advertising to those individuals. Very common in enterprise sales.

With ChatGPT, this can be extended to create emails that look very personal - as if someone has followed all of your work and is genuinely interested in what you are up to - with extremely low effort. And people are already doing this, I already get emails like this today.

Should emails be private? I don't know - I personally consider them to be public because I know for a fact mine will eventually be public whether I like it or not. But I am aware AI is out there slurping up every public communication I've ever had, and is likely trying to manipulate me in various ways already today.

benterix · 2024-10-10T10:48:06 1728557286

This was a problem already before the generative AI era, it just got less expensive. The only way to reduce it is to have two work addresses: one that you rarely check and is exposed to the public, listed on your profile etc., and the real internal one just to get the work done.

pixl97 · 2024-10-10T13:55:41 1728568541

>it just got less expensive

Quantity is a quality. Add that the AI can profile you and do a decent job spear phishing and you're talking about a sea change.

>and the real internal one

“Three can keep a secret, if two of them are dead.”

There is no such thing as an 'internal' email you communicate to other people outside your company with. It's just an email address. Someone at some point will leak it by accident or malice.

benterix · 2024-10-10T18:42:11 1728585731

> There is no such thing as an 'internal' email you communicate to other people outside your company with. It's just an email address. Someone at some point will leak it by accident or malice.

Sure, so personally I never use it to communicate with people outside. Also, I make sure it's never used to register with external licenses like Docker Desktop etc. as they subscribe me to their spamlist and send the usual semi-personalized messages - but as far as I can tell most of these bigger companies don't sell them outside (for a good reason). Startups, however, will do what they want and will make sure to squeeze the last drop from the info that such-and-such person works at that company and does X.

Roark66 · 2024-10-10T06:00:05 1728540005

About AI slurping all information. I bet one of the first ideas organisations that spy on the population had when the recent AI boom happened was: How about we just train our AI on all the intercepted data and just ask it? Is John Smith a terrorist (for our definition of terrorist)? And the AI would reply: Yes he is, he searched on Google where to buy these ingredients that can be used to make explosives. So then they go and figure out some way to "legally" arrest the guy and obtain more private info. It looks like the guy was buying the stuff because he's got a plot of land to fertilise and an old car to paint. So they ask the AI again. You said John Smith is a terrorist! And the AI would answer: I'm really sorry, I'm doing my best and I'll endeavour to do better in future. After this the agents ask for another billion $ because clearly they need more VRAM.

boscovn · 2024-10-10T15:21:53 1728573713

Personally I've been using an email aliasing service (simplelogin) and try to use a different alias for every purpose. I don't use it for my git commits but I find that email aliasing services are something to look into not just for privacy concerns but also spam mitigation

kurisufag · 2024-10-10T17:11:47 1728580307

>With ChatGPT, this can be extended to create emails that look very personal - as if someone has followed all of your work and is genuinely interested in what you are up to - with extremely low effort. And people are already doing this, I already get emails like this today.

shit, now i don't feel like sending e-mails to people i'm actually interested in

II2II · 2024-10-10T12:58:46 1728565126

> This raises an interesting question: should email addresses be private? Addresses of buildings aren't private, and they're somewhat analogous as with many computing concepts.

There are several ways to look at that.

The organization that I work for considers anything that ties two pieces of information about a person together as private information. That is to say that a person's name is not private and a phone number is not private, but connecting a phone number to a name is private. In one form or another, an email is frequently tied to a name (e.g. the email address is based on their name, or an account record includes both a name and an email address).

Another way is to consider how accessible the information is. There was a lot of information that was not considered as private prior to the widespread adoption of the internet. One issue that I remember popping up in the early 1990's involved property (i.e. land) records. Historically, people had to go to a government office to access them but they were publicly available. Since they were publicly available, some governments made them available online. Once they were available online, the barriers to access were removed (e.g. having to physically visit an office) and the ability to abuse that information was vastly increased. All of a sudden, people started considering something that used to be considered as public information as private information.

Springtime · 2024-10-10T04:57:04 1728536224

An issue is for most sites/services an email has just become a standard authentication method, rather than something that can easily be more unique per account. So any usernames across sites/services that share it identify that user as being the same person (for data broker profiling, doxxing, etc), which is the privacy issue (not the email address per se, unless it perhaps contained one's real name).

For contrast, truly unique email aliases aren't possible on common services like free Gmail*, only with things like self-hosting or certain paid email hosts, which makes it less feasible for many. So from a privacy perspective, while in an ideal world everyone would be able to freely create entirely unique per-account creds, we're mostly stuck with the email implementation.

* One could create entirely separate accounts but it's high friction and IIRC the same phone number (now a requirement) can only be used for 2-3 accounts.

StressedDev · 2024-10-10T06:51:48 1728543108

Proton Mail and iCloud’s hide my e-mail feature allow users to have unlimited e-mail addresses. You can also get unlimited e-mail addresses by running your own e-mail server or using something like Office 365’s business e-mail (costs about $4 per month).

bossyTeacher · 2024-10-10T11:47:52 1728560872

is running your own e mail server a good idea in 2024? Security issues aside, you are at the mercy of the big email providers and whatever rules they want you to follow

kroltan · 2024-10-10T12:14:53 1728562493

For e-mail addresses as an authentication tool, you don't really need to be able to send emails at all, just receive them, and I think that is pretty feasible to not run afoul of the usual shenanigans.

rrwo · 2024-10-10T14:40:53 1728571253

I think the cost of paying for a dedicated email service is worth it. (There are plenty of smaller, privacy-oriented services such as Proton Mail or Fast Mail.)

They're better at it than I am, and it means I don't have to fill up my free time maintaining another server.

bsammon · 2024-10-10T05:53:39 1728539619

> One could create entirely separate accounts but it's high friction and IIRC the same phone number (now a requirement) can only be used for 2-3 accounts.

I've wondered about this. Every Android/ChromeOS device I've ever bought, I had a new Google account created for it (during setup, instead of using an existing account), and only a few actually had phone numbers (I don't generally use smartphones for telephony). Is "Google account" synonymous with "GMail account" these days?

I've had this idea for an experiment where I get such a device (without a simcard), and see how many times I can iterate the Initialize-Device-With-New-Google-Acct-PowerWash-Repeat cycle, and how many Gmail accounts I would have as a result.

sureglymop · 2024-10-10T06:55:18 1728543318

Why did you do that? Android doesn't require an account to work.

bsammon · 2024-10-11T06:02:25 1728626545

(For both Android and ChromeOS) I thought it would be significantly easier to let it use a Google account, than it would be to make it proceed without one. Was I wrong? Serious question.

Links to information would be appreciated, even/especially if it's a complex task to do this.

(I never put a lot of effort into this, because having the Google account be anonymous/fake-named was generally tolerable for my privacy standards)

exe34 · 2024-10-10T09:00:16 1728550816

I think it does if you want to install anything from the Play Store.

gdevenyi · 2024-10-10T10:54:27 1728557667

Aurora store gets around that

exe34 · 2024-10-10T11:42:41 1728560561

the search doesn't really work does it? you have to search on Google and then click on it to open with aurora.

but you're right, it does help!

gdevenyi · 2024-10-10T13:49:12 1728568152

The search worked for me to find a single app I needed when I was setting up a single-use tablet recently, but I haven't used it hugely beyond that. YMMV

KronisLV · 2024-10-10T07:01:57 1728543717

> This raises an interesting question: should email addresses be private?

I sadly don't think that's viable.

What might be, in our current world, would be having a mail server/client setup where you can generate random addresses for yourself like [email protected] and never re-use an e-mail address, much like with passwords, while being able to see all of the incoming mail in the same place and respond with the corresponding accounts.

Then, when your address gets traded around, it'd be fairly obvious (with some basic bookkeeping, e.g. a text field with purpose/URL for why a certain address was created) who is to blame for it and blocking incoming traffic from somewhere would be trivial as well.
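A minimal sketch of that bookkeeping, assuming a catch-all domain you control. The `AliasBook` class, its method names, and `example.org` are hypothetical, made up for illustration, and this only covers generating and looking up addresses, not the mail server side:

```python
import secrets


class AliasBook:
    """Remembers which random address was handed to which service (hypothetical helper)."""

    def __init__(self, domain):
        self.domain = domain
        self.purpose_by_alias = {}

    def new_alias(self, purpose):
        # 8 random bytes -> 16 lowercase hex characters, so aliases are not guessable
        local_part = secrets.token_hex(8)
        alias = f"{local_part}@{self.domain}"
        self.purpose_by_alias[alias] = purpose
        return alias

    def who_leaked(self, recipient):
        # The note written at creation time, or None for an address we never issued
        return self.purpose_by_alias.get(recipient.lower())


book = AliasBook("example.org")
shop_alias = book.new_alias("signup at shop.example")
forum_alias = book.new_alias("forum.example account")
print(book.who_leaked(shop_alias))
```

If spam later arrives at `shop_alias`, the lookup points at the one place that address was ever given to, and that single alias can be blocked without touching the others.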

I do have a self-hosted mail server and there are commands to create new accounts pretty easily, I'd just need to figure out the configuration for collecting everything in one place, as well as maybe make a web UI for automating some of the bits. I wonder if there are any off the shelf solutions for this out there.

ddoeth · 2024-10-10T07:15:00 1728544500

I also have my own mailserver and I don't create new accounts, I have a wildcard filter that drops all emails that come to my domain in my inbox. This is of course only viable when you are the only person using the domain, but I just sign up with a new mail address every time I sign up, for example my hackernews account would be [email protected]. That way I have a clear differentiator for every domain.

iam-TJ · 2024-10-10T08:30:21 1728549021

I do something similar except that I do not allow wildcard reception - I create unique service-identifying user@ for each service I give an address to, and have a simple script that immediately adds that to the Postfix virtual table.

That way the SMTP server can reject all unknown user@ without accepting them in the first place - preventing spamming and some types of denial of service through resource starvation.

I also apply greylisting based on a unique tuple (From, To, client IP address), so on first connection with that tuple valid SMTP clients need to re-deliver the email after a waiting period. Any subsequent deliveries are accepted immediately.
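The greylisting decision described above can be sketched roughly like this. It is a toy in-memory model, not how any particular mail server or policy daemon implements it; the `Greylist` class, the 300-second delay, and the injectable clock are all assumptions for illustration, and "defer" stands in for a temporary (4xx) SMTP rejection:

```python
import time


class Greylist:
    """Toy greylist keyed on (sender, recipient, client IP); hypothetical sketch."""

    def __init__(self, delay_seconds=300, clock=time.time):
        self.delay_seconds = delay_seconds
        self.clock = clock          # injectable so the waiting period can be simulated
        self.first_seen = {}        # tuple -> time of the first delivery attempt
        self.passed = set()         # tuples that already retried successfully

    def check(self, mail_from, rcpt_to, client_ip):
        key = (mail_from.lower(), rcpt_to.lower(), client_ip)
        if key in self.passed:
            return "accept"         # known tuple: no further delay
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay_seconds:
            self.passed.add(key)
            return "accept"         # retried after the waiting period
        return "defer"              # first attempt, or retried too early


fake_now = [0.0]
gl = Greylist(delay_seconds=300, clock=lambda: fake_now[0])
args = ("sender@example.com", "hn@example.org", "192.0.2.1")
first_try = gl.check(*args)
fake_now[0] = 60.0
early_retry = gl.check(*args)
fake_now[0] = 301.0
late_retry = gl.check(*args)
next_mail = gl.check(*args)
print(first_try, early_retry, late_retry, next_mail)
```

The idea is that well-behaved SMTP clients queue and retry after a temporary failure, while a lot of bulk senders never come back.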

KronisLV · 2024-10-10T07:23:53 1728545033

That's a pretty cool approach! I'd only be worried about the risk of leaking the main account address when responding to anything, but it's probably doable with a bit of research, like Postfix catch-all setups seem straightforward enough.

climb_stealth · 2024-10-10T08:33:19 1728549199

FWIW that should just be a matter of using the right configuration and mail client. With Fastmail for example I get to use a catch-all setup with my domain, and respond to whatever email it was sent to.

And the other way around as well. Send an email from an arbitrary <whatever>@domain email address.

EVa5I7bHFq9mnYK · 2024-10-10T07:25:13 1728545113

Yes, but privacy suffers with this approach, because if one of the emails ending in @domain.com is tied to your identity, all are.

KronisLV · 2024-10-10T07:30:47 1728545447

That's not really my use case, but seems like an important concern for many!

At that point, you probably want to use whatever features one of the big providers use, like: https://proton.me/support/aliases-mail

Maybe even something that'd sit in front of a mail server that you yourself control, I wonder what the variety of options out there is.

Sebb767 · 2024-10-10T09:06:09 1728551169

This is true for someone manually searching for your info, but sufficient to fool spam lists and most data brokers. This really depends on your threat scenario.

squarefoot · 2024-10-10T09:23:53 1728552233

> This raises an interesting question: should email addresses be private?

Yes and no. Both of them. As any powerful tool, email is going to be abused, like any other alternative would be when it will come one day. Those services allowing creation of dynamic email addresses do their job (until they're banned, that's why I'm not mentioning them), however using them isn't automatic and most people don't even know about their existence. What if we then did upgrade email protocols to reflect current needs wrt privacy and modified existing mail servers so that they could create dynamic addresses when asked by a simple flag? Example: I want to subscribe to a service from company XYZ, however I'm not sure how much I can trust them, therefore, when writing an email or filling a web form I can activate the option to create a new address that is tied to the recipient I'll be writing to, and will work as a dedicated proxy for my real address, that is, every mail I send to the recipient using my real address will be actually sent from the new dynamic address, then all replies to the dynamic address will be routed to my real one, but a field in its headers will always contain either a memo by me (example: "signup with XYZ") or the original recipient (example: "info@xyz_trustuswerenotspammers_yeahsure.com"). This way one can immediately spot whoever sold their address to others and blacklist them. As said, those services work well but not being built in into mail servers and clients their adoption is quite restricted. I don't see why that function shouldn't be embedded in a new upgraded email protocol as the modification would neither be that hard nor consume any serious resource. I would however expect heavy resistance against the adoption, of course.

tomjen3 · 2024-10-10T04:58:59 1728536339

In a world where email costs ten cents to send (per receiver) email addresses need not be private. In our world? They kinda need to for sanity.

skeeter2020 · 2024-10-10T16:16:38 1728576998

even 1/100 of a cent would solve the problem - but create a bunch more!

numpad0 · 2024-10-11T02:21:42 1728613302

I think it just needs to be communicated. Some websites allow login only by login name and not by email, some people have an identifying last name, others a hardly identifying full name and whatnot. There's no universal or universally agreed answer to that, so it needs to be said whether your service _considers_ it public information or not.

makach · 2024-10-10T04:38:08 1728535088

By definition the email address is considered private information and should be protected accordingly.

figassis · 2024-10-10T13:27:48 1728566868

It should, mainly because an email is not just an email, it's a channel to reach out to you, your internet address. And we know how that is going in your inbox.

weinzierl · 2024-10-10T08:12:59 1728547979

> This raises an interesting question: should email addresses be private?

GDPR is clear on this and there have been significant fines for revealing email addresses against the will of their owners (e.g. using cc instead of bcc). Not saying this is the ultimate wisdom, just a data point to consider.

theragra · 2024-10-10T19:43:23 1728589403

By itself or linked to other data? Afaik PII is usually a set of linked data. As in common name and surname are not PII. Together with age, they can be.

iicc · 2024-10-10T14:40:44 1728571244

>Addresses of buildings aren't private, and they're somewhat analogous as with many computing concepts.

Buildings are analogous to domains, not email addresses.

fortyseven · 2024-10-10T04:36:47 1728535007

> should email addresses be private?

I dunno. Should your personal phone number be private? Or your home address? Would you be okay if I knew it and shared it with a stranger? Or would you rather be asked permission to share it first?

Seems pretty cut and dry to me. Yeah, there's going to be someone out there (there always is) who doesn't care, but I'd wager the majority would be pretty ticked off if you gave those pieces of information out to a rando on the street.

mjr00 · 2024-10-10T04:59:59 1728536399

None of that information is actually private though. Your home address and personal phone number are likely in the public record for any number of reasons, such as ownership records or court filings. Or maybe a Facebook post from 2009 that your mom made. Unless you're one of the 0.00001% of people who do things like rotate your phone number and address annually, it's out there somewhere.

But public vs private is a spectrum, not a binary true/false. My phone number is public because I get sales calls from various companies to it. It's annoying, but bearable. But there's a big gap between that and the New York Times putting my name, number and picture on the front page.

So your home address and phone number aren't private. But they're also not readily accessible unless someone is really dedicated to finding them, so they're not quite public either.

amszmidt · 2024-10-10T04:44:09 1728535449

There are plenty of countries where all that is public information, back in the day there even used to be a phone book with .. name, phone number, and address. And many countries have this now in digital form.

chii · 2024-10-10T07:31:34 1728545494

The missing part is the action part.

An email (or phone number, or address) is an identifier. Asking whether this identifier is public or private misses the important thing, which is the action that can be paired with the identifier.

So therefore, there's no universal answer to whether the identifier should be public or private. It's a case by case basis, when paired with an action.

For example, i don't want a shop to see me buying condoms, so shops shouldn't get my email address (or phone number).

emidoots · 2024-10-10T04:48:50 1728535730

Interestingly, public U.S. state property records will just disclose where you live whether you like it or not. With as little as your name, a home address is trivial to find.

harywilke · 2024-10-10T11:24:48 1728559488

We used to get these big books delivered to our doorsteps that had your name, your address and your personal phone number. You could pay to opt out.

the_gorilla · 2024-10-10T16:01:22 1728576082

If I published a list of all names and addresses, that's still different than "here is harywilke's full name and address". I imagine you wouldn't be too pleased?

hunter2_ · 2024-10-10T17:24:28 1728581068

The link between online identity and offline identity is a sacred barrier. And I'm not sure that archive.org breached that particular barrier.

the_gorilla · 2024-10-10T17:29:48 1728581388

That's the issue I take with the "phonebook" defense. It justifies doxing people by collecting and connecting publicly available information online. All the information is out there, it's all on a phone book, your email was published online, and so on, but the end result is clearly bad so something in the process should be handled more carefully.

GeoAtreides · 2024-10-10T07:05:41 1728543941

Phonebooks were a thing not so long ago...

mdp2021 · 2024-10-10T07:12:04 1728544324

And they contained data of which people allowed disclosure. When you did not want your information to be published, you informed the telephony provider and the phonebooks would not include it.

exodust · 2024-10-10T13:30:49 1728567049

For a fee. In Australia at least it cost money not to be listed in the phone book.

Numbers were however tied to a property rather than individual personal phones in our pockets. When you think about it, mobile phone technology arrived quickly and caught everyone by surprise. Back in the 80s very few people thought we'd be carrying around "pocket TV phones" in such a short time.

szundi · 2024-10-10T06:45:34 1728542734

This question could not be more academic

keybpo · 2024-10-10T13:30:49 1728567049

It's not just uploads but any item that uses the email address as a unique user identifier (I'm not technical enough to explain this clearer but [1]).

An email address will be part of the xml in his uploads but also in his profile, which anyone can access by simply changing the url from https://archive.org/details/@foobar to https://archive.org/download/foobar. So, in essence, one just needs to have a registered account, independently of any uploads made.

[1] https://help.archive.org/help/accounts-a-basic-guide-2/

steffanA · 2024-10-10T15:26:59 1728574019

This is bad enough. This alone is a privacy bug/data leak.

Theoretically, someone could scrape the pages and compile a list of exposed email addresses.

spease · 2024-10-10T16:25:32 1728577532

> Theoretically, someone could scrape the pages and compile a list of exposed email addresses.

I laughed. Oh no! Anyways…

The people interested in identity theft are probably too busy figuring out what to do with all the SSNs they stole (not from this breach, but from the annual catastrophic breach of a credit bureau or government repository).

And the people who want your email probably already got it from one of the hundreds of other services you have to create an account for now.

I’m not really sure if there are circumstances where donating to the internet archive could be held against you and lead to persecution. Maybe in certain Luddite communities? The Amish? But then, how would they know…

rrwo · 2024-10-10T14:34:39 1728570879

One solution is to use a unique email address for every website, and change the address if the site gets compromised (with the old address getting added to a spam filter).

999900000999 · 2024-10-09T22:24:48 1728512688

I pulled an old friend's website down from Internet Archive.

He's moved on to the next stage, but I was glad I was able to put his site back up.

It'll be a shame if IA goes down permanently, but we need a decentralized solution anyway.

Having a single mega organization in charge of our collective heritage isn't a good idea.

gabeio · 2024-10-09T22:44:27 1728513867

I have always thought about this. It would be interesting to have users actually store small amounts of redundant info on a device connected to the internet. Very similar to what a torrent does but with more peers (more data shards than full copies) and fewer seeds. And try and keep a huge database for everyone. Obviously open source and it would end up something like tor where they just assist the network with security patches but they don’t actually have any real “control” (admin dashboard control) over the network at large. We already do something like that with website static file caching, but at much smaller scale. Obviously the security implications of this would be very hard but maybe not impossible to overcome. ipfs comes close but it again has more seeds than peers.

if anyone knows something like what I'm suggesting, I'd love to hear about it!

pbhjpbhj · 2024-10-09T23:10:42 1728515442

IIRC there were a few storage based projects that popped up using alt coins to encourage people to offer excess storage space for other randos on the internet. The possibility you might be storing illegal content might have been what killed it/them.

https://en.wikipedia.org/wiki/Cooperative_storage_cloud gives a few examples, like Filecoin.

fwip · 2024-10-10T20:17:47 1728591467

In my opinion, IPFS was killed by a few things:

1) wedding itself to crypto with FileCoin.

2) terrible performance due to architectural choices (basically: too much pointer-chasing, except every pointer was back out to the DHT).

3) No serious attempts to integrate with existing software distribution strategies.

I think it's still a good core idea.

anacrolix · 2024-10-11T09:57:21 1728640641

Its DHT implementation was shit. Ignoring all existing wisdom, it uses persistent connections, rates peers and has far too many special nodes.

IAmGraydon · 2024-10-10T18:49:02 1728586142

Are you, by any chance, named Richard Hendricks?

xyzsparetimexyz · 2024-10-09T23:46:33 1728517593

The main issue that such hosting faces is that it's less efficient and more expensive than just regular centralized servers.

999900000999 · 2024-10-10T02:47:05 1728528425

Anything would be better than the current system where you basically just have one source.

Independently ran mirrors all over the world, along with snapshots.

Have the occasional fork or two. Say you're from a small town in Northern Illinois. If you have 2 TB of image archives from a defunct local newspaper, it might be good for photography forks even if it wouldn't make sense for the main archive.

rottc0dd · 2024-10-14T09:58:54 1728899934

Does https://ipfs.tech/ fit the bill?

Geezus_42 · 2024-10-10T03:42:40 1728531760

This was a plot line in Silicon Valley.

Xen9 · 2024-10-11T21:07:13 1728680833

I believe that it would be possible to cost effectively build and implement an architecture for a distributed IA backup—this comment entails some notes.

The system asks volunteers about their age, sex, location, and storage format details (the model, past use etc. can be used to predict the durability of a single storage) without sharing most of this data anywhere.

The downloaders are then algorithmically allocated pieces of the archive. Exempli gratia, such that there is at least a limited amount of overlap between the pieces, and two people in the same country won't provide redundancy for each other.
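One way such an allocation rule could look, as a rough sketch only: each piece goes to the least-loaded volunteers, skipping anyone whose country already holds a copy of that piece. The `allocate` function, the volunteer names, and the country codes are invented for illustration, and the reliability weighting described above is left out entirely:

```python
def allocate(pieces, volunteers, copies=2):
    """Assign each piece to `copies` volunteers from distinct countries.

    volunteers maps a volunteer name to a country code (hypothetical data model).
    Returns a dict mapping each piece to the list of volunteers holding it.
    """
    load = {name: 0 for name in volunteers}
    plan = {}
    for piece in pieces:
        chosen, used_countries = [], set()
        # Least-loaded volunteers first; the name breaks ties deterministically
        for name in sorted(volunteers, key=lambda n: (load[n], n)):
            country = volunteers[name]
            if country in used_countries:
                continue  # a copy in the same country adds no geographic redundancy
            chosen.append(name)
            used_countries.add(country)
            load[name] += 1
            if len(chosen) == copies:
                break
        # May hold fewer than `copies` names if there are too few distinct countries
        plan[piece] = chosen
    return plan


volunteers = {"ana": "BR", "bo": "BR", "chen": "CN", "dee": "DE"}
plan = allocate(["p1", "p2", "p3"], volunteers, copies=2)
for piece, holders in plan.items():
    print(piece, holders)
```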

When a downloader verifies that they have completed the download by giving (unique, to prevent fake-download sabotage) SHA hashes of the data, the information that these pieces have been downloaded in this or that country, plus an estimate of the reliability of the storage, is added to a public database, for the algorithm to use in the future.

Every downloader is then generated a public and private key so that they can give the hash of their download again once in a while or just verify that the piece is still there. The reliability estimates (based on storage / hardware details) would be empirically calibrated based on the data about the actual storage failures.

A public counter, estimating how well the archive is currently backed up via this scheme, could be displayed.

For copyright issues, it would be possible to encrypt some of the data, e.g. such that normally borrowable items become readable files only when X% of downloads are pieced together.

The scheme would be primarily based on existing designs and algorithms but work roughly as depicted above. I am not an expert of what compression, hashing and other algorithms should be used, and it needs lots of good work, to determine how to avoid errors in the scientific part of estimating the reliability of the downloads—and generally a situation where it would turn out that lots of data was lost when attempting to put the pieces back together again.

Remark (engineering): Empirically validating the correctness of the software of the backup architecture by testing it on grids of real hard drives in single places will probably give safety against catastrophic failure. Even better would be to obtain a large amount of old hard drives and SSDs kept in a single place for a long time, to validate that the software works over time.

Remark (integrity): That a downloader actually has the downloads can be verified efficiently by the IA server adding a small part to the piece the downloader has, hashing it again, and requesting the new hash.
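A minimal sketch of that integrity check, assuming the "small part" is a random nonce prepended to the piece and the hash is SHA-256. Both choices and all function names here are assumptions, not a worked-out protocol:

```python
import hashlib
import hmac
import secrets


def make_challenge():
    # Server side: a fresh random nonce, so an old answer cannot be replayed
    return secrets.token_bytes(16)


def respond(nonce, piece):
    # Downloader side: hash the nonce together with the stored piece
    return hashlib.sha256(nonce + piece).hexdigest()


def verify(nonce, reference_piece, response):
    # Server side: recompute from its own copy and compare in constant time
    expected = hashlib.sha256(nonce + reference_piece).hexdigest()
    return hmac.compare_digest(expected, response)


piece = b"bytes of one archive piece"
nonce = make_challenge()
honest_ok = verify(nonce, piece, respond(nonce, piece))
corrupt_ok = verify(nonce, piece, respond(nonce, b"bit-rotted copy"))
replay_ok = verify(make_challenge(), piece, respond(nonce, piece))
print(honest_ok, corrupt_ok, replay_ok)
```

Note that the verifier needs the piece itself (or a stock of precomputed nonce/answer pairs) to check the answer, so this works while the main archive exists but would need another verifier after a collapse.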

Remark (redundancy): It may be possible to develop a social program that analyzes whether a volunteer in a certain place can provide more redundancy by buying themselves a hard drive or by supporting the acquisition of hard drives for volunteers who have proved themselves reliable elsewhere. This is speculative and the benefit may be lower than the risks.

Finally, instead of "public database" it may be much more optimal to decide to use a blockchain of some sort. Not a cryptocurrency, but a blockchain. This is because if the idea is to distribute copies over the world to ensure contingency in case of IA main architecture collapse, then the more parts of the distributed backup architecture (which must actually not be "the backup architecture" but "a scheme", that no everyday IA decisions rely upon, and that just exists out there) are on a blockchain network run by a "decentralized" system, the more reliable it will be.

My heuristic plausibility analysis:

0. IA backup would not need to be constantly accessed or changed (this makes storage easier and cheaper, and prolongs the maximum age of the storage).

1. Not all of IA has to be backed up: a distributed backup that successfully recovers 10% of IA in a catastrophe is by all means a great success (consequently prioritization of what might / should be stored should probably be part of the algorithm that decides what volunteers download; and what existing "big" archives already store that overlaps with IA should be taken into account in this analysis).

2. I recall you estimated 30-40 M USD ballparks for a single copy: a properly led open source project may be able to develop this for free, and a fairly compensated one could be ~ 0.1% to 1% of the cost.

3. The Sia network https://siascan.com/ has space for 7PB; and it's for storage where one can download their own files at any time; and they have had very little publicity.

4. A 2TB hard drive costs 50-100 USD and 20PB would be 10 000 humans buying one 2TB hard drive, which by itself is possible. Hobbyists and organizations may be able to provide even larger capacities.

5. Most IT projects fail, but since lots of the technology already exists, we know what we are doing here, and IA might be able to recruit above-average talent, we can conservatively give the groundwork development a 50% chance to succeed, or 45% without funding.

6. If the development succeeds, then there may already be around ~ 100 potential volunteers. I estimated that 0.1% of IA visitors may volunteer, plus 1% of Hacker News traffic were the project to be mentioned there, plus growth over the first few years and traffic from elsewhere. Perhaps 75% chance to get 10% of IA backed up by volunteers, given development succeeds.

7. If that much is backed up, there is perhaps a 5% chance of attaining 200 TB in the next few decades.

Conservatively, given that open-source development starts, one gets approx. 33% - 38% chance that a 10% backup is achieved & approx. 1-2% that 100% of what is now in the IA could be backed up. These are of course rather meaningless numbers, but the fact seems to be that, lacking funding to build a complete backup, IA can best guarantee contingency by starting to build a distributed one. Perhaps this was needlessly many words for a simple proposal.
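For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate. It assumes the steps can simply be multiplied together as conditional probabilities, which is a simplification; the variable names are illustrative and the inputs are the figures quoted in this comment.

```python
import math

# Capacity: 10 000 volunteers with one 2 TB drive each (decimal units).
DRIVE_TB = 2
VOLUNTEERS = 10_000
TB_PER_PB = 1_000
pooled_pb = DRIVE_TB * VOLUNTEERS / TB_PER_PB

# Probabilities quoted in the comment, treated as a simple chain.
P_DEV_FUNDED = 0.50      # groundwork development succeeds, with funding
P_DEV_UNFUNDED = 0.45    # same, without funding
P_TEN_PERCENT = 0.75     # 10% of IA backed up, given development succeeds
P_FULL = 0.05            # full backup reached, given the 10% backup exists


def chain(*probabilities: float) -> float:
    """Multiply a sequence of conditional probabilities."""
    return math.prod(probabilities)


ten_percent_low = chain(P_DEV_UNFUNDED, P_TEN_PERCENT)
ten_percent_high = chain(P_DEV_FUNDED, P_TEN_PERCENT)
full_low = chain(P_DEV_UNFUNDED, P_TEN_PERCENT, P_FULL)
full_high = chain(P_DEV_FUNDED, P_TEN_PERCENT, P_FULL)

print(f"pooled capacity: {pooled_pb:.0f} PB")
print(f"10% backup: {ten_percent_low:.1%} to {ten_percent_high:.1%}")
print(f"full backup: {full_low:.1%} to {full_high:.1%}")
```

The products land inside the 33% - 38% and 1-2% ranges stated above.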

- X

---

Note: It's probable that at least the NSA has a private full IA backup.

max-throat · 2024-10-10T16:20:44 1728577244

This is why BitTorrent and other P2P solutions were invented, but alas: A. The RIAA, MPAA, and ESA have given these technologies a terrible reputation. B. Nobody likes to seed. Some kind of seeding-based crypto would have been a great incentive if cryptocurrency wasn't also demonized by now.

fwip · 2024-10-10T20:22:10 1728591730

Part of the reason people don't/didn't like seeding is that many residential lines are so terribly asymmetric. If you had 100down/5up, seeding your torrent at a useful speed was often enough to degrade your connection into unusability.

aucisson_masque · 2024-10-09T22:45:01 1728513901

It's called torrent protocol and it doesn't work, no one wants to spend money and bandwidth hosting a god forsaken movie or book that only a handful of people care about.

squarefoot · 2024-10-09T23:14:21 1728515661

Not much money and bandwidth if you aren't on a metered connection. You can share tens of gigabytes or more on a cheap read only flash plugged into a $25 single board computer that draws way less than a full PC and can be left sitting there near the router. Just limit its bandwidth on the torrent client and you won't even notice it during online gaming. The client can be as small as the Transmission daemon running headless on one of the many Debian based embedded distros: all control through either the web interface or from its client: no monitor, mouse, keyboard etc. just a small cheap box.

https://www.friendlyelec.com/index.php?route=product/product...

(just an example, as it's way overkill for the task)

https://transmissionbt.com/

https://github.com/transmission-remote-gui/transgui
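As a rough sketch of the "limit its bandwidth" step: the Transmission daemon keeps its configuration in a `settings.json` file, and to my knowledge the upload cap is controlled by the `speed-limit-up` (KB/s) and `speed-limit-up-enabled` keys. The helper names below are made up for illustration, and the file path is left to the caller because it varies by distro. The daemon should be stopped before editing, since it tends to rewrite the file on exit.

```python
import json


def with_upload_cap(settings: dict, up_kbps: int) -> dict:
    """Return a copy of a Transmission settings dict with an upload cap applied."""
    capped = dict(settings)
    capped["speed-limit-up"] = up_kbps        # cap, in KB/s
    capped["speed-limit-up-enabled"] = True   # the cap is ignored unless enabled
    return capped


def rewrite_settings_file(path: str, up_kbps: int) -> None:
    """Apply the cap to a settings.json on disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        settings = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(with_upload_cap(settings, up_kbps), handle, indent=4, sort_keys=True)
```

The same limit can also be set from the web interface or the remote GUI linked above; editing the file is just the headless route.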

oxygen_crisis · 2024-10-09T23:50:42 1728517842

I see 24 seeders for the entire 72-episode run of the 1991 sitcom "Herman's Head" which was so poorly rated that it's never seen a home media or streaming release, your premise doesn't hold any water at all.

pessimizer · 2024-10-10T00:40:52 1728520852

People are pirating comic books and cookbooks from the 30s; there are a lot of people in this world, if something goes on the web and you tell everyone you put it there, it's pretty much preserved. It's only law enforcement that kills free availability of everything all the time online, for better or for worse.

With copyright, as individuals we get to trade all of the wonderful stuff already made (and long paid for) for the flood of minute-old shit and sludge inundating us online constantly. It's a bad trade. Maybe copyright should stop encouraging creativity; the answer to how "artists" would get paid post-copyright might be "who cares, quit if you want."

We already have Herman's Head, we don't need any more crap.

sgc · 2024-10-10T05:02:38 1728536558

I never thought about UBI and copyright - but as soon as you say that, it is immediately obvious to me that when we have some kind of UBI, copyright should be dramatically reduced.

tourmalinetaco · 2024-10-10T05:19:52 1728537592

Copyright should be reduced in general. 20 years was already excessive for exclusive control over culture, 200 is just absurd.

sgc · 2024-10-10T13:33:09 1728567189

I 100% agree. Just pointing out that UBI changes the discourse on this subject.

throw10920 · 2024-10-12T15:41:21 1728747681

> With copyright, as individuals we get to trade all of the wonderful stuff already made (and long paid for) for the flood of minute-old shit and sludge inundating us online constantly.

What does this have to do with copyright? People post sludge online even in chaotic meme environments where copyright is irrelevant and people constantly take and repost each others' stuff.

0x1ch · 2024-10-09T22:53:47 1728514427

It does work, when you don't notice it. We need sane limits and permanent seeders. This is why so many regular people get hit with ISP notices, they don't know they've seeded Captain America for the last six months every time they started their PC.

idle_zealot · 2024-10-09T23:02:49 1728514969

Yup. If browsers built in support for magnet links and (on desktop) defaulted to seeding with some capped bandwidth then a lot of centralized hosting platforms would become unnecessary.

kmeisthax · 2024-10-10T04:21:52 1728534112

You can build something very similar with WebRTC. Browsers already have P2P networking capability, it's just not immediately interoperable with BitTorrent clients. Standardizing some sort of BitTorrent over WebRTC bridge and adding it to BT clients would fix this problem.

That being said, please do not host content this way. P2P blows away the already thin privacy guarantees that the web provides. Anyone seeding the site gets the IP addresses of everyone on that site, and can trivially correlate that with other sites to build detailed dossiers on, if not individual people, at least households[0] of people. After all, that's how the MAFIAA[1] sent your ISP DMCA scare letters back in the 2000s P2P wars.

[0] IPv4 CGNAT would frustrate this level of tracking, but IPv6 is still subnet per subscriber. Note that you can't use individual v6 addresses because we realized very early on that the whole "put the MAC in the lower 64 bits of the address" thing was also a privacy nightmare, so IPv6 hosts rotate addresses every hour or so.

[1] Music And Film Industry Association of America, a fictitious merger of the MPAA and RIAA in a hoax article
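For context on footnote [0], the "MAC in the lower 64 bits" scheme is, as far as I understand it, the modified EUI-64 interface identifier: split the MAC in half, insert `ff:fe` in the middle, and flip the universal/local bit of the first octet. A minimal sketch (the function name is my own) shows why the resulting address suffix follows a device across networks:

```python
def eui64_interface_id(mac: str) -> str:
    """Derive the modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a MAC address like 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # Join the eight octets into four 16-bit groups, IPv6 style.
    groups = [f"{(eui[i] << 8) | eui[i + 1]:x}" for i in range(0, 8, 2)]
    return ":".join(groups)


print(eui64_interface_id("00:11:22:33:44:55"))
```

Because the identifier depends only on the hardware address, any prefix plus this suffix identifies the same device, which is what the rotating temporary addresses are meant to avoid.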

palata · 2024-10-10T09:35:07 1728552907

> You can build something very similar with WebRTC.

Isn't that exactly what WebTorrent is?

idle_zealot · 2024-10-10T07:45:41 1728546341

I hadn't considered the privacy implications. For this to be workable, you'd need to pair it with near-ubiquitous use of some anonymizing overlay network.

geraldhh · 2024-10-10T17:12:31 1728580351

iirc opera browser tried that

Timber-6539 · 2024-10-10T04:24:58 1728534298

If the whole world has bandwidth available for TikTok, it can make the same available for sharing torrent files.

homebrewer · 2024-10-10T00:34:36 1728520476

I've been seeding some unpopular torrents for ten years (would have done for even longer if I did not change the torrent client a decade ago). "No one" is too strong a word, as usual with these absolutist things.

aucisson_masque · 2024-10-11T06:33:28 1728628408

Agree, shouldn't have said no one. But you've got to recognize that some torrents are more popular than others.

I would have absolutely no trouble downloading the latest marvel movie but if you are looking for some old Soviet movie, Iranian movie or even old American movie then you're in bad luck. I've never seen more than 0 seeder on thepiratebay.

trinix912 · 2024-10-10T07:39:55 1728545995

In addition to the costs, I'd say it's also that no one wants to risk getting sued like the IA is getting.

EamonnMR · 2024-10-10T19:28:00 1728588480

I keep wanting to do this for old sites, make like a personal mini IA. Besides just using wget or curl, any tips for pulling down useable complete websites from IA?

account42 · 2024-10-10T10:48:51 1728557331

Agreed, especially an organization that has already been shown to not always be impartial.

Simran-B · 2024-10-10T07:34:14 1728545654

A decentralized solution, doesn't that scream internet archive on blockchain? What could go wrong.

brundolf · 2024-10-10T19:26:21 1728588381

This is one of the very few real use-cases I can think of for the blockchain

micromacrofoot · 2024-10-10T14:09:19 1728569359

torrents maybe

steffanA · 2024-10-09T22:53:39 1728514419

More details here about the data breach. Stolen database contains 31 million records.

https://www.bleepingcomputer.com/news/security/internet-arch...

ano-ther · 2024-10-09T23:38:56 1728517136

> the Have I Been Pwned data breach notification service created by Troy Hunt, with whom threat actors commonly share stolen data to be added to the service

Do they? Why?

Maxious · 2024-10-09T23:44:27 1728517467

Proves they really did hack something. There's other sites where hackers register defacements etc.

richbell · 2024-10-09T23:43:50 1728517430

If Troy authenticates the data, they can use that as an 'endorsement' when trying to sell it.

ianhawes · 2024-10-10T03:46:14 1728531974

This. Typically HIBP attribution includes the email of the "submitter". Various data aggregators will contact them and buy the stolen data. Everybody wins*.

* Exceptions apply.

Thorrez · 2024-10-10T05:35:13 1728538513

Where on HIBP can I see the email of the submitter?

ramimac · 2024-10-10T07:51:43 1728546703

It's not available in this case, or every case. When available, you can search "The data was provided by" in https://haveibeenpwned.com/PwnedWebsites

Thorrez · 2024-10-11T07:59:30 1728633570

Thanks! Slight correction: only 2 breaches say "provided by" with a source, but a ton of breaches say "provided to" HIBP with a source.

divbzero · 2024-10-10T19:26:01 1728588361

Is there a way to modify the HIBP reporting process to avoid aiding the sale of stolen data?

RamRodification · 2024-10-10T12:16:04 1728562564

Doesn't the value drop dramatically if it has already been shared with Troy and the HIBP database? Or is there a time frame where it has been authenticated by Troy but not yet added to the database?

richbell · 2024-10-10T12:58:25 1728565105

I don't think so.

Troy isn't publicly sharing the credentials, and that's what's valuable, especially having "exclusive" access.

He blogged or tweeted about this at some point. Sadly, I can't find the link.

xproot · 2024-10-09T23:50:58 1728517858

Anyone who buys it or finds it in the wild can also upload it.

mkl · 2024-10-09T23:05:58 1728515158

> The data will soon be added to HIBP

My unique-to-archive.org email address is not there yet.

nikisweeting · 2024-10-09T23:10:29 1728515429

I just checked and my unique-to-archive.org email is showing up in the breach as of 2024-08-09.

SushiHippie · 2024-10-09T23:37:45 1728517065

Mine isn't, but I've created my account only a week ago, so maybe I've created the account after the breach.

EDIT: Should've read TFA more thoroughly, it says the breach happened before the 30th September. And I created my account around the 2nd October

Funes- · 2024-10-09T23:14:21 1728515661

Mine too.

paulnpace · 2024-10-10T12:24:23 1728563063

Many hackers will remove addresses that are obviously unique, including tags, to keep silent which database has been hacked, but it seems inconsistent.

I have checked when I knew my address was in a hack and it isn't there, while other times it is there. I also wonder if they start filtering out by domain, as they see a domain across multiple databases with unique addresses in each database exactly one time.

mobeigi · 2024-10-10T03:10:44 1728529844

Out of curiosity, do you use a unique email address for every single service?

mkl · 2024-10-10T05:18:02 1728537482

Yes, without exception. I want to know who is leaking/selling my address, and usually stop doing business with those who do. It also makes filtering really easy. People sometimes have strange reactions when I verbally give them an email address with their company name in it, especially when I'm a new customer.

All you need is a domain and an email provider that allows catch-all addresses, both of which are easy and cheap.

pixxel · 2024-10-10T05:44:22 1728539062

I do the same but use initials and random chars so hackers or employees can’t assume my email addresses for other sites/services.

e.g.: [email protected]
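A minimal sketch of that "initials plus random chars" scheme, assuming the initials are a short prefix you pick per service and the domain is a catch-all one you control. The function name, separator, and length are all made up for illustration.

```python
import secrets
import string

# Lowercase letters and digits keep the alias easy to type and widely accepted.
ALPHABET = string.ascii_lowercase + string.digits


def make_alias(initials: str, domain: str, length: int = 8) -> str:
    """Build an unguessable per-service address on a catch-all domain."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{initials}.{suffix}@{domain}"


print(make_alias("ia", "example.com"))
```

Since the suffix is random, knowing one alias tells an attacker nothing about the address used on another site; the password manager entry is the only record of it.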

jenscow · 2024-10-10T10:43:39 1728557019

I also use @my.other.domain for websites, so my human contacts won't assume it is me if they see it.

Towaway69 · 2024-10-10T12:53:28 1728564808

I love doing that, when someone asks me for an email address, it’s always [email protected] - always gets strange looks!

Edit: even more fun with catch all domains then it’s [email protected]

dyingkneepad · 2024-10-10T17:15:45 1728580545

I always see people claiming they use this strategy, but I never ever ever see people blaming services saying "this and this company sold my data to spammers". Where are the name-and-shame people? Have you ever caught anybody doing anything?

mkl · 2024-10-10T19:26:08 1728588368

It's hard to distinguish between leaking and selling, but I think leaking is much more common. Dropbox famously leaked a lot of emails in ~2012, including mine - I was never a paying customer and that put me off becoming one or using them (to this day most spam sent to my domain is to that Dropbox address). Two local PC parts companies leaked or sold my email. I confronted one about it and they claimed they hadn't had a data breach, so either they sold it, or they were too incompetent to know they'd been hacked, or they lied - I suspect incompetence but whatever happened they lost my business. A couple more incidents long ago too.

Real estate agents can be pretty aggressive with emailing, but IME respect unsubscribes and don't seem to share/leak emails. I kind of wish I'd used an address per agent instead of per company to see what was happening better.

Non-company uses can also reveal issues. I had an address scraped from a flatmate finding site, and one apparently lifted from a relative's contact list somehow (I only have one I use for family, so that was a concern, but spam to it petered out quickly).

TobTobXX · 2024-10-11T14:07:03 1728655623

Yes, one time I was suddenly getting wine ads on an E-Mail address for a service I signed up for. I contacted the service (rather unfriendly) and they apologized and the unwanted E-Mails stopped.

markgoho · 2024-10-10T16:22:44 1728577364

is each address truly unique or are you doing something like [email protected], [email protected], etc.

mkl · 2024-10-10T20:37:00 1728592620

It's a separate address that can have its own mailbox if need be, but unless you want to keep meticulous records on the go, and refer to them constantly, some sort of pattern is required.

systems_glitch · 2024-10-10T11:23:52 1728559432

Yeah we run this on our own Proton Mail whitelabel, and for a few customers who have us manage it, mostly for the filtering aspect, and the occasional customer who has the wrong/mis-spelled address in their system and won't change it.

buildsjets · 2024-10-10T03:18:29 1728530309

Not the author but yes, I do. It’s trivially easy so why not?

nicolas_t · 2024-10-10T04:46:58 1728535618

Same here, only issue I’ve ever had was when my email address had the name of the company in it in the format of [email protected]. CS people are sometimes confused by that and I’ve been accused of attempting to hack them by a small shop online because of my email.

qingcharles · 2024-10-10T05:44:28 1728539068

Major SMTP provider refused my email address as login because of this. Luckily my moaning eventually made its way to one of their developers who fixed it.

You can't sign up for a Samsung account with the name Samsung anywhere in your e-mail address. Aliexpress is another offender. There my email is just spam@domain.

jmb99 · 2024-10-10T06:55:15 1728543315

I used ali@domain for aliexpress, which was accepted.

JCharante · 2024-10-10T07:41:57 1728546117

"Are you from corporate?" is what I often get when I need to give my email to a store associate.

phantomathkg · 2024-10-10T04:32:06 1728534726

Curious, how trivially easy is that?

TheDong · 2024-10-10T04:44:19 1728535459

It's quite trivial.

1. Buy a domain. About $10/year for a .com

2. Buy a /24 ipv4 block with good reputation (maybe like $10k)

3. Get a rack in a nearby datacenter, rack up a BGP-capable router and your servers for redundancy to run email. Takes about $30k initial setup costs if you buy all new, and about $5k initial setup costs if you cut corners and buy used. It'll be $2k/mo after that, so less than the cost of 1 $100 avocado toast per day, quite affordable.

4. Set up your mailserver of choice, such as dovecot + postfix. Enable either a catch-all address, or use recipient_delimiter. The former means "[email protected]" works, and the latter means "[email protected]" works (assuming your recipient_delimiter is '-'). I recommend using a real catchall.

5. Setup your spam setup, this is the hardest part. I have no guidance here.

6. Point your DNS over, setup SPF and DKIM records, test, and off you go! This should all take about 1 to 3 days if you know what you're doing.

7. Find out that some email will go to spam anyway because you're not using one of the big 4 email providers, but it can't be helped, and anyway no one uses email anymore.

And after that, for less than $30k/year, you have email with catchall or subaddressing support. Nice and easy.

You can also pay Fastmail for email and use their "catchall" feature https://www.fastmail.help/hc/en-us/articles/1500000277942-Ca...

Or Google Apps also has a catchall feature.

Then, after you do this, you can simply give internet archive the email address "[email protected]", or generate a random string. If you forget the email you used, you can search your email history for the first email they sent you, and check the To field.
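To make the catch-all versus delimiter distinction concrete, here is a small sketch of how the service name could be recovered from the To field in each setup. The function name and the example.com addresses are hypothetical; a real mail server does this matching itself, so this is only for reading your own mail history.

```python
from typing import Optional


def tag_from_recipient(address: str, delimiter: Optional[str] = None) -> str:
    """Return the per-service tag encoded in a recipient address.

    With a catch-all (delimiter=None) the whole local part is the tag.
    With a delimiter, the tag is whatever follows its first occurrence,
    or an empty string if the address carries no tag.
    """
    local, sep, _domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    if delimiter is None:
        return local
    _mailbox, _found, tag = local.partition(delimiter)
    return tag


# Catch-all style: the local part itself names the service.
print(tag_from_recipient("internetarchive@example.com"))
# Delimiter style, assuming '-' is the configured delimiter.
print(tag_from_recipient("me-internetarchive@example.com", "-"))
```

The catch-all form has the advantage that nothing in the address reveals your real mailbox name.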

2Gkashmiri · 2024-10-10T04:54:13 1728536053

Hold on.

Why do you need a dc rackspace and a /24 just to have your email ?

TheDong · 2024-10-10T05:18:00 1728537480

This is hacker news, we're all either founders who have 2 billion dollars in (illiquid) stock options, or FAANG employees making 600k/year, what else are we going to do if we want email?

Sure, you could pay fastmail $40/year for this, but that's not really the hacker news spirit, and no one on this site knows how to count as low as $40.

The real justifications you can give yourself:

Shared VPS hosting pretty much all bans email, AWS, DO, etc all have ToS that say "no email" as anti-spam measures.

Shared IP space will go straight to spam due to people having spammed on it in the past. Buy a /24 to ensure you don't go straight to spam.

Rackspace ensures you actually own your email, at least moreso than with other shared hosting, and owning your email is important.

account42 · 2024-10-10T10:35:56 1728556556

> Shared VPS hosting pretty much all bans email, AWS, DO, etc all have ToS that say "no email" as anti-spam measures.

Complete FUD.

Here is DO's acceptable use policy:

https://www.digitalocean.com/legal/acceptable-use-policy

You can see that they explicitly have policies for email hosts.

Here is a guide they host on how to setup a mail server:

https://www.digitalocean.com/community/tutorials/how-to-run-...

They forbid spamming, not all mail.

> Shared IP space will go straight to spam due to people having spammed on it in the past. Buy a /24 to ensure you don't go straight to spam.

I have had no problems with deliverability to Google from an IP on a shared block. I don't send marketing mails or any other kind of spam though. Microsoft blocks my IP but they are too small (outside businesses) for me to care to give them special snowflake treatment.

Deliverability of your own mails is also irrelevant for the original discussion about using unique email addresses for signing up to services - you don't need to be able to send at all for that.

2Gkashmiri · 2024-10-10T09:44:09 1728553449

been using racknerd.com vps for last 3 years for running miab. ZERO problems so far.

costs around $12/year+domain

jmb99 · 2024-10-10T07:02:07 1728543727

For the “least painful” self-hosted email setup, you can’t be hosting on an IP in a subnet that’s ever sent spam, if you want to avoid being blackholed occasionally. This means you can’t have an IP allocated to you by a hosting provider, or a residential ISP, or a “business” ISP, or any cloud provider. That leaves very few options.

Note that I am speaking from personal experience here. I have been self-hosting email for over a decade, from the same IP, with (roughly) the same DNS records. Occasionally, for no reason, I will end up on the global spam list for Gmail, Outlook, or iCloud - never more than one at the same time, and never with a discernible reason. The best I can figure is that the IP is allocated to me by a hosting provider that occasionally sends out spam from its subnet (aka any hosting provider that doesn’t block smtp). I have also tried self-hosting a different mail server from a variety of residential IPs in different cities and countries, and ran into the same problem.

marmaduke · 2024-10-10T05:18:01 1728537481

It’s a joke ! You can run an email server off your phone

squarefoot · 2024-10-10T09:37:11 1728553031

Not sure if mobile carriers would allow the required ports to be routed, and the connection is usually behind CGNAT, so you can't accept connections from the outside to receive emails. Many home ISPs however can give you a (mostly) unfiltered public IP that once paired with a dynamic DNS service can be reached from the outside. Once the network part is solved, a small cheap box (*Pi like board, mini PC, etc) can be set up to act as mail server, with firewall rules on the router that don't expose anything else to the outside.

marmaduke · 2024-10-12T08:53:31 1728723211

I meant just in terms of compute power. Like my isp gives me a static IP with forward and reverse dns, and the box lets me put the phone WiFi ip address in the DMZ so all traffic is handled by the phone. Then the termux app lets me run sshd and other stuff.

And actually I think this is a kind of setup people could get into: an Android dist that focuses on self hosting off an older device.

dgellow · 2024-10-10T09:09:53 1728551393

Satire

biztos · 2024-10-11T05:07:09 1728623229

Hold on.

Where are you finding $100 avocado toast?

JCharante · 2024-10-10T07:44:05 1728546245

I have an even easier approach:

- have an iphone/mac w/ icloud+

- go into settings

- add custom email

- get redirected to login to cloudflare

- buy/pick a domain for $12

- icloud+ automatically sets up the MX records on the domain via cloudflare

- enable catch-all emails in icloud settings

- Done!

Takes about 10 minutes & icloud provides the email hosting without any additional fees

useless_foghorn · 2024-10-10T17:21:07 1728580867

I use Bitwarden coupled with AnonAddy (0) for simple and free on demand email alias generation.

0. https://bitwarden.com/help/generator/#username-types

echoangle · 2024-10-10T04:41:18 1728535278

Some providers allow you to use Alias emails (I think google redirects mail to [email protected] to [email protected]), and if you use your own domain, you can just use a catchall redirect and enter a random address ([email protected] which goes to [email protected]).

beAbU · 2024-10-10T07:02:13 1728543733

1/ Buy a domain of your choice 2/ Register an account on Migadu.com and pay them $20/year 3/ Configure your domain nameserver with the settings provided by Migadu 4/ Done.

meindnoch · 2024-10-10T10:27:28 1728556048

1. Register domain on Cloudflare

2. Configure a catch-all forwarding address to your private GMail

Done.

drsim · 2024-10-10T04:43:45 1728535425

Many providers support plus addresses like [email protected]. Servicename can be anything and doesn’t require any setup.

duggan · 2024-10-10T06:56:46 1728543406

The +, however is just a comment delimiter.

All a service provider or malicious actor has to do is simply not include it when storing or publishing it to evade tracking.

Stripping it is not uncommon for services to prevent duplicate accounts.
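To illustrate how little effort the stripping takes, here is a minimal sketch of the kind of normalization a service (or someone cleaning a leaked list) might apply. The function name is made up, and real providers differ in which delimiters they honor.

```python
def strip_plus_tag(address: str) -> str:
    """Drop a '+tag' suffix from the local part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    base = local.split("+", 1)[0]  # keep everything before the first '+'
    return f"{base}@{domain.lower()}"


signups = ["alice+shop@example.com", "alice+forum@example.com", "alice@example.com"]
# All three collapse to the same mailbox, so the tags no longer identify the leaker.
print({strip_plus_tag(address) for address in signups})
```

A catch-all domain with a distinct local part per service does not have this weakness, because there is no common base address to strip back to.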

buildsjets · 2024-10-10T19:06:53 1728587213

Register an account on spamgourmet.com, move on with life.

LtdJorge · 2024-10-11T08:09:52 1728634192

Purelymail allows it

ranger_danger · 2024-10-09T23:12:48 1728515568

How do they get a hold of all these leaks so fast?

Aachen · 2024-10-09T23:16:10 1728515770

Voluntary sharing, since afaik they don't pay the criminals to get the data. Either the criminals share it directly (fat chance, usually), or someone else bought it and shared it either publicly, privately with HIBP, or privately with someone who then reported it to HIBP

How this specific instance unfolded, time will have to tell. The leak may have occurred in 2020 for all we know at this point

steffanA · 2024-10-09T23:34:23 1728516863

There is a strange dynamic between the threat actors who conduct these breaches and researchers.

When not used for extortion and for "status" in the hacking community, they share them with researchers (commonly HIBP) to warn people about a site's security and so that site is forced to fix things.

Definitely a strange dynamic.

lazide · 2024-10-10T03:16:48 1728530208

A form of ‘counting coup’ I imagine. [https://en.m.wikipedia.org/wiki/Counting_coup]

crtasm · 2024-10-10T00:02:49 1728518569

"Breach date: 28 September 2024" - I'm assuming they've checked with some recent signups to confirm the timeframe.

https://haveibeenpwned.com/PwnedWebsites#InternetArchive

maltris · 2024-10-10T12:12:43 1728562363

My question is: How did Scott Helme end up with a password hash that features his own name?

jgrahamc · 2024-10-10T13:05:21 1728565521

He didn't. If you break down that field you see:

    $2a$
    10$
    Bho2e2ptPnFRJyJKIn5Bie
    hIDiEwhjfMZFVRM9fRCarKXkemA3Pxu
    ScottHelme

2a = bcrypt, 10 = 2^10 rounds, Bho2e2ptPnFRJyJKIn5Bie is the 22 character salt, hIDiEwhjfMZFVRM9fRCarKXkemA3Pxu is the 31 character hash value, and then there's ScottHelme. Best guess is that the archive.org folks just appended the user name to the stored hash. Maybe once upon a time they didn't have a username column in their table and this was a creative way of adding it.
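The same breakdown can be done mechanically. This is a small sketch that relies only on the fixed layout of a bcrypt string (version, cost, then a 22-character salt followed by a 31-character hash) and treats anything after that as trailing data; the class and function names are my own.

```python
from typing import NamedTuple

SALT_LEN = 22    # characters of bcrypt-base64 salt
DIGEST_LEN = 31  # characters of bcrypt-base64 hash


class BcryptField(NamedTuple):
    version: str
    cost: int
    salt: str
    digest: str
    trailing: str  # anything stored after the standard hash


def parse_bcrypt_field(field: str) -> BcryptField:
    """Split a '$version$cost$salt+hash[+extra]' string into its parts."""
    parts = field.split("$", 3)
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("not a bcrypt-style field")
    _, version, cost, rest = parts
    if len(rest) < SALT_LEN + DIGEST_LEN:
        raise ValueError("salt and hash are too short")
    return BcryptField(
        version=version,
        cost=int(cost),
        salt=rest[:SALT_LEN],
        digest=rest[SALT_LEN:SALT_LEN + DIGEST_LEN],
        trailing=rest[SALT_LEN + DIGEST_LEN:],
    )


parsed = parse_bcrypt_field(
    "$2a$10$Bho2e2ptPnFRJyJKIn5BiehIDiEwhjfMZFVRM9fRCarKXkemA3PxuScottHelme"
)
print(parsed)
print("rounds:", 2 ** parsed.cost)
```

Everything past the 53rd character of the last segment is outside the bcrypt format, which is why the name reads as appended data rather than part of the hash.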

Funes- · 2024-10-09T23:12:56 1728515576

Friendly reminder to generate a unique password for every account you create so database leaks like this one don't bother you (besides on the site they're used).

AStonesThrow · 2024-10-09T23:40:03 1728517203

https://xkcd.com/2176/

paulnpace · 2024-10-10T12:27:34 1728563254

I think pretty much the same argument applies to old-world POTS. While nothing was encrypted, nothing was recorded and someone had to physically access the local copper, which in reality provided more privacy than the future (today) where everything is recorded forever and you can bribe, extort, hack, blackmail, or just for fun leak everything recorded.

voiper1 · 2024-10-10T03:29:08 1728530948

I hadn't seen that one, I love it!

JohnMakin · 2024-10-09T23:18:34 1728515914

account42 · 2024-10-10T10:47:23 1728557243

... is not something you should rely on.

JohnMakin · 2024-10-10T16:30:37 1728577837

… but something you should do anyway.

Having unique passwords isn’t something you should rely on either. Good MFA practices limit the impact of breaches like this. It isn't an either/or thing, do both.

haha112 · 2024-10-10T06:06:30 1728540390

I use login with google, idk if it is safe

ewenjo · 2024-10-09T20:54:26 1728507266

Just noticed the site now alerts this:

> Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!

mewpmewp2 · 2024-10-09T21:52:19 1728510739

Joke's on them... I'm already on HIBP countless times...

jsheard · 2024-10-09T21:55:30 1728510930

It's all good, as long as you're not in that recent AI Girlfriend breach which exposed a ton of users who were trying to coax it into generating CSAM images.

https://x.com/troyhunt/status/1843788319785939422

mrkramer · 2024-10-09T22:07:58 1728511678

“I went to the site to jerk off (to an adult scenario, to be clear) and noticed that it looked like it [the Muah.ai website] was put together pretty poorly,” the hacker told 404 Media. “It's basically a handful of open-source projects duct-taped together. I started poking around and found some vulnerabilities relatively quickly. At the start it was mostly just curiosity but I decided to contact you once I saw what was in the database.”

What a nice guy.

rpmisms · 2024-10-10T03:10:40 1728529840

True penetration testing.

account42 · 2024-10-10T10:58:25 1728557905

Well, only success with one kind.

throwaway73583 · 2024-10-10T02:39:21 1728527961

Not sure if you're being sarcastic or not, but pentesting is not a particularly evil activity — and you often have to look at data to see if you actually found something.

What is evil is the way that he's ensured that the predators in the dataset will never face any consequences by making the data available to HaveIBeenPwned, making it trivial for predators to protect themselves (the method through which this is possible intentionally left as an exercise for the reader), and making the data available to a news website for...some reason, but it's bound to ensure that the vulnerability will be patched out quickly and no one else will be able to access the data.

I find it much more likely that this hacker who sought out a website for uncensored AI erotica isn't actually a good guy, and might even have something to hide within the dataset. Hopefully, I'm wrong and we'll see more of this.

lazide · 2024-10-10T03:19:24 1728530364

How would that protect predators?

urbandw311er · 2024-10-10T10:46:38 1728557198

Did you miss the joke? Parent poster means penetration as in penetrative sex

to-too-two · 2024-10-10T03:06:02 1728529562

I'm also on HIBP over 10x. What are we supposed to do? Create a new email address for every service we sign up for?

I don't know what the best practice is for keeping our personal data safe anymore.

perching_aix · 2024-10-10T07:25:20 1728545120

> Create a new email address for every service we sign up for?

Exactly that, yes! Various services like icloud or proton offer "hide-my-email" addresses, or you can use any email service and just leverage a dedicated email aliasing service like SimpleLogin (paid but cheaper).

This way your email addresses are always random, and since these are shared services, the fact that it's random doesn't identify you either. In proton's / simplelogin's case, you can even set the display name used and email first, so from the outside it's not going to appear as strange, or have any real limitations.

If you think about it, modern email services don't really allow for easily testing if an email address is valid or not, so pretty much the only way your email is ever found out is if you share it on. So never share it on. Always share an alias instead. With automated systems, you may even want to rotate it every so often, so that if there's a leak, you can identify not just who leaked, but also roughly when.

Fixed identifiers, like an email address, are terrible, as their lifetime is always significantly longer than whatever context they're being used in.

BobbyTables2 · 2024-10-10T03:29:28 1728530968

Using unique email addresses makes phishing attempts extremely obvious…

(No, this official looking email from my bank is fake since it was sent to [email protected] …)

wiredfool · 2024-10-10T08:04:26 1728547466

I get a ton of "This is your email administrator -- your email password needs to be reset" to github@mydomain

account42 · 2024-10-10T11:00:29 1728558029

Hey at least after they fill your account up with spam they also send you warnings that you are running out of space.

jmb99 · 2024-10-10T07:14:52 1728544492

Truly unique email addresses and passwords per service is the strongest approach, but there may be alternatives. For instance, Gmail allows [email protected], which will save you from the lowest hanging fruit (block the +tag when it’s compromised to prevent the laziest spam from reaching you). iCloud also allows automatically generating a new email address that forwards to your inbox for a new account when using iCloud Keychain (possibly when using other password managers too, but I haven’t tried).

DoctorDabadedoo · 2024-10-10T13:17:05 1728566225

Gmail's +tag (and the .) is nice in theory, but terrible in practice. It's super easy for malicious actors to just drop them, and there are a few services out there that simply are not able to work with the +tag, potentially getting you locked out of your own account. Not Gmail's fault, but I would recommend against using it.
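For a sense of how little effort "just drop them" takes, here's a rough sketch of what a spammer's cleanup step could look like. The function name `strip_gmail_tricks` is made up for illustration, and it only relies on the documented Gmail behaviour that dots in the local part and anything after a `+` are ignored.

```python
def strip_gmail_tricks(address: str) -> str:
    """Collapse a Gmail address to its canonical mailbox.

    Addresses on other domains are returned unchanged, since dots and
    +tags may be significant there.
    """
    local, _, domain = address.rpartition("@")
    if domain.lower() not in {"gmail.com", "googlemail.com"}:
        return address
    # Drop everything from the first "+" onward, then remove the dots.
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local.lower()}@{domain.lower()}"
```

With this, `strip_gmail_tricks("Jane.Doe+shop@gmail.com")` gives `janedoe@gmail.com`, so the tag stops telling you anything as soon as someone bothers to normalize a leaked list.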

varenc · 2024-10-10T06:33:19 1728541999

> Create a new email address for every service we sign up for?

Yes! Just get a domain and have every email sent to it go to you. Mine is something like “@super-secure-no-viruses.email”

account42 · 2024-10-10T11:06:20 1728558380

There are probably people that would sign up for such a mail. Like urlify.io and other similar URL "shorteners".

megous · 2024-10-10T18:21:11 1728584471

Yep. ~300 addresses on my domain, 0 breaches across all of them on HIBP domain search over >6 years.

I guess internet security is not as bad these days. :)

lazide · 2024-10-10T03:18:39 1728530319

Password manager + unique password per site + 2FA for anything of value.

nxobject · 2024-10-10T16:39:02 1728578342

And my SSN's probably available for purchase with 9 types of crypto, too.

mendym · 2024-10-09T21:08:19 1728508099

I assume that if this is a bad actor, then account email/name will be leaked?

uticus · 2024-10-09T20:59:04 1728507544

Is it a genuine alert, or hacking artifact?

Sometimes with friendly / attempt-at-humorous error messages it’s difficult to tell

jrochkind1 · 2024-10-09T22:28:46 1728512926

I feel like it's safe to assume the official Internet Archive would not write a "friendly"/attempt-at-humorous/unprofessional/confusing/delivered-by-popup message advertising a devastating security breach. Oh, and also while announcing it nowhere else.

Obviously an attacker's ability to insert a message does imply a breach beyond a DoS. But I am pretty confident that message was not from the IA.

n_i_k_h_i_l · 2024-10-09T21:02:53 1728507773

It's a literal window.alert()

PLenz · 2024-10-09T21:45:32 1728510332

But was that code placed there by IA or by the malicious party?

abracadaniel · 2024-10-09T21:50:19 1728510619

Verge reports someone has taken credit for an ongoing DDOS against IA. "An account on X called SN_Blackmeta said it was behind the attack and implied that another attack was planned for tomorrow" https://www.theverge.com/2024/10/9/24266419/internet-archive...

dang · 2024-10-09T21:52:22 1728510742

Ok, let's switch to that link. Thanks!

Submitted URL was https://archive.org/.

silexia · 2024-10-10T11:26:30 1728559590

The Verge is generally clickbait; another site would have been a better choice.

dang · 2024-10-10T19:26:00 1728588360

That class of sites generally is, yes. But on HN we go by article quality, not site quality (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...) and I didn't see a better specific article on this. If there is a better one, we can change the link again.

varun_ch · 2024-10-10T08:15:06 1728548106

This bad actor has videos of them supposedly “ddosing” Spotify by pinging 1.1.1.1 in two terminal windows on their Twitter.

Is there any link between them and the real attack or are they just unrelated people claiming credit for it?

seanw444 · 2024-10-09T21:47:01 1728510421

Sounds snarky to me. I'll bet it was the malicious party.

whimsicalism · 2024-10-10T15:14:02 1728573242

it wouldn’t be a window.alert if it were IA

EKSolutions · 2024-10-09T21:10:03 1728508203

It looks like someone has compromised one of their subdomains for Polyfill

Update: Subdomain seems to be returning normal responses again now.

Aachen · 2024-10-09T21:14:45 1728508485

You mean the IA included some JS polyfill from a subdomain and that's what's compromised / where the alert is coming from?

mendym · 2024-10-09T21:17:00 1728508620

Yup.

https://news.ycombinator.com/item?id=41792651