Instapaper stores only salted SHA-1 hashes of passwords, so those are relatively safe.
--
Obligatory statement on NEVER USING SHA-1 HASHES to make passwords "safe".
Any normal person can brute force millions of SHA-1 hashes (salted however much you want) per second on a GPU.
If the FBI so wanted (although I don't believe they do), I'm sure they could brute-force almost every single password in that database. Granted, it's the government and they have better ways of obtaining such information, but if there is someone the FBI is watching in Instapaper's database, storing only SHA-1 hashes of the passwords has all but handed those passwords over to the FBI.
I am now glad my Instapaper password was generated randomly, 16 characters long, and I will now change it just to be safe.
For anyone running a database which stores usernames/passwords, take a look at bcrypt or scrypt. They're millions (no, I am not exaggerating) of times better than SHA-1.
This is more "crypto nerd imagination", a la the XKCD comic. The FBI doesn't care about the encrypted passwords because it has access to all the content in plaintext. And what else would they need the passwords for? Other accounts on other services? They can just confiscate those servers too, where the content is most likely also in plaintext.
So in this case, where the FBI is involved, using a SHA-1 hash poses no extra security vulnerability.
They can just confiscate those servers too, where the content is most likely also in plaintext.
I imagine that many companies are better prepared to deal with the FBI than this data center was. I have a hard time imagining the FBI going into a Google data center and easily walking out with a few racks. But even if that's too optimistic, I doubt the FBI could go about seizing servers for very long. If nothing else, this would eventually piss off big companies who will lobby Congress to curtail the FBI.
In this case, since the warrant probably didn't allow for the seizure of Instapaper's servers/data, you run a serious Fourth Amendment risk of any evidence within being inadmissible. That said, even if it is inadmissible, the FBI now knows things it might not have known before. There is the obvious point that there is very little likelihood of any direct evidence of a crime in Instapaper's data; there may be indirect or circumstantial evidence, though.
"So in this case, where the FBI is involved, using a SHA-1 hash poses no extra security vulnerability."
meeeh...
Remember, the FBI is not a person, it's an organization. The org can have bad actors in it who might be able to access the hashed passwords but not be able to confiscate servers.
also, confiscating a server(s) is much more visible / detectable...
Modern consumer video cards can do billions per second now. You might as well just store them in plaintext instead of using SHA1/MD5 with or without salting. :/
Secure password hashes don't protect users, and particularly not users who use one-time effectively-random passwords.
Secure password hashes protect application developers from the disclosure of hundreds or thousands of user passwords from their database. It allows them to attest to their userbase "your password is cryptographically stored in a manner that makes it hard to break even by dedicated hardware; you should consider changing your password if it's weak and shared", instead of, "expect to see your password on Pastebin any day now".
To clarify, I assume you mean that using secure password hashes instead of insecure ones does not help users who use one-time effectively-random passwords, because those users are already safe?
That is true.
However, it seems to me that the combination of an effectively-random password and password hashing does protect users, because their password is not effectively crackable in a situation like this. Additionally, there's a tradeoff between how secure your password hashing is and how much randomness users need to put into their password: every additional factor of 1000 in the iterations of the hash saves you a random character or two.
I wish everyone could use complex, unique, strong passwords all the time, but some use cases just don't support it. For example, I have to type my Apple ID into my iPhone/iPad what seems like every 5 minutes in iOS. Without access to 1password or a similar tool, I just can't use a strong password. Even if I did, I couldn't change it as often as I'd like to. FWIW, I wish I could.
My point was that the choice is sometimes not left in the user's direct control. If I thought I could choose an absurdly strong password (e.g., to overcome the shortcomings of the developer's choice of SHA1), I would always do that, except when I'm going to need to enter that password from memory a bunch of times per day.
You're looking at more like 5+ billion/sec today. There was a listing of modern consumer video cards + their hashing capabilities posted on another HN story recently, but I can't find it.
"Hashes" aren't interchangeable, they're an abstract concept. In this case, the thread is talking about the SHA-1 hash algorithm. AFAIK Bitcoin uses SHA-2 w/256-bit digests.
So, actually, in this case the two happen to be related but different - SHA-1 being considered potentially flawed but not (yet) the stronger SHA-2.
Also, the Bitcoin block headers that are hashed are (I think) 80 bytes (640 bits) long, salted passwords probably a bit shorter.
Wikipedia says SHA-256 runs at about 2/3 the raw throughput of SHA-1. You can check the comparative rate yourself on any *nix computer:
$ openssl speed sha1 sha256
-snip-
Doing sha1 for 3s on 16 size blocks: 7760096 sha1's in 2.99s
Doing sha1 for 3s on 64 size blocks: 5502820 sha1's in 3.00s
-snip-
Doing sha256 for 3s on 16 size blocks: 5460366 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 3169031 sha256's in 3.00s
-snip-
So... SHA-256 runs at about 2/3 the speed, as expected. I don't know enough about crypto implementation, but I'd guess this ratio would scale roughly to the much faster GPU implementations as well.
Yes, I have 2 6950s, which together average to about 660MH/s on hashkill mining bitcoin, but when I tried MD5, IIRC they averaged to about 3200MH/s. I'd assume SHA1 to be slightly slower, but not much.
If my understanding is correct that's not the issue here. Hashes are meant to be one-way functions, the developer can easily check if a user's password matches the hash, but it should be practically impossible to deduce the password from the hash.
What the user chose as their password should be irrelevant if using a good hash.
[Edit: I stand corrected on the effect of password length.]
SHA-1 is a reasonably good hashing algorithm, but for the sake of argument, I'll talk about an imaginary SHA-4 which is perfect in every respect. It will be a 4096 bit hash function which has no faster-than-bruteforce collisions or preimage attacks or second preimage attacks.
Let's also assume that this perfect SHA-4 function is freakishly fast, say, a million times faster than SHA-1.
Now, even though my imaginary SHA-4 function is perfect in every way, it would be strictly worse to use this for password hashing than SHA-1. Why? Because cryptographic attacks aren't the problem here. The problem is that the entropy of a user's password is very VERY small. So small, in fact, that attacks on passwords aren't done through cryptographic weaknesses, they are done by simply hashing everything someone might pick as a password and asking "did I get it right?". An attacker will repeat this process for a little while, and eventually they'll get the answer "YES, this user chose to make abc123 as their password!".
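As a concrete sketch of that brute-force loop, here's a minimal Python version; the salt, target password, and wordlist are all invented for the example:

```python
import hashlib

# Invented example record: per-user salt plus SHA-1(salt + password)
salt = "s9f2"
leaked_hash = hashlib.sha1((salt + "abc123").encode()).hexdigest()

# Candidate passwords; real attacks use wordlists with millions of entries
wordlist = ["password", "letmein", "qwerty", "abc123", "monkey"]

def crack(salt, target, candidates):
    # Hash every guess with the known salt and ask "did I get it right?"
    for guess in candidates:
        if hashlib.sha1((salt + guess).encode()).hexdigest() == target:
            return guess
    return None

print(crack(salt, leaked_hash, wordlist))  # prints "abc123"
```

A GPU runs this same loop billions of times per second; the salt only defeats precomputation, it does nothing to slow the loop down.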
(Edit: see child comment -- I was responding to something other than what was intended. I'm leaving this here for clarity, but you can ignore it.)
No, not really. Hashing functions aren't designed for passwords; they're mainly used for integrity checks and other uses which need to be fast. Why do you think one of the axes the SHA-3 candidates are competing on is speed?
You have your 10gb file and want to send it to your coworker and let him know it's really yours and no one has messed with it. So you run an HMAC over it and then sign it with your private key.
You want it to be as fast as possible. It would be optimal if there was a single x86 instruction called sha4 which did this in the time it takes to do an add.
Hashing is really, really not meant for passwords.
Woah, slow down, I think you whipped up a 4 paragraph reply before you ever got to my last sentence.
Or, go on and tell me more about all the things hashes are used for as if I just fell off the turnip truck.
This discussion is not about checksums on files. It's about passwords. And your "perfect hash" in your example about passwords is "freakishly fast." In fact, like the other guy that replied to me mentioned, this is the entire point of the work factor in bcrypt, right?
Of course, bcrypt is really, really not meant for checksums. Good thing nobody was talking about checksums, then.
When you wrote "this fictional hash" I read that as talking about the SHA-4 I made up, not the one you did.
I then didn't respond to the rest of the post because when you said "for password hashing, it's a good start" I again assumed you were talking about my hash function, which is not good for password hashing. Yours would be perfectly fine.
Apology accepted, and thank you for adding a lot to the discussion further down the page (the interesting maths related to the probability of collisions on a hashed-hash)
Sure, but if you can run through billions of possible hashes per second then it's possible to brute force the "one way" hash by exploring hashes for short passwords.
1 billion passwords per second is 86 trillion passwords per day. If a hacker wanted to gain access to a particular password then it can become trivial to crack with fairly modest resources. The only thing that protects you is a sufficiently complex and long password.
But there are rainbow tables, or tables of all of the MD5s/SHA1s/<insert favorite hash algorithm> for arbitrary strings. So the time's already sunk in.
8 days for one password is a very short amount of time comparatively (tiny for a botnet). If you use bcrypt, which you can force a certain complexity on, you can get that amount of time up much higher.
As they have the Web code base, you must assume that they have the salt to the hashing... If they actually want to get these passwords, all they have to do is generate rainbow tables using SHA1 and the appropriate salt. We're back to relying on the length and bit depth (range of characters) of the passwords you are trying to find.
but I'm assuming that if they have the code base, the plaintext usernames (email addresses), and the salted passwords, then they would have whatever the per-user salt is.
Right, but if the salt varies per user, then you end up doing a bruteforce on each user's password; it's no longer a precomputation attack. There are no "Rainbow tables" in this case.
However, if you find Hale's bcrypt page (http://codahale.com/how-to-safely-store-a-password/) convincing, and I do, salting really doesn't matter because with modern GPUs you can bruteforce a reasonably-sized alphanumeric password, if the hash algorithm is a general-purpose (read: fast) one.
The solution is not salt, the solution is to use a purposely slow hash function.
I have been thinking about switching everything to bcrypt, but there is definitely way too much confusion about bcrypt vs scrypt, how many rounds to set for bcrypt, etc. What is the definitive source for figuring out what the new standard should be? Does anyone have any links to something that's peer-reviewed and approved for use by someone with enough authority to do so?
No there isn't. You only think that because when geeks discuss anything that involves one or more knobs, a huge debate must necessarily ensue about the proper values of those knobs.
Just use the bcrypt defaults. You will be fine. You will in particular be so much better off than salted SHA-1 that this topic will be mooted. Later on, maybe in 5-10 years, you can re-engage with the debate about what a good cost factor for bcrypt will be in 2020.
In the defense of geeks, this probably follows from the mantra that you should never ever run any command on your system ever without completely and fully understanding what it does and all of its options and etc. etc.
So now people are twitchy about "just use the defaults", especially when it comes to something they don't really understand, like cryptography.
What's a ready to go bcrypt library for C/C++? I mean include headers, link lib / so, and call a function.
I've been looking into this over the past few days, and I've decided to just extract the relevant files from py-bcrypt, and get rid of the compatibility layer.
Thanks. I am just trying to navigate the sea of misinformation that spews forth everywhere about salted hashes vs bcrypt vs scrypt. I can see that lots of people claim that bcrypt is better, but I am not aware of anything about it other than the original paper. Basically, I want to know what the chances are that two months after I implement bcrypt a huge issue with it will be discovered and I'll have to move everything to some new scheme.
There is virtually no chance that, after selecting bcrypt, you will be forced to scramble to replace it in 2 months.
There is no chance that, after selecting bcrypt, you will be forced to scramble to replace it with salted SHA-1 hashes. bcrypt is strictly better than what you're doing now.
Fantastic. One more question: does increasing the work factor automagically upgrade existing passwords in some way? As in, will bcrypt passwords created today be strong enough in 2020?
getsat and tptacek have already answered your question, so I won't rehash that (pun wholly intended), but I should point out that one interesting property of PBKDF2 is that you can increment the work factor (number of iterations).
Thus you could, say, increase the number of iterations every month. All that said, you should still use bcrypt; this is just an interesting property IMO.
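The incrementing idea is easiest to see with plain chained hashing, sketched below (note: this sketch uses a simple hash-of-hash loop where the property clearly holds; PBKDF2 as actually specified keys every round with the password, so extending a stored PBKDF2 hash is less straightforward):

```python
import hashlib

def iterate(data: bytes, rounds: int) -> bytes:
    # Simple stretching: hash the hash `rounds` times
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data

# Stored hash representing 200 rounds of work
stored = iterate(b"salt" + b"hunter2", 200)

# Later, bump the work factor by 100 rounds without knowing the password
upgraded = iterate(stored, 100)

# The result matches what a fresh 300-round hash of the password would be
assert upgraded == iterate(b"salt" + b"hunter2", 300)
```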
That does introduce a security concern, though. While it might be hard in practice, if you have a copy of a hashed password iterated 200 times, and then a copy of the same hashed password iterated 300 times, and you have cracked the 200-iteration hash, you can verify the other hash is the same by applying 100 more iterations. To solve this you would want to change the salt whenever you increase the iteration count, which involves doing all the iterations again, and then you are no better off than using a non-incrementing solution like bcrypt. The one situation where you wouldn't be able to make a new salt is if the user hasn't logged in for a while (which is quite possible for single-use accounts on websites).
I think I'm right in choosing bcrypt, but one interesting argument against it was that, being slower, it would facilitate DoS attacks. You want the password hashing to be slow to prevent brute-forcing, but, if it's too slow, attackers could supposedly DoS your login system by trying tons of passwords.
I'm not a security expert, and I didn't know what to respond to that. How would one mitigate this problem? Is it even really a problem?
Just like you don't want over 60 requests per second from a client, you don't want to allow that many login attempts. Look at Gmail's failed-login process: requiring a captcha after 3 failed attempts is preferable to the "you can't log in after this many attempts" lockout I remember getting on a forum.
One of the fundamental ideas of cryptology is using the right algorithm for the type of data and the length of its required security. If the cost required to break an algorithm is greater than the value of the encrypted data, you're probably safe.
You can always re-store users' passwords with updated crypto (more iterations, a different digest algorithm) when they next log in. It's never a question of if it will be broken, but when.
Choosing iterations for a PBKDF takes a bit of common sense: if you're going to roflscale and think 100,000,000 iterations is currently a good idea, you may run into performance issues.
The correct balance is between performance and security: you want to authenticate the user as fast as possible while also making it unfeasible to recover the data. As with everything, a little common sense and knowledge goes a long way.
scrypt is better than bcrypt, but not by the same margin that bcrypt is better than salted hashes. Salted hashes are a straight-up vulnerability. bcrypt is a best practice.
Note that almost nobody uses scrypt. We don't recommend it, not because it's insecure, but because it's painful to implement for most companies.
But use either. Or just use PBKDF2. All of the adaptive hashes are fine.
I can't count the arguments I've heard centered around what is The One True Way to store passwords... this topic turns every programmer on the planet into an instant Crypto Expert (TM).
STFU and use one. Hell, glibc's crypt() lets you pick any of three computationally expensive schemes, so use one of those.
I find this a bit of a misnomer. I understand what you mean in context, of course, but, strictly speaking, bcrypt is a "hash", and "salted" is always good.
I'm just wondering: What if I used SHA-1 a million times on the password, i.e. hashing the hash over and over. Wouldn't that make it much more time-consuming for an attacker? Or am I missing something? The input every time but the first would be a random-looking 160 bit number, so it would be hard to guess. And if the attacker wanna look for common passwords in a dictionary the attacker must hash them a million times, no?
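That's essentially hash stretching; a quick sketch of the idea in Python (toy parameters, just to show the cost per guess):

```python
import hashlib
import time

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    # Hash the hash over and over; an attacker must pay the same cost per guess
    h = salt + password
    for _ in range(rounds):
        h = hashlib.sha1(h).digest()
    return h

start = time.perf_counter()
stretch(b"hunter2", b"salt", 1_000_000)
print(f"one guess now costs ~{time.perf_counter() - start:.2f}s on this machine")
```

Each dictionary word now costs a million hashes instead of one, which is exactly the effect you're describing.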
You usually add a salt: an additional string which is stored in the clear, but which makes your local instance globally unique, so the attacker can't precompute value-to-hash mappings ("rainbow tables", which are faster to make if you have alien technology, from what I've heard) for all sites.
Bcrypt typically generates and stores the salt with the rest of the hash, all on its own, which reduces the chance for developer error. It's idiot-proof basically.
Of course, the other key thing is to avoid giving attackers offline access to the hash database if possible. Even with scrypt, if you let someone try offline, he will get good results on 100 password attempts per account. Users are often using such weak passwords that being only an online oracle and able to shut down after a number of tries on a password, or at least to do app level rate limiting, is still useful.
So if I'm using SHA-1 already to store passwords, what are my options for moving to a different system? I assume there's no way to rehash the passwords?
1. The next time a user logs in to your system and you verify against the SHA-1 hash that they are who they say they are, recompute the correct hash for bcrypt. Then, delete the SHA-1 hash. It does you no good to have a bcrypt version if you keep the SHA-1 version around.
2. Generate the bcrypt hash from the SHA-1 hash. That is, pretend that the SHA-1 hash is the user's password. This isn't as clean (your password authentication software will then have to do SHA-1 followed by bcrypt) but it means you'll be able to migrate your entire database all at once if you so choose. This also causes a very (very, very) slightly higher chance of password collisions, although there's not much to worry about from that.
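A stdlib-only sketch of option 2, with PBKDF2 standing in for bcrypt; the function names and salt handling are invented for illustration:

```python
import hashlib
import os

def upgrade(legacy_sha1_hex: str):
    # Treat the old SHA-1 digest as the "password" for the new, slow hash
    new_salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", legacy_sha1_hex.encode(), new_salt, 100_000)
    return new_salt, dk

def verify(password: str, old_salt: str, new_salt: bytes, dk: bytes) -> bool:
    # Authentication now does SHA-1 first, then the slow hash, as described above
    legacy = hashlib.sha1((old_salt + password).encode()).hexdigest()
    return hashlib.pbkdf2_hmac("sha256", legacy.encode(), new_salt, 100_000) == dk

old_salt = "s1"  # per-user salt from the legacy scheme
legacy = hashlib.sha1((old_salt + "abc123").encode()).hexdigest()
new_salt, dk = upgrade(legacy)
assert verify("abc123", old_salt, new_salt, dk)
```

The whole table can be migrated in one pass this way, with no user logins required.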
Or you can do both, migrate everything to bcrypted SHA for now and then replace these with straight up bcrypt the next time the user logs in.
You ostensibly have a hash method identifier per hash, so just create "sha+bc" and "bc" along with your current "sha".
Also, why is the risk of a collision greater? It seems to me that, if anything, it should be lower, as SHA hashes always consist of a fixed number of bits, and thus aren't as likely to collide when hashed to bcrypt, assuming the length of a bcrypt hash is the same (or larger) number than a SHA one.
Basically, it seems to me that, if they're going to collide, they're more likely to collide at the SHA level, which is a problem either way.
Imagine you have two hash functions F and G, both mapping from the ___domain of integers to integers mod 2^128. Imagine they are perfect, in that if you hash all the integers up to some large N, each hash value is expected to occur exactly the same number of times (probabilistically).
Now clearly if I hash a password F(P) and another password F(Q) there is a 1 in 2^128 chance they collide.
Now imagine we do G(F(P)) and G(F(Q)). We first have the chance of 1 in 2^128 that F(P) == F(Q) which implies G(F(P)) == G(F(Q)). However, that is not all!
We now have a new 1 in 2^128 chance that G(F(P)) == G(F(Q)). So we have (about) a 1 in 2^127 chance that the passwords will collide.
But, none of this really matters. Collisions aren't what you worry about with password hashes.
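The effect is easy to see empirically with toy 16-bit hashes, using truncated SHA-256 and MD5 to stand in for F and G:

```python
import hashlib

def f(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()[:2]  # toy 16-bit "F"

def g(x: bytes) -> bytes:
    return hashlib.md5(x).digest()[:2]     # toy 16-bit "G"

inputs = [str(i).encode() for i in range(20000)]
f_coll = len(inputs) - len({f(x) for x in inputs})
h_coll = len(inputs) - len({g(f(x)) for x in inputs})

# Every F-collision survives composition, and G can only add more on top
assert h_coll >= f_coll
print(f_coll, h_coll)
```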
I see what you mean, and I agree that it doesn't matter, but it's an interesting exercise anyway.
I disagree that there's a 1 in 2^128 chance that they will collide. Trivially, I can show you a hash that will never collide for up to some N, and that is F(P) = P mod 2^128. This will never collide unless P is more than 128 bits long.
My rationale, above, was that SHA constrains the space to 128 bits. Therefore, for different SHA hashes (ones that haven't already collided), the probability that bcrypt will collide might be smaller (or zero, as in my example above).
In reality it doesn't work like that, I know, but in theory you can't be sure that the probabilities of collision will add (or, well, multiply) up.
Point taken, it's probably true that the probability that G(F(P)) == G(F(Q)) given F(P) != F(Q) is less than 1 in 2^128. But it's probably also true that it's greater than 0.
Clearly it's impossible to be less than zero. So no matter what you do, Defining H(X) to be G(F(X)) will have strictly more collisions than F(X).
The reason I would argue it's greater than zero is that if a function H existed such that H(X) will never collide for X less than 2^128, it would probably have some cryptographic weakness.
Hmm, you're right indeed, since they're additive. I should have said that the second layer is less likely to collide than the first, not than both combined.
I've had the "joy" of doing password migrations in the past when moving authentication systems. We made our auth chain support both methods and then when someone logged in with an old method it stored the password in the new method and deleted the old record. This was on a system with only a few hundred users though, so YMMV.
Convert as sessions expire and people sign in again. You'll obviously need a db column to keep track of which passwords are converted/unconverted, e.g. password_algo.
I currently use a SHA hash (with salt) but rehash it x times. I have increased x over the years to keep an acceptable trade-off in computation time.
Why is bcrypt much better than this? Is it because the algorithm is less gpu friendly?
What you describe is basically PBKDF1. If you wanted to make it slightly better, you could go with PBKDF2. It's true that bcrypt is better in some ways, but you're fine with what you're doing now. If you really wanted to improve on things you could go with scrypt which eats memory also, but it's more difficult to get things to work right.
That was my first thought too. Not only does the FBI have the salted hashes, but they also have a copy of the code for the website. So they know what the salt values are. This makes it even easier to brute force the hashes.
You've been downmodded because password hash salts are public nonce values; usually, schemes that depend on "secret salts" are crackpot alternatives to secure password hashes.
thomas, another programmer in the office just asked this question which I haven't a good answer for: Instead of using a known salt stored in, say, a config file, or prepended to the stored hash, why not derive the salt from some substr of the supplied password? His example was, salt would be concat(left3chars, right3chars). And so when the user inputs his password into the system, you just derive the salt using that same consistent algorithm, and supply both that derived salt and the password into the algorithm.
My only answer was "It's always a bad thing to be clever with crypto, just do it by the book" but he asked for more and I couldn't give him a sound debunking (or an authoritative endorsement).
I build systems -- and do it very well -- and all I know about crypto is what I've had to learn to implement other people's crypto systems.
The purpose of a salt is to randomize the password hashes so you can't easily precompute them. A "salt" derived from the password itself isn't random; it's deterministic. Salts don't add much security, but they do defeat precomputation. The scheme your coworker proposed doesn't do that.
I'm not going to lie and say I was already thinking that, but I did have a notion that, in such a scheme, if somebody's password was "1111111" then your salt + password would be the unimpressive "1111111111111".
But if you don't mind a follow-up, wouldn't it still defeat rainbow tables? Why not?
In your scheme, if your password is "apple scrapple", the hash value is always going to be (say) "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15". An attacker can precompute that and just use text search to find everyone with the password "apple scrapple".
They would still need a rainbow table suited to the algorithm that is used to create the salt, or one large enough to contain the password+salt value within it. It still means that the entire database can be attacked with the same rainbow table, however.
Generating the salt from the password completely defeats its purpose. Users with the same password will have the same salt, and therefore the same hash. In your example, an attacker could find which users chose the password "password" by running your hashing algorithm with a salt value of "pasord". Your database would be wide open to rainbow table attacks.
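A sketch of exactly that failure: with salt = concat(left3chars, right3chars), identical passwords always produce identical hashes, so one precomputed table breaks every account (function names here are invented):

```python
import hashlib

def derived_salt(pw: str) -> str:
    # The proposed scheme: first three + last three characters of the password
    return pw[:3] + pw[-3:]

def hash_pw(pw: str) -> str:
    return hashlib.sha1((derived_salt(pw) + pw).encode()).hexdigest()

# The salt is fully determined by the password, so an attacker can
# precompute a table once and reuse it against the whole database
table = {hash_pw(w): w for w in ["123456", "password", "letmein"]}

leaked = hash_pw("password")     # a hash pulled from the stolen database
print(table.get(leaked))         # prints "password"
print(derived_salt("password"))  # prints "pasord"
```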
Right, so the idea is that if they get your DB dump and see this salt scheme in your code, they can compute a rainbow table using it, and now they've cracked all your passwords in the time it would take to brute-force one (well, not really, because it's not as if they'd have to brute-force the entire keyspace before they got to the one password they're trying to break, but I think I'm onto the right idea about shrinking down the magnitude of the problem).
But what it WOULD do -- which is what, to be honest, tricked me about the concept -- is still offer protection from a precomputed rainbow table that knew nothing of your scheme to derive the salt from the password (e.g., the rainbow tables that are publicly searchable right now).
I might have read your question wrong, but the whole point of this thread is that passwords stored by hashing with a general purpose hashing algorithm can be easily brute-forced nowadays. Salting just turns your password into a different string and has no substantial effect on the brute force attack. (The attacker is already trying all possible strings.)
A variation on this is to use a random salt with each password that you store. Your auth process then becomes: 1. check the username; 2. if the user exists, prepend the salt stored in the user row to the supplied password and check against the stored hash.
At the very least, you will make it harder for someone to crack all of your passwords by computing one table with a single salt.
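The per-user random salt flow above, sketched in code (still SHA-1 here, per the comment; the thread's broader advice to use bcrypt stands):

```python
import hashlib
import os

users = {}  # username -> (salt, digest); stand-in for the user table

def register(username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt per stored password
    users[username] = (salt, hashlib.sha1(salt + password.encode()).digest())

def login(username: str, password: str) -> bool:
    if username not in users:       # 1. check the username
        return False
    salt, digest = users[username]  # 2. prepend the stored salt and compare
    return hashlib.sha1(salt + password.encode()).digest() == digest

register("alice", "abc123")
print(login("alice", "abc123"))  # True
print(login("alice", "wrong"))   # False
```

Because each account has its own salt, an attacker has to brute-force each user's password separately instead of running one precomputed table against the whole database.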
If you are using "salts", you have to make them random for each stored password. But if you are doing this securely, you don't care about this detail, because the bcrypt library took care of it for you.
I was thinking more along the lines that knowing how the nonce(s) were added to the password would make it easier to scan for password in a brute force manner. For example, if you know the hashed password would be in the form 'nonce:username:password', it would be easier to know when you found the correct password, regardless of what the nonce is.
(Be prepared for your comment score to visit the grey depths if you attempt to relitigate Coda's blog post here and don't know exactly what you're talking about.)
Why is bcrypt better than simply recursively hashing SHA512 ~2^11 times to produce an equivalent work factor?
Assume wall time is held constant at 1 second per password using both methods: is there an entropy loss or weakness associated with recursive hashing that bcrypt avoids?
What you are referring to is called "stretching". It's a lot better than a simple salted hash, but bcrypt would still be better.
I'm no crypto expert, but I think this is due to the way bcrypt was designed, and its use of a pessimized Blowfish cipher. SHA512 was designed for speed, which is the opposite of what you want in a password hashing scheme.
tptacek talks a little about this in this blog post:
"Bcrypt uses Blowfish instead of MD5. Blowfish is a block cipher with a notoriously expensive setup time. To optimize Blowfish to run much faster, you’d have to contribute a major advance to cryptography. We security practitioners are all “betting people”, and we usually like to place our bets on the side that “demands major advances in cryptography”."
Something doesn't make sense in that blog entry you cite. He says:
Now let's re-explain rainbow tables:
1. take a "dictionary" -- say, of all combinations of alphanumerics less than 15 characters
2. hash all of them
3. burn the results onto a DVD.
You now have several hundred billion hash values that you can reverse back to text -- a "rainbow table".
Alphanumeric usually means either 36 or 62 possible characters. Let's take 36. Then there are 36^14 possible 14 character alphanumeric passwords. (He said less than 15, so we should also consider 13 characters, 12 characters, and so on, so this is going to come out a little low since I'm just doing 14 exactly). That's 6.14 x 10^21 possible passwords.
If you could compute 10 billion hashes/second, that would take 20000 years. (41 million years if mixed case alphanumeric is allowed). Could anyone REALLY make a table covering all 14 character or less alphanumerics in 2007, and fit it on DVD?
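The arithmetic checks out, and you can redo it in a couple of lines:

```python
# 36^14 lowercase-alphanumeric passwords at 10 billion hashes/second
rate = 10_000_000_000
seconds_per_year = 3600 * 24 * 365

total = 36 ** 14
years = total / rate / seconds_per_year
print(f"{total:.2e} passwords, ~{years:,.0f} years")  # ~6.14e21, roughly 20000 years

mixed = 62 ** 14
print(f"mixed case: ~{mixed / rate / seconds_per_year / 1e6:.0f} million years")
```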
I believe there were tables for 14 character Windows passwords then, but due to poor design Windows passwords were in effect treated as two 7 character passwords. You just needed tables that covered the hashes of all 7 character passwords, which is a lot more tractable. Could that be what the author was thinking of?
I think Thomas was simplifying rainbow tables in his post, to make it more understandable. In practice, you wouldn't use all combination of alphanumerics. You would use a dictionary. This greatly restricts the search space.
It takes much longer to compute the hash of a given password, which essentially makes it as if everyone chose passwords with a couple extra bytes of entropy in them.
He probably isn't. I wrote a login system for an ecommerce and B2B site a while ago, and got heavily into the salting/hashing side of things back then. Based on that... I think that most of the people pooh-poohing salts in this thread don't know what they're talking about.
This security layer is the only code I've ever written that years later would still cause me to wake up in the middle of the night thinking "oh no! What if an attacker did X, Y and Z??!!"
Note: as far as I know, the security I put on it has never been broken. But it still caused nightmares even so.
No, not at all. Until a few months ago, Instapaper didn't require users to set a password -- you could (and originally were encouraged to) use it without a password at all.
This makes a lot of sense. If more sites storing non-critical data did this, we'd have far less password fatigue, and people would be more wary about what they trust to such sites. As it is, they see their "password1" as impenetrable security when they might as well have no password at all.
Not having a password has a similar psychological effect as showing the password field in plain text, I think.
Just store in plaintext, because I'm already assuming you are. All this talk about SHA-1 vs bcrypt vs scrypt is nice and all, but I have little faith that most companies care about this as much as HN does. I believe that most people are using the default password storage mechanism for their framework, which is already known to be easy to break if the database is compromised. But all of that is moot anyway. Unless you have access to the site's source, how would you know if they are hashing at all, much less which one they are using? The best practice is to use a random password for each site you use. I just don't see any point in having a memorable password for websites, and hashing just leaves a false sense of security, as illustrated by MD5.
> Just store in plaintext, because I'm already assuming you are.
No, actually, I don't think I will store plaintext passwords.
> All this talk about SHA-1 vs bcrypt vs scrypt is nice and all, but I have little faith that most companies care about this as much as HN does.
So what? Just because other people don't do it doesn't mean you don't have to also. Fortunately for us, there are a lot of startup founders here who might read this and learn something.
> I believe that most people are using the default password storage mechanism for their framework, which is already known to be easy to break if the database is compromised.
I disagree. I think most people use SHA-1 because they know better than to store plaintext passwords. What they don't know is that it's terribly broken.
> But all of that is moot anyway.
No, it's really not.
> Unless you have access to the site's source how would you know if they are hashing at all much less which one they are using?
There are two problems here. (1) If you have access to the site's password database, there's a really good chance you have access to the entire database, and can look up how they're doing it. (2) Even if you can't look up how they're doing it, you just try them all and find which one it is. I'd bet you money that if someone's hashing passwords, they're using one of {MD4, MD5, SHA0, SHA1, SHA2, DES}. If, god forbid, they're not using one of those and actually wrote their own hashing algorithm, you have even more to worry about.
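The "just try them all" step is mechanical. A sketch using Python's stdlib, restricted to the digests `hashlib` guarantees (MD4 and DES-crypt would need extra libraries); the password and salt handling here are illustrative:

```python
import hashlib

# The usual suspects, in rough order of how often they turn up.
COMMON = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

def identify_scheme(known_password, leaked_digest_hex, salt=b""):
    """Given one known password/digest pair from a dump, find which
    common algorithm (with an optional prepended salt) produced it."""
    for name in COMMON:
        digest = hashlib.new(name, salt + known_password.encode()).hexdigest()
        if digest == leaked_digest_hex:
            return name
    return None

# e.g. a dump where we know one user's password was "hunter2":
leaked = hashlib.sha1(b"hunter2").hexdigest()
print(identify_scheme("hunter2", leaked))  # -> sha1
```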
> The best practice is to use a random password for each site you use.
For sure, no doubt about it. But what we're talking about here is the best practice for application developers, not the users. The users can't do anything about how their password is stored.
> I just don't see any point in having a memorable password for websites, and hashing just leaves a false sense of security, as illustrated by MD5.
Or, you know, you could use bcrypt and be secure about it.
> So what? Just because other people don't do it doesn't mean you don't have to also. Fortunately for us, there are a lot of startup founders here who might read this and learn something.
Yes I hope there are.
> I disagree. I think most people use SHA-1 because they know better than to store plaintext passwords. What they don't know is that it's terribly broken.
SHA-1 is the default on Django, and it's easy to break; that's my entire point. It leaves a false sense of security.
> There are two problems here. (1) If you have access to the site's password database, there's a really good chance you have access to the entire database, and can look up how they're doing it. (2) Even if you can't look up how they're doing it, you just try them all and find which one it is. I'd bet you money that if someone's hashing passwords, they're using one of {MD4, MD5, SHA0, SHA1, SHA2, DES}. If, god forbid, they're not using one of those and actually wrote their own hashing algorithm, you have even more to worry about.
They might as well be using ROT-13 if they are using any of those. With today's GPUs and rainbow tables, the passwords might as well be in plaintext. The real solution is site security, not password security.
> Or, you know, you could use bcrypt and be secure about it.
For how long? 4-5 years? Who will be maintaining your site then?
> The real solution is site security, not password security.
That does not imply you don't worry about it though -- it's defense in depth. In the same way sometimes you'll need to go through two sets of doors locked with different keys to access a secured server room (or anything else, for that matter), it's worthwhile to protect everything you can as best you can.
> They might as well be using ROT-13 if they are using any of those. With today's GPUs and rainbow tables, the passwords might as well be in plaintext. The real solution is site security, not password security.
Exactly my point. That's why you use bcrypt.
> For how long? 4-5 years? Who will be maintaining your site then?
First, the premise of that question is that bcrypt is going to be secure for only 4-5 years, which is entirely wrong. You can modify the work factor on bcrypt as time goes by. I could, for example, make it take twice as long to generate a hash every year. I could have the program do this automatically. As for your actual question, which isn't terribly relevant either: (1) me, or (2) the next guy, who I hope will know about security as well; but if he doesn't, then I just have to hope he keeps the work-factor increases in the code.
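The tunable work factor is the whole point. bcrypt itself needs a third-party library, but the stdlib's scrypt (also mentioned in this thread) illustrates the same idea. This is a sketch; the cost parameter shown is illustrative, not a recommendation:

```python
import hashlib
import os

def slow_hash(password: str, salt: bytes, log2_cost: int = 14) -> bytes:
    # n = 2**log2_cost is the CPU/memory cost. Bumping it every year or two
    # is the "increase the work factor as hardware improves" idea: old hashes
    # keep verifying as long as you store the cost alongside the salt.
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2 ** log2_cost, r=8, p=1, dklen=32)

salt = os.urandom(16)
# Store the cost, the salt, and the digest together in the database.
stored = (14, salt, slow_hash("correct horse", salt, 14))
```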
That is fair enough, and all your points were very good. However, despite the downvotes and everything else, I stand by what I said. As a user, I doubt most devs are as competent as the ones here, and I would never trust a site with a password that I depend on. For that, there would need to be third-party auditing to make sure they adhere to the standards you described.
I guess what I am really trying to say is that the state of web/Internet security is very poor right now, and I don't believe bcrypt is a worthy pursuit (sorry, I am really not trying to troll). Mt. Gox was just hacked, my salted password is on Pastebin, and someone then attempted to break into my Gmail account; but that will never succeed, because I use two-factor authentication.
While it is certainly a good idea, as a user, to assume that the site developers have done things wrong (and therefore choose a strong, random, unique password), it is also a good idea, as a site developer, to assume that your users are doing things wrong (and therefore choose a strong password hashing method).
I am saying that your advice is appropriate for users (who cannot control what the server does) but inappropriate for servers (who cannot control what the user does).
Surely there is a legal precedent which provides at least some framework for what can or cannot be seized during a warrant search? This can't be the first time government agents have mistakenly seized property in an otherwise lawful search.
Also, while I completely understand Instapaper's unwillingness to pursue this through the courts, that is the way our legal system is structured. If you believe you have been harmed in some way by a government action, the courts are the avenue through which you must obtain recourse.
(Not a lawyer, so if I'm wrong about any of the above please correct me.)
It's called the Constitution of the United States. If the enforcers don't follow it, your only recourse is the Supreme Court, which will probably throw out your claim for national security reasons.
Not necessarily. I (also) am not a lawyer, but the question isn't whether the FBI has the authority, constitutional or otherwise, to seize the servers owned by the target of the warrant, the question is whether they overstepped their bounds in seizing three whole racks of servers. If it's shown they were careless or did not take sufficient caution in their raid to avoid seizing unrelated servers, they could be held liable for damages. In this case in particular, the number of unrelated companies that have been affected by this (and the number of servers present in 3 racks) makes a case for negligence.
Again, though, the question isn't whether they had the right to seize the servers they had warrants for - they did, and you won't get that questioned by any court - but whether they did so properly, and it's not unheard of for a law enforcement agency to get slapped for overstepping their bounds. It's not common, but it's not unheard of, and it's not a 4th amendment issue either.
Right, but for the courts the question is, did they intend to violate the 4th or did they just screw up? Intent matters - in this case, whether they intended an unwarranted seizure of the server racks or if they just screwed up. Because they had a valid warrant to seize some servers from that data center, I think you'd have a really difficult time pressing the case that they intended to violate the 4th amendment rights of the other people whose servers were on those racks. Negligence, on the other hand, is a much, much lower bar, much easier to prove in these circumstances, and should be adequate to secure damages - and frankly, I have a lot easier time believing it was negligence in this case than an intentional violation of the other individuals' 4th amendment protections.
If I get in your car that's just laying there on the street and "borrow" it for a week without telling you anything, do you think the court will question whether I overstepped my bounds? Do you think I can avoid getting slapped?
After all, I do have the right to drive cars, including cars which I don't own.
How is this at all analogous to a warranted search? The FBI had permission to take something - so did they perform due diligence to make sure they took the correct something? That is the question.
So let's say I gave you permission to drive my car. It's green. You hop in a green car near my house, find the keys in the ignition and leave for four hours of errands. (Typically, I'd give you my key and you'd select a car that the key fits, but not today.) My neighbor reports his green car stolen and the police intercept you on your way back into the neighborhood. Did you perform "proper diligence" in selecting the correct car to take?
I have no idea how this situation would play out, but I suspect your own intent (to drive MY green car) plays into the considerations of the prosecuting attorney.
The more I think about it, the more I think this should be treated the same as any of the other thefts of data to have happened in the past few months: Sony, Toyota, Sega, etc. A potentially hostile group now has a ton of personal info. People should know.
Perhaps even more important, they have a list of hundreds (thousands?) of pages I thought were interesting enough to read later. Seems like a fine base from which to build or enhance profiles of thousands of citizens.
Even better, they can use that information to build or enhance profiles WITHOUT getting any kind of judicial approval or oversight. Yay, Patriot Act (not).
As a real practical question out of curiosity: how would you design their system differently so unauthorized people having only your hard drives couldn't get any data at all?
On my colocated server, I use encrypted LVM for all of my filesystems (except /boot, of course). On my next hardware upgrade cycle, I'm going to install a USB gyroscope (inside the chassis, using one of the front USB headers) and write a daemon that will issue a `umount -lfa && halt -n` if the box is ever moved.
Note that this isn't simply to keep prying eyes off my data; I live near an overdue earthquake fault line. When it does finally give, I should have a (slightly) better chance of the machine coming through intact.
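The daemon's logic is simple. A sketch, with a hypothetical `read_tilt` callable standing in for whatever the USB gyroscope actually exposes; the threshold, poll interval, and device interface are all assumptions here:

```python
import subprocess
import time

TILT_THRESHOLD = 10.0  # degrees of deviation treated as "the box moved" (arbitrary)

def moved(baseline: float, reading: float,
          threshold: float = TILT_THRESHOLD) -> bool:
    """True if the current tilt reading has drifted too far from baseline."""
    return abs(reading - baseline) > threshold

def panic():
    # Lazily unmount everything, then halt without syncing, as in the comment.
    subprocess.run(["umount", "-lfa"])
    subprocess.run(["halt", "-n"])

def watch(read_tilt, poll_seconds=0.5):
    # read_tilt() is a stand-in for the gyroscope driver, whatever it turns
    # out to be; calibrate a baseline at boot, then poll.
    baseline = read_tilt()
    while True:
        if moved(baseline, read_tilt()):
            panic()
            return
        time.sleep(poll_seconds)
```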
Full-disk encryption. You enter the key whenever the system needs to be rebooted. I know at least one company that does this with all of their US-hosted servers.
Perhaps all their employees live in the US, so they consider it not worth the trouble to fly someone to Indonesia whenever a server in Jakarta needs rebooting.
This would only be protection against thieves who steal your hard drives; if the US government has your drives, they can legally compel you to divulge your password.
The company in question is actually located outside of the US, and so are all employees who have access to the key. It keeps its servers in the US only because of issues with cost and latency, I think.
If there was a court order for the decryption keys for your drive and you gave them a key that destroyed it, you would almost certainly be found guilty of deliberately destroying evidence.
>You could always hash the e-mails, although this would make resetting your password impossible.
Not really - everywhere I've gone has asked me for my email to send me my password. Same principle as with passwords: enter your email, we hash it, if it matches a hashed email in the DB, we send a 'reset' email to the address you entered.
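That flow can be sketched in a few lines (names are illustrative). Note the thread's own caveat applies here too: email addresses are low-entropy, so a fast unsalted hash of one is still brute-forceable; this only keeps the addresses out of casual view:

```python
import hashlib

def email_key(email: str) -> str:
    # Normalize, then hash, so the database never stores the address itself.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Toy in-memory "database" keyed by hashed email.
users = {email_key("alice@example.com"): {"name": "alice"}}

def request_reset(submitted_email: str) -> bool:
    key = email_key(submitted_email)
    if key in users:
        # We still know where to send the reset link: the address the
        # user just typed, which matched a stored hash.
        print(f"sending reset link to {submitted_email.strip()}")
        return True
    return False
```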
I mean literally cutting the data sent into two pieces and each piece entering a different database server in a different country. Then, when requested, pulling both pieces and sending them to users who patch them together with client side script...?
So, basically a RAID array? You'd have to have a fair amount of redundancy, to make sure that if your Dystopiastan server gets taken down, your servers in Freedomia and Libertania have enough data to continue serving your users. (And yet you can't have so much redundancy that a single country could rebuild your database if they take enough servers.)
Not to mention, latency issues would be a problem.
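The simplest version of that two-piece split is an XOR secret share: each half alone is indistinguishable from random noise, so neither country's server holds anything readable. A sketch only; a real deployment would need the redundancy discussed above, which works against the secrecy:

```python
import os

def split(data: bytes) -> tuple[bytes, bytes]:
    # One-time-pad construction: the pad is uniformly random, so both
    # pieces individually look like random bytes.
    pad = os.urandom(len(data))
    return pad, bytes(a ^ b for a, b in zip(data, pad))

def join(piece_a: bytes, piece_b: bytes) -> bytes:
    # XOR the two halves back together client-side.
    return bytes(a ^ b for a, b in zip(piece_a, piece_b))

half1, half2 = split(b"read-later list")  # ship each half to a different country
```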
Yes, OAuth tokens contain the Facebook id of the user. It is unlikely that this would matter since session tokens (required for most permissions) expire after an hour (no reason to store those) and sessionless tokens are per-application not per-user.
However, obtaining the application credentials for Instapaper would allow you limited access to a user account on behalf of Instapaper. Since the credentials for doing this are per-application, Instapaper would likely store your Facebook id in their database. Given a Facebook id, you can easily see the account name by visiting https://graph.facebook.com/$id.
If the hardware is in hand then I don't see any practical way to protect the information. The only way that works is to not store the information in cloud servers.
Presumably most or all of these services will let you rotate your secret key, and Instapaper will do that. Assuming it was done rather quickly, then there would be no practical risk of unwanted access for any of the OAuth accounts. If the services don't let you rotate your secret key, well... they probably should.
I think the OP was unreasonably harsh on DigitalOne (I've never heard of them, let alone have any interest in them). It is very possible that they are consumed with FBI questioning, gag orders, or who knows what else. I would give them a pass for a few days until more detail comes out.
> I have no idea whether I'll ever see the server again
In this case the host probably doesn't know better than him. According to the NYTimes, they are a Swiss company; they only rent space and connectivity from the data center.
I see people jump up and down accusing their host of being a bad host when their websites go down for 10 minutes. The thing is, shit like this happens all the time. Some years ago even Rackspace was taken offline because a truck hit their data center. Bizarre, right? Yes, but it did happen.
The problem with DigitalOne was a complete lack of communication around this event. It was a long time (and a lot of badgering) before any of us learned anything about what had happened. I can sympathize with being busy during a crisis, but total silence for 24+ hours, with no working website, email, status page, or twitter account, is not acceptable.
Yeah, that is what hosting companies often lack: communication with their customers during a crisis. I totally agree that DigitalOne should have informed their customers about the incident, and they handled the case poorly. But, like I said, putting all the blame on them is too harsh.
If they are truly blade servers, then they were possibly sharing the same chassis, power supply, and backplane. Could the FBI have pulled just the blades in question? Possibly. But I can very easily imagine the entire blade chassis being viewed as a monolithic component on which they would want to perform whatever forensic analysis they are planning. They could also have pulled whatever blades they were not after and left them, but until you replace the chassis, you are dead in the water.
To be clear, the server stopped responding, and the host he is paying for the server has not responded at all. The server could simply be unplugged, or all the network cables were unplugged during the raid. Who knows? I guess "The FBI stole my server" is a better headline, though.
In my experience with our leased data center cages, we are expected to fly in to town if we ever need to physically manipulate the servers or even plug things in. The data center employees don't even go into the locked cages.
If the FBI forced open a locked cage, and did stuff in there, I would not expect anything to be addressed until DigitalOne showed up to fix it.
I'm trying to think of an analogy which can explain why this might be reasonable from the FBIs perspective.
Suppose you were using a shared storage space (shared servers, or server farm) with several other dudes. One of them is a drug dealer. One day the police/FBI decide to raid the storage space since the drug dealer has been using it to store illegal drugs.
Is it not reasonable to consider this collateral damage (which, granted, is totally unnecessary) during law enforcement operations?
I'm not saying this is OK in any case, but might this not be a reasonable move by the law enforcement agencies?
It is not reasonable if the FBI does not have a warrant for your servers(/storage space). Instapaper is completely right to call this "theft".
If his servers are included in the warrant because they were suspected of housing whatever it is the FBI was after, and the court granted the FBI the right to seize them, then yeah, it's reasonable.
If he was sharing a physical machine with the bad guys, then yeah, sorry, that's collateral damage. However, if he was on his own separate leased machine, there is absolutely no reason for the FBI to seize it. It'd be like them executing a seizure warrant on one of those self-storage spaces, and seizing the contents of all the adjoining compartments (which the person being investigated would have had no access to) just because.
Do we know what the warrant stated? If it authorized them to take the rack containing the server they were after, then this is legal, if unfortunate.
If the police have a warrant for my apartment, and you happen to leave your backpack and server, your stuff will most likely be confiscated, along with mine, if it interests the police.
I'm guessing they could have asked to take the whole rack so as not to have to tell the hosting company about the raid and risk alerting the target. They also did the raid in the middle of the night, which suggests they were trying to avoid alerting the target.
They probably didn't have any way to know which machine it was, just which rack. They also probably didn't have to tell the hosting company directly, just the facility they were raiding.
Even if it was indeed a necessary precaution (for which I have doubts), any innocent parties affected by this should be contacted for arranging a proper reimbursement and be issued an official apology, as soon as the operation was completed, and without them having to pursue it.
If he was sharing a physical machine with the bad guys, then yeah, sorry, that's collateral damage. However, if he was on his own separate leased machine, there is absolutely no reason for the FBI to seize it.
The problem is that with blade servers like DigitalOne provided, both of these things can be true at the same time.
I can certainly think of scenarios in which this action was reasonable from the FBI perspective.
I don't like to be in the position of defending the FBI (my own personal and professional relationship with them is complicated), but I think the following situation is plausible (which isn't to say it's what happened, as we don't know):
FBI determines the originating IP address of whatever their investigation is targeting (based on published information, it looks like a "scareware" operation).
FBI determines the IP address is "owned" by an overseas hosting provider, and that the physical servers are in a datacenter in the U.S.
FBI obtains a warrant for the seizure of all associated computing equipment (which may very well include the upstream devices used by the hosting provider).
FBI executes warrant at datacenter, sees that the servers are actually blades in a chassis; takes the entire chassis (as reconstructing the data later on may require that the servers be bootable).
The very last forensic case I worked involved having to acquire evidence from a server which was hosting a web application by a hosting provider. This was a shared hosting scenario, so in addition to acquiring the targeted information, all other customers on the server were also effectively offline (as the server was being imaged, and later as the original hard drives were entered as evidence).
Now, obviously, that isn't the exact same situation as what is described here, but in the event that the servers were blades, I don't think it's outside the realm of possibility to think that the entire chassis would need to be retrieved.
Consider an analogy. The FBI gets a valid warrant for the servers belonging to a company with a street address of "101 Main St, Somewhere, DC". The building at 101 Main St. is a multi-tenant, multi-story, office building.
If the FBI seized all the computer equipment in the entire building or even just the computers on the same floor as the targeted company but belonging to other companies who just happen to be physically adjacent to the targeted company, would it seem reasonable?
I don't think that would be reasonable, but I also don't think that is analogous.
For starters, that hypothetical search warrant is too broad to be executed.
Keep in mind, I'm not saying that I believe that the FBI executed this seizure correctly. I'm saying that based on third-hand limited information, I don't think it's possible to rule out the possibility that what they did was warranted.
If you showed up to perform this acquisition and were able to deduce that the targets you were going after were blades in an HP chassis in a specific rack, and let's say those blades aren't identifiable within the chassis (like, oh say, maybe the IP address isn't noted), it might be within reason to take the chassis and all the blades for that specific chassis.
It might also be within reason that if you can identify which specific blades are part of your acquisition, you take those, and also the chassis they are plugged into (but not the other blades, although they are now sitting on a table in a datacenter somewhere, not plugged into anything).
All we know is that customers of that same provider who were stored in the same datacenter were taken offline. Marco doesn't actually know that his blade server was physically taken, he just knows that it was brought offline.
But you've simply pointed out the strength of the analogy. Seizing adjacent blades in a multi-tenant rack is just as nonsensical as seizing adjacent computers in a multi-tenant office.
Physical proximity is simply not a valid justification in either situation.
If the courts and/or the FBI are unable to understand this, the remedy is to get them educated and not to simply accept the consequences of overly-broad warrants or seizures.
I've agreed that the hypothetical search warrant you outlined would be too broad to be enforceable, but I disagree that the search warrant in this case was necessarily this broad.
I'm not saying it wasn't, I'm saying that it is not a requirement that it was.
I disagree that seizing adjacent blades is just as nonsensical as seizing adjacent computers. I think it's unfortunate, and suboptimal; but I don't think they are the same.
Suppose the search warrant had nothing to do with computers; let's say it was for a silver Motorola Razr. The FBI enters the premises and finds a bucket with ten silver Motorola Razrs. Their job is then to try and determine which specific Razr they are looking for. You can be sure that it's within the realm of possibility that they'd seize all of them, and then later determine which ones are unrelated.
You can argue, "but then the search warrant should have to be more specific, it should have to have the serial number of the specific Razrs on it", to which I'd agree, that'd be nice. Computer-related search warrants are almost always executed with only the originating IP address and the ___location to which the IP address was established to be at.
Assuming that they took all the blades (which again, we have no idea one way or the other), I agree it would have been nice to know ahead of time that the specific blades associated with the target were X. I'm not sure that the lack of that specificity of information makes it impossible for them to execute the search warrant.
But basically we're lambasting the FBI for something we have no idea if they've even done, without any actual information about the contents of the raid. I'm trying to keep in mind that it's actually possible (even if not likely) that their actions in this raid were not incorrect.
It seems like either the FBI didn't pay attention to the information given to them by DigitalOne or DigitalOne had poor information about where their servers are located.
The picture painted here is that the FBI came in and hastily took a bunch of equipment without making sure they were taking the right stuff. If that is accurate, then it's likely they might have missed a server with data on it that they needed for their case. Moving quickly and causing collateral damage in a relatively safe environment where you actually have the time to triple check your work is inexcusable on all fronts.
I was under the impression that DigitalOne (a Swiss company) wasn't even informed until 3 hours after the incident, per the NYT article.
This is not shared hosting. The server taken belonged to Instapaper. Being located in the same datacenter should not be grounds for seizure.
If you're looking for a metaphor, think about a self-storage facility ([one of these places](http://www.moversandpackers.org/wp-content/uploads/2010/10/s...)). Imagine you're renting one of those units, and somebody renting a unit on the other side of the yard is a drug dealer. The FBI comes in, and in the process of seizing the assets of the drug dealer across the yard, they also seize all the stuff in your storage unit. There is no way that is reasonable.
It belonged to Instapaper. That's what leased means: if you lease a car or house or server or anything else, it belongs to you for the duration of the lease. And more importantly than the hardware, all the information on the server belonged to Instapaper.
No, it does not. I once lived in a house whose owner went bankrupt, and the bank seized the house immediately. How is that even possible if the house was mine at that very moment?
I agree that the information belonged to Instapaper.
I think it's always important to remember that the first order of business in a raid is to preserve evidence against deletion or modification. This means that their first task is to remove the hardware from anybody's hands but theirs. At which point they can peruse the data as they are able.
Why did they take an entire rack, instead of a few servers? I can think of a couple of potential reasons.
- VM's, which could potentially live on any physical server in a VM pool
- Insufficient information on which physical servers belong to their suspects
- They just don't trust the colo operators to not be involved, and thus limit the suspect data to the servers they provide.
While I wholly agree that it's unfortunate that Instapaper and Pinboard were affected, it's not an unexpected consequence of having your servers alongside (or on the same physical machines as) people you don't know.
No, the first order of business is to stay within the bounds of the law. It does not matter how solid your chain of evidence is if that evidence is illegally obtained.
It's doubtful the evidence was illegally obtained. The warrant was probably for the hardware, and was probably overly broad, allowing for the removal of more than was necessary. That's been the routine since at least the mid 90s. There are plenty of cases where the FBI has walked into a data center, shown a warrant, and walked out with complete racks of equipment, most of it unrelated to their actual search, because the warrant allowed them to do so.
If the warrant allows them to do so, then yes, it is legal, although we should hold judges accountable. Voting in responsible judges is more important than voting in legislators, IMHO, as judges tend to have a more direct impact on our personal lives. That being said, I see a lot of comments mentioning the imperative to preserve evidence and chain of custody, which is important but completely subordinate to staying within the bounds of the warrant. Does anyone here know if warrants can be obtained through FOIA requests? I would sure love to see the scope of the one used in this case.
You'll get no argument from me about holding judges accountable; it's an interesting issue, though. Was the warrant issued by a state or federal judge? Federal judges are appointed for life, not elected, so the people can't exactly kick them out of office at the next election.
I don't think the IT skill required to reliably extract evidence from an arbitrary hosting operation (of potentially arbitrary complexity) is simply "on tap" for the FBI.
If you want to say "tough luck that's just what it costs to collect evidence in 2011", fine, but it's probably not fair to say that the FBI should just naturally have that capability.
In general the FBI is still operating in a pre-datacenter mindset when it comes to evidence acquisition.
It wasn't until 2007 that they updated the Handbook of Forensic Services[1] to no longer require seizing peripherals of suspected evidence. Think about that for a second, that means mice, keyboards, monitors, etc.
The team who worked on this raid ironically is part of the DOD CCC, which is a joint forensic lab setup between the DOD and the FBI (they have two labs, one in Maryland, who would have been involved in this raid, and one in California). That team certainly has some smart folks on it (they're the subject-matter-experts for forensic acquisition at the FBI), but if they've devised special procedures for dealing with datacenter or cloud forensics, they haven't been codified yet into the HFS.
I'm not sure exactly what you mean, but the Defense Department works with other government agencies and non-governmental agencies; and has for quite a long time.
One of these collaborations is responsible for you being able to type that comment and have it be readable by someone on another computer.
As to the specifics of the DOD CyberCrime center, it was set up in 98 to offer training/services to other law enforcement and counterintelligence agencies.
Basically, someone figured that instead of having each separate agency stumble around in the dark dealing with cybercrime, they could pool resources and try to standardize. It's actually a pretty good example of getting rid of bureaucracy.
Yeah well, actions like this give the image of fat guys in suits who hunt and peck at the keyboard and move icons around on the desktop to find where they are hiding that dang data.
Should they get better forensics people? Absolutely.
But, steel yourself: a very good forensics pro would probably have them acquiring expansive warrants for hardware seizures, because very good forensics pros are paid to foresee all the crazy things colluding providers and criminals can do to hide evidence.
I think it's probably safer to proceed with the assumption that any sufficiently motivated government can seize your physical machine anytime they wish.
I should note that I'm not disagreeing with you, I just think there are more important considerations than the physical ___location of the data.
Of course, then you have to keep careful tabs on that country's politics. The police may not be able to confiscate your data in June, but then a new bill passes in July and suddenly it's up for grabs.
Furthermore, I'm not sure I'd want to host my data in a country where the police cannot pursue digital criminals.
Like it or not it's still the wild west, and I suspect most people here trust their own ability to protect themselves more than they trust the sheriff who only investigates crimes against the mayor and can't even ride a horse or shoot straight.
The Netherlands, Sweden, etc. provide pretty good coverage and are where a lot of the seedboxes for torrenting are hosted. If you have something you want the government to have an almost nonexistent (depending on what it is) chance of getting to, go with Russia, China, or other countries that probably don't have the best relations with the United States.
Russia and China are perhaps good examples of places to host something the US government has little chance of getting to, but the governments of those countries are far worse about this sort of thing than that of the States, so I wouldn't want my data to be stored there.
Why isn't Facebook having their servers seized? Google? Amazon? If the FBI is really targeting the "bad guys", I'm sure there have been more bad guys using Facebook/Gmail/AWS than any single colo.
Why haven't there been similar seizures of any larger corporate entities? Even if the current FBI practices are valid, should the application of those practices be a function of size/wealth/power? Which servers of Sony's were seized after distributing rootkits?
Good question. No solid answers here, but my guess would be some combination of more redundancy, better and more active lawyers, and the large players not talking about it when it does go down.
I think it may be prudent to begin encrypting all data on disk that can reasonably be encrypted, while still being able to set up the server remotely, so that no one can just snatch your server and get all your data.
This could work by encrypting your database in a truecrypt volume that must be mounted by entering the password. Thus, the data is only ever saved on disk in encrypted form, and the key to access the data is not saved on the disk. Of course, it is still in principle possible for anyone to access that information if they have physical access to the computer while it's running, but at least this makes that much harder.
Truecrypt is significantly slower, especially on the higher strength encryption methods. The program itself has a benchmark in it, so download it and check it out for yourself if you're that curious. (Note that it is relative to your hard drive's speed)
I suspect this would only be reasonably applicable if you manage to reduce disk accesses to the very minimum. I'm not very familiar with these setups, but I assume they slow down disk accesses quite a lot.
if i'm understanding you correctly, you're wrong. it's just another layer in the system (one more function in the mapping from your data to magnetic patterns - a function that's completely reasonable, predictable, etc, just hard to guess). the problem is that you need to enter the password on boot, which makes automated startup difficult.
for example, my laptop disk's main partition is encrypted. i need to enter the password when i boot, but nothing terrible happens if i lose power or the system crashes or whatever.
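The scheme discussed above (ciphertext on disk, key derived from a passphrase entered at boot and held only in RAM) can be sketched in plain Python. This is an illustration of the principle, not a replacement for TrueCrypt or dm-crypt, which do this at the block-device layer with real ciphers like AES-XTS; the HMAC-counter "stream cipher" here is a stdlib-only stand-in, and all names and parameters are made up for the example.

```python
import hashlib
import hmac
import os
import secrets

def derive_key(passphrase: bytes, salt: bytes) -> bytes:
    # Memory-hard KDF: expensive for an attacker who only has the disk image.
    return hashlib.scrypt(passphrase, salt=salt, n=2**14, r=8, p=1, dklen=32)

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 in counter mode as a toy keystream (illustration only;
    # real disk encryption uses a vetted cipher at the block layer).
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# At boot: an operator types the passphrase; only the derived key lives in RAM.
salt = os.urandom(16)            # salt is stored on disk next to the ciphertext
key = derive_key(b"correct horse battery staple", salt)

nonce = secrets.token_bytes(16)
plaintext = b"users table: alice, bob, ..."
ciphertext = keystream_xor(key, nonce, plaintext)   # this is what hits the disk

# Decryption is the same XOR with the same keystream.
assert keystream_xor(key, nonce, ciphertext) == plaintext
```

As the parent comments note, this protects against someone walking off with the hardware, but not against an attacker (or agent) with access to the running machine, where the key sits in memory.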
It would make for an interesting Freedom of Information (equipment) request. "Give me my damn server back." But the damage is of course done.
If you are a voting citizen of the US I recommend you write (not email, write a letter, put postage on it and everything) to your elected congressional representatives and ask that Congress immediately put curbs on the police powers of the FBI when it comes to infrastructure seizures.
i think his Instapaper account was full of stories about Whitey Bulger and his old friends/partners/etc... and this is how they "Big Data"-sifting-found him :)
What's the proper way of storing OAuth tokens in this situation? Given that all the users' tokens and your private key are on the server (even if it's embedded in code), there's no way for Instapaper to keep those tokens secure in case of a compromise (by the FBI or Lulzdudes or anyone).
Seems like Instapaper should change its private key for, say, Facebook.
I would think encrypting the third-party tokens with the user's password would be a decent start.
When the user's password is verified, it could be used to unlock those tokens and store them in the active session structure in RAM. There'd still be some exposure, particularly in the case of being rooted, but an attacker couldn't just dump the database.
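The idea above can be sketched with the Python standard library: derive a per-user key from the password with PBKDF2, and store only the wrapped token. This is a hedged sketch, not Instapaper's actual design; the HMAC-counter keystream stands in for a real AEAD cipher, and the password, token, and iteration count are all illustrative.

```python
import hashlib
import hmac
import os

def kdf(password: bytes, salt: bytes) -> bytes:
    # Per-user key derived from the login password.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 200_000, dklen=32)

def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy HMAC-counter keystream; a real deployment would use an AEAD cipher.
    stream = b""
    i = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + i.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Signup: wrap the third-party OAuth token under the password-derived key.
salt, nonce = os.urandom(16), os.urandom(16)
stored_token = xor_stream(kdf(b"hunter2", salt), nonce, b"oauth-token-from-provider")
# The database keeps (salt, nonce, stored_token); it's useless without the password.

# Login: the password is verified, the key re-derived, and the token
# decrypted into the session structure in RAM only.
session_token = xor_stream(kdf(b"hunter2", salt), nonce, stored_token)
assert session_token == b"oauth-token-from-provider"
```

The trade-off, as noted above, is that a live compromise of a rooted server can still skim keys as users log in; what this prevents is recovering everyone's tokens from a seized disk or a dumped database.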
is there a better solution than encrypting data and putting the password in the source? obviously this is for cases where you can't use a hash.
it seems to me that, at least, it would make sense to have the db and web server physically separate in that case (although i guess someone stealing hardware is not normally a common scenario).
Hmm. I've built something similar to instapaper for myself. (Using a native OS X app). People were making jokes at me how I was re-inventing the wheel.
Now I'm somewhat happy having done the extra work. At least the FBI doesn't have my "read later" bookmarks. (Which often consist of the words 'hack', 'malware' and 'reverse engineering'.)
I guess I will reinvent the wheel instead of using cloud services more often in the future.
An in between solution would be better - write an open source version of Instapaper that people could install on their own servers, instead of everyone rolling their own.
I didn't downvote you, but your "If you're the FBI and you want X, you'd simply Y" has nothing to do with reality. In reality, when any law enforcement organization needs digital evidence of any kind, they take everything. There are chain-of-custody rules that require it.