It looks like they know what they're doing. I was especially curious about deduplication. The way they do it sounds perfectly reasonable:
> MEGA indeed uses deduplication, but it does so based on the entire file post-encryption rather than on blocks pre-encryption. If the same file is uploaded twice, encrypted with the same random 128-bit key, only one copy is stored on the server. Or, if (and this is much more likely!) a file is copied between folders or user accounts through the file manager or the API, all copies point to the same physical file.
Yes, but the first part makes no sense. If the 128-bit key is indeed chosen at random for each file (as it should be), the probability that the same key will be chosen again for a second upload of the same file is effectively zero (1/2^128).
Exactly. Read: there is no real deduplication of data. So if a file is reported, there's no way to track down all the other copies, or to automatically ban a specific file hash.
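To make that concrete, here's a minimal sketch (Python with the pyca `cryptography` package; AES-CTR and SHA-256 are stand-ins for illustration, not MEGA's actual file format): two uploads of the same file under independently random keys produce unrelated ciphertexts, so a server that dedupes on the ciphertext stores both copies and has no single hash it could blacklist.

```python
# Toy model of whole-file, post-encryption dedup with a fresh random key per upload.
import os
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_random_key(plaintext: bytes) -> bytes:
    """Encrypt under a fresh random 128-bit key (AES-CTR as a stand-in)."""
    key = os.urandom(16)           # new random key on every upload
    nonce = bytes(16)              # fixed nonce is fine for a key used exactly once
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(plaintext) + enc.finalize()

store: dict[str, bytes] = {}       # "server": dedup table keyed by ciphertext hash

def upload(ciphertext: bytes) -> str:
    h = hashlib.sha256(ciphertext).hexdigest()
    store.setdefault(h, ciphertext)  # dedup only fires on an identical ciphertext
    return h

movie = b"the same ripped movie, bit for bit identical"
h1 = upload(encrypt_random_key(movie))
h2 = upload(encrypt_random_key(movie))
print(h1 == h2)     # False: the two uploads share nothing the server can see
print(len(store))   # 2: both copies stored, and no single hash to ban
```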
Then it would be easy for law enforcement to force Mega to remove all known copies of certain copyrighted material, as they could now prove that Mega is hosting that particular bit of copyrighted material. E.g.:
If they find a ripped version of a movie/ebook/whatever, they can just encrypt it using Mega's scheme (which, in this hypothetical, derives the key from the data) and get the single canonical encrypted version of the file out. They then tell Mega to remove any stored files that match that ciphertext.
If all files are encrypted with a random key there's no way for law enforcement to do this.
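Roughly what that takedown workflow would look like under such a hypothetical convergent scheme (a sketch only; the key derivation and AES-CTR details are assumptions for illustration, not MEGA's actual design, which as noted elsewhere in the thread uses random keys):

```python
# Hypothetical convergent scheme: the per-file key is derived from the file itself,
# so anyone holding the plaintext can recompute the exact ciphertext to match against.
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def convergent_encrypt(plaintext: bytes) -> bytes:
    key = hashlib.sha256(plaintext).digest()[:16]   # 128-bit key derived from the data
    enc = Cipher(algorithms.AES(key), modes.CTR(bytes(16))).encryptor()
    return enc.update(plaintext) + enc.finalize()   # deterministic: same file -> same ciphertext

ripped_copy = b"bit-for-bit contents of the infringing file"
fingerprint = hashlib.sha256(convergent_encrypt(ripped_copy)).hexdigest()
print(fingerprint)   # "remove every stored object whose ciphertext hashes to this"
```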
That has a good practical benefit (deduplication of the files that benefit most from it), but it doesn't actually solve the security problem at all; it just makes the trade-off one way for large files and the other way for smaller ones. If you have a legitimate reason to want privacy against data-confirmation attacks, then you need it regardless of file size.
The whole thing with deduplication is a little bit overblown anyway. You don't want a hundred copies of the same big file, but is that what really happens? Nobody wants to upload the same file a hundred times, especially if the file is very large. Once there is already a copy, passing around a link to it is much easier than uploading it again. So the most common cause is two totally unrelated people uploading the same bit-for-bit identical file, which does happen, but not so often that storing the duplicates becomes prohibitive.
And in many cases file-level deduplication is difficult or impossible anyway, because users make changes to the files (like editing embedded metadata, or pointlessly wrapping a single already-compressed file in a .rar archive). So the benefit you get from deduplication is not nothing, but whether it's a reasonable trade-off to make against privacy depends on the situation.
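A trivial illustration of why whole-file dedup is so fragile: change one byte of metadata and the digests share nothing (SHA-256 here just as an example hash; the file contents are made up):

```python
import hashlib

original = b"ID3 Title=Foo " + b"\x00" * 1000   # stand-in for a tagged media file
retagged = b"ID3 Title=Bar " + b"\x00" * 1000   # same payload, one tag edited

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(retagged).hexdigest())     # completely unrelated digest -> no dedup
```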
They don't seem to do that, though. Note that they claim the key is random and that deduplication is "much more likely" to happen when files are copied. If they derived the key from the data in a deterministic way, they could always dedup, and the earlier statement (that deduplication of copied files is more likely) could not be true.
Based on all the analyses published so far, it does not look like that at all. In your view, what makes it appear that their crypto was implemented in anything resembling a proper fashion?
So it basically dedupes whatever you copy to a different folder in your own account. I guess this is the best they can do without knowing anything about the files (though it's not really that useful). To get true deduplication you need convergent encryption, which reveals more information about what you are storing (e.g. if I store the same file as you, I will know what your file is).
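A minimal sketch of that trade-off, assuming a toy convergent scheme (key = SHA-256 of the plaintext, AES-CTR; these details are illustrative, not anyone's actual design): dedup now works across unrelated users, but a dedup hit is exactly the confirmation leak described above.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def convergent_encrypt(plaintext: bytes) -> bytes:
    key = hashlib.sha256(plaintext).digest()[:16]   # key is a pure function of the data
    enc = Cipher(algorithms.AES(key), modes.CTR(bytes(16))).encryptor()
    return enc.update(plaintext) + enc.finalize()

seen: set[str] = set()                              # server-side dedup table

def upload(plaintext: bytes) -> bool:
    """Return True if the server already held this exact file (dedup hit)."""
    h = hashlib.sha256(convergent_encrypt(plaintext)).hexdigest()
    hit = h in seen
    seen.add(h)
    return hit

print(upload(b"some-linux-distro.iso"))   # False: first copy gets stored
print(upload(b"some-linux-distro.iso"))   # True: a second, unrelated uploader dedups --
                                          # and thereby confirms the first user has this file
```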