Final? Really? What if I need xattrs and POSIX ACLs (or I use Windows and want NT ACLs and alternate data streams; or resource forks under OS X; …)? Hard-coded encryption algorithms don't seem particularly future-proof either.
No forward error correction either. It can't be considered "final" without some form of bitrot protection.
And no, storing it on a ZFS or BTRFS volume with error recovery enabled does not count. (They don't use FEC; they use 1960s-style triplicate storage, which is hugely wasteful of space, does nothing to protect against transmission errors, and can still be corrupted if exactly the wrong two bits are damaged.)
Storing it on media that does use FEC doesn't count either. I want a per-file tunable FEC knob, not one vendor-determined setting. And as history has shown, it needs to be done in FOSS code, not trade-secret firmware.
The FEC is implemented in the hard drive firmware. You're going to read back a full correct block or the whole block will fail. You're not going to read back a block with a single bit error, so it is insanity to protect against that at the FS level.
Also, read up on RAID levels: RAID 5/6 and ZFS RAID-Z use parity-based error correction, not duplication.
What about over the net? One of the listed strengths is its streaming capabilities, which implies sending it over the net. Error correction would be important there as well.
I've downloaded and uploaded many files which have arrived incorrectly over the internet, so yes, if you truly want to prevent bitrot, you do want to add error correction.
Because you want to guarantee a certain error margin regardless of the error detection and retransmit/correct capability of whatever networks, file systems, and media it traverses in the interval until you want to read it.
Supposed to. If you do any backups to DVD/BD and still want to get your data back in 5 or 10 years, though, you'd be well advised to add some sort of FEC: burn multiple copies of each disc, generate a bunch of PAR2 recovery files, whatever.
(You might want to do that for backups on hard drives too. Yeah, maybe the hard drive firmware is supposedly taking care of any errors below the block level and you're not too worried about bitflips, but that just means you'll lose entire blocks and files when you lose something.)
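For anyone who hasn't done this before, here's a rough sketch of the PAR2 route, driving par2cmdline from Python (the -r15 redundancy level and the file names are placeholders to tune per backup set):

    import subprocess

    # Create ~15% recovery data for backup.tar using par2cmdline
    # (assumes the `par2` binary is installed and on PATH).
    subprocess.run(["par2", "create", "-r15", "backup.tar.par2", "backup.tar"],
                   check=True)

    # Years later: verify, and repair if the verify fails.
    subprocess.run(["par2", "verify", "backup.tar.par2"], check=True)
    # subprocess.run(["par2", "repair", "backup.tar.par2"], check=True)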
Perhaps they meant seekable when compressed? As in, the header for each object has the compressed length of the object? Presumably in a metadata header with name, compression format, claimed uncompressed length, date, etc.
Personally I'd like a tool that lets me extract files from TB-sized archives easily without decompressing everything. Only virtual disk images seem to have that random-access functionality. There are ways of using gzip/bzip2/xz with sync points, so you could produce a compatible archive that allowed decompressing the metadata bits, though it would suck for many small files.
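Something like this toy sketch is all it takes format-wise: deflate each member independently and put an offset index at the end, so a reader can seek straight to one file in a huge archive (the layout and field names here are made up for illustration):

    import json, struct, zlib

    def write_archive(path, files):                 # files: {name: bytes}
        index = {}
        with open(path, "wb") as out:
            for name, data in files.items():
                blob = zlib.compress(data)          # each member compressed on its own
                index[name] = (out.tell(), len(blob), len(data))
                out.write(blob)
            index_bytes = json.dumps(index).encode()
            out.write(index_bytes)
            out.write(struct.pack("<Q", len(index_bytes)))   # index length trailer

    def read_member(path, name):
        with open(path, "rb") as f:
            f.seek(-8, 2)                           # read the trailer first
            (index_len,) = struct.unpack("<Q", f.read(8))
            f.seek(-8 - index_len, 2)
            offset, clen, _ = json.loads(f.read(index_len))[name]
            f.seek(offset)
            return zlib.decompress(f.read(clen))    # only this member is touched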
Tar is not streamable when compressed with gzip. 4Q supports compression on a per-file basis and thus can support streaming and compression at the same time. Zip also supports compression on a per-file basis, but it requires an index and thus is not streamable.
You can stream zips if you don't use compression, and you can use compression on files small enough to hold in memory. I work at Barracuda Networks, and we actually do this every day.[1]
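Here's a rough sketch of that trick (mine, not Barracuda's actual code from [1]): compress each file wholly in memory so its size is known before its local header goes out, and write the zip to a non-seekable output. It assumes a Python whose zipfile module can write to unseekable streams.

    import sys, zipfile

    out = sys.stdout.buffer                          # e.g. piped to an HTTP client
    with zipfile.ZipFile(out, mode="w") as zf:
        for path in ("report.csv", "notes.txt"):     # hypothetical inputs
            with open(path, "rb") as f:
                data = f.read()                      # small enough to hold in RAM
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)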
Things it doesn't support: symlinks, POSIX ACLs (xattrs). The first one makes it a certain failure for archival use. The hard-coded link to an external crypto service, Keybase, makes it a failure for long-term use.
"The final archive format" is a very big promise that 4q doesn't keep right now. It falls short of 7z, RAR and tar.xz, and certainly isn't ready to replace them at the moment.
I'm not too familiar with CoffeeScript, but it doesn't seem like a good choice of language for writing an archiver. There's no actual draft file-format spec that I can see, either? But from a first pass, I have the following comments:
Crypto: Encrypted blocks are AES-256-CBC with a random IV and no MAC (!!!). You need to look at that again: that could be a problem. Hashed blocks are SHA-512. Maybe OK (how is the length encoded? Look out for length-extension attacks). That crypto is 14 years old and missing a vital bit: not "modern". Modern choices would include ChaCha20-Poly1305 (faster, more secure, seekable if you do it right), hashes like BLAKE2b (the new RAR already uses BLAKE2), and signing things with Ed25519. Look into that kind of thing; you need a crypto overhaul. The keybase.io integration is a nice thought for the UX, but is an online service in invite-only beta really ready to be baked into an archive format?
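To make "modern" concrete, here's a minimal sketch of those primitives using the Python cryptography package and the stdlib (how 4Q would actually wire them together is another question):

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    block = b"archive block contents"

    # Authenticated encryption: ChaCha20-Poly1305 gives you the missing MAC for free.
    key = ChaCha20Poly1305.generate_key()
    nonce = os.urandom(12)                           # must be unique per block
    ct = ChaCha20Poly1305(key).encrypt(nonce, block, b"block header")  # AAD binds metadata
    pt = ChaCha20Poly1305(key).decrypt(nonce, ct, b"block header")

    # Hashing: BLAKE2b is fast and, unlike SHA-512, not length-extendable.
    digest = hashlib.blake2b(block).hexdigest()

    # Signing: Ed25519 for signing blocks or whole archives.
    sk = Ed25519PrivateKey.generate()
    sig = sk.sign(block)
    sk.public_key().verify(sig, block)               # raises InvalidSignature if tampered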
Packing: LZMA2 is pretty good: 7z and xz already use it. For a fast algorithm, Snappy is not as good as LZ4, I understand? Neither is the last word in compression. Text/HTML/source code packs much better with a PPM-type model like PPMd (7z has that too, as did RAR until it was dropped recently), but you need to weigh up the decompression memory usage. ZPAQ's context-model mixing can pack tighter, but that's much more intensive, and while I like extensibility, I don't like the ZPAQ archive format containing essentially executable bytecode.
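If you want numbers rather than folklore, it's a five-minute experiment with the stdlib codecs as stand-ins (no Snappy/LZ4/PPMd bindings here; "corpus.txt" is a placeholder for your own data):

    import bz2, lzma, zlib

    data = open("corpus.txt", "rb").read()
    for name, packed in [
        ("zlib -9 ", zlib.compress(data, 9)),
        ("bzip2 -9", bz2.compress(data, 9)),
        ("xz/LZMA ", lzma.compress(data, preset=9)),
    ]:
        print(name, "ratio %.3f" % (len(packed) / len(data)))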
Other missing features that other archivers have: Volume splitting? Erasure coding or some other FEC? Can you do deltas? (e.g. binary software updates)
You've got some pleasant UX ideas for a command-line archiver (compared to some other command-line archivers!), but sorry, I don't think you're ready for 1.0.
Chrome started flagging SSL sites where a cert in the chain uses SHA-1; IIRC xkcd was used as an example of where this occurs further up the chain than the site's actual cert.
Just checked: the RapidSSL CA cert does use SHA-1.
I really hope this doesn't go mainstream, or I'll have to install yet another archiver tool... I really don't understand why people use 7-Zip, for example, when storage is cheaper than ever. Just use tar and get on with your life.
Tar already has the first two, and even POSIX xattrs (which this doesn't preserve). The third seems useless (seems being the key word; some people might find it useful), and I'd rather just use a program that will encrypt the archive for me (i.e. have a .tar.xz.enc).
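That route is easy enough to script; rough sketch below (Fernet just stands in for "a program that will encrypt the archive for me", in practice it would probably be gpg or openssl, and the paths are made up):

    import io, tarfile
    from cryptography.fernet import Fernet

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:   # tar + xz in one pass
        tar.add("project/")                               # hypothetical directory
    key = Fernet.generate_key()                           # store this somewhere safe
    with open("project.tar.xz.enc", "wb") as out:
        out.write(Fernet(key).encrypt(buf.getvalue()))    # encrypt the whole blob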
One advantage this could have over the above is random access to individual files: with the above scheme, you might have to linearly decrypt and decompress the entire archive up to the file you want.
Technically speaking, then, the tar utility cannot compress or encrypt per file, but the tar format can be used for this, and since we're talking about the format, it can accommodate the requirements. It's just that there's no tool doing it at the moment.
(A counterpoint: while each file could be compressed and encrypted, there's nothing in the tar format that explicitly says so, meaning each file would have to be probed to determine whether it was compressed or encrypted.)
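To make the format argument concrete, here's a toy illustration: members of a plain tar can each be pre-compressed and stored as "<name>.gz", so one file can be pulled out and gunzipped without touching the rest. As far as I know no existing tool writes archives this way; the code is just the idea above, not a real tool.

    import gzip, io, tarfile

    def add_gzipped(tar, path):
        packed = gzip.compress(open(path, "rb").read())   # compress this member only
        info = tarfile.TarInfo(name=path + ".gz")
        info.size = len(packed)
        tar.addfile(info, io.BytesIO(packed))

    with tarfile.open("perfile.tar", "w") as tar:         # note: no outer compression
        for path in ("a.txt", "b.txt"):                   # hypothetical inputs
            add_gzipped(tar, path)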
> One advantage this could have over the above is random access to individual files: with the above scheme, you might have to linearly decrypt and decompress the entire archive up to the file you want.
4Q uses CBC and the crypto lib doesn't seem to support random access, unless you manually divide your file into separately encrypted streams.
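For what "separately encrypted streams" could look like: chunk the file, seal each chunk with its own nonce and tag, and bind the chunk number in as associated data so chunks can't be silently reordered. Reading a byte range then means decrypting only the chunks that cover it instead of CBC-chaining from the start of the file. (Sketch with the Python cryptography package; the chunk layout is invented for illustration.)

    import os, struct
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    CHUNK = 1 << 20                                   # 1 MiB of plaintext per chunk

    def encrypt_chunks(key, data):
        aead = AESGCM(key)
        chunks = []
        for i, off in enumerate(range(0, len(data), CHUNK)):
            nonce = os.urandom(12)
            ct = aead.encrypt(nonce, data[off:off + CHUNK], struct.pack("<Q", i))
            chunks.append(nonce + ct)                 # store nonce next to each chunk
        return chunks

    def decrypt_chunk(key, chunks, i):                # random access: one chunk only
        nonce, ct = chunks[i][:12], chunks[i][12:]
        return AESGCM(key).decrypt(nonce, ct, struct.pack("<Q", i))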