On top of enabling indexing, it reduces the amount of data lost in the event of data corruption, something you get for free with block-based compression algorithms like BWT-based bzip2 but that is usually missing from dictionary-based algorithms like LZ-based gzip.
I don't think many people use that last property or are even aware of it, which is a shame. I wrote a tool (bamrescue) to easily recover data from uncorrupted blocks of corrupted BAM files while dropping the corrupted blocks and it works great, but I'd be surprised if such tools were frequently used.
Why do you think I wanted to add hashes and encryption at the block level? :)
I’ve had to do similar things in the past and it’s a great side-feature of the format. It’s a horrible feeling when you find a corrupted FASTQ file that was compressed with normal gzip. At least with bgzip corrupted files, you can find and start recovery from the next block.
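For anyone curious, finding the next block is straightforward because every BGZF block starts with a fixed gzip header carrying a "BC" extra subfield that stores the block size. Here is a minimal sketch of that scan in Rust (just the idea, not how bamrescue or bgzip actually do it; real recovery would also verify each candidate block's CRC32 before trusting it):

    /// Minimal sketch: find the offset of the next plausible BGZF block header
    /// at or after `from`, so recovery can resume there after a corrupted region.
    /// Assumes the common case where the "BC" subfield is the first extra field.
    fn next_bgzf_block(data: &[u8], from: usize) -> Option<usize> {
        // Every BGZF block begins with a gzip header whose FEXTRA flag is set:
        // 0x1f 0x8b 0x08 0x04, followed by MTIME/XFL/OS, XLEN and the extra fields.
        const MAGIC: [u8; 4] = [0x1f, 0x8b, 0x08, 0x04];
        (from..data.len().saturating_sub(18)).find(|&i| {
            data[i..i + 4] == MAGIC
                // XLEN (little-endian, bytes 10..12) must leave room for the
                // 6-byte "BC" subfield that stores the total block size.
                && u16::from_le_bytes([data[i + 10], data[i + 11]]) >= 6
                && data[i + 12] == b'B'
                && data[i + 13] == b'C'
        })
    }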
Even if it doesn't use block-based compression, as long as there isn't a huge range of corrupted bytes, corruption offsets are usually identifiable: you quickly end up with invalid length-distance pairs and similar errors, although the error might be reported a few bytes after the actual corruption.
I was motivated some years ago to try recovering from these errors [1] when I was handling a DEFLATE compressed JSON file, where there seemed to be a single corrupted byte every dozen or so bytes in the stream. It looked like something you could recover from. If you output decompressed bytes as the stream was parsed, you could clearly see a prefix of the original JSON being recovered up to the first corruption.
In that case the decompressed payload was plaintext, but even with a binary format, something like kaitai-struct might give you an invalid offset to work from.
For these localized corruptions, it's possible to just brute-force one or two bytes around the reported offset and reliably fix the DEFLATE stream. That's not really doable once we're talking about a sequence of four or more corrupted bytes.
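To make the brute-force idea concrete, here's a rough sketch in Rust (assuming the flate2 crate; a real repair would also sanity-check the recovered prefix, e.g. that the JSON still parses):

    use std::io::Read;
    use flate2::read::DeflateDecoder; // assumption: the flate2 crate is available

    // Try every value for one suspected corrupted byte and keep the candidate
    // that lets the raw DEFLATE stream decompress the furthest.
    fn repair_one_byte(stream: &[u8], offset: usize) -> (u8, Vec<u8>) {
        let mut best = (stream[offset], Vec::new());
        for candidate in 0..=255u8 {
            let mut patched = stream.to_vec();
            patched[offset] = candidate;
            let mut out = Vec::new();
            // Errors are expected for bad candidates: decoding just stops early
            // and `out` keeps whatever prefix was recovered before the failure.
            let _ = DeflateDecoder::new(&patched[..]).read_to_end(&mut out);
            if out.len() > best.1.len() {
                best = (candidate, out);
            }
        }
        best
    }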
It's great. I've been using it as a single-user Matrix homeserver for a little more than one year now and haven't had any issue with it whatsoever. It's taking around 100 MiB of resident memory and consuming 0% of the CPU on my small server; I've used chat /clients/ that use ten times more than that.
I've been using UNIX-like machines (mostly Linux) since the mid-2000s and single-user machines have always been the exception rather than the norm everywhere I've been.
Even at home, I've set up multiple accounts for myself (main one, one for closed-source programs, one for gaming I can share with other people…) and for my family (to each their preferences, wallpaper and so on). Having two or three user sessions running at the same time is not uncommon. I'm probably the exception here, but I don't think Podman targets the regular home user anyway.
It's not obvious to me how Podmansh would revolutionize that, but I guess it's nice; I'll try it for sure.
My understanding is still imperfect, but I'll try to provide some info:
Not all messages are encrypted with the same key, and clients can only exchange keys while they are connected at the same time; if your clients are never online together (and the same goes for the sender's), they can't exchange their keys. When that happens, each client can only decrypt the subset of the messages for which it has the keys. Also note that clients only exchange their keys with other verified clients.
If you look at the “session_id” attribute of the JSON source of the messages, you'll see that for a given session (i.e. while the sender is logged in to a client), either all the messages are decrypted (which means you have the key for that session) or none of them are (which means you haven't received the key for that session yet).
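If you want to check that yourself, a quick way is to group the encrypted events by that attribute. A small sketch in Rust (assuming the serde_json crate; the field path follows the JSON of m.room.encrypted events):

    use std::collections::BTreeMap;
    use serde_json::Value; // assumption: the serde_json crate is available

    // Count encrypted events per Megolm session, using the session_id found
    // in each event's JSON source. Events sharing a session_id share a key,
    // which is why they become decryptable (or not) all together.
    fn events_per_session(events: &[Value]) -> BTreeMap<String, usize> {
        let mut counts = BTreeMap::new();
        for event in events {
            if let Some(id) = event
                .pointer("/content/session_id")
                .and_then(Value::as_str)
            {
                *counts.entry(id.to_owned()).or_insert(0) += 1;
            }
        }
        counts
    }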
My (imperfect) understanding is that this is because of how end-to-end encryption works: you not only need to receive the messages (which are stored on the server, so you don't have to worry about them as you can retrieve them when you want), but also the keys to decrypt these messages (which are only stored on the clients, so whether or not they are available depends on you).
Possibly, one of your clients has the keys needed to decrypt one of the messages, but you're using another client which doesn't. Things go back to normal when both are connected at the same time and can share the keys, or when the client of the sender is connected and still has the keys.
If you don't keep your clients connected all the time, you can use a secure backup on the server, so the clients can retrieve the encrypted keys from the server and decrypt them locally.
Not having the keys happens more often if one of the parties uses short-lived sessions (like only logging in from a private browser window, for example).
I've been using Conduit for 10 months now and I love it. Thank you so much for it!
I've two questions:
- Should we have any concerns about its future once you finish university? You seem to be by far the most active contributor, and I'm worried the project still depends on how much time you can afford to put into it;
- What is the best way for a Rust / Linux developer to make a first impactful contribution to Conduit? With 155 open issues on GitLab at the moment and no problem really standing out for me as a user, I don't know where to start :p
Thanks!
BTW I hope you land a great job; I'd happily recommend you where I work, but we don't have any office near Dortmund unfortunately… Feel free to reach out to me if Dortmund / remote is not a requirement.
1: I can't say how much time I will have for Conduit, but I think the project is in good shape and can reach a stable release without me working full-time on it; of course it will take longer.
2: I think a good way to start is to hang around in the Conduit Matrix room and see if any issues pop up. Often these are relatively simple things like "these logs should have more details" and are a good way to get started.
I've used quite a few different window managers (WMs) in the past, and I'm mostly sticking to awesome today. It shares with i3 the benefit of being highly configurable (even scriptable), which undoubtedly makes me much faster at navigating my windows than any other WM…
The exact monitor, window or new program I want to look at is always exactly one keystroke away, and so are moving, resizing, bringing to top and so on.
This did not happen by just installing this WM, though. I've had to tailor it to my specific needs, and that's still an ongoing process, as needs change over time. The WM makes it possible, but there's still work on the user side.
Also, I feel like the (real) benefits can easily be dwarfed by the inefficiencies of what's in the windows you're dealing with. What's the point of being able to open an app without leaving the home row of your keyboard, or of instantly focusing the right app window, if the first thing you need to do to use that app is to grab a mouse and click in ten different places? You get most of the benefits of i3, awesome and other similar WMs when you combine them with keyboard-driven programs.
Same here! I have no arthritis of any kind, but these are very thin gloves that do not get in the way when typing (unlike most other gloves) and still make a huge difference wrt cold hands. Got two additional pairs for my parents; they love them as much as I do.