
With parameters as specified by SHA3 it's a lot slower than BLAKE3

Keccak (SHA-3) is actually a good deal faster than BLAKE(1) in hardware. That is why NIST chose it: it has acceptable performance in software and very good performance in hardware.

KangarooTwelve / MarsupilamiFourteen are Keccak variants with fewer rounds; they should smoke BLAKE2 and probably even BLAKE3 in dedicated hardware. They also have tree hashing modes of operation, like the later BLAKE designs.

The BLAKE family is the right choice when you want the best possible software performance. Indeed, there are cases where you do not want hardware to outperform software (e.g. key derivation functions), and there some Salsa20/ChaCha20/BLAKE variant makes the most sense. The Keccak family fits when one already has dedicated hardware instructions (e.g. ARM already has a hardware-level Keccak engine; Intel is dragging its feet, but it is only a matter of time) or is willing to trade software performance for more hardware performance.

Keccak code is here: https://github.com/XKCP/XKCP




> they should smoke BLAKE2 and probably even BLAKE3 in dedicated hardware

That's certainly possible, but there are subtleties to watch out for. To really take advantage of the tree structure in K12, you need a vectorized implementation of the Keccak permutation. For comparison, there are vectorized implementations of the AES block cipher, and these are very useful for optimizing AES-CTR. This ends up being one of the strengths of CTR mode compared to some other modes like CBC, which can't process multiple blocks in parallel, at least not for a single input.
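To make the CTR versus CBC point concrete, here is a minimal sketch of the two data-dependency patterns. Everything in it is an assumption for illustration: `toy_block_fn` is a keyed BLAKE2b call standing in for a block cipher (it is not AES and not invertible), and the thread pool stands in for SIMD lanes. It only shows which blocks can be computed independently.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # toy block size in bytes


def toy_block_fn(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one block cipher call (NOT AES)."""
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_keystream_block(key: bytes, nonce: bytes, i: int) -> bytes:
    # Depends only on (key, nonce, i), so any block can be computed in isolation.
    return toy_block_fn(key, nonce + i.to_bytes(8, "big"))


def ctr_encrypt(key: bytes, nonce: bytes, blocks: list, workers: int = 4) -> list:
    # All keystream blocks are independent, so they can be mapped in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stream = list(
            pool.map(lambda i: ctr_keystream_block(key, nonce, i), range(len(blocks)))
        )
    return [xor(b, s) for b, s in zip(blocks, stream)]


def chained_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # CBC-style chain: block i cannot start until block i-1 has finished.
    out, prev = [], iv
    for b in blocks:
        prev = toy_block_fn(key, xor(b, prev))
        out.append(prev)
    return out
```

In the CTR-style function, changing one input block changes only the matching output block. In the chained function, it changes every output from that point on, which is exactly why the loop cannot be split across lanes for a single input.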

So one subtlety we have to think about is that the sponge construction inside SHA-3 looks more like CBC mode than like CTR mode. The blocks form a chain, which means that a vectorized implementation of Keccak can't benefit SHA-3 itself, again at least not for hashing a single input. So if this is going to be provided in hardware, it will have to be specifically with other constructions like K12 in mind. That could happen, but it might be harder to justify the cost in chip area. (At this point I'm out of my depth. I have no idea what Intel or ARM are planning. Or maybe vectorized hardware implementations of Keccak already exist and I'm just writing nonsense.)
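A rough sketch of the difference, using only Python's standard `hashlib`. The tree mode here is a made-up toy (the `CHUNK` size, the `0x00`/`0x01` domain-separation prefixes, and the `toy_tree_hash` name are all assumptions of mine); it is not K12 and its output matches no standard. The point is only that leaves can be hashed independently, while plain SHA-3 has to absorb its input in order.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8192  # arbitrary leaf size for this toy example


def sequential_sha3(data: bytes) -> bytes:
    # Plain SHA-3: every update continues from the previous sponge state,
    # so chunk i cannot be absorbed before chunk i-1.
    h = hashlib.sha3_256()
    for i in range(0, len(data), CHUNK):
        h.update(data[i:i + CHUNK])
    return h.digest()


def toy_tree_hash(data: bytes, workers: int = 4) -> bytes:
    # Toy two-level tree: hash each chunk on its own, then hash the leaf digests.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]

    def leaf(chunk: bytes) -> bytes:
        return hashlib.sha3_256(b"\x00" + chunk).digest()

    # Leaves share no state, so the pool (standing in for SIMD lanes)
    # may process them in any order or all at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(leaf, chunks))

    root = hashlib.sha3_256(
        b"\x01" + len(chunks).to_bytes(8, "big") + b"".join(leaves)
    )
    return root.digest()
```

The chunked `update` loop produces the same digest as a single `hashlib.sha3_256(data)` call, which is another way of seeing that the chain leaves no room for reordering. The tree version gives a different digest by construction, so a hardware unit built for one does not automatically speed up the other.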


Is the SHA-3 hardware support already available in existing and widely used hardware?


Last time I looked, it was available on ARM, but there were no Intel ISA SHA-3 instructions yet.
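If you want to check a particular machine, one rough way is to look at the CPU feature flags. The sketch below assumes Linux on AArch64, where the kernel reports a `sha3` entry on the `Features` line of `/proc/cpuinfo` when the optional SHA-3 instructions are present. The function names are mine, and other operating systems expose this differently.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # AArch64 kernels use "Features"; x86 kernels use "flags".
        if key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return flags


def has_arm_sha3(cpuinfo_text: str) -> bool:
    # Assumption: the kernel names the SHA-3 extension "sha3" in this list.
    return "sha3" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("sha3 flag present:", has_arm_sha3(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

A missing flag only tells you the kernel did not report the extension; it says nothing about how fast a software implementation will be on that CPU.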



