My employer produces low-power devices with hardware cryptography built into them. Without the crypto hardware, almost all crypto (including ECC) is too slow for practical use. It's all well and good to have "crypto agility" but that ends when it comes to depending on silicon. So if we're making our 20 year plan for, say, the next generation hardware platform that we're going to invest many millions of dollars and thousands of engineer hours to build, then which PQC algorithms will we select to be built into our platform? It's very unclear at this point.
Certainly we can be flexible in key and cert sizes, but also I happen to live in a world where a 1200 byte MTU actually matters a great deal, so it's easier to just push the requirement for dealing with enormous certificates down the road for the day when we actually have enormous certificates. Future-proofing isn't an issue yet because legacy devices will never be able to do PQC.
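To make the MTU point concrete, here is a rough sketch of how many packets a blob such as a certificate chain needs at a 1200 byte MTU. The sizes and the 48-byte per-packet header are hypothetical placeholders, not measurements of any real scheme or protocol.

```python
import math

def packets_needed(payload_bytes: int, mtu: int = 1200, header_bytes: int = 0) -> int:
    """Number of MTU-sized packets needed to carry payload_bytes."""
    room = mtu - header_bytes  # usable payload bytes per packet
    if room <= 0:
        raise ValueError("headers leave no room for payload")
    return math.ceil(payload_bytes / room)

# Hypothetical sizes, for illustration only.
for label, size in [("small chain", 600), ("large chain", 6000)]:
    count = packets_needed(size, header_bytes=48)
    print(f"{label}: {size} bytes -> {count} packet(s)")
```

Every extra packet is another radio transmission and another chance for loss and retransmission, which is where the cost shows up on a constrained link.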
> So if we're making our 20 year plan for, say, the next generation hardware platform that we're going to invest many millions of dollars and thousands of engineer hours to build, then which PQC algorithms will we select to be built into our platform? It's very unclear at this point.
We've deprecated SSL 2.0 (2011), SSL 3.0 (2015), and TLS 1.0 & 1.1 (2020). We've gone from "use any >128-bits cipher", to "don't use SSL CBC but RC4 is okay" (POODLE, BEAST), to "RC4 is not okay" (Bar Mitzvah, NOMORE), to "only use AEAD ciphers".
All in the last ten years.
If you have to add support for crypto acceleration in your products, perhaps look into FPGAs?
One of the premises of DJB has been implementing algorithms that are efficient on off-the-shelf hardware. Would you say that is wrong? If not, would it be better to buy hardware with generic vector units that newer algorithms, designed for the more common hardware, can use?
The marketplace is clearly willing to implement hardware speedups for crypto, and in the low-power market where consumption of joules is tracked with multi-digit precision, shuttling expensive power operations to specialized hardware that can be powered off when not needed (e.g. after transition into a symmetric scheme) is a great thing. So I have to wonder if chasing off-the-shelf efficiency (e.g. in-CPU) is maybe not the right thing. If I have to power up a beefy CPU just so that I can do a key exchange or validation every few months, and pay for that the entire time it is not being utilized, that's just not a good power equation.
One of the benefits of SIDH and friends is the ability to re-use some existing ECC hardware. This is a very good thing for key exchange, but the signature problem remains challenging with enormous keys or signatures.
Is that a pertinent question? The availability of devices with integrated cryptography is very, very low due to ITAR. Perhaps the only thing I have encountered is a Bluetooth controller.
Many things are not so much space constrained as they are cost constrained. It would be easier to put in a core high power enough to get acceptable performance for the one connection it needs to service.
It will be more expensive, probably in terms of development work. Will people do it? Probably not, but people weren't doing security right anyway.
We're not talking about desktop machines or even mobile phones. I'm talking about specialized sensors that are sold to big customers who deploy millions of them at a time. A 10-cent increase in the cost to manufacture easily results in millions of dollars lost over the lifetime of the product's sale and operation. We also sell battery operated devices expected to continue operating flawlessly for 20 years, on a small clutch of AA batteries. So no, we don't just "put in a core high power enough" because that boosts the production cost, eats into the power budget, and causes us to lose contracts.
We are absolutely watching the PQC space, but we absolutely will not move at all, beyond experimental toys, until NIST is done their first round and maybe not even then if there aren't any actual QCs around doing real work.
Also, pull out your wallet. I bet you can find several devices with embedded crypto. All of my debit cards, for example.
Yeah, I know all of this. Smart cards are kind of the exception because they aren't reprogrammable. But security costs money, and as you've made clear, your business and customers prefer cost to security.
So they prefer cost to security. That's fine, most people do. If being quantum resistant was a priority they would figure out a way to do it and it may be similar to what I described, or not; but it could happen without hardware implementation if it was truly desired.
Nearly every SoC you can buy today has hardware accelerators in it, from STM32s up to Xeons. You have to be looking at really tiny, generally pretty old micros before you literally don't have any.
On top of that, hitting hardware speeds by putting in faster cores just isn't a thing for most parts. It's pretty easy to get 8-9x throughput wins on many primitives with a hardware accelerator, but getting a similar improvement just by getting bigger chips is often impossible and always expensive.
> Nearly every SoC you can buy today has hardware accelerators in it
True, but few are full-featured HW acceleration SoCs. Most support a few operations, for instance AES-ECB and maybe AES-CBC, but if you want AES-CCM or AES-GCM you still need to implement parts of it in software. The HW may be super fast at ECB:ing many blocks of memory, but the setup cost is steep, so when you need to ECB just a single block (for your counter in CCM) it buys you very little performance gain over just doing ECB in SW. (Of course, what you do then is set up several counters one after another in a larger block of memory, which is OK because the counters are just increments, and ECB a bunch of blocks at once. Next you need to solve how to do the same to get CBCMAC with just CBC HW...)
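A minimal sketch of the counter-batching trick described above: lay out many counter blocks in one buffer, push the whole buffer through the ECB engine in a single call, then XOR the result with the data in software. The `make_toy_ecb` helper is a hypothetical stand-in for the accelerator (it is a hash-based placeholder, not AES), and the 12-byte nonce plus 32-bit big-endian counter layout is an assumption chosen for illustration.

```python
import hashlib
from typing import Callable

BLOCK = 16  # AES block size in bytes

def make_toy_ecb(key: bytes) -> Callable[[bytes], bytes]:
    """Hypothetical stand-in for a hardware ECB engine. NOT AES."""
    def ecb_encrypt(data: bytes) -> bytes:
        assert len(data) % BLOCK == 0
        out = bytearray()
        # Each 16-byte block is transformed independently, as in ECB.
        for i in range(0, len(data), BLOCK):
            out += hashlib.sha256(key + data[i:i + BLOCK]).digest()[:BLOCK]
        return bytes(out)
    return ecb_encrypt

def ctr_xor(ecb_encrypt: Callable[[bytes], bytes], nonce: bytes,
            data: bytes, first_counter: int = 0) -> bytes:
    """CTR-mode encrypt/decrypt using one batched ECB call."""
    assert len(nonce) == 12
    nblocks = -(-len(data) // BLOCK)  # ceiling division
    # Counter handling stays in software: build all counter blocks up front.
    counters = b"".join(
        nonce + (first_counter + i).to_bytes(4, "big") for i in range(nblocks)
    )
    keystream = ecb_encrypt(counters)  # single "accelerator" invocation
    # The XOR also stays in software; zip() truncates to len(data).
    return bytes(d ^ k for d, k in zip(data, keystream))

engine = make_toy_ecb(b"k" * 16)
message = b"sensor reading 0042, battery 87 percent"
ciphertext = ctr_xor(engine, bytes(12), message)
print(ctr_xor(engine, bytes(12), ciphertext) == message)
```

The setup cost is paid once per buffer rather than once per block, at the price of the extra memory for the counter buffer.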
This is just moving the goalposts. First it was "crypto accelerators are rare because ITAR", now it's "crypto accelerators are rare because they don't buy you much". Neither is true.
Crypto accelerators are extremely common, including those that implement full cryptosystems or even complete protocols. Nearly every wireless part will have them (especially for CCMP), as well as basically every modern+common consumer device SoC (eg, all Qualcomm, Samsung, Apple, AMD, and Intel parts). Several of these actually have overlapping accelerators for eg memory encryption or wireless (full protocol) and acceleration instructions like those for ARMv8. And they are there because they work.
Setup cost is a thing, but A) is largely paid when you rekey and therefore rarely for most protocols, B) is acceptable in many protocols because you can interleave other operations to prevent port contention without sacrificing throughput, and C) is often buried by the cost of a very small number of blocks, or even just one.
He didn't move the goalposts and usefully expanded on my point. Those devices you're talking about notably adhere to other external standards and are not typically user reprogrammable (where user is the integrator). Also important is that I would not consider them secure in general due to the standards they implement. You also certainly realize that their power consumption, when present, massively dwarfs the type of processor we were first discussing?
By the time you get to the ARMv8 accelerators, yes, you're going to exactly the same place I was arguing we should go with my original comment. There's actually a number of primitives that could be reused for various systems.
The original claim was that these parts were rare because of ITAR. They aren't rare, and ITAR doesn't have much to do with where they're present or absent. Shifting the argument to a different point about a specific accelerator or specific class of parts is exactly as I said: moving the goalposts.
The question of whether they're user programmable or not is nearer to the mark because EAR cares about it, but it still doesn't present a formidable barrier-- at least, I've been shipping parts with crypto accelerators at various levels of user configurability for a long time, and so has everybody else.
> Setup cost is a thing, but A) is largely paid when you rekey
Well, it depends on the crypto HW. Some HW is designed for "throughput", which is completely useless for ECB but looks good on specs ("Our HW AES 10MB/s!"). So you set it up with src, dest and key pretty much as you set up your typical DMA transfer, only you almost never want to encrypt more than 16 bytes at a time with ECB, so it's mostly wasted.
> consumer device SoC (eg, all Qualcomm, Samsung, Apple, AMD, and Intel parts)
We are not all so fortunate that we get to work with such powerful SoCs. In my job it's mostly small embedded MPUs.
> B) is acceptable in many protocols
I think we are talking past each other here. I haven't even gotten to the protocol part yet. In order to support a wireless and/or network protocol you will need better building blocks than AES-ECB. You need AES-GCM (or at least AES-CCM). Not to mention ECDSA or RSA(>=3072)...
1/ They implement the expensive parts of a primitive for you and let you chain them together. This is how AES-NI and the ARMv8 crypto extensions work. Performance for these is generally measured in terms of cycle latency, or with a reference piece of software in cycles per byte. Common values for cycles per byte are anywhere from about 0.2 to 30. Much higher than that and people will start to go look at software as an option. You tend to see these on beefy systems with out-of-order cores.
2/ They implement a primitive for you, eg AES-ECB or SHA256, or more rarely AES-GCM and similar. These can then be chained together as with the above to build even higher level primitives like AES-CTR or AES-CCM, or they can be used as-is. These are usually found on micros as additional selling points, and therefore show up just above the bottom of most manufacturers' product lines as an upsell. These are typically measured in something like MB/s throughput, and I assume they're what you're focused on.
3/ They implement a full protocol, like TLS, CCMP, or secure boot. These show up on things that might more properly deserve the term SoC rather than microcontroller, largely because they tend to be attached to high-speed I/O. They generally aren't measured for cryptographic performance but rather for the performance of the implemented protocol.
In my mind, all three of these are using crypto accelerators. Taken together it is extremely common that a part will have one or more of these, and I'm not sure if we're still disagreeing on one or both of those points.
Regarding ECB, I don't know what you mean. Almost nobody uses ECB alone (thank goodness). Even if they have an accelerator for it, it's usually used to implement something like CTR with some software to glue it together (maybe with then yet more glue to do GCM). In that way, those accelerators act like a just-barely-higher-level version of the first type-- and if what you have is the first type of course you'll do that no matter what. This is still an accelerated implementation, it's just not 100% done in the accelerator. Of course, if you're doing that you're very often encrypting more than a block at a time. And because it's quite rare that you will be performance bottlenecked on a small infrequent operation in any context, you generally only do the work to turn on the accelerators when you care about that.
Regarding working on MCUs, I agree there's a minimum size past which you don't get crypto primitives, but overall don't think characterizing those parts as modern SoCs is terribly accurate (which was my claim).
Regarding needing better building blocks than ECB for a protocol... well, no, not necessarily. AES-NI doesn't even give you a full AES primitive, and yet it's extremely widely used.
Yes, my main experience is with 2) and these are pretty "modern" (as in recently released MCUs) that support AES-ECB (and maybe a few more in HW). These are not ARMv8 but Cortex-M level MCUs.
The problem, and the point I'm trying to make, is that a few platforms implement their ECB support in a way that makes it almost useless as a building block. They do not do it as processor instructions the way it's done in x86 (the right way IMHO); instead it's implemented in a separate co-processor that you program in a similar manner as you set up a typical DMA transfer. If you aim to encrypt 1KB or more, the setup cost for this is negligible and you can get a comparatively good speed. However, as we both agree, there are very few cases (if any) where you _actually_ want to run ECB over 1KB blocks at a time. When you want to build something like CTR (or CBC), what you need is a fast way to ECB _a single_ AES block (i.e. 16 bytes). With this kind of solution, setting up the co-processor eats up almost all of the gain from doing ECB in HW rather than SW, because the cost of the setup (it's I/O after all) comes close to the cost of a SW-only ECB of 16 bytes.
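The trade-off above can be put into a tiny cost model: the accelerator pays a fixed setup cost plus a small per-block cost, while software pays only a larger per-block cost. All cycle counts below are hypothetical numbers picked to illustrate the shape of the argument, not figures from any real part.

```python
from typing import Optional

def hw_cycles(nblocks: int, setup: int, per_block_hw: int) -> int:
    """Cost of one accelerator job: fixed setup plus per-block work."""
    return setup + nblocks * per_block_hw

def sw_cycles(nblocks: int, per_block_sw: int) -> int:
    """Cost of doing the same blocks in software."""
    return nblocks * per_block_sw

def break_even_blocks(setup: int, per_block_hw: int,
                      per_block_sw: int) -> Optional[int]:
    """Smallest block count at which the accelerator is strictly faster."""
    if per_block_hw >= per_block_sw:
        return None  # the accelerator never wins
    # setup + n*hw < n*sw  <=>  n > setup / (sw - hw)
    return setup // (per_block_sw - per_block_hw) + 1

# Hypothetical costs in cycles: setup 900, HW 50 per block, SW 1000 per block.
SETUP, HW, SW = 900, 50, 1000
for n in (1, 64):  # one 16-byte block vs. 1 KB
    speedup = sw_cycles(n, SW) / hw_cycles(n, SETUP, HW)
    print(f"{n} block(s): speedup {speedup:.2f}x")
```

With these made-up numbers a single block gains only about 5% from the accelerator (950 vs. 1000 cycles), while a 1 KB job gains more than an order of magnitude, which is the asymmetry being described.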
Hmm? With CTR you usually just want to fill a long buffer with the appropriate counters and then shove it all through the accelerator. The resulting stream can then be used until exhausted by whatever higher level primitive you're working with. Obviously there's a trade-off in sizing the buffer correctly, but dozens of blocks would be more typical than one.
Yes, and that is what I said in my very first post in this thread. Still you need to handle the counters in SW, do the XORs in SW (unless you have some HW that does that for you as well) and then if you want CCM you need to solve CBCMAC (maybe you have CBC in HW but then there's the memory trade-off again). If you want GCM you need to do BigInt muls (Cortex-M MCUs do not support 128 bit muls). So either way you end up doing pretty substantial parts of it in SW which limits the usability.
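For reference, the multiply GCM relies on (inside GHASH) is a carry-less multiplication in GF(2^128), as specified in NIST SP 800-38D, so ordinary integer multiply instructions don't map onto it directly. Below is a minimal bit-by-bit sketch of that operation, assuming blocks are held as Python ints with the first bit of the block as the most significant bit. It is meant to show how much work lands in software, not to be fast or constant-time.

```python
R = 0xE1 << 120  # reduction constant: bits 11100001 followed by 120 zeros

def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GCM's GF(2^128) representation."""
    z = 0
    v = y
    for i in range(128):
        # Walk the bits of x from the leftmost (bit 127 of the int).
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field element "x": shift right, reduce if needed.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z

ONE = 1 << 127  # the multiplicative identity in this bit ordering
a = int.from_bytes(bytes(range(16)), "big")
print(gf128_mul(ONE, a) == a)
```

That is 128 conditional XOR and shift steps on 128-bit values per block, each of which takes several 32-bit operations on a Cortex-M class core unless table-based tricks are used.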
> Nearly every SoC you can buy today has hardware accelerators in it, from STM32s up to Xeons. You have to be looking at really tiny, generally pretty old micros before you literally don't have any.
Well... the SoC in Raspberry Pi 4 doesn't have one. Although it does have enough CPU (and in theory GPU) oomph to still do crypto at reasonable rates, AES-128 at 85 MB/s per CPU core.
Not that I know of, but they're so cagey on details for those parts I wouldn't be surprised if it did and they just hadn't documented it. Certainly lots of quasi-similar boards like the espressobin have them (which I like better for the topaz switch anyway).
That is an exaggeration in my opinion. Many µCs don't have a real use case for encryption, and neither do many sensors.
Maybe that is what you meant by "really tiny"; if so, forget about it. But I would think that a lot more of these tiny chips are sold than fully featured 32-bit ARM processors.
That covers AES and hashing. Try generating a 2048-bit RSA key on a Cortex-M. It will take minutes. ECC is thankfully more performant on resource constrained devices.
The premise is not broken at all, for us.