Make Sure DSA Signing Exponentiations Really are Constant-Time
"...the OpenSSL team committed two code changes relevant to this work. The first adds a “constant-time” implementation of modular exponentiation..."
"The execution time of the “constant-time” implementation still depends on the bit length of the exponent, which in the case of DSA should be kept secret [12, 15, 27]. The second commit aims to “make sure DSA signing exponentiations really are constant-time” by ensuring that the bit length of the exponent is fixed."
"...While the procedure in this commit ensures that the bit length of the sum kq is fixed, unfortunately it introduces a software defect. The function BN_copy is not designed to propagate flags from the source to the destination. In fact, OpenSSL exposes a distinct API BN_with_flags for that functionality..."
"In contrast, with BN_copy the BN_FLG_CONSTTIME flag does not propagate to kq. Consequently, the sum is not treated as secret, reverting the change made in the first commit..."
Exploitation requires a local 'spy' process recording timing signals while the handshakes are running. I assume this is an unprivileged process, otherwise wouldn't the key be directly accessible?
Yes, the posited attacker is an unprivileged spy process sharing a cache with the victim. The FLUSH+RELOAD cache-timing algorithm they use relies on a shared cache. See section 2.2.
Of course, there may be other ways to extract the same data remotely. Bernstein's earlier paper[1] demonstrating cache-timing attacks on AES over the network is an example. He sent many packets of different sizes to evict different lines from cache. Compared to FLUSH+RELOAD, Bernstein's technique is extremely low-resolution; I don't believe anyone has ever demonstrated it against a typical, real-world server configuration.
"...our client saves the hash of
the concatenated string and the digital signature raw bytes
sent from the server. All subsequent messages, including
SSH_MSG_NEWKEYS and any client responses, are not required
by our attack. Our client therefore drops the connection at
this stage, and repeats this process several hundred times to
build up a set of distinct trace, digital signature, and digest
tuples. See Section 6 for our explicit attack parameters.
Figure 3 is a typical signal extracted by our spy program in
parallel to the handshake between our client and the victim
SSH server."
This inclines me to believe that an attack can be executed via connections to a vulnerable SSH server, not from a local process on the SSH server.
This is the only point where I wish the paper was more clear...
"Similar to Section 5.1, we wrote a custom SSH client that launches our spy program, the spy program collects the timing signals during the handshake. At the same time it performs an SSH handshake where the protocol messages and the digital signature are collected for our attack."
I think that 'spy process' must be local to observe the low-level cache timings they are using to perform the attack, but I haven't found anywhere they come out and say that explicitly.
Exactly. The paper is frustratingly unclear on what the exploit scenario is.
"...we wrote a custom SSH client that
launches our spy program, the spy program collects the timing
signals during the handshake..."
I have a bad feeling that this means an attacker with a custom client can extract private key information over the network by repeatedly establishing connections with a vulnerable server.
I believe they're talking about a test harness they built, to synchronize collection of cache traces with the execution of the handshake. FLUSH+RELOAD requires a co-resident attacker (they need to be on the same hardware, though not necessarily the same VM).
Oh, and in case you're wondering why was the constant time implementation put behind a flag anyway, the paper explains it's because the constant-time algorithm is slower than the alternate implementation, so constant-time is only supposed to be used when it's actually needed to protect a secret. Of course that opens the potential for not using constant-time when you were supposed to, which is exactly what happened here.
I'm not sure how it was missed that the constant-time code path wasn't actually being used in this case. If you put your security-critical DSA bugfix behind a flag, wouldn't you check the flag actually propagated to the internal function for DSA operations?
I think this would have been a much smarter way to code it! Default slow and safe, require an added flag for more performance. That would make it much harder to end up on the timing-sensitive / high performance implementation by mistake.
Probably the trap was since the constant-time algorithm was the new code, "backward compatibility" would suggest adding a flag to select the new algorithm.
You might be misinterpreting the code. Not all algorithms are vulnerable to this attack, so OpenSSL might be picking the faster version for those specific algorithms, and the constant time version for the others.
As others have mentioned, the attack requires a local process that has a shared cache, so it would also require a server that has a remote code execution vulnerability.
My memory is that there are ways to punch through virtualization that AWS uses and get insight into cache hits/misses of other tenants, but I'm not sure if that applies here.
In that world, you just rent some time on a large AWS instance and sit around mucking with the cache timing waiting for another tenant to do some cryptography.
I can't find the link to the paper now, but my thoughts went to combining these two immediately -- you can get a local process without remote code execution if you just log onto your shiny new multitenant virtual server.
"...the OpenSSL team committed two code changes relevant to this work. The first adds a “constant-time” implementation of modular exponentiation..."
"The execution time of the “constant-time” implementation still depends on the bit length of the exponent, which in the case of DSA should be kept secret [12, 15, 27]. The second commit aims to “make sure DSA signing exponentiations really are constant-time” by ensuring that the bit length of the exponent is fixed."
"...While the procedure in this commit ensures that the bit length of the sum kq is fixed, unfortunately it introduces a software defect. The function BN_copy is not designed to propagate flags from the source to the destination. In fact, OpenSSL exposes a distinct API BN_with_flags for that functionality..."
"In contrast, with BN_copy the BN_FLG_CONSTTIME flag does not propagate to kq. Consequently, the sum is not treated as secret, reverting the change made in the first commit..."
Exploitation requires a local 'spy' process recording timing signals while the handshakes are running. I assume this is an unprivileged process, otherwise wouldn't the key be directly accessible?