That piece of CSS appeared relatively recently. I guess it's a weird side effect: that page was supposed to annoy web hipsters, not challenge them to restore vintage tag behavior.
They're probably referring to the bug/"feature" in Linux's implementation, which will happily return data before the system has gathered enough entropy, instead of the sane behavior of blocking only until it has been seeded for the first time.
Fortunately major distros work around this bug, so it's only an issue in unusual cases, like cloud VMs.
That link lays out the single most important technical detail:
> FreeBSD’s kernel crypto RNG doesn’t block regardless of whether you use /dev/random or urandom. Unless it hasn’t been seeded, in which case both block. This behavior, unlike Linux’s, makes sense. Linux should adopt it.
It boils down to a tricky and potentially misleading interface. Abstractions are leaky beasts, and if there are many ways to get apparently identical results, we will use the one that most closely aligns with our usual way of thinking.
Security is hard to get right. Cryptographic security depends on entropy, so getting sufficient entropy should be hard too. Right?
Maybe the default answer should be "yeah, right" instead.
It depends what type of random number you need. Having one interface isn't sufficient to describe the types of random applications need. A load balancing system probably doesn't have the same requirements as the RSA private key generation algorithm.
Would it help to think of it as a (kernel) RNG daemon that you're trying to connect to, that doesn't finish starting up until it's seeded? That's basically what the blocking means, in the OpenBSD case.
Which is the correct behavior. `/dev/urandom` should really be the only source of randomness on Linux. Mac [0] got this right, FreeBSD [1] gets this right. I totally agree with sockpuppet. Solving the tabula rasa system boot is a separate issue. Temporarily blocking for seeding is fine; my "shouldn't" was an RFC-style SHOULD NOT.
Should your filesystem just start returning a stream of nulls or other deterministic data if the device isn't ready yet? If there's no entropy to pull from, then the kernel shouldn't try to pretend that there is.
/dev/urandom is a perfectly fine source of machine-generated randomness. To break /dev/urandom (but not /dev/random), you need to find a corner case where you can break the cryptographically secure random number generator (CSPRNG) only when you can make certain guesses about its seed values. Since recovering those seed values from the CSPRNG's output is itself a hard problem, and predicting what the CSPRNG will eventually output is also a hard problem, this is pretty unlikely.
But let's quote the kernel source on the matter[1], just to be clear:
> The two other interfaces are two character devices /dev/random and /dev/urandom. /dev/random is suitable for use when very high quality randomness is desired (for example, for key generation or one-time pads), as it will only return a maximum of the number of bits of randomness (as estimated by the random number generator) contained in the entropy pool.

> The /dev/urandom device does not have this limit, and will return as many bytes as are requested. As more and more random bytes are requested without giving time for the entropy pool to recharge, this will result in random numbers that are merely cryptographically strong. For many applications, however, this is acceptable.
The real point to be made here is that, yes, /dev/random is theoretically better - but for many applications, letting /dev/random hang to wait for entropy is worse than having /dev/urandom use a CSPRNG in a way that is generally recognized to be secure.
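For concreteness, here is a minimal sketch (mine, not from the article or the kernel) of how an application might pull bytes from /dev/urandom, coping with short reads and interrupted system calls:

    /* Minimal sketch: read n bytes of randomness from /dev/urandom,
     * handling short reads and EINTR. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    int urandom_bytes(unsigned char *buf, size_t n)
    {
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd == -1)
            return -1;

        size_t got = 0;
        while (got < n) {
            ssize_t r = read(fd, buf + got, n - got);
            if (r > 0) {
                got += (size_t)r;
            } else if (r == -1 && errno == EINTR) {
                continue;            /* interrupted by a signal; retry */
            } else {
                close(fd);
                return -1;           /* real error (or unexpected EOF) */
            }
        }
        close(fd);
        return 0;
    }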
I would like to add that the original article is talking about using /dev/urandom to generate long-lived keys, not session keys or similar. In that case, blocking to gather entropy is sometimes acceptable, since the fact that the key is long-lived implies that you don't do this very often. The argument for /dev/urandom only carries weight when you are making a tradeoff for non-blocking behavior (which is 99% of the time). As such, there is nothing wrong with being slightly paranoid and using /dev/random if you can afford the time spent collecting entropy.
My understanding is that /dev/urandom is perfectly fine for almost all cases, including session keys and such, but (if only for the sake of paranoia) for long-lived keys it is worth the potential extra time waiting for /dev/random to serve what you need.
The key thing to remember is that when the pool has sufficient entropy there is no difference between /dev/random and /dev/urandom, and when the pool is low there is practically no difference either: the quality of the PRNG means it is practically impossible to tell the two outputs apart (take a few thousand bits from each at a time and see if any statistical analysis can reliably tell the difference).
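If you want to try that yourself, here is a toy sketch of the idea (my own code; a large read from /dev/random may block for a while, and a serious comparison would need a full statistical battery such as dieharder, which still cannot distinguish a good CSPRNG from true randomness):

    /* Toy illustration: count set bits in a few thousand bits from each
     * device.  This is nowhere near a real statistical test. */
    #include <stdio.h>

    static long ones_in(const char *path, size_t nbytes)
    {
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;
        long ones = 0;
        for (size_t i = 0; i < nbytes; i++) {
            int c = fgetc(f);
            if (c == EOF)
                break;
            for (int b = 0; b < 8; b++)
                ones += (c >> b) & 1;
        }
        fclose(f);
        return ones;
    }

    int main(void)
    {
        size_t nbytes = 1024;   /* 8192 bits from each device */
        printf("/dev/random:  %ld ones\n", ones_in("/dev/random", nbytes));
        printf("/dev/urandom: %ld ones\n", ones_in("/dev/urandom", nbytes));
        return 0;
    }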
It is increasingly common for CPUs and/or related chipsets to have a built-in TRNG, so keeping the entropy pool "topped up" is getting easier by feeding the pool from those using rng-tools. The SoC the RPi is based around has an RNG that pushes out more than 500 kbit/s, for instance.
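For reference, what rng-tools' rngd does boils down to roughly this (a rough sketch of my own, assuming a /dev/hwrng device exposed by the kernel's hw_random driver; it needs root, and real rngd also health-tests the hardware output before crediting it):

    /* Read bytes from a hardware RNG and credit them to the kernel
     * entropy pool via the RNDADDENTROPY ioctl. */
    #include <fcntl.h>
    #include <linux/random.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int hw = open("/dev/hwrng", O_RDONLY);
        int rnd = open("/dev/random", O_WRONLY);
        if (hw == -1 || rnd == -1)
            return 1;

        unsigned char buf[32];
        struct rand_pool_info *info = malloc(sizeof(*info) + sizeof(buf));
        if (!info)
            return 1;

        for (;;) {
            ssize_t r = read(hw, buf, sizeof(buf));
            if (r <= 0)
                break;
            info->entropy_count = (int)r * 8;   /* credit in bits */
            info->buf_size = (int)r;            /* payload size in bytes */
            memcpy(info->buf, buf, (size_t)r);
            if (ioctl(rnd, RNDADDENTROPY, info) == -1)
                break;
        }
        return 0;
    }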
Your understanding is common, is stated explicitly in the manpage, and is unfortunately incorrect.
/dev/random and /dev/urandom both even use the same CSPRNG behind the scenes. The former tries to maintain a count of the estimated entropy, but this is a meaningless distinction. CSPRNGs can't run out of entropy (for instance, a stream cipher is essentially a non-reseeded CSPRNG that works by generating an arbitrarily long sequence of computationally random bits that can be XORed against a plaintext).
There might be a meaningful distinction if /dev/random provided "true" randomness (and could therefore be used for something like an OTP). But it doesn't. Both use the same CSPRNG algorithm.
I understand that both use the same CSPRNG and the same seed source(s) for entropy; the difference is that one will block if those sources have not produced enough input recently (the "pool count" is too low).
There is some genuine randomness there, as the entropy sources are (unlike the PRNG) not deterministic: they take whitened fractional values from I/O timings (time between key presses and mouse events, and some aspects of physical drive I/O; the low bits of such timings are essentially random noise if the timer is granular enough).
/dev/urandom uses the CSPRNG in whatever state it is in; /dev/random waits until it considers the CSPRNG to have been sufficiently randomly reseeded. In cases where the current state is considered random enough (the pool count is high, so /dev/random will not block) you get output of the same quality from either /dev/random or /dev/urandom.
That assumes it has been seeded with enough entropy, though. If you have just booted and haven't gathered/seeded enough entropy yet, then /dev/urandom can potentially give you predictable values, whereas /dev/random would be safer as it waits until it has enough entropy.
Too bad there isn't a way to tell whether the CSPRNG has been seeded or not.
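For what it's worth, the getrandom(2) syscall that recently landed for Linux 3.17 is meant to expose exactly that: with GRND_NONBLOCK it fails with EAGAIN until the kernel CSPRNG has been seeded. A minimal sketch, assuming a libc new enough to ship <sys/random.h>:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/random.h>

    int main(void)
    {
        unsigned char buf[16];
        ssize_t r = getrandom(buf, sizeof(buf), GRND_NONBLOCK);

        if (r == (ssize_t)sizeof(buf))
            printf("CSPRNG is seeded; got %zd bytes\n", r);
        else if (r == -1 && errno == EAGAIN)
            printf("CSPRNG not yet seeded\n");
        else
            perror("getrandom");
        return 0;
    }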
If you are being paranoid you might prefer to wait for ever for a good random value instead of accepting something you are even fractionally less sure of.
Though practically speaking, that would probably not be acceptable in most (if not all) circumstances.
If you are that paranoid then there are inexpensive true RNGs out there (free, in fact, if your CPU or other chipsets have one that is easily accessible) which can provide enough bits for all but the larger bulk requirements (i.e. generating many keys in a short space of time). You can either use one of them directly for the process(es) that definitely want truly random bits, or feed their output into the standard entropy pool.
Probably? It's fine to use /dev/urandom as a seed for random number generators, and for most applications it is safe. But I think within SSL/TLS implementations, there could be reasons to use their own cryptographic PRNG. For one thing, it's easier to reason about in a platform independent way. On modern Linux kernels, /dev/urandom is Probably Safe(tm). But what about everything else? That's where it gets murkier.
> On modern Linux kernels, /dev/urandom is Probably Safe(tm). But what about everything else? That's where it gets murkier.
No. That argument is exactly why I didn't just use /dev/urandom in PyCrypto's userspace RNG when I wrote it in 2008. The result was 5 years of a catastrophic failure in certain cases where fork() is used, even though I specifically designed it to cope with fork(). If someone hadn't made that argument, PyCrypto wouldn't have had a catastrophic failure mode that went undetected for 5 years until I stumbled across it: CVE-2013-1445 http://www.openwall.com/lists/oss-security/2013/10/17/3
It is surprisingly difficult to implement a fast, reliable CSPRNG in a crypto library. There are innumerable things that can leak or corrupt your state, which compromises everything. You can leak state as a result of multithreading, fork(), signal-handling, etc., and libraries generally can't cope with that without having complicated APIs that application developers WILL misuse, causing silent security failures for end-users that go unnoticed for years. Plus, since you're still relying on /dev/urandom anyway, it really only gives you another way to fail.
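To make the fork() hazard concrete, here is a toy demonstration (not PyCrypto's code; the xorshift generator is just a stand-in for a real CSPRNG's state): after fork(), parent and child hold identical RNG state and emit identical "random" output unless the library notices the fork and reseeds.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static uint64_t state;                 /* userspace RNG state */

    static void rng_seed(void)
    {
        int fd = open("/dev/urandom", O_RDONLY);
        read(fd, &state, sizeof(state));   /* error handling omitted */
        close(fd);
    }

    static uint64_t rng_next(void)         /* xorshift64, NOT a CSPRNG */
    {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    int main(void)
    {
        rng_seed();
        pid_t pid = fork();
        /* Both processes now hold the same state, so both print the
         * same "random" value. */
        printf("%s: %016llx\n", pid == 0 ? "child " : "parent",
               (unsigned long long)rng_next());
        if (pid > 0)
            wait(NULL);
        return 0;
    }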
Arguably, there are so few people who understand this stuff that---at least in the FOSS world---we should kill off all but one implementation, so that the few of us who collectively understand how this stuff really works can focus on that one implementation.
Your point seems valid, but from my reading the article appears to be explicitly describing Linux's /dev/urandom as a poor source of entropy.
And it seems to be implying that while use of the arc4random_buf function is platform-independent, its implementation is permitted to be platform-specific.
That big-endian bug is a perfect example of OCD in coding.
"I can't ever reach that code path so I'll remove it".
I have caught myself doing this in some cases. I once removed a test to see if the CSPRNG was actually working, because the test coverage showed that I could never reach that code path. I then realised that it needed to be there: otherwise, if the CSPRNG ever stopped working, the code wouldn't know about it and might start using streams of zeroes as its entropy.
Sometimes you need to remember that hardware can fail, or be compromised, even though in most cases it will just cause the program to crash.
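The kind of check being described is something like this sketch (my own; a real implementation would use a proper continuous health test, not just a crude "is it all zeroes" comparison):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    void get_entropy_or_die(unsigned char *buf, size_t n)
    {
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd == -1)
            abort();                    /* no entropy source at all */
        if (read(fd, buf, n) != (ssize_t)n)
            abort();                    /* short read or error: source is broken */
        close(fd);

        /* A working RNG returning all zeroes is astronomically unlikely;
         * treat it as a failed or compromised source rather than use it. */
        int all_zero = 1;
        for (size_t i = 0; i < n; i++)
            if (buf[i] != 0)
                all_zero = 0;
        if (all_zero)
            abort();
    }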
Would it be possible to add system tests for some/all of these problems?
e.g. a test which calls explicit_bzero() in a way which would have it optimised out in a platform with a low-quality port.
A reasonably descriptive comment in the header (or failure text) of the test should guide a porter onto the path of wisdom.
(If there is a problem in that the test would need to inspect the output of explicit_bzero(), hence negating the optimisation, it can be implemented as multiple processes).
How does the other process inspect the memory at the right time? How do you know all the scenarios where some compiler would optimize things out? It doesn't sound like it'd be easy to do a portable & reliable test.
Testing that your entropy source is good sounds harder still.
reallocarray() should be pretty easy to test though.
But how many potential issues will your tests miss? If we had perfect test coverage for everything (and the tests were perfect, or we had test for them...), all software would be 100% bug-free.
Tests might not hurt, but I am not sure trying to cater for braindead porters is a good idea. They might get the idea they're doing it right once they get the tests to pass one way or another...
Reasonably descriptive commentary on the mentioned functions is there in the man pages. That is where porters should look.
> How does the other process inspect the memory at the right time?
There must be some side effect of the optimisation, otherwise there wouldn't be a problem. Detect that side effect (e.g. write a buffer of memory to disk, check timing of some code, ptrace-attach to the other process and inspect it, trigger a core dump and pick over the bones, code up an exploit which would work if the explicit_bzero() wasn't present)
> How do you know all the scenarios where some compiler would optimize things out?
You only really need to know one case where the compiler will optimise it out, if the prevent-optimisation compiler magic isn't sprinkled on it.
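For example (a sketch of my own, not OpenBSD's exact code): dead-store elimination lets gcc and clang drop a memset of a buffer that is never read again, which is exactly why explicit_bzero exists, and one common way a port defeats it is an empty asm statement with a "memory" clobber.

    #include <string.h>

    void handle_secret(void)
    {
        char key[32];
        /* ... use key ... */
        memset(key, 0, sizeof(key));   /* dead store: may be optimised away at -O2 */
    }

    /* One countermeasure: force the compiler to assume the buffer is
     * still observable after the memset. */
    void my_explicit_bzero(void *p, size_t n)
    {
        memset(p, 0, n);
        __asm__ __volatile__("" : : "r"(p) : "memory");
    }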
> Tests might not hurt, but I am not sure trying to cater for braindead porters is a good idea. They might get the idea they're doing it right once they get the tests to pass one way or another...
At least they'd get an idea that something was up. I guess you might get away with:
    #ifndef OPENBSD
    #error "You can't just call bzero() for explicit_bzero() - see http://good-description-here for why not"
    #endif
> At least they'd get an idea that something was up.
If they are up for the job, they get that idea when they try to compile the thing and it doesn't because they are missing a function. They will read that function's documentation, and understand it; they may even take a peek at the implementation too, before porting it or implementing their own.
Call me smug but I think porting security sensitive software should be left to people who have a clue. If you have to litter the code with hints and education for people who don't know what they are doing, then you end up with a port that was done by someone who seemed like he might know what he's doing, when there's a good chance that he doesn't. I would rather be able to immediately recognize ports made by people who obviously don't have a clue. So I know what to avoid...
I am all for education, by the way. There are good secure coding guides out there, though having more wouldn't hurt. I just don't believe the approach you proposed is a good one.
"You only really need to know one case where the compiler will, if the prevent-optimisation compiler magic isn't sprinkled on it."
If you do that, the best you can get is a test that works with one specific version of one specific compiler used with one specific set of compiler flags. It probably is easier to just inspect the resulting binary.
And you may not even get that, as the optimizer may use some fairly complex heuristics to choose whether to optimize away a call, such as register pressure (for example, in a three level deep for loop, it may not make sense to try and get extra stuff into registers)
The process under test could send SIGSTOP to itself at the right time, and a parent process could use one of the wait() variants to notice (or the test process could send SIGUSR1 to the monitor).
How many systems today get calloc() wrong? I checked the implementation of quite a few open source implementations a year or two back, and I don't recall seeing one get it wrong.
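The thing a correct calloc() (and OpenBSD's reallocarray()) has to get right is rejecting nmemb * size when the multiplication would overflow, instead of silently allocating a too-small buffer. A simplified sketch, not OpenBSD's exact code:

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    void *my_reallocarray(void *ptr, size_t nmemb, size_t size)
    {
        if (nmemb != 0 && size > SIZE_MAX / nmemb) {
            errno = ENOMEM;             /* nmemb * size would overflow */
            return NULL;
        }
        return realloc(ptr, nmemb * size);
    }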
I get the logic in the post, but there are concerns that FRP256v1 was weakened in the standard similarly to the FIPS curves, so I'm not sure that is good reasoning. Also, I am unsure whether the libressl/openssl implementation even has good small-subgroup attack defenses.
Are you suggesting they remove all curves that may be tainted and ship without them? Thereby forcing application developers that do want to use them to implement each and every single one themselves?
I suppose so, but I'd rather people not use anything other than Goldilocks or 41417. I'm hoping that for those applications, if they are forced to use something like p=192, they ignore the ECC option entirely, don't code it, and fall back to some interoperable DSA or RSA scheme instead in whatever protocol it may be. Maybe there is some case where that is not possible?
How about seeming to ship them, but when you try to compile with them, getting an error containing a link to a page that explains why you shouldn't be using them?
From http://www.libressl.org/
"LibreSSL is primarily developed by the OpenBSD Project, and its first inclusion into an operating system will be in OpenBSD 5.6. "
They run every version of OpenBSD on every machine they support, including 32-bit SPARC, hp300 and SGI. By running on all those machines they uncover subtle bugs that are made evident by architecture differences.
That wouldn't have caught Heartbleed, wouldn't have caught a vulnerability like the one in Apple's TLS implementation, wouldn't have caught... Basically, testing that your software works in normal operation isn't enough to ensure it's secure, you need to explicitly test its behaviour under attack.
Actually, OpenBSD did have things in place that would have caught Heartbleed. OpenSSL went out of their way to create a situation that defeated them.
Look, the whole OpenSSL debacle comes down to the fact that OpenSSL has ONE programmer reliably working on it. LibreSSL now has 5x-10x the manpower that was working on OpenSSL, and that's STILL probably low by an order of magnitude.
Google by itself should pledge 5 people to work on LibreSSL. They clearly have them, since one of their internal audits uncovered Heartbleed.
The thing is, nobody at these companies actually cared until the NSA started spying on them.
All OpenBSD developers work on -current and commonly on multiple platforms. Snapshots are rolled continuously for most platforms and made available to anyone who wants to run the latest code without having to build it themselves. The entire ports tree is compiled regularly on -current too. The compiled packages are then made available.
A bit unfair that this was downvoted. Why does the hive mind collectively think this is an OK state for LibreSSL/OpenSSL, a critical component of internet security, to be in?
What does "testing" mean in the LibreSSL/OpenSSL situation anyway? It compiles? A regression suite passes? Manual verification?
Battle testing sounds like something you'd do to a new implementation. But so far there's very little new in LibreSSL; it's just cleanups and bugfixes. Do you battle test dead code removals and bug fixes?
If you look at the portable versions of their products they tend to ship a chunk of the OpenBSD library implementation with them to give consistency guarantees.
Perhaps we need a consistent OpenBSD platform abstraction layer that gives solid guarantees?
The problem with porting POSIX code to Win32/64 starts with Windows not being POSIX, which causes a lot of problems by itself.
Windows lacks a lot of fundamental equivalents to Unix-like system calls. E.g. Windows has no equivalent to fork(); instead you need to use something like spawn() and do some tricky memory cloning to get the same effect.
In fact, one could say there is no port of OpenSSL for Windows. It hasn't been updated since 2004, and lacks 64-bit support.
OpenSSL doesn't need the fork system call and it already builds just fine on Win32 and Win64. You can even get pre-built binaries of the latest release: