You could assert. You could throw. I can't understand how, in this modern age where so many programs end up getting hacked, introducing more UB seems like a good idea.
This is one of the major reasons I switched to Rust: just to escape spending my whole life worrying about bugs caused by UB.
Assertions are debug-only. Exceptions are usually not guaranteed to be available and much of the standard library doesn't require them. You could std::abort, and that's about it.
I think the issue is that this just isn't particularly good either. If you do that, then you can't catch it like an exception, but you also can't statically verify that it won't happen.
C++ needs less of both undefined behavior and runtime errors. It needs more compile-time errors. It needs pattern matching.
I agree these things would be better, but I don’t understand how anyone can think UB is better than abort.
(Going to moan for a bit, and I realise you aren’t responsible for the C++ standards mess!)
I have been hearing for about… 20 years now that UB gives compilers and tools the freedom to produce any error catching they like, but in the main all it seems to have done is give them the freedom to produce hard-to-debug crashes.
You can of course usually turn on some kind of “debug mode” in some compilers, but why not just enforce that as standard? Compilers would still be free to add a “standards non-compliant” go fast mode if they like.
I don’t think people want that as standard. The whole point of using C++ tends to be because you can do whatever you need to for the sake of performance. The language is also heavily driven by firms that need extreme performance (because otherwise why not use a higher level language)
There are knobs like stdlib assertions and ubsan, but that’s opt-in because there’s a cost to it. Part of it is also the commitment to backwards compatibility and code that compiled before should generally compile now (though there are exceptions to that unofficial rule).
There does not need to be an additional cost for this.
Most users will do this:
1. Check if there is a value
2. Get the value
There is nothing theoretically preventing the compiler from enforcing that step 1 happens before step 2, especially if the compiler is able to combine the control flow branch with the process of conditionally getting the value. The practical issue is that there's no way to express this in C++ at all. The best you can do is the visitor pattern, which has horrible ergonomics and you can only hope it doesn't cause worse code generation too.
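To make the two steps concrete with std::optional (parse here is just a made-up helper for illustration):

```cpp
#include <cstdio>
#include <optional>

// Made-up helper for illustration: returns a value only on success.
std::optional<int> parse(const char* s) {
    return (s && *s >= '0' && *s <= '9') ? std::optional<int>(*s - '0')
                                         : std::nullopt;
}

void use(const char* s) {
    std::optional<int> v = parse(s);
    if (v.has_value()) {          // step 1: check
        std::printf("%d\n", *v);  // step 2: get
    }
    // Nothing ties step 2 to step 1: a caller can write *v (or v.value())
    // without the check, and the type system has no way to say "this
    // access is only reachable when the check succeeded".
}
```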
Some users want to do this:
1. Grab the value without checking to see if it's valid. They are sure it will be valid and can't or don't want to eat the cost of checking.
There is nothing theoretically preventing this from existing as a separate method.
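For what it's worth, std::optional already exposes both spellings today, just with UB rather than anything safer on the unchecked path:

```cpp
#include <optional>

int checked_get(const std::optional<int>& v) {
    return v.value(); // checked: throws std::bad_optional_access if empty
}

int unchecked_get(const std::optional<int>& v) {
    return *v;        // unchecked: undefined behavior if v is empty
}
```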
I'm not a rust fanboy (seriously, check my GitHub @jchv and look at how much Rust I write, it's approximately zero) but Rust has this solved six ways through Sunday. It can do both of these cases just fine. The only caveat is that you have to wrap the latter case in an unsafe, but either way, you're not eating any costs you don't want to.
C++ can do this too. C++ has an active proposal for a feature that can fix this problem and make much more ergonomic std::variant possible, too.
Of course, this is one single microcosm in the storied history of C++ failing to adequately address the problem of undefined behavior proliferating throughout the language, so I don't have high hopes.
I might have misinterpreted what you mean as “standard” (I was originally thinking like you meant that the standard dictates all operations must be safe and doing things like the unsafe/unchecked option references would be outside the standard). I realized that you might mean it like it should be the default behavior, so correct me if I’m wrong there.
Yeah I agree that sane defaults would be nice and that things like unchecked accesses should be more explicit. Having linear types to properly enforce issues with things like use-after-move would also be awesome. I don’t know that anyone has ever accused C++ of being particularly ergonomic. Sum types and pattern matching would be awesome.
A lot of UB is things you wouldn't do anyway. While it is possible to define divide by zero or integer overflow, what would it even mean? If your code does either of those things you have a bug in your code (a few encryption algorithms depend on specific overflow behavior - if your language promises that same behavior it is useful).
Since CPUs handle such things differently, whatever you define to happen means that the compiler has to insert an if to check on any CPU that doesn't work how you define it - all for something that you probably are not doing. The cost is too high in a tight loop when you know this won't ever happen (but the compiler does not).
I think there is a solid case for the existence of undefined behavior; even Rust has it, it's nothing absurd in concept, and you do describe some reasoning for why it should probably exist.
However, and here's the real kicker, it really does not need to exist for this case. The real reason it exists for this case is due to increasingly glaring deficiencies in the C++ language, namely, again, the lack of any form of pattern matching for control flow. Because of this, there's no way for a library author, including the STL itself, to actually handle this situation succinctly.
Undefined behavior indeed should exist, but not for common cases like "oops, I didn't check to see if there was actually a value here before accessing it." Armed with a moderately sufficient programming language, the compiler can handle that. Undefined behavior should be more like "I know you (the compiler) can't know this is safe, but I already know that this unsafe thing I'm doing is actually correct, so don't generate safeguards for me; let what happens, happen." This is what modern programming languages aim to do. C++ does that for shit like basic arithmetic, and that's why we get to have the same fucking CVEs for 20+ years, over and over in an endless loop. "Just get better at programming" is a nice platitude, but it doesn't work. Even if it was possible for me to become absolutely perfect and simply just never make any mistakes ever (lol) it doesn't matter because there's no chance in hell you'll ever manage that across a meaningful segment of the industry, including the parts of the industry you depend on (like your OS, or cryptography libraries, and so on...)
And I don't think the issue is that the STL "doesn't care" about the possibility that you might accidentally do something that makes no sense. Seriously, take a look at the design of std::variant: it is pretty obvious that they wanted to design a "safe" union. In fact, what the hell would the point of designing another unsafe union be in the first place? So they go the other route. std::variant has getters that throw exceptions on bad accesses instead of undefined behavior. This is literally the exact same type of problem that std::expected has. std::expected is essentially just a special case of a type-safe union with exactly two possible values, an expected and unexpected value (though since std::variant is tagged off of types, there is the obvious caveat that std::expected isn't quite a subset of std::variant, since std::expected could have the same type for both the expected and unexpected values.)
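To make the parallel concrete (std::expected is C++23; this is just a sketch of the two types side by side):

```cpp
#include <expected>   // C++23
#include <string>
#include <variant>

void demo() {
    std::variant<int, std::string> v = std::string("hi");
    // std::get checks the active alternative and throws
    // std::bad_variant_access on a mismatch, rather than being UB.
    try {
        int i = std::get<int>(v);
        (void)i;
    } catch (const std::bad_variant_access&) { /* wrong alternative */ }

    // std::expected is essentially the two-alternative special case:
    // either a value or an error, with checked accessors.
    std::expected<int, std::string> e = std::unexpected(std::string("oops"));
    if (!e.has_value()) {
        std::string why = e.error();
        (void)why;
    }
}
```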
So, what's wrong? Here's what's wrong. C++ Modules were first proposed in 2004[1]. C++20 finally introduced a version of modules and lo and behold, they mostly suck[2] and mostly aren't used by anyone (Seriously: they're not even fully supported by CMake right now.) Andrei Alexandrescu has been talking about std::expected since at least 2018[3] and it just now finally managed to get into the standard in C++23, and god knows if anyone will ever actually use it. And finally, pattern matching was originally proposed by none other than Bjarne himself (and Gabriel Dos Reis) in 2019[4] and who knows when it will make it into the standard. (I hope soon enough so it can be adopted before the heat death of the Universe, but I think that's only if we get exceptionally lucky.)
Now I'm not saying that adding new and bold features to a language as old and complex as C++ could possibly ever be easy or quick, but the pace that C++ evolves at is sometimes so slow that it's hard to come to any conclusion other than that the C++ standard and the process behind it is simply broken. It's just that simple. I don't care what changes it would take to get things moving more efficiently: it's not my job to figure that out. It doesn't matter why, either. The point is, at the end of the day, it can't take this long for features to land just for them to wind up not even being very good, and there are plenty of other programming languages that have done better with less resources.
I think it's obvious at this point that C++ will never get a handle on all of the undefined behavior; they've just introduced far too much undefined behavior all throughout the language and standard library in ways that are going to be hard to fix, especially while maintaining backwards compatibility. It should go without saying that a meaningful "safe" subset of C++ that can guarantee safety from memory errors, concurrency errors or most types of undefined behavior is simply never going to happen. Ever. It's not that it isn't possible to do, or that it's not worth doing, it's that C++ won't. (And yes, I'm aware of the attempts at this; they didn't work.)
The uncontrolled proliferation of undefined behavior is ultimately what is killing C++, and a lot of very trivial cases could be avoided, if only the language was capable of it, but it's not.
I cannot follow your rant... I'll do my best to respond, but I'm probably not understanding something.
Divide by zero must be undefined behavior in any performant language. On x86 you either have an if before running the divide (which of course in some cases the compiler can optimize out, but only if it can determine the value is not zero), or the CPU will trap into the OS - different OSes handle this in different ways, but most not in a way that makes it possible to figure out where you were and thus do something about it. This just came up in the C++ std-proposals mailing list in the past couple weeks.
AFAIK all common CPUs have the same behavior on integer overflow (two's complement). However in almost all cases (again, some encryption code is an exception) that behavior is useless to real code, so if it happens your code has a bug either way. Thus we may as well let compilers optimize assuming it cannot happen, since if it does happen you have a bug no matter what we define it as. (C++ is used on CPUs that are not two's complement as well, but we could call this implementation-defined or unspecified; it doesn't change that you have a bug if you invoke it.)
For std::expected - new benchmarks are showing that, in the real world and with optimized exception handlers, exceptions are faster than systems that use things like expected. Microbenchmarks that show exceptions are slower are easy to create, but real-world exceptions that unwind more than a couple of function calls show different results.
As for modules, support is finally here and early adopters are using it. The road was long, but it is finally proving it worked.
Long roads are a good thing. C++ has avoided a lot of bad designs by spending a lot of time thinking about problems. Details often matter, and move-fast languages tend to run into problems when something doesn't work as well as they want. I'm glad C++ standardization is slow - the language is already a mess without adding more half-baked features to it.
The problem is 'undefined behaviour' is far too powerful.
Why not make division by zero implementation defined. I'm happy with my compiler telling me my program will get terminated if I divide by zero, no problem. Let's even say it "may" be terminated (because maybe the division is optimised out if we don't actually need to calculate it, fine).
My problem is that UB lets compilers do all kinds of weird things. For example, suppose I write something like this (illustrative snippet):
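```cpp
// Illustrative snippet: a "sanity check" sitting next to a division.
int check_then_divide(int x, int y) {
    int dividebyzero = (y == 0);
    int result = x / y;   // UB if y == 0
    return dividebyzero + result;
}
```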
Then the compiler can set dividebyzero to always be 0, because 'obviously' y can't be 0, since otherwise I would invoke undefined behaviour.
Also, for two's complement, it's fairly common that people want wrapping behaviour. And I don't think basically anyone is using C++ on non-two's-complement CPUs (neither gcc nor clang supports them), and even if C++ does run on such CPUs, why not still require well-defined behaviour? In the same way, C++ runs on 32-bit and 64-bit systems, but we don't say asking for the size of a pointer is undefined behaviour -- everyone just defines what it is on their system!
> Divide by zero must be undefined behavior in any performant language. On x86 you either have an if before running the divide (which of course in some cases the compiler can optimize out, but only if it can determine the value is not zero), or the CPU will trap into the OS - different OSes handle this in different ways, but most not in a way that makes it possible to figure out where you were and thus do something about it. This just came up in the C++ std-proposals mailing list in the past couple weeks.
I mean look, I already agree that it's not necessarily unreasonable to have undefined behavior, but this statement is purely false. You absolutely can eat your cake and have it too. Here's how:
- Split the operation in two: safe, checked division, and fast, unchecked division.
- OR, Stronger typing; a "not-zero" type that represents a numeric type where you can guarantee the value isn't zero. If you can't eat the cost of runtime checks, you can unsafely cast to this.
I think the former is a good fit for C++.
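A minimal sketch of that split (checked_div and unchecked_div are hypothetical names here, not anything in the standard):

```cpp
#include <optional>

// Hypothetical checked form: the zero case is pushed onto the caller
// through the return type.
std::optional<int> checked_div(int x, int y) {
    if (y == 0) return std::nullopt;
    return x / y;
}

// Hypothetical unchecked form: the precondition stays with the caller,
// and the name makes the bargain explicit.
int unchecked_div(int x, int y) {
    return x / y;  // caller promises y != 0
}
```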
C++ does not have to do what Rust does, but for the sake of argument, let's talk about it. What Rust does here is simple: it just defines divide-by-zero to panic. How? Multiple ways:
- If it knows statically it will panic, that's a compilation error.
- If it knows statically it can not be zero, it generates unchecked division.
- If it does not know statically, it generates a branch. (Though it is free to implement this however it wants; could be done using CPU exceptions/traps if they wanted.)
What if you really do need "unsafe" division? Well, that is possible, with unchecked_div. Most people do not need unchecked_div. If you think you do but you haven't benchmarked yet, you do not. It doesn't get any simpler than that. This is especially the case if you're working on modern CPUs with massive pipelines and branch predictors; a lot of these checks wind up having a very close to zero cost.
> AFAIK all common CPUs have the same behavior on integer overflow (two's complement). However in almost all cases (again, some encryption code is an exception) that behavior is useless to real code, so if it happens your code has a bug either way. Thus we may as well let compilers optimize assuming it cannot happen, since if it does happen you have a bug no matter what we define it as. (C++ is used on CPUs that are not two's complement as well, but we could call this implementation-defined or unspecified; it doesn't change that you have a bug if you invoke it.)
It would be better to just do checked arithmetic by default; the compiler can often statically eliminate the checks, you can opt out of them if you need performance and know what you're doing, and the cost of checks is unlikely to be noticed on modern processors.
It doesn't matter that this usually isn't a problem. It only has to be a problem once to cause a serious CVE. (Spoiler alert: it has happened more than once.)
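As a sketch of what an explicit check looks like today with the GCC/Clang overflow builtins (compiler extensions, not standard C++; "checked by default" would mean getting this behavior without having to write it out):

```cpp
#include <cstdlib>

// Checked addition via a GCC/Clang builtin: trap (here, abort) on
// overflow instead of silently invoking UB.
int checked_add(int a, int b) {
    int out;
    if (__builtin_add_overflow(a, b, &out)) {
        std::abort();
    }
    return out;
}
```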
> For std::expected - new benchmarks are showing that, in the real world and with optimized exception handlers, exceptions are faster than systems that use things like expected. Microbenchmarks that show exceptions are slower are easy to create, but real-world exceptions that unwind more than a couple of function calls show different results.
You can always use stack unwinding or exceptions if you want to; that's also present in Rust too, in the form of panic. The nice thing about something like std::expected is that it theoretically can bridge the gap between code that uses exceptions and code that doesn't: you can catch an exception and stuff it into the `e` of an std::expected value, or you can take the `e` value of an std::expected and throw it. In theory this should not have much higher cost than simply throwing.
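A minimal sketch of that bridging, assuming a C++23 compiler with std::expected (might_throw stands in for any existing exception-based API):

```cpp
#include <exception>
#include <expected>   // C++23
#include <stdexcept>

int might_throw() { throw std::runtime_error("boom"); }

// Exceptions -> std::expected: catch and store the in-flight exception.
std::expected<int, std::exception_ptr> to_expected() {
    try {
        return might_throw();
    } catch (...) {
        return std::unexpected(std::current_exception());
    }
}

// std::expected -> exceptions: rethrow the stored error.
int to_exception(std::expected<int, std::exception_ptr> e) {
    if (!e) std::rethrow_exception(e.error());
    return *e;
}
```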
> As for modules, support is finally here and early adopters are using it. The road was long, but it is finally proving it worked.
Last I was at Google, they seemed to have ruled out C++ modules because as-designed they are basically guaranteed to make compilation times worse.
For CMake, you can't really rely on C++ Modules. Firstly, the Makefile generator, which is the default on most platforms, literally does not and as far as I know will not support C++ Modules. Secondly, CMake doesn't support header units or importing the STL as modules. For all intents and purposes, it would be difficult to even use this for anything.
For Bazel, there is no C++ Modules support to my knowledge.
While fact-checking myself, I found this handy website:
...which shows CMake as supporting modules, green check mark, no notes! So that really makes me wonder what value you can place on the other green checkmarks.
> Long roads are a good thing. C++ has avoided a lot of bad designs by spending a lot of time thinking about problems. Details often matter, and move-fast languages tend to run into problems when something doesn't work as well as they want. I'm glad C++ standardization is slow - the language is already a mess without adding more half-baked features to it.
I'm glad you are happy with the C++ standardization process. I'm not. Not only do things take many years, they're also half-baked at the end of the process. You're right that C++ still winds up with a huge mess of half-baked features even with as slow as the development process is, and modules are a great example of that.
The true answer is that the C++ committee is a fucking mess. I won't sit here and try to make that argument; plenty of people have done a damningly good job at it better than I ever could. What I will say is that C faces a lot of similar problems to C++ and somehow still manages to make better progress anyways. The failure of the C++ standard committee could be told in many different ways. A good relatively recent example is the success of the #embed directive[1]. Of course, the reason why it was successful was because it was added to C instead of C++.
Why can't C++ do that? I dunno. Ask Bjarne and friends.
> What if you really do need "unsafe" division? Well, that is possible, with unchecked_div. Most people do not need unchecked_div. If you think you do but you haven't benchmarked yet, you do not. It doesn't get any simpler than that.
This attitude is why modern software is dogshit slow. People make this "if you haven't benchmarked, it doesn't matter" argument thousands of times and the result is that every program I run is slower than molasses in Siberia. I don't care about "safety" at the expense of performance.
> This attitude is why modern software is dogshit slow.
Bullshit. Here's my proof: We don't even do this. There's a ton of software that isn't safe from undefined behavior and it's still slow as shit.
> People make this "if you haven't benchmarked, it doesn't matter" argument thousands of times and the result is that every program I run is slower than molasses in Siberia. I don't care about "safety" at the expense of performance.
If you can't imagine a world where there's nuance between "we should occasionally eat 0.5-1ns on checking an unsafe division" and "we should ship an entire web browser with every text editor and chat app", the problem is with you. If you want your software to be fast, it has to be benchmarked, just like if you want it to be stable, it has to be tested. There are really no exceptions here; you can't just guess these things.
Modern software is dogshit slow because it's a pile of JS dependencies twenty layers deep.
Checked arithmetic is plenty fast. And as for safety vs performance, quickly computing the incorrect result is strictly less useful than slowly computing the correct one.
Google has an odd C++ style guide that rules out a lot of useful things for their own reasons.
There is no reason why make could not work with modules if someone wanted to go through the effort. The CMake people have even outlined what needs to be done. Ninja is so much nicer that you should switch anyway - I did more than 10 years ago.
I do use Ninja when I use CMake, but honestly that mostly comes down to the fact that the Makefiles generated by CMake are horrifically slow. I don't particularly love CMake, I only use it because the C++ ecosystem really has nothing better to offer. (And there's no chance I'm going to redistribute a project that can't build with the Makefile generator, at least unless and until Ninja is default.)
Anyway, the Google C++ style guide has nothing to do with why C++ modules aren't and won't be used at Google, it's because as-implemented modules are not an obvious win. They can theoretically improve performance, but they can and do also make some cases worse than before.
I don't think most organizations will adopt modules at this rate. I suspect the early adopters will wind up being the only adopters for this one.
> the lack of any form of pattern matching for control flow
Growing features after the fact is hard. Look at the monumental effort to get generics into Go. Look at how even though Python 3.10 introduced the match statement, it is a statement and not an expression - you can't write `x = match ...`, unlike Rust and Java 14. So it doesn't surprise me that C++ struggles with this.
Yes, C/C++ have far, far too many UB cases. Even down to idiotically simple things like "failing to end a source file with newline". C and C++ have liberally sprinkled UB as a cop-out like no other language.
> C++ does that for shit like basic arithmetic
I spent an unhealthy amount of time understanding the rules of integer types and arithmetic in C/C++. Other languages like Rust are as capable without the extreme mental complexity. https://www.nayuki.io/page/summary-of-c-cpp-integer-rules
Oh and, `(uint16_t)0xFFFF * (uint16_t)0xFFFF` will cause a signed 32-bit integer overflow on most platforms, and that is UB and will eat your baby. Scared yet? C/C++ rules are batshit insane.
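For anyone who hasn't been bitten by that one: both operands get promoted to (signed) int before the multiply, and 65535 * 65535 doesn't fit in a 32-bit int. A sketch of the trap and the usual workaround:

```cpp
#include <cstdint>

std::uint32_t square_ffff_bad() {
    std::uint16_t a = 0xFFFF, b = 0xFFFF;
    // a and b are promoted to (signed) int; 65535 * 65535 = 4294836225
    // doesn't fit in a 32-bit int, so this is signed overflow (UB) on
    // typical platforms.
    return a * b;
}

std::uint32_t square_ffff_ok() {
    std::uint16_t a = 0xFFFF, b = 0xFFFF;
    // Force the arithmetic into an unsigned type first; unsigned
    // wraparound is defined, and here nothing even overflows.
    return static_cast<std::uint32_t>(a) * b;
}
```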
> "Just get better at programming" is a nice platitude, but it doesn't work.
Correct. Far too often, I hear a conversation like "C/C++ have too many UB, why can't we make it safer?" "Just learn to write better code, dumbass". No, literal decades of watching the industry tells us that the same mistakes keep happening over and over again. The evidence is overwhelming that the languages need to change, not the programmers.
> it's obvious at this point that C++ will never get a handle on all of the undefined behavior; they've just introduced far too much undefined behavior all throughout the language and standard library
True.
> in ways that are going to be hard to fix, especially while maintaining backwards compatibility
Technically not true. Specifying undefined behavior is easy, and this has already been done in many ways. For example, -fwrapv makes signed overflow defined to wrap around. For example, you could zero-initialize every local variable and change malloc() to behave like calloc(), so that reading uninitialized memory always returns zero. And because the previous behavior was undefined anyway, literally any substitute behavior is valid.
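For instance, with GCC and Clang's real -fwrapv flag the meaning of the same function changes: without it the compiler may assume signed overflow can't happen; with it, the overflow is defined to wrap:

```cpp
// g++ -O2 file.cpp           -> signed overflow is UB; this may fold to false
// g++ -O2 -fwrapv file.cpp   -> overflow wraps, so this is true for x == INT_MAX
bool will_wrap(int x) {
    return x + 1 < x;
}
```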
The problem isn't maintaining backward compatibility, it's maintaining performance compatibility. Allegedly, undefined behavior allows the compiler to optimize out redundant arithmetic, redundant null checks, etc. I believe this is what stops the standards committees from simply defining some kind of reasonable behavior for what is currently considered UB.
> a meaningful "safe" subset of C++ that can guarantee safety from memory errors, concurrency errors or most types of undefined behavior is simply never going to happen
> The uncontrolled proliferation of undefined behavior is ultimately what is killing C++
It's death by a thousand cuts, and it hurts language learners the most. I can write C and C++ code without UB, but it took me a long time to get there - with a lot of education and practice. And UB-free code can be awkward to write. The worst part of it is that the knowledge is very C/C++-specific and is useless in other languages because they don't have those classes of UB to begin with.
I dabbled in C++ programming for about 10 years before I discovered Rust. Once I wrote my first few Rust programs, I was hooked. Suddenly, I stopped worrying about all the stupid complexities and language minutiae of C++. Rust just made sense out of the box. It provided far fewer ways to do things ( https://www.nayuki.io/page/near-duplicate-features-of-cplusp... ), and the easy way is usually the safe and correct way.
To me, Rust is C++ done right. It has the expressive power and compactness of C++ but almost none of the downsides. It is the true intellectual successor to C++. C++ needs to hurry up and die already.
Culturally, I think C++ has a policy of "there's no single right answer." Which leads to there being no wrong answers. We just need more answers so everyone's happy. Which is worse.