Incredibly experienced programmers write C without those tools, and with practically no exceptions, memory issues, crashes, and vulnerabilities surface once the codebase grows large enough and the project is used widely enough.
An easy example of this is the Linux kernel. It has historically used relatively few verification tools, and it is one of the largest C codebases out there.
There's a new memory-safety vulnerability every two or three weeks.
A bug caused by C memory management is fixed at least daily.
None of those bugs would have happened in Rust. Most of them couldn't have happened in well-written C++ (whereas this well-written C is chock full of them).
Note as well that the kernel now gets some of the most exhaustive verification of any project (for instance from Google's syzkaller), and it still has tons of C memory issues.
"Incredibly experienced programmers" ? The language is simple and well defined to the degree one can map out exactly what happens on paper, if need be. Everyone is so brainwashed to think C is some low level assembly like language that is "hard". No, not at all. It simply provides no training wheels or safety harness - meaning one needs to know what they are doing, not coding in a "type and see what it does" style.
> The language is simple and well defined to the degree one can map out exactly what happens on paper, if need be.
Without a copy of the language standard, I still can't remember all the intricate details about just the arithmetic and comparison operators. Take the simple bit-shift operators. What happens when the second operand is negative or larger than sizeof(left) * CHAR_BIT? What if it's equal? And the >> and << operators are not even symmetric in their definitions: left-shifting a negative value is undefined but right-shifting a negative value is implementation-defined! And even for simpler operators like + or >, can you immediately tell me all the integer promotion rules and usual arithmetic conversion rules? What happens when you add two integers with different signedness and width?
I can't and I doubt most programmers can. Try implementing a special C interpreter where every undefined and implementation-defined behavior is configurable and you will see how complicated things are.
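For a taste, here's a minimal sketch of the cases above (each line illustrates one rule; the comments assume a typical 32-bit int):

    #include <stdio.h>

    int main(void)
    {
        int n = 32;
        printf("%d\n", 1 << n);   /* undefined: shift count >= width of int */
        printf("%d\n", -1 << 1);  /* undefined: left shift of a negative value */
        printf("%d\n", -1 >> 1);  /* implementation-defined result */
        printf("%u\n", 1u + -2);  /* usual arithmetic conversions: -2 becomes a huge unsigned value */
        return 0;
    }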
> What happens when the second operand is negative or larger than sizeof(left) * CHAR_BIT? What if it's equal? And the >> and << operators are not even symmetric in their definitions: left-shifting a negative value is undefined but right-shifting a negative value is implementation-defined!
As a lover of C, these questions are totally irrelevant to me. What I've taken away is don't do things that do not have intuitively clear semantics. Use bitwise and shift operators only on unsigned numbers. Simple as that. And really I haven't found a need to do anything else. I think many of these complex rules are historical accidents or come from supporting a peculiar architecture or kind of hardware that's no longer relevant. In any case, remembering all these pesky rules is not C. C is simple. Remember how to stay away from the dubious operations and you're fine.
Look ma, undefined behavior due to signed integer overflow (on 32-bit targets, anyway), despite only unsigned types being declared.
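(The offending snippet isn't reproduced here; reconstructed from the replies below, it was of roughly this shape:)

    unsigned char ubyte = 0xFF;       /* only unsigned types declared */
    unsigned int word = ubyte << 24;  /* ubyte is promoted to (signed) int first;
                                         if ubyte >= 128 the shift overflows int: UB */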
And what about the aliasing rules:
    char *start, *end;
    // This function sets start/end to the start and end of an address range.
    // It takes uintptr_t * arguments.
    GetAddressRange((uintptr_t *)&start, (uintptr_t *)&end);
    *start = 3; // Undefined behavior: start has now been accessed as both a uintptr_t and a char *.
The first is nice, but I think you should make sure that ubyte really has that many bits for it to make sense, no? And doesn't your compiler emit a warning if it takes advantage of that undefined behaviour? (e.g. using the upper part of the register where that byte is stored).
As for the second: why do you cast to uintptr_t *? It makes no sense to me at all. Yes, it might be a little advanced, but it's well known that you should only access an object through correctly typed pointers, void pointers, or char pointers.
It's not a totally obscure thing, just as most people understand the need for "restrict". I'd say the code looks bad to me at first sight, and even a beginner is unlikely to write it because it's a complicated construction.
You have a point, but the pointer-type rule makes sense, and I don't know of a better version of the rule that would still prevent so many unnecessary reloads.
With the first, John Regehr found many, many examples of it in real world code, and in some cases the bug he filed was closed without fixing it because "no compiler currently will do the wrong thing with it." I'm not aware of any compiler that emits a warning for this with default compilation flags.
The second is based on code where an operating system call would get the mapped ranges of a shared memory region. For whatever reason, it took a MemoryAddress, which was essentially a pre-C99 uintptr_t defined by the system headers (an unsigned value large enough to hold an address). The casts were there because the compiler warned about incompatible pointer types.
I was never able to convince the engineer who wrote the code (~20 years of C experience) that this code was wrong (it worked fine until intermodule inlining was enabled). Instead, we just enabled the compiler option that disables optimizations based on aliasing rules.
You are both greatly overestimating the degree to which programmers understand C, and overestimating the degree to which programmers who do understand C do not make mistakes. I've had lots of people say that the code in the second example "looks bad to me" but the commit still made it through code review and was being run for 5 years before optimization flag changes caused bugs to appear.
> "no compiler currently will do the wrong thing with it."
Well then, that's nice, no? I don't like to play language lawyer. If there are machines where it can lead to errors, it might be bad machine design. The compiler and the user should work together to catch the error / specify more precisely what the intent is. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.
I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
> For whatever reason, it took a MemoryAddress, which was essentially a pre-C99 uintptr_t defined by the system headers
It's inconvenient that the API forces such a strange type on the user. But I think the logical usage is to declare "start" and "end" as uintptr_t then. In this case there should be no problem, or am I missing something?
> You are both greatly overestimating the degree to which programmers understand C, and overestimating the degree to which programmers who do understand C do not make mistakes.
All I can say is I've never really had this kind of problem so far. The kinds of problems I get from safe toys, on the other hand (e.g. array access only possible through proxy containers), are far more painful, because these approaches are extremely bad for modularity and elegance. You have to specify the container, and also its type, or at least implement all kinds of boilerplate interfaces. This is incredibly restricting.
Everybody makes mistakes, and while some memory safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are also the minority of the problems I encounter, even when counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught in a high-level language - SQL injection or other auth bypass) than clever low-level ones. And not every type of program is extremely security-sensitive.
>> "no compiler currently will do the wrong thing with it."
> Well then, that's nice, no? I don't like to play language lawyer. If there are machines where it can lead to errors, it might be bad machine design. The compiler and the user should work together to catch the error / specify more precisely what the intent is. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.
> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
In this case, it would be legal for a compiler to assume that ubyte is less than 128. It's not unreasonable to assume that at some point in the future a compiler writer will discover a way to get a 0.1% improvement on whatever the current artificial benchmark of the day is and implement an optimization for it. The landscape of C libraries is littered with undefined code that used to work everywhere and now doesn't for just such a reason.
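To make that concrete, here's a sketch of how such an assumption could bite (the function is hypothetical, but the reasoning is exactly what the standard permits):

    #include <stdio.h>

    void check(unsigned char ubyte)
    {
        unsigned int word = ubyte << 24;  /* UB when ubyte >= 128 with 32-bit int ... */
        if (ubyte & 0x80)                 /* ... so the compiler may assume this is false */
            puts("high bit set");         /* ... and delete this branch entirely */
        printf("%u\n", word);
    }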
>> For whatever reason, it took a MemoryAddress, which was essentially a pre-C99 uintptr_t defined by the system headers
> It's inconvenient that the API forces such a strange type on the user. But I think the logical usage is to declare "start" and "end" as uintptr_t then. In this case there should be no problem, or am I missing something?
Yes, the correct way is to declare them as uintptr_t and then assign them to a pointer of whatever type you need. It's not uncommon for systems software to treat addresses as integers in places, because they actually do represent specific addresses. I'd have to re-read the specification, but I think that having it take a void * instead of a uintptr_t * would have the exact same issue, so I don't think it's the odd API choice that causes this.
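In other words, something like this (a sketch; GetAddressRange is the hypothetical call from the example above):

    #include <stdint.h>

    void GetAddressRange(uintptr_t *start, uintptr_t *end);  /* the system API */

    void example(void)
    {
        uintptr_t start_addr, end_addr;
        GetAddressRange(&start_addr, &end_addr);  /* stores go through real uintptr_t objects */
        char *start = (char *)start_addr;
        char *end = (char *)end_addr;
        (void)end;
        *start = 3;  /* fine: nothing was accessed through an incompatible pointer type */
    }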
I don't think there should be a problem. I think you can cast to void * and back as much as you like, but you aren't supposed to ever access the object as anything other than char or its original type.
> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
Well, that is one way of looking at it. My personal experience is that if there is an undefined behavior in the C standard, it will be exploited eventually. I have seen this break multi-million-line applications during compiler upgrades and/or flag-twiddling. As you can imagine, debugging that problem was a nightmare.
[...]
> Everybody makes mistakes, and while some memory safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are also the minority of the problems I encounter, even when counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught in a high-level language - SQL injection or other auth bypass) than clever low-level ones. And not every type of program is extremely security-sensitive.
I believe that you are right in terms of number of exploits, but for the wrong reasons.
I can't find the statistics on this, but off the top of my head, ~20 years ago, 80% of security advisories were buffer overflows/underflows. At the time, just about everything was written in C/C++.
These days, we see lots of high-level exploits, but I would tend to assume that the reason is that 1/ the web makes it very easy to carry out attacks; 2/ C and C++ have a very small attack surface on the web, even if you include Apache/nginx/... as part of the web.
Also, yes, web applications, especially those in dynamically typed languages, would also require military-grade testing :)
The first is only undefined on 16-bit targets, as ubyte is first promoted to int, and int is always at least 16 bits (on typical general-purpose machines, 32). And in any event, modern C compilers will emit a diagnostic when constant shifts are undefined--just not in this case, because it's not actually wrong.
The latter problem[1] is applicable to other languages, including Rust, when they permit such type punning. C supports type punning because occasionally it's useful and necessary. The above invokes UB in Rust, too. Type punning has such a horrible code smell that you don't need unsafe{} as a gateway. Maybe for very inexperienced programmers, but they shouldn't be writing C code in situations where type punning suggests itself any more than they should be writing assembly or unsafe Rust code.
C has many pitfalls, but I don't think I'm being overly pedantic here. There's no question C gives you a significant amount of rope to hang yourself (sometimes affirmatively but usually by turning its head when you steal some), but all languages give you slack to escape the limitations of how they model the environment--the points of contention are how much slack and at what cost. Arguing that requires more precision.
For example, even if we can agree that automatic arithmetic conversions are on balance a bad idea, it's still the fact that there's some benefit to that behavior, such as making your first example well-defined. That's not coincidental.
[1] It's only a problem if that routine actually stores the value through a uintptr_t pointer. If it casts the parameter to pointer-to-pointer-to-char before the store, it's fine. You can cast a pointer to whatever you want, as many times as you want, and never will the casting alone change the semantics of the code. It's only actual load or store expressions that matter, and specifically the immediate pointer type they occur through.
Where you typically find code like this you do the correct thing--the abuse of function parameter types in this particular manner is usually because of const-ness headaches and the lack of function overloading (though now there's _Generic), and in those cases you're already paying attention to type punning pitfalls because you're already at a point where you're trying to skirt around relatively strict, inconvenient typing. If you're not then, again, you're not going to be doing the right thing when writing unsafe Rust code--and there are many more cases where Rust's strict typing is inconvenient, so arguably there's _greater_ risk with a language like Rust when the programmer is lazy or tired and resorts to subverting the type system.
Moreover, this isn't even a pointer aliasing problem. Pointer aliasing problems occur when the _order_ of loads or stores matters but, because of type punning, the compiler can neither determine nor assume that accesses through variables of _different_ types point to the same object, and so it feels free to reorder the sequence. Not only is your example not a case of a single object accessed through multiple types across dependent loads and stores, it's not even a case of accessing through the same type. Unless you use the restrict qualifier, the compiler always assumes that two variables of the same pointer type alias the same object. Whether it's the _wrong_ type is a different, less consequential problem related to the potential for trap representations or alignment violations. But if the hardware doesn't care, you'll get expected behavior; it's not the type of pitfall that threatens nasal daemons.
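To make the distinction in [1] concrete, a sketch (GetAddressRange is the hypothetical call from upthread, and the region helpers are invented):

    #include <stdint.h>

    extern uintptr_t region_base(void), region_end(void);  /* hypothetical helpers */

    /* Stores through uintptr_t lvalues: UB for callers that passed (uintptr_t *)&charptr. */
    void GetAddressRange(uintptr_t *start, uintptr_t *end)
    {
        *start = region_base();
        *end = region_end();
    }

    /* Casts first and stores through char ** lvalues: fine for those same callers. */
    void GetAddressRangeChar(uintptr_t *start, uintptr_t *end)
    {
        *(char **)start = (char *)region_base();
        *(char **)end = (char *)region_end();
    }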
> The first is only undefined on 16-bit targets, as ubyte is first promoted to int, and int is always at least 16 bits (on typical general-purpose machines, 32). And in any event, modern C compilers will emit a diagnostic when constant shifts are undefined--just not in this case, because it's not actually wrong.
If ubyte is greater than or equal to 128, then converting it to a signed 32-bit integer (as on a 32-bit machine) and then shifting it left by 24 causes signed integer overflow. This is undefined behavior. Therefore, the compiler is allowed to make optimizations that assume ubyte is less than 128.
Look up integer promotion rules. Basically almost all arithmetic operators promote their operands to int/unsigned int before doing anything.
And you are the person who replied to my earlier post saying "don't do things that do not have intuitively clear semantics." Do integer promotion rules count as intuitively clear semantics? Most C programmers never learned them and therefore the semantics is confusing. For the few C programmers who learned them, the semantics is clearer when they can remember the rules, and confusing otherwise.
Now I hope I've convinced you that there are a lot of such subtle semantics issues in C, and hardly anything is intuitively clear, unless you've been a language lawyer or programmed in C for long enough.
Point taken. I agree that the weakly typed nature of C (integer promotions / implicit casts) can be problematic. Would be interesting to see if there are programs that would be painful to write with a stricter model.
It seems I've never really done "overshifting", otherwise I would have easily noticed that the shifted value is promoted first. If you don't overshift there's no way to trigger the undefined behaviour, even when you shift multiple times - since by assigning to the smaller type the result gets cut down again, effectively leading to the behaviour I'd assumed. I would hardly call this a gotcha.
Where it might be confusing is in a situation like this:
    char x = 42;
    printf("%d\n", x << 4);  /* x is promoted to int before the shift, so this prints 672 */
But then again I'm undecided if the behaviour is that bad. It might be even useful.
"type and see what it does" is a strawman. Some people absolutely do program this way, I don't deny that, but there's a huge gulf between that and the levels of verification it takes to ensure that a large program really does behave as intended. Typos, minor oversights, unforeseen consequences of refactoring. . . the number of things you need to guard against is immense, and nobody can be operating at 100% 100% of the time.
And very few people can know 100%, 100% of the time. We build software on top of abstractions. Abstractions leak, abstractions break, and abstractions isolate you from "knowing what you are doing".
If you build software in C, you will build abstractions in C, and debugging / analysis tools help to understand these abstractions.
> Meaning that if you are human you will make mistakes.
You can make mistakes in any language, and they will bite you in the backside anyway.
Just because some people prefer to live in padded rooms doesn't mean they can't or won't get hurt if they decide to pull stunts. Padded rooms just provide them with a false notion of safety that ends up making matters worse.
> You can make mistakes in any language, and they will bite you in the backside anyway.
This is a false equivalence. It's akin to saying "you can hit your thumb putting in a nail with any tool, so you might as well use a rock instead of a hammer, since they're both bad".
Yes, you can make mistakes in any language, but that's no reason to ignore modern tooling and advances and carry on as before.
Other languages allow us to build abstractions and prevent whole classes of mistakes in ways that are simply impossible in C... and without those abstractions, it's needlessly difficult to keep all the state in your head and write correct code.
An easy example of this is locking and ownership.
In C, mutexes are managed manually, and what they protect is a matter of convention. It's easy to write races or to accidentally deadlock (again, as evidenced by such a bug existing in the kernel every few weeks).
In languages with RAII, python's "with", etc, you can't forget to unlock the lock. It's one less thing to think about.
In languages like rust, it's possible to model in the type system that a given resource can only be accessed while a mutex is locked. Again, one less way to make a mistake.
C has no ability to provide or build abstractions that are this robust.
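For instance, the classic shape of the bug in C (a sketch; the names are hypothetical):

    #include <pthread.h>

    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
    static int table[64];  /* protected by table_lock -- by convention only */

    int lookup(int key)
    {
        pthread_mutex_lock(&table_lock);
        if (key < 0 || key >= 64)
            return -1;  /* bug: this early return leaks the lock */
        int value = table[key];
        pthread_mutex_unlock(&table_lock);
        return value;
    }

With RAII or a lock-owning type, that early return simply couldn't leak the lock.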
Using C is, more often than not, like using a rock to hammer in a nail.
> Just because some people prefer to live in padded rooms
With C, people are living in a room where the floor is lava and nails coat most surfaces.
I'll take my room that's carpeted, and if I need to pull a stunt and rip up the carpet, sure, I can do that, but at least I can walk normally most of the time.
I've been a professional C++ programmer for almost two decades (and a C++ amateur for almost a decade before that). I've written large amounts of fairly advanced C++ using the latest standards.
My current job however, is all plain C.
One of the interesting (and, for a C++ fan like me, disturbing) things I've found is that the cognitive load is much lower when writing plain well-structured C.
Sure, you need to remember to release that lock and free that memory but this you can do in a very structured way. What you win over C++ is that you don't need to view each line of code with suspicion ("What happens if this line throws an exception?", "I wonder what this line will cost in terms of performance?", "Could this operation cause a cascading destruction of objects I'm going to access further down?").
I love RAII, yet I've debugged enough crashes of the form "a seemingly innocuous operation destructs some local object that has an owning reference to an object that shouldn't be destructed just yet, and BAM!" that I'm beginning to doubt its usefulness outside of an exception-throwing language (in C++ it's essential for proper exception handling).
Even from a C amateur's point of view, I feel the same way in my learning of the language. I can't speak to multithreading or very large applications yet, but the view from the ground looking up is that C is relatively straightforward. The small size of the language is something of a relief.
It's not very surprising to me, because C is a simpler language and it's easy to paint yourself into a corner with C++.
The critical question is: what does the error rate look like for code of similar complexity? It's very possible that C programmers will try to keep things simple, because the language doesn't support them and they have to be extra careful. The flip-side is that they probably can't develop projects as complex as C++ would enable them to.
Yeah, most projects don't consist of a single function locking and unlocking a mutex. :) Try doing the same in five 200 LOC functions with multiple returns when three different threads are sharing the data, for 10 years, with a yearly personnel turnover of 20%.
Something really unusually stupid, like returning from the function (assuming do_something() is a stand-in for arbitrary code, not specifically a function call)?
This is not even remotely the worst thing about mutexes though, so it wouldn't be why I would suggest avoiding them.
Honestly, to help avoid mistakes, I would keep it as a function call. Sure, it adds to the stack depth, but it also ensures a separate return doesn't cause the lock to be lost.
There's also the goto pattern, but anytime you separate a lock from an unlock by more than a screen's worth of text, they're forgotten.
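(The shape in question, as a sketch with hypothetical names:)

    #include <pthread.h>

    struct table {
        pthread_mutex_t lock;
        int *data;
        int len;
    };

    int update(struct table *t, int key, int value)
    {
        int rc = -1;
        pthread_mutex_lock(&t->lock);
        if (key < 0 || key >= t->len)
            goto out;  /* every exit path funnels through the single unlock below */
        t->data[key] = value;
        rc = 0;
    out:
        pthread_mutex_unlock(&t->lock);
        return rc;
    }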
And inline functions don't even add stack depth. It's still something that happens manually though. Every item that needs to be done manually and doesn't cause obvious failures when not done adds to development costs.
I'm glad you mentioned Python. Python, with its developers who accept raw pickle objects from the wild and are surprised when arbitrary code can be executed. Ruby's (and Python's) YAML libraries which execute arbitrary code. Javascript (and Ruby, and Python) developers pulling in untrusted and/or poorly tested libraries just to save a few lines of code. Rust with its `unsafe` blocks.
Seems like that padded floor has some rusted nails hiding right behind the pretty fabric.
RAII is not something limited to Rust, or C++, or any other language. The abstraction underpinning RAII can be done and has been done in C; you can see it done repeatedly in the code for collectd.
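One common way to approximate it in C is the GCC/Clang cleanup attribute (a sketch; this is a compiler extension, not standard C, and not collectd's actual code):

    #include <stdio.h>

    /* Called automatically when the annotated variable goes out of scope. */
    static void close_file(FILE **fp)
    {
        if (*fp)
            fclose(*fp);
    }

    int process(const char *path)
    {
        FILE *f __attribute__((cleanup(close_file))) = fopen(path, "r");
        if (!f)
            return -1;
        /* ... use f; it is closed automatically on every return path ... */
        return 0;
    }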
It's up to the developers to make their programs safe and reliable. No language to date will do that for them.
> It's up to the developers to make their programs safe and reliable. No language to date will do that for them.
But languages do make a huge contribution. For example, Rust, Ada and Modula-3 are all much safer by defaults alone compared to C. Most Rust code sits outside unsafe blocks, so the existence of this feature does not prove there is no point to Rust.
I didn't say anything along those lines. I said that it's up to developers to make their programs safe.
Defaults matter, no doubt. But they are not a silver bullet; greater base safety can even cause people to become lax when thinking about safety, resulting in even bigger problems. Why do Python developers accept and parse untrusted pickle objects? Because Python is safe, and they don't have to think about what's going on under the hood.
It's indirectly related to computer programming, but a study was done in Europe which showed that crashes in AWD vehicles were, on average, much more severe than those in 2WD vehicles. Why? Because of the added stability of AWD, people drove faster in adverse conditions.
You clearly don't understand the world of video game programmers. I worked on a game (Banjo Kazooie) that ended up being turned into more than 6 million ROM cartridges and distributed worldwide through retail. The cost of producing the cartridges alone was millions of $. With no internet patching available in case of a bug. We took verification VERY seriously.