The first one is nice, but I think you should make sure that ubyte really has that many bits for it to make sense, no? And doesn't your compiler emit a warning when it takes advantage of that undefined behaviour (e.g. by using the upper part of the register where that byte is stored)?
The second: why do you cast to uintptr_t? It makes no sense to me at all. Yes, it might be a little advanced, but it's well known that you should only have correctly typed pointers, void pointers, or char pointers to any object.
It's not a totally obscure rule; most people understand the need for "restrict", after all. I'd say the code looks bad to me at first sight, and even a beginner is unlikely to write it, because it's a complicated construction.
You have a point, but the pointer-type rule still makes sense, and I don't know a better version of it that could prevent as many unnecessary reloads.
With the first, John Regehr found many, many examples of it in real-world code, and in some cases the bug he filed was closed without a fix because "no compiler currently will do the wrong thing with it." I'm not aware of any compiler that emits a warning for this with default compilation flags.
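Roughly, the shape of the problem is something like this (a minimal sketch; pack_high_byte is a made-up name):

    #include <stdint.h>

    /* b is promoted to (signed) int before the shift, so when b >= 128
       the result of b << 24 overflows int: undefined behaviour, even
       though every common compiler today produces the "expected" bits. */
    uint32_t pack_high_byte(uint8_t b) {
        return b << 24;            /* UB when b >= 128 */
    }

    uint32_t pack_high_byte_fixed(uint8_t b) {
        return (uint32_t)b << 24;  /* cast first: the shift is done unsigned */
    }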
The second is based on code where an operating system call would return the mapped ranges of a shared memory region. For whatever reason, it took a MemoryAddress, which was essentially a pre-C99 uintptr_t defined by the system headers (an unsigned value large enough to hold an address). The casts were there because the compiler warned about incompatible pointer types.
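The pattern was roughly this (a hypothetical sketch, not the original code; get_mapped_range and MemoryAddress are stand-ins):

    #include <stdint.h>

    typedef uintptr_t MemoryAddress;  /* stand-in for the system header typedef */

    /* stand-in for the OS call reporting the mapped range of a region */
    extern int get_mapped_range(int region, MemoryAddress *start,
                                MemoryAddress *end);

    void show_range(int region) {
        char *start, *end;
        /* The casts silence the warning, but now the callee stores through
           a uintptr_t lvalue into objects declared as char *; reading start
           and end afterwards accesses them through an incompatible type,
           which violates the aliasing rules, so the optimizer may cache or
           reorder accesses across the call. */
        get_mapped_range(region, (MemoryAddress *)&start,
                                 (MemoryAddress *)&end);
        /* ... use start and end ... */
        (void)start; (void)end;
    }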
I was never able to convince the engineer who wrote the code (~20 years of C experience) that it was wrong (it worked fine until inter-module inlining was enabled). Instead we just enabled the compiler option that disables optimizations based on aliasing rules.
You are overestimating both the degree to which programmers understand C and the degree to which programmers who do understand C avoid mistakes. I've had lots of people say that the code in the second example "looks bad to me", but the commit still made it through code review and ran for 5 years before optimization-flag changes caused bugs to appear.
> "no compiler currently will do the wrong thing with it."
Well then, that's nice, no? I don't like to play language lawyer. If there are machines where it can lead to errors, that might be bad machine design; the compiler and the user should work together to catch the error / specify the intent more precisely. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.
I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
> For whatever reason, it took a MemoryAddress which was essentially a pre-c99 uintptr_t defined by the system headers
It's inconvenient that the API forces such a strange type on the user. But I think the logical usage is then to declare "start" and "end" as uintptr_t. In that case there should be no problem, or am I missing something?
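Something like this is what I mean (reusing the hypothetical get_mapped_range sketch from above):

    #include <stdint.h>

    typedef uintptr_t MemoryAddress;  /* stand-in for the system typedef */
    extern int get_mapped_range(int region, MemoryAddress *start,
                                MemoryAddress *end);

    void show_range_fixed(int region) {
        MemoryAddress start, end;     /* match the API's type exactly */
        get_mapped_range(region, &start, &end);
        /* Converting the integers back to pointers afterwards involves no
           aliasing violation: each object is only ever accessed through
           its declared type. */
        char *p = (char *)start;
        char *q = (char *)end;
        (void)p; (void)q;
    }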
> You are overestimating both the degree to which programmers understand C and the degree to which programmers who do understand C avoid mistakes.
All I can say is that I've never really had this kind of problem so far. The kinds of problems I get from using safe toys, on the other hand (e.g. array access only possible through proxy containers), are far more painful, because these approaches are extremely bad for modularity and elegance. You have to specify the container and also its type, or at least implement all kinds of boilerplate interfaces. This is incredibly restricting.
Everybody makes mistakes, and while some memory-safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are still a minority of the problems I encounter, even counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught by a high-level language: SQL injection or other auth bypasses) than clever low-level ones. And not every type of program is extremely security-sensitive.
>> "no compiler currently will do the wrong thing with it."
> Well then, that's nice, no? I don't like to play language lawyer. If there are machines where it can lead to errors, that might be bad machine design; the compiler and the user should work together to catch the error / specify the intent more precisely. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.
> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
In this case, it would be legal for a compiler to assume that ubyte is less than 128. It's not unreasonable to assume that at some point in the future a compiler writer will discover a way to get a 0.1% improvement on whatever the current artificial benchmark of the day is and implement an optimization for it. The landscape of C libraries is littered with undefined code that used to work everywhere and now doesn't for just such a reason.
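To make that concrete, here is the kind of deduction such an optimization could legally make (a hypothetical sketch; I'm not claiming any current compiler does this):

    #include <stdint.h>

    uint32_t f(uint8_t b) {
        uint32_t x = b << 24;  /* UB when b >= 128, so the compiler may
                                  assume b < 128 from here on... */
        if (b & 0x80)          /* ...and legally fold this test to false */
            return 0;
        return x;
    }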
>> For whatever reason, it took a MemoryAddress which was essentially a pre-c99 uintptr_t defined by the system headers
> It's inconvenient that the API forces such a strange type on the user. But I think the logical usage is then to declare "start" and "end" as uintptr_t. In that case there should be no problem, or am I missing something?
Yes, the correct way is to declare them as uintptr_t and then assign them to a pointer of whatever type you need. It's not uncommon for systems software to treat addresses as integers in places, because they actually do represent specific addresses. I'd have to re-read the specification, but I think having it take a void * instead of a uintptr_t * would have the exact same issue, so I don't think it's the odd API choice that causes this.
I don't think there should be a problem. I think you can cast to void * and back as much as you like, but you aren't supposed to ever access the object as anything other than char or its original type.
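A quick illustration of that distinction (plain standard C, nothing specific to the example above):

    #include <stdio.h>

    int main(void) {
        int x = 42;
        void *v = &x;        /* converting to void * is always fine */
        int  *p = v;         /* ...as is converting back to the original type */
        printf("%d\n", *p);  /* access through the original type: OK */

        /* float *f = v; ... *f   -- accessing the int through a float
           lvalue would violate the aliasing rules, no matter how many
           conversions happened in between. */
        return 0;
    }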
> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.
Well, that is one way of looking at it. My personal experience is that if there is an undefined behavior in the C standard, it will be exploited eventually. I have seen this break multi-million-line applications during compiler upgrades and/or flag-twiddling. As you can imagine, debugging that problem was a nightmare.
[...]
> Everybody makes mistakes, and while some memory-safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are still a minority of the problems I encounter, even counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught by a high-level language: SQL injection or other auth bypasses) than clever low-level ones. And not every type of program is extremely security-sensitive.
I believe that you are right in terms of number of exploits, but for the wrong reasons.
I can't find the statistics on this, but off the top of my head, ~20 years ago, 80% of security advisories were buffer overflows/underflows. At the time, just about everything was written in C/C++.
These days we see lots of high-level exploits, but I would tend to assume the reason is that 1/ the web makes it very easy to carry out attacks; 2/ C and C++ have a very small attack surface on the web, even if you include Apache/nginx/... as part of the web.
And yes, web applications, especially in dynamically typed languages, would require military-grade testing too :)