UB is not implementation-defined, it's UB. Some compilers have options to defuse certain classes of UB, but then you're effectively coding in an incompatible dialect of C. Otherwise you can't ever rely on UB behaving one way or another: a simple compiler update, code change or compiler flag modification could break everything. A compiler is under no obligation to define what it does in case of UB, and that's the point of it: it leaves some room for aggressive optimization.
Anyway, that wasn't really my point. The problem is that some of these UBs can arise from subtle bugs in code that might not look suspicious at a glance: things like breaking aliasing rules, misusing unions, casting between incompatible types, etc. Your code triggers UB and you don't know it. You might not even notice until you turn on an optimization flag or update your compiler, and suddenly it doesn't do what you want anymore.
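To make the "casting things that aren't compatible" case concrete in Rust terms, here's a minimal sketch (not from any real codebase; `float_bits` and `read_u32_le` are hypothetical helper names). Instead of casting a pointer to an unrelated type, it goes through `f32::to_bits` and `u32::from_le_bytes`, which have defined behavior:

```rust
/// Hypothetical helper: the bit pattern of an `f32`, obtained without
/// casting `&f32` to `&u32`.
fn float_bits(x: f32) -> u32 {
    x.to_bits()
}

/// Hypothetical helper: read a little-endian `u32` from the start of a byte
/// slice. Returns `None` if fewer than 4 bytes are available.
fn read_u32_le(bytes: &[u8]) -> Option<u32> {
    // `get(..4)` is bounds-checked, so a short slice yields `None`.
    let b = bytes.get(..4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    println!("{:#010x}", float_bits(1.0));
    println!("{:?}", read_u32_le(&[0x78, 0x56, 0x34, 0x12, 0xff]));
    println!("{:?}", read_u32_le(&[1, 2]));
}
```

The point is only that the reinterpretation is spelled out, so there's no alignment or aliasing assumption hiding behind a cast.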
Even something as trivial as computing a pointer that lands further than one past the end of an object is UB (not dereferencing it, merely computing its address). For that reason `ptr.offset` is unsafe in Rust, even though it doesn't dereference the pointer.
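As a rough illustration of that rule, here's a sketch of a wrapper that only ever computes in-bounds or one-past-the-end pointers. `element_ptr` is a hypothetical helper name, not something from the standard library:

```rust
/// Hypothetical helper: pointer to `xs[index]`, or to one past the end when
/// `index == xs.len()`. Anything further out is refused.
fn element_ptr<T>(xs: &[T], index: usize) -> Option<*const T> {
    if index <= xs.len() {
        // SAFETY: `index <= xs.len()`, so the result stays within the slice
        // or exactly one element past its end.
        Some(unsafe { xs.as_ptr().add(index) })
    } else {
        None
    }
}

fn main() {
    let xs = [0u8, 10];
    let second = element_ptr(&xs, 1).unwrap();
    // SAFETY: index 1 is in bounds, so the pointer refers to a live element.
    println!("{}", unsafe { *second });
    println!("{}", element_ptr(&xs, 2).is_some()); // one past the end: allowed
    println!("{}", element_ptr(&xs, 3).is_some()); // further out: refused
}
```

The `unsafe` block is still there, it's just confined to one place where the bounds check sits right next to it.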
I find it a bit silly that `ptr.offset` is unsafe, but casting an arbitrary integer to a pointer isn't. E.g.:
    fn main() {
        // Look, an invalid pointer, no `unsafe` required.
        let ptr = 1000 as *const u8;
        // boom, segfault.
        println!("{}", unsafe { *ptr });
    }
Using casting one can even implement a "safe" pointer offset function, like so:
    fn main() {
        fn safe_offset<T>(ptr: *const T, offset: isize) -> *const T {
            ((ptr as usize).wrapping_add(offset as usize)) as *const T
        }
        let xs = [0u8, 10];
        let ptr = safe_offset(&xs[0], 1);
        println!("{}", unsafe { *ptr }); // prints '10'
    }
Obviously this "safe_offset" function can easily be used to trigger UB by computing invalid pointers, and not a single line of unsafe code was required (although we do need `unsafe` to dereference the bad pointer and actually trigger segfaults).
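For what it's worth, the standard library already has a safe method in this spirit: `wrapping_offset` on raw pointers. It counts in elements rather than bytes, unlike the cast-based version above. A small sketch, where `out_and_back` is a hypothetical helper name:

```rust
/// Hypothetical helper: step `n` elements away from `ptr` and back again,
/// using only safe pointer arithmetic.
fn out_and_back<T>(ptr: *const T, n: isize) -> *const T {
    // The intermediate pointer may be far out of bounds. Computing it is
    // fine; dereferencing it would not be.
    ptr.wrapping_offset(n).wrapping_offset(n.wrapping_neg())
}

fn main() {
    let xs = [0u8, 10];
    let second = xs.as_ptr().wrapping_offset(1);
    // SAFETY: `second` points at `xs[1]`, which is in bounds and live.
    println!("{}", unsafe { *second });

    let back = out_and_back(xs.as_ptr(), 1_000);
    println!("{}", back == xs.as_ptr());
}
```

So the split is the same as with the casts: producing the pointer needs no `unsafe`, reading through it does.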
Interesting. However, doesn't casting a random (potentially invalid) integer into a pointer trigger the same potential UB? I ask because for GCC it apparently does:
>When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.
The exact rules of unsafe code are still up in the air. It’s not explicitly defined as UB yet, IIRC, and when we set the rules, we have a goal of not invalidating large swaths of code.
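To make the case in the GCC quote concrete, here's a sketch of the round trip it describes as fine: the integer comes from a pointer into `xs` and is turned back into a pointer to that same object. `round_trip` is a hypothetical helper name, and since the Rust rules aren't settled, this only illustrates the pattern and doesn't claim anything about which other uses are or aren't UB:

```rust
/// Hypothetical helper: pointer -> integer -> pointer, with no arithmetic
/// on the integer in between.
fn round_trip<T>(ptr: *const T) -> *const T {
    let addr = ptr as usize;
    addr as *const T
}

fn main() {
    let xs = [0u8, 10];
    let p = round_trip(&xs[1] as *const u8);
    // SAFETY (assumed): `p` has the address of `xs[1]` and was derived from
    // a pointer to it, so it still refers to the same live object.
    println!("{}", unsafe { *p });
}
```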
There are "implementation defined" details in the C standard but it's a different problem, see for instance: https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html