Mysterious Memset (vector-of-bool.github.io)
105 points by ibobev on May 11, 2022 | 63 comments



Not really mysterious or a surprise. Passing count as a reference was a huge red flag from the very beginning.


It would be a surprise to anyone not very familiar with C++'s strict-aliasing rule, which I imagine is quite a lot of people (and even many C++ programmers...)

Very few other languages have a similar rule (e.g. Rust explicitly does not have this rule).
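
For the unfamiliar, here's a minimal sketch of what the rule forbids and permits (the float/uint32_t pair is just an illustrative choice):

    #include <cstdint>
    #include <cstring>

    std::uint32_t bits_of(float f) {
        // UB under strict aliasing: reading a float object through an
        // unrelated pointer type:
        //   return *reinterpret_cast<std::uint32_t*>(&f);

        // Well-defined: copy the object representation instead.
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }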


Although Rust doesn't have this rule, it isn't nearly as relevant due to the way immutable (shared) references work. While `a: &Foo` and `b: &Bar` may alias, it doesn't matter much, because Rust knows that nothing can write to them, so it can do basically all of the optimization it needs. Rust also knows that `c: &mut T` doesn't alias with anything, so in this case it can make loads of assumptions.

One chink in this rule is interior mutability. This is why it is better to think of `&T` as a shared reference than as an immutable reference. For example, in the following code Rust can't assume that `a.get() == 5` (even if the second argument is changed to `&Cell<i8>`).

    pub fn test(a: &std::cell::Cell<u8>, b: &std::cell::Cell<u8>) -> u8 {
        a.set(5);
        b.set(7);
        a.get()
    }
https://rust.godbolt.org/z/e1c6Kx4eb

Commenting out the write to b does allow Rust to hardcode the return value as 5.


The surprising thing to me is that 'char8_t' (a new C++20 thing I'd never heard of) is not just a typedef for char with all the same aliasing implications, but a new and distinct type to which the magic 'char' aliasing rules don't apply (it's also unsigned).


And also distinct from std::byte, which they hope will pick up the aliasing properties of char, allowing the use (and abuse) of char* to someday be eliminated from new code. std::byte does alias everything, but no operations are defined on it except copying (and, weirdly, bitwise operators).
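
A small sketch of what that looks like in practice (the `dump` function is hypothetical):

    #include <cstddef>

    // std::byte may alias any object, but only copying and bitwise
    // operators are defined on it; there is no arithmetic.
    void dump(const void* p, std::size_t n) {
        auto* b = static_cast<const std::byte*>(p);
        for (std::size_t i = 0; i < n; ++i) {
            std::byte low = b[i] & std::byte{0x0F};  // bitwise ops: fine
            (void)low;
            // b[i] + 1;  // error: no arithmetic on std::byte
        }
    }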


Also, char is a distinct type from both signed char and unsigned char (even though it has the same size as both and the same signedness as one of them).
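
Overload resolution makes the distinctness easy to observe (a hypothetical `which` overload set):

    #include <iostream>

    void which(char)          { std::cout << "char\n"; }
    void which(signed char)   { std::cout << "signed char\n"; }
    void which(unsigned char) { std::cout << "unsigned char\n"; }

    int main() {
        which('a');                            // picks char
        which(static_cast<signed char>(1));    // picks signed char
        which(static_cast<unsigned char>(1));  // picks unsigned char
    }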


Are signed char and unsigned char subject to the same "can alias anything" rule? I never bothered to think about this in the past. Now I'm not sure. I've known that the three types are distinct, but that just means that the rule doesn't have to apply to all of them, not that it doesn't.


Yes, the rule applies to all three character types.


Thanks. This seems mildly ungoogleable, or at least I haven't been able to find a good search result for this. So I'll keep in mind that these are three distinct types with the same aliasing behavior.


Actually, I take that back and should clarify: it's all three only in C. In C++ it is just char and unsigned char. (If you want to search for this, "signed char aliasing" gave me good results.)


Is there any difference in using char vs unsigned char then? I understand these two types are different as far as the compiler's concerned. Would they behave differently too?


The signedness of char is up to the implementation. You should not rely on char behaving like unsigned char.


It basically has to be distinct from `char`, because you can't portably use `char` to hold a UTF-8 code unit (the guaranteed valid range is only 0x00 to 0x7F). Also, this way you can overload based on legacy char vs UTF-8 char, and have `std::basic_string<char>` and `std::basic_string<char8_t>` (aka `std::string` and `std::u8string`) be distinct types as well. So finally in C++20 we actually have a portable UTF-8 string type!
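
For instance, a minimal sketch of the overloading this enables (`describe` is a hypothetical function):

    #include <iostream>
    #include <string>

    void describe(const std::string&)   { std::cout << "legacy char string\n"; }
    void describe(const std::u8string&) { std::cout << "UTF-8 string\n"; }

    int main() {
        describe("hello");    // const char[] -> std::string overload
        describe(u8"hello");  // const char8_t[] -> std::u8string overload (C++20)
    }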


Anyway, a UTF-8 sequence transport type. Few of u8string's member operations make sense for UTF-8 as such. Usually that doesn't matter. Sometimes it matters a great deal, and we will need a whole new API for that.


Yeah, good point.


> you can't portably use `char` to hold a UTF-8 code unit

That's not true: in C++, a byte is guaranteed to be able to hold a UTF-8 code unit.

https://timsong-cpp.github.io/cppwp/n4868/intro.memory#1.sen...


Yes, but if `char` is signed, as it usually is, its bit patterns correspond to values -0x80 to 0x7F. So yeah, you can encode the >= 0x80 code units at no cost as their two’s complement counterparts, but it feels suspicious. At least to me, after writing some Rust lately, which very much does not do implicit signed-unsigned conversions. Much better for char to always represent the "basic character set" (i.e. usually ASCII) and have a distinct type for UTF-8.
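
A small sketch of the round trip that feels suspicious (assuming a platform where plain char is signed):

    #include <cstdio>

    int main() {
        // Store the UTF-8 code unit 0xC3 in a possibly-signed char. Since
        // C++20 the conversion is well-defined modular arithmetic (it was
        // implementation-defined before).
        char c = static_cast<char>(0xC3);                 // -61 if char is signed
        unsigned char u = static_cast<unsigned char>(c);  // recovers 0xC3 == 195
        std::printf("%d %u\n", c, static_cast<unsigned>(u));
    }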


I agree, it's surprising. Types like uint8_t* are widely used to reinterpret other structures as byte arrays, which implies they are universal aliases just like char*. Not sure what makes char8_t different.


> Types like uint8_t* are widely used to reinterpret other structures as byte arrays which implies they are universal aliases just like char*

Alternatively, it implies that there's a lot of broken code out there. So much broken code that they've accidentally found safety in numbers, and compilers are unlikely to change a coincidental behavior upon which they wrongly relied. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110



It surprised me, because I forgot that aliasing is a problem for C. Guess I'm one of today's 10,000.


I'm sort-of, kind-of aware of the issue, but what surprised me was how std::string pulls this into the picture here. Is this because of what str[i] gets compiled into when str is a basic_string of chars? That seems non-obvious to me because I'm just accustomed to dealing with C++ strings as fairly opaque entities. If they somehow expose to the compiler that you're really manipulating characters via pointers, I can understand how this complicates things.


Replace *count with this->count and you can see why this can cause pessimization.


What would that change? If this->count was a pointer you'd still have to de-reference it explicitly. Or am I missing the point of your argument?


I agree that in this toy example passing a count argument by reference makes little sense, but you'd get the same thing if you accessed the count field of

    struct s {
        int count;
        char *chars;
    };
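
e.g. a loop like this hits the same pessimization (a hypothetical sketch):

    // Every iteration must re-load sp->count, because as far as the
    // compiler knows, writing through sp->chars could modify it.
    void zero(struct s *sp) {
        for (int i = 0; i < sp->count; ++i)
            sp->chars[i] = 0;
    }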


Nice. The simplest fix would be to introduce a local variable int c = *count before the loop (which you would also do with a non-optimizing compiler, so the count could be kept in a register).
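
Something like this (a sketch; the signature is illustrative, not the article's exact code):

    void zero(char *str, int *count) {
        int c = *count;              // load count once, before the loop
        for (int i = 0; i < c; ++i)  // writes through str can no longer
            str[i] = 0;              // be assumed to change the bound
    }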


The point, which they probably should have spelled out better, is that std::string contains a char* that aliases everything there might be a pointer to. It is often hard to prevent pointers being created to things; when you call a function that takes a "T const&" argument, the compiler is typically obliged to assume that a pointer to the argument was taken and retained.

So, the "*count" in the example is a stand-in for a zillion other things you would actually have written, and that would also be affected by aliasing assumptions.


I've been coding commercially in C and C++ (not full time but pretty regularly) for over 20 years and I'm starting to think the simplest fix is to not use C or C++.


__restrict would also have helped.

In the case of memset, it's questionable whether that library call is actually faster for count < 128.


Recent Intel processors have "fast rep stos", which basically allows the compiler to always inline memset (which it couldn't do in the aliasing case).


Since it's std::memset, the compiler could theoretically inline it or have a small-count path that uses vector intrinsics.


It could, but for a case without a known count it doesn't really make sense to do that. (I found the concern to be kind of strange for that reason, honestly.)


A memset tail call would be less code size than the loop, so it's still worth it even for cold code.


I believe restrict is a C-only keyword, so no, it couldn't have helped.


While not standardized in C++, __restrict (or __restrict__) is indeed supported by most C++ compilers. Visual C++ even supports a __declspec(restrict) that applies to the function rather than just individual arguments.
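
For example (a sketch using the common compiler extension; names are illustrative):

    // __restrict promises the compiler that str and count never alias,
    // so *count may be hoisted and the loop lowered to a memset.
    void zero(char* __restrict str, int* __restrict count) {
        for (int i = 0; i < *count; ++i)
            str[i] = 0;
    }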


I have one word for us all: restrict ...

https://en.cppreference.com/w/c/language/restrict

Still missing from C++ after all these years. Get it in there, people!

See also:

https://stackoverflow.com/q/776283/1593077


I don't get the reference to undefined behavior.

This is an aliasing issue, which is pretty well defined.

Am I missing something?

Other than that: Interesting insight!


I think the idea is that if the user had aliased *count and the string, they would have gotten undefined behavior.

So the compiler can assume they didn't and is allowed to optimize away the loop with memset.


Or do more nefarious stuff.


> I don't get the reference to undefined behavior.

> [...]

> Am I missing something?

`int*` and `char8_t*` aliasing is UB under strict aliasing.

You can call it "an aliasing issue, which is pretty well defined". As far as the compiler is concerned, it's no different from every other UB it uses for constraint inference.


Aliasing in the first example is well defined, so there's no UB there. Aliasing in the second example would be undefined, in the sense that the strict-aliasing rule forbids the two pointer types from aliasing in the first place.


Couldn't the author pass a reference to a std::string instead of switching to a std::u8string, or is that a C++ limitation? And I really don't understand why the compiler doesn't assume strict aliasing in the first example.


Interesting, yet terrifying, as is all C++ wizardry to me.


I had to write business logic in C++20 for an embedded Linux box and o-m-g. It sucked. So much. The developer experience was horrendous. All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

I know this is probably not all due to the language, but at least one bit of it has to be. It's really cool if you're writing bare-metal/RTOS-level embedded stuff and you're worrying about how many assignments you can fit into a loop iteration to optimize for cache lines, but I don't understand why anyone would ever try to talk to the web using C++.


> "Let me see the generated assembly in Godbolt" why, why?

This is an embedded thing, not a C or C++ thing.

The other day someone was saying here on HN that writing Verilog feels like following a process where "you already more or less know which circuit you want; you are just trying to figure out which specific Verilog code will get your synthesizer to generate that circuit".

On embedded platforms (or generally anywhere where you count memory usage in units of KB or less), that's exactly what many people do. They already know more or less the assembly code, they are just looking for the right higher-level program that will translate to that assembly. That's one of the reasons they get angry when the language tries to be too smart.

The reason you don't just code in assembly directly is that it's still a pain and your chances of making a mistake increase (e.g. with complex arithmetic expressions).


Does that also mean that embedded programming involves being conservative about compiler updates? Because otherwise those choices might become completely invalid one upgrade later.


It's not unreasonable to stick with the same compiler version for the entire life of a product. I was involved with a hardware project where we attempted to upgrade the compiler mid-lifecycle; it caused a subtle malfunction (almost certainly due to a latent bug or undocumented compiler behavior dependency in our code, but we couldn't find it) and we simply decided never to upgrade the compiler for the life of the product. It wasn't considered a big deal to do so.


The basic principle is that known unknowns are better than unknown unknowns.


It's worse than that; many times the principle is: known evils are better than unknown angels.


Which are in turn better than unknown knowns, or things you were just sure were true, but just ain't.

Had you not been "sure", you might have done something that would work right.


For safety-critical software, for example, upgrading the compiler means you need to recertify the product. That's very expensive.


That makes sense, and I'm glad to hear they have those strict rules given how terrible we are at making reliable software


Yea, I know all this jazz; I was, once upon a time, the dude unrolling loops in asm to get digital filters running faster on PIC32s.

It's still a Linux box, with plenty of memory to spare. This IS a C++ problem, where people are driven to do these insane optimizations.


> All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

> I know this is probably not all due to the language, but at least one bit of it has to be

No, there’s a cultural problem. C++ gives you the power and flexibility to really optimize for space or performance but rarely is that worth it early in development. Instead, just start by simply writing the code. Good design will give you affordances for appropriate optimization later.

And if your “embedded” system is so massive it can run Linux then you could end up with better code density by using Python source and including the interpreter!


> All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

I think a large part of it is due to the language. C++ code can be simultaneously very low-level and highly abstracted, and then you get reviewers complaining about needlessly copying 48 bytes while making a network request...


It might depend on whether your program does one request every now and then, or several thousand network requests per second (then I might complain too).


Of course it depends on what you are building, but my point is that the language gives you access to low-level facilities, which nudges people to worry about them even when they are irrelevant and unimportant. Because details like copy elision are usually an obvious point that can be improved upon, and people generally have a need to participate and contribute, small things will be mentioned in review and delay the feature even when they make no difference. It's not the fault of the language itself but rather the culture around it, and the easy and obvious answer of "just don't use C++ if you don't need it" stops being easy and obvious when you try to actually interop with other languages. </rant>


Idk, I once sped up a particle renderer a few dozen-fold because people were doing unnecessary copies. It burnt CPU cycles for nothing for a long time before that, which should really be made illegal.


Better that than having to convince someone your code is fine and doesn't need to have variables pulled out of the loop for "optimization". Compilers are good at code motion, and many engineers will micro-optimize things in ways that end up hurting readability for no benefit unless you show them Godbolt.


> "Let me see the generated assembly in Godbolt" why, why

What else do you propose ?


It would be the same in C, of course. Except worse.


To the author: the initial for loop is not equivalent to a bare do/while loop. The condition of the for loop is evaluated before the loop body runs, e.g. the body will never run if *count == 0.


There’s an if statement around the do/while loop on that page, so the do/while loop also will not run if *count == 0.

The code snippets

    for (int i = 0; i < *count; ++i) {
        str[i] = 0;
    }
and

    if (*count > 0) {
        long idx = 0;
        do {
            str->__data[idx] = 0;
            idx += 1;
        } while (idx < *count);
    }

will produce the same changes to the string data given the same count.



