Mysterious Memset (vector-of-bool.github.io)
105 points by ibobev on May 11, 2022 | 63 comments



Not really mysterious or a surprise. Passing count as a reference was a huge red flag from the very beginning.


It would be a surprise to anyone not very familiar with C++'s strict-aliasing rule, which I imagine is quite a lot of people (and even many C++ programmers...)

Very few other languages have a similar rule (e.g. Rust explicitly does not have this rule).
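
For the unfamiliar, here's a minimal sketch of what the rule forbids and permits (the float/uint32_t pair is just an illustrative choice):

    #include <cstdint>
    #include <cstring>

    std::uint32_t bits_of(float f) {
        // UB under strict aliasing: reading a float object through an
        // unrelated pointer type:
        //   return *reinterpret_cast<std::uint32_t*>(&f);

        // Well-defined: copy the object representation instead.
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }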


Although Rust doesn't have this rule, it isn't nearly as relevant due to the way immutable (shared) references work. While `a: &Foo` and `b: &Bar` may alias, it doesn't matter much, because Rust knows that nothing can write to them, so it can do basically all of the optimization it needs. Rust also knows that `c: &mut T` doesn't alias with anything, so in this case it can make loads of assumptions.

One chink in this rule is interior mutability. This is why it is better to think of `&T` as a shared reference than as an immutable reference. For example, in the following code Rust can't assume that `a.get() == 5` (even if the second argument is changed to `&Cell<i8>`).

    pub fn test(a: &std::cell::Cell<u8>, b: &std::cell::Cell<u8>) -> u8 {
        a.set(5);
        b.set(7);
        a.get()
    }
https://rust.godbolt.org/z/e1c6Kx4eb

Commenting out the write to b does allow Rust to hardcode the return value as 5.


The surprising thing to me is that 'char8_t' (a new C++20 thing I'd never heard of) is not just a typedef for char with all the same aliasing implications, but a new and distinct type to which the magic 'char' aliasing rules don't apply (it's also unsigned).


And also distinct from std::byte, which they hope will pick up the aliasing properties of char, allowing the use (and abuse) of char* to someday be eliminated from new code. std::byte does alias everything, but no operations are defined on it except copying (and, weirdly, bitwise operators).
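
A small sketch of what that looks like in practice (the `dump` function is hypothetical):

    #include <cstddef>

    // std::byte may alias any object, but only copying and bitwise
    // operators are defined on it; there is no arithmetic.
    void dump(const void* p, std::size_t n) {
        auto* b = static_cast<const std::byte*>(p);
        for (std::size_t i = 0; i < n; ++i) {
            std::byte low = b[i] & std::byte{0x0F};  // bitwise ops: fine
            (void)low;
            // b[i] + 1;  // error: no arithmetic on std::byte
        }
    }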


Also, char is a distinct type from both signed char and unsigned char (even though it has the same size as both and the same signedness as one of them).
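
Overload resolution makes the distinctness easy to observe (a hypothetical `which` overload set):

    #include <iostream>

    void which(char)          { std::cout << "char\n"; }
    void which(signed char)   { std::cout << "signed char\n"; }
    void which(unsigned char) { std::cout << "unsigned char\n"; }

    int main() {
        which('a');                            // picks char
        which(static_cast<signed char>(1));    // picks signed char
        which(static_cast<unsigned char>(1));  // picks unsigned char
    }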


Are signed char and unsigned char subject to the same "can alias anything" rule? I never bothered to think about this in the past. Now I'm not sure. I've known that the three types are distinct, but that just means that the rule doesn't have to apply to all of them, not that it doesn't.


Yes, the rule applies to all three character types.


Thanks. This seems mildly ungoogleable, or at least I haven't been able to find a good search result for this. So I'll keep in mind that these are three distinct types with the same aliasing behavior.


Actually, I take that back and should clarify: it's all three only in C. In C++ it is just char and unsigned char. (If you want to search for this, "signed char aliasing" gave me good results.)


Is there any difference in using char vs unsigned char then? I understand these two types are different as far as the compiler's concerned. Would they behave differently too?


The signedness of char is up to the implementation. You should not rely on char behaving like unsigned char.


It basically has to be distinct from `char`, because you can't portably use `char` to hold a UTF-8 code unit (the guaranteed valid range is only 0x00 to 0x7F). Also, this way you can overload based on legacy char vs UTF-8 char, and have `std::basic_string<char>` and `std::basic_string<char8_t>` (aka `std::string` and `std::u8string`) be distinct types as well. So finally in C++20 we actually have a portable UTF-8 string type!
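
For instance, a minimal sketch of the overloading this enables (`describe` is a hypothetical function):

    #include <iostream>
    #include <string>

    void describe(const std::string&)   { std::cout << "legacy char string\n"; }
    void describe(const std::u8string&) { std::cout << "UTF-8 string\n"; }

    int main() {
        describe("hello");    // const char[] -> std::string overload
        describe(u8"hello");  // const char8_t[] -> std::u8string overload (C++20)
    }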


Anyway, a UTF-8 sequence transport type. Few of u8string's member operations make sense for UTF-8 as such. Usually that doesn't matter. Sometimes it matters a great deal, and we will need a whole new API for that.


Yeah, good point.


> you can't portably use `char` to hold a UTF-8 code unit

That's not true: in C++, a byte is guaranteed to be able to hold a UTF-8 code unit.

https://timsong-cpp.github.io/cppwp/n4868/intro.memory#1.sen...


Yes, but if `char` is signed, as it usually is, its bit patterns correspond to values -0x80 to 0x7F. So yeah, you can encode the >= 0x80 code units at no cost as their two’s complement counterparts, but it feels suspicious. At least to me, after writing some Rust lately, which very much does not do implicit signed-unsigned conversions. Much better for char to always represent the "basic character set" (i.e. usually ASCII) and have a distinct type for UTF-8.
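
A small sketch of the round trip that feels suspicious (assuming a platform where plain char is signed):

    #include <cstdio>

    int main() {
        // Store the UTF-8 code unit 0xC3 in a possibly-signed char. Since
        // C++20 the conversion is well-defined modular arithmetic (it was
        // implementation-defined before).
        char c = static_cast<char>(0xC3);                 // -61 if char is signed
        unsigned char u = static_cast<unsigned char>(c);  // recovers 0xC3 == 195
        std::printf("%d %u\n", c, static_cast<unsigned>(u));
    }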


I agree, it's surprising. Types like uint8_t* are widely used to reinterpret other structures as byte arrays, which implies they are universal aliases just like char*. Not sure what makes char8_t different.


> Types like uint8_t* are widely used to reinterpret other structures as byte arrays which implies they are universal aliases just like char*

Alternatively, it implies that there's a lot of broken code out there. So much broken code that they've accidentally found safety in numbers, and compilers are unlikely to change a coincidental behavior upon which they wrongly relied. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110



It surprised me, because I forgot that aliasing is a problem for C. Guess I'm one of today's 10,000.


I'm sort-of, kind-of aware of the issue, but what surprised me was how std::string pulls this into the picture here. Is this because of what str[i] gets compiled into when str is a basic_string of chars? That seems non-obvious to me because I'm just accustomed to dealing with C++ strings as fairly opaque entities. If they somehow expose to the compiler that you're really manipulating characters via pointers, I can understand how this complicates things.


Replace *count with this->count and you can see why this can cause pessimization.


What would that change? If this->count was a pointer you'd still have to de-reference it explicitly. Or am I missing the point of your argument?


I agree that in this toy example passing a count argument by reference makes little sense, but you'd get the same thing if you accessed the count field of

    struct s {
        int count;
        char *chars;
    };
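
e.g. a loop like this hits the same pessimization (a hypothetical sketch):

    // Every iteration must re-load sp->count, because as far as the
    // compiler knows, writing through sp->chars could modify it.
    void zero(struct s *sp) {
        for (int i = 0; i < sp->count; ++i)
            sp->chars[i] = 0;
    }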


Nice. The simplest fix would be to introduce a local variable int c = *count before the loop (which you would also do with a non-optimizing compiler, so the count could be kept in a register).
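
Something like this (a sketch; the signature is illustrative, not the article's exact code):

    void zero(char *str, int *count) {
        int c = *count;              // load count once, before the loop
        for (int i = 0; i < c; ++i)  // writes through str can no longer
            str[i] = 0;              // be assumed to change the bound
    }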


The point, which they probably should have spelled out better, is that std::string contains a char* that aliases everything there might be a pointer to. It is often hard to prevent pointers being created to things; when you call a function that takes a "T const&" argument, the compiler is typically obliged to assume that a pointer to the argument was taken and retained.

So, the "*count" in the example is a stand-in for a zillion other things you would actually have written, and that would also be affected by aliasing assumptions.


I've been coding commercially in C and C++ (not full time but pretty regularly) for over 20 years and I'm starting to think the simplest fix is to not use C or C++.


__restrict would also have helped.

In the case of memset, it's questionable whether that library call is actually faster for count < 128.


Recent Intel processors have "fast rep stos", which basically allows the compiler to always inline memset (which it couldn't do in the aliasing case).


Since it's std::memset, the compiler could theoretically inline it or have a small-count path that uses vector intrinsics.


It could, but for a case without a known count it doesn't really make sense to do that. (I found the concern to be kind of strange for that reason, honestly.)


A memset tail call would be less code size than the loop, so it's still worth it even for cold code.


I believe restrict is a C-only keyword, so no, it couldn't have helped.


While not standardized in C++, __restrict (or __restrict__) is indeed supported by most C++ compilers. Visual C++ even supports a __declspec(restrict) that applies to the function rather than just individual arguments.
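
For example (a sketch using the common compiler extension; names are illustrative):

    // __restrict promises the compiler that str and count never alias,
    // so *count may be hoisted and the loop lowered to a memset.
    void zero(char* __restrict str, int* __restrict count) {
        for (int i = 0; i < *count; ++i)
            str[i] = 0;
    }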


I have one word for us all: restrict ...

https://en.cppreference.com/w/c/language/restrict

Still missing from C++ after all these years. Get it in there, people!

See also:

https://stackoverflow.com/q/776283/1593077


I don't get the reference to undefined behavior.

This is an aliasing issue, which is pretty well defined.

Am I missing something?

Other than that: Interesting insight!


I think the idea is that if the user had aliased *count and the string, they would have gotten undefined behavior.

So the compiler can assume they didn't and is allowed to optimize away the loop with memset.


Or do more nefarious stuff.


> I don't get the reference to undefined behavior.

> [...]

> Am I missing something?

`int*` and `char8_t*` aliasing is UB under strict aliasing.

You can call it "an aliasing issue, which is pretty well defined". As far as the compiler is concerned, it's no different from every other UB it uses for constraint inference.


Aliasing in the first example is well defined, so there's no UB there. Aliasing in the second example would be undefined, in the sense that the strict-aliasing rule forbids the two pointer types from aliasing in the first place.


Couldn't the author pass a reference to a std::string instead of switching to a std::u8string, or is that a C++ limitation? And I really don't understand why the compiler doesn't assume strict aliasing in the first example.


Interesting, yet terrifying, as is all C++ wizardry to me.


I had to write business logic in C++20 for an embedded Linux box and o-m-g. It sucked. So much. The developer experience was horrendous. All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

I know this is probably not all due to the language, but at least one bit of it has to be. It's really cool if you're writing bare-metal/RTOS-level embedded stuff and you're worrying about how many assignments you can fit into a loop iteration to optimize for cache lines, but I don't understand why anyone would ever try to talk to the web using C++.


> "Let me see the generated assembly in Godbolt" why, why?

This is an embedded thing, not a C or C++ thing.

The other day someone was saying here on HN that writing Verilog feels like following a process where "you already more or less know which circuit you want; you are just trying to figure out which specific Verilog code will get your synthesizer to generate that circuit".

On embedded platforms (or generally anywhere where you count memory usage in units of KB or less), that's exactly what many people do. They already know more or less the assembly code, they are just looking for the right higher-level program that will translate to that assembly. That's one of the reasons they get angry when the language tries to be too smart.

The reason you don't just code in assembly directly is that it's still a pain and your chances of making a mistake increase (e.g. with complex arithmetic expressions).


Does that also mean that embedded programming involves being conservative about compiler updates? Because otherwise those choices might become completely invalid one upgrade later.


It's not unreasonable to stick with the same compiler version for the entire life of a product. I was involved with a hardware project where we attempted to upgrade the compiler mid-lifecycle; it caused a subtle malfunction (almost certainly due to a latent bug or undocumented compiler behavior dependency in our code, but we couldn't find it) and we simply decided never to upgrade the compiler for the life of the product. It wasn't considered a big deal to do so.


The basic principle is that known unknowns are better than unknown unknowns.


It's worse than that; many times the principle is: known evils are better than unknown angels.


Which are in turn better than unknown knowns, or things you were just sure were true, but just ain't.

Had you not been "sure", you might have done something that would work right.


For safety-critical software, for example, upgrading the compiler means you need to recertify the product. That's very expensive.


That makes sense, and I'm glad to hear they have those strict rules given how terrible we are at making reliable software


Yea, I know all this jazz; I was, once upon a time, the dude unrolling loops in asm to get digital filters running faster on PIC32s.

It's still a Linux box, with plenty of memory to spare. This IS a C++ problem, where people are driven to do these insane optimizations.


> All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

> I know this is probably not all due to the language, but at least one bit of it has to be

No, there’s a cultural problem. C++ gives you the power and flexibility to really optimize for space or performance but rarely is that worth it early in development. Instead, just start by simply writing the code. Good design will give you affordances for appropriate optimization later.

And if your “embedded” system is so massive it can run Linux then you could end up with better code density by using Python source and including the interpreter!


> All the PRs dragged along because of micro-optimizations every step of the way. "Let me see the generated assembly in Godbolt" why, why?

I think a large part of it is due to the language. C++ code can be simultaneously very low-level and highly abstracted, and then you get reviewers complaining about needlessly copying 48 bytes while making a network request...


It might depend on whether your program does one request every now and then, or several thousand network requests per second (then I might complain too).


Of course it depends on what you are building, but my point is that the language gives you access to low-level facilities, which nudges people to worry about them even when they are irrelevant and unimportant. Because details like copy elision are usually an obvious point that can be improved upon, and people generally have a need to participate and contribute, small things will be mentioned in review and delay the feature even when they make no difference. It's not the fault of the language itself but rather the culture around it, and the easy and obvious answer of "just don't use C++ if you don't need it" stops being easy and obvious when you try to actually interop with other languages. </rant>


Idk, I once sped up a particle renderer a few dozen-fold because people were doing unnecessary copies. It burnt CPU cycles for nothing for a long time before that, which should really be made illegal.


Better that than having to convince someone your code is fine and doesn't need to have variables pulled out of the loop for "optimization". Compilers are good at code motion, and many engineers will micro-optimize things in ways that end up hurting readability for no benefit unless you show them Godbolt.


> "Let me see the generated assembly in Godbolt" why, why

What else do you propose ?


It would be the same in C, of course. Except worse.


To the author: the initial for loop is not equivalent to a bare do/while loop. The condition of the for loop is evaluated before the loop body runs, e.g. the body will never run if *count == 0.


There’s an if statement around the do/while loop on that page, so the do/while loop also will not run if *count == 0.

The code snippets

    for (int i = 0; i < *count; ++i) {
        str[i] = 0;
    }
and

    if (*count > 0) {
        long idx = 0;
        do {
            str->__data[idx] = 0;
            idx += 1;
        } while (idx < *count);
    }

will produce the same changes to the string data given the same count.



