
It's entirely valid C and (assuming this and that are byte pointers) copies a range of bytes up to and including the first zero byte.

With a sufficient warning level (e.g. -Wall on gcc, which should always be enabled anyway, together with -Wextra), compilers will complain about the '=' and ask you to add a pair of parentheses to make clear that the assignment is actually intended:

    while( (*this++ = *that++) );
It's also one of those cases where the C code matches the output assembly pretty well:

https://www.godbolt.org/z/nz1jbz4Er

As far as "obfuscated C" goes, this is a very tame example though. It's just a straightforward usage of language features, which might look strange only when coming from other languages that don't have pointers or a post-increment operator.
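For anyone who wants to try the loop above in isolation, here is a minimal sketch wrapped in a function. The name copy_str and the dst/src parameter names are my own assumptions (the thread uses this/that), and the caller is assumed to provide a destination with enough room for the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: copies the zero-terminated
   string at src into dst, including the terminating zero byte.
   The caller must make sure dst is large enough. */
char *copy_str(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *start = dst;
    /* The extra parentheses mark the assignment as intentional,
       which is the form the -Wall warning asks for. */
    while ((*dst++ = *src++))
        ;
    return start;
}
```

Calling copy_str(buf, "hi") on an 8-byte buffer writes 'h', 'i' and the zero byte, and leaves the remaining bytes untouched.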




Doesn't that kind of prove my point?

The code as written was described as beautiful, yet would not have passed -Wall.

It's those little things that can easily get you in C, and there are so many little things to consider.


That extra pair of parentheses doesn't make the code 'ugly' ;)

And the code without the extra parentheses is still entirely valid standard C. The warning is essentially just a lint to protect against typos (similar to JS linters warning about '===' vs '==').

PS: let's see if the alternatives would be any more readable:

    char c;
    while (c = *that++) {
        *this++ = c;
    }
...this is already buggy because it doesn't copy the final zero byte, so the test must happen inside the loop body. Let's also try to get rid of the post-increment:

    while (true) {
        char c = *that;
        *this = c;
        this += 1;
        that += 1;
        if (c == 0) {
            break;
        }
    }
...hmm not really any more readable...

Let's try with an index...

    size_t i = 0;
    while (true) {
        char c = that[i];
        this[i] = c;
        i += 1;
        if (c == 0) {
            break;
        }
    }
...might be a bit easier to grasp when used to other languages, but readability hasn't improved all that much I'd say...

For reference, MUSL also just uses the original approach:

https://github.com/esmil/musl/blob/master/src/string/strcpy....


I would use a do-while, because at least one char is always copied:

    char c;
    do
    {
        c = *src++;
        *dst++ = c;
    }
    while (c != 0);
The post-increment is idiomatic enough in C-based languages that I wouldn’t worry about that.
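A sketch of that do-while loop as a complete function, mostly to show the "at least one char" case: for an empty source string the single byte copied is the terminator itself. The function name and signature are assumptions, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the do-while loop; the caller must
   provide a destination large enough for the string plus terminator. */
char *copy_str_do_while(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *start = dst;
    char c;
    do {
        c = *src++;   /* read one byte from the source first ... */
        *dst++ = c;   /* ... then store it, so the zero byte is stored too */
    } while (c != 0); /* stop after the zero byte has been written */
    return start;
}
```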


I don’t write much C, but to an outsider like me this is a pretty big improvement.

It is a shame post-test loops aren’t more popular, given the similarity to the assembly they output. Seems more mechanically sympathetic. Oh well, at least it is an excuse to whip out the goto.


In Pascal, you could do

    repeat
        ...
    until c = 0;
which might be even clearer for this use case.


I was considering that but had my doubts that people who already find post-increment confusing could handle a do-while loop ;)


I find it crazy that you improve the readability so much and say readability hasn't improved that much.

The order in which things happen in *this++ is not obvious unless you know a bunch of C-specific rules, while the ordering of multiple statements is obvious even to someone who doesn't know C. Perhaps C programmers should find this obvious, but it seems to me more like cognitive overhead which has a non-zero chance of confusing someone at some point.
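For reference, the rule in question: postfix ++ binds tighter than unary *, so *p++ means *(p++), which reads through the old pointer value while the pointer itself is advanced by one. A small sketch comparing the compact form with the spelled-out statements; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte *p points at and advances
   the caller's pointer by one, using the compact form. */
char read_and_advance(const char **p)
{
    assert(p != NULL && *p != NULL);
    /* (*p)++ yields the old pointer value; the outer * reads through it. */
    return *(*p)++;
}

/* The same steps written as separate statements. */
char read_and_advance_explicit(const char **p)
{
    assert(p != NULL && *p != NULL);
    const char *old = *p;  /* remember where the pointer was */
    *p = old + 1;          /* advance the caller's pointer */
    return *old;           /* read through the old position */
}
```

Both functions return the same byte and leave the pointer one position further along.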


That's almost a philosophical question ;) Should code in a specific language be more readable to programmers familiar with that language or to programmers who are not?

E.g. I guess for a mathematician, all imperative languages are probably 'weird', while something like Haskell feels more familiar?


I was unclear, sorry: I didn't mean to say that the extra parentheses make it uglier, I meant to point out that something that was described as beautiful was actually flawed.

The flaw was minor in this case because the identifier names and lack of body make the intention clear, but my point is that there are a lot of minor things in C that can come and bite you at any time.

Edit: You are right, I don't see a way this could have been implemented more readably without sacrificing some performance.

First thing I thought of was:

  void cp(const char* from, char* to) {
      while (*from) {
          *to = *from;
          to++;
          from++;
      }
  }
But that does not reduce to the original case.
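The difference from the original is that this loop exits before the zero byte is copied, so the destination is left unterminated. A minimal sketch of a variant that matches the original's behavior, assuming the same parameter order as cp(); the name cp_terminated is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of cp(): identical loop, plus an explicit
   terminator write after it. */
void cp_terminated(const char *from, char *to)
{
    assert(from != NULL && to != NULL);
    while (*from) {
        *to = *from;
        to++;
        from++;
    }
    *to = '\0';  /* the loop stops at the zero byte without copying it */
}
```

The extra store after the loop is the price of testing before copying; the original one-liner avoids it by testing the value it has just stored.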


I've added a couple of examples trying to find a more readable version, which actually isn't trivial. Sorry for the 'post-edit' :)

As for performance: I don't think such details matter much. Compilers are pretty good at turning "readable but inefficient" code into the same optimal output (aka "zero cost abstraction").

And a really performance-oriented strcpy() wouldn't simply copy byte by byte anyway, but try to move data in bigger chunks (like 64-bit or even SIMD registers). Whether this is then actually faster also depends on the CPU though.
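To give a flavor of the "bigger chunks" idea, here is a sketch of the well-known bit trick for testing whether any of the 8 bytes in a 64-bit word is zero. This is only the detection step, not a full strcpy(), and it assumes the caller guarantees that 8 bytes are readable at p. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if any of the 8 bytes starting
   at p is zero. The caller must guarantee 8 readable bytes. */
bool chunk_has_zero_byte(const char *p)
{
    assert(p != NULL);
    uint64_t v;
    memcpy(&v, p, sizeof v);  /* load 8 bytes without alignment assumptions */
    const uint64_t ones  = UINT64_C(0x0101010101010101);
    const uint64_t highs = UINT64_C(0x8080808080808080);
    /* Subtracting 1 from every byte sets a byte's high bit when that
       byte was zero; masking with ~v rules out bytes whose high bit
       was already set. */
    return ((v - ones) & ~v & highs) != 0;
}
```

A real implementation would also have to deal with alignment and with not reading past the end of the source buffer, which is where most of the complexity lives.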


`this` and `that` are arrows which range over a stream of data; `=` is copy, and `++` moves the arrow along the stream.

This isn't a "clever one-liner"; it is a clear and precise syntax for expressing the operation the machine actually performs.

    while(copy(current(stream_a), current(stream_b)) and not end_of_stream(stream_a))

You might prefer the above, but then, that's every other major language. The beauty of C is that the above code has to compile to something like the C version. C just allows you to actually express it.


> it is a clear and precise syntax for expressing the operation the machine actually performs.

No it’s not. Your compiler will almost certainly translate this into vector instructions, at least.


For posterity, I was apparently wrong. It doesn’t autovectorize, with gcc 12.2 -O3 on godbolt, at least.


C lacks the expression of many useful common CPU capabilities. Integer rotation and overflow checking come to mind immediately.
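Since the language has no operator for either, both usually get spelled out by hand. A sketch in portable C, with hypothetical function names: the rotate uses the common shift-and-or idiom (which compilers typically recognize), and the overflow check tests the operands before adding, because signed overflow itself is undefined behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rotate x left by n bits. Masking both shift
   counts keeps them in 0..31, so n == 0 does not shift by 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Hypothetical helper: stores a + b in *out and returns true, or
   returns false (leaving *out untouched) if the sum would overflow. */
bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

GCC and Clang also offer __builtin_add_overflow as a compiler extension, which exposes the CPU's overflow flag more directly than the manual range test.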


GCC warnings can be overly pedantic. It's set up to warn about common footguns but doesn't know what your intent is. In this case it's a common enough idiom to assign within a control statement that GCC has the extra parens escape hatch.

You shouldn't just blindly let your tooling dictate how you work. It's a tool that's supposed to work for you, not control you. -Wall and -Wextra are good baselines but I always disable some of their warnings because I don't need the hassle on known good code.


Yes, with -Wall it triggers a warning, but the warning is a false positive because assignment is indeed the intention.


> the output assembly pretty well

Ironically, the compiler is likely to recognize this as a strcpy and replace it with a possibly vectorized implementation.


I actually tried to make that happen, but was unsuccessful on GCC and Clang (I've seen this in the past for memcpy() though).



