> the current crop of c compilers is simply not very good, compared with what's ...

moonchild · on July 8, 2023

One prominent example: the use of intermediate representations based on basic blocks introduces redundancies that increase the complexity of the compiler, requiring attendant redundancies in order to optimise the same. You can see the redundancy manifest here https://godbolt.org/z/8o3oe39hh as different code generation from f and g. (They may change the result of this particular test in the future, but it seems unlikely that the disease—rather than the symptom—can be treated without a complete rearchitecture.)

E-graphs ameliorate phase ordering issues and allow for exploring the space of non-monotonic rewrites; recent research makes them computationally viable.

Put simply: it's legacy. Gcc and llvm are millions of lines of code, and they assume a particular architecture. Changing that is not easy.

Another issue, which I did not mention (but which is pertinent) is that c is a poor language for compilation. (Fran allen famously said 'c has destroyed our ability to advance the state of the art'.) In some respects, the optimisations performed automatically by modern high-performance cpus are more sophisticated than those done by c compilers, howbeit with less reach; the only reason they are able to do this is that they have direct control of the execution and hence have a greater ability to abstract over the side effects which are rampant in most c code.

bruce343434 · on July 8, 2023

E-graphs are interesting, but one still has to deal with combinatorial explosions. Are you alluding to some powerful search heuristic?

Your example touches on the problems of inflexible ABI, namely caller saved registers and the unknowability of side effects of external functions. Very weird that it can't reorder `r = x+y` despite it having no "observable" side effect until `return r`, since that return dominates the assignment, and there's no real relation between (the return, assignment) and (eff()).

bruce343434 · on July 8, 2023

I looked at it closer. In C, it is a side effect to assign to a variable. For an extern function not annotated __attribute__((pure)) the compiler has to assume the function call generates side effects. This prevents it from reordering the assignment and function call. Since x86-64 ABI has caller saved registers, in the case where it calls eff() first, it has to save x and y, and after the call, restore them.

moonchild · on July 8, 2023

The work on e-graphs I refer to is <https://egraphs-good.github.io/>—amortised rebuilds.

My example has nothing whatsoever to do with abi, and everything to do with ir. f and g are exactly semantically equivalent, and this equivalence is trivial to show; that the compilers generate different code for each demonstrates redundancies in their ir.

> it is a side effect to assign to a variable

But that variable is not aliased here.