Allocation with a GC is typically not any more expensive than being "on the stack", so I don't think something being "on the stack" is a useful distinction.

(And in a language with stackful closures the stack itself is GCed.)




This is incorrect in multiple ways. C# stack is completely native, identical to C++ or Rust.

The following factors contribute to “structs being faster”:

- Heap allocations have to go through allocation calls, which need to find free memory, possibly zero it out, and then return a pointer (reference) to it, in both managed and unmanaged languages, with C# being much faster at small object allocation (a thread-local read, a pointer bump, and an object-header write) while unmanaged wins for large allocations instead (you don't have to go through the LOH and the extra cost associated with it). In comparison, the stack is already zeroed for structs that are written to it, and those writes are just movs (or ldr/ldp's and str/stp's in the case of arm64), and even then only when the struct is spilled to the stack at all (see below)

- Stack may not be the best way to describe it - think "local, exclusively owned memory", which means that compilers, however conservative, can reason about the exact lifetimes of local values and the changes that happen to them. This means struct values can be promoted to CPU registers and never touch memory, unlike heap allocations, where multiple reads of the same property may require repeated dereferencing to account for the fact that the object's memory may be globally observable. This in turn enables optimizations like CSE, which can elide multiple identical checks against struct values knowing they won't change between operations.

- In .NET, generic method bodies for class-based generic arguments are shared (the closest example in Rust is Box<dyn Trait>-based dispatch, but with less overhead). Struct generic arguments, however, force monomorphization of the method body, i.e. emitting a specialized version for the exact generic type, which lets you write zero-cost abstractions the same way you would in Rust with generics or in C++ with templates (see the sketch after this list).
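
A minimal sketch of the last two points (the type names are made up for illustration): the struct-constrained call gets its own monomorphized, devirtualized body and the values never need to touch the heap, while the class version gets a shared body, interface dispatch, and one GC-tracked allocation per instance.

    using System;

    interface IHasValue { double Value { get; } }

    // Struct argument: Sum<Meters> is monomorphized and devirtualized; each element
    // is just 8 bytes of plain data with no object header and no GC tracking.
    readonly struct Meters : IHasValue
    {
        public Meters(double value) => Value = value;
        public double Value { get; }
    }

    // Class argument: Sum<BoxedMeters> shares a generic body, dispatches Value
    // through the interface, and every instance is a separate heap allocation.
    sealed class BoxedMeters : IHasValue
    {
        public BoxedMeters(double value) => Value = value;
        public double Value { get; }
    }

    static class Demo
    {
        public static double Sum<T>(ReadOnlySpan<T> items) where T : IHasValue
        {
            double total = 0;
            foreach (var item in items)
                total += item.Value;
            return total;
        }
    }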


> In comparison, stack is already zeroed for structs that are written to it

This is not possible. A stack is a bump pointer allocator and is the same as any other bump pointer allocator. This includes having to decide when/if to zero memory. (The best time is on free because of memory compression, but most implementations don't do this.)

It's certainly not true that the unused part of the stack is always already zeroed; what if you already used it once? (But it is true if you zero on free.)
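
To make that concrete, here is a toy bump allocator sketch (C#, all names made up): whether the returned memory is zeroed depends entirely on whether you clear it on allocation or on reset, exactly as with a stack frame.

    using System;

    // Toy bump allocator. The backing array is zeroed once by the runtime when it is
    // created; after the first Reset(zeroOnFree: false), reused regions may contain
    // stale data - just like a reused stack frame.
    class BumpArena
    {
        private readonly byte[] _memory = new byte[64 * 1024];
        private int _offset;

        public Span<byte> Allocate(int size)
        {
            var span = _memory.AsSpan(_offset, size); // "allocation" is just a pointer bump
            _offset += size;
            return span;
        }

        public void Reset(bool zeroOnFree)
        {
            if (zeroOnFree)
                _memory.AsSpan(0, _offset).Clear(); // "zero on free", as suggested above
            _offset = 0;
        }
    }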

> - Stack may not be the best way to describe it - think "local exclusively owned memory" which means that compilers, no matter how strict, can reason about the exact lifetimes of local values and the changes that happen to them.

This is escape analysis and applies to anything with a known lifetime.


This is .NET, not the JVM; my previous comment describes how it works today.

Be it C++, Rust, or C#, the necessary stack space is usually reserved in the function/method prologue when it is known statically. Additionally, because C# guarantees that all local variables/memory are initialized, the corresponding stack space is pre-zeroed; this is efficient since it is done with the widest applicable writes (scalar, SSE/AVX(2/512)/NEON, etc.; arm64 has DC ZVA, which kicks in above a certain threshold).
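
A small sketch of that guarantee (method names are made up; [SkipLocalsInit] is available on .NET 5+ and requires AllowUnsafeBlocks in the project):

    using System;
    using System.Runtime.CompilerServices;

    static class LocalsInitDemo
    {
        // By default the C# compiler emits ".locals init", so this stackalloc buffer
        // is pre-zeroed in the prologue and the result is deterministically 0.
        public static int FirstAndLastZeroed()
        {
            Span<int> buffer = stackalloc int[128];
            return buffer[0] + buffer[127];
        }

        // [SkipLocalsInit] drops that guarantee: the buffer may contain whatever
        // bytes happened to be on the stack, trading the zeroing writes for garbage.
        [SkipLocalsInit]
        public static int FirstAndLastUninitialized()
        {
            Span<int> buffer = stackalloc int[128];
            return buffer[0] + buffer[127];
        }
    }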

Regardless, the cost is not in bumping the offset/pointer or zeroing out the memory; it is in going through the allocator call (even if it's inlined, you're still executing more code) and the bookkeeping required for heap allocations in general (both .NET's GC and allocators like mimalloc do it), and then there is the subsequent cost of tracking and collecting objects in the case of a GC.

In addition, .NET does not do escape analysis because, again, it is not the JVM. While it may be added in the future, it is (relatively) unprofitable today because allocation traffic is far lower: not everything is a potentially escaping object, and structs or stack-allocated buffers are routinely used in performance-sensitive code (or wherever it makes sense in general). The way .NET views objects is similar to the way C++ views heap-allocated data, albeit with less aggressive (and often unsound or UB-prone) assumptions than GCC makes. I cannot stress this enough: while the JVM's escape analysis does lead to stack allocation of objects, the reasoning compilers can do about the state of data on the stack is what the JVM gets as a result of doing escape analysis, not vice versa. Other "unmanaged" languages are subject to similar limitations when it comes to stack vs. heap.
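
A sketch of that last point (types made up): in idiomatic performance-sensitive C#, the temporary is declared as a struct up front, so nothing has to be proven non-escaping to keep it off the heap; the JVM reaches the same end state only after its escape analysis succeeds.

    using System;

    // A value type the JIT never has to "discover" is non-escaping.
    readonly struct Extent
    {
        public readonly int Width, Height;
        public Extent(int width, int height) { Width = width; Height = height; }
        public int Area => Width * Height;
    }

    static class AreaDemo
    {
        public static int TotalArea(ReadOnlySpan<(int Width, int Height)> sizes)
        {
            int total = 0;
            foreach (var (w, h) in sizes)
                total += new Extent(w, h).Area; // no heap allocation, no analysis needed
            return total;
        }
    }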


IIRC escape analysis in HotSpot won’t actually allocate the whole object on the stack. It’ll explode the object into its sub fields and store them in registers or spill them into the stack if register pressure is high.


This is a common optimization in any compiler, not just HotSpot. Although it's more important and more extensive in the JVM, where almost everything is a heap allocation, yes. (Although the technical term isn't "escape analysis"; that just covers the analysis, not the resulting optimization.)
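
For reference, a conceptual sketch of that transformation (written in C# just for consistency with the rest of the thread; per the comments above, the .NET JIT doesn't perform it): when the compiler proves the object never escapes, the allocation disappears and the fields become plain scalars.

    using System;

    sealed class Point { public double X, Y; }

    static class ScalarReplacementSketch
    {
        // What the programmer wrote: a heap allocation that never escapes the method.
        public static double LengthBefore()
        {
            var p = new Point { X = 3, Y = 4 };
            return Math.Sqrt(p.X * p.X + p.Y * p.Y);
        }

        // What scalar replacement conceptually turns it into: the object is "exploded"
        // into its fields, which live in registers or spill to the stack; no allocation.
        public static double LengthAfter()
        {
            double x = 3, y = 4;
            return Math.Sqrt(x * x + y * y);
        }
    }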

I've noticed Microsoft seems to think ordinary compiler optimizations are deep magic they're very proud of not implementing. Do they just not have good enough compiler people?


> I've noticed Microsoft seems to think ordinary compiler optimizations are deep magic they're very proud of not implementing. Do they just not have good enough compiler people?

What in our discussion has prompted you to respond with an ad-hominem attack?


That's not an attack, unless you designed .NET, in which case I'm of course referring to your recruiters and not you.

Anyway, it's the part about how C# doesn't need to implement scalar replacement because it has structs. Do it anyway, it's good!

But I've also noticed (in some .NET developer blog post I can't find for you now) them talking about how they couldn't do inlining because it would take too long and be too slow, so they put in some very simple heuristics that did not look like a good trade-off. Inlining, of course, is often very beneficial and can even decrease code size.


Well, this kind of attitude is just sad.

Rather than imagining issues .NET has without verifying them first, and then complaining, I'd suggest spot-checking assumptions with Godbolt [0], which would be a good start (it can't show DynamicPGO, NativeAOT-specific, and some other opts, but is still fairly illustrative).

A more comprehensive view of the produced asm can be obtained with the [DisassemblyDiagnoser] attribute when running code with BenchmarkDotNet [1] (the Java world's equivalent is JMH).
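
A minimal sketch of that workflow (the benchmark bodies are just illustrative): [DisassemblyDiagnoser] makes BenchmarkDotNet dump the JIT-generated asm for each [Benchmark] method next to the timing results.

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public readonly struct PointStruct
    {
        public readonly double X, Y;
        public PointStruct(double x, double y) { X = x; Y = y; }
    }

    public sealed class PointClass
    {
        public readonly double X, Y;
        public PointClass(double x, double y) { X = x; Y = y; }
    }

    [DisassemblyDiagnoser] // dump the generated asm alongside the measurements
    public class StructVsClass
    {
        [Benchmark]
        public double WithStruct() { var p = new PointStruct(3, 4); return p.X * p.X + p.Y * p.Y; }

        [Benchmark]
        public double WithClass() { var p = new PointClass(3, 4); return p.X * p.X + p.Y * p.Y; }
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<StructVsClass>();
    }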

[0] https://godbolt.org/

[1] https://benchmarkdotnet.org/articles/guides/getting-started....


> Although the technical term isn't "escape analysis"; that just covers analysis, not resulting optimization.

Yeah I meant scalar replacement!


>C# stack is completely native, identical to C++ or Rust.

This is absolutely not true. Where are you getting this from? Pray tell, what do you think this is: https://github.com/dotnet/runtime



