Comparing with a fairly optimized malloc at $COMPANY, the Go allocator is (both in terms of relative cycles and fraction of cycles of all Go programs) significantly more expensive than the C/C++ counterpart (3-4x IIRC). For one, it has to do more work, like setting up GC metadata, and zeroing.
There have recently been some optimizations to `runtime.mallocgc`, which may have decrease that 3-4x estimate a bit.
How can that be true? If it is 3-4x more expensive than malloc, then per my measurements your malloc is a bump allocator, and that simply isn't true for any real world malloc implementation (typically a modified free list allocator afaik). `mallocgc` may not be fast, but I simply did not find it as slow as you are saying. My guess is it is about as fast as most decent malloc functions, but I have not measured, and it would be interesting to see a comparison (tough to do as you'd need to call malloc via CGo or write one in C and one in Go and trust the looping is roughly the same cost).
I should correct and clarify: I meant 3-4x more expensive in relative terms. Meaning:
- For C++ programs, the allocator (allocating+freeing) consumes roughly 5% of cycles.
- For Go programs, the allocator (runtime.mallocgc) used to consume ~20% of cycles (this is the data I referenced). I checked and recently it's become closer to 15%, thanks to optimizations.
I have not tested the performance differential on a per-byte level (though that will also differ with object structure in Go).
There have recently been some optimizations to `runtime.mallocgc`, which may have decrease that 3-4x estimate a bit.