Comparing with a fairly optimized malloc at $COMPANY, the Go allocator is (both ...

nu11ptr · 2025-04-01T12:48:49 1743511729

How can that be true? If it is 3-4x more expensive than malloc, then per my measurements your malloc is a bump allocator, and that simply isn't true for any real-world malloc implementation (typically a modified free-list allocator, afaik). `mallocgc` may not be fast, but I simply did not find it as slow as you are saying. My guess is that it is about as fast as most decent malloc functions, but I have not measured that, and it would be interesting to see a comparison (tough to do, as you'd need to either call malloc via cgo or write one benchmark in C and one in Go and trust that the looping costs roughly the same).
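For the Go half of such a comparison, a minimal sketch could look like the following. It is an illustration under assumptions, not a measurement anyone in this thread ran: the 64-byte object size, the `sink` variable, and the `allocNsPerOp` helper are all hypothetical choices, and the result includes loop overhead and amortized GC work, so it is only a rough upper bound on the cost of one `mallocgc` call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the compiler cannot keep the allocation on
// the stack; every iteration has to go through the heap allocator.
var sink *[64]byte

// allocNsPerOp (hypothetical helper) times a loop of small heap
// allocations and reports nanoseconds and heap allocations per iteration.
func allocNsPerOp() (nsPerOp float64, allocsPerOp int64) {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = new([64]byte) // 64 bytes is an arbitrary, assumed size
		}
	})
	return float64(r.T.Nanoseconds()) / float64(r.N), r.AllocsPerOp()
}

func main() {
	ns, allocs := allocNsPerOp()
	fmt.Printf("%.1f ns/op, %d allocs/op\n", ns, allocs)
}
```

The C side would need an equivalent malloc/free loop compiled separately, which is exactly where the "trust that the looping costs roughly the same" caveat comes in.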

aktau · 2025-04-02T08:38:24 1743583104

I should correct and clarify: I meant 3-4x more expensive in relative terms. Meaning:

  - For C++ programs, the allocator (allocating+freeing) consumes roughly 5% of cycles.
  - For Go programs, the allocator (runtime.mallocgc) used to consume ~20% of cycles (this is the data I referenced). I checked and recently it's become closer to 15%, thanks to optimizations.
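Spelled out as arithmetic, the "3-4x" is just the ratio of those cycle shares. The sketch below only restates the percentages quoted above; the `relativeCost` helper is a hypothetical name, and the ratio says nothing about the cost of an individual allocation call.

```go
package main

import "fmt"

// relativeCost (hypothetical helper) returns how many times larger the
// Go allocator's share of total cycles is than the C++ allocator's share.
// Both arguments are percentages of total program cycles.
func relativeCost(goSharePct, cppSharePct float64) float64 {
	return goSharePct / cppSharePct
}

func main() {
	fmt.Printf("older data:  %.0fx\n", relativeCost(20, 5)) // ~20% vs ~5%
	fmt.Printf("recent data: %.0fx\n", relativeCost(15, 5)) // ~15% vs ~5%
}
```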

I have not tested the performance differential on a per-byte level (though that will also differ with object structure in Go).