I find it interesting to see the strides that Ruby is making with its GC, because I saw a very similar evolution with Java's GC a decade or so ago. If history repeats itself, the next phase will be Ruby monitoring and troubleshooting tools.
Anecdotal, but we saw a significant improvement in our API response latency after switching from Ruby 2.1 to 2.2. With 2.1 we could easily spot the requests where the GC ran; with 2.2 our response times are much more consistent.
It's so annoying to keep listening to people exaggerate the problems of GC and pauses just because they only know naive approaches like mark-and-sweep or reference counting.
Yeah, "new" languages have a real tendency to go for "some sort" of automatic memory management. And they pick the easy option.
Unfortunately, this can lead to wired-in assumptions. With reference counting, for example, extension libraries end up relying on header files that implement the counting logic. The better option, IMHO, would be to instead require extension libraries to 'pin' objects that have been passed through an FFI.
If a language implementation were to provide an opaque interface of "I am holding onto this object", "I want to allocate an object", "I want to read/write this object's element fields", a far better GC can be provided at a later date while retaining backwards compatibility.
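To make that concrete, here is a toy model of such an opaque interface. It is only a sketch: `ToyHeap`, `Handle`, `hold`, `release`, `read_field` and `write_field` are names I made up for illustration, not any real runtime's API. The extension only ever sees handles, so the collector is free to move objects around behind its back.

```python
class Handle:
    """Opaque token given to extensions; it carries no address (hypothetical)."""
    __slots__ = ("_id",)

    def __init__(self, handle_id):
        self._id = handle_id


class ToyHeap:
    """Hypothetical runtime heap that exposes only handle-based operations."""

    def __init__(self):
        self._slots = []     # object storage; the list index is the "address"
        self._table = {}     # handle id -> current address
        self._held = set()   # handle ids an extension says it is holding
        self._next_id = 0

    def allocate(self, num_fields):
        """'I want to allocate an object'."""
        handle = Handle(self._next_id)
        self._next_id += 1
        self._table[handle._id] = len(self._slots)
        self._slots.append([None] * num_fields)
        return handle

    def hold(self, handle):
        """'I am holding onto this object' (a set, so holds do not nest)."""
        self._held.add(handle._id)

    def release(self, handle):
        self._held.discard(handle._id)

    def read_field(self, handle, index):
        return self._slots[self._table[handle._id]][index]

    def write_field(self, handle, index, value):
        self._slots[self._table[handle._id]][index] = value

    def live_count(self):
        return len(self._slots)

    def collect(self):
        """Trace from held handles, then compact survivors to new addresses."""
        live, seen = [], set()
        worklist = list(self._held)
        while worklist:
            handle_id = worklist.pop()
            if handle_id in seen:
                continue
            seen.add(handle_id)
            live.append(handle_id)
            for value in self._slots[self._table[handle_id]]:
                if isinstance(value, Handle):
                    worklist.append(value._id)
        new_slots, new_table = [], {}
        for handle_id in live:
            new_table[handle_id] = len(new_slots)
            new_slots.append(self._slots[self._table[handle_id]])
        self._slots, self._table = new_slots, new_table


heap = ToyHeap()
garbage = heap.allocate(1)          # never held, so it will be collected
parent = heap.allocate(1)
child = heap.allocate(1)
heap.write_field(child, 0, "payload")
heap.write_field(parent, 0, child)  # child is reachable only through parent
heap.hold(parent)
heap.collect()                      # parent and child move; handles stay valid
```

Because the extension never sees an address, swapping this compacting collector for a generational or concurrent one would not change a single line of extension code, which is the whole point of the decoupling.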
Why go through all this hassle for extension libraries? Because then your GC implementation is completely decoupled from libraries. And most language implementors are sufficiently well read to know how to apply GC, but a little ignorant about leaving room for a complete decoupling of GC.
Turn GC into an interface and you can provide very advanced GCs if you can find someone to write them. E.g. something like Java's G1GC for Ruby.
Advanced GCs need more: they need to know your object layout on the stack and on the heap. Otherwise they have to be conservative, have to pin a lot of objects during the moving phase, and cannot skip large non-pointer structures, at least if those reside on the same heap.
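A tiny illustration of the difference, as a sketch with invented names (`conservative_roots`, `precise_roots`, and the addresses are all made up): without layout information, any stack word that happens to look like an object address has to be treated as a pointer, and the object it "points" to cannot be moved, because the collector may not rewrite a word that might really be an integer.

```python
def conservative_roots(stack_words, object_addresses):
    """No layout info: every word that looks like an address is a root.

    The objects found this way must also be pinned, since the collector
    cannot safely update a word that may be a plain integer.
    """
    return {word for word in stack_words if word in object_addresses}


def precise_roots(stack_words, pointer_slots):
    """With a stack map: pointer_slots lists exactly which slots hold pointers."""
    return {stack_words[index] for index in pointer_slots}


# Hypothetical heap with three objects at these addresses.
object_addresses = {0x1000, 0x1040, 0x1080}

# Slot 0 is a real pointer; slot 2 is an integer that merely equals 0x1040.
stack_words = [0x1000, 42, 0x1040, 7]
pointer_slots = [0]

pinned_by_conservative_scan = conservative_roots(stack_words, object_addresses)
movable_roots = precise_roots(stack_words, pointer_slots)
```

Here the conservative scan keeps and pins two objects, while the precise scan finds the single real root and is free to move it and update slot 0.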
And once you start pinning objects, using bump-pointer allocators gets more complicated, since you have to maintain holes or free lists.
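Roughly what that looks like, as a minimal sketch (the `BumpAllocator` class and its `pinned` parameter are invented for illustration, and real allocators are far more careful about the wasted tails): instead of one cursor racing through an empty region, the allocator has to track the free runs between pinned objects and hop over them.

```python
class BumpAllocator:
    """Bump-pointer allocation over one region, skipping pinned ranges."""

    def __init__(self, size, pinned=()):
        # pinned: iterable of (start, length) ranges that must stay in place
        self.free_runs = self._compute_free_runs(size, sorted(pinned))
        self.run_index = 0
        self.cursor = self.free_runs[0][0] if self.free_runs else size

    @staticmethod
    def _compute_free_runs(size, pinned):
        """Return the [start, end) gaps between pinned ranges."""
        runs, position = [], 0
        for start, length in pinned:
            if start > position:
                runs.append((position, start))
            position = max(position, start + length)
        if position < size:
            runs.append((position, size))
        return runs

    def allocate(self, length):
        """Return an address, or None if no remaining run can fit the request."""
        while self.run_index < len(self.free_runs):
            _, end = self.free_runs[self.run_index]
            if self.cursor + length <= end:
                address = self.cursor
                self.cursor += length
                return address
            # The tail of this run is too small; abandon it and try the next hole.
            self.run_index += 1
            if self.run_index < len(self.free_runs):
                self.cursor = self.free_runs[self.run_index][0]
        return None


# A 32-unit region with two pinned 4-unit objects in the middle.
fragmented = BumpAllocator(32, pinned=[(8, 4), (20, 4)])
addresses = [fragmented.allocate(6) for _ in range(4)]
```

With no pins the same region fits five 6-unit objects; with the two pins it fits only three, even though 24 units are nominally free, because the leftover tail of each hole is abandoned.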
I think the next logical step for Ruby's GC would be parallelizing it. STW pauses are fairly controlled states that should be amenable to parallelization, since the mutator threads are stopped and there is far less potential for racy behavior.