1. This is not due to GC design but to language semantics (value types, which ar...

ngrilly · on July 17, 2018

> This is not due to GC design but to language semantics (value types, which aren't yet available in Java).

The fact that Go has value types helps a lot, but the latency inferior to 0.5 ms is mainly the result of GC design, as explained in the discussed article (especially the work on removing as much as possible stop the world pauses).

> Note that at some point, worst-case latency becomes far less meaningful than throughput, because unless running on a realtime OS, the OS introduces bigger pauses, and it makes no sense for the GC to try and beat them.

This is already said in the article.

pron · on July 17, 2018

Because you usually pay for latency with throughput, you can afford the low latency achieved with a simpler design only if you have less concurrent work.

masklinn · on July 18, 2018

Go is not magic, and does in fact pay for latency with throughput: https://www.reddit.com/r/golang/comments/5j7phw/modern_garba...

> Go: 67 ms max, 1062 pauses, 23.6 s total pause, 22 ms mean pause, 91 s total runtime

> Java, G1 GC, no tuning: 86 ms max, 65 pauses, 2.7 s total pause, 41 ms mean pause, 20 s total runtime

nvarsj · on July 18, 2018

I couldn't find the benchmarks for that comment. But the Java numbers seem way off. It must have been a simple toy application, probably with allocations optimized away somewhere - I'd want to see memory allocation rates to see if they are comparable. G1 is usually higher latency than CMS in my experience, too. Try running Cassandra with G1, and see how good your pauses are. Min viable pause time target for G1 is ~200ms (the hotspot default).

BuckRogers · on July 19, 2018

I don't really understand the argument that Go is "almost" soft-realtime.. if you need that, you probably need or should just go to realtime. Use say, Rust or C++.

Otherwise it seems to me that the Java/C# model is the best design for most tasks. Which is why they're so popular, it's not a mistake.

ngrilly · on July 19, 2018

> Otherwise it seems to me that the Java/C# model is the best design for most tasks.

This is discussed in the article (basically, Google needed low latency servers):

« If you want 10 answers ask for several more and take the first 10 and those are the answers you put on your search page. If the request exceeds 50%ile reissue or forward the request to another server. If GC is about to run, refuse new requests or forward the requests to another server until GC is done. And so forth and so on.

All these are workarounds come from very clever people with very real problems but they didn't tackle the root problem of GC latency. At Google scale we had to tackle the root problem. Why?

Redundancy wasn't going to scale, redundancy costs a lot. It costs new server farms. »

pron · on July 19, 2018

> but they didn't tackle the root problem of GC latency

But they did. The new low-latency Java GCs are more sophisticated than Go's, and deliver pauses that are on the order of OS-caused pauses. The reason Go was able to achieve low latency with a relatively simple design is because 1. it suffers a hit to throughput and 2. that throughput hit, while significant, is not catastrophic because Go relies heavily on value types.

ngrilly · on July 19, 2018

> The new low-latency Java GCs

As you wrote, they are new, and weren't available when the decision was made for the Go GC.

> The reason Go was able to achieve low latency with a relatively simple design is because 1. it suffers a hit to throughput and 2. that throughput hit, while significant, is not catastrophic because Go relies heavily on value types.

And are you saying these "new low-latency Java GCs" have no tradeoffs either?

I'm sorry, but we are discussing the keynote of the International Symposium on Memory Management, which is a recognized event in the field, and you are claiming things without any substantial material to offer. Maybe you're right, but I need more than vague assertions to be convinced :-)

pron · on July 19, 2018

There is always a tradeoff in throughput (and a commercial low-latency GC has been available for Java for years, as well as realtime GCs). All I'm saying is that the reason Go is able to achieve low latency with a relatively simple design is because the language is designed to generate less garbage, so the challenge is smaller. My point is that Go's GC is not some extraordinary breakthrough in GC design (not that Hudson hadn't made some in the past) that unlocks the secret to low-pause GCs, but more of an indirect exploitation of the fact that the allocation rate is relatively low. The same design with much higher allocation rates would likely suffer an unacceptable hit to throughput.

A recent presentation on ZGC is here: https://www.youtube.com/watch?v=tShc0dyFtgw

ngrilly · on July 19, 2018

I'm not really sure what you're trying to prove here. Java is definitely a great platform and has a cutting-edge GC. Nobody contests that. Go's GC is another point in the design space, starting with different constraints (low latency and non moving). This is what makes the ISMM keynote interesting.

Hudson explains they tried to switch to a generational GC, but for this they needed a write barrier. It was difficult to optimize the write barrier, by eliding it whenever possible, because Go is moving to a system where there is a GC safepoint at every instruction (this is because goroutines can be preempted at anytime, which is not a requirement for Java threads). In other words, the GC design is also constrained by the way goroutines work.

Hudson also explains that because Go relies a lot on value types, it makes espace analysis more effective, even without interprocedural analysis, which makes generational collection less effective than in other languages using a GC.

Keeping the allocation rate low is part of Go's GC design. A language with a "much higher allocation rate" would probably lead to a different design.

Thanks for the link to the presentation on ZGC! I'll watch it soon. But I saw the slide showing the performance goals and ZGC doesn't sounds a lot better than the numbers presented by Hudson for Go:

- "10 ms max pause time" for ZGC versus "two <500 microseconds STW pauses per GC" for Go

- "15 % max throughput reduction" for Java versus "25% of the CPU during GC cycle" for Go"

By the way, I also note that ZGC is "single generation".

ngrilly · on July 19, 2018

> The new low-latency Java GCs are more sophisticated than Go's, and deliver pauses that are on the order of OS-caused pauses.

ZGC, whose explicit goal is to be a low latency GC for Java, has a goal of 10 ms max pause time, which seems way above the pauses caused by the OS (compared to two <0.5 ms pauses per GC cycle for Go). The number came from the video you shared earlier. But maybe I misunderstood something.

pron · on July 20, 2018

1. In the video they say that in practice they see average pauses of 1ms and max pauses of 4ms.

2. It's still very early days for ZGC, but there are more established low-latency Java GCs. A non-realtime one is C4[1] (used in the Zing JVM), which then added generational collection to a design that is similar to ZGC. They report[2] a few 0.5-1ms pauses per hour

[1]: https://www.azul.com/files/c4_paper_acm2.pdf

[2]: https://groups.google.com/forum/#!topic/mechanical-sympathy/...

ngrilly · on July 25, 2018

Thanks for the links.

This is definitely great engineering.

I also recommend this short presentation of ZGC: https://dinfuehr.github.io/blog/a-first-look-into-zgc/

pjmlp · on July 17, 2018

And even on the real time case, we have then the real time extensions, even though they are out of range for most devs (price wise).