Hacker News
Everything I Ever Learned About JVM Performance Tuning At Twitter (slideshare.net)
146 points by koevet on Oct 28, 2011 | 66 comments



A couple other things:

-XX:+UseCompressedOops converts many 64-bit references into 32-bit, trading a bit of CPU for a lot of memory

If you use a lot of threads, lower your stack size (-Xss) (no real method here, just drive some trucks over the bridge until it breaks, and that's your weight limit :P )

You don't want to swap! A good rule of thumb is to make your max heap (-Xmx) about 60% of total RAM if you only have 1 JVM running

Adaptive GC tuning algorithms can become unstable in long-running processes. I like to use -XX:GCTimeRatio for the throughput collector, which effectively turns off the adaptive stuff. CMSInitiatingOccupancyFraction/UseCMSInitiatingOccupancyOnly does the same thing for the concurrent mark-sweep collector. Keep the ice pick handy :P
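Whichever flags you end up with, it's worth verifying their effect from inside the process. A minimal sketch using the standard management beans (collector names vary by JVM and chosen GC):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Max heap as seen by the JVM (reflects -Xmx)
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap bytes: " + maxHeap);

        // One bean per collector (e.g. "PS Scavenge" / "PS MarkSweep"
        // for the throughput collector); counts and times accumulate
        // over the life of the process.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Polling these numbers periodically (or graphing them) is often enough to see whether the adaptive sizing is oscillating.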


>You don't want to swap! A good rule of thumb is to make your max heap (-Xmx) about 60% of total RAM if you only have 1 JVM running

I'd prepend the advice with "use large pages, Luke". Set Xmx to almost 100% of large pages, with the total of large pages set to 60% of total RAM in your case, or whatever is specific to your application. For a non-filesystem-IO application (middleware between DB and client applications), in my case 14G on a 16G machine works fine.


Have you tried transparent hugepages in the newer kernel?


nope. Enterprise software, not on the bleeding edge.


Law of leaky abstractions at its best (or worst): you decide to use a vm and gc to abstract from manual memory management only to deal with gory details of memory padding and internals of jvm later. Ouch.


The stock JVM is exceptionally bizarre about this because of the insane decision to pad all fields to memory boundaries (which means on a 32 bit machine a class with 10 individual bit fields consumes 40 bytes of memory instead of something sane, like, say, 2). It also has an enormous amount of header overhead; a smallint on a competent Lisp or Smalltalk image takes up, say, one 4 byte word and can use 30 bits of that for data. Java's Integer class which is similar in intent is (in stock JVM) somewhere between 12 and 32 depending on VM settings. I've never seen another VM quite like it. In a previous job we got burned pretty hard by the amount of sheer overhead in this and I had to spend about six months converting everything over to use raw int arrays and copy-on-write objects so we could fit a reasonable but not really huge dataset into memory.
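The back-of-envelope arithmetic behind that overhead can be sketched like this (the sizes are typical 32-bit HotSpot assumptions, not guarantees):

```java
public class BoxedOverhead {
    public static void main(String[] args) {
        // Rough per-element cost of Integer[] vs int[] on a 32-bit
        // HotSpot JVM. Assumptions: 8-byte object header + 4-byte int
        // payload, padded to an 8-byte boundary, plus a 4-byte
        // reference held in the array itself.
        int header = 8, payload = 4, padTo = 8, ref = 4;
        int boxed = ((header + payload + padTo - 1) / padTo) * padTo + ref;
        int raw = 4; // an int in an int[]
        System.out.println("bytes per boxed element: " + boxed); // 20
        System.out.println("bytes per raw element:   " + raw);   // 4
    }
}
```

A 5x blow-up per element is why switching to raw int arrays, as described above, can be worth the six months.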

Despite this, Hotspot is very good and the stock JVM GC is very useful and it is reasonably straightforward to get visibility into the state of the heap and what is going on with the GC. I've never worked in a professional capacity that has anything comparable to VisualVM, although I gather Smalltalk and commercial Lisps have comparable offerings, and you only have to program in Ruby for a while to learn to appreciate the GC quality.


The bits can't be in the same memory word for concurrency reasons. Setting a bit is not an atomic operation on most processors.
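If you do pack flags into one word yourself, every update has to be a compare-and-swap loop, otherwise two threads doing a read-modify-write can silently drop a bit. A small sketch of the safe version:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BitFlags {
    // All flags live in one word; updateAndGet retries the OR
    // with CAS until it wins, so no concurrent update is lost.
    private final AtomicInteger flags = new AtomicInteger();

    void set(int bit)    { flags.updateAndGet(f -> f | (1 << bit)); }
    boolean get(int bit) { return (flags.get() & (1 << bit)) != 0; }

    public static void main(String[] args) throws InterruptedException {
        BitFlags f = new BitFlags();
        Thread a = new Thread(() -> f.set(0));
        Thread b = new Thread(() -> f.set(1));
        a.start(); b.start(); a.join(); b.join();
        System.out.println(f.get(0) && f.get(1)); // true: no lost update
    }
}
```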


Good presentation - I hate slideshare though, randomly skips slides, requires flash, needs facebook login to download, etc.


I have watched the presentation with my iPad so, no, Slideshare does not require Flash. I do agree on the suckiness of the FB marriage.


Funnily enough erlang does not have these horrible GC issues people are seeing in .NET and the HotSpot JVM.


You have my curiosity piqued. Why is that?


Apparently, the secret of Erlang garbage collection is that it's done per-process, and the whole Erlang model is based on spawning many processes. As a result, each process only has a few K of RAM to collect, as opposed to Java's single big heap shared by all threads.

http://prog21.dadgum.com/16.html


I imagine that some of it is also the almost entirely strictly functional design of the language. Some of the ways it makes you write things should make certain tasks like that easier for the compiler. It should be almost always known at compile time when it'll be able to reap things.


Right. For one thing, nothing in Erlang is mutable: To change something, you copy it and change the copy. This sounds like it should make generational garbage collection easier (to a first approximation, nothing in the older generation is being used) but I don't know.


Yeah, but Erlang is a fake language, and fake languages are known for doing impossible things. ;)


Does anyone have a PDF copy of this? The slideshare download is broken for this presentation.



My understanding is that some people just restart the JVM periodically to avoid GC pauses. How does that compare for web applications?


While the JVM is down because you have restarted it, presumably your load balancer is routing requests to another JVM/machine. Then the new JVM starts up and warms up until it's ready to receive requests.

A restart transforms a JVM with a fragmented old gen into a JVM with a fresh old gen, using X1 wall-clock seconds and Y1 CPU time/watts.

A full GC transforms a JVM with a fragmented old gen into a JVM with a fresh old gen, using X2 wall-clock seconds and Y2 CPU time/watts.

Have you ever compared X1,Y1 vs. X2,Y2?


Does the JVM finish up outstanding requests and ask the load balancer to please-don't-send-me-anything-for-a-bit before it does a full GC?


If you trigger a full GC "from outside", the way you would trigger a restart, you'd perform whatever you do for a restart (stop accepting requests, finish accepted requests, be unavailable, start accepting requests).

My point was that a full GC does not need to re-profile the code, re-JIT the code and warm data caches, like a newly restarted JVM does. Sending a user request to a JVM to be run in an interpreter with cold caches is not good for user satisfaction, so a newly restarted JVM needs to be given mock requests to warm it up (like a script for PGO static compilation). A JVM doing a full GC does not need all that to become fully ready, and if it has enough free RAM to defragment quickly, the process should be much more efficient.
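A warm-up harness along those lines might look like the following sketch; the handler, mock request, and iteration count are all illustrative (HotSpot's server-mode compile threshold is in the ten-thousands ballpark):

```java
import java.util.function.Function;

public class Warmup {
    // Hypothetical sketch: before a freshly restarted JVM is put back
    // behind the load balancer, replay canned requests so the JIT has
    // profiled and compiled the hot paths. The iteration count is an
    // assumption, not a tuned value.
    static <I, O> void warmUp(Function<I, O> handler, I mockRequest, int iterations) {
        for (int i = 0; i < iterations; i++) {
            handler.apply(mockRequest); // result discarded
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> handler = String::length; // stand-in for a real request handler
        warmUp(handler, "mock request", 10_000);
        System.out.println("warm: " + handler.apply("real request")); // warm: 12
    }
}
```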


No.

I suppose in theory you could use JMX instrumentation and tie it to your load balancer, but I've never heard of it being done.

(It's actually quite a good idea! hmm...)


I've never heard of people restarting the JVM to avoid GC pauses. Usually scheduled JVM restarts are to avoid "instability", which is usually a memory leak someone can't find or doesn't understand.


We run JVM instances for months between releases. I don't think there's any reason to restart unless there's something wrong with the code.


"A research project had to load the full follower graph in memory"

Really? It had to? There was no possible alternative?

As often as I get the "fail whale" page on Twitter, I'm always skeptical seeing them present stuff like this and release code.

Maybe I'm just not grasping how large and popular Twitter is, but of the popular web services I use, Twitter fails more than all the others combined.

Though it was an interesting read, it seems suspect that they're having so many more problems than everybody else.


> Though it was an interesting read, it seems suspect that they're having so many more problems than everybody else.

So true; I don't even know what a "Facebook overload" looks like because I have never had one.


If you're fighting with the JVM this much, why not just use C++? The problems were garbage collection, lack of control over memory layout, and bloated types. Tool for the job, no?


Tuning GC, by and large, lets you treat the whole system as a single abstraction. You monitor allocation rates and patterns, and adjust algorithms and generation sizes to meet your goals. But once you go to C++, you've dived into a world of itty bitty details; all the memory allocation behaviour is internal to the system, and if you want to change how it works, you have to do relatively severe cross-cutting stuff with overloaded new operator, allocator template arguments, etc. It's very invasive and easy to tie different parts of the system together inadvertently.

And when you go with C++ you've forsaken type safety (i.e. memory safety). That's a substantial thing to give up, IMO.


"""And when you go with C++ you've forsaken type safety (i.e. memory safety). That's a substantial thing to give up, IMO.""" <- This is true only if you have poorly abstracted type casts. If you design your C++ program correctly, you simply won't ever be type casting or manually handling any of your memory. In fact, with a well designed system, this also solves your issue regarding centralized memory planning. I mean, it is as if you are assuming that once you touch C++ you are going to have a million little calls to malloc() with casts all over the place (which shouldn't even be true in a well-designed C program).


Actually the problem is deeper than this. You can't have type safety w/o memory safety, and it has nothing to do with casting.

Here's an example. You allocate a chunk of memory as type Foo. You use it as Foo. You hand out references to Foo. Everyone is happy. Then someone says, "We don't need this instance of Foo anymore" and deletes it. Someone later comes and allocates Bar and the memory allocator correctly goes and grabs some of Foo's old memory and uses it for Bar.

All good, right? Well, except there's this other thread that is using a reference to that old instance of Foo. Suddenly you have a type safety issue at hand.

Now you can say, "Don't write programs with bugs". And indeed that is just about the only alternative.


My claim is that well designed C++ programs do not have calls to delete at all, not just ones that cause bugs: you come up with your memory abstraction, you build that part of the system (it is likely that this code will have a single call to delete in it), and then your actual program relies on that as a "runtime": if the code in that area calls "delete", you do not allow that code, and treat it as a build failure.

I have been responsible for projects with on the order of hundreds of thousands of lines of C++ (such as for a video game we were developing), and the only calls to "delete" were in the stack-locked reference counting implementation I designed, and a few optimized containers. It isn't even hard to think like this in a language like C++, and it isn't all that different from a language like Haskell (you build abstractions from unsafePerformIO: you don't routinely code with it).


For concurrent data structures this "runtime" ends up being basically a re-implementation of garbage collection.


I don't understand you java people. Are you really saying that because a pointer-to-Foo will always point to a Foo, you're somehow safe from bugs?

Sure, you might be less likely to crash; but you're far more likely to get silent data corruption -- and I know which of those two I'd prefer to see.


No; the pointer-to-Foo will always point to the original Foo (unless it's reassigned). The point of the example is to show that type safety depends on not having access to explicit heap management.

You're more likely to get silent data corruption without memory safety, than with it.

It has nothing to do with Java specifically, FWIW. Ruby, Python, shell script, C#, Haskell, they are all memory safe. It's a big church.


unless it's reassigned

Well, sure. And that's true of C as well. The question is what happens when that memory gets reused.


I don't know if you're being deliberately obtuse. At the semantic level of a memory safe language, there never is any memory reuse, because memory isn't a resource like that. It is impossible to observe memory being reused from within the language. In fact, you could use that as a kind of definition of memory safety. It cuts off a whole kind of memory corruption bug, makes it impossible (aside from bugs in the language and / or runtime, of course).

I'm just going to have to throw up my hands here. The masses on HN are apparently proudly ignorant, and unwilling to learn even the slightest thing before wading in.


What you're talking about is the benefits of garbage collection -- the fact that you can't free memory until nobody has a reference to it any more.

That's completely different from type safety.


That's completely different from type safety.

It's not completely different from type safety. You're right, they're not the same thing, but GCs prevent a common form of type safety violation. With that said there are other techniques that exist that are orthogonal to GC.

At the same time there are problems, like lack of bounds checking, which can result in type safety issues.

My point isn't that you need a GC per se, but you do need memory safety. GCs happen to be one of the most prevalent ways to achieve it. Someone else in this thread noted that they built a complete runtime that guarantees memory safety and requires their devs to code against that runtime. That's fine too (although I think a lot less common than that poster might lead one to believe). It's almost like the memory safe subset of other popular existing runtimes.


Colin, it's embarrassing at this point, it really is. I'd have hoped, with your background, that you'd know better.

Memory safety is not dependent on having a GC - if you have an appropriate runtime, a smart enough type system, or a simple enough language semantics.


Ok, enlighten me: Assuming you don't have an infinite amount of RAM how do you avoid running out of memory without doing at least one of garbage collection or explicit memory deallocation?


This is why God invented std::shared_ptr


I don't think you understand what memory safety means. Please look it up.

It's independent of how you write or design your program. It's a property of the language + runtime combination.

(Folks out there downvoting me: you could do with some education too... Parent is a completely ignorant (in the best possible sense - easily fixed) comment.)


You cannot escape the world of safety without using an unsafe feature: an unchecked pointer arithmetic or cast operator. These are features I can statically determine you are using from your code, and which I've stated in a well designed system you have fully abstracted out of the actual program.

Yes: in a couple places in your software you will have a cast, but it is about as useful to complain about that as it would be to claim that the JVM itself has a cast operator in it; if the program code isn't using it, then the program code can be proven to be as safe as the runtime, and C++ does not remove your ability to make statements like that.

So no: I believe that your comment is "completely ignorant" (and rude, to boot).


You cannot escape the world of safety without using an unsafe feature: an unchecked pointer arithmetic or cast operator.

Or threads.


Somehow I feel like if you are spawning off threads while handling a web request, causing concurrency of data generated during the request, you are doing something wrong (that said: I've done it before, in Python; as I did it, however, I believed I was doing something wrong, and the complexity of the code jumped tremendously ;P), but it is a point well taken.

That said, I'm having a difficult time figuring out how that could actually cause a type safety problem, given the same constraints (using a safe memory management framework: garbage collection or stack-locked reference counting). I am totally willing to believe I'm missing something, however; care to provide an example (that does not involve type casts or unbounded pointer arithmetic)?


Wait, you're saying you're using garbage collection in C++? Or you're saying you're using stack-locked reference counting in C++? If the former, you have the same problem as Java: tuning it. If the latter, you've got different performance problems if it's shared between threads. If it's not shared between threads, then you only dodged the question.


I am not claiming a performance improvement, or the lack of having to mess with garbage collection tuning settings. In fact, I have explicitly stated the opposite: that by moving to C++ you do not lose the ability to do global tuning, or even the ability to use a garbage collector to do it.

Nor even is the parent of the post I responded to (as you might then say "well, I am arguing that, and you can't have both"): you easily can have both, as all that these people actually end up doing is brick allocating blocks of objects (either by using C# structs, such as the Stack Overflow articles we've been seeing recently, or doing a poor man's "column store", splitting the fields into arrays, which then causes cache performance issues).

So, as that parent post's parent pointed out, if you are going to go through hell to do that, you may as well do so in C++, as it will be a million times easier (even doing this for their existing managed objects in Managed C++ would have been easier, and it is unfortunate that they didn't evaluate that).

Therefore, with that performance and tuning argument totally removed from the rest of the conversation, I am focussing instead on the irritatingly strong statement "That's a substantial thing to give up, IMO."--a statement I very explicitly pulled away from the unrelated argument in the previous paragraph of the comment--which does not seem at all warranted given the situation, or the facts of these languages.

I mean, the entire premise of this argument is flawed... there are very few systems that actually /are/ type safe, and yet people seem to like using them. We don't even need to go to silly examples like Haskell's unsafePerformIO function: C#, a very similar language and one that has been talked about a lot with respect to GC performance on HN recently (the Stack Overflow article), is quite clearly not type safe, as it allows you to type cast pointers using the "unsafe" keyword.

Of course, you don't have to like people using the "unsafe" keyword, and you can enforce that people on your team not use it; but then that's the real issue: if you are allowed to use the entire specified system, as opposed to restricting yourself to "the known safe subset", then almost no systems of note are actually type safe, partly because users wouldn't stand for it.

With this understanding, we can now go further: Java?... /not type safe/ (sun.misc.Unsafe, as used in the ConcurrentLinkedQueue from earlier in this thread; or more simply, JNI, which people, including myself, use to do all sorts of craziness, even going as far as the JVM itself by backpatching its code at runtime... and I'm not kidding: I actually do that).

Therefore, that anyone is even arguing some hard line that Java is type safe, C++ is not, that the definition is clear, that I'm ignorant for supposedly not understanding that definition (despite spending years researching programming languages and virtual machines in academia, and having implemented multiple), and that this type safety "is a substantial thing to give up", is ludicrous.


On rudeness: I was using ignorant in its strictly technical sense, as in you didn't know the meaning of the phrases I used. You're right in that it was brash; but I was upset that someone had instantly downvoted me when they were WRONG, goddammit!


I used to work at an investment bank where we were able out-do our competitors at latency even though we were using Java and most of the industry was using C++.

The ease of building lock-free algorithms and changing threading models in Java was probably a significant factor.


This also doesn't have to be an either-or sort of scenario. It's pretty easy to mix C++ and Java using the Java Native Interface. It's not going to win a beauty pageant, but it's not rocket science either.

My company's web services are exactly that: we do all of the heavy graph traversal stuff down in the C++ layer and do the web services and XML parsing and all of that stuff in Java. The Java types that correspond to our graph elements are basically just a reference to our persistent graph store (basically a bunch of mmapped disk structures), which means that to do graph traversal on the Java side we never have to actually load the graph into the JVM. And the glue code between the JVM and our C++ still weighs in at under 1k LOC.


The JVM is very good if you are able to handle or avoid the GC pauses.

Unless you are running Azul's Zing JVM, some tuning is usually required for non-trivial systems. We have tuned our system to be very fast with low latency most of the time, and handle timeouts by using a load balancer in front of a few servers. The chance that all servers are having a GC pause at the same time can be made sufficiently small.


-Xincgc avoids pauses no?


In Java 6 -Xincgc simply translates to:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

-XX:+CMSIncrementalMode causes more frequent, smaller pauses. It was designed for low-CPU-core-count, small-heap applications where you're willing to pay a CPU premium to gain better response times. It is specifically not recommended for 2+ CPUs and/or large heap sizes. With large heap sizes you'll be paying a large CPU penalty, and if you are using/freeing a lot of objects it may not be able to "keep up" with its short pauses, resulting in the heap filling up quickly.


There's usually a tradeoff between many small pauses and few big ones (but even with CMS there will be a point where the JVM needs to do a full GC).

The usual approach for low latency systems are to spend the GC pauses incrementally over time. But for us, even though we require consistent low latency, big pauses are perfectly fine because we can run several replicas of the system behind a load balancer.

Not all systems can be designed to do this, and it usually implies an idempotent architecture where you can run the same transaction multiple times without any problems. But when it's possible, you can start guaranteeing low latency with a very high probability. Worst case: simply run the transaction on all instances and use the result from the first one that answers.
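That worst-case strategy maps fairly directly onto invokeAny, which returns the first task to complete successfully and cancels the rest; a hypothetical sketch with illustrative replicas and delays:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FirstAnswerWins {
    public static void main(String[] args) throws Exception {
        // Fire the same idempotent transaction at several replicas
        // and keep whichever result comes back first. The sleeps
        // simulate one replica being mid-GC-pause.
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<String>> replicas = List.of(
                () -> { Thread.sleep(200); return "replica-1"; }, // paused
                () -> { Thread.sleep(5);   return "replica-2"; },
                () -> { Thread.sleep(200); return "replica-3"; }  // paused
        );
        String first = pool.invokeAny(replicas); // first success wins, rest cancelled
        System.out.println("answered by: " + first);
        pool.shutdownNow();
    }
}
```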


There does seem to be a line beyond which continuously manually tweaking the GC would be more work than manual memory management. GC allows you to get up and running faster, but the level of detail you need to know of the JVM to do manual tweaking of the GC implementation approaches the level of detail you need to be aware of in your app to do manual memory management.

That is to say, optimizing the JVM's GC settings and effectively writing/debugging C and C++ memory management seem to be wizard-level tasks of about equal standing. TANSTAAFL applies.


Sure. If you have this problem in 10% of your code, you can save the wizarding for the other 90% where tuning the garbage collector isn't terribly important, and you wouldn't need to write memory management code for acceptable performance.

10% of your code being performance critical would be a lot, in my experience. The usual number where I've worked is probably closer to 1%.


I really don't think you can equate tuning the JVM to writing/debugging C and C++ memory management. Tuning the JVM doesn't require wizardry. It requires some attention to detail and careful monitoring, but most of what you need to know was shared in these slides. There are also a few whitepapers out there on the subject that are pretty readable. I hardly think changing a few command line arguments on the JVM, using jvisualvm to monitor or -verbosegc etc. are anywhere near as difficult as actually implementing your own memory management system.


Those are the problems. But often in C++ you've still got problems with mixed collection types, knowing types at runtime, stability (Java's not perfect, but it's better), ability to replace running code at runtime, et al.

Getting Java's garbage collection tuned isn't trivial, but it's still easier than solving any of those problems in C++ at serious scale.


> problems with mixed collection types

I don't understand. C++ has exactly the same model for collections of polymorphic types. But you can also pursue data oriented designs that aren't feasible in Java.

> knowing types at runtime

RTTI works fine. Querying object types at run time is probably a sign of horrible design.

> stability

By this do you mean unhandled runtime exceptions? If anything it's easier to write C++ free of fatal "null pointer" exceptions by avoiding pointers in favor of references and using RAII well.


By this do you mean unhandled runtime exceptions?

Partially. Mostly I mean that bad code can make your runtime unstable by corrupting memory.

You could say "but I write code that never does that". Pretending for a moment that I believe you, you're still assuming nobody else does that, either.

RAII is great, modulo the problem that you have to handle the whole "no exceptions in constructors" problem to avoid resource leaks.

Querying object types at run time is probably a sign of horrible design

Querying object types at run time is usually a sign of doing something that wasn't anticipated at design time. As systems get larger and older, and as they get pieces patched at runtime, doing unanticipated things becomes inevitable.

Java isn't designed to be maximally clean and flexible for a tiny team. It's designed to build behemoth software for industry which is very forgiving of problems.

C++ doesn't do that as well.

I don't understand. C++ has exactly the same model for collections of polymorphic types.

Yes and no. C++ has more restrictions in terms of using only RTTI-capable types in mixed collections.

What types aren't RTTI-capable? Primitives and structs.

Java has the same problem, patched by autoboxing in such a way that you can't actually screw it up thoroughly. C++ lets you get into unfixable problems pretty quickly.

You're right - use only instances of classes with virtual functions, never a primitive or non-virtual class, and you can avoid it. But "with enough auditing, the problem goes away" assumes some pretty specific things about your final environment, including the assumption that errors either don't make it through or are very recoverable.


> If you're fighting with the JVM this much, why not just use C++? The problems were garbage collection, lack of control over memory layout, and bloated types. Tool for the job, no?

Here's why. Compare these two:

http://hg.openjdk.java.net/jdk7/hotspot/jdk/file/9b8c96f96a0... <-- Doug Lea's java.util.concurrent.ConcurrentLinkedQueue

https://github.com/afeinberg/lockfree/blob/master/src/lockfr... <-- my port of above to C++0x

https://github.com/afeinberg/lockfree/blob/master/src/hazard... <-- essentially an implementation of garbage collection that's needed to work around the ABA problem

tl;dr Shared memory concurrency is surprisingly hard with manual memory management. Not impossible, not infeasible, not impractical. Just hard.

C++ is still a valid choice for many products: JVM isolates you from the underlying OS, the VM subsystem, and there are cases where the cost of garbage collection is prohibitive.

However, I'll argue that the vast majority of a site like Twitter (or LinkedIn, another high-scale JVM-powered property) is best served by a runtime like the JVM or CLR. Erlang is another great option, but it's more of something you'll have _along with_ JVM/CLR and C/C++: Erlang's model is (highly efficient, well abstracted) concurrency with message passing -- which is awesome, but not a full substitute for shared memory concurrency -- i.e., it's a great tool for some jobs, but not others.

C++ makes more sense for things like a B+Tree implementation: I've been using a pure Java B+Tree implementation -- BerkeleyDB Java Edition, and can certainly mention the negatives of that approach.

On the other hand, look at something like the routing layer of Voldemort, the multi-Paxos implementation in ZooKeeper, or (as an example of another high-scale, Java-based service) the modified Paxos implementation in Google's MegaStore, or the non-storage parts of Amazon's SimpleDB (written in Java at one point, in Erlang at another -- not sure what it's written in now).

I'll also argue that I'd rather use a less verbose and more functional language than Java -- and honestly, in some cases C++ far outdoes Java in terms of expressiveness. Go, Scala, languages in the ML family and Erlang (especially with tools like dialyzer and quick check, to get around the dynamic typing) are the right way forward for building highly concurrent user-land "systems-y" software (think more databases or distributed middleware than an OS) -- the parts where memory layout is critical and which tend to produce a lot of garbage can always be implemented in C/C++.

tl;dr Programming language choice involves trade-offs; Java is too verbose, but garbage-collected languages/runtimes have their role in building highly scalable applications or services.


To be fair, the ConcurrentLinkedQueue class you linked to was apparently already so complex to implement that the developers had to go off the edge of type safety into the land of sun.misc.Unsafe; given how much research you already need to understand to feel confident that this algorithm works at all, going the extra distance to understand hazard pointers (which I, at least, find refreshingly easy to understand in juxtaposition), seems minuscule. ;P

That said, since I've now taken the time to read your code, I have a suggestion that would drastically clean up the parts that are making the manual memory allocation feel "overtly manual": rather than allocating memory and then using the default placement new to construct your object, you could define a placement new/delete that operates over the allocator.

This not only would remove all of the reinterpret_cast<>s and sizeof()s from the queue code, but it would also allow a future modification where the queue was able to store values other than void *, while retaining exception safety (as when you template the Node by way of the Queue itself, its more complex constructor would suddenly be allowed to throw, and the memory would need to be collected).


Well, you could also perform CAS using AtomicReference -- the examples in Maurice Herlihy's The Art of Multiprocessor Programming [1] and Brian Goetz' Java Concurrency In Practice [2] do that. So you don't really need to use sun.misc.Unsafe in your own code (of course, you need CAS to implement AtomicReference in the first place).
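For the curious, a stripped-down queue on top of AtomicReference looks roughly like this (a simplified Michael-Scott sketch, not Doug Lea's implementation); note how GC makes the hazard-pointer machinery unnecessary, since a node can't be recycled while any thread still holds a reference to it:

```java
import java.util.concurrent.atomic.AtomicReference;

// Nonblocking FIFO queue built on CAS via AtomicReference.
// Starts with a dummy node; head always points one node
// before the first real element.
public class CasQueue<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head, tail;

    public CasQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    public void offer(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next == null) {
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node); // swing tail; may fail harmlessly
                    return;
                }
            } else {
                tail.compareAndSet(last, next); // help a stalled offer along
            }
        }
    }

    public T poll() {
        while (true) {
            Node<T> first = head.get();
            Node<T> next = first.next.get();
            if (next == null) return null; // empty
            if (head.compareAndSet(first, next)) return next.value;
        }
    }

    public static void main(String[] args) {
        CasQueue<Integer> q = new CasQueue<>();
        q.offer(1); q.offer(2);
        System.out.println(q.poll() + ", " + q.poll() + ", " + q.poll()); // 1, 2, null
    }
}
```

The C++ port needs hazard pointers precisely because, without a GC, a dequeued node might be freed and reallocated while another thread's CAS still compares against its address (the ABA problem).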

You're also completely correct about placement new: I am working on a cleaned-up version of this class; this was essentially a first pass to get myself more familiar with concurrency in C++0x. What complicates things a bit is that allocators are (much like all else in the STL) meant to be used as class template arguments, which makes separate compilation impossible -- hence the need for an adapter from an STL-style allocator to a pure virtual class. Separate compilation is why I also made a void* version of this initially.

I have a much cleaned-up version in the works that will handle more than void *. There's an implementation I call ConcurrentLinkedQueueImpl that handles just void *, which is compiled separately -- and there is a generic version ConcurrentLinkedQueue that is specialized for void * (it ends up just proxying the calls to ConcurrentLinkedQueueImpl), with the generic version (in turn) using ConcurrentLinkedQueue<void *> and placement new to hold any type.

Once again, the version posted there was a rough pass to get myself familiar with 0x concurrency constructs and hazard pointers -- the code is fairly messy.

[1] http://www.amazon.com/Art-Multiprocessor-Programming-Maurice... [2] Everyone should read this book cover to cover -- http://jcip.net/


ConcurrentLinkedQueue in jdk6, by the way, does not use Unsafe:

http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/b58af78ac79c/s...


Could you expand a bit more on the drawbacks of BerkeleyDB JE which you found? Thanks!


BerkeleyDB JE is for the most part excellent code (as expected, given the calibre of the team at Sleepycat/Oracle that built it -- what else should one expect out of a startup founded by Margo Seltzer and Keith Bostic?).

That said, the main drawbacks are:

* At most, I can give the JVM about 18-22 GB of heap; out of that, at most 8-12 GB (usually 10) can go to BerkeleyDB's cache. If I give BDB-JE too little heap, I run the risk of it not having enough space for buffers (for the cleaner thread, for scanning through the entries). If I give BDB too much cache, memory pressure becomes an issue.

So now there's 10-12 GB of memory (per machine) that's neither used for caching data by BerkeleyDB itself, nor used for the OS page cache.

* The index takes up a surprising amount of space. In most cases, the index still fits in memory -- but it could definitely be smaller.

* Couple of implementation issues that are being addressed by the BDB-Je team and aren't really Java specific. I won't go into them.

That said there are many pluses to BerkeleyDB JE:

* Log-structured B+Tree. Awesome for use with SSDs. SSDs also make the penalty of going outside the in-memory cache much less costly (a random seek for reads is 0.2 ms), and due to the log-structured design, random writes -- not very efficient on SSDs -- don't happen. Additionally, the log-structured design means less wear on an SSD. That said, I'd still strongly suggest using an SSD even with a conventional B+Tree (e.g., MySQL InnoDB) -- but benchmark your application first.

* Avoids overhead of JNI

* As I've mentioned in my comment, being on the JVM means you can spend a lot more time thinking about concurrency. The locking/latching design in BDB JE is very well made.



