WebSocket benchmarks (github.com/ericmoritz)
67 points by gmcabrita on June 13, 2012 | 42 comments



"I expected Go to kick Erlang's ass in the performance department but the message latency was much higher than Erlang's latency and we had 225 unhappy customers."

Go is neat and I hope to see it succeed and thrive, but it is simply unavoidable that Erlang has existed for much longer, has been getting tuned for much longer, and this sort of thing is its monomaniacal focus whereas Go is spread a bit more thinly at the moment. Go is trying to be a systems language, Erlang very much isn't.


What was surprising was how much faster the Java implementation was than Go. I expected it to be competitive but I didn't expect it to perform much better.


His reporting is very misleading. If you look at the number of messages and connections, Java only held ~5k connections, while Go held just shy of ~10k. In the same amount of time, Java was only able to facilitate the transmission of half the number of messages, so... it depends what you mean by "fast".


Interestingly, Go's "connection time" was by far the lowest.

He also tested with m1.medium instances, which are single cpu. No mention if the instances were 64bit or 32bit (this may matter for Go, as the GC has some issues under 32bit currently).

Tests are hard. Still nice to see a real-world-like comparison between some popular stacks.

Edit: discussion on go-nuts: https://groups.google.com/group/golang-nuts/browse_thread/th...


I used the m1.medium instance on 64bit Ubuntu 12.04. I will update the results.md to note that.

I will also rerun all the tests on 2 core 64bit instances tonight.


He's using io.Copy in the main websocket handler, which is a generic routine that shuttles bytes between a Reader and a Writer through its own intermediate buffer. I don't know if the other implementations are buffered in this manner. It's likely that there are ways to improve the Go source.
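For reference, a minimal sketch of the two approaches, using today's golang.org/x/net/websocket import path (in 2012 the package lived under go.net). The io.Copy handler matches what's described above; the explicit loop is just one possible alternative, not the benchmark's actual code:

    package main

    import (
        "io"
        "net/http"

        "golang.org/x/net/websocket"
    )

    // echoCopy echoes through io.Copy, which moves bytes between the read
    // and write sides of the connection via its own intermediate buffer
    // (32 KiB by default).
    func echoCopy(ws *websocket.Conn) {
        io.Copy(ws, ws)
    }

    // echoLoop reuses one small per-connection buffer instead. Fine for the
    // benchmark's tiny timestamp records; a larger message would get split
    // across frames.
    func echoLoop(ws *websocket.Conn) {
        buf := make([]byte, 1024)
        for {
            n, err := ws.Read(buf)
            if err != nil {
                return
            }
            if _, err := ws.Write(buf[:n]); err != nil {
                return
            }
        }
    }

    func main() {
        http.Handle("/copy", websocket.Handler(echoCopy))
        http.Handle("/loop", websocket.Handler(echoLoop))
        http.ListenAndServe(":8080", nil)
    }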

Erlang is significantly more established, but I'm wary of judging technologies on their technological merits alone. Although the technical merits of Erlang are very interesting, I've been quite happy with my experiences using Go.


Feel free to fix the Go version and submit a pull request. My implementation is just copy-pasta from the net.


That big difference can't be right. There's something wrong somewhere, possibly in Go's websockets implementation (which I think hasn't been updated in a while).


To you and zemo's point, those are specific instantiations of the general point I'm making. I'm sure Go will get there, as long as it stays alive, but it shouldn't be a surprise that Erlang is more polished at the moment. That's not a criticism of Go per se, it's just an effect of where they are in their lifespans. Erlang is basically mature and just polishing, and Go's still a rambunctious early teen with a wide-open future.


I agree with this, however I'm specifically pointing out that this is not about the polish of Go in a general sense -- while there are a lot of things to improve in the compiler, etc., in this instance there's something wrong somewhere, perhaps in the websockets library. It doesn't mean that there's something wrong with Go per se. Note that websockets is not included in the standard library yet.


When publishing language performance results, it's good practice to explain why one language is faster than the other. In other words, profile and figure out the bottleneck.

If you don't know why, then it's better not to post them until you figure out the bottlenecks. There could be factors in your environment that you aren't aware of which affect one language and not the other. Publishing such numbers will confuse people.
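For the Go server, one concrete way to follow that advice is the standard net/http/pprof package. A minimal sketch (the side port and profile duration are arbitrary choices, not anything from the benchmark):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
    )

    func main() {
        // Expose profiles on a side port while the benchmark runs, then pull
        // a 30-second CPU profile with:
        //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }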


My guess would be that the poorly performing implementations are using the naive select() solution, instead of one of the more scalable mechanisms like epoll or kqueue.

Pretty surprised to see node.js behave so poorly. We recently had discussions where I work about what server solution we should use for a multiplayer platform. Since I had recently built a socket/connection-handling server in C, they asked what I thought about node.js; my only concern was that if the underlying code uses select(), it's going to be a poor choice... but I don't think anyone got around to testing it.

Search for "C10K" for more information.


I did a bunch of these benchmarks a while back and the most important factor in a performance test like this is the VM/Language environment. Simply calling into a VM on every IO request will rapidly destroy performance no matter what your IO setup looks like.

Erlang's IO stack is just fantastic for this and is generally very well optimized. If you want to go even faster you can code the fast path in C/C++ and call out to a higher-level language as needed. A little dated but helpful: http://www.metabrew.com/article/a-million-user-comet-applica...

I think that Webbit is also pretty good; this benchmark is a little contrived.


The poller is a very important factor, but there are other things to consider as well, depending on how fast you want to go. Sometimes 'fastest' is not what you should be looking at, though.

Solutions like node.js, Python, etc. copy strings and buffers all the time, and this can slow down the server considerably. They don't have an honest iov layer that avoids copying data before passing it to writev, and you cannot manage your own memory or create memory pools. These factors can have a bigger impact on performance than selecting the right poller.
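As a point of comparison, later versions of Go added exactly this kind of layer: net.Buffers hands multiple slices to a single writev call on platforms that support it, so the header and payload never get concatenated in user space first. A minimal sketch (the 2-byte header is just an illustrative unmasked server-to-client text frame header, FIN + text opcode, length 5; not code from the benchmark):

    package main

    import (
        "fmt"
        "io"
        "net"
    )

    // sendFrame passes both slices to the kernel in one writev(2) via
    // net.Buffers; no temporary header+payload buffer is allocated.
    func sendFrame(conn net.Conn, header, payload []byte) error {
        bufs := net.Buffers{header, payload}
        _, err := bufs.WriteTo(conn)
        return err
    }

    func main() {
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            panic(err)
        }
        done := make(chan struct{})
        go func() {
            defer close(done)
            c, err := ln.Accept()
            if err != nil {
                return
            }
            b, _ := io.ReadAll(c)
            fmt.Printf("server received % x\n", b)
        }()

        conn, err := net.Dial("tcp", ln.Addr().String())
        if err != nil {
            panic(err)
        }
        if err := sendFrame(conn, []byte{0x81, 0x05}, []byte("hello")); err != nil {
            panic(err)
        }
        conn.Close()
        <-done
    }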


That doesn't sound right to me. There are 10k clients, each sending (and the server echoing) one small timestamp record per second. An AWS medium instance has access to half a DRAM pipe (frankly this test should fit entirely in L3 cache, but let's be conservative). The back of my envelope says that the RAM bandwidth can handle about 100 KB of data copying per send or receive event. That's a staggeringly large amount of copying, literally thousands of times larger than the size of the record.
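Spelling that envelope out (the 10k clients and one echoed message per second come from the test; the ~2 GB/s of usable bandwidth for half a DRAM channel is my own rough assumption):

    10,000 clients x 1 msg/s x 2 (client send + server echo) = 20,000 I/O events/s
    ~2 GB/s RAM bandwidth / 20,000 events/s                  = ~100 KB of copying budget per event
    ~100 KB / a ~30-byte timestamp record                    = roughly 3,000x the record size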

Honestly I think CPU or syscall overhead is a more likely culprit.


You'd think.

Write a web server in C and a web server in Python/Erlang/Go/node.js. You'll notice that the one written in C (or C++) runs twice as fast. If it were a syscall bottleneck, they'd achieve the same performance.

Zero-copy request handling is not possible in high-level languages without hacking.

For a quick test, run httperf against the dumbest node.js web server that returns 204.

Then run the same test against an nginx server that does nothing but return a 204:

    location / { return 204; }

Compare the difference.


The Python version uses Gevent so likely uses epoll/kqueue for this test. It'd be interesting to profile and see where the issues are in the Node.js and Python code.


This node.js version is most certainly faster https://github.com/Weltschmerz/wsdemo/blob/master/competitio...

Wondering how much!


Hm, what performance impact does using arguments have?


Using .apply({}, arguments) is actually a bit slower, but either method is so fast that it won't be a bottleneck. I use it in this case for elegance, because I can. Note that in your original implementation this is impossible, because you have to access the "type" variable and use one of two methods (sendUTF or sendBytes). `ws` has only one method (send) and the parameters of this method match the arguments to the callback for on('message').


Not surprised to see Erlang come out on top. As I understand it, this is precisely the type of problem Erlang is good at - doing lots of little lightweight things with minimal overhead.


Erlang is quietly doing heavy lifting all over the place: Amazon's SDB, Facebook Chat, various high-speed trading implementations, ejabberd, CouchDB, RabbitMQ... and far more places quietly and privately.

The good news is, it finally seems like other languages are starting to tool out the way Erlang has (Akka, ZMQ, etc).


Python network programs, especially ones with many connections, should use Twisted.

A benchmark of anything but the best library/tools in each language is pointless.


One of the good Twisted WebSocket libraries is called txWS: https://github.com/MostAwesomeDude/txWS

It wraps an existing TCP factory, and makes it work with WebSocket.

The entire API is a single function.


Excellent. I wanted to write a Twisted implementation. Care to take a crack at it and submit a pull request?


Should I have time, I'll think about it.


I can implement the server. Is there anything I should know about tuning Twisted for this kind of benchmark?


If you haven't yet, hit up #twisted on freenode for any tips you need. Lots of heavy hitters there who are happy to help.


I don't know much about tuning Twisted, sorry, but maybe experiment with different reactors, like the libevent one?


The Python ws benchmark uses gevent with ws4py, which is also asynchronous -- do you expect Twisted to outperform gevent? Why?


For trivial cases gevent might be fast, but its design is simply flawed; it doesn't handle all the intricacies of networks.

Regardless, using a single library, and not even the mainstream one, makes for a poor benchmark.


Can you elaborate on what you think is wrong with Gevent? It comes out pretty well in this shootout: http://nichol.as/benchmark-of-python-web-servers


A common case where it goes very wrong is disconnection logic. It's rather complex and gevent has a very naive view of it, which means you can get stuck greenlets.

Micro-benchmarks are rarely complex enough to expose design problems; my point was that one should at least use the de facto standard, if not several alternatives too.


Why not gevent or eventlet?


This initial release is not meant to be definitive. I welcome pull requests for better implementations in your language/platform of choice and any suggestion to better tune Linux for these tests.



Pretty neat. I'm going to fork it and throw it against my C# implementation (http://www.alchemywebsockets.net) on mono and see how it compares.


Would like to see that; please post it.


Looking at https://github.com/ericmoritz/wsdemo it seems he's using his own Erlang ws implementation and a third-party library for all the others?

If this is correct, then it's no big surprise that he does well in his own benchmark that he's developing against.


No. He is using Cowboy[1], which is the mainstream Erlang websocket implementation (uh... Cowboy is actually a socket acceptor pool which happens to have awesome HTTP and WS handlers).

[1]: https://github.com/ericmoritz/wsdemo/blob/master/src/wsdemo.... [2]: https://github.com/extend/cowboy


Did I miss something, or does the methodology lack a warm-up step?

What I mean is, I'd expect a first run of 5 minutes to warm up the servers, then another one to actually get the data.

Otherwise, while averages will be mostly the same, numbers like the dropped connections _could_ be misleading.
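Something like this, as a rough sketch of the idea (a hypothetical harness, not the actual wsdemo code): drive the load, let it ramp up, then zero the counters so connections dropped during warm-up never reach the report.

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    // counters collected by the (omitted) websocket client goroutines.
    type stats struct {
        messages int64
        dropped  int64
    }

    func (s *stats) reset() {
        atomic.StoreInt64(&s.messages, 0)
        atomic.StoreInt64(&s.dropped, 0)
    }

    func main() {
        s := &stats{}
        // ... start the websocket clients here; they increment s.messages
        // and s.dropped via atomic.AddInt64 (omitted) ...

        time.Sleep(5 * time.Minute) // warm-up run, results discarded
        s.reset()
        time.Sleep(5 * time.Minute) // measured run
        fmt.Printf("messages=%d dropped=%d\n",
            atomic.LoadInt64(&s.messages), atomic.LoadInt64(&s.dropped))
    }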


With websocket connections, provided the latency is low enough, memory usage is a bigger concern.




