
My guess would be that the poorly performing implementations are using the naive select() solution instead of one of the more scalable, event-based mechanisms like epoll or kqueue.

Pretty surprised to see node.js behave so poorly. We recently had discussions at work about which server solution to use for a multiplayer platform, and since I had recently built a socket/connection-handling server in C, they asked what I thought about node.js. My only concern was that if the underlying code uses select(), it's going to be a poor choice... but I don't think anyone got around to testing it.

Search for "C10K" for more information.




I did a bunch of these benchmarks a while back and the most important factor in a performance test like this is the VM/Language environment. Simply calling into a VM on every IO request will rapidly destroy performance no matter what your IO setup looks like.

Erlang's IO stack is just fantastic for this and is generally very well optimized. If you want to go even faster you can code the fast path in C/C++ and call out to a higher-level language as needed. A little dated but helpful: http://www.metabrew.com/article/a-million-user-comet-applica...

I think that Webbit is also pretty good; this benchmark is a little contrived.


The poller is a very important factor, but there are other things to consider as well, depending on how fast you want to go. Sometimes 'fastest' is not what you should look at, though.

Solutions like node.js, Python, etc. copy strings and buffers all the time, and this can slow down the server considerably. They don't have an honest iovec layer that avoids copying data before passing it to writev, and you cannot manage your own memory or create memory pools. These factors can have a bigger impact on performance than selecting the right poller.


That doesn't sound right to me. There are 10k clients, each sending (and the server echoing) one small timestamp record per second. An AWS medium instance has access to half a DRAM pipe (frankly this test should fit entirely in L3 cache, but let's be conservative). The back of my envelope says that the RAM bandwidth can handle about 100 KB of data copying per send or receive event. That's a staggeringly large amount of copying, literally thousands of times larger than the record itself.

Honestly I think CPU or syscall overhead is a more likely culprit.


You'd think.

Write a web server in C and a web server in python/erlang/go/node.js. You'll notice that the one written in C (or C++) runs twice as fast. If it was a syscall bottleneck then they'd achieve the same performance.

Zero-copy request handling is not possible in high-level languages without hacking.

For a quick test, run httperf against the dumbest node.js web server that returns 204.

Then run the same test against an nginx server that does nothing but return a 204:

___location / { return 204; }

Compare the difference.


The Python version uses Gevent so likely uses epoll/kqueue for this test. It'd be interesting to profile and see where the issues are in the Node.js and Python code.



