Just to reiterate what was said in the other comments, because your comment is perhaps deliberately misrepresenting what was said in the thread.
Their entire cluster was 2.4 million CPU cores (with no further info on what those cores were). That includes not only the Ruby web applications that handle requests, but all the other infrastructure too: asynchronous processing, database servers, message queue processing, data workflows, etc. You cannot run a back-of-the-envelope calculation, get ~0.85 cores per request (2.4M cores / 2.8M rps average, i.e. ~1.2 requests per second per core), and say that is why they're optimising Ruby. While that ratio might be the end result, and a commentary on contemporary software architecture as a whole, it does not tell you much about the performance of the Ruby part of the equation in isolation.
They had bursts of 280 million rpm (~4.7 million rps) with an average of 2.8 million rps.
> It does not tell you much about the performance of the Ruby part of the equation in isolation.
Indeed, it doesn't. However, it's a fairly safe bet that Ruby was the slowest part of their architecture. I keep wondering how the numbers would change if it were replaced with something else.
Shopify invest heavily in Ruby and write plenty of stuff in lower-level languages where they need to squeeze out that performance. They built Ruby's new JIT (YJIT) and have invested in tooling to make Ruby feel more like a static language (Sorbet for gradual typing) and to speed it up elsewhere (Bootsnap for boot-time caching).
Runtime performance is just one part of a complex equation in a tech stack. It's actually a safe bet that their Ruby stack is pretty fucking solid, because they've invested in exactly that, and hiring Ruby and JS engineers is still 1000x easier than hiring a C++ or Rust expert to write basic CRUD APIs.
Since we're insinuating: I bet you that Ruby is not their chief bottleneck. You won't get much more RPS out of a faster language while you're waiting on an SQL query or an RPC/HTTP API call.
In my experience, when you have a bottleneck in the actual Ruby code (not talking about n+1s, heavy SQL queries, or other IO), the code itself is written in such a way that it would be slow in whatever language: lots of (often unnecessary) allocations and slow data transformations.
Usually this is preceded by a slow, heavy SQL query. You fix the query, go from 0.8 rps to 40 rps, add a TODO saying "the following code needs to be refactored", but you've already blown through the estimate, so you mark the issue as resolved. A couple of months later the optimisation has let the result set grow, and the new bottleneck is memory use plus the naive algorithm and the lack of appropriate data structures in the data transformation step... in the very code you diligently TODOed. Tell me how this is Ruby's fault.
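To make the "naive algorithm, wrong data structure" failure mode concrete, here's a minimal sketch (hypothetical data, not any real codebase):

    # Naive cross-referencing: one linear scan of customers per order,
    # O(n*m) comparisons plus a pile of transient objects.
    def slow_join(orders, customers)
      orders.map do |order|
        customer = customers.find { |c| c[:id] == order[:customer_id] }
        order.merge(customer_name: customer[:name])
      end
    end

    # Same logic with the right data structure: index once, O(1) lookups.
    # Note the fix is identical in any language; nothing Ruby-specific here.
    def fast_join(orders, customers)
      names_by_id = customers.to_h { |c| [c[:id], c[:name]] }
      orders.map { |o| o.merge(customer_name: names_by_id[o[:customer_id]]) }
    end

At 5,000 orders against 20,000 customers that's ~100 million comparisons versus ~25,000 hash operations, and the gap keeps widening as the result set grows, which is exactly the "couple of months later" story.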
Another example: "Oh, we'll just introduce a Redis-backed cache to finally make use of shared caching and alleviate the DB bottleneck." Implementation and validation took weeks. Finally all the tests were green... and the test suite ran for half an hour longer. The issue was traced to latency to the Redis server and starvation due to locking between parallel workers. The task was quietly shelved, without ever hitting production or being mentioned again, in a prime example of learned helplessness. If only we had used an actual real programming language and not Ruby, we would not have hit this issue (/s)
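The shape of that failure mode, as a hedged illustration (not the actual code; shown with threads and a Mutex for simplicity, though a distributed lock across worker processes has the same shape with network latency on top):

    require "redis"

    REDIS = Redis.new
    CACHE_LOCK = Mutex.new  # shared lock: every worker queues up right here

    def cached_fetch(key)
      CACHE_LOCK.synchronize do
        REDIS.get(key) || begin
          value = expensive_compute(key)  # hypothetical helper
          REDIS.set(key, value)
          value
        end
      end
    end

A couple of milliseconds of Redis round-trip per call, times hundreds of thousands of calls, all serialised behind one lock: there's your extra half hour, and no language swap fixes it.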
I wish most performance problems would be solved by just using a """fast language"""...
Effective use of IO at such scale implies a high-quality DB driver paired with a performant concurrent runtime that can multiplex many outstanding IO requests over a few threads in parallel. This is significantly influenced by the language of choice and the particular patterns its libraries encourage.
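For concreteness, "many outstanding requests over a few threads" looks something like this: a minimal sketch using Ruby 3's fiber scheduler via the async gem against a made-up endpoint. Whether your DB driver and HTTP libraries actually cooperate with such a scheduler is exactly the ecosystem point.

    require "async"      # gem install async; uses Ruby 3's fiber scheduler
    require "net/http"

    Async do |task|
      # 100 outstanding HTTP requests multiplexed on a single thread:
      # each blocking read parks its fiber instead of blocking the thread.
      responses = 100.times.map { |i|
        task.async { Net::HTTP.get(URI("https://api.example.com/items/#{i}")) }
      }.map(&:wait)
      puts responses.size
    end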
I can assure you - databases like MySQL are plenty fast and e.g. single-row queries are more than likely to be bottlenecked on Ruby's end.
> the code itself is written in such a way that it would be slow in whatever language: lots of (often unnecessary) allocations and slow data transformations.
Inefficient data transformations with a high volume of transient allocations will run at least 10 times faster in many of Ruby's alternatives. Good ORM implementations will also be able to optimize the queries, or their API is likely to encourage more performance-friendly choices.
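A sketch of what I mean by transient allocations (made-up CSV with a quantity in the third column):

    rows = File.readlines("data.csv")  # hypothetical input

    # Eager pipeline: every step materialises a full intermediate array
    # before the next step runs; three throwaway arrays for one pass of work.
    total = rows.map { |r| r.split(",") }
                .map { |f| f[2].to_i }
                .select { |qty| qty > 0 }
                .sum

    # Lazy pipeline: streams one row at a time, same result, no intermediates.
    total = rows.lazy
                .map { |r| r.split(",") }
                .map { |f| f[2].to_i }
                .select { |qty| qty > 0 }
                .sum

A runtime with value types or an escape-analysing compiler makes much of that cost disappear without the rewrite, which is part of where the 10x comes from.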
> I wish most performance problems would be solved by just using a """fast language"""...
Many testimonials about Rust say it does just that. A lot of it comes down to the particular choices Rust forces you to make. There is no free lunch or magic bullet, but the same applies to languages that buy productivity with low-decision-fatigue defaults: defaults that might not be as performant in a particular scenario, but don't sacrifice performance drastically either.
You know, if I were flame-baiting, I would go ahead and say "there goes the standard 'performance is more important than actually shipping' comment". I won't, and I'll address your notes even though they're unsubstantiated.
> Effective use of IO at such scale implies a high-quality DB driver paired with a performant concurrent runtime that can multiplex many outstanding IO requests over a few threads in parallel. This is significantly influenced by the language of choice and the particular patterns its libraries encourage.
In my experience, the bottleneck is mostly on the 'far side' of the IO from the app's PoV.
> I can assure you - databases like MySQL are plenty fast and e.g. single-row queries are more than likely to be bottlenecked on Ruby's end.
I can assure you, Ruby apps have no issues whatsoever with single-row queries. Even if they did, the speed-up from a faster language would be at most a constant factor.
> Inefficient data transformations with a high volume of transient allocations will run at least 10 times faster in many of Ruby's alternatives. Good ORM implementations will also be able to optimize the queries, or their API is likely to encourage more performance-friendly choices.
Or the speed-up could be asymptotic rather than constant (O(n²) down to O(n)) if you actually stop writing shit code in the first place.
Good ORMs do not magically fix shit algorithms or shit DB schema design. Rails' ORM does in fact point out common mistakes like trivial n+1 queries. What it does not do is ask you: "Are you sure you want me to execute this query that seq-scans the ever-growing-but-currently-20-million-row table to return 5000 records as part of your artisanal, hand-crafted n+1 masterpiece (of shit), so you can manually cross-reference and transform the results and then finally serialise them as JSON, just to go ahead and blame the JSON lib (which is in C, btw) for the slowness?"
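For the record, the nudge Rails gives you (hypothetical Post model with an author association; strict_loading is Rails 6.1+):

    # Opt a relation into strict loading and the lazy n+1 blows up in
    # development instead of quietly shipping:
    posts = Post.strict_loading.limit(5000)
    posts.each { |p| p.author.name }
    # => raises ActiveRecord::StrictLoadingViolationError

    # The fix it pushes you toward: two queries total, no n+1.
    posts = Post.includes(:author).limit(5000)
    posts.each { |p| p.author.name }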
> Many testimonials about Rust say it does just that. A lot of it comes down to the particular choices Rust forces you to make. There is no free lunch or magic bullet, but the same applies to languages that buy productivity with low-decision-fatigue defaults: defaults that might not be as performant in a particular scenario, but don't sacrifice performance drastically either.
I am by no means going to dunk on Rust the way you dunk on Ruby, since I've only toyed with it, but I doubt I could make the performance/productivity trade-off come out in Rust's favour right now for any new non-trivial web application.
To summarise, my points were these: whatever language you write in, if you do IO you will be bottlenecked by IO sooner or later, and that is the best case. The realistic case is that you will never scale enough for any of this to matter; and even if you do, you will be bottlenecked by your own shit code and/or shit architectural decisions long before the IO. Both of those are language-agnostic too.