That has been the story of every dynamic language since forever; thankfully, the whole AI focus has finally made JITs matter in the CPython world as well.
Personally, I learned this lesson back in the 2000s, in the age of AOLServer, Vignette, and our own Safelayer product. All based on Apache, IIS, and Tcl.
We were early adopters of .NET, back when it was only available to MSFT Partners, and never again used scripting languages without compilers for full-blown applications.
Those learnings are the foundation of OutSystems: the same ideas, built with a powerful runtime, with the hindsight of our experiences.
The push for Python performance and JIT compilation has little to do with AI and more to do with Python's explosion in adoption for backend server applications in the 2010s, as well as the dedication of smaller projects like PyPy that existed largely because it was possible to make them exist. The ML/AI boom helped spread Python even farther and wider, yes, but none of the core language performance improvements are all that relevant for ML or AI.
As another commenter pointed out, the performance bottlenecks in AI specifically have essentially nothing to do with CPython runtime performance. The only exception is the pre-processing of very large text corpora, and that alone has hardly been a blip on the radar of the people working on CPython performance.
Moreover, most of the "Python performance" projects that do sit closer to machine learning use cases (Cython-NumPy integration, Numba, Nuitka) are more or less orthogonal to the more recent push for Python interpreter performance.
Cython itself and MypyC are mainly relevant because they are intended to be general-ish purpose performance boosters for CPython, and in doing so helped fill the need for greater performance in "hot and loopy" code such as network protocols, linters, and iterators. Cython also acted as a convenient glue layer for ad-hoc C library binding. But neither project is all that closely related to AI or to the various JIT compilers that have arisen over the years.
Not at all, given Facebook's and Microsoft's involvement in making the CPython folks finally accept that a JIT has to be part of the story, coupled with NVIDIA's and Intel's work on GPU JIT DSLs for Python.
Yeah, but how much of the Microsoft and Facebook effort was due to AI directly, as opposed to the general popularity of Python? The latter is undoubtedly driven by AI nowadays, but indirectly.
> Personally, I learned this lesson back in the 2000s, in the age of AOLServer, Vignette, and our own Safelayer product. All based on Apache, IIS, and Tcl.
Woah, your mention of “Vignette” just brought back a flood of memories I think my subconscious may have blocked out to save my sanity.
The C/C++ code is shipped in the form of well-established libraries like NumPy and PyTorch. Very few end users ever interact with the C/C++ parts, except for specialists with special requirements and the library contributors themselves.
Can you name specific "un-fashionable" AI projects that are dependent on Python code for things that have any significant performance impact, which are seeing significant benefits from Python JIT implementations?
What's a scripting language? Also, I'm not sure about Tcl (https://news.ycombinator.com/item?id=24390937 claims it's had a bytecode compiler since around 2000), but the main Python and Ruby implementations do have compilers (they compile to bytecode, then interpret the bytecode). Apparently Ruby recently got an optional (has to be enabled) JIT compiler, and Python has an experimental JIT in the latest release (3.13).
"... the distinguishing feature of interpreted languages is not that they are not compiled, but that any eventual compiler is part of the language runtime and that, therefore, it is possible (and easy) to execute code generated on the fly."
No, I worked with the founders at a previous startup, Intervento, which was acquired by EasyPhone, which was later renamed Altitude Software alongside other acquisitions.
They eventually left and founded OutSystems with what we had learned since the Intervento days. OutSystems is one of the greatest startup stories in the Portuguese industry.
This was all during the dotcom wave of the 2000s; I instead left for CERN.
During their Black Friday / Cyber Monday load peak, Shopify averaged between ~0.85 and ~1.94 back-to-back RPS per CPU core. Take from that what you will.
You seem to imply that everything they run is Ruby, but they're talking about 2.4 million CPU cores on their K8s cluster, where other stuff presumably runs as well, like their Kafka clusters [1] and Airflow [2]?
Obviously you meant the whole infrastructure: Ruby/Rails workers, MySQL, Kafka, whatever other stuff their app needs (Redis, memcache, etc.), load balancers, infrastructure monitoring, and so on.
Just to reiterate what was said in the other comments, because your comment perhaps deliberately misrepresents what was said in the thread.
Their entire cluster was 2.4 million CPU cores (with no more info on what those cores were). This includes not only the Ruby web applications that handle requests, but also other infrastructure: asynchronous processing, database servers, message queue processing, data workflows, etc., etc. You cannot run a back-of-the-envelope calculation, arrive at 0.85 requests per second per core, and conclude that this is why they're optimising Ruby. While that might be the end result, and a commentary on contemporary software architecture as a whole, it does not tell you much about the performance of the Ruby part of the equation in isolation.
They had bursts of 280 million rpm (4.6 million rps) with an average of 2.8 million rps; dividing the burst figure by the 2.4 million cores is where the ~1.94 rps per core quoted above comes from.
> It does not tell you much about the performance of the Ruby part of the equation in isolation.
Indeed, it doesn't. However, it would be a fairly safe bet to assume it was the slowest part of their architecture. I keep wondering how the numbers would change if Ruby were to be replaced with something else.
Shopify invests heavily in Ruby and writes plenty of stuff in lower-level languages where they need to squeeze out performance. They were heavily involved in Ruby's new JIT architecture and invested in building their own tooling to try to make Ruby act more like a static language (Sorbet, Bootsnap).
Runtime performance is just one part of a complex equation in a tech stack. It's actually a safe bet that their Ruby stack is pretty fucking solid, because they've invested in it, and hiring Ruby and JS engineers is still 1000x easier than hiring a C++ or Rust expert to do basic CRUD APIs.
Since we're insinuating, I bet you that Ruby is not their chief bottleneck. You won't get much more RPS if you wait on an SQL query or RPC/HTTP API call.
In my experience, when you have a bottleneck in the actual Ruby code (not talking about n+1s or heavy SQL queries or other IO), the code itself is written in such a way that it would be slow in whatever language. Again, in my experience this involves lots of (often unnecessary) allocations and slow data transformations.
Usually this is preceded by a slow, heavy SQL query. You fix the query, get a speed-up from 0.8 rps to 40 rps, add a TODO entry saying "the following code needs to be refactored", but you've already run out of estimate, so you mark the issue as resolved. A couple of months later, the optimization has allowed the result set to grow, and the new bottleneck is memory use and the naive algorithm and lack of appropriate data structures in the data transformation step... again, in the same code you diligently TODOed... Tell me how this is Ruby's fault.
Another example is the classic 'oh, we'll just introduce a Redis-backed cache to finally make use of shared caching and alleviate the DB bottleneck'. Implementation and validation took weeks. Finally all tests were green, but the test suite ran for half an hour longer. The issue was traced to latency to the Redis server and starvation due to locking between parallel workers. The task was quietly shelved afterwards, without ever hitting production or being mentioned again, in a prime example of learned helplessness. If only we had used an actual real programming language and not Ruby, we would not be hitting this issue (/s)
I wish most performance problems would be solved by just using a """fast language"""...
Effective use of IO at such scale implies a high-quality DB driver accompanied by a performant concurrent runtime that can multiplex many outstanding IO requests over a few threads in parallel. This is significantly influenced by the language of choice and the particular patterns it and its libraries encourage.
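For illustration, a minimal sketch of that multiplexing in Ruby using the async gem (one fiber-based option among several; the sleep stands in for a hypothetical DB round trip):

    require "async"   # fiber-based concurrency; gem install async

    # Five "queries" issued concurrently on one thread: each fiber parks
    # while waiting, so the reactor multiplexes the outstanding IO.
    Async do |task|
      pending = (1..5).map do |i|
        task.async do
          sleep(0.1)           # non-blocking here, thanks to the fiber scheduler
          "result #{i}"
        end
      end
      p pending.map(&:wait)    # completes in ~0.1s total, not ~0.5s
    end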
I can assure you - databases like MySQL are plenty fast and e.g. single-row queries are more than likely to be bottlenecked on Ruby's end.
> the code itself is written in such a way that it would be slow in whichever language. Again, in my experience this involves lots of (oft unnecessary) allocations and slow data transformations.
Inefficient data transformations with a high number of transient allocations will run at least 10 times faster in many of Ruby's alternatives. Good ORM implementations will also be able to optimize the queries, or their API is likely to encourage more performance-friendly choices.
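As a sketch of the transient-allocation churn being described (rows is a hypothetical array of comma-separated strings):

    # Each chained step materializes a whole intermediate array.
    ids = rows.map { |r| r.split(",") }
              .select { |f| f[2] == "US" }
              .map { |f| f[0].to_i }

    # One pass, one output array, far fewer short-lived objects.
    ids = rows.each_with_object([]) do |r, out|
      f = r.split(",")
      out << f[0].to_i if f[2] == "US"
    end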
> I wish most performance problems would be solved by just using a """fast language"""...
Many testimonies on Rust do just that. A lot of it comes down to the particular choices Rust forces you to make. There is no free lunch or magic bullet, but this also carries over to languages which offer more productivity by way of less decision fatigue: defaults that might not be as performant in a particular scenario, but at the same time don't sacrifice performance drastically either.
You know, if I were flame-baiting, I would go ahead and say 'there goes the standard "performance is more important than actually shipping" comment'. I won't, and I will address your notes even though they're unsubstantiated.
> Effective use of IO at such scale implies a high-quality DB driver accompanied by a performant concurrent runtime that can multiplex many outstanding IO requests over a few threads in parallel. This is significantly influenced by the language of choice and the particular patterns it and its libraries encourage.
In my experience, the bottleneck is mostly on the 'far side' of the IO from the app's PoV.
> I can assure you - databases like MySQL are plenty fast and e.g. single-row queries are more than likely to be bottlenecked on Ruby's end.
I can assure you, Ruby apps have no issues whatsoever with single-row queries. Even if they did, the speed-up from a faster language would be at most a constant factor.
> Inefficient data transformations with a high number of transient allocations will run at least 10 times faster in many of Ruby's alternatives. Good ORM implementations will also be able to optimize the queries, or their API is likely to encourage more performance-friendly choices.
Or it could be O(n^2) times faster if you actually stop writing shit code in the first place.
Good ORMs do not magically fix shit algorithms or DB schema design. Rails' ORM does in fact point out common mistakes like trivial n+1 queries. It does not ask you: "Are you sure you want me to execute this query that seq-scans the ever-growing-but-currently-20-million-record table to return 5000 records as part of your artisanal hand-crafted n+1 masterpiece (of shit), for you to then manually cross-reference and transform it, and finally serialise it as JSON, just to go ahead and blame the JSON lib (which is in C, btw) for the slowness".
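For anyone who hasn't hit it, a sketch of the n+1 shape in question (hypothetical Post/Author models; includes is ActiveRecord's eager-loading API):

    # Naive loop: 1 query for the posts + N queries for the authors.
    Post.limit(50).each { |post| puts post.author.name }

    # Eager loading: two queries total, regardless of N.
    Post.includes(:author).limit(50).each { |post| puts post.author.name }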
> Many testimonies on Rust do just that. A lot of it comes down to the particular choices Rust forces you to make. There is no free lunch or magic bullet, but this also carries over to languages which offer more productivity by way of less decision fatigue: defaults that might not be as performant in a particular scenario, but at the same time don't sacrifice performance drastically either.
I am by no means going to dunk on Rust the way you do on Ruby, as I've only toyed with it; however, I doubt that I could right now make the performance/productivity trade-off come out in Rust's favour for any new non-trivial web application.
To summarise, my points were: whatever language you write in, if you have IO, you will sooner or later be bottlenecked by IO, and that is the best case. The realistic case is that you will never scale enough for any of this to matter. Even if you do, you will be bottlenecked by your own shit code and/or shit architectural decisions long before the IO; both of these are also language-agnostic.
Just-in-time compilation of Ruby lets you elide a lot of the overhead of dynamic language features and execute optimized machine code instead of running in the VM's bytecode interpreter.
For example, doing some loop unrolling for a piece of code with a known and small enough fixed iteration count. As another example, doing away with some dynamic dispatch / method lookup at a call site, or inlining methods, which is especially handy given Ruby's first-class support for dynamic code generation, execution, and redefinition (monkey patching).
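To illustrate (a made-up example, not from YJIT's docs): if a call site only ever sees one class, the JIT can skip the dynamic lookup and inline a trivial method like this outright:

    class Item
      def shippable?
        true   # trivial constant-returning method: an ideal inlining target
      end
    end

    # If every element is an Item, this call site stays monomorphic and the
    # JIT can replace method dispatch with a direct (or inlined) call.
    def count_shippable(items)
      items.count(&:shippable?)
    end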
> In particular, YJIT is now able to better handle calls with splats as well as optional parameters, it’s able to compile exception handlers, and it can handle megamorphic call sites and instance variable accesses without falling back to the interpreter.
> We’ve also implemented specialized inlined primitives for certain core method calls such as Integer#!=, String#!=, Kernel#block_given?, Kernel#is_a?, Kernel#instance_of?, Module#===, and more. It also inlines trivial Ruby methods that only return a constant value such as #blank? and specialized #present? from Rails. These can now be used without needing to perform expensive method calls in most cases.
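Concretely, that covers code like this hypothetical hot path (assumes ActiveSupport is loaded for #blank?); these calls can now be specialized or inlined rather than going through full method dispatch:

    def display_name(value)
      return "(none)" if value.blank?              # NilClass#blank? just returns true
      value.is_a?(String) ? value : value.inspect  # Kernel#is_a? is a specialized primitive
    end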
It makes Ruby code faster than the equivalent C code inside CRuby (the JIT can't optimize across calls into C), so they are moving toward rewriting a lot of the core Ruby internals in Ruby to take advantage of it. That kind of runtime performance work makes the language much faster.
Same as the benefits of JIT compilers for any dynamic language: it makes a lot of things faster without you changing your code, by turning hot paths into natively compiled code.
That's certainly not what I get out of what they said.
Shopify has introduced a bunch of very nice improvements to the usability of the Ruby language, and their contributions have been seen in a very positive light.
Also, I'm pretty sure Shopify's work on Ruby and Facebook's work on their custom PHP stuff are both considered good moves.
If I cannot refactor my services, I shall refactor Ruby instead.