
This seems pretty reasonable and matches my suspicions. It's not hard for me to believe that CUDA has a lot of momentum behind it, not just in users but in optimization and development. And thanks, I'll look more at Octo. As for Modular, aren't they CPU-only right now? I'm not impressed by their results; their edge over PyTorch isn't strong, especially when it comes to scaling. A big reason this is surprising to me is simply how much faster numpy functions are than torch. Just speed-test np.sqrt(np.random.random((256, 1024))) vs torch.sqrt(torch.rand(256, 1024)). Hell, np.sqrt(x) on a Python scalar is also a lot slower than math.sqrt(x). It just seems like there's a lot of room for optimization, but I'm sure there are costs.
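(If anyone wants to reproduce the comparison, here's a rough sketch; exact numbers will obviously vary with hardware, array size, and library versions, and this only measures eager-mode elementwise ops.)

    import math
    import timeit

    import numpy as np
    import torch

    x = np.random.random((256, 1024))   # numpy array, shape (256, 1024)
    t = torch.rand(256, 1024)           # torch tensor, same shape

    # Elementwise sqrt on a mid-sized array vs. tensor.
    print("np.sqrt   :", timeit.timeit(lambda: np.sqrt(x), number=1000))
    print("torch.sqrt:", timeit.timeit(lambda: torch.sqrt(t), number=1000))

    # Scalar case: math.sqrt skips numpy's per-call dispatch overhead.
    s = 2.0
    print("np.sqrt(scalar)  :", timeit.timeit(lambda: np.sqrt(s), number=100_000))
    print("math.sqrt(scalar):", timeit.timeit(lambda: math.sqrt(s), number=100_000))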

When we're presented with problems where the two potential answers are "it's a lot harder than it looks" and "the people working on it are idiots," I tend to lean towards the former. But hey, when it is the latter, there's usually a good market opportunity. I've just found that ___domain expertise is seeing the nuance you miss when looking from 10k ft.




First you have to figure out what problem to attack. Research, training production models, and production inference all have very different needs on the software side. Then you have to work out what the decision tree is for your customers (which depends on who you are in this equation) and how you can solve some important problem for them. In all of this, for, say, training a big transformer, numpy isn't going to help you much, so it doesn't matter if it's faster for some small cases.

If you want to support a lot of model flexibility (for research and maybe training), then you need some combination of hand-writing chip-specific kernels and building a compiler that can do some or most of that automatically. Behind that door is a whole world of hardware-specific scheduling models, polyhedral optimization, horizontal and vertical fusion, sparsity, etc, etc, etc. It's a big and sustained engineering effort, not within the reach of hobby developers, so you come back to the question of who is paying for all this work and why.

Nvidia has clarity there and some answers that are working. Historically AMD has operated on the theory that deep learning is too early/small to matter, and that for big HPC deployments they can hand-craft whatever tools they need for those specific contracts (this is why ROCm seems so broken for normal people). Google built TensorFlow, XLA, Jax, etc. for their own workloads, and the priorities reflect that (e.g. TPU support). For a long time the great majority of inference workloads were on Intel CPUs, so their software then reflected that. Not sure what tiny corp's bet here is going to be.
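(To give a concrete taste of what "vertical fusion" means in practice, here's a minimal sketch assuming PyTorch 2.x: torch.compile hands the function to a compiler backend that can fuse a chain of elementwise ops into fewer kernels instead of materializing every intermediate. Purely illustrative, not what any particular vendor ships.)

    import torch

    def f(x):
        # In eager mode each op launches its own kernel and writes an intermediate.
        a = torch.sin(x)
        b = a * a
        return torch.sqrt(b + 1.0)

    # torch.compile traces f; its backend can fuse the elementwise chain
    # into one (or a few) kernels -- the "vertical fusion" mentioned above.
    f_compiled = torch.compile(f)

    x = torch.randn(4096, 4096, device="cuda" if torch.cuda.is_available() else "cpu")
    _ = f_compiled(x)  # first call pays compile time; later calls reuse the fused kernels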

The change in the landscape I see now is that the models are big enough and useful enough that the commercial appetite for inference is expanding rapidly, hardware supply will continue to be constrained, and so tools that can reduce production inference cost by a percentage are starting to become a straightforward sale (and thus justify the infrastructure investment). This is not based on any inside info, but when I look at companies like Modular and Octo, that's a big part of why I think they probably will have some success.



