The biggest reason I'm not worried about prices going back up again is Llama. The Llama 3 models are really good, and because they are open weight, a growing number of API providers are competing to provide access to them.
These companies are incentivized to figure out fast and efficient hosting for the models. They don't need to train any models themselves; their value-add lies entirely in continuing to drive the price of inference down.
Groq and Cerebras are particularly interesting here because WOW they serve Llama fast.
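A nice practical consequence of all that competition: most of these hosts expose OpenAI-compatible endpoints, so trying a different provider is usually just a matter of swapping the base URL and model name. Here's a minimal sketch assuming Groq's OpenAI-compatible endpoint; the base URL and model identifier below are illustrative, so check your provider's docs for current values.

```python
# Minimal sketch: calling a hosted Llama model through an OpenAI-compatible API.
# Assumes Groq's endpoint and an example model name; both are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # swap this to try a different host
    api_key="YOUR_PROVIDER_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # provider-specific model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the client code barely changes between providers, it's easy to comparison-shop on price and speed, which is exactly the competitive pressure that keeps inference cheap.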