Until GPT-4.5, GPT-4 32K was arguably the heaviest model OpenAI offered. I can imagine the dilemma between keeping it running and shutting it down to free up GPUs for training new models. This time, OpenAI was clear about whether it would continue serving the model in the API long-term.
Don't they use different hardware for inference and training? AIUI the former is usually done on cheaper GDDR cards and the latter is done on expensive HBM cards.