Hacker News

The huge energy costs of GPU clusters, both building and operating them, are amortized over all subsequent uses of the models, i.e. inference. Inference itself is cheap per query. Your use cases aren't consuming that much energy; I'd say the amount is in the same ballpark as what you'd use doing those things yourself.
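The amortization argument can be made concrete with a back-of-envelope calculation. All figures below are illustrative assumptions, not measured values for any real cluster or model:

```python
# Back-of-envelope energy amortization: hypothetical numbers for illustration only.
TRAINING_ENERGY_KWH = 10_000_000      # assumed one-time training energy cost
LIFETIME_QUERIES = 100_000_000_000    # assumed total queries served over the model's life
ENERGY_PER_QUERY_KWH = 0.0003         # assumed marginal inference energy per query

# Spread the one-time training cost across every query the model ever serves.
amortized_training = TRAINING_ENERGY_KWH / LIFETIME_QUERIES
total_per_query = amortized_training + ENERGY_PER_QUERY_KWH

print(f"training share per query: {amortized_training:.7f} kWh")
print(f"total per query:          {total_per_query:.7f} kWh")
```

Under these assumptions the training share works out to a small fraction of the marginal inference cost per query, which is the sense in which the up-front cost "disappears" at large scale.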

As for whether it's worth it, I argue this is the single most useful application of GPUs right now, both economically and in terms of non-monetary value delivered to users.

(And training them is, IMO, by far the most valuable and interesting use of almost all creative works available online, but that's another discussion.)


