>>>How does it perform on a 3090, a 4090, or less? Are we mere mortals gonna be able to have fun with it?
>>>It's in sizes from 800M to 8B parameters now, and will come in all sizes for all sorts of deployment, from edge devices to giant GPUs.
--
Can you fragment responses such that if an edge device (a mobile app, say) is prompted for [thing], it can pass tokens upstream on the prompt -- effectively torrenting the response -- and you could push actual GPU edge devices into certain climates... like dense cities, which are expected to be an F-ton of GPU-cycle consumption around the edge?
So you'd have tiered processing: speed is handled locally, quality level 1 can take some edge GPU, and the corporate shit can be handled in the cloud (rough sketch of what I mean below).
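
A minimal sketch of the tiered routing I'm imagining -- the tiers, thresholds, and names here are all made up for illustration, not any real API:

    from dataclasses import dataclass
    from enum import Enum

    class Tier(Enum):
        LOCAL = "local"   # small on-device model: fastest, lowest quality
        EDGE = "edge"     # mid-size model on a nearby edge GPU
        CLOUD = "cloud"   # big model in a datacenter

    @dataclass
    class Request:
        prompt: str
        min_quality: int        # 0 = "just be fast", higher = better answers
        latency_budget_ms: int

    def route(req: Request) -> Tier:
        """Pick a tier: stay local when speed is all that matters, spill to
        an edge GPU for mid quality, send the heavy stuff to the cloud."""
        if req.min_quality == 0:
            return Tier.LOCAL
        if req.min_quality == 1 and req.latency_budget_ms >= 200:
            return Tier.EDGE
        return Tier.CLOUD

    # e.g. autocomplete stays on the phone, a summary goes to edge GPU,
    # the corporate report generator goes to the cloud:
    print(route(Request("next word?", 0, 50)))           # Tier.LOCAL
    print(route(Request("summarize this", 1, 500)))      # Tier.EDGE
    print(route(Request("quarterly report", 3, 5000)))   # Tier.CLOUD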
----
Can you fragment and torrent a response?
If so, how is that request torn up and routed to the appropriate resources? (Again, a rough sketch of what I'm picturing is below.)
BOFH me if this is a stupid question (but it seems valid for how quickly we're evolving toward AI being intrinsic to our society).
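
Purely as illustration of the tear-up / route / reassemble idea -- the peer names and the splitter are invented, and the fake process_on_peer just stands in for real network plus inference:

    import asyncio
    from dataclasses import dataclass

    @dataclass
    class Fragment:
        index: int      # position in the final response, for reassembly
        subprompt: str  # the piece of work this peer is asked to do

    async def process_on_peer(peer: str, frag: Fragment) -> tuple[int, str]:
        """Stand-in for shipping a fragment to an edge peer; faked here."""
        await asyncio.sleep(0.01)  # pretend network + inference latency
        return frag.index, f"[{peer} answered: {frag.subprompt}]"

    def tear_up(prompt: str) -> list[Fragment]:
        """Naive splitter: one fragment per sentence. Real fragmentation is
        the hard part -- autoregressive decoding is sequential, so only
        genuinely independent sub-tasks can be scattered like this."""
        parts = [p.strip() for p in prompt.split(".") if p.strip()]
        return [Fragment(i, p) for i, p in enumerate(parts)]

    async def torrent_response(prompt: str, peers: list[str]) -> str:
        frags = tear_up(prompt)
        # round-robin fragments across whatever peers are nearby
        tasks = [process_on_peer(peers[i % len(peers)], f)
                 for i, f in enumerate(frags)]
        results = await asyncio.gather(*tasks)
        # reassemble in original order, whichever peer finished first
        return " ".join(text for _, text in sorted(results))

    print(asyncio.run(torrent_response(
        "Summarize chapter one. List the characters. Date the events.",
        peers=["phone", "edge-gpu-1", "edge-gpu-2"],
    )))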