>>>How does it perform on a 3090, a 4090, or less? Are we mere mortals gonna be able to have fun with it?
>>>It's in sizes from 800M to 8B parameters now, and will come in all sizes for all sorts of deployment, from edge devices to giant GPUs.
--
Can you fragment responses such that if an edge device (a mobile app, say) is prompted for [thing], it can pass tokens upstream on the prompt -- effectively torrenting the response -- and you could push actual GPU edge devices into certain climates... like dense cities, which are expected to be an F-ton of GPU-cycle consumption around the edge?
So you'd have tiered processing: speed is handled locally, quality level 1 can take some edge GPU, and the corporate shit can be handled in the cloud (rough sketch of what I mean below).
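
A minimal sketch of the tiered routing I'm imagining -- the tiers, thresholds, and names here are all made up for illustration, not any real API:

    from dataclasses import dataclass
    from enum import Enum

    class Tier(Enum):
        LOCAL = "local"   # small on-device model: fastest, lowest quality
        EDGE = "edge"     # mid-size model on a nearby edge GPU
        CLOUD = "cloud"   # big model in a datacenter

    @dataclass
    class Request:
        prompt: str
        min_quality: int        # 0 = "just be fast", higher = better answers
        latency_budget_ms: int

    def route(req: Request) -> Tier:
        """Pick a tier: stay local when speed is all that matters, spill to
        an edge GPU for mid quality, send the heavy stuff to the cloud."""
        if req.min_quality == 0:
            return Tier.LOCAL
        if req.min_quality == 1 and req.latency_budget_ms >= 200:
            return Tier.EDGE
        return Tier.CLOUD

    # e.g. autocomplete stays on the phone, a summary goes to edge GPU,
    # the corporate report generator goes to the cloud:
    print(route(Request("next word?", 0, 50)))           # Tier.LOCAL
    print(route(Request("summarize this", 1, 500)))      # Tier.EDGE
    print(route(Request("quarterly report", 3, 5000)))   # Tier.CLOUD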
----
Can you fragment and torrent a response?
If so, how is that request torn up and routed to the appropriate resources? (Again, a rough sketch of what I'm picturing is below.)
BOFH me if this is a stupid question (but it seems valid for how quickly we're evolving toward AI being intrinsic to our society).
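
Purely as illustration of the tear-up / route / reassemble idea -- the peer names and the splitter are invented, and the fake process_on_peer just stands in for real network plus inference:

    import asyncio
    from dataclasses import dataclass

    @dataclass
    class Fragment:
        index: int      # position in the final response, for reassembly
        subprompt: str  # the piece of work this peer is asked to do

    async def process_on_peer(peer: str, frag: Fragment) -> tuple[int, str]:
        """Stand-in for shipping a fragment to an edge peer; faked here."""
        await asyncio.sleep(0.01)  # pretend network + inference latency
        return frag.index, f"[{peer} answered: {frag.subprompt}]"

    def tear_up(prompt: str) -> list[Fragment]:
        """Naive splitter: one fragment per sentence. Real fragmentation is
        the hard part -- autoregressive decoding is sequential, so only
        genuinely independent sub-tasks can be scattered like this."""
        parts = [p.strip() for p in prompt.split(".") if p.strip()]
        return [Fragment(i, p) for i, p in enumerate(parts)]

    async def torrent_response(prompt: str, peers: list[str]) -> str:
        frags = tear_up(prompt)
        # round-robin fragments across whatever peers are nearby
        tasks = [process_on_peer(peers[i % len(peers)], f)
                 for i, f in enumerate(frags)]
        results = await asyncio.gather(*tasks)
        # reassemble in original order, whichever peer finished first
        return " ".join(text for _, text in sorted(results))

    print(asyncio.run(torrent_response(
        "Summarize chapter one. List the characters. Date the events.",
        peers=["phone", "edge-gpu-1", "edge-gpu-2"],
    )))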