I think the two can complement each other very well.
GPUs are flexible and scalable when you don't yet know what the large-scale parameters of the network you want to build will look like and you need a lot of compute to do training. Let a fleet of cloud-based GPUs do the heavy lifting of training and learning.
But then once training is over, an FPGA or even an ASIC could implement the trained model and run it at a crazy-fast speed with low power. A piece of hardware like that could potentially handle things like real-time video processing through a DNN. Very handy for things like self-driving vehicles.
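Roughly what I have in mind, as a minimal numpy sketch (the file name and the tiny two-layer network are just placeholders, not anyone's actual pipeline): training produces a frozen set of weights, and the inference side only ever reads them, which is what makes fixed low-power hardware an option.

    import numpy as np

    # Training side (the GPU fleet's job): ends with a frozen set of weights.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((16, 32))
    w2 = rng.standard_normal((32, 4))
    np.savez("model.npz", w1=w1, w2=w2)   # hypothetical export step

    # Inference side (what the FPGA/ASIC would implement): the weights never
    # change, so the forward pass can be baked into fixed, low-power hardware.
    frozen = np.load("model.npz")

    def infer(x):
        hidden = np.maximum(x @ frozen["w1"], 0.0)   # ReLU
        return hidden @ frozen["w2"]

    print(infer(rng.standard_normal((1, 16))).shape)   # (1, 4)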
If you're dealing at scales where you can use the word "fleet" then it will usually make sense to just build an ASIC on a trailing process node rather than go for FPGAs. They'll be cheaper in bulk and more performant even with a large process disadvantage.
ADDENDUM: But fundamentally, in spaces like this, the underlying algorithms that can be accelerated are fairly simple. In most cutting-edge AI these days the heavy lifting is performed by convolutional neural networks, and the specialized silicon that speeds up one set of convolutional neural network operations will speed up another just as well. Baking the network itself into the hardware shouldn't tend to be any better than loading it into specialized memory pools, unless you get really exotic and do your neural network in analog electronics.
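To make that concrete, here's a toy sketch in plain numpy (not taken from any real accelerator) of the one convolution primitive such silicon has to make fast. Nothing in it depends on which network the kernel weights came from, which is why the weights can sit in memory pools rather than in the silicon itself.

    import numpy as np

    def conv2d(image, kernel):
        # The single multiply-accumulate pattern the accelerator has to speed up.
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # Two different "networks" are just two different weight tensors pushed
    # through the exact same operation.
    frame = np.random.rand(32, 32)
    edge_kernel = np.array([[-1., 0., 1.], [-1., 0., 1.], [-1., 0., 1.]])
    blur_kernel = np.ones((3, 3)) / 9.0
    conv2d(frame, edge_kernel)
    conv2d(frame, blur_kernel)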
I think there is a big enough space between GPU and ASIC technology for FPGAs. The main reason is the lifetime of the models. The shorter that lifetime, the more often you have to swap out the ASICs and the more expensive that gets. At the very least you'd have to produce new ASICs every few months and either replace them in special sockets or reflow/solder them onto new cards.
My assumption is that the ASIC is executing code that changes every month, but that it's using instructions and a memory hierarchy geared towards convolutional neural networks. If that stops being true then of course you'd need a different ASIC, but then again if that stops being true there's no guarantee that a GPU or ASIC will do any better than a CPU. You could end up with something like alpha-beta pruning, where parallelism doesn't make much of a difference. A reasonable chip won't be able to contain enough transistors to have separate execution resources for each layer. It's going to have to work by loading a layer, convolving it, loading the next layer, convolving it, and so on. So you'll be able to change your network without changing the ASIC you're running it on, while still taking advantage of your dedicated ganged operations. The FPGA version can be optimized for the exact sizes of the network layers in a way the more flexible ASIC version can't. But I expect that benefit to be much smaller than the gain from moving to an ASIC in the first place.
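The loop I'm picturing, as a toy sketch (dense matrix multiplies stand in for the convolutions to keep it short, and the layer sizes are made up): the chip only ever holds one layer's weights at a time, so a retrained network is just a new set of arrays, not a new piece of silicon.

    import numpy as np

    def run_network(activations, layer_weights):
        # Layer-at-a-time execution: load a layer's weights, apply it, move on.
        for w in layer_weights:
            activations = np.maximum(activations @ w, 0.0)   # ReLU after each layer
        return activations

    # Swapping the model means swapping this data, not the hardware it runs on.
    layers = [np.random.randn(64, 128),
              np.random.randn(128, 32),
              np.random.randn(32, 10)]
    x = np.random.randn(1, 64)
    run_network(x, layers)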
From the papers, I believe they are hardcoding the layer weights into the hardware definition of the FPGA. These FPGAs also have no significant on-chip RAM, but the Intel FPGAs they use do seem to have an even larger number of LUTs than the usual embedded FPGAs, and even dedicated floating-point units.
At the very least they talk about omitting weights which are 0 in the synthesis.
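Something like this toy sketch, I'd guess (ordinary Python, not what the paper's toolchain actually does): only the nonzero weights and their positions are kept, so the multiply-accumulates for the zeros simply never happen, which is roughly what dropping them at synthesis time buys you.

    import numpy as np

    def prune(weights):
        # Keep only the nonzero entries along with their positions.
        rows, cols = np.nonzero(weights)
        return list(zip(rows, cols, weights[rows, cols]))

    def sparse_matvec(pruned, x, out_dim):
        # One multiply-accumulate per surviving weight; zeros cost nothing.
        out = np.zeros(out_dim)
        for i, j, w in pruned:
            out[i] += w * x[j]
        return out

    w = np.array([[0.0, 1.5, 0.0],
                  [0.2, 0.0, 0.0]])
    x = np.array([1.0, 2.0, 3.0])
    sparse_matvec(prune(w), x, out_dim=2)   # same result as w @ x, fewer MACs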
TrueNorth was built to run spiking neural networks, which have little to do with deep learning (even though they managed to get it to run a small convolutional NN), and Nervana has never actually built any hardware.
Yes, there are at least a dozen companies with specialized hardware accelerators in some stage of development. For smaller parts, some of the existing DSP companies like CEVA and Cadence Tensilica are also adapting their architectures for deep neural net workloads.
Yet it's still not clear whether building a custom chip makes sense, because the next Nvidia chip might make it obsolete. Or the one after that, which would still arrive too soon.