
I'm sorry but I'm finding it really difficult to watch this and match up what the engineer is saying to what Elon Musk is saying.

For example, the engineer says the custom ASIC does 144 TOps across 2 chips vs. the NVidia Drive Xavier's 21 TOps. Okay, well yeah, I expect your custom ASIC to have a nice performance advantage over the equivalent GPU; a 3.5x per-chip advantage seems reasonable. Cue Elon Musk:

"At first it seems improbable: how could it be that Tesla, who has never designed a chip before, would design the best chip in the world? But that is objectively what has occurred. Not the best by a small margin, the best by a huge margin."

Mate, it's a dot product with some memory attached, and not a single detail your half hour deep dive has gone into suggests anything other than a bog standard ASIC.

"All the cars being produced right now have all the hardware necessary for full self-driving"

And this is where I'm totally lost. I want to believe! But he's lied so many times now. This man is sucking the credibility out of every engineer in the room. Don't repeat the same lie twice.




This doesn't seem wildly inconsistent with my understanding of Elon Musk, in that he has a higher level of confidence in his understanding of the world than others would, and in a "you don't know what you don't know" way it seems he didn't know how custom ASICs are much more efficient at specific tasks than a general-purpose computer. That doesn't make him a bad person, and in fact someone like that is exactly who you want leading a company trying to change the rules of the industry it's in; it's the job of the boots on the ground to exercise due diligence here.


I don't understand this response to the above comment. You write,

> it seems he didn't know about custom how custom ASICs are much more efficient at specific tasks than a general purpose computer

Yet, Musk is clearly stating his new chip is state-of-the-art. He is not underselling it.

> something like that is exactly who you want leading a company trying to change the rules of the industry they're in

You want someone in charge who does not understand the hardware he's touting?

> it's the job of the boots on the ground to exercise due diligence here

Who are you referring to? Employees and investors?

What, then, do you consider are the responsibilities of the CEO, if due diligence is not part of the job?


I think the part that Elon, and the engineer, are trying to express, and that you are looking past, is that "a dot product with some memory attached" is exactly what they need to do low-latency image inference on the camera data.

They don't need a multipurpose CPU/GPU; they have a single well-defined task that dominates their compute needs. They built a chip to do this that is cost- and power-effective, and redundant for safety.


Right, but Nokia's 5G backhaul infrastructure is far more capable of doing mobile backhaul than a GPU, and you don't get the head of Nokia standing up at a press conference telling everyone his team has designed the best chip in the world.


> For example, the engineer says the custom ASIC does 144 TOps across 2 chips vs. the NVidia Drive Xavier's 21 TOps. Okay, well yeah, I expect your custom ASIC to have a nice performance advantage over the equivalent GPU; a 3.5x per-chip advantage seems reasonable. Cue Elon Musk:

Here's my issue with that. The on-chip SRAM is only 32MB, and the RAM is LPDDR4 rated at only 68GB/s.

Assuming a dot product (multiply + add) over INT8 data, that's 2 operations per byte streamed from RAM. At 68 GB/s, that limits you to 136 GIOPS (giga INT8 operations per second). You're limited by RAM, based on what I've seen in the presentation.

Unless their neural net is 32MB and fits entirely in on-chip SRAM. That seems unlikely to me...
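The bandwidth-bound estimate above can be checked in a couple of lines (a rough sketch: the 68 GB/s and 2-ops-per-byte figures come from the comment, and it assumes zero on-chip reuse):

```python
# Rough bandwidth-bound throughput estimate, assuming every INT8 value is
# streamed from DRAM once per use (i.e. no on-chip SRAM reuse at all).
dram_bandwidth_gbps = 68       # LPDDR4 bandwidth quoted above, GB/s
ops_per_byte = 2               # one multiply + one add per INT8 value

bandwidth_bound_gops = dram_bandwidth_gbps * ops_per_byte
print(bandwidth_bound_gops)    # 136 "GIOPS", vs. a claimed 144,000 (144 TOPS)
```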


Nvidia already responded:

"Tesla was inaccurate in comparing its Full Self Driving computer at 144 TOPS of processing with Nvidia Drive Xavier at 21 TOPS," a spokesman said in an email. "The correct comparison would have been against Nvidia's full self-driving computer, Nvidia Drive AGX Pegasus, which delivers 320 TOPS for AI perception, localization and path planning." The statement also contends that "while Xavier delivers 30 TOPS of processing, Tesla erroneously stated that it delivers 21 TOPS."


From what I understand, Pegasus consumes about 500 watts, compared to under 100 watts for Tesla's FSD computer. Elon in particular emphasized performance per watt (it's always possible to cram in more chips to increase performance if you ignore cost and power consumption).

The comparison made in the video: 500 watts for an hour consumes about 2-3 miles of range. In a city in slow traffic, going 12 mph, that's a significant range reduction. So you might see roughly a 10% improvement in range with the Tesla ASIC in low-speed conditions.
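As a sanity check on the ~10% figure, here is the arithmetic under one stated assumption: a baseline consumption of about 250 Wh/mile, which is my own rough figure for a Model 3, not a number from the presentation:

```python
# Range-impact sketch at low speed. The 250 Wh/mile figure is an assumed
# (typical Model 3) efficiency, not a number from the presentation.
wh_per_mile = 250              # assumed baseline vehicle efficiency
speed_mph = 12                 # slow city traffic, per the comment above
extra_watts = 500 - 100        # Pegasus draw minus Tesla FSD computer draw

extra_wh_per_mile = extra_watts / speed_mph        # ~33 Wh/mile of overhead
range_penalty = extra_wh_per_mile / (wh_per_mile + extra_wh_per_mile)
print(f"{range_penalty:.1%}")  # ~11.8%, in the ballpark of the 10% claim
```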


It's not comparable, though. The Pegasus has far more compute power.

Just because the Pegasus has a 500W worst-case TDP doesn't mean its average case would be a constant 500W. If you scale back your code and idle parts of the GPU, you can drop the energy cost arbitrarily.

At least, that's how GPUs on desktops work. They only use a ton of power if you give them a ton of work. Write your code in an energy-efficient manner, and the 2080 Ti will drop down to 20W, or scale all the way up to 300W.

https://www.tomshardware.com/reviews/nvidia-geforce-rtx-2080...

Modern chips idle very well. With the right code, the Pegasus could be tuned to use only 100W (assuming good enough programmers). But Tesla's chip will NEVER be able to scale above 100W.


Pegasus's chip is more general purpose and doubtless has far more general-purpose compute power. But that's irrelevant. What's relevant is that Tesla's chip is optimized specifically for their NN pipeline, whereas Pegasus is based on a general-purpose GPU architecture, so Tesla's chip achieves better TOPS per watt than Pegasus. And it'd be strange if it didn't.

"Tesla's chip will NEVER be able to scale above 100W"? Okay, based on what? Tesla has a higher-performance chip in the pipeline right now, and they could've used more silicon to achieve more TOPS if they needed it.

EDIT: Pegasus does 320 TOPS at 500 watts; Tesla's does 144 TOPS at 72 watts. Thus Tesla's chip, because it focuses specifically on Tesla's NN pipeline, gets about 3x the performance per watt of the Pegasus and is much cheaper. Tesla's NN chip wouldn't help your video game, and Tesla isn't intending to compete in all the markets Nvidia operates in.
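Plugging in the figures quoted in this thread (a sketch using only those numbers):

```python
# TOPS-per-watt comparison from the numbers quoted in this thread.
pegasus_tops, pegasus_watts = 320, 500
tesla_tops, tesla_watts = 144, 72

pegasus_eff = pegasus_tops / pegasus_watts   # 0.64 TOPS/W
tesla_eff = tesla_tops / tesla_watts         # 2.0 TOPS/W
print(f"{tesla_eff / pegasus_eff:.2f}x")     # ~3.1x in Tesla's favor
```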


> Pegasus has 500W at 320 TOPS. Tesla's has 72 Watts at 144TOPS.

Theoretical TOPS, which can only be achieved when working within the 32MB SRAM Tesla has put on-chip. Otherwise, Tesla's compute chip is stuck at 68 GB/s of LPDDR4 bandwidth. Pretty slow.

Pegasus uses HBM2 chips at 500 GB/s. Pegasus will be able to efficiently compute neural networks that are larger than 32MB in size.

Tesla is making big bets about this tiny 32MB SRAM. Bits and pieces of the CNN can fit in there, but almost certainly not the entire neural network.

You're right that this is a specialized chip. But even for NN / deep learning inference, it seems a bit underpowered to me from a RAM perspective.


Specmanship doesn't matter. What matters is how fast it's able to execute on the task at hand and for what cost in terms of purchase price and energy.

Nvidia's offering can be really good, and so can Tesla's.


Convolutions have a much higher arithmetic intensity than dot products.


But in terms of CNNs, it's basically a dot product, is it not? Can you give an example of a CNN where it isn't?


In a conv layer, weights are shared across many input features. E.g. assume a 1x1 conv layer with a 28x28x3 input. You only need to load 3 weights even though there are effectively 28x28=784 different dot products. In practice, the input and output activations can be stored on chip as well (except for the first layer), which means the ratio of operations to DRAM accesses can be incredibly high. For some real-world examples, take a look at the classic Eyeriss paper[1], which finds ratios of 345 and 285 for AlexNet and VGG-16 respectively. You can also check out the TPU paper[2], which places the ratio at >1000 for some unnamed CNNs. Compare that to your analysis, which yields a ratio of 2.

[1] https://people.csail.mit.edu/emer/papers/2017.01.jssc.eyeris...

[2] https://arxiv.org/pdf/1704.04760.pdf
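The 1x1 conv example above works out as follows (a sketch; INT8 weights and a single output channel are my assumptions, with activations held on chip as the comment describes):

```python
# Arithmetic intensity of the 1x1 conv example: 28x28x3 input, 1x1 kernel,
# one output channel (assumed), INT8 weights, activations resident in SRAM.
h, w, c_in, c_out = 28, 28, 3, 1

weight_bytes = c_in * c_out          # 3 weight bytes fetched from DRAM
ops = h * w * c_in * c_out * 2       # multiply + add at each of 784 positions

intensity = ops / weight_bytes       # ops per DRAM byte
print(ops, intensity)                # 4704 ops, 1568 ops/byte (vs. 2 above)
```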


He has a point. There's a large amount of data reuse in CNNs.

Hmm... it will depend on the CNN. There's probably a good neural network design that would take advantage of this architecture, i.e. a well-recycled convolutional layer that fits within the 32MB (load those weights once, use them across the whole picture).

So the whole NN doesn't necessarily have to fit inside the 32MB to be useful, but at least large portions have to fit (say, a 128x128 tile with 20 hidden layers is only 300kB). Recycling that portion across the 1080x1920 input would be relatively cheap.

I herp-derped early on; there do seem to be CNNs that would make good use of the architecture. Still, the memory bandwidth of that chip is very low. I'd expect GDDR6 or HBM2 to be definitely superior to the 68 GB/s LPDDR4 chip they put in there.


I think the key thing is that you assume a single dot product per RAM load. In general you get higher compute performance by doing more than this: you load a layer's weights into SRAM, then stream tiles of data from RAM that you can compute multiple operations on.

In the same way, to reach peak FLOPS on a GPU you'd better use the local/shared memory as much as possible.
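That access pattern can be sketched in plain Python (illustrative only; the function name and shapes are made up):

```python
# Illustrative sketch of the pattern described above: load a layer's weights
# once, then reuse them against every activation tile streamed from DRAM.
def run_layer(weights, tiles):
    sram = list(weights)                       # one DRAM -> SRAM transfer
    for tile in tiles:                         # tiles streamed from DRAM
        # many multiply-adds per byte streamed, since weights are reused
        yield sum(w * x for w, x in zip(sram, tile))

outputs = list(run_layer([1, 2, 3], [[1, 1, 1], [0, 1, 0]]))
print(outputs)  # [6, 2]
```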


>> how could it be that Tesla

> Mate, it's a dot product with some memory attached, and not a single detail your half hour deep dive has gone into suggests anything other than a bog standard ASIC.

There you go, you answered the question Musk put forward. Grats.



I'm a fan of Tesla, but not Musk. You really can't believe much that comes out of his mouth. If you look at his past claims it becomes clear that he should just be ignored.


Being a fan of a company while completely ignoring the CEO of said company is questionable to me.


Tesla has great products, created and supported by thousands of people who appear to go above and beyond. Kudos to them. And kudos to Musk for the vision and the early execution, but that doesn't mean I have to like him or believe that he is the right person for this phase of the company's growth.


> If you look at his past claims it becomes clear that he should just be ignored.

What's an example of this? His timelines are often much too aggressive for reality but he repeatedly delivers (eventually).


>he repeatedly delivers (eventually)

This is essentially unfalsifiable. If the claims about FSD or a person on Mars are wrong, that statement can only be proven wrong when Elon Musk chooses to admit defeat (suffering a significant personal financial loss as a result) or when he retires.


The main example, of course, is in self-driving where he has most certainly not delivered. And he wasn't just a little off.

But his claims about being able to automate the manufacture of the Model 3 also turned out to be very wrong and did tremendous damage to the company. They are/were way ahead with their technology and should have taken a less risky approach to getting that tech onto the market and taking advantage of that lead. And how did he not learn from the Model X experience?


He's more credible on physics and hardware than software.


hasn't delivered full self driving (yet)


Is it really an ASIC? I thought they said over and over it is full custom.


Definition from wikipedia:

> An application-specific integrated circuit (ASIC /ˈeɪsɪk/) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use.

Is there anything more "custom" than an ASIC?


I suppose both are still called an ASIC. What I'm wondering is whether it is gate-array/semi-custom:

https://en.wikipedia.org/wiki/Application-specific_integrate...

or full custom:

https://en.wikipedia.org/wiki/Application-specific_integrate...




