NVDA: An Answer to the Puzzle

Going back to early 2016, Nvidia was a good company whose stock had performed well over the prior year but hadn't done much in the eight months or so before that. But NVDA was up to something, something that had started with CUDA several years earlier, when they turned the GPU into a general-purpose computing accelerator. That spring they released the Pascal architecture. Not only would this lead to a boom in the gaming segment that has lasted all the way to now, but Pascal would make its way into the fledgling Tesla data center lineup.

Nvidia did make data center chips, but from a very small base. Things were starting to happen in the machine learning and AI training fields, but the existing compute was not powerful enough for this market to take off. Most training compute was CPU based, though GPUs were starting to make a name for themselves. The Tesla P100 was the catalyst the industry needed. Training time went from weeks and months to hours and days. The time saved created an explosion of experimentation, leading to rapid progress. Nvidia's data center revenue went through the roof as adoption of their technology became widespread. Everyone got on board.

Then Nvidia upped the ante with Volta, and the saga continues: Data Center revenue, still almost entirely from training and high-performance computing, grew 83% in the most recent quarter. Volta cut training time from hours and days to minutes.

That was wave 1, which is not even close to cresting, and Nvidia utterly dominates it. The other thing Nvidia has known is coming is wave 2. All of the training and experimentation and widespread open sourcing has produced thousands of frameworks and networks capable of doing the AI work. When one of these networks is presented with new data, it makes an inference, doing whatever it was trained to do. But because of these advances, the complexity of these programs has increased immensely.

We arrive at another crossroads. Somewhere around 90% of the world’s inferences are processed on CPUs. Facebook notes that 100% of their inference is processed on CPU servers. With this new world that Nvidia has ushered in, that’s not going to cut it. The latency and price/performance of CPUs are prohibitive for advancement.

Enter Nvidia and the GPU again. They have already been carving out space with their Pascal solution: Google recently announced that they are bringing the P4 to their cloud, and Microsoft has been running Bing and Cortana inferences on Nvidia GPUs. But this market, estimated to be double the training market, has been a question mark. Whose boat will rise on this next wave?

To start, Facebook has announced their intention to build a new data center solely for their inference needs, along with a competition of sorts for the hardware. I include this as an example of how big this green field could be.


In what I believe is their biggest product announcement in a while, Nvidia recently announced the Tesla T4 Tensor Core GPU for hyperscale inferencing. It is an amazing piece of hardware/software specially designed for processing multiple inferences from multiple users running multiple frameworks concurrently on a single node.

Check out these specs and info here.


That's about a 4.5x increase over the P4 and 27x over a CPU, at only 75W, and the Tensor Cores enable additional precision modes (the lower precisions typical in inference workloads) that weren't previously available. And as the author states, on a price/performance comparison we're talking 35x-60x better. The more you buy, the more you save, right?

Now, how about others competing for the inference pie? Sure, there are other options, though none has disrupted the CPU yet. There are some startups out there, but with no commercial products anytime soon. Xilinx announced their newest FPGA inference product this week here.


Compare the numbers. The FPGA does 21 TOPS (INT8) and the T4 does 130 TOPS (INT8). Both are 75W. Not even close; Nvidia moved the target again. And of course FPGAs have the reprogrammability issue, which adds latency when serving multiple platforms. They can't run them concurrently like the T4 can. Latency is money.
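Since both parts sit in the same 75 W envelope, the throughput gap is easy to put in per-watt terms. A quick back-of-the-envelope check, using only the figures quoted above:

```python
# Back-of-the-envelope comparison using the INT8 figures quoted above.
# Both parts are rated at the same 75 W power envelope.
fpga_tops = 21    # Xilinx FPGA, INT8 TOPS (as quoted)
t4_tops = 130     # Nvidia Tesla T4, INT8 TOPS (as quoted)
power_w = 75      # both devices

ratio = t4_tops / fpga_tops              # raw throughput advantage
fpga_tops_per_watt = fpga_tops / power_w
t4_tops_per_watt = t4_tops / power_w

print(f"T4 vs FPGA throughput: {ratio:.1f}x")                # ~6.2x
print(f"FPGA: {fpga_tops_per_watt:.2f} TOPS/W, "
      f"T4: {t4_tops_per_watt:.2f} TOPS/W")                  # 0.28 vs 1.73
```

Because the power draw is identical, the raw TOPS ratio and the TOPS-per-watt ratio are the same: roughly a 6x advantage for the T4 at 75 W.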

With the T4 announcement, I think Nvidia is off to the races again with nothing but green in their future.

My conviction is that the new inference lineup will add at least as much as their current data center market over the next 2-3 years, and that training will continue to grow strongly as well. A $4B inference run rate won't equate to another 700% in stock price growth, because $4B on top of 2018 Nvidia is not as great as $4B on top of 2016 Nvidia. But we haven't even gotten to growth in Gaming, ProVis, and the autonomous vehicle/machine part yet. That's for another time.




What a great post! Thanks.

When FB's Zuck testified before Congress, he committed to hiring a bunch of people (it was either 20,000 or 10,000, I think 20,000) to monitor content. Well, that's a lot of people, and it really increases FB's operating expenses. I posted a while back that FB's share price hit was well justified because of the long-term hit to profitability. I also suggested that FB needs to automate all of the content checking. When you have more than 2B people posting stuff, you want to make sure the content is appropriate. Inferencing is the answer: FB must increase the speed and accuracy of content monitoring while reducing its cost. Yes, FB will be buying a lot of hardware to achieve this.