“The speed and cost of running machine learning operations — ideally in deep learning — are a competitive differentiator for enterprises… That speed can only be achieved with custom hardware, and Inferentia is AWS’s first step to get in to this game,”…Google has a 2-3 year head start with its TPU infrastructure…AWS CEO Andy Jassy indicated it won’t actually be available until next year.

Sounds like all the big cloud players are moving into the ML/DL space (Google with TPU, MSFT with Project Brainwave/FPGA, and now AMZN with Inferentia).

I’m out of Nvidia primarily based on the slowdown in datacenter growth. I don’t see that trend reversing, which was my major reason for being in Nvidia.

Ugh. I might finally bail on NVDA now, losses and all. What was once my top performer is now my top loser. And the world might finally be turning towards specialized hardware for ML rather than continuing down the GPU path. I’ve been expecting this, just not quite this quickly. Thanks for that article.

OK, I’ve been reading more and talking to an ex-colleague involved with an ML/AI chip startup, who suggested I wait until all these people announcing things actually start shipping things that work better than GPUs. He said even the TPU is not all that impressive, all things considered. He also didn’t suggest I jump ship and join his company, so…

Hanging onto NVDA for the near term, but watching them like a hawk.

I recall this board already discussing the TPU, and NVDA’s CEO might have addressed it too. I recall the conclusion was that NVDA’s GPUs were superior and that their GPU pipeline was going to be another quantum leap above the TPU.

Anyone else recall our discussions?

That said, everyone should value crypto mining at zero. It could even be negative if those GPUs come onto the market secondhand and partially replace new sales.

“Inferentia” is a strange name for a machine learning chip.

Inference

Deduction is inference deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being studied in logic. [emphasis added]

Machine and deep learning create the “premises known or assumed to be true.” Inferencing uses the “premises known or assumed to be true” to reach some conclusion. Did AWS really think this through?

It seems everyone is coming out with their own ASIC these days. It’s unusual.

It would be akin to Microsoft coming out with their own processor to run Windows.

It puts a limit on the market opportunities for NVDA, but the market is huge. It’s also why I don’t put much emphasis on their high-volume applications such as autonomous driving. Though Ford and GM aren’t exactly the kind of companies to come out with an ASIC.

Machine and deep learning create the “premises known or assumed to be true.” Inferencing uses the “premises known or assumed to be true” to reach some conclusion. Did AWS really think this through?

I can see needs for doing inference on AWS, so quite possibly yes they have thought this through.

Deduction is inference deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being studied in logic. [emphasis added]

Machine and deep learning create the “premises known or assumed to be true.” Inferencing uses the “premises known or assumed to be true” to reach some conclusion. Did AWS really think this through?

In the lingo of machine learning and deep learning there are two parts.
Part 1 is the “training”…which takes a long time (relatively) since the models must be trained with thousands to millions of examples. The result is a set of weights used in the mathematical convolutions.

Part 2 is called “inference.” The model with its trained weights is run and the answer is produced. There is really no thinking involved; every case takes the same amount of compute time.

Training is typically run using 32-bit floating point math. Once trained, to save time/memory during inference, the weights can be reduced to 16-bit floating point or 8-bit integers (or even smaller, depending on the usage). Some minor retraining is needed. Then, for the (potentially) millions or billions of times you use the trained model, you just run the inference at the lower precision.
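To make the precision reduction concrete, here is a rough sketch of one common way to do it (symmetric 8-bit quantization with a single scale factor). This is only an illustration, not how Inferentia or any particular chip does it; the weights, inputs, and function names are made up for the example, and it skips the retraining step mentioned above.

```python
# Illustrative sketch only: symmetric int8 quantization of a few weights.
# The weights/inputs and helper names are hypothetical, not from any real model.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

def dot(a, b):
    """One 'neuron' worth of inference: a weighted sum of the inputs."""
    return sum(x * y for x, y in zip(a, b))

weights = [0.82, -1.27, 0.05, 0.4]   # pretend these came out of training
inputs = [1.0, 0.5, -2.0, 3.0]       # one example fed to the model

q_weights, scale = quantize_int8(weights)
full = dot(weights, inputs)                         # full-precision answer
approx = dot(dequantize(q_weights, scale), inputs)  # lower-precision answer

print("int8 weights:", q_weights)
print("full precision:", full, "quantized:", approx)
```

Each weight is off by at most half a scale step, so the two answers land close together, which is the "almost as good" trade-off being described.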

Yes, Amazon did think this through…their chip is designed to just run the inference.

Note: something like Alexa might use the lower-precision inference (which might be 99% as good but use only 25-50% of the compute time), while a medical MRI application would use the full 32-bit precision.
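As a back-of-the-envelope check on where numbers like 25-50% can come from, here is the storage arithmetic for the same weights at different precisions. The model size is a made-up figure, and this only counts bytes; actual compute-time savings depend on the hardware.

```python
# Back-of-the-envelope sketch: weight storage at different precisions.
# num_weights is a hypothetical model size chosen for illustration.

BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}

def model_size_bytes(num_weights, precision):
    """Bytes needed to store the weights at the given precision."""
    return num_weights * BYTES_PER_WEIGHT[precision]

num_weights = 25_000_000
baseline = model_size_bytes(num_weights, "fp32")

# Size of each precision relative to the 32-bit baseline.
relative = {
    precision: model_size_bytes(num_weights, precision) / baseline
    for precision in BYTES_PER_WEIGHT
}

for precision, fraction in relative.items():
    print(precision, f"{fraction:.0%} of the fp32 size")
```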

In the lingo of machine learning and deep learning there are two parts. Part 1 is the “training”…which takes a long time (relatively) since the models must be trained with thousands to millions of examples. The result is a set of weights used in the mathematical convolutions.

Part 2 is called “inference.” The model with its trained weights is run and the answer is produced. There is really no thinking involved, every case takes the same amount of compute time.

Training is typically run using 32-bit floating point math. Once trained, to save time/memory during inference, the weights can be reduced to 16-bit floating point or 8-bit integers (or even smaller, depending on the usage). Some minor retraining is needed. Then, for the (potentially) millions or billions of times you use the trained model, you just run the inference at the lower precision.

What I said but using many more words…

Yes, Amazon did think this through…their chip is designed to just run the inference.

Then the name makes sense and Ron Miller at TechCrunch is misleading:

“AWS is not content to cede any part of any market to any company. When it comes to machine learning chips, names like Nvidia or Google come to mind, but today at AWS re:Invent in Las Vegas, the company announced a new dedicated machine learning chip of its own called Inferentia.”

For Saul’s board, Inferentia is not a real threat to Nvidia. No need to panic!