How NVDA Does It - SATURN V

Many on this board (gauchoChris most of all) have asked whether Nvidia uses its own deep learning and AI technology to help run its business and research, and to sustain its competitive advantage.

The answer is yes, they do. Huangenstein’s Monster is the SATURN V supercomputer that Nvidia built at its headquarters. It consists of 125 DGX-1 Pascal supercomputers. On the November 2016 Top500 list it ranked 28th for speed and took first place in efficiency (performance/watt). While they do rent it out to select researchers, Nvidia is the primary user of this massive supercomputer.
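For a rough sense of scale, here is a back-of-envelope sketch. The GPU count per DGX-1 and the per-GPU peak figure are nominal spec-sheet numbers I am assuming, not measured results, and a Top500 Linpack score always comes in below this kind of theoretical peak.

```python
# Back-of-envelope scale of the original Pascal-based SATURN V.
# Assumptions (spec-sheet numbers, not measurements): each DGX-1 holds
# 8 Tesla P100 GPUs, and each P100 peaks at roughly 5.3 TFLOPS FP64.

nodes = 125                 # DGX-1 systems in SATURN V
gpus_per_node = 8           # P100s per DGX-1
peak_tflops_per_gpu = 5.3   # approximate FP64 peak per P100 (SXM2)

total_gpus = nodes * gpus_per_node
peak_pflops = total_gpus * peak_tflops_per_gpu / 1000

print(f"{total_gpus} GPUs, ~{peak_pflops:.1f} PFLOPS peak (FP64, GPUs only)")
# -> 1000 GPUs, ~5.3 PFLOPS theoretical peak; the sustained Linpack
#    number reported to the Top500 is lower than this ceiling.
```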

https://blogs.nvidia.com/blog/2016/11/14/dgx-saturnv/

And they have since updated it, or are in the process of updating it, to Volta.

https://www.nvidia.com/en-us/data-center/dgx-saturnv/

Some of the highlights from this revelation:

“We’re convinced AI can give every company a competitive advantage.”

“SATURNV helps us build the autonomous driving software that’s a key part of our NVIDIA DRIVE PX 2 self-driving vehicle platform.”

“We’re also training neural networks to understand chipset design and very-large-scale-integration, so our engineers can work more quickly and efficiently.”

“Yes, we’re using GPUs to help us design GPUs.”

“Most importantly, SATURNV’s power will give us the ability to train — and design — new deep learning networks quickly.”

They use it to analyze gamer data to build the GeForce cards gamers want.

They use it to recruit new hires.

There are probably hundreds of business applications they are running on this beast.

So this is how NVDA went from Pascal to Volta in a year. This is how CUDA and the NVDA software stack accelerate things so much that a software update alone can speed up an existing GPU architecture. This is how the DGX-2 released this year is 10x faster than last year’s DGX-1 (not just from doubling the GPUs, but also from the NVSwitch and software stack upgrades). SATURN V helps write the software.

SATURN V helped make Volta, which is now replacing itself with what it created. Weird stuff: “I can make me better, humans; here’s how.”

Darth

40 Likes

<<<first place in efficiency (performance/watt).>>>

Nvidia has a lot of competitive advantages; well, “advantages” seems a tame word for just how dominating those advantages are. It makes sense that NVDA would also be the best user of AI in the world (that is an assumption, we don’t know, but they sure make it sound like they are, and the results speak for themselves).

This said, much of the fear surrounding Nvidia is that ASICs or some other heretofore unrevealed technology will come and spoil Nvidia’s market domination party.

The primary issue is that GPUs are not as optimized as purpose-built silicon. Speed has not been the issue; performance/watt has. A more “optimized” chip can produce better performance/watt.

If the above quote Darth gave is true, it appears that is not something we need to worry much about.

As our analysis of Volta vs. the Google tensor revealed (on the NPI board, anyway), Google was not comparing apples to apples, and when you do, it costs 30% more to get the same work done on the Google tensor than it does using a 4 core Volta setup; if I recall correctly, the Volta will also get it done quicker (don’t quote me on that last part, as I will need to look it up again).

But it was clear that, even using the Google cloud, and except perhaps for specific use cases, NVDA’s Volta (which has its own tensor cores) was 30% superior in cost/performance to the Google tensor. Again, that will vary with the use case; when the workload is tensor-intensive, Nvidia’s superiority increased.
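To make the cost/performance arithmetic concrete, here is a minimal sketch. The hourly prices and throughput numbers below are purely hypothetical placeholders, not the figures from the NPI analysis; the point is only to show how a cheaper hourly rate can still come out roughly 30% more expensive once you normalize to the same amount of work.

```python
# Hypothetical illustration of cost per unit of work, the metric behind the
# "30% more on the Google tensor" claim. Every number here is made up.

def cost_per_million_images(price_per_hour, images_per_sec):
    """Dollars to process one million training images."""
    images_per_hour = images_per_sec * 3600
    return price_per_hour / images_per_hour * 1_000_000

# Placeholder rates and throughputs (NOT real benchmark data):
volta_cost = cost_per_million_images(price_per_hour=12.00, images_per_sec=3000)
tpu_cost = cost_per_million_images(price_per_hour=7.80, images_per_sec=1500)

print(f"Volta:  ${volta_cost:.2f} per million images")
print(f"Tensor: ${tpu_cost:.2f} per million images")
print(f"Tensor premium: {tpu_cost / volta_cost - 1:.0%}")  # -> 30%
# A lower hourly price can still cost more per unit of work when the
# throughput is lower -- that is the apples-to-apples point above.
```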

Tinker

6 Likes

“If the above quote Darth gave is true, it appears that is not something we need to worry much about.”

From the article I posted:

“Our SATURNV supercomputer, powered by new Tesla P100 GPUs, delivers 9.46 gigaflops/watt — a 42 percent improvement from the 6.67 gigaflops/watt delivered by the most efficient machine on the Top500 list released just last June. Compared with a supercomputer of similar performance, the Camphor 2 system, which is powered by Xeon Phi Knights Landing, SATURNV is 2.3x more energy efficient.”
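And just as a quick sanity check on the arithmetic in that paragraph (nothing new here, only the figures quoted above):

```python
# Check the efficiency improvement quoted from the Nvidia blog post.
saturnv_gflops_per_watt = 9.46     # SATURN V (Tesla P100)
prior_best_gflops_per_watt = 6.67  # most efficient machine on the prior Top500 list

improvement = saturnv_gflops_per_watt / prior_best_gflops_per_watt - 1
print(f"Improvement over the prior best: {improvement:.0%}")  # -> 42%
```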

If you want, you can cross-reference by searching for “Nvidia Saturn V efficiency”; multiple sources will give you the exact same data as that paragraph.

On a side note, Facebook built a supercomputer that, from what I can tell, is a near duplicate of SATURN V to run their stuff and things.

One of the things I’ve discovered in the process is how much of the world’s high-performance computing is still done by CPUs. That includes supercomputers, cloud compute for hire from the Titans, and enterprise. If you go to their pay plans for machine learning and cloud computing, most of the options are CPUs, and the GPU options are mostly older models. It looks like AWS and IBM were first in line for Volta. GPUs obviously command a higher premium. I think we are likely to see GPUs continue to add to global compute, or displace CPUs, for some time.

I found something interesting about AWS and TensorFlow while poking around in there. I’ll probably post a separate thread about that when I get a chance.

Darth

10 Likes

ASICs: I don’t know much about them, but the name, application-specific integrated circuit, may say it all.

At this early stage most companies don’t even know what AI applications they will wind up using. Furthermore, Nvidia’s GPUs are improving rapidly; by the time you build, debug, and deploy an ASIC, it is already obsolescent.

“This said, much of the fear surrounding Nvidia is that ASICs or some other heretofore unrevealed technology will come and spoil Nvidia’s market domination party.” Isn’t that always true? No unrevealed technology came along to upset Microsoft or Intel, because once you get a big enough head start, the “unrevealed technology” has to be massively better, not just slightly better.

My only real worry with NVDA is that AI and its cousins will turn out to be less useful than we think. The only one I know much about (except in theory) is car autopilot, and I have little doubt it will get better than humans in the foreseeable future. And despite all the man-bites-dog publicity, from a rational standpoint it only has to be a bit better than humans, and continually improving, to warrant adoption, because many humans are rotten drivers and are clearly getting no better at it.

4 Likes