One more response. Hope we’re not getting Pure fatigue yet.
Where does the 305% come from? Disclosure: my tech credentials extend only as far as being my wife’s tech guy.
Here’s the best I can come up with, and it applies to the Nvidia DGX-1 case the graphic comes from. For reference, here is a typical non-FlashBlade DGX-1 configuration:
https://devblogs.nvidia.com/wp-content/uploads/2017/04/image…
The topic is the streaming cache, which takes the form of 4x SSDs on board.
The SSDs are on board the supercomputer. When you plug the DGX into a typical storage network to do DL training, the data needs to go from the storage to the SSD to the GPU. The data or images need to be “cached” on the SSD in order to be processed. When engineers download their images of road signs or whatever they have, those go to the network storage first; they wouldn’t download them directly onto the on-board SSD, because that data would get overwritten when another training task is assigned.
Here is how FlashBlade appears to make a difference. It is built from the ground up to be parallel DL/AI storage. While it is flash, it also has an entirely unique architecture and software overlay; they describe it as “cache-less.” Through this software, the Network File System (NFS) exists in parallel with the DGX-1 and eliminates the on-board SSD altogether. The GPU has instant access to the data you want to feed it, so there is no need to “on-board” the data first so it can be served by the SSD. That is why it saves “end-to-end” time and also eliminates the “latency” issue.

Many FlashBlades in a chassis can be linked to many DGX-1s, all in parallel, and I imagine they can do some pretty amazing things. They could, for instance, crank out trained neural networks onto Pegasus chips to be placed into self-driving cars at scale. Correct me if I’m wrong, but each neural network on each Drive system has to be trained to operate independently of a cloud; you don’t just train one and then cut and paste it into the next computer. Given that I have no reason to believe Pure is lying, the improvement they achieve reaches 305% measured end to end. If everything in this paragraph is true, then FlashBlade might be something revolutionary and very important to the DL equation.
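Purely as a sanity check on where a number like 305% could even come from, here is a toy back-of-the-envelope model of the two data paths. Every function name and throughput figure below is made up by me for illustration; none of it comes from Pure or Nvidia.

```python
# Toy model of the two data paths described above (all numbers invented).

def legacy_epoch_time(dataset_gb, storage_gbps, ssd_gbps):
    """Legacy path: stage the data from network storage onto the
    on-board SSD, then read it from the SSD into the GPU."""
    stage = dataset_gb / storage_gbps  # copy into the SSD "cache"
    feed = dataset_gb / ssd_gbps       # serve the GPU from the SSD
    return stage + feed

def flashblade_epoch_time(dataset_gb, flash_gbps):
    """Cache-less path: the GPU reads straight from the parallel
    flash file system, so the staging step disappears entirely."""
    return dataset_gb / flash_gbps

# Made-up example: a 1 TB dataset, a 2 GB/s legacy array,
# a 4 GB/s on-board SSD, and a 4 GB/s parallel flash system.
legacy = legacy_epoch_time(1000, 2, 4)   # 500 s + 250 s = 750 s
direct = flashblade_epoch_time(1000, 4)  # 250 s
print(f"end-to-end speedup: {legacy / direct:.1f}x")  # 3.0x here
```

With those invented numbers, simply skipping the staging copy makes the direct path three times faster end to end, which is roughly the shape of the claimed improvement even though the real figure would depend entirely on the actual hardware.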
http://www.fujitsu.com/id/Images/8.3.3%20FAC2017Track3_Brent…
Here is what I deduced from taking the slides in the link in context. Start with the first one, of the DGX: it either comes equipped with an SSD, or without one when coupled with a FlashBlade network. Next consider the slide of the self-driving car company’s network, Arista to Pure to Nvidia (like a Tinker ETF). It shows the flow of data and how the DGX GPUs are in parallel with the Blade. Then look at the improvement charts. The Blade only slightly outperforms on a direct-connect benchmark, but taken as a whole, end to end, it skips the most time-consuming step in the training process. Imagine changing that self-driving car network into a legacy network with an SSD DGX: picture a conventional storage array, and then an SSD in the DGX. Also, if you re-read the article that started all the PSTG posts (I think):
https://blog.purestorage.com/ai-industry-needs-rethink-stora…
It again discusses how the Blade is created from the ground up to run in parallel with the GPU, and there is another graph showing many Blades in parallel with many DGXs. Combined with the other Pure Storage literature about FlashBlade, this is how I think it works.
Nvidia starts shipping their Pegasus and Xavier developer kits to their partners either this quarter or next. While those will have the car chips, the kits also contain Volta DGXs and pro visual equipment to train the neural networks and provide a means to test the system and train in simulators as well. So at least three of the big ones are using Pure in the equation, according to Pure; maybe more will choose this option.