PURE STORAGE. Have to read this

Thanks Tamhas, that puts the discussion in a perspective I can understand. Does Pure attach as a TCP/IP client?

What continues to bother me about the claims for Pure is that it is still networked, and physics tells me that no matter what you do at the end of the cable, and no matter how many parallel cables you provide, true random access is going to be dramatically slower than SSDs which are in the box with the processor (and immensely cheaper, as well). For huge volumes of poorly structured data this may not matter, but for relational databases it is almost certainly going to be a triumph of standard practice over performance.
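To make the physics worry concrete, here is a back-of-the-envelope sketch of the latency budget involved. Every number here is a generic ballpark assumption for illustration only, not a measurement of any vendor's product:

```python
# Ballpark read-latency budget in microseconds. These constants are
# illustrative assumptions only, not measurements of any vendor's product.
LOCAL_NVME_READ_US = 100.0    # NVMe SSD inside the server chassis
NETWORK_HOP_US = 50.0         # NIC + stack + one switch traversal, each way
ARRAY_INTERNAL_US = 100.0     # flash media + controller work inside the array

def networked_read_us(hops: int = 1) -> float:
    """Round-trip read from a networked array: the array's internal
    latency plus the wire/stack cost in both directions."""
    return ARRAY_INTERNAL_US + 2 * hops * NETWORK_HOP_US

def penalty_vs_local(hops: int = 1) -> float:
    """Ratio of networked read latency to an in-box NVMe read."""
    return networked_read_us(hops) / LOCAL_NVME_READ_US
```

Under these made-up numbers a single-hop networked read costs about twice an in-box read, and each extra hop widens the gap; whether that matters depends entirely on the workload, which is exactly the point being argued in this thread.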

I am a little confused about what this has to do with Pure. You are bothered, and yet every other business measure suggests the actual industry knows something more about its performance and value.

Their revenue is growing impressively, NTAP wants to compete and is actively converting its legacy systems, customers seem to be particularly pleased by the product, and the technology is improving.

So the obvious question is: how do the people actually in this very knowledgeable industry have it so wrong?

3 Likes

Pure, like NTNX, has grown faster than any company in storage industry history. Clearly both are doing something quite disruptive that is missing from the incumbents. With Nutanix we know only VMWare stands in their way. With Pure there seem to be more competitors, but to date none of them have been able to come close to what Pure offers. What the incumbents do have is large customer bases who will be reluctant to leave their vendors, and they will make their products just enough better to try to fight off the new benefits that a Pure brings.

The question becomes: can the incumbents become good enough to remove Pure’s obvious advantages?

Over time I think the value proposition for Pure will grow, and that, as with Arista, the technological limiters will be removed to the extent possible. Musk did this with Tesla; Arista has done this in SDN against one of the largest and meanest companies in the world (what, can’t Cisco just recreate its product? Yeah, and then have to lay off 50% of its sales force).

Can’t EMC or NTAP just redesign their entire business models? Answer that question yourselves.

Tinker

3 Likes

So the obvious question is: how do the people actually in this very knowledgeable industry have it so wrong?

Well, they have had it “wrong” in this particular sense for a long time. Various kinds of networked storage are sold to companies as universally good solutions, and while they may in fact provide cost and management savings for word processing, spreadsheets, and the like, they impose significant compromises for high transaction volume database applications. That doesn’t make it a bad product if one is knowledgeable about its characteristics and uses it accordingly, but it can mean that it is a bad product for some companies’ database applications.

In some cases, these characteristics don’t matter because the volume and critical speed of transactions do not require the highest performance. Or, as has been found by a number of companies using the old rotating-rust type of networked storage with RAID to cut costs and a cache to reduce apparent write time, they work OK during normal processing and then are terrible during an operation like restoring a backup, which swamps the cache. Pure seems unlikely to have that problem and so should work for a larger number of companies.

I don’t know that this is a factor which should retard the adoption of Pure to a meaningful degree … even if the industry were to actually wake up and pay attention to these issues, i.e., stop drinking its own koolaid. I have raised the issue because Pure is claiming latency figures which seem dubious to me unless they are measured within the box rather than from the other end of the cable. It doesn’t help that Pure is missing from sources like the SPC-1 benchmarks http://spcresults.org/ apparently because the benchmark rules do not allow some of the “tricks” which Pure uses to achieve its results. That doesn’t mean that those “tricks” don’t work in practice, but it does mean we don’t have a standardized comparison.

1 Like

I don’t know that this is a factor which should retard the adoption of Pure to a meaningful degree … even if the industry were to actually wake up and pay attention to these issues, i.e., stop drinking its own koolaid. I have raised the issue because Pure is claiming latency figures which seem dubious to me unless they are measured within the box rather than from the other end of the cable. It doesn’t help that Pure is missing from sources like the SPC-1 benchmarks http://spcresults.org/ apparently because the benchmark rules do not allow some of the “tricks” which Pure uses to achieve its results. That doesn’t mean that those “tricks” don’t work in practice, but it does mean we don’t have a standardized comparison.

Which again gets back to the original question: you question the investment and the benefits of the company’s technology. Are you the only one who gets this, while NTAP and all its customers are being duped, and all of Pure’s rapidly growing customer base is being duped as well?

Your “koolaid” comment seems to suggest that all those database experts are in fact idiots.

Can you state your exact expertise in this arena, what you have published on this matter, what companies you own that have dealt with this issue, and what specific software you have created that deals with this topic?

2 Likes

I am not questioning the investment, per se. I am questioning the performance claim.

And, yes, there are a lot of IT “experts” who are functioning as idiots because they don’t understand the implications of the technology they are dealing with. I have been quite clear that the technology does have the benefits claimed for it for some applications; it just does not have them for all applications, and it is important to understand whether it applies to what you personally want to do.

I have been in paid IT positions for 51 years … yes, including while in grad school … and have been an independent software vendor for 39 years, creating and maintaining an ERP application. On the technology of that software, I have given presentations and papers at conferences in multiple places in the US and Europe, and even a couple of times in Russia. My contacts in that community include the primary author of the database I use.

2 Likes

For anyone pondering the “what are we missing” question on Pure, here’s some background context I posted on NPI for anyone new to Pure or the storage sector…

http://discussion.fool.com/running-through-the-numbers-on-pstg-o…

Ant

3 Likes

One more response. Hope we’re not getting Pure fatigue yet.

Where does the 305% come from? Full disclosure: my tech credentials only extend as far as being my wife’s tech guy.

Here’s the best I can come up with, and it applies to the Nvidia DGX-1 case which the graphic comes from. For reference, here is a typical non-FlashBlade DGX-1 configuration:
https://devblogs.nvidia.com/wp-content/uploads/2017/04/image…

The streaming cache, in the form of 4x SSD, is the topic.

The SSD is on board the supercomputer. When you plug the DGX into a typical DL training storage network, the data needs to go from the storage to the SSD to the GPU for training. The data or images need to be “cached” to the SSD in order to be processed. When engineers download their images of road signs or whatever they have, those go to the network storage first. They wouldn’t download them directly onto the onboard SSD; that data would get written over when another training task is assigned.

Here is how it appears FlashBlade makes a difference. It is built from the ground up to be parallel DL/AI storage. While it is flash, it also has a unique architecture and software overlay; they describe it as being “cache-less”. The Network File System (NFS), through this software, exists in parallel with the DGX-1 and eliminates the onboard SSD altogether. The GPU has instant access to the data you want to feed it, so there is no need to “on-board” the data first so it can be fed by the SSD. That is why it saves “end to end” time and also eliminates the “latency” issue. Many FlashBlades in a chassis can be linked to many DGX-1s, all in parallel, and I imagine they can do some pretty amazing things. They could, for instance, crank out trained neural networks onto Pegasus chips to be placed into self-driving cars at scale. Correct me if I’m wrong, but each neural network on each Drive system has to be trained to operate independently of a cloud; you don’t just train one and then cut and paste it into the next computer. Given that I don’t have a reason to believe Pure is lying, the improvement they achieve reaches 305% end to end. If everything in this paragraph is true, then it appears FlashBlade might be something revolutionary and very important to the DL equation.
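The staging step described above can be sketched in a few lines. The paths and function names here are hypothetical, purely to contrast the conventional copy-to-local-SSD pipeline with a “cache-less” read-in-place pipeline:

```python
import shutil
from pathlib import Path

def stage_to_local_ssd(network_store: Path, local_cache: Path) -> Path:
    """Conventional pipeline: copy the dataset from shared storage onto
    the DGX's on-board SSD cache before training. The cache is scratch
    space, so a previous job's data simply gets overwritten."""
    if local_cache.exists():
        shutil.rmtree(local_cache)          # evict the previous job's data
    shutil.copytree(network_store, local_cache)
    return local_cache                      # training then reads the local copy

def read_in_place(network_store: Path) -> Path:
    """'Cache-less' pipeline: training reads straight from the parallel
    file system, skipping the staging copy and its wall-clock cost."""
    return network_store
```

The claimed end-to-end savings would come from eliminating the `stage_to_local_ssd` step entirely, not from any single read being faster.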

http://www.fujitsu.com/id/Images/8.3.3%20FAC2017Track3_Brent…
This is what I deduced from taking the slides from the link in context. Start with the first one of the DGX. It either comes equipped with SSD, or without when coupled with a FlashBlade network. Next consider the slide of the self-driving car company network: Arista to Pure to Nvidia (like a Tinker ETF). It shows the flow of data and how the DGX GPUs are in parallel with the Blade. Then look at the improvement charts. The Blade slightly outperforms on a direct-connect benchmark, but when taken as a whole end to end it skips the most time-consuming step in a training process. Imagine changing that self-driving car network into a legacy network with an SSD DGX, and picture a conventional storage array and then SSD in the DGX. Also, reread the article that started all the PSTG posts (I think):
https://blog.purestorage.com/ai-industry-needs-rethink-stora…

It again discusses how the Blade is created from the ground up to run in parallel with the GPU. And there is another graph showing many Blades in parallel with many DGXs. Combined with the multiple other pieces of literature by Pure Storage about FlashBlade, this is how I think it works.

Nvidia starts shipping their Pegasus and Xavier developer kits to their partners either this quarter or next. While those will have the car chips, the kits also contain Volta DGXs and pro visual equipment to train the neural networks and provide a means to test the system and train in simulators as well. So at least three of the big ones are using Pure in the equation, according to Pure; maybe there will be more who choose this option.

3 Likes

Your “koolaid” comment seems to suggest that all those database experts are in fact idiots.

What it suggests to me, as a retired database guy, is that decisions on hardware are largely made by management, and management often is far more familiar with vendor salesmen and dog-and-pony shows than actual performance details and requirements.

(There was a period when IT management where I worked was so under the influence of IBM that a small fortune was spent networking our buildings for the first time… with Token Ring rather than Ethernet.)

1 Like

What it suggests to me, as a retired database guy, is that decisions on hardware are largely made by management, and management often is far more familiar with vendor salesmen and dog-and-pony shows than actual performance details and requirements.

AMEN! Sometimes aided and abetted by corporate IT types who regard mere DBAs who have to take care of [mission critical] DB applications as lesser beings!

they may in fact provide cost and management savings for word processing, spreadsheets, and the like, they impose significant compromises for high transaction volume database applications.

Tamhas, I finally understand why I couldn’t understand what you were talking about. Where I worked, these environments were entirely separate. All our office applications, like MS-Office, email, SharePoint, etc., were on a Windows Server network. Separate network, separate servers, separate everything. The high transaction volume database applications were on a UNIX server network. We hosted Oracle, SQL Server, and DB2 DBMSs on the UNIX platform. We even had some IMS running, but I don’t honestly know which network it ran on.

In any case, it never occurred to me that some companies would run office products and mainline applications on the same network.

But irrespective of that, here’s the way I see it. I’ll grant you that there might be some situations in which flash memory provides little overall, end-to-end, round-trip performance improvement. When a company experiences one of those situations, it will have two choices. It can continue with its current setup, in which case it won’t be a Pure customer. Or it can change its IT architecture in order to avail itself of the benefits of PSTG products. If you have read this thread, you will be aware that performance is never the only issue under consideration when making the purchasing decision. In fact, most of the time it’s down the list from ROI.

2 Likes

Will it not require EMC or NTAP or HP to re-architect their hardware and software to create the modularity and simplicity of what PSTG offers?

It sounds like you’re assuming that NetApp only offers spinning-disk storage systems. If you want to fully see the flash storage landscape for what it is from a competitive standpoint, you should be sure to look into the SolidFire line of NetApp products: https://www.netapp.com/us/products/storage-systems/all-flash…. Worth also noting that SolidFire is far from the only flash offering from NetApp: https://www.netapp.com/us/products/storage-systems/all-flash….

The SolidFire line appears to have a zero-effort installation process very similar to PSTG’s, better performance in many cases (I’ve heard repeatedly that SolidFire wins against a Pure system in every customer POC or sales match-up they’ve participated in so far, but certainly don’t take that as gospel), and a mature support ecosystem behind it, including “replacing flash for free” on any supported system. Typically, that replacement would happen (disk shipped) before the customer even knows there’s a problem.

Nothing for or against either of these vendors (and again, I’m still watching for an entry point into PSTG myself), but make sure you see the whole picture before assuming that one company has a fully unique offering or a competitive moat.

6 Likes

Pure Storage is now a Microsoft Gold Partner.

https://twitter.com/8arkz/status/960933064787468288?ref_src=…

3 Likes

In any case, it never occurred to me that some companies would run office products and mainline applications on the same network.

Sadly, it happens a lot. I can’t speak to the actual frequency, since it only ever comes to my attention when someone has done the wrong thing and is trying to figure out how to fix it, but it is certainly common enough. Often it seems like a case of management believing the sales people, but sometimes there are even old-time IT people in the mix, not paying attention to the DBA who manages the high transaction DB.

Hlygrail,

I would appreciate your comment on this comparison between Network Appliance and Pure: https://blog.purestorage.com/contrasting-pure-netapps-philos…

This is of course from a Pure blog, and it makes a compelling case (one example of multiple) as to why Pure is the better solution. The simplicity exemplified here vs. Network Appliance, while both are said to be doing the same thing, I thought was profound.

I have no industry expertise, however, and frankly I find not having industry expertise to be an advantage in investing, as you look at multiple other factors.

I would be interested in your comments regarding this comparison.

Thanks.

Tinker

3 Likes

I will summarize the thoughts I shared offline w/ Tinker on the Pure blog (https://blog.purestorage.com/contrasting-pure-netapps-philos…), with full knowledge it will probably spark more “nuh uh!!!” comments from certain vectors… and that’s fine. I have many years under my belt in this industry, much of it on the software development side, so take that for whatever it’s worth.

I read the Pure blog a few days ago from this same thread. They make a “compelling” argument – if you completely ignore some other facts:

  • Some (I’d argue many) customers actually want the flexibility to turn features on and off. Blanket-enabling them (such as deduplication) may be a marketing win for Pure Storage, but it isn’t necessarily or automatically a good thing, unless /all/ you care about is price per gigabyte and being able to thin-provision hundreds or thousands of clients with “fake” (it’s not actually there, but the client doesn’t know better) storage, and that’s a smaller niche of the marketplace.
  • There is a cost for EVERY software feature or function, whether it’s performance or bandwidth or CPU cycles or operations/second or latency or whatever. Pure may be over-engineering their systems to account for “always-on”. I have some friends and even some ex-employees that work there, and I know more than that, but I don’t need to throw any more gasoline that way.
  • I find it very interesting that Pure compares their system to NetApp’s FAS line of products (which may have SATA, FC or SSD storage – which did they compare, or did they just guess or assume?), completely ignoring and not even mentioning the actual counterparts: NetApp SolidFire, or even the AFA and EF-Series all-flash systems. That’s Classic FUD/Spin Tactics 101, but hey, that’s what marketing folks get paid to do.
  • Pure’s systems don’t have any SPC-1 performance specs – last I looked the SPC folks declined to test because Pure does some “tricks” to fake their numbers (their statement, not mine). Decide for yourself whether that matters and figure out who’s ranked where, but it’s an industry standard that all the mainline storage vendors measure against.

In short, that blog article is at least partially disingenuous. If you compare an apple to a grape and then point a finger at the grape grower because it doesn’t taste like all the apples… maybe the comparison was broken at the start?

For my $.02, “always-on” deduplication seems to be the Apple model – everything is a locked ecosystem, and you can’t control anything, while Apple controls everything. Oh, and you probably pay a premium for that level of “care.” It seems popular (or maybe we just have lots of followers in the world?). But I would always rather have the option to enable or disable things at my discretion, whether it’s personal or professional.

Disclosures again: I am a current AAPL shareholder, work in the storage industry and probably own one or more of those types of stocks, and am also watching for an entry point to own PSTG…

18 Likes

Hlygrail, regarding your posts about PSTG, all the facts you say investors are ignoring, the ‘using tricks to fake their numbers’ comment, and then saying you’re looking for an entry point: what entry point would be sufficient for you to buy PSTG?

1 Like

Sorry I haven’t responded sooner - I got this thing called a Day Job.

Anyway, here’s some more thoughts and responses, not just to Tinker.

  1. An instruction sheet that fits on a business card, and in fact is on a business card.
    I saw the card (https://www.purestorage.com/content/dam/purestorage/new-for-… ) and that’s 3 commands for creating volumes, creating hosts, and connecting them. Just like anyone else. What’s missing is actually integrating the array into your backup or whatever application.
    For instance, here’s how to set things up for VMs: https://blog.purestorage.com/assigning-a-vvol-vm-storage-pol… Doesn’t read so simple. And here (https://blog.purestorage.com/announcing-the-new-pure-storage… ) is a link to the 216 cmdlets for integrating into Windows PowerShell. No business-card instructions here.

  2. No one comes close to the duplex ratios that PSTG does.

What’s a “duplex ratio”? Do you mean “de-duplication compression ratio”? If so, Pure isn’t really any better than Data Domain (bought by EMC in 2009). Data Domain pioneered inline de-duplication and successfully sued Pure back in 2013–2016 for patent infringement. Pure also hired a lot of ex-Data Domain employees (Data Domain’s ex-CEO is on the board at Pure, btw), but my inside view is that that was EMC’s fault for not moving fast enough, technology-wise.

  3. There are other examples, such as with EMC, who use marketing terms to exaggerate attributes of their products that create similar unnecessary complexities.

Well, after bashing EMC for thin provisioning compression claims (https://blog.purestorage.com/evergreen-storage-vs-dell-emc-f… where they say “thin provisioning and snapshots aren’t useful when comparing data reducing arrays”), Pure itself now includes thin provisioning front and center (https://www.purestorage.com/products/purity/purity-reduce.ht… ).

In case you’re curious, thin provisioning is simply allocating the same unused space to multiple uses/users. Imagine you have a 100 GB disk and 3 users. Those 3 users today only use, say, 25 GB each, so you have 25 GB actually free. But imagine if you tell each of those 3 users that they have 50 GB available. As long as all 3 users don’t take you up on that, you’re golden. It’s what Southwest Airlines has done (and may still be doing) with airline seats. They know that not everyone who purchased a ticket on a flight will actually show up, so they have actually sold more tickets than they have seats. As long as they can predict cancellations well, everyone who wants a seat gets one and they have fewer empty seats.
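The seat-overbooking analogy maps directly onto a few lines of arithmetic. This is just a toy model of the example above, not any vendor's allocation logic:

```python
def overcommit_ratio(physical_gb: float, promised_gb_per_user: float,
                     users: int) -> float:
    """How much more space has been promised than physically exists."""
    return (promised_gb_per_user * users) / physical_gb

def promises_hold(physical_gb: float, actual_usage_gb: list) -> bool:
    """The thin-provisioning bet holds only while real usage stays
    under the physical capacity."""
    return sum(actual_usage_gb) <= physical_gb
```

With the 100 GB disk above, promising 50 GB to each of 3 users is a 1.5x overcommit; usage of 25 GB each fits comfortably, but if all three cash in their full promise the array, like an overbooked flight, cannot deliver.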

  4. Darthtaco quotes Pure’s CEO: “But the important thing to realize is that flash doesn’t actually have to be cheaper than magnetic disk for the full transition to happen — it just has to be cheap enough for folks to be able to justify the conversion in terms of saving power, space, cooling, and manpower, and for folks to understand the business upside in moving to flash. … That ROI equation already works for the vast majority of important business applications today, and we think the pivot for unstructured data applications will happen much faster than people anticipate.”

This is a strong argument FOR Pure. Remember, dedupe on flash doesn’t give any better compression than dedupe on spinning disks. Flash is faster, but when you’re doing the compute-intensive process of de-duping (or reconstruction), disk speed will often not be the limiting factor (CPU and memory speed will be). Flash consumes less power, so you get some power and air-conditioning cost advantages. And space matters, since data centers are a fixed size and are expensive to expand. But are those truly compelling arguments? Certainly not for switching, but perhaps for new setups.

I find it interesting that he sees “unstructured data applications” as being the first mover to flash. What would be some specific examples of these?

  5. Flash is better than disks.
    OK, sure, but everyone has some kind of flash solution. How does Pure’s architecture take advantage of flash in ways that traditional disk-based systems can’t? In the end, it’s still racks of a certain form factor of storage, right? I know EMC used to charge extraordinary amounts of money for the EMC-specific racks needed for their systems. I always thought that was a bad practice, leveraging the ROI by literally screwing the customer on what the real costs were. But then, EMC never really got the software value model and probably felt customers would always value a piece of metal more than software.

And on the “flash is faster” sub-argument, remember that Pure has deduplication always on. That requires processing power and will definitely slow things down. In some of the reading I did, it seemed that maybe Pure wasn’t doing de-dupe inline (meaning the de-duplication is done as data comes in and is written to storage) and instead was doing post-processing (meaning that uncompressed data is written first and the de-dupe is done in a separate process, hopefully taking advantage of the typical setup where data isn’t being written to the device all the time). It could be that Pure is playing some pretty cool tricks here, but as tamhas and hlygrail have pointed out, the simplicity of always doing de-dupe may not be the best in terms of overall performance. I don’t know enough to tell at this point in time.
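For readers unfamiliar with the inline-versus-post-process distinction, here is a minimal sketch of inline block deduplication. Real arrays use far more sophisticated fingerprinting and metadata, so treat this purely as an illustration of why dedupe costs CPU on the write path:

```python
import hashlib

class InlineDedupStore:
    """Toy inline dedupe: every incoming block is fingerprinted *before*
    it is written, and duplicate payloads are stored only once. A
    post-process design would instead write raw blocks first and scan
    for duplicates later, off the latency-critical write path."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}   # fingerprint -> unique block payload
        self.index = []    # logical layout: ordered list of fingerprints

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()  # CPU cost paid inline
            self.blocks.setdefault(fp, block)       # payload stored once
            self.index.append(fp)

    def read(self) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.index)

    def dedupe_ratio(self) -> float:
        logical = sum(len(self.blocks[fp]) for fp in self.index)
        physical = sum(len(b) for b in self.blocks.values())
        return logical / physical if physical else 1.0
```

Writing the same block three times yields a 3:1 ratio here; the trade-off the posts above debate is that the hashing happens on every single write, whether or not the data actually deduplicates.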

If I were to postulate a reason, it would be that since Flash is more expensive, always using de-dupe and standard compression is a must in order for Pure systems to compete on stored byte for byte on price. If they can get away with that by having really good performance, then it’s a fine thing. So maybe some day Pure will let you turn off de-dupe if you want really high performance and don’t mind buying more storage to store the same data.

The storage industry is pretty staid. It was just a decade ago that the migration from tape to disk was revolutionary, and now the next migration is from spinning disk to flash. And it’s true that companies don’t want to have to hire Backup Administrators and Storage Administrators, etc., and would rather hire fewer and less specialized people, so simplicity is a strong selling point. I do find it interesting, though, that even today Pure Storage is command-line based. Storage solutions should be GUI based, and I mean more than just dashboards to display status. For instance, does anyone here actually (or actually want to) configure their home router with a CLI (Command Line Interface)? We all use GUIs in browsers to configure and manage them. The same should be true of even large enterprise storage.

But now I’m on a soapbox, so I’ll stop.

I still don’t know how to predict Pure’s success moving forward. I can say that the aforementioned Frank Slootman (who went from CEO of Data Domain, selling it to EMC, to CEO of ServiceNow, but is moving on soon), who is on the board at Pure, is a super smart businessman who also knows the storage world. But I don’t know enough to know whether Pure’s advantages are important enough, to enough use cases at enough businesses, for them to grow.

36 Likes

Sorry I haven’t responded sooner - I got this thing called a Day Job.
That’s so 20th century - you gotta sort that out.
Ant
(Sorry - it’s been a long day.)

1 Like

Since my horizon for all but a few of my dozen stocks is between now and the stock fall prior to the next recession, I am more concerned about what Pure can do in the next couple of years. Or, more accurately, the perception of what Pure can do.

If flash has as many advantages for commercial users as it does for my PC, I expect conversion to be rapid. A rising tide lifting all ships? At least for a few years?
To restate your quote: flash does not have to be perfect, just better and cheaper than what we have today.

I do own some PSTG but it is not the kind of stock I would be willing to hold through a recession.

1 Like