PURE STORAGE. Have to read this

One more response. Hope we’re not getting Pure fatigue yet.

Where does the 305% come from? Disclosure: my tech credentials only extend as far as being my wife’s tech guy.

Here’s the best I can come up with, and it applies to the Nvidia DGX-1 case the graphic comes from. For reference, here is a typical non-FlashBlade DGX-1 configuration:
https://devblogs.nvidia.com/wp-content/uploads/2017/04/image…

The streaming cache, in the form of 4x SSD, is the topic.

The SSD is on board the supercomputer. When you plug the DGX into a typical storage network for DL training, the data needs to go from the storage to the SSD to the GPU. The data or images need to be “cached” on the SSD in order to be processed. When engineers download their images of road signs or whatever they have, those go to network storage first. They wouldn’t download them directly onto the onboard SSD; that data would get written over when another training task is assigned.
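
To make the staging cost concrete, here’s a toy back-of-envelope model of the two data paths. Every number in it (dataset size, copy speed, training time) is a made-up illustration, not a Pure or Nvidia figure:

```python
# Hypothetical model of the two data paths described above.
# All numbers are illustrative assumptions, not vendor figures.

def staged_time(dataset_gb, copy_gb_per_s, train_hours):
    """Legacy path: copy the dataset from network storage to the local SSD, then train."""
    copy_hours = dataset_gb / copy_gb_per_s / 3600
    return copy_hours + train_hours

def direct_time(train_hours):
    """FlashBlade-style path: GPUs read the parallel NFS store directly, no staging copy."""
    return train_hours

# Example: a 5 TB image set, a 1 GB/s staging copy, and a 4-hour training run.
staged = staged_time(5000, 1.0, 4.0)   # staging adds roughly 1.4 hours
direct = direct_time(4.0)
print(f"staged={staged:.2f}h direct={direct:.2f}h")
```

Even in this crude sketch the staging copy adds over an hour to every run; whether the real end-to-end gap reaches 305% depends on numbers only Pure and Nvidia have.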

Here is how it appears FlashBlade makes a difference. It is built from the ground up to be parallel DL/AI storage. While it is flash, it also has an entirely unique architecture and software overlay. They describe it as being “cache-less”. The Network File System (NFS), through this software, exists in parallel with the DGX-1 and eliminates the onboard SSD altogether. The GPU has instant access to the data you want to feed it, so there is no need to “on board” the data first so it can be fed by the SSD. That is why it saves “end to end” time and also eliminates the “latency” issue.

Many FlashBlades in a chassis can be linked to many DGX-1s, all in parallel, and I imagine can do some pretty amazing things. They could, for instance, crank out trained neural networks onto Pegasus chips to be placed into self-driving cars at scale. Correct me if I’m wrong, but each neural network on each Drive system has to be trained to operate independently of a cloud. You don’t just train one and then cut and paste it into the next computer.

Given that I don’t have a reason to believe Pure is lying, the improvement they achieve reaches 305% over the whole end-to-end process. If everything in this paragraph is true, then FlashBlade might be something revolutionary and very important to the DL equation.

http://www.fujitsu.com/id/Images/8.3.3%20FAC2017Track3_Brent…
This is what I deduced from taking the slides in the link in context. Start with the first one of the DGX. It either comes equipped with SSD, or without when coupled with a FlashBlade network. Next consider the slide of the self-driving car company network: Arista to Pure to Nvidia (like a Tinker ETF). It shows the flow of data and how the DGX GPUs are in parallel with the Blade. Then look at the improvement charts. The Blade only slightly outperforms on a direct-connect benchmark, but taken as a whole end to end it skips the most time-consuming step in the training process. Imagine changing that self-driving car network into a legacy network with an SSD DGX: picture a conventional storage array and then the SSD in the DGX. Also re-read the article that started all the PSTG posts (I think):
https://blog.purestorage.com/ai-industry-needs-rethink-stora…

It again discusses how the Blade is created from the ground up to run in parallel to the GPU. And there is another graph showing many Blades in parallel with many DGXs. Combined with multiple other literature by pure storage about FlashBlade this is how I think it works.

Nvidia starts shipping their Pegasus and Xavier developer kits to their partners either this quarter or next. While those will have the car chips, the kits also contain Volta DGXs and pro visual equipment to train the neural networks and provide a means to test the system and train in simulators as well. So, according to Pure, at least three of the big ones are using Pure in the equation; maybe there will be more who choose this option.

3 Likes

Your “koolaid” comment seems to suggest that all those database experts are in fact idiots.

What it suggests to me, as a retired database guy, is that decisions on hardware are largely made by management, and management often is far more familiar with vendor salesmen and dog-and-pony shows than actual performance details and requirements.

(There was a period when IT management where I worked was so under the influence of IBM that a small fortune was spent networking our buildings for the first time… with Token Ring rather than Ethernet.)

1 Like

What it suggests to me, as a retired database guy, is that decisions on hardware are largely made by management, and management often is far more familiar with vendor salesmen and dog-and-pony shows than actual performance details and requirements.

AMEN! Sometimes aided and abetted by corporate IT types who regard mere DBAs who have to take care of [mission critical] DB applications as lesser beings!

While they may in fact provide cost and management savings for word processing, spreadsheets, and the like, they provide significant compromises for high-transaction-volume database applications.

Tamhas, I finally understand why I couldn’t understand what you were talking about. Where I worked, these environments were entirely separate. All our office applications like MS Office, email, SharePoint, etc. were on a Windows Server network: separate network, separate servers, separate everything. The high-transaction-volume database applications were on a UNIX server network. We hosted Oracle, SQL Server, and DB2 DBMSs on the UNIX platform. We even had some IMS running, but I honestly don’t know which network it ran against.

In any case, it never occurred to me that some companies would run office products and mainline applications on the same network.

But irrespective of that, here’s the way I see it. I’ll grant you that there might be some situations in which flash memory provides little overall, end-to-end, round-trip performance improvement. When a company experiences one of those situations, they will have two choices. They can continue with their current setup, in which case they won’t be a Pure customer. Or they can change their IT architecture in order to avail themselves of the benefits of PSTG products. If you have read this thread, you will be aware that performance is never the only issue under consideration when making the purchasing decision. In fact, most of the time it’s down the list from ROI.

2 Likes

Will it not require EMC or NTAP or HP to re-architect their hardware and software to create the modularity and simplicity of what PSTG offers?

It sounds like you’re assuming that NetApp only offers spinning-disk storage systems. If you want to fully see the flash storage landscape for what it is from a competitive standpoint, you should be sure to look into the SolidFire line of NetApp products: https://www.netapp.com/us/products/storage-systems/all-flash…. Worth also noting that SolidFire is far from the only flash offering from NetApp: https://www.netapp.com/us/products/storage-systems/all-flash….

The SolidFire line appears to have a very similar zero-effort installation process to PSTG’s, better performance in many cases (I’ve heard repeatedly that SolidFire wins against a Pure system in every customer POC or sales match-up they’ve participated in so far, but certainly don’t take that as gospel), and a mature support ecosystem behind it, including “replacing flash for free” on any supported system. Typically, that replacement would happen (disk shipped) before the customer even knows there’s a problem.

Nothing for or against either of these vendors (and again, I’m still watching for an entry point into PSTG myself), but make sure you see the whole picture before assuming that one company has a fully unique offering or a competitive moat.

6 Likes

Pure Storage is now a Microsoft Gold Partner.

https://twitter.com/8arkz/status/960933064787468288?ref_src=…

3 Likes

In any case, it never occurred to me that some companies would run office products and mainline applications on the same network.

Sadly, it happens a lot. I can’t speak to the actual frequency since it only ever comes to attention when someone has done the wrong thing and is trying to figure out how to fix it, but it is certainly common enough. Often it seems like a case of management believing the sales people, but sometimes there are even old time IT people in the mix, not paying attention to the DBA who manages the high transaction DB.

Hlygrail,

I would appreciate your comment on this comparison between Network Appliance and Pure: https://blog.purestorage.com/contrasting-pure-netapps-philos…

This is of course from a Pure blog, and it makes a compelling case (one example of multiple) as to why Pure is the better solution. The simplicity exampled here vs. Network Appliance, while both are said to be doing the same thing, struck me as profound.

I have no industry expertise however, and frankly I find not having industry expertise to be an advantage in investing as you look at multiple other factors.

I would be interested in your comments regarding this comparison.

Thanks.

Tinker

3 Likes

I will summarize the thoughts I shared offline w/ Tinker on the Pure blog (https://blog.purestorage.com/contrasting-pure-netapps-philos…), with full knowledge it will probably spark more “nuh uh!!!” comments from certain vectors… and that’s fine. I have many years under my belt in this industry, much of it on the software development side, so take that for whatever it’s worth.

I read the Pure blog a few days ago from this same thread. They make a “compelling” argument – if you completely ignore some other facts:

  • Some (I’d argue many) customers actually want the flexibility to turn features on and off. Blanket-enabling them (such as deduplication) may be a marketing win for Pure Storage, but it isn’t necessarily or automatically a good thing, unless /all/ you care about is price per gigabyte and being able to thin-provision hundreds or thousands of clients with “fake” (it’s not actually there, but the client doesn’t know better) storage, and that is a smaller niche of the marketplace.
  • There is a cost for EVERY software feature or function, whether it’s performance or bandwidth or CPU cycles or operations/second or latency or whatever. Pure may be over-engineering their systems to account for “always-on”. I have some friends and even some ex-employees that work there, and I know more than that, but I don’t need to throw any more gasoline that way.
  • I find it very interesting that Pure compares their system to NetApp’s FAS line of products (which may have SATA, FC or SSD storage; which did they compare, or did they just guess or assume?), completely ignoring and not even mentioning the actual counterparts: NetApp SolidFire, or even the AFA and EF-Series all-flash systems. That’s Classic FUD/Spin Tactics 101, but hey, that’s what Marketing folks get paid to do.
  • Pure’s systems don’t have any SPC-1 performance specs – last I looked the SPC folks declined to test because Pure does some “tricks” to fake their numbers (their statement, not mine). Decide for yourself whether that matters and figure out who’s ranked where, but it’s an industry standard that all the mainline storage vendors measure against.

In short, that blog article is at least partially disingenuous. If you compare an apple to a grape and then point a finger at the grape grower because it doesn’t taste like all the apples… maybe the comparison was broken at the start?

For my $.02, “always-on” deduplication seems to be the Apple model – everything is a locked ecosystem, and you can’t control anything, while Apple controls everything. Oh, and you probably pay a premium for that level of “care.” It seems popular (or maybe we just have lots of followers in the world?). But I would always rather have the option to enable or disable things at my discretion, whether it’s personal or professional.

Disclosures again: I am a current AAPL shareholder, work in the storage industry and probably own one or more of those types of stocks, and am also watching for an entry point to own PSTG…

18 Likes

Hlygrail, regarding your posts about PSTG, all the facts you say investors are ignoring, the ‘using tricks to fake their numbers’ comment, and then saying you’re looking for an entry point: what entry point would be sufficient for you to buy PSTG?

1 Like

Sorry I haven’t responded sooner - I got this thing called a Day Job.

Anyway, here’s some more thoughts and responses, not just to Tinker.

  1. instruction sheet that could fit on a business card, and in fact is on a business card.
    I saw the card (https://www.purestorage.com/content/dam/purestorage/new-for-… ) and that’s 3 commands for creating volumes, hosts, and connecting them. Just like anyone else. What’s missing is actually integrating the array into your backup or whatever application.
    For instance, here’s how to set things up for VMs: https://blog.purestorage.com/assigning-a-vvol-vm-storage-pol… Doesn’t read so simple. And here (https://blog.purestorage.com/announcing-the-new-pure-storage… ) is a link to the 216 cmdlets for integrating into Windows Powershell. No business card instructions here.

  2. No one comes close to the duplex ratios that PSTG does.

What’s a “duplex ratio”? Do you mean “de-duplication compression ratio”? If so, Pure isn’t really any better than Data Domain (bought by EMC in 2009). Data Domain pioneered inline de-duplication and successfully sued Pure back in 2013-2016 for patent infringement. Pure also hired a lot of ex-Data Domain employees (the Data Domain ex-CEO is on the board at Pure, btw), but my inside view is that was EMC’s fault for not moving fast enough, technology-wise.

  3. There are other examples, such as with EMC, who use marketing terms to exaggerate attributes of their products and create similar unnecessary complexities.

Well, after bashing EMC for thin provisioning compression claims (https://blog.purestorage.com/evergreen-storage-vs-dell-emc-f… where they say: thin provisioning and snapshots aren’t useful when comparing data reducing arrays.), Pure itself now includes thin provisioning front and center (https://www.purestorage.com/products/purity/purity-reduce.ht… )

In case you’re curious, thin provisioning is simply allocating the same unused space to multiple uses/users. Imagine you have a 100 GB disk and 3 users. Those 3 users today only use, say, 25 GB each, so you have 25 GB available. But imagine if you tell each of those 3 users that they have 50 GB available. As long as all 3 users don’t take you up on that, you’re golden. It’s what Southwest Airlines has done (and may still be doing) with airline seats. They know that not everyone who purchased a ticket will actually show up for the flight, so they sell more tickets than they have seats. As long as they can predict cancellations well, everyone who wants a seat gets one and they have fewer empty seats.
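
The bookkeeping behind that bet can be sketched in a few lines. This is purely illustrative; it is not how Purity (or anyone’s array) actually implements it:

```python
# Minimal sketch of thin-provisioning bookkeeping (hypothetical, for illustration).
# Each user is *promised* more than physically exists; only actual writes consume
# real capacity, and the array has to watch its overcommit ratio.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.provisioned_gb = 0   # total capacity promised to users
        self.used_gb = 0          # capacity actually written

    def provision(self, gb):
        self.provisioned_gb += gb  # no physical space is reserved yet

    def write(self, gb):
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: overcommit bet lost")
        self.used_gb += gb

    @property
    def overcommit(self):
        return self.provisioned_gb / self.physical_gb

pool = ThinPool(physical_gb=100)
for _ in range(3):
    pool.provision(50)   # three users, each promised 50 GB
    pool.write(25)       # but each only writes 25 GB
print(pool.overcommit, pool.used_gb)  # 1.5 1.5x overcommitted, 75 GB actually used
```

Like the airline, the array wins as long as its prediction holds; if all three users write their full 50 GB at once, the `write` guard fires and someone gets the storage equivalent of being bumped from the flight.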

  4. Darthtaco quotes Pure’s CEO: “But the important thing to realize is that flash doesn’t actually have to be cheaper than magnetic disk for the full transition to happen — it just has to be cheap enough for folks to be able to justify the conversion in terms of saving power, space, cooling, and manpower, and for folks to understand the business upside in moving to flash. … That ROI equation already works for the vast majority of important business applications today, and we think the pivot for unstructured data applications will happen much faster than people anticipate.”

This is a strong argument FOR Pure. Remember, dedupe on flash doesn’t give any better compression than dedupe on spinning disks. Flash is faster, but when you’re doing the compute intensive process of de-duping (or reconstruction), then disk speed will often not be the limiting factor (CPU and memory speed would be). Flash consumes less power, so you get some power and air conditioning cost advantages. And space matters since data centers are a fixed size and are expensive to expand. But, are those truly compelling arguments? Certainly not for switching, but perhaps for new setups.

I find it interesting that he sees “unstructured data applications” as being the first mover to flash. What would be some specific examples of these?

  5. Flash is better than Disks
    OK, sure, but everyone has some kind of flash solution. How does Pure’s architecture take advantage of flash in ways that traditional disk-based systems can’t? In the end, it’s still racks of a certain form factor of storage, right? I know EMC used to charge extraordinary amounts of money for the EMC-specific racks needed for their systems. I always thought that was a bad practice, leveraging the ROI by literally screwing the customer on what the real costs were. But then, EMC never really got the software value model and probably felt customers would always value a piece of metal more than software.

And on the Flash is Faster sub-argument, remember that Pure has deduplication always on. That requires processing power and will definitely slow things down. In some of the reading I did, it seemed that maybe Pure wasn’t doing de-dupe inline (meaning the de-duplication is done as data comes in and is written to storage) and instead was doing post-processing (meaning that uncompressed data was written and then the de-dupe was being done in a separate process, hopefully taking advantage of the typical setup where data isn’t being written to the device all the time). It could be Pure is playing some pretty cool tricks here, but as tamhas and hlygrail have pointed out, the simplicity of always doing de-dupe may not be the best in terms of overall performance. I don’t know enough to tell at this point in time.

If I were to postulate a reason, it would be that since Flash is more expensive, always using de-dupe and standard compression is a must in order for Pure systems to compete on stored byte for byte on price. If they can get away with that by having really good performance, then it’s a fine thing. So maybe some day Pure will let you turn off de-dupe if you want really high performance and don’t mind buying more storage to store the same data.

The storage industry is pretty staid. It was just a decade ago that the migration from tape to disk was revolutionary, and now the next migration is from spinning disk to flash. And it’s true that companies don’t want to have to hire Backup Administrators and Storage Administrators, etc., and would rather hire fewer and less specialized people, so simplicity is a strong selling point. I do find it interesting, though, that even today Pure Storage is command-line based. Storage solutions should be GUI based, and I mean more than just dashboards to display status. For instance, does anyone here actually (or actually want to) configure their home router with a CLI (Command Line Interface)? We all use GUIs in browsers to configure and manage them. The same should be true of even large enterprise storage.

But now I’m on a soapbox, so I’ll stop.

I still don’t know how to predict Pure’s success moving forward. I can say that the aforementioned Frank Slootman (who went from CEO of Data Domain, selling it to EMC, to CEO of Service Now, but is moving on soon), who is on the board at Pure is a super smart businessman who also knows the storage world. But, I don’t know enough to know that Pure’s advantages are important enough to enough use cases at enough businesses for them to grow.

36 Likes

Sorry I haven’t responded sooner - I got this thing called a Day Job.
That’s so 20th century - you gotta sort that out.
Ant
(Sorry - it’s been a long day.)

1 Like

Since my horizon for all but a few of my dozen stocks is between now and the stock fall prior to the next recession, I am more concerned about what Pure can do in the next couple of years. Or more accurately the perception of what PURE can do.

If flash has as many advantages for commercial users as it does for my PC I expect conversion to be rapid. A rising tide lifting all ships? At least for a few years?
To restate your quote: flash does not have to be perfect, just better and cheaper than what we have today.

I do own some PSTG but it is not the kind of stock I would be willing to hold through a recession.

1 Like

Good info, Smorg and Hlygrail. Different perspectives. And yes, Pure’s bread and butter has been small and mid-size business, but Pure has also moved into 25% of the Fortune 500 (which is still pretty small penetration). Clearly from our discussion, Pure’s software is still developing its sophistication.

On the other hand, although that keeps Pure out of some accounts or some projects, the simplicity also makes Pure a much better purchase in far more accounts than it is left out of. This is why Pure has grown to $1 billion in revenues faster than any storage company in history (although Nutanix might debate it; not sure where to put those two, as they are not quite apples and oranges).

What Pure will argue is: our software continues to be improved and sophistication added. Over time the simplicity and benefits, without the unneeded extraneous complications, will move upward as we add to and improve our software. Our total cost of ownership, for our core customers and in the Fortune 500 projects we are in, is materially less than any of our competitors’.

Meanwhile, Pure will say that NTAP and EMC and the like are like Windows, where you can still see the underlying DOS code. And I am always shocked just how often and how intrusively Windows needs to make updates. I use a Mac, but have to use Windows occasionally for some indoor cycle training, and every time I launch the program, a few times a month, I am inundated with mandatory updates: streaming while I am trying to use the software, messing with the software I am running, forcing shutdowns, and popping up intrusive menus I need to click away while I am trying to climb a 2,000-foot piece of road virtually. This happens only a few times a year with the Mac, and far less intrusively.

Pure will go on to say that, like Windows, the legacy software of NTAP and EMC will never be able to be as simple and efficient as what Pure offers, and it will never be optimized to run on flash. Flash, unlike magnetic media, is damaged every time a cell is written.

So the argument is: look at our success coming from the bottom up. We build software with no unneeded complexity, custom-built for flash, and we will continue to move upscale as we add features without unneeded legacy complications. The benefits that have been rapidly adopted in the marketplace will also be welcome in the upper tier as we add to the sophistication of the software.

Meanwhile NTAP and EMC and HP will argue, that we already have the sophistication, and that we can adapt what we have for flash no problem. It is just another medium. And that “complexity” provides maturity, IT staff knows how to run it, and real professionals want options.

Seems to be where the arguments are.

On Pure’s side is their incredible customer satisfaction. It is like Venus vs. Mars in that regard. Customers love Pure. That said, customers really like their NTAP, and are only slightly less happy with EMC.

What we don’t know is how much market share Pure is actually taking from NTAP and EMC. The premise is that both companies are losing marketshare to Pure, little by little anyways.

NTAP is firing back, NOT WITH OUR NEW PRODUCT!

I have seen the debates between EMC and Pure on how they calculate wins vs. each other. EMC’s definition of a win has nothing to do with a head to head battle vs. Pure, but EMC claims a 95% win rate against PURE (which does not really stand up to any credibility test). Pure is citing more than a 70-75% win rate when both products are tested against each other (which I assume are use cases that Pure is qualified for).

Thus, I have no idea how to comment on NTAP’s alleged 100% win rate at this point in time.

To be continued, as too much to do, and not really worth looking at the market crash at the moment.

Tinker

3 Likes

What’s a “duplex ratio”? Do you mean “de-duplication compression ratio”? If so, Pure isn’t really any better than Data Domain (bought by EMC in 2009). Data Domain pioneered inline de-duplication and successfully sued Pure back in 2013-2016 for patent infringement. Pure also hired a lot of ex-Data Domain employees (the Data Domain ex-CEO is on the board at Pure, btw), but my inside view is that was EMC’s fault for not moving fast enough, technology-wise.

So here’s some fun trivia: Guess where many/most of the original Data Domain guys came from? Answer: NetApp.

They were ‘mad’ that instead of making dedupe its own product, it was going to be built into the core of (then called) Data ONTAP, so a handful went off to do it ‘their way.’ That being so, I wouldn’t quite say they “pioneered” inline deduplication, but they were out front with the same crowd from other vendors, just positioning it for a different purpose.

And, not surprisingly, you can read dozens of articles from various leaders and now-ex-Data Domain CEOs about the culture at Data Domain and how it was modeled exactly after NetApp – that is, until they were Borg’d by EMC, which put a swift end to that part.

Someone also asked what my entry point might be for PSTG. I’m really hoping to pick it up “on the cheap,” so to speak, so I’d love to get in below $18. They haven’t turned an actual profit yet, and the risk is a little higher than some other options, so I want a little more safety cushion for my own comfort level. I was watching on Monday/Tuesday and it came really close at Tuesday’s market open – that might have been my missed opportunity, but with volatility and a supposed ‘correction’ back in play, I think there may be another one.

2 Likes

I did not know that NTAP acquired its SolidFire technology from a smaller start-up.

Here is a Pure blog in regard: https://blog.purestorage.com/netapp-acquires-solidfire/

Studying up on this: Pure’s former CEO, still on the board, emphasizes in this interview that only Pure and EMC get flash, and that NTAP has tried multiple strategies and has yet to get it right.

In this page of the interview, he mentions some of the materially different elements of flash. You can go straight to page 1 thereafter to read the whole thing or not. All interesting:

http://www.crn.com/slide-shows/storage/300079343/qa-pure-sto…

The reason we’re performing this way in the market is [that] the technology we’re delivering is highly, highly differentiated. Flash changes the way you design things. Each SSD can do about 64 things in parallel. [Incumbent vendors’] technology was designed for a mechanical disk that does one thing at a time. Flash can support variable I/O scheduling, [but] they use fixed-block architectures universally because it didn’t matter in the disk era. … We are two to five times better in data reduction. We’re unique in generally embracing the most cost-effective flash, and doing so well ahead of the market. We changed the business model to support cloud where the hardware and software morph transparently underneath the data so [there’s] no need to resell the same tech over and over again.

The reason that I believe Pure over EMC or NTAP is because Pure has done so well in this market despite the dominance of the legacy providers. It does appear that NTAP has had to retrofit its software and product onto its latest product, and in the interview the difficulties of this are discussed.

It may be that NTAP has got it right at last, or maybe not. I do not know. I can just look at the evidence in front of us and Pure is doing something no other storage vendor has ever done, and the only prior comparison is when NTAP came along in the 1990s to compete against EMC and disrupt much of that market to get to $5 billion plus in revenues.

I will do some more digging and hopefully come up with some real answers either way. The market seems to really like what NTAP has finally been able to do. Pure admits EMC understands flash (while claiming only Pure does it better), but has little respect for NTAP.

We shall see.

Tinker

5 Likes

Tamhas:

Latency and random access performance challenges are two fundamental problems that memory-based storage systems alleviate.

TL;DR
Here’s why: the media is designed to address these issues of latency and random data access. RAM is random access memory; it works in parallel and is many times faster in data access times and data throughput.

Detailed description:
The key to addressing random IO access is employing a type of media that excels at random IO access patterns. Solid state storage arrays (AFAs, SSAs, etc…) largely employ NAND Flash, which is a non-volatile Random Access Memory. The description says it all. Disk has always had a very tough challenge with random data access because the drive heads can only be in one location at a time, and the data access for reads and writes is serial. The heads also may have to move to various points on the drive platter to read or write the data the application is requesting. The time it takes for the head to move, settle and read or write takes a few milliseconds. This is a big latency penalty, and during this time, the CPUs in the servers are in an I/O wait state.

SSDs and other memory-based media don’t have this physical restriction, and are able to access data both randomly, and in parallel (servicing multiple read and write requests simultaneously). Databases run like scalded cats on flash-based systems, and it is in this arena that memory based storage really started to show the promise of the technology in meaningfully improving business operations.
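
The rough arithmetic behind that latency argument, using typical published ballpark figures (rounded assumptions for a generic drive, not measurements of any particular product):

```python
# Back-of-envelope random IOPS for a single spinning disk vs. flash.
# All inputs are rounded, generic assumptions for illustration.

def hdd_random_iops(seek_ms=4.0, rpm=15000):
    rotational_ms = (60000 / rpm) / 2      # average wait: half a rotation
    service_ms = seek_ms + rotational_ms   # one set of heads, strictly serial
    return 1000 / service_ms

def flash_random_iops(latency_us=100, parallel_channels=8):
    per_channel = 1_000_000 / latency_us   # no heads to move
    return per_channel * parallel_channels # channels can overlap requests

print(f"15k HDD: ~{hdd_random_iops():,.0f} random IOPS")
print(f"flash:   ~{flash_random_iops():,.0f} random IOPS")
```

The serial 4–6 ms service time caps a 15k-RPM disk at a couple hundred random IOPS, while even these conservative flash assumptions land orders of magnitude higher, which is why the CPUs stop sitting in I/O wait.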

These technical reasons are the fundamental performance drivers that led the industry toward making and selling memory-based storage. Then, of course, the price gap between high-performance disk and SSDs closed (SSDs are now arguably less expensive than 15k and 10k RPM hard disks).

In addressing network latency, the networks that support storage are either high-bandwidth Ethernet (10GbE and above), or FibreChannel (designed as a very low latency protocol specifically for connecting storage arrays to servers). Network latency, as a rule, is very, very low, measured in nanoseconds (whereas storage latencies are microseconds or milliseconds). Usually, network latency is far less of a performance detractor than the storage media or the application itself.

The other issue is the bus, the thing the media is connected to in order to communicate with the CPU. In order to make memory-based storage an easy insertion into an existing architecture, the memory was packaged in the same format as the existing hard disk drive. It is connected to the disk bus, talks to the system using a disk protocol (SATA or SAS), and is thereby limited in throughput to the maximum speed of that bus, and limited to an extent by the need to translate logical block addresses (how disks place data) to memory registers (the job of the flash translation layer).
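
As a rough illustration of that bus ceiling, here are the standard interface line rates with an assumed flat 80% efficiency factor for encoding and protocol overhead (the real overhead varies by generation and protocol):

```python
# Usable throughput for common storage interfaces (illustrative arithmetic only;
# the 0.8 efficiency factor is an assumed round number, not a spec value).

def usable_mb_per_s(line_rate_gbps, efficiency=0.8):
    return line_rate_gbps * 1000 / 8 * efficiency

sata3  = usable_mb_per_s(6)       # SATA 6 Gb/s
sas3   = usable_mb_per_s(12)      # SAS 12 Gb/s
nvme4x = usable_mb_per_s(8 * 4)   # PCIe 3.0 x4: ~8 Gb/s per lane, 4 lanes
print(f"SATA3 ~{sata3:.0f} MB/s, SAS3 ~{sas3:.0f} MB/s, NVMe x4 ~{nvme4x:.0f} MB/s")
```

A fast SSD can saturate the SATA ceiling easily, which is the whole motivation for moving flash off the disk bus and onto PCIe.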

In order to fully leverage the performance of memory-based storage, a new protocol specification has been delivered, called Non-Volatile Memory Express (NVMe). This lets operating systems and applications treat the memory as memory, rather than as an analog to disk. This improves and fully leverages the parallel nature of memory access and improves throughput, while driving down latency even further (tens of microseconds, rather than hundreds). Think of it like the transition from older peripheral buses to the PCI bus.
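
One way to see why NVMe’s deep queues matter is Little’s Law: sustained IOPS equals outstanding requests divided by per-request latency. A sketch (theoretical ceilings only; real devices saturate long before these numbers):

```python
# Little's Law applied to storage queues: IOPS = queue depth / latency.
# Pure arithmetic; the queue depths quoted are the protocol limits.

def iops(queue_depth, latency_s):
    return queue_depth / latency_s

# Legacy AHCI/SATA allows one queue of 32 entries; NVMe allows up to
# 64K queues of 64K entries each, so queue depth stops being the bottleneck.
print(iops(32, 100e-6))     # ~320,000 IOPS ceiling at 100 us latency
print(iops(1024, 20e-6))    # far higher theoretical ceiling at NVMe-class latency
```

The point is not the absolute numbers but the shape: as media latency falls, the only way to keep throughput climbing is more requests in flight, which is exactly what NVMe’s parallel queue model provides.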

5 Likes

Latency and random access performance challenges are two fundamental problems that memory-based storage systems alleviate.

TL;DR

thepsakis,
Any investing take-away associated with this technical description of how flash can be a big improvement over spinning disk storage, particularly in regards to Pure Storage (PSTG)?

Thanks,
volfan84
long PSTG…and exercising some PSTG $17.50 Feb 16 2017 call options tomorrow (bought at $0.66, currently at about $4.40 of intrinsic value)

Disk has always had a very tough challenge with random data access because the drive heads can only be in one location at a time

Usually! :slight_smile: Back in the late 60s when I was working on the Illiac IV project, we were introduced to a disk subsystem by Burroughs which had a head per track, so the only latency was the rotational latency. One imagines it was a wee bit expensive, though.

Databases run like scalded cats on flash-based systems

Absolutely … provided that the SSDs are local to the database.

In addressing network latency, the networks that support storage are either high-bandwidth Ethernet (10GbE and above), or FibreChannel (designed as a very low latency protocol specifically for connecting storage arrays to servers). Network latency, as a rule, is very, very low, measured in nanoseconds (whereas storage latencies are microseconds or milliseconds). Usually, network latency is far less of a performance detractor than the storage media or the application itself.

This does not match the experience of my sources. In particular, they note that it is not the latency of an individual component which matters, but the end to end latency. This includes the overhead of device drivers. As I believe I previously noted, the DB with which I work will support both shared memory clients, where the client reads and writes directly to and from the buffers of the server, and remote clients connected via TCP/IP. It is possible to run the remote clients on the same physical machine as the server so that requests and responses go down and up through the TCP/IP stack, but without any network actually involved. Those clients are substantially lower performance than the shared memory clients.
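
That stack overhead is easy to demonstrate without any special hardware. The sketch below (arbitrary message size and iteration counts) compares a plain in-process buffer access with a round trip through a local socket pair on one machine, i.e. no physical network at all:

```python
# Demonstration: even with no network involved, traversing the socket stack
# costs far more per operation than an in-process ("shared memory") access.
import socket
import time

N = 20_000

def in_process(n):
    buf = bytearray(64)
    t0 = time.perf_counter()
    for i in range(n):
        buf[0] = i & 0xFF          # direct buffer access, no syscalls
    return time.perf_counter() - t0

def via_sockets(n):
    a, b = socket.socketpair()     # connected pair on one machine
    msg = b"x" * 64
    t0 = time.perf_counter()
    for _ in range(n):
        a.sendall(msg)             # request goes down the stack...
        b.recv(64)                 # ...and back up on the other side
    dt = time.perf_counter() - t0
    a.close(); b.close()
    return dt

print(f"direct: {in_process(N)*1e3:.2f} ms, sockets: {via_sockets(N)*1e3:.2f} ms")
```

The socket path is typically orders of magnitude slower per operation, which matches the shared-memory-client vs. loopback-TCP-client observation above.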

Volfan84:

I’d say, based on its growth relative to the other companies in this space, IDC and Gartner analysis of their leadership, the burgeoning AI market and data growth at large, and the demand for faster, simpler, more secure, and more sustainable storage infrastructure, that PSTG is in a tremendous position for growth in a market segment that is in the process of major transformation.

1 Like