
Tamhas:

Latency and random-access performance are two fundamental challenges that memory-based storage systems alleviate.

TL;DR
Here’s why: the media itself is designed to address latency and random data access. RAM is random-access memory: it works in parallel and is many times faster in both data access time and data throughput.

Detailed description:
The key to addressing random IO access is employing a type of media that excels at random IO access patterns. Solid-state storage arrays (AFAs, SSAs, etc.) largely employ NAND flash, which is a non-volatile random-access memory. The description says it all. Disk has always had a very tough time with random data access because the drive heads can only be in one location at a time, so reads and writes are serviced serially. The heads may also have to move to various points on the platter to reach the data the application is requesting. The time it takes for a head to move, settle, and read or write is a few milliseconds. That is a big latency penalty, and during this time the CPUs in the servers sit in an I/O wait state.
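The seek-and-settle penalty can be put in numbers. A minimal Python back-of-the-envelope, using illustrative figures for a 7,200 RPM drive (assumptions, not any vendor's spec sheet):

```python
# Back-of-the-envelope: a spinning disk's random read time is dominated
# by seek time plus (on average) half a rotation of the platter.
seek_ms = 8.5                        # assumed average seek time
rpm = 7200
rotational_ms = (60_000 / rpm) / 2   # half a rotation, in milliseconds
service_ms = seek_ms + rotational_ms
max_random_iops = 1000 / service_ms  # how many such IOs fit in one second
print(f"{service_ms:.2f} ms per random IO -> ~{max_random_iops:.0f} IOPS")
```

Under a hundred random IOPS per spindle is why disk arrays needed so many drives, and why every one of those milliseconds shows up as CPU I/O wait.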

SSDs and other memory-based media don’t have this physical restriction; they can access data both randomly and in parallel (servicing multiple read and write requests simultaneously). Databases run like scalded cats on flash-based systems, and it is in this arena that memory-based storage first showed its promise for meaningfully improving business operations.

These technical reasons are the fundamental performance drivers that led the industry toward making and selling memory-based storage. Then, of course, the price gap between high-performance disk and SSDs closed (SSDs are now arguably less expensive than 15k and 10k RPM hard disks).

As for network latency: the networks that support storage are either high-bandwidth Ethernet (10GbE and above) or Fibre Channel (designed as a very low-latency protocol specifically for connecting storage arrays to servers). Network latency is, as a rule, very low, often a microsecond or less per switch hop, whereas storage media latencies run from tens of microseconds to milliseconds. Usually, network latency is far less of a performance detractor than the storage media or the application itself.
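A rough latency budget makes the comparison concrete. All figures below are assumed order-of-magnitude numbers for illustration, not measurements of any particular gear:

```python
# Order-of-magnitude latency budget (assumed figures, not measurements).
network_us = 5      # a switched FC/10GbE storage network hop
ssd_us = 200        # a SAS/SATA-attached SSD random read
disk_us = 8000      # a spinning disk seek + rotation

print(f"SSD media latency is ~{ssd_us / network_us:.0f}x the network latency")
print(f"Disk latency is ~{disk_us / network_us:.0f}x the network latency")
```

Even if the network figure is off by several times, the media still dominates the end-to-end budget.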

The other issue is the bus, the interconnect the media uses to communicate with the CPU. To make memory-based storage easy to drop into an existing architecture, the memory was packaged in the same form factor as a hard disk drive. It connects to the disk bus and talks to the system using a disk protocol (SATA or SAS), so its throughput is capped at the maximum speed of that bus, and it is limited to an extent by the need to translate logical block addresses (how disks place data) into physical flash locations (the job of the flash translation layer).
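That bus ceiling can be worked out from published spec figures: SATA 3 runs at 6 Gbit/s with 8b/10b encoding (ten line bits per payload byte), while a PCIe 3.0 lane carries roughly 985 MB/s after 128b/130b encoding. The four-lane figure is an assumption about a typical NVMe drive, not a requirement:

```python
# Throughput ceilings from interface specs (theoretical, not measured).
line_rate_mbit = 6000               # SATA 3: 6 Gbit/s line rate
sata3_mb_s = line_rate_mbit // 10   # 8b/10b: 10 line bits per payload byte
pcie3_lane_mb_s = 985               # PCIe 3.0: 8 GT/s, 128b/130b encoding
nvme_x4_mb_s = 4 * pcie3_lane_mb_s  # assuming a typical four-lane drive

print(f"SATA 3 ceiling: {sata3_mb_s} MB/s; PCIe 3.0 x4 ceiling: ~{nvme_x4_mb_s} MB/s")
```

No matter how fast the flash behind it is, a SATA-attached SSD cannot move payload faster than that 600 MB/s ceiling.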

In order to fully leverage the performance of memory-based storage, a new protocol specification is being delivered: NVM Express (NVMe). It lets operating systems and applications treat the memory as memory, rather than as an analog to disk. This fully leverages the parallel nature of memory access and improves throughput, while driving latency down even further (tens of microseconds, rather than hundreds of microseconds). Think of it like the transition from older peripheral buses to the PCI bus.
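The parallelism gain is visible in the spec'd queue limits: AHCI (the SATA host interface) exposes a single 32-entry command queue, while the NVMe specification allows up to 65,535 I/O queues of up to 65,536 commands each. A quick comparison:

```python
# Command-queue limits taken from the AHCI and NVMe specifications.
ahci_outstanding = 1 * 32           # one NCQ queue, 32 commands deep
nvme_outstanding = 65_535 * 65_536  # up to 64K I/O queues x 64K entries

ratio = nvme_outstanding // ahci_outstanding
print(f"NVMe allows ~{ratio:,}x more commands in flight than AHCI")
```

Those per-core queues are what let NVMe actually exercise the parallel access that the memory was always capable of.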
