
Pure Storage: The Price of Flash Is Right

By David Hill, analyst, Mesabi Group

David Hill, analyst, Mesabi Group,
in a weekly review of Pund-IT, Inc.,
dated September 18, 2013, wrote:

A common perception is that flash storage is more expensive than high performance (i.e. 10K/15K RPM) HDDs. On a raw, per-byte basis, that may be the case, but that is not the right measure; instead, the comparison should be between bytes of usable storage, not raw storage. That is to say, the measure should be applied to the storage that is actually put to work. Plus, since using deduplication and compression magnifies the actual raw capacity of flash storage into a much larger quantity of usable storage, the price difference may very well vanish in, well, a flash.

Pure Storage Challenges Common Perceptions
Pure Storage is an all-flash, controller-based, scale-up enterprise storage vendor that competes directly with traditional and hybrid storage arrays, as well as with all-flash storage arrays. It champions that approach and claims that its flash storage is less expensive today than high performance HDDs, and that the gap favoring flash storage will only increase over the next few years. The company has also tackled and addressed supposed reliability issues with flash storage so that, from a cost and reliability perspective, flash storage in a controller-based, all-flash array measures up.

Without going into full depth, Pure Storage builds deduplication and compression into its flash storage arrays as a standard feature that the customer cannot turn off (and, in fact, for architectural reasons having to do with managing writes, would not want to turn off). Although different storage workloads are not subject to deduplication and compression uniformly, Pure Storage claims approximately a 6 to 1 data reduction ratio on average, but results can be higher or lower. Still, at that average ratio each 1TB of flash purchased produces 6TB of usable storage, which makes investing in flash highly attractive.
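The usable-versus-raw arithmetic can be sketched in a few lines of Python. This is only an illustration of the reasoning: the two prices are hypothetical placeholders, not figures from Pure Storage or any other vendor, and the HDD side is assumed to get no data reduction at all.

```python
def usable_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Usable capacity after deduplication and compression."""
    return raw_tb * reduction_ratio


def cost_per_usable_tb(price_per_raw_tb: float, reduction_ratio: float) -> float:
    """Effective price of one usable TB for a given data reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return price_per_raw_tb / reduction_ratio


def break_even_ratio(flash_price_per_raw_tb: float, hdd_price_per_raw_tb: float) -> float:
    """Reduction ratio at which flash matches HDD per usable TB.

    Assumes the HDD array stores data unreduced (raw == usable).
    """
    return flash_price_per_raw_tb / hdd_price_per_raw_tb


# Hypothetical prices per raw TB, chosen only to illustrate the arithmetic.
FLASH_PRICE = 12_000.0
HDD_PRICE = 3_000.0

print(usable_capacity_tb(1, 6))                  # 1TB raw at 6:1 -> 6 usable TB
print(cost_per_usable_tb(FLASH_PRICE, 6))        # 2000.0 per usable TB
print(break_even_ratio(FLASH_PRICE, HDD_PRICE))  # 4.0
```

With these made-up prices, any reduction ratio above 4 to 1 would put flash ahead per usable terabyte; the real break-even point depends entirely on actual prices and on the ratio a given workload achieves.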

How valid is their claim? Well, VCs have just advanced Pure Storage $150 million in additional funding. Although large vendors are obviously investing even larger sums in flash storage, smaller vendors are often able to take a focused approach and not have to worry about protecting existing legacy investments.

Why Flash Storage Can Combine Deduplication and Compression Efficiently and HDD Storage Cannot

The obvious question is: Can’t HDDs perform those same deduplication and compression functions just as efficiently? Now, compression can be applied successfully to HDDs, but that practice has not seen broad acceptance. Deduplication in the form of single instance file deduplication is fairly common but generally not really block-level deduplication (and there is no reason that single instancing could not be applied before block level deduplication on flash, anyway).
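The difference between single-instance file deduplication and block-level deduplication is easy to see in a toy model. The sketch below is an illustration only, not how any particular array implements it: it assumes fixed-size 512-byte blocks and SHA-256 fingerprints, and the sample files are hypothetical.

```python
import hashlib

BLOCK_SIZE = 512  # bytes; illustrative granularity


def file_level_stored_bytes(files: list[bytes]) -> int:
    """Single-instance storage: keep one copy of each distinct whole file."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_stored_bytes(files: list[bytes], block_size: int = BLOCK_SIZE) -> int:
    """Block-level deduplication: keep one copy of each distinct fixed-size block."""
    unique = {}
    for f in files:
        for off in range(0, len(f), block_size):
            block = f[off:off + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two hypothetical 4-block files that differ only in their last block.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
edited = base[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE
files = [base, edited, base]  # the third file is an exact copy of the first

print(sum(len(f) for f in files))       # 6144 bytes written logically
print(file_level_stored_bytes(files))   # 4096: only the exact copy is removed
print(block_level_stored_bytes(files))  # 2560: five distinct blocks remain
```

Single instancing only catches the exact copy, while block-level deduplication also removes the three blocks that the edited file shares with the original.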

Pure Storage has identified four specific reasons why similar deduplication and compression technology is highly unlikely to be applied to HDDs. All take into account the difference in architectures between flash storage that (as a solid state technology) has no moving parts and HDDs that (as electromechanical devices) do. Moreover, HDD arrays are collections of many similar, but still separate, devices. These are ‘pooled’ in the sense of the discrete drives being aggregated together, but are not pooled in the sense of uniform components (flash memory chips) linked together so as to be indistinguishable. Why does Pure Storage believe these differences matter with deduplication and compression?

  • Flash storage is more efficient than HDDs in dealing with random I/Os on reads – the process of deduplication takes out duplicate data and replaces it with pointers; the result is that any given dataset may be spread over much of the storage array; thus, when reading a file whose pieces were originally stored sequentially, those pieces may be anywhere; this creates random I/Os, which are the bane of HDDs; think of all the moving heads when trying to reassemble a file from a large number of individual disks versus the ‘virtual’ hop and skip approach (so to speak) of flash storage; that is why the different pooling architectures matter with deduplication on reads.
  • Flash storage is more efficient than HDDs in dealing with writes – not getting into all the details, but deduplication and compression add operational complexity (such as verifying the validity of the data that is being written) that takes time and CPU cycles; flash storage can deal with these issues without the significant additional overhead that HDDs incur to do the job; thus flash storage is more efficient on the write side.
  • Flash is more efficient than HDDs in dealing with storage virtualization – the data reduction produced by the combination of deduplication and compression requires the complete virtualization of the array (as the process of necessity separates the logical position of the data from its physical placement); this virtualization is very fine-grained (such as 512-byte chunks); that leads to a metadata structure that can handle the pointers to billions and potentially to trillions of objects; Pure Storage argues that retrofitting the controllers of standard HDD arrays simply isn’t possible; doing it even upfront with a flash array is very difficult; here again the advantage goes to flash storage.
  • Flash storage makes dealing with the modified data using compression easier – compression is non-deterministic in size; that means that when a file is over-written or modified, it may not fit in the same space; that can lead to a read-modify-write cycle in which, among other things, decompression and recompression have to take place; although StorageTek’s Iceberg addressed this problem as far back as 1994 on HDDs, the latencies involved with mechanical devices are far larger and more cumbersome than with flash.
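Two of the points above lend themselves to a quick back-of-the-envelope check: the size of the pointer metadata at 512-byte granularity, and the way compressed size varies with content. The Python sketch below is only an illustration under stated assumptions: it uses zlib as a stand-in compressor, a decimal terabyte, and made-up data, and it does not model any vendor's actual metadata layout.

```python
import random
import zlib

CHUNK = 512  # bytes, the granularity the article mentions


def pointer_count(logical_bytes: int, chunk: int = CHUNK) -> int:
    """Number of chunk pointers needed to map a logical address space."""
    return -(-logical_bytes // chunk)  # ceiling division


def compressed_len(data: bytes) -> int:
    """Size in bytes of the data after zlib compression."""
    return len(zlib.compress(data))


# One decimal terabyte mapped at 512-byte granularity: about two billion pointers.
print(pointer_count(10**12))  # 1953125000

# Same logical size, very different stored size.
rng = random.Random(0)  # seeded so the illustration is repeatable
original = b"A" * 4096                                      # repetitive, compresses well
overwrite = bytes(rng.getrandbits(8) for _ in range(4096))  # random, barely compressible

slot = compressed_len(original)     # space the first version occupied
needed = compressed_len(overwrite)  # space the overwritten version needs
print(slot, needed, needed > slot)
```

Because the overwritten data needs far more space than the slot the original occupied, it cannot simply be rewritten in place, which is the read-modify-write situation the last bullet describes. At roughly two billion pointers per terabyte, a multi-hundred-terabyte logical space reaches the hundreds of billions of objects, consistent with the scale the article cites.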

The bottom line is that flash storage can make data reduction in its entirety work, and that there are significant challenges in doing the same with HDD arrays. So Pure Storage’s argument about the competitive positioning of flash storage and HDDs is likely to stand the test of time.

Where Pure Storage Is Coming From
Pure Storage focuses on the scale-up midrange and enterprise market. That means that it does not focus on the service provider market, where QoS software is important, for example to deal with ‘noisy neighbor’ multi-tenancy issues in which one user may try to hog resources. However, the company does need to focus on storage management software basics, such as replication technology, and there is no reason to believe that it will not be able to deliver these capabilities as needed.

Scale-up versus scale-out approaches to storage have been much debated, but practically speaking, scale-up meets the needs of many customers, and trying to be all things to all men may not be in the best interests of Pure Storage.

One challenge for Pure Storage is the scale-up hybrid storage array that mixes flash and HDDs. Such products are available from many vendors, including large ones who have the marketing and sales muscle, as well as an installed base of customers for whom switching costs would be an issue. A second challenge is from competing all-flash array vendors. These may also be able to support deduplication and compression so that, in time, the data reduction claims that are Pure Storage’s bread and butter will become check-box items. However, Pure Storage argues that doing this with the necessary latency and reliability requirements is very difficult, and that its competitors have not yet and may not be able to duplicate what Pure Storage has already demonstrated. If so, Pure Storage will continue to have a competitive advantage in data reduction.

Still, Pure Storage is not just about price, but also ease of use and other qualities that have attracted customers to the company. Moreover, Pure Storage feels that it has a window of 18 months where it can effectively deploy its $150 million to continue innovative development that can give its products a somewhat sustainable competitive advantage. In addition, the company plans to tighten its marketing and sales focus to build a stronger customer base from the growing number of new prospects that market forecasts suggest are receptive to the idea of deploying an all-flash array for primary storage.

Mesabi Musings
The topic of flash – when, where, how much – is probably the hottest storage topic today. Many flash configurations and architectures have proven to be quite attractive for many use cases. Still, there is a lot of untapped potential, and one means of capturing that potential is to overcome perceptions that all-flash storage arrays are too expensive relative to high performance HDDs.

Pure Storage argues that the combination of deduplication and compression as standard functionalities on its all-flash storage arrays results in 6TB of usable storage for each 1TB of raw flash storage, enabling it to compete economically with HDDs where 6TB of raw and usable capacity are the same. On a level playing field, Pure Storage believes that it can win the case for why buying all-flash storage arrays is better than buying hybrid or traditional storage arrays. That should be an interesting conversation but it will be one where price difference is not the only or determining factor.
