
SLC SSDs no More Reliable Than Lower End MLC Drives – University of Toronto/Google

Higher rate of problems with SSDs than with HDDs

This report was written by Bianca Schroeder, University of Toronto, and Raghav Lagisetty and Arif Merchant, Google Inc.

Flash Reliability in Production: The Expected and the Unexpected

This paper provides a number of interesting insights into flash reliability in the field. Some of these support common assumptions and expectations, while many were unexpected.

The summary below focuses on the more surprising results and implications from our work:

  • Between 20% and 63% of drives experience at least one uncorrectable error during their first four years in the field, making uncorrectable errors the most common non-transparent error in these drives. Between 2 and 6 out of 1,000 drive days are affected by them.
  • The majority of drive days experience at least one correctable error; however, other types of transparent errors, i.e. errors which the drive can mask from the user, are rare compared to non-transparent errors.
  • We find that RBER (raw bit error rate), the standard metric for drive reliability, is not a good predictor of those failure modes that are the major concern in practice. In particular, higher RBER does not translate to a higher incidence of uncorrectable errors.
  • We find that UBER (uncorrectable bit error rate), the standard metric to measure uncorrectable errors, is not very meaningful. We see no correlation between UEs and number of reads, so normalizing uncorrectable errors by the number of bits read will artificially inflate the reported error rate for drives with low read count.
  • Both RBER and the number of uncorrectable errors grow with PE cycles; however, the rate of growth is slower than commonly expected, following a linear rather than exponential rate, and there are no sudden spikes once a drive exceeds the vendor’s PE cycle limit, within the PE cycle ranges we observe in the field.
  • While wear-out from usage is often the focus of attention, we note that independently of usage the age of a drive, i.e. the time spent in the field, affects reliability.
  • SLC drives, which are targeted at the enterprise market and considered to be higher end, are not more reliable than the lower end MLC drives.
  • We observe that chips with smaller feature size tend to experience higher RBER, but are not necessarily the ones with the highest incidence of non-transparent errors, such as uncorrectable errors.
  • While flash drives offer lower field replacement rates than HDDs, they have a higher rate of problems that can impact the user, such as uncorrectable errors.
  • Previous errors of various types are predictive of later uncorrectable errors. (In fact, we have work in progress showing that standard machine learning techniques can predict uncorrectable errors based on age and prior errors with an interesting accuracy.)
  • Bad blocks and bad chips occur at a significant rate: depending on the model, 30-80% of drives develop at least one bad block and 2-7% develop at least one bad chip during the first four years in the field. The latter emphasizes the importance of mechanisms for mapping out bad chips, as otherwise drives with a bad chip will require repairs or be returned to the vendor.
  • Drives tend to either have less than a handful of bad blocks, or a large number of them, suggesting that impending chip failure could be predicted based on prior number of bad blocks (and maybe other factors). Also, a drive with a large number of factory bad blocks has a higher chance of developing more bad blocks in the field, as well as certain types of errors.
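
The UBER point in the list above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `uber`, the drive names, and all the numbers are hypothetical and are not taken from the paper. It assumes the usual definition of UBER as uncorrectable errors divided by bits read, and shows that two drives with the same number of uncorrectable errors end up with very different UBER values purely because one of them served fewer reads.

```python
def uber(uncorrectable_errors: int, bits_read: float) -> float:
    """Uncorrectable bit error rate: uncorrectable errors divided by bits read."""
    if bits_read <= 0:
        raise ValueError("bits_read must be positive")
    return uncorrectable_errors / bits_read


# Hypothetical drives (made-up numbers, not data from the paper):
# both saw the same number of uncorrectable errors, but one served far fewer reads.
drives = {
    "low_read_drive": {"uncorrectable_errors": 2, "bits_read": 1e12},
    "high_read_drive": {"uncorrectable_errors": 2, "bits_read": 1e15},
}

rates = {
    name: uber(d["uncorrectable_errors"], d["bits_read"])
    for name, d in drives.items()
}

for name, rate in rates.items():
    print(f"{name}: UBER = {rate:.1e}")
```

If uncorrectable errors really are uncorrelated with read volume, as the authors report, the lightly read drive here looks about a thousand times worse by UBER even though both drives failed equally often.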

Complete report
