Large-Scale Study of Flash Memory Failures
By Carnegie Mellon University and Facebook
By Jean Jacques Maleval | September 15, 2020 at 1:56 pm
To read this article from Carnegie Mellon University and Facebook, click on:
A Large-Scale Study of Flash Memory Failures in the Field
Based on our field analysis of how flash memory errors manifest when running modern workloads on modern SSDs, this paper makes several major observations:
(1) SSD failure rates do not increase monotonically with flash chip wear; instead they go through several distinct periods corresponding to how failures emerge and are subsequently detected,
(2) the effects of read disturbance errors are not prevalent in the field,
(3) sparse logical data layout across an SSD’s physical address space (e.g., non-contiguous data), as measured by the amount of metadata required to track logical address translations stored in an SSD-internal DRAM buffer, can greatly affect SSD failure rate,
(4) higher temperatures lead to higher failure rates, but techniques that throttle SSD operation appear to greatly reduce the negative reliability impact of higher temperatures, and
(5) data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells due to optimizations in the SSD controller and buffering employed in the system software.
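Observation (5) can be made concrete with a little arithmetic. The sketch below is only an illustration, not the method used in the paper: it assumes two counters are available for each drive, the bytes the operating system wrote and the bytes actually written to flash cells. The function name, the drive names, and all the numbers are hypothetical.

```python
def flash_to_host_write_ratio(host_bytes: int, flash_bytes: int) -> float:
    """Return bytes written to flash cells divided by bytes written by the OS.

    Both counters are assumed inputs; how they are obtained depends on the
    drive and is not covered here.
    """
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    if flash_bytes < 0:
        raise ValueError("flash_bytes must not be negative")
    return flash_bytes / host_bytes


# Hypothetical counters: (bytes written by the OS, bytes written to flash cells).
drives = {
    "drive_a": (100 * 10**12, 250 * 10**12),
    "drive_b": (100 * 10**12, 80 * 10**12),
}

for name, (host, flash) in sorted(drives.items()):
    ratio = flash_to_host_write_ratio(host, flash)
    print(f"{name}: flash/host write ratio = {ratio:.2f}")
```

In this made-up example both drives received the same amount of data from the operating system, yet their flash cells absorbed very different amounts of writes. That is why OS-level write counts alone can be a misleading proxy for wear.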