Enhanced Flash Memory Lifetime With DIE-RAID
From Innodisk
This is a Press Release edited by StorageNewsletter.com on February 18, 2022 at 2:02 pm
As the physical size of NAND flash cells shrinks, more cells fit per unit area, increasing storage capacity, according to Innodisk Europe B.V.
However, this may lead to higher interference with the charge trapped in the cell, resulting in a higher bit error rate. Each cell can also hold more bits, which narrows the margin between voltage levels and makes read errors more likely. Without countermeasures, modern NAND flash would fail after a short time. Error correction is therefore one of the most important functions in memory products.
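The shrinking margin between voltage levels can be illustrated with a little arithmetic. A cell that stores n bits must distinguish 2^n charge levels. The sketch below assumes, purely for illustration, that the levels are evenly spaced across a fixed voltage window; the window value and the function names are hypothetical and not taken from Innodisk.

```python
def levels_per_cell(bits_per_cell: int) -> int:
    """A cell storing n bits must distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


def level_spacing(voltage_window: float, bits_per_cell: int) -> float:
    """Spacing between adjacent levels, assuming evenly spaced levels
    across a fixed window (an illustrative simplification)."""
    return voltage_window / (levels_per_cell(bits_per_cell) - 1)


if __name__ == "__main__":
    WINDOW_V = 6.0  # hypothetical voltage window, for illustration only
    for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
        print(name, levels_per_cell(bits), round(level_spacing(WINDOW_V, bits), 3))
```

Under this simplification, each additional bit per cell roughly halves the spacing between levels, which is why stronger error correction is needed as density grows.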
Due to its effectiveness, BCH (Bose-Chaudhuri-Hocquenghem) code has long been a commonly used error correction method in NAND flash. With recent technological advancements, it is being replaced by Low-Density Parity-Check (LDPC) code as the standard error correction code (ECC) function for most SSDs. LDPC has a stronger error correction capability than BCH code, providing a more precise identification of the original bit sequence after an error.
DIE-RAID is another tool for correcting erroneous bits. Its drawback is that SSDs appear to have capacities below the standard power-of-two values. This seemingly missing usable capacity in fact stores parity data, which is used to correct error bits when ECC fails to do so. Combined with other error correction features, DIE-RAID increases the number of program/erase (P/E) cycles the flash can sustain and ensures consistent flash performance over time.
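To see why the advertised capacity drops below a power of two, consider a rough calculation. The sketch below assumes that one die out of every stripe holds parity; the stripe width and the function name are hypothetical, since the release does not state the actual ratio, and real drives also reserve space for other purposes.

```python
def usable_capacity_gb(raw_gb: float, dies_per_stripe: int, parity_dies: int = 1) -> float:
    """User-visible capacity when `parity_dies` out of every
    `dies_per_stripe` dies hold parity instead of user data.
    Both parameters are illustrative assumptions."""
    if not 0 < parity_dies < dies_per_stripe:
        raise ValueError("parity_dies must be between 1 and dies_per_stripe - 1")
    return raw_gb * (dies_per_stripe - parity_dies) / dies_per_stripe


if __name__ == "__main__":
    # Hypothetical drive: 512 GB of raw flash, 16 dies per stripe, 1 parity die.
    print(usable_capacity_gb(512, 16))
```

With these assumed numbers, one sixteenth of the raw flash is set aside for parity, so the drive would expose less than its raw 512 GB.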
RAID uses mirroring, striping, and parity to ensure data integrity across 2 or more storage drives. Mirroring copies data from one drive to another so that a set of data is still available in the event of a drive failure (RAID-1). Striping means that data sets are written across the different drives rather than stored on a single drive. Parity is used in configurations with 3 or more drives: each drive stores a section of the distributed parity data, from which lost data can be reconstructed if one of the other drives fails (RAID-5).
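The parity idea can be shown in a few lines. In RAID-5, parity is the byte-wise XOR of the data blocks in a stripe, so any single missing block equals the XOR of the parity with the surviving blocks. The following is a minimal sketch with hypothetical function names; it ignores how real arrays rotate parity across drives.

```python
from functools import reduce
from operator import xor
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block of a stripe from the
    surviving data blocks and the parity block."""
    return xor_blocks(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one block per drive
    parity = xor_blocks(data)
    # Pretend the second drive failed and rebuild its block.
    print(rebuild_missing([data[0], data[2]], parity) == data[1])
```

Because XOR is its own inverse, the same routine serves both to compute parity and to rebuild a lost block.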
The DIE-RAID error correction method follows the same principle as RAID-5 for storage drives. Instead of spreading the data across drives, DIE-RAID spreads it across different dies, adding a parity buffer to each set. The RAID engine, located in the controller, decides how to store the data received by the SSD. The engine constructs the RAID stripes and also performs RAID recovery when error bits are detected and the other ECC functions cannot fix the problem. When a read command is issued, the data first passes through the LDPC engine. If the data is correct, or if the LDPC engine has corrected one or more error bits, the data is read without problems. If the LDPC engine is unable to correct the error bits, the RAID engine performs a RAID rebuild to correct them. If the rebuild also fails, the block is marked as faulty. DIE-RAID thus acts as an extra layer of protection against error bits and ultimately increases the lifespan of an SSD.
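The read path described above can be sketched as a simple fallback chain. This is an illustrative model only, not Innodisk's firmware: `ldpc_decode` is a hypothetical stand-in that returns a page's bytes or `None` when LDPC cannot correct it, parity is assumed to be a plain XOR across the dies of a stripe, and all names are invented for the example.

```python
from functools import reduce
from operator import xor
from typing import Callable, Optional, Sequence, Set


def xor_pages(pages: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length pages."""
    return bytes(reduce(xor, column) for column in zip(*pages))


def read_with_die_raid(
    die: int,
    stripe_width: int,
    ldpc_decode: Callable[[int], Optional[bytes]],  # hypothetical LDPC stand-in
    bad_blocks: Set[int],
) -> Optional[bytes]:
    """Read one page of a stripe: try LDPC first, then a RAID rebuild."""
    page = ldpc_decode(die)
    if page is not None:
        return page  # clean, or corrected by LDPC
    # LDPC failed: rebuild from every other page in the stripe (incl. parity).
    others = []
    for other in range(stripe_width):
        if other == die:
            continue
        other_page = ldpc_decode(other)
        if other_page is None:
            bad_blocks.add(die)  # rebuild impossible: mark as faulty
            return None
        others.append(other_page)
    return xor_pages(others)


if __name__ == "__main__":
    data = [b"\x0a\x0b", b"\x01\x02", b"\xf0\x0f"]
    stored = data + [xor_pages(data)]  # last die of the stripe holds parity
    failed = {1}                       # die 1 is beyond LDPC's reach
    bad: Set[int] = set()
    decode = lambda d: None if d in failed else stored[d]
    print(read_with_die_raid(1, 4, decode, bad) == data[1])
```

In this model a single unreadable page per stripe is recovered transparently, while a second failure in the same stripe leaves nothing to rebuild from, and the block is flagged.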
For DIE-RAID, part of the SSD's capacity must be reserved for error correction. In return, especially in the industrial sector, DIE-RAID contributes to a longer lifetime and higher data integrity. With DIE-RAID, the start of read retries can be delayed thanks to the additional decision level in determining the bit values. This means an increase in data integrity and a more stable device lifetime. DIE-RAID is suitable for all applications with high data throughput, where it ensures that the SSD can deliver consistent performance over the long term.