
Raidix Storage Software Provides Solution for Silent Data Corruption

Caused by bugs in drivers and disk firmware, memory errors, drive-head crashes

The media and entertainment industry depends on highly accessible and reliable content.

Any interruption in workflow can delay project delivery, pulling valuable resources from more productive uses. With potentially catastrophic consequences, silent data corruption is one of the most serious IT challenges facing modern media professionals. Previous solutions address some aspects of silent data corruption, but they are incomplete and often degrade performance.

RAIDIX, LLC. has pioneered a solution to detect and correct silent data corruption, while retaining high-performance storage standards.


What is Silent Data Corruption?
Silent data errors are caused by bugs in drivers and disk firmware, memory errors, drive-head crashes, and similar software and hardware problems. Silent errors that occur during the write-to-disk process are the most dangerous, as there is no indication that the data is incorrect.

Silent errors go undetected by drive firmware and host OSs. A ‘regular’ system will not detect silent errors or the data corruption they cause unless they damage a data structure whose integrity is verified by the system or application software. For example, a silent error in a file system table will likely be detected only when the host OS can no longer locate files or their fragments, and a database server will notice a malformed table only when the corruption affects the information that describes the table structure.

As drive capacity continuously increases, silent data corruption is becoming more prevalent. Multiple independent studies have shown corruption rates ranging from one error for every 67GB to much higher. In short, silent data corruption is a pressing reality in the large-capacity storage environment.

Approaches to Combat Silent Data Corruption
Combating data corruption has long been a high-priority concern of computer hardware and software manufacturers. A variety of preventive methods exist. For example, some errors are detected using checksums stored in each drive sector. When disk operations generate multiple errors within a single sector, the drive may copy the information to a reserve sector and remap the failing sector to the healthy one, without assistance from OS software. The following approaches are commonly used by hardware and software vendors to detect and, if possible, correct silent data errors.

Adding checksum information to each block
This method works best with select SAS drives that can be formatted with a larger-than-standard sector size. Checksums, stored next to each data block, are used during read operations to detect possible corruption. This approach reduces the usable formatted capacity of the disk, but has relatively low impact on performance.

Most drives support only the default sector size, so checksums have to be stored in a different sector. Since additional sector reads are then constantly required, checksumming severely degrades performance, in addition to taking up disk capacity.
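As a minimal sketch of the first variant, the checksum can be appended to the data inside one enlarged sector and checked on every read. The 512-byte payload, the 4-byte CRC32 and the function names `seal` and `verify` are assumptions chosen for illustration; they do not describe any particular drive format or vendor implementation.

```python
import zlib

SECTOR_DATA = 512   # assumed payload size per sector
CHECKSUM_LEN = 4    # CRC32 fits in 4 bytes


def seal(data: bytes) -> bytes:
    """Return the payload followed by its CRC32 (an enlarged 516-byte sector)."""
    if len(data) != SECTOR_DATA:
        raise ValueError("payload must be exactly one sector long")
    return data + zlib.crc32(data).to_bytes(CHECKSUM_LEN, "big")


def verify(sector: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored one."""
    data, stored = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    return zlib.crc32(data).to_bytes(CHECKSUM_LEN, "big") == stored


sector = seal(bytes(SECTOR_DATA))
flipped = bytes([sector[0] ^ 0x01]) + sector[1:]   # simulate one silently flipped bit
print(verify(sector), verify(flipped))
```

Because the checksum travels with the data, a read needs no extra I/O. Storing it in a separate sector would require a second read for every verification, which is the performance cost described above.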

Write-ReRead
After each write, the sector is read back and checksummed. The checksums of the data block before and after the write are compared to verify that the data was written correctly.
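The write-then-reread check can be sketched against an in-memory stand-in for a disk. `MemoryDisk`, its `drop_writes` flag and `write_verified` are hypothetical names introduced only for this illustration; a real implementation would also have to bypass caches so that the read actually reaches the media.

```python
import zlib


class MemoryDisk:
    """Toy block device; drop_writes simulates a write that is silently lost."""

    def __init__(self, drop_writes: bool = False):
        self.sectors = {}
        self.drop_writes = drop_writes

    def write(self, lba: int, data: bytes) -> None:
        if not self.drop_writes:
            self.sectors[lba] = data

    def read(self, lba: int) -> bytes:
        # Unwritten sectors read back as zeros.
        return self.sectors.get(lba, bytes(512))


def write_verified(disk: MemoryDisk, lba: int, data: bytes) -> bool:
    """Write a sector, read it back, and compare checksums before and after."""
    expected = zlib.crc32(data)
    disk.write(lba, data)
    return zlib.crc32(disk.read(lba)) == expected


payload = b"\xab" + bytes(511)
print(write_verified(MemoryDisk(), 7, payload))
print(write_verified(MemoryDisk(drop_writes=True), 7, payload))
```

Every write costs an additional read in this scheme, so throughput suffers in exchange for the verification.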

RAID systems can employ two additional instruments for silent error detection:

Vertical-parity
In addition to striping data and parity information, a secondary parity records block checksums written on each drive in the array. This vertical parity stripe is used for detecting data corruption.
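One possible reading of this description is a per-drive record holding a checksum for each block on that drive, which a scan compares against the current block contents. The sketch below follows that reading only; the layout and the names `build_checksum_stripe` and `scan_array` are assumptions, not a documented on-disk format.

```python
import zlib


def build_checksum_stripe(drive_blocks):
    """Return one CRC32 per block for a single drive (the assumed vertical record)."""
    return [zlib.crc32(block) for block in drive_blocks]


def scan_array(array, checksum_stripes):
    """Return (drive, block_index) for every block whose CRC32 no longer matches."""
    suspects = []
    for drive, blocks in array.items():
        for index, block in enumerate(blocks):
            if zlib.crc32(block) != checksum_stripes[drive][index]:
                suspects.append((drive, index))
    return suspects


array = {
    "disk0": [b"\x00" * 16, b"\x11" * 16],
    "disk1": [b"\x22" * 16, b"\x33" * 16],
}
stripes = {drive: build_checksum_stripe(blocks) for drive, blocks in array.items()}
array["disk1"][0] = b"\x23" + b"\x22" * 15   # simulate a silent single-bit flip
print(scan_array(array, stripes))
```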

Using Q redundancy of RAID-level 6 arrays to detect corruptions
RAID-6 arrays use a scheme called P+Q redundancy, which utilizes Reed-Solomon codes to protect against up to two disk failures. The Q checksum can also be used to verify data integrity and to detect data corruption.
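The detection idea can be sketched as follows: recompute P and Q from the data blocks of a stripe and compare them with the stored values. The field GF(2^8) with reduction polynomial 0x11d and generator 2 is the convention commonly used for RAID-6 and is an assumption here, as are the function names; this is an illustration, not any vendor's implementation.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def syndromes(blocks):
    """Return (P, Q): P is the XOR of all blocks, Q weights block i by 2**i."""
    length = len(blocks[0])
    p, q = bytearray(length), bytearray(length)
    coef = 1
    for block in blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)   # next power of the generator
    return bytes(p), bytes(q)


def stripe_is_consistent(blocks, p, q) -> bool:
    """True when the stored P and Q still match the data blocks."""
    return syndromes(blocks) == (p, q)


data = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
p, q = syndromes(data)
damaged = [data[0], b"\x03\xff", data[2]]   # simulate a silent corruption
print(stripe_is_consistent(data, p, q), stripe_is_consistent(damaged, p, q))
```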

Sun/Oracle’s ZFS file system was designed to combat data corruption by storing redundancy information on multiple levels. RAIDIX’s forward error-correction algorithm analyzes RAID metadata to detect and fix silent corruption while regular disk operations are performed, without performance degradation. In addition, RAIDIX performs background scans for silent errors during lower-activity periods. Unlike ZFS, which ties the information protecting data from corruption to its proprietary file system structures, RAIDIX implements error correction at the block level, which makes it compatible with any OS and any file system chosen by the platform’s administrator.

RAIDIX’s algorithm, developed through properties of RAID-6 checksums, detects and corrects silent data corruption. Because it uses only standard RAID properties and needs no additional storage or read operations, it exhibits performance far exceeding competing approaches.
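The article does not describe RAIDIX’s algorithm in detail. As a hedged illustration of how RAID-6 checksums can in principle locate and repair a single silently corrupted data block, the textbook approach is sketched below: if only block z is wrong, the difference in P equals the error and the difference in Q equals the error multiplied by 2**z in GF(2^8), which identifies z. The field parameters (polynomial 0x11d, generator 2) and all function names are assumptions, and this is not presented as RAIDIX’s method.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def syndromes(blocks):
    """Return (P, Q): P is the XOR of all blocks, Q weights block i by 2**i."""
    length = len(blocks[0])
    p, q = bytearray(length), bytearray(length)
    coef = 1
    for block in blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)
    return bytes(p), bytes(q)


def repair_single_data_block(blocks, p, q):
    """Repair one corrupted data block in place; return its index, or None if clean."""
    p_now, q_now = syndromes(blocks)
    p_diff = bytes(x ^ y for x, y in zip(p, p_now))
    q_diff = bytes(x ^ y for x, y in zip(q, q_now))
    if not any(p_diff) and not any(q_diff):
        return None
    coef = 1
    for z in range(len(blocks)):
        # For a fault confined to block z, q_diff == 2**z * p_diff byte by byte.
        if all(gf_mul(coef, e) == f for e, f in zip(p_diff, q_diff)):
            blocks[z] = bytes(d ^ e for d, e in zip(blocks[z], p_diff))
            return z
        coef = gf_mul(coef, 2)
    raise ValueError("mismatch is not a single data-block corruption")


data = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9]), bytes([10, 11, 12])]
p, q = syndromes(data)
damaged = list(data)
damaged[2] = bytes([7, 0xFF, 9])   # simulate a silent corruption
print(repair_single_data_block(damaged, p, q), damaged == data)
```

The sketch handles only the case where exactly one data block in a stripe is wrong. A corrupted P or Q block, or more than one corrupted data block, raises an error instead of being repaired.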
