
Can Dedup Move From Secondary to Primary Storage?

The opinion of Hifn

After making a huge impact in secondary and backup storage applications, data de-duplication is on the verge of becoming a must-have technology for primary data storage as well, a development that will change the economics of buying and deploying storage, according to Hifn.   

Market developments, enhanced processor technology, and storage budget pressures are converging to make data de-duplication a practical solution for primary storage environments. The nature of primary data is changing to make it more amenable to de-duplication, while dedicated processors capable of performing de-duplication in real time without degrading I/O performance are now available and affordable.

Here is an overview of the key issues for applying data de-duplication to primary data storage:

Virtualization:  
Data de-duplication has not been perceived as a required technology for primary storage, in part because tier 1 storage traditionally does not have anywhere near as much redundancy as tier 2 and tier 3 data volumes. That is changing with server virtualization: running multiple virtual machines on one physical server creates multiple instances of operating systems and applications, increasing the amount of redundant data on expensive primary disk storage. This makes data de-duplication more attractive for minimizing the demand for costly primary storage arrays.

Real Time Data De-Duplication:
One of the limiting factors that has kept data de-duplication out of primary storage is the need to perform de-duplication in real time on production data. Because the technology grew up in backup storage, most existing products are software based, using post-processing to apply de-duplication only after the data has been written to disk. Alternatively, data de-duplication appliances simply do not have the compute power needed to run in real time the processor-intensive hashing algorithms used by most de-duplication technologies. To handle these processes in real time for primary storage environments, a dedicated offload processor optimized for de-duplication is required. The next generation of hardware-accelerated de-duplication solutions will enable in-line, real-time de-duplication.
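The hash-based approach described above can be sketched in a few lines. This is a minimal illustration, not Hifn's implementation: the block size, the in-memory store, SHA-256 as the hashing algorithm, and the two "VM image" byte strings are all assumptions chosen for clarity.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed-size blocks; real systems may chunk differently

def dedup_write(data: bytes, store: dict) -> list:
    """Split data into blocks and store each unique block once, keyed by hash.

    Returns the list of block hashes (the "recipe" to rebuild the data).
    """
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:       # new block: write it once
            store[digest] = block
        recipe.append(digest)         # duplicates cost only a reference
    return recipe

def dedup_read(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its block hashes."""
    return b"".join(store[d] for d in recipe)

# Two hypothetical "VM images" sharing a common base, as on a virtualized host
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))  # 10 distinct blocks
vm1 = base + b"B" * BLOCK_SIZE   # 11 blocks total
vm2 = base + b"C" * BLOCK_SIZE   # 11 blocks, 10 shared with vm1

store = {}
r1 = dedup_write(vm1, store)
r2 = dedup_write(vm2, store)

logical = len(vm1) + len(vm2)                    # 22 blocks written logically
physical = sum(len(b) for b in store.values())   # 12 unique blocks stored
print(f"dedup ratio: {logical / physical:.2f}x")  # 22/12, about 1.83x
```

The hashing in the inner loop is exactly the processor-intensive step the article says must move to dedicated hardware for in-line, real-time use.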
 
Cascading Hardware Cost Savings:
By reducing the need for primary storage capacity, data de-duplication delivers immediate IT cost savings through a corresponding reduction in purchases of costly high-performance primary disk arrays. But the advantages of a smaller primary data footprint cascade throughout a company's IT infrastructure: less primary data in turn lowers the need for secondary and archival storage capacity and puts less stress on the network backbone, delaying or eliminating the need to purchase additional switches, routers, and other network resources.
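The cascade can be made concrete with some back-of-the-envelope arithmetic. The figures below (a 5:1 de-duplication ratio, 100 TB of logical primary data, three retained backup copies) are purely illustrative assumptions, not Hifn data:

```python
# Illustrative only: every figure here is an assumption, not a vendor claim.
dedup_ratio = 5.0      # assumed primary-storage de-duplication ratio
primary_tb = 100.0     # logical primary capacity in TB
backup_copies = 3      # assumed number of retained backup copies

physical_primary = primary_tb / dedup_ratio        # disk actually purchased
backup_tb = physical_primary * backup_copies       # less data to back up, too

print(f"primary: {physical_primary:.0f} TB instead of {primary_tb:.0f} TB")
print(f"backup:  {backup_tb:.0f} TB instead of "
      f"{primary_tb * backup_copies:.0f} TB")
```

Under these assumptions, shrinking the primary tier from 100 TB to 20 TB also shrinks the backup tier from 300 TB to 60 TB, which is the cascade the paragraph describes.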

Complementary Data Reduction & Capacity Optimization Technology:
Applying data de-duplication to primary storage does not eliminate the usefulness of other data reduction technologies, such as data compression. Data de-duplication is a complementary technology that works well with existing data compression technologies, including the industry standard LZS compression algorithm, providing a double dose of data reduction.  
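The layering of the two technologies can be sketched as follows: de-duplication removes repeated blocks first, then compression shrinks each unique block that remains. Python's zlib (DEFLATE) stands in here for the LZS algorithm the article names, since LZS is not in the standard library; the data and store are illustrative assumptions.

```python
import hashlib
import zlib

def store_block(block: bytes, store: dict) -> str:
    """De-duplicate first, then compress each unique block before storing."""
    digest = hashlib.sha256(block).hexdigest()
    if digest not in store:
        store[digest] = zlib.compress(block)  # zlib stands in for LZS here
    return digest

def load_block(digest: str, store: dict) -> bytes:
    """Decompress a stored block back to its original bytes."""
    return zlib.decompress(store[digest])

# Redundant, compressible data: dedup removes the repeats,
# compression shrinks the two blocks that survive.
blocks = [b"log entry: status OK\n" * 200] * 5 + [b"unique payload " * 300]
store = {}
refs = [store_block(b, store) for b in blocks]

logical = sum(len(b) for b in blocks)
physical = sum(len(c) for c in store.values())
print(f"combined reduction: {logical / physical:.0f}x")
```

The two steps multiply: de-duplication collapses the five identical blocks to one, and compression then shrinks both remaining blocks, giving the "double dose" of data reduction described above.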

Energy Cost Savings:
Skyrocketing energy prices are one of the major challenges facing global IT managers, and data de-duplication can play a major role in containing those costs. With its dramatic impact on storage capacity requirements, companies see a corresponding reduction in the need for storage arrays, servers, and the networking equipment that connects them. Eliminating that equipment decreases the power and cooling demands of the data center or computer room, producing substantial energy cost savings.
