Next-Gen De-Dupe and Compression: Object-Based De-Dupe
This is a Press Release edited by StorageNewsletter.com on Fri, December 23rd, 2011
By John Everett, storage business manager, Dell EMEA

The Next Generation of Deduplication and Compression
With data growing at an unprecedented rate, organisations of all sizes are looking to maximise the efficiency of how they store and manage data throughout its entire lifecycle. This ongoing challenge has led to the proliferation of technologies such as thin provisioning, automated tiering and scale-out storage, which can deliver both Capex and Opex savings through smart resource management for better utilisation rates, increased energy efficiency and simplified administration.
Now, advances in deduplication and compression technologies are allowing organisations to push utilisation rates even higher through what Dell calls 'content-aware storage optimisation' - also known as object-based deduplication - shrinking meaningful amounts of data for significant cost and management savings.
At a basic level, deduplication is the process of eliminating duplicate copies of data and replacing them with pointers to a single copy. It helps organisations reach two primary goals: to reduce the storage capacity needed to hold their data, and to decrease the amount of data in flight during backup or replication. As it stands, the dominant use case for deduplication is backup storage, because of the amount of static data that organisations have to back up. Nevertheless, deduplication technology has spread to other data centre storage platforms such as NAS.
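The pointer idea can be illustrated with a minimal sketch. It assumes a toy in-memory store keyed by SHA-256 digest; the class name `SingleCopyStore` and the file names are hypothetical and are not taken from any Dell product.

```python
import hashlib


class SingleCopyStore:
    """Toy store: each distinct payload is kept once; names hold pointers (digests)."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes (the single stored copy)
        self.pointers = {}  # name -> digest (the pointer to that copy)

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the payload only if unseen
        self.pointers[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.pointers[name]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = SingleCopyStore()
report = b"quarterly report " * 1000
store.put("monday/report.doc", report)
store.put("tuesday/report.doc", report)  # identical copy: only a pointer is added
store.put("tuesday/notes.txt", b"new notes")
```

Three names are written, but only two payloads are stored, which is the capacity saving the paragraph above describes.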
Some deduplication processes examine files in their entirety to determine whether they are duplicates, which is referred to as file-level deduplication, or 'Single Instance Storage,' while others break the data into blocks and try to find duplicates among the blocks, which is referred to as block-level deduplication. Block-level deduplication typically provides more granularity and a greater reduction in the amount of utilised storage capacity compared with file-level deduplication. This is particularly appealing from a backup perspective. Both types of deduplication are commonly used and offered today; however, there is a growing appreciation that these approaches may not be sufficient to handle the growth of big data in verticals such as oil and gas, life sciences, and media and entertainment.
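The difference in granularity can be sketched as follows. This is an illustration only: it assumes fixed-size blocks, and the `BLOCK_SIZE` of 4096 bytes and the helper names are hypothetical choices, not figures from the article.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption for this sketch


def file_level_unique_bytes(files):
    """Bytes stored when whole files are the unit of deduplication."""
    seen = {}
    for data in files:
        seen.setdefault(hashlib.sha256(data).digest(), len(data))
    return sum(seen.values())


def block_level_unique_bytes(files, block_size=BLOCK_SIZE):
    """Bytes stored when fixed-size blocks are the unit of deduplication."""
    seen = {}
    for data in files:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            seen.setdefault(hashlib.sha256(block).digest(), len(block))
    return sum(seen.values())


def make_block(n):
    """Build one block filled with a single byte value, so blocks are distinct."""
    return bytes([n]) * BLOCK_SIZE


# Two versions of an eight-block file that differ in exactly one block.
original = b"".join(make_block(n) for n in range(8))
edited = original[:3 * BLOCK_SIZE] + make_block(200) + original[4 * BLOCK_SIZE:]

file_bytes = file_level_unique_bytes([original, edited])
block_bytes = block_level_unique_bytes([original, edited])
```

At file level the two versions count as different files and both are stored in full (16 blocks). At block level only the one changed block is new, so 9 blocks are stored.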
A more intelligent form of deduplication has emerged in the form of object-based deduplication. Now, organisations can take advantage of next-generation technology that is tailored to their particular vertical. This can be achieved with a solution that bridges the gap between applications and native storage platforms to optimise the way data is stored. This optimisation technology identifies how a given file is structured, breaks it down into component sub-files and then selects the most effective algorithm for the targeted file from a library of more than 100 different compression algorithms. Even if the file has never been identified before, and there is no content-specific compressor, the technology will infer information about the structure and nature of the contents to select the most effective data-reduction algorithm. By understanding the layout of specific application files - such as those of an email programme or a digital image - IT can make intelligent decisions about how to de-dupe and compress that data for optimal storage.
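Choosing a compressor per piece of content can be sketched with a much simpler stand-in. The example below tries three general-purpose compressors from the Python standard library and keeps the smallest result. This brute-force comparison is an assumption made for illustration; it is not the content-aware selection logic the article describes, and the names `COMPRESSORS`, `DECOMPRESSORS` and `pick_best` are hypothetical.

```python
import bz2
import lzma
import zlib

# A tiny "library" of compressors; the article's library has more than 100.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

DECOMPRESSORS = {
    "zlib": zlib.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
    "stored": lambda b: b,  # data kept as-is when no compressor helps
}


def pick_best(data: bytes):
    """Return (name, output) for the smallest result, or ('stored', data)."""
    best_name, best_out = "stored", data
    for name, compress in COMPRESSORS.items():
        out = compress(data)
        if len(out) < len(best_out):
            best_name, best_out = name, out
    return best_name, best_out


mail_like = b"To: user@example.com\nSubject: weekly status\n" * 500
name, packed = pick_best(mail_like)
restored = DECOMPRESSORS[name](packed)
```

Recording the chosen name alongside the output is what lets the data be restored later with the matching decompressor.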
The central components of Dell's data-processing system include two types of content-aware algorithms and a neural net framework for testing and selecting different compressors for best run-time efficiency. The two types of content-aware algorithms are de-layering algorithms, which dissect files to identify the contiguous sub-objects, and data-shrinking algorithms, which include deduplication and compression. These custom compressors are better able to shrink, by meaningful amounts, the kinds of data that plague specific verticals.
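The two-stage idea of de-layering followed by data shrinking can be sketched as below. The sketch assumes a ZIP archive as a stand-in for a compound application file, because the Python standard library can open it. The helper names and the sample documents are hypothetical, and the article does not say which file formats Dell's de-layering algorithms handle.

```python
import hashlib
import io
import zipfile


def make_container(members):
    """Build an in-memory ZIP container from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def delayer(container):
    """De-layering step: yield (member_name, raw_bytes) for each sub-object."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        for info in zf.infolist():
            yield info.filename, zf.read(info)


def unique_subobjects(containers):
    """Shrinking step: keep one copy of each distinct sub-object."""
    seen = {}
    for container in containers:
        for _name, data in delayer(container):
            seen.setdefault(hashlib.sha256(data).hexdigest(), data)
    return seen


logo = bytes(range(256)) * 64  # stand-in for an embedded image
doc_a = make_container({"text.xml": b"<p>first memo</p>", "media/logo.bin": logo})
doc_b = make_container({"text.xml": b"<p>second memo</p>", "media/logo.bin": logo})

subobjects = unique_subobjects([doc_a, doc_b])
```

The two containers differ as whole files, so file-level deduplication would store both in full. After de-layering, the shared embedded object is recognised and stored once, leaving three sub-objects instead of four.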
To reap the full benefits of deduplication, this technology should apply seamlessly across the entire IT infrastructure. To this end, Dell is rolling out storage optimisation technology across a variety of solutions for primary storage, archive, and backup. Deduplication and compression will be integrated in the Dell Scalable File System and Dell Object storage; once data is deduplicated, it can move in a deduplicated state from one storage system to another. For example, data that is deduplicated on Dell primary storage solutions can be backed up without rehydration to Dell backup storage, and can then be replicated in a deduped state over a LAN/WAN to a Dell backup storage replica. It is this end-to-end optimisation of data from the server to storage to the cloud that brings the most value to an end-user organisation in a data-heavy world.
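Moving data in a deduplicated state can be sketched as a transfer that sends only the chunks the target does not already hold, plus the list of digests needed to reassemble the file. This is a minimal in-memory illustration under assumed names (`DedupStore`, `replicate`, fixed 4096-byte chunks); it does not represent how Dell's products exchange data.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative chunk size, an assumption for this sketch


class DedupStore:
    """Toy store holding unique chunks plus a per-file recipe of chunk digests."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.recipes = {}  # file name -> ordered list of digests

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            piece = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.recipes[name])


def replicate(src, dst, name):
    """Copy a file without rehydrating it; return the chunk bytes transferred."""
    recipe = src.recipes[name]
    sent = 0
    for digest in dict.fromkeys(recipe):  # visit each distinct digest once
        if digest not in dst.chunks:
            dst.chunks[digest] = src.chunks[digest]
            sent += len(src.chunks[digest])
    dst.recipes[name] = list(recipe)
    return sent


primary, backup = DedupStore(), DedupStore()
v1 = b"".join(bytes([n]) * CHUNK_SIZE for n in range(8))
v2 = v1 + bytes([99]) * CHUNK_SIZE  # second version appends one new chunk
primary.write("v1", v1)
primary.write("v2", v2)

sent_v1 = replicate(primary, backup, "v1")
sent_v2 = replicate(primary, backup, "v2")
```

The first transfer carries all eight chunks. The second carries only the one chunk the backup store has not seen, even though the second file is nine chunks long.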
Although deduplication and compression technology has been around for a few years, it is still evolving rapidly and is here to stay. To be truly effective in today's business world as well as tomorrow's, organisations should look to a solution that adheres to three main tenets:

- to be transparent to the end user and applications, meaning that there should not be any performance delays upon retrieval;
- to be customised to specific verticals with more and better algorithms and logic; and
- to be utilised end-to-end across the entire workflow to ensure optimisation of the overall IT environment.
