
All Firms in De-Dupe

But today, the main question is: "Who is not involved?"

By Jean-Jacques Maleval on 2012.05.28

DOCUMENT: (About) All Firms in De-Dupe

In data reduction, the first technology was, and continues to be, lossless compression. It has been used since at least 1990 for tape, HDD and LAN/WAN transmission, through software or chips, to reduce the size of files. De-dupe followed from 2000.

We do not cover here the algorithms used to compress sound, images and video.
Data Compression
According to wolframscience.com, modern work on data compression began in the late 1940s with the development of information theory. In 1949 Claude Shannon and Robert Fano devised a systematic way to assign codewords based on the probabilities of blocks. An optimal method for doing this was then found by David Huffman in 1951. In the mid-1970s, the idea emerged of dynamically updating Huffman codewords based on the actual data encountered. And in the late 1970s, with online storage of text files becoming common, software compression programs began to be developed, almost all based on adaptive Huffman coding.
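To make Huffman's idea concrete, here is a minimal sketch of static (non-adaptive) Huffman coding: it repeatedly merges the two least frequent subtrees, so frequent symbols end up with shorter codewords. The function names are hypothetical, and the adaptive variants mentioned above, which update the table as data arrives, are not shown.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code table (symbol -> bit string) from symbol counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: codeword suffix so far}).
    # The integer tie-breaker keeps heapq from ever comparing two dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, as Huffman's method prescribes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text: str, table: dict[str, str]) -> str:
    return "".join(table[ch] for ch in text)


def huffman_decode(bits: str, table: dict[str, str]) -> str:
    # Codes are prefix-free, so the first match while scanning is unambiguous.
    inverse = {c: s for s, c in table.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    table = huffman_code(text)
    bits = huffman_encode(text, table)
    print(table)
    print(len(bits), "bits vs", 8 * len(text), "bits as 8-bit ASCII")
```

For "abracadabra" the sketch needs 23 bits instead of 88 with 8-bit ASCII, because the frequent "a" gets a 1-bit codeword.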

In 1977 Abraham Lempel and Jacob Ziv suggested the basic idea of pointer-based encoding, known as LZ (Lempel–Ziv). In the mid-1980s, following work by Terry Welch, the so-called LZW (Lempel–Ziv–Welch) algorithm rapidly became the method of choice for most general-purpose compression systems. It was used in programs such as PKZIP, as well as in hardware devices such as modems. Also noteworthy is Deflate, which combines LZ77 with Huffman coding and is the standard Zip method.
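A minimal LZW sketch shows the pointer-based idea: the compressor grows a dictionary of byte strings it has already seen and emits the code of the longest known match, and the decompressor rebuilds the same dictionary without it ever being transmitted. This version assumes an unbounded dictionary and returns codes as Python integers; real implementations cap the code width and reset or freeze the table.

```python
def lzw_compress(data: bytes) -> list[int]:
    # Start with one code per possible byte value (0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out: list[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                      # keep extending the current match
        else:
            out.append(dictionary[w])   # emit the longest known prefix
            dictionary[wc] = next_code  # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code defined by the very step that produced it (e.g. "aaa...").
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes")
    assert lzw_decompress(codes) == sample
```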

Among the first companies involved, in 1990, we found InfoChip Systems in Santa Clara, CA, and Hardware Architecture in Moscow, ID. One of the leaders at that time was Stac Electronics in Carlsbad, CA. There were also some proprietary methods to reduce data on tape (HP DCLZ for QIC and DAT, IBM IDRC for 3480 cartridges, etc.).

With compression, the average reduction is no more than 2X. De-dupe has completely changed the storage world, with ratios of 10X to 100X depending on the data. Note that de-dupe and compression can be used together.
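The sketch below compares the two techniques, and their combination, on the same stream. It assumes fixed-size 4 KiB chunks fingerprinted with SHA-256 for de-dupe and zlib (DEFLATE) for compression; both choices, the function names and the synthetic "backup" data are illustrative assumptions, not any vendor's method.

```python
import hashlib
import random
import zlib


def reduction_ratios(data: bytes, chunk_size: int = 4096) -> tuple[float, float, float]:
    """Return (compression only, dedupe only, dedupe then compression) ratios."""
    if not data:
        raise ValueError("need non-empty data")
    # Compression alone: one DEFLATE pass over the whole stream.
    compressed = len(zlib.compress(data))
    # Dedupe alone: keep one copy of each distinct fixed-size chunk.
    unique: dict[bytes, bytes] = {}
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    deduped = sum(len(c) for c in unique.values())
    # Both: compress only the unique chunks that dedupe kept.
    both = sum(len(zlib.compress(c)) for c in unique.values())
    n = len(data)
    return n / compressed, n / deduped, n / both


def synthetic_backups(copies: int = 10, size: int = 64 * 1024, seed: int = 42) -> bytes:
    """Hypothetical test data: `copies` identical 'full backups' of random lowercase text."""
    rng = random.Random(seed)
    block = bytes(rng.choices(range(ord("a"), ord("z") + 1), k=size))
    return block * copies


if __name__ == "__main__":
    comp, dedupe, both = reduction_ratios(synthetic_backups())
    print(f"compression {comp:.2f}X, dedupe {dedupe:.2f}X, both {both:.2f}X")
```

On this data compression alone stays under 2X: DEFLATE only looks back 32 KiB, so it cannot see repeats 64 KiB apart, while chunk fingerprints match regardless of distance. De-dupe alone gives exactly 10X here, and compressing the surviving unique chunks raises the combined ratio further.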

Who invented de-dupe?
That's a difficult question. We have never heard of a company claiming to be the first.

The pioneers seem to be Avamar, Data Domain, Diligent, ExaGrid, FilePool, Permabit, Riverbed and Rocksoft, at the beginning of the century.

Avamar was established in 1999 and had its first source de-dupe product on the market in 2002. The company was bought by EMC in 2006 for $165 million.

Data Domain was founded in 2001 and designed a D2D de-dupe appliance. After raising $41 million in funding, it raised $111 million in a 2007 IPO and was then acquired by EMC for a huge $2.2 billion in 2009.

Israeli start-up Diligent, a specialist in de-dupe for secondary storage, was acquired by IBM in May 2008 for $200 million.

ExaGrid Systems in Westborough, MA, was founded in 2002. Formerly named Inspection Systems, it was created by former employees of HighGround Systems and now has 1,200 customers and 4,000 installed systems.

Belgian firm FilePool (formerly Wave Research), co-founded by Paul Carpentier, now CTO of Caringo, was without question the pioneer in CAS software. The start-up was taken over by EMC in May 2001 for $50 million to build the Centera, whose content-derived addresses allow only one protected copy of any content to be stored, no matter how many times it is used. We found a patent filed by Carpentier and others as early as 1998.
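The idea of a content-derived address can be sketched in a few lines: the address is computed from the bytes themselves, so storing the same content twice lands on the same address and only one copy is kept. This toy store is a hypothetical illustration, not Centera's design; in particular, the use of SHA-256 as the addressing hash and the reference counter are assumptions.

```python
import hashlib


class ContentAddressedStore:
    """Toy CAS: an object's address is derived from (a hash of) its own bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # address -> the single stored copy
        self._refs: dict[str, int] = {}       # address -> number of times stored

    def put(self, content: bytes) -> str:
        # Assumed addressing hash: SHA-256 of the content.
        address = hashlib.sha256(content).hexdigest()
        # Identical content always maps to the same address, so a second
        # put() only bumps the reference count instead of storing a new copy.
        self._objects.setdefault(address, content)
        self._refs[address] = self._refs.get(address, 0) + 1
        return address

    def get(self, address: str) -> bytes:
        content = self._objects[address]
        # Re-hashing on read lets the store detect silent corruption.
        if hashlib.sha256(content).hexdigest() != address:
            raise IOError(f"object {address} failed verification")
        return content

    def references(self, address: str) -> int:
        return self._refs.get(address, 0)

    def stored_objects(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ContentAddressedStore()
    for _ in range(3):
        addr = store.put(b"same e-mail attachment")
    print(addr, store.references(addr), store.stored_objects())
```

Storing the same attachment three times yields one stored object with three references.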

Permabit (Cambridge, MA) was created in 2000 and still exists, with OEMs such as HDS, LSI, Overland, StoneFly and Violin Memory.

Riverbed was founded in May 2002 in order to design an appliance for WAN optimization.

Rocksoft, a small de-dupe software start-up born in 2002 in Adelaide, Australia, was bought by ADIC in 2006 for $63 million. Quantum then got the technology through its acquisition of ADIC, a deal it made mainly to get ADIC's tape business. Today, though, de-dupe is a flagship technology for Quantum, one of the first major players in D2D backup subsystems. Quantum says it has been issued 9 U.S. patents on de-dupe, with 42 more pending.

De-Dupe Process
In the de-dupe process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. This can happen in-line, as data flows, or as a post-process, after the data has been written to disk. The operation can be done on blocks or on files, in software or, faster, in a dedicated hardware appliance.
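A minimal sketch of in-line, block-level de-dupe: each incoming fixed-size block is fingerprinted before it is stored, only blocks with an unseen fingerprint are kept, and each file is recorded as a list of fingerprints (a "recipe") used to rebuild it on read. The class name, the 4 KiB block size and SHA-256 fingerprints are assumptions for illustration. Many products use variable-size, content-defined chunking instead, so that inserting a few bytes does not shift every later block boundary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupeStore:
    chunk_size: int = 4096  # assumed fixed block size
    # fingerprint -> the one stored copy of that block
    chunks: dict[bytes, bytes] = field(default_factory=dict)
    # file name -> ordered list of block fingerprints (the file's "recipe")
    files: dict[str, list[bytes]] = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> int:
        """De-dupe blocks as they arrive; return the number of new bytes stored."""
        recipe: list[bytes] = []
        new_bytes = 0
        for off in range(0, len(data), self.chunk_size):
            block = data[off:off + self.chunk_size]
            fp = hashlib.sha256(block).digest()
            if fp not in self.chunks:
                self.chunks[fp] = block  # first time this block is seen
                new_bytes += len(block)
            recipe.append(fp)            # duplicates cost only a reference
        self.files[name] = recipe
        return new_bytes

    def read(self, name: str) -> bytes:
        # Reassemble the file from its recipe.
        return b"".join(self.chunks[fp] for fp in self.files[name])


if __name__ == "__main__":
    store = DedupeStore()
    monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    tuesday = b"A" * 4096 + b"X" * 4096 + b"C" * 4096  # one block changed
    print(store.write("monday", monday), store.write("tuesday", tuesday))
```

The second "backup" stores only the one changed block; a post-process variant would run the same loop later, over data already written to disk.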

[Figure: the de-dupe process (Source: Citrix/NetApp)]
The basic idea is simple: when you transfer data between two locations, check which chunks have already been transmitted and replace them with a small index. In practice, it is more complicated. Each firm has its own algorithm, and there is no standardization, so de-dupe is perfect for backup but risky for archiving, where data must remain readable for many years.
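The transfer case can be sketched as a tiny protocol: the sender fingerprints each chunk and, if the receiver already holds it, sends only the fingerprint as a small index; the receiver keeps a cache keyed by fingerprint to rebuild the stream. The names, the message format, the 4 KiB chunk size and SHA-256 are hypothetical, and the sketch assumes the receiver never evicts its cache, so the sender's view of it stays accurate.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size


def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()  # 32-byte index


class Sender:
    def __init__(self) -> None:
        self.sent: set[bytes] = set()  # fingerprints the receiver already holds

    def encode(self, data: bytes) -> list[tuple[str, bytes]]:
        messages: list[tuple[str, bytes]] = []
        for off in range(0, len(data), CHUNK):
            chunk = data[off:off + CHUNK]
            fp = fingerprint(chunk)
            if fp in self.sent:
                messages.append(("ref", fp))      # small index replaces the chunk
            else:
                messages.append(("data", chunk))  # first transmission
                self.sent.add(fp)
        return messages


class Receiver:
    def __init__(self) -> None:
        self.cache: dict[bytes, bytes] = {}

    def decode(self, messages: list[tuple[str, bytes]]) -> bytes:
        out = []
        for kind, payload in messages:
            if kind == "data":
                self.cache[fingerprint(payload)] = payload
                out.append(payload)
            else:
                out.append(self.cache[payload])
        return b"".join(out)


def wire_bytes(messages: list[tuple[str, bytes]]) -> int:
    # Payload bytes only; message framing is ignored in this sketch.
    return sum(len(p) for _, p in messages)


if __name__ == "__main__":
    data = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    tx, rx = Sender(), Receiver()
    for attempt in (1, 2):
        msgs = tx.encode(data)
        assert rx.decode(msgs) == data
        print(f"transfer {attempt}: {wire_bytes(msgs)} bytes on the wire")
```

The first transfer sends all 12,288 bytes; resending the same data costs only three 32-byte indexes.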

In the list below, we cannot guarantee that all of these companies use their own algorithms, and some have only patents, no products.

Today the question is less "Which companies are involved?" than "Which storage companies do not have de-dupe?". The latter sign OEM contracts with others to implement de-dupe, a technology now absolutely necessary to sell backup, VTL and even WAN solutions, and probably, in the future, primary storage systems, letting users reduce their number of HDDs and more costly SSDs.

Note: after the name of a firm, a "/" precedes the company or companies acquired for de-dupe.

(ABOUT) ALL COMPANIES IN DE-DUPE

3X Systems
Acronis
Altaro
American Megatrends India
AppAssure
ArcMail Technology
Arkeia/Kadena Systems
Atempo
Attix5
Atlantis Computing
Bacula
balesio
Barracuda Networks
BitSpeed
BluPointe
BridgeSTOR
Brocade (patent)
CA/XOsoft
Caminosoft
Cavium
Clearpace Software
CloudBerry
Code 24 Software
Code42
Cofio
CommVault
Comodo
Copiun
Ctera Networks
Data Storage Group
Datacastle
dataStor
Dell/Ocarina
Digitili
Druva Software
Dynamic Solutions International
EMC/Avamar/Data Domain
Eversync
ExaGrid
Exar/Hifn
FalconStor
Fujitsu
Genie9
GFI Software
GreenBytes
Hitachi (patent)
HP
IASO
iB3
IBM/Tivoli/Storwize/Diligent
IceWEB
id7
Imation/Nine Technology
Infineta Systems
Infortrend
InQuinox
InterCloud Systems/VaultLogix
Iron Mountain
Ixilix
KeepItSafe
Lortu Software
Luminex
Maxta
Maxtronix
Microsoft
Nakivo
Navisite
NEC
NetApp
NetJapan
NetLogic
Nexenta
Nexsan
Nimble Storage
Nine Technology
NovaStor
OnApp
Opendedup (open source)
Oracle/Sun ZFS
Overland/Tavata Software
Panzura
Parsec Labs
Permabit
PHD Virtual
Pixel8 Networks (patent)
Pure Storage
QUADStor
Quantum/ADIC/Rocksoft
Quest Software/BakBone
QuorumSoft
RainStor
Rebit
Revinetix
Riverbed
ROBObak
Sepaton
SGI/Copan
Silver Peak
Spectra Logic
Storagedata
Storageflex
Symantec/Veritas/Datacenter Technologies
Tandberg Data (dataStor)
Tegile
Teradata
Tilana
TwinStrata
Unitrends
Vaulten
Vaultize
Veeam
VeloBit
Venyu
WhipTail Tech
Zetta
