DOCUMENT: Latest Patents on De-Dupe

Here is a list of the most recent patents and patent applications (to be confirmed) on de-duplication technology.

Companies involved in this list:

Barracuda Networks
Brocade
Dell
Exar
Hitachi
GreenBytes
IBM
Microsoft
NetApp
Nine Technology
Pixel8 Networks
Quantum
Symantec

PATENTS

US Patent #8,086,799, December 27, 2011
Scalable deduplication of stored data

Inventors: Mondal; Shishir (Bangalore, IN), Killamsetti; Praveen (Bangalore, IN)
Assignee: NetApp, Inc. (Sunnyvale, CA)

In a method and apparatus for scalable deduplication, a data set is
partitioned into multiple logical partitions, where each partition can
be deduplicated independently. Each data block of the data set is
assigned to exactly one partition, so that any two or more data blocks
that are duplicates of each other are always assigned to the same logical
partition. A hash algorithm generates a fingerprint of each data block
in the volume, and the fingerprints are subsequently used to detect
possible duplicate data blocks as part of deduplication. In addition,
the fingerprints are used to ensure that duplicate data blocks are sent
to the same logical partition, prior to deduplication. A portion of the
fingerprint of each data block is used as a partition identifier to
determine the partition to which the data block should be assigned. Once
blocks are assigned to partitions, deduplication can be done on
partitions independently.
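
The partitioning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: SHA-256 as the fingerprint, the first four fingerprint bytes as the partition identifier, and the helper names (fingerprint, partition_id, partition_blocks, dedupe_partition) are all hypothetical choices. A real system would also verify candidate duplicates byte by byte, since the abstract only speaks of "possible" duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the unspecified hash algorithm.
    return hashlib.sha256(block).digest()


def partition_id(fp: bytes, num_partitions: int) -> int:
    # A portion of the fingerprint (here its first 4 bytes) selects the partition,
    # so identical blocks always land in the same partition.
    return int.from_bytes(fp[:4], "big") % num_partitions


def partition_blocks(blocks, num_partitions):
    partitions = defaultdict(list)
    for index, block in enumerate(blocks):
        fp = fingerprint(block)
        partitions[partition_id(fp, num_partitions)].append((fp, index))
    return partitions


def dedupe_partition(entries):
    """Return {duplicate_block_index: first_block_index} for one partition."""
    seen = {}
    duplicates = {}
    for fp, index in entries:
        if fp in seen:
            duplicates[index] = seen[fp]
        else:
            seen[fp] = index
    return duplicates


blocks = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
partitions = partition_blocks(blocks, num_partitions=4)

# Each partition is processed on its own; no cross-partition lookups are needed.
duplicate_map = {}
for entries in partitions.values():
    duplicate_map.update(dedupe_partition(entries))
```

Because the partition is derived from the fingerprint itself, the loop over partitions could run in parallel or at different times without missing any duplicate pair.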

US Patent #8,074,049, December 6, 2011
Online backup system with global two staged deduplication without using an indexing database

Inventors: Gelson; Thomas M. (Marion, MA), Stoev; Alexander (Woodbridge, CA)
Assignee: Nine Technology, LLC (Middleboro, MA)

An encryption for a distributed global online backup system with global
two-stage deduplication in the absence of an indexing database where
data blocks are encrypted using their SHA-1 signatures as encryption
keys.
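
The idea of keying each block's encryption on its own SHA-1 signature can be illustrated as follows. The cipher here is a deliberately simple XOR keystream built from SHA-1, chosen only so the sketch runs with the standard library; it is an assumption for illustration, is not secure, and is not the cipher used in the patent. The function names are hypothetical.

```python
import hashlib


def block_key(block: bytes) -> bytes:
    # The block's SHA-1 signature doubles as its encryption key.
    return hashlib.sha1(block).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-1 over key plus a counter. NOT a secure cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha1(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    stream = _keystream(key, len(block))
    return bytes(a ^ b for a, b in zip(block, stream))


# XOR with the same keystream undoes the encryption.
toy_decrypt = toy_encrypt

first = b"same block contents"
second = b"same block contents"
cipher_a = toy_encrypt(first, block_key(first))
cipher_b = toy_encrypt(second, block_key(second))
```

Since the key is derived from the content, identical plaintext blocks produce identical ciphertext, which is what lets encrypted blocks still be deduplicated globally.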

US Patent #8,074,043, December 6, 2011
Method and apparatus to recover from interrupted data streams in a deduplication system

Inventors: Zeis; Michael John (Minneapolis, MN)
Assignee: Symantec Corporation (Mountain View, CA)

Detection and proper deduplication of a re-started data stream in a
segmentation analysis-based deduplication system are provided by
retaining information about a previous data stream and using that
information when performing segmentation of the re-started data stream.
Information such as a segment size associated with a last data object
received in the previous data stream and a record of how much data was
present in the last segment associated with the previous data stream is
retained. The retained segment size information is used to set a first
data object segment size of the re-started data stream, and the size of
last segment information is used to determine how much information
should be put in the first segment associated with the re-started data
stream in order to maintain proper alignment of the remainder of the
segments for the first data object in the re-started data stream for
deduplication.

U.S. Patent #8,051,252, November 1, 2011
Method and apparatus for detecting the presence of subblocks in a reduced-redundancy storage system

Inventors: Williams; Ross Neil (Adelaide, AU)
Assignee: Quantum Corporation (San Jose, CA)

A method that includes, by one or more computer systems, determining a
data retrieval rate policy based on at least one data retrieval rate
parameter. The method also includes determining at least one storage
subsystem performance parameter. The method further includes determining
a fragmentation value based on the data retrieval rate policy and the
at least one storage subsystem performance parameter. The method
additionally includes determining a storage subsystem fragmentation of a
first data object. The storage subsystem fragmentation includes
fragmenting the first data object into a plurality of first data object
fragments. The method also includes deduplicating the first data object
based on the fragmentation value and the storage subsystem
fragmentation.

US Patent #8,060,715, November 15, 2011
Systems and methods for controlling initialization of a fingerprint cache for data deduplication

Inventors: Cremelie; Nick (Ghent, BE), Stougie; Bastiaan (Melle, BE)
Assignee: Symantec Corporation (Mountain View, CA)

A computer-implemented method for controlling initialization of a
fingerprint cache for data deduplication associated with a
single-instance-storage computing subsystem may comprise: 1) detecting a
request to store a data selection to the single-instance-storage
computing subsystem, 2) leveraging a client-side fingerprint cache
associated with a previous storage of the data selection to the
single-instance-storage computing subsystem to initialize a new
client-side fingerprint cache, and 3) utilizing the new client-side
fingerprint cache for data deduplication associated with the request to
store the data selection to the single-instance-storage computing
subsystem. Other exemplary methods of controlling initialization of a
fingerprint cache for data deduplication, as well as corresponding
exemplary systems and computer-readable-storage media, are also
disclosed.
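
A rough sketch of the three steps above: a client keeps the fingerprints from its previous backup and uses them to seed the cache for the next one, so unchanged data never triggers a server lookup. The class SingleInstanceStore, the backup function, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class SingleInstanceStore:
    """Hypothetical stand-in for the server-side single-instance store."""

    def __init__(self):
        self.chunks = {}
        self.queries = 0  # counts fingerprint lookups sent by clients

    def has(self, fp: str) -> bool:
        self.queries += 1
        return fp in self.chunks

    def put(self, fp: str, chunk: bytes) -> None:
        self.chunks[fp] = chunk


def backup(chunks, store, previous_cache=None):
    # Step 2: initialize the new client-side cache from the previous one.
    cache = set(previous_cache) if previous_cache else set()
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp in cache:
            continue  # Step 3: known fingerprint, no server round trip
        if not store.has(fp):
            store.put(fp, chunk)
        cache.add(fp)
    return cache


store = SingleInstanceStore()
first_cache = backup([b"a", b"b", b"c"], store)
queries_first = store.queries
second_cache = backup([b"a", b"b", b"c", b"d"], store, previous_cache=first_cache)
queries_second = store.queries - queries_first
```

In this run the first backup queries the store three times, while the second backup, seeded with the earlier cache, only asks about the one new chunk.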

US Patent #8,055,618, November 8, 2011
Data deduplication by separating data from meta data

Inventors: Anglin; Matthew J. (Tucson, AZ)
Assignee: IBM Corporation (Armonk, NY)

Provided are techniques for data deduplication. A chunk of data and a
mapping of boundaries between file data and meta data in the chunk of
data are received. The mapping is used to split the chunk of data into a
file data stream and a meta data stream and to store file data from the
file data stream in a first file and to store meta data from the meta
data stream in a second file, wherein the first file and the second file
are separate files. The file data in the first file is deduplicated.
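
As a minimal sketch of the splitting step, assume the boundary mapping arrives as a list of (offset, length, kind) tuples; that layout and the function name split_chunk are hypothetical, since the abstract does not define the mapping format. The sketch returns the two streams rather than writing the two separate files.

```python
def split_chunk(chunk: bytes, mapping):
    """Split a chunk into (file_data, meta_data) using a boundary mapping.

    mapping: iterable of (offset, length, kind), kind being "data" or "meta".
    """
    file_data = bytearray()
    meta_data = bytearray()
    for offset, length, kind in mapping:
        piece = chunk[offset:offset + length]
        if kind == "meta":
            meta_data.extend(piece)
        else:
            file_data.extend(piece)
    return bytes(file_data), bytes(meta_data)


chunk = b"HDR1payloadAHDR2payloadB"
mapping = [
    (0, 4, "meta"),
    (4, 8, "data"),
    (12, 4, "meta"),
    (16, 8, "data"),
]
file_stream, meta_stream = split_chunk(chunk, mapping)
```

Only file_stream would then go through deduplication, which keeps per-backup metadata such as headers from disturbing otherwise identical file data.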

US Patent #8,050,251, November 1, 2011
VPN optimization by defragmentation and deduplication apparatus and method

Inventors: Ongole; Subrahmanyam (Cupertino, CA), Srinivasan; Sridhar (Bangalore, IN)
Assignee: Barracuda Networks, Inc. (Campbell, CA)

An apparatus for optimizing a virtual private network operates by
defragmenting and deduplicating transfer of variable sized blocks. A
large data object is converted to a plurality of data paragraphs by a
fingerprinting method. Each data paragraph is cached and hashed. The
hashes are transmitted between a primary and a satellite apparatus. Only
data paragraphs which are not cached at both the primary and satellite
are transferred. The data object is integrated from data paragraphs
stored in cache and transmitted to its destination IP address.

PATENT APPLICATIONS

US Patent Application #20110320865, December 29, 2011
Deduplication in a hybrid storage environment

Inventors: Jain; Bhushan P.; (Maharashtra, IN); Musial; John
G.; (Newburgh, NY); Nagpal; Abhinay R.; (Maharashtra, IN); Patil;
Sandeep R.; (Elmsford, NY)
Assignee: IBM Corp., Armonk, NY

Deduplication in a hybrid storage environment includes determining
characteristics of a first data set. The first data set is identified as
redundant to a second data set and the second data set is stored in a
first storage system. The deduplication also includes mapping the
characteristics of the first data set to storage preferences, the
storage preferences specifying storage system selections for storing
data sets based upon attributes of the respective storage systems. The
deduplication further includes storing, as a persistent data set, one of
the first data set and the second data set in one of the storage
systems identified from the mapping.

US Patent Application #20110307447, December 15, 2011
Inline wire speed deduplication system

Inventors: Sabaa; Amr; (Sunnyvale, CA); Kumar; Pashupati;
(San Jose, CA); Vu; Bao; (San Ramon, CA); Parekh; Tarak; (San Jose, CA);
Kuriakose; Poulo; (Cupertino, CA); Guntaka; Vidyasagara Reddy; (San
Jose, CA); Hans; Madhsudan; (San Ramon, CA); Ko; Kung-Ling; (Union City,
CA)
Assignee: Brocade Communications Systems, Inc., San Jose, CA

Systems for performing inline wire speed data deduplication are
described herein. Some embodiments include a device for inline data
deduplication that includes one or more input ports for receiving an
input data stream containing duplicates, one or more output ports for
providing a data deduplicated output data stream, and an inline data
deduplication engine coupled to said one or more input ports and said
one or more output ports to process input data containing duplicates
into output data which is data deduplicated, said inline data
deduplication engine having an inline data deduplication bandwidth of at
least 4 Gigabytes per second.

U.S. Patent #2011/0225385, December 15, 2011
Index Entry Eviction

Inventor: Spackman, Stephen; San Jose CA
Assignee: Quantum Corporation, San Jose, CA

Systems, methods embodied on computer-readable media, and other
embodiments associated with index entry eviction are described. One
example method includes selecting an index entry for eviction from a
bucket of index entries based on a time value, a utility value, and a
precedence value. A precedence value may be a value associated with an
index entry that is static over time. Additionally, results of a
function that compares two precedence values may be static over time.
The example method may also include providing an index entry identifier
that identifies the index entry.
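
The abstract names three inputs to the eviction choice but not how they are combined. The sketch below assumes one plausible ordering (lowest utility first, then oldest time value, then the static precedence value as a stable tie-breaker); that ordering, the IndexEntry fields, and the function name are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    entry_id: str
    last_used: int   # time value (e.g. a logical clock tick)
    utility: int     # utility value (e.g. number of duplicate hits)
    precedence: int  # static value, never changes for a given entry


def select_for_eviction(bucket):
    """Return the identifier of the entry to evict from a bucket."""
    # Assumed ordering: least useful, then least recently used, then
    # lowest precedence. Because precedence is static, ties always
    # resolve the same way over time.
    victim = min(bucket, key=lambda e: (e.utility, e.last_used, e.precedence))
    return victim.entry_id


bucket = [
    IndexEntry("a", last_used=50, utility=3, precedence=1),
    IndexEntry("b", last_used=10, utility=1, precedence=2),
    IndexEntry("c", last_used=10, utility=1, precedence=0),
]
evicted = select_for_eviction(bucket)
```

Entries "b" and "c" tie on utility and time, so the static precedence value decides and "c" is selected.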

U.S. Patent #2011/0289281 A1, November 24, 2011
Policy based data retrieval performance for deduplicated data

Inventor: Spackman, Stephen P.; San Jose, CA
Assignee: Quantum Corporation, San Jose, CA

US Patent Application #20110289290, November 24, 2011
Space reservation in a deduplication system

Inventors: Akirav; Shay H.; (Petach-Tikva, IL); Caro; Aviv;
(Modiin, IL); Drobchenko; Elena; (Raanana, IL); Ekshtein; Asaf K.;
(Petach-Tikva, IL); Hepner; Dov N.; (Hertzelyia, IL); Leneman; Ofer;
(Kfar Saba, IL); Taub; Tzafrir Z.; (Givaataim, IL)
Assignee: IBM Corp., Armonk, NY

Various embodiments for space reservation in a deduplication system are
provided. A calculated factoring ratio is determined as a weighted ratio
of current nominal data to physical data based on at least one storage
capacity threshold and a used storage space currently physically
consumed by one of backup and replication data. A maximal nominal
estimated space in the computing storage environment is calculated. A
remaining space, defined as the maximal nominal estimated space minus a
current nominal space in the computing storage environment, is
calculated. If the remaining space is one of equal and less than a
user-configured reservation space for backup operations, data
replication operations are accepted and stored in the computing storage
environment.

US Patent Application #20110276781, November 10, 2011
Fast and low-RAM-footprint indexing for data deduplication

Inventors: Sengupta; Sudipta; (Redmond, WA); Debnath;
Biplob; (Minneapolis, MN); Li; Jin; (Bellevue, WA); Desai; Ronakkumar
N.; (Redmond, WA); Oltean; Paul Adrian; (Redmond, WA)
Assignee: Microsoft, Corp., Redmond, WA

The subject disclosure is directed towards a data deduplication
technology in which a hash index service’s index maintains a hash index
in a secondary storage device such as a hard drive, along with a compact
index table and look-ahead cache in RAM that operate to reduce the I/O
to access the secondary storage device during deduplication operations.
Also described is a session cache for maintaining data during a
deduplication session, and encoding of a read-only compact index table
for efficiency.

US Patent Application #20110276744, November 10, 2011
Flash memory cache including for use with persistent key-value store

Inventors: Sengupta; Sudipta; (Redmond, WA); Debnath; Biplob Kumar; (Minneapolis, MN); Li; Jin; (Bellevue, WA)
Assignee: Microsoft Corporation, Redmond, WA

Described is using flash memory, RAM-based data structures and
mechanisms to provide a flash store for caching data items (e.g.,
key-value pairs) in flash pages. A RAM-based index maps data items to
flash pages, and a RAM-based write buffer maintains data items to be
written to the flash store, e.g., when a full page can be written. A
recycle mechanism makes used pages in the flash store available by
destaging a data item to a hard disk or reinserting it into the write
buffer, based on its access pattern. The flash store may be used in a
data deduplication system, in which the data items comprise
chunk-identifier, metadata pairs, in which each chunk-identifier
corresponds to a hash of a chunk of data that indicates. The RAM and
flash are accessed with the chunk-identifier (e.g., as a key) to
determine whether a chunk is a new chunk or a duplicate.
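
The interplay of the RAM index, the RAM write buffer and the flash pages can be sketched as below. This is a simplified model under stated assumptions: a Python list stands in for flash pages, a page holds a fixed number of items, the recycle and destaging mechanism is left out, and the class and method names are hypothetical.

```python
import hashlib


def chunk_id(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


class FlashStoreSketch:
    def __init__(self, items_per_page=4):
        self.items_per_page = items_per_page
        self.write_buffer = {}  # RAM: items waiting for a full page
        self.index = {}         # RAM: key -> flash page number
        self.pages = []         # stands in for pages written to flash

    def contains(self, key) -> bool:
        return key in self.write_buffer or key in self.index

    def get(self, key):
        if key in self.write_buffer:
            return self.write_buffer[key]
        page_no = self.index.get(key)
        return None if page_no is None else self.pages[page_no][key]

    def put(self, key, value) -> None:
        self.write_buffer[key] = value
        if len(self.write_buffer) >= self.items_per_page:
            self._flush()

    def _flush(self) -> None:
        # Write one full page and point the RAM index at it.
        page_no = len(self.pages)
        self.pages.append(dict(self.write_buffer))
        for key in self.write_buffer:
            self.index[key] = page_no
        self.write_buffer.clear()

    def is_duplicate(self, chunk: bytes, metadata) -> bool:
        key = chunk_id(chunk)
        if self.contains(key):
            return True
        self.put(key, metadata)
        return False


flash = FlashStoreSketch(items_per_page=2)
results = [flash.is_duplicate(chunk, {"seq": i})
           for i, chunk in enumerate([b"a", b"b", b"a", b"c", b"c"])]
```

Lookups touch only RAM structures until the page number is known, so at most one flash page read is needed per chunk-identifier in this model.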

US Patent Application #20110276543, November 10, 2011
Virtual block device

Inventor: Matze; John Edward Gerard; (Poway, CA)
Assignee: Exar Corporation, Fremont, CA

A virtual block device is an interface with applications that appears to
the applications as a memory device, such as a standard block device.
The virtual block device interacts with additional elements to do data
deduplication to files at the block level such that one or more files
accessed using the virtual block device have at least one block which is
shared by the one or more files.

US Patent Application #20110270810, November 3, 2011
Methods and apparatus for active optimization of data

Inventors: Dinkar; Abhijit; (San Jose, CA); Jayaraman;
Vinod; (San Francisco, CA); Bashyam; Murali; (Fremont, CA); Rao;
Goutham; (Los Altos Hills, CA)
Assignee: Dell Products L.P., Round Rock, TX

Techniques and mechanisms are provided to support live file
optimization. Active I/O access to an optimization target is monitored
during optimization. Active files need not be taken offline or made
unavailable to an application during optimization and retain the ability
to support file operations such as read, write, unlink, and truncate
while an optimization engine performs deduplication and/or compression
on active file ranges.

US Patent Application #20110270809, November 3, 2011
Heat indices for file systems and block storage

Inventors: Dinkar; Abhijit; (San Jose, CA); Jayaraman;
Vinod; (San Francisco, CA); Bashyam; Murali; (Fremont, CA); Rao;
Goutham; (Los Altos Hills, CA)
Assignee: Dell Products L.P., Round Rock, TX

Techniques and mechanisms are provided to allow for selective
optimization, including deduplication and/or compression, of portions of
files and data blocks. Data access is monitored to generate a heat
index for identifying sections of files and volumes that are frequently
and infrequently accessed. These frequently used portions may be left
non-optimized to reduce or eliminate optimization I/O overhead.
Infrequently accessed portions can be more aggressively optimized.
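
A heat index of this kind can be approximated by counting accesses per fixed-size region and optimizing only the regions that stay below a threshold. The region size, the threshold, and the HeatIndex class are illustrative assumptions; the application does not specify how heat is measured.

```python
from collections import Counter


class HeatIndex:
    def __init__(self, region_size: int, hot_threshold: int):
        self.region_size = region_size
        self.hot_threshold = hot_threshold
        self.hits = Counter()  # region number -> access count

    def record_access(self, offset: int) -> None:
        self.hits[offset // self.region_size] += 1

    def is_hot(self, region: int) -> bool:
        return self.hits[region] >= self.hot_threshold

    def regions_to_optimize(self, total_size: int):
        """Regions cold enough to deduplicate and/or compress."""
        region_count = -(-total_size // self.region_size)  # ceiling division
        return [r for r in range(region_count) if not self.is_hot(r)]


heat = HeatIndex(region_size=4096, hot_threshold=3)
for offset in (0, 100, 4000, 8192, 8200):
    heat.record_access(offset)
cold = heat.regions_to_optimize(total_size=16384)
```

Region 0 receives three accesses and is left alone, while regions 1 to 3 fall under the threshold and become optimization candidates.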

US Patent Application #20110270800, November 3, 2011
Global Deduplication File System

Inventors: Chou; Randy Yen-pang; (San Jose, CA); Jung; Steve; (San Jose, CA); Mulam; Ravi; (San Jose, CA)
Assignee: Pixel8 Networks, Inc., Menlo Park, CA

Embodiments of methods and systems implementing global deduplication
file systems are described. In one embodiment of the invention, a method
and system implements a global deduplication file system between a
plurality of interconnected systems located in different locations
globally by making use of the deduplication dictionary included in
metadata being periodically snapshot. In yet another embodiment of the
invention, a method implements a global deduplication file system
between a plurality of interconnected systems located in different
locations globally and provides appropriate read/write locks.

US Patent Application #20110258404, October 20, 2011
Method and apparatus to manage groups for deduplication

Inventors: Arakawa; Hiroshi; (Sunnyvale, CA); Kaneda; Yasunori; (San Jose, CA)
Assignee: Hitachi, Ltd., Tokyo, Japan

A storage system comprises one or more pool volumes having chunks for
storing data; one or more primary volumes; writable snapshots as virtual
volumes for each primary volume which is a common ancestor of the
writable snapshots, each primary volume and corresponding writable
snapshots being members forming a snapshot group; and a storage
controller which includes a processor, a memory storing, for each
snapshot group, group information of the members within the snapshot
group, and a deduplication module. The deduplication module may identify
a snapshot group for deduplication based on the group information and
perform deduplication of data for the identified snapshot group in a
deduplication area, or perform deduplication of data in a deduplication
area which is specified based on the group information of a snapshot
group being generated in the storage system.

US Patent Application #20110258374, October 20, 2011
Method for optimizing the memory usage and performance of data deduplication storage systems

Inventor: Petrocelli; Robert; (Westerly, RI)
Assignee: GreenBytes, Inc.

A method and system of optimizing the memory usage and performance of
data deduplication storage systems includes organizing the metadata of
data blocks needed by deduplicating storage systems. A three level
hierarchy is used. Level 1 stores the metadata on disk along with the
user data. Level 2 uses low latency storage (e.g. RAM and Solid State
Disks) to cache the on-disk meta data for faster direct access. Level 3
organizes the fingerprints using a Trie and is entirely resident in RAM.
Thus, the search, to determine whether a data block is unique or not
and a candidate for transfer, can be executed more efficiently while
ensuring that the meta data is transactionally secure.
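
The Level 3 structure can be pictured as a byte-wise trie keyed on fingerprints and held entirely in memory. The sketch below uses nested dictionaries, SHA-256 fingerprints, and a FingerprintTrie class, all of which are assumptions for illustration; the application does not describe the node layout.

```python
import hashlib

_LEAF = "leaf"  # string key, so it cannot collide with integer byte keys


class FingerprintTrie:
    def __init__(self):
        self.root = {}

    def insert(self, fp: bytes, location) -> None:
        node = self.root
        for byte in fp:
            node = node.setdefault(byte, {})
        node[_LEAF] = location

    def lookup(self, fp: bytes):
        """Return the stored location, or None if the fingerprint is new."""
        node = self.root
        for byte in fp:
            node = node.get(byte)
            if node is None:
                return None
        return node.get(_LEAF)


trie = FingerprintTrie()
known = hashlib.sha256(b"block one").digest()
trie.insert(known, location=("disk0", 4096))
hit = trie.lookup(known)
miss = trie.lookup(hashlib.sha256(b"block two").digest())
```

A lookup that returns None marks the block as unique, so only then would the slower Level 2 and Level 1 metadata need to be updated.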

U.S. Patent #2011/0238635, September 29, 2011
Combining hash-based duplication with sub-block differencing to deduplicate data

Inventor: Leppard, Andrew C., Unley (AU)
Assignee: Quantum Corporation, San Jose, CA

In one embodiment, a method includes accessing data; partitioning the
data into sub-blocks; determining whether a first one of the sub-blocks
is identical to another one of the sub-blocks or similar to another one
of the sub-blocks; if the first one of the sub-blocks is identical to
another one of the sub-blocks, applying by the one or more computer
systems hash-based deduplication to storage of the first one of the
sub-blocks with respect to the other one of the sub-blocks; and, if the
first one of the sub-blocks is similar to another one of the sub-blocks,
applying by the one or more computer systems sub-block differencing to
storage of the first one of the sub-blocks with respect to the other one
of the sub-blocks.
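
The two branches described above (identical versus similar) might look like this in outline. The similarity test here is a linear scan with difflib.SequenceMatcher and a 0.8 threshold, which is a stand-in chosen for brevity and not the method claimed in the application; production systems typically use similarity hashes. All function names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Encode target as copy ranges from base plus literal data."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": carry the new bytes
            ops.append(("data", target[j1:j2]))
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def store_sub_block(sub_block: bytes, stored: dict, threshold: float = 0.8):
    digest = hashlib.sha256(sub_block).hexdigest()
    if digest in stored:  # identical: hash-based deduplication
        return ("ref", digest)
    for base_digest, base in stored.items():  # similar: sub-block differencing
        ratio = SequenceMatcher(None, base, sub_block, autojunk=False).ratio()
        if ratio >= threshold:
            return ("delta", base_digest, make_delta(base, sub_block))
    stored[digest] = sub_block  # neither: store in full
    return ("raw", digest)


stored = {}
original = b"the quick brown fox jumps over the lazy dog"
variant = b"the quick brown fox jumps over the lazy cat"
r1 = store_sub_block(original, stored)
r2 = store_sub_block(original, stored)
r3 = store_sub_block(variant, stored)
```

The second call stores only a reference, and the third stores a delta from which apply_delta can rebuild the variant given the original sub-block.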

U.S. Patent #2011/0218972, September 8, 2011
Data reduction indexing

Inventor: Tofano, Jeffrey Vincent; San Jose, CA
Assignee: Quantum Corporation, San Jose, CA

Example apparatus, methods, data structures, and computers control
indexing to facilitate duplicate determinations. One example method
includes indexing, in a global index, a unique chunk processed by a data
de-duplicator. Indexing the unique chunk in the global index can
include updating an expedited data structure associated with the global
index. The example method can also include selectively indexing, in a
temporal index, a relationship chunk processed by the data
de-duplicator. The relationship chunk is a chunk that is related to
another chunk processed by the data de-duplicator by sequence, storage
location, and/or similarity hash value. Indexing the relationship chunk
in the temporal index can also include updating one or more expedited
data structures associated with the temporal index. The expedited data
structures and indexes can then be consulted to resolve a duplicate
determination being made by a data reducer.

U.S. Patent #2011/0213917, September 1, 2011
Methods and systems for improving read performance in data de-duplication storage

Inventor: Davis, Camden John; Saint Paul, TX
Assignee: Quantum Corporation, San Jose, CA

The present invention is directed toward methods and systems for data
de-duplication. More particularly, in various embodiments, the present
invention provides systems and methods for data de-duplication that may
utilize a data de-duplication system that retrieves data from a data
storage device in an order based on the location of blocks on the data
storage device. Some embodiments break a data stream into multiple
blocks of data and store the blocks of data on a data storage device of a
data de-duplication system, wherein a code representing a redundant
block of data is stored in place of the block of data. A location for
each block of data may be stored. Additionally, the blocks may be read
in an order that is determined based on the location of the blocks.
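
Reading blocks in location order can be sketched as follows: the stored locations for a stream are sorted before being read, and the stream is then reassembled in its logical order. The recipe format (a list of block locations), the read_block callback, and the function name are assumptions made for this example.

```python
def read_in_location_order(recipe, read_block):
    """Rebuild a stream from its list of block locations.

    recipe: block locations in logical (stream) order; repeated
            locations represent deduplicated blocks.
    read_block: callable that fetches one block from the given location.
    """
    # Read each distinct location once, in ascending on-disk order.
    fetched = {loc: read_block(loc) for loc in sorted(set(recipe))}
    # Reassemble in the original logical order.
    return b"".join(fetched[loc] for loc in recipe)


disk = {10: b"A", 20: b"B", 30: b"C"}
reads = []


def read_block(location):
    reads.append(location)
    return disk[location]


stream = read_in_location_order([30, 10, 20, 10], read_block)
```

The device sees a single forward sweep over locations 10, 20, 30 even though the logical stream references them out of order and reuses one block.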

U.S. Patent #2011/0258398, October 20, 2011
Methods and systems for vectored data de-duplication

Inventors: Saliba, George; Boulder, CO, White, Theron; Boulder, CO
Assignee: Quantum Corporation, San Jose, CA

The present invention is directed toward methods and systems for data
de-duplication. More particularly, in various embodiments, the present
invention provides systems and methods for data de-duplication that may
utilize a vectoring method for data de-duplication wherein a stream of
data is divided into “data sets” or blocks. For each block, a code, such
as a hash or cyclic redundancy code may be calculated and stored. The
first block of the set may be written normally and its address and hash
can be stored and noted. Subsequent block hashes may be compared with
previously written block hashes.