EMC/Data Domain Global Deduplication Array
14.2PB of logical backup capacity, throughput up to 12.8TB/hour
This is a Press Release edited by StorageNewsletter.com on April 12, 2010 at 3:04 pm
EMC Corporation announced the EMC Data Domain Global Deduplication Array (GDA), the industry’s fastest inline deduplication storage system for enterprise backup applications.
The Global Deduplication Array, based on a new multi-controller extension of the Data Domain architecture, offers inline global deduplication and a global namespace for all data stored in the dual controller system. With throughput up to 12.8 terabytes per hour (TB/hour), it establishes consistently high benchmarks across the spectrum of common data center backup metrics. The Global Deduplication Array provides up to 14.2 petabytes (PB) of logical backup capacity, driving new levels of simplicity for data center backup consolidation across workloads as diverse as very large databases, VMware images, and unstructured data.
Unlike most multi-controller deduplication systems, the inline Global Deduplication Array is tightly coupled with backup software, enabling industry leading inline deduplication performance, dynamic distribution of load and simplicity of operation. The Global Deduplication Array distributes parts of the deduplication process to the backup servers to reduce network load and increase the throughput performance of the GDA controllers. It offers more than 3x faster backup throughput per controller than competitive deduplication configurations and is the fastest inline deduplication system available. This distributed deduplication processing throughput is anchored by the native speed advantages of the Intel Xeon multi-core CPUs in the GDA controllers and the Data Domain SISL (Stream-Informed Segment Layout) scaling architecture that minimizes the number of disk accesses required in the deduplication process. At initial release, the platform supports Symantec NetBackup and Backup Exec through backup server-based OpenStorage plug-in software. Later in 2010, it will also support EMC NetWorker using integrated software.
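The idea of moving part of the deduplication work onto the backup server can be illustrated with a minimal sketch. It is not EMC's implementation: the `Controller` class, the fixed 8-byte segment size, and the use of SHA-1 fingerprints are all assumptions chosen to keep the example short. The backup server fingerprints each segment, asks the controller which fingerprints it lacks, and sends only those segments over the network.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; hypothetical and tiny so the demo stays readable


def segments(data: bytes, size: int = SEGMENT_SIZE):
    """Split data into fixed-size segments (a simplification for illustration)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Identify a segment by a hash of its content."""
    return hashlib.sha1(segment).hexdigest()


class Controller:
    """Hypothetical stand-in for a deduplication storage controller."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the set of fingerprints not yet stored."""
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fp: str, segment: bytes) -> None:
        self.store[fp] = segment


def backup(data: bytes, controller: Controller) -> int:
    """Send only the segments the controller lacks; return bytes sent."""
    segs = segments(data)
    fps = [fingerprint(s) for s in segs]
    needed = controller.missing(fps)  # one round trip carrying fingerprints only
    sent = 0
    for fp, seg in zip(fps, segs):
        if fp in needed:
            controller.write(fp, seg)
            needed.discard(fp)  # do not resend a duplicate within the same stream
            sent += len(seg)
    return sent


if __name__ == "__main__":
    controller = Controller()
    print(backup(b"AAAAAAAABBBBBBBBAAAAAAAA", controller))  # two unique segments sent
    print(backup(b"AAAAAAAACCCCCCCC", controller))          # only the new segment sent
```

In this sketch the network carries fingerprints plus previously unseen segments, which is why pushing the fingerprinting step to the backup server can reduce network load.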
The Global Deduplication Array presents a single inline deduplication storage pool to the backup application across two EMC Data Domain DD880 controllers. Large datacenter backup jobs are dynamically and transparently load balanced across the controllers, simplifying capacity management, performance management and backup administration:
- For backup environments with hundreds of terabytes to process, administrators can target their backup policies to a Global Deduplication Array and leverage a common deduplication storage environment for all data protected by those policies.
- The Global Deduplication Array accommodates up to 270 concurrent backup jobs and up to 12.8 TB/hour of throughput, allowing more backups to finish sooner while putting less pressure on limited backup windows.
- Global namespace minimizes the need to reconfigure complex backup policies, while innovative global deduplication technology dynamically load balances policies for performance and capacity management. Consequently, very large data sets can be easily protected with administrative simplicity while maximizing overall deduplication efficiency and therefore minimizing physical storage footprint.
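To relate the quoted throughput to a backup window, a rough calculation can help. The sketch below assumes the full 12.8 TB/hour peak is sustained for the whole job, which real workloads may not achieve; the function names and the dataset sizes are hypothetical.

```python
PEAK_TB_PER_HOUR = 12.8  # "up to" figure quoted in the announcement


def backup_hours(dataset_tb: float, throughput_tb_per_hour: float = PEAK_TB_PER_HOUR) -> float:
    """Hours needed to ingest dataset_tb at a constant throughput."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return dataset_tb / throughput_tb_per_hour


def fits_window(dataset_tb: float, window_hours: float,
                throughput_tb_per_hour: float = PEAK_TB_PER_HOUR) -> bool:
    """True if the job would finish inside the backup window."""
    return backup_hours(dataset_tb, throughput_tb_per_hour) <= window_hours


if __name__ == "__main__":
    # Hypothetical 100 TB job against an 8-hour window
    print(round(backup_hours(100), 2), fits_window(100, 8))
```

Under these assumptions, a 100 TB job takes about 7.8 hours and fits an 8-hour window, while a 128 TB job would take about 10 hours and would not.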
"Figuring out how to get backups done within the allotted period of time in the face of data growth is still the biggest data protection challenge that organizations face according to our research," said Brian Babineau, Senior Consulting Analyst with Enterprise Strategy Group. "With their Data Domain Global Deduplication Array, EMC has far exceeded the inline deduplication performance benchmark it set with its previous top-of-line Data Domain system, but more importantly, the company has given customers a way to protect more of their data in a shorter period of time. We expect more companies to evaluate integration between backup software and deduplication storage to maximize these performance levels and data reduction results while consolidating administrative tasks."
"The EMC Data Domain Global Deduplication Array, while very sophisticated under the hood, builds on the mature foundation of the existing Data Domain platform and retains its appliance simplicity," said Brian Biles, Vice President of Product Management, EMC Backup Recovery Systems Division. "Its deduplication is inline, it’s blistering fast, and it’s big enough for significant datacenter backup consolidation, but its dynamic load balancing, single deduplication storage pool and namespace and tight integration with backup software means the Global Deduplication Array is easier to operate than competitors who don’t have its scale. EMC has once again moved the dial on disk-based data protection."
Unequalled Replication Capabilities:
- With the EMC Data Domain Replicator software option, the Global Deduplication Array can automate wide area network (WAN) vaulting for use in disaster recovery (DR), remote office backup, or multi-site tape consolidation.
- A single Global Deduplication Array can support a replication fan-in of up to 270 remote offices using smaller deduplication storage systems such as the Data Domain DD140 or the DD600 series appliances.
- Cross-site deduplication further minimizes the required bandwidth since only the first instance of data is transferred across any of the WAN segments between sites. Additionally, for fast offsite protection and consolidation of tape out operations, the Global Deduplication Array provides up to 54 TB/hour of replication throughput.
Like all Data Domain systems, the new Global Deduplication Array is simple to install and flexible enough to be implemented into existing user environments without disruption. Backed by available EMC 24x7x365 enterprise class service, it seamlessly integrates into Symantec™ NetBackup and Backup Exec backup environments using the EMC Data Domain OpenStorage software option.
Why Architecture Matters
The Global Deduplication Array is based on the same CPU-centric approach to inline data deduplication as all EMC Data Domain systems. Most deduplication approaches are added as afterthoughts to existing disk arrays, Virtual Tape Libraries (VTLs) or backup software. By contrast, the combined efficiencies of the Data Domain approach include:
- SISL scaling architecture leverages CPU improvements to increase inline deduplication speed while minimizing reliance on disk accesses for performance. Data Domain systems have improved throughput by nearly 90 times and capacity by more than 225 times over the last six years. Based on Intel’s CPU roadmap, throughput is expected to continue growing significantly in the future.
- High performance inline deduplication for simplicity, to minimize system resources, administration, and internal system process contention.
- Green storage efficiency for a smaller system footprint and lower power consumption.
- Data Domain Data Invulnerability Architecture defends against data integrity issues by providing continuous verification during storage and recovery of data.
Availability and Global Services
The EMC Data Domain Global Deduplication Array will be generally available in the second quarter of 2010. In addition, customers leveraging the new EMC Data Domain Global Deduplication Array can work with EMC Global Services to accelerate their backup and deduplication deployments and maximize the value of its capabilities through the use of EMC’s end-to-end consulting, implementation, residency and education services.