EMC Greenplum Database Enhancements for Big Data Analytics
Include gNet for Hadoop and Data Domain Boost
This is a Press Release edited by StorageNewsletter.com on March 2, 2012 at 3:11 pm
EMC Corporation announced version 4.2 of Greenplum Database, bringing new levels of big data integration, database manageability and performance to its platform for in-database analytics.
That means customers can run massive-scale mission-critical analysis more easily and rapidly, thus further boosting their analytic productivity, business value and business decision-making prowess.
Sitting at the heart of the Greenplum family of products, Greenplum Database 4.2 includes:
- high-performance gNet for Hadoop;
- language and compatibility enhancements for faster migrations to Greenplum;
- simpler, scalable backup with Data Domain Boost;
- an extension framework and turnkey in-database analytics; and
- targeted performance optimization.
To expand the range of data integration and processing solutions they can build, and to run queries for mission-critical complex analysis, customers want the most efficient and flexible data exchange possible between Greenplum Database and Hadoop, in addition to the existing parallel data access. To address this, Greenplum 4.2 enables high-performance parallel import and export of all data (compressed and uncompressed) from Hadoop using gNet for Hadoop, a parallel communications transport. This achievement represents the first direct query interoperability between Greenplum Database and Hadoop.
A new Greenplum Database feature is the integration with Data Domain deduplication storage systems via Data Domain Boost, resulting in faster, more efficient backup (10 to 30x data reduction on average). This integration distributes parts of the deduplication process to Greenplum database servers, enabling them to send only unique data to the Data Domain system. That increases aggregate throughput, reduces the amount of data transferred over the network and eliminates the need to create and manage virtual drives (inline deduplication with up to 26.3 TB/hour of throughput; backup of over 173 TB in less than eight hours).
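The two throughput figures quoted above can be checked against each other with simple arithmetic. The following is a minimal sketch, assuming the 26.3 TB/hour peak rate is sustained for the whole job and that TB means the same unit in both figures; the function and variable names are illustrative and do not come from any EMC tool.

```python
def hours_to_back_up(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Hours needed to move data_tb at a constant rate (ignores ramp-up and contention)."""
    return data_tb / throughput_tb_per_hour


PEAK_TB_PER_HOUR = 26.3  # "up to" throughput quoted in the release
DATASET_TB = 173.0       # backup size quoted in the release
WINDOW_HOURS = 8.0       # backup window quoted in the release

# Time needed if the peak rate were sustained throughout.
hours_at_peak = hours_to_back_up(DATASET_TB, PEAK_TB_PER_HOUR)

# Lowest average rate that still fits the eight-hour window.
min_rate_for_window = DATASET_TB / WINDOW_HOURS

print(f"At peak rate: {hours_at_peak:.2f} hours")
print(f"Minimum average rate for the window: {min_rate_for_window:.3f} TB/hour")
```

Under these assumptions the backup takes a little over six and a half hours at peak, so the eight-hour claim leaves headroom for an average rate somewhat below the quoted maximum.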
Addressing database manageability and performance, Greenplum Database delivers an extensible platform for in-database analytics, leveraging the system’s massively parallel architecture. With Release 4.2, Greenplum enables in-database analytics via Greenplum Extensions, which can be downloaded from EMC Subscribenet and installed using the new Greenplum Package Manager, a utility that automates the installation and updating of functional extensions to simplify enabling and managing advanced in-database functionality across a cluster. Release 4.2 also supports dynamic partition elimination and query memory optimization, thus reducing the data scanned for a query, accelerating query processing and allowing for more concurrency.
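The general idea behind dynamic partition elimination can be illustrated with a small sketch: the set of partitions to read is not known when the query is written, but is derived at run time from another table, and every other partition is skipped. This is a toy model in plain Python, not Greenplum's implementation; the table layout, the names `sales_partitions`, `promo_months` and `scan_with_elimination`, and the sample rows are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fact table, partitioned by month; each row is (month, amount).
sales_partitions: Dict[str, List[Tuple[str, int]]] = {
    "2012-01": [("2012-01", 100), ("2012-01", 250)],
    "2012-02": [("2012-02", 75)],
    "2012-03": [("2012-03", 300), ("2012-03", 20)],
}

# Hypothetical dimension table: campaign name -> month it ran in.
promo_months: Dict[str, str] = {"winter_sale": "2012-01", "spring_sale": "2012-03"}


def scan_with_elimination(
    partitions: Dict[str, List[Tuple[str, int]]], wanted: Set[str]
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Read only the partitions whose key is in `wanted`; report which were read."""
    rows: List[Tuple[str, int]] = []
    scanned: List[str] = []
    for key, part_rows in partitions.items():
        if key in wanted:  # partitions outside `wanted` are never touched
            scanned.append(key)
            rows.extend(part_rows)
    return rows, scanned


# The wanted partition keys are computed at run time from the dimension table.
wanted_months = {m for name, m in promo_months.items() if name == "spring_sale"}
rows, scanned = scan_with_elimination(sales_partitions, wanted_months)
total = sum(amount for _, amount in rows)
```

Here only the one partition matching the run-time filter is read, so two of the three partitions contribute nothing to the data scanned, which is the effect the release attributes to the feature.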
Greenplum Command Center
- A web-based Big Data infrastructure management console, it provides a unified administrative and real-time/historical health-monitoring dashboard for all currently available Greenplum products.
- Supported Greenplum Database administrative operations include starting, stopping, and initializing Greenplum Database; searching, prioritizing, or canceling any query; and recovering and rebalancing data mirrors.
- The initial release of Greenplum Command Center is available with Greenplum Data Computing Appliance version 1.2.
EMC Greenplum Database version 4.2 and the Greenplum Command Center are available.
Scott Yara, Senior Vice President of Products, Greenplum, a division of EMC, said: "The EMC Greenplum Database continues to be at the core of driving Big Data insights and decisions for our customers. As more organizations create data-driven cultures, the Greenplum Database’s shared-nothing, massively parallel processing (MPP) makes business intelligence and analytical processing much faster. It is this analytic productivity that is the real benefit of the database and is something we’re proud to offer."