
EMC Creates Greenplum Analytics Workbench, 1000 Node Platform for Tests on Apache Hadoop

With Intel, VMware, Micron, Seagate, Supermicro, Switch, and Mellanox

EMC Corporation announced the creation of the Greenplum Analytics Workbench, which will be used for regular integration tests on Apache Hadoop.

The 1,000-plus node test bed cluster incorporates technology from software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can deploy new releases in a production environment with greater confidence.

Apache Hadoop has rapidly emerged as the preferred solution for Big Data analytics across unstructured data. Organizations looking for opportunity in an ever-changing business environment are finding that Big Data analysis is the competitive advantage. In fact, according to a 2011 TDWI survey, 34% of companies do big data analytics today, and that number is growing.

Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured and machine-generated data, organizations can make better decisions.

Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community, and enterprise users, must wait for Hadoop user communities to sponsor an effort to run scale validations. Such validations are run very infrequently, and a lot of time is spent stabilizing releases for enterprise adoption.

With a plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing the testing resources that community lacks, so that bugs can be identified quickly, new releases stabilized, and hardware configurations optimized, all in an effort to speed up Hadoop innovation. EMC plans to provide test results to the Apache Software Foundation and the open source community, and EMC’s testing will be planned in coordination with the Apache Hadoop project.

The Greenplum Analytics Workbench is the result of a collaboration of several hardware and software vendors including:

  • EMC
  • Intel
  • Mellanox Technologies
  • Micron
  • Seagate
  • Supermicro
  • Switch
  • VMware

The test bed cluster, which consists of 1,000-plus hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.

"EMC and its partners have made a significant contribution to the Apache Hadoop community by promising to validate Apache Hadoop releases on clusters at petabyte scale. With access to continuous integration testing, the world’s best unstructured data analytics software will get better and faster, allowing companies and organizations to gain better insights from their data," said Dhruba Borthakur, Member of Hadoop Project Management Committee.

"The EMC 1k node cluster fills a vital resource gap, one that has been missing up to this point: validating Apache Hadoop builds and releases at scale. I can’t wait to take it out for a burn," said Michael Stack, Engineer at StumbleUpon and Member of Hadoop Project Management Committee.

"Apache Hadoop at this stage needs a standardized tool for testing and validating Hadoop releases at scale. EMC’s 1,000 node test bed launch will facilitate the development of Apache Hadoop as a vital tool for Big Data analytics, advance its internal innovation, and lead to greater adoption of Hadoop. I am especially pleased that EMC is contributing its findings back to the open source community," said Konstantin Shvachko of eBay, Member of Apache Project Management Committee.

"Intel is excited to be a part of the largest Hadoop test bed cluster ever built. Being able to analyze Big Data sets and make use of the tremendous volume of unstructured data being created is an opportunity that could transform entire industries. The latest Intel Xeon 5600 series processors will provide the processing power required to scale Big Data analytics and realize the full potential of Apache Hadoop. The entire open source community, including Intel, will benefit from the key learnings from both development and testing on the cluster," said David Tuhy, General Manager of the Storage Group, Intel Corporation.

"Greenplum is excited to be part of the elite group of hardware and software manufacturers that made possible the Greenplum Analytics Workbench. The test bed cluster, at 1,000-plus hardware nodes, is itself an accomplishment. But more importantly, we are excited to make this test bed available to the open source community so that enterprises can feel comfortable deploying Apache Hadoop in a production environment and can reap the benefits of Big Data analytics," said Luke Lonergan, Chief Technology Officer, Greenplum, a division of EMC.

Availability
Regular testing cycles on the Greenplum Analytics Workbench will begin early next year.
