… And ClusterStor Hadoop Workflow Accelerator
For big data analytics
This is a Press Release edited by StorageNewsletter.com on November 24, 2014 at 2:57 pm
Seagate Technology plc announced availability of the ClusterStor Hadoop Workflow Accelerator, a solution providing the tools, services, and support for HPC customers who need high-performing storage systems for big data analytics.
It is a set of Hadoop optimization tools, services and support that leverages and enhances the performance of ClusterStor, a scale-out storage system designed for big data analysis.
Computationally intensive High Performance Data Analytics (HPDA) environments will benefit from reductions in data transfer time. This solution also includes the Hadoop on Lustre Connector, which allows both Hadoop and HPC Lustre clusters to use the same data without having to move the data between file systems or storage devices.
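The release does not spell out how the connector is configured, but conceptually a Hadoop job can be pointed at a POSIX-mounted Lustre file system rather than HDFS using stock Hadoop APIs. The minimal sketch below assumes Lustre is mounted at /mnt/lustre and uses Hadoop's ordinary local file:// scheme purely for illustration; the actual connector and its configuration keys are Seagate's own and are not documented here.

    // Minimal, illustrative sketch only: point a Hadoop job at a POSIX-mounted
    // Lustre file system instead of HDFS. The /mnt/lustre mount point and the use
    // of Hadoop's plain file:// scheme are assumptions; the ClusterStor connector's
    // own configuration is not documented in this release.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LustreBackedJobConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Use the local (POSIX) file system as the default, so tasks read and
            // write the shared Lustre mount directly, with no bulk copy into HDFS.
            conf.set("fs.defaultFS", "file:///");

            FileSystem fs = FileSystem.get(conf);
            Path input = new Path("/mnt/lustre/datasets/clickstream");
            System.out.println("Dataset visible on Lustre mount: " + fs.exists(input));
        }
    }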
“Data-intensive computing has long been a part of HPC, but newer analytical approaches using Hadoop and other methods, such as graph analytics, will help drive strong growth in high performance data analysis, which is the market for big data needing HPC. The Hadoop Workflow Accelerator is designed to serve both the technical computing and commercial sides of this converging big data-HPC market that IDC forecasts will exceed $4 billion in 2018,” said Steve Conway, IDC research VP, HPC. “IDC research shows that 29% of HPC sites already use Hadoop. The market will welcome tools that boost Hadoop performance and efficiency.”
The accelerator supports Hadoop distributions based on Open Source Apache Hadoop. The company is working with Hadoop distributors to offer solutions to HPC customers and will provide integration between the accelerator and other Hadoop distributions in future releases.
“Organizations not only want to manage the tremendous volume of data that they are collecting from a variety of sources, they also want to derive new insights that enable actionable intelligence and improve operational efficiency. Seagate’s award-winning ClusterStor scale-out HPC solutions, now with our Hadoop Workflow Accelerator options, enable organizations to optimize big data workflows and centralize storage for High Performance Data Analytics solutions,” said Ken Claffey, VP of ClusterStor, Seagate cloud systems and solutions. “TeraSort benchmark results have the Hadoop Workflow Accelerator outperforming Hadoop on the Hadoop Distributed File System (HDFS) by 38% on the same hardware. The Hadoop Workflow Accelerator meets our customers’ performance demands and optimizes the performance of Hadoop Ecosystem deployments, thus helping customers achieve the fastest time to results for their data-intensive workloads and hardware configuration.”
The ClusterStor systems’ scale-out HPC architecture enables a central repository, allowing both HPC and Hadoop analytics tools to run simultaneously on the same data sets in ClusterStor. The accelerator reduces time to results by enabling immediate Hadoop data processing from the start of each job, eliminating the time-consuming step of bulk copying large amounts of data from a separate data repository. With it, Hadoop environments can now scale computing and storage resources independently, increasing flexibility to optimize analysis resources while supporting centralized high-performance data repositories of hundreds of petabytes of storage capacity.
Hadoop Workflow Accelerator details:
- Tests run with Hadoop Workflow Accelerator on applications such as Mahout, Hive and Pig showed marked improvements for Apache Hadoop 1.0 distributions over standard storage configurations. TeraSort benchmarks show that the Hadoop Workflow Accelerator outperforms Hadoop on HDFS by up to 38%.
- It includes the Seagate-developed Hadoop on Lustre Connector and an array of ClusterStor performance optimization best practices, system tuning methods, installation and configuration management tools, and professional services.
- Expanding compatibility, the company’s engineered family of Hadoop on Lustre Connectors extends support to several Hadoop ecosystem packages such as Mahout, Hive and Pig, to take advantage of the parallel read/write performance of the Lustre file system operating with high-speed networks such as 40GbE and InfiniBand.
- It is compatible with both Hadoop 1.0 and Hadoop 2.0 (YARN) distributions and requires no code changes or re-compiling of either Hadoop or Lustre systems.
- It is compatible with existing HDFS-based Hadoop installations. There is no need to migrate data to ClusterStor before using the accelerator, as users can read from or write to ClusterStor and HDFS interchangeably while running Hadoop jobs (see the sketch after this list).
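As an illustration of that interchangeability, the sketch below reads an existing dataset from HDFS and writes it to a POSIX-mounted Lustre/ClusterStor path using only standard Hadoop file system APIs. The namenode host and port and the /mnt/lustre mount point are assumptions for illustration; the shipped connector may expose its own configuration rather than plain file:// paths.

    // Illustrative sketch of mixing HDFS and a Lustre/ClusterStor mount in one
    // program: read an existing dataset from HDFS and write it to the Lustre mount.
    // The namenode host/port and the /mnt/lustre path are assumptions, not
    // documented values from this release.
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MixedStorageCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            FileSystem lustre = FileSystem.get(URI.create("file:///"), conf);

            Path src = new Path("/user/analytics/raw/part-00000");        // stored in HDFS
            Path dst = new Path("/mnt/lustre/analytics/raw/part-00000");  // stored on Lustre

            // Copy without deleting the source: the existing HDFS data stays usable,
            // so no up-front migration to ClusterStor is required.
            FileUtil.copy(hdfs, src, lustre, dst, false, conf);
        }
    }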
Hadoop Workflow Accelerator is scheduled to be available in January as a set of distinct product bundles, with varying levels of performance optimization, services and support.
Seagate exhibited at SC14, November 16-21 in New Orleans, LA, where it demonstrated its Hadoop Workflow Accelerator.