RainStor 4.5 Deployed Using Cloudera Distribution

RainStor, the trading name of Clearpace Software Ltd. and an infrastructure software company specializing in Online Data Retention (OLDR), announced that RainStor 4.5 can be deployed using Cloudera’s Distribution including Apache Hadoop.

The result is a pragmatic and scalable approach to Big Data that performs fast analytics while retaining data at a lower overall total cost of ownership.

RainStor can be used to retain and access massive data sets on the Hadoop Distributed File System (HDFS) with a physical footprint at least 97 percent smaller. The combination pairs Hadoop’s Big Data processing, management and analytics with RainStor’s compliant data retention on existing, low-cost servers and storage.

"Hadoop gives organizations the ability to scale for Big Data analytics but the data actually grows as it’s replicated across nodes. Reducing the size of data slated for retention makes enormous sense," said Merv Adrian, VP Research, Gartner. "The combination changes the class of hardware and storage required, making the economics even more attractive."

RainStor on HDFS Designed for Petabytes of Enterprise Data
As enterprises collect and generate more data than ever, RainStor on HDFS, using locally attached commodity storage, offers a low initial capital investment and a low ongoing total cost of ownership for retaining petabytes of data. RainStor’s specialized repository compresses the data using a patented value and pattern de-duplication technique and stores it in immutable form on HDFS. RainStor has built-in security, audit trails, and granular retention and expiry policies for managing the lifecycle of stored data. Data within RainStor can be accessed through standard structured query language (SQL), specialized RDBMS native SQL and standard BI tools via ODBC/JDBC.

Making the Big Data Problem Smaller
Depending on the Hadoop replication factor, the size of stored data can be a significant multiple of the raw data loaded. To counteract this, most Hadoop deployments rely on binary compression (such as LZO), which yields on average 5 to 1 compression and incurs a re-inflation performance penalty on access. In contrast, RainStor achieves compression rates of 40 to 1 or greater and allows data access without re-inflation.

Example: With 2PB of raw data to be stored for a 6-month period, the difference in disk savings could look like this:
Data in HDFS: 2PB X 3 (for replication) = 6PB + results of analysis.
Data in HDFS with RainStor: 0.05PB (original source data compressed 40 to 1) X 3 (for replication) = 0.15PB + results of analysis. A physical storage savings of 5.85PB.
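
The arithmetic in this example can be reproduced with a short Python sketch. The function name stored_pb and its parameters are hypothetical and chosen only for illustration; the replication factor of 3 and the 40 to 1 and 5 to 1 ratios are the figures quoted in this article. The LZO row is an extra comparison derived from the 5 to 1 average mentioned above, and the "results of analysis" component is left out because the article gives no size for it.

```python
def stored_pb(raw_pb, compression_ratio=1.0, replication=3):
    """Physical HDFS footprint in PB: compress first, then replicate."""
    return raw_pb / compression_ratio * replication


raw = 2.0  # PB of raw data, as in the example above

plain = stored_pb(raw)            # no compression
lzo = stored_pb(raw, 5)           # assumed 5 to 1 average for LZO-style compression
rainstor = stored_pb(raw, 40)     # 40 to 1, the RainStor figure quoted in the article

print(f"HDFS uncompressed: {plain:.2f} PB")
print(f"HDFS with 5:1:     {lzo:.2f} PB")
print(f"HDFS with 40:1:    {rainstor:.2f} PB")
print(f"Savings vs plain:  {plain - rainstor:.2f} PB")
print(f"Footprint cut:     {(1 - rainstor / plain) * 100:.1f} percent")
```

Because replication multiplies whatever is stored, the percentage reduction depends only on the compression ratio, which is why a 40 to 1 ratio lines up with the "at least 97 percent smaller" footprint claimed earlier.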

Even with low-cost commodity disk, the initial capital expenditure can be significant as data volumes reach multiple petabytes and beyond. The operating cost of a large number of storage drives remains an ongoing expense that can reach millions of dollars over multiple years. RainStor’s compression, lifecycle management and compliant retention features, combined with HDFS’ low-cost commodity disk and scale-out benefits, provide value and cost savings for Big Data analysis and retention.

"Cloudera’s Distribution including Apache Hadoop is fast becoming the gold standard in enterprise Hadoop deployments," said Ramon Chen, vice president product management, RainStor. "Our partners and their customers face exploding data volumes and extended compliance retention requirements. Organizations that deploy RainStor on HDFS benefit from a scalable online data retention solution at the lowest TCO, while leveraging Hadoop for Big Data analytics."

RainStor 4.5 for Hadoop is available.