
Availability of Alluxio Structured Data Service Featuring Data Catalog Service and Transformation Service

Provides just-in-time transformation of data into a compute-optimized form for applications like Presto, independent of the storage solution or format.

Alluxio, Inc. announced Structured Data Service (SDS), featuring a data catalog service and a transformation service, two major architectural components of its Data Orchestration Platform.

Alluxio Structured Data Mgmt

Data engineers, architects and developers can spend fewer resources on storing data and more time on delivering it to analytical compute engines.

As users and enterprises adopt analytics engines such as Presto, Apache Spark SQL or Apache Hive, they often run into inefficient data formats and face performance challenges. Typically, those engines consume structured data in different databases with ‘tables’ consisting of ‘rows’ and ‘columns’, rather than ‘offset’ and ‘length’ in files or objects. This gap creates multiple challenges and inefficiencies, such as maintaining mappings or creating converted copies of the data. With this announcement, users benefit from a simpler data platform that connects to different catalogs for access to structured data, with fewer copies and pipelines and more compute-optimized data.
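The gap between the two views can be illustrated with a minimal Python sketch. It is not Alluxio code; the sample data and the helper names `read_range` and `read_column` are hypothetical and exist only to contrast a byte-range request with a column request over the same stored bytes.

```python
import csv
import io

# Hypothetical sample standing in for a CSV object held in storage.
RAW = b"id,city,amount\n1,Oslo,10\n2,Lima,25\n3,Oslo,7\n"


def read_range(blob: bytes, offset: int, length: int) -> bytes:
    """File/object view: the caller asks for a range of bytes."""
    return blob[offset:offset + length]


def read_column(blob: bytes, column: str) -> list:
    """Table view: the caller asks for a named column across all rows."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    return [row[column] for row in reader]


# The storage layer only understands the first call; a SQL engine
# thinks in terms of the second.
byte_view = read_range(RAW, 15, 9)
column_view = read_column(RAW, "amount")
```

Answering the column request from a row-oriented text file means parsing every row, which is one reason a columnar, compute-optimized copy can be cheaper to query.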

“Alluxio now provides just-in-time data transform of data to be compute-optimized, independent of the storage format for OLAP engines, such as Presto and Apache Spark,” said Haoyuan Li, founder and CTO. “These schema-aware optimizations are made possible with the new Alluxio Catalog Service which abstracts the widely-used Apache Hive Metastore, so regardless of how the data was initially stored – CSV and text formatted files, for example – the data is now transformed into the generally recognized compute-optimized Parquet format. Almost every organization has a surprising amount of data in CSV or other text formats and this removes the manual work to make that data more usable. A second type of transformation will coalesce many smaller files, enabling the data to be combined into fewer files, which is more efficient to process for SQL engines. And yet a third type of transformation is for sorting, enabling table columns to be sorted, adding to the efficiency of queries, newly available in our Enterprise Edition.”

“We can thank Kubernetes for distributed compute; and Alluxio for distributed data. The combination of these technologies offers tremendous promise for our data-driven hybrid and multicloud future,” said Eric Kavanagh, CEO, Bloor Group.

Alluxio Ecosystem

Structured Data Service
With SDS, the company can expose data to SQL engines independent of how and where that data is stored. Capabilities and services include:

  • Presto Connector for Alluxio – A connector is now available for integrating and configuring Alluxio with Presto.

  • Catalog Service – It manages the metadata of structured data in the system. It is responsible for all the database, table, and schema information, as well as the location of all the stored data. There is no longer a need to change any table locations in the Hive Metastore, or to restart or reconfigure any Hive services. It enables schema-aware optimizations for any type of structured data. For example, once the Hive Metastore is attached to the Catalog Service, this service will automatically mount the appropriate table locations, and automatically serve the table metadata with the Alluxio locations.

  • Transformation Service – It transforms data into a compute-optimized representation that is independent of the storage-optimized format. This enables physical data independence. Three types of transformations are available for tables: coalesce, format conversion, and sorting. While results depend on the specific formats and workloads, internal tests have shown an increase in query performance of over 2.5x.
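To make two of the three transformation types concrete, here is a small conceptual sketch in Python. It is not Alluxio's implementation or API: the functions `coalesce` and `sort_by` and the in-memory "files" are hypothetical stand-ins. Format conversion (for example CSV to Parquet) is left out because it would need a third-party library.

```python
from itertools import islice


def coalesce(parts, rows_per_file):
    """Merge many small row lists into fewer lists of at most rows_per_file rows."""
    all_rows = (row for part in parts for row in part)
    merged = []
    while True:
        chunk = list(islice(all_rows, rows_per_file))
        if not chunk:
            return merged
        merged.append(chunk)


def sort_by(rows, column):
    """Return the rows ordered by one column, as a sorting transformation would."""
    return sorted(rows, key=lambda row: row[column])


# Five hypothetical one-row "files" become two larger, sorted ones.
small_files = [[{"id": 3}], [{"id": 1}], [{"id": 2}], [{"id": 5}], [{"id": 4}]]
merged = coalesce(small_files, rows_per_file=3)
sorted_files = [sort_by(rows, "id") for rows in merged]
```

Fewer, larger files reduce per-file open and metadata overhead for a SQL engine, and sorted columns let it skip ranges that cannot match a filter.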

Availability:
Alluxio 2.2 Community and Enterprise Edition with Structured Data Service are available for download.

Resources:
Blog: What’s new in Alluxio 2.2
Blog: Serving Structured Data in Alluxio: Concept
Blog: Serving Structured Data in Alluxio: Example
Read more about how to get started with Alluxio’s structured data service in the documentation.

