MapR Data Platform Integrated With Kubernetes Core Components for Primary Workloads on Spark and Drill

MapR Technologies, Inc. announced innovations in its Data Platform that integrate with Kubernetes core components to support primary workloads on Apache Spark and Apache Drill.


These innovations make highly elastic workloads easier to manage, support just-in-time deployments, and let compute and storage scale independently. Organizations restructuring their applications or building next-generation real-time data lakes can benefit from running Spark and Drill in a Kubernetes model, drawing on the elasticity and agility of Kubernetes clusters.

“Having run a recent survey on organizations’ use of containers to support AI and analytics initiatives, it is clear that a majority of them are exploring the use of containers and Kubernetes in production,” said Mike Leone, senior analyst, ESG. “We are also seeing compute needs are growing rapidly and bursty due to the unpredictability of compute-centric applications and workloads. MapR is solving for this need to independently scale compute while also tightly integrating with Kubernetes in anticipation of organizations’ rapid container adoption.”

In early 2019, the company enabled persistent storage for compute running in Kubernetes-managed containers through a volume driver plug-in compliant with the Container Storage Interface (CSI). With this announcement, the firm expands its feature portfolio to allow Spark and Drill to be deployed as compute containers orchestrated by Kubernetes. This deployment model lets end users, including data engineers, run compute workloads in a Kubernetes cluster that is independent of where the data is stored or managed.

Core capabilities:

Tenant operator: Creates tenant namespaces (Kubernetes namespaces) for running compute applications, providing a simple way to start complex applications in containers within Kubernetes. End users can run Spark, Drill, Hive Metastore, Tenant CLI, and Spark History Server in these namespaces. Each tenant can, in turn, point to a storage cluster located elsewhere.
Spark job operator: Creates Spark jobs and allows different versions of Spark to be deployed in separate pods, supporting the dev, test, and QA stages typical of a data engineer’s workflow.
Drill operator: Starts a set of Drillbits (Drill’s query-execution daemons) and scales them automatically based on query demand.
CSI driver operator: Provides the standard CSI plug-in that mounts persistent volumes so stateful applications can run in Kubernetes.
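The announcement does not say how the Drill operator decides how many Drillbits to run. Purely as an illustration, the sketch below applies the proportional rule that Kubernetes documents for its HorizontalPodAutoscaler (desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a 0.1 tolerance) to a hypothetical "queued queries per Drillbit" metric. The function name, the metric, and the min/max bounds are assumptions for this example, not MapR’s API.

```python
import math
from fractions import Fraction


def desired_drillbits(current: int, queued_queries: int, target_per_drillbit: int,
                      min_drillbits: int = 1, max_drillbits: int = 10,
                      tolerance: Fraction = Fraction(1, 10)) -> int:
    """Hypothetical demand-based sizing for a set of Drillbits.

    Applies the HPA-style proportional rule
        desired = ceil(current * current_metric / target_metric)
    where current_metric is the average number of queued queries per
    running Drillbit. Fractions keep the arithmetic exact before ceil().
    """
    if current < 1:
        raise ValueError("need at least one running Drillbit to measure load")
    if target_per_drillbit <= 0:
        raise ValueError("target_per_drillbit must be positive")

    per_drillbit = Fraction(queued_queries, current)   # observed average load
    ratio = per_drillbit / target_per_drillbit          # > 1 means overloaded
    if abs(ratio - 1) <= tolerance:                     # close to target: no change
        return current

    desired = math.ceil(current * ratio)
    return max(min_drillbits, min(max_drillbits, desired))  # clamp to bounds


# Three Drillbits facing 24 queued queries, aiming for 4 per Drillbit.
print(desired_drillbits(3, 24, 4))  # 6
```

The tolerance band keeps small fluctuations in load from repeatedly starting and stopping Drillbits, and the clamp keeps a sudden spike from claiming more containers than the tenant is meant to use.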

“MapR is paving the way for enterprise organizations to easily do two key things: start separating compute and storage and quickly embrace Kubernetes when running analytical AI/ML apps,” said Suresh Ollala, SVP, engineering, MapR. “Deep integration with Kubernetes core components, like operators and namespaces, allows us to define multiple tenants with resource isolation and limits, all running on the same MapR platform. This is a significant enabler for not only applications that need the flexibility and elasticity but also for apps that need to move back and forth from the cloud.”

The release delivers six key benefits:

Handle compute bursts by spinning up additional compute containers without adding physical host servers;
Isolate resources and prevent applications from starving one another by setting granular quotas and limits, and by using Spark job operators to create separate Spark clusters;
Accommodate fluctuating query workloads by adding Drillbits dynamically based on load and demand;
Run different versions of Spark and Drill on the same platform;
Allow multiple tenants to coexist; and
Deploy Spark and Drill container applications, along with MapR volumes, across multi-cloud environments, including private, hybrid, and public clouds.
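Isolating tenants with quotas maps naturally onto Kubernetes ResourceQuota objects, which cap the total resource requests a namespace may hold; the announcement does not say exactly how MapR’s tenant operator configures them. The plain-Python sketch below models the admission check such a quota performs: a new pod is admitted only if its requests, added to what the namespace already uses, stay within every hard limit. The `TenantQuota` class, the tenant names, and the numbers are hypothetical; `requests.cpu` and `requests.memory` follow ResourceQuota key names, with units simplified to integer millicores and MiB.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TenantQuota:
    """Hypothetical per-tenant quota, modeled on a Kubernetes ResourceQuota."""
    name: str
    hard: Dict[str, int]                      # hard caps: CPU in millicores, memory in MiB
    used: Dict[str, int] = field(default_factory=dict)

    def admit(self, requests: Dict[str, int]) -> bool:
        """Admit a pod only if every capped resource stays within its hard limit."""
        for resource, limit in self.hard.items():
            if self.used.get(resource, 0) + requests.get(resource, 0) > limit:
                return False                  # reject; recorded usage is unchanged
        for resource, amount in requests.items():
            self.used[resource] = self.used.get(resource, 0) + amount
        return True


spark_etl = TenantQuota("spark-etl", {"requests.cpu": 4000, "requests.memory": 8192})
drill_bi = TenantQuota("drill-bi", {"requests.cpu": 2000, "requests.memory": 4096})

print(spark_etl.admit({"requests.cpu": 3000, "requests.memory": 6144}))  # True
print(spark_etl.admit({"requests.cpu": 2000, "requests.memory": 1024}))  # False
print(drill_bi.admit({"requests.cpu": 1000, "requests.memory": 2048}))   # True
```

Because each tenant’s requests are checked only against that tenant’s own quota, a burst in the Spark tenant exhausts its own allowance instead of the Drill tenant’s share, which is the kind of starvation the second benefit describes.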

These capabilities will be available in 2Q19.

Blog: MapR, Kubernetes, Spark and Drill: A Two-Part Guide to Accelerating Your AI and Analytics Workloads