Announcing the Databricks Storage Ecosystem: Governing the Enterprise Data Estate, Wherever it Lives

Powered by open source OpenSharing, our new storage partner ecosystem brings the Databricks Data Intelligence Platform directly to your on-premises and hybrid infrastructure, without copying a single byte

By Philippe Nicolas | June 24, 2026 at 2:02 pm

Blog written by Rupal Jain, technology partner management director, and Denis Dubeau, principal, partner engineering, Databricks, published June 10, 2026

The Data That Can’t Move
For years, the enterprise data strategy was simple: move everything to the cloud. Migrate the data lakes and the warehouses to the cloud, and then governance follows. It was a clean story — until it wasn’t.

Today, some of the world’s most sophisticated enterprises are telling us clearly: they cannot — and will not — move all of their data to the cloud. Leading semiconductor manufacturers are training models on engineering-classified datasets that must never leave their premises. Global trading firms sit on massive volumes of historical tick data where the economics of cloud egress make migration impossible. Tier-1 banks have adopted “Hybrid Forever” strategies, modernizing on-premises storage while maintaining strict data sovereignty. Major pharmaceutical companies run millions of daily drug experiments against petabyte-scale on-premises data estates subject to stringent regulatory controls.

These aren’t edge cases. They represent a structural shift in how enterprises think about data: from “Migrate Everything” to “Govern Everything.”

The drivers are real and compounding:

Data sovereignty & regulation: Financial services, healthcare, and government organizations operate under mandates (GDPR, HIPAA, NIS2, sector-specific data residency rules) that require data to remain within specific jurisdictions or air-gapped environments. For certain datasets, cloud migration is not an option; it is legally prohibited.
Data gravity & costs: At petabyte and exabyte scale, the economics of cloud migration break down entirely. Egress fees, storage costs, and sheer data volume make the “move it once” model financially unsustainable. Some of the world’s largest retailers are actively repatriating analytics workloads from cloud back to on-premises infrastructure for precisely this reason.
Latency & edge workloads: Retail, manufacturing, and telco workloads require low-latency access to on-premises and edge data. Telecommunications providers ingest enormous volumes of network telemetry on-premises daily to power AI-driven network operations that cannot tolerate cloud round-trips.
AI on dark data: Vast stores of backup data, unstructured archives, and secondary datasets, representing hundreds of exabytes across the enterprise, contain immense AI value that has never been unlocked because governance never reached them.

The signal is unmistakable. Hundreds of customers have explicitly requested on-premises and hybrid storage connectivity to Unity Catalog. The Software-Defined Storage (SDS) market stands at hundreds of billions of dollars in 2026, and the enterprise partners who manage this estate, collectively holding more than 2 zettabytes of data under management, are building with us.

Introducing the Databricks Storage Ecosystem
Today, we are excited to announce the Databricks Software-Defined Storage (SDS) Ecosystem, a new partner category purpose-built to bring the Databricks Data Intelligence Platform to enterprise data wherever it lives: on-premises, in private clouds, and at the edge. If you are an enterprise running petabytes of data on these platforms today, you no longer have to choose between your existing non-cloud storage infrastructure and Databricks AI.

“For too long, enterprises had to choose between the on-premises storage infrastructure they rely on and the cloud-native AI they want to build. Forcing customers to migrate massive amounts of data using complex pipelines just to unlock that intelligence is a broken model. By uniting these industry-leading partners, we are ending that compromise and delivering Databricks Intelligence directly to where the enterprise data lives. But this launch is just day one. We are building the foundation to ensure that soon, every piece of hybrid data–structured or unstructured–is instantly ready for generative AI without ever copying a byte,” said Stephen Orban, SVP, product partnerships & ecosystem, Databricks.

At the heart of this ecosystem is OpenSharing, an open-source protocol for secure, governed data sharing. Our storage partners are implementing OpenSharing servers to expose their data estates directly to Databricks Serverless Compute. The path is simple: the storage partner stands up an OpenSharing endpoint, you connect it to Unity Catalog, and you instantly gain secure, governed access to your on-premises data in Databricks without data migration.
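
The three steps in that path can be pictured with a toy model. This is a conceptual sketch in plain Python, not the OpenSharing protocol or any Databricks API: the class names (`StorageEndpoint`, `Catalog`), their methods, and the grant model are all hypothetical, invented here only to show the idea that a catalog holds references and permissions while the rows stay on the storage side.

```python
from dataclasses import dataclass, field


@dataclass
class StorageEndpoint:
    """Hypothetical stand-in for a partner-hosted sharing endpoint."""
    name: str
    tables: dict[str, list[dict]]  # table name -> rows that stay on-premises

    def read(self, table: str) -> list[dict]:
        # Rows are served from where they live; nothing is copied out.
        return self.tables[table]


@dataclass
class Catalog:
    """Hypothetical catalog: stores endpoint references and grants, never data."""
    endpoints: dict[str, StorageEndpoint] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (user, "endpoint.table")

    def connect(self, endpoint: StorageEndpoint) -> None:
        # Step 2 of the path: register the partner endpoint with the catalog.
        self.endpoints[endpoint.name] = endpoint

    def grant(self, user: str, table_ref: str) -> None:
        self.grants.add((user, table_ref))

    def query(self, user: str, table_ref: str) -> list[dict]:
        # Governance check happens in the catalog before any read is forwarded.
        if (user, table_ref) not in self.grants:
            raise PermissionError(f"{user} has no access to {table_ref}")
        endpoint_name, table = table_ref.split(".", 1)
        return self.endpoints[endpoint_name].read(table)


# Step 1: the storage side exposes an endpoint (illustrative data only).
onprem = StorageEndpoint("onprem", {"ticks": [{"symbol": "ABC", "price": 10.5}]})

# Steps 2 and 3: connect it to the catalog, then read through governed access.
catalog = Catalog()
catalog.connect(onprem)
catalog.grant("analyst", "onprem.ticks")
rows = catalog.query("analyst", "onprem.ticks")
```

In this toy version the catalog returns the very object held by the endpoint, which is the in-memory analogue of "no data migration"; a real deployment would of course involve network transport, authentication, and table formats that this sketch does not attempt to model.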

This integration provides a single, unified catalog across your entire hybrid environment. Customers can now use Databricks Serverless Compute, Genie, Agent Bricks, and model training to query and reason over data that never leaves the premises. The result? Zero data movement, no data duplication, and zero compliance risk.

This is not a roadmap aspiration. Customers can try these integrations today. Partners building these integrations follow the Partner Well-Architected Framework — a technical blueprint covering architecture, security, and certification criteria.

“Customers want to break down data silos and unify all of their Data and AI estate, including large amounts of data that still sit on-premises. Thanks to on-premises storage partners leveraging the open source OpenSharing protocol, customers can now seamlessly unify, govern, and analyze all of their data estate in Databricks Unity Catalog, unlocking the full value of their data in the Databricks Data Intelligence Platform,” added Jonathan Keller, VP, product management, Databricks.

Our Launch Partners
We are proud to announce integrations with the following leading storage providers:

MinIO — General Availability
MinIO AIStor is the bridge that seamlessly connects the Databricks Data Intelligence Platform with enterprise data that can’t move to the cloud. By natively implementing the open source OpenSharing protocol at the storage layer, AIStor eliminates complexity and enables Databricks customers to efficiently query live on-premises Apache Iceberg™️ and Delta tables under full Unity Catalog governance. It extends Serverless Compute, Genie, and Agent Bricks to on-premises data, bringing the full power of the Databricks Platform to an enterprise’s most critical data.

“AI and analytics initiatives are often constrained by where data resides, particularly in environments with strict security, sovereignty, or operational requirements. By bringing native OpenSharing to AIStor, we’re enabling organizations to securely expose data where it lives while giving Databricks seamless access through open standards. This removes a major barrier between enterprise data and AI, allowing organizations to activate previously inaccessible data for AI, analytics, and agentic applications without compromising control,” said Ugur Tigli, CTO, MinIO.

Everpure (formerly Pure Storage) — Private Preview
Everpure and Databricks enable organizations to use on-premises data directly in the cloud, removing the need for data replication or duplication. This is delivered through an OpenSharing connector that bridges data in object storage with Databricks core workspaces in a secure and gated manner.

“Everpure and Databricks enable organizations to access and analyze on-premises data directly from the cloud without the need for replication or duplication. Continuously moving data between environments is costly and unsustainable at scale. Customers are looking for a simpler approach that balances cost, compliance, and data sovereignty while reducing operational complexity,” said Chadd Kenney, VP, product management, Everpure.

Qumulo — Private Preview in July 2026
Qumulo has integrated OpenSharing with its new NeuralSearch, allowing customers to securely share Qumulo-stored data with Databricks across core, cloud, and edge environments—without replication, extra costs, or complexity. Using NeuralSearch, users can discover relevant datasets, including unstructured content, via natural-language queries and seamlessly share those curated tables with Databricks via OpenSharing.

“Organizations can no longer afford the cost, complexity, and delays of copying massive datasets across environments just to support AI and analytics. By combining Qumulo NeuralSearch with Databricks OpenSharing, customers can securely discover, govern, and share both tabular and unstructured data across core data centers, edge locations, and public clouds – in real time, without moving the data itself. Together, we’re helping organizations accelerate AI initiatives, unify governance, and unlock faster time-to-insights from globally distributed data while maintaining a single source of truth,” said Brandon Whitelaw, SVP, head, product, Qumulo.

Vast Data — Private Preview in August 2026
Vast Data is extending the Vast AI Operating System with OpenSharing support to help enterprises bridge Databricks workflows with data that resides across on-premises and hybrid infrastructure – without requiring massive data movement or migration. The integration will give customers more flexibility to access, process and operationalize data across cloud, data center and emerging AI infrastructure environments while supporting modern hybrid AI and analytics workloads.

“AI infrastructure is becoming fundamentally hybrid. Customers increasingly want the ability to process data wherever it makes the most sense economically and operationally, while still maintaining seamless access across environments. OpenSharing support extends the Vast AI Operating System’s ability to bridge Databricks workflows with data that resides across cloud and on-premises infrastructure for modern AI and analytics applications. Unlike traditional storage platforms, Vast combines data services, distributed processing and AI infrastructure orchestration into a unified operating system for AI data at scale,” said John Mao, VP, global technology alliances, Vast Data.

What’s Next

Integrations Coming Soon
In addition to our launch partners, momentum across the storage ecosystem continues to accelerate. We have secured commitments from Cohesity, Commvault, HPE, NetApp, Nutanix, and Rubrik to build native integrations by the end of the year.

Collectively, these partners, along with our launch partners, manage hundreds of exabytes of enterprise data, spanning high-performance unstructured media, secondary backup archives, cost-effective cloud storage, and hyperconverged private cloud estates.

Unlocking Unstructured Data
Today’s launch establishes structured, tabular data as fully governed and accessible across this ecosystem. But we know that an exciting opportunity lies ahead in unstructured data: the images, PDFs, videos, medical scans, engineering simulations, and backup archives that represent the majority of enterprise data under management and the raw material for the next generation of RAG pipelines and fine-tuned models.

We are actively working to extend the OpenSharing protocol with Volumes APIs that expose unstructured files from on-premises storage directly to Databricks for GenAI workloads. Once this is available, partners managing massive unstructured estates, from media and imaging archives to enterprise backup repositories, will unlock an entirely new class of AI use cases for their customers.

This is what it means to govern everything.

The era of “Migrate Everything” is over. The era of “Govern Everything” starts today.

Read also:

Data + AI Summit 2026: Databricks Launches LTAP - The First Lake Transactional/Analytical Processing Architecture
New data processing architecture that unifies transactions, analytics, streaming, and operational data on a single copy of storage in the lake
June 18, 2026 | Press Release

Data + AI Summit 2026: Databricks Launches Genie One: All-New Agentic Coworker for Every Team
Covering any data, structured or unstructured, analytical or operational, inside or outside Databricks
June 18, 2026 | Press Release

Data + AI Summit 2026: Databricks Enters the Marketing Industry with CustomerLake
The Agentic Customer Data Platform (CDP)
June 17, 2026 | Press Release

Data + AI Summit 2026: Databricks Agrees to Acquire Panther
Further establishing the security Lakehouse category
June 17, 2026 | Press Release

Data + AI Summit 2026: Databricks Launches Lakehouse//RT to Bring Real-Time Analytics Directly to the Lakehouse
Adding a new dimension to the Lakehouse
June 17, 2026 | Press Release

Data + AI Summit 2026: Databricks Announces OpenSharing
A new open standard for sharing of data and AI assets across platforms and organizations
June 16, 2026 | Press Release

Data + AI Summit 2026: Databricks Announces Keynote Lineup and Programming for the World’s Largest Data and AI Conference
30,000+ attendees expected to gather in San Francisco to hear keynotes from Databricks co‑founders Ali Ghodsi, Matei Zaharia, Arsalan Tavakoli‑Shiraji and Reynold Xin, alongside guest speakers Satya Nadella in a pre-recorded fireside chat, Greg Brockman, and Magesh Bagavathi
June 15, 2026 | Press Release

Comments

The same question has been raised for years: should we move data closer to compute, or instead spin up compute where the data resides? Several approaches have emerged from various players, ranging from "bottom" infrastructure actors to those operating further "up the stack." To be clear, this topic has nothing to do with data redundancy for data protection. It is about leveraging production data on primary storage and potentially some older data versions kept on secondary storage units.

File sharing relies on long-established methods, joined more recently by HTTP/S3-style access, and then by data sharing approaches that make things more "application-aware" through various data integration mechanisms. We have also seen multiple iterations of global file storage attempting to aggregate and unify data points of presence.

Data has exploded everywhere, a statement we have repeated for years, and one that remains accurate for multiple reasons. What was said in the mobile era, then during the rise of social media, then analytics, and now with AI, all confirms that data volume growth is inevitable. This makes it obvious that data should be processed where it is generated, or at the very least be made accessible through advanced remote mechanisms.

The opening statement from Databricks comes as a surprise, since it suggests cloud was and remains inevitable. It was not, and it is not. Even if the "move to the cloud" mantra sounds simple and is amplified by hyperscalers with strong footprints and voices in the industry, the reality is different. Of course every enterprise has some data it can "cloudify," but the same enterprises often prefer to keep other data on-premises for obvious sovereignty reasons. Ultimately, this is about control, governance, independence, competitive advantage, and IP preservation. The world is therefore more grey than black-and-white: we live in a hybrid environment, not of equal proportions, but one where data is distributed across various entities.

Beyond sovereignty, data gravity and the costs associated with moving and manipulating data make wholesale cloud migration a non-starter for many. We are seeing many repatriation projects driven by these extra costs tied to traffic and operational loads.

Another factor is response time and latency, which can make some projects inefficient and significantly reduce expected value. Even with cloud data services positioned close to the data, cost remains a key dimension, especially at scale.

At the same time, the hybrid model has created the need for unified cataloging, since data is now even more dispersed. For this kind of service, contenders are coming from various horizons.

Is it a surprise to see Databricks "touching" data storage-related services with this news? Yes and no. The company has shaken the industry, breaking boundaries, and with its large, highly talented team, billions of dollars in investments and several key acquisitions, it can enter different domains without difficulty. They confirm, once again, that data is everything, a point also illustrated by storage companies repositioning themselves as data- and software-centric players.

OpenSharing genuinely shakes existing positions thanks to its open source DNA, which will accelerate adoption. This upper data services layer, sitting above storage infrastructure and access methods, operates as a common glue with endpoint instances deployed on top of each storage entity, without any data movement. This last point is paramount.

The first key partners have been clearly identified - Everpure, MinIO, Qumulo and VAST Data - agile, fast-moving companies that quickly decided to join the initiative. A second wave is expected later this year, extending the initiative to key players in data protection and preservation coupled with secondary storage, such as Cohesity, Commvault and Rubrik, alongside HPE, NetApp and Nutanix. The presence of Nutanix in this list is somewhat surprising. We also expect DDN, WEKA and possibly Cloudian to follow. And where are Dell, IBM or Oracle?

Coldago Research has long promoted its U3 (Universal, Unified and Ubiquitous) storage model, and Databricks, through the OpenSharing initiative, clearly participates in and validates this view once again.