Vast Data Platform to Be Foundation of AI-Assisted Discovery

Global data infrastructure offering, unifying storage, database and virtualized compute engine services in scalable system built for future of AI

Vast Data Ltd. unveiled the full vision for the company by introducing a transformative data computing platform designed to be the foundation of AI-assisted discovery.

The Vast Data Platform is the company’s global data infrastructure offering, unifying storage, database and virtualized compute engine services in a scalable system that was built for the future of AI.

Graduating beyond large language models to AI-assisted discovery
While generative AI and Large Language Models (LLMs) have introduced the world to the early capabilities of AI, LLMs are limited to performing routine tasks like business reporting or reciting information that is already known. The true promise of AI will be realized when machines can recreate the process of discovery by capturing, synthesizing and learning from data – achieving a level of specialization that used to take decades in a matter of days.

The era of AI-driven discovery will accelerate humanity’s quest to solve its biggest challenges. AI can help industries find treatments for diseases and cancers, forge new paths to tackle climate change, pioneer revolutionary approaches to agriculture, and uncover new fields of science and mathematics that the world has not yet even considered.

As such, enterprises are increasingly turning their focus to AI applications. While organizations can stitch together technologies from disparate public or private cloud offerings, customers require a data platform that simplifies data management and processing into one unified stack. Today’s data platforms have become popular with global enterprises because they reduce infrastructure deployment complexity for business intelligence and reporting applications, but they were not built to meet the needs of new deep learning applications. This next generation of AI infrastructure must deliver parallel file access, GPU-optimized performance for neural network training and inference on unstructured data, and a global namespace spanning hybrid multi-cloud and edge environments, all unified within one easy-to-manage offering in order to enable federated deep learning.

[Figure: Vast Data Platform, diagram 1]

Introducing Vast Data Platform
The foundation of this next era of AI computing can only be built by resolving fundamental infrastructure tradeoffs that have previously prevented applications from computing on, and understanding, datasets from global infrastructure in real time. To bring deep learning (DL) to data, the company is introducing the Vast Data Platform.

[Figure: Vast Data Platform, diagram 2]

This platform was built with the entire spectrum of natural data in mind: unstructured and structured data types in the form of video, imagery, free text, data streams and instrument data, generated from all over the world and processed against an entire global data corpus in real time.

This approach aims to close the gap between event-driven and data-driven architectures by providing the ability to:

  • Access and process data in any private or major public cloud data center
  • Understand natural data by embedding a queryable semantic layer into the data itself
  • Continuously and recursively compute data in real time, evolving with each interaction

For more than 7 years, the firm has been building toward a vision that puts data (natural data, rich metadata, functions and triggers) at the center of the Vast Disaggregated Shared-Everything (DASE) distributed systems architecture. DASE lays the data foundation for DL by eliminating tradeoffs between performance, capacity, scale, simplicity and resilience, making it possible to train models on all of an enterprise’s data. Now that customers can add logic to the system, machines can continuously and recursively enrich and understand data from the natural world.

Unified global datastore, database and AI computing engine

[Figure: Vast DataStore element store]

To capture and serve data from the natural world, the company first engineered the foundation of its platform, the Vast DataStore, a scalable storage architecture for unstructured data that eliminates storage tiering. Exposing enterprise file storage and object storage interfaces, the DataStore is an enterprise NAS platform built to meet the needs of AI computing architectures such as NVIDIA DGX SuperPOD AI supercomputers, as well as big-data and HPC platforms. The exabyte-scale DataStore is built with best-in-class system efficiency to bring archive economics to flash infrastructure, making it suitable for archive applications as well. Resolving the cost of flash storage has been critical to laying the foundation for DL for enterprise customers as they look to train models on their proprietary data assets. To date, the company has managed more than 10EB of data globally, with customers including Booking.com, NASA, Pixar Animation Studios and Zoom Video Communications, Inc.

[Figure: Vast DataBase structure]

To apply structure to unstructured natural data, the firm has added a semantic database layer natively into the system with the introduction of the Vast DataBase. By applying first-principles simplification to structured data, combining the characteristics of a database, a data warehouse and a data lake in one simple, distributed and unified database management system, the company has resolved the tradeoffs between transactions (to capture and catalog natural data in real time) and analytics (to analyze and correlate data in real time). Designed for rapid data capture and fast queries at any scale, the DataBase is, according to the company, the first system to break the barriers of real-time analytics all the way from the event stream to the archive.

[Figure: Vast Data Platform DataEngine, designed to create structure and insight from unstructured data]

With a foundation that unifies structured and unstructured data, the Data Platform then makes it possible to refine and enrich raw unstructured data into structured, queryable information through support for functions and triggers. The Vast DataEngine is a global function execution engine that consolidates data centers and cloud regions into one global computational framework. The engine supports popular programming languages such as SQL and Python, and introduces an event notification system as well as materialized and reproducible model training to make AI pipelines easier to manage.
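The general functions-and-triggers pattern described above can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not Vast's API: the `Event`, `TriggerEngine` and `extract_text_stats` names are invented here, and a simple path-suffix match stands in for whatever event notification mechanism the DataEngine actually uses. It shows only the idea: a new object arrives, an event fires, a registered function derives structured fields from the raw bytes, and the resulting rows can then be queried.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration only: none of these names come from Vast.


@dataclass
class Event:
    """A notification that a new object was written at `path`."""
    path: str
    payload: bytes


@dataclass
class TriggerEngine:
    """Routes new-object events to registered functions and keeps their output as rows."""
    functions: dict[str, list[Callable[[Event], dict]]] = field(default_factory=dict)
    table: list[dict] = field(default_factory=list)

    def register(self, suffix: str, fn: Callable[[Event], dict]) -> None:
        # Attach a function to every object whose path ends with `suffix`.
        self.functions.setdefault(suffix, []).append(fn)

    def notify(self, event: Event) -> None:
        # Run each matching function and store its structured result as a row.
        for suffix, fns in self.functions.items():
            if event.path.endswith(suffix):
                for fn in fns:
                    self.table.append(fn(event))


def extract_text_stats(event: Event) -> dict:
    # Toy enrichment step: turn raw unstructured bytes into structured fields.
    text = event.payload.decode("utf-8", errors="replace")
    return {"path": event.path, "words": len(text.split()), "bytes": len(event.payload)}


engine = TriggerEngine()
engine.register(".txt", extract_text_stats)
engine.notify(Event("notes/a.txt", b"raw field notes from the lab"))
engine.notify(Event("images/b.png", b"\x89PNG"))

# A simple "query" over the structured rows derived from unstructured data.
print([row for row in engine.table if row["words"] > 3])
```

Running it adds one row for the text file and none for the PNG, because no function is registered for `.png` paths. A production engine would also need durable event delivery, retries and distributed execution, none of which this sketch attempts.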

[Figure: Vast DataSpace cloud]

The final element of the Vast Data Platform strategy is the Vast DataSpace, a global namespace that permits every location to store, retrieve and process data from any location with high performance while enforcing strict consistency across every access point. With the DataSpace, the Data Platform is deployable in on-premises data centers, edge environments and now also extends DataSpace access into public cloud platforms including AWS, Microsoft Azure and Google Cloud.

This global, data-defined computing platform takes a new approach to marrying unstructured data with structured data by storing, processing and distributing that data from a single, unified system.

[Figure: Vast Data Platform scheme]

For enterprise AI and LLM systems to drive new discoveries and understandings, they require:

  • Direct access to the natural world through the DataSpace, eliminating reliance on slow and inaccurate translations
  • Ability to store immense amounts of natural unstructured data in an accessible manner, through the DataStore
  • Intelligence to transform unstructured raw data into an understanding of its underlying characteristics, through the DataEngine
  • And finally, a way to build on all of an organization’s global knowledge, query it, and generate a better understanding of it, through the DataBase

“We’ve been working toward this moment since our first days, and we’re incredibly excited to unveil the world’s first data platform built from the ground up for the next generation of AI-driven discovery,” said Renen Hallak, CEO and co-founder, Vast Data. “Encapsulating the ability to create and catalog understanding from natural data on a global scale, we’re consolidating entire IT infrastructure categories to enable the next era of large-scale data computation. With the Vast Data Platform, we are democratizing AI abilities and enabling organizations to unlock the true value of their data.”

Availability:

  • The Vast DataStore, DataBase and DataSpace are available within the Data Platform.
  • The Vast DataEngine will be made available in 2024.

“To be really impactful in this era of AI and deep learning, you want not only to have lots of data, but high quality data that is correctly organized and available at the right place at the right time,” said Max Tegmark, professor and AI researcher, MIT. “As long as we manage its potential risk, AI will bring an immense upside, helping us solve many of the problems that have stumped humanity so far, from curing diseases to eliminating poverty and stabilizing our climate. It’s incredibly inspiring, so let’s not squander the amazing opportunities that this era of AI-enabled possibilities offers.”

“Vast is allowing us to put all of our rendered assets on one tierless cluster of storage, which offers us the ability to use these petabytes of data as training data for future AI applications,” said Eric Bermender, head, data center and IT infrastructure, Pixar Animation Studios. “We’ve already moved all of our denoising data, ‘finals’ and ‘takes’ data sets onto the Vast Data Platform, specifically because of the AI capabilities this allows us to take advantage of in the future.”

“AI is a big priority for us here at Zoom, and we’re working with Vast on efficiently building and training our AI/ML models across multiple unstructured datasets of video, audio and text data,” said Vijay Parthasarathy, head, AI/ML, Zoom Video Communications, Inc. “Automation is the key, and the Vast Data Platform allows us to build beyond the capabilities that we’ve already built to deliver a frictionless global communication experience.”

“At Allen Institute, the data that we collect is gigantic, with new files growing to hundreds of terabytes within just a day or two, and everything changes about how you need to manage data when it’s that big and that fast,” said David Feng, director, scientific computing, Allen Institute. “We were excited to work with Vast because of the performance they could offer at this scale, and the system’s multiple protocol support is critical to our entire pipeline. Taking advantage of new advancements in AI will be pivotal to help us make sense of all of this data, and the Vast Data Platform allows us to collect massive amounts of data so that we can ultimately map as many neural circuits as possible, and its mechanisms for collaboration enable us to rapidly share that data around the world.”

“As data is the fuel for AI, enterprises need modern data architectures to position themselves for success amid the greatest technology shift of our time,” said Manuvir Das, VP, enterprise computing, NVIDIA Corp. “Vast’s new platform provides powerful integration with NVIDIA DGX AI supercomputing to provide companies with a comprehensive solution for transforming their data into powerful generative AI applications.”

“The Vast Data Platform is radically discontinuous from any data platform that has come before it,” said Merv Adrian, principal analyst, IT Market Strategy. “By bringing together structured and unstructured data in a high-performance, globally distributed namespace with real-time analysis, Vast is not only tackling fundamental DBMS challenges of data access and latency, but also offering genuinely disruptive data infrastructure that provides the foundation AI-driven organizations need to solve the problems they haven’t yet attempted to solve. Any organization confronting the limitations of current data management solutions should assess Vast Data as an opportunity.”

“According to the IDC Worldwide AI Spending Guide, February (2023 V1), global spending on AI-centric systems continues to grow at double-digit rates, reaching a 5-year (2021-2026) CAGR of 27% and will exceed $308 billion by 2026,” said Ritu Jyoti, group VP, AI and automation research practice, IDC. “Data is foundational to AI systems, and the success of AI systems depends crucially on the quality of the data, not just their size. With a novel systems architecture that spans a multi-cloud infrastructure, Vast is laying the foundation for machines to collect, process and collaborate on data at a global scale in a unified computing environment, and opening the door to AI-automated discovery that can solve some of humanity’s most complex challenges.”

“The key to unlocking new business insights depends on an organization’s ability to tap into the full potential of their data,” said Harrison Johnson, head, technology partners and ecosystem solutions, Starburst. “By combining the power of Starburst Enterprise, based on open-source Trino, the industry-leading SQL-based query engine, with the unparalleled scalability and performance of the Vast Data Platform, we are empowering organizations to effortlessly navigate the vast oceans of data enterprises have amassed over the years. With the Vast DataBase and the new plug-in for Trino, we are crafting a revolutionary approach to improve analytics performance and redefine the boundaries of what’s possible in today’s AI era.”

“Dremio and Vast Data’s partnership embodies our unwavering commitment to revolutionizing the AI landscape and unlocking the full potential of data for organizations,” said Roger Frey, VP alliances, Dremio. “This collaboration brings together Dremio’s lightning-fast data processing capabilities and the scalability of the Vast Data Platform, empowering our joint customers to extract invaluable insights and make informed decisions at an unprecedented scale. Together, we look forward to shaping a future where AI transforms industries across the globe, driving innovation and pushing the boundaries of what’s possible.”

“We see firsthand how the AI and advanced analytics programs of enterprises are pushing the boundaries of what CPU processing was designed to deliver,” said Deborah Leff, CRO, SQream. “Our collaboration with Vast Data is a key milestone in the world of artificial intelligence and analytics, and our combined solutions enable customers to break through compute limits to analyze massive data sets at extremely fast speeds, both of which are critical to ushering in the next generation of AI-driven insights.”

“The partnership between Vast Data and Zetaris marks a pivotal moment in the world of Big Data Initiatives and Artificial Intelligence,” said Vinay Samuel, CEO, Zetaris. “In an era where data reigns as the most valuable asset, our joint mission is to empower enterprises of all sizes to embrace data-driven strategies confidently. The Vast Data Platform with Zetaris’ AI-driven analytics harnesses the formidable processing capabilities of GPUs to propel and empower AI models, accelerating their performance and enabling groundbreaking advancements in the field of artificial intelligence.”

Resources:

  • Keynote presentations from Vast executives and industry influencers, and video testimonials from customers and partners
  • The Vast Data Platform
  • Blog: The Grand Unification Theory of AI Infrastructure
  • Blog: Blurring the Lines Between Event-Driven and Data-Driven Architectures
  • Blog: The Quest to Build Thinking Machines
  • Blog: Democratizing AI for the Enterprise with NVIDIA DGX SuperPOD and Vast
