Partnership between Los Alamos National Laboratory and AirMettle
For analyzing highly dimensional data sets from large-scale simulation campaigns while protecting stored data
This is a Press Release edited by StorageNewsletter.com on October 13, 2023 at 2:00 pm
A partnership between Los Alamos National Laboratory and AirMettle, Inc. offers a solution for efficiently analyzing highly dimensional data sets from large-scale simulation campaigns while protecting the stored data.
Performing some parts of the analytics near storage reduces the amount of data moved to perform the analysis – reducing both the cost of analytics and the time-to-scientific-insight.
“Our scientific large-scale simulations can generate hundreds of petabytes of highly dimensional floating-point data,” said Gary Grider, HPC division leader at Los Alamos. “But the data associated with a scientific feature of interest can be orders of magnitude smaller than the written data, so a key challenge is quickly and efficiently finding what’s relevant in this sea of data. To optimize this process, we’ve been drawn towards computational storage – processing data in-place and near storage – to eliminate unnecessary data movement while maintaining parallelism and adequate data protection.”
Building on AirMettle’s Real-Time Smart Data Lake (RT-SDL) architecture, Los Alamos and AirMettle have defined a common API to extend the NVMe standard for computational storage devices, empowering them to support in-place analytics. RT-SDL enables scalable analytics to be done near storage using standard interfaces like the S3 object storage interface and standard data formats like Apache Parquet while integrating rigorous data protection using erasure coding.
Scalable and cost-efficient data processing
In extending that technology, computational tasks will be delegated down to the device level, so data can be processed in a far more scalable and power-efficient manner. Because the data is reduced near storage, a smaller analytics processing tier can be used as well. These enhancements build on the benefits of AirMettle’s existing unique architecture.
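The effect of reducing data near storage can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the storage nodes are plain Python lists, the "analytics" is a simple threshold filter, and the function names (`make_shards`, `analyze_centrally`, `analyze_near_storage`) are hypothetical. It does not use or represent the AirMettle or NVMe computational storage APIs.

```python
import random

FLOAT_BYTES = 8  # assumed size of one stored float64 value


def make_shards(num_nodes, values_per_node, seed=0):
    """Simulate floating-point data striped across storage nodes."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(values_per_node)]
            for _ in range(num_nodes)]


def analyze_centrally(shards, threshold):
    """Move every value to the analytics tier, then filter there."""
    moved = [v for shard in shards for v in shard]
    bytes_moved = len(moved) * FLOAT_BYTES
    hits = [v for v in moved if v > threshold]
    return hits, bytes_moved


def analyze_near_storage(shards, threshold):
    """Filter on each simulated storage node; move only the matches."""
    hits = []
    bytes_moved = 0
    for shard in shards:
        local_hits = [v for v in shard if v > threshold]
        bytes_moved += len(local_hits) * FLOAT_BYTES
        hits.extend(local_hits)
    return hits, bytes_moved


shards = make_shards(num_nodes=8, values_per_node=10_000)
central_hits, central_bytes = analyze_centrally(shards, threshold=0.999)
near_hits, near_bytes = analyze_near_storage(shards, threshold=0.999)

# Both strategies find the same values; only the data moved differs.
print("matching values:", len(near_hits))
print("bytes moved (central filter):", central_bytes)
print("bytes moved (near-storage filter):", near_bytes)
```

In this toy setup the feature of interest is a tiny fraction of the stored values, so filtering at the nodes moves far fewer bytes while returning the same result, which is the pattern the release describes.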
“Accelerating analytics of vast volumes of experiment and simulation data is a key requirement and challenge for the scientific community,” said Donpaul Stephens, founder and CEO of AirMettle. “AirMettle’s RT-SDL is the first computational storage service with highly scalable in-place processing to accelerate analytics by 100x or more and reduce network traffic. Users can easily store and retrieve their data in our object store via standard APIs. AirMettle stripes this data across hundreds of storage nodes, eliminating hot spots for both traditional storage access and high-speed parallel analytics.”
Working with Los Alamos, AirMettle recently published an open-source reference design, with APIs, for running analytics in computational storage devices, enabling further scalability and efficiency. AirMettle will present the design at the 2023 Open Compute Project Global Summit on October 17-19 in San Jose, CA.