
DapuStor SSDs Accelerate AI Model Training with NVIDIA’s BaM Framework

Test system processed 1,100 iterations, with R5101 and H5100 SSDs showing a 25x performance increase over traditional methods.

DapuStor Corp. is accelerating AI model training with the Big Accelerator Memory (BaM) framework proposed by Nvidia Corp., using its R5101 and H5100 NVMe SSDs, with tests showing training to be 10x faster and more cost-effective.

H5100 NVMe PCIe 5.0 SSDs

DapuStor Haishen5 U.2 scheme

The company showcases benchmark results for AI model training achieved by leveraging the Big Accelerator Memory (BaM) framework. Recent tests conducted with its R5101 (PCIe Gen 4) and H5100 (PCIe Gen 5) NVMe SSDs demonstrate improvements in AI training time, positioning the firm as a player in advancing AI infrastructure efficiency.

AI model training with BaM
As AI models grow larger and more complex, training datasets can reach 10TB, far exceeding the memory capacity of GPUs. Traditional methods of managing data for AI model training – either CPU orchestration or expanding host/GPU memory – are inefficient and costly. BaM offers a novel solution by enabling direct data transfer from NVMe SSDs to GPUs, bypassing CPU intervention entirely.
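
To make the contrast concrete, here is a minimal CUDA sketch of the traditional CPU-orchestrated path described above; the file path, batch size, and kernel are illustrative assumptions, not code from DapuStor's or Nvidia's tests.

```cuda
// Traditional CPU-orchestrated pipeline (the path BaM removes): the CPU
// reads each batch from the NVMe SSD into a pinned host buffer, then
// copies it to the GPU before any computation can start.
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

__global__ void consume(const float* batch, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float x = batch[i];   // placeholder for real training work
        (void)x;
    }
}

int main() {
    const size_t kBatchBytes = 64ull << 20;            // 64 MiB per batch (assumption)
    const size_t kElems = kBatchBytes / sizeof(float);

    int fd = open("/mnt/nvme/features.bin", O_RDONLY); // hypothetical dataset file
    if (fd < 0) { perror("open"); return 1; }

    float *host = nullptr, *dev = nullptr;
    cudaHostAlloc((void**)&host, kBatchBytes, cudaHostAllocDefault); // pinned staging buffer
    cudaMalloc((void**)&dev, kBatchBytes);

    for (int b = 0; b < 8; ++b) {                      // 8 batches (assumption)
        // 1) CPU pulls the batch off the SSD -- CPU cycles burned on I/O.
        if (pread(fd, host, kBatchBytes, (off_t)b * kBatchBytes) <= 0) break;
        // 2) Extra hop through host memory into GPU memory.
        cudaMemcpy(dev, host, kBatchBytes, cudaMemcpyHostToDevice);
        // 3) Only now can the GPU actually compute on the batch.
        consume<<<(unsigned)((kElems + 255) / 256), 256>>>(dev, kElems);
        cudaDeviceSynchronize();
    }
    // BaM collapses steps 1-2: GPU threads fetch exactly the bytes they
    // need directly from the NVMe SSD through a user-space driver,
    // leaving the CPU out of the data path.

    cudaFree(dev); cudaFreeHost(host); close(fd);
    return 0;
}
```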

BaM was developed by a collaboration of industry and academic leaders, including Nvidia, IBM, the University of Illinois Urbana-Champaign, and the University at Buffalo. This approach maximises the parallelism of GPU threads and leverages a user-space NVMe driver, ensuring that data is delivered to GPUs on demand with minimal latency. The result is reduced overhead in data synchronisation between the CPU and GPU, optimising both cost and performance.
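
The access pattern this enables can be sketched as follows, with one loud caveat: the OnDemandArray type below is a hypothetical stand-in backed by plain device memory so the sketch compiles and runs. In BaM itself, a read miss triggers an NVMe request submitted from GPU threads via the user-space driver, with a software-managed cache in GPU memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for an SSD-backed array abstraction (assumption,
// not BaM's actual API). Real path: probe a GPU-resident cache; on miss,
// enqueue a read on a user-space NVMe submission queue and poll completion.
struct OnDemandArray {
    const float* backing;
    __device__ float read(size_t i) const { return backing[i]; }
};

// GNN-style feature aggregation: many threads gather scattered feature
// vectors in parallel -- the small-random-read, high-IO/s pattern that
// dominates the tests described below.
__global__ void aggregate(OnDemandArray feats, const int* neighbors,
                          int num_neighbors, int feat_dim, float* out) {
    int d = blockIdx.x * blockDim.x + threadIdx.x;
    if (d >= feat_dim) return;
    float acc = 0.f;
    for (int k = 0; k < num_neighbors; ++k)
        acc += feats.read((size_t)neighbors[k] * feat_dim + d);
    out[d] = acc / num_neighbors;   // mean aggregation (one common choice)
}

int main() {
    const int kNodes = 1024, kDim = 128, kNbrs = 8;  // toy sizes (assumptions)
    float *feats, *out; int *nbrs;
    cudaMalloc((void**)&feats, (size_t)kNodes * kDim * sizeof(float));
    cudaMalloc((void**)&out, kDim * sizeof(float));
    cudaMalloc((void**)&nbrs, kNbrs * sizeof(int));
    cudaMemset(feats, 0, (size_t)kNodes * kDim * sizeof(float));
    cudaMemset(nbrs, 0, kNbrs * sizeof(int));        // every neighbor = node 0

    aggregate<<<(kDim + 127) / 128, 128>>>(OnDemandArray{feats}, nbrs,
                                           kNbrs, kDim, out);
    cudaDeviceSynchronize();
    cudaFree(feats); cudaFree(out); cudaFree(nbrs);
    return 0;
}
```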

Test results
In a rigorous test of the BaM framework, Graph Neural Network (GNN) training was evaluated on large heterogeneous datasets. The test system processed 1,100 iterations, with the company’s R5101 and H5100 SSDs showing a 25x performance increase over traditional methods and significantly reducing end-to-end execution time.

The feature aggregation phase, which is most impacted by I/O performance, saw improvements thanks to the high throughput and low latency of the NVMe SSDs. With BaM, end-to-end execution time dropped from 250s to less than 10s, and feature aggregation’s share of that time fell from 99% in baseline tests to 77%. Adding SSDs further enhanced the system’s ability to handle data in parallel, reducing feature aggregation times by an additional 40%.
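
Taking the reported figures at face value, a quick check shows where the gain landed:

$$0.99 \times 250\,\mathrm{s} \approx 247.5\,\mathrm{s} \;\longrightarrow\; 0.77 \times 10\,\mathrm{s} \approx 7.7\,\mathrm{s}, \qquad \frac{247.5}{7.7} \approx 32\times$$

In other words, the I/O-bound aggregation phase sped up by roughly 32x, while the remaining compute (about 2.5s baseline vs. roughly 2.3s with BaM) was essentially unchanged, consistent with the bottleneck being storage I/O rather than GPU compute.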

Power of PCIe Gen 5
The company’s H5100, with PCIe Gen 5, proved effective in handling demanding workloads. When the batch size was increased to 4,096, it achieved 18% faster feature aggregation than the R5101 (PCIe Gen 4), highlighting the benefits of the Gen 5 interface in high-IO/s scenarios. It reached an estimated 2 million IO/s in the test, exceeding the maximum of PCIe 4.0 SSD products on the market while remaining below the H5100’s own rated specs, demonstrating its ability to capitalise on PCIe Gen 5 with headroom to spare.
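
As a rough sanity check on that 2 million IO/s figure (assuming 4 KiB reads, a common block size for this kind of workload but not stated in the test):

$$2{,}000{,}000\ \mathrm{IO/s} \times 4\,\mathrm{KiB} \approx 8.2\,\mathrm{GB/s} \;>\; {\sim}7.9\,\mathrm{GB/s}\ \text{(PCIe 4.0 x4 payload ceiling)}$$

At that rate a Gen 4 x4 drive is bandwidth-bound by the link itself, while a Gen 5 x4 link (~15.8 GB/s) still has ample headroom.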

“As AI models scale rapidly, ensuring efficient utilisation of GPU resources has become a critical challenge,” says Grant Li, solutions architect, DapuStor. “With the BaM framework and DapuStor’s high-performance NVMe SSDs, AI model training times can be drastically reduced, leading to cost savings and more efficient use of infrastructure. DapuStor’s R5101 and H5100 SSDs demonstrate the potential for 10x faster model training, and even greater performance is achievable through the use of PCIe Gen 5 technology.”
