Huawei OceanStor A800 Ranked #1 in 2024 MLPerf AI Benchmarks
Testing platform to measure performance of AI hardware, software, and services
This is a Press Release edited by StorageNewsletter.com on October 18, 2024 at 2:00 pm

MLPerf benchmark suites provide a standardized testing platform to measure the performance of AI hardware, software, and services. The benchmark suites were jointly developed by Turing Award winner David Patterson, Google, Stanford University, Harvard University, and other top enterprises and academic institutions. MLPerf benchmarks are viewed as the world’s most authoritative and influential AI performance benchmarks.
This year’s MLPerf Storage performance tests evaluated 13 mainstream vendors. A distributed training test program simulated GPU compute processes, reproducing a model in which AI servers drive maximum access to the storage system. These simulations measure the maximum number of GPUs an AI storage system can support, which represents its overall storage performance.
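As a conceptual illustration of this measurement model (not the actual MLPerf Storage harness, which is built on the DLIO benchmark tool), the sketch below emulates accelerators that alternate between a fixed "compute" interval and a blocking sample read from storage, then raises the accelerator count until average utilization drops below a threshold. All sizes, timings, and the file path are hypothetical.

```python
"""Conceptual sketch of the MLPerf Storage measurement model: emulated
accelerators alternate between a blocking sample read and a fixed compute
interval; concurrency is raised until utilization falls below the bar.
All sizes, timings, and the file path below are hypothetical."""
import os
import time
from concurrent.futures import ThreadPoolExecutor

DATA_PATH = "/tmp/mlperf_storage_sketch.bin"   # hypothetical local dataset file
SAMPLE_BYTES = 1 * 1024 * 1024                 # bytes read per training step
COMPUTE_TIME_S = 0.020                         # emulated per-step GPU compute time
STEPS = 20                                     # steps per emulated accelerator
UTIL_THRESHOLD = 0.90                          # 3D U-Net requires >=90% utilization


def ensure_dataset() -> None:
    """Create a throwaway data file so the sketch is self-contained."""
    size = STEPS * SAMPLE_BYTES
    if not os.path.exists(DATA_PATH) or os.path.getsize(DATA_PATH) < size:
        with open(DATA_PATH, "wb") as f:
            f.write(os.urandom(size))


def emulated_accelerator(_: int) -> float:
    """Run STEPS of (read sample, then emulated compute); return utilization."""
    compute_time = 0.0
    start = time.perf_counter()
    with open(DATA_PATH, "rb") as f:
        for step in range(STEPS):
            f.seek(step * SAMPLE_BYTES)
            f.read(SAMPLE_BYTES)               # sample fetch: the storage side
            time.sleep(COMPUTE_TIME_S)         # stands in for GPU compute
            compute_time += COMPUTE_TIME_S
    elapsed = time.perf_counter() - start
    return compute_time / elapsed              # fraction of time spent "computing"


def max_supported_accelerators(max_count: int = 8) -> int:
    """Raise concurrency until mean utilization falls below the threshold."""
    best = 0
    for count in range(1, max_count + 1):
        with ThreadPoolExecutor(max_workers=count) as pool:
            utils = list(pool.map(emulated_accelerator, range(count)))
        if sum(utils) / len(utils) >= UTIL_THRESHOLD:
            best = count
        else:
            break
    return best


if __name__ == "__main__":
    ensure_dataset()
    print("max emulated accelerators sustained:", max_supported_accelerators())
```

In the real benchmark, the reported result is the accelerator count sustained at the required utilization, and the compute interval comes from workload- and accelerator-specific emulation rather than a fixed sleep.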
The MLPerf Storage benchmark’s 3D U-Net workload aligns with the industry trend toward multi-modal models and places the heaviest demand on storage bandwidth, providing a more comprehensive and accurate assessment of storage performance in large-scale AI clusters. The workload requires the highest storage bandwidth per FLOPS and mandates that data be read directly from the storage nodes rather than cached on hosts in advance, so the results reflect actual storage performance and real-world large AI model experience.
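A minimal Linux-only sketch of the "read from storage, not from the host cache" rule is below. It is an illustration of the requirement, not code from the benchmark; the file path and chunk size are assumptions. It asks the kernel to evict any cached pages for the file before streaming it, so the reads are served by the storage backend.

```python
"""Minimal sketch (Linux/Unix, CPython 3.3+) of forcing reads to be served by
storage rather than the host page cache, mirroring the benchmark rule that
3D U-Net samples are read from storage nodes, not pre-cached on hosts.
The file path and chunk size are illustrative assumptions."""
import os

CHUNK = 8 * 1024 * 1024  # 8 MiB per read


def read_uncached(path: str) -> int:
    """Evict cached pages for `path`, then stream the whole file from storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Ask the kernel to drop this file's clean pages from the page cache
        # so the reads below actually hit the storage system.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        total = 0
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            total += len(buf)
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    nbytes = read_uncached("/mnt/dataset/case_000.npz")  # hypothetical sample file
    print(f"read {nbytes} bytes from storage")
```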
Huawei OceanStor A800 ranked #1 in this AI storage performance test, meeting the data throughput requirements of 255 GPUs with a single storage system. The solution kept GPU utilization above 90%, and a single controller enclosure delivered 679 GB/s of bandwidth, 10x that of conventional storage systems.
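A back-of-envelope check ties the quoted figures together, under the assumption (not stated in the release) that all 255 emulated accelerators were fed by the single controller enclosure at 679 GB/s:

```python
# Assumption: all 255 accelerators were served by one enclosure at 679 GB/s.
accelerators = 255
enclosure_bw_gb_s = 679
print(f"~{enclosure_bw_gb_s / accelerators:.2f} GB/s sustained per accelerator")
# ~2.66 GB/s sustained per accelerator
```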
In addition, OceanStor A800 provides 100 TB/s-level bandwidth through scale-out expansion, reducing checkpoint read/write time from 10 minutes to seconds and keeping the time required to resume training under 15 minutes. This minimizes GPU wait time, improves end-to-end compute utilization by over 30%, and comprehensively enhances the training efficiency of large AI models.
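For a sense of scale only: the checkpoint size and baseline bandwidth in the sketch below are hypothetical assumptions, not figures from the announcement; the 100 TB/s value is the quoted scale-out bandwidth.

```python
# Hypothetical checkpoint size and baseline bandwidth; 100 TB/s is the quoted figure.
checkpoint_tb = 10        # assumed checkpoint size for a large model, in TB
baseline_tb_s = 0.017     # assumed conventional bandwidth (~17 GB/s)
scaleout_tb_s = 100       # quoted 100 TB/s-level scale-out bandwidth
print(f"baseline:  {checkpoint_tb / baseline_tb_s:.0f} s")   # ~588 s, about 10 minutes
print(f"scale-out: {checkpoint_tb / scaleout_tb_s:.1f} s")   # 0.1 s, i.e. seconds at most
```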
This was Huawei data storage’s first-ever participation in MLPerf Storage benchmark testing (v1.0).
Huawei’s data storage team has said it is committed to innovation and that the new OceanStor A series AI storage has been specifically designed for the hybrid workloads of AI scenarios. Its architecture combines new hardware, high performance, exabyte-level scalability, and long-term memory capabilities for inference, with the aim of comprehensively accelerating the training and inference of large AI models.
Looking ahead, Huawei’s storage team plans to further advance in the realm of large AI models, continually pushing the boundaries of performance and keeping pace with the evolving data landscape to shape the future of data.