
Innovative Data Infrastructure Forum: Huawei Redefining AI Storage Systems and Solutions

Announcing OceanStor A800, which increases AI cluster utilization by 30% and delivers high bandwidth and IO/s

At the Huawei Innovative Data Infrastructure Forum 2024 in Berlin, Germany, themed ‘Data Awakening: Building Leading AI-Ready Data Infrastructure’, Dr. Peter Zhou, VP and president, data storage product line, Huawei Technologies Co., Ltd., delivered a speech titled ‘Redefining Data Storage in the Data Awakening Era’.

Dr. Peter Zhou, VP and president, data storage product line, Huawei

Huawei Redefining Data Storage In The Data Awakening Era

Dr. Zhou envisioned the future of storage being driven by multiple capabilities, including ultra-performance, data resilience, new data paradigm, scalability, sustainability, and data fabric.

Since the 1990s, enterprise applications have advanced from single hosts, databases, virtualization, and file sharing to encompass big data and high-performance data analytics (HPDA). This evolution has shifted data storage technologies from DAS, SAN, and NAS to unstructured storage. With the rise of generative AI, the demand for robust storage solutions has become even more critical.

The clusters used to train large AI models have grown to tens of thousands, and even hundreds of thousands, of GPUs, and this expansion has resulted in more frequent cluster faults and training interruptions. The lengthy process of repeatedly writing checkpoint data and resuming training extends the idle time of computing cards, causing cluster utilization to drop below 50%. What’s more, by 2026, the power consumption of global data centers is expected to reach 2.3x that of 2022, equivalent to the annual power consumption of Japan. More than half of the power in data centers will be consumed by AI.
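The link between checkpoint overhead and cluster utilization can be illustrated with a simple time-accounting model. This is a minimal sketch, not Huawei's methodology: the function name and all hour values are hypothetical, and utilization is assumed to be the share of wall-clock time spent on productive training.

```python
def cluster_utilization(train_hours, checkpoint_hours, recovery_hours):
    """Return the fraction of wall-clock time spent on productive training.

    Assumption (not from the source): total time is the sum of training,
    checkpoint-write stalls, and time lost resuming after faults.
    """
    total = train_hours + checkpoint_hours + recovery_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return train_hours / total


# Hypothetical figures chosen only to illustrate the effect.
slow_storage = cluster_utilization(train_hours=10, checkpoint_hours=6, recovery_hours=5)
fast_storage = cluster_utilization(train_hours=10, checkpoint_hours=0.5, recovery_hours=0.5)
print(f"slow storage: {slow_storage:.0%}, fast storage: {fast_storage:.0%}")
```

In this model, every hour that GPUs spend waiting on checkpoint writes or recovery reduces utilization directly, which is why faster checkpoint storage can raise it.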

AI looks set to disrupt traditional storage, which must focus not only on performance, reliability, and data paradigm, but also on scalability, sustainability, and data fabric. In the data awakening era, Huawei will redefine data storage through innovation in these 6 dimensions:

  1. Ultra-performance: The company enhances storage performance by a factor of 10 compared to traditional storage. The storage also supports PB/s-level bandwidth and 100 million IO/s, which improves the efficiency of the entire generative AI process.

  2. Data resilience: Innovative architecture and technologies deliver 99.9999% reliability. The built-in ransomware detection engine raises detection accuracy to 99.99%. The checkpoint recovery time during AI training is also shortened to less than a minute.

  3. New data paradigm: Multi-dimensional tensor data supports fast data retrieval via an intelligent search engine. Retrieval-augmented generation (RAG) technology works with the embedded knowledge base to eliminate hallucination in large AI models.

  4. Scalability: A single storage cluster can be scaled out for exabyte-level capacity and each engine can be scaled up with more GPUs, DPUs, or NPUs for near-storage computing.

  5. Sustainability: Innovations in storage media and devices have brought about outstanding storage energy efficiency (less than 1W/TB) and storage density (greater than 1PB/U).

  6. Data fabric: The capabilities of storage metadata management and search enable global data visibility and manageability, as well as data mobility that is 10 times more efficient.

These innovations have laid the groundwork for the release of the high-performance OceanStor A800, an addition to the firm’s OceanStor A series storage models. Tailored to AI, OceanStor A800 can increase AI cluster utilization by 30%. In terms of performance, it delivers bandwidth and IO/s that are 4x and 8x those of its peer vendor’s product, respectively. Regarding scalability, OceanStor A800 supports scaling out to exabyte-level capacity with up to 512 controllers, as well as scaling up to a maximum of 4,096 computing cards. To conserve space and energy, it achieves a storage density of 1PB/U and energy efficiency of 0.7W/TB. It also provides a new data paradigm with vector index, tensor data, and RAG. In terms of data resilience, the accuracy of ransomware detection is improved from 99.9% to 99.99%. In addition, the data fabric capability facilitates data asset management.
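To put the quoted density and efficiency figures in perspective, the arithmetic can be sketched as follows. This is an illustrative calculation only. It assumes decimal units (1PB = 1,000TB) and that the quoted 0.7W/TB and 1PB/U apply uniformly to the whole capacity. The function name is hypothetical.

```python
import math


def storage_footprint(capacity_pb, watts_per_tb=0.7, pb_per_rack_unit=1.0):
    """Estimate power draw (W) and rack units for a given capacity in PB.

    Assumptions (not from the source): 1 PB = 1000 TB, and the quoted
    efficiency and density figures scale linearly with capacity.
    """
    tb_per_pb = 1000
    power_watts = capacity_pb * tb_per_pb * watts_per_tb
    rack_units = math.ceil(capacity_pb / pb_per_rack_unit)
    return power_watts, rack_units


for capacity_pb in (1, 1000):  # 1 PB and 1 EB
    watts, units = storage_footprint(capacity_pb)
    print(f"{capacity_pb} PB: about {watts / 1000:.1f} kW in {units} U")
```

Under these assumptions, 1PB draws about 700W in a single rack unit, and an exabyte-scale cluster about 700kW for the storage itself, excluding cooling and networking.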

At the same time, storage media innovations are driving sustainable development. The firm’s newly released high-capacity SSDs provide 10x more capacity in the same disk size, which can further reduce the energy consumption of a data center. With 128TB capacity per disk, these SSDs take up 88% less storage space and consume 92% less energy than the peer vendor’s SSDs per PB of data stored.

To be AI-ready, enterprises must get data-ready. The Omni-Dataverse global file system built into the DME makes enterprise data assets visible, manageable, and mobile across regions, thereby building a solid AI data lake storage foundation for enterprises.

Dr. Zhou ended by emphasizing the company’s commitment to redefining data storage that focuses on customer challenges and demands in the data awakening era, and building leading AI-ready data infrastructure for greater customer value.
