GWDG Achieves 4x Storage Performance Leap with Megware’s All-NVMe Lustre Solution Powered by Xinnor xiRAID
Illustrating the performance and resiliency provided by Xinnor
This is a Press Release edited by StorageNewsletter.com on April 7, 2026, at 2:01 pm.

The Gesellschaft für wissenschaftliche Datenverarbeitung (GWDG), operator of the Emmy HPC cluster comprising over 1,500 compute nodes and 150,000 CPU cores, has deployed a new multi-petabyte all-NVMe storage system in partnership with Megware.
Built on Xinnor's xiRAID and a Lustre architecture, the solution outperforms GWDG's previous storage system by a factor of more than 4 in every performance category, fully exploiting the capabilities of its 2×100Gb/s Omni-Path (OPA) network connections.
The deployment eliminates I/O bottlenecks for mixed HPC and AI workloads while laying the groundwork for future upgrades to 400G networking and next-gen AMD EPYC-based compute.
GWDG required a storage solution capable of supporting diverse, data-intensive research workloads ranging from traditional HPC simulations to modern AI model training. The goals were clear: eliminate I/O bottlenecks, provide true high availability with no single points of failure, support predictable scaling, and enable a smooth transition to next-generation network speeds without a disruptive forklift upgrade.
Megware selected the Celestica SC6100 Storage Bridge Bay platform – a dual-node chassis sharing 24 NVMe drives – as the foundation for the solution. Each drive is split into two namespaces and connected via PCIe lanes to both server nodes simultaneously, enabling concurrent utilization and exposing 48 virtual drives per system. Xinnor’s xiRAID manages a topology of RAID10 groups for metadata and RAID6 groups for object storage, delivering 245TB of usable capacity per system.
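For a rough sense of how such a layout could arrive at the quoted figure, consider the back-of-the-envelope model below. The press release does not disclose drive capacity or RAID group geometry; the 15.36TB drive size and the 8+2 RAID6 / RAID10 split in this sketch are assumptions chosen only to illustrate the arithmetic, not GWDG's actual configuration.

```python
# Back-of-the-envelope capacity model for one SC6100 system as described
# above. Drive size and RAID group geometry are NOT stated in the press
# release; 15.36 TB drives and 8+2 RAID6 groups are illustrative assumptions.

PHYSICAL_DRIVES = 24          # NVMe drives shared by both SBB nodes
NAMESPACES_PER_DRIVE = 2      # each drive split into two namespaces
DRIVE_TB = 15.36              # assumed physical drive capacity (TB)

virtual_drives = PHYSICAL_DRIVES * NAMESPACES_PER_DRIVE   # 48 per system
ns_tb = DRIVE_TB / NAMESPACES_PER_DRIVE                   # 7.68 TB each

# Assumed split: 8 virtual drives in RAID10 for metadata,
# 40 virtual drives in four 8+2 RAID6 groups for object storage.
md_drives = 8
ost_drives = virtual_drives - md_drives
raid10_usable = md_drives // 2 * ns_tb     # mirroring halves capacity
raid6_groups = ost_drives // 10            # four 8+2 groups
raid6_usable = raid6_groups * 8 * ns_tb    # 2 parity drives per group

print(f"virtual drives per system: {virtual_drives}")
print(f"metadata (RAID10) usable:  {raid10_usable:.1f} TB")
print(f"object store (RAID6) usable: {raid6_usable:.1f} TB")
# -> ~245.8 TB of RAID6 object storage under these assumptions,
#    consistent with the ~245 TB usable capacity quoted above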
Integrated with Pacemaker and Corosync for high-availability orchestration, the cluster is protected against both multiple drive failures and complete node failures. If a node goes offline, its RAID groups automatically fail over to the surviving node and fail back seamlessly upon recovery.
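The fail-over and fail-back behavior can be pictured with a toy model. The sketch below is conceptual Python only, not Pacemaker or Corosync configuration; the node and RAID group names are hypothetical, and it merely illustrates how ownership of xiRAID groups moves between the two SBB nodes.

```python
# Conceptual model of the fail-over/fail-back behavior described above.
# This is NOT Pacemaker/Corosync code; it only illustrates how RAID group
# ownership migrates between the two SBB nodes. All names are illustrative.

class HACluster:
    def __init__(self):
        # preferred owner for each xiRAID group (hypothetical names)
        self.preferred = {"md_raid10":  "node-a",
                          "ost_raid6_1": "node-a",
                          "ost_raid6_2": "node-b",
                          "ost_raid6_3": "node-b"}
        self.owner = dict(self.preferred)   # current owner
        self.online = {"node-a": True, "node-b": True}

    def node_failed(self, node):
        """Fail over: the surviving node imports the failed node's groups."""
        self.online[node] = False
        survivor = next(n for n, up in self.online.items() if up)
        for group, owner in self.owner.items():
            if owner == node:
                self.owner[group] = survivor

    def node_recovered(self, node):
        """Fail back: groups return to their preferred owner."""
        self.online[node] = True
        for group, pref in self.preferred.items():
            if pref == node:
                self.owner[group] = pref

cluster = HACluster()
cluster.node_failed("node-a")
print(cluster.owner)      # all groups now owned by node-b
cluster.node_recovered("node-a")
print(cluster.owner)      # ownership restored to the preferred layout
```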
Each system connects via 4×100Gb/s Omni-Path links (two per server node), fully exploiting the available network bandwidth.
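For scale, the stated link configuration works out as follows; this is a simple unit conversion at line rate, before protocol overhead.

```python
# Unit conversion for the stated configuration: 4 x 100 Gb/s Omni-Path
# links per storage system (two per server node).
links_per_system = 4
link_gbps = 100                      # gigabits per second per link

aggregate_gbps = links_per_system * link_gbps
aggregate_gbs = aggregate_gbps / 8   # gigabits -> gigabytes per second

print(f"aggregate line rate per system: {aggregate_gbps} Gb/s "
      f"(~{aggregate_gbs:.0f} GB/s before protocol overhead)")
# -> 400 Gb/s, i.e. roughly 50 GB/s per system at line rate
```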
“The performance improvement we’ve seen since deploying the new storage system has been remarkable. Our researchers no longer face I/O bottlenecks when running demanding HPC or AI workloads, and the high-availability design gives us the operational confidence we need in a production environment. Megware and Xinnor delivered exactly what they promised,” said Sebastian Krey, HPC system architect and storage expert, GWDG.
“Partnering with GWDG on this project was a great opportunity to show what a well-architected all-NVMe Lustre solution can achieve in a real production research environment. By combining Xinnor’s xiRAID with the Celestica SC6100 platform and Lustre, we delivered a system that not only meets today’s demands but is built to grow with GWDG’s needs – from 100G to 400G networking and beyond,” said Markus Hilger, HPC engineer, Megware.
“GWDG’s deployment demonstrates exactly what xiRAID was built for – delivering NVMe performance at scale without sacrificing resilience or operational simplicity. By fully leveraging the available network bandwidth, the new system achieves more than four times the throughput of its predecessor, and the architecture is designed to step up to 400G connectivity when GWDG is ready. We are proud that xiRAID is at the heart of one of Europe’s most demanding research storage environments,” said Dmitry Livshits, CEO, Xinnor.
The architecture was designed from the outset for forward compatibility. With the upcoming switch to the CN5000 400G interconnect generation, GWDG will unlock a further step-change in storage performance – with no need to replace the underlying storage hardware.
In parallel, GWDG is preparing for the next generation of HPC compute based on Megware's Eureka platform, a warm-water direct liquid cooling (DLC) node design. The upcoming system will comprise more than 300 nodes featuring 128-core AMD EPYC Turin 9745 and 128-core Venice CPUs, delivering a major leap in compute density and energy efficiency for the Emmy cluster ecosystem.