Mellanox HDR 200G IB With Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Technology

Mellanox Technologies, Ltd. announced that its HDR 200G IB with Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology has set performance records, doubling the performance of deep learning operations.

QM8700 Series – Quantum HDR 200Gb/s IB switch

The combination of the company’s In-Network Computing SHARP technology with NVIDIA Corp.’s V100 Tensor Core GPU technology and Collective Communications Library (NCCL) delivers efficiency and scalability to deep learning and AI applications.

The combination of NVIDIA GPUs, Mellanox’s IB, GPUDirect RDMA and NCCL for training neural networks has already become a standard for scaling out deep learning frameworks such as Caffe, Caffe2, Chainer, MXNet, TensorFlow, and PyTorch. With SHARP technology and HDR IB, the data aggregation operations in deep learning training can be offloaded to and accelerated by the IB network, doubling their performance.
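The offload described above can be pictured as a reduction tree built from switches rather than hosts. The Python sketch below is a toy model only, not Mellanox's SHARP API or wire protocol: `switch_aggregate` and `in_network_allreduce` are hypothetical names, and it assumes a simple two-level tree in which leaf switches sum the gradients of their attached hosts, a root switch sums the leaf results, and the total is sent back down to every host.

```python
from typing import List

Vector = List[float]


def switch_aggregate(contributions: List[Vector]) -> Vector:
    """Element-wise sum performed by a hypothetical switch (toy model)."""
    if not contributions:
        raise ValueError("need at least one contribution")
    length = len(contributions[0])
    if any(len(v) != length for v in contributions):
        raise ValueError("all vectors must have the same length")
    return [sum(values) for values in zip(*contributions)]


def in_network_allreduce(host_grads: List[Vector], hosts_per_leaf: int) -> List[Vector]:
    """Allreduce where the reduction happens in an assumed two-level switch tree."""
    # Level 1: each leaf switch reduces the hosts attached to it.
    leaf_partials = [
        switch_aggregate(host_grads[i:i + hosts_per_leaf])
        for i in range(0, len(host_grads), hosts_per_leaf)
    ]
    # Level 2: a root switch reduces the leaf partial sums.
    total = switch_aggregate(leaf_partials)
    # The reduced result is returned down the tree so every host gets a copy.
    return [list(total) for _ in host_grads]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(in_network_allreduce(grads, hosts_per_leaf=2))
```

With four hosts and two hosts per leaf switch, every host ends up with the same summed vector, and no host ever had to receive or add another host's gradients itself; that arithmetic now sits in the switches.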

In joint testing with NVIDIA in the firm’s performance labs, a Quantum HDR IB switch connecting four system hosts, each with eight V100 Tensor Core GPUs linked by NVLink and a single ConnectX-6 HDR adapter per host, achieved an effective reduction bandwidth of 19.6GB/s. This was done by integrating SHARP’s native streaming aggregation capability with NVIDIA’s latest NCCL 2.4 library, which now takes advantage of the bidirectional bandwidth available from the Mellanox interconnect. The result is effectively twice the bandwidth of NVIDIA’s current tree-based implementation on the same hardware configuration.
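The release does not spell out where the factor of two comes from. One commonly cited idealized explanation, assuming a bandwidth-bound allreduce over full-duplex links, compares per-host traffic: in a host-based ring allreduce (reduce-scatter followed by all-gather), each of n ranks sends 2(n-1)/n times the message size, whereas with in-network aggregation each host sends its buffer once and receives the reduced result on the opposite direction of the same link. The sketch below is only that back-of-the-envelope model. The function names are hypothetical, it does not model NCCL's tree algorithm, and it is not meant to reproduce the measured 19.6GB/s.

```python
def ring_allreduce_bytes_sent(message_bytes: float, ranks: int) -> float:
    """Bytes each rank sends in a ring allreduce (idealized model)."""
    if ranks < 1:
        raise ValueError("ranks must be >= 1")
    # Reduce-scatter and all-gather each move (n - 1) / n of the message.
    return 2 * (ranks - 1) / ranks * message_bytes


def in_network_bytes_sent(message_bytes: float) -> float:
    """Bytes each host sends when switches perform the reduction (idealized)."""
    # The host transmits its buffer once; the reduced result arrives on the
    # receive direction of the link, so it does not add to transmit traffic.
    return message_bytes


def effective_speedup(ranks: int) -> float:
    """Ratio of per-host transmit volume, ring versus in-network aggregation."""
    return ring_allreduce_bytes_sent(1.0, ranks) / in_network_bytes_sent(1.0)


if __name__ == "__main__":
    for n in (4, 32, 1000):
        print(n, effective_speedup(n))
```

Under this model the gain is 1.5x for four ranks and approaches 2x as the rank count grows. That matches the "two times" framing in spirit, but it ignores latency, NVLink-local reduction inside each host, and protocol overhead, all of which affect real measurements.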

In the more common setup for this configuration, each system host uses four HCAs for balanced performance across a variety of workloads; there, the initial SHARP and NCCL results yielded an expected 70.3GB/s. More densely populated GPU-based systems, such as the NVIDIA DGX-2, which houses 16 NVIDIA V100 Tensor Core GPUs with NVLink in each system node, can also leverage the in-network capabilities and available bidirectional bandwidth of the Mellanox fabric.

“Our long-standing collaboration with NVIDIA has again delivered a robust solution that takes full advantage of the best-of-breed capabilities from Mellanox IB, including GPUDirect RDMA and now extending in-network computing to NCCL, which delivers two times better performance for AI,” said Gilad Shainer, VP, marketing, Mellanox Technologies. “HDR IB in-network computing acceleration engines, including the SHARP technology, provide the highest performance and scalability for HPC and AI workloads.”

“Mellanox solutions amplify NVIDIA’s unmatched CUDA-X acceleration libraries using NCCL, our open source collective communication library,” said Ian Buck, VP and GM, accelerated computing, NVIDIA. “Together, we offer solutions that ensure the most demanding AI applications in the data center benefit from cutting-edge performance and scaling efficiency.”

Resources:
Mellanox SHARP
Mellanox Quantum™ HDR 200Gb/s InfiniBand Smart Switches
