OCP Global Summit 2025: d-Matrix Announces SquadRack, Industry’s First Rack-Scale Solution Purpose-Built for AI Inference at Datacenter Scale
Arista, Broadcom and Supermicro team with d-Matrix to offer a disaggregated, standards-based approach for ultra-low-latency batched inference
Showcased at the Open Compute Project Global Summit this week, SquadRack comes at a time when cloud providers, sovereign clouds and enterprises are struggling to keep up with generative AI inference demand. SquadRack provides a reference architecture for building turnkey solutions that enable blazing-fast agentic AI, reasoning and video generation. It delivers up to 3x better cost-performance, 3x higher energy efficiency, and up to 10x faster token generation compared to traditional accelerators.
Configured with eight nodes in a single rack, SquadRack enables customers to run Gen AI models of up to 100 billion parameters at blazing speed. For larger models or large-scale deployments, it scales out to hundreds of nodes across multiple racks over industry-standard Ethernet.
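As a rough sense check on that claim, the sketch below estimates the weight footprint of a 100-billion-parameter model at a few common inference precisions and divides it evenly across eight nodes. The bytes-per-parameter values and the even sharding are generic assumptions for illustration, not published Corsair specifications, and KV cache and activation memory would come on top.

```python
# Back-of-envelope estimate: weight memory for a 100B-parameter model
# sharded evenly across an eight-node rack. Bytes-per-parameter values
# are generic inference assumptions, not published Corsair figures.

PARAMS = 100e9  # 100 billion parameters
NODES = 8       # SquadRack nodes in a single rack

# bytes per parameter for common inference number formats
precisions = {"FP16/BF16": 2.0, "FP8/INT8": 1.0, "INT4": 0.5}

for name, bytes_per_param in precisions.items():
    total_gb = PARAMS * bytes_per_param / 1e9  # whole-model weights, GB
    per_node_gb = total_gb / NODES             # assuming an even shard per node
    print(f"{name:>9}: {total_gb:6.0f} GB total weights, {per_node_gb:5.1f} GB per node")
# KV cache and activations are not counted; they grow with batch size
# and context length on top of these weight figures.
```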
“With the launch of SquadRack, we’re enabling customers to scale inference the right way – with high efficiency, low latency, and standards-based deployment. Corsair delivers the compute-memory acceleration, while JetStream delivers I/O acceleration. Combined with Supermicro’s AI servers, Arista’s Ethernet switches, and Broadcom’s PCIe and Ethernet switch chips, we’re delivering an AI inference rack that speeds up time to deployment. It’s a big step forward in making AI infrastructure commercially viable at scale,” said Sid Sheth, CEO and co-founder, d-Matrix.
“Supermicro is proud to collaborate with d-Matrix in delivering an efficient AI inference rack solution that combines compute acceleration, efficient networking, and server density in one integrated platform. Our proven track record in rack-level integration, along with d-Matrix’s inference acceleration products, offers customers a practical path to scaling AI inference across the enterprise and cloud,” continued Vik Malyala, president & managing director, EMEA and SVP technology & AI, Supermicro.
“As a leader in high-performance PCIe and Ethernet connectivity, Broadcom is excited to see d-Matrix advancing AI infrastructure solutions. d-Matrix is unlocking a new level of performance and efficiency in AI inference while leveraging the standards-based networking ecosystem that Broadcom has long supported,” confirmed Jas Tremblay, VP and GM, data center solutions group, Broadcom.
“Arista’s cloud networking fabric is designed to meet the rigorous demands of AI infrastructure. JetStream’s ability to enable accelerator-to-accelerator communication over standard Ethernet pairs perfectly with Arista’s high-performance switches. Together, we’re demonstrating how AI inference can scale efficiently without requiring proprietary networking fabrics,” concluded Vijay Vusirikala, distinguished lead, AI systems and networks, Arista Networks.
SquadRack’s key components include:
- d-Matrix Corsair Inference Accelerators with innovative compute-memory integration delivering ultra-low-latency, high-throughput inference
- d-Matrix JetStream IO Accelerators enabling ultra-low-latency, device-initiated accelerator-to-accelerator communication using standard Ethernet
- Supermicro X14 AI Server Platform integrated with Corsair accelerators and JetStream NICs
- Broadcom PCIe switches for scaling up within a single node
- Arista Leaf Ethernet Switches connected to JetStream NICs enabling high-performance, scalable, standards-based multi-node communication
- d-Matrix Aviator™ software stack that makes it easy for customers to deploy Corsair and JetStream at scale and speed up time to inference
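To make the scale-up/scale-out split concrete, here is a minimal, purely illustrative sketch of the two tiers the list above describes: Broadcom PCIe switching inside each node, and JetStream-to-Arista Ethernet between nodes. Every class, field and count in it is a hypothetical stand-in; none of it reflects the Aviator software stack or any actual d-Matrix API.

```python
from dataclasses import dataclass, field

# Purely illustrative model of SquadRack's two communication tiers.
# All names and counts are hypothetical, not a d-Matrix/Aviator API.

@dataclass
class Node:
    """Scale-up tier: a Supermicro X14 server in which a Broadcom PCIe
    switch links Corsair accelerators to each other and to a JetStream NIC."""
    corsair_cards: int = 8  # assumed per-server card count, for illustration
    pcie_switch: str = "Broadcom PCIe"
    nic: str = "JetStream"

@dataclass
class Rack:
    """Scale-out tier: each node's JetStream NIC uplinks to an Arista leaf
    switch, so accelerator-to-accelerator traffic rides standard Ethernet."""
    leaf_switch: str = "Arista leaf"
    nodes: list[Node] = field(default_factory=lambda: [Node() for _ in range(8)])

    def total_accelerators(self) -> int:
        return sum(node.corsair_cards for node in self.nodes)

rack = Rack()
print(f"{len(rack.nodes)} nodes, {rack.total_accelerators()} Corsair cards per rack")
# Multi-rack deployments repeat the pattern: more leaf switches on the
# same standards-based Ethernet fabric, with no proprietary interconnect.
```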