
Enfabrica Unveils Ethernet-Based AI Memory Fabric System for Efficient Superscaling of LLM Inference

Elastic networked-memory solution delivers multi-800GB/s read/write throughput over Ethernet and up to 50% lower cost per token per user in AI inference workloads

Enfabrica Corporation announced the availability of its Elastic Memory Fabric System ‘EMFASYS’, a transformative hardware and software solution designed to improve compute efficiency in large-scale, distributed, memory-bound AI inference workloads.


EMFASYS is the first commercially available system to integrate high-performance Remote Direct Memory Access (RDMA) Ethernet networking with an abundance of parallel Compute Express Link (CXL)-based DDR5 memory channels. The solution provides AI compute racks with fully elastic memory bandwidth and capacity in a standalone appliance reachable by any GPU server at low, bounded latency over existing network ports.


Generative, agentic, and reasoning-driven AI workloads are growing exponentially – in many cases requiring 10 to 100 times more compute per query than previous Large Language Model (LLM) deployments and accounting for billions of batched inference calls per day across AI clouds. The EMFASYS solution addresses the critical need for AI clouds to extract the highest possible utilization of GPU and High-Bandwidth-Memory (HBM) resources in the compute rack while scaling to greater user/agent count, accumulated context, and token volumes. It achieves this outcome by dynamically offloading HBM to commodity DRAM using a caching hierarchy, load-balancing token generation across AI servers, and reducing stranding of expensive GPU cores. When deployed at scale with the company’s EMFASYS remote memory software stack, the solution enables up to 50% lower cost per token per user, allowing foundational LLM providers to deliver significant savings in a price/performance tiered model.
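
As a rough illustration of the caching hierarchy described above, here is a minimal sketch in C that demotes least-recently-used KV-cache blocks from GPU HBM to remote DRAM once HBM occupancy crosses a watermark. All names (kv_block, demote_to_remote, HBM_WATERMARK) are hypothetical; Enfabrica has not published this interface, and the real system's placement policy is its own.

/* Minimal sketch of a two-tier caching hierarchy: hot KV-cache blocks stay
 * in GPU HBM, cold ones are demoted to remote CXL DRAM when HBM occupancy
 * crosses a watermark. All names here are hypothetical illustrations. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_BLOCKS    8
#define HBM_WATERMARK 4          /* max blocks resident in HBM */

enum tier { TIER_HBM, TIER_REMOTE_DRAM };

struct kv_block {
    uint32_t id;
    uint64_t last_used;          /* logical clock of last access */
    enum tier where;
};

/* Placeholder for the actual RDMA write that would move the block. */
static void demote_to_remote(struct kv_block *b) {
    b->where = TIER_REMOTE_DRAM;
    printf("block %u -> remote DRAM\n", b->id);
}

/* Demote least-recently-used HBM blocks until occupancy is at the watermark. */
static void enforce_watermark(struct kv_block *blocks, size_t n) {
    for (;;) {
        size_t resident = 0;
        struct kv_block *lru = NULL;
        for (size_t i = 0; i < n; i++) {
            if (blocks[i].where != TIER_HBM) continue;
            resident++;
            if (!lru || blocks[i].last_used < lru->last_used)
                lru = &blocks[i];
        }
        if (resident <= HBM_WATERMARK) return;
        demote_to_remote(lru);
    }
}

int main(void) {
    struct kv_block blocks[NUM_BLOCKS];
    for (uint32_t i = 0; i < NUM_BLOCKS; i++)
        blocks[i] = (struct kv_block){ .id = i, .last_used = i, .where = TIER_HBM };
    enforce_watermark(blocks, NUM_BLOCKS);  /* demotes blocks 0..3 */
    return 0;
}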


“AI inference has a memory bandwidth-scaling problem and a memory margin-stacking problem,” said Rochan Sankar, CEO, Enfabrica. “As inference gets more agentic versus conversational, more retentive versus forgetful, the current ways of scaling memory access won’t hold. We built EMFASYS to create an elastic, rack-scale AI memory fabric and solve these challenges in a way that hasn’t been done before. Customers are excited to partner with us to build a far more scalable memory movement architecture for their GenAI workloads and drive even better token economics.”

EMFASYS Features and Benefits:

  • Powered by Enfabrica’s 3.2Tb/s Accelerated Compute Fabric SuperNIC (ACF-S) elastically connecting up to 144 CXL memory lanes to 400/800GbE ports
  • Offloads GPU and HBM consumption by enabling shared memory targets of up to 18TB CXL DDR5 DRAM per node, networked using 3.2Tb/s RDMA over Ethernet
  • Effectively aggregates CXL memory bandwidth for AI by enabling the application to stripe transactions across a wide number of memory channels and Ethernet ports (see the striping sketch after this list)
  • Uncompromised AI workload performance, with read access times in microseconds and a software-enabled caching hierarchy that hides transfer latency within inference pipelines
  • Drives down the cost of LLM inference at scale, particularly in large-context and high-turn workloads, by containing growth in GPU server compute and memory footprints
  • Outperforms flash-based inference storage solutions with 100x lower latency and unlimited write/erase transactions
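
To make the striping bullet above concrete, the sketch below uses the standard libibverbs API to split one large read into fixed-size chunks posted round-robin across per-channel queue pairs, so the transfer's bandwidth aggregates across memory channels and Ethernet ports. Queue-pair setup and rkey exchange are assumed to have happened out of band, and the stripe layout is illustrative rather than Enfabrica's actual placement scheme.

/* Sketch of bandwidth aggregation by striping: one large read is split into
 * per-channel chunks, each posted as an RDMA READ on the queue pair bound to
 * that memory channel / Ethernet port. Setup is assumed done out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

struct channel {
    struct ibv_qp *qp;          /* connected RC queue pair for this channel */
    uint64_t       remote_addr; /* base of this channel's memory region */
    uint32_t       rkey;        /* remote key advertised by the target */
};

/* Stripe a `len`-byte read across `nchan` channels in `stripe`-size chunks. */
static int striped_read(struct channel *ch, size_t nchan,
                        void *local_buf, uint32_t lkey,
                        size_t len, size_t stripe)
{
    struct ibv_send_wr *bad;
    size_t off = 0;
    for (size_t i = 0; off < len; i++, off += stripe) {
        struct channel *c = &ch[i % nchan];
        uint32_t chunk = (uint32_t)(len - off < stripe ? len - off : stripe);
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf + off,
            .length = chunk,
            .lkey   = lkey,
        };
        struct ibv_send_wr wr = {
            .wr_id      = i,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_RDMA_READ,
            .send_flags = IBV_SEND_SIGNALED,
            .wr.rdma    = { .remote_addr = c->remote_addr + off,
                            .rkey        = c->rkey },
        };
        if (ibv_post_send(c->qp, &wr, &bad))  /* queue this chunk */
            return -1;
    }
    return 0;  /* caller reaps completions from the CQs */
}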

The firm’s EMFASYS system effectively allows AI cloud operators to deploy massively parallel, low-latency ‘Ethernet memory controllers’, fed by wide GPU networking pipes and populated with pooled, commodity DRAM they can purchase directly from DRAM suppliers. Scaling memory with EMFASYS avoids the tax of growing GPU HBM and CPU DRAM linearly within each AI server simply to meet inference service scale requirements.


The announcement of the EMFASYS memory fabric system follows the company’s successful sampling earlier this year of its 3.2Tb/s ACF-S chip – the AI networking silicon at the heart of EMFASYS. The ACF-S chip delivers multi-port 800GbE connectivity to GPU servers and 4X the I/O bandwidth, radix, and multipath resiliency of any other GPU-attached NIC product available today. By virtue of the chip’s flexibility, ACF-S supports high-throughput, zero-copy, direct data placement and steering not only across a 4- or 8-GPU server complex, but alternatively across 18+ channels of CXL-enabled DDR memory. EMFASYS leverages the ACF-S chip’s high-performance RDMA-over-Ethernet networking and on-chip memory movement engines, along with a remote memory software stack based on InfiniBand Verbs, to enable massively parallel, bandwidth-aggregated memory transfers between GPU servers and commodity DRAM over resilient bundles of 400/800G network ports.
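
A brief sketch of the latency-hiding pattern such a Verbs-based remote memory stack enables inside an inference loop: the RDMA READ for step n+1 is posted before step n's compute runs, so the network transfer overlaps GPU work. compute_step() and post_prefetch() are hypothetical stand-ins for real pipeline stages; only the ibv_poll_cq() completion handling is the actual libibverbs API.

/* Sketch of hiding remote-memory transfer latency inside an inference
 * pipeline via double buffering. post_prefetch() and compute_step() are
 * hypothetical placeholders for real pipeline stages. */
#include <infiniband/verbs.h>
#include <stdio.h>

extern void post_prefetch(int step);   /* posts an RDMA READ for `step` */
extern void compute_step(int step);    /* GPU work for `step` */

static void wait_completion(struct ibv_cq *cq) {
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;                               /* busy-poll; real code would budget this */
    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        fprintf(stderr, "rdma read failed: %d\n", wc.status);
}

void pipeline(struct ibv_cq *cq, int steps) {
    post_prefetch(0);                   /* warm-up: fetch step 0's blocks */
    for (int s = 0; s < steps; s++) {
        wait_completion(cq);            /* step s's data is now resident */
        if (s + 1 < steps)
            post_prefetch(s + 1);       /* overlap next fetch with this compute */
        compute_step(s);
    }
}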

The release of the EMFASYS AI memory fabric system builds on Enfabrica’s expanding presence in the AI infrastructure industry and its pioneering approach to optimizing and democratizing accelerated computing networks. Earlier this year, the company opened a new R&D center in India to grow its world-class engineering team and scale silicon and software product development. In April, the company began sampling its 3.2Tb/s ACF-S chip following the announcement of the solution’s general availability late last year. Enfabrica is also an active advisory member of the Ultra Ethernet Consortium (UEC) and a contributor to the Ultra Accelerator Link (UALink) Consortium.

Availability:
Both the EMFASYS AI memory fabric system and the 3.2Tb/s ACF SuperNIC chip are currently sampling and piloting with customers.
