Lightbits Names Former Infineon VP to Scale Inferra KVCache Engine for NeoClouds

Lightbits Labs, inventor of NVMe over TCP and the Inferra KV cache acceleration engine for AI inference, announced the appointment of former Infineon executive Ramesh Chettuvetty as SVP, product and business for AI solutions.

Chettuvetty will drive the business forward and build the playbook for Lightbits’ AI solutions in the next generation of data centers and NeoClouds, ensuring that the company’s technology solves hardware efficiency and performance challenges faced by organizations running GPU-heavy workloads. Central to this role is Inferra by Lightbits, a fully optimized KV cache engine designed to eliminate GPU stalls in long-context AI inference.

The appointment comes as NeoCloud providers face growing infrastructure challenges tied to long-context LLM inference workloads, including GPU memory bottlenecks, KV cache scaling limitations, and rising infrastructure costs. Inferra is designed to address these constraints by enabling disaggregated KV cache architectures that reduce GPU stalls and expand context window scalability beyond traditional GPU memory limits.

“Lightbits is currently working with NeoCloud providers evaluating high-performance, scale-out, software-defined architectures for long-context, high-density GPU environments,” said Eran Kirzner, co-founder and CEO, Lightbits Labs. “Ramesh brings deep business and technology expertise across memory systems and AI infrastructure that will help accelerate Inferra’s adoption as a high-performance backend for modern inference environments.”

Inferra effectively breaks the memory wall by enabling practically infinite KV cache capacity through smart tiering and pre-fetching algorithms. This allows context windows to scale from 32K to 1M tokens in production systems today, and to 10M tokens in the next generation of production systems, without scaling HBM. Inferra enables NeoCloud operators to:

Accelerate Time to First Token (TTFT) and reduce Inter-Token Latency (ITL)
Provide per-agent QoS with SLAs
Scale context windows, with a path to 10M tokens for next-gen production systems
Improve GPU utilization
Reduce dependence on costly HBM capacity expansion
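
For a rough sense of why context length runs into HBM limits, the sketch below applies the standard per-sequence KV cache sizing rule (one key and one value vector per layer per token). The model shape used here (80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example chosen for illustration, not a figure from Lightbits, and the calculation says nothing about how Inferra itself tiers or prefetches data.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache for one sequence: a key and a value vector
    per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, assumed only for this illustration.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

if __name__ == "__main__":
    # 32K, 1M and 10M token contexts, matching the sizes named in the text.
    for tokens in (32 * 1024, 1_000_000, 10_000_000):
        gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
        print(f"{tokens:>10,} tokens -> {gib:,.1f} GiB of KV cache")
```

Under these assumed dimensions, a single 32K-token sequence needs about 10 GiB of KV cache and a 1M-token sequence about 305 GiB, which is the kind of growth that motivates moving KV cache to tiers outside GPU memory.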

Chettuvetty will be instrumental in positioning Inferra as a foundational infrastructure layer for next-gen AI inference environments built on state-of-the-art GPU technologies.

“I am excited to join Lightbits at this pivotal time,” said Ramesh Chettuvetty. “Inferra directly addresses one of the most critical bottlenecks in AI inference – scalable KV cache management. Combining Lightbits’ proven software-defined storage leadership with next-generation AI inference infrastructure creates a compelling opportunity.”
