Alluxio Releases Enterprise AI 3.6
Accelerates model distribution, optimizes model training checkpoint writing, and enhances multi-tenancy support
This is a Press Release edited by StorageNewsletter.com on May 23, 2025, at 2:02 pm.

Alluxio, Inc., the AI and data acceleration platform, announced the release of Alluxio Enterprise AI 3.6, delivering breakthrough capabilities for model distribution, model training checkpoint writing optimization, and enhanced multi-tenancy support.
This latest version enables organizations to accelerate AI model deployment cycles, reduce training time, and maintain reliable data access across cloud environments.
AI-driven organizations face increasing challenges as model sizes grow and inference infrastructures span multiple regions. Distributing large models from training to production environments introduces significant latency issues and escalating cloud costs, while lengthy checkpoint writing processes substantially slow down the model training cycle.
“We are excited to announce that we have extended our AI acceleration platform beyond model training to also accelerate and simplify the process of distributing AI models to production inference serving environments,” said Haoyuan (HY) Li, founder and CEO, Alluxio. “By collaborating with customers at the forefront of AI, we continue to push the boundaries of what anyone thought possible just a year ago.”
Alluxio Enterprise AI version 3.6 includes the following key features:
- Performant Model Distribution: Alluxio Enterprise AI 3.6 leverages the Alluxio Distributed Cache to accelerate model distribution workloads. By placing a cache in each region, model files need to be copied from the Model Repository to the Alluxio Distributed Cache only once per region rather than once per server. Inference servers then retrieve models directly from the cache, with further optimizations including local caching on inference servers and memory pool utilization. Benchmarks demonstrate impressive throughput, with the Alluxio AI Acceleration Platform achieving 32GB/s, roughly 20GB/s beyond the 11.6GB/s of available network bandwidth, which is possible because many reads are served from server-local caches and memory rather than over the network.
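Conceptually, the per-region copy pattern can be sketched in a few lines of Python. This is a minimal illustration only, assuming the regional cache is exposed as a local file system path (for example via a FUSE mount); the function name and directory layout are hypothetical, not Alluxio's API:

```python
import os
import shutil

def fetch_model(model_name: str, regional_cache_mount: str, local_dir: str) -> str:
    """Copy a model file from the region-local cache mount into this
    inference server's local cache, skipping the copy when it is
    already present. Only the first fetch per region ever has to pull
    the model from the remote Model Repository into the regional cache;
    every server-level fetch here is served within the region."""
    local_path = os.path.join(local_dir, model_name)
    if os.path.exists(local_path):
        return local_path  # already cached on this inference server
    os.makedirs(local_dir, exist_ok=True)
    cached_path = os.path.join(regional_cache_mount, model_name)
    shutil.copy2(cached_path, local_path)  # intra-region read, not cross-region
    return local_path
```

Repeated calls for the same model return immediately from the server-local copy, which is what lets aggregate read throughput exceed the network bandwidth.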
- Fast Model Training Checkpoint Writing: Building on the CACHE_ONLY Write Mode introduced earlier, version 3.6 debuts the new ASYNC write mode, delivering up to 9GB/s write throughput in 100Gb network environments. This enhancement reduces the time needed for model training checkpoints by writing to the Alluxio cache instead of directly to the underlying file system, avoiding network and storage bottlenecks. With ASYNC write mode, checkpoint files are written to the underlying file system asynchronously to optimize training performance.
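The ASYNC pattern, completing the checkpoint write against a fast cache tier and persisting to the underlying file system in the background, can be illustrated with a generic Python sketch. The function name, directories, and threading approach are hypothetical illustrations of the concept, not Alluxio's implementation:

```python
import shutil
import threading
from pathlib import Path

def write_checkpoint_async(data: bytes, name: str,
                           cache_dir: str, backing_dir: str) -> threading.Thread:
    """Write a checkpoint to the fast cache tier synchronously, then
    persist it to the (slower) backing store on a background thread,
    so training resumes as soon as the cache write completes."""
    cache_path = Path(cache_dir) / name
    cache_path.write_bytes(data)  # fast write, on the training critical path

    def _persist():
        # slow copy to the underlying file system, off the critical path
        Path(backing_dir).mkdir(parents=True, exist_ok=True)
        shutil.copy2(cache_path, Path(backing_dir) / name)

    t = threading.Thread(target=_persist, daemon=True)
    t.start()
    return t  # caller can join() before shutdown to guarantee durability
```

The trade-off is the usual one for asynchronous persistence: training throughput improves, but a checkpoint is only durable in the underlying storage once the background write finishes.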
- New Management Console: Alluxio 3.6 introduces a comprehensive web-based Management Console designed to enhance observability and simplify administration. The console displays key cluster information, including cache usage, coordinator and worker status, and critical metrics such as RW throughput and cache hit rates. Administrators can also manage mount tables, configure quotas, set priority and TTL policies, submit cache jobs, and collect diagnostic information directly through the interface without command-line tools.
The release also introduces several enhancements for Alluxio administrators:
- Multi-Tenancy Support: This release brings multi-tenancy capabilities through integration with Open Policy Agent (OPA). Administrators can now define fine-grained role-based access controls for multiple teams using a single, secure Alluxio cache.
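OPA policies are written in Rego. As a minimal sketch of what a per-team, role-based read rule might look like (the package name, team names, and path layout below are hypothetical, not a shipped Alluxio policy):

```rego
package alluxio.authz

# Deny by default; grant access only when a rule below matches.
default allow = false

# A team may read paths under its own namespace.
allow {
    input.user.team == "team-a"
    input.action == "read"
    startswith(input.path, "/teams/team-a/")
}
```

The cache consults the policy engine on each access, so multiple teams can safely share a single Alluxio cache while seeing only their own data.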
- Multi-Availability Zone Failover Support: Alluxio Enterprise AI 3.6 adds support for data access failover in multi-Availability Zone architectures, ensuring high availability (HA) and stronger data access resilience.
- Virtual Path Support in FUSE: The new virtual path support allows users to define custom access paths to data resources, creating an abstraction layer that masks physical data locations in underlying storage systems.
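Virtual path resolution is essentially a longest-prefix mapping from user-facing paths to physical storage URIs. A minimal Python sketch, with a hypothetical mount table and function name:

```python
def resolve(virtual_path: str, mount_table: dict) -> str:
    """Map a user-facing virtual path to its physical storage URI by
    longest-prefix matching against a mount table. Users see only the
    virtual path; the physical location can change without affecting them."""
    best = ""
    for prefix in mount_table:
        if virtual_path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    if not best:
        raise KeyError(f"no mount covers {virtual_path}")
    return mount_table[best] + virtual_path[len(best):]
```

With a table such as `{"/models/": "s3://prod-models/"}`, a request for `/models/llama/weights.bin` resolves to `s3://prod-models/llama/weights.bin` without the user ever seeing the bucket name.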
Availability:
Alluxio Enterprise AI version 3.6 is available for download.