The company’s new Aegaeon system reportedly slashes GPU requirements by 82 per cent while improving throughput almost ninefold.
The work was detailed in a paper presented at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul, written by engineers from Alibaba's infrastructure division and Peking University. During several months of production testing, the number of Nvidia H20 accelerators needed to support dozens of large language models fell from 1,192 to just 213.
Unlike most research that focuses on faster model training, Aegaeon tackles the waste that happens during inference. It acts as a scheduler, parcelling out tiny slices of GPU time across different models with unpredictable demand. This approach keeps the chips busier and allows one H20 to serve several models simultaneously.
The result, according to the paper, is a sharp rise in the useful work each GPU delivers, a metric the authors call "goodput". By virtualising GPU access at the token level, Aegaeon maintains high utilisation even when workloads spike or idle unpredictably.
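The core idea — interleaving per-token decode steps from several models' request queues on one device instead of pinning each model to its own GPU — can be sketched in miniature. This is an illustrative toy, not Alibaba's implementation: all class and method names are hypothetical, and a real system like Aegaeon must also manage swapping model weights and KV caches on and off the accelerator, which this sketch omits.

```python
from collections import deque


class TokenLevelScheduler:
    """Toy round-robin token-level scheduler (hypothetical sketch).

    Each scheduling step runs a single decode step (one token) for
    the next pending request, switching between models' queues so
    that no model monopolises the device while others sit idle.
    """

    def __init__(self):
        # model name -> queue of in-flight requests
        self.queues = {}

    def submit(self, model, prompt, n_tokens):
        """Enqueue a request that needs n_tokens decode steps."""
        self.queues.setdefault(model, deque()).append(
            {"prompt": prompt, "remaining": n_tokens, "output": []}
        )

    def run(self):
        """Interleave one-token decode steps across models until all
        requests finish; returns the order in which models ran."""
        trace = []
        while any(self.queues.values()):
            for model, q in list(self.queues.items()):
                if not q:
                    continue
                req = q[0]
                # Stand-in for one real decode step on the GPU.
                req["output"].append(f"{model}_tok{len(req['output'])}")
                req["remaining"] -= 1
                trace.append(model)
                if req["remaining"] == 0:
                    q.popleft()
        return trace


sched = TokenLevelScheduler()
sched.submit("modelA", "hello", 2)
sched.submit("modelB", "bonjour", 3)
print(sched.run())  # decode steps alternate between the two models
```

Even in this simplified form, the design choice is visible: the unit of scheduling is a single token, not a whole request or a whole model, which is what lets one accelerator stay busy serving several sporadically used models at once.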
The South China Morning Post reported that all the tests used Nvidia’s H20 accelerator, one of the few still available to Chinese buyers under current US export restrictions.
If the figures stand up to scrutiny, Alibaba may have shown a way for Chinese data centres to do more with the limited hardware they can still import.