
Nvidia's Rubin CPX GPU targets AI inferencing arms race

10 September 2025


Rack-sized monster to crush token counts

Nvidia has decided that building GPUs the size of small towns is the way to stay ahead in AI, and its latest weapon is the Rubin CPX, aimed squarely at inferencing and large-context models.

Unveiled at the AI Infra Summit, the Rubin CPX opens a fresh category in what Nvidia is calling the CPX lineup. It’s not replacing anything yet, but it will co-exist with the standard Rubin GPUs and the new Vera CPUs, forming what Nvidia is billing as a rack-scale revolution.

Rubin CPX pushes 30 petaFLOPs of NVFP4 compute and comes strapped with 128 GB of GDDR7, which seems to be Nvidia's way of keeping things just about affordable without resorting to high-bandwidth memory.

The GPU will sit inside what the company is calling the “Vera Rubin NVL144 CPX” rack, essentially an exaFLOP machine in a cabinet, made up of 144 Rubin CPX GPUs, 144 standard Rubin GPUs, and 36 Vera CPUs. Altogether, that’s eight exaFLOPs of NVFP4 compute to play with.

That’s 7.5 times more than the much-hyped Blackwell Ultra rack, making it clear Nvidia isn't giving AMD much room to breathe.
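
For those who like to check Nvidia's sums, the quoted figures roughly add up. Below is a back-of-the-envelope sketch, assuming the 30 petaFLOPs per Rubin CPX quoted above; the implied per-GPU figure for the standard Rubin parts is our own inference from the rack total, not an Nvidia number.

```python
# Back-of-the-envelope check of the Vera Rubin NVL144 CPX rack figures.
# Only the CPX per-GPU number and the rack totals come from Nvidia's
# announcement; everything derived here is our own arithmetic.

CPX_GPUS = 144     # Rubin CPX GPUs per rack (Nvidia figure)
RUBIN_GPUS = 144   # standard Rubin GPUs per rack (Nvidia figure)
CPX_PFLOPS = 30    # NVFP4 petaFLOPs per Rubin CPX (Nvidia figure)
RACK_EFLOPS = 8    # claimed NVFP4 exaFLOPs for the whole rack

cpx_total_eflops = CPX_GPUS * CPX_PFLOPS / 1000      # 4.32 exaFLOPs
remainder_eflops = RACK_EFLOPS - cpx_total_eflops    # ~3.68 exaFLOPs
per_rubin_pflops = remainder_eflops * 1000 / RUBIN_GPUS

print(f"CPX GPUs contribute:          {cpx_total_eflops:.2f} exaFLOPs")
print(f"Left for standard Rubin GPUs: {remainder_eflops:.2f} exaFLOPs")
print(f"Implied per Rubin GPU:        {per_rubin_pflops:.1f} petaFLOPs")

# The 7.5x claim also implies the Blackwell Ultra rack sits at
# roughly 8 / 7.5 ≈ 1.07 exaFLOPs, i.e. exaFLOP-class territory.
print(f"Implied Blackwell Ultra rack: {RACK_EFLOPS / 7.5:.2f} exaFLOPs")
```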

Nvidia reckons the rack will deliver up to a “30x to 50x return on investment,” the kind of figure that only exists in PowerPoint presentations. The real trick, apparently, is long-context AI inferencing, with Nvidia claiming the system will handle workloads with context windows of up to a million tokens. To pull that off, it’s pairing the hardware with Spectrum-X Ethernet interconnects and a lot of marketing.
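
To see why a million-token context is as much a memory problem as a compute one, consider the key-value cache an inference server has to hold per request. The sketch below uses a hypothetical model configuration of our own choosing, not anything Nvidia has disclosed about target workloads.

```python
# Rough KV-cache sizing for long-context inference. The model shape
# below is a hypothetical example, not a real deployment target.

LAYERS = 80          # transformer layers (assumed)
KV_HEADS = 8         # key/value heads with grouped-query attention (assumed)
HEAD_DIM = 128       # dimension per head (assumed)
BYTES = 2            # fp16/bf16 bytes per element
TOKENS = 1_000_000   # the million-token context Nvidia is touting

# Two tensors (keys and values) are cached per layer, per token.
kv_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * TOKENS
print(f"KV cache for one request: {kv_bytes / 2**30:.0f} GiB")  # ~305 GiB
```

Even with grouped-query attention trimming the cache, a single million-token request in this configuration would need roughly 305 GiB, well beyond what one 128 GB card can hold, which is presumably why these GPUs come 144 to a rack rather than one at a time.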

There will be other configurations besides the NVL144 rack, but Nvidia hasn’t said what those will look like yet. For now, Rubin CPX is being pitched as a “relatively low-cost” solution, though we’ll believe that when we see the invoice.

Nvidia seems determined to close every remaining gap in the AI space, and this latest move into dedicated inferencing kit leaves competitors like AMD and Intel stuck playing catch-up once again.
