Nvidia’s GB300 NVL72 achieves 61.4K concurrent agents per megawatt, a 20x leap over H200


Nvidia just dropped a number that should make every data center operator do a double take. The company’s new GB300 NVL72 system can handle 61,400 concurrent AI agents per megawatt of power consumed, compared to just 2,600 on the prior-generation H200.

That’s more than a 20x improvement (about 23.6x) in agent density per unit of energy. For an industry where power availability and electricity costs are rapidly becoming the binding constraints on growth, this isn’t a spec-sheet flex. It’s a structural shift in the economics of inference.
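The arithmetic behind the headline is easy to check. The short Python sketch below uses only the two agents-per-megawatt figures Nvidia reported. The helper names are illustrative, and “watts per agent” is a simple average that ignores how power is actually shared across a rack, cooling overhead, and idle capacity.

```python
# Agent densities as reported: concurrent agents sustained per megawatt.
AGENTS_PER_MW = {
    "GB300 NVL72": 61_400,
    "H200": 2_600,
}

WATTS_PER_MW = 1_000_000


def improvement(new: str, old: str) -> float:
    """Ratio of agent density between two platforms."""
    return AGENTS_PER_MW[new] / AGENTS_PER_MW[old]


def watts_per_agent(platform: str) -> float:
    """Average power budget per concurrent agent, in watts."""
    return WATTS_PER_MW / AGENTS_PER_MW[platform]


if __name__ == "__main__":
    print(f"Improvement: {improvement('GB300 NVL72', 'H200'):.1f}x")
    for name in AGENTS_PER_MW:
        print(f"{name}: {watts_per_agent(name):.1f} W per agent")
```

Run as-is, it prints an improvement of 23.6x, with roughly 16.3 W per agent on GB300 NVL72 versus roughly 384.6 W per agent on H200. The exact ratio sits a little above the rounded 20x figure.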

What’s inside the rack

The GB300 NVL72 is built on Nvidia’s Blackwell Ultra architecture, packing 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single liquid-cooled rack. The system integrates roughly 20 to 21 TB of HBM3e memory and 130 TB/s of aggregate NVLink bandwidth. NVLink is the high-speed GPU-to-GPU interconnect that lets all 72 GPUs exchange data without bottlenecking.
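Dividing the rack-level figures by the GPU count gives a rough per-GPU picture. This is a back-of-the-envelope sketch that assumes an even split across GPUs and uses decimal units (1 TB = 1,000 GB). It is not an Nvidia specification, and the constant names are illustrative.

```python
# Rack-level figures as reported (HBM3e given as a 20-21 TB range).
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
HBM_TB_RANGE = (20.0, 21.0)
NVLINK_TB_PER_S = 130.0


def per_gpu(total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Evenly divide a rack-level total across its GPUs (an assumption)."""
    return total / gpus


if __name__ == "__main__":
    # Convert TB to GB (decimal) for the memory range.
    low, high = (per_gpu(tb) * 1000 for tb in HBM_TB_RANGE)
    print(f"HBM3e per GPU: ~{low:.0f}-{high:.0f} GB")
    print(f"NVLink per GPU: ~{per_gpu(NVLINK_TB_PER_S):.2f} TB/s")
    print(f"GPUs per Grace CPU: {GPUS_PER_RACK // CPUS_PER_RACK}")
```

On these assumptions, each GPU gets roughly 278 to 292 GB of HBM3e and about 1.81 TB/s of NVLink bandwidth, with two GPUs paired to each Grace CPU.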

Nvidia says the platform delivers up to 50 times the AI factory output of its older Hopper-generation systems. It also claims 10 times the tokens per second per user and five times the throughput per watt.
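One hedged way to read these three claims: the headline figure lines up with the product of the other two. This appears to be how Nvidia frames “AI factory output,” combining per-user speed and per-watt throughput into a single number. Treat it as an interpretation, not a derivation stated in the figures above.

```latex
\underbrace{10\times}_{\text{tokens/s per user}}
\;\times\;
\underbrace{5\times}_{\text{throughput per watt}}
\;=\;
\underbrace{50\times}_{\text{AI factory output}}
```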

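The system includes software-level optimizations such as WideEP/DeepEP, which spread a model’s experts across many GPUs and speed up the communication between them, and fused Mixture of Experts (MoE) kernels. Both are designed to squeeze more useful computation out of each watt and each GPU cycle. MoE is a model architecture in which a learned router sends each token to a small subset of specialized “expert” sub-networks. Only a fraction of the model’s parameters is active at each step, rather than the whole network firing every time.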
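To make the routing idea concrete, here is a toy Python sketch of top-k gating, a common MoE pattern in which a router scores every expert for each token and only the k best-scoring experts run. It also shows how routed tokens might be grouped by destination GPU in expert-parallel serving, the kind of dispatch step that WideEP/DeepEP-style systems handle at scale. That step relies on a deliberately simple layout assumption: expert `e` lives on GPU `e // experts_per_gpu`. The function names, router scores, and layout are all hypothetical. This is not DeepSeek’s or Nvidia’s implementation. Real systems add capacity limits, load balancing, and fused GPU kernels.

```python
import math
from collections import defaultdict


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_id, weight) pairs; weights are renormalized over the
    chosen experts so they sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    return [(e, probs[e] / norm) for e in top]


def dispatch_by_gpu(assignments, experts_per_gpu):
    """Group (token, expert, weight) triples by the GPU hosting each expert.

    Hypothetical layout: expert e lives on GPU e // experts_per_gpu.
    """
    buckets = defaultdict(list)
    for token_id, experts in assignments.items():
        for expert_id, weight in experts:
            gpu = expert_id // experts_per_gpu
            buckets[gpu].append((token_id, expert_id, weight))
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical router scores for 3 tokens over 8 experts.
    logits = {
        0: [2.0, 0.1, 0.3, 1.5, -1.0, 0.0, 0.2, 0.4],
        1: [0.0, 3.0, 0.1, 0.2, 2.5, 0.0, -0.5, 0.3],
        2: [0.5, 0.4, 0.3, 0.2, 0.1, 2.2, 1.9, 0.0],
    }
    assignments = {t: route_top_k(l, k=2) for t, l in logits.items()}
    for t, experts in assignments.items():
        print(t, [(e, round(w, 3)) for e, w in experts])
    # 8 experts spread over 4 GPUs, 2 experts per GPU.
    print(dispatch_by_gpu(assignments, experts_per_gpu=2))
```

With k=2, each token touches only 2 of the 8 experts. That is the source of MoE’s compute savings, and grouping by GPU shows why the communication between GPUs becomes the bottleneck that expert-parallel libraries target.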

Performance was validated using AgentPerf, a benchmark developed by Artificial Analysis specifically to evaluate agent-oriented AI performance. The benchmark ran the DeepSeek V4 Pro model, an MoE architecture, with service-level objectives (the minimum generation speed each agent must receive) set at either 20 or 60 tokens per second per agent.
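A metric like “concurrent agents per megawatt at an SLO” generally comes from sweeping concurrency, keeping the highest load that still meets the per-agent speed target, and normalizing by power. The Python sketch below illustrates that general shape. The sweep data and power draw are invented for illustration, are not AgentPerf results, and do not describe any real system. The procedure is a guess at the type of calculation, not Artificial Analysis’s published methodology.

```python
# Hypothetical sweep: (concurrent agents, measured tokens/s per agent).
# These numbers are invented for illustration only.
SWEEP = [
    (64, 95.0),
    (128, 78.0),
    (256, 61.0),
    (512, 42.0),
    (1024, 23.0),
    (2048, 12.0),
]


def max_agents_at_slo(sweep, slo_tokens_per_s):
    """Largest tested concurrency whose per-agent speed still meets the SLO."""
    passing = [agents for agents, tps in sweep if tps >= slo_tokens_per_s]
    return max(passing) if passing else 0


def agents_per_mw(agents, power_kw):
    """Normalize concurrency by measured system power (kW converted to MW)."""
    return agents / (power_kw / 1000)


if __name__ == "__main__":
    POWER_KW = 100.0  # hypothetical power draw of the system under test
    for slo in (20, 60):
        n = max_agents_at_slo(SWEEP, slo)
        print(f"SLO {slo} tok/s: {n} agents, {agents_per_mw(n, POWER_KW):,.0f} per MW")
```

A stricter SLO (60 rather than 20 tokens per second) admits fewer concurrent agents, which is why density figures are only comparable when quoted at the same service level.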

Who’s deploying it

The GB300 NVL72 has already attracted commitments from the cloud providers that matter most. Microsoft Azure is deploying the first large-scale cluster built around the system, with those racks expected to power OpenAI workloads beginning in late 2025 and extending into 2026.

CoreWeave has announced the first production instances of the GB300 NVL72, positioning itself as an early mover in the GPU cloud space. Oracle Cloud Infrastructure is also in the deployment pipeline.

What this means for investors

The more-than-20x gain in agent density over H200 gives data center operators a direct ROI calculation: within the same power envelope, GB300 hardware could theoretically support more than 20 times as many concurrent agents.
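To put rough numbers on that ROI argument, the Python sketch below takes the reported agent densities and applies them to a hypothetical 50 MW power budget at a hypothetical electricity price of $0.08 per kWh. Both inputs are assumptions chosen only for illustration. The result covers electricity alone at full reported density. It ignores cooling overhead, hardware cost, and real-world utilization.

```python
# Reported agent densities (agents per MW).
AGENTS_PER_MW = {"GB300 NVL72": 61_400, "H200": 2_600}

# Hypothetical inputs, chosen only for illustration.
SITE_POWER_MW = 50.0  # power available for accelerators
PRICE_PER_KWH = 0.08  # electricity price in USD


def agents_supported(platform, power_mw=SITE_POWER_MW):
    """Concurrent agents a power envelope could host at the reported density."""
    return AGENTS_PER_MW[platform] * power_mw


def energy_cost_per_agent_hour(platform, price_per_kwh=PRICE_PER_KWH):
    """Electricity cost (USD) of running one agent for one hour.

    One MW for one hour is 1,000 kWh, shared across that MW's agents.
    """
    return 1_000 * price_per_kwh / AGENTS_PER_MW[platform]


if __name__ == "__main__":
    for name in AGENTS_PER_MW:
        print(f"{name}: {agents_supported(name):,.0f} agents, "
              f"${energy_cost_per_agent_hour(name):.4f} per agent-hour")
```

Under these assumptions, the same 50 MW hosts about 3.07 million agents on GB300 NVL72 versus 130,000 on H200. The electricity cost per agent-hour falls from about $0.0308 to about $0.0013.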

Nvidia’s claimed gains over Hopper, up to 50 times the AI factory output and five times the throughput per watt, give the company a credible narrative for ESG-conscious institutional investors. As regulators and shareholders increasingly scrutinize the energy footprint of AI infrastructure, systems that deliver more useful work per kilowatt-hour are likely to command a premium in procurement decisions.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
