What Jensen Actually Said at GTC 2026, and What It Means

March 18, 2026


I attended the GTC 2026 keynote in person on March 16. I took notes throughout so I could analyze it afterward rather than react to it in real time.

At the highest level, this keynote was more newly explicit than fundamentally new. If you already follow NVIDIA closely, roughly 20-30% felt genuinely new, another 30-40% was not new in direction but newly explicit, and the rest was reinforcement, packaging, and narrative control.

What felt genuinely new was mostly operational: Rubin + Groq as an explicit heterogeneous inference lane, Kyber extending beyond NVL144 into a larger roadmap family, the TSMC COUPE co-packaged optics claim, and the degree to which Vera CPU, STX storage, and the full system stack were elevated as first-class parts of the platform. What felt most surprising was how bluntly Jensen said things NVIDIA had mostly implied before: token pricing tiers, engineer token budgets, $40 billion per gigawatt factory economics, 100% internal AI coding-tool usage, and the CPU / storage / tool-use argument.

Here is what I think matters most, and why.


The Factory Is the Product

The strongest through-line of the keynote was not better silicon. It was integrated system design across chip, rack, network, storage, cooling, and factory operations.

Jensen opened by saying NVIDIA now has three platforms: CUDA-X, systems, and AI factories. He also framed the stack as a “five-layer cake” beginning with land, power, and shell, then chips, platforms, models, and applications. He is not presenting AI factories as a side business. He is presenting them as a core platform category alongside CUDA and systems.

He repeatedly described modern AI infrastructure as factories that generate tokens rather than facilities that merely store files. Throughput, token speed, and tokens per watt are the new operating metrics. That is one of the keynote’s most important conceptual shifts. He is trying to make AI infrastructure legible as an industrial production system.

He then anchored the economics by saying a one-gigawatt data center costs about $40 billion over 15 years “even when you put nothing on it.” He used that claim to argue that the wrong architecture is not cheap enough even if the chip itself were free.

That is a facility economics argument disguised as a chip competitiveness argument. And he is directionally right: once factory cost dominates, watt-efficiency matters more than nominal chip price. But the keynote uses best-case token output and top-tier performance framing to make the competitive gap feel absolute. The right takeaway is not that competitors are irrelevant. It is that infrastructure cost amplifies architecture efficiency, and competitive comparison is moving away from chip list price and toward total output per constrained megawatt.
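To make that concrete, here is a minimal back-of-envelope in Python. Only the $40 billion facility figure comes from the keynote; the chip costs, tokens-per-watt ratio, and utilization are invented for illustration:

```python
# Back-of-envelope: why facility cost dominates chip price at fixed power.
# All numbers except the $40B facility figure are illustrative assumptions.

FACILITY_COST = 40e9     # $40B per GW over 15 years (Jensen's figure)
POWER_BUDGET_MW = 1000   # 1 GW facility, assumed fully utilized

def cost_per_million_tokens(chip_cost, tokens_per_sec_per_mw):
    """Total cost per million tokens for a 1 GW facility over 15 years."""
    seconds = 15 * 365 * 24 * 3600
    total_tokens = tokens_per_sec_per_mw * POWER_BUDGET_MW * seconds
    return (FACILITY_COST + chip_cost) / total_tokens * 1e6

# Architecture A: assumed $10B of chips, 2x the tokens per watt.
# Architecture B: free chips, half the tokens per watt.
a = cost_per_million_tokens(chip_cost=10e9, tokens_per_sec_per_mw=2e6)
b = cost_per_million_tokens(chip_cost=0,    tokens_per_sec_per_mw=1e6)

print(f"A (paid chips, 2x tokens/W): ${a:.3f} per M tokens")  # ~$0.053
print(f"B (free chips, 1x tokens/W): ${b:.3f} per M tokens")  # ~$0.085
# With these assumptions B is ~60% more expensive per token even though
# its chips cost nothing: the facility dominates the denominator.
```

The specific numbers do not matter. What matters is that once the facility is the largest cost line, the efficiency ratio passes through to cost per token almost directly.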


The $1 Trillion Claim Needs Careful Reading

On stage, Jensen used demand and purchase-order language without cleanly separating booked NVIDIA revenue from total system-level commitment. His own facility framing strongly suggests he is thinking at the level of the AI factory, not only at the level of chip revenue.

That ambiguity matters less than it first appears, because NVIDIA’s own revenue trajectory already accounts for most of the number. NVIDIA reported FY25 revenue of roughly $130.5 billion and FY26 revenue of roughly $215.9 billion. Current consensus for FY27 is around $336.7 billion, and FY28 guidance puts cumulative NVIDIA revenue across those four years at roughly $1.17 trillion. The $1 trillion figure Jensen used on stage is probably just that: a round number for NVIDIA’s own cumulative revenue trajectory at current guidance, not a separate demand claim.

You can back into the power scale. Using Jensen’s own heuristic of roughly $40 billion for a 1 GW AI factory, $1.17 trillion of cumulative NVIDIA revenue implies significant physical scale. If you assume something like $15 billion per GW for the site, power, cooling, shell, and other non-NVIDIA pieces, then the remaining implied NVIDIA-system spend is about $25 billion per GW. On that split, the cumulative revenue maps to roughly 47 GW.
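The same arithmetic in a few lines, purely so the mapping is checkable (the $15 billion non-NVIDIA share per GW is my assumption, as stated above):

```python
# Mapping cumulative NVIDIA revenue to implied AI-factory power scale.
# Inputs: Jensen's $40B/GW heuristic plus an assumed $15B/GW non-NVIDIA share.

cumulative_revenue = 1.17e12   # ~$1.17T cumulative FY25-FY28 revenue
total_cost_per_gw  = 40e9      # Jensen's $40B per GW heuristic
non_nvidia_per_gw  = 15e9      # assumed site, power, cooling, shell share

nvidia_spend_per_gw = total_cost_per_gw - non_nvidia_per_gw   # $25B/GW
implied_gw = cumulative_revenue / nvidia_spend_per_gw

print(f"Implied buildout: ~{implied_gw:.0f} GW")   # ~47 GW
```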

Against publicly available datacenter demand projections, that is large but not out of family. AI-only datacenter capacity was around 10 GW as of 2024, with industry projections of roughly 12 GW for 2025, 47 GW for 2027, and 104-156 GW for 2030. Hyperscaler announced builds through 2030 total roughly 50 GW, total hyperscaler datacenter capacity as of early 2026 is around 80 GW, and global datacenter capacity in 2024 was roughly 122 GW.

So roughly 47 GW is already several times the current AI-only installed base, a very material fraction of the 2027-2030 AI buildout, and still well below the outer 200+ GW grid-buildout scenarios. The right read is not that Jensen’s number breaks the demand model. The right read is that it sits toward the high end of the near- and medium-term envelope while still fitting inside the longer-duration infrastructure projections.


Tokens as Pricing Architecture, Not Commodity

Jensen presented a tiered token-pricing model live on stage: free, $3 per million tokens, $6 per million, $45 per million, and eventually $150 per million for the most demanding use cases. He linked those tiers to throughput, latency, context length, and model quality. He also said engineers may need annual token budgets, and that NVIDIA could give engineers “probably half” of base pay in tokens so they can be amplified tenfold.
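It is worth running the budget arithmetic, because the numbers get large quickly. A quick sketch, with the salary invented for illustration and only the tier prices taken from the keynote:

```python
# What "half of base pay in tokens" means at the keynote's tier prices.
# The salary is a hypothetical input; tier prices are the ones Jensen quoted.

base_pay = 200_000                     # hypothetical engineer base salary, $
token_budget_dollars = base_pay / 2    # "probably half" of base pay

tiers = {"$3/M": 3, "$6/M": 6, "$45/M": 45, "$150/M": 150}

for name, price_per_million in tiers.items():
    tokens = token_budget_dollars / price_per_million * 1e6
    print(f"{name}: ~{tokens / 1e9:.1f}B tokens per year")

# $3/M   -> ~33.3B tokens/year
# $6/M   -> ~16.7B tokens/year
# $45/M  -> ~2.2B tokens/year
# $150/M -> ~0.7B tokens/year
```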

The token budget idea is more interesting than the token pricing idea, because it surfaces a hard organizational question: which engineers actually get amplified tenfold, and how do you demonstrate it? Jensen presented the claim as though amplification is uniform — give any engineer tokens and they scale. But the real distribution is likely very uneven. Some engineers working with AI coding tools have compressed projects that would have taken years into weeks. Others use the same tools and produce roughly what they would have produced anyway, just with more autocomplete. The difference is not the tool. It is whether the engineer can decompose, direct, and verify work at the pace the tool enables. That means the token budget is not really a compensation concept. It is a selection and measurement concept. The companies that figure out how to identify and measure real amplification will get disproportionate returns. The ones that hand out token budgets uniformly will just have higher API bills.

The word “commodity” is doing persuasive work here. Tokens are not standardized across models, tokenizers, context lengths, or quality tiers. Jensen’s own pricing ladder shows that the market will clear on service quality and latency, not on raw token count alone.

The better reading is that he is trying to create both a common operating language and a common monetization language for AI factories. Tokens per watt, token speed, and tiered token pricing make AI infrastructure easier for operators, investors, and CFOs to reason about.

But he is also talking about the value and revenue created by the users of NVIDIA systems. His pricing ladder implicitly tells cloud providers, model companies, and AI-native software companies how to think about charging their own customers: faster models, larger context, and smarter service tiers should command higher prices. The keynote reads less like a literal commodity theory and more like a suggested pricing architecture for the entire stack of services built on top of NVIDIA factories.


The Roadmap Is a System Roadmap, Not a Chip Cadence

Jensen did not present the roadmap as a simple sequence of chips. He presented it as a sequence of rack-scale system lanes with different scale-up methods, interconnect media, and deployment roles.

The two most important rack families are Oberon and Kyber. These are not just product names. They are two different ways NVIDIA is packaging scale-up:

  • Oberon is the standard rack system: backwards compatible, copper scale-up, with an optical scale-up path to NVLink 576. This is the continuity path.

  • Kyber is a new scale-up rack architecture introduced with Rubin Ultra: 144 GPUs in one NVLink domain, vertical compute insertion, a rack midplane, and NVLink switches behind the compute plane. It is later extended on the roadmap to Feynman Kyber NVL1152.

The physical difference matters. Jensen said Rubin slides in horizontally, while Rubin Ultra in Kyber goes in vertically. He described compute in the front and NVLink switching behind the midplane. Kyber is not merely “more GPUs in the same box.” It is a different mechanical and service model. Walking the exhibit hall confirmed it. These were the first systems that read as essentially four-foot-wide footprints, with sidecars looking less like exceptions and more like the new normal.

For anyone making infrastructure decisions, the distinction matters:

  • For immediate deployment and customer rollout, Oberon matters more.
  • For facility design and new construction, Kyber and the Feynman-class forward envelope matter more.

New halls should be designed so they can gracefully take Oberon now while already being capable of supporting what the Kyber / Feynman direction implies next. Oberon is the backward-compatible occupant. It should not be the long-term design target for the building.


Heterogeneous Inference Is Now Explicit

Jensen’s Rubin + Groq framing divided inference between math-heavy and latency-sensitive phases. Rubin handles the large-model math and memory-heavy parts, while Groq extends the highest-value low-latency tier. He thanked Samsung for manufacturing the Groq LP30 chip and said they were “cranking as hard as they can.”

This is a real architecture change. Last year NVIDIA was still presenting Blackwell as a broad platform for pretraining, post-training, and reasoning inference. NVIDIA had already moved toward disaggregated serving in software, where processing and generation phases are separated so each can be optimized for its own job. What changed here is that the same phase-specific logic is now allowed to cross silicon families, not just GPU partitions.
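A conceptual sketch of what that routing looks like, with all names and the routing rule invented here; this illustrates phase-level routing across silicon lanes, not NVIDIA's actual scheduler:

```python
# Conceptual sketch only: phase-disaggregated serving that routes inference
# phases to different silicon lanes. The lane names and the threshold rule
# are invented for illustration.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int      # prefill is compute/memory-heavy, scales with this
    latency_slo_ms: float   # per-token latency target for decode

def route(req: Request) -> dict:
    """Assign each phase of one request to a silicon lane."""
    lanes = {"prefill": "rubin"}   # math/memory-heavy phase stays on GPU
    # Premium low-latency decode goes to the specialized lane;
    # everything else stays on the general GPU pool.
    lanes["decode"] = "groq" if req.latency_slo_ms < 10 else "rubin"
    return lanes

print(route(Request(prompt_tokens=32_000, latency_slo_ms=5)))
# {'prefill': 'rubin', 'decode': 'groq'}
print(route(Request(prompt_tokens=2_000, latency_slo_ms=50)))
# {'prefill': 'rubin', 'decode': 'rubin'}
```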

The Groq angle should be read carefully. Jonathan Ross’s TPU history matters because it signals a real architectural lineage: compiler-led execution, inference-first thinking, and aggressive attention to latency. But Groq is not simply “TPU again.” The public Groq pitch leans much harder on deterministic, software-scheduled execution and large on-chip SRAM, whereas current TPU framing is still centered on systolic-array compute, HBM, and the compiler stack around XLA.

The admission, if there is one, is narrow. It is not that NVIDIA’s main architecture was wrong. It is that premium, latency-sensitive decode is now important enough that NVIDIA is willing to orchestrate a specialized lane rather than force every inference phase through one silicon type.

That point generalizes beyond Groq. Frontier AI labs pursuing custom or semi-custom silicon are often doing the same thing conceptually: identifying a workload slice that appears persistent enough in their serving architecture to justify dedicated optimization. The logic is not “replace the whole platform.” The logic is “own the expensive, permanent-enough lane.”

That has real infrastructure consequences. Mixed rack populations, different thermal envelopes, and new traffic patterns become normal. The challenge is no longer only higher density. It is controlled heterogeneity at scale.


The Power-Density Staircase

The detailed roadmap matters less than the direction of travel: higher rack power, full liquid cooling, tighter integration, and more aggressive electrical design. Jensen explicitly described 600 kW per rack and 800 VDC for the Kyber generation. The keynote makes it clear that a facility designed only for the present rack generation will age badly.
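The 800 VDC choice is easy to sanity-check with Ohm's-law arithmetic. The 54 V comparison point is my assumption, roughly today's in-rack busbar class:

```python
# Why 800 VDC: conductor current for a 600 kW rack at different bus voltages.
# 54 V is an assumed comparison point, not a keynote figure.

rack_power_w = 600_000

for volts in (54, 800):
    amps = rack_power_w / volts
    print(f"{volts} VDC -> {amps:,.0f} A per rack")

# 54 VDC  -> ~11,111 A  (impractical copper cross-section)
# 800 VDC -> ~750 A     (heavy but feasible distribution)
```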

This is where the market’s usual framing of liquid cooling as a “premium” starts to break down. The liquid premium is not just a component markup. It can be compressed by how the facility is architected.

Switch’s EVO design is an existence proof. Rack-in-chamber, top-of-chamber air handling, liquid brought into the building envelope, and cooling distribution that can be allocated differently to each rack position. That kind of architecture reduces reliance on large fan walls and outside air handlers while preserving flexibility across different rack mixes. Switch has publicly demonstrated 250 kW per rack with air cooling in this architecture. Once cooling is designed as part of the system architecture rather than bolted on as an equipment premium, the usual liquid-vs-air CAPEX framing stops being the right comparison.

The forward envelope makes this urgent. If Rubin is near-term and Kyber follows quickly, facilities can no longer assume slow infrastructure transitions. New AI capacity has to be designed against the forward power envelope, not the current shipment.


The Vera CPU and the Agentic Control Plane

One of the easiest things to miss is that Jensen explicitly said NVIDIA is now selling a lot of Vera CPUs standalone and that it is already becoming a multi-billion-dollar business.

The role of the CPU was not “old host processor attached to a GPU.” He described Vera as the processor for orchestration, agentic workflows, tool use, data processing, and high single-thread performance.

The numbers make the point. The standalone Vera CPU rack is described as 256 liquid-cooled CPUs, 45,056 threads, 22,500 concurrent independent environments, 400 TB of LPDDR5, and 300 TB/s of aggregate memory throughput. That is not a host-compute footnote. It is a design center for the control plane of agentic systems.

This matters because agentic applications are structurally heavier than traditional AI. Industry estimates put agentic systems at roughly 15x the token consumption of traditional AI software, with fluid agent intercommunication running around 1,500+ tokens per second versus roughly 100 for human interaction. Once the system starts looking like that, a lot more of the bottleneck lives in coordination, memory movement, tool use, and state management around the GPU. The CPU becomes strategic again.
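Purely as an order-of-magnitude exercise, you can combine the Vera rack's environment count with those per-agent rates. Pairing the two figures is my assumption, not a keynote claim:

```python
# Order-of-magnitude only: aggregate token demand if every environment on a
# standalone Vera rack hosted one fluidly communicating agent. Combining the
# rack spec with the per-agent rate is my assumption, not a keynote claim.

environments = 22_500   # concurrent environments per Vera rack (keynote)
agent_tok_s  = 1_500    # fluid agent intercommunication (industry figure)
human_tok_s  = 100      # human interaction rate (industry figure)

agent_demand = environments * agent_tok_s      # tokens/s per rack
human_equiv  = agent_demand / human_tok_s      # equivalent human sessions

print(f"Aggregate demand: ~{agent_demand / 1e6:.1f}M tokens/s per rack")  # ~33.8M
print(f"Equivalent human sessions: ~{human_equiv:,.0f}")                  # ~337,500
```

Even if the real numbers land an order of magnitude lower, the shape holds: agent traffic looks nothing like human traffic, and the control plane has to be sized for it.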

Jensen reinforced this from the software side. He said the industry has moved from generative to reasoning to agentic, and that 100% of NVIDIA now uses AI coding tools — Claude Code, Codex, and Cursor — internally for software engineering. He also said storage gets “pounded” by agents far harder than by humans, and introduced the STX storage rack as an AI-native storage tier built with BlueField-4, designed for structured data, vector search, and KV cache as part of the live inference path.

The picture that emerges is a full system: GPU racks do the large-model heavy lifting, Vera CPU racks handle orchestration and agent-side control, STX storage racks keep the data path close enough to matter, and Groq LPX racks extend the lowest-latency decode tier. NVIDIA is trying to own more of the control plane around AI, not just the tensor plane.


Why Blackwell Is Still Shipping

This is the most practical question the keynote raises. If Rubin is clearly better, why is Blackwell still moving?

Because availability, revenue timing, and factory utilization matter more than waiting for the theoretically better rack.

Blackwell is the system that can be deployed at scale now. AI demand is arriving faster than the replacement cycle. If a customer has power, shell, cooling, and commercial commitments ready today, leaving that capacity empty while waiting for Rubin is economically worse than deploying Blackwell and monetizing it now.
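A toy model makes the trade visible. Every number here is assumed; the point is the shape of the comparison, not the specific values:

```python
# Toy model, all numbers assumed: fill a ready hall with Blackwell now,
# or leave it dark for 12 months and fill it with Rubin at 1.5x output.

def cumulative_output(monthly_rate, start_month, horizon):
    """Output units earned from start_month to horizon."""
    return monthly_rate * max(0, horizon - start_month)

for horizon in (24, 36, 48):
    now  = cumulative_output(1.0, 0, horizon)    # Blackwell from day one
    wait = cumulative_output(1.5, 12, horizon)   # Rubin after 12 dark months
    print(f"{horizon} mo horizon: deploy now {now:.0f} vs wait {wait:.0f}")

# 24 mo: 24 vs 18 -> deploying now wins
# 36 mo: 36 vs 36 -> breakeven
# 48 mo: 48 vs 54 -> waiting wins on paper, but only if demand, customers,
#                    and market position are willing to wait with you
```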

Facilities, power delivery, cooling plants, and customer contracts are all on their own timelines. Blackwell is the bridge generation that fills halls, exercises liquid-cooling operations, and generates output while Rubin ramps into broader availability. And not every workload needs the highest-value Rubin configuration on day one. Jensen’s own segmentation logic implies that different service tiers want different economics.

In the current AI market, pace is itself strategic. Deploying now preserves product momentum, customer learning, organizational learning, and market position. Waiting for the next rack can mean giving up a full cycle of iteration while somebody else compounds.

Blackwell is not being bought because it wins the 2027 comparison. It is being bought because it wins on 2026 availability, utilization, and momentum.


The Software Layer Is the Quiet Land Grab

OpenClaw, NemoClaw, and OpenShell were not side topics. NVIDIA was positioning itself as the reference platform and reference stack for agentic systems — the same strategic move it made with CUDA, applied one layer up.

OpenClaw had captured real attention as a fast-moving open project about building agents with agents. NVIDIA’s move was to embrace that story, extend it with enterprise guardrails and NVIDIA models, and distinguish NemoClaw as the reference design for serious deployment. Jensen drew explicit parallels to Linux, HTML, and Kubernetes moments.

That is powerful, but it is also the part of the keynote where hype risk is highest. Claims like every software company becoming an agentic-as-a-service company are strategically provocative, but not yet operationally proven at scale. Jensen also made a big deal of OpenClaw’s GitHub star growth, which should be read as a measure of ecosystem mindshare and meme velocity rather than production adoption. In a world of recursive agents and synthetic participation, that signal gets noisier fast.

This narrative took a lot of keynote time. The Omniverse / DSX material felt clipped by comparison. DSX — the design-and-operations layer for AI factories — deserved more stage time than it got. Jensen explicitly framed it in the same layered pattern as CUDA-X: hardware, libraries, APIs, and partner ecosystem. DSX is NVIDIA’s attempt to define the AI factory itself as a platform category with a control plane and software interfaces, not just an ops overlay. But the open agent ecosystem story won the room and the news cycle.


What I Took Away

The keynote’s most durable message is that the factory is the business. Jensen repeatedly collapsed file storage, data centers, AI infrastructure, and token production into one economic picture. Whether every forecasted number proves right or not, he is telling the market to think in factory terms: land, power, shell, rack, throughput, and output.

Companies that can deliver power, cooling, and deployment capacity on the timeline AI customers actually need are not peripheral to AI. They are part of the production function.

The more the market accepts factory-scale economics as the right frame, the more NVIDIA can argue that a more expensive system is cheaper in context. The more the forward power envelope outpaces legacy facility design, the more the facility itself becomes part of the competitive advantage. And the more NVIDIA succeeds at defining the platform interfaces — from CUDA through DSX through OpenClaw — the harder it becomes for any single layer of the stack to route around them.

That is the real story of this keynote. Not any single chip. The system.
