Key Takeaways
- Vera Rubin delivers 5x inference and 3.5x training performance over Blackwell
- Token generation cost is approximately 10x lower than Blackwell
- 288GB of HBM4 memory per GPU — 50% more than Blackwell's HBM3e
- The NVL72 rack requires 4x fewer GPUs to train equivalent MoE models
- Dedicated KV-cache tier via BlueField-4 DPU delivers 5x better long-context inference performance
- Vera Rubin in full production since January 2026 — cloud instances arriving H2 2026 from AWS, Azure, Google Cloud, OCI, CoreWeave, Lambda, Nebius, and Nscale
What Is NVIDIA GTC and Why Does It Matter in 2026
GTC stands for GPU Technology Conference. It began as a focused gathering for GPU computing researchers and has evolved into the world's most significant AI and accelerated computing summit. In 2026, the event draws more than 30,000 attendees from over 190 countries, spanning developers, researchers, enterprise leaders, cloud architects, and investors, all there to understand where AI infrastructure is heading next.
Jensen Huang described GTC 2026 with uncommon directness: "GTC is the epicenter of the AI industrial era. AI is no longer a single breakthrough or application. It is essential infrastructure. Every company will use it. Every nation will build it."
The event covers five layers of what NVIDIA calls the AI stack: energy, chips, infrastructure, models, and applications. GTC 2026 includes:
- More than 1,000 sessions across AI, accelerated computing, robotics, and quantum
- 60 hands-on labs for developers to work directly with NVIDIA tools and platforms
- Nine full-day developer workshops covering the full AI stack
- A dedicated Quantum Day, the first in GTC history
NVIDIA GTC 2026 Keynote: Date, Time, and How to Watch
| Detail | Information |
|---|---|
| Keynote Date | Monday, March 16, 2026 |
| Keynote Time | 11:00 a.m. PT (2:00 p.m. ET / 6:00 p.m. GMT) |
| Venue | SAP Center, San Jose, California |
| Pregame Show | 8:00 a.m. PT on March 16 (online only) |
| Livestream | Free at nvidia.com, no registration required |
| Investor Q&A | Tuesday, March 17 at 9:00 a.m. PT |
| Conference Duration | March 16 to March 19, 2026 |
The pregame show features industry CEOs including Aravind Srinivas (Perplexity), Harrison Chase (LangChain), Arthur Mensch (Mistral AI), and others ahead of the main keynote.
What Is the NVIDIA Vera Rubin Platform
The Vera Rubin platform is the successor to NVIDIA's Blackwell architecture and the most ambitious chip design the company has ever shipped. It is named after Vera Florence Cooper Rubin, the pioneering American astronomer whose observations in the 1970s and 1980s provided some of the most compelling evidence for dark matter, fundamentally transforming humanity's understanding of the structure of the universe.
Jensen Huang first unveiled the Rubin architecture at GTC 2025. At CES 2026 in January, Huang confirmed that the Vera Rubin platform had entered full production. GTC 2026 is expected to provide the final deployment details, partner announcements, and software ecosystem updates that complete the picture.
Rubin is NVIDIA's first extreme-codesigned platform — six chips designed simultaneously from the ground up:
- Vera CPU — 88 Olympus custom ARM cores, 128GB GDDR7, 227 billion transistors
- Rubin GPU — 288GB HBM4, 50 PFLOPS inference, 35 PFLOPS training
- NVLink 6 Switch — high-speed chip-to-chip interconnect
- ConnectX-9 SuperNIC — next-generation networking interface
- BlueField-4 DPU — powers the KV-cache storage platform
- Spectrum-6 Ethernet Switch — silicon photonics-based networking
Vera Rubin NVL72: Full Specifications
| Component | Specification |
|---|---|
| Vera CPU cores | 88 Olympus custom ARM cores |
| Vera CPU memory | 128GB GDDR7 |
| Vera CPU transistors | 227 billion |
| Rubin GPU memory per GPU | 288GB HBM4 |
| GPU inference performance | 50 PFLOPS (NVFP4) |
| GPU training performance | 35 PFLOPS (NVFP4) |
| Inference vs Blackwell | Up to 5x improvement |
| Training vs Blackwell | 3.5x improvement |
| Token generation cost vs Blackwell | ~10x lower |
| GPUs for equivalent MoE training | 4x fewer than Blackwell |
| Rack memory (LPDDR5X) | Up to 54TB |
| Total transistors in rack | 220 trillion |
| Cooling | 100% liquid cooled, fanless and tubeless |
| Rack assembly time | 5 minutes (vs about 2 hours for Blackwell, roughly 24x faster) |
| Networking bandwidth | Up to 1.6 Tb/s via Quantum-CX9 InfiniBand |
Vera Rubin vs Blackwell: The Full Comparison
| Metric | Blackwell | Vera Rubin | Improvement |
|---|---|---|---|
| Inference performance (NVFP4) | ~10 PFLOPS | 50 PFLOPS | Up to 5x |
| Training performance (NVFP4) | ~10 PFLOPS | 35 PFLOPS | 3.5x |
| GPU memory | 192GB HBM3e | 288GB HBM4 | 50% more, faster bandwidth |
| Token generation cost | Baseline | ~10x lower | 10x reduction |
| GPUs to train MoE model | Baseline | 4x fewer | 4x efficiency gain |
| Rack assembly time | ~2 hours | ~5 minutes | ~24x faster |
| Cooling | Air and liquid hybrid | 100% liquid, fanless | Simpler, more efficient |
| Factory throughput | Baseline | 10x higher | 10x supply scaling |
| Networking | Spectrum-X Ethernet | Spectrum-6 Photonics | 5x power efficiency |
The 10x reduction in token generation cost is arguably more significant for the industry than the raw performance gains. The economics of inference are the primary constraint on AI adoption at scale. Every 10x reduction in token cost opens up workloads that were previously not economically viable: longer reasoning chains, larger context windows, higher request volumes, and applications that require multiple model calls per user interaction.
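The arithmetic behind that claim is easy to sketch. The Python snippet below uses illustrative prices and token counts (none of these figures come from NVIDIA or any cloud provider) to show how a 10x token-cost reduction compounds in an agentic workload with multiple model calls per interaction:

```python
# Hedged sketch: how a 10x drop in per-token price changes workload economics.
# All prices and token counts below are illustrative assumptions, not NVIDIA
# or cloud-provider figures.

def interaction_cost(calls_per_interaction: int,
                     tokens_per_call: int,
                     price_per_million_tokens: float) -> float:
    """Cost (USD) of one user interaction that fans out into several model calls."""
    total_tokens = calls_per_interaction * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_tokens

# Assumed agentic workload: 8 model calls per interaction (planner, tool
# calls, critic, summarizer...), roughly 4k tokens each.
baseline = interaction_cost(8, 4_000, price_per_million_tokens=10.00)
rubin_era = interaction_cost(8, 4_000, price_per_million_tokens=1.00)  # 10x cheaper

print(f"baseline:  ${baseline:.3f} per interaction")
print(f"10x lower: ${rubin_era:.3f} per interaction")
# At 1M interactions/day the same workload drops from $3,200/day to $320/day;
# multiply the call count or context size and the gap decides viability.
```

The point is not the specific prices: whatever the baseline, a 10x cut moves whole classes of multi-call applications across the break-even line.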
Five New Innovations Inside the Vera Rubin Platform
1. Next-Generation NVLink 6 Interconnect
NVLink 6 advances over the fifth-generation NVLink used in Blackwell with significantly higher per-link bandwidth, enabling tighter coupling across all 72 GPUs in the NVL72 rack and reducing the communication overhead that limits scaling efficiency in large training and inference runs.
2. Next-Generation Transformer Engine
Improved support for NVFP4 precision formats enables higher throughput with lower memory bandwidth requirements, specifically tuned for the agentic and reasoning workloads driving the majority of inference growth in 2026.
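The article doesn't describe NVFP4's internal format, but the bandwidth argument applies to any 4-bit scheme. A generic symmetric 4-bit quantizer, sketched below, shows why: weights shrink from 2 bytes (fp16) to half a byte each, so roughly 4x less data crosses the memory bus per weight read:

```python
import numpy as np

# Hedged sketch: generic symmetric 4-bit quantization. NVFP4's actual format
# (block scaling, exponent layout) is not described in this article; this only
# illustrates why 4-bit weights move ~4x less data than fp16.

def quantize_4bit(w: np.ndarray):
    """Map float weights to signed 4-bit integers in [-7, 7] with one scale."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()

# fp16 = 2 bytes/weight, 4-bit = 0.5 bytes/weight -> 4x less memory traffic,
# at the cost of a small reconstruction error bounded by half a scale step.
print(f"mean abs error: {err:.4f}  (weights span ±{np.abs(w).max():.2f})")
```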
3. Third-Generation Confidential Computing
The Vera Rubin NVL72 is the first rack-scale AI platform to deliver NVIDIA Confidential Computing simultaneously across the full CPU, GPU, and NVLink domain. Model weights, training data, and inference inputs are protected end-to-end — even on shared multi-tenant cloud infrastructure.
4. Second-Generation RAS Engine
The Reliability, Availability, and Serviceability engine spans GPU, CPU, and NVLink domain with real-time fault detection across every chip in the rack, automated fault tolerance, and proactive maintenance scheduling before failures cause downtime.
5. NVIDIA Inference Context Memory Storage Platform
Powered by BlueField-4 DPU with 150TB of NVMe storage, this platform introduces a dedicated AI-native KV-cache tier directly within the rack. Each GPU gains an additional 16TB of context memory, delivering:
- 5x higher tokens per second for long-context inference
- 5x better performance per TCO dollar vs software-based KV-cache management
- 5x better power efficiency for sustained long-context serving at scale
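A back-of-the-envelope KV-cache calculation shows why a dedicated tier matters. The model shape below is an assumed 70B-class configuration with grouped-query attention (80 layers, 8 KV heads, head dimension 128, fp16 cache), not a published spec:

```python
# Hedged sketch: why long-context serving needs a dedicated KV-cache tier.
# The model shape is an assumed 70B-class config with grouped-query
# attention, not a published NVIDIA or model-vendor spec.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of K+V cache for ONE sequence (leading 2 = keys and values)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

per_token = kv_cache_bytes(80, 8, 128, seq_len=1)          # 320 KiB/token at fp16
one_mtok  = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)  # one 1M-token session

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"1M-token context: {one_mtok / 2**30:.0f} GiB per sequence")
# A single million-token session exceeds a 288GB GPU's HBM before weights are
# even counted, which is why an NVMe-backed 16TB-per-GPU tier (the article's
# figure) changes what is servable.
```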
Who Is Deploying Vera Rubin First
| Provider | Deployment Notes |
|---|---|
| Microsoft Azure | NVL72 rack-scale systems at Fairwater AI superfactory sites, scaling to hundreds of thousands of Vera Rubin Superchips |
| Amazon Web Services (AWS) | Confirmed for Vera Rubin-based instances in H2 2026 |
| Google Cloud | Confirmed for Vera Rubin-based instances in H2 2026 |
| Oracle Cloud Infrastructure | Confirmed for Vera Rubin-based instances in H2 2026 |
| CoreWeave | Among the first to offer NVIDIA Rubin via CoreWeave Mission Control |
| Lambda | Confirmed NVIDIA Cloud Partner for Rubin deployments |
| Nebius | Confirmed; NVIDIA made a $2 billion investment (8.3% stake) |
| Nscale | Confirmed NVIDIA Cloud Partner for Rubin deployments |
What to Expect at GTC 2026 Beyond Vera Rubin
- Inference-focused chips with Groq technology — analysts indicate NVIDIA may announce a chip incorporating Groq's extreme low-latency inference IP, targeting agentic AI workflows where latency compounds across multi-agent chains
- NVIDIA Cosmos for physical AI and robotics — significant updates for autonomous vehicles and industrial robots, plus updates to the Alpamayo open reasoning model family
- Agentic AI infrastructure — software and networking strategy for the multi-agent orchestration era, including agent observability and memory management tooling
- Quantum Day — the first dedicated quantum computing day in GTC history, with hybrid classical-quantum computing roadmap details
- Open model ecosystem — NVIDIA's evolving role beyond hardware and expansion of the open model strategy
Why GTC 2026 and Vera Rubin Matter to Developers
- Token costs are about to fall significantly. The 10x reduction in token generation cost will flow through to cloud provider pricing in the 12–18 months following H2 2026 deployments. Workloads that are currently cost-constrained will become substantially more economical.
- Long-context inference becomes a solved infrastructure problem. The dedicated KV-cache tier via BlueField-4 addresses one of the hardest challenges in production LLM serving at scale.
- Agentic AI workloads have a purpose-built platform. The combination of the Inference Context Memory Storage Platform, updated Transformer Engine, and NVLink 6 creates an architecture specifically suited for multi-step, memory-intensive agentic systems.
- Supply constraints on Blackwell will ease. Rubin's 10x factory throughput improvement means significantly more compute capacity per unit of manufacturing time, easing the GPU supply crunch.
- Confidential computing at rack scale changes enterprise AI security. Hardware-level isolation across the full CPU, GPU, and NVLink domain removes the primary security objection for enterprises running proprietary models on shared cloud infrastructure.
"Computing has been fundamentally reshaped as a result of accelerated computing and artificial intelligence. Some ten trillion dollars or so of the last decade of computing is now being modernized to this new way of doing computing."
— Jensen Huang, NVIDIA Founder and CEO, CES 2026 Keynote
Who Was Vera Rubin: The Scientist Behind the Chip Name
Vera Florence Cooper Rubin was born in Philadelphia in 1928 and became one of the most influential observational astronomers of the twentieth century. Working at the Carnegie Institution of Washington, she spent years measuring the rotation curves of spiral galaxies. Classical physics predicted that stars at the outer edges of galaxies should orbit more slowly than stars near the center — Rubin found the opposite.
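That classical prediction is simple enough to compute directly. The sketch below uses an illustrative visible mass of about 10^11 solar masses; the exact numbers don't matter, only the 1/sqrt(r) falloff that Rubin's flat measured curves contradicted:

```python
import math

# Hedged sketch of the classical (Keplerian) prediction Rubin tested: if a
# galaxy's mass sat where its light is, orbital speed outside that mass
# should fall off as v ∝ 1/sqrt(r). The mass value is illustrative.

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
M = 2e41         # assumed visible mass, ~1e11 solar masses, in kg
KPC = 3.086e19   # one kiloparsec in metres

def keplerian_v(r_kpc: float) -> float:
    """Predicted circular orbital speed (km/s) at radius r for central mass M."""
    return math.sqrt(G * M / (r_kpc * KPC)) / 1000

for r in (5, 10, 20, 40):
    print(f"r = {r:>2} kpc  predicted v = {keplerian_v(r):6.1f} km/s")
# The prediction halves each time r quadruples; Rubin's measured curves
# stayed flat instead -- the signature of a massive dark-matter halo.
```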
Her conclusion, confirmed across dozens of galaxies over decades of observation: galaxies must be embedded in vast halos of invisible mass now called dark matter, which contributes far more to a galaxy's gravitational field than all of its visible stars combined. Dark matter is now understood to constitute approximately 27% of the total mass-energy content of the universe.
Rubin passed away in December 2016 and did not receive the Nobel Prize during her lifetime despite repeated nominations — widely regarded as one of the Nobel committee's most significant oversights. NVIDIA naming its most powerful AI platform after her is a recognition of a scientist whose foundational discoveries changed how humanity understands the cosmos.
Final Thoughts: The Inference Era Begins at GTC 2026
The shift from AI's training era to its inference era is the defining transition happening in 2026. Vera Rubin is NVIDIA's answer to the inference era, and GTC 2026 is where that answer becomes a deployment reality.
A 10x reduction in token generation cost changes the economics of every AI application currently being built. A 5x inference performance leap means the same budget buys 5x more capacity. A 4x reduction in GPUs required for training cuts the barrier to frontier model research. And hardware-level confidential computing at rack scale removes the last major security objection to running proprietary models on shared cloud infrastructure.
Jensen Huang takes the stage at SAP Center on Monday, March 16. The keynote streams live at nvidia.com starting at 11 a.m. PT. For anyone building in AI, working in data centers, or watching the semiconductor industry, it is worth the two hours. The Vera Rubin era has begun.
Frequently Asked Questions
When is NVIDIA GTC 2026 and how can I watch the keynote?
NVIDIA GTC 2026 runs from March 16 to March 19, 2026, in San Jose, California. Jensen Huang's keynote takes place at SAP Center on Monday, March 16, at 11 a.m. PT. The keynote will be livestreamed for free at nvidia.com with no registration required.
What is the NVIDIA Vera Rubin platform?
The NVIDIA Vera Rubin platform is the successor to Blackwell and NVIDIA's first extreme-codesigned six-chip AI platform. It pairs the Vera CPU (88 Olympus ARM cores, 128GB GDDR7) with the Rubin GPU (288GB HBM4), delivering up to 5x inference performance and 3.5x training performance over Blackwell at 10x lower token generation cost.
When will Vera Rubin be available?
The NVIDIA Vera Rubin platform is in full production as of January 2026. Rubin-based products will be available from AWS, Microsoft Azure, Google Cloud, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale in the second half of 2026.
How much faster is Vera Rubin than Blackwell?
Vera Rubin delivers up to 5x the inference performance and 3.5x the training performance of Blackwell, while increasing the transistor count by only 1.6x. Token generation cost is approximately 10x lower, and the NVL72 rack requires up to 4x fewer GPUs to train equivalent MoE models.
What is Jensen Huang expected to announce at GTC 2026?
Jensen Huang is expected to announce updates across the full AI stack at GTC 2026, including final Vera Rubin platform details, inference-focused chips potentially incorporating Groq technology, updates on physical AI and robotics through the NVIDIA Cosmos platform, agentic AI infrastructure announcements, and NVIDIA's broader strategy for the inference era.
Who was Vera Rubin?
Vera Florence Cooper Rubin was a pioneering American astronomer whose observations provided some of the most compelling evidence for the existence of dark matter. NVIDIA named its next-generation AI computing platform after her as part of its tradition of honoring scientists who made foundational contributions to human knowledge.
Written by
Gaurav Garg
Full Stack & AI Developer · Building scalable systems