Key Takeaways
- Vera Rubin delivers 5x inference and 3.5x training performance over Blackwell
- Token generation cost is approximately 10x lower than Blackwell
- 288GB of HBM4 memory per GPU — 50% more than Blackwell's HBM3e
- The NVL72 rack requires 4x fewer GPUs to train equivalent MoE models
- Dedicated KV-cache tier via BlueField-4 DPU delivers 5x better long-context inference performance
- Vera Rubin in full production since January 2026 — cloud instances arriving H2 2026 from AWS, Azure, Google Cloud, OCI, CoreWeave, Lambda, Nebius, and Nscale
What Is NVIDIA GTC and Why Does It Matter in 2026
GTC stands for GPU Technology Conference. It began as a focused gathering for GPU computing researchers and has evolved into the world's most significant AI and accelerated computing summit. In 2026, the event draws more than 30,000 attendees from over 190 countries, spanning developers, researchers, enterprise leaders, cloud architects, and investors, all there to understand where AI infrastructure is heading next.
Jensen Huang described GTC 2026 with uncommon directness: "GTC is the epicenter of the AI industrial era. AI is no longer a single breakthrough or application. It is essential infrastructure. Every company will use it. Every nation will build it."
The event covers five layers of what NVIDIA calls the AI stack: energy, chips, infrastructure, models, and applications. GTC 2026 includes:
- More than 1,000 sessions across AI, accelerated computing, robotics, and quantum
- 60 hands-on labs for developers to work directly with NVIDIA tools and platforms
- Nine full-day developer workshops covering the full AI stack
- A dedicated Quantum Day, the first in GTC history
NVIDIA GTC 2026 Keynote: Date, Time, and How to Watch
| Detail | Information |
|---|---|
| Keynote Date | Monday, March 16, 2026 |
| Keynote Time | 11:00 a.m. PT (2:00 p.m. ET / 6:00 p.m. GMT) |
| Venue | SAP Center, San Jose, California |
| Pregame Show | 8:00 a.m. PT on March 16 (online only) |
| Livestream | Free at nvidia.com, no registration required |
| Investor Q&A | Tuesday, March 17 at 9:00 a.m. PT |
| Conference Duration | March 16 to March 19, 2026 |
The pregame show features industry CEOs including Aravind Srinivas (Perplexity), Harrison Chase (LangChain), Arthur Mensch (Mistral AI), and others ahead of the main keynote.
What Is the NVIDIA Vera Rubin Platform
The Vera Rubin platform is the successor to NVIDIA's Blackwell architecture and the most ambitious chip design the company has ever shipped. It is named after Vera Florence Cooper Rubin, the pioneering American astronomer whose observations in the 1970s and 1980s provided some of the most compelling evidence for dark matter, fundamentally transforming humanity's understanding of the structure of the universe.
Jensen Huang first unveiled the Rubin architecture at GTC 2025. At CES 2026 in January, Huang confirmed that the Vera Rubin platform had entered full production. GTC 2026 is expected to provide the final deployment details, partner announcements, and software ecosystem updates that complete the picture.
Rubin is NVIDIA's first extreme-codesigned platform — six chips designed simultaneously from the ground up:
- Vera CPU — 88 Olympus custom ARM cores, 128GB GDDR7, 227 billion transistors
- Rubin GPU — 288GB HBM4, 50 PFLOPS inference, 35 PFLOPS training
- NVLink 6 Switch — high-speed chip-to-chip interconnect
- ConnectX-9 SuperNIC — next-generation networking interface
- BlueField-4 DPU — powers the KV-cache storage platform
- Spectrum-6 Ethernet Switch — silicon photonics-based networking
Vera Rubin NVL72: Full Specifications
| Component | Specification |
|---|---|
| Vera CPU cores | 88 Olympus custom ARM cores |
| Vera CPU memory | 128GB GDDR7 |
| Vera CPU transistors | 227 billion |
| Rubin GPU memory per GPU | 288GB HBM4 |
| GPU inference performance | 50 PFLOPS (NVFP4) |
| GPU training performance | 35 PFLOPS (NVFP4) |
| Inference vs Blackwell | Up to 5x improvement |
| Training vs Blackwell | 3.5x improvement |
| Token generation cost vs Blackwell | ~10x lower |
| GPUs for equivalent MoE training | 4x fewer than Blackwell |
| Rack memory (LPDDR5X) | Up to 54TB |
| Total transistors in rack | 220 trillion |
| Cooling | 100% liquid cooled, fanless and tubeless |
| Rack assembly time | 5 minutes (vs about 2 hours for Blackwell, roughly 24x faster) |
| Networking bandwidth | Up to 1.6 Tb/s via Quantum-CX9 InfiniBand |
Vera Rubin vs Blackwell: The Full Comparison
| Metric | Blackwell | Vera Rubin | Improvement |
|---|---|---|---|
| Inference performance (NVFP4) | ~10 PFLOPS | 50 PFLOPS | Up to 5x |
| Training performance (NVFP4) | ~10 PFLOPS | 35 PFLOPS | 3.5x |
| GPU memory | 192GB HBM3e | 288GB HBM4 | 50% more, faster bandwidth |
| Token generation cost | Baseline | ~10x lower | 10x reduction |
| GPUs to train MoE model | Baseline | 4x fewer | 4x efficiency gain |
| Rack assembly time | ~2 hours | ~5 minutes | ~24x faster |
| Cooling | Air and liquid hybrid | 100% liquid, fanless | Simpler, more efficient |
| Factory throughput | Baseline | 10x higher | 10x supply scaling |
| Networking | Spectrum-X Ethernet | Spectrum-6 Photonics | 5x power efficiency |
The 10x reduction in token generation cost is arguably more significant for the industry than the raw performance gains. The economics of inference are the primary constraint on AI adoption at scale. Every 10x reduction in token cost opens up workloads that were previously not economically viable: longer reasoning chains, larger context windows, higher request volumes, and applications that require multiple model calls per user interaction.
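The arithmetic behind that claim is easy to sketch. The Python snippet below uses illustrative prices and token counts (none of these figures come from NVIDIA or any cloud provider) to show how a 10x token-cost reduction compounds in an agentic workload with multiple model calls per interaction:

```python
# Hedged sketch: how a 10x drop in per-token price changes workload economics.
# All prices and token counts below are illustrative assumptions, not NVIDIA
# or cloud-provider figures.

def interaction_cost(calls_per_interaction: int,
                     tokens_per_call: int,
                     price_per_million_tokens: float) -> float:
    """Cost (USD) of one user interaction that fans out into several model calls."""
    total_tokens = calls_per_interaction * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_tokens

# Assumed agentic workload: 8 model calls per interaction (planner, tool
# calls, critic, summarizer...), roughly 4k tokens each.
baseline = interaction_cost(8, 4_000, price_per_million_tokens=10.00)
rubin_era = interaction_cost(8, 4_000, price_per_million_tokens=1.00)  # 10x cheaper

print(f"baseline:  ${baseline:.3f} per interaction")
print(f"10x lower: ${rubin_era:.3f} per interaction")
# At 1M interactions/day the same workload drops from $3,200/day to $320/day;
# multiply the call count or context size and the gap decides viability.
```

The point is not the specific prices: whatever the baseline, a 10x cut moves whole classes of multi-call applications across the break-even line.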
Five New Innovations Inside the Vera Rubin Platform
1. Next-Generation NVLink 6 Interconnect
NVLink 6 advances over the fifth-generation NVLink used in Blackwell with significantly higher per-link bandwidth, enabling tighter coupling across all 72 GPUs in the NVL72 rack and reducing the communication overhead that limits scaling efficiency in large training and inference runs.
2. Next-Generation Transformer Engine
Improved support for NVFP4 precision formats enables higher throughput with lower memory bandwidth requirements, specifically tuned for the agentic and reasoning workloads driving the majority of inference growth in 2026.
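The article doesn't describe NVFP4's internal format, but the bandwidth argument applies to any 4-bit scheme. A generic symmetric 4-bit quantizer, sketched below, shows why: weights shrink from 2 bytes (fp16) to half a byte each, so roughly 4x less data crosses the memory bus per weight read:

```python
import numpy as np

# Hedged sketch: generic symmetric 4-bit quantization. NVFP4's actual format
# (block scaling, exponent layout) is not described in this article; this only
# illustrates why 4-bit weights move ~4x less data than fp16.

def quantize_4bit(w: np.ndarray):
    """Map float weights to signed 4-bit integers in [-7, 7] with one scale."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()

# fp16 = 2 bytes/weight, 4-bit = 0.5 bytes/weight -> 4x less memory traffic,
# at the cost of a small reconstruction error bounded by half a scale step.
print(f"mean abs error: {err:.4f}  (weights span ±{np.abs(w).max():.2f})")
```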
3. Third-Generation Confidential Computing
The Vera Rubin NVL72 is the first rack-scale AI platform to deliver NVIDIA Confidential Computing simultaneously across the full CPU, GPU, and NVLink domain. Model weights, training data, and inference inputs are protected end-to-end — even on shared multi-tenant cloud infrastructure.
4. Second-Generation RAS Engine
The Reliability, Availability, and Serviceability engine spans GPU, CPU, and NVLink domain with real-time fault detection across every chip in the rack, automated fault tolerance, and proactive maintenance scheduling before failures cause downtime.
5. NVIDIA Inference Context Memory Storage Platform
Powered by BlueField-4 DPU with 150TB of NVMe storage, this platform introduces a dedicated AI-native KV-cache tier directly within the rack. Each GPU gains an additional 16TB of context memory, delivering:
- 5x higher tokens per second for long-context inference
- 5x better performance per TCO dollar vs software-based KV-cache management
- 5x better power efficiency for sustained long-context serving at scale
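A back-of-the-envelope KV-cache calculation shows why a dedicated tier matters. The model shape below is an assumed 70B-class configuration with grouped-query attention (80 layers, 8 KV heads, head dimension 128, fp16 cache), not a published spec:

```python
# Hedged sketch: why long-context serving needs a dedicated KV-cache tier.
# The model shape is an assumed 70B-class config with grouped-query
# attention, not a published NVIDIA or model-vendor spec.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of K+V cache for ONE sequence (leading 2 = keys and values)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

per_token = kv_cache_bytes(80, 8, 128, seq_len=1)          # 320 KiB/token at fp16
one_mtok  = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)  # one 1M-token session

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"1M-token context: {one_mtok / 2**30:.0f} GiB per sequence")
# A single million-token session exceeds a 288GB GPU's HBM before weights are
# even counted, which is why an NVMe-backed 16TB-per-GPU tier (the article's
# figure) changes what is servable.
```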
Who Is Deploying Vera Rubin First
| Provider | Deployment Notes |
|---|---|
| Microsoft Azure | NVL72 rack-scale systems at Fairwater AI superfactory sites, scaling to hundreds of thousands of Vera Rubin Superchips |
| Amazon Web Services (AWS) | Confirmed for Vera Rubin-based instances in H2 2026 |
| Google Cloud | Confirmed for Vera Rubin-based instances in H2 2026 |
| Oracle Cloud Infrastructure | Confirmed for Vera Rubin-based instances in H2 2026 |
| CoreWeave | Among the first to offer NVIDIA Rubin via CoreWeave Mission Control |
| Lambda | Confirmed NVIDIA Cloud Partner for Rubin deployments |
| Nebius | Confirmed; NVIDIA made a $2 billion investment (8.3% stake) |
| Nscale | Confirmed NVIDIA Cloud Partner for Rubin deployments |
What to Expect at GTC 2026 Beyond Vera Rubin
- Inference-focused chips with Groq technology — analysts indicate NVIDIA may announce a chip incorporating Groq's extreme low-latency inference IP, targeting agentic AI workflows where latency compounds across multi-agent chains
- NVIDIA Cosmos for physical AI and robotics — significant updates for autonomous vehicles and industrial robots, plus updates to the Alpamayo open reasoning model family
- Agentic AI infrastructure — software and networking strategy for the multi-agent orchestration era, including agent observability and memory management tooling
- Quantum Day — the first dedicated quantum computing day in GTC history, with hybrid classical-quantum computing roadmap details
- Open model ecosystem — NVIDIA's evolving role beyond hardware and expansion of the open model strategy
Why GTC 2026 and Vera Rubin Matter to Developers
- Token costs are about to fall significantly. The 10x reduction in token generation cost will flow through to cloud provider pricing in the 12–18 months following H2 2026 deployments. Workloads that are currently cost-constrained will become substantially more economical.
- Long-context inference becomes a solved infrastructure problem. The dedicated KV-cache tier via BlueField-4 addresses one of the hardest challenges in production LLM serving at scale.
- Agentic AI workloads have a purpose-built platform. The combination of the Inference Context Memory Storage Platform, updated Transformer Engine, and NVLink 6 creates an architecture specifically suited for multi-step, memory-intensive agentic systems.
- Supply constraints on Blackwell will ease. Rubin's 10x factory throughput improvement means significantly more compute capacity per unit of manufacturing time, easing the GPU supply crunch.
- Confidential computing at rack scale changes enterprise AI security. Hardware-level isolation across the full CPU, GPU, and NVLink domain removes the primary security objection for enterprises running proprietary models on shared cloud infrastructure.
"Computing has been fundamentally reshaped as a result of accelerated computing and artificial intelligence. Some ten trillion dollars or so of the last decade of computing is now being modernized to this new way of doing computing."
— Jensen Huang, NVIDIA Founder and CEO, CES 2026 Keynote
Who Was Vera Rubin: The Scientist Behind the Chip Name
Vera Florence Cooper Rubin was born in Philadelphia in 1928 and became one of the most influential observational astronomers of the twentieth century. Working at the Carnegie Institution of Washington, she spent years measuring the rotation curves of spiral galaxies. Classical physics predicted that stars at the outer edges of galaxies should orbit more slowly than stars near the center — Rubin found the opposite.
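That classical prediction is simple enough to compute directly. The sketch below uses an illustrative visible mass of about 10^11 solar masses; the exact numbers don't matter, only the 1/sqrt(r) falloff that Rubin's flat measured curves contradicted:

```python
import math

# Hedged sketch of the classical (Keplerian) prediction Rubin tested: if a
# galaxy's mass sat where its light is, orbital speed outside that mass
# should fall off as v ∝ 1/sqrt(r). The mass value is illustrative.

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
M = 2e41         # assumed visible mass, ~1e11 solar masses, in kg
KPC = 3.086e19   # one kiloparsec in metres

def keplerian_v(r_kpc: float) -> float:
    """Predicted circular orbital speed (km/s) at radius r for central mass M."""
    return math.sqrt(G * M / (r_kpc * KPC)) / 1000

for r in (5, 10, 20, 40):
    print(f"r = {r:>2} kpc  predicted v = {keplerian_v(r):6.1f} km/s")
# The prediction halves each time r quadruples; Rubin's measured curves
# stayed flat instead -- the signature of a massive dark-matter halo.
```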
Her conclusion, confirmed across dozens of galaxies over decades of observation: galaxies must be embedded in vast halos of invisible mass now called dark matter, which contributes far more to a galaxy's gravitational field than all of its visible stars combined. Dark matter is now understood to constitute approximately 27% of the total mass-energy content of the universe.
Rubin passed away in December 2016 and did not receive the Nobel Prize during her lifetime despite repeated nominations — widely regarded as one of the Nobel committee's most significant oversights. NVIDIA naming its most powerful AI platform after her is a recognition of a scientist whose foundational discoveries changed how humanity understands the cosmos.
Final Thoughts: The Inference Era Begins at GTC 2026
The shift from AI's training era to its inference era is the defining transition happening in 2026. Vera Rubin is NVIDIA's answer to the inference era, and GTC 2026 is where that answer becomes a deployment reality.
A 10x reduction in token generation cost changes the economics of every AI application currently being built. A 5x inference performance leap means the same budget buys 5x more capacity. A 4x reduction in GPUs required for training cuts the barrier to frontier model research. And hardware-level confidential computing at rack scale removes the last major security objection to running proprietary models on shared cloud infrastructure.
Jensen Huang takes the stage at SAP Center on Monday, March 16. The keynote streams live at nvidia.com starting at 11 a.m. PT. For anyone building in AI, working in data centers, or watching the semiconductor industry, it is worth the two hours. The Vera Rubin era has begun.
Frequently Asked Questions
When is NVIDIA GTC 2026 and how can I watch the keynote?
NVIDIA GTC 2026 runs from March 16 to March 19, 2026, in San Jose, California. Jensen Huang's keynote takes place at SAP Center on Monday, March 16, at 11 a.m. PT. The keynote will be livestreamed for free at nvidia.com with no registration required.
What is the NVIDIA Vera Rubin platform?
The NVIDIA Vera Rubin platform is the successor to Blackwell and NVIDIA's first extreme-codesigned six-chip AI platform. It pairs the Vera CPU (88 Olympus ARM cores, 128GB GDDR7) with the Rubin GPU (288GB HBM4), delivering up to 5x inference performance and 3.5x training performance over Blackwell at 10x lower token generation cost.
When will Vera Rubin be available?
The NVIDIA Vera Rubin platform is in full production as of January 2026. Rubin-based products will be available from AWS, Microsoft Azure, Google Cloud, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale in the second half of 2026.
How much faster is Vera Rubin than Blackwell?
Vera Rubin delivers up to 5x the inference performance and 3.5x the training performance of Blackwell, while increasing the transistor count by only 1.6x. Token generation cost is approximately 10x lower, and the NVL72 rack requires up to 4x fewer GPUs to train equivalent MoE models.
What is Jensen Huang expected to announce at GTC 2026?
Jensen Huang is expected to announce updates across the full AI stack at GTC 2026, including final Vera Rubin platform details, inference-focused chips potentially incorporating Groq technology, updates on physical AI and robotics through the NVIDIA Cosmos platform, agentic AI infrastructure announcements, and NVIDIA's broader strategy for the inference era.
Who was Vera Rubin?
Vera Florence Cooper Rubin was a pioneering American astronomer whose observations provided some of the most compelling evidence for the existence of dark matter. NVIDIA named its next-generation AI computing platform after her as part of its tradition of honoring scientists who made foundational contributions to human knowledge.
Written by
Gaurav Garg
Full Stack & AI Developer · Building scalable systems