Allen Elzayn

January 5, 2026 · 19 min read

Nvidia's $20B Groq Deal: The Inference Wars Just Got Real

I watched Nvidia pay $20 billion for a company whose entire pitch is "we're faster than Nvidia."

That's not a typo. On December 24, 2025, Nvidia announced a non-exclusive licensing agreement with Groq, the startup that built chips specifically designed to beat GPUs at AI inference. The deal values Groq at 2.9× its last funding round, and it's structured in a way that screams "we're trying to avoid antitrust scrutiny."

But here's what really caught my attention: Groq's LPU (Language Processing Unit) delivers 240+ tokens per second on Llama-2 70B. That's 5-10× faster than Nvidia's own H100 GPUs for inference workloads.

So why would Nvidia pay $20 billion for technology that makes their core product look slow?

The $20 Billion Question: What Did Nvidia Actually Buy?

Let me break down what this deal actually looks like.

Deal Structure:

  • Amount: $20 billion cash
  • Type: Non-exclusive licensing agreement + acqui-hire
  • Date: December 24, 2025
  • Groq Status: Remains independent company

Key People Moving to Nvidia:

  • Jonathan Ross (Founder/CEO) - the guy who created Google's TPU
  • Sunny Madra (President)
  • Core engineering team

What Stays at Groq:

  • Simon Edwards becomes new CEO
  • GroqCloud continues operating
  • Existing customer contracts honored

The Kicker: This isn't an acquisition. Nvidia didn't buy Groq; they licensed the technology and hired the team. Groq still exists as an independent company.

Think about that: Nvidia paid $20 billion for a license, not ownership.

The Valuation Math That Made Me Do a Double-Take

Let's talk numbers, because this is where it gets interesting.

Groq's Funding History:

| Round | Date | Amount | Valuation | Lead Investors |
|-------|------|--------|-----------|----------------|
| Series A | 2017 | $10M | ~$50M | Social Capital |
| Series B | 2018 | $52M | ~$200M | Tiger Global |
| Series C | 2021 | $300M | $1B | Tiger Global, D1 Capital |
| Series D | Aug 2024 | $640M | $2.8B | BlackRock, Neuberger Berman |
| Series E | Sep 2025 | $750M | $6.9B | Disruptive, BlackRock, Samsung, Cisco |

Total Raised: ~$1.75 billion

Deal Price: $20 billion

Multiple on Last Round: 2.9×

Multiple on Total Investment: 11.4×

For context, that's:

  • More than half of what AMD paid for Xilinx ($35B, and that was a full acquisition)
  • Equivalent to buying Discord twice
  • Roughly a third of Nvidia's Q3 FY2026 quarterly revenue ($57B)

But here's the real question: what's the ROI math?

The Technical Deep-Dive: Why LPUs Are Different

This is where it gets fascinating.

Groq didn't just build a faster chip; they built a fundamentally different architecture. And understanding this is key to understanding why Nvidia paid $20 billion.

The Inference Split: Prefill vs Decode

Here's the insight that explains everything. Inference isn't one workload; it's two:

Prefill Phase:

  • The "prompt" stage where the model ingests data
  • Could be 100,000 lines of code or an hour of video
  • Compute-bound: Requires massive matrix multiplication
  • GPUs are excellent at this

Decode Phase:

  • Token-by-token generation
  • Each word feeds back to predict the next
  • Memory-bandwidth bound: Data must move from memory to processor fast
  • GPUs struggle here; this is where Groq shines

As Gavin Baker (Groq investor) summarized: "Inference is disaggregating into prefill and decode."
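To make the split concrete, here's a toy numpy sketch (mine, not any vendor's code) of why the two phases stress hardware differently: prefill amortizes every weight read across the whole prompt, while decode streams the full weight matrix from memory for every single token.

```python
# Toy illustration of the prefill/decode split -- not a real serving stack.
import numpy as np

d_model = 4096                                   # hidden size (illustrative)
W = np.random.randn(d_model, d_model).astype(np.float32)

def prefill(prompt: np.ndarray) -> np.ndarray:
    # (seq_len x d) @ (d x d): each weight read is amortized over
    # seq_len tokens, so arithmetic dominates -> compute-bound.
    return prompt @ W

def decode_step(token: np.ndarray) -> np.ndarray:
    # (1 x d) @ (d x d): the whole weight matrix streams from memory
    # to produce ONE token -> memory-bandwidth-bound.
    return token @ W

hidden = prefill(np.random.randn(2048, d_model).astype(np.float32))
token = hidden[-1:]
for _ in range(16):          # sequential by nature: can't parallelize away
    token = decode_step(token)
```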

GPU vs LPU: The Architecture War

Traditional GPU (Nvidia H100):

Memory: HBM3e (High Bandwidth Memory)
Bandwidth: 3.35 TB/s
Architecture: Parallel processing, batch-optimized
Strength: Training + Prefill
Weakness: Decode (sequential token generation)

Groq LPU:

Memory: SRAM (Static RAM, on-chip)
Bandwidth: 80 TB/s internal
Architecture: Deterministic, single-stream
Strength: Decode (ultra-fast token generation)
Weakness: Large model training, massive context

The SRAM Advantage: Michael Stewart (Microsoft M12) explains it simply: "The energy to move a bit in SRAM is like 0.1 picojoules. To move it between DRAM and the processor is 20 to 100 times worse."

SRAM is etched directly into the processor, with no external memory shuttling. For token generation, this is game-changing.

The Trade-off: SRAM is expensive and limited in capacity. Groq's sweet spot is models of 8 billion parameters and below: edge inference, robotics, voice, IoT devices. Not the trillion-parameter frontier models.

But that's not a small market. It's a giant segment Nvidia wasn't serving.

The Numbers That Matter

Llama-2 70B Performance:

| Metric | Nvidia H100 | Groq LPU | Difference |
|--------|-------------|----------|------------|
| Tokens/second | 30-50 | 241-300 | 5-10× faster |
| Latency (first token) | 200-500ms | <100ms | 2-5× faster |
| Power efficiency | ~700W | ~300W | 2.3× better |
| Cost per 1M tokens | $0.60-1.00 | $0.10-0.20 | 5× cheaper |

Source: Groq benchmarks, Artificial Analysis, Tom's Hardware (2025)

Why This Matters: In 2025, AI inference revenue surpassed training revenue for the first time. The market is shifting from "build the model" to "serve the model."

And Nvidia's GPUs, while dominant in training, aren't optimized for inference.

The 80 TB/s Bandwidth Advantage

Here's the bottleneck underneath all of those benchmark numbers.

The Bottleneck Problem: Large language models are "memory-bound" during inference. The chip spends most of its time waiting for data, not computing.

GPU Solution: Use HBM (High Bandwidth Memory) to increase data transfer speeds. H100 achieves 3.35 TB/s.

LPU Solution: Put everything on-chip using SRAM. Groq achieves 80 TB/s internal bandwidth, roughly 24× the H100's memory bandwidth.

The Trade-off: SRAM is expensive and limited in capacity. Groq's chips have less total memory than GPUs. But for inference workloads where the model fits, it's dramatically faster.
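You can sanity-check the table above with a back-of-envelope roofline estimate. The sketch below is mine and takes the quoted aggregate bandwidth figures at face value; it assumes batch size 1, FP16 weights (2 bytes per parameter), and that every parameter is read once per generated token, ignoring KV-cache traffic and the fact that Groq actually shards a 70B model across many chips.

```python
# Back-of-envelope: memory bandwidth caps single-stream decode throughput.

def max_tokens_per_sec(params_b: float, bandwidth_tb_s: float) -> float:
    bytes_per_token = params_b * 1e9 * 2        # FP16: 2 bytes/parameter
    return bandwidth_tb_s * 1e12 / bytes_per_token

print(max_tokens_per_sec(70, 3.35))   # H100 HBM3e: ~24 tokens/s ceiling
print(max_tokens_per_sec(70, 80.0))   # LPU SRAM:  ~571 tokens/s ceiling
```

Real systems batch requests, which amortizes weight reads across many users (that's how the H100 lands above its single-stream ceiling in the table). The point is the 24× gap in the ceiling itself.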

Real-World Impact:

  • ChatGPT response time: 2-5 seconds (GPU)
  • Groq-powered response: <1 second (LPU)

That's not just faster; it's a different user experience.

The Antitrust Angle: Why This Deal Is Structured So Weirdly

Let me be direct: this deal is structured to avoid regulatory scrutiny.

The "Non-Exclusive License" Fiction

Bernstein analyst Stacy Rasgon told CNBC:

"Structuring the deal as a non-exclusive license may keep the fiction of competition alive."

What This Means:

  • Nvidia doesn't "own" Groq
  • Groq can still license to others (theoretically)
  • FTC/DOJ can't block it as easily as an acquisition

The Reality:

  • Nvidia hired the entire founding team
  • Nvidia has exclusive access to key IP
  • Groq without Jonathan Ross is like Apple without Steve Jobs

The Anthropic Factor: Why Nvidia Was Nervous

Here's what most people missed: Anthropic broke Nvidia's moat.

Anthropic pioneered a portable engineering approach: a software layer that allows Claude models to run across multiple AI accelerators, including Nvidia GPUs AND Google TPUs. They recently committed to accessing 1 million TPUs from Google, over a gigawatt of compute capacity.

Val Bercovici (Weka's Chief AI Officer):

"The fact that Anthropic was able to build up a software stack that could work on TPUs as well as on GPUs, I don't think that's being appreciated enough in the marketplace."

The Threat: If companies can easily run inference on Google TPUs instead of Nvidia GPUs, Nvidia's CUDA lock-in weakens. The Groq deal ensures the most performance-sensitive workloads stay within Nvidia's ecosystem.

The Microsoft/Google Playbook

This isn't new. We've seen this pattern before:

Microsoft + OpenAI:

  • $13B investment (not acquisition)
  • Exclusive cloud partnership
  • OpenAI remains "independent"

Google + Anthropic:

  • $2B investment
  • Cloud partnership
  • Anthropic remains "independent"

Nvidia + Groq:

  • $20B license (not acquisition)
  • Key team moves to Nvidia
  • Groq remains "independent"

The Pattern: Big Tech has learned that outright acquisitions trigger antitrust review. But "partnerships," "investments," and "licenses" fly under the radar.

The FTC's Dilemma

The FTC spent the Lina Khan years being aggressive on tech antitrust. But this deal is hard to challenge:

  1. No ownership transfer - Groq still exists
  2. Non-exclusive license - Others can license too (in theory)
  3. Talent is free to move - Can't block people from changing jobs

My Take: This is regulatory arbitrage. Nvidia gets 90% of the benefit of an acquisition with 10% of the antitrust risk.

The Jonathan Ross Factor: Why One Engineer Is Worth Billions

You can't understand this deal without understanding Jonathan Ross.

The TPU Origin Story

In 2011, Ross was a Google engineer who started the TPU as a "20% side project." By 2013, he was designing and implementing the core elements of the chip that would eventually power all of Google's AI infrastructure.

The Result: Google's Tensor Processing Unit (TPU), which:

  • Powers Google Search, YouTube, Gmail
  • Reduced inference costs by 10×
  • Gave Google a 3-5 year lead in AI infrastructure

Ross's Contribution: He originated the entire project. The guy who saw the problem before anyone else and designed the solution from scratch.

From Google to Groq

Ross left Google in 2016 and brought 8 of the original 10 TPU team members with him to found Groq. His thesis:

"TPUs are good, but they're still designed for Google's specific workloads. I can build something better for general inference."

The Groq Difference:

  • TPU: Optimized for Google's models and infrastructure
  • LPU: Optimized for any large language model inference

The Bet: Ross believed the AI market would shift from training to inference, and whoever owned the best inference chip would win.

He was right.

Why Nvidia Wanted Him

Nvidia's problem isn't GPUs-it's the future.

Current State:

  • Nvidia owns 80%+ of AI training market
  • GPUs are the standard for model development
  • Revenue: $200B+ annualized run rate

The Threat:

  • Inference is becoming bigger than training
  • GPUs aren't optimal for inference
  • Competitors (AMD, Intel, custom silicon) are catching up

The Solution: Hire the guy who invented the TPU and built the fastest inference chip in the world.

What Nvidia Gets:

  • LPU architecture knowledge
  • Inference optimization expertise
  • The engineer who sees problems 5 years before everyone else

The Price: $20 billion. Amortized over the roughly 15 years since the TPU side project began, that's more than $1 billion per year of Ross's career.

The Market Context: Why Inference Is the New Battleground

Let me show you why this deal makes strategic sense.

The Training vs Inference Shift

2023:

  • Training revenue: $15B
  • Inference revenue: $10B
  • Ratio: 1.5:1

2025:

  • Training revenue: $25B
  • Inference revenue: $30B
  • Ratio: 0.83:1

2028 (projected):

  • Training revenue: $40B
  • Inference revenue: $80B
  • Ratio: 0.5:1

Source: Gartner, IDC, Bank of America estimates (projections subject to market conditions)

The Insight: Training a model is a one-time cost. Inference is ongoing. As AI products scale to billions of users, inference costs dominate.

The Economics of Inference

ChatGPT Example:

  • Users: 200M+ weekly active
  • Queries per user: ~10/week
  • Total queries: 2B/week
  • Cost per query (GPU): $0.01-0.05
  • Weekly inference cost: $20-100M

Annual Inference Cost: $1-5 billion

Training Cost (GPT-4):

  • One-time: $100M-500M

The Math: Inference costs 10-50× more than training over a model's lifetime.
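Spelling that arithmetic out (every input here is an estimate from the list above, not reported data):

```python
# Rough ChatGPT-scale inference economics, using the estimates above.
weekly_queries = 200e6 * 10                      # 200M WAU x ~10 queries/week
for cost_per_query in (0.01, 0.05):              # assumed GPU cost per query
    weekly = weekly_queries * cost_per_query
    print(f"${weekly/1e6:.0f}M/week -> ${weekly*52/1e9:.1f}B/year")
# $20M/week -> $1.0B/year ... $100M/week -> $5.2B/year
```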

Nvidia's Inference Problem

Current Nvidia Inference Solutions:

  • H100: Optimized for training, okay for inference
  • L40S: Better for inference, but still GPU architecture
  • TensorRT: Software optimization (helps, but limited)

The Gap: Nvidia's best inference solution is still 5-10× slower than Groq's LPU for token generation.

The Risk: If inference becomes 80% of the market and Nvidia only has 50% share (vs 80%+ in training), that's a massive revenue hit.

The Solution: Buy the best inference technology before someone else does.

The Agentic Future: Why KV Cache Matters

Here's the timing that's not coincidental: Meta acquired agent pioneer Manus for over $2 billion on December 29, 2025, just days after the Nvidia-Groq deal.

The Statefulness Problem

If an AI agent can't remember what it did 10 steps ago, it's useless for real-world tasks. KV Cache (Key-Value Cache) is the "short-term memory" that LLMs build during inference.

Manus reported: For production-grade agents, the ratio of input tokens to output tokens can reach 100:1. For every word an agent says, it's "thinking" and "remembering" 100 others.

The Problem: If that cache gets evicted from memory, the agent loses its train of thought. The model must burn massive energy to recompute everything.

The Solution: Groq's SRAM acts as a "scratchpad" for these agents: near-instant retrieval of state. Combined with Nvidia's Dynamo framework, they're building an "inference operating system" that tiers state across SRAM, DRAM, HBM, and flash storage.
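To see why that state is so expensive to keep hot, here's a rough KV-cache sizing sketch. The dimensions approximate Llama-2 70B's published config (80 layers, grouped-query attention with 8 KV heads of dimension 128, FP16); treat the outputs as illustrative.

```python
# Rough KV-cache sizing: the "short-term memory" an agent has to keep hot.

def kv_cache_bytes(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # 2x for keys AND values, per layer, per KV head, per token (FP16).
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

for ctx in (4_096, 32_768, 100_000):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
# ~1.2 GiB at 4k, ~10 GiB at 32k, ~30.5 GiB at 100k of context
```

Evicting and recomputing tens of gigabytes of state per agent is exactly the cost the SRAM/DRAM/HBM/flash tiering is meant to avoid.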

The Cluster Is Now the Computer

Thomas Jorgensen (Supermicro):

"Compute is no longer the primary bottleneck for advanced clusters. Feeding data to GPUs is the bottleneck."

This is why Nvidia is pushing disaggregated inference. Specialized storage tiers feed data at memory-class performance, while "Groq-inside" silicon handles high-speed token generation.

The Competitive Landscape: Who's Building Inference Chips?

Nvidia isn't the only one who sees this opportunity.

The Inference Chip Race

| Company | Chip | Architecture | Status | Funding/Valuation |
|---------|------|--------------|--------|-------------------|
| Groq | LPU | SRAM-based | Nvidia deal | $20B (Nvidia deal) |
| Cerebras | WSE-3 | Wafer-scale | Production | $4B valuation |
| SambaNova | SN40L | Dataflow | Production | $5B valuation |
| Graphcore | Bow | IPU | Acquired by SoftBank | ~$500M |
| Tenstorrent | Wormhole | RISC-V | Development | $1B valuation |
| d-Matrix | Corsair | Digital in-memory | Development | $300M raised |

Gavin Baker's Prediction: The Groq deal will cause all other specialized AI chips to be canceled-except Google's TPU, Tesla's AI5, and AWS's Trainium.

The Pattern: Every major AI chip startup was focused on inference. Now Nvidia owns the best one.

Why Nvidia Had to Move

Scenario 1: Nvidia doesn't buy Groq

  • Groq partners with AMD or Intel
  • Cloud providers (AWS, Azure, GCP) adopt LPUs
  • Nvidia loses inference market share
  • Revenue impact: $10-20B annually by 2028

Scenario 2: Nvidia buys Groq

  • Nvidia owns best-in-class inference technology
  • Can integrate LPU architecture into future GPUs
  • Maintains market dominance across training AND inference
  • Cost: $20B one-time

The Math: $20B to protect $10-20B in annual revenue is a no-brainer.

The Financial Analysis: Does $20B Make Sense?

Let me run the numbers that matter.

Nvidia's Financial Position

Q3 FY2026 (ended October 2025):

  • Revenue: $57B (quarterly)
  • Cash & equivalents: $60.6B
  • Market cap: ~$4.5 trillion
  • Free cash flow: $22.09B (quarterly)

Source: Nvidia Q3 FY2026 earnings

The Context: $20B is:

  • 33% of Nvidia's cash reserves
  • Less than 1 quarter of free cash flow
  • 0.4% of market cap

Translation: This is a significant but manageable expense for Nvidia. They can fund it entirely from cash without debt.

The ROI Calculation

Assumption 1: Inference Market Growth

  • 2025 inference market: $30B
  • 2028 inference market: $80B (projected)
  • Nvidia current share: ~50%
  • Nvidia target share with Groq: 70%

Revenue Impact:

Without Groq (50% share): $40B (2028)
With Groq (70% share): $56B (2028)
Incremental revenue: $16B/year

Payback Period:

Deal cost: $20B
Annual incremental revenue: $16B
Gross margin (80%): $12.8B
Payback: 1.6 years
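As a script (projections, not actuals; the same caveats apply):

```python
# The payback arithmetic above, spelled out.
deal_cost = 20e9
market_2028 = 80e9                                  # projected inference TAM
incremental_revenue = market_2028 * (0.70 - 0.50)   # share gained: $16B/yr
gross_profit = incremental_revenue * 0.80           # 80% gross margin
print(f"payback: {deal_cost / gross_profit:.1f} years")   # payback: 1.6 years
```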

Assumption 2: Defensive Value

What if Groq went to AMD instead?

AMD + Groq Scenario:

  • AMD gains best inference technology
  • Cloud providers shift inference workloads to AMD
  • Nvidia loses 20% inference market share
  • Revenue loss: $6-10B annually

Defensive Value: Preventing $6-10B annual loss is worth $20B upfront.

The Valuation Sanity Check

Groq's Metrics (estimated):

  • 2025 revenue target: ~$500M
  • 2024 actual revenue: ~$90M
  • Growth rate: 5-6× YoY
  • Gross margin: 60-70%

Valuation Multiples:

  • Price/2025 Revenue Target: 40×
  • Price/2024 Revenue: 222×
  • Price/Last Round: 2.9×

Comparable Deals:

  • Nvidia bought Mellanox (2020): $6.9B at 7× revenue
  • AMD bought Xilinx (2022): $35B at 10× revenue
  • Intel bought Habana (2019): $2B at 20× revenue

The Verdict: At 40× forward revenue (and 200×+ on trailing revenue), this is expensive. But Nvidia isn't buying revenue; they're buying technology and talent.

The Integration Question: What Happens Next?

Here's where it gets speculative.

Scenario 1: The Vera Rubin Strategy

Nvidia has already announced the Vera Rubin chip family, architected specifically for the prefill/decode split:

Rubin CPX (Prefill):

  • Optimized for massive context windows (1M+ tokens)
  • Uses GDDR7 memory instead of expensive HBM
  • Cost-effective for ingesting large datasets

"Groq-inside" Silicon (Decode):

  • High-speed token generation
  • SRAM-based architecture
  • Integrated into Nvidia's inference roadmap

Timeline: 2026-2027

Impact: Nvidia builds a complete inference stack, prefill AND decode, within the CUDA ecosystem.

Scenario 2: Separate Product Lines

Nvidia keeps GPUs for training, launches LPU line for inference.

Potential Products:

  • "Nvidia Inference" chip family
  • Optimized for data center inference
  • Different pricing/positioning than GPUs

Timeline: 2026-2027

Impact: Nvidia owns both markets with specialized products.

Scenario 3: Software Integration

Nvidia uses Groq's architecture knowledge to improve software.

Potential Products:

  • Enhanced TensorRT for inference
  • Better compiler optimization
  • Improved memory management

Timeline: 2026

Impact: Existing GPUs get faster at inference through software.

My Prediction

I think we'll see all three, in this order:

  1. 2026: Software improvements (new TensorRT releases)
  2. 2027: Separate inference product line
  3. 2028: Hybrid architecture in next-gen GPUs

Nvidia paid $20B for a 3-year technology roadmap.

The Risks: What Could Go Wrong

Let me be honest about the downsides.

Risk 1: Integration Failure

The Problem: Groq's architecture is fundamentally different from GPUs. Integrating the two isn't trivial.

Historical Precedent:

  • Intel bought Nervana (2016) - product never shipped
  • Intel bought Habana (2019) - limited market impact
  • Google built TPU internally - took 5+ years to mature

The Risk: Nvidia spends $20B and the technology never makes it to products.

Probability: 20%

Risk 2: Talent Departure

The Problem: Jonathan Ross and team are joining Nvidia. But will they stay?

The Pattern: Acqui-hires often see key talent leave within 2-3 years:

  • Vesting schedules complete
  • Corporate culture clash
  • Founders want to start something new

The Risk: Ross leaves Nvidia in 2028, starts a new company, and Nvidia is left with IP but no vision.

Probability: 30%

Risk 3: Market Shift

The Problem: What if inference doesn't grow as expected?

Scenarios:

  • AI hype cools, inference demand plateaus
  • New architecture makes both GPUs and LPUs obsolete
  • Open-source models reduce inference costs dramatically

The Risk: Nvidia overpaid for a market that doesn't materialize.

Probability: 15%

Risk 4: Regulatory Intervention

The Problem: FTC could still challenge the deal, even with its creative structure.

Precedent:

  • FTC blocked Nvidia/Arm ($40B, 2022)
  • FTC challenged Microsoft/Activision ($69B, 2023)

The Risk: Deal gets unwound or restricted, Nvidia loses key benefits.

Probability: 10%

What I Learned: Three Key Insights

After digging through the data, three things stand out.

1. The General-Purpose GPU Era Is Ending

VentureBeat put it bluntly: "Nvidia just admitted the general-purpose GPU era is ending."

We're entering the age of disaggregated inference architecture: silicon split into specialized types for different workloads. In 2026, "GPU strategy" stops being a purchasing decision and becomes a routing decision.

The New Questions:

  • Prefill-heavy vs decode-heavy?
  • Long-context vs short-context?
  • Interactive vs batch?
  • Small-model vs large-model?
  • Edge vs data center?

Your architecture will follow those labels.
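What does a routing decision look like in practice? A minimal sketch, with every name invented here (this is not Nvidia's or Groq's API): classify each request along those axes, then dispatch prefill-heavy work to GPU-class pools and decode-heavy, latency-sensitive work to LPU-class pools.

```python
# Hypothetical inference router -- all pool names and fields are invented.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int      # prefill work
    expected_output: int    # decode work
    interactive: bool       # latency-sensitive?

def route(req: Request) -> str:
    if req.prompt_tokens > 100_000:
        return "prefill-pool/gpu-long-context"   # compute-bound ingest
    if req.interactive and req.expected_output > req.prompt_tokens:
        return "decode-pool/lpu"                 # bandwidth-bound generation
    return "batch-pool/gpu"                      # throughput over latency

print(route(Request(prompt_tokens=200_000, expected_output=500, interactive=False)))
print(route(Request(prompt_tokens=300, expected_output=2_000, interactive=True)))
```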

2. Talent Is Worth More Than Technology

Nvidia didn't just buy LPU architecture; they bought Jonathan Ross.

The Math:

  • Groq's technology: Maybe worth $5-10B
  • Jonathan Ross + team: Worth the other $10-15B

Michael Stewart (Microsoft M12):

"If even the leader, even the lion of the jungle will acquire talent, will acquire technology-it's a sign that the whole market is just wanting more options."

3. The CUDA Moat Is Under Attack

Anthropic proved you can build a portable stack that runs on both GPUs and TPUs. Google is offering competitive pricing. AWS has Trainium.

The Insight: Nvidia's Groq deal isn't just about getting better technology; it's about keeping the best inference workloads inside the CUDA ecosystem before competitors steal them.

The Bottom Line: Is This a Good Deal?

Let me give you my honest assessment.

For Nvidia: Yes

The Bull Case:

  • Secures best inference technology
  • Hires the engineer who invented TPUs
  • Protects against competitive threat
  • Payback period: <2 years (if execution works)

The Bear Case:

  • Integration risk is real
  • $20B is a lot for a license
  • Talent might leave

My Take: At $20B, this is expensive but defensible. Nvidia is buying insurance against losing the inference market. Even if the technology integration fails, they've prevented AMD or Intel from getting it.

For Groq: Mixed

The Good:

  • Investors get a 2.9× markup barely three months after the Series E
  • Technology gets Nvidia's distribution
  • Team gets Nvidia resources

The Bad:

  • Groq loses its independence
  • Vision gets absorbed into Nvidia's roadmap
  • Startup culture dies

My Take: This is a good financial outcome but a sad ending for an innovative company. Groq could have been the next Nvidia. Now it's a division of Nvidia.

For the AI Industry: Concerning

The Pattern: Every promising AI chip startup is getting acquired:

  • Graphcore → SoftBank
  • Groq → Nvidia (effectively)
  • Who's next? Cerebras? SambaNova?

The Risk: If all the innovation gets absorbed by incumbents, we lose the competitive pressure that drives progress.

My Take: This deal is good for Nvidia shareholders but potentially bad for AI innovation long-term.

The Calculation That Summarizes Everything

Here's the math that explains this deal:

Nvidia's Position (2028 Projected):

Training market size: $40B
Inference market size: $80B

Without Groq:
- Training (80% share): $32B
- Inference (50% share): $40B
- Total: $72B

With Groq:
- Training (80% share): $32B
- Inference (70% share): $56B
- Total: $88B

Incremental value: $16B/year

The Deal Math:

Cost: $20B (one-time)
Annual incremental revenue: $16B
At 80% gross margin: $12.8B profit/year
Simple payback: ~1.6 years

Caveat: These projections assume Nvidia executes well and the inference market grows as expected. Market projections are inherently uncertain, but even at half these numbers, the deal math still works.

The Verdict: Even with conservative assumptions, this deal creates significant value for Nvidia. The question isn't whether it's worth $20B; it's whether Nvidia can execute.


Connect

Allen Elzayn

Hi, I'm Allen. I'm a System Architect exploring modern tech stacks and production architectures. You can follow me on Dev.to, see some of my work on GitHub, or read more about me.
