I watched Nvidia pay $20 billion for a company whose entire pitch is "we're faster than Nvidia."
That's not a typo. On December 24, 2025, Nvidia announced a non-exclusive licensing agreement with Groq, the startup that built chips specifically designed to beat GPUs at AI inference. The deal values Groq at 2.9× its last funding round, and it's structured in a way that screams "we're trying to avoid antitrust scrutiny."
But here's what really caught my attention: Groq's LPU (Language Processing Unit) delivers 240+ tokens per second on Llama-2 70B. That's 5-10× faster than Nvidia's own H100 GPUs for inference workloads.
So why would Nvidia pay $20 billion for technology that makes their core product look slow?
The $20 Billion Question: What Did Nvidia Actually Buy?
Let me break down what this deal actually looks like.
Deal Structure:
- Amount: $20 billion cash
- Type: Non-exclusive licensing agreement + acqui-hire
- Date: December 24, 2025
- Groq Status: Remains independent company
Key People Moving to Nvidia:
- Jonathan Ross (Founder/CEO), the engineer who started Google's TPU project
- Sunny Madra (President)
- Core engineering team
What Stays at Groq:
- Simon Edwards becomes new CEO
- GroqCloud continues operating
- Existing customer contracts honored
The Kicker: This isn't an acquisition. Nvidia didn't buy Groq; they licensed the technology and hired the team. Groq still exists as an independent company.
Think about that: Nvidia paid $20 billion for a license, not ownership.
The Valuation Math That Made Me Do a Double-Take
Let's talk numbers, because this is where it gets interesting.
Groq's Funding History:
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Series A | 2017 | $10M | ~$50M | Social Capital |
| Series B | 2018 | $52M | ~$200M | Tiger Global |
| Series C | 2021 | $300M | $1B | Tiger Global, D1 Capital |
| Series D | Aug 2024 | $640M | $2.8B | BlackRock, Neuberger Berman |
| Series E | Sep 2025 | $750M | $6.9B | Disruptive, BlackRock, Samsung, Cisco |
Total Raised: ~$1.75 billion
Deal Price: $20 billion
Multiple on Last Round: 2.9×
Multiple on Total Investment: 11.4×
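If you want to check that math yourself, it's one small script (a minimal sketch; the inputs are just the round amounts from the table above):

```python
# Sanity-checking the deal multiples from the funding table above.
rounds = {
    "Series A (2017)": 10e6,
    "Series B (2018)": 52e6,
    "Series C (2021)": 300e6,
    "Series D (2024)": 640e6,
    "Series E (2025)": 750e6,
}
deal_price = 20e9
last_round_valuation = 6.9e9  # Series E post-money

total_raised = sum(rounds.values())  # ~$1.75B
print(f"Total raised:           ${total_raised/1e9:.2f}B")
print(f"Multiple on last round: {deal_price/last_round_valuation:.1f}x")  # ~2.9x
print(f"Multiple on capital:    {deal_price/total_raised:.1f}x")          # ~11.4x
```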
For context, that's:
- Less than the $35B AMD paid for Xilinx (but that was a full acquisition)
- Equivalent to buying Discord twice
- Roughly a third of Nvidia's Q3 FY2026 quarterly revenue ($57B)
But here's the real question: what's the ROI math?
The Technical Deep-Dive: Why LPUs Are Different
This is where it gets fascinating.
Groq didn't just build a faster chip; they built a fundamentally different architecture. And that difference is key to understanding why Nvidia paid $20 billion.
The Inference Split: Prefill vs Decode
Here's the insight that explains everything. Inference isn't one workload; it's two:
Prefill Phase:
- The "prompt" stage where the model ingests data
- Could be 100,000 lines of code or an hour of video
- Compute-bound: Requires massive matrix multiplication
- GPUs are excellent at this
Decode Phase:
- Token-by-token generation
- Each word feeds back to predict the next
- Memory-bandwidth bound: Data must move from memory to processor fast
- GPUs struggle here; this is where Groq shines
As Gavin Baker (Groq investor) summarized: "Inference is disaggregating into prefill and decode."
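To make the split concrete, here's a toy sketch in Python (not how any real inference engine is written, just the shape of the two phases): prefill is one big matrix multiply over every prompt token at once, while decode is a sequential loop that must re-read the weights for every single generated token.

```python
import numpy as np

# Toy single-layer "model": one weight matrix standing in for billions of params.
d = 512
W = np.random.randn(d, d).astype(np.float32)

def prefill(prompt_embeddings):
    # Whole prompt processed in ONE matrix multiply: compute-bound,
    # and exactly what GPUs are built for.
    return prompt_embeddings @ W                       # shape: (prompt_len, d)

def decode(last_hidden, steps):
    # Token-by-token generation: each step depends on the previous one,
    # so every step re-streams the weights. Throughput is limited by how
    # fast W can be fed to the compute units, not by FLOPs.
    outputs = []
    h = last_hidden
    for _ in range(steps):
        h = h @ W                                      # one tiny (1, d) matmul
        outputs.append(h)
    return outputs

prompt = np.random.randn(1000, d).astype(np.float32)   # 1,000-token prompt
h = prefill(prompt)[-1:]                               # last position's state
generated = decode(h, steps=100)                       # 100 sequential steps
```

That loop is the whole story: decode can't be parallelized across tokens, so the weights get dragged past the processor once per word.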
GPU vs LPU: The Architecture War
Traditional GPU (Nvidia H100):
Memory: HBM3 (High Bandwidth Memory)
Bandwidth: 3.35 TB/s
Architecture: Parallel processing, batch-optimized
Strength: Training + Prefill
Weakness: Decode (sequential token generation)
Groq LPU:
Memory: SRAM (Static RAM, on-chip)
Bandwidth: 80 TB/s internal
Architecture: Deterministic, single-stream
Strength: Decode (ultra-fast token generation)
Weakness: Large model training, massive context
The SRAM Advantage: Michael Stewart (Microsoft M12) explains it simply: "The energy to move a bit in SRAM is like 0.1 picojoules. To move it between DRAM and the processor is 20 to 100 times worse."
SRAM is etched directly into the processor, with no external memory shuttling. For token generation, this is game-changing.
The Trade-off: SRAM is expensive and limited in capacity. Groq's sweet spot is models of 8 billion parameters and below: edge inference, robotics, voice, IoT devices. Not the trillion-parameter frontier models. (Bigger models, like the 70B benchmarks above, run by sharding across many LPU chips.)
But that's not a small market. It's a giant segment Nvidia wasn't serving.
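A back-of-envelope calculation shows why that picojoule gap matters. This sketch uses the 0.1 pJ/bit SRAM figure from the quote above and its stated 20-100× DRAM penalty; the 8-billion-parameter FP16 model is my assumption, matching the sweet spot just described:

```python
# Energy cost of streaming the weights once per decoded token, using the
# quote's numbers. Model size and precision are illustrative assumptions.
params = 8e9
bytes_per_param = 2                       # FP16
bits_per_token = params * bytes_per_param * 8

SRAM_PJ_PER_BIT = 0.1
for name, pj in [("SRAM", SRAM_PJ_PER_BIT),
                 ("DRAM (20x)", SRAM_PJ_PER_BIT * 20),
                 ("DRAM (100x)", SRAM_PJ_PER_BIT * 100)]:
    joules = bits_per_token * pj * 1e-12
    print(f"{name:12s}: {joules:.3f} J just to move weights for ONE token")
```

At DRAM rates, simply moving the weights costs on the order of a joule per token before any computation happens; on-chip SRAM cuts that by one to two orders of magnitude.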
The Numbers That Matter
Llama-2 70B Performance:
| Metric | Nvidia H100 | Groq LPU | Difference |
|---|---|---|---|
| Tokens/second | 30-50 | 241-300 | 5-10× faster |
| Latency (first token) | 200-500ms | <100ms | 2-5× faster |
| Power efficiency | ~700W | ~300W | 2.3× better |
| Cost per 1M tokens | $0.60-1.00 | $0.10-0.20 | 3-10× cheaper |
Source: Groq benchmarks, Artificial Analysis, Tom's Hardware (2025)
Why This Matters: In 2025, AI inference revenue surpassed training revenue for the first time. The market is shifting from "build the model" to "serve the model."
And Nvidia's GPUs, while dominant in training, aren't optimized for inference.
The 80 TB/s Bandwidth Advantage
Here's the bottleneck that those benchmark numbers trace back to.
The Bottleneck Problem: Large language models are "memory-bound" during inference. The chip spends most of its time waiting for data, not computing.
GPU Solution: Use HBM (High Bandwidth Memory) to increase data transfer speeds. H100 achieves 3.35 TB/s.
LPU Solution: Put everything on-chip using SRAM. Groq achieves 80 TB/s internal bandwidth, 24× the H100's memory bandwidth.
The Trade-off: As noted above, SRAM capacity is limited, so Groq's chips hold far less than GPUs. But for inference workloads where the model fits, they're dramatically faster.
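You can get surprisingly close to the benchmark numbers from nothing but this bandwidth argument. A minimal roofline sketch, assuming batch size 1, FP16 weights, and every weight read once per token (it ignores batching, quantization, and the fact that Groq actually shards a 70B model across many chips, so treat it as directional):

```python
# Decode is memory-bound: each generated token must stream the full weight
# set past the compute units. Upper bound tokens/sec = bandwidth / model bytes.
def max_tokens_per_sec(bandwidth_tb_s, params_b, bytes_per_param=2):
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes

llama_70b = 70  # billions of parameters
print(f"H100 (3.35 TB/s HBM): {max_tokens_per_sec(3.35, llama_70b):6.1f} tok/s")
print(f"LPU  (80 TB/s SRAM):  {max_tokens_per_sec(80.0, llama_70b):6.1f} tok/s")
```

The naive bounds (roughly 24 vs 570 tokens per second) land in the same neighborhood as the measured 30-50 and 241-300 figures above; batching and quantization explain why real systems can beat the single-stream bound. The point stands: decode speed is a memory-bandwidth story.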
Real-World Impact:
- ChatGPT response time: 2-5 seconds (GPU)
- Groq-powered response: <1 second (LPU)
That's not just faster; it's a different user experience.
The Antitrust Angle: Why This Deal Is Structured So Weirdly
Let me be direct: this deal is structured to avoid regulatory scrutiny.
The "Non-Exclusive License" Fiction
Bernstein analyst Stacy Rasgon told CNBC:
"Structuring the deal as a non-exclusive license may keep the fiction of competition alive."
What This Means:
- Nvidia doesn't "own" Groq
- Groq can still license to others (theoretically)
- FTC/DOJ can't block it as easily as an acquisition
The Reality:
- Nvidia hired the entire founding team
- Nvidia has exclusive access to key IP
- Groq without Jonathan Ross is like Apple without Steve Jobs
The Anthropic Factor: Why Nvidia Was Nervous
Here's what most people missed: Anthropic broke Nvidia's moat.
Anthropic pioneered a portable engineering approach: a software layer that allows Claude models to run across multiple AI accelerators, including Nvidia GPUs AND Google TPUs. They recently committed to accessing 1 million TPUs from Google, over a gigawatt of compute capacity.
Val Bercovici (Weka's Chief AI Officer):
"The fact that Anthropic was able to build up a software stack that could work on TPUs as well as on GPUs, I don't think that's being appreciated enough in the marketplace."
The Threat: If companies can easily run inference on Google TPUs instead of Nvidia GPUs, Nvidia's CUDA lock-in weakens. The Groq deal ensures the most performance-sensitive workloads stay within Nvidia's ecosystem.
The Microsoft/Google Playbook
This isn't new. We've seen this pattern before:
Microsoft + OpenAI:
- $13B investment (not acquisition)
- Exclusive cloud partnership
- OpenAI remains "independent"
Google + Anthropic:
- $2B investment
- Cloud partnership
- Anthropic remains "independent"
Nvidia + Groq:
- $20B license (not acquisition)
- Key team moves to Nvidia
- Groq remains "independent"
The Pattern: Big Tech has learned that outright acquisitions trigger antitrust review. But "partnerships," "investments," and "licenses" fly under the radar.
The FTC's Dilemma
The FTC has been aggressive on tech antitrust in recent years. But this deal is hard to challenge:
- No ownership transfer - Groq still exists
- Non-exclusive license - Others can license too (in theory)
- Talent is free to move - Can't block people from changing jobs
My Take: This is regulatory arbitrage. Nvidia gets 90% of the benefit of an acquisition with 10% of the antitrust risk.
The Jonathan Ross Factor: Why One Engineer Is Worth Billions
You can't understand this deal without understanding Jonathan Ross.
The TPU Origin Story
In 2011, Ross was a Google engineer who started the TPU as a "20% side project." By 2013, he was designing and implementing the core elements of the chip that would eventually power all of Google's AI infrastructure.
The Result: Google's Tensor Processing Unit (TPU), which:
- Powers Google Search, YouTube, Gmail
- Reduced inference costs by 10×
- Gave Google a 3-5 year lead in AI infrastructure
Ross's Contribution: He originated the entire project. He's the engineer who saw the problem before anyone else and designed the solution from scratch.
From Google to Groq
Ross left Google in 2016 and brought 8 of the original 10 TPU team members with him to found Groq. His thesis:
"TPUs are good, but they're still designed for Google's specific workloads. I can build something better for general inference."
The Groq Difference:
- TPU: Optimized for Google's models and infrastructure
- LPU: Optimized for any large language model inference
The Bet: Ross believed the AI market would shift from training to inference, and whoever owned the best inference chip would win.
He was right.
Why Nvidia Wanted Him
Nvidia's problem isn't GPUs; it's the future.
Current State:
- Nvidia owns 80%+ of AI training market
- GPUs are the standard for model development
- Revenue: $57B in the most recent quarter (a $200B+ annualized run rate)
The Threat:
- Inference is becoming bigger than training
- GPUs aren't optimal for inference
- Competitors (AMD, Intel, custom silicon) are catching up
The Solution: Hire the guy who invented the TPU and built the fastest inference chip in the world.
What Nvidia Gets:
- LPU architecture knowledge
- Inference optimization expertise
- The engineer who sees problems 5 years before everyone else
The Price: $20 billion. Amortized over the 14 years since Ross started the TPU project, that's roughly $1.4 billion per year of his career.
The Market Context: Why Inference Is the New Battleground
Let me show you why this deal makes strategic sense.
The Training vs Inference Shift
2023:
- Training revenue: $15B
- Inference revenue: $10B
- Ratio: 1.5:1
2025:
- Training revenue: $25B
- Inference revenue: $30B
- Ratio: 0.83:1
2028 (projected):
- Training revenue: $40B
- Inference revenue: $80B
- Ratio: 0.5:1
Source: Gartner, IDC, Bank of America estimates (projections subject to market conditions)
The Insight: Training a model is a one-time cost. Inference is ongoing. As AI products scale to billions of users, inference costs dominate.
The Economics of Inference
ChatGPT Example:
- Users: 200M+ weekly active
- Queries per user: ~10/week
- Total queries: 2B/week
- Cost per query (GPU): $0.01-0.05
- Weekly inference cost: $20-100M
Annual Inference Cost: $1-5 billion
Training Cost (GPT-4):
- One-time: $100M-500M
The Math: Inference costs 10-50× more than training over a model's lifetime.
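Spelling out that arithmetic (every input here is one of the rough assumptions above, not a reported figure):

```python
# Re-running the article's inference-vs-training arithmetic with the
# stated (rough) assumptions.
weekly_users = 200e6
queries_per_user = 10                      # per user, per week
cost_per_query = (0.01, 0.05)              # USD, low/high GPU estimate

weekly = [weekly_users * queries_per_user * c for c in cost_per_query]
annual = [w * 52 for w in weekly]
training = (100e6, 500e6)                  # one-time GPT-4-class training cost

print(f"Weekly inference: ${weekly[0]/1e6:.0f}M - ${weekly[1]/1e6:.0f}M")
print(f"Annual inference: ${annual[0]/1e9:.1f}B - ${annual[1]/1e9:.1f}B")
print(f"One year of serving vs training: "
      f"{annual[0]/training[1]:.0f}x - {annual[1]/training[0]:.0f}x")
```

Depending on which ends of the ranges you pair, a single year of serving already costs 2-52× the training run, which is where the 10-50× lifetime figure comes from.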
Nvidia's Inference Problem
Current Nvidia Inference Solutions:
- H100: Optimized for training, okay for inference
- L40S: Better for inference, but still GPU architecture
- TensorRT: Software optimization (helps, but limited)
The Gap: Nvidia's best inference solution is still 5-10× slower than Groq's LPU for token generation.
The Risk: If inference becomes 80% of the market and Nvidia only has 50% share (vs 80%+ in training), that's a massive revenue hit.
The Solution: Buy the best inference technology before someone else does.
The Agentic Future: Why KV Cache Matters
The timing here isn't coincidental: Meta acquired agent pioneer Manus for over $2 billion on December 29, 2025, just days after the Nvidia-Groq deal.
The Statefulness Problem
If an AI agent can't remember what it did 10 steps ago, it's useless for real-world tasks. KV Cache (Key-Value Cache) is the "short-term memory" that LLMs build during inference.
Manus reported: For production-grade agents, the ratio of input tokens to output tokens can reach 100:1. For every word an agent says, it's "thinking" and "remembering" 100 others.
The Problem: If that cache gets evicted from memory, the agent loses its train of thought. The model must burn massive energy to recompute everything.
The Solution: Groq's SRAM acts as a "scratchpad" for these agents-near-instant retrieval of state. Combined with Nvidia's Dynamo framework, they're building an "inference operating system" that tiers state across SRAM, DRAM, HBM, and flash storage.
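To make "short-term memory" concrete, here's a rough KV-cache size calculator. The model shape is a Llama-2-70B-like assumption on my part (80 layers, 8 KV heads via grouped-query attention, head dimension 128, FP16), not anything disclosed in the deal:

```python
# Rough KV-cache footprint for an agent, using the 100:1 input:output ratio
# quoted above. Model shape is an assumed Llama-2-70B-like configuration.
def kv_cache_bytes(seq_len, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2x for keys AND values, per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

output_tokens = 1_000
context_tokens = output_tokens * 100       # the 100:1 agent ratio
gb = kv_cache_bytes(context_tokens) / 1e9
print(f"{context_tokens:,} cached tokens -> {gb:.1f} GB of KV cache")
# ~33 GB of per-session state that must stay close to the compute,
# or be expensively recomputed from scratch.
```

Tens of gigabytes of per-agent state is exactly what you don't want to evict and recompute, hence the tiered SRAM/DRAM/HBM/flash design.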
The Cluster Is Now the Computer
Thomas Jorgensen (Supermicro):
"Compute is no longer the primary bottleneck for advanced clusters. Feeding data to GPUs is the bottleneck."
This is why Nvidia is pushing disaggregated inference. Specialized storage tiers feed data at memory-class performance, while "Groq-inside" silicon handles high-speed token generation.
The Competitive Landscape: Who's Building Inference Chips?
Nvidia isn't the only one who sees this opportunity.
The Inference Chip Race
| Company | Chip | Architecture | Status | Funding/Valuation |
|---|---|---|---|---|
| Groq | LPU | SRAM-based | Production | $20B (Nvidia deal) |
| Cerebras | WSE-3 | Wafer-scale | Production | $4B valuation |
| SambaNova | SN40L | Dataflow | Production | $5B valuation |
| Graphcore | Bow | IPU | Acquired by SoftBank | ~$500M |
| Tenstorrent | Wormhole | RISC-V | Development | $1B valuation |
| d-Matrix | Corsair | Digital in-memory | Development | $300M raised |
Gavin Baker's Prediction: The Groq deal will cause all other specialized AI chips to be canceled, except Google's TPU, Tesla's AI5, and AWS's Trainium.
The Pattern: Every major AI chip startup was focused on inference. Now Nvidia has effectively taken the best one off the board.
Why Nvidia Had to Move
Scenario 1: Nvidia doesn't buy Groq
- Groq partners with AMD or Intel
- Cloud providers (AWS, Azure, GCP) adopt LPUs
- Nvidia loses inference market share
- Revenue impact: $10-20B annually by 2028
Scenario 2: Nvidia buys Groq
- Nvidia owns best-in-class inference technology
- Can integrate LPU architecture into future GPUs
- Maintains market dominance across training AND inference
- Cost: $20B one-time
The Math: $20B to protect $10-20B in annual revenue is a no-brainer.
The Financial Analysis: Does $20B Make Sense?
Let me run the numbers that matter.
Nvidia's Financial Position
Q3 FY2026 (ended October 2025):
- Revenue: $57B (quarterly)
- Cash & equivalents: $60.6B
- Market cap: ~$4.5 trillion
- Free cash flow: $22.09B (quarterly)
Source: Nvidia Q3 FY2026 earnings
The Context: $20B is:
- 33% of Nvidia's cash reserves
- Less than 1 quarter of free cash flow
- 0.4% of market cap
Translation: This is a significant but manageable expense for Nvidia. They can fund it entirely from cash without debt.
The ROI Calculation
Assumption 1: Inference Market Growth
- 2025 inference market: $30B
- 2028 inference market: $80B (projected)
- Nvidia current share: ~50%
- Nvidia target share with Groq: 70%
Revenue Impact:
Without Groq (50% share): $40B (2028)
With Groq (70% share): $56B (2028)
Incremental revenue: $16B/year
Payback Period:
Deal cost: $20B
Annual incremental revenue: $16B
Gross margin (80%): $12.8B
Payback: 1.6 years
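The same payback arithmetic as a script (every input is the article's projection, not a reported figure):

```python
# The payback math above, spelled out. All inputs are projections.
inference_market_2028 = 80e9
share_without, share_with = 0.50, 0.70
gross_margin = 0.80
deal_cost = 20e9

incremental_rev = inference_market_2028 * (share_with - share_without)  # $16B
incremental_profit = incremental_rev * gross_margin                     # $12.8B
print(f"Incremental revenue: ${incremental_rev/1e9:.0f}B/yr")
print(f"Incremental profit:  ${incremental_profit/1e9:.1f}B/yr")
print(f"Simple payback:      {deal_cost/incremental_profit:.1f} years") # ~1.6
```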
Assumption 2: Defensive Value
What if Groq had gone to AMD instead?
AMD + Groq Scenario:
- AMD gains best inference technology
- Cloud providers shift inference workloads to AMD
- Nvidia loses 20% inference market share
- Revenue loss: $6-10B annually
Defensive Value: Preventing $6-10B annual loss is worth $20B upfront.
The Valuation Sanity Check
Groq's Metrics (estimated):
- 2025 revenue target: ~$500M
- 2024 actual revenue: ~$90M
- Growth rate: 5-6× YoY
- Gross margin: 60-70%
Valuation Multiples:
- Price/2025 Revenue Target: 40×
- Price/2024 Revenue: 222×
- Price/Last Round: 2.9×
Comparable Deals:
- Nvidia bought Mellanox (2020): $6.9B at 7× revenue
- AMD bought Xilinx (2022): $35B at 10× revenue
- Intel bought Habana (2019): $2B at 20× revenue
The Verdict: At 40× forward revenue (and over 200× trailing), this is expensive. But Nvidia isn't buying revenue; they're buying technology and talent.
The Integration Question: What Happens Next?
Here's where it gets speculative.
Scenario 1: The Vera Rubin Strategy
Nvidia has already announced the Vera Rubin chip family, architected specifically for the prefill/decode split:
Rubin CPX (Prefill):
- Optimized for massive context windows (1M+ tokens)
- Uses GDDR7 memory instead of expensive HBM
- Cost-effective for ingesting large datasets
"Groq-inside" Silicon (Decode):
- High-speed token generation
- SRAM-based architecture
- Integrated into Nvidia's inference roadmap
Timeline: 2026-2027
Impact: Nvidia builds a complete inference stack (prefill AND decode) within the CUDA ecosystem.
Scenario 2: Separate Product Lines
Nvidia keeps GPUs for training, launches LPU line for inference.
Potential Products:
- "Nvidia Inference" chip family
- Optimized for data center inference
- Different pricing/positioning than GPUs
Timeline: 2026-2027
Impact: Nvidia owns both markets with specialized products.
Scenario 3: Software Integration
Nvidia uses Groq's architecture knowledge to improve software.
Potential Products:
- Enhanced TensorRT for inference
- Better compiler optimization
- Improved memory management
Timeline: 2026
Impact: Existing GPUs get faster at inference through software.
My Prediction
I think we'll see all three, in this order:
- 2026: Software improvements (next-generation TensorRT releases)
- 2027: Separate inference product line
- 2028: Hybrid architecture in next-gen GPUs
Nvidia paid $20B for a 3-year technology roadmap.
The Risks: What Could Go Wrong
Let me be honest about the downsides.
Risk 1: Integration Failure
The Problem: Groq's architecture is fundamentally different from GPUs. Integrating the two isn't trivial.
Historical Precedent:
- Intel bought Nervana (2016) - product never shipped
- Intel bought Habana (2019) - limited market impact
- Google built TPU internally - took 5+ years to mature
The Risk: Nvidia spends $20B and the technology never makes it to products.
Probability: 20%
Risk 2: Talent Departure
The Problem: Jonathan Ross and team are joining Nvidia. But will they stay?
The Pattern: Acqui-hires often see key talent leave within 2-3 years:
- Vesting schedules complete
- Corporate culture clash
- Founders want to start something new
The Risk: Ross leaves Nvidia in 2028, starts a new company, and Nvidia is left with IP but no vision.
Probability: 30%
Risk 3: Market Shift
The Problem: What if inference doesn't grow as expected?
Scenarios:
- AI hype cools, inference demand plateaus
- New architecture makes both GPUs and LPUs obsolete
- Open-source models reduce inference costs dramatically
The Risk: Nvidia overpaid for a market that doesn't materialize.
Probability: 15%
Risk 4: Regulatory Intervention
The Problem: FTC could still challenge the deal, even with its creative structure.
Precedent:
- FTC sued to block Nvidia/Arm ($40B; the deal was abandoned in 2022)
- FTC challenged Microsoft/Activision ($69B, 2023)
The Risk: Deal gets unwound or restricted, Nvidia loses key benefits.
Probability: 10%
What I Learned: Three Key Insights
After digging through the data, three things stand out.
1. The General-Purpose GPU Era Is Ending
VentureBeat put it bluntly: "Nvidia just admitted the general-purpose GPU era is ending."
We're entering the age of disaggregated inference architecture: silicon split into specialized types for different workloads. In 2026, "GPU strategy" stops being a purchasing decision and becomes a routing decision.
The New Questions:
- Prefill-heavy vs decode-heavy?
- Long-context vs short-context?
- Interactive vs batch?
- Small-model vs large-model?
- Edge vs data center?
Your architecture will follow those labels; the sketch below shows what that routing decision might look like in code.
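A hypothetical sketch of that routing layer; the pool names and thresholds are illustrative, not any real scheduler's API:

```python
from dataclasses import dataclass

# "GPU strategy as a routing decision": classify each request along the axes
# above and send it to the hardware pool that fits. Everything here is an
# assumption for illustration.
@dataclass
class Request:
    prompt_tokens: int
    expected_output_tokens: int
    interactive: bool

def route(req: Request) -> str:
    if req.prompt_tokens > 100_000:
        return "prefill-pool"        # long-context ingest: GDDR7/HBM-class parts
    if req.interactive and req.expected_output_tokens > req.prompt_tokens:
        return "decode-pool"         # chat/agents: SRAM-class, latency-first
    if not req.interactive:
        return "batch-pool"          # offline jobs: cheapest throughput wins
    return "general-pool"

print(route(Request(prompt_tokens=250_000, expected_output_tokens=500, interactive=False)))
print(route(Request(prompt_tokens=200, expected_output_tokens=2_000, interactive=True)))
```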
2. Talent Is Worth More Than Technology
Nvidia didn't just buy LPU architecture; they bought Jonathan Ross.
The Math:
- Groq's technology: Maybe worth $5-10B
- Jonathan Ross + team: Worth the other $10-15B
Michael Stewart (Microsoft M12):
"If even the leader, even the lion of the jungle will acquire talent, will acquire technology-it's a sign that the whole market is just wanting more options."
3. The CUDA Moat Is Under Attack
Anthropic proved you can build a portable stack that runs on both GPUs and TPUs. Google is offering competitive pricing. AWS has Trainium.
The Insight: Nvidia's Groq deal isn't just about getting better technology; it's about keeping the best inference workloads inside the CUDA ecosystem before competitors steal them.
The Bottom Line: Is This a Good Deal?
Let me give you my honest assessment.
For Nvidia: Yes
The Bull Case:
- Secures best inference technology
- Hires the engineer who invented TPUs
- Protects against competitive threat
- Payback period: <2 years (if execution works)
The Bear Case:
- Integration risk is real
- $20B is a lot for a license
- Talent might leave
My Take: At $20B, this is expensive but defensible. Nvidia is buying insurance against losing the inference market. Even if the technology integration fails, they've prevented AMD or Intel from getting it.
For Groq: Mixed
The Good:
- Series E investors get a 2.9× return in about three months (Series D investors, roughly 7× in 16 months)
- Technology gets Nvidia's distribution
- Team gets Nvidia resources
The Bad:
- Groq loses its independence
- Vision gets absorbed into Nvidia's roadmap
- Startup culture dies
My Take: This is a good financial outcome but a sad ending for an innovative company. Groq could have been the next Nvidia. Now, whatever the legal structure says, it's effectively a division of Nvidia.
For the AI Industry: Concerning
The Pattern: Every promising AI chip startup is getting acquired:
- Graphcore → SoftBank
- Groq → Nvidia (effectively)
- Who's next? Cerebras? SambaNova?
The Risk: If all the innovation gets absorbed by incumbents, we lose the competitive pressure that drives progress.
My Take: This deal is good for Nvidia shareholders but potentially bad for AI innovation long-term.
The Calculation That Summarizes Everything
Here's the math that explains this deal:
Nvidia's Position (2028 Projected):
Training market size: $40B
Inference market size: $80B
Without Groq:
- Training (80% share): $32B
- Inference (50% share): $40B
- Total: $72B
With Groq:
- Training (80% share): $32B
- Inference (70% share): $56B
- Total: $88B
Incremental value: $16B/year
The Deal Math:
Cost: $20B (one-time)
Annual incremental revenue: $16B
At 80% gross margin: $12.8B profit/year
Simple payback: ~1.6 years
Caveat: These projections assume Nvidia executes well and the inference market grows as expected. Market projections are inherently uncertain, but even at half these numbers, the deal math still works.
The Verdict: Even with conservative assumptions, this deal creates significant value for Nvidia. The question isn't whether it's worth $20B-it's whether Nvidia can execute.
Resources:
- Groq Official Announcement: Non-Exclusive Licensing Agreement - Groq Newsroom
- Nvidia buying AI chip startup Groq for about $20 billion - CNBC
- Nvidia just admitted the general-purpose GPU era is ending - VentureBeat
- Nvidia acquires AI chip challenger Groq for $20B - TechCrunch
- Nvidia: $20B Groq Deal, Valuation, And More - Seeking Alpha
- AI inference market surpasses training in 2025 - Deloitte
Further Reading:
- Meta's $75B AI Infrastructure Bet: Inside the Biggest Cloud Deals of 2025
- DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens
- Building a Distributed Cron System That Scales to 1000+ Users
Connect
- GitHub: @0xReLogic
- LinkedIn: Allen Elzayn