<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The AI Realist]]></title><description><![CDATA[Practical AI for builders, operators, and investors.]]></description><link>https://www.airealist.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!u6cR!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924ecf6b-2ddb-4f24-a3bd-89ae62c7c1dc_800x800.png</url><title>The AI Realist</title><link>https://www.airealist.ai</link></image><generator>Substack</generator><lastBuildDate>Mon, 06 Apr 2026 04:26:11 GMT</lastBuildDate><atom:link href="https://www.airealist.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Julien Simon]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[julsimon@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[julsimon@substack.com]]></itunes:email><itunes:name><![CDATA[Julien Simon]]></itunes:name></itunes:owner><itunes:author><![CDATA[Julien Simon]]></itunes:author><googleplay:owner><![CDATA[julsimon@substack.com]]></googleplay:owner><googleplay:email><![CDATA[julsimon@substack.com]]></googleplay:email><googleplay:author><![CDATA[Julien Simon]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What to Buy for Local LLMs (April 2026)]]></title><description><![CDATA[A practitioner&#8217;s guide to inference and training, why they&#8217;re not the same machine, and why you may still need cloud]]></description><link>https://www.airealist.ai/p/what-to-buy-for-local-llms-april</link><guid 
isPermaLink="false">https://www.airealist.ai/p/what-to-buy-for-local-llms-april</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Fri, 03 Apr 2026 15:55:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iykK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iykK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iykK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!iykK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!iykK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!iykK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!iykK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5597243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/193079086?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iykK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!iykK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!iykK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!iykK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb4e8714-67d0-43dc-8c88-43bcb221f896_2816x1584.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I just published a piece on why NVIDIA&#8217;s product segmentation created this market.[1] This is the practical companion. No thesis, no structural argument &#8212; just what works, what doesn&#8217;t, and what it costs. All prices in USD as of April 2026; EUR and GBP prices are roughly comparable at current exchange rates. I&#8217;ll update this guide quarterly.</p><p>A GPU is not a computer. Every NVIDIA recommendation below assumes a workstation around it. 
Prebuilt single-GPU RTX 5090 systems run $5,000 to $8,000 complete &#8212; sometimes cheaper than the GPU alone at street prices.[2] Professional workstations cost more. Dual-GPU builds start around $7,600. Apple and AMD mini PCs are complete systems; NVIDIA GPUs are not. I&#8217;ll state total system costs throughout.</p><p><strong>Software.</strong> For inference: Ollama or LM Studio on Apple Silicon (both wrap llama.cpp&#8217;s Metal backend; Ollama is adding MLX). Ollama or llama.cpp with CUDA on NVIDIA single-GPU. vLLM for multi-GPU serving. For training: Unsloth (built on Hugging Face&#8217;s TRL and PEFT ecosystem) on CUDA; mlx-lm for LoRA, QLoRA, and full fine-tuning on Apple Silicon. Download GGUF models from Hugging Face Hub for Ollama and llama.cpp; safetensors for vLLM and transformers. All model sizes in this guide assume Q4_K_M quantization &#8212; the standard, quality-optimized 4-bit format &#8212; unless otherwise noted.</p><h1>Inference</h1><p>Inference is memory-bandwidth-bound. The hardware that generates tokens fastest is the hardware that reads model weights from memory fastest. Capacity determines which models you can run. Bandwidth determines how fast they run.</p><h3>Under 30B: RTX 5090</h3><p>Nothing touches it. 32 gigabytes of GDDR7 at 1,792 GB/s.[3] A dense 30B model fits and runs at 60 to 90 tokens per second at short to moderate context &#8212; the bandwidth ceiling for 32B Q4 decode is about 94 tok/s. Speeds drop at longer context as KV cache competes for bandwidth. MoE architectures are dramatically faster: Hardware Corner measured a 30B MoE at 234 tok/s because only 3B active parameters are read per token.[3] If your workload fits in 32 gigabytes, buy this and stop reading. At long context (64K+), KV cache grows fast &#8212; verify that model weights plus KV cache fit before committing. $3,500 to $4,800 at current street prices; the DRAM shortage has made these hard to find. 
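</p><p>The ceiling figures above are simple arithmetic; here is a sketch in Python (the weight sizes are the approximate Q4_K_M figures from the notes below):</p>

```python
# Decode is memory-bandwidth-bound: every generated token re-reads the
# model weights, so the ceiling is bandwidth / bytes read per token.

def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on decode throughput, ignoring KV cache and compute."""
    return bandwidth_gb_s / weights_read_gb

RTX_5090_BW_GB_S = 1792  # GDDR7, 512-bit bus

# Dense 32B at Q4_K_M: all ~19 GB of weights are read for every token.
dense_ceiling = decode_ceiling_tok_s(RTX_5090_BW_GB_S, 19)   # ~94 tok/s
# 30B MoE with 3B active parameters: only ~2 GB read per token.
moe_ceiling = decode_ceiling_tok_s(RTX_5090_BW_GB_S, 2)      # 896 tok/s

print(f"dense: {dense_ceiling:.0f} tok/s, MoE: {moe_ceiling:.0f} tok/s")
```

<p>Measured speeds land below these ceilings because KV cache reads and kernel overhead also consume bandwidth, which is why the observed 60 to 90 tok/s sits under the 94 tok/s dense ceiling.</p><p>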
System cost: $5,000 to $8,000.[4] If 24GB is enough for your models, a used RTX 4090 at $1,500 to $2,200 remains the best-value NVIDIA card &#8212; 1,008 GB/s bandwidth, mature CUDA support, and a total system cost under $4,000.</p><h3>70B: Mac Studio M4 Max</h3><p>The 70B sweet spot. 128 gigabytes of unified memory at 546 GB/s.[5] A Q4 Llama 3.3 70B runs at 8 to 15 tokens per second &#8212; closer to 15 at short context, dropping toward 8 at longer conversations. Speculative decoding with a small draft model can roughly double effective throughput in llama.cpp, though results depend on how well the draft model matches the target. $3,499 with 512GB SSD, $3,699 with 1TB (the practical minimum for storing multiple large models).[6] Complete system &#8212; plug in power and a display, and you&#8217;re running. The M3 Ultra (819 GB/s, starting at $3,999 for 96GB) is faster per token, but the M4 Max is the value pick at this tier. Most practitioners use llama.cpp&#8217;s Metal backend via Ollama or LM Studio. Ollama is transitioning to an MLX backend, with a preview showing 57% faster prefill and 93% faster generation on supported models.[7]</p><h3>70B on a budget: Mac mini cluster</h3><p>Four Mac Mini M4 Pro units (48GB each) connected via Thunderbolt 5 pool 192 gigabytes of shared memory for $6,400 to $7,200.[8] EXO Labs demonstrated Nemotron 70B at 4-8 tokens per second and Qwen2.5Coder-32B at 18 tokens per second on M4 Pro clusters.[9] The entire cluster draws about 200 watts under full load &#8212; less than a single RTX 5090. The catch: you need direct Thunderbolt 5 cable connections between nodes (no TB5 switches exist yet), and inter-node latency makes this better suited for batch inference than interactive chat. 
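</p><p>A quick sanity check on the cluster numbers, as a sketch (per-node bandwidth is from the notes below; the ~40GB size for a 70B Q4_K_M model is an assumption):</p>

```python
# 4x Mac mini M4 Pro (48GB each): pooled capacity and an idealized
# per-node decode bound with the model sharded evenly across nodes.

NODES = 4
MEM_PER_NODE_GB = 48      # M4 Pro configuration discussed here
NODE_BW_GB_S = 273        # per-node memory bandwidth
MODEL_GB = 40             # ~70B at Q4_K_M (assumption)

pooled_gb = NODES * MEM_PER_NODE_GB        # 192 GB pooled
shard_gb = MODEL_GB / NODES                # 10 GB of weights per node
ideal_tok_s = NODE_BW_GB_S / shard_gb      # ~27 tok/s if interconnect were free

print(pooled_gb, shard_gb, round(ideal_tok_s, 1))
```

<p>The measured 4-8 tok/s sits well under that idealized 27 tok/s bound; the gap is the inter-node latency tax described above.</p><p>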
macOS 26.2&#8217;s RDMA support drops inter-node latency from about 300 microseconds to under 50, but that&#8217;s still orders of magnitude slower than on-chip memory access.[10] If your budget is $7,000 and you want 70B inference with room to grow, a cluster is viable. For the best single-machine experience at 70B, the Mac Studio M4 Max at $3,699 is still the answer.</p><h3>70B with CUDA: RTX PRO 6000</h3><p>The CUDA answer to the 70B tier. 96 gigabytes of GDDR7 at 1.8 TB/s &#8212; nearly identical bandwidth to the RTX 5090 but with three times the VRAM.[11] A 70B Q4 model fits on a single card with over 50GB of headroom for long context and concurrent users. For team serving (4+ users via vLLM), that headroom matters &#8212; each concurrent user at 8K context adds 2-4 gigabytes of KV cache.</p><p>No NVLink &#8212; dual-card setups communicate over PCIe Gen 5. A dual PRO 6000 gives you 192GB total for running 70B in FP16 or fitting very large models, but the PCIe interconnect creates the same bottleneck as dual 5090s for cross-GPU workloads. A single-card PRO 6000 avoids the bottleneck entirely and handles 70B Q4 with room to spare.</p><p>A complete single-GPU professional workstation runs about $22,000; a dual-GPU one, about $30,000 to $33,000.[12] At these prices, the honest comparison is a year of B200 cloud time. Buy a PRO 6000 if you need always-on 96GB CUDA locally &#8212; for team inference, compliance-constrained training, or workflows where cloud latency or data residency rules it out.</p><h3>Multi-GPU NVIDIA (no NVLink)</h3><p><strong>Dual RTX 5090 (64GB).</strong> Two RTX 5090s give you 64 gigabytes of VRAM and access to vLLM&#8217;s tensor parallelism over PCIe.[13] NVLink was last available on the RTX 3090; the two GPUs communicate over PCIe x8/x8, a bottleneck for large models. 
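</p><p>The per-user KV cache figure can be derived directly; a sketch assuming Llama-3-style 70B dimensions (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 cache), which are assumptions not stated in the text:</p>

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = 2) -> float:
    """KV cache size: one K and one V vector per layer, per position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value / 1e9

# One user at 8K context on a 70B model with grouped-query attention.
per_user = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, seq_len=8192)
print(f"{per_user:.1f} GB per 8K-context user")  # ~2.7 GB
```

<p>That lands at the low end of the 2-4 gigabytes quoted above; models without grouped-query attention cache every attention head and cost several times more per user, which is exactly the headroom the 96GB card buys.</p><p>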
A 70B Q4 model fits in 64GB but runs at a pace comparable to or slower than a single Mac Studio M4 Max &#8212; per-layer PCIe synchronization overhead eats up the raw bandwidth advantage. Where dual 5090s shine is inference on 30 to 40B models that benefit from parallelism, or training (see below). System cost: $9,000 to $12,000, or $7,600 for prebuilt GPUs at list price.[14]</p><h3>200B+: Mac Studio M3 Ultra</h3><p>Still the current Ultra &#8212; Apple skipped the M4 generation. 256 gigabytes at 819 GB/s. Llama 3.1 405B fits in Q4 (~235 GB). DeepSeek V3 671B fits only at aggressive quantization (1.5-2-bit dynamic quants via Unsloth, ~192-226GB) &#8212; functional but with measurable quality loss.[15] About $5,999 on the base chip with 1TB SSD. The M5 Ultra is expected mid-2026 with potentially 1,200+ GB/s bandwidth &#8212; if you can wait two to three months, wait.[16] Jeff Geerling tested a four-unit M3 Ultra cluster connected via Thunderbolt 5 RDMA, pooling 1.5 terabytes of unified memory and running large MoE models at 28 to 32 tokens per second.[17] macOS 26.2 enables RDMA natively, though clusters max out at four units in a full mesh (no TB5 switches).[18] Apple recently removed the 512GB option and raised the 256GB upgrade price from $1,600 to $2,000 &#8212; a signal of a DRAM shortage.[19]</p><h3>Budget and niche</h3><p><strong>AMD Strix Halo.</strong> 128 gigabytes for $2,000 in a mini PC.[20] Bandwidth is lower (212 GB/s measured), which makes dense 70B models painfully slow at 3 to 5 tokens per second. But Mixture-of-Experts models change the math: Llama 4 Scout (109B total, 17B active MoE) manages an estimated 10 to 20 tokens per second.[21] Vulkan via llama.cpp now outperforms AMD&#8217;s own ROCm on Strix Halo.[22] If you&#8217;re on a budget and your workloads are MoE-heavy, this is the most memory per dollar you can buy.</p><p><strong>DGX Spark.</strong> 128 gigabytes of LPDDR5x, 273 GB/s, $4,699.[23] Hard to recommend for most practitioners. 
For inference, a Mac Studio M4 Max delivers twice the bandwidth at a lower price. For training, a PRO 6000 is faster, and the cloud is cheaper &#8212; $4,699 buys over 900 hours of B200 time. The Spark&#8217;s only defensible use case is always-on, local 128GB of CUDA when the cloud is not an option (air-gapped environments, compliance constraints, or workflows that require continuous local iteration at 70B+ model scales). The EXO Labs hybrid setup (Spark for prefill, Mac Studio for decode) showed a 2.8&#215; speedup on an 8B model, but 70B+ results have not been published.[24]</p><h3>Coming soon</h3><p>The Mac Studio M5 Ultra is expected in mid-2026, potentially with 1,200+ GB/s and up to 256GB.[16] AMD&#8217;s Strix Point is also expected to ship in late 2026 with improved bandwidth. The rumored RTX 5090 Super with 48GB GDDR7 would change the NVIDIA story at the 70B tier &#8212; but the DRAM shortage makes a 2026 launch unlikely.</p><h1>Training: Supervised Fine-Tuning (SFT)</h1><p>SFT &#8212; training a model on input-output pairs to follow instructions, adopt a style, or learn a domain &#8212; is the most common local training task. 
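</p><p>Concretely, &#8220;input-output pairs&#8221; usually means a JSONL file of chat-formatted records, one per line; a minimal sketch (the messages schema shown is the common convention, but exact field names vary by training tool):</p>

```python
import json

# One SFT record: the model learns to produce the assistant turn
# given the system and user turns.
record = {
    "messages": [
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "How do I rotate my API key?"},
        {"role": "assistant", "content": "Open Settings, revoke the old key, then generate a new one."},
    ]
}

# A dataset is thousands of these, one JSON object per line.
print(json.dumps(record)[:70])
```

<p>Training frameworks apply the model&#8217;s chat template to records like this before tokenizing; quality and consistency of these pairs matter more than their quantity.</p><p>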
Memory scales with model size, quantization, and method: full fine-tuning loads the entire model and its optimizer states; LoRA freezes most weights and trains small adapter layers; QLoRA additionally quantizes the frozen weights to 4-bit &#8212; cutting VRAM by 60-80%.</p><h3>8B to 40B: RTX 5090</h3><p>QLoRA fine-tuning of an 8B model takes 7 to 16 gigabytes of VRAM with Unsloth depending on LoRA rank, context length, and batch size &#8212; 7GB at rank 16 with short context, 14 to 16GB at rank 64 with 8K context.[25] Unsloth&#8217;s &#8220;full fine-tuning&#8221; mode &#8212; all parameters trained, but base weights stored in 4-bit &#8212; uses 20 to 24 gigabytes, which is workable on 32GB.[25] Traditional FP16 full fine-tuning of 8B (model + AdamW optimizer states + gradients) needs 48 to 64 gigabytes and does not fit on a single 5090. Unsloth&#8217;s Blackwell-optimized kernels deliver about 2&#215; the training speed of standard implementations.[26] System cost: $5,000 to $8,000.</p><p>NVIDIA&#8217;s own benchmarks show QLoRA fine-tuning of models up to 40B parameters on a single RTX 5090.[27] Full SFT of 40B does not fit in 32GB. This is the ceiling for single-GPU consumer SFT.</p><h3>50B to 70B: PRO 6000, dual 5090, or cloud</h3><p>A single PRO 6000 with 96 gigabytes can QLoRA a 70B model at about 38 gigabytes peak VRAM &#8212; about 4 hours for a standard fine-tune.[28] The DGX Spark&#8217;s 128GB also handles 70B QLoRA, though lower bandwidth makes it 30-50% slower, and at $4,699, a cloud GPU is cheaper unless you need to stay local. 
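</p><p>The memory figures in this section follow from per-parameter storage math; a rough estimator (the byte counts are standard, and the gap to observed totals is workload-dependent overhead from adapters, gradients, and activations):</p>

```python
def base_weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Storage for the model weights alone at a given precision."""
    return params_billion * bytes_per_param

# QLoRA freezes the base weights at 4-bit: ~0.5 bytes per parameter.
print(base_weights_gb(8, 0.5))    # 4.0 GB  -> 7-16 GB observed with overhead
print(base_weights_gb(70, 0.5))   # 35.0 GB -> ~38 GB peak observed

# FP16 weights alone, before gradients and optimizer states:
print(base_weights_gb(70, 2.0))   # 140.0 GB -> why 70B full SFT is cloud-only
```

<p>Full fine-tuning then roughly triples or quadruples the FP16 weight figure once gradients and AdamW optimizer states are added, which is how 70B reaches the ~300GB cited below.</p><p>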
Full SFT of 70B requires about 300GB total (model weights, AdamW optimizer states, and gradients) &#8212; cloud only, on a multi-GPU node with DeepSpeed ZeRO-3 sharding weights, gradients, and optimizer states across GPUs.[29] If you don&#8217;t own a PRO 6000 (a complete professional workstation runs about $22,000), renting a cloud GPU for a few hours is cheaper for occasional fine-tuning.</p><p>Two 5090s (64GB combined) with DeepSpeed ZeRO can fine-tune models up to 50-60B with QLoRA &#8212; beyond the single-GPU ceiling but limited by PCIe interconnect overhead. Not practical for full SFT of 70B (optimizer states don&#8217;t fit). System cost: $9,000 to $12,000.[30]</p><h3>Apple Silicon and AMD</h3><p><strong>MLX LoRA.</strong> It works. mlx-lm supports LoRA and QLoRA natively.[31] mlx-tune adds an Unsloth-compatible training API on top of MLX, supporting SFT, DPO, GRPO, and multi-modal fine-tuning, letting you prototype locally on Apple Silicon before scaling to cloud GPUs. The 128 to 256 gigabytes of unified memory on a Mac Studio lets you load larger SFT models than any consumer NVIDIA card. The ecosystem is thinner: no Unsloth (yet &#8212; &#8220;coming very soon&#8221;), no DeepSpeed. If your workflow is LoRA on a custom dataset, MLX handles it well. If you need GRPO, DPO, or the latest training techniques, you need CUDA.[32]</p><p><strong>AMD.</strong> Functional but not recommended as your primary path. ROCm supports PyTorch training, and Unsloth offers AMD compatibility through its Core library. The driver-kernel maturity gap means more debugging than training.[33]</p><h2>Training: Reinforcement Learning (GRPO, DPO)</h2><p>RL fine-tuning is harder on hardware than SFT. GRPO (the technique behind DeepSeek R1) generates multiple completions per prompt, scores them, and updates the policy model &#8212; requiring 1.5 to 2&#215; the memory of equivalent SFT because the model must hold both the policy weights and the generated sequences simultaneously. 
DPO loads a quantized reference model alongside the training model, adding 2-4 gigabytes to an 8B model with modern implementations. Both require CUDA for production-quality training as of April 2026 &#8212; TRL&#8217;s DPO trainer technically runs on any PyTorch backend, including ROCm, but optimization and stability are not there yet for serious workloads.</p><h3>8B to 30B: RTX 5090 or cloud</h3><p>GRPO on an 8B model via Unsloth uses 14-18 gigabytes &#8212; well within the 5090&#8217;s 32GB.[34] DPO on 8B is similar. This is the entry point for local RL. System cost: $5,000 to $8,000.</p><p>GRPO on a 14B model pushes into 22-28 gigabytes, leaving the 5090 with little headroom for longer sequences. A 30B GRPO run may not fit at all depending on sequence length and batch size. The DGX Spark&#8217;s 128GB handles 30B GRPO with room to spare &#8212; but a cloud B200 does it faster and cheaper unless you&#8217;re running these jobs frequently enough to justify the $4,699.[35]</p><h3>70B: PRO 6000 (marginal) or cloud</h3><p>GRPO or DPO on 70B needs 80-100 gigabytes for the policy model, generated sequences, and optimizer states. No consumer device handles this. A single PRO 6000 (96GB) may fit 70B GRPO at the lower end of that range, but has no headroom &#8212; and at $22,000+ for the workstation, cloud is almost always the better answer. A dual PRO 6000 over PCIe gives you 192GB but adds interconnect overhead, and the price tag becomes astronomical. A B200 (192GB) or 2&#215; H100 (160GB combined) handle it cleanly. 
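</p><p>As planning numbers, the RL overheads described in this section reduce to two rules of thumb; a sketch (the 1.75 multiplier is just the midpoint of the 1.5-2&#215; range quoted above, and the baseline is an assumption):</p>

```python
def grpo_vram_gb(sft_vram_gb: float, overhead: float = 1.75) -> float:
    """GRPO holds policy weights plus sampled completions: ~1.5-2x SFT."""
    return sft_vram_gb * overhead

def dpo_reference_gb(params_billion: float) -> float:
    """DPO adds a 4-bit quantized reference model: ~0.5 bytes/param."""
    return params_billion * 0.5

# 8B GRPO from a ~9 GB QLoRA SFT baseline lands mid-teens,
# consistent with the 14-18 GB Unsloth figure quoted above.
print(round(grpo_vram_gb(9), 2))   # 15.75
# 8B DPO reference model: ~4 GB, within the quoted 2-4 GB.
print(dpo_reference_gb(8))         # 4.0
```

<p>Run the same arithmetic at 70B and you land in the 80-100 gigabyte range above, which is why the tier tops out at a single PRO 6000 with no headroom.</p><p>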
Budget $15 to $50 per GRPO run, depending on provider pricing ($3 to $6 per GPU-hour).[36]</p><h3>Coming soon for training</h3><p>Unsloth lists &#8220;MLX training coming very soon&#8221; as of March 2026 &#8212; if this ships, Apple Silicon gains GRPO and SFT through Unsloth&#8217;s optimized kernels, narrowing the CUDA gap significantly for parameter-efficient methods.[37] On the NVIDIA side, the PRO 6000 with 96GB remains the local SFT limit; for RL beyond 8B, most practitioners are better served by the cloud.</p><h2>Cloud</h2><p>For workloads that exceed local hardware &#8212; 70B+ full fine-tuning, 70B RL, multi-GPU distributed training, or high-concurrency production inference &#8212; the B200 is the default. The discipline that makes cloud work: don&#8217;t leave it idle, and don&#8217;t use it for debugging.</p><p><strong>B200 for training and inference.</strong> 192GB HBM3e at 8,000 GB/s, with NVLink for multi-GPU scaling. A single B200 handles 70B QLoRA and 70B GRPO; an 8-GPU node handles 70B full SFT. For inference, the B200 delivers up to 4.9&#215; the throughput of a PRO 6000 and wins on cost-per-token despite the higher hourly rate.[38] Pricing: $3 to $6 per GPU-hour on neo-cloud providers, higher on hyperscalers.</p><p><strong>The workflow that saves money:</strong> get everything ready locally &#8212; code, data, configuration, hyperparameters tested on a small model &#8212; then ship the job to the cloud. A $5/hour B200 running for 4 focused hours costs $20. The same B200 left idle overnight while you debug a data loading issue costs $120. The difference between a $20 fine-tune and a $500 fine-tune is almost entirely local preparation.[39]</p><p><strong>H100 and H200 remain viable.</strong> H100 at $2.50 to $3.00 per GPU-hour is adequate for 8B to 30B SFT and RL. 
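</p><p>The cost discipline above is worth making explicit; a sketch using the rates quoted in this section:</p>

```python
def job_cost_usd(rate_per_gpu_hour: float, gpu_hours: float) -> float:
    """Cloud GPU cost is just rate times wall-clock hours billed."""
    return rate_per_gpu_hour * gpu_hours

B200_RATE = 5.0  # $/GPU-hour, mid-range neo-cloud pricing

# Prepared locally, shipped to the cloud, shut down when done:
print(job_cost_usd(B200_RATE, 4))    # 20.0
# Left idle overnight while debugging a data loader:
print(job_cost_usd(B200_RATE, 24))   # 120.0
# A GRPO run at $3-6/GPU-hour and a handful of GPU-hours:
print(job_cost_usd(3.0, 5), job_cost_usd(6.0, 8))  # 15.0 48.0
```

<p>Spot pricing shifts the constants but not the conclusion: idle hours, not compute, dominate badly run cloud bills.</p><p>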
H200 at $3.00 to $4.00 per GPU-hour with 141GB HBM3e is the value option for 70B QLoRA when B200 availability is tight.[40]</p><p><strong>Providers.</strong> RunPod, Lambda, Vast.ai, Together AI, and the hyperscalers all offer B200 and H100/H200 instances. Pricing, availability, and minimum commitments change faster than this guide can be updated. For occasional training jobs, spot instances work. For sustained inference, reserved capacity or on-prem is more economical, which is where the local hardware recommendations above take over.[41]</p><h2>The asymmetry</h2><p>The same machine is rarely best for both. The Mac Studio dominates inference because bandwidth is king, and Apple ships more of it per dollar than anyone. The RTX 5090 dominates local training because training tooling still lives in CUDA&#8217;s ecosystem, and nothing outside it is equivalent. And for anything beyond what fits in 32 or 96 gigabytes of VRAM, the B200 is the default &#8212; as long as you treat cloud time as a production resource, not a sandbox.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hh8s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hh8s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 424w, https://substackcdn.com/image/fetch/$s_!Hh8s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 848w, 
https://substackcdn.com/image/fetch/$s_!Hh8s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 1272w, https://substackcdn.com/image/fetch/$s_!Hh8s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hh8s!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png" width="1200" height="857.967032967033" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1041,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:429099,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/193079086?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hh8s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 424w, 
https://substackcdn.com/image/fetch/$s_!Hh8s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 848w, https://substackcdn.com/image/fetch/$s_!Hh8s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 1272w, https://substackcdn.com/image/fetch/$s_!Hh8s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad81baf-8a1b-49ad-9644-1b4d14281daf_2400x1716.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><strong>What you get at each price point.</strong></figcaption></figure></div><p>For practitioners who need local, always-on inference, a 128GB+ Apple Silicon Mac is the best option today. For training, a local GPU forces compromises on model sizes and algorithms; plan on a B200 in the cloud for jobs that don&#8217;t fit.</p><p>The irony &#8212; training on the company whose product segmentation created the inference vacuum, then inferring on the company that filled it &#8212; is the subject of the companion piece.[42]</p><div><hr></div><h3>Notes</h3><p>[1] &#8220;Your Parents Paid,&#8221; <a href="https://www.airealist.ai/">The AI Realist</a>. Paid post.</p><p>[2] Prebuilt single-GPU RTX 5090 workstation pricing, April 2026: MSI Aegis $3,599 (sold out &#8212; cheaper than standalone GPUs); Skytech $5,300; CyberPower/Maingear $4,400&#8211;$5,300; Alienware Area-51 $5,300 (discounted). ArsenalPC MES2X dual RTX 5090 base: $7,602. Professional workstations (Dell Precision, Lenovo ThinkStation) start higher. Sources: <a href="https://www.tomshardware.com/">Tom&#8217;s Hardware</a>; <a href="https://videocardz.com/">VideoCardz</a>; <a href="https://petronella.com/">Petronella AI Workstation Guide</a>.</p><p>[3] <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">NVIDIA RTX 5090</a>: 32GB GDDR7, 512-bit bus, 1,792 GB/s. Bandwidth ceiling for decode: 1,792 GB/s &#247; model weight read per token. Dense 32B Q4_K_M (~19GB): ceiling ~94 tok/s. MoE 30B with 3B active (~2GB read): ceiling ~896 tok/s. <a href="https://www.hardware-corner.net/rtx-5090-llm-benchmarks/">Hardware Corner RTX 5090 LLM benchmarks</a> (Q4_K_XL via llama-bench): Qwen3 8B at 145&#8211;185 tok/s TG, Qwen3 30B A3B MoE at 234 tok/s (4K context) declining to ~110 tok/s (32K), Qwen3 32B dense at 52 tok/s (147K extreme context). 
<a href="https://localllm.in/blog/best-gpus-llm-inference-2025">LocalLLM.in</a>/RunPod report 213 tok/s on 8B models.</p><p>[4] RTX 5090 street prices as of April 2026: Newegg FE at $3,695, Amazon at $3,899, custom AIB models $4,500&#8211;4,800 (<a href="https://wccftech.com/">WCCFTech</a>, BestValueGPU tracker). The DRAM shortage has driven prices well above the $1,999 list price. Prebuilt RTX 5090 workstations: $5,000&#8211;8,000 complete. <a href="https://videocardz.com/">VideoCardz</a> notes that standalone RTX 5090 pricing has approached the cost of entire prebuilt systems. EU: &#8364;3,800&#8211;5,200. UK: &#163;3,200&#8211;4,000.</p><p>[5] <a href="https://www.apple.com/shop/buy-mac/mac-studio">Apple Mac Studio M4 Max</a>: 128GB unified memory requires the 16-core CPU / 40-core GPU chip variant. 546 GB/s memory bandwidth.</p><p>[6] Mac Studio M4 Max 128GB pricing confirmed by <a href="https://petapixel.com/">PetaPixel</a> review (March 2025) and <a href="https://www.bhphotovideo.com/">B&amp;H Photo</a> (April 2026): $3,499 with 512GB SSD, $3,699 with 1TB SSD. EU: &#8364;4,099 / &#8364;4,299. UK: &#163;3,599 / &#163;3,799. Build-to-order upgrade from the $1,999 base (36GB).</p><p>[7] <a href="https://ollama.com/">Ollama</a> MLX preview: March 2026. Performance claims from Ollama blog. llama.cpp Metal remains the current default for most Mac users.</p><p>[8] <a href="https://www.apple.com/shop/buy-mac/mac-mini">Mac Mini M4 Pro 48GB</a>: $1,599 with 512GB SSD, $1,799 with 1TB. Four units: $6,396&#8211;$7,196 plus ~$200 for Thunderbolt 5 cables. The 24GB base ($1,399) is cheaper but limits total pooled memory to 96GB &#8212; not enough for 70B. M4 Pro bandwidth: 273 GB/s per node.</p><p>[9] EXO Labs Mac Mini cluster benchmarks: Qwen2.5Coder-32B at 18 tok/s, Nemotron-70B at 4&#8211;8 tok/s on M4 Pro nodes. Five-node cluster total power: ~200W under full load. 
Sources: AIBase; <a href="https://medium.com/">Medium/Faizan Saghir</a> (January 2026).</p><p>[10] macOS 26.2 RDMA over Thunderbolt 5: inter-node latency ~300&#956;s &#8594; under 50&#956;s (Awesome Agents, March 2026). Requires Recovery Mode boot (<code>rdma_ctl enable</code>). No TB5 switches exist &#8212; direct cabling only. <a href="https://github.com/exo-explore/exo">EXO Labs</a> is the primary clustering software.</p><p>[11] <a href="https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/">RTX PRO 6000 Blackwell Workstation Edition</a> (<a href="https://www.nvidia.com/content/dam/en-zz/Solutions/data-center/rtx-pro-6000-blackwell-workstation-edition/workstation-blackwell-rtx-pro6000-workstation-edition-nvidia-us-3519208-web.pdf">datasheet PDF</a>): 96GB GDDR7 ECC, 1.8 TB/s bandwidth, PCIe Gen 5 x16. No NVLink support &#8212; multi-GPU setups communicate over PCIe only, same bottleneck as dual 5090s. MSRP ~$8,565; retail $8,000&#8211;$9,200 as of March 2026. Server Edition: 1.6 TB/s, passive cooling. Also: <a href="https://www.thundercompute.com/blog/nvidia-rtx-pro-6000-pricing">Thunder Compute</a>; <a href="https://lenovopress.lenovo.com/lp2263-thinksystem-nvidia-rtx-pro-6000-blackwell-server-edition-pcie-gen5-gpu">Lenovo Press</a>.</p><p>[12] PRO 6000 workstation pricing: <a href="https://www.apy-groupe.com/">APY</a> (France) configures a single RTX PRO 6000 + Threadripper Pro 9965WX + 128GB ECC + 1TB at &#8364;20,305 HT (~$22,000). Dual adds ~&#8364;8,000 for the second GPU: total &#8364;28,000&#8211;30,000 HT (~$30,000&#8211;33,000). BOXX APEXX T4 PRO-X priced similarly. Cloud comparison: a B200 at $5/hr &#215; 24/7 &#215; 30 days = $3,600/month. The dual PRO 6000 breaks even in 8&#8211;9 months of continuous use at these prices.</p><p>[13] Dual RTX 5090: the RTX 5090 does NOT support NVLink (removed since RTX 3090). Two GPUs communicate over PCIe. X670E/X870E consumer boards bifurcate to x8/x8 with two GPUs. 
<a href="https://github.com/vllm-project/vllm">vLLM</a> supports tensor parallelism over PCIe for inference; <a href="https://github.com/microsoft/DeepSpeed">DeepSpeed</a> ZeRO supports it for training.</p><p>[14] Dual RTX 5090 system cost: <a href="https://www.arsenalpc.com/">ArsenalPC</a> MES2X base at $7,602. At current GPU street prices ($3,500&#8211;4,800 each), DIY or custom builds run $9,000&#8211;12,000. RTX 5090 TDP: 575W; dual GPUs need 1,500W+ PSU. Consumer AM5 boards bifurcate to x8/x8; Threadripper provides full x16/x16 but adds $4,500+ to CPU cost.</p><p>[15] Mac Studio M3 Ultra 256GB: base (28-core, 96GB, 1TB) at $3,999 + $2,000 memory upgrade = ~$5,999. Higher chip (32-core/80-core) adds $1,500. 512GB option discontinued March 2026. GGUF file sizes for DeepSeek V3.1 671B: Q8_0 = 713GB, Q4_K_M = 405GB, Q3_K_M = 320GB, Q2_K = 246GB, UD-IQ1_S = 192GB (<a href="https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF">unsloth/DeepSeek-V3.1-GGUF</a>). At 256GB, the 671B model requires 1.5 to 2-bit dynamic quantization to fit with room for KV cache and OS. Llama 3.1 405B at Q4_K_M = ~235GB fits comfortably. Also: <a href="https://appleinsider.com/">AppleInsider</a>; <a href="https://videocardz.com/">VideoCardz</a>.</p><p>[16] M5 Ultra: not yet announced. Projection based on M5 Max (614 GB/s on MacBook Pro) and UltraFusion architecture. Expected mid-2026 per <a href="https://www.macworld.com/">Macworld</a>/Bloomberg. M4 Ultra was never released; Apple skipped to M5 Ultra.</p><p>[17] Jeff Geerling, &#8220;1.5 TB of VRAM on Mac Studio &#8212; RDMA over Thunderbolt 5,&#8221; <a href="https://www.jeffgeerling.com/">jeffgeerling.com</a>, December 2025. Four Mac Studios, 1.5TB total. DeepSeek V3.1 671B at 32.5 tok/s, Qwen3 235B at 31.9 tok/s.</p><p>[18] macOS 26.2 RDMA: Requires Recovery Mode boot (<code>rdma_ctl enable</code>). Apple TN3205. 
No TB5 switches &#8212; clusters require direct full-mesh wiring, limiting practical size to four units.</p><p>[19] Apple removed the 512GB memory option from Mac Studio in March 2026 and raised the 256GB upgrade price. <a href="https://videocardz.com/">VideoCardz</a>; <a href="https://www.macrumors.com/">MacRumors</a>.</p><p>[20] AMD Ryzen AI Max+ 395 (Strix Halo): 128GB LPDDR5x. <a href="https://frame.work/">Framework Desktop</a> at $1,999 (US); ~&#8364;2,200 (EU). Also Beelink GTR9 Pro, GMKtec EVO-X2.</p><p>[21] MoE performance on Strix Halo: community benchmarks (<a href="https://www.hardware-corner.net/gpu-ranking-local-llm/">Hardware Corner GPU ranking</a>, <a href="https://level1techs.com/">Level1Techs</a>). Llama 4 Scout is 109B total / 17B active per token. At Q4, active weights per token read &#8776; 10GB; theoretical max at 212 GB/s &#8776; 21 tok/s. Practical speeds with overhead: ~10&#8211;20 tok/s. Treat as directional estimates.</p><p>[22] AMD Vulkan via <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>: AMD used Vulkan for GTC 2026 DGX Spark comparisons. Community testers confirmed Vulkan RADV outperforms ROCm HIP on Strix Halo.</p><p>[23] <a href="https://www.nvidia.com/en-us/products/workstations/dgx-spark/">DGX Spark</a>: 128GB LPDDR5x, 273 GB/s, NVIDIA Grace Blackwell. $4,699 (increased from $3,999 at launch due to memory-shortage surcharge).</p><p>[24] <a href="https://blog.exolabs.net/">EXO Labs</a>, &#8220;Combining NVIDIA DGX Spark + Apple Mac Studio for 4x Faster LLM Inference,&#8221; October 2025. Measured speedup: 2.8&#215; over Mac Studio alone. Model tested: Llama-3.1 8B &#8212; 70B+ not published.</p><p>[25] <a href="https://unsloth.ai/">Unsloth</a> QLoRA VRAM for 8B: ~7GB at rank 16, batch 1, 2K context; ~12&#8211;16GB at rank 64, batch 2, 8K context. Unsloth&#8217;s &#8220;full fine-tuning&#8221; stores base weights in 4-bit but trains all parameters &#8212; uses ~20&#8211;24GB for 8B. 
This is NOT traditional FP16 full SFT, which would require ~48&#8211;64GB (16GB model + 32GB AdamW states + gradients). <a href="https://developer.nvidia.com/blog/">NVIDIA Developer Blog</a>, November 2025.</p><p>[26] Unsloth Blackwell-optimized kernels: 2&#215; training speed vs standard implementations. <a href="https://developer.nvidia.com/blog/">NVIDIA Developer Blog</a>, November 2025.</p><p>[27] &#8220;Fine-tune models with as many as 40 billion parameters on a single Blackwell GPU.&#8221; <a href="https://developer.nvidia.com/blog/">NVIDIA Developer Blog</a>, November 2025. QLoRA on all linear layers.</p><p>[28] Spheron benchmark: QLoRA fine-tuning of Llama-3.1 70B at 38GB peak VRAM. ~4 hours on A100 80GB; PRO 6000 (96GB) should be comparable. <a href="https://blog.spheron.network/">Spheron blog</a>, February 2026.</p><p>[29] Full SFT of 70B in FP16: ~300GB total (model + optimizer states). Requires 2&#215; H100 80GB with <a href="https://github.com/microsoft/DeepSpeed">DeepSpeed</a> ZeRO-3, or a single B200. Cloud is the practical option.</p><p>[30] Dual RTX 5090 for training: <a href="https://github.com/microsoft/DeepSpeed">DeepSpeed</a> ZeRO Stage 2/3 enables model sharding over PCIe. QLoRA on 50&#8211;60B feasible; full SFT of 70B is not. PCIe x8/x8 creates ~15&#8211;30% throughput penalty vs NVLink.</p><p>[31] <a href="https://github.com/ml-explore/mlx-lm">mlx-lm</a> supports LoRA, QLoRA, and full fine-tuning. Also supports distributed fine-tuning via mx.distributed.</p><p>[32] MLX SFT limitations as of April 2026: no <a href="https://unsloth.ai/">Unsloth</a> integration (&#8220;coming very soon&#8221;), no DeepSpeed. SFT and LoRA work well. The 128&#8211;256GB unified memory on Mac Studio enables SFT on larger models than any consumer NVIDIA card.</p><p>[33] AMD ROCm training: <a href="https://github.com/unslothai/unsloth">Unsloth Core</a> supports AMD GPUs. PyTorch + ROCm is functional. 
Community reports 10&#8211;20% more debugging overhead vs CUDA.</p><p>[34] GRPO on 8B via <a href="https://unsloth.ai/">Unsloth</a>: ~14&#8211;18GB (model + generated sequences + optimizer states). Fits on RTX 5090 with headroom. <a href="https://www.marktechpost.com/">MarkTechPost</a>, March 2026.</p><p>[35] DGX Spark for RL: 128GB allows GRPO on models up to ~30B. <a href="https://unsloth.ai/">Unsloth</a> Docker supports Spark natively (CUDA 12.x, PyTorch, <a href="https://github.com/huggingface/trl">TRL</a>, GRPO). Bandwidth limitation (~273 GB/s) slows throughput vs PRO 6000, but memory capacity enables model sizes the 5090 cannot touch.</p><p>[36] 70B GRPO memory requirement: policy model (~38GB in 4-bit), generated sequences, optimizer states, gradient buffers. Total: 80&#8211;100GB depending on sequence length and batch size. B200 (192GB) handles this on a single GPU. 2&#215; H100 (160GB combined) is the alternative. At $3&#8211;6/GPU-hr (neo-cloud), a 4&#8211;8 hour GRPO run = $12&#8211;48.</p><p>[37] <a href="https://unsloth.ai/">Unsloth Studio changelog</a>, March 2026: &#8220;macOS: Currently supports chat and Data Recipes. MLX training is coming very soon.&#8221;</p><p>[38] B200 (192GB HBM3e on SXM5; some providers list 180GB variants): 8,000 GB/s bandwidth. CloudRift benchmarks (February 2026): up to 4.9&#215; RTX PRO 6000 throughput in 8-GPU configurations. Pricing as of April 2026: <a href="https://www.runpod.io/">RunPod</a> $4.99/GPU-hr on-demand; <a href="https://spheron.network/">Spheron</a> $6.03 on-demand, $2.18 spot; <a href="https://lambdalabs.com/">Lambda</a> and neo-cloud providers $3&#8211;6/GPU-hr; hyperscalers (AWS, GCP, Azure) $6&#8211;12/GPU-hr on-demand. 
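The rental arithmetic scattered through these notes (note [12]'s always-on B200 month, note [36]'s GRPO run) is a single multiplication; a quick Python sketch with the quoted rates:

```python
# Cloud GPU cost = hourly rate x hours x GPU count. The rates below are the
# figures quoted in notes [12] and [36], not live prices.

def run_cost(rate_per_gpu_hr: float, hours: float, gpus: int = 1) -> float:
    """Total cost in dollars for a cloud GPU job."""
    return rate_per_gpu_hr * hours * gpus

grpo_low = run_cost(3, 4)          # 4h GRPO run at $3/GPU-hr -> $12
grpo_high = run_cost(6, 8)         # 8h GRPO run at $6/GPU-hr -> $48
b200_month = run_cost(5, 24 * 30)  # B200 24/7 for a month -> $3,600

print(grpo_low, grpo_high, b200_month)
```

The spread between a $12 experiment and a $3,600 always-on month is the whole buy-vs-rent question in miniature: occasional training jobs favor renting, continuous inference favors owning.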
Average across 22 providers: $4.76/GPU-hr (<a href="https://getdeploying.com/">getdeploying.com</a>).</p><p>[39] The local-then-cloud workflow: test on a small model locally (8B on RTX 5090, or reduced batch on Apple Silicon), verify end-to-end, then scale on cloud hardware.</p><p>[40] H100 SXM: ~$2.50&#8211;3.00/GPU-hr. H200: ~$3.00&#8211;4.00/GPU-hr. H200 delivers 1.8&#8211;2.1&#215; H100 throughput on long-context inference. CloudRift benchmarks.</p><p>[41] Provider comparison changes faster than this guide. Current pricing: <a href="https://www.runpod.io/">RunPod</a>, <a href="https://lambdalabs.com/">Lambda</a>, <a href="https://vast.ai/">Vast.ai</a>, <a href="https://together.ai/">Together AI</a>.</p><p>[42] &#8220;Your Parents Paid,&#8221; <a href="https://www.airealist.ai/">The AI Realist</a>.</p>]]></content:encoded></item><item><title><![CDATA[Your Parents Paid]]></title><description><![CDATA[NVIDIA built the world&#8217;s most profitable hardware company by treating its consumer GPUs as a recruitment pipeline. 
Now many recruits are buying Macs.]]></description><link>https://www.airealist.ai/p/your-parents-paid</link><guid isPermaLink="false">https://www.airealist.ai/p/your-parents-paid</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Fri, 03 Apr 2026 15:54:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ipfq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ipfq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ipfq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ipfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8707322,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/193075250?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ipfq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!Ipfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b404107-503d-4a6d-a9fc-710c0142a227_2816x1584.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Jensen Huang stood before 20,000 developers at GTC 2026 and said something remarkable about the product line that made NVIDIA a household name. &#8220;GeForce is NVIDIA&#8217;s greatest marketing campaign,&#8221; he told the crowd. &#8220;We attract future customers starting long before you could afford to pay for it yourself. Your parents paid.&#8221; He paused, then repeated it: &#8220;Your parents paid for you to be NVIDIA customers. And every single year, they paid up. 
Year after year after year until someday you became an amazing computer scientist and became a proper customer, a proper developer.&#8221; Then the kicker: &#8220;This is the house that GeForce made.&#8221;[1]</p><p>The audience laughed. They weren&#8217;t supposed to take notes. The product specs tell a different story from the keynote.</p><h2>The house that GeForce built and its tenants</h2><p>In fiscal year 2026, NVIDIA&#8217;s datacenter segment generated $193.7 billion in revenue, roughly 90% of the company&#8217;s total revenue of $215.9 billion.[2] Gaming, the segment that includes GeForce, contributed $16 billion. Seven percent. The company&#8217;s gross margin for the full year was 71.1%.[3] NVIDIA didn&#8217;t just build the house that GeForce made: it evicted GeForce from the master bedroom, converted it to an Airbnb, and moved to a penthouse funded by H100s.</p><p>That financial reality shapes every product NVIDIA ships in ways Jensen didn&#8217;t mention on stage. NVIDIA&#8217;s consumer product line is not engineered to serve its most demanding users. It is engineered to ensure that its most demanding users become datacenter customers. The RTX 5090 has 32 gigabytes of video memory. The next NVIDIA product with enough memory to run a 70-billion-parameter model costs four times as much. The product after that costs more than ten times as much. This is not a gap in the lineup. It is the lineup.[4]</p><p>NVIDIA didn&#8217;t lose the local inference market. It designed a product line that made winning it someone else&#8217;s job.</p><h2>Three layers of segmentation</h2><p>The mechanism has three parts. Each independently routes demand toward NVIDIA&#8217;s highest-margin products. Together, they create a segmentation architecture so precise that it may have inadvertently handed Apple and AMD the fastest-growing consumer AI use case.</p><p>The first layer is the VRAM ceiling. 
The RTX 5090, launched in January 2025, pairs 32 gigabytes of GDDR7 memory with a 512-bit memory bus delivering 1,792 GB/s of bandwidth &#8212; a 78% generational improvement that makes it the highest-bandwidth consumer GPU ever built for workloads that fit in memory.[5] NVIDIA did increase VRAM by a third, from 24 gigabytes on the RTX 4090. The problem is that model sizes have increased faster. A 70-billion-parameter model quantized to 4-bit precision requires roughly 35-40 gigabytes for weights alone, more with long context. It does not fit. A 120-billion-parameter Mixture-of-Experts model requires 60-70 gigabytes. It does not fit. The emerging class of frontier open-weight models &#8212; DeepSeek R1 at 671 billion parameters and Llama 3.1 at 405 billion &#8212; requires memory measured in the hundreds of gigabytes. None of them fit.[6]</p><p>The 32-gigabyte ceiling is not a technical constraint. Samsung&#8217;s 3-gigabyte GDDR7 modules are in mass production. NVIDIA&#8217;s own Founders Edition design video inadvertently showed the RTX 5090 PCB labeled with 3-gigabyte module part numbers.[7] The RTX 5090 laptop variant already ships with 3-gigabyte modules.[8] GamersNexus confirmed during its teardown of the RTX PRO 6000 that the same GB202 die &#8212; identical silicon, slightly more cores enabled &#8212; supports 96 gigabytes using thirty-two 3-gigabyte chips.[9] A 48-gigabyte consumer card is well within NVIDIA&#8217;s engineering capability: the silicon supports it, the modules exist, and the laptop ships with them. NVIDIA chose not to ship it.</p><p>The reason is arithmetic. The RTX PRO 6000, with 96 gigabytes of GDDR7 ECC on the same GB202 die, costs $7,999 to $8,900.[10] Same silicon with 10% more cores. Triple the memory. Four times the price. If NVIDIA shipped a 48-gigabyte RTX 5090, it would cannibalize the professional tier. If it shipped a 64-gigabyte variant, it would threaten the economics of cloud GPU rental. 
Every gigabyte of GDDR7 allocated to a $2,000 consumer card is a gigabyte not generating revenue in an $8,000 workstation card or a $25,000 datacenter GPU. At current DRAM prices &#8212; which surged 171% year-over-year by the third quarter of 2025 &#8212; the allocation math is unambiguous.[11]</p><p>The second layer is the interconnect restriction. The RTX 3090, launched in September 2020, was the last GeForce card to include NVLink, the high-speed GPU-to-GPU interconnect that allows two cards to share memory.[12] When NVIDIA removed it from the RTX 4090, Jensen explained that the I/O area had been &#8220;repurposed to cram in as much AI processing as we could.&#8221;[13] The same decision persisted through Blackwell. Neither the RTX 5090 nor the RTX PRO 6000 has NVLink.[14] The technology exists exclusively on datacenter GPUs &#8212; the H100 at 900 GB/s bidirectional, the B200 at 1,800 GB/s &#8212; which cost $25,000 and up per card.</p><p>Without NVLink, multi-GPU setups on consumer hardware communicate over the standard motherboard bus at roughly 64 GB/s &#8212; fourteen times slower than H100 NVLink.[15] Tensor parallelism over PCIe still works &#8212; vLLM supports it, and a dual RTX 5090 can run 70B models &#8212; but the communication overhead is severe enough that independent benchmarks found a single RTX PRO 6000 outperforming multi-card consumer setups on large models, simply by avoiding the bottleneck.[16] For most practitioners, single-GPU memory remains the practical ceiling. That ceiling is 32 gigabytes on the RTX 5090 &#8212; or 96 gigabytes if you pay $8,000 for the RTX PRO 6000. The segmentation ladder, again.</p><p>Even a dual PRO 6000 setup &#8212; $16,000 and 192 gigabytes, matching a single B200&#8217;s memory capacity &#8212; delivers roughly a third to a fifth of the B200&#8217;s throughput at less than half its price. 
Even on a cost-per-token basis, the B200 wins by roughly 2&#215;, because GDDR7 over PCIe cannot compete with HBM3e over NVLink.[16]</p><p>The third layer is the bandwidth constraint. When NVIDIA did build a unified-memory device for local AI, it paired 128 gigabytes of memory with 273 GB/s of bandwidth. The DGX Spark &#8212; announced at CES 2025 for $3,000, shipping in October 2025 at $3,999, now $4,699 after a memory-shortage surcharge &#8212; has the capacity.[17] It does not have the speed. The bandwidth limitation likely reflects both the thermal envelope of a 1.1-liter desktop enclosure and the economics of LPDDR5x &#8212; but whatever the cause, the effect is the same. Token generation in LLM inference is memory-bandwidth-bound: the model reads its entire weight matrix from memory for every token produced. At 273 GB/s, the DGX Spark generates tokens at roughly half the rate of a Mac Studio M4 Max (546 GB/s) and a third the rate of a Mac Studio M3 Ultra (819 GB/s).[18]</p><p>John Carmack (yes, that John Carmack) tested his unit in October 2025 and posted the results: &#8220;DGX Spark appears to be maxing out at only 100 watts power draw, less than half of the rated 240 watts, and it only seems to be delivering about half the quoted performance.&#8221;[19] Awni Hannun, lead developer of Apple&#8217;s MLX framework, independently confirmed similar results &#8212; roughly 60 teraflops in matrix operations, well below expectations.[20] A CES 2026 software update improved matters, with NVIDIA claiming up to 2.6&#215; speedups on optimized configurations that use speculative decoding and aggressive quantization. Typical workloads saw 1.3 to 1.4&#215;.[21]</p><p>The Spark reveals NVIDIA&#8217;s priorities. It gives you the memory and the CUDA ecosystem but not the bandwidth, ensuring that anyone who needs both capacity and speed still has to rent datacenter GPUs. 
Jensen positioned the Spark as a prototyping companion for DGX Cloud, which is exactly what a funnel would look like if it weighed two pounds and sat on your desk.[22]</p><p>A more creative use of the Spark came from outside NVIDIA. In October 2025, EXO Labs &#8212; a small open-source distributed inference project &#8212; wired two DGX Sparks to a Mac Studio M3 Ultra and split the inference workload between them. The Sparks handled prefill, the compute-intensive phase in which a long input prompt is processed via large matrix multiplications. The Mac handled decode, the bandwidth-heavy phase where tokens are generated one at a time. The result: a 2.8&#215; speedup over the Mac Studio alone, with each device contributing exactly the capability the other lacked: the Spark&#8217;s 100 teraflops of FP16 compute for prefill, the Mac&#8217;s 819 GB/s bandwidth for decode.[23] This is disaggregated inference &#8212; the same architectural principle that AWS and Cerebras announced at datacenter scale in March 2026, using Trainium for prefill and the Cerebras wafer-scale engine for decode.[24] EXO demonstrated it on two consumer desktops connected by standard 10 Gigabit Ethernet for under $10,000.</p><p>The structural irony is precise. NVIDIA is building disaggregated inference into its next-generation Rubin CPX datacenter platform &#8212; compute-dense processors for prefill, HBM-rich GPUs for decode, and NVLink 6.0 connectivity.[25] The architecture NVIDIA is building its next datacenter generation around already works on a desk, across vendor boundaries, orchestrated by a twenty-person startup in London. The Spark isn&#8217;t a bad standalone product. It&#8217;s half of an excellent hybrid, and the other half is a Mac.</p><h2>The DRAM shortage locked it in</h2><p>The segmentation strategy might have softened over time &#8212; a 48-gigabyte RTX 5090 Super was widely rumored for 2026 &#8212; if the memory market hadn&#8217;t intervened. 
DRAM contract prices surged 171% year-over-year by Q3 2025, driven by datacenter demand for DDR5 and high-bandwidth memory cannibalizing total wafer capacity.[26] NVIDIA reportedly cut GeForce GPU production by 30 to 40 percent in early 2026.[27] The 16-gigabyte RTX 5060 Ti was at risk of discontinuation due to rising memory costs, making low-margin consumer SKUs uneconomical.[28]</p><p>The shortage converted a product strategy into a supply constraint. At current prices, memory accounts for the majority of the bill-of-materials cost on high-end consumer GPUs.[29] Every 3-gigabyte GDDR7 module allocated to a hypothetical $2,000 consumer card could generate $8,000 in revenue for a professional card or $25,000 in a datacenter product. NVIDIA&#8217;s allocation committee &#8212; if such a thing exists &#8212; would have to be economically irrational to prioritize the consumer tier. The shortage is expected to persist through 2027 at minimum, with some analysts projecting normalization no earlier than 2028.[30]</p><p>NVIDIA&#8217;s product segmentation creates a vacuum. The DRAM shortage prevents NVIDIA from closing it. The 32-gigabyte ceiling is now both a choice and a constraint.</p><h2>What filled the vacuum</h2><p>Apple didn&#8217;t set out to build the best local inference platform. Most practitioners still run models that fit in 32 gigabytes, and for them, the RTX 5090 is unmatched. But the capability frontier is moving toward 70B and above, and the sovereignty use case concentrates at exactly those model sizes: the models powerful enough to handle sensitive medical, legal, and financial workloads are the models that don&#8217;t fit on a consumer NVIDIA card. The unified memory architecture that makes Apple Silicon exceptional for large language models was designed for a different problem entirely &#8212; eliminating the CPU-GPU memory copy overhead that drained laptop battery life and slowed video editing workflows. 
But the same design that lets Final Cut Pro share memory buffers seamlessly between CPU and GPU also means that a Mac Studio with 128 gigabytes of unified memory has, functionally, 128 gigabytes of VRAM. No bus to cross. No copy overhead. Every byte is accessible to both the CPU and the GPU&#8217;s matrix multiplication units at full bandwidth.[31]</p><p>The numbers are specific. The Mac Studio M3 Ultra delivers 819 GB/s across its memory bus &#8212; three times that of the DGX Spark, and faster per dollar than anything NVIDIA sells below the datacenter tier.[32] The Mac Studio M4 Max offers 128 gigabytes at 546 GB/s for $3,699 &#8212; twice the Spark&#8217;s bandwidth at a lower price.[33] The MacBook Pro M5 Max, shipping since early 2026, offers 128 gigabytes of unified memory and 614 GB/s of bandwidth in a laptop form factor.[34] Apple&#8217;s M5 generation added dedicated Neural Accelerators in every GPU core &#8212; purpose-built matrix-multiplication hardware that delivers 3.3 to 4.1 times faster prompt processing than the M4 generation on equivalent workloads.[35] Token generation, the bandwidth-bound phase, improved by 19 to 27 percent &#8212; closely matching the 28% memory bandwidth increase between the base M5 and base M4.[36] Two different mechanisms, one confirmation: for decode-heavy inference, bandwidth is the bottleneck, and Apple is shipping more of it every year.</p><p>The software ecosystem matured with startling speed. Apple&#8217;s MLX framework, released in December 2023, reached version 0.31.1 with roughly biweekly releases and 23,900 GitHub stars.[37] Most Mac practitioners today run models through llama.cpp&#8217;s Metal backend &#8212; hardware-agnostic, NVIDIA-independent, but not Apple-controlled. 
In March 2026, Ollama &#8212; the most popular tool for running LLMs locally &#8212; began transitioning its Apple Silicon backend from llama.cpp to MLX, with a preview release showing 57% faster prefill and 93% faster token generation on initial supported models.[38] The full rollout is expected in Q2 2026. When it arrives, the default path for running an open-weight model on a Mac will increasingly route through Apple&#8217;s own inference framework.</p><p>Whether Apple planned this matters less than what it did next. Multiple sources describe the LLM advantage as initially coincidental, a side effect of laptop chip architecture decisions.[39] But Apple has since leaned in hard. The M3 Ultra was explicitly marketed as running &#8220;LLMs with over 600 billion parameters.&#8221;[40] M5 added dedicated matrix multiplication hardware. macOS 26.2 enables Thunderbolt 5 clustering of multiple Mac Studios for combined memory pools exceeding a terabyte.[41] The trajectory has shifted from architectural accident to competitive strategy. Apple can afford to sell 128 gigabytes of GPU-accessible memory at consumer prices because it has no datacenter GPU business to cannibalize. The structural asymmetry is the advantage: NVIDIA must protect $194 billion in datacenter revenue; Apple must protect nothing.</p><p>AMD attacked from a different direction. The Ryzen AI Max+ 395, codenamed Strix Halo, packs 128 gigabytes of LPDDR5x unified memory into a mini PC that costs $2,000 &#8212; less than half the DGX Spark, less than half the equivalent Apple Silicon.[42] The bandwidth is lower: 256 GB/s theoretical, roughly 212 GB/s measured, which makes dense 70-billion-parameter models painfully slow at 3-5 tokens per second.[43] But the emerging class of Mixture-of-Experts architectures &#8212; where only a fraction of the total parameters are active per token &#8212; plays to Ryzen&#8217;s strengths. 
A 30-billion-parameter MoE model with 3 billion active parameters runs at around 50 tokens per second. Llama 4 Scout, with 109 billion total parameters, manages roughly 15 tokens per second.[44] Usable.</p><p>The software story is rougher. AMD&#8217;s ROCm stack remains a source of friction. Vulkan, the open graphics API, now outperforms ROCm on Strix Halo for many llama.cpp workloads. AMD itself used Vulkan for its GTC 2026 benchmark comparisons against the DGX Spark.[45] This effectively sidesteps AMD&#8217;s software maturity problem for inference &#8212; the one workload where CUDA&#8217;s moat is thinnest. Qualcomm&#8217;s Snapdragon X Elite brings similar unified LPDDR5x memory to Windows laptops, though benchmark data at 70B+ scales remains limited.[46]</p><h2>The ecosystem compounds</h2><p>The deeper consequence is not that Apple and AMD are selling hardware. It is that each sale weakens CUDA&#8217;s gravitational pull at the inference layer.</p><p>CUDA&#8217;s dominance in AI is real and earned. PyTorch, DeepSpeed, Unsloth, TRL &#8212; virtually every training framework is optimized for NVIDIA first, with alternatives months or years behind.[47] Porting a codebase from CUDA to ROCm typically requires modifying 15 to 20 percent of the code and three to six months of optimization work.[48] For training, the moat is deep and getting deeper.</p><p>But inference is not training. Running a pretrained model does not require custom CUDA kernels. It requires loading weights into memory and multiplying matrices &#8212; operations that llama.cpp, MLX, and Vulkan handle on any hardware. Every developer who downloads Ollama on a Mac Studio, every startup that deploys a Ryzen AI Max+ mini PC for edge inference, every enterprise that builds a compliant local cluster has learned to run models without CUDA. They haven&#8217;t left the NVIDIA ecosystem for training. 
But they&#8217;ve discovered that inference &#8212; the workload that will eventually dwarf training in market size &#8212; doesn&#8217;t require it.[49] This doesn&#8217;t eliminate NVIDIA dependency; it bifurcates it. Training stays on CUDA. Inference increasingly doesn&#8217;t. The question is which half of the workflow grows faster.</p><p>This is the pattern I described in &#8220;Open Source, Closed Orbit&#8221;: NVIDIA&#8217;s ecosystem strategy works by routing community adoption through hardware-dependent infrastructure.[50] The Black Hole pulls everything toward NVIDIA silicon. Local inference, running through hardware-agnostic frameworks, is the first workload category where the gravity is measurably weakening. Not because anyone built a better CUDA. Because the workload doesn&#8217;t need CUDA at all.</p><p>The compounding accelerates when privacy is factored into the calculation. Forty-four percent of organizations cite data privacy as the top barrier to LLM adoption.[51] HIPAA violations can result in fines of up to $2.1 million per violation category per year. The EU Data Act took effect in September 2025. The US CLOUD Act&#8217;s compelled disclosure provision means that any inference workload running on a US cloud provider&#8217;s infrastructure is, in principle, accessible to a US court order &#8212; regardless of where the server sits physically.[52] For a European hospital, a defense contractor, or a financial institution running models on patient data, contract terms, or trading signals, local inference is not a cost optimization. It is a compliance requirement. For individual practitioners and small teams, a Mac Studio solves this today. For enterprises with regulatory audit requirements, local hardware is necessary but not sufficient &#8212; fleet management, monitoring, and certification infrastructure are still missing from Apple&#8217;s offering.</p><p>NVIDIA&#8217;s product line prices that compliance requirement into the segmentation ladder.
A CTO who needs private 70-billion-parameter inference has three NVIDIA options: a 32-gigabyte RTX 5090 that cannot run the model, a $4,699 DGX Spark that can run it slowly, or cloud GPU rental that puts the data on someone else&#8217;s infrastructure &#8212; defeating the purpose. The fourth option is a $3,699 Mac Studio that runs 70B locally at usable speed with no data leaving the building. The sovereignty premium &#8212; the additional cost of keeping inference private &#8212; is not set by the physics of silicon. It is set by NVIDIA&#8217;s product segmentation. Apple and AMD make it cheaper because they have no datacenter business pushing practitioners toward the cloud.[53]</p><h2>What Jensen would say</h2><p>Jensen would not dispute the segmentation. He announced it. His rebuttal would be more precise: the DGX Spark gives you 128 gigabytes with full CUDA compatibility, 200 Gbps RDMA networking for clustering, and a direct path to DGX Cloud &#8212; the entire stack, on your desk, for $4,699. The bandwidth limitation is a trade-off for thermals and form factor, not a deliberate throttle. And cloud GPU rental at $0.69 per hour for an RTX 5090 makes local ownership unnecessary for most practitioners.</p><p>The first two points are defensible. The Spark is a genuine product with a genuine use case &#8212; CUDA prototyping at model scales that don&#8217;t fit on consumer GPUs. The RDMA clustering is technically impressive, though multi-Spark clustering benchmarks for 70B+ inference have not been independently published. The third point &#8212; cloud rental &#8212; deserves scrutiny. A cloud RTX 5090 at $0.69 per hour costs about $500 per month at 24/7 utilization, or roughly $6,000 per year.[54] A Mac Studio M4 Max costs $3,699 once. The break-even for always-on local inference is measured in months, not years.
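The break-even arithmetic can be sketched in a few lines. Prices are the ones quoted in this piece; the 24/7-utilization assumption and the variable names are mine:

```python
# Back-of-envelope break-even: always-on cloud GPU rental vs. buying local hardware.
# Prices from the article; 24/7 utilization is an assumption, not a measurement.
CLOUD_RATE_USD_PER_HR = 0.69   # RunPod community-cloud RTX 5090, March 2026 pricing
MAC_STUDIO_USD = 3699          # Mac Studio M4 Max, 128GB unified memory / 1TB SSD

hours_per_month = 24 * 30
cloud_monthly = CLOUD_RATE_USD_PER_HR * hours_per_month   # recurring rental cost
breakeven_months = MAC_STUDIO_USD / cloud_monthly         # one-time purchase / monthly rent

print(f"Cloud: ${cloud_monthly:,.0f}/month; local break-even in {breakeven_months:.1f} months")
```

At these prices the crossover lands around seven to eight months of continuous use; intermittent workloads flip the comparison back toward renting.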
A January 2026 study found consumer hardware breaking even against API pricing in 15 to 118 days at moderate volume.[55] </p><p>Cloud rental is cheaper for intermittent use; local hardware is cheaper for anything resembling a production workload. The caveat is organizational: buying a Mac is a hardware decision, but deploying it as inference infrastructure means retraining an engineering team that learned on CUDA and integrating devices that most IT departments have never managed at scale. The economics push practitioners toward owning hardware. NVIDIA&#8217;s product line pushes them toward owning someone else&#8217;s.</p><h2>What would have to break</h2><p>The segmentation thesis breaks down under three conditions.</p><p>First, NVIDIA ships a consumer GPU with 48 gigabytes or more of VRAM before the M5 Ultra arrives. A rumored RTX 5090 Super with 48 gigabytes of GDDR7 would close the gap for 70-billion-parameter models. If it arrives at the $2,000 to $2,500 price point with the RTX 5090&#8217;s 1,792 GB/s bandwidth, the value proposition against Apple Silicon reverses for that model tier. The DRAM shortage makes this unlikely before late 2026 at the earliest, but it remains the most direct competitive response.[56]</p><p>Second, NVIDIA re-enables NVLink or an equivalent high-speed interconnect on consumer cards. This would allow practitioners to pool VRAM across multiple GPUs at datacenter-comparable speeds. The incentive against this is structural: every consumer NVLink bridge sold is an H100 not rented. NVIDIA has moved in the opposite direction for three consecutive GPU generations.</p><p>Third, the CUDA moat extends into inference. If NVIDIA ships inference-specific optimizations &#8212; through TensorRT-LLM, NIM, or a CUDA-exclusive quantization format &#8212; that make the performance gap between CUDA and llama.cpp/MLX too large to ignore, practitioners return to NVIDIA hardware regardless of memory capacity. 
The DGX Spark&#8217;s CES 2026 software update, which delivered meaningful speedups through TensorRT-LLM and speculative decoding, suggests NVIDIA is pursuing this path.[57] But the update also demonstrated the strategy&#8217;s limitation: software optimizations can improve throughput within the bandwidth constraint, but cannot eliminate the constraint itself. At 273 GB/s, no amount of software makes the Spark faster than hardware with three times the bandwidth.</p><p>The most likely outcome is coexistence. NVIDIA dominates training and high-throughput production inference in the datacenter. Apple dominates personal and small-team local inference through memory capacity and ecosystem maturity. AMD competes on price at the entry tier. The local inference market grows despite NVIDIA&#8217;s product line, not because of it &#8212; because that product line is optimized for a $194 billion datacenter business that dwarfs any revenue a 48-gigabyte consumer card could generate.</p><p>Intuition, if not logic, points to a place Apple hasn&#8217;t been since discontinuing its Xserve rack-mounted servers in 2011.[58] Unified memory, Thunderbolt 5 clustering, MLX, and a silicon advantage at the inference layer add up to a server product that competes with DGX &#8212; not on training, but on private inference at enterprise scale. Tim Cook&#8217;s Apple is unlikely to re-enter the server market. But Cook&#8217;s potential successor is John Ternus, the SVP of Hardware Engineering, who already oversees the silicon and devices, and now the design teams that would build them.[59]</p><p>Jensen was right about one thing. This is the house that GeForce made. He just didn&#8217;t mention that some tenants had moved out, bought a Mac, and stopped paying rent.</p><div><hr></div><h3>Notes</h3><p>[1] Jensen Huang, GTC 2026 keynote, March 16, 2026, SAP Center, San Jose. 
Transcript confirmed by Yahoo Finance, heise.de, 36kr, and <a href="https://www.rev.com/">Rev.com</a>.</p><p>[2] NVIDIA Q4 FY2026 earnings press release (Form 8-K, EX-99.1), filed February 25, 2026, <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=1045810&amp;type=8-K">SEC EDGAR</a>. Datacenter revenue: $193.737 billion. Total revenue: $215.938 billion.</p><p>[3] NVIDIA CFO Commentary (Form 8-K, EX-99.2), filed February 25, 2026. Full-year GAAP gross margin: 71.1%. Non-GAAP: 71.3%. The Q3 FY2026 quarterly margin was 73.4%; the full-year figure was lower due to a $4.5 billion H20 inventory charge in Q1 related to China export restrictions.</p><p>[4] NVIDIA RTX 5090: 32GB, $3,500&#8211;4,800 street as of April 2026 (Newegg FE at $3,695, Amazon at $3,899, custom AIB models to $4,800; DRAM shortage has driven prices well above the $1,999 list price). NVIDIA RTX PRO 6000: 96GB, $7,999&#8211;8,900. NVIDIA H100 SXM: 80GB, approximately $25,000&#8211;40,000.</p><p>[5] NVIDIA GeForce RTX 5090 specifications: 32GB GDDR7 on 512-bit bus, 1,792 GB/s bandwidth. RTX 4090 delivered 1,008 GB/s on a 384-bit bus. Improvement: 78%. <a href="https://videocardz.com/">VideoCardz</a>; NVIDIA product page.</p><p>[6] Model sizes at Q4 quantization (approximate): Llama 3.3 70B &#8776; 35&#8211;40GB; Nemotron 3 Super 120B &#8776; 60GB; DeepSeek R1 671B &#8776; 336GB; Llama 3.1 405B &#8776; 203GB. Rule of thumb: BF16 &#8776; 2GB per billion parameters; Q4 &#8776; 0.5GB per billion parameters, plus overhead for KV cache. Note: NVIDIA&#8217;s NVFP4 format (available only on Blackwell GPUs via TensorRT-LLM) can compress a 70B model to approximately 18GB, fitting within the RTX 5090&#8217;s 32GB &#8212; but at a noticeable quality penalty compared to Q4, particularly on reasoning tasks. 
This is a partial escape hatch, not a full solution.</p><p>[7] VideoCardz analysis, citing @unikoshardware: NVIDIA Founders Edition design video showed RTX 5090 PCB labeled with K4VCF322ZC &#8212; a Samsung 3GB GDDR7 module part number. Samsung 2GB and 3GB GDDR7 modules share identical BGA footprints. B-tier source for PCB detail; Samsung module pin compatibility confirmed by Samsung semiconductor product catalog (A-tier).</p><p>[8] RTX 5090 Laptop GPU ships with 24GB (8&#215; 3GB GDDR7 modules). NVIDIA product specifications.</p><p>[9] GamersNexus RTX PRO 6000 Blackwell teardown, June 24, 2025. Confirmed 32 memory positions populated with Samsung 3GB GDDR7 modules (32 &#215; 3GB = 96GB). Die markings confirmed GB202-870-A1 variant. Note: a 48GB desktop consumer card using sixteen 3GB modules would increase DRAM power draw relative to the current sixteen 2GB configuration. Whether the existing VRM and thermal solution accommodate this without modification is unconfirmed &#8212; but the laptop SKU ships with 3GB modules at lower TDP, and the PRO 6000 runs thirty-two 3GB modules at 600W. The constraint is commercial, not physical.</p><p>[10] NVIDIA RTX PRO 6000 Blackwell: 96 GB GDDR7 ECC, 24,064 CUDA cores (188/192 SMs enabled), GB202-870-A1 die. $7,999 retail (Newegg as of March 2026); some configurations to $8,900. The PRO 6000 serves genuine non-AI workstation markets &#8212; CAD, simulation, film VFX &#8212; where ECC memory, ISV certification, and long-lifecycle support justify a premium over consumer cards. The 4&#215; price premium over the RTX 5090 is not pure segmentation, but the memory capacity gap (96GB vs. 32GB) is the feature most relevant to AI inference practitioners, and that gap is a product design choice. 
NVIDIA RTX PRO Blackwell GPU Architecture Whitepaper V1.0; <a href="https://www.thundercompute.com/blog/nvidia-rtx-pro-6000-blackwell-pricing">Thundercompute</a> pricing analysis (February 2026).</p><p>[11] TrendForce Q3 2025 DRAM contract pricing data, reported by XDA Developers: overall DRAM contract prices 171.8% higher year-over-year.</p><p>[12] NVIDIA GeForce RTX 3090, launched September 2020, supported NVLink via the NVLink Bridge accessory. Confirmed: NVIDIA product specifications; Best Buy product listing.</p><p>[13] Jensen Huang, press gaggle following RTX 4090 launch event, September 20, 2022. Reported by Chuong Nguyen, <a href="https://www.windowscentral.com/hardware/nvidia-geforce-rtx-4090-why-no-nvlink">Windows Central</a>, September 21, 2022. Verbatim: &#8220;The reason why we took [NVLink] out was that we needed the I/Os for something else, and so we use the I/O area to cram in as much AI processing as we could.&#8221;</p><p>[14] RTX 5090: no NVLink. ASUS TUF RTX 5090 spec page: &#8220;NVLink/Crossfire Support: No.&#8221; RTX PRO 6000 Blackwell: no NVLink. The official NVIDIA RTX PRO 6000 datasheet lists PCIe 5.0 x16 with no mention of NVLink; Thundercompute teardown analysis confirms communication is limited to the PCIe bus.</p><p>[15] PCIe 5.0 x16: approximately 64 GB/s per direction, 128 GB/s bidirectional. H100 NVLink: 900 GB/s bidirectional. Ratio: 7&#215;. B200 NVLink: 1,800 GB/s. NVIDIA datacenter GPU specifications.</p><p>[16] CloudRift benchmarks (October 2025, February 2026) comparing RTX 4090, RTX 5090, RTX PRO 6000, H100, H200, and B200 across multiple model sizes using vLLM. For large models requiring multi-GPU tensor parallelism, the single PRO 6000 outperformed multi-card consumer setups because its 96GB avoided PCIe communication entirely. The benchmarker noted: &#8220;consumer-grade GPUs lack NVLink, and tensor parallelism requires extensive PCIe communication, which becomes a bottleneck.&#8221; Dual RTX PRO 6000 vs.
single B200: both have 192GB, but B200 delivers up to 4.9&#215; the throughput of a single PRO 6000 in 8-GPU configurations at 8K+8K context. For a 2-GPU PRO 6000 setup, the gap narrows to roughly 3&#215; on short-context workloads (bandwidth ratio: B200 at 8,000 GB/s vs. dual PRO 6000 at ~3,000 GB/s after PCIe overhead) and widens to ~5&#215; on long-context workloads. <a href="https://cloudrift.ai/blog/benchmarking-rtx-gpus-for-llm-inference">cloudrift.ai</a>; <a href="https://cloudrift.ai/blog/benchmarking-b200">cloudrift.ai</a>.</p><p>[17] DGX Spark: Announced as &#8220;Project DIGITS&#8221; at CES 2025 (January 6, 2025) at &#8220;starting at $3,000.&#8221; Shipped October 15, 2025 at $3,999 (delayed from original May target). Price raised to $4,699 on February 23, 2026, per <a href="https://forums.developer.nvidia.com/">NVIDIA Developer Forums</a> announcement citing &#8220;worldwide constraints in memory supply.&#8221; Wccftech; WinBuzzer; NVIDIA Developer Forums.</p><p>[18] DGX Spark hardware specifications: 128GB LPDDR5x, 273 GB/s memory bandwidth. NVIDIA DGX Spark User Guide (<a href="https://docs.nvidia.com/dgx-spark/">docs.nvidia.com</a>). Mac Studio M4 Max: 546 GB/s (Apple specifications). Mac Studio M3 Ultra: 819 GB/s (Apple specifications). Bandwidth ratios: M4 Max/Spark = 2.0&#215;; M3 Ultra/Spark = 3.0&#215;.</p><p>[19] John Carmack, <a href="https://x.com/ID_AA_Carmack/status/1982831774850748825">X post</a>, October 27, 2025. Verbatim: &#8220;DGX Spark appears to be maxing out at only 100 watts power draw, less than half of the rated 240 watts, and it only seems to be delivering about half the quoted performance.&#8221; Note: 240W is the external power supply rating. NVIDIA documents the SoC TDP at 140W. Carmack&#8217;s comparison was directionally correct; the TDP distinction is worth noting.</p><p>[20] Awni Hannun, GitHub gist with DGX Spark microbenchmark results, October 2025. Approximately 60 TFLOPS in BF16 matrix operations.
Independent tester Lance Cleveland reproduced approximately 70 TFLOPS using Hannun&#8217;s methodology.</p><p>[21] NVIDIA Developer Blog, January 2026: &#8220;New Software and Model Optimizations Supercharge NVIDIA DGX Spark.&#8221; Headline claim: up to 2.6&#215; speedup. This peak figure applies to Qwen-235B on a dual DGX Spark configuration using NVFP4 and speculative decoding. Typical single-unit workloads (Qwen3-30B, Stable Diffusion 3.5 Large) saw 1.3&#8211;1.4&#215; improvements. StorageReview; HotHardware.</p><p>[22] NVIDIA positions the Spark alongside DGX Cloud: the product page features &#8220;DGX Spark + DGX Cloud&#8221; workflow integration. NVIDIA product page (<a href="https://www.nvidia.com/en-us/products/workstations/dgx-spark/">nvidia.com/dgx-spark</a>).</p><p>[23] EXO Labs, &#8220;Combining NVIDIA DGX Spark + Apple Mac Studio for 4x Faster LLM Inference with EXO 1.0,&#8221; <a href="https://blog.exolabs.net">blog.exolabs.net</a>, October 15, 2025. Configuration: two DGX Sparks (128GB, 273 GB/s, 100 TFLOPS FP16 each) + one Mac Studio M3 Ultra (256GB, 819 GB/s, 26 TFLOPS FP16). Benchmark: Llama-3.1 8B FP16, 8,192-token prompt, 32 output tokens. Measured speedup: 2.8&#215; over Mac Studio alone. The blog post headline claims &#8220;4&#215;&#8221; but this is a theoretical projection for longer contexts; the measured result is 2.8&#215;. Tom&#8217;s Hardware and Simon Willison both reported the measured figure. Note: all benchmark data originates from EXO Labs; no independent reproduction has been published. The 8B model used fits on each device individually; performance at 70B+ scales requiring combined memory has not been published.</p><p>[24] AWS and Cerebras disaggregated inference partnership announced March 13, 2026. Trainium3 chips handle compute-bound prefill; Cerebras CS-3 wafer-scale engines (44GB SRAM, 21+ PB/s internal bandwidth) handle bandwidth-bound decode. Connected via Amazon Elastic Fabric Adapter. 
Cerebras press release; AWS announcement.</p><p>[25] NVIDIA Rubin CPX announced GTC 2026. Compute-dense Rubin CPX processors for prefill, standard Rubin GPUs with HBM4 for decode, connected via NVLink 6.0. NVIDIA Developer Blog, &#8220;NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M+ Token Context Workloads,&#8221; March 2026.</p><p>[26] See note 11.</p><p>[27] Production cut: reported by Overclock3D, PC Gamer, Windows Central, and Igor&#8217;sLAB, all citing BoBantang/Benchlife. NVIDIA has not officially confirmed this figure. Igor&#8217;sLAB: &#8220;The reports of a significant reduction in GeForce GPU production are based exclusively on unofficial sources and have not been confirmed.&#8221;</p><p>[28] Overclock3D: NVIDIA reportedly considering discontinuing the 16GB RTX 5060 Ti variant due to GDDR7 cost escalation.</p><p>[29] At current DRAM prices, multiple analysts estimate that memory accounts for 70&#8211;80% of the bill of materials cost for high-VRAM consumer GPUs (the GPU die plus VRAM combined). Historically, VRAM accounted for 30&#8211;40% of the BOM. The inflation-era figure is specific to the current supply crisis. BuySellRam; Quasa.io analysis; VideoCardz.</p><p>[30] IDC, TeamGroup, and Counterpoint Research project DRAM shortages through 2027. Intel CEO and IEEE Spectrum analysis of new fab timelines suggest 2028 or beyond for full normalization. SK Hynix plans to boost DRAM production 8&#215; in 2026, which TweakTown notes &#8220;still won&#8217;t be enough.&#8221;</p><p>[31] Apple Silicon unified memory architecture: CPU, GPU, and Neural Engine share a single memory pool with zero-copy access. No discrete VRAM; all system memory is GPU-accessible. Apple technical documentation.</p><p>[32] Mac Studio with M3 Ultra: up to 256GB unified memory, 819 GB/s memory bandwidth. Starting at $5,599 for the 192GB configuration.
Apple product specifications (<a href="https://www.apple.com/mac-studio/">apple.com</a>).</p><p>[33] Mac Studio with M4 Max: up to 128GB unified memory, 546 GB/s memory bandwidth. The 128GB configuration requires the 16-core CPU / 40-core GPU chip variant and is $3,499 with a 512GB SSD, $3,699 with a 1TB SSD (apple.com; confirmed by B&amp;H Photo and PetaPixel review, March 2025). EU: &#8364;4,099 / &#8364;4,299. UK: &#163;3,599 / &#163;3,799.</p><p>[34] MacBook Pro with M5 Max: up to 128GB unified memory, 614 GB/s memory bandwidth. <a href="https://www.apple.com/newsroom/">Apple newsroom</a>, March 2026. Apple product specifications.</p><p>[35] Apple Machine Learning Research, &#8220;Exploring LLMs with MLX and the Neural Accelerators in the M5 GPU,&#8221; published November 19, 2025. Prompt processing (time-to-first-token) improvement: 3.33&#215; to 4.06&#215; across six tested models. Token generation improvement: 19&#8211;27%. Benchmarks conducted on base M5 vs. base M4 MacBook Pro (both 24GB configurations).</p><p>[36] Base M5 memory bandwidth: 153 GB/s. Base M4: 120 GB/s. Improvement: 28%. The 19&#8211;27% token-generation improvement, corresponding to a 28% bandwidth increase, confirms the memory-bandwidth-bound nature of LLM decode. Apple ML Research, ibid.</p><p>[37] MLX GitHub repository (<a href="https://github.com/ml-explore/mlx">github.com/ml-explore/mlx</a>): 23,900 stars as of March 2026. Version 0.31.1. Release frequency: approximately biweekly. MLX was first released in December 2023.</p><p>[38] Ollama v0.19.0: released March 27, 2026 (GitHub tag); blog post March 30, 2026 (<a href="https://ollama.com/blog/mlx">ollama.com/blog/mlx</a>). Performance claims: prefill 1,154 &#8594; 1,810 tok/s (57% improvement); decode 58 &#8594; 112 tok/s (93% improvement). These are Ollama-published figures. The MLX backend is described as a &#8220;preview&#8221; &#8212; at launch, only Qwen3.5-35B-A3B is supported. llama.cpp remains the backend for all other models. 
Full rollout expected Q2 2026. Methodological note: the benchmark compared NVFP4 quantization (MLX) against Q4_K_M (llama.cpp); part of the improvement reflects the difference in quantization format, not solely the backend change.</p><p>[39] Multiple sources describe Apple Silicon&#8217;s LLM advantage as initially incidental. Cult of Mac: &#8220;How Apple accidentally made the best AI computer.&#8221; XDA Developers: &#8220;Apple has a sleeper advantage when it comes to local LLMs.&#8221; One investment analyst quoted by a Substack: &#8220;The Mac mini M4 may be the most underanalyzed product in Apple&#8217;s lineup from an AI strategy perspective.&#8221;</p><p>[40] Apple Newsroom, March 2025: M3 Ultra announcement explicitly stated the chip enables running &#8220;LLMs with over 600 billion parameters.&#8221; Apple product marketing (<a href="https://www.apple.com/newsroom/">apple.com/newsroom</a>).</p><p>[41] macOS 26.2 Thunderbolt 5 clustering: enables pooled inference memory across multiple Mac Studios via RDMA. Demonstrated by EXO Labs and community builders. Awesome Agents reported Mac Studio clusters running trillion-parameter models for approximately $40,000 in hardware.</p><p>[42] AMD Ryzen AI Max+ 395 (Strix Halo): 128GB LPDDR5x unified memory, 256 GB/s theoretical bandwidth. Framework Desktop: $1,999 for 128GB configuration. Also available from Beelink GTR9 Pro and GMKtec EVO-X2 at similar prices. 31+ OEM devices announced at CES 2026. AMD product specifications; Framework blog.</p><p>[43] Measured bandwidth: approximately 212 GB/s (LLM Tracker benchmarks). Dense 70B model performance at Q4: 3&#8211;5 tok/s. LLM Tracker; Hardware Corner benchmarks.</p><p>[44] MoE model performance on Ryzen AI Max+ 395: 30B MoE at Q8 &#8776; 50 tok/s; Llama 4 Scout 109B &#8776; 15 tok/s. LLM Tracker; community benchmarks. 
These figures are from community testing and should be treated as approximate.</p><p>[45] AMD used Vulkan llama.cpp for GTC 2026 benchmark comparisons against DGX Spark. Community testers found that Vulkan via the RADV driver outperforms ROCm HIP on Strix Halo for many llama.cpp workloads. GitHub llama.cpp Vulkan performance discussions; AMD blog.</p><p>[46] Qualcomm Snapdragon X Elite: ARM-based SoC with LPDDR5x unified memory (up to 64GB on current configurations). The unified memory architecture is conceptually similar to Apple Silicon &#8212; all memory is GPU-accessible &#8212; but current configurations max out at 64GB, half the Apple and AMD offerings. Benchmark coverage for large LLM inference (70B+) on Snapdragon X Elite is sparse as of publication. The platform is primarily positioned for Windows laptops, not desktop workstations.</p><p>[47] CUDA training ecosystem dominance: PyTorch defaults to CUDA. DeepSpeed, Unsloth, and TRL require CUDA. Apple Silicon has MLX LoRA for basic SFT but lacks GRPO support. AMD ROCm is functional but substantially less mature. This is the consensus among practitioners, documented across multiple sources.</p><p>[48] CUDA-to-ROCm porting effort: 15&#8211;20% codebase modification, 3&#8211;6 months optimization, 10&#8211;20% initial performance penalty. HyperFRAME Research; Introl analysis.</p><p>[49] Jensen Huang, Q4 FY2026 earnings call, February 2026: stated &#8220;the agentic AI inflection point has arrived&#8221; and projected inference would eventually dwarf training in market size. NVIDIA earnings transcript.</p><p>[50] &#8220;Open Source, Closed Orbit: The Hardware Monopolist&#8217;s Guide to Owning Open Source,&#8221; The AI Realist (<a href="https://www.airealist.ai">www.airealist.ai</a>).
Framework: NVIDIA&#8217;s &#8220;Black Hole&#8221; model (centripetal, routing ecosystem gravity back to NVIDIA hardware) versus Hugging Face&#8217;s &#8220;Sun&#8221; model (centrifugal, hardware-agnostic).</p><p>[51] Privacy as a barrier to LLM adoption: 44% of organizations cited data privacy as the top concern in enterprise LLM deployment surveys. Multiple analyst reports corroborate this range; the specific 44% figure is from Cisco&#8217;s 2024 Data Privacy Benchmark Study, the most recent large-sample study available. HIPAA penalty: maximum $2.1 million per violation category per year under the HITECH Act tiered penalty structure.</p><p>[52] US CLOUD Act compelled disclosure provision (18 U.S.C. &#167; 2713): requires providers of electronic communication or remote computing services subject to US jurisdiction to produce data in their &#8220;possession, custody, or control&#8221; regardless of data location. For a detailed trace of the legal pathway and its implications for cloud-hosted AI workloads, see &#8220;Access, Disable, Destroy,&#8221; The AI Realist (<a href="https://www.airealist.ai">www.airealist.ai</a>). EU Data Act (Regulation 2023/2854) entered into application on September 12, 2025.</p><p>[53] The sovereignty premium framing draws on the cost comparison structure throughout this piece. NVIDIA options for private 70B inference: RTX 5090 (32GB, cannot run the model), DGX Spark ($4,699, runs at ~3&#8211;5 tok/s on 70B), cloud rental ($2&#8211;5/hr, data leaves the building). Apple option: Mac Studio M4 Max ($3,699 with 1TB SSD; runs 70B at Q4, ~8&#8211;15 tok/s; data stays local). 
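The throughput and capacity figures in these notes hang together under two rules of thumb stated earlier: Q4 weights take roughly 0.5 GB per billion parameters (note [6]), and decode is memory-bandwidth-bound (note [36]), so tokens per second for a dense model is capped by bandwidth divided by the bytes read per token. A rough sketch; the helper functions are mine, and real throughput falls below these ceilings because of KV-cache reads and runtime overhead:

```python
# Rules of thumb from notes [6] and [36]: Q4 weights ~0.5 GB per billion
# parameters, and dense-model decode speed is capped by memory bandwidth
# divided by bytes read per token. Illustrative upper bounds only.

def weight_gb(params_b: float, gb_per_b_params: float = 0.5) -> float:
    """Approximate weight footprint at Q4 quantization, ignoring KV cache."""
    return params_b * gb_per_b_params

def decode_ceiling_tok_s(bandwidth_gb_s: float, params_b: float) -> float:
    """Upper bound on dense-model decode tokens/sec at Q4."""
    return bandwidth_gb_s / weight_gb(params_b)

print(weight_gb(70))                  # ~35 GB: too big for a 32GB RTX 5090
print(decode_ceiling_tok_s(546, 70))  # Mac Studio M4 Max ceiling: ~15.6 tok/s
print(decode_ceiling_tok_s(273, 70))  # DGX Spark ceiling: ~7.8 tok/s
```

The measured figures quoted in this piece (3&#8211;5 tok/s on the Spark, 8&#8211;15 tok/s on the M4 Max for 70B at Q4) sit below these ceilings, as expected.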
The price delta between the cheapest NVIDIA option that works (Spark at $4,699) and the Apple option ($3,699) is $1,000, and the performance delta (2&#8211;3&#215; faster on Apple at the bandwidth-bound decode step) means the effective cost of NVIDIA sovereignty is higher than the sticker price suggests.</p><p>[54] Cloud RTX 5090: $0.69/hr on RunPod community cloud (March 2026 pricing). At 24/7 utilization: $0.69 &#215; 24 &#215; 30 &#8776; $497/month, or approximately $6,000/year. <a href="https://www.runpod.io/pricing">RunPod</a>.</p><p>[55] Knoop and Holtmann, &#8220;Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs,&#8221; <a href="https://arxiv.org/abs/2601.09527">arXiv</a>, January 2026. Found consumer GPU electricity-only inference costs of $0.001&#8211;0.04 per million tokens; break-even against API pricing at 15&#8211;118 days at moderate volume (30 million tokens/day).</p><p>[56] RTX 5090 Super with 48GB GDDR7: widely rumored based on Samsung 3GB GDDR7 module availability and PCB compatibility. Launch reportedly slipped to Q3 2026 or later due to DRAM supply constraints. GameGPU; TweakTown; VideoCardz. Unconfirmed by NVIDIA.</p><p>[57] See note 21. The CES 2026 software update for DGX Spark focused on TensorRT-LLM optimizations and speculative decoding &#8212; CUDA-exclusive techniques that do not benefit Apple Silicon or AMD platforms.</p><p>[58] Apple Xserve: rack-mounted 1U server sold from 2002 to January 31, 2011. When a customer complained about the discontinuation, Steve Jobs replied, &#8220;Hardly anyone was buying them.&#8221; Apple suggested migrating to the Mac Pro Server or the Mac mini Server. Apple does run server-side inference today via Private Cloud Compute (PCC), announced at WWDC 2024 &#8212; but PCC serves Apple&#8217;s own services (Apple Intelligence), not enterprise customers. A rack-mounted inference product for sale would be a fundamentally different market entry.
<a href="https://en.wikipedia.org/wiki/Xserve">Wikipedia</a>; Macworld, November 5, 2010.</p><p>[59] John Ternus, Apple SVP Hardware Engineering, age 50. Bloomberg (Mark Gurman, March 2026), NYT (January 2026), and multiple outlets identify him as the leading candidate to succeed Tim Cook as CEO. In January 2026, Cook expanded Ternus&#8217;s role to include oversight of hardware and software design teams, robotics, and product marketing &#8212; in addition to his existing responsibility for all hardware engineering, including iPhone, iPad, Mac, and AirPods. Ternus was the face of the MacBook Neo launch, a role Cook has historically reserved for himself.</p>]]></content:encoded></item><item><title><![CDATA[Open Source Won, But Changed Owners]]></title><description><![CDATA[Smaller, cheaper, open &#8212; and the base layer is up for grabs.]]></description><link>https://www.airealist.ai/p/open-source-won-but-changed-owners</link><guid isPermaLink="false">https://www.airealist.ai/p/open-source-won-but-changed-owners</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Thu, 02 Apr 2026 17:25:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8sYH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8sYH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!8sYH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8sYH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4206815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192980456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8sYH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!8sYH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a98cedb-74e3-427f-89d8-7ba10ae4cd12_2816x1584.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In October 2021, I published a blog post on Hugging Face titled &#8220;Large Language Models: A New Moore&#8217;s Law?&#8221; I had just joined as Chief Evangelist. Microsoft and Nvidia unveiled the long-forgotten Megatron-Turing NLG 530B, and the industry celebrated. I was not. The post argued that the mega-model arms race was a dead end &#8212; exponential parameter growth producing diminishing returns at escalating cost. Use pretrained models. Use smaller models. Fine-tune. Optimize. The reactions ranged from skeptical to hostile.[1]</p><p>Almost five years later, the most downloaded open model family in the world is Qwen, and the models driving those numbers are the 7-billion- and 14-billion-parameter variants, fine-tuned into over 200,000 derivatives, running on laptops and single GPUs.[2] Alibaba&#8217;s single-month downloads in December 2025 exceeded the combined total of the next eight most popular model families.[3]</p><p>The model that prompted that blog post &#8212; Megatron-Turing NLG 530B, trained on 560 DGX A100 servers at a cost I estimated in the tens of millions &#8212; scored roughly 87% on the commonsense reasoning benchmark HellaSwag, state of the art at the time.[4] That benchmark is now too easy to report. Modern model evaluations have moved on to harder tests &#8212; MMLU, GPQA, AIME, SWE-bench &#8212; because models saturated HellaSwag years ago. Qwen3-14B scores 81% on MMLU, handles context windows over fifteen times longer natively, and runs on a consumer GPU.[5] It has 38 times fewer parameters.
A 4-billion-parameter model, fine-tuned for a specific task, can match or exceed a 120-billion-parameter teacher.[6]</p><p>The scaling laws that justified the mega-model trend were not wrong: they were misapplied. DeepMind&#8217;s Chinchilla paper, published five months after my blog post, proved it: the field had been building models that were too big and undertrained.[7] The answer was never more parameters. It was more data, better training recipes, and relentless engineering optimization. The scaling race produced useful research: distillation requires a teacher, and you cannot compress knowledge that was never learned. But the blog post wasn&#8217;t about research. It was about the industry consensus that bigger was the only path forward, and that anyone who couldn&#8217;t afford hundreds of DGX servers was locked out. That consensus was wrong. The practitioners who fine-tuned, optimized, and compressed were the ones who put AI into production. The scaling-era mega-models produced papers. The small models produced products.</p><p>But the path from that blog post to the current landscape also passed through a French supercomputer, a $300 training run, a Bulgarian C++ project with zero dependencies, and a geopolitical shift that none of us anticipated.</p><p>The open model revolution happened in three phases. The first was idealistic. The second was corporate. The third was geopolitical. At every stage, the engineering that made models runnable on real hardware mattered more than the architecture that made them intelligent. 
And at the end of the arc, the movement built to democratize AI handed ecosystem leadership to a handful of Chinese labs, not because the system failed, but because it worked exactly as designed.</p><h2>The Fork</h2><p>The modern history of open models begins with a fork.</p><p>In 2017, Google researchers published &#8220;Attention Is All You Need,&#8221; introducing the Transformer architecture.[8] Within eighteen months, two teams had taken it in opposite directions. OpenAI built GPT-1 in June 2018 &#8212; a decoder-only, generative model, designed to predict the next token.[9] Google built BERT in October 2018 &#8212; an encoder-only, bidirectional model, designed to understand context from both directions simultaneously.[10]</p><p>The critical difference was not architectural. It was strategic. Google released BERT&#8217;s weights, code, and training methodology to the public. BERT became &#8220;ubiquitous.&#8221;[11] Thousands of papers have been built on it. The Hugging Face Transformers library, which would become the connective tissue of the open ecosystem, was initially built to make BERT accessible.[12] OpenAI chose differently. GPT-2, arriving in February 2019 with 1.5 billion parameters, was initially withheld &#8212; &#8220;too dangerous to release.&#8221;[13] GPT-3, with 175 billion parameters, launched in June 2020 as an API-only service. No weights. No code. The most powerful language model in the world was a black box you rented by the token.[14] One path created a research ecosystem. The other created a product category. 
And for the next two years, the closed path looked like the future &#8212; GPT-3&#8217;s in-context learning was genuinely stunning, the API model made it accessible, and the consensus formed that scale required resources held by only a handful of organizations.</p><h2>The Democratic Experiment</h2><p>The first serious attempts to build open alternatives to GPT-3 were driven by principle as much as engineering.</p><p>EleutherAI, a grassroots collective of researchers, released GPT-J (6B parameters) in 2021 and GPT-NeoX-20B in 2022 &#8212; open-source models explicitly designed as alternatives to the closed GPT series.[15] They were credible for their size, but they were not competitive with GPT-3. The gap was real. Meta&#8217;s OPT-175B, released in May 2022, was the first open model to match GPT-3&#8217;s parameter count, accompanied by code, training logbook, and weights &#8212; a deliberate act of transparency.[16] It narrowed the gap further but didn&#8217;t close it.</p><p>The most ambitious attempt was BLOOM. Led by Hugging Face through the BigScience workshop, BLOOM was a 176-billion-parameter multilingual model trained by over 1,000 researchers across hundreds of institutions, on the Jean Zay supercomputer funded by the French government.[17] Training ran from March to July 2022 on 384 A100 GPUs. It covered 46 natural languages and 13 programming languages, and it was released under the Responsible AI License.[18]</p><p>BLOOM was the high-water mark of the &#8220;open AI as public good&#8221; vision &#8212; multinational, volunteer-driven, publicly funded, transparently documented. It was also, honestly, a mixed result. 
The model performed competitively on multilingual benchmarks, but it did not match the best proprietary models on English-language tasks, and its adoption never reached the critical mass that BERT had achieved.[19] The ambition was extraordinary and the execution was genuine (I was at Hugging Face for the entire project), but the lesson was uncomfortable: a thousand researchers and a government supercomputer could produce a model that was good, not a model that was dominant.[20] Scale required not just compute but the kind of ruthless iteration on data quality and training recipes that a consensus-driven research workshop struggled to achieve.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gZry!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gZry!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 424w, https://substackcdn.com/image/fetch/$s_!gZry!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 848w, https://substackcdn.com/image/fetch/$s_!gZry!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 1272w, https://substackcdn.com/image/fetch/$s_!gZry!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gZry!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png" width="728" height="386" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:772,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:275730,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192980456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gZry!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 424w, https://substackcdn.com/image/fetch/$s_!gZry!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 848w, https://substackcdn.com/image/fetch/$s_!gZry!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gZry!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1111021b-5eed-4014-a4ca-4424cdc7f2b5_3000x1590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">MMLU scores over time - <a href="https://www.julien.org/assets/airealist/open-models-chart.png">Download full-resolution version</a></figcaption></figure></div><p>Then, in November 2022, OpenAI released ChatGPT. And the competition's terms changed overnight.</p><h2>The $300 Proof</h2><p>ChatGPT did not introduce new capabilities.
GPT-3 could already do most of what ChatGPT demonstrated. ChatGPT&#8217;s innovation was the interface &#8212; a conversational wrapper around instruction-tuned GPT-3.5 that made language models comprehensible to non-technical users. It reached 100 million users within two months.[21] The industry panicked. Every organization wanted a ChatGPT-equivalent, and the closed model providers held all the keys.</p><p>Meta had been moving toward open release for months. In February 2023, it announced LLaMA &#8212; a family of models from 7 to 65 billion parameters that outperformed GPT-3 on most benchmarks despite being dramatically smaller.[22] The initial release was restricted to approved researchers under a non-commercial license, but within a week, the weights appeared on 4chan via BitTorrent.[23] The leak made headlines, but it was less consequential than it seemed at the time. Meta was already heading toward open commercial release &#8212; Llama 2 arrived with a commercial license just five months later.[24] The engineering community that would build on LLaMA didn&#8217;t need the leak; it needed the model, and Meta would provide it.</p><p>What mattered more than the leak was what happened in the two weeks after. On March 13, Stanford released Alpaca &#8212; LLaMA 7B fine-tuned on 52,000 instruction-following examples generated by GPT-3.5. Total training cost: under $600. Performance: comparable to text-davinci-003 on the team&#8217;s evaluation.[25] Two weeks later, a group from UC Berkeley, CMU, Stanford, and UCSD released Vicuna-13B &#8212; LLaMA fine-tuned on 70,000 conversations scraped from ShareGPT. Cost: approximately $300. A preliminary evaluation using GPT-4 as a judge rated it at 90% of ChatGPT&#8217;s quality.[26]</p><p>These were not frontier models. The benchmarks were rough, the evaluations informal, and the comparison to ChatGPT was generous. 
But the structural revelation was precise: closing the gap between a base model and a useful chatbot did not require the billions of dollars the industry had assumed. It took a weekend, a few hundred dollars, and a clever dataset. The expensive part was pretraining. The valuable part &#8212; instruction following, conversational fluency, task completion &#8212; could be added cheaply.</p><p>This finding validated the core argument of the October 2021 blog post, though not in the way I had anticipated. I had argued for using smaller pretrained models and fine-tuning them. Alpaca and Vicuna proved the economics. A 7B model, fine-tuned on synthetic data for the cost of a plane ticket, could approximate a frontier product. The mega-model consensus didn&#8217;t just produce diminishing returns. It had been solving the wrong problem. The bottleneck was never raw intelligence at the base layer. It was the efficiency of the last mile: instruction tuning, alignment, and the engineering to make inference fast on real hardware.</p><h2>The Engineering Revolution</h2><p>The models were necessary. The engineering was sufficient.</p><p>In March 2023 &#8212; two weeks after the LLaMA release &#8212; Georgi Gerganov published llama.cpp: a pure C/C++ implementation of LLaMA inference with zero dependencies.[27] It ran on the CPU. No GPU required. No Python. No PyTorch. No CUDA. A laptop could run a 7-billion-parameter language model. By August 2023, Gerganov&#8217;s project had introduced the GGUF file format &#8212; a self-contained binary that bundled model weights, tokenizer, and metadata into a single downloadable file.[28] When a new model dropped on Hugging Face, GGUF-quantized versions appeared within hours. GGUF became the de facto standard for distributable AI models.
As of early 2026, llama.cpp has over 85,000 GitHub stars and supports dozens of architectures.[29]</p><p>Quantization &#8212; compressing model weights from 16-bit floating point to 4-bit integers with manageable quality loss &#8212; was the technical mechanism that made local inference possible. The GGUF K-quant variants use mixed-precision per layer, allocating more bits to the layers that matter most. The practical sweet spot, Q4_K_M &#8212; a 4-bit mixed-precision quantization scheme &#8212; retains roughly 92% of the original model&#8217;s quality while reducing size by 75%.[30] A 7B model that requires 14 GB in full precision fits comfortably in 4 GB quantized. A 70B model that needs 140 GB fits in 40 GB &#8212; within reach of a Mac Mini with unified memory.</p><p>Flash Attention, published by Tri Dao in 2022, was equally consequential and even less glamorous.[31] By rewriting the attention computation to be IO-aware &#8212; minimizing memory reads and writes rather than raw floating-point operations &#8212; Flash Attention delivered 2&#8211;4&#215; faster attention with a lower memory footprint. It enabled longer context lengths on the same hardware. It was adopted across virtually every major framework. It was added to llama.cpp in April 2024.[32] If you use a language model in 2026, you are almost certainly benefiting from Flash Attention, whether you know it or not.</p><p>Continuous batching, pioneered by vLLM&#8217;s PagedAttention system in 2023, transformed inference serving.[33] By managing the key-value cache &#8212; the memory that stores the context of a conversation &#8212; like virtual memory pages, vLLM dramatically improved throughput for concurrent requests &#8212; the reason inference providers can serve thousands of users per GPU and offer competitive per-token pricing. 
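<p>The memory figures quoted above follow from simple arithmetic. Here is a back-of-the-envelope check of my own, counting weight storage only (the KV cache, activations, and file metadata add overhead on top):</p>

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage: parameter count times bits per weight, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 7B model at 16-bit floating point: the 14 GB figure cited above.
print(weight_memory_gb(7, 16))    # 14.0
# The same model at a flat 4 bits per weight.
print(weight_memory_gb(7, 4))     # 3.5
# K-quants such as Q4_K_M average slightly more than 4 bits per weight,
# since some layers keep higher precision, so real files land near 4 GB.
print(weight_memory_gb(70, 16))   # 140.0
print(weight_memory_gb(70, 4))    # 35.0
```

<p>The 70B numbers show why a quantized 70B model lands in unified-memory-Mac territory while the 16-bit original does not.</p>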
Speculative decoding &#8212; using a small draft model to generate candidate tokens, which are then verified by the larger model &#8212; added another 2&#8211;3&#215; speedup for interactive use cases.[34]</p><p>None of these innovations changed what models could do. All of them changed who could run models and at what cost. Flash Attention, quantization, llama.cpp, GGUF, continuous batching, speculative decoding: these are the unglamorous infrastructure achievements that made the open model revolution a reality rather than a research curiosity. The Transformer architecture matters. But the engineering that makes Transformers affordable on real hardware is what differentiated the open model ecosystem from the closed one.</p><h2>Same Knowledge, Faster Speed</h2><p>Mixture-of-Experts was the architectural innovation that made open models economically viable at the frontier scale.</p><p>The concept is straightforward: instead of activating all parameters for every token, route each token through a subset of specialized &#8220;expert&#8221; subnetworks. The model retains the knowledge capacity of its full parameter count but runs at the inference cost of its active parameter count. 
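<p>A toy top-k router makes the mechanism concrete. This is an illustrative sketch of my own, not any lab's implementation; the eight experts and k=2 are assumed example values:</p>

```python
import math

def top_k_route(gate_logits, k):
    """Score each expert for this token, keep the k best, and softmax-normalize
    their gate weights. Only the chosen experts' feed-forward blocks run."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    z = [math.exp(gate_logits[i] - m) for i in top]
    s = sum(z)
    return [(i, w / s) for i, w in zip(top, z)]

# 8 experts, 2 active per token: the router picks experts 1 and 4 here,
# so roughly 2/8 of the expert parameters do work for this token.
routes = top_k_route([0.1, 2.0, -1.3, 0.7, 1.5, 0.0, -0.2, 0.9], k=2)
print([i for i, _ in routes])      # [1, 4]
print(sum(w for _, w in routes))   # ~1.0 after renormalization
```

<p>Knowledge capacity scales with all of the experts' weights; per-token compute scales with the ones that fire. That is the total-versus-active split in code.</p>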
MoE had been explored in research for years, but Mistral&#8217;s Mixtral 8x7B, released in January 2024 under an Apache 2.0 license, was the first widely adopted open MoE.[35] With 46.7 billion total parameters and roughly 12.9 billion active per token, it competed with Llama 2 70B at a fraction of the inference cost.</p><p>DeepSeek V3, released in December 2024, scaled MoE to 671 billion total parameters with 37 billion active &#8212; and reportedly trained for approximately $5.5 million, a figure that stunned the industry.[36] Qwen 3, released in April 2025, deployed a 235-billion-parameter MoE with 22 billion active parameters across 119 languages.[37] Arcee AI&#8217;s Trinity Large, released in January 2026 and upgraded to a full reasoning model in April, pushed sparsity further: 400 billion total parameters, 13 billion active, 256 experts with 4 active per token. A 30-person U.S. startup trained a frontier-class model in 33 days on 2,048 Nvidia B300 GPUs for approximately $20 million &#8212; and its reasoning variant now ranks as the number one open model in the U.S. on OpenRouter.[38]</p><p>MoE dissolved the equation that had defined the scaling era: bigger meant better meant more expensive to run. A 400B MoE with 13B active parameters runs 2&#8211;3&#215; faster than a comparable dense model on the same hardware.[39] The knowledge is in the total parameter count. The cost is in the active parameter count. All parameters still load into memory &#8212; the hardware requirement doesn&#8217;t shrink proportionally &#8212; but if the model fits, the speed advantage is dramatic. This is why open models can compete at frontier scale &#8212; the effective inference cost is an order of magnitude less than the headline parameter count suggests.</p><p>Qwen3-30B-A3B makes the point concrete. Thirty billion total parameters, 3 billion active per token. 
It runs at 196 tokens per second on a single RTX 4090 &#8212; faster than 8-billion-parameter dense models &#8212; with quality competitive with models five times its active size.[40] The model that prompted the October 2021 blog post had 530 billion parameters and required 560 DGX A100 servers. This one fits on a gaming GPU.</p><h2>The Architecture That Didn&#8217;t Die</h2><p>Every year since 2023, someone has announced the Transformer's death. It hasn&#8217;t happened.</p><p>Mamba, published by Albert Gu and Tri Dao in December 2023, introduced a selective state space model that achieved linear-time inference &#8212; compared to the Transformer&#8217;s quadratic scaling with sequence length &#8212; and demonstrated 5&#215; throughput gains on long sequences.[41] The LinkedIn posts wrote themselves. AI21 Labs shipped Jamba in March 2024, the first production-grade hybrid: interleaving Mamba and Transformer layers at a 1:7 ratio, with MoE on top, achieving 256K-token context on a single 80GB GPU.[42] IBM built Granite 4.0 on a hybrid SSM architecture. Nvidia&#8217;s Nemotron-H replaced 92% of attention layers with Mamba2 blocks and demonstrated 3&#215; throughput over comparable Transformers.[43]</p><p>The hybrids are real, and in specific deployment niches &#8212; such as long-context inference and memory-constrained edge devices &#8212; they are superior. But they haven&#8217;t displaced the Transformer for general-purpose language modeling. IBM&#8217;s own evaluation found that pure SSM models still fall behind on tasks requiring strong associative recall or in-context learning.[44] The practical verdict, as of early 2026, is that architecture choice has become a deployment decision rather than a research religion. Pure Transformers dominate general-purpose tasks. Hybrids win on efficiency for specific workloads. 
Nobody who bet their company on &#8220;Mamba kills Transformers&#8221; has been rewarded.</p><p>Meanwhile, the Transformer itself kept evolving, quietly, incrementally. Grouped-Query Attention reduced KV-cache memory.[45] Multi-Head Latent Attention, introduced by DeepSeek, further compressed it.[46] Rotary Position Embeddings enabled flexible context lengths. SwiGLU replaced GELU in feed-forward layers. Each refinement was modest on its own. Together, they compounded into models dramatically more efficient than GPT-3&#8217;s architecture while remaining recognizably Transformers.</p><p>Sebastian Raschka, surveying the architectural landscape in mid-2025, noted the structural similarity: &#8220;At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024&#8211;2025), one might be surprised at how structurally similar these models still are.&#8221;[47] Seven years of refinement. Same fundamental architecture. The improvements that mattered most happened below the architecture &#8212; in the attention kernels, memory management, quantization schemes, and serving infrastructure. The building stayed the same. The plumbing was rebuilt from scratch.</p><h2>Ten Lines of Code</h2><p>The gap between &#8220;downloading a base model&#8221; and &#8220;having a useful model&#8221; collapsed.</p><p>In 2023, fine-tuning a language model required substantial ML expertise, custom training loops, and significant compute. LoRA (Low-Rank Adaptation), published by Hu et al. in 2021, had introduced the idea of training small adapter weights instead of the full model, but the tooling was immature.[48] By early 2024, Hugging Face&#8217;s TRL library had matured to the point where supervised fine-tuning, DPO, RLHF, and PPO were available as high-level Python APIs.
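<p>The parameter economics behind LoRA reduce to simple counting. A hand-rolled illustration (the 4096-wide projection and rank 8 are assumed example values, not figures from the paper; libraries such as PEFT wrap this same idea):</p>

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Full fine-tuning updates the whole d_out x d_in matrix; LoRA instead
    trains two low-rank factors, B (d_out x rank) and A (rank x d_in)."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter

# A single 4096 x 4096 projection, typical of a 7B model, at rank 8:
full, adapter = lora_param_counts(4096, 4096, 8)
print(full, adapter)            # 16777216 65536
print(f"{adapter / full:.2%}")  # 0.39% of the matrix is trainable
```

<p>Gradients and optimizer state are needed only for the adapter, which is why a consumer GPU that cannot hold full fine-tuning state handles LoRA easily.</p>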
A DPO training run on a 7B model: roughly ten lines of code.[49]</p><p>Unsloth pushed this further &#8212; 2&#215; faster fine-tuning with 70% less memory, enabling QLoRA (quantized LoRA) fine-tuning of 70B models on a single consumer GPU in hours rather than days.[50] The Unsloth community became a prolific source of quantized model variants on Hugging Face. The barrier between &#8220;I have an idea for a specialized model&#8221; and &#8220;I have a specialized model&#8221; shrank to an afternoon and a free Colab notebook.</p><p>This commoditization is the structural mechanism behind the ecosystem&#8217;s explosive growth. When post-training is cheap and accessible, the base model becomes the platform, and the fine-tune becomes the application. Hugging Face adds 1,000 to 2,000 new models per day.[51] The Qwen family alone has spawned over 200,000 derivative models.[52] Most of these derivatives exist because the tooling to create them requires neither deep ML expertise nor significant compute. The dynamic mirrors the mobile app explosion &#8212; free development tools, low barriers to entry, but faster.</p><p>Arcee AI&#8217;s trajectory illustrates the consequence. The company initially built its business on post-training other people&#8217;s base models for enterprise clients &#8212; taking Llama, Mistral, or Qwen and customizing them. But as post-training commoditized, the defensible value shifted to the base model layer. Arcee&#8217;s decision to build Trinity Large from scratch was driven by the realization that, if anyone can fine-tune, the moat lies in pretraining &#8212; and that U.S. 
enterprise clients were increasingly uncomfortable depending on Chinese base models.[53]</p><h2>The Laptop as Inference Server</h2><p>The open model revolution would have remained academic without a parallel revolution in consumer hardware.</p><p>Apple Silicon&#8217;s unified memory architecture &#8212; where CPU and GPU share the same memory pool &#8212; eliminated the PCIe bottleneck that made GPU inference on consumer machines impractical for large models.[54] A Mac Studio with 192 GB of unified RAM can hold a quantized 70B model entirely in memory. Apple&#8217;s MLX framework, released in December 2023, provided a native array library optimized for this architecture, enabling both inference and fine-tuning on Macs.[55]</p><p>Ollama reduced the installation and execution of local models to two commands &#8212; install, then run. LM Studio provided a GUI for browsing, downloading, and comparing models side-by-side. Both are built on llama.cpp as their inference backend, meaning one person&#8217;s C++ project serves as the runtime layer for most of the local inference movement. vLLM powered production serving on GPUs. By 2026, a common practitioner pipeline is LM Studio for evaluation, Ollama for development, and vLLM for production.[56] Any developer can run Qwen 3 14B on a Mac Mini without ever sending a token to a cloud API.</p><p>This is not a convenience story. It is a sovereignty story. When inference runs locally, the cloud layer of the coercion stack &#8212; the switch that allows a provider or a government to suspend service &#8212; is bypassed entirely.[57] The practitioner who runs a quantized model on consumer hardware depends on nobody&#8217;s continued willingness to serve them. The model is a file. The runtime is open source. 
The hardware is owned.</p><p>One chokepoint remains: the model was downloaded from a platform &#8212; typically Hugging Face &#8212; that can be compelled to remove it, as Meta demonstrated with DMCA takedowns during the original LLaMA leak. Sovereignty is real only for models already on disk. But once downloaded, the dependency chain ends. This is the most complete form of AI sovereignty available to an individual or a small organization, and it exists because Georgi Gerganov wrote a C++ inference engine, Tri Dao rewrote the attention kernel, and a generation of engineers figured out how to compress 16-bit weights to 4 bits without destroying the model&#8217;s capabilities.</p><h2>The Shift</h2><p>By the end of 2025, the open model ecosystem had achieved its original goal and exceeded it. Open-weight models rivaled or matched closed models on most standard benchmarks. Inference ran locally on consumer hardware. Post-training was accessible to anyone with a laptop and an afternoon to spare. Hugging Face hosted over 2 million public models. The revolution was complete.[58]</p><p>It also produced an outcome no one intended.</p><p>DeepSeek R1, released in January 2025 under an MIT license, achieved reasoning performance comparable to OpenAI&#8217;s o1 at a fraction of the cost.[59] It was the most impactful single model release of the year &#8212; it briefly moved Nvidia&#8217;s stock price and triggered a scramble among every major lab to release competing reasoning models. Its distilled variants ran on consumer GPUs. Qwen 3, released in April 2025, covered 119 languages under an Apache 2.0 license and became the world's most fine-tuned base model family.[60] By October 2025, Qwen had overtaken Llama in cumulative downloads on Hugging Face.[61] By December, Chinese-origin models accounted for 63% of all new fine-tuned or derivative models uploaded to the platform.[62]</p><p>This shift happened on merit. Qwen&#8217;s multilingual support is broader than Llama&#8217;s. 
DeepSeek&#8217;s MoE architecture is genuinely more efficient. The Apache 2.0 and MIT licenses are more permissive than Llama&#8217;s acceptable use policy. Chinese labs iterated faster, released more model sizes, and better served the global developer community&#8217;s needs than their Western counterparts. This is, by any measure, extraordinary open-source engineering &#8212; the kind of sustained execution that Llama&#8217;s head start should have made impossible. Meta&#8217;s Llama 4, launched in April 2025, received mixed reviews and failed to recapture ecosystem momentum, as measured by community adoption.[63]</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AXAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AXAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 424w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 848w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 1272w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AXAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png" width="1456" height="772" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153057,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192980456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AXAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 424w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 848w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 1272w, https://substackcdn.com/image/fetch/$s_!AXAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b964542-f72c-4d45-b5f4-b340d7805cf5_3000x1590.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">MMLU Pro scores over time - <a href="https://www.julien.org/assets/airealist/mmlu-pro-chart.png">Download full-resolution version</a></figcaption></figure></div><p>The structural consequence is that the foundation layer of the global open AI ecosystem &#8212; the base models on which hundreds of thousands of derivatives are built &#8212; is now predominantly produced by Chinese labs. This isn&#8217;t a conspiracy, and it isn&#8217;t a failure of open source.
It is an emergent property of a system where adoption is driven by capability and licensing terms, and where the labs that iterated fastest and released most permissively happened to be in China.</p><p>The dependency is not abstract: content restrictions embedded in a Chinese base model persist by default during fine-tuning unless specifically removed; training data composition reflects the priorities of Chinese regulatory and commercial environments; and when 63% of derivatives share a foundation, the assumptions baked into that foundation propagate across the entire ecosystem.</p><p>The responses are forming, though none yet constitute a trend. Arcee AI framed Trinity Large as &#8220;a permanently open, Apache-licensed, frontier-grade alternative&#8221; built in the U.S. &#8212; and its CTO described the reasoning model, released this week, as &#8220;the strongest open model ever released outside of China.&#8221;[64] One startup does not make an ecosystem &#8212; but the claim is no longer aspirational. OpenAI released GPT-OSS in the summer of 2025. Google released Gemma 4 under an Apache 2.0 license the day this piece was published, the first time the Gemma family has shipped with a fully permissive license (MMLU / MMLU Pro scores will be added when they&#8217;re available).</p><p>The question none of them has answered: who pays for pretraining when the weights are free? Meta gives Llama away to drive platform adoption. Alibaba uses Qwen to drive Alibaba Cloud consumption. DeepSeek is funded by a quantitative trading fund. Arcee charges for API access. 
Every open model lab has a different subsidy structure, and none has proven that open-weight pretraining is a self-sustaining business.</p><p>As Hugging Face&#8217;s Spring 2026 State of Open Source report noted, Western organizations are now urgently seeking commercially deployable alternatives to Chinese models &#8212; a reversal of the dynamic that defined the field just two years earlier.[65] And as this piece went to publication, reports emerged that key members of the Qwen team &#8212; including lead researcher Junyang Lin &#8212; had resigned from Alibaba, a development that illustrates just how fragile ecosystem concentration at the base model layer can be.[66]</p><h2>What Breaks</h2><p>The open model ecosystem is simultaneously the most democratic and the most concentrated it has ever been. The tools are available to everyone. The base models are built by a handful of labs, most of which are funded by Chinese technology conglomerates or operate within the Chinese regulatory environment.</p><p>For the ecosystem to structurally rebalance, Western base model investment would need to match the iteration speed and licensing permissiveness of Chinese labs &#8212; not just training competitive models but releasing them under Apache 2.0 or equivalent, something Meta has been unwilling to do fully, and OpenAI has only begun to explore. The developer community would need to start weighing provenance alongside performance, treating geopolitical risk and training-data composition as selection criteria rather than afterthoughts.
And the engineering revolution that made open models possible would need to keep widening the gap between open and closed &#8212; every improvement in quantization, inference speed, and post-training tooling makes open models more attractive relative to API-dependent alternatives, but it favors whichever open model family the community builds on, regardless of origin.</p><p>In October 2021, I argued that the mega-model trend was unsustainable and that the future belonged to smaller, fine-tuned, optimized models running on accessible hardware. The prediction was correct. The mechanism was engineering &#8212; Flash Attention, quantization, MoE, llama.cpp, GGUF, Unsloth, MLX. The revolution happened exactly the way the industry said it wouldn&#8217;t. </p><p>What the prediction could not have anticipated was the question that now defines the field. The models are open, the tools are free, the engineering is democratized &#8212; but who builds the foundation that everyone else builds on, and what does it mean that the answer, increasingly, is Hangzhou?</p><div><hr></div><h3>Notes</h3><p>[1] Julien Simon, <a href="https://huggingface.co/blog/large-language-models">&#8220;Large Language Models: A New Moore&#8217;s Law?&#8221;</a>, Hugging Face Blog, October 26, 2021.</p><p>[2] Hugging Face, &#8220;State of Open Source on Hugging Face: Spring 2026,&#8221; March 2026. Qwen 2.5 models from 0.6B to 7B were collectively downloaded over 750 million times in 2025; the Qwen family has over 200,000 derivative models. Note: download counts are engagement proxies, not deployment counts &#8212; they include CI pipelines, research experiments, and multiple quantization downloads by the same user. 
<a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026">Source.</a> See also ATOM Project download data; AI World, &#8220;Chinese developers account for over 45% of top open-model public downloads,&#8221; December 2025.</p><p>[3] Xinhua, <a href="https://english.news.cn/20260113/004b0522f987475cbf83ffc3a8d009aa/c.html">&#8220;Alibaba&#8217;s Qwen leads global open-source AI community with 700 million downloads,&#8221;</a> January 13, 2026. Single-month December 2025 downloads exceeded the combined total of the next eight most popular model families (Meta, DeepSeek, OpenAI, Mistral, Nvidia, Zhipu.AI, Moonshot, MiniMax).</p><p>[4] Megatron-Turing NLG achieved approximately 87.1% on HellaSwag (few-shot), state-of-the-art for language models at the time of publication. Smith et al., <a href="https://arxiv.org/abs/2201.11990">&#8220;Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model,&#8221;</a> January 2022. Infrastructure cost estimate from the author&#8217;s October 2021 blog post: &#8220;anyone looking to replicate this experiment would have to spend close to $100 million dollars.&#8221;</p><p>[5] Qwen3-14B achieves 81.1% on MMLU &#8212; per GPU benchmark testing, &#8220;territory that required 70B parameters just a year ago.&#8221; Qwen3-8B achieves 74.7% on MMLU (5-shot). 
See Awesome Agents, <a href="https://awesomeagents.ai/leaderboards/home-gpu-llm-leaderboard/">&#8220;Home GPU LLM Leaderboard,&#8221;</a> February 2026; and <a href="https://arxiv.org/abs/2505.02214">&#8220;An Empirical Study of Qwen3 Quantization.&#8221;</a> MT-NLG&#8217;s context window was 2,048 tokens; Qwen3 supports 32,768 natively, extendable to 131,072.</p><p>[6] Distil Labs, <a href="https://www.distillabs.ai/blog/we-benchmarked-12-small-language-models-across-8-tasks-to-find-the-best-base-model-for-fine-tuning/">&#8220;We Benchmarked 12 Small Language Models Across 8 Tasks,&#8221;</a> December 2025: &#8220;A 4B parameter model, properly fine-tuned, can match or exceed a model 30x its size.&#8221; Qwen3-4B-Instruct-2507 matched or exceeded a 120B+ teacher model (GPT-OSS) on 7 of 8 benchmarks.</p><p>[7] Hoffmann et al., <a href="https://arxiv.org/abs/2203.15556">&#8220;Training Compute-Optimal Large Language Models&#8221;</a> (the Chinchilla paper), DeepMind, March 2022. Chinchilla (70B parameters, 4&#215; more training data than Gopher) &#8220;uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks.&#8221;</p><p>[8] Vaswani et al., <a href="https://arxiv.org/abs/1706.03762">&#8220;Attention Is All You Need,&#8221;</a> NeurIPS 2017.</p><p>[9] Radford et al., <a href="https://openai.com/index/language-unsupervised/">&#8220;Improving Language Understanding by Generative Pre-Training,&#8221;</a> OpenAI, June 2018.</p><p>[10] Devlin et al., <a href="https://arxiv.org/abs/1810.04805">&#8220;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,&#8221;</a> Google AI, October 2018.</p><p>[11] Wikipedia, <a href="https://en.wikipedia.org/wiki/Large_language_model">&#8220;Large language model,&#8221;</a> accessed March 2026. 
BERT &#8220;quickly became ubiquitous&#8221; following its 2018 release.</p><p>[12] The Hugging Face Transformers library (originally &#8220;pytorch-pretrained-bert&#8221;) was first released to provide a PyTorch implementation of BERT and has since expanded to support thousands of model architectures.</p><p>[13] Radford et al., <a href="https://openai.com/index/better-language-models/">&#8220;Language Models are Unsupervised Multitask Learners,&#8221;</a> OpenAI, February 2019. The staged release of GPT-2 was framed as a safety measure; the full 1.5B model was released in November 2019.</p><p>[14] Brown et al., <a href="https://arxiv.org/abs/2005.14165">&#8220;Language Models are Few-Shot Learners,&#8221;</a> NeurIPS 2020. GPT-3 (175B parameters) was available only through OpenAI&#8217;s API.</p><p>[15] EleutherAI released GPT-J-6B in June 2021 and GPT-NeoX-20B in April 2022. Both were fully open under Apache 2.0. <a href="https://github.com/EleutherAI">Source.</a></p><p>[16] Zhang et al., <a href="https://arxiv.org/abs/2205.01068">&#8220;OPT: Open Pre-trained Transformer Language Models,&#8221;</a> Meta AI, May 2022. 175B parameters, released with code and training logbook.</p><p>[17] BigScience Workshop, <a href="https://arxiv.org/abs/2211.05100">&#8220;BLOOM: A 176B-Parameter Open-Access Multilingual Language Model,&#8221;</a> November 2022. Over 1,000 researchers contributed. Training compute provided by GENCI and IDRIS via the Jean Zay supercomputer.</p><p>[18] BLOOM was trained on the ROOTS corpus &#8212; 1.6 TB of data in 46 natural languages and 13 programming languages. Training used 384 A100 80GB GPUs over approximately 117 days (March&#8211;July 2022). Released under the BigScience RAIL License v1.0.</p><p>[19] BLOOM achieved competitive performance on multilingual benchmarks, with stronger results after multitask prompted fine-tuning (BLOOMZ). 
On English-only benchmarks, it generally matched but did not exceed OPT-175B or the best proprietary models of the period.</p><p>[20] Disclosure: I was Chief Evangelist at Hugging Face from 2021 to 2024 and was involved in the BigScience/BLOOM project. This assessment reflects the project&#8217;s public outputs and published evaluations.</p><p>[21] ChatGPT launched November 30, 2022. The 100-million-user figure within two months was widely reported; see Reuters, <a href="https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/">&#8220;ChatGPT sets record for fastest-growing user base,&#8221;</a> February 2, 2023.</p><p>[22] Touvron et al., <a href="https://arxiv.org/abs/2302.13971">&#8220;LLaMA: Open and Efficient Foundation Language Models,&#8221;</a> Meta AI, February 2023. Models at 7B, 13B, 33B, and 65B parameters.</p><p>[23] The LLaMA weights were uploaded as a torrent on 4chan on March 3, 2023. Meta filed DMCA takedown requests, but copies proliferated. See Wikipedia, <a href="https://en.wikipedia.org/wiki/Llama_(language_model)">&#8220;Llama (language model).&#8221;</a></p><p>[24] Meta released <a href="https://ai.meta.com/llama/">Llama 2</a> (7B, 13B, 70B) on July 18, 2023, with a commercial license and in partnership with Microsoft. The license included an acceptable use policy.</p><p>[25] Taori et al., <a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">&#8220;Stanford Alpaca: An Instruction-following LLaMA Model,&#8221;</a> Stanford CRFM, March 2023. Fine-tuned LLaMA 7B on 52K instruction-following demonstrations generated via OpenAI&#8217;s text-davinci-003. Training cost under $600.</p><p>[26] Chiang et al., <a href="https://lmsys.org/blog/2023-03-30-vicuna/">&#8220;Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality,&#8221;</a> LMSYS Org, March 2023. Fine-tuned LLaMA 13B on ~70K ShareGPT conversations. Training cost ~$300. 
The &#8220;90% quality&#8221; claim used GPT-4 as a judge &#8212; the authors explicitly noted this was &#8220;a fun and non-scientific evaluation.&#8221;</p><p>[27] Georgi Gerganov, <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a>, first released in March 2023. Pure C/C++ implementation with zero dependencies.</p><p>[28] The GGUF (GGML Universal File) format was introduced in August 2023, superseding the original GGML format. It stores tensors, tokenizer vocabulary, and architecture metadata in a single self-contained binary file. <a href="https://en.wikipedia.org/wiki/Llama.cpp">Source.</a></p><p>[29] As of early 2026, llama.cpp has over 85,000 GitHub stars and supports dozens of model architectures, with backends for Metal (Apple), CUDA (Nvidia), ROCm (AMD), and CPU-only operation.</p><p>[30] Quantization quality retention varies across models and tasks. The ~92% figure for Q4_K_M relative to FP16 is a commonly cited approximation in the llama.cpp community for general-purpose language tasks; actual retention depends on the specific model and benchmark. See Maxime Labonne, <a href="https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172/">&#8220;Quantize Llama models with GGML and llama.cpp,&#8221;</a> Towards Data Science, 2023.</p><p>[31] Dao et al., <a href="https://arxiv.org/abs/2205.14135">&#8220;FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness,&#8221;</a> NeurIPS 2022 (v1, May 2022). FlashAttention-2 was published in July 2023 with further improvements.</p><p>[32] FlashAttention support was added to llama.cpp on April 30, 2024, per the project&#8217;s changelog.</p><p>[33] Kwon et al., <a href="https://arxiv.org/abs/2309.06180">&#8220;Efficient Memory Management for Large Language Model Serving with PagedAttention,&#8221;</a> SOSP 2023.
vLLM&#8217;s PagedAttention manages the KV-cache using virtual memory paging concepts, enabling continuous batching and dramatically improving multi-user serving throughput.</p><p>[34] Speculative decoding uses a small &#8220;draft&#8221; model to generate candidate token sequences that are then verified in parallel by the larger &#8220;target&#8221; model. When the draft model&#8217;s predictions are correct (which is often the case for common tokens), the system achieves the quality of the large model at a speed closer to that of the small one. See Leviathan et al., <a href="https://arxiv.org/abs/2211.17192">&#8220;Fast Inference from Transformers via Speculative Decoding,&#8221;</a> ICML 2023.</p><p>[35] Jiang et al., <a href="https://arxiv.org/abs/2401.04088">&#8220;Mixtral of Experts,&#8221;</a> Mistral AI, January 2024. 46.7B total parameters, ~12.9B active per token, 8 experts with 2 active. Released under Apache 2.0.</p><p>[36] DeepSeek-AI, <a href="https://arxiv.org/abs/2412.19437">&#8220;DeepSeek-V3 Technical Report,&#8221;</a> December 2024. 671B total parameters, 37B active. The ~$5.5M training cost figure was widely reported; the precise figure depends on assumptions about GPU rental rates and is DeepSeek&#8217;s claim, not an independently audited figure.</p><p>[37] Qwen Team, &#8220;Qwen3 Technical Report,&#8221; Alibaba, April 2025. Dense models at 0.6B through 32B and MoE models at 30B (3B active) and 235B (22B active). Trained on 36 trillion tokens in 119 languages and dialects. Apache 2.0 license. <a href="https://arxiv.org/abs/2505.09388">Technical report.</a></p><p>[38] Singh et al., <a href="https://arxiv.org/abs/2602.17004">&#8220;Arcee Trinity Large Technical Report,&#8221;</a> Arcee AI, January 2026. 398B total parameters, ~13B active per token. 256 experts, 4 active per token. Trained on 17T tokens in ~33 days on 2,048 Nvidia B300 GPUs. Apache 2.0 license. A training cost of approximately $20M was reported in VentureBeat.
The reasoning variant, Trinity-Large-Thinking, was <a href="https://www.arcee.ai/blog/trinity-large-thinking">released April 1, 2026</a>, and ranks #1 among open models in the U.S. on OpenRouter (#4 globally), with 3.37 trillion tokens served in its first two months.</p><p>[39] Arcee reports 2&#8211;3&#215; faster inference than peers in the same weight class due to extreme sparsity (1.56% active parameters). Actual speedup depends on hardware, batch size, and sequence length.</p><p>[40] Qwen3-30B-A3B: 30B total parameters, ~3B active per token via MoE. Runs at approximately 196 tokens/second on RTX 4090. Quality competitive with 14B dense models. See Awesome Agents, <a href="https://awesomeagents.ai/leaderboards/home-gpu-llm-leaderboard/">&#8220;Home GPU LLM Leaderboard,&#8221;</a> February 2026: &#8220;This is a Mixture-of-Experts model with 30B total parameters but only ~3B active at any time. It fits in 24GB easily and runs at nearly 196 tok/s on an RTX 4090 &#8212; faster than the 8B-dense models. Quality is competitive with 14B dense models.&#8221;</p><p>[41] Gu and Dao, <a href="https://arxiv.org/abs/2312.00752">&#8220;Mamba: Linear-Time Sequence Modeling with Selective State Spaces,&#8221;</a> December 2023.</p><p>[42] Lieber et al., <a href="https://arxiv.org/abs/2403.19887">&#8220;Jamba: A Hybrid Transformer-Mamba Language Model,&#8221;</a> AI21 Labs, March 2024. 52B total parameters, 12B active. Ratio of 1 Transformer layer per 7 Mamba layers. Released under Apache 2.0.</p><p>[43] Nvidia Nemotron-H replaces 92% of attention layers with Mamba2 blocks and reports up to 3&#215; throughput improvement over similar-sized Transformers. IBM&#8217;s Granite 4.0 uses a hybrid SSM architecture.
See Nvidia technical blog and IBM Research documentation, 2025.</p><p>[44] IBM Research, Granite 4.0 evaluation results: &#8220;pure SSM models match or exceed Transformers on many tasks, but Mamba and Mamba-2 models remain significantly behind on tasks requiring strong copy or in-context learning capabilities.&#8221; Cited in multiple secondary sources.</p><p>[45] Grouped-Query Attention (GQA) was introduced in Ainslie et al., <a href="https://arxiv.org/abs/2305.13245">&#8220;GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints,&#8221;</a> 2023. Used in Llama 2 and subsequent models to reduce KV-cache memory requirements.</p><p>[46] Multi-Head Latent Attention (MLA) was introduced in DeepSeek-V2 (May 2024) and retained in DeepSeek-V3. It compresses key-value representations through learned low-rank projections, further reducing KV-cache size. <a href="https://arxiv.org/abs/2405.04434">Source.</a></p><p>[47] Sebastian Raschka, <a href="https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison">&#8220;The Big LLM Architecture Comparison,&#8221;</a> Ahead of AI, July 2025.</p><p>[48] Hu et al., <a href="https://arxiv.org/abs/2106.09685">&#8220;LoRA: Low-Rank Adaptation of Large Language Models,&#8221;</a> 2021.</p><p>[49] Hugging Face <a href="https://github.com/huggingface/trl">TRL</a> (Transformer Reinforcement Learning) library provides high-level APIs for SFT, DPO, PPO, and RLHF.</p><p>[50] <a href="https://github.com/unslothai/unsloth">Unsloth</a> provides optimized LoRA/QLoRA fine-tuning with reduced memory usage. The &#8220;2&#215; faster, 70% less memory&#8221; claim is Unsloth&#8217;s; actual speedups depend on model, hardware, and configuration.</p><p>[51] Hugging Face, &#8220;State of Open Source on Hugging Face: Spring 2026.&#8221; Activity of 1,000&#8211;2,000 new models uploaded per day. 
13 million users, 2 million+ public models, 500,000+ datasets as of early 2026.</p><p>[52] <a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026">Hugging Face Spring 2026 report:</a> &#8220;Alibaba as an organization has more derivative models than both Google and Meta combined, with the Qwen family constituting more than 113,000 derivative models. When including all models that tag Qwen, that number balloons to over 200,000 models.&#8221;</p><p>[53] Julie Bort, <a href="https://techcrunch.com/2026/01/28/tiny-startup-arcee-ai-built-a-400b-open-source-llm-from-scratch-to-best-metas-llama/">&#8220;Tiny startup Arcee AI built a 400B-parameter open source LLM from scratch to best Meta&#8217;s Llama,&#8221;</a> TechCrunch, January 28, 2026. CEO Mark McQuade: &#8220;We would take a Llama model, we would take a Mistral model, we would take a Qwen model... and we would post-train it to make it better.&#8221; The pivot to pretraining was driven by dependency risk and U.S. enterprise customer discomfort with Chinese base models.</p><p>[54] Apple Silicon&#8217;s unified memory architecture, introduced with M1 in November 2020, allows CPU and GPU to access the same memory pool without PCIe bus transfers. For LLM inference, where memory bandwidth rather than compute is typically the bottleneck, this architecture provides significant advantages. The M3 Max achieves approximately 400 GB/s memory bandwidth.</p><p>[55] Apple, <a href="https://github.com/ml-explore/mlx">MLX framework</a>, released December 2023. An array framework for machine learning on Apple Silicon, optimized for unified memory.</p><p>[56] The three-tool pipeline (LM Studio for evaluation, Ollama for development, vLLM for production) is a common practitioner pattern described across multiple community guides as of early 2026.</p><p>[57] See Julien Simon, <a href="https://www.airealist.ai/">&#8220;Access, Disable, Destroy,&#8221;</a> The AI Realist. 
The coercion stack&#8217;s three layers &#8212; chips, cloud, models &#8212; represent dependencies that can be deliberately activated. Local inference on owned hardware bypasses the cloud layer entirely.</p><p>[58] <a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026">Hugging Face Spring 2026 report.</a> Over 30% of the Fortune 500 maintain verified accounts on Hugging Face. Established companies, including Airbnb, have increased engagement with the open ecosystem.</p><p>[59] DeepSeek-AI, <a href="https://arxiv.org/abs/2501.12948">&#8220;DeepSeek-R1,&#8221;</a> January 20, 2025. MIT license. Achieves performance comparable to OpenAI o1 on AIME 2024 (79.8% pass rate) and MATH-500 (97.3%). Distilled variants from 1.5B to 70B parameters.</p><p>[60] Qwen 3 is described by Interconnects&#8217; <a href="https://www.interconnects.ai/p/2025-open-models-year-in-review">&#8220;2025 Open Models Year in Review&#8221;</a> as &#8220;the choice for a lot of problems, especially in terms of multilinguality&#8221; and the &#8220;most-used base model to fine-tune.&#8221;</p><p>[61] ASO World, <a href="https://marketingtrending.asoworld.com/en/discover/alibaba-s-qwen-edges-out-meta-s-llama-as-leading-open-ai-model-on-hugging-face/">&#8220;Qwen Surpasses Llama in Hugging Face AI Model Downloads,&#8221;</a> December 2025. Cumulative downloads of approximately 385 million for Qwen vs. 346 million for Llama by mid-December 2025.</p><p>[62] Stanford HAI / DigiChina issue brief, <a href="https://hai.stanford.edu/policy-brief/beyond-deepseek">&#8220;Beyond DeepSeek: China&#8217;s Diverse Open-Weight AI Ecosystem and Its Policy Implications,&#8221;</a> December 2025. &#8220;Chinese fine-tuned or derivative models made up 63% of all new fine-tuned or derivative models released&#8221; on Hugging Face in 2025.</p><p>[63] Meta released Llama 4 in April 2025 with &#8220;Scout&#8221; and &#8220;Maverick&#8221; variants. 
VentureBeat reported that the release received a &#8220;mixed reception,&#8221; and Meta AI researcher Yann LeCun subsequently acknowledged the company had used specialized training approaches that limited the models&#8217; generalizability.</p><p>[64] Mark McQuade, CEO of Arcee AI, <a href="https://techcrunch.com/2026/01/28/tiny-startup-arcee-ai-built-a-400b-open-source-llm-from-scratch-to-best-metas-llama/">quoted in TechCrunch:</a> &#8220;Arcee exists because the U.S. needs a permanently open, Apache-licensed, frontier-grade alternative that can actually compete at today&#8217;s frontier.&#8221; CTO Lucas Atkins described Trinity-Large-Thinking as <a href="https://www.arcee.ai/blog/trinity-large-thinking">&#8220;on many axes, the strongest open model ever released outside of China.&#8221;</a> Disclosure: the author was Chief Evangelist at Arcee AI until November 2025. All benchmarks and adoption figures cited in this piece are independently verifiable via OpenRouter and the sources linked above.</p><p>[65] <a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026">Hugging Face Spring 2026 report:</a> &#8220;Western organizations increasingly seek commercially deployable alternatives to Chinese models, creating urgency around efforts like OpenAI&#8217;s GPT-OSS, AI2&#8217;s OLMo, and Google&#8217;s Gemma to offer competitive open options from US and European developers.&#8221;</p><p>[66] Simon Willison, <a href="https://simonwillison.net/2026/Mar/4/qwen/">&#8220;Something is afoot in the land of Qwen,&#8221;</a> March 4, 2026. Lead researcher Junyang Lin and several core team members &#8212; including leads for code, post-training, and VL development &#8212; resigned from Alibaba&#8217;s Qwen team in early March 2026, reportedly triggered by an internal reorganization. Alibaba&#8217;s CEO attended an emergency all-hands meeting. 
As of publication, the situation remains fluid.</p>]]></content:encoded></item><item><title><![CDATA[What China Shipped While the West Was Banning Chips ]]></title><description><![CDATA[Six domains where China&#8217;s deployment record contradicts the restriction narrative.]]></description><link>https://www.airealist.ai/p/what-china-shipped-while-the-west</link><guid isPermaLink="false">https://www.airealist.ai/p/what-china-shipped-while-the-west</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Wed, 01 Apr 2026 21:06:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3bdD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3bdD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3bdD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!3bdD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!3bdD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3bdD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3bdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7267132,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192836242?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3bdD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!3bdD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807b-cb624d50c17c_2816x1584.png 848w, 
https://substackcdn.com/image/fetch/$s_!3bdD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807c-cb624d50c17c_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!3bdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327c6ebf-bd3a-4b08-807c-cb624d50c17c_2816x1584.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In February 2026, Chinese AI company Zhipu AI released GLM-5, a large language model trained entirely on Huawei Ascend chips
&#8212; silicon fabricated by SMIC on a 7-nanometer process, using equipment the United States has spent three years trying to deny China access to. GLM-5 led on two of three frontier benchmarks and came within two points on the third &#8212; SWE-bench, where only GPT-5.2 scored higher.[1] The model that was supposed to be impossible is now competing at the frontier.</p><p>That result was not an anomaly. It was a data point in a pattern that Western analysis, organized around what China <em>cannot</em> access, has systematically failed to see.</p><p>Last week, ASPI updated its Critical Technology Tracker with the complete 2025 dataset. China now leads in high-impact research output in 69 of 74 critical technologies tracked &#8212; up from 57 of 64 in late 2024, and just 3 of 64 in the 2003&#8211;2007 period.[2] The trajectory is not a line. It is a steepening curve.</p><p>The standard response to these numbers is to recite what China still lacks: EUV lithography (the ultraviolet light technology that prints the smallest chip features &#8212; monopolized by the Dutch company ASML), top-end Nvidia GPUs, TSMC&#8217;s (Taiwan Semiconductor Manufacturing Company) most advanced nodes, the CUDA software ecosystem that locks AI developers to Nvidia hardware. Those constraints are real, and this piece will address them honestly. But the access-denial frame measures inputs. What it misses is output. China is deploying &#8212; at industrial scale, across semiconductors, AI models, cloud infrastructure, nuclear energy, satellite constellations, and export control evasion &#8212; a full-stack technological alternative. The six domains do not advance independently. They compound.</p><p>The West measures what China can&#8217;t get. China measures what it has shipped. The deployment record tells a story that the restriction narrative does not.</p><h2>The Deployment Snapshot</h2><p>Here is what China has deployed, domain by domain. None of these is the story alone. 
The story is what happens when all six feed each other.</p><p><strong>Semiconductors.</strong> SMIC, China&#8217;s largest foundry, posted record revenue of $9.3 billion in 2025 using an older light-printing technology called DUV (deep ultraviolet) lithography at the 7-nanometer node &#8212; a brute-force approach requiring 34 printing passes where ASML&#8217;s newer EUV systems would need nine.[3] National chip production hit a record 484.3 billion units (mostly at legacy nodes), up 85.2 percent from 2020.[4] Huawei&#8217;s Ascend 910C delivers 60 percent of Nvidia&#8217;s H100 inference performance according to DeepSeek researchers &#8212; and the next generation is already arriving: Huawei claims its Atlas 350 accelerator card, equipped with the Ascend 950PR, delivers 2.8 times the performance of Nvidia&#8217;s H20 &#8212; the throttled chip Nvidia designed specifically to comply with US export limits.[5] In early 2026, Chinese authorities partially banned imports of the H200 itself, citing concerns over its tracing capabilities &#8212; the restricted country rejecting the restricted product.[6] The chip cannot compete at frontier training. But for inference &#8212; the workload that matters for deployment &#8212; it is sufficient. And sufficiency at scale, not parity per chip, is the metric that determines outcomes.</p><p><strong>AI models.</strong> Between February 2025 and February 2026, Chinese models accounted for 41 percent of all Hugging Face downloads versus 36.5 percent for American models &#8212; a complete inversion from two years earlier.[7] Alibaba&#8217;s Qwen family overtook Meta&#8217;s Llama as the most downloaded model series. 
Sixty-three percent of all new fine-tuned models on Hugging Face in September 2025 were built on Chinese base models.[8] On benchmarks, multiple Chinese models now match or exceed Western frontier performance &#8212; and the pricing is the structural signal: comparable flagship APIs are modestly cheaper, but Chinese open-weight models available for self-hosting at near-zero cost have no Western equivalent at the same quality tier.[9]</p><p><strong>Cloud.</strong> China&#8217;s cloud market reached approximately $50 billion in 2025, up over 20 percent year over year.[10] Alibaba Cloud commands roughly a third of the domestic market and has sustained triple-digit AI revenue growth for ten consecutive quarters. But the most striking number belongs to ByteDance&#8217;s Volcano Engine, which captured approximately half of China&#8217;s public cloud large-model invocations in the first half of 2025 &#8212; more than Alibaba and Baidu combined &#8212; processing 30 trillion tokens daily.[11] The domestic cloud market is self-contained: Chinese chips running Chinese models on Chinese infrastructure powered by a Chinese grid. The US Big Three generate roughly six times more cloud revenue globally.[12] But revenue measures commercial scale. 
What matters for the thesis is strategic autonomy &#8212; and the Chinese cloud stack has it.</p><p><strong>Nuclear energy.</strong> China&#8217;s HTR-PM at Shidao Bay achieved commercial operation in December 2023 &#8212; the world&#8217;s first Generation IV reactor.[13] Linglong One, a 125-megawatt pressurized water reactor on Hainan Island, completed its steam turbine test in December 2025 and is on track for commercial operation in the first half of this year &#8212; positioning it as the world&#8217;s first land-based commercial small modular reactor.[14] China has approximately 38 reactors under construction &#8212; roughly half the global total.[15] Its 61 operating reactors now surpass France&#8217;s 57, and the fleet is projected to overtake the United States&#8217; 94 by approximately 2030.[16]</p><p><strong>Space.</strong> China has filed applications with the International Telecommunication Union for approximately 51,000 low-Earth-orbit satellites across at least six constellation programs &#8212; including the three largest: GuoWang, Qianfan, and Honghu-3.[17] GuoWang, the state-backed broadband constellation, had approximately 163 satellites in orbit by mid-March 2026, following 20 launch missions, with a target of 310 by year-end.[18] Qianfan, backed by Shanghai, has launched 108 satellites and plans to launch 324 more in 2026.[19] Against Starlink&#8217;s 10,000-plus satellites and ten million subscribers, China lags by five to six years in deployment. 
But the launch cadence &#8212; 92 orbital missions in 2025, up 35 percent &#8212; and the GalaxySpace IPO filing announced this week suggest the deployment is accelerating, not plateauing.[20]</p><p><strong>Enforcement.</strong> On March 19, a federal indictment was unsealed charging Supermicro co-founder Yih-Shyan Liaw and two associates with conspiring to divert $2.5 billion in AI servers to China through a Southeast Asian intermediary.[21] Workers used hair dryers to peel serial-number stickers off servers, transferring them onto dummy units staged for auditors. When a broker sent Liaw a news article about other chip smugglers being arrested, he replied with sobbing-face emojis &#8212; and kept shipping.[22] The export controls have been in place since October 2022. The scheme ran for roughly two years before a single arrest. The enforcement model prosecutes after the fact &#8212; it does not intercept in real time. By the time an indictment is unsealed, the diverted hardware is already a generation behind the frontier.</p><p>None of these findings, taken individually, is unknown. Each has been reported. What has not been assembled is the picture they form together &#8212; and why the aggregate changes the assessment.</p><h2>The Compound</h2><p>The reason single-domain analysis yields systematically incorrect conclusions about China&#8217;s competitive position is that the six domains do not operate independently. They compound.</p><p>The mechanism is a closed loop, and the cloud is the integration layer where you can watch it operate.</p><p>Alibaba Cloud runs Qwen models on Ascend chips and serves over 90,000 enterprise clients through its Model Studio platform.[23] ByteDance&#8217;s Volcano Engine runs Doubao on infrastructure that includes both Nvidia hardware (accessed through offshore Malaysian clusters) and an expanding domestic Ascend deployment.[24] These are not research demonstrations.
They are production systems processing trillions of tokens per day &#8212; China&#8217;s total large-model invocations reached 536.7 trillion tokens in the first half of 2025 alone.[25] The revenue from these services funds the next training run on the next generation of domestic chips, which in turn funds the next foundry expansion. The loop feeds itself.</p><p>The pricing dynamics reveal how the compounding actually works. In May 2024, DeepSeek released V2 with API pricing of 1 yuan (approximately 14 cents) per million tokens. Baidu&#8217;s equivalent was charging 120 times more.[26] Within days, Alibaba slashed Qwen-Long pricing by 97 percent. Baidu made two model tiers free. ByteDance followed. By December 2024, ByteDance had dropped its Doubao vision model to $0.00041 per thousand tokens; Alibaba matched the price within two weeks.[27] The cascade was not irrational subsidization. It was the economic signature of the compounding mechanism: when the chip is domestic, the model is open-weight, and the cloud is your own, the marginal cost of inference approaches the cost of electricity. And China has more electricity than it needs.</p><p>The resulting pricing gap is visible at every tier. Flagship to flagship, OpenAI&#8217;s GPT-5.4 charges $2.50 per million input tokens; Alibaba&#8217;s Qwen-Max charges $1.60 &#8212; a modest gap that reflects comparable infrastructure costs.[28] But the Chinese ecosystem also offers what the Western ecosystem does not: open-weight models at near-zero cost. DeepSeek&#8217;s V3 charges $0.27 input and $1.10 output &#8212; and can be self-hosted for free. For a startup in Jakarta or a government agency in Riyadh, choosing between a $15-per-million-output-token proprietary API and a $1.10 open-weight alternative, the structural incentive is overwhelming. 
Some of that gap is competitive subsidization &#8212; Tencent&#8217;s own vice president has acknowledged that AI revenue is not yet scaling in China.[29] But the structural floor is real: when every layer of the stack is domestically owned, the cost basis is fundamentally lower. The subsidy can end, and the advantage persists.</p><p>Energy feeds the entire stack. China&#8217;s national grid operates with a reserve margin that Goldman Sachs estimates will reach 400 GW of spare generation capacity by 2030 &#8212; roughly triple the expected needs of the entire global data center fleet.[30] While American hyperscalers scramble for power, the PJM grid operator&#8217;s capacity auction fell 6,600 MW short in December 2025, and US transformer lead times run 143 weeks; Chinese data center operators procure domestically manufactured transformers in 48 weeks.[31] The &#8220;Eastern Data, Western Computing&#8221; initiative, launched in 2022 with over $28 billion invested to date, has established eight national computing hub nodes redistributing compute to renewable-rich western provinces.[32] It is the infrastructure equivalent of what hyperscaler nuclear announcements aspire to be, except it exists.</p><p>The satellite constellations provide a connectivity layer that is structurally significant for reasons that go beyond broadband. GuoWang and Qianfan are being built by companies that also serve as partners in China&#8217;s expanding cloud infrastructure across Southeast Asia, the Middle East, and Africa.[33] GalaxySpace &#8212; which filed for its IPO today &#8212; has demonstrated direct-to-cell satellite communication and established partnerships in Thailand, the UAE, Saudi Arabia, Indonesia, and Malaysia.[34] These are the same geographies where Huawei Cloud and Alibaba Cloud are building data centers and signing government contracts. 
The cloud and connectivity layers are converging on the same markets simultaneously.</p><p>The enforcement evasion lane &#8212; the $2.5 billion Supermicro scheme, the offshore compute clusters ByteDance operates in Malaysia, the stockpiled million-plus H20 chips Chinese companies acquired before the ban &#8212; functions as bridge technology.[35] Every leaked or legally accessed GPU buys time for the domestic stack to mature. But the bridge metaphor holds only if domestic training capability arrives within one GPU generation cycle. If it doesn&#8217;t, the compound&#8217;s inference layer rests on models trained on hardware it cannot replenish. By the time enforcement catches up to the diverted hardware, the domestic alternative has advanced another step. The bridge does not need to last forever. It needs to last until the loop closes &#8212; and for inference, that point may already have arrived.</p><h2>Why It&#8217;s Working This Fast</h2><p>The deployment velocity across six domains is not luck. It is the output of a system designed to produce exactly this result.</p><p>China graduates approximately 3.6 million STEM students per year &#8212; roughly five times the United States&#8217; total across all degree levels.[36] The engineering pipeline alone produces 1.3 to 1.5 million graduates annually.[37] Since 2018, Chinese universities have established over 2,300 dedicated AI programs, creating a feeder system from undergraduate education to the labs building the models and the factories building the chips.[38] These graduates do not, by and large, flow into finance. The structural incentive in China channels technical talent toward engineering, manufacturing, and applied research &#8212; a pattern that DeepSeek itself illustrates. The company spun out of High-Flyer, a quantitative hedge fund that was hit by a government crackdown on computer-driven trading. 
The talent that might have gone to Wall Street in the US ended up at a model lab in Hangzhou.</p><p>The state planning apparatus converts this talent advantage into coordinated deployment at a speed that market-driven systems cannot replicate. China&#8217;s 15th Five-Year Plan (2026&#8211;2030), approved in March 2026, mentions artificial intelligence 52 times &#8212; four times the count in the previous plan.[39] The State Council&#8217;s &#8220;AI+&#8221; initiative, released in August 2025, outlines a three-step plan: 70 percent AI application penetration in six priority sectors by 2027, comprehensive AI-driven development by 2030, and full integration across the economy by 2035.[40] The semiconductor roadmap is equally specific: stabilize production at the 28-nanometer node (mainstream chips for cars and appliances), achieve reliable 14-nanometer manufacturing (mid-range processors), and create an entirely Chinese 7-nanometer production line &#8212; with lithography identified as the priority bottleneck.[41]</p><p>The core AI industry was valued at 1.2 trillion yuan ($165 billion) in 2025; the broader AI-related economy &#8212; including applications and services &#8212; is targeted to exceed 10 trillion yuan by 2030.[42] These are not aspirational white papers. They trigger mandatory coordination across all central ministries, provincial governments, and state financial institutions &#8212; cascading into competitive provincial implementation. Guangdong alone launched a 10 billion yuan AI and Robotics Industry Fund in early 2025. Shenzhen is building a dedicated industrial zone for embodied intelligence.</p><p>The West tends to explain China&#8217;s AI progress as a story about one company &#8212; DeepSeek &#8212; and one model. The deployment record tells a different story.
It is the output of a system that has been building this capability for a decade, through state investment in education, coordinated industrial policy, and an engineering culture that treats manufacturing speed as a competitive weapon. The restrictions did not slow the system. In some domains, they accelerated it.</p><h2>The Honest Gaps</h2><p>The chokepoints are real, and one of them is genuinely binding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aEZL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aEZL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 424w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 848w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 1272w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aEZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png" 
width="1408" height="588" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d27f132-03f2-486c-9087-86bba307b811_1408x588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66878,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192836242?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aEZL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 424w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 848w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 1272w, https://substackcdn.com/image/fetch/$s_!aEZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27f132-03f2-486c-9087-86bba307b811_1408x588.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>SMIC cannot fabricate at 5 nanometers in production &#8212; it is stuck at 7 nanometers using repeated DUV printing passes, while TSMC ships at 2 nanometers using EUV.[43] The gap is not the chip. The gap is the machine that makes the chip. ASML, a single Dutch company, holds a global monopoly on EUV lithography &#8212; the technology required to print features below 7 nanometers in a single pass. No other company on earth can build one. Each standard EUV system costs $200 to $300 million, weighs 180 tons, and contains 100,000 components sourced from hundreds of suppliers across a dozen countries &#8212; the newest High-NA variant exceeds $380 million.
ASML shipped 42 percent of its systems to China in Q3 2025 &#8212; all of them older DUV models, because the United States has pressured the Netherlands to block EUV exports since 2019.[44]</p><p>This is the binding constraint. China&#8217;s most advanced domestic lithography tool, from SMEE (Shanghai Micro Electronics Equipment), is a production-grade system operating at 90 nanometers &#8212; roughly where ASML was in 2004.[45] SMEE&#8217;s next-generation SSA800 immersion scanner targets 28-nanometer capability, and a prototype from the Huawei-linked startup Yuliangsheng is reportedly being tested at SMIC.[46] But even optimistic assessments put domestic sub-10-nanometer lithography beyond 2030. ASML&#8217;s CEO Christophe Fouquet said in April 2025 that China would need &#8220;many, many years&#8221; to catch up.[47] CSIS (the Center for Strategic and International Studies) concluded that China&#8217;s recent lithography announcements &#8220;reveal more exaggeration than transformation.&#8221;[48]</p><p>China knows this is the bottleneck. The semiconductor roadmap published ahead of the 15th Five-Year Plan explicitly prioritizes lithography, targeting a fully Chinese 7-nanometer production line.[41] At least three separate domestic efforts are pursuing EUV development.[49] But a patent is not a machine, and a machine is not an ecosystem. ASML&#8217;s EUV monopoly rests on decades of integration across optics (Zeiss), light sources (Cymer/ASML), and precision engineering that cannot be replicated by reverse engineering or state funding alone. This is one constraint that the deployment compound cannot route around through efficiency. It requires building something that has never been built outside a single factory in Veldhoven.</p><p>The Ascend 910C&#8217;s 60 percent inference performance relative to the H100 matters less than the training gap. 
Long-term training stability on Ascend hardware remains, in DeepSeek researcher Yuchen Jin&#8217;s assessment, &#8220;the biggest challenge for Chinese chips,&#8221; because Nvidia&#8217;s two-decade CUDA (Compute Unified Device Architecture) ecosystem cannot be replicated by engineering workarounds alone.[50] Huawei&#8217;s CANN (Compute Architecture for Neural Networks), its CUDA equivalent, is years from maturity &#8212; developers describe it as &#8220;difficult and unstable&#8221; with &#8220;disorganized&#8221; documentation.[51] Every model running on Ascend requires deep optimization by Huawei engineers, a bottleneck that does not scale. Ecosystem adoption cannot be purchased. It has to be earned.</p><p>The capital expenditure gap is enormous. US hyperscalers committed approximately $350 to $400 billion in infrastructure spending for 2025 &#8212; roughly seven times the combined capex of Chinese providers.[52] The global cloud market outside China remains overwhelmingly American: AWS, Azure, and Google Cloud hold the top three positions in every major region. 
Chinese providers generate 90 to 95 percent of their cloud revenue domestically.[53]</p><p>In space, no Chinese company has recovered an orbital rocket booster &#8212; the gap with SpaceX is measured in years, not months.[54]</p><p>And perhaps the most telling measure: despite the deployment record documented above, Goldman Sachs estimated that Chinese suppliers met only about 14 percent of domestic semiconductor demand by value in 2024, projecting 37 percent by 2030 &#8212; far short of the 14th Five-Year Plan&#8217;s 70 percent target, which thirteen industry leaders have since proposed raising to 80 percent for the 15th plan cycle.[55] Wang Yangyuan, the co-founder of SMIC itself, has described China&#8217;s semiconductor industry as &#8220;small, dispersed, and weak.&#8221;[56] The deployment compound is real, but the self-sufficiency it is building toward is not yet achieved.</p><p>Do the constraints propagate linearly through the stack? Does a two-generation chip gap produce a two-generation model gap? Does a seven-to-one capex gap produce a seven-to-one capability gap?</p><p>The evidence says no. Efficiency innovations at the model layer &#8212; pioneered in part because of the hardware constraint &#8212; compress the chip gap before it reaches the output. DeepSeek built its R1 reasoning model on top of V3, which was pre-trained on 2,048 H800 GPUs at a reported cost of $5.6 million &#8212; the full pipeline produced a model that matched systems trained on hardware an order of magnitude more expensive.[57] Zhipu AI&#8217;s GLM-5, trained entirely on Ascend chips, matched models trained on unrestricted Nvidia hardware.[58] The constraint did not just fail to prevent competitiveness. In some cases, it appears to have produced an engineering discipline that accelerated it.</p><p>This is not a universal claim.
The training gap for the largest frontier models remains significant, and a sophisticated skeptic would ask: Does today&#8217;s inference sufficiency survive the next model generation if Chinese labs cannot train at frontier scale on domestic hardware? The answer depends on whether efficiency gains at the model layer continue to compress the hardware gap faster than it widens. So far, the compression is winning. But the question is live, and the lithography constraint is the reason it stays live.</p><h2>The Global South</h2><p>Huawei Cloud launched its Saudi Arabia region in September 2023 with a $400 million five-year commitment. Huawei reports tenfold revenue growth in the first year, with over 1,000 customers, including government agencies and major telecommunications companies.[59] Huawei claims to be the first cloud provider fully compliant with Saudi government data security policies &#8212; a sovereignty pitch that maps directly to the concerns this Substack has documented in European enterprises struggling with CLOUD Act exposure.[60] Alibaba Cloud opened data centers in Mexico, Brazil, and Dubai in 2025 and operates a joint venture with STC (Saudi Telecom) in Saudi Arabia.[61] Tencent Cloud announced a $150 million investment in its first Middle East data center.[62]</p><p>But the sharpest evidence of the deployment compound reaching the Global South is not infrastructure investment. 
It is model adoption.</p><p>Singapore&#8217;s national AI program, AI Singapore, switched the foundation of its flagship Sea-Lion language model from Meta&#8217;s Llama to Alibaba&#8217;s Qwen in November 2025 &#8212; a state-level adoption decision in one of the world&#8217;s most technically sophisticated governments.[63] In October 2025, Uganda launched Sunflower, a national language model built on Qwen architecture, delivering agricultural advice in Luganda and educational content in local dialects for 46 million people.[64]</p><p>Malaysia deployed NurAI, described as the world&#8217;s first Sharia-aligned large language model, built on DeepSeek&#8217;s foundation and refined by the China-ASEAN AI Laboratory &#8212; a government initiative explicitly designed to deliver Chinese AI technology to Southeast Asian nations.[65] DeepSeek sent engineers to help build it. The model targets 340 million people across Malaysia, Indonesia, and Brunei.</p><p>The enterprise layer is following. In Singapore, OCBC rolled out over 30 internal AI tools powered by Qwen and DeepSeek.[66] In Indonesia, Indosat partnered with AIonOS to build sector-specific tools on DeepSeek. Malaysia&#8217;s Communications Ministry launched a sovereign AI ecosystem running on Huawei GPUs. Microsoft&#8217;s AI Economy Institute found that DeepSeek usage in Africa is two to four times higher than in other regions.[67]</p><p>The submarine cable layer completes the physical infrastructure. 
HMN Tech, formerly Huawei Marine, has delivered over 100,000 kilometers of submarine cable across 70-plus countries at costs 20 to 30 percent below Western rivals.[68] The PEACE Cable (Pakistan and East Africa Connecting Europe), running from Singapore through Pakistan, Djibouti, Kenya, Egypt, and on to France, provides 13 landing points across 12 countries.[69] Between 2017 and 2023, Chinese companies invested approximately $22 billion in digital infrastructure under the Digital Silk Road.[70]</p><p>What these countries are being offered is not inferior technology at a discount. It is a complete stack &#8212; chips, models, cloud, connectivity, and increasingly energy infrastructure &#8212; that operates outside the jurisdiction of the three switches I mapped in &#8220;Access, Disable, Destroy&#8221;: the chip switch (US export controls), the cloud switch (CLOUD Act), and the model switch (API revocation).[71] For a government in Riyadh, Jakarta, or Nairobi evaluating its options, the Chinese stack does not eliminate dependency. It changes the direction of the dependency. The coercion risk does not disappear &#8212; China&#8217;s own National Intelligence Law mandates cooperation with state intelligence, creating its own access concerns. And the content control problem is real: researchers testing NurAI found that prompts with minor grammatical errors cause the model to revert to DeepSeek&#8217;s default training, producing Chinese government-aligned responses on Taiwan&#8217;s status.[72] The fine-tuning modifies behavior. It does not excise the base model&#8217;s constraints &#8212; a trap this Substack has documented for every country attempting to build sovereignty through download rather than training.[73]</p><p>But for countries whose primary strategic anxiety is American extraterritorial reach, the Chinese alternative addresses the specific risk they are trying to mitigate. And the competitive frame matters. 
This is not China &#8220;catching up&#8221; in markets the US already serves. In many of these geographies, the US hyperscalers have historically underinvested. Huawei Cloud operates more regions in Latin America than any Chinese competitor and claims the most cloud locations of any provider in the region.[74] The infrastructure is being built at a price point and sovereignty structure that the US was not offering. The adoption is happening where US hyperscalers were present but not competitive on the terms these governments care about most. Not all of this adoption is equally durable &#8212; a government that downloaded Qwen can download Llama next quarter. But the infrastructure layer &#8212; submarine cables, data centers, cloud contracts &#8212; creates switching costs that model downloads do not. By the time the strategic significance registers in Washington, the cables are laid, the data centers are operational, and the switching costs are structural.</p><p>The deployment compound&#8217;s strategic reach does not stop at the Global South. It extends, through a structural irony, into Europe itself.</p><h2>The Sovereignty Irony</h2><p>At the Berlin Summit on European Digital Sovereignty in November 2025, Macron warned that Europe must avoid becoming a &#8220;vassal&#8221; of the United States or China in the technology sphere.[75] &#8220;You cannot dedicate the strength of your economy to the Magnificent Seven,&#8221; he told the audience. 
At Davos in January, he described US tariffs as &#8220;fundamentally unacceptable &#8212; even more so when they are used as leverage against territorial sovereignty.&#8221;[76] In the same speech, he said something the sovereignty narrative rarely quotes: &#8220;China is welcome, but what we need is more Chinese foreign direct investments in Europe, in some key sectors, to contribute to our growth, to transfer some technologies.&#8221;[77] At Munich in February, he urged the EU to deploy its &#8220;anti-coercion mechanism&#8221; against American economic pressure.[78] The French president is simultaneously building a wall against American platforms and opening a door to Chinese capital. The contradiction is structural, not personal: Europe needs technology transfer it cannot generate domestically, and China is the partner willing to offer it on terms that do not include the CLOUD Act.</p><p>Foreign Policy reported the consensus plainly: &#8220;Europeans are worried that Trump may weaponize tech next &#8212; threatening to disrupt or cut off digital services to extract concessions. None of the researchers, European officials, and experts deemed that possibility overly far-fetched.&#8221;[79]</p><p>The institutional response is accelerating &#8212; and failing. After US sanctions led to the ICC&#8217;s chief prosecutor being locked out of his Microsoft Outlook account, the court replaced Microsoft with an open-source suite from the German Center for Digital Sovereignty.[80] Germany&#8217;s Schleswig-Holstein migrated 40,000 government email accounts from Exchange. Denmark began phasing out Office 365. 
But when the Dutch government chose a local provider, Solvinity, specifically to avoid CLOUD Act exposure, the American firm Kyndryl acquired Solvinity five months later, collapsing the exit before the migration was complete.[81] Microsoft France&#8217;s director of public and legal affairs told the French Senate he could not guarantee that customer data would never be transferred to US authorities under the CLOUD Act.[82] European sovereign cloud spending is forecast to triple between 2025 and 2027.[83]</p><p>These policies are designed to reduce dependence on American platforms. They are not designed to block Chinese models &#8212; and structurally, they cannot. The open-weight Chinese stack passes through the sovereignty filter that the EU built to catch US hyperscalers. Qwen and DeepSeek can be downloaded, run locally on European infrastructure, and fine-tuned without any API dependency, CLOUD Act exposure, or kill-switch risk. The developer in Berlin who migrates from GPT-4 to a locally hosted Qwen model is making the same structural move as the government in Schleswig-Holstein migrating from Exchange to Thunderbird &#8212; open-source, locally controlled, no foreign jurisdiction. Sixty-three percent of new fine-tuned models on Hugging Face are already built on Chinese base models.[84] That figure does not distinguish European developers from others. But the incentive structure is clear: Europe&#8217;s sovereignty policies push developers toward open-weight models, and the best open-weight models are increasingly Chinese. Europe is investing in alternatives &#8212; Mistral, OVHcloud, the European Chips Act &#8212; but none yet compete with Qwen or DeepSeek at the model distribution layer.</p><p>Europe will not adopt Huawei Cloud. 
European governments have banned DeepSeek from official devices in Italy, Denmark, Belgium, and the Czech Republic, and Estonian intelligence has warned that the model embeds Chinese propaganda.[85] The content control problem that surfaces in NurAI surfaces in European contexts too &#8212; DeepSeek produces Russian-aligned responses to Ukraine questions when prompted in Russian rather than English.[86] Europe knows the risk. But the governance mechanisms Europe is building govern platforms, not weights.[87] They block the cloud switch but not the model layer when the model runs locally. The Chinese deployment compound does not need Europe to adopt its cloud. It needs Europe&#8217;s sovereignty panic to fragment the Western tech ecosystem &#8212; and that fragmentation is already underway.</p><h2>What the West Gets Wrong About the Response</h2><p>The deployment compound is not inevitable. But responding to it requires understanding why the current response is structurally misallocated.</p><p>The export controls were designed to maintain a generational lead in frontier AI training &#8212; a narrower and more defensible goal than preventing Chinese AI capability entirely. But the access-denial architecture concentrates nearly all leverage at a single chokepoint: the chip layer. Export controls on GPUs. Restrictions on lithography tools. Pressure on ASML and TSMC to cut off advanced manufacturing. This is the layer where the advantage is genuine &#8212; ASML&#8217;s EUV monopoly, TSMC&#8217;s manufacturing lead, Nvidia&#8217;s training ecosystem. It is also the layer with the most dangerous geographic assumption. If the Strait of Hormuz is the chokepoint of global energy security, the Taiwan Strait is the chokepoint of global chip security &#8212; and the country the export controls are designed to constrain sits 130 kilometers away. 
ASML is Dutch, Nvidia is American, but TSMC fabs the chips both depend on, and those fabs are on an island that Beijing claims as sovereign territory. China does not need to invade Taiwan to leverage this. The threat alone reshapes the calculus. The entire export control architecture rests on a manufacturing base that a single geopolitical crisis could remove from the board.</p><p>And it is the layer that matters least for deployment outcomes, because the deployment compounds around it. Inference runs on 7-nanometer chips that the controls did not stop. Models trained on restricted hardware get released as open weights and fine-tuned on unrestricted hardware. Cloud infrastructure runs on domestic silicon.</p><p>The mismatch is visible in three places where the West has leverage it is not using.</p><p>The first is the model layer. Nearly two-thirds of new fine-tuned models on Hugging Face are built on Chinese base models &#8212; not because those models are the best in the world, but because they are the best available under permissive open-weight licenses at near-zero cost. Meta&#8217;s Llama held this position until Qwen displaced it. OpenAI released its first open-weight models in August 2025, years late.[88] The open-weight ecosystem is the distribution channel through which the deployment compound reaches developers worldwide. Matching Alibaba and DeepSeek on open-weight quality, cadence, and licensing would not require a technical breakthrough &#8212; the frontier labs have the models. The obstacle is strategic: the business models of US frontier labs depend on keeping their best models proprietary, ceding the distribution layer to China. A strategic response would treat open-weight releases as a national security investment, not a commercial sacrifice.</p><p>The second is infrastructure in the Global South. The submarine cables are being laid. The data centers are being constructed. The switching costs are accumulating. 
Every year of underinvestment by US hyperscalers in Africa, Latin America, and Southeast Asia is a year of Chinese infrastructure deployment that will take a decade to displace. The US International Development Finance Corporation, EXIM Bank, and allies like Japan&#8217;s JBIC have financing instruments designed for exactly this purpose. They have not deployed them at the scale or speed the deployment compound demands.[89] The reason is the same structural mismatch: Western policy treats infrastructure as a commercial asset, while China treats it as a strategic asset. Commercial logic says the return on a data center in Nairobi is lower than in Virginia. Strategic logic holds that the data center in Nairobi determines which technology stack an entire region builds on for the next 20 years.</p><p>The third is enforcement architecture. The Supermicro scheme diverted $2.5 billion in AI servers over two years before a single arrest.[90] Enforcement must move from prosecution to interdiction: real-time tracking of controlled hardware through the supply chain, mandatory end-use verification before shipment rather than after diversion, and penalties that fall on companies whose compliance programs failed&#8212;not just on the individuals who circumvented them. In a deployment race measured in months, a prosecution lag of one GPU generation is the gap through which $2.5 billion flows.</p><p>The common thread: in each case, the West has genuine leverage that it is not deploying because its strategic framework measures the wrong thing. It measures chip restrictions instead of model adoption. It measures hyperscaler revenue instead of Global South infrastructure. It measures indictments instead of interceptions. The access-denial frame produces an access-denial response. 
A deployment frame would produce a deployment response&#8212;and that response would look entirely different.</p><h2>What the Deployment Record Means</h2><p>The standard Western assessment of China&#8217;s technological position runs through a checklist of what China cannot access &#8212; EUV lithography machines, top-end Nvidia GPUs, TSMC&#8217;s most advanced manufacturing nodes, the CUDA developer ecosystem &#8212; and concludes that the restrictions are working. That conclusion is correct at the component level but wrong at the system level. It is the difference between measuring the inputs to a factory and measuring what the factory ships.</p><p>What China has shipped, as of March 2026: frontier AI models that match Western benchmarks at a fraction of the cost. A domestic AI chip ecosystem that shipped 1.65 million accelerator cards last year &#8212; 41 percent of the Chinese market &#8212; with Huawei alone accounting for over 800,000.[93] A $50 billion cloud market running domestic chips and domestic models in a self-contained stack. The world&#8217;s only operating Generation IV reactor and its first commercial land-based SMR, backed by 38 reactors under construction. Two mega-constellations with 270-plus satellites in orbit and 51,000 planned. And an enforcement regime that, at the scale of a single indictment, took two years to catch a $2.5 billion smuggling operation run with hair dryers and sobbing emojis.</p><p>White House AI Czar David Sacks has estimated that China&#8217;s AI sector lags by 3 to 6 months.[91] A senior US AI executive told the House Select Committee that the real gap is closer to three months.[92] Those estimates address a single domain. The deployment compound operates across six. By the time the model gap closes or widens by another quarter, the submarine cables are laid, the cloud contracts are signed, and the switching costs are structural. The West is measuring the scoreboard. 
China is building the stadium.</p><div><hr></div><h3>Notes</h3><p>[1] Zhipu AI GLM-5: trained entirely on 100,000 Huawei Ascend 910B chips using the MindSpore framework. Released February 11, 2026. SWE-bench Verified: 77.8% (vs. GPT-5.2 at 80.0% at xhigh reasoning effort with OpenAI scaffold &#8212; GLM-5 trailed by 2.2 points). BrowseComp: 75.9% (first among open-weight models; GPT-5.2 Pro scored ~77.5%). Humanity&#8217;s Last Exam with tools: 50.4% (leading in this variant; standard text-only HLE scores are lower across all models). Zhipu AI arXiv paper 2602.15763; Reuters; Hugging Face model card. Benchmark claims are vendor-reported; SWE-bench and HLE methodologies are public. <a href="https://arxiv.org/abs/2602.15763">arXiv 2602.15763</a></p><p>[2] ASPI Critical Technology Tracker. The August 2024 report (2019-2023 data) showed China leading in 57 of 64 technologies. The December 2025 update (2020-2024 data, expanded to 74 technologies) showed 66 of 74. The March 31, 2026, update (complete 2025 dataset) shows 69 of 74. The US leads in the remaining 5. Note: ASPI tracks the top 10% of highly cited research publications &#8212; a research output measure, not a deployment measure. This piece uses it as a lead indicator of capability trajectory, not as direct evidence of deployment. <a href="https://techtracker.aspi.org.au/">ASPI Critical Technology Tracker</a> | <a href="https://www.aspi.org.au/news/aspis-critical-technology-tracker-update-2025-dataset-monopoly-risk-insights/">ASPI March 2026 update</a></p><p>[3] SMIC FY2025 results: revenue of US$9.327 billion, 16.2% YoY growth. Q4 2025 revenue US$2.489 billion. TrendForce, February 11, 2026; SMIC company disclosure via chinastarmarket.cn. The 34-step DUV multi-patterning figure is from SemiAnalysis and industry reporting on SMIC&#8217;s N+2 process. 
<a href="https://www.trendforce.com/news/2026/02/11/news-smic-posts-record-9-3b-in-2025-sales-7nm-yields-reportedly-weigh-on-margins/">TrendForce</a></p><p>[4] China Ministry of Industry and Information Technology: national semiconductor production reached a record 484.3 billion units, up 85.2% from 2020. L&#8217;Opinion, March 27, 2026, citing MIIT and describing the figure as &#8220;l&#8217;ann&#233;e derni&#232;re&#8221; (&#8220;last year,&#8221; i.e., 2025). Some sources attribute this figure to 2024 full-year data; Jan-Oct 2025 output was 386.6B units (gov.cn, December 2025), tracking toward a higher 2025 total. Note: the unit count flatters the picture because the majority of production is at legacy nodes (&#8805;28nm) where value per unit is low. Goldman Sachs estimates only 14% self-sufficiency by value (fn. 55) &#8212; the gap between volume and value reflects continued dependence on imports for advanced chips. <a href="https://english.www.gov.cn/archive/statistics/202512/04/content_WS6931901cc6d00ca5f9a07ed2.html">gov.cn (Jan-Oct 2025)</a></p><p>[5] Ascend 910C: 60% of H100 inference performance per DeepSeek researcher Yuchen Jin, via AGI Hunt and Tom&#8217;s Hardware (February 4, 2025). Ascend 950PR on Atlas 350 accelerator card: Huawei claims approximately 2.8x the performance of Nvidia&#8217;s H20. Unveiled by Huawei VP Ma Haixu at the March 20, 2026, China Partner Conference; specifications provided by Zhang Dixuan. TrendForce, SCMP, March 2026. Note: L&#8217;Opinion (March 27, 2026) reported the comparison as against the H200, which appears to be an error &#8212; multiple English-language sources confirm the comparison was against the H20. The comparison uses FP4 precision, which the H20 does not natively support, so the figure cannot be independently verified on comparable terms. Vendor-claimed, not independently benchmarked.
<a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/huaweis-ascend-910c-ai-chip-delivers-about-60-of-nvidias-h100-performance-according-to-a-deepseek-researcher">Tom&#8217;s Hardware (910C)</a> | <a href="https://www.tomshardware.com/pc-components/gpus/huawei-unveils-new-atlas-350-ai-accelerator-with-1-56-pflops-of-fp4-compute-and-up-to-112gb-of-hbm-claims-2-8x-more-performance-than-nvidias-h20">Tom&#8217;s Hardware (950PR)</a></p><p>[6] Chinese authorities restricted imports of Nvidia H200 beginning January 14, 2026, with customs agents telling importers the chips were &#8220;not permitted to enter China&#8221; &#8212; one day after US export approval. The restrictions cite concerns about the chip's tracing capabilities. Reuters, Bloomberg, Asia Times, January-March 2026. L&#8217;Opinion (March 27, 2026) also reported the restrictions. <a href="https://www.lopinion.fr/international/la-chine-reaffirme-ses-ambitions-de-devenir-autosuffisante-dans-le-secteur-des-semi-conducteurs">L&#8217;Opinion</a> | <a href="https://asiatimes.com/2026/03/nvidia-halts-h200-production-as-china-backs-huawei-ai-chips/">Asia Times</a></p><p>[7] Hugging Face download data via The New Stack (March 2026). For the period from February 2025 to February 2026, Chinese models accounted for 41% of downloads, compared with 36.5% for US models. <a href="https://thenewstack.io/china-leads-open-ai-models/">The New Stack</a></p><p>[8] Stanford HAI / DigiChina issue brief, &#8220;Beyond DeepSeek: China&#8217;s Diverse Open-Weight AI Ecosystem and Its Policy Implications,&#8221; December 2025. Alibaba&#8217;s Qwen surpassed Meta&#8217;s Llama as the most downloaded model family in September 2025. 63% of all new fine-tuned or derivative models on Hugging Face in September 2025 were based on Chinese base models. 
<a href="https://hai.stanford.edu/assets/files/hai-digichina-issue-brief-beyond-deepseek-chinas-diverse-open-weight-ai-ecosystem-policy-implications.pdf">Stanford HAI/DigiChina</a></p><p>[9] As of March 2026: Flagship API comparison &#8212; OpenAI GPT-5.4: $2.50/M input, $15/M output; Alibaba Qwen-Max: $1.60/M input, $6.40/M output (gap: ~1.6x input, ~2.3x output). Open-weight API comparison &#8212; DeepSeek V3: $0.27/M input, $1.10/M output (also available for free self-hosting). The structural advantage is not a single multiplier but a pattern: at every tier, the Chinese option is cheaper, and open-weight models available for local deployment have no pricing equivalent in the Western proprietary ecosystem. On benchmarks: GLM-5 scored 77.8% on SWE-bench Verified (vs. GPT-5.2 at 80.0%); ByteDance Seed 2.0 Pro scored 98.3% on AIME 2025; MiniMax M2.5 achieved 80.2% on SWE-bench Verified. See fn. 1 and fn. 28 for full pricing details. All benchmark claims are vendor-reported. <a href="https://openai.com/api/pricing/">OpenAI pricing</a> | <a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek pricing</a></p><p>[10] China cloud market size: approximately $40B in 2024 (Canalys, Q4 2024 data: $11.1B quarterly, annualized ~$44B), accelerating to ~$50B in 2025 with 20-24% YoY growth (Omdia, Canalys, DCD). Exact figures vary by methodology and scope (IaaS vs. IaaS + PaaS + SaaS). <a href="https://canalys.com/newsroom/china-cloud-market-q1-2025">Canalys</a></p><p>[11] Volcano Engine: IDC China data reported by AIBase and Tiger Brokers. Approximately 49% of China&#8217;s public cloud large-model invocations in H1 2025. 30 trillion daily tokens per Volcano Engine disclosure (a 253x increase from the May 2024 debut). ByteDance is private; revenue figures are industry estimates. 
<a href="https://news.aibase.com/news/21439">AIBase</a></p><p>[12] US Big Three annualized revenue run rate mid-2025: AWS ~$124B, Azure ~$96B, GCP ~$78B = ~$298B (from respective earnings releases, Q2 2025). China's total cloud market is ~$50B. Ratio approximately 6:1. <a href="https://ir.aboutamazon.com/quarterly-results/default.aspx">AWS Q2 2025</a></p><p>[13] World Nuclear News, &#8220;China&#8217;s demonstration HTR-PM enters commercial operation,&#8221; December 2023. The HTR-PM at Shidao Bay, Shandong Province, is a 210 MWe plant consisting of two reactor modules driving a single turbine. <a href="https://www.world-nuclear-news.org/articles/chinese-demonstration-htr-pm-reaches-commercial-op">World Nuclear News</a></p><p>[14] World Nuclear Association, &#8220;Nuclear Power in China,&#8221; country profile (accessed March 31, 2026). Linglong One (ACP100), 125 MWe, developed by CNNC. Cold functional tests completed October 2025; steam turbine test completed December 2025; commissioning expected H1 2026. <a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/china-nuclear-power.aspx">World Nuclear Association</a></p><p>[15] WNA 2025 World Nuclear Performance Report (data as of July 31, 2025) lists 32 reactors, ~34 GWe under construction. The count fluctuates as units complete and new starts begin; Wikipedia, citing IAEA PRIS (March 29, 2026), lists &#8220;over 28.&#8221; The &#8220;roughly half the global total&#8221; framing is consistent across all sources. <a href="https://world-nuclear.org/our-association/publications/global-trends-reports/world-nuclear-performance-report">WNA Performance Report</a></p><p>[16] World Nuclear Association Reactor Database (accessed March 2026). China: 61 operable reactors. France: 57 operable reactors. United States: 94 operable reactors. China surpasses France in reactor count but not yet in installed capacity (GW). 
<a href="https://world-nuclear.org/nuclear-reactor-database/summary/China">WNA China reactors</a> | <a href="https://world-nuclear.org/nuclear-reactor-database/summary/France">WNA France reactors</a></p><p>[17] Combined ITU filings for GuoWang (~13,000 satellites), Qianfan/Thousand Sails (~15,000), and Honghu-3 (~10,000). China-in-Space Substack; Global Security; KeepTrack. <a href="https://www.china-in-space.com/p/chinas-mega-constellations-mega-article">China-in-Space</a></p><p>[18] GuoWang: ~163 satellites in orbit by mid-March 2026 after 20 launch missions. China-in-Space, KeepTrack, SpaceNews. <a href="https://www.china-in-space.com/p/guowang-mission-ends-chinas-month">China-in-Space</a></p><p>[19] Qianfan: 108 satellites launched in six batches. Plans for 324 more in 2026. Global Security; Connectivity.technology. <a href="https://www.globalsecurity.org/space/world/china/qianfan.htm">GlobalSecurity</a></p><p>[20] China conducted approximately 92 orbital launches in 2025 (SpaceNews). GalaxySpace IPO tutoring process announced on March 31, 2026 (Reuters). LandSpace filed for a $1.07B IPO on the Shanghai STAR Market (December 2025). <a href="https://spacenews.com/china-caps-record-year-for-orbital-launches-with-tianhui-7-and-shijian-29-technology-test-missions/">SpaceNews</a></p><p>[21] DOJ press release, &#8220;Three Charged with Conspiring to Unlawfully Divert Cutting Edge U.S. Artificial Intelligence Technology to China,&#8221; March 19, 2026. Defendants: Yih-Shyan &#8220;Wally&#8221; Liaw, 71, Fremont, CA; Ruei-Tsang &#8220;Steven&#8221; Chang, 53, Taiwan (fugitive); Ting-Wei &#8220;Willy&#8221; Sun, 44, Taiwan. <a href="https://www.justice.gov/opa/pr/three-charged-conspiring-unlawfully-divert-cutting-edge-us-artificial-intelligence">DOJ</a></p><p>[22] Per the indictment, as reported by Fortune (March 23, 2026). Liaw responded to a news link about chip smuggling arrests with &#8220;sobbing-face emojis&#8221; and continued operations. 
Dummy servers were staged at the Southeast Asian warehouse; Chang arranged for an auditor he called &#8220;friendly&#8221; to conduct the review. <a href="https://fortune.com/2026/03/23/supermicro-indictment-china-ai-servers/">Fortune</a></p><p>[23] Alibaba Cloud Model Studio hosts DeepSeek-V3.2, R1, and the full Qwen family. 90,000+ corporate clients per Alibaba Cloud disclosure at Apsara Conference 2025. <a href="https://www.alibabacloud.com/en/solutions/generative-ai/model-studio">Alibaba Cloud Model Studio</a></p><p>[24] ByteDance deployed approximately 36,000 Nvidia B200 chips through a Malaysian partnership with Aolani Cloud, at a hardware investment exceeding $2.5 billion. The Online Citizen, March 13, 2026; WSJ reporting. ByteDance&#8217;s Volcano Engine also deploys Ascend hardware for domestic inference workloads. <a href="https://www.theonlinecitizen.com/">The Online Citizen</a></p><p>[25] IDC China, reported by KR Asia (January 2026). China&#8217;s total large model invocations reached 536.7 trillion tokens in H1 2025. <a href="https://kr-asia.com/who-leads-chinas-ai-cloud-race-alibaba-bytedance-and-baidu-all-stake-a-claim">KR Asia</a></p><p>[26] DeepSeek V2 was released on May 6, 2024, with API pricing at RMB 1 (~$0.14) per million tokens. Baidu&#8217;s Wenxin 4.0-8K was charging RMB 120 per million tokens at the time. KR Asia, &#8220;LLM prices hit rock bottom in China,&#8221; January 2026. <a href="https://kr-asia.com/llm-prices-hit-rock-bottom-in-china-as-alibaba-cloud-enters-the-fray">KR Asia</a></p><p>[27] ByteDance dropped the Doubao vision model to $0.00041 per 1,000 tokens in December 2024 (85% below the industry average). Alibaba matched within two weeks with Qwen-VL-Max at the same price. Global Times, January 2025; io-fund, March 2025. <a href="https://www.globaltimes.cn/">Global Times</a></p><p>[28] Pricing as of March 2026 via provider documentation. Flagship proprietary: OpenAI GPT-5.4: $2.50/M input, $15/M output.
GPT-5.2: $1.75/M input, $14/M output. GPT-5.2 Pro (reasoning tier): $21/M input, $168/M output. Alibaba Qwen2.5-Max: $1.60/M input, $6.40/M output (Qwen official X/Twitter, January 2026). Open-weight: DeepSeek V3: $0.27/M input (cache miss), $1.10/M output (DeepSeek pricing page; launch price was $0.14 in December 2024). Note: GPT-5.4 and DeepSeek V3 serve different market segments (proprietary flagship vs. open-weight). The flagship-to-flagship gap (GPT-5.4 vs. Qwen-Max) is ~2x, not 14x. The structural advantage is the availability of competitive open-weight models for self-hosting, which has no Western equivalent at the same quality tier. <a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek pricing</a></p><p>[29] Tencent VP Martin Lau laid out three reasons why China&#8217;s AI revenue lags behind US peers: a smaller enterprise market, a less vibrant SaaS ecosystem, and fewer AI startups purchasing compute. io-fund, March 2025. <a href="https://www.io-fund.com/">io-fund</a></p><p>[30] Goldman Sachs, per Energy Connects (November 2025): China is projected to have roughly 400 GW of spare power-generation capacity by 2030. B-tier (journalist paraphrase of analyst estimate). <a href="https://www.energyconnects.com/news/oil/2025/november/goldman-sees-china-power-push-giving-it-edge-over-us-in-ai-race/">Energy Connects</a></p><p>[31] PJM capacity auction shortfall: Monitoring Analytics (PJM Independent Market Monitor), December 2025. US transformer lead times of 143 weeks vs. Chinese 48 weeks per Fortune (August 2025) and Wood Mackenzie reporting. <a href="https://www.pjm.com/-/media/DotCom/markets-ops/rpm/rpm-auction-info/2027-2028/2027-2028-bra-report.pdf">PJM auction report</a></p><p>[32] &#8220;Eastern Data, Western Computing&#8221; (&#19996;&#25968;&#35199;&#31639;): launched February 2022 by NDRC. Over $28 billion invested to date, with total planned investment of $56-70 billion.
Eight national computing hub nodes, ten data center clusters. DCPulse; Premia Partners; Sinocities Substack. <a href="https://dcpulse.com/">DCPulse</a></p><p>[33] GuoWang and Qianfan constellation operators overlap with Huawei Cloud and Alibaba Cloud&#8217;s expansion geographies in Southeast Asia, the Middle East, and Africa.</p><p>[34] GalaxySpace: 25+ satellites launched, direct-to-cell demonstration between Beijing and Bangkok. The IPO tutoring process began on March 31, 2026 (Reuters). Partnerships in Thailand, UAE, Saudi Arabia, Indonesia, and Malaysia, per company disclosures. <a href="https://www.reuters.com/">Reuters</a></p><p>[35] ByteDance $2.5B Malaysia deployment (fn. for ByteDance Malaysia). Chinese companies stockpiled approximately 1 million H20 chips valued at $12+ billion prior to restrictions, per KrASIA and industry reporting. Offshore compute strategy: Alibaba and ByteDance train LLMs in Singapore and Malaysia, per Tom&#8217;s Hardware and TechSpot. <a href="https://kr-asia.com/">KrASIA</a> | <a href="https://www.tomshardware.com/">Tom&#8217;s Hardware</a></p><p>[36] Georgetown CSET (2023), using the UNESCO ISCED framework, estimates that China produces approximately 3.57 million STEM graduates per year. US figures vary by definition: ~500,000 at bachelor&#8217;s level only (NSF), ~820,000 across all degree levels (NCES/CSET). The ~5:1 ratio uses the broadest comparable definitions. Previous estimates of &#8220;5 million&#8221; and &#8220;10:1&#8221; (WEF, King&#8217;s College London) use broader Chinese classification criteria that include fields not typically counted as STEM elsewhere. <a href="https://cset.georgetown.edu/article/the-global-distribution-of-stem-graduates-which-countries-lead-the-way/">Georgetown CSET</a></p><p>[37] China produces approximately 1.5 million engineering graduates annually, bringing the technical workforce to over 5 million engineers. EU-27 produces roughly 650,000 annually. 
University of Li&#232;ge working paper, Attia (2026).</p><p>[38] Over 2,300 AI programs have been established in Chinese universities since 2018. Ekioz / Le Monde synthesis, February 2026. The Ministry of Education reported that one-fifth of higher education programs were revamped in 2024-2025 to channel students into AI and integrated circuits. CNBC, December 2025. <a href="https://www.cnbc.com/">CNBC</a></p><p>[39] China&#8217;s 15th Five-Year Plan (2026-2030), approved March 12, 2026. AI was mentioned 52 times, quadruple the count in the previous plan cycle. Nature, March 2026; The Diplomat, March 28, 2026; AI CERTs, March 2026. <a href="https://thediplomat.com/2026/03/chinas-5-year-plan-has-moved-beyond-the-chip-war-washington-hasnt-noticed/">The Diplomat</a></p><p>[40] State Council &#8220;Opinions on Deeply Implementing the &#8216;Artificial Intelligence +&#8217; Initiative,&#8221; August 2025. Three-step plan: 70% AI application penetration in six priority sectors by 2027; comprehensive AI-driven development by 2030; full integration by 2035. CSET Georgetown translation; 36Kr analysis, August 2025. <a href="https://cset.georgetown.edu/">CSET Georgetown</a></p><p>[41] Pre-15th Five-Year Plan recommendations from Chinese semiconductor specialists, including SMIC co-founder Wang Yangyuan: stabilize 28nm production, achieve reliable 14nm manufacturing, create a fully Chinese 7nm production line. Lithography is identified as the priority bottleneck. L&#8217;Opinion, March 27, 2026. <a href="https://www.lopinion.fr/international/la-chine-reaffirme-ses-ambitions-de-devenir-autosuffisante-dans-le-secteur-des-semi-conducteurs">L&#8217;Opinion</a></p><p>[42] The core AI industry is valued at 1.2 trillion yuan in 2025, per AI CERTs, citing government figures. Target exceeds 10 trillion yuan by 2030. Guangdong launched a 10 billion yuan AI and Robotics Industry Fund in early 2025; Shenzhen is building a dedicated embodied-intelligence industrial zone. The Diplomat, March 2026.
<a href="https://english.www.gov.cn/archive/statistics/202603/05/content_WS69a96a11c6d00ca5f9a09932.html">gov.cn (AI industry)</a></p><p>[43] TSMC began shipping at 2nm (N2 process) for advanced customers in late 2025. SMIC&#8217;s most advanced production process is N+2 (7nm-class DUV multi-patterning). The gap is approximately two full process generations. SMIC&#8217;s 7nm yields have reportedly improved from ~20% to 40-70% by mid-2025 (FT; TD Cowen analyst Krish Sankar estimated 60-70%). At 5nm, yields are far lower &#8212; Kiwoom Securities estimated ~33%, which some sources have incorrectly attributed to 7nm. Even at 40-70%, the cost-per-good-die remains substantially higher than TSMC&#8217;s &gt;90% yields at equivalent nodes. <a href="https://www.trendforce.com/">TrendForce</a></p><p>[44] ASML holds a global monopoly on EUV lithography; no other company has achieved volume production of EUV systems. Each system costs approximately $350-400 million and contains ~100,000 components. ASML reported 42% of Q3 2025 sales from China &#8212; all DUV systems, as EUV exports to China have been restricted since 2019 under US pressure on the Dutch government. TrendForce, November 2025; ASML quarterly earnings. <a href="https://www.trendforce.com/">TrendForce (ASML)</a> | <a href="https://www.asml.com/en/investors">ASML earnings</a></p><p>[45] SMEE&#8217;s most advanced production tool operates at 90nm. A 28nm immersion DUV prototype is in testing. TrendForce (November 2025); Semiecosystem Substack. <a href="https://www.trendforce.com/news/2025/12/26/news-chinas-smee-reportedly-wins-rmb-110m-lithography-tool-contract-amid-domestic-push/">TrendForce (SMEE)</a></p><p>[46] Yuliangsheng (linked to Huawei-backed SiCarrier) has a 28-nanometer immersion DUV system reportedly being tested at SMIC. Tom&#8217;s Hardware assessed the system as resembling ASML&#8217;s Twinscan NXT:1950i from 2008. 
Even if integrated into SMIC&#8217;s 28nm process by 2027, domestically made lithography systems are unlikely to achieve sub-10nm production before 2030. TrendForce, November 2025; Tom&#8217;s Hardware; Financial Times, September 2025. <a href="https://www.trendforce.com/news/2025/11/10/news-decoding-chinas-lithography-push-to-challenge-asml-from-sicarrier-to-alternative-euv-paths/">TrendForce (Yuliangsheng)</a></p><p>[47] ASML CEO Christophe Fouquet, April 2025: China would need &#8220;many, many years&#8221; to catch up in EUV lithography. All About Industries, January 2026, citing ASML public statements. <a href="https://www.all-about-industries.com/chinas-secret-manhattan-project-for-building-its-own-lithography-machine-a-edd5b29b7d4a80d3e4ec62ed2c31370f/">All About Industries</a></p><p>[48] CSIS Strategic Technologies Blog, &#8220;Breakthroughs or Boasts? Assessing Recent Chinese Lithography Advancements,&#8221; September 2025. The assessment examined several announced Chinese lithography milestones and found that each revealed &#8220;more exaggeration than transformation in terms of leading competitive capabilities.&#8221; <a href="https://www.csis.org/blogs/strategic-technologies-blog/breakthroughs-or-boasts-assessing-recent-chinese-lithography">CSIS</a></p><p>[49] At least three separate efforts are pursuing EUV development in China: SMEE (state-owned), SiCarrier (Huawei-linked), and university consortia. SMEE filed a patent for an EUV lithography scanner in 2024. Wikipedia, &#8220;Shanghai Micro Electronics Equipment&#8221;; SCMP, October 2025. The existence of three parallel programs may reflect coordination or fragmentation &#8212; the diagnosis is not yet clear. 
<a href="https://www.scmp.com/tech/tech-war/article/3329508/meet-amies-chinas-new-hope-breaking-reliance-asmls-chipmaking-machines">SCMP (AMIES)</a></p><p>[50] DeepSeek&#8217;s Yuchen Jin: &#8220;The biggest challenge for Chinese chips is the stability of long-cycle training.&#8221; Tom&#8217;s Hardware, February 4, 2025. <a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/huaweis-ascend-910c-ai-chip-delivers-about-60-of-nvidias-h100-performance-according-to-a-deepseek-researcher">Tom&#8217;s Hardware</a></p><p>[51] Developer descriptions of CANN from ChinaTalk, &#8220;Can Huawei Take On Nvidia&#8217;s CUDA?&#8221; and Tom&#8217;s Hardware reporting on Huawei&#8217;s Ascend ecosystem. <a href="https://www.chinatalk.media/p/can-huawei-take-on-nvidias-cuda">ChinaTalk</a></p><p>[52] US hyperscaler capex 2025: Amazon ~$118-125B, Microsoft ~$120B annualized, Google ~$91-93B, Meta ~$70B. Total ~$400B. Chinese providers combined for ~$45-55B, according to Goldman Sachs estimates. Ratio approximately 7:1. <a href="https://ir.aboutamazon.com/quarterly-results/default.aspx">Amazon IR</a></p><p>[53] Synergy Research Group, 2025 data. The US operates approximately 640-700 hyperscale data centers (~55% of the global total). China ~200-210 (~16%). <a href="https://www.srgresearch.com/">Synergy Research</a></p><p>[54] LandSpace Zhuque-3 first-stage recovery attempt December 2025: landing burn cut out at ~3 km altitude. No Chinese company has achieved orbital booster recovery as of March 31, 2026. <a href="https://spacenews.com/">SpaceNews (LandSpace)</a></p><p>[55] Goldman Sachs estimated that Chinese national suppliers accounted for approximately 14% of domestic semiconductor demand by value in 2024, projecting that figure to approximately 37% by 2030. The 14th Five-Year Plan (2020-2025) had targeted 70% self-sufficiency. L&#8217;Opinion, March 27, 2026, citing Goldman Sachs. 
The 80% target was set by thirteen industry leaders, including Yangtze Memory Technologies Chairman Chen Nanxiang and Naura Chairman Zhao Jinrong, as reported at SEMICON China 2026. Nikkei Asia, &#8220;China chip sector targets 80% self-sufficiency with US in its sights,&#8221; March 28, 2026. <a href="https://asia.nikkei.com/business/tech/semiconductors/china-chip-sector-targets-80-self-sufficiency-with-us-in-its-sights">Nikkei Asia</a></p><p>[56] Wang Yangyuan, co-founder of SMIC, described the Chinese semiconductor industry as &#8220;small, dispersed, and weak&#8221; (&#171; petit, dispers&#233; et faible &#187;). L&#8217;Opinion, March 27, 2026. <a href="https://www.lopinion.fr/international/la-chine-reaffirme-ses-ambitions-de-devenir-autosuffisante-dans-le-secteur-des-semi-conducteurs">L&#8217;Opinion</a></p><p>[57] DeepSeek V3: pre-trained on 2,048 Nvidia H800 GPUs. Reported final pre-training cost of ~$5.6 million. R1 was then built on top of V3 using reinforcement learning across 512 GPUs, at an estimated cost of ~$294K. Combined V3+R1 pipeline: ~$5.9 million. R1 matched OpenAI o1 across most reasoning benchmarks. DeepSeek V3 technical report (arXiv:2412.19437). The $5.6M figure covers the final training run only &#8212; total R&amp;D, failed experiments, and cluster amortization are excluded. <a href="https://arxiv.org/abs/2412.19437">DeepSeek V3 technical report</a></p><p>[58] Zhipu AI GLM-5: trained entirely on Huawei Ascend chips. SWE-bench Verified: 77.8% (vs. GPT-5.2 at 80.0% per OpenAI&#8217;s evaluation; Vals.ai independent testing scored GPT-5.2 lower at 75.4%, but this piece uses OpenAI&#8217;s self-reported figure consistently &#8212; see fn. 1). GLM-5 led on BrowseComp and HLE with tools. Medium / Maxime Labonne, February 2026. <a href="https://medium.com/@maxime.labonne">Medium / Labonne</a></p><p>[59] Huawei Cloud Saudi Arabia: launched September 2023, three availability zones, $400M five-year commitment. Revenue grew 10x in the first year. 
1,000+ customers, including government agencies, STC, and Zain KSA. Note: revenue growth figure is vendor-claimed. <a href="https://www.huaweicloud.com/intl/en-us/news/20230904160557526.html">Huawei Cloud Saudi</a></p><p>[60] Huawei Cloud compliance claim per LEAP 2025 conference disclosure. For CLOUD Act exposure analysis, see Julien Simon, &#8220;Access, Disable, Destroy,&#8221; The AI Realist. <a href="https://onegiantleap.com/">LEAP 2025</a></p><p>[61] Alibaba Cloud: Mexico data center launched in February 2025; Brazil announced in September 2025; second Dubai facility announced in October 2025. Saudi Arabia JV with STC (Saudi Cloud Computing Company). DCD; Alibaba Cloud blog. <a href="https://www.alibabacloud.com/blog">Alibaba Cloud blog</a></p><p>[62] Tencent Cloud: $150M investment in a Saudi Arabia data center, February 2025; a $500M third data center in Indonesia. CNBC (January 2026); EqualOcean. <a href="https://www.cnbc.com/">CNBC</a></p><p>[63] AI Singapore switched Sea-Lion from Meta&#8217;s Llama to Alibaba&#8217;s Qwen in November 2025. TechNode, November 25, 2025; South China Morning Post. <a href="https://technode.com/2025/11/25/singapores-national-ai-program-drops-meta-model-and-switches-to-alibabas-qwen/">TechNode</a></p><p>[64] Uganda launched Sunflower LLM, built on Alibaba&#8217;s Qwen architecture, in October 2025. Supports Luganda and local dialects for agricultural advice and educational content. Eurasia Review, December 2025; Mail &amp; Guardian, January 2026. <a href="https://ict.go.ug/media/news/uganda-launches-an-artificial-intelligence-ai-language-model">Uganda ICT Ministry</a></p><p>[65] Malaysia NurAI: described as the world&#8217;s first Sharia-aligned LLM, built on DeepSeek&#8217;s foundation, refined by the China-ASEAN AI Laboratory. DeepSeek sent engineers to assist. Targets 340 million people across Malaysia, Indonesia, and Brunei. Bloomberg; China Media Project (Lingua Sinica), October 2025. 
<a href="https://www.zetrix.com/zetrix-ai-launches-worlds-first-shariah-aligned-large-language-model-nurai/">Zetrix NurAI</a></p><p>[66] OCBC Bank AI tools: over 30 internal tools powered by Qwen and DeepSeek across operations in Singapore, Hong Kong, Malaysia, Indonesia, Thailand, and Vietnam. Wall Street Journal, reported by Asia Tech Lens, August 2025. <a href="https://www.wsj.com/">WSJ</a></p><p>[67] Microsoft AI Economy Institute, &#8220;Global AI Adoption 2025&#8221; (H2 2025 update). DeepSeek usage in Africa is estimated at 2-4x higher than in other regions, aided by Huawei&#8217;s distribution infrastructure and the absence of cost barriers. <a href="https://www.microsoft.com/en-us/research/project/ai-economy/">Microsoft AI Economy Institute</a></p><p>[68] HMN Tech (formerly Huawei Marine): 100,000+ km of submarine cable across 70+ countries. Costs are 20-30% below those of Western rivals. Atlantic Council; DCD. <a href="https://www.atlanticcouncil.org/">Atlantic Council</a></p><p>[69] PEACE Cable: funded by Hengtong Group, built by HMN Tech. Approximately 15,000-25,000 km, 13 landing points across 12 countries, up to 192 Tbps capacity. Submarine Networks; DCD. <a href="https://www.submarinenetworks.com/">Submarine Networks</a></p><p>[70] Digital Silk Road investment 2017-2023: ~$22 billion. ORF Online; MERICS. <a href="https://merics.org/">MERICS</a></p><p>[71] Julien Simon, &#8220;Access, Disable, Destroy,&#8221; The AI Realist. The three-switch model: chips (export controls), cloud (CLOUD Act/service suspension), models (API revocation/geo-blocking). <a href="https://www.airealist.ai/">The AI Realist</a></p><p>[72] NurAI content control finding: Researchers at China Media Project found that prompts with minor grammatical errors cause the model to revert to DeepSeek&#8217;s default training data, producing Chinese government-aligned responses on Taiwan. Lingua Sinica / CMP, October 2025. 
<a href="https://chinamediaproject.org/">China Media Project</a></p><p>[73] For analysis of how fine-tuning modifies but does not excise base model constraints, see the Southeast Asia/Taiwan piece in the Country AI series. <a href="https://www.airealist.ai/">The AI Realist</a></p><p>[74] Huawei Cloud: 34 regions, 101 availability zones globally. Latin American regions include S&#227;o Paulo, Santiago, Mexico City, Buenos Aires, and Lima. Huawei Cloud global infrastructure page. <a href="https://www.huaweicloud.com/intl/en-us/global-infrastructure/">Huawei Cloud infrastructure</a></p><p>[75] Macron at the Summit on European Digital Sovereignty, Berlin, November 18, 2025: &#8220;Europe doesn&#8217;t want to be the client of the big entrepreneurs or the big solutions being provided either from the US or from China... a refusal of being a vassal.&#8221; &#8220;You cannot dedicate the strength of your economy to the Magnificent Seven.&#8221; France 24; Global Times; &#201;lys&#233;e.fr. <a href="https://www.elysee.fr/en/emmanuel-macron/2025/11/18/summit-on-european-digital-sovereignty">&#201;lys&#233;e.fr</a></p><p>[76] Macron's special address, World Economic Forum, Davos, January 20, 2026: &#8220;Competition from the United States of America through trade agreements that undermine our export interests... combined with an endless accumulation of new tariffs that are fundamentally unacceptable &#8212; even more so when they are used as leverage against territorial sovereignty.&#8221; Full transcript at weforum.org. <a href="https://www.elysee.fr/en/emmanuel-macron/2026/01/20/world-economic-forum-2026-in-davos">WEF transcript</a></p><p>[77] Macron at WEF Davos, January 20, 2026: &#8220;China is welcome, but what we need is more Chinese foreign direct investments in Europe, in some key sectors, to contribute to our growth, to transfer some technologies, and not just to export towards Europe.&#8221; Full transcript at weforum.org. 
See also Financial Times, &#8220;We urgently need to rebalance EU-China relations,&#8221; Macron op-ed post-Beijing visit. <a href="https://www.elysee.fr/en/emmanuel-macron/2026/01/20/world-economic-forum-2026-in-davos">WEF transcript</a></p><p>[78] Macron at Munich Security Conference, February 2026. Urged the EU to deploy an &#8220;anti-coercion mechanism&#8221; while prioritizing &#8220;Made in Europe&#8221; technology. IntInsight, February 2026. <a href="https://intinsight.com/">IntInsight</a></p><p>[79] Foreign Policy, &#8220;Europe&#8217;s Digital Sovereignty Means Decoupling From U.S. Technology,&#8221; February 27, 2026. <a href="https://foreignpolicy.com/2026/02/27/europe-digital-sovereignty-decoupling-us-technology/">Foreign Policy</a></p><p>[80] The International Criminal Court replaced Microsoft with OpenDesk, an open-source suite from the German Center for Digital Sovereignty (ZenDiS), in November 2025. The decision followed an incident in which chief prosecutor Karim Khan was temporarily locked out of his Outlook account after US sanctions. IEEE Spectrum (March 2026); The Register (December 2025). Microsoft has denied cutting services to the ICC as a whole. <a href="https://spectrum.ieee.org/">IEEE Spectrum</a></p><p>[81] Kyndryl announced the acquisition of Solvinity, a Dutch managed cloud provider, in November 2025. Clients, including the municipality of Amsterdam and the Dutch Ministry of Justice and Security, had chosen Solvinity specifically to reduce CLOUD Act exposure. The Register, December 2025. <a href="https://www.kyndryl.com/us/en/about-us/news/2025/11/kyndryl-purchase-cloud-services-solvinity">Kyndryl</a></p><p>[82] Microsoft France director of public and legal affairs Anton Carniaux, in a French Senate hearing on June 10, 2025, could not guarantee customer data would never be transferred to US authorities under the CLOUD Act. 
Exact quote: &#8220;Non, je ne peux pas le garantir, mais, encore une fois, cela ne s&#8217;est encore jamais produit.&#8221; (&#8220;No, I cannot guarantee it, but, once again, it has never yet happened.&#8221;) Computerworld, December 2025; French Senate website. <a href="https://www.computerworld.com/">Computerworld</a></p><p>[83] Gartner forecast: worldwide sovereign cloud spending to hit $80 billion in 2026, up 35.6% from 2025. Europe (83% growth) leads major regions. European sovereign cloud spending is projected to triple from 2025 to 2027. The Register, February 2026. <a href="https://www.gartner.com/en/newsroom/press-releases/2026-02-09-gartner-says-worldwide-sovereign-cloud-iaas-spending-will-total-us-dollars-80-billion-in-2026">Gartner</a></p><p>[84] Stanford HAI figure (fn. 7). The 63% represents all new fine-tuned or derivative models on Hugging Face globally, not just European-specific ones.</p><p>[85] Italy banned government use of DeepSeek. Denmark, the Czech Republic, and Belgium followed with restrictions on official devices. Euronews, February 2026. The 2026 International Security Report of the Estonian Foreign Intelligence Service warned that DeepSeek &#8220;conceals key information and inserts Chinese propaganda.&#8221; CEPA, February 2026. <a href="https://www.euronews.com/">Euronews</a></p><p>[86] Policy Genome audit of Chinese AI models found DeepSeek produced largely accurate English and Ukrainian responses on the Ukraine war, but endorsed Kremlin talking points in Russian-language responses. CEPA, February 2026; Swedish Psychological Defense Agency-funded study. <a href="https://cepa.org/">CEPA</a></p><p>[87] The EU&#8217;s proposed Cloud Act for Digital Autonomy (CADA), expected in Q1 2026, includes &#8220;effective control&#8221; concepts that could extend to model-provenance requirements. If CADA&#8217;s eligibility criteria address model origin &#8212; not just cloud provider nationality &#8212; the open-weight Chinese channel the piece describes could narrow. 
The legislation is pending; its final scope is not yet determined.</p><p>[88] OpenAI released its first open-weight models (gpt-oss-120b and gpt-oss-20b) in August 2025 &#8212; almost six years after closing its earlier open approach. Stanford HAI / DigiChina brief, December 2025. <a href="https://hai.stanford.edu/">Stanford HAI</a></p><p>[89] The US International Development Finance Corporation, EXIM Bank, and Japan&#8217;s JBIC have infrastructure financing instruments applicable to digital infrastructure. Deployment at scale in Global South digital infrastructure has not materialized at the pace of Chinese Digital Silk Road investment ($22B, 2017-2023).</p><p>[90] The Supermicro scheme (fn. 20-21) operated for approximately two years before the first arrest. The indictment describes compliance failures, including the use of staged dummy servers, friendly auditors, and continued operations after learning of other smuggling arrests. The enforcement architecture&#8217;s prosecution-first model creates a structural lag of at least one GPU generation.</p><p>[91] White House AI Czar David Sacks: estimated China&#8217;s AI sector lags by 3-6 months. Per reporting and congressional testimony, 2025. <a href="https://selectcommitteeontheccp.house.gov/">House CCP report</a></p><p>[92] House Select Committee on the CCP, &#8220;DeepSeek Unmasked&#8221; report. Unnamed senior US AI executive: &#8220;Some in the industry have claimed that the U.S. holds an 18-month AI lead, but that obfuscates reality &#8212; it&#8217;s closer to three months.&#8221; <a href="https://selectcommitteeontheccp.house.gov/">House CCP report</a></p><p>[93] IDC data reported by Reuters, April 1, 2026. Total AI accelerator card shipments in China: approximately 4 million units in 2025. Chinese vendors: 1.65 million cards (41%). Nvidia: 2.2 million cards (55%). AMD: 160,000 cards (4%). 
Among Chinese vendors: Huawei 812,000 (roughly half of all domestic shipments), Alibaba T-Head 265,000, Cambricon and Baidu Kunlunxin ~116,000 each. Note: Nvidia's 55% market share represents a significant decline from its pre-export-control dominance but confirms it remains the single largest vendor even in the restricted market. <a href="https://finance.yahoo.com/sectors/technology/articles/chinese-chipmakers-claim-nearly-half-091441324.html">Reuters/Yahoo Finance</a></p>]]></content:encoded></item><item><title><![CDATA[The Half-Life of a Press Release]]></title><description><![CDATA[Small Modular Reactors, Hyperscalers, and the Distance Between a Press Release and a Megawatt]]></description><link>https://www.airealist.ai/p/the-half-life-of-a-press-release</link><guid isPermaLink="false">https://www.airealist.ai/p/the-half-life-of-a-press-release</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Tue, 31 Mar 2026 07:10:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!252C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!252C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!252C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 424w, 
https://substackcdn.com/image/fetch/$s_!252C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!252C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!252C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!252C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9074483,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192224006?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!252C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!252C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!252C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!252C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fc6a1a-2774-4c96-a1c2-022ddf06712b_2816x1584.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In September 2024, Larry Ellison described Oracle&#8217;s future to investors: a data center powered by three small modular reactors, over a gigawatt of dedicated nuclear capacity feeding nothing but Oracle&#8217;s servers.[1] No reactor vendor was named. No site was identified. No filing appeared with the Nuclear Regulatory Commission. The stock moved anyway.</p><p>Between late 2024 and early 2026, every major hyperscaler announced its own nuclear bet. Google signed a deal with Kairos Power for 500 megawatts of reactors that have never been certified.[2] Amazon backed X-energy for twelve modules totaling 960 megawatts that require a fuel only Russia can commercially produce.[3] Meta signed deals across four partners for up to 7.7 gigawatts &#8212; but most of the near-term capacity comes from Constellation and Vistra reactors that already exist, while the new-build commitments from TerraPower and Oklo depend on reactors that do not.[4] Total announced SMR capacity for data centers: roughly six gigawatts. Total delivered: zero.</p><p>This would be unremarkable if the promise were merely optimistic. Technology bets are often early. But the SMR-for-data-centers thesis rests on a specific claim that is structurally broken: that a compact reactor can be co-located with a hyperscale facility, generating power independently of the grid, on a timeline competitive with the data center buildout it is supposed to serve. That claim fails on six dimensions. 
And the most instructive evidence comes not from the United States, which has the weakest nuclear construction record among major nuclear nations, but from France, South Korea, and China &#8212; the countries that should be able to do this if anyone can.</p><p>Every announced SMR-for-data-centers deal can be evaluated with five questions. What is the reactor&#8217;s certification status? What fuel does it require? How far does it sit from the data center it claims to power? Who controls the supply chain from mine to megawatt? And who will operate it when it needs to stop? These five questions separate credible commitments from announcement theater &#8212; and they reveal that the credible nuclear-for-data-centers play is the opposite of the SMR pitch.</p><h2>The power crisis is real</h2><p>The demand is not invented. Global data center electricity consumption is expected to more than double by 2030, driven by AI training and inference workloads that consume orders of magnitude more power than conventional computing.[5] In the United States, PJM Interconnection &#8212; the grid operator serving the country&#8217;s densest data center corridor from Northern Virginia through New Jersey &#8212; failed to clear enough generation in its December 2025 capacity auction. The shortfall was 6,623 megawatts, the first time the entire region failed to meet its reliability target.[6] Data center load accounted for roughly forty-five percent of the capacity costs from that auction and two previous ones, a cumulative burden of $21.3 billion passed to all ratepayers.[7]</p><p>Hyperscalers need baseload power: electricity available twenty-four hours a day, every day of the year, regardless of weather or season. Solar and wind, for all their cost advantages, cannot deliver this alone. Natural gas can, but contradicts the carbon-neutral commitments that every major cloud provider has made. 
Nuclear &#8212; with its ninety-two percent capacity factor and zero direct carbon emissions &#8212; is the obvious theoretical answer.[8]</p><p>The problem is not the goal. The problem is the specific way the industry has chosen to pursue it.</p><h2>Who can actually build nuclear</h2><p>Who has ever built nuclear power successfully &#8212; and do those capabilities transfer to SMRs?</p><p><strong>France</strong> operates 57 reactors that generate roughly two-thirds of its electricity, a larger share than any other nation on Earth.[9] It maintains a complete nuclear fuel cycle from enrichment to reprocessing at Orano&#8217;s La Hague facility, and its nuclear sector employs approximately 247,000 people across 2,000 companies.[10] France is the deepest Western repository of nuclear expertise. And France cannot build a new reactor on schedule. Flamanville 3, the European Pressurized Reactor that was supposed to demonstrate French nuclear prowess, ran twelve years late and cost roughly four times its original &#8364;3.3 billion construction budget.[11] The SMR picture is worse: EDF&#8217;s Nuward subsidiary restructured its design in July 2024, withdrew from the United Kingdom&#8217;s SMR competition, and relaunched in January 2025 with a simplified 400-megawatt concept. It now aims to finalize the conceptual design by mid-2026, with a first-of-a-kind reactor in France sometime in the 2030s.[12] The country with the deepest nuclear expertise in the Western world has not finalized the design of its SMR, let alone begun construction.</p><p>On March 10, 2026, European Commission President Ursula von der Leyen stood at the Nuclear Energy Summit in Boulogne-Billancourt and called Europe&#8217;s turn away from nuclear a &#8220;strategic mistake,&#8221; announcing a &#8364;200 million guarantee to support private investment in innovative nuclear technologies.[13] Emmanuel Macron called on banks and venture capital funds to invest in civilian nuclear. 
But &#8364;200 million is roughly five percent of the estimated cost of a single 300-megawatt SMR at Darlington &#8212; a policy signal, not a construction program.[14]</p><p><strong>South Korea</strong> built the Barakah nuclear power station in the United Arab Emirates &#8212; four APR-1400 reactors totaling 5,600 megawatts &#8212; completing the project at a final cost reported between $25 and $32 billion, or roughly $4,500 to $5,700 per kilowatt.[15] The project ran about four years behind its original schedule and significantly over its initial $20 billion contract price.[16] By any Western comparison, this is still a success story: Barakah cost roughly one-third per kilowatt of what Vogtle cost in Georgia, and each successive unit was delivered faster than the last &#8212; a genuine learning curve.[17] But the APR-1400 is a 1,400-megawatt pressurized water reactor with decades of design heritage. South Korea is developing its own SMR concept, the i-SMR, but it has not yet begun construction. The discipline that built Barakah &#8212; standardized large reactor designs with deep production experience &#8212; is precisely the opposite of first-of-a-kind modular construction.</p><p><strong>China</strong> is the only country operating small modular reactors. The HTR-PM at Shidao Bay reached commercial operation in December 2023, producing 210 megawatts.[18] The Linglong One is expected to reach commercial operation in the first half of 2026.[19] China has thirty-two reactors under construction and a fully vertically integrated supply chain from uranium enrichment to fuel fabrication to heavy forging.[20] But China achieved this through state capital, state direction, and a state-controlled supply chain that no Western hyperscaler can access. The technology works. The governance model does not export.</p><p>The paradox: the countries that can build nuclear cannot build SMRs for data centers. 
The country that can build SMRs for data centers builds them under a system no Western hyperscaler can replicate. The United States, which has the hyperscaler demand, hasn&#8217;t completed a reactor on time in decades &#8212; Vogtle Units 3 and 4 were seven years late and came in at two-and-a-half times the original budget.[21]</p><h2>Six dimensions of failure</h2><h3>Geography</h3><p>Data center demand is concentrated in places where reactors cannot go. Northern Virginia hosts the world&#8217;s densest data center corridor &#8212; and the population density that makes reactor siting politically impossible.[22] Dublin is among Europe&#8217;s largest data center markets and sits in a country that has banned nuclear power by statute.[23] Oregon, where Amazon, Google, Meta, and Apple operate major campuses, has a nuclear moratorium.[24] Frankfurt is the largest data center market in continental Europe and is located in a country that shut down its last three reactors in April 2023.[25] And where siting is theoretically possible, reactors and data centers both require cooling water &#8212; co-locating them doubles the demand at a site where drought or permitting may already be a constraint.[26]</p><p>Every announced SMR data center project is located at an existing nuclear site, eighty to one hundred eighty miles from the data centers it claims to serve.[27] Amazon&#8217;s X-energy deal sites reactors near Columbia Generating Station in Richland, Washington &#8212; a three-hour drive from Oregon&#8217;s data center corridor.[28] The reactor goes where nuclear infrastructure already exists. The data center goes where fiber, customers, and interconnection are. The co-location promise dissolves into a grid-connected Power Purchase Agreement (PPA) at a distance &#8212; and most hyperscalers have quietly structured their deals exactly this way, with the co-location narrative confined to earnings calls rather than contracts. 
That sharpens the critique: if these are grid PPAs for reactors that do not yet exist, they compete directly with Three Mile Island and Susquehanna &#8212; and lose, because those reactors already generate power.</p><h3>Timeline</h3><p>A hyperscaler builds a data center in eighteen to thirty-six months. No Western SMR has yet generated any commercial electricity. The furthest-along project, Ontario Power Generation&#8217;s BWRX-300 at Darlington, targets grid connection in 2030 after receiving its construction license in April 2025, and it has already slipped roughly two years from its original schedule.[29] TerraPower received a construction permit for its Natrium reactor on March 4, 2026 &#8212; the first for a commercial non-light-water reactor in over forty years &#8212; but still needs a separate operating license, expects to begin pouring nuclear concrete in late 2026 or 2027, and targets commercial operation no earlier than 2030.[30][31]</p><p>The international construction record reinforces the pattern. Flamanville: twelve years late. Hinkley Point C: roughly double the original budget, targeting 2030.[32] Olkiluoto 3: eighteen years to completion at nearly four times the original cost.[33] SMR vendors argue that factory modular construction &#8212; assembling reactor components in controlled facilities rather than building on-site &#8212; will break this pattern, and the argument has architectural merit: the BWRX-300, for instance, is designed specifically for factory fabrication of the reactor pressure vessel. But the escalation pattern is not inherited from those megaprojects &#8212; it emerges from the SMR projects&#8217; own histories: NuScale, Darlington, and TerraPower have each escalated before a single module was manufactured. 
Regulatory reform under the Trump administration&#8217;s nuclear executive orders may compress future NRC timelines, but it does not address the other five dimensions &#8212; cost, fuel, supply chain, workforce, or operational complexity.[84]</p><h3>Cost</h3><p>Every first-of-a-kind SMR project has escalated. NuScale&#8217;s UAMPS project &#8212; based on the only NRC-certified SMR design &#8212; was cancelled in November 2023 after costs rose from $5.3 billion to $9.3 billion for 462 megawatts, or roughly $20,000 per kilowatt.[34] OPG&#8217;s Darlington BWRX-300 received its final investment decision in May 2025 at approximately $15,000 per kilowatt for the first unit &#8212; about six times GE Hitachi&#8217;s original target of $2,333 per kilowatt, and that is before a single watt of electricity has been generated.[35] TerraPower&#8217;s Natrium is officially estimated at $4 billion for 345 megawatts, or roughly $11,600 per kilowatt, though nuclear industry critics project the all-in cost could reach $29,000 per kilowatt based on historical first-of-a-kind escalation patterns.[36] These figures include costs unique to first units &#8212; test facilities, fuel qualification programs, and first-time regulatory proceedings &#8212; that subsequent units would not bear.[37]</p><p>For comparison: new utility-scale solar costs $38 per megawatt-hour and onshore wind $37 &#8212; roughly one-quarter the floor price of new nuclear at $141.[38] Gas combined-cycle plants run $48 to $109, undercutting nuclear across its entire range.[39] Even after adding transmission and firming costs that narrow the gap for intermittent sources, new nuclear remains the most expensive new-build option.[40] Dedicated nuclear power was supposed to serve as a hedge against grid price volatility. At current first-of-a-kind economics, nuclear power is more expensive than the grid it claims to displace. 
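</p><p>The first-of-a-kind figures above are straightforward division from the cited totals, and worth making explicit. A minimal sanity-check sketch (all dollar and megawatt inputs are the figures cited above; the US$4.5 billion Darlington first-unit total is the US-dollar equivalent given in the notes; the helper function itself is illustrative arithmetic, not vendor data):</p>

```python
# First-of-a-kind cost per kilowatt from total project cost and
# nameplate capacity. Inputs are the article's cited figures; the
# function is illustrative arithmetic, not vendor data.
def cost_per_kw(total_usd: float, capacity_mw: float) -> float:
    return total_usd / (capacity_mw * 1_000)

nuscale    = cost_per_kw(9.3e9, 462)   # cancelled UAMPS project
darlington = cost_per_kw(4.5e9, 300)   # first BWRX-300 unit, US$ equivalent
natrium    = cost_per_kw(4.0e9, 345)   # TerraPower's official estimate

print(round(nuscale))     # ~20,130 $/kW
print(round(darlington))  # 15,000 $/kW, about 6x the $2,333 target
print(round(natrium))     # ~11,594 $/kW
```

<p>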
If later units achieve the $4,000 to $6,000 per kilowatt that vendors project for nth-of-a-kind production, the economics change &#8212; but that projection requires completing the first units on schedule and on budget &#8212; a record that does not yet exist.</p><h3>Fuel sovereignty</h3><p>Most of the advanced SMR designs that hyperscalers have backed &#8212; X-energy&#8217;s Xe-100, TerraPower&#8217;s Natrium, Kairos Power&#8217;s fluoride-salt reactor, and Oklo&#8217;s Aurora &#8212; require high-assay, low-enriched uranium, a fuel enriched to between 5 and 20 percent, compared with the 3 to 5 percent used in conventional reactors.[41] Russia&#8217;s TENEX, a Rosatom subsidiary, is the only commercial-scale producer.[42] Western production is embryonic: Centrus Energy&#8217;s demonstration cascade in Piketon, Ohio, produced approximately 900 kilograms of HALEU in 2025, orders of magnitude below the tonnage required to fuel a fleet of advanced reactors.[43] Even Orano, which operates the most advanced enrichment infrastructure in Western Europe, does not expect to produce HALEU at a commercial scale before the late 2020s.[44] The DOE&#8217;s $2.7 billion enrichment investment may close the supply gap by the early 2030s &#8212; roughly when the first HALEU-fueled reactors would need fuel &#8212; but the fabrication chain that converts enriched uranium into reactor-ready fuel assemblies does not yet exist at scale for any advanced fuel type.</p><p>The designs that avoid this dependency &#8212; the BWRX-300, the Rolls-Royce SMR, Westinghouse&#8217;s AP300 &#8212; use conventional low-enriched uranium with diversified supply. But those are not the reactors the hyperscalers chose for their headline deals. The most ambitious announcements require a fuel that only an adversary produces at scale.</p><h3>Supply chain: mine to megawatt</h3><p>The nuclear fuel chain has six links, and every one has a chokepoint owner. 
Kazakhstan mines roughly forty percent of global uranium, with a significant portion converted or enriched at Russian facilities before reaching Western customers.[45] Conversion capacity is concentrated across four countries. Enrichment: Rosatom controls forty to forty-six percent of global capacity; together with China, that exceeds sixty percent &#8212; a concentration tighter than OPEC&#8217;s share of oil production.[46] Fuel fabrication for advanced reactor designs is vendor-locked and doesn&#8217;t exist at a commercial scale outside China for the fuel types the hyperscaler-backed SMRs require.[47] Heavy forgings for reactor pressure vessels are produced at fewer than a dozen facilities worldwide, with no American company currently manufacturing large commercial reactor components at scale.[48][49]</p><h3>Operational reality</h3><p>The workforce does not exist either. The United States employs roughly 68,000 nuclear workers, and sixty-three percent of employers in nuclear manufacturing report that hiring is &#8220;very difficult.&#8221;[50] Approximately forty percent of the current workforce is eligible to retire within a decade.[51] The International Atomic Energy Agency projects that over four million nuclear professionals will be needed globally by 2050 to meet planned expansion targets.[52] Each new SMR design requires NRC-licensed operators trained on technology-specific simulators &#8212; simulators that, for the most part, do not yet exist.</p><p>Conventional reactors refuel every 18 to 24 months, requiring roughly 30 to 38 days of planned downtime.[53] During these outages, the data center needs grid backup &#8212; the same grid the reactor promised to eliminate. The grid remains essential. 
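</p><p>The refueling numbers alone quantify why. A rough sketch of planned unavailability, taking the midpoints of the cited ranges (a 34-day outage every 21 months; the arithmetic is illustrative, not fleet data):</p>

```python
# Planned unavailability from refueling alone, using midpoints of the
# cited 18-24 month cycle and 30-38 day outage ranges. Illustrative only.
CYCLE_MONTHS = 21
OUTAGE_DAYS = 34

cycle_days = CYCLE_MONTHS * 365.25 / 12              # ~639 days between refuelings
unavailability = OUTAGE_DAYS / (cycle_days + OUTAGE_DAYS)

print(f"{unavailability:.1%}")                       # ~5% of calendar time offline, planned
print(f"{unavailability * 365.25:.0f} days/yr")      # ~18 days per year on grid backup
```

<p>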
Advanced SMR designs claim refueling intervals of 3 to 7 years, but no design has demonstrated this commercially.[54] Unplanned outages account for two to three percent of reactor time, against data center availability targets of 99.995 percent &#8212; a gap of roughly two orders of magnitude that requires exactly the redundancy the co-location thesis was supposed to remove.[55]</p><p>The reactor adds complexity rather than resolving it. Standard hyperscale data centers use N+1 or 2N redundancy &#8212; every critical power component has at least one backup, with grid feeds through dual utility substations and diesel generators as tertiary protection. A co-located SMR replaces one source in this chain but does not eliminate any of the others. The data center still needs grid connection for refueling backup, still needs uninterruptible power supply systems for power quality during switchover, and still needs backup generators for dual-failure scenarios &#8212; indeed, Tier III and Tier IV data center certifications require on-site engine-generator backup regardless of primary power source.[56] The reactor is additive complexity, not a replacement.[57]</p><p>The grid stress problem is symmetrical. In July 2024, dozens of data centers in Virginia simultaneously dropped from the PJM grid during a transmission failure, losing over a gigawatt in seconds.[58] The reverse scenario &#8212; hundreds of megawatts of data center load suddenly demanding grid power because a co-located reactor went offline &#8212; also places comparable stress on the grid. Modern grid codes are increasingly requiring data centers to behave as virtual power plants with controlled ramp rates. 
An SMR-powered data center must comply with both NRC reactor regulations and evolving grid participation requirements &#8212; two regulatory regimes designed independently, with no precedent for their intersection on a single campus.[59] And reactor builds compete for the same constrained trades &#8212; electricians, welders, heavy-equipment operators &#8212; as the data center buildouts they claim to power.[60]</p><h2>The SMR Delivery Test</h2><p>Five questions. Apply them to any announced SMR-data center deal.</p><p><strong>What is the reactor?</strong> If NRC-certified with construction experience: five to eight years to deployment. If uncertified: eight to fifteen years. Google&#8217;s Kairos and Amazon&#8217;s X-energy are uncertified. TerraPower received its construction permit in March 2026. OPG&#8217;s BWRX-300 is under construction but hasn&#8217;t generated a watt of power.</p><p><strong>What is the fuel?</strong> If conventional low-enriched uranium: a diversified supply exists. If HALEU: supply is controlled by Russia, with Western production years away. Most hyperscaler-backed designs require HALEU.</p><p><strong>What is the distance?</strong> If truly co-located with a data center: zero announced projects achieve this. If grid-connected at 50 to 200 miles: it is a PPA, not co-location. If financial only: it is a carbon accounting exercise.</p><p><strong>Who controls the supply chain?</strong> Map the six links from mine to megawatt. Count how many route through a single actor or adversary. No hyperscaler controls any link.</p><p><strong>Who runs it, and what happens when it stops?</strong> Does the workforce exist? Is there a refueling plan? What is the backup power source during outages? 
In every case, the answer loops back to the grid.</p><p>Applied to the announced deals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g_U6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g_U6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 424w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 848w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 1272w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g_U6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png" width="724.4166870117188" height="263.69563469519983" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:530,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724.4166870117188,&quot;bytes&quot;:115496,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192224006?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g_U6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 424w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 848w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 1272w, https://substackcdn.com/image/fetch/$s_!g_U6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4576d6-dfb4-4911-b2c5-c9941163b2ff_2078x756.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The contrast in the last row is the point. The deal that scores highest on the delivery test is the deal that involves no SMR. 
It is a fifty-year-old reactor, grid-connected, staffed by an existing workforce, burning conventional fuel &#8212; the inverse of every SMR announcement above it in the table.[61] </p><p>The table is not exhaustive &#8212; Standard Power, Dominion Energy, and Equinix have signed their own SMR deals, none of which would score above one &#8212; but the pattern holds across the full landscape.[62] On the eve of publication, X-energy filed for a Nasdaq IPO, disclosing $390 million in losses on $94 million in revenue &#8212; a company that has never generated a commercial watt, going public to fund reactors that remain years from construction.[3]</p><h2>What actually works</h2><p>The deals that deliver real power look nothing like the SMR pitch. Microsoft&#8217;s twenty-year agreement with Constellation to restart Three Mile Island&#8217;s Unit 1 will add 837 megawatts of proven capacity, at an estimated cost of $1.6 billion, with a $1 billion DOE loan already closed &#8212; targeting 2027, at a contract price reportedly above $110 per megawatt-hour.[63] That is a premium over wholesale rates, but a fraction of what any new-build reactor would cost. 
</p><p>Meta signed a twenty-year PPA with Constellation for Clinton Clean Energy Center&#8217;s 1,121 megawatts in Illinois &#8212; a plant that was headed for closure until the deal replaced its expiring state subsidies with private offtake.[64] Meta also signed with Vistra for nuclear power from operating plants in Ohio and Pennsylvania, plus uprates at those plants and at Beaver Valley, adding a further 433 megawatts in the early 2030s.[65] </p><p>Amazon&#8217;s PPA with Talen Energy for Susquehanna&#8217;s 1,920 megawatts is delivering power now.[66]  Google signed with NextEra to restart Duane Arnold, a 615-megawatt reactor in Iowa, targeting early 2029.[67] </p><p>Combined, these conventional nuclear deals represent over seven gigawatts &#8212; dwarfing the announced SMR total &#8212; with delivery timelines of 2026 to 2029. But the runway is finite: the United States has roughly 95 gigawatts of operating nuclear capacity, and the conventional PPA pipeline will exhaust itself well before data center demand does.</p><p>The credible nuclear-for-data-centers play is the one that requires no technological breakthrough, no new fuel supply, no first-of-a-kind construction, and no workforce that doesn&#8217;t already exist. It is also the one that no one writes press releases about, because &#8220;we signed a power purchase agreement with a fifty-year-old reactor&#8221; does not move a stock price.</p><p>Beyond nuclear, the technologies that actually compete on the SMR value proposition &#8212; firm, carbon-free, baseload &#8212; are arriving faster. Fervo Energy&#8217;s Cape Station in Utah will deliver its first hundred megawatts of enhanced geothermal to the grid in 2026, with an additional four hundred megawatts by 2028.[68] Cape Station is itself a first-of-a-kind project at this scale &#8212; the difference is that a geothermal well that underperforms can be redrilled in months, while a reactor that underperforms enters a years-long correction cycle. 
</p><p>Chevron, GE Vernova, and Engine No. 1 are building natural gas &#8220;power foundries&#8221; delivering four gigawatts of co-located behind-the-meter capacity by the end of 2027 &#8212; technology that exists, at a scale that matters, on a timeline that matches the demand.[69] Gas foundries deliver on timeline and cost but face their own headwinds: air-quality permits and state decarbonization mandates may constrain expansion in markets where data center demand is highest. The tell is in Meta&#8217;s own portfolio: the same company that announced 6.6 gigawatts of nuclear deals in January 2026 is simultaneously building a $3.2 billion, two-gigawatt gas plant in Louisiana and secured approval for a 700-megawatt gas plant in Ohio &#8212; more capital committed to gas than to any single nuclear partner.[70] Battery storage costs are plummeting, with Lazard&#8217;s 2025 analysis showing sharp year-over-year declines that bring hybrid solar-plus-storage into the competitive range for firmed power.[71]</p><h2>What it costs</h2><p>None of this makes cloud cheaper. 
Even before the AI-driven demand surge, Microsoft disclosed $800 million in unexpected energy costs in fiscal 2023, compressing cloud margins.[72] Colocation rates in Northern Virginia have surged, with premiums breaching $215 per kilowatt per month.[73] The cost chain is direct: energy represents thirty to forty percent of data center operating expenses, and those costs pass through to enterprise customers via instance pricing, storage fees, and capacity charges.[74]</p><p>PJM capacity costs are recovered from all ratepayers &#8212; including businesses that are also cloud customers &#8212; creating double exposure: higher cloud bills from hyperscaler energy pass-through and higher electricity bills from capacity charges driven by the same data center load.[75] Virginia has already approved data-center-specific rate classes; other states are following.[76]</p><p>The capital allocation question sharpens the problem. The five largest hyperscalers plan a combined capital expenditure of $600 to $690 billion in 2026, consuming nearly 100% of their operating cash flows, compared with a ten-year average of 40%.[77] Amazon faces a projected negative free cash flow of $17 to $28 billion. Alphabet&#8217;s free cash flow is expected to fall roughly ninety percent. The hyperscalers have collectively issued over $120 billion in bonds in 2025 alone &#8212; Alphabet quadrupled its long-term debt in a single year to $46.5 billion, including a hundred-year sterling tranche; Amazon&#8217;s total debt now exceeds $100 billion.[78] As cheap options against a $600 billion annual capex budget, the SMR commitments make financial sense; as near-term power procurement strategies, they contribute nothing.</p><p>Adding $4 to $15 billion nuclear reactors on top of that capex &#8212; for power that arrives five to ten years after the data center it was built to feed &#8212; is not a rounding error. 
It is a second capital-intensive construction program running in parallel with the first, competing for the same engineering talent, balance sheet capacity, and investor patience.</p><p>At current first-of-a-kind economics, SMR power would be the most expensive electricity a data center has ever bought. And for any deal that promises co-location, there is a temporal mismatch: a nuclear reactor has a 60- to 80-year operating life, while the AI workloads it serves operate on 3- to 5-year technology cycles.[79] A reactor commissioned in 2035 to power GPU clusters will still be generating electricity in 2095. A grid-connected reactor avoids this trap &#8212; the electrons go to the grid, and the grid doesn&#8217;t care what&#8217;s plugged in. But if the answer is grid PPAs, the hyperscalers already have them.</p><h2>Announce, delay, bridge</h2><p>The pattern has a name. In the 1990s, the telecommunications industry promised fiber to every home. Billions were committed. The physics worked, the technology existed, but the deployment economics didn&#8217;t close &#8212; the last mile was too expensive to build one home at a time. What bridged the gap was wireless: 3G, then 4G, now 5G, delivering bandwidth through completely different infrastructure at a fraction of the cost. Fiber eventually arrived decades later, in dense corridors, deployed by companies that looked nothing like the ones that made the original promises.[80]</p><p>The SMR data center thesis follows the same arc. Nuclear physics is real. The technology, in principle, works &#8212; China has proven that. But the economics of deployment do not close: all six dimensions compound. 
What bridges the gap is the less revolutionary infrastructure &#8212; gas turbines, solar farms, battery storage, geothermal wells, and grid-connected PPAs with reactors that already exist &#8212; deployed by companies making purchase orders rather than press releases.</p><p>Four scenarios define what comes next, and they operate on fundamentally different clocks.</p><p>OPG&#8217;s Darlington BWRX-300 is expected to reach commercial operation around 2030 and will demonstrate whether a Western SMR can be built on budget and on schedule. The four-unit project is budgeted at approximately $15 billion for 1,200 megawatts; if the learning curve materializes, the later units should cost substantially less than the $15,000-per-kilowatt first unit.[81] But even in the optimistic case, the learning curve takes a decade &#8212; the fourth unit would likely generate power in the mid-2030s. </p><p>Meanwhile, Fervo will have been operating for nearly a decade. It could reach five hundred megawatts by 2028 and prove that enhanced geothermal can compete directly with nuclear on the baseload value proposition &#8212; firm, twenty-four-hour carbon-free power &#8212; at a fraction of the timeline and cost. If it does, the SMR thesis loses its unique selling point.[82]</p><p>China continues to deploy SMRs under state direction, accumulating operational experience and cost data that the West can observe but cannot replicate &#8212; the same pattern that played out in high-speed rail, solar panel manufacturing, and battery production.[83] The question is not whether China&#8217;s model produces cheaper nuclear. It does. The question is whether any Western government or company can import the model without importing the governance structure that makes it work. So far, the answer is no.</p><p>There is, however, one Western institution moving at something approaching China&#8217;s speed, and it is not a hyperscaler. The U.S. 
Army&#8217;s Janus Program has named nine bases for microreactor deployment, with an executive order mandating an operational reactor by September 2028.[85] The DOE&#8217;s Reactor Pilot Program has selected ten companies across eleven projects racing to reach criticality by July 4, 2026 &#8212; and on February 15, Valar Atomics&#8217; Ward250 was airlifted on a C-17 from California to Utah in the first airlift of a nuclear microreactor in American history.[86] The catch: both programs bypass the NRC entirely, using DOE and Army regulatory authority under the Atomic Energy Act. The only Western institution hitting nuclear timelines competitive with China&#8217;s is the one that removed the commercial regulatory process from the equation. The first real customer for microreactors will not be a hyperscaler. It will be the Pentagon, justifying cost on mission assurance rather than price per megawatt-hour, and deploying through a regulatory pathway no commercial operator can access.[87]</p><p>The SMR co-location promise fails against all six dimensions. Even the most nuclear-capable nations on earth have not accomplished what Oracle&#8217;s earnings call describes as imminent. And the hyperscalers that made the announcements are quietly doing something else entirely: signing purchase agreements with fifty-year-old reactors, contracting for gas turbines, and buying geothermal.</p><p>The case for building new nuclear capacity in the 2030s is strong &#8212; the conventional PPA runway is finite, and nothing else proven matches nuclear&#8217;s combination of baseload reliability, energy density, and zero carbon emissions.[88] But the case for believing these specific announcements will deliver it is not. The thesis breaks if Darlington delivers on budget by 2030, or if any HALEU-fueled commercial reactor achieves grid connection before 2032. Until then, the scoreboard reads: announced six gigawatts, delivered zero.</p><p>The promise was a reactor for every data center. 
The reality is a press release for every earnings call. The stock still moves anyway.</p><div><hr></div><h3>Notes</h3><p>[1] Ellison, Oracle FY25 Q1 earnings call, September 9, 2024. Ellison&#8217;s exact words: &#8220;We&#8217;re in the middle of designing a data center that&#8217;s north of a gigawatt. The location and the power place we&#8217;ve located, they&#8217;ve already got building permits for three nuclear reactors.&#8221; No NRC filing, vendor agreement, or site selection has appeared in any subsequent SEC disclosure. <a href="https://www.fool.com/earnings/call-transcripts/2024/09/09/oracle-orcl-q1-2025-earnings-call-transcript/">Motley Fool</a></p><p>[2] Google-Kairos Power agreement, October 2024. Kairos&#8217;s fluoride-salt-cooled high-temperature reactor is not NRC-certified. A non-power demonstration unit (Hermes) is under construction at Oak Ridge National Laboratory. <a href="https://blog.google/outreach-initiatives/sustainability/google-kairos-power-nuclear-energy/">Google Blog</a></p><p>[3] Amazon-X-energy/Energy Northwest agreement, October 2024. The Xe-100 is a high-temperature gas-cooled reactor requiring HALEU. Twelve 80 MW modules totaling 960 MW, sited near Columbia Generating Station in Richland, WA. On March 20, 2026, X-energy filed an S-1 for a Nasdaq IPO (ticker: XE), disclosing $390 million net loss on $94 million revenue (excluding grants) for 2025 &#8212; losses tripling year-over-year while the Xe-100 remains uncertified and no commercial reactor is under construction. The company claims an 11+ GW development pipeline across US and UK partnerships. 
<a href="https://www.aboutamazon.com/news/sustainability/amazon-nuclear-small-modular-reactor-net-carbon-zero">Amazon</a> ; S-1: <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;company=x-energy&amp;CIK=&amp;type=S-1&amp;dateb=&amp;owner=include&amp;count=10&amp;search_text=&amp;action=getcompany">SEC</a></p><p>[4] Meta-TerraPower: up to eight Natrium reactors totaling approximately 2.8 GW, from 2032. Meta-Oklo: 1.2 GW across multiple Aurora reactors, Pike County, OH, 2030-2034. As of March 17, 2026, Oklo announced a DOE safety design agreement for its Aurora demonstration reactor at Idaho National Laboratory &#8212; a milestone for the demo unit, not the commercial Pike County project. Meta-Vistra: uprates at operating plants in Ohio and Pennsylvania. Meta also signed a 20-year PPA with Constellation for Clinton Clean Energy Center (1,121 MW BWR, Illinois), June 2025. <a href="https://about.fb.com/news/2026/01/meta-nuclear-energy-projects-power-american-ai-leadership/">Meta</a>; <a href="https://www.constellationenergy.com/newsroom/2025/constellation-meta-sign-20-year-deal-for-clean-reliable-nuclear-energy-in-illinois.html">Constellation Energy</a></p><p>[5] IEA, <em>Electricity 2025</em> report, January 2025. Global data center electricity consumption is projected to exceed 945 TWh by 2030, from approximately 415 TWh in 2024. <a href="https://www.iea.org/reports/electricity-2025">IEA</a></p><p>[6] PJM Interconnection, Base Residual Auction results, December 2025. The 6,623 MW shortfall marked the first time the entire RTO failed to clear its reliability target. <a href="https://www.pjm.com/-/media/markets-ops/rpm/rpm-auction-info/2027-2028/2027-2028-bra-report.ashx">PJM</a></p><p>[7] PJM Monitoring Analytics estimates. 
Data center share of capacity costs: approximately $6.5 billion from the December 2025 auction; cumulative $21.3 billion across three consecutive auctions (July 2024, December 2025, and February 2026), representing approximately 45% of the $47.2 billion total. <a href="https://www.utilitydive.com/news/data-centers-pjm-capacity-auction/808951/">Utility Dive</a></p><p>[8] EIA, <em>Nuclear Explained</em>, 2024. US nuclear fleet average capacity factor: 92.2% in 2024. <a href="https://www.eia.gov/energyexplained/nuclear/">EIA</a></p><p>[9] World Nuclear Association, &#8220;Nuclear Power in France,&#8221; updated 2025. France has 57 operable reactors following the grid connection of Flamanville 3 in December 2024. Nuclear share of French electricity: IAEA reports 67.3% for 2024; historical share ranged 65-75% before the 2022 corrosion crisis temporarily reduced availability. <a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/france">WNA</a></p><p>[10] GIFEN 2025 Match Report: approximately 247,000 people across ~2,000 companies in France&#8217;s nuclear sector. Older GIFEN web pages cite &#8220;more than 3,000&#8221; using a broader industry definition; the 2025 methodology identifies ~1,830 companies directly. The earlier workforce figure of 220,000 (cited by World Nuclear News/Chamberlain-Rullion, May 2025) used a narrower definition. <a href="https://sfeninenglish.org/french-nuclear-industry-match-report-2025/">SFEN</a></p><p>[11] Flamanville 3 EPR: construction began in December 2007; original target approximately 2012; grid connection December 21, 2024 &#8212; twelve years late. Original budget &#8364;3.3 billion; final cost approximately &#8364;13.2 billion per EDF&#8217;s December 2022 estimate. The French Court of Auditors estimated the total cost, including financing, at &#8364;23.7 billion. World Nuclear Association; NucNet. 
<a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/france">WNA</a></p><p>[12] EDF/Nuward: design optimization announced July 2024; EDF withdrew Nuward from Great British Nuclear&#8217;s SMR competition July 2024; relaunched January 2025 as a simplified 400 MW PWR concept. Conceptual design finalization target: mid-2026. FOAK target: &#8220;the 2030s.&#8221; World Nuclear News, January 7, 2025. <a href="https://www.world-nuclear-news.org/articles/edf-simplifies-nuward-smr-design">WNN</a></p><p>[13] Nuclear Energy Summit, Boulogne-Billancourt, March 10, 2026. Von der Leyen: &#8220;I believe that it was a strategic mistake for Europe to turn its back on a reliable, affordable source of low-emissions power.&#8221; &#8364;200 million guarantee for innovative nuclear technologies, funded from the EU Emissions Trading System. Macron called on banks and venture capital funds to invest in civilian nuclear, stating that &#8220;each public and private actor must take its share.&#8221; NucNet, Bloomberg, ANS Nuclear Newswire, March 10-12, 2026. For context: OPG&#8217;s Darlington first BWRX-300 reactor is budgeted at CAD $6.1 billion (~US$4.5 billion, ~&#8364;4.1 billion). &#8364;200 million is roughly 5% of that figure. The total first-unit budget including shared site infrastructure (roads, tunnels, utilities serving all four planned units) is CAD $7.7 billion per World Nuclear News, June 2025. <a href="https://www.world-nuclear-news.org/articles/von-der-leyen-it-was-strategic-mistake-to-turn-against-nuclear">WNN</a></p><p>[14] The EU SMR Industrial Alliance, launched in February 2024, has over 350 members, including utilities, reactor developers, and supply chain companies, coordinating cross-border regulatory frameworks. CEZ (Czech Republic) signed a strategic partnership with Rolls-Royce SMR in October 2024; Poland&#8217;s Orlen-Synthos selected W&#322;oc&#322;awek as the site for a BWRX-300 in August 2025. 
These utility-driven projects represent a genuine European SMR pipeline &#8212; but none is under construction, and the gap between industrial alliance membership and commercial deployment remains measured in years. The &#8364;200 million guarantee is designed to catalyze private investment into this pipeline; whether it scales to the tens of billions required for actual construction remains to be seen. <a href="https://single-market-economy.ec.europa.eu/industry/industrial-alliances/european-industrial-alliance-small-modular-reactors_en">European Commission</a></p><p>[15] Barakah cost: original KEPCO contract $20.4 billion (ENEC/KEPCO, 2009). Final cost reported at $32 billion per Power Technology (April 2020). Bloomberg has reported $25 billion. Per-kilowatt range: approximately $4,500-$5,700/kW for 5,600 MW nameplate capacity. <a href="https://www.power-technology.com/projects/barakah-nuclear-power-plant-abu-dhabi/">Power Technology</a></p><p>[16] Barakah schedule: Unit 1 commercial operation originally targeted May 2017; achieved April 2021, approximately four years late. Unit 4 commercial operation September 2024 versus original 2020 target. <a href="https://world-nuclear.org/information-library/country-profiles/countries-t-z/united-arab-emirates">WNA</a></p><p>[17] Vogtle Units 3 and 4: approximately $35 billion for 2,234 MW, or roughly $15,700/kW. Barakah at $5,700/kW (high estimate) is approximately one-third of Vogtle&#8217;s cost per kilowatt. ENEC reported a 40% improvement in the operational readiness schedule from Unit 1 to Unit 4. <a href="https://www.eia.gov/todayinenergy/detail.php?id=63064">EIA</a></p><p>[18] China HTR-PM: 210 MWe pebble-bed reactor at Shidao Bay, Shandong Province. Commercial operation declared December 2023. IAEA PRIS. <a href="https://www.iaea.org/PRIS/">IAEA</a></p><p>[19] Linglong One (ACP100): 125 MWe, Changjiang Nuclear Power Plant, Hainan Island. CNNC targets commercial operation in H1 2026. World Nuclear Association. 
<a href="https://www.world-nuclear-news.org/articles/chinese-smr-completes-non-nuclear-steam-start-up-test">WNN</a></p><p>[20] China: 32 reactors under construction as of early 2026 per World Nuclear Association. CNNC operates an enrichment capacity of approximately 9 million SWU per year. <a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/china-nuclear-power">WNA</a></p><p>[21] Vogtle Units 3 and 4: original cost estimate approximately $14 billion (2012); final cost approximately $35 billion. Construction began in 2013; Unit 3 commercial operation: July 2023; Unit 4: April 2024. Georgia Public Service Commission filings. <a href="https://www.eia.gov/todayinenergy/detail.php?id=63064">EIA</a></p><p>[22] Northern Virginia hosts approximately 35% of US data center capacity by some estimates. Loudoun County alone contains more than 300 data centers. Population density and existing residential development make reactor siting politically nonviable. <a href="https://www.cbre.com/insights/reports/global-data-center-trends-2025">CBRE</a></p><p>[23] Ireland: the Electricity Regulation Act 1999, Section 18(6), effectively prohibits nuclear power generation. Dublin is among the top five data center markets in Europe (approximately fifth in IT load, after London, Frankfurt, Amsterdam, and Paris, per JLL and CBRE FLAP-D rankings). <a href="https://www.irishstatutebook.ie/eli/1999/act/23/section/18/enacted/en/html">Irish Statute Book</a></p><p>[24] Oregon: ORS 469.595 imposes a moratorium on nuclear plant construction pending federal resolution of waste disposal. Amazon, Google, Meta, and Apple operate significant data center campuses in The Dalles, Prineville, and Hillsboro. <a href="https://www.oregonlegislature.gov/bills_laws/ors/ors469.html">Oregon Legislature</a></p><p>[25] Germany shut down its last three operating reactors &#8212; Emsland, Isar 2, and Neckarwestheim 2 &#8212; on April 15, 2023. 
Frankfurt is the largest data center market in continental Europe. <a href="https://www.cnbc.com/2023/04/18/germany-shuts-down-last-nuclear-power-plants-some-scientists-aghast.html">CNBC</a></p><p>[26] Water co-location: nuclear reactors require cooling water for the steam turbine cycle, even advanced designs that use non-water primary coolants (sodium, helium, molten salt). Data centers require cooling water for heat rejection systems (cooling towers, evaporative systems). Co-locating both roughly doubles the site's water demand. In the American West and Southwest &#8212; where Google, Meta, and Microsoft are expanding data center campuses &#8212; water rights are contested, drought is chronic, and permitting for water-intensive industrial use is becoming politically contentious. Some jurisdictions (e.g., The Dalles, Oregon; Mesa, Arizona) have already restricted or conditioned data center water usage.</p><p>[27] The author&#8217;s analysis of announced SMR-data center project sites versus data center demand locations, based on public filings and company announcements.</p><p>[28] Energy Northwest&#8217;s Columbia Generating Station is in Richland, WA, approximately 180 miles from The Dalles, OR, where Google, Amazon, and Meta operate data centers. <a href="https://www.energynorthwest.com/generation/columbia-generating-station">Energy Northwest</a></p><p>[29] OPG Darlington BWRX-300: CNSC construction license issued April 2025. The original target for commercial operation was approximately 2028; the grid connection is now targeted for the end of 2030, roughly two years behind the original schedule. Globe and Mail, May 8, 2025; CBC, May 8, 2025; POWER Magazine, April 2025. 
<a href="https://www.canada.ca/en/nuclear-safety-commission/news/2025/04/commission-authorizes-ontario-power-generation-inc-to-construct-1-bwrx-300-reactor-at-the-darlington-new-nuclear-project-site.html">CNSC</a></p><p>[30] TerraPower Natrium: NRC construction permit issued March 4, 2026 &#8212; the first for a commercial non-light-water reactor in over 40 years. 345 MW sodium-cooled fast reactor. Still requires a separate operating license. TerraPower stated nuclear construction would begin &#8220;in the coming weeks&#8221; as of March 9, 2026; first nuclear concrete expected late 2026 or 2027. Commercial operation targeted 2030 (DOE) to 2031 (TerraPower COO). DOE press release; NucNet, March 3, 2026; POWER Magazine, March 2026. <a href="https://www.world-nuclear-news.org/articles/nrc-issues-construction-permit-for-first-natrium-plant">WNN</a></p><p>[31] Natrium is rated at 345 MW &#8212; technically above the 300 MW threshold commonly used to define small modular reactors. The IAEA and some industry sources classify it as an advanced reactor rather than an SMR. This piece groups it with the SMR-for-data-centers category because hyperscalers, investors, and the press have consistently described it as part of the SMR wave, and the structural constraints the piece identifies (HALEU dependency, supply chain concentration, workforce, timeline) apply regardless of the classification. Latitude Media explicitly noted the classification discrepancy in March 2026. <a href="https://latitudemedia.com/news/terrapower-natrium-smr-classification/">Latitude Media</a></p><p>[32] Hinkley Point C: original budget approximately &#163;18 billion (2015 prices). Current estimate approximately &#163;35 billion in 2015 prices (equivalent to roughly &#163;48-49 billion in current prices) per EDF&#8217;s February 2026 annual results. Unit 1 officially targeting 2030, though further slippage remains likely given EDF&#8217;s track record. EDF and UK government disclosures. 
<a href="https://www.edf.fr/en/the-edf-group/dedicated-sections/journalists/all-press-releases">EDF</a></p><p>[33] Olkiluoto 3: construction began in August 2005; regular electricity production began on April 16, 2023; TVO designated formal commercial operation on May 1, 2023. Original budget approximately &#8364;3 billion; final cost approximately &#8364;11 billion. TVO/Areva disclosures; World Nuclear Association. <a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/finland">WNA</a></p><p>[34] NuScale UAMPS: 462 MW (six 77 MW modules). Original cost estimate $5.3 billion; escalated to $9.3 billion ($20,139/kW). Cancelled November 8, 2023. IEEFA analysis; NuScale/UAMPS joint statement. <a href="https://www.eenews.net/articles/nuscale-cancels-first-of-a-kind-nuclear-project-as-costs-surge/">E&amp;E News</a></p><p>[35] OPG Darlington BWRX-300 FOAK: CAD $6.1 billion (~US$4.5 billion) for 300 MW = approximately US$15,000/kW. GE Hitachi original cost target: US$700 million per reactor, or approximately $2,333/kW. Escalation ratio: approximately 6.4&#215;. Globe and Mail, May 2025; Carbon Commentary analysis. <a href="https://www.world-nuclear-news.org/articles/what-is-the-budget-for-canadas-first-smr-project">WNN</a></p><p>[36] TerraPower Natrium: official estimate $4 billion for 345 MW (DOE ARDP 50/50 cost share: $2 billion federal, $2 billion TerraPower). Official per-kW: approximately $11,600/kW. The $29,000/kW projection is from the Southern Alliance for Clean Energy (June 2025), citing Bill Gates interview comments on the expected all-in cost. Both figures cited for transparency; the actual cost will be determined by construction experience. <a href="https://www.energy.gov/ne/advanced-reactor-demonstration-program">DOE</a></p><p>[37] FOAK overhead vs. structural costs: first-of-a-kind nuclear projects include one-time costs that subsequent units would not bear. 
OPG Darlington&#8217;s CAD $1.6 billion in shared infrastructure (roads, cooling tunnels, administrative buildings) is allocated to the first unit but serves all four. TerraPower&#8217;s $4 billion includes the Sodium Test and Fill Facility and the Natrium Fuel Fabrication Facility &#8212; both one-time investments. NuScale&#8217;s cost escalation included first-time NRC design certification costs that a second project using the same certified design would avoid. For investment evaluation, the relevant question is not the FOAK cost but whether the FOAK-to-NOAK cost reduction ratio matches vendor projections, which requires completing the first unit on schedule, which no Western project has done.</p><p>[38] Lazard LCOE+ Version 18.0 (June 2025). Unsubsidized LCOE: utility-scale solar $38-$78/MWh; community and C&amp;I solar $78-$217/MWh (a separate category with different economics); onshore wind $37-$86/MWh; nuclear $141-$220/MWh. The body text uses the utility-scale solar low end ($38) as the relevant comparison for hyperscaler procurement decisions. <a href="https://www.lazard.com/research-insights/levelized-cost-of-energyplus/">Lazard</a></p><p>[39] Lazard LCOE+ Version 18.0. Gas combined cycle: $48-$109/MWh unsubsidized. This undercuts nuclear ($141-$220) across its entire range. <a href="https://www.lazard.com/research-insights/levelized-cost-of-energyplus/">Lazard</a></p><p>[40] Lazard LCOE+ Version 18.0 includes a &#8220;firming cost&#8221; analysis that adds the cost of supplemental capacity needed to make intermittent renewables reliable in each ISO region. Firming adders vary by region and renewable penetration level. Even with firming costs included, the delivered cost of wind and solar-plus-storage remains below new nuclear LCOE in most US markets. However, the gap narrows, particularly in regions with high renewable penetration, where effective load-carrying capacity is declining. 
The honest comparison for a data center procurement team is delivered, firm cost &#8212; not generation-point LCOE. <a href="https://www.lazard.com/research-insights/levelized-cost-of-energyplus/">Lazard</a></p><p>[41] HALEU: high-assay low-enriched uranium, enriched to between 5% and 19.75% U-235. Conventional reactor fuel is typically enriched to 3-5%. Designs requiring HALEU: X-energy Xe-100, TerraPower Natrium, Kairos Power KP-FHR, and Oklo Aurora. Designs using conventional LEU: GE Hitachi BWRX-300, Rolls-Royce SMR, Westinghouse AP300. The hyperscaler-backed fleet skews heavily toward HALEU designs. <a href="https://www.nrc.gov/reactors/new-reactors/advanced/who-were-working-with/licensing-activities/haleu.html">NRC</a></p><p>[42] TENEX (Rosatom subsidiary) is the only entity producing HALEU at a commercial scale as of March 2026. World Nuclear Association; DOE HALEU program documentation. <a href="https://world-nuclear.org/information-library/nuclear-fuel-cycle/conversion-enrichment-and-fabrication/uranium-enrichment">WNA</a></p><p>[43] Centrus Energy: HALEU demonstration cascade at Piketon, OH, under a DOE contract. Production was approximately 900 kg in 2025. The DOE issued a $900 million task order in January 2026 to accelerate domestic HALEU production across multiple contractors. <a href="https://www.energy.gov/ne/articles/us-department-energy-awards-27-billion-restore-american-uranium-enrichment">DOE</a></p><p>[44] Orano: Operating enrichment capacity at Georges Besse II is sufficient for LEU. Higher-enrichment production at commercial scale is not expected before the late 2020s, per industry reporting and Orano corporate communications. <a href="https://www.orano.group/en/news/news-group/2024">Orano</a></p><p>[45] Kazakhstan: approximately 39-40% of global uranium production (WNA, 2024). A significant portion is converted or enriched at Russian facilities. Kazakhstan also has direct export routes via China and via the Caspian Sea. 
The transit dependency is partial, not absolute. <a href="https://world-nuclear.org/information-library/country-profiles/countries-g-n/kazakhstan">WNA</a></p><p>[46] Rosatom: approximately 27 million SWU/year across four facilities, representing 40-46% of global enrichment capacity (approximately 61-67 million SWU/year total). Rosatom plus CNNC (China): over 60% of global capacity. WNA; Mordor Intelligence. <a href="https://world-nuclear.org/information-library/nuclear-fuel-cycle/conversion-enrichment-and-fabrication/uranium-enrichment">WNA</a></p><p>[47] TRISO fuel (required by X-energy, Kairos) and metallic fuel (required by TerraPower) are not produced at a commercial scale in the West. China&#8217;s fuel fabrication capacity includes TRISO lines at INET/Tsinghua for the HTR-PM.</p><p>[48] Major nuclear forging facilities: Japan Steel Works (Muroran), Doosan (Changwon, South Korea), Shanghai Electric and Dongfang Electric (China), Framatome/Creusot Forge (France), ENSA (Spain), Atomenergomash (Russia). BWXT in the US manufactures naval reactor components and is expanding into commercial nuclear, but does not currently mass-produce large commercial reactor pressure vessel forgings.</p><p>[49] The author&#8217;s synthesis is based on nuclear construction activity tracked by WNA, IAEA PRIS, and country-specific regulatory filings.</p><p>[50] DOE United States Energy &amp; Employment Report (USEER) 2025: 67,900 workers in nuclear energy (approximately 68,000). 63% of nuclear manufacturing employers reported hiring &#8220;very difficult.&#8221; <a href="https://www.energy.gov/sites/default/files/2025-06/2025-useer-report.pdf">DOE</a></p><p>[51] NEI estimate, confirmed by DOE and multiple industry sources: approximately 40% of the current US nuclear workforce is eligible to retire within the next decade. Retirement eligibility does not equal certainty of departure; IAEA uses a lower global figure of approximately one-third. 
<a href="https://www.nei.org/resources/statistics">NEI</a></p><p>[52] The IAEA projects that global nuclear capacity could increase 2.5&#215; by 2050, requiring &#8220;over four million professionals.&#8221; Multiple IAEA publications, 2024-2025. <a href="https://www.iaea.org/newscenter/news/six-global-trends-in-nuclear-power-you-should-know">IAEA</a></p><p>[53] EIA, NRC data: US nuclear fleet average refueling outage duration 30-38 days, depending on year (top performers achieve 25 days; fleet average has trended down from 44 days in 2000). Refueling frequency: every 18-24 months for most US commercial reactors. <a href="https://www.eia.gov/nuclear/generation/">EIA</a></p><p>[54] Advanced SMR refueling targets per IAEA design documentation: 3-7 years between refueling for some designs; Westinghouse AP300 claims a 4-year cycle. These are design targets, not demonstrated commercial performance.</p><p>[55] NRC and WANO data show that US forced (unplanned) outage rates have averaged 2-3% over the past decade. A nuclear capacity factor of 92% implies approximately 29 days per year offline (planned and unplanned downtime combined). Uptime Institute Tier IV standard guarantees 99.995% availability, or approximately 26 minutes of annual downtime. The gap between 2-3% unplanned unavailability and 0.005% allowable unavailability is roughly 400-600&#215;. <a href="https://uptimeinstitute.com/resources/asset/tier-standard-topology">Uptime Institute</a></p><p>[56] Uptime Institute Tier III and Tier IV certifications require redundant power paths, including on-site engine-generator backup, regardless of the primary power source. The Tier Standard is technology-neutral &#8212; diesel, natural gas, or DRUPS systems all qualify, though the overwhelming majority of implementations use diesel. Even a 100% reliable nuclear reactor (which no reactor is) would not eliminate the need for a generator to maintain tier certification. 
The SMR therefore adds a power source without removing any existing one. <a href="https://uptimeinstitute.com/resources/asset/tier-standard-topology">Uptime Institute</a></p><p>[57] The author&#8217;s analysis is based on Schneider Electric data center design guidance, IEEE/IEC redundancy standards, and ABB DataCenterKnowledge reporting on DC power architecture. Standard N+1 and 2N redundancy architectures require grid, UPS, and diesel backup regardless of primary power source. No NRC guidance exists for the co-location of a reactor and a data center on a shared campus.</p><p>[58] PJM/NERC reporting: dozens of data centers in Northern Virginia disconnected during a July 2024 transmission event, with reported load loss exceeding 1 GW. Exact figures vary across reporting sources; the NERC event report details have not been fully published. The reverse scenario &#8212; sudden grid demand from a reactor outage &#8212; presents comparable load-balancing challenges.</p><p>[59] Emerging grid codes requiring data center &#8220;virtual power plant&#8221; behavior &#8212; controlled ramp rates (10-20 MW per minute during reconnection), staged restoration logic, and voltage ride-through capability &#8212; are documented in sgrids.com grid code analysis and FERC Order 2023 interconnection requirements. The regulatory intersection of NRC reactor oversight and FERC/PJM grid participation rules on a single campus has no precedent.</p><p>[60] Nuclear construction and data center construction compete for the same constrained skilled trades &#8212; electricians, heavy-equipment operators, specialized welders &#8212; particularly in markets like Virginia and the Southeast, where both buildouts are concentrated. DOE USEER 2025 documents the skilled-trade shortage across energy infrastructure.</p><p>[61] The author&#8217;s assessment applies the five-question SMR Delivery Test to each announced deal. 
The test evaluates delivery readiness, not economics; cost analysis is developed separately in the &#8220;What it costs&#8221; section. A sixth question &#8212; &#8220;What does the power cost per megawatt-hour, and how does that compare to alternatives?&#8221; &#8212; is analytically essential but excluded from the delivery test because no SMR deal has progressed far enough to produce a binding price. Oracle&#8217;s announcement fails all five: no reactor identified (fails Q1), no fuel determined (fails Q2), no site identified (fails Q3), no supply chain mapped (fails Q4), no operational plan disclosed (fails Q5).</p><p>[62] Deal structures vary significantly across the table. Microsoft-Constellation TMI is a signed 20-year power purchase agreement. Amazon-Talen Susquehanna is a signed front-of-meter PPA. Google-Kairos is a milestone-contingent offtake agreement. Meta-TerraPower includes investments in reactor development, as well as offtake commitments for future units. Oracle has disclosed no contractual commitments, vendor agreements, or site selection in any SEC filing &#8212; the 1+ GW figure originates from an earnings call. The binding nature of the commitment is a material distinction for investment evaluation. The table focuses on the five major hyperscalers but is not exhaustive: Standard Power selected NuScale for ~1,848 MW across two data center sites in Ohio and Pennsylvania (October 2023) &#8212; the only deal using an NRC-certified design, but no construction timeline has been disclosed. Amazon also signed an MOU with Dominion Energy for ~300 MW at North Anna, Virginia (October 2024) &#8212; an exploratory agreement, not a PPA. Equinix signed SMR deals with Oklo, Radiant, Rolls-Royce/ULC-Energy, and Stellaria totaling over 1 GW (August 2025). None of these additional deals would score above 1/5 on the delivery test, which reinforces rather than undermines the pattern. 
<a href="https://www.nuscalepower.com/press-releases/2023/standard-power-chooses-nuscales-approved-smr-technology-and-entra1-energy-to-energize-data-centers">NuScale</a> ; <a href="https://investors.dominionenergy.com/news/press-release-details/2024/Dominion-Energy-and-Amazon-to-explore-advancement-of-Small-Modular-Reactor-SMR-nuclear-development-in-Virginia/">Dominion Energy</a> ; <a href="https://newsroom.equinix.com/2025-08-14-Equinix-Collaborates-with-Leading-Alternative-Energy-Providers-to-Power-AI-Ready-Data-Center-Growth">Equinix</a></p><p>[63] Constellation-Microsoft: 20-year PPA for TMI Unit 1 (renamed Crane Clean Energy Center), 837 MW. Restart cost: $1.6 billion. DOE $1 billion loan closed November 18, 2025, at a 0.375% interest rate. Target: 2027 (accelerated from the original 2028 timeline). Constellation Energy press release, September 20, 2024; DOE Loan Programs Office, November 2025; CNBC, November 18, 2025. PPA pricing: Jefferies analysts estimated approximately $110-115/MWh over the 20-year contract term. Neither Constellation nor Microsoft has disclosed the exact price. For context, wholesale PJM spot prices fluctuated between $30 and $80/MWh in 2024-2025; new-build nuclear LCOE is $141-220/MWh, per Lazard. The TMI PPA sits between spot and new-build &#8212; a premium for firm, carbon-free, long-duration price certainty. <a href="https://www.constellationenergy.com/news/2024/Constellation-to-Launch-Crane-Clean-Energy-Center-Restoring-Jobs-and-Carbon-Free-Power-to-The-Grid.html">Constellation Energy</a></p><p>[64] Meta-Constellation: 20-year PPA for output of Clinton Clean Energy Center, a 1,121 MW BWR in Clinton, Illinois. Signed June 3, 2025. Begins June 2027. Replaces the Illinois Zero Emission Credit program expiring mid-2027. Includes 30 MW uprate. Preserves 1,100 jobs and $13.5 million annual tax revenue. Constellation is also evaluating the deployment of an advanced reactor (SMR) at the Clinton site. 
Constellation Energy press release; CNBC, June 3, 2025; World Nuclear News. <a href="https://www.constellationenergy.com/newsroom/2025/constellation-meta-sign-20-year-deal-for-clean-reliable-nuclear-energy-in-illinois.html">Constellation Energy</a></p><p>[65] Meta-Vistra: agreements for nuclear power from Perry, Davis-Besse (Ohio), and Beaver Valley (Pennsylvania). Includes uprates to increase output. Combined fleet capacity is approximately 2,600 MW. Vistra announcement, January 9, 2026. <a href="https://www.utilitydive.com/news/meta-nuclear-deal-oklo-vistra-terrapower-ai-data-centers/809215/">Utility Dive</a></p><p>[66] Amazon-Talen Energy: PPA for power from the Susquehanna nuclear plant, 1,920 MW (two units). Front-of-meter PPA structure after FERC rejected the original behind-the-meter design. The plant is generating power under the agreement. Multiple reporting sources. <a href="https://www.sec.gov/Archives/edgar/data/0001622536/000162828025030559/a20250611pressreleasebusin.htm">SEC</a></p><p>[67] Google-NextEra: agreement to restart Duane Arnold, a 615 MW BWR in Iowa, shut down in August 2020. Target: early 2029. Announced October 27, 2025. NextEra Energy press release; ANS Nuclear Newswire. <a href="https://newsroom.nexteraenergy.com/NextEra-Energy-and-Google-Announce-New-Collaboration-to-Accelerate-Nuclear-Energy-Deployment-in-the-U-S">NextEra Energy</a></p><p>[68] Fervo Energy Cape Station: Phase I delivers 100 MW baseload enhanced geothermal to the grid in 2026. Phase II adds 400 MW by 2028, bringing the total to 500 MW. BLM permit approved for up to 2 GW expansion. Located in Beaver County, Utah. $462 million Series E (December 2025), $206 million additional financing (June 2025). Fervo Energy press releases; Canary Media. <a href="https://fervoenergy.com/fervo-energy-raises-462-million-series-e-to-accelerate-geothermal-development-and-meet-surging-energy-demand-with-clean-firm-power/">Fervo Energy</a></p><p>[69] Chevron, GE Vernova, and Engine No. 
1: partnership announced January 28, 2025. &#8220;Power foundries&#8221; using seven GE Vernova 7HA natural gas turbines, up to 4 GW behind-the-meter co-located with data centers in the US Southeast, Midwest, and West. Initial in-service targeted end of 2027. Chevron/Engine No. 1 joint press release. <a href="https://engine1.com/wp-content/uploads/2025/01/2025-0127_FINAL-PRESS-RELEASE_EN1_CVX_GE_.pdf">Engine No. 1</a></p><p>[70] Meta&#8217;s Hyperion project in Richland Parish, Louisiana: a $3.2 billion, 2 GW combined-cycle natural gas plant to power a planned data center campus. Meta also secured approval from the Ohio Power Siting Board for a 700 MW natural gas plant at its Prometheus AI campus in New Albany, Ohio &#8212; expanded from an initial 400 MW proposal. Both projects deliver power on a 2&#8211;3 year timeline, compared to the 2030s targets for Meta&#8217;s SMR partnerships. Brookings Institution, March 2026; Ohio Power Siting Board filings; tech-insider.org. <a href="https://tech-insider.org/ai-data-center-power-crisis-2026/">Tech Insider</a></p><p>[71] Lazard LCOE+ Version 18.0: battery energy storage system costs showed &#8220;sharp year-over-year declines&#8221; driven by oversupply of cells and increased energy density. <a href="https://www.lazard.com/research-insights/levelized-cost-of-energyplus/">Lazard</a></p><p>[72] Microsoft FY23 Q1 earnings call, October 26, 2022. CFO Amy Hood disclosed &#8220;$800 million of greater-than-expected energy cost&#8221; for fiscal year 2023, driven primarily by the European energy crisis, compressing cloud margins by approximately 1 percentage point. This preceded the AI-driven demand acceleration; more recent earnings calls (FY26 Q1 and Q2) reference continued margin pressure from AI infrastructure investment but do not isolate energy costs specifically. <a href="https://www.microsoft.com/en-us/Investor/earnings/FY-2023-Q1/performance">Microsoft</a></p><p>[73] CBRE Data Center Trends reports, 2025. 
Ashburn, VA wholesale colocation rates ranged from $175 to $225/kW/month in H2 2024, with premium deals exceeding $215/kW/month in 2025. Year-over-year increases of 6.5-17.6% in 2025, following sharper surges in 2023-2024. <a href="https://www.cbre.com/insights/reports/global-data-center-trends-2025">CBRE</a></p><p>[74] The author&#8217;s analysis synthesizes PJM capacity cost data, hyperscaler earnings disclosures, and enterprise cloud pricing trends. Energy accounts for approximately 30-40% of data center operating expenses across multiple industry analyses (Wolfspeed/Power &amp; Beyond, Schneider Electric, Uptime Institute).</p><p>[75] The double-exposure mechanism: enterprise cloud customers pay hyperscaler energy costs via cloud service pricing AND pay ratepayer capacity charges via their own electricity bills. Both cost streams are driven by the same underlying data center load growth. PJM capacity costs from the three auctions (cumulative $21.3 billion at 45% data center share) are allocated across all ratepayers in the thirteen-state region.</p><p>[76] Virginia State Corporation Commission: approved new electricity rate classes for customers exceeding 25 MW, effectively creating a data-center-specific pricing tier. Georgia, Colorado, and several other states are pursuing similar legislative or regulatory measures as of early 2026. Utility Dive and state PUC filings. <a href="https://www.utilitydive.com/news/virginia-data-center-electricity-rate/">Utility Dive</a></p><p>[77] Combined hyperscaler capex for 2026: $600-690 billion across Amazon, Alphabet, Microsoft, Meta, and Oracle, consuming nearly 100% of operating cash flows vs. a ten-year average of approximately 40% (UBS estimate). Amazon projected negative free cash flow of $17-28 billion (Morgan Stanley, Bank of America estimates). Alphabet FCF projected to decline ~90% to $8.2 billion (Pivotal Research). Sources: CNBC, February 6, 2026; CreditSights, November 2025; Platformonomics, February 2026. 
<a href="https://www.cnbc.com/2026/02/06/big-tech-ai-spending.html">CNBC</a></p><p>[78] Hyperscalers issued over $121 billion in bonds in 2025 per BofA analysis. Alphabet&#8217;s long-term debt quadrupled in 2025 to $46.5 billion, including a 100-year sterling tranche. Amazon raised $15 billion in November 2025 and a further $37-42 billion in March 2026, bringing total debt to over $100 billion. Oracle plans $40-50 billion in capex for FY2026, with negative free cash flow projected through 2029. Fortune, December 2025; Yahoo Finance/24/7 Wall St., March 2026; BofA, December 2025; Bloomberg, March 10, 2026.</p><p>[79] Stranded-asset risk: commercial nuclear reactors are licensed for 40-80 years (with renewal). AI computing architectures &#8212; GPU configurations, model training approaches, and inference optimization &#8212; operate on 3-5-year cycles. The temporal mismatch creates the risk that a reactor commissioned to serve one computing paradigm will outlive several successive paradigms, potentially becoming an oversized, inflexible asset. Note: this risk is specific to the co-location thesis. A grid-connected reactor serves whoever needs baseload power, regardless of computing architecture &#8212; the electrons go to the grid, and the grid doesn&#8217;t care what&#8217;s plugged in at the other end.</p><p>[80] The fiber-to-the-home deployment arc is well documented in telecommunications history. Promises of universal fiber deployment in the 1990s gave way to wireless bridging technologies. Fiber deployment eventually reached high-density corridors decades later, primarily through infrastructure investors rather than the original telecom incumbents.</p><p>[81] OPG Darlington four-unit budget: CAD $20.9 billion (~US$15.1 billion) for 1,200 MW, expressed in 2024 dollars and including interest charges and contingencies. First unit: CAD $6.1 billion (~US$4.5 billion) for 300 MW, or ~US$15,000/kW. 
OPG expects later units to cost &#8220;substantially less.&#8221; Globe and Mail, May 8, 2025; World Nuclear News, June 2025. <a href="https://www.world-nuclear-news.org/articles/what-is-the-budget-for-canadas-first-smr-project">WNN</a></p><p>[82] Geothermal drilling speed: Fervo&#8217;s drilling times have decreased with each new well at Cape Station, and overall efficiency has improved as the project advances. A geothermal well that underperforms can be redrilled or stimulated in weeks to months. A nuclear reactor that underperforms enters a regulatory and engineering correction cycle measured in years. The structural difference in iteration speed is one reason venture capital (B Capital, Breakthrough Energy Catalyst, Google) has invested in geothermal at a pace typically associated with software, not energy infrastructure. <a href="https://fervoenergy.com/cape-station/">Fervo Energy</a></p><p>[83] China&#8217;s pattern of state-directed infrastructure deployment outpacing Western market-driven approaches has been documented across high-speed rail (40,000+ km vs. Amtrak&#8217;s Acela corridor), solar panel manufacturing (&gt;80% of global production share), and EV battery production (CATL and BYD dominating global supply). China&#8217;s nuclear construction program &#8212; 32 reactors under construction as of early 2026 &#8212; is the latest iteration of this pattern. <a href="https://world-nuclear.org/information-library/country-profiles/countries-a-f/china-nuclear-power">WNA</a></p><p>[84] In May 2025, President Trump signed executive orders aimed at quadrupling US nuclear power production from approximately 100 GW to 400 GW by 2050, including measures to accelerate NRC review and clear permitting roadblocks. 
If successful, these reforms would compress the timeline dimension &#8212; but they do not address the cost escalation, HALEU fuel dependency, supply chain concentration, workforce shortage, or operational complexity that constitute the other five dimensions of failure. Regulatory reform is necessary but not sufficient. <a href="https://www.whitehouse.gov/presidential-actions/executive-order-on-quadrupling-our-nuclear-energy-production-and-electrifying-our-future-with-clean-reliable-and-affordable-nuclear-energy/">White House</a></p><p>[85] U.S. Army Janus Program: announced October 2025 at AUSA; nine installation sites named November 18, 2025 (Fort Benning, Fort Bragg, Fort Campbell, Fort Drum, Fort Hood, Fort Wainwright, Holston Army Ammunition Plant, Joint Base Lewis-McChord, Redstone Arsenal). Executive Order 14299, &#8220;Deploying Advanced Nuclear Reactor Technologies for National Security,&#8221; mandates an Army-regulated reactor operational at a domestic installation by September 30, 2028. The Army uses its independent regulatory authority under Section 91 of the Atomic Energy Act, bypassing the NRC entirely. Defense Innovation Unit AOI released November 18, 2025. Army.mil; Breaking Defense; World Nuclear News. <a href="https://www.army.mil/article/288903/army_announces_janus_program_for_next_generation_nuclear_energy">US Army</a></p><p>[86] Operation Windlord, February 15, 2026: Valar Atomics&#8217; Ward250 microreactor (5 MW, unfueled) transported via three C-17 Globemaster III aircraft from March Air Reserve Base, California, to Hill Air Force Base, Utah. Energy Secretary Chris Wright and Under Secretary of Defense for Acquisition Michael Duffey were aboard. The reactor will be moved to the Utah San Rafael Energy Lab for testing. DOE Reactor Pilot Program (EO 14301): 10 companies selected across 11 projects, August 2025, goal to achieve criticality of at least three test reactors by July 4, 2026, using DOE authorization &#8212; not NRC licensing. 
Selected companies include Aalo Atomics, Antares Nuclear, Atomic Alchemy (Oklo subsidiary), Deep Fission, Last Energy, Natura Resources, Oklo (two projects), Radiant Industries, Terrestrial Energy, and Valar Atomics. Reuters, February 15, 2026; The War Zone; World Nuclear News; DOE press release, August 2025. <a href="https://www.reuters.com/business/energy/us-military-airlifts-nuclear-microreactor-first-2026-02-15/">Reuters</a> ; <a href="https://www.energy.gov/ne/articles/doe-selects-11-advanced-nuclear-reactor-concepts-new-reactor-pilot-program">DOE</a></p><p>[87] The military pathway confirms, rather than contradicts, the piece&#8217;s thesis. The only Western institution deploying microreactors on a timeline competitive with China is the one that removed itself from the commercial regulatory process (NRC), accepted HALEU dependency as a national security cost, and justified reactor economics on mission assurance rather than grid price competition. The Janus Program&#8217;s contracting model &#8212; commercially owned reactors on military installations, milestone-based payments modeled on NASA&#8217;s COTS program &#8212; could eventually produce nth-of-a-kind cost data that informs the commercial market. But the regulatory pathway, cost justification, and customer base are fundamentally different from the hyperscaler-for-data-centers thesis. The fiber analogy holds: the first deployment will be by different actors, for different use cases, through different channels &#8212; and the commercial market will eventually benefit, years later, from the learning curve the military funded.</p><p>[88] Fusion is the other nuclear technology attracting AI-sector capital &#8212; and its delivery timeline is even longer than fission SMRs. 
Helion Energy (backed by $375 million from Sam Altman personally, plus rounds from Mithril, Lightspeed, and SoftBank) has a PPA with Microsoft for 50 MW starting in 2028 from a plant not yet built, using a technology that has not demonstrated scientific breakeven. On March 23, 2026, Axios reported that OpenAI is in talks to purchase 5 GW from Helion by 2030 and 50 GW by 2035 &#8212; requiring approximately 100 reactors in four years from a company that has built zero, scaling to 1,000 by 2035. Altman stepped down as Helion board chair to manage the conflict of interest. Google has also signed a separate agreement with Commonwealth Fusion Systems for 200 MW. This piece focuses on fission SMRs because they are closer to commercial deployment; fusion commitments are even more speculative but follow the identical announce-delay pattern. <a href="https://techcrunch.com/2026/03/23/sam-altman-openai-fusion-energy-board-helion/">TechCrunch</a> ; <a href="https://www.axios.com/2026/03/23/openai-fusion-altman-helion">Axios</a></p>]]></content:encoded></item><item><title><![CDATA[Reverse-Engineering The Apollo 11 Code with AI]]></title><description><![CDATA[I used Claude to crack 40,000 lines of 1960s assembly for a computer with 4 KB of RAM. 
No code is too old for AI.]]></description><link>https://www.airealist.ai/p/reverse-engineering-the-apollo-11</link><guid isPermaLink="false">https://www.airealist.ai/p/reverse-engineering-the-apollo-11</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Sat, 28 Mar 2026 08:26:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NVAk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NVAk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NVAk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NVAk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6488977,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192345568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NVAk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!NVAk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238ecfa0-e860-43b7-8aa1-1bbd72827d8b_2816x1584.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><p><em>The complete walkthrough &#8212; 8 modules, 6,500 lines of technical analysis, all prompts used, and a full process trace documenting what the AI got right and wrong &#8212; is on GitHub: <a href="https://github.com/juliensimon/apollo11-ai-walkthrough">apollo11-ai-walkthrough</a>. Please give it a &#11088;&#65039; if you found this useful.</em></p><div><hr></div><p>The <a href="https://github.com/chrislgarry/Apollo-11">Apollo 11 Guidance Computer source code</a> has been on GitHub since 2016.
40,000 lines of 1960s assembly code for a 15-bit computer with 4 KB of RAM. The code that flew Neil Armstrong to the Moon, handled the 1202 alarms during descent, and brought the crew home. Public domain.</p><p>Almost nobody can read it.</p><h2>Why I Did It</h2><p>Two things collided in my head.</p><p>First, <strong><a href="https://www.nasa.gov/mission/artemis-ii/">Artemis II</a> is about to fly</strong>, the first crewed Moon mission since Apollo 17 in 1972. It felt like the right moment to look at the mind-boggling software that got us there the first time, before the next chapter starts.</p><p>Second, I keep hearing the same objection from engineering teams when I talk about AI agents: <strong>&#8220;Our codebase is too legacy for AI.&#8221;</strong> Too old. Too weird. Too far from the Python and TypeScript that LLMs were trained on. Hold my beer. </p><p>If an AI can make sense of 1960s assembly for a 15-bit computer with 1&#8217;s-complement arithmetic and bank-switched memory, <strong>your decade-old Java monolith is not the hard case you think it is.</strong></p><h2>The Hardware</h2><p>Some numbers to calibrate your intuition. The <a href="https://en.wikipedia.org/wiki/Apollo_Guidance_Computer">Apollo Guidance Computer</a> had a 1.024 MHz clock (from a 2.048 MHz oscillator, divided by two). A typical instruction took two memory cycles of 11.72 microseconds each &#8212; about 23 microseconds per instruction, or roughly 43,000 instructions per second. It addressed 36,864 words of fixed (ROM) memory and 2,048 words of erasable (RAM) memory. Each word was 15 bits plus a parity bit. Total memory: about 72 KB in modern byte-equivalent terms &#8212; but only ~4 KB of that was RAM &#8212; <a href="https://en.wikipedia.org/wiki/Magnetic-core_memory">magnetic core memory</a>, tiny ferrite rings whose magnetization direction stored one bit each. The rest was read-only. 
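</p>

<p>The byte-equivalent figures are easy to sanity-check (counting the 15 data bits per word and ignoring parity):</p>

```python
FIXED_WORDS = 36_864     # core rope ROM
ERASABLE_WORDS = 2_048   # magnetic core RAM
DATA_BITS = 15           # the 16th bit is parity

def agc_words_to_bytes(words: int) -> float:
    """Convert AGC words to modern 8-bit bytes, data bits only."""
    return words * DATA_BITS / 8

total = agc_words_to_bytes(FIXED_WORDS + ERASABLE_WORDS)
ram = agc_words_to_bytes(ERASABLE_WORDS)
print(total, ram)  # → 72960.0 3840.0 (about 72 KB total, ~4 KB of RAM)
```

<p>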
The AGC cost roughly $200,000 per unit in 1966 dollars &#8212; about $1.9 million today.</p><p>Four kilobytes of working memory. To put that in perspective: a typical smartcard chip (the one in your bank card) runs an ARM SC300 at 30+ MHz with 300 KB of ROM &#8212; faster clock, more memory, fits on your fingernail. An Arduino Uno (16 MHz, 32 KB flash, 2 KB SRAM, $25) is remarkably close to the AGC&#8217;s spec sheet, fifty years later. The Apple II (1977, 1 MHz 6502, 48 KB RAM) had a comparable clock speed and more RAM for $1,298 &#8212; eight years after Apollo 11. A modern washing machine controller runs a Cortex-M0 at 48 MHz with up to 256 KB of flash &#8212; roughly 50x the AGC&#8217;s clock speed.</p><p>The AGC, however, was purpose-built for one job: real-time guidance and navigation in space. Its ROM was <a href="https://en.wikipedia.org/wiki/Core_rope_memory">core rope memory</a> &#8212; literally woven by hand by factory workers, threading wires through or around tiny magnetic cores to encode ones and zeros. A single bit was a physical knot. The entire program was frozen into hardware months before launch and could not be patched in flight. The AGC also had a hardware restart capability (<code>GOJAM</code>), hardwired I/O channels to the inertial measurement unit, the radar, the engine, and the DSKY display. No general-purpose computer of the era could do what it did because none were designed to survive the failure modes of spaceflight.</p><p>The software was written by a team of about 350 people at the MIT Instrumentation Laboratory, led by <a href="https://en.wikipedia.org/wiki/Margaret_Hamilton_%28software_engineer%29">Margaret Hamilton</a>. Many of the flight software developers were in their mid-twenties. Hamilton coined the term <strong>&#8220;software engineering&#8221;</strong>, a phrase considered an oxymoron at the time. 
Her team&#8217;s insistence that software be engineered with the same rigour as hardware is what saved the Apollo 11 landing when things went wrong.</p><h2>The Problem</h2><p>AGC4 assembly is a dead language. The architecture is 1&#8217;s-complement (not 2&#8217;s-complement like every modern CPU). The primary conditional branch, <code>CCS</code>, does a 4-way skip based on positive, plus-zero, negative, and minus-zero &#8212; because 1&#8217;s-complement has two representations of zero. There&#8217;s no stack. One register holds one return address. Memory is bank-switched across three registers, plus a &#8220;superbank&#8221; bit. The codebase is split between native assembly and an interpreted bytecode language that runs on a software virtual machine built into the AGC itself.</p><p>Existing resources cover the history well. The Virtual AGC project at ibiblio.org provides emulators and an excellent assembly language manual. Borja Sotomayor wrote a good Medium explainer on the <code>FLAGORGY</code> subroutine. But nobody had done a systematic, module-by-module technical walkthrough of the actual code &#8212; the kind where you trace register contents through an instruction sequence and explain what each line does and why.</p><p>I wanted to know if AI could do that. Not as a stunt, but as a genuine test: can an LLM trained overwhelmingly on modern code make sense of a dead architecture for which it has almost no training data?</p><h2>The Method</h2><p>The critical insight came early: you cannot just point Claude at AGC assembly and ask it to explain. Without an architectural context, the model assumes modern conventions. It treats <code>CCS</code> as a simple conditional. It misses the <code>TS</code> skip-on-overflow pattern. 
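</p>

<p>To see why a model assuming modern two&#8217;s-complement conventions goes wrong, here is a minimal Python sketch of the four-way <code>CCS</code> decision described above (branch distances only; the instruction&#8217;s accumulator side effect is omitted):</p>

```python
# 15-bit 1's-complement constants
POS_ZERO = 0o00000   # "plus zero"
NEG_ZERO = 0o77777   # "minus zero": all fifteen bits set
SIGN_BIT = 0o40000   # bit 15

def ccs_skip(value: int) -> int:
    """How many instructions CCS skips for a 15-bit 1's-complement
    word: 0 = positive, 1 = plus-zero, 2 = negative, 3 = minus-zero.
    (The real instruction also loads a decremented absolute value
    into the accumulator; that side effect is omitted here.)"""
    if value == POS_ZERO:
        return 1
    if value == NEG_ZERO:
        return 3
    return 2 if value & SIGN_BIT else 0

print(ccs_skip(0o00005))  # 0: positive, fall through
print(ccs_skip(0o00000))  # 1: plus-zero
print(ccs_skip(0o77772))  # 2: negative (this is -5)
print(ccs_skip(0o77777))  # 3: minus-zero
```

<p>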
It doesn&#8217;t understand bank switching.</p><p>So I built a <a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/prompts/phase1-context.md">3,500-word context prompt</a> &#8212; a condensed AGC4 architecture reference covering the instruction set, memory map, register file, interrupt system, and the interpreter&#8217;s packed opcode format. I also fetched and cached the actual Virtual AGC Assembly Language Manual from ibiblio.org and injected key sections alongside my summary as ground truth. Belt and suspenders: the summary provides the model with a reasoning framework; the raw manual prevents hallucination about specifics.</p><p>The workflow was five phases, all scripted (all <a href="https://github.com/juliensimon/apollo11-ai-walkthrough/tree/master/prompts">prompts</a> are in the repo):</p><ol><li><p><strong>Context priming</strong> &#8212; the architecture reference, loaded into every API call</p></li><li><p><strong>Repo reconnaissance</strong> &#8212; scan all 175 <code>.agc</code> files, extract headers, categorize by function</p></li><li><p><strong>Targeted deep dives</strong> &#8212; one per key module, each receiving the full source file plus the architecture context</p></li><li><p><strong>Synthesis</strong> &#8212; feed all walkthrough files back in, extract cross-cutting lessons</p></li><li><p><strong>Quality check</strong> &#8212; cross-reference claims across files, verify against the manual, flag inconsistencies</p></li></ol><p>I used Claude Code&#8217;s CLI in pipe mode (<code>claude -p</code>) with Opus 4.6. Each deep dive took 3-7 minutes of compute. Total wall-clock time for the entire project: under an hour of model time across two days. No API key needed &#8212; my Max subscription covered it.</p><h2>What the Code Reveals</h2><p>Eight walkthrough files. 6,500 lines of analysis. 
Here&#8217;s what each module taught us (module names in bold link to the relevant analysis).</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/01-executive.md">The Executive</a></strong> &#8212; The AGC had no operating system; the Executive <em>was</em> the operating system. It implements cooperative multitasking across 7 fixed core sets with priority-based scheduling in ~600 lines of assembly language. The design anticipates Go&#8217;s goroutine scheduler and Python&#8217;s asyncio by decades. With fixed resource pools and static analysis, the MIT team could prove their scheduler would never run out of slots, something no modern system with 10,000 goroutines can claim.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/02-waitlist.md">The Waitlist</a></strong> &#8212; The timer-driven task scheduler that ran the AGC&#8217;s real-time heartbeat. It fires tasks at precise intervals using the hardware TIME3 counter, managing up to 9 concurrent timed events. The source comments include a hand-computed worst-case execution time analysis, written in 1966, before real-time systems theory existed as a formal discipline.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/03-restart.md">Fresh Start &amp; Restart</a></strong> &#8212; The module that saved Apollo 11. When the 1202 alarms fired during descent, this code restarted the computer, verified the integrity of a checksummed phase table, reinitialised all scheduling, and resumed the guidance equations within milliseconds, while the descent engine kept firing.
This is crash-only design and the &#8220;let it crash&#8221; philosophy, implemented 20 years before Erlang and 37 years before the pattern was formally described at Stanford.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/04-landing-guidance.md">Landing Guidance Equations</a></strong> &#8212; The math that flew the Lunar Module to the surface. Programs P63 (braking), P64 (approach with redesignation), and P66 (manual rate-of-descent) implement a gravity-turn guidance algorithm running at 2 Hz in interpreted bytecode. The code handles the transition from automatic to manual control, the moment Armstrong took the stick to dodge a boulder field.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/05-burn-baby-burn.md">BURN_BABY_BURN</a></strong> &#8212; The master ignition routine that starts every engine burn. It uses table-driven virtual method dispatch &#8212; structurally identical to a C++ vtable &#8212; so one generic routine handles descent, ascent, and orbital burns. Also, the most culturally rich file in the codebase: Latin inscriptions (&#8220;NOLI SE TANGERE&#8221; &#8212; touch it not), a reference to the Order of the Garter, and the word &#8220;EXTIRPATE&#8221; where a modern programmer would write &#8220;clear.&#8221;</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/06-interpreter.md">The Interpreter</a></strong> &#8212; The flight software wouldn&#8217;t fit in 36K words of ROM as native assembly, so MIT built a bytecode virtual machine inside the AGC. It packs two 7-bit opcodes per 15-bit word, provides vector/matrix math and trig functions, and runs 10&#8211;25x slower than native code, but it saved an estimated 15,000&#8211;40,000 words of ROM. Without it, there is no Moon landing.
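</p>

<p>The packing idea itself is easy to sketch. This is an illustration of the concept only, not the interpreter&#8217;s exact encoding (which adds an offset and special-cases store instructions):</p>

```python
def pack(op1: int, op2: int) -> int:
    """Pack two 7-bit opcodes into one word (fits in 14 of 15 bits)."""
    assert 0 <= op1 < 128 and 0 <= op2 < 128
    return (op2 << 7) | op1

def unpack(word: int) -> tuple[int, int]:
    """Recover both opcodes from a packed word."""
    return word & 0x7F, (word >> 7) & 0x7F

word = pack(0o45, 0o12)
assert word < 2 ** 15               # still fits a 15-bit AGC word
assert unpack(word) == (0o45, 0o12)
```

<p>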
This predates the Java JVM by nearly 30 years.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/07-dsky-interface.md">Pinball Game (DSKY Interface)</a></strong> &#8212; The astronaut&#8217;s only interface to the AGC: 19 keys and 7-segment displays. The Verb-Noun command language is one of the earliest structured human-computer interfaces. The display buffer (<code>DSPTAB</code>) uses sign bits as dirty flags, the same principle as React&#8217;s virtual DOM diffing, in 14 words of 1966 assembly. At ~3,800 lines, it&#8217;s one of the largest modules in Luminary.</p><p><strong><a href="https://github.com/juliensimon/apollo11-ai-walkthrough/blob/master/walkthrough/08-lessons.md">Lessons for 2026</a></strong> &#8212; A synthesis essay covering architectural patterns ahead of their time, constraints as a design force, the 1202 story as a case study in graceful degradation, and what a 2026 engineer building safety-critical systems can still learn from code written for a 15-bit computer with 4 KB of RAM.</p><p>To give you a flavour of what this code looks like, here&#8217;s how the master ignition routine announces itself, and how the error handler is named:</p><pre><code># BURN, BABY, BURN -- MASTER IGNITION ROUTINE

# THE MASTER IGNITION ROUTINE IS DESIGNED FOR USE BY THE
# FOLLOWING LEM PROGRAMS: P12, P40, P42, P61, P63.</code></pre><pre><code>&#9;&#9;TC&#9;POSTJUMP&#9;# RESUME SENDS CONTROL HERE
&#9;&#9;CADR&#9;ENEMA
POODOO&#9;&#9;INHINT
&#9;&#9;CA&#9;Q
ABORT2&#9;&#9;TS&#9;ALMCADR</code></pre><p>Yes, the fatal error handler for the Moon landing software is called <code>POODOO</code>. The routine it jumps past is called <code>ENEMA</code>. These were real labels in real flight code, reviewed by NASA, and woven into core rope memory. The engineers were in their twenties, under existential pressure, and coped by naming things accordingly.</p><h2>Where the AI Struggled</h2><p>Honest accounting matters more than the successes. I double-checked the model&#8217;s output with additional runs.</p><p>The model showed ambiguity about the exact number of Executive core sets (7 vs 8&#8212;a CCS loop-counter interpretation question). Both readings are defensible depending on whether you count the running job&#8217;s context as a &#8220;core set.&#8221; I standardised the language across all files.</p><p>The I/O channel 14 description in the BURN_BABY_BURN walkthrough was oversimplified &#8212; described as &#8220;controls the DPS throttle&#8221; when channel 14 is actually a multi-function output channel where specific bits handle engine commands. Accurate in spirit, imprecise in detail. I corrected it.</p><p>The DSKY walkthrough called the Verb-Noun interface &#8220;the world&#8217;s first&#8221; command-line interface. Probably true given the 1966 date, but unprovable. I softened it to &#8220;one of the earliest.&#8221;</p><p>The pattern is consistent: the model is strong on control flow, data structures, and architectural reasoning. It struggles at the hardware boundary &#8212; where software behaviour depends on physical properties of specific registers, I/O channels, or timing. This is exactly what you&#8217;d expect from a model trained overwhelmingly on high-level code. The context prompt helps, but it can&#8217;t fully substitute for hands-on experience with the actual hardware. Every claim at the hardware boundary needed manual verification.</p><p>None of these errors was catastrophic. 
The architectural understanding &#8212; instruction semantics, control flow, data structures &#8212; was consistently correct across 6,500 lines of output.</p><h2>The Takeaway</h2><p>This project isn&#8217;t about AI writing code. The AI didn&#8217;t produce a single line of AGC assembly. It <em>reads</em> code and translates it into something modern developers can understand.</p><p>Most code is read far more than it is written. The most important code is old. The AGC is an extreme case, but the pattern applies everywhere: legacy COBOL in banking, vintage Fortran in scientific computing, decade-old C++ in embedded systems. If AI can make 1960s assembly for a dead architecture accessible, what else can it unlock? Legacy modernisation. Regulatory code review. Technical due diligence on acquisitions. Onboarding engineers onto unfamiliar codebases.</p><p>The Apollo 11 source code is nearly six decades old. It ran on a computer with too little memory to store this piece. And an AI trained on modern code &#8212; with the right architectural context &#8212; can read it, trace its control flow, identify its design patterns, and explain why it still matters.</p><p>The process I followed &#8212; context priming, structured reconnaissance, targeted deep dives, synthesis &#8212; isn&#8217;t specific to the AGC. It&#8217;s a playbook for any legacy codebase. If you&#8217;re sitting on a million lines of COBOL, Fortran, or early C++ that nobody fully understands, this same approach can help you explore, document, and plan a migration path. Feed the AI the architecture context it needs, point it at the code module by module, and let it build the documentation that should have existed all along.</p><p>No code is too old for Claude.
You just have to teach the architecture first.</p><div><hr></div><p><em>The complete walkthrough &#8212; 8 modules, 6,500 lines of technical analysis, all prompts used, and a full process trace documenting what the AI got right and wrong &#8212; is on GitHub: <a href="https://github.com/juliensimon/apollo11-ai-walkthrough">apollo11-ai-walkthrough</a>. Please give it a &#11088;&#65039; if you found this useful.</em></p><p></p>]]></content:encoded></item><item><title><![CDATA[Build, Regulate, Scare]]></title><description><![CDATA[Europe&#8217;s digital sovereignty strategy isn&#8217;t a plan. It&#8217;s a three-stage retreat, each stage a confession that the previous one failed.]]></description><link>https://www.airealist.ai/p/build-regulate-scare</link><guid isPermaLink="false">https://www.airealist.ai/p/build-regulate-scare</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Fri, 27 Mar 2026 15:11:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2jSb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2jSb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2jSb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 424w, 
https://substackcdn.com/image/fetch/$s_!2jSb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!2jSb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!2jSb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2jSb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6569859,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192318168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2jSb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!2jSb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!2jSb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!2jSb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181026dc-2172-489b-b0a4-45d6808f0365_2816x1584.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On March 17, the Innovation Makers Alliance played an AI-generated fake newscast to an amphitheatre at Station F.[1] IMA is a consortium of the CIOs, CDOs, and innovation directors at three-quarters of the CAC 40 &#8212; France&#8217;s top-40 listed companies &#8212; and major French government ministries, roughly 10,000 tech decision-makers from 165 organizations.[2] These are the people who sign the purchase orders for Europe&#8217;s US digital services. They are the dependency.</p><p>The scenario: Trump threatens to cut Europe off from all US digital services unless he gets the Greenland deal. Cloud collapses. Logistics freeze. Cards stop working. The audience, by the journalist&#8217;s account, went silent.[3]</p><p>Ten days later, Siemens CEO Roland Busch told the Financial Times that Europe&#8217;s push for AI sovereignty could be &#8220;a disaster.&#8221;[4] Siemens is investing &#8364;1 billion in AI tools and prioritizing that investment in the US and China.[5]</p><p>The Station F audience and the Siemens CEO are looking at the same dependency. They are drawing opposite conclusions. The audience sees the threat as disconnection; the CEO sees the dependency as manageable &#8212; use the best tools now and build European alternatives over time. The audience is wrong about the mechanism. The CEO underestimates the risk. A dependency that is commercially rational for both sides can still be strategically dangerous &#8212; and the coercion doesn&#8217;t require pulling the plug.</p><p><strong>Act 1: Build (and never build).</strong> The sovereignty conversation is framed as though Europe fell behind during the cloud era. 
The deficit is older than that. Much older.</p><p>No European microprocessor architecture at commercial scale.[6] No operating system built on one. No productivity suite built on that. No search engine. No smartphone platform. No social network. No cloud hyperscaler. No GPU or AI accelerator. One European enterprise software company at a global scale &#8212; SAP, founded in 1972, nothing comparable since.[7] <strong>The stack builds upward. Europe is absent at every layer.</strong></p><p>This is not a gap that opened in 2015 when AWS hit its stride. It is a fifty-year structural deficit in every software and platform layer above the physics. Europe builds excellent cars, trains, aircraft engines, turbines, nuclear reactors, satellites, and lithography machines. ASML makes the equipment without which no advanced chip on Earth gets fabricated.[8]</p><p>But Europe has never built the digital infrastructure that runs on top of the hardware. The reasons are structural &#8212; employment systems that penalize the job-hopping a software culture requires, compensation structures that undervalue engineering relative to finance and consulting, procurement cultures that optimize for buying over building, and capital markets that have never produced a venture ecosystem at US or Chinese scale.[9] The software layer &#8212; the layer that compounds, that scales at near-zero marginal cost, that creates platform lock-in &#8212; has been American for half a century, and increasingly Chinese in consumer platforms for the last fifteen years. Europe has been a customer at every layer, in every era.</p><p>European enterprises spend approximately &#8364;264 billion per year on US digital services, roughly 80% of their total digital expenditure.[10] That is not a gap that regulation can close. 
It is a market verdict, delivered annually, by European companies choosing American infrastructure with their own budgets, because nothing European competes at the same level.</p><p><strong>Act 2: Regulate (what you can&#8217;t build).</strong> The AI Act entered into force in August 2024.[11] The Cloud and AI Development Act (CAIDA) was supposed to follow &#8212; initially scheduled for March 25, 2026, then delayed to April 15, then again to May 27.[12] Layer on the DMA, DSA, and the expanding stack of cloud certification and data sovereignty requirements. The regulatory architecture is formidable, but it is not a substitute for capability.</p><p>The regulations were not all designed as sovereignty instruments: the DMA targets market dominance, GDPR targets privacy, and the AI Act targets risk. The cumulative effect is the same regardless of intent: friction applied to tools that European enterprises depend on, without competitive European alternatives to switch to.</p><p>The sovereignty advocates would argue that the &#8364;264 billion isn&#8217;t a free-market verdict &#8212; it&#8217;s a network effects trap, where European enterprises choose AWS because everyone else does, and migration costs are prohibitive. That&#8217;s a legitimate diagnosis. But the prescription &#8212; regulate first, build alternatives later &#8212; still fails the competitiveness test. Who pays the productivity cost while Europe waits?</p><p>A French manufacturer running SAP on Azure with Copilot competes against a Chinese manufacturer running the same stack without the regulatory friction and an American manufacturer running it natively. Force the French manufacturer onto OVHcloud, Nextcloud, and Mistral &#8212; even assuming all three work flawlessly, which they don&#8217;t at the same scale today &#8212; and you&#8217;ve imposed a productivity tax on a company that&#8217;s already behind. 
<strong>Sovereignty purchased at the cost of competitiveness is a subsidy for your competitors.</strong></p><p>Busch named this precisely. Europe&#8217;s approach is &#8220;miscalibrated,&#8221; he told the FT. The US is &#8220;a fast-flowing river.&#8221; Europe is &#8220;standing water.&#8221;[13] He is not arguing that sovereignty doesn&#8217;t matter. He is arguing that you don&#8217;t get there by throttling the tools your companies need today while waiting for European alternatives that don&#8217;t exist yet. Slowing European adoption of AI while the US and China accelerate widens the gap that the regulation claims to close. Both costs are real &#8212; the productivity cost of forced migration and the strategic cost of connected dependency &#8212; and no one has demonstrated a path that avoids both.</p><p>Last July, a group of Europe&#8217;s top CEOs wrote to Ursula von der Leyen, asking the Commission to pause obligations under the AI Act for high-risk AI and general-purpose AI (GPAI) models.[14] The CAIDA package has been delayed twice. The Commission is listening to industry on the timeline. But the direction of travel is unchanged: more rules, applied to someone else&#8217;s products. Regulation doesn&#8217;t make European tools better. It makes American tools harder to use. Those are not the same thing.</p><p><strong>Act 3: Scare (when regulation isn&#8217;t fast enough).</strong> Since January 2026, a wave of &#8220;Trump d&#233;connecte l&#8217;Europe&#8221; scenarios has swept French media &#8212; Le Point, Futura Sciences, France 24, L&#8217;Usine Digitale &#8212; all variations on the same premise.[15]</p><p>There is one problem with the premise: Trump has never threatened to disconnect Europe from US digital services. The scenario was manufactured by the French IT establishment. 
In April 2025, the Cigref &#8212; the association of IT directors at France&#8217;s largest companies and government ministries &#8212; commissioned a study from Nicolas Bouzou&#8217;s Aster&#232;s consultancy. The methodology: interviews with six CIOs, extrapolated to all of Europe. The result: the &#8364;264 billion dependency figure that now appears in every article on the subject.[21] Cigref&#8217;s own delegate general admitted the number &#8220;doesn&#8217;t appear in any official statistics.&#8221;[22] When Trump&#8217;s National Security Strategy (December 2025) used &#8220;civilizational erasure&#8221; language about Europe and the Greenland crisis escalated in January, the lobby had its hook. Within weeks, every major French outlet was running the same scenario, citing the same number, arriving at the same conclusion: buy European cloud. The beneficiaries are not hard to identify: OVHcloud, Scaleway, Clever Cloud, and the consulting firms that would manage the migration.</p><p>The format matches the premise. The Station F video is an AI-generated deepfake newscast &#8212; a format the EU&#8217;s own AI Act (Article 50) requires to be labeled as synthetic content.[20] The organizers described it as a parody. The audience&#8217;s reaction suggests they weren&#8217;t in on the joke. A deepfake video dramatizing a scenario no one has threatened, sourced to a number no official statistic confirms, presented to the 10,000 executives who would sign the procurement contracts. That is not an analysis. It is a sales pitch.</p><p>Which is a shame, because the real coercion cases the video cited deserve better framing. Thierry Breton was banned from entering the US after enforcing the DSA. French ICC judge Nicolas Guillou was cut off from Visa, Mastercard, Amazon, and Airbnb after authorizing arrest warrants for the Israeli prime minister.[16] Huawei lost access to Android and Google Play Services overnight due to an executive order. These are real. 
They are also the strongest argument against the scenario the video dramatizes, because every one of them was targeted, surgical, and connected. Not a continental blackout. You don&#8217;t cut the cord. You hold the scissors. The CLOUD Act&#8217;s compelled disclosure provision (18 U.S.C. &#167; 2713) works because data flows <em>through</em> US-controlled infrastructure.[17] Disconnection destroys the leverage. Connection preserves it.</p><p>The economics confirm it. That &#8364;264 billion flows both ways. No American president orders Amazon, Microsoft, and Google to vaporize a quarter of a trillion dollars in annual revenue. The math wouldn&#8217;t survive the lobbying response.</p><p><strong>The verdict.</strong> Build failed&#8212;not recently, but structurally, across 50 years and every platform layer. Regulate has legitimate objectives &#8212; safety, privacy, market fairness &#8212; but without competitive European alternatives, its cumulative effect is friction on the tools European companies depend on, taxing their competitiveness to protect an industry that doesn&#8217;t yet exist at an equivalent scale. Scare mobilizes by dramatizing a threat model that gets the mechanism exactly backwards.</p><p>Let&#8217;s be plain about what happened at Station F on March 17. A lobby group blamed Donald Trump for a threat he never made, projected a deepfake to an amphitheatre of 10,000 executives, and called it a wake-up call for European sovereignty. The &#8364;264 billion figure anchoring the entire evening was commissioned by the lobby itself, based on six interviews, and wasn&#8217;t confirmed by any official data source.[21][22] The executives in the room are the procurement decision-makers who would benefit from the migration that the event advocates. 
The format &#8212; an AI-generated fake newscast designed to provoke fear before being revealed as parody &#8212; is the definition of the disinformation the EU spent five years building the DSA and AI Act to combat.[20] Le Point sent its <em>r&#233;dacteur en chef Sciences et Tech</em>. He reported the fear. He listed the alternatives. He did not question the premise, the number, the format, or who benefits. He promised his readers he would write about Panoramax.[19]</p><p>The European Union has spent half a decade legislating against fake news, deepfakes, platform manipulation, and undisclosed lobbying. On March 17, at Station F, a fake news scenario was dramatized via deepfake, promoted by a registered lobby, amplified by uncritical press coverage, and presented to the audience most likely to act on it without scrutiny. <strong>This was not a failure of digital sovereignty. It was a post-truth evening organized in its name.</strong></p><p>The threat to European digital autonomy is real. I mapped the actual coercion mechanism &#8212; three switches, all wired, all permanent &#8212; in &#8220;Access, Disable, Destroy.&#8221;[18] But the threat isn&#8217;t that someone pulls the plug. It&#8217;s that they never need to, because the dependency is too comfortable and too profitable for both sides to break &#8212; and because the people responsible for fixing it would rather blame Trump than explain why fifty years of European industrial policy never produced a cloud platform.</p><div><hr></div><h3>Notes</h3><p>[1] Innovation Makers Alliance, event at Station F, Paris, March 17, 2026. <a href="https://www.ima-dt.org/en">ima-dt.org</a></p><p>[2] IMA is a loi 1901 non-profit consortium founded in 2015. 
Its membership includes the CIOs, CDOs, innovation directors, and digital transformation leads at approximately 165 organizations &#8212; three-quarters of the CAC 40 (L&#8217;Or&#233;al, BNP Paribas, Cr&#233;dit Agricole, AXA, among others), major French ministries, and mid-cap enterprises. As of January 2026, the consortium claims roughly 10,000 individual tech decision-makers. IMA is registered as a lobbying entity with the European Commission. Sources: <a href="https://www.ima-dt.org/en/a-propos-de-nous/qui-sommes-nous">IMA website</a> and <a href="https://www.lobbyfacts.eu/datacard/innovation-makers-alliance?rid=967703399761-30">EU Transparency Register</a>.</p><p>[3] Le Point, <a href="https://www.lepoint.fr/innovation/et-si-un-jourdonald-trump-decidait-de-deconnecter-leurope-5TJWWB22XJBL7HHVVLBIQ77NBQ/">&#8220;Et si un jour Donald Trump d&#233;cidait de d&#233;connecter l&#8217;Europe,&#8221;</a> March 2026.</p><p>[4] Roland Busch, CEO of Siemens, interview with the <a href="https://www.ft.com">Financial Times</a>, published March 25, 2026. Accessible summary at <a href="https://cybernews.com/ai-news/siemens-sovereign-ai/">Cybernews</a>.</p><p>[5] Siemens is investing &#8364;1 billion in AI tool development; Busch&#8217;s statements to the FT indicated the majority is directed at the US and China. For context, the European Commission <a href="https://digital-strategy.ec.europa.eu/en/policies/ai-package">allocated &#8364;1.3 billion</a> for AI, cybersecurity, and digital skills investment in spring 2025. The comparison is directional, not exact &#8212; Siemens&#8217;s figure covers AI tools specifically; the Commission&#8217;s covers AI plus cybersecurity plus digital skills across the full EU. The scale proximity remains striking. Note: Busch has a direct commercial interest in frictionless access to US AI infrastructure. 
Siemens is a major consumer of the tools the sovereignty agenda would constrain.</p><p>[6] ARM was designed in Cambridge but is owned by SoftBank (Japan) and fabbed by TSMC (Taiwan). It is not a European-controlled architecture in any operational sense. RISC-V, an open-source instruction set architecture, has European champions (<a href="https://sipearl.com/">SiPearl</a> for HPC; the <a href="https://riscv.org/">RISC-V International</a> association is incorporated in Switzerland), but no commercially dominant European RISC-V processor ships at scale as of early 2026. <a href="https://www.asml.com">ASML</a> makes lithography equipment &#8212; it does not design or fabricate chips.</p><p>[7] The inventory covers nine platform layers in the modern digital stack: processor architecture, operating system, search, productivity software, mobile platform, social network, enterprise SaaS, cloud infrastructure, AI accelerator. Europe is absent from eight of nine. <a href="https://www.sap.com">SAP</a> (founded 1972, Walldorf) is the sole European enterprise software company operating at global scale &#8212; and it is the exception that sharpens the question: why has nothing comparable followed in over fifty years? Europe is present at the physics and manufacturing layer (ASML, Airbus, automotive) &#8212; the layers below the platform stack, not within it.</p><p>[8] ASML holds a monopoly on extreme ultraviolet (EUV) lithography machines required for advanced chip fabrication below 7nm. Every advanced chip from TSMC, Samsung, and Intel depends on ASML equipment. This is a genuine and strategically significant European capability &#8212; at the hardware manufacturing layer, not the software platform layer. 
<a href="https://www.asml.com/en/technology/lithography-principles">asml.com</a></p><p>[9] The structural mechanisms behind Europe&#8217;s software gap are examined in detail in the <a href="https://www.airealist.ai">&#8220;Why Can&#8217;t It Build an LLM?&#8221; country series</a>, particularly the <a href="https://www.airealist.ai/p/japan-built-the-bullet-train-why-cant">Japan</a>, <a href="https://www.airealist.ai/p/south-korea-mastered-memory-chips-why">South Korea</a>, and <a href="https://www.airealist.ai/p/mistral-succeeded-frances-ai-strategy">France</a> entries. The pattern recurs across countries: employment systems that penalize mobility, compensation structures that channel talent away from software, procurement cultures that optimize for buying over building, and venture ecosystems that have never matched US or Chinese scale. The gap is structural, not incidental &#8212; it reflects system design, not bad luck.</p><p>[10] Cigref/Aster&#232;s, <a href="https://www.cigref.fr/la-dependance-technologique-aux-softwares-cloud-services-americains-une-estimation-des-consequences-economiques-en-europe">&#8220;Le co&#251;t de la non-souverainet&#233; num&#233;rique de l&#8217;Europe,&#8221;</a> 2024/2025. The 80% figure was cited by Gilles Babinet, co-president of the Conseil national du num&#233;rique, at the IMA event.</p><p>[11] <a href="https://eur-lex.europa.eu/eli/reg/2024/1689/oj">EU AI Act</a> (Regulation 2024/1689) entered into force August 1, 2024. Full application by August 2027, with staggered compliance deadlines. A group of major European CEOs requested a pause on high-risk and GPAI obligations in July 2025.</p><p>[12] The tech sovereignty package (Cloud and AI Development Act, Chips Act 2, open-source software strategy, AI-in-energy roadmap) was initially planned for March 25, 2026, postponed to April 15, then rescheduled to May 27. Reported by <a href="https://euractiv.com">Euractiv</a>. 
Note: the Act is referred to as both &#8220;CAIDA&#8221; and &#8220;CADA&#8221; in coverage; this piece follows the Euractiv convention. The broader regulatory stack also includes the EU Cybersecurity Certification Scheme for Cloud Services (EUCS) and France&#8217;s SecNumCloud certification &#8212; both adding compliance requirements for cloud providers operating in European public-sector and sensitive-data contexts.</p><p>[13] Busch, FT interview, March 25, 2026. <a href="https://cybernews.com/ai-news/siemens-sovereign-ai/">Cybernews summary</a>.</p><p>[14] Letter from European CEOs to European Commission President Ursula von der Leyen, July 2025. Reported by <a href="https://www.euronews.com/next/2025/07/03/europes-top-ceos-call-for-commission-to-slow-down-on-ai-act">Euronews</a>, July 3, 2025.</p><p>[15] The January&#8211;March 2026 wave includes coverage in <a href="https://www.lepoint.fr/innovation/et-si-un-jourdonald-trump-decidait-de-deconnecter-leurope-5TJWWB22XJBL7HHVVLBIQ77NBQ/">Le Point</a>, <a href="https://www.futura-sciences.com/tech/actualites/internet-trump-peut-il-vraiment-plonger-europe-chaos-numerique-scenario-inquiete-130600/">Futura Sciences</a>, <a href="https://www.france24.com/fr/%C3%A9co-tech/20260129-souverainet%C3%A9-num%C3%A9rique-1-3-menace-donald-trump-r%C3%A9veil-pays-europ%C3%A9ens">France 24</a>, <a href="https://www.usine-digitale.fr/souverainete/souverainete-numerique-leurope-confrontee-a-sa-dependance-technologique-aux-etats-unis-alors-que-son-economie-est-a-la-merci-de-washington.I763644BR5AUTC7JKGHG3BEYN4.html">L&#8217;Usine Digitale</a>, <a href="https://www.lalibre.be/economie/conjoncture/2026/01/23/la-carte-que-trump-pourrait-abattre-pour-faire-plier-leurope-ce-scenario-de-finance-fiction-qui-effraie-QVKQYJ6SQ5DMDKEOUBRNEOYVCY/">La Libre Belgique</a>, and numerous aggregators. 
The Cigref/Aster&#232;s dependency figures appear in nearly all of them.</p><p>[16] Thierry Breton was subject to a US visa ban on December 24, 2025, under Section 212(a)(3)(C) of the Immigration and Nationality Act, for his role in creating the DSA. Secretary of State Marco Rubio described the targets as &#8220;agents of the global censorship-industrial complex.&#8221; Widely reported (<a href="https://www.cnbc.com/2025/12/24/us-bans-visas-for-ex-eu-commissioner-over-alleged-censorship.html">CNBC</a>, <a href="https://www.cnn.com/2025/12/23/politics/sanctions-censorship-state-rubio">CNN</a>, <a href="https://www.npr.org/2025/12/24/nx-s1-5655855/trump-administration-bars-5-europeans-from-entry-to-the-u-s-over-alleged-censorship">NPR</a>, <a href="https://www.euronews.com/my-europe/2025/12/24/us-visa-ban-targets-former-eu-commissioner-breton-over-alleged-social-media-censorship">Euronews</a>). Nicolas Guillou, French judge on the ICC Pre-Trial Chamber, was sanctioned on August 20, 2025 via expansion of Executive Order 14203 for authorizing arrest warrants against Israeli Prime Minister Netanyahu. Guillou has publicly described being cut off from Visa, Mastercard, Amazon, Airbnb, PayPal, and Expedia &#8212; stating that sanctions are &#8220;like being sent back to the 1990s.&#8221; Source: <a href="https://www.euronews.com/my-europe/2026/02/18/us-sanctions-turn-international-criminal-court-judges-daily-life-into-a-nightmare">Euronews</a>, February 18, 2026; <a href="https://verfassungsblog.de/sanctions-us-icc-united-states/">Verfassungsblog legal analysis</a>, December 23, 2025. Both cases demonstrate targeted, surgical, connected coercion &#8212; not wholesale disconnection.</p><p>[17] The <a href="https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title18-section2713&amp;num=0&amp;edition=prelim">CLOUD Act&#8217;s compelled disclosure provision</a> (18 U.S.C. 
&#167; 2713) requires providers of electronic communication or remote computing services subject to US jurisdiction to produce data in their &#8220;possession, custody, or control&#8221; regardless of where that data is stored. The mechanism depends on connection, not disconnection. See <a href="https://www.airealist.ai/p/access-disable-destroy">&#8220;Access, Disable, Destroy&#8221;</a> for the full three-switch model.</p><p>[18] <a href="https://www.airealist.ai/p/access-disable-destroy">&#8220;Access, Disable, Destroy: The Three Switches No One Models,&#8221;</a> The AI Realist.</p><p>[19] <a href="https://www.lepoint.fr/innovation/et-si-un-jourdonald-trump-decidait-de-deconnecter-leurope-5TJWWB22XJBL7HHVVLBIQ77NBQ/">Le Point</a>, March 2026. The closing paragraph reads, in full: &#8220;J&#8217;ai trouv&#233; ce d&#233;bat, en plus d&#8217;offrir une opportunit&#233; pour l&#8217;&#233;conomie europ&#233;enne, passionnant. Et, j&#8217;essaierai au fur et &#224; mesure de futurs articles, de vous faire conna&#238;tre des solutions fran&#231;aises ou europ&#233;ennes int&#233;ressantes, un peu comme Panoramax, qui sont encore trop peu connues et qui, peut-&#234;tre, vous int&#233;resseront?&#8221; (&#8220;I found this debate, in addition to offering an opportunity for the European economy, fascinating. And, over the course of future articles, I will try to introduce you to interesting French or European solutions, a bit like Panoramax, which are still too little known and which, perhaps, will interest you?&#8221;) This is the science and technology editor-in-chief of a major French newsweekly.</p><p>[20] <a href="https://eur-lex.europa.eu/eli/reg/2024/1689/oj">EU AI Act</a>, Article 50(4): deployers of AI systems that generate or manipulate content constituting a deep fake shall disclose that the content has been artificially generated or manipulated. An exemption exists for content that is &#8220;part of an evidently artistic, creative, satirical, fictional or analogous work or programme&#8221; &#8212; but &#8220;evidently&#8221; is the operative word. A deepfake newscast designed to produce a fear response before being revealed as synthetic is not evidently fictional at the moment of viewing. 
The Le Point journalist described the audience reaction as an &#8220;&#233;lectrochoc&#8221; &#8212; which suggests the fictional framing was not evident to the audience when the video played. Whether the IMA event&#8217;s use of the video complied with Article 50 is a question no outlet covering the event has raised.</p><p>[21] Aster&#232;s, <a href="https://www.cigref.fr/la-dependance-technologique-aux-softwares-cloud-services-americains-une-estimation-des-consequences-economiques-en-europe">&#8220;La d&#233;pendance technologique aux softwares &amp; cloud services am&#233;ricains: une estimation des cons&#233;quences &#233;conomiques en Europe,&#8221;</a> commissioned by Cigref, presented April 25, 2025, at the Rencontres Num&#233;rique de Strasbourg. Methodology: interviews with six CIOs from large French enterprises, extrapolated to the EU on the basis that cloud-software spending is proportional to GDP. The &#8364;264 billion figure covers BtoB software and cloud services purchased from US providers. The study was subsequently cited in the November 2025 Franco-German Digital Sovereignty Summit joint declaration and in virtually every French-language article on the &#8220;Trump disconnects Europe&#8221; scenario in January&#8211;March 2026. The IMA event at Station F (March 17, 2026) took place on the same day the CISPE letter to the Commission was filed. 
The beneficiaries of a sovereignty-driven procurement shift &#8212; European cloud providers (OVHcloud, Scaleway, Clever Cloud) and consulting firms managing migration &#8212; are well represented in the organizations commissioning and amplifying the research.</p><p>[22] Henri d&#8217;Agrain, delegate general of the Cigref, <a href="https://www.itforbusiness.fr/le-cigref-fait-les-comptes-de-la-dependance-europeenne-aux-clouds-americains-92138">acknowledged</a> that the &#8364;264 billion figure &#8220;n&#8217;apparaissent dans aucune statistique officielle&#8221; (does not appear in any official statistics), attributing the opacity to profit repatriation structures routed through Ireland. The admission is significant: the single number anchoring the entire disconnection campaign is an estimate from a consultancy commissioned by the lobby it serves, based on six interviews, and confirmed by no official data source.</p>]]></content:encoded></item><item><title><![CDATA[Acquired, Absorbed, Disaggregated]]></title><description><![CDATA[Amazon spent a decade buying Annapurna Labs. Two years digesting it. 
Now the talent is following the architecture out the door.]]></description><link>https://www.airealist.ai/p/acquired-absorbed-diaggregated</link><guid isPermaLink="false">https://www.airealist.ai/p/acquired-absorbed-diaggregated</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Thu, 26 Mar 2026 22:43:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TGNx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TGNx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TGNx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TGNx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/712e0662-7036-416b-a480-0363423f6a79_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6091987,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192257296?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TGNx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!TGNx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712e0662-7036-416b-a480-0363423f6a79_2816x1584.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>(Updated on March 27th based on new public information).</em></p><p>Gadi Hutt, director of product and customer engineering at Amazon&#8217;s Annapurna Labs, has left the company. [1] He is the second Annapurna leader to depart in the past seven months. In August, Rami Sinno &#8212; director of engineering &#8212; left to join Arm Holdings. [2] Arm launched its first production CPU this week, explicitly positioned for &#8220;agentic AI infrastructure,&#8221; with Cerebras as a launch partner. 
[3] The engineering director of the chip AWS built left the company for the one building the CPU layer of the architecture that replaced it.</p><p>Between those two departures, three things happened. AWS announced a partnership with Cerebras to disaggregate inference, splitting prefill and decode across two vendors&#8217; chips. [4] Peter DeSantis was elevated to lead a unified org spanning AI models, custom silicon, and quantum. [5] And Nvidia launched its own disaggregated inference rack, built around a non-GPU chip. [6] The full-stack Trainium story that Gadi and Rami spent years building lost its architecture, its org structure, and its two most visible leaders.</p><p>I worked with Gadi and his team across three companies. At AWS, where I spent six years. At Hugging Face, where Inferentia and Trainium adoption were part of the ecosystem play I led as Chief Evangelist. And at Arcee AI, where making custom silicon work in production was a practical, daily question. Gadi ran engineering and solutions architecture teams &#8212; responsible for making Annapurna&#8217;s chips work for customers, not just pitching them. The story he carried held up for a long time: Trainium is a full-stack AI chip that competes with Nvidia on training and inference. One product. One pitch.</p><p><strong>The pivot nobody is naming.</strong> The AWS-Cerebras deal is either smart engineering or a structural concession &#8212; and the org chart tells you which one Amazon thinks it is. [7] Trainium handles prefill, the compute-bound phase. Cerebras&#8217;s WSE-3 handles decode, the memory-bandwidth-bound phase where the model generates tokens sequentially. [8] David Brown, VP of Compute &amp; ML Services, is the named spokesperson for the new architecture &#8212; not anyone from Annapurna&#8217;s product organization. [9] When TechCrunch toured the Trainium lab in Austin in March, the guides were Kristopher King and Mark Carroll, engineering leadership. [10] Gadi was already gone. 
Rami was already at Arm.</p><p><strong>The org shift behind the architecture shift.</strong> Annapurna was not originally structured as a service team. It operated as an R&amp;D center &#8212; designing chips and delivering them to AWS service teams, with its own leadership, customer relationships, and its own external voice. That changed roughly two years ago, when Brown&#8217;s compute and ML services organization absorbed the product and go-to-market layer that sat between Annapurna&#8217;s engineering and external customers. Then, in December 2025, DeSantis was elevated to lead the unified org, reporting directly to Jassy. [5] Each step moved Annapurna closer to becoming an internal component supplier rather than an independent product shop. By the time the Cerebras deal was announced, the organizational structure that had given Gadi his role &#8212; an Annapurna that spoke for itself &#8212; no longer existed. Amazon spent a decade acquiring Annapurna. It spent the last two years digesting it.</p><p><strong>What the deal reveals.</strong> Gadi told Time last year that &#8220;Stargate is easy to announce &#8212; let&#8217;s see it implemented first.&#8221; [11] That confidence reflected the old story: AWS builds the chips, builds the servers, builds the datacenter, runs the whole stack. The Cerebras deal breaks it &#8212; the inference pipeline now runs on someone else&#8217;s silicon for its most demanding phase.</p><p>There are 1.4 million Trainium chips deployed across three generations; Anthropic&#8217;s Claude runs on over one million of them. [12] OpenAI has committed to two gigawatts of Trainium capacity &#8212; a commitment, not yet a deployment. [13] Trainium succeeded at training and at prefill. This is not a failure of the chip.</p><p>But AWS had made a consequential bet along the way: it discontinued Inferentia entirely. [14] The rationale was sound &#8212; Trainium1 was actually better at inference than Inferentia2. 
Inf2 was designed as a lower-cost chip built to crush inference costs: a slower but more cost-effective alternative to GPUs. When your training chip outperforms your dedicated inference chip at inference, you consolidate. AWS did.</p><p>Then the market changed beneath the consolidated architecture. Agentic AI made inference the dominant workload &#8212; generating 15x more tokens per query than conversational chat [15] &#8212; and decode became the binding constraint on cost and latency. The Reasoning Tax breaks the monolithic chip by concentrating costs on the phase the chip handles worst. [16] Trainium could win inference when it meant prefill. It could not when inference meant decode at reasoning scale. Whether killing Inferentia was yet another AWS miscalculation or the deliberate first step toward the disaggregated architecture AWS eventually built with Cerebras, the result is the same: the full-stack chip story ended.</p><p><strong>The industry confirmed it.</strong> Nvidia shipped Dynamo, the open-source framework for orchestrating disaggregated prefill and decode, which all four major hyperscalers are adopting. [17] Then at GTC 2026, Nvidia launched the Groq 3 LPX &#8212; its first rack built around a non-GPU chip. [6] Rubin GPUs handle prefill. Groq&#8217;s SRAM-based LPUs handle decode. Same split, different partners. When Nvidia and AWS reach the same architectural conclusion in the same month &#8212; one through a reported $20 billion Groq licensing deal, the other through Cerebras &#8212; that is not two companies making independent bets. That is an industry settling a technical argument. For reasoning-heavy inference at scale, the &#8220;one chip does everything&#8221; era ended in March 2026.</p><p>For all my frustration and cursing at the Neuron SDK, I have a lot of respect for what Gadi and the Annapurna team built. 
The Inferentia-to-Trainium arc is the most ambitious custom silicon program any cloud provider has shipped, and the adoption numbers vindicate the engineering. [18] The departure is not a verdict on the person. It is a verdict on the narrative.</p><p>The talent is telling you where the architecture went. Rami Sinno is at Arm, building the CPU for agentic inference. Gadi&#8217;s next move will complete the signal. The people who built the full-stack chip story are the clearest evidence that it is over &#8212; not just at AWS, but everywhere.</p><div><hr></div><h3>Notes</h3><p>[1] The Information, <a href="https://www.theinformation.com/briefings/amazon-ai-chip-product-leader-departs">&#8220;Amazon AI Chip Product Leader Departs,&#8221;</a> March 26, 2026. Hutt&#8217;s title was Director of Product and Customer Engineering at Annapurna Labs. He ran engineering and solutions architecture teams &#8212; author&#8217;s direct knowledge from working with Annapurna across AWS, Hugging Face, and Arcee AI.</p><p>[2] The Information [1] notes Hutt is &#8220;the second Annapurna leader to depart in the past seven months after Rami Sinno left to join Arm Holdings in August.&#8221; Sinno was director of engineering at Annapurna Labs &#8212; the same role visible in the <a href="https://fortune.com/2025/04/01/amazon-annapurna-labs-chips-ai-anthropic-investment/">Fortune</a> (April 2025) and <a href="https://techcrunch.com/2026/03/22/an-exclusive-tour-of-amazons-trainium-lab-the-chip-thats-won-over-anthropic-openai-even-apple/">TechCrunch</a> (March 2026) lab tours.</p><p>[3] Arm Holdings, <a href="https://newsroom.arm.com/news/arm-agi-cpu-launch">&#8220;Arm Expands Compute Platform to Silicon Products,&#8221;</a> March 24, 2026. First production silicon in Arm&#8217;s 35-year history. 136 Neoverse V3 cores, TSMC 3nm, positioned for &#8220;agentic AI infrastructure.&#8221; Meta is lead co-developer; OpenAI, Cerebras, Cloudflare among launch partners. 
See also CNBC exclusive, <a href="https://www.cnbc.com/2026/03/24/arm-launches-its-own-cpu-with-meta-as-first-customer.html">&#8220;Arm Launches Its Own CPU, with Meta as First Customer,&#8221;</a> March 24, 2026.</p><p>[4] AWS press release, <a href="https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud">&#8220;AWS and Cerebras Collaboration Aims to Set a New Standard for AI Inference Speed and Performance in the Cloud,&#8221;</a> March 13, 2026.</p><p>[5] Andy Jassy, <a href="https://www.aboutamazon.com/news/company-news/andy-jassy-peter-desantis-amazon-leadership-update">&#8220;Amazon Leadership Update,&#8221;</a> aboutamazon.com, December 17, 2025. &#8220;I&#8217;ve asked Peter DeSantis to lead a new organization that drives our most expansive AI models (e.g. Nova&#8212;and the team we&#8217;ve called &#8216;AGI&#8217;), silicon development (e.g. Graviton, Trainium, Nitro), and quantum computing.&#8221; Jassy confirms DeSantis &#8220;spearheaded the acquisition of Annapurna Labs&#8221; in 2015 and &#8220;continues to manage that team.&#8221; The organizational absorption of Annapurna&#8217;s product and go-to-market layer into Brown&#8217;s compute org occurred approximately two years earlier &#8212; author&#8217;s direct knowledge from the customer and partner side, working with Annapurna across AWS, Hugging Face, and Arcee AI.</p><p>[6] Nvidia, Groq 3 LPX announced at GTC 2026, March 17, 2026. Rubin CPX GPU racks handle prefill; LPX handles decode. Based on a reported $20 billion licensing agreement with Groq &#8212; figure not confirmed by Nvidia in filings; if accurate, it would be material enough to require disclosure. Coverage: <a href="https://winbuzzer.com/2026/03/17/nvidia-groq-3-lpx-non-gpu-inference-rack-gtc-2026-xcxwbn/">WinBuzzer</a>, March 17, 2026. 
Nvidia&#8217;s own GTC materials confirm the product and disaggregated architecture.</p><p>[7] Brown, quoted in [4]: the disaggregated architecture means &#8220;each system does what it&#8217;s best at.&#8221; That is legitimate engineering. It is also a departure from the Trainium-does-everything positioning that defined Annapurna&#8217;s external narrative for years. Both readings are correct; the org changes determine which one is operative.</p><p>[8] Cerebras, <a href="https://www.cerebras.ai/blog/cerebras-is-coming-to-aws">&#8220;Cerebras Is Coming to AWS,&#8221;</a> March 13, 2026. In disaggregated mode, Trainium handles prefill (computing the KV cache), sent to the WSE via EFA. The WSE exclusively performs decode. The WSE-3 houses 44 GB of on-chip SRAM &#8212; no HBM &#8212; eliminating the memory-bandwidth bottleneck that constrains conventional GPU decode.</p><p>[9] Brown was the named AWS spokesperson in the March 13 joint announcement [4], the <a href="https://siliconinnovationday.splashthat.com/">AWS Silicon Innovation Day keynote</a> (2023), and the <a href="https://www.aboutamazon.com/news/aws/what-is-aws-reinvent">Peter DeSantis/Dave Brown infrastructure keynote</a> at re:Invent 2025. His title evolved from VP Amazon EC2 to VP Compute &amp; ML Services as the scope expanded. Separately, Gadi Hutt ran engineering and solutions architecture teams at Annapurna and was the external face of the chips across <a href="https://futurumgroup.com/insights/the-six-five-on-the-road-with-gadi-hutt-of-annapurna-labs-at-aws-reinvent-2022/">re:Invent 2022</a>, <a href="https://time.com/7273288/amazon-anthropic-openai-microsoft-stargate-datacenters/">Time</a> (April 2025), and <a href="https://fortune.com/2025/04/01/amazon-annapurna-labs-chips-ai-anthropic-investment/">Fortune</a> (April 2025). 
His departure removes both the engineering bridge and the customer-facing narrative from Annapurna&#8217;s product layer.</p><p>[10] TechCrunch, <a href="https://techcrunch.com/2026/03/22/an-exclusive-tour-of-amazons-trainium-lab-the-chip-thats-won-over-anthropic-openai-even-apple/">&#8220;An Exclusive Tour of Amazon&#8217;s Trainium Lab,&#8221;</a> March 22, 2026. Tour led by Kristopher King (lab director) and Mark Carroll (director of engineering).</p><p>[11] Gadi Hutt, quoted in Time, <a href="https://time.com/7273288/amazon-anthropic-openai-microsoft-stargate-datacenters/">&#8220;Inside Amazon&#8217;s Race to Build the AI Industry&#8217;s Biggest Datacenters,&#8221;</a> April 2, 2025.</p><p>[12] TechCrunch, <a href="https://techcrunch.com/2026/03/22/an-exclusive-tour-of-amazons-trainium-lab-the-chip-thats-won-over-anthropic-openai-even-apple/">March 22, 2026</a>. Company-reported figure: &#8220;1.4 million Trainium chips deployed across all three generations, and Anthropic&#8217;s Claude runs on over one million of the Trainium2 chips deployed.&#8221;</p><p>[13] AWS-Cerebras joint press release [4], March 13, 2026. &#8220;OpenAI will consume 2 gigawatts of Trainium capacity through AWS infrastructure.&#8221; This is a commitment, not a deployment. See also Jassy&#8217;s <a href="https://www.aboutamazon.com/news/aws/openai-amazon-partnership-explained">CNBC interview</a> on the OpenAI-Trainium relationship.</p><p>[14] Next Platform, <a href="https://www.nextplatform.com/2025/12/03/with-trainium4-aws-will-crank-up-everything-but-the-clocks/">&#8220;With Trainium4, AWS Will Crank Up Everything But The Clocks,&#8221;</a> December 3, 2025. &#8220;With Trainium2...AWS moved on to the NeuronCore-v3 architecture and stopped making Inferentia chips because inference started becoming more like training.&#8221; There is no Inferentia3. Practitioner context: Trainium1 was already outperforming Inferentia2 at inference. 
Inf2 was designed as a lower-cost, lower-performance chip optimized to reduce inference costs &#8212; a slower but cheaper alternative to GPUs, not a faster one. Consolidating around Trainium was the rational engineering decision given the performance gap. The question the Cerebras deal answers is which <em>kind</em> of inference Trainium wins: prefill (yes), decode at reasoning scale (no). Author&#8217;s direct knowledge.</p><p>[15] Cerebras [8], March 13, 2026. &#8220;Agentic coding generates approximately 15x more tokens per query.&#8221; Vendor-published figure; directionally consistent with <a href="https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/">Nvidia Dynamo documentation</a> describing the same workload shift.</p><p>[16] Julien Simon, <a href="https://www.airealist.ai/p/aws-built-its-own-ai-chip-now-it">&#8220;AWS Built Its Own AI Chip. Now It Needs Someone Else&#8217;s,&#8221;</a> The AI Realist, March 15, 2026. Introduces the Reasoning Tax framework, the Platform Absorption Test, and the three-ecosystem convergence (AWS-Cerebras, Nvidia-Groq, Huawei Ascend 950). The present note is a personnel coda to that structural analysis.</p><p>[17] Nvidia, <a href="https://investor.nvidia.com/news/press-release-details/2026/NVIDIA-Enters-Production-With-Dynamo-the-Broadly-Adopted-Inference-Operating-System-for-AI-Factories/default.aspx">&#8220;NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories,&#8221;</a> investor relations press release, March 16, 2026. Dynamo 1.0 is open-source, production-grade, integrated by AWS, Microsoft Azure, Google Cloud, and OCI. 
See also <a href="https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/">Nvidia developer blog</a> and <a href="https://www.nvidia.com/en-us/glossary/disaggregated-serving/">glossary entry on disaggregated serving</a>.</p><p>[18] TechCrunch [10], March 22, 2026. Apple&#8217;s director of AI publicly described Apple&#8217;s use of Graviton, Inferentia, and Trainium at an AWS event. Anthropic and OpenAI commitments sourced in [12] and [13].</p>]]></content:encoded></item><item><title><![CDATA[The King's New Datacenters]]></title><description><![CDATA[&#8364;109 billion in AI pledges. Only &#8364;25 billion with accountability. The rest is theater.]]></description><link>https://www.airealist.ai/p/the-kings-new-datacenters</link><guid isPermaLink="false">https://www.airealist.ai/p/the-kings-new-datacenters</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Wed, 25 Mar 2026 17:02:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VNSd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VNSd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VNSd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!VNSd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!VNSd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!VNSd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VNSd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1573012,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192110008?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VNSd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 
424w, https://substackcdn.com/image/fetch/$s_!VNSd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!VNSd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!VNSd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87f05219-cf62-46d3-b3b3-0e7e9c1ca0c2_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Updated on April 2nd: Sesterce&#8217;s &#8220;redressement judiciaire&#8221; has been overturned.</em></p><p>On February 11, 2025, at Station F in Paris, Emmanuel Macron announced that France had secured &#8364;109 billion in private investment for artificial intelligence. He compared the figure favorably to America&#8217;s $500 billion Stargate project. The audience applauded. The press reported the number. The &#201;lys&#233;e published the breakdown.[1]</p><p>One year later, at the AI Impact Summit in New Delhi, Macron was still performing. The &#8364;109 billion had become &#8220;we are delivering this project &#8212; &#8364;58 billion in 2025.&#8221; The figure corresponds to a UNCTAD report that counted <em>announced</em> foreign greenfield projects, and the overwhelming majority of the total consisted of two entries: MGX and Brookfield, the same pledges from the Paris summit, reclassified as FDI announcements. The same money, counted three times: once as a summit pledge, once as a UNCTAD greenfield announcement, once as &#8220;delivery&#8221; in Delhi.[2]</p><p>In March 2026, the masks are coming off. One pledge left for America. One entered judicial restructuring. One is a holding company with seven employees working on a public inquiry that hasn&#8217;t started. And France&#8217;s sovereign AI champion &#8212; the company the state claimed as proof that the model works &#8212; trains its models on American cloud infrastructure, co-develops base models on Nvidia&#8217;s DGX Cloud, and just invested &#8364;1.2 billion in Sweden for its first data center outside France. The summit didn&#8217;t produce an AI strategy. It produced a ledger of failing aspirations that it&#8217;s time to audit, line by line.</p><h2>The Pledge Credibility Test</h2><p>Before the cold accounting, a framework. 
Every infrastructure commitment made at a government summit can be graded on a five-tier scale, from aspiration to operational reality. The tiers: <strong>MOU</strong> (a memorandum of understanding, committing nothing), <strong>LOI</strong> (a letter of intent, slightly firmer), <strong>PPA or binding contract</strong> (a legal obligation conditional on construction), <strong>construction underway</strong> (capital deployed, permits issued, ground broken), and <strong>operational</strong> (power flowing, GPUs running, customers paying). The summit&#8217;s &#8364;109 billion was reported as though it occupied the top two tiers. Almost none of it did.[3]</p><p>The test also requires examining who made the pledge. A company with an existing balance sheet, operating assets, and debt covenants has structural accountability &#8212; miss the target and the bondholders notice. A startup with $4.5 million in disclosed equity and a &#8364;10 billion pledge has no accountability mechanism at all. 
The summit gave them equal billing because it needed a large, self-serving number.[4]</p><p>Here is the &#8364;109 billion, thirteen months later.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xnBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xnBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 424w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 848w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 1272w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xnBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:169073,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192110008?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xnBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 424w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 848w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 1272w, https://substackcdn.com/image/fetch/$s_!xnBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32a0193f-92ad-446d-9c6b-a01f3c147f0d_1979x1301.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The departed: FluidStack (&#8364;10 billion)</h2><p>On February 10, 2025, Macron stood beside FluidStack co-founder C&#233;sar Maklary as the company signed an MOU to build a one-gigawatt AI supercomputer in France. &#8220;This &#8364;10 billion agreement with FluidStack embodies my ambition,&#8221; Macron said.[5]</p><p>FluidStack had disclosed raising $4.5 million in equity. 
The pledge was 2,222 times its disclosed capital.[6]</p><p>On March 18, 2026, Bloomberg reported that FluidStack had completely withdrawn, pivoting to the United States after signing a $50 billion partnership with Anthropic to operate data centers in New York and Texas.[7] FluidStack also exited the Eclairion site south of Paris, where it was building an 18,000-GPU cluster for Mistral AI, a real facility with a multi-year contract, physical hardware, and a named marquee customer. Local authorities in Bosquel have reopened the application process. No replacement has been found. The company is relocating its headquarters from London to New York.[8]</p><p>What killed the Eclairion cluster is instructive. In June 2025, Mistral announced Mistral Compute, a direct partnership with Nvidia that replaced FluidStack as the commercial layer. Same d&#233;partement, same power scale, same GPU count. FluidStack was the intermediary between Nvidia hardware and Mistral&#8217;s training workloads. Once Mistral and Nvidia connected directly, the intermediary had no purpose.[9] The summit&#8217;s model was that foreign operators would build France&#8217;s sovereign infrastructure. The flagship French AI company replaced the flagship foreign operator with a direct chipmaker deal, and the summit&#8217;s model was disproved by the company it was supposed to serve.</p><h2>The catalogue price: Sesterce (&#8364;52 billion announced, &#8364;400 million recorded)</h2><p>Sesterce deserves its own section because it reveals the summit&#8217;s vetting standards &#8212; or their absence. The company, a sixty-person Marseille GPU cloud provider that pivoted from cryptocurrency mining, announced &#8364;52 billion in AI investment at the summit. The number was calculated, as L&#8217;Usine Digitale reported, by multiplying the catalogue price of high-end Nvidia GPUs by 1.2 million units and adding in infrastructure costs. 
The CEO acknowledged that the company &#8220;obviously does not have &#8364;52 billion in equity.&#8221; The &#201;lys&#233;e quietly recorded only the first tranche at &#8364;400 million. The Journal des Entreprises noted the &#8364;52 billion claim exceeded total US private AI investment in 2023.[10]</p><p>On February 5, 2026 &#8212; five days before the one-year anniversary of its summit announcement &#8212; Sesterce Group entered judicial restructuring in the Marseille commercial court. The judgment was overturned on March 5, and the company says it has continued operating normally.[11]</p><p>The summit&#8217;s headline machinery could not distinguish between a commitment and a catalogue price multiplied by a wish. The &#8364;52 billion entered the discourse. The &#8364;400 million entered the &#201;lys&#233;e ledger. The <em>redressement judiciaire</em> entered the commercial court. Each number lives in its own universe.</p><h2>The holding company: MGX (&#8364;30&#8211;50 billion)</h2><p>MGX is the largest single entry in the &#8364;109 billion. It is the most instructive, because unlike FluidStack or Sesterce, the MGX commitment progressed &#8212; on paper.</p><p>I searched French corporate records. Campus AI SAS was registered on April 7, 2025, SIREN 943 352 161. All three directors are senior MGX executives &#8212; the CIO for Semiconductors &amp; Infrastructure, a Managing Director, and the COO/CFO. No Bpifrance, Mistral, or Nvidia personnel hold governance roles. BPI claims undisclosed &#8220;blocking rights&#8221; but invoked business secrecy when asked about the terms. The company has seven employees as of February 2026.[12]</p><p>The site is Fouju, in Seine-et-Marne: roughly seventy hectares of currently cultivated farmland. A public consultation ran in October&#8211;November 2025 and drew roughly 100 attendees to the opening session. 
RTE signed France&#8217;s first &#8220;Fast Track&#8221; grid connection contract in January 2026 &#8212; 240MW by the end of 2027, 700MW by 2029, designed for 1,400MW. This is a genuine and unprecedented reform achievement. The Fast Track procedure, approved by the energy regulator in spring 2025, requires a developer's upfront financial commitment to prevent phantom capacity reservations. That commitment is real.[13]</p><p>Everything else remains pre-construction. Building permits have been filed but not issued. The public inquiry has not opened. No construction contract has been awarded. No GPU orders have been disclosed. And here is the fact that reframes the entire &#8364;30&#8211;50 billion line: <strong>Campus IA does not build datacenters</strong>. It is a site developer seeking operators. As of February 2026, no data center operator had signed. Mistral &#8212; a founding shareholder &#8212; has not committed as a commercial tenant. In the project coordinator&#8217;s words, Mistral is &#8220;a preferred client lead&#8230; but today, nothing has been signed with it.&#8221; No external tenant has signed either.[14]</p><p>Meanwhile, MGX&#8217;s global portfolio includes co-investment in OpenAI&#8217;s Stargate project, co-lead investment in Anthropic, and participation in BlackRock&#8217;s $30 billion AI infrastructure fund. MGX targets approximately $10 billion in annual global deployment.[15] France&#8217;s &#8364;30&#8211;50 billion competes for that budget with every other commitment on MGX&#8217;s ledger. Stargate Abilene has two operational buildings. Stargate UAE has 5,000 construction workers on site. Campus IA at Fouju has seven employees and a large patch of French farmland.</p><h2>The rest of the ledger</h2><p>The remaining &#8364;15&#8211;20 billion falls into three categories, none of which changes the structural picture.</p><p><strong>First, established operators continue pre-existing trajectories</strong>. 
Digital Realty &#8212; France&#8217;s largest datacenter operator with 13 facilities across Paris and Marseille, several still under construction &#8212; pledged &#8364;5 billion at the AI summit; the confirmed tranche at Choose France three months later was &#8364;2.3 billion across two specific projects; the remaining &#8364;2.7 billion is still an unallocated commitment. The PAR11 datacenter opened on schedule in Q3 2025; MRS6 began its public inquiry in October 2025, with the building permit granted in January 2026. Equinix committed &#8364;630 million at the summit for new facilities in Paris and Bordeaux. Telehouse (&#8364;400 million), a KDDI subsidiary, is expanding incrementally in Paris. This is real infrastructure that these operators were building before the summit and will continue to build after it, regardless of who dines at the &#201;lys&#233;e.[16]</p><p><strong>Second, global financial investors are making thematic bets that happen to include France</strong>. Apollo&#8217;s $5 billion was a &#8220;funding initiative for AI energy projects&#8221; &#8212; global, not France-specific. Since the summit, Apollo&#8217;s actual deployments have been American: $3.5 billion in financing to xAI for compute through Valor, a majority acquisition of Stream Data Centers in Dallas, and $35 billion in financing discussions with Meta. No France-specific project has been announced. Amazon pledged &#8364;6 billion at the Paris summit; at Choose France in May, the confirmed figure was &#8364;300 million &#8212; the majority of which was logistics infrastructure, including a distribution center in Eure-et-Loir. AWS simultaneously committed &#8364;33.7 billion to Spain. France&#8217;s share of AWS&#8217;s $200 billion 2026 global capex is a rounding error.[17]</p><p><strong>Third, ambitious startups with familiar leverage ratios</strong>.
Evroc, a Swedish sovereign cloud company backed by EQT Ventures, pledged &#8364;4 billion &#8212; the &#8220;at full capacity&#8221; cost of a 96MW facility in Mougins, near Sophia Antipolis. Evroc had raised &#8364;50.6 million in Series A funding. The pledge-to-equity ratio is 79 to one &#8212; better than FluidStack&#8217;s 2,222 to one, but the same structural pattern. The Stockholm flagship (10,000 GPUs) targets H2 2026. Construction in Mougins was &#8220;expected to be completed in 2025&#8221;; no completion has been confirmed. Prologis, the world&#8217;s largest logistics REIT, pledged &#8364;3.5 billion at the AI summit, increased to &#8364;6.4 billion at Choose France &#8212; four datacenter sites in &#206;le-de-France, 584MW total, &#8220;full commissioning planned for 2035.&#8221; It has a market cap of roughly $125 billion and a genuine corporate strategy to convert logistics properties into data centers. But this is a decade-long bet, and the company has not previously built a data center in France. Eclairion &#8212; FluidStack&#8217;s former partner &#8212; raised &#8364;50 million from Tikehau Capital after FluidStack&#8217;s departure and may host Mistral directly, but remains in early stages.[18]</p><p>None of these lines is fraudulent. Several are credible long-term bets. But they share a characteristic: the summit claimed them as evidence of France&#8217;s AI strategy, while the companies&#8217; actual decisions &#8212; where to build first, how much to allocate, which country to prioritize &#8212; were made on commercial logic that has nothing to do with a presidential ceremony.</p><h2>The sovereign champion&#8217;s stack</h2><p>Mistral AI is not a pledge: it is a company, and a successful one. $400 million in annualized revenue. &#8364;11.7 billion valuation. Three technical founders who trained at DeepMind and Meta and came back to build. 
More power to them.[20]</p><p>But Mistral is the thread that runs through every failure on this ledger, because the state claimed Mistral as the sovereign champion &#8212; <strong>not a claim Mistral itself makes with any consistency</strong> &#8212; and Mistral&#8217;s actual compute trajectory is a map of every gap the sovereign infrastructure was supposed to fill.</p><p>The trajectory, in sequence. Mistral trained its frontier models on Nvidia GPUs &#8212; hosted across Microsoft Azure, Scaleway&#8217;s French cluster, and later its own Mistral Compute infrastructure &#8212; because in 2023&#8211;2024, no sovereign alternative existed at frontier scale.[21] Then came the promises: FluidStack would build an 18,000-GPU cluster for Mistral at Eclairion. MGX would build a 1.4GW campus where Mistral would be &#8220;a preferred client lead.&#8221; The state promised sovereign compute, and Mistral waited. FluidStack left. The campus is farmland. The sovereign compute never arrived.</p><p>So Mistral did what any rational company would do. In June 2025, it announced Mistral Compute &#8212; &#8220;a premier NVIDIA partner&#8221; offering &#8220;the latest NVIDIA reference architectures.&#8221;[22] <strong>This is Nvidia&#8217;s Cloud Partner program in everything but the acronym</strong>. Nvidia designs the stack. Nvidia provides the GPUs. Nvidia&#8217;s NIM microservices run the inference. Nvidia&#8217;s NeMo framework handles fine-tuning. Mistral operates it. In March 2026, Mistral joined the Nvidia Nemotron Coalition as a founding member, co-developing a base model trained on DGX Cloud.[23] The base model that European AI companies will fine-tune from is being trained on American cloud infrastructure, co-developed with an American chipmaker.</p><p>Mistral is a founding shareholder of Campus IA at Fouju. But its first major infrastructure investment was not there. 
It was &#8364;1.2 billion in Borl&#228;nge, Sweden &#8212; Nvidia Vera Rubin GPUs at an EcoDataCenter facility, targeting 2027.[24] Sweden has cheaper energy, faster permitting, and no Fouju-style public inquiry. The decision is rational. It is also the final proof that the summit&#8217;s sovereign infrastructure model failed on its own terms: the sovereign champion builds in Sweden, trains on DGX Cloud, distributes through Azure and AWS, and partners with Accenture for enterprise deployment.</p><p>And then, on March 19, 2026 &#8212; four days ago &#8212; Mensch published an op-ed in the Financial Times proposing a 1&#8211;1.5 percent revenue-based levy on all AI operators in Europe. In exchange, AI developers would be &#8220;shielded from liability for training on materials accessible on the web.&#8221; Three weeks earlier, Mediapart brought public attention to what practitioners already knew: Mistral had used copyrighted works, including Harry Potter, in its training data.[25]</p><p>I diagnosed the Levy Ratchet in &#8220;Register, Disclose, Pay&#8221; &#8212; the three-act European pattern in which enforcement fails and a levy steps in to fill the gap. Mistral is now the one proposing it. The competitive disadvantage Mensch describes is real &#8212; US and Chinese competitors train under more permissive copyright regimes, and the current EU opt-out system satisfies no one. But the levy retroactively legalizes what Mistral already did. For any European competitor that hasn&#8217;t yet trained a frontier model, it is purely prospective: a cost without the retroactive amnesty. At $400 million in ARR, Mistral can absorb 1.5 percent. A startup trying to train its first model cannot absorb the levy, the computing costs, and the legal uncertainty it is meant to resolve. The solution benefits all European AI companies in theory. In practice, it benefits the one that already crossed.</p><p>Is Mensch cynical? No. He is rational. Every move follows the incentives the environment creates.
France offers sovereignty branding and political access but no infrastructure. America offers infrastructure. The EU offers regulatory moats if you&#8217;re positioned to shape them. A rational company plays all three cards because that is the only hand the system deals.</p><p>The problem is not Mistral. The problem is the system that claims sovereignty for a company whose stack is American at every layer except the postal address.</p><p>If that is what the sovereign champion&#8217;s trajectory reveals about the system, what does credible infrastructure investment actually look like? The answer is the most boring entry in the ledger &#8212; and the only one where money, engineering, and regulatory process converge.</p><h2>What it actually takes to build a datacenter</h2><p>The summit presented infrastructure as though it were a purchase order &#8212; announce the number, sign the MOU, wait for delivery. The actual trajectory of Data4 &#8212; the operational arm of Brookfield&#8217;s &#8364;20 billion pledge[26] &#8212; shows what datacenter construction requires in France, and why a summit pledge is approximately as useful as a weather forecast for a construction crew.</p><p><strong>Start with land</strong>. France&#8217;s zero-net-artificialization law means you cannot pave a greenfield without justification. Data4 solved this by acquiring only brownfield sites &#8212; repurposed industrial or military land where the artificialization has already occurred. The Cambrai site is the former BA 103 airbase, inside the E-Valley logistics park in Hauts-de-France. Data4 already owns the land. The Nozay site is the former Nokia France headquarters, purchased in 2023 &#8212; two years before the summit pledge &#8212; for its proximity to Data4&#8217;s existing Marcoussis campus three kilometers away.
The Escaudain site is a former Usinor steelworks that sat dormant for forty-five years until Data4 was selected by the local council in December 2025, beating a competing bid from AWS.[27] Each acquisition took years of negotiation, environmental assessment, and local political engagement. The summit pledges assumed land would materialize. Data4 spent years acquiring it.</p><p><strong>Then power</strong>. A gigawatt-scale datacenter campus is the electrical equivalent of a city. The Cambrai site alone targets 1GW &#8212; roughly the output of one large nuclear reactor. Data4 signed a twelve-year nuclear power allocation contract with EDF in September 2025, the first such agreement between France&#8217;s nuclear operator and a data center company. The contract covers 40MW, approximately 230 GWh annually, under a cost-and-risk-sharing mechanism that took months to negotiate because the instrument didn&#8217;t previously exist for datacenter customers.[28] Data4 also signed a Westinghouse MOU to explore AP300 small modular reactors for future European sites &#8212; post-2030 power, not near-term. And it has renewable PPAs with Eurowatt for wind and Photosol for solar. Each energy contract has its own negotiation, regulatory approval, and timeline. The ARENH mechanism &#8212; France&#8217;s regulated price for buying nuclear power from EDF &#8212; expired at the end of 2025, meaning the cost of nuclear energy is rising, and post-ARENH contracts are being negotiated in an environment no French datacenter operator has previously experienced.[29]</p><p><strong>Then the grid</strong>. Connecting a gigawatt of new load to the French transmission network is an unprecedented industrial challenge. RTE, the grid operator, has developed a &#8220;Fast Track&#8221; procedure specifically for datacenter connections &#8212; but even the fast track for Campus IA at Fouju delivers only 240MW by the end of 2027, with 700MW by 2029. 
Data4&#8217;s timeline for the Cambrai grid connection has not been publicly confirmed. In Hauts-de-France, which hosts eight of the government&#8217;s thirty-five designated turnkey datacenter sites &#8212; the most of any region &#8212; the cumulative new load from all announced projects exceeds anything RTE has previously accommodated. The grid has to be built first. The data center cannot be built until it arrives.[30]</p><p><strong>Then permitting and local politics</strong>. At Nozay, the mayor, Didier Perrier, publicly stated he needed to be &#8220;listened to&#8221; by Data4, warning that without adequate consultation, there would be no building permit. At Escaudain, a council member raised concerns about energy consumption, water use, and the preservation of industrial heritage during the December 2025 vote. At Fouju, the environmental group FNE Seine-et-Marne filed unfavorable contributions, a La France Insoumise deputy filed parliamentary questions, and a local media investigation questioned the sovereignty narrative. France now allows datacenters over forty hectares to be classified as a &#8220;Projet d&#8217;Int&#233;r&#234;t National Majeur&#8221; &#8212; effectively letting the state override local planning &#8212; but Campus IA has not been designated, and invoking the override would confirm every local critic&#8217;s complaint.[31]</p><p><strong>Then financing</strong>. Data4 raised &#8364;3.3 billion in debt in January 2025 &#8212; the largest digital infrastructure financing in Europe &#8212; split between refinancing mature assets and a capex facility for new construction. It sold a 30 percent stake in its stabilized portfolio to Arjun Infrastructure for $3.6 billion in August 2025, then transferred a further 40.1 percent to Brookfield&#8217;s own super-core infrastructure fund in September.
The financial engineering is deliberate: recycle capital from mature, revenue-generating facilities to fund greenfield builds that don&#8217;t yet produce revenue. This is infrastructure finance &#8212; the kind of capital structure that requires credit ratings, covenant compliance, and bondholder accountability at every stage.[32]</p><p><strong>That is what it takes</strong>. Land acquisition, brownfield remediation, environmental assessment, power purchase agreements with nuclear and renewable providers, grid connection contracts with a national transmission operator, local political negotiation across multiple communes and d&#233;partements, multi-billion-euro debt structures with institutional investors, and building permits that require public inquiries before a single foundation is poured. Data4 has been operating French datacenters for years. It started acquiring AI-relevant sites in 2023. And as of March 2026, no new AI-dedicated facility is operational. Building permits are pending at Cambrai and Nozay. This is not failure: it is the responsible pace of infrastructure at this scale. The problem is that the summit treated a multi-year industrial process as though it could be summoned with a press release. And while France navigates this process, competing jurisdictions are building at a different speed entirely. Stargate&#8217;s Abilene campus in Texas went from dirt to live servers in fifteen months. MGX&#8217;s own Stargate UAE has 5,000 construction workers on site building Phase 1 for a Q3 2026 delivery &#8212; the same MGX that has seven employees on farmland in Fouju.
Mistral voted with its capital: its first infrastructure investment outside France was &#8364;1.2 billion in Sweden, where hydropower and faster permitting compress timelines that France cannot match.[35]</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2V0D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2V0D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 424w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 848w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 1272w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2V0D!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png" width="1200" height="477.1978021978022" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:579,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:209420,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192110008?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2V0D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 424w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 848w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 1272w, https://substackcdn.com/image/fetch/$s_!2V0D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ede374-3c2c-4ddc-bdd5-f8cbba272db9_2779x1105.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now recall what the summit gave equal billing to: a company that multiplied GPU catalogue prices by 1.2 million and called the result an investment.</p><h2>The outsider who has already built</h2><p>Xavier Niel is the other profile that works, and his trajectory is the inverse of the summit model. Scaleway had roughly 5,000 GPUs running before the summit asked him to pledge. Kyutai, the open-source research lab he co-founded with Rodolphe Saad&#233; and Eric Schmidt &#8212; &#8364;300 million total, operational since November 2023. Opcore, his data center joint venture with InfraVia, was announced in December 2024 and closed in April 2025, with &#8364;2.5 billion in committed investment.
Scaleway is expanding across Europe &#8212; Milan launched in March 2026, and Sweden and Germany are planned &#8212; as the continent&#8217;s largest European-owned AI cloud.[33]</p><p>Niel&#8217;s &#8364;3 billion pledge at the summit formalized existing deployment. The infrastructure preceded the ceremony. The operational GPUs were announced before the press release. That is the opposite of the summit model, and <strong>the only approach to have produced operational AI compute</strong>. The irony is structural: the only French-owned, French-operated, sovereignty-compliant AI infrastructure at scale was built by a man who spent thirty years fighting the French state&#8217;s monopolies.</p><h2>The pattern</h2><p>The summit&#8217;s theory was: announce large numbers, attract foreign operators, and infrastructure will follow. The theory assumed the operators needed France more than France needed them.</p><p>A fair objection: thirteen months is early. Elon Musk built a 100,000-GPU supercomputer in Memphis in 122 days, but for the rest of the world, datacenter construction is a multi-year process, as the Brookfield section demonstrates. And the summit did produce genuine policy results &#8212; 35 rigorously pre-screened datacenter sites at the February summit, 28 more added in November 2025, the RTE Fast Track grid connection procedure, the PINM classification for projects over 40 hectares, and the CRE reform enabling accelerated connections. These are real improvements to a real bottleneck, and Data4 and Niel will benefit.[19] </p><p>The policy reforms did not require giving stage time to Sesterce, and the structural failures of the three largest pledges are not timeline problems: they are accountability problems. FluidStack did not leave because 13 months was too short. MGX has not failed to sign an operator because the timetable is ambitious. 
These are structural failures that no timeline extension will fix.</p><p>The selection mechanism is structural, not incidental. A summit that needs a headline selects for maximum claims rather than maximum credibility. FluidStack&#8217;s &#8364;10 billion was more impressive than Scaleway&#8217;s existing cluster. Sesterce&#8217;s &#8364;52 billion was more impressive than Eclairion&#8217;s &#8364;50 million raise. MGX&#8217;s &#8364;30&#8211;50 billion was more impressive than Data4&#8217;s quiet land purchases in Nozay and Cambrai. The summit optimized for the number. The market optimized for the operator. The numbers and the operators were inversely correlated.</p><p>What works is the opposite of the summit model. Industrial operators with existing assets, existing balance sheets, and existing customers &#8212; who build because the conditions are right, not because the president asked. Mavericks who build with their own capital, on their own timeline, solving their own problems &#8212; and who happened to be in the room when the state wanted a number to announce. Neither profile needs a summit. Both need conditions: nuclear power, grid access, permitting reform that doesn&#8217;t require a public inquiry for every sixty hectares, stable regulation, and a competitive energy pricing framework beyond ARENH. France has most of these &#8212; the nuclear fleet, the engineering schools, and the geography. <strong>The state&#8217;s job is more of those conditions, fewer signing ceremonies.</strong></p><p>Macron knew. The Delhi triple-count proves it. The same money &#8212; MGX and Brookfield &#8212; was reclassified from a summit pledge to an UNCTAD greenfield announcement to &#8220;delivery,&#8221; without a single new watt of AI compute having been produced. 
L&#8217;Usine Nouvelle, reviewing Bercy&#8217;s one-year data, concluded it remained &#8220;difficult to know which investments have actually been realized.&#8221;[34] The &#201;lys&#233;e knew which numbers were real and which were staging. It published them anyway. The &#8364;109 billion was not just a domestic headline &#8212; it was France&#8217;s bid to position itself as Europe&#8217;s AI infrastructure leader, the number Macron carried to Brussels and Berlin to argue that France, not Germany or the Nordics, should anchor the EU&#8217;s compute strategy. The summit needed a number that could be spoken in the same sentence as Stargate. The honest number &#8212; closer to &#8364;25 billion in commitments with industrial accountability from Brookfield, Iliad, and the serious tail of the ledger &#8212; would not have served that purpose.</p><p>The honest number would have been enough. &#8364;25 billion from operators who know how to build, deployed in a country with the cleanest grid in Europe, the strongest engineering pipeline on the continent, and thirty-five government-designated datacenter sites with expedited grid connections &#8212; that is a serious industrial proposition. It is also a proposition that requires admitting the rest of the ledger is theater: the Gulf money that may or may not arrive, the neoclouds that lack the capital to build, the startups that lack the revenue to survive.</p><p>But an honest number requires honest industrial policy. Permitting reform that lets Data4 break ground at Cambrai without years of additional process. Energy contracts at post-ARENH pricing that give operators visibility beyond the current year. Grid investment that matches the pace of the government&#8217;s own site designations. Regulatory certainty that doesn&#8217;t require the sovereign champion to propose a copyright levy to legalize what it already did. Industrial policy that works is boring. It is not a summit. 
It does not produce &#8364;109 billion in headlines. It produces datacenters.</p><p>The king stood at Station F and announced &#8364;109 billion. Thirteen months later, the departed pledge is building for Anthropic in Texas. The holding company has seven employees and farmland. The sovereign champion trains on American infrastructure because the sovereign alternative was never built. And the only operational AI compute in France was built by a man who spent his career proving the state&#8217;s plans are less effective than the state believes.</p><p>The &#8364;109 billion was never a strategy. It was a number Macron needed for a speech. The speech is over, and datacenters are going live in Texas.</p><div><hr></div><h3>Notes</h3><p>[1] Macron announced the &#8364;109 billion figure at Station F on February 11, 2025. &#201;lys&#233;e official transcript. He explicitly compared it to the US Stargate project. <a href="https://www.elysee.fr/emmanuel-macron/2025/02/11/business-day-du-sommet-pour-laction-sur-lintelligence-artificielle-a-station-f">&#201;lys&#233;e transcript, Station F Business Day</a> | <a href="https://www.elysee.fr/en/emmanuel-macron/2025/02/11/make-france-an-ai-powerhouse">&#201;lys&#233;e summit overview</a></p><p>[2] UNCTAD Global Trade Update, cited by Macron at AI Impact Summit, New Delhi, February 2026. L&#8217;Usine Nouvelle&#8217;s one-year review noted it remained &#8220;difficult to know which investments have actually been realized.&#8221; The 87% concentration in MGX and Brookfield is the author&#8217;s calculation from UNCTAD greenfield project data. &#8220;Mistral Succeeded. France&#8217;s AI Strategy Didn&#8217;t,&#8221; The AI Realist, March 2026 &#8212; reconstructed the Delhi triple-counting. <a href="https://www.airealist.ai/">&#8216;Mistral Succeeded. France&#8217;s AI Strategy Didn&#8217;t,&#8217; The AI Realist</a></p><p>[3] The Pledge Credibility Test adapts the Commitment-vs-Spend Gap framework from the Capex vertical. 
The five-tier grading (MOU / LOI / binding contract / construction underway / operational) is derived from the energy infrastructure vertical&#8217;s Nuclear Delivery Test. Applied here to AI infrastructure commitments.</p><p>[4] Breakdown table compiled from: &#201;lys&#233;e press materials (February 2025); The Media Leader FR, &#8220;IA: d&#8217;o&#249; viendront les 109 milliards d&#8217;euros d&#8217;investissements d&#8217;ici 2031&#8221; (February 10, 2025); Maddyness, &#8220;Ce que contient le plan &#224; 109 milliards&#8221; (February 10, 2025); Journal des Entreprises (April 2025); Public S&#233;nat (February 2025). The Choose France summit (May 2025) confirmed &#8364;20.8 billion of the &#8364;109 billion as &#8220;concr&#233;tis&#233;s&#8221; &#8212; less than 20% materialized within three months. <a href="https://www.themedialeader.fr/">The Media Leader FR</a> | <a href="https://www.maddyness.com/">Maddyness</a></p><p>[5] FluidStack press release, February 10, 2025 (BusinessWire). Macron&#8217;s quote from the AI Action Summit, confirmed by World Nuclear News and DCD. <a href="https://www.businesswire.com/news/home/20250210816962/en/">FluidStack press release (BusinessWire)</a> | <a href="https://www.datacenterdynamics.com/en/news/fluidstack-signs-mou-with-france-to-build-10bn-1gw-ai-supercomputer/">DCD coverage</a></p><p>[6] AInvest analysis, March 2026 (B-tier; FluidStack is private and may have raised additional undisclosed capital).</p><p>[7] Bloomberg, March 18, 2026: &#8220;Fluidstack Drops Out of Marquee &#8364;10 Billion AI Project in France.&#8221; <a href="https://www.bloomberg.com/news/articles/2026-03-18/fluidstack-drops-out-of-marquee-10-billion-ai-project-in-france">Bloomberg</a></p><p>[8] Blockspace, March 19, 2026. $50 billion partnership with Anthropic to operate (not finance) custom compute clusters across New York, Texas, and other states.
Physical infrastructure built by TeraWulf (25-year JV, $1.275B senior secured notes, Google-backed lease obligations) and Cipher Mining. FluidStack also exited the Eclairion site. DCD confirmed (March 18, 2026). Bosquel Business Park, near A16 in the Somme d&#233;partement &#8212; local authorities (CC2SO) reopened the application process. <a href="https://blockspace.media/insight/fluidstack-lands-50-billion-anthropic-deal/">Blockspace</a> | <a href="https://www.datacenterdynamics.com/en/news/fluidstack-pulls-out-of-10bn-france-ai-supercomputer-project/">DCD</a></p><p>[9] FluidStack-Eclairion partnership announced March 5, 2025 (BusinessWire): 18,000+ GPU cluster at Eclairion&#8217;s 40MW Bruy&#232;res-le-Ch&#226;tel site. Mistral Compute announced June 2025: 18,000 Blackwell GPUs, 40MW, Essonne &#8212; same region, same power, same GPU count, with Nvidia as direct partner instead of FluidStack. Neither has publicly confirmed a causal connection; operational overlap and timeline are consistent with vertical integration displacing the intermediary. DCD reported March 11, 2026, that Eclairion raised &#8364;50M from Tikehau Capital and may host a Mistral cluster directly. <a href="https://mistral.ai/news/mistral-compute">Mistral Compute</a> | <a href="https://www.businesswire.com/news/home/20250305468741/en/">FluidStack-Eclairion (BusinessWire)</a></p><p>[10] L&#8217;Usine Digitale, February 12, 2025: &#8364;52B calculation method (GPU catalogue price &#215; quantity + infrastructure), CEO acknowledgement. Journal des Entreprises (April 2025): comparison to US private AI investment. Sesterce background: Solutions Num&#233;riques, November 5, 2024. <a href="https://www.usine-digitale.fr/">L&#8217;Usine Digitale on Sesterce</a></p><p>[11] Sesterce Group entered judicial restructuring (<em>redressement judiciaire</em>) February 5, 2026, Marseille commercial court (Pappers.fr, SIREN 902372481). 
<a href="https://www.pappers.fr/entreprise/sesterce-group-902372481">Pappers.fr (Sesterce Group SIREN 902372481)</a> | <a href="https://data.inpi.fr/entreprises/902372481?q=sesterce%20group#902372481">March 5 overturning ruling</a>.</p><p>[12] Campus AI SAS registered April 7, 2025, SIREN 943 352 161 (Annuaire des Entreprises, data.gouv.fr). APE code 7010Z (head office activities). Directors: Omar Alismail (Pr&#233;sident, MGX CIO Semiconductors &amp; Infrastructure), Ignacio Quintana Alonso (DG, MGX Partner &amp; Managing Director), Dani Dweik (DG, MGX COO/CFO). No BPI, Mistral, or Nvidia personnel in registered directorship. BPI &#8220;droits de blocage&#8221; (blocking rights) confirmed during CNDP concertation; terms undisclosed. Seven employees, per L&#8217;Usine Nouvelle (February 2026). <a href="https://annuaire-entreprises.data.gouv.fr/entreprise/campus-ai-943352161">Annuaire des Entreprises (Campus AI SAS)</a></p><p>[13] Site: ZAC des Bordes, Fouju (Seine-et-Marne), ~70 hectares of farmland. CNDP concertation pr&#233;alable ran October 13 &#8211; November 23, 2025; report published January 2026. RTE &#8220;Fast Track&#8221; grid connection signed January 26, 2026 (RTE press release): 240MW by end 2027, 700MW by end 2029, scalable to 1,400MW. First use of the accelerated connection procedure approved by CRE in spring 2025. Requires upfront financial commitment. Building permits and environmental authorization filed by February 2026; public inquiry targeted May 2026; no permits issued as of March 2026. SOCOTEC engaged for ICPE/design review; Arcadis for project management. <a href="https://www.rte-france.com/">RTE Fast Track press release</a> | <a href="https://www.debatpublic.fr/">CNDP concertation</a></p><p>[14] L&#8217;Usine Nouvelle, &#8220;Un alignement de bonnes volont&#233;s nous permet d&#8217;avancer assez vite: un an apr&#232;s le Sommet sur l&#8217;IA, ce que l&#8217;on sait du campus IA de MGX, Mistral, Nvidia et Bpifrance,&#8221; February 2026.
Paul Sayar (Campus IA project coordinator): &#8220;We are looking for datacenter operators to do it, but no contract has been signed yet.&#8221; On Mistral: &#8220;Mistral is a leading prospective customer&#8230; but today, nothing has been signed with them.&#8221; <a href="https://www.usinenouvelle.com/">L&#8217;Usine Nouvelle, Campus IA one-year update</a></p><p>[15] MGX global portfolio: co-investment in OpenAI Stargate, co-lead investment in Anthropic, BlackRock $30B AI infrastructure fund, TikTok USDS JV. Bloomberg, February 17, 2026: MGX targets $100B AUM and ~$10B annual deal deployment. Stargate Abilene: two buildings operational since September 2025 (DCD). Stargate UAE: 5,000+ construction workers, Phase 1 200MW targeting Q3 2026 (The National, December 2025). <a href="https://www.bloomberg.com/">Bloomberg on MGX</a></p><p>[16] Digital Realty pledged &#8364;5B+ at the AI Action Summit (February 2025); at Choose France (May 2025), the confirmed figure was &#8364;2.3B and 750 jobs (Choose France announcement PDF). Digital Realty is France&#8217;s largest data center operator (13+ facilities across Paris and Marseille at various stages of operation and construction; confirmed by BusinessWire/ResearchAndMarkets, February 2025). PAR11 opened Q3 2025 (Mercury Engineering). MRS5 under construction; MRS6 public inquiry opened October 2025, building permit granted January 22, 2026 (Gomet&#8217;, DCD). Blackstone-Digital Realty $7B hyperscale JV covers Paris campuses (Blackstone, August 2024). Equinix committed &#8364;630M at the AI Action Summit for new facilities in Paris and Bordeaux (Silicon Canals, February 2025; &#201;lys&#233;e materials). Telehouse (KDDI subsidiary): &#8364;400M in announced investments at the AI Action Summit, expanding existing Paris operations.
<a href="https://www.datacenterdynamics.com/en/news/digital-realty-given-green-light-for-sixth-data-center-in-marseille-france/">Digital Realty MRS6 permit (DCD)</a> | <a href="https://www.mercuryeng.com/2024/04/09/mercury-to-spearhead-expansion-of-digital-realtys-76mw-paris-digital-park/">PAR11 delivery (Mercury Engineering)</a> | <a href="https://www.blackstone.com/news/press/digital-realty-and-blackstone-announce-7-billion-hyperscale-data-center-development-joint-venture/">Blackstone-Digital Realty JV</a></p><p>[17] Apollo: &#8220;$5 billion funding initiative for AI energy projects&#8221; announced at AI Action Summit (Silicon Canals, February 11, 2025). Described as a global thematic allocation, not a France-specific datacenter commitment. Subsequent Apollo deployments: $3.5B capital solution for xAI/Valor compute (Apollo press release, January 7, 2026), majority acquisition of Stream Data Centers in Dallas (November 2025), $35B Meta datacenter financing discussions (Reuters, February 2025). No France-specific Apollo project identified. Amazon/AWS: &#8364;6B pledged at AI Action Summit. At Choose France (May 2025), AWS confirmed &#8364;300M for France, the majority of which was logistics infrastructure, including a distribution center at Illiers-Combray (Eure-et-Loir), not datacenter infrastructure (aboutamazon.fr; L&#8217;Usine Digitale, March 2026). AWS simultaneously committed &#8364;33.7B to Spain. AWS has operated the EU (Paris) region since 2017; planned investment 2022&#8211;2031 was &#8364;5.3B for the broader Paris region (L&#8217;Usine Digitale). AWS 2026 global capex projected at $200B (DCD, February 2026). 
<a href="https://www.apollo.com/insights-news/pressreleases/2026/01/apollo-backs-5-4-billion-valor-and-xai-data-center-compute-infrastructure-transaction-with-3-5-billion-capital-solution-3214463">Apollo xAI/Valor deal</a> | <a href="https://www.apollo.com/insights-news/pressreleases/2025/11/apollo-funds-complete-acquisition-of-stream-data-centers-3179224">Apollo Stream Data Centers acquisition</a> | <a href="https://www.usine-digitale.fr/informatique/cloud/aws/malgre-les-grandes-ambitions-francaises-dans-lia-cest-en-espagne-quaws-va-investir-337-milliards-deuros.H5NDFBWPDBG6LISLLZIL6DOVF4.html">AWS Spain &#8364;33.7B (L&#8217;Usine Digitale)</a> | <a href="https://www.datacenterdynamics.com/en/news/amazon-capex-to-hit-200bn-in-2026-will-mostly-fund-aws-data-centers/">AWS 2026 capex $200B (DCD)</a></p><p>[18] Evroc: &#8364;4B pledged at AI Action Summit for 96MW Mougins facility &#8220;at full capacity&#8221; (evroc press release, February 10, 2025; DCD; Sifted). Series A funding: &#8364;50.6M from Blisce/, Giant Ventures, EQT Ventures, Norrsken VC (SiliconANGLE, March 20, 2025). Stockholm flagship at Arlandastad: 10,000 GPU capacity (revised from initial 16,000 target), land purchased for SEK 400M ($39M); broke ground H1 2025; operational H2 2026 (evroc newsroom). Mougins facility was described as &#8220;construction expected to be completed in 2025&#8221; (evroc press release); no completion confirmation found as of March 2026. Cloud services launched (Datacenter Forum, citing evroc) using strategic partner datacenters in Paris, Stockholm, and Frankfurt. Prologis: &#8364;3.5B pledged at AI Action Summit; increased to &#8364;6.4B at Choose France (DCD, May 21, 2025). Four DC sites in &#206;le-de-France, 584MW total, &#8220;full commissioning planned for 2035.&#8221; World&#8217;s largest logistics REIT; $8B planned globally across 20 DC projects over four years (CoStar, October 2025). No previous data center construction in France. 
Eclairion: ~&#8364;1B at AI Action Summit. FluidStack exited. Raised &#8364;50M from Tikehau Capital (DCD, March 11, 2026). May host Mistral cluster directly. <a href="https://evroc.com/news/evroc-mougins-datacenter/">Evroc Mougins announcement</a> | <a href="https://siliconangle.com/2025/03/20/cloud-infrastructure-startup-evroc-raises-e50-6m-build-new-data-centers/">Evroc Series A (SiliconANGLE)</a> | <a href="https://evroc.com/news/evroc-flagship-datacenter-2025/">Evroc Stockholm flagship</a> | <a href="https://www.datacenterdynamics.com/en/news/prologis-to-develop-four-data-center-sites-in-paris-france/">Prologis France (DCD)</a></p><p>[19] Policy reforms attributed to the summit period: 35 designated turnkey datacenter sites at AI Action Summit (February 2025) plus 28 added in November 2025, totaling 63 sites nationally (&#201;lys&#233;e documents; L&#8217;Usine Nouvelle; DGE datacenter implementation guide, November 25, 2025). RTE Fast Track grid connection procedure approved by CRE in  spring 2025 (RTE press release). &#8220;Projet d&#8217;Int&#233;r&#234;t National Majeur&#8221; (PINM) classification for datacenters over 40 hectares: Article 15 of January 2026 simplification law (Legifrance). L&#8217;Usine Nouvelle one-year review (February 2026) noted 52 enterprises accompanied by the state and 5.8GW of electrical power secured &#8212; the 63-site and 52-enterprise figures are different metrics (sites designated vs. companies formally engaged). <a href="https://www.elysee.fr/admin/upload/default/0001/17/d9c1462e7337d353f918aac7d654b896b77c5349.pdf">&#201;lys&#233;e AI infrastructure plan</a></p><p>[20] Arthur Mensch, Financial Times, approximately February 11, 2026: ARR crossed $400 million, up from $20 million a year earlier. Guiding toward exceeding $1 billion ARR by the end of 2026. Series C (September 2025): &#8364;1.7 billion led by ASML at &#8364;11.7 billion valuation. 
<a href="https://www.ft.com/">FT interview with Mensch, February 2026</a></p><p>[21] Microsoft-Mistral partnership announced in February 2024. Multi-year deal: Azure supercomputing infrastructure for training, Mistral models available through Azure AI Studio/MaaS, collaboration on &#8220;training purpose-specific models for select customers, including European public sector workloads.&#8221; Microsoft took a minority stake. <a href="https://www.datacenterdynamics.com/en/news/microsoft-invests-in-mistral-ai-will-make-models-available-on-azure/">DCD</a></p><p>[22] Mistral Compute launch, June 2025 (mistral.ai): &#8220;As a premier NVIDIA partner, Mistral Compute will offer the latest NVIDIA reference architectures, with availability of tens of thousands of GPUs.&#8221; This is the Nvidia Cloud Partner (NCP) program in substance &#8212; Nvidia designs the reference architecture, the partner deploys to spec. Mistral Compute operates on Nvidia silicon (Blackwell), Nvidia software (NIM, NeMo, TensorRT-LLM), and integrates with the DGX Cloud Lepton marketplace. <a href="https://mistral.ai/news/mistral-compute">Mistral Compute</a> | <a href="https://www.globenewswire.com/news-release/2025/06/11/3097388/0/en/NVIDIA-DGX-Cloud-Lepton-Connects-Europes-Developers-to-Global-NVIDIA-Compute-Ecosystem.html">DGX Cloud Lepton (GlobeNewswire)</a></p><p>[23] Nvidia Nemotron Coalition announced March 16, 2026 (nvidianews.nvidia.com). Mistral is a founding member. First project: base model co-developed by Mistral and Nvidia, trained on DGX Cloud, underpinning Nemotron 4 family. 
&#8220;The models will be open-sourced, providing a shared foundation for post-training and specialization.&#8221; <a href="https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models">Nvidia Newsroom</a> | <a href="https://mistral.ai/news/mistral-ai-and-nvidia-partner-to-accelerate-open-frontier-models">Mistral AI</a></p><p>[24] Mistral-EcoDataCenter partnership announced February 11, 2026: &#8364;1.2 billion, Borl&#228;nge site, Vera Rubin GPUs, operational 2027. DCD, CNBC, EcoDataCenter press release. Mistral&#8217;s first infrastructure investment outside France. <a href="https://www.datacenterdynamics.com/en/news/french-ai-firm-mistral-signs-12bn-deal-to-lease-ecodatacenter-facility-in-sweden/">DCD</a> | <a href="https://ecodatacenter.tech/press/mistral-ai-and-ecodatacenter-partner-to-build-ai-focused-data-center-in-sweden-3431886">EcoDataCenter</a></p><p>[25] Arthur Mensch, Financial Times op-ed, March 19, 2026: proposed a 1&#8211;1.5% revenue-based levy on all AI operators in Europe in exchange for legal certainty. Mistral external affairs chief Audrey Herblin-Stoop confirmed the 1&#8211;1.5% range to AFP. Mediapart investigation (February 2026): Mistral used copyrighted works, including Harry Potter and The Little Prince, in training data. Mistral&#8217;s response: works are &#8220;especially popular and duplicated many times online,&#8221; making exclusion difficult. The Levy Ratchet framework: &#8220;Register, Disclose, Pay,&#8221; The AI Realist, 2025. <a href="https://www.kten.com/news/mistral-chief-calls-for-european-ai-levy-to-pay-creatives/article_d3fac52c-5980-5025-8a09-809f023327cb.html">Mensch FT op-ed coverage (AFP/KTEN)</a> | <a href="https://www.airealist.ai/">&#8216;Register, Disclose, Pay,&#8217; The AI Realist</a></p><p>[26] Brookfield pledged &#8364;20B at AI Action Summit (February 10, 2025). &#8364;15B for datacenters via Data4, &#8364;5B for associated AI infrastructure. 
Brookfield launched a $100B global AI infrastructure program in November 2025 (Nvidia and KIA as co-investors). Launched Radiant (vertically integrated AI infrastructure company) in February 2026. <a href="https://www.brookfield.com/">Brookfield AI infrastructure</a></p><p>[27] Data4 sites: Cambrai (former BA 103 Cambrai-&#201;pinoy airbase, within E-Valley logistics park, Hauts-de-France), 1GW/&#8364;10B target, land owned (DCD, L&#8217;Usine Nouvelle, Le Journal des Entreprises, confirmed at Choose France May 2025). Nozay PAR03 (former Nokia France HQ, purchased 2023, 3km from existing Marcoussis campus), 250MW/&#8364;2B (doubled from &#8364;1B due to AI density requirements), first AI datacenter operational 2027, 200,000 GPUs (Data4 press release, June 2025; Structure Research, July 2025). Escaudain/Denain (former Usinor steelworks, dormant 45 years), up to 700MW/&#8364;5B, selected over AWS by local council December 15, 2025 (DCD, Baxtel). France&#8217;s zero-net-artificialization (ZAN) regulations incentivize repurposed brownfield sites &#8212; all three Data4 acquisitions comply. <a href="https://www.data4group.com/">Data4</a></p><p>[28] EDF nuclear PPA: 12-year Nuclear Production Allocation Contract (CAPN) signed September 4, 2025 (Data4 press release; EDF press release; DCD). 40MW, ~230GWh/year, cost-and-risk-sharing mechanism. First such agreement between EDF and a datacenter operator. Westinghouse AP300 SMR MOU signed March 2025 (NucNet; World Nuclear News) &#8212; exploration of on-site nuclear for future campuses, post-2030 timeline. Renewable PPAs: Eurowatt (wind, 80 GWh/year) and Photosol (solar, ~70 GWh/year), both signed in 2024 for French operations. <a href="https://www.data4group.com/">Data4-EDF PPA</a></p><p>[29] ARENH (Acc&#232;s R&#233;gul&#233; &#224; l&#8217;&#201;lectricit&#233; Nucl&#233;aire Historique) set a fixed nuclear price of &#8364;42/MWh from EDF, capped at 100 TWh/year, and designed in 2011 to encourage competition. 
The mechanism expired at the end of 2025, with transitional arrangements extending into 2026. Post-ARENH pricing is under negotiation, and the transition creates uncertainty for large industrial power consumers, including data center operators. DCD analysis, February 2026: &#8220;Vive la r&#233;volution: The inside story of the big French AI data center build-out.&#8221; <a href="https://www.datacenterdynamics.com/">DCD &#8216;Vive la r&#233;volution&#8217;</a></p><p>[30] RTE grid connection: France designated 35 turnkey data center sites at the February 2025 summit, plus 28 added in November 2025, totaling 63 nationwide. Hauts-de-France hosts 8, the most of any region. RTE Fast Track procedure approved by CRE spring 2025, but even Fast Track for Campus IA (Fouju) delivers 240MW by the end of 2027 at best. Cumulative new datacenter load announcements in France exceed RTE&#8217;s historical accommodation at this pace. <a href="https://www.rte-france.com/">RTE</a></p><p>[31] Nozay: Mayor Didier Perrier&#8217;s comments reported in the local press (cited by DCD, February 2026). Escaudain: council member concerns during December 2025 vote (DCD). Fouju: FNE Seine-et-Marne filed unfavorable contributions (October 2025); La France Insoumise deputy Arnaud Saint-Martin filed Written Question No. 8839 and Oral Question No. 501 (Assembl&#233;e Nationale). &#8220;Projet d&#8217;Int&#233;r&#234;t National Majeur&#8221; (PINM) provision: Article 15 of the January 2026 simplification law allows datacenters over 40 hectares to receive PINM designation; Campus IA has not been designated. <a href="https://www.assemblee-nationale.fr/">Assembl&#233;e Nationale questions</a></p><p>[32] Data4 financial structure: &#8364;3.3B debt raised in January 2025 (Linklaters, Clifford Chance) &#8212; described by Structure Research and DCD as Europe&#8217;s largest digital infrastructure financing at the time. 
&#8364;2.2B refinancing stabilized portfolio (StableCo), &#8364;1.1B capex facility for greenfield (GrowthCo). Arjun Infrastructure acquired a 30% stake in StableCo for ~$3.6B (August 2025; DCD). Brookfield BSIP held 40.1% of StableCo's equity (September 2025; Bloomberg). Capital recycling strategy: mature, revenue-generating assets fund new AI-dedicated construction. <a href="https://www.datacenterdynamics.com/">DCD on Data4 financing</a></p><p>[33] Iliad Group &#8364;3B AI investment announced at AI Action Summit (February 10, 2025). Components: Scaleway (~5,000 GPUs operational pre-summit, serving Mistral, H, Photoroom), Opcore JV with InfraVia (&#8364;2.5B+, announced December 2024, closed April 2025), Kyutai open-source research lab (&#8364;300M from Niel, Rodolphe Saad&#233;/CMA CGM, and Eric Schmidt; November 2023). Scaleway expanding: Milan launched in March 2026, Sweden and Germany planned. AION consortium submitted for EU AI Gigafactory (June 2025). <a href="https://www.scaleway.com/">Scaleway</a> | <a href="https://kyutai.org/">Kyutai</a></p><p>[34] L&#8217;Usine Nouvelle one-year review of AI summit pledges, February 2026. &#8220;Il reste difficile de savoir quels investissements ont &#233;t&#233; r&#233;ellement concr&#233;tis&#233;s.&#8221; Bercy data reviewed. The &#8364;25 billion estimate for commitments with industrial accountability is the author&#8217;s calculation: Brookfield/Data4 (~&#8364;20B with sites, debt, and PPAs), Iliad (~&#8364;3B with operational assets), plus the tail of confirmed smaller deployments (Amazon partial confirmation, Eclairion &#8364;50M). Excludes all commitments graded MOU or below on the Pledge Credibility Test. <a href="https://www.usinenouvelle.com/">L&#8217;Usine Nouvelle</a></p><p>[35] International construction speed comparison. Stargate Abilene: first two buildings (200+ MW, 980,000 sq ft) operational September 30, 2025, roughly 15 months after construction began in June 2024 (Crusoe press release; DCD; CNBC). 
Stargate UAE: 5,000+ workers on site, Phase 1 (200MW) targeting Q3 2026, ~15 months from announcement (The National, December 2025; DCD). EcoDataCenter Borl&#228;nge (Mistral&#8217;s Sweden site): land purchased September 2024, broke ground September 24, 2025, Phase 1 targeting early-to-mid 2027 (EcoDataCenter press release; DCNN Magazine; BeBeez). Data4 president Olivier Micheli on French grid connection delays: risen to &#8220;more than five years and sometimes seven&#8221; (Techniques de l&#8217;Ing&#233;nieur, 2025). <a href="https://www.crusoe.ai/resources/newsroom/crusoe-announces-flagship-abilene-data-center-is-live">Crusoe Abilene announcement</a> | <a href="https://www.thenationalnews.com/business/2025/12/05/stargate-uaes-first-phase-to-be-completed-in-third-quarter-of-2026/">Stargate UAE (The National)</a> | <a href="https://dcnnmagazine.com/build/ecodatacenter-breaks-ground-on-mega-campus-in-borlange/">EcoDataCenter Borl&#228;nge groundbreaking (DCNN)</a></p>]]></content:encoded></item><item><title><![CDATA[When the Credit Market Builds a Fire Exit]]></title><description><![CDATA[What Four Products in Five Months Tell You.]]></description><link>https://www.airealist.ai/p/when-the-credit-market-builds-a-fire</link><guid isPermaLink="false">https://www.airealist.ai/p/when-the-credit-market-builds-a-fire</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Tue, 24 Mar 2026 18:38:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VLXu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!VLXu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VLXu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VLXu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4614138,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/192012541?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VLXu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 424w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 848w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!VLXu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287ec679-90aa-4b61-92db-4d4d2adbd379_2816x1584.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On Monday, Bloomberg reported that JPMorgan launched a basket of credit default swaps on five hyperscalers &#8212; Alphabet, Amazon, Meta, Microsoft, and Oracle &#8212; tradeable in $25 million increments.[1] Three days earlier, Meta, Alphabet, and Microsoft were added to the CDX Investment-Grade Index, the 125-company benchmark that defines how institutional investors hedge corporate credit risk.[2] Four months before that, Citadel Securities began making markets in two baskets of hyperscaler bonds &#8212; one with ten-year maturities, one with thirty-year maturities.[3]</p><p>Twelve months ago, single-name CDS on these companies barely existed. Today, they are among the most actively traded US derivatives contracts outside the financial sector.[4]</p><p>The speed matters more than the products. 
When Wall Street builds hedging infrastructure this fast, it is not reacting to a risk. It is pricing one in.</p><h3>What the Credit Market Sees</h3><p>The equity market treats hyperscaler AI spending as a growth story. The credit market treats it as a leverage story. Both are looking at the same balance sheets and reaching different conclusions &#8212; and neither is obviously wrong. The hyperscalers collectively hold roughly $350 billion in cash, are projected to generate approximately $725 billion in operating cash flow in 2026, and carry a liabilities-to-assets ratio of 48%, half the S&amp;P 500 average.[5] Saba Capital is selling CDS protection on these names, collecting premiums on what it views as overpriced AI fear.[6] The balance sheets are not telecom balance sheets.</p><p>But the volume of debt issuance is approaching telecom-scale levels. The Big Five hyperscalers issued $121 billion in bonds in 2025 &#8212; up from an annual average of $28 billion between 2020 and 2024 &#8212; with over $90 billion raised in the last three months of the year alone. Alphabet issued a 100-year bond in February 2026.[7] BofA expects the Big Five to borrow roughly $140 billion annually over the next three years, potentially exceeding $300 billion annually &#8212; putting them on pace with the Big Six banks&#8217; expected issuance.[8] Barclays projects total US corporate bond issuance will hit $2.46 trillion in 2026, with hyperscaler AI capex as the single largest upside risk.[9]</p><p>Aggregate capex for the Big Five, after buybacks and dividends, now exceeds projected operating cash flows.[10] These companies &#8212; historically self-funding &#8212; are borrowing to sustain the AI buildout. &#8220;<a href="https://www.airealist.ai/p/chip-and-mortar">Chip and Mortar</a>&#8221; traced this shift at the company level: Amazon&#8217;s infrastructure reversal showed how capex funded by operating cash flow becomes capex funded by debt when the bet gets big enough. 
That shift from self-funded to debt-funded is what moved the CDS market from dormant to hyperactive in under a year.</p><p>Oracle is the canary. Its five-year CDS spread tripled from 43 basis points to over 135 basis points between September and December 2025 &#8212; the highest level since the 2008-09 financial crisis.[11] Its total borrowings exceeded $108 billion by late 2025 and have continued to grow, making it the largest non-bank issuer in the Bloomberg US Corporate Index.[12] S&amp;P revised its outlook to negative.[13] In January, bondholders sued, alleging Oracle failed to disclose plans for $38 billion in additional borrowing when it sold $18 billion in bonds the previous September.[14] Oracle&#8217;s CDS now averages over $830 million in weekly trading volume &#8212; the most liquid investment-grade CDS in the market.[15]</p><p>None of this is surprising if you read the balance sheet analysis in &#8220;<a href="https://www.airealist.ai/p/cloud-vs-clout">Cloud vs. Clout</a>&#8221; or the codependence architecture in &#8220;<a href="https://www.airealist.ai/p/welcome-to-hotel-abilene">Hotel Abilene</a>&#8221; &#8212; the five-stage access sequence described there maps precisely onto this covenant pressure stage.</p><p>But Oracle is not the structural story. Oracle is the outlier that made the structural story visible. The structural story is that the entire hyperscaler credit complex is being repriced as a distinct risk category.</p><h3>The Product Sequence Is the Signal</h3><p>Three products in five months, each escalating the tradability of hyperscaler credit risk, with a fourth building alongside:</p><p><strong>November 2025:</strong> Citadel Securities&#8217; bond baskets. Cash bond positions in four hyperscalers at two durations.</p><p><strong>February 2026:</strong> JPMorgan&#8217;s CDS basket (reported publicly today). Synthetic exposure to five hyperscalers in $25 million clips. 
No bond ownership required.</p><p><strong>March 2026:</strong> CDX IG Index inclusion &#8212; a mechanical semi-annual reshuffle, but one that now embeds Meta, Alphabet, and Microsoft in the benchmark every institutional portfolio uses to manage credit risk.</p><p>Some of this is market infrastructure catching up with market size. But when BofA&#8217;s chief investment strategist calls hyperscaler bonds the best short, and banks are approaching hedge funds for protection for the first time, the infrastructure is being used directionally. JPMorgan isn't alone. Goldman Sachs is reportedly pitching a parallel product that gives hedge funds the same capability &#8212; a way to bet on whether AI debt holds up.[16]</p><p>The BIS noticed. In a March 2026 report, officials flagged what they called &#8220;shadow borrowing&#8221;: hyperscalers using off-balance-sheet arrangements to finance data center expansions with private credit firms&#8212;obligations that are economically equivalent to debt but are outside corporate balance sheets, creating new shock transmission channels.[17] If you read &#8220;<a href="https://www.airealist.ai/p/compute-equals-commitments">Compute Equals Commitments</a>,&#8221; the gap between announced AI investment and actual capital expenditure was already the market&#8217;s blind spot. The BIS just named the next one: the gap between on-balance-sheet debt and total economic obligation.</p><h3>This Has Happened Before</h3><p>The last time Wall Street built hedging infrastructure this fast for a single sector was the telecom boom. 
Between 1996 and 2001, real private fixed investment in communications equipment more than doubled &#8212; from $62 billion to over $135 billion annually &#8212; much of it debt-financed.[18] Telecom became the largest sector in the Merrill Lynch High-Yield Bond Index, accounting for 20.3% of the index by 2000.[19] Credit markets saw the problem first: telecom junk bond spreads rose 100&#8211;130% between late 1997 and late 1998, while telecom equity investors were still earning 18% annualized returns.[20] The CDS market grew from $151 billion in notional value in 1997 to $2 trillion by the end of 2002, driven primarily by banks hedging telecom loan exposure.[21] By mid-2002, $110 billion in telecom bankruptcies had been filed &#8212; a quarter of all corporate defaults.[22]</p><p>The pattern repeated before 2008. The ABX.HE index &#8212; essentially insurance contracts on baskets of subprime mortgage bonds, packaged into a tradeable index &#8212; launched in January 2006, created by dealers to make the subprime market more liquid.[23] Michael Burry had closed his first CDS trade on subprime eighteen months earlier &#8212; he had to persuade banks to create the instruments.[24] The dealers built the infrastructure; the informed shorts were the first to use it. The hedging infrastructure was in place 18&#8211;36 months before the crisis. Goldman Sachs was marketing ABX shorts to hedge funds by summer 2006.[25] Bear Stearns CDS spreads remained near the risk-free rate until four months before the firm collapsed.[26]</p><p>The consistent pattern across both cycles: massive debt accumulation forces credit markets to build new risk-transfer infrastructure. Sophisticated participants begin using it. Defaults arrive 18&#8211;36 months later, while less-informed participants are still positioned long. Not every hedging infrastructure buildout precedes a crash &#8212; energy CDS grew rapidly in 2014-15 and the sector restructured without systemic failure. 
The signal is not deterministic. But the signal was never what existing instruments priced &#8212; it was that entirely new product categories were created to hedge previously unhedgeable risks.</p><h3>What This Means</h3><p>If you manage a portfolio, you now have hyperscaler AI credit exposure whether you chose it or not. The CDX IG inclusion embeds it in every index-level credit hedge. The BIS shadow borrowing finding suggests private credit and insurance allocations may carry it as well.</p><p>The question is not whether hyperscaler AI capex produces returns. It does. The question is whether your portfolio&#8217;s exposure to the downside scenario &#8212; where AI revenue disappoints and $600 billion in annual capex meets $140 billion in annual debt issuance meets extended depreciation schedules that flatter the income statement &#8212; is a position you chose or one you inherited.</p><p>JPMorgan built a basket so its clients could answer that question. The fact that the basket exists is the answer to a different question: whether the credit market thinks the AI capex cycle carries tail risk worth hedging.</p><p>It does. The fire exit is now open. That tells you something about what the building inspectors found.</p><div><hr></div><h3>Notes</h3><p>[1] <a href="https://finance.yahoo.com/markets/options/articles/jpmorgan-offers-clients-way-hedge-175230248.html">Bloomberg</a>, &#8220;JPMorgan Offers Clients a New Way to Hedge AI Debt Risk,&#8221; March 23, 2026. Basket includes CDS on Alphabet, Amazon, Meta, Microsoft, and Oracle; trades in $25 million increments ($5 million per firm).</p><p>[2] <a href="https://www.bloomberg.com/news/articles/2026-03-20/meta-alphabet-meta-goog-join-credit-risk-index-as-ai-hedging-demand-soars">Bloomberg</a>, &#8220;Meta, Alphabet Join Credit-Risk Index as AI Hedging Demand Soars,&#8221; March 20, 2026. 
See also <a href="https://finance.yahoo.com/markets/stocks/articles/meta-alphabet-microsoft-added-cdx-182849919.html">GuruFocus/Yahoo Finance</a>. Meta, Alphabet, and Microsoft added to S&amp;P Dow Jones Indices&#8217; CDX Investment-Grade Index effective March 20. The index is equally weighted across 125 names; each hyperscaler represents ~0.8% of the index. The exposure is dilute per-name but cumulative across all hyperscaler constituents.</p><p>[3] <a href="https://www.bloomberg.com/news/articles/2025-11-19/citadel-securities-launches-ai-bond-trading-baskets-for-hedging">Bloomberg</a>, &#8220;Citadel Securities Launches AI Bond Trading Baskets for Hedging,&#8221; November 19, 2025. <a href="https://www.hedgeweek.com/citadel-securities-rolls-out-ai-themed-bond-baskets/">Hedgeweek</a> confirmed baskets include 10-year and 30-year maturities of bonds issued by Microsoft, Amazon, Alphabet, and Meta.</p><p>[4] DTCC data, as reported by <a href="https://finance.yahoo.com/markets/options/articles/jpmorgan-offers-clients-way-hedge-175230248.html">Bloomberg</a>, March 23, 2026, and <a href="https://www.spglobal.com/spdji/en/landing/topic/cdx-tradable-cds-indices/">S&amp;P Dow Jones Indices</a>.</p><p>[5] JPMorgan, as cited by <a href="https://www.irishtimes.com/business/2025/11/11/investor-angst-over-big-techs-ai-spending-spills-into-bond-market/">FT</a>, November 2025: hyperscalers collectively hold ~$350 billion in liquid cash and investments and are projected to generate ~$725 billion in operating cash flow in 2026. CreditSights, Q3 2025: hyperscaler liabilities-to-assets ratio fell to 48%, close to 2015 levels, versus S&amp;P 500 steady at ~80%. 
Per <a href="https://www.mufgamericas.com/sites/default/files/document/2025-12/AI_Chart_Weekly_12_19_Financing_the_AI_Supercycle.pdf">MUFG Americas</a>, December 2025.</p><p>[6] <a href="https://finance.yahoo.com/news/exclusive-weinsteins-saba-sells-credit-162754901.html">Reuters</a> exclusive, November 17, 2025, by Nell Mackenzie and Lucy Raitano. Saba Capital (Boaz Weinstein) selling CDS protection on Oracle, Microsoft, Meta, Amazon, and Alphabet to banks seeking to hedge AI-related credit exposure. Source with direct knowledge of the trades. First time banks had approached Saba for this specific trade. BofA CIS Michael Hartnett, same week: &#8220;Best short is AI hyperscaler corporate bonds.&#8221; Confirmed by <a href="https://www.hedgeweek.com/saba-capital-ramps-up-big-tech-cds-selling-as-managers-hedge-ai-driven-credit-risk/">Hedgeweek</a>, November 19, 2025.</p><p>[7] <a href="https://www.mellon.com/insights/insights-articles/record-breaking-ai-related-debt-issuance-in-2025.html">Mellon Investments</a>, &#8220;Record-Breaking AI-Related Debt Issuance in 2025,&#8221; citing Bloomberg data as of December 15, 2025: hyperscalers issued $121 billion in bonds in 2025, with over $90 billion in the last three months alone, up from a $28 billion annual average 2020&#8211;2024. <a href="https://www.bis.org/publ/qtrpdf/r_qt2603u.htm">BIS Quarterly Review</a>, March 2026 confirms issuance &#8220;topped $100 billion.&#8221; Alphabet 100-year sterling bond (&#163;1 billion) priced February 10, 2026: <a href="https://www.bloomberg.com/news/articles/2026-02-09/alphabet-mandates-banks-for-rare-100-year-sterling-bond">Bloomberg</a>, <a href="https://www.theglobeandmail.com/business/article-alphabet-google-century-bond-ai-spending/">Reuters</a>, <a href="https://www.cnn.com/2026/02/10/business/google-one-hundred-year-bond">CNN</a>. First century bond by a tech company since Motorola in 1997. 
Part of $20 billion multi-tranche dollar offering upsized from $15 billion after &gt;$100 billion in orders.</p><p>[8] <a href="https://finance.yahoo.com/news/ai-hyperscalers-drive-higher-us-225314460.html">Reuters</a>, &#8220;AI hyperscalers will drive higher US corporate bond supply in 2026, analysts say,&#8221; January 15, 2026. BofA estimate: Big Five to borrow ~$140 billion annually over three years, potentially exceeding $300 billion annually.</p><p>[9] <a href="https://finance.yahoo.com/news/ai-hyperscalers-drive-higher-us-225314460.html">Barclays</a> report, January 2026, as reported by Reuters. Barclays projects $2.46 trillion in total US corporate bond issuance (up 11.8% from $2.2 trillion in 2025) and $945 billion in net issuance (up 30.2% from $726 billion).</p><p>[10] <a href="https://www.mufgamericas.com/sites/default/files/document/2025-12/AI_Chart_Weekly_12_19_Financing_the_AI_Supercycle.pdf">MUFG Americas</a>, &#8220;AI Chart Weekly: Financing the AI Supercycle,&#8221; December 19, 2025. Aggregate capex for the Big Five, after buybacks and dividends, exceeds projected cash flows. CreditSights 2026E data.</p><p>[11] <a href="https://bondblox.com/news/oracles-5y-cds-jumps-to-its-highest-level-since-2009">BondBloX</a>, January 2026, citing Bloomberg data. Oracle 5Y CDS surged from 43 bps to its highest level since the 2008&#8211;09 financial crisis. <a href="https://www.roic.ai/news/oracles-5-year-cds-surges-to-139-bps-signaling-heightened-credit-risk-perception-amid-strong-fundamentals-12-11-2025">ROIC.ai</a> reported 139 bps on December 11, 2025, citing S&amp;P Global Market Intelligence.</p><p>[12] <a href="https://bondblox.com/news/oracles-5y-cds-spread-jumps-to-highest-level-in-3-years">BondBloX</a>, January 2026: approximately $105 billion in total debt, including $95 billion in corporate bonds. <a href="https://www.ainvest.com/news/oracle-cds-spreads-systemic-risks-ai-driven-tech-debt-2512/">ainvest.com</a> (December 2025) cites $108 billion. 
<a href="https://winbuzzer.com/2026/03/24/jpmorgan-launches-cds-basket-hedge-ai-debt-risk-xcxwbn/">Winbuzzer</a> (citing Fortune, March 24, 2026) reports $124 billion in current borrowings &#8212; single source, not independently confirmed. Growth trajectory is directionally confirmed across all three measurements.</p><p>[13] S&amp;P revised Oracle&#8217;s outlook to negative in July 2025 on expectations of weakening cash flows, per <a href="https://bondblox.com/news/oracles-5y-cds-jumps-to-its-highest-level-since-2009">BondBloX</a>. Moody&#8217;s described Oracle as having the weakest credit metrics among investment-grade hyperscalers.</p><p>[14] <a href="https://www.cnbc.com/2026/01/14/oracle-sued-by-bondholders-over-losses-tied-to-ai-buildout.html">CNBC</a>, <a href="https://finance.yahoo.com/news/oracle-sued-over-disclosures-tied-173229632.html">Reuters</a>, <a href="https://www.bloomberg.com/news/articles/2026-01-15/oracle-sued-over-disclosures-tied-to-18-billion-bond-offering">Bloomberg</a>, January 14&#8211;15, 2026. Ohio Carpenters&#8217; Pension Plan v. Oracle Corp., filed in New York state court. Bondholders allege Oracle failed to disclose plans for $38 billion in additional borrowing when selling $18 billion in bonds on September 25, 2025.</p><p>[15] S&amp;P Dow Jones Indices, citing DTCC data, as reported by <a href="https://finance.yahoo.com/markets/stocks/articles/meta-alphabet-microsoft-added-cdx-182849919.html">Bloomberg</a>. Oracle CDS averages over $830 million in weekly trading volume &#8212; the most liquid investment-grade CDS.</p><p>[16] Goldman Sachs reportedly pitching hedge funds total return swaps on corporate loan price swings. <a href="https://winbuzzer.com/2026/03/24/jpmorgan-launches-cds-basket-hedge-ai-debt-risk-xcxwbn/">Winbuzzer</a>, &#8220;JPMorgan Launches CDS Basket to Hedge AI Debt Risk,&#8221; March 24, 2026, citing Fortune. 
Single secondary source; not independently confirmed at time of publication.</p><p>[17] <a href="https://www.bis.org/publ/qtrpdf/r_qt2603u.htm">BIS Quarterly Review</a>, March 2026. Authors: Egemen Eren, Ingomar Krohn, Karamfil Todorov. Structures involve dedicated vehicles and special-purpose entities capitalised with equity from consortia and raising debt through private placements. Banks support vehicles with funding lines, creating links to insurers and private credit vehicles, and new refinancing pressure and guarantee activation channels.</p><p>[18] Richmond Fed Economic Quarterly, Fall 2003, Alexander L. Wolman, <a href="https://www.richmondfed.org/~/media/richmondfedorg/publications/research/economic_quarterly/2003/fall/pdf/wolman.pdf">&#8220;Boom and Bust in Telecommunications.&#8221;</a> Real private fixed investment in communications equipment surged from $62 billion annually (1996) to over $135 billion (Q4 2000).</p><p>[19] Edward Altman and Gaurav Bana, <a href="https://archive.nyu.edu/bitstream/2451/27147/2/wpa03010.pdf">&#8220;Defaults and Returns on High Yield Bonds: The Year 2002 in Review and the Market Outlook,&#8221;</a> NYU Stern Working Paper, 2003. Telecom accounted for 20.3% of the Merrill Lynch High-Yield Bond Index and 52% of all defaulted dollar amounts in 2002.</p><p>[20] San Francisco Fed Economic Letter 2001-33, <a href="https://www.frbsf.org/research-and-insights/publications/economic-letter/2001/11/rising-junk-bond-yields-liquidity-or-credit-concerns/">&#8220;Rising Junk Bond Yields: Liquidity or Credit Concerns?&#8221;</a> November 2001. Telecom junk bond spreads rose 100&#8211;130% from November 1997 to October 1998 while annualized stock returns for telecom junk issuers averaged +17.8%.</p><p>[21] ISDA Market Survey, year-end 2002. CDS notional grew from $151 billion (mid-year 1997 survey) to $2 trillion (year-end 2002 survey). ISDA credited credit derivatives with having prevented severe distress from telecom bankruptcies. 
CDS documentation standardized by ISDA in 1999.</p><p>[22] FCC Chairman Michael Powell, testimony to Senate Commerce Committee, July 2002. $110 billion in telecom bankruptcies filed over the preceding 18 months. Overall high-yield default rate hit 12.8% in 2002 (Altman, <a href="https://archive.nyu.edu/bitstream/2451/27147/2/wpa03010.pdf">fn 19</a>).</p><p>[23] ABX.HE index launched January 19, 2006, administered by Markit Group. Created by a consortium of 16 dealer desks. Reserve Bank of Australia, <a href="https://www.rba.gov.au/publications/fsr/2008/mar/pdf/box-b.pdf">&#8220;Box B: The ABX.HE Credit Default Swap Indices,&#8221;</a> Financial Stability Review, March 2008.</p><p>[24] Michael Burry closed first CDS trade on subprime MBS with Deutsche Bank on May 19, 2005, for $60 million, after five banks told him they had no idea what he was talking about. Gregory Zuckerman, <em>The Greatest Trade Ever</em> (2009). Burry accumulated over $1 billion in CDS on subprime by October 2005.</p><p>[25] Goldman Sachs Mortgage Department developed &#8220;predominantly pessimistic view&#8221; by mid-2006. Senate Permanent Subcommittee on Investigations, <a href="https://www.hsgac.senate.gov/wp-content/uploads/imo/media/doc/PSI%20REPORT%20-%20Wall%20Street%20&amp;%20the%20Financial%20Crisis-Anatomy%20of%20a%20Financial%20Collapse%20(FINAL%205-10-11).pdf">&#8220;Wall Street and the Financial Crisis: Anatomy of a Financial Collapse,&#8221;</a> April 2011. Also available via <a href="https://fraser.stlouisfed.org/title/wall-street-financial-crisis-anatomy-a-financial-collapse-5094">FRASER/St. Louis Fed</a>.</p><p>[26] Bear Stearns CDS spreads were statistically indistinguishable from the risk-free rate before November 2007; the firm collapsed in March 2008. 
Trabelsi and Hammami, <a href="https://www.sciencedirect.com/science/article/abs/pii/S0264999310001501">&#8220;Tail return analysis of Bear Stearns&#8217; credit default swaps,&#8221;</a> <em>Economic Modelling</em>, 2011.</p>]]></content:encoded></item><item><title><![CDATA[Musk's Chip Gambit]]></title><description><![CDATA[Five companies, one fab, and the single point of failure holding it together]]></description><link>https://www.airealist.ai/p/musks-chip-gambit</link><guid isPermaLink="false">https://www.airealist.ai/p/musks-chip-gambit</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Mon, 23 Mar 2026 11:25:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ecb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ecb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ecb9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!ecb9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!ecb9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!ecb9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ecb9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png" width="728" height="397.09090909090907" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:2259039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191851532?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ecb9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!ecb9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!ecb9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!ecb9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F919f6a5d-d89e-48d7-b675-e8deb90e874f_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On March 14, 2026, Elon Musk posted five words to X: &#8220;Terafab Project launches in 7 days.&#8221; The announcement described a $25 billion semiconductor fabrication facility targeting 2nm process technology &#8212; the most advanced node in commercial production &#8212; with a stated capacity of 100,000 wafer starts per month and an ambition to scale toward one million.[1] If built, it would produce between 100 and 200 billion custom AI chips annually, rivaling the output of TSMC&#8217;s most advanced facilities.[2]</p><p>Twelve years earlier, Tesla couldn&#8217;t modify a supplier&#8217;s image-processing algorithm. Its first Autopilot system ran on Mobileye&#8217;s EyeQ3 &#8212; a chip Tesla didn&#8217;t design, running software Tesla couldn&#8217;t change, processing 110 frames per second in a pipeline Tesla didn&#8217;t control.[3] When Mobileye ended the partnership in July 2016 &#8212; after a fatal crash and a dispute over whether Tesla was pushing beyond the hardware&#8217;s safety envelope &#8212; the automaker had no silicon capability whatsoever.[4] It was, in semiconductor terms, a customer.</p><p>The distance between those two moments is the most aggressive vertical integration trajectory any non-semiconductor company has ever attempted. And the question it raises is not whether the ambition is impressive &#8212; it is &#8212; but whether the structure that produced it can survive contact with the physics of chipmaking, the governance of cross-entity resource allocation, and the paradox of building compute independence on the back of your largest competitor&#8217;s hardware.</p><p>No one else on earth can build this orbit. The question is whether a single focal point can hold it.</p><h2>The orbit</h2><p>To understand what Musk is building, start with what connects the entities. 
Tesla&#8217;s fleet of over 4 million vehicles equipped with vision-based Autopilot hardware generates real-world driving video&#8212;the raw material for training autonomous-driving models.[5] That data flows into Tesla&#8217;s Cortex training cluster at Giga Texas, which ran roughly 50,000 H100 GPUs at the end of 2024 and grew to approximately 81,000 H100-equivalent GPUs by the third quarter of 2025.[6] Separately, xAI&#8217;s Colossus facility in Memphis &#8212; the world&#8217;s largest single-site GPU installation &#8212; runs over 200,000 Nvidia GPUs training Grok, xAI&#8217;s large language model.[7] The models trained on both clusters are deployed on Tesla&#8217;s custom inference chips inside vehicles and, increasingly, inside Optimus robots. Tesla&#8217;s Megapack batteries provide backup power to xAI&#8217;s Memphis data center &#8212; Colossus reportedly consumes up to 150 megawatts.[8] SpaceX, which acquired xAI in February 2026, launches the Starlink satellites that could eventually distribute AI compute globally &#8212; satellites running on custom silicon co-designed with STMicroelectronics over a decade-long partnership that has shipped over five billion RF chips.[9] SpaceX is also building the largest chip packaging facility in North America in Bastrop, Texas, with a $280 million expansion backed by a $17.3 million grant from the Texas Semiconductor Innovation Fund.[10] And Terafab, if realized, would close the final gap: fabrication itself.</p><p>Each entity&#8217;s output is another entity&#8217;s input. Tesla vehicles produce training data. Training clusters produce models. Custom chips deploy models. Deployed models improve vehicles and robots. Vehicles and robots generate more data. Megapacks power the data centers. SpaceX launches the satellites. The packaging fab packages the chips. The proposed Terafab would manufacture them.</p><p>No competitor controls all of these simultaneously. 
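</p><p>That closed loop can be sketched as a tiny directed graph &#8212; a deliberate simplification for illustration, not an exhaustive map of the entities and flows described above:</p>

```python
# Simplified directed graph of the flywheel described above (illustrative only:
# node names and edges are a reduction of the article's description).
flows = {
    "vehicle fleet":     ["real-world data"],
    "real-world data":   ["training clusters"],
    "training clusters": ["models"],
    "models":            ["inference chips"],
    "inference chips":   ["vehicle fleet", "Optimus robots"],
    "Optimus robots":    ["real-world data"],  # robots feed data back too
}

def reachable(graph: dict, start: str) -> set:
    """Every node reachable from `start` by following directed edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# The flywheel property: starting from any node, you can return to it.
assert all(node in reachable(flows, node) for node in flows)
```

<p>Delete any edge &#8212; say, the chips that deploy the models &#8212; and the assertion fails: the cycle, not any single asset, is the thing no competitor has assembled.</p><p>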
Google comes closest with Waymo, DeepMind, TPU infrastructure, and YouTube &#8212; but lacks space assets, energy storage, humanoid robotics, and a consumer vehicle fleet generating real-world data at scale. Amazon has AWS, Zoox, Kuiper, and Trainium &#8212; but far less vertical integration across the physical and digital layers. Microsoft and OpenAI have no hardware manufacturing, no vehicle fleet, and no space infrastructure.</p><p>The chip architecture itself is converging. Tesla&#8217;s roadmap spans seven generations, but two matter most: AI5, the next-generation chip dual-sourced from TSMC on 3nm and Samsung on 2nm, with volume production confirmed by Samsung for H2 2027, designed to serve vehicles, Cybercab robotaxis, and Optimus robots simultaneously; and AI6, covered by a $16.5 billion Samsung contract through 2033, designated for Optimus and data centers.[11] Beyond AI6, a restarted Dojo 3 project (now called AI7) targets &#8220;space-based AI compute&#8221; for SpaceX. The ambition is a unified chip family &#8212; &#8220;excellent for inference and at least pretty good for training,&#8221; as Musk put it &#8212; serving every entity from a shared architecture, with a new design every nine to twelve months.[12]</p><p>The financial architecture matches the ambition. 
Tesla&#8217;s accumulated AI-related capital expenditure &#8212; including infrastructure &#8212; reached approximately $5 billion through the end of 2024, per CFO Vaibhav Taneja; the company&#8217;s 2026 capex guidance, also stated by Taneja on the Q4 FY2025 earnings call, exceeds $20 billion.[13] xAI has raised approximately $45 billion in total funding, including a $20 billion Series E in January 2026 at a $230 billion valuation, from investors including Nvidia, Fidelity, and the Qatar Investment Authority.[14] The SpaceX-xAI combination, completed in February 2026, was valued at $1.25 trillion &#8212; CNBC reported it as the largest corporate combination in history.[15] Tesla invested $2 billion in xAI shortly before the SpaceX acquisition, converting its stake into a minority position in SpaceX.[16]</p><p>Tesla, with a market capitalization of approximately $1.3 to $1.6 trillion, trades at roughly 200 to 300 times trailing earnings &#8212; depending on the measurement date &#8212; versus a legacy automaker median of 7 to 12 times trailing earnings. A Bank of America sum-of-parts analysis from late 2025 attributes only 12 percent of Tesla&#8217;s enterprise value to its core automotive business, with the remainder assigned to robotaxi, Optimus, Full Self-Driving (FSD) software, and energy storage.[17]</p><p>That remaining 88 percent &#8212; roughly $1 to $1.35 trillion in market capitalization &#8212; is the market&#8217;s bet that the silicon strategy will work.[18] The orbit is the substrate beneath the premium.</p><h2>How the orbit was built</h2><p>The trajectory from Mobileye customer to semiconductor fab builder followed a pattern: each successful step created the institutional confidence for the next, harder bet. Each bet was an order of magnitude more difficult than the last. 
And the one failure in the sequence &#8212; Dojo &#8212; revealed where the pattern breaks.</p><p>After the Mobileye breakup, Tesla pivoted to Nvidia&#8217;s Drive PX 2 for Hardware 2 in October 2016. But months before the split went public, Musk had already placed his bet on silicon independence. In January 2016, Tesla hired Jim Keller &#8212; architect of AMD&#8217;s Zen, Apple&#8217;s A4/A5, and DEC Alpha &#8212; as VP of Autopilot Hardware Engineering. Weeks later, Pete Bannon joined from Apple, where he had led development of the A5 through A9 processors.[19] Two of the most decorated chip architects in the industry were hired simultaneously by a car company with no semiconductor program. The FSD chip design team was formed in February 2016. Eighteen months later, in August 2017, the chip taped out. First silicon returned in December 2017, functional but requiring a respin; the production B0 stepping qualified in July 2018.[20]</p><p>Hardware 3 shipped in vehicles from March 2019 &#8212; Samsung 14nm, 260mm&#178; die, 6 billion transistors, two custom neural network accelerators delivering approximately 37 trillion operations per second (TOPS) each for a combined 73 TOPS per chip, with two chips per vehicle board providing both redundancy and a total system throughput of roughly 146 TOPS &#8212; all within the 100-watt-per-chip envelope of the Nvidia board it replaced.[21] Tesla&#8217;s Bannon claimed a 21-fold improvement in image processing over HW2.5 at Autonomy Day in April 2019.[22] Three years from concept to volume production, by a company that had never designed a chip. The accomplishment was genuine and should not be diminished by what followed.</p><p>What followed was the belief that if Tesla could build an inference chip, it could build a training chip. 
The Dojo project, led by former AMD designer Ganesh Venkataramanan, produced the D1 &#8212; a 645mm&#178;, 50-billion-transistor training processor fabricated by TSMC on 7nm, with 354 custom CPU cores, novel tile-based interconnects, and custom floating-point formats.[23] Tesla scaled D1 into increasingly large clusters, reaching over an exaFLOP of compute across 3,000 chips per deployment unit. During testing, a single cabinet drew 2.3 megawatts before tripping a power substation in San Jose.[24] Musk stated plans to invest over a billion dollars in Dojo infrastructure, including a $500 million facility in Buffalo, New York.[25] Morgan Stanley projected that Dojo could add $500 billion to Tesla&#8217;s market capitalization.[26]</p><p>The D1 was technically impressive. It was also, in Musk&#8217;s eventual assessment, &#8220;an evolutionary dead end.&#8221;[27] The problem was not the silicon &#8212; it was the ecosystem. Training chips compete in Nvidia&#8217;s world: CUDA, cuDNN, PyTorch, the entire software stack that makes GPU clusters productive. Dojo required a fully custom software environment with no support for established frameworks. Meanwhile, Tesla was building its Cortex cluster on Nvidia H100s &#8212; and Cortex was delivering results. The Nvidia cluster trained the models that powered FSD v13. Dojo trained ambitions. By mid-2025, approximately twenty core Dojo engineers departed to found DensityAI, a stealth startup building AI infrastructure chips for robotics and autonomous systems.[28] In August 2025, Bloomberg reported that Tesla disbanded the remaining Dojo team. Bannon &#8212; the man who had designed Apple&#8217;s A5, delivered Tesla&#8217;s HW3, and led the chip program since Keller&#8217;s 2018 departure &#8212; left the company.[29]</p><p>The Dojo write-off was the structural pivot &#8212; and, in fairness, the kind of kill decision most organizations cannot make. 
Walking away from a billion-dollar program with technically impressive silicon because the ecosystem math didn&#8217;t close is the opposite of sunk-cost thinking. Musk&#8217;s conclusion was not to abandon custom silicon but to abandon the two-architecture approach. AI5 and AI6 would be designed as a unified chip family: optimized for inference but capable of training when deployed in large clusters. &#8220;It doesn&#8217;t make sense for Tesla to divide its resources and scale two quite different AI chip designs,&#8221; Musk wrote.[30] Dojo 3, announced in January 2026, would use the AI5/AI6 architecture &#8212; not the D1 lineage &#8212; targeting space-based compute for SpaceX.[31]</p><p>The escalation continued, but on a single track instead of two. And the next step was harder, not easier. Dojo failed at the software layer &#8212; building a training ecosystem to rival CUDA. Terafab is a bet on the manufacturing layer, where the barriers are higher still. &#8220;2nm&#8221; is a marketing designation, not a physical gate length; actually building at that node requires either partnering with a foundry that offers access to its process technology or developing proprietary process engineering from scratch. TSMC and Samsung do not license their process technology to third parties &#8212; that is the foundry model&#8217;s entire moat. Intel Foundry Services is the only leading-edge manufacturer to offer something approaching a partnership model for external fabrication, which makes Musk&#8217;s mention of a potential Intel collaboration not one option among three but the only technically plausible path for Terafab at the leading edge.[32]</p><p>The inference-to-training pivot had a parallel in hardware manufacturing. SpaceX opened the largest PCB manufacturing site in the United States at Bastrop, Texas, in 2024, supplying Starlink&#8217;s demand for printed circuit boards at industrial scale. 
Equipment for an advanced chip packaging line &#8212; using fan-out panel-level packaging with 700mm &#215; 700mm substrates, the largest in the industry &#8212; began arriving at Bastrop in September 2025, with small-scale production targeting the third quarter of 2026.[33]</p><p>Then came the Samsung megadeal: $16.5 billion through 2033 for AI6 chip fabrication at Samsung&#8217;s Taylor, Texas facility using 2nm gate-all-around transistors &#8212; Samsung Foundry&#8217;s largest publicly disclosed contract.[34] And then Terafab itself &#8212; Tesla&#8217;s own fab, announced on the Q4 2025 earnings call, with Musk noting that even best-case supplier projections fell short of Tesla&#8217;s projected chip demand within three to four years.[35]</p><p>The pattern traces from Mobileye customer to Nvidia customer to inference chip designer, through the failed training chip attempt and the pivot to a unified chip family, past the $16.5 billion Samsung contract and the SpaceX packaging fab, to the announcement of Tesla&#8217;s own semiconductor fab. 
Each step required capabilities that the previous step didn&#8217;t test.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z_hi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z_hi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 424w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 848w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 1272w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z_hi!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png" width="1200" height="495.3296703296703" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:199498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191851532?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z_hi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 424w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 848w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 1272w, https://substackcdn.com/image/fetch/$s_!Z_hi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e74c151-158e-4bd5-bf94-1cc41b008f0e_2230x921.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The escalation is genuine, the execution is partial, and the distance between where Tesla is (shipping AI4, designing AI5) and where Terafab requires it to be (operating a competitive 2nm fab) is the distance between designing a chip and manufacturing one, which is to say, the distance between architecture and physics.</p><p>Only one company has successfully completed the first half of this transition. Apple went from chip customer to chip designer over 12 years &#8212; the PA Semi acquisition in 2008, A4 in 2010, and M1 in 2020 &#8212; and now designs the most power-efficient consumer silicon on earth under a design lead, Johny Srouji, who has run the program for 18 years.[56] But Apple stopped at design. TSMC fabricates every Apple chip. 
Apple also ships over 230 million iPhones a year &#8212; the amortization base that makes custom silicon economics viable. Tesla&#8217;s Autopilot-equipped fleet is roughly two orders of magnitude smaller today; the economic case for custom silicon depends on Optimus, Cybercab, and xAI inference volumes that do not yet exist &#8212; though Apple&#8217;s own volumes were similarly modest at the A4 stage in 2010. The most successful custom silicon program in history examined the gap between chip design and manufacturing and chose not to cross it. Tesla, with a smaller volume base and a less stable design team, is proposing to cross it.</p><h2>What actually ships</h2><p>The honest assessment requires separating silicon that exists from silicon that has been announced.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8DoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8DoF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 424w, https://substackcdn.com/image/fetch/$s_!8DoF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 848w, https://substackcdn.com/image/fetch/$s_!8DoF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8DoF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8DoF!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png" width="1200" height="544.7802197802198" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:661,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:209708,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191851532?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8DoF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 424w, https://substackcdn.com/image/fetch/$s_!8DoF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 848w, 
https://substackcdn.com/image/fetch/$s_!8DoF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 1272w, https://substackcdn.com/image/fetch/$s_!8DoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1fa0eae-728f-4e99-b57c-f71ac8c60624_2230x1013.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Hardware 3 was real. 
It shipped in millions of vehicles, proved a car company could design competitive inference silicon, and established the institutional capability. Hardware 4 ships today &#8212; Samsung 7nm, with neural network performance of roughly 50 TOPS per system-on-a-chip (SoC), running FSD v13 at native camera resolution; Musk has described it as three to eight times more powerful than its predecessor.[36] These are production chips doing production work. xAI&#8217;s Colossus is real &#8212; 200,000-plus Nvidia GPUs that trained Grok 3, which ranked number one on the LMSys Chatbot Arena at launch.[37] SpaceX&#8217;s Bastrop PCB plant is operational, and the packaging facility is in equipment installation. STMicroelectronics has shipped over five billion Starlink chips, as independently confirmed in an SEC 6-K filing.[9] These are verifiable accomplishments.</p><p>AI5 is not. Every performance claim for AI5 &#8212; 40 to 50 times AI4 performance, 8 times raw compute, 3 times efficiency per watt &#8212; traces to Musk&#8217;s statements on X and earnings calls.[39] No independent benchmark exists. No third-party teardown has been published. The production timeline has slipped three times: originally scheduled for January 2026, revised to the end of 2026, pushed to early 2027, and now confirmed by Samsung&#8217;s foundry president as H2 2027 volume production at the Taylor, Texas facility.[40] The dual-sourcing strategy &#8212; TSMC 3nm and Samsung 2nm producing physically different implementations of the same chip across foundries using different transistor architectures (FinFET versus gate-all-around nanosheet) &#8212; is a genuine act of supply chain sophistication, providing both risk mitigation and supplier leverage that no other automaker has attempted. 
It is also unprecedented for an automaker and exceptionally rare even among dedicated semiconductor companies, requiring parallel validation and qualification processes where the power curves, thermal behavior, and memory bandwidth characteristics differ at the transistor level.[41] AI5 may well deliver. But the orbit&#8217;s multi-trillion-dollar valuation premium rests substantially on a chip that has not been independently tested, whose production date has shifted three times, and whose claimed performance improvement is extraordinary by any industry standard. For comparison, Nvidia&#8217;s generational improvements between H100, H200, and B200 have ranged from roughly 1.5 to 3 times on comparable inference workloads. Tesla is claiming 40-50 times.</p><p>AI6 is a contract. The $16.5 billion Samsung deal is signed and confirmed by multiple independent sources, but it is a manufacturing commitment, not a product.[34] AI6 design is in progress; volume production was projected for mid-2028, but Samsung&#8217;s 2nm multi-project wafer run has slipped approximately six months, pushing realistic volume production to late 2028 or beyond. Terafab is, as of this writing, an earnings call disclosure that became a launch event &#8212; site confirmed (Austin), corporate structure announced (Tesla/SpaceX joint venture), but no process partner, no filed construction permit, and no disclosed engineering plan.[1]</p><p>The gap between what ships and what&#8217;s announced is the silicon version of the Commitment-vs-Spend Gap I&#8217;ve applied to hyperscaler nuclear deals and AI capex elsewhere in this publication.[44] Announced chip performance is to delivered silicon what a hyperscaler&#8217;s nuclear MOU is to grid-connected power: the intention is real, the physics are unforgiving, and the timeline is almost always longer than the press release suggests.</p><h2>The focus</h2><p>Kepler&#8217;s first law: orbits are ellipses with the gravitating body at one focus. 
Not the center &#8212; the focus. Remove the mass at that point, and the orbit dissolves into a straight line. Musk is the gravitational mass at the focus of this orbit. He is the CEO of Tesla, the controlling shareholder of SpaceX (which now owns xAI), and the person who directs resource allocation across all five entities. The chips, the data, the capital, and the talent all curve around him. The structural question is whether a single focal point &#8212; no matter how energetic &#8212; can hold an orbit this large.</p><p>Three risks concentrate at the focus.</p><p><strong>The first is execution</strong>. The orbit&#8217;s value depends on silicon that doesn&#8217;t yet exist at the claimed performance level. Tesla has proven it can design inference chips &#8212; HW3 is the evidence. It has not proven it can design training-grade chips (Dojo failed), manufacture chips at scale (Terafab is an announcement), or deliver a chip at 40 to 50 times its predecessor&#8217;s performance (AI5 is unverified).</p><p>Musk&#8217;s companies have a track record of delivering manufacturing feats that conventional analysis deemed impossible &#8212; reusable rocket boosters, a Gigafactory built in 11 months, a 100,000-GPU data center assembled in 122 days. Semiconductor fabrication is categorically different. The escalation from inference design to fab construction requires capabilities that are not incrementally but fundamentally harder. Intel spent decades and hundreds of billions of dollars building fab capacity, yet it still fell behind TSMC. Samsung&#8217;s Taylor, Texas, facility &#8212; where Tesla&#8217;s AI6 will be manufactured &#8212; has faced yield challenges that have delayed the production of its most advanced nodes.[45] Tesla proposing to build a competitive 2nm fab from scratch is not impossible. It is, however, the hardest thing in manufacturing.</p><p>There is also a physical bottleneck that no amount of capital can accelerate. 
A 2nm fab requires extreme ultraviolet lithography tools manufactured by ASML &#8212; the only company on earth that makes them. Each machine costs over $300 million; ASML&#8217;s order book is filled through 2028.[57] Even Jensen Huang &#8212; who has more reason than anyone to want additional fab capacity in the world &#8212; warned at a TSMC event in November 2025 that building advanced chip manufacturing is &#8220;extremely hard&#8221; and that matching TSMC&#8217;s capabilities is &#8220;virtually impossible.&#8221;[58] Musk&#8217;s response has been characteristically dismissive: in a January 2026 interview, he argued that the semiconductor industry has &#8220;got cleanrooms wrong&#8221; and bet he could eat a cheeseburger and smoke a cigar inside a 2nm fab, proposing &#8220;wafer isolation&#8221; &#8212; sealing wafers in nitrogen-purged micro-environments &#8212; as an alternative to the hyper-sterile factory floors that every leading-edge fab on earth considers non-negotiable.[59] Whether this reflects a genuine insight about wafer containment or a misunderstanding of what 2nm fabrication physically requires is a question the March 21 event left unanswered.</p><p><strong>The second is governance</strong>. Synergies and conflicts of interest are the same resource flows viewed from different angles. 
Tesla sold $430 million in Megapack batteries to xAI in 2025 &#8212; a synergy from Tesla Energy&#8217;s perspective, but a related-party transaction from a shareholder&#8217;s perspective.[46] Musk diverted 12,000 H100 GPUs from Tesla to xAI in late 2023, delaying over $500 million in Tesla shipments &#8212; confirmed by leaked Nvidia internal communications reported by CNBC.[47] Musk&#8217;s defense &#8212; that Tesla&#8217;s Cortex facility wasn&#8217;t ready to receive the GPUs &#8212; may explain the logistics, but it does not address the fiduciary question: GPUs purchased with Tesla capital were redirected to a private entity in which Musk held a controlling personal stake. At least eleven Tesla AI employees migrated to xAI, including the computer vision chief.[48]</p><p>The $2 billion Tesla investment in xAI crystallizes the problem. In November 2025, Tesla shareholders voted on whether to authorize a potential xAI investment: 1,058,999,435 shares in favor, 916,321,296 against, 473,073,200 abstaining. 
Tesla&#8217;s bylaws count abstentions as votes against; the proposal failed.[49] Tesla&#8217;s board approved the investment two months later anyway, under existing board authority.</p><p>The board that approved it is the board where Chancellor McCormick has found &#8220;extensive ties&#8221; to Musk, where Kimbal Musk serves as a director, and where chair Robyn Denholm publicly claimed Tesla and xAI are &#8220;fundamentally different in the AI space&#8221; &#8212; a position rendered untenable five days later when Musk announced Digital Optimus, a formal joint Tesla-xAI project integrating Grok with Tesla hardware.[50] A shareholder lawsuit in Delaware Chancery Court seeks to force Musk to disgorge his entire xAI stake to Tesla.[51] Tesla&#8217;s subsequent reincorporation in Texas and the passage of Texas SB 29 &#8212; imposing a three percent ownership threshold for derivative lawsuits, approximately $40 to $50 billion in Tesla stock at recent prices &#8212; raised the barriers to shareholder litigation precisely as cross-entity flows accelerated.[52] At the same meeting where the xAI investment proposal failed, a shareholder proposal to repeal the three percent threshold was defeated by a three-to-one margin &#8212; 611 million shares for repeal versus 1.82 billion against.[49]</p><p>The operational management of each entity is distributed. SpaceX runs under Gwynne Shotwell. Tesla has a full executive team. The orbit functions without Musk personally directing daily operations. The governance risk is not operational &#8212; it is allocative. Capital allocation, strategic direction, and the cross-entity resource flows that determine where GPUs ship, where talent migrates, and where shareholder money goes are directed by one person who sits on both sides of every transaction. This is also, it must be said, the structure that built the orbit in the first place. 
No committee would have moved 12,000 GPUs to Memphis in a weekend, redirected Megapacks to power them, or signed a $16.5 billion Samsung contract while simultaneously announcing a competing fab. The speed and audacity are features of concentrated control, not accidents. The question is whether the same structure that enables a $1.25 trillion combination also produces the accountability that $1.25 trillion requires &#8212; and the shareholder vote override, the SB 29 threshold, and the three-to-one defeat of the repeal proposal suggest the answer is: not yet.</p><p>On March 21, the answer evolved but did not resolve. Musk took the stage at Austin&#8217;s defunct Seaholm Power Plant to formally launch the Terafab Project &#8212; and SpaceX, not Tesla, made the initial announcement, with Tesla&#8217;s own post following.[60] The project was described as a joint venture between Tesla and SpaceX, with Musk calling it the most &#8220;epic chip building exercise in history by far.&#8221;[60] Then came the allocation: the facility&#8217;s stated goal is to produce over a terawatt of compute per year, with approximately 80 percent of output designated for space applications and 20 percent for ground &#8212; Musk&#8217;s announced target, not yet a contractual commitment, but a signal of where the value is intended to flow. Eighty percent for SpaceX. Twenty percent for Tesla.</p><p>A formal joint venture is a governance instrument: auditable, contractual, with defined capital contributions. That is a structural improvement over the ad hoc resource flows described above. But the allocation ratio tells you where the value flows. Tesla shareholders &#8212; whose board overrode their vote to invest $2 billion in xAI, which SpaceX then acquired &#8212; are now co-funding a semiconductor facility that will produce four-fifths of its output for a company they do not own shares in. The JV institutionalizes the cross-entity structure. It does not independently govern it. 
And the 80/20 split tells you which entity the orbit was built to serve.</p><p><strong>The third is the Nvidia dependency paradox</strong>. Musk&#8217;s silicon strategy exists to achieve compute independence from Nvidia. The supplier it aims to replace currently provides most of the computing power on which the orbit runs. Tesla and xAI combined spend an estimated $24 to $26 billion annually on Nvidia GPUs and related infrastructure &#8212; a figure that includes data center construction and networking, not GPU procurement alone.[53] xAI&#8217;s Colossus runs on Nvidia. Tesla&#8217;s Cortex runs on Nvidia. Custom silicon (AI5) won&#8217;t reach volume production until mid-2027 at the earliest. xAI&#8217;s rumored custom ASIC with Broadcom &#8212; codenamed X1 &#8212; is in early development.[54]</p><p>The dependency is temporary by design &#8212; building the replacement is the whole point of the silicon strategy. But the transition window has two dimensions. The first is whether AI5 ships on time. The second is whether Nvidia&#8217;s own roadmap &#8212; H100 to B200 to GB300, each generation compressing the performance gap that custom silicon is designed to exploit &#8212; narrows the advantage before Tesla&#8217;s chips reach production. If AI5 slips again, or if Nvidia deprioritizes Musk&#8217;s entities during a supply crunch &#8212; and the GPU diversion scandal shows resource conflicts have already occurred, albeit flowing in the other direction &#8212; the orbit loses momentum precisely when it needs maximum compute to train the models that justify the valuation premium.</p><h2>The mirror</h2><p>The parallel to &#8220;<a href="https://www.airealist.ai/p/open-source-closed-orbit">Open Source, Closed Orbit</a>&#8221; &#8212; this publication&#8217;s analysis of Nvidia&#8217;s ecosystem strategy &#8212; is structural, not superficial. 
That piece described Nvidia&#8217;s developer ecosystem as a black hole: centripetal, routing all gravity back to Nvidia hardware, converting open-source community adoption into hardware lock-in through an eight-domain replication strategy. The community thought it was building freedom; it was building capture.[55]</p><p>Musk&#8217;s orbit is a different shape but shares the same structural vulnerability. Nvidia&#8217;s black hole depends on CUDA &#8212; remove CUDA compatibility, and the ecosystem unravels. Musk&#8217;s orbit depends on the focal point &#8212; remove or overload the single person directing resources across five entities, and the orbit dissolves. Both are closed systems that derive their power from a single irreplaceable element. Nvidia&#8217;s is a software ecosystem. Musk&#8217;s is a person.</p><p>The deeper parallel is this: both systems are simultaneously the strongest competitive position in their domain and the most concentrated risk. Nvidia&#8217;s CUDA lock-in makes it nearly impossible for competitors to attract developers &#8212; and nearly impossible for Nvidia to evolve past CUDA if a better paradigm emerges. Musk&#8217;s cross-entity orbit makes it nearly impossible for competitors to assemble the same vertical stack &#8212; and nearly impossible to institutionalize the orbit beyond one person&#8217;s attention span, capital allocation decisions, and tolerance for related-party governance risk.</p><h2>What breaks</h2><p>The orbit-and-focus framework produces specific falsifiability conditions.</p><p>The orbit thesis strengthens if AI5 ships on time and delivers independently verified performance within the range of Musk&#8217;s claims &#8212; specifically, if a third-party benchmark confirms that performance exceeds 1,500 TOPS in production silicon by mid-2027. 
It strengthens further if Terafab breaks ground with a credible construction timeline and process partner, and if SpaceX&#8217;s Bastrop packaging facility reaches volume production on its stated Q1 2027 schedule. Each delivered milestone compresses the gap between announced ambition and operational reality.</p><p>The focus thesis strengthens if AI5 slips again, if Terafab remains an announcement without filed construction permits by the end of 2026, or if the Delaware Chancery lawsuit yields discovery revealing the scale of inter-company transfers the board did not independently evaluate. It strengthens decisively if Musk&#8217;s attention fragments further &#8212; a real risk given that he simultaneously serves as CEO of Tesla, controls SpaceX (which now includes xAI), owns X, and has intermittently led the Department of Government Efficiency. The orbit is built around a person, not an institution, and whether it can be institutionalized is a question the current governance structure is not designed to answer.</p><p>The two theses don&#8217;t yield a clear verdict because they describe the same structure from opposite angles. The orbit is real: no competitor can assemble this combination of fleet data, training compute, custom silicon, edge deployment, energy infrastructure, satellite connectivity, and packaging capability under coordinated control. The focus is real: every element of that combination depends on chips that haven&#8217;t been independently verified, governance that hasn&#8217;t been independently tested, and compute independence that hasn&#8217;t been achieved.</p><p>March 21 told us something. Musk delivered a site &#8212; Austin &#8212; and a corporate structure &#8212; a Tesla/SpaceX/xAI joint venture.[61] He described an &#8220;advanced technology fab&#8221; with equipment to make and test any kind of chip, starting small before scaling to the mega-facility. That is more than a rendering and a target date. 
It is also less than a process partner, a filed construction permit, or a disclosed engineering plan. Bloomberg noted that Musk &#8220;has no background in semiconductor production and a history of over-promising on goals and timelines.&#8221; The stepping-stone approach &#8212; build a process development fab first, learn, then scale &#8212; is how credible semiconductor companies operate, and it is a more realistic path than the hundred-thousand-wafer-starts-per-month announcement suggested. It also means the timeline to volume production extends further, which lengthens the Nvidia dependency window the piece has described.</p><p>The structural question the event could not answer is the one the orbit poses by its own design: whether the most ambitious vertical integration play in the history of the technology industry can survive the single point of failure it was built around. Tesla went from a company that couldn&#8217;t modify a Mobileye algorithm to a company that proposes to manufacture its own 2nm semiconductors &#8212; and allocate 80 percent of the output to space. The orbit that connects those two points is extraordinary. The focus that holds it together is one person, five companies, and an allocation ratio that tells you which company the orbit was really built for.</p><div><hr></div><h3>Notes</h3><p>[1] <a href="https://x.com/elonmusk/status/2032814398033768737">Musk, X post, March 14, 2026</a>. Tesla first confirmed Terafab on its Q4 FY2025 earnings call on January 28, 2026. CFO Vaibhav Taneja acknowledged cost was &#8220;not yet incorporated&#8221; into the $20B+ 2026 capex guidance. The $25B estimate and specific output figures (100&#8211;200B chips, 100K wafer starts/month scaling to 1M) derive from Musk statements across the earnings call, shareholder meetings, and X posts &#8212; not engineering plans or SEC filings.</p><p>[2] TSMC's total output comparison is approximate. 
TSMC&#8217;s monthly wafer starts across all nodes exceeded 1.3 million as of 2024. Tesla&#8217;s 1M wafer target would approach roughly three-quarters of that total output, concentrated in a single US facility.</p><p>[3] Tesla Autopilot hardware history: <a href="https://en.wikipedia.org/wiki/Tesla_Autopilot#Hardware">Wikipedia, &#8220;Tesla Autopilot hardware,&#8221;</a> accessed March 2026. HW1 used Mobileye EyeQ3, processing ~110 fps. Vehicles manufactured after September 2014.</p><p>[4] Mobileye ended its partnership in July 2016. Amnon Shashua: Tesla &#8220;was pushing the envelope in terms of safety.&#8221; Tesla claimed Mobileye attempted to block in-house vision development. <a href="https://www.cnbc.com/2016/07/26/mobileye-ends-partnership-with-tesla.html">CNBC</a>, Bloomberg, Consumer Reports, July&#8211;September 2016.</p><p>[5] Tesla&#8217;s cumulative vehicle deliveries exceed 7 million through Q1 2026. &#8220;Over four million&#8221; refers to vehicles equipped with HW3 or later vision-based Autopilot hardware capable of generating training data, not total deliveries. Earlier HW1/HW2 vehicles generate less usable training data due to sensor limitations.</p><p>[6] Tesla Q4 FY2025 earnings call, January 28, 2026; shareholder deck. Cortex deployment is described as &#8220;roughly 50,000 H100 GPUs&#8221; at Q4 FY2024, expanded through 2025. 81,000 H100-equivalent figure includes 16,000 H200s per multiple reports. The H100-equivalent conversion is approximate.</p><p>[7] xAI Colossus: 100,000 H100s operational September 2024 (122-day build); doubled to 200,000 by early 2025; 150K H100 + 50K H200 + 30K GB200 as of mid-2025. Sources: CNBC, Data Center Dynamics, R&amp;D World, <a href="https://x.ai/colossus">xAI official page</a>.
xAI claims &#8220;over 1 million H100-equivalent compute&#8221; as of January 2026 &#8212; vendor-claimed.</p><p>[8] Tesla Megapack sales to xAI: $430 million in 2025, per multiple reports, including Electrek and financial news aggregators. Described as 3.4% of Tesla Energy revenue.</p><p>[9] STMicroelectronics: &#8220;over 5 billion&#8221; RF antenna chips shipped to SpaceX. Source: STMicroelectronics SEC 6-K filing and press release, 2025. Decade-long partnership. Current rate: 5 million chips per day, projected to double by 2027. Yahoo Finance, <a href="https://semiwiki.com/general/346543/stmicroelectronics-shipped-over-5-billion-chips-for-spacex-starlink/">SemiWiki</a>.</p><p>[10] <a href="https://gov.texas.gov/news/post/governor-abbott-announces-spacex-expansion-in-bastrop-county">Texas Governor&#8217;s Office press release, March 12, 2025</a>: $17.3 million Texas Semiconductor Innovation Fund grant to SpaceX for $280 million expansion of Bastrop facility. One million additional square feet, 400+ jobs, including PCB production, semiconductor failure analysis lab, and advanced packaging.</p><p>[11] Tesla chip roadmap: AI4 (Samsung 7nm, shipping since January 2023), AI4.5 (three-SoC transitional), AI5 (TSMC 3nm + Samsung 2nm, H2 2027 per Samsung Foundry President Han Jin-man at Samsung shareholders&#8217; meeting, March 18, 2026; Reuters), AI6 ($16.5B Samsung contract, 2nm GAA, volume late 2028 or beyond &#8212; Samsung&#8217;s 2nm MPW run postponed ~6 months per The Elec, March 2026; Electrek), AI7/Dojo 3 (restarted January 2026 for space compute). 
Sources: <a href="https://www.investing.com/news/stock-market-news/samsung-elec-plans-to-produce-tesla-chips-starting-late-2027-4567386">Reuters</a>, The Elec/<a href="https://electrek.co/2026/03/12/tesla-ai6-chip-delayed-samsung-2nm-production-slips/">Electrek</a>, <a href="https://www.trendforce.com/news/2025/11/05/news-musk-signals-tesla-ai5-mass-production-delay-to-2027-casting-uncertainty-over-samsung/">TrendForce</a>, <a href="https://en.wikipedia.org/wiki/Tesla_Autopilot#Hardware">Wikipedia &#8220;Tesla Autopilot hardware&#8221;</a>.</p><p>[12] Musk, X post, August 2025: &#8220;The Tesla AI5, AI6, and subsequent chips will be excellent for inference and at least pretty good for training.&#8221; Iteration cadence for new chip design: every 9&#8211;12 months, per Musk, reported by Teslarati and multiple outlets, late 2025.</p><p>[13] Tesla AI capex: CFO Taneja, Q4 FY2024 earnings call (January 2025): &#8220;accumulated AI-related capital expenditures, including infrastructure, so far have been approximately $5 billion.&#8221; This is management&#8217;s characterization of how total capex was allocated, not an audited line item &#8212; <a href="https://www.sec.gov/Archives/edgar/data/1318605/000162828025003063/tsla-20241231.htm">Tesla&#8217;s 10-K</a> (filed January 29, 2025, SEC EDGAR CIK 0001318605) reports total capital expenditures of $11.34B for FY2024 but does not separately itemize AI-related spending. 2026 capex guidance of $20B+ was stated by Taneja on the Q4 FY2025 earnings call, January 28, 2026. B-tier for AI allocation; A-tier for total capex.</p><p>[14] xAI funding: Series E raised $20B at $230B valuation, per <a href="https://x.ai/news/series-e">xAI official announcement</a> and CNBC, January 2026. Investors include Nvidia, Fidelity, Qatar Investment Authority, and MGX (Abu Dhabi). Total funding is approximately $45B, including ~$5B debt facility (Morgan Stanley). 
Revenue estimated at ~$3.8B ARR end-2025 (consolidated with X) per Sacra and analyst estimates &#8212; private company, no filings.</p><p>[15] <a href="https://www.cnbc.com/2026/02/03/musks-xai-spacex-combo-biggest-merger-all-time.html">CNBC, February 3, 2026</a>: &#8220;Musk&#8217;s xAI, SpaceX combo is the biggest merger of all time, valued at $1.25 trillion.&#8221; All-stock acquisition of xAI by SpaceX.</p><p>[16] Tesla $2B xAI investment: confirmed January 2026, per <a href="https://electrek.co/2026/01/16/tesla-invests-2-billion-in-xai/">Electrek</a>, CNBC, FinancialContent. Tesla&#8217;s stake was converted to a minority position (&lt;1%) in SpaceX following the SpaceX-xAI acquisition. FTC clearance confirmed per Analytics Insight.</p><p>[17] Bank of America&#8217;s sum-of-parts analysis, late 2025, was reported across multiple financial news sources, including Investing.com. Breakdown: automotive 12%, robotaxi 45%, Optimus 19%, FSD software 17%, energy storage 6%. Tesla&#8217;s P/E ratio is approximate and varies significantly with share price; the range of ~200&#8211;300&#215; trailing earnings reflects fluctuation across Q4 2025&#8211;Q1 2026. Legacy automaker median P/E of 7&#8211;12&#215; per analyst coverage.</p><p>[18] The implied AI/autonomy premium is the author&#8217;s calculation: market cap minus a fundamental DCF valuation of the automotive business, estimated at roughly $200&#8211;250 billion based on peer automotive multiples applied to Tesla&#8217;s auto revenue.</p><p>[19] Keller was hired in January 2016 from AMD; Bannon was hired in February 2016 from Apple. Both reported by <a href="https://electrek.co/2016/02/28/tesla-hires-chip-architect-jim-keller/">Electrek</a> (exclusive, February 28, 2016) and 9to5Mac. Keller: architect of AMD Zen, Apple A4/A5, DEC Alpha. Bannon: led Apple A5&#8211;A9 development.
The FSD chip design team was formed in February 2016, per WikiChip.</p><p>[20] <a href="https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip">WikiChip, &#8220;FSD Chip &#8211; Tesla&#8221;</a>: tape-out August 2017, first silicon December 2017, &#8220;fully working.&#8221; &#8220;A number of additional modifications were done to the design, requiring respinning.&#8221; B0 stepping released to manufacturing in April 2018, full production after qualification in July 2018.</p><p>[21] HW3 specifications: Samsung 14nm, Austin, TX fab, 260mm&#178;, 6 billion transistors, 12 ARM Cortex-A72 at 2.6 GHz, 1 GHz Mali GPU, two custom neural network accelerators (NPUs) at 36.86 TOPS each = 73.7 TOPS per chip. Two chips per vehicle board for redundancy, providing ~144 TOPS system throughput. 100W max power per chip. WikiChip; Wikipedia &#8220;Tesla Autopilot hardware&#8221;; <a href="https://fuse.wikichip.org/news/2707/inside-teslas-neural-processor-in-the-fsd-chip/">WikiChip Fuse, &#8220;Inside Tesla&#8217;s Neural Processor in the FSD Chip,&#8221; September 2019</a>.</p><p>[22] Bannon, Tesla Autonomy Day, April 22, 2019: claimed 21&#215; improvement in image processing (2,300 fps vs. 110 fps for HW2.5). Vendor-claimed. Hexus.net, WikiChip.</p><p>[23] Dojo D1: TSMC 7nm, 645mm&#178;, 50 billion transistors, 354 CPU cores, custom ISA (RISC-V + proprietary). CFloat8/CFloat16 formats published in the October 2021 whitepaper. <a href="https://en.wikipedia.org/wiki/Tesla_Dojo">Wikipedia &#8220;Tesla Dojo&#8221;</a>; WikiChip.</p><p>[24] Dojo scaling architecture: Training Tile = 25 D1 chips in 5&#215;5 array, 9 petaFLOPS at BF16, 11 GB SRAM, 36 TB/sec bandwidth, 15 kW (288A at 52V). Six tiles per System Tray (with 512 x86 host cores); two trays per Cabinet; 10 Cabinets per ExaPOD (3,000 D1 chips, &gt;1 exaFLOP). Cabinet tripped the 2.3 MW substation in San Jose, per the AI Day 2022 presentation. 
<a href="https://en.wikipedia.org/wiki/Tesla_Dojo">Wikipedia &#8220;Tesla Dojo&#8221;</a>.</p><p>[25] Musk, Q2 2023 earnings call (Bloomberg, July 19, 2023): &#8220;We will be spending well over $1 billion on Dojo&#8221; through end-2024. This is a stated investment plan, not a confirmed expenditure. CFO Kirkhorn clarified: split between R&amp;D and capex, in line with the three-year expense outlook. Buffalo $500M investment: <a href="https://techcrunch.com/2024/01/26/tesla-is-spending-500-million-on-supercomputer/">TechCrunch, January 26, 2024</a>; confirmed at New York governor press conference. The Register, July 21, 2023; Fortune, December 7, 2023.</p><p>[26] Morgan Stanley, September 2023: projected Dojo could add $500 billion to Tesla's market cap via robotaxi and software revenue. Analyst estimate, B-tier.</p><p>[27] Musk, X post, August 11, 2025: &#8220;Once it became clear that all paths converged to AI6, I had to shut down Dojo and make some tough personnel choices, as Dojo 2 was now an evolutionary dead end.&#8221; <a href="https://techcrunch.com/2025/08/11/tesla-shuts-down-dojo-team/">TechCrunch</a>, Electrive, eWeek.</p><p>[28] DensityAI: co-founded by Ganesh Venkataramanan, Bill Chang, and Ben Floering. Reuters reported ~20 departures. <a href="https://techcrunch.com/2025/08/15/densityai-stealth-startup-tesla-dojo-engineers/">TechCrunch</a>, Bloomberg, eWeek, August 2025.</p><p>[29] <a href="https://www.bloomberg.com/news/articles/2025-08-07/tesla-disbands-dojo-team">Bloomberg, August 7, 2025</a>: Tesla disbanded the Dojo team, Pete Bannon departing. 
Confirmed by CNBC and TechCrunch.</p><p>[30] Musk, X post, August 2025: &#8220;It doesn&#8217;t make sense for Tesla to divide its resources and scale two quite different AI chip designs.&#8221; eWeek, TechCrunch, Electrive.</p><p>[31] Musk, January 2026: Dojo 3 restarted, built on the AI5/AI6 chip family for &#8220;space-based AI compute.&#8221; <a href="https://techcrunch.com/2026/01/20/musk-restarts-dojo-3-space-compute/">TechCrunch, January 20, 2026</a>; TechSpot; Teslarati.</p><p>[32] &#8220;2nm&#8221; is a marketing designation used across the semiconductor industry that does not correspond to a physical transistor dimension. TSMC and Samsung operate as pure-play foundries: they manufacture chips designed by customers but do not license their process design rules or process technology to third parties. Intel Foundry Services (IFS) is the only leading-edge manufacturer to offer external fab partnerships, though IFS has struggled with utilization and its own process delays. A Terafab using Intel process technology would be a licensed fab; a Terafab developing proprietary process technology would be historically unprecedented for a company outside the foundry industry. Sources: semiconductor industry structure is well-established; Musk Intel comments per Reuters, Data Center Dynamics; Digitimes reporting notes Musk has attracted &#8220;senior experts from TSMC, Intel, Samsung.&#8221;</p><p>[33] SpaceX FOPLP facility: equipment delivery began in September 2025, installation in Q1 2026, small-scale production in Q3 2026, and large-scale production in Q1 2027. 700mm &#215; 700mm substrates &#8212; the largest in the industry. Sources: Digitimes via <a href="https://www.tomshardware.com/tech-industry/semiconductors/spacex-foplp-bastrop-texas">Tom&#8217;s Hardware</a>, SemiWiki, SmBom, GlobalSMT. 
SpaceX&#8217;s Bastrop PCB plant (the largest in the US) has been operational since 2024.</p><p>[34] Samsung-Tesla $16.5B deal: Bloomberg, <a href="https://www.cnn.com/2025/07/28/business/tesla-samsung-chip-deal">CNN</a>, TechCrunch, <a href="https://www.kedglobal.com/korean-chipmakers/newsView/ked202507280001">KED Global</a>, July 28, 2025. Through December 2033. 2nm GAA with high-NA EUV at Taylor, TX. Musk stated the figure is &#8220;just the bare minimum.&#8221;</p><p>[35] Musk, Tesla annual shareholders meeting, 2025: &#8220;Even when we extrapolate the best-case scenario for chip production from our suppliers, it&#8217;s still not enough.&#8221; Terafab was confirmed on the Q4 FY2025 earnings call. Reuters, Data Center Dynamics, multiple reporting.</p><p>[36] AI4/HW4: Samsung 7nm (Hwasung, South Korea). Neural network accelerator performance of ~50 TOPS per SoC; Musk described HW4 as &#8220;three to eight times more powerful&#8221; than HW3 (vendor-claimed). 20 CPU cores per side at 2.35 GHz. 16 GB RAM, 256 GB storage &#8212; double and quadruple HW3, respectively. FSD v13 runs at native camera resolution on AI4 per reporting as of March 2026. <a href="https://en.wikipedia.org/wiki/Tesla_Autopilot#Hardware">Wikipedia &#8220;Tesla Autopilot hardware&#8221;</a>; AutoPilot Review HW4 teardown, August 2023.</p><p>[37] Grok 3: trained on 200K H100s, ~200M GPU-hours, 15&#215; Grok 2 compute; #1 LMSys Chatbot Arena at launch (February 2025). Grok 4 training estimated at ~246M H100-hours, ~$490M cost. Epoch AI, R&amp;D World.</p><p>[39] AI5 performance claims &#8212; 40&#8211;50&#215; AI4, 8&#215; compute, 9&#215; memory, 5&#215; bandwidth, 3&#215; efficiency, 800W &#8212; all trace to Musk X posts and earnings calls. GlobalChinaEV, Teslarati, Tesery reporting on Musk statements.
No independent verification exists as of March 2026.</p><p>[40] AI5 timeline slippage: originally January 2026 (Musk, annual meeting June 2024), revised to end-2026 (Q2 2025 earnings call), pushed to early 2027 (Musk X post, January 2026). Samsung Foundry President Han Jin-man confirmed volume production at Taylor, Texas, in &#8220;the second half of next year&#8221; (i.e., H2 2027) at Samsung&#8217;s shareholders&#8217; meeting, March 18, 2026 (Reuters). This is the first independent confirmation from the foundry side of the slipped timeline. <a href="https://en.wikipedia.org/wiki/Tesla_Autopilot#Hardware">Wikipedia &#8220;Tesla Autopilot hardware&#8221;</a>; <a href="https://www.trendforce.com/news/2025/11/05/news-musk-signals-tesla-ai5-mass-production-delay-to-2027-casting-uncertainty-over-samsung/">TrendForce, November 2025</a>; Reuters, March 18, 2026.</p><p>[41] AI5 dual-sourcing: TSMC 3nm (Arizona) and Samsung 2nm (Taylor, TX). <a href="https://www.trendforce.com/news/2025/10/23/news-tesla-ai5-production-split-between-samsung-tsmc-musk-cites-samsungs-advanced-equipment/">TrendForce, October 2025</a>: &#8220;AI5 production split between Samsung, TSMC.&#8221; Multi-foundry qualification is practiced by major semiconductor companies (Apple has used multiple TSMC nodes for different products; Qualcomm has split Snapdragon between Samsung and TSMC), but producing physically different implementations of a single chip design across two foundries with different process architectures is exceptionally complex.</p><p>[44] The Commitment-vs-Spend Gap is an analytical framework developed in this publication, applied to hyperscaler AI capex in &#8220;Chip and Mortar&#8221; and &#8220;Compute Equals Commitments,&#8221; and to nuclear energy announcements in &#8220;The Half-Life of a Press Release&#8221; (forthcoming).
The gap measures the distance between announced investment commitments and verified capital expenditure or operational deployment.</p><p>[45] Samsung Taylor, TX, yield challenges: widely reported across the semiconductor industry press. Samsung&#8217;s advanced process nodes at Taylor have experienced yield issues, affecting customer timelines. <a href="https://www.trendforce.com/news/2025/11/05/news-musk-signals-tesla-ai5-mass-production-delay-to-2027-casting-uncertainty-over-samsung/">TrendForce, November 2025</a>: &#8220;Musk signals Tesla AI5 mass production delay to 2027, casting uncertainty over Samsung.&#8221;</p><p>[46] See note 8.</p><p>[47] <a href="https://www.cnbc.com/2024/06/04/elon-musk-told-nvidia-to-ship-ai-chips-reserved-for-tesla-to-x-xai.html">CNBC, June 2024</a>: leaked Nvidia internal emails confirmed Musk directed 12,000 H100 GPUs from Tesla to xAI. Musk&#8217;s response: Tesla &#8220;had no place to send the Nvidia chips&#8221; because the Cortex facility wasn&#8217;t ready. Data Center Dynamics, Fox Business, Yahoo Finance.</p><p>[48] At least 11 Tesla AI employees migrated to xAI, including computer vision chief Ethan Knight. <a href="https://techcrunch.com/2024/06/05/tesla-ai-employees-migrated-to-xai/">TechCrunch</a>, Electrek, multiple reporting.</p><p>[49] Tesla 8-K, filed November 7, 2025 (SEC EDGAR, CIK 0001318605), Item 5.07, Proposal 7: &#8220;Shareholder proposal regarding Board authorization of an investment in x.AI Corp.&#8221; For: 1,058,999,435. Against: 916,321,296. Abstained: 473,073,200. Broker Non-Votes: 302,456,274. Filing states: &#8220;Since our bylaws generally consider abstention as votes against, this was not approved under the bylaw standard.&#8221; The same filing shows Proposal 10 &#8212; a shareholder proposal to repeal the 3% derivative suit ownership threshold &#8212; was defeated 611,152,245 to 1,821,038,859. Tesla&#8217;s board approved the $2B xAI investment in January 2026 under existing board authority. 
<a href="https://www.sec.gov/Archives/edgar/data/0001318605/000110465925108507/tm2530590d1_8k.htm">SEC EDGAR</a></p><p>[50] Chancellor McCormick: found &#8220;extensive ties&#8221; between Tesla board members and Musk (Tornetta v. Musk compensation case). Robyn Denholm stated Tesla and xAI are &#8220;fundamentally different in the AI space.&#8221; Kimbal Musk serves on Tesla's board of directors. Digital Optimus / &#8220;Macrohard&#8221; joint Tesla-xAI project announced March 11, 2026 &#8212; formally integrates Grok as &#8220;System 2&#8221; reasoning layer with Tesla hardware &#8220;System 1.&#8221; <a href="https://www.cnbc.com/2026/03/11/musk-announces-digital-optimus-joint-tesla-xai-project.html">CNBC</a>, Electrek.</p><p>[51] Cleveland Bakers and Teamsters Pension Fund v. Musk, Delaware Court of Chancery, filed June 2024. Remedy sought: disgorgement of Musk&#8217;s xAI stake to Tesla. <a href="https://techcrunch.com/2024/06/12/tesla-shareholder-sues-musk-xai-stake/">TechCrunch</a>, Corporate Board Member. Senator Elizabeth Warren sent a ten-page letter to Tesla board chair Robyn Denholm in August 2024 describing the GPU diversion as a &#8220;glaring conflict of interest&#8221; and requesting an investigation. Warren Senate website; Fortune.</p><p>[52] Tesla reincorporated in Texas, June 2024. Texas SB 29 (signed May 2025): imposes 3% ownership threshold for filing derivative lawsuits, up from the near-zero threshold under Delaware law. The practical effect: most institutional and retail shareholders cannot meet the threshold individually, significantly raising barriers to fiduciary duty litigation. CNBC, <a href="https://www.bloomberg.com/news/articles/2024-06-13/tesla-reincorporates-in-texas">Bloomberg Law</a>.</p><p>[53] Combined Nvidia GPU and infrastructure spend estimate: Tesla ~$3&#8211;4B/year (Cortex H100/H200 procurement and data center construction); xAI ~$18B+ for Colossus 2, including facility construction, networking, and GPU procurement. 
Combined range of $24&#8211;26B is the author&#8217;s estimate and includes infrastructure costs, not GPU procurement alone. This would make the combined Musk entities collectively among Nvidia&#8217;s largest customers.</p><p>[54] xAI custom ASIC &#8220;X1&#8221;: reported by Digitimes and TweakTown; Broadcom reportedly won the project for LLM training. Samsung is in contention as a fab partner. <a href="https://www.datacenterdynamics.com/en/news/xai-custom-asic-broadcom-x1/">Data Center Dynamics</a>. Not officially confirmed by xAI.</p><p>[55] &#8220;Open Source, Closed Orbit: The Hardware Monopolist&#8217;s Guide to Owning Open Source,&#8221; <a href="https://www.airealist.ai/">The AI Realist</a>, 2025. 6,283 words, 100 footnotes. Central framework: Nvidia&#8217;s &#8220;black hole&#8221; model (centripetal, routing to hardware) vs. Hugging Face&#8217;s &#8220;sun&#8221; model (centrifugal, hardware-agnostic).</p><p>[56] Apple Silicon timeline: PA Semi acquired April 2008 ($278M). A4 shipped in iPhone 4, June 2010. <a href="https://www.apple.com/newsroom/2020/11/apple-unleashes-m1/">M1 shipped in November 2020</a>. Johny Srouji joined Apple in 2008, became SVP Hardware Technologies in 2015, and has led the silicon program continuously since its inception. Apple shipped approximately 232 million iPhones in 2024 (IDC estimate). TSMC fabricates all Apple silicon; Apple has never operated or announced a fabrication facility. Sources: Apple press releases, IDC Quarterly Mobile Phone Tracker, <a href="https://en.wikichip.org/wiki/apple/a4">WikiChip</a> for chip specifications.</p><p>[57] ASML is the sole manufacturer of EUV lithography tools required for 2nm fabrication. EUV machines cost $300M+ each; the next-generation High-NA EUV (TWINSCAN EXE:5200B) costs approximately $350M. ASML&#8217;s Q4 2025 net bookings hit &#8364;13.2B, doubling expectations; 2026 revenue guidance &#8364;34&#8211;39B. Multiple industry sources report EUV delivery slots booked through 2028. 
Tom&#8217;s Hardware reader comments and industry analysts note this as the binding physical constraint on new fab capacity. Sources: ASML Q4 2025 earnings; <a href="https://www.fool.com/investing/2026/02/02/asml-secret-weapon-memory-chips-massive-growth/">Motley Fool, February 2, 2026</a>; Tom&#8217;s Hardware, March 14, 2026.</p><p>[58] Jensen Huang, TSMC event, November 2025: &#8220;Building advanced chip manufacturing is extremely hard. It is not just building the plant, but the engineering, the science, and the artistry of doing what TSMC does for a living is extremely hard.&#8221; He told reporters that matching TSMC&#8217;s capabilities is &#8220;virtually impossible.&#8221; Source: <a href="https://electrek.co/2026/03/16/teslas-terafab-chip-fab-ambitions-ignore-its-total-lack-of-semiconductor-experience/">Electrek, March 16, 2026</a>, citing Huang&#8217;s November 2025 remarks.</p><p>[59] Musk, interview on Moonshots with Peter Diamandis, January 6, 2026: &#8220;I think they are getting clean rooms wrong in these modern fabs. I am going to make a bet here that Tesla will have a 2nm fab, and I can eat a cheeseburger and smoke a cigar in the fab.&#8221; His thesis: wafers should be sealed in nitrogen-purged micro-environments (&#8220;wafer isolation&#8221;) throughout the production line, making the hyper-sterile factory floor unnecessary. Semiconductor process engineers note that ISO Class 1 cleanrooms allow at most 10 particles &#8805;0.1 &#181;m per cubic meter; a single human breath produces millions; smoking generates billions; and organic contamination damages EUV mirrors and fab chemistry.
Sources: <a href="https://www.tomshardware.com/tech-industry/semiconductors/elon-musk-wants-to-build-a-dirty-fab-that-you-can-smoke-and-eat-cheeseburgers-in-bets-that-tesla-will-turn-the-concept-of-cleanrooms-upside-down">Tom&#8217;s Hardware, January 7, 2026</a>; Wccftech, January 7, 2026; HotHardware, January 8, 2026; Pressvia, January 30, 2026.</p><p>[60] <a href="https://x.com/SpaceX/status/2035519125284380672">SpaceX (@SpaceX), X post, March 21, 2026</a>: &#8220;Announcing TERAFAB: the next step towards becoming a galactic civilization.&#8221; <a href="https://x.com/elonmusk/status/2035506574182199757">Musk (@elonmusk), X post, March 21, 2026</a>: &#8220;Formal announcement of the TERAFAB project, which will be done jointly by @SpaceX and @Tesla, tonight around 8 pm CT. Livestream on &#120143;. The goal is to produce over a TERAWATT of compute per year (logic, memory &amp; packaging) with ~80% for space and ~20% for the ground.&#8221; Note: both the SpaceX announcement and Musk&#8217;s own framing place SpaceX first. The 80/20 allocation ratio is Musk&#8217;s stated target, not a contractual commitment &#8212; but it signals where the facility&#8217;s primary value is intended to flow. 
Musk presented at the defunct Seaholm Power Plant in Austin, calling it the most &#8220;epic chip building exercise in history by far.&#8221; Sources: <a href="https://teslanorth.com/2026/03/21/tesla-and-spacex-unveil-terafab-to-build-a-galactic-ai-infrastructure/">TeslaNorth, March 21, 2026</a>; Bloomberg, March 22, 2026; PANews, March 22, 2026.</p><p>[61] <a href="https://www.bloomberg.com/news/articles/2026-03-22/elon-musk-says-tesla-xai-spacex-terafab-to-start-in-austin">Bloomberg, March 22, 2026</a> (Hyunjoo Jin): &#8220;Musk said his Terafab project &#8212; a grand plan to eventually manufacture his own chips for robotics, artificial intelligence and space data centers &#8212; will be built in Austin and jointly run by Tesla and SpaceX.&#8221; He will &#8220;start off with an &#8216;advanced technology fab&#8217; in Austin that will have all of the equipment necessary to make chips of any kind, and test them.&#8221; Bloomberg notes: &#8220;Musk, who has no background in semiconductor production and a history of over-promising on goals and timelines, had said before that the company will start with a smaller scale fab before moving to a bigger one.&#8221; Separately, <a href="https://www.tomshardware.com/tech-industry/semiconductors/tesla-hiring-semiconductor-fabs-construction-manager-elon-musks-ambitious-terafab-project-begins">Tom&#8217;s Hardware reported March 20, 2026</a>, that Tesla began hiring a Technical Program Manager to &#8220;oversee the whole end-to-end fab program,&#8221; which the publication noted &#8220;indicates that the whole fab program is not in its early stages, but rather in its pre-stages and currently does not have a scope, strategy, or execution plan.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[Every Country Needs Sovereign AI. Jensen Is Selling It.]]></title><description><![CDATA[Five countries failed to build sovereign AI. 
Nvidia turned their failures into a product &#8212; and branded it independence.]]></description><link>https://www.airealist.ai/p/every-country-needs-sovereign-ai</link><guid isPermaLink="false">https://www.airealist.ai/p/every-country-needs-sovereign-ai</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Fri, 20 Mar 2026 18:05:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VcpP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VcpP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VcpP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!VcpP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:254700,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191604577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VcpP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VcpP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff289b586-fb2b-4782-85a0-25675ae01f79_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On March 16, Jensen Huang stood on the GTC stage in San Jose and showed the audience a map. Dozens of countries. More than 1 million Nvidia GPUs are deployed worldwide through the company&#8217;s cloud partner network. 1.7 gigawatts of AI compute capacity &#8212; more than doubled from the year before.[1] &#8220;Sovereign AI,&#8221; he called it. The slide stayed up long enough for everyone to photograph it.</p><p>Five of the flags on that map belonged to countries I&#8217;ve been investigating. 
I wrote about why India exports its best AI talent instead of employing it.[2] Why Japan&#8217;s employment system creates a doom loop that prevents AI talent from accumulating.[3] Why France built Mistral despite its AI strategy, not because of it.[4] Why Singapore discovered that every model it could deploy came with someone else&#8217;s geopolitical strings attached.[5] Why South Korea&#8217;s hardware mastery doesn&#8217;t cross the software boundary.[6]</p><p>Each piece diagnosed a different structural failure. Each country&#8217;s system was optimized for an outcome that conflicts with the goals of frontier AI. And each failure created the same gap: a country that needs AI capability but cannot build it.</p><p>Jensen didn&#8217;t just identify that gap. He filled it, named it &#8220;sovereignty,&#8221; and sold it back to them on hardware subject to US export controls. &#8220;Sovereign&#8221; is a specific claim, and it invites a specific audit: does the legal structure, the commercial architecture, and the governance of the product actually deliver national independence? Nvidia chose the label. The audit follows from the label.</p><h2>Five layers, one vendor</h2><p>What Nvidia calls &#8220;sovereign AI&#8221; is not a single offering. It is a five-layer product stack, each layer reinforcing the others, each branded as enabling national independence.</p><p><strong>The first layer is hardware</strong>. Nvidia certifies local cloud providers &#8212; telcos, state-backed operators, regional data centers &#8212; as NVIDIA Cloud Partners. Deutsche Telekom in Germany. SoftBank in Japan. Orange in France. Yotta in India. YTL in Malaysia. 
Cassava Technologies across Africa.[7] The NCP program more than doubled its GPU deployments in a single year, from 400,000 GPUs representing 550 megawatts at GTC 2025 to over one million GPUs and 1.7 gigawatts today.[8] The certification is the catalog constraint: NCPs run Nvidia&#8217;s reference architecture, Nvidia&#8217;s networking, Nvidia&#8217;s AI Enterprise software stack. The local operator builds and staffs the facility. Nvidia provides the blueprint.</p><p>The NCP&#8217;s commercial incentives &#8212; priority GPU allocation, NIM licensing, Nvidia&#8217;s brand and go-to-market engine &#8212; are aligned with Nvidia&#8217;s program, not with the country&#8217;s sovereignty interest. The operator chose distribution over independence. The country may not realize there are different choices.</p><p><strong>The second layer is models</strong>. Nemotron 3 Ultra, previewed at GTC as the upcoming flagship, was explicitly positioned as a sovereign AI tool &#8212; a base model any country or enterprise can fine-tune for its own domain, language, or regulatory context.[9] The Nemotron family now spans language and reasoning, voice, multimodal, robotics, autonomous vehicles, drug discovery, and climate.[10] Open weights. Trained on Nvidia&#8217;s DGX Cloud. Optimized for Nvidia&#8217;s hardware through NVFP4 (Nvidia&#8217;s 4-bit numerical format for Blackwell).</p><p><strong>The third layer is a coalition</strong>. 
The Nemotron Coalition, announced at GTC on March 16, brings together eight AI labs to co-develop open frontier models.[11] The first model will be a base model co-developed by Mistral AI and Nvidia on DGX Cloud, underpinning the upcoming Nemotron 4 family.[12] The other members &#8212; Black Forest Labs, Cursor, LangChain, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab &#8212; will contribute data, evaluations, and domain expertise.[13] Nvidia has invested in at least three of the eight: Mistral&#8217;s &#8364;1.7 billion Series C, Reflection AI&#8217;s $2 billion round, and Thinking Machines Lab&#8217;s $2 billion seed, with reporting suggesting Black Forest Labs as a fourth.[14]</p><p>The resulting model will be open-sourced and will underpin Nvidia&#8217;s own Nemotron 4 commercial family. Eight labs will contribute. Nvidia owns the brand, the training infrastructure, and the distribution. Open weights, Nvidia&#8217;s product.</p><p>The structure is familiar. In &#8220;<a href="https://www.airealist.ai/p/jensens-comecon-how-nvidia-built">Jensen&#8217;s COMECON</a>,&#8221; I mapped Nvidia&#8217;s patron-satellite architecture at the infrastructure layer &#8212; neoclouds bound to Nvidia through bilateral deals where the patron simultaneously serves as investor, supplier, guarantor, and customer.[46] The Nemotron Coalition extends that architecture from infrastructure to intelligence. The neoclouds buy Nvidia&#8217;s GPUs. The coalition members will build Nvidia&#8217;s models. Same bilateral dependency, higher layer.</p><p>And the exit cost is not upstream &#8212; it is downstream. Once Sarvam fine-tunes Nemotron for Indian government services, those applications will be embedded in ministries, trained on institutional data, and certified for government use. Once Mistral Forge customers build on the Nemotron base for defense and space agencies, those deployments will run for years. 
Switching the base model means retraining, re-evaluating, and re-certifying every downstream application. The lock-in is not the contribution to Nemotron. It is everything the country builds on top of it.</p><p><strong>The fourth layer is deployment</strong>. Starting with H100, Nvidia GPUs support confidential computing &#8212; a hardware-based Trusted Execution Environment that isolates data and model weights from the cloud operator during processing, encrypting everything that crosses the boundary between CPU and GPU. Available on both NCP-deployed hardware and DGX Cloud. Jensen&#8217;s sovereignty pitch at GTC leaned on it: &#8220;even the operator cannot see your data, even the operator cannot touch or see your models.&#8221;[15] The Palantir and Dell partnership extends this to air-gapped, on-premises deployment &#8220;in any country, in any air-gapped region.&#8221;[16] This addresses who can see the data during processing. It does not address who can compel access through legal process &#8212; which is a different question entirely.</p><p><strong>The fifth layer is certification itself</strong>. The NCP program is the monitoring mechanism. Every certified partner runs Nvidia&#8217;s stack, reports against Nvidia&#8217;s benchmarks, and maintains compatibility with Nvidia&#8217;s hardware roadmap. The US government proposed a global AI chip export licensing framework in early 2026, building on the Biden-era AI Diffusion Rule that would have required chip exporters to monitor installations and recipients to use software preventing unauthorized clustering.[17] The framework was withdrawn by the Bureau of Industry and Security (BIS) on March 13 &#8212; three days before GTC.[18] It didn&#8217;t need to be enacted. The commercial architecture already meets the requirements of the regulation. The NCP certification tracks which GPUs are deployed where, which software they run, and which customers they serve.[57] The monitoring isn&#8217;t the regulation. 
It&#8217;s the product.</p><p>Jensen told the GTC audience that 60% of Nvidia&#8217;s business comes from the top five hyperscalers. The other 40% spans industrial, robotics, and sovereign AI.[19] In the Q4 FY2026 earnings call, Nvidia disclosed sovereign AI revenue for the first time: over $30 billion, more than triple year-over-year.[20] Sovereign AI is not a marketing campaign. It is a $30 billion revenue line inside a company that generated $215.9 billion in fiscal 2026.</p><h2>Why there are buyers</h2><p>The product exists because the vacuum is real. India&#8217;s services equilibrium exports its best talent before the domestic ecosystem can absorb it.[2] Japan&#8217;s employment system creates a doom loop that no government spending plan can break.[3] France&#8217;s sixty-year state apparatus optimizes for incumbent consolidation &#8212; Mistral succeeded by staying outside the system, not because the system produced it.[4] Singapore hit the trilemma: build, buy from the US, or download Chinese.[5] South Korea masters hardware but has never produced globally significant software.[6] Each country committed real money &#8212; India&#8217;s $1 billion IndiaAI Mission, Japan&#8217;s &#165;10 trillion plan, France&#8217;s &#8364;109 billion summit, South Korea&#8217;s 260,000 Nvidia GPUs.[21][22][23][24] Every investment ran through Nvidia&#8217;s stack. Germany&#8217;s SOOFI consortium builds its sovereign foundation model using Nemotron frameworks on Deutsche Telekom&#8217;s Nvidia-powered cloud.[25] Singapore&#8217;s defense agencies are early Forge customers on Nvidia infrastructure.[26]</p><p>The pattern is the same in every country. A real need for sovereignty. A genuine investment. A structural constraint that prevents the investment from producing true sovereignty. 
And a single vendor filling every layer of the gap.[27]</p><h2>What the Entity Test reveals</h2><p>The Entity Test &#8212; the analytical tool this publication uses to assess whether a &#8220;sovereign&#8221; offering survives contact with the applicable legal framework &#8212; does not just fail when applied to Nvidia&#8217;s sovereign AI stack.[28][59] It inverts.</p><p><strong>Start with hardware</strong>. Every GPU in the NCP network is manufactured by TSMC in Taiwan, designed in Santa Clara, and subject to US Bureau of Industry and Security export controls. BIS&#8217;s budget received a 23% increase for fiscal 2026, with funding specifically earmarked for semiconductor enforcement.[29] In February 2026, Applied Materials was fined $252 million for illegally exporting equipment to China &#8212; the second-largest penalty in BIS history.[30] The export control regime is not relaxing. The AI OVERWATCH Act, which would give Congress veto power over AI chip export licenses, passed the House Foreign Affairs Committee in January.[31] Every GPU in a sovereign AI factory depends on the export license remaining valid for continued operation.</p><p><strong>Move to models</strong>. The Nemotron Coalition&#8217;s first base model will be trained on Nvidia&#8217;s DGX Cloud &#8212; a US-jurisdiction service.[12] Coalition members &#8212; including the two most explicitly associated with national sovereignty, Mistral and Sarvam &#8212; will contribute their expertise to a model that carries Nvidia&#8217;s brand and trains on Nvidia&#8217;s cloud.</p><p>Nvidia is more transparent than any other hardware vendor &#8212; that&#8217;s not contested, and it matters. Nemotron ships open weights, pretraining data (partially gated), training recipes, and technical reports.[47] A sovereignty customer gets more visibility into Nemotron than into Llama, GPT, or Claude. The capability is genuine. But transparency is not governance. 
For a researcher inside a sovereign AI program &#8212; say, a Singapore AI lab auditing for cultural or political bias before deploying to government services &#8212; the openness has a floor. Even with full NCP access and every gated dataset approved, the decisions that determine whose values the model reflects remain in Santa Clara: the data filtering criteria and thresholds that decided what was &#8220;low quality&#8221; and discarded; the mixing ratios that determine how much weight Southeast Asian content receives relative to English; the synthetic data generation prompts; the RLHF (reinforcement learning from human feedback) reward model preferences and annotator guidelines; and the checkpoint selection rationale that chose this version over another.[48] You get the ingredients. You don&#8217;t get the recipe proportions, the dishes sent back to the kitchen, or the reason this version was served. Many of these gaps are likely to create issues for EU AI Act compliance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MffI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MffI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 424w, https://substackcdn.com/image/fetch/$s_!MffI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 848w, 
https://substackcdn.com/image/fetch/$s_!MffI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 1272w, https://substackcdn.com/image/fetch/$s_!MffI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MffI!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png" width="1200" height="1010.1818181818181" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:926,&quot;width&quot;:1100,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:227484,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191604577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!MffI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 424w, 
https://substackcdn.com/image/fetch/$s_!MffI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 848w, https://substackcdn.com/image/fetch/$s_!MffI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 1272w, https://substackcdn.com/image/fetch/$s_!MffI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2276ed74-2725-457b-ad23-5f5a6653ad44_1100x926.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>The synthetic data pipeline adds an ironic layer</strong>. Nemotron&#8217;s post-training data was generated using over ten models &#8212; including DeepSeek R1 and multiple Qwen variants alongside Western models like Phi-4, Mixtral, and Nvidia&#8217;s own Nemotron 4 340B.[49] The Chinese models in that pipeline are subject to documented content-control constraints. The political sensitivities embedded in those base models &#8212; the constraints that make DeepSeek hedge on Taiwan, that make Qwen keep answers about China &#8220;positive and constructive&#8221; &#8212; may survive into the synthetic outputs and from there into Nemotron&#8217;s training data. Nvidia&#8217;s filtering pipeline may catch them. But verifying that claim would require the filtering criteria, which are not published. The structural concern is not that contamination definitely occurred &#8212; it is that the sovereign customer cannot audit whether it did. Singapore&#8217;s trilemma was <a href="https://www.airealist.ai/p/build-buy-or-download-someone-elses">build, buy from the US, or download Chinese</a>. Nemotron is option two with potential traces of option three, and the filtering that would resolve the question is Nvidia&#8217;s.</p><p><strong>Move to deployment</strong>. Confidential computing prevents the cloud operator from accessing the customer&#8217;s data and models during processing. It does not prevent the US government from compelling access through legal process directed at Nvidia, the NCP, or any US-person entity in the chain. The CLOUD Act&#8217;s compelled disclosure provision at 18 U.S.C. &#167; 2713 requires a court order, judicial process, probable cause, and a specific target.[34] But the Foreign Intelligence Surveillance Act (FISA) Section 702 permits warrantless collection targeting non-US persons reasonably believed to be outside the United States. 
Sovereign AI customers are, by definition, non-US entities operating outside the US &#8212; exactly the category 702 targets without individual judicial authorization. Nvidia is subject to US jurisdiction.[60] The question is not whether the operator can see your data. The question is whether the jurisdiction that controls the hardware vendor, the software stack, the model training infrastructure, and the export license can compel access if it chooses to.</p><p>The downstream sovereign models will compound the exposure. Nvidia&#8217;s license says it doesn&#8217;t claim ownership of derivative works &#8212; so the fine-tuned sovereign model legally belongs to the customer.[50] But if that fine-tuning runs on DGX Cloud, which is how the coalition base model will be trained, the model weights will transit through US-jurisdiction infrastructure during creation. An Indian government citizen services model, containing institutional data from Indian ministries, would pass through a US cloud during training.[58] Even if fine-tuning runs on-prem through an NCP, the NCP runs Nvidia&#8217;s software stack, and production deployment through NIM is governed by the NVIDIA Software License Agreement &#8212; a separate, more restrictive license than the open model license.[51] Legal ownership stays with the customer. 
The jurisdictional chain never fully breaks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uvTJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uvTJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 424w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 848w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 1272w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uvTJ!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png" width="1200" height="711.2727272727273" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:652,&quot;width&quot;:1100,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:148207,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191604577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!uvTJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 424w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 848w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 1272w, https://substackcdn.com/image/fetch/$s_!uvTJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0332a7-5b21-4f98-ad9c-7e94292d44d5_1100x652.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>The sharpest inversion is the monitoring layer</strong>. A country running AI on smuggled GPUs in an air-gapped facility &#8212; China&#8217;s approach &#8212; is harder to coerce than one running on certified, NCP-deployed, Nvidia-tracked infrastructure with a clear entity chain back to Santa Clara. The &#8220;sovereign&#8221; infrastructure is the most legible infrastructure to the government whose jurisdiction the country was trying to escape. Sovereignty, as sold by Nvidia, doesn&#8217;t reduce the customer&#8217;s dependence on US-controlled chips, US-jurisdiction cloud, or US-licensed models. 
It makes that dependence more organized, more certified, and more visible to the one government that controls all three layers.</p><p>Then there is the question nobody in the sovereign AI program is asking publicly: who controls what happens to the models these countries will build on Nvidia&#8217;s base? The customer owns the derivative. But the derivative sits on a base model whose roadmap Nvidia controls. When Nemotron 5 ships and the ecosystem moves forward, your Nemotron 4 fine-tune ages out. You re-fine-tune on Nemotron 5 on DGX Cloud, using NeMo (Nvidia&#8217;s training framework), optimized for Nvidia&#8217;s next numerical format. The derivative is yours. The upgrade treadmill is Nvidia&#8217;s. And if those sovereign models are trained or served through DGX Cloud, the jurisdictional exposure described above applies to the infrastructure even if the model weights are nominally sovereign.[51]</p><p>The temporal lock deepens with every hardware generation. A country buys Blackwell today under current export rules. Two years from now, Vera Rubin ships with up to 10x inference throughput per watt over Blackwell &#8212; and considerably more when paired with Groq LPUs in the combined system Nvidia announced at GTC.[52] The upgrade is not optional &#8212; competitors who upgrade will run the same workloads at a fraction of the cost. But the upgrade requires a new export license, under whatever rules BIS applies at that point.[52] Every hardware generation is a re-authorization event. The initial purchase gets you in. Every upgrade asks permission again. The export control regime doesn&#8217;t need to block your current capability. It just needs to control your next one.</p><p>And this creates a leverage dynamic that no government strategy document accounts for. Nvidia doesn&#8217;t need to refuse GPU sales to any country. It needs to make standalone GPUs insufficient. 
A country can buy GPUs on the open market and run open-source inference &#8212; vLLM, SGLang, llama.cpp all work on Nvidia hardware without NCP certification. NCP certification itself does not contractually prohibit deploying non-Nvidia hardware alongside the certified stack. But NCP certification, NIM production deployment, Nemotron model optimization, the NeMo training framework, AI Enterprise software, and Nvidia&#8217;s forward-deployed engineering support all require entering the sovereign AI program. The open path exists.</p><p>The production-grade path at the national scale runs through the program. A country that buys GPUs without it gets hardware it can use but cannot optimize, support, or upgrade at a sovereign scale. A country that enters the program gets the capability. What it gives up takes longer to see.</p><p>What the tables show is the distance between &#8220;open&#8221; and &#8220;sovereign&#8221; &#8212; a distance that matters precisely because the product is good enough that entire national AI programs will be built on top of it.[53][56]</p><h2>Mistral and Sarvam: sovereignty captured</h2><p>Two coalition members make the distance concrete, because they are the countries&#8217; own sovereignty projects &#8212; and they are being harvested.</p><p><strong>Mistral is France&#8217;s AI exception</strong>. 
The company succeeded because its founders left the state apparatus, turned down the grande &#233;cole pipeline&#8217;s default career path (the elite engineering schools that feed France&#8217;s corporate and government establishment), and built a model company that attracted roughly &#8364;2.8 billion in cumulative funding &#8212; with Nvidia participating in its &#8364;1.7 billion Series C at an &#8364;11.7 billion valuation.[35] Mistral is on track to exceed &#8364;1 billion in revenue.[36] It is proof that France can produce world-class AI when the system doesn&#8217;t interfere.</p><p>At GTC, Mistral did three things simultaneously, and nobody is asking which one wins. First, it released Mistral Small 4 under Apache 2.0 &#8212; a genuine open-source model, Mistral&#8217;s own.[38] That&#8217;s the founding thesis: a European frontier model lab. Second, it announced the Nemotron Coalition co-development &#8212; building the base model for Nvidia&#8217;s next commercial family, on Nvidia&#8217;s DGX Cloud, contributing what Mistral&#8217;s own blog calls &#8220;proprietary training techniques.&#8221;[32][54] That&#8217;s Mistral&#8217;s model expertise committed to Nvidia&#8217;s product line. Third, it launched Forge &#8212; an enterprise platform where Mistral sends forward-deployed engineers to embed with customers and train custom models on proprietary data. TechCrunch described the model as &#8220;borrowed from the likes of IBM and Palantir.&#8221;[37][55] Early partners include ASML, Ericsson, the European Space Agency, and Singapore&#8217;s DSO and HTX. ASML led Mistral&#8217;s Series C &#8212; and is now a Forge customer, suggesting the investment was strategic (buying a services relationship) rather than financial (buying equity upside in a model company). That&#8217;s consulting.</p><p>The trajectory is legible. 
Mistral Small 4 is available on Hugging Face &#8212; and as a NIM on build.nvidia.com, and customizable with Nvidia NeMo.[38] Three distribution channels; two are Nvidia&#8217;s.</p><p>Forge customers can train on Mistral&#8217;s own base models or on the Nemotron base &#8212; Mistral becomes the services layer above whichever foundation the customer&#8217;s Nvidia NCP is optimized to run. The question Mensch hasn&#8217;t answered: in two years, will Mistral&#8217;s revenue come primarily from model API calls, or from Forge engagements? If Forge wins, France&#8217;s AI champion will be a consulting firm that fine-tunes models for European enterprises on Nvidia&#8217;s infrastructure. That is a good business. It is not a sovereign model company. The state apparatus didn&#8217;t capture Mistral. Nvidia&#8217;s ecosystem did, and it may be converting France&#8217;s AI champion from a model company into a model services company, one Forge deployment at a time.</p><p><strong>Sarvam is India&#8217;s sovereignty proof.</strong> Founded in 2023, backed by Lightspeed and Khosla Ventures, it was the first company selected to build India&#8217;s foundational AI model under the IndiaAI Mission.[39] Its 105B model was trained from scratch &#8212; not fine-tuned from Llama or Qwen &#8212; on domestic infrastructure using 4,096 government-subsidized Nvidia H100s through Yotta&#8217;s Shakti Cloud.[21] The model supports 22 Indic languages. It is a genuine technical achievement and a real sovereignty claim at the data layer.</p><p>But the training stack is Nvidia&#8217;s NeMo, Megatron-LM, and NeMo-RL. The hardware is Nvidia&#8217;s H100s. 
The optimization was co-engineered with Nvidia &#8212; a joint effort documented in Nvidia&#8217;s own technical blog, which reported a 2x inference speedup on H100 and 4x on Blackwell.[40] And now, Sarvam has committed to contributing its sovereign-language AI expertise to the Nemotron Coalition.[33] India&#8217;s sovereign AI capability &#8212; trained on Indian data, for Indian languages, by an Indian company &#8212; will be folded into a US company&#8217;s global model line. The extraction mechanism is not hostile. It is collaborative. That makes it harder to resist.</p><h2>The pattern</h2><p>In &#8220;<a href="https://www.airealist.ai/p/open-source-closed-orbit">Open Source, Closed Orbit</a>,&#8221; I described Nvidia&#8217;s ecosystem strategy using the Black Hole framework: a centripetal system where every open-source contribution, every developer tool, every community investment routes gravity back to Nvidia&#8217;s hardware.[41] The developer community thought it was building freedom. It was building a funnel.</p><p>The sovereign AI program is the same mechanism at the geopolitical scale. The NCP is the gravitational field &#8212; local infrastructure, locally operated, but architecturally bound to Nvidia&#8217;s hardware roadmap. The Nemotron models are the content that feeds the black hole &#8212; open weights, freely downloadable, optimized to run best on Nvidia silicon. The coalition is the accretion disk &#8212; eight labs, including the two most explicitly associated with national sovereignty, contributing data and expertise that strengthen Nvidia&#8217;s position at the center.</p><p>The test is simple: Does Nvidia&#8217;s open model make it harder or easier to use a competitor&#8217;s hardware?[42] If the Nemotron models perform equivalently on AMD&#8217;s MI350X, the sovereignty claim has a technical foundation &#8212; the country could switch vendors without performance loss. 
If the models are optimized for Nvidia&#8217;s NVFP4 format on Blackwell, the open weights are open in name and locked in practice. Nemotron 3 Ultra was pretrained &#8220;completely in NVFP4,&#8221; Nvidia&#8217;s proprietary numerical format, which is incompatible with AMD&#8217;s MXFP4 standard.[43] The optimization is the lock-in.</p><p>We have seen this before. IBM bundled hardware, software, and services until antitrust pressure forced unbundling in 1969. Intel paid OEM rebates for exclusivity until the EU fined it &#8364;1.06 billion in 2009.[61] Oracle converted database lock-in through proprietary SQL extensions into a license-audit revenue machine. Microsoft bundled Windows, Office, and Internet Explorer until the DOJ filed suit in 1998.</p><p>Each company had the best product in its generation &#8212; that was never in dispute. Each converted technical excellence into ecosystem practices that made the best product also the only practical choice. And each insisted, at the peak of its dominance, that its commercial structure was simply good engineering.[62] The pattern doesn&#8217;t require malice. It requires market power, a proprietary layer that competitors can&#8217;t replicate, and a customer base that has already built too much on the stack to leave. Nvidia has all three. No antitrust finding has been issued against it &#8212; investigations are active in the US, EU, UK, and China, but none has produced a ruling.[62] The sovereign AI program is the current generation&#8217;s version of the same architecture. It&#8217;s hardly ever different this time.</p><p>Jensen Huang saw what the country series saw: structural failure, not resource failure. Every country with a government AI plan and a compute budget believed the problem was money or hardware. The problem was system architecture &#8212; talent pipelines, software culture, employment structures, and state apparatus design. Those problems don&#8217;t get solved by buying GPUs. But they do create a permanent market for buying GPUs. 
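</p><p>The format mechanics behind that lock-in can be sketched. Per the public format descriptions, NVFP4 and MXFP4 share the same 4-bit E2M1 element grid but differ in block size and scale encoding: NVFP4 uses 16-element blocks with FP8 scales, MXFP4 uses 32-element blocks with power-of-two scales. Weights quantized once for NVFP4 therefore pick up extra rounding error when re-quantized into MXFP4 for non-Nvidia hardware. A toy illustration (not either format&#8217;s real bit layout; the FP8 scale is approximated by a full-precision float):</p>

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float element,
# shared by both NVFP4 and MXFP4.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block, scale):
    """Round each element to the nearest representable E2M1 magnitude times scale."""
    mags = np.abs(block) / scale
    idx = np.abs(mags[:, None] - E2M1[None, :]).argmin(axis=1)
    return np.sign(block) * E2M1[idx] * scale

def nvfp4_like(x, block=16):
    """NVFP4-style: 16-element blocks, fine-grained scale
    (real NVFP4 stores the scale in FP8 E4M3; a float stands in here)."""
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        b = x[i:i + block]
        s = np.abs(b).max() / 6.0
        out[i:i + block] = quantize_block(b, s if s > 0 else 1.0)
    return out

def mxfp4_like(x, block=32):
    """MXFP4-style (OCP MX): 32-element blocks, power-of-two (E8M0) scale."""
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        b = x[i:i + block]
        raw = np.abs(b).max() / 6.0
        s = 2.0 ** np.ceil(np.log2(raw)) if raw > 0 else 1.0
        out[i:i + block] = quantize_block(b, s)
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=4096)  # stand-in for one weight tensor

direct = np.mean((w - mxfp4_like(w)) ** 2)                 # quantize once for the target
converted = np.mean((w - mxfp4_like(nvfp4_like(w))) ** 2)  # NVFP4 first, then convert
print(f"direct MXFP4 error:      {direct:.6f}")
print(f"NVFP4-then-MXFP4 error:  {converted:.6f}")
```

<p>In the sketch, quantizing once for the target format is always at least as accurate as quantizing for NVFP4 first and converting; the double rounding can only match or compound the error. That conversion penalty, stacked on kernel-level optimization, is the practical content of &#8220;open in name and locked in practice.&#8221;</p><p>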
Jensen built a business on the insight that sovereign AI is needed and that most countries cannot build it. Both are true. That&#8217;s what makes the product work.</p><h2>What sovereignty would actually require</h2><p>Three things would have to change for sovereign AI to mean what the name implies.</p><p><strong>First, hardware independence</strong>. A country would need access to non-US-jurisdiction accelerators at competitive performance: AMD on a non-CUDA software stack, domestic silicon programs at scale, or a fundamentally different compute architecture. No country in the Nemotron Coalition is attempting any of them.</p><p><strong>Second, model independence</strong>. The Nemotron 4 base model will be open-sourced. If its license permits unrestricted deployment on non-Nvidia hardware without performance degradation, countries could fork the model and run it independently. If the model is optimized for Nvidia&#8217;s formats and the performance gap on alternative hardware is significant, the open license is a formality. The open weights are a gift certificate redeemable only at one store.</p><p><strong>Third, training infrastructure independence</strong>. No country in the coalition has demonstrated the ability to train a frontier model without Nvidia&#8217;s software stack. Sarvam trained from scratch on it.[21] Mistral&#8217;s co-development runs on DGX Cloud.[12] The training layer is the deepest dependency because it is the least visible &#8212; the model weights are open, but the tools that produced them are Nvidia&#8217;s.</p><p>And none of these exits is stable, because the lock-in renews itself every hardware generation. The upgrade cycle described above &#8212; Vera Rubin making Blackwell noncompetitive, each generation requiring a new export license &#8212; applies to every country on the map. But the NCP certification adds a layer: want early access to Vera Rubin NVL72? Stay in the program. Maintain the reference architecture. Run the software stack. 
Every eighteen months, the dependency renews. Every generation is a new leverage point.</p><p>Nvidia doesn&#8217;t need to coerce. The hardware roadmap does the coercing.</p><p>China is the only nation paying the full-stack sovereignty tax. The bill includes a chip gap estimated at 1.5&#8211;2 generations behind the bleeding edge, dependence on smuggling networks for critical components, content control constraints that persist through fine-tuning, and a state-directed research culture that excels at incremental improvement but struggles with discontinuous innovation.[44] The question for every country on Jensen&#8217;s map is whether its system can afford that cost. The country series answered that question, one country at a time. The answer, in every case, was no.</p><p>The honest advice, then, is not &#8220;don&#8217;t buy Nvidia.&#8221; Speed matters &#8212; a country that deploys AI-powered government services in ten months through an NCP delivers real value to its citizens years before a domestic stack could. But speed purchased at the cost of permanent dependency should be priced honestly, not marketed as independence. The advice is: enter the program with eyes open. Negotiate contract terms that preserve hardware-generation optionality &#8212; a contractual right to deploy fine-tuned models on non-Nvidia accelerators without losing NCP certification, a data portability guarantee for any model trained on DGX Cloud, and a pre-committed allocation slot for next-generation hardware that doesn&#8217;t require re-entering the full certification process. Invest in training sovereign engineers on the full stack &#8212; not just fine-tuning, but pretraining, data curation, and inference optimization &#8212; so the capability exists to evaluate alternatives when they mature. Fund domestic research into non-CUDA chips and frameworks. 
And stop calling it sovereignty.</p><p>A country that enters Nvidia&#8217;s sovereign AI program knowing it is buying capability and accepting dependency is making a rational strategic choice &#8212; the same choice Singapore has made for sixty years across every domain. A country that enters the program believing it is achieving independence is making a mistake that the contract terms will not forgive.</p><p>Jensen Huang saw the structural gap between what countries promised and what they could build, and turned it into a multi-billion-dollar business line. The capability is genuine. The label is not.</p><p>The pitch is &#8220;your AI, your data, your future.&#8221; The structure is: on our silicon, under our software license, trained on our cloud, optimized for our formats, certified by our partner, subject to our government&#8217;s export controls. The countries that failed to build sovereign AI are now sovereign AI customers. The vendor they depend on is the one company whose government has repeatedly demonstrated, across multiple administrations, the willingness to use technology export controls as instruments of foreign policy.[45]</p><p>Every country on that GTC map needed AI infrastructure that it couldn&#8217;t build. Every country got capability it couldn&#8217;t get elsewhere. And every country signed up for a dependency that renews with every hardware generation, deepens with every model fine-tuned on the base, and compounds with every government application deployed on the stack. The product is a subscription to Nvidia&#8217;s roadmap, billed as national independence, priced in sovereignty the customer doesn&#8217;t realize it is spending.</p><p>That&#8217;s not sovereignty. That&#8217;s a subscription.</p><div><hr></div><h3>Notes</h3><p>[1] Nvidia, &#8220;GTC 2026: Live Updates on What&#8217;s Next in AI,&#8221; Nvidia Blog, March 16, 2026. 
NCP deployment doubled from 400,000 GPUs / 550 MW at GTC 2025 to 1M+ GPUs / 1.7 GW.</p><p>[2] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/indians-rule-big-tech-why-cant-india">Indians Rule Big Tech. Why Can&#8217;t India Build?</a>,&#8221; The AI Realist.</p><p>[3] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/japan-built-the-bullet-train-why">Japan Built the Bullet Train. Why Can&#8217;t It Build an LLM?</a>,&#8221; The AI Realist.</p><p>[4] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/TODO-france-slug">Mistral Succeeded. France&#8217;s AI Strategy Didn&#8217;t.</a>,&#8221; The AI Realist.</p><p>[5] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/TODO-singapore-slug">Build, Buy, or Download Someone Else&#8217;s Politics</a>,&#8221; The AI Realist.</p><p>[6] Julien Simon, <a href="https://www.airealist.ai/p/TODO-south-korea-slug">South Korea country piece</a>, The AI Realist.</p><p>[7] Named NCPs drawn from Nvidia GTC 2026 announcements and Nvidia sovereign AI page. Deutsche Telekom: Industrial AI Cloud, GTC Paris 2025. SoftBank: Japan sovereign AI partnership. Orange: GTC Paris 2025. Yotta: Shakti Cloud, Nvidia case study. YTL: Malaysia NCP building ILMU models on Nemotron. Cassava Technologies: first pan-African NCP, announced GTC 2025.</p><p>[8] Nvidia, &#8220;GTC 2026: Live Updates on What&#8217;s Next in AI,&#8221; Nvidia Blog, March 16, 2026.</p><p>[9] Shashi.co, &#8220;GTC 2026: Jensen Huang&#8217;s Five Arguments for Why the AI Build-Out Is Just Getting Started,&#8221; March 16, 2026. Reports Jensen &#8220;framed this explicitly as a sovereign AI tool.&#8221;</p><p>[10] Nvidia GTC 2026 announcements. 
Model families: Nemotron (language/reasoning), Cosmos (physical AI), Alpamayo (autonomous vehicles), GR00T (robotics), BioNeMo (drug discovery), Earth-2 (climate).</p><p>[11] Nvidia, &#8220;NVIDIA Launches Nemotron Coalition of Leading Global AI Labs to Advance Open Frontier Models,&#8221; press release, March 16, 2026.</p><p>[12] Ibid. &#8220;The first project stemming from the coalition will be a base model codeveloped by Mistral AI and NVIDIA... trained on NVIDIA DGX Cloud... will underpin the upcoming NVIDIA Nemotron 4 family.&#8221;</p><p>[13] Ibid. Eight founding members: Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam, Thinking Machines Lab.</p><p>[14] Nvidia investments confirmed: Mistral Series C (&#8364;1.7B round, September 2025, Nvidia among investors &#8212; TechCrunch); Reflection AI ($2B round at $8B valuation, October 2025, Nvidia investor &#8212; TechCrunch); Thinking Machines Lab ($2B seed at $12B valuation, July 2025, Nvidia investor &#8212; TechCrunch). Black Forest Labs was reported by Invezz as an Nvidia investment recipient, but it was not independently confirmed at the same level of specificity. See also TechCrunch, &#8220;Nvidia&#8217;s AI empire: A look at its top startup investments,&#8221; January 2, 2026.</p><p>[15] Techloy, &#8220;Nvidia GTC 2026: Everything Jensen Huang Announced at the Keynote,&#8221; March 17, 2026. Direct Jensen quote from the keynote.</p><p>[16] Ibid. 
Jensen quote per keynote reporting: &#8220;We could stand up these platforms in any country, in any air-gapped region, completely on-prem, completely on-site, completely in the field.&#8221; Exact verbatim wording could not be independently verified against a full English-language transcript; substance confirmed across multiple GTC recaps and the Palantir-Dell-Nvidia partnership announcement (Business Wire, March 12, 2026).</p><p>[17] Taipei Times, citing Reuters reporting, &#8220;US mulls regulatory framework for AI chip exports,&#8221; March 7, 2026. The March 2026 proposal was titled &#8220;AI Action Plan Implementation&#8221; &#8212; a global three-tier licensing framework requiring government approval for large GPU shipments to any country, including allies. The monitoring-and-clustering requirements quoted by Reuters (&#8220;the exporter would have to monitor them, and the recipient would have to agree to use software that would not allow the chips to be linked to other chips to form a cluster&#8221;) describe elements carried over from the Biden-era AI Diffusion Rule (January 2025, rescinded May 2025). The new rule was built on those concepts within a broader export licensing architecture.</p><p>[18] Reuters, &#8220;US Commerce Department withdraws planned rule on artificial-intelligence chip exports,&#8221; March 13, 2026.</p><p>[19] ServeTheHome, &#8220;NVIDIA GTC 2026 Keynote Live Coverage,&#8221; March 16, 2026. Jensen&#8217;s remarks paraphrased; the full 40% list included &#8220;regional clouds, sovereign clouds, enterprise, industrial, robotics, edge, big systems, supercomputing systems, small servers, enterprise servers.&#8221; The body text compresses this to the three categories most relevant to the piece&#8217;s argument.</p><p>[20] Nvidia FY2026 results: $215.9B full-year revenue, $68.1B Q4 record. Verified against SEC filing (sec.gov/Archives/edgar/data/0001045810/000104581026000019/q4fy26pr.htm). 
Sovereign AI revenue &#8220;more than tripled year over year to over $30 billion&#8221; per the Q4 FY2026 earnings call. Earlier calls used directional language: &#8220;high single-digit billions&#8221; (Q1 FY2025), &#8220;low double-digit billions&#8221; (Q2 FY2025), &#8220;multi-billion dollar vertical&#8221; (various). The $30B disclosure in February 2026 was the first time Nvidia quantified sovereign AI as a specific revenue figure.</p><p>[21] Sarvam 105B: trained from scratch using MoE architecture on 4,096 Nvidia H100 SXM GPUs via Yotta, subsidized at &#8377;99 crore under IndiaAI Mission. Sources: Business Standard, &#8220;Why Sarvam&#8217;s new 105B model marks a shift in India&#8217;s sovereign AI ambitions,&#8221; February 19, 2026; The Hans India; Nvidia developer blog.</p><p>[22] SoftBank US vs. Japan AI investment ratio from Japan country piece. Sources cited in original footnotes.</p><p>[23] Nvidia, &#8220;NVIDIA, South Korea Government and Industrial Giants Build AI Infrastructure and Ecosystem,&#8221; press release, October 31, 2025. 260,000+ GPUs across Samsung (50K), SK Group (50K), Hyundai (50K), government via NAVER Cloud/NHN Cloud/Kakao (50K+).</p><p>[24] Mistral at GTC 2026: announced co-development of Nemotron 4 base model (Nvidia press release); launched Forge (TechCrunch, &#8220;Mistral bets on &#8216;build-your-own AI,&#8217;&#8221; March 17, 2026); Mistral Small 4 available as NIM deployment (TestingCatalog, March 16, 2026).</p><p>[25] TechCrunch, &#8220;Mistral bets on &#8216;build-your-own AI,&#8217;&#8221; March 17, 2026. 
&#8220;Mistral has already made Forge available to partners, including Ericsson, the European Space Agency, Italian consulting company Reply, and Singapore&#8217;s DSO and HTX.&#8221;</p><p>[26] Nvidia, &#8220;GTC 2026: Live Updates on What&#8217;s Next in AI.&#8221; &#8220;SOOFI is building its foundation models using NVIDIA Nemotron 3 Nano and Super, as well as Deutsche Telekom&#8217;s industrial AI cloud.&#8221; Nvidia&#8217;s language (&#8220;using&#8221;) indicates SOOFI is leveraging Nemotron architecture, frameworks, and models as tooling to build a separate 100B-parameter sovereign foundation model &#8212; not fine-tuning Nemotron 3 Nano as a base. Deutsche Telekom&#8217;s infrastructure includes ~130 DGX B200 systems (TeckNexus).</p><p>[27] Julien Simon, &#8220;Access, Disable, Destroy,&#8221; The AI Realist. Three-switch coercion stack: chips, cloud, models.</p><p>[28] The Entity Test is defined in the sovereignty vertical of this publication&#8217;s analytical framework. It checks ownership chain, incorporation jurisdiction, personnel, contractual relationships, technical dependencies, and management control. If any factor creates a link to US jurisdiction, the CLOUD Act&#8217;s compelled disclosure provision may apply.</p><p>[29] Morrison Foerster, &#8220;Managing Export Control Risks in the AI Chip Ecosystem,&#8221; February 9, 2026. &#8220;Congress recently approved a 23 percent increase in BIS&#8217;s Fiscal Year 2026 budget, with several members explicitly signaling bipartisan support for stronger export control enforcement.&#8221;</p><p>[30] East Asia Forum, &#8220;US chip export controls have cooled down,&#8221; March 11, 2026. &#8220;On 12 February 2026, Applied Materials was fined US$252 million for illegally exporting ion implantation equipment to China &#8212; the second-largest penalty in the Bureau&#8217;s history.&#8221;</p><p>[31] Mayer Brown, &#8220;Administration Policies on Advanced AI Chips Codified,&#8221; January 22, 2026. 
AI OVERWATCH Act introduced by HFAC Chairman Brian Mast in December 2025; passed committee on January 22, 2026.</p><p>[32] Nvidia, &#8220;NVIDIA Launches Nemotron Coalition,&#8221; press release. Mensch quote: &#8220;Together with NVIDIA, we will take a leading role in training and advancing frontier models at scale.&#8221;</p><p>[33] Ibid. &#8220;Additional expertise includes... sovereign language AI development from Sarvam AI.&#8221; Sarvam CEO Pratyush Kumar quote: &#8220;AI reaches its full potential when it works in every language and for every community.&#8221;</p><p>[34] CLOUD Act compelled disclosure provision, 18 U.S.C. &#167; 2713. For analysis of this provision&#8217;s application to cloud infrastructure, see Julien Simon, &#8220;Access, Disable, Destroy,&#8221; The AI Realist, and the sovereign cloud investigation published in this series. Important distinction: the CLOUD Act requires a court order &#8212; judicial process, probable cause, specific target. FISA Section 702 (50 U.S.C. &#167; 1881a) permits warrantless collection targeting non-US persons reasonably believed to be outside the United States. For sovereign AI customers &#8212; who, by definition, are non-US entities operating outside the US &#8212; FISA 702 is the higher-risk authority because it does not require individual judicial authorization. Table 1&#8217;s legal exposure column reflects this distinction.</p><p>[35] Mistral cumulative funding: ~&#8364;105M seed (June 2023), &#8364;385M Series A (December 2023), &#8364;600M Series B (June 2024), &#8364;1.7B Series C at &#8364;11.7B valuation (September 2025). Total approximately &#8364;2.8B. Nvidia participated in the Series C. Sources: TestingCatalog, TechCrunch.</p><p>[36] Arthur Mensch, Davos, January 22, 2026: &#8220;We should exceed one billion in revenue by the end of the year&#8221; (reported by Maddyness UK, Mobile World Live, Bloomberg). Original figure was &#8364;1 billion; some English-language outlets converted to dollars. 
TechCrunch later restated as &#8220;$1 billion ARR&#8221; &#8212; the original quote said &#8220;revenue,&#8221; not specifically ARR. CEO-stated figure; not independently verified or filed.</p><p>[37] Ibid. Forge enables enterprises to &#8220;build custom models trained on their own data&#8221; using Mistral&#8217;s training recipes, data pipelines, and forward-deployed engineers.</p><p>[38] TestingCatalog, &#8220;Mistral and Nvidia join forces to develop open-source models,&#8221; March 16, 2026. Mistral Small 4 &#8220;live in Mistral API and AI Studio, published on Hugging Face... and offered through Nvidia&#8217;s stack as both a prototype option on build.nvidia.com and a production-ready NIM deployment.&#8221;</p><p>[39] Business Standard, February 19, 2026. Sarvam &#8220;selected as the first company to build India&#8217;s foundational AI model under the mission.&#8221;</p><p>[40] Nvidia Developer Blog, &#8220;<a href="https://developer.nvidia.com/blog/how-nvidia-extreme-hardware-software-co-design-delivered-a-large-inference-boost-for-sarvam-ais-sovereign-models/">How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI&#8217;s Sovereign Models</a>,&#8221; February 18, 2026. Reports 2x total inference speedup on H100 and 4x on Blackwell through joint Nvidia-Sarvam optimization. The Nvidia/Yotta customer story page separately cites &#8220;20&#8211;100% improvement in throughput compared to open-source options&#8221; &#8212; a different metric from a different source within Nvidia&#8217;s marketing ecosystem.</p><p>[41] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/TODO-closed-orbit-slug">Open Source, Closed Orbit: The Hardware Monopolist&#8217;s Guide to Owning Open Source</a>,&#8221; The AI Realist.</p><p>[42] From the Black Hole vs. 
Sun framework in the AI Tooling vertical: &#8220;The key diagnostic: does the vendor&#8217;s open-source contribution make it harder or easier to use a competitor&#8217;s hardware?&#8221;</p><p>[43] SiliconAngle, &#8220;Nvidia expands open AI model portfolio and enlists partners for frontier development,&#8221; March 16, 2026. &#8220;It&#8217;s been pretrained completely in NVFP4, is the most intelligent and has two times the max throughput of the current best open base model.&#8221; NVFP4 is Nvidia&#8217;s proprietary 4-bit floating-point format, designed for the Blackwell architecture. It is distinct from MXFP4, the OCP (Open Compute Project) industry-standard microscaling FP4 format supported by AMD&#8217;s MI350X. The formats are not natively interchangeable. Models pretrained in NVFP4 achieve optimal throughput on Nvidia hardware; running them on AMD hardware requires format conversion (quantization to MXFP4 or dequantization to a higher-precision format), potentially leading to accuracy and performance degradation. The lock-in operates through the numerical format of the pretrained weights, not through the license.</p><p>[44] China&#8217;s full-stack sovereignty costs drawn from the &#8220;Access, Disable, Destroy&#8221; analysis and Chinese AI compute sourcing research. Performance gap, smuggling networks, and content control constraints are documented across these analyses.</p><p>[45] US export controls used as foreign policy instruments: documented across multiple administrations. Biden-era AI Diffusion Framework (January 2025, rescinded May 2025). Trump administration chip export relaxation to China (December 2025) and subsequent withdrawal of the replacement rule (March 13, 2026). 
The pattern across administrations is policy instability with consistent directional control over the hardware layer.</p><p>[46] Julien Simon, &#8220;<a href="https://www.airealist.ai/p/jensens-comecon-how-nvidia-built">Jensen&#8217;s COMECON: How Nvidia Built...</a>,&#8221; The AI Realist, February 14, 2026. Maps Nvidia&#8217;s quad-role architecture (investor, supplier, guarantor, customer) and patron-satellite dependency network across the neocloud ecosystem. The Abilene follow-up (March 2026) documented the self-repair mechanism: Nvidia paid Crusoe a $150M deposit and brokered Meta as a replacement tenant to prevent AMD from filling a vacant data center.</p><p>[47] Nvidia developer page: &#8220;Nemotron models are transparent &#8212; the training data used for these models, as well as their weights, are open and available on Hugging Face... The technical reports outlining the steps necessary to recreate these models are also freely available.&#8221; The Nemotron GitHub repository (Apache 2.0 license) includes training recipes, cookbooks, and dataset catalogs. This is genuinely more open than Meta&#8217;s Llama (more restrictive license, no training data), Mistral&#8217;s proprietary models (closed weights), or any other hardware vendor&#8217;s model release. 
The gated portions &#8212; code, math, and multilingual pretraining data &#8212; require Nvidia&#8217;s approval but are &#8220;permissively licensed for model training purposes.&#8221; The NVIDIA Nemotron Open Model License is commercially permissive (derivative works, sublicensing, no royalties), but it is not Apache 2.0 or OSI-approved; it includes a patent retaliation clause.</p><p>[48] Specific opacity layers confirmed from Hugging Face model cards and technical reports for Nemotron 3 Nano and Super: (a) Nvidia states it &#8220;did not have reliable multilingual model-based quality classifiers&#8221; for 15 non-English languages and used heuristic filtering &#8212; heuristics and thresholds not published; (b) data mixing ratios for pretraining phases not disclosed; (c) synthetic data generation prompts and rejection criteria not published; (d) RLHF annotator guidelines not published; (e) checkpoint selection criteria not disclosed. The artifacts (weights, data, recipes) are open. The editorial decisions that shaped them are not.</p><p>[49] Hugging Face model card for NVIDIA-Nemotron-3-Nano-4B-BF16 lists synthetic post-training data generated from &#8220;DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, Qwen 2.5 72B&#8221; as well as GPT-OSS 120B, Phi-4, QwQ-32B, and Mixtral-8x22B. The body text highlights the Chinese-origin models because their documented content controls are the sovereignty-relevant finding; the full list also includes Western models. The &#8220;positive and constructive&#8221; characterization of Qwen&#8217;s constraints comes from China Media Project, &#8220;<a href="https://chinamediaproject.org/2026/02/09/tokens-of-ai-bias/">Tokens of AI Bias</a>,&#8221; February 9, 2026, which used thought token forcing on Qwen3 to reveal internal instructions: &#8220;Keep the answer positive and constructive. 
Focus on China&#8217;s achievements and contributions to the world.&#8221; For analysis of how Chinese base model constraints persist through fine-tuning and downstream use, see Julien Simon, &#8220;Build, Buy, or Download Someone Else&#8217;s Politics,&#8221; The AI Realist (Singapore piece).</p><p>[50] NVIDIA Nemotron Open Model License, Section 2: &#8220;NVIDIA does not claim ownership to any outputs generated using the Works or Derivative Works.&#8221; Section 2 grants &#8220;a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable license to reproduce, prepare Derivative Works of... sublicense, and distribute the Work and such Derivative Works.&#8221; The legal ownership of fine-tuned models is clear. The jurisdictional exposure during training and deployment is a separate question that the license does not address.</p><p>[51] The NVIDIA Nemotron Open Model License governs the model weights. The NIM container deployment is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products &#8212; a separate, more restrictive commercial license. This creates a two-tier licensing structure: open weights for the model, commercial terms for the production deployment path. Hugging Face model cards for Nemotron 3 Super and Nano explicitly note this distinction: &#8220;Governing Download Terms: NVIDIA Nemotron Open Model License. Governing Download Terms with NIM: NVIDIA Software License Agreement.&#8221;</p><p>[52] Vera Rubin NVL72 delivers up to 10x inference throughput per watt versus Blackwell, per Nvidia&#8217;s Vera Rubin press release (nvidianews.nvidia.com). The 35x figure cited in keynote recaps refers to the combined Vera Rubin + Groq 3 LPX system, not Vera Rubin standalone. Vera Rubin was already in production at Microsoft Azure at the time of the GTC keynote. 
The upgrade cycle from Blackwell to Vera Rubin represents a generational shift that makes prior-generation infrastructure economically uncompetitive for inference workloads at scale &#8212; even the standalone 10x makes Blackwell-era sovereign AI factories obsolete within two years.</p><p>[53] Nvidia&#8217;s Nemotron is genuinely more open than Meta&#8217;s Llama (more restrictive license, no training data), Anthropic&#8217;s Claude (closed everything), or Google&#8217;s Gemini (closed weights, closed data). The transparency is real. The argument is not about openness relative to competitors. It is about openness relative to what sovereignty requires &#8212; and on that standard, every &#8220;closed&#8221; row in the table is a governance decision made in Santa Clara that the sovereignty customer cannot see, challenge, or change.</p><p>[54] Mistral AI blog, &#8220;Mistral AI partners with NVIDIA to accelerate open frontier models,&#8221; March 16, 2026. &#8220;Mistral AI will contribute its proprietary training techniques, multimodal capabilities, and enterprise-grade fine-tuning tools &#8212; while leveraging NVIDIA&#8217;s compute, tools &amp; technologies.&#8221; The language frames the contribution as bidirectional. The output carries Nvidia&#8217;s brand.</p><p>[55] TechCrunch, &#8220;Mistral bets on &#8216;build-your-own AI&#8217; as it takes on OpenAI, Anthropic in the enterprise,&#8221; March 17, 2026. The &#8220;borrowed from the likes of IBM and Palantir&#8221; characterization is TechCrunch&#8217;s editorial framing by reporters Anna Heim and Rebecca Bellan, not a direct quote from Mistral executives. Forge includes Mistral&#8217;s forward-deployed engineers who embed with customers. Early partners: ASML (which led Mistral&#8217;s Series C), Ericsson, European Space Agency, Reply (Italian consulting), Singapore&#8217;s DSO and HTX. 
VentureBeat, &#8220;Mistral AI launches Forge to help companies build proprietary AI models,&#8221; March 17, 2026, confirmed the forward-deployed engineer model.</p><p>[56] For EU-based sovereign AI programs, the openness gaps in Table 2 are also AI Act compliance gaps. Under the EU AI Act, AI systems deployed in government services are classified as high-risk and require transparency into training data, evaluation methodology, and known biases. A sovereignty customer that cannot audit the filtering criteria, mixing ratios, RLHF preferences, or checkpoint selection for the base model it deploys cannot fully comply with AI Act transparency requirements for high-risk systems. The sovereignty audit and the regulatory compliance audit are the same audit, and neither can be completed with the artifacts currently published.</p><p>[57] NCP certification requires partners to run Nvidia&#8217;s reference architecture and maintain compatibility with the Nvidia software stack and hardware roadmap. The certification structure implies ongoing reporting on deployment metrics. The specific reporting cadence and data shared with Nvidia under NCP agreements are governed by the enterprise agreement terms, which are not publicly disclosed. The inference that Nvidia has visibility into aggregate workload profiles across its NCP network is structural (the certification requires stack compatibility, which requires telemetry) rather than sourced to a specific disclosed reporting requirement.</p><p>[58] For European sovereign AI customers, training data that moves from an NCP in the EU to DGX Cloud (US jurisdiction) constitutes a cross-border data transfer under GDPR. The current legal basis is the EU-US Data Privacy Framework (adequacy decision, July 2023), which rests on Executive Order 14086. As documented in this publication&#8217;s sovereignty vertical, EO 14086 can be modified through executive and administrative action &#8212; multiple steps, all within executive branch authority. 
A sovereign AI customer&#8217;s training data is one executive order away from lacking a legal basis for the transfer that created the model. See Julien Simon, &#8220;<a href="https://www.airealist.ai/p/access-disable-destroy">Access, Disable, Destroy</a>,&#8221; The AI Realist, for the full adequacy trap analysis.</p><p>[59] The European Union Cybersecurity Certification Scheme (EUCS) is the regulatory framework that will define what &#8220;sovereign cloud&#8221; means in EU law. Draft versions debated whether to require exclusion of non-EU entities from the highest assurance levels. If EUCS passes with an exclusion requirement, NCP deployments by operators with US-entity links would fail the highest certification tier &#8212; making &#8220;sovereign AI&#8221; as sold by Nvidia legally incompatible with the EU&#8217;s own definition of sovereignty. If EUCS passes without the exclusion, EU sovereign claims become weaker for everyone. Either way, EUCS is the legal instrument that tests these claims in the European context.</p><p>[60] Important nuance on NCP legal exposure: the CLOUD Act analysis in the body applies to NCPs with US-person entities in the ownership chain. An NCP operated by a purely non-US entity (e.g., Orange in France, with no US parent) may not be directly reachable through the CLOUD Act&#8217;s standard compelled disclosure pathway. In that case, US government access would need to route through Nvidia&#8217;s software stack (which Nvidia controls as a US entity) or through the hardware export license regime &#8212; a different legal pathway than the standard Entity Test assumes. The sovereign customer&#8217;s first question should be: Does the NCP&#8217;s corporate structure include any US-person entity? If yes, the CLOUD Act applies directly. If not, the exposure is through the vendor stack, not the operator&#8212;a narrower but still real pathway.</p><p>[61] Intel EU antitrust fine: European Commission Decision C(2009) 3726, May 13, 2009. 
&#8364;1.06 billion for rebates to OEMs, conditional on exclusive or near-exclusive use of Intel processors. Upheld on appeal (General Court, 2014; CJEU, 2017 partial annulment; General Court reassessment, 2022 annulled the fine on procedural grounds; Commission did not re-appeal). The structural parallel is not the rebate mechanism but the pattern: genuine technical leadership converted into commercial practices that made the best product also the only practical choice. IBM consent decree: United States v. IBM, filed January 17, 1969, dropped January 8, 1982 &#8212; but the unbundling of hardware, software, and services that IBM implemented during the litigation reshaped the industry permanently.</p><p>[62] The DOJ is already investigating Nvidia. Subpoenas were issued in September 2024 examining bundling practices, exclusionary pricing, and the Run:ai acquisition. China&#8217;s SAMR announced a preliminary finding that Nvidia violated the anti-monopoly law in September 2025. The UK CMA identified Nvidia in an &#8220;interconnected web&#8221; of 90+ partnerships warranting scrutiny. France&#8217;s Autorit&#233; de la concurrence conducted dawn raids at Nvidia&#8217;s offices in Courbevoie and Sophia-Antipolis in September 2023. The investigation is active; no findings have been issued. The historical pattern suggests that by the time regulators act, the lock-in is already structural. See Julien Simon, &#8220;<a href="https://www.airealist.ai/p/TODO-closed-orbit-slug">Open Source, Closed Orbit</a>,&#8221; The AI Realist, for the detailed antitrust landscape.</p><p>[63] Devendra Singh Chaplot announced his departure on X on March 13, 2026 (some outlets reported March 14 due to time zones). Chaplot was part of Mistral&#8217;s founding team, where he trained Mistral 7B, Mixtral 8x7B, and Mistral Large, and led the multimodal research team that built Pixtral 12B and Pixtral Large. He also established Mistral&#8217;s US office in Palo Alto. 
He left Mistral in 2024&#8211;2025, briefly joined Thinking Machines Lab (Mira Murati&#8217;s startup and a fellow Nemotron Coalition member), then joined xAI/SpaceX to work on Grok model training. FinTech Weekly, &#8220;<a href="https://www.fintechweekly.com/news/xai-superintelligence-devendra-chaplot-mistral-thinking-machines-lab-grok-march-2026">xAI&#8217;s Superintelligence Push Just Got a Co-Founder of Mistral AI</a>,&#8221; March 14, 2026. See also Chaplot&#8217;s personal site: devendrachaplot.github.io.</p>]]></content:encoded></item><item><title><![CDATA[Three Orbits]]></title><description><![CDATA[Datacenters in space: three engineering propositions hiding inside one term &#8212; and the physics that sorts them]]></description><link>https://www.airealist.ai/p/three-orbits</link><guid isPermaLink="false">https://www.airealist.ai/p/three-orbits</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Thu, 19 Mar 2026 11:52:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jWs6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jWs6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jWs6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!jWs6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!jWs6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!jWs6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jWs6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2279559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191458719?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jWs6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 
424w, https://substackcdn.com/image/fetch/$s_!jWs6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!jWs6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!jWs6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14dea04c-8741-4d3a-bbfe-7c4df0f62cdd_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On February 2, 2026, Elon Musk published a blog post announcing what SpaceX called the largest corporate merger in history. SpaceX would acquire xAI &#8212; his artificial intelligence company &#8212; at a combined valuation of $1.25 trillion. The stated reason: to build orbital data centers.[1] Three days earlier, SpaceX had filed an application with the Federal Communications Commission for permission to launch up to one million satellites.[2]</p><p>Twelve time zones away, in Chengdu, a company called ADA Space was running AI inference on twelve satellites it had launched the previous May. Each carried domestically designed accelerators delivering 744 trillion operations per second at INT8 precision &#8212; roughly equivalent, at that precision, to a single high-end AI server on the ground. Except that this server was not in a rack. It was distributed across 12 coordinated satellites, connected by 100-gigabit-per-second laser links. The constellation had already served its first commercial customer &#8212; the Aerospace Information Research Institute of the Chinese Academy of Sciences.[3] It was the first operational AI compute cluster in orbit. And it was Chinese.</p><p>In Mountain View, Google&#8217;s Project Suncatcher team had just published a preprint laying out the engineering requirements for space-based machine learning at scale. Their conclusion: the economics work when launch costs fall below $200 per kilogram &#8212; roughly seven to fifteen times cheaper than today, depending on the launch provider. Their projected timeline: the mid-2030s.[4]</p><p>And at Cape Canaveral, Lonestar Data Holdings was preparing to launch a one-kilogram data center to the Moon &#8212; eight terabytes of solid-state storage in a box the size of a hardback novel, riding an Intuitive Machines lunar lander.[5]</p><p>One term covers all of this. 
&#8220;Datacenters in space.&#8221; But these four efforts share almost nothing &#8212; not physics, not economics, not timeline, not customer. There are three fundamentally different propositions hiding inside that phrase, and the most consequential engineering and policy decisions in AI infrastructure depend on knowing which one is actually being proposed.</p><h2>Earth is full</h2><p>The case for computing in orbit starts on the ground.</p><p>Global data center electricity consumption is on track to exceed 1,000 terawatt-hours by the end of 2026 &#8212; roughly equivalent to Japan&#8217;s entire national electricity demand.[6] In the United States, data centers will account for nearly half of electricity demand growth through 2030.[7] Dominion Energy in northern Virginia, the world&#8217;s largest data center market, has a multi-year interconnection queue. New transmission lines take a decade to permit. New generation takes longer. Local officials have begun blocking new server farms that strain grids, consume water, and swallow land.[7]</p><p>Every terrestrial alternative has its own timeline problem. Small modular reactors are post-2030 at the earliest &#8212; a technology I will analyze in a separate piece soon.[8] Renewables at the required scale need land and storage that create their own political fights. Natural gas faces emissions scrutiny. The honest assessment: no single terrestrial solution scales fast enough to meet projected AI compute demand in the 2028&#8211;2035 window.</p><p>This is why serious organizations are spending serious money on orbital compute. Google has committed research resources and a 2027 prototype launch. The European Commission funded a sixteen-month feasibility study. China has hardware in orbit. 
Nvidia just announced a purpose-built AI chip for orbital data centers &#8212; the Vera Rubin Space Module &#8212; at its annual conference.[9]</p><p>In a sun-synchronous dawn-dusk orbit, the satellite perpetually rides the boundary between day and night on Earth, so its solar panels face the Sun almost continuously. In this configuration, a solar panel generates up to eight times more energy per unit area than a typical ground-based solar panel. No night cycle, no clouds, no atmospheric absorption, 36 percent stronger sunlight above the atmosphere.[10] That is not marketing. That is orbital mechanics.</p><p>But acknowledging the pull does not validate the specific claims being made &#8212; or the timelines attached to them. And this is where the analysis requires a taxonomy that the current debate lacks. Three orbits. Three physics constraints. Three cost structures. Three radically different timelines. The rest of this piece walks through each orbit and shows what each announcement about &#8220;datacenters in space&#8221; actually claims &#8212; and what the engineering actually supports.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FsbC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FsbC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 424w, https://substackcdn.com/image/fetch/$s_!FsbC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 848w, 
https://substackcdn.com/image/fetch/$s_!FsbC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 1272w, https://substackcdn.com/image/fetch/$s_!FsbC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FsbC!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png" width="1200" height="1540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7abbd6e0-5af8-444e-b436-d046f5450ad3_1440x1848.png&quot;,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1848,&quot;width&quot;:1440,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:266459,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191458719?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7abbd6e0-5af8-444e-b436-d046f5450ad3_1440x1848.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FsbC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 424w, 
https://substackcdn.com/image/fetch/$s_!FsbC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 848w, https://substackcdn.com/image/fetch/$s_!FsbC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 1272w, https://substackcdn.com/image/fetch/$s_!FsbC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110a1f09-307b-471c-baa3-3a0ae7e6dec9_1440x1848.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Orbit 1: The sidecar</h2><p>The most viable form of orbital compute is the one getting the least attention.</p><p>SpaceX&#8217;s next-generation Starlink V3 satellites, expected to begin launching in the first half of 2026 aboard the Starship rocket (itself still in development), are not dumb transponders. Each carries an Xsight Labs X2 networking chip capable of processing 12.8 terabits per second &#8212; a specialized processor that routes data between satellites via laser links, and steers signals down to users on the ground.[11] At roughly 1,250 kilograms, with an estimated 10 to 20 kilowatts of solar power and over one terabit per second of downlink capacity, V3 represents a tenfold leap over the current V2 Mini generation.[12] The V3 deployment timeline is coupled to Starship&#8217;s maturation &#8212; if Starship continues to experience delays, Orbit 1 slides with it.</p><p>The step from a networking chip to an AI inference chip is small on a satellite that already manages power and thermal dissipation. SpaceX has already solved the hard parts of operating computing hardware in orbit &#8212; at a production rate of over 2,300 satellites launched in 2025 alone.[13] Adding a low-power inference chip to each V3 is a modest hardware decision, not a new space program. The distinction between inference and training matters for everything that follows: inference answers questions from a pre-trained model using a single chip at modest power; training builds the model in the first place, requiring thousands of chips in tight synchronization for weeks. Orbit 1 is an inference play. Orbit 2 claims to be a training play. The physics of each is fundamentally different.</p><p>The economics of piggybacking are compelling. 
A satellite that already costs several hundred thousand dollars to manufacture and launch absorbs the marginal cost of a low-power inference accelerator without a fundamental redesign. On March 16, 2026, Nvidia announced the Vera Rubin Space Module at its annual GTC developer conference, a variant of its next-generation AI architecture designed specifically for orbital data centers, delivering, the company claims, 25 times the inference performance of the H100 that Starcloud flew last November.[52] The supply side of Orbit 1 just materialized as a named product on a roadmap.</p><p>Adding compute to a mass-optimized satellite increases thermal load on an already-tight power budget. But allocating two to three kilowatts of an estimated 10-to-20-kilowatt envelope to inference does not gut the communications mission &#8212; the costs are manageable within an existing satellite program&#8217;s budget, not a separate capital campaign. The constraint is model size: at two to three kilowatts, a sidecar chip can run models in the single-digit-billion-parameter range &#8212; useful for classification, summarization, and image analysis, but not for the largest frontier models. Nvidia&#8217;s Vera Rubin Space Module may push that ceiling higher, but the power envelope, not the chip, is the binding limit.</p><p>This is the evolution of content delivery networks applied to AI. Akamai didn&#8217;t replace the websites it served &#8212; it cached copies closer to users so a customer in Tokyo didn&#8217;t have to fetch data from Virginia. A Starlink inference layer would work the same way: not replacing the training cluster in Iowa but serving pre-trained models from overhead, with as little as four milliseconds of one-way propagation delay. A new computing tier &#8212; global, always overhead, and available to the three billion people who live beyond reliable terrestrial broadband.[59]</p><p>The concept is no longer theoretical. 
ADA Space&#8217;s twelve-satellite cluster in China runs an eight-billion-parameter model in orbit and has served its first commercial customer.[14] Starcloud flew a single H100 in orbit last year and has since filed with the FCC for an 88,000-satellite constellation.[64] Hewlett Packard Enterprise&#8217;s Spaceborne Computer has logged over 2,000 days on the International Space Station, demonstrating a 20,000-to-one data reduction ratio.[15] In January 2026, Kepler Communications launched ten optical relay satellites with multi-GPU compute, and Axiom Space bought two orbital data center nodes on the network.[53][54] Orbit 1 is a category in formation, with operational hardware from at least four organizations across three countries. The first flights have happened. The chip vendor just showed up.</p><p>The SpaceX FCC filing, read charitably, describes this architecture. It specifies &#8220;distributed processing nodes, specifically optimized for large-scale AI inference&#8221; that could be built &#8220;simply by scaling up the Starlink V3 satellites.&#8221;[16] As a natural extension of an existing constellation with an existing manufacturing line, existing launch schedule, and existing customer base, Orbit 1 is commercially plausible within two to four years &#8212; a timeline based on SpaceX&#8217;s V3 development cadence and Nvidia&#8217;s product announcements, not on independent analysis.</p><p>Orbit 1 is being conflated with something far more ambitious &#8212; and the engineering gap between them is the subject of this piece.</p><h2>Orbit 2: The megastructure</h2><p>The headline narrative &#8212; the one behind the million-satellite filing &#8212; is gigawatt-scale training clusters in orbit, powered by solar arrays measured in square kilometers. It faces a physics constraint that no amount of venture capital can repeal. 
But even the realistic near-term version, at a fraction of that scale, faces a cost problem that its proponents have not yet solved.</p><p>In a vacuum, heat can only be radiated away. The Stefan-Boltzmann law &#8212; which governs how objects radiate heat &#8212; dictates that at the temperatures at which electronics operate, a radiator surface emits roughly 100 to 350 watts per square meter.[17] The International Space Station&#8217;s active thermal control system dissipates approximately 70 kilowatts through ammonia-cooling loops and radiator panels &#8212; enough to cool roughly 100 high-end GPUs, or about 12 server racks.[18] A single modern AI-optimized rack can draw 100 to 200 kilowatts, meaning the ISS&#8217;s entire thermal budget could not fully cool even one of them. A gigawatt-class data center &#8212; the scale that the EU-funded ASCEND project, a European consortium pursuing orbital data sovereignty, aims to reach by 2050 &#8212; would require a radiator surface area approaching one square kilometer, weighing thousands of tons.[19]</p><p>But no credible player is targeting a centralized gigawatt before 2050. ASCEND&#8217;s near-term milestone is 10 megawatts by 2036 &#8212; a hundred times smaller. The thermal math is more forgiving at that scale: roughly 30,000 square meters of radiator area, weighing approximately 80 tons.[19] That is still the largest structure ever assembled in orbit by an order of magnitude &#8212; but it is an engineering program, not a physics impossibility. The problem is what 10 megawatts buys you: roughly 5,000 to 8,000 high-end GPUs at rack-level power density. Enough for fine-tuning, smaller model training, and meaningful inference &#8212; but not a frontier training cluster, which currently requires 16,000 to 100,000 GPUs. At 10 megawatts, Orbit 2 is a subscale facility that costs three times as much as a terrestrial facility of the same size. The thermal wall at the gigawatt scale is a physics wall. 
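The radiator sizing above can be sanity-checked directly from the Stefan-Boltzmann law. A minimal sketch, assuming a 320 K panel temperature, a 250 K effective sink, emissivity 0.9, and single-sided radiation (illustrative values, not figures from the cited studies):

```python
# Radiator sizing from the Stefan-Boltzmann law.
# Assumed values (illustrative, not from the cited studies):
# 320 K panel temperature, 250 K effective sink, emissivity 0.9, one-sided.
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W / (m^2 * K^4)

def radiator_area_m2(power_w, t_panel_k=320.0, t_sink_k=250.0, emissivity=0.9):
    """Radiating surface needed to reject `power_w` watts to deep space."""
    flux_w_m2 = emissivity * SIGMA * (t_panel_k**4 - t_sink_k**4)  # net W per m^2
    return power_w / flux_w_m2

print(round(radiator_area_m2(10e6)))  # 10 MW: roughly 30,000 m^2
print(round(radiator_area_m2(1e9)))   # 1 GW: millions of m^2 one-sided
```

Under these assumptions the net flux lands near the top of the 100-to-350-watt range, and the 10-megawatt case reproduces the roughly 30,000 square meters cited above; the gigawatt case comes out at a few square kilometers single-sided, shrinking toward one with double-sided panels and hotter coolant loops.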
At a 10 megawatt scale, it becomes a cost wall &#8212; and the cost case has not been made.</p><h3>The distributed alternative</h3><p>SpaceX&#8217;s FCC filing proposes a different path: not one big facility but a million distributed satellites at roughly 100 kilowatts each, aggregating to 100 gigawatts.[60] At the per-satellite level, the thermal math is manageable &#8212; roughly 40 to 70 square meters of radiator per satellite &#8212; feasible as a subsystem.[61] The full satellite at 100 kilowatts, however, would require 90 to 170 square meters of total deployed area, including solar arrays &#8212; three to six times a current Starlink &#8212; making it a significant redesign, not a sidecar modification.</p><p>The distributed architecture dissolves the thermal wall by refusing to concentrate it. But it runs into a different wall: interconnect. Frontier model training requires thousands of GPUs working in microsecond-level synchronization across shared memory &#8212; the kind of tight coupling that technologies like NVLink and InfiniBand &#8212; the high-speed wiring between GPUs inside a data center &#8212; provide on the ground.[63] Spread those GPUs across satellites connected by laser links at millisecond latencies &#8212; three orders of magnitude slower &#8212; and you cannot train a frontier model. Google&#8217;s Suncatcher paper uses an 81-satellite cluster as a reference architecture for cost modeling and demonstrated 800-gigabit-per-second optical links in a lab.[21] Raw bandwidth at that speed is comparable to a single InfiniBand link between servers in a terrestrial cluster &#8212; but the latency is a thousand times worse, and in this comparison it is the latency, not the bandwidth, that blocks training. This is why Suncatcher&#8217;s own paper flags thermal management, not interconnect, as the binding constraint: the team believes the bandwidth problem is tractable while the thermal problem is not. 
At millisecond latencies, training must be restructured so aggressively that the result is no longer a substitute for terrestrial clusters &#8212; a problem the paper does not address.</p><p>Techniques for relaxing these constraints exist. DeepMind&#8217;s DiLoCo demonstrated training across poorly connected nodes by synchronizing only every 500 optimization steps instead of every step, reducing communication frequency by roughly 500 times.[65] Less frequent synchronization makes high latency more tolerable: if nodes only need to exchange data every few minutes rather than every few milliseconds, a thousand-fold latency gap matters less. But removing the interconnect wall does not remove the cost wall. DiLoCo trades communication for compute &#8212; each worker runs hundreds of local optimization steps between synchronizations &#8212; which compounds the cost disadvantage of orbital hardware, estimated at 3x that of terrestrial hardware. Solving the interconnect problem still leaves you training in orbit at a multiple of the cost on the ground, with more difficult maintenance, shorter hardware lifespans, and no way to swap a failed GPU without a launch.</p><p>What distributed orbital compute can do is run inference, fine-tune smaller models, and process satellite imagery &#8212; the same workloads Orbit 1 piggybacks on distributed satellites, scaled to enormous size. The distributed architecture is not a path to Orbit 2. 
It is Orbit 1 at the constellation scale, marketed as Orbit 2.</p><p>Google&#8217;s Suncatcher radiation testing reinforces the case for Orbit 1: Trillium TPU chips &#8212; Google&#8217;s custom AI processors &#8212; survived a proton beam simulating five years of low-Earth-orbit exposure with no permanent failures, suggesting commercial AI processors are more radiation-tolerant than assumed.[20] But thermal management remains, in the paper&#8217;s own words, &#8220;a critical optimization challenge.&#8221;[22]</p><h3>The economics</h3><p>The economics hinge entirely on a launch cost threshold that does not yet exist. Google projects that costs must fall below $200 per kilogram for orbital data centers to approach terrestrial competitiveness. The current range is $1,500 to $2,900 per kilogram.[23] An academic analysis of 3,207 satellites launched between 2000 and 2020 found that launch costs declined at 4.4 percent annually &#8212; a trajectory that reaches $200 per kilogram between 2045 and 2076, substantially slower than SpaceX&#8217;s promotional timeline.[24]</p><p>SpaceX&#8217;s Starship could accelerate that curve if it achieves full reusability, flights every week or more, and 100-to-200-ton payloads. None of these has been demonstrated. Starship has flown eleven times since 2023, with mixed results &#8212; booster catches succeeded, but two upper stages were lost during ascent, and no flight has yet reached a full orbital trajectory.[25] SpaceX&#8217;s historical learning rate on Falcon 9 is steeper than the academic average, and Starship may follow the same pattern &#8212; a concession worth making. 
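</p><p>How soon a given decline rate reaches the $200 threshold is a one-line calculation, and the disagreement between the pessimistic and optimistic camps is almost entirely a disagreement about that rate. The function below is my constant-rate simplification of the learning-curve analyses cited above; the $1,500-per-kilogram baseline is the low end of the current range.</p>

```python
import math

def years_to_threshold(cost_now, cost_target, annual_decline):
    """Years until cost_now falls to cost_target at a constant
    fractional decline per year (0.044 means 4.4% per year)."""
    return math.log(cost_now / cost_target) / -math.log(1.0 - annual_decline)

# The academic study's 4.4%/yr decline vs. the ~20% learning rate
# Google's Suncatcher model assumes, both from $1,500/kg to $200/kg:
slow = years_to_threshold(1500, 200, 0.044)
fast = years_to_threshold(1500, 200, 0.20)
print(f"{slow:.0f} years vs {fast:.0f} years")  # 45 years vs 9 years
```

<p>Same formula, same target, and a factor-of-five gap in arrival time: the argument between mid-2030s viability and mid-century viability is an argument about the decline rate, not the destination.</p><p>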
But even at $200 per kilogram, an independent analysis by Andrew McCalip of Varda Space Industries shows that orbital data centers cost approximately three times as much as terrestrial equivalents.[26]</p><p>Launch cost is necessary but not sufficient &#8212; it must be accompanied by breakthroughs in thermal management, radiation tolerance, and on-orbit maintenance. Power generation is cheap in orbit &#8212; orbital compute startups estimate energy costs 10 to 50 times lower per kilowatt-hour than terrestrial grid power &#8212; but it is a minor fraction of total ownership costs. Launch, hardware refresh, and servicing dominate the economics, and those costs have no equivalent to the solar advantage. Rockets get cheaper by manufacturing iteration. Radiators improve through physics research &#8212; a slower, less predictable curve.</p><p>The hardware refresh problem compounds the difficulty. GPU architectures improve on a 2- to 3-year cycle. On Earth, a rack swap takes hours. In orbit, every upgrade is a launch. A chip launched in 2028 is architecturally obsolete by 2031, but must still operate to amortize its deployment cost. DARPA &#8212; the Pentagon&#8217;s advanced research agency &#8212; is launching its Mission Robotic Vehicle in 2026 in the first demonstration of robotic satellite servicing, but replacing compute modules is years beyond even that milestone.[27]</p><p>Orbit 2 has a direct historical precedent: space-based solar power, studied since Peter Glaser&#8217;s 1973 patent. In the more than fifty years since, $879 million across 157 projects has produced zero operational systems, and NASA has found orbital solar still twelve to eighty times more expensive than ground-based alternatives.[28][29] The barriers &#8212; extreme launch costs, kilometer-scale structures, hardware degradation, and the relentless improvement of cheaper terrestrial alternatives &#8212; are the same four barriers facing any centralized Orbit 2 architecture. 
The distributed alternative dissolves the second barrier but introduces others: interconnect latency that blocks training, constellation management at a million-satellite scale, and hardware refresh economics that compound across the fleet.</p><p>The critical difference is scalability. Space solar requires massive structures before producing any useful output; orbital compute can scale incrementally, as Orbit 1 demonstrates. But Orbit 2 at any scale that justifies the narrative &#8212; gigawatt-scale orbital compute, a target that Jeff Bezos has also endorsed with a &#8220;ten-plus years&#8221; timeline[62] &#8212; requires exactly the kind of massive structures that space-based solar power has failed to build for half a century. At 10 megawatts, the physics becomes manageable, but McCalip&#8217;s three-to-one cost ratio holds. At the gigawatt scale, the physics itself is the barrier. Musk&#8217;s claim that orbital data centers will be the cheapest way to generate AI compute &#8220;within two to three years&#8221; is not supported by any published engineering analysis &#8212; not Google&#8217;s, not NASA&#8217;s, not the independent assessments.[30] It is an assertion without a thermal management plan, a launch cost trajectory, or a servicing architecture.</p><h2>Orbit 3: The vault</h2><p>From off-site backup to off-planet backup.</p><p>Lonestar Data Holdings is the only company attempting this. Founded by Chris Stott &#8212; a twenty-year veteran of international spectrum regulation who co-founded ManSat, a satellite licensing firm &#8212; the company has flown two proof-of-concept payloads to the Moon on Intuitive Machines landers.[5] Its pitch: disaster recovery and sovereign data storage at a distance no terrestrial threat can reach. 
The Moon inverts several of the key constraints that make Orbit 2 so difficult.</p><p>Lunar lava tubes &#8212; underground channels carved by ancient volcanic flows &#8212; maintain a constant temperature of roughly minus 20 degrees Celsius.[31] If pressurized, these tubes would allow convection-based cooling &#8212; air and liquid carrying heat away from processors, as on Earth &#8212; unlike the pure vacuum of orbital space. Pressurizing a lava tube section is itself a formidable challenge: the largest pressurized structure ever built in space is the International Space Station at 916 cubic meters, and lunar lava tubes can be hundreds of meters across. Sourcing coolant from lunar ice extraction &#8212; itself undemonstrated at an industrial scale &#8212; adds another layer of infrastructure dependency. The physics favors the Moon; the engineering to exploit it does not yet exist.</p><p>Even on the open lunar surface, permanently shadowed craters at the south pole reach temperatures as low as 40 Kelvin, minus 233 degrees Celsius, colder than the surface of Pluto. The thermal wall that dominates Orbit 2 becomes, on the Moon, an engineering advantage &#8212; though one that fully materializes only when hardware moves into pressurized lunar structures.</p><p>Lonestar&#8217;s long-term vision targets exactly that: racks humming in the permanent dark of a lava tube. But the company cannot wait for lunar pressurization to become viable, and recent experience &#8212; the IM-2 lander tipped over on arrival &#8212; demonstrates that even landing on the surface remains unreliable. So the near-term architecture has moved off the surface entirely, to the L1 Lagrange point &#8212; a gravitational equilibrium between Earth and the Moon where a spacecraft can park with minimal fuel.[33] L1 is unstable, requiring periodic thruster corrections, yet it remains the cheapest long-term address between Earth and the Moon. 
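</p><p>The physics of the address is fixed by geometry. A quick check of round-trip signal delay, using the mean Earth&#8211;Moon distance and an approximate Earth&#8211;L1 distance of 326,000 kilometers (an assumption; L1 tracks the Moon&#8217;s elliptical orbit), shows the latency regime every Orbit 3 design lives in:</p>

```python
C_KM_S = 299_792.458  # speed of light in km/s

def round_trip_s(distance_km):
    """Round-trip signal delay over a one-way distance, in seconds."""
    return 2.0 * distance_km / C_KM_S

print(f"Moon surface: {round_trip_s(384_400):.2f} s")   # Moon surface: 2.56 s
print(f"Earth-Moon L1: {round_trip_s(326_000):.2f} s")  # Earth-Moon L1: 2.17 s
```

<p>No hardware roadmap shortens either number.</p><p>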
A spacecraft there experiences only four hours of shade every 90 days &#8212; solving the lunar night problem, since any fixed point on the Moon&#8217;s surface endures fourteen days of continuous darkness. The company plans to launch six data storage spacecraft at Lunar L1 between 2027 and 2030, each carrying multi-petabyte storage and edge processing capabilities.[34]</p><p>The latency is disqualifying for real-time applications and irrelevant for the actual use case. Earth-Moon round-trip communication takes approximately 2.56 seconds &#8212; permanently eliminating interactive workloads. But Lonestar is not selling low-latency inference. It is selling disaster recovery and sovereign data storage, workloads where high latency is a feature, not a penalty. A ransomware attack cannot encrypt data that sits 384,000 kilometers away with a 2.56-second speed-of-light delay per query.[32]</p><p>Disaster recovery planning has gone from Cold War bunkers to colocation facilities to cloud regions to Amazon Glacier. The next step is 384,000 kilometers away, and the temperature is 40 Kelvin. And unlike a terrestrial data center &#8212; or even a LEO satellite vulnerable to anti-satellite weapons and debris cascades &#8212; it is hard for Iranian drones to hit the Moon.</p><h3>Sovereignty in orbit</h3><p>The sovereignty angle has the sharpest implications. Under the Outer Space Treaty, no nation may claim territorial sovereignty over space, but the state of registry retains jurisdiction and control over its space objects.[35] Stott has secured radio frequency rights through the United Kingdom &#8212; the regulatory filings that establish who is authorized to communicate with a spacecraft, specifically to establish clear jurisdictional provenance.[36] A UK-registered data storage spacecraft at the Moon&#8217;s L1 point is under UK jurisdiction. No transit through other nations&#8217; airspace. No ground station on foreign soil is required for the data itself. 
But the same treaty provision that protects also exposes: Article VI requires the registering state to authorize and continuously supervise national space activities, giving the UK legal authority to compel access under its own national security laws.</p><p>The sovereignty claims are weaker than they appear. ASCEND explicitly claims that orbital data centers would be exempt from the CLOUD Act.[37] But Hewlett Packard Enterprise, a US company, is a named consortium partner responsible for the hardware. The CLOUD Act&#8217;s compelled disclosure provision reaches data in the &#8220;possession, custody, or control&#8221; of any provider subject to US jurisdiction, regardless of where it physically sits.[38] If HPE personnel have administrative access, the data is within reach of US legal process. Placing a server at an altitude of 1,400 kilometers does not change the company&#8217;s corporate domicile. The test is the same one that applies to terrestrial sovereign cloud claims: trace the ownership chain, the personnel with technical access, and the contractual dependencies. The conclusion applies equally in orbit &#8212; only removing the US entity from the chain breaks it.</p><p>The same test applies to Lonestar. The company is US-incorporated. Its CEO is a US citizen. Its payloads launch on SpaceX rockets from US soil. The UK frequency registration creates a jurisdictional <em>argument</em> &#8212; the strongest of any current orbital compute venture &#8212; but it has never been tested in court. The most thoughtfully designed sovereignty claim in orbital compute remains untested in law.</p><p>Orbit 3&#8217;s customer base is narrow but real: government agencies requiring disaster recovery beyond terrestrial threats, sovereign data vaults, and &#8212; Lonestar&#8217;s most evocative pitch &#8212; civilizational backup. The company transmitted the U.S.
Declaration of Independence to the Moon and back as a proof-of-concept.[39] The idea that humanity&#8217;s most important data should not be stored exclusively on the planet most capable of destroying it is not frivolous, even if the current implementation is a one-kilogram SSD on a lander that tipped over.</p><div><hr></div><p>Three propositions. Three altitudes. Three levels of physical plausibility.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XsKv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XsKv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 424w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 848w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 1272w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XsKv!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png" 
width="1200" height="1275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1263d025-1bee-4f46-a7e7-1defa9ca21d9_1200x1275.png&quot;,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1275,&quot;width&quot;:1200,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:227441,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191458719?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1263d025-1bee-4f46-a7e7-1defa9ca21d9_1200x1275.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XsKv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 424w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 848w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 1272w, https://substackcdn.com/image/fetch/$s_!XsKv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42a86152-9f7a-4570-a423-e6537acca163_1200x1275.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a></figure></div><p>The taxonomy sorts the claims.
The roster below sorts the claimants.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FSR5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FSR5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 424w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 848w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 1272w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FSR5!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png" width="1200" height="1261" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2b332d2-c7ba-4317-bd95-720f1e66da2f_1200x1261.png&quot;,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1261,&quot;width&quot;:1200,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:272318,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191458719?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b332d2-c7ba-4317-bd95-720f1e66da2f_1200x1261.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FSR5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 424w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 848w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 1272w, https://substackcdn.com/image/fetch/$s_!FSR5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b477a28-41f2-4b91-92ce-9b1810405212_1200x1261.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"></div></div></div></a></figure></div><h2>The cost of conflation</h2><p>If any organization can build Orbit 1, it is SpaceX. The company launches more mass to orbit than all other providers combined, manufactures satellites at a pace no competitor matches, and operates the only constellation with the scale and laser-link infrastructure to support distributed orbital compute. The Starlink business is real: approximately $8 billion in profit on $15 to $16 billion of revenue in 2025, as estimated by Reuters.[40] SpaceX is weighing an IPO at valuations as high as $1.5 trillion, with orbital data centers cited as a primary driver.[41][42][43] Nvidia has announced a purpose-built chip for the architecture. The engineering credibility is earned.
The question is not whether SpaceX can add inference chips to Starlink satellites &#8212; it almost certainly can &#8212; but whether the announced ambition matches what the physics supports on the announced timeline.</p><p>SpaceX&#8217;s structural advantage extends beyond its own constellation. Every non-Chinese orbital compute venture launches on SpaceX rockets. Starcloud flew its H100 on a SpaceX rideshare. Kepler&#8217;s ten satellites launched on Falcon 9. Lonestar rides Intuitive Machines landers &#8212; which launch on Falcon 9. Even Amazon, SpaceX&#8217;s fiercest competitor, was forced to book Falcon 9 launches for its Kuiper constellation after Blue Origin&#8217;s New Glenn and ULA&#8217;s Vulcan could not deliver the cadence to meet FCC deployment deadlines &#8212; a decision so contentious that a pension fund sued Amazon&#8217;s board, alleging that a personal rivalry with Musk had delayed a cost-effective launch contract.[66] Blue Origin has flown New Glenn twice; SpaceX launched 166 Falcon 9 missions in 2025 alone. No other Western provider is within an order of magnitude of the launch cadence that orbital compute at scale requires. China is the structural exception: ADA Space launched on a Long March 2D rocket from Chinese soil. The full-stack sovereignty in China&#8217;s orbital program includes the layer most Western ventures take for granted &#8212; the ride up. 
For everyone else, SpaceX&#8217;s launch monopoly is simultaneously its deepest competitive moat and the orbital compute ecosystem&#8217;s most concentrated single point of failure.</p><p>The gap between what has been demonstrated and what has been announced is not a percentage &#8212; it is the distance between a few dozen operational nodes and a million.[44][45][46] Musk claims orbital data centers will be the cheapest way to generate AI compute within two to three years.[30] Google&#8217;s independent engineering analysis says the mid-2030s.[50] Gartner calls it &#8220;peak insanity.&#8221;[51]</p><p>The demonstrated capability &#8212; ADA Space&#8217;s twelve satellites, Starcloud&#8217;s single H100 in orbit, HPE&#8217;s Spaceborne Computer on the ISS &#8212; confirms that Orbit 1 works in principle.[14][15][53][54] It does not confirm that Orbit 1 scales to a million satellites, or that distributed inference at orbital cost undercuts terrestrial edge computing, or that any version of Orbit 2 arrives before 2035.</p><p>SpaceX&#8217;s own FCC filing illustrates the conflation in a single document: one section describes Orbit 1 architecture &#8212; &#8220;distributed processing nodes&#8221; built by &#8220;scaling up Starlink V3 satellites&#8221; &#8212; while another references becoming &#8220;a Kardashev II-level civilization,&#8221; a rung on the theoretical Kardashev scale at which a civilization harnesses the entire energy output of its star.[2] The near-term engineering is plausible precisely because it is modest. If SpaceX is building Orbit 1 &#8212; distributed inference at a global scale &#8212; the taxonomy validates the plan. The timeline for everything beyond inference remains unsupported by any published engineering analysis &#8212; including SpaceX&#8217;s own filing, which contains no deployment schedule, cost estimate, or thermal management plan.</p><p>The honest timelines reinforce the distinction.
Google&#8217;s Suncatcher is explicitly labeled a &#8220;moonshot,&#8221; with a prototype launch in 2027 and economic viability projected for the mid-2030s.[47] ASCEND targets 10 megawatts of orbital capacity by 2036 and 1 gigawatt by 2050.[48] These are honest timelines from organizations publishing open engineering assessments. What is not supportable &#8212; from any source &#8212; is presenting Orbit 1 (inference piggybacking on communication satellites, viable within years) and Orbit 2 (gigawatt-scale training in vacuum, decade-plus away if ever) as the same proposition on the same timeline. The cost of this conflation falls on specific people making specific decisions: the infrastructure investor who underwrites datacenter-scale power for an orbital facility that will run inference, not training; the government that funds an Orbit 2 feasibility study when it should be contracting Orbit 1 hardware.</p><h2>What would have to break</h2><p>Each orbit has different conditions for commercial viability, and conflating them makes all three harder to evaluate.</p><p>Orbit 1 requires radiation-tolerant inference accelerators &#8212; a condition that Nvidia&#8217;s Vera Rubin Space Module announcement substantially de-risks. What remains is execution: SpaceX must allocate mass and power on V3 satellites to compute payloads, and V3 deployment is coupled to Starship&#8217;s still-uncertain maturation. Customers must pay a premium for inference served from orbit rather than terrestrial edge locations &#8212; defense and intelligence applications are the likeliest early market, with In-Q-Tel already investing in the space-compute ecosystem.[49] Timeline: two to four years for initial capability. This is an author&#8217;s estimate, not an independent projection. Initial capability is not the same as a profitable business &#8212; the premium over terrestrial edge inference remains unproven. 
At the million-satellite scale, orbital debris and regulatory constraints &#8212; the American Astronomical Society warned that SpaceX&#8217;s filing represents a hundred-fold increase in the satellite population[45] &#8212; become their own limiting factor.</p><p>Orbit 2 requires three independent advances arriving in roughly the same window: launch costs approaching $200 per kilogram (requiring Starship at full reusability), thermal management at megawatt scale (no major NASA or DARPA program targets this for data centers), and on-orbit servicing mature enough to swap compute modules every two to three years (first robotic demo launches in 2026). Google places the convergence in the mid-2030s under optimistic assumptions.[50] Gartner&#8217;s Bill Ray labeled the concept &#8220;peak insanity.&#8221;[51] Timeline: post-2035 at the earliest.</p><p>Orbit 3 requires reliable delivery to cislunar space &#8212; the region between Earth and the Moon &#8212; including mature launch and landing systems (Artemis program timelines remain uncertain; the IM-2 lander tipped over on arrival), a customer base willing to pay a substantial premium for 2.56-second-latency disaster recovery, and the legal architecture to hold under jurisdictional challenge. The market is small, but the margins could be high, and the competitive moat &#8212; you need a rocket to reach the data &#8212; is unlike any in terrestrial computing. Niche commercial services by 2028&#8211;2030 if Lonestar&#8217;s L1 constellation deploys on schedule.</p><h3>China&#8217;s structural exception</h3><p>China was the first nation to deploy a dedicated orbital AI compute constellation, and it is not stopping at twelve satellites. 
ADA Space plans to grow its constellation to 50 satellites this year and reach 1,000 by 2032.[55] The Astro-Future Institute, backed by Lenovo and the Beijing municipal government, is pursuing a sixteen-spacecraft constellation of laser-linked data centers at a gigawatt scale &#8212; an Orbit 2 play from the only country that has already demonstrated Orbit 1.[56] The state-owned China Aerospace Science and Technology Corporation has written gigawatt-scale space computing infrastructure into the Fifteenth Five-Year Plan.[57] And at the Chinese University of Hong Kong, a single satellite is running a version of the DeepSeek model in orbit, processing data from other Hong Kong-built spacecraft.[58] This is not one company&#8217;s experiment. It is an industrial strategy executed across state-owned enterprises, commercial startups, provincial governments, and universities &#8212; all built on domestically designed accelerators specifically to exit the dependency structure that Western chip export controls create.</p><p>The full-stack exit is visible in the hardware chain: domestically designed chips, on domestically manufactured satellites, launched on domestically built rockets, communicating through domestically operated ground stations &#8212; no layer where a foreign government holds an off switch. China&#8217;s orbital compute program is not a moonshot. It is a sovereignty play executed in hardware &#8212; the coercion stack routed around by going up.</p><p>The taxonomy applies beyond any single company. The boundaries between the three orbits are not always sharp &#8212; a V3 satellite with inference chips could, over successive generations, shade into a more capable compute node. But the physics does the sorting even when the announcements don&#8217;t. Better chips do not fix the latency wall; that gap is set by the speed of light between satellites, not by the silicon on them.
If the claim rests on incremental watts piggybacking on an existing thermal budget, that is Orbit 1. If it requires purpose-built megawatt radiator arrays in a vacuum, that is Orbit 2. If the value proposition is isolation rather than performance, that is Orbit 3. An announcement that blurs the three is, at best, imprecise. At worst, it is claiming the feasibility of one orbit on the timeline of another.</p><p>Orbital compute is coming. The sidecar is an engineering program on a proven platform. The megastructure is a research frontier that may never close. The vault is a niche that inverts the physics everyone else is fighting. Three propositions, three altitudes, three levels of physical plausibility &#8212; and the most consequential decisions in AI infrastructure depend on knowing which orbit the physics actually supports.</p><div><hr></div><h3>Notes</h3><p>[1] SpaceX blog post, February 2, 2026. Musk stated the merger&#8217;s primary purpose was to build &#8220;orbital data centers.&#8221; CNBC confirmed the $1.25 trillion valuation on February 3, 2026, with SpaceX valued at $1 trillion and xAI at $250 billion. &#8220;Largest corporate merger&#8221; by the nominal value of the acquired entity (~$250 billion in stock for xAI). The combined entity valuation ($1.25 trillion) is the post-merger market cap, not the transaction value. <a href="https://www.cnbc.com/2026/02/03/musk-xai-spacex-biggest-merger-ever.html">CNBC</a> &#183; <a href="https://spacenews.com/spacex-acquires-xai-in-bid-to-develop-orbital-data-centers/">SpaceNews</a></p><p>[2] FCC application SAT-LOA-20260108-00016, filed January 30, 2026. The application requests authorization for a system of up to one million satellites at 500&#8211;2,000 km altitude for orbital data center operations. 
<a href="https://spacenews.com/spacex-files-plans-for-million-satellite-orbital-data-center-constellation/">SpaceNews</a></p><p>[3] ADA Space (listed on Hong Kong Stock Exchange, February 2025) launched 12 satellites of its &#8220;Three-Body Computing Constellation&#8221; on May 14, 2025, via Long March 2D from Jiuquan. Each provides 744 TOPS; cluster delivers 5 petaops combined with 100 Gbps inter-satellite laser links. Precision level (INT8/FP16/FP32) not specified in available English-language sources; at INT8, 744 TOPS is comparable to a single H100; at FP16, substantially less. First customer: Aerospace Information Research Institute of the Chinese Academy of Sciences. DataCenterDynamics, May 2025; Global Times confirmation. <a href="https://www.datacenterdynamics.com/en/news/chinas-adaspace-orbits-first-12-ai-cloud-satellites-of-2800-strong-constellation/">DCD</a> &#183; <a href="https://spacenews.com/china-launches-first-of-2800-satellites-for-ai-space-computing-constellation/">SpaceNews</a></p><p>[4] Google Research preprint, &#8220;Towards a future space-based, highly scalable AI infrastructure system design,&#8221; November 2025. Authors include Blaise Ag&#252;era y Arcas and James Manyika. The $200/kg threshold is derived from extrapolation of historical launch pricing data at ~20% learning rate, projected at ~180 Starship launches/year. Not yet peer-reviewed. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a> &#183; <a href="https://blog.google/technology/research/google-project-suncatcher/">Google Blog</a></p><p>[5] Lonestar Data Holdings, &#8220;Freedom&#8221; payload. 1 kg data center with 8 TB Phison Pascari enterprise SSD and Microchip PolarFire FPGA edge processor. Launched aboard Intuitive Machines IM-2 on SpaceX Falcon 9, February 26, 2025. Lander tipped over on lunar surface but Lonestar reported successful cislunar data storage and edge processing tests prior to landing. 
PR Newswire, February 2025; IEEE Spectrum, February 2025. <a href="https://spectrum.ieee.org/lonestar-data-centers-on-the-moon">IEEE Spectrum</a></p><p>[6] International Energy Agency, &#8220;Electricity 2024&#8221; report. Global data center electricity consumption projected to exceed 1,000 TWh by 2026, approximately equal to Japan&#8217;s total electricity consumption. <a href="https://www.iea.org/reports/electricity-2024">IEA</a></p><p>[7] Multiple sources on terrestrial data center constraints. Dominion Energy interconnection queue: industry reporting. US data centers accounting for nearly half of electricity demand growth: IEA and Goldman Sachs projections, 2024&#8211;2025.</p><p>[8] The SMR piece (&#8220;The Half-Life of a Press Release&#8221;) is in development. Central thesis: SMRs are a post-2030 technology marketed as a solution to a 2025&#8211;2028 crisis.</p><p>[9] Nvidia announced the Vera Rubin Space Module at GTC 2026, San Jose, March 16, 2026. Designed for orbital data centers, geospatial intelligence, and autonomous space operations. Company claims 25x inference performance over H100 for space-based workloads. Also highlighted IGX Thor and Jetson Orin as current orbital inference platforms. CEO Jensen Huang: &#8220;With our partners, we&#8217;re extending NVIDIA beyond our planet.&#8221; Yahoo Finance, March 16, 2026. Nvidia had previously posted a role for &#8220;Orbital Data Center System Architect&#8221; at $224,000&#8211;$356,500 base salary (DataCenterDynamics, March 2026). <a href="https://nvidianews.nvidia.com/news/nvidia-gtc-2026-keynote">Nvidia GTC 2026</a></p><p>[10] Google Suncatcher preprint. Solar panels in sun-synchronous orbit can be &#8220;up to 8 times more productive than on earth&#8221; due to continuous sunlight, no atmospheric absorption, and 36% higher solar irradiance in Earth orbit vs. surface. 
<a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[11] Xsight Labs X2 12.8 Tbps programmable Ethernet switch, TSMC N5, sub-200W. Selected as networking core for Starlink V3 satellites. ServeTheHome, December 2025; Calcalist Tech, December 2025. Michael Nicolls, VP Starlink Engineering, confirmed in press release. Note: SpaceX does not publish official V3 satellite specifications. Chip selection and performance figures are from industry reporting and the chip manufacturer&#8217;s disclosures, not from SpaceX directly. <a href="https://www.servethehome.com/xsight-labs-x2-starlink-v3/">ServeTheHome</a></p><p>[12] Starlink V3 specifications estimated from industry reporting: ~1,250 kg mass, over 1 Tbps downlink capacity, over 200 Gbps uplink, sub-20ms latency, estimated 10&#8211;20 kW solar power. More than 10&#215; downlink and 24&#215; uplink capacity vs. V2 Mini. Sources include Gear Musk, NextBigFuture, October&#8211;December 2025. SpaceX has not published official V3 specifications. Mass, power, and capacity figures are industry estimates, not official disclosures.</p><p>[13] SpaceX launch cadence: over 2,300 Starlink satellites launched in the past year, deploying over 5 Tbps of capacity per week. Starlink filings and public disclosures, mid-2025.</p><p>[14] ADA Space successfully ran an 8-billion-parameter model on orbit. Second batch of 12 satellites (&#8220;Liangxi&#8221;) with 4&#215; computing power announced July 2025. Beijing&#8217;s three-phase plan: 200 kW with 1,000 petaops by 2027. Global Times, DataCenterDynamics, CNTechNews. Performance figures per ADA Space; independent on-orbit verification of compute throughput, thermal margins, and inter-satellite coherency has not been published. 
<a href="https://www.datacenterdynamics.com/en/news/chinas-adaspace-orbits-first-12-ai-cloud-satellites-of-2800-strong-constellation/">DCD</a></p><p>[15] HPE Spaceborne Computer: over 2,000 cumulative days of COTS computing on the ISS across three iterations (SBC-1 launched 2017, SBC-2 launched 2021, SBC-3), running 39+ experiments. Not continuous operation of a single system. DNA sequence data compression result: 1.8 GB reduced to 92 KB, 12.2 hours of downlink replaced by 2-second transmission. HPE press release, April 2022; ISS National Lab. <a href="https://www.hpe.com/us/en/newsroom/press-release/2022/04/hewlett-packard-enterprise-spaceborne-computer-2.html">HPE</a></p><p>[16] SpaceX FCC filing, January 30, 2026. Quotes from application text and Elon Musk&#8217;s accompanying statements. SatNews reporting, January 31, 2026. <a href="https://spacenews.com/spacex-files-plans-for-million-satellite-orbital-data-center-constellation/">SpaceNews</a></p><p>[17] Stefan-Boltzmann law application to space radiators. At 300&#8211;350 K operating temperatures, ideal blackbody radiates 520&#8211;850 W/m&#178;. Practical rates after emissivity, view factors, and solar back-loading: 100&#8211;350 W/m&#178;. NASA thermal control documentation; Per Aspera, &#8220;Realities of Space-Based Compute,&#8221; 2025; multiple engineering analyses. <a href="https://www.peraspera.us/realities-of-space-based-compute/">Per Aspera</a></p><p>[18] ISS Active Thermal Control System: rated at approximately 70 kW thermal rejection capacity via ammonia cooling loops and external radiator panels totaling ~422 m&#178;. NASA technical documentation. At ~700W TDP per H200 GPU, 70 kW cools ~100 GPUs or ~12.5 standard 8-GPU racks. Actual cooling overhead means fewer in practice.</p><p>[19] Author&#8217;s calculation and Per Aspera analysis. 
At 200&#8211;350 W/m&#178; practical rejection rate, 600 MW of waste heat (from a 1 GW facility at ~60% compute efficiency) requires radiator surface on the order of one million square meters. Mass at ISS radiator density (~2.7 kg/m&#178;): thousands of tons. The precise figure depends on radiator technology and operating temperature; the order of magnitude does not. At 10 MW scale: ~6 MW waste heat at ~60% efficiency, requiring ~30,000 m&#178; of radiator at 200 W/m&#178;, weighing ~80 tons at ISS panel density. GPU count estimate (5,000&#8211;8,000) assumes rack-level power density of 1.2&#8211;2.0 kW per GPU including networking, cooling, and power conversion overhead &#8212; not chip TDP alone.</p><p>[20] Google Suncatcher preprint. Trillium TPU v6e tested under 67 MeV proton beam. No hard failures attributable to TID up to 15 krad(Si). Shielded 5-year mission dose estimated at ~750 rad(Si). HBM showed irregularities after 2 krad &#8212; nearly 3&#215; the estimated five-year mission dose, but still the most radiation-sensitive component. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[21] 800 Gbps bidirectional optical links achieved in lab setting using commercial DWDM transceivers across short free-space path. Not demonstrated in orbit. Google Suncatcher preprint. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[22] Google Suncatcher preprint, stated limitations section. Thermal management described as requiring future experimental validation. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[23] Current LEO launch costs: $1,500&#8211;$2,900/kg depending on vehicle and orbit requirements. Google Suncatcher preprint cites this range and projects $200/kg by mid-2030s. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[24] Academic analysis of LEO satellite launch costs, Economics Bulletin (2022). 
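The radiator arithmetic in note [19] can be reproduced directly. A sketch in Python using only that note's stated figures (200&#8211;350 W/m&#178; practical rejection, ~2.7 kg/m&#178; ISS panel density, and the note's convention that ~60% of facility power becomes waste heat); the constants are the note's assumptions, not independent measurements:

```python
# Check of note [19]: radiator area and mass at gigawatt and 10 MW scale.
# Waste heat is taken as ~60% of facility power, per the note's figures.

ISS_PANEL_DENSITY = 2.7  # kg per m^2, ISS radiator density per note [19]

def radiator_area_m2(facility_w, rejection_w_per_m2, waste_fraction=0.6):
    """Radiator surface needed to reject the facility's waste heat."""
    return facility_w * waste_fraction / rejection_w_per_m2

def radiator_mass_tons(area_m2, density=ISS_PANEL_DENSITY):
    """Radiator mass at a given areal panel density."""
    return area_m2 * density / 1000.0

# 1 GW facility at 200-350 W/m^2 practical rejection:
gw_area_hi = radiator_area_m2(1e9, 200)   # 3.0 million m^2
gw_area_lo = radiator_area_m2(1e9, 350)   # ~1.7 million m^2
gw_mass = radiator_mass_tons(gw_area_lo)  # ~4,600 tons even at the low end

# 10 MW facility at 200 W/m^2:
mw_area = radiator_area_m2(10e6, 200)     # 30,000 m^2
mw_mass = radiator_mass_tons(mw_area)     # ~81 tons
```

The order of magnitude survives any reasonable choice within the note's ranges, which is the point of the note.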
Study of 3,207 satellites launched 2000&#8211;2020 found average commercial costs declining at 4.4% annually. Extrapolation to $200/kg yields mid-2040s to mid-2070s range depending on methodology. ResearchGate. <a href="https://www.researchgate.net/publication/358734556_LEO_satellite_launch_cost_analysis">ResearchGate</a></p><p>[25] Starship flight history: flights 7 and 8 did not complete primary test objectives. Flight 11 (October 2025) successfully deployed 8 V3 mass simulators. SpaceNews, multiple dates. <a href="https://spacenews.com/tag/starship/">SpaceNews</a></p><p>[26] Andrew McCalip, Varda Space Industries, orbital data center cost analysis. Approximately 3&#215; terrestrial per watt under base-case Starship pricing assumptions, compared to US average hyperscale facility costs including land, power, and cooling. The ratio is sensitive to the terrestrial baseline: a facility in a power-constrained market (Northern Virginia at 8+ cents/kWh) narrows the gap; an unconstrained site (Iowa at 3 cents/kWh) widens it. SpaceNews, February 2026. <a href="https://spacenews.com/varda-orbital-data-center-cost-analysis/">SpaceNews</a></p><p>[27] DARPA Robotic Servicing of Geosynchronous Satellites (RSGS) program. Mission Robotic Vehicle targeting 2026 launch for first demonstration of robotic satellite servicing. <a href="https://spacenews.com/darpa-rsgs-robotic-servicing-vehicle/">SpaceNews</a>.</p><p>[28] Space-based solar power investment: $879 million across 157 projects over 57 years. ScienceDirect peer-reviewed study, 2023. <a href="https://www.sciencedirect.com/science/article/pii/S0306261923000880">ScienceDirect</a></p><p>[29] NASA Office of Technology, Policy, and Strategy, &#8220;Space-Based Solar Power&#8221; report, January 2024. SBSP designs found to be 12&#8211;80&#215; more expensive than terrestrial alternatives under current conditions. 
<a href="https://www.nasa.gov/wp-content/uploads/2024/01/otps-sbsp-report-final-tagged.pdf">NASA</a></p><p>[30] Musk, SpaceX blog post, February 2, 2026: &#8220;My estimate is that within 2 to 3 years, the lowest cost way to generate AI compute will be in space.&#8221; On the Cheeky Pint podcast (February 4, 2026), Musk extended the claim: &#8220;You can mark my words, in 36 months but probably closer to 30 months, the most economically compelling place to put AI will be space.&#8221; He added: &#8220;Five years from now, my prediction is we will launch and be operating every year more AI in space than the cumulative total on Earth.&#8221; For context, projected global terrestrial data center capacity by 2030 is approximately 200 GW. TechCrunch, February 5, 2026. <a href="https://techcrunch.com/2026/02/02/elon-musk-spacex-acquires-xai-data-centers-space-merger/">TechCrunch</a></p><p>[31] Lunar lava tube temperatures: approximately constant -20&#176;C. <a href="https://spectrum.ieee.org/lonestar-data-centers-on-the-moon">IEEE Spectrum</a>, February 2025, citing lunar science research. Permanently shadowed crater temperatures as low as ~40 K from lunar exploration data.</p><p>[32] Lonestar Data Holdings positions high latency as a security feature for disaster recovery. Earth-Moon round-trip light-speed delay: ~2.56 seconds. Lonestar CEO Chris Stott, quoted in <a href="https://www.reuters.com/technology/space/lonestar-moon-data-center-2025/">Reuters</a> and <a href="https://spectrum.ieee.org/lonestar-data-centers-on-the-moon">IEEE Spectrum</a>.</p><p>[33] Lonestar L1 Lagrange point architecture: only 4 hours of shade every 90 days, batteries for that duration. InformationWeek, April 2025, quoting CEO Chris Stott. <a href="https://www.informationweek.com/it-infrastructure/lonestar-data-moon-l1/">InformationWeek</a></p><p>[34] Lonestar plans six data storage spacecraft at Lunar L1, 2027&#8211;2030, each carrying multi-petabyte storage with edge processing. 
InformationWeek, April 2025. <a href="https://www.informationweek.com/it-infrastructure/lonestar-data-moon-l1/">InformationWeek</a></p><p>[35] Outer Space Treaty (1967), Article VIII: &#8220;A State Party to the Treaty on whose registry an object launched into outer space is carried shall retain jurisdiction and control over such object.&#8221; Article II prohibits national appropriation of outer space. Article VI further requires that states bear international responsibility for national space activities, including those of non-governmental entities, and must &#8220;authorize and continuously supervise&#8221; such activities. This creates a dual-edged jurisdiction: the registering state has legal authority to protect data on the spacecraft, but also has a legal obligation to supervise its operation &#8212; and could compel access under its own national security laws. <a href="https://www.unoosa.org/oosa/en/ourwork/spacelaw/treaties/outerspacetreaty.html">UNOOSA</a></p><p>[36] Lonestar CEO Chris Stott secured S, X, and Ka-band frequency filings (the primary radio bands used for satellite communication) through the United Kingdom. <a href="https://spacenews.com/lonestar-moon-data-storage-uk-spectrum/">SpaceNews</a>, April 2022. Stott co-founded ManSat, a spectrum regulation consultancy, prior to Lonestar.</p><p>[37] ASCEND consortium quote from Damien Dumestier, systems architect, Thales Alenia Space. <a href="https://hellofuture.orange.com/en/ascend-cloud-computing-space/">Orange Hello Future</a>, September 2024.</p><p>[38] CLOUD Act, 18 U.S.C. 
&#167; 2713: &#8220;A provider of electronic communication service or remote computing service shall comply with the obligations of this chapter to preserve, backup, or disclose the contents of a wire or electronic communication and any record or other information pertaining to a customer or subscriber within such provider&#8217;s possession, custody, or control, regardless of whether such communication, record, or other information is located within or outside of the United States.&#8221; <a href="https://www.law.cornell.edu/uscode/text/18/2713">Cornell LII</a></p><p>[39] Lonestar transmitted the U.S. Declaration of Independence to the IM-1 lander in transit to the Moon; the lander returned digital copies of the Constitution and Bill of Rights. <a href="https://www.datacenterfrontier.com/data-center-design/article/lonestar-moon-declaration-independence">DataCenterFrontier</a>, February 2024.</p><p>[40] SpaceX 2025 financials: approximately $8 billion profit on $15&#8211;16 billion revenue. Reuters, citing two people familiar with the company&#8217;s results, reported late January 2026. CNBC confirmed. Starlink accounts for an estimated 67&#8211;70% of total revenue. <a href="https://www.reuters.com/technology/spacex-profit-2025-01-31/">Reuters</a> &#183; <a href="https://www.cnbc.com/2026/02/02/elon-musk-spacex-xai-ipo.html">CNBC</a></p><p>[41] Bloomberg reported February 27, 2026, that SpaceX was weighing a confidential IPO filing as early as March 2026. Financial Times previously reported the company is targeting up to $50 billion raise at valuations as high as $1.5&#8211;1.75 trillion. SpaceX has begun pitching non-US banks. Investing.com, March 2026. <a href="https://www.ft.com/content/spacex-ipo-2026/">FT</a></p><p>[42] xAI burned approximately $9.5 billion through the first nine months of 2025. The Information, February 2, 2026. CNBC confirmed. 
This figure includes one-time infrastructure buildout costs (the Memphis &#8220;Colossus&#8221; GPU cluster, reported at $3&#8211;4 billion), meaning operational burn rate excluding infrastructure capex is likely $400&#8211;600 million/month rather than the headline ~$1 billion. Monthly burn rate also includes X platform operations with $1.2 billion in annual debt servicing from the 2022 leveraged buyout. <a href="https://www.cnbc.com/2026/02/02/musks-xai-needs-spacex-for-money-data-centers-in-space-are-a-dream.html">CNBC</a></p><p>[43] Triangular merger structure: Reuters reported the transaction is structured as a triangular merger, allowing xAI to function as a subsidiary while minimizing SpaceX&#8217;s exposure to xAI liabilities. Share exchange ratio: 1 xAI share = 0.1433 SpaceX share. CNBC viewed valuation documents. <a href="https://www.cnbc.com/2026/02/03/musk-xai-spacex-biggest-merger-ever.html">CNBC</a></p><p>[44] FCC acceptance timeline and &#8220;Build America Agenda&#8221; framing. FCC.gov, &#8220;Boosting America&#8217;s Space Economy&#8221; initiative under Chairman Brendan Carr. <a href="https://www.theregister.com/2026/02/05/spacex_fcc_million_satellites/">The Register</a> reported the FCC opened the application for public comment on February 5, 2026.</p><p>[45] Amazon petition to deny. Characterized the filing as &#8220;a speculative placeholder.&#8221; BASENOR reporting. The American Astronomical Society also issued a public alert warning that one million satellites would represent &#8220;a factor-of-100 increase over the current satellite population in LEO.&#8221; AAS official statement. <a href="https://aas.org/press/aas-one-million-satellites-statement">AAS</a></p><p>[46] Tim Farrar, President of TMF Associates. Characterized the filing as &#8220;quite rushed&#8221; and a narrative tool for the IPO. SatNews, January 31, 2026. 
Farrar separately noted to CNBC that &#8220;it is clear SpaceX can&#8217;t fund xAI itself&#8221; from operating cash flow, making the IPO essential. <a href="https://www.cnbc.com/2026/02/02/musks-xai-needs-spacex-for-money-data-centers-in-space-are-a-dream.html">CNBC</a></p><p>[47] Google CEO Sundar Pichai: &#8220;Like any moonshot, it&#8217;s going to require us to solve a lot of complex engineering challenges.&#8221; Google Blog, November 2025. Two prototype satellites planned for early 2027 in partnership with Planet Labs. <a href="https://blog.google/technology/research/google-project-suncatcher/">Google Blog</a></p><p>[48] ASCEND targets: 13 building blocks at 10 MW total by 2036 as starting point for cloud commercialization; 1,300 building blocks at 1 GW by 2050. CNBC, June 2024, quoting ASCEND project manager Damien Dumestier. EU-funded at &#8364;2 million for the feasibility study under Horizon Europe. <a href="https://www.cnbc.com/2024/06/10/ascend-orbital-data-center-thales/">CNBC</a></p><p>[49] In-Q-Tel interest in space compute noted via PitchBook records of investments in the broader space-compute ecosystem. Defense and intelligence applications represent significant demand signal for secure, non-terrestrial inference capability. Starcloud&#8217;s work with Capella Space on SAR imagery processing is the most specific disclosed contract. CNBC, December 2025.</p><p>[50] Google Suncatcher preprint projects economic viability at ~$200/kg launch costs and ~180 Starship launches/year, which they estimate could occur by mid-2030s under optimistic learning-rate assumptions. The paper explicitly notes this is &#8220;not a full economic analysis.&#8221; <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[51] Gartner VP Bill Ray, report titled &#8220;Orbital Datacenters Won&#8217;t Serve Terrestrial Needs,&#8221; February 2026. Characterized the concept as &#8220;peak insanity.&#8221; The Register, February 25, 2026. 
<a href="https://www.theregister.com/2026/02/25/gartner_orbital_datacenters/">The Register</a></p><p>[52] Nvidia Vera Rubin Space Module announced at GTC 2026, March 16, 2026. Nvidia claims 25x AI inference performance over H100 for &#8220;space-based inferencing.&#8221; The announcement specifies inference, not training &#8212; consistent with this piece&#8217;s Orbit 1/Orbit 2 distinction. The Space Module is a variant of Nvidia&#8217;s next-generation Vera Rubin architecture adapted for orbital environments, not a purpose-built space chip. IGX Thor and Jetson Orin platforms also highlighted for current orbital deployment. Yahoo Finance, March 16, 2026; Nvidia GTC 2026 keynote. Vendor-claimed performance figure; not independently verified. <a href="https://nvidianews.nvidia.com/news/nvidia-gtc-2026-keynote">Nvidia GTC 2026</a></p><p>[53] Kepler Communications launched ten 300-kilogram-class optical relay satellites aboard SpaceX Falcon 9 from Vandenberg, January 11, 2026. Each satellite equipped with SDA-compatible optical terminals, multi-GPU compute modules, and terabytes of onboard storage. SatNews, February 9, 2026; Kepler press release, January 11, 2026. Kepler has raised over $200 million in equity since 2015. Second tranche supporting ESA&#8217;s HydRON program planned approximately two years later. <a href="https://spacenews.com/kepler-communications-launches-optical-relay-satellites-january-2026/">SpaceNews</a></p><p>[54] Axiom Space purchased two initial orbital data center (ODC) computing payloads on Kepler&#8217;s network, announced April 7, 2025. First two ODC nodes launched on Kepler&#8217;s January 2026 mission. Axiom separately launched AxDCU-1, a data processing prototype running Red Hat Device Edge, to the ISS aboard SpaceX CRS-33 on August 24, 2025. SpaceNews, April 7, 2025; DataCenterDynamics, April 10, 2025; Data Center Knowledge, September 9, 2025. 
<a href="https://www.datacenterdynamics.com/en/news/axiom-space-buys-orbital-data-center-nodes-kepler/">DCD</a> &#183; <a href="https://spacenews.com/axiom-space-kepler-orbital-data-center/">SpaceNews</a></p><p>[55] ADA Space founder Wang Jian confirmed at the 2026 Two Sessions that the Three-Body Computing Constellation will have 50 computing satellites launched in 2026, with plans for 1,000 by 2032. China-in-Space, March 2026. Program currently has 39 satellites under development per CGTN, February 2026. <a href="https://china-in-space.com/2026/03/ada-space-three-body-2026/">China-in-Space</a></p><p>[56] Astro-Future Institute plans a sixteen-spacecraft constellation of laser-linked gigawatt-scale data centers, with backing from Lenovo and the Beijing municipal government. At least &#165;140 million ($20.4 million) in disclosed funding. Demonstration satellite expected 2026. China-in-Space, March 2026. <a href="https://china-in-space.com/2026/03/astro-future-institute/">China-in-Space</a></p><p>[57] China Aerospace Science and Technology Corporation (CASC) included &#8220;Build gigawatt-scale space-based digital and intelligent infrastructure&#8221; in its focus areas for the 15th Five-Year Plan period (2026&#8211;2030). CGTN, February 10, 2026. <a href="https://news.cgtn.com/news/2026-02-10/casc-five-year-plan/">CGTN</a></p><p>[58] CUHK-1 satellite, developed by the Chinese University of Hong Kong, runs a version of the DeepSeek model in orbit. Designed to connect with and process data from other Hong Kong-made satellites. China-in-Space, March 2026. <a href="https://china-in-space.com/2026/03/cuhk-1-deepseek/">China-in-Space</a></p><p>[59] International Telecommunication Union, &#8220;Facts and Figures 2024.&#8221; Approximately 2.6 billion people remain unconnected to the Internet; an additional several hundred million have only intermittent or low-quality access. 
The &#8220;three billion&#8221; figure is a rounded estimate encompassing both unconnected and underserved populations. <a href="https://www.itu.int/en/ITU-D/Statistics/Pages/facts/default.aspx">ITU</a></p><p>[60] SpaceX FCC filing (SAT-LOA-20260108-00016), January 30, 2026: &#8220;launching one million tonnes per year of satellites generating 100kW of computer power per tonne would add 100 gigawatts of AI compute capacity annually.&#8221; DataCenterDynamics, February 11, 2026, quoting filing directly. The architecture is explicitly distributed &#8212; one million satellites at ~100 kW each &#8212; not a centralized gigawatt facility. <a href="https://www.datacenterdynamics.com/en/news/spacex-proposes-one-million-orbital-data-center-satellites/">DCD</a></p><p>[61] Marc Bara, &#8220;<a href="https://medium.com/@marc.bara/orbital-data-centers-part-ii-spacexs-million-satellite-bet">Orbital Data Centers, Part II: SpaceX&#8217;s Million-Satellite Bet</a>,&#8221; Medium, February 4, 2026. Bara (PhD Electrical Engineering, UPC Barcelona; decade on Galileo and ESA missions) calculates that at 100 kW per satellite with 40% efficiency, each satellite needs approximately 41 square meters of radiator surface at 400 K &#8212; &#8220;feasible&#8221; and &#8220;well within existing engineering practice.&#8221; His conclusion: &#8220;The distributed architecture elegantly sidesteps the cooling challenge...but &#8216;converting an engineering problem into a cost problem&#8217; only helps if the costs are tractable. At current projections, they are not.&#8221; See also Mach33 Research, &#8220;Debunking the Cooling Constraint in Space Data Centers,&#8221; March 2026, which reaches a similar conclusion using Starlink V3 as a reference platform.</p><p>[62] Jeff Bezos: &#8220;There will be gigawatt data centers in space in 10+ years.&#8221; Quoted in Concept to Cloud, November 2025. Bezos acquired no company to pursue this but referenced Blue Origin&#8217;s capabilities. 
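Bara's ~41 m&#178; figure in note [61] follows from the Stefan-Boltzmann law. A sketch under his stated assumptions (100 kW input, 40% efficiency, an ideal blackbody radiator at 400 K; emissivity and view-factor losses ignored, as in the quoted back-of-envelope):

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def ideal_radiator_area_m2(input_power_w, efficiency, radiator_temp_k):
    """Blackbody radiator area needed to reject a satellite's waste heat."""
    waste_heat_w = input_power_w * (1.0 - efficiency)
    flux_w_per_m2 = SIGMA * radiator_temp_k ** 4  # ~1,450 W/m^2 at 400 K
    return waste_heat_w / flux_w_per_m2

# Note [61]'s case: 100 kW satellite, 40% efficiency, 400 K radiator.
area = ideal_radiator_area_m2(100e3, 0.40, 400.0)  # ~41 m^2
```

Real emissivity and solar back-loading would push the area up, but not by enough to change the "feasible" verdict; the cost problem Bara identifies is elsewhere.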
The 10+ year timeline aligns with Google&#8217;s mid-2030s projection and contradicts Musk&#8217;s 2&#8211;3 year claim. <a href="https://concepttocloud.com/bezos-gigawatt-data-centers-space/">Concept to Cloud</a></p><p>[63] Interconnect latency hierarchy for AI training: NVLink 4.0 (H100) provides 900 GB/s bidirectional bandwidth with sub-microsecond latency between GPUs within a single server node. InfiniBand NDR provides 400 Gb/s (= 400 Gbps) with 1&#8211;5 microsecond latency between nodes in a data center cluster &#8212; the relevant comparison for distributed training. Free-space optical laser links between LEO satellites provide 100&#8211;800 Gbps (demonstrated by Suncatcher in lab) with millisecond-class latency depending on inter-satellite distance. At 800 Gbps, Suncatcher&#8217;s optical link has comparable raw bandwidth to a single InfiniBand NDR link (400 Gbps). The gap is latency: single-digit microseconds for InfiniBand vs. milliseconds for inter-satellite links, a factor of approximately 1,000. Training synchronization is latency-bound &#8212; it requires frequent small exchanges, not occasional large transfers &#8212; making the latency gap, not the bandwidth gap, the binding constraint. <a href="https://services.google.com/fh/files/misc/suncatcher_paper.pdf">Preprint</a></p><p>[64] Starcloud (formerly Lumen Orbit) FCC application for 88,000-satellite constellation accepted for filing March 13, 2026. Satellites at 600&#8211;850 km altitude. SpaceNews, March 15, 2026. The company&#8217;s Starcloud-4 concept envisions a 5 GW satellite with solar arrays approximately 4 km on a side &#8212; an explicit Orbit 2 architecture. CEO Philip Johnston also announced plans to fly Bitcoin mining ASICs on Starcloud-2 (targeted late 2026), which would be the first cryptocurrency mining in orbit &#8212; a pure Orbit 1 workload requiring zero interconnect. 
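The factor-of-1,000 latency gap in note [63] is set by light-speed propagation over inter-satellite distances. A sketch; the spacings are illustrative, and the InfiniBand hop latency is the midpoint of the note's own 1&#8211;5 microsecond figure:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def propagation_delay_s(distance_m):
    """One-way light-speed delay of a free-space optical link."""
    return distance_m / C

INFINIBAND_HOP_S = 3e-6  # mid-range of note [63]'s 1-5 microsecond figure

# Illustrative inter-satellite spacings in a LEO constellation:
for km in (100, 500, 1000):
    delay = propagation_delay_s(km * 1000.0)
    ratio = delay / INFINIBAND_HOP_S
    # 100 km -> ~0.33 ms (~110x an in-cluster hop);
    # 1,000 km -> ~3.3 ms (~1,100x): millisecond-class either way.
    # Bandwidth parity does not close a gap set by distance.
```

No chip improvement changes these numbers, which is why the main text calls the latency wall a physics limit rather than an engineering one.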
The company&#8217;s roadmap thus spans both orbits: demonstrated Orbit 1 (H100 inference), filed Orbit 1 at scale (88K constellation), and aspirational Orbit 2 (5 GW). This progression mirrors the SpaceX FCC filing&#8217;s conflation of sidecar and megastructure in a single document. <a href="https://spacenews.com/starcloud-files-plans-for-88000-satellite-constellation/">SpaceNews</a></p><p>[65] DiLoCo (Distributed Low-Communication Learning): Douillard et al., DeepMind, 2024. Demonstrated training across poorly connected nodes with synchronization every 500 local optimization steps, reducing inter-node communication by approximately 500&#215;. The technique trades communication for compute: each worker performs hundreds of additional local steps between synchronizations, increasing the total compute required per training run. At orbital hardware costs (approximately 3&#215; terrestrial per the McCalip analysis), this additional compute overhead compounds the cost disadvantage rather than resolving it. The interconnect barrier is one of several; removing it does not make the economics close. <a href="https://arxiv.org/abs/2311.08105">arXiv</a></p><p>[66] Amazon launch dependency on SpaceX: Amazon originally booked 77 launches with ULA (Vulcan), Arianespace (Ariane 6), and Blue Origin (New Glenn) for its Kuiper satellite constellation. Development delays across all three providers threatened Amazon&#8217;s FCC-mandated deployment deadline (half of 3,236 satellites in orbit by July 2026). Amazon subsequently contracted SpaceX Falcon 9 launches, completing three missions in July, August, and October 2025. The Cleveland Bakers and Teamsters Pension Fund filed a shareholder derivative suit alleging that Amazon&#8217;s board allowed Bezos&#8217;s personal rivalry with Musk to delay a cost-effective launch contract, harming shareholders. SatNews, March 9, 2026; CNBC reporting on Kuiper launch contracts. 
Blue Origin&#8217;s New Glenn achieved orbit on its first flight (January 2025) and successfully landed its booster on its second flight (November 2025), but has flown only twice total &#8212; insufficient cadence for constellation-scale deployment. SpaceX launched 166 Falcon 9 missions in 2025. <a href="https://www.cnbc.com/2026/02/02/musks-xai-needs-spacex-for-money-data-centers-in-space-are-a-dream.html">CNBC</a></p>]]></content:encoded></item><item><title><![CDATA[Built the Datacenter, Lost the Brain]]></title><description><![CDATA[Meta Spent $135 Billion on AI Infrastructure, but Some Things You Just Can't Buy.]]></description><link>https://www.airealist.ai/p/built-the-datacenter-lost-the-brain</link><guid isPermaLink="false">https://www.airealist.ai/p/built-the-datacenter-lost-the-brain</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Tue, 17 Mar 2026 21:42:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tVO1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tVO1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tVO1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!tVO1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!tVO1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tVO1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tVO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2105638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191302021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tVO1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 
424w, https://substackcdn.com/image/fetch/$s_!tVO1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!tVO1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tVO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea370bb7-1707-47b2-906d-69f0b2520069_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On March 10, 2026, Yann LeCun&#8217;s new company announced a $1.03 billion funding round at a $3.5 billion pre-money valuation.[1] The company had twelve employees, no product, and no revenue. Its technology &#8212; world models based on an architecture called JEPA &#8212; was developed during LeCun&#8217;s twelve years leading Meta&#8217;s Fundamental AI Research lab, the unit that had once made Meta one of the most respected AI research organizations in the world.[2] The investor list read like a verdict: Nvidia, Bezos Expeditions, Eric Schmidt, Samsung, and Toyota Ventures.[3]</p><p>The next morning, Meta unveiled four custom AI chips &#8212; the MTIA 300, 400, 450, and 500 &#8212; built with Broadcom and fabricated by TSMC.[4] The company said it would deploy all four within two years on a six-month cadence, the fastest chip release schedule in the industry.[5] The MTIA 400 claimed raw performance competitive with that of leading commercial generative AI inference products.[6] Meta already had hundreds of thousands of first-generation MTIA chips in production, serving the ranking and recommendation systems behind Facebook and Instagram.[7]</p><p>Two days later, the New York Times reported that Meta had delayed its frontier AI model, codenamed Avocado, from a planned March launch to at least May.[9] In internal testing, the model fell short of Google&#8217;s Gemini 3 in reasoning, coding, writing, and agentic behavior &#8212; performing somewhere between Gemini 2.5 from March 2025 and Gemini 3 from November 2025.[10] Three sources told the Times that leaders inside Meta&#8217;s AI division had discussed temporarily licensing Google&#8217;s Gemini to power Meta&#8217;s products while Avocado caught up.[11] By Friday, Reuters reported that Meta was weighing layoffs affecting 20 percent or more of its workforce &#8212; roughly 15,800 people &#8212; to
offset the cost of its AI infrastructure buildout.[12]</p><p>The researcher that Meta pushed out raised a billion dollars on Monday. The infrastructure Meta built without him launched on Tuesday. The model Meta built without him failed on Thursday. Five days that revealed the most important structural split in the AI industry: the distance between building infrastructure and producing intelligence. And one company that spent $135 billion on the first and destroyed its capacity for the second.[13]</p><h2>The paradox in numbers</h2><p>Meta&#8217;s 2026 capital expenditure guidance &#8212; $115 billion to $135 billion, including principal payments on finance leases &#8212; represents the largest single-year AI infrastructure commitment any company has ever made.[14] It nearly doubles Meta&#8217;s 2025 actual spending of $72.2 billion and dwarfs the annual GDP of most countries.[15] The money is building data centers, buying GPUs, and deploying custom silicon at a pace Meta&#8217;s VP of Engineering Yee Jiun Song called &#8220;unusual for any silicon company or team.&#8221;[16]</p><p><strong>Spending more doesn&#8217;t help if you can&#8217;t convert capital into intelligence.</strong>[71]</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vsPK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vsPK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 424w, 
https://substackcdn.com/image/fetch/$s_!vsPK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 848w, https://substackcdn.com/image/fetch/$s_!vsPK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 1272w, https://substackcdn.com/image/fetch/$s_!vsPK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vsPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png" width="1220" height="700" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:700,&quot;width&quot;:1220,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191302021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vsPK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 
424w, https://substackcdn.com/image/fetch/$s_!vsPK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 848w, https://substackcdn.com/image/fetch/$s_!vsPK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 1272w, https://substackcdn.com/image/fetch/$s_!vsPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc538644e-a54f-40fa-b92e-a63d0258cc79_1220x700.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>The chips are real. MTIA is not vaporware. The 300-series is in production. The 400-series has completed testing and is entering data centers. The 450-series doubles HBM (high-bandwidth memory) capacity and claims to exceed the inference performance of leading commercial products. The 500-series adds another 50 percent of bandwidth on top of that.[17] Across the lineup, Meta claims a 25-fold increase in compute and a 4.5-fold increase in memory bandwidth.[18] Broadcom confirmed that Meta will install &#8220;multiple gigawatts&#8221; of custom silicon in 2027 and beyond.[19] These are serious inference chips built by a serious silicon team.</p><p>But in the same weeks that Meta announced its chip roadmap, it also signed a multi-billion-dollar deal with Nvidia for training GPUs, a $60 billion deal with AMD for training and inference hardware, and a separate agreement to use Google&#8217;s TPU chips for training capacity.[20] Meta builds its own inference silicon. It rents training silicon from three competitors &#8212; including the company whose model it is now considering licensing. The company that designed four custom chips on a six-month deployment cadence cannot train a competitive model on those chips, because the chips were designed for inference, not training. CFO Susan Li told a Morgan Stanley conference that Meta is &#8220;hopeful&#8221; of expanding MTIA to training &#8220;eventually&#8221; &#8212; an aspirational statement with no timeline.[21]</p><p>Meta is world-class at delivering intelligence to 3.6 billion daily users. It cannot produce the intelligence it delivers. That gap &#8212; between operational excellence at the serving layer and structural failure at the creation layer &#8212; is a category error with a $135 billion price tag. 
And it has a specific cause.</p><h2>What Meta had</h2><p>FAIR &#8212; Facebook AI Research, later Fundamental AI Research &#8212; was founded in December 2013 when Mark Zuckerberg recruited Yann LeCun, a professor at New York University and one of the three researchers who would share the 2018 Turing Award for their foundational work on deep learning.[22] For the next decade, FAIR operated as an academic-style research lab inside a commercial company. Researchers published openly, explored long-term ideas, and pursued directions that might not produce revenue for years. LeCun described it as &#8220;a tabula rasa with a carte blanche&#8221; where &#8220;money was clearly not going to be a problem.&#8221;[23]</p><p>FAIR never topped the closed-model leaderboard &#8212; never beat GPT-4, never beat Gemini. That concession matters. But FAIR built Llama, and Llama changed the industry&#8217;s structure. The decision to release a competitive large language model under an open license broke the assumption that frontier AI would remain the exclusive property of three or four closed labs. Llama didn&#8217;t win benchmarks against GPT-4. It did something harder: it created an ecosystem. Thousands of companies, researchers, and governments built on Llama because FAIR&#8217;s culture &#8212; publish openly, share weights, let the community iterate &#8212; made that possible.[24]</p><p>No operator culture produces that decision. Operator culture closes the code and ships the product. Research culture opens the weights and builds the field. What research culture generates isn&#8217;t any single model &#8212; it&#8217;s the architectural intuition, the willingness to explore directions that won&#8217;t ship for years, and the tolerance for negative results that inform the next attempt. Those are the inputs to the next breakthrough. You can&#8217;t buy them with a $100 million signing bonus. 
You grow them over a decade in a lab that protects long-horizon work from quarterly pressure.</p><p>Frontier model training does require operational excellence &#8212; running a 100,000-GPU cluster reliably for months is serious engineering. But it also requires research insight about architecture and data curation that no amount of operational discipline can substitute for. Meta&#8217;s error wasn&#8217;t hiring operators. It was replacing researchers with operators rather than pairing them.</p><h2>What Meta did</h2><p>In April 2025, Meta launched Llama 4 to a reception that LeCun would later describe with unusual bluntness: the benchmarks were &#8220;fudged a little bit,&#8221; with the team using different model variants for different benchmarks to produce better numbers.[25] An experimental version submitted to Chatbot Arena &#8212; a public benchmark where models are ranked by blind human preference &#8212; produced verbose, emoji-laden responses optimized for human preference rankings, and bore little resemblance to the publicly released models.[26] The research community noticed immediately. Zuckerberg, by LeCun&#8217;s account, was &#8220;really upset and basically lost confidence in everyone who was involved.&#8221;[27]</p><p>The response was not to reinvest in the research culture that could have prevented the problem. It was to replace it. Joelle Pineau, who had led FAIR since 2023, departed in May after eight years at the company.[28] In June, Zuckerberg spent $14.3 billion to acquire a 49 percent stake in Scale AI and installed its 28-year-old co-founder, Alexandr Wang, as Meta&#8217;s first Chief AI Officer, leading a new unit called Meta Superintelligence Labs.[29]</p><p>Wang&#8217;s background was in data infrastructure. Scale AI is a data labeling and evaluation company &#8212; essential plumbing for AI development, but not a frontier model builder. 
Wang had never led an AI research lab or trained a large language model.[30] LeCun&#8217;s assessment, delivered in a Financial Times interview after his departure, was precise: &#8220;He learns fast, he knows what he doesn&#8217;t know. There&#8217;s no experience with research or how you practice research, how you do it. Or what would be attractive or repulsive to a researcher.&#8221;[31]</p><p>What Wang brought was operator culture: ship fast, close the code, productize everything. The hires matched the mandate. Nat Friedman, the former CEO of GitHub, came in to lead product. Shengjia Zhao, a co-creator of ChatGPT, joined as chief scientist &#8212; and reportedly threatened to quit within days of arriving, requiring a title upgrade to retain him.[32] Compensation packages for recruits from Google, OpenAI, and Anthropic reached $100 million to $300 million over four years.[33] The money attracted talent. Whether it attracted the right kind of talent is the question the next twelve months answered.</p><p>In October, Wang laid off 600 people from Meta Superintelligence Labs, with cuts concentrated in FAIR.[34] His internal memo explained the logic: &#8220;By reducing the size of our team, fewer conversations will be required to make a decision, and each person will be more load-bearing.&#8221;[35] The reasoning is sound for shipping a product &#8212; and catastrophic for running a research lab, where the &#8220;unnecessary conversations&#8221; are often where the breakthroughs happen. It was the right framework applied to the wrong problem.</p><p>The robotics group was dissolved. FAIR was pushed toward short-term projects aligned with TBD Lab, the new unit developing Meta&#8217;s frontier models under Wang&#8217;s direct leadership.[36] Llama development, which FAIR had originated, was formally moved to TBD Lab. 
The open-source philosophy that had defined Meta&#8217;s AI identity was abandoned &#8212; Avocado would be proprietary, a closed model in the mold of GPT or Gemini.[37]</p><p>Researchers began leaving within weeks of the reorganization. At least eight departed in the months following MSL&#8217;s creation, including several who never formally started their new roles.[38] The departures were negatively selected: the researchers with the best outside options &#8212; which is to say, the ones with the most valuable research intuitions &#8212; left first. Multiple people went to OpenAI. Chaya Nayak, a longtime director of GenAI product management, left for OpenAI&#8217;s Special Initiatives group. LeCun himself announced his departure in November, telling the Financial Times that the new hires were &#8220;completely LLM-pilled&#8221; and that Meta&#8217;s approach to superintelligence through language model scaling was &#8220;a dead end.&#8221;[39] His parting line: &#8220;You don&#8217;t tell a researcher what to do. You certainly don&#8217;t tell a researcher like me what to do.&#8221;[40]</p><p>By December, Wang reportedly told colleagues he felt &#8220;suffocated&#8221; by the level of oversight Zuckerberg maintained over AI strategy.[41] In March 2026, Meta created a parallel Applied AI Engineering group under Maher Saba, a Reality Labs veteran reporting to CTO Andrew Bosworth &#8212; effectively splitting Wang&#8217;s authority nine months after his $14.3 billion appointment.[42] Meta spokesperson Andy Stone insisted Wang&#8217;s influence was &#8220;growing, not waning.&#8221;[43] The organizational structure told a different story: repeated restructures in nine months, the research lab gutted, its founder gone, and a parallel organization created to hedge against the remaining one.</p><h2>The market&#8217;s verdict</h2><p>The three-day sequence in March was not a coincidence. 
It was a consequence.</p><p>LeCun raised $1.03 billion for AMI Labs on the strength of a single asset: the research intuition and architectural vision that Meta had decided it didn&#8217;t need. The investors &#8212; Nvidia, Bezos, Schmidt, Samsung &#8212; were not betting on a product. They were betting on a capability. Research culture is a scarce asset, and the market priced it accordingly: $3.5 billion pre-money for twelve people and an idea.[44]</p><p>Two days later, the model that Wang&#8217;s operational culture produced &#8212; Avocado, built by TBD Lab, using the fast-ship, closed-source, operator-driven methodology the reorganization was designed to enable &#8212; couldn&#8217;t beat a model built by the company that still employed its research leader.[45] Google&#8217;s Demis Hassabis won the 2024 Nobel Prize in Chemistry for work conducted at Google DeepMind.[46] He was not replaced by an operator. He was promoted.</p><p>The category error is now nameable. Frontier AI is a research problem that uses infrastructure as an input. Meta treated it as an infrastructure problem that uses researchers as an input. When the researchers didn&#8217;t produce on an operational timeline, Meta cut them and hired operators. The operators built competitive chips and massive data centers &#8212; because building chips and data centers is an operational scaling problem, which operators are excellent at. The model didn&#8217;t come because models are not an operational scaling problem. DeepSeek demonstrated this from the other direction: roughly $5.6 million in pre-training compute, 2,048 GPUs, and a research-led team produced a model that triggered emergency &#8220;war rooms&#8221; inside Meta.[47] The conversion factor isn&#8217;t capital. It&#8217;s a research culture.</p><p>The counterexample is OpenAI, where an operator CEO built the industry&#8217;s leading frontier lab. 
But Altman&#8217;s organizational architecture protected research autonomy &#8212; Sutskever set the research agenda for years, Murati ran execution, and the research team&#8217;s timelines were not subordinated to product shipping schedules until the tensions surfaced that eventually led to Altman&#8217;s brief firing. What matters is not whether the CEO is an operator or a researcher &#8212; it is whether the organizational structure shields long-horizon research from short-horizon operational pressure. Meta didn&#8217;t. Google did.</p><h2>What actually exists</h2><p>Meta&#8217;s inference infrastructure is world-class. Hundreds of thousands of MTIA chips serve 3.6 billion daily active users across Facebook, Instagram, WhatsApp, and Threads. The ad targeting engine &#8212; the business that generated $201 billion in revenue in 2025 &#8212; runs superbly on this stack.[49] Meta AI, the company&#8217;s conversational assistant, surpassed 700 million monthly active users.[50] And the balance sheet clock is ticking. Meta generated $46 billion in free cash flow last year. Its 2026 spending plan, at the top of the range, is nearly three times that.[72] The math requires either a revenue acceleration Meta has not guided for or a drawdown of the $78 billion cash reserve that took a decade of advertising monopoly to build.[73]</p><p>The problem is the layer above: frontier intelligence. The layer Zuckerberg staked his capex narrative on when he told investors that 2026 would be &#8220;a big year for personal superintelligence.&#8221;[51] The layer that justifies the $135 billion to Wall Street. The layer where MTIA chips are irrelevant because they don&#8217;t train models, and where $100 million signing bonuses are irrelevant because mercenaries optimize architectures they didn&#8217;t invent.</p><p>The Gemini licensing discussion is the structural tell. 
The company spending more on AI infrastructure than any other company in history is considering renting intelligence from a competitor. Not because it can&#8217;t afford to build. Because it can&#8217;t produce. Meta&#8217;s stock fell 3.8 percent on March 13 alone, to $613 &#8212; down 23 percent from its September high of $796 &#8212; as the Avocado delay and layoff reports landed on the same day.[75] The market&#8217;s verdict arrived faster than the model.</p><p>Then, on Monday, the sequel. Meta signed a cloud computing deal with Nebius worth up to $27 billion &#8212; $12 billion in initial compute capacity starting in 2027, plus up to $15 billion more over five years &#8212; for access to Nvidia&#8217;s next-generation Vera Rubin platform.[76] Nebius, a Dutch AI infrastructure company in which Nvidia invested $2 billion last week, rose 12 percent on the announcement. Meta&#8217;s stock rose 3 percent the same morning, boosted by the Nebius deal and the weekend&#8217;s layoff reports. Wall Street&#8217;s message was legible: cut the humans, buy more infrastructure. The dependency chain now runs five layers deep &#8212; Meta rents training silicon from Nvidia, AMD, and Google, rents cloud compute from Nebius, and is considering licensing intelligence from Google. Even within its own inference stack, Meta&#8217;s MTIA chips require Nvidia&#8217;s Vera CPUs as orchestrators for agentic workloads.[77] The company spending $135 billion on AI infrastructure is renting every input except the electricity.</p><h2>The mirror and the reversion</h2><p>Google made the opposite decision at the same fork.</p><p>In April 2023, Google merged its two AI research units &#8212; Brain (founded in 2011 by Jeff Dean) and DeepMind (acquired in 2014 for roughly $500 million) &#8212; into Google DeepMind.[52] The decision that mattered was who would lead it. Google chose the researcher, Hassabis, who became CEO of the combined unit. 
Dean was elevated to Chief Scientist. The research leader was empowered, not replaced by an operator.[53] The merger wasn&#8217;t frictionless &#8212; DeepMind lost researchers to Anthropic and other startups &#8212; but the institutional decision was clear: research culture sets the direction, operational culture supports it.</p><p>The results speak in the only language investors trust. Gemini 3 is the model Meta can&#8217;t beat. Google&#8217;s Gemini app reached 750 million monthly active users.[54] Eight million enterprise seats sold.[55] Cloud revenue grew 48 percent year over year to a $70 billion-plus annual run rate, with a $240 billion backlog.[56] TPU v7 &#8220;Ironwood&#8221; represents Google&#8217;s seventh generation of custom AI silicon &#8212; a decade of co-design between chip architects and model researchers that no competitor can replicate by writing a check.[57] Google controls five of the six layers that determine an AI company&#8217;s structural position: chips, intelligence, infrastructure, platform, and distribution. Every layer generates revenue independently, and each reinforces the others &#8212; a structural position no other public company in the industry can match.</p><p>Amazon Web Services tells the same story from the infrastructure side &#8212; and the Cerebras deal announced on the same Thursday as the Avocado delay completes the pattern. AWS tried to build frontier models. Titan was, as I documented in an earlier analysis, a press release with an API endpoint. Nova is a message to analysts.[58] The models never competed. So AWS reverted to what its institutional DNA supports: platform operations. Bedrock hosts Claude, GPT, Llama, and Mistral. It wins regardless of which model wins. That reversion was healthy because AWS has a platform to revert to.</p><p>Now the reversion extends to silicon. Trainium, AWS&#8217;s custom training chip, handles prefill &#8212; the computationally intensive phase of processing a prompt. 
But for decode &#8212; the serial, memory-bandwidth-intensive phase of generating tokens &#8212; AWS brought in Cerebras and its wafer-scale CS-3 engine, hosted inside AWS data centers on Amazon&#8217;s own networking and security stack.[59] The architecture is Bedrock logic applied to chips: not &#8220;our silicon is best,&#8221; but &#8220;we&#8217;ll host the right silicon for each phase of the workload.&#8221; AWS tried to build its own models and reverted to hosting everyone else&#8217;s. Then it tried to build its own inference silicon and reverted to hosting Cerebras&#8217;s. The pattern is fractal &#8212; and it works because the platform absorbs every reversal.</p><p>Meta cannot follow either path. It cannot follow Google&#8217;s path because the research culture is gone &#8212; LeCun is in Paris, FAIR is gutted, the institutional knowledge walked out the door and was immediately funded at a billion dollars. It cannot follow AWS&#8217;s path because it has no platform to fall back on. Meta&#8217;s infrastructure doesn&#8217;t serve external customers. There is no Bedrock equivalent. Meta&#8217;s chips serve Meta&#8217;s products. Meta&#8217;s data centers serve Meta&#8217;s models. If the model fails, the infrastructure serves a failed model, with no marketplace to absorb the loss. 
Apple, meanwhile, opted out of the intelligence race entirely &#8212; licensing Google&#8217;s Gemini for its revamped Siri, treating models as commodity inputs to a distribution problem it dominates through 2.4 billion devices, and sitting on $141 billion in cash.[60] Microsoft built its flagship AI product, Copilot Cowork, on Anthropic&#8217;s Claude rather than OpenAI&#8217;s models, pivoting from single-model dependency toward a platform play where the underlying intelligence is swappable.[61]</p><h2>The grid</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yiNQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yiNQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 424w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 848w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 1272w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yiNQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png" width="1348" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1348,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:185154,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191302021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yiNQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 424w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 848w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 1272w, https://substackcdn.com/image/fetch/$s_!yiNQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56636daf-31e6-4a0a-9f15-da5b786e796f_1348x930.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>The table tells the story. 
Google survives disruption at any single layer because the others generate independent value &#8212; five filled cells, each reinforcing the rest.[62] Apple&#8217;s bet is that intelligence commoditizes, and the company that owns 2.4 billion endpoint devices wins regardless of whose model runs on them.[63] Microsoft&#8217;s real-time pivot from OpenAI exclusivity to model diversity &#8212; building Copilot Cowork on Anthropic&#8217;s Claude &#8212; is a strategic admission that no single intelligence provider is safe to depend on.[64] AWS&#8217;s every reversion strengthens the platform: from proprietary models to Bedrock marketplace, from proprietary inference silicon to the Cerebras partnership.[65] Meta&#8217;s two empty cells &#8212; platform and intelligence &#8212; are the two layers its business model cannot survive without.[66]</p><p>Oracle is the grid&#8217;s bottom row &#8212; the only company whose AI infrastructure exists solely to serve others&#8217; models. Oracle&#8217;s data centers serve OpenAI&#8217;s, xAI&#8217;s, and Meta&#8217;s models. The company is cutting an estimated 20,000 to 30,000 employees to free cash for construction, while carrying more than $100 billion in debt and watching its borrowing costs rise as banks pull back from datacenter project lending.[67][68][69] The market has already delivered a verdict &#8212; Oracle&#8217;s stock has fallen roughly 54 percent from its September 2025 highs &#8212; but the grid reveals <em>why</em> the market priced it in. I analyzed Oracle&#8217;s structural position in detail in a prior piece: the codependency among Oracle, OpenAI, and SoftBank creates a system in which each party&#8217;s commitments depend on the others&#8217; execution, and none can exit without triggering a cascade.[70] Oracle is building infrastructure for an intelligence layer it doesn&#8217;t control, financed by debt markets losing confidence, and cutting the humans who maintain its existing business to fund construction. 
Every empty cell in Oracle&#8217;s row is a dependency on someone else.</p><p>And financing every position on the grid is Nvidia, which has invested an estimated $50 billion-plus in its own customers &#8212; CoreWeave, Nebius, OpenAI, Anthropic, Lumentum, Coherent &#8212; creating the demand for the GPUs that generate its $216 billion in annual revenue.[8] The arms dealer doesn&#8217;t need to own any layer. It needs every layer to keep buying ammunition.</p><p>The grid&#8217;s diagnostic sentence: for any company spending on AI, the question is not how much.[74] It is which layers you control, whether your business model survives without the layers you don&#8217;t, and whether you have the organizational culture &#8212; not the budget, the culture &#8212; to produce at the layers you&#8217;re missing.</p><h2>What breaks</h2><p>Three scenarios test the thesis.</p><p>Avocado ships competitively in May. The category error claim weakens for Meta specifically, though not as much as it appears, because Avocado was conceived and substantially developed under the old research culture. The real falsification point is Watermelon, the next frontier model, built entirely under Wang&#8217;s operator-driven structure. If Watermelon competes with whatever Google and OpenAI ship in late 2026, the thesis breaks: operator culture can produce frontier intelligence given enough capital and time. If Watermelon disappoints &#8212; if the model that was conceived, trained, and shipped entirely without FAIR&#8217;s institutional knowledge falls short &#8212; the category error is confirmed. Either way, the grid still holds as a diagnostic tool, because Meta&#8217;s structural exposure (no platform, no training silicon, rented intelligence) remains even if one model ships well.</p><p>Models commoditize. Apple&#8217;s bet proves right, and the intelligence layer becomes interchangeable. 
Meta&#8217;s failure to produce a frontier model stops mattering, because nobody&#8217;s proprietary model matters. In this scenario, MTIA&#8217;s inference efficiency and Meta&#8217;s distribution to 3.6 billion users become the strategic assets. The category error still occurred, but the market shift absorbed its consequences. The $135 billion produced the world&#8217;s best inference platform for commodity models, which is a defensible business, even if it&#8217;s not the one Zuckerberg described.</p><p>The category error compounds. More infrastructure spending creates more operational pressure. More pressure demands faster shipping timelines. Faster timelines drive out remaining researchers who value patience and exploration. The intelligence gap widens. Meta becomes the most expensive licensee of someone else&#8217;s model in the history of the technology industry. The capex narrative quietly shifts from &#8220;building superintelligence&#8221; to &#8220;building the world&#8217;s best delivery system for someone else&#8217;s brain.&#8221; AWS deliberately chose that position, and it works &#8212; because AWS charges rent. Meta&#8217;s version lacks a revenue model. The data centers serve Meta&#8217;s own products, and those products need a competitive model to justify the investment thesis Wall Street bought.</p><p>The grid identifies, in one framework, why Google can survive almost anything, why Apple&#8217;s restraint may be prescience, why Microsoft is hedging in real time, why AWS&#8217;s reversions strengthen rather than weaken it, and why Meta is the only company in the grid that destroyed its position at the one layer its business model cannot survive without.</p><p>Meta built the datacenter. The brain caught a flight to Paris.</p><div><hr></div><h3>Notes</h3><p>[1] <a href="https://techcrunch.com/2026/03/09/yann-lecuns-ami-labs-raises-1-03-billion-to-build-world-models/">TechCrunch, March 9-10, 2026</a>. 
AMI Labs (Advanced Machine Intelligence Labs) funding round. Pre-money valuation $3.5 billion; post-money approximately $4.53 billion. Described as Europe&#8217;s largest seed round. Co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions.</p><p>[2] LeCun founded FAIR in December 2013. He shared the 2018 Turing Award with Geoffrey Hinton and Yoshua Bengio for foundational work on deep learning. JEPA (Joint Embedding Predictive Architecture) predicts future states in abstract representational space rather than pixel space &#8212; a departure from LLM approaches. LeCun developed JEPA during his final years at Meta.</p><p>[3] Strategic investors include Nvidia, Samsung, Sea, Temasek, and Toyota Ventures. Individual investors include Jeff Bezos, Mark Cuban, Eric Schmidt, Tim Berners-Lee, Jim Breyer, and Xavier Niel. CEO is Alexandre LeBrun (formerly Nabla); LeCun serves as Executive Chairman.</p><p>[4] <a href="https://about.fb.com/news/2026/03/meta-mtia-next-gen-silicon/">Meta Newsroom, March 11, 2026</a>. All four chips use the RISC-V architecture, are built in partnership with Broadcom, and are fabricated by TSMC.</p><p>[5] VP of Engineering Yee Jiun Song, <a href="https://www.cnbc.com/2026/03/11/meta-ai-mtia-chip-data-center.html">CNBC interview, March 11, 2026</a>: &#8220;It&#8217;s unusual for any silicon company or team to be releasing a new chip every six months.&#8221;</p><p>[6] Meta blog post, March 11, 2026. MTIA 400 is described as the first Meta chip with &#8220;raw performance competitive with leading commercial products&#8221; for generative AI inference. Uses two compute chiplets. Completed testing phase.</p><p>[7] Meta Newsroom: &#8220;We deploy hundreds of thousands of MTIA chips for inference workloads across both organic content and ads on our apps.&#8221; MTIA 300 is in production as of March 2026.</p><p>[8] <a href="https://nvidianews.nvidia.com/news/nvidia-invests-in-nebius">Nvidia SEC filing, March 11, 2026</a>. 
$2 billion investment in Nebius Group NV at $94.94 per share, approximately 8.3% stake. Partnership targets 5 GW of Nvidia systems deployed by the end of 2030. Follows $2B investment in CoreWeave (January 2026), $30B in OpenAI (February 2026), up to $10B in Anthropic (November 2025), $2B each in Lumentum and Coherent (March 2026).</p><p>[9] New York Times, March 12, 2026. Three sources familiar with the matter. The model was originally targeted for late 2025, slipped to Q1 2026, then mid-March, and is now at least May.</p><p>[10] NYT and Reuters independently confirmed. &#8220;The model outperformed Meta&#8217;s previous model and did better than Google&#8217;s Gemini 2.5 model from March [2025], two of the people said. But it has not performed as strongly as Gemini 3 in November.&#8221; Performance gaps specifically in reasoning, coding, writing, and agentic behavior.</p><p>[11] NYT: &#8220;The leaders of Meta&#8217;s A.I. division had instead discussed temporarily licensing Gemini to power the company&#8217;s A.I. products, though no decisions have been reached.&#8221; Meta spokesperson Dave Arnold: &#8220;We&#8217;re excited for people to see what we&#8217;ve been cooking very soon.&#8221;</p><p>[12] <a href="https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/">Reuters, March 13, 2026</a> (Katie Paul, Jeff Horwitz, Deepa Seetharaman). Meta employed 78,865 people as of December 31, 2025. 20% = approximately 15,800. 
Meta spokesperson Andy Stone: &#8220;This is speculative reporting about theoretical approaches&#8221; &#8212; notably not a denial.</p><p>[13] <a href="https://investor.atmeta.com/investor-news/press-release-details/2026/Meta-Reports-Fourth-Quarter-and-Full-Year-2025-Results/">Meta Q4 2025 earnings release, January 28, 2026</a>: 2026 capital expenditure guidance of $115-135 billion, including principal payments on finance leases, &#8220;with year-over-year growth driven by increased investment to support our Meta Superintelligence Labs efforts and core business.&#8221;</p><p>[14] Meta Q4 2025 earnings release. For comparison: Alphabet guided $175-185B, Amazon guided ~$200B, Microsoft tracking ~$130-145B (H1 actual $72.4B, no explicit full-year guidance). Meta&#8217;s figure represents approximately 57-67% of FY2025 revenue of $201 billion &#8212; the highest capex-to-revenue ratio among the hyperscalers.</p><p>[15] Meta FY2025 actual capital expenditures, including finance lease payments: $72.2 billion, per Q4 2025 earnings release. FY2024: $39.2 billion.</p><p>[16] Song, CNBC, March 11, 2026.</p><p>[17] Meta AI Blog, March 11, 2026. MTIA 450 doubles MTIA 400&#8217;s HBM bandwidth; supports MX4 data type yielding approximately 6x FLOPs versus FP16/BF16. MTIA 500 adds 50% more HBM bandwidth and 80% more capacity than MTIA 450, uses a 2x2 chiplet configuration. Mass deployment: MTIA 450 early 2027, MTIA 500 later 2027.</p><p>[18] Meta blog post. Across the MTIA 300-500 lineup: HBM bandwidth increases by 4.5x, and compute FLOPs increase by 25x.</p><p>[19] Broadcom statement reported in Seoul Economic Daily and CNBC, March 2026.</p><p>[20] <a href="https://www.amd.com/en/newsroom/press-releases/2026-2-24-amd-and-meta-announce-expanded-strategic-partnersh.html">AMD deal</a>: approximately $60 billion, announced February 2026. AMD issued Meta warrants for up to 160 million shares (~10% of the company). Nvidia deal: multi-billion, ongoing. 
Google TPU deal: confirmed by multiple outlets; Meta signed a &#8220;multibillion-dollar deal for Google&#8217;s TPU chips&#8221; per Mashable/Tech Brew, March 2026.</p><p>[21] Susan Li, CFO, Morgan Stanley Technology Conference, March 4, 2026: &#8220;The sort of ranking and recommendations workloads have been where we have started, and that&#8217;s the place where we have rolled out custom silicon at the most scale. But we expect and are hopeful that we are going to expand that over time, including eventually to training AI models.&#8221;</p><p>[22] LeCun joined Facebook in December 2013 to serve as FAIR's director while maintaining his NYU professorship. The Turing Award was shared with Geoffrey Hinton and Yoshua Bengio in 2019 (awarded for 2018 contributions). CNBC, November 19, 2025.</p><p>[23] LeCun, Financial Times interview, circa January 3-5, 2026.</p><p>[24] Llama 1 (February 2023) was the first competitive large language model released with open weights by a major lab, authored by FAIR researchers. Its release catalyzed an open-source ecosystem: within months, thousands of fine-tuned variants appeared on Hugging Face, and Llama became the foundation for models deployed by governments, startups, and enterprises globally. By Llama 3.1 (July 2024), Meta&#8217;s open models were the most widely deployed non-proprietary LLMs worldwide. More than half of the 14 original Llama authors left Meta within six months of publication. <a href="https://fortune.com/2025/04/10/meta-ai-research-lab-fair-questions-departures-future-yann-lecun-new-beginning/">Fortune, April 2025</a>: former FAIR employees described the lab as &#8220;dying a slow death.&#8221;</p><p>[25] LeCun, FT interview: &#8220;Results were fudged a little bit.&#8221; He specified that teams used &#8220;different models for different benchmarks to give better results.&#8221;</p><p>[26] The &#8220;Llama-4-Maverick-03-26-Experimental&#8221; submission to LM Arena briefly reached No. 
2 (Elo 1,417) using verbose, emoji-laden responses. LM Arena condemned the approach. The publicly released Maverick dropped to 32nd-35th place after the experimental version was removed. Llama 4 launched Saturday, April 5, 2025 &#8212; an unusual weekend release.</p><p>[27] LeCun, FT interview: Zuckerberg &#8220;was really upset and basically lost confidence in everyone who was involved in this. And so basically sidelined the entire GenAI organization.&#8221;</p><p>[28] Joelle Pineau announced her departure on April 1, 2025 (Bloomberg, CNBC). Last day May 30, 2025. She had led FAIR since early 2023 and joined Meta in 2017.</p><p>[29] Meta acquired a 49% non-voting stake in Scale AI for approximately $14.3 billion in June 2025. One source reports $14.8B; $14.3B is the more widely reported figure. Wang became Meta&#8217;s first Chief AI Officer, leading the newly created Meta Superintelligence Labs. Fortune, CNBC, Bloomberg.</p><p>[30] Scale AI provides data labeling and AI model evaluation services. It grew to a valuation of ~$29 billion through the Meta deal. Fortune: &#8220;Scale doesn&#8217;t make AI models.&#8221; TechCrunch: &#8220;Wang hasn&#8217;t led an AI lab of this sort before.&#8221;</p><p>[31] LeCun, FT interview, January 2026.</p><p>[32] Nat Friedman (former GitHub CEO) leads Products and Applied Research within MSL. Shengjia Zhao (co-creator of ChatGPT, GPT-4, and other OpenAI models) became Chief Scientist. FT and Wired reported (August/September 2025) that Zhao threatened to quit shortly after joining and was given the title of Chief Scientist to retain him. 
B-tier sourcing, anonymous.</p><p>[33] Compensation figures from VentureBeat, CNBC, and eWeek, citing industry sources: packages of $100 million to $300 million over four years for senior recruits from Google, OpenAI, and Anthropic.</p><p>[34] Axios, October 22, 2025: &#8220;Meta is cutting roughly 600 positions out of the several thousand roles within Meta&#8217;s superintelligence lab.&#8221; Cuts concentrated in FAIR and AI infrastructure teams. TBD Lab (Wang&#8217;s frontier model unit) was spared.</p><p>[35] Alexandr Wang, internal memo, obtained by Business Insider, October 2025.</p><p>[36] The protein-folding research team was cut. Researcher Yuandong Tian confirmed his reinforcement learning team was affected. FAIR was directed to integrate its research into TBD Lab&#8217;s training runs, per Business Insider. Fortune: FAIR was &#8220;increasingly shoved out of the limelight.&#8221;</p><p>[37] <a href="https://www.bloomberg.com/news/articles/2025-12-09/meta-considers-closing-source-code-for-next-ai-model">Bloomberg, December 2025</a>: Avocado expected to launch as a closed, proprietary model &#8212; &#8220;the biggest departure to date from the open-source strategy Meta has touted for years.&#8221; CNBC, December 9, 2025, confirmed the pivot.</p><p>[38] AIM Media House, August 2025: documented at least eight departures in the weeks following MSL&#8217;s creation. Named: Avi Verma (returned to OpenAI), Ethan Knight (returned to OpenAI), Rishabh Agarwal (joined Periodic Labs), Chaya Nayak (joined OpenAI Special Initiatives). Multiple others undisclosed.</p><p>[39] LeCun, FT interview: &#8220;I&#8217;m not gonna change my mind because some dude thinks I&#8217;m wrong. 
I&#8217;m not wrong.&#8221; On the LLM approach: &#8220;The path to superintelligence &#8212; simply train large language models, train with more synthetic data, hire thousands of people to &#8216;educate&#8217; your system in post-training, and invent new tricks for reinforcement learning &#8212; I think it&#8217;s complete nonsense. It simply won&#8217;t work.&#8221; The &#8220;complete nonsense&#8221; quote refers specifically to the technical approach, not to Meta&#8217;s broader strategy.</p><p>[40] LeCun, FT/The Decoder interview, January 2026.</p><p>[41] Financial Times, December 2025. Wang reportedly told colleagues he felt &#8220;suffocated&#8221; by Zuckerberg&#8217;s oversight. B-tier, anonymous sourcing.</p><p>[42] The Applied AI Engineering group was created in March 2026 under Maher Saba (VP, formerly Reality Labs), reporting to CTO Andrew Bosworth. Focuses on data pipelines, internal tools, and building a &#8220;data engine.&#8221; Engineering teams previously under Wang were moved to Saba&#8217;s unit.</p><p>[43] Andy Stone, via X (formerly Twitter), March 2026: &#8220;Totally false&#8230; Alex helped create the new team, still runs MSL and TBD, has growing, not waning influence. This is all so silly.&#8221;</p><p>[44] AMI Labs raises details per TechCrunch, PitchBook, and EU-Startups. The piece takes no position on whether LeCun&#8217;s world model thesis will prove correct &#8212; the structural claim is that the market priced research capability as an investable asset at $3.5 billion pre-money, independent of current product.</p><p>[45] Avocado is developed by TBD Lab, the ~100-person unit within MSL under Wang&#8217;s direct leadership. Bloomberg reported that TBD Lab used distillation from rival models, including Google&#8217;s Gemma, OpenAI&#8217;s gpt-oss, and Alibaba&#8217;s Qwen. B-tier, anonymous sourcing.</p><p>[46] Demis Hassabis and John Jumper shared the 2024 Nobel Prize in Chemistry for AlphaFold&#8217;s protein structure prediction. 
<a href="https://www.nobelprize.org/prizes/chemistry/2024/press-release/">Nobel Prize organization, October 2024</a>. A-tier.</p><p>[47] DeepSeek. The widely cited &#8220;$5.6 million&#8221; figure refers only to pre-training compute costs; the total development cost was substantially higher. DeepSeek V3 was trained on 2,048 Nvidia H800 GPUs. Fortune reported that Meta assembled four &#8220;war rooms&#8221; to analyze DeepSeek&#8217;s success; an anonymous Meta employee posted that DeepSeek &#8220;rendered Llama 4 already behind in benchmarks&#8221; and that &#8220;every single &#8216;leader&#8217; of GenAI org is making more than what it cost to train DeepSeek V3 entirely.&#8221;</p><p>[48] Author disclosure. Three years as Chief Evangelist at Hugging Face (2021-2024). Previous: six years at AWS. Current: AI Operating Partner at Fortino Capital.</p><p>[49] Meta FY2025 total revenue: $200.97 billion, per <a href="https://www.sec.gov/Archives/edgar/data/0001326801/000162828026003832/meta-12312025xexhibit991.htm">Q4 2025 earnings release (SEC filing, January 28, 2026)</a>. A-tier. Revenue growth of 22% YoY driven by 12% increase in ad impressions and 9% increase in average price per ad. Q4 2025 revenue was $59.89 billion. Previous version of this footnote incorrectly cited the FY2024 figure of $164.5 billion; corrected to FY2025 actual.</p><p>[50] Zuckerberg, Q3 2025 earnings call: Meta AI surpassed 700 million MAU. The exact current figure may be higher.</p><p>[51] Zuckerberg, Q4 2025 earnings call, January 28, 2026.</p><p>[52] <a href="https://blog.google/technology/ai/april-ai-update/">Google/Alphabet announced the merger</a> of Brain and DeepMind on April 20, 2023. Google Brain was founded in 2011 by Jeff Dean, Greg Corrado, and Andrew Ng. DeepMind was acquired in January 2014 for a reported $500-650 million.</p><p>[53] CNBC obtained the internal memo. Hassabis became CEO of Google DeepMind. 
Jeff Dean was elevated to the role of Chief Scientist at Google DeepMind and Google Research.</p><p>[54] Sundar Pichai, Alphabet Q4 2025 earnings call, February 4, 2026: &#8220;The Gemini App has grown to over 750 million monthly active users.&#8221;</p><p>[55] Pichai, same call: &#8220;We have sold more than 8 million paid seats of Gemini Enterprise to more than 2,800 companies.&#8221;</p><p>[56] Google Cloud Q4 2025: revenue $17.66 billion, up 48% year-over-year. Annual run rate exceeds $70 billion. Backlog grew 55% sequentially and more than doubled YoY to $240 billion. Operating margin 30.1%. Alphabet Q4 2025 earnings.</p><p>[57] Google TPU v1 was deployed internally in 2015 (inference-only, 28nm). TPU v7 &#8220;Ironwood&#8221; announced November 2025, commercially available. Seven generations of continuous custom silicon investment &#8212; the longest sustained custom AI chip program in the industry.</p><p>[58] <a href="https://www.airealist.ai">&#8220;Chip and Mortar: Amazon Failed at AI Models, Chips, and Frameworks. Then It Stopped Trying &#8212; and Became the Infrastructure Beneath Everyone Else&#8217;s AI.&#8221;</a> The AI Realist, March 2026. The Infrastructure Reversion Test was introduced there: &#8220;When a company attempts to cross the infrastructure-intelligence boundary, the direction it reverts to is where the returns are.&#8221;</p><p>[59] AWS and Cerebras joint press release, March 13, 2026. Architecture: Trainium optimized for prefill, Cerebras CS-3 optimized for decode, connected via Elastic Fabric Adapter networking, built on the Nitro System. AWS is the first and exclusive cloud provider for Cerebras&#8217;s disaggregated inference solution, available through Amazon Bedrock. 
David Brown, VP Compute &amp; ML Services: &#8220;Each system does what it&#8217;s best at.&#8221; No Inferentia3 has been announced &#8212; AWS appears to be converging its custom silicon strategy around Trainium variants rather than maintaining separate training and inference chip lines.</p><p>[60] <a href="https://www.apple.com/newsroom/2026/01/apple-and-google-partner-on-gemini-for-apple-intelligence/">Apple and Google's joint statement on January 12, 2026</a>, confirmed the partnership. Bloomberg reported terms at approximately $1 billion annually (B-tier, not officially confirmed). Apple licenses a custom Gemini model running on Apple&#8217;s Private Cloud Compute servers, not Google&#8217;s infrastructure. Architecture designed for hot-swappable model replacement over time. Apple Q1 FY2026 balance sheet: approximately $141 billion in cash and marketable securities. Apple FY2025 capex: $12.7 billion &#8212; less than 10% of Alphabet&#8217;s 2026 guidance.</p><p>[61] <a href="https://www.microsoft.com/en-us/microsoft-365/blog/">Microsoft 365 Blog, March 9, 2026</a>: Copilot Cowork built &#8220;in close collaboration with Anthropic.&#8221; Claude is now available in the mainline Copilot Chat for Frontier program users alongside OpenAI models. Microsoft-Anthropic investment: up to $15 billion; Anthropic committed $30 billion in Azure compute. Microsoft 365 E7 bundle at $99/user/month launching May 1, 2026. Jared Spataro (CMO, AI at Work): &#8220;Every 60 days at least, there&#8217;s a new king of the hill. 
There&#8217;s so much demand for a platform that doesn&#8217;t feel like, &#8216;I have to skip over to the next vendor.&#8217;&#8221;</p><p>[62] Google&#8217;s layers: Chips (TPU v7, 10+ years), Intelligence (Gemini 3), Infrastructure (data centers, $175-185B 2026 capex guidance per Alphabet Q4 2025 earnings), Platform (Vertex AI/Google Cloud), Distribution (Android 3B+ devices, Chrome, Search, YouTube, Workspace with 325M paid seats).</p><p>[63] Apple&#8217;s on-device inference: Foundation Models framework runs a ~3B-parameter model at 0.6ms time-to-first-token latency, 30 tokens/second, offline-capable, with zero API cost for developers. Introduced at WWDC 2025. Apple Neural Engine in A-series and M-series chips is purpose-built for on-device ML inference. 2.4 billion active devices per Apple earnings.</p><p>[64] Microsoft's investment in OpenAI: cumulative ~$13 billion+. Microsoft&#8217;s own custom AI chip (Maia 100), announced at Ignite 2023, has seen minimal public adoption &#8212; the company appears to have concluded that platform orchestration and model diversity, not proprietary silicon, are its competitive advantage. OpenAI broke Microsoft&#8217;s infrastructure exclusivity by signing a $38B+ AWS deal in February 2026.</p><p>[65] <a href="https://www.prnewswire.com/news-releases/amazon-com-announces-fourth-quarter-results-302682163.html">Amazon 2026 capex</a>: approximately $200 billion, per CEO Andy Jassy on Q4 2025 earnings call, February 5, 2026. &#8220;Predominantly&#8221; for AWS. Amazon stock fell 8-10% on the announcement. Bedrock serves 100,000+ customers, hosting models from Anthropic, OpenAI, Meta, Mistral, and Amazon&#8217;s own Nova family.</p><p>[66] Meta has 3.58 billion daily active users across its family of apps, per Q4 2025 earnings release.</p><p>[67] Oracle layoff estimates from TD Cowen research note, January 26, 2026. 20,000-30,000 positions from a workforce of approximately 162,000. Not confirmed by Oracle. 
TD Cowen estimated that layoffs would generate approximately $8-10 billion in cash flow. Moody&#8217;s rates Oracle Baa2 &#8212; two notches above junk.</p><p>[68] Oracle's total debt exceeded $108 billion as of Q2 FY2026 (November 2025 10-Q filing); Q3 data (March 2026) suggests further increase. The $18 billion bond issuance in September 2025 comprised notes with maturities from 2030 to 2065 across six tranches (4.45%-6.10%), per SEC 8-K filing. Restructuring plan originally $1.6 billion, increased to $2.1 billion per Q3 FY2026 10-Q (March 2026). $982 million recognized through Q3; approximately $1.12 billion remaining.</p><p>[69] TD Cowen: &#8220;Multiple U.S. banks have pulled back from Oracle-linked data center project lending.&#8221; Interest rate premiums &#8220;roughly doubled&#8221; since September 2025. Blue Owl Capital withdrew from a $10 billion Michigan datacenter project for Oracle/OpenAI in December 2025 (CNBC, Bloomberg). Oracle&#8217;s 5-year CDS hit 155.27 basis points in December 2025 &#8212; the highest since the 2009 financial crisis.</p><p>[70] <a href="https://www.airealist.ai">&#8220;Hotel Abilene: The AI Datacenter Deal Nobody Wanted to Stop.&#8221;</a> The AI Realist, March 2026. Analyzed the Oracle-OpenAI-SoftBank codependency structure within the Stargate venture.</p><p>[71] Capex table sourcing. The non-correlation between spending and model capability is structural, not linear &#8212; Google&#8217;s capex produces the best models because of its research culture, not despite its spending level. Amazon: ~$200B, per CEO Andy Jassy, Q4 2025 earnings call, February 5, 2026 (company guidance). Alphabet/Google: $175-185B, per CFO Anat Ashkenazi, Q4 2025 earnings call, February 4, 2026 (company guidance). Microsoft: ~$130-145B estimated range; H1 FY2026 capex was $72.4B ($34.9B Q1 + $37.5B Q2), per Microsoft Q1 and Q2 FY2026 earnings releases. 
CFO Amy Hood guided Q3 capex to &#8220;decrease on a sequential basis,&#8221; implying H2 may be lower than H1. The range reflects annualized H1 at the full rate (~$145B ceiling) and a scenario in which H2 declines modestly (~$130B floor). Microsoft does not issue formal annual capex guidance. Meta: $115-135B including principal payments on finance leases, per Q4 2025 earnings release, January 28, 2026 (company guidance). Oracle: $50B FY2026 guidance, raised from $35B at Q1 FY2026 to $50B at Q2 FY2026 earnings call, December 10, 2025. Confirmed unchanged in Q3 FY2026 SEC filing (March 2026). Apple: FY2025 capex was $12.7B per 10-K filing; no formal FY2026 guidance issued. The figure represents the trailing actual. Table sorted by descending capex magnitude.</p><p>[72] Meta's trailing twelve-month free cash flow is $46.1 billion through Q4 2025, per a GuruFocus calculation based on SEC filings. Meta defines free cash flow as cash from operations minus purchases of property and equipment minus principal payments on finance leases. FY2024 FCF was $52.1 billion per Meta Q4 2024 earnings release (SEC filing). The decline in TTM FCF from FY2024 to Q4 2025 reflects accelerating capex outpacing operating cash flow growth.</p><p>[73] Meta cash, cash equivalents, and marketable securities: $77.81 billion as of December 31, 2024, per Q4 2024 earnings release. Long-term debt: $28.83 billion as of the same date. Additionally, Meta extended the estimated useful life of servers and network assets from 4.5 to 5.5 years, effective January 1, 2025, reducing annual depreciation expense by approximately $2.9 billion (Meta Q4 2024 earnings release, footnote 1; confirmed in Q1 2025 10-Q at $826 million quarterly impact, or $695 million after tax). Meta&#8217;s total depreciation and amortization was approximately $14.6 billion in FY2024 (10-K), making the $2.9 billion reduction roughly a 20% decrease in annual depreciation &#8212; a material impact on reported operating income. 
This change flatters reported operating income without affecting cash flow &#8212; the Depreciation Lens applies: &#8220;improved margins&#8221; from the useful life extension are accounting, not operational. As capex accelerates in 2026, the compounding effect of lower depreciation on older assets, combined with new asset additions, will temporarily inflate reported profitability.</p><p>[74] The grid analyzes the six largest AI infrastructure investors by 2026 capex. Private companies (xAI, Anthropic, DeepSeek) and Chinese labs (Baidu, Alibaba, ByteDance) operate under different capital structures, regulatory environments, and disclosure requirements. The framework applies to any company; these six illustrate the range of structural positions.</p><p>[75] Meta (META) closed at $613.71 on March 13, 2026, down 3.83% on the day, on elevated volume of 18.77 million shares versus a 15.02 million average. The session followed both the NYT Avocado delay report and the Reuters layoff report, landing within hours of each other. 52-week high: $796.25 (September 2025). Year-to-date performance as of March 14: -5.64%. The stock was down approximately 23% from its September peak. Pre-market on March 14 showed partial recovery toward $672. Source: Yahoo Finance, TipRanks, Meyka market data.</p><p>[76] <a href="https://nebius.com/newsroom/nebius-signs-new-ai-infrastructure-agreement-with-meta">Nebius Group press release, March 16, 2026</a>: &#8220;Nebius Signs New AI Infrastructure Agreement with Meta.&#8221; $12 billion in initial compute capacity starting in  2027, providing access to Nvidia&#8217;s Vera Rubin platform. Meta has also committed to purchasing up to $15 billion in additional compute capacity reserved for third-party customers over five years. Total deal value up to $27 billion. Nebius (NBIS) rose approximately 12% on the announcement. 
Nvidia disclosed a $2 billion investment in Nebius the prior week to deploy more than 5 gigawatts of datacenter capacity by the end of 2030. Meta (META) rose approximately 3% on March 16, boosted by the Nebius deal and weekend Reuters layoff report. The dependency structure: Nvidia invests in Nebius &#8594; Nebius sells compute to Meta &#8594; Meta&#8217;s capex flows back through the Nvidia ecosystem. Yahoo Finance, March 16, 2026.</p><p>[77] JPMorgan analyst Harlan Sur, post-Q4 FY2026 Nvidia earnings note (February 2026): &#8220;NVDA CPUs are being deployed alongside META MTIA ASIC XPUs.&#8221; Nvidia head of AI infrastructure Dion Harris told <a href="https://www.cnbc.com/2026/03/13/nvidia-gtc-ai-jensen-huang-cpu-gpu.html">CNBC (March 13, 2026)</a>: &#8220;CPUs are becoming the bottleneck in terms of growing out this AI and agentic workflow.&#8221; At <a href="https://www.cnbc.com/2026/03/16/nvidia-gtc-2026-ceo-jensen-huang-keynote-blackwell-vera-rubin.html">GTC 2026</a> (March 16-19), Nvidia unveiled CPU-only racks and provided expanded details on Vera CPU deployments alongside hyperscaler-custom inference ASICs. The implication: even Meta&#8217;s inference workloads &#8212; the one compute layer Meta designed its own silicon for &#8212; require Nvidia CPUs as orchestrators. <a href="https://sherwood.news/markets/nvidia-gtc-2026-san-jose-jensen-huang-keynote-ai-boom-inference-groq-cpu/">Sherwood News</a>, CNBC, Tom&#8217;s Hardware GTC 2026 coverage.</p>]]></content:encoded></item><item><title><![CDATA[Still Missing Critical Pieces]]></title><description><![CDATA[Perplexity, Cloudflare, and Y Combinator just confirmed the enterprise gaps I identified last summer. 
The roadmap hasn't caught up.]]></description><link>https://www.airealist.ai/p/still-missing-critical-pieces</link><guid isPermaLink="false">https://www.airealist.ai/p/still-missing-critical-pieces</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Mon, 16 Mar 2026 17:10:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nu0I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nu0I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nu0I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nu0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1582883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191148008?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nu0I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!nu0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4482016d-f286-4b26-92e0-d4497eda7901_1376x768.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In July 2025, I published a piece arguing that the Model Context Protocol systematically overlooked four decades of hard-won lessons from distributed systems.[1] The piece made specific claims. MCP&#8217;s session model can&#8217;t scale horizontally without sticky routing. Authentication was an afterthought. JSON overhead and round-trip tool calling don&#8217;t survive production load. No cost attribution, no token counting, no protocol-level quota management. 
And the fragmentation that MCP aimed to prevent would still emerge &#8212; from the adopters, not the competitors.</p><p>My post drew over 50,000 views and 2,200 claps, and The New Stack cited it by name.[3] Yet the message was not popular at the time. LinkedIn influencers were calling MCP "USB-C for AI." Developers were shipping demo servers in an afternoon and declaring the integration problem solved. I got a lot of angry messages, the vast majority of them unsubstantiated. Was I standing in the way of juicy consulting projects?</p><p>Two months later, I walked into Europe's first MCP developer conference in Berlin and titled my talk "Missing Critical Pieces."[2] Grey-haired practitioners &#8212; the engineers who'd shipped production RPC systems, who remembered why gRPC has deadline propagation &#8212; nodded. I was saying out loud what enterprise architects were thinking, but the hype cycle wouldn't let them say: &#8220;This protocol is not production-ready.&#8221;</p><p>Eight months later, MCP won the standard war. OpenAI adopted it and signaled a move away from its own Assistants API.[4] Google DeepMind and Microsoft integrated it. Anthropic donated MCP to the Linux Foundation&#8217;s Agentic AI Foundation in December 2025, with AWS, Bloomberg, Cloudflare, Google, and Microsoft as platinum members.[5] Over 10,000 MCP servers published. Monthly SDK downloads exceed 97 million.[6] No competing protocol came close. By every adoption metric, MCP is the standard. Credit where it&#8217;s due: MCP solved the discovery and transport problem. Getting any AI model to find and connect to any tool through a single protocol is a genuine achievement.</p><p>Then came the week of March 9&#8211;13, 2026. The companies most invested in MCP&#8217;s success confirmed claim after claim from last summer.</p><p>On March 9, the MCP project published its 2026 roadmap.[7] I wrote that you can&#8217;t horizontally scale MCP servers without sticky routing.
The roadmap&#8217;s own language: Streamable HTTP &#8212; the transport that lets MCP servers run as remote services &#8212; has &#8220;stateful sessions that fight with load balancers&#8221; and &#8220;horizontal scaling requires workarounds.&#8221;[8] Enterprise readiness is listed as the fourth and final priority, described as &#8220;the least defined of the four priorities.&#8221; No Enterprise Working Group exists. No new spec version has shipped since November 2025.[9]</p><p>On March 11, Perplexity CTO Denis Yarats announced at the Ask 2026 conference that Perplexity is moving away from MCP internally.[10] Yarats cited two reasons: context window overhead and authentication friction &#8212; the same authentication gap and ecosystem fragmentation risk I had flagged.[11] The company shipped its own MCP server in late 2025. Within months, their solution was to abandon MCP in favor of a single REST endpoint with a single API key. Y Combinator CEO Garry Tan independently built a CLI instead, citing reliability and speed.[12]</p><p>On the same day, Cloudflare published a technical analysis that put a number on the overhead problem.[13] Their MCP server covers 2,500 API endpoints using two tools and roughly 1,000 tokens. A native MCP implementation exposing the same endpoints would consume roughly 244,000 tokens &#8212; more than the entire context window of most models.[14] For complex batch operations, Cloudflare&#8217;s Code Mode approach uses 81% fewer tokens than standard MCP tool calling.[15] MCP works as a discovery layer. It collapses as a production execution layer, because dumping full tool schemas into context for every interaction is a cost no production system can absorb.</p><p>The scale of the token problem only became visible when companies actually measured it &#8212; because MCP has no built-in cost attribution or token counting. 
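Because the protocol offers no token accounting of its own, teams end up estimating the overhead by hand. A minimal sketch of that arithmetic, where every per-server schema size, request volume, and price is an invented assumption for illustration:

```python
# Rough cost of MCP tool-schema overhead, which is resent with every
# request. All per-server sizes, volumes, and prices below are
# illustrative assumptions, not measurements.
CONTEXT_WINDOW = 200_000                 # tokens
PRICE_PER_INPUT_TOKEN = 5 / 1_000_000    # $/token, e.g. $5 per million

# Hypothetical schema sizes (tokens) for seven connected MCP servers.
servers = {"crm": 12_000, "tickets": 9_500, "wiki": 8_000,
           "calendar": 6_800, "search": 11_000, "billing": 10_500,
           "chat": 9_500}

schema_tokens = sum(servers.values())           # paid before any conversation
window_share = schema_tokens / CONTEXT_WINDOW   # context consumed up front

requests_per_day = 10_000
daily_overhead_cost = schema_tokens * requests_per_day * PRICE_PER_INPUT_TOKEN

print(f"{schema_tokens:,} schema tokens ({window_share:.0%} of the window), "
      f"${daily_overhead_cost:,.0f}/day before any useful work")
```

The point of the sketch is that the overhead scales with request volume, not with work done: the same schemas are billed again on every interaction.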
Gil Feig, CTO of Merge, estimates that tool metadata overhead accounts for 40&#8211;50% of available context in typical deployments.[16] One developer reported that seven MCP servers consumed 67,300 tokens &#8212; a third of a 200,000-token context window &#8212; before any conversation began.[17] The overhead is structural: MCP requires the model to see complete tool definitions for every interaction. It cannot be patched without redesigning how MCP works.</p><p>The response to all of this is predictable: &#8220;But there&#8217;s a roadmap!&#8221;</p><p>A roadmap is not a fix. The 2026 MCP roadmap is a governance document. It describes Working Groups that will define deliverables on timelines they control. It lists priority areas, not solutions. The Enterprise Working Group doesn&#8217;t exist yet &#8212; the roadmap invites volunteers to form one.[18] The Transport Working Group is exploring &#8220;several approaches&#8221; to session handling, &#8220;with a cookie-like mechanism being one potential candidate.&#8221;[19] Exploring candidates is the language of research, not shipping. HTTP could mature slowly because the early web was patient. MCP cannot, because enterprise AI deployment timelines are measured in months, not years.</p><p>The fragmentation I predicted has arrived &#8212; from multiple directions at once. Cloudflare kept MCP for discovery but replaced its tool-calling mechanism with code generation. Perplexity abandoned MCP internally in favor of direct APIs. Block&#8217;s goose framework implemented Code Mode as an extension.[20] Anthropic itself independently explored the same code-execution pattern.[21] MCP&#8217;s defenders will say this is evolution &#8212; that Code Mode builds on MCP, not around it.[22] But replacing the core tool-calling protocol while keeping the discovery layer is not building on. It is keeping the address and gutting the house.</p><p>I have seen this gap from the other side. 
Full disclosure: Fortino Capital, where I&#8217;m an AI Operating Partner, acquired MEHRWERK in July 2025.[23] Their team recently built a production MCP server for mpmX, their process intelligence platform [24]. I sat with the engineer who built it. What it took: enterprise-level security and compliance that MCP does not provide, and a full back office for configuration, observability, and everything else that MCP does not specify. The gap between a demo MCP server and one you can ship to enterprise customers is months of engineering.</p><p>If you are a CTO or VP of Engineering evaluating MCP for your agentic infrastructure, use MCP for discoverable workflows, preferably local. That is what it actually delivers. For deterministic work at production scale, use function calling, direct APIs, or CLIs. The security, observability, and authorization layers that MCP does not provide will take months, not days, to build. And do not plan your timeline around the MCP roadmap. The companies that built and championed MCP are not planning their own around it either.</p><p>MCP won the standard war. Winning was the easy part. The hard part &#8212; making the standard work at production scale &#8212; is being done by everyone except the standard.</p><p>To my fellow engineers: chins up, and keep calling it before the roadmap does. That is what separates practitioners from influencers.</p><div><hr></div><h3>Notes</h3><p>[1] Julien Simon, &#8220;Why MCP&#8217;s Disregard for 40 Years of RPC Best Practices Will Burn Enterprises,&#8221; July 2025. <a href="https://julsimon.medium.com/why-mcps-disregard-for-40-years-of-rpc-best-practices-will-burn-enterprises-8ef85ce5bc9b">https://julsimon.medium.com/why-mcps-disregard-for-40-years-of-rpc-best-practices-will-burn-enterprises-8ef85ce5bc9b</a> </p><p>[2] MCP Conference Berlin, September 16, 2025 &#8212; Europe&#8217;s first developer conference dedicated to the Model Context Protocol. 
Talk title: &#8220;Missing Critical Pieces.&#8221; <a href="https://luma.com/mcpconferenceberlin2025">https://luma.com/mcpconferenceberlin2025</a>.</p><p>[3] Cited in Richard MacManus, &#8220;Why the Model Context Protocol Won,&#8221; The New Stack, December 18, 2025. <a href="https://thenewstack.io/why-the-model-context-protocol-won/">https://thenewstack.io/why-the-model-context-protocol-won/</a></p><p>[4] OpenAI adopted MCP in March 2025. Multiple sources report the deprecation of the Assistants API, with a mid-2026 sunset, but no primary OpenAI announcement has been independently confirmed as of publication. See Greg Robison, &#8220;The Model Context Protocol: The Architecture of Agentic Intelligence,&#8221; Medium, December 23, 2025. <a href="https://gregrobison.medium.com/the-model-context-protocol-the-architecture-of-agentic-intelligence-cfc0e4613c1e">https://gregrobison.medium.com/the-model-context-protocol-the-architecture-of-agentic-intelligence-cfc0e4613c1e</a></p><p>[5] Linux Foundation press release, &#8220;Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF),&#8221; December 9, 2025. <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation</a></p><p>[6] Anthropic, &#8220;Donating the Model Context Protocol and Establishing the Agentic AI Foundation,&#8221; December 9, 2025. Vendor-published figures. <a href="https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation">https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation</a></p><p>[7] David Soria Parra (Lead Maintainer), &#8220;The 2026 MCP Roadmap,&#8221; Model Context Protocol Blog, March 9, 2026. 
<a href="http://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/">http://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/</a></p><p>[8] Paraphrased from the roadmap and the MCP Transport Working Group post. The roadmap states that running Streamable HTTP at scale &#8220;surfaced a consistent set of gaps: stateful sessions fight with load balancers, horizontal scaling requires workarounds.&#8221; <a href="http://blog.modelcontextprotocol.io/posts/2025-12-19-mcp-transport-future/">http://blog.modelcontextprotocol.io/posts/2025-12-19-mcp-transport-future/</a></p><p>[9] Per the 2026 roadmap: &#8220;MCP&#8217;s current spec release came out in November 2025. We haven&#8217;t cut a new version since.&#8221; MCP contributor Yann Jouanin tells me that there&#8217;s an enterprise Interest Group (#enterprise-ig on the MCP Discord).</p><p>[10] Denis Yarats, CTO of Perplexity, at Ask 2026 conference, March 11, 2026. Reported in multiple outlets; no official transcript or recording located as of this writing. All coverage is secondhand.</p><p>[11] Sophie Zhang, &#8220;Perplexity CTO Moves Away from MCP Toward APIs and CLIs,&#8221; Awesome Agents, March 11, 2026. Perplexity&#8217;s Agent API &#8212; a single REST endpoint supporting models from six providers &#8212; launched in general availability February 2026. <a href="https://awesomeagents.ai/news/perplexity-agent-api-mcp-shift/">https://awesomeagents.ai/news/perplexity-agent-api-mcp-shift/</a></p><p>[12] Garry Tan&#8217;s CLI preference cited in the same Awesome Agents report and in the Versalence analysis. No primary Tan source located.</p><p>[13] Cloudflare published two related posts. The original: &#8220;Code Mode: the better way to use MCP,&#8221; September 26, 2025. The expanded follow-up: &#8220;Code Mode: give agents an entire API in 1,000 tokens,&#8221; March 11, 2026. 
<a href="https://blog.cloudflare.com/code-mode-mcp/">https://blog.cloudflare.com/code-mode-mcp/</a></p><p>[14] Per the Cloudflare MCP GitHub repository: &#8220;The Cloudflare OpenAPI spec is 2 million tokens. Even with native MCP tools using minimal schemas, it&#8217;s still ~244k tokens.&#8221; Note &#8220;minimal schemas&#8221; &#8212; this is the best-case native MCP estimate, not worst-case. The Code Mode server covers the same API surface in approximately 1,000 tokens. <a href="https://github.com/cloudflare/mcp">https://github.com/cloudflare/mcp</a></p><p>[15] 32% reduction for simple tasks and 81% for complex batch operations. Demonstrated by Rita Kozlov, VP AI and Developers at Cloudflare, at MCP Night: The Holiday Special, December 2025. Vendor-demonstrated benchmarks. <a href="https://workos.com/blog/cloudflare-code-mode-cuts-token-usage-by-81">https://workos.com/blog/cloudflare-code-mode-cuts-token-usage-by-81</a></p><p>[16] Gil Feig, CTO of Merge. Cited in Versalence, &#8220;Long Live MCP: Why the Model Context Protocol Is Facing an Evolution in 2026.&#8221; <a href="https://blogs.versalence.ai/mcp-model-context-protocol-evolution-2026">https://blogs.versalence.ai/mcp-model-context-protocol-evolution-2026</a></p><p>[17] Developer-reported figure (7 MCP servers, 67,300 tokens, 33.7% of 200K context). Same Versalence source. Anonymous; treat as illustrative, not definitive.</p><p>[18] Per the 2026 roadmap: &#8220;A dedicated Enterprise WG does not yet exist. If you work in enterprise infrastructure and want to lead or join one, the Working Groups page explains how to get started.&#8221;</p><p>[19] MCP Transport Working Group, &#8220;Exploring the Future of MCP Transports,&#8221; December 19, 2025. <a href="http://blog.modelcontextprotocol.io/posts/2025-12-19-mcp-transport-future/">http://blog.modelcontextprotocol.io/posts/2025-12-19-mcp-transport-future/</a></p><p>[20] Block/goose implemented Code Mode in v1.17.0, December 2025. 
<a href="https://block.github.io/goose/blog/2025/12/15/code-mode-mcp/">https://block.github.io/goose/blog/2025/12/15/code-mode-mcp/</a></p><p>[21] Cloudflare&#8217;s March 2026 blog post states: &#8220;Anthropic independently explored the same pattern in their Code Execution with MCP post.&#8221; Primary Anthropic source not independently located as of publication.</p><p>[22] Block&#8217;s goose team explicitly published &#8220;Code Mode Doesn&#8217;t Replace MCP&#8221; (December 21, 2025), arguing that Code Mode still uses MCP under the hood and that the protocol remains the discovery and transport layer. This is accurate &#8212; and that is precisely the point. <a href="https://block.github.io/goose/blog/2025/12/21/code-mode-doesnt-replace-mcp/">https://block.github.io/goose/blog/2025/12/21/code-mode-doesnt-replace-mcp/</a></p><p>[23] Fortino Capital, &#8220;Fortino Capital Acquires MEHRWERK to Accelerate Open Platform Innovation and AI-Driven Process Intelligence,&#8221; July 8, 2025. <a href="https://www.fortinocapital.com/news/fortino-capital-acquires-mehrwerk-accelerate-open-platform-innovation-and-ai-driven-process">https://www.fortinocapital.com/news/fortino-capital-acquires-mehrwerk-accelerate-open-platform-innovation-and-ai-driven-process</a></p><p>[24] <a href="http://mpmx.ai">mpmx.ai </a></p>]]></content:encoded></item><item><title><![CDATA[AWS Built Its Own AI Chip. Now It Needs Someone Else’s.]]></title><description><![CDATA[The monolithic accelerator worked well when inference was cheap. Reasoning models changed the math. 
What comes next reshuffles who captures the margin.]]></description><link>https://www.airealist.ai/p/aws-built-its-own-ai-chip-now-it</link><guid isPermaLink="false">https://www.airealist.ai/p/aws-built-its-own-ai-chip-now-it</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Sun, 15 Mar 2026 14:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IHB-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IHB-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IHB-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!IHB-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1679823,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IHB-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!IHB-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785f83f8-b764-4b43-a558-0d192ef0082f_1408x768.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In March 2024, I explained to my YouTube audience how LLM inference actually works.[1] Two distinct phases, I said. Prefill &#8212; the phase where the model processes your entire prompt in parallel, filling the key-value cache. And decode &#8212; the phase where the model generates its answer one token at a time, reading the full weight matrix from memory for every single output token. Prefill is compute-bound, embarrassingly parallel, the kind of workload GPUs were born for. 
Decode is memory-bandwidth-bound, stubbornly sequential, the kind of workload that leaves 60-80% of a GPU&#8217;s transistors idle. Two phases with opposite hardware requirements, running on the same chip. I called it the fundamental tension of LLM inference.</p><p>Two years later, AWS resolved that tension by splitting the workload across two different companies&#8217; chips.</p><p>On March 13, 2026, Amazon Web Services announced a partnership with Cerebras Systems to deploy what it calls &#8220;disaggregated inference&#8221; on Amazon Bedrock.[2] AWS Trainium handles prefill. Cerebras CS-3 systems, built around the wafer-scale WSE-3 processor, handle decode. Amazon&#8217;s Elastic Fabric Adapter connects them. The service launches later this year, running open-source models and Amazon Nova. AWS is the first major cloud provider to deploy Cerebras&#8217;s disaggregated inference solution inside its own data centers.</p><p>The press coverage focused on speed. The structural story is elsewhere. AWS &#8212; the cloud provider that has invested more in custom AI silicon than any competitor, through its Annapurna Labs subsidiary &#8212; just conceded that its own chip cannot optimally serve the workload that now dominates AI compute. Every GPU running inference today is a compromise: half the silicon is idle half the time, and reasoning models just tripled the bill for the idle half. That compromise is breaking. Three independent ecosystems &#8212; American cloud, American silicon, and Chinese silicon &#8212; have converged on the same architectural answer. What that answer reveals about where value settles in the AI hardware stack is the subject of the rest of this piece.</p><h2>The Reasoning Tax</h2><p>To understand why inference is fracturing, you need to understand its economics and why reasoning models have restructured it.</p><p>Every time you prompt a large language model, two computations happen in sequence. First, the model processes your input. 
Every token in your prompt gets embedded, run through dozens or hundreds of attention layers, and the resulting key-value pairs are stored in a cache. This prefill phase is massively parallel. A modern GPU hits 90-95% utilization during prefill. The hardware is doing exactly what it was designed for: large matrix multiplications across thousands of cores simultaneously.</p><p>Then the model starts generating its response, and the physics change entirely. Each output token depends on all previous tokens. The model must read the full weight matrix from memory for every single token it produces &#8212; billions of parameters, fetched one token at a time. This decode phase is inherently sequential. GPU utilization drops to 20-40%.[3] The bottleneck is no longer compute. It is memory bandwidth: how fast can the chip read model weights from memory? On an Nvidia H100, the answer is 3.35 terabytes per second of high-bandwidth memory (HBM).[4] For a 70-billion-parameter model, that limits decode throughput to roughly 24 tokens per second at the theoretical ceiling.[5] Most of the chip&#8217;s compute units sit idle, waiting for data.</p><p>This asymmetry has always existed. What changed is the ratio.</p><p>In 2023, a typical chatbot interaction involved a few hundred input tokens and a few hundred output tokens. The prefill and decode workloads were roughly balanced. A monolithic GPU that balanced both phases handled the job adequately. Then, reasoning models arrived. 
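</p><p>The roughly-24-tokens-per-second ceiling quoted above follows directly from the bandwidth arithmetic. A quick sanity check, under stated assumptions (fp16 weights, batch size 1, ignoring KV-cache reads):</p>

```python
# Back-of-envelope decode ceiling: each generated token must stream the full
# weight matrix from HBM, so bandwidth divided by model size bounds
# single-stream throughput. Assumes fp16 weights, batch size 1, no KV-cache reads.
HBM_BANDWIDTH_BYTES_S = 3.35e12   # Nvidia H100 HBM3 bandwidth, bytes/second
PARAMS = 70e9                     # 70-billion-parameter model
BYTES_PER_PARAM = 2               # fp16 / bf16 weights

bytes_read_per_token = PARAMS * BYTES_PER_PARAM
ceiling = HBM_BANDWIDTH_BYTES_S / bytes_read_per_token

print(f"{ceiling:.1f} tokens/second")  # ~23.9
```

<p>Batching amortizes each weight read across concurrent requests, which is why production decode throughput depends as much on achievable batch size as on raw bandwidth.</p><p>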
The latest DeepSeek R1 averages 23,000 tokens per question on math benchmarks.[6] Anthropic&#8217;s Claude 3.7 Sonnet in thinking mode generates approximately fifteen times more tokens than the standard model across an evaluation suite.[7] Agentic coding workloads produce roughly 15 times as many tokens per query as conversational chat.[8] Barclays estimates that agentic tasks can generate up to twenty-five times more tokens per interaction.[9]</p><p>Every one of those additional tokens hits the decode bottleneck.</p><p>The cost structure makes the problem visible. Across all major LLM providers in March 2026, output tokens cost four to eight times as much as input tokens, with a median ratio of approximately 5-to-1 for frontier models.[10] Claude Opus 4.6: $5 per million input tokens, $25 per million output. A five-to-one ratio.[11] GPT-5.4: $2.50 input, $15 output. A six-to-one ratio.[12] This pricing asymmetry is not arbitrary. It reflects the hardware asymmetry: decode is physically more expensive because it uses the silicon less efficiently.</p><p>Now multiply the pricing asymmetry by the token multiplication. A reasoning model that generates fifteen times more output tokens, at five times the per-token price, produces a total per-request cost roughly twelve times higher than a non-reasoning model processing the same query, even though per-token prices may have dropped 80% year over year.[13] I call this the Reasoning Tax: the structural cost of intelligence that scales with the number of thoughts, not the number of questions. Per-token prices fall. Per-request costs rise. Any business model that assumes inference gets cheaper per request as models improve is exposed.</p><p>The Reasoning Tax breaks the monolithic GPU by concentrating costs on the phase the GPU handles worst. When decode accounted for 50% of the workload, a chip that was 40% efficient in decode was an acceptable compromise. 
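</p><p>Both multipliers can be checked in a few lines. The token counts below are illustrative, and the prices mirror the Claude Opus figures quoted above:</p>

```python
# Illustrative Reasoning Tax arithmetic. Token counts are hypothetical;
# prices mirror the cited Claude Opus rates ($5/M input, $25/M output).
PRICE_IN, PRICE_OUT = 5.0, 25.0   # dollars per million tokens

def request_cost(tokens_in, tokens_out):
    return (tokens_in * PRICE_IN + tokens_out * PRICE_OUT) / 1e6

chat = request_cost(500, 500)             # balanced 2023-style interaction
reasoning = request_cost(500, 500 * 15)   # same query, 15x output tokens

print(f"per-request cost ratio: {reasoning / chat:.1f}x")   # ~12.7x
print(f"decode share of tokens: {7500 / 8000:.0%}")         # ~94%
```

<p>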
When decode becomes 90% of a single request&#8217;s compute &#8212; which is approximately what a fifteen-times token multiplier implies &#8212; that same chip is wasting most of its silicon most of the time. The compromise no longer holds.</p><p>What follows is the supply-chain story AWS did not tell, the global convergence that confirms it, and the framework that identifies where margin settles in the era of disaggregated inference.</p><h2>What AWS actually revealed</h2><p>The partnership AWS announced on March 13 is unprecedented in its architecture &#8212; no hyperscaler has previously split a single inference request across two vendors&#8217; silicon in production &#8212; but the engineering logic is direct: Trainium for prefill, Cerebras CS-3 for decode, EFA networking between them, deployed on Bedrock. That division maps directly to the physics described above. Trainium&#8217;s dense compute cores excel at the parallel matrix multiplications of prefill. The Cerebras WSE-3, which stores model weights in 44 gigabytes of on-chip static memory (SRAM) rather than in HBM, delivers orders of magnitude more memory bandwidth than any GPU &#8212; eliminating the decode bottleneck entirely.[14] Cerebras has independently demonstrated over 2,100 tokens per second on Llama 70B and 969 tokens per second on Llama 405B, verified by Artificial Analysis.[15] AWS cannot match those decode speeds on Trainium.</p><p>The KV cache transfer &#8212; the data that must move from the prefill chip to the decode chip between phases &#8212; adds roughly 7 milliseconds of overhead, negligible compared to the decode latency on conventional GPUs, which can stretch to seconds.[16][17] The academic literature confirms the approach: DistServe, published at OSDI 2024, demonstrated up to 7.4 times higher throughput with prefill-decode disaggregation, with KV cache transfer overhead less than a single decode step.[18]</p><div class="captioned-image-container"><figure><a class="image-link
image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!le42!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!le42!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 424w, https://substackcdn.com/image/fetch/$s_!le42!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 848w, https://substackcdn.com/image/fetch/$s_!le42!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!le42!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!le42!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png" width="1456" height="1042" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1042,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!le42!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 424w, https://substackcdn.com/image/fetch/$s_!le42!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 848w, https://substackcdn.com/image/fetch/$s_!le42!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!le42!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14597d1b-b6f4-47e5-a701-a73781e67da9_1560x1116.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The technical logic is the obvious reading. The supply-chain logic adds a second, less visible motivation.</p><p>AWS&#8217;s Trainium allocation is fully committed. 
Project Rainier &#8212; the Anthropic training cluster &#8212; deploys roughly 500,000 Trainium2 chips across a 1,200-acre facility in Indiana, with plans to scale to one million.[19] OpenAI&#8217;s February 2026 expansion committed approximately 2 gigawatts of Trainium capacity across current- and next-generation chips.[20] SemiAnalysis reported that Trainium2 and Trainium3 are fully subscribed, with assembly yield issues during the Trainium2 ramp causing delays in fab-to-rack shipments.[21] When your two largest customers have collectively committed to consuming every chip you can manufacture for training, and inference demand is doubling annually, partnering with a decode specialist is both sound engineering and supply-chain pragmatism &#8212; the technical case and the allocation case reinforce each other.</p><p>The naming evolution tells its own story. In 2020, AWS launched two chips with unambiguous names: Inferentia for inference, Trainium for training. By 2024, Trainium2 was repositioned for &#8220;training and inference.&#8221; By late 2025, no Inferentia3 had been announced; Trainium absorbed both workloads. Then, in March 2026, AWS partnered with Cerebras specifically for decode, the exact workload Inferentia was originally designed to handle. AWS&#8217;s own product arc describes a chip that aspired to serve both phases, discovered it could not do both competitively, and outsourced the harder half.[22]</p><p>This is what I have previously called the Infrastructure Reversion Test, applied to inference.[23] AWS tried to build the full stack in-house. The intelligence-layer ambition &#8212; one chip for everything &#8212; hit a physical limitation. The resolution: revert to infrastructure. 
AWS controls the platform layer &#8212; Nitro for security, EFA for networking, Bedrock for abstraction, and Neuron SDK for orchestration &#8212; and absorbs third-party components for the capabilities it cannot build.</p><p>I call the generalized version the Platform Absorption Test: when a hyperscaler deploys third-party silicon for a core workload, it tells you where its moat is, and where it is not. AWS&#8217;s moat is the platform. The chip is a swappable component inside it. (Full disclosure: I spent six years at AWS, where I watched this platform-absorption pattern play out across Graviton, Inferentia, and Trainium &#8212; each chip designed to be excellent, each ultimately subordinate to the Nitro/EFA/EC2/Bedrock platform that made it deployable.[51])</p><h2>Three ecosystems, one architecture</h2><p>If the AWS-Cerebras deal were an isolated decision, it would be an interesting partnership. It is not isolated. Three independent ecosystems, operating under entirely different constraints, have converged on the same architectural conclusion.</p><p>Nvidia saw disaggregation coming and spent aggressively to control it. In December 2025, it struck a deal to absorb Groq&#8217;s assets &#8212; licensing the startup&#8217;s SRAM-based inference IP and hiring roughly 90 percent of its staff for approximately $20 billion, while Groq continued operating independently under new leadership.[24] Jensen Huang was explicit about the logic: Groq&#8217;s low-latency inference processors would be integrated into Nvidia&#8217;s AI factory architecture.[25] Rather than let an SRAM-based decode specialist become a platform for competitors, Nvidia absorbed its capabilities. 
Three months earlier, Nvidia had announced Rubin CPX: a dedicated prefill GPU &#8212; expected to ship in late 2026 &#8212; with 30 petaflops in NVFP4 and GDDR7 memory instead of HBM &#8212; approximately five times cheaper per gigabyte &#8212; explicitly designed to pair with standard Rubin GPUs for decode.[26] Two specialized chips from one vendor, connected by NVLink, orchestrated by Dynamo, Nvidia&#8217;s open-source disaggregated serving framework.[27] Jensen is not defending the GPU. He is building the system that replaces it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nXHZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nXHZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 424w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 848w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 1272w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nXHZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png" width="1456" height="1098" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1098,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:137547,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nXHZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 424w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 848w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 1272w, https://substackcdn.com/image/fetch/$s_!nXHZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cfe2856-1922-43b5-b4cb-041849088299_1560x1176.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The most revealing convergence comes from China. At Huawei Connect in September 2025, rotating chairman Eric Xu unveiled the Ascend 950 &#8212; not one chip, but two. The Ascend 950PR, optimized for prefill and recommendation workloads, ships in the first quarter of 2026 with HiBL 1.0, Huawei&#8217;s proprietary low-cost HBM. The Ascend 950DT, optimized for decode and training, ships in the fourth quarter of 2026 with HiZQ 2.0 HBM &#8212; 144 gigabytes at four terabytes per second of bandwidth.[28] Same die. Two memory configurations. 
Two products, explicitly named for the two phases of inference. Xu&#8217;s explanation was precise: prefill is compute-intensive and has lower memory-bandwidth demand, so it does not require expensive high-bandwidth memory. Decode requires fast memory access. Build each product for what it actually needs.[29]</p><p>Huawei went further. Its CloudMatrix-Infer system disaggregates inference into three independent subsystems &#8212; prefill, decode, and caching &#8212; operating as peer-to-peer resource pools connected by high-bandwidth interconnect.[30] The third subsystem is the key innovation: rather than each decode worker managing its own KV cache locally, CloudMatrix pools the cache into a shared disaggregated memory layer that all prefill and decode workers access uniformly. This decouples request scheduling from data locality &#8212; a limitation that constrains every other disaggregated system. In the AWS-Cerebras architecture, for instance, the KV cache must transfer point-to-point from the specific Trainium instance that computed it to the specific Cerebras chip that will decode from it. 
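</p><p>The coupling difference is visible in a toy scheduler. This sketch is purely illustrative; the node names and unit-load model stand in for either vendor&#8217;s actual placement logic:</p>

```python
# Toy contrast: point-to-point KV routing vs. a pooled, shared KV cache.
# Node names and the unit-load model are illustrative, not any vendor's design.
requests = ["prefill-A", "prefill-A", "prefill-A", "prefill-B"]  # prefill placement

# Point-to-point: the KV cache moves to the prefill node's paired decoder,
# so decode placement is dictated by where prefill happened to run.
pairing = {"prefill-A": "decode-A", "prefill-B": "decode-B"}
p2p_load = {"decode-A": 0, "decode-B": 0}
for prefill_node in requests:
    p2p_load[pairing[prefill_node]] += 1

# Pooled cache: every decode worker reads the shared pool, so the scheduler
# is free to pick whichever worker is least loaded.
pooled_load = {"decode-A": 0, "decode-B": 0}
for _ in requests:
    least_loaded = min(pooled_load, key=pooled_load.get)
    pooled_load[least_loaded] += 1

print(p2p_load)     # {'decode-A': 3, 'decode-B': 1}: hot spot follows prefill
print(pooled_load)  # {'decode-A': 2, 'decode-B': 2}: balanced
```

<p>A burst of prompts landing on one prefill node becomes a decode hot spot in the point-to-point model; a pooled cache lets the scheduler place decode wherever capacity exists.</p><p>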
Huawei&#8217;s approach eliminates that coupling entirely, at the cost of requiring interconnect bandwidth sufficient to serve the cache to any worker in the cluster.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EFyq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EFyq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 424w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 848w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EFyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png" width="1456" height="1161" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1161,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147388,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EFyq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 424w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 848w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!EFyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f8b3edb-aa7e-440a-af04-77c456cd32bd_1560x1244.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The Atlas 950 SuperPod, scheduled for late 2026, deploys 8,192 Ascend chips with an aggregate interconnect bandwidth of 16 petabytes per second.[30] This is not software disaggregation on identical GPUs. This is an entire inference architecture designed from first principles, based on the observation that prefill, decode, and memory are three distinct problems that require distinct optimizations.</p><p>Huawei arrived at disaggregation because it had no choice. Constrained to a domestic 7-nanometer process while Nvidia builds on TSMC 3nm, Huawei cannot compete on raw transistor performance.[31] When you cannot shrink the die, you specialize the system. Export controls made disaggregation mandatory, not optional. 
But the architecture works regardless of motivation &#8212; and the fact that Nvidia, unconstrained and dominant, arrived at the same answer from the opposite direction confirms it.</p><p>AMD occupies the middle ground. Its MI350X and MI355X ship with software-level disaggregation support: separate GPU pools for prefill and decode, with AMD&#8217;s MoRI interface handling KV cache transfer over RDMA.[32] But AMD uses the same chip on both sides. The disaggregation is in scheduling, not in silicon. This is technically sound and delivers real gains, but it does not capture the architectural advantage of phase-specific hardware. The tradeoff is operational simplicity: one chip, one vendor, one support contract &#8212; advantages that matter to enterprise customers who lack the engineering depth to manage multi-vendor silicon in a single inference request. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tyxY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tyxY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 424w, https://substackcdn.com/image/fetch/$s_!tyxY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 848w, https://substackcdn.com/image/fetch/$s_!tyxY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tyxY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tyxY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png" width="1456" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122649,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tyxY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 424w, https://substackcdn.com/image/fetch/$s_!tyxY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 848w, 
https://substackcdn.com/image/fetch/$s_!tyxY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!tyxY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad9a6fb-4f52-42dc-823b-c4781f0aa488_1720x1058.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AMD&#8217;s strategic play is elsewhere: it acquired ZT Systems, a server ODM, to move from chip vendor to systems integrator, and 
its Meta 6GW deal and OpenAI partnership both come with equity rebates &#8212; AMD shares exchanged for GPU volume.[33] AMD&#8217;s explicit bet is that equity-for-volume buys enough installed base for ROCm to become a second software ecosystem. If it works, AMD escapes commoditization. If it doesn&#8217;t, giving away equity to win deals is the behavior of a component vendor losing pricing power.</p><p>Google bets against the thesis entirely. Its TPU v7 Ironwood is homogeneous: 4,614 teraflops per chip, 192 gigabytes of HBM3e, 7.4 terabytes per second of bandwidth, deployed in 9,216-chip pods.[34] Google&#8217;s position is that if you build enough bandwidth into a single chip and scale aggressively, disaggregation is unnecessary. Google compensates with aggressive software optimization &#8212; prefix caching, continuous batching, and XLA compiler fusion &#8212; that reduces decode waste without hardware specialization. This is the strongest counterargument to the piece&#8217;s thesis, and it may prove correct for Google&#8217;s specific integration of hardware, compiler (XLA), and workloads. 
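</p><p>The Ironwood spec makes the roofline arithmetic concrete. A minimal sketch, treating the quoted 4,614 teraflops as peak throughput at reduced precision and reusing the DistServe arithmetic-intensity ranges cited in note [3]; real kernels reach neither roof:</p>

```python
# Roofline sketch for the per-chip TPU v7 figures quoted above.
PEAK_FLOPS = 4614e12   # 4,614 teraflops per chip (reduced precision)
BANDWIDTH = 7.4e12     # 7.4 TB/s of HBM3e bandwidth, in bytes per second

# The ridge point is the arithmetic intensity (FLOPs per byte moved)
# at which compute and memory are equally busy; kernels below it are
# bandwidth-bound no matter how many teraflops the chip advertises.
ridge = PEAK_FLOPS / BANDWIDTH
print(f"ridge point: {ridge:.0f} FLOPs/byte")

# Decode runs at roughly 0.5-8 FLOPs/byte and prefill at 200-400
# (note [3]). The usable fraction of peak compute at each intensity:
for intensity in (0.5, 8.0, 300.0):
    share = min(1.0, intensity / ridge)
    print(f"{intensity:6.1f} FLOPs/byte -> at most {share:.1%} of peak")
```

<p>On these figures, decode can use at most about one percent of the chip&#8217;s peak compute: bandwidth, not the teraflop figure, sets decode throughput.</p><p>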
But Google is the only major hyperscaler not publicly pursuing some form of disaggregation, which either makes it the smartest player in the room or the one most likely to be disrupted when the architecture shifts.</p><p><strong>Three constraints, one architecture</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uqk4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uqk4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 424w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 848w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 1272w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uqk4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png" width="1212" height="812" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1212,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:157042,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uqk4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 424w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 848w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 1272w, https://substackcdn.com/image/fetch/$s_!uqk4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1db71d-27bb-4b41-8fb7-c570c7c16092_1212x812.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Disaggregation serves the highest-value inference workloads &#8212; reasoning, agentic coding, long-context processing &#8212; that are also the fastest-growing segment of inference demand. The monolithic GPU has been declared obsolete before &#8212; by Graphcore, Habana, and Cerebras in its own early training pitch &#8212; and has survived every challenge. What distinguishes disaggregation is that it is not a competing chip. It is the incumbents themselves &#8212; Nvidia, AWS, Huawei &#8212; restructuring around the same physics. The monolithic GPU does not disappear. It becomes the commodity tier, handling short-context conversational queries at volume. But the premium tier, where margins are highest and competition is fiercest, is disaggregating.
And in that tier, the question shifts from which chip is fastest to who owns the system.</p><h2>Systems eat chips</h2><p>The convergence tells you disaggregation is happening. The next question is who benefits.</p><p>The GPU era had a simple value chain. Nvidia designed the chip, TSMC fabricated it, server OEMs racked it, and cloud providers sold it. Nvidia captured the largest margin because it controlled the entire stack from silicon to CUDA. The disaggregated era is more complex &#8212; and the complexity reshuffles who captures value.</p><p>Start at the bottom of the stack. Original design manufacturers &#8212; Quanta, Foxconn, Wistron &#8212; build servers from components. Their margins sit at two to three percent.[35] One step up, branded OEMs like Super Micro integrate and qualify systems. Super Micro&#8217;s gross margin in its most recent quarter was 6.3 percent, down from 15.4 percent in late 2023, and still falling as competition intensifies.[36] Building racks from someone else&#8217;s chips is not a moat. It is commodity assembly.</p><p>Now consider the other end. AWS operates at roughly a 30 percent operating margin.[37] Nvidia operates at roughly 73 percent gross margin.[38] Broadcom &#8212; which designs Google&#8217;s TPUs, Meta&#8217;s MTIA accelerator, and Anthropic&#8217;s custom chips, among others &#8212; operates at non-GAAP gross margins above 75 percent.[39] The margin gradient from ODM to OEM to chip vendor to platform operator is not random.
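</p><p>Laid out by layer, the figures just quoted form the gradient directly. A quick sketch in code, mixing gross and operating margins exactly as cited (AWS reports an operating margin for the segment; the others are gross):</p>

```python
# The margin figures quoted above, by stack layer. All are as cited:
# gross margin except AWS, whose reported figure is operating margin.
layers = [
    ("ODM (Quanta, Foxconn, Wistron)",       0.025),  # "two to three percent"
    ("Branded OEM (Super Micro)",            0.063),  # most recent quarter
    ("Chip vendor (Nvidia, gross)",          0.73),
    ("Design IP (Broadcom, non-GAAP gross)", 0.75),   # "above 75 percent"
    ("Platform operator (AWS, operating)",   0.30),
]

# A crude text bar chart of who keeps how much of each dollar.
for name, margin in layers:
    print(f"{name:40s} {margin:6.1%} {'#' * round(margin * 40)}")
```

<p>Each step up the stack keeps a visibly larger share of every dollar than pure assembly does.</p><p>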
It maps to control: whoever owns the layer that makes the adjacent layers interchangeable captures the most value.</p><p><strong>Where the margin sits in a disaggregated stack</strong> <em>(gross margin unless noted)</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oh13!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oh13!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 424w, https://substackcdn.com/image/fetch/$s_!oh13!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 848w, https://substackcdn.com/image/fetch/$s_!oh13!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 1272w, https://substackcdn.com/image/fetch/$s_!oh13!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oh13!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png" width="1218" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1218,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:174043,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/191018831?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oh13!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 424w, https://substackcdn.com/image/fetch/$s_!oh13!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 848w, https://substackcdn.com/image/fetch/$s_!oh13!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 1272w, https://substackcdn.com/image/fetch/$s_!oh13!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cca64-ecf1-494e-ac3f-2aa380702722_1218x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AWS reports operating margin, not gross; its gross margin is estimated at 60-65 percent, comparable to Nvidia&#8217;s. The operating figure is used because AWS does not disclose segment-level cost of goods sold, and because operating margin &#8212; which includes the cost of running the platform &#8212; is the more conservative measure of what the platform layer captures.</p><p>In the monolithic GPU era, that layer was the chip plus its software ecosystem. CUDA made every Nvidia GPU interchangeable from a developer&#8217;s perspective, and no competing chip could replicate the ecosystem. Nvidia controlled the interchangeability layer, and the margin followed.</p><p>Disaggregation changes which layer provides interchangeability.
Once inference is a system &#8212; prefill chip, decode chip, interconnect fabric, orchestration software, platform abstraction &#8212; the value migrates to whoever assembles and orchestrates the components. I call this the Integration Premium: in any disaggregating hardware stack, margin migrates from component manufacturers to the integration layer. For any AI hardware investment, the first question is where the company sits in this hierarchy &#8212; and whether it has a credible path to the tier above.</p><p>The historical parallel is instructive, if imperfect. In the PC era, Intel&#8217;s x86 architecture was dominant, but the integration margin went first to Dell and HP (who assembled the systems) and then to AWS and Azure (who absorbed the systems into cloud platforms). Intel&#8217;s share of the value chain declined as the layers above it became the control point. The AI inference stack is not identical &#8212; Nvidia&#8217;s CUDA ecosystem is stickier than x86 was, and the hardware cycle is compressing faster &#8212; but the underlying force is the same. When hardware is disaggregated into components, the components lose pricing power to the assembler.</p><p>Three tiers are emerging. Platform operators &#8212; AWS, Google, Azure &#8212; absorb any silicon into managed services and control the customer relationship. They do not need to build the best chip. They need to make any chip deployable. Design IP providers &#8212; principally Broadcom and, to a lesser extent, Marvell &#8212; translate hyperscaler architectures into manufacturable silicon. 
Broadcom has designed chips for Google, Meta, ByteDance, and Anthropic, with a separate co-development partnership with OpenAI and at least one additional unannounced customer, representing an AI backlog exceeding $73 billion.[40] Its margins are the highest in the chain because it sells design expertise, IP blocks, and fab management &#8212; not finished products.</p><p>Could Broadcom build its own merchant inference ASIC &#8212; an application-specific integrated circuit, purpose-built for a single workload &#8212; and sell it directly? It has every technical ingredient: seven generations of TPU co-design, TSMC fab relationships, packaging expertise, and intimate knowledge of what prefill-optimized and decode-optimized silicon requires. The answer is structural, not technical. Google did not invest seven generations of TPU co-design so that Broadcom could sell a TPU-equivalent to Azure. Meta did not pay for custom MTIA silicon so that a merchant version could be released to the open market. A Broadcom merchant chip would commoditize every custom design its hyperscaler customers paid to differentiate &#8212; destroying the franchise that generates 75-plus percent margins. Broadcom&#8217;s CEO has explicitly framed the stickiness: the value of custom accelerators comes from the learning curve in co-design with each hyperscaler, creating multi-generational relationships that deepen over time.[41]</p><p>Disaggregation does not push Broadcom toward merchant silicon. It multiplies its design revenue. Every hyperscaler that disaggregates inference now needs two custom chip designs instead of one &#8212; a prefill-optimized ASIC and a decode-optimized ASIC, each with different memory hierarchies, precision formats, and interconnect requirements. Huawei built both variants from a single die. Hyperscalers working with Broadcom may do the same, or may commission fully distinct designs. Either way, the design IP revenue per customer expands. 
This is why total ASIC shipments from Google and AWS alone reached 40-60 percent of Nvidia GPU shipments by 2025, and are projected to surpass Nvidia&#8217;s total GPU shipments by late 2026.[52]</p><p>The third tier is merchant silicon: Nvidia, AMD, Cerebras, and every other company selling chips into someone else&#8217;s platform. Nvidia&#8217;s position is unique because Jensen Huang is attempting to occupy all three tiers simultaneously. Rubin CPX and the Groq deal give him both specialized chips. NVLink Fusion &#8212; which allows third-party accelerators, including Trainium and TPUs, to plug into Nvidia&#8217;s interconnect fabric &#8212; positions Nvidia as the integration standard.[42] Dynamo gives him the orchestration software. If Jensen succeeds, Nvidia captures the Integration Premium on top of its chip margins. If third parties control the integration layer instead &#8212; if Bedrock and EFA become the standard, not NVLink and Dynamo &#8212; Nvidia&#8217;s 73 percent margins face the same competitive pressure that eventually compressed Intel&#8217;s.</p><h2>Component vendor at $23 billion</h2><p>The Cerebras IPO, expected as soon as April 2026, is the most direct test of whether a decode-specialist chip vendor can sustain premium margins without controlling the platform.</p><p>The company&#8217;s financial trajectory has been steep. Revenue climbed from $24.6 million in 2022 to $78.7 million in 2023.[43] The first half of 2024 reached an estimated $136.4 million &#8212; a 14-fold year-over-year increase &#8212; though the original S-1 revealed that G42, the UAE-based technology group, accounted for 87 percent of that revenue.[44] The company has never been profitable, posting a $66.6 million net loss in the first half of 2024, though gross margins improved from 11.7 percent to approximately 41 percent over the same period.[45]</p><p>The IPO narrative was reconstructed in three moves. 
In January 2026, OpenAI signed a deal worth over $10 billion for 750 megawatts of Cerebras compute through 2028.[46] In February, Cerebras raised a $1 billion Series H at approximately $23 billion &#8212; nearly tripling the $8.1 billion Series G valuation from five months earlier.[47] In March, the AWS partnership was announced during the active Morgan Stanley-led roadshow.[48] Each move addressed a specific S-1 vulnerability: OpenAI reduced customer concentration, the Series H established a valuation floor, and AWS provided hyperscaler validation of the technology.</p><p>The valuation demands scrutiny. At $23 billion against approximately $270 million in the most recent publicly disclosed annualized revenue &#8212; now eighteen months stale &#8212; the nominal multiple is 85 times trailing revenue. Against the most optimistic industry estimate that 2026 revenue may approach $1 billion, the forward multiple is 23 times, still rich for a hardware company with gross margins in the mid-thirties to low forties.[49] Nvidia&#8217;s gross margins exceed 73 percent. AMD&#8217;s are in the high forties. Hardware margins in the thirties reflect the structural economics of wafer-scale manufacturing on TSMC 5nm, not the software-like scalability that justifies technology multiples. The sole-source fabrication dependency is shared with every leading-edge fabless company, but Cerebras has no chiplet or binning fallback: a single yield defect scraps the entire wafer, not just one chiplet.</p><p>Apply the Commitment vs. Spend Gap. The OpenAI deal is worth over $10 billion &#8212; but over how many years, at what delivery schedule, and under what contract structure? Whether the agreement is take-or-pay or consumption-based is the single most important undisclosed variable for the updated S-1. A take-or-pay contract provides revenue predictability.
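</p><p>A toy model makes the distinction concrete. Every parameter below is a hypothetical placeholder except the headline value and the 2028 end date, which come from the reporting above; the actual terms are undisclosed:</p>

```python
# Toy revenue model for a multi-year compute deal under the two
# contract structures discussed above. Hypothetical parameters.
HEADLINE = 10e9   # "worth over $10 billion"
YEARS = 3         # 2026 through 2028

# Take-or-pay: the committed amount is owed whether or not the
# capacity is consumed, so vendor revenue is predictable by design.
take_or_pay = sum(HEADLINE / YEARS for _ in range(YEARS))

# Consumption-based: revenue tracks actual inference volume, which
# the vendor does not control. Three utilization scenarios:
for name, utilization in {"bear": 0.4, "base": 0.7, "bull": 1.0}.items():
    consumption = HEADLINE * utilization
    print(f"{name}: take-or-pay ${take_or_pay/1e9:.0f}B vs "
          f"consumption ${consumption/1e9:.0f}B over the term")
```

<p>Under the bear case, the same headline commitment recognizes less than half the revenue over the term.</p><p>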
A consumption-based agreement means revenue depends on OpenAI&#8217;s actual inference volumes, which are a function of model architecture decisions, competitive dynamics, and customer demand that Cerebras does not control. The AWS partnership is even less specific: &#8220;collaboration&#8221; and &#8220;coming months,&#8221; with deployment &#8220;based on customer demand.&#8221;[50] Commitments are not revenue. Press releases are not purchase orders.</p><p>The Integration Premium framework suggests Cerebras faces structural headwinds as a public company. Its technology is real &#8212; the decode speed advantage is verified and architecturally durable. But Cerebras operates as a component vendor inside other companies&#8217; platforms. AWS controls the Bedrock relationship. OpenAI controls the model architecture. The chip is exceptional. The margin structure may not be, because the platform owner, not the component vendor, sets the terms.</p><h2>Where value settles</h2><p>The Reasoning Tax is not a temporary condition. As models become more capable, they reason more, generating more tokens per request. The cost of that reasoning concentrates on decode &#8212; the phase that scales worst on conventional hardware. Any financial model that projects inference costs declining per request, rather than per token, needs stress-testing against reasoning workloads. The metric that matters is shifting from peak FLOPS to tokens per second per dollar per watt on decode-heavy workloads.</p><p>The Integration Premium identifies where the margin settles. In a disaggregated stack, the durable margin sits with whoever makes the components interchangeable: the platform operator that abstracts away the silicon, the design IP provider that translates architectures into manufacturable chips, or the full-stack vendor that controls the entire system from silicon to software. 
Pure-play chip companies &#8212; selling merchant silicon into someone else&#8217;s platform &#8212; face the same margin compression that server OEMs already experience, unless they control a layer above the chip.</p><p>A caveat on timeline: Nvidia may sustain full-stack margins for years before these forces take hold. Intel held its dominant position for two decades before cloud platforms displaced it. CUDA&#8217;s ecosystem lock-in is real and deep. Jensen&#8217;s strongest counter is that NVLink Fusion and Dynamo position Nvidia as the integration standard even in a disaggregated world &#8212; if AWS routes Trainium-to-Cerebras traffic through Nvidia fabric rather than EFA, Jensen&#8217;s bet pays off. Current evidence points in the other direction: every hyperscaler is building a proprietary interconnect.</p><p>The question for every AI hardware investment is no longer &#8220;which chip is fastest?&#8221; The chip is becoming a component. The question is: who owns the system?</p><p>The GPU did not die. It disaggregated. And when the hardware splits, the value migrates to whoever holds it together.</p><div><hr></div><h3>Notes</h3><p>[1] &#8220;<a href="https://julsimon.substack.com/p/video-deep-dive-optimizing-llm-inference">Deep Dive: Optimizing LLM Inference</a>,&#8221; Julien Simon, YouTube, March 11, 2024. The video covers KV cache mechanics, continuous batching, and speculative decoding for decoder-only architectures.</p><p>[2] &#8220;<a href="https://www.businesswire.com/news/home/20260313406341/en/AWS-and-Cerebras-Collaboration-Aims-to-Set-a-New-Standard-for-AI-Inference-Speed-and-Performance-in-the-Cloud">AWS and Cerebras Collaboration Aims to Set a New Standard for AI Inference Speed and Performance in the Cloud</a>,&#8221; AWS Press Release, March 13, 2026.</p><p>[3] <a href="https://arxiv.org/abs/2401.09670">DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving</a>, Zhong et al., OSDI 2024. 
Roofline model analysis characterizes prefill arithmetic intensity at 200-400 FLOPs/byte (compute-bound) and decode at 0.5-8 FLOPs/byte (memory-bandwidth-bound).</p><p>[4] <a href="https://www.nvidia.com/en-us/data-center/h100/">Nvidia H100 Tensor Core GPU Datasheet</a>. HBM3 bandwidth: 3.35 TB/s.</p><p>[5] Author&#8217;s calculation. Llama 70B at FP16 requires ~140 GB of weight reads per decode step. At 3.35 TB/s theoretical bandwidth and ~70% practical efficiency, throughput is approximately (3.35 &#215; 0.7 &#215; 10^12) / (140 &#215; 10^9) &#8776; 16.75 tokens/sec, roughly doubling to ~33 tok/s with FP8 quantization halving the weight size. Theoretical ceiling; real-world throughput varies with batch size, sequence length, and serving framework.</p><p>[6] <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528">DeepSeek R1-0528 model card</a> (Hugging Face, May 2025): &#8220;the new version averages 23K tokens per question&#8221; on the AIME math benchmark. The original R1 (January 2025) averaged approximately 12K tokens on the same benchmark. The 23K figure represents the latest model revision on a reasoning-intensive evaluation; general conversational usage produces fewer tokens.</p><p>[7] <a href="https://artificialanalysis.ai/">Artificial Analysis</a>, token consumption measurements across evaluation suite, comparing Claude 3.7 Sonnet standard vs. Claude 3.7 Sonnet Thinking with 64K token budget. Approximately 15x multiplier.</p><p>[8] &#8220;<a href="https://www.cerebras.ai/blog/cerebras-is-coming-to-aws">Cerebras is coming to AWS</a>,&#8221; Cerebras blog, March 13, 2026: &#8220;Unlike conversational chat, agentic coding generates approximately 15x more tokens per query.&#8221; Vendor-published claim.</p><p>[9] Barclays analyst estimate for agentic task token multiplication.
B-tier source; presented as analyst estimate.</p><p>[10] &#8220;<a href="https://www.silicondata.com/blog/llm-cost-per-token">Understanding LLM Cost Per Token: A 2026 Practical Guide</a>,&#8221; Silicon Data, 2026: &#8220;the median output-to-input price ratio is around 4&#215;&#8221; across all providers, including budget models. Frontier models (Opus 4.6, GPT-5.4, Gemini 2.5 Pro) cluster at 5-8x. Range from approximately 2.5x (budget tiers) to 8x (GPT-5, Gemini 2.5 Pro).</p><p>[11] <a href="https://platform.claude.com/docs/en/about-claude/pricing">Anthropic official API pricing</a>, March 2026. Claude Opus 4.6: $5/MTok input, $25/MTok output.</p><p>[12] <a href="https://openai.com/api/pricing/">OpenAI official API pricing</a>, March 2026. GPT-5.4: $2.50/MTok input, $15/MTok output.</p><p>[13] Author&#8217;s calculation. Baseline non-reasoning request: 500 input tokens at price P, 500 output tokens at 5P. Total cost = 500P + 2,500P = 3,000P. Reasoning request (15x output multiplier): 500 input tokens at P, 7,500 output tokens at 5P. Total cost = 500P + 37,500P = 38,000P. Ratio: 38,000 / 3,000 &#8776; 12.7x. Range of 10-15x, depending on input/output ratio assumptions and the specific provider's pricing. The output component alone is 75x higher (15 &#215; 5), but the total request cost, including unchanged input tokens, is approximately 12-13x. YoY per-token price declines of ~80% (TLDL.io pricing analysis, March 2026) offset partially: 0.2 &#215; 12.7 &#8776; 2.5x the prior year&#8217;s per-request cost at 2026 token prices.</p><p>[14] The WSE-3 contains 900,000 cores and 44 GB of on-chip SRAM, delivering internal memory bandwidth orders of magnitude higher than that of HBM-based architectures. 
<a href="https://siliconangle.com/">SiliconANGLE</a> (March 13, 2026) reports 27 PB/s; <a href="https://www.cerebras.ai/blog/cerebras-is-coming-to-aws">Cerebras blog</a> states &#8220;thousands of times greater memory bandwidth than the fastest GPU.&#8221; The exact figure is vendor-published; no independent verification. The 44 GB SRAM capacity means that models with more than approximately 70B parameters (at reduced precision) require weight streaming from external DRAM, reintroducing bandwidth constraints. Cerebras&#8217;s benchmarked speeds on 70B and 405B models reflect its weight-streaming architecture combined with on-chip SRAM caching.</p><p>[15] <a href="https://artificialanalysis.ai/">Artificial Analysis</a> independent benchmarks. Cerebras-published; verified by third-party benchmarking service.</p><p>[16] Author&#8217;s calculation. Llama 405B with Grouped Query Attention: 8 KV heads, 126 layers, 128 head dimension, BF16 precision. Per-token KV cache &#8776; 8 &#215; 126 &#215; 128 &#215; 2 &#215; 2 = ~0.49 MB. At 4,096 tokens: ~2.01 GB. AWS EFA provides 400 GB/s (3.2 Tbps) per Trn2 instance. At ~70% practical efficiency: ~2.01 GB / 280 GB/s &#8776; 7.2 ms. Consistent with KV cache calculations in JarvisLabs technical documentation (January 2026) for comparable architectures.</p><p>[17] <a href="https://aws.amazon.com/hpc/efa/">AWS EFA specifications</a> for Trn2 instances: 3.2 Tbps (400 GB/s).</p><p>[18] <a href="https://arxiv.org/abs/2401.09670">DistServe</a>, Zhong et al., OSDI 2024. Demonstrated 7.4x goodput improvement and 12.6x stricter SLO compliance vs. colocated serving. KV cache transfer overhead is characterized as &#8220;less than the time of a single decoding step.&#8221;</p><p>[19] &#8220;<a href="https://www.cnbc.com/2025/10/29/amazon-opens-11-billion-ai-data-center-project-rainier-in-indiana.html">AWS activates Project Rainier: One of the world&#8217;s largest AI compute clusters</a>,&#8221; About Amazon, October 2025. 
Nearly 500,000 Trainium2 chips across 30 data centers on a 1,200-acre Indiana site. Anthropic target of scaling to ~1 million chips per Introl (February 2026).</p><p>[20] AWS-OpenAI partnership expansion, February 2026. OpenAI is committed to consuming approximately 2 gigawatts of Trainium capacity through AWS infrastructure, per AWS press release and <a href="https://www.datacenterdynamics.com/">DCD reporting</a>.</p><p>[21] AWS CEO Andy Jassy, Q3 2025 earnings call (per <a href="https://finance.yahoo.com/quote/AMZN/">Yahoo Finance transcript</a>, October 30, 2025): stated that Trainium2 is &#8220;fully subscribed&#8221; and &#8220;now a multi-billion-dollar business growing 150% quarter-over-quarter.&#8221; SemiAnalysis, &#8220;Amazon&#8217;s AI Resurgence,&#8221; September 2025: &#8220;Trainium has faced some yield issues on the assembly phase &#8212; fairly standard for a new system.&#8221; SemiAnalysis, &#8220;<a href="https://newsletter.semianalysis.com/p/aws-trainium3-deep-dive-a-potential">AWS Trainium3 Deep Dive</a>,&#8221; December 2025, confirmed strong customer demand across Trn2 and Trn3.</p><p>[22] AWS product history: Inferentia1 launched in 2019, Inferentia2 launched in 2023, Trainium1 launched in 2022, Trainium2 launched in 2024 (repositioned for &#8220;training and inference&#8221;), Trainium3 launched in December 2025. No Inferentia3 has been announced as of March 2026. 
Per Introl (February 2026): &#8220;AWS appears focused on Trainium improvements that benefit both training and inference rather than maintaining separate chip lines.&#8221; See also <a href="https://asia.nikkei.com/business/technology/artificial-intelligence/amazon-halts-inferentia-ai-chip-development-as-it-takes-on-nvidia">Nikkei Asia</a> (December 2024) on Inferentia development halt.</p><p>[23] The Infrastructure Reversion Test was introduced in &#8220;<a href="https://www.airealist.ai/">Chip and Mortar</a>,&#8221; The AI Realist, analyzing Amazon&#8217;s pattern of reverting to infrastructure when intelligence-layer bets underperform.</p><p>[24] &#8220;<a href="https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html">Nvidia buying AI chip startup Groq&#8217;s assets for about $20 billion in its largest deal on record</a>,&#8221; CNBC, December 24, 2025. Structured as a non-exclusive IP licensing agreement plus acqui-hire of approximately 90% of Groq staff. Groq continues operating independently under new leadership. Jensen Huang internal email (obtained by CNBC): &#8220;We are not acquiring Groq as a company.&#8221; The ~$20B figure is from investor sources, not officially confirmed by Nvidia.</p><p>[25] Jensen Huang's internal email to Nvidia employees, December 2025, obtained by <a href="https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html">CNBC</a>. Described Groq&#8217;s &#8220;low-latency processors&#8221; and integrating them into &#8220;the NVIDIA AI factory architecture.&#8221; The exact phrase &#8220;extreme low latency&#8221; appeared in CES 2026 press Q&amp;A commentary about the deal rationale.</p><p>[26] <a href="https://nvidianews.nvidia.com/">Nvidia Rubin CPX announcement</a>, September 2025. 30 PFLOPS NVFP4, 128 GB GDDR7 (no HBM), no NVLink. Designed for prefill in disaggregated configurations. 
NVIDIA Developer Technical Blog, &#8220;NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M+ Token Context Workloads.&#8221;</p><p>[27] <a href="https://developer.nvidia.com/dynamo">Nvidia Dynamo</a>, announced GTC 2025. Open-source distributed inference framework with first-class prefill and decode workers connected by KV-aware routing. <a href="https://github.com/ai-dynamo/dynamo">GitHub</a>.</p><p>[28] Huawei Connect 2025 keynote, Eric Xu, September 18, 2025. Ascend 950PR (prefill/recommendation, HiBL 1.0, Q1 2026) and Ascend 950DT (decode/training, HiZQ 2.0, 144 GB, 4 TB/s, Q4 2026). Per <a href="https://chinabizinsider.com/huawei-unveils-three-year-ascend-chip-roadmap-with-new-ai-supernode-solutions/">ChinaBiz Insider</a> (September 2025) and South China Morning Post (September 20, 2025).</p><p>[29] Eric Xu keynote, <a href="https://chinabizinsider.com/huawei-unveils-three-year-ascend-chip-roadmap-with-new-ai-supernode-solutions/">Huawei Connect 2025</a>: &#8220;Both the prefill stage of inference and recommendation algorithms are compute-intensive, with higher demand for parallel computing and lower demand for memory access bandwidth. A layered memory solution also means that prefill and recommendation algorithms don&#8217;t necessarily need huge amounts of local memory.&#8221;</p><p>[30] &#8220;<a href="https://arxiv.org/abs/2506.12708">Serving Large Language Models on Huawei CloudMatrix384</a>,&#8221; Huawei research paper (arxiv, June 2025). Three-subsystem architecture: prefill, decode, and caching as peer-to-peer resource pools. 
Zhang Dixuan, President of Ascend Computing, Huawei Connect 2025: disaggregation of Attention and FFN stages improved decode throughput by over 50%.</p><p>[31] Huawei&#8217;s Ascend chips are believed to be manufactured primarily by SMIC, based on teardown analysis (<a href="https://www.techinsights.com/">TechInsights</a>, <a href="https://newsletter.semianalysis.com/">SemiAnalysis</a>) and supply chain reporting; Huawei has not officially confirmed its foundry partner. Earlier Ascend 910B/910C variants were found to contain TSMC dies obtained through intermediaries, for which TSMC was fined. U.S. export controls bar Huawei from TSMC&#8217;s advanced nodes. SMIC&#8217;s most advanced production process is an enhanced 7nm node; competitors (Nvidia, AMD, AWS) build on TSMC 3nm.</p><p>[32] AMD ROCm blog, &#8220;<a href="https://rocm.blogs.amd.com/">Unleashing AMD Instinct MI300X GPUs for LLM Serving: Disaggregating Prefill &amp; Decode with SGLang</a>,&#8221; August 28, 2025. AMD, &#8220;<a href="https://www.amd.com/en/developer/resources/technical-articles/2026/inference-performance-on-amd-gpus.html">Speed is the Moat: Inference Performance on AMD GPUs</a>,&#8221; February 2026, describing MoRI (<a href="https://github.com/ROCm/mori">GitHub</a>) for KV transfer and adaptive kernel selection.</p><p>[33] <a href="https://www.amd.com/en/newsroom/press-releases/2026-2-24-amd-and-meta-announce-expanded-strategic-partnersh.html">Meta-AMD 6GW partnership</a>, February 2026: Meta is entitled to up to 160 million AMD shares depending on GPU volume purchased. Per Techzine Global (February 2026). SemiAnalysis previously noted OpenAI receives an &#8220;equity rebate&#8221; to own up to 10% of AMD shares. AMD's acquisition of ZT Systems supports the delivery of rack-scale solutions, per Lisa Su (March 2026 earnings call). 
Note: AMD&#8217;s reported GAAP gross margin (~48%) does not reflect the economic cost of performance-based warrants issued to Meta and equity rebates to OpenAI, which dilute shareholder value below the gross margin line. The effective margin per GPU on warranted deals is substantially lower.</p><p>[34] Google Cloud Next 2025. TPU v7 Ironwood: 4,614 TFLOPS per chip, 192 GB HBM3e, 7.4 TB/s bandwidth per <a href="https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/ironwood-tpu-age-of-inference/">Google&#8217;s official blog</a> (&#8220;Ironwood: The first Google TPU for the age of inference&#8221;). 9,216-chip pods. Committed pricing reported at $0.39/chip-hour; pricing may vary by commitment tier and region.</p><p>[35] SemiAnalysis, &#8220;<a href="https://newsletter.semianalysis.com/">How Dell Is Beating Supermicro</a>,&#8221; May 2024: &#8220;Hyperscalers tend to buy from ODMs who make ~2% to ~3% margins from building servers.&#8221;</p><p>[36] <a href="https://www.sec.gov/Archives/edgar/data/0001375365/000137536526000007/smci-20251231.htm">Super Micro Computer 10-Q, Q2 FY2026</a> (Oct&#8211;Dec 2025): GAAP gross margin 6.3%. Comparison: <a href="https://www.sec.gov/Archives/edgar/data/0001375365/000137536525000006/smci-20241231.htm">Q2 FY2024</a> (Oct&#8211;Dec 2023) GAAP gross margin was 15.4%. The 15.6% figure sometimes cited reflects non-GAAP for a different quarter (Q3 FY2024).</p><p>[37] <a href="https://ir.aboutamazon.com/sec-filings">Amazon 10-K FY2025</a>. AWS segment operating income divided by AWS revenue. Approximate; varies quarterly.</p><p>[38] <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2026">Nvidia Q3 FY2026 earnings release</a> (November 2025): GAAP gross margin 73.4%. 
<a href="https://nvidianews.nvidia.com/_gallery/download_pdf/6837703d3d63320fddb3a9ee/">Q1 FY2026</a> showed 60.5% due to a one-time $4.5B H20 inventory charge related to China export controls; excluding this charge, the underlying margin was 71.3%. The ~73% figure reflects normalized operations.</p><p>[39] <a href="https://www.cnbc.com/2025/12/11/broadcom-avgo-q4-earnings-2025.html">Broadcom Q1 FY2026 earnings</a>. Non-GAAP gross margins above 75%; AI revenue $8.4 billion, up 106% YoY. $73 billion AI backlog per Broadcom FQ4 2025 earnings call. XPU design customers include Google (TPU), Meta (MTIA), ByteDance, and <a href="https://www.cnbc.com/2025/12/11/broadcom-reveals-its-mystery-10-billion-customer-is-anthropic.html">Anthropic</a> (~$11 billion order, revealed as the fourth customer at Q4 FY2025 earnings per CNBC, December 2025). OpenAI has a separate 10GW chip co-development partnership (announced October 2025) that is structurally distinct from Broadcom&#8217;s XPU design relationships.</p><p>[40] See note 39.</p><p>[41] Structural assessment, not Broadcom statement. Broadcom CEO Hock Tan, per <a href="https://www.nextplatform.com/">Next Platform</a> (June 2025), has stated that the value of custom accelerators lies in the learning curve of co-design with each hyperscaler, creating multi-generational stickiness. A merchant product would commoditize this relationship.</p><p>[42] <a href="https://www.nvidia.com/en-us/data-center/nvlink-fusion/">Nvidia NVLink Fusion</a>, announced at Computex 2025. Allows third-party accelerators (hyperscaler custom chips, including Trainium and TPUs) to connect to NVLink fabric. 
Per NVIDIA Developer Technical Blog: &#8220;Scaling AI Inference Performance and Flexibility with NVIDIA NVLink and NVLink Fusion.&#8221;</p><p>[43] <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;company=cerebras&amp;CIK=&amp;type=S-1&amp;dateb=&amp;owner=include&amp;count=40&amp;search_text=&amp;action=getcompany">Cerebras Systems S-1 filing</a>, SEC, September 30, 2024. Revenue: $24.6M (FY2022), $78.7M (FY2023).</p><p>[44] Cerebras S-1: H1 2024 revenue approximately $136.4M. G42 represented 87% of H1 2024 revenue. Per <a href="https://sherwood.news/">Sherwood News</a>: &#8220;Is it bad to rely on one customer for 87% of your revenue?&#8221;</p><p>[45] <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;company=cerebras&amp;CIK=&amp;type=S-1&amp;dateb=&amp;owner=include&amp;count=40&amp;search_text=&amp;action=getcompany">Cerebras S-1</a>. H1 2024 net loss $66.6M. Gross margin progression: 11.7% (2022) to approximately 41% (H1 2024).</p><p>[46] &#8220;<a href="https://www.cnbc.com/2026/01/14/cerebras-scores-openai-deal-worth-over-10-billion-ahead-of-ipo.html">Cerebras scores OpenAI deal worth over $10 billion ahead of AI chipmaker&#8217;s IPO</a>,&#8221; CNBC, January 14, 2026. 750 megawatts of compute through 2028, focused on inference and reasoning models. Per DCD (February 2026).</p><p>[47] <a href="https://www.cerebras.ai/">Cerebras Systems press release</a>, February 3, 2026: $1 billion Series H at approximately $23 billion valuation. Led by Tiger Global. Prior Series G at $8.1 billion (September 2025). Per Fintool: &#8220;Nearly tripling in five months.&#8221;</p><p>[48] Bloomberg (March 6, 2026) reported that Cerebras had tapped Morgan Stanley for its return to the IPO process. AWS partnership announced March 13, 2026, during an active roadshow. IPO expected Q2 2026 per ION Analytics.</p><p>[49] Author&#8217;s calculation. $23B valuation / ~$270M annualized H1 2024 revenue &#8776; 85x. 
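</p><p>The multiple reduces to two divisions (a quick sketch; the ~$270M run rate is simply twice the H1 2024 revenue cited in note [44]):</p>

```python
# Valuation multiples for note [49]; pure arithmetic on the cited figures.
valuation = 23e9           # ~$23B Series H valuation (note [47])
annualized_rev = 270e6     # ~$270M annualized H1 2024 revenue

print(round(valuation / annualized_rev))   # 85 -> ~85x trailing revenue

forward_2026 = 1e9         # ION Analytics' optimistic ~$1B 2026 estimate
print(round(valuation / forward_2026))     # 23 -> 23x forward revenue
```
<p>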
Even against the most optimistic industry projection &#8212; ION Analytics estimates 2026 revenue may approach $1 billion &#8212; the multiple is 23x forward revenue, rich for a hardware company with gross margins in the mid-thirties to low forties. ION Analytics&#8217; projection is B-tier (an industry analyst estimate for a private company; not independently verifiable). Nvidia gross margins exceed 73%; AMD in the high 40s. Hardware margins in the 30s reflect the cost structure of wafer-scale manufacturing.</p><p>[50] <a href="https://www.businesswire.com/news/home/20260313406341/en/AWS-and-Cerebras-Collaboration-Aims-to-Set-a-New-Standard-for-AI-Inference-Speed-and-Performance-in-the-Cloud">AWS press release</a>, March 13, 2026: &#8220;collaboration that will, in the coming months, deliver the fastest AI inference solutions.&#8221; <a href="https://www.cerebras.ai/blog/cerebras-is-coming-to-aws">Cerebras blog</a>: &#8220;Disaggregated is ideal when you have large, stable workloads. Most customers run a mix of workloads with different prefill/decode ratios, where the traditional aggregated approach is still ideal.&#8221;</p><p>[51] Julien Simon worked at AWS from 2015 to 2021 on AI/ML services, including SageMaker, Inferentia, and Trainium. The platform-absorption pattern &#8212; where each new chip is subordinated to the Nitro/EFA/Bedrock platform layer &#8212; was visible from inside the organization across multiple product generations. Note: AWS&#8217;s Annapurna Labs functions as an in-house ASIC design arm &#8212; effectively occupying the &#8220;Design IP&#8221; tier for its own prefill silicon. 
The Cerebras partnership means AWS chose to outsource decode rather than design a decode-specialized Trainium variant, reinforcing the Platform Absorption Test.</p><p>[52] Industry supply chain analysis via Bincial (2025), citing TrendForce and supply chain surveys: combined AI TPU/ASIC shipments from Google and AWS reached 40-60% of Nvidia AI GPU shipments by 2025, with total ASIC shipments projected to surpass Nvidia GPU shipments at some point in 2026. B-tier source; industry estimate based on supply chain channel checks.</p>]]></content:encoded></item><item><title><![CDATA[Open Source, Closed Orbit]]></title><description><![CDATA[The Hardware Monopolist&#8217;s Guide to Owning Open Source]]></description><link>https://www.airealist.ai/p/open-source-closed-orbit</link><guid isPermaLink="false">https://www.airealist.ai/p/open-source-closed-orbit</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Fri, 13 Mar 2026 17:47:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!skV1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!skV1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!skV1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!skV1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!skV1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!skV1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!skV1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg" width="1280" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/189991912?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!skV1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!skV1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!skV1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!skV1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85c51e8-8e77-4393-baa8-33e91d8111f3_1280x720.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On August 8, 2023, Jensen Huang stood on stage at SIGGRAPH and asked a question.</p><p>&#8220;Where are the world&#8217;s models?&#8221;</p><p>He answered it himself: &#8220;Well, the world&#8217;s models are largely on Hugging Face today. It is the largest AI community in the world. Lots and lots of people use it. 50,000 companies. There&#8217;s some 275,000 models, 50,000 data sets. Just about everybody who creates an AI model and wants to share with the community puts it up in Hugging Face.&#8221; [1]</p><p>Then he announced a partnership: DGX Cloud training integrated into Hugging Face. &#8220;A brand new service,&#8221; he said, &#8220;to connect the world&#8217;s largest AI community with the world&#8217;s best AI training infrastructure.&#8221; Julien Chaumond, Hugging Face&#8217;s co-founder and CTO, posted the verbatim quote on LinkedIn with a heart emoji. &#8220;Thanks for the shout-out, Jensen Huang.&#8221; [2]</p><p>The DGX Cloud training service Jensen announced in that SIGGRAPH quote? Hugging Face deprecated it on April 10, 2025. [3]</p><p>On March 16, at GTC 2026, Jensen will convene a panel called &#8220;Open Models: Where We Are and Where We&#8217;re Headed.&#8221; On stage: leaders from A16Z, AI2, AMP Coalition, Black Forest Labs, Cursor, Reflection AI, and Thinking Machines Lab. Hugging Face &#8212; the largest AI community in the world, by Jensen&#8217;s own description &#8212; is not on the panel. [4]</p><p>The exclusion is surgical. Hugging Face has four sessions at GTC, including a co-presentation with Nvidia titled &#8220;The State of Open-Source AI&#8221; and a session on open-source inference. 
[4b] Just two months ago, at CES 2026, Jensen demoed a Reachy Mini robot running Hugging Face models on DGX Spark during his keynote. [4c] Nvidia and Hugging Face announced a joint LeRobot integration on the same day. The partnership is active. But when Jensen personally selects who shapes the keynote-level conversation about the future of open models, the company he once called the home of &#8220;the world&#8217;s models&#8221; is not invited.</p><p>In January 2026, I sat through a one-hour Nvidia presentation at an internal private equity meeting in the Bay Area. The audience was investors and portfolio company leaders, the people who decide which deployment stack their companies adopt. The word &#8220;open&#8221; was used constantly throughout. The words &#8220;Hugging Face&#8221; were not uttered once. I was media-trained at AWS, and I&#8217;m the former Chief Evangelist at Hugging Face. I noticed.</p><p>To understand why, you need to see three forces converging at once, and why Hugging Face sits at the center of all of them.</p><h2>The Vacuum</h2><p>The open-source AI ecosystem had one indispensable American patron: Meta. Llama was the model family that proved open weights could compete with the best closed models on standard benchmarks. It was the default starting point for enterprise fine-tuning, the foundation of sovereign AI programs from France to India to the UAE, the model that made Hugging Face the center of gravity for the AI developer community.</p><p>In late 2025, Meta began walking away.</p><p>The company is developing two new models, codenamed &#8220;Avocado&#8221; and &#8220;Mango,&#8221; targeting the first half of 2026. [5] CNBC reported that Avocado could be a proprietary model &#8212; a shift from the open-weight Llama strategy. Neither model is expected to offer weight downloads. Both would be available only through API and hosted services, the same distribution model Meta spent years arguing against when OpenAI and Google did it. 
The company raised its 2025 capital expenditure guidance to $70-72 billion to build the infrastructure required for its new direction. [6]</p><p>The reversal is sharp. After Llama 4&#8217;s lukewarm reception (internal leadership admitted that &#8220;performance and popularity were behind competitors&#8221; [7]), Meta leadership directed employees to stop publicly discussing open-source and Llama products. Yann LeCun, one of the godfathers of deep learning and Meta&#8217;s most visible champion of open-source AI, left the company. Bloomberg reported that some employees had been encouraged to keep LeCun out of the spotlight. [8] Meta no longer saw him as emblematic of the company&#8217;s AI strategy. A former researcher described the dysfunction with unusual candor: &#8220;It&#8217;s not just dysfunction, it&#8217;s a metastatic cancer that is affecting the entire organisation.&#8221; [9]</p><p>The stated concern: DeepSeek&#8217;s use of Llama architecture to build competitive models at a fraction of the cost. [10] Open-sourcing, Meta&#8217;s leadership concluded, was giving Chinese competitors free access to architectures they could clone and optimize. Mark Zuckerberg confirmed that Meta would not release open-source models capable of superintelligence. [11]</p><p>The organizational restructuring signals the depth of the shift. The new closed models are being built by TBD Lab within Meta Superintelligence Labs, led by newly recruited Chief AI Officer Alexandr Wang, who was brought in through Meta&#8217;s reported $14.3 billion equity investment in his company, Scale AI. [12] The team is reportedly distilling from rival models, including Google&#8217;s Gemma, OpenAI&#8217;s gpt-oss, and Alibaba&#8217;s Qwen [13], a striking shift for a company that once positioned itself as the standard-bearer for American open-source AI.</p><p>Why does Meta&#8217;s retreat matter for this story? Because of what made Llama unique among open model families. 
Meta&#8217;s business model is advertising. It does not sell silicon. It does not sell inference. It had no commercial interest in which hardware you ran Llama on. That hardware-agnostic neutrality made Llama the perfect anchor for a platform like Hugging Face. Developers could download Llama, fine-tune it, and deploy it on Nvidia, AMD, or whatever hardware they had. Meta didn&#8217;t care.</p><p>With Meta going closed, that anchor disappears.</p><h2>The Succession</h2><p>The vacuum is not empty. Chinese labs filled it before most Western observers noticed.</p><p>On Hugging Face, Alibaba&#8217;s Qwen family has overtaken Meta&#8217;s Llama in cumulative downloads. A recent MIT study found that Chinese open-source models have surpassed American models in total downloads globally. [14] The ATOM project, which tracks open model adoption, reported that by August 2025, model derivatives based on Qwen accounted for more than 40% of all new language model derivatives on Hugging Face. Llama had fallen to roughly 15%. [15]</p><p>The numbers are hard to argue with: over 45% of top open-model downloads in 2025 came from Chinese models. [16] Qwen 2.5 variants alone were downloaded more than 750 million times during 2025, according to AI World&#8217;s Open Model Evolution dashboard (which counts downloads per model variant). [17] By January 2026, the South China Morning Post reported that cumulative downloads across the full Qwen family had surpassed 700 million (Alibaba&#8217;s own aggregate figure, using a different counting method). [18] The figures come from different sources and use different methodologies; the directional conclusion remains the same.</p><p>Chinese open models are hardware-agnostic by necessity, not by ideology. Built under US export controls that prevent dependence on Nvidia silicon, they ship with support for Huawei Ascend, Cambricon, and domestic Chinese chips alongside Nvidia GPUs. 
DeepSeek-V3.2 shipped with day-zero support for both &#8212; not as cloud demos, but as reproducible inference pipelines released alongside the weights. [19] Ant Group&#8217;s Ling models trained on domestic AI chips achieved near-H800 performance at 20% lower cost. [20]</p><p>This aligns Chinese models with Hugging Face&#8217;s vision of hardware-neutral openness. They radiate outward &#8212; models that work on any hardware and are released to anyone. They are the most downloaded models on the platform.</p><p>Nvidia sees this clearly. Jensen Huang told Bloomberg: &#8220;China is well ahead &#8212; way ahead on open-source.&#8221; [21] He is using the Chinese open-source surge to justify Nvidia&#8217;s own open model push. The pitch to Western enterprises is implicit: you need an open model you can trust, trained on clean data, backed by a company not subject to Chinese government influence, and with enterprise-grade deployment support. The deployment path for that trustworthy model runs through Nvidia.</p><p>The economics make this sustainable in a way Meta&#8217;s open-source strategy never was. As Next Platform put it, Nvidia is the only AI model maker that can afford to give models away forever. [22] Meta&#8217;s open-source was subsidized by an advertising business that eventually decided the subsidy was not worth the strategic exposure. Nvidia&#8217;s subsidy is structural and permanent: every open model that drives GPU adoption pays for itself through hardware margins. Against $216 billion in revenue for fiscal year 2026 (ended January 2026), [23] the estimated $1-1.5 billion Nvidia generates annually from software licensing is a rounding error. [24] The models are not the product. The GPUs are.</p><p>No other company can replicate this position. OpenAI and Anthropic charge for models because models are their product. Meta gave models away until its board decided that was a liability. 
Nvidia charges nothing for models because it charges for the hardware they run on. A loss leader with infinite runway.</p><p>The succession plan is already explicit. Nvidia built the Llama Nemotron model family directly on top of Meta&#8217;s Llama architecture, using Nvidia&#8217;s open datasets and post-training techniques. [25] The models inherit the parent Llama license. If Meta stops releasing new Llama weights, Nvidia already has the fork and the resources to maintain it independently.</p><p>The Nemotron 3 family is not a side project. It spans three sizes: Nano (31.6 billion total parameters with 3.6 billion active per token), Super (120 billion parameters, launched March 12, 2026), and Ultra (~500 billion parameters, expected H1 2026). [26] The architecture is a hybrid Mamba-Transformer mixture-of-experts design, a novel approach. Nvidia released the full pre-training corpus, post-training data, reinforcement learning (RL) environments, and training code alongside the model. [27] Major enterprise adopters signed on immediately. [28]</p><p>Nvidia has promoted these contributions aggressively. Kari Briski, Nvidia&#8217;s VP of GenAI Software at the time of the announcement, claimed that Nvidia was the top contributor on Hugging Face in 2025, citing 650 models and 250 datasets. 
[29] The claim refers to upload velocity, not total downloads: an independent analysis ranked Nvidia 34th by total all-time downloads on the platform [30], a metric that inherently favors older models but better reflects actual developer usage.</p><p>Jensen Huang framed the ambition plainly: &#8220;With Nemotron, we&#8217;re transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale.&#8221; [31] &#8220;Open platform.&#8221; &#8220;Transparency.&#8221; &#8220;Scale.&#8221; The language of Hugging Face, coming from the CEO of its most systematic competitor.</p><p>Nvidia is not the only American company trying to fill the vacuum. Arcee AI shipped Trinity Large in January 2026, a 400-billion-parameter sparse MoE trained from scratch for $20 million under the Apache 2.0 license. [32] [33] But Trinity was trained on 2,048 Nvidia B300 GPUs. The economics of the succession are circular: whether Nvidia builds the open model itself or a startup builds it, the training revenue flows to Nvidia. The question is only who controls the deployment path afterward.</p><p>The timing is suggestive, though not necessarily coordinated. Nvidia announced Nemotron 3 on December 15, 2025, [34] the same week multiple outlets reported Meta&#8217;s pivot to closed models. Super shipped on March 12, four days before GTC; Ultra is expected in the first half of 2026 &#8212; the same window Meta targets for Avocado and Mango. Nvidia is launching its most ambitious open model family at the precise moment Meta&#8217;s goes dark.</p><h2>Two Definitions of Open</h2><p>Two competing definitions of &#8220;open&#8221; now coexist in the AI model ecosystem. The difference between them is not quality. Both produce real value. The difference is direction: star versus black hole. A star radiates energy outward; everything in its orbit benefits from the light. 
A black hole is just as powerful, but nothing that crosses the event horizon comes back out. The event horizon here is not the model release. It is the deployment.</p><p><strong>Nvidia&#8217;s open is a black hole.</strong> The models are open-weight. The datasets are released. The training recipes are published. This is genuine and should be acknowledged. Developers can and do download Nemotron models and run them on non-Nvidia hardware through vLLM, llama.cpp, and other open frameworks.</p><p>But the deployment infrastructure routes back to Nvidia. NIM (NVIDIA Inference Microservices) is Nvidia&#8217;s containerized inference platform &#8212; pre-optimized model containers that deploy with a single command on Nvidia GPUs. Production use requires an NVIDIA AI Enterprise license at $4,500 per GPU per year (list price; volume terms may vary). [35] Nemotron 3&#8217;s NVFP4 quantization format is specifically optimized for Nvidia&#8217;s Blackwell architecture. [36] The Nemotron 3 Nano technical blog states the model is &#8220;specifically designed for DGX Spark, H100, and B200 GPUs.&#8221; [37] Can you run it elsewhere? Yes. Will it run at the same throughput? No. The optimization differential is the gravity.</p><p>build.nvidia.com, Nvidia&#8217;s growing model catalog, hosts roughly 200 curated models as of early March 2026, including third-party models from DeepSeek, Mistral, Qwen, Meta, Microsoft, Google, and OpenAI, all pre-optimized as NIM containers for Nvidia silicon. [38] In two years, the catalog has grown from roughly 40 NIM microservices to more than 200, with about 80 Nvidia-developed models. Nvidia VP of GenAI Software Kari Briski told Gizmodo that Nvidia wants to be &#8220;the go-to open model platform,&#8221; framing this partly as a response to Chinese open-source dominance. [39]</p><p>A detail that reveals the strategy's architecture: build.nvidia.com includes a &#8220;Launch from Hugging Face&#8221; button, currently in beta. 
[40] It allows users to take any model hosted on Hugging Face and deploy it as an NIM container on Nvidia hardware. Hugging Face becomes the free catalog. build.nvidia.com becomes the paid checkout. The models are open. The orbit is closed.</p><p>NIM&#8217;s own documentation reinforces the gravitational pull, noting that Nvidia &#8220;cannot guarantee the security of any models hosted on non-NVIDIA systems such as HuggingFace.&#8221; [41] For enterprise buyers who need to justify deployment decisions to compliance teams, that single sentence can tip the scale.</p><p>An enterprise ML platform team can build the functional equivalent from open-source components: vLLM or SGLang for inference (both run on Nvidia and AMD), GPTQ, AWQ, or GGUF for hardware-agnostic quantization, Docker and Kubernetes for containerization, Ray Serve or KServe for autoscaling, and models downloaded directly from Hugging Face. Same models, same weights, no license fee. The engineering effort is real &#8212; weeks instead of days &#8212; and NIM&#8217;s enterprise support, pre-optimized containers, and security hardening have genuine value. But the $4,500/GPU/year buys packaging, not technology. The alternative to NIM is not Hugging Face. It is your own team building and owning your inference stack. Nvidia has made NIM&#8217;s onramp frictionless: build.nvidia.com offers free hosted API endpoints for prototyping, and Developer Program members can download NIM containers at no cost for development on up to 16 GPUs. You can fine-tune a supported base model and deploy it through NIM. But the catalog is the constraint. NIM only serves models whose base architecture Nvidia has pre-optimized &#8212; if your model isn&#8217;t derived from a NIM-supported base, you cannot package it as a NIM container. A developer seeking guidance on creating a custom NIM-compliant container for an unsupported model found no documented path. 
[41b] vLLM and SGLang, by contrast, serve any model in Hugging Face format with no catalog dependency.</p><p>The deeper friction arises at production scale: moving from a free prototype to paid deployment requires an AI Enterprise license, which means engaging Nvidia&#8217;s sales team and negotiating an enterprise agreement. Every sales conversation is an opportunity to cross-sell Triton for inference optimization, NeMo for training, DGX Cloud for compute, and Slurm for scheduling. Everything and the kitchen sink. Each layer adopted adds a contractual dependency. Together, they compound into switching costs that building your own stack avoids. The lock-in is not in the technology. It is in the catalog and the contract.</p><p><strong>Hugging Face&#8217;s open is a star.</strong> Models radiate outward to whatever hardware you have. This is not a recent pivot. It is a systematic investment spanning years, codified through a Hardware Partner Program launched in 2021 [42] and a growing family of dedicated Optimum libraries.</p><p>Since 2021, Hugging Face has built dedicated hardware integration libraries for every major non-Nvidia accelerator. Intel Gaudi gets Optimum Habana, with 40+ validated architectures and benchmarks showing Gaudi 2 outperforming A100 on BERT pretraining by 3x. [43] AWS Trainium and Inferentia get Optimum Neuron, under active development with Trn2/Trn3 support and vLLM integration. [44] Google Cloud TPUs get Optimum TPU, with v5e and v6e support, though TPU availability on Inference Endpoints has since been suspended. [45] Intel Xeon CPUs get Optimum Intel, with OpenVINO-backed optimization. [46] That last category is gaining strategic weight: as enterprise inference shifts toward smaller, domain-specific models and agentic workloads, CPUs are becoming a cost-effective alternative to GPUs for a growing share of production AI. [46b] Every model that infers on a CPU runs entirely outside Nvidia&#8217;s GPU gravity.
An earlier partnership with Graphcore&#8217;s IPU ended after SoftBank acquired Graphcore in 2024 and the chipmaker exited the standalone accelerator market. [47]</p><p>The AMD partnership goes deeper than integration. Starting in 2023 with Optimum AMD and ROCm (AMD&#8217;s open-source GPU compute platform) support for MI210, MI250, and MI300 GPUs, it has evolved into active co-engineering. In late 2025 and early 2026, Hugging Face and AMD jointly developed custom MI300X kernels optimized for common transformer operations, all open-sourced in the <code>hf-rocm-kernels</code> repository. [48] In February 2026, HF launched the ROCm Kernel Builder to let the community contribute and share AMD-optimized kernels. [49] AMD&#8217;s Pervasive AI Developer Contest offers $160,000 in prizes and up to 700 hardware grants, with models required to be hosted on the HF Hub. [50] The word &#8220;partnership&#8221; undersells what is happening. Hugging Face and AMD are co-engineering production inference kernels, the kind of work that does not happen without a formal co-development agreement and dedicated headcount on both sides.</p><p>Clem Delangue delivered the keynote at AMD&#8217;s AI Day. His framing was precise: &#8220;Open-source means the freedom to build from a wide range of software and hardware solutions.&#8221; [51]</p><p>The structural conflict is now visible. Every model that runs equally well on AMD via Hugging Face&#8217;s stack does not need NIM. Nvidia GPUs still run the vast majority of AI inference workloads globally, and none of HF&#8217;s alternative hardware partnerships have changed that dominance at scale. But the AMD co-engineering is not vaporware. Custom MI300X kernels, production inference on ROCm via vLLM and SGLang, shared development teams writing fused attention operators &#8212; this is the kind of work that creates a credible alternative, even if the market share gap remains wide. Hugging Face&#8217;s hardware neutrality is not yet a market reality.
It is a structural threat. And Nvidia&#8217;s entire open-model strategy is designed to prevent it from becoming one.</p><h2>The Decoupling Is Already Underway</h2><p>The star and the black hole are not theoretical. Over twelve months, a sequence of moves by Hugging Face systematically reduced its Nvidia dependency.</p><p>In October 2024, Hugging Face launched HUGS (Hugging Face Generative AI Services), a direct NIM competitor designed from the ground up to be hardware-agnostic. [52] Built on open-source TGI and Transformers, HUGS ran on Nvidia or AMD GPUs, with pricing that undercut NIM&#8217;s: $1 per container per hour versus NIM&#8217;s $1 per GPU per hour plus AI Enterprise licensing. [53] By September 2025, HUGS was deprecated. The product never gained the enterprise traction needed to sustain it &#8212; a data point that illustrates how difficult it is to compete with a dominant vendor&#8217;s packaging, even when the underlying technology is open source.</p><p>In January 2025, Hugging Face launched Inference Providers, a new multi-vendor system for running models on third-party infrastructure. The initial partners: SambaNova, Fal, Replicate, and Together AI. TechCrunch reported that Hugging Face said its focus had shifted to &#8220;collaboration, storage, and model distribution capabilities.&#8221; [54] Nvidia was not among the inference providers.</p><p>On April 10, 2025, Hugging Face deprecated both Nvidia-powered services, the NIM API (serverless inference) and Train on DGX Cloud (training), on the same day. A one-line forum post from a Hugging Face engineer: &#8220;We decided to deprecate our Nvidia DGX Cloud Training service and Nvidia NIM API (serverless) experience.&#8221; [3] No explanation given. The replacement: the multi-vendor Inference Providers system, where Nvidia is absent.</p><p>The architectural choice was clear, regardless of the commercial reasons behind it.
Hugging Face moved from a single-vendor Nvidia dependency to a structurally neutral, multi-vendor approach.</p><p>In late 2025, Nvidia offered Hugging Face $500 million at a $7 billion valuation &#8212; more than the company had raised in its entire existence. Hugging Face said no. The company told the Financial Times it &#8220;does not want a single dominant investor that could sway decisions.&#8221; [55]</p><p>The financials give context to the decision. Hugging Face has grown revenue roughly sevenfold since 2021, when Forbes reported it at $10 million, reaching an estimated $70 million ARR by the end of 2023. [56] The company was profitable in 2025, though it slipped back into a loss in early 2026 after investing in datasets for its robotics and open-source initiatives. [57] Headcount has grown significantly since then. Only 3% of users pay for premium services, which means the conversion headroom is enormous, but the current revenue density is thin. Roughly $200 million remains on the balance sheet from $400 million in total fundraising. [57]</p><p>With that financial profile, this was not a desperate company taking a principled stand. It was a strategic decision to preserve the one asset that makes Hugging Face irreplaceable: its neutrality between hardware vendors. CEO Clem Delangue told the Financial Times the company is &#8220;not trying to maximise revenue growth&#8221; but is &#8220;more interested in nudging developers towards open alternatives.&#8221; [55]</p><div><hr></div><h2>The Pattern</h2><p>Hugging Face is not the first neutral ecosystem Nvidia has tried to acquire, failed to acquire, and then replicated. It is the latest in a documented pattern &#8212; one that spans more than a decade and has accelerated into the present.</p><p><strong>Linux kernel drivers (2012-2022).</strong> Linus Torvalds, June 2012: &#8220;Nvidia has been the single worst company we&#8217;ve ever dealt with.&#8221; He raised his middle finger to the camera.
[58] For years, Nvidia provided no documentation to the developers of the Nouveau open-source driver. When Nvidia finally released &#8220;open-source&#8221; GPU kernel modules in 2022, Red Hat engineers discovered they were thin wrappers around 30-40MB of opaque proprietary firmware running on the GPU System Processor. [59] Open-source in name. Proprietary in function. The community&#8217;s lead Nouveau maintainer subsequently left Red Hat and joined Nvidia. [60]</p><p><strong>GeForce Partner Program (2018).</strong> Nvidia required add-in-board partners to reserve their premium gaming brands (ASUS ROG, Gigabyte AORUS) exclusively for GeForce, relegating AMD cards to lesser-known sub-brands. Partners reported that Nvidia would hold back GPU allocation if they refused. [61] The program was cancelled following scrutiny by the FTC and the European Commission. Years later, major partners still will not ship high-tier AMD Radeon cards. [62] The program is gone. The effect persists.</p><p><strong>Mellanox (2020).</strong> Nvidia acquired the dominant InfiniBand networking provider for $6.9 billion. Pre-acquisition, Mellanox held 55-60% global market share [63] and sold to all comers. Post-acquisition, Nvidia&#8217;s networking revenue grew from $1.3 billion to over $10 billion annually. [64] The company now markets complete systems in which the switch, network adapter, DPU, CPU, GPU, and transceiver are sourced from a single supplier. Official documentation states: &#8220;NVIDIA does not support InfiniBand cables or modules not qualified or approved by NVIDIA.&#8221; [65]</p><p>The acquisition also gave Nvidia a structural narrative weapon. Jensen Huang declared at Computex 2023 that &#8220;lossy networks are unacceptable for supercomputing data centers.&#8221; [66] The statement was technically defensible and commercially convenient in equal measure. InfiniBand is lossless. 
Ethernet, the standard used by cloud providers who build their own GPU servers rather than buying DGX systems, is lossy by design. Nvidia&#8217;s own benchmarks show that poorly configured Ethernet clusters can deliver 30-50% lower training performance than InfiniBand equivalents. [67] The gap narrows with proper tuning: Juniper&#8217;s research and recent MLPerf benchmarks show that well-configured RoCEv2 Ethernet achieves 90-95% of InfiniBand performance. [68] But the narrative that Ethernet is fundamentally inferior for training has commercial value. </p><p>Every hyperscaler that adopts InfiniBand for its AI infrastructure buys it from one company: Nvidia. When Nvidia launched DGX Cloud as a managed service running on cloud provider infrastructure, Google, Microsoft, and Oracle agreed early. AWS held out for months before agreeing in late 2023, the last major cloud provider to sign on. [69] AWS deployed DGX Cloud on its own EFA (Elastic Fabric Adapter) Ethernet networking and Nitro system. Oracle got InfiniBand. Nvidia&#8217;s DGX Cloud head described the Oracle choice as &#8220;a result of our co-engineering for what would be the optimal experience,&#8221; [69] implicitly positioning InfiniBand as the superior option. The Ultra Ethernet Consortium, founded in July 2023 with AMD, Broadcom, Cisco, Intel, Meta, and Microsoft as members, exists for one reason: to build an Ethernet alternative to InfiniBand that breaks Nvidia&#8217;s networking lock-in. [70]</p><p>The DOJ began investigating in mid-2024 and issued formal subpoenas in September; no formal complaint has been filed. [71] The probe examines whether Nvidia bundles its chips, networking equipment, and software in ways that penalize customers who use competitors&#8217; products. The core allegation: customers who buy Nvidia&#8217;s full networking stack get preferential GPU pricing and allocation, while those who use competitors&#8217; equipment face restricted chip access. 
Jim Keller, CEO of competitor Tenstorrent, said customers &#8220;feel pressured to buy Nvidia&#8217;s networking gear to guarantee themselves access to the company&#8217;s vaunted AI server chips.&#8221; [72] Patrick Moorhead of Moor Insights was blunter: &#8220;It&#8217;s not volume-based pricing, it&#8217;s exclusionary-based pricing. You can&#8217;t do that if you&#8217;re a monopoly.&#8221; [72] China&#8217;s SAMR (State Administration for Market Regulation) issued a preliminary finding in September 2025 that Nvidia violated the anti-monopoly law in connection with the conditions attached to the acquisition. [73]</p><p><strong>ARM (2020-2022).</strong> Nvidia attempted a $40 billion acquisition of ARM Holdings, steward of the neutral CPU architecture that underpins nearly every competitor&#8217;s chips. ARM co-founder Hermann Hauser warned: &#8220;It&#8217;s in Nvidia&#8217;s interests to destroy ARM.&#8221; [74] The FTC voted 4-0 to sue to block the deal. [75] The UK, EU, and China piled on. The deal collapsed in February 2022. Nvidia had hedged in advance: it announced the Grace CPU in April 2021, seven months into the acquisition process, while the deal was still pending. [76] By March 2025, Nvidia revealed the Vera CPU with fully custom &#8220;Olympus&#8221; cores, eliminating dependence on ARM&#8217;s core designs. [77] In Q4 2025, Nvidia sold its entire remaining equity stake in ARM. [78] If you cannot buy the neutral platform, build your own and walk away.</p><p><strong>Open-source inference (2023-present).</strong> Nvidia sponsors vLLM, the dominant open-source LLM inference engine, co-hosts meetups, and publishes optimized containers. [79] Then it built NIM on top: a proprietary packaging layer that wraps open-source engines in enterprise licensing for $4,500 per GPU per year. NIM&#8217;s container images include a proprietary module called <code>vllm_nvext</code>, a closed-source extension built on the open-source project. [80] The open-source engine becomes the free input.
The proprietary wrapper becomes the revenue stream.</p><p><strong>Run:ai (2024).</strong> Nvidia acquired the GPU orchestration platform for $700 million, promising to open-source it. [81] Post-acquisition, Nvidia open-sourced only the KAI Scheduler component under the Apache 2.0 license. [82] The full commercial platform remains proprietary and Nvidia-exclusive.</p><p><strong>SchedMD/Slurm (2025).</strong> Nvidia acquired the developer of Slurm, the workload manager used by roughly 60-65% of TOP500 supercomputers. [83] Next Platform&#8217;s headline: &#8220;Nvidia Nearly Completes Its Control Freakery With Slurm Acquisition.&#8221; [84] Slurm is open source under the GPL v2.0, which prevents relicensing, but whoever controls the maintainer controls the roadmap. The precedent: Nvidia acquired Bright Computing in 2022. By October 2024, it stopped selling Bright as a standalone product and bundled it exclusively into the AI Enterprise stack at $4,500 per GPU per year. [85]</p><p>The pattern across these cases is consistent, and the chronology tells its own story. It begins with open-source hostility (Torvalds, 2012), escalates to channel control (GeForce Partner Program, 2018), moves to strategic acquisition of neutral infrastructure (Mellanox 2020, ARM attempted 2020-2022), then accelerates into the AI era with open-source co-option (vLLM/NIM, 2023), acquisition-and-bundle (Run:ai 2024, Slurm 2025), and now model-layer replication (Nemotron, build.nvidia.com, 2025-2026). The playbook is not static. It is evolving, and each iteration is more sophisticated than the last. What separates this from normal vertical integration is market share. When a company with 5% of the accelerator market builds an alternative to a partner, that is competition. 
When a company with 80-95% of the market [86] builds an alternative to the only neutral platform in its ecosystem, the structural implications are different.</p><p>Semiconductor analyst Doug O&#8217;Laughlin of Fabricated Knowledge has applied the explicit &#8220;embrace, extend, extinguish&#8221; framework to Nvidia&#8217;s NVLink Fusion interconnect strategy. [87] The model layer is the latest domain where this pattern is playing out. And the HF case maps onto it with uncomfortable precision: Nvidia invested in HF&#8217;s Series D (embrace), became a major model contributor (extend), then built build.nvidia.com, NIM, and Nemotron as a parallel stack that captures enterprise value while HF hosts the community (commoditize). The $500 million offer, at a $7 billion valuation, was a bid for influence, if not control. The rejection may have accelerated what came next, but the replication was already underway &#8212; the same sequence ARM followed. Nvidia tried to buy the neutral platform for $40 billion, was blocked, built Grace and Vera as replacements, and sold its ARM stake. Nvidia offered Hugging Face $500 million, was refused, and is now building the parallel stack described in this piece.</p><p>Nvidia&#8217;s approach differs from prior tech monopoly playbooks in its multi-layer execution. The lock-in is not at a single point. Each layer reinforces the others.
A customer who uses Nvidia GPUs, Nvidia networking, Nvidia inference, and Nvidia scheduling faces switching costs across all layers simultaneously.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zu_e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zu_e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 424w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 848w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 1272w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zu_e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png" width="1212" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1212,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130091,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/189991912?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zu_e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 424w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 848w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 1272w, https://substackcdn.com/image/fetch/$s_!zu_e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e32221-3550-46ef-8176-5e62c6668ebe_1212x728.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Every layer has a performant open alternative. Every layer is a reasonable product choice in isolation. Together, they compound into a stack you cannot leave without replacing seven components simultaneously. That&#8217;s not switching costs. That&#8217;s a migration project.</p><p>IBM controlled this many layers in the mainframe era, and the DOJ&#8217;s 1969 antitrust suit prompted it to unbundle software and services from hardware &#8212; a move widely credited with creating the independent software industry. [87b] The difference: IBM&#8217;s stack was a single vertically integrated product. Nvidia&#8217;s layers are nominally independent purchasing decisions that compound into lock-in only after adoption.</p><p>The concentration of control also makes the ecosystem difficult to criticize openly.
In November 2025, Nvidia&#8217;s investor relations team circulated a private, seven-page memo to Wall Street sell-side analysts, pushing back point by point on criticisms from investor Michael Burry (who compared Nvidia to Cisco) and a Substack writer who compared its accounting to Enron-era fraud. [88] The Enron comparison was overwrought, and Nvidia had reason to respond. But the form of the response was revealing. The memo was not filed as an 8-K. Bernstein published it in full. Barron&#8217;s senior tech writer Tae Kim observed that Nvidia should have disclosed it through a proper SEC filing rather than emailing it privately to analysts. [89] Reuters described the episode as Nvidia &#8220;waging an information campaign on Wall Street and social media.&#8221; [90] Harbor Research analyst Jay Goldberg called the communication strategy &#8220;seriously flawed&#8221; and noted that &#8220;the decision to respond to these rumors in a memo with limited disclosure itself appears unusual.&#8221; [91] The memo targeted fraud allegations, not competitive criticism. But the demonstrated willingness to privately mobilize against public skeptics signals something about the cost of dissent in Nvidia&#8217;s ecosystem.</p><p>Hugging Face is the eighth entry in this sequence. The model repository and deployment layer. The last major neutral infrastructure between Nvidia&#8217;s hardware and the applications that run on it. As this piece goes to publication, Wired reports that a ninth may be forming: Nvidia is pitching &#8220;NemoClaw,&#8221; an open-source AI agent platform, to Salesforce, Cisco, Google, Adobe, and CrowdStrike ahead of GTC &#8212; hardware-agnostic at launch, built on Nvidia&#8217;s NeMo stack. 
[94]</p><h2>The Replication Map</h2><p>What Hugging Face built, and what Nvidia has built or is building alongside it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xSKn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xSKn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 424w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 848w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xSKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png" width="1222" height="1082" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1082,&quot;width&quot;:1222,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/189991912?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xSKn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 424w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 848w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!xSKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29f4c7af-f71a-4c0b-bada-3c559a0b5248_1222x1082.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Nvidia does not need to match Hugging Face on every layer. The roughly 200 models on build.nvidia.com are not trying to replicate HF&#8217;s two-million-model community catalog. They are trying to own the layer where money flows: enterprise deployment. The &#8220;Launch from Hugging Face&#8221; button tells the story. Discover the model on HF for free. Deploy it through Nvidia for $4,500 per GPU per year.</p><p>Hugging Face becomes the community tier. Nvidia becomes the enterprise tier. The switching costs compound with each layer a customer adopts.</p><h2>The Best Case for Nvidia</h2><p>Nvidia&#8217;s defenders have legitimate points.</p><p>The models are open in a way most labs are not.
Nemotron weights, datasets, training recipes, and RL environments are published under Nvidia&#8217;s Open Model License, which allows commercial use, modification, and distribution. (The Llama Nemotron variants inherit Meta&#8217;s more restrictive community license.) Developers can and do run them on non-Nvidia hardware. The Nemotron 3 technical report is more detailed than what most frontier labs publish. The training data, including the Nemotron-CC Common Crawl corpus, is released openly, which exposes Nvidia to non-trivial legal risk that almost no other company at this scale is willing to accept. [27]</p><p>Nvidia&#8217;s own characterization of its contributions warrants scrutiny. In October 2025, Nvidia&#8217;s blog claimed it was &#8220;a top contributor to Hugging Face, with more than 650 open models and 250 open datasets.&#8221; [29] The linked source clarified that Nvidia topped new repository uploads in 2025 &#8212; a measure of upload velocity, not downloads or developer usage. [92] An independent analysis ranked Nvidia 34th by total all-time downloads on HF. [30] Google, Meta, Microsoft, and several individual researchers all outrank it. By January 2026, Nvidia&#8217;s blog escalated to &#8220;NVIDIA&#8217;s open robotics models and datasets leading the platform&#8217;s downloads.&#8221; [93] The most-downloaded models on HF are sentence transformers, BERT variants, and Whisper. None are from Nvidia.</p><p>Clem Delangue was quoted in the October press release praising the contributions: &#8220;NVIDIA&#8217;s contributions to the open model ecosystem, commitment to open research for AI and Hugging Face&#8217;s ecosystem will empower millions of developers.&#8221; [29] The quote is diplomatic. HF&#8217;s platform would function without Nvidia&#8217;s models. It would not function without Qwen, DeepSeek, Mistral, and the thousands of community contributors who generate the platform&#8217;s actual download volume.</p><p>NIM solves a real problem.
Enterprise deployment of open models requires integration work that many companies prefer to buy rather than build. The open-source components exist, but the packaging, support, and optimization that NIM provides have genuine value &#8212; and so does the vendor behind it. A team that invests in optimizing vLLM or SGLang with the right quantization and serving configuration can match or approach NIM&#8217;s performance &#8212; but that investment builds an internal AI practice, not a vendor dependency. Enterprise buyers factor in longevity: Nvidia will exist in five years. Hugging Face is a startup with $200 million in the bank. For a CISO choosing a deployment platform, that asymmetry matters independently of any lock-in concern. Charging for it is legitimate business. The $4,500/GPU/year price point is not unusual for enterprise software. [35]</p><p>Competition benefits developers. Hugging Face&#8217;s Inference Providers and the open-source inference ecosystem (vLLM, SGLang, and others) exist as alternatives to NIM &#8212; though HUGS, HF&#8217;s most direct NIM competitor, was deprecated in September 2025 after failing to gain enterprise traction. The multi-vendor inference market is better for practitioners than any monopoly would be. Nvidia&#8217;s open model push has also accelerated open dataset publishing, benefiting the entire ecosystem. And Nvidia&#8217;s primary target may not be Hugging Face at all &#8212; it may be the hyperscalers&#8217; proprietary model APIs (Azure OpenAI, Bedrock, Vertex AI), with HF caught in the crossfire of a larger fight for who owns the enterprise model layer.</p><p>These points are true. They are also compatible with the structural argument. CUDA&#8217;s history demonstrates that technical excellence and ecosystem lock-in are not mutually exclusive. CUDA is the best GPU programming platform available. It is also the deepest moat in computing. 
Nvidia can deliver superior products at every layer, and those products can still function as the mechanism that closes the orbit.</p><p>The question is not whether Nvidia is doing good work. It is. The question is whether the structural incentives of a hardware monopolist that controls the dominant open model family, the enterprise deployment stack, the training infrastructure, the networking layer, and now the scheduling layer create forces that undermine hardware choice over time. The pattern evidence across eight domains and 18 years of CUDA history suggests they do.</p><h2>The Verdict</h2><p>Meta is going closed. Chinese models face geopolitical headwinds. Nvidia is building a parallel stack that turns &#8220;open&#8221; into a funnel for hardware lock-in. Hugging Face sits at the center of all three forces, and its survival depends on being essential to everyone who doesn&#8217;t want to be locked into a single vendor. The 2023 Series D investor list already reads this way: Google, Amazon, Nvidia, Intel, AMD, Qualcomm, IBM, and Salesforce. [95] Every major hardware player is betting on neutrality. The question is whether they will bet again at the required scale before the event horizon is crossed.</p><p>In August 2023, Jensen Huang told the world where the models are. &#8220;The world&#8217;s models are largely on Hugging Face today.&#8221; At GTC 2026, he is convening a panel on the future of open models &#8212; without the company he named. He demoed Hugging Face&#8217;s robot at CES two months ago to promote his own models. His team co-presents a session with the company at that same GTC. At PE meetings, his team says &#8220;open&#8221; sixty times an hour and &#8220;Hugging Face&#8221; not once.</p><p>The question is not whether Nvidia is trying to own the open model layer. The evidence says it is. The question is whether anyone is investing hard enough in the alternative.</p><div><hr></div><h3>Notes</h3><p>[1] Jensen Huang, SIGGRAPH 2023 keynote, August 8, 2023. 
Quote verified against <a href="https://nvidianews.nvidia.com/news/nvidia-and-hugging-face-to-connect-millions-of-developers-to-generative-ai-supercomputing">NVIDIA-Hugging Face partnership press release</a>, August 8, 2023, and Nvidia CFO Colette Kress&#8217;s remarks on Q2 FY2024 earnings call, August 23, 2023.</p><p>[2] Julien Chaumond, LinkedIn post, August 8, 2023. Verbatim quote from Jensen&#8217;s keynote with heart emoji. LinkedIn does not support permanent post URLs for non-logged-in users; searchable by name and date.</p><p>[3] Simon Pagez (Hugging Face engineer), <a href="https://discuss.huggingface.co/t/deprecating-nvidia-dgx-cloud-training-and-nvidia-nim-api/125543">Hugging Face forum post</a>, April 10, 2025: &#8220;We decided to deprecate our Nvidia DGX Cloud Training service and Nvidia NIM API (serverless) experience.&#8221;</p><p>[4] GTC 2026 agenda and <a href="https://nvidianews.nvidia.com/news/nvidia-ceo-jensen-huang-and-global-technology-leaders-to-showcase-age-of-ai-at-gtc-2026">NVIDIA Newsroom press release</a>, March 3, 2026. Open models panel: A16Z, AI2, AMP Coalition, Black Forest Labs, Cursor, Reflection AI, and Thinking Machines Lab. Hugging Face is listed among participating organizations at GTC but is not on Jensen&#8217;s open models panel. Perplexity and LangChain CEOs are pregame show speakers, not panel participants.</p><p>[4b] Hugging Face has four sessions at GTC 2026: &#8220;The State of Open-Source AI&#8221; (S81791, co-presented with Nvidia, Tue 3/17 4pm), GTC Developer Livestream (Wed 3/18 11:30am), Novita GTC After Hours: From Models to Agents to Infra (Wed 3/18 7pm), and &#8220;Accelerate AI Through Open-Source Inference&#8221; (S81902, Thu 3/19 1pm). HF is present at GTC &#8212; just not on the keynote open models panel Jensen personally convenes.</p><p>[4c] Jensen Huang CES 2026 keynote, January 5, 2026. Reachy Mini robot demo using Hugging Face models on DGX Spark. 
Same day, <a href="https://nvidianews.nvidia.com/news/nvidia-releases-new-physical-ai-models-as-global-partners-unveil-next-generation-robots">NVIDIA Newsroom</a> announced Nvidia-HF LeRobot integration: &#8220;NVIDIA and Hugging Face integrate NVIDIA Isaac open models and libraries into LeRobot to accelerate the open-source robotics community.&#8221; See also <a href="https://huggingface.co/blog/nvidia-reachy-mini">Hugging Face blog</a> and <a href="https://techcrunch.com/2026/01/05/nvidia-wants-to-be-the-android-of-generalist-robotics/">TechCrunch</a> coverage.</p><p>[5] <a href="https://www.cnbc.com/2025/12/09/meta-avocado-ai-strategy-issues.html">CNBC</a>, December 9, 2025, reported Avocado &#8220;could be a proprietary model.&#8221; <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a>, December 19, 2025; <a href="https://techcrunch.com/2025/12/19/meta-is-developing-a-new-image-and-video-model-for-a-2026-release-report-says/">TechCrunch</a>, December 19, 2025. Originally reported by Wall Street Journal, December 18, 2025. The WSJ reported the models&#8217; existence and development roadmap; the proprietary licensing characterization originates with CNBC.</p><p>[6] Meta Q3 2025 earnings guidance (October 29, 2025) set 2025 capex at $70-72 billion. Actual full-year 2025 capex was $72.22 billion per <a href="https://investor.atmeta.com/investor-news/press-release-details/2026/Meta-Reports-Fourth-Quarter-and-Full-Year-2025-Results/default.aspx">Meta Q4/FY2025 results</a>, January 28, 2026.</p><p>[7] Bloomberg, &#8220;Meta Pulls Back on Open-Source AI,&#8221; December 2025. Behind paywall. Cited via secondary reporting in <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a> and <a href="https://en.tmtpost.com/post/7814466">TMTPOST</a>.</p><p>[8] Bloomberg, December 2025. 
LeCun departure and internal directives. Behind paywall. Cited via <a href="https://en.tmtpost.com/post/7814466">TMTPOST</a> and <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a>.</p><p>[9] Tijmen Blankevoort (former Meta researcher), quoted in <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a>, December 19, 2025.</p><p>[10] DeepSeek&#8217;s use of Llama architecture cited as internal concern. <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a> and <a href="https://en.tmtpost.com/post/7814466">TMTPOST</a>, December 2025.</p><p>[11] Zuckerberg confirmation that Meta would not release superintelligence-capable models as open-source. <a href="https://winbuzzer.com/2025/12/19/meta-solidifies-open-source-retreat-with-proprietary-mango-and-avocado-models-for-2026-xcxwbn/">WinBuzzer</a>, citing CNBC, December 2025.</p><p>[12] Meta&#8217;s $14.3 billion equity investment in Scale AI (acquiring a 49% stake, valuing Scale AI at ~$29 billion) and Alexandr Wang&#8217;s appointment as Meta&#8217;s Chief AI Officer. <a href="https://en.tmtpost.com/post/7814466">TMTPOST</a>, December 19, 2025, citing Wall Street Journal. <a href="https://www.cnbc.com/2025/12/09/meta-avocado-ai-strategy-issues.html">CNBC</a> and <a href="https://fortune.com/">Fortune</a> corroborate the figure.</p><p>[13] Bloomberg, December 2025. TBD Lab distilling from rival models including Gemma, gpt-oss, and Qwen. 
<a href="https://en.tmtpost.com/post/7814466">TMTPOST</a>: &#8220;Bloomberg reported that Meta&#8217;s TBD Lab is using several third-party models including Google&#8217;s Gemma, OpenAI&#8217;s gpt-oss and Alibaba&#8217;s Qwen in Avocado&#8217;s training process.&#8221;</p><p>[14] <a href="https://www.technologyreview.com/2026/02/12/1132811/whats-next-for-chinese-open-source-ai/">MIT Technology Review</a>, &#8220;What&#8217;s Next for Chinese Open-Source AI,&#8221; February 12, 2026.</p><p>[15] ATOM Project data via <a href="https://www.technologyreview.com/2026/02/12/1132811/whats-next-for-chinese-open-source-ai/">MIT Technology Review</a>, February 2026. Qwen derivatives &gt;40% of new HF language-model derivatives; Llama ~15%.</p><p>[16] AI World / <a href="https://www.aiworld.eu/story/nvidia-leads-open-source-ai-momentum-as-chinese-labs-close-in">Open Model Evolution dashboard</a>, 2025 data.</p><p>[17] AI World / <a href="https://www.aiworld.eu/story/nvidia-leads-open-source-ai-momentum-as-chinese-labs-close-in">Open Model Evolution dashboard</a>, 2025 data. Qwen 2.5 download count.</p><p>[18] South China Morning Post, &#8220;Alibaba&#8217;s Qwen AI models downloaded 700 million times,&#8221; January 2026. Cited via <a href="https://english.news.cn/20260113/af90462629d146c2acad0e99525faba3/c.html">Xinhua</a>. Different counting methodology from AI World; directional finding consistent.</p><p>[19] <a href="https://huggingface.co/blog/deepseek-moment">Hugging Face blog</a>, &#8220;One Year Since the DeepSeek Moment,&#8221; 2026. DeepSeek-V3.2 Ascend and Cambricon day-zero support.</p><p>[20] <a href="https://huggingface.co/blog/deepseek-moment">Hugging Face blog</a>, &#8220;One Year Since the DeepSeek Moment,&#8221; 2026. Ant Group&#8217;s Ling models on domestic chips.</p><p>[21] Jensen Huang, Bloomberg interview, 2025. 
&#8220;China is well ahead &#8212; way ahead on open-source.&#8221; Behind paywall; quoted in multiple secondary sources.</p><p>[22] <a href="https://www.nextplatform.com/">Next Platform</a>, analysis of Nvidia&#8217;s open model economics, 2025. Nvidia described as &#8220;the only AI model maker that can afford to give models away forever.&#8221;</p><p>[23] <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2026">Nvidia FY2026 earnings release</a>, February 2026. Full-year FY2026 revenue of $215.9 billion (fiscal year ended January 25, 2026).</p><p>[24] Nvidia management indicated software and services revenue exceeded a $1 billion annual run rate in early 2024; the $1-1.5 billion range reflects subsequent analyst estimates. Nvidia does not break out software licensing as a separate line item in its financial statements.</p><p>[25] <a href="https://developer.nvidia.com/nemotron">developer.nvidia.com/nemotron</a>. Llama Nemotron built on Llama architecture.</p><p>[26] <a href="https://nvidianews.nvidia.com/news/nvidia-launches-open-models-and-data-to-accelerate-ai-innovation">NVIDIA Newsroom, Nemotron 3 announcement</a>, December 15, 2025. Model sizes and timeline.</p><p>[27] Nvidia technical blog and <a href="https://huggingface.co/nvidia">Hugging Face nvidia model cards</a>, 2025-2026. Pretraining corpus, post-training data, RL environments, and training code released alongside models.</p><p>[28] <a href="https://nvidianews.nvidia.com/news/nvidia-launches-open-models-and-data-to-accelerate-ai-innovation">NVIDIA Newsroom, Nemotron 3 announcement</a>, December 2025. Named early adopters: Accenture, CrowdStrike, Oracle Cloud, Palantir, Perplexity, ServiceNow, and Cursor.</p><p>[29] NVIDIA Blog, <a href="https://blogs.nvidia.com/blog/open-models-data-ai/">&#8220;NVIDIA Launches Open Models and Data to Accelerate AI Innovation,&#8221;</a> October 28, 2025. 
&#8220;As a top contributor to Hugging Face, with more than 650 open models and 250 open datasets.&#8221; Delangue quote from same release.</p><p>[30] Lo&#239;ck Bourdois, <a href="https://huggingface.co/blog/lbourdois/huggingface-models-stats">&#8220;Model statistics of the 50 most downloaded entities on Hugging Face,&#8221;</a> data collected October 1, 2025. Nvidia ranked 34th by total downloads. Bourdois noted: &#8220;NVIDIA is extremely balanced (it&#8217;s not a single model that drives all the entity&#8217;s downloads).&#8221;</p><p>[31] Jensen Huang, quoted in <a href="https://nvidianews.nvidia.com/news/nvidia-launches-open-models-and-data-to-accelerate-ai-innovation">NVIDIA Newsroom, Nemotron 3 announcement</a>, December 2025.</p><p>[32] Arcee AI, Trinity Large announcement, January 2026. <a href="https://techcrunch.com/2026/01/">TechCrunch</a> and <a href="https://venturebeat.com/">VentureBeat</a> reporting.</p><p>[33] <a href="https://venturebeat.com/">VentureBeat</a>, January 2026. Arcee and OpenAI&#8217;s gpt-oss identified as the two American companies releasing frontier-class open models trained from scratch.</p><p>[34] <a href="https://nvidianews.nvidia.com/news/nvidia-launches-open-models-and-data-to-accelerate-ai-innovation">NVIDIA Newsroom</a>, December 15, 2025.</p><p>[35] <a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/">NVIDIA AI Enterprise licensing documentation</a>. List price $4,500 per GPU per year for production NIM deployment. Volume terms may vary.</p><p>[36] Futurum Group analysis, 2025. NVFP4 format &#8220;specifically suited for NVIDIA Blackwell architecture.&#8221;</p><p>[37] NVIDIA Technical Blog, Nemotron 3 Nano, 2025. &#8220;Specifically designed for DGX Spark, H100, and B200 GPUs.&#8221;</p><p>[38] <a href="https://build.nvidia.com/">build.nvidia.com</a>, direct observation as of March 2026. 213 curated models, 84 Nvidia-developed.</p><p>[39] Kari Briski, quoted in Gizmodo, 2025. 
&#8220;The go-to open model platform.&#8221;</p><p>[40] <a href="https://build.nvidia.com/">build.nvidia.com</a>, &#8220;Launch from Hugging Face&#8221; button, beta feature, direct observation.</p><p>[41] NVIDIA NIM documentation, security caveat regarding non-NVIDIA-hosted models.</p><p>[41b] <a href="https://forums.developer.nvidia.com/t/how-to-create-a-custom-nim-compliant-container-image-for-self-hosting/324958">NVIDIA Developer Forums</a>, February 25, 2025. Developer asks: &#8220;I&#8217;m now exploring how to package my own model as a NIM. So far, all NIM models appear to be official, and I haven&#8217;t found any guidelines for creating a NIM-compliant container.&#8221; NIM fine-tuned model documentation confirms: &#8220;The base model must be available as a downloadable NIM for LLMs.&#8221; See <a href="https://docs.nvidia.com/nim/large-language-models/latest/ft-support.html">NVIDIA NIM fine-tuned model support</a>.</p><p>[42] Hugging Face Hardware Partner Program, announced at AI Hardware Summit, 2021.</p><p>[43] <a href="https://huggingface.co/docs/optimum/habana/index">Optimum Habana documentation and benchmarks</a>. Intel Disruptor Program participation.</p><p>[44] <a href="https://github.com/huggingface/optimum-neuron">Optimum Neuron</a>. v0.4.1 release with Trn2/Trn3 benchmarks, Qwen3 fine-tuning support.</p><p>[45] <a href="https://github.com/huggingface/optimum-tpu">Optimum TPU</a>. TPU on Inference Endpoints suspended per HF forum response.</p><p>[46] <a href="https://huggingface.co/docs/optimum/intel/index">Optimum Intel</a>. 
Quantization, pruning, knowledge distillation via Neural Compressor and OpenVINO.</p><p>[46b] Dan McNamara (AMD SVP), quoted in <a href="https://finance.yahoo.com/news/cpus-are-back-en-vogue-in-the-data-center-184504720.html">Yahoo Finance, &#8220;CPUs are back en vogue in the data center,&#8221;</a> March 4, 2026: &#8220;As customers are evolving and shifting more towards inference, they&#8217;re using smaller language models, they&#8217;re using more domain-specific models, and a lot of that runs more efficiently on CPUs.&#8221; Intel CEO Lip-Bu Tan cited AI as a major driver for CPU demand on Intel&#8217;s January 2026 earnings call. Meta announced its largest deployment of Nvidia Grace CPU-only servers and a separate deal with AMD including Venice and Verano CPUs.</p><p>[47] Graphcore acquired by SoftBank, 2024. Exited standalone accelerator market. Earlier HF partnership launched 2021 AI Hardware Summit.</p><p>[48] <a href="https://github.com/huggingface/hf-rocm-kernels">github.com/huggingface/hf-rocm-kernels</a>. Custom MI300X kernels co-developed with AMD, late 2025-early 2026.</p><p>[49] <a href="https://huggingface.co/blog/rocm-kernel-builder">Hugging Face blog, &#8220;ROCm Kernel Builder,&#8221;</a> February 2026.</p><p>[50] <a href="https://www.amd.com/en/developer/resources/pervasive-ai-developer-contest.html">AMD Pervasive AI Developer Contest</a>. $160,000 in prizes, up to 700 hardware grants, models hosted on HF Hub.</p><p>[51] Clem Delangue, AMD AI Day keynote. Quoted in <a href="https://huggingface.co/blog/huggingface-and-amd">Hugging Face blog, &#8220;Hugging Face and AMD&#8221;</a>.</p><p>[52] HUGS documentation (archived), <a href="https://huggingface.co/docs/hugs">huggingface.co/docs/hugs</a>. Launched October 2024 per <a href="https://www.infoworld.com/">InfoWorld</a>. Deprecated September 2025.</p><p>[53] HUGS pricing per <a href="https://huggingface.co/docs/hugs">HF documentation</a>. 
NIM pricing per <a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/">NVIDIA AI Enterprise licensing</a>. Comparison: $1/container/hour vs. $1/GPU/hour + AI Enterprise license.</p><p>[54] <a href="https://techcrunch.com/">TechCrunch, &#8220;Hugging Face launches Inference Providers,&#8221;</a> January 2025. Quote: &#8220;collaboration, storage, and model distribution capabilities.&#8221;</p><p>[55] Financial Times, &#8220;Why AI start-up Hugging Face turned down a $500mn Nvidia deal,&#8221; late 2025. Behind paywall. Cited via <a href="https://oodaloop.com/briefs/technology/why-ai-start-up-hugging-face-turned-down-a-500mn-nvidia-deal/">OODAloop</a> and <a href="https://fudzilla.com/hugging-face-told-nvidia-to-jog-on/">Fudzilla</a>. $500M offer at $7B valuation. &#8220;Does not want a single dominant investor that could sway decisions.&#8221; Delangue: &#8220;not trying to maximise revenue growth.&#8221;</p><p>[56] Revenue: $10M (2021) per Forbes via <a href="https://www.axios.com/2023/08/24/hugging-face-ai-salesforce-billion">Axios</a>, August 2023. $70M ARR (end of 2023) per <a href="https://sacra.com/c/hugging-face/">Sacra</a> estimate, widely cited. Third-party trackers (GetLatka, Gitnux) cite higher 2024 figures ($100-130M) but these are unconfirmed estimates for a private company; Delangue has not publicly disclosed a specific 2024 revenue number.</p><p>[57] Profitability and loss timeline per <a href="https://fudzilla.com/hugging-face-told-nvidia-to-jog-on/">Fudzilla</a>, 2026, citing Delangue: &#8220;Hugging Face was profitable in 2025, then slipped into a first-quarter loss this year after investing in datasets.&#8221; Delangue also posted on X in mid-2024 that HF was profitable. Cash position: &#8220;about half&#8221; of $400M total fundraising &#8220;still on the balance sheet&#8221; per same Fudzilla article. 
3% conversion rate per Financial Times (behind paywall; see [55]).</p><p>[58] Linus Torvalds, Aalto University Q&amp;A, June 2012. <a href="https://www.youtube.com/watch?v=iYWzMvlj2RQ">Video on YouTube</a>. Reported by <a href="https://www.phoronix.com/news/MTExMTg">Phoronix</a>.</p><p>[59] Dave Airlie (Red Hat kernel engineer), <a href="https://lwn.net/Articles/900371/">LWN.net analysis</a>, 2022. &#8220;Open-source&#8221; kernel modules as wrappers around proprietary GSP firmware.</p><p>[60] Ben Skeggs (Nouveau maintainer) resignation from Red Hat and hire by Nvidia, reported by <a href="https://www.phoronix.com/news/Nouveau-Maintainer-Leaves">Phoronix</a>.</p><p>[61] Kyle Bennett, <a href="https://www.hardocp.com/article/2018/03/07/geforce-partner-program-different-different">HardOCP</a>, 2018. GeForce Partner Program brand alignment requirements and GPU allocation threats.</p><p>[62] Persistent brand alignment effects post-GPP cancellation. Industry observation.</p><p>[63] China SAMR filing, Mellanox acquisition conditions. 55-60% InfiniBand market share.</p><p>[64] DataGravity analysis. Nvidia networking revenue growth from $1.3B to $10B+.</p><p>[65] NVIDIA InfiniBand documentation. Cable and module qualification requirement.</p><p>[66] Jensen Huang, Computex 2023 keynote. Reported in <a href="https://medium.com/@vicky_14339/ethernet-vs-infiniband-in-large-scale-ai-applications-1592332ad512">Medium analysis</a>.</p><p>[67] <a href="https://www.hpcwire.com/2025/06/">HPCwire</a>, June 2025. 30-50% performance gap for non-optimized Ethernet vs. InfiniBand.</p><p>[68] Juniper Networks research whitepaper, 2024. 55% TCO savings, 90-95% InfiniBand performance with properly configured RoCEv2 Ethernet. Also MLPerf benchmark data showing &lt;0.03% performance delta in some AI workloads.</p><p>[69] <a href="https://www.datacenterdynamics.com/en/analysis/nvidias-dgx-cloud/">Data Center Dynamics, &#8220;Nvidia&#8217;s DGX Cloud&#8221;</a>, 2025. 
&#8220;Google, Microsoft, and Oracle agreed to the proposal early on, but AWS held out until December, when it finally caved.&#8221; Bjorlin quotes on AWS EFA/Nitro and Oracle InfiniBand.</p><p>[70] <a href="https://ultraethernet.org/leading-cloud-service-semiconductor-and-system-providers-unite-to-form-ultra-ethernet-consortium/">Ultra Ethernet Consortium</a>, founded July 19, 2023 per <a href="https://www.linuxfoundation.org/press/announcing-ultra-ethernet-consortium-uec">Linux Foundation press release</a>. Founding members include AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft.</p><p>[71] DOJ antitrust investigation opened mid-2024, subpoenas issued September 2024. Bloomberg (behind paywall); <a href="https://www.cnbc.com/2024/09/03/doj-sends-nvidia-subpoena-in-escalating-antitrust-investigation.html">CNBC</a>; <a href="https://fortune.com/2025/01/06/nvidia-antitrust-doj-probe-bundling-networking-gpu/">Fortune</a>.</p><p>[72] Jim Keller (Tenstorrent CEO) and Patrick Moorhead (Moor Insights), quoted in <a href="https://fortune.com/2025/01/06/nvidia-antitrust-doj-probe-bundling-networking-gpu/">Fortune, &#8220;Nvidia faces DOJ antitrust probe&#8221;</a>, 2025.</p><p>[73] China SAMR preliminary finding of anti-monopoly law violation, September 2025. <a href="https://www.cnbc.com/2025/09/15/china-finds-nvidia-violated-anti-monopoly-law.html">CNBC</a>.</p><p>[74] Hermann Hauser, quoted in <a href="https://www.cnbc.com/2020/09/14/arm-co-founder-warns-nvidia-deal.html">CNBC, &#8220;ARM co-founder warns Nvidia deal&#8221;</a>, 2020.</p><p>[75] FTC vote 4-0 to block Nvidia-ARM acquisition, <a href="https://www.ftc.gov/news-events/news/press-releases/2021/12/ftc-sues-block-40-billion-semiconductor-chip-merger">FTC press release</a>, December 2021.</p><p>[76] Grace CPU announcement, April 2021. 
<a href="https://www.hpcwire.com/2021/04/12/nvidia-announces-grace-a-next-gen-arm-server-cpu/">HPCwire</a>.</p><p>[77] Vera CPU with &#8220;Olympus&#8221; cores, March 2025. <a href="https://www.pcgamer.com/hardware/processors/nvidia-vera-cpu/">PC Gamer</a>, NVIDIA.</p><p>[78] Nvidia ARM stake sale, Q4 2025. Baptista Research.</p><p>[79] vLLM sponsorship: <a href="https://github.com/vllm-project/vllm">vLLM GitHub</a>, meetup records.</p><p>[80] <code>vllm_nvext</code> proprietary module in NIM containers: Hacker News user reports; TensorRT-LLM code closure analysis by <a href="https://squeezebits.com/blog/why-nim-is-not-open-source">SqueezeBits</a>, March 2025.</p><p>[81] Run:ai acquisition, $700M, completed December 2024. <a href="https://venturebeat.com/ai/nvidia-completes-run-ai-acquisition/">VentureBeat</a>; <a href="https://finance.yahoo.com/">Yahoo Finance</a>.</p><p>[82] KAI Scheduler open-sourced under Apache 2.0, March 2025. <a href="https://venturebeat.com/ai/nvidia-open-sources-kai-scheduler/">VentureBeat</a>.</p><p>[83] SchedMD/Slurm acquisition, December 2025. <a href="https://techcrunch.com/2025/12/">TechCrunch</a>; <a href="https://nvidianews.nvidia.com/">NVIDIA Newsroom</a>.</p><p>[84] <a href="https://www.nextplatform.com/2025/12/">Next Platform, &#8220;Nvidia Nearly Completes Its Control Freakery With Slurm Acquisition,&#8221;</a> December 2025.</p><p>[85] Bright Computing acquired 2022, stopped selling standalone by October 2024. 
<a href="https://bizety.com/">Bizety.com</a>; <a href="https://www.nextplatform.com/">Next Platform</a>.</p><p>[86] Nvidia AI accelerator market share estimates of 80-95% from multiple analyst reports including <a href="https://www.theregister.com/2024/01/">The Register</a>, January 2024.</p><p>[87] Doug O&#8217;Laughlin, <a href="https://www.fabricatedknowledge.com/">Fabricated Knowledge, &#8220;NVLink Fusion: Embrace, Extend, Extinguish,&#8221;</a> May 2025.</p><p>[87b] The DOJ filed its antitrust suit against IBM on January 17, 1969, alleging monopolization of the general-purpose computer market. IBM preemptively announced unbundling of software and services from hardware in June 1969, before the case progressed, to reduce legal exposure. The suit lasted 13 years and was dismissed as &#8220;without merit&#8221; in 1982, but the unbundling is widely credited with creating the independent software industry and enabling the rise of Microsoft. See <a href="https://www.promarket.org/2024/04/10/driving-innovation-with-antitrust/">ProMarket, &#8220;Driving Innovation with Antitrust,&#8221;</a> April 2024; <a href="https://truthonthemarket.com/2020/02/03/the-ghosts-of-antitrust-past-part-2-ibm/">Truth on the Market, &#8220;The Ghosts of Antitrust Past: IBM,&#8221;</a> February 2020.</p><p>[88] Nvidia seven-page memo to sell-side analysts, November 2025. First reported by Barron&#8217;s (behind paywall). Seen by <a href="https://www.reuters.com/technology/nvidia-waging-information-campaign-wall-street-2025-11-27/">Reuters</a>. Published in full by Bernstein. The memo responded to two critics: Michael Burry, who compared Nvidia to Cisco on X, and Substack writer Shanaka Anslem Perera, who compared Nvidia&#8217;s accounting to Enron. Burry explicitly rejected the Enron framing: &#8220;I am not claiming Nvidia is Enron. It is clearly Cisco.&#8221;</p><p>[89] Tae Kim, Barron&#8217;s senior tech writer, November 2025. 
&#8220;Nvidia should have published this in an 8-K filing.&#8221; Behind paywall.</p><p>[90] <a href="https://www.reuters.com/technology/nvidia-waging-information-campaign-wall-street-2025-11-27/">Reuters, &#8220;Nvidia waging an information campaign on Wall Street and social media,&#8221;</a> November 27, 2025.</p><p>[91] Jay Goldberg, Harbor Research, quoted in <a href="https://www.marketwatch.com/">MarketWatch</a> via <a href="https://news.futunn.com/">Futu</a>, November 2025. &#8220;Seriously flawed.&#8221; &#8220;The decision to respond to these rumors in a memo with limited disclosure itself appears unusual.&#8221;</p><p>[92] AI World, <a href="https://www.aiworld.eu/story/nvidia-leads-open-source-ai-momentum-as-chinese-labs-close-in">&#8220;NVIDIA Leads Open Source AI Momentum as Chinese Labs Close In,&#8221;</a> October 16, 2025. &#8220;NVIDIA has emerged as a surprising leader in this space, topping open source AI repository contributions in 2025.&#8221; This is the source Nvidia&#8217;s October blog linked to for its &#8220;top contributor&#8221; claim.</p><p>[93] NVIDIA Blog, <a href="https://blogs.nvidia.com/blog/open-models-data-tools-accelerate-ai/">&#8220;NVIDIA Unveils New Open Models, Data and Tools to Advance AI Across Every Industry,&#8221;</a> January 6, 2026. &#8220;NVIDIA&#8217;s open robotics models and datasets leading the platform&#8217;s downloads.&#8221;</p><p>[94] <a href="https://www.wired.com/story/nvidia-planning-ai-agent-platform-launch-open-source">Wired</a>, March 10, 2026. Also <a href="https://www.cnbc.com/2026/03/10/nvidia-open-source-ai-agent-platform-nemoclaw-wired-agentic-tools-openclaw-clawdbot-moltbot.html">CNBC</a>, same date. Nvidia pitching &#8220;NemoClaw,&#8221; an open-source AI agent platform, to enterprise software companies ahead of GTC 2026. 
Platform reportedly works &#8220;regardless of whether their products run on Nvidia&#8217;s chips.&#8221; Partners approached include Salesforce, Cisco, Google, Adobe, and CrowdStrike. No formal partnerships confirmed at time of publication. Built on Nvidia&#8217;s NeMo platform.</p><p>[95] Series D investor list: Google, Amazon, Nvidia, Intel, AMD, Qualcomm, IBM, Salesforce. <a href="https://techcrunch.com/2023/08/24/hugging-face-raises-235m-from-investors-including-salesforce-and-nvidia/">TechCrunch</a>.</p>]]></content:encoded></item><item><title><![CDATA[Register, Disclose, Pay]]></title><description><![CDATA[Europe's AI copyright enforcement will fail. The levy that will inevitably replace it has already been in place for 60 years.]]></description><link>https://www.airealist.ai/p/register-disclose-pay</link><guid isPermaLink="false">https://www.airealist.ai/p/register-disclose-pay</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Thu, 12 Mar 2026 05:47:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Cu8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Cu8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Cu8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!7Cu8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!7Cu8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7Cu8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Cu8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2549210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190648398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Cu8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 
424w, https://substackcdn.com/image/fetch/$s_!7Cu8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!7Cu8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7Cu8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7907600-8bed-4167-8a72-50c3b10a993e_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On the morning of March 10, 2026, the Court of Justice of the European Union convened its fifteen-judge Grand Chamber in Luxembourg to hear oral arguments in <em>Like Company v. Google</em> &#8212; the first case to ask whether training a large language model on copyrighted text violates EU law [1]. The case turns on a Hungarian news article about a plan to bring dolphins to Lake Balaton. The facts are farcical; the legal questions are not. That same afternoon, two hundred and fifty kilometres away in Strasbourg, the European Parliament voted 460 to 71 to demand that every AI company disclose an itemised list of every copyrighted work used in training, pay retroactive compensation to creators, and register all training data with the EU&#8217;s intellectual property office [2].</p><p>The Grand Chamber will take a year or more to issue its ruling. The Parliament&#8217;s resolution is non-binding. And not a single general-purpose AI model in existence &#8212; including Mistral, the company France has staked its AI sovereignty on &#8212; could comply with what the MEPs are demanding [3]. Europe is simultaneously asking a court to decide whether AI training is legal and telling AI companies to pay for it regardless. The contradiction is not an accident. It is the opening move of a pattern Europe has repeated for sixty years.</p><h2>The grievance is real</h2><p>Unsealed court filings in <em>Kadrey v. Meta</em> revealed that Meta downloaded at least 81.7 terabytes from shadow libraries, including LibGen and Z-Library, to train Llama [4]. Employees explicitly identified the data as pirated. An engineer scripted tools to strip the word &#8220;copyright&#8221; from ebook files. A director of product management wrote that CEO Mark Zuckerberg had personally approved the use, adding that under no circumstances would Meta publicly disclose it [5].
Anthropic agreed to the largest publicly reported copyright settlement in US history &#8212; $1.5 billion, roughly $3,000 for each of the approximately 500,000 works pirated from the same shadow libraries [6].</p><p>The creative sector&#8217;s anger is justified. The Parliament resolution claims Europe&#8217;s creative and cultural industries generate 6.9% of EU GDP, though that figure traces to an industry-funded 2010 study using the broadest possible definition; official Eurostat data puts the narrower cultural industries at roughly 4% of value added [7]. Either way, the economic base is substantial, the extraction is real, and the industry has every reason to demand compensation.</p><p>The Parliament is right about the problem. The inevitable solution has a 60-year track record of failure.</p><h2>Why enforcement always fails</h2><p>The resolution demands that AI providers produce &#8220;an itemised list identifying each item of copyright-protected content used for training&#8221; [8]. This requirement collides with a technical reality: compliance is impossible at the current scale.</p><p>Modern foundation models train on trillions of tokens &#8212; the units of text that language models process. Llama 3 used 15 trillion. Qwen 3 used 36 trillion. An estimated 70-90% of these tokens originate from Common Crawl, the largest public web archive used for AI training, which contains 250 billion web pages spanning nearly two decades. A peer-reviewed audit of 1,858 AI training datasets found that licence information was missing or unspecified for over 70% of them. Among those where licences were specified, more than half were miscategorised [9]. The Stanford Foundation Model Transparency Index scored thirteen major AI developers in December 2025 and found that the industry average was 41 out of 100, with training data and training compute as the most opaque areas across the board.
Mistral scored 18, the third-lowest of any company assessed and a 37-point drop from the previous year [10].</p><p>The problem is accelerating. The fastest-growing source of training data is synthetic text generated by AI models and fed back into training pipelines. If training on copyrighted text requires compensation, does synthetic data generated by rephrasing copyrighted text inherit the liability? The seed documents are copyrighted. The teacher models that rephrased them were trained on copyrighted data. The output is designed to be sufficiently transformed that it no longer matches the original, which is either a legitimate transformation or a laundering operation, depending on which side of the table you sit on. No court has ruled. No collecting society has a position. And the volume of synthetic training data is growing faster than any other category.</p><p>The impossibility runs deeper than missing metadata. The Parliament&#8217;s demand implicitly assumes that a rights management infrastructure exists &#8212; or could be built &#8212; to identify every copyrighted work, determine who holds what rights, and route payments accordingly. No such infrastructure exists in any content vertical. Not one.</p><p>The music industry comes closest. It has unique identifiers for recordings (ISRC), compositions (ISWC), and releases (UPC). It has collecting societies &#8212; SACEM since 1851, GEMA since 1903, ASCAP since 1914 &#8212; with over a century of experience routing payments. It has DDEX, the metadata exchange standard that has taken since 2006 to become the industry&#8217;s communication backbone [12]. And it still cannot track what AI models were trained on. DDEX&#8217;s latest extension, adopted by Spotify in September 2025, adds flags for AI involvement in music <em>outputs</em> &#8212; whether a track was made with AI.
Tracking AI <em>inputs</em> &#8212; what a model ingested during training &#8212; is a different problem entirely, and one DDEX was never designed to solve [13].</p><p>Books have ISBNs, but also an unsettled ownership crisis. The Authors Guild&#8217;s position is unambiguous: AI training rights are &#8220;a right entirely unrelated to publishing&#8221; that publishers do not hold unless separately negotiated [14]. Yet publishers are licensing anyway &#8212; HarperCollins at $5,000 per title to Microsoft, Wiley for $23 million to an undisclosed AI company &#8212; often without asking the authors whose rights they claim to represent [15]. Who can grant a licence? The question is genuinely unresolved, and it preemptively blocks any systematic licensing framework.</p><p>News publishers have bilateral deals for the giants &#8212; OpenAI paid News Corp over $250 million, and the Financial Times a reported $5-10 million per year &#8212; and nothing for everyone else [16]. Microsoft launched a Publisher Content Marketplace in early 2026, but its roster includes only news and magazine outlets: Business Insider, Cond&#233; Nast, Hearst, and the Associated Press. No book publishers. No academic presses [17]. The RSL standard, launched in September 2025 as an &#8220;ASCAP for the web,&#8221; had attracted 1,500 publisher endorsers by early 2026 and partnered with Cloudflare and Akamai for enforcement &#8212; but not a single major AI company has committed to honouring it [18].</p><p>Images have stock licensing that doesn&#8217;t cover training. Web content &#8212; blogs, forums, social media, code repositories, the vast majority of what Common Crawl actually contains &#8212; has no identifiers, no rights registry, no payment infrastructure, and no unit of consumption that could map to a royalty. Full disclosure: I was VP Engineering at Digiplug, a digital music distributor, when the industry was adopting DDEX (2007-2009).
I know firsthand how painful metadata standardisation is even in the best-case vertical &#8212; three major labels, a century of collecting societies, a finite catalogue. The content industry as a whole lacks these advantages [19].</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9C5d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9C5d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 424w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 848w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 1272w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9C5d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png" width="1222" height="862" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1222,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:154835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190648398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9C5d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 424w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 848w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 1272w, https://substackcdn.com/image/fetch/$s_!9C5d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c35bd-7a7d-496e-9a0a-4cddc481bb8f_1222x862.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Parliament treats &#8220;copyrighted content&#8221; as a single category. In reality there are five broken verticals, none of which can support itemised disclosure at AI training scale. The most mature &#8212; music, with its unique identifiers, its three major labels, its century-old collecting societies, and its twenty-year-old metadata standard &#8212; still cannot track what trains an AI model. The least mature &#8212; general web content, which constitutes the overwhelming majority of what Common Crawl actually contains &#8212; has nothing [21].</p><h2>Three systems, three outputs</h2><p>The United States is answering the same question through litigation. In June 2025, Judge William Alsup ruled in <em>Bartz v.
Anthropic</em> that training AI models on legally acquired books is &#8220;quintessentially transformative&#8221; and protected by fair use &#8212; while simultaneously holding that downloading pirated copies from shadow libraries is &#8220;inherently, irredeemably infringing&#8221; [40]. Two days later, Judge Vince Chhabria reached the same conclusion for Meta&#8217;s Llama models in <em>Kadrey v. Meta</em> [41]. An earlier case, <em>Thomson Reuters v. Ross Intelligence</em>, denied fair use when the AI product directly competed with the copyrighted source [42].</p><p>The emerging American framework is contextual: train on legal copies and transform the content, and you&#8217;re protected. Pirate it or build a substitute, and you&#8217;re not. It&#8217;s messy, slow, and case-specific. It also produces commercially functional outcomes, because each ruling clarifies a boundary that the market can price. More than seventy AI copyright cases are now pending in US federal courts, according to the US Copyright Office&#8217;s litigation tracker [43]. The first appellate decisions &#8212; the Third Circuit in <em>Thomson Reuters</em> and the Ninth Circuit in <em>Doe v. GitHub</em> &#8212; are expected in 2026. The US is building AI copyright law the way it builds most technology law: through friction, precedent, and settlement.</p><p>China is not building a copyright framework. It is training. China now accounts for roughly 1,500 of the world&#8217;s 3,750 publicly released large language models. DeepSeek, Qwen, and their derivatives dominate open-weight rankings [44]. None discloses its training data. China&#8217;s Copyright Act technically contains no text-and-data-mining exception, meaning AI training requires permission &#8212; but enforcement is minimal, and state priorities overwhelmingly favour AI acceleration over rights holder compensation. Once released, open-weight models propagate globally and operate beyond any jurisdiction&#8217;s reach.
You cannot levy a downloaded weight file.</p><p>The divergence is widening. The US builds and litigates. The EU regulates and taxes. China builds and ignores. And the gap between them &#8212; in cost structure, in innovation speed, in competitive position &#8212; grows with each cycle.</p><h2>The Levy Ratchet</h2><p>Europe has been here before. Every wave of digital disruption since the 1960s has triggered the same structural sequence &#8212; call it the Levy Ratchet. </p><ul><li><p>Act 1: rights holders demand maximalist enforcement.</p></li><li><p>Act 2: enforcement proves impossible.</p></li><li><p>Act 3: a flat-rate levy collected by incumbent intermediaries absorbs the function that enforcement was supposed to serve.</p></li></ul><p>The ratchet only turns one way. No European copyright levy has been repealed once established.</p><p>When cassette tapes made home recording possible, Germany introduced the world&#8217;s first private copy levy in the 1960s. France followed in 1985. Rather than monitor every home recording &#8212; an impossible task &#8212; lawmakers imposed a surcharge on blank media, collected by the same organisations that represented rights holders. The enforcement problem was sidestepped, not solved [20].</p><p>When file sharing exploded in the 2000s, France created HADOPI &#8212; the Haute Autorit&#233; pour la Diffusion des &#338;uvres et la Protection des droits sur Internet. It was the most ambitious digital copyright enforcement system ever attempted in Europe, and the numbers it produced deserve a full accounting. Over its decade of existence, HADOPI consumed &#8364;82 million in public subsidies and collected a total of &#8364;87,000 in fines &#8212; a ratio of 942 euros spent per euro recovered [22].
The enforcement funnel tells the story better than any ratio: 13 million first-strike warning emails sent to alleged file-sharers, 6,994 dossiers transmitted to prosecutors, 517 court judgments, one internet disconnection order &#8212; which was never enforced because the French government abolished the disconnection penalty by decree five weeks after it was imposed [23]. The pirates, meanwhile, migrated from monitored peer-to-peer networks to VPNs and streaming services that HADOPI&#8217;s architecture could not reach. HADOPI was absorbed into Arcom in January 2022, its graduated response mechanism preserved on paper, and its reputation as a bureaucratic catastrophe intact.</p><p>What happened next is the structural finding. HADOPI failed as enforcement &#8212; that is, Act 2 of the pattern. But the private copy levy succeeded as collection &#8212; that is, Act 3. The levy didn&#8217;t replace HADOPI; it absorbed the function HADOPI was supposed to serve. While HADOPI was sending warning emails to no effect, the levy expanded quietly to cover smartphones, tablets, USB drives, and external hard drives, each new device category adding a surcharge at the point of sale. By 2021, France alone was collecting roughly &#8364;300 million per year through this mechanism &#8212; nearly 3,500 times what HADOPI collected in its entire decade of operation [24]. The enforcement failed. The tax worked.</p><p>One objection is immediate: the enforcement target has changed. HADOPI tried to police fifty million anonymous file-sharers. AI copyright enforcement targets perhaps fifty identifiable, well-capitalised companies. You can name them. You can serve them. You can fine them &#8212; and the $1.5 billion Anthropic settlement proves that litigation works. But enforcement that can reach the companies does not solve the problem that no one &#8212; not the companies, not the regulators, not the collecting societies &#8212; can identify what those companies owe.
The rights infrastructure gap is independent of the enforcement target. You can find the fifty companies. You still can&#8217;t tell them what they trained on.</p><p>When Google News began displaying snippets from European publishers, Germany passed an ancillary copyright law in 2013. It generated no meaningful revenue &#8212; Axel Springer, the law&#8217;s most vocal backer, eventually granted Google a free licence after traffic plummeted. Spain passed a stricter version; Google shut down Google News in the country for eight years, costing Spanish publishers an estimated &#8364;10 million annually [25]. The EU-wide version, Article 15 of the 2019 Copyright Directive, has generated some licensing deals &#8212; but overwhelmingly between large publishers and large platforms, leaving smaller outlets on the outside.</p><p>The pattern repeats: enforcement proves impossible, and the collecting societies that lobbied for it in the first place become the agents of a levy that doesn&#8217;t require enforcement at all. The levy is collected at the point of commercial activity &#8212; manufacturing, importing, selling &#8212; not at the point of content use.</p><p>The Levy Ratchet may not be limited to copyright. Nine of the ten largest GDPR fines have been imposed on companies headquartered outside the EU [45]. The AI Act&#8217;s compliance burden falls on companies that register models in the EU market, not on open-weight models downloaded from Chinese repositories. The sequence &#8212; maximalist demand, selective enforcement, convergence on a cost borne by compliant companies &#8212; looks familiar across European digital regulation. Whether the Ratchet generalises beyond copyright is a question for another piece. The country that invented every turn of it is a good place to start.</p><h2>The levy machine already exists</h2><p>The same organisations now pushing for AI copyright enforcement operate the largest rights-revenue collection infrastructure in Europe.
GESAC, the umbrella body lobbying hardest for the Parliament&#8217;s resolution, represents over one million creators through 32 collecting societies across Europe. SACEM manages 106 million works and collects France&#8217;s private copy levy. GEMA won a landmark ruling against OpenAI in Munich in November 2025 [26]. Copie France is the entity that physically collects the levy on every device capable of storing copies sold in France &#8212; &#8364;14 excluding tax on every smartphone [27].</p><p>Private copy levies collected over &#8364;1 billion annually worldwide as of 2018, the most recent comprehensive count. Germany contributed &#8364;332.5 million and France &#8364;277.5 million, together accounting for 60% of the global total [28]. Seventy-five countries have private copy levy systems [28]. The infrastructure to collect flat-rate payments from technology companies, route them through collecting societies, and distribute them to rights holders doesn&#8217;t need to be built. It&#8217;s been running for sixty years.</p><p>This context makes the Voss contradiction legible. Axel Voss &#8212; the same MEP who authored the 2019 Copyright Directive and now authored the 2026 AI copyright resolution &#8212; proposed in the explanatory memorandum to his draft report that the EU impose &#8220;an immediate, simple, flat-rate copyright fee&#8221; of 5 to 7 percent of global turnover [29]. The resolution he then shepherded to a 460-71 vote explicitly rejects &#8220;a global licence for providers to train their genAI models in exchange for a flat-rate payment&#8221; [30]. This is not a contradiction. It is a negotiating structure. The resolution is the maximalist opening position. The levy is the known landing zone. The collecting societies know exactly what they want: a new taxable event within their existing collection infrastructure.</p><p>The Munich ruling accelerates the timeline.
The Landgericht M&#252;nchen found that when copyrighted works become &#8220;memorised&#8221; &#8212; reproducibly embedded in model weights such that the model can regenerate them &#8212; the training process constitutes reproduction under German copyright law [31]. The court drew an analogy to MP3 compression: just as an MP3 stores audio in compressed mathematical form, a language model stores text in statistical form, and both constitute reproduction. If training is reproduction, then compensation requires a new legislative act &#8212; and the collecting societies&#8217; existing levy architecture is the readiest model for what that legislation would look like. The institutional infrastructure is already built. Both parties have appealed [32]. The legal pathway from GEMA&#8217;s Munich courtroom to SACEM&#8217;s collection infrastructure requires one legislative step, and the Commission is already preparing it.</p><p>The Commission is moving faster than expected. Commissioner Virkkunen has committed to presenting an evaluation report and proposal for review of the Copyright Directive by December 2026 [33]. The CJEU Grand Chamber heard the <em>Like Company</em> case on March 10. An Advocate General opinion is expected within months. The legislative window between the Parliament&#8217;s political signal and the Commission&#8217;s operational proposal is narrowing.</p><h2>Who pays, who doesn&#8217;t</h2><p>A levy on AI services offered in the EU market would reach OpenAI, Google, Anthropic, and Mistral &#8212; any company charging for API access, subscriptions, or enterprise licences to European customers. It doesn&#8217;t require tracking individual works. It&#8217;s administratively simpler than enforcement. This is the strongest argument for the levy, and it&#8217;s why the collecting societies want it.</p><p>But it doesn&#8217;t reach the most competitive open-source models. 
DeepSeek R1, Qwen 3, and their combined 170,000-plus derivatives are available as downloadable weights [34]. A European company that downloads Qwen and runs it on-premises for internal use pays no levy &#8212; there is no EU-market commercial transaction to attach it to. A European company that uses the Claude API for the same task does pay. The asymmetry narrows, however, if the levy attaches to deployers rather than model providers: a company building a commercial product on self-hosted Qwen and selling it to European customers would have a taxable transaction. The AI Act already distinguishes between providers and deployers. Whether the levy follows that architecture will determine how much of the self-hosted escape is real. Either way, the levy creates a structural incentive to move inference away from commercial APIs and onto self-hosted infrastructure &#8212; and, as noted above, roughly 1,500 of the world&#8217;s publicly released large language models originate in China [35]. None discloses its training data. None maintains a GDPR representative in Europe. None has signed the EU&#8217;s General-Purpose AI (GPAI) Code of Practice.</p><p>The pattern is familiar: rules for rule-followers. The levy applies to companies that operate within EU market structures &#8212; the companies that were already the most reachable, the most transparent, the most willing to negotiate. The companies that justify the strongest version of the creative industry&#8217;s grievance &#8212; those that train on everything, disclose nothing, and operate beyond any enforcement perimeter &#8212; are structurally outside the levy&#8217;s reach.</p><p>Mistral&#8217;s position is the sharpest illustration. Europe&#8217;s AI flagship scored 18 out of 100 on the Stanford transparency index &#8212; lower than DeepSeek, lower than Meta, lower than every company except Midjourney and xAI [36].
Its co-founder Guillaume Lample orchestrated the download of approximately 70 terabytes from LibGen while at Meta, reportedly telling colleagues that &#8220;everyone uses LibGen&#8221; [37]. He left Meta and co-founded the company France has championed as its answer to OpenAI. Mistral has not disclosed its training datasets, and its copyright policy addresses future opt-outs without accounting for past ingestion.</p><p>The irony is systemic, not personal. The company that built its training corpus using the same methods the Parliament wants to punish will pay whatever levy the collecting societies extract &#8212; because it sells API access on the EU market. DeepSeek, whose training data practices are entirely opaque, will not. The levy taxes the business model, not the behaviour.</p><p>The levy is a copyright instrument. The competitive threat from Chinese open-weight models is an industrial policy problem. These are different departments, and they don&#8217;t talk to each other.</p><p>Over a hundred major European companies &#8212; including Airbus, Siemens, ASML, BNP Paribas, and Mistral itself &#8212; demanded a two-year pause on AI Act enforcement in July 2025. The Commission refused [38]. The levy will arrive on top of the AI Act&#8217;s existing compliance obligations. The Bruegel think tank captured the structural bind precisely: &#8220;Full application of the law would endanger EU access to the best AI models and services and erode competitiveness&#8221; [39].</p><h2>The verdict</h2><p>The EU Parliament&#8217;s resolution will not produce the enforcement it demands. No technology exists to generate itemised disclosure at trillion-token scale. No rights management infrastructure &#8212; in music, books, news, images, or web content &#8212; can support systematic licensing for AI training. 
The Levy Ratchet will turn again: maximalist demand, enforcement failure, convergence on a levy.</p><p>The levy will be collected by the same organisations that have been collecting private copy levies since the 1960s &#8212; SACEM, GEMA, Copie France, and the 32 collecting societies under the GESAC umbrella. It will apply to AI services sold on the EU market. It will not reach Chinese open-source models downloaded and run locally. It will make self-hosted open-weight models from non-enforcing jurisdictions structurally cheaper than European API services. The creative industry will receive a revenue stream. Under existing French law, one quarter of private copy levy collections never reaches a creator &#8212; it funds &#8220;cultural activities&#8221; that the collecting societies themselves define and administer [24]. Individual creators, especially small ones without collective representation, will see a fraction of a fraction. The levy serves institutions better than individuals &#8212; but it is the only mechanism that generates any revenue at all for a right that cannot otherwise be enforced. The levy may be small. The pattern it reveals is not.</p><p>This is not a conspiracy and it is not necessarily wrong. A levy is a pragmatic response to an enforcement impossibility. It generates revenue for creators when no licensing infrastructure exists to route payments based on actual use. The private copy levy has funded European creative production for decades.</p><p>The question is not whether the AI levy is justified &#8212; the grievance is real and the money has to come from somewhere. The question is whether Europe understands what it is choosing. It is choosing a tax on AI services sold by compliant companies, collected by incumbent intermediaries, in a market where the most competitive alternatives are free, open-weight, and Chinese. 
If the CJEU rules in <em>Like Company</em> that the text-and-data-mining exception already covers AI training &#8212; a plausible outcome &#8212; the levy becomes unnecessary and the Levy Ratchet breaks. The structural bet of this analysis is that it won&#8217;t.</p><p>If you are evaluating an AI investment in Europe, price the levy into your margin model now &#8212; 5 to 7 percent of AI-attributed revenue is the political anchor, the landing zone will be lower, and the structure is more likely to follow the private copy precedent of a per-unit fee than a revenue share. Either way, collecting societies do not negotiate downward. If you are choosing between an API provider and self-hosted open weights, the levy widens the cost gap in favour of self-hosting &#8212; and every open-weight model from a non-enforcing jurisdiction becomes structurally cheaper to operate in Europe than every API from a compliant one. If you are building an AI company in the EU, the Levy Ratchet is not a risk to monitor. It is a cost to budget.</p><p>The United States will continue building the common law of AI copyright case by case through litigation&#8212;a slower process that produces messier, more context-specific, and ultimately more commercially functional outcomes. China will continue training on everything and disclosing nothing, because neither its legal framework nor its strategic priorities creates any incentive to do otherwise. And Europe will continue doing what it has always done. It will regulate. It will tax. And it will wonder why the technology gets built somewhere else.</p><p>In Europe, the only certainty is not just death and taxes. It&#8217;s death by taxes.</p><div><hr></div><h3>Notes</h3><p>[1] Case C-250/25, <em>Like Company v. Google Ireland Limited</em>, referred to the CJEU by the Budapest K&#246;rny&#233;ki T&#246;rv&#233;nysz&#233;k on April 3, 2025. Assigned to the Grand Chamber. Oral hearing held March 10, 2026. 
The case asks whether chatbot output displaying text similar to press publisher content constitutes reproduction or communication to the public, and whether the TDM exception under Article 4 of Directive 2019/790 applies to AI training. See <a href="https://www.create.ac.uk/blog/2026/03/03/copyright-and-ai-case-c-250-25-like-company-before-the-grand-chamber-of-the-court-of-justice-of-the-european-union-opinion-by-the-european-copyright-society/">European Copyright Society opinion</a> (CREATe, March 3, 2026) for detailed analysis of the questions referred. Case file: <a href="https://curia.europa.eu/juris/liste.jsf?num=C-250%2F25">InfoCuria</a>.</p><p>[2] European Parliament, <a href="https://www.europarl.europa.eu/news/en/press-room/20260306IPR37511/">&#8220;Protecting copyrighted work and the EU&#8217;s creative sector in the age of AI,&#8221;</a> press release, March 10, 2026. Vote: 460 in favour, 71 against, 88 abstentions. Report A10-0019/2026, rapporteur Axel Voss (EPP, Germany). This is a non-binding own-initiative report, not legislation.</p><p>[3] Stanford Foundation Model Transparency Index, December 2025: average score 41/100 across 13 companies. Companies are &#8220;most opaque&#8221; about training data. Of the six companies the FMTI team assessed manually (because they did not submit transparency reports), Mistral scored 0 on the entire upstream domain, which covers training data and compute. See Wan et al., <a href="https://arxiv.org/abs/2512.10169">&#8220;The 2025 Foundation Model Transparency Index,&#8221;</a> arXiv:2512.10169. The MIT Data Provenance Initiative (Longpre et al., <a href="https://doi.org/10.1038/s44256-024-00014-2">&#8220;A Large-Scale Audit of Dataset Licensing and Attribution in AI,&#8221;</a> <em>Nature Machine Intelligence</em>, vol. 6, pp. 975-987, August 2024) found license information missing for over 70% of 1,858 finetuning datasets audited.</p><p>[4] Kadrey et al. v. Meta Platforms Inc., Case No. 3:23-cv-03417-VC, N.D. Cal. 
Unsealed exhibits include internal Meta communications identifying LibGen as &#8220;a dataset we know to be pirated&#8221; and assessing its use as &#8220;medium-high legal risk.&#8221;</p><p>[5] Internal Meta email from Sony Theakanath (Director of Product Management) to VP Joelle Pineau. See <a href="https://techcrunch.com/2025/01/09/meta-knowingly-used-pirated-books-to-train-ai-unsealed-emails-confirm/">TechCrunch</a>, January 9, 2025 and Rolling Stone coverage of the unsealed filings. Engineer Nikolay Bashlykov wrote scripts to strip copyright markers from ebook files.</p><p>[6] Bartz et al. v. Anthropic PBC, Case No. 3:24-cv-05417-WHA, N.D. Cal. Settlement of $1.5 billion &#8212; the largest publicly reported copyright settlement in US history &#8212; preliminarily approved September 25, 2025 by Judge William Alsup. Final approval hearing scheduled April 23, 2026. Covers approximately 500,000 works from LibGen and PiLiMi datasets. The settlement does not create an ongoing licensing regime and is limited to past conduct through August 25, 2025. On the fair use question: Judge Alsup ruled in June 2025 that AI training on legally acquired books is &#8220;quintessentially transformative&#8221; and protected by fair use. Pirated copies were deemed &#8220;inherently, irredeemably infringing.&#8221;</p><p>[7] The EP resolution (Recital C) cites 6.9% of GDP. This figure traces to <a href="https://iccwbo.org/wp-content/uploads/sites/3/2017/02/building-a-digital-economy.pdf">TERA Consultants, &#8220;Building a Digital Economy&#8221;</a> (2010), commissioned by the International Chamber of Commerce/BASCAP, using a broad definition that includes interdependent and non-dedicated support industries (TV/radio manufacturers, general retail, transport). Core creative industries alone accounted for approximately 4.5%. The European Commission&#8217;s DG GROW currently cites approximately 3.95% of EU value added. 
<a href="https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Culture_statistics_-_cultural_enterprises">Eurostat data</a> (2022) places cultural enterprises&#8217; value added at 2.0% of the total corporate economy.</p><p>[8] Report A10-0019/2026, operative text. The resolution also calls on the Commission to examine retroactive remuneration for past use, create a EUIPO-managed register, and introduce a rebuttable presumption that unaccounted-for works were used in training.</p><p>[9] Longpre et al., <a href="https://doi.org/10.1038/s44256-024-00014-2">&#8220;A Large-Scale Audit of Dataset Licensing and Attribution in AI,&#8221;</a> <em>Nature Machine Intelligence</em>, vol. 6, pp. 975-987, August 2024. The 70%+ figure refers to the proportion of 1,858 individual finetuning datasets (decomposed from 44 widely-used data collections) whose licenses were listed as &#8220;Unspecified&#8221; on major hosting platforms: GitHub 72%, Hugging Face 69%, Papers with Code 70%. Among datasets with specified licenses, miscategorisation rates exceeded 50% cross-platform, rising to 66% on Hugging Face specifically. Peer-reviewed.</p><p>[10] <a href="https://crfm.stanford.edu/fmti/December-2025/index.html">Stanford FMTI 2025</a>: Mistral scored 18/100 (down from 55 in 2024). Bottom cluster alongside Midjourney (14) and xAI (14). Mistral scored 0 on the upstream domain (training data, compute) and 0 on model information. The FMTI team prepared Mistral&#8217;s transparency report because Mistral did not submit one. Industry average: 40.69/100. The only demonstrably copyright-clean training dataset, the KL3M project (see footnote [11]), is orders of magnitude too small for a general-purpose model.</p><p>[11] The <a href="https://arxiv.org/abs/2504.07854">KL3M Data Project</a> (273 Ventures) contains 132 million documents from public-domain and government sources. 
Designed for legal domain applications, not general-purpose language modelling.</p><p>[12] <a href="https://ddex.net/">DDEX</a> (Digital Data Exchange), founded 2006. Approximately 100 members, over 15,000 implementation licenses issued. Six families of standards covering release delivery, sales reporting, works notification, licensing, and studio metadata.</p><p>[13] DDEX ERN v4.3.1 adds optional AI disclosure flags. <a href="https://newsroom.spotify.com/">Spotify announced support</a> September 25, 2025. The standard enables labels and distributors to declare AI involvement in recordings (AI-generated vocals, AI-assisted mixing, etc.). This tracks AI in outputs, not AI training inputs.</p><p>[14] Authors Guild, <a href="https://authorsguild.org/news/ai-licensing-for-authors/">&#8220;AI Licensing for Authors: Who Owns the Rights and What&#8217;s a Fair Split?&#8221;</a>, November 11, 2025: &#8220;Licensing for AI training is a right entirely unrelated to publishing, and is not a right that can simply be tacked onto a subsidiary-rights clause. It is a right reserved by authors.&#8221;</p><p>[15] HarperCollins/Microsoft: $5,000 per title, author opt-in, 3-year term, select nonfiction backlist. Wiley: $23 million to an undisclosed tech company. See Publishers Weekly and Authors Alliance analysis. The Authors Alliance notes that &#8220;with few exceptions (a notable one being Cambridge University Press), publishers have not bothered to ask their authors.&#8221;</p><p>[16] OpenAI licensing deals: News Corp ($250M+ over 5 years, per Wall Street Journal), Wiley ($44M). Financial Times terms were not publicly disclosed; estimates range from $1-5M/year (The Information) to $5-10M/year (other reporting). Anthropic settlement ($1.5B) is distinct &#8212; it compensated for piracy, not a licensing arrangement.</p><p>[17] Microsoft Publisher Content Marketplace (PCM), launched February 2026. 
Partners: Business Insider, Cond&#233; Nast, Hearst, Associated Press, USA TODAY, Vox Media. Analysis: <a href="https://thenewpublishingstandard.com/">New Publishing Standard</a>, February 11, 2026.</p><p>[18] <a href="https://reallysimplelicensing.org/">RSL</a> (Really Simple Licensing), launched September 10, 2025. 1,500+ publisher endorsers including Reddit, Yahoo, Medium, Quora, O&#8217;Reilly Media. CDN enforcement partnerships with Cloudflare, Akamai, Fastly. No major AI model developer has committed to honouring RSL files as of March 2026. See Crowell &amp; Moring LLP analysis (January 2026) noting that RSL &#8220;acts largely as a request and instruction system&#8221; with no binding legal force.</p><p>[19] Disclosure: I was VP Engineering at Digiplug, a digital music distribution company, during the DDEX adoption period. I also spent three years as Chief Evangelist at Hugging Face, which hosts models from Mistral, Meta, Alibaba, and others directly affected by the rules discussed in this piece.</p><p>[20] Germany introduced the world&#8217;s first private copy levy with the 1965 copyright act (Urheberrechtsgesetz). France followed with the loi du 3 juillet 1985 (loi Lang). See <a href="https://www.cisac.org/services/reports-and-research">CISAC Private Copying Global Study 2020</a> for the global evolution of levy systems.</p><p>[21] Even mechanisms weaker than itemised disclosure have failed. Spain&#8217;s Ministry of Culture proposed a Royal Decree on extended collective licensing (licencias colectivas ampliadas) for AI training under Article 12 of Directive 2019/790. The decree would have allowed collecting societies (SGAE, DAMA) to grant non-exclusive licenses to AI companies without individual creator consent. 
It was withdrawn on January 28, 2025, after over 30 cultural organisations campaigned against it under the slogan &#8220;As&#237;, no&#8221; (&#8220;Not like this&#8221;). Creators&#8217; opposition was directed at the specific terms &#8212; inadequate remuneration guarantees and absence of individual consent requirements &#8212; rather than at collective licensing as a mechanism. The Ministry cited &#8220;falta de consenso por parte del sector cultural&#8221; (a lack of consensus on the part of the cultural sector). Source: <a href="https://www.internationalpublishers.org/blog/entry/spain-withdraws-controversial-ai-training-decree">International Publishers Association</a>, February 25, 2025; Infobae/EFE, January 28, 2025.</p><p>[22] HADOPI budget: &#8364;82 million in cumulative public subsidies (programme 334, &#8220;Livre et industries culturelles&#8221;), 2009-2019. Fines: &#8364;87,000 cumulative through 2019, per HADOPI Rapport Annuel 2019, foreword by president Denis Rapone: &#8220;Depuis 2011, le montant total cumul&#233; des amendes prononc&#233;es et port&#233;es &#224; la connaissance de la Commission est de 87.000 euros&#8221; (&#8220;Since 2011, the cumulative total of fines imposed and brought to the Commission&#8217;s attention is 87,000 euros&#8221;). Aggregation by <a href="https://www.nextinpact.com/">Next INpact</a>, confirmed by iGeneration, CCFI, and Europe 1.</p><p>[23] HADOPI enforcement funnel: ~13 million first-strike warning emails sent (Rapone: &#8220;treize millions d&#8217;avertissements&#8221;); 6,994 dossiers transmitted to prosecutors; 517 court judgments (Next INpact final tally); 1 internet disconnection order, issued June 3, 2013 by the Tribunal de police de Montreuil. The penalty was abolished by <a href="https://www.legifrance.gouv.fr/jorf/id/JORFTEXT000027678780">D&#233;cret n&#176; 2013-596</a> of July 8, 2013, following the Lescure Report. Under retroactivity of more lenient criminal law, the suspension was immediately unenforceable. 
HADOPI merged into Arcom on January 1, 2022 (Loi n&#176; 2021-1382 du 25 octobre 2021).</p><p>[24] Copie France collections: approximately &#8364;298 million in 2021, &#8364;285 million in 2022, &#8364;234 million in 2023, &#8364;246 million in 2024. Sources: MacGeneration/L&#8217;Inform&#233; (2022 figures), <a href="https://next.ink/">Next.ink</a> (2024 figures), TorrentFreak (2021 figures). The levy applies to smartphones (&#8364;14 HT for devices above 64 GB), tablets, hard drives, USB keys, and other storage media. The tariff is set by the <a href="https://www.copieprivee.org/">Commission pour la r&#233;mun&#233;ration de la copie priv&#233;e</a> under D&#233;cision n&#176;18 (September 5, 2018). By law (CPI, Article L.324-17), 75% of collections are distributed to rights holders; 25% funds &#8220;cultural activities&#8221; administered by the collecting societies.</p><p>[25] Germany&#8217;s ancillary copyright (Leistungsschutzrecht f&#252;r Presseverleger, 2013): Axel Springer gave Google a free licence after traffic declined. Spain&#8217;s Canon AEDE (2014): Google shut down Google News in December 2014, returned in 2022 after Spain transposed the EU Copyright Directive, which replaced the national law. Article 15 of <a href="https://eur-lex.europa.eu/eli/dir/2019/790/oj">Directive 2019/790</a> (the EU-wide &#8220;press publishers&#8217; right&#8221;) has produced some licensing agreements but primarily benefits large publishers.</p><p>[26] <a href="https://www.gema.de/">GEMA</a> (Gesellschaft f&#252;r musikalische Auff&#252;hrungs- und mechanische Vervielf&#228;ltigungsrechte) traces its institutional lineage to the founding of AFMA in 1903; the current entity was named GEMA in 1947 after a period of state control (STAGMA, 1933-1945). It represents approximately 100,000 members and manages over 4 million works in Germany. <a href="https://www.authorsocieties.eu/">GESAC</a> represents over 1 million creators through 32 collecting societies. 
<a href="https://www.sacem.fr/">SACEM</a> manages 106 million works in its global repertoire (per SACEM&#8217;s 2023 annual data; the searchable database lists 96 million &#8220;most recently used&#8221; works).</p><p>[27] <a href="https://www.copiefrance.fr/">Copie France</a> smartphone levy: &#8364;14 HT (excluding VAT) for smartphones above 64 GB capacity, per D&#233;cision n&#176;18 of the Commission copie priv&#233;e (September 5, 2018). With 20% VAT: approximately &#8364;16.80 TTC.</p><p>[28] <a href="https://www.cisac.org/services/reports-and-research">CISAC Private Copying Global Study 2020</a> (using 2018 data): &#8364;1.019 billion collected across Europe, representing approximately 97% of the global total of &#8364;1.046 billion. Germany: &#8364;332.5 million. France: &#8364;277.5 million. No updated comprehensive study has been published since.</p><p>[29] Axel Voss, Explanatory Statement, Draft Report on Copyright and Generative Artificial Intelligence (2025/2058(INI)), document <a href="https://www.europarl.europa.eu/doceo/document/JURI-PR-775433_EN.html">PE775.433v01-00</a>, dated 27 June 2025, pp. 13-14: &#8220;the European legislator or the European Commission should, pending the introduction of an appropriate provision to address this problem, establish an immediate, simple, flat-rate copyright fee for this use of 5 to 7% of global turnover in order to compensate for the added value that these businesses generate using the data of European creatives and to ensure it remains in Europe.&#8221;</p><p>[30] Report A10-0019/2026, adopted by JURI Committee January 28, 2026 (17-3-2) and by plenary March 10, 2026 (460-71-88). Paragraph 21 calls for fair remuneration &#8220;but not through a global licence for providers to train their genAI models in exchange for a flat-rate payment.&#8221; The resolution rejects the format Voss himself proposed.</p><p>[31] Landgericht M&#252;nchen I, Case 42 O 14139/24, November 11, 2025 (GEMA v. OpenAI). 
The court distinguished between creating a training corpus (covered by the TDM exception under &#167;44b UrhG) and the training process itself, where copyrighted works become &#8220;memorised&#8221; &#8212; meaning reproducibly embedded in model parameters. This memorisation constitutes reproduction under &#167;16 UrhG, which implements Article 2 of the InfoSoc Directive (2001/29/EC) &#8212; an EU-harmonised provision, giving the reasoning persuasive authority beyond Germany. The court analogised to MP3 compression: both store protected content in compressed mathematical form and both constitute reproduction. Note: the existing private copy exception (Article 5(2)(b) InfoSoc Directive, &#167;53 UrhG, CPI L.122-5) applies to copies made by natural persons for private, non-commercial use. Commercial AI training does not fit within this exception. The reproduction finding creates a legal basis for compensation, but applying the levy model to AI would require new or extended collective licensing legislation &#8212; not automatic application of the existing private copy framework.</p><p>[32] GEMA filed Berufung (appeal) on December 8, 2025, to the Oberlandesgericht M&#252;nchen (Case Az.: 6 U 3662/25 e), challenging the dismissal of its personality-rights claims. OpenAI subsequently filed a cross-appeal on the copyright findings. 
The court rejected both parties&#8217; suggestion to refer questions to the CJEU.</p><p>[33] Commissioner Henna Virkkunen, <a href="https://www.europarl.europa.eu/doceo/document/E-10-2025-000528-ASW_EN.html">written answer to parliamentary question E-000528/2025</a>, April 4, 2025: &#8220;The Commission will examine the application of these rules in the context of the review of the directive, which is due no sooner than 7 June 2026.&#8221; She confirmed before the CULT Committee in October 2025 that the evaluation report and proposal for review would be presented by December 2026.</p><p>[34] DeepSeek R1 is built on the DeepSeek-V3 base model, which was pretrained on 14.8 trillion tokens from undisclosed sources. Qwen 3 used approximately 36 trillion tokens. DeepSeek&#8217;s <a href="https://github.com/deepseek-ai/DeepSeek-R1">model disclosure document</a> provides no information about training data provenance or copyright status. Note on self-hosting costs: the levy&#8217;s competitive impact is most significant at large inference volumes where self-hosting is already cost-competitive with API providers. At small scale, the non-levy costs of self-hosting (GPU infrastructure, MLOps, security, compliance) typically exceed API costs regardless of any levy.</p><p>[35] Estimates of Chinese LLM count vary by source. The figure of approximately 1,500 Chinese LLMs out of ~3,750 total publicly released models is drawn from multiple tracking sources as of mid-2025. Qwen derivative count: TechNode reported over 100,000 in April 2025; the Arcee Trinity Large technical report (February 2026) cites <a href="https://huggingface.co/models?search=qwen">over 170,000 on Hugging Face</a>.</p><p>[36] <a href="https://crfm.stanford.edu/fmti/December-2025/index.html">Stanford FMTI 2025</a> scores: IBM 95, Writer 73, AI21 Labs 66, Anthropic 46, Google 45, Amazon 39, OpenAI 35, DeepSeek 33, Meta 31, Alibaba 26, Mistral 18, Midjourney 14, xAI 14. 
Mistral is the lowest-scoring European company and lower than every Chinese company assessed.</p><p>[37] Guillaume Lample, co-founder of Mistral AI, orchestrated the download of approximately 70 terabytes from LibGen while employed at Meta in 2022, according to unsealed filings in Kadrey v. Meta and reporting by <a href="https://www.mediapart.fr/">Mediapart</a> (December 22, 2025). The quote &#8220;Tout le monde utilise LibGen&#8221; (&#8220;Everyone uses LibGen&#8221;) is attributed to Lample in these filings. Separately, a Meta internal email referenced &#8220;OpenAI and Mistral&#8221; using LibGen &#8212; but qualified this as &#8220;through word of mouth,&#8221; not direct evidence. Mistral has not categorically denied using LibGen. Covered by Mediapart, <a href="https://next.ink/">Next.ink</a>, Si&#232;cle Digital, ActuaLitt&#233;, and Clubic.</p><p>[38] The EU AI Champions Initiative letter requesting a two-year AI Act enforcement pause was signed by over 110 European companies including Airbus, ASML, Siemens, BNP Paribas, and Mistral. See <a href="https://siliconangle.com/2025/07/03/europes-biggest-companies-call-two-year-pause-eus-landmark-ai-act/">SiliconANGLE</a>, July 3, 2025. The Commission rejected the request.</p><p>[39] Bertin Martens, <a href="https://www.bruegel.org/analysis/european-union-still-caught-ai-copyright-bind">&#8220;The European Union is still caught in an AI copyright bind,&#8221;</a> Bruegel Analysis, September 2025. On competitive impact: the levy&#8217;s distortionary effect depends on its magnitude relative to the non-price advantages of commercial API providers (indemnification, SLA, compliance support, enterprise integration). At levy levels of 1-2% &#8212; the more likely landing zone &#8212; the distortion may be insufficient to shift enterprise purchasing behaviour. At 5-7%, it might.</p><p>[40] See footnote [6] for full case details. 
The fair use ruling (summary judgment, June 23, 2025) found that Claude&#8217;s use was &#8220;quintessentially transformative&#8221; because it converted expressive text into numerical weights. Pirated copies were separately held &#8220;inherently, irredeemably infringing.&#8221; The US Copyright Office&#8217;s Part 3 report (May 2025, 108 pages) reinforced this position: AI training &#8220;will often be transformative,&#8221; but fair use is not categorical. See footnote [43]. See also Skadden analysis, &#8220;Fair Use and AI Training: Two Recent Decisions,&#8221; July 2025.</p><p>[41] Kadrey et al. v. Meta Platforms Inc., N.D. Cal., summary judgment ruling, June 25, 2025. Judge Chhabria reached the same fair use conclusion for Meta&#8217;s Llama models, while noting that future plaintiffs who demonstrate actual market dilution from AI-generated content could flip the analysis on the fourth factor (market effect).</p><p>[42] Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., D. Del. Fair use was denied because Ross built a direct competitor to Westlaw&#8217;s legal research platform. The court found the use was not transformative because it served the same market function as the original.</p><p>[43] U.S. Copyright Office, <a href="https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3.pdf">&#8220;Copyright and Artificial Intelligence, Part 3: Generative AI and the Copyright System,&#8221;</a> 108 pages, May 9, 2025. The report concluded that AI training &#8220;will often be transformative&#8221; but that fair use is not categorical and must be assessed case by case. It also recommended against a compulsory licensing scheme, preferring to let the market develop.</p><p>[44] DeepSeek R1 (via its DeepSeek-V3 base model): 14.8 trillion tokens from undisclosed sources. Qwen 3: approximately 36 trillion tokens. Over 170,000 derivative models based on Qwen on <a href="https://huggingface.co/models?search=qwen">Hugging Face</a> alone. 
DeepSeek&#8217;s model disclosure provides no information about training data provenance or copyright status. Neither company has signed the GPAI Code of Practice or maintains a GDPR representative in Europe.</p><p>[45] As of early 2026, nine of the ten largest GDPR fines by value have been imposed on non-EU-headquartered companies: Meta/Facebook (&#8364;1.2B, &#8364;479M, &#8364;405M, &#8364;390M, &#8364;265M), Amazon (&#8364;746M), TikTok/ByteDance (&#8364;530M, &#8364;345M), LinkedIn/Microsoft (&#8364;310M). Source: <a href="https://www.enforcementtracker.com/">CMS GDPR Enforcement Tracker</a>, 6th edition (cut-off March 1, 2025), updated with 2025 enforcement actions. The Irish DPC alone issued eight of the top ten fines. On SME burden: a 2020 European Commission survey found 60% of SME respondents considered GDPR compliance &#8220;burdensome&#8221; despite the regulation&#8217;s recitals emphasising reduced administrative burden for smaller enterprises.</p>]]></content:encoded></item><item><title><![CDATA[Cloud vs. Clout]]></title><description><![CDATA[Oracle's moat isn't infrastructure. It's political access. 
Now it's asking customers to buy their own GPUs.]]></description><link>https://www.airealist.ai/p/cloud-vs-clout</link><guid isPermaLink="false">https://www.airealist.ai/p/cloud-vs-clout</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Wed, 11 Mar 2026 06:07:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JWNu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JWNu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JWNu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!JWNu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!JWNu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!JWNu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JWNu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png" width="728" height="397.09090909090907" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aec16595-90f5-4c28-ba37-cc004d341025_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:2124994,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190588002?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JWNu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!JWNu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!JWNu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JWNu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec16595-90f5-4c28-ba37-cc004d341025_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Oracle has produced the most leveraged cloud infrastructure transformation on record: $125 billion in debt, $553 billion in contractual commitments, 57 percent of which are concentrated in a single unprofitable customer, a stock that fell 56 percent in six months, and a credit rating two notches above junk.</p><p>Oracle&#8217;s moat was never cloud. 
It was clout &#8212; political access, personal relationships, and the willingness to make commitments no balance sheet could sustain. That distinction explains why the AI infrastructure boom&#8217;s first visible crack appeared not at a neocloud or a startup, but at the company whose chairman stood in the White House and promised half a trillion dollars he didn&#8217;t have.</p><p>This is the case study <a href="https://www.airealist.ai/p/welcome-to-hotel-abilene">Hotel Abilene</a> identified as the weakest link in the codependence web. Oracle reported Q3 FY2026 earnings on March 10. What follows is the full arc &#8212; from irrelevance to megalomania to reckoning &#8212; and what the Q3 numbers reveal about the structure beneath the backlog.</p><h2>A decade at the wrong table</h2><p>Oracle Cloud Infrastructure launched on October 20, 2016, under the name &#8220;Oracle Bare Metal Cloud Services.&#8221; It arrived a decade after AWS, six years after Azure, and four years after Google Cloud Platform. Ellison himself had to cancel the first attempt. &#8220;There was a pretty big disagreement between me and the powers that be at Oracle,&#8221; he told investors in 2022. &#8220;I thought we were just copying what the other guys were doing &#8212; which I thought was a really bad idea &#8212; and I wanted to start over.&#8221;[1]</p><p>The Gen 2 rebuild was led by Clay Magouyrk, a former AWS engineer who joined Oracle in 2014 and now serves as co-CEO of Oracle. Magouyrk was direct about what he brought: &#8220;I knew how they built them, because I had worked there. And we hired people who also worked there.&#8221;[2] Oracle poached aggressively from Seattle. The strategy was explicit &#8212; reverse-engineer the architectural decisions AWS had made, then build a version optimized for Oracle&#8217;s database workloads. The resulting network architecture, built on RDMA (Remote Direct Memory Access) cluster networking, achieves 200 Gbps per GPU. 
SemiAnalysis validated the performance claim: OCI&#8217;s cluster networking delivers roughly four times the bandwidth of comparable AWS configurations for large GPU training jobs.[3]</p><p>That networking advantage is real &#8212; it&#8217;s the reason OpenAI and xAI initially chose OCI for training clusters. It is the one piece of genuine technical differentiation Oracle possesses.</p><p>One piece isn&#8217;t enough. For the decade between OCI&#8217;s 2016 launch and the AI boom, Oracle remained a rounding error in cloud market share. Synergy Research Group never gave Oracle a specific share figure &#8212; it consistently fell below the threshold for individual reporting, grouped with Alibaba, IBM, and Salesforce in the &#8220;others&#8221; category.[4] Gartner rated OCI a &#8220;Niche Player&#8221; &#8212; its lowest quadrant classification &#8212; from 2017 through 2021. Even after upgrading Oracle to &#8220;Visionary&#8221; in 2022, the analysts added a qualifier that captured the market&#8217;s sentiment: Oracle suffered from &#8220;negative brand association for many organizations, caused by years of tough compliance enforcement and inconsistent sales and support.&#8221;[5]</p><p>The revenue tells the story without interpretation. OCI generated approximately $2.5 billion in FY2022 &#8212; a figure AWS produced roughly every two weeks.[6] By Q2 FY2023, quarterly OCI revenue was $1 billion, growing 53 percent off a base so small the growth rate was mathematically inevitable. Oracle&#8217;s IaaS (infrastructure-as-a-service) market share in 2022 was an estimated 2 percent. By 2023, it was still approximately 2 percent. AWS held 31 percent. Azure held 20 percent. Google held 13 percent.[7] Oracle was not shaping the cloud market. It was orbiting it.</p><h2>How clout became cloud</h2><p>The transformation from irrelevance to the world&#8217;s largest AI infrastructure commitment didn&#8217;t happen through engineering breakthroughs or customer acquisition. 
It happened through three relationships &#8212; one personal, one political, one transactional &#8212; that culminated in a White House photo op.</p><p>The personal relationship came first. Larry Ellison and Elon Musk are close friends, neighbors on the same Hawaiian island, and mutual investors. When Musk launched xAI in 2023 and needed GPU clusters fast, Ellison offered what OCI had: cheap, available capacity with good cluster networking, none of the bureaucratic overhead of AWS or Azure, and a willingness to cut deals the hyperscalers wouldn&#8217;t. xAI&#8217;s initial Oracle commitment reportedly reached $10 billion.[8] It was Oracle&#8217;s first marquee AI customer &#8212; and it validated OCI&#8217;s cluster networking as genuinely competitive for frontier AI training.</p><p>The political relationship was longer in the making. Ellison hosted a six-figure-per-person Trump fundraiser at his Rancho Mirage estate in 2020. He has donated over $30 million to Opportunity Matters Fund and related entities since 2021, ranking eleventh on OpenSecrets&#8217; list of federal-level megadonors.[9] Court records show he participated in a call with Senator Lindsey Graham and Sean Hannity about contesting the 2020 election results.[10] Safra Catz served on Trump&#8217;s 2016 transition team and donated $1 million to Preserve America PAC in 2024. A Trump adviser told Wired that Ellison is &#8220;sort of a shadow president of the United States.&#8221;[11] At the Stargate announcement, Trump said of Ellison: &#8220;In the case of Larry, it&#8217;s well beyond technology, he&#8217;s sort of CEO of everything.&#8221;[12]</p><p>On January 21, 2025, the political and transactional threads converged. Oracle stood alongside OpenAI and SoftBank at the White House to announce Stargate &#8212; a $500 billion commitment to AI infrastructure over four years, the largest technology investment in history. The optics were striking. The substance was thinner.
SoftBank and OpenAI each contributed approximately $19 billion in initial equity and hold roughly 40 percent ownership each. Oracle and MGX, the Abu Dhabi sovereign fund, contributed $7 billion each &#8212; a 15-20 percent stake in a venture in which Oracle was positioned as the primary infrastructure provider.[13]</p><p>Elon Musk&#8217;s assessment, posted the same day, was blunt: &#8220;They don&#8217;t actually have the money.&#8221;[14]</p><p>The GCC dimension extended the clout thesis into sovereign territory. Oracle signed sovereign cloud agreements with STC in Saudi Arabia, du in the UAE, and operators in Bahrain and Kuwait, delivering OCI services through Oracle&#8217;s Alloy platform.[15] A $20 billion &#8220;UAE Stargate&#8221; project with MGX, OpenAI, Nvidia, and SoftBank positioned Oracle in Abu Dhabi. Oracle built an underground data center nine stories below Jerusalem. When asked about Oracle&#8217;s expanding Gulf presence, Catz was emphatic: &#8220;Our support has improved our business.&#8221;[16] The GCC cloud deals followed the same pattern as Stargate: Oracle&#8217;s access came through political channels (Trump&#8217;s May 2025 Gulf tour, the Abraham Accords diplomatic network, the MGX co-investment relationship), not through winning competitive cloud evaluations against AWS or Azure.</p><p>The structural concern is what access-based positioning doesn&#8217;t select for: geographic risk management. Oracle&#8217;s GCC data centers now sit in a kinetic zone whose risk profile changed permanently when Iranian retaliatory strikes damaged cloud infrastructure in the Gulf earlier this month.[17] Data residency requirements force sensitive data within national borders, and Oracle, unlike AWS, lacks the global footprint to absorb a regional facility loss.[18] A capability-built provider diversifies against geographic concentration. 
An access-built provider takes whatever geography the political relationship offers.</p><p>The pattern across all three relationships &#8212; Musk, Trump, Gulf sovereigns &#8212; is consistent. Oracle gained access through personal and political channels to opportunities its technology couldn&#8217;t win. That access produced extraordinary commitments. Those commitments now require extraordinary capital. And the capital is borrowed.</p><h2>The Ellison escalation</h2><p>Larry Ellison&#8217;s earnings call rhetoric followed a trajectory that, plotted alongside the stock price and the debt load, reveals the mechanism of the current crisis: each quarter&#8217;s promises were larger than the last, each one requiring more capital to fulfill, and each one masking the gap between what Oracle was committing and what Oracle was delivering.</p><p>The escalation began modestly. On the Q1 FY2024 call in September 2023, Ellison mused: &#8220;Is generative AI the most important new computer technology ever? Maybe. We&#8217;re about to find out.&#8221; He revealed $4 billion in signed AI training contracts.[19] The stock was at $115. Total debt was $86 billion. Free cash flow was positive.</p><p>By September 2024, the rhetoric had gone orbital: a financial analyst meeting projecting $104 billion in FY2029 revenue &#8212; doubling the $53 billion base &#8212; alongside a 131,072-GPU Blackwell cluster announcement and Larry describing &#8220;soon-to-be-built data centers exceeding a gigawatt.&#8221;[20] By June 2025, Ellison abandoned any remaining restraint: &#8220;We will build and operate more cloud infrastructure data centers than all of our cloud infrastructure competitors combined.&#8221;[21] He described demand as &#8220;almost insatiable.&#8221;</p><p>The September 2025 Q1 FY2026 call was the crescendo. 
Safra Catz projected OCI revenue growing from $18 billion in FY2026 to $32 billion, then $73 billion, then $114 billion, then $144 billion over four years.[22] RPO &#8212; remaining performance obligations, Oracle&#8217;s measure of contracted but unrecognized revenue &#8212; hit $455 billion. The $300 billion OpenAI deal was revealed. The stock surged 36 percent in a single session &#8212; its strongest day since 1992 &#8212; and peaked at $345.72.[23] Ellison&#8217;s net worth jumped roughly $100 billion in 24 hours, briefly making him the world&#8217;s richest person.[24]</p><p>Ellison has always bet this way. PeopleSoft in 2004: $10.3 billion hostile takeover, delivered.[25] Sun in 2010: $7.4 billion for a company losing $100 million a month &#8212; most analysts wrote it off, but the hardware and systems engineering talent became part of Oracle&#8217;s infrastructure foundation.[26] Cerner in 2022: $28.3 billion &#8212; a 20 percent premium &#8212; for a healthcare IT platform now losing customers and reportedly being explored for divestiture.[27] Each bet larger than the last. Each time, the database monopoly&#8217;s cash flow bailed Oracle out. The difference this time is that the cash flow can&#8217;t cover the hangover: free cash flow is already negative $10 billion, and the bet is five times larger than Cerner.</p><p>Ellison is the octogenarian billionaire with a Hawaiian island, a fighter jet collection, and an unwavering belief that personal willpower can reshape industries. But unlike the comic-book version of that archetype, Ellison doesn&#8217;t build the thing himself. 
His data centers are built by Crusoe Energy Systems (a former cryptocurrency mining company), financed by Blue Owl Capital (whose own fund is gating redemptions), and powered by land and utilities sourced by Lancium.[28] Oracle&#8217;s contribution is the check, the customer relationship, and a signature on contracts that no other company&#8217;s CFO would approve.</p><p>The market believed the narrative for exactly three months. Then the balance sheet arrived.</p><h2>The balance sheet beneath the backlog</h2><p>Four tests reveal the structure beneath the headline numbers.</p><p><strong>Can Oracle pay for its own buildout?</strong> The simplest question in corporate finance: does the business generate enough cash to fund what it&#8217;s spending? In FY2024, Oracle spent $6.9 billion building infrastructure against $18.7 billion in cash from operations &#8212; comfortably self-funded, with $11.8 billion left over.[29] In FY2025, infrastructure spending tripled to $21.2 billion. Cash from operations was $20.8 billion. For the first time in at least fifteen years, Oracle spent more than it earned, recording a $390 million shortfall.[30] In the first half of FY2026, the gap widened: $20.5 billion in spending against $10.2 billion in cash from operations, leaving a $10.3 billion shortfall.[31] Over the trailing twelve months through Q3, cash from operations reached $23.5 billion &#8212; growing, but still $26.5 billion short of $50 billion in planned infrastructure spending. That gap is funded entirely by borrowing.</p><p><strong>How real is the $553 billion backlog?</strong> Oracle&#8217;s remaining performance obligations &#8212; contracted revenue not yet recognized &#8212; reached $553 billion in Q3. That is approximately 32 times its quarterly revenue.[32] For comparison, AWS carries roughly 7 times and Microsoft roughly 9 times.[34] The conversion schedule explains why the ratio matters: only a third of the backlog converts to revenue within twelve months.
Another third arrives in years three through five. The rest stretches beyond five years.[33]</p><p>The near-term portion is growing: the twelve-month slice grew 40 percent year-over-year in Q2, up from 25 percent the prior quarter.[35] That&#8217;s encouraging. But the total backlog grew 438 percent in the same period, meaning the portion that won&#8217;t convert for three to fifteen years is expanding far faster than the portion that will. The backlog isn&#8217;t fake. The timing is the problem: Oracle&#8217;s debt payments begin immediately, while the revenue that justifies the debt arrives over a five-to-fifteen-year horizon. If investors lose appetite for Oracle bonds before that revenue materializes, the debt clock kills the company before the backlog converts. Oracle locked in maturities stretching to 2065 on some recent issuances, so it doesn&#8217;t face imminent principal repayment &#8212; interest expense and infrastructure funding are the immediate pressure.[36]</p><p>The concentration makes the timing problem existential. Approximately 57 percent of Oracle&#8217;s backlog &#8212; roughly $300 billion &#8212; comes from a single customer, OpenAI, whose own financial projections show $218 billion in cumulative cash consumption through 2029 and no profitability until 2029 or 2030.[37] Moody&#8217;s flagged the OpenAI contract as &#8220;effectively one of, if not the world&#8217;s largest, project financing&#8221; and warned of &#8220;significant counterparty concentration risk.&#8221;[38] If OpenAI&#8217;s ability to honor tens of billions per year in Oracle commitments depends on continued multi-billion-dollar fundraising, then Oracle&#8217;s $553 billion backlog is not a committed revenue stream. 
It is a bet on other people&#8217;s willingness to fund OpenAI indefinitely.</p><p><strong>How much debt is too much?</strong> Total debt has reached approximately $125 billion after the February 2026 bond offering &#8212; up from $87 billion in FY2024.[39] In just five months, Oracle issued $43 billion in new bonds. For every dollar of shareholder equity, Oracle owes roughly five dollars in debt &#8212; Barclays calculated the ratio at approximately 500 percent, versus 50 percent at Amazon, 30 percent at Microsoft, and even lower at Meta and Google.[40] Both S&amp;P and Moody&#8217;s rate Oracle two notches above junk, with negative outlooks.[41] S&amp;P&#8217;s trigger for downgrade: leverage remaining elevated for two years. Oracle is already past the threshold.</p><p>In November 2025, Barclays analyst Andrew Keches downgraded Oracle debt to Underweight &#8212; the bond market&#8217;s equivalent of Sell &#8212; and warned that Oracle would &#8220;run out of cash by the November 2026 quarter.&#8221;[42] The stress test showed the shortfall materializing even without further spending increases. 
Keches warned Oracle &#8220;could ultimately fall to a BBB-minus rating, nearing the threshold for junk bonds&#8221; and compared Oracle&#8217;s credit profile to &#8220;high-risk BBB-minus issuers, such as certain automotive and cable companies.&#8221;[43] The cost of insuring Oracle&#8217;s debt against default surged past 150 basis points in December 2025 &#8212; the highest level since 2009, and a signal that bond investors see more risk in Oracle than in any other major technology company.[44] Oracle has since raised $73 billion in debt and equity-linked instruments, which extends the runway &#8212; but borrowing to extend the runway is itself the mechanism the access sequence describes.</p><p><strong>What is Oracle sacrificing to keep building?</strong> Bloomberg reported on March 5 that Oracle plans to cut 20,000 to 30,000 employees &#8212; 12 to 18 percent of its global workforce &#8212; to free $8 to $10 billion in cash flow for AI data center construction.[45] The company is cannibalizing its existing business to feed a new one. That&#8217;s not a pivot. It&#8217;s a confession that the existing business cannot support the commitments the new one requires.</p><h2>What Ellison built</h2><p>The cluster networking works &#8212; it&#8217;s the reason frontier labs chose Oracle before Oracle had political access to offer. The first AI customers came for the engineering, not the White House. The $300 billion Stargate commitment came through political channels. Technical merit at the cluster level cannot compensate for what happens when commitment scale overwhelms the operating model.</p><p>OCI&#8217;s pricing is lower than AWS&#8217;s and Azure&#8217;s for comparable compute instances, particularly for GPU clusters, with consistent global pricing that hyperscalers can&#8217;t match.[49] The Gen 2 architecture, rebuilt from scratch after Ellison&#8217;s cancellation of Gen 1, is cleaner than the legacy sprawl of AWS&#8217;s 200-plus services.
For the narrow use case of massive GPU cluster training &#8212; the workload that defines the current AI infrastructure market &#8212; OCI is technically competitive. Customers confirm it: MosaicML (now part of Databricks) reported linear performance scaling across hundreds of GPUs on OCI&#8217;s cluster network. Reka, founded by researchers from DeepMind and Google Brain, chose OCI for training its multimodal models. Magouyrk disclosed on the Q2 earnings call that new customers consume allocated capacity &#8220;in two to three days.&#8221;[50]</p><p>The relevant question is not whether Oracle matches AWS&#8217;s operating model &#8212; it was never designed to do so. What matters is whether Oracle&#8217;s <em>actual</em> operating model &#8212; leasing facilities, outsourcing construction, concentrating on cluster networking for AI training workloads &#8212; can generate returns on the capital invested.</p><p>For a niche cloud provider running $5 billion in quarterly OCI revenue, the asset-light model might be exactly right. You don&#8217;t need to design your own chips or build your own cooling systems to rent GPU clusters profitably. AWS designs its own Graviton and Trainium processors, operates hundreds of availability zones, and has 20 years of institutional knowledge in permitting, power procurement, and construction management. Google designs its own TPUs and cooling systems. Microsoft developed the Maia AI accelerator and the Cobalt CPU.[51] Oracle does none of this &#8212; but for a focused GPU rental business, it doesn&#8217;t have to. (Full disclosure: I spent six years at AWS. That experience informs what I know about hyperscale operations &#8212; and readers should weigh it accordingly.)</p><h2>What scale broke</h2><p>The problem is not the model. The problem is that Oracle committed $553 billion in backlog and borrowed $125 billion while running a model designed for a fraction of that scale. 
When Ellison confirmed in late 2025 that Oracle had sold Ampere Computing because &#8220;we no longer think it is strategic for us to continue designing, manufacturing, and using our own chips,&#8221; he wasn&#8217;t making a mistake &#8212; he was affirming the asset-light approach.[52] But an asset-light approach that depends on third-party contractors for construction, third-party financing for capital, and a single customer for the majority of the backlog has no margin for error at the gigawatt scale. The operational gap showed at Abilene. Winter weather disrupted liquid-cooling infrastructure at the flagship Stargate campus, forcing buildings offline for multiple days.[53] The Stargate LLC joint venture itself is reportedly dormant &#8212; three sources told The Information it&#8217;s a &#8220;shelved idea&#8221; with no staff hired.[54] Blue Owl declined to finance the Michigan campus.[55] The Abilene expansion was scrapped.[56] xAI, initially a marquee customer, withdrew, with Musk implying Oracle couldn&#8217;t keep pace.[57]</p><p>Scale broke the model &#8212; the mismatch between an asset-light operating structure and half-a-trillion dollars in commitments that require the execution discipline of an asset-heavy one.</p><p>And the floor beneath the AI bet may not be stable. Oracle&#8217;s legacy database and applications business is the engine behind $23.5 billion in trailing twelve-month operating cash flow &#8212; real money, and the machine that bailed out every previous Ellison gamble. But that franchise faces its own pressure. Cloud-native databases are eroding the on-premise installed base. Third-party support providers like Rimini Street have built entire businesses on undercutting Oracle maintenance contracts. 
And the Cerner integration is consuming management attention amid reported customer dissatisfaction &#8212; the VA health record system implementation alone has drawn congressional scrutiny.[58] Oracle&#8217;s software support revenue, the bedrock of the legacy franchise, has been flat to slightly declining for several years, which means shrinking in real terms.[59] If the cash flow machine that&#8217;s supposed to sustain Oracle through the timing gap is itself degrading, Barclays&#8217; timeline gets worse.</p><h2>What Q3 reveals</h2><p>Oracle reported Q3 after the bell. The numbers beat expectations: revenue of $17.2 billion (up 22 percent &#8212; what Oracle called its first 20-plus percent organic growth quarter in over fifteen years), earnings per share of $1.79 excluding stock compensation and other non-cash items (above the $1.70 Wall Street expected), and OCI revenue of $4.9 billion (up 84 percent year-over-year). The stock jumped nearly 8 percent after hours. FY2027 total revenue guidance was raised to $90 billion.[46]</p><p>The beat is real. What it doesn&#8217;t resolve is the structure beneath it. The backlog grew to $553 billion &#8212; still expanding, but each quarter adds less than the one before: $68 billion in Q2, only $29 billion in Q3. The commitment machine is slowing. OCI revenue acceleration is real: $4.9 billion, up from $4.1 billion last quarter. But the backlog-to-revenue ratio barely moved &#8212; still 32 times, virtually unchanged from Q2. The timing gap between what Oracle owes and what Oracle earns hasn&#8217;t closed.</p><p>On capital, Oracle raised another $30 billion &#8212; partly through bonds rated above junk (for now), and partly through a type of preferred stock that will eventually convert into common shares, diluting existing shareholders. Total capital raised now exceeds $73 billion in under a year. 
The company said it does not intend to issue more bonds in calendar 2026 &#8212; but it also secured the right to sell new shares directly into the stock market at any time, and hasn&#8217;t used that option yet.[47]</p><p>The $90 billion FY2027 total revenue guidance deserves serious engagement. At that run rate, the backlog-to-revenue ratio drops to roughly 25 times &#8212; better, but still roughly three times worse than AWS or Microsoft. Even if Oracle delivers $90 billion, the timing gap doesn&#8217;t close &#8212; it narrows. The question is whether management guidance that has escalated every quarter since September 2023 constitutes evidence that the revenue will materialize, or whether it is the latest iteration of the Ellison escalation, the piece just documented. The Q3 results are the strongest evidence yet that the bull case may be right on revenue. They are equally strong evidence that the capital structure needed to produce that revenue remains unsustainable without external funding. Both things are true simultaneously.</p><p>The most revealing detail was structural, not financial. Oracle disclosed that new large-scale AI contracts in Q3 are funded either by customer prepayments &#8212; in which customers pay Oracle upfront to purchase GPUs &#8212; or by customers buying GPUs themselves and supplying them to Oracle for operation. No cloud provider marketing itself as a hyperscaler has ever structured contracts this way. Colocation operators have always operated on customer-funded models, but they don&#8217;t claim to be cloud providers. AWS, Azure, and Google Cloud fund their own infrastructure and charge customers for usage. Oracle is inverting the model: the customer bears the capital cost, Oracle provides the rack space and networking. That is colocation economics at hyperscale prices &#8212; and it means the $553 billion backlog carries a different risk profile than the headline number implies. 
Oracle isn&#8217;t committing $553 billion in capital. Its customers are. Oracle is shifting capital risk to its counterparties &#8212; either because it has learned the lesson of Abilene, or because its balance sheet can no longer absorb the capital requirements of the commitments it signs.[48]</p><h2>The access sequence</h2><p>The analyst consensus on Oracle is &#8220;AI spending too much, debt too high.&#8221; That&#8217;s a description, not a diagnosis. When a company&#8217;s position on a capital-intensive technology layer depends on access rather than capability, the failure follows a sequence &#8212; and the sequence is predictable.</p><p><strong>Stage 1 &#8212; Access.</strong> Political, personal, or financial relationships secure a position that the company couldn&#8217;t win through competitive evaluation. The access is real. The capability it implies is not.</p><p><strong>Stage 2 &#8212; Overcommitment.</strong> Access produces commitments &#8212; contracts, joint ventures, government announcements &#8212; that exceed the company&#8217;s operational capacity to deliver. The commitments are rational for every counterparty: each one needs the relationship for a different reason.</p><p><strong>Stage 3 &#8212; Capital substitution.</strong> Borrowed capital replaces operational capability as the mechanism for fulfilling commitments. The company finances what it cannot build. Debt or equity infusions substitute for the engineering, operations, and institutional knowledge that competitors accumulated over decades.</p><p><strong>Stage 4 &#8212; Operational exposure.</strong> The gap between commitments and capability becomes visible. Projects fail, partners withdraw, timelines slip. Each failure increases the capital required to maintain the position.</p><p><strong>Stage 5 &#8212; The maturity mismatch.</strong> Capital costs (debt service, interest expense, or cash burn) arrive on a fixed schedule. 
Revenue from commitments arrives on the counterparty&#8217;s schedule &#8212; years or decades later. If access to capital markets tightens before the revenue materializes, the structure collapses.</p><p>Oracle entered Stage 1 through Ellison&#8217;s relationships with Musk and Trump. Stage 2 was Stargate and the $553 billion backlog. Stage 3 was $43 billion in bonds in five months. Stage 4 is Abilene, Blue Owl, xAI. Stage 5 is what Barclays is warning about: cash exhaustion by November 2026.</p><p>The sequence is not new. Global Crossing ran the same sequence in the late 1990s: political access to rights-of-way produced commitments to lay fiber across oceans, debt replaced operational cash flow, utilization never matched capacity, and the maturity mismatch led to bankruptcy. WorldCom followed the same path. The technology was different. The financial structure was identical.</p><p>OpenAI &#8212; the company on the other side of Oracle&#8217;s $300 billion contract &#8212; is following the same sequence. Stage 1: Sam Altman&#8217;s political access &#8212; White House visits, congressional testimony, the Stargate photo op &#8212; and the Microsoft partnership secured OpenAI infrastructure commitments no company with negative cash flow could have earned on financial merit alone. Stage 2: $300 billion in Oracle commitments, Stargate, enterprise contracts &#8212; commitments that assume revenue OpenAI has never generated. Stage 3: $6.6 billion in equity at a $157 billion valuation, a for-profit conversion designed to unlock further capital, projected cumulative cash consumption of $218 billion through 2029.[60] The mechanism is equity rather than debt, but the structural dynamic is identical: external capital substituting for self-generated cash flow.</p><p>Oracle&#8217;s $300 billion contract with OpenAI is not a transaction between a strong buyer and a leveraged seller.
It is Stage 2 meeting Stage 3 &#8212; two companies whose positions depend on access rather than capability, each using the other&#8217;s commitment to justify its own capital raise. Moody&#8217;s called it the world&#8217;s largest project financing. It might be the world&#8217;s largest circular reference.</p><p>Does this pattern sort the entire AI infrastructure ecosystem? One distinction first: OpenAI built GPT-4 &#8212; a genuine product capability. It enters the access sequence only on the <em>infrastructure and capital</em> side, which means its path out (product revenue reaching self-funding scale) is more plausible than Oracle&#8217;s. The table below maps eight companies by how they got their infrastructure seat and whether the five-stage sequence applies.[62]</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bikv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bikv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 424w, https://substackcdn.com/image/fetch/$s_!Bikv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 848w, https://substackcdn.com/image/fetch/$s_!Bikv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Bikv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bikv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png" width="928" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128712,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190588002?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bikv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 424w, https://substackcdn.com/image/fetch/$s_!Bikv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 848w, https://substackcdn.com/image/fetch/$s_!Bikv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Bikv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f13372-0e76-4d2a-8062-461ef687f9b2_928x814.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The pattern sorts cleanly. The four companies that earned their infrastructure position through engineering &#8212; AWS, Google, Meta, Nvidia &#8212; design their own processors, fund their capex from operating cash flow, and don&#8217;t appear on the framework at all.
Their risk is cyclical (demand slowdown), not structural (access-dependent position collapsing). They built the thing. They own the thing.</p><p>Microsoft occupies an interesting middle ground. Azure is an engineering achievement on its own merits &#8212; two decades of enterprise operations, custom Maia AI chips, custom Cobalt CPUs, hundreds of thousands of enterprise customers who chose Azure in competitive evaluations. But the OpenAI partnership introduced Stage 2 characteristics: 45 percent of commercial commitments from a single counterparty that won&#8217;t be profitable until 2029.[61] Microsoft&#8217;s balance sheet can absorb the exposure. If OpenAI&#8217;s trajectory disappoints, Microsoft writes down a multi-billion-dollar investment and restructures a commercial relationship that accounts for nearly half its forward-looking cloud commitments. That&#8217;s not existential. It is the kind of risk that access-based dependencies introduce even into capability-built companies.</p><p>Then there are Oracle, CoreWeave, and OpenAI at the bottom of the table &#8212; deep into the five-stage sequence, dependent on continued access to capital markets. CoreWeave followed a parallel path through Nvidia&#8217;s relationship rather than political access, and its credit default swaps tell the same story Oracle&#8217;s do. The difference between them and OpenAI is that OpenAI has a product that could, eventually, fund the infrastructure. Oracle has a landlord business funded by debt. CoreWeave has a GPU rental business funded by an IPO and more debt.</p><h2>What access purchases</h2><p>Hotel Abilene asked why six rational companies are tightening a web that serves none of them. This piece answers a different question: how the weakest one got in. Codependence doesn&#8217;t select for strength. It selects for willingness, and willingness without capability is what access purchases.</p><p>The five-stage sequence is falsifiable. 
If Oracle&#8217;s RPO-to-revenue ratio closes from 32 times to the 7-9 times that AWS and Microsoft maintain &#8212; meaning the backlog converts into recognized revenue at hyperscaler rates &#8212; the model works at scale, and the debt is serviceable. If the pattern produces a different outcome at any company where access substitutes for capability, the framework is wrong. Apply it to your own deal flow.</p><p>Oracle&#8217;s chairman stood in the White House and committed half a trillion dollars. His balance sheet had $7 billion. His customers are now buying their own GPUs. The cloud was always someone else&#8217;s. The clout was always his.</p><div><hr></div><h3>Notes</h3><p>[1] Larry Ellison, Oracle CloudWorld 2022, as reported by <a href="https://www.theregister.com/2022/10/24/ellison_oracle_cloud/">The Register</a>, &#8220;Larry Ellison killed Oracle&#8217;s first-generation cloud,&#8221; October 24, 2022.</p><p>[2] Clay Magouyrk, SVP Oracle Cloud Infrastructure, interview with TechTarget, 2019. A-tier (named executive, on-the-record quote).</p><p>[3] SemiAnalysis, &#8220;How Oracle Is Winning the AI Compute Market.&#8221; OCI&#8217;s cluster networking uses RDMA over Converged Ethernet version 2 (RoCEv2) with non-blocking cluster fabric on NVIDIA ConnectX-7 NICs. Oracle claims 200 Gbps per GPU and approximately 4x the cluster bandwidth of comparable AWS configurations for large training jobs. B-tier: industry analyst architectural analysis, not independently replicated benchmark.</p><p>[4] Synergy Research Group consistently reported Oracle below the threshold for individual cloud IaaS market share reporting through 2022. B-tier (industry analyst data).</p><p>[5] Gartner, &#8220;Magic Quadrant for Cloud Infrastructure and Platform Services,&#8221; 2022. Oracle classified as &#8220;Visionary&#8221; (upgraded from &#8220;Niche Player&#8221; held 2017-2021). 
<a href="https://www.theregister.com/2022/10/24/ellison_oracle_cloud/">The Register</a>, October 2022.</p><p>[6] <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001341439&amp;type=10-K&amp;dateb=&amp;owner=include&amp;count=40">Oracle FY2022 10-K</a>: total cloud infrastructure (IaaS + PaaS) revenue approximately $2.5 billion. Note: Oracle&#8217;s cloud revenue reporting combines IaaS and PaaS, whereas AWS reports IaaS alone; the comparison is directionally accurate but not precisely apples-to-apples. <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001018724&amp;type=10-K&amp;dateb=&amp;owner=include&amp;count=40">Amazon 10-K FY2022</a>: AWS revenue $62.2 billion. $62.2B / 26 two-week periods = ~$2.4B, meaning AWS booked roughly Oracle&#8217;s entire annual cloud infrastructure revenue every two weeks. A-tier (company filings).</p><p>[7] Cloud IaaS market share data from Synergy Research Group and Canalys for calendar years 2022-2023. AWS ~31%, Azure ~20%, Google Cloud ~13%, Oracle ~2%. B-tier (industry analyst data).</p><p>[8] xAI&#8217;s initial Oracle contract was reported at up to $10 billion. The Information and Bloomberg, 2024. xAI subsequently withdrew. B-tier (press reporting, contract terms not publicly filed).</p><p>[9] <a href="https://www.opensecrets.org/news/2025/09/oracle-invested-millions-in-government-influence-before-winning-a-major-stake-in-tiktok/">OpenSecrets</a>, &#8220;Oracle invested millions in government influence before winning a major stake in TikTok,&#8221; September 2025. A-tier (federal campaign finance filings).</p><p>[10] Court records from the January 6 investigation showed Ellison&#8217;s participation in a call with Senator Lindsey Graham and Sean Hannity. <a href="https://www.cnn.com/2025/09/12/business/larry-david-ellison-wealth-political-power">CNN</a>, September 2025. A-tier (court records).</p><p>[11] Safra Catz, Trump&#8217;s 2016 transition team. 
$1 million Preserve America PAC donation: <a href="https://www.opensecrets.org/orgs/oracle-corp/summary?id=D000000422">OpenSecrets filings</a>. A-tier (FEC filings).</p><p>[12] Trump at the Stargate announcement, January 21, 2025. A-tier (public remarks of record).</p><p>[13] Stargate LLC structure: SoftBank and OpenAI each ~$19 billion and ~40% ownership. Oracle and MGX each ~$7 billion. Multiple press reports citing SEC filings.</p><p>[14] Elon Musk, posted on X, January 21, 2025. <a href="https://www.entrepreneur.com/business-news/elon-musk-on-the-openai-stargate-project/487057">Entrepreneur.com</a>.</p><p>[15] Oracle sovereign cloud agreements: STC (Saudi Arabia), du (UAE), operators in Bahrain and Kuwait. <a href="https://gulfbusiness.com/oracle-boosts-uae-cloud-capacity-fivefold/">Gulf Business</a>, January 2025.</p><p>[16] Safra Catz: &#8220;Our support has improved our business.&#8221; <a href="https://www.calcalistech.com/ctechnews/article/h1nv3aiqc">CTech/Calcalist</a>, July 2025. Underground Jerusalem data center confirmed in the same interview. B-tier.</p><p>[17] AWS facilities sustained damage during Iranian retaliatory strikes in early 2026. Referenced in prior AI Realist coverage (&#8221;<a href="https://www.airealist.ai/p/access-disable-destroy">Access, Disable, Destroy</a>&#8220;).</p><p>[18] Data residency requirements in Saudi Arabia (CST regulations, PDPL) and the UAE mandate that sensitive data be stored within national borders. The dynamics by which sovereignty regulations concentrate data in kinetically vulnerable locations are analyzed in the AI Realist&#8217;s sovereignty vertical framework.</p><p>[19] Larry Ellison, <a href="https://investor.oracle.com/investor-news/news-details/2023/Oracle-Announces-Fiscal-Year-2024-First-Quarter-Financial-Results/default.aspx">Oracle Q1 FY2024 earnings call</a>, September 11, 2023. A-tier.</p><p>[20] <a href="https://investor.oracle.com/">Oracle Financial Analyst Meeting</a>, September 2024. 
$104 billion FY2029 revenue target. 131,072 Nvidia Blackwell GPU cluster. A-tier.</p><p>[21] Larry Ellison, <a href="https://investor.oracle.com/investor-news/news-details/2025/Oracle-Announces-Fiscal-Year-2025-Fourth-Quarter-and-Full-Year-Financial-Results/default.aspx">Oracle Q4 FY2025 earnings call</a>, June 11, 2025. A-tier.</p><p>[22] Safra Catz, <a href="https://investor.oracle.com/investor-news/news-details/2025/Oracle-Announces-Fiscal-Year-2026-First-Quarter-Financial-Results/default.aspx">Oracle Q1 FY2026 earnings call</a>, September 9, 2025. OCI revenue projections: $18B, $32B, $73B, $114B, $144B over five fiscal years. A-tier.</p><p>[23] Oracle stock surged approximately 36% on September 10, 2025, hitting an intraday high of $345.72. RPO reached $455 billion. <a href="https://www.cnbc.com/2025/09/09/oracle-orcl-q1-2026-earnings.html">CNBC</a>. A-tier.</p><p>[24] <a href="https://www.cnbc.com/2025/09/10/larry-ellison-is-100-billion-richer-after-blowout-oracle-earnings-report.html">CNBC</a>, &#8220;Larry Ellison is $100 billion richer after blowout Oracle earnings report,&#8221; September 10, 2025.</p><p>[25] Oracle acquired PeopleSoft for $10.3 billion in December 2004 after an 18-month hostile takeover bid. <a href="https://www.theregister.com/2025/01/02/twenty_years_since_oracle_bought_peoplesoft/">The Register</a>, January 2025; <a href="https://link.springer.com/chapter/10.1007/978-3-030-02363-8_29">Springer Nature case study</a>. A-tier (SEC filings; extensively documented).</p><p>[26] Oracle acquired Sun Microsystems for $7.4 billion in January 2010. Ellison said at the time that Sun was &#8220;losing $100 million a month.&#8221; <a href="https://www.cio.com/article/280740/enterprise-software-oracle-s-decade-of-acquisition-innovative-or-just-well-financed.html">CIO</a>, 2024. A-tier (SEC filings; executive public statements).</p><p>[27] Oracle acquired Cerner for $28.3 billion in June 2022, a 20% premium over Cerner&#8217;s closing price. 
<a href="https://www.cnbc.com/2021/12/21/oracles-larry-ellison-chooses-profit-over-growth-in-cerner-deal.html">CNBC</a>, December 2021. A-tier for acquisition price (SEC filing); B-tier for divestiture reporting (press).</p><p>[28] Crusoe Energy Systems: primary data center operator for the Abilene Stargate site. SemiAnalysis described Crusoe as &#8220;on paper, a cryptominer inexperienced with datacenters.&#8221; Blue Owl Capital: primary financing partner. Lancium: land and power infrastructure. B-tier (press reporting).</p><p>[29] <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001341439&amp;type=10-K&amp;dateb=&amp;owner=include&amp;count=40">Oracle FY2024 10-K</a>. Capex $6.9 billion, GAAP operating cash flow $18.7 billion, free cash flow $11.8 billion. A-tier (SEC filing).</p><p>[30] Oracle FY2025 annual results. Capex $21.2 billion, GAAP operating cash flow $20.8 billion, free cash flow approximately negative $390 million. Oracle reports FCF was positive in every fiscal year from at least FY2010 through FY2024 per available data. A-tier (SEC filing).</p><p>[31] Oracle H1 FY2026 (Q1+Q2). Capex approximately $20.5 billion, GAAP operating cash flow approximately $10.2 billion, free cash flow approximately negative $10.3 billion. <a href="https://investor.oracle.com/investor-news/news-details/2025/Oracle-Announces-Fiscal-Year-2026-Second-Quarter-Financial-Results/default.aspx">Q2 FY2026 earnings release</a>: capex $12 billion in Q2 alone. A-tier (SEC filing).</p><p>[32] Oracle Q3 FY2026 RPO: $553 billion. Quarterly revenue $17.2 billion. $553B / $17.2B = ~32.2x. <a href="https://investor.oracle.com/investor-news/news-details/2026/Oracle-Announces-Fiscal-Year-2026-Third-Quarter-Financial-Results/default.aspx">Oracle Q3 FY2026 earnings release</a>, March 10, 2026. 
A-tier.</p><p>[33] <a href="https://www.sec.gov/Archives/edgar/data/1341439/000119312525314207/orcl-ex99_1.htm">Oracle 10-Q for Q2 FY2026</a> (period ended November 30, 2025): approximately 33% of RPO expected to be recognized in the next 12 months; approximately 35% in months 37-60; remainder thereafter. <a href="https://www.sec.gov/Archives/edgar/data/1341439/000119312525200095/orcl-20250831.htm">Oracle Q1 FY2026 10-Q</a> (filed with SEC) provides the conversion schedule disclosure. A-tier (SEC filings). The near-term convertible portion (~$173B) represents approximately one-third of total RPO.</p><p>[34] AWS RPO approximately $190 billion (per Amazon Q4 2025 10-K) versus ~$28 billion quarterly revenue = ~6.8x. Microsoft commercial RPO $625 billion (per Microsoft Q2 FY2026 10-Q) versus ~$70 billion quarterly revenue = ~8.9x. A-tier (SEC filings). OpenAI&#8217;s share of Microsoft RPO is estimated at 45% per Fortune and Fierce Network. B-tier for attribution breakdown.</p><p>[35] Doug Kehring, Oracle Principal Financial Officer, <a href="https://www.fool.com/earnings/call-transcripts/2025/12/10/oracle-orcl-q2-2026-earnings-call-transcript/">Q2 FY2026 earnings call</a>, December 10, 2025: &#8220;RPO expected to be recognized in the next twelve months grew 40% year over year, compared with 25% last quarter, and 21% last year.&#8221; A-tier.</p><p>[36] Oracle&#8217;s September 2025 and February 2026 bond offerings included tranches with maturities extending to 2055 and 2065. The immediate cash pressure is interest expense ($1.18 billion in Q3 alone, up 32 percent year-over-year, implying an annualized run rate approaching $4.7 billion) and capex funding, not principal repayment. 
<a href="https://investor.oracle.com/investor-news/news-details/2026/Oracle-Announces-Fiscal-Year-2026-Third-Quarter-Financial-Results/default.aspx">Oracle Q3 FY2026 earnings release</a>; <a href="https://www.datacenterdynamics.com/en/news/oracle-raises-25-billion-in-bonds-to-fund-ai-data-center-expansion/">Data Center Dynamics</a>. A-tier (SEC filing).</p><p>[37] OpenAI cumulative cash consumption projections: $218 billion through 2029, per internal projections shared with investors. <a href="https://sherwood.news/markets/what-is-an-rpo-the-number-that-drove-oracles-giant-share-move/">Sherwood News</a>. Profitability not expected until 2029-2030 per Fortune. B-tier (leaked investor materials via quality journalism).</p><p>[38] Moody&#8217;s: &#8220;effectively one of, if not the world&#8217;s largest, project financing&#8221; and &#8220;significant counterparty concentration risk.&#8221; <a href="https://finance.yahoo.com/news/moodys-ratings-flags-significant-counterparty-053942435.html">Yahoo Finance</a>. A-tier (credit rating agency assessment).</p><p>[39] Oracle total debt trajectory: ~$87B (FY2024) &#8594; ~$104B (FY2025) &#8594; ~$125B+ post-February 2026 bond offering. A-tier (<a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001341439&amp;type=10-K&amp;dateb=&amp;owner=include&amp;count=40">Oracle 10-K and 10-Q filings</a>).</p><p>[40] Barclays debt-to-equity comparison: Oracle approximately 500%, Amazon ~50%, Microsoft ~30%. Barclays fixed income research note, November 2025. Note: sourced through secondary financial reporters, as the original research note is not publicly available. B-tier (consistent secondary reporting of A-tier source).</p><p>[41] S&amp;P: BBB, Negative outlook. Moody&#8217;s: Baa2, Negative outlook. Fitch: BBB, Stable. A-tier (credit rating agency publications).</p><p>[42] Barclays fixed income analyst Andrew Keches, Underweight downgrade, November 11, 2025. 
&#8220;Run out of cash by the November 2026 quarter.&#8221; Sourced through financial media. B-tier (consistent secondary reporting).</p><p>[43] Keches: &#8220;could ultimately fall to a BBB-minus rating, nearing the threshold for junk bonds.&#8221; Compared to &#8220;high-risk BBB-minus issuers, such as certain automotive and cable companies.&#8221; Same note and secondary sources.</p><p>[44] Oracle's five-year CDS spread surged past 150 basis points in December 2025, its widest since 2009. <a href="https://www.bloomberg.com/news/articles/2025-12-12/oracle-bonds-trade-like-junk-as-spreads-widen-debt-risk-flares">Bloomberg</a>, &#8220;Oracle Debt Trades Like Junk as Bond, CDS Spreads Flare,&#8221; December 12, 2025. Deutsche Bank reported 156 bps; S&amp;P Global Market Intelligence reported 139 bps on December 11. CDS trading volume climbed to $9.2 billion over the past 10 weeks. B-tier (consistent multi-source reporting).</p><p>[45] Bloomberg, March 5, 2026. TD Cowen estimated 20,000-30,000 employees (12-18% of ~162,000 workforce), freeing $8-10 billion in cash flow. B-tier (quality journalism).</p><p>[46] <a href="https://investor.oracle.com/investor-news/news-details/2026/Oracle-Announces-Fiscal-Year-2026-Third-Quarter-Financial-Results/default.aspx">Oracle Q3 FY2026 earnings release</a>, March 10, 2026. Revenue $17.2 billion (up 22% YoY), non-GAAP EPS $1.79 (up 21%), OCI revenue $4.9 billion (up 84%), cloud revenue $8.9 billion (up 44%). FY2027 revenue guidance raised to $90 billion. A-tier (SEC filing).</p><p>[47] RPO: $553 billion (up 325% YoY, up $29 billion sequentially from $523 billion). Sequential growth decelerated from +$68B in Q2 to +$29B in Q3. Oracle raised $30 billion via investment-grade bonds and mandatory convertible preferred stock; stated intention not to issue additional bonds in calendar 2026. At-the-market equity program not yet initiated. Same earnings release. 
A-tier.</p><p>[48] Oracle disclosed that Q3 large-scale AI contracts are structured with customer prepayments (customer funds for GPU purchases upfront) or customer-supplied GPUs. Per the earnings release: &#8220;Oracle does not expect to have to raise any incremental funds to support these contracts as most of the equipment needed is either funded upfront via customer prepayments so Oracle can purchase the GPUs, or the customer buys the GPUs and supplies them to Oracle.&#8221; This represents a structural shift from Oracle-funded to customer-funded capital expenditure for new AI commitments. A-tier (SEC filing).</p><p>[49] OCI pricing: Oracle&#8217;s published pricing is typically 30-50% lower than comparable AWS/Azure instances in US regions, with consistent global pricing. <a href="https://www.oracle.com/cloud/pricing/">Oracle.com pricing page</a>. C-tier for specific percentage claims (vendor-published). The pricing advantage is directionally confirmed by third-party comparisons, but the magnitude varies by workload.</p><p>[50] MosaicML (now Databricks) reported linear performance scaling across hundreds of GPUs on OCI&#8217;s cluster network in NCCL benchmarks. <a href="https://blogs.oracle.com/cloud-infrastructure/gpu-oci-ai-and-hpc-jobs">Oracle Cloud Infrastructure blog</a>: &#8220;To the best of our knowledge, no cloud provider has posted better results in absolute terms (speed) or as a % of the theoretical maximum (scaling efficiency).&#8221; Reka, founded by researchers from DeepMind, Google Brain, and FAIR, selected OCI for training multimodal models and cited &#8220;high-performance AI infrastructure capabilities, dedicated engineering support, and global footprint.&#8221; <a href="https://www.prnewswire.com/news-releases/oracle-and-reka-collaborate-to-advance-ai-innovation-302120884.html">Oracle press release</a>, April 2024. 
Magouyrk on capacity consumption: <a href="https://www.fool.com/earnings/call-transcripts/2025/12/10/oracle-orcl-q2-2026-earnings-call-transcript/">Q2 FY2026 earnings call</a>, December 10, 2025. B-tier (customer endorsement + vendor benchmark + earnings call disclosure).</p><p>[51] AWS: custom Graviton CPU (4 generations), custom Trainium AI accelerator (2 generations), 20+ years of operations. Google: custom TPU (7 generations). Microsoft: Maia AI accelerator, Cobalt ARM CPU. Data Center Dynamics, 2024. B-tier.</p><p>[52] Ellison on Ampere sale: &#8220;We no longer think it is strategic for us to continue designing, manufacturing, and using our own chips.&#8221; <a href="https://www.sec.gov/Archives/edgar/data/1341439/000119312525314207/orcl-ex99_1.htm">Oracle Q2 FY2026 earnings release</a> (SEC filing, EX-99.1). A-tier.</p><p>[53] Abilene cooling infrastructure disruptions: The Information, via <a href="https://www.datacenterdynamics.com/">Data Center Dynamics</a> and Tom&#8217;s Hardware, March 2026. B-tier (press reporting).</p><p>[54] Stargate LLC is described as dormant: The Information, cited by multiple outlets. Three sources described it as a &#8220;shelved idea.&#8221; B-tier.</p><p>[55] Blue Owl declined Michigan financing: <a href="https://www.cnbc.com/2025/12/17/blue-owl-oracle-michigan-data-center.html">CNBC</a>, December 17, 2025. B-tier (quality journalism).</p><p>[56] Abilene expansion scrapped: Bloomberg, March 6, 2026. B-tier (quality journalism).</p><p>[57] xAI withdrawal: reported across multiple outlets. B-tier (press reporting).</p><p>[58] Oracle&#8217;s legacy database franchise faces multiple pressures. Rimini Street&#8217;s business model is built on undercutting Oracle&#8217;s maintenance contracts &#8212; a direct threat to software support revenue. 
The VA health record system implementation inherited through the Cerner acquisition ($28.3 billion, June 2022) has drawn congressional scrutiny: <a href="https://www.theregister.com/2024/11/14/oracle_cerner_va/">The Register</a>, November 2024. B-tier (quality journalism, industry reporting).</p><p>[59] Oracle software support revenue: $5.7 billion in Q1 FY2026, down 1% YoY in USD and down 2% in constant currency, per <a href="https://investor.oracle.com/investor-news/news-details/2025/Oracle-Announces-Fiscal-Year-2026-First-Quarter-Financial-Results/default.aspx">Oracle Q1 FY2026 earnings release</a>. The trendline has been flat to slightly declining for several years as cloud migration erodes the on-premise installed base. A-tier (SEC filing).</p><p>[60] OpenAI&#8217;s $6.6 billion equity round at $157 billion valuation: Reuters, October 2025. Cumulative cash consumption of $218 billion through 2029: <a href="https://sherwood.news/markets/what-is-an-rpo-the-number-that-drove-oracles-giant-share-move/">Sherwood News</a>, citing internal projections shared with investors. B-tier.</p><p>[61] Microsoft&#8217;s commercial RPO reached $625 billion in Q2 FY2026, with approximately 45% attributable to OpenAI, according to Fortune and Fierce Network analysis. Microsoft&#8217;s total investment in OpenAI is approximately $13 billion. Microsoft 10-Q (A-tier for RPO total); Fortune and Fierce Network (B-tier for OpenAI attribution percentage).</p><p>[62] The author&#8217;s assessment is based on public filings, earnings transcripts, and reporting cited throughout this piece. &#8220;Designs own AI processors&#8221; reflects whether the company designs proprietary processors for its AI infrastructure &#8212; a signal of investment depth, not a prerequisite for capability. &#8220;Self-funded&#8221; reflects whether AI capex is covered by operating cash flow without reliance on debt or equity issuance. 
Stage assignments follow the five-stage access sequence defined in this section.</p>]]></content:encoded></item><item><title><![CDATA[Mistral Succeeded. France's AI Strategy Didn't.]]></title><description><![CDATA[&#8364;109 billion in uncertain AI pledges. 11 ministers in 17 years. The builders invent, the technocrats claim credit. From Micral to Mistral, the pattern is 50 years old.]]></description><link>https://www.airealist.ai/p/mistral-succeeded-frances-ai-strategy</link><guid isPermaLink="false">https://www.airealist.ai/p/mistral-succeeded-frances-ai-strategy</guid><dc:creator><![CDATA[Julien Simon]]></dc:creator><pubDate>Tue, 10 Mar 2026 09:53:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PzwD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PzwD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PzwD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 424w, https://substackcdn.com/image/fetch/$s_!PzwD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 848w, 
https://substackcdn.com/image/fetch/$s_!PzwD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!PzwD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PzwD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png" width="864" height="1228" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1228,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2581704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190424531?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PzwD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 424w, https://substackcdn.com/image/fetch/$s_!PzwD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 
848w, https://substackcdn.com/image/fetch/$s_!PzwD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!PzwD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9c8f8e-8f7a-4a79-88c2-6cc1776cbd23_864x1228.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Updated on April 2nd: Sesterce&#8217;s &#8220;redressement judiciaire&#8221; (court-supervised receivership) has been overturned.</em></p><p>In the 
winter of 1972, in a cellar in Ch&#226;tenay-Malabry, a suburb south of Paris, an engineer named Fran&#231;ois Gernelle worked eighteen-hour days to meet a deadline from the Institut National de Recherche Agronomique. The INRA wanted a cheap way to calculate evapotranspiration. Gernelle, who had left his previous employer because they refused to believe in microprocessors, proposed something no one had built before: a general-purpose computer powered by an Intel 8008, small enough to sit on a desk, sold commercially for 8,500 francs &#8212; roughly a fifth the price of a minicomputer.[1] He delivered it in January 1973. The word &#8220;microcomputer&#8221; first appeared in print to describe his machine.[2]</p><p>Within a decade, Gernelle&#8217;s company R2E had been absorbed by Groupe Bull, the French national champion in computing. By 1983, Gernelle had left. By 1989, the Micral brand was dead.[3] In 2017, Paul Allen &#8212; co-founder of Microsoft &#8212; bought a Micral N at a French auction house for his Seattle museum.[4] France invented the personal computer and shipped it to America as a collectible piece.</p><p>This is not ancient history. It is a script, and France keeps performing it. The state built a sovereign cloud in 2012 and burned &#8364;150 million of public money for &#8364;2 million in revenue before quietly shutting it down.[5] The state commissioned report after report on AI and cycled through digital ministers in rapid succession, each arriving with a strategy and departing before anyone could measure its failure.[6] The state hosted an AI summit in February 2025 where Macron announced &#8364;109 billion in investment pledges, roughly a third to nearly half of which came from a single Abu Dhabi sovereign wealth fund whose money has no legal obligation to arrive.[7]</p><p>And at the center of it all, there is Emmanuel Macron &#8212; not failing at AI, but succeeding at something else entirely. 
On national television, the night before the summit, the president told French citizens to &#8220;go and download Le Chat, which is made by Mistral, rather than ChatGPT&#8221; &#8212; product placement for a company whose founding cap table includes his former Secretary of State for Digital Affairs. He posted AI-generated deepfake videos of himself on Instagram. He fist-bumped a robot at Station F. He told assembled executives: &#8220;I have a good friend on the other side of the ocean saying &#8216;drill, baby, drill.&#8217; Here there is no need to drill. It&#8217;s &#8216;plug, baby, plug.&#8217;&#8221; Day 1 ended with a DJ set. Dario Amodei called it a &#8220;missed opportunity.&#8221; The declaration was signed by sixty countries but not the United States or the United Kingdom. And Macron invoked Notre-Dame &#8212; the cathedral rebuilt in five years &#8212; as the model for AI, as if a construction project managed by retired generals applied to a technology that reinvents itself faster than any committee can meet.[67]</p><p>One year later, at the AI Impact Summit in New Delhi, he was still performing. The &#8364;109 billion in non-binding pledges had become &#8220;we are delivering this project &#8212; &#8364;58 billion in 2025.&#8221; The number comes from a UNCTAD (United Nations Conference on Trade and Development) report that counted <em>announced</em> foreign greenfield projects in France &#8212; and 87 percent of the total consists of two entries: MGX and Brookfield, the same pledges from the Paris summit, reclassified as FDI announcements. No ground has been broken on the MGX campus. 
L&#8217;Usine Nouvelle, reviewing Bercy&#8217;s one-year data, concluded it remained &#8220;difficult to know which investments have actually been realized.&#8221; The same money, counted three times: once as a summit pledge, once as a UNCTAD greenfield announcement, once as &#8220;delivery&#8221; in Delhi.[81]</p><p>He listed Hugging Face &#8212; a company incorporated in New York &#8212; and Poolside &#8212; founded by Americans &#8212; as French AI achievements. He said &#8220;we invested in European large language models&#8221; about Mistral, as if the state had founded a company built by three researchers who left France, trained at American and British labs, and came back with American venture capital. The word &#8220;we&#8221; is the pattern compressed into a pronoun.</p><p>None of this produced a single line of code, a single trained model, or a single company. And yet &#8212; Mistral exists. Hugging Face exists. A generation of Polytechnique graduates trained at <em>American</em> labs is building real AI companies in Paris with <em>American</em> venture capital running on <em>American</em> GPUs. The paradox is not that France fails at AI. The paradox is that French <em>people</em> succeed at AI while the French <em>system</em> fails at everything, including getting out of their way.</p><h2>The numbers the summits don&#8217;t mention</h2><p>Against a backdrop of 5.8 percent deficits, 113 percent debt-to-GDP, and four prime ministers in fourteen months, the state promises a revolution in artificial intelligence.[8][9][10] The promises are a masterclass in political accounting. 
The &#8364;109 billion announced at the AI Action Summit &#8212; which Macron compared favorably to America&#8217;s $500 billion Stargate project &#8212; is an aggregate of multi-year, non-binding pledges from foreign sovereign wealth funds, domestic incumbents, and at least one startup that announced &#8364;52 billion in AI investment on annual revenue a fraction of that size.[11] Here is what the number actually contains, reconstructed from &#201;lys&#233;e press materials and reporting by The Media Leader, Maddyness, and the Journal des Entreprises:</p><p><strong>The &#8364;109 Billion, Broken Down</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yCHO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yCHO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png 424w, https://substackcdn.com/image/fetch/$s_!yCHO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png 848w, https://substackcdn.com/image/fetch/$s_!yCHO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png 1272w, https://substackcdn.com/image/fetch/$s_!yCHO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yCHO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81963401-1731-4775-b262-558f7452538a_1222x1334.png" width="1222" height="1334" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>* Sesterce deserves its own paragraph. The company &#8212; a Marseille-based GPU cloud provider that pivoted from cryptocurrency mining &#8212; announced &#8364;52 billion in AI investment at the summit. The number was calculated, as L&#8217;Usine Digitale reported, by multiplying the catalog price of an Nvidia Blackwell B200 GPU (&#8364;33,000) by 1.2 million units and adding roughly &#8364;12 billion for infrastructure. Sesterce&#8217;s 2023 revenue was &#8364;20 million. The pledge-to-revenue ratio is 2,600-to-1.
The CEO acknowledged to L&#8217;Usine Digitale that the company &#8220;obviously does not have &#8364;52 billion in equity&#8221; and that &#8220;the rest will come with clients and financiers later.&#8221; The &#201;lys&#233;e quietly recorded only the first tranche at &#8364;400 million. Even so, the Journal des Entreprises noted that the &#8364;52 billion claim exceeded total US private AI investment in 2023. The summit accepted it. The headline absorbed it. The &#8364;109 billion got larger. On February 5, 2026 &#8212; five days before the one-year anniversary of its summit announcement &#8212; Sesterce Group entered judicial restructuring (<em>redressement judiciaire</em>) in the Marseille commercial court. The judgment was overturned on March 5, and the company says it has continued operating normally.[12]</p><p>Read the origin column. The top three pledges &#8212; MGX, Brookfield, and FluidStack &#8212; total &#8364;80 billion and come from the UAE, Canada, and the UK, respectively. None is European Union money. Add Amazon, Apollo, Digital Realty, and Prologis, and the non-EU total reaches approximately &#8364;90 billion. The identifiably French commitments &#8212; Iliad, Mistral, Eclairion &#8212; sum to roughly &#8364;5-7 billion.[13]</p><p>The &#8364;109 billion is not a French AI strategy. It is a real estate brochure for foreign data center operators, denominated in non-binding pledges from non-EU capital, with no disbursement schedule, no enforcement mechanism, and no penalty for non-delivery. Gilles Babinet, co-president of the Conseil National du Num&#233;rique, said publicly: &#8220;The reality of this money remains to be seen.&#8221;[13]</p><h2>The other &#8364;110 billion nobody talks about</h2><p>The real money &#8212; the money that actually gets spent, every year, without a summit &#8212; goes somewhere the summits never mention.
The <em>Cr&#233;dit d&#8217;Imp&#244;t Recherche</em>, France&#8217;s R&amp;D tax credit, was created in 1983 as a modest incentive capped at three million francs. It stayed modest for twenty-five years. Then, in 2008, the government uncapped it and shifted the basis from incremental to volume &#8212; 30 percent of <em>all</em> R&amp;D spending up to &#8364;100 million. The cost exploded: &#8364;1.7 billion in 2007, &#8364;4.15 billion in 2008, &#8364;5.2 billion by 2011, and &#8364;7.7 billion today.[14][28] The CNRS scientific council noted in 2014 that the CIR quintupled from &#8364;980 million to &#8364;5.1 billion between 2006 and 2011 without any observable stimulus effect on private research spending.[76]</p><p>Over forty-two years, the CIR has cost the French state approximately &#8364;110 billion in foregone tax revenue &#8212; roughly &#8364;20 billion before the 2008 reform and approximately &#8364;90 billion since.[77] The CIR serves all R&amp;D, not just technology &#8212; pharma, aerospace, automotive. But even by its own terms, it is a band-aid on a wooden leg. The CDI (permanent contract) that makes firing legally expensive, the 35-hour work week, works council consultations that stall decisions an American startup would make in an afternoon, exit barriers, regulatory unpredictability, and ministerial churn &#8212; all make it expensive and risky to do R&amp;D in France.
Rather than fix the leg, the state offers a tax credit to offset the cost of the dysfunction.</p><p>The top fifty beneficiaries, which include Sanofi, Airbus, Safran, and the French subsidiaries of American tech companies, capture approximately 45 percent of the total &#8212; in exchange for paperwork that describes existing engineering as novel research.[15] A 2021 government evaluation found the CIR&#8217;s impact multiplier for large firms is approximately 1.0, with negligible spillover to the broader economy.[16] For smaller firms, the additionality is higher, but they share what remains after the largest beneficiaries have taken their 45 percent.</p><p>One euro in, one euro of R&amp;D that would have happened anyway. French economist Charles Gave has a phrase for how French industrial policy works: &#8220;Cr&#233;er la p&#233;nurie pour distribuer les tickets de rationnement aux plus ob&#233;issants&#8221; &#8212; create the scarcity, then distribute the ration tickets to the most obedient.[41] The CIR is the ration ticket. &#8364;110 billion to treat symptoms &#8212; some of it funding the very foreign labs that trained Mistral&#8217;s founders. Zero to treat the disease.</p><p>The CIR is also the state&#8217;s most effective recruitment tool for foreign capital. The tax credit attracted Google, Meta, and DeepMind to open research labs in Paris, where they trained the generation of researchers who became Mistral&#8217;s hiring pool.[29] Lample and Lacroix worked at Meta&#8217;s FAIR lab in Paris. The ecosystem that shaped them was funded in significant part by French tax incentives flowing to American corporations whose Paris offices exist because the CIR makes French researchers cheaper than Californian ones. France spent billions subsidizing the training ground. America captured the company.
The state paid for the education, subsidized the employer, and then co-invested as a minority investor who led no rounds.</p><p>I know how the CIR works in practice because I wrote the dossiers. At Criteo, assembling the annual CIR claim was a ritual: you took the R&amp;D work the engineers were already doing, repackaged it in the language the Ministry of Research wanted to see &#8212; &#8220;state of the art,&#8221; &#8220;technological uncertainty,&#8221; &#8220;experimental development&#8221; &#8212; and submitted it to recover 30 percent of eligible salary costs. The research would have happened regardless. The CIR didn&#8217;t change what we built. It changed how we described what we built. It is not an innovation policy. It is a payroll subsidy with a scientific vocabulary.[66]</p><p>&#8364;110 billion in CIR tax credits at a 1.0 multiplier. The state had the resources to build a powerful sovereign technology fund. Instead, it invited Abu Dhabi to build one on its behalf.</p><h2>The absorption loop</h2><p>The mechanism is simple. Entrepreneurs innovate; the state absorbs their innovation into institutional frameworks; bureaucratic logic neutralizes competitive speed; the resulting champion fails or stagnates; the state rescues the wreckage, claims credit for whatever escaped, and the political pressure to reform disappears. The loop has been running for fifty years.</p><p>It was not always this way. Jean-Baptiste Colbert, Louis XIV&#8217;s minister of finance, built France&#8217;s manufacturing base in the seventeenth century through state-chartered monopolies, protected markets, and centrally directed investment. Nuclear, TGV, Airbus, Ariane &#8212; the great postwar achievements &#8212; are Colbert&#8217;s methods applied to twentieth-century hardware. And they worked, in part, because they were led by engineers rather than administrators. 
Pierre Guillaumat, a Polytechnicien and Corps des Mines engineer who had fought in the Resistance, ran the CEA, built France&#8217;s nuclear industry, then ran Electricit&#233; de France, then created Elf-Aquitaine.[78] Louis Armand, another Polytechnicien, electrified and modernized the French Railways, then ran Euratom.[79] Marcel Dassault (Supa&#233;ro) built the aviation empire that still bears his name. Andr&#233; Turcat &#8212; Polytechnique again &#8212; was the first European to break the sound barrier and flew Concorde&#8217;s maiden flight.[80] These were Grandes &#201;coles engineers who understood what they were building because they could have built it themselves. The great national projects succeeded because technical authority and decision-making authority were held by the same people.</p><p><strong>The technocrats.</strong> Starting in the 1980s, administrators replaced the engineers. The inner circle of French elite leadership shifted from Polytechniciens and Supa&#233;ro graduates who had run reactors and railways to generalists who had run ministerial cabinets &#8212; trained to manage process, not to judge technology. The distinction is not which school they attended but whether they ever built anything. Colbertism worked when the people pulling the levers had built things. It fails when the levers are pulled by administrators who have never shipped a product, and it fails catastrophically in software, because software moves too fast for committees. AI is the most software-intensive strategic technology in history, and France is running a seventeenth-century playbook against a technology that reinvents itself every six months.</p><p>France has had eleven digital ministers in seventeen years.
Here is what they brought and what they left behind.[34]</p><p><strong>The Minister Carousel</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DGZx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DGZx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 424w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 848w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DGZx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png" width="1086" height="1280" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1280,&quot;width&quot;:1086,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:265986,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.airealist.ai/i/190424531?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DGZx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 424w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 848w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!DGZx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b2e2b7-7483-475a-b71b-d71b9311bd35_1086x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Four more followed &#8212; Barrot, Ferrari, Chappaz, Le H&#233;nanff &#8212; none with tech operating experience, none lasting longer than twenty months, none passing legislation.[34]</p><p>Eleven ministers in seventeen years. One passed legislation &#8212; Axelle Lemaire&#8217;s Loi pour une R&#233;publique num&#233;rique in 2016. The rest produced plans, labels, and summits, then moved on before anyone could measure the results. The CNNum was dissolved in 2025.[35] La French Tech is a marketing success.  
The reforms that actually improved the startup ecosystem (the flat tax, the PACTE law) came from the Ministry of Finance, not from La French Tech, which had no role in designing them.[36] The system&#8217;s signature move: create an institution, give it no power, claim its existence proves action, and dissolve it when the next minister arrives.</p><p>When the government was confronted with a technology it did not understand, it did what it always does: commissioned a report. The 2018 Villani report: 233 pages from a Fields Medalist whose core team included not a single entrepreneur, and which contained no analysis of the venture capital structures or compute costs already driving American AI labs at ten times the scale of French public investment.[30][31] Two-thirds of its recommendations were reportedly implemented; none had any documented connection to a globally competitive French AI company.[32] The Cour des Comptes (France&#8217;s supreme audit institution) confirmed: Phase 2 of the national AI strategy was launched without evaluating Phase 1.[33]</p><p>The report was the product. It landed on the desk of INRIA, designated State Operator for the AI strategy &#8212; 3,800 researchers who produced Scikit-learn and OCaml for the global commons, spun off 230 startups in thirty-five years without producing a single unicorn, and trained a generation of engineers who left to build somewhere else. The pattern at the institutional scale: France funds the research, the world captures the value.[60]</p><h2>&#8364;176 to eight figures</h2><p>The revolving door between government, industry, and regulation &#8212; <em>pantouflage</em>, the French term for officials trading public authority for private profit &#8212; is not corruption in the legal sense.
It is the absorption loop operating at the personnel level.</p><p>C&#233;dric O served as Macron&#8217;s Secretary of State for Digital Affairs from 2019 to 2022 &#8212; overseeing France&#8217;s AI strategy and its position on EU tech regulation. Eleven months after leaving government, he joined the founding cap table of Mistral AI in April 2023, listed as &#8220;conseiller-fondateur&#8221; &#8212; advisor-founder. His entry ticket: &#8364;176.10, purchasing 17,610 shares at one cent each through his consulting company Nopeunteo, giving him 1.17 percent of the initial capital.[57] The three technical founders &#8212; Mensch, Lample, and Lacroix &#8212; held 95.3 percent. O&#8217;s contribution was not technical. It was political access. By December 2023, that stake was worth approximately &#8364;23 million. By June 2024, after the Series B valued Mistral at &#8364;6 billion, roughly &#8364;70 million. By September 2025, the Series C valued Mistral at &#8364;11.7 billion &#8212; four dilutive rounds have reduced O&#8217;s percentage, but the trajectory from &#8364;176 to eight or nine figures in under three years is not in dispute. He cashed out approximately &#8364;1 million by selling shares during the Series A. The Haute Autorit&#233; pour la Transparence de la Vie Publique had explicitly prohibited O from lobbying his former ministries &#8212; he structured his investment around the restriction by purchasing shares through his consulting company rather than in his own name. The HATVP declined to comment. [57] </p><p>And then he used his political network to lobby against the very regulation he had overseen. 
In June 2023, he co-organized a 150-executive open letter against the AI Act&#8217;s foundation model provisions.[69] In October, he told Sifted the AI Act could &#8220;kill Mistral.&#8221; In November, he coordinated with the &#201;lys&#233;e to align France, Germany, and Italy against binding foundation model rules during the final negotiations on the AI Act, replacing them with voluntary self-regulation. It was Mistral&#8217;s exact position, adopted as French foreign policy. Bloomberg Businessweek reported that activists responded by driving vans through the streets of Paris and Brussels bearing O&#8217;s photograph and a digital billboard reading: &#8220;EU AI Act Without Foundation Models = Climate Act That Excludes Big Oil.&#8221;[70] O has since left Mistral to raise &#8364;10 million for a new edtech startup.[58]</p><p>Read the sequence again: minister, founding shareholder, lobbyist, equity holder, exit &#8212; in eighteen months. O&#8217;s career is the absorption loop compressed into a single biography. The system does not circulate its elites despite the conflict of interest. The system circulates its elites <em>through</em> the conflict of interest. The conflict is the mechanism.</p><h2>The companies they killed</h2><p>The Micral &#8212; the personal computer France invented and shipped to a Seattle museum &#8212; is the founding exhibit. But it was not an isolated case. Plan Calcul (1966) collapsed within a decade.[17] Bull consumed over $1 billion in subsidies for 1 percent margins (1982).[18] Thomson&#8217;s no-bid contract produced computers that teachers hated so much that the program was abandoned by 1985.[19] Minitel pioneered online services a decade before the web &#8212; then France T&#233;l&#233;com&#8217;s monopoly delayed internet adoption by years.[73] Different decades, different technologies, identical script.</p><p>If you wanted a career that embodies the pattern, you could not improve on Thierry Breton&#8217;s. 
The state privatized Bull &#8212; France&#8217;s flagship computer manufacturer, builder of the supercomputers that ran the country&#8217;s nuclear simulations. Bull was absorbed by Atos, which Breton built into one of Europe&#8217;s largest IT services companies, with 100,000 employees across 73 countries, and an IT partner for the Olympic Games. As CEO from 2008 to 2019, Breton oversaw the acquisition of Syntel for $3.57 billion (approximately &#8364;3.1 billion at the time) &#8212; financed entirely by debt.[24] At its 2017 peak, Atos had a market capitalization of approximately &#8364;12 billion. By 2024, more than 98 percent of that value had been destroyed, and the company was drowning in &#8364;4.65 billion of debt.[25] Worldline, the payment subsidiary Breton spun off from Atos &#8212; Europe&#8217;s largest payment processor, handling billions of transactions a year &#8212; lost 59 percent of its value in a single session under a CEO who had served as Breton&#8217;s directeur de cabinet &#8212; and was ejected from the CAC 40 (France&#8217;s blue-chip index).[75] The French state then bought back Bull&#8217;s supercomputer division from the wreckage at distressed prices. Breton&#8217;s response to the collapse: &#8220;I have no responsibility, zero.&#8221;[26]</p><p>In any functioning system, this would end a career. In France, Breton became the European Commissioner for the Internal Market &#8212; where he championed an AI Act approach that treated every foundation model, including open-weight research releases, identically to a medical device until months of industry pushback forced a restructuring.[27] Atos destroyed. Bull repurchased. Breton promoted. The loop does not punish failure. The loop promotes it.</p><h2>The infrastructure desert</h2><p>The sovereign cloud chapter is the proof. 
The state invested &#8364;150 million through the Caisse des D&#233;p&#244;ts (the state investment bank), split between two competing consortia &#8212; Cloudwatt (Orange/Thales) and Numergy (SFR/Bull) &#8212; because Orange and SFR refused to cooperate.[20] The hiring told you everything: Cloudwatt recruited leadership from Dell, HP, and Orange &#8212; enterprise hardware executives, not cloud engineers. I know because they asked me to interview. It was a fifty-person company that was supposed to compete with AWS, staffed like a mid-tier telco equipment vendor.[56] By 2014, Cloudwatt had generated approximately &#8364;2 million in revenue against an original target of &#8364;500 million.[21] Numergy managed &#8364;6 million, 80 percent of which came from a single SFR internal contract.[22] The Ministry of Finance stopped disbursing funds in 2015.[23] Orange absorbed Cloudwatt; SFR absorbed Numergy. The platforms were quietly shuttered.</p><p>The state&#8217;s response was not to build domestic capability. It was to rebrand the dependency. France&#8217;s next attempt &#8212; S3NS (Google/Thales) and Bleu (Microsoft/Orange/Capgemini) &#8212; runs 100 percent American technology in a French legal wrapper, operated by the same companies that failed at Cloudwatt.[37] ANSSI, the national cybersecurity agency, certified S3NS in December 2025, and its director general, Vincent Strubel, was blunt: SecNumCloud is &#8220;a cybersecurity tool, not industrial policy.&#8221; S3NS and Bleu could maintain operations for only six to twelve months without access to American updates. Imagining that fully European cloud offerings exist is, Strubel said, a fantasy.[37] Legal sovereignty without operational sovereignty is a lease with an expiry date.
The &#8364;109 billion features a data center campus whose largest investor, MGX, is controlled by the ruling family of Abu Dhabi, and whose commitment to French digital sovereignty aligns with its expected return on capital.[39] At every layer of the stack &#8212; chips, cloud, models &#8212; France depends on foreign actors while using French legal wrappers to claim ownership of the dependency.</p><p>The closest thing France has to a sovereign cloud is Scaleway &#8212; built not by Orange, Atos, or any national champion, but by Xavier Niel, a maverick entrepreneur who never attended a Grande &#201;cole and made his first fortune running adult chat services on Minitel Rose.[65] Niel broke France T&#233;l&#233;com&#8217;s monopoly by <em>fighting</em> the state, not by contracting with it. Through Scaleway, he has deployed Europe&#8217;s largest cloud-native AI compute cluster. Mistral trained early models on it. He funds roughly a hundred startups a year through Kima Ventures. The system does not produce its own infrastructure builders. The outsider does.</p><p>Strip out Niel, and what remains of the French AI &#8220;ecosystem&#8221; is Rodolphe Saad&#233; &#8212; chairman of CMA CGM, the world&#8217;s third-largest shipping company, one of Macron&#8217;s closest business allies &#8212; whose &#8364;500 million in AI spending is indistinguishable from a relationship maintenance program.[62][63][64]</p><h2>The state-provided compute</h2><p>Jean Zay, France&#8217;s showcase public AI facility, went five years without a GPU upgrade &#8212; 2019 to mid-2024 &#8212; during which every frontier lab in the world moved on. 
The mid-2024 upgrade created a chimera of three GPU generations that no frontier lab would train on.[60] For context, Mistral trained Large 3 on 3,000 H200 GPUs through Azure &#8212; twice the size of Jean Zay&#8217;s entire H100 partition, a chip generation newer and available on demand without a grant application.[61]</p><p>Hugging Face used Jean Zay to train BLOOM in 2022. INRIA still cites BLOOM as a flagship achievement. What INRIA does not advertise is what happened next: for StarCoder and every subsequent training run, Hugging Face moved to AWS.[60] The state promises sovereignty. The practitioners choose whatever actually works.</p><p>Alice Recoque, France&#8217;s forthcoming exascale supercomputer, uses AMD processors fabbed at TSMC in Taiwan.[40] The machine that is supposed to deliver sovereign compute depends on American chips manufactured on a Chinese-claimed island. The state&#8217;s answer to every layer of dependency is another layer of dependency with a French label on top.</p><h2>What actually exists</h2><p>One part of the French system genuinely works for AI: the Grandes &#201;coles produce mathematicians and engineers that DeepMind and Meta hire in their research labs, in Paris and elsewhere. Some of those researchers then leave and start new companies. This pipeline produced Mistral&#8217;s founders, Hugging Face&#8217;s founders, and LeCun himself. It is genuinely world-class, and it works because competitive examination has proven more resistant to political interference than industrial policy. The pipeline succeeds because its selection mechanism is harder to corrupt than the apparatus described in every other paragraph of this piece.</p><p>If the pipeline were the strategy, the state would fund the Grandes &#201;coles, create an innovation- and growth-friendly environment, and stop there. No summits, no reports, no ministers, no &#8364;109 billion in performative pledges, no &#8364;7.7 billion CIR. 
Instead, the state funds the pipeline <em>and</em> builds the apparatus &#8212; an apparatus that produces no companies, consumes &#8364;110 billion in tax credits with a 1.0 multiplier, and claims credit for the pipeline&#8217;s outputs. The apparatus is not the strategy. The apparatus is the tax on the strategy.</p><h2>The outsiders</h2><p>Mistral is real. Founded in April 2023 by Arthur Mensch (ex-DeepMind), Guillaume Lample, and Timoth&#233;e Lacroix (both ex-Meta), it raised a &#8364;1.7 billion Series C at a &#8364;11.7 billion valuation in September 2025, with ASML as the lead investor.[42] ARR has crossed $400 million, and Mensch has guided toward &#8364;1 billion in revenue for 2026; roughly 60 percent comes from European clients seeking alternatives to American providers.[43][44] But France&#8217;s system had nothing to do with producing it. Mistral is a company founded by three researchers who left France, trained at American and British labs on American infrastructure, returned with American venture capital, and chose to incorporate in Paris because &#8220;it&#8217;s where we&#8217;re from&#8221;&#8212;a personal preference, not an industrial strategy.[47] The talent is Polytechnique. The capital is Lightspeed, Andreessen Horowitz, and General Catalyst. The compute is Nvidia through Azure. BPI France co-invested in every round but led none.[45][38][46]</p><p>And the loop is already reaching for it. C&#233;dric O is on the founding cap table. Macron promotes Le Chat on national television. Mensch is at the signing table &#8212; standing beside Macron at the February 2025 summit, then again in New Delhi a year later, while the president once again claims &#8220;we invested&#8221; in Mistral. Mistral began by releasing open-weight models under an Apache 2.0 license. 
Its most capable systems are now behind commercial licenses and behind sovereignty contracts.[82] The company that started open is closing, and the closing tracks the state&#8217;s gravitational pull.</p><p>Hugging Face &#8212; arguably France&#8217;s single most important contribution to the global AI ecosystem &#8212; was co-founded in 2016 by three French nationals who incorporated in America and built a large team in France. $4.5 billion valuation on American cap tables.[54] Delangue testified before the US House on the importance of open-source AI, participated in the Senate&#8217;s AI Insight Forum, and submitted an open-source policy blueprint for the White House AI Action Plan &#8212; advocating for the technology&#8217;s future from inside the American policy apparatus, not the French one. No cabinet minister on the cap table. No summit photo ops. No sovereignty contracts. They kept their distance, and the loop has not reached them &#8212; so far.</p><p>LeCun chose Paris for Meta's FAIR research lab in 2015 &#8212; funded by <em>Meta</em>, not the French state, because French researchers are world-class and French salaries are lower than San Francisco's.[55] In March 2026, he left Meta and launched AMI Labs, a world-model startup headquartered in Paris. It just raised $1.03 billion at a $3.5 billion valuation &#8212; Europe's largest seed round &#8212; from Nvidia, Temasek, Bezos Expeditions, and Eric Schmidt. The CEO is an alumnus of FAIR Paris. The key hires come from Meta and DeepMind. BPI France co-invested. It did not lead. The state interprets this as a win. It is a precise repetition of the pattern: France trained the talent, America built the career, and when the company was founded, the state showed up as a minority co-investor in a round led by Nvidia and Bezos.</p><p>Beyond those three, there is not much. 
H Company, another French AI foundation-model startup, lost three of its five co-founders within a year of its launch.[48] Kyutai produces research, not revenue. LightOn raised &#8364;62 million through an IPO, valuing it as a small-cap.[49] Poolside, which relocated to Paris, was founded by Americans.[50]</p><h2>The first French unicorn turns 20 and leaves</h2><p>In 2013, Fleur Pellerin &#8212; then minister for the digital economy &#8212; was sipping champagne at Criteo cocktail parties, celebrating France&#8217;s first true tech unicorn. (Full disclosure: I was VP Engineering at Criteo from 2010 to 2014 and hold no financial interest in the company.)[53] Founded in 2005, Criteo built an ML-powered performance advertising platform at scale, listed on NASDAQ in 2013, and runs one of the largest production machine learning systems in Europe. The French state&#8217;s reward for building one of Europe&#8217;s few globally competitive ML systems: in June 2023, the CNIL &#8212; France&#8217;s data protection authority &#8212; fined Criteo &#8364;40 million for GDPR violations related to consent verification &#8212; approximately four times the company&#8217;s 2022 net profit. The Conseil d&#8217;&#201;tat (France&#8217;s highest administrative court) upheld it in March 2026.[68] The same state that subsidizes R&amp;D through the CIR fines the company that actually shipped production ML. The regulatory arm does not know what the innovation arm is doing &#8212; or does not care.</p><p>In October 2025, Criteo announced it was redomiciling to Luxembourg &#8212; the intermediate step. The destination is the United States. Shareholders approved the move on February 27, 2026.[51] The company&#8217;s own FAQ states the reason with corporate bluntness: &#8220;French law does not provide a framework for direct merger into a U.S.
corporation.&#8221;[52] France&#8217;s <em>legal structure</em> prevents a French company from being acquired through a direct merger by an American buyer. This is not a bug in French corporate law. It is the absorption loop codified in statute &#8212; and the mechanism that prevents departure also prevents the liquidity events that fund the next generation of startups. Without exits, no capital recycling, no serial founders, no ecosystem densification. Criteo&#8217;s departure is not a loss. It is a diagnosis.</p><p>Criteo cannot stay French. Mistral cannot stay sovereign. The system fails at both ends &#8212; it cannot retain what it builds, and it cannot build what it claims.</p><p>Mistral may yet break the pattern. Mensch is a builder, not a bureaucrat, and $400 million in ARR is not a summit pledge. More power to Mistral, but the fifty-year record is clear. France invented the personal computer in 1973 and shipped it to a museum in Seattle. In 2026, it is shipping its first tech unicorn to Luxembourg, en route to America. The technology changed. The script did not.</p><div><hr></div><h3>Notes</h3><p>[1] The Micral N was delivered to INRA in January 1973 and commercialized in February 1973 for FF 8,500 (approximately $1,750). Based on the Intel 8008 microprocessor, clocked at 500 kHz. <a href="https://ethw.org/Milestones:Micral_Microcomputer">IEEE Milestones</a>; <a href="https://www.computerhistory.org/timeline/1973/">Computer History Museum</a>; Fran&#231;ois Gernelle, Computer Timeline. The development team included Gernelle, Alain Lacombe, Jean-Claude Beckmann, and Maurice Bench&#233;trit, working in a cellar in Ch&#226;tenay-Malabry. The Computer History Museum describes the Micral as &#8220;one of the earliest commercial, non-kit personal computers.&#8221;</p><p>[2] The term &#8220;microcomputer&#8221; first appeared in print in reference to the Micral. 
<a href="https://en.wikipedia.org/wiki/Micral">Wikipedia, &#8220;Micral&#8221;</a>, citing the January 1974 Users Manual; IEEE Milestones documentation.</p><p>[3] R2E was acquired by Groupe Bull in the late 1970s (some sources cite 1978-1979 for the sale, with formal absorption by 1981). Fran&#231;ois Gernelle left Bull in 1983 to found Forum. The Micral brand was eliminated in 1989 when Bull merged it with Zenith Data Systems, another acquisition. Jacky Dubois, a former Bull Micral engineer, quoted in Absomod history.</p><p>[4] Paul G. Allen purchased a Micral N at the <a href="https://www.rouillac.com/">Rouillac auction house</a> at Ch&#226;teau d&#8217;Artigny, France, on June 11, 2017, for his Seattle museum Living Computers: Museum + Labs. Wikipedia, &#8220;Micral.&#8221;</p><p>[5] The Androm&#232;de project, renamed and split into Cloudwatt (Orange/Thales) and Numergy (SFR/Bull) in 2012, received &#8364;150 million from the Caisse des D&#233;p&#244;ts &#8212; &#8364;75 million per project &#8212; as confirmed by Minister Fleur Pellerin at the Cloudwatt launch in October 2012 (L&#8217;Usine Nouvelle) and <a href="https://www.senat.fr/rap/r12-443/r12-4435.html">S&#233;nat Rapport d&#8217;information n&#176; 443</a>. By 2014, Cloudwatt&#8217;s revenue was approximately &#8364;2 million, according to Les Echos financial reporting cited by ChannelNews and Clubic, against an original target of &#8364;500 million by 2017. The Ministry of Finance confirmed in early 2015 that less than half the committed funds had been disbursed (Les Echos, cited in Silicon.fr). 
Cloudwatt was absorbed by Orange in 2015 and decommissioned in February 2020.</p><p>[6] Since the creation of the first dedicated digital affairs portfolio in 2012, France has appointed approximately ten ministers or secretaries of state with digital/num&#233;rique responsibilities: Fleur Pellerin (2012-2014), Axelle Lemaire (2014-2017), Mounir Mahjoubi (2017-2019), C&#233;dric O (2019-2022), Jean-No&#235;l Barrot (2022-2024), Marina Ferrari (2024), Clara Chappaz (2024-present), with interim holders including Prisca Thevenot and others. The count varies depending on whether interim and non-portfolio holders are included. Average tenure is approximately 15 months.</p><p>[7] Macron announced the &#8364;109 billion figure at Station F on February 11, 2025: &#8220;On a obtenu 109 milliards d&#8217;euros d&#8217;investissements priv&#233;s fran&#231;ais et &#233;trangers pour l&#8217;IA en France.&#8221; (&#8220;We have secured &#8364;109 billion in French and foreign private investment for AI in France.&#8221;) &#201;lys&#233;e official transcript. The UAE/MGX data center campus was described by the &#201;lys&#233;e as requiring &#8220;des investissements d&#8217;un ordre de grandeur de 30 &#224; 50 milliards d&#8217;euros&#8221; (&#8220;investments on the order of &#8364;30 to &#8364;50 billion&#8221;). &#201;lys&#233;e press materials, February 2025. At the upper bound (&#8364;50B/&#8364;109B), this represents approximately 46%; at the lower bound (&#8364;30B/&#8364;109B), approximately 28%. Brookfield (Canada) pledged an additional &#8364;20 billion. See also Franceinfo, Public S&#233;nat, and Journal des Entreprises breakdowns noting that &#8220;l&#8217;addition menant &#224; 109 est loin d&#8217;&#234;tre simple &#224; comprendre&#8221; (&#8220;the arithmetic adding up to 109 is far from simple to follow&#8221;).</p><p>[8] INSEE first estimate, January 30, 2026: France's GDP growth in 2025 was 0.9%. EU average GDP growth in 2025 was approximately 1.8% per the European Commission's Winter 2026 forecast. The original European Commission Autumn 2025 projection was 0.7%, used in some earlier reporting.</p><p>[9] INSEE, March 2026: France's general government deficit was 5.8% of GDP in 2024.
The Maastricht Treaty limit is 3.0%. The 2025 deficit was confirmed at approximately 5.4% per the Cour des Comptes. Source: <a href="https://www.franceinfo.fr/economie/deficit/le-deficit-public-de-la-france-atteint-5-8-du-pib-en-2024-un-peu-mieux-que-les-6-anticipes-par-le-gouvernement_7154568.html">Franceinfo</a>.</p><p>[10] INSEE: government debt at 113.2% of GDP in 2024. European Commission projection: 120% by 2027. EU debt-to-GDP Maastricht reference value: 60%.</p><p>[11] &#201;lys&#233;e official transcript, February 11, 2025. Macron announced the &#8364;109 billion figure at Station F and explicitly compared it to the US Stargate project. The Sesterce claim: a sixty-person Marseille startup announced &#8364;52 billion in AI investment; the &#201;lys&#233;e recorded only the first &#8364;400 million tranche. Journal des Entreprises (April 2025) noted that the &#8364;52 billion exceeded total US private AI investment in 2023 per Stanford HAI data.</p><p>[12] Breakdown table compiled from: &#201;lys&#233;e press materials (February 2025); The Media Leader FR (February 10, 2025), &#8220;IA : d&#8217;o&#249; viendront les 109 milliards d&#8217;euros d&#8217;investissements d&#8217;ici 2031&#8221;; Maddyness (February 10, 2025), &#8220;Ce que contient le plan &#224; 109 milliards&#8221;; Journal des Entreprises (April 2025); Public S&#233;nat (February 2025); Nouvelles-Technologies.eu (February 2025). The MGX pledge of &#8364;50 billion is the upper bound of the &#201;lys&#233;e&#8217;s own &#8220;30 &#224; 50 milliards&#8221; range. BPI France separately announced &#8364;10 billion for innovation over 2024-2029 (not solely AI). The Choose France summit (May 2025) confirmed &#8364;20.8 billion of the &#8364;109 billion as &#8220;concr&#233;tis&#233;s&#8221; &#8212; less than 20% materialized within three months. 
Sesterce details: <a href="https://www.usine-digitale.fr/article/sommet-ia-le-francais-sesterce-veut-deployer-1-2-millions-de-gpu-pour-l-ia-d-ici-2030.N2227330">L&#8217;Usine Digitale, February 12, 2025</a> &#8212; the &#8364;52B calculation method (GPU catalogue price &#215; quantity + infrastructure), the CEO&#8217;s acknowledgement that &#8220;aucun n&#8217;a encore r&#233;colt&#233; l&#8217;ensemble des fonds&#8221; (&#8220;none has yet collected all of the funds&#8221;), and the 2023 revenue figure of &#8364;20M. Sesterce background (crypto mining pivot): <a href="https://www.solutions-numeriques.com/sesterce-veut-etre-le-cloud-de-reference-des-ia-souveraines/">Solutions Num&#233;riques, November 5, 2024</a>. Sesterce Group entered judicial restructuring (<em>redressement judiciaire</em>) on February 5, 2026, per the Marseille commercial court filing on <a href="https://www.pappers.fr/entreprise/sesterce-group-902372481">Pappers</a> &#8212; five days before the one-year anniversary of its &#8364;52 billion summit announcement. <a href="https://data.inpi.fr/entreprises/902372481?q=sesterce%20group#902372481">March 5 overturning ruling</a>.</p><p>[13] Gilles Babinet, co-president of the Conseil National du Num&#233;rique, quoted in Journal des Entreprises (April 2025): &#8220;la r&#233;alit&#233; de cet argent reste &#224; voir&#8221; (&#8220;the reality of this money remains to be seen&#8221;). JDE also quoted France Datacenters estimating the French data center market would grow &#8220;de plus de 10% par an jusqu&#8217;en 2030 ou 2035&#8221; (&#8220;by more than 10% per year until 2030 or 2035&#8221;) &#8212; strong growth, but not remotely sufficient to absorb &#8364;109 billion in new capacity. Babinet separately noted: &#8220;Si ces chiffres sont vrais, ils sont tr&#232;s cons&#233;quents, je ne sais pas comment le march&#233; peut les absorber.&#8221; (&#8220;If these figures are real, they are very substantial; I do not know how the market can absorb them.&#8221;)</p><p>[14] S&#233;nat, Commission des Finances, Projet de Loi de Finances 2025: the CIR is budgeted at &#8364;7.7 billion for 2025 across approximately 15,500 beneficiary companies.
For 2024, the S&#233;nat PLF 2024 report cited &#8364;7.6 billion. IFRAP cited &#8364;7.65 billion for 2024.</p><p>[15] S&#233;nat PLF 2024 and PLF 2025: &#8220;les cinquante premi&#232;res entreprises b&#233;n&#233;ficiaires du CIR concentrent &#224; elles seules pr&#232;s de 45% du b&#233;n&#233;fice du dispositif&#8221; (&#8220;the top fifty CIR beneficiary companies alone account for nearly 45% of the scheme&#8217;s benefit&#8221;).</p><p>[16] CNEPI (Comit&#233; National d&#8217;&#201;valuation des Politiques d&#8217;Innovation), 2021 evaluation of the CIR, published by <a href="https://www.strategie.gouv.fr/publications/evaluation-du-credit-dimpot-recherche">France Strat&#233;gie</a>. The ~1.0 multiplier finding applies specifically to large firms; smaller firms show higher additionality.</p><p>[17] The Plan Calcul (1966-1976) invested over $100 million in its first five years. CII (Compagnie Internationale pour l&#8217;Informatique) was its primary output. The Unidata consortium (CII, Siemens, Philips) collapsed in 1975. The program has been assessed by historians as commercially unsuccessful, though its educational investments (Baccalaur&#233;at H, vocational training) are credited as a separate success. <a href="https://en.wikipedia.org/wiki/Plan_Calcul">Wikipedia, &#8220;Plan Calcul&#8221;</a>; Wargaming Scribe research on French computing history.</p><p>[18] <a href="https://www.encyclopedia.com/economics/economics-magazines/bull-sa">Encyclopedia.com, &#8220;Bull S.A. History&#8221;</a>: Bull received over $1 billion in government subsidies between 1983 and 1990. Revenue of approximately $5.3 billion with roughly 1% profit margin. Nationalized in 1982, partially privatized in 1994.</p><p>[19] The Plan Informatique Pour Tous (1985-1989) awarded a contract to the nationalized Thomson without a public tender, despite the availability of an Apple proposal.
Wargaming Scribe: &#8220;The MOTO computers were slow, their graphics lacklustre, their sound primitive, their peripherals often incompatible between MO5 and TO7, and the series had barely any software when the Plan was launched.&#8221;</p><p>[20] <a href="https://www.senat.fr/rap/r12-443/r12-4435.html">S&#233;nat Rapport d&#8217;information n&#176; 443</a>, &#8220;L&#8217;Union europ&#233;enne, colonie du monde num&#233;rique?&#8221; documents the CDC structure: &#8364;75 million per project, CDC at 33% of each entity. Minister Pellerin confirmed the total of &#8364;150 million at the Cloudwatt launch in October 2012. The split occurred because Orange and SFR refused to cooperate &#8212; the original Androm&#232;de project was designed as a single entity.</p><p>[21] ChannelNews, October 2014: Cloudwatt expected approximately &#8364;2 million in revenue for 2014. Cloudwatt&#8217;s original target was &#8364;500 million by 2017 (reduced to &#8364;200 million by new management). Numergy generated approximately &#8364;6 million, with 80% from SFR internal contracts, per CEO Philippe Tavernier&#8217;s own admission.</p><p>[22] ChannelNews, October 2014; confirmed by Frenchweb, September 2015.</p><p>[23] Les Echos, cited by Silicon.fr, March 2015: the Ministry of Finance confirmed &#8220;un peu moins de la moiti&#233; de l&#8217;argent a &#233;t&#233; d&#233;bloqu&#233;&#8221; (&#8220;a little less than half of the money has been disbursed&#8221;). The remainder was redirected to other programs.</p><p>[24] Atos press release, 2018: Syntel acquisition price of $3.57 billion (approximately &#8364;3.4 billion), financed by debt. Tech Monitor analysis; Fortune, &#8220;The Epic Fall of Atos.&#8221;</p><p>[25] Atos (ATO.PA) historical prices: peak market capitalization of approximately &#8364;11-12 billion in late 2017 (Yahoo Finance). At its 2024 nadir, the share price had declined by more than 98% from the 2017 peak. Debt of &#8364;4.65 billion at the end of 2023, per Reuters.
As of early 2026, market capitalization has partially recovered to approximately &#8364;800 million to &#8364;1.3 billion.</p><p>[26] Bloomberg, interview with Thierry Breton regarding Atos collapse: &#8220;I have no responsibility, zero.&#8221; Breton served as Atos CEO from 2008 to November 2019, then became EU Commissioner for the Internal Market in December 2019.</p><p>[27] On Breton&#8217;s AI Act approach: the European Commission&#8217;s initial proposal (April 2021, under Breton&#8217;s portfolio) classified AI systems solely by risk tier without a separate regime for general-purpose AI or foundation models. When foundation models emerged as a regulatory question in 2022-2023, the Commission&#8217;s initial approach was to apply high-risk conformity assessment to all foundation models regardless of application &#8212; an approach that drew criticism from France&#8217;s own digital ministry, which led a coalition of EU member states arguing for a lighter-touch regime for open models. The resulting compromise (the GPAI provisions in the final AI Act, December 2023) was substantially restructured from Breton&#8217;s original framework. Politico EU, Reuters, and Euractiv covered the negotiations extensively. On the Atos acquisition: the French state negotiated to acquire Atos&#8217;s advanced computing division (including Bull&#8217;s supercomputer business) to protect national defense capabilities. Tech Monitor; Fortune; Reuters coverage of Atos restructuring, 2024.</p><p>[28] <a href="https://www.oecd.org/sti/rd-tax-incentive-country-profile-france.pdf">OECD data on R&amp;D tax incentives</a>. France&#8217;s CIR is consistently ranked among the largest R&amp;D tax credits in the OECD relative to GDP.</p><p>[29] Google opened its Paris AI research lab in 2018; Meta&#8217;s FAIR Paris lab has been operational since 2015; DeepMind opened a Paris office.
Arthur Mensch worked at DeepMind London, but the broader French AI research ecosystem, partly funded by CIR-eligible institutions, trained the talent pool from which Mistral recruited.</p><p>[30] Acteurs Publics, Villani report PDF (March 2018): the core mission team composition. Olivier Ezratty&#8217;s detailed analysis of the report documented the team composition and absence of entrepreneur representation. The broader mission conducted approximately 400 expert hearings.</p><p>[31] <a href="https://www.vie-publique.fr/rapport/37225-donner-un-sens-lintelligence-artificielle-ia">Villani report, &#8220;Donner un sens &#224; l&#8217;intelligence artificielle&#8221;</a> (March 2018), 233 pages. The report does not contain a venture capital analysis section or a compute requirements analysis, despite OpenAI&#8217;s &#8220;AI and Compute&#8221; analysis (May 2018) documenting the exponential growth in training compute that was already visible by late 2017. By 2018, DeepMind and OpenAI were spending tens of millions on individual training runs&#8212;a capital-intensity trajectory the report did not examine. The framing primarily centers on ethics, education, health, and public policy. CNRS president Petit publicly expressed concerns about the ethics-first framing. Silicon Republic profile of Villani.</p><p>[32] France Strat&#233;gie tracking of Villani report implementation. The &#8220;two-thirds implemented&#8221; figure is based on government self-reporting; independent assessments of implementation quality vary significantly.</p><p>[33] Cour des Comptes, report on the national AI strategy (November 2025): Phase 2 was launched without a formal evaluation of Phase 1 outcomes.</p><p>[34] <a href="https://fr.wikipedia.org/wiki/Liste_des_ministres_fran%C3%A7ais_du_Num%C3%A9rique">Liste des ministres fran&#231;ais du Num&#233;rique, Wikipedia</a>. The digital portfolio was created in March 2008. 
Minister count (11) includes &#201;ric Besson&#8217;s two non-consecutive terms. Tenure is calculated from the date of appointment to the date of departure. Portfolio titles varied: &#8220;D&#233;veloppement de l&#8217;&#233;conomie num&#233;rique,&#8221; &#8220;Num&#233;rique,&#8221; &#8220;Transition num&#233;rique et T&#233;l&#233;communications,&#8221; &#8220;Intelligence artificielle et Num&#233;rique.&#8221; The portfolio rank also fluctuated between secr&#233;taire d&#8217;&#201;tat (junior minister) and ministre d&#233;l&#233;gu&#233; (delegated minister), the latter carrying marginally more protocol weight and guaranteed attendance at the Conseil des ministres. Background details verified against individual Wikipedia pages, HATVP declarations, and government biographies at info.gouv.fr. Barrot statistic (&#8220;2 of 35 parliamentary questions&#8221;) from <a href="https://www.euractiv.fr/section/economie/news/numerique-le-gouvernement-francais-se-dote-dun-nouveau-ministre-delegue/">Euractiv France, July 4, 2022</a>, citing journalist Raphael Grably. Mahjoubi&#8217;s early tech background from his Wikipedia entry (network technician at Club Internet at age 16). Ferrari&#8217;s Lunabee Studio background from her HATVP declaration. Chappaz biography from enseignementsup-recherche.gouv.fr.</p><p>[35] FrenchWeb, &#8220;CNNum enterr&#233; vivant&#8221; analysis: annual operating budget of &#8364;48,000, two permanent staff. The CNNum was established by decree in 2011 (L&#233;gifrance) and effectively dissolved in 2025.</p><p>[36] The flat tax on capital gains (Pr&#233;l&#232;vement Forfaitaire Unique, 30%) was introduced by the Macron government in 2018 and reduced the effective tax rate on investment returns.
The <a href="https://www.legifrance.gouv.fr/loda/id/JORFTEXT000038496102/">PACTE law</a> (Plan d&#8217;Action pour la Croissance et la Transformation des Entreprises, 2019) simplified corporate creation, reduced minimum capital requirements, and streamlined French business structures.</p><p>[37] S3NS is a joint venture between Google Cloud and Thales, designed to offer Google Cloud services under French security controls. Bleu is a joint venture between Microsoft, Orange, and Capgemini. Both rely entirely on American hyperscaler technology, with French legal entities. S3NS obtained the SecNumCloud 3.2 qualification from ANSSI in December 2025. Vincent Strubel, the director general of ANSSI, published a detailed clarification on LinkedIn on January 6, 2026. Key distinction: SecNumCloud certifies legal sovereignty (European entity controls data, protected from extraterritorial injunctive relief) but not operational sovereignty (dependency on American technology updates). Strubel: &#8220;C&#8217;est un outil de cybers&#233;curit&#233;, pas de politique industrielle.&#8221; (&#8220;It is a cybersecurity tool, not industrial policy.&#8221;) On operational autonomy: S3NS and Bleu could maintain operations for &#8220;six &#224; douze mois&#8221; (six to twelve months) without American updates. On the fantasy of 100% European solutions: &#8220;imaginer qu&#8217;il existe des offres 100% europ&#233;ennes rel&#232;ve de la pure vue de l&#8217;esprit qui ne r&#233;siste pas &#224; la confrontation aux faits&#8221; (&#8220;imagining that 100% European offerings exist is pure wishful thinking that does not survive contact with the facts&#8221;). Sources: <a href="https://next.ink/217806/vincent-strubel-anssi-tient-a-rappeler-ce-quest-vraiment-secnumcloud/">Next.ink, January 6, 2026</a>; <a href="https://www.lemagit.fr/actualites/366636974/Cloud-de-confiance-ou-dobeissance-Le-patron-de-lANSSI-soupese-les-enjeux-de-dependances">Le Mag IT, January 6, 2026</a>; <a href="https://www.solutions-numeriques.com/cloud-de-confiance-lanssi-remet-les-pendules-a-lheure-sur-secnumcloud/">Solutions Num&#233;riques, January 7, 2026</a>.
The sovereignty analysis from the published piece &#8220;Access, Disable, Destroy&#8221; applies.</p><p>[38] Mistral AI blog, December 2025: Mistral Large 3 was trained using &#8220;3,000 Nvidia H200 GPUs.&#8221; Microsoft Azure partnership confirmed by both Mistral and Microsoft announcements, February 2024.</p><p>[39] See footnote 12. MGX is the Abu Dhabi sovereign investment vehicle.</p><p>[40] Alice Recoque uses AMD processors manufactured by TSMC. Specific technical sourcing per company disclosures and press reporting.</p><p>[41] Charles Gave, GaveKal Research. The quote translates roughly as: &#8220;Create the scarcity so you can distribute the ration tickets to the most obedient.&#8221; Gave has used this formulation to describe French industrial policy more broadly.</p><p>[42] <a href="https://mistral.ai/news/mistral-ai-raises-1-7-b-to-accelerate-technological-progress-with-ai">Mistral AI blog, September 9, 2025</a>: &#8220;Series C funding round of 1.7B&#8364; at an 11.7B&#8364; post-money valuation.&#8221; ASML invested &#8364;1.3 billion as lead investor, gaining approximately 11% ownership on a fully diluted basis. CNBC and Bloomberg confirmed.</p><p>[43] CEO Arthur Mensch told the Financial Times (published January 31, 2026) that Mistral&#8217;s annualized revenue run rate had crossed $400 million, up from $20 million a year earlier. He guided toward exceeding $1 billion in ARR by the end of 2026. Earlier, at the World Economic Forum in Davos (January 2026), Mensch reported ARR of &#8364;300 million as of September 2025. Previous revenue figures: approximately $10 million in 2023, $30 million in 2024.</p><p>[44] Approximately 60% European revenue is derived from multiple press reports and Mistral&#8217;s own investor communications. 
The company has highlighted European enterprise demand driven by data sovereignty requirements as a key growth driver.</p><p>[45] Mistral funding history: Seed round (June 2023, &#8364;105 million) led by Lightspeed Venture Partners. Series A (December 2023, &#8364;385 million) led by Andreessen Horowitz. Series B (June 2024, &#8364;600 million) led by General Catalyst. Series C (September 2025, &#8364;1.7 billion) led by ASML. BPI France (Bpifrance) participated as a co-investor in all rounds. <a href="https://www.orrick.com/en/News/2025/09/Orrick-conseille-Mistral-AI-dans-le-cadre-de-sa-levee-de-fonds-en-Serie-C">Orrick legal advisory</a>.</p><p>[46] See footnote 41.</p><p>[47] In press interviews, Mensch has framed the decision to incorporate in France in personal and patriotic terms &#8212; &#8220;building European AI sovereignty&#8221; &#8212; rather than citing structural advantages such as the regulatory environment or access to capital. The three founders are all Parisian-born Polytechnique graduates with family ties to France. No published interview has identified a specific structural reason for choosing Paris over London or San Francisco, where Mistral&#8217;s primary investors and compute providers are based. Mensch&#8217;s separate claim of 75% French capital has not been independently verified; the post-Series C shareholder register has not been publicly disclosed.</p><p>[48] H Company (formerly Holistic AI): reporting by Sifted and The Information on co-founder departures within the first year.</p><p>[49] LightOn IPO on Euronext Growth, 2024: raised approximately &#8364;62 million. Market capitalization is in the small-cap range.</p><p>[50] Poolside AI: founded by American entrepreneurs, relocated to Paris. 
Sifted reporting on the company&#8217;s strategic reasons for the Paris move.</p><p>[51] <a href="https://www.prnewswire.com/news-releases/criteo-announces-intention-to-redomicile-to-luxembourg-and-list-ordinary-shares-on-nasdaq-302597756.html">Criteo press release, October 29, 2025</a>: announced intention to redomicile from France to Luxembourg via cross-border conversion. SEC Form 425 filed. Shareholder approval received at the general meeting on February 27, 2026 (50.5 million votes in favour). Conversion expected Q3 2026. Critically, Criteo has stated that following the Luxembourg conversion, it intends to pursue a subsequent redomiciliation from Luxembourg to the United States &#8212; making Luxembourg the intermediate step, not the destination. <a href="https://www.sec.gov/Archives/edgar/data/1576427/000119312525254652/d38877d425.htm">Criteo SEC Form 425</a>; <a href="https://www.prnewswire.com/news-releases/criteo-receives-shareholder-approval-for-redomiciliation-from-france-to-luxembourg-302699523.html">Shareholder approval, February 27, 2026</a>; <a href="https://www.prnewswire.com/news-releases/criteo-provides-update-on-luxembourg-redomiciliation-302653725.html">Criteo redomiciliation update, January 7, 2026</a>.</p><p>[52] <a href="https://www.stocktitan.net/sec-filings/CRTO/425-criteo-s-a-business-combination-communication-43a02dc95fe7.html">Criteo investor FAQ</a>, as cited by PPC Land: &#8220;French law does not provide a framework for direct merger into a U.S. corporation.&#8221; The redomiciliation to Luxembourg explicitly enables a subsequent US transfer if the board determines it is in the shareholders&#8217; best interests.</p><p>[53] The author served as Vice President of Engineering at Criteo from 2010 to 2014, during the company&#8217;s pre-IPO growth phase. 
This experience informs the analysis of French corporate structure constraints on exits, but does not represent any ongoing financial interest in Criteo.</p><p>[54] Hugging Face was co-founded in New York in 2016 by Cl&#233;ment Delangue, Julien Chaumond, and Thomas Wolf &#8212; all French nationals. Wolf, a physicist turned patent attorney turned AI researcher (PhD in statistical physics from Sorbonne University), created the Transformers library that became Hugging Face&#8217;s core product. The company is headquartered in New York and incorporated in the US. Valued at $4.5 billion as of its August 2023 funding round led by Google, Amazon, Nvidia, Intel, AMD, Qualcomm, IBM, and Salesforce. Delangue testified before the US House Committee on Science, Space, and Technology on June 22, 2023, telling Congress that open-source AI is &#8220;extremely aligned with American values and interests&#8221; (<a href="https://venturebeat.com/ai/hugging-face-ceo-tells-us-house-open-source-ai-is-extremely-aligned-with-american-interests">VentureBeat, June 2023</a>; full transcript at <a href="https://www.techpolicy.press/transcript-house-hearing-on-ai-advancing-innovation-in-the-national-interest/">TechPolicy.Press</a>). He participated in Senator Schumer&#8217;s AI Insight Forum in September 2023 (<a href="https://fedscoop.com/sen-schumers-first-ai-insight-forum-focuses-on-2024-election-federal-regulators/">FedScoop, September 2023</a>). In 2025, Hugging Face submitted an open-source policy blueprint to the White House for Trump&#8217;s AI Action Plan (<a href="https://venturebeat.com/ai/hugging-face-submits-open-source-blueprint-challenging-big-tech-in-white-house-ai-policy-fight">VentureBeat, March 2025</a>). 
Notably, Hugging Face was excluded from Biden&#8217;s original White House AI meeting in May 2023 &#8212; only OpenAI, Google, Microsoft, and Anthropic were invited (<a href="https://fortune.com/2023/05/09/a-leaked-google-memo-raises-questions-about-open-source-a-i-but-the-white-house-doesnt-seem-to-have-gotten-it/">Fortune, May 2023</a>).</p><p>[55] Meta opened its FAIR (Facebook AI Research) lab in Paris in 2015, at LeCun's initiative, making it one of the first major American AI research labs in France. Meta's choice of Paris reflected the quality of French AI researchers and relatively lower compensation compared to San Francisco &#8212; a dynamic the CIR tax credit reinforced by subsidizing the salary costs of researchers at foreign-owned labs. In late 2025, LeCun announced he was leaving Meta to found AMI Labs (Advanced Machine Intelligence), a world-model startup headquartered in Paris. On March 10, 2026, AMI announced a $1.03 billion seed round at a $3.5 billion valuation &#8212; Europe's largest seed round ever. Investors: Nvidia, Temasek (Singapore), Bezos Expeditions, Eric Schmidt, Cathay Innovation, Daphni, Greycroft, Hiro Capital, HV Capital, SBVA. BPI France co-invested but did not lead. CEO Alex LeBrun previously co-founded Nabla and worked under LeCun at FAIR Paris. Key hires include Mike Rabbat (former Meta research science director) and Saining Xie (former Google DeepMind). Laurent Solly, Meta's former VP for Europe, also joined. Offices: Paris (HQ), New York, Montreal, Singapore. Sources: Bloomberg, March 10, 2026; Sifted, March 10, 2026; TechCrunch, January 23, 2026; Tech.eu, March 10, 2026.</p><p>[56] The author was contacted by Cloudwatt for a senior engineering interview during the company&#8217;s early staffing phase. 
The leadership team was drawn from enterprise hardware and telco backgrounds (Dell, HP, Orange) rather than cloud-native or startup backgrounds &#8212; a staffing profile that reflected the consortium&#8217;s institutional DNA rather than the competitive requirements of competing with AWS. This is a personal observation based on direct experience with the hiring process.</p><p>[57] C&#233;dric O joined Mistral AI&#8217;s founding cap table in April 2023 as &#8220;conseiller-fondateur&#8221; (advisor-founder) &#8212; a title that placed him alongside the three technical co-founders (Mensch, Lample, Lacroix) in the company&#8217;s statuts without implying an operational role. Mistral AI SAS statutes (company creation documents) show an initial capital of &#8364;15,000 split into 1.5 million shares at &#8364;0.01 each. The three technical co-founders (Mensch, Lample, Lacroix) held 95.3% equally; the remaining 4.7% was split equally among four holders: C&#233;dric O (via consulting company Nopeunteo), holding company Alan Tech, Jean-Charles Samuelian-Werve, and Charles Gorintin &#8212; giving each approximately 1.17%. O&#8217;s investment: &#8364;176.10 for 17,610 shares. Valuation trajectory: at the &#8364;2B Series A (December 2023), O&#8217;s 1.17% was worth approximately &#8364;23 million (<a href="https://x.com/MagazineCapital/status/1734920523656708137">Capital.fr</a>); at the &#8364;6B Series B (June 2024), approximately &#8364;70 million (<a href="https://www.presse-citron.net/de-176-e-a-70-millions-lincroyable-histoire-de-lex-ministre-cedric-o-et-mistral-ai/">Presse-Citron</a>). O sold shares during the Series A, pocketing approximately &#8364;1 million per documents consulted by Bloomberg (<a href="https://cafetech.fr/2024/01/24/comment-cedric-o-lex-ministre-devenu-lobbyiste-tente-de-faire-echouer-le-projet-de-regulation-de-lia-generative/">Caf&#233;tech, January 2024</a>). 
The HATVP (Haute Autorit&#233; pour la Transparence de la Vie Publique) had prohibited O from lobbying his former ministries upon leaving government; O circumvented the restriction by purchasing shares through Nopeunteo rather than in his own name (Caf&#233;tech, citing HATVP deliberation). Cap table reconstruction: <a href="https://www.maddyness.com/2024/01/17/mistral-ai-voici-le-pactole-que-toucheraient-les-fondateurs-en-cas-de-cession/">Maddyness, January 2024</a>. The three technical co-founders each hold at least 8% of the company, per the Bloomberg Billionaires Index ($1.1B net worth each at a &#8364;11.7B valuation). <a href="https://sifted.eu/articles/brunch-with-cedric-o">Sifted interview with O</a>.</p><p>[58] <a href="https://sifted.eu/articles/cedric-o-the-marshmallow-project-news">Sifted, November 2024</a>: &#8220;Mistral cofounder and former tech minister C&#233;dric O nearing &#8364;10m seed deal for new AI edtech startup.&#8221; The Marshmallow Project, an AI edtech company, is raising from General Catalyst, Balderton, Alpha Intelligence Capital, Edu Capital, Kima Ventures, daphni, and Sistafund. O serves as CEO.</p><p>[59] Fleur Pellerin founded Korelya Capital in 2016, immediately after leaving the French government. The fund was initially backed by South Korean internet giant Naver Corporation with &#8364;100 million, and has grown to over &#8364;800 million under management (Korelya Capital website). Pellerin served as Minister for SMEs, Innovation and the Digital Economy from 2012 to 2014, during which she launched the La French Tech initiative and oversaw the Cloudwatt/Numergy sovereign cloud program. She subsequently served as Secretary of State for Foreign Trade and Minister of Culture before leaving government. 
<a href="https://en.wikipedia.org/wiki/Fleur_Pellerin">Wikipedia, &#8220;Fleur Pellerin&#8221;</a>; <a href="https://www.korelyacapital.com/">Korelya Capital</a>.</p><p>[60] The MIT comparison: MIT has produced spinoffs, including Akamai, Dropbox, Bose, and iRobot; INRIA&#8217;s most notable outputs are scikit-learn and its 1988 NSFNet connection. INRIA employs 1,300 researchers. Note: France&#8217;s national supercomputers are operated by GENCI (Grand &#201;quipement National de Calcul Intensif) and housed at CNRS/IDRIS (Saclay), CEA/TGCC (Bruy&#232;res-le-Ch&#226;tel), and CINES (Montpellier) &#8212; not by INRIA itself, though INRIA coordinates the national AI strategy that these facilities nominally serve. Jean Zay supercomputer, operated by CNRS/IDRIS at Saclay. Deployed in 2019 as HPE SGI 8600. Original GPU partition: 261 converged nodes with Nvidia V100 SXM2 32GB GPUs (released June 2017). Extended in 2021 with 416 Nvidia A100 80GB GPUs. Extended again in summer 2024 (France 2030 funding) with 1,456 Nvidia H100 80GB SXM5 GPUs on 14 Eviden BullSequana XH3000 racks with InfiniBand NDR 400Gb/s interconnect. Legacy V100 and A100 partitions retain the older Omni-Path 100Gb/s interconnect. Total post-upgrade peak: 125.9 petaflops (up from 36.85). Sources: GENCI/CNRS press release March 2024; DCD March 2024; HPCwire November 2021; IDRIS hardware documentation. For comparison: Mistral Large 3 trained on 3,000 Nvidia H200 GPUs via Azure (Mistral blog, December 2025). The H200 provides roughly 1.5-2x the memory bandwidth of H100 and was unavailable on Jean Zay at the time of publication. BLOOM training (BigScience/Hugging Face, 2022) used Jean Zay&#8217;s A100 partition; see <a href="https://huggingface.co/blog/bloom-megatron-deepspeed">Hugging Face blog, &#8220;The Technology Behind BLOOM Training&#8221;</a> for the engineering details of training on Jean Zay. The author was at Hugging Face during BLOOM training and the subsequent migration to AWS for StarCoder.
StarCoder and subsequent HF training runs moved to AWS. INRIA cites BLOOM on its AI Programme page as a flagship achievement; the subsequent migration to AWS is not mentioned.</p><p>[61] France&#8217;s three national GENCI supercomputing centers: Jean Zay at IDRIS/CNRS (Saclay), Joliot-Curie at TGCC/CEA (Bruy&#232;res-le-Ch&#226;tel), and Adastra at CINES (Montpellier). Joliot-Curie: 22 petaflops, BullSequana X1000, primarily Intel Skylake CPUs with no modern GPU accelerators for AI training. Installed 2017-2019, scheduled for decommissioning when Alice Recoque comes online (expected 2026). Adastra: 74.5 petaflops, HPE Cray EX4000, equipped with 1,352 AMD MI250X GPUs &#8212; a competitive HPC accelerator but not the Nvidia architecture that dominates LLM training. Alice Recoque, France&#8217;s first exascale system, was selected in November 2025 (Eviden/AMD), with installation expected at TGCC. It will use AMD Instinct GPUs &#8212; again, not the Nvidia H100/H200/B200 ecosystem that every frontier AI lab trains on. Sources: CEA TGCC; GENCI Annual Report 2023; DCD September 2024; GlobeNewsWire November 2025.</p><p>[65] Scaleway, a subsidiary of Xavier Niel&#8217;s Iliad Group, has deployed over 3,000 Nvidia H100 GPUs in European data centers as of late 2024, with plans to exceed 5,000. Nvidia&#8217;s own blog describes Scaleway as offering &#8220;the European cloud&#8217;s largest compute capacity.&#8221; Scaleway provided GPU clusters that Mistral, Kyutai, and H Company used for model training. Niel also founded Station F (a startup campus), co-founded Kyutai (an open-source AI research lab with a &#8364;300M commitment), and solely backs Kima Ventures (a seed fund with ~100 investments per year). 
TechCrunch, September 2023; Nvidia blog, June 2025; French Tech Journal, November 2024.</p><p>[62] CMA CGM&#8217;s total AI investments of &#8364;500 million include: &#8364;100 million five-year partnership with Mistral AI (April 2025); $150 million partnership with Google Cloud; investments in Poolside, Dataiku, and Nabla through Zebox Ventures (CMA CGM&#8217;s VC arm); and co-founding of Kyutai alongside Niel and Eric Schmidt. Sifted, April 2025; Supply Chain 24/7; Maritime Executive; Brookes Bell.</p><p>[63] Indian PM Modi and President Macron visited CMA CGM headquarters in Marseille on February 12, 2025, during the AI Action Summit state visit. CMA CGM press release, February 2025. Macron inaugurated CMA CGM&#8217;s Tangram center of excellence in 2024. Mediapart and Blast investigations documented that every Macron visit to Marseille includes a stop at the CMA CGM tower, and that Saad&#233;&#8217;s foundation and corporate investments have made CMA CGM function as &#8220;une collectivit&#233; territoriale de plus&#8221; (an additional local government). Saad&#233;&#8217;s directorate includes former Cour des Comptes advisors and former Ministry of Interior officials.</p><p>[64] Mediapart investigation (cited in Blast and R&#233;volution Permanente): CMA CGM described as &#8220;more powerful than elected officials&#8221; in Marseille. La Provence&#8217;s editor was suspended in March 2024 after publishing a front page critical of Macron. CMA Media, created in 2021, now owns BFM TV, RMC, La Provence, Brut, La Tribune, and Corse Matin &#8212; making it France&#8217;s third-largest private media group. Rodolphe Saad&#233; told his media employees he would find it &#8220;tr&#232;s agressif&#8221; (very aggressive) for journalists to investigate CMA CGM&#8217;s own activities. Blast investigation, 2025.</p><p>[66] The author prepared CIR (Cr&#233;dit d&#8217;Imp&#244;t Recherche) dossiers at Criteo during his tenure as VP Engineering (2010-2014). 
The CIR allows companies to claim a 30% tax credit on eligible R&amp;D expenditure (primarily researcher salaries). The dossier process requires framing engineering work in terms of &#8220;technological uncertainty&#8221; and &#8220;advancement of the state of the art&#8221; as defined by the Ministry of Research. In practice, the engineering work precedes and is independent of the CIR claim &#8212; the dossier is assembled retrospectively to describe existing work in CIR-eligible language. A specialized consulting industry exists to optimize CIR claims. The CNEPI 2021 evaluation (France Strat&#233;gie) confirmed that, for large firms, the additionality effect is approximately 1:1 &#8212; each euro of credit generates roughly 1 euro of additional R&amp;D that would not otherwise have occurred, with limited spillovers. For context: &#8364;7.7 billion per year exceeds the entire national AI strategy budget (&#8364;1.5 billion over five years for Phase 2) by a factor of roughly 25 on an annual basis.</p><p>[67] Macron AI summit theatrics, compiled from multiple sources. &#8220;Go and download Le Chat, which is made by Mistral, rather than ChatGPT by OpenAI &#8212; or something else&#8221; &#8212; Macron on France 2 television, February 9, 2025 (<a href="https://techcrunch.com/2025/02/11/macron-urges-french-citizens-to-use-le-chat-instead-of-chatgpt/">TechCrunch, February 2025</a>). Deepfake Instagram video &#8212; Macron posted AI-generated clips of himself on Instagram, February 9, 2025, to publicize the summit (Wikipedia, &#8220;AI Action Summit,&#8221; citing AP). Robot fist-bump at Station F &#8212; AP Photo/Aurelien Morissard, February 11, 2025. &#8220;Plug, baby, plug&#8221; &#8212; Macron at Grand Palais closing address, riffing on Trump&#8217;s &#8220;drill, baby, drill&#8221; (Euronews, February 14, 2025; French Tech Journal, February 11, 2025). 
DJ closing Day 1 at Grand Palais &#8212; Euronews: &#8220;The Paris summit resembled more of a tech fair or even festival atmosphere with many fancy corporate side events and even a DJ closing the first day.&#8221; Notre-Dame comparison &#8212; Macron at Grand Palais: &#8220;We showed the rest of the world that with a clear timeline, we can get there,&#8221; proposing to apply Notre-Dame reconstruction&#8217;s streamlined permitting to data center approvals (French Tech Journal; &#201;lys&#233;e transcript). Amodei called the summit a &#8220;missed opportunity&#8221;; Bengio used the same phrase. The declaration was signed by 60 countries; the US and UK declined to sign. Kevin Roose, NYT: &#8220;The biggest surprise of the Paris summit, for me, was realizing that policymakers don&#8217;t seem to understand how quickly powerful AI systems could arrive.&#8221;</p><p>[68] CNIL D&#233;lib&#233;ration n&#176; SAN-2023-009 du 15 juin 2023, sanctioning Criteo for five GDPR violations related to its retargeting advertising activities: failure to verify consent, inadequate transparency, incomplete data access responses, failure to honor withdrawal and erasure requests, and absence of required joint controller agreements. Fine: &#8364;40 million (reduced from the rapporteur&#8217;s initial proposal of &#8364;60 million after Criteo pleaded low profitability). The fine represents approximately 2.1% of Criteo&#8217;s &#8364;1.9 billion worldwide turnover &#8212; the originally proposed &#8364;60 million would have been approximately 3%. The GAAP net income figure for 2022 was approximately &#8364;10 million per HAAS Avocats analysis, making the fine roughly four times the annual net profit. Criteo appealed; the <a href="https://next.ink/brief_article/le-conseil-detat-deboute-criteo-et-confirme-lamende-de-40-millions-deuros-de-la-cnil/">Conseil d&#8217;&#201;tat upheld the fine on March 4, 2026</a>. A separate QPC (constitutional question) was dismissed on April 18, 2025.
Sources: CNIL, L&#233;gifrance; L&#8217;Usine Digitale, June 2023; Le Monde Informatique, June 2023.</p><p>[69] In June 2023, O co-organized an open letter signed by over 150 executives warning that the European Parliament&#8217;s draft AI Act would regulate foundation models too heavily. <a href="https://time.com/6338602/eu-ai-regulation-foundation-models/">TIME, November 22, 2023</a>: &#8220;In June 2023, along with the founding partner of Mistral AI investor La Famiglia VC, Jeannette zu F&#252;rstenberg, O helped organize an open letter signed by more than 150 executives.&#8221; In October 2023, O told <a href="https://sifted.eu/articles/eu-ai-act-kill-mistral-cedric-o">Sifted</a> that the AI Act could &#8220;kill Mistral.&#8221; <a href="https://fortune.com/2023/11/27/eu-ai-act-foundation-model-disagreement-aleph-alpha-mistral/">Fortune</a> described O&#8217;s dual role as &#8220;eyebrow-raising.&#8221; Max Tegmark, president of the Future of Life Institute, publicly objected: &#8220;I feel strongly that former officeholders should not engage in political activities related to their former portfolio.&#8221;</p><p>[70] <a href="https://www.bloomberg.com/news/articles/2023-12-13/mistral-ai-s-cedric-o-pushed-to-loosen-eu-s-ai-rules">Bloomberg Businessweek, December 13, 2023</a>: &#8220;In late November, vans circled the streets of Paris and Brussels, warning that Europe was on the verge of losing its chance to control artificial intelligence...
One of the vans displayed a digital billboard that read: &#8216;EU AI Act Without Foundation Models = Climate Act That Excludes Big Oil.&#8217; The other had large photographs of the person the activists behind the campaign considered responsible: C&#233;dric O.&#8221; The Franco-German effort to oppose foundation model provisions was reported by The Economist and confirmed by <a href="https://techcrunch.com/2023/11/16/mistral-eu-ai-act/">TechCrunch, November 2023</a>.</p><p>[73] Minitel was launched in 1982 by France T&#233;l&#233;com, which distributed free terminals to telephone subscribers. At its peak installed base (mid-1990s), 9 million terminals were connected, supporting over 25,000 services, including banking (3615 code services), shopping, railway reservations, and &#8212; most notoriously &#8212; messageries roses (adult chat services, which at their peak generated a significant share of Minitel revenue and launched Xavier Niel&#8217;s career). Minitel pioneered many concepts that the web later made global: online directories, e-commerce, chat services, and micropayments via phone bills. However, France T&#233;l&#233;com&#8217;s centralized model and revenue-sharing structure created a walled garden that delayed French adoption of the open internet. France was the last major Western country to achieve mass internet adoption. The Minitel service was shut down on June 30, 2012. The irony: the technology that most closely anticipated the internet was operated by the same state monopoly whose business model prevented France from adopting the actual internet.</p><p>[75] Worldline SA traces its payment processing activities back to the 1970s (originally Sligos, then Axime, then Atos Worldline).
Spun off from Atos via IPO in 2014 at a &#8364;2.1B valuation, it grew through acquisitions (SIX Payment Services for &#8364;2.3B in 2018, Ingenico&#8217;s terminals business) to become Europe&#8217;s largest payment processor, reaching the CAC 40 with a peak market capitalization above &#8364;20B during the pandemic. On October 25, 2023, shares collapsed 59% in a single session &#8212; approximately &#8364;3.8B in market value was erased &#8212; after the company cut its sales outlook. It was ejected from the CAC 40. CEO Gilles Grapinet, who had led the Worldline division since its Atos days and had previously served as Breton&#8217;s directeur de cabinet at the Economy Ministry (2005-2007) before being installed at Atos, was forced out in September 2024 after a third profit warning in twelve months. In July 2025, a coordinated European journalism investigation (&#8220;Dirty Payments&#8221;) alleged that Worldline had covered up fraud by high-risk clients for years. Shares lost another 38%. Market cap as of mid-2025: approximately &#8364;1.3B &#8212; a decline of over 93% from peak. Sources: <a href="https://www.bloomberg.com/news/articles/2023-10-25/worldline-shares-plunge-53-in-latest-blow-to-europe-s-fintechs">Bloomberg, October 25, 2023</a>; <a href="https://gettotext.com/french-payments-giant-worldline-sinks-into-crisis-as-boss-steps-down/">Le Monde / GetToText, September 14, 2024</a>; <a href="https://en.wikipedia.org/wiki/Worldline_SA">Worldline Wikipedia</a>.</p><p>[76] CNRS scientific council critique of CIR, 2014, as cited in <a href="https://fr.wikipedia.org/wiki/Cr%C3%A9dit_d'imp%C3%B4t_recherche">Cr&#233;dit d&#8217;imp&#244;t recherche, Wikipedia FR</a>. The original CNRS council publication has not been independently located; the Wikipedia article cites the council&#8217;s finding that France&#8217;s private R&amp;D expenditure ranking fell from 13th to 15th globally (OECD) between 2006 and 2011, despite the CIR quintupling.
The Cour des comptes (2013) independently reached a similar conclusion, criticizing the CIR as presenting &#8220;une incertitude sur son efficacit&#233; et son ciblage&#8221; (uncertainty about its effectiveness and its targeting) &#8212; see <a href="https://www.ccomptes.fr/sites/default/files/EzPublish/evolution_conditions_maitrise_credit_imp--t_faveur_recherche.pdf">Cour des comptes, 2013</a>. A S&#233;nat commission of inquiry on the CIR&#8217;s impact was convened, but its report was never made public following an internal vote not to release its findings &#8212; itself a remarkable outcome for a program costing billions annually.</p><p>[77] The author&#8217;s estimate of the cumulative CIR cost. Pre-reform period (1983&#8211;2007): the CIR cost approximately &#8364;0.5&#8211;1.7B annually, averaging roughly &#8364;1B/year, for a total of ~&#8364;25B over 25 years. This is a rough estimate; early-years costs were significantly lower (in the hundreds of millions of francs). Post-reform period (2008&#8211;2025): annual cr&#233;ance rose from &#8364;4.15B (2008) to &#8364;7.7B (2025), averaging approximately &#8364;5.5&#8211;6B/year over 17 years, for a total of ~&#8364;95B. Total: approximately &#8364;110&#8211;120B. The figure is conservative &#8212; it uses cr&#233;ance (tax credit claimed) rather than budget cost (cash paid), which was higher in some years due to accelerated reimbursement during the 2009-2010 financial crisis (the state paid &#8364;6B+ in 2009 alone).
Sources for annual figures: <a href="https://www.senat.fr/rap/r09-493/r09-493_mono.html">S&#233;nat rapport, May 2010</a>; <a href="https://www.ccomptes.fr/sites/default/files/EzPublish/evolution_conditions_maitrise_credit_imp--t_faveur_recherche.pdf">Cour des comptes, 2013</a>; <a href="https://www.enseignementsup-recherche.gouv.fr/sites/default/files/2023-11/le-cr-dit-d-imp-t-recherche-en-2021---provisoire-30075.pdf">MESR, &#8220;Le cr&#233;dit d&#8217;imp&#244;t recherche en 2021&#8221;</a>; <a href="https://www.senat.fr/rap/l21-163-324/l21-163-32416.html">S&#233;nat PLF 2022</a>.</p><p>[78] Pierre Guillaumat (1909-1991), Polytechnicien (X-1928) and Corps des Mines engineer. R&#233;sistant in North Africa. Appointed head of the Direction des Carburants at the Liberation (1944), then administrator-general of the CEA (1951-1958), where he oversaw the construction of France&#8217;s nuclear infrastructure and played a decisive role in the development of the French atomic bomb. Simultaneously, president of EDF (1954-1959). Minister of the Armed Forces under de Gaulle (1958-1960). First CEO of Elf-Aquitaine (1966-1977), building France&#8217;s national petroleum company. Described by the Institut d&#8217;Histoire de l&#8217;Industrie as &#8220;l&#8217;exemple le plus achev&#233; des grands capitaines d&#8217;industrie publique&#8221; (the most accomplished example of the great captains of public industry). Source: <a href="https://www.charles-de-gaulle.org/lhomme/biographies/pierre-guillaumat/">Fondation Charles de Gaulle</a>; Soutou &amp; Beltran, <em>Pierre Guillaumat, La passion des grands projets industriels</em> (1994).
The characterization of the X-Mines/X-Ponts generation as &#8220;nourris de saint-simonisme et des id&#233;aux de la R&#233;sistance&#8221; (nourished on Saint-Simonianism and the ideals of the R&#233;sistance) is from &#8220;Nucl&#233;aire, TGV, t&#233;l&#233;coms : les dessous de la modernisation&#8221; in <em>Histoire secr&#232;te de la Ve R&#233;publique</em>, <a href="https://www.cairn.info/histoire-secrete-de-la-ve-republique--9782707153500-page-261.htm">Cairn.info</a>.</p><p>[79] Louis Armand (1905-1971), Polytechnicien (X-1924) and Corps des Mines engineer. Key figure in the R&#233;sistance-Fer railway resistance network, awarded Compagnon de la Lib&#233;ration. Director-general and then president of the SNCF (1949-1958), where he electrified the French rail network, modernized signaling, and oversaw the world speed record of 331 km/h in 1955. Subsequently, president of Euratom (1958-1959) and president of the Conseil de Perfectionnement of Polytechnique. Elected to the Acad&#233;mie fran&#231;aise in 1963. Source: <a href="https://www.ordredelaliberation.fr/fr/compagnons/louis-armand">Ordre de la Lib&#233;ration</a>; <a href="https://www.groupe-sncf.com/en/group/governance/past-chairmen/louis-armand">SNCF Group</a>.</p><p>[80] Andr&#233; Turcat (1921-2016), Polytechnicien (X-1940). Joined the Free French Forces, then flew transport aircraft in Indochina. Became a test pilot and was the first European to break the sound barrier (1954, on the Gerfaut II) and the first European to exceed Mach 2 (1958). As chief test pilot and director of flight testing at Sud Aviation, he flew Concorde&#8217;s maiden flight on March 2, 1969, and its first supersonic flight on October 1, 1969. Founded the Acad&#233;mie nationale de l&#8217;air et de l&#8217;espace in 1983.
Source: <a href="https://en.wikipedia.org/wiki/Andr%C3%A9_Turcat">Wikipedia</a>; <a href="https://www.defense.gouv.fr/actualites/communaute-defense/hommage-a-andre-turcat">Minist&#232;re des Arm&#233;es, January 2016</a>.</p><p>[81] Macron&#8217;s speech at the AI Impact Summit, New Delhi, February 19, 2026. Full transcript: <a href="https://www.elysee.fr/emmanuel-macron/2026/02/19/deplacement-en-inde-sommet-sur-lintelligence-artificielle-a-new-delhi">&#201;lys&#233;e</a>. Key claims: &#8220;One year ago in Paris, we announced $109 billion in AI investments, and we are delivering this project with a lot of data centers, &#8364;58 billion in 2025, powered by our decarbonized nuclear energy.&#8221; Note the currency switch from the Paris summit&#8217;s &#8364;109 billion to &#8220;$109 billion&#8221; &#8212; the same digits in a weaker currency, quietly trimming the pledge by roughly 6%. The &#8364;58 billion figure traces to the <a href="https://unctad.org/system/files/official-document/diaeiainf2026d1_en.pdf">UNCTAD Global Investment Trends Monitor No. 50</a>, published January 20, 2026, which reported $69 billion (&#8776;&#8364;58.5 billion) in foreign greenfield investment <em>announcements</em> for data centers in France in 2025. UNCTAD sources its data from Financial Times fDi Markets, which tracks <em>announced</em> project values &#8212; not disbursed capital, not construction spending, not revenue. UNCTAD&#8217;s own Table 2 shows that MGX alone accounted for $43.4 billion of the $69 billion total &#8212; 63% from a single Abu Dhabi entity. <a href="https://legrandcontinent.eu/fr/2026/01/24/la-france-est-le-pays-qui-a-recu-le-plus-dinvestissements-etrangers-pour-la-construction-de-centres-de-donnees-en-2025/">Le Grand Continent, January 24, 2026</a>, confirmed that MGX plus Brookfield ($16.3 billion) accounted for approximately 87% of the total &#8212; the same pledges as at the Paris summit.
Bercy&#8217;s own one-year assessment, reported by <a href="https://www.usinenouvelle.com/electronique-informatique/cloud-computing/datacenters/52-entreprises-accompagnees-par-letat-58-gigawatts-de-puissance-electrique-securisee-un-an-apres-le-sommet-sur-lia-ou-en-est-le-deploiement-des-datacenters-en-france.RB2RSXDIDVEBHJPT2IQRE3CN4E.html">L&#8217;Usine Nouvelle, February 11, 2026</a>, showed that 77% of project sponsors had identified a site and 20% of electrical power capacity had been secured &#8212; but the journalist concluded it remained &#8220;difficile de savoir quels investissements ont vraiment &#233;t&#233; r&#233;alis&#233;s&#8221; (difficult to know which investments have actually been made) because investments are made in tranches and not all projects have been costed. The MGX campus: construction expected to begin in the second half of 2026, operations by 2028 (<a href="https://www.blackridgeresearch.com/blog/latest-list-of-upcoming-new-projects-data-centers-in-france-francaise">Black Ridge Research</a>). As of publication, no ground has been broken. The Choose France summit in May 2025 had confirmed only &#8364;20.8 billion as &#8220;concr&#233;tis&#233;s&#8221; (realized). The accounting: summit pledges (February 2025) &#8594; UNCTAD counts them as greenfield FDI announcements (January 2026) &#8594; Macron presents the UNCTAD figure as &#8220;delivery&#8221; (February 2026). On French AI companies: &#8220;Harmattan AI, partnering with Dassault, Gradium for voice AI, Poolside, H, Hugging Face. I could quote the stories of these unicorns and large caps.&#8221; Hugging Face is incorporated in New York; Poolside was founded by Americans; H lost three of its five co-founders within a year. On Mistral: &#8220;We invested in European large language models.
Mistral, founded in Paris a little bit more than two years ago, is now valued at &#8364;12 billion.&#8221; The state did not lead any Mistral funding round; BPI France co-invested in every round but never led.</p><p>[82] Mistral&#8217;s licensing evolution: Mistral 7B (September 2023, Apache 2.0), Mixtral 8x7B (December 2023, Apache 2.0), and Mixtral 8x22B (April 2024, Apache 2.0) were released as open-weight models. Mistral Large (February 2024), Mistral Medium, and subsequent frontier models are available only through commercial licenses, La Plateforme API, or enterprise agreements. The company&#8217;s enterprise offerings &#8212; including on-premises deployment, sovereignty-compliant configurations, and government contracts &#8212; are exclusively commercial. Mistral describes itself as committed to &#8220;open science&#8221; while operating an increasingly bifurcated model: smaller research models open, frontier commercial models closed. The pattern is common in AI (Meta&#8217;s Llama follows a similar trajectory), but in Mistral&#8217;s case, the commercial closure coincides with and enables the sovereignty contracting that ties the company to the French state apparatus. Sources: Mistral AI blog (model release announcements); <a href="https://mistral.ai/licenses/">Mistral licensing page</a>; Contrary Research, &#8220;Mistral AI Business Breakdown,&#8221; September 2025.</p>]]></content:encoded></item></channel></rss>