GM. This is Milk Road AI, the daily download on the biggest stories shaping AI, markets, and where the money is actually flowing.
Here’s what we’ve got for you today:
- ✍️ The market is misreading AI memory demand.
- 🎙️ The Milk Road AI Show: SpaceX + Tesla + xAI: The $25B Mega Bet to BREAK NVIDIA's Chokehold.
- 🍪 xAI just lost its entire founding team.
Nexo is back in the U.S. - and new clients get 30 days of Wealth Club Premier perks! Higher yields, lower borrowing rates, and crypto cashback - start here.

Prices as of 10:00 a.m. ET.

AI MEMORY DEMAND IS ABOUT TO ACCELERATE?
In 1954, a Swanson executive had a problem.
The company had vastly overestimated how many Thanksgiving turkeys America would eat that year and was now sitting on 520,000 pounds of frozen birds with no way to move them.
A salesman named Gerry Thomas had an idea. He designed a segmented aluminum tray (one compartment for turkey, one for cornbread dressing, one for peas), filled it, froze it, and sold it for 98 cents.
And just like that, the TV dinner was born.

The restaurant industry panicked, because why would anyone pay $3 for a meal at a restaurant when they could eat at home for a dollar?
The logic seemed airtight, but fast forward to today, and the U.S. restaurant industry is worth more than $1.1T. Americans eat out more than any generation in history.
The TV dinner didn’t kill restaurants. It normalized convenience and raised expectations, and when people did go out, they went more often and spent more.
That same pattern is playing out again, just in a different industry.
Google published something called TurboQuant, a compression algorithm that makes one specific type of AI memory roughly 6x more efficient.
The market panicked, and four major memory stocks fell off a cliff in two trading sessions.
The logic: as AI gets more memory-efficient, we’ll need less memory.
But just like in 1954, making something more efficient doesn’t reduce demand, it increases it.
Efficiency doesn’t shrink appetite, it makes people hungrier.
What TurboQuant actually does
TurboQuant targets one specific component of AI memory, the KV cache.
Here's a plain English explanation of what that is.
When an AI model is answering you, it needs to remember everything it has already read and written, it would start over from scratch with every new word, and that memory is called the KV cache.
The longer the conversation, the bigger it gets.
A large model processing a 128,000-token context window, roughly a short novel, can use 40GB of memory for the cache alone.
TurboQuant compresses that cache from 16 bits per value down to 3 bits, roughly a 6x reduction.
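If you want to see where a number like that 40GB comes from, here’s a back-of-the-envelope sketch. The model dimensions are our own assumptions for a 70B-class model with grouped-query attention, not figures from Google’s paper:

```python
# Rough KV cache sizing for a hypothetical 70B-class model with
# grouped-query attention (all dimensions are illustrative assumptions).
layers = 80          # transformer layers
kv_heads = 8         # key/value heads under grouped-query attention
head_dim = 128       # dimension per head
context = 128_000    # tokens in the window (roughly a short novel)

def kv_cache_gb(bits_per_value: float) -> float:
    """GB of KV cache: keys AND values, for every layer and every token."""
    values_per_token = 2 * layers * kv_heads * head_dim
    total_bits = values_per_token * context * bits_per_value
    return total_bits / 8 / 1e9

print(f"FP16:  {kv_cache_gb(16):.0f} GB")   # ~42 GB, the ballpark above
print(f"3-bit: {kv_cache_gb(3):.0f} GB")    # ~8 GB, a ~5-6x reduction
```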
You can drop it into any model and it works. But here's what the market got wrong.
The headline that crashed four stocks said: "Google reduced AI memory usage by 6x."
Google reduced one type of memory, the KV cache, by 6x, but not AI memory overall, and those are very different things.
AI memory has four components. TurboQuant touches exactly one:
- Model weights are the knowledge baked into the model itself. A 70B model needs ~140GB just to hold these. TurboQuant: zero effect.
- Optimizer states are used during training to track how the model is learning, and are often bigger than the weights. TurboQuant: zero effect.
- Activation memory is also a training-only resource. TurboQuant: zero effect.
- KV cache is the active memory during a conversation. TurboQuant: 6x smaller.
Three of the four are untouched.
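To make that concrete, here’s a toy tally of the inference-side buckets. The sizes are rough assumptions (a 70B model in FP16, plus the 128K-token cache from the sketch above), not measured numbers:

```python
# Toy inference-side memory tally for a hypothetical 70B deployment (GB).
weights = 140          # model weights at FP16: TurboQuant can't touch these
kv_cache = 42          # KV cache at 128K tokens, FP16 (from the sketch above)

before = weights + kv_cache        # ~182 GB
after = weights + kv_cache / 6     # ~147 GB once the cache shrinks ~6x
print(f"{before} GB -> {after:.0f} GB ({1 - after / before:.0%} smaller)")
```

A 6x cut to one bucket nets out to roughly a 19% cut overall, nowhere near "6x less AI memory."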
But here's where a reasonable bear pushes back, inference is now the dominant AI workload.
If TurboQuant compresses inference memory, doesn't that still move the needle on demand?
Fair question, but the short answer is no, and here's why.
Inference now accounts for 60-70% of total AI compute demand and crossed 55% of AI cloud spending in early 2026, surpassing training in dollar terms for the first time.
Estimates show AI inference memory growing to over $250B by 2030, and that’s on the conservative end.

And reasoning models like OpenAI’s o1 and o3, which think through problems step by step before answering, have made each query 5-6x more expensive than older-style models because they run multiple passes instead of one.
So inference is big, growing fast, and getting more expensive per query.
The bear is right about all of that, but compressing the KV cache doesn't reduce how much total HBM gets bought.
Here's the simple way to think about it: the HBM on a GPU is a fixed resource; you've already bought it.
When TurboQuant frees up KV cache space, that space doesn't sit idle, it immediately gets used for longer conversations, more simultaneous users, or more complex reasoning.
The capacity gets absorbed the moment it's available.
If the cache overflows the GPU entirely, it spills to the next layer: roughly 18TB of standard server memory in the rack. That layer is also fixed, and also immediately refilled.
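If it helps to picture that tiering, here’s a minimal toy of the spillover logic, with made-up capacities; real serving stacks manage this far more dynamically:

```python
# Toy spillover model for a tiered KV-cache store (capacities in GB are
# illustrative assumptions, not any vendor's actual numbers).
class TieredStore:
    def __init__(self):
        # fastest/smallest tier first: GPU HBM -> server DRAM -> NVMe SSD
        self.tiers = {"HBM": 80, "DRAM": 18_000, "NVMe": 100_000}
        self.used = {name: 0.0 for name in self.tiers}

    def place(self, gb: float) -> str:
        """Put a cache block in the fastest tier with room, else spill down."""
        for name, capacity in self.tiers.items():
            if self.used[name] + gb <= capacity:
                self.used[name] += gb
                return name
        raise MemoryError("every tier is full")

store = TieredStore()
print(store.place(42))   # HBM: fits on the GPU
print(store.place(400))  # DRAM: no room left in HBM, so it spills down
```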
Only at the very bottom tier, enterprise NVMe SSDs, does TurboQuant create any real breathing room.
And even there, the freed capacity gets reinvested into higher-capability products, not banked as cost savings.
Nobody changed their demand outlook by a single unit.
So TurboQuant doesn't reduce memory demand. But it gets better than that: there's a strong case that it actually creates more of it, and history has the receipts.
CRYPTO SHOULD WORK HARDER FOR YOU
Most people hold crypto and hope.
The smart money? They're earning interest on it, borrowing against it without selling, and trading it.
Where can you do the same all in one place? Nexo.
And right now, new U.S. clients get 30 days of Wealth Club Premier (benefits normally reserved for loyalty program members):
- Enhanced interest rates on your digital assets
- Lower borrowing costs against your crypto
- Up to 0.5% cashback on trades
No need to sell to access liquidity. No juggling 5 different platforms.
*Disclaimer: Geographic restrictions and terms apply.

AI MEMORY DEMAND IS ABOUT TO ACCELERATE? (P2)
Here's the simple version of why efficiency gains in AI almost always create more demand, not less.
William Stanley Jevons figured this out in 1865.
When steam engines got more efficient, economists expected Britain to use less coal.
Instead, Britain used three times more coal, because cheaper steam power became accessible to industries that couldn't afford it before.
Efficiency didn't conserve the resource but rather unlocked an explosion of new demand.
Every major AI efficiency gain has followed this exact pattern:
- Flash Attention (2022): Cut attention memory ~4x. Context windows exploded from 2K to 100K+ tokens, and total memory demand increased.
- DeepSeek R1 (2025): Cut training compute by ~75%. NVIDIA saw a historic selloff, but hyperscaler CapEx ramped shortly after, and the stock hit new highs within weeks.
- TurboQuant (2026): Cuts KV cache ~6x. The likely outcome is longer context models becoming cheaper to run, unlocking a new wave of demand.
That last point is the specific mechanism.
Google tested 10M-token context windows with Gemini 1.5 Pro and reported strong results, but chose not to ship it, partly because inference costs were too high.
A 10M-token session requires around 410GB of KV cache, which means locking up five H100 GPUs for one user at a time.
At current cloud pricing, that's hundreds of dollars per conversation, and no product gets built on that math.
TurboQuant compresses that 410GB down to around 68GB, which fits on a single H100.
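A quick sanity check on that math, assuming 80GB of HBM per H100 and ignoring what the model weights themselves occupy:

```python
kv_gb = 410            # KV cache for one 10M-token session (figure above)
hbm_per_h100 = 80      # GB of HBM on an H100 (weights ignored for simplicity)

print(kv_gb / hbm_per_h100)   # ~5.1 -> the "five H100s" per user
print(kv_gb / 6)              # ~68 GB after ~6x compression -> one H100
```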
Now the economics work. Google ships it, the industry follows, developers build on top at scale, and total inference HBM demand goes up, not down.
The efficiency gain will likely unlock the next generation of products, and those products will consume more memory, not less.
The counter-signals everyone is ignoring
At CES 2026, NVIDIA launched the Context Memory Storage Platform, a purpose-built appliance specifically designed to scale KV cache beyond what GPU memory can hold.

It connects HBM to standard server memory to NVMe SSDs in a tiered stack, with dedicated chips managing the movement of data between layers in real time.
Hardware roadmaps are built 18 to 36 months in advance.
NVIDIA didn't read the TurboQuant paper and scramble; they saw the KV cache scaling problem coming almost two years ago, and their answer was to build dedicated hardware to scale the cache, not compress it away.
Companies don't build new product categories for problems that are about to disappear; they build for problems that are about to get much, much larger.
Now let’s zoom out. Jensen believes AI infrastructure spending could reach $3T to $4T by the end of the decade.
If NVIDIA holds its roughly 55-60% market share through that period, analysts at New Street Research project the company could become the first in history to hit $1T in annual revenue from data centers alone.
That's three out of every ten dollars spent on AI infrastructure flowing through NVIDIA's order book.
Every one of those dollars runs on memory, and the demand is only accelerating.
Then there's Terafab.
Elon Musk announced a joint $20-25B semiconductor fabrication facility between Tesla, SpaceX, and xAI.
The goal is one terawatt of computing output per year, a number so large it has no precedent in the history of private manufacturing.
Musk was explicit about why he's doing it.
All the existing fabs on Earth combined, he says, produce roughly 2% of the compute his company will eventually need.
TSMC, Samsung, Micron: all growing, just not fast enough. "We either build the Terafab, or we don't have the chips," he said. "And we need the chips."
Terafab will include memory production under the same roof as chip design, lithography, fabrication, and packaging. Small-batch AI5 production is targeted for late 2026, volume in 2027, with a long-term goal of 100 to 200 billion custom AI and memory chips per year at full scale.
One man deciding to build the largest semiconductor fab in history because the rest of the industry isn't expanding fast enough is not the behavior of someone who thinks memory demand is about to fall off a cliff.
What could actually disrupt this trade?
Honest analysis means naming the risks that could genuinely break the thesis, not dismissing them.
Here's what's actually worth watching.
China is one of them. CXMT is ramping DRAM capacity faster than most people realize. They are not in HBM yet, but they are scaling standard DRAM aggressively.
If export controls ease, that added supply could put real pressure on commodity pricing.
The entire bull case also depends on hyperscalers continuing to spend. Amazon, Microsoft, Google, and Meta are committing hundreds of billions to AI infrastructure.
Apollo puts it into perspective: hyperscalers are spending over $600B a year on CapEx, more than the combined military budgets of Germany, the UK, France, Japan, Italy, and Canada.

If that spending slows for any reason, whether it is weaker AI adoption or a macro shock, demand can fall quickly.
There is also a longer-term architecture risk. Some of the newest designs are pushing more memory directly onto the chip, reducing reliance on external DRAM. If that approach scales, it could gradually shift where memory demand lives.
And then there is the cycle itself. Memory has always been a boom and bust business. Prices rise, the industry overbuilds, and eventually buyers push back. There are already early signs of that on the commodity side.

None of these risks is immediate, but they are real, and they are worth watching.
And I’ll be completely honest.
When this move happened, I adjusted my own position in Micron.
Not because the long-term thesis broke, but because short-term narratives like this can create volatility you can’t ignore.
If you want to see exactly how I’m positioning around this, including the specific changes I made to my MU position and the buys I made this week, I break it all down inside the Discord.
Come join us.
For our PRO members: I bought MU along with a few more stocks, all of which were alerted in Discord.
If you're a PRO member and you haven't joined the Discord yet, I don't know what to tell you, you're basically paying for the map and refusing to open it.
Alright, that's it for this edition of Milk Road AI. We want to hear from you.
Did the market get TurboQuant wrong?

THE $25B MEGA BET TO BREAK NVIDIA'S CHOKEHOLD 🚀
In last Wednesday's episode, we dug into the massive announcement that Tesla, SpaceX, and xAI are joining forces to build a custom chip manufacturing operation, with $20–$25B in planned spend and a "Terafab" at the Texas Gigafactory.
Here's what you'll hear:
- Why vertical integration into chip manufacturing makes strategic sense when you're scaling robots, autonomous vehicles, and AI models all at once.
- How geopolitical risk around TSMC and Taiwan is pushing major players to onshore their semiconductor supply chains.
- What a potential SpaceX IPO at a $1T+ valuation could mean for index flows, capital allocation, and broader market dynamics.
- The long-term vision for computing off-Earth, plus the real execution risks of building semiconductor fabs from scratch.
Tune in and see for yourself 👇️
YouTube | Spotify | Apple Podcasts

Hackers are using prompt injections to break into crypto users’ accounts. Don’t be a victim! Use Okara, a private, safe, and encrypted chat app. Use the code MILKROAD to get a 20% discount.
Summ (formerly Crypto Tax Calculator) is a tax software built specifically for crypto. Get started for free with Summ.
Kalshi is one of the largest prediction markets that allows users to trade on the outcome of real world events. Sign up here for a free $10 and start trading.

BITE-SIZED COOKIES FOR THE ROAD 🍪
The last two remaining xAI co-founders have left, meaning the entire founding team has now departed. The exits come as Musk says the company is being rebuilt from scratch.
SoftBank took a $40B short-term loan to fund its OpenAI investment. The timing suggests lenders expect an OpenAI IPO as soon as 2026.
Waymo now delivers 500,000 paid robotaxi rides weekly across 10 U.S. cities. That’s 10× growth in under two years, showing rapid expansion and usage gains.

MILKY MEMES 🤣

ROADIE REVIEW OF THE DAY 🥛