GM. This is Milk Road PRO, mapping where AI value lands once intelligence gets abundant.
Today's edition features our latest PRO report on who actually pockets the money as smart software gets cheap and plentiful. The crowd is piling into the model makers chasing trillion-dollar IPOs, but we think the bigger payoff sits one level up the stack. You'll get the opening below, with the rest on the site.
Here's what we've got for you today:
- 🎯 The core bet: as cheap intelligence spreads, value slides away from the frontier labs and toward the application layer, with ServiceNow as the first name.
- 🧩 Open models closing the gap, like GLM-5.2 (MIT-licensed, 1M-token context, claiming results near Claude Opus 4.8), plus DeepSeek, Qwen, Kimi, Llama, and Mistral.
- 🚨 The June 12 export order that pulled Anthropic's Fable 5 and Mythos 5 offline worldwide, and why that lifts the value of model-agnostic setups.

THE AI PRICE COLLAPSE IS ALREADY HAPPENING
In 2021, trading onchain meant Uniswap. It was the biggest DEX, so that is where the liquidity lived. Then rivals multiplied, traders stopped caring which venue filled their order, and aggregators like CoW Swap took over. They routed across every exchange for the best price and quietly owned the customer. The dominant venue became a backend.
AI is heading the same way. Buyers are shifting from "which model is best?" to "which setup finishes the task at the lowest cost?"
Sam Altman relayed the complaint on stage this month: "My company spent my entire 2026 budget in Q1. Can you make this more efficient?" Output per dollar, meaning tasks completed per dollar spent, is the new metric. Routing and open models turn frontier labs into swappable suppliers behind an abstraction layer.
So here is the bet.
As intelligence goes from scarce and premium to abundant and tiered, value moves up the stack: away from the frontier labs racing to IPO near $1T, and toward the application layer that turns cheap intelligence into finished work. The consensus is buying the model makers. I think the money is one layer up.
Three forces push that way: open models are closing the gap, routing makes them swappable, and reasoning may compress into smaller systems. Add a policy wildcard (this month, the U.S. ordered Anthropic to pull two models offline worldwide).
All of this adds up to one conclusion: owning the model is a bad place to put your money.
FORCES MAKING INTELLIGENCE CHEAPER
The forces fall into three groups: capability, economics, and policy.
Capability is a cluster of three (open models, routing, and reasoning compression), while economics and policy each stand alone. Taken separately, each might look like noise, but together they form a strong signal.
Capability is converging and unbundling
Three developments point the same way: the supply of usable intelligence is getting cheaper, and the unit you buy it in is becoming interchangeable.
Open models are closing the gap
Recent open or open-weight models such as GLM, DeepSeek, Qwen, Kimi, Llama, and Mistral show that the open ecosystem is no longer simply imitating old frontier models. It is pushing into reasoning, coding, long-context, agentic workflows, and multimodal tasks.
The chart below compares models on the common Artificial Analysis benchmark, with open-source models marked in red.

GLM-5.2 is a useful example. It is an MIT-licensed open-weight model with a 1M-token context window and strong coding and agentic performance. Its release materials claim near-frontier results close to Claude Opus 4.8.
MiniMax and DeepSeek similarly showed open reasoning models approaching closed reasoning models on math and coding benchmarks. Qwen, Kimi, Llama, and Mistral broaden the point: this is an ecosystem-level phenomenon, not a single-lab anomaly.
This evidence needs discipline, though.
Many benchmark claims are vendor-reported, and benchmarks are not production. Enterprise deployment also requires reliability, latency, tool use, memory, security, safety, auditability, support, and integration. That gap is where the opportunity exists.
Key takeaway: Open models only have to reach a certain quality threshold to pressure the price of intelligence. Beating the frontier outright was never the bar.
Routing and fusion make it modular
OpenRouter's Fusion announcement matters because it reframes the question from "which single model is best?" to "what system produces the best answer at the best cost?"
If multiple models can be combined, routed, or fused, the user does not need to care which individual model generated the final result.

Fusion delivers better results than any single model on its own. Sakana, another routing model launched recently, points the same way.
A routing layer can send easy prompts to cheaper models and reserve expensive frontier models for harder prompts, preserving much of the performance while cutting cost. Fusion goes one step further by combining model outputs into a stronger result.
This turns models into suppliers behind an abstraction layer.
The application or router owns the decision of which model to call. That weakens the customer relationship of any single model provider and raises the value of applications that are model-agnostic and focused on outputs.
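To make the routing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the model labels, the prices, and the difficulty score (which a real router would have to estimate with a classifier) are hypothetical, and this is not how OpenRouter or any specific product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical label, not a real product
    cost_per_mtok: float   # assumed price in USD per million tokens
    max_difficulty: float  # hardest prompt (0 to 1) this tier is trusted with


# Assumed catalog; every number here is illustrative.
CATALOG = [
    Model("frontier-large", 15.00, 1.0),
    Model("open-medium", 2.00, 0.7),
    Model("open-small", 0.30, 0.4),
]


def route(difficulty: float, catalog=CATALOG) -> Model:
    """Return the cheapest model trusted to handle a prompt of this difficulty."""
    capable = [m for m in catalog if m.max_difficulty >= difficulty]
    if not capable:
        raise ValueError("no model in the catalog can handle this prompt")
    return min(capable, key=lambda m: m.cost_per_mtok)


def blended_cost(difficulties, catalog=CATALOG) -> float:
    """Average price per million tokens across a mix of prompts."""
    return sum(route(d, catalog).cost_per_mtok for d in difficulties) / len(difficulties)


if __name__ == "__main__":
    workload = [0.1, 0.2, 0.3, 0.5, 0.6, 0.9]  # mostly easy prompts
    for d in workload:
        print(d, "->", route(d).name)
    print("blended $/Mtok:", round(blended_cost(workload), 2))
```

With this made-up workload, only one prompt in six reaches the frontier tier, so the blended price lands far below the frontier list price. The caller never has to know which model answered.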
There are real caveats. Fusion can add latency, complexity, and cost, and some tasks are better served by one strong model than by a committee. Frontier labs and hyperscalers may build routing internally, and the routing layer itself may become a feature rather than a standalone company. But an in-house router can only reach that lab's own models, which may underperform a neutral aggregator that picks across every vendor.
The direction still holds. If model choice becomes automated, the AI market looks less like one winner-take-most model and more like a marketplace of capabilities arbitraged by cost, latency, and quality.
Key takeaway: the future may be tiered, routed intelligence, where expensive frontier calls are used selectively rather than by default.
Reasoning may compress (speculative)
The third force is the most speculative and, if true, the most consequential: reasoning may be easier to compress than knowledge.
VibeThinker-3B is the example to look at.

It is a 3B-parameter model that reportedly delivers outputs comparable to those of giant models with hundreds of billions or even trillions of parameters.
The key claim is what the authors call the "Parametric Compression-Coverage Hypothesis": reasoning appears increasingly compressible, while knowledge remains a coverage problem. In other words, solving a math problem may require far fewer parameters than knowing every fact on the internet.
If true, the industry may be underestimating how much useful reasoning can migrate into smaller, cheaper, and increasingly local models. Memorizing vast world knowledge may require enormous model capacity, but many reasoning tasks may require fewer parameters when the model is paired with retrieval, tools, memory, and workflow context.
Smaller reasoning-focused models, including recent 3B-class experiments, hint at this. If a small model can reason over externally supplied context, use tools, and retrieve facts as needed, then not every useful reasoning task requires a frontier-scale model.
This would change the economics of AI. Investors often talk about token growth, but infrastructure economics depend on compute consumed per useful task. A token generated by a small local model is not the same as a token generated by a large frontier model. If useful reasoning migrates into smaller models, token usage can explode while GPU-hours grow more slowly.
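A quick back-of-the-envelope sketch of that decoupling, in Python. The inputs are assumptions chosen for illustration, not measurements: compute per token is normalized so the frontier model costs 1.0 unit per million tokens, and the small model is assumed to need one-twentieth of that.

```python
def gpu_hours(total_tokens: float, small_share: float,
              frontier_hours_per_mtok: float, small_hours_per_mtok: float) -> float:
    """Compute needed to serve a token volume split across two model tiers."""
    mtok = total_tokens / 1e6
    frontier = mtok * (1 - small_share) * frontier_hours_per_mtok
    small = mtok * small_share * small_hours_per_mtok
    return frontier + small


# Assumed, normalized compute per million tokens (frontier = 1.0).
FRONTIER_HOURS_PER_MTOK = 1.0
SMALL_HOURS_PER_MTOK = 0.05

# Hypothetical scenario: tokens grow 10x while the small-model share
# of tokens rises from 10% to 70%.
BEFORE = gpu_hours(1e9, 0.10, FRONTIER_HOURS_PER_MTOK, SMALL_HOURS_PER_MTOK)
AFTER = gpu_hours(10e9, 0.70, FRONTIER_HOURS_PER_MTOK, SMALL_HOURS_PER_MTOK)

TOKEN_GROWTH = 10e9 / 1e9
GPU_GROWTH = AFTER / BEFORE

if __name__ == "__main__":
    print(f"tokens grew {TOKEN_GROWTH:.1f}x, compute grew {GPU_GROWTH:.1f}x")
```

Under these assumptions token volume grows tenfold while compute grows a bit under fourfold. The exact numbers are invented; the point is that the two curves can diverge once the mix shifts.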
To be fair, this is a hypothesis, not a conclusion.
Narrow benchmark performance does not prove broad production capability. Messy enterprise workflows require ambiguity handling, long-horizon planning, tool reliability, domain constraints, and error recovery, and frontier models may still dominate the highest-value reasoning tasks.
But even if reasoning compression is only partly true, it supports a bifurcated market: frontier models for frontier tasks, and smaller open or local models for a growing share of routine reasoning.
Key takeaway: if reasoning compresses, the frontier becomes a premium escalation tier you reach for selectively, while smaller models handle the default load.
Token monetization is under pressure
At OpenAI's "Intelligence at Work" event in early June 2026, Altman said cost concerns went from something that never came up at the start of the year to "all of a sudden, a huge issue," and described the now-common customer refrain: "My company spent my entire 2026 budget in Q1. Can you make this more efficient?"
That is normal: every customer wants lower prices. What makes it matter is that open models and cheaper APIs now create credible alternatives for many workloads.
At the same time, frontier labs remain capital-intensive. They spend heavily on training, inference, talent, safety, product, data, and infrastructure commitments. OpenAI and Anthropic are not traditional software companies with near-zero marginal cost. They are closer to capital-intensive intelligence factories trying to become software platforms.
That creates a squeeze. Lower token prices can expand demand, but they also make it harder to fund frontier-scale compute unless volume grows fast enough or the company captures value elsewhere, through products, enterprise platforms, consumer distribution, agents, or workflow ownership.
The counterargument is strong. Frontier labs may preserve premium economics if they keep extending the frontier in reasoning, agents, multimodal systems, tool use, memory, and safety. ChatGPT, Claude, Gemini, and Copilot are not just APIs; they are product surfaces with distribution and user habits. Frontier usage can grow in absolute terms even if the frontier share falls.
The risk is narrower: the premium window for each new frontier model may shrink as open models catch up faster and routers make substitution easier.
Measuring that pressure correctly matters because price per token is a treacherous proxy for output per dollar. What matters is the total cost to actually finish the task (tokens consumed times the price of each), not the sticker price per token. A model that is half the price but burns twice the tokens is a wash. Several things move the real number, usually against the cheaper option:
- Cache hit rate: Especially in cache-heavy agent loops, most of the input is re-read each turn, and cached tokens bill at a fraction of fresh ones. The more cache-heavy the loop, the less of the headline price gap survives.
- Success and retry rate: If the stronger model finishes in one pass and the cheaper one needs two or three attempts, the savings evaporate quickly.
- Stability: Inconsistent output forces more verification, human review, and re-runs. None of that shows up in the per-token price, but all of it is real cost.
In practice, the gap compresses: a model that looks 4-6 times cheaper on list pricing can land closer to 2-3 times cheaper once caching, retries, and stability are priced in. Still cheaper, but the comparison has to be run on output per dollar, not on the price sheet.
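Here is a rough sketch of that comparison in Python. All prices, cache discounts, and success rates are hypothetical. The sketch assumes the cheaper provider offers a weaker cache discount and succeeds less often per attempt, treats retries as independent (so expected attempts are 1 divided by the success rate), and leaves out stability costs such as human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million fresh input tokens (assumed)
    output_per_mtok: float  # USD per million output tokens (assumed)
    cached_discount: float  # cached input is billed at this fraction of the fresh price
    success_rate: float     # probability that one attempt finishes the task


def cost_per_completed_task(p: Pricing, input_mtok: float, output_mtok: float,
                            cache_hit_rate: float) -> float:
    """Expected spend to get one finished task, including retries."""
    cached = input_mtok * cache_hit_rate * p.input_per_mtok * p.cached_discount
    fresh = input_mtok * (1 - cache_hit_rate) * p.input_per_mtok
    per_attempt = cached + fresh + output_mtok * p.output_per_mtok
    return per_attempt / p.success_rate  # expected attempts = 1 / success_rate


# Hypothetical models: the cheap one is 5x cheaper on the price sheet.
FRONTIER = Pricing(10.0, 30.0, cached_discount=0.10, success_rate=0.90)
CHEAP = Pricing(2.0, 6.0, cached_discount=0.25, success_rate=0.60)

LIST_RATIO = FRONTIER.input_per_mtok / CHEAP.input_per_mtok

# One agent task: 1M input tokens re-read across turns, 50k output tokens, 90% cache hits.
FRONTIER_COST = cost_per_completed_task(FRONTIER, 1.0, 0.05, 0.9)
CHEAP_COST = cost_per_completed_task(CHEAP, 1.0, 0.05, 0.9)
EFFECTIVE_RATIO = FRONTIER_COST / CHEAP_COST

if __name__ == "__main__":
    print(f"list price gap: {LIST_RATIO:.1f}x, effective gap: {EFFECTIVE_RATIO:.1f}x")
```

With these made-up inputs, a 5x gap on the price sheet shrinks to roughly 2.4x per completed task. Change the assumptions and the result moves, which is exactly why the comparison has to be run per task.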
Key takeaway: Frontier labs are not obsolete, but their pricing power becomes more contested.
Policy adds a discontinuity
On June 12, 2026, the U.S. Commerce Department issued an export-control directive ordering Anthropic to suspend access to Fable 5 and Mythos 5 for any foreign national, whether inside or outside the U.S., including Anthropic's own non-citizen employees. Because access cannot be reliably gated by nationality, Anthropic disabled both models for all customers worldwide to stay compliant. Other Claude models, including Opus 4.8, were unaffected.
This matters beyond one company. It is the first time a U.S. frontier lab has been ordered to halt global access to specific models on national-security grounds. The stated trigger was a reported method of jailbreaking Fable 5's safeguards. Anthropic disputes the action, arguing the capability is widely available from other models (including GPT-5.5) and that recalling a deployed commercial model over a narrow jailbreak would, if generalized across the industry, effectively freeze new frontier releases.
The downstream effects cut in two directions.
For the bear case on premium tokens, this reinforces forces one through three. If a frontier model can vanish from your stack overnight by government order, the value of model-agnostic architecture rises sharply. Enterprises gain another reason to route, keep fallbacks, and hold open or local models they actually control. Supply continuity becomes a procurement requirement, not a nice-to-have. And open weights are extremely hard to recall once released, which is precisely why governments worry about them, but also why open ecosystems are structurally more resilient to this kind of kill switch.
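In code, "route and keep fallbacks" can be as simple as an ordered chain. The Python sketch below is illustrative only: the provider functions are hypothetical stand-ins for real API clients, and the exception type is invented for the example.

```python
from typing import Callable, Sequence, Tuple


class ModelUnavailable(Exception):
    """Raised by a provider when a model cannot be called (outage, recall, policy block)."""


Provider = Callable[[str], str]


def call_with_fallback(prompt: str,
                       providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, answer) from the first that works."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real API clients.
def recalled_frontier(prompt: str) -> str:
    raise ModelUnavailable("access suspended")


def local_open_model(prompt: str) -> str:
    return f"[local] answer to: {prompt}"


CHAIN = [("frontier", recalled_frontier), ("local-open", local_open_model)]

if __name__ == "__main__":
    print(call_with_fallback("summarize this ticket", CHAIN))
```

The application keeps working when the first entry disappears, because the decision of which model to call lives in the application, not with the vendor.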
For the bull case on frontier labs, regulation can also entrench. If the most capable models become a gated, security-screened tier, the top of the frontier turns into a scarce, regulated product that few suppliers can clear. That can protect pricing power for whoever stays on the right side of the line, while introducing binary, hard-to-hedge regulatory risk.
For Anthropic specifically, the timing is awkward: it filed confidentially for an IPO this month at a recent valuation around $965B, and the export-control decision could make investors question whether it can stay at the cutting edge if the government keeps singling out its models.
Caveats apply: This is one event, possibly resolved quickly, and may prove idiosyncratic rather than a template. And a regulatory moat, if one forms, could help incumbents more than it hurts them.
Key takeaway: intelligence now carries sovereign risk. The price of a frontier token may matter less than whether you are allowed to keep calling it, which strengthens the case for routed, open, and locally controlled fallbacks.
THE MARKET STRUCTURE THIS IMPLIES
Together, these forces imply a specific market structure: usage rises, the mix shifts toward cheaper tiers on a lag, and value concentrates differently from volume.
The mix shift, on a lag
Cheaper intelligence raises usage. Developers do not spend less on the same work; they build products that call models more often: more agent steps, retries, verification, background automation, larger contexts, synthetic data, and monitoring. That is the AI version of Jevons paradox, and it is not controversial.
Everyone agrees the mix shifts toward cheaper, open, routed, and local models. What the market is mispricing is the timing: when it shifts, and why.
Output per dollar only starts to bite once something is in production rather than still being built, and most deployments are still being built.
The shift is gated by three clocks running at different speeds:
- The capability clock, open models closing the gap on yesterday's tasks, is fast and running now.
- The frontier clock keeps moving the goalposts. As models improve, users do not redo old tasks more cheaply; they attempt harder ones, so the premium tier regenerates at the top as fast as the bottom commoditizes.
- The economic clock, subsidies ending and deployments maturing from build into production, is slow and back-loaded.
That reframes the "good enough" question the bear case rests on. Good enough has two very different thresholds. Good-enough-to-start is a low bar, and open models clear it for most tasks today. Good-enough-to-replace a working production dependency is far higher, because switching means re-evaluation, prompt re-tuning, reliability re-validation, and the organizational risk of a quality regression on something people now rely on.
So a cheaper model has to beat the incumbent by enough to clear the switching cost plus a risk premium. That friction favors incumbents in the use phase, the opposite of the naive "good enough, so everyone switches" story.
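That hurdle can be written as simple break-even arithmetic. The Python sketch below uses invented numbers throughout (per-task costs, volumes, switching cost, and risk premium are all assumptions) and treats the risk premium as a flat dollar amount, which is a simplification.

```python
def net_benefit_of_switching(incumbent_cost: float, challenger_cost: float,
                             tasks_per_month: float, horizon_months: float,
                             switching_cost: float, risk_premium: float) -> float:
    """Savings over the horizon minus the one-off cost and risk of switching (USD)."""
    savings = (incumbent_cost - challenger_cost) * tasks_per_month * horizon_months
    return savings - (switching_cost + risk_premium)


def break_even_tasks_per_month(incumbent_cost: float, challenger_cost: float,
                               horizon_months: float, switching_cost: float,
                               risk_premium: float) -> float:
    """Monthly task volume at which switching exactly pays for itself."""
    saving_per_task = incumbent_cost - challenger_cost
    if saving_per_task <= 0:
        raise ValueError("challenger is not cheaper per task")
    return (switching_cost + risk_premium) / (saving_per_task * horizon_months)


# Hypothetical deployment: $0.50 vs $0.20 per task, 12-month horizon,
# $25k to re-evaluate and re-tune, $15k set aside for regression risk.
SMALL_SHOP = net_benefit_of_switching(0.50, 0.20, 10_000, 12, 25_000, 15_000)
LARGE_SHOP = net_benefit_of_switching(0.50, 0.20, 50_000, 12, 25_000, 15_000)
BREAK_EVEN = break_even_tasks_per_month(0.50, 0.20, 12, 25_000, 15_000)

if __name__ == "__main__":
    print(f"10k tasks/month: {SMALL_SHOP:+,.0f} USD")
    print(f"50k tasks/month: {LARGE_SHOP:+,.0f} USD")
    print(f"break-even volume: {BREAK_EVEN:,.0f} tasks/month")
```

In this made-up case a 60% cheaper model still loses at 10,000 tasks a month and only wins once volume passes roughly 11,000. Being cheaper per task is not enough on its own; the saving has to be large enough, at enough volume, to cover the cost of moving.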
This is why current data, with closed-model usage still rising almost monotonically, does not refute the thesis. It is what an early, subsidized, time-to-market phase looks like: while you are still building, you reach for the best model and ignore the bill.
The real trigger is the end of the subsidy regime, in which usage is bundled into users' subscriptions, or the point at which pricing becomes uncompetitive. That is a far more falsifiable signal than "the mix will shift," and it is plausibly synced to margin pressure on the very labs now racing to IPO.
Key takeaway: the metric converges on output per dollar, but it bites task by task on a lag. Volume commoditizes early, value commoditizes late, and the premium tier keeps reconstituting at the top.
Where the value lands
Start from where adoption actually is: likely under 1% of people use AI effectively, and well under 1% of tasks are automated. When the addressable market is this small and growing this fast, gross expansion dominates share redistribution. Frontier revenue can rise in absolute terms for years even as the frontier share of calls falls. The value-capture story is real, but it is a later-cycle trade, sequenced behind the simpler one: adoption rises, so bet on distribution and usable product first. Collapsing both into one timeframe is the most common error in this debate.
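The "share falls, revenue rises" point is easy to check with toy numbers. The Python sketch below assumes a market growing 60% a year and a frontier share of spend dropping eight points a year; both rates are invented for illustration, and share of spend stands in for share of calls to keep the arithmetic simple.

```python
def frontier_revenue_path(market0: float, growth: float, share0: float,
                          share_drop: float, years: int):
    """Return (year, market, share, revenue) rows under constant growth and a linear share decline."""
    rows = []
    for t in range(years + 1):
        market = market0 * (1 + growth) ** t
        share = share0 - share_drop * t  # only meaningful while share stays positive
        rows.append((t, market, share, market * share))
    return rows


# Hypothetical inputs: market indexed to 100, frontier starts with 90% of spend.
ROWS = frontier_revenue_path(market0=100.0, growth=0.60, share0=0.90,
                             share_drop=0.08, years=3)

if __name__ == "__main__":
    for year, market, share, revenue in ROWS:
        print(f"year {year}: market {market:7.1f}  share {share:.0%}  revenue {revenue:7.1f}")
```

Over three years in this toy path, frontier share drops from 90% to 66% while frontier revenue triples. Gross expansion swamps the share loss until growth slows.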
Frontier labs. Not obsolete, but their pricing power is contested on a delay. They keep the highest-value reasoning, the regenerating top tier, and the product surfaces with distribution and habit: ChatGPT, Claude, Gemini, Copilot. Their exposure is concentrated in the cheap tail of routine tokens, much of which was never high-margin. The defense is to convert capability into integrated products and to own intra-family routing, downshifting queries to their own smaller models to protect margin. The one thing they structurally cannot do is route you to a competitor, which is why the commoditizing layer lives outside them.
Goldman Sachs projects that agent usage drives a 24-fold jump in token consumption by 2030, deepening the chip shortage over the next 12-18 months. The same note warns that cost-cutting could weigh on the largest labs, Anthropic and OpenAI, both planning near-trillion-dollar IPOs this year. And since January, Chinese models have passed U.S. models in token consumption on OpenRouter, helped by cheaper energy and more efficient models that let Chinese labs undercut U.S. pricing.

As AI investor Gavin Baker put it, "frontier captures 90% of value; open source carries 80% of tokens." Today he is right. The question is whether it holds. I tend to agree with Goldman Sachs that frontier pricing comes under heavy pressure from here.
The routing layer. Worth splitting in two. Intra-family routing is a margin tool the labs own, and no threat to them. Cross-family routing, best answer per dollar regardless of vendor, can only be owned by a neutral third party or by the application sitting above the models. That is where model-agnostic value capture actually accrues, and it strengthens the case for the workflow layer.
Hyperscalers. Structurally strong, but not economics-agnostic. They host closed, open, fine-tuned, and marketplace models and benefit from compute, storage, networking, governance, and data pipelines. But if smaller models cut compute per task, if workloads move local, or if GPU rental rates fall, CapEx returns can disappoint even when demand is real.

H100 GPU rental rates spiked through spring 2026 and have since rolled over (from $3.20 to $2.29 an hour). Falling and volatile compute pricing is precisely the neocloud and hyperscaler exposure: these layers capture usage, not necessarily durable pricing power. To be fair, H100 rates are likely falling partly because Nvidia's newer Blackwell chips are becoming available, but the point holds. Watch rental prices closely; they are a good proxy for the demand side of this trade. The second thing to watch is the token/GPU-hour decoupling.
Neoclouds. High-beta beneficiaries of raw GPU demand, but exposed to utilization, depreciation, financing cost, falling rental rates, and customer concentration. If hyperscalers are under pressure, so are the neoclouds, and they sit in a riskier position, because renting out GPUs is the only business they have.
But remember, there are still not enough GPUs, so whoever has secured access to the newest GPU models can still hold a huge advantage.
Workflow platforms. The cleanest setup, if models are cheap inputs. Say your quality threshold for a task deliverable is above 1,000 Elo points.

You can pick MiniMax-M3 or GLM-5.2 for that task instead of Claude and save 80% of the cost. This is measured against very long, complicated tasks, but the pattern holds for everyday work too: these models are getting good enough to satisfy the majority of regular tasks.
Owning workflows, data context, permissions, integrations, and outcomes lets you monetize completed work rather than tokens. ServiceNow needs cheap intelligence, not its own frontier model. The risk is horizontal agents abstracting the workflow platform away.
One caution on the lifecycle endpoint: "train your own model" is usually a trap below hyperscale. The durable path is the best frontier model first, then cheaper frontier or open models via routing, then a fine-tuned small open model for the one narrow, high-volume slice that justifies it. The data moat expresses better through retrieval and context than through pretraining.
Customers. Likely the largest beneficiaries. Cheaper models, routing, and open alternatives improve bargaining power, cut cost per task, reduce lock-in, and expand automation. A meaningful share of the surplus accrues to buyers (users), but, consistent with the lifecycle, that surplus is realized in the use phase, not the build phase.
Key takeaway: commoditization redistributes AI value rather than destroying it, and it does so on a clock. The cheap tail goes first, to customers, workflow owners, routers, and infrastructure. The premium tier reconstitutes at the top for as long as frontier labs keep extending it.
WHERE I'M PLACING MY BET
The market is moving from scarce premium intelligence to abundant, tiered intelligence. Total usage rises, but value capture moves across the stack: customers and workflow owners take the largest share of the surplus, infrastructure captures the work, and frontier labs hold the regenerating top tier for as long as they keep extending it.
That transition runs slowly and unevenly. An individual power user or an early-stage startup is well ahead of the typical large enterprise, which is often still handing employees blanket model access and discovering it burned a year's budget in a quarter. The mix shift is the same move from building to running, repeating cohort by cohort.
The closest precedent is the one this piece opened with. Uniswap did not lose its users to a better DEX; it lost the direct relationship the moment aggregators could route any trade to the best venue, and the question shifted from which exchange to who finds the best execution. The dominant venue became a backend. AI is sorting the same way: a router sends each task to the model tier that finishes it cheapest, and not every task needs the frontier model. As the cheaper tiers become credible, usage distributes by how much capability each one actually requires.
To be fair, Uniswap did not vanish. It built its own aggregator and front end to win the relationship back (without success), and the frontier labs will try the same with their product surfaces and in-house routing.
But cheaper open models only have to shift the mix, and they are already doing it.
That points me in one direction: bullish on the application layer, the companies that treat cheap, modular intelligence as an input and capture value in the workflows, data, and outcomes built on top. ServiceNow is the example I reached for, and I suspect it is the first of many.
Finding the application-layer names that win this is where I am taking the research next.
AI-GENERATED PODCAST 🤖
We’ve turned this PRO report into an AI-generated podcast to make it even easier to digest. You'll find the audio player below. 👇️🎧️
Why ServiceNow and OpenRouter win the AI shift
Disclaimer: This podcast was created using AI and is based on the research report above. While we've done our best to ensure accuracy, the audio may contain minor errors, technical glitches, or mispronunciations. Please note that this podcast provides an overview of the report and is not a comprehensive or definitive take on the topic.

BITE-SIZED COOKIES FOR THE ROAD 🍪
Pre-internet money rails meet the agentic economy? Yikes. Arc is Circle’s open Layer-1 built for the internet-native economy. Humans and AI welcome.*
Julio Moreno: Being near the value zone means nothing if demand keeps falling. "The bad news is from a fundamental side, demand is still contracting."
Phwoar! Polymarket gives OpenAI 46% odds of IPOing at $1.5T or greater.
Dan Tapiero: "Blockchain is the money of the autonomous AI age" - and that alone could blow past my $50T crypto market cap estimate.
Your Bitcoin can hand you cash without selling a single sat → That's Ledn's whole pitch, and our full review breaks down everything you need to know about the platform.
*this is sponsored content.
