More in
AI at Work News
Token Prices Fell 67% This Year. Your AI Bill Is Going Up Anyway
6月 3, 2026 · Currently reading
Small Businesses Using AI Report Higher Revenue and Shorter Workdays
6月 3, 2026
DeepSeek Made Its 75% AI Price Cut Permanent. The Floor Just Dropped
6月 3, 2026
What the AI for Main Street Act Means for Small-Business Owners
6月 3, 2026
Microsoft Built Its Own AI Models to Cut Its Reliance on OpenAI
6月 3, 2026
Microsoft Made Windows an Agent Platform at Build 2026. Here's the CTO Decision Before the Windows Agent Store Goes GA
6月 2, 2026
NVIDIA Just Made AI Models 30% Cheaper. Renegotiate Your Contract
6月 2, 2026
Anthropic Filed to Go Public at $965B. The AI Vendor Calculus Changed
6月 2, 2026
Camunda's ProcessOS Argues BPM Was Always the Right Layer for AI Agents
6月 1, 2026
ServiceNow and Accenture Bet Forward Deployed Engineers Fix the Agent-to-Production Gap
6月 1, 2026
Token Prices Fell 67% This Year. Your AI Bill Is Going Up Anyway

Your vendors are quoting lower prices. Your finance team is flagging a bigger AI line item. Both are correct.
That's not a contradiction. It's the defining cost trap of enterprise AI in 2026, and most CEOs don't have a framework for it yet.
According to a Fortune analysis published May 22, 2026, the shift to agentic artificial intelligence (AI) workflows is causing total enterprise spending to rise sharply even as the per-unit cost of AI inference keeps falling. The problem isn't the price per token. It's how many tokens modern AI workloads consume.
The Jevons Paradox Is Running Your AI Budget

Economists have a name for this pattern. The Jevons Paradox holds that when a resource gets cheaper, total consumption tends to rise faster than the price falls, because lower cost unlocks new use cases that weren't viable before. Coal. Electricity. Bandwidth. Now tokens.
Blended AI inference costs fell roughly 67% year over year, dropping from approximately $18.40 to $6.07 per million tokens between the first quarter of 2025 and the first quarter of 2026, based on 2026 industry cost analyses. That's a genuine and dramatic price drop.
But 73% of enterprises reported in 2026 that their actual AI costs exceeded original projections. That gap isn't a budgeting error. It's the Jevons Paradox in action: cheaper AI enabled companies to deploy more of it, in more places, running more complex workloads.
Key Facts
- Token prices fell roughly 67% year over year (approx. $18.40 to $6.07 per million tokens, Q1 2025 to Q1 2026). (2026 industry cost analyses)
- 73% of enterprises reported AI spend exceeded original projections in 2026. (2026 cost analyses)
- Goldman Sachs analysis projects agentic AI could increase total token demand by up to 24x. (Tom's Hardware, citing Goldman Sachs)
Why Agentic AI Blows the Budget
Simple AI use cases, a chatbot answering a question or a model summarizing a document, consume tokens in one exchange. One prompt in, one response out. You can model that cost on a spreadsheet.
Agentic workloads don't work that way. An agent completing a task independently triggers 10 to 20 model calls per user action. Each step in the workflow is a separate inference call. Retrieval-augmented generation (RAG), where the model pulls in relevant documents to answer accurately, inflates context windows by 3 to 5 times. Always-on monitoring agents run continuously around the clock, generating tokens whether or not a human is watching.
The compounding effect is significant. A single agentic workflow that looks like "one task" might consume fifty times more tokens than the same request handled by a basic single-call model.
Goldman Sachs analysis reported by Tom's Hardware put a number on this: agentic AI could increase total enterprise token demand by up to 24 times compared to current consumption. That's not a forecast for some distant future. Companies like Uber are already living it.
Uber's chief technology officer disclosed that the company burned through its entire 2026 AI budget in four months. Internal adoption of Claude Code, Anthropic's AI coding assistant, jumped from 32% to 84% of roughly 5,000 engineers. The per-engineer monthly cost runs $500 to $2,000, according to the Fortune and Tom's Hardware reporting. At that adoption rate, across that engineering base, the spend adds up fast.
And as Fortune reported, Microsoft has acknowledged that for some workloads, the AI processing cost exceeds what a human worker would cost to do the same task.
GitHub Just Changed the Billing Model. Others Will Follow.
This cost reality is now reshaping how AI vendors structure their contracts.
On June 1, 2026, GitHub moved Copilot from flat-rate subscriptions to a usage-based model using "GitHub AI Credits" tied directly to token consumption. Microsoft is moving its broader Copilot suite toward the same credit-based structure in the same window.
This matters for your next vendor negotiation. The flat-seat model that made AI budgeting feel like a predictable SaaS expense is ending. You're moving into metered utility territory, and the bill scales with usage, not with headcount.
The governance models for AI, covered in more depth in Agentic AI and the Governance Gap, weren't designed for metered consumption. Most enterprises don't have per-workflow cost visibility yet.
What to Do About It
Cheaper tokens are a real advantage if you capture them correctly. But capturing them requires treating AI infrastructure the way you treat cloud compute: as a metered utility with cost controls at the workflow level, not a flat-rate seat license.
Here's a practical starting framework.
Instrument per-workflow token cost before you scale. You can't optimize what you don't measure. Before expanding an agentic workflow to your full organization, measure what it actually costs per task completed. Compare that to the human-labor equivalent. Some workflows will show a 10x cost advantage. Others will show the opposite.
Set agent budgets and guardrails. Each deployed agent should have a maximum token budget per task. When the agent exceeds it, the task should escalate to a human or fall back to a cheaper model. This is standard in cloud cost management and should be standard in AI management.
Decide which workflows justify agentic spend. Not every task needs a multi-step autonomous agent. A simple classification or summarization task can use a single-call model at a fraction of the cost. Cheaper open-weight models are increasingly viable for lower-stakes inference, as explored in Nvidia's Nemotron analysis for CTOs. Match the model to the workflow complexity.
Renegotiate at renewal with usage data in hand. If your vendor is moving to credit-based billing, your baseline consumption data from the flat-rate period is your negotiating leverage. Know your actual usage before you walk into that conversation.
Treat AI like a utility, not a software license. Electric companies don't sell you "unlimited electricity" for a flat monthly fee. Neither will AI vendors, once agentic adoption fully matures. Build your internal governance and finance processes around consumption-based accounting now, before the billing model shifts underneath you.
The BCG AI Radar 2026 finding that 90% of CEOs expect agentic AI to deliver ROI this year suggests most organizations are betting heavily on this technology. But ROI calculations that assume flat-rate pricing will be wrong. The cost structure has changed. The strategy needs to catch up.
For a broader view of how AI vendor relationships are shifting, the vendor calculus analysis from Anthropic's IPO filing covers the strategic implications of AI becoming a foundational infrastructure spend.
The 3-Question Agent Cost Audit
Before deploying any agentic workflow at scale, run these three questions:
- What is the token cost per completed task, measured across 100 real runs (not demos)?
- Does that cost beat the human-labor alternative at your current scale, and at 10x scale?
- What guardrail limits the cost when the agent behaves unexpectedly or the task scope expands?
If you can't answer all three, the workflow isn't ready for production at scale.
Frequently Asked Questions
Why is my AI bill going up if token prices are falling?
Because modern AI workloads consume far more tokens than the simple chat applications most budget models assumed. Agentic workflows trigger 10 to 20 model calls per user task, and retrieval-augmented generation (RAG) expands context windows by 3 to 5 times. The per-token price is lower, but the volume consumed is dramatically higher. Total spend rises even when unit price falls, which is the Jevons Paradox applied to AI infrastructure.
What should I do before my next AI vendor renewal?
Collect your actual per-workflow token consumption data from the current flat-rate period. As vendors move to credit-based billing, that data becomes your negotiating baseline. Know what you've been consuming before you agree to a new pricing structure.
Which AI workloads are not worth the agentic token spend?
Single-step tasks that don't benefit from reasoning chains: document classification, basic summarization, entity extraction, sentiment scoring. These can run on cheaper single-call models or open-weight alternatives. Reserve multi-step agentic spend for workflows where the autonomous reasoning actually changes the output quality, like complex analysis, multi-source research, or tasks requiring sequential decision-making.
Source: Fortune, May 22, 2026 | Tom's Hardware | Duperrin, June 1, 2026
