Taming your token gluttony
Professionals are gorging on AI. As prices rise to reflect the true costs, businesses need to learn to economise.
“Where did all my tokens go — surely I have some stashed in another model?” That was me, staring at a usage limit and running a mental inventory of every prompt I had fired at Claude that day. The penny dropped: I had been treating AI like an all-you-can-eat buffet.
Whilst I scoff at performative trends like tokenmaxxing, I was guilty of feeding AI elaborate prompts and running every task through the most powerful model available, because, well, why not. It was the cognitive equivalent of stuffing my face, then wondering why the cupboards are empty.
Token gluttony is real, and it is getting out of control.
AI labs have been subsidising our productivity for three years, but prices are set to increase dramatically to reflect real costs. Businesses that have developed an AI gorging habit — using it indiscriminately, at the wrong tier, in the wrong places — will feel this first.
Three disciplines help: deploying AI only where it genuinely earns its place; rightsizing model selection to task complexity; and applying basic FinOps governance to token spend.
Gorging on tokens
As Axios reports, OpenAI is projected to burn $14 billion in 2026. Every time you send a complex query, the lab on the other end loses money on the transaction. The strategy has been deliberate: flood the market, establish dependency, sort out the unit economics later. That later is now.
Some 92% of AI software companies now use mixed pricing models that combine subscriptions with usage fees, precisely to tackle the margin problem. The habits businesses developed during the subsidised era are the ones that will make that transition expensive.
In 2025, AI-native spending nearly doubled, with token usage, tier shifts, and AI upgrades inflating costs mid-contract. SMEs will soon feel the meter running faster even where per-unit rates fall, because usage growth outpaces the price declines. Today’s AI costs are just the floor, and businesses need to prepare for steep increases.
Time your meals
The first discipline is using AI only where it actually makes a difference. This may sound obvious, but many organisations have bolted AI onto workflows the way we previously bolted dashboards onto everything — reflexively, and without asking whether they helped.
Does the AI output change what a person does next? If the answer is no — if it generates a summary no one reads, a draft rewritten from scratch, a recommendation ignored — it is decorative and wasteful. Every prompt that produces nothing of value is a small act of organisational gluttony.
Map AI to the points in a workflow where cognitive load is highest and quality variance matters most: first-draft generation, structured data extraction, contract reviews, meeting synthesis. AI does not need to be present at every step. It needs to be present at the right ones.
Portion control
The second discipline is rightsizing. Most organisations default to the most powerful model available for everything, because it was the one they integrated first, and the marginal cost felt invisible.
Shifting to smaller models yields substantial savings. In Q1 2025, 73% of enterprise token volume was routed to the two most expensive model tiers, simply because those were the models the team had integrated. By Q1 2026, that figure had fallen to 31% without significant loss of quality.
Smaller models handle around 70–80% of enterprise tasks adequately, leaving the most complex reasoning to large-scale systems. A two-tier architecture — lighter models for volume, frontier models for complexity — is now standard. You would not provision the most powerful cloud instance to run a static web page, so why do it for AI?
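The routing decision itself can be trivial to implement. Here is a minimal sketch of a two-tier router; the tier names, per-token prices, and the task-type heuristic are all illustrative assumptions, not any vendor's actual models or rates:

```python
# Illustrative two-tier model router: the cheap tier is the default,
# and the frontier tier has to be earned by task complexity.
# Tier names, prices, and the routing heuristic are placeholder assumptions.

from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    usd_per_1k_tokens: float


LIGHT = Tier("small-model", 0.0005)      # hypothetical cheap tier
FRONTIER = Tier("frontier-model", 0.015)  # hypothetical frontier tier

# Routine task types that a smaller model handles adequately.
ROUTINE_TASKS = {"summarise", "extract", "classify", "translate"}


def route(task_type: str, input_tokens: int) -> Tier:
    """Send short, routine tasks to the light tier; escalate everything else."""
    if task_type in ROUTINE_TASKS and input_tokens < 4_000:
        return LIGHT
    return FRONTIER


def estimated_cost(task_type: str, input_tokens: int) -> float:
    """Estimated spend in USD for the tier the task would be routed to."""
    tier = route(task_type, input_tokens)
    return input_tokens / 1_000 * tier.usd_per_1k_tokens
```

In practice the routing signal would come from task metadata or a cheap upstream classifier rather than a hand-maintained set; the design point is simply that the default tier is the inexpensive one.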
Count the calories
The third discipline is governance. According to Cloudkeeper, 72% of IT and financial leaders say generative AI spending has become completely unmanageable. The root problem is structural: AI costs are volatile, consumption-based, and fragmented across teams and tools. Most organisations have no visibility into what they are spending, who is spending it, or whether it is producing anything worth the invoice.
FinOps practices developed for cloud spend transfer directly. Allocate costs to teams, set budgets and monitor anomalies. Treat prompt engineering as a cost discipline: shorter, sharper inputs and cached responses for repeated queries are free savings available immediately.
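Caching is the most mechanical of those free savings. A minimal sketch of response caching keyed on a normalised prompt hash follows; `call_model` is a stand-in for a real API client, not a specific library call:

```python
# Minimal response cache keyed on a hash of the normalised prompt.
# `call_model` is a hypothetical stand-in for a real API call; the counter
# makes the saving from cache hits visible.

import hashlib


class CachedClient:
    def __init__(self, call_model):
        self._call = call_model
        self._cache = {}
        self.api_calls = 0  # how many times we actually paid for tokens

    def complete(self, prompt: str) -> str:
        # Normalise whitespace so trivially different prompts share an entry.
        key = hashlib.sha256(" ".join(prompt.split()).encode()).hexdigest()
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._call(prompt)
        return self._cache[key]
```

Two users asking the same repeated question then cost one API call instead of two; the second request is served from the cache.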
Your ninja developers may regard tokenmaxxing as a power move. It is gluttony with a keyboard, and it needs the same controls as any other spend.
Takeaways for leaders
Cutting waste is not the same as austerity. It is what grown-up technology adoption looks like. Businesses should apply the same principles to AI as the rest of their IT stack:
1. Audit, then optimise
Map where AI is in use, for what purpose, and at which model tier. Most organisations will find significant waste immediately.
2. Match the model to the task
Reserve frontier capability for complex, high-value work. Route routine tasks to smaller, cheaper models. The quality gap is rarely material; the cost gap almost always is.
3. Govern the spend now
Build token costs into technology reporting. Set team-level quotas. Review token spend with the same attention you give the cloud bill.
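A team-level quota need not wait for vendor tooling. A minimal sketch of a token ledger with per-team monthly budgets; the team names and quota figures are illustrative assumptions:

```python
# Illustrative team-level token ledger: record usage per team and flag
# anyone over their monthly budget. Team names and quotas are made up.

from collections import defaultdict


class TokenLedger:
    def __init__(self, monthly_quota: dict):
        self.quota = monthly_quota          # team -> tokens per month
        self.used = defaultdict(int)        # team -> tokens consumed

    def record(self, team: str, tokens: int) -> None:
        """Attribute a batch of token usage to a team."""
        self.used[team] += tokens

    def over_budget(self) -> list:
        """Teams whose usage has exceeded their monthly quota."""
        return [team for team, quota in self.quota.items()
                if self.used[team] > quota]


ledger = TokenLedger({"legal": 1_000_000, "marketing": 500_000})
ledger.record("marketing", 650_000)
```

In this example the marketing team shows up in `over_budget()` while legal, still within quota, does not; the same shape extends naturally to anomaly alerts against a rolling average.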
The subsidised era was not a gift. It was designed to generate demand and was wildly successful at it. What comes next is returning to normal business practices: figure out where AI earns its place, stop wasting it where it does not, and pay attention to what it costs.
As it turns out, knowing where your tokens went is a more useful question than wondering if there are any left in the cupboard.
Further Reading
The Economics of AI: Why the Free Lunch Is Ending — Axios — Why AI labs are still losing money and what IPO pressure means for enterprise pricing.
FinOps for AI Overview — FinOps Foundation — The practitioner framework for applying financial governance to token spend and usage-based AI billing.
State of FinOps Report 2025 — FinOps Foundation — Annual survey of large cloud spenders; the AI section shows how far behind most organisations still are in cost visibility.
Small Models, Big Shift: How AI Is Moving Beyond Model Size — PYMNTS — The enterprise case for smaller language models and the emerging tiered-architecture approach.
Balancing AI Innovation and Cost: The New FinOps Mandate — IDC — How organisations are structuring cross-functional AI cost governance, and what proactive financial management actually requires.



