The hangover from euphoria, or how AI agents can blow through a year’s budget in a few hours

The year 2026 brings a rude awakening from the technological euphoria, as companies receive a hefty bill for the “token craze” instead of the promised savings. This is the moment of truth, when the enthusiasm surrounding AI must give way to the hard economics of decision-making and a new discipline of cost management.


Not so long ago, artificial intelligence was billed as the ‘ultimate solution’ to productivity problems – a digital alchemist turning wasteful process flows into pure efficiency gold. The party was in full swing, and the champagne flowed at vendor presentations promising ever more capable models.

Today, however, instead of further breakthroughs in machine reasoning, something far less spectacular is being whispered about in the corridors of business conferences: the bill for all that euphoria. It turns out that the ticket to the world of AI was not a one-off fee but a dynamic, hard-to-tame subscription to the future, whose cost can rise exponentially overnight.

What we are witnessing is the birth of ‘token fever’. It’s a state where the enthusiasm of engineers collides with the dismay of CFOs. For decades, we have been accustomed to the SaaS model – predictable, fixed licence fees that were easy to budget for. Generative AI has shattered this order, introducing a ‘probabilistic’ model. Here, a mistake in one agent’s logic or an overly effusive prompt can burn up financial resources faster than traditional cloud infrastructure consumes electricity.

Uber and a mistake worth billions

If the tech industry was looking for a ‘canary in the coal mine’, it found one in San Francisco in April 2026. At the HumanX AI conference, Praveen Neppalli Naga, Uber’s CTO, gave a talk that sobered even the biggest optimists. The giant, which had invested an astronomical $3.4 billion in research and development in 2025, hit a wall: its annual budget for artificial intelligence evaporated in just four months.

It wasn’t a matter of one misguided investment decision, but a side effect of engineering ambition with no brakes. Uber, pushing for aggressive technology adoption, encouraged its developers to use agents such as Claude Code en masse. The result? 11% of back-end code was already being generated by artificial intelligence, but the price of this ‘efficiency’ proved deadly. Without proper performance filters and oversight of token consumption, AI ceased to be a lever for savings and became an out-of-control spending engine.

The case of Uber is a classic example of a ‘tsunami of tokens’. Autonomous agents, entering infinite iteration loops with no clear limits, can burn a fortune in the time it takes to drink an espresso. It’s a painful lesson for any CIO: innovation without financial architecture is just a very expensive hobby. Naga admitted that the company had to go back to the design table to completely redefine its strategy. Any company that deploys AI today without a rigorous profitability analysis risks having its success measured not by margin growth, but by the speed with which it exhausts its own resources.

Goodbye SaaS, hello volatility

We are bidding farewell to an era where the IT budget was like a fixed Netflix subscription – predictable, secure and giving a false sense of control. For years, the SaaS model accustomed us to per-user licensing, where the only risk was a surplus of accounts that no one used. Generative AI brutally ends this period of ‘licensing peace of mind’ by introducing a billing model that is more akin to electricity bills during an energy crisis than traditional software.

The shift from fixed costs to variable costs is a fundamental paradigm change. In 2024, IT departments were buying AI access as a lump sum. Today, in 2026, vendors such as OpenAI and Anthropic have eliminated unlimited Enterprise plans, introducing dynamic billing for token consumption. The reason is mundane: AI agents have destroyed the usage distribution curve on which the old business was built. The subscription model only worked as long as the ‘light’ users subsidised the ‘intensive’ ones; once companies started employing autonomous agents, the differences became absurd. Analyses show cases where a user paying $100 a month generated costs of $5,600 in a single billing cycle. A subsidy ratio of 25 to 1 is a straight path to supplier bankruptcy, hence the sharp turn towards pay-per-use billing.

This makes IT spending probabilistic, and it radically differentiates AI from the traditional cloud. A forgotten server in AWS generates a fixed, linear cost. A poorly designed prompt, or an agent without iteration limits, can instead fall into a loop and generate millions of useless tokens in seconds. In this new world, a programmer’s logic error no longer ends with the application crashing – it ends with the company account being drained at the speed of light. That means an immediate redesign of IT finance and the abandonment of rigid budget frameworks in favour of flexible management of the ‘economics of inference’.
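The arithmetic behind that volatility can be sketched in a few lines. The prices and call volumes below are assumptions chosen for illustration only, not vendor quotes:

```python
# Hypothetical numbers for illustration only; real per-token prices vary by vendor.
PRICE_PER_1K_TOKENS = 0.01  # assumed blended input+output price, USD

def metered_cost(calls: int, tokens_per_call: int,
                 price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost of token-metered usage: linear in volume, not in seats."""
    return calls * tokens_per_call * price_per_1k / 1000

# A well-behaved assistant: 200 calls a day, ~2,000 tokens each.
normal_day = metered_cost(calls=200, tokens_per_call=2_000)

# The same agent stuck in a retry loop: 50,000 calls before anyone notices.
runaway_day = metered_cost(calls=50_000, tokens_per_call=2_000)

print(f"normal day:  ${normal_day:,.2f}")
print(f"runaway day: ${runaway_day:,.2f}")
```

The specific figures matter less than the shape of the risk: metered cost is linear in volume, and a looping agent multiplies volume by orders of magnitude, while a fixed SaaS seat price would not have moved at all.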

Tsunami of tokens – a new unit of risk

In the modern CIO’s dictionary, a new, much more predatory term has emerged alongside ‘technical debt’: the ‘token tsunami’. This is a phenomenon in which autonomous agents, rather than freeing up staff time, fall into loops of endless iterations, burning up budgets with the intensity of a steel mill. The problem is that a bot, unlike a human, never feels fatigue or shame for duplicating mistakes – it simply consumes resources until it encounters a hard limit or empties its account.

The scale of the problem is such that even the biggest players have had to revise their dogmas. Gartner is sounding the alarm: by the end of 2027, over 40% of agentic AI projects will be cancelled. The reason? Not a lack of vision, but brutal mathematics – rising costs coupled with a lack of precise tools to measure real business value.

Here the biggest paradox of 2026 manifests itself: the unit price of a token is steadily falling, yet the total bill keeps rising. AI agents consume between 5 and 30 times more tokens per task than a standard chatbot. This is a classic trap of scale – per-unit efficiency that becomes economically ruinous through sheer volume. If your AI strategy rests solely on the hope that ‘models will get cheaper’, you are building a sandcastle that the coming tsunami will wash away in a single billing cycle. Without rigorous control over what the machines process, and why, modern IT becomes hostage to its own unbridled computing power.
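The paradox – cheaper tokens, bigger bills – survives a back-of-the-envelope check. All figures below are assumptions picked to match the 5–30x multiplier cited above, not real price lists:

```python
# Illustrative figures, not vendor quotes. Assumed: unit price drops 3x
# year on year, while an agentic workflow consumes ~20x the tokens of a
# single chatbot exchange (mid-range of the 5-30x spread cited above).
chatbot_tokens_per_task = 3_000
agent_multiplier = 20

price_2025_per_1k = 0.015  # assumed 2025 price per 1,000 tokens, USD
price_2026_per_1k = 0.005  # assumed 2026 price: tokens got 3x cheaper

cost_chatbot_2025 = chatbot_tokens_per_task / 1000 * price_2025_per_1k
cost_agent_2026 = chatbot_tokens_per_task * agent_multiplier / 1000 * price_2026_per_1k

print(f"chatbot task at 2025 prices: ${cost_chatbot_2025:.3f}")
print(f"agent task at 2026 prices:   ${cost_agent_2026:.3f}")
```

Under these assumptions the cost per task rises roughly 6.7x even though each token became three times cheaper: the consumption multiplier dominates the price decline.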

AI FinOps – the new alchemy of IT finance

If you thought Cloud FinOps was challenging, get ready for a no-holds-barred ride. Traditional cloud optimisation was simple craftsmanship: shutting down unused servers and keeping an eye on instance reservations. AI FinOps is a different discipline altogether – probabilistic rather than deterministic resource management. Here the unit of expenditure is no longer a compute hour, but the cost of a useful response relative to the cost of an erroneous or ‘hallucinated’ one.

In 2026, as many as 98% of FinOps teams consider spending on AI as their number one priority. The reason is simple: in the traditional cloud, a technical error rarely leads to an exponential increase in cost. In the world of AI agents, misconfigured prompt logic can burn through budgets faster than you can refresh your dashboard. This is forcing IT leaders to define a new metric – the economics of inference. We no longer count how much a model costs us, but how much the operational success gained from its work costs us.
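One way to make the ‘economics of inference’ concrete is to divide spend not by responses generated, but by responses that actually delivered value. A minimal sketch, with hypothetical numbers:

```python
def cost_per_useful_response(total_spend: float,
                             responses: int,
                             success_rate: float) -> float:
    """Economics of inference: divide spend by the responses that actually
    delivered value, not by the raw number of responses generated."""
    useful = responses * success_rate
    if useful == 0:
        raise ValueError("no useful responses: the spend is pure waste")
    return total_spend / useful

# Assumed numbers: $1,200 spent on 10,000 responses in a month.
print(cost_per_useful_response(1_200, 10_000, 0.9))  # ~ $0.13 per good answer
print(cost_per_useful_response(1_200, 10_000, 0.4))  # $0.30 - same spend, over double the cost
```

The same invoice looks cheap at a 90% usefulness rate and ruinous at 40% – which is exactly why raw per-token price is the wrong thing to optimise.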

And that means rewriting dashboards from scratch. Classic management frameworks such as ITIL 4 or COBIT, while providing a solid base, today require immediate extensions to include prompt lifecycle management or agent iteration limits. AI FinOps is not just about Excel tables; it is a new management philosophy where an engineer must think like an economist and a financier must understand LLM architecture. Without this synergy, buying tokens is akin to pouring rocket fuel into a hole in the tank – the effect is spectacular, but extremely short-lived and frighteningly expensive.

How not to burn through a decade of innovation

The window for consequence-free mistakes has just slammed shut. To avoid a ‘token tsunami’, organisations need to move from a phase of joyful adoption to a phase of rigorous architecture. The first and most pressing step is a token consumption audit – not a general one, but a precise one, broken down by team and use case. When a single query to a model can cost as much as a good cup of coffee, we need to know who is ordering a double espresso without a clear business need.
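Mechanically, such an audit is a roll-up of raw usage records by team. A sketch of the aggregation step over made-up log data – in practice the records would come from the vendor’s usage exports or an internal LLM gateway:

```python
from collections import defaultdict

# Hypothetical usage log: (team, use_case, tokens). The shape of data a
# real audit would pull from billing exports or a gateway proxy.
usage_log = [
    ("payments",  "code-review-agent", 1_800_000),
    ("payments",  "chat-assist",         120_000),
    ("marketing", "copy-drafts",         950_000),
    ("payments",  "code-review-agent", 2_400_000),
    ("data-eng",  "sql-agent",         3_100_000),
]

def tokens_by_team(log):
    """Roll raw usage up into a per-team total, ready for chargeback."""
    totals = defaultdict(int)
    for team, _use_case, tokens in log:
        totals[team] += tokens
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

print(tokens_by_team(usage_log))
```

Sorting the totals descending makes the ‘double espresso’ drinkers visible at a glance; the same roll-up by use case shows whether the spend maps to any business need.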

The key to financial survival is the implementation of three technical foundations:

  • RAG (Retrieval-Augmented Generation): Providing the model with only the data it actually needs, drastically reducing the token ‘diet’.
  • Specialist models: Abandoning the ‘all-knowing’ giants in favour of smaller, cheaper models fine-tuned for repetitive tasks.
  • Corporate charter for the bot: Establish rigid iteration limits and budgets per agent. This is a matter of elementary financial hygiene.
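The ‘corporate charter for the bot’ can be as simple as a guard object consulted before every agent step. A minimal sketch, with illustrative limits:

```python
class AgentBudget:
    """A minimal sketch of a 'corporate charter for the bot': hard caps on
    iterations and token spend, enforced before each agent step."""

    def __init__(self, max_iterations: int, max_tokens: int):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.iterations = 0
        self.tokens_used = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step; raise before the budget is breached."""
        if self.iterations >= self.max_iterations:
            raise RuntimeError("iteration limit reached: stopping agent")
        if self.tokens_used + tokens > self.max_tokens:
            raise RuntimeError("token budget exhausted: stopping agent")
        self.iterations += 1
        self.tokens_used += tokens

budget = AgentBudget(max_iterations=5, max_tokens=10_000)
for step_tokens in [2_000, 2_500, 3_000]:
    budget.charge(step_tokens)
print(budget.iterations, budget.tokens_used)
```

In production the same check would live in an API gateway or agent-framework middleware, but the principle is identical: the agent is stopped by policy, not by an empty bank account.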

We also need to review how our people work with the technology. Identifying the ‘Centaurs’ (experts who amplify their skills with AI) and eliminating the ‘Automators’ (those who unreflectively delegate work to the machine) will allow a real increase in ROI. The most expensive and fastest way to waste an innovation budget is to buy millions of tokens only to have teams working exactly as they did in 2022, just with a chat window on the screen.

 
