How much will AI cost in 2030? Forecasts for companies

The predicted radical reduction in AI inference costs of over 90% by 2030 creates a false sense of impending savings, whilst in reality it heralds a new era of mass data consumption. Although the unit price of a token will fall dramatically, the transition from simple assistants to advanced agentic systems will force an exponential increase in the demand for computing power, redefining the economics of digital transformation.

The current technology landscape has accustomed decision-makers to a specific form of digital gravity: the cost of computing power is continually falling, while its availability is increasing. Gartner’s latest forecasts from March 2026 seem to confirm this market constant. By 2030, the cost of inference on trillion-parameter-scale language models is predicted to fall by more than 90% compared to 2025. To the superficial observer, this heralds widespread, almost free intelligence. It is, however, a warning signal of a phenomenon that can be described as the cheap token paradox.

Understanding the mechanics of the coming unit deflation requires a look at the infrastructure foundations. Price reductions are driven not just by economies of scale, but by a profound transformation in the way AI systems consume energy and silicon. A key factor is the spreading adoption of chips designed strictly for inference, which are replacing general-purpose GPUs.

In addition, innovations in the architecture of the models themselves allow for outstanding cognitive performance with significantly less computational load. This trend is complemented by the development of edge technologies that allow data to be processed locally, thus eliminating costly transfers to central clouds.

However, this raises the question of the real impact of these changes on a company’s bottom line. In the economics of technology, a fall in unit costs almost always leads to a sharp rise in consumption, a phenomenon known in the literature as the Jevons paradox.

In the context of artificial intelligence, this phenomenon takes the form of a shift from simple chatbots to autonomous AI agents. While a classic text assistant is content with a few hundred tokens to answer a question, modern agent systems operate on a completely different scale.

AI agents are not just passive recipients of commands. They are systems that plan, verify their own errors, use external tools and conduct multi-step reasoning in feedback loops. Every such operation, every moment of the machine’s ‘thinking’, generates a demand for data. It is estimated that the execution of a complex business task by an autonomous agent can consume between five and 30 times as many tokens as a single interaction with a generative model.
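
To see where that multiplier comes from, the sketch below simulates a simple agent loop; the step structure, token counts and example task are illustrative assumptions rather than measurements, but they show how quickly iterative planning, tool use and self-verification compound.

```python
# Illustrative sketch of why an agentic loop consumes far more tokens than a single
# chat turn. Step structure and token counts are assumptions, not measured figures.

def run_agent(task: str, max_steps: int = 5) -> int:
    """Simulate plan -> act -> verify iterations and return total tokens consumed."""
    total_tokens = 0
    for _ in range(max_steps):
        total_tokens += 1_500   # planning / reasoning call
        total_tokens += 800     # tool call plus its result re-read as context
        total_tokens += 700     # self-verification pass over the intermediate answer
        # A real agent would stop once the task is solved; here we assume it
        # needs every step.
    return total_tokens

single_chat_answer = 600                            # one assistant reply to a question
agent_total = run_agent("reconcile Q3 invoices")
print(agent_total, round(agent_total / single_chat_answer))  # 15000 tokens, 25x a reply
```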

As a result, although the price per thousand tokens will become marginal, their total consumption within an organisation will increase exponentially, potentially leading to a situation where total spending on AI in 2030 is higher than in the years when the technology was considered a luxury.
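
A back-of-envelope model makes the paradox concrete; every figure below is an assumption chosen for illustration, not a Gartner projection.

```python
# Back-of-envelope model of the cheap token paradox.
# Every figure below is an illustrative assumption, not a forecast.

price_2025 = 10.00   # assumed USD per million tokens in 2025
price_2030 = 1.00    # unit price 90% lower by 2030

tokens_per_chat_task = 1_000      # a simple assistant answer
tokens_per_agent_task = 20_000    # mid-range of the 5-30x agentic multiplier

tasks_per_month_2025 = 50_000     # assumed volume of assistant interactions
tasks_per_month_2030 = 500_000    # assumed growth as agents automate whole workflows

spend_2025 = tasks_per_month_2025 * tokens_per_chat_task / 1e6 * price_2025
spend_2030 = tasks_per_month_2030 * tokens_per_agent_task / 1e6 * price_2030

print(f"2025 monthly spend: ${spend_2025:,.0f}")   # $500
print(f"2030 monthly spend: ${spend_2030:,.0f}")   # $10,000 despite the 90% price cut
```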

Another aspect that requires the attention of leaders is the distinction between widely available mass intelligence and so-called ‘Frontier Intelligence’. Gartner rightly points out that while the cost of basic reasoning is moving towards zero, access to the most powerful frontier models will remain a scarce and expensive resource.

There is a significant operational risk here: many organisations today mask inefficiencies in their IT architecture by taking advantage of temporarily cheap promotional resources from cloud providers. Companies that do not optimise their systems at the design level may find that operating at agentic scale remains financially out of reach.

Rather than basing the entire infrastructure on a single, most powerful engine, organisations need to learn precise task routing. Orchestration, in which routine, repetitive, high-frequency processes are delegated to small, specialised domain models, is becoming the key to efficiency.

They work faster, more cheaply and often more accurately within narrow areas of competence. Frontier-class models, with the highest inference cost, should be reserved only for high-margin tasks of enormous complexity, where depth of reasoning has a direct bearing on strategic market advantage.
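
A minimal routing sketch illustrates the idea; the model names, thresholds and complexity scores are hypothetical placeholders, since real orchestration layers rely on far richer signals.

```python
# Minimal sketch of routing tasks between model tiers.
# Model names, thresholds and the complexity scores are hypothetical.

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: float      # 0.0 (routine) to 1.0 (deep multi-step reasoning)
    margin_impact: float   # 0.0 to 1.0: business value of getting the answer right

SMALL_DOMAIN_MODEL = "domain-model-small"   # cheap, fast, narrow competence
FRONTIER_MODEL = "frontier-model-xl"        # expensive, reserved for high-stakes work

def route(task: Task) -> str:
    """Send routine, high-frequency work to the small model and escalate only
    complex, high-margin tasks to the frontier tier."""
    if task.complexity > 0.8 and task.margin_impact > 0.7:
        return FRONTIER_MODEL
    return SMALL_DOMAIN_MODEL

print(route(Task("classify an incoming invoice", 0.2, 0.1)))    # domain-model-small
print(route(Task("draft a market-entry strategy", 0.9, 0.9)))   # frontier-model-xl
```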

In the new economic reality, success will not be measured by access to technology, but by the ability to use it economically. The traditional approach focused on purchase price is giving way to analysing the total cost of achieving a result, or ‘Cost per Outcome’. This is a fundamental shift, forcing executives to stop thinking of AI as an office tool and start treating it as a dynamic energy resource for the enterprise.
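
One way to operationalise that shift is to divide total token spend by the number of results actually delivered; the sketch below uses assumed figures to show how a nominally cheaper model can end up costing more per outcome once retries and longer prompts are counted.

```python
# Sketch of a cost-per-outcome comparison; all figures are assumptions.

def cost_per_outcome(tokens_used: int, price_per_million: float,
                     outcomes_delivered: int) -> float:
    """Total token spend divided by the number of outcomes actually delivered."""
    return tokens_used / 1e6 * price_per_million / outcomes_delivered

# A cheap model that needs retries and longer prompts to finish a task...
cheap = cost_per_outcome(tokens_used=10_000_000, price_per_million=1.0,
                         outcomes_delivered=40)
# ...can cost more per result than a pricier model that succeeds more often.
premium = cost_per_outcome(tokens_used=1_000_000, price_per_million=8.0,
                           outcomes_delivered=50)

print(f"cheap model:   ${cheap:.2f} per outcome")    # $0.25
print(f"premium model: ${premium:.2f} per outcome")  # $0.16
```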

The token price deflation projected for 2030 is a real phenomenon, but interpreting it as a simple way to save money is a high-risk mistake. The real democratisation of AI is not about lower prices, but about enabling machines to perform tasks that previously required human involvement.

In this new game, the winners will be those who, instead of passively waiting for cheaper invoices from technology providers, are already building flexible, differentiated architectures capable of intelligently managing their appetite for data. The future of AI in business is not just a matter of engineering, but above all of sophisticated economic strategy.
