Why are AI agents becoming the target of cyber attacks? Trend overview 2026

The shift from passive language models to autonomous AI agents radically expands the attack surface through vulnerability to Indirect Prompt Injection, where malicious instructions are retrieved directly from the external content being analyzed. The scale of this phenomenon, documented by a 32% quarterly increase in incidents, is shifting cybersecurity priorities, making semantic resilience a key element in protecting the digital assets of the modern enterprise.

6 Min Read
cyberbezpieczenstwo

Over the past eighteen months, the enterprise sector has moved from a fascination with generative artificial intelligence to a phase of actively implementing it into operational processes. A key trend in this evolution is the shift from passive language models (LLMs) to AI agents – autonomous systems capable not only of generating text but also of performing tasks: writing code, managing email communications, calling APIs or authorising financial transactions. With this agility, however, comes a critical new category of threats: Indirect Prompt Injection (IPI). Recent data from reports by Google and Forcepoint shed new light on the scale and sophistication of these attacks, suggesting that agent systems security will become one of the biggest challenges for chief information security officers (CISOs) in the coming years.

IPI mechanism: Data as instructions

Traditional prompt injection attacks relied on direct manipulation of the model by the user (e.g. attempting to ‘jailbreak’ a bot by giving it the command to ignore security). Indirect Prompt Injection is a much more insidious phenomenon. It involves inserting malicious instructions into content that the AI agent processes as input – this could be web pages, PDF documents, emails or code repositories.

The problem lies in the very architecture of current LLM models, which cannot absolutely separate system instructions (issued by the tool developer) from external data. When an AI agent analyses a web page in search of information, it may come across hidden text, which the model will interpret as a new overarching command. As a result, the attacker takes control of the agent’s logic, instructing it to, for example, send sensitive data to an external server or perform a destructive operation on the user’s file system.

Analysis of market trends

Google Security Research researchers, analysing CommonCrawl resources, point to an alarming trend. Between November 2025 and February 2026, there was a 32 per cent increase in the number of detected malicious injection attempts in publicly accessible web resources. This relatively short time frame demonstrates the dynamism with which the criminal community is adapting to new technologies.

From a market perspective, Google’s observation on cost-benefit calculus is key. Until recently, IPI attacks were considered the realm of academic research – they were difficult to implement and often failed due to the instability of the results generated by AI. Now, with the increased reliability and agility of agents, these attacks are becoming ‘viable’. AI’s ability to autonomously call external tools (tool calling) means that a successful injection of instructions has an immediate and measurable financial or operational impact.

The Google study allowed the current IPI trials to be categorised into five groups:

  1. Harmless jokes: Attempts to change the tone of an agent’s response.
  2. Helpful tips: Suggesting preferential answers to the model (often on the edge of ethics).
  3. Optimisation for AI (AI-SEO):Hidden phrases to position products in assistants’ responses.
  4. Deterring agents: Instructions prohibiting AI from indexing or summarising a particular page.
  5. Malicious attacks: Data exfiltration and sabotage (deletion of files, destruction of backups).

Although the latter are often at an experimental stage at present, their increasing complexity suggests that it is only a matter of time before they enter the phase of mass attacks.

From coding assistants to financial transactions

The Forcepoint report provides concrete evidence of how IPI manifests itself in professional software and financial tools. Experts have identified ten verified indicators of attacks targeting popular tools such as GitHub Copilot, Cursor and Claude Code.

The attack scenario is mundane: a programmer uses an AI agent to analyse a library or documentation on an external site. This site contains a hidden AI instruction. When the agent ‘reads’ the site, it is instructed to execute a command in the terminal that destroys local backups. Since the agent has permission to operate on the file system (which is essential in a programmer’s job), the command can be executed without additional verification.

Even more dangerous are attempts at financial fraud. Forcepoint points to cases where complete transaction instructions are sewn into web content, e.g. PayPal.me links with a predefined amount along with step-by-step instructions on how the agent is to finalise the payment. In systems where AI has access to digital wallets or corporate payment systems, the risk of capital loss becomes immediate.

The paradox of detection and the challenges for business

One of the most worrying findings from the Forcepoint report is the so-called detection paradox. The phrases and keywords used by attackers to inject hints are identical to the terminology the cyber security community uses to describe and analyse these threats. This renders simple filters based on word blacklists ineffective – either blocking legitimate expert communications or letting intelligently worded attacks through.

Share This Article