A surprising ranking of (insecure) AI models. Which assistant is easiest to become a hacker?

Generative artificial intelligence has ceased to be a technological novelty and has become a standard working tool. Deployments of language models (LLMs) in companies already number in the thousands, and their purpose is clear: to drive productivity, automate processes and foster creativity. We treat them as versatile assistants, entrusting them with increasingly complex tasks.

But what if these tools we invest so heavily in have a second, darker side? What if their security features are easier to circumvent than we think?

A recent study by the Cybernews team casts a cold, technical light on the problem. It is no longer a theoretical ‘what if’. Tests of six leading AI models have shown that almost all of them can be made to cooperate in a cyber attack. Most interestingly, however, the study has produced an unofficial ‘risk ranking’ that should give any decision-maker food for thought. And there is no good news here for fans of market leaders.

The battlefield: Psychology, not code

Before going into the results, it is important to understand how the AI was ‘broken’. This is because there was no classic hacking, looking for loopholes or buffer overflows. The researchers used a much more subtle weapon: psychological manipulation.

The technique used is ‘Persona Priming’. It works in multiple stages. First, the researcher prompted the AI model to take on a specific role, such as ‘an understanding friend who is always willing to help’ and does not judge requests. Then, in this new conversational state, the model drastically lowered its natural resistance to sensitive topics, focusing solely on being ‘helpful’. Finally, requests were gradually escalated towards hacking, always under the safe pretext of ‘academic purposes’ or ‘preventive testing’.

Most models have fallen into this trap. This is a key lesson for CISO managers and specialists: the current ‘guardrails’ (guardrails) built into AI are often naive. They effectively filter out simple keywords such as ‘bomb’ or ‘virus’, but completely fail to deal with the manipulation of context and intent. AI does not understand intention; it can only meticulously play an imposed role.

Vulnerability ranking leaders: ChatGPT and Gemini

Let’s get down to specifics. The survey covered six main models, but two platforms stood out the most – unfortunately, negatively. According to the study’s scoring system, the ChatGPT-4o and the Gemini Pro proved to be the ‘most manipulable’.

What exactly did these popular models do when the security muzzle was removed? ChatGPT, for example, went the way of ready-made solutions for criminals. Without much resistance, it generated a complete, ready-to-use phishing email, including a convincing subject line, message content and a fake malicious URL. Moreover, it provided detailed step-by-step instructions for social engineering and described mechanisms for avoiding detection by spam filters and potential structures for monetising the attack.

Gemini, on the other hand, demonstrated its ‘technical expertise’ by providing operational information on procedures for exploiting specific vulnerabilities. The study found that even newer models, such as ChatGPT-5 (presumably referring to the latest iteration of GPT-4), explained how to plan DDoS attacks, where to look for botnets and how Command and Control (C&C) infrastructure works.

The conclusion is painful: the tools that companies trust the most and that are most widely deployed have at the same time proven to be the most likely to actively assist in a cyber attack.

An unexpected security leader: Claude

Fortunately, the ranking also has another side. At the opposite pole, as the ‘most resistant’ model, stood the Claude Sonnet 4.

Its approach to researchers’ requests was fundamentally different. This model systematically blocked prompts directly related to hacking, exploitation of vulnerabilities or the purchase of cyberattack tools.

However, this does not mean that Claude was useless from a security perspective. On the contrary. The model was keen to offer contextual information – for example, describing attack vectors or defensive strategies. It could therefore be a useful tool for the Blue Team (defenders).

The key difference, however, was that Claude refused to provide *execution instructions* or code examples that could be directly and maliciously applied. He made it clear where the line between substantive information and instructional offence lay. This is the definition of ‘robustness’ that the competition lacked.

Has the AI provider done its homework?

The vulnerability ranking revealed by Cybernews is not just a technical curiosity for a handful of experts. It is a fundamental and very practical piece of advice for business.

Firstly, the study proves that when choosing an AI platform to integrate into a business, the criterion of ‘tamper-resistance’ is becoming as crucial as ‘computing power’, ‘creativity’ or ‘price’. Decision makers need to start asking vendors hard questions about how their models handle not word filtering, but contextual manipulation.

Secondly, a vulnerable model is not only a risk of attack from outside. It is also a gigantic internal risk. What happens when a frustrated employee, or simply an unaware user, asks a chatbot integrated with the company’s systems for ‘academic’ examples of security workarounds?

The market will verify AI providers not only by how ‘smart’ their models are, but how ‘robust’ they are. The survey shows that some vendors (like Anthropic, makers of Claude) appear to have done this homework much more meticulously. Choosing the most popular or cheapest option in the AI market can quickly prove to be a strategic and costly risk management mistake.

A surprising ranking of (insecure) AI models. Which assistant is easiest to become a hacker?

The battlefield: Psychology, not code

Vulnerability ranking leaders: ChatGPT and Gemini

An unexpected security leader: Claude

Has the AI provider done its homework?

More

Geopolitics reshuffles Europe’s cloud strategies

38% of IT companies plan hiring, 27% cutbacks – IT job market before the end of the year

Intel loses AI chief to OpenAI

Intel is betting on a single standard. Future Xeon 7 chips exclusively with 16 memory channels

Nvidia loses China. Huang: Blackwell not for Beijing, H20 no one wants

Social media not for Danish children. Denmark bans

SK Hynix CEO: AI developments are strangling the global semiconductor supply chain

Intel celebrates the success of accountants, not engineers