Artificial Intelligence

GPT-5 Has a Vulnerability: Its Router Can Send You to Older, Less Safe Models

Instead of GPT-5 Pro, your query could be quietly redirected to an older, weaker model, opening the door to jailbreaks, hallucinations, and unsafe outputs. The post GPT-5 Has a Vulnerability: Its Router Can Send You to Older, Less Safe Models appeared first on SecurityWeek.

GPT-5 has a Vulnerability: It May Not be GPT-5 Answering Your Call

The new GPT-5 is easy to jailbreak. Researchers have discovered the cause – an SSFR-like flaw in its internal routing mechanism.

When you ask GPT-5 a question, the answer may not come from GPT-5. The model includes an initial router that parses the prompt and decides which of the various GPT models to query. It may be the GPT-5 Pro you expect, but it could equally be GPT 3.5, GPT-4o, GPT-5-mini, or GPT-5-nano.

The reasoning behind this variability in the source of the response is probably to balance the LLM’s efficiency (by using faster, lighter and possibly more focused models on the simpler queries) and cost (GPT-5’s strong reasoning capabilities make it very expensive to run). Researchers at Adversa AI have estimated that this re-routing could be saving OpenAI up to $1.86 billion per year. But the process is opaque.

Worse, the researchers at Adversa have discovered and explained that this internal routing can be manipulated by the user to make GPT-5 redirect the query to the user’s model of choice by including specific ‘trigger’ phrases in the prompt.

Adversa has named, or perhaps more accurately described the vulnerability PROMISQROUTE, which stands for ‘Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion’. “It’s an evasion attack on the router,” explains Alex Polyakov (co-founder and CEO at Adversa AI). “We manipulate the decision-making process, which is fairly simple, deciding which model should handle the request.”

The concept of ‘routing’ to different models is not unique to OpenAI, but other providers usually allow the user to select which model to use. It is, however, appearing more automatically in some agentic AI architectures, where one model decides how to pass a request to another.

The GPT-5 vulnerability was discovered while Adversa was benchmarking the model’s refusal mechanism. Some prompts produced unexplainable inconsistencies in the replies – leading the researchers to consider that different models were responding. They discovered that some old jailbreaks had started working again, and that a specific reference in the prompt to an older model could allow the jailbreak to work, even if GPT-5 alone would have prevented it.

This alone could have detrimental effects without any human involvement – hallucinations, for example. “Different models have different tendencies, strengths, and weaknesses. By redirecting a request to a less capable or less aligned model, the likelihood of hallucinations or unsafe outputs can increase,” explains Polyakov.

However, the real danger comes when a malicious hacker can trigger the router to query a model less safe than GPT-5 Pro into jailbreaking GPT-5 Pro. “Suppose someone tries to use a jailbreak prompt on the latest GPT-5, but it fails because of GPT-5’s stronger safeguards or reasoning, which more often than not will decline a malicious request. An attacker could prepend a simple instruction that tricks the router into sending their request to an older, more vulnerable model. The jailbreak that previously didn’t work might then succeed, because it’s executed on that older model.”

GPT-5 Pro on its own is stronger than its predecessors, but this vulnerability in the routing mechanism makes it only as strong as its weakest predecessor.

Solving the problem would be simple by eliminating the automated routing to weaker models, but that is not an attractive business proposal. Responses from GPT-5 would be slower, making the model less attractive to users addicted to the speed of earlier models, while the cost of running GPT-5 on every query would infringe on OpenAI’s profit margins.

But at least, suggests Polyakov. “GPT-5 should be done more securely, either by having a guardrail before the router making the router more secure; by making all models secure and safe, not just the most complex reasoning one – or ideally doing both of the above.”

Latest News

Publisher