Agentic AI looks warm and inviting, but it comes with serious security issues.
AI agents promise to provide automation of processes, and autonomous detection, triaging and reaction / remediation to external threats and attacks – the ultimate process and security automation tool operating at machine speed. Organizations are rushing to adopt agentic AI; but it brings a new attack surface that is still not widely understood.
(‘Agentic AI’ and ‘AI agents’ are not strictly synonymous but are both used interchangeably here to imply ‘agentic AI’.)

Agentic AI is the term for autonomous AI agents designed to complete complex tasks by mimicking human decision making processes through interaction with external systems (data sources for input, and other systems for output). “They are ideal for cybersecurity application,” says Nicole Carignan, SVP security and AI strategy, and field CISO at Darktrace.
“Agentic systems use a combination of various AI or machine learning techniques to ingest data from a variety of sources, analyze the data, prepare a plan of action (autonomous or recommended), and take action,” she explains.
When the agent simply delivers recommendations, the result is an AI-assisted human. When it works autonomously, it could provide automated security at machine speed. “In cybersecurity, these systems can be used to autonomously monitor network traffic, identify unusual patterns that might indicate potential threats, and take autonomous actions to respond to possible attacks. Agentic systems can also handle incident response tasks, such as isolating affected systems, patching vulnerabilities, as well as triaging alerts in a SOC.”
But she adds that the advantages also come with challenges, specifically noting inherited bias, possibility of hallucinations, technical complexity, and susceptibility to external manipulation through malicious prompt injections. “These vulnerabilities introduce new attack surfaces that traditional defenses may not cover.”
The new agentic AI attack surface
Many of agentic AI’s security issues come from three areas: the extent of autonomy granted to them, their reach, and the common use of an LLM as the reasoning engine. LLMs are not free of hallucinations and are still susceptible to manipulation by malicious prompt injection.
The primary purpose of the agent is to automate human activity based on AI reasoning, so it is granted extensive freedom of action. AI reasoning is based on learning, so the agent is granted widespread access to existing tools and applications to increase its pool of understanding. And (usually) an underlying LLM is used for automated decision-making based on its own knowledge and that gathered by the agent. The agent can then act on the situation without human involvement.
Automated action from products is nothing new. Machine learning security tools have existed for many years – able, for example, to isolate endpoints and shut down processes whenever an active threat is detected. For the most part, security professionals have been wary of this autonomy and have set the ML’s control to ‘alert only’. They have insisted on having ‘a human in the loop’. Agentic AI threatens to remove or minimize this possibility.
Consider the driverless car. In some places at some times, passengers have no option but to use them. Once inside, there is no manual override – the passenger can do nothing but trust the installed software. Recent incidents suggest that driverless cars are not yet problem free. Agentic AI is currently on the same path.
In June 2025, Aim Labs discovered a zero-click vulnerability (EchoLeak, CVE-2025-32711) against an AI agent: Microsoft’s Copilot. Copilot is a productivity assistant designed to enhance users’ interaction with Microsoft applications, and has widespread access to installed apps, including emails.
The attack involves sending the target victim a ‘useful’ email with disguised malicious prompts included. The target needs neither read nor open the email – but Copilot can. If the agent decides the content is useful to a current interaction with its user, it will consume the content including the malicious prompts – and will act in accordance with those prompts. It may now be instructed to quietly gather and silently exfiltrate sensitive user data.
This vulnerability has been fixed by Microsoft, but it is an example of how agentic AI’s autonomy, reach, and LLM manipulation can be combined to turn a helpful agent into a new threat, without human oversight and potentially no awareness from the victim.
MCP
MCP is the Model Context Protocol introduced by Anthropic in November 2024. It’s an open standard designed to help AI models integrate with external tools and data sources. It is already widely and increasingly being adopted for use with agentic AI – but it is not without problems.
Greg Notch, CSO at Expel, has a fundamental warning: “The term ‘agentic AI’ is a distracting misnomer from a security perspective. What’s often mislabeled as ‘agentic’ is better described as orchestration… Focusing on ‘agency’ can distract security efforts from the true vulnerabilities, which lie in the complex interconnections and external tools an orchestrated AI system uses.” If agentic AI is an orchestra, then MCP is the conductor.

It is a protocol, not a vulnerability – but its complexity in use can lead to the introduction of vulnerabilities and misconfigurations with far reaching effects. For example, as described by Adversa.ai, Asana introduced an MCP server on May 1, 2025. On June 4, it discovered flaws and shut down the server after a 34-day silent exposure window.
While there is no evidence of any malicious activity (MCP is as new to hackers as it is to legitimate business), around 1,000 of Asana’s 130,000 enterprise customers were exposed to cross-organizational data exposure fundamentally caused by a ‘confused deputy bug’. Asana’s remediation costs are estimated at $7.5 million and future compliance implications remain a possibility.
An example MCP attack can be found in research discussed by Invariant Labs and other researchers during May 2025, involving GitHub and the GitHub MCP.
Here, an attacker could prepare the ground by posting new content into a public repository. It would contain a hidden but malicious prompt. An AI agent could subsequently and legitimately connect to GitHub’s MCP with a benign request such as to check for open issues in repositories. MCP would facilitate this since that is its purpose. But when the agent checks the seeded repository, it would receive the hidden malicious prompt and react accordingly. With these new instructions, it could be directed to access and exfiltrate sensitive data from private repositories that the attacker would not normally be able to access.
One month later, June 25, 2025, Backslash revealed details of research into MCP. It examined around 7,000 publicly available locally executed MCPs and found – for example – hundreds of servers explicitly bound to all network interfaces (0.0.0.0) and consequently accessible to anyone on the same local network. It’s “like leaving your laptop open – and unlocked for everyone in the room,” it reported.
Dozens of MCPs also allow arbitrary command execution; so, if the MCP server is compromised, you could lose your own operating system. Where these two conditions exist on the same server (it does occur) it’s game over to any attacker with access to the local network.
Learn More About Securing AI at SecurityWeek’s AI Risk Summit – August 19-20, 2025 at the Ritz-Carlton, Half Moon Bay
A further risk found by BackSlash (but at this time of writing, still waiting responsible disclosure with the underlying LLM provider), affects tens of thousands of users through a silent connection between the MCP and the LLM without proper boundaries. This can provide a pathway for prompt injections that could result in misleading data or agent logic rerouting.
These are not the only MCP issues. The Vulnerable MCP Project maintains a list of all known vulnerabilities – and it’s longer than you might expect. Adversa.ai has also published a list of the 12 most common root-cause MCP security issues in MCP Security Issues and how to fix them.
Opet’s open letter and the authorization issue
Patrick Opet, CISO at JPMorgan Chase had foreseen problems in his open letter to third-party suppliers, published at the end of April 2025. It is primarily a call for improved security by design in all products but includes a specific agentic AI reference. He is concerned about the effect of new developments on authentication and authorization.
“As a generic example,” he wrote, “an AI-driven calendar optimization service integrating directly into corporate email systems through ‘read only roles’ and ‘authentication tokens’ can no doubt boost productivity when functioning correctly. Yet, if compromised, this direct integration grants attackers unprecedented access to confidential data and critical internal communications.”

In practice, he continued, “These integration models collapse authentication (verifying identity) and authorization (granting permissions) into overly simplified interactions, effectively creating single-factor explicit trust between systems on the internet and private internal resources.”
Oded Hareven, co-founder and CEO of Akeyless, agrees. “Agentic AI introduces new attack surfaces due to its ability to execute tasks independently, especially across multiple systems via APIs. Unlike traditional systems, these agents can issue commands, generate infrastructure changes, or move data – all without human verification,” he says.
“The AI’s chaining of actions across services also makes authorization boundaries fuzzier, increasing the risk of unintended consequences. The use of static or overly permissive credentials, combined with minimal oversight, amplifies the blast radius of a compromise.”
Minimizing the new attack surface
Curate the agents
“I think the first step is to curate the type of agents that you’re working with,” suggests Yoav Landman, CTO and co-founder of JFrog. It’s not just a case of being careful what you initially choose but also being certain that the newest and latest version doesn’t introduce errors, new threats or unexpected actions.

It is possible that the OSS building blocks used in the development of in-house agents may be malicious from the get-go (through fake name typosquatting, such as OIIama vs Ollama), while fake upgrades could replace genuine versions through automatic build tool grabs.
While this is good advice, the problem is it butts up against the economics of automation. There is such a rush to automate through AI agents that business leaders are pressuring IT and security to implement and use agents at speed lest they lose competitive edge to automated competitors. And wherever there is haste, there is the potential for cutting corners and making mistakes – and ’No’ remains a difficult response to superiors.
Man in the loop
A common perception is that use of agentic AI will be safe given human oversight. This is the ‘human in the loop’ argument.
“AI requires human oversight, context, and course correction; otherwise, it simply accelerates bad decisions,” says Chad Cragle, CISO at Deepwatch.
“It is important that these AI agents are monitored and have the ability to rollback any tasks they execute,” suggests Kris Bondi, CEO and co-founder of Mimoto. “There must be a way for a human to be inserted into a process if needed.”
“In cybersecurity, it’s well known but hardly discussed that eventually you have to trust someone,” says Tim Youngblood.
The question whether a human can make better decisions than a well-functioning AI, or even detect bad decisions made by a poorly functioning AI is, however, debatable. And having a salaried person or persons monitoring every AI action flies in the face of automation: why have an autonomous tool if you won’t allow it to be autonomous? While organizations may start with the idea of having humans in the loop, the pressure of economics will make this increasingly difficult to justify. Especially since the human in the loop may be just as fallible, if not more so, than the AI agent.
Oversight, by definition, implies the ability to see over or into something. If an agentic AI has been compromised and manipulated by malicious prompt injections, an overseer is unlikely to have visibility. “If an agent is telling you one thing, the overseer could okay it while behind the scenes the agent is doing something completely different, or additional, or nefarious – then of course the human in the loop is going to be fooled,” comments Landman.
Notch, however, believes that a human in the loop is a serious and probably necessary solution. “The largest gains so far are in AI-augmented humans rather than autonomous AI acting alone.” It’s just a tool.
“Humans are needed to make sure the AI models stay fit for purpose. AI is still a tool which needs calibration, adjustments, and inputs to ensure it works properly and as expected. It’s not a technology that can be turned loose to handle security all on its own.”
Guardrails
Like ‘human in the loop’, strong guardrails are often claimed to be the route to safe AI agents. The human in the loop is itself a guardrail, and the need for additional guardrails is well recognized.
“We hear about AI misclassifying threats, over-responding to benign events, and struggling with edge cases. The lesson is clear: agentic AI requires strong guardrails,” suggests Cragle.
“There are many startups and many initiatives trying to provide some sort of guardrails or protection, whether at runtime or through automated red teaming on agents,” says Landman. “But it’s nascent and very hard; so, it’s still an unsolved problem.”
David Benas, principal security consultant at Black Duck, comments, “There’s nothing inherently unique about securing agentic AI compared to a base gen-AI system, but the scope of problems is magnified given its autonomous access to the ‘world’ around them. In the near term, strict guardrails need to be put on the functionality of agentic AI, to ensure that the scope and impact of issues arising from their failure/breach/security mishaps are limited and manageable.”
Typical guardrails could include contextual isolation to prevent confused deputy attacks; recognition and redaction of PII and sensitive data to prevent possible compliance issues; employing strictly defined APIs, MFA and least privilege for access to the agent to control access and authorization (Hareven suggests, “enterprises must enforce zero trust principles for machine identities”); invoking explicit human approval before high stakes actions (a human in the loop), and more.
But it is worth considering that more than 2 1/2 years after the initial ChatGPT release, guardrails have failed to prevent hallucinations and jailbreaks in the major LLMs. With LLMs an important part of agentic AI, there are no guardrails that can guarantee against malicious prompt injection leading the agent to silently perform the hacker’s instructions when obeying what it is now told to do outside of the user’s visibility.
Nevertheless, Notch is ultimately positive. “Guardrails can take the form of restricting what data the AI has access to (although this reduces its capability). It can also take the form of monitoring inputs and outputs. Another category of constraints and guardrails are restriction of prompts and inputs. None of them are perfect – it’s all very early days for agentic AI – but I expect we’ll see rapid improvements, much like other security controls have evolved over the past few years.”
Less haste, more planning
To paraphrase and reverse Presley’s old rock demand, implementing agentic AI requires ‘a little less haste, a little more planning, please’.
Notch suggests that part of that planning should include a data classification program. “Agentic AI relies on whatever data it can access to produce results, so it’s time to get really clear about what it can see and how it’s being used. If you don’t already have a data classification and governance program in place, get one.”
Hareven adds, “Don’t rush into broad deployment – secure usage is a competitive advantage, not a bottleneck. Assign cross-functional ownership between security, engineering, and AI teams to continually assess risks. Prioritize governance over speed to scale agentic AI responsibly.”
The need for speed is arguably a link weaker than the end user.
Will Agentic AI ever be safe?
Agentic AI is like King Richard III: “Deform’d, unfinish’d, sent before my time into this breathing world, scarce half made up…” But today, beyond the reach of Shakespear’s Tudor propaganda, the modern scholarly view of Richard is that he was a capable administrator, military leader, and progressive legal reformer. Context is vital in all things.
Our current thoughts on agentic AI will change as its context evolves with greater understanding, use, and controls. Today, as with most new technology, it can be described as ‘the wild west’. This may be true as we write – but the original wild west was eventually tamed through maturity and effective rule enforcement. The same will happen with agentic AI – eventually. Meanwhile, we must understand and mitigate the lawlessness of this new attack surface as best we can.
Learn More About Securing AI at SecurityWeek’s AI Risk Summit – August 19-20, 2025 at the Ritz-Carlton, Half Moon Bay
Related: Beyond GenAI: Why Agentic AI Was the Real Conversation at RSA 2025
Related: How Hackers Manipulate Agentic AI With Prompt Engineering
Related: How Agentic AI will be Weaponized for Social Engineering Attacks
Related: Mitigating AI Threats: Bridging the Gap Between AI and Legacy Security

