The promise of agentic AI is compelling: increased operational speed, increased automation, and lower operational costs. But have we ever paused to seriously ask the question: can we trust this thing?
Agentic AI is a class of large language model (LLM) AI that can respond to inputs, set its own goals, and interact with other tools to achieve those goals – without necessarily requiring human intervention. Such tools are generally built on top of major generative AI (gen-AI) models typified by ChatGPT; so, before asking if we can trust agentic AI, we should ask if we can trust gen-AI.
And here’s our first problem: nobody really understands how gen-AI works, not even the scientists and engineers who developed it. The issue is described by Neil Johnson, a Professor of Physics at George Washington University: “I’ll try this – oh, that didn’t work. So, I’ll try this – oh, that didn’t work. Oh, this works. Okay, I’ll do that, and then I’ll build on that, and then I’ll build on that, and I’ll go through this iterative process and just make it better and better and better. Why would I trust that it’s not going to go wrong when all I’m looking at is the net effect of the things that did work?”
From observation, we know that gen-AI doesn’t always work as intended. It ‘hallucinates’. It is designed to provide an answer, but it never knows whether that answer is right or wrong; it has no concept of truth or morality or ethics. It could be wrong for many reasons: bias or flat-out error in the data on which it is trained, bias or error (flat-out or subtle) in its internal algorithms, bias in the user input to which it responds…
The most recent example of gen-AI going wrong can be found in Grok. For a short time, it tended to pivot from barely relevant prompts into references to a supposed white farmer genocide in South Africa, offering no evidence (there is none).
“Generative AI often speaks with confidence, even when it’s wrong. This is because it’s trained to predict likely next words, not ground truth. It doesn’t know it’s hallucinating – there’s no built-in epistemic humility,” explains Alex Polyakov, co-founder and CEO at Adversa AI.
Musk has said the problem was caused by an unauthorized modification to the program. How, why, or by whom is not explained. The real issue, however, is not that it shouldn’t have happened but that it could happen at all; and if it could happen here, it could happen elsewhere and in other LLMs.
The three approaches: Polyakov, Kolochenko, and Johnson
This potential to go wrong is then accentuated in the agentic AI extension of LLMs. “These systems take actions in the real world – browsing, emailing, coding – based on goals they interpret from prompts. But they don’t deeply understand context, safety boundaries, or when they’re going off the rails,” continues Polyakov. “You’re essentially giving a clever intern the keys to production… blindfolded and without supervision.”
The problem with LLMs is that they mostly work but sometimes don’t – and we cannot easily tell which is happening. We don’t know when or why they are right or wrong. The danger in agentic AI is that a wrong response can become an autonomous, unsupervised and potentially damaging action. Yet agentic AI is blossoming everywhere because we assume it works correctly and, anyway, it is saving us so much money.
The result, according to Ilia Kolochenko, CEO at ImmuniWeb and adjunct professor of cyber law and cybersecurity, is an overheating market with vast amounts being spent on faith rather than sound logic. He sees this as an AI bubble mirroring, and likely to follow the same path as, the dot-com bubble that burst in March 2000.

Unaware of the longer-term danger, and intent on maximizing the short-term benefit, we focus our efforts on remediating the symptoms of the weakness rather than abandoning the technology. For agentic AI, this largely revolves around applying human oversight and intervention to a system designed to be automatic – a contradiction in terms that is almost certainly doomed to fail. We struggle to ensure security by design in software development, and we cannot prevent logic flaws in code. A primary cause is pressure from business leadership to complete tasks as fast and as cheaply as possible – that pressure will be repeated in human oversight of, and intervention in, agentic AI implementations; we’ll take shortcuts.
That doesn’t mean there is no good advice on using AI despite its fallibilities. Polyakov suggests that we can trust gen-AI “as a creative co-pilot, not a source of facts. It’s like a brainstorming partner: great for first drafts, useless as a final editor unless cross-checked.” Also, he adds, “when paired with retrieval augmented generation (RAG) models, its grounding improves.” While this has some truth, we should remember that Polyakov’s comment applies to his ‘brainstorming partner’ usage – it doesn’t solve the LLM problems in general.
Kolochenko accepts that RAG provides a slight improvement, but says, “I don’t think it will be the ultimate solution. When you do augmentation, you still need data; and you will never have perfect data. So, it may bring improvement in terms of quality, and it may reduce some problems – but I don’t think it will prevent hallucinations, discrimination, bias, and whatever else we already have in AI.”
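For readers unfamiliar with the mechanism both men are describing, the sketch below shows the basic RAG pattern: retrieve relevant passages first, then ask the model to answer only from that supplied context. It is a toy illustration in Python; the keyword-overlap retrieval and the document snippets are invented stand-ins for a real vector database and model API.

```python
# Minimal, illustrative sketch of retrieval-augmented generation (RAG).
# The retrieval scoring and documents are hypothetical stand-ins for a
# real vector store and LLM call.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved passages so the model answers from supplied context."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The dot-com bubble burst in March 2000.",
    "Agentic AI systems act on goals interpreted from prompts.",
]
print(build_grounded_prompt("When did the dot-com bubble burst?", docs))
```

As Kolochenko notes, the grounding is only as good as the retrieved data: the pattern narrows the model’s answer space, but it cannot guarantee the passages themselves are correct or complete.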
Polyakov’s advice for agentic AI usage is based more on reducing our reliance rather than increasing our oversight. We can have limited trust in agentic AI, he suggests, “In controlled environments, like simulations or sandboxed productivity tools (for example, scheduling meetings, summarizing documents), where human review is already always in the loop. They are also good in coding because for code to be ‘right’ it should be compilable. So, if the code can be compiled and executed, it most probably can be trusted to work.”
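Polyakov’s ‘compilable’ test can be automated as a cheap first gate. The sketch below is a minimal Python illustration of the idea, using the language’s built-in compile() check; passing it only proves the generated code parses, not that it is correct, and the snippet and function names are purely illustrative.

```python
# A minimal sketch of the 'compilable before trusted' idea: reject generated
# Python that does not even parse. Passing this check is a weak signal of
# quality, not proof of correctness.

def passes_compile_check(generated_code: str) -> bool:
    """Return True if the generated source at least compiles to bytecode."""
    try:
        compile(generated_code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

snippet = "def add(a, b):\n    return a + b\n"
print(passes_compile_check(snippet))                         # True: it parses
print(passes_compile_check("def add(a, b) return a + b"))    # False: syntax error
```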
Kolochenko places his faith in the future: the bubble bursting and the passage of time will provide the solution. It will not make AI more trustworthy, but it will teach us how, where and when we can use it safely and securely. AI will be designed to help real users rather than chase elusive and expensive dreams.
“I think we are observing the second episode of the dot-com bubble. People believe in miracles. They need magic in their lives because otherwise life is boring. When they believe they’ve found this magic, they think life is great and everyone can be a billionaire. They’ll blindly follow the arrows that are laid down for them, instinctively, because this is how our brains work. So now we have everybody, including C-level executives of the largest companies, over excited and thinking, ‘Goodness, with AI, we’ll make huge profits; we’ll do this and make that.’ But very few of them understand how AI works.”
The bursting of the dot-com bubble did not stop the internet; it refocused it more sustainably. Huge and beneficial developments came after the bubble through responsible investment: search engines, e-commerce, cloud computing, social media, mobile computing, web2 and web3. There’s still much wrong with the internet, but society is better off with it than without it.
Kolochenko believes that AI will follow the same pattern. “I believe that once this hype around AI has disappeared, and I think it will probably happen soon, we will again have some interesting tools. For example, journalists will be able to use faster spell checkers. Don’t dismiss it. Current, or should I say native, spell checkers are somewhat simplistic or primitive. An AI spell checker will likely detect the wrong word even if correctly spelled, as well as subtle semantic errors. That will save time and improve the output of authors who don’t trust current gen-AI to create their output.”
If you look at optimistic elements of trust in AI from both Polyakov and Kolochenko, there is one major common factor: the trusted AI apps are all self-contained, have a single purpose, and work with the user (and therefore have human oversight) rather than working instead of the user.
This is vastly different from the currently emerging crop of agentic AI apps, which are expected to autonomously complete complex tasks rather than singular ones, with complicated and diverse actions and reactions, and without human intervention. It is here that Kolochenko completely loses trust.
“To successfully manage something, you need to be at least as smart as what you are managing,” he comments. “You can give a chimpanzee a transmission electron microscope designed for scientific research but that doesn’t mean the chimpanzee will be able to do scientific research. A microscope is an advanced tool, but if you don’t know how to use it, it is worthless.”
Kolochenko isn’t comparing human users to chimpanzees but pointing out the mismatch between the complexity of AI tools being offered, and the relatively simple requirements of most users. He believes the current AI bubble will burst, and many companies will suffer – but it will teach and force us to realign AI with users’ needs rather than some complex, flashy, cool but unmanageable operation.
Most of today’s advice on AI concerns mitigating its fallibility. Ultimately, it is something we must accept and learn to live with since we are told AI is a probabilistic machine. Johnson takes a different approach. Arthur C. Clarke said, ‘magic is just science that we don’t understand yet.’ Neil Johnson suggests that probability is just determinism we don’t understand yet. If he’s right, and if we can understand the underlying deterministic rules of AI, we can live with the fallibility because we will know when, where, why and how it happens. We will learn how to live with AI and trust it where it can be trusted.
“As humans,” he comments, “we think we know what is likely to happen because we’ve been paying attention to what has already happened. And that’s exactly what the machine does. It pays attention to things that it’s seen before to decide where this is heading. All of that is completely deterministic.” At the end of this process, it has a choice of possible pathways to go down, each with a different weighting. It generates a random number. But even that random number is deterministic, since classical computing cannot do true randomness. Then it uses the random number and the weightings to decide which path to take next.
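Johnson’s point is easy to demonstrate in miniature. The toy Python sketch below picks a ‘next token’ from weighted candidates using a seeded pseudo-random generator: with the same seed and the same weightings, the supposedly random choice never changes. The tokens, weights and seed are invented purely for illustration.

```python
import random

# Toy illustration: 'probabilistic' next-token selection driven by a
# classical pseudo-random generator is fully reproducible given the
# same seed and weightings.

def pick_next_token(candidates: list[str], weights: list[float], seed: int) -> str:
    """Choose one weighted pathway using a seeded pseudo-random generator."""
    rng = random.Random(seed)  # classical PRNG: deterministic by construction
    return rng.choices(candidates, weights=weights, k=1)[0]

tokens = ["trust", "verify", "hallucinate"]
weights = [0.5, 0.3, 0.2]

# Same seed, same weights -> the same 'random' choice every time.
print([pick_next_token(tokens, weights, seed=42) for _ in range(3)])
```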
He likens the whole process to chaos theory. Even though it is all deterministic, it is so complex that we cannot follow the determinism, so we call it probability instead. “You’re right not to trust it; but that lack of trust is really asking the question, ‘Why the heck hasn’t science sorted out some explanation of what’s going on?’” This is the task he has set himself, because AI is a machine, and machines obey rules – even if we don’t know what they are.
“I’m staring at this thing right now. I’m literally taking apart GPT2 to figure out when it starts to go down cul de sacs, and when it runs free and does great things. And instead of just hoping, I’m trying to pin down the conditions under which it does one thing rather than another. I think that’s just what basic science is.”
It’s no easy task because of the complexity of the AI process. The source of ‘wrong turns’ is usually hidden bias, which is also deterministic but can come from many different sources: the learning data, the internal algorithms, the prompts, adversarial intrusions… (Incidentally, on adversarial intrusions, new research from Synapsed shows that all ten of the top ten LLMs contain vulnerabilities from the OWASP Foundation’s Top 10 LLM Vulnerabilities framework. We don’t even know whether a given cul de sac is native or adversary-generated.)

But the reward for success is high. Understanding where it goes wrong would give us confidence in our risk assessments of whether to accept its outcomes.
Summary
“Trust in AI isn’t binary – it’s contextual,” says Polyakov. “Do you trust it to give you facts? No, unless it cites sources you can verify. Do you trust it to act autonomously? Only in narrow, sandboxed domains. Do you trust it to replace human judgment? Absolutely not. But to augment it? Yes, if you know its limits.”
Kolochenko believes that AI is overhyped and hasn’t really achieved anything – but he hopes it may do so in the future. “They are selling interesting ideas. They promise to make the world better, to solve all the unsolved problems of humanity, to stop cancer, to start curing AIDS. But my question is this: apart from generating child pornography, fake IDs, and harmful content, has it managed to invent a vaccine against cancer; has it solved the problems of poverty and hunger?”
Nevertheless, he adds, “I believe that once this hype around AI disappears, and I think it will probably happen soon [after the AI bubble bursts], we will have some interesting tools.”
Johnson takes a pragmatic, scientific view. “It’s all about risk and trust, and that conversation hasn’t been sorted. It doesn’t mean we shouldn’t use AI, but we haven’t been given enough information about it because the companies themselves don’t understand it. This is why we must lift the lid on it, so we know where we can trust AI, and where we shouldn’t trust AI.” Only then can we make informed risk decisions on how to use it safely.
With all this concern, there is something surprisingly prescient in Mr Weasley’s admonition of his daughter Ginny: “What have I always told you? Never trust anything that can think for itself, if you can’t see where it keeps its brain.” Even the title of the book is fitting: ‘The Chamber of Secrets’.
Learn More About Securing AI at SecurityWeek’s AI Risk Summit – August 19-20, 2025 at the Ritz-Carlton, Half Moon Bay
Related: The Wild West of Agentic AI – An Attack Surface CISOs Can’t Ignore
Related: Is AI Use in the Workplace Out of Control?
Related: Critical Vulnerability in AI Builder Langflow Under Attack
Related: All Major Gen-AI Models Vulnerable to ‘Policy Puppetry’ Prompt Injection Attack
Related: AI Hallucinations Create a New Software Supply Chain Threat

