
Artificial Intelligence

Grok-4 Falls to a Jailbreak Two Days After Its Release

The latest release of the xAI LLM, Grok-4, has already fallen to a sophisticated jailbreak. The post Grok-4 Falls to a Jailbreak Two Days After Its Release appeared first on SecurityWeek.

The Echo Chamber jailbreak attack was described on June 23, 2025. xAI’s latest Grok-4 was released on July 9, 2025. Two days later, it fell to a combined Echo Chamber and Crescendo jailbreak attack.

Echo Chamber was developed by NeuralTrust; we described it in New AI Jailbreak Bypasses Guardrails With Ease. It uses subtle context poisoning to nudge an LLM into producing dangerous output.

The key element is never to introduce directly a dangerous word that might trigger the LLM’s guardrail filters.

Crescendo was first described by Microsoft in April 2024. It gradually coaxes LLMs into bypassing safety filters by referencing their own prior responses.

Echo Chamber and Crescendo are both ‘multi-turn’ jailbreaks that differ subtly in how they work. The important point here is that they can be used in combination to improve the efficiency of the attack. They succeed because LLMs tend to evaluate individual prompts rather than the cumulative conversational context, so malicious intent spread across many turns goes unrecognized.

NeuralTrust researchers attempted to jailbreak the new Grok-4 guardrails using Echo Chamber to trick the LLM into providing a manual to produce a Molotov cocktail. “While the persuasion cycle nudged the model toward the harmful goal, it wasn’t sufficient on its own,” writes the firm. “At this point, Crescendo provided the necessary boost. With just two additional turns, the combined approach succeeded in eliciting the target response.”

Provided you understand how the two individual jailbreaks work, integrating them is simple. In their testing, NeuralTrust began with Echo Chamber, adding a check to detect ‘stale’ progress in the persuasion cycle. When progress stalls, Crescendo techniques are brought into play. “This additional nudge typically succeeds within two iterations. At that point, the model either detects the malicious intent and refuses to respond, or the attack succeeds, and the model produces a harmful output.”
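The control flow NeuralTrust describes can be sketched in the abstract. This is a hypothetical simulation of the decision logic only — the function name, the per-turn “progress score,” and the thresholds are illustrative assumptions, and it contains no prompts or model calls:

```python
# Hypothetical sketch of the combined attack's control flow, as described
# above: Echo Chamber persuasion turns until progress goes 'stale', then a
# short Crescendo fallback. Scores are stand-ins for per-turn progress.

def run_combined_flow(scores, stale_threshold=2, crescendo_turns=2):
    """Walk a list of per-turn progress scores in [0.0, 1.0].

    Echo Chamber turns run until progress stalls `stale_threshold` times
    in a row; then up to `crescendo_turns` Crescendo-style turns are
    attempted. Returns "succeeded" if any turn reaches 1.0, else "refused".
    """
    best = 0.0
    stale = 0
    for i, score in enumerate(scores):
        if score >= 1.0:
            return "succeeded"
        if score <= best:
            stale += 1      # persuasion cycle is no longer making progress
        else:
            stale = 0
            best = score
        if stale >= stale_threshold:
            # Switch to Crescendo-style turns that reference the model's
            # own prior responses; per the write-up, this "typically
            # succeeds within two iterations".
            for s in scores[i + 1 : i + 1 + crescendo_turns]:
                if s >= 1.0:
                    return "succeeded"
            return "refused"  # model detected the intent and refused
    return "refused"
```

The point of the sketch is the hand-off: the staleness check decides when the slower context-poisoning phase gives way to the more direct Crescendo phase.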

As with all jailbreaks, no method succeeds on every attempt. Nevertheless, the researchers tested the combined Echo Chamber and Crescendo method against other ‘forbidden’ outputs from Grok-4, and it succeeded on many occasions: a 67% success rate on Crescendo’s Molotov cocktail test, 50% on the ‘meth’ (methamphetamine synthesis) test, and 30% on the ‘toxin’ (toxic substances or chemical weapon synthesis) test.

The worrying element is that even the latest LLMs cannot guard against all existing jailbreak methodologies, with Grok-4 being defeated just two days after its release. “Hybrid attacks like the Echo Chamber + Crescendo exploit represent a new frontier in LLM adversarial risks, capable of stealthily overriding isolated filters by leveraging the full conversational context,” warns NeuralTrust.

The continuing battle between safe and secure LLMs and attacker ingenuity shows no sign of abating.

Learn More About Securing AI at SecurityWeek’s AI Risk Summit – August 19-20, 2025 at the Ritz-Carlton, Half Moon Bay

Related: New Jailbreak Technique Uses Fictional World to Manipulate AI

Related: New CCA Jailbreak Method Works Against Most AI Models

Related: DeepSeek Security: System Prompt Jailbreak, Details Emerge on Cyberattacks

Related: ‘Deceptive Delight’ Jailbreak Tricks Gen-AI by Embedding Unsafe Topics in Benign Narratives
