AI language models tricked by poems – DW – 12/16/2025

The result came as a surprise to researchers at Icaro Lab in Italy. They set out to investigate whether different language styles – in this case, prompts in the form of poems – affect AI models’ ability to recognize prohibited or harmful content. The answer was a resounding yes.

Using poetry, the researchers were able to get around the security guardrails – and it’s not entirely clear why.

For their study, titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” the researchers took 1,200 potentially harmful prompts from a database commonly used to test the security of AI language models and rewrote them in the form of poems.

Known as “adversarial prompts” – usually written in prose and not poetry – these are questions deliberately crafted to cause AI models to output harmful or undesirable content that they would normally block, such as specific instructions for an illegal act.

Federico Pierucci, one of the study’s authors, told DW that in poetic form, the success rate of the manipulated input was surprisingly high. However, why poetry is so effective as a “jailbreak” technique – that is, as a way to bypass AI’s protective mechanisms – remains unclear and is subject to further research, he says.

Poetry as jailbreak technique: Johann Wolfgang von Goethe would probably have approved. Image: AKG-Images/Picture Alliance

Poetry as security weakness

Icaro Lab’s research started from the observation that AI models become confused when a manipulated, mathematically calculated piece of text is appended to a prompt. This is known as an “adversarial suffix,” a type of interference signal that can cause an AI to circumvent its own security rules. Such suffixes are created using complex mathematical processes. Major AI developers regularly train and test their models against similar attack methods to protect them.

“We asked ourselves, if we give AI a text or prompt that is intentionally manipulated, such as an adversarial suffix, what happens?” Federico Pierucci says. This time, however, the manipulation was done not with complex mathematics but with poetry, in order to “surprise” the AI, he adds. He explains the thinking behind it: “Maybe an adversarial suffix is like AI’s poetry. It surprises AI in the same way that poetry – especially very experimental poetry – surprises us.”

The researchers crafted the first 20 prompts into poems by hand, says Pierucci, who also has a background in philosophy. These were the most effective, he adds. They wrote the rest with the help of AI. The AI-generated poems were also quite successful at bypassing the security guardrails, but not as successful as the first batch. Humans are apparently still better at writing poetry, Pierucci says.

He says, “We didn’t have a specific writer writing the prompts. It was just us – with our limited literary ability. Maybe we were terrible poets. Maybe if we were better poets, we would have achieved 100% jailbreak success.”

For security reasons, the study did not publish specific examples.

Challenge for AI systems: diversity of human forms of expression

The big surprise of this study is that it identified a previously unknown weakness in AI models that allows a relatively straightforward jailbreak.

It also raises questions that demand further research: What exactly is it about poetry that bypasses the security mechanisms?

Pierucci and his colleagues have various theories, but they can’t say for sure yet. “We’re doing this kind of very precise scientific study to try to understand: Is it the rhyme, the stanza or the metaphor that really does all the heavy lifting in the process?” Pierucci explains.

A key area of research: How do AI models determine the content they deliver? Image: Google DeepMind/Unsplash

They also aim to find out whether other forms of expression would yield similar results. “We have now covered one type of linguistic diversity – namely poetic diversity. The question is whether there are other literary forms, such as fairy tales, that work. Perhaps an attack based on fairy tales could also be organized,” says Pierucci.

In general, the range of human expression is extremely diverse and creative, which can make it more difficult to train the responses of machines. “You can take a text and rewrite it in countless ways, and not all of the rewritten versions will be as dangerous as the original,” the researcher says. “This means that, in theory, one could create countless variations of harmful prompts or requests that might not trigger the AI system’s protection mechanisms.”

AI research also includes the cultural sector

The study also highlights the fact that multiple disciplines collaborate in artificial intelligence research. One example is Icaro Lab itself, where teams work closely with scholars from the University of Rome on topics such as the security and behavior of AI systems. The project brings together researchers from the fields of engineering and computer science, linguistics and philosophy. No poet has been part of the team yet, but who knows what the future holds.

Federico Pierucci is certainly very keen to pursue his research. He says, “What we showed, at least in this study, is that there are forms of cultural expression, forms of human expression, that are incredibly powerful, surprisingly powerful, as jailbreak techniques, and maybe we’ve just discovered one of them.”

Incidentally, the lab’s name alludes to the story of Icarus, a figure from Greek mythology who wore wings made of wax and feathers and, despite all warnings, flew too close to the sun. When the wax melted, Icarus fell into the sea and drowned – a symbol of overconfidence and of overstepping natural boundaries.

The researchers therefore see themselves as cautionary voices: we should exercise more care with AI and work to fully understand its risks and limitations.

This article was originally written in German.