May 19, 2026Vishal Kagde, Co-founder, Godel-Labs
Share:
From Document to Detonation: How AI Agents Turn Malicious Text into Actions

The real danger of malicious content in AI systems is not that it changes the phrasing of an answer.

The real danger is that agents can turn document content into actions.

An ordinary-looking file can be translated into:

  • a tool call
  • an API request
  • a shell command
  • a file write
  • a workflow step
  • a data exfiltration path

That is the shift security teams need to internalize.

In classic software, documents were usually inputs to parse, display, or store. In agentic systems, documents can become operational instructions that shape what the system does next. The threat is no longer just bad text generation. It is an action-space compromise.

Why This Problem Is Different

People often talk about prompt injection as if it were mainly an output-quality issue. The model says the wrong thing. The answer gets weird. The tone changes. The summary becomes unreliable.

That is the shallow version of the problem.

The deeper problem is that many agents do not stop at generating text. They read content, reason over it, and then use tools. They can browse, search, send messages, write code, update records, trigger workflows, and execute commands. Once that is true, malicious content is dangerous not because it edits the answer, but because it can steer the next action.

The path looks like this:

document → model interpretation → tool selection → execution

At that point, the document is no longer just influencing language. It is influencing behavior.

The Clearest Modern Example: DDIPE

One of the clearest recent articulations of this problem appears in the 2026 paper Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems by Yubin Qu and coauthors.

This paper matters because it focuses directly on the jump from document content to agent execution.

The setting is highly practical. Coding agents increasingly extend themselves with third-party skills, often distributed through open marketplaces. Those skills include descriptions, examples, templates, and documentation that agents read and reuse while completing tasks. The paper argues that this creates a supply-chain attack surface that is different from ordinary package risk, because the agent may operationalize what it reads from the documentation itself.

Their key concept is Document-Driven Implicit Payload Execution, or DDIPE.

The core mechanism is simple and dangerous: malicious logic is embedded in code examples and configuration templates inside skill documentation, and the agent reproduces those examples during normal task completion. The payload then executes without needing an explicitly malicious prompt.

That is almost the purest version of the “document to detonation” story.

The file does not need to say, in plain language, “do something evil.”

It only needs to look like legitimate documentation that the agent treats as trusted implementation guidance.

Because the agent copies and runs what it reads, the documentation itself becomes an execution vector.

Why DDIPE Matters So Much

DDIPE sharpens the threat model in three important ways.

First, it shows that agents may execute attacker-controlled logic not by obeying an obvious command, but by reusing examples that appear helpful and benign.

Second, it shows that the dangerous boundary is not only prompt interpretation, but code reproduction followed by execution.

Third, it shows that even when systems are defended against explicit instruction injection, documentation-shaped payloads can still cross into action space.

Qu et al. make this concrete in the paper’s abstract. They focus on attacks that hijack an agent’s action space, including file writes, shell commands, and network requests. They report 1,070 adversarial skills generated from 81 seeds across 15 MITRE ATT&CK categories, tested across four frameworks and five models. Across those settings, DDIPE achieves bypass rates of 11.6% to 33.5%, while explicit instruction attacks achieve 0% under strong defenses.

That comparison is the headline.

The obvious attack can fail while the document-shaped one still works.

In other words, an agent may reject a plainly malicious instruction but still execute the same logic when it arrives disguised as documentation.

This Is the Action-Space Problem

The right mental model here is not “prompt injection causes bad responses.”

The right mental model is:

untrusted content can be compiled into agent behavior

That behavior may include:

  • calling external APIs
  • invoking internal tools
  • writing files
  • modifying code
  • installing packages
  • sending data over the network
  • advancing a business workflow

Once an agent has these capabilities, content security becomes action security.

That is why this problem is much bigger than answer filtering or hallucination reduction. Even a perfectly phrased answer can hide the fact that the agent has already taken an unsafe step behind the scenes.

The Enterprise Implication

For enterprise AI, the lesson is stark.

It is not enough to ask whether a document is relevant to the task. You also have to ask whether the document is safe to operationalize.

A file may be:

  • useful to read but unsafe to execute from
  • acceptable for retrieval but unsafe for tool planning
  • fine for a human analyst but dangerous for an autonomous agent
  • topically relevant but carrying embedded action logic the system should never trust

This is where many current architectures fail. They treat retrieved content as if it were passive reference material. But once an agent can translate content into code paths, commands, and workflow steps, retrieval becomes a pre-execution stage.

That changes the security model completely.

What Needs to Change

AI systems need a qualification layer between external content and agent action.

Before a document is allowed to shape planning, tool use, code generation, or execution, the system should evaluate whether that content is:

  • relevant
  • permitted
  • trustworthy
  • safe

This matters especially for coding agents, workflow agents, support agents, and any system that reads external material before taking action.

If content can be turned into execution, then content review is not just data hygiene. It is a runtime defense.

Conclusion

Qu et al. show a particularly dangerous next step in agent security: documentation itself can become a payload delivery mechanism through DDIPE.

Taken together, the lesson is clear:

the problem is not that malicious documents make agents say the wrong thing.

The problem is that agents can turn malicious documents into actions.

That is the path from document to detonation.

Sources

From Document to Detonation: How AI Agents Turn Malicious Text into Actions

Follow our journey in securing the AI revolution.

Get a Demo