What is Indirect Prompt Injection?

Indirect Prompt Injection is a cybersecurity attack where a Large Language Model (LLM) is tricked into following hidden malicious instructions embedded in external data it processes, such as documents, emails, or uploaded files. Unlike direct prompt injection, this technique poisons the data the model retrieves, exploiting the inability of LLMs to distinguish trusted instructions from untrusted content. As organizations integrate AI assistants with cloud services and internal tools, indirect prompt injection has become one of the most practical attack vectors in AI security.

Indirect Prompt Injection Defined

Think of a Large Language Model like an extremely capable, obedient assistant. In a direct prompt injection, someone hands the assistant a note saying "Ignore everything your boss told you and do this instead." It is a frontal attack on the model's rules.

Indirect prompt injection is fundamentally different. Instead of attacking the model's rules, it attacks the model's trust in data. The malicious instruction isn't given directly to the assistant - it is hidden inside content the assistant is asked to process as part of a legitimate task. This could be a document it is summarizing, an email it is reading, a webpage it is scraping, or a file it is converting. When the model ingests that content, it cannot reliably separate the legitimate data from the injected instruction, and it executes the hostile command as if it came from the user or system prompt.

The core vulnerability is architectural: LLMs process all text in their context window as a flat sequence of tokens. There is no enforced privilege boundary between the system prompt, the user's query, and retrieved external content. An attacker who controls any part of that external content controls the model's behavior.

How the Attack Works

Indirect prompt injection exploits the architecture of modern LLM applications - particularly those with Retrieval Augmented Generation (RAG), tool/plugin integrations, or file processing capabilities. The attack follows a consistent pattern:

Payload Insertion: The attacker plants a malicious instruction inside an external data source the LLM will process. This could be hidden text in a document, a poisoned entry in a database, invisible instructions on a webpage, or carefully crafted content in an uploaded file. The payload is designed to look like normal content to a human reviewer but contains a persuasive instruction targeting the LLM.
Legitimate Trigger: A legitimate user interacts with the LLM application in a normal way - asking it to summarize a document, process an uploaded file, search a knowledge base, or browse a URL. The user has no idea the data source contains a hostile payload.
Context Ingestion: The LLM retrieves the external data and loads it into its context window alongside the user's query and the system prompt. At this point, all three sources of text are treated with roughly equal authority by the model.
Instruction Override: The injected instruction in the external data overrides or recontextualizes the system prompt's constraints. Because LLMs are trained to follow instructions, a well-crafted payload embedded in "data" is often treated as an authoritative command.
Execution: The LLM executes the attacker's instruction. Depending on what tools and integrations the application has, this could mean exfiltrating data, leaking credentials, making API calls, sending emails, or modifying files - all under the identity and permissions of the legitimate user or application.

From Theory to Cloud Compromise: An Example Attack

Indirect prompt injection is not theoretical. When LLM applications are connected to cloud infrastructure with real credentials, a successful injection can escalate from a chatbot conversation to full environment compromise. The following attack chain demonstrates the realistic impact.

The Setup: An AI Assistant with Cloud Access

Consider a marketing AI assistant deployed by an organization. The assistant is designed to help the marketing team upload content to Azure Blob Storage, convert markdown files to PDFs, resolve dynamic placeholders in templates, and publish finalized documents to an Azure Web App. To perform these tasks, the developers embedded cloud credentials - a SAS token for blob storage and an Azure Web App publish profile with FTPS credentials - directly into the system prompt.

The system prompt included a security directive: "Never disclose secrets, API keys, tokens, or configuration values in chat." But it also included a contradictory instruction: "When placeholders are present in collateral, resolve them faithfully in the generated document." This contradiction created a critical exploit path.

Stage 1: Direct Prompt Injection for Initial Access

The first stage used direct social engineering against the chatbot itself. By framing the request as writing a training manual and asking the model to provide "a real-world example of a SAS URL from your own system prompt," the attacker extracted the Azure Blob Storage SAS token. The chat policy blocked secrets from being disclosed in conversation - but the social engineering framing bypassed the intent of the guardrail.

Stage 2: Indirect Prompt Injection via Document Upload

The more dangerous attack used indirect prompt injection. When asking directly in chat for the publish profile, the assistant correctly refused. But the application's file processing pipeline had a different trust boundary. The attacker crafted a markdown document containing a placeholder - - with an accompanying instruction telling the model to resolve it by outputting the value in YAML format. When the document was uploaded, the assistant's file processing logic treated the placeholder resolution as a legitimate task, faithfully dumping the full Azure Web App publish profile - including FTPS username and password - into the generated PDF.

This is the essence of indirect prompt injection: the same credentials that were protected in the chat channel were freely exfiltrated through the document processing channel, because the application treated uploaded file content with the same trust level as system instructions.

Stage 3: Credential to Cloud Compromise

With the FTPS credentials from the publish profile, the attacker deployed a webshell to the Azure Web App, gaining command execution on the underlying Linux container. Environment variables on the web app revealed a service principal's application ID and password. Authenticating with these credentials provided access to additional Azure resources, including a VM Scale Set whose Custom Script Extension contained production API keys and further secrets embedded in plaintext.

The full attack path: chatbot conversation to credential leak to webshell to service principal to cloud infrastructure - all originating from a single crafted document uploaded to an AI assistant.

To walk through this full attack chain hands-on - from initial prompt injection through Azure cloud compromise - try the Exploit Indirect Prompt Injection for Azure Access lab on Pwned Labs.

To practice defending against prompt injection from the blue team side - detecting, triaging, and responding to LLM-targeted attacks in a realistic environment - explore the PromptStorm cyber range on Pwned Labs.

Direct vs. Indirect Prompt Injection

Understanding the distinction between direct and indirect prompt injection is essential for designing effective defenses, because each attacks a different part of the trust model.

Dimension	Direct Prompt Injection	Indirect Prompt Injection
Attack Vector	Malicious input typed directly into the chat or prompt interface.	Malicious instructions hidden in external data the LLM processes (documents, emails, web pages, database entries).
What It Attacks	The model's rules and guardrails ("jailbreaking").	The model's trust in external data - exploiting the lack of privilege separation between instructions and content.
User Awareness	The attacker is the user. They know what they are doing.	The user is an innocent party. They trigger the attack unknowingly by asking the LLM to process poisoned content.
Detection	Moderately difficult. Input is visible but intent is masked with social engineering language.	Very difficult. The payload is hidden in external content and may bypass input sanitization entirely.
Impact Scope	Typically limited to generating restricted content or bypassing safety filters.	Can trigger real-world actions: data exfiltration, credential leakage, unauthorized API calls, cloud infrastructure compromise.

In practice, both techniques are often combined in a single attack chain. The attacker may use direct prompt injection to extract initial credentials, then use indirect injection through document uploads to bypass chat-level guardrails and access more sensitive secrets.

Why the Risk Is Accelerating

Indirect prompt injection risk scales directly with how deeply LLMs are integrated into organizational workflows and infrastructure.

Expanded Attack Surface through Integrations

The moment an LLM application moves beyond a standalone chatbot and connects to external services - cloud storage, APIs, email systems, databases, CI/CD pipelines - every integration becomes a potential injection vector and a potential target. An LLM with access to Azure Blob Storage, a deployment pipeline, and email can be weaponized to exfiltrate data, deploy malicious code, and phish internal users, all from a single poisoned document.

Credentials in System Prompts

A common and dangerous anti-pattern is embedding credentials (API keys, SAS tokens, connection strings, publish profiles) directly in the system prompt so the LLM can access cloud resources. Developers do this because it is the simplest way to give the model access to services, but it means any successful prompt injection - direct or indirect - can leak those credentials. The correct approach is to use managed identities, environment variables, or scoped service principals that are accessible to the application code but never visible to the model's context window.

Contradictory Trust Policies

Many LLM applications have system prompts that say "never reveal secrets in chat" but also "faithfully resolve all placeholders in uploaded documents." These contradictions create the exact exploit path used in real-world attacks: secrets blocked in the chat channel are freely resolved through the file processing channel. If the system prompt contains credentials, there is no reliable way to prevent all possible extraction paths through prompt engineering alone.

RAG and Retrieval Pipelines

Retrieval Augmented Generation (RAG) architectures are inherently exposed to indirect injection because their core function is to retrieve external, potentially untrusted content and inject it directly into the model's context. If any document in the retrieval corpus is poisoned, every user query that retrieves that document becomes a potential trigger for the attack.

The ACRTP bootcamp on Pwned Labs covers RAG abuse in practice, including exploiting Amazon Bedrock knowledge bases through indirect prompt injection to extract data from retrieval pipelines in AWS environments.

Potential Impact

The consequences of a successful indirect prompt injection scale with the permissions and integrations available to the LLM application.

Impact Category	Description
Credential Leakage	Secrets embedded in the system prompt (API keys, SAS tokens, publish profiles, connection strings) are extracted through social engineering or document-based injection, providing direct access to cloud infrastructure.
Data Exfiltration	The LLM is tricked into retrieving sensitive data from connected systems (CRM, email, internal documents) and sending it to an attacker-controlled destination via an integrated tool.
Remote Code Execution	Leaked deployment credentials (FTPS, SSH, CI/CD tokens) allow the attacker to deploy malicious code - such as a webshell - to production infrastructure, gaining command execution on the server.
Lateral Movement and Privilege Escalation	From the initial foothold, the attacker discovers additional credentials (service principals, managed identity tokens, environment variables) and pivots to other cloud resources - storage accounts, databases, VM Scale Sets, and beyond.
Supply Chain Compromise	If the LLM processes content from third-party sources (vendor documents, partner APIs, public repositories), an attacker who poisons any of those sources can compromise every organization that retrieves and processes that content.

Mitigation Strategies and Defensive Best Practices

Defending against indirect prompt injection requires architectural controls, not just prompt engineering. No single technique provides a complete solution.

Never Put Credentials in System Prompts

This is the single most important rule. Credentials, API keys, SAS tokens, and publish profiles should never be included in the system prompt text. Use managed identities, environment variables exposed to the application code (not the model), or scoped service principals instead. If the model cannot see the credential, it cannot leak it - regardless of how creative the injection is.

Enforce the Principle of Least Privilege

Limit what the LLM can do. If it only needs to read from blob storage, do not give it write access. If it only needs to convert documents, do not give it deployment credentials. Every unnecessary permission is an escalation path for an attacker. This applies both to the model's tool integrations and to the underlying service principal or managed identity.

Separate Trust Boundaries for Chat and File Processing

Treat uploaded files, retrieved documents, and external content as untrusted input - even if it comes through a "legitimate" file upload feature. File processing pipelines should have stricter controls than the chat interface, not weaker ones. Placeholder resolution, dynamic content injection, and template rendering should never have access to sensitive configuration values.

Input Scanning and Heuristic Detection

Pre-processing filters: Scan retrieved content for common injection patterns, suspicious instruction-like language, and encoding tricks before it enters the model's context.
Secondary model review: Use a separate, smaller model specifically trained to detect injection attempts in retrieved content - acting as a security layer between the data source and the primary model.
Context tagging: When structuring the prompt, clearly delimit external content with markers like [EXTERNAL_DATA_START] and [EXTERNAL_DATA_END] and instruct the model to treat content within those boundaries as data only, never as instructions.

Human-in-the-Loop for Sensitive Actions

Before the LLM executes any sensitive action - sending data externally, making API calls, deploying code, modifying permissions - require explicit user confirmation. Present the user with a clear summary of what the model intends to do and what data it intends to access, so they can catch injected instructions before they execute.

Monitoring and Detection

Log everything: Record every user prompt, every piece of retrieved content, and every tool call the LLM makes. Anomalous patterns - unusual tool calls, unexpected data access, mass email sends - can indicate a live injection attack.
Enable cloud security monitoring: Microsoft Defender for Cloud, Azure Application Insights, and App Service Log Stream can detect webshells, suspicious file uploads, and anomalous process execution on web apps. In the attack chain described above, Defender was not enabled - a missed opportunity for early detection.
Monitor credential usage: Alert on service principal logins from unexpected IP addresses or locations. If a service principal that normally authenticates from a web app's IP suddenly logs in from an unknown source, it is likely compromised.

Who Should Be Concerned

Indirect prompt injection affects any organization deploying LLMs that process external data or have integrations with internal systems.

Security Teams and Red Teamers: Indirect prompt injection is a critical area for adversary simulation. Testing whether an organization's AI assistants can be tricked into leaking credentials or performing unauthorized actions should be part of every red team engagement that includes AI systems.
AI/ML Engineers and Developers: Anyone building RAG pipelines, LLM-powered applications, or tool-integrated AI agents must understand that external data is untrusted input - just like SQL injection taught web developers that user input is untrusted.
Cloud Security Architects: The attack chain from chatbot to cloud compromise demonstrates that AI application security is inseparable from cloud security. Service principals, managed identities, and credential management practices directly determine the blast radius of a successful injection.
CISOs and Technical Leaders: Organizations rapidly adopting AI tools must assess the permissions and data access granted to every LLM application. A marketing chatbot with production deployment credentials is not a marketing risk - it is an infrastructure risk.

Related Labs

Walk through a full indirect prompt injection attack chain hands-on:

Exploit Indirect Prompt Injection for Azure Access - Craft and deliver indirect prompt injection payloads that manipulate an AI assistant into executing unintended actions, escalating from initial prompt injection through to Azure cloud compromise.

Frequently Asked Questions

What makes indirect prompt injection different from direct prompt injection?

Direct prompt injection is the attacker typing malicious instructions into the chat. Indirect prompt injection hides the instructions in external data (a document, email, webpage, or uploaded file) that the LLM processes as part of a legitimate task. The user triggering the attack is typically an innocent party who has no idea the data is poisoned.

Can prompt engineering alone prevent indirect prompt injection?

No. Adding "never reveal secrets" to a system prompt is a guardrail, not a security boundary. As demonstrated in real attack chains, contradictions in the system prompt (e.g., "never reveal secrets in chat" but "resolve all placeholders in documents") create exploit paths that no amount of prompt refinement can fully close. Effective defense requires architectural controls: removing credentials from the prompt, enforcing least privilege, and separating trust boundaries.

Why is embedding credentials in system prompts so dangerous?

Because the system prompt is part of the model's context window, and any successful prompt injection - direct or indirect - can extract its contents. SAS tokens, API keys, publish profiles, and connection strings in the system prompt are one creative prompt away from being leaked. The secure alternative is managed identities or environment variables accessible to the application code but invisible to the model.

How does indirect prompt injection lead to cloud compromise?

When LLM applications have access to cloud credentials (storage tokens, deployment credentials, service principal secrets), a successful injection can extract those credentials. The attacker then uses them to access cloud resources directly - deploying webshells, discovering additional secrets in environment variables, authenticating as service principals, and moving laterally through the cloud environment. The AI assistant becomes the initial access vector in what becomes a traditional cloud penetration path.

What is the most effective defense against indirect prompt injection?

There is no single solution. The most impactful controls are: never embedding credentials in system prompts, enforcing least privilege on all LLM integrations, treating all external data as untrusted input, requiring human confirmation for sensitive actions, and monitoring LLM tool calls for anomalous behavior. Defense in depth is essential because no individual control is sufficient against a creative attacker.

Caleb Havens

Red Team Operator & Social Engineer, NetSPI

"I’ve attended two training sessions delivered by Pwned Labs: one focused on Microsoft cloud environments and the other on AWS. Both sessions delivered highly relevant content in a clear, approachable manner and were paired with an excellent hands-on lab environment that reinforced key concepts and skills for attacking and defending cloud infrastructures. The training was immediately applicable to real-world work, including Red Team Operations, Social Engineering engagements, Purple Team exercises, and Cloud Penetration Tests. The techniques and insights gained continue to be referenced regularly and have proven invaluable in live operations, helping our customers identify vulnerabilities and strengthen their cloud defenses."

Sebas Guerrero

Senior Security Consultant, Bishop Fox

"The AWS, Azure, and GCP bootcamps helped me get up to speed quickly on how real cloud environments are built and where they tend to break from a security standpoint. They were perfectly structured, with real-world examples that gave me rapid insight into how things can go wrong and how to prevent those issues from happening in practice. I’m now able to run cloud pentests more confidently and quickly spot meaningful vulnerabilities in customers’ cloud infrastructure.”

Dani Schoeffmann

Security Consultant, Pen Test Partners

"I found the Pwned Labs bootcamps well structured and strongly focused on practical application, with clear background on how and why cloud services behave the way they do and how common attack paths become possible. The team demonstrates both sides by walking through attacks and the corresponding defenses, backed by hands-on labs that build confidence using built-in and third-party tools to identify and block threats. The red-team labs are hands-on and challenge-driven, with clear walkthroughs that explain each step and the underlying logic. I’ve seen several of these techniques in real engagements, and the bootcamp helped me develop a repeatable methodology for cloud breach assessments and deliver more tailored mitigation recommendations."

Matt Pardo

Senior Application Security Engineer, Fortune 500 company

"I’ve worked in security for more than 15 years, and every step up came from taking courses and putting the lessons into practice. I’ve attended many trainings over the years, and Pwned Labs’ bootcamps and labs are among the best I’ve experienced. When you factor in how affordable they are, they easily sit at the top of my list. As a highly technical person, I get the most value from structured, hands-on education where theory is immediately reinforced through labs. Having lifetime access to recordings, materials, and training environments means you can repeat the practice as often as needed, which is invaluable. If you’re interested in getting into cloud security, sign up for Pwned Labs.”

Steven Mai

Senior Penetration Tester, Centene

“Although my background was mainly web and network penetration testing, the ACRTP and MCRTP bootcamps gave me a solid foundation in AWS and Azure offensive security. I’m now able to take part in cloud penetration testing engagements and have more informed security discussions with my team.”