What is adversarial AI / prompt injection?
Prompt injection is an attack in which a malicious instruction embedded in content manipulates a large language model into taking unintended actions.
Key takeaways
- Prompt injection exploits the fundamental property that large language models process all input as potential instruction. Untrusted content (an email, a document, a web page) becomes attack-controlled prompt material.
- Direct prompt injection (the user types the malicious prompt) and indirect prompt injection (a third party plants the malicious prompt in content the model processes) are the two main variants. Indirect is more relevant for corporate environments.
- Microsoft 365 Copilot, Google Gemini Workspace, and similar assistants are increasingly deployed at mid-market manufacturers. Each deployment expands the prompt-injection attack surface.
- Defense relies on application-level controls (data isolation, output filtering, action limitations) rather than on the model alone. The model cannot reliably tell instructions from content.
- ARG includes AI-tool exposure in audit scope where the client has deployed enterprise AI assistants; the work is part of the broader adversarial simulation program.
What is prompt injection, and where does it live in the attack chain?
A large language model accepts input as text and produces output as text. The model does not have a reliable distinction between "instructions from the developer or user" and "content the model should process". An attacker who can place text into the model's input window can attempt to redirect the model's behavior.
The attack lives at a specific stage in attack chains involving AI assistants:
- The attacker delivers content (email, document, web page, calendar invite, file attachment).
- The content reaches a context where an AI assistant processes it (Copilot summarizes the email, Gemini answers a question about the document, an AI agent reads the calendar to schedule a meeting).
- The malicious instruction in the content triggers the model to act outside the user's intent.
The action varies. In documented cases, prompt injection has caused models to:
- Summarize content misleadingly to influence the user.
- Exfiltrate data by including sensitive information in a response (then visible to a third party).
- Send emails or take actions the user did not request (where the model has agentic capabilities).
- Modify or hide content during processing.
- Redirect users to malicious links presented inside the model's response.
Prompt injection is not a hypothetical attack pattern. Documented public examples exist for major AI assistants from 2023 onward. The vulnerability is structural; the specific attacks evolve.
Direct vs indirect prompt injection: how each one is delivered
Two main categories of prompt injection exist.
Direct prompt injection. The user types the malicious prompt themselves. Examples include attempts to make a chatbot disclose its system prompt, bypass safety controls, or behave in unauthorized ways. Sometimes called jailbreaking when the user is intentionally trying to override safety. The risk profile for corporate AI deployment is moderate; users who actively try to misuse the tool are a manageable threat surface.
Indirect prompt injection. A third party (an attacker) plants the malicious prompt in content that the model processes on behalf of a legitimate user. The user did not request the malicious action; the attacker hid the prompt in content the model encountered. This is the corporate threat that matters.
Indirect prompt injection delivery channels at mid-market manufacturers:
- Email. Malicious content embedded in email body or attachment. When the user asks Copilot to summarize their inbox or extract action items, the malicious prompt fires.
- Documents. Word, PDF, or other documents with embedded malicious prompts. When the model summarizes the document, the prompt fires.
- Web content. Web pages with malicious prompts. When the model browses the web or summarizes a page on behalf of the user, the prompt fires.
- Calendar invites. Malicious prompts in meeting descriptions or attached agendas. When the model processes the calendar to suggest meeting prep, the prompt fires.
- Shared documents from outside the organization. Vendor or customer documents containing embedded prompts.
The attacker does not need to compromise the user. They need to place content where the model will encounter it. The attacker exploits the gap between "user trust" and "content trust" that the model does not reliably distinguish.
Why M365 Copilot, Gemini, and similar tools expand the attack surface
Enterprise AI assistants are increasingly deployed at mid-market manufacturers in 2025-2026. The deployments produce three new attack surfaces.
- Email + Copilot. Copilot reads, summarizes, and generates responses to email. Malicious email content can manipulate the assistant. The attack does not require the user to click anything; the assistant reads the email automatically when asked.
- Documents + Copilot/Gemini. AI assistants summarize and extract information from documents. Documents shared with the organization (from customers, vendors, contractors) can contain prompt injection.
- Agentic actions. Some AI assistants can take actions (send emails, schedule meetings, modify documents, access files). Where the assistant has agency, prompt injection can drive the agency.
Each deployment makes more decisions in the organization driven by the model. Each decision is now a potential attack target. The deployment is operationally valuable; the security implications need to be planned alongside.
The specific risks for mid-market manufacturers:
- CUI handling assistants. AI assistants that read or summarize CUI need particular care. Prompt injection that causes the assistant to disclose CUI to a third party is a compliance incident, not just a security one. See What is NIST SP 800-171?.
- Finance and AP integrations. AI assistants that summarize invoices, draft emails to vendors, or extract payment information are attack-attractive. The yield from a successful injection is direct.
- Engineering data assistants. AI assistants that summarize engineering documents, BoMs, or process specifications expose intellectual property to the prompt-injection surface.
The defense is not to abandon the tools. The defense is to deploy them with the understanding that they expand the attack surface and to architect the deployment with the attack in mind.
Examples of prompt-injection attacks seen in the wild
The public record:
- Microsoft 365 Copilot disclosure attacks (2024-2025). Researchers documented multiple cases where malicious email content caused Copilot to disclose information from other parts of the user's mailbox. Microsoft has issued mitigations; researchers continue to publish new variants.
- Gemini Workspace data exfiltration (2024-2025). Documented examples of malicious document content causing Gemini to surface or exfiltrate data from adjacent documents.
- Indirect injection via shared documents (multiple). A vendor or customer shares a document with embedded malicious prompts. The receiving organization's AI assistant processes the document and is influenced.
- Search-result prompt injection. Malicious content on indexed web pages manipulates AI assistants that summarize search results. The user asks a question; the assistant browses; the malicious content steers the answer.
- Calendar-based injection. Meeting invites with embedded prompts that fire when an AI assistant prepares meeting summaries.
- Code-assistant prompt injection. GitHub Copilot and similar code assistants processing repository content with embedded prompts. The model produces code influenced by the prompt, including subtle vulnerabilities.
The pattern: as more AI tooling is deployed, more channels for indirect prompt injection appear. The attack technique generalizes; specific incidents are vendor-and-deployment-specific.
How to assess prompt-injection exposure across your AI tooling
A practical assessment for a mid-market manufacturer deploying enterprise AI:
- Inventory deployed AI tooling. Microsoft 365 Copilot, Google Gemini Workspace, GitHub Copilot, vendor-provided AI assistants, third-party SaaS with AI features. Each deployment documented with scope and capabilities.
- Map data sources. What data does each AI tool access. Email, documents, code, customer data, CUI, financial data. The data scope determines the impact of successful injection.
- Map agentic capabilities. What actions can each AI tool take. Read-only, draft-and-suggest, send-on-behalf-of, modify-files, access-external-systems. Higher agency means higher attack value.
- Identify untrusted input channels. Where does the AI tool process content from outside the organization. Email from outside, shared documents from vendors and customers, web search results, calendar invites from external organizations.
- Assess output handling. Does the AI tool's output trigger further actions, or is it always reviewed by a human first. Output that triggers actions raises the risk profile.
- Review vendor security claims. What protections has the vendor implemented against prompt injection. What is the vendor's update cadence for new attack patterns.
The output is a risk-ranked inventory of AI tooling with specific exposure characteristics. Higher-risk deployments get tighter controls; lower-risk deployments get monitoring.
Best practices for AI governance and data isolation
AI governance for mid-market manufacturers focuses on bounded deployment with explicit controls.
- Define use cases before deploying. What problems is the AI tool solving. Use cases that benefit from AI agency are different from use cases that only need AI summarization. Different controls follow.
- Limit data access per tool. AI tools have access to specific data sets, not the entire environment. Email assistants have email access; document assistants have document access; the scopes do not overlap by default.
- Limit agentic capability. AI tools that take actions require explicit per-action user approval. The agency is not silent; the user confirms before any external action.
- Filter and sanitize untrusted input. Where the AI tool processes content from outside the organization, the content is screened (where vendor tooling permits) before model processing. The filtering is imperfect; it raises the cost of attack.
- Output review for high-impact actions. AI-generated outputs that drive financial, compliance, or operational decisions are reviewed by a human before execution. The model's output is a suggestion; the human authorizes.
- Monitor AI activity. Logging of AI tool usage, with anomaly detection on unusual access patterns or outputs. The monitoring catches manipulation in progress.
- Train users on indirect injection awareness. Users understand that AI summaries of untrusted content can be manipulated. The skepticism applies to AI output the same way it applies to incoming email content.
- Vendor security review. AI vendors selected based on documented prompt-injection mitigations and update cadence. Vendors that do not address the threat seriously are deprioritized.
- Integration with supply chain risk management and third-party risk. AI tools are vendors; their security posture matters.
- Pair with phishing-resistant MFA and least privilege. The broader identity and access posture matters because compromised user accounts magnify AI-tool risk.
Prompt injection FAQs
Is prompt injection the same as jailbreaking?
Related but not identical. Jailbreaking is a user-driven attempt to bypass a model's safety controls. Prompt injection is a third-party attack delivered through content the model processes. A user jailbreaking a model is the user trying to make the model do something it normally would not; prompt injection is an attacker making the model do something the user did not request.
Can a malicious email trigger Copilot to do something harmful?
Potentially yes, depending on configuration. Indirect prompt injection through email content can manipulate AI assistants (Microsoft 365 Copilot, Google Gemini Workspace, similar) when those assistants summarize, search, or act on email content. Documented attacks show malicious email content influencing summaries, exfiltrating data through follow-up prompts, and (in some configurations) triggering unintended actions.
What is the OWASP LLM Top 10?
A reference list maintained by OWASP of the top vulnerability categories in large language model applications. Prompt injection is consistently in the top three. Other categories include insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft.
Do AI vendors patch prompt-injection vulnerabilities?
Partially. Vendors continuously improve model and system safeguards against known prompt-injection patterns. The underlying problem (the model treating untrusted content as instructions) is not fully solvable through model patching; it is an architectural property. Defense relies on application-level boundaries, output filtering, and limiting model agency rather than on model patches alone.
How ARG includes AI-tool exposure in audit scope
ARG includes AI-tool exposure in audit scope for clients with deployed enterprise AI assistants. The work is led by James Wall on the infrastructure and policy side, with David Ashby contributing on the operational governance side.
The audit covers:
- Deployment inventory. Which AI tools are deployed, scope, data access, agentic capabilities.
- Data flow analysis. Where untrusted input reaches AI processing. Email, shared documents, web search, calendar, customer-uploaded files.
- Governance review. Use-case documentation, access controls, output review processes, vendor security claims.
- Targeted prompt-injection testing. Where the engagement scope and vendor terms permit, ARG runs controlled prompt-injection attempts against deployed AI tools (with executive sponsor authorization). The testing confirms whether documented mitigations actually hold against current attack patterns.
- Output handling review. Whether AI outputs trigger downstream actions without human review, and whether the workflow is appropriate to the agency level.
- Integration with broader simulation. Spear phishing, BEC, and OAuth consent simulations now include AI-tool variants where the deployment makes them relevant.
Findings consolidate into the monthly operational packet alongside the rest of the engagement. The remediation backlog includes specific AI-governance items: scope tightening, agency reduction, monitoring deployment, vendor coordination.
For founding clients with deployed AI assistants, AI-tool exposure is part of the standard engagement. For clients without AI deployments yet, ARG advises on the security implications of planned deployments as part of the broader program.
Apply as a founding client or see how the engagement works for the full delivery cycle.
Find what gets through.
ARG runs continuous AI-driven adversarial simulation and on-site physical audits for mid-market manufacturers. Two founding-client spots remain.