AI security

LLM red teaming guide: test AI systems before attackers and users do

A practical LLM red teaming guide for prompt injection, jailbreaks, data leakage, tool misuse, RAG attacks, agent safety, adversarial testing, evals, and remediation.

Updated 2026-06-1110 min readIntermediate

Read LLM guardrails guide Read evaluation tools guide

AI Buyer Readiness Scorecard

Turn this guide into procurement, security, ROI, rollout, and governance questions.

Use the scorecard before opening vendor pricing pages. It keeps commercial AI research tied to the workflow, data risk, operating cost, and evidence buyers need before a shortlist becomes a purchase.

Procurement trigger

Define the business event behind the search: budget review, renewal, security review, failed pilot, new workflow, or vendor consolidation.

Data and security review

Check whether prompts, files, logs, embeddings, customer records, regulated data, or source code will touch the AI system.

ROI and operating cost

Estimate seat cost, API usage, implementation time, review effort, support load, fallback work, and expected workflow savings.

Integration and rollout path

Map the tools, identity systems, data sources, approval steps, change management, and users needed for a real deployment.

Governance evidence

Collect policies, evals, audit logs, human review rules, incident response, vendor terms, and owner names before procurement asks.

Best for

Security teams testing LLM applications and agents
RAG teams worried about prompt injection and data leakage
Developers shipping tool-using AI systems
Product leaders adding AI safety release gates

Not for

A guarantee that all attacks can be eliminated
Encouraging abuse of third-party systems
Replacing secure architecture, permission design, or human review

Comparison

Choose by workflow, not brand

Option	Best for	Strengths	Tradeoffs	Use when
Manual red teaming	Novel workflows, human judgment, exploratory testing, and business-specific abuse cases	Finds creative failures that automated suites often miss.	Harder to scale and reproduce without strong documentation.	The product is new, high-impact, or has unusual tool and data flows.
Automated adversarial evals	Regression testing, prompt changes, model upgrades, and repeated security checks	Scales across releases and makes fixes measurable.	Can miss novel attacks and can create false confidence if cases are shallow.	Known risk categories need to be checked on every release.
External or specialist red team	High-risk, enterprise, regulated, or public-facing systems	Adds independence, domain expertise, and adversarial perspective.	Requires scope, safe testing rules, data handling, and remediation ownership.	The system can affect money, safety, privacy, reputation, or regulated workflows.

Scope the system, not just the model

LLM red teaming should test the full application: prompts, retrieved documents, tool permissions, memory, file uploads, output handling, external APIs, logging, and human handoff.

Test direct prompt injection and indirect prompt injection inside retrieved content.
Test whether model output can trigger unsafe downstream actions.
Test access-control failures in RAG, memory, and tool results.

Create a risk-based test catalog

Use known taxonomies such as OWASP LLM risks, then add product-specific abuse cases. A support bot, coding agent, finance assistant, and voice agent should not share the exact same test plan.

Include prompt injection, sensitive disclosure, excessive agency, insecure output handling, and supply-chain risks.
Include business abuse: fraud, policy bypass, account takeover support, and false claims.
Include refusal and escalation tests for uncertain or regulated requests.

Turn findings into regression tests

A red-team finding is not closed when a prompt is edited. It is closed when the system has a durable control, a test case, an owner, and monitoring for recurrence.

Add failing prompts to CI or pre-release eval suites.
Classify fixes as prompt, retrieval, permission, validation, UX, or policy changes.
Track residual risk when the system cannot fully eliminate a failure mode.

Decision Rules

A practical checklist

Red-team the full LLM application, not only the base model.

Use manual testing for novel high-risk workflows and automated evals for regression.

Escalate external red teaming for public, regulated, or high-impact AI systems.

Convert every meaningful finding into a test, owner, and remediation record.

Related Guides

Continue the decision path

Read LLM guardrails guide

Turn red-team findings into layered controls.

Turn this guide into procurement, security, ROI, rollout, and governance questions.

Procurement trigger

Data and security review

ROI and operating cost

Integration and rollout path

Governance evidence

Best for

Not for

Choose by workflow, not brand

Scope the system, not just the model

Create a risk-based test catalog

Turn findings into regression tests

A practical checklist

Continue the decision path

Read LLM guardrails guide

Read evaluation tools guide

LLM guardrails guide

Enterprise RAG security checklist

LLM evaluation tools

Aligned deeper reading

AI security and privacy archive

AI agent archive

Explore the wider search cluster

RAG and models

Security and governance

See this guide in a buyer workflow

Cybersecurity AI

IT operations AI

Common questions

What is LLM red teaming?

Can red teaming prove an AI system is safe?

Should red teaming happen before or after launch?

Primary references used for this guide

OpenAI safety best practices

Microsoft AI Red Team

PyRIT documentation

OWASP Top 10 for LLM Applications

Build your own evaluation note