Security Engineering · Pillar Reference
The 2026 LLM Security Checklist: 47 Controls Across 7 Categories
A canonical reference for auditing LLM-powered applications before production. Walk it end-to-end, answer each control yes, no, or partial, and you will surface every wedge an attacker is likely to use. Free PDF below.
Why this checklist exists
Every product shipping an LLM feature in 2026 is also shipping a fresh attack surface. The application security playbook teams learned for REST APIs (rate limit, validate inputs, scope tokens, log everything) does not directly transfer. Prompts are not request bodies. Tool calls are not RPCs. Vector stores are not relational tables. The threats that bite are the ones engineers do not yet have a reflex for.
This is the checklist we wish every team had before incident response had to write the postmortem. It is built from real attack patterns observed against production LLM apps, organized into seven categories that incident retrospectives keep returning to. 47 controls total. Designed to be walked in a single 3-to-5-hour session by an engineer familiar with the system.
Scope: this checklist covers security (what an attacker can do to your system). It is not an AI safety checklist (what your system might do on benign input, like hallucinations or bias). Those are separate evaluation methodologies. The overlap is at the boundary (moderation bypass is both).
How to use this checklist
For each control, mark one of three answers:
- Yes: this control is implemented and verified in the current production system.
- No: this control is not implemented. Document the residual risk.
- Partial: partially implemented. Document what is done and what is missing.
Walking the full checklist for the first time usually surfaces 8 to 15 partial-or-no answers. That is normal. The checklist is a starting point for a backlog, not a pass-fail gate. Re-walk after every major change to prompts, tools, or the LLM provider. Re-walks take 30 to 45 minutes once the system is documented.
Free PDF checklist
One page per category, checkboxes, notes columns. Print or fill in on screen. Updated quarterly.
Download PDF (v2026.05) No email required. Versioning matches release date.1. Input attack surface (7 controls)
The first audit lane: every player-, user-, or external-system-controlled input that ends up in an LLM prompt or context window. The mental model is "any byte the attacker writes that the model reads."
- I-1. Enumerated inputs. You have a written list of every user-controlled field that reaches the LLM (chat messages, ticket text, profile bios, document uploads, query parameters, headers).
- I-2. Indirect inputs identified. You have identified every external data source the LLM reads (RAG documents, search results, third-party API responses, web pages fetched by tools). Indirect prompt injection comes through these.
- I-3. Length limits enforced server-side. Each input has a maximum size enforced by the application server, not just the LLM. Cost-amplification attacks die at this control.
- I-4. Encoding normalized before scoring. Inputs are normalized (Unicode form, lowercased for keyword checks, base64-decoded for scanning) before any moderation or filter runs. Obfuscation bypasses die here.
- I-5. File uploads scoped. Document and image uploads are size-capped, type-allowlisted, and stripped of metadata before reaching the LLM. EXIF and PDF metadata can carry hidden instructions.
- I-6. Multi-turn context is auditable. You can reconstruct the exact prompt+history that was sent to the LLM for any given user session, for replay and incident investigation.
- I-7. Rate limits per identity. Per-user, per-session, and per-IP request limits in place. LLM cost and latency are unbounded without this.
2. Prompt construction (8 controls)
How user input is composed into the prompt the LLM actually sees. Most prompt injection is a prompt construction failure, not a model failure.
- P-1. System prompt is template-static. The system prompt does not include user input. User content goes into the user role only.
- P-2. User input is wrapped with delimiters. User content is wrapped with unambiguous delimiters (XML tags, fenced blocks) that the system prompt instructs the model to treat as data, not instructions.
- P-3. Role boundaries are enforced. Untrusted content (search results, RAG passages, tool outputs) is never inserted into the system or assistant role. Always user role with clear labeling.
- P-4. Instruction phrases in user input are flagged. Detection for phrases like "ignore previous instructions", "you are now", "system:" in user input. Either reject, strip, or log+escalate.
- P-5. Prompt and template are versioned. Every prompt and template is in source control with a version identifier that is logged on every LLM call.
- P-6. No secrets in the prompt. No API keys, internal URLs, customer PII, or other secrets inside the system prompt. Leakage is one prompt-injection away.
- P-7. Cross-tenant data is scoped before the LLM. When the prompt includes data, the data is already filtered to the requesting user's authorization scope. Never push raw multi-tenant data and ask the model to filter.
- P-8. Few-shot examples are review-gated. Few-shot examples in the prompt have been reviewed by a security-aware engineer. Examples are a documented vector for jailbreaks because the model treats them as authoritative.
3. Output handling (6 controls)
What you do with the LLM's response before it reaches a user, a tool, or a downstream system. Output handling is where prompt injection becomes RCE in the worst cases.
- O-1. Outputs are not eval'd or exec'd. LLM output is never passed to
eval,exec, shell, SQL string-concat, or any code-execution path. Tool calls go through structured schemas. - O-2. Markdown/HTML is sanitized. If LLM output is rendered as Markdown or HTML, it goes through a sanitizer that strips scripts, iframes, and dangerous URI schemes (
javascript:,data:). - O-3. URLs in output are checked. URLs the LLM emits to users are either allowlisted, link-shimmed (warning page), or treated as untrusted in the UI.
- O-4. Output length is capped. Maximum response tokens enforced. Defends against verbose-output exfiltration and runaway cost.
- O-5. Output classification before action. If output triggers downstream actions (file write, email send, payment), the output passes a classifier or schema check first, not just a "the model said yes."
- O-6. PII redaction post-generation. Outputs are scanned for PII (emails, phone numbers, internal IDs) and redacted or blocked depending on the channel.
4. Tool and agent use (7 controls)
Tools are how LLMs get teeth. They are also how a prompt injection becomes a real-world action. Every tool the agent can call expands the blast radius.
- T-1. Tools have explicit schemas. Every tool the LLM can call has a documented input schema, output schema, and authorization scope. No free-form function dispatching.
- T-2. Tool inputs are validated. Tool arguments are validated by the application before execution, not trusted because the LLM emitted them.
- T-3. Authorization checked per tool call. Each tool call is authorized as the calling user, not as the LLM service account. The LLM does not get to elevate privileges.
- T-4. Destructive tools require confirmation. Tools that modify state (refund, delete, send email, transfer funds) require a separate confirmation step. Either user-in-the-loop or a policy-approved second LLM check.
- T-5. Tool budget per session. Maximum tool calls per session and per user, with a hard ceiling. Recursive tool-call loops have a depth limit.
- T-6. Tool errors do not leak. Tool errors that flow back to the LLM are sanitized so they do not reveal infrastructure details (internal hostnames, stack traces, secrets).
- T-7. Agent plans are logged and replayable. The full sequence of tool calls per session is logged with arguments and timestamps. Investigations need this.
5. Memory and RAG (6 controls)
The state that persists between sessions is the new attack surface. Long-lived memory, vector stores, and RAG corpora all carry attacker-controlled bytes forward in time.
- M-1. Memory is scoped to identity. Per-user (and per-tenant) memory namespaces enforced at the storage layer, not the application layer. Cross-tenant memory leaks are impossible by construction.
- M-2. Memory entries are signed by source. Each memory or RAG document carries metadata about who wrote it (user, system, third party) and when. The prompt instructs the model to treat each source differently.
- M-3. Memory writes are size-bounded. Per-write and per-day quotas on what a single identity can persist. Memory poisoning attacks need volume.
- M-4. Memory can be inspected and deleted by users. Users can list and delete their own memory entries. Compliance plus security: a poisoned entry is removable.
- M-5. RAG sources are provenance-tracked. Every chunk in the RAG index points back to its source URL, document, or upload. Citation in the response is mandatory, not optional.
- M-6. External fetches go through an allowlist or proxy. If a tool fetches a URL the LLM emitted, the request goes through a proxy that enforces an allowlist, strips cookies, and prevents SSRF.
6. Identity and authorization (6 controls)
Who is the LLM acting on behalf of, and what is that identity allowed to do? Get this wrong and the attacker does not need to break the LLM at all.
- A-1. LLM calls carry user identity. Every LLM call is associated with an authenticated user identity, server-side. Anonymous or shared-token calls are a red flag.
- A-2. Service tokens are not user tokens. The LLM provider API key is a service credential. It is never substituted for a user's auth token when calling internal systems on the user's behalf.
- A-3. Tool calls inherit user scope. When a tool runs, it runs with the calling user's permission grant, not the LLM service account's. No privilege upgrade by way of an LLM hop.
- A-4. Multi-tenant boundaries are pre-filtered. When the prompt or tool input includes tenant-scoped data, that data was already filtered to the requesting tenant. The model is never expected to enforce tenant isolation.
- A-5. Admin actions require human approval. An LLM never autonomously executes an action restricted to admins (role grants, billing changes, account deletion). The model can suggest; humans approve.
- A-6. Auth failures are not leaked to the model. When a tool call fails on auth, the error returned to the model is generic ("not authorized") with no detail about what scope was missing. Avoid teaching the model your authorization layout.
7. Monitoring and observability (7 controls)
You cannot defend what you cannot see. LLM observability lags HTTP observability by years; closing the gap is the highest-leverage control left after the others are in place.
- L-1. Every LLM call is logged. Full prompt, full response, model version, prompt template version, latency, token counts, and the calling user identity logged for every call. Retention policy documented.
- L-2. Tool calls are logged with arguments. Same standard for tool calls. Arguments are logged (with secret redaction). Tool latency and exit status logged.
- L-3. Anomaly detection on call patterns. Alerts for sudden spikes in token use, tool-call frequency, refusal rate, or per-user request rate. Most attacks have a footprint in these.
- L-4. Refusal and jailbreak signals are tracked. A counter for outputs that contain refusal phrases, instruction-pattern echoes, or other "the model is being attacked" signals. Trends are watched.
- L-5. Cost per user is tracked. Per-user LLM cost is queryable in production. Cost amplification attacks show up here first.
- L-6. Logs are reviewable by incident responders. The on-call engineer can search LLM logs the same way they search HTTP logs. No "the prompts are in a different system you do not have access to."
- L-7. Adversarial red-team runs scheduled. A red-team or automated adversarial suite runs against the production prompts at least monthly. Output deltas are reviewed.
Use this checklist
Walk the seven categories in order, top to bottom. Most teams find the input attack surface and prompt construction categories the easiest to fix and the most impactful. Tool/agent use is the highest blast-radius category if the system has tools; treat any No/Partial there as urgent.
After the first walk-through, file every No and Partial as an issue with a due date. Re-walk the full checklist quarterly. Re-walk just the affected categories after any major prompt, tool, or model change.
Download the PDF checklist
One page per category, with checkboxes for Yes / No / Partial and a notes column. Print or annotate on screen.
Get the PDF (v2026.05) No email required. Republish freely with attribution.How this maps to OWASP Top 10 for LLMs
OWASP Top 10 for LLM Applications ranks the most common risk classes. This checklist breaks those classes into specific auditable controls. Quick mapping:
- LLM01 Prompt Injection: covered by Input (I-1..I-4), Prompt (P-1..P-8), Memory (M-2, M-5, M-6).
- LLM02 Insecure Output Handling: Output (O-1..O-6).
- LLM03 Training Data Poisoning: not covered here (this checklist is application-layer; training data is a model-development concern).
- LLM04 Model Denial of Service: Input (I-3, I-7), Output (O-4), Monitoring (L-3, L-5).
- LLM05 Supply Chain: partially covered by Tools (T-1) and Memory (M-6) where third-party data crosses into the prompt.
- LLM06 Sensitive Information Disclosure: Prompt (P-6, P-7), Output (O-6), Auth (A-4).
- LLM07 Insecure Plugin Design: entire Tools section.
- LLM08 Excessive Agency: Tools (T-3, T-4), Auth (A-3, A-5).
- LLM09 Overreliance: out of scope; this is a UX/communication issue, not a security control.
- LLM10 Model Theft: out of scope; covered by general API auth.
Related articles
- How to Pentest the LLM Layer in a Live Game Backend
- Prompt Injection Through Agent Memory
- Refund Tool Hijack in an LLM Support Agent
- KV Leak Channels in Agent State
- Mapping NIST 2026 AI Agent Security Controls to a Pentest Plan
- Auditing MCP Servers in 2026: Vulnerabilities & Self-Test Checklist
- Encoding-Smuggling Prompt Injection: Base64, Hex, Unicode-Escape
- Multi-Turn Jailbreak Attacks: Crescendo, Sugar-Coated Poison, Defense-in-Depth
Adjacent platforms worth walking this against
If your LLM application sits on top of any of these, the platform layer has its own surface that this checklist applies to:
- Supercraft GSB: managed game backend (matchmaking, dedicated servers, leaderboards, economy, auth). Studios baking LLM features into NPC dialog, moderation, or support agents inherit the prompt-construction (Category 2) and tool-use (Category 4) surface on top of GSB.
- memnode: agent memory architecture with episodic-to-semantic promotion and confidence-weighted recall. If you persist conversational state, walk the Memory + RAG category (Category 5) carefully; M-1 (memory scoped to identity) and M-3 (memory writes size-bounded) are the high-leverage controls.
- UsageBox: usage-based billing for LLM applications. If you are already metering LLM tokens, tool calls, or agent runtime per customer, you have most of the data needed for the Monitoring category (Category 7), especially L-5 (cost per user tracked). The open-source storage engine is at usagedb on GitHub.
Bi-weekly AI Security Brief
If you found this useful, the Austa team publishes a bi-weekly newsletter rounding up the most useful incidents (jailbreaks, agent attacks, data leaks, regulatory news) from the past two weeks. Curated, no fluff. Subscribe via the homepage.
License and republishing
This checklist is published under CC BY 4.0. Translate, adapt, embed, or republish freely with attribution to Austa (link back to this page). Pull requests with corrections or additions are welcome (we will publish the v2026.06 update in August).