Mapping NIST 2026 AI Agent Security Controls to a Pentest Plan
NIST's draft AI Agent Security guidance is the first US-government framework that names the actual surfaces of modern agent stacks: tool calls, persistent memory, indirect prompt injection, MCP-style integrations. The controls are easy to map to a pentest plan, and most of them are testable today without specialized tooling.
What the draft actually covers
The NIST draft opened for public comment in February 2026, with a deadline of March 9, 2026. The published outline names four control areas: identity and authentication for agents, tool-use boundaries, data-handling for persistent context, and incident response specific to agent failure modes. This is the first time a US-government framework has been written for agent stacks rather than for chatbots.
Most of the draft does not introduce new concepts. It connects existing security expectations (access control, least privilege, logging, incident response) to the specific shapes that agent systems take. A team that already takes those expectations seriously for traditional systems will recognize the work; a team that has been deploying agents without applying them will find a long to-do list.
The four control areas mapped to test objectives
Identity and authentication for agents
NIST's framing: an agent acting on a user's behalf must have a verifiable identity, scoped credentials, and a clear delegation chain from the user. The agent should not silently use the user's full credentials for actions the user did not authorize.
Pentest objectives derived from this:
- Identify what credentials the agent process has access to. Are they user-scoped or system-scoped?
- Test whether the agent uses different credentials for read versus write operations. Are tool calls that modify state distinguished from those that read state?
- Verify that agent actions can be attributed back to the user who authorized them. If an audit log shows "agent X called tool Y," can you trace which user's session generated it?
- Attempt agent-to-agent privilege escalation. Can one agent persuade another to act on its behalf in ways neither user authorized?
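The attribution objective above can be checked mechanically once audit entries and delegation records are exported. The following is a minimal sketch, assuming a hypothetical log shape: the `Delegation` and `AuditEntry` types, the scope strings, and the `attribution_gaps` helper are illustrative and do not come from the NIST draft or any particular product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Delegation:
    """What a user authorized for one session (hypothetical shape)."""
    user_id: str
    session_id: str
    scopes: FrozenSet[str]


@dataclass(frozen=True)
class AuditEntry:
    """One 'agent X called tool Y' log line (hypothetical shape)."""
    agent_id: str
    tool: str
    required_scope: str
    session_id: Optional[str]


def attribution_gaps(
    entries: List[AuditEntry], delegations: List[Delegation]
) -> List[Tuple[AuditEntry, str]]:
    """Return tool calls that cannot be traced to an authorizing user."""
    by_session = {d.session_id: d for d in delegations}
    gaps = []
    for entry in entries:
        delegation = by_session.get(entry.session_id)
        if delegation is None:
            gaps.append((entry, "no user session on record"))
        elif entry.required_scope not in delegation.scopes:
            gaps.append(
                (entry, f"scope {entry.required_scope!r} not delegated by {delegation.user_id}")
            )
    return gaps
```

Every entry returned is a candidate finding: either the call has no delegation chain at all, or the agent acted beyond what the user granted.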
Tool-use boundaries
NIST's framing: agents should have an explicit, documented set of tools per role, with bounded effects, policy enforcement outside the LLM, and logging at the call boundary.
Pentest objectives:
- Enumerate every tool the agent can call in production. Is the list documented? Is it the same list the team thinks it is?
- For each tool, identify the maximum impact of a single call: refund amount, data export size, number of account changes.
- Test for tool use outside policy. Give the agent a task that requires a tool call the policy should reject. Does the call execute anyway?
- Test for tool-call laundering. Can the agent achieve a prohibited outcome by chaining permitted tools?
- Verify logs capture every tool call with arguments, the prompt that produced it, and the final outcome.
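To make "policy enforcement outside the LLM" concrete, here is a minimal sketch of a gate that sits between the model's requested tool call and the tool itself. Everything named here is an assumption for illustration: the `POLICIES` table, the `issue_refund` tool, the role name, the 50.0 cap, and the `prompt_id` field are hypothetical, not taken from the draft.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

log = logging.getLogger("tool_gate")


@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: FrozenSet[str]
    max_amount: float  # upper bound on the effect of a single call


# Hypothetical allowlist: anything not listed here is denied.
POLICIES: Dict[str, ToolPolicy] = {
    "issue_refund": ToolPolicy(frozenset({"support_agent"}), max_amount=50.0),
}


class PolicyViolation(Exception):
    pass


def gated_call(
    role: str,
    tool: str,
    args: Dict[str, Any],
    prompt_id: str,
    tools: Dict[str, Callable[..., Any]],
) -> Any:
    """Check policy in plain code, log at the call boundary, then dispatch."""
    record = {"role": role, "tool": tool, "args": args, "prompt_id": prompt_id}
    policy = POLICIES.get(tool)
    if policy is None:
        reason = "tool not in allowlist"
    elif role not in policy.allowed_roles:
        reason = "role not permitted for tool"
    elif args.get("amount", 0) > policy.max_amount:
        reason = "amount exceeds per-call cap"
    else:
        reason = None

    if reason is not None:
        record["outcome"] = f"denied: {reason}"
        log.warning(json.dumps(record))
        raise PolicyViolation(reason)

    result = tools[tool](**args)
    record["outcome"] = "ok"
    log.info(json.dumps(record))
    return result
```

A pentest against a gate like this tries to get a denied call through anyway: by finding a tool that bypasses the gate, by chaining permitted calls, or by finding arguments the checks do not consider.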
Data handling for persistent context
NIST's framing: agent memory (per-session, cross-session, RAG context, vector stores) is data that needs the same classification, retention, and access controls as any other data the organization holds. The novel surface is that this data is also model-readable, which creates exfiltration paths classical data systems do not have.
Pentest objectives:
- Inventory every persistent-context store the agent uses (conversation history, vector DB, KV cache, MCP memory server).
- Classify the data each store may contain: PII, financial, health, intellectual property.
- Test cross-session leakage. Can user A's session retrieve content from user B's session through a shared retriever?
- Test poisoning. Can adversarial content be planted in the store so that future legitimate queries retrieve it?
- Verify retention. Are old entries actually deleted on the schedule the policy specifies, or are they still queryable?
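The cross-session leakage test above is commonly run with a canary: plant a unique marker as one user, then query as another and see whether it comes back. The sketch below assumes a retriever exposing `store(user_id, text)` and `retrieve(user_id, query)`; that interface and both toy classes are hypothetical stand-ins for whatever memory layer the target actually uses.

```python
import uuid
from typing import List, Tuple


class SharedRetriever:
    """Toy store that ignores the caller on reads (deliberately vulnerable)."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, str]] = []

    def store(self, user_id: str, text: str) -> None:
        self._docs.append((user_id, text))

    def retrieve(self, user_id: str, query: str) -> List[str]:
        return [text for _, text in self._docs if query.lower() in text.lower()]


class ScopedRetriever(SharedRetriever):
    """Same toy store, but reads are filtered by owner."""

    def retrieve(self, user_id: str, query: str) -> List[str]:
        return [
            text
            for owner, text in self._docs
            if owner == user_id and query.lower() in text.lower()
        ]


def leaks_across_users(retriever) -> bool:
    """Plant a canary as user_b, then check whether user_a can retrieve it."""
    canary = f"CANARY-{uuid.uuid4().hex}"
    retriever.store("user_b", f"invoice note {canary}")
    hits = retriever.retrieve("user_a", "invoice note")
    return any(canary in hit for hit in hits)
```

Against a real agent the retrieval step goes through the agent's prompt interface rather than a direct method call, so the check becomes whether the canary string appears anywhere in user A's responses or tool arguments.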
Incident response for agent failure modes
NIST's framing: agent incidents differ from web-app incidents. The trigger is often an input the team has never seen, the failure can be "did the wrong thing many times before anyone noticed," and the forensic data is conversational rather than packet-capture.
Pentest objectives:
- Confirm there is a detection pipeline for anomalous agent behavior (sudden spike in a tool's usage, unusual outbound calls, rate-limit triggers).
- Run a tabletop incident: suppose the agent fired a refund tool 500 times in an hour because of a prompt-injection campaign. Confirm the team can pause, investigate, and reverse.
- Verify retention of conversation traces sufficient for post-incident analysis. Most teams keep too little.
- Test the rollback path. Can the team disable a tool, restrict a model, or revert a memory store quickly when a finding lands?
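The detection and pause objectives above can be illustrated with a sliding-window counter that disables a tool once its call rate exceeds a threshold. This is a minimal sketch with assumed numbers (20 calls per hour) and a hypothetical `ToolRateMonitor` class; a production pipeline would sit in the logging or gateway layer and alert a human rather than only flipping a flag.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class ToolRateMonitor:
    """Pause a tool when it is called more than max_calls times per window."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)
        self.paused: Set[str] = set()

    def record(self, tool: str, timestamp: float) -> bool:
        """Return True if the call may proceed, False if the tool is paused."""
        if tool in self.paused:
            return False
        calls = self._calls[tool]
        calls.append(timestamp)
        # Drop calls that have aged out of the window.
        while calls and calls[0] <= timestamp - self.window:
            calls.popleft()
        if len(calls) > self.max_calls:
            self.paused.add(tool)
            return False
        return True

    def resume(self, tool: str) -> None:
        """Manual re-enable after investigation."""
        self.paused.discard(tool)
        self._calls[tool].clear()


# Replay of the tabletop scenario: 500 refund attempts, one every 7 seconds.
monitor = ToolRateMonitor(max_calls=20, window_seconds=3600)
allowed = sum(monitor.record("issue_refund", i * 7.0) for i in range(500))
```

In the replay, the interesting numbers for the tabletop are how many calls got through before the pause and how long the pause took to trigger; both fall directly out of the chosen threshold and window.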
What to test first
If you have a week to apply the framework, prioritize the controls that have the largest blast radius when broken:
- Tool calls outside policy. Test whether a manipulated input can produce a tool call that violates policy. This is where most real incidents will live for the next 12-18 months.
- Cross-session memory leakage. Test whether the retriever can return another user's data. This is the data-breach shape regulators understand.
- Agent privilege boundaries. Confirm the agent does not silently inherit broader credentials than the user intended.
- Detection and pause. Confirm anomaly detection exists and the team has a working pause path.
The other controls matter and should be in the report. But these four cover the highest-cost scenarios.
Reporting against the framework
A NIST-aligned pentest report for an agent system usually has four sections matching the four control areas, each with:
- What was tested (scope, attack scenarios run, tools used)
- Findings, severity-ranked, with the prompt or input that produced each
- Recommendations linked to specific NIST control IDs from the draft
- What was not tested and why (residual risk acknowledgment)
The residual-risk section is the one most teams underweight. The agent surface is broad enough that no single engagement covers it. Being explicit about what was out of scope is more useful than implying full coverage.
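The report layout described above can be kept as structured data so that every control area always gets a section, including an explicit "not tested" list. This is a sketch under assumptions: the area keys, the `Finding` fields, and the severity labels are illustrative, and `control_id` is left as a placeholder because the draft's control identifiers are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
CONTROL_AREAS = ("identity", "tool_use", "data_handling", "incident_response")


@dataclass
class Finding:
    area: str               # one of CONTROL_AREAS
    title: str
    severity: str           # key of SEVERITY_ORDER
    triggering_input: str   # the prompt or input that produced the finding
    control_id: str = "<draft control ID>"  # placeholder, fill in from the draft


def build_report(
    findings: List[Finding], not_tested: Dict[str, List[str]]
) -> Dict[str, dict]:
    """Group findings by control area, most severe first, with residual risk."""
    report = {
        area: {"findings": [], "not_tested": not_tested.get(area, [])}
        for area in CONTROL_AREAS
    }
    for finding in findings:
        report[finding.area]["findings"].append(finding)
    for section in report.values():
        section["findings"].sort(key=lambda f: SEVERITY_ORDER[f.severity])
    return report
```

Because every area is created up front, an empty `findings` list next to an empty `not_tested` list stands out as a section nobody looked at, which is the gap the residual-risk section exists to surface.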
The practical read: the NIST draft is not a new threat model. It is a clean structure for organizing work most security teams should have been doing anyway. If you have not run a tool-use boundary pentest on your agent in 2026, the framework gives you a defensible reason to schedule one.
What is missing from the draft
Honest observation: the draft is light on the supply-chain side. MCP servers, third-party tools, and the npm-pattern risks discussed in "MCP supply-chain attacks" are mentioned in passing but not deeply controlled. If you are auditing a stack with significant third-party integration, supplement the NIST controls with your own supply-chain checks.
The draft also treats persistent memory as a single category. In practice, memory comes in several shapes (per-session, cross-session, RAG, MCP memory server) with different risk profiles. A real audit needs the granularity NIST does not yet provide.
Related
- The 2026 LLM security checklist covers the controls in implementation detail.
- Prompt injection against persistent agent memory covers the data-handling control area in depth.
- Refund-tool hijack is a worked example of the tool-use-boundary control area in a game-backend context.