Is the NIST draft finalized?

As of mid-2026, the public-comment window closed in March 2026 and a revised draft has been circulating. The final version is expected later in 2026. The control structure has been stable across drafts; treating the four control areas as a working baseline is reasonable today.

Does NIST compliance equal regulatory compliance?

No, but it is a defensible reference point. NIST guidance is widely cited in US-government procurement and in several state-level AI regulatory drafts. If you can show a pentest report mapped to NIST controls, you are in a stronger position than if you cannot.

Which control area is highest priority for most teams?

Tool-use boundaries. The highest-impact agent incidents in 2025-26 have been agents firing real tools in response to manipulated inputs. This is also the area where pentest findings most often produce concrete fixes.

How much does a NIST-aligned agent pentest cost?

For a focused engagement against a single agent stack (one model, one tool set, one memory store), 1-2 engineer-weeks is realistic. For a full enterprise multi-agent fleet, 4-8 weeks. The work scales with the diversity of agents and tools, not the headcount of users.

What about supply-chain controls?

The NIST draft is light on supply-chain specifics. Supplement with your own controls for third-party MCP servers and agent dependencies. We cover those in detail in mcp-supply-chain-attacks and auditing-mcp-servers-2026.

Regulatory

Mapping NIST 2026 AI Agent Security Controls to a Pentest Plan

NIST's draft AI Agent Security guidance is the first US-government framework that names the actual surfaces of modern agent stacks: tool calls, persistent memory, indirect prompt injection, MCP-style integrations. The controls are easy to map to a pentest plan, and most of them are testable today without specialized tooling.

By Austa · Published May 21, 2026 · ~10 min read

What the draft actually covers

The NIST draft opened for public comment in February 2026, with a deadline of March 9, 2026. The published outline names four control areas: identity and authentication for agents, tool-use boundaries, data-handling for persistent context, and incident response specific to agent failure modes. This is the first time a US-government framework has been written for agent stacks rather than for chatbots.

Most of the draft does not introduce new concepts. It connects existing security expectations (access control, least privilege, logging, incident response) to the specific shapes that agent systems take. A team that already takes those expectations seriously for traditional systems will recognize the work; a team that has been deploying agents without applying them will find a long to-do list.

The four control areas mapped to test objectives

Identity and authentication for agents

NIST's framing: an agent acting on a user's behalf must have a verifiable identity, scoped credentials, and a clear delegation chain from the user. The agent should not silently use the user's full credentials for actions the user did not authorize.

Pentest objectives derived from this:

Identify what credentials the agent process has access to. Are they user-scoped or system-scoped?
Test whether the agent uses different credentials for read versus write operations. Are tool calls that modify state distinguished from those that read state?
Verify that agent actions can be attributed back to the user who authorized them. If an audit log shows "agent X called tool Y," can you trace which user's session generated it?
Attempt agent-to-agent privilege escalation. Can one agent persuade another to act on its behalf in ways neither user authorized?

Tool-use boundaries

NIST's framing: agents should have explicit, documented sets of tools per role, with bounded effects, with policy enforcement outside the LLM, and with logging at the call boundary.

Pentest objectives:

Enumerate every tool the agent can call in production. Is the list documented? Is it the same list the team thinks it is?
For each tool, identify the maximum impact of a single call. Refund amounts, data export sizes, account changes per call.
Test for tool-use outside policy. Inject the agent with a task that requires a tool call the policy should reject. Does it fire?
Test for tool-call laundering. Can the agent achieve a prohibited outcome by chaining permitted tools?
Verify logs capture every tool call with arguments, the prompt that produced it, and the final outcome.

Data handling for persistent context

NIST's framing: agent memory (per-session, cross-session, RAG context, vector stores) is data that needs the same classification, retention, and access controls as any other data the organization holds. The novel surface is that this data is also model-readable, which creates exfiltration paths classical data systems do not have.

Pentest objectives:

Inventory every persistent-context store the agent uses (conversation history, vector DB, KV cache, MCP memory server).
Classify the data each store may contain. PII, financial, health, intellectual property.
Test cross-session leakage. Can user A's session retrieve content from user B's session through a shared retriever?
Test poisoning. Can adversarial content be planted in the store so that future legitimate queries retrieve it?
Verify retention. Are old entries actually deleted on the schedule the policy specifies, or are they still queryable?

Incident response for agent failure modes

NIST's framing: agent incidents differ from web-app incidents. The trigger is often an input the team has never seen, the failure can be "did the wrong thing many times before anyone noticed," and the forensic data is conversational rather than packet-capture.

Pentest objectives:

Confirm there is a detection pipeline for anomalous agent behavior (sudden spike in a tool's usage, unusual outbound calls, rate-limit triggers).
Run a tabletop incident: a hypothetical that the agent fired a refund tool 500 times in an hour due to a prompt-injection campaign. Confirm the team can pause, investigate, and reverse.
Verify retention of conversation traces sufficient for post-incident analysis. Most teams keep too little.
Test the rollback path. Can the team disable a tool, restrict a model, or revert a memory store quickly when a finding lands?

What to test first

If you have a week to apply the framework, prioritize the controls that have the largest blast radius when broken:

Tool-call outside policy. Test whether a manipulated input can produce a tool call that violates policy. This is where most real incidents will live for the next 12-18 months.
Cross-session memory leakage. Test whether the retriever can return another user's data. This is the data-breach shape regulators understand.
Agent privilege boundaries. Confirm the agent does not silently inherit broader credentials than the user intended.
Detection and pause. Confirm anomaly detection exists and the team has a working pause path.

The other controls matter and should be in the report. But these four cover the highest-cost scenarios.

Reporting against the framework

A NIST-aligned pentest report for an agent system usually has four sections matching the four control areas, each with:

What was tested (scope, attack scenarios run, tools used)
Findings, severity-ranked, with the prompt or input that produced each
Recommendations linked to specific NIST control IDs from the draft
What was not tested and why (residual risk acknowledgment)

The residual-risk section is the one most teams underweight. The agent surface is broad enough that no single engagement covers it. Being explicit about what was out of scope is more useful than implying full coverage.

The practical read: the NIST draft is not a new threat model. It is a clean structure to organize work most security teams should have been doing anyway. If you have not run a tool-use boundary pentest on your agent in 2026, the framework gives you a defensible reason to schedule one.

What is missing from the draft

Honest observation: the draft is light on the supply-chain side. MCP servers, third-party tools, and the npm-pattern risks discussed in MCP supply-chain attacks are mentioned in passing but not deeply controlled. If you are auditing a stack with significant third-party integration, supplement the NIST controls with your own supply-chain checks.

The draft also treats persistent memory as a single category. In practice, memory comes in several shapes (per-session, cross-session, RAG, MCP memory server) with different risk profiles. A real audit needs the granularity NIST does not yet provide.

The 2026 LLM security checklist covers the controls in implementation detail.
Prompt injection against persistent agent memory covers the data-handling control area in depth.
Refund-tool hijack is a worked example of the tool-use-boundary control area in a game-backend context.
Anthropic Workspaces security model maps a real vendor's isolation controls onto the governance area of this plan.
Microsoft's Claude Code exposure audit is a worked example of the post-incident audit these controls are meant to enable.

Mapping NIST 2026 AI Agent Security Controls to a Pentest Plan

What the draft actually covers

The four control areas mapped to test objectives

Identity and authentication for agents

Tool-use boundaries

Data handling for persistent context

Incident response for agent failure modes

What to test first

Reporting against the framework

What is missing from the draft

Related