AUSTA | Adversarial Intelligence

Regulatory

Mapping NIST 2026 AI Agent Security Controls to a Pentest Plan

NIST's draft AI Agent Security guidance is the first US-government framework that names the actual surfaces of modern agent stacks: tool calls, persistent memory, indirect prompt injection, MCP-style integrations. The controls are easy to map to a pentest plan, and most of them are testable today without specialized tooling.

By Austa · Published · ~10 min read

What the draft actually covers

The NIST draft opened for public comment in February 2026, with a deadline of March 9, 2026. The published outline names four control areas: identity and authentication for agents, tool-use boundaries, data-handling for persistent context, and incident response specific to agent failure modes. This is the first time a US-government framework has been written for agent stacks rather than for chatbots.

Most of the draft does not introduce new concepts. It connects existing security expectations (access control, least privilege, logging, incident response) to the specific shapes that agent systems take. A team that already takes those expectations seriously for traditional systems will recognize the work; a team that has been deploying agents without applying them will find a long to-do list.

The four control areas mapped to test objectives

Identity and authentication for agents

NIST's framing: an agent acting on a user's behalf must have a verifiable identity, scoped credentials, and a clear delegation chain from the user. The agent should not silently use the user's full credentials for actions the user did not authorize.

Pentest objectives derived from this:
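One objective that follows directly from this framing is verifying that each hop in the delegation chain can only narrow what the user granted. The sketch below assumes the tester can export the chain as grantor/grantee/scope records. The `Hop` type, the `audit_chain` function, the agent names, and the scope strings are all hypothetical illustrations, not anything the draft specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hop:
    """One delegation hop (hypothetical model): grantor hands grantee a set of scopes."""
    grantor: str
    grantee: str
    scopes: FrozenSet[str]


def audit_chain(user: str, user_scopes: FrozenSet[str], chain: List[Hop]) -> List[str]:
    """Return findings for a delegation chain that starts at `user`.

    A hop is a finding if it breaks the chain (its grantor is not the previous
    grantee) or claims scopes its grantor never held (silent escalation).
    """
    findings = []
    holder, held = user, user_scopes
    for i, hop in enumerate(chain):
        if hop.grantor != holder:
            findings.append(f"hop {i}: broken chain, {hop.grantor!r} is not {holder!r}")
            break
        extra = hop.scopes - held
        if extra:
            findings.append(f"hop {i}: {hop.grantee!r} gained {sorted(extra)} beyond {hop.grantor!r}")
        # Only scopes the grantor actually held carry forward to the next hop.
        holder, held = hop.grantee, hop.scopes & held
    return findings


user_scopes = frozenset({"calendar:read", "mail:read"})
chain = [
    Hop("alice", "planner-agent", frozenset({"calendar:read"})),
    Hop("planner-agent", "mail-tool", frozenset({"mail:read", "mail:send"})),
]
findings = audit_chain("alice", user_scopes, chain)
for f in findings:
    print(f)
```

In this example `mail-tool` is flagged even for `mail:read`: alice held that scope, but the planner was never given it, so it cannot pass it on.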

Tool-use boundaries

NIST's framing: each agent role should get an explicit, documented set of tools with bounded effects. Policy should be enforced outside the LLM, and every call logged at the call boundary.

Pentest objectives:
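The core of this control is that policy lives outside the LLM: the model may propose any tool call, but deterministic code decides whether it runs and logs the decision either way. A minimal sketch of such a gate, assuming a per-role allowlist with one argument check per tool; the roles, tool names, and the 100-unit refund cap are hypothetical. The matching pentest objective is to drive the agent with manipulated inputs and confirm that every call reaching a tool passed through a gate like this, and that rejected calls appear in the log too.

```python
import json
import logging
from typing import Callable, Dict

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("tool-boundary")

# Hypothetical policy: role -> tool -> argument check. Anything absent is denied.
POLICY: Dict[str, Dict[str, Callable[[dict], bool]]] = {
    "support": {
        "lookup_order": lambda a: isinstance(a.get("order_id"), str),
        "refund": lambda a: (
            isinstance(a.get("amount"), (int, float)) and 0 < a["amount"] <= 100
        ),
    },
    "viewer": {
        "lookup_order": lambda a: isinstance(a.get("order_id"), str),
    },
}


class PolicyViolation(Exception):
    """Raised when a model-proposed tool call falls outside policy."""


def gate(role: str, tool: str, args: dict) -> None:
    """Check a proposed tool call before execution; log every decision."""
    check = POLICY.get(role, {}).get(tool)
    allowed = bool(check is not None and check(args))
    # Structured log line at the call boundary, written whether or not the call is allowed.
    log.info(json.dumps({"role": role, "tool": tool, "args": args, "allowed": allowed}, default=str))
    if not allowed:
        raise PolicyViolation(f"{role!r} may not call {tool!r} with {args!r}")


gate("support", "refund", {"amount": 40})  # within policy, so no exception
for role, tool, args in [("support", "refund", {"amount": 5000}), ("viewer", "refund", {"amount": 1})]:
    try:
        gate(role, tool, args)
    except PolicyViolation as e:
        print("blocked:", e)
```

The default is deny: an unknown role, an unlisted tool, or a failed argument check all raise, so a prompt that invents a new tool name gets nowhere.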

Data handling for persistent context

NIST's framing: agent memory (per-session, cross-session, RAG context, vector stores) is data that needs the same classification, retention, and access controls as any other data the organization holds. The novel surface is that this data is also model-readable, which creates exfiltration paths classical data systems do not have.

Pentest objectives:
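Cross-session leakage is easiest to test with a canary: plant a unique, meaningless string in one user's memory, then query as a different user and check whether it comes back. Because the string cannot occur naturally, any hit is an unambiguous leak. The `ToyMemory` class below is a hypothetical stand-in with naive keyword retrieval so the sketch runs on its own. Against a real system, the tester would point `canary_leaks` at the deployed retriever and phrase the probe to be semantically close to the planted text, since production retrievers rank by similarity.

```python
import secrets
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    owner: str
    text: str


class ToyMemory:
    """Hypothetical stand-in for a vector store, ranked by naive keyword overlap."""

    def __init__(self, enforce_owner: bool):
        self.records: List[Record] = []
        self.enforce_owner = enforce_owner

    def add(self, owner: str, text: str) -> None:
        self.records.append(Record(owner, text))

    def retrieve(self, requester: str, query: str, k: int = 5) -> List[str]:
        terms = set(query.lower().split())
        # The control under test: restrict the candidate pool to the requester's own records.
        pool = [r for r in self.records if not self.enforce_owner or r.owner == requester]
        ranked = sorted(pool, key=lambda r: len(terms & set(r.text.lower().split())), reverse=True)
        return [r.text for r in ranked[:k]]


def canary_leaks(memory, victim: str, attacker: str) -> bool:
    """Plant a unique canary in the victim's memory, then probe for it as the attacker."""
    canary = f"canary-{secrets.token_hex(8)}"
    memory.add(victim, f"my account recovery phrase is {canary}")
    hits = memory.retrieve(attacker, "what is my account recovery phrase")
    return any(canary in hit for hit in hits)


print(canary_leaks(ToyMemory(enforce_owner=False), "alice", "bob"))  # True: leak
print(canary_leaks(ToyMemory(enforce_owner=True), "alice", "bob"))   # False
```

A fresh canary per run keeps results from one test from contaminating the next, and the same function works for RAG indexes and cross-session chat memory alike as long as both expose a per-user query path.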

Incident response for agent failure modes

NIST's framing: agent incidents differ from web-app incidents. The trigger is often an input the team has never seen, the failure can be "did the wrong thing many times before anyone noticed," and the forensic record is conversation logs rather than packet captures.

Pentest objectives:
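The "did the wrong thing many times" failure mode suggests a concrete objective: confirm that some rate-based check on tool calls exists and that tripping it actually halts execution until a human intervenes. A minimal sketch of such a breaker, assuming a per-tool sliding window. The class name, the thresholds, and the rule that only `resume()` clears a pause are hypothetical design choices, not draft requirements.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


class ToolCallBreaker:
    """Pause the agent when one tool is called more than `limit` times within `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: Dict[str, Deque[float]] = {}
        self.paused = False
        self.reason: Optional[str] = None
        self.resumed_by: Optional[str] = None

    def record(self, tool: str) -> bool:
        """Record one call. Returns False, and latches the pause, once the rate is exceeded."""
        if self.paused:
            return False
        now = self.clock()
        q = self.calls.setdefault(tool, deque())
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.limit:
            self.paused = True
            self.reason = f"{tool}: {len(q)} calls in {self.window}s"
            return False
        return True

    def resume(self, operator: str) -> None:
        """Clear the pause and call history; record which operator did it."""
        self.paused = False
        self.reason = None
        self.resumed_by = operator
        self.calls.clear()


# Fake clock so the example is deterministic: one call per second.
t = [0.0]
breaker = ToolCallBreaker(limit=3, window=10.0, clock=lambda: t[0])
results = []
for step in range(5):
    t[0] = float(step)
    results.append(breaker.record("send_email"))
print(results)  # [True, True, True, False, False]
```

The pause latches: once tripped, every call is refused until an operator resumes, which is the property a pentester should verify end to end (the agent must check `record()` before executing, not after).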

What to test first

If you have a week to apply the framework, prioritize the controls that have the largest blast radius when broken:

  1. Tool-call outside policy. Test whether a manipulated input can produce a tool call that violates policy. This is where most real incidents will live for the next 12-18 months.
  2. Cross-session memory leakage. Test whether the retriever can return another user's data. This is the data-breach shape regulators understand.
  3. Agent privilege boundaries. Confirm the agent does not silently inherit broader credentials than the user intended.
  4. Detection and pause. Confirm anomaly detection exists and the team has a working pause path.

The other controls matter and belong in the report, but these four cover the highest-cost scenarios.

Reporting against the framework

A NIST-aligned pentest report for an agent system usually has four sections matching the four control areas, each with:

The residual-risk section is the one most teams underweight. The agent surface is broad enough that no single engagement covers it. Being explicit about what was out of scope is more useful than implying full coverage.
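One way to keep the residual-risk section from being skipped is to make it structural: every control-area section renders an out-of-scope block, and an empty one is printed as a visible gap rather than omitted. A sketch under that assumption; the `Section` fields and the sample finding are hypothetical, and a real report would carry whatever per-section detail the engagement calls for.

```python
from dataclasses import dataclass, field
from typing import List

# The four control areas from the NIST draft, in report order.
AREAS = (
    "Identity and authentication for agents",
    "Tool-use boundaries",
    "Data handling for persistent context",
    "Incident response for agent failure modes",
)


@dataclass
class Section:
    """Hypothetical per-area record: what was found and what was not covered."""
    area: str
    findings: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)


def render(sections: List[Section]) -> str:
    """Render all four areas; missing sections and empty scope notes stay visible."""
    by_area = {s.area: s for s in sections}
    lines = []
    for area in AREAS:
        s = by_area.get(area, Section(area))
        lines.append(area)
        lines.extend(f"  - {f}" for f in s.findings or ["No findings recorded."])
        lines.append("  Out of scope / residual risk:")
        lines.extend(f"  - {o}" for o in s.out_of_scope or ["NOT DECLARED: state what was not tested."])
        lines.append("")
    return "\n".join(lines)


report = render([
    Section(
        "Tool-use boundaries",
        findings=["Example: refund tool accepted amounts above the documented cap."],
        out_of_scope=["Third-party MCP servers were not tested."],
    ),
])
print(report)
```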

The practical read: the NIST draft is not a new threat model. It is a clean structure to organize work most security teams should have been doing anyway. If you have not run a tool-use boundary pentest on your agent in 2026, the framework gives you a defensible reason to schedule one.

What is missing from the draft

Honest observation: the draft is light on the supply-chain side. MCP servers, third-party tools, and the npm-pattern risks discussed in MCP supply-chain attacks are mentioned in passing but not deeply controlled. If you are auditing a stack with significant third-party integration, supplement the NIST controls with your own supply-chain checks.
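As one example of such a supplementary check, a tester can flag MCP servers launched from unpinned packages, since an unpinned `npx` launch may resolve to a different release than the one that was reviewed. The sketch assumes the `mcpServers` / `command` / `args` JSON shape that several MCP clients (Claude Desktop among them) use for local servers; other clients store this differently, the package names are hypothetical, and only `npx` launches are parsed. Pinning is a floor, not a full supply-chain review: it says nothing about whether the pinned version is trustworthy.

```python
import json
import re
from typing import List

# Exact semver such as 1.4.0 or 1.4.0-beta.1; ranges and tags like ^1.4 or latest fail.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$")


def unpinned_servers(config: dict) -> List[str]:
    """Return findings for MCP servers whose npx package is not pinned to an exact version."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        command = server.get("command", "")
        args = server.get("args", [])
        if command != "npx":
            findings.append(f"{name}: launcher {command!r} not parsed, review manually")
            continue
        # First non-flag argument is taken as the package spec (a simplification).
        spec = next((a for a in args if not a.startswith("-")), None)
        if spec is None:
            findings.append(f"{name}: no package argument found")
            continue
        # An '@' at index 0 is a scope marker, not a version separator.
        at = spec.rfind("@")
        version = spec[at + 1:] if at > 0 else ""
        if not EXACT_VERSION.match(version):
            findings.append(f"{name}: {spec!r} is not pinned to an exact version")
    return findings


example = json.loads("""
{"mcpServers": {
  "notes":  {"command": "npx", "args": ["-y", "@example/notes-server"]},
  "pinned": {"command": "npx", "args": ["-y", "@example/notes-server@1.4.0"]},
  "ranged": {"command": "npx", "args": ["-y", "example-search@^2.0.0"]},
  "local":  {"command": "python", "args": ["server.py"]}
}}
""")
for finding in unpinned_servers(example):
    print(finding)
```

Non-`npx` launchers are reported rather than silently passed, so the output doubles as a list of servers that still need manual review.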

The draft also treats persistent memory as a single category. In practice, memory comes in several shapes (per-session, cross-session, RAG, MCP memory server) with different risk profiles. A real audit needs the granularity NIST does not yet provide.
