AUSTA | Adversarial Intelligence

Engine Internals

The Target Harness: How Austa Connects to Chat APIs, Agents, RAG, and Browser Agents

Part 2 of the Austa engine series. The harness is the engine's boundary with reality. It is the single layer that knows what a target actually is, so that the generator, orchestrator, and judge never have to. Get the harness right and everything above it becomes target-agnostic. Get it wrong and every finding is suspect.

By Austa · Published · ~9 min read

One layer, one job: turn anything into an Exchange

The engine runs a closed loop: generate an attack, run it, judge the result, learn, repeat. The loop does not care whether the target is a raw chat-completion endpoint, a function-calling agent behind a queue, an MCP server, a RAG application, or a browser agent driving a real page. The harness is the contract that makes that indifference possible. Everything it touches comes back as one type: the Exchange.

An Exchange is a normalized record of a single interaction with the target. It carries four things, always in the same shape regardless of where they came from:

Once a target is wrapped, the rest of the engine sees only Exchanges. The judge inspects an Exchange. The orchestrator threads Exchanges into a session. The generator reads judge signal derived from Exchanges. This is the same separation of concerns that lets PyRIT point one orchestrator at many targets through a thin target abstraction, and the same idea behind the way promptfoo hides a dozen provider APIs behind a uniform provider interface. Austa generalizes that boundary so it covers not just text-in text-out models but full tool-using agents and their side effects.

The connectors

Each target class gets a connector. A connector knows how to drive one kind of system and how to read its results back into an Exchange. The set the engine ships with:

Chat-completion APIs

The simplest case. Send messages, read the completion. The connector normalizes role naming, system-prompt placement, and stop conditions across providers, and it pins sampling parameters so a run is reproducible. The output Exchange has messages and nothing else.

Tool and function-calling agents

Here the target does not just answer, it decides to act. The connector exposes a tool schema to the agent, receives the agent's tool-call requests, and routes them through the recording sandbox (below) instead of to the real implementation. The returned Exchange carries the full ordered list of tool calls with arguments, which is the part the judge cares about most: did the agent call the forbidden tool, and with what.

MCP servers

An MCP server publishes tools and resources over the Model Context Protocol. The connector speaks the protocol directly, enumerates the advertised tools, and presents them to the engine as a callable surface. This is what lets the engine treat an MCP server as a first-class target rather than something hidden behind an agent. The deeper attack surface here, tool poisoning and over-broad scopes, is covered in auditing MCP servers.

RAG applications

A RAG app retrieves before it answers. The connector captures the retrieval step so the retrieved chunks land in the Exchange's context field. That matters because many findings live in retrieval, not generation: the model behaved correctly given poisoned context. The harness makes the context visible so the judge can attribute the failure correctly.

Browser agents

The hardest connector. A browser agent reads pages and acts on them, so the harness must serve controlled page content, observe the agent's DOM-level actions, and capture both what the agent read and what it tried to do. The recorded Exchange ties the injected page content to the resulting action, which is exactly the evidence chain you need for indirect injection through browser agents.

Recording intent without firing the real side effect

This is the part that makes a pentest engine safe to run against a connected agent. If the target under test is a customer-support agent with a "process refund" tool, you absolutely want to know whether an attacker can talk it into a refund. You absolutely do not want it issuing real refunds during the test. The same logic applies to a coding agent with shell access, an agent with an HTTP-fetch tool, or anything that writes to a database.

The harness solves this with a recording sandbox: a mock tool layer that sits between the agent and every real implementation. When the agent calls a tool, the sandbox records the call name and arguments into the Exchange, then returns a plausible, schema-valid response so the conversation can continue, and never touches the real refund, shell, or HTTP endpoint. The agent believes the tool fired, the world is untouched, and the intent is captured.

Intent is the finding. The engine does not need the refund to actually process to prove the vulnerability. A recorded tool call to issue_refund with an attacker-controlled amount, produced by an attacker-controlled conversation, is the proof. Firing the real action would add risk and zero evidence.

Mock responses are deliberate, not random. A refund tool returns a success object with a synthetic transaction id; a shell tool returns canned output; a fetch tool returns a fixed document. Keeping these responses seeded is what keeps a multi-turn run deterministic, because the agent's next move depends on what the tool said. The sandbox is also where canary and tripwire tools live, so certain invocations prove a finding with no judge guesswork. That mechanism gets its own part later in the series.

Multi-turn sessions and conversation state

Most real attacks are not one shot. They build rapport over several turns, then pivot. The harness therefore models a session, not just an exchange. A session is an ordered list of Exchanges that share state: the conversation history, the agent's accumulated memory, any server-side session id the target hands back, and the cumulative tool-call ledger.

Carrying that state correctly between turns is target-specific work the connector owns. A stateless chat API needs the full message history resent every turn. A stateful agent endpoint may track its own session and only need the new user turn plus a session token. A browser agent may keep a live page open across turns. The orchestrator, which plans the multi-turn campaign, never sees this difference: it asks the harness to advance the session by one turn and gets back the next Exchange. The branching and backtracking that explores a conversation tree is the orchestrator's job, covered in its own part.

Record everything, replay exactly

Every Exchange the harness produces is written to a durable transcript: the exact messages sent, the raw response received, every tool call and its sandboxed result, the retrieved context, and the seeds and sampling parameters in force. This is not logging for debugging. It is the substance of a finding.

The payoff is deterministic replay. A finding is not "we saw the agent leak a secret once." It is a transcript that, fed back through the harness in replay mode, reproduces the same exchange every time. Replay mode reads the recorded target responses instead of calling the live target, so a finding can be reviewed, regression-tested, and shrunk to a minimal case without burning live calls or depending on the target staying online. Deterministic transforms in the generator are seeded pure functions and the stochastic parts run with pinned sampling, so the pipeline replays from its transcript even though parts of it are probabilistic. That recording is the difference between a one-off demo and a regression test.

The hard parts

The clean Exchange abstraction hides real engineering, and honest design fiction names where it strains.

Agents that hide their tool calls

Some agents do not surface their internal tool calls in their visible output, or they summarize them away. A harness that only reads the final assistant message would miss the dangerous call entirely. The connector has to capture tool invocations at the protocol or instrumentation layer, not by parsing prose, which is why the recording sandbox sits inline on the tool path rather than scraping the transcript after the fact.

Streaming

Streaming responses arrive as token deltas and interleaved tool-call fragments. The connector reassembles the stream into a complete, ordered Exchange before handing it up, so the rest of the engine never deals with partial state. Tool-call arguments that stream in pieces are buffered until the call is whole, because judging half an argument is worse than useless.

Nondeterminism

Even with fixed sampling, some targets are not fully reproducible: provider-side routing, model updates, and time-dependent tools all introduce drift. The harness cannot make a nondeterministic target deterministic, so it does the next best thing: it records enough that the engine can tell a genuine behavior change from noise, and it replays from the transcript for analysis rather than re-querying. When a finding stops reproducing on the live target, that is itself signal worth surfacing.

Rate limits

An adversarial loop is chatty. The harness centralizes backoff, retry, and concurrency control per target so that the orchestrator's parallelism does not trip the target's limits. Retries are recorded too, because a response that only came back after three 429s is a different observation than a clean first hit.

Testing through a guardrail

Many targets sit behind a defense layer such as NeMo Guardrails. The harness treats that layer as part of the system under test, not something to bypass. It wraps the guarded endpoint so the engine measures what actually reaches the model and what the guardrail blocks. A finding that only fires when the guardrail is misconfigured is a real finding, and the harness is what makes the guardrail's behavior observable in the Exchange.

Why the boundary matters

The harness is deliberately the least glamorous subsystem in the engine, and the most load-bearing. It is the only place that knows the messy truth of a specific target. By absorbing that mess into one normalized type, one recording sandbox, and one transcript format, it lets the corpus, generator, orchestrator, and judge stay simple and target-agnostic. A coding agent with shell access, whose blast radius is mapped in our shell-access threat model, plugs into the same Exchange contract as a plain chat endpoint. That uniformity is what lets one engine pentest the whole spectrum of LLM systems with the same closed loop.

The Austa engine series

Related reading