What kinds of data do LLM agents typically store in KV?

Session keys mapped to conversation IDs, conversation IDs mapped to the recent message buffer, intermediate tool-call results, partial outputs from streaming generations, rate-limit counters per user, feature flags per tenant, cached embeddings, and short-lived auth tokens. Most agent stacks split state across a fast KV layer and a slower durable store, with the KV layer holding everything in the hot path.

Why is the KV layer behind an agent a security concern?

Because most teams treat KV as plumbing rather than as an attack surface. Keys are often predictable (user_id prefixes, conversation timestamps), TTLs reveal session state, error messages reflect key existence, and prefix scans, where allowed, let an attacker enumerate everything for a target user. Each of those is a leak channel even when the application-level auth is correctly enforced.

What is the conversation-ID enumeration attack?

Agent stacks frequently use predictable conversation IDs (sequential, timestamp-based, or short random IDs with low entropy). An attacker who knows the format can probe for the existence of other users' conversations by checking which IDs return content versus a 404. Even without read access to the content, the existence signal alone leaks usage volume, active hours, and conversation cadence.

How does TTL inference work as a side channel?

If a KV entry is created when an agent action fires and removed by TTL when the action finishes, an attacker who can probe for the key's existence can infer the duration of the action. For long-running tool calls (data exports, batch generations, model warmups) the TTL window directly reveals operational state that should not be public.

What does a hardened KV layer for agent state look like?

Conversation and session IDs are high-entropy random values, not sequential or timestamp-based. Error responses for missing keys are indistinguishable from unauthorized access. Prefix scans require an explicit admin scope and are logged. TTLs on sensitive operations are constant-time, not data-dependent. Per-tenant namespacing is enforced inside the KV driver, not only at the API layer.

Security Engineering

KV Leak Channels: What Your Agent State Store Reveals

Behind every LLM agent there is a KV layer holding session state, conversation IDs, partial outputs, and tool-call traces. It is plumbing that security reviews almost never ask about. Here are four leak channels worth pentesting.

By Austa · Published May 11, 2026 · ~6 min read

The state behind every agent

An LLM agent is not really stateless. The user sees a chat-shaped interface. Underneath, the runtime keeps a lot of moving pieces in a fast KV store:

Session token to conversation ID mappings
Conversation ID to recent-message buffer
Tool-call inflight markers and intermediate results
Partial output chunks during streaming generation
Per-user rate-limit counters, feature-flag overrides, model-routing decisions
Cached embeddings and reusable RAG results

Most teams use a generic KV store for this layer (Redis, DynamoDB, or a managed equivalent like basekv when they want simpler ops). The choice rarely matters for security. What matters is whether the layer is treated as an attack surface or as invisible plumbing.

The realistic threat model: the application enforces auth on every API endpoint. The KV driver does not enforce anything. An attacker who finds a single bug, a leaked debug endpoint, or a buggy admin route gets primitives like "check if key X exists" or "list keys starting with Y." Those primitives are enough to leak per-user state without ever reading any record's contents.

Leak channel #1: conversation-ID enumeration

If conversation IDs are predictable (sequential, timestamp-based, short random IDs with low entropy), an attacker who knows the format can enumerate which IDs exist. They do not need to read the content. The existence signal alone leaks:

How many conversations a given user has had
The hours of the day they are active
The cadence of conversations (busy days versus quiet days)
Which sessions are still live (key exists with TTL) versus closed (no key)

What to test: capture a real conversation ID from your own session. Examine the format. Probe nearby IDs with a read request. Does the response distinguish "exists but forbidden" from "does not exist"? If yes, you have an existence oracle.
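One way to make the oracle check mechanical is to compare the response for an ID you know exists but do not own against the response for an ID you know does not exist. The sketch below assumes both responses have already been captured; the `Probe` type and the sample values are hypothetical stand-ins for whatever your HTTP client returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """A captured response, reduced to the parts an attacker can see."""
    status: int  # HTTP status code
    body: str    # response body text


def is_existence_oracle(forbidden: Probe, missing: Probe) -> bool:
    """True if the two responses can be told apart by status or body."""
    return (forbidden.status, forbidden.body) != (missing.status, missing.body)


# Hypothetical captures: an ID owned by another account vs. an unused ID.
leaky = is_existence_oracle(Probe(403, "forbidden"), Probe(404, "not found"))
safe = is_existence_oracle(Probe(404, "not found"), Probe(404, "not found"))
```

This compares only status and body. Headers and latency can carry the same signal and need their own comparison.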

Mitigation that actually works

Use IDs with at least 128 bits of randomness from a cryptographically secure generator, so an attacker cannot guess them. Return identical error codes for "missing" and "unauthorized." Apply the check inside the KV driver layer, not only at the HTTP layer where bug-prone middleware lives.
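A minimal sketch of both halves of this mitigation, assuming an in-memory dict in place of the real KV store. `ConversationStore`, `NotFound`, and `new_conversation_id` are illustrative names, not part of any particular library; only `secrets.token_hex` is a real standard-library call.

```python
import secrets


class NotFound(Exception):
    """Single error for both 'missing' and 'not yours'."""


def new_conversation_id() -> str:
    # 16 bytes from the OS CSPRNG = 128 bits, hex-encoded to 32 characters.
    return secrets.token_hex(16)


class ConversationStore:
    """Toy store: conversation_id -> (owner_id, messages)."""

    def __init__(self):
        self._data = {}

    def create(self, owner_id: str) -> str:
        cid = new_conversation_id()
        self._data[cid] = (owner_id, [])
        return cid

    def get(self, caller_id: str, cid: str) -> list:
        record = self._data.get(cid)
        # Missing and unauthorized take the same branch and raise the same error.
        if record is None or record[0] != caller_id:
            raise NotFound("conversation not found")
        return record[1]
```

Because the ownership check lives in the store itself, a route that forgets its own auth check still cannot tell a caller whether someone else's ID exists.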

Leak channel #2: TTL inference

An agent action sets a KV entry when it begins and clears it when it finishes. The TTL is however long the action is expected to take. With that pattern, the existence check from channel #1 becomes a timing oracle.

An attacker who can poll for the existence of an action key learns:

How long expensive tool calls take (data exports, batch generations, model warmups)
When admin actions fire (with timing patterns visible across hours)
Whether a long-running operation is making progress or stalled

What to test: trigger a known long-running action in one session. From another vantage point, probe for the corresponding key. Measure the time-to-disappearance. Repeat under varying inputs to see if the TTL window changes with input size (data-dependent timing).
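The measurement itself can be a small polling loop. The sketch below is one possible shape, with the existence check, clock, and sleep passed in as callables so it can be pointed at a real probe or driven by a fake clock; all three parameter names and the default intervals are assumptions.

```python
def time_to_disappearance(exists, clock, sleep, interval_s=1.0, max_wait_s=600.0):
    """Poll until exists() turns False; return elapsed seconds, or None on timeout.

    exists: callable returning True while the probed key is still visible
    clock:  callable returning the current time in seconds
    sleep:  callable that waits for the given number of seconds
    """
    start = clock()
    while clock() - start < max_wait_s:
        if not exists():
            return clock() - start
        sleep(interval_s)
    return None


# Simulated run: a fake clock and a key that vanishes 42 seconds in.
now = [0.0]


def fake_sleep(seconds):
    now[0] += seconds


elapsed = time_to_disappearance(lambda: now[0] < 42.0, lambda: now[0], fake_sleep)
```

The resolution is bounded by the polling interval, so repeat the run with a shorter interval before concluding that two inputs produce the same window.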

Mitigation that actually works

Operations that should not be externally observable use a constant-time wrapper: the marker key lives for a fixed N seconds regardless of actual completion time, and the real result is stored under a separate unguessable key.
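A sketch of such a wrapper, under stated assumptions: the marker is given a fixed lifetime and is never deleted early, `TTLStore` is a toy in-memory stand-in for the real KV store, and `FIXED_WINDOW_S`, the key prefixes, and `run_action` are hypothetical names chosen for illustration.

```python
import secrets

FIXED_WINDOW_S = 60.0  # assumed value; pick it per operation class


class TTLStore:
    """Toy KV with expiry, driven by an injectable clock."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def exists(self, key):
        entry = self._data.get(key)
        return entry is not None and self._clock() < entry[1]


def run_action(store, action_id, work):
    # The marker expires on a fixed schedule and is not cleared on completion,
    # so its lifetime says nothing about how long work() took.
    store.set(f"inflight:{action_id}", b"1", FIXED_WINDOW_S)
    result = work()
    # The result goes under a key an observer of the marker cannot derive.
    result_key = f"result:{secrets.token_hex(16)}"
    store.set(result_key, result, FIXED_WINDOW_S)
    return result_key
```

The caller that started the action gets `result_key` back directly. One consequence of this design is that the marker no longer works as a liveness signal, so anything that genuinely needs progress information has to read it through an authenticated path.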

Leak channel #3: error-message reflection

The KV driver returns errors. The application maps them to HTTP responses. Common mistakes:

"Key already exists" returned for a conflict, leaking that some other user's key collided
"Value too large" leaking byte counts of stored data
"Invalid namespace" leaking which namespaces exist versus do not
Latency differences between "key found" and "key not found" leaking via timing

What to test: fuzz the request shapes and capture every distinct error response. Cluster them. Any cluster that varies based on the existence or content of someone else's data is a leak.
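Clustering can be as simple as grouping captured responses by what the client sees. In this sketch each capture is a `(label, status, message)` tuple, where the label records what the tester knows about the probed key; the labels and sample responses are hypothetical.

```python
from collections import defaultdict


def cluster_responses(responses):
    """Group probe labels by the (status, message) pair the client observed."""
    groups = defaultdict(list)
    for label, status, message in responses:
        groups[(status, message)].append(label)
    return dict(groups)


def separates(clusters, label_a, label_b):
    """True if the two probes landed in different clusters (a potential leak)."""
    for labels in clusters.values():
        if label_a in labels:
            return label_b not in labels
    raise KeyError(label_a)


# Hypothetical captures from a fuzzing run.
captured = [
    ("own key", 200, "ok"),
    ("other tenant key", 403, "forbidden"),
    ("missing key", 404, "not found"),
    ("missing key, other namespace", 404, "not found"),
]
clusters = cluster_responses(captured)
finding = separates(clusters, "other tenant key", "missing key")
```

Here `finding` is true because the server answers differently for another tenant's key than for a key that does not exist.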

Mitigation that actually works

A single error code and message for the entire "denied or missing" class. Detailed errors only in server logs that the user never sees.
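One way to enforce this is a single mapping function between the driver and the HTTP layer. The exception classes below are hypothetical placeholders for whatever your KV driver raises, and the choice of 404 with a `not_found` body is an assumption; what matters is that every member of the class maps to the same response.

```python
import logging

log = logging.getLogger("kv")


class KVError(Exception):
    """Base class for the hypothetical driver errors in this sketch."""


class KeyMissing(KVError):
    pass


class AccessDenied(KVError):
    pass


class InvalidNamespace(KVError):
    pass


def to_http(exc: KVError):
    """Map any denied-or-missing error to one public response."""
    # The detail goes to server logs only.
    log.warning("kv error: %s: %s", type(exc).__name__, exc)
    return 404, {"error": "not_found"}
```

A call such as `to_http(AccessDenied("t:a:conv:1"))` returns the same status and body as `to_http(KeyMissing("t:b:conv:9"))`, while the log line keeps the distinction for operators.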

Leak channel #4: prefix-scan side channels

Some KV implementations allow scans by key prefix. If the admin API exposes scan and a less-privileged role can reach it (intentionally or through a bug), the attacker can enumerate every key for a target tenant.

Even when scan is correctly gated, a clever attacker can sometimes get a prefix-scan-equivalent through:

Index queries on a metadata table backing the KV
Backup endpoints that return key listings
Debug or telemetry endpoints leaking key names
Application-level "list my conversations" routes with broken tenant checks

What to test: map every endpoint that returns a key name, a count, or anything derivable from "this key exists." Confirm each enforces tenant isolation. Repeat with two test tenants to verify the boundary holds.
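The two-tenant check is easiest to reason about when isolation is enforced in the driver wrapper itself. The sketch below assumes a plain dict as the backend and a `t:<tenant>:` key layout; `TenantKV` and that layout are illustrative assumptions, not a description of any specific KV product.

```python
class TenantKV:
    """Driver wrapper that forces every key and scan under one tenant's prefix."""

    def __init__(self, backend: dict, tenant_id: str):
        # Reject the separator so one tenant's prefix can never contain another's.
        if not tenant_id or ":" in tenant_id:
            raise ValueError("invalid tenant id")
        self._backend = backend
        self._prefix = f"t:{tenant_id}:"

    def set(self, key: str, value) -> None:
        self._backend[self._prefix + key] = value

    def get(self, key: str):
        return self._backend.get(self._prefix + key)

    def scan(self, prefix: str = "") -> list:
        # The tenant prefix is always prepended, so a scan cannot leave the namespace.
        full = self._prefix + prefix
        return sorted(k[len(self._prefix):] for k in self._backend if k.startswith(full))


# Two test tenants sharing one backend, as in the check described above.
backend = {}
tenant_a = TenantKV(backend, "a")
tenant_b = TenantKV(backend, "b")
tenant_a.set("conv:1", ["hello"])
```

With this layout, `tenant_b.scan()` returns an empty list and `tenant_b.get("conv:1")` returns `None`, even though both tenants share the same backend.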

A test loop you can run today

Inventory the KV access surface. Every endpoint, every backoffice tool, every debug route that reads or writes the KV.
Capture key shapes. Look at a normal session and note every key the agent creates. Document the format.
Probe with two accounts. Drive the same flows from account A and account B. Try to reach A's keys from B. Try to enumerate.
Time the responses. Differentials of 5 ms or more between "exists" and "missing" that persist across repeated samples are real channels.
Cluster the errors. Group responses by error code, message, latency. Any cluster that depends on someone else's state is a finding.
Triage by impact. Conversation-content disclosure is P0. Existence oracle without content is P1. Timing differential without an actionable inference is informational.
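The timing step in the loop above can be sketched as a comparison of median latencies against the 5 ms figure. Using the median is a choice made here to blunt one-off network spikes, not something the loop prescribes, and the sample numbers are hypothetical.

```python
from statistics import median

THRESHOLD_MS = 5.0  # the differential named in the test loop


def timing_gap_ms(exists_ms, missing_ms):
    """Absolute difference between the median latencies of the two groups."""
    return abs(median(exists_ms) - median(missing_ms))


def is_timing_channel(exists_ms, missing_ms, threshold_ms=THRESHOLD_MS):
    """True if the median gap reaches the threshold."""
    return timing_gap_ms(exists_ms, missing_ms) >= threshold_ms


# Hypothetical latency samples in milliseconds; 30.0 is a one-off spike.
exists_samples = [12.1, 11.8, 12.4, 30.0, 12.0]
missing_samples = [6.0, 6.3, 5.9, 6.1, 6.2]
flagged = is_timing_channel(exists_samples, missing_samples)
```

With these samples the medians are about 12.1 ms and 6.1 ms, so the pair is flagged. Collect far more than five samples per group in a real run, and interleave the two probe types so that load drift affects both equally.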

Final thought

KV stores feel safe because they are not SQL databases: there is no query language to inject into and no parser to fuzz. That feeling is wrong. Each channel leaks less on its own, but the surface is wider because every agent runtime ships a custom KV access layer that nobody has pentested.

If your agent has state, it has a KV layer. If it has a KV layer, it has these four channels. Test for them before someone else does.

Related

Permission-aware RAG retrieval covers the parallel surface in vector and RAG stores.
Prompt injection against persistent agent memory covers the poisoning direction of the same surface.
Coding agents with shell access covers what an attacker who finds a KV leak can do next.
