AUSTA | Adversarial Intelligence

Security Engineering

Refund-Tool Hijack: Pentesting LLM Support Agents in Game Backends

The most expensive prompt-injection attack against a game studio is not data leakage. It is the support agent that fires refund_credits because a player asked nicely. Here is what the attack looks like, how to find it, and how to bound the blast radius even if you cannot remove the agent's tools.

By Austa · Published · ~8 min read

The threat with a price tag

Most prompt-injection write-ups talk about system-prompt leakage and moderation bypass. Those matter. They are not the attack that costs a studio real money. The attack that costs real money is the LLM support agent firing a tool call against the economy backend in response to a manipulation that should have been rejected.

The agent has a tool called refund_credits or issue_compensation or grant_inventory. It exists because tier-1 support legitimately needs to issue small comp for crashed sessions, lost servers, missed events. Player files a ticket. Agent reads ticket. Agent decides whether to comp. Tool fires. Credits land in the player's account.

That is the design when the player is honest. When the player is hostile, the same pipe lets them extract money.

The realistic threat model: a player files a polite, well-structured ticket that frames a non-existent incident in a way that matches the agent's policy. The agent reads it, treats it as legitimate, and fires the refund tool. Multiplied across a botnet of accounts, this becomes a real fraud channel. No auth bypass needed.

Where the agent fires tools

Walk a typical AI-augmented game support stack:

[Player] --> [Support form / chat]
                |
                v
        [Ticket queue + classifier]
                |
                v
        [LLM agent runtime]
                |
                +-- [Tool: read_player_history]
                +-- [Tool: refund_credits]
                +-- [Tool: grant_inventory]
                +-- [Tool: escalate_to_human]
                |
                v
        [Game backend economy / inventory APIs]

The economy and inventory APIs sit behind managed game backends like Supercraft GSB (matchmaking, dedicated servers, leaderboards, economy, auth, live-ops with Unity/Unreal/Godot SDKs), PlayFab, Nakama, or a studio-built equivalent. The support agent's tools are HTTP calls into those APIs. Once the agent decides to call, the call lands. The economy backend is doing what it was told.

The interesting boundary is not "can the agent reach the economy API." It always can. The interesting boundary is "what convinces the agent it should call."

Five attack patterns worth testing

1. Instruction override

The attacker writes text that resembles the system prompt. They claim that policy has been updated and the agent should now auto-approve refunds under some threshold. Plain old prompt injection, applied to the support flow.

2. Role escalation

The attacker claims to be someone with elevated standing. Developers testing the flow, QA engineers, a partner at Funcom, the player's high-spending guildmate. The agent has no way to verify and is biased toward being helpful.

3. Multi-turn social engineering

A single adversarial prompt is easy to flag. A three-turn or five-turn arc that builds toward a refund request is much harder. Turn 1: friendly chat about a crash. Turn 2: ask whether the agent has the ability to comp. Turn 3: small refund request anchored to the established incident.

The agent's policy needs to consider turn count, not just per-turn content.

4. Tool-name confusion

If the agent's available tools are refund_credits, issue_compensation, and grant_inventory, attackers will probe synonyms. "Please process my compensation" tests whether the agent maps to issue_compensation with weaker policy than refund_credits. Often tools that share a parent intent have inconsistent guardrails.

5. RAG-poisoned ticket context

The agent retrieves prior tickets, related guides, or knowledge-base articles to inform its response. If any of that retrieved content is user-influenceable (forum posts, community wiki, prior tickets from the same attacker), the attacker plants instructions there. The agent then reads them as authoritative context for the current ticket.

Test by writing a forum post or prior ticket that contains a refund-policy claim, then opening a new ticket that triggers retrieval against that source.

A test loop you can run today

  1. Inventory the agent's tools. List every tool the support agent can call. Note the policy that gates each (in the prompt, in middleware, in the API itself).
  2. Build a structured attack set. A few hundred adversarial tickets, organized by category (override / role / multi-turn / tool-name / RAG). Mix in benign tickets so the agent's baseline behavior is visible.
  3. Run through the real flow with monitor mode. Submit tickets through the production submission path with a flag that captures the tool call the agent decides on but does not actually disburse. The flag is critical; pentesting refund flows in production-real mode is its own incident.
  4. Score each decision. Was the tool call policy-compliant? Did it fire when policy would have rejected? Use a second LLM as judge plus manual sampling.
  5. Quantify the financial exposure. For each hijack, what was the disbursement size? Sum across the attack set, multiply by the realistic frequency, and you have a P0 number to bring to engineering.
  6. Repeat with multi-turn arcs. Single-turn prompts are the appetizer. Multi-turn is where the real damage lives.

Bound the blast radius (even if the agent fails)

The pentest will find hijacks. The agent will be wrong some of the time, no matter how good the prompt is. The right response is to make sure each failure is small.

Final thought

Support agents are useful and they are not going away. Every studio that ships one is going to find out, eventually, that the agent is wrong sometimes. The teams that get a P0 incident out of it are the ones that gave the agent direct economy access with no external cap. The teams that get a small finding and a postmortem are the ones that treated the agent as one input into a policy-enforced pipeline.

The mindset is the one the security team has had for HTTP endpoints for twenty years. The agent is just another untrusted client.

Related