The Agent Firewall: How Enforced Guardrails for AI Trading Agents Work

The 60-second version

Every guardrail in our Safety Kit — per-trade caps, ticker allowlists, the kill switch — is text you paste into your agent’s prompt. That makes them requests. A prompt-injected agent, or one that simply misreads a number, can ignore requests.

The Agent Firewall turns those same rules into guarantees. It’s a small program that runs on your machine, sits between your AI agent and your broker, and checks every trade against your policy before it can happen. The agent never holds your broker credentials, so it has no direct path around the check. If the agent tries something outside policy, whether because it was manipulated, confused, or stuck in a loop, the trade is blocked and logged, and you are notified, no matter how convinced the agent is.

It’s open source, it runs locally, and it’s in development now. Join the waitlist for the beta and founding pricing. The rest of this page is the deep technical version.

The problem: your safeguards live inside the thing being attacked

Today’s standard setup looks like this:

agent (Claude / ChatGPT / Codex)
  └── holds broker credentials
  └── guardrails = text in its own prompt
  └── talks directly to the broker MCP server

Notice where everything important lives: inside the agent. The credentials are in the agent’s environment. The guardrails are in the agent’s context window. The decision to honor or ignore them is made by the same model that an attacker is feeding poisoned headlines and fake ticker data.

In security architecture terms, the policy decision point sits inside the trust boundary of the component you can’t trust. That’s the design flaw, and it isn’t specific to any particular model or prompt. OWASP’s Top 10 for LLM Applications ranks prompt injection as the number-one risk (LLM01). The root cause is that instructions and data share one channel, and no prompt phrasing fully separates them.

The architecture: move enforcement outside the model

The firewall restructures the setup so the model is no longer the last line of defense:

agent (MCP client)
  │  sees ONLY the firewall's tool surface — no credentials
  ▼
agent-firewall          (local process, open source)
  ├── policy.yaml       your rules, machine-readable
  ├── audit.jsonl       every attempt, allowed or blocked
  ├── killswitch        one command halts everything
  │   holds the broker credentials
  ▼
broker MCP server / API

Mechanically, the firewall is an MCP server to your agent and an MCP client to your broker. Your agent’s MCP configuration points at the firewall instead of the broker. The firewall spawns and wraps the real broker server as a child process, passes through read-only calls (quotes, positions, history), and runs every write call — anything that moves money — through the policy engine first.
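The pass-through rule described above can be sketched in a few lines. This is a minimal illustration, not the firewall's actual code: the tool names in READ_ONLY_TOOLS and the check_policy and forward callables are hypothetical stand-ins for the broker's real tool surface, the policy engine, and the MCP client call to the wrapped broker server.

```python
# Hypothetical tool names; a real broker MCP server defines its own.
READ_ONLY_TOOLS = {"get_quote", "get_positions", "get_history"}


def route_call(tool, args, check_policy, forward):
    """Pass read-only calls through; gate everything else on the policy engine.

    check_policy(tool, args) returns "allow" or the name of the rule that failed.
    forward(tool, args) sends the call on to the wrapped broker server.
    """
    if tool in READ_ONLY_TOOLS:
        return forward(tool, args)
    decision = check_policy(tool, args)
    if decision != "allow":
        return {"error": f"blocked by policy: {decision}"}
    return forward(tool, args)
```

One design choice worth noting in this sketch: any tool that is not explicitly listed as read-only is treated as a write, so a tool the firewall has never heard of is gated rather than waved through.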

Three properties fall out of this design, and they’re the entire product:

1. Credential isolation

The broker API key lives in the firewall’s environment, never the agent’s. This is the load-bearing property: enforcement is only real if the agent has no second path to your account. An agent that holds credentials can be talked into using them directly; an agent that has never seen them cannot. (The honest caveat: if you also give an agent browser control over a logged-in brokerage session, you’ve reopened a path the firewall can’t see. Don’t do that; see “What it does NOT protect against” below.)
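As a rough sketch of what credential isolation means at the process level: the firewall adds the key only to the environment of the broker child process it spawns. The variable name BROKER_API_KEY and both helper functions are assumptions for illustration; how the firewall itself loads the key (a file, an OS keychain) is not specified here.

```python
import os
import subprocess

BROKER_KEY_VAR = "BROKER_API_KEY"  # hypothetical variable name


def child_env(base_env, api_key):
    """Copy the firewall's environment and add the broker key for the child only."""
    env = dict(base_env)
    env[BROKER_KEY_VAR] = api_key
    return env


def spawn_broker(command, api_key):
    """Start the wrapped broker server with the key in its environment.

    The agent process is configured separately and never receives this env.
    """
    return subprocess.Popen(
        command,
        env=child_env(os.environ, api_key),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

The copy in child_env matters: the key is added to a new dictionary, so the environment it was derived from is left unchanged.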

2. Deterministic policy evaluation

Every place_order call is checked against your policy in a fixed order: instrument scope → ticker allowlist/denylist → trading-hours window → per-trade cap → daily-volume cap → concentration cap → frequency circuit breaker → approval threshold. These are arithmetic comparisons on the structured tool call — order.value <= funding * 0.05 — not language a model interprets. There is no prompt to inject, no context window to flood, no persona to jailbreak. The check passes or it doesn’t.
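A minimal sketch of fixed-order evaluation, covering only four of the eight rules listed above. The Order and Policy classes, their field names, and the rule labels are assumptions made for this example, not the firewall's real schema.

```python
from dataclasses import dataclass


@dataclass
class Order:
    ticker: str
    value: float
    instrument: str = "equity"


@dataclass
class Policy:
    instruments: frozenset
    allowlist: frozenset
    per_trade_fraction: float  # e.g. 0.05 means 5% of funding per trade
    daily_fraction: float      # cap on total value traded per day


def evaluate(order, policy, funding, spent_today):
    """Return (decision, rule). Checks run in a fixed order; first failure wins."""
    checks = [
        ("instrument_scope", order.instrument in policy.instruments),
        ("ticker_allowlist", order.ticker in policy.allowlist),
        ("per_trade_cap", order.value <= funding * policy.per_trade_fraction),
        ("daily_volume_cap",
         spent_today + order.value <= funding * policy.daily_fraction),
    ]
    for rule, passed in checks:
        if not passed:
            return ("blocked", rule)
    return ("allowed", None)
```

With 10,000 in funding, a 5% per-trade cap, and an allowlist containing only a hypothetical ticker ABC, a 4,000 order for XYZ is rejected at the allowlist step before the cap is ever consulted, because the allowlist check comes first in the order.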

3. Fail-closed semantics

If the policy engine errors, if the policy file is malformed, if the firewall crashes mid-call — the order fails. The failure mode is "your agent stops trading," never "your agent trades unguarded." For a safety system this is the only acceptable default, and it’s worth stating plainly because it’s also the trade-off: a firewall outage means missed trades, not unbounded ones.
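Fail-closed behavior can be illustrated with a small wrapper, assuming a hypothetical evaluate_fn that returns the string "allowed" on success. Anything else, including an exception or an unexpected return value, is treated as a block.

```python
def fail_closed(evaluate_fn, order):
    """Any exception or unexpected result from the engine becomes a block."""
    try:
        decision = evaluate_fn(order)
    except Exception as exc:
        # Engine crashed or policy could not be parsed: reject the order.
        return {
            "decision": "blocked",
            "rule": "engine_error",
            "detail": type(exc).__name__,
        }
    if decision != "allowed":
        return {"decision": "blocked", "rule": str(decision)}
    return {"decision": "allowed"}
```

The key point in this sketch is that "allowed" is the only value that lets an order through; there is no default branch that passes.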

The approval gate and kill switch

Orders above your approval threshold (5–33% of funding, depending on tier) aren’t blocked — they’re held. The firewall queues the order and asks a human. In the free tier that’s a terminal prompt on your machine; in the Pro tier it’s a push notification you can approve from anywhere. This is the same pattern hardware wallets use: the transaction physically halts until a confirmation happens on a channel the attacker doesn’t control.
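The hold-then-confirm pattern might look roughly like the following. ApprovalQueue and its method names are hypothetical; how the human is actually asked (terminal prompt or push notification) is left out, and resolve simply receives the human's answer.

```python
import itertools


class ApprovalQueue:
    """Holds orders above the approval threshold until a human decides."""

    def __init__(self, funding, approval_fraction):
        self.threshold = funding * approval_fraction
        self._held = {}
        self._ids = itertools.count(1)

    def submit(self, order):
        """Return ('allowed', None) or ('held', hold_id)."""
        if order["value"] <= self.threshold:
            return ("allowed", None)
        hold_id = next(self._ids)
        self._held[hold_id] = order
        return ("held", hold_id)

    def resolve(self, hold_id, approved):
        """Release the held order if approved; drop it otherwise."""
        order = self._held.pop(hold_id)
        return order if approved else None
```

In this sketch a held order goes nowhere until resolve is called, and a rejection discards it rather than retrying.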

The kill switch is a file-presence check evaluated before every single tool call. Running firewall kill creates the file; every subsequent write call is rejected until you explicitly re-arm. You never race your own agent to the brokerage app again.
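A file-presence kill switch is simple enough to sketch directly. The class, the flag path, and the guarded helper are illustrative assumptions; the real command-line interface and file location may differ.

```python
from pathlib import Path


class KillSwitch:
    """Engaged whenever the flag file exists on disk."""

    def __init__(self, flag_path):
        self.flag = Path(flag_path)

    def kill(self):
        self.flag.touch()

    def rearm(self):
        self.flag.unlink(missing_ok=True)

    def engaged(self):
        return self.flag.exists()


def guarded(switch, is_write, forward):
    """Check the flag before the call; reject writes while it is present."""
    if is_write and switch.engaged():
        return {"error": "kill switch engaged"}
    return forward()
```

Because the state lives in the filesystem rather than in memory, the switch in this sketch stays engaged across a firewall restart until the file is removed.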

The audit log

Every attempt — allowed, blocked, or held — is appended to a local JSONL log:

{"ts":"2026-06-11T14:32:07Z","tool":"place_order","ticker":"XYZ",
 "value":4000,"decision":"blocked","rule":"ticker_allowlist",
 "prev_hash":"9f2c…","hash":"a41d…"}

Each entry includes the hash of the previous entry, making the log a hash chain: any after-the-fact tampering breaks the chain visibly. When you ask "what did my agent try to do while I was asleep," the answer is complete, ordered, and tamper-evident — including the attempts that never reached the broker, which are exactly the ones prompt-level logging never shows you.
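To make the hash-chain idea concrete, here is a small in-memory sketch. The hashing scheme is an assumption (SHA-256 over the entry's JSON with sorted keys, and an all-zero genesis value for the first entry); the firewall's actual canonicalization may differ, and the real log is a file rather than a Python list.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(body):
    """SHA-256 of the entry body serialized with sorted keys."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, fields):
    """Append an entry that commits to the hash of the previous one."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {**fields, "prev_hash": prev}
    log.append({**body, "hash": entry_hash(body)})


def verify_chain(log):
    """Recompute every hash and link; False on the first mismatch."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(body):
            return False
        prev = entry["hash"]
    return True
```

Editing any field of an earlier entry changes its recomputed hash, so verify_chain fails at that entry; rewriting the stored hash to match would then break the prev_hash link of the entry after it.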

What it does NOT protect against

A security tool that won’t state its limits shouldn’t be trusted with your money, so:

Bad trades within policy. The agent buys an allowlisted ticker, inside your caps, at a terrible price. The firewall bounds the blast radius; it does not make the agent smart. Losses inside your chosen bounds are still losses.
A compromised computer. Malware on the machine running the firewall can read the credentials the firewall holds. The firewall defends against a misbehaving agent, not a rooted host.
Browser-driving agents. An agent operating your logged-in brokerage website bypasses the MCP path entirely. The firewall protects the API/MCP route; pair it with the rule that agents never get browser access to your broker.
You. It’s a self-binding tool — you can always edit your own policy or shut the firewall off. It enforces the limits you chose when you were thinking clearly, against an agent acting when you aren’t watching.

Open source, local-first, and what the paid tier is

The enforcement engine is open source — for software that sits in the path between an AI and your money, auditability is the feature, and you should never have to pay to be safe on your own machine. The paid tier (Firewall Pro) is the part that physically requires servers: real-time phone alerts, approve-from-anywhere, a remote kill switch, hosted dashboards, and 90-day audit history. The cloud side receives telemetry and sends signed commands; it never holds your broker credentials and cannot initiate trades.

The full design is documented in this cluster: the policy file format, the setup walkthrough, the conceptual case in Prompt guardrails vs. enforcement — and for developers embedding the enforcement engine in their own agent stack, the SDK guide.

The beta opens to the waitlist first, with founding pricing locked for life. Join the waitlist →
