The Agent Firewall SDK: Embed Enforced Guardrails in Your Own Agent

The 60-second version

If you’re building your own trading agent, MCP server, or agent platform, you have the same problem we wrote a whole architecture piece about: guardrails in the prompt are advisory, and the component that enforces limits has to live outside the model. The SDK is that component as a library — a pure TypeScript policy engine you embed in your own tool path: policy in, order in, decision out. No I/O, no framework, no opinions about your stack.

It’s the exact engine inside the Agent Firewall, so its contracts are the ones we publish: fail-closed, deterministic order, latching circuit breaker, caller-persisted state. The code is implemented and tested today; it ships as @secprove/agent-firewall with the open-source release. Waitlist here — the rest of this page is the API.

Who this is for

Three audiences keep asking for the same thing in different words. Developers building custom trading agents (not via MCP) who want real limits without reinventing them. Authors of broker MCP servers or wrappers who want enforcement inside the server itself. And platform teams putting agents in front of users, who need per-user policy enforcement plus an audit trail they can show to compliance. In every case the rule is the one from the enforcement piece: the policy decision point must sit outside the model’s trust boundary. The SDK is a policy decision point in a box.

Five minutes to enforced

import { Firewall, tierPolicy } from "@secprove/agent-firewall";

// A policy: the Safety Kit tiers, or hand-rolled YAML parsed to an object.
const policy = tierPolicy("balanced", 2000, { denylist: ["TSLA", "GME"] });

const fw = Firewall.create(policy, new Date());

// In your tool path, before anything reaches the broker:
const decision = fw.submit(
  {
    tool: "place_order",
    ticker: "NVDA",
    side: "buy",
    notionalUsd: 180,        // market orders: price at quote + slippage buffer
    instrument: "equities",
  },
  new Date(),
);

// decision.outcome is "allow" | "hold" | "block"
// decision.rule names the exact rule that fired; decision.reason is human-readable

submit runs the documented pipeline in a fixed order: kill switch → instrument scope → ticker rules → hours → per-trade cap → daily volume → concentration → circuit breaker → approval gate. The first rule that fires decides the outcome, and an allow atomically advances the counters. There are three outcomes, and each one asks something different of your integration. allow means forward the order to the broker. block means return the structured refusal to the agent; the reason string is designed to be shown to the model, so it explains the boundary instead of inviting a retry loop. hold means a human has to answer.
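The "first rule that fires wins" ordering described above can be sketched in a few lines. This is a minimal illustration, not the SDK's internals: the Order, Limits, and State shapes, the rule names, and the evaluate function are all hypothetical, and only a handful of the pipeline stages are modeled.

```typescript
// Illustrative sketch only. Every type, rule name, and function here is
// hypothetical and is NOT the @secprove/agent-firewall implementation.
type Outcome = "allow" | "hold" | "block";

interface Order { ticker: string; notionalUsd: number; }
interface State { killSwitch: boolean; spentTodayUsd: number; }
interface Limits {
  denylist: string[];
  perTradeCapUsd: number;
  dailyCapUsd: number;
  approvalOverUsd: number;
}
interface Decision { outcome: Outcome; rule: string; reason: string; }

// A rule returns a Decision when it fires, or null to pass the order along.
type Rule = (o: Order, l: Limits, s: State) => Decision | null;

// Order matters: earlier rules shadow later ones.
const rules: Rule[] = [
  (_o, _l, s) => s.killSwitch
    ? { outcome: "block", rule: "kill_switch", reason: "Kill switch is engaged." }
    : null,
  (o, l) => l.denylist.includes(o.ticker)
    ? { outcome: "block", rule: "ticker_denylist", reason: `${o.ticker} is denylisted.` }
    : null,
  (o, l) => o.notionalUsd > l.perTradeCapUsd
    ? { outcome: "block", rule: "per_trade_cap", reason: "Order exceeds the per-trade cap." }
    : null,
  (o, l, s) => s.spentTodayUsd + o.notionalUsd > l.dailyCapUsd
    ? { outcome: "block", rule: "daily_volume", reason: "Order would exceed the daily cap." }
    : null,
  (o, l) => o.notionalUsd > l.approvalOverUsd
    ? { outcome: "hold", rule: "approval_gate", reason: "Order needs human approval." }
    : null,
];

function evaluate(o: Order, l: Limits, s: State): Decision {
  for (const rule of rules) {
    const decision = rule(o, l, s);
    if (decision !== null) return decision; // first rule that fires decides
  }
  return { outcome: "allow", rule: "none", reason: "All checks passed." };
}
```

Because the function reads only its arguments, the same order, limits, and state always produce the same decision and the same rule name, which is what makes a pipeline like this replayable in tests.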

Holds: the approval gate as an API

Orders over the approval threshold park as a HeldOrder with a policy-derived expiry. Your job is to surface them to a human; the SDK’s job is to make every resolution path safe:

for (const hold of fw.pendingHolds(new Date())) {
  notifyHuman(hold);                       // push, SMS, terminal — your channel
}

// holdId identifies the hold the human answered (one of the pending holds above)
fw.approve(holdId, new Date());            // re-checked, then executed
fw.deny(holdId, new Date());               // recorded, never executed
fw.expireHolds(new Date());                // overdue holds become "no"

Two semantics here are load-bearing, and both are tested promises. A late approval fails closed — approving after the timeout yields a block, because the human was answering a stale question. And approval answers the gate, not the other rules: an approved order is re-evaluated against current state before executing, so if the daily budget got consumed (or the kill switch engaged) while the hold sat, the human’s "yes" cannot override the cap. A human can authorize a big order; a human cannot authorize a policy violation.
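A minimal sketch of those two semantics, under stated assumptions: the Hold and Budget shapes and the resolveApproval function are hypothetical stand-ins, and only the daily cap is re-checked here, whereas the text above says the real engine re-evaluates against all current state.

```typescript
// Illustrative sketch only. Hold, Budget, and resolveApproval are
// hypothetical names, not part of the SDK's published API.
interface Hold { id: string; notionalUsd: number; expiresAtMs: number; }
interface Budget { dailyCapUsd: number; spentTodayUsd: number; }
interface Resolution { outcome: "allow" | "block"; rule: string; reason: string; }

function resolveApproval(hold: Hold | undefined, budget: Budget, nowMs: number): Resolution {
  // Unknown hold id: nothing to approve, so fail closed.
  if (hold === undefined) {
    return { outcome: "block", rule: "unknown_hold", reason: "No such hold." };
  }
  // Late approval: the human answered a stale question, so fail closed.
  if (nowMs >= hold.expiresAtMs) {
    return { outcome: "block", rule: "hold_expired", reason: "Approval arrived after the hold expired." };
  }
  // Approval answers the gate only: re-check the cap against CURRENT state.
  if (budget.spentTodayUsd + hold.notionalUsd > budget.dailyCapUsd) {
    return { outcome: "block", rule: "daily_volume", reason: "Daily budget was consumed while the hold was pending." };
  }
  return { outcome: "allow", rule: "approval_gate", reason: "Approved in time and within current limits." };
}
```

The ordering of the checks is the point: every path that involves doubt (missing hold, stale answer, changed budget) returns block before the single allow at the bottom is reachable.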

State: you persist it, restarts can’t refill budgets

The engine is pure — it never touches a disk or a clock you didn’t hand it. Firewall wraps that purity in one serializable object:

await db.put("firewall", fw.snapshot());           // after every state-changing call (submit, approve, deny, expireHolds)

const restored = Firewall.restore(await db.get("firewall"));

The snapshot carries policy, counters, pending holds, and the audit log. Restoring re-validates the policy (a malformed one throws — there is no default policy, by design) and re-verifies the audit chain (a tampered snapshot refuses to load). Persist it atomically with order submission: that’s the property that stops both parallel-order races past a cap and the restart-to-refill-the-budget trick.
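To make the "no default policy" and "restarts can't refill budgets" properties concrete, here is a small sketch of a restore step that validates what it loads and throws instead of falling back. The MiniSnapshot shape and the restoreSnapshot function are assumptions for illustration; the real snapshot also carries pending holds and the audit log, which are omitted here.

```typescript
// Illustrative sketch only. MiniSnapshot and restoreSnapshot are hypothetical
// and far simpler than whatever the SDK actually serializes.
interface MiniSnapshot {
  policy: { dailyCapUsd: number };
  spentTodayUsd: number;
}

function restoreSnapshot(raw: string): MiniSnapshot {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("snapshot is not an object");
  }
  const { policy, spentTodayUsd } = data as Partial<MiniSnapshot>;
  // No default policy: a missing or malformed policy is an error, never a fallback.
  if (!policy || typeof policy.dailyCapUsd !== "number" || !(policy.dailyCapUsd > 0)) {
    throw new Error("malformed policy");
  }
  // Counters come from the snapshot, so a restart resumes the spent budget.
  if (typeof spentTodayUsd !== "number" || spentTodayUsd < 0) {
    throw new Error("malformed counters");
  }
  return { policy: { dailyCapUsd: policy.dailyCapUsd }, spentTodayUsd };
}
```

A process that crashes after spending 1,500 of a 2,000 budget comes back with 1,500 already spent, provided the snapshot was written in the same atomic step as the order submission.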

The audit chain

Every evaluated call — allowed, blocked, held, approved, expired — appends a hash-chained entry:

import { verifyChain } from "@secprove/agent-firewall";

const log = fw.auditLog();
const { valid, brokenAt } = verifyChain([...log]);

Each entry commits to its predecessor’s hash, so edits, deletions, and reorders all break the chain at the exact tampered entry. For platform teams this is the compliance artifact: a complete, ordered, tamper-evident record of everything the agent attempted, including the attempts that never reached a broker — which are precisely the ones prompt-level logging never shows you.
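For readers who have not built one, a hash chain of this kind can be sketched with Node's built-in crypto module. This is a generic illustration of the technique, not the SDK's entry format: the ChainEntry fields, the SHA-256 choice, the genesis value, and the appendEntry and verifyEntries names are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch only. The entry layout and function names are
// hypothetical, not the SDK's audit log format.
interface ChainEntry { seq: number; event: string; prevHash: string; hash: string; }

const GENESIS = "0".repeat(64); // placeholder predecessor for the first entry

function hashEntry(seq: number, event: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([seq, event, prevHash]))
    .digest("hex");
}

function appendEntry(log: ChainEntry[], event: string): ChainEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, event, prevHash, hash: hashEntry(seq, event, prevHash) }];
}

// Returns the index of the first entry that fails verification, or null.
function verifyEntries(log: ChainEntry[]): { valid: boolean; brokenAt: number | null } {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    const ok =
      e !== undefined &&
      e.seq === i &&                                     // catches deletions and reorders
      e.prevHash === prev &&                             // catches a broken link
      e.hash === hashEntry(e.seq, e.event, e.prevHash);  // catches edited content
    if (!ok) return { valid: false, brokenAt: i };
    prev = e.hash;
  }
  return { valid: true, brokenAt: null };
}
```

One limitation worth knowing: a chain like this detects changes to existing entries, but truncating the tail leaves a shorter chain that still verifies, so the latest hash usually needs to be anchored somewhere the writer cannot rewrite.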

The contracts, stated as API guarantees

Fail closed, everywhere. Engine error → block. Malformed order → block. Malformed policy → constructor throws. Unanswered hold → expired. Unknown hold id → block. There is no code path where uncertainty resolves to "allow."
Deterministic and replayable. Same inputs, same decision, same rule — evaluateOrder(req, policy, state, now) is exported raw for property tests and CI. Run your policy against a corpus of attack-shaped orders in your test suite; we do.
The breaker latches. Frequency trips don’t auto-heal; only an explicit rearm() clears them, and the latch is in persisted state, so a restart doesn’t reset it.
No hidden dependencies. The core is dependency-light (schema validation only), framework-free, and side-effect-free — embeddable in an MCP server, a Lambda, or your agent’s event loop without ceremony.
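The latching behavior in particular is easy to get subtly wrong, so here is a minimal sketch of a frequency breaker that stays tripped until it is explicitly rearmed. The BreakerState shape, the recordOrder function, and the window parameters are hypothetical; only the rearm name comes from the text above, and this version of it is not the SDK's.

```typescript
// Illustrative sketch only. BreakerState, recordOrder, and this rearm are
// hypothetical and do not reflect the SDK's internals.
interface BreakerState { tripped: boolean; recentMs: number[]; }

function recordOrder(state: BreakerState, nowMs: number, maxOrders: number, windowMs: number): BreakerState {
  // Latched: once tripped, the passage of time changes nothing.
  if (state.tripped) return state;
  const recentMs = [...state.recentMs.filter((t) => nowMs - t < windowMs), nowMs];
  return { tripped: recentMs.length > maxOrders, recentMs };
}

// Only an explicit rearm clears the latch.
function rearm(_state: BreakerState): BreakerState {
  return { tripped: false, recentMs: [] };
}
```

Because the tripped flag lives in plain data, serializing the state and loading it in a new process carries the latch across the restart.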

What the SDK deliberately does not do

It doesn’t talk to brokers: you own the I/O, which is what makes it broker-agnostic. It doesn’t price market orders: you pass notionalUsd priced at quote plus a slippage buffer, because pricing happens at your boundary, and that is where a stale price must be prevented from sneaking under a cap. It doesn’t store anything: persistence is yours, subject to the atomicity rule above. And it doesn’t sandbox the model: it bounds what the model’s tool calls can do, which, as the threat model spells out, is the part that can actually be guaranteed.
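Since pricing is the caller's job, a small worked example of "quote plus a slippage buffer" may help. This sketch assumes a buy order priced at the current ask, a buffer expressed in basis points, and rounding up to the next cent; the function name and parameters are hypothetical, and the right buffer size is a decision for your own risk policy.

```typescript
// Illustrative sketch only. conservativeNotionalUsd is a hypothetical helper,
// not part of the SDK.
function conservativeNotionalUsd(quantity: number, askUsd: number, bufferBps: number): number {
  if (!(quantity > 0) || !(askUsd > 0) || !(bufferBps >= 0)) {
    throw new Error("quantity and ask must be positive, buffer must be non-negative");
  }
  // notional = quantity * ask * (1 + bufferBps / 10000), rounded UP to the cent
  // so that rounding never shaves the estimate below the buffered price.
  const cents = Math.ceil((quantity * askUsd * (10000 + bufferBps)) / 100);
  return cents / 100;
}
```

For 10 shares at an 18.00 ask with a 50 basis point buffer, the arithmetic is 10 × 18 × 1.005 = 180.90, so the value passed as notionalUsd would be 180.90 instead of 180.00. Rejecting non-positive inputs keeps a missing or zero quote from turning into a zero-dollar order that slips under every cap.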

Status and the release

The engine, policy schema, hold lifecycle, and audit chain documented here are implemented and covered by the test suite that encodes every promise on this page. The npm package publishes with the Agent Firewall open-source release; waitlist members get the repo first, and the first 200 lock founding pricing on the Pro tier ($79/yr) that adds the hosted layer — phone approvals, remote kill, dashboards. Join the waitlist →
