Reins
Security controls for AI agents.
In Greek myth, Athena gave Bellerophon the golden bridle — reins included — that let him guide Pegasus. Reins applies the same idea to AI agents: raw power is not enough — what matters is making it controllable.
Reins watches your agent’s every move. Here’s how, technically.
Prevent, Pause, Prove
Reins enforces deterministic security policies on every agent action — synchronously, before execution, with no LLM in the enforcement path.
- Prevent: Block destructive actions before they execute. Regex and heuristic rules catch rm -rf /, DROP DATABASE, fork bombs, writes to ~/.ssh, and dozens of other patterns. Rules are evaluated in under 50 ms with no network call.
- Pause: Route high-impact actions through human approval before they proceed. In a terminal, this is an interactive prompt. In a messaging workflow (WhatsApp, Telegram), an out-of-band notification delivers the approval request directly to the human, bypassing the agent context entirely so the agent cannot self-approve.
- Prove: Every decision is appended to an immutable JSONL audit trail: tool, action, decision, rule, timestamp, decision time. Pending entries are buffered locally and flushed to Reins Cloud on the next sync.
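As a rough illustration of the Prevent and Prove steps, the sketch below evaluates a command against a small regex rule list and serializes the outcome as one JSONL line. It is not Reins' actual rule set or log schema: the rule ids, patterns, the evaluate and auditLine helpers, and the decisionMs field name are all assumptions made for the example. Only the ALLOW / ASK / DENY decision names and the list of audit fields come from this page.

```typescript
// Hypothetical sketch: rule ids, patterns, and field names are illustrative,
// not the rules or schema that Reins ships.
type Decision = "ALLOW" | "ASK" | "DENY";

interface Rule {
  id: string;
  pattern: RegExp;
  decision: Decision;
}

interface Verdict {
  decision: Decision;
  rule: string | null; // id of the matching rule, or null when nothing matched
}

const RULES: Rule[] = [
  { id: "rm-rf-root", pattern: /\brm\s+-rf\s+\/(\s|$)/, decision: "DENY" },
  { id: "drop-database", pattern: /\bDROP\s+DATABASE\b/i, decision: "DENY" },
  { id: "ssh-write", pattern: />>?\s*~\/\.ssh\//, decision: "DENY" },
  { id: "force-push", pattern: /\bgit\s+push\b.*--force\b/, decision: "ASK" },
];

// First matching rule wins; no network call and no model in the loop.
function evaluate(command: string): Verdict {
  for (const rule of RULES) {
    if (rule.pattern.test(command)) {
      return { decision: rule.decision, rule: rule.id };
    }
  }
  return { decision: "ALLOW", rule: null };
}

// One audit entry per decision, serialized as a single JSON line (JSONL).
function auditLine(
  tool: string,
  action: string,
  verdict: Verdict,
  timestamp: string,
  decisionMs: number,
): string {
  return JSON.stringify({
    tool,
    action,
    decision: verdict.decision,
    rule: verdict.rule,
    timestamp,
    decisionMs,
  });
}
```

Because the same input always yields the same verdict, a decision recorded in the audit trail can be reproduced later by replaying the logged action against the same rules.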
Enforcement is synchronous and deterministic. The agent cannot proceed until the hook exits. An unhandled hook error blocks the action — fail-closed, not fail-open.
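A minimal sketch of what fail-closed means in practice, assuming a hook that receives a JSON payload and reports its verdict through an exit code. The runHook function, the tool_input.command field, and the 0 / 2 exit-code convention are assumptions modeled on Claude Code's hook interface, not a description of Reins' internals.

```typescript
type Decision = "ALLOW" | "ASK" | "DENY";

// Assumed convention: 0 lets the tool call proceed, 2 blocks it.
const EXIT_ALLOW = 0;
const EXIT_BLOCK = 2;

// Hypothetical hook entry point. Any failure (bad JSON, missing field,
// an exception inside the evaluator) resolves to a block, never to an allow.
function runHook(
  rawInput: string,
  evaluate: (command: string) => Decision,
): number {
  try {
    const payload = JSON.parse(rawInput);
    const command = payload?.tool_input?.command;
    if (typeof command !== "string") {
      throw new Error("hook payload has no command");
    }
    // In this sketch anything other than ALLOW is blocked; a real ASK
    // would hand off to a human approval step before proceeding.
    return evaluate(command) === "ALLOW" ? EXIT_ALLOW : EXIT_BLOCK;
  } catch {
    return EXIT_BLOCK;
  }
}
```

The key design choice is that the allow path is the only one that has to succeed explicitly; every other outcome, including a crash in the policy code itself, falls through to a block.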
Why this matters
AI agents are increasingly capable of taking real actions: deleting files, pushing code, sending messages, dropping database tables, making purchases. Most of the time, this is exactly what you want. But capability without control creates a class of failure modes that are hard to anticipate and easy to trigger.
Irreversibility. A git push --force, DROP TABLE, or rm -rf is not a suggestion — it’s a permanent change. Agents don’t inherently understand the difference between a reversible action and one that destroys data or breaks production. They optimize for task completion, not for the preservation of things you might want back.
Drift and manipulation. An agent working through a long task can drift far from its original intent — through prompt injection in tool output, through cumulative low-risk steps that compose into a harmful chain, or simply through losing context. By the time an agent reaches a destructive action, the reason it was authorized may no longer apply.
In-context approval is not enforcement. Asking a model to “confirm with the user before deleting” is a suggestion, not a control. A sufficiently creative prompt, a poisoned tool response, or simple model error can bypass it. The approval request is just another token — the same model that asked can also answer.
No audit trail. When something goes wrong, you need to know what happened: which action ran, which rule matched, what the decision was, and who approved it. Without a structured record, post-incident review is guesswork.
An agent cannot be its own watchdog. Neither can any runtime that runs inside the agent’s context.
Why Reins
| | Reins | ClawSec | DefenseClaw |
|---|---|---|---|
| Architecture | External to agent — cannot be prompt-injected | Runs inside agent context — can be compromised | External, multi-runtime |
| Install | npm i -g @pegasi/reins | Skill install | 3 runtimes + Go daemon |
| Hosted dashboard | Yes (Reins Cloud) | No | No (Splunk only) |
| HITL approvals | Yes | No | No |
| Target | Developers + small teams | OpenClaw users | Enterprise SOC teams |
The architecture difference is the one that matters most. Security controls that run inside the agent’s context can be bypassed by the agent — whether through prompt injection, model error, or adversarial tool output. Reins runs at the OS hook level (Claude Code) or inside the gateway process before tool dispatch (OpenClaw), where the agent has no influence over the enforcement decision.
In The News
- TechCrunch (February 23, 2026): A Meta AI security researcher said an OpenClaw agent ran amok on her inbox
Install
npm install -g @pegasi-ai/reins
reins init
reins init runs an interactive wizard that installs PreToolUse and PostToolUse hooks into .claude/settings.json (Claude Code) or registers the plugin with your OpenClaw gateway, then runs an initial security scan.
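To give a feel for what installing a hook into .claude/settings.json involves, here is a hedged sketch of an idempotent merge. The nested hooks / matcher / command layout follows Claude Code's documented settings format as best understood here; the addHook helper and the placeholder command path are hypothetical, and the actual command that reins init writes is not shown on this page.

```typescript
// Shapes modeled on Claude Code's hook settings; treat them as assumptions.
interface HookCommand {
  type: "command";
  command: string;
}

interface HookMatcher {
  matcher: string;
  hooks: HookCommand[];
}

interface Settings {
  hooks?: Record<string, HookMatcher[]>;
  [key: string]: unknown; // unrelated settings are preserved as-is
}

// Hypothetical helper: returns a copy of the settings with the hook added,
// leaving the input untouched and skipping the insert if it already exists.
function addHook(
  settings: Settings,
  event: string,
  matcher: string,
  command: string,
): Settings {
  const hooks = { ...(settings.hooks ?? {}) };
  const existing = hooks[event] ?? [];
  const alreadyThere = existing.some(
    (m) => m.matcher === matcher && m.hooks.some((h) => h.command === command),
  );
  if (!alreadyThere) {
    hooks[event] = [...existing, { matcher, hooks: [{ type: "command", command }] }];
  }
  return { ...settings, hooks };
}

// Placeholder command path, not the one Reins installs.
const updated = addHook({}, "PreToolUse", "*", "/path/to/hook-command");
```

Making the merge idempotent means rerunning the installer does not register the same hook twice.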
Next steps
- Getting Started — install, init, first scan
- Claude Code — hooks, skill, examples
- OpenClaw — plugin, channel mode, OOB approvals
- How It Works — terminal mode, channel mode, memory forecasting
- Security Policies — ALLOW / ASK / DENY rules, OWASP coverage
- CLI Reference — every command
- Security Scan — 13-check audit, drift monitoring
- Reins Cloud — centralized governance