Gormes

Safety, Sandboxing & Deployment

Agent safety architectures, code execution sandboxing, production deployment patterns, and multi-channel gateways.

Building a better agentic OS requires not just capability but reliability. This section covers safety architectures, sandboxing approaches, and production deployment patterns from both academic research and real-world systems.

PART 1: Safety Architectures

1.1 MOSAIC — Plan-Check-Act Safety Loop

| Field | Value |
|---|---|
| Title | MOSAIC: Aligning Multi-Step Tool Use Safely |
| Year | 2025 |

Core Pattern: Structure every agent turn as Plan → Check → Act/Refuse.

┌──────────┐     ┌──────────┐     ┌──────────────┐
│  Plan    │────→│  Check   │────→│ Act or Refuse │
│  (LLM)   │     │ (Safety) │     │   (Tool)      │
└──────────┘     └──────────┘     └──────────────┘
       ↑                                  │
       └────────── Feedback ──────────────┘

Key features:

  • Explicit safety reasoning as a first-class step (not bolted on)
  • “Refuse” as a legitimate action choice
  • Preference-based RL with pairwise trajectory comparisons

Results: 50% reduction in harmful behavior; 20% increase in refusal on injection attacks.

Why it matters: The plan-check-act pattern should be the default safety loop in any agentic OS. It is the control flow itself, not an afterthought.
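
The control flow above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical planner, safety checker, and tool registry passed in by the caller; none of these names come from the MOSAIC paper, and the toy checker stands in for whatever safety reasoning a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Step:
    tool: str       # name of the tool the planner wants to call
    argument: str   # single string argument, for simplicity

@dataclass
class Outcome:
    action: str     # "act" or "refuse"
    detail: str     # tool result, or the reason for refusing

def run_turn(plan: Callable[[str], Step],
             check: Callable[[Step], str],
             tools: Dict[str, Callable[[str], str]],
             request: str) -> Outcome:
    """One agent turn: plan, then check, then either act or refuse."""
    step = plan(request)
    reason = check(step)          # empty string means "no objection"
    if reason:
        return Outcome("refuse", reason)
    if step.tool not in tools:
        return Outcome("refuse", f"unknown tool: {step.tool}")
    return Outcome("act", tools[step.tool](step.argument))

# Hypothetical stand-ins for the LLM planner and the safety checker.
def toy_plan(request: str) -> Step:
    return Step(tool="shell", argument=request)

def toy_check(step: Step) -> str:
    return "destructive command" if "rm -rf" in step.argument else ""

TOOLS = {"shell": lambda arg: f"ran: {arg}"}

safe = run_turn(toy_plan, toy_check, TOOLS, "ls")
blocked = run_turn(toy_plan, toy_check, TOOLS, "rm -rf /")
```

Here `safe` is an "act" outcome and `blocked` is a "refuse" outcome with the checker's reason attached. In a full loop the `Outcome` would be fed back into the next planning call, which is the feedback edge in the diagram.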


1.2 IntentGuard — Training-Free Runtime Gates

Two safety gates, model-agnostic, no training required:

  1. Plan Gate: Evaluates high-level plan safety before execution begins
  2. Tool Gate: Evaluates individual tool invocations against intent alignment

Continuous intent alignment: Each action is checked against the user’s original objective. This blocks multi-step chains that drift from that intent, as well as tool-chain attacks.

Why it matters: Runtime guardrails that work with any model, no fine-tuning needed. The plan gate catches strategic problems; the tool gate catches tactical problems.
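
The two gates can be illustrated with a small sketch. The scoring rule used here (a per-intent tool allowlist called `INTENT_TOOLS`) is a placeholder assumption chosen to keep the example self-contained; it is not IntentGuard's actual alignment check, which the text above does not specify.

```python
from typing import List, Set, Tuple

# Hypothetical mapping from a declared user intent to the tools it may need.
INTENT_TOOLS = {
    "summarize_inbox": {"mail.read", "llm.summarize"},
}

def plan_gate(intent: str, plan: List[str]) -> bool:
    """Approve the plan only if every planned tool fits the intent."""
    allowed: Set[str] = INTENT_TOOLS.get(intent, set())
    return all(tool in allowed for tool in plan)

def tool_gate(intent: str, tool: str) -> bool:
    """Re-check a single invocation at call time, against the same intent."""
    return tool in INTENT_TOOLS.get(intent, set())

def execute(intent: str, plan: List[str], calls: List[str]) -> Tuple[List[str], str]:
    """Run `calls`, returning (executed tools, status)."""
    if not plan_gate(intent, plan):
        return [], "blocked at plan gate"
    executed: List[str] = []
    for tool in calls:
        if not tool_gate(intent, tool):
            return executed, f"blocked at tool gate: {tool}"
        executed.append(tool)
    return executed, "ok"

# The approved plan is benign, but the actual calls drift toward exfiltration.
done, status = execute(
    "summarize_inbox",
    plan=["mail.read", "llm.summarize"],
    calls=["mail.read", "http.post", "llm.summarize"],
)
```

The plan passes the first gate, yet the injected `http.post` call is stopped at the second one, so only `mail.read` runs. That is the strategic versus tactical split described above.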


1.3 Agent-C — Formal Temporal Guarantees

| ArXiv | 2512.23738 |

Core innovation: A DSL for expressing temporal safety properties (e.g., “authenticate BEFORE accessing data”). Specifications are translated to first-order logic and verified with an SMT solver, and constrained generation ensures every tool call satisfies the temporal constraints.

Results: 100% safety conformance while improving utility.

Example constraint: AUTHENTICATE BEFORE QUERY — agent must call auth() before db.query(), always, provably.

Why it matters: Formal methods for agent safety give mathematical proofs rather than probabilistic guarantees, which is critical for high-assurance deployments.
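
To make the "X BEFORE Y" property concrete, the sketch below checks it over a recorded trace of tool calls. Note the substitution: this is a plain runtime trace check, not the SMT-backed constrained generation described above, and the class name `BeforeConstraint` is hypothetical.

```python
from typing import Iterable, List

class BeforeConstraint:
    """Requires `first` to have been called before any call to `then`."""
    def __init__(self, first: str, then: str) -> None:
        self.first = first
        self.then = then

    def violations(self, trace: Iterable[str]) -> List[int]:
        """Return indexes in `trace` where `then` occurs before `first`."""
        seen_first = False
        bad: List[int] = []
        for index, call in enumerate(trace):
            if call == self.first:
                seen_first = True
            elif call == self.then and not seen_first:
                bad.append(index)
        return bad

auth_before_query = BeforeConstraint("auth", "db.query")
ok_trace = ["auth", "db.query", "db.query"]
bad_trace = ["db.query", "auth", "db.query"]

ok_result = auth_before_query.violations(ok_trace)
bad_result = auth_before_query.violations(bad_trace)
```

The first trace has no violations; the second is flagged at index 0, where the query runs before authentication. A check like this detects violations after the fact, whereas the approach above is meant to rule them out before a call is emitted.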


1.4 MCP Guardrails

Framework: Multi-layer guardrails for Model Context Protocol tool use

| Layer | Mechanism | What It Prevents |
|---|---|---|
| Static schemas | Input/output type validation | Malformed tool calls |
| Pre/post conditions | Inferred invariants | State violations |
| Dynamic policies | RBAC, rate limits, loop prevention | Abuse, runaway agents |
| Runtime monitoring | Real-time tool call inspection | Prompt injection via tools |

Results: 95-100% unsafe action blocking.
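
Two of these layers, static schemas and dynamic policies, are easy to sketch together. The example below is an assumption-laden illustration: the `Guardrail` class, its parameters, and the `read_file` tool are all hypothetical and are not part of the Model Context Protocol or of the framework summarized above.

```python
from collections import Counter
from typing import Any, Dict, Optional

class Guardrail:
    """Checks a tool call against a schema, a call budget, and a repeat limit."""
    def __init__(self, schemas: Dict[str, Dict[str, type]],
                 max_calls: int, max_repeats: int) -> None:
        self.schemas = schemas
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.total = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, args: Dict[str, Any]) -> Optional[str]:
        """Return None if the call may proceed, else a reason for blocking."""
        schema = self.schemas.get(tool)
        if schema is None:
            return "unknown tool"
        # Static schema layer: exact field set and field types.
        if set(args) != set(schema):
            return "argument names do not match schema"
        for name, expected in schema.items():
            if not isinstance(args[name], expected):
                return f"bad type for {name}"
        # Dynamic policy layer: overall budget, then identical-call loops.
        if self.total >= self.max_calls:
            return "rate limit exceeded"
        key = (tool, tuple(sorted(args.items())))
        if self.seen[key] >= self.max_repeats:
            return "loop detected"
        self.total += 1
        self.seen[key] += 1
        return None

guard = Guardrail({"read_file": {"path": str}}, max_calls=5, max_repeats=2)
repeated = [guard.check("read_file", {"path": "a.txt"}) for _ in range(3)]
wrong_type = guard.check("read_file", {"path": 3})
unknown = guard.check("delete_all", {})
```

The third identical call is blocked as a loop, the integer path fails schema validation, and the unregistered tool is rejected outright. The loop key assumes argument values are hashable, which holds for the simple scalar arguments used here.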


1.5 Safety Architecture Principle: Layered Defense

From Hermes Agent’s production security model:

┌─────────────────────────────────────────┐
│         Layer 1: Prompt-Level            │
│  Instruction filtering, intent analysis  │
├─────────────────────────────────────────┤
│         Layer 2: Runtime Gates           │
│  Plan gate → Tool gate → Approval loop   │
├─────────────────────────────────────────┤
│         Layer 3: OS Isolation            │
│  Docker sandbox, path traversal guard    │
├─────────────────────────────────────────┤
│         Layer 4: Network Boundary        │
│  SSRF mitigation, egress filtering       │
└─────────────────────────────────────────┘

Key insight: No single layer suffices. Prompt filtering, runtime gates, OS isolation, and network boundaries must work together. A safety gap at any layer becomes the attack surface.


PART 2: Code Execution Sandboxing

2.1 OpenSandbox (Alibaba)

GitHub: github.com/alibaba/OpenSandbox

Production-grade general sandbox runtime:

  • Multi-language SDKs: Python, Java/Kotlin, TypeScript, C#, Go
  • Unified API: Single interface across runtimes
  • Security runtimes: gVisor, Kata Containers, Firecracker microVM
  • Unified gateway: Multiple routing strategies, per-sandbox egress control
  • Use cases: Coding agents, GUI agents, evaluation, code execution, RL training

Why it matters: Production reference for sandbox design. Multi-language, strong isolation, enterprise-hardened.


2.2 AgentBay (Alibaba Cloud)

Cloud-native agent operating environment with hardware-level isolation:

  • Per-instance VMs: Each sandbox in its own guest OS (kernel-level isolation)
  • VPC networking: No public IP, default-deny security groups
  • Session ephemerality: Auto-destroy after timeout/termination (no data residue)
  • Memory isolation: Cross-session context and history isolated per tenant

Security model: Zero-trust, defense-in-depth, secure-by-design.


2.3 Fault-Tolerant Sandboxing (2025)

| ArXiv | 2512.12806 |

The “middle path” between unsafe local execution and heavyweight VM isolation:

Core mechanism: Each tool call wrapped as an atomic transaction (ACID properties):

  1. Pre-execution validation: Commands classified as safe (whitelist), unsafe (blacklist), or uncertain (needs snapshot)
  2. Transactional filesystem: Snapshot before uncertain operations, rollback on failure

Results: 100% high-risk command interception, 100% successful state rollback on failure. Performance overhead: ~14.5% (~1.8s per transaction).

Why it matters: The practical sweet spot — safer than raw local execution, lighter than full VM isolation. Atomic transactions with snapshot/rollback is the key design pattern.
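
The classify, snapshot, rollback sequence can be sketched with the standard library. This is an illustration under stated assumptions: the `SAFE` and `UNSAFE` sets are invented examples, and a whole-directory copy via `shutil.copytree` stands in for whatever snapshot mechanism the paper uses, which is not described here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

SAFE = {"ls", "cat"}            # run directly (hypothetical allowlist)
UNSAFE = {"mkfs", "shutdown"}   # never run (hypothetical denylist)

def classify(command: str) -> str:
    """Classify by the first word of the command line."""
    parts = command.split()
    name = parts[0] if parts else ""
    if name in SAFE:
        return "safe"
    if name in UNSAFE:
        return "unsafe"
    return "uncertain"

def run_transactional(workdir: Path, command: str,
                      action: Callable[[Path], None]) -> str:
    """Run `action` on `workdir`; restore a snapshot if it raises."""
    kind = classify(command)
    if kind == "unsafe":
        return "refused"
    if kind == "safe":
        action(workdir)
        return "committed"
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        shutil.copytree(workdir, snapshot)       # snapshot before running
        try:
            action(workdir)
            return "committed"
        except Exception:
            shutil.rmtree(workdir)               # discard the partial state
            shutil.copytree(snapshot, workdir)   # restore the snapshot
            return "rolled back"

# Demo: an "uncertain" command that deletes a file and then fails.
def failing_edit(root: Path) -> None:
    (root / "notes.txt").unlink()
    raise RuntimeError("tool crashed midway")

with tempfile.TemporaryDirectory() as base:
    work = Path(base) / "work"
    work.mkdir()
    (work / "notes.txt").write_text("keep me")
    status = run_transactional(work, "sed -i s/a/b/ notes.txt", failing_edit)
    restored = (work / "notes.txt").read_text()
```

After the failed action, `status` is "rolled back" and the deleted file is back with its original contents. Copying the whole directory is only workable for small trees; it is used here because it keeps the rollback logic visible.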


2.4 Sandbox Design Principles

| Principle | Mechanism | Source |
|---|---|---|
| Isolation depth | gVisor → Kata → Firecracker (pick based on threat model) | OpenSandbox |
| Atomicity | ACID tool calls with snapshot/rollback | Fault-Tolerant Sandbox |
| Ephemerality | Auto-destroy on session end | AgentBay |
| Network control | Default-deny + explicit egress rules | AgentBay + Hermes |
| Filesystem scoping | Path traversal guards + allowlists | Hermes Agent |
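
As one concrete case, the filesystem scoping principle usually comes down to resolving a requested path and confirming it is still under the sandbox root. The helper names below are hypothetical, and this is a sketch of the general technique rather than Hermes Agent's implementation.

```python
from pathlib import Path

def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve `user_path` under `root`, rejecting anything that escapes it."""
    base = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"path escapes sandbox root: {user_path}")
    return candidate

def is_allowed(root: Path, user_path: str) -> bool:
    """Boolean wrapper around resolve_inside for use in policy checks."""
    try:
        resolve_inside(root, user_path)
        return True
    except PermissionError:
        return False

project = Path("/srv/project")
inside = is_allowed(project, "src/../README.md")
escaped = is_allowed(project, "../../etc/passwd")
absolute = is_allowed(project, "/etc/passwd")
```

The relative path that stays within the project is allowed, while both the `..` escape and the absolute path are rejected. Checking the resolved path rather than the raw string is the important part, since string prefix checks are easy to bypass.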

PART 3: Production Deployment Patterns

3.1 Persistent Agent Daemons

| Project | Language | Channels | Memory | Binary Size |
|---|---|---|---|---|
| aidaemon | Rust | Telegram, Slack, Discord | SQLite + vectors | |
| Talon | TypeScript | Telegram, Slack, WhatsApp, Discord | SQLite | |
| Roger | Go | Discord, Slack, Telegram, Webhooks | PostgreSQL + pgvector | 15MB |
| Pantalk | | 9 platforms | SQLite | |
| Gormes | Go | Telegram, Discord, Slack | SQLite (Goncho) | 22MB |

3.2 Event-Driven Architecture (Production Pattern)

The emerging consensus for production agent systems:

┌──────────┐     ┌──────────┐     ┌──────────┐
│  Kafka   │────→│   A2A    │────→│  Worker  │
│  Topics  │     │  Orch.   │     │  Agents  │
└──────────┘     └──────────┘     └──────────┘
     ↑                                  │
     └──────── Results Publish ─────────┘

Why event-driven:

  • Survives process restarts (events are durable)
  • Decouples channel adapters from agent logic
  • Enables horizontal scaling (more workers, more throughput)
  • Natural fit for multi-channel (each channel publishes to same event stream)
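
The properties listed above can be illustrated with an append-only log and per-consumer offsets. The `EventLog` class is a hypothetical in-memory stand-in for a broker such as Kafka; it is not durable itself and only shows how committed offsets let a worker resume where it stopped.

```python
from typing import Callable, Dict, List

class EventLog:
    """Append-only in-memory log with per-consumer offsets (broker stand-in)."""
    def __init__(self) -> None:
        self.events: List[dict] = []
        self.offsets: Dict[str, int] = {}

    def publish(self, event: dict) -> None:
        self.events.append(event)

    def consume(self, group: str, handler: Callable[[dict], None]) -> int:
        """Deliver unread events to `handler`; commit the offset after each."""
        start = self.offsets.get(group, 0)
        for index in range(start, len(self.events)):
            handler(self.events[index])
            self.offsets[group] = index + 1
        return len(self.events) - start

log = EventLog()
# Channel adapters publish to the same stream, whatever the platform.
log.publish({"channel": "telegram", "text": "hello"})
log.publish({"channel": "slack", "text": "status?"})

handled: List[str] = []
first = log.consume("workers", lambda e: handled.append(e["channel"]))
# A "restarted" worker resumes from the committed offset, not from zero.
log.publish({"channel": "discord", "text": "ping"})
second = log.consume("workers", lambda e: handled.append(e["channel"]))
```

The first pass handles two events and the second handles only the new one, so nothing is reprocessed. Adapters never call the worker directly, which is the decoupling the list describes.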

3.3 Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| while True: sleep() | Wastes CPU, fragile | OS scheduler (systemd/launchd/cron) |
| Tight coupling | Change one thing, break everything | Event-driven pub/sub |
| No checkpointing | Crash = lost context | Durable state from day one |
| Single provider | Provider outage = agent down | Multi-provider with circuit breakers |

3.4 Production Reliability Patterns

| Pattern | Mechanism |
|---|---|
| Dead letter queues | Failed events retry with exponential backoff, then DLQ |
| Health monitoring | Proactive ping every 60-120s, alert below 95% success |
| Circuit breakers | Per-provider, per-API-key, threshold-based |
| Graceful degradation | Fallback chains, latency thresholds (not just 4xx/5xx) |
| Checkpointing | PostgreSQL/SQLite for pause/resume, time-travel debugging |
| Streaming retry rule | Can only retry if no data sent yet (streaming partial response cannot be replayed) |
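
Two of these patterns, retry-then-DLQ and the streaming retry rule, are sketched below. The function names, the attempt count, and the base delay are illustrative assumptions; the `sleep` parameter is injectable so the backoff schedule can be observed without actually waiting.

```python
import time
from typing import Callable, List

def process_with_retry(event: dict, handler: Callable[[dict], None],
                       dead_letters: List[dict], max_attempts: int = 4,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try `handler` up to `max_attempts` times, then park the event in the DLQ."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))   # delay doubles each time
    dead_letters.append(event)
    return False

def can_retry_stream(bytes_sent: int) -> bool:
    """A streamed response is only retryable before the first byte goes out."""
    return bytes_sent == 0

delays: List[float] = []
dlq: List[dict] = []

def always_fails(event: dict) -> None:
    raise ConnectionError("provider unavailable")

ok = process_with_retry({"id": 1}, always_fails, dlq, sleep=delays.append)
```

With the defaults, the handler is tried four times with waits of 0.5, 1.0, and 2.0 seconds in between, after which the event lands in `dlq`. Production code would typically add jitter to the delay and catch only retryable exception types.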

3.5 Multi-Channel Gateway Pattern

From OpenClaw blog and production systems:

┌────────────────────────────────────────────┐
│              Gateway Process                │
│                                              │
│  ┌────────┐ ┌───────┐ ┌────────┐           │
│  │Telegram│ │Discord│ │ Slack  │  ...      │
│  │Adapter │ │Adapter│ │Adapter │           │
│  └────────┘ └───────┘ └────────┘           │
│        │         │         │                │
│        └─────────┴─────────┘                │
│                  │                          │
│           Shared Kernel                     │
│    (ReAct loop, tools, memory)              │
└────────────────────────────────────────────┘

Key design decisions:

  • Routing is deterministic (configured), not model-driven
  • Session isolation: Per-channel sessions, shared memory files
  • Platform quirks handled in adapters: Slack=Socket Mode, Discord=Gateway+REST, Telegram=long-poll

3.6 Provider Routing for Production

| Strategy | How It Works | Best For |
|---|---|---|
| Priority/Failover | Ordered list, try first, fallback on error | Simple, predictable |
| Weighted/Round-robin | Distribute evenly across healthy | Load balancing |
| Cost-optimized | Cheapest healthy that meets quality | Budget-conscious |
| Latency-weighted | Fastest healthy provider | Interactive use |
| Capability-based | Route by task complexity | Tiered model strategy |

Critical pattern: Not all errors should trigger fallback:

  • 400 (bad request) → return immediately (the request itself is wrong)
  • 429/503 → fallback (the provider is having trouble)

Circuit breaker design: Track P95 latency, not just P50. Degraded performance at the tail matters more than median performance.
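
A minimal sketch of priority failover with error classification, plus a P95 helper, is shown below. The `ProviderError` type, the provider callables, and the nearest-rank percentile method are assumptions made for illustration; real provider SDKs raise their own exception types.

```python
from typing import Callable, List, Tuple

class ProviderError(Exception):
    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status

FALLBACK_STATUSES = {429, 503}   # provider-side trouble: try the next one

def call_with_failover(providers: List[Tuple[str, Callable[[str], str]]],
                       prompt: str) -> Tuple[str, str]:
    """Try providers in priority order; return (provider name, response)."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if err.status not in FALLBACK_STATUSES:
                raise              # e.g. 400: the request itself is wrong
            last_error = err
    raise RuntimeError("all providers failed") from last_error

def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return ordered[rank - 1]

def overloaded(prompt: str) -> str:
    raise ProviderError(503)

def healthy(prompt: str) -> str:
    return f"answer to {prompt}"

used, answer = call_with_failover(
    [("primary", overloaded), ("backup", healthy)], "hi")
tail = p95([float(ms) for ms in range(1, 101)])
```

The 503 from the primary triggers a fallback to the backup, whereas a 400 would propagate immediately because retrying elsewhere cannot fix a malformed request. A circuit breaker could compare `p95` over a sliding window against a threshold to open the circuit on tail degradation.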


PART 4: Building an Agentic OS

Safety Checklist

  • Plan-check-act loop as default control flow
  • Tool-level access control (who can call what)
  • Temporal safety constraints (must X before Y)
  • Runtime approval gates for dangerous operations
  • OS-level isolation for code execution

Deployment Checklist

  • Event-driven architecture (not polling)
  • Durable state with checkpointing
  • Multi-provider routing with circuit breakers
  • Dead letter queues for failed events
  • Health monitoring with proactive pings
  • Graceful degradation under provider failure

Sandboxing Checklist

  • Transactional tool execution (snapshot/rollback)
  • Filesystem scoping (no access outside project)
  • Network egress control
  • Auto-destroy after session timeout
  • Path traversal protection