Gormes Completion Plan

This is the execution plan for finishing gormes-agent. The finish line is not an MVP, not a partial wrapper, and not only “enough to improve itself from Telegram”: Gormes is complete when it is Hermes in Go, with Goncho as the Honcho-compatible Go port inside Gormes.

The canonical backlog remains progress.json. This page explains how to drive that backlog without spending unbounded planner tokens or creating parallel queues.

Use this page with:

Completion Lane Roadmap for phase-to-lane ownership and lane-specific gates.
Hermes And Honcho Feature Map for the upstream feature-to-Go package map.
Hermes/Honcho To Gormes Go Runtime Plan for the reconciled implementation plan, subsystem classification, nested coverage matrix, and Go package dependency order.
Upstream Coverage Ledger for the audit rule that tells us whether every feature-bearing Hermes/Honcho source class has been mapped.
Swarm Feature Parity Audit for the feature-level gap register found by parallel sub-agent parity audits.
Agent Operating Model for exactly how Codex, Claude, claudeu, and codexu should run planner, builder, parity, TDD, and interface-design passes.
Contract Readiness for the row-level handoff fields that make a slice builder-executable.

Non-Negotiables

Hermes parity is the product definition. CLI, agent loop, provider routing, tool execution, memory, skills, plugins, API, TUI, gateway, channels, cron, packaging, observability, and operations must have Go-native equivalents or explicit tested divergences.
Config, commands, providers, and operator experience are core parity lanes. Hermes-compatible config precedence, command names, slash commands, gateway commands, provider routing/auth/usage, error surfaces, status output, and local interactive behavior are not polish. They are part of the runtime contract an operator depends on while running long coding turns.
Divergence must be deliberate, visible, and tested. A Go-native replacement is acceptable only when the docs and tests name the upstream Hermes behavior, explain why Gormes owns a different contract, and prove the operator-visible result.
Goncho is in-process Gormes memory. Internal code stays goncho; public compatibility can expose honcho_* names when tools, MCP clients, or existing users depend on them.
progress.json is the only backlog. Missing work becomes a row. Broad work becomes an umbrella row until split. No side TODOs, private queues, or agent-local task lists.
Rows must be builder-executable. A runnable row names source refs, write scope, test commands, acceptance, ready/not-ready conditions, and a done signal.
Every runtime claim needs tests. Prefer hermetic fixtures. Live provider, live platform, and live cloud checks are opt-in smoke tests, not row-local proof.
Planning is bounded. Planner passes map parity and sharpen rows. They do not run indefinitely and do not implement runtime code.
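
The builder-executable rule above can be pictured as a small readiness check. This is a minimal sketch only: the `Row` struct, its field names, and `MissingFields` are hypothetical and are not the real progress.json schema or any Gormes API.

```go
package main

import "fmt"

// Row is a hypothetical, simplified view of one progress.json row.
// The field names are assumptions for illustration, not the real schema.
type Row struct {
	Name         string
	SourceRefs   []string // upstream paths the row ports
	WriteScope   []string // Go packages or files the builder may touch
	TestCommands []string // commands that prove the row
	Acceptance   string
	ReadyWhen    string // ready/not-ready condition
	DoneSignal   string
}

// MissingFields lists the contract fields a row still lacks.
// An empty result means the row is builder-executable under this sketch.
func MissingFields(r Row) []string {
	var missing []string
	if len(r.SourceRefs) == 0 {
		missing = append(missing, "source refs")
	}
	if len(r.WriteScope) == 0 {
		missing = append(missing, "write scope")
	}
	if len(r.TestCommands) == 0 {
		missing = append(missing, "test commands")
	}
	if r.Acceptance == "" {
		missing = append(missing, "acceptance")
	}
	if r.ReadyWhen == "" {
		missing = append(missing, "ready/not-ready condition")
	}
	if r.DoneSignal == "" {
		missing = append(missing, "done signal")
	}
	return missing
}

func main() {
	row := Row{
		Name:       "example row",
		SourceRefs: []string{"run_agent.py"},
		DoneSignal: "row-local tests pass",
	}
	// A non-empty list means the row goes back to planning, not to a builder.
	fmt.Println(MissingFields(row))
}
```

A row that reports missing fields is planner work, which keeps builders from rediscovering scope on their own.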

Current Finish Ledger

As of the current progress.json, the remaining work is concentrated in the native agent spine, tool/security surface, release surface, learning loop, and paused channel backlog. Do not let Phase 7 channel expansion outrun the core agent, Goncho, tool, and release lanes.

Phase	Open rows	Planner meaning
Phase 1 — Dashboard / control plane	0	Skill-era control rows are complete: planning/building route through canonical development skills and symlink loader views instead of deleted loop binaries.
Phase 2 — Gateway	15	Mostly channel polish, home-channel ownership, and mid-run steering. Keep these behind Lane 1/2 unless a row unblocks normal operation.
Phase 3 — Memory	3	Goncho now has an explicit drop-in compatibility closure: SDK-style harness, normal-turn integration, and empty peer-card hint.
Phase 4 — Brain Transplant	30	Highest strategic pressure: provider, context, prompt, routing, OAuth, retry, telemetry, and the new normal-turn e2e closure decide whether Gormes is really Hermes in Go.
Phase 5 — Final Purge	70	Largest backlog: tools, sandboxing, browser/media, security, CLI/API/TUI, packaging, and release. Split umbrellas before building.
Phase 6 — Learning Loop	11	Depends on the skills substrate and memory evidence; build after the skill storage/retrieval rows are sharper.
Phase 7 — Paused Channels	16	Explicit backlog. Build only fixture-ready slices or channel dependencies that unblock Lane 4.

The first closure target is not “all green”; it is a Python-free normal agent turn with local Goncho memory and tested tool-call continuation. That is a dogfood gate, not a reduced finish line: once it works, Gormes must still keep closing Hermes parity across config, commands, providers, tools, TUI/API, gateway, release, and operator experience. The target turn is:

CLI/API/gateway input
  -> Go prompt/context assembly
  -> Go provider adapter
  -> Go tool execution
  -> Goncho/memory recall and persistence
  -> Go final response + audit/status evidence
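
The turn above can be sketched as a small Go loop with fakes standing in for each stage. Every type and function here (`Provider`, `Tools`, `Memory`, `RunTurn`, and the fakes) is a hypothetical stand-in for illustration, not a Gormes interface, and the step budget is an assumed safeguard.

```go
package main

import (
	"errors"
	"fmt"
)

// Reply is a hypothetical provider result: final text or a tool request.
type Reply struct {
	Text     string
	ToolName string // non-empty means the model asked for a tool
	ToolArgs string
}

type Provider interface {
	Complete(prompt string) (Reply, error)
}

type Tools interface {
	Run(name, args string) (string, error)
}

type Memory interface {
	Recall(input string) []string
	Persist(input, output string)
}

// RunTurn assembles a prompt from input plus recalled memory, calls the
// provider, feeds tool results back until a final reply arrives, persists
// the exchange, and returns the reply with a simple audit trail.
func RunTurn(input string, p Provider, t Tools, m Memory, maxSteps int) (string, []string, error) {
	prompt := input
	for _, fact := range m.Recall(input) {
		prompt += "\n[memory] " + fact
	}
	audit := []string{"prompt assembled"}
	for step := 0; step < maxSteps; step++ {
		reply, err := p.Complete(prompt)
		if err != nil {
			return "", audit, fmt.Errorf("provider: %w", err)
		}
		if reply.ToolName == "" {
			m.Persist(input, reply.Text)
			return reply.Text, append(audit, "final response"), nil
		}
		result, err := t.Run(reply.ToolName, reply.ToolArgs)
		if err != nil {
			return "", audit, fmt.Errorf("tool %s: %w", reply.ToolName, err)
		}
		audit = append(audit, "tool:"+reply.ToolName)
		prompt += "\n[tool " + reply.ToolName + "] " + result // tool-call continuation
	}
	return "", audit, errors.New("tool-call budget exhausted")
}

// Fakes keep the sketch hermetic: no live provider, platform, or store.
type fakeProvider struct{ calls int }

func (f *fakeProvider) Complete(string) (Reply, error) {
	f.calls++
	if f.calls == 1 {
		return Reply{ToolName: "echo", ToolArgs: "hi"}, nil
	}
	return Reply{Text: "done"}, nil
}

type fakeTools struct{}

func (fakeTools) Run(name, args string) (string, error) { return args, nil }

type fakeMemory struct{ saved []string }

func (f *fakeMemory) Recall(string) []string { return []string{"operator prefers Go"} }
func (f *fakeMemory) Persist(in, out string) { f.saved = append(f.saved, in+" => "+out) }

func main() {
	out, audit, err := RunTurn("hello", &fakeProvider{}, fakeTools{}, &fakeMemory{}, 4)
	fmt.Println(out, audit, err)
}
```

A harness shaped like this can fail at a named stage, which is the kind of evidence a closure test needs.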

Telegram Dogfood Milestone (“Gormes finishes itself”)

Goal: operate Gormes from Telegram as the primary operator surface while Gormes continues shipping the remaining parity rows. This milestone proves that the runtime can steer and validate its own work; it does not redefine completion as “Telegram works.”

Execution sequence:

1. Control-plane safety first (Phase 2/5).
   - Land /steer queue fallback (2.F.5.1) so operators can issue bounded steering instructions during active work.
   - Land gateway /usage binding (4.H.13) and /status parity surfaces so runtime health, rate limits, and stuck sessions are visible from Telegram.
2. CLI/config parity closure (Phase 5.O).
   - Finish command-tree manifest (5.O.1) and migrate/config rows (5.O.18..5.O.23) so Telegram-driven sessions can rely on the same deterministic runtime/config behavior as Hermes.
   - Treat Hermes command names, aliases, root flags, profile/model/provider selection, config show/path/env-path/set/check/edit/migrate, auth, logs, status, backup, update, and dynamic plugin commands as parity targets unless a row explicitly marks an owned Gormes divergence.
3. Provider and account-control closure (Phase 4.A/4.G/4.H).
   - Keep at least one coding-capable provider and one fallback provider stable for dogfood, but continue toward Hermes provider parity: streaming, tool-call continuation, auth/token refresh, retries, rate evidence, context limits, model quirks, usage/cost reporting, and visible failure classification.
4. Tool/runtime closure (Phase 5.A/B/J).
   - Complete remaining core tool registry and sandbox-policy rows before broad channel expansion.
5. Operator experience closure (Phase 5.Q plus gateway/TUI rows).
   - Match the Hermes operator feel where it matters: slash completion, busy-turn steering, status/footer evidence, prompt symbols, tool progress, approval prompts, interrupt/edit helpers, gateway status/usage, and recoverable failure output.
6. Operator e2e gate.
   - Prove one full “plan -> build -> validate -> report” loop executed from Telegram without Python fallbacks, using only Gormes runtime and Goncho memory surfaces.

Definition of done for this lane:

Telegram session can start work, steer active work, inspect status/usage, and receive validated completion evidence.
Remaining implementation rows can then be executed through that same Telegram surface as the default operator workflow, while the finish line remains broad Hermes parity with explicit tested divergences only.
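
One way to picture bounded steering during active work is a fixed-capacity queue that rejects overflow visibly instead of growing. The sketch below is an assumption-laden illustration: `SteerQueue`, `Offer`, `Drain`, and `ErrQueueFull` are hypothetical names and do not describe the actual /steer contract in row 2.F.5.1.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrQueueFull tells the operator that the instruction was not accepted.
var ErrQueueFull = errors.New("steer queue full")

// SteerQueue is a hypothetical bounded queue for steering instructions
// that arrive while a turn is busy.
type SteerQueue struct {
	pending chan string
}

func NewSteerQueue(capacity int) *SteerQueue {
	return &SteerQueue{pending: make(chan string, capacity)}
}

// Offer enqueues without blocking. A full queue returns an error so the
// gateway can report it instead of silently dropping the instruction.
func (q *SteerQueue) Offer(instruction string) error {
	select {
	case q.pending <- instruction:
		return nil
	default:
		return ErrQueueFull
	}
}

// Drain returns every queued instruction in arrival order without blocking.
// The agent loop would call this between steps of the active turn.
func (q *SteerQueue) Drain() []string {
	var out []string
	for {
		select {
		case s := <-q.pending:
			out = append(out, s)
		default:
			return out
		}
	}
}

func main() {
	q := NewSteerQueue(2)
	fmt.Println(q.Offer("focus on tests"), q.Offer("skip docs"), q.Offer("one too many"))
	fmt.Println(q.Drain())
}
```

The fixed capacity is the point of the design: steering stays bounded, and a rejected instruction is something the operator can see and retry.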

Docs Spine

Need	Start here
Overall finish line	This page
Phase-to-lane ownership and gates	Completion Lane Roadmap
Upstream feature-to-Go package map	Hermes And Honcho Feature Map
Reconciled Go implementation plan	Hermes/Honcho To Gormes Go Runtime Plan
Completeness audit for upstream mapping	Upstream Coverage Ledger
Feature-level swarm gap register	Swarm Feature Parity Audit
How agents should run each pass	Agent Operating Model
Current generated roadmap	Architecture Plan
Upstream feature inventory	Subsystem Inventory
Row handoff requirements	Contract Readiness
Skill-builder queue and selection	Skill Builder Handoff
Test expectations	Testing

Skill-Routed Operating Model

Every substantial agent pass starts by choosing a repo-local skill. Canonical skill files live under docs/development-skills/; .agents/skills/, .claude/skills/, and .codex/skills/ are symlink loader views.

Situation	Skill path
Unsure what workflow applies	`gormes-skill-manager`
Mapping upstream Hermes/Honcho/GBrain gaps	`gormes-parity-auditor`
Updating `progress.json`, phases, or docs	`gormes-planner`
Designing a Go package/API boundary	`gormes-interface-designer`
Implementing one row	`gormes-builder`
Delivering one behavior with red-green-refactor	`gormes-tdd-slice`
Stress-testing a plan with the operator	`grill-me`

The default flow is:

parity audit -> planner row refinement -> builder row execution -> TDD slice -> validation

Do not recreate the old loop binaries. gormes-planner and gormes-builder are manual skill-routed workflows; repository evidence and progress.json are the source of truth.

The operating rule is:

skill -> bounded scan -> row/doc change -> validation -> short handoff

If a pass cannot name its lane, subsystem, expected files, and validation gates, it is too vague to run.

New Closure Subphases

The roadmap now has explicit closure subphases for the work that was previously spread across prose or broad phase headings.

Subphase	Purpose	First rows
1.D — Skill-Driven Control Plane	Keep all agents on skills + `progress.json` after deleting loop binaries.	Skill-manager selection matrix hardening; skill-pack coverage audit.
3.G — Goncho Drop-In Compatibility Closure	Prove Goncho is the Honcho-compatible Go memory port, not just local memory pieces.	Goncho Honcho SDK compatibility e2e harness; Goncho memory integration into normal agent turn.
4.I — Native Agent Turn Closure	Prove the actual Hermes-in-Go normal turn across provider, tools, memory, final response, and audit.	Python-free normal agent turn e2e harness; provider-tool-memory golden transcript suite; Hermes/Honcho feature map.

No-Loop Execution Ladder

The deleted loop binaries are replaced by this repeatable ladder. Every agent uses it; no private scheduler, side queue, or ad hoc task list is allowed.

Step	Skill	Output	Validation
1. Route	`gormes-skill-manager`	One selected workflow and reason.	None beyond naming the skill.
2. Audit	`gormes-parity-auditor`	Covered/planned/vague/missing/owned map for one lane surface.	Exact upstream and Gormes paths named.
3. Plan	`gormes-planner`	Updated docs/rows with builder-ready contracts.	`go run ./cmd/progress validate`, docs/progress tests when changed.
4. Design	`gormes-interface-designer` when needed	Chosen Go package/API boundary.	Row updated with write scope and tests.
5. Build	`gormes-builder`	One row implemented.	Row-local tests, focused package gate, progress validation.
6. TDD	`gormes-tdd-slice`	Red-green-refactor evidence for the behavior.	Test first or explicit reason why not feasible.
7. Handoff	Same skill used for the pass	Done signal, files changed, tests run, next row.	Final report is resumable.

If a row is not executable by step 5, the correct action is to return to step 3 and sharpen the row. Do not compensate by asking a builder to rediscover the architecture.
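
The return-to-planning rule can be sketched as a tiny step function. The step constants mirror the ladder table; `RowState` and its fields are hypothetical and only illustrate what "executable" might check.

```go
package main

import "fmt"

// Step numbers follow the ladder table above.
const (
	StepRoute = iota + 1
	StepAudit
	StepPlan
	StepDesign
	StepBuild
	StepTDD
	StepHandoff
)

// RowState is a hypothetical summary of row readiness.
type RowState struct {
	HasWriteScope bool
	HasTests      bool
	HasAcceptance bool
}

func (r RowState) Executable() bool {
	return r.HasWriteScope && r.HasTests && r.HasAcceptance
}

// NextStep advances one rung, except that a row which is not executable
// is sent back to planning instead of being handed to a builder.
func NextStep(current int, r RowState) int {
	if current >= StepHandoff {
		return StepHandoff
	}
	next := current + 1
	if next == StepBuild && !r.Executable() {
		return StepPlan
	}
	return next
}

func main() {
	vague := RowState{HasWriteScope: true}
	ready := RowState{HasWriteScope: true, HasTests: true, HasAcceptance: true}
	fmt.Println(NextStep(StepDesign, vague), NextStep(StepDesign, ready))
}
```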

Completion Lanes

These lanes cut across the existing phases. Each lane is done only when the corresponding progress.json rows are shipped and tests prove the user-visible contract.

Lane 0 — Control Plane Discipline

Goal: make autonomous work reliable enough to finish the product.

Done means:

builder skills select only rows with test proof or explicit no_test_required;
planner passes preserve row health fields and do not mutate runtime code;
invalid progress.json blocks work until planning fixes it;
repo-local skills route all future agent work;
docs, progress rows, and generated queue pages agree.

Primary gates for control-plane code changes:

go test ./internal/progress -count=1
go run ./cmd/progress validate
go test ./docs -count=1

Planner-doc passes should use non-loop validation only:

go run ./cmd/progress validate
go test ./internal/progress -count=1
go test ./docs -count=1

Lane 1 — Native Agent Spine

Goal: replace Hermes’ Python run_agent.py responsibilities with Go-native provider, prompt, context, kernel, retry, tool-call, and telemetry contracts.

Done means:

provider adapters normalize requests, streaming, usage, errors, and retries;
prompt/context assembly handles project instructions, skills, memory, session search, compression, redaction, and references;
tool-call parsing and repair happen before tool execution;
model routing, cost, budgets, and provider degradation are visible in status and audit logs;
the agent loop can run without Python for normal chat/tool sessions.
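
As an illustration of parsing and repair happening before execution, the sketch below decodes tool arguments and attempts one repair before giving up. The `ToolCall` type and `ParseToolCall` are hypothetical, and the single repair shown (stripping a Markdown code fence around the JSON) is an assumed example, not the Hermes repair rule set.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is three backticks, the Markdown code-fence marker.
var fence = strings.Repeat("`", 3)

// ToolCall is a hypothetical normalized tool request.
type ToolCall struct {
	Name string
	Args map[string]any
}

// ParseToolCall decodes raw model arguments before any tool runs.
// If the first decode fails it tries one repair, then reports an error
// so the malformed call never reaches a tool handler.
func ParseToolCall(name, rawArgs string) (ToolCall, error) {
	if name == "" {
		return ToolCall{}, fmt.Errorf("tool call has no name")
	}
	args, err := decodeArgs(rawArgs)
	if err != nil {
		args, err = decodeArgs(stripFence(rawArgs))
	}
	if err != nil {
		return ToolCall{}, fmt.Errorf("tool %q: unrepairable arguments: %w", name, err)
	}
	return ToolCall{Name: name, Args: args}, nil
}

func decodeArgs(raw string) (map[string]any, error) {
	var args map[string]any
	if err := json.Unmarshal([]byte(strings.TrimSpace(raw)), &args); err != nil {
		return nil, err
	}
	return args, nil
}

// stripFence removes a leading fence line (with optional language tag)
// and a trailing fence, leaving the body untouched.
func stripFence(raw string) string {
	s := strings.TrimSpace(raw)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	if i := strings.Index(s, "\n"); i >= 0 {
		s = s[i+1:]
	}
	return strings.TrimSuffix(strings.TrimSpace(s), fence)
}

func main() {
	raw := fence + "json\n{\"path\": \"main.go\"}\n" + fence
	call, err := ParseToolCall("read_file", raw)
	fmt.Println(call.Name, call.Args, err)
}
```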

Lane 2 — Goncho Memory And Honcho Compatibility

Goal: make in-process Goncho the memory substrate while preserving Honcho-compatible public behavior.

Done means:

sessions, messages, users/workspaces, memories/facts, search, provenance, timestamps, updates, and deletions are available through Go APIs;
public compatibility fixtures cover required honcho_* tool/MCP names;
SQLite/FTS/graph storage is local and auditable;
cross-session recall, source scoping, parent lineage, and import/export are tested without a live Honcho service;
memory injection into the agent loop is deterministic and bounded.
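
"Deterministic and bounded" injection can be made concrete with a total ordering plus a size cap. This sketch assumes a hypothetical `Memory` record with a relevance score and unique IDs, and uses byte length as a stand-in for a token budget; none of these names come from Goncho.

```go
package main

import (
	"fmt"
	"sort"
)

// Memory is a hypothetical recalled fact with a relevance score.
type Memory struct {
	ID    string
	Text  string
	Score float64
}

// SelectForInjection picks the memories to add to a prompt.
// Deterministic: higher score first, ties broken by ID (assumed unique).
// Bounded: the combined text length never exceeds budget bytes.
func SelectForInjection(recalled []Memory, budget int) []Memory {
	sorted := append([]Memory(nil), recalled...) // leave the caller's slice alone
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Score != sorted[j].Score {
			return sorted[i].Score > sorted[j].Score
		}
		return sorted[i].ID < sorted[j].ID
	})
	var picked []Memory
	used := 0
	for _, m := range sorted {
		if used+len(m.Text) > budget {
			continue // skip entries that would exceed the budget
		}
		picked = append(picked, m)
		used += len(m.Text)
	}
	return picked
}

func main() {
	recalled := []Memory{
		{ID: "m2", Text: "prefers table-driven tests", Score: 0.7},
		{ID: "m1", Text: "works in Go", Score: 0.7},
		{ID: "m3", Text: "a very long note that will not fit in a small budget", Score: 0.9},
	}
	for _, m := range SelectForInjection(recalled, 40) {
		fmt.Println(m.ID, m.Text)
	}
}
```

Because the ordering is total, the same recall set and budget always produce the same prompt, which is what makes golden-transcript fixtures stable.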

Lane 3 — Tool Surface, Security, And Skills

Goal: port Hermes’ tool ecosystem without carrying over its Python-shaped design.

Done means:

tool descriptors drive schemas, CLI/gateway exposure, doctor checks, audit kinds, trust classes, timeouts, and result budgets;
core file, shell, web, browser, image, audio, sandbox, MCP, ACP, plugin, approval, and operator tools have Go contracts or explicit deferred rows;
toolset restrictions and availability checks prevent impossible tool calls;
the skills runtime supports discovery, install/sync, guard metadata, preprocessing, slash-command exposure, and lockfile provenance;
dangerous tools are covered by path, URL, approval, and policy tests before they are exposed to untrusted callers.

Lane 4 — Gateway, Channels, Cron, And Delivery

Goal: make Gormes usable from every supported interface with one shared runtime.

Done means:

Telegram, Discord, Slack, WhatsApp, WeChat, Email/SMS, Webhook, API, and paused long-tail channels either ship or have explicit deferral rows;
session context, delivery routing, home channel, contact directory, pairing, restart, status, hooks, and mid-run steering are unified across channels;
cron/admin/tool/API control surfaces share one scheduler and audit store;
platform-specific live checks are optional, with fake adapters proving row behavior by default.
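
A fake adapter is enough to prove routing behavior without a live platform. In this sketch, `Channel`, `FakeChannel`, and `Router` are hypothetical names; the real gateway interfaces may look quite different.

```go
package main

import "fmt"

// Channel is a hypothetical minimal delivery interface shared by platforms.
type Channel interface {
	Name() string
	Send(chatID, text string) error
}

// FakeChannel records deliveries in memory so row tests need no credentials.
type FakeChannel struct {
	Platform string
	Sent     []string
}

func (f *FakeChannel) Name() string { return f.Platform }

func (f *FakeChannel) Send(chatID, text string) error {
	f.Sent = append(f.Sent, chatID+": "+text)
	return nil
}

// Router is a hypothetical shared delivery path: every platform goes
// through the same lookup, so routing is tested once for all channels.
type Router struct {
	channels map[string]Channel
}

func NewRouter(chs ...Channel) *Router {
	r := &Router{channels: map[string]Channel{}}
	for _, c := range chs {
		r.channels[c.Name()] = c
	}
	return r
}

// Deliver sends text through the named platform or fails visibly.
func (r *Router) Deliver(platform, chatID, text string) error {
	c, ok := r.channels[platform]
	if !ok {
		return fmt.Errorf("no adapter registered for %q", platform)
	}
	return c.Send(chatID, text)
}

func main() {
	telegram := &FakeChannel{Platform: "telegram"}
	router := NewRouter(telegram)
	fmt.Println(router.Deliver("telegram", "42", "build finished"))
	fmt.Println(router.Deliver("discord", "7", "hello"))
	fmt.Println(telegram.Sent)
}
```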

Lane 5 — CLI, API, TUI, Packaging, And Release

Goal: make the Go binary the only runtime operators need.

Done means:

gormes covers Hermes CLI command groups or tested divergences;
OpenAI-compatible HTTP surfaces, Responses/Runs streaming, health, cron admin, dashboard-facing contracts, and disconnect/cancel snapshots are native Go;
Bubble Tea TUI startup, provider/model overrides, status, copy policy, and streaming are independent of Node/Ink bundles;
Unix/Windows installers, service units, offline doctor, version output, release artifacts, and docs are fixture-backed and Python-free.

Lane 6 — Learning Loop

Goal: make Gormes improve itself through durable skills and evidence.

Done means:

complex tasks can be detected, distilled into skills, scored, improved, and safely promoted;
skill usage is logged and linked to outcomes;
failed or stale skills become planner-visible work;
learning loop behavior builds on the Phase 5 skills substrate instead of creating a second skill system.

First Ordered Passes

These are the next planner/builder passes that should happen before expanding the roadmap again:

1. Parity audit: Native agent spine. Use gormes-parity-auditor on Hermes run_agent.py, provider adapters, prompt/context, retry, and tool-call repair. Output missing/vague rows only.
2. Planner pass: Phase 4 row readiness. Use gormes-planner to split broad Phase 4 rows into small provider/context/kernel tracer bullets with focused tests.
3. Builder pass: one provider boundary row. Use gormes-builder + gormes-tdd-slice to ship exactly one provider behavior through the public Go interface.
4. Parity audit: Goncho/Honcho. Compare ../honcho concepts and MCP/docs against internal/goncho, internal/gonchotools, internal/memory, and Phase 3/5.I rows.
5. Planner pass: Goncho compatibility rows. Ensure every public honcho_* compatibility behavior has a hermetic fixture and builder-ready row.
6. Builder pass: one Goncho compatibility row. Ship one request/response or tool contract with tests.
7. Parity audit: tool descriptors. Map Hermes tool registry/toolsets into descriptor-first Go rows before any large handler ports.
8. Builder pass: one descriptor-to-schema slice. Prove one tool descriptor drives schema, availability, audit, and doctor output.
9. Planner pass: API/TUI/packaging dependency order. Split Phase 5.O-5.Q into API contract, CLI command, TUI, installer, and service-manager slices with no Node/Python runtime assumptions.
10. Release readiness pass. Use docs, doctor, and e2e gates to identify the smallest Python-free operator path, then build it row by row.

Next Skill-Routed Targets

Use this table to keep the next few passes concrete. If a row here is blocked or too vague, update progress.json instead of skipping silently.

Order	Row	Why it matters	Skill chain
1	`Python-free normal agent turn e2e harness`	Defines the first honest Hermes-in-Go closure test across provider, tools, Goncho memory, final response, and audit evidence.	`gormes-builder` -> `gormes-tdd-slice`
2	`Goncho Honcho SDK compatibility e2e harness`	Proves Goncho as the Honcho-compatible Go port with SDK-style local flows.	`gormes-builder` -> `gormes-tdd-slice`
3	`Goncho empty peer-card hint contract`	Improves Honcho-compatible diagnostics and unblocks the Goncho closure harness.	`gormes-builder` -> `gormes-tdd-slice`
4	`ContextEngine compression-boundary callback vocabulary`	Gives Phase 4 a precise context/compression callback contract before kernel binding.	`gormes-builder` -> `gormes-tdd-slice`
5	`Provider-tool-memory golden transcript suite`	Turns the normal-turn harness into repeatable regression fixtures.	`gormes-builder` -> `gormes-tdd-slice`
6	`Provider image-too-large error classification`	Hardens provider failure taxonomy before image retry and multimodal rows.	`gormes-builder` -> `gormes-tdd-slice`

Rows not listed here can still be built, but a planner pass should explain why they outrank Lane 1/2 closure.

Hard Dependency Order

Lane 0 remains enforced at all times. Progress validation, skill routing, and generated docs must stay green before runtime work expands.
Lane 1 before broad tools. Provider/context/kernel/tool-call continuity must be stable before porting dozens of tool handlers.
Lane 2 before memory-heavy UX. Goncho/Honcho compatibility must be fixture-complete before learning-loop and advanced session UX claims.
Lane 3 before untrusted exposure. Tool descriptors, trust classes, approval policy, and availability checks land before gateway/API exposure.
Lane 4 before release polish. Shared gateway/session/delivery behavior must be unified before release packaging presents Gormes as multi-channel.
Lane 5 before public install promises. Installers, services, API health, and docs must match the real binary.
Lane 6 after skill substrate maturity. Learning loop work builds on reviewed skills, retrieval, and outcome evidence.

Pass Templates

Parity Audit Template

Use gormes-parity-auditor.

Pick one lane and one upstream surface.
List exact upstream paths and symbols.
List exact Gormes packages/tests/progress rows.
Classify every behavior as covered, planned, vague, missing, or owned.
Propose only the missing/vague rows that unblock the next builder pass.
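
The classification step can be sketched as a typed status plus a filter that keeps only what a planner should turn into rows. The five status values come from the template above; the `Finding` struct and `RowsToPropose` are hypothetical.

```go
package main

import "fmt"

// Status is one of the five audit classifications named in the template.
type Status string

const (
	Covered Status = "covered"
	Planned Status = "planned"
	Vague   Status = "vague"
	Missing Status = "missing"
	Owned   Status = "owned"
)

// Finding is a hypothetical audit record for one upstream behavior.
type Finding struct {
	Upstream string // upstream path or symbol
	Gormes   string // Go package, test, or row; empty when nothing maps yet
	Status   Status
}

// RowsToPropose keeps only missing and vague findings, preserving order.
// Covered, planned, and owned behaviors need no new row.
func RowsToPropose(findings []Finding) []Finding {
	var out []Finding
	for _, f := range findings {
		if f.Status == Missing || f.Status == Vague {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	findings := []Finding{
		{Upstream: "run_agent.py: retry handling", Gormes: "example/provider", Status: Covered},
		{Upstream: "run_agent.py: tool-call repair", Status: Missing},
		{Upstream: "run_agent.py: context compression", Gormes: "example row", Status: Vague},
	}
	for _, f := range RowsToPropose(findings) {
		fmt.Println(f.Status, f.Upstream)
	}
}
```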

Planner Template

Use gormes-planner.

Read the parity audit or current lane docs.
Update docs or progress.json, not runtime code.
Split umbrella work into tracer-bullet rows.
Preserve builder-owned health blocks.
Validate docs/progress.
Report the next three builder-ready rows.

Builder Template

Use gormes-builder and gormes-tdd-slice.

Select one row.
Write one failing public-behavior test.
Implement the smallest passing behavior.
Repeat the test-then-implement cycle for further vertical slices, but only inside the same row.
Run row-local and lane gates.
Update evidence and stop.

Definition Of Done By Row

A row is done when:

the behavior is observable through a public Go interface;
row-local tests prove the behavior with no live credentials by default;
required docs/web surfaces are updated;
go run ./cmd/progress validate passes;
the relevant focused package tests pass;
broad shared changes also pass go test ./... -count=1;
the final report names the done signal and remaining follow-up rows.

Risk Burn-Down

Risk	Burn-down rule
Planner token drain	Run bounded skill passes; stop after row/doc validation.
Broad rows that workers cannot finish	Split into tracer-bullet rows before assignment.
Python runtime leakage	Every lane must prove no Python dependency in the operator path it claims.
Node/Ink TUI leakage	Treat Gormes Bubble Tea independence as a tested divergence.
Live provider/platform brittleness	Use fake clients and hermetic fixtures for row proof.
Duplicate memory/skills substrates	Reuse Goncho and Phase 2.G skills store; do not create parallel stores.
Tool schema drift	Descriptor-first rows before handler ports.
Public messaging drift	Sync `progress.json`, generated docs, and `www.gormes.ai` when roadmap status changes.

Operating Cadence

Use this loop until Gormes is complete:

Pick one lane.
Audit parity for that lane.
Refine the smallest next rows in progress.json.
Build one row with TDD.
Run the relevant gates.
Update docs/progress evidence.
Repeat.

When a pass discovers a reusable agent workflow, route through gormes-skill-manager and create a new repo-local skill only if the workflow will recur and has distinct validation needs.