
Gormes Completion Plan

This is the execution plan for finishing gormes-agent. The finish line is not an MVP, not a partial wrapper, and not only “enough to improve itself from Telegram”: Gormes is complete when it is Hermes in Go, with Goncho as the Honcho-compatible Go port inside Gormes.

The canonical backlog remains progress.json. This page explains how to drive that backlog without spending unbounded planner tokens or creating parallel queues.

The plan follows these rules:

  1. Hermes parity is the product definition. CLI, agent loop, provider routing, tool execution, memory, skills, plugins, API, TUI, gateway, channels, cron, packaging, observability, and operations must have Go-native equivalents or explicit tested divergences.
  2. Config, commands, providers, and operator experience are core parity lanes. Hermes-compatible config precedence, command names, slash commands, gateway commands, provider routing/auth/usage, error surfaces, status output, and local interactive behavior are not polish. They are part of the runtime contract an operator depends on while running long coding turns.
  3. Divergence must be deliberate, visible, and tested. A Go-native replacement is acceptable only when the docs and tests name the upstream Hermes behavior, explain why Gormes owns a different contract, and prove the operator-visible result.
  4. Goncho is in-process Gormes memory. Internal code stays goncho; public compatibility can expose honcho_* names when tools, MCP clients, or existing users depend on them.
  5. progress.json is the only backlog. Missing work becomes a row. Broad work becomes an umbrella row until split. No side TODOs, private queues, or agent-local task lists.
  6. Rows must be builder-executable. A runnable row names source refs, write scope, test commands, acceptance, ready/not-ready conditions, and a done signal.
  7. Every runtime claim needs tests. Prefer hermetic fixtures. Live provider, live platform, and live cloud checks are opt-in smoke tests, not row-local proof.
  8. Planning is bounded. Planner passes map parity and sharpen rows. They do not run indefinitely and do not implement runtime code.
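
The builder-executable rule in item 6 can be read as a readiness check over a row's contract fields. The following is a minimal sketch, not the real progress.json schema: the `Row` type, its field names, and `MissingFields` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Row is a hypothetical, simplified view of one backlog row.
// The real progress.json schema may use different field names.
type Row struct {
	Name         string
	SourceRefs   []string
	WriteScope   []string
	TestCommands []string
	Acceptance   string
	ReadyWhen    string
	NotReadyWhen string
	DoneSignal   string
}

// MissingFields lists the contract fields a row still lacks.
// An empty result means the row names everything item 6 asks for.
func MissingFields(r Row) []string {
	var missing []string
	if len(r.SourceRefs) == 0 {
		missing = append(missing, "source refs")
	}
	if len(r.WriteScope) == 0 {
		missing = append(missing, "write scope")
	}
	if len(r.TestCommands) == 0 {
		missing = append(missing, "test commands")
	}
	if r.Acceptance == "" {
		missing = append(missing, "acceptance")
	}
	if r.ReadyWhen == "" || r.NotReadyWhen == "" {
		missing = append(missing, "ready/not-ready conditions")
	}
	if r.DoneSignal == "" {
		missing = append(missing, "done signal")
	}
	return missing
}

func main() {
	// An umbrella row that only states a loose acceptance goal.
	vague := Row{Name: "example umbrella row", Acceptance: "works like Hermes"}
	fmt.Println(MissingFields(vague))
}
```

In this sketch the umbrella row reports five gaps, which is the signal to send it back to a planner pass rather than to a builder.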

As of the 2026-05-26 cmd/progress emit scan, the split canonical backlog contains 1,178 row objects: 1,173 complete and 5 planned. The generated Agent Queue is empty because every planned row is gated by a dependency, operator decision, or external-access blocker. Treat old phase-open counts as historical context only; current implementation intent comes from the row objects and generated queue pages.

| Phase | Non-complete rows | Planner meaning |
| --- | --- | --- |
| Phase 1 - Dashboard / control plane | 0 | Skill-era control rows are complete: planning/building route through canonical development skills and symlink loader views instead of deleted loop binaries. |
| Phase 2 - Gateway | 0 | Gateway, channel, slash/skill/tool exposure, and operator-control rows are currently closed in the active backlog. New Hermes/Pi findings must become fresh rows before builder work. |
| Phase 3 - Memory | 0 | Current Goncho/Honcho memory closure rows are complete; future memory work must be sourced from a new parity or product row. |
| Phase 4 - Brain Transplant | 0 | Native-turn/provider/context rows in the active backlog are closed; regressions still need row-backed parity evidence before implementation. |
| Phase 5 - Final Purge | 2 | Goscrapling local crawler adapter gate for web_crawl and Go-owned WASM TTS backend remain planned, but both are blocked on dependency/source-selection gates rather than builder-ready code work. |
| Phase 6 - Learning Loop | 0 | Skill extraction, retrieval, scoring, and operator surfaces are closed in the active backlog; new learning-loop work starts with planner evidence. |
| Phase 7 - Paused Channels | 0 | The paused channel backlog has no active non-complete rows. Do not expand channels without a fixture-ready progress row. |
| Phase 8 - Reputation & Publication | 3 | Public-social, engineering-writeup, and agentic-porting-kit rows remain planned but operator/external-access gated. |
| Phase 9 - Design & Security Hardening | 0 | Design/security hardening rows are currently closed; new findings need source-backed rows. |

Structured blocker receipts remain attached to planned rows (five active blocker records plus one resolved release receipt), but blocker metadata is not builder-ready work. An empty queue means planner work is next: choose one planned row, either satisfy or remove its blocker or sharpen its contract, then validate progress before assigning it to a builder. The current best repo-local planner candidates are the WASI TTS runtime source selection, the Goscrapling release gate, and the publication rows after operator input.
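
The relationship between planned rows, blockers, and the generated queue can be sketched as a filter. This is an illustration under assumptions, not the cmd/progress implementation: `Row`, `Blocker`, `Status`, and `AgentQueue` are hypothetical names, and the blocker kinds are taken from the prose above.

```go
package main

import "fmt"

// Status is a hypothetical row state; only the two states named in
// the text are modeled here.
type Status string

const (
	Complete Status = "complete"
	Planned  Status = "planned"
)

// Blocker is a hypothetical blocker receipt attached to a row.
type Blocker struct {
	Kind     string // e.g. "dependency", "operator-decision", "external-access"
	Resolved bool
}

type Row struct {
	Name     string
	Status   Status
	Blockers []Blocker
}

// hasOpenBlocker reports whether any blocker on the row is unresolved.
func hasOpenBlocker(r Row) bool {
	for _, b := range r.Blockers {
		if !b.Resolved {
			return true
		}
	}
	return false
}

// AgentQueue returns the planned rows that carry no unresolved blocker.
// Complete rows and gated rows never reach a builder.
func AgentQueue(rows []Row) []Row {
	var queue []Row
	for _, r := range rows {
		if r.Status != Planned || hasOpenBlocker(r) {
			continue
		}
		queue = append(queue, r)
	}
	return queue
}

func main() {
	rows := []Row{
		{Name: "shipped row", Status: Complete},
		{Name: "gated row", Status: Planned, Blockers: []Blocker{{Kind: "dependency"}}},
	}
	fmt.Println(len(AgentQueue(rows))) // prints 0: nothing is builder-ready
}
```

Under this model a resolved receipt does not hold a row back, so clearing the last open blocker is what moves a planned row into the queue.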

The first closure target is not “all green”; it is a Python-free normal agent turn with local Goncho memory and tested tool-call continuation. That is a dogfood gate, not a reduced finish line. Once it works, Gormes still must keep closing Hermes parity across config, commands, providers, tools, TUI/API, gateway, release, and operator experience:

CLI/API/gateway input
-> Go prompt/context assembly
-> Go provider adapter
-> Go tool execution
-> Goncho/memory recall and persistence
-> Go final response + audit/status evidence

Telegram Dogfood Milestone (“Gormes finishes itself”)


Goal: operate Gormes from Telegram as the primary operator surface while Gormes continues shipping the remaining parity rows. This milestone proves that the runtime can steer and validate its own work; it does not redefine completion as “Telegram works.”

Execution sequence:

  1. Control-plane safety first (Phase 2/5).
    • Land /steer queue fallback (2.F.5.1) so operators can issue bounded steering instructions during active work.
    • Land gateway /usage binding (4.H.13) and /status parity surfaces so runtime health, rate limits, and stuck sessions are visible from Telegram.
  2. CLI/config parity closure (Phase 5.O).
    • Finish command-tree manifest (5.O.1) and migrate/config rows (5.O.18..5.O.23) so Telegram-driven sessions can rely on the same deterministic runtime/config behavior as Hermes.
    • Treat Hermes command names, aliases, root flags, profile/model/provider selection, config show/path/env-path/set/check/edit/migrate, auth, logs, status, backup, update, and dynamic plugin commands as parity targets unless a row explicitly marks an owned Gormes divergence.
  3. Provider and account-control closure (Phase 4.A/4.G/4.H).
    • Keep at least one coding-capable provider and one fallback provider stable for dogfood, but continue toward Hermes provider parity: streaming, tool-call continuation, auth/token refresh, retries, rate evidence, context limits, model quirks, usage/cost reporting, and visible failure classification.
  4. Tool/runtime closure (Phase 5.A/B/J).
    • Complete remaining core tool registry and sandbox-policy rows before broad channel expansion.
  5. Operator experience closure (Phase 5.Q plus gateway/TUI rows).
    • Match the Hermes operator feel where it matters: slash completion, busy-turn steering, status/footer evidence, prompt symbols, tool progress, approval prompts, interrupt/edit helpers, gateway status/usage, and recoverable failure output.
  6. Operator e2e gate.
    • Prove one full “plan -> build -> validate -> report” loop executed from Telegram without Python fallbacks, using only Gormes runtime and Goncho memory surfaces.

Definition of done for this lane:

  • Telegram session can start work, steer active work, inspect status/usage, and receive validated completion evidence.
  • Remaining implementation rows can then be executed through that same Telegram surface as the default operator workflow, while the finish line remains broad Hermes parity with explicit tested divergences only.

Where to start for each need:

| Need | Start here |
| --- | --- |
| Overall finish line | This page |
| Phase-to-lane ownership and gates | Completion Lane Roadmap |
| Upstream feature-to-Go package map | Hermes And Honcho Feature Map |
| Reconciled Go implementation plan | Hermes/Honcho To Gormes Go Runtime Plan |
| Completeness audit for upstream mapping | Upstream Coverage Ledger |
| Feature-level swarm gap register | Swarm Feature Parity Audit |
| How agents should run each pass | Agent Operating Model |
| Current generated roadmap | Architecture Plan |
| Upstream feature inventory | Subsystem Inventory |
| Go implementation pattern lookup | Go Donor Reference Map |
| Row handoff requirements | Contract Readiness |
| Skill-builder queue and selection | Skill Builder Handoff |
| Test expectations | Testing |

Every substantial agent pass starts by choosing a repo-local skill. Canonical skill files live under docs/development-skills/; .agents/skills/, .claude/skills/, and .codex/skills/ are symlink loader views.

| Situation | Skill path |
| --- | --- |
| Unsure what workflow applies | gormes-skill-manager |
| Mapping upstream Hermes/Honcho gaps | gormes-parity-auditor |
| Updating progress.json, phases, or docs | gormes-planner |
| Selecting a Go donor pattern before shaping a runtime row | gormes-references |
| Designing a Go package/API boundary | gormes-interface-designer |
| Implementing one row | gormes-builder |
| Delivering one behavior with red-green-refactor | gormes-tdd-slice |
| Stress-testing a plan with the operator | grill-me |

The default flow is:

parity audit -> planner row refinement -> builder row execution -> TDD slice -> validation

Do not recreate the old loop binaries. gormes-planner and gormes-builder are manual skill-routed workflows; repository evidence and progress.json are the source of truth.

The operating rule is:

skill -> bounded scan -> row/doc change -> validation -> short handoff

If a pass cannot name its lane, subsystem, expected files, and validation gates, it is too vague to run.

The roadmap now has explicit closure subphases for the work that was previously spread across prose or broad phase headings.

| Subphase | Purpose | First rows |
| --- | --- | --- |
| 1.D - Skill-Driven Control Plane | Keep all agents on skills + progress.json after deleting loop binaries. | Skill-manager selection matrix hardening; skill-pack coverage audit. |
| 3.G - Goncho Drop-In Compatibility Closure | Prove Goncho is the Honcho-compatible Go memory port, not just local memory pieces. | Goncho Honcho SDK compatibility e2e harness; Goncho memory integration into normal agent turn. |
| 4.I - Native Agent Turn Closure | Prove the actual Hermes-in-Go normal turn across provider, tools, memory, final response, and audit. | Python-free normal agent turn e2e harness; provider-tool-memory golden transcript suite; Hermes/Honcho feature map. |

The deleted loop binaries are replaced by this repeatable ladder. Every agent uses it; no private scheduler, side queue, or ad hoc task list is allowed.

| Step | Skill | Output | Validation |
| --- | --- | --- | --- |
| 1. Route | gormes-skill-manager | One selected workflow and reason. | None beyond naming the skill. |
| 2. Audit | gormes-parity-auditor | Covered/planned/vague/missing/owned map for one lane surface. | Exact upstream and Gormes paths named. |
| 3. Plan | gormes-planner | Updated docs/rows with builder-ready contracts. | go run ./cmd/progress validate, docs/progress tests when changed. |
| 4. Design | gormes-interface-designer when needed | Chosen Go package/API boundary. | Row updated with write scope and tests. |
| 5. Build | gormes-builder | One row implemented. | Row-local tests, focused package gate, progress validation. |
| 6. TDD | gormes-tdd-slice | Red-green-refactor evidence for the behavior. | Test first or explicit reason why not feasible. |
| 7. Handoff | Same skill used for the pass | Done signal, files changed, tests run, next row. | Final report is resumable. |

If a row is not executable by step 5, the correct action is to return to step 3 and sharpen the row. Do not compensate by asking a builder to rediscover the architecture.

These lanes cut across the existing phases. Each lane is done only when the corresponding progress.json rows are shipped and tests prove the user-visible contract.

Goal (Lane 0, control plane): make autonomous work reliable enough to finish the product.

Done means:

  • builder skills select only rows with test proof or explicit no_test_required;
  • planner rows preserve row health fields and do not mutate runtime code;
  • invalid progress.json blocks work until planning fixes it;
  • repo-local skills route all future agent work;
  • docs, progress rows, and generated queue pages agree.

Primary gates for control-plane code changes:

go test ./internal/progress -count=1
go run ./cmd/progress validate
go test ./docs -count=1

Planner-doc passes should use non-loop validation only:

go run ./cmd/progress validate
go test ./internal/progress -count=1
go test ./docs -count=1

Goal (Lane 1, native agent spine): replace Hermes’ Python run_agent.py responsibilities with Go-native provider, prompt, context, kernel, retry, tool-call, and telemetry contracts.

Done means:

  • provider adapters normalize requests, streaming, usage, errors, and retries;
  • prompt/context assembly handles project instructions, skills, memory, session search, compression, redaction, and references;
  • tool-call parsing and repair happen before tool execution;
  • model routing, cost, budgets, and provider degradation are visible in status and audit logs;
  • the agent loop can run without Python for normal chat/tool sessions.
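
One way to picture normalized provider errors and retries is a small taxonomy that every adapter maps into before the agent loop decides what to do. The sketch below is an assumption, not the Gormes taxonomy: the `ErrorClass` values, `Classify`, and `Retryable` are hypothetical, and the mapping uses only standard HTTP status semantics.

```go
package main

import (
	"fmt"
	"net/http"
)

// ErrorClass is a hypothetical provider-neutral failure category.
type ErrorClass string

const (
	ClassAuth      ErrorClass = "auth"
	ClassRateLimit ErrorClass = "rate_limit"
	ClassTooLarge  ErrorClass = "request_too_large"
	ClassServer    ErrorClass = "provider_unavailable"
	ClassUnknown   ErrorClass = "unknown"
)

// Classify maps an HTTP status code to a failure category so that
// status output and audit logs can show the same label for every provider.
func Classify(status int) ErrorClass {
	switch {
	case status == http.StatusUnauthorized || status == http.StatusForbidden:
		return ClassAuth
	case status == http.StatusTooManyRequests:
		return ClassRateLimit
	case status == http.StatusRequestEntityTooLarge:
		return ClassTooLarge
	case status >= 500 && status <= 599:
		return ClassServer
	default:
		return ClassUnknown
	}
}

// Retryable reports whether repeating the same request can succeed.
// Auth and size failures need a changed request, so they are not retried.
func Retryable(c ErrorClass) bool {
	return c == ClassRateLimit || c == ClassServer
}

func main() {
	for _, code := range []int{401, 413, 429, 503} {
		c := Classify(code)
		fmt.Printf("%d -> %s (retry: %t)\n", code, c, Retryable(c))
	}
}
```

Keeping classification separate from the retry decision lets hermetic fixtures test each provider's mapping without exercising any backoff logic.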

Lane 2 — Goncho Memory And Honcho Compatibility


Goal: make in-process Goncho the memory substrate while preserving Honcho-compatible public behavior.

Done means:

  • sessions, messages, users/workspaces, memories/facts, search, provenance, timestamps, updates, and deletions are available through Go APIs;
  • public compatibility fixtures cover required honcho_* tool/MCP names;
  • SQLite/FTS/graph storage is local and auditable;
  • cross-session recall, source scoping, parent lineage, and import/export are tested without a live Honcho service;
  • memory injection into the agent loop is deterministic and bounded.
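
"Deterministic and bounded" memory injection can be illustrated with a stable ordering plus a fixed size budget. This is a sketch under assumptions: `Memory`, its `Score` field, and `SelectForPrompt` are hypothetical, and the budget is counted in bytes for simplicity where a real implementation would more likely count tokens.

```go
package main

import (
	"fmt"
	"sort"
)

// Memory is a hypothetical recalled item with a relevance score.
type Memory struct {
	ID    string
	Text  string
	Score float64
}

// SelectForPrompt picks memories for injection within maxBytes.
// Ordering is by score (highest first) with ID as the tie-breaker, so the
// same candidates yield the same selection regardless of input order,
// provided IDs are unique.
func SelectForPrompt(candidates []Memory, maxBytes int) []Memory {
	sorted := append([]Memory(nil), candidates...) // copy; keep caller's slice intact
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Score != sorted[j].Score {
			return sorted[i].Score > sorted[j].Score
		}
		return sorted[i].ID < sorted[j].ID
	})
	var picked []Memory
	used := 0
	for _, m := range sorted {
		if used+len(m.Text) > maxBytes {
			continue // too large for the remaining budget; a smaller item may still fit
		}
		picked = append(picked, m)
		used += len(m.Text)
	}
	return picked
}

func main() {
	candidates := []Memory{
		{ID: "a", Text: "aaaa", Score: 0.5},
		{ID: "b", Text: "bbbb", Score: 0.9},
		{ID: "c", Text: "cc", Score: 0.5},
	}
	for _, m := range SelectForPrompt(candidates, 6) {
		fmt.Println(m.ID)
	}
}
```

With a six-byte budget the sketch selects `b` and then `c`: `a` outranks `c` on the ID tie-break but does not fit after `b` is taken.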

Lane 3 — Tool Surface, Security, And Skills


Goal: port Hermes’ tool ecosystem without copying Python gravity.

Done means:

  • tool descriptors drive schemas, CLI/gateway exposure, doctor checks, audit kinds, trust classes, timeouts, and result budgets;
  • core file, shell, web, browser, image, audio, sandbox, MCP, ACP, plugin, approval, and operator tools have Go contracts or explicit deferred rows;
  • toolset restrictions and availability checks prevent impossible tool calls;
  • the skills runtime supports discovery, install/sync, guard metadata, preprocessing, slash-command exposure, and lockfile provenance;
  • dangerous tools are covered by path, URL, approval, and policy tests before they are exposed to untrusted callers.

Lane 4 — Gateway, Channels, Cron, And Delivery


Goal: make Gormes usable from every supported interface with one shared runtime.

Done means:

  • Telegram, Discord, Slack, WhatsApp, WeChat, Email/SMS, Webhook, API, and paused long-tail channels either ship or have explicit deferral rows;
  • session context, delivery routing, home channel, contact directory, pairing, restart, status, hooks, and mid-run steering are unified across channels;
  • cron/admin/tool/API control surfaces share one scheduler and audit store;
  • platform-specific live checks are optional, with fake adapters proving row behavior by default.

Lane 5 — CLI, API, TUI, Packaging, And Release


Goal: make the Go binary the only runtime operators need.

Done means:

  • gormes covers Hermes CLI command groups or tested divergences;
  • OpenAI-compatible HTTP surfaces, Responses/Runs streaming, health, cron admin, dashboard-facing contracts, and disconnect/cancel snapshots are native Go;
  • Bubble Tea TUI startup, provider/model overrides, status, copy policy, and streaming are independent of Node/Ink bundles;
  • Unix/Windows installers, service units, offline doctor, version output, release artifacts, and docs are fixture-backed and Python-free.

Goal (Lane 6, learning loop): make Gormes improve itself through durable skills and evidence.

Done means:

  • complex tasks can be detected, distilled into skills, scored, improved, and safely promoted;
  • skill usage is logged and linked to outcomes;
  • failed or stale skills become planner-visible work;
  • learning loop behavior builds on the Phase 5 skills substrate instead of creating a second skill system.

These are the next planner/builder passes that should happen before expanding the roadmap again:

  1. Parity audit: Native agent spine. Use gormes-parity-auditor on Hermes run_agent.py, provider adapters, prompt/context, retry, and tool-call repair. Output missing/vague rows only.
  2. Planner pass: Phase 4 row readiness. Use gormes-planner to split broad Phase 4 rows into small provider/context/kernel tracer bullets with focused tests.
  3. Builder pass: one provider boundary row. Use gormes-builder + gormes-tdd-slice to ship exactly one provider behavior through the public Go interface.
  4. Parity audit: Goncho/Honcho. Compare ../honcho concepts and MCP/docs against internal/goncho, internal/gonchotools, internal/memory, and Phase 3/5.I rows.
  5. Planner pass: Goncho compatibility rows. Ensure every public honcho_* compatibility behavior has a hermetic fixture and builder-ready row.
  6. Builder pass: one Goncho compatibility row. Ship one request/response or tool contract with tests.
  7. Parity audit: tool descriptors. Map Hermes tool registry/toolsets into descriptor-first Go rows before any large handler ports.
  8. Builder pass: one descriptor-to-schema slice. Prove one tool descriptor drives schema, availability, audit, and doctor output.
  9. Planner pass: API/TUI/packaging dependency order. Split Phase 5.O-5.Q into API contract, CLI command, TUI, installer, and service-manager slices with no Node/Python runtime assumptions.
  10. Release readiness pass. Use docs, doctor, and e2e gates to identify the smallest Python-free operator path, then build it row by row.

Use this table to keep the next few passes concrete. If a row here is blocked or too vague, update progress.json instead of skipping silently.

| Order | Row | Why it matters | Skill chain |
| --- | --- | --- | --- |
| 1 | Python-free normal agent turn e2e harness | Defines the first honest Hermes-in-Go closure test across provider, tools, Goncho memory, final response, and audit evidence. | gormes-builder -> gormes-tdd-slice |
| 2 | Goncho Honcho SDK compatibility e2e harness | Proves Goncho as the Honcho-compatible Go port with SDK-style local flows. | gormes-builder -> gormes-tdd-slice |
| 3 | Goncho empty peer-card hint contract | Improves Honcho-compatible diagnostics and unblocks the Goncho closure harness. | gormes-builder -> gormes-tdd-slice |
| 4 | ContextEngine compression-boundary callback vocabulary | Gives Phase 4 a precise context/compression callback contract before kernel binding. | gormes-builder -> gormes-tdd-slice |
| 5 | Provider-tool-memory golden transcript suite | Turns the normal-turn harness into repeatable regression fixtures. | gormes-builder -> gormes-tdd-slice |
| 6 | Provider image-too-large error classification | Hardens provider failure taxonomy before image retry and multimodal rows. | gormes-builder -> gormes-tdd-slice |

Rows not listed here can still be built, but a planner pass should explain why they outrank Lane 1/2 closure.

  1. Lane 0 remains enforced at all times. Progress validation, skill routing, and generated docs must stay green before runtime work expands.
  2. Lane 1 before broad tools. Provider/context/kernel/tool-call continuity must be stable before porting dozens of tool handlers.
  3. Lane 2 before memory-heavy UX. Goncho/Honcho compatibility must be fixture-complete before learning-loop and advanced session UX claims.
  4. Lane 3 before untrusted exposure. Tool descriptors, trust classes, approval policy, and availability checks land before gateway/API exposure.
  5. Lane 4 before release polish. Shared gateway/session/delivery behavior must be unified before packaging markets Gormes as multi-channel.
  6. Lane 5 before public install promises. Installers, services, API health, and docs must match the real binary.
  7. Lane 6 after skill substrate maturity. Learning loop work builds on reviewed skills, retrieval, and outcome evidence.

Parity audit pass: use gormes-parity-auditor.

  1. Pick one lane and one upstream surface.
  2. List exact upstream paths and symbols.
  3. List exact Gormes packages/tests/progress rows.
  4. Classify every behavior as covered, planned, vague, missing, or owned.
  5. Propose only the missing/vague rows that unblock the next builder pass.
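
Steps 4 and 5 amount to labeling each upstream behavior and then keeping only the labels that call for new rows. A minimal sketch, assuming hypothetical `Finding` and `NeedsRow` names and invented example findings (not real audit results):

```go
package main

import "fmt"

// Coverage is the five-way classification used by the audit.
type Coverage string

const (
	Covered Coverage = "covered"
	Planned Coverage = "planned"
	Vague   Coverage = "vague"
	Missing Coverage = "missing"
	Owned   Coverage = "owned"
)

// Finding is a hypothetical audit record for one upstream behavior.
type Finding struct {
	Behavior     string
	UpstreamPath string
	Coverage     Coverage
}

// NeedsRow keeps only findings that should become or sharpen a row.
// Covered, planned, and owned behaviors already have a home.
func NeedsRow(findings []Finding) []Finding {
	var out []Finding
	for _, f := range findings {
		if f.Coverage == Missing || f.Coverage == Vague {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Invented example data for illustration only.
	findings := []Finding{
		{Behavior: "streaming retries", UpstreamPath: "run_agent.py", Coverage: Covered},
		{Behavior: "tool-call repair", UpstreamPath: "run_agent.py", Coverage: Vague},
	}
	for _, f := range NeedsRow(findings) {
		fmt.Printf("%s (%s): %s\n", f.Behavior, f.UpstreamPath, f.Coverage)
	}
}
```

Filtering this way keeps the audit output short, which supports the rule that planner passes stay bounded.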

Planner pass: use gormes-planner.

  1. Read the parity audit or current lane docs.
  2. Update docs or progress.json, not runtime code.
  3. Split umbrella work into tracer-bullet rows.
  4. Preserve builder-owned health blocks.
  5. Validate docs/progress.
  6. Report the next three builder-ready rows.

Builder pass: use gormes-builder and gormes-tdd-slice.

  1. Select one row.
  2. Write one failing public-behavior test.
  3. Implement the smallest passing behavior.
  4. Repeat vertically only inside the same row.
  5. Run row-local and lane gates.
  6. Update evidence and stop.

A row is done when:

  • the behavior is observable through a public Go interface;
  • row-local tests prove the behavior with no live credentials by default;
  • required docs/web surfaces are updated;
  • go run ./cmd/progress validate passes;
  • the relevant focused package tests pass;
  • broad shared changes also pass go test ./... -count=1;
  • the final report names the done signal and remaining follow-up rows.

| Risk | Burn-down rule |
| --- | --- |
| Planner token drain | Run bounded skill passes; stop after row/doc validation. |
| Broad rows that workers cannot finish | Split into tracer-bullet rows before assignment. |
| Python runtime leakage | Every lane must prove no Python dependency in the operator path it claims. |
| Node/Ink TUI leakage | Treat Gormes Bubble Tea independence as a tested divergence. |
| Live provider/platform brittleness | Use fake clients and hermetic fixtures for row proof. |
| Duplicate memory/skills substrates | Reuse Goncho and Phase 2.G skills store; do not create parallel stores. |
| Tool schema drift | Descriptor-first rows before handler ports. |
| Public messaging drift | Sync progress.json, generated docs, and www.gormes.ai when roadmap status changes. |

Use this loop until Gormes is complete:

  1. Pick one lane.
  2. Audit parity for that lane.
  3. Refine the smallest next rows in progress.json.
  4. Build one row with TDD.
  5. Run the relevant gates.
  6. Update docs/progress evidence.
  7. Repeat.

When a pass discovers a reusable agent workflow, route through gormes-skill-manager and create a new repo-local skill only if the workflow will recur and has distinct validation needs.