Testing
Testing is the proof that a progress.json row is complete. The default is
hermetic fixture evidence; live providers and live platforms are optional smoke
checks, not row-local proof.
Use the Completion Lane Roadmap to choose the lane gate for broad changes.
Row-local TDD
Use gormes-tdd-slice for implementation rows:
- Write one failing public-behavior test.
- Make the smallest implementation pass.
- Repeat vertically inside the same row.
- Refactor only while green.
- Run the row’s test_commands.
Tests should verify behavior through public interfaces: CLI output, HTTP responses, gateway events, tool schemas/results, memory recall, provider transcripts, or package APIs meant for callers. Avoid tests that only prove a private helper’s current shape unless the row explicitly defines that helper as the public contract.
Baseline layers
Go tests
Run go test ./... from the repository root. This covers the kernel, memory, tools, the Telegram adapter, and session resume. Integration tests are tag-gated:

    go test -tags=live ./... # requires local Ollama

Landing + docs smoke (Playwright)
Run npm run test:e2e from www.gormes.ai/ and docs/www-tests/. The suite is parametrized over viewport widths (320 / 360 / 390 / 430 / 768 / 1024 px) and asserts:
- No horizontal overflow
- Section copy matches the locked strings in content.go
- Starlight navigation, TOC, and prev/next links render on mobile and desktop
Astro/Starlight build
Run go test ./docs/... -run TestAstroBuild. The test shells out to npm run build, verifies the generated Starlight pages and redirect aliases, checks for broken internal links, and confirms that the canonical progress.json is copied into the static export.
Architecture fixtures
Subsystem ports need tests that freeze contracts, not just happy-path code. Borrow these fixture classes from the Hermes and Gormes-owned studies:
- Command registry parity: one registry drives parsing, help text, platform command exposure, aliases, and active-turn policy.
- Provider transcripts: replay request/stream fixtures for text, reasoning, tool-call deltas, finish reasons, usage, and retryable errors.
- Tool schema parity: compare upstream tool names, toolsets, JSON schemas, result envelopes, trust classes, availability/degraded status, and default registry exposure. A renderer fixture for 📖 read_file: "x" is not proof that the model can call read_file; cmd/gormes registry tests must prove the descriptor is present.
- Memory scope negatives: prove same-chat default recall, opt-in user-scope widening, source allowlists, deny paths, and no cross-chat leakage.
- Compression replay: prove head/tail preservation, tool call/result pair integrity, summary lineage, and JSON-valid shrunken tool arguments.
- Durable job replay: prove cron/subagent jobs can be claimed, renewed, completed, failed, cancelled, retried, and inspected after restart.
- Skill resolver conformance: prove triggers, exclusions, disabled state, review state, and confusing phrases route to the expected skill or no skill.
- Normal-turn e2e: prove a Python-free Go agent turn crosses provider, tool execution, Goncho memory recall/persistence, final response, and audit/status evidence with fake providers and temp state.
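To make one of these fixture classes concrete, here is a rough sketch of the compression-replay check for tool call/result pair integrity and JSON-valid arguments. The Message type, its field names, and checkToolPairs are assumptions made up for this example; the real transcript types in Gormes may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical, minimal transcript entry.
type Message struct {
	Role       string // "user", "assistant", or "tool"
	ToolCallID string // set on assistant tool calls and on tool results
	Args       string // raw JSON arguments, only on tool calls
}

// checkToolPairs verifies that every tool call carries valid JSON arguments,
// that every tool result follows a call with the same ID, and that no call
// is left without a result.
func checkToolPairs(msgs []Message) error {
	pending := map[string]bool{}
	for i, m := range msgs {
		switch {
		case m.Role == "assistant" && m.ToolCallID != "":
			if !json.Valid([]byte(m.Args)) {
				return fmt.Errorf("message %d: tool call %s has invalid JSON arguments", i, m.ToolCallID)
			}
			pending[m.ToolCallID] = true
		case m.Role == "tool":
			if !pending[m.ToolCallID] {
				return fmt.Errorf("message %d: tool result %s has no preceding call", i, m.ToolCallID)
			}
			delete(pending, m.ToolCallID)
		}
	}
	if len(pending) > 0 {
		return fmt.Errorf("%d tool call(s) lost their result", len(pending))
	}
	return nil
}

func main() {
	compressed := []Message{
		{Role: "user"},
		{Role: "assistant", ToolCallID: "call_1", Args: `{"path":"a.txt"}`},
		{Role: "tool", ToolCallID: "call_1"},
		{Role: "assistant"},
	}
	fmt.Println(checkToolPairs(compressed)) // prints <nil>
}
```

A replay fixture would run the compressor over a recorded transcript and pass its output to a check like this one, so that a shrunken argument string such as {"path":"a.t fails the test instead of reaching the provider.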
Operator Transcript Fixtures
When the bug came from live dogfood, preserve the exact visible artifact in the test. A final answer alone is not sufficient proof.
- Gateway/channel UX: assert outbound send/edit/delete order, placeholder lifetime, typing-action fallback, final-message count, duplicate suppression, and whether tool progress is visible.
- Tool progress: assert the correct surface. Gateway progress uses emoji tool_name: "preview" with new/all/verbose modes, bounded previews, and (×N) collapse. Current TUI shelves use Title Case tool calls. Do not mix the two contracts in one fixture. Pair these with default-registry assertions when the bug is “Gormes collapsed all work into execute_code.”
- Local task tools: for read_file, search_files, write_file, patch, and terminal, use temp roots and fake/small processes. Cover outside-root denial, symlink escape denial, duplicate-read status behavior, dangerous-command approval, timeout/error envelopes, and explicit unsupported evidence for background/PTY gaps.
- Runtime surface: for installer or lock reports, run the same focused command through the intended surface: go run ./cmd/gormes, ./bin/gormes, or installed gormes, with an explicit GORMES_HOME. Record binary path and source checkout in the test name or helper comments.
- Persona/reset: check the provider-captured prompt or reset fixture to see which files were injected or reset. Do not patch the assistant’s final string after the provider as proof of identity.
Discipline
Every PR must keep the relevant row-local, package, docs, and lane gates green.
The deploy-gormes-landing.yml and deploy-gormes-docs.yml workflows run the
Go + Playwright subsets on every push to main.
Minimum completion proof for runtime rows:

    go run ./cmd/progress validate
    go test ./... -count=1

Planner/doc-only rows normally run:

    go run ./cmd/progress validate
    go test ./docs -count=1

Lane Gate Cheat Sheet
| Lane | Focused gate |
|---|---|
| Native agent spine | go test ./internal/llm ./internal/kernel ./internal/gateway ./internal/telemetry -count=1 plus go test ./internal/e2e -count=1 after Phase 4.I creates it |
| Goncho/memory | go test ./internal/goncho ./internal/gonchotools ./internal/memory ./internal/persistence/session ./internal/persistence/store -count=1 |
| Tools/security/skills | go test ./internal/tools ./internal/plugins ./internal/skills ./internal/doctor -count=1 |
| Gateway/channels/cron | go test ./internal/gateway ./internal/automation/cron ./internal/channels/... ./internal/slack ./internal/discord -count=1 |
| CLI/API/TUI/release | go test ./cmd/gormes ./internal/cli ./internal/apiserver ./internal/tui ./internal/tuigateway -count=1 |
| Docs/progress | go run ./cmd/progress validate && go test ./internal/progress ./docs -count=1 |
When a row touches more than one lane, run all relevant focused gates before
the broad go test ./... -count=1.
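The ordering rule above (every relevant focused gate first, the broad run last) could be scripted roughly as follows. This is only an illustration: the lane keys and the gatePlan helper are invented for the sketch, and only three rows of the cheat sheet are copied in.

```go
package main

import (
	"fmt"
	"sort"
)

// laneGates copies three rows of the cheat sheet; the keys are hypothetical
// short names, and the remaining lanes would follow the same shape.
var laneGates = map[string]string{
	"goncho-memory":         "go test ./internal/goncho ./internal/gonchotools ./internal/memory ./internal/persistence/session ./internal/persistence/store -count=1",
	"tools-security-skills": "go test ./internal/tools ./internal/plugins ./internal/skills ./internal/doctor -count=1",
	"docs-progress":         "go run ./cmd/progress validate && go test ./internal/progress ./docs -count=1",
}

const broadGate = "go test ./... -count=1"

// gatePlan returns the focused gate for each distinct lane (sorted by lane
// key for stable output), followed by the broad gate.
func gatePlan(lanes []string) ([]string, error) {
	seen := map[string]bool{}
	var keys []string
	for _, lane := range lanes {
		if _, ok := laneGates[lane]; !ok {
			return nil, fmt.Errorf("unknown lane %q", lane)
		}
		if !seen[lane] {
			seen[lane] = true
			keys = append(keys, lane)
		}
	}
	sort.Strings(keys)

	plan := make([]string, 0, len(keys)+1)
	for _, k := range keys {
		plan = append(plan, laneGates[k])
	}
	return append(plan, broadGate), nil
}

func main() {
	plan, err := gatePlan([]string{"tools-security-skills", "docs-progress"})
	if err != nil {
		panic(err)
	}
	for _, cmd := range plan {
		fmt.Println(cmd)
	}
}
```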