Foundational Agent Architectures
Core agent loop patterns: ReAct, Reflexion, Tree of Thoughts, Voyager, Generative Agents, and Plan-and-Execute.
The modern agentic OS inherits a set of foundational architectures developed between 2022–2024. These define the core loop patterns, reasoning strategies, and memory primitives used by virtually all agent systems today.
1. ReAct — The Foundational Loop
| Field | Value |
|---|---|
| Title | ReAct: Synergizing Reasoning and Acting in Language Models |
| Authors | Shunyu Yao, Jeffrey Zhao, et al. (Princeton, Google Research) |
| Venue | ICLR 2023 |
| ArXiv | 2210.03629 |
Core Pattern: Thought → Action → Observation → Thought → ...
ReAct interleaves explicit reasoning traces with tool-calling actions. Reasoning helps induce, track, and update action plans while actions gather external information that grounds reasoning. This synergy proved transformative:
- 34% improvement on ALFWorld interactive decision-making
- 10% improvement on WebShop vs. imitation/RL baselines
- Ablations: removing either the reasoning or the acting component degrades performance
Why it matters: ReAct is the default control loop in 7 of 13 open-source coding agents studied as of 2025. LangGraph, CrewAI, AutoGen, and OpenCode all default to ReAct or ReAct-derived loops.
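The Thought → Action → Observation cycle can be sketched as a short loop. This is an illustrative skeleton, not any framework's API: `llm` and `tools` are hypothetical stand-ins for a model call (returning a thought plus either a tool action or a final answer) and a tool registry.

```python
def react_loop(llm, tools, task, max_steps=8):
    """Minimal ReAct loop: interleave reasoning traces with tool calls.

    llm(transcript) is assumed to return a dict with a 'thought' and
    either an 'action' (tool name, input) pair or a 'final' answer.
    """
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "final" in step:                  # model decided to answer
            return step["final"], transcript
        name, arg = step["action"]           # e.g. ("search", "query")
        observation = tools[name](arg)       # ground reasoning in the world
        transcript.append(f"Action: {name}({arg})")
        transcript.append(f"Observation: {observation}")
    return None, transcript                  # step budget exhausted
```

The transcript itself is the agent's working state: each observation is fed back into the next reasoning step, which is what lets actions ground the plan.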
Variants:
- ReflAct (EMNLP 2025): Adds goal-state reflection to reduce hallucination; 93.3% success in ALFWorld (+27.7%)
- Autono: ReAct + timely abandonment strategy + probabilistic penalty mechanism for robustness
2. Reflexion — Verbal Reinforcement Learning
| Field | Value |
|---|---|
| Title | Reflexion: Language Agents with Verbal Reinforcement Learning |
| Authors | Noah Shinn, et al. |
| Venue | NeurIPS 2023 |
| ArXiv | 2303.11366 |
Core Pattern: Act → Critique → Revise → Act
Instead of gradient-based RL, Reflexion uses linguistic feedback stored in episodic memory. When an action fails, the agent generates a verbal reflection about what went wrong and stores it. Future attempts retrieve relevant reflections as context.
- 91% Pass@1 on HumanEval (vs. GPT-4’s 80%)
- Ablation shows reflection + memory jointly critical — removing either drops performance
Why it matters: Self-critique with memory is the core pattern used by Replit, Devin, Cursor’s agent mode, and modern coding scaffolds. Agents learn from failure without weight updates.
Architecture: Three models work together:
- Actor: Generates text and actions (LLM)
- Evaluator: Scores the Actor’s trajectory to produce a scalar reward (e.g. unit-test results, environment signal, or an LLM judge)
- Self-Reflection: Converts trajectory + reward into verbal summary stored in episodic memory
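The Actor/Evaluator/Self-Reflection split can be sketched as a retry loop over an episodic memory of verbal lessons. All three function names are illustrative placeholders for LLM calls or test harnesses:

```python
def reflexion_loop(actor, evaluator, reflect, task, max_trials=3):
    """Reflexion sketch: retry with verbal feedback, no gradient updates.

    actor(task, memory)       -> trajectory (an attempted solution)
    evaluator(trajectory)     -> scalar reward (e.g. fraction of tests passed)
    reflect(trajectory, r)    -> verbal lesson about what went wrong
    """
    memory = []                          # episodic memory of reflections
    for _ in range(max_trials):
        trajectory = actor(task, memory) # past lessons ride along as context
        reward = evaluator(trajectory)
        if reward >= 1.0:                # success threshold
            return trajectory, memory
        memory.append(reflect(trajectory, reward))
    return None, memory
```

The key property is that `memory` persists across trials, so a failure in trial one changes the prompt for trial two without touching model weights.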
3. Tree of Thoughts (ToT) — Deliberate Search
| Field | Value |
|---|---|
| Title | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| Authors | Yao et al. (2023) |
| Venue | NeurIPS 2023 |
| ArXiv | 2305.10601 |
Core Pattern: Frame reasoning as BFS/DFS search over a tree of thought nodes.
Each node is a reasoning path scored by the LLM itself. When one path fails, the agent backtracks to explore alternatives. This breaks the linear constraint of ReAct — enabling exploration of multiple solution branches.
Key Operations:
- Generate: Produce k candidate “next thoughts” from current state
- Evaluate: Score each candidate (value of continuing from this state)
- Select/BFS/DFS: Choose how to traverse the tree
Why it matters: Enables structured exploration when multiple valid approaches exist. Complements ReAct for tasks with branching solution spaces (math proofs, creative writing, game strategies).
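The Generate/Evaluate/Select operations can be sketched as a beam-limited BFS. The `generate` and `evaluate` callables are hypothetical stand-ins for the LLM proposing and scoring thoughts:

```python
def tot_bfs(generate, evaluate, root, k=3, beam=2, depth=3):
    """Tree of Thoughts as beam-limited BFS.

    generate(state, k) -> k candidate next thoughts (child states)
    evaluate(state)    -> score: value of continuing from this state
    """
    frontier = [root]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            candidates.extend(generate(state, k))   # expand each path
        # keep only the `beam` most promising branches; the rest are pruned,
        # which is the backtracking ToT adds over a linear ReAct trace
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=evaluate)              # best surviving leaf
```

A DFS variant instead recurses into the single best child and backtracks when its value drops below a threshold; the token-cost trade-off in the comparison table comes from evaluating every candidate at every level.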
Derivative: LATS (Language Agent Tree Search) — Combines ReAct + ToT with Monte Carlo Tree Search (MCTS). External tools generate states; self-evaluation guides search.
4. Voyager — Skill Libraries
| Field | Value |
|---|---|
| Title | Voyager: An Open-Ended Embodied Agent with Large Language Models |
| Authors | Guanzhi Wang, Yuqi Xie, et al. |
| Venue | NeurIPS 2023 Workshops |
| ArXiv | 2305.16291 |
Core Pattern: Store reusable, executable code as skills; compose them; learn from execution feedback.
Three components:
- Automatic curriculum: Progressive difficulty exploration
- Skill library: Ever-growing repository of executable code programs
- Iterative prompting: Environment feedback + execution errors + self-verification loop
Results in Minecraft: 3.3× more unique items discovered, 15.3× faster tech tree unlocking vs. prior SOTA.
Why it matters: Voyager pioneered the skill library pattern — storing interpretable, reusable, compositional action programs. This directly maps to OpenCode’s skill system, Hermes’ GEPA self-evolution engine, and IDE agent skill files.
Key insight: Using code as the action space (not low-level motor commands) enables composition, reuse, and interpretability.
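The skill library pattern can be sketched as a registry of named, executable programs that compose. This is a minimal illustration, not Voyager's actual implementation: retrieval here is naive keyword matching, whereas Voyager retrieves skills by embedding similarity.

```python
class SkillLibrary:
    """Voyager-style skill library sketch: store verified code as named,
    reusable skills; higher-level skills compose existing ones."""

    def __init__(self):
        self.skills = {}                 # name -> (description, callable)

    def add(self, name, description, fn):
        self.skills[name] = (description, fn)

    def retrieve(self, query):
        """Return names of skills whose description shares a word with
        the query (stand-in for embedding-based retrieval)."""
        words = set(query.lower().split())
        return [name for name, (desc, _) in self.skills.items()
                if words & set(desc.lower().split())]

    def run(self, name, *args):
        return self.skills[name][1](*args)

lib = SkillLibrary()
lib.add("mine_wood", "chop a tree to collect wood", lambda: "wood")
lib.add("craft_table", "craft a table from wood",
        lambda w: f"table from {w}")
# Composition: a higher-level skill calls previously learned skills.
lib.add("get_table", "collect wood then craft a table",
        lambda: lib.run("craft_table", lib.run("mine_wood")))
```

Because each skill is ordinary code, the library grows monotonically and new skills can invoke old ones by name, which is exactly the compositionality the key insight above describes.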
5. Generative Agents — Memory Architecture
| Field | Value |
|---|---|
| Title | Generative Agents: Interactive Simulacra of Human Behavior |
| Authors | Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein (Stanford/Google) |
| Venue | UIST 2023 |
| ArXiv | 2304.03442 |
| Citations | 1,300+ |
Core Architecture:
- Memory stream: Natural language record of all experiences, timestamped
- Reflection: Periodic synthesis of memories into higher-level abstractions
- Planning: Goal-driven behavior generation from reflection output
Retrieval mechanism: Memories are retrieved by combining:
- Recency: Recent events weighted higher
- Importance: LLM judges significance at storage time
- Relevance: Embedding similarity to current context
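The three-factor retrieval score can be sketched as a weighted sum. The weights are illustrative; the 0.995-per-hour exponential decay on recency follows the paper, while the memory-record fields here are assumed names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

def retrieval_score(memory, query_embedding, now,
                    w_recency=1.0, w_importance=1.0, w_relevance=1.0):
    """Score one memory record for retrieval.

    memory: dict with 'timestamp' (seconds), 'importance' (judged by
    the LLM at storage time), and 'embedding' (vector).
    """
    hours_old = (now - memory["timestamp"]) / 3600
    recency = 0.995 ** hours_old                 # exponential decay per hour
    relevance = cosine(memory["embedding"], query_embedding)
    return (w_recency * recency
            + w_importance * memory["importance"]
            + w_relevance * relevance)
```

At query time the agent scores every memory in the stream and feeds the top-k into the prompt; reflection periodically runs the same retrieval to pick memories worth abstracting.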
Demonstration: Seeded only with one agent’s intent to throw a Valentine’s Day party, 25 agents autonomously coordinated the event (spreading invitations, decorating, showing up on time) with no further human scripting.
Why it matters: This is the canonical reference for agent memory system design. Nearly every subsequent memory paper (AgeMem, Synapse, A-Mem, GSW) builds on or contrasts with this architecture. The memory stream → retrieval → reflection pipeline maps directly to Goncho’s design in Gormes.
6. Plan-and-Execute
Core Pattern: Split planning from execution. The planner produces a full plan upfront; the executor walks it step by step; the system re-plans only on failure.
Implementations: BabyAGI (2023), LangChain Plan-and-Execute, LlamaIndex planners, Devin (Cognition AI).
Trade-offs vs. ReAct:
- Advantage: Lower token cost (doesn’t re-deliberate at every step), better for long-horizon tasks
- Disadvantage: Less adaptive to surprises, initial plan errors propagate
Why it matters: Choice between ReAct and Plan-and-Execute is the primary architectural decision when designing an agent loop. Gormes uses ReAct for interactive TUI turns and Plan-and-Execute for background/multi-step tasks.
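The plan/execute/re-plan split can be sketched as follows. `planner`, `executor`, and the `StepFailed` exception are illustrative names, not any framework's API:

```python
class StepFailed(Exception):
    """Raised by the executor when a plan step cannot be completed."""

def plan_and_execute(planner, executor, task, max_replans=2):
    """Plan-and-Execute sketch: one upfront plan, stepwise execution,
    re-planning only on failure.

    planner(task, failed_step) -> list of step descriptions
                                  (failed_step is None for the first plan)
    executor(step)             -> result, or raises StepFailed
    """
    plan = planner(task, None)       # single upfront deliberation
    results = []
    replans = 0
    i = 0
    while i < len(plan):
        try:
            results.append(executor(plan[i]))
            i += 1
        except StepFailed:
            if replans >= max_replans:
                raise                # give up: plan errors kept propagating
            replans += 1
            # keep completed steps, re-plan only the remainder
            plan = plan[:i] + planner(task, plan[i])
    return results
```

Note where the token cost goes: the LLM-backed `planner` runs once (plus once per failure) rather than at every step as in ReAct, which is the trade-off listed above.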
Architecture Comparison
| Architecture | Control Flow | Best For | Limitations |
|---|---|---|---|
| ReAct | Sequential Think→Act→Observe | Interactive tasks, tool use | Linear, no backtracking |
| Reflexion | Act→Critique→Revise | Self-improvement, debugging | Requires evaluator signal |
| Tree of Thoughts | BFS/DFS over thought tree | Multi-solution problems | High token cost |
| LATS | MCTS + ReAct + ToT | Complex exploration | Highest token cost |
| Plan-and-Execute | Plan→Step→Re-plan | Long-horizon tasks | Inflexible to surprises |
| Voyager | Curriculum-driven skill building | Lifelong learning | Requires execution environment |
How These Inform Agentic OS Design
| OS Component | Architectural Inspiration |
|---|---|
| Turn loop | ReAct cycle with Reflexion self-critique |
| Skill system | Voyager’s skill library + code-as-action |
| Memory | Generative Agents’ memory stream + retrieval |
| Long tasks | Plan-and-Execute with dynamic re-planning |
| Complex decisions | ToT/LATS for branching exploration |