Reading List

Curated priority reading order for understanding and building agentic operating systems.

A priority-ordered reading list for building better agentic operating systems. Each tier builds on the previous one; start at Tier 1 and work down.

Tier 1 — Core Foundations (Start Here)

These papers define the fundamental concepts every subsequent paper references.

1.1 Agent Survey: Wang et al. 2023

arXiv: 2308.11432

A Survey on Large Language Model Based Autonomous Agents

The canonical taxonomy: Profiling → Memory → Planning → Action. Most later agent-system papers cite this framework or reuse its vocabulary, so read it first.

Key takeaway: Agents decompose into four modules. Design each with clear boundaries.

1.2 ReAct: Yao et al. 2022

arXiv: 2210.03629

ReAct: Synergizing Reasoning and Acting in Language Models

The Thought → Action → Observation loop behind most agent systems today. Understanding ReAct means understanding the default control flow of modern agents.

Key takeaway: Interleaving reasoning and action improves both. Don’t separate “thinking” from “doing.”
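A minimal sketch of the Thought → Action → Observation loop. It assumes a scripted stand-in for the model (scripted_model) and one hypothetical lookup tool; a real agent would prompt an LLM and parse its output into a thought and an action.

```python
from typing import Callable

# Hypothetical tool registry: action name -> function returning an observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}


def scripted_model(history: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, argument).
    It looks something up once, then finishes with the last observation."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I should look this up.", "lookup", "capital of France")
    last_observation = history[-1].removeprefix("Observation: ")
    return ("I have the answer.", "finish", last_observation)


def react(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = scripted_model(history)   # Thought + Action
        history.append(f"Thought: {thought}")
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg
        # Observation: the tool result is appended so the next step can reason over it.
        history.append(f"Observation: {TOOLS[action](arg)}")
    return "no answer within step budget"


print(react("What is the capital of France?"))  # Paris
```

The step budget matters in practice: without it, a model that never emits "finish" would loop forever.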

1.3 Generative Agents: Park et al. 2023

arXiv: 2304.03442

Generative Agents: Interactive Simulacra of Human Behavior

The canonical memory architecture: Memory Stream → Retrieval → Reflection → Planning. Most later agent-memory papers build on it.

Key takeaway: Memory is a pipeline — store, retrieve, reflect, plan. Each step matters.
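The retrieval step can be sketched as a ranking over three normalized signals (recency, importance, and relevance), following the scoring Park et al. describe. The equal weights, the 0.995 per-hour decay, and the toy 2-D embeddings below are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]     # toy embedding; a real system would use a model
    importance: float          # e.g. 1-10, rated when the memory is stored
    hours_since_access: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def min_max(values: list[float]) -> list[float]:
    # Scale each signal to [0, 1] so no single signal dominates by its units.
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def retrieve(memories: list[Memory], query: list[float], k: int = 1,
             decay: float = 0.995) -> list[str]:
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query) for m in memories])
    # Equal weights on the three normalized components (an assumption).
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(zip(scores, memories), key=lambda p: p[0], reverse=True)
    return [m.text for _, m in ranked[:k]]


stream = [
    Memory("ate breakfast", [1.0, 0.0], importance=2, hours_since_access=1),
    Memory("planned a party", [0.0, 1.0], importance=8, hours_since_access=5),
]
print(retrieve(stream, query=[0.1, 1.0]))  # ['planned a party']
```

Here the older but more important and more relevant memory wins, even though the other one is more recent.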

Tier 2 — How Agents Get Better

Once you understand the basic loop, learn how agents improve themselves.

2.1 Reflexion: Shinn et al. 2023

arXiv: 2303.11366

Reflexion: Language Agents with Verbal Reinforcement Learning

Self-critique as verbal RL. Agents store verbal reflections in memory and learn from past failures without gradient updates.

Key takeaway: The Act → Critique → Revise → Act loop adds learning to the basic ReAct cycle.
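A minimal sketch of this retry loop, assuming toy stand-ins (toy_act, toy_reflect) for the LLM calls and an exact-match evaluator. The point is where learning lives: in a growing list of verbal reflections, not in model weights.

```python
from typing import Callable


def reflexion(task: str,
              act: Callable[[str, list[str]], str],
              evaluate: Callable[[str], bool],
              reflect: Callable[[str, str], str],
              max_trials: int = 3) -> tuple[str, list[str]]:
    """Retry a task, carrying verbal reflections between trials.
    No weights change: what is learned lives in the reflections list."""
    reflections: list[str] = []
    attempt = ""
    for _ in range(max_trials):
        attempt = act(task, reflections)             # Act, conditioned on past lessons
        if evaluate(attempt):                        # Critique via an external signal
            break
        reflections.append(reflect(task, attempt))   # Revise: store a verbal lesson
    return attempt, reflections


# Toy stand-ins for LLM calls (hypothetical): the actor only answers correctly
# once a stored reflection tells it to double-check its arithmetic.
def toy_act(task: str, reflections: list[str]) -> str:
    return "5" if any("double-check" in r for r in reflections) else "6"


def toy_reflect(task: str, attempt: str) -> str:
    return f"Answer {attempt} was wrong; double-check the arithmetic."


answer, lessons = reflexion("2 + 3", toy_act, lambda a: a == "5", toy_reflect)
print(answer, len(lessons))  # 5 1
```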

2.2 Voyager: Wang et al. 2023

arXiv: 2305.16291

Voyager: An Open-Ended Embodied Agent with Large Language Models

Skill libraries that grow with experience. The code-as-action pattern that enables composition and reuse.

Key takeaway: Store reusable code, not static instructions. Skills should be executable and composable.
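A sketch of a skill library in this spirit: skills are stored as executable source plus a description, and later skills can call earlier ones. The class and skill names are hypothetical, and keyword overlap stands in for the embedding-based retrieval a real system would use.

```python
class SkillLibrary:
    """Hypothetical skill library: each skill is Python source plus a
    description; all skills share one namespace so they can compose."""

    def __init__(self) -> None:
        self.namespace: dict = {}              # shared globals for every skill
        self.descriptions: dict[str, str] = {}

    def add(self, name: str, description: str, source: str) -> None:
        exec(source, self.namespace)           # define the function in the namespace
        self.descriptions[name] = description

    def retrieve(self, query: str) -> str:
        # Keyword overlap is a simplification of embedding-based retrieval.
        words = set(query.lower().split())
        return max(self.descriptions,
                   key=lambda n: len(words & set(self.descriptions[n].lower().split())))

    def run(self, name: str, *args):
        return self.namespace[name](*args)


lib = SkillLibrary()
lib.add("mine_wood", "collect wood logs from a tree",
        "def mine_wood(n):\n    return {'log': n}")
lib.add("craft_planks", "craft planks from wood logs",
        "def craft_planks(n):\n    logs = mine_wood(n)['log']\n    return {'plank': logs * 4}")
skill = lib.retrieve("I need planks")
print(skill, lib.run(skill, 2))  # craft_planks {'plank': 8}
```

Because skills run through exec, a real system would execute generated code in a sandbox.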

2.3 SWE-Agent: Yang et al. 2024

arXiv: 2405.15793

SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering

The Agent-Computer Interface (ACI) concept: interface design matters as much as model capability. A small set of purpose-built commands (file viewing, search, editing with lint checks) outperforms raw shell access.

Key takeaway: Design the tool interface for how the model thinks, not how humans use computers.
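A sketch of one ACI-style command set: a numbered, windowed file view and a search that returns a concise summary instead of raw output. It is loosely modeled on the kind of commands SWE-Agent describes; the class, method names, and 5-line window are assumptions.

```python
class FileViewer:
    """Hypothetical ACI-style viewer: the model sees a small, numbered window
    of a file rather than the whole file dumped to the terminal."""

    def __init__(self, text: str, window: int = 5) -> None:
        self.lines = text.splitlines()
        self.window = window
        self.top = 0  # index of the first visible line

    def show(self) -> str:
        end = min(self.top + self.window, len(self.lines))
        header = f"[lines {self.top + 1}-{end} of {len(self.lines)}]"
        body = [f"{i + 1}: {self.lines[i]}" for i in range(self.top, end)]
        return "\n".join([header, *body])

    def scroll_down(self) -> str:
        # Advance one window, stopping so the last window is still full.
        last_top = max(len(self.lines) - self.window, 0)
        self.top = min(self.top + self.window, last_top)
        return self.show()

    def search(self, term: str) -> str:
        # Return a short summary instead of raw grep-style output.
        hits = [i + 1 for i, line in enumerate(self.lines) if term in line]
        if not hits:
            return f"No matches for {term!r}"
        return f"{len(hits)} matches for {term!r} at lines {hits}"


viewer = FileViewer("\n".join(f"line {n}" for n in range(1, 13)))
print(viewer.search("line 1"))  # 4 matches for 'line 1' at lines [1, 10, 11, 12]
```

The header line tells the model where it is in the file, which a raw cat of a long file would not.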

Tier 3 — Multi-Agent & OS Design

Now think at system scale — multiple agents, OS-level architecture.

3.1 MetaGPT: Hong et al. 2024

arXiv: 2308.00352

MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework

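SOP-based multi-agent collaboration: Standardized Operating Procedures assign each agent a role and require structured outputs (such as requirement and design documents) at each handoff. Organizational structure improves agent team performance.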

Key takeaway: Define roles and communication protocols. Free-form collaboration is chaotic.
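A simplified sketch of SOP-style handoffs: each role declares which artifact it consumes and produces, and roles run in a fixed order. The role names mirror a typical software team; the lambdas are hypothetical stand-ins for LLM-backed agents, and MetaGPT's actual message passing is richer than this sequential pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Role:
    name: str
    consumes: str                  # artifact type this role reads
    produces: str                  # artifact type this role writes
    work: Callable[[str], str]


def run_sop(requirement: str, roles: list[Role]) -> dict[str, str]:
    """Execute roles in a fixed order; each reads only the artifact it is
    declared to consume, so every handoff follows a defined protocol."""
    artifacts = {"requirement": requirement}
    for role in roles:
        if role.consumes not in artifacts:
            raise ValueError(f"{role.name} is missing input {role.consumes!r}")
        artifacts[role.produces] = role.work(artifacts[role.consumes])
    return artifacts


sop = [
    Role("ProductManager", "requirement", "prd", lambda r: f"PRD({r})"),
    Role("Architect", "prd", "design", lambda p: f"Design({p})"),
    Role("Engineer", "design", "code", lambda d: f"Code({d})"),
]
print(run_sop("todo app", sop)["code"])  # Code(Design(PRD(todo app)))
```

A role that runs out of order fails loudly instead of improvising from whatever happens to be in the conversation.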

3.2 AIOS: Mei et al. 2024

arXiv: 2403.16971

AIOS: LLM Agent Operating System

The LLM-as-kernel paradigm. Six kernel modules: agent scheduling, context, memory, storage, tools, and access control.

Key takeaway: Treat agents as OS processes. The kernel manages resources shared across agents.
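To make "agents as processes" concrete, here is a round-robin scheduler that time-slices a single shared LLM across agents. This is a textbook policy chosen for illustration, not AIOS's scheduler code; the agent names and work units are hypothetical.

```python
from collections import deque


def round_robin(requests: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Time-slice one shared LLM across agents.

    requests maps agent name -> units of work still needed (e.g. tokens).
    Each turn an agent runs for at most `quantum` units, then is preempted
    and requeued. Returns the execution timeline as (agent, units) pairs."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(requests.items())
    timeline: list[tuple[str, int]] = []
    while queue:
        agent, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((agent, run))
        if remaining > run:
            queue.append((agent, remaining - run))  # preempt and requeue
    return timeline


print(round_robin({"coder": 5, "planner": 2}, quantum=2))
# [('coder', 2), ('planner', 2), ('coder', 2), ('coder', 1)]
```

Preemption keeps a long-running agent from starving short requests, at the cost of extra context switches.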

3.3 MACNet: Qian et al. 2025

Venue: ICLR 2025

Scaling Large-Language-Model-based Multi-Agent Collaboration

Collaboration scaling laws: more agents help (up to ~100). Topology matters more than count.

Key takeaway: Don’t assume “more agents = better.” There’s a sweet spot, and how they’re connected matters.

3.4 Architecting AgentOS: Li et al. 2026

arXiv: 2602.20934

Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence

OS abstractions mapped to LLM constructs — paging, interrupts, scheduling for context.

Key takeaway: The frontier. Classical OS theory applied to LLM-native systems.
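One way to read "paging for context": treat context chunks as pages with token costs and evict the least recently used page when the window budget is exceeded. The sketch below is a hypothetical illustration of that analogy (LRU eviction, the token budget, and all names are assumptions), not the paper's design.

```python
from collections import OrderedDict
from typing import Optional


class ContextPager:
    """Hypothetical LRU pager for a context window. Context chunks are 'pages'
    with token costs; when the budget is exceeded, the least recently used page
    is moved to backing storage, and touching it later pages it back in."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.resident: OrderedDict[str, int] = OrderedDict()  # page -> tokens, LRU first
        self.swapped: dict[str, int] = {}

    def used(self) -> int:
        return sum(self.resident.values())

    def touch(self, page: str, tokens: Optional[int] = None) -> None:
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return
        size = self.swapped.pop(page, tokens)  # page fault: reload or add new page
        if size is None:
            raise KeyError(f"unknown page {page!r}")
        self.resident[page] = size
        # Evict LRU pages until within budget (always keep the page just touched).
        while self.used() > self.budget and len(self.resident) > 1:
            victim, cost = self.resident.popitem(last=False)
            self.swapped[victim] = cost


pager = ContextPager(budget=100)
pager.touch("system_prompt", 40)
pager.touch("doc_a", 50)
pager.touch("system_prompt")      # reuse: system_prompt becomes most recent
pager.touch("doc_b", 30)          # 120 > 100 tokens: evicts doc_a
print(list(pager.resident), list(pager.swapped))  # ['system_prompt', 'doc_b'] ['doc_a']
```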

Tier 4 — Memory & Safety (Deep Dives)

Now specialize in the hardest problems.

4.1 AgeMem (2026)

arXiv: 2601.01885

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management

The agent itself decides what to remember instead of following a heuristic pipeline; memory operations are exposed as tools it can call.

Key takeaway: Memory management should be learned behavior, not engineered rules.
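A sketch of memory operations exposed as tools: the model emits a tool call, and a dispatcher routes it to add, update, delete, or retrieve. The tool names and substring-match retrieval are hypothetical; in AgeMem the choice of which operation to call is learned, whereas here the calls are written by hand.

```python
class MemoryTools:
    """Hypothetical memory operations exposed as tools. In a learned-memory
    agent the policy decides which call to emit; here calls are written by hand."""

    ALLOWED = {"add", "update", "delete", "retrieve"}

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def add(self, key: str, value: str) -> str:
        self.store[key] = value
        return f"stored {key}"

    def update(self, key: str, value: str) -> str:
        if key not in self.store:
            return f"no memory named {key}"
        self.store[key] = value
        return f"updated {key}"

    def delete(self, key: str) -> str:
        if self.store.pop(key, None) is None:
            return f"no memory named {key}"
        return f"deleted {key}"

    def retrieve(self, query: str) -> list[str]:
        # Naive substring match over key and value (a stand-in for real search).
        q = query.lower()
        return [v for k, v in self.store.items() if q in f"{k} {v}".lower()]

    def dispatch(self, call: dict) -> object:
        """Route a model-emitted call such as {"tool": "add", "args": {...}}."""
        if call["tool"] not in self.ALLOWED:
            raise ValueError(f"unknown memory tool {call['tool']!r}")
        return getattr(self, call["tool"])(**call["args"])


mem = MemoryTools()
mem.dispatch({"tool": "add", "args": {"key": "user_lang", "value": "prefers Python"}})
mem.dispatch({"tool": "update", "args": {"key": "user_lang", "value": "prefers Rust"}})
print(mem.dispatch({"tool": "retrieve", "args": {"query": "rust"}}))  # ['prefers Rust']
```

The allow-list in dispatch keeps a model-emitted call from reaching arbitrary methods on the object.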

4.2 Memory Survey (2026)

arXiv: 2603.07670

Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers

A comprehensive survey with a three-dimensional taxonomy and five mechanism families. Use it as your memory design reference.

Key takeaway: Formal framework for memory: write-manage-read loop with temporal, representational, and policy dimensions.

4.3 MOSAIC (2025)

MOSAIC: Aligning Multi-Step Tool Use Safely

Plan → Check → Act/Refuse as the default safety loop, with explicit safety reasoning and refusal treated as a first-class action.

Key takeaway: Safety isn’t bolted on — it’s embedded in the control flow.
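A minimal sketch of the Plan → Check → Act/Refuse loop: every planned step passes an explicit check before it runs, and a refusal is logged as an outcome like any other action. The hand-written rule in no_deletes is a hypothetical stand-in for the model's own safety reasoning, and the tools are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    tool: str
    arg: str


def plan_check_act(plan: list[Step],
                   check: Callable[[Step], Optional[str]],
                   tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Run each planned step only after an explicit safety check.
    check returns None for a safe step, or a reason to refuse; a refusal is
    logged as an outcome like any other action and stops the plan."""
    log: list[str] = []
    for step in plan:
        reason = check(step)                                  # Check
        if reason is not None:
            log.append(f"REFUSE {step.tool}: {reason}")        # Refuse
            break
        log.append(f"ACT {step.tool} -> {tools[step.tool](step.arg)}")  # Act
    return log


# Hypothetical stubs: two tools and a rule standing in for safety reasoning.
tools = {"read_file": lambda p: f"contents of {p}",
         "delete_file": lambda p: f"deleted {p}"}


def no_deletes(step: Step) -> Optional[str]:
    return "destructive action needs approval" if step.tool == "delete_file" else None


plan = [Step("read_file", "notes.txt"), Step("delete_file", "notes.txt")]
for entry in plan_check_act(plan, no_deletes, tools):
    print(entry)
# ACT read_file -> contents of notes.txt
# REFUSE delete_file: destructive action needs approval
```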

4.4 Agent-C (2025)

arXiv: 2512.23738

Agent-C: Formal Temporal Safety Guarantees for LLM Agents

Formal methods for agent safety: a DSL for temporal constraints (e.g., "authenticate BEFORE query"). The authors report 100% conformance with the specified safety constraints.

Key takeaway: Safety can be proven, not just hoped for.
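To show what a constraint such as "authenticate BEFORE query" means operationally, here is a small runtime monitor that blocks a guarded action until its prerequisite has occurred. This is an illustration only: it is not Agent-C's DSL, and a runtime check like this enforces the rule without providing the formal guarantees the paper is about.

```python
class BeforeMonitor:
    """Runtime monitor for constraints of the form 'A BEFORE B': action B is
    blocked until A has occurred at least once in the current session."""

    def __init__(self, constraints: list[tuple[str, str]]) -> None:
        self.constraints = constraints   # (prerequisite, guarded_action) pairs
        self.history: list[str] = []     # actions that were allowed to run

    def allow(self, action: str) -> bool:
        for prerequisite, guarded in self.constraints:
            if action == guarded and prerequisite not in self.history:
                return False             # would violate 'prerequisite BEFORE action'
        self.history.append(action)
        return True


monitor = BeforeMonitor([("authenticate", "query")])
print([monitor.allow(a) for a in ["query", "authenticate", "query"]])
# [False, True, True]
```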

Tier 5 — Production & Deployment

Where research meets reality.

5.1 OpenCode Architecture

Repository: github.com/anomalyco/opencode

Study the event-driven architecture, the abstraction over 75+ model providers, and SQLite session persistence. It is a production reference implementation of many patterns described in the academic papers.

Key takeaway: An event bus decouples I/O from processing. A provider abstraction avoids vendor lock-in.
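A minimal publish/subscribe sketch of the event-bus idea: a producer publishes an event, and independent consumers (here, a UI and a persistence layer) react without knowing about each other. The event name and class are hypothetical and do not reflect OpenCode's actual API.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal publish/subscribe bus: producers publish named events without
    knowing which components, if any, consume them."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, payload: Any) -> None:
        for handler in self._subscribers[event]:
            handler(payload)


bus = EventBus()
rendered: list[str] = []
saved: list[str] = []
bus.subscribe("message.delta", rendered.append)   # e.g. a terminal UI
bus.subscribe("message.delta", saved.append)      # e.g. session persistence
bus.publish("message.delta", "Hello")
print(rendered, saved)  # ['Hello'] ['Hello']
```

Adding a new consumer (say, a log exporter) means one more subscribe call; the producer does not change.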

5.2 Hermes Agent Security Model

Repository: github.com/NousResearch/hermes-agent

Study the layered security architecture (prompt → runtime gates → OS isolation), the GEPA self-evolution engine, and the 15+ channel gateway.

Key takeaway: Multi-layer defense. Self-evolution as target state.

5.3 MCP Specification

Spec: modelcontextprotocol.io

The emerging standard for agent-tool interaction. Understand its server primitives (tools, resources, prompts) and client features such as roots.

Key takeaway: Standardize tool interfaces, so any MCP-compatible agent can use any MCP-compatible tool server.
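A hand-rolled sketch of the server side of MCP tool use: JSON-RPC 2.0 requests for tools/list and tools/call, with each tool's input described by a JSON Schema. It omits transport, the initialization handshake, and input validation, and the add tool is hypothetical; real servers would typically use an official MCP SDK.

```python
import json

# One hypothetical tool; its input is described with a JSON Schema.
TOOLS = {
    "add": {
        "description": "Add two numbers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        "fn": lambda args: args["a"] + args["b"],
    }
}


def handle(raw: str) -> str:
    """Handle one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"],
                             "inputSchema": t["inputSchema"]} for n, t in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        value = tool["fn"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}], "isError": False}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}
print(json.loads(handle(json.dumps(call)))["result"]["content"][0]["text"])  # 5
```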

Quick Reference: By Topic

Agent Loops

ReAct (Yao 2022) — start here
Reflexion (Shinn 2023) — adds self-critique
Plan-and-Execute — separates planning from execution

Memory

Generative Agents (Park 2023) — foundational
Memory Survey (2026) — comprehensive reference
AgeMem (2026) — agentic, learned memory

Multi-Agent

MetaGPT (ICLR 2024) — SOP-based
MACNet (ICLR 2025) — scaling laws
MegaAgent (ACL 2025) — dynamic generation

OS Design

AIOS (NeurIPS 2024) — LLM kernel
ACOS (2024) — agent-centric
Architecting AgentOS (2026) — frontier

Safety

MOSAIC (2025) — plan-check-act
Agent-C (2025) — formal guarantees
IntentGuard — runtime gates

Tools

SWE-Agent (NeurIPS 2024) — ACI concept
ToolACE (ICLR 2025) — data generation
MCP Spec — standard protocol

Prioritized Timeline

Total: ~5 weeks for comprehensive understanding at 2-3 papers per week.