Multi-Agent Systems & Tool Ecosystems

Multi-agent orchestration, collaboration scaling laws, MCP tool protocol, and tool learning.

Two forces shape the scalability of an agentic OS: multi-agent orchestration (how agents collaborate) and tool ecosystems (how agents act on the world).

PART 1: Multi-Agent Orchestration

1.1 MetaGPT — SOP-Based Collaboration

| Field | Value |
|---|---|
| Title | MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework |
| Authors | Sirui Hong, Mingchen Zhuge, Jonathan Chen, Jürgen Schmidhuber |
| Venue | ICLR 2024 (Oral) |
| ArXiv | 2308.00352 |

Core Pattern: Encode Standardized Operating Procedures (SOPs) into prompts, modeling agents as a simulated software company.

Architecture:

┌──────────────────────────────────────────────┐
│            MetaGPT Software Company            │
│                                                │
│  Product Manager ──→ Architect ──→ Engineer   │
│        │                │              │       │
│        └────────────────┴──────────────┘       │
│                   │                            │
│         Shared Message Pool (pub/sub)          │
│                   │                            │
│           Executable Feedback Loop             │
└──────────────────────────────────────────────┘

Key insights:

  • Role specialization with SOP-constrained communication
  • Structured messages via shared pool + publish-subscribe
  • Executable feedback for self-correction (run the code, not just review it)
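The shared-pool idea above can be sketched in a few lines of Python. This is an illustrative toy, not MetaGPT's actual API: the `MessagePool` class, topic names, and role names are all invented for the example.

```python
from collections import defaultdict

class MessagePool:
    """Shared message pool with publish-subscribe, in the spirit of
    MetaGPT's structured communication (names are illustrative)."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of agent inboxes
        self._log = []                          # full message history

    def subscribe(self, topic, inbox):
        self._subscribers[topic].append(inbox)

    def publish(self, sender, topic, content):
        message = {"from": sender, "topic": topic, "content": content}
        self._log.append(message)               # every message is auditable
        for inbox in self._subscribers[topic]:  # fan out to subscribers only
            inbox.append(message)

pool = MessagePool()
architect_inbox, engineer_inbox = [], []
pool.subscribe("prd", architect_inbox)          # Architect reads PRDs
pool.subscribe("design", engineer_inbox)        # Engineer reads designs
pool.publish("ProductManager", "prd", "Build a CLI todo app")
pool.publish("Architect", "design", "Use argparse + JSON storage")
```

The point of the pattern: agents never address each other directly, so the SOP (who may read what) is enforced by subscriptions rather than by prompt discipline.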

Results: Outperforms ChatDev on HumanEval and MBPP.

Why it matters: Proves that organizational structure matters for agent teams. Well-defined roles + SOP-constrained communication outperform free-form collaboration. The assembly-line paradigm is surprisingly effective.


1.2 MACNet — Collaboration Scaling Laws

| Field | Value |
|---|---|
| Title | Scaling Large-Language-Model-based Multi-Agent Collaboration |
| Authors | Chen Qian, Zihao Xie, Maosong Sun |
| Institution | Tsinghua University, Peng Cheng Laboratory |
| Venue | ICLR 2025 |

Core question: Does adding more agents improve performance, like neural scaling laws?

Key findings:

  1. Logistic growth: Performance improves with agent count, saturating at around 100 agents
  2. Irregular topology wins: Random graph topologies outperform regular ones due to small-world properties (shorter interaction paths)
  3. Collaborative emergence is cheap: Neural scaling needs billions of parameters; collaborative emergence appears at hundreds of agents
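Finding 1 can be sketched as a logistic curve. The parameter values below are illustrative placeholders, not fitted MACNet data:

```python
import math

def collaboration_performance(n_agents, ceiling=1.0, midpoint=30.0, rate=0.12):
    """Logistic curve: performance rises with agent count, then saturates.
    ceiling/midpoint/rate are made-up parameters for illustration."""
    return ceiling / (1.0 + math.exp(-rate * (n_agents - midpoint)))

# Diminishing returns: the 100 -> 200 jump is far smaller than 1 -> 50.
gains = [collaboration_performance(n) for n in (1, 10, 50, 100, 200)]
```

The qualitative shape is the claim: monotone improvement, but nearly flat well before the agent count grows by another order of magnitude.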

Architecture: DAG-organized agents — nodes are Actors, edges are Critics.

Why it matters: Provides the first empirical answer to “how many agents should we spawn?” The answer: more helps, but with diminishing returns. And topology matters more than you’d think.


1.3 CAMEL — Role-Playing Framework

| Field | Value |
|---|---|
| Title | CAMEL: Communicative Agents for “Mind” Exploration |
| ArXiv | 2303.17760 |

Core Pattern: AI user + AI assistant role-play through multi-turn conversation with “inception prompting” to guide autonomous cooperation.

Challenges identified: Conversation deviation, role flipping, termination conditions — issues that remain relevant for any autonomous multi-agent system.


1.4 MegaAgent — No Predefined SOPs

| Field | Value |
|---|---|
| Title | MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs |
| Authors | Qian Wang, Tianyu Wang, et al. (Huazhong University of Science & Technology) |
| Venue | ACL 2025 Findings |

Core pattern: Dynamic agent generation by task complexity — no hand-crafted SOPs. Automatic task decomposition, parallel execution, efficient communication, system-level monitoring.

Results: Scaled to 590 agents in a national policy simulation; completed Gomoku game development in 800 seconds. Significantly outperforms MetaGPT on task-completion efficiency and scalability.


1.5 Puppeteer — Dynamic Orchestration

| Field | Value |
|---|---|
| Authors | Yufan Dang, Chen Qian, et al. (Tsinghua) |
| ArXiv | 2505.19591 |

Central “puppeteer” dynamically selects and sequences agent activation:

  • Dynamic routing based on current context
  • Adaptive evolution through REINFORCE learning
  • Converges toward tighter, more cyclic reasoning structures

Why it matters: Shows that a central orchestrator (vs. fully decentralized coordination) can be more efficient for structured tasks.
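A REINFORCE-style router can be caricatured with a single softmax policy over agents. This is a deliberately simplified sketch (no context features, scalar reward); the `Puppeteer` class and agent names are invented for illustration:

```python
import math, random

class Puppeteer:
    """Central orchestrator that learns which agent to activate next
    via a REINFORCE-style update (simplified, illustrative)."""

    def __init__(self, agents, lr=0.1):
        self.agents = agents
        self.logits = {a: 0.0 for a in agents}  # one learnable score per agent
        self.lr = lr

    def _probs(self):
        z = max(self.logits.values())           # subtract max for stability
        exps = {a: math.exp(l - z) for a, l in self.logits.items()}
        total = sum(exps.values())
        return {a: e / total for a, e in exps.items()}

    def select(self):
        r, acc = random.random(), 0.0           # sample from the softmax
        for agent, p in self._probs().items():
            acc += p
            if r <= acc:
                return agent
        return self.agents[-1]

    def update(self, chosen, reward):
        # Softmax policy gradient: raise the chosen agent's logit,
        # lower the others, scaled by the reward.
        probs = self._probs()
        for agent in self.agents:
            grad = (1.0 if agent == chosen else 0.0) - probs[agent]
            self.logits[agent] += self.lr * reward * grad

random.seed(0)
router = Puppeteer(["coder", "tester", "reviewer"])
for _ in range(200):                            # toy signal: only "coder" pays off
    choice = router.select()
    router.update(choice, 1.0 if choice == "coder" else 0.0)
```

After a few hundred episodes the policy concentrates on the rewarded agent, which is the adaptive-routing behavior the paper trains at scale.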


1.6 Multi-Agent Architecture Spectrum

| Architecture | Control | Example | Best For |
|---|---|---|---|
| Centralized | One supervisor routes tasks | Puppeteer, MAC | Structured, decomposable tasks |
| Decentralized | Peer-to-peer messaging | CAMEL | Creative, open-ended tasks |
| Hybrid | Hierarchy with local autonomy | MetaGPT, MACNet | Complex organizational tasks |
| Dynamic | Topology evolves with task | MegaAgent | Unknown-scope problems |

PART 2: Tool Ecosystems

2.1 Model Context Protocol (MCP)

Specification: modelcontextprotocol.io
GitHub: modelcontextprotocol (42 repos, 79k+ stars)

MCP standardizes how agents discover, invoke, and reason about tools:

┌──────────┐     ┌──────────────┐     ┌──────────┐
│  Agent   │────→│  MCP Client  │────→│  Tool    │
│  (LLM)   │←────│  (Protocol)  │←────│  Server  │
└──────────┘     └──────────────┘     └──────────┘

Core primitives:

  • Tools: Functions the agent can call (with schema-based input/output)
  • Resources: Data the agent can read (via URI)
  • Prompts: Reusable interaction templates
  • Roots: Named filesystem locations for scoped access
  • Sampling: Interactive human-in-the-loop prompts

Transport: stdio (local), SSE, HTTP streaming

Key design insight (Cloudflare): “Code Mode” — expose 2 tools (search() + execute()) rather than 2,500 API endpoints as 2,500 tools. ~1,000 tokens fixed cost regardless of API size.
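Code Mode can be caricatured in a few lines. The endpoint names and docs below are invented, and the `exec()` stands in for the sandboxed runtime a real deployment would use:

```python
# Instead of registering every API endpoint as its own tool, expose two.
API_DOCS = {
    "listZones": "GET /zones - list DNS zones",
    "purgeCache": "POST /zones/{id}/purge_cache - purge cached assets",
    # ...imagine thousands more entries; the tool surface stays constant.
}

def search(query):
    """Tool 1: find relevant endpoints by keyword."""
    q = query.lower()
    return {name: doc for name, doc in API_DOCS.items() if q in doc.lower()}

def execute(code, bindings):
    """Tool 2: run agent-written code against the API bindings.
    Real Code Mode sandboxes this; bare exec() is illustrative only."""
    scope = dict(bindings)
    exec(code, scope)
    return scope.get("result")
```

The agent's context holds only these two tool schemas, so the token cost is fixed no matter how large the underlying API grows.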


2.2 Tool Learning — How Agents Get Better at Tools

ToolACE (ICLR 2025)

Automated function-calling data generation:

  1. Tool Self-Evolution Synthesis (TSS): LLM-as-evaluator generates 26,507 diverse APIs
  2. Self-guided complexifying: Multi-agent interaction generates four call types (single, parallel, dependent, non-tool)
  3. Dual-layer verification: Rule-based + model-based validation

8B model outperforms GPT-4 on function calling benchmarks.
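The dual-layer verification step can be sketched as a cheap rule filter in front of an expensive model judge. Everything here is illustrative: the sample format is made up, and the `model_judge` callable stands in for ToolACE's LLM evaluator:

```python
def rule_layer(sample):
    """Layer 1: cheap structural checks before any model call."""
    has_fields = all(k in sample for k in ("query", "tool", "arguments"))
    return has_fields and isinstance(sample.get("arguments"), dict)

def verify(sample, model_judge):
    """Only samples that survive the rules reach the model-based layer."""
    if not rule_layer(sample):
        return False
    return model_judge(sample)

ok = verify(
    {"query": "weather in Paris", "tool": "get_weather",
     "arguments": {"city": "Paris"}},
    model_judge=lambda s: s["arguments"].get("city", "") in s["query"],
)
```

Ordering the layers this way means the LLM evaluator is only spent on candidates that are at least structurally plausible.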

ToolCoder (2025)

| Field | Value |
|---|---|
| ArXiv | 2502.11404 |

Reframes tool learning as code generation. Converts natural language queries into structured Python function scaffolds. Successfully executed code is stored in a function repository for reuse. Error backtracking for systematic debugging.
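The reuse idea can be sketched with a tiny cache of verified functions. The `FunctionRepository` class and its API are hypothetical, not ToolCoder's actual interface:

```python
class FunctionRepository:
    """Store code that executed successfully, keyed by task, so later
    queries reuse it instead of regenerating from scratch."""

    def __init__(self):
        self._store = {}

    def run_or_reuse(self, task, generate):
        if task in self._store:              # reuse a verified function
            return self._store[task]()
        func = generate(task)                # e.g. an LLM-generated scaffold
        result = func()                      # execute; raises on failure
        self._store[task] = func             # only successful code is kept
        return result

repo = FunctionRepository()
calls = []
def generate(task):
    calls.append(task)                       # count generation attempts
    return lambda: task.upper()              # stand-in for generated code

first = repo.run_or_reuse("sum two numbers", generate)
second = repo.run_or_reuse("sum two numbers", generate)  # hits the cache
```

Because a function is stored only after it runs without error, the repository accumulates exclusively validated capabilities.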

InfTool (2025)

| Field | Value |
|---|---|
| ArXiv | 2512.23611 |

Fully autonomous self-evolving framework — three collaborative agents (user simulator, tool assistant, MCP server) generate diverse, verified trajectories from raw API specs. Closed-loop: synthesized data trains models, improved models generate better data.

InfTool-7B (61.7) surpasses GPT-5.2 (60.4) on the BFCL benchmark.

Tool-R1 (2025)

| Field | Value |
|---|---|
| ArXiv | 2509.12867 |

RL-based tool use training. Generates executable Python code for flexible tool calling; reward function combines LLM judgment + execution success rate. Dynamic sample queue for training efficiency. ~10% accuracy improvement on GAIA.
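The combined reward can be sketched as a weighted sum of execution success and a judge's score. The weights, sample format, and use of a bare `exec()` are all illustrative, not Tool-R1's actual implementation:

```python
def tool_reward(sample, judge_score, w_exec=0.5):
    """Reward sketch: weight execution success against an LLM judge's
    score in [0, 1]. w_exec=0.5 is an arbitrary illustrative choice."""
    try:
        exec(sample["code"], {})             # did the generated code even run?
        exec_ok = 1.0
    except Exception:
        exec_ok = 0.0
    return w_exec * exec_ok + (1 - w_exec) * judge_score

good = tool_reward({"code": "x = sum([1, 2, 3])"}, judge_score=0.8)
bad = tool_reward({"code": "x = undefined_name"}, judge_score=0.8)
```

Blending the two signals matters: execution success alone rewards trivially runnable code, while a judge alone never learns whether the tool call actually worked.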


2.3 Agent-Computer Interfaces (ACI)

| Field | Value |
|---|---|
| Title | SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering |
| ArXiv | 2405.15793 |

Key insight: Interface design for AI agents matters as much as model capability.

SWE-Agent provides a custom ACI with a small set of simple, high-level actions (view, search, edit files) instead of a granular Linux shell. Results: 12.5% SWE-bench resolution (vs. 3.8% prior SOTA). ACI reduces errors from 35% to 9%.

Why it matters: Don’t give the agent raw bash. Design an interface that matches how the model thinks.
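The "few simple, high-level actions" idea can be sketched as a small typed interface. The action names echo SWE-Agent's spirit (view/search/edit), but the `ACI` class itself is an invented toy:

```python
class ACI:
    """A handful of high-level, structured actions instead of raw shell."""

    def __init__(self, files):
        self.files = files                   # path -> list of lines
        self.window = 5                      # lines shown per view

    def view(self, path, start=0):
        lines = self.files[path][start:start + self.window]
        return "\n".join(f"{start + i}: {l}" for i, l in enumerate(lines))

    def search(self, term):
        return [(p, i) for p, ls in self.files.items()
                for i, l in enumerate(ls) if term in l]

    def edit(self, path, lineno, new_line):
        self.files[path][lineno] = new_line  # replace exactly one line
        return self.view(path, max(0, lineno - 2))  # echo edited context back

aci = ACI({"app.py": ["import os", "def main():", "    retrun 0"]})
hits = aci.search("retrun")                  # agent locates the typo
aci.edit("app.py", hits[0][1], "    return 0")
```

Note the design choices encoded here: `view` paginates so the model never drowns in output, and `edit` echoes the changed region back so the agent immediately sees whether its edit landed correctly.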


2.4 Tool System Design Principles

| Principle | Source | Implementation |
|---|---|---|
| Few high-level tools beat many low-level ones | SWE-Agent | < 10 curated actions, not raw shell |
| Code mode for complex APIs | Cloudflare MCP | search() + execute() for any API |
| Tools should self-evolve | ToolACE, InfTool | Auto-generate tool data from specs |
| Schema-driven safety | MCP Guardrails | Input/output types + pre/post conditions |
| Composition over enumeration | Voyager | Reusable skills compose into new capabilities |

3. Multi-Agent + Tool Integration

The Agent-Tool Interface

┌──────────────────────────────────────────────┐
│              Orchestration Layer              │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │Supervisor│  │  Router  │  │  Delegate │  │
│  └──────────┘  └──────────┘  └───────────┘  │
│         │            │             │         │
│         ▼            ▼             ▼         │
│  ┌──────────────────────────────────────┐   │
│  │           MCP Client Layer            │   │
│  │  ┌──────┐ ┌───────┐ ┌──────┐        │   │
│  │  │  FS  │ │Browser│ │  DB  │  ...   │   │
│  │  │Server│ │Server │ │Server│        │   │
│  │  └──────┘ └───────┘ └──────┘        │   │
│  └──────────────────────────────────────┘   │
└──────────────────────────────────────────────┘

Key pattern: Every agent (orchestrator, worker, specialist) talks to tools through the same MCP interface. This means:

  • Tools can be shared across agents without duplication
  • Tool permissions can be managed centrally
  • Adding a new tool benefits all agents immediately
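These three properties fall out of a single shared registry with per-agent grants. The `ToolRegistry` class, tool names, and agent names below are hypothetical:

```python
class ToolRegistry:
    """One MCP-style interface for every agent: tools registered once,
    permissions checked centrally at call time."""

    def __init__(self):
        self.tools = {}                       # name -> callable
        self.grants = {}                      # agent -> set of tool names

    def register(self, name, func):
        self.tools[name] = func               # added once, visible to all agents

    def grant(self, agent, name):
        self.grants.setdefault(agent, set()).add(name)

    def call(self, agent, name, *args):
        if name not in self.grants.get(agent, set()):
            raise PermissionError(f"{agent} may not call {name}")
        return self.tools[name](*args)

registry = ToolRegistry()
registry.register("fs.read", lambda path: f"<contents of {path}>")
registry.grant("worker", "fs.read")           # orchestrator scopes access

out = registry.call("worker", "fs.read", "notes.txt")
```

An ungranted agent hits the permission check rather than the tool, which is exactly the "managed centrally" property: policy lives in one place, not in each agent's prompt.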

For Agentic OS Design

| OS Layer | Multi-Agent Pattern | Tool Pattern |
|---|---|---|
| Orchestration | Puppeteer-style central routing | Task-based tool selection |
| Kernel | Dynamic agent spawn/destroy | Tool registry with permission model |
| Safety | Agent capability scoping | Tool-level access control |
| Memory | Shared memory pool with provenance | Tool results as memory entries |
| Evolution | Agent specialization over time | Self-evolving tool catalogs |