diff --git a/docs/research/agent-ecosystem-report.md b/docs/research/agent-ecosystem-report.md
index 3cca9978a67..d2cb0da8622 100644
--- a/docs/research/agent-ecosystem-report.md
+++ b/docs/research/agent-ecosystem-report.md
@@ -86,3 +86,14 @@ Anthropic's SDK relies on a deliberately simple architecture, keeping the agent
 **Recommendation:** Summit's internal orchestration and benchmarking must expand to cover these advanced topologies, specifically evaluating the overhead of coordination and the resilience of durable execution under load.
 
 _Update:_ We have explicitly expanded our benchmarks to track State Recovery Success Rate (SRSR), Coordination Token Overhead (CTO), and Orchestration Latency Penalty (OLP). We have also created adapter layers for LangGraph, CrewAI, and AutoGen to support these metrics.
+
+### 7. MetaGPT
+
+MetaGPT is a multi-agent framework purpose-built to automate software development. It simulates a full-stack product team—PMs, tech leads, developers, and analysts—as coordinated AI agents for business automation that follow standardized engineering workflows.
+
+- **Core Paradigm:** Software company simulation (SOP-driven multi-agent system).
+- **Key Capabilities:**
+  - **Role-based agents:** Simulates a full software team: PM, Architect, Engineer.
+  - **Standard Operating Procedures (SOPs):** Embeds human workflows into agent operations for structured outputs.
+  - **End-to-End Development:** Capable of handling requirements to fully working code.
+- **Best Use Cases:** Early-stage ideation, Proof-of-Concept (PoC) development, or augmenting engineering capacity.
diff --git a/docs/research/agent-eval-insights.md b/docs/research/agent-eval-insights.md
index a78ce336b9c..8014cb02ff5 100644
--- a/docs/research/agent-eval-insights.md
+++ b/docs/research/agent-eval-insights.md
@@ -52,3 +52,15 @@ Based on the latest developments in the agent ecosystem (LangGraph, CrewAI, Auto
 3. [x] **Dataset Generation:** Construct the golden fixtures for `concurrent_stress_test` and `mid_task_failure_recovery` in `GOLDEN/datasets/agent_orchestration/`.
 4. [ ] **New Data Modalities:** Construct multimodal and cross-framework A2A interaction fixtures.
 5. [x] **Metric Implementation:** Add the `SRSR` and `CTO` scoring logic to `evaluation/scoring/agent_metrics.py`.
+
+- **SOP Adherence:** Measuring the ability of agents to strictly follow Standard Operating Procedures (SOPs) during code generation and system design.
+- **Case: `metagpt_full_stack_poc`**
+  - **Description:** Task MetaGPT with generating a complete PoC for a simple web application from a one-line prompt.
+  - **Target Framework:** MetaGPT.
+  - **Goal:** Evaluate the quality of the generated code, architecture, and alignment with the initial prompt.
+- **SOP Deviation Rate (SDR):** The frequency at which agents deviate from prescribed SOPs during a multi-step task.
+
+## Next Steps for the Summit Team (Continued)
+
+1. [ ] **MetaGPT Integration:** Implement adapter layers for MetaGPT.
+2. [ ] **SOP Metrics:** Implement metrics for SOP Adherence and SDR.
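
Review note: the SDR entry in `agent-eval-insights.md` describes the metric only as the frequency of deviations from a prescribed SOP, so the scoring rule still needs to be pinned down before the "SOP Metrics" item can be implemented. One possible reading is sketched below. The function name `sop_deviation_rate`, the step labels, and the position-by-position comparison are all assumptions made for illustration; none of them come from this diff or from MetaGPT.

```python
from typing import Sequence


def sop_deviation_rate(prescribed: Sequence[str], observed: Sequence[str]) -> float:
    """Return the fraction of step positions where observed differs from prescribed.

    Hypothetical scoring rule (an assumption, not taken from the diff):
    steps are compared position by position, and every missing or extra
    step counts as one deviation.
    """
    total = max(len(prescribed), len(observed))
    if total == 0:
        # No steps prescribed and none taken: nothing to deviate from.
        return 0.0
    matches = sum(1 for p, o in zip(prescribed, observed) if p == o)
    return (total - matches) / total


# Hypothetical SOP for a small PoC task.
sop = ["requirements", "design", "implement", "review"]
run = ["requirements", "implement", "design", "review"]
sdr = sop_deviation_rate(sop, run)  # two of four positions differ
```

A positional comparison penalizes reordering and skipped steps alike. If the team wants to treat an inserted step as a single deviation rather than a shift of every later step, an edit-distance-based rule would be the natural alternative.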