Superseding Static Harnesses with Learnable Skills for Context Optimization
This repository accompanies the paper Meta Context Engineering via Agentic Skill Evolution. Meta Context Engineering (MCE) is a bi-level agentic framework that co-evolves context engineering skills and context artifacts. It replaces rigid CE heuristics with learnable skills that automatically discover optimal context representations and optimization procedures.
MCE achieves consistent improvements across five diverse domains (finance, chemistry, medicine, law, AI safety):
| Setting | Metric | MCE | Best Baseline | Improvement |
|---|---|---|---|---|
| Offline | Avg. Relative Gain vs Base | 89.1% | 70.7% (ACE) | +18.4% |
| Online | Avg. Relative Gain vs Base | 74.1% | 41.1% (ACE) | +33.0% |
Efficiency gains:
- 13.6× faster training than ACE
- 4.8× fewer rollouts required
- Dynamic context length: 1.5K to 86K tokens based on task needs
Reproduce experiments: See mce-artifact for code and data used in our paper.
Current context engineering methods are fundamentally limited by manually crafted harnesses, for example:
- Prompt rewriting (GEPA) favors brevity → fails on tasks requiring detailed knowledge
- Additive curation (ACE) favors verbosity, structuring context as rigid itemized lists → causes context bloat and lacks structural expressiveness
- Manually crafted agentic harnesses restrict optimization to narrow, intuition-bound design spaces
MCE breaks free by treating the context engineering skill itself as a learnable object:
Traditional CE: Fixed workflow → Optimized context
MCE: Learnable skill + fully agentic CE → Optimized context function
MCE formalizes context as a context function c(x) = (F_k ∘ ... ∘ F_1)(x; ρ):
- Static components (ρ): Knowledge bases, decision rules, examples
- Dynamic operators (F): Retrieval, filtering, composition logic
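The formalization above can be sketched in a few lines of Python. This is an illustrative toy, not the repo's implementation: the operator names (`retrieve`, `compose`), the contents of `rho`, and the string-based context are all placeholder assumptions; MCE's actual operators are learned and executed agentically.

```python
from functools import reduce

# Static components rho: knowledge bases, decision rules, examples.
# (These contents are made up for illustration.)
rho = {
    "knowledge": {"fever": "Consider infection; check temperature history."},
    "rules": ["Always list differential diagnoses."],
}

# Dynamic operators F_i: each maps (partial context, query, rho) -> partial context.
def retrieve(ctx: str, x: str, rho: dict) -> str:
    hits = [v for k, v in rho["knowledge"].items() if k in x.lower()]
    return ctx + "\n".join(hits)

def compose(ctx: str, x: str, rho: dict) -> str:
    return ctx + "\n" + "\n".join(rho["rules"]) + f"\nQuery: {x}"

def make_context_fn(operators, rho):
    """Build c(x) = (F_k . ... . F_1)(x; rho) as a left-to-right composition."""
    def c(x: str) -> str:
        return reduce(lambda ctx, F: F(ctx, x, rho), operators, "")
    return c

c = make_context_fn([retrieve, compose], rho)
print(c("Patient reports fever"))
```

The key point is that `c` is a function, not a static string: the same learned `rho` and operators produce a different context for each input `x`.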
Meta-Level (Agentic Skill Evolution):
- Analyzes task specification and performance history
- Generates improved skills via agentic crossover
- Skills include: methodology, executable code, context templates, dynamic operators
Base-Level (Fully Agentic Context Optimization):
- Executes skills to learn from training rollouts
- Produces context as files and code
- No structural constraints on context representation
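The interaction between the two levels can be sketched as a simple loop. Everything below is a hypothetical stand-in: the `Skill` schema, `evolve_skill`, `optimize_context`, and the integer scores are invented for illustration; in MCE both levels are LLM agents, not hand-written functions.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A learnable CE skill: methodology plus executable pieces.
    Field names here are illustrative, not the repo's actual schema."""
    methodology: str
    version: int = 0

def evolve_skill(skill: Skill, history: list[int]) -> Skill:
    # Meta-level (hypothetical): inspect performance history and
    # propose an improved skill, e.g. via agentic crossover.
    note = " +refined" if history and history[-1] < 100 else ""
    return Skill(methodology=skill.methodology + note, version=skill.version + 1)

def optimize_context(skill: Skill, rollouts: list[str]) -> int:
    # Base-level (hypothetical): execute the skill on training rollouts,
    # emit context artifacts (files/code), and return a validation score.
    return min(100, 50 + 10 * skill.version)

skill = Skill(methodology="itemize then compress")
history: list[int] = []
for _ in range(3):  # meta-iterations
    skill = evolve_skill(skill, history)          # meta-level step
    history.append(optimize_context(skill, ["r1", "r2"]))  # base-level step
print(history)
```

The structure mirrors the framework: the meta-level only ever sees the task specification and the score history, while the base-level does the actual context optimization under the current skill.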
```bash
git clone https://github.com/metaevo-ai/meta-context-engineering
cd meta-context-engineering

# Download uv if not installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install with uv
uv sync
```
Copy `.env.template` to `.env` and set your API keys:
```bash
cp .env.template .env
```
The system uses OpenRouter by default with automatic fallback to OpenAI:
```bash
# Option 1: OpenRouter (recommended)
export OPENROUTER_API_KEY="your-api-key"
export OPENROUTER_API_BASE="https://openrouter.ai/api/v1"

# Option 2: OpenAI (fallback if OpenRouter is not set)
export OPENAI_API_KEY="your-api-key"
export OPENAI_API_BASE="https://api.openai.com/v1"  # Optional

# To use the Claude agent SDK
export ANTHROPIC_API_KEY="your-anthropic-api-key"

# If you are using OpenRouter
export ANTHROPIC_BASE_URL=https://openrouter.ai/api
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_API_KEY=""

# Set default models for the Claude agent SDK
export ANTHROPIC_DEFAULT_SONNET_MODEL=
export ANTHROPIC_DEFAULT_OPUS_MODEL=
export ANTHROPIC_DEFAULT_HAIKU_MODEL=
```
Run training on the symptom diagnosis task:
```bash
bash scripts/train_symptom_diagnosis.sh          # Optimize context for one-step inference
bash scripts/train_symptom_diagnosis_twostep.sh  # Optimize context for a two-step workflow
bash scripts/train_symptom_diagnosis_agent.sh    # Optimize context for an agent
```
Example results: MCE boosts DeepSeek V3.1 from 45% to 70% accuracy with only 100 training rollouts on symptom diagnosis.
```
meta-context-engineering/
├── env/                       # Task environments
│   ├── base.py                # InterfaceSignature, TaskEnvironment
│   ├── registry.py            # Environment registry
│   ├── TUTORIAL.md            # Guide for adding new environments
│   └── symptom_diagnosis*/    # Example environments
├── mce/                       # Core framework
│   ├── main.py                # Training orchestration
│   ├── meta_agent.py          # Meta-level: skill evolution
│   ├── base_agent.py          # Base-level: context optimization
│   └── validation.py          # Interface validation
├── scripts/                   # Training & evaluation scripts
└── assets/                    # Paper and figures
```
See env/TUTORIAL.md for a comprehensive guide on creating custom task environments.
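As a rough, self-contained illustration of what an environment plus registry might look like — the real `TaskEnvironment`, `InterfaceSignature`, and registry interfaces live in `env/base.py` and `env/registry.py` and may differ, so treat every name below as a placeholder and follow `env/TUTORIAL.md`:

```python
import abc
import json

class TaskEnvironment(abc.ABC):
    """Stand-in for env/base.py's TaskEnvironment; the actual
    method names and signatures may differ."""
    @abc.abstractmethod
    def load(self, path: str) -> list[dict]: ...
    @abc.abstractmethod
    def score(self, example: dict, prediction: str) -> float: ...

REGISTRY: dict[str, type[TaskEnvironment]] = {}  # stand-in for env/registry.py

def register(name: str):
    def deco(cls):
        REGISTRY[name] = cls
        return cls
    return deco

@register("my_task")
class MyTaskEnv(TaskEnvironment):
    def load(self, path: str) -> list[dict]:
        with open(path) as f:
            return [json.loads(line) for line in f]  # JSONL, one example per line
    def score(self, example: dict, prediction: str) -> float:
        return float(prediction.strip() == example["answer"])  # exact match

env = REGISTRY["my_task"]()
print(env.score({"answer": "flu"}, "flu"))
```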
Quick steps:
- Create an environment directory with data files
- Implement a `TaskEnvironment` subclass
- Register it in `env/registry.py`
- Create a training script and run:
```bash
uv run python -m mce.main \
    --workspace "workspace/my_task" \
    --env "my_task" \
    --train-data "path/to/train.jsonl" \
    --val-data "path/to/val.jsonl" \
    --model "deepseek/deepseek-chat-v3.1" \
    --iterations 3 \
    --train-limit 50 \
    --val-limit 20
```
Key flags: `--workspace` sets the output directory, `--env` names the registered environment, `--model` selects the inference LLM, `--iterations` sets the number of meta-iterations, and `--train-limit`/`--val-limit` cap the number of training and validation samples.

If you find this work useful, please give it a star and cite:
```bibtex
@misc{ye2026mce,
  title={Meta Context Engineering via Agentic Skill Evolution},
  author={Haoran Ye and Xuning He and Vincent Arak and Haonan Dong and Guojie Song},
  year={2026},
  eprint={2601.21557},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21557},
  note={Code available at \url{https://github.com/metaevo-ai/meta-context-engineering}},
}
```
This project is released under the MIT License.

