Nexus Agents is an open-source competitive platform for developing, evaluating, and sharing specialized language models using Ollama. Think of it as the Olympics for AI models - where each task is a different sport, and models compete to be the fastest, most accurate, or most efficient.
Instead of building general-purpose LLMs, we focus on creating highly specialized models that excel at specific domains and tasks. Each model competes for championship badges, and winners are automatically published to the Ollama Hub for worldwide access.
- Model Definition: Models are defined using Ollama Modelfiles for easy sharing and reproduction
- Automated Evaluation: Each model is tested against task-specific benchmarks
- Badge Competition: Models compete for championship badges in different categories
- Quality Gates: PRs are only merged if models beat existing benchmarks
- Performance Tracking: Comprehensive metrics for quality, latency, and resource usage
- Automatic Publishing: Champion models are pushed to Ollama Hub instantly
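As a sketch, a task's Modelfile might look like the following. The base model, parameter values, and system prompt here are illustrative assumptions, not the actual `child_trauma_assessment` definition:

```
# Illustrative Modelfile (base model and values are assumptions)
FROM gemma:2b

# Conservative sampling for consistent, reproducible benchmark runs
PARAMETER temperature 0.3
PARAMETER num_ctx 4096

# Specialize the base model with a task-specific system prompt
SYSTEM """You are a careful assistant specialized in a single narrow task.
Stay within the task's scope and defer anything outside it."""
```

Running `ollama create my-task-model -f Modelfile` builds the specialized model locally from this definition.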
Models compete for these prestigious badges:
- 🚀 Speed Champion: Lowest latency (time to first token + token generation speed)

- 🎯 Accuracy Master: Highest quality scores
- 🪶 Lightweight Leader: Lowest memory footprint
- 🏆 Overall Champion: Best balance of all metrics
Multiple models can hold different badges for the same task, creating healthy competition across different optimization goals.
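One hedged sketch of how an Overall Champion score could balance the three metrics. The weights and normalization below are assumptions for illustration, not the platform's actual formula:

```python
def overall_score(quality: float, latency_s: float, memory_gb: float,
                  weights=(0.5, 0.3, 0.2)) -> float:
    """Combine quality (0-1, higher is better) with latency and memory
    (lower is better) into one score in [0, 1]. Weights are illustrative."""
    w_q, w_l, w_m = weights
    # Map the "lower is better" metrics into [0, 1], where 1 is best.
    latency_score = 1.0 / (1.0 + latency_s)
    memory_score = 1.0 / (1.0 + memory_gb)
    return w_q * quality + w_l * latency_score + w_m * memory_score

# A model with quality 0.9, 1 s latency, and 1 GB memory:
print(overall_score(0.9, 1.0, 1.0))  # 0.45 + 0.15 + 0.10 = 0.70
```

Because each badge optimizes a different axis, a model can dominate one term here while losing overall, which is exactly why multiple badges per task coexist.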
```mermaid
graph TD
    A[Create Specialized Model] --> B[Define Modelfile]
    B --> C[Create Benchmark Dataset]
    C --> D[Submit PR]
    D --> E[Automated Testing]
    E --> F{Beats Current Champions?}
    F -->|Yes| G[Earn Badges & Push to Hub]
    F -->|No| H[Improve Model]
    H --> D
```
Our flagship task showcases the platform's capabilities - a specialized LLM designed to assist in psychological assessment of children in conflict zones. Built on Gemma 3B, this model provides trauma-informed responses while maintaining professional boundaries and safety protocols.
Key Features:
- Initial trauma screening guidance
- Coping strategy recommendations
- Urgent case identification
- Cultural sensitivity handling
- Report generation assistance
This real-world application demonstrates how specialized models can address critical humanitarian needs while competing on technical excellence.
Each task is self-contained with:
- Model definition (Modelfile)
- Benchmark dataset
- Evaluation metrics
- Championship leaderboard
- Documentation
Current tasks:
- Child Trauma Assessment: Psychological assessment support for children living in conflict zones
- (More tasks coming - submit yours!)
```
Nexus-Agents/
├── tasks/                        # Each specialized task
│   └── child_trauma_assessment/
│       ├── model/                # Ollama model definition
│       ├── benchmarks/           # Test datasets
│       └── evaluation/           # Metrics and evaluation
├── docs/                         # Documentation
└── .github/
    └── workflows/                # CI/CD pipelines
```
Want to dethrone the current champions? Here's how:
- Choose Your Battle: Pick speed, accuracy, efficiency, or overall excellence
- Fork & Improve: Enhance existing models or create new ones
- Local Testing: Use our evaluation scripts to benchmark locally
- Submit PR: Demonstrate improvements over current champions
Important: One task per PR to maintain clean evaluation pipelines.
For detailed contribution guidelines, see CONTRIBUTING.md.
- Create a directory under `tasks/`
- Provide:
  - A Modelfile for your specialized model
  - A benchmark dataset with test cases
  - Evaluation metrics (or use standard ones)
  - Task documentation
- Submit a PR with comprehensive benchmarks
For complete submission guidelines, check out docs/guidelines.md.
- Fork the repository
- Modify the Modelfile or benchmarks
- Test locally using Ollama
- Submit a PR
Your PR must demonstrate:
- Improved quality scores
- Maintained or improved performance
- No regression in existing test cases
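The PR gate above can be sketched as a simple champion-comparison check. The metric names and champion numbers below are hypothetical, not the repository's actual evaluation schema:

```python
def beats_champion(challenger: dict, champion: dict,
                   higher_is_better=("quality",),
                   lower_is_better=("latency_s", "memory_gb")) -> bool:
    """Return True only if the challenger improves at least one metric
    and regresses on none. Metric names are illustrative assumptions."""
    improved = False
    for m in higher_is_better:
        if challenger[m] < champion[m]:
            return False          # regression on a quality metric
        if challenger[m] > champion[m]:
            improved = True
    for m in lower_is_better:
        if challenger[m] > champion[m]:
            return False          # regression on a cost metric
        if challenger[m] < champion[m]:
            improved = True
    return improved

champ = {"quality": 0.82, "latency_s": 1.4, "memory_gb": 3.1}
new   = {"quality": 0.85, "latency_s": 1.4, "memory_gb": 2.9}
print(beats_champion(new, champ))  # True: better quality and memory, no regression
```

Note that a tie on every metric returns False: a PR must demonstrate at least one strict improvement to dethrone a champion.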
1. Install Ollama:

   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. Clone and set up the environment:

   ```bash
   git clone https://github.com/Dahimi/Nexus-Agents.git
   cd Nexus-Agents
   python -m venv venv
   source venv/bin/activate  # On Unix/macOS
   pip install -r requirements/requirements.txt
   ```

3. Test a task locally:

   ```bash
   cd tasks/child_trauma_assessment
   ollama create trauma-model -f model/Modelfile
   python ../../scripts/evaluate_task.py child_trauma_assessment
   ```
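The internals of `evaluate_task.py` are not shown here; as one hedged possibility, a benchmark's quality score could be a keyword-coverage metric over model responses. This is entirely illustrative, including the case schema, and is not the repository's actual metric:

```python
def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the response (case-insensitive)."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0

def benchmark_quality(cases: list[dict]) -> float:
    """Average keyword coverage across benchmark cases.
    Each case: {"response": str, "keywords": [str, ...]} (illustrative schema)."""
    if not cases:
        return 0.0
    return sum(keyword_coverage(c["response"], c["keywords"])
               for c in cases) / len(cases)

cases = [
    {"response": "Start with grounding and breathing exercises.",
     "keywords": ["grounding", "breathing"]},
    {"response": "Refer urgent cases to a clinician.",
     "keywords": ["refer", "clinician", "urgent"]},
]
print(benchmark_quality(cases))  # 1.0: every expected keyword appears
```

A real evaluation would likely combine a quality metric like this with the latency and memory measurements that feed the badge categories.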
When your model claims a championship:
- Automatic push to Ollama Hub with badges displayed
- Instant worldwide availability via `ollama pull`
- Credit as model creator/contributor
- Performance tracking and leaderboard placement
Note: We're currently fine-tuning our CI/CD pipeline for seamless Ollama Hub integration - manual publishing may be needed temporarily.
- More Specialized Tasks: Adding models for:
  - Medical diagnosis assistance
  - Technical documentation
  - Educational assessment
  - Local language processing
- Enhanced Evaluation:
  - GPU performance metrics
  - Cross-task evaluation
  - A/B testing framework
- Community Features:
  - Model leaderboards
  - Usage analytics
  - Collaborative improvement tools
This project is licensed under the MIT License - see the LICENSE file for details.