Oelachqar/refresh docs #2372
# Deploying Models

Oumi provides a top-level `oumi deploy` command for taking a trained or downloaded model and standing it up as a managed inference endpoint on a third-party provider. Today it supports **Fireworks AI** and **Parasail.io**.

```{admonition} Related
:class: note
- To *launch training* on remote clusters, see {doc}`/user_guides/launch/launch`.
- To *call* a deployed endpoint, see {doc}`/user_guides/infer/inference_engines`.
```

## Overview

The deploy workflow has three stages, each exposed as a sub-command:

1. **Upload** — push the model (full weights or a LoRA adapter) to the provider.
2. **Create endpoint** — provision hardware and start serving the uploaded model.
3. **Test / use** — smoke-test the endpoint and then call it with any inference engine.

For the common case, `oumi deploy up` runs all three stages end-to-end from a single YAML config.
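The three stages can be pictured as a chain of calls. The function names and return values below are purely illustrative stand-ins, not Oumi's actual internals:

```python
# Hypothetical sketch of the three deploy stages chained by `oumi deploy up`.
# All names and URL shapes here are illustrative, not Oumi's real API.

def upload_model(model_source: str, provider: str) -> str:
    """Stage 1: push weights (or a LoRA adapter) to the provider."""
    # A real implementation would call the provider's upload API here.
    return f"{provider}/models/{model_source.rstrip('/').split('/')[-1]}"

def create_endpoint(model_id: str, accelerator: str, count: int) -> str:
    """Stage 2: provision hardware and start serving the model."""
    return f"https://api.example.com/endpoints/{model_id.split('/')[-1]}"

def smoke_test(endpoint_url: str, prompts: list[str]) -> bool:
    """Stage 3: stand-in for sending test prompts to the endpoint."""
    return all(isinstance(p, str) and p for p in prompts)

# Chain the stages end to end:
model_id = upload_model("/path/to/my-finetuned-model/", "fireworks")
url = create_endpoint(model_id, accelerator="nvidia_h100_80gb", count=2)
assert smoke_test(url, ["Hello, how are you?"])
```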

## Prerequisites

- A provider account and API key exported in your shell:
  - Fireworks: `FIREWORKS_API_KEY`
  - Parasail: `PARASAIL_API_KEY`
- For Fireworks, the model must exist on your local disk (a HuggingFace download or an Oumi training output).
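A quick pre-flight check for the environment variables above can save a failed deploy midway. The helper below is a small sketch, not part of Oumi; only the variable names come from the prerequisites list:

```python
import os

# Map each supported provider to the environment variable it needs
# (taken from the prerequisites above).
REQUIRED_KEYS = {
    "fireworks": "FIREWORKS_API_KEY",
    "parasail": "PARASAIL_API_KEY",
}

def check_api_key(provider: str) -> str:
    """Return the API key for `provider`, or raise a clear error."""
    var = REQUIRED_KEYS[provider]
    key = os.environ.get(var)
    if not key:
        raise EnvironmentError(f"Set {var} before running `oumi deploy`.")
    return key
```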

## Quick Start: End-to-End Deploy

```bash
oumi deploy up --config configs/examples/deploy/fireworks_deploy.yaml
```

The `--config` YAML matches the {py:class}`~oumi.deploy.deploy_config.DeploymentConfig` schema:

```yaml
# configs/examples/deploy/fireworks_deploy.yaml
model_source: /path/to/my-finetuned-model/  # local directory
provider: fireworks                         # fireworks | parasail
model_name: my-finetuned-model-v1           # display name on the provider
model_type: full                            # full | adapter
# base_model: accounts/fireworks/models/llama-v3p1-8b-instruct  # required if adapter

hardware:
  accelerator: nvidia_h100_80gb  # see `oumi deploy list-hardware`
  count: 2

autoscaling:
  min_replicas: 1
  max_replicas: 4

test_prompts:
  - "Hello, how are you?"
```

Any of `model_source`, `provider`, and `hardware` can be overridden on the CLI, e.g.:

```bash
oumi deploy up \
  --config fireworks_deploy.yaml \
  --model-path /tmp/llama3-8b \
  --hardware nvidia_a100_80gb
```
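The layering of CLI flags over the YAML file can be pictured as a simple merge where an explicitly passed flag wins. This is an assumption about how the override behaves, not `oumi deploy`'s actual implementation:

```python
# Illustrative merge rule: CLI overrides win over YAML values, and unset
# flags leave the file's value alone. An assumption, not Oumi's real code.
def merge_config(yaml_cfg: dict, cli_overrides: dict) -> dict:
    merged = dict(yaml_cfg)
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged

yaml_cfg = {
    "model_source": "/path/to/my-finetuned-model/",
    "provider": "fireworks",
    "hardware": {"accelerator": "nvidia_h100_80gb", "count": 2},
}
cli = {
    "model_source": "/tmp/llama3-8b",
    "hardware": {"accelerator": "nvidia_a100_80gb", "count": 2},
    "provider": None,  # flag not passed: keep the YAML value
}
merged = merge_config(yaml_cfg, cli)
```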

`oumi deploy up` will upload the model, wait for it to be ready, create an endpoint, optionally run any `test_prompts`, and print the endpoint URL.
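The "wait for it to be ready" step amounts to a polling loop. In the sketch below, `get_status` stands in for a provider status call, and the state names (`PENDING`, `RUNNING`, `FAILED`) are illustrative rather than Oumi's exact enum:

```python
import time

def wait_until_ready(get_status, timeout_s: float = 600.0, poll_s: float = 1.0) -> str:
    """Poll `get_status()` until it reports RUNNING or a terminal failure."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "RUNNING":
            return status
        if status == "FAILED":
            raise RuntimeError("Deployment failed")
        time.sleep(poll_s)
    raise TimeoutError("Endpoint did not become ready in time")

# Example with a fake status source that becomes ready on the third poll:
states = iter(["PENDING", "PENDING", "RUNNING"])
print(wait_until_ready(lambda: next(states), poll_s=0.01))  # RUNNING
```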

## Sub-Commands

| Command                       | What it does                                            |
|-------------------------------|---------------------------------------------------------|
| `oumi deploy up`              | Full pipeline: upload → create endpoint → test          |
| `oumi deploy upload`          | Upload a model only                                     |
| `oumi deploy create-endpoint` | Create an endpoint for a previously uploaded model      |
| `oumi deploy list`            | List all deployments on the provider                    |
| `oumi deploy list-models`     | List uploaded models                                    |
| `oumi deploy list-hardware`   | List hardware options available for a provider          |
| `oumi deploy status`          | Show endpoint state, replica counts, and URL            |
| `oumi deploy start` / `stop`  | Start or stop an existing endpoint (pause to save cost) |
| `oumi deploy delete`          | Delete an endpoint                                      |
| `oumi deploy delete-model`    | Delete an uploaded model                                |
| `oumi deploy test`            | Send a sample request to an endpoint                    |

Add `--help` to any sub-command for the exact flags it accepts, or see {doc}`/cli/commands`.
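The `start`/`stop`/`delete` commands from the table imply a small endpoint lifecycle. The toy state machine below is only a mental model; the real providers' state names and allowed transitions may differ:

```python
# Toy model of the endpoint lifecycle behind `start`/`stop`/`delete`.
# States and transitions are illustrative assumptions, not provider behavior.
TRANSITIONS = {
    ("RUNNING", "stop"): "STOPPED",
    ("STOPPED", "start"): "RUNNING",
    ("RUNNING", "delete"): "DELETED",
    ("STOPPED", "delete"): "DELETED",
}

def apply_command(state: str, command: str) -> str:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"Cannot `{command}` an endpoint in state {state}")
```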

## Using a Deployed Endpoint

Once `oumi deploy up` reports `RUNNING`, point any Oumi inference engine at the returned URL. For Fireworks:

```python
from oumi.inference import FireworksInferenceEngine
from oumi.core.configs import ModelParams

engine = FireworksInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

For Parasail:

```python
from oumi.inference import ParasailInferenceEngine
from oumi.core.configs import ModelParams

engine = ParasailInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

Both engines are documented in {doc}`/user_guides/infer/inference_engines`.
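If you prefer raw HTTP over the engine classes, you can build a chat-completions request body yourself. This assumes the provider exposes an OpenAI-compatible chat route (Fireworks does; check your provider's docs), and the endpoint URL below is a placeholder:

```python
import json

def build_chat_request(model_name: str, prompt: str, max_tokens: int = 256) -> bytes:
    """Build an OpenAI-style chat-completions request body."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("my-finetuned-model-v1", "Hello, how are you?")
# Then POST `body` to your endpoint, e.g. (placeholder URL):
# urllib.request.Request("https://<endpoint-url>/v1/chat/completions",
#                        data=body, headers={"Content-Type": "application/json"})
```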

## Tips

- **Cost control.** Use `oumi deploy stop <endpoint>` to pause an endpoint without deleting it; `start` brings it back online. Set `autoscaling.min_replicas: 0` if the provider supports scale-to-zero.
- **LoRA adapters.** Set `model_type: adapter` and a matching `base_model` to deploy a LoRA adapter on top of a hosted base model. This is usually cheaper than deploying a full model.
- **Smoke tests.** The `test_prompts` at the bottom of the YAML run automatically after `oumi deploy up` finishes — a quick sanity check before sending real traffic.
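The adapter rule in the tips (`model_type: adapter` requires `base_model`) can be expressed as a small validation check. Field names follow the YAML example above; the check itself is an illustrative sketch, not `DeploymentConfig`'s actual code:

```python
# Sketch of the rule: an adapter deployment must name its base model.
# Field names come from the YAML example; logic is illustrative only.
def validate_adapter_config(cfg: dict) -> None:
    if cfg.get("model_type") == "adapter" and not cfg.get("base_model"):
        raise ValueError("model_type: adapter requires base_model to be set")

validate_adapter_config({
    "model_type": "adapter",
    "base_model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
})  # passes: base model is named
```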

## See Also

- {doc}`/user_guides/infer/inference_engines` — calling the deployed endpoint
- {doc}`/user_guides/launch/launch` — launching training jobs on remote clusters
- {doc}`/cli/commands` — CLI reference
Review comment: Not sure this statement is 100% accurate given PR https://github.com/oumi-ai/oumi/pull/2360/changes. On the other hand, we have not exposed that functionality through the CLI, so I think we are OK.