2 changes: 2 additions & 0 deletions docs/index.md
@@ -60,6 +60,7 @@ get_started/tutorials

user_guides/train/train
user_guides/infer/infer
user_guides/deploy
user_guides/evaluate/evaluate
user_guides/analyze/analyze
user_guides/judge/judge
@@ -68,6 +69,7 @@ user_guides/synth
user_guides/tune
user_guides/quantization
user_guides/customization
user_guides/mcp
```

```{toctree}
147 changes: 147 additions & 0 deletions docs/user_guides/analyze/analyze.md
@@ -71,6 +71,153 @@ The built-in `length` analyzer computes text length metrics:
Enable token counting by adding `tokenizer_config` to your configuration. See {doc}`analyze_config` for setup details.
:::

### Data Quality Analyzer

The built-in `quality` analyzer ({py:class}`~oumi.analyze.analyzers.quality.DataQualityAnalyzer`) catches five common data issues without running any model inference. It's meant as a cheap, first-pass sanity check before training or fine-tuning.

| Field | What it flags |
|------------------------------------|---------------------------------------------------------------------------|
| `has_non_alternating_turns` | Consecutive same-role messages (`user`, `user`, …) in non-system turns |
| `has_no_user_message` | Conversation has no `user` message at all (including empty conversations) |
| `has_system_message_not_at_start` | A `system` message appears anywhere other than position 0 |
| `has_empty_turns` / `empty_turn_count` | Any message whose content is empty or whitespace-only |
| `has_invalid_values` / `invalid_value_patterns` | Strings like `NaN`, `null`, `None`, `undefined` leaked into content |

```yaml
analyzers:
  - id: quality
```

Because the output is typed ({py:class}`~oumi.analyze.analyzers.quality.DataQualityMetrics`), quality fields can be referenced by later **tests** using dotted metric paths (see [Testing Framework](#testing-framework)), e.g. `quality.has_no_user_message`.
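The checks themselves are simple structural passes over the messages. A minimal sketch of two of them, using plain `(role, content)` tuples instead of Oumi's `Conversation` type (illustrative only, not the actual implementation):

```python
# Illustrative sketch of two quality checks; NOT Oumi's implementation.
# A "turn" is modeled as a (role, content) tuple.
INVALID_PATTERNS = {"nan", "null", "none", "undefined"}


def has_non_alternating_turns(turns):
    """True if two consecutive non-system messages share the same role."""
    roles = [role for role, _ in turns if role != "system"]
    return any(a == b for a, b in zip(roles, roles[1:]))


def invalid_value_patterns(turns):
    """Content strings that are exactly a leaked placeholder like 'NaN'."""
    return [content for _, content in turns
            if content.strip().lower() in INVALID_PATTERNS]
```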

### Turn Stats Analyzer

The built-in `turn_stats` analyzer ({py:class}`~oumi.analyze.analyzers.turn_stats.TurnStatsAnalyzer`) reports conversation shape: `num_turns`, `num_user_turns`, `num_assistant_turns`, `has_system_message`, `first_turn_role`, `last_turn_role`. Useful for finding malformed or single-sided conversations.

```yaml
analyzers:
  - id: turn_stats
```
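The reported fields can be sketched over plain `(role, content)` tuples as follows (illustrative, not the actual analyzer):

```python
# Illustrative sketch of the shape metrics turn_stats reports;
# NOT Oumi's implementation.
def turn_stats(turns):
    roles = [role for role, _ in turns]
    return {
        "num_turns": len(turns),
        "num_user_turns": roles.count("user"),
        "num_assistant_turns": roles.count("assistant"),
        "has_system_message": "system" in roles,
        "first_turn_role": roles[0] if roles else None,
        "last_turn_role": roles[-1] if roles else None,
    }
```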

## Typed Analyzer Framework

All built-in analyzers above (`length`, `quality`, `turn_stats`) are implemented in the **typed analyzer framework** ({py:class}`~oumi.analyze.base.BaseAnalyzer`). Each analyzer declares a pydantic result model, which gives you:

- **Auto-generated JSON schemas** for result documentation and validation.
- **Typed access** to analyzer output in Python (fields are proper attributes, not dict keys).
- **Metric paths** for the testing framework — `{analyzer_id}.{field_name}`, or `{instance_id}.{field_name}` when you run multiple instances of the same analyzer.

### Defining a Typed Analyzer

```python
from pydantic import BaseModel, Field
from oumi.analyze.base import ConversationAnalyzer
from oumi.core.registry import register_sample_analyzer
from oumi.core.types.conversation import Conversation


class QuestionMetrics(BaseModel):
    num_questions: int = Field(description="Count of '?' characters")
    density: float = Field(description="Questions per message")


@register_sample_analyzer("questions")
class QuestionAnalyzer(ConversationAnalyzer[QuestionMetrics]):
    _result_model = QuestionMetrics

    @classmethod
    def get_config_schema(cls) -> dict:
        return {"properties": {}}

    def analyze(self, conversation: Conversation) -> QuestionMetrics:
        total = sum(m.content.count("?") for m in conversation.messages)
        return QuestionMetrics(
            num_questions=total,
            density=total / max(len(conversation.messages), 1),
        )
```

Point the config at your typed analyzer the same way as built-ins:

```yaml
analyzers:
  - id: questions
    instance_id: questions  # required for typed analyzers
```

When you need two configurations of the same analyzer (e.g. two `length` analyzers with different tokenizers), give each one a unique `instance_id`.
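For example, a hypothetical config with two `length` instances (the `tokenizer_config` layout here is a placeholder; see {doc}`analyze_config` for the actual setup, and note both model names are illustrative):

```yaml
analyzers:
  - id: length
    instance_id: length_gpt2
    tokenizer_config:
      model_name: gpt2                               # hypothetical field layout
  - id: length
    instance_id: length_llama
    tokenizer_config:
      model_name: meta-llama/Llama-3.1-8B-Instruct   # hypothetical field layout
```

Metric paths such as `length_gpt2.<field>` and `length_llama.<field>` then distinguish the two instances in tests.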

### Custom Metrics (No Code Registration Required)

For quick one-offs you don't want to package as an analyzer, declare a `custom_metrics` block directly in YAML:

```yaml
custom_metrics:
  - id: word_to_char_ratio
    scope: conversation  # message | conversation | dataset
    description: "Ratio of words to characters"
    output_schema:
      - name: ratio
        type: float
        description: "Words divided by characters"
    function: |
      def compute(conversation):
          chars = sum(len(m.content) for m in conversation.messages)
          words = sum(len(m.content.split()) for m in conversation.messages)
          return {"ratio": words / chars if chars > 0 else 0.0}
```

```{warning}
Custom metric `function` strings are compiled and run as arbitrary Python. Only load configs from sources you trust.
```
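To make the warning concrete, here is a minimal sketch of how a `function` string can be compiled and executed (illustrative only, not Oumi's actual loader; the `SimpleNamespace` objects are stand-ins for a conversation and its messages):

```python
# Minimal sketch of executing a custom_metrics `function` string;
# illustrative only, NOT Oumi's actual loader.
from types import SimpleNamespace

FUNC_SRC = '''
def compute(conversation):
    chars = sum(len(m.content) for m in conversation.messages)
    words = sum(len(m.content.split()) for m in conversation.messages)
    return {"ratio": words / chars if chars > 0 else 0.0}
'''

namespace = {}
exec(FUNC_SRC, namespace)  # arbitrary code runs here -- hence the warning
compute = namespace["compute"]

# Stand-in for a Conversation with one message.
conv = SimpleNamespace(messages=[SimpleNamespace(content="two words")])
result = compute(conv)  # {"ratio": 2/9} for "two words" (2 words, 9 chars)
```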

## Testing Framework

The typed framework also ships a **testing** layer that evaluates analyzer output against thresholds and produces a pass/fail summary — useful for CI, regression detection, and "fail the run if more than 5% of conversations are missing a user message".

### Defining Tests

```yaml
tests:
  - id: max_words
    type: threshold
    metric: length.total_words  # <analyzer_id_or_instance_id>.<field>
    operator: ">"
    value: 10000
    max_percentage: 5.0  # fail if >5% of conversations match

  - id: no_missing_user_msg
    type: threshold
    metric: quality.has_no_user_message
    operator: "=="
    value: true
    max_percentage: 0.0  # fail if any conversation is missing a user
```

Each test compares a metric to a `value` using `operator`, then checks whether the flagged fraction exceeds `max_percentage` (or falls below `min_percentage`).
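These semantics can be sketched in plain Python (illustrative; not the actual `TestEngine` implementation):

```python
# Illustrative sketch of threshold-test semantics: flag each conversation
# whose metric satisfies `<operator> <value>`, then fail the test if the
# flagged percentage exceeds max_percentage. NOT the actual TestEngine.
import operator as op

OPS = {">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le, "==": op.eq, "!=": op.ne}


def run_threshold_test(metric_values, operator, value, max_percentage):
    flagged = sum(1 for v in metric_values if OPS[operator](v, value))
    pct = 100.0 * flagged / len(metric_values) if metric_values else 0.0
    return {"flagged": flagged, "percentage": pct, "passed": pct <= max_percentage}
```

With the `max_words` example above, one flagged conversation out of three is about 33%, which exceeds the 5% budget and fails the test.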

### Running Tests Incrementally with BatchTestEngine

For large datasets where full analyzer output won't fit in memory, use {py:class}`~oumi.analyze.testing.batch_engine.BatchTestEngine`. It accumulates only lightweight counters and per-test affected conversation IDs as batches stream through, then returns a `TestSummary` at the end:

```python
from oumi.analyze.testing.batch_engine import BatchTestEngine

engine = BatchTestEngine(config.tests)

for batch_results, batch_conversation_ids in stream_batches():
    engine.process_batch(batch_results, batch_conversation_ids)

summary = engine.finalize()
print(f"{summary.passed_tests}/{summary.total_tests} passed "
      f"({summary.pass_rate}%)")

# IDs of conversations that caused test failures, per test:
affected = engine.get_affected_conversation_ids()
```

Use the standard `TestEngine` (same module) when the full dataset fits in memory; use `BatchTestEngine` when it doesn't.

## Working with Results

### Analysis Summary
121 changes: 121 additions & 0 deletions docs/user_guides/deploy.md
@@ -0,0 +1,121 @@
# Deploying Models

Oumi provides a top-level `oumi deploy` command for taking a trained or downloaded model and standing it up as a managed inference endpoint on a third-party provider. Today it supports **Fireworks AI** and **Parasail.io**.

```{admonition} Related
:class: note
- To *launch training* on remote clusters, see {doc}`/user_guides/launch/launch`.
- To *call* a deployed endpoint, see {doc}`/user_guides/infer/inference_engines`.
```

## Overview

The deploy workflow has three stages, each exposed as a sub-command:

1. **Upload** — push the model (full weights or a LoRA adapter) to the provider.
2. **Create endpoint** — provision hardware and start serving the uploaded model.
3. **Test / use** — smoke-test the endpoint and then call it with any inference engine.

For the common case, `oumi deploy up` runs all three stages end-to-end from a single YAML config.

## Prerequisites

- A provider account and API key exported in your shell:
- Fireworks: `FIREWORKS_API_KEY`
- Parasail: `PARASAIL_API_KEY`
- For Fireworks, the model must exist on your local disk (HuggingFace download or an Oumi training output).
> **Contributor comment:** Not sure if this statement is 100% accurate, given PR https://github.com/oumi-ai/oumi/pull/2360/changes. On the other hand, we have not exposed that functionality through the CLI, so I think we are OK.


## Quick Start: End-to-End Deploy

```bash
oumi deploy up --config configs/examples/deploy/fireworks_deploy.yaml
```

The `--config` YAML matches the {py:class}`~oumi.deploy.deploy_config.DeploymentConfig` schema:

```yaml
# configs/examples/deploy/fireworks_deploy.yaml
model_source: /path/to/my-finetuned-model/  # local directory
provider: fireworks                         # fireworks | parasail
model_name: my-finetuned-model-v1           # display name on the provider
model_type: full                            # full | adapter
# base_model: accounts/fireworks/models/llama-v3p1-8b-instruct  # required if adapter

hardware:
  accelerator: nvidia_h100_80gb  # see `oumi deploy list-hardware`
  count: 2

autoscaling:
  min_replicas: 1
  max_replicas: 4

test_prompts:
  - "Hello, how are you?"
```
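For a LoRA adapter instead of full weights, a hypothetical variant of the same config (the adapter path and display name are placeholders; the `base_model` ID is the one shown commented above):

```yaml
model_source: /path/to/my-lora-adapter/  # adapter weights, not full model
provider: fireworks
model_name: my-lora-adapter-v1
model_type: adapter
base_model: accounts/fireworks/models/llama-v3p1-8b-instruct
```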

Any of `model_source`, `provider`, and `hardware` can be overridden on the CLI, e.g.:

```bash
oumi deploy up \
  --config fireworks_deploy.yaml \
  --model-path /tmp/llama3-8b \
  --hardware nvidia_a100_80gb
```

`oumi deploy up` will upload the model, wait for it to be ready, create an endpoint, optionally run any `test_prompts`, and print the endpoint URL.

## Sub-Commands

| Command | What it does |
|---------------------------------|----------------------------------------------------------------------|
| `oumi deploy up` | Full pipeline: upload → create endpoint → test |
| `oumi deploy upload` | Upload a model only |
| `oumi deploy create-endpoint` | Create an endpoint for a previously uploaded model |
| `oumi deploy list` | List all deployments on the provider |
| `oumi deploy list-models` | List uploaded models |
| `oumi deploy list-hardware` | List hardware options available for a provider |
| `oumi deploy status` | Show endpoint state, replica counts, URL |
| `oumi deploy start` / `stop` | Start or stop an existing endpoint (pause to save cost) |
| `oumi deploy delete` | Delete an endpoint |
| `oumi deploy delete-model` | Delete an uploaded model |
| `oumi deploy test` | Send a sample request to an endpoint |

Add `--help` to any sub-command for the exact flags it accepts, or see {doc}`/cli/commands`.

## Using a Deployed Endpoint

Once `oumi deploy up` reports `RUNNING`, point any Oumi inference engine at the returned URL. For Fireworks:

```python
from oumi.inference import FireworksInferenceEngine
from oumi.core.configs import ModelParams

engine = FireworksInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

For Parasail:

```python
from oumi.inference import ParasailInferenceEngine
from oumi.core.configs import ModelParams

engine = ParasailInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

Both engines are documented in {doc}`/user_guides/infer/inference_engines`.

## Tips

- **Cost control.** Use `oumi deploy stop <endpoint>` to pause an endpoint without deleting it; `start` brings it back online. Set `autoscaling.min_replicas: 0` if the provider supports scale-to-zero.
- **LoRA adapters.** Set `model_type: adapter` and a matching `base_model` to deploy a LoRA adapter on top of a hosted base model. This is usually cheaper than a full model.
- **Smoke tests.** `test_prompts` at the bottom of the YAML run automatically after `oumi deploy up` finishes — quick sanity check before sending real traffic.

## See Also

- {doc}`/user_guides/infer/inference_engines` — calling the deployed endpoint
- {doc}`/user_guides/launch/launch` — launching training jobs on remote clusters
- {doc}`/cli/commands` — CLI reference