feat: migrate analyze CLI to typed analyzer framework by ryan-arman · Pull Request #2375 · oumi-ai/oumi

ryan-arman · 2026-04-17T21:20:16Z

Description

Migrates the oumi analyze CLI from the legacy DatasetAnalyzer path onto the typed analyzer framework (TypedAnalyzeConfig + AnalysisPipeline + TestEngine) that already lives in src/oumi/analyze/ on main. This is the first PR of a 4-PR split of #2370. It keeps this change focused on the CLI so that the framework refactors, v2 naming alignment, and bug fixes can ship and be reviewed independently.

What changed

Rewrite src/oumi/cli/analyze.py to load TypedAnalyzeConfig from YAML, build analyzers via REGISTRY.get_sample_analyzer, run AnalysisPipeline, and optionally execute TestEngine.
Restore --list, --list-metrics, --log-level, --dataset_name, --dataset_path, --sample_count, --output, --format flags with rich help panels.
Detect legacy (v1) AnalyzeConfig YAMLs by the presence of v1-only keys (dataset_source, processor_name, is_multimodal, etc.) and emit a friendly migration error instead of a cryptic crash.
Wrap analyze as a nested Typer app in src/oumi/cli/main.py so subcommands and help panels render.
Update configs/examples/analyze/analyze.yaml to the v2 schema.
Rewrite docs/user_guides/analyze/{analyze,analyze_config}.md to document the v2 schema.
Exclude the v2 example YAML from the legacy-config sweep in tests/unit/core/configs/test_parse_configs.py, since it parses as TypedAnalyzeConfig rather than AnalyzeConfig.
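The v1 detection step can be sketched as a plain key check on the parsed YAML mapping. This is a minimal illustration, not the PR's code: `LEGACY_V1_KEYS`, `find_legacy_keys`, `check_not_legacy`, and `LegacyAnalyzeConfigError` are hypothetical names, and the key set only contains the three v1 keys named above (the real check may inspect more).

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical: top-level keys that only appear in legacy (v1) AnalyzeConfig YAMLs.
# Only the keys named in the PR description are listed; the real set may be larger.
LEGACY_V1_KEYS = frozenset({"dataset_source", "processor_name", "is_multimodal"})


class LegacyAnalyzeConfigError(ValueError):
    """Hypothetical error type for a v1 config passed to the v2 CLI."""


def find_legacy_keys(raw_config: Mapping[str, Any]) -> list[str]:
    """Return the v1-only top-level keys present in a parsed YAML mapping."""
    # Iterating a Mapping yields its keys, so this intersects key names.
    return sorted(LEGACY_V1_KEYS.intersection(raw_config))


def check_not_legacy(raw_config: Mapping[str, Any], path: str) -> None:
    """Fail fast with a migration hint instead of a cryptic parsing error."""
    legacy_keys = find_legacy_keys(raw_config)
    if legacy_keys:
        raise LegacyAnalyzeConfigError(
            f"{path} looks like a legacy (v1) AnalyzeConfig "
            f"(found: {', '.join(legacy_keys)}). "
            "Please migrate it to the v2 schema described in "
            "docs/user_guides/analyze/analyze_config.md."
        )
```

Running this check on the raw mapping, before constructing the typed config, is what lets the CLI report which keys are the problem rather than surfacing a generic schema error.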

Out of scope (deferred to follow-up PRs)

type/display_name rename in AnalyzerConfig → PR 2.
testing/engine.py refactor (local TestConfig → core TestParams) + bug fixes (None-index shift, MRO walk, all_affected_indices restore) → PR 3.
analyze/base.py, analyze/pipeline.py dup-name validation, analyze/discovery.py helpers, analyze/utils/dataframe.py raw-dict handling → PR 4.

Verification

pytest tests/unit/cli/test_cli_analyze.py — 7/7 pass.
pytest tests/unit/core/configs/test_parse_configs.py — 984/984 pass.
oumi analyze --config configs/examples/analyze/analyze.yaml --output /tmp/smoke --format json runs end-to-end and produces analysis.json, test_results.json, and summary.json.

Related issues

Linear Issue: OPE-1868
Fixes OPE-1868

Before submitting

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the contributor guideline Pull Request guidelines?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

Reviewers

At least one review from a member of `oumi-ai/oumi-staff` is required.

Port the oumi analyze CLI from the legacy DatasetAnalyzer path onto the typed analyzer framework (TypedAnalyzeConfig, AnalysisPipeline, TestEngine) that already lives in src/oumi/analyze/ on main.

- Rewrite src/oumi/cli/analyze.py to load TypedAnalyzeConfig from YAML, construct analyzers via the core registry, run AnalysisPipeline, and optionally execute TestEngine
- Restore --list, --list-metrics, --log-level, --dataset_name, --dataset_path, --sample_count, --output, --format flags
- Emit a friendly migration error when a v1 AnalyzeConfig YAML is detected (checks for dataset_source, processor_name, etc.)
- Wire analyze as a nested Typer app in cli/main.py so help panels render correctly
- Update configs/examples/analyze/analyze.yaml to the v2 schema
- Rewrite docs/user_guides/analyze/{analyze,analyze_config}.md for the v2 schema
- Exclude configs/examples/analyze/analyze.yaml from the legacy test_parse_configs sweep (uses TypedAnalyzeConfig, not AnalyzeConfig)

First PR in a 4-PR split of #2370.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
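The control flow in the first bullet (build analyzers by name, run them over the dataset, optionally run tests) can be sketched with stand-ins. Everything here is hypothetical: `ANALYZER_REGISTRY`, `AnalyzeSettings`, and `run_analyze` are simplified placeholders for `REGISTRY.get_sample_analyzer`, `TypedAnalyzeConfig`, `AnalysisPipeline`, and `TestEngine`, whose real signatures are not shown in this PR. YAML loading is omitted so the sketch stays stdlib-only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical stand-in: a sample analyzer maps one sample to a dict of metrics.
SampleAnalyzer = Callable[[str], dict[str, Any]]

# Hypothetical registry keyed by analyzer name (the PR uses REGISTRY.get_sample_analyzer).
ANALYZER_REGISTRY: dict[str, SampleAnalyzer] = {
    "length": lambda text: {"num_chars": len(text), "num_words": len(text.split())},
}

# Hypothetical test: takes all per-sample results and returns pass/fail.
ResultTest = Callable[[list[dict[str, Any]]], bool]


@dataclass
class AnalyzeSettings:
    """Hypothetical, heavily simplified view of a typed analyze config."""

    analyzers: list[str]
    tests: list[ResultTest] = field(default_factory=list)
    sample_count: Optional[int] = None


def run_analyze(settings: AnalyzeSettings, samples: list[str]) -> dict[str, Any]:
    # 1. Build analyzers by name; unknown names fail before any work is done.
    analyzers: dict[str, SampleAnalyzer] = {}
    for name in settings.analyzers:
        if name not in ANALYZER_REGISTRY:
            raise KeyError(f"Unknown analyzer: {name!r}")
        analyzers[name] = ANALYZER_REGISTRY[name]

    # 2. Optionally truncate the dataset (like a --sample_count override).
    if settings.sample_count is not None:
        samples = samples[: settings.sample_count]

    # 3. Run every analyzer over every sample.
    results = [
        {name: analyzer(sample) for name, analyzer in analyzers.items()}
        for sample in samples
    ]

    # 4. Only run tests when the config declares any.
    test_results = [test(results) for test in settings.tests] if settings.tests else None
    return {"analysis": results, "test_results": test_results}
```

Checking `sample_count is not None` rather than relying on truthiness keeps an explicit `0` meaningful.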

gitar-bot · 2026-04-17T21:20:19Z

Gitar is working


Remove _item_to_conversation's hardcoded prompt/instruction/response/context key lists and simplify load_conversations_from_dataset to a strict Oumi-format loader (Conversation.from_dict per row, warn-and-skip on failure). This matches the pre-v2 CLI contract of requiring Oumi-shaped data and stays consistent with the api backend, which uses typed schemas or explicit column names rather than field guessing.

Update the example YAML and the analyze docs to use placeholder dataset names and note that HF rows must already be in Oumi format; instruction-style datasets should be pre-converted to Oumi JSONL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
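The strict, warn-and-skip loading contract can be sketched as follows. `Message`, `Conversation`, and `load_conversations` are hypothetical minimal stand-ins, not oumi's classes; the real `Conversation.from_dict` validates more than this, and the accepted role set below is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Any, Iterable

logger = logging.getLogger(__name__)

# Assumption: roles accepted by this stand-in; oumi's own Role enum may differ.
_VALID_ROLES = {"system", "user", "assistant", "tool"}


@dataclass
class Message:
    role: str
    content: str


@dataclass
class Conversation:
    """Hypothetical minimal stand-in for an Oumi-format conversation."""

    messages: list[Message]

    @classmethod
    def from_dict(cls, row: dict[str, Any]) -> "Conversation":
        # Strict: the row must already look like {"messages": [{"role", "content"}, ...]}.
        messages = row["messages"]
        if not isinstance(messages, list) or not messages:
            raise ValueError("'messages' must be a non-empty list")
        parsed = []
        for m in messages:
            if m["role"] not in _VALID_ROLES:
                raise ValueError(f"unknown role {m['role']!r}")
            parsed.append(Message(role=m["role"], content=m["content"]))
        return cls(messages=parsed)


def load_conversations(rows: Iterable[dict[str, Any]]) -> list[Conversation]:
    """Parse each row strictly; warn about and skip rows that are not Oumi-shaped."""
    conversations = []
    for index, row in enumerate(rows):
        try:
            conversations.append(Conversation.from_dict(row))
        except (KeyError, TypeError, ValueError) as exc:
            # No field guessing: prompt/response-style rows are skipped, not coerced.
            logger.warning("Skipping row %d: not in Oumi format (%s)", index, exc)
    return conversations
```

A prompt/response row is skipped with a warning rather than converted, which is why instruction-style data has to be turned into Oumi JSONL before analysis.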

This was referenced Apr 17, 2026

refactor(analyze): split AnalyzerConfig into type/id/display_name #2376 (Draft)

fix(analyze): preserve sample indices and fix threshold truthiness #2377 (Draft)

ryan-arman wants to merge 2 commits into main from ryan-arman/analyze-cli-v2.
