Draft
Changes from 26 commits
Commits
40 commits
8cc131b
feat: update analyze CLI to use typed analyzer framework
ryan-arman Apr 14, 2026
2377011
test: add tests for quality analyzer, turn stats analyzer, and test e…
ryan-arman Apr 14, 2026
a01156c
docs: update analyze docs and config for typed analyzer framework
ryan-arman Apr 14, 2026
cabb863
refactor: remove deduplication, llm_analyzer, and llm_criteria
ryan-arman Apr 14, 2026
d6759aa
revert: restore length.py from main, fix tests to match
ryan-arman Apr 14, 2026
9a5876b
revert: restore quality.py from main, rewrite tests to match
ryan-arman Apr 14, 2026
6f11088
revert: restore turn_stats.py from main, rewrite tests to match
ryan-arman Apr 14, 2026
f27a237
fix: properly restore quality.py from main, update tests and docs
ryan-arman Apr 14, 2026
bae5037
fix: remove stale params from example analyze config
ryan-arman Apr 14, 2026
6db1c4a
fix: restore analyzers __init__.py exports to match main
ryan-arman Apr 14, 2026
8b867c8
feat: restore CLI flags and backward compat for analyze v2
ryan-arman Apr 14, 2026
a254dce
fix: update docs key capabilities to match actual quality checks
ryan-arman Apr 15, 2026
b71dcb7
docs: use --config in CLI examples, mention -c shorthand
ryan-arman Apr 15, 2026
5894848
feat: remove custom metrics feature from analyze framework
ryan-arman Apr 15, 2026
e357230
fix: reject duplicate analyzer names in AnalysisPipeline constructor
ryan-arman Apr 15, 2026
2087823
fix: preserve original indices in test engine when None values filtered
ryan-arman Apr 15, 2026
8195671
fix: populate failure_reasons for max_percentage failures in percenta…
ryan-arman Apr 15, 2026
5f2774d
fix: walk MRO in _get_result_type for analyzer subclass support
ryan-arman Apr 15, 2026
e5bb1b3
fix: restore BatchTestEngine and core TestType in testing exports
ryan-arman Apr 15, 2026
acac138
fix: restore all_affected_indices field on TestResult
ryan-arman Apr 15, 2026
caad7d5
fix: convert TypeError to ValueError for unknown YAML config fields
ryan-arman Apr 15, 2026
7a5660b
fix: handle raw dicts from cache in dataframe conversion
ryan-arman Apr 15, 2026
c43d02e
fix: use core registry in discovery instead of CLI module
ryan-arman Apr 15, 2026
885b7ce
fix: resolve pyright errors and empty-except code quality findings
ryan-arman Apr 15, 2026
998d9fb
Merge branch 'main' into ryan-arman/analyzer-cli-update
ryan-arman Apr 15, 2026
6322d9d
refactor: consolidate analyze/cli.py into cli/analyze.py
ryan-arman Apr 15, 2026
31d8e5d
fix: unify duplicate logging imports in cli/analyze.py
ryan-arman Apr 15, 2026
46443a8
fix: exclude v2 analyze config from old config parse test
ryan-arman Apr 15, 2026
08bdf8a
revert: restore analyzer test files to match main
ryan-arman Apr 15, 2026
ba33410
fix: clean up TestEngine — remove percentage/range, restore helpers, …
ryan-arman Apr 15, 2026
8e11f55
fix: remove percentage/range references from config, yaml, and docs
ryan-arman Apr 15, 2026
e4e55ea
style: use --config instead of -c in analyze.yaml usage comment
ryan-arman Apr 15, 2026
401c816
update
ryan-arman Apr 15, 2026
1a052d2
docs: clarify --output CLI flag vs output_path config field
ryan-arman Apr 15, 2026
a3a8c42
style: remove unnecessary comments in testing __init__.py
ryan-arman Apr 15, 2026
076019a
refactor: merge TestConfigYAML into TestConfig
ryan-arman Apr 15, 2026
ef56549
refactor: replace TestConfig with TestParams from core for API consis…
ryan-arman Apr 15, 2026
357bdb4
fix: raise TypeError on unexpected types in TestEngine._get_nested_value
ryan-arman Apr 15, 2026
4486a91
style: remove AI slop and fix analyze.md docs
ryan-arman Apr 16, 2026
e5ca56c
refactor: drop heuristic field-mapping in analyze CLI dataset loader
ryan-arman Apr 17, 2026
51 changes: 33 additions & 18 deletions configs/examples/analyze/analyze.yaml
@@ -1,36 +1,51 @@
# Dataset Analysis Configuration
# Usage: oumi analyze --config configs/examples/analyze/analyze.yaml
# Usage: oumi analyze -c configs/examples/analyze/analyze.yaml
#
# Output files:
# - message_analysis.csv: Per-message metrics
# - conversation_analysis.csv: Per-conversation aggregated metrics
# - analysis_summary.json: Statistical summary (mean, std, min, max, median)
# Local file in Oumi or Alpaca format
# - analysis.csv: Per-conversation metrics
# - test_results.json: Test pass/fail details
# - summary.json: Statistical summary (mean, std, min, max, median)

# Local file in Oumi format (JSONL with {"messages": [...]})
dataset_path: data/dataset_examples/oumi_format.jsonl
# Or, use a HuggingFace dataset
# dataset_name: argilla/databricks-dolly-15k-curated-en
# split: train

# sample_count: 1000 # Limit samples (null = all)

# Tokenizer for token_count metric (use same tokenizer as your target model)
# tokenizer_name: openai-community/gpt2
# tokenizer_kwargs: {}

# For multimodal (vision-language) datasets, specify a processor
# Presence of processor_name automatically enables multimodal mode
# processor_name: llava-hf/llava-1.5-7b-hf
# processor_kwargs: {}

# trust_remote_code: false

output_path: ./analysis_output

# Analyzers to run. Each needs a unique display_name.
# Metrics are accessed via paths like "Length.total_tokens".
analyzers:
- id: length
- type: length
display_name: Length
params:
# Tokenizer name - automatically detects tiktoken vs HuggingFace
tokenizer_name: cl100k_base # tiktoken encoding (GPT-4)
# For HuggingFace tokenizers, use model ID:
# tokenizer_name: meta-llama/Llama-3.1-8B-Instruct
# trust_remote_code: false
- type: quality
display_name: Quality
- type: turn_stats
display_name: TurnStats

# Tests validate analysis results against thresholds.
# Metric paths use "{display_name}.{field_name}" format.
# tests:
# - id: max_tokens
# type: threshold
# metric: Length.total_tokens
# operator: ">"
# value: 10000
# max_percentage: 5.0
# severity: high
# display_name: "Token count exceeds 10K"
# - id: empty_turns
# type: percentage
# metric: Quality.has_empty_turns
# condition: "== True"
# max_percentage: 5.0
# severity: high
# display_name: "Conversations with empty turns"
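
The commented-out `tests` block refers to metrics by "{display_name}.{field_name}" paths and pairs a threshold comparison with `max_percentage`. As a rough illustration of how such a test could be evaluated, here is a minimal Python sketch. It is not Oumi's TestEngine: the `resolve_metric` and `run_threshold_test` helpers, the `ThresholdOutcome` type, the operator table, and the shape of `results` are all hypothetical. It also assumes that a threshold test flags every conversation matching `metric <operator> value` and fails when more than `max_percentage` percent of conversations are flagged.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical operator table; the operators Oumi actually accepts may differ.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


@dataclass
class ThresholdOutcome:
    """Result of one illustrative threshold test (not Oumi's TestResult)."""

    passed: bool
    affected_indices: list[int]
    affected_percentage: float


def resolve_metric(results: dict[str, list[dict[str, Any]]], path: str) -> list[Any]:
    """Return one value per conversation for a "DisplayName.field_name" path."""
    display_name, _, field_name = path.partition(".")
    if not field_name:
        raise ValueError(f"Expected '<display_name>.<field_name>', got {path!r}")
    if display_name not in results:
        raise KeyError(f"No analyzer named {display_name!r}")
    return [row[field_name] for row in results[display_name]]


def run_threshold_test(
    values: list[Any], op: str, value: Any, max_percentage: float
) -> ThresholdOutcome:
    """Flag values matching `v <op> value`; fail if too many are flagged."""
    compare = OPERATORS[op]
    affected = [i for i, v in enumerate(values) if compare(v, value)]
    percentage = 100.0 * len(affected) / len(values) if values else 0.0
    return ThresholdOutcome(percentage <= max_percentage, affected, percentage)


# Assumed result shape: analyzer display_name -> one dict of fields per conversation.
results = {
    "Length": [
        {"total_tokens": 120},
        {"total_tokens": 15000},
        {"total_tokens": 800},
        {"total_tokens": 50},
    ]
}

values = resolve_metric(results, "Length.total_tokens")
outcome = run_threshold_test(values, ">", 10000, max_percentage=5.0)
print(outcome)
```

With this sample data, one of four conversations exceeds 10000 tokens, so 25% are flagged and the test fails against a 5% limit. Keeping the flagged indices alongside the pass/fail flag makes it possible to inspect the offending conversations afterwards.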