feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406
justinchuby wants to merge 19 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from the Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
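The single- vs multi-component dispatch described above can be sketched as follows. This is a minimal sketch of the assumed logic, not the PR's actual code: plain strings stand in for Olive's ONNXModelHandler and CompositeModelHandler classes.

```python
def pick_handler_type(component_names: list[str]) -> str:
    """Decide which Olive handler type a mobius export maps to (sketch)."""
    if not component_names:
        # A build that produced nothing is an error condition.
        raise ValueError("mobius build produced no components")
    if len(component_names) == 1:
        # Single-component models (LLMs) map to one ONNX model.
        return "ONNXModelHandler"
    # Multi-component models (VLMs, encoder-decoders) map to a composite.
    return "CompositeModelHandler"
```

In the real pass these would be handler instances pointing at the exported model directories rather than strings.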
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:

- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:

- Remove output_name and system (not valid engine fields)
- Move the target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:

- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model with correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove the explicit execution_provider from the CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to the top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (a transitive dep of mobius-genai) from the pass entry
- Move the AcceleratorSpec import to a TYPE_CHECKING block (RUFF TC001); safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove the redundant 'import onnx_ir' check; the ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename the unused _fake_pkg 'output_dir' param to '_output_dir' to suppress the lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into a single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
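The TYPE_CHECKING move and the union-syntax change work together; here is a minimal sketch, assuming the AcceleratorSpec import path (the real path in Olive may differ):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static analyzers, never at runtime, so the
    # import cannot fail or slow module load (RUFF TC001).
    from olive.hardware.accelerator import AcceleratorSpec  # assumed path

def describe_spec(spec: AcceleratorSpec | None) -> str:
    # 'X | Y' is safe at runtime here (RUFF UP007) because the
    # __future__ import keeps annotations as unevaluated strings.
    return "auto-detect" if spec is None else str(spec)
```

Without `from __future__ import annotations`, the annotation would be evaluated at runtime and the TYPE_CHECKING-only import would raise a NameError.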
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
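The tightened EP_MAP annotation can be sketched like this; a stand-in enum replaces Olive's ExecutionProvider, and the member values are assumptions:

```python
from __future__ import annotations

from enum import Enum
from typing import ClassVar

class ExecutionProvider(str, Enum):
    # Stand-in for Olive's ExecutionProvider enum; member values assumed.
    CPUExecutionProvider = "CPUExecutionProvider"
    CUDAExecutionProvider = "CUDAExecutionProvider"

class MobiusModelBuilderSketch:
    # ClassVar documents that EP_MAP is shared class state, and the
    # enum-typed keys match what the commit message describes
    # (keys are enum instances, not plain strings).
    EP_MAP: ClassVar[dict[ExecutionProvider, str]] = {
        ExecutionProvider.CPUExecutionProvider: "cpu",
        ExecutionProvider.CUDAExecutionProvider: "cuda",
    }
```

Using enum keys lets type checkers catch a stray string key at analysis time instead of at run time.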
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
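The post-save checks described above might look roughly like this; the function name and directory layout are assumptions, not the PR's actual code:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def verify_outputs(output_dir: Path, components: list[str], trust_remote_code: bool) -> None:
    """Sketch of post-save validation: warn, then fail fast on missing outputs."""
    if trust_remote_code:
        # trust_remote_code executes code shipped with the model repo.
        logger.warning("trust_remote_code=True: only use with trusted model sources")
    for name in components:
        onnx_path = output_dir / name / "model.onnx"
        if not onnx_path.exists():
            # A clear message beats a downstream FileNotFoundError in a later pass.
            raise RuntimeError(f"mobius build did not produce expected output: {onnx_path}")
```

The single-component path would call this with one element; multi-component paths iterate per component, matching the commit description.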
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
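The stub-injection trick from the first bullet can be sketched as follows (pytest fixture machinery omitted; the helper name is hypothetical):

```python
import sys
import types
from unittest import mock

def install_mobius_stub() -> None:
    # Inject a fake 'mobius' module so patch('mobius.build') has a
    # target to resolve even when mobius-ai is not installed in CI.
    if "mobius" not in sys.modules:
        stub = types.ModuleType("mobius")
        stub.build = lambda *args, **kwargs: None  # placeholder attribute
        sys.modules["mobius"] = stub

install_mobius_stub()
with mock.patch("mobius.build", return_value="patched"):
    import mobius

    assert mobius.build() == "patched"
```

mock.patch resolves its target via sys.modules, so a minimal ModuleType with the right attribute is enough; no real package needs to be importable.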
Change all references from 'mobius-genai' to 'mobius-ai':

- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
"execution_provider": PassConfigParam(
    type_=str,
we could create an enum of the supported EPs for automatic validation, like in olive/passes/pytorch/autoawq.py (line 27, 8b1957e),
unless you think the options might keep growing and it would be hard to keep them in sync across versions
…files to model_attributes

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

…ify test docstring

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Olive's RunConfig uses Pydantic with extra='forbid' on EngineConfig, which causes validation errors when unknown top-level fields like 'comment' are present. Remove them so the configs validate correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass, which works out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace the free-form string with a StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
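The resulting enum can be approximated with the standard library; the real code uses Olive's StrEnumBase and its own member names, so the names below are guesses while the values come from the commit message:

```python
from enum import Enum

class MobiusEP(str, Enum):
    # Values from the commit message; member names are assumptions.
    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"
```

Membership validation comes for free: constructing `MobiusEP("bogus")` raises ValueError, which is how enum-typed PassConfigParam fields reject bad config values automatically.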
Configs moved to microsoft/olive-recipes per repo convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support customized naming for the different components of a multi-component model? I can see all component models are named "model.onnx"
Add a 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside the ONNX models via write_ort_genai_config(). Set to 'none' to skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
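The runtime switch described above might branch roughly like this. The file contents and helper below are placeholders: the real pass delegates to mobius's write_ort_genai_config(), whose signature is not shown in this PR.

```python
import json
from pathlib import Path

def maybe_write_runtime_files(output_dir: Path, runtime: str = "ort-genai") -> bool:
    """Sketch of the 'runtime' config param branching (assumed logic)."""
    if runtime == "none":
        # 'none' skips runtime-file generation entirely.
        return False
    # Placeholder stand-in for write_ort_genai_config(); the real helper
    # also emits tokenizer files and processor configs alongside the model.
    (output_dir / "genai_config.json").write_text(json.dumps({"model": {}}))
    return True
```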
@xiaoyu-work The models are
pip install mobius-ai

See https://github.com/microsoft/mobius
"execution_provider": PassConfigParam(
    type_=MobiusModelBuilder.MobiusEP,
We should not have pass-level execution provider user choices.
The Olive engine runs a series of passes, and the user selects the EP for a given engine run.
See another relevant comment below on _run_for_config
# Resolve EP: explicit config override > accelerator spec > fallback to cpu.
ep_str: str = config.execution_provider or self.EP_MAP.get(self.accelerator_spec.execution_provider, "cpu")
The Olive engine expects a pass to raise an error if the EP provided by the accelerator spec is not supported by the pass. Alternatively, return the unmodified input model in cases where the input model type is the same as the output model type.
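Taken together with the resolution line quoted above, the reviewer-requested behavior might look like this; the supported set and names are assumptions, and the key change is raising instead of silently falling back to cpu:

```python
from __future__ import annotations

# Assumed set of EPs this pass can handle.
SUPPORTED_EPS = {"cpu", "cuda", "dml", "webgpu"}

def resolve_ep(config_ep: str | None, accelerator_ep: str) -> str:
    """Resolve the EP: explicit config override wins, then the accelerator spec."""
    ep = config_ep or accelerator_ep
    if ep not in SUPPORTED_EPS:
        # Raise so the engine can surface the unsupported-EP error,
        # rather than quietly building for cpu.
        raise ValueError(f"MobiusModelBuilder does not support execution provider {ep!r}")
    return ep
```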
Summary
Adds a new Olive pass (MobiusModelBuilder) that wraps mobius build() to produce ONNX models from HuggingFace model IDs.

- Single-component models → ONNXModelHandler; multi-component models → CompositeModelHandler
- Registered in olive_config.json as MobiusModelBuilder
- Example pipeline config: examples/gemma4/gemma4_int4_cuda.json

Validated: Gemma4 INT4 Quantization Pipeline
Successfully tested MobiusModelBuilder → OnnxBlockWiseRtnQuantization on google/gemma-4-E2B-it:

- Quantized ops per component
- Weight quantization coverage
- Output structure (2.8GB total, down from ~5GB fp16)
- Pipeline timing: MobiusModelBuilder (fp16 build), then OnnxBlockWiseRtnQuantization (int4)