feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406
justinchuby wants to merge 19 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from the Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
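The single- vs multi-component dispatch described above can be sketched as follows. This is a minimal sketch of the assumed logic, not the PR's actual code: plain strings stand in for Olive's ONNXModelHandler and CompositeModelHandler classes.

```python
def pick_handler_type(component_names: list[str]) -> str:
    """Decide which Olive handler type a mobius export maps to (sketch)."""
    if not component_names:
        # A build that produced nothing is an error condition.
        raise ValueError("mobius build produced no components")
    if len(component_names) == 1:
        # Single-component models (LLMs) map to one ONNX model.
        return "ONNXModelHandler"
    # Multi-component models (VLMs, encoder-decoders) map to a composite.
    return "CompositeModelHandler"
```

In the real pass these would be handler instances pointing at the exported model directories rather than strings.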
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:

- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:

- Remove output_name and system (not valid engine fields)
- Move the target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:

- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model with correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove the explicit execution_provider from the CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to the top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (a transitive dep of mobius-genai) from the pass entry
- Move the AcceleratorSpec import to a TYPE_CHECKING block (RUFF TC001); safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove the redundant 'import onnx_ir' check; the ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename the unused _fake_pkg 'output_dir' param to '_output_dir' to suppress the lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into a single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
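The TYPE_CHECKING move and the union-syntax change work together; here is a minimal sketch, assuming the AcceleratorSpec import path (the real path in Olive may differ):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static analyzers, never at runtime, so the
    # import cannot fail or slow module load (RUFF TC001).
    from olive.hardware.accelerator import AcceleratorSpec  # assumed path

def describe_spec(spec: AcceleratorSpec | None) -> str:
    # 'X | Y' is safe at runtime here (RUFF UP007) because the
    # __future__ import keeps annotations as unevaluated strings.
    return "auto-detect" if spec is None else str(spec)
```

Without `from __future__ import annotations`, the annotation would be evaluated at runtime and the TYPE_CHECKING-only import would raise a NameError.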
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
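The tightened EP_MAP annotation can be sketched like this; a stand-in enum replaces Olive's ExecutionProvider, and the member values are assumptions:

```python
from __future__ import annotations

from enum import Enum
from typing import ClassVar

class ExecutionProvider(str, Enum):
    # Stand-in for Olive's ExecutionProvider enum; member values assumed.
    CPUExecutionProvider = "CPUExecutionProvider"
    CUDAExecutionProvider = "CUDAExecutionProvider"

class MobiusModelBuilderSketch:
    # ClassVar documents that EP_MAP is shared class state, and the
    # enum-typed keys match what the commit message describes
    # (keys are enum instances, not plain strings).
    EP_MAP: ClassVar[dict[ExecutionProvider, str]] = {
        ExecutionProvider.CPUExecutionProvider: "cpu",
        ExecutionProvider.CUDAExecutionProvider: "cuda",
    }
```

Using enum keys lets type checkers catch a stray string key at analysis time instead of at run time.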
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
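The post-save checks described above might look roughly like this; the function name and directory layout are assumptions, not the PR's actual code:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def verify_outputs(output_dir: Path, components: list[str], trust_remote_code: bool) -> None:
    """Sketch of post-save validation: warn, then fail fast on missing outputs."""
    if trust_remote_code:
        # trust_remote_code executes code shipped with the model repo.
        logger.warning("trust_remote_code=True: only use with trusted model sources")
    for name in components:
        onnx_path = output_dir / name / "model.onnx"
        if not onnx_path.exists():
            # A clear message beats a downstream FileNotFoundError in a later pass.
            raise RuntimeError(f"mobius build did not produce expected output: {onnx_path}")
```

The single-component path would call this with one element; multi-component paths iterate per component, matching the commit description.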
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
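The stub-injection trick from the first bullet can be sketched as follows (pytest fixture machinery omitted; the helper name is hypothetical):

```python
import sys
import types
from unittest import mock

def install_mobius_stub() -> None:
    # Inject a fake 'mobius' module so patch('mobius.build') has a
    # target to resolve even when mobius-ai is not installed in CI.
    if "mobius" not in sys.modules:
        stub = types.ModuleType("mobius")
        stub.build = lambda *args, **kwargs: None  # placeholder attribute
        sys.modules["mobius"] = stub

install_mobius_stub()
with mock.patch("mobius.build", return_value="patched"):
    import mobius

    assert mobius.build() == "patched"
```

mock.patch resolves its target via sys.modules, so a minimal ModuleType with the right attribute is enough; no real package needs to be importable.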
Change all references from 'mobius-genai' to 'mobius-ai':

- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
"execution_provider": PassConfigParam(
    type_=str,
we could create an enum of the supported EPs for automatic validation, like in olive/passes/pytorch/autoawq.py (line 27, 8b1957e),
unless you think the options might keep growing and it would be hard to keep them in sync across versions
…files to model_attributes

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

…ify test docstring

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Olive's RunConfig uses Pydantic with extra='forbid' on EngineConfig, which causes validation errors when unknown top-level fields like 'comment' are present. Remove them so the configs validate correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass, which works out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace the free-form string with a StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
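The resulting enum can be approximated with the standard library; the real code uses Olive's StrEnumBase and its own member names, so the names below are guesses while the values come from the commit message:

```python
from enum import Enum

class MobiusEP(str, Enum):
    # Values from the commit message; member names are assumptions.
    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"
```

Membership validation comes for free: constructing `MobiusEP("bogus")` raises ValueError, which is how enum-typed PassConfigParam fields reject bad config values automatically.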
Configs moved to microsoft/olive-recipes per repo convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support customized naming for the different components of a multi-component model? I can see all component models are named "model.onnx"
Add a 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside the ONNX models via write_ort_genai_config(). Set to 'none' to skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
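The runtime switch described above might branch roughly like this. The file contents and helper below are placeholders: the real pass delegates to mobius's write_ort_genai_config(), whose signature is not shown in this PR.

```python
import json
from pathlib import Path

def maybe_write_runtime_files(output_dir: Path, runtime: str = "ort-genai") -> bool:
    """Sketch of the 'runtime' config param branching (assumed logic)."""
    if runtime == "none":
        # 'none' skips runtime-file generation entirely.
        return False
    # Placeholder stand-in for write_ort_genai_config(); the real helper
    # also emits tokenizer files and processor configs alongside the model.
    (output_dir / "genai_config.json").write_text(json.dumps({"model": {}}))
    return True
```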
@xiaoyu-work The models are
pip install mobius-ai

See https://github.com/microsoft/mobius
"execution_provider": PassConfigParam(
    type_=MobiusModelBuilder.MobiusEP,
We should not have pass-level execution provider user choices.
The Olive engine runs a series of passes, and the user selects the EP for a given engine run.
See another relevant comment below on _run_for_config
# Resolve EP: explicit config override > accelerator spec > fallback to cpu.
ep_str: str = config.execution_provider or self.EP_MAP.get(self.accelerator_spec.execution_provider, "cpu")
The Olive engine expects a pass to raise an error if the EP provided by the accelerator spec is not supported by the pass. Alternatively, return the unmodified input model in cases where the input model type is the same as the output model type.
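Taken together with the resolution line quoted above, the reviewer-requested behavior might look like this; the supported set and names are assumptions, and the key change is raising instead of silently falling back to cpu:

```python
from __future__ import annotations

# Assumed set of EPs this pass can handle.
SUPPORTED_EPS = {"cpu", "cuda", "dml", "webgpu"}

def resolve_ep(config_ep: str | None, accelerator_ep: str) -> str:
    """Resolve the EP: explicit config override wins, then the accelerator spec."""
    ep = config_ep or accelerator_ep
    if ep not in SUPPORTED_EPS:
        # Raise so the engine can surface the unsupported-EP error,
        # rather than quietly building for cpu.
        raise ValueError(f"MobiusModelBuilder does not support execution provider {ep!r}")
    return ep
```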
Summary
Adds a new Olive pass (MobiusModelBuilder) that wraps mobius build() to produce ONNX models from HuggingFace model IDs.

- Single-component models → ONNXModelHandler; multi-component models → CompositeModelHandler
- Registered in olive_config.json as MobiusModelBuilder
- Example pipeline config: examples/gemma4/gemma4_int4_cuda.json

Validated: Gemma4 INT4 Quantization Pipeline
Successfully tested MobiusModelBuilder → OnnxBlockWiseRtnQuantization on google/gemma-4-E2B-it:

- Quantized ops per component
- Weight quantization coverage
- Output structure (2.8GB total, down from ~5GB fp16)
- Pipeline timing: MobiusModelBuilder (fp16 build), then OnnxBlockWiseRtnQuantization (int4)