Native MLX inference server for Apple Silicon #103
Draft: michaelneale wants to merge 196 commits into main from mlx-rs-native
Commits:
cf72f85 feat: native MLX inference server for Apple Silicon (michaelneale)
652e794 fix: add causal mask for prefill, fix 32B+ model generation (michaelneale)
1be6a96 fix: read eos_token_id from config.json, revert chunked prefill (michaelneale)
ee50040 perf: pre-allocated KV cache + correct dtype matching (michaelneale)
5c018d5 feat: prompt KV cache reuse between requests (michaelneale)
e30d87d fix: remove mlx server warning (i386)
1bd1df0 refine mlx model selection (i386)
3ec979a fix: gate mlx cli handling on macos (i386)
68f04ea feat: improve mlx prompt and sampling behavior (i386)
8885f5a feat: improve mlx model resolution and templating (i386)
4384b84 Improve MLX HF template compatibility tests (i386)
cdff07c feat: expand mlx hf template compatibility (i386)
8b43297 test: expand mlx qwen template coverage (i386)
747bceb feat: improve mlx qwen3 template rendering (i386)
b05c40e fix: allow missing optional mlx sidecars (i386)
5112def fix: support qwen3 attention norms in mlx (i386)
94c58ba test: add macos mlx smoke matrix (i386)
8456616 fix: default qwen3 mlx templates to no thinking (i386)
6a45380 Merge remote-tracking branch 'origin/main' into codex/mlx-rs-native-o… (i386)
552bad8 feat: expand mlx runtime support and coverage (i386)
9a2aedd feat: support gemma2 in native mlx runtime (i386)
64b6cb8 fix: harden mlx template extraction (i386)
b9a70bf feat: support glm4 in native mlx runtime (i386)
6865ba7 test: add macos gguf smoke matrix (i386)
cca9a23 feat: add deepseekv3 mlx runtime base (i386)
a4d60dd feat: support lfm2 in native mlx runtime (i386)
edaa28e ci: gate macos workflows behind repo variable (i386)
b620ce3 feat: support gpt-oss in native mlx runtime (i386)
b8129c3 fix: apply review feedback from PR comment thread (Copilot)
83578bd feat: support kimi linear in native mlx runtime (i386)
168407f fix: apply review feedback from PR comment thread (Copilot)
dac0037 Merge remote-tracking branch 'origin/mlx-rs-native' into codex/mlx-rs… (i386)
047bfdc fix: harden mlx reasoning output handling (i386)
02a5867 test: expand mlx smoke prompts (i386)
1822f94 test: widen mlx family prompt suites (i386)
0f64e8b test: expand gguf smoke prompts (i386)
82c517a ci: fold prompt suites into main test jobs (i386)
901d357 Merge origin/main into codex/mlx-rs-native-on-issue-119 (i386)
8778ed2 fix: apply review thread 4057133282 feedback (Copilot)
89d5bb9 Merge remote-tracking branch 'origin/main' into codex/mlx-rs-native-o… (i386)
545ff7b Merge remote-tracking branch 'origin/mlx-rs-native' into codex/mlx-rs… (i386)
a69110e Merge origin/main into codex/mlx-rs-native-on-issue-119 (i386)
bcd77f1 feat: require explicit mlx selection (i386)
ee4ff6c chore: update mlx experimental warning (i386)
d903edc Merge remote-tracking branch 'origin/main' into codex/mlx-rs-native-o… (i386)
1f9654d docs: refresh mlx documentation (i386)
85eee23 chore: reword mlx warning (i386)
6bb66c1 ci: reuse linux binaries for gguf smokes (i386)
10d843f ci: keep gguf prompt tests on linux (i386)
c8e08a9 ci: reuse macos binaries for mlx smokes (i386)
e10b770 ci: upload built binaries from producer jobs (i386)
b3680df Update mesh-llm/src/inference/election.rs (i386)
e20d262 ci: use static matrix job names (i386)
5b6b67d Merge remote-tracking branch 'origin/mlx-rs-native' into codex/mlx-rs… (i386)
d2cd890 fix: apply review threads 4055741484 and 4057198211 feedback (Copilot)
816582f feat: propagate runtime backend descriptors (i386)
f99a153 Merge remote-tracking branch 'origin/mlx-rs-native' into codex/mlx-rs… (i386)
555376e feat: show installed model format and type (i386)
a5c6eab feat: surface gguf and mlx across model cli (i386)
543bffd feat: show run commands for installed models (i386)
d8e81b4 style: add backend emojis to model cli (i386)
d43f03c feat: prefer shorthand run hints for installed models (i386)
0ba4636 Merge origin/main (i386)
72a59cb docs: add backend support matrix to readme (i386)
fe81461 fix: guard mlx inventory scan on non-macos (i386)
ae257c1 ci: add nightly behavior regression workflow (i386)
c2a831e Update backend matrix: Kimi Linear GGUF smoke PASS, DeepSeekV3/Kimi-K… (michaelneale)
1beab73 Merge origin/main (i386)
4464f7a Merge remote-tracking branch 'origin/codex/mlx-rs-native-on-issue-119… (i386)
7a76fbc fix: restore linux model module builds (i386)
eca6732 Merge branch 'main' into mlx-rs-native (michaelneale)
3fc8995 test: add reproducible mlx parity suite (i386)
e488a7e fix: normalize mlx gemma2 config aliases (i386)
80f877d build: pin llama.cpp to upstream-latest (i386)
3d2774f fix: harden mlx reasoning parity handling (i386)
607433d fix: stabilize mlx gemma prefill parity (i386)
da0f105 test: add unified backend validation matrix runner (i386)
74e1355 test: remove validation wrapper scripts (i386)
769fc35 test: move validation data under testdata (i386)
81cd0f0 test: enforce canonical gguf baseline promotion (i386)
3df9056 Record validation artifacts and baseline guidance (i386)
944ba57 test: unify exact validation harness (i386)
0280eda ci: run exact matrix on PR workflows (i386)
29f9b6c ci: narrow behavior workflow triggers (i386)
f7ec24a ci: gate releases on exact validation (i386)
210442a docs: add CI workflow diagrams (i386)
0dcfcdd test: add live validation progress tracking (i386)
12696f8 test: preflight model downloads before matrix runs (i386)
9cb6673 test: expose active preflight model state (i386)
58effcc test: stream preflight download logs (i386)
b7ca2db test: preserve raw download progress output (i386)
5b2c389 test: harden qwen parity matrix runs (i386)
e157a12 test: resolve just in noninteractive runs (i386)
c1fbead fix: support dense mlx qwen loader paths (i386)
29a2fb4 fix: bundle matching mlx metallib on macos (i386)
f3a8527 test: refine mlx parity validation and fix qwen template rendering (i386)
6806b25 debug: add mlx top candidate tracing (i386)
2e91c8b Fix Gemma 3 MLX parity and switch to the QAT test pair (i386)
d00baff docs: publish gemma3 parity pair (i386)
6c20d9f Use canonical meshllm Qwen2.5 parity models (i386)
bd0b094 Publish canonical Qwen3 and Gemma 2 parity pairs (i386)
5bc8bcb fix: use published Gemma 4 parity pair (i386)
1168682 docs: add same-origin parity workflow (i386)
b0ad67c Switch parity matrix models to same-origin Llama and LFM2 (i386)
13e8970 Publish GLM4 parity pair and add studio54 download script (i386)
b7c2ee3 Merge remote-tracking branch 'origin/main' into codex/mlx-rs-native-o… (i386)
915a354 fix: restore CI unit tests after merge (i386)
b298529 fix: keep mlx snapshot scan cross-platform (i386)
b15e219 Generate behavior matrix from matrix.json and run validation script (i386)
57e6aed test: add local GGUF preflight and OLMO2 local matrix helper (i386)
3bd0de7 ci: restore matrix workflows, harness, and validation baselines (i386)
a88bc5e ci: rename smoke test job suites (i386)
6f9595d ci: run model-exercising checks only in smoke test jobs (i386)
07a614c ci: add canary smoke tests job before exact matrices (i386)
24d309e ci: make matrix smoke suites optional for manual runs (i386)
6366c19 ci: rename canary smoke job to inference smoke tests (i386)
b28fd50 ci: remove legacy wording from inference smoke steps (i386)
9cb8758 ci: require >=64GB RAM for mlx smoke tests (i386)
3112e7d ci: skip mlx smoke tests on runners with <64GB RAM (i386)
b8172fb ci: make mlx smoke tests job-level skipped on low-memory runners (i386)
7934084 ci: make llama/mlx smoke suites user-invoked and remove mlx runner gate (i386)
5c53d7c ci: run inference smoke tests on linux by default (i386)
b6ecf1c ci: source smoke suite matrices from validation matrix.json (i386)
2b96266 ci: remove behavior workflow and add release workflow svg diagram (i386)
f4b1e1e release: gate publish on inference smoke tests (i386)
871874d release: keep inference checks only in inference_smoke_tests (i386)
a864353 docs: refresh release workflow svg for inference smoke gate (i386)
f67b811 ci: rename smoke matrix job and document its role (i386)
287bdae ci: run smoke matrix generation after inference smoke tests (i386)
0f04bff ci: add environment approval gates for smoke suites (i386)
a0c66b5 ci: always run smoke matrix generation after inference smokes (i386)
6135914 ci: upgrade artifact actions and drop node20 sccache action (i386)
1617a39 test: add olmo2 parity pair to validation matrix (i386)
d5bb2a2 fix: add OLMo2 parity support and matrix entries (i386)
2d46a53 release: replace action-gh-release with gh CLI publish (i386)
b1665e1 ci: trace linux build commands for clone/cmake failures (i386)
4551332 fix: avoid set -e abort when compiler cache is unavailable (i386)
24e999a docs: record mamba exact validation failure (i386)
163a570 docs: record smollm2 exact validation failure (i386)
1cbfed4 merge main into mlx-rs-native (michaelneale)
4926e00 scripts: add kimi-k2 to heavy origin downloads (i386)
a1e07f8 updating to main (michaelneale)
dd30275 docs: record deepseek exact validation failure (i386)
691e276 Merge origin/codex/mlx-rs-native-on-issue-119 into mlx-rs-native (i386)
0aa5c52 fix(ci): restore model resolve exports and download CLI wiring (i386)
6fcbb61 feat: add phi3 mlx runtime support (i386)
cd7ad22 fix(ci): restore prompt module exports in models (i386)
71f0f60 fix: preserve phi3 special-token whitespace in mlx (i386)
a7287e5 Merge michael/mlx-rs-native into mlx-rs-native (i386)
2d902af fix: remove local mlx-rs path dep and duplicate module root (michaelneale)
edfd832 fix: gate mlx election fallback behind cfg(target_os = macos) (michaelneale)
8551c88 cli: add --gguf/--mlx filters to models search (i386)
6cd3c5f search: improve mlx filtering by artifact kind (i386)
d0a1831 Merge origin/main into mlx-rs-native (i386)
5c4fe05 search: keep gguf hub filter and skip repo metadata decode errors (i386)
dfeb7d7 search: restrict --mlx results to mlx-identified repos (i386)
9cd9295 Merge main and retain MLX serve path (i386)
e3e4700 Merge branch 'origin/mlx-stem-search-refs' into mlx-rs-native (i386)
bb06b8e test(search): align GGUF ordering expectation with quant preference (i386)
8d1c65d fix(search): align gguf ranking with main split-first semantics (i386)
49f81f4 ci/release: build portable llama.cpp binaries with GGML_NATIVE=OFF (i386)
f82d789 mlx: remove unused top_logits helper (i386)
1fa95c7 Merge origin/main and keep main model resolution behavior (i386)
01196f0 cli/runtime: drop format preference flags and keep explicit model-fil… (i386)
24d8a0c inference: guard MLX server startup path on non-macOS (i386)
30d45a1 mesh/mlx: address review feedback and apply required formatting (i386)
8b6bac6 scripts: remove orphan smoke helpers and clean plugin clippy lint (i386)
1c6f636 chore: remove stale release job graph artifact (i386)
b650c31 Document MLX family bring-up workflow (i386)
2e871b2 fix(warm-caches): fix the syntax issue for env in the warm job (ndizazzo)
6403443 feature(smoke-test): add inference smoke test as synced CI flow (ndizazzo)
5eecb5f workflows: align ci/release definitions with main (i386)
c896364 ci: add optional GGUF/MLX matrix smoke jobs (i386)
962c871 Merge origin/main into mlx-rs-native (i386)
bcff8ea fmt: normalize lib.rs trailing newline (i386)
c6e5192 Merge remote-tracking branch 'origin/main' into mlx-rs-native (i386)
4839b52 ci: include llama-moe-analyze in inference smoke artifacts (i386)
2babac0 Improve MLX Mistral parity and split family loaders (i386)
4cf2554 Update validation scripts for current model flags (i386)
b4b48ea Publish Mistral parity artifacts in validation matrix (i386)
6ffb53d Accept OLMo parity artifacts in validation matrix (i386)
98a40cc fix(runtime): add backend_hint in startup spec test (i386)
6e04478 Harden MLX template coverage for shipped families (i386)
241366b Harden MLX config coverage for shipped families (i386)
c016056 Harden MLX family transforms and cache invariants (i386)
362f9d1 Fix MLX behavior regressions and finalize matrix status (i386)
a34aa57 Fix MLX validation regressions and refresh matrix coverage (i386)
db72bd7 Refresh validation baselines and artifact paths (i386)
cbd18f3 Clarify tested MLX families in README (i386)
06048df Merge origin/main into mlx-rs-native (i386)
b0cd011 Decompose MLX model loading and family assembly (i386)
d18444c Fix macOS CI test regressions (i386)
ba16004 Clarify MLX startup logs (i386)
25b92fd Merge remote-tracking branch 'origin/main' into mlx-rs-native (i386)
e43da62 Merge remote-tracking branch 'origin/main' into mlx-rs-native (i386)
c32f1a3 Merge remote-tracking branch 'origin/main' into mlx-rs-native (i386)
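The commit "feat: prompt KV cache reuse between requests" (5c018d5) names the technique but the page does not show how it is implemented. A common approach is to compare the token IDs of the new prompt with the tokens already held in the KV cache, keep the longest common prefix, and prefill only the remaining suffix. The sketch below shows just that bookkeeping in plain std Rust, under the assumption that prompts are sequences of `u32` token IDs. The function name `reusable_prefix_len` is hypothetical and nothing here uses the mlx-rs API or is taken from the PR's code.

```rust
/// Hypothetical helper (not from the PR): how many leading token positions of a
/// cached KV state can be reused when a new request arrives.
fn reusable_prefix_len(cached: &[u32], prompt: &[u32]) -> usize {
    // Length of the longest common prefix of the cached and new token sequences.
    let lcp = cached
        .iter()
        .zip(prompt.iter())
        .take_while(|(a, b)| a == b)
        .count();
    // Keep at least one prompt token out of the reused span so the forward pass
    // still produces logits for the next-token prediction.
    if prompt.is_empty() {
        0
    } else {
        lcp.min(prompt.len() - 1)
    }
}

fn main() {
    let cached: Vec<u32> = vec![1, 2, 3, 4, 5];
    let prompt: Vec<u32> = vec![1, 2, 3, 9, 10];
    let keep = reusable_prefix_len(&cached, &prompt);
    // Positions [0, keep) come from the cache (cached entries past `keep` would be
    // discarded); only prompt[keep..] needs a prefill pass.
    println!("reuse {} cached positions, prefill {:?}", keep, &prompt[keep..]);
}
```

Capping the reuse at `prompt.len() - 1` is a design choice in this sketch: when a request repeats a cached prompt exactly, the last token is still run through the model so there are fresh logits to sample from.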
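The commit "fix: add causal mask for prefill, fix 32B+ model generation" (652e794) refers to masking attention during prefill. When several prompt tokens are processed in one forward pass, each query position must be prevented from attending to later keys; with a KV cache holding `offset` earlier tokens, query row `i` sits at absolute position `offset + i`. The sketch below builds such an additive mask (0 for allowed, negative infinity for masked) as a flat row-major `Vec<f32>` in plain std Rust. It is an illustration under those assumptions only: the function name `causal_mask` and the `offset` parameter are hypothetical, and the PR's actual mask construction in MLX is not shown on this page.

```rust
/// Hypothetical helper (not from the PR): additive causal attention mask for a
/// prefill of `q_len` new tokens on top of `offset` tokens already in the KV cache.
/// Returned row-major with shape [q_len, offset + q_len].
fn causal_mask(q_len: usize, offset: usize) -> Vec<f32> {
    let k_len = offset + q_len;
    let mut mask = vec![0.0f32; q_len * k_len];
    for i in 0..q_len {
        // Query row i sits at absolute position offset + i, so it may attend to
        // keys 0..=offset + i; later keys get -inf, which softmax turns into 0.
        for j in (offset + i + 1)..k_len {
            mask[i * k_len + j] = f32::NEG_INFINITY;
        }
    }
    mask
}

fn main() {
    let (q_len, offset) = (3, 1);
    let mask = causal_mask(q_len, offset);
    // Print one row per query position.
    for row in mask.chunks(offset + q_len) {
        println!("{:?}", row);
    }
    // A single decode step (q_len = 1) needs no masking at all.
    assert!(causal_mask(1, 5).iter().all(|&x| x == 0.0));
}
```

The final check illustrates why the mask matters mainly for prefill: a single-token decode step only ever attends to earlier positions, so its mask row is all zeros.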
Diff (new file, one line added):
@@ -0,0 +1 @@
+export NPM_CONFIG_REGISTRY=https://registry.npmjs.org/