Add local MLX backend support and benchmark tooling #79
Closed
# Conflicts:
#   mesh-llm/src/main.rs
#   mesh-llm/src/proxy.rs
Collaborator
super excited for this one!
Collaborator
I think we can retire this one in favour of a native one: #103
What this adds for users
This PR adds local `mlx` backend support alongside the existing llama.cpp path.

Backend behavior

- The `mlx` backend is selected automatically where it applies; other models continue to use the `llama` backend.
- `mesh-llm` can launch `mlx_lm.server` either from `PATH`, via `python -m mlx_lm.server`, or through `--mlx-server-bin` / `MESH_LLM_MLX_SERVER_BIN`.
- The proxy rewrites the request's `model` field to the actual model directory, so the mesh-facing alias stays stable while the runtime still accepts the request.
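The launch-resolution order and the alias rewrite described above can be sketched as follows. This is a minimal illustration, not the actual Rust implementation; `resolve_mlx_server`, `rewrite_model_field`, and the alias lookup table are hypothetical names.

```python
import os
import shutil


def resolve_mlx_server(cli_override=None):
    """Resolution order from the PR text: an explicit --mlx-server-bin /
    MESH_LLM_MLX_SERVER_BIN override wins, then an `mlx_lm.server` entry
    point on PATH, then `python -m mlx_lm.server`."""
    override = cli_override or os.environ.get("MESH_LLM_MLX_SERVER_BIN")
    if override:
        return [override]
    if shutil.which("mlx_lm.server"):
        return ["mlx_lm.server"]
    return ["python", "-m", "mlx_lm.server"]


def rewrite_model_field(request, alias_to_dir):
    """Sketch of the alias rewrite: swap the mesh-facing `model` alias for
    the actual model directory before the request reaches the runtime.
    `alias_to_dir` is a hypothetical lookup table."""
    rewritten = dict(request)
    rewritten["model"] = alias_to_dir.get(request["model"], request["model"])
    return rewritten
```

An explicit override always short-circuits the search, which keeps operator configuration deterministic even when several Python environments are on `PATH`.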
Docs and operator workflow

- `evals/backend-benchmark.py` was expanded into a generic local backend benchmark, so the same harness can compare matched exports across different backends.
- `evals/README.md` now documents both the simple llama/MLX workflow and the JSON-spec workflow for arbitrary backend combinations.
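The JSON-spec workflow mentioned above might look something like this. The field names and paths are invented for illustration; the harness's actual schema is documented in `evals/README.md`.

```python
import json

# Hypothetical backend spec: each entry pairs a backend name with a local
# export and the port its server listens on. Field names are assumptions,
# not the harness's documented schema.
SPEC = """
[
  {"backend": "mlx",   "model": "exports/qwen2.5-0.5b-instruct-4bit",        "port": 8081},
  {"backend": "llama", "model": "exports/qwen2.5-0.5b-instruct-q4_k_m.gguf", "port": 8082}
]
"""

backends = json.loads(SPEC)
for entry in backends:
    print(f"{entry['backend']}: {entry['model']} on port {entry['port']}")
```

Keeping the spec as data rather than flags is what lets one harness drive arbitrary backend combinations.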
Benchmarking

- Quantization labels are normalized so that `Q4_K_M` and `4bit` collapse to the same comparison key.
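One way to implement that normalization is a small alias table. Only the `Q4_K_M`/`4bit` pairing is stated in the PR; the other entries and the function name are illustrative.

```python
# Map backend-specific quantization labels onto one comparison key so
# matched exports line up in the results table. Only the Q4_K_M <-> 4bit
# pairing is taken from the PR; the other entries are assumptions.
QUANT_ALIASES = {
    "q4_k_m": "4bit",
    "4bit": "4bit",
    "f16": "f16",
    "bf16": "bf16",
}


def comparison_key(label: str) -> str:
    """Lower-case the label and collapse known aliases to one key."""
    label = label.lower()
    return QUANT_ALIASES.get(label, label)


print(comparison_key("Q4_K_M"))  # -> 4bit
```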
Current benchmark results

Paired-export benchmark using `Qwen2.5-0.5B-Instruct`:

- `mlx`: local `4bit` MLX export
- `llama`: local `Q4_K_M` GGUF export
- concurrency levels: 1, 4, 8

Key read from that run:
Validation
- `just build`
- `cargo test -p mesh-llm`
- `cargo check -p mesh-llm`
- `python3 -m py_compile evals/backend-benchmark.py`
- Manually exercised with `Qwen2.5-0.5B-Instruct` in `f16`/`bf16` and `Q4_K_M`/`4bit` forms