Auto routing prefers larger models when they are available #334
Merged
michaelneale merged 2 commits into main on Apr 18, 2026
Conversation
When a client sends a request with model="auto" (or no model at all), the router picks from whatever models the mesh currently serves. Until now that pick was uniform random, which on public meshes that host a mix of sizes meant roughly half of first-turn auto requests landed on a small 2B/9B model even when a 31B, 35B or unnamed-but-strong model was available.

The router now partitions candidates into two tiers by name:

- small tier: names advertising a single-digit billion-parameter count at a word boundary, e.g. "Qwen3.5-2B", "Mistral-7B", "llama-3-7b-instruct".
- big-or-unknown tier: multi-digit sizes (31B, 70B, 671B), decimal sizes (3.8B, 1.5B), and names without an explicit size (MiniMax, Qwen3-Coder-Next). MoE active-params tags like "A3B" are ignored when scoring size — the total count on those names wins.

Each tier is shuffled independently and the big-or-unknown tier is tried first. Smalls are still reachable as a fallback when the big-or-unknown tier is empty, so single-host small-only meshes keep working unchanged.

Sticky-auto, retry/failover, capability filtering, and the existing media/tool filters are untouched — this only changes which name comes out of pick_model_classified on the first turn of a session.
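The tiering described above can be sketched as follows. This is a hypothetical Rust sketch: the predicate and the seeded shuffle are simplified stand-ins for the PR's `is_single_digit_b_name` and its in-place shuffle helper, not the merged code.

```rust
// Abbreviated stand-in for is_single_digit_b_name: a single digit at a
// word boundary (not after a digit, '.', or letter) followed by b/B,
// with no digit after the B.
fn is_small_name(name: &str) -> bool {
    let b = name.as_bytes();
    (0..b.len()).any(|i| {
        b[i].is_ascii_digit()
            && (i == 0
                || !(b[i - 1].is_ascii_digit()
                    || b[i - 1] == b'.'
                    || b[i - 1].is_ascii_alphabetic()))
            && matches!(b.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !b.get(i + 2).map_or(false, |c| c.is_ascii_digit())
    })
}

// Minimal seeded Fisher-Yates so the sketch needs no rand dependency
// (a guess at the PR's "lightweight in-place shuffle helper").
fn shuffle<T>(v: &mut [T], mut seed: u64) {
    for i in (1..v.len()).rev() {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (seed >> 33) as usize % (i + 1);
        v.swap(i, j);
    }
}

fn pick_auto_model(candidates: &[&str], seed: u64) -> Option<String> {
    let (mut small, mut big): (Vec<&str>, Vec<&str>) =
        candidates.iter().copied().partition(|n| is_small_name(n));
    shuffle(&mut big, seed);
    shuffle(&mut small, seed ^ 0x9e3779b9);
    // Big-or-unknown tier first; smalls only when that tier is empty.
    big.first().or_else(|| small.first()).map(|s| s.to_string())
}

fn main() {
    let mesh = ["Qwen3.5-2B", "Mistral-7B", "Qwen3.6-35B-A3B", "MiniMax"];
    for seed in 0..100 {
        let pick = pick_auto_model(&mesh, seed).unwrap();
        // With big-or-unknown names present, a small xB name never wins.
        assert!(pick == "Qwen3.6-35B-A3B" || pick == "MiniMax");
    }
    println!("ok");
}
```

Note the fallback property: on a mesh serving only small names, `big` is empty and the small tier still serves, matching the single-host behavior described above.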
Contributor
Pull request overview
This PR updates the client-side auto model routing logic so that, on first-turn model="auto" picks, the router preferentially selects “big-or-unknown” model names over explicitly single-digit “xB” small models, while still falling back to small-only meshes.

Changes:
- Partition candidate models into “big-or-unknown” vs “small” tiers based on name parsing, then shuffle each tier and try big first.
- Add a name parser helper (`is_single_digit_b_name`) and a lightweight in-place shuffle helper.
- Add tests covering the parser and the new selection bias/spread behavior.
Comment on lines +493 to +516

```rust
/// Accepts: a standalone digit 1-9 immediately followed by `b` or `B`,
/// with the digit *not* preceded by another digit or `.` (so "12B" and
/// "2.5B" don't count) and the `B` *not* followed by another digit (so
/// "BF16" isn't a match).
///
/// Names without any digit-B pattern return false — they are treated as
/// "probably strong" because small open-weight models almost always
/// advertise their size in the filename.
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        let c = bytes[i];
        if !c.is_ascii_digit() {
            continue;
        }
        // Must be a single digit run at a word boundary: previous char
        // must not be another digit, a '.', or an ASCII letter. That
        // last part rules out MoE "active-params" tags like "A3B" where
        // the 3B is a subset of a larger total count advertised
        // elsewhere in the name (e.g. "Qwen3.6-35B-A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;
```
The `is_single_digit_b_name` doc comment says the digit must not be preceded by another digit or `.`; the implementation also rejects digits preceded by an ASCII letter (to avoid matching MoE tags like `A3B`). Please update the doc comment to include the “not preceded by ASCII letter” rule so the described matching criteria match the code.
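For reference, a self-contained version of the parser with both rules spelled out. The quoted diff ends mid-function, so the tail (the digit-followed-by-`b`/`B` check and the final `false`) is reconstructed here from the doc comment; treat it as an approximation rather than a verbatim copy of the merged code. The doc comment also folds in the reviewer's requested fix (the “not preceded by an ASCII letter” rule).

```rust
/// Accepts: a standalone digit 1-9 immediately followed by `b` or `B`,
/// with the digit not preceded by another digit, a `.`, or an ASCII
/// letter (so "12B", "2.5B" and MoE tags like "A3B" don't count) and
/// the `B` not followed by another digit (so "BF16" isn't a match).
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        if !bytes[i].is_ascii_digit() {
            continue;
        }
        // Word boundary: previous char must not be a digit, '.', or an
        // ASCII letter (the letter rule rejects MoE tags like "A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;
            }
        }
        // Single digit immediately followed by b/B, with no digit after
        // the B (rules out "BF16"-style tokens).
        if matches!(bytes.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !bytes.get(i + 2).map_or(false, |c| c.is_ascii_digit())
        {
            return true;
        }
    }
    false
}

fn main() {
    assert!(is_single_digit_b_name("Mistral-7B"));
    assert!(is_single_digit_b_name("llama-3-7b-instruct"));
    assert!(!is_single_digit_b_name("12B")); // multi-digit size
    assert!(!is_single_digit_b_name("2.5B")); // decimal size
    assert!(!is_single_digit_b_name("Qwen3.6-35B-A3B")); // A3B rejected, 35B multi-digit
    assert!(!is_single_digit_b_name("BF16")); // not a size token
    assert!(!is_single_digit_b_name("MiniMax")); // no size: "probably strong"
    println!("ok");
}
```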
michaelneale added a commit that referenced this pull request on Apr 20, 2026
…eases-rebased

* origin/main: (86 commits)
  - auto: only join community mesh, never private named meshes (#354)
  - Fix: Update AMD device naming from HIP to ROCm to match upstream llama.cpp (#319)
  - fix(ui): improve accessibility and code quality in UI components (#342)
  - api/status: add routing_metrics to test StatusPayload literals (#352)
  - Add `--headless` flag to disable the web UI while keeping the management API alive (#349)
  - feature: revise node states to be simpler (#298)
  - status: surface routing outcomes and utilization in the management API (#301)
  - fix(ui): fix the unintended blue line in dark mode
  - update for fly web deployment
  - ui: richer image understanding and multi-file attachments (#336)
  - Fix client join hang on unresponsive gossip peer (#346)
  - Add reusable command bar / filter bars, ship the first model catalog filters (#306)
  - Stop floods from piling up inside llama-server (#341)
  - fix(ui-tests): add missing UI tests for React to CI and Justfile (#340)
  - fix(light-mode): the toplogy view in light mode was not readable (#333)
  - Rename PR github copilot instructions
  - Auto routing prefers larger models when they are available (#334)
  - Show GGUF variants in search results
  - Default llama-server to a mild anti-repetition penalty (#328)
  - Add Mac Catalyst slice to MeshLLMFFI XCFramework
  - ...

Conflicts: `.github/workflows/docker.yml`, `Justfile`, `RELEASE.md`, `docker/Dockerfile.cuda`
When a client sends a request with `model="auto"` (or no model), the router picks from whatever the mesh serves. Until now that pick was uniform random, so on mixed-size public meshes a first-turn auto request had an even chance of landing on a small 2B/9B model even when much stronger models were available.

This change partitions candidates into two tiers by name:

- small tier: single-digit xB names, e.g. `Qwen3.5-2B`, `Mistral-7B`, `llama-3-7b-instruct`.
- big-or-unknown tier: everything else, including names carrying MoE active-params tags like `A3B` (the total 35B in `Qwen3.6-35B-A3B` wins).

Each tier is shuffled independently and the big-or-unknown tier is tried first. Smalls are still reachable as a fallback when the big tier is empty, so single-host small-only meshes keep working unchanged.

Sticky-auto, retry/failover, capability filtering, and the existing media/tool filters are untouched — this only changes which name comes out of `pick_model_classified` on the first turn of a session.

### Why a client-side change is enough

`pick_model_classified` runs on the client's own node inside its ingress handler. Peers never see or participate in the choice. Update the client binary, restart, new bias is live. No protocol change, no peer-version compatibility concern.

### Tests

Cover the name parser (including `A3B`, empty name, and unknown names) and the new selection bias/spread behavior.
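The edge cases above can be illustrated with a minimal, hypothetical version of the tier split and fallback. The helper names are abbreviated stand-ins, not the repository's actual test code, and the shuffle step is omitted so the first pick is deterministic.

```rust
// Abbreviated stand-in for the PR's is_single_digit_b_name.
fn is_small(name: &str) -> bool {
    let b = name.as_bytes();
    (0..b.len()).any(|i| {
        b[i].is_ascii_digit()
            // word boundary: no digit, '.', or letter before (rejects "A3B")
            && (i == 0
                || !(b[i - 1].is_ascii_digit()
                    || b[i - 1] == b'.'
                    || b[i - 1].is_ascii_alphabetic()))
            // single digit followed by b/B, no digit after the B
            && matches!(b.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !b.get(i + 2).map_or(false, |c| c.is_ascii_digit())
    })
}

// Deterministic first pick: big-or-unknown tier wins, smalls are the fallback.
fn pick_first_tier<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    let (small, big): (Vec<&str>, Vec<&str>) =
        candidates.iter().copied().partition(|n| is_small(n));
    big.first().or_else(|| small.first()).copied()
}

fn main() {
    // Mixed mesh: the small 7B never wins the first pick.
    assert_eq!(pick_first_tier(&["Mistral-7B", "MiniMax"]), Some("MiniMax"));
    // Small-only mesh: the fallback keeps it serving unchanged.
    assert_eq!(pick_first_tier(&["Mistral-7B"]), Some("Mistral-7B"));
    // Edge cases from the test list: A3B tags and empty names are "big-or-unknown".
    assert!(!is_small("Qwen3.6-35B-A3B"));
    assert!(!is_small(""));
    // Empty mesh: nothing to pick.
    assert_eq!(pick_first_tier(&[]), None);
    println!("ok");
}
```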