Auto routing prefers larger models when they are available #334
Merged
michaelneale merged 2 commits into main on Apr 18, 2026
Conversation
When a client sends a request with model="auto" (or no model at all), the router picks from whatever models the mesh currently serves. Until now that pick was uniform random, which on public meshes that host a mix of sizes meant roughly half of first-turn auto requests landed on a small 2B/9B model even when a 31B, 35B or unnamed-but-strong model was available.

The router now partitions candidates into two tiers by name:

- small tier: names advertising a single-digit billion-parameter count at a word boundary, e.g. "Qwen3.5-2B", "Mistral-7B", "llama-3-7b-instruct".
- big-or-unknown tier: multi-digit sizes (31B, 70B, 671B), decimal sizes (3.8B, 1.5B), and names without an explicit size (MiniMax, Qwen3-Coder-Next). MoE active-params tags like "A3B" are ignored when scoring size — the total count on those names wins.

Each tier is shuffled independently and the big-or-unknown tier is tried first. Smalls are still reachable as a fallback when the big-or-unknown tier is empty, so single-host small-only meshes keep working unchanged.

Sticky-auto, retry/failover, capability filtering, and the existing media/tool filters are untouched — this only changes which name comes out of pick_model_classified on the first turn of a session.
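The tiering described above can be sketched as follows. This is a hypothetical Rust sketch: the predicate and the seeded shuffle are simplified stand-ins for the PR's `is_single_digit_b_name` and its in-place shuffle helper, not the merged code.

```rust
// Abbreviated stand-in for is_single_digit_b_name: a single digit at a
// word boundary (not after a digit, '.', or letter) followed by b/B,
// with no digit after the B.
fn is_small_name(name: &str) -> bool {
    let b = name.as_bytes();
    (0..b.len()).any(|i| {
        b[i].is_ascii_digit()
            && (i == 0
                || !(b[i - 1].is_ascii_digit()
                    || b[i - 1] == b'.'
                    || b[i - 1].is_ascii_alphabetic()))
            && matches!(b.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !b.get(i + 2).map_or(false, |c| c.is_ascii_digit())
    })
}

// Minimal seeded Fisher-Yates so the sketch needs no rand dependency
// (a guess at the PR's "lightweight in-place shuffle helper").
fn shuffle<T>(v: &mut [T], mut seed: u64) {
    for i in (1..v.len()).rev() {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (seed >> 33) as usize % (i + 1);
        v.swap(i, j);
    }
}

fn pick_auto_model(candidates: &[&str], seed: u64) -> Option<String> {
    let (mut small, mut big): (Vec<&str>, Vec<&str>) =
        candidates.iter().copied().partition(|n| is_small_name(n));
    shuffle(&mut big, seed);
    shuffle(&mut small, seed ^ 0x9e3779b9);
    // Big-or-unknown tier first; smalls only when that tier is empty.
    big.first().or_else(|| small.first()).map(|s| s.to_string())
}

fn main() {
    let mesh = ["Qwen3.5-2B", "Mistral-7B", "Qwen3.6-35B-A3B", "MiniMax"];
    for seed in 0..100 {
        let pick = pick_auto_model(&mesh, seed).unwrap();
        // With big-or-unknown names present, a small xB name never wins.
        assert!(pick == "Qwen3.6-35B-A3B" || pick == "MiniMax");
    }
    println!("ok");
}
```

Note the fallback property: on a mesh serving only small names, `big` is empty and the small tier still serves, matching the single-host behavior described above.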
Contributor
Pull request overview
This PR updates the client-side auto model routing logic so that, on first-turn model="auto" picks, the router preferentially selects “big-or-unknown” model names over explicitly single-digit “xB” small models, while still falling back to small-only meshes.

Changes:
- Partition candidate models into “big-or-unknown” vs “small” tiers based on name parsing, then shuffle each tier and try big first.
- Add a name parser helper (`is_single_digit_b_name`) and a lightweight in-place shuffle helper.
- Add tests covering the parser and the new selection bias/spread behavior.
Comment on lines +493 to +516

```rust
/// Accepts: a standalone digit 1-9 immediately followed by `b` or `B`,
/// with the digit *not* preceded by another digit or `.` (so "12B" and
/// "2.5B" don't count) and the `B` *not* followed by another digit (so
/// "BF16" isn't a match).
///
/// Names without any digit-B pattern return false — they are treated as
/// "probably strong" because small open-weight models almost always
/// advertise their size in the filename.
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        let c = bytes[i];
        if !c.is_ascii_digit() {
            continue;
        }
        // Must be a single digit run at a word boundary: previous char
        // must not be another digit, a '.', or an ASCII letter. That
        // last part rules out MoE "active-params" tags like "A3B" where
        // the 3B is a subset of a larger total count advertised
        // elsewhere in the name (e.g. "Qwen3.6-35B-A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;
```
The `is_single_digit_b_name` doc comment says the digit must not be preceded by another digit or `.`; the implementation also rejects digits preceded by an ASCII letter (to avoid matching MoE tags like `A3B`). Please update the doc comment to include the “not preceded by ASCII letter” rule so the described matching criteria match the code.
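For reference, a self-contained version of the parser with both rules spelled out. The quoted diff ends mid-function, so the tail (the digit-followed-by-`b`/`B` check and the final `false`) is reconstructed here from the doc comment; treat it as an approximation rather than a verbatim copy of the merged code. The doc comment also folds in the reviewer's requested fix (the “not preceded by an ASCII letter” rule).

```rust
/// Accepts: a standalone digit 1-9 immediately followed by `b` or `B`,
/// with the digit not preceded by another digit, a `.`, or an ASCII
/// letter (so "12B", "2.5B" and MoE tags like "A3B" don't count) and
/// the `B` not followed by another digit (so "BF16" isn't a match).
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        if !bytes[i].is_ascii_digit() {
            continue;
        }
        // Word boundary: previous char must not be a digit, '.', or an
        // ASCII letter (the letter rule rejects MoE tags like "A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;
            }
        }
        // Single digit immediately followed by b/B, with no digit after
        // the B (rules out "BF16"-style tokens).
        if matches!(bytes.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !bytes.get(i + 2).map_or(false, |c| c.is_ascii_digit())
        {
            return true;
        }
    }
    false
}

fn main() {
    assert!(is_single_digit_b_name("Mistral-7B"));
    assert!(is_single_digit_b_name("llama-3-7b-instruct"));
    assert!(!is_single_digit_b_name("12B")); // multi-digit size
    assert!(!is_single_digit_b_name("2.5B")); // decimal size
    assert!(!is_single_digit_b_name("Qwen3.6-35B-A3B")); // A3B rejected, 35B multi-digit
    assert!(!is_single_digit_b_name("BF16")); // not a size token
    assert!(!is_single_digit_b_name("MiniMax")); // no size: "probably strong"
    println!("ok");
}
```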
michaelneale added a commit that referenced this pull request on Apr 20, 2026
…eases-rebased

* origin/main: (86 commits)
  - auto: only join community mesh, never private named meshes (#354)
  - Fix: Update AMD device naming from HIP to ROCm to match upstream llama.cpp (#319)
  - fix(ui): improve accessibility and code quality in UI components (#342)
  - api/status: add routing_metrics to test StatusPayload literals (#352)
  - Add `--headless` flag to disable the web UI while keeping the management API alive (#349)
  - feature: revise node states to be simpler (#298)
  - status: surface routing outcomes and utilization in the management API (#301)
  - fix(ui): fix the unintended blue line in dark mode
  - update for fly web deployment
  - ui: richer image understanding and multi-file attachments (#336)
  - Fix client join hang on unresponsive gossip peer (#346)
  - Add reusable command bar / filter bars, ship the first model catalog filters (#306)
  - Stop floods from piling up inside llama-server (#341)
  - fix(ui-tests): add missing UI tests for React to CI and Justfile (#340)
  - fix(light-mode): the toplogy view in light mode was not readable (#333)
  - Rename PR github copilot instructions
  - Auto routing prefers larger models when they are available (#334)
  - Show GGUF variants in search results
  - Default llama-server to a mild anti-repetition penalty (#328)
  - Add Mac Catalyst slice to MeshLLMFFI XCFramework
  - ...

Conflicts: `.github/workflows/docker.yml`, `Justfile`, `RELEASE.md`, `docker/Dockerfile.cuda`
When a client sends a request with `model="auto"` (or no model), the router picks from whatever the mesh serves. Until now that pick was uniform random, so on mixed-size public meshes a first-turn auto request had an even chance of landing on a small 2B/9B model even when much stronger models were available.

This change partitions candidates into two tiers by name:

- small tier: single-digit xB names, e.g. `Qwen3.5-2B`, `Mistral-7B`, `llama-3-7b-instruct`.
- big-or-unknown tier: everything else, including names carrying MoE active-params tags like `A3B` (the total 35B in `Qwen3.6-35B-A3B` wins).

Each tier is shuffled independently and the big-or-unknown tier is tried first. Smalls are still reachable as a fallback when the big tier is empty, so single-host small-only meshes keep working unchanged.

Sticky-auto, retry/failover, capability filtering, and the existing media/tool filters are untouched — this only changes which name comes out of `pick_model_classified` on the first turn of a session.

### Why a client-side change is enough

`pick_model_classified` runs on the client's own node inside its ingress handler. Peers never see or participate in the choice. Update the client binary, restart, new bias is live. No protocol change, no peer-version compatibility concern.

### Tests

Cover the name parser (including `A3B`, empty name, and unknown names) and the new selection bias/spread behavior.
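The edge cases above can be illustrated with a minimal, hypothetical version of the tier split and fallback. The helper names are abbreviated stand-ins, not the repository's actual test code, and the shuffle step is omitted so the first pick is deterministic.

```rust
// Abbreviated stand-in for the PR's is_single_digit_b_name.
fn is_small(name: &str) -> bool {
    let b = name.as_bytes();
    (0..b.len()).any(|i| {
        b[i].is_ascii_digit()
            // word boundary: no digit, '.', or letter before (rejects "A3B")
            && (i == 0
                || !(b[i - 1].is_ascii_digit()
                    || b[i - 1] == b'.'
                    || b[i - 1].is_ascii_alphabetic()))
            // single digit followed by b/B, no digit after the B
            && matches!(b.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !b.get(i + 2).map_or(false, |c| c.is_ascii_digit())
    })
}

// Deterministic first pick: big-or-unknown tier wins, smalls are the fallback.
fn pick_first_tier<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    let (small, big): (Vec<&str>, Vec<&str>) =
        candidates.iter().copied().partition(|n| is_small(n));
    big.first().or_else(|| small.first()).copied()
}

fn main() {
    // Mixed mesh: the small 7B never wins the first pick.
    assert_eq!(pick_first_tier(&["Mistral-7B", "MiniMax"]), Some("MiniMax"));
    // Small-only mesh: the fallback keeps it serving unchanged.
    assert_eq!(pick_first_tier(&["Mistral-7B"]), Some("Mistral-7B"));
    // Edge cases from the test list: A3B tags and empty names are "big-or-unknown".
    assert!(!is_small("Qwen3.6-35B-A3B"));
    assert!(!is_small(""));
    // Empty mesh: nothing to pick.
    assert_eq!(pick_first_tier(&[]), None);
    println!("ok");
}
```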