
Auto routing prefers larger models when they are available #334

Merged
michaelneale merged 2 commits into main from micn/auto-prefer-bigger
Apr 18, 2026
Conversation

@michaelneale
Collaborator

When a client sends a request with model="auto" (or no model), the router picks from whatever the mesh serves. Until now that pick was uniform random, so on mixed-size public meshes a first-turn auto request had an even chance of landing on a small 2B/9B model even when much stronger models were available.

This change partitions candidates into two tiers by name:

  • small tier: names advertising a single-digit billion-parameter count at a word boundary — Qwen3.5-2B, Mistral-7B, llama-3-7b-instruct.
  • big-or-unknown tier: everything else — multi-digit sizes (31B, 70B, 671B), decimal sizes (3.8B, 1.5B), names without an explicit size (MiniMax-M2.5, Qwen3-Coder-Next), and MoE active-params tags like A3B (the total 35B in Qwen3.6-35B-A3B wins).

Each tier is shuffled independently and the big-or-unknown tier is tried first. Smalls are still reachable as a fallback when the big tier is empty, so single-host small-only meshes keep working unchanged.

Sticky-auto, retry/failover, capability filtering, and the existing media/tool filters are untouched — this only changes which name comes out of pick_model_classified on the first turn of a session.
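In sketch form, the tiered pick looks roughly like the following. This is an illustrative reconstruction, not the repo's code: `is_small` is a compact stand-in for `is_single_digit_b_name`, and the toy xorshift-driven Fisher-Yates stands in for the PR's lightweight shuffle helper.

```rust
// Stand-in classifier: a single digit 1-9 at a word boundary,
// immediately followed by 'b'/'B', with the 'B' not followed by
// another digit. Mirrors the rules described for the real parser.
fn is_small(name: &str) -> bool {
    let b = name.as_bytes();
    (0..b.len()).any(|i| {
        (b'1'..=b'9').contains(&b[i])
            && (i == 0
                || !(b[i - 1].is_ascii_digit()
                    || b[i - 1] == b'.'
                    || b[i - 1].is_ascii_alphabetic()))
            && matches!(b.get(i + 1), Some(&b'b') | Some(&b'B'))
            && !b.get(i + 2).is_some_and(|c| c.is_ascii_digit())
    })
}

// Toy xorshift64 RNG; state must be nonzero.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

// Fisher-Yates shuffle driven by the toy RNG above.
fn shuffle(items: &mut [&str], state: &mut u64) {
    for i in (1..items.len()).rev() {
        let j = (xorshift(state) % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Partition candidates by name, shuffle each tier independently, and
/// try the big-or-unknown tier first; smalls remain a pure fallback.
fn pick_auto<'a>(candidates: &[&'a str], seed: u64) -> Option<&'a str> {
    let (mut small, mut big): (Vec<&str>, Vec<&str>) =
        candidates.iter().copied().partition(|n| is_small(n));
    let mut state = seed.max(1);
    shuffle(&mut big, &mut state);
    shuffle(&mut small, &mut state);
    big.first().or(small.first()).copied()
}
```

With a mixed candidate set, a small name can only win when the big-or-unknown tier is empty, regardless of seed.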

Why a client-side change is enough

pick_model_classified runs on the client's own node inside its ingress handler. Peers never see or participate in the choice. Update the client binary, restart, new bias is live. No protocol change, no peer-version compatibility concern.

Tests

  • Parser covers: single/multi-digit B, decimal sizes, BF16/FP16 substrings, MoE active-params A3B, empty name, unknown names.
  • Picker tests: 200-iteration assertion that smalls never win when bigs exist; fallback works on small-only meshes; nanosecond-seeded spread test verifies we're hitting multiple bigs across 500 picks.
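The parser cases listed above can be exercised against a sketch like the following; this is a hedged reconstruction of `is_single_digit_b_name` from its doc comment, not the repo's exact code, and the model names in the assertions are illustrative.

```rust
/// Reconstruction from the doc comment: accept a standalone digit 1-9
/// immediately followed by `b`/`B`, where the digit is not preceded by
/// another digit, a '.', or a letter, and the `B` is not followed by
/// another digit.
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        // Only digits 1-9 can start a match.
        if !(b'1'..=b'9').contains(&bytes[i]) {
            continue;
        }
        // Word boundary: previous byte must not be a digit ("12B"),
        // a '.' ("2.5B"), or a letter (MoE tags like "A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;
            }
        }
        // The digit must be immediately followed by 'b' or 'B' ...
        if !matches!(bytes.get(i + 1), Some(&b'b') | Some(&b'B')) {
            continue;
        }
        // ... and that 'B' must not be followed by another digit.
        if bytes.get(i + 2).is_some_and(|c| c.is_ascii_digit()) {
            continue;
        }
        return true;
    }
    false
}
```

Names without any digit-B pattern fall through to `false`, which places them in the big-or-unknown tier.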

Copilot AI review requested due to automatic review settings April 18, 2026 04:55
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the client-side auto model routing logic so that, on first-turn model="auto" picks, the router preferentially selects “big-or-unknown” model names over explicitly single-digit “xB” small models, while still falling back to small-only meshes.

Changes:

  • Partition candidate models into “big-or-unknown” vs “small” tiers based on name parsing, then shuffle each tier and try big first.
  • Add a name parser helper (is_single_digit_b_name) and a lightweight in-place shuffle helper.
  • Add tests covering the parser and the new selection bias/spread behavior.

Comment thread mesh-llm/src/network/router.rs
Comment on lines +493 to +516
/// Accepts: a standalone digit 1-9 immediately followed by `b` or `B`,
/// with the digit *not* preceded by another digit or `.` (so "12B" and
/// "2.5B" don't count) and the `B` *not* followed by another digit (so
/// "BF16" isn't a match).
///
/// Names without any digit-B pattern return false — they are treated as
/// "probably strong" because small open-weight models almost always
/// advertise their size in the filename.
fn is_single_digit_b_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    for i in 0..bytes.len() {
        let c = bytes[i];
        if !c.is_ascii_digit() {
            continue;
        }
        // Must be a single digit run at a word boundary: previous char
        // must not be another digit, a '.', or an ASCII letter. That
        // last part rules out MoE "active-params" tags like "A3B" where
        // the 3B is a subset of a larger total count advertised
        // elsewhere in the name (e.g. "Qwen3.6-35B-A3B").
        if i > 0 {
            let prev = bytes[i - 1];
            if prev.is_ascii_digit() || prev == b'.' || prev.is_ascii_alphabetic() {
                continue;

Copilot AI Apr 18, 2026


The is_single_digit_b_name doc comment says the digit must not be preceded by another digit or .; the implementation also rejects digits preceded by an ASCII letter (to avoid matching MoE tags like A3B). Please update the doc comment to include the “not preceded by ASCII letter” rule so the described matching criteria matches the code.

Collaborator

@ndizazzo left a comment


LGTM

@michaelneale marked this pull request as draft April 18, 2026 05:05
@michaelneale marked this pull request as ready for review April 18, 2026 05:10
@michaelneale merged commit 9e2516c into main Apr 18, 2026
31 checks passed
@michaelneale deleted the micn/auto-prefer-bigger branch April 18, 2026 07:02
michaelneale added a commit that referenced this pull request Apr 19, 2026
* main:
  fix(light-mode): the topology view in light mode was not readable (#333)
  Rename PR github copilot instructions
  Auto routing prefers larger models when they are available (#334)
michaelneale added a commit that referenced this pull request Apr 20, 2026
…eases-rebased

* origin/main: (86 commits)
  auto: only join community mesh, never private named meshes (#354)
  Fix: Update AMD device naming from HIP to ROCm to match upstream llama.cpp (#319)
  fix(ui): improve accessibility and code quality in UI components (#342)
  api/status: add routing_metrics to test StatusPayload literals (#352)
  Add `--headless` flag to disable the web UI while keeping the management API alive (#349)
  feature: revise node states to be simpler (#298)
  status: surface routing outcomes and utilization in the management API (#301)
  fix(ui): fix the unintended blue line in dark mode
  update for fly web deployment
  ui: richer image understanding and multi-file attachments (#336)
  Fix client join hang on unresponsive gossip peer (#346)
  Add reusable command bar / filter bars, ship the first model catalog filters (#306)
  Stop floods from piling up inside llama-server (#341)
  fix(ui-tests): add missing UI tests for React to CI and Justfile (#340)
  fix(light-mode): the topology view in light mode was not readable (#333)
  Rename PR github copilot instructions
  Auto routing prefers larger models when they are available (#334)
  Show GGUF variants in search results
  Default llama-server to a mild anti-repetition penalty (#328)
  Add Mac Catalyst slice to MeshLLMFFI XCFramework
  ...

# Conflicts:
#	.github/workflows/docker.yml
#	Justfile
#	RELEASE.md
#	docker/Dockerfile.cuda