Fix LoRA scaling: divide alpha by rank (#845) #986

Open

H-A-Khan wants to merge 1 commit into Blaizzy:main from
Conversation
The standard LoRA formulation (Hu et al. 2021) scales the low-rank
update by `alpha / rank`. `LoRaLayer.__call__` was multiplying the
update by raw `alpha` instead, making the effective scaling
rank-times too large for the documented defaults — for example
r=8, alpha=16 gave an effective scaling of 16 instead of the
intended 2.
The bug affects every adapter trained with the current LoRaLayer. The
corrected scaling matches what other PEFT implementations do, including
the HuggingFace `peft` library and the original Microsoft LoRA repo.
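For concreteness, here is a minimal NumPy sketch of the difference. The names and shapes are illustrative only, not the repo's actual `LoRaLayer`, and `B` is deliberately non-zero so the update is visible:

```python
import numpy as np

# Standard LoRA (Hu et al. 2021): y = x W + (alpha / rank) * (x A B).
rank, alpha = 8, 16
d_in, d_out = 32, 32

rng = np.random.default_rng(0)
x = rng.normal(size=(1, d_in))
W = rng.normal(size=(d_in, d_out))          # frozen base weight
A = rng.normal(size=(d_in, rank)) * 0.01    # LoRA down-projection
B = rng.normal(size=(rank, d_out)) * 0.01   # LoRA up-projection (non-zero here)

scaling = alpha / rank        # intended: 16 / 8 = 2
buggy_scaling = alpha         # previous behaviour: 16

y_fixed = x @ W + scaling * (x @ A @ B)
y_buggy = x @ W + buggy_scaling * (x @ A @ B)

# The buggy update is rank times (here 8x) larger than intended.
np.testing.assert_allclose(y_buggy - x @ W, rank * (y_fixed - x @ W))
```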
Changes:
* `LoRaLayer.__init__` now stores `self.rank` and `self.scaling
= alpha / rank` for use by both the forward pass and the merge
helper.
* `LoRaLayer.__call__` multiplies the LoRA update by
`self.scaling` instead of `self.alpha`.
* `replace_lora_with_linear` uses `layer.scaling` so the merged
weights match what the trained adapter applies during inference
(see the merge sketch after this list).
* Two regression tests in `test_trainer_utils.py` verify both the
stored attribute and the actual forward pass output.
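A rough sketch of the merge step, assuming the forward pass `y = x W + (alpha / rank) * (x A B)`. `merge_lora_weight` is a hypothetical stand-in; the real `replace_lora_with_linear` works on the actual layer objects and may differ in detail:

```python
import numpy as np

def merge_lora_weight(W, A, B, alpha, rank):
    """Fold the scaled low-rank update into the base weight.

    If the adapter computes y = x @ W + (alpha / rank) * (x @ A @ B),
    the merged weight W + (alpha / rank) * (A @ B) lets a plain linear
    layer reproduce the same output.
    """
    return W + (alpha / rank) * (A @ B)

rank, alpha = 8, 16
rng = np.random.default_rng(1)
W = rng.normal(size=(32, 32))
A = rng.normal(size=(32, rank)) * 0.01
B = rng.normal(size=(rank, 32)) * 0.01
x = rng.normal(size=(4, 32))

W_merged = merge_lora_weight(W, A, B, alpha, rank)
adapter_out = x @ W + (alpha / rank) * (x @ A @ B)
np.testing.assert_allclose(x @ W_merged, adapter_out, rtol=1e-6)
```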
### Backwards compatibility note
Adapters trained against the previous (broken) scaling will behave
8× weaker after this fix when r=8, alpha=16. Users who want the
old effective scaling can multiply their alpha by rank (e.g. set
alpha=128 to match the old r=8, alpha=16 behaviour). The
recommended action is to retrain with the documented defaults now
that they actually mean what the docs say.
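To spell out the arithmetic of that workaround (illustration only, not the recommended path):

```python
rank, old_alpha = 8, 16

# The old (buggy) effective scaling was raw alpha:
old_effective = old_alpha                   # 16

# Under the corrected alpha / rank formula, matching it requires
# alpha = old_alpha * rank:
new_alpha = old_alpha * rank                # 128
assert new_alpha / rank == old_effective    # 128 / 8 == 16
```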
Closes Blaizzy#845
### What

Apply the standard LoRA scaling factor `alpha / rank` (Hu et al. 2021) in `LoRaLayer.__call__` and `replace_lora_with_linear`. Previously the layer multiplied the LoRA update by raw `alpha`, making the effective scaling rank-times too large for the documented defaults; for example `r=8, alpha=16` gave an effective scaling of 16 instead of the intended 2. This matches every other PEFT implementation, including HuggingFace `peft` and the original Microsoft LoRA reference.

Closes #845.
### Changes

* `LoRaLayer.__init__` now stores `self.rank` and `self.scaling = alpha / rank`.
* `LoRaLayer.__call__` multiplies the update by `self.scaling`.
* `replace_lora_with_linear` uses `layer.scaling` so merged weights match what the trained adapter applies during inference.
* `mlx_vlm/tests/test_trainer_utils.py`:
  * `test_lora_layer_uses_alpha_over_rank_scaling`: checks the stored attribute.
  * `test_lora_layer_forward_matches_alpha_over_rank`: checks the actual forward-pass output against the expected `base + (alpha/rank) * (x A B)` formula with a non-zero `B` (a sketch of such a check follows below).
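A NumPy stand-in for that forward-pass check, for illustration only; the real tests exercise the actual `LoRaLayer`:

```python
import numpy as np

class LoRALayerStandIn:
    """NumPy stand-in for the fixed layer; not the repo's LoRaLayer."""

    def __init__(self, base_w, A, B, alpha, rank):
        self.base_w, self.A, self.B = base_w, A, B
        self.rank = rank
        self.scaling = alpha / rank              # the fix: alpha / rank, not raw alpha

    def __call__(self, x):
        return x @ self.base_w + self.scaling * (x @ self.A @ self.B)

def test_forward_matches_alpha_over_rank_sketch():
    rank, alpha = 8, 16
    rng = np.random.default_rng(2)
    x = rng.normal(size=(2, 16))
    base_w = rng.normal(size=(16, 16))
    A = rng.normal(size=(16, rank)) * 0.02
    B = rng.normal(size=(rank, 16)) * 0.02       # non-zero B so the update shows up

    layer = LoRALayerStandIn(base_w, A, B, alpha, rank)
    expected = x @ base_w + (alpha / rank) * (x @ A @ B)
    np.testing.assert_allclose(layer(x), expected, rtol=1e-6)

test_forward_matches_alpha_over_rank_sketch()
```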
### Backwards compatibility

Adapters trained against the previous (broken) scaling will behave 8× weaker after this fix when `r=8, alpha=16`. Users who want the old effective scaling can multiply their alpha by rank, e.g. set `--lora-alpha 128` to recover the old `r=8, alpha=16` behaviour. The recommended action is to retrain with the documented defaults now that they actually mean what the docs say. If maintainers prefer to gate this behind a flag for one release I am happy to do that; let me know.
### Tests