Fix LoRA scaling: divide alpha by rank (#845)#986

Open
H-A-Khan wants to merge 1 commit into Blaizzy:main from H-A-Khan:fix/lora-alpha-scaling

Conversation

@H-A-Khan H-A-Khan commented Apr 9, 2026

What

Apply the standard LoRA scaling factor alpha / rank (Hu et al. 2021) in LoRaLayer.__call__ and replace_lora_with_linear. Previously the layer multiplied the LoRA update by raw alpha, making the effective scaling rank-times too large for the documented defaults — for example r=8, alpha=16 gave an effective scaling of 16 instead of the intended 2.

This matches every other PEFT implementation, including HuggingFace peft and the original Microsoft LoRA reference.

Closes #845.

Changes

  • LoRaLayer.__init__ now stores self.rank and self.scaling = alpha / rank.
  • LoRaLayer.__call__ multiplies the update by self.scaling.
  • replace_lora_with_linear uses layer.scaling so merged weights match what the trained adapter applies during inference.
  • Two regression tests in mlx_vlm/tests/test_trainer_utils.py:
    • test_lora_layer_uses_alpha_over_rank_scaling — checks the stored attribute.
    • test_lora_layer_forward_matches_alpha_over_rank — checks the actual forward-pass output against the expected base + (alpha/rank) * (x A B) formula with a non-zero B.
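The change is easiest to see in a minimal sketch. The class and helper below are hypothetical numpy stand-ins for the real mlx-based `LoRaLayer` and `replace_lora_with_linear` (names, shapes, and init values here are illustrative, not the project's actual code):

```python
import numpy as np

class LoRaLayerSketch:
    """Numpy sketch of the fixed LoRA layer (hypothetical stand-in)."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.rank = rank
        self.scaling = alpha / rank  # the fix: alpha / rank, previously raw alpha
        self.weight = rng.normal(size=(in_dim, out_dim))
        self.lora_a = rng.normal(size=(in_dim, rank)) * 0.01
        self.lora_b = np.zeros((rank, out_dim))  # B starts at zero, per Hu et al. 2021

    def __call__(self, x):
        # base output plus the scaled low-rank update
        return x @ self.weight + self.scaling * ((x @ self.lora_a) @ self.lora_b)

def merge_sketch(layer):
    """Sketch of the merge helper: fold the scaled update into the base weight
    so the merged linear layer matches the adapter's forward pass."""
    return layer.weight + layer.scaling * (layer.lora_a @ layer.lora_b)
```

With `rank=8, alpha=16` this gives `scaling == 2.0`, and merging via `merge_sketch` produces a plain weight matrix whose output matches the adapter's forward pass exactly.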

Backwards compatibility

⚠️ This is a behavioural change. Adapters trained against the previous (broken) scaling will behave 8× weaker after this fix when r=8, alpha=16. Users who want the old effective scaling can multiply their alpha by rank — e.g. set --lora-alpha 128 to recover the old r=8, alpha=16 behaviour. The recommended action is to retrain with the documented defaults now that they actually mean what the docs say.
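The arithmetic behind the 8× figure and the workaround can be checked directly (plain Python, no project code assumed):

```python
rank, alpha = 8, 16

old_effective = alpha          # buggy scaling: raw alpha -> 16
new_effective = alpha / rank   # fixed scaling: alpha / rank -> 2.0
assert old_effective / new_effective == rank  # adapters act rank-times (8x) weaker

# Workaround to reproduce the old effective scaling under the fixed code:
compat_alpha = alpha * rank    # i.e. --lora-alpha 128
assert compat_alpha / rank == old_effective
```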

If maintainers prefer to gate this behind a flag for one release I am happy to do that — let me know.

Tests

python -m pytest mlx_vlm/tests/test_trainer_utils.py -v
test_find_all_linear_names PASSED
test_get_module_by_name PASSED
test_get_peft_model PASSED
test_lora_layer_forward_matches_alpha_over_rank PASSED
test_lora_layer_uses_alpha_over_rank_scaling PASSED
test_set_module_by_name PASSED
======================== 6 passed in 4.97s =========================

The standard LoRA formulation (Hu et al. 2021) scales the low-rank
update by `alpha / rank`. `LoRaLayer.__call__` was multiplying the
update by raw `alpha` instead, making the effective scaling
rank-times too large for the documented defaults — for example
r=8, alpha=16 gave an effective scaling of 16 instead of the
intended 2.

This affects every adapter trained on the current LoRaLayer and
matches what every other PEFT implementation does, including the
HuggingFace `peft` library and the original Microsoft LoRA repo.

Changes:
  * `LoRaLayer.__init__` now stores `self.rank` and `self.scaling
    = alpha / rank` for use by both the forward pass and the merge
    helper.
  * `LoRaLayer.__call__` multiplies the LoRA update by
    `self.scaling` instead of `self.alpha`.
  * `replace_lora_with_linear` uses `layer.scaling` so the merged
    weights match what the trained adapter applies during inference.
  * Two regression tests in `test_trainer_utils.py` verify both the
    stored attribute and the actual forward pass output.

### Backwards compatibility note

Adapters trained against the previous (broken) scaling will behave
8× weaker after this fix when r=8, alpha=16. Users who want the
old effective scaling can multiply their alpha by rank (e.g. set
alpha=128 to match the old r=8, alpha=16 behaviour). The
recommended action is to retrain with the documented defaults now
that they actually mean what the docs say.

Closes Blaizzy#845


Development

Successfully merging this pull request may close these issues.

LoRaLayer uses raw alpha instead of alpha/rank — 8x scaling error with default settings

1 participant