
[DRAFT] LLM Compressor integration #2299

Draft

idoudali wants to merge 2 commits into main from idoudali/quantization

Conversation


@idoudali idoudali commented Mar 23, 2026

Description

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.


Summary by Gitar

  • Major integration change:
    • Replaced AWQ quantization backend with LLM Compressor (vLLM project), supporting FP8, GPTQ, AWQ, and W-bit schemes
    • Deprecated awq_quantizer.py; added llmcompressor_quantizer.py with oneshot API integration
  • Configuration overhaul:
    • Changed from method (string) to backend + scheme + algorithm (enums) in QuantizationConfig
    • Added calibration dataset control, ignore-layers, and algorithm selection
  • CLI updates:
    • Split --method into --backend, --scheme, --algorithm flags
    • Updated example model IDs and quantization examples
  • Tests:
    • Removed AWQ tests; added LLM Compressor and constants tests
    • Updated builder, BNB, and base quantization tests for new enum-based API

This will update automatically on new commits.

@idoudali idoudali requested a review from oelachqar March 24, 2026 08:17
method: str = "awq_q4_0"
"""Quantization method. AWQ methods (awq_q4_0, awq_q8_0) provide best quality.
Direct GGUF methods (q4_0, q8_0) for llama.cpp. Precision methods (f16, f32)."""
backend: str | None = None
Contributor Author


We would prefer not to expose the backend.

The customer cares more about the format than about which backend is being used.

Keep the naming convention as is for now and infer the backend from the format. Use a prefix like "bnb_" for the old case to avoid introducing more changes.
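The prefix-based inference suggested in this comment could be sketched as follows; the method names and backend identifiers here are assumptions drawn from the docstring in the diff (`awq_q4_0`, `q4_0`, `f16`, etc.), not oumi's actual implementation.

```python
# Illustrative sketch of inferring the backend from the method/format string,
# as suggested in the review comment. Names are hypothetical.
def infer_backend(method: str) -> str:
    """Infer the quantization backend from the method name prefix."""
    if method.startswith("bnb_"):
        # Proposed prefix for the old bitsandbytes path.
        return "bitsandbytes"
    if method.startswith("awq_"):
        # AWQ-style methods route to the new LLM Compressor backend.
        return "llmcompressor"
    # Direct GGUF (q4_0, q8_0) and precision (f16, f32) methods target llama.cpp.
    return "llamacpp"


print(infer_backend("awq_q4_0"))
print(infer_backend("bnb_4bit"))
print(infer_backend("q8_0"))
```

This keeps the user-facing `method` string intact while the backend becomes an internal detail, which is the trade-off the comment argues for.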

@idoudali idoudali force-pushed the idoudali/quantization branch from 015d0f6 to 4a09ae3 Compare March 30, 2026 14:23