8 changes: 7 additions & 1 deletion examples/llm_ptq/example_utils.py
@@ -862,7 +862,13 @@ def resolve_checkpoint_dir(quant_cfg: dict, model_path: str) -> dict:
     else:
         name = Path(name).name

-    config_hash = hashlib.sha256(json.dumps(quant_cfg, default=str).encode()).hexdigest()[:8]
+    # Hash only the algorithm dict (scalar fields, fully JSON-serializable and deterministic).
+    alg_for_hash = {
+        k: v
+        for k, v in sorted(algorithm.items())
+        if k != "layerwise_checkpoint_dir" and isinstance(v, (str, int, float, bool, type(None)))
+    }
+    config_hash = hashlib.sha256(json.dumps(alg_for_hash, sort_keys=True).encode()).hexdigest()[:8]
Comment on lines +865 to +871
⚠️ Potential issue | 🟠 Major

Avoid silent checkpoint-hash collisions from dropped algorithm fields.

The current hash input excludes all non-scalar values, so different algorithm configs can collapse to the same suffix and accidentally reuse incompatible layerwise checkpoints.
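The collision is easy to demonstrate in isolation. The sketch below copies the hashing logic from the diff above into a standalone function; the two config dicts are hypothetical examples, chosen so that they differ only in a nested (non-scalar) field:

```python
import hashlib
import json


def scalar_only_hash(algorithm: dict) -> str:
    # Same filtering as in the diff: drop layerwise_checkpoint_dir
    # and every non-scalar value before hashing.
    alg_for_hash = {
        k: v
        for k, v in sorted(algorithm.items())
        if k != "layerwise_checkpoint_dir"
        and isinstance(v, (str, int, float, bool, type(None)))
    }
    return hashlib.sha256(
        json.dumps(alg_for_hash, sort_keys=True).encode()
    ).hexdigest()[:8]


# Two hypothetical configs that differ only in a nested field...
cfg_a = {"method": "awq_lite", "block_sizes": {-1: 128}}
cfg_b = {"method": "awq_lite", "block_sizes": {-1: 64}}

# ...collapse to the same checkpoint suffix, because the nested
# dict is silently dropped from the hash input.
assert scalar_only_hash(cfg_a) == scalar_only_hash(cfg_b)
```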

💡 Proposed fix
-    # Hash only the algorithm dict (scalar fields, fully JSON-serializable and deterministic).
-    alg_for_hash = {
-        k: v
-        for k, v in sorted(algorithm.items())
-        if k != "layerwise_checkpoint_dir" and isinstance(v, (str, int, float, bool, type(None)))
-    }
-    config_hash = hashlib.sha256(json.dumps(alg_for_hash, sort_keys=True).encode()).hexdigest()[:8]
+    def _stable_jsonable(obj):
+        if isinstance(obj, dict):
+            return {k: _stable_jsonable(v) for k, v in sorted(obj.items())}
+        if isinstance(obj, (list, tuple)):
+            return [_stable_jsonable(v) for v in obj]
+        if isinstance(obj, (str, int, float, bool, type(None))):
+            return obj
+        raise TypeError(
+            f"Unsupported algorithm field type for checkpoint hash: {type(obj).__name__}"
+        )
+
+    alg_for_hash = _stable_jsonable(
+        {k: v for k, v in algorithm.items() if k != "layerwise_checkpoint_dir"}
+    )
+    config_hash = hashlib.sha256(
+        json.dumps(alg_for_hash, sort_keys=True, separators=(",", ":")).encode()
+    ).hexdigest()[:8]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 865 - 871, The hash currently
drops all non-scalar fields from algorithm, risking collisions; instead, build
alg_for_hash from algorithm (still skipping layerwise_checkpoint_dir) but for
non-JSON-scalars include a deterministic placeholder that captures their
identity — e.g., replace each non-scalar value with a small stable descriptor
like its type name plus a short hash of repr(value) (or a deterministic
serialization), then compute config_hash from json.dumps(alg_for_hash,
sort_keys=True). This preserves uniqueness for nested/complex fields while
keeping the hash deterministic and still excluding layerwise_checkpoint_dir.
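The placeholder-descriptor variant suggested in the prompt above can be sketched as follows. This is an illustration only, not the repository's code: `_describe` and `config_hash` are hypothetical names, and it assumes `repr()` is deterministic for the config value types involved (dicts, lists, numbers):

```python
import hashlib
import json


def _describe(value):
    """Return a JSON-scalar descriptor that still captures value identity."""
    if isinstance(value, (str, int, float, bool, type(None))):
        return value
    # Non-scalars become "<type>:<short hash of repr>", so nested
    # fields still influence the final hash deterministically.
    digest = hashlib.sha256(repr(value).encode()).hexdigest()[:8]
    return f"{type(value).__name__}:{digest}"


def config_hash(algorithm: dict) -> str:
    alg_for_hash = {
        k: _describe(v)
        for k, v in algorithm.items()
        if k != "layerwise_checkpoint_dir"
    }
    return hashlib.sha256(
        json.dumps(alg_for_hash, sort_keys=True).encode()
    ).hexdigest()[:8]


# Nested fields now change the hash instead of being dropped:
assert config_hash({"m": "awq", "bs": {-1: 128}}) != config_hash({"m": "awq", "bs": {-1: 64}})
```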


quant_cfg = copy.deepcopy(quant_cfg)
quant_cfg["algorithm"]["layerwise_checkpoint_dir"] = os.path.join(
34 changes: 18 additions & 16 deletions modelopt/torch/quantization/model_calib.py
@@ -1636,34 +1636,36 @@ def layerwise_calibrate(
input_getter = LayerActivationCollector(model)
input_getter._patch_all_layers(decoder_layers=transformer_layers)

-    resumed_inputs = ckpt.setup_resume(transformer_layers) if ckpt and start_layer > 0 else None
+    # When all layers are already done (start_layer == num_layers), skip input setup:
+    resumed_inputs = (
+        ckpt.setup_resume(transformer_layers) if ckpt and 0 < start_layer < num_layers else None
+    )

     try:
-        # Bootstrap: get first layer's inputs (or use resumed inputs).
-        layer_inputs = input_getter.get_first_layer_inputs(
-            start_layer, resumed_inputs, forward_loop
-        )
+        # Skip entirely when all layers are already calibrated (start_layer == num_layers).
+        if start_layer < num_layers:
+            layer_inputs = input_getter.get_first_layer_inputs(
+                start_layer, resumed_inputs, forward_loop
+            )
+        else:
+            layer_inputs = None
Comment on lines +1660 to +1673

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect detect_resume_point completion behavior:"
rg -n -C4 'def detect_resume_point|last \+ 1 >= total|return None' modelopt/torch/quantization/utils/layerwise_calib.py

echo
echo "Inspect _CheckpointState.from_folder start_layer assignment:"
rg -n -C6 'def from_folder|detect_resume_point|start =|return cls\(.*start_layer' modelopt/torch/quantization/utils/layerwise_calib.py

Repository: NVIDIA/Model-Optimizer

Length of output: 5411


🏁 Script executed:

sed -n '1640,1680p' modelopt/torch/quantization/model_calib.py | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 2211


🏁 Script executed:

# Double-check: is there any other code path that could set start_layer to num_layers?
rg -n 'start_layer.*=.*num_layers|start_layer.*=.*total' modelopt/torch/quantization/

Repository: NVIDIA/Model-Optimizer

Length of output: 340


The start_layer == num_layers fast-path is unreachable due to checkpoint helper behavior.

The code at lines 1651 and 1659 is designed to skip bootstrap when all layers are already calibrated (start_layer == num_layers). However, detect_resume_point() returns None when calibration is complete, and _CheckpointState.from_folder() then sets start_layer = 0 (line 570 of layerwise_calib.py). This means resuming from a complete checkpoint will always have start_layer == 0, causing the bootstrap to execute unnecessarily.

The checkpoint helper should either set start_layer = num_layers when complete, or the code should check for completion via a dedicated flag instead of relying on start_layer == num_layers.
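A minimal sketch of the interaction described above, with the helper logic reduced to plain functions (the real `detect_resume_point` and `_CheckpointState.from_folder` live in `layerwise_calib.py`; the simplified signatures here are assumptions for illustration):

```python
def detect_resume_point(done_layers: set, total: int):
    """Return the next layer index to calibrate, or None when all are done."""
    for i in range(total):
        if i not in done_layers:
            return i
    return None  # calibration complete


def start_layer_from_folder(done_layers: set, total: int) -> int:
    resume = detect_resume_point(done_layers, total)
    # Bug described above: a completed checkpoint maps to
    # start_layer == 0, so the start_layer == num_layers
    # fast-path in model_calib.py is never taken.
    return resume if resume is not None else 0


# A fully calibrated 3-layer checkpoint looks like a fresh run:
assert start_layer_from_folder({0, 1, 2}, 3) == 0
# A partially calibrated one resumes correctly:
assert start_layer_from_folder({0, 1}, 3) == 2
```

Returning `total` (or exposing a `completed` flag) from the helper instead of `0` would make the fast-path reachable.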

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1651 - 1664, The
current fast-path relying on start_layer == num_layers is unreachable because
_CheckpointState.from_folder() resets start_layer to 0 when
detect_resume_point() returns None; fix this by having the checkpoint helper
indicate completion instead of hiding it: update _CheckpointState.from_folder
(and detect_resume_point) to set start_layer = num_layers (or expose a completed
flag like ckpt.completed) when calibration is already complete, and then use
that value/flag here (the start_layer check in model_calib.py that decides
whether to call ckpt.setup_resume and input_getter.get_first_layer_inputs).
Reference symbols to change: detect_resume_point, _CheckpointState.from_folder,
_CheckpointState (add completed or set start_layer=num_layers),
ckpt.setup_resume, start_layer, and input_getter.get_first_layer_inputs.


     for layer_idx in range(start_layer, num_layers):
         layer = transformer_layers[layer_idx]

         def _layer_forward_loop(m, _inputs=layer_inputs):
             for args, kwargs_input in _inputs:
-                # Reset past_key_values to prevent the KV cache from
-                # accumulating across multiple forward replays (e.g.
-                # max_calibrate then Hessian collection in GPTQ).
-                # The layer doesn't need stale KV data — each replay
-                # should start with a fresh cache.
-                if (
-                    "past_key_values" in kwargs_input
-                    and kwargs_input["past_key_values"] is not None
-                ):
+                # Always clear past_key_values for each replay so layers
+                # that behave differently in decode vs prefill mode (e.g.
+                # NemotronH SSM/Mamba) always run in prefill mode where
+                # hidden_states has the full sequence length.
+                if "past_key_values" in kwargs_input:
                     kwargs_input = dict(kwargs_input)
                     cache = kwargs_input["past_key_values"]
-                    if hasattr(cache, "reset"):
+                    if cache is not None and hasattr(cache, "reset"):
                         cache.reset()
                     else:
                         kwargs_input["past_key_values"] = None
                 m(*args, **kwargs_input)

with persistent_materialization(layer):
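The cache-clearing change above can be exercised in isolation. The sketch below lifts the clearing logic into a standalone helper; `FakeCache` is a hypothetical stand-in for a transformers KV-cache object that exposes `reset()`:

```python
class FakeCache:
    """Hypothetical stand-in for a KV cache with a reset() method."""

    def __init__(self):
        self.cleared = False

    def reset(self):
        self.cleared = True


def clear_past_key_values(kwargs_input: dict) -> dict:
    """Clear any KV cache so a replayed layer always runs in prefill mode."""
    if "past_key_values" in kwargs_input:
        # Copy first so the stored replay inputs are not mutated.
        kwargs_input = dict(kwargs_input)
        cache = kwargs_input["past_key_values"]
        if cache is not None and hasattr(cache, "reset"):
            cache.reset()
        else:
            kwargs_input["past_key_values"] = None
    return kwargs_input


cache = FakeCache()
clear_past_key_values({"past_key_values": cache})
assert cache.cleared

# A None cache (the case the old `and ... is not None` guard skipped)
# is still handled without error:
out = clear_past_key_values({"past_key_values": None})
assert out["past_key_values"] is None
```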