[DRAFT DO NOT REVIEW] adds claude integration test and plugin #1511
Status: Open. jomitchellnv wants to merge 1 commit into `main` from `jm/claudify-bionemo-recipes`.
**`.github/workflows/integration-tests-claude.yml`** (new file, 29 lines)

```yaml
name: Claude Integration Tests
on:
  schedule:
    - cron: "0 6 * * 1" # Weekly Monday 6am UTC
  workflow_dispatch:
  push:
    paths:
      - "bionemo-recipes/claude-plugin/**"
      - "bionemo-recipes/integration-tests/**"

jobs:
  test:
    runs-on: linux-amd64-gpu-l4-latest-1
    container:
      image: nvcr.io/nvidia/pytorch:25.06-py3
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Claude Code CLI
        run: npm install -g @anthropic-ai/claude-code

      - name: Install test dependencies
        run: pip install pytest pytest-timeout

      - name: Run integration tests
        run: cd bionemo-recipes/integration-tests && pytest -v --timeout=600
        timeout-minutes: 30
```
**Plugin manifest** (new file, 9 lines; the filename is not shown in this view)

```json
{
  "name": "bionemo-recipes",
  "version": "0.1.0",
  "description": "Convert HuggingFace models to TransformerEngine, add FP8 support, set up distributed training — using NVIDIA BioNeMo Recipes as reference.",
  "author": { "name": "NVIDIA BioNeMo Team" },
  "repository": "https://github.com/NVIDIA/bionemo-framework",
  "license": "Apache-2.0",
  "keywords": ["transformerengine", "fp8", "fsdp", "distributed-training", "nvidia"]
}
```
**Plugin README** (new file, 73 lines; the filename is not shown in this view)

````markdown
# BioNeMo Recipes Claude Plugin

A Claude Code plugin for converting HuggingFace models to NVIDIA TransformerEngine,
adding FP8/FP4 quantization support, writing golden value tests, and setting up
FSDP distributed training. All skills use real BioNeMo Recipes as reference implementations.

## Installation

```bash
claude --add-dir /path/to/bionemo-recipes/claude-plugin
```

## Available Skills

| Skill                  | Description                                                                                                                |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `/te-convert-model`    | Convert a HuggingFace `PreTrainedModel` to use TransformerEngine layers with bidirectional weight conversion (HF \<-> TE). |
| `/add-fp8-support`     | Add FP8 or FP4 quantized training support to an existing TransformerEngine model.                                          |
| `/write-golden-tests`  | Generate golden value tests that verify a TE model produces identical outputs to the original HF reference model.          |
| `/setup-fsdp-training` | Scaffold a complete FSDP training recipe with Hydra configs, distributed launcher, and Docker environment.                 |
| `/export-to-hf-hub`    | Create an export script that bundles model weights, tokenizer, and config for publishing to the Hugging Face Hub.          |

## Usage Examples

### Convert a HuggingFace model to TransformerEngine

```
/te-convert-model facebook/esm2_t33_650M_UR50D
```

Generates a TE-backed `PreTrainedModel` class with `convert_hf_to_te()` and
`convert_te_to_hf()` functions, following the pattern in `bionemo-recipes/models/`.

### Add FP8 quantized training

```
/add-fp8-support --precision fp8
```

Adds FP8 recipe configuration, `DelayedScaling` setup, and the `fp8_autocast`
context manager to your training loop.

### Write golden value tests

```
/write-golden-tests --model esm2 --reference facebook/esm2_t33_650M_UR50D
```

Creates pytest tests that load both the HF reference and TE model, run a forward
pass with fixed inputs, and assert outputs match within tolerance.

### Set up FSDP distributed training

```
/setup-fsdp-training --model esm2 --framework native_te
```

Scaffolds a self-contained recipe directory with a Dockerfile, training script,
Hydra configs, and a sample data loader.

### Export model to Hugging Face Hub

```
/export-to-hf-hub --model esm2
```

Generates an `export.py` script that packages weights, config, and tokenizer
files for upload to Hugging Face Hub.

## Links

- [BioNeMo Framework](https://github.com/NVIDIA/bionemo-framework)
- [BioNeMo Recipes README](../README.md)
````
**`bionemo-recipes/claude-plugin/skills/add-fp8-support/SKILL.md`** (new file, 136 lines)
````markdown
---
name: add-fp8-support
description: >
  Add FP8, MXFP8, or NVFP4 quantization support to a TransformerEngine model.
  Triggers when user asks about FP8, FP4, quantization, mixed precision,
  or low-precision training.
allowed-tools: Read, Grep, Glob, Write, Edit, Bash, Agent
argument-hint: '[fp8|mxfp8|nvfp4]'
---

# Add FP8/FP4 Quantization Support

You are adding quantization support to a TransformerEngine model. Read the reference files first.

## Reference Files

- `reference/quantization.py` — Layer-wise precision assignment
- `reference/fp8_config_example.py` — FP8 recipe setup in training

## Steps

### 1. Add Config Fields

Add these fields to the NV config class:

- `layer_precision: list[str | None] | None = None` — Per-layer precision ("fp8", "fp4", None)
- `use_quantized_model_init: bool = False` — Initialize weights directly in quantized format

Validate in `__init__`:

```python
if layer_precision is not None:
    assert len(layer_precision) == self.num_hidden_layers
    for p in layer_precision:
        assert p in {"fp8", "fp4", None}
```

### 2. Pad Vocabulary Size

FP8 requires tensor dimensions divisible by 16. Pad vocab:

```python
self.padded_vocab_size = padded_vocab_size or self.vocab_size
# Round up to next multiple of 16
if self.padded_vocab_size % 16 != 0:
    self.padded_vocab_size = ((self.padded_vocab_size + 15) // 16) * 16
```

Update embedding and LM head to use `padded_vocab_size`. Truncate logits back to `vocab_size` in the forward pass.

### 3. Implement `get_autocast_context()`

This method returns the appropriate TE context manager for each layer:

```python
from contextlib import nullcontext
import transformer_engine.pytorch as te


def get_autocast_context(self, layer_number, init=False, outer=False):
    if self.config.layer_precision is None:
        return nullcontext()

    # Outer context wraps entire encoder for recipe post-processing
    if outer:
        if "fp8" not in self.config.layer_precision:
            return nullcontext()
        return te.autocast(enabled=True, recipe=self._fp8_recipe)

    precision = self.config.layer_precision[layer_number]
    recipe = {"fp8": self._fp8_recipe, "fp4": self._fp4_recipe}.get(precision)

    # During init: use quantized_model_init for weight initialization
    if init and self.config.use_quantized_model_init:
        if precision in ("fp8", "fp4"):
            return te.quantized_model_init(recipe=recipe)
        return nullcontext()

    # During forward: use autocast for precision control
    if precision in ("fp8", "fp4"):
        return te.autocast(enabled=True, recipe=recipe)
    return te.autocast(enabled=False)  # Explicitly disable for BF16 layers
```

### 4. Use Contexts in Model

During layer creation:

```python
for i in range(config.num_hidden_layers):
    with self.get_autocast_context(i, init=True):
        layers.append(te.TransformerLayer(...))
```

During forward pass:

```python
with self.get_autocast_context(None, outer=True):
    for layer_idx, layer in enumerate(self.layers):
        with self.get_autocast_context(layer_idx):
            hidden_states = layer(hidden_states, ...)
```

### 5. Keep LM Head in Higher Precision

```python
with te.autocast(enabled=False):
    logits = self.lm_head(hidden_states)
```

### 6. Set Up FP8 Recipes

In the training script:

```python
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)
model = MyTEModel(config, fp8_recipe=fp8_recipe)
```

Available recipes:

- `DelayedScaling` — Classic FP8, computes scaling factors with delay
- `Float8CurrentScaling` — Per-tensor current scaling
- `Float8BlockScaling` — Block-wise scaling (MXFP8)
- `NVFP4BlockScaling` — 4-bit quantization

### 7. Layer-wise Precision Assignment

Use `resolve_layer_precision()` from the reference to assign layers:

```python
# In config: fp8_layers=[1,2,3], fp4_layers=[4,5,6] (1-indexed)
# Returns: ["fp8","fp8","fp8","fp4","fp4","fp4"] (0-indexed)
```
````
**`bionemo-recipes/claude-plugin/skills/add-fp8-support/reference/fp8_config_example.py`** (new file, 66 lines)
```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-Apache2
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Reference: FP8 recipe setup in a training script.

Shows how to create and use FP8/FP4 recipes with TransformerEngine models.
"""

from transformer_engine.common.recipe import (
    DelayedScaling,
    Float8BlockScaling,
    Float8CurrentScaling,
    Format,
    NVFP4BlockScaling,
)


def create_fp8_recipe(recipe_name: str = "DelayedScaling", **kwargs):
    """Create an FP8 recipe by name.

    Available recipes:
    - DelayedScaling: Classic FP8, scaling factors computed with delay
    - Float8CurrentScaling: Per-tensor scaling computed each step
    - Float8BlockScaling: Block-wise scaling (MXFP8)
    - NVFP4BlockScaling: 4-bit quantization
    """
    recipes = {
        "DelayedScaling": DelayedScaling,
        "Float8CurrentScaling": Float8CurrentScaling,
        "Float8BlockScaling": Float8BlockScaling,
        "NVFP4BlockScaling": NVFP4BlockScaling,
    }
    recipe_cls = recipes[recipe_name]

    # NOTE: Format.HYBRID uses E4M3 for forward, E5M2 for backward
    if "fp8_format" not in kwargs and recipe_name != "NVFP4BlockScaling":
        kwargs["fp8_format"] = Format.HYBRID
    if "fp4_format" not in kwargs and recipe_name == "NVFP4BlockScaling":
        kwargs["fp4_format"] = Format.E2M1

    return recipe_cls(**kwargs)


# Example usage in training script:
def setup_model_with_fp8(config, layer_precision):
    """Example of setting up a TE model with FP8 quantization."""
    config.layer_precision = layer_precision

    fp8_recipe = create_fp8_recipe("DelayedScaling")

    # NOTE: Pass recipe to model constructor, not as global state
    # model = NVModelForMaskedLM(config, fp8_recipe=fp8_recipe)

    return config, fp8_recipe
```
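The kwargs-defaulting logic in `create_fp8_recipe` (inject `fp8_format=Format.HYBRID` unless the caller overrides it or selects NVFP4, which instead defaults to an E2M1 format) can be exercised without TransformerEngine installed. This sketch replaces the real recipe classes with a stand-in; `FakeRecipe` and `fake_create_recipe` are our illustrative names, not part of the PR:

```python
from dataclasses import dataclass, field


@dataclass
class FakeRecipe:
    """Stand-in for a TransformerEngine recipe class; just records its kwargs."""
    name: str
    kwargs: dict = field(default_factory=dict)


def fake_create_recipe(recipe_name: str = "DelayedScaling", **kwargs) -> FakeRecipe:
    # Mirrors create_fp8_recipe: FP8 recipes default to a HYBRID format,
    # NVFP4 defaults to an E2M1 format, and explicit kwargs always win.
    if "fp8_format" not in kwargs and recipe_name != "NVFP4BlockScaling":
        kwargs["fp8_format"] = "HYBRID"
    if "fp4_format" not in kwargs and recipe_name == "NVFP4BlockScaling":
        kwargs["fp4_format"] = "E2M1"
    return FakeRecipe(recipe_name, kwargs)


print(fake_create_recipe().kwargs)                     # {'fp8_format': 'HYBRID'}
print(fake_create_recipe("NVFP4BlockScaling").kwargs)  # {'fp4_format': 'E2M1'}
```

The dictionary-dispatch factory keeps the training script's recipe choice a single config string, which is why the reference file prefers it over a chain of conditionals.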
**`bionemo-recipes/claude-plugin/skills/add-fp8-support/reference/quantization.py`** (new file, 69 lines)
```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-Apache2
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Reference: Layer-wise quantization assignment utilities.

Demonstrates how to resolve user-specified layer lists into per-layer precision assignments.
"""


def resolve_layer_precision(
    num_layers: int,
    fp8_enabled: bool,
    fp4_enabled: bool,
    fp8_layers: list[int] | None,
    fp4_layers: list[int] | None,
) -> list[str | None]:
    """Resolve layer-wise quantization from user config.

    Takes 1-indexed layer lists and returns 0-indexed precision list.

    Examples:
        # All layers FP8
        resolve_layer_precision(6, True, False, None, None)
        # -> ["fp8", "fp8", "fp8", "fp8", "fp8", "fp8"]

        # Mixed: layers 1-3 FP8, layers 4-6 FP4
        resolve_layer_precision(6, True, True, [1, 2, 3], [4, 5, 6])
        # -> ["fp8", "fp8", "fp8", "fp4", "fp4", "fp4"]
    """
    all_layers = set(range(1, num_layers + 1))

    if fp8_enabled and fp4_enabled and fp8_layers is None and fp4_layers is None:
        raise ValueError("Both fp8 and fp4 enabled but no layer lists specified. Provide explicit layer assignments.")

    # Auto-fill: if one format has explicit layers, the other gets the remaining layers
    if fp8_enabled and fp8_layers is None:
        claimed = set(fp4_layers) if fp4_layers else set()
        fp8_layers = sorted(all_layers - claimed)

    if fp4_enabled and fp4_layers is None:
        claimed = set(fp8_layers) if fp8_layers else set()
        fp4_layers = sorted(all_layers - claimed)

    if not fp8_enabled:
        fp8_layers = None
    if not fp4_enabled:
        fp4_layers = None

    # Validate no overlap
    if fp8_layers and fp4_layers:
        overlap = set(fp8_layers) & set(fp4_layers)
        if overlap:
            raise ValueError(f"Overlapping layers: {overlap}")

    fp8_set = set(fp8_layers) if fp8_layers else set()
    fp4_set = set(fp4_layers) if fp4_layers else set()
    return ["fp8" if i in fp8_set else "fp4" if i in fp4_set else None for i in range(1, num_layers + 1)]
```
**Check warning** (Code scanning / CodeQL): Workflow does not contain permissions (Medium)

**Copilot Autofix** (AI, 28 days ago):

In general, the fix is to explicitly define a `permissions` block in the workflow or job to restrict the `GITHUB_TOKEN` to the least privileges needed. This job only checks out code and runs tests, so it should only require read access to repository contents.

The best fix without changing functionality is to add a top-level `permissions` section (so it applies to all jobs) immediately after the `name:` declaration in `.github/workflows/integration-tests-claude.yml`, specifying `contents: read`. This matches the minimal suggestion from CodeQL and GitHub, and does not interfere with the existing steps (`actions/checkout`, `npm install`, `pip install`, `pytest`, all of which run locally in the container). No new imports or external dependencies are required; only the YAML configuration of the workflow changes.

Concretely: in `.github/workflows/integration-tests-claude.yml`, add the block after line 1 (`name: Claude Integration Tests`) and before the `on:` block. No other lines need to be modified.