new workflow action to prevent misc categorization of new notebooks#125
new workflow action to prevent misc categorization of new notebooks#125
Conversation
…dated, and navigation is correctly mapped
|
Review these changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/125 |
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds enforcement and clearer guidance to ensure new/modified notebooks are correctly categorized in docs navigation, preventing accidental fallback into an “Other” bucket.
Changes:
- Documented the notebook→MDX publishing workflow and instructions for updating navigation mappings.
- Switched docs navigation from prefix-based categorization to an explicit filename→category mapping.
- Added a PR workflow to fail when notebooks change without updating the navigation config; normalized MDX output filenames to lowercase.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Documents docs publishing pipeline and required mapping update steps for new notebooks. |
| Makefile | Makes convert depend on clean to ensure a fresh conversion output. |
| .github/workflows/scripts/update_docs_navigation.py | Replaces pattern matching with explicit filename→category mapping and simplifies categorization. |
| .github/workflows/scripts/notebook_to_mdx.py | Normalizes generated MDX filenames to lowercase to align with mapping expectations. |
| .github/workflows/check-notebooks.yml | Adds CI guardrails ensuring notebook changes accompany navigation config updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| BASE_BRANCH: ${{ github.base_ref }} | ||
| run: | | ||
| echo "Comparing HEAD against origin/$BASE_BRANCH..." | ||
|
|
||
| # Get list of changed files between the PR branch and the base branch | ||
| CHANGED_FILES=$(git diff --name-only origin/$BASE_BRANCH HEAD) |
There was a problem hiding this comment.
origin/$BASE_BRANCH may not exist in the runner clone (even with fetch-depth: 0), which can cause git diff to fail or return incomplete results. Prefer using the PR-provided SHAs (base SHA vs head SHA) or explicitly git fetch origin $BASE_BRANCH before diffing.
| BASE_BRANCH: ${{ github.base_ref }} | |
| run: | | |
| echo "Comparing HEAD against origin/$BASE_BRANCH..." | |
| # Get list of changed files between the PR branch and the base branch | |
| CHANGED_FILES=$(git diff --name-only origin/$BASE_BRANCH HEAD) | |
| BASE_SHA: ${{ github.event.pull_request.base.sha }} | |
| HEAD_SHA: ${{ github.event.pull_request.head.sha }} | |
| run: | | |
| echo "Comparing changes between BASE_SHA=$BASE_SHA and HEAD_SHA=$HEAD_SHA..." | |
| # Get list of changed files between the PR base commit and the PR head commit | |
| CHANGED_FILES=$(git diff --name-only "$BASE_SHA" "$HEAD_SHA") |
| # Extract filenames without extensions for comparison | ||
| HAS_IPYNB=$(echo "$CHANGED_FILES" | grep '\.ipynb$' | sed 's/\.ipynb$//' || true) |
There was a problem hiding this comment.
The mapping keys in update_docs_navigation.py are documented/implemented as basename, underscore→hyphen, lowercase. This check currently compares the path without extension (e.g., some/dir/My_New.ipynb → some/dir/My_New) against the config file, which will frequently fail even when the mapping exists. Also, grep -q treats the input as a regex; notebook names containing regex metacharacters (e.g., *) can produce incorrect matches. Normalize notebook names to the same canonical form used in conversion/mapping (strip directories, replace _ with -, lowercase) and use fixed-string matching (grep -Fq) or parse the dict keys more robustly.
| while IFS= read -r notebook; do | ||
| if ! grep -q "$notebook" <<< "$CONFIG_CONTENT"; then | ||
| MISSING_MAPPING="$MISSING_MAPPING\n$notebook" | ||
| fi | ||
| done <<< "$HAS_IPYNB" |
There was a problem hiding this comment.
The mapping keys in update_docs_navigation.py are documented/implemented as basename, underscore→hyphen, lowercase. This check currently compares the path without extension (e.g., some/dir/My_New.ipynb → some/dir/My_New) against the config file, which will frequently fail even when the mapping exists. Also, grep -q treats the input as a regex; notebook names containing regex metacharacters (e.g., *) can produce incorrect matches. Normalize notebook names to the same canonical form used in conversion/mapping (strip directories, replace _ with -, lowercase) and use fixed-string matching (grep -Fq) or parse the dict keys more robustly.
|
|
||
| # If there are notebooks without mappings, fail the build | ||
| if [[ -n "$MISSING_MAPPING" ]]; then | ||
| echo "::error::⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:$MISSING_MAPPING" |
There was a problem hiding this comment.
GitHub Actions ::error:: annotations don’t reliably render embedded \\n line breaks inside a single message, so $MISSING_MAPPING may appear as a literal \\n or as a hard-to-read single line. Consider emitting one ::error:: per missing notebook (or printing the list as normal log lines and keeping the annotation concise).
| echo "::error::⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:$MISSING_MAPPING" | |
| echo "⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:" | |
| # Interpret the '\n' escape sequences in MISSING_MAPPING as actual newlines | |
| printf '%b\n' "$MISSING_MAPPING" | |
| echo "::error::⛔ FAILURE: One or more notebooks are missing mappings in $CONFIG_FILE_PATH. See the list above for details." |
| @@ -46,13 +64,7 @@ | |||
|
|
|||
| def get_category(filename: str) -> str: | |||
| """Determine category for a notebook based on filename.""" | |||
There was a problem hiding this comment.
notebook_to_mdx.py now lowercases output filenames, and the mapping keys are lowercase; however, get_category assumes the caller always passes a lowercased stem. If any upstream caller passes a non-normalized stem (e.g., pre-existing MDX files with mixed case, or different naming), categorization will silently fall back to Other. Normalize at the boundary (e.g., filename = filename.lower()) before the lookup to make categorization consistent.
| """Determine category for a notebook based on filename.""" | |
| """Determine category for a notebook based on filename.""" | |
| filename = filename.lower() |
| "part-4-spatial-joins": "Getting Started", | ||
| # Analyzing Data | ||
| "clustering-dbscan": "Analyzing Data", | ||
| "getis-ord-gi*": "Analyzing Data", |
There was a problem hiding this comment.
Including * in a filename key implies the corresponding generated MDX/Notebook stem may contain *, which is error-prone in shells, globbing contexts, URLs, and some tooling. If the underlying notebook really is named with *, consider renaming it to a safer token (e.g., getis-ord-gi-star) and updating the mapping accordingly; otherwise, update the key to the actual normalized filename to avoid confusing future contributors.
| "getis-ord-gi*": "Analyzing Data", | |
| "getis-ord-gi-star": "Analyzing Data", |
…CI enforcement) Combines the work from PR #125 and PR #118: - Add cleanup_orphaned_mdx.py script to remove orphaned MDX files when notebooks are deleted or renamed - Add check-notebooks.yml CI workflow to enforce that PRs modifying notebooks also update the navigation config - Dynamically discover notebook directories instead of hardcoding them - Use git add -A to stage deletions in the docs sync workflow - Include deleted file list in auto-generated docs PR descriptions - Add cleanup target to Makefile, run it before convert in preview/all - Update README to reflect NOTEBOOK_LOCATIONS config and document the orphan cleanup behavior
|
Closing in favor of #129 |
No description provided.