Skip to content

new workflow action to prevent misc categorization of new notebooks#125

Closed
kadolor wants to merge 3 commits intomainfrom
add-new-workflow-action
Closed

new workflow action to prevent misc categorization of new notebooks#125
kadolor wants to merge 3 commits intomainfrom
add-new-workflow-action

Conversation

@kadolor
Copy link
Copy Markdown
Contributor

@kadolor kadolor commented Feb 6, 2026

No description provided.

@gitnotebooks
Copy link
Copy Markdown

gitnotebooks bot commented Feb 6, 2026

@kadolor kadolor self-assigned this Feb 6, 2026
Comment thread .github/workflows/check-notebooks.yml Fixed
@kadolor kadolor marked this pull request as draft February 6, 2026 23:43
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds enforcement and clearer guidance to ensure new/modified notebooks are correctly categorized in docs navigation, preventing accidental fallback into an “Other” bucket.

Changes:

  • Documented the notebook→MDX publishing workflow and instructions for updating navigation mappings.
  • Switched docs navigation from prefix-based categorization to an explicit filename→category mapping.
  • Added a PR workflow to fail when notebooks change without updating the navigation config; normalized MDX output filenames to lowercase.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
README.md Documents docs publishing pipeline and required mapping update steps for new notebooks.
Makefile Makes convert depend on clean to ensure a fresh conversion output.
.github/workflows/scripts/update_docs_navigation.py Replaces pattern matching with explicit filename→category mapping and simplifies categorization.
.github/workflows/scripts/notebook_to_mdx.py Normalizes generated MDX filenames to lowercase to align with mapping expectations.
.github/workflows/check-notebooks.yml Adds CI guardrails ensuring notebook changes accompany navigation config updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +25 to +30
BASE_BRANCH: ${{ github.base_ref }}
run: |
echo "Comparing HEAD against origin/$BASE_BRANCH..."

# Get list of changed files between the PR branch and the base branch
CHANGED_FILES=$(git diff --name-only origin/$BASE_BRANCH HEAD)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

origin/$BASE_BRANCH may not exist in the runner clone (even with fetch-depth: 0), which can cause git diff to fail or return incomplete results. Prefer using the PR-provided SHAs (base SHA vs head SHA) or explicitly git fetch origin $BASE_BRANCH before diffing.

Suggested change
BASE_BRANCH: ${{ github.base_ref }}
run: |
echo "Comparing HEAD against origin/$BASE_BRANCH..."
# Get list of changed files between the PR branch and the base branch
CHANGED_FILES=$(git diff --name-only origin/$BASE_BRANCH HEAD)
BASE_SHA: ${{ github.event.pull_request.base.sha }}
HEAD_SHA: ${{ github.event.pull_request.head.sha }}
run: |
echo "Comparing changes between BASE_SHA=$BASE_SHA and HEAD_SHA=$HEAD_SHA..."
# Get list of changed files between the PR base commit and the PR head commit
CHANGED_FILES=$(git diff --name-only "$BASE_SHA" "$HEAD_SHA")

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +33
# Extract filenames without extensions for comparison
HAS_IPYNB=$(echo "$CHANGED_FILES" | grep '\.ipynb$' | sed 's/\.ipynb$//' || true)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping keys in update_docs_navigation.py are documented/implemented as basename, underscore→hyphen, lowercase. This check currently compares the path without extension (e.g., some/dir/My_New.ipynbsome/dir/My_New) against the config file, which will frequently fail even when the mapping exists. Also, grep -q treats the input as a regex; notebook names containing regex metacharacters (e.g., *) can produce incorrect matches. Normalize notebook names to the same canonical form used in conversion/mapping (strip directories, replace _ with -, lowercase) and use fixed-string matching (grep -Fq) or parse the dict keys more robustly.

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +60
while IFS= read -r notebook; do
if ! grep -q "$notebook" <<< "$CONFIG_CONTENT"; then
MISSING_MAPPING="$MISSING_MAPPING\n$notebook"
fi
done <<< "$HAS_IPYNB"
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping keys in update_docs_navigation.py are documented/implemented as basename, underscore→hyphen, lowercase. This check currently compares the path without extension (e.g., some/dir/My_New.ipynbsome/dir/My_New) against the config file, which will frequently fail even when the mapping exists. Also, grep -q treats the input as a regex; notebook names containing regex metacharacters (e.g., *) can produce incorrect matches. Normalize notebook names to the same canonical form used in conversion/mapping (strip directories, replace _ with -, lowercase) and use fixed-string matching (grep -Fq) or parse the dict keys more robustly.

Copilot uses AI. Check for mistakes.

# If there are notebooks without mappings, fail the build
if [[ -n "$MISSING_MAPPING" ]]; then
echo "::error::⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:$MISSING_MAPPING"
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub Actions ::error:: annotations don’t reliably render embedded \\n line breaks inside a single message, so $MISSING_MAPPING may appear as a literal \\n or as a hard-to-read single line. Consider emitting one ::error:: per missing notebook (or printing the list as normal log lines and keeping the annotation concise).

Suggested change
echo "::error::⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:$MISSING_MAPPING"
echo "⛔ FAILURE: The following notebooks are missing mappings in $CONFIG_FILE_PATH:"
# Interpret the '\n' escape sequences in MISSING_MAPPING as actual newlines
printf '%b\n' "$MISSING_MAPPING"
echo "::error::⛔ FAILURE: One or more notebooks are missing mappings in $CONFIG_FILE_PATH. See the list above for details."

Copilot uses AI. Check for mistakes.
@@ -46,13 +64,7 @@

def get_category(filename: str) -> str:
"""Determine category for a notebook based on filename."""
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notebook_to_mdx.py now lowercases output filenames, and the mapping keys are lowercase; however, get_category assumes the caller always passes a lowercased stem. If any upstream caller passes a non-normalized stem (e.g., pre-existing MDX files with mixed case, or different naming), categorization will silently fall back to Other. Normalize at the boundary (e.g., filename = filename.lower()) before the lookup to make categorization consistent.

Suggested change
"""Determine category for a notebook based on filename."""
"""Determine category for a notebook based on filename."""
filename = filename.lower()

Copilot uses AI. Check for mistakes.
"part-4-spatial-joins": "Getting Started",
# Analyzing Data
"clustering-dbscan": "Analyzing Data",
"getis-ord-gi*": "Analyzing Data",
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including * in a filename key implies the corresponding generated MDX/Notebook stem may contain *, which is error-prone in shells, globbing contexts, URLs, and some tooling. If the underlying notebook really is named with *, consider renaming it to a safer token (e.g., getis-ord-gi-star) and updating the mapping accordingly; otherwise, update the key to the actual normalized filename to avoid confusing future contributors.

Suggested change
"getis-ord-gi*": "Analyzing Data",
"getis-ord-gi-star": "Analyzing Data",

Copilot uses AI. Check for mistakes.
kadolor added a commit that referenced this pull request Feb 21, 2026
…CI enforcement)

Combines the work from PR #125 and PR #118:

- Add cleanup_orphaned_mdx.py script to remove orphaned MDX files when
  notebooks are deleted or renamed
- Add check-notebooks.yml CI workflow to enforce that PRs modifying
  notebooks also update the navigation config
- Dynamically discover notebook directories instead of hardcoding them
- Use git add -A to stage deletions in the docs sync workflow
- Include deleted file list in auto-generated docs PR descriptions
- Add cleanup target to Makefile, run it before convert in preview/all
- Update README to reflect NOTEBOOK_LOCATIONS config and document
  the orphan cleanup behavior
@kadolor
Copy link
Copy Markdown
Contributor Author

kadolor commented Feb 21, 2026

Closing in favor of #129

@kadolor kadolor closed this Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants