feat: clean up orphaned MDX files when notebooks are deleted or renamed#118
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
- Rewrite convert_notebook_to_mdx.py to use nbconvert for proper output rendering
- Add sanitization to remove script/style tags and escape JSX special chars
- Fix execute_notebooks.py to use kernel_name=None for Wherobots runtime
- Add nbconvert>=7.0.0 and ipykernel dependencies to workflow

- Remove notebook execution (no longer needed)
- Convert notebooks to MDX with source code only (no outputs)
- Exclude Raster_Inference notebooks automatically
- Convert filenames from underscores to dashes
- Add docs/ folder with mint.json and 26 MDX notebooks
- Auto-update Mintlify navigation on each conversion
- Organize notebooks into categories: Getting Started, Analyzing Data, RasterFlow, Reading and Writing Data, Open Data Connections, Scala

- Remove docs/ folder from wherobots-examples (MDX lives in wherobots/docs)
- Update workflow to checkout and push to wherobots/docs repo
- Place MDX files in tutorials/example-notebooks/ folder
- Update docs.json navigation under 'Spatial Analytics Tutorials' tab
- Requires DOCS_REPO_TOKEN secret with write access to wherobots/docs

- Copy local file images (./assets/img/...) to docs repo
- Extract embedded attachment images (base64) from notebooks
- Save images to tutorials/example-notebooks/images/
- Update image paths in MDX to absolute paths from docs root
- Prefix image filenames with notebook name to avoid collisions

- Add cleanup_orphaned_mdx.py script to detect and remove orphaned MDX files
- Script also removes associated images (prefixed with notebook slug)
- Run cleanup step before conversion in workflow
- Use git add -A to properly stage deletions
- Add cleanup script to workflow trigger paths
- Integrate cleanup target into Makefile (runs before convert in all/preview)
Review these changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/118
Resolve conflicts by keeping cleanup functionality:
- Add cleanup step to workflow
- Add cleanup target to Makefile
- Use git add -A for staging deletions

- Add --output-file option to cleanup script
- Capture deleted file names and pass to PR creation step
- Show 'Deleted MDX Files' section in PR body when notebooks are removed
Pull request overview
This PR adds automated cleanup of orphaned MDX files when notebooks are deleted or renamed in the wherobots-examples repository. Previously, when notebooks were removed or renamed, their corresponding MDX documentation and images remained in the docs repo, creating stale content.
Changes:
- Adds a cleanup script that compares existing MDX files against source notebooks and removes orphans
- Updates the GitHub Actions workflow to run cleanup before conversion and track deleted files in PR descriptions
- Updates the Makefile to include cleanup in local development workflows
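The comparison described in the first bullet could look roughly like the sketch below. It is not the PR's actual script: `notebook_slug` and `find_orphaned_mdx` are hypothetical names, and the slug rule (file stem with underscores replaced by dashes) is assumed from the earlier commit that converts filenames from underscores to dashes.

```python
from pathlib import Path


def notebook_slug(notebook: Path) -> str:
    """Assumed naming rule: file stem with underscores replaced by dashes."""
    return notebook.stem.replace("_", "-")


def find_orphaned_mdx(notebook_dirs: list[Path], mdx_dir: Path) -> list[Path]:
    """Return MDX files in mdx_dir that no notebook in notebook_dirs maps to."""
    # Build the set of slugs that the current notebooks would produce.
    expected = {
        notebook_slug(nb)
        for directory in notebook_dirs
        for nb in directory.glob("*.ipynb")
    }
    # Any MDX file whose stem is not in that set has no source notebook.
    return sorted(p for p in mdx_dir.glob("*.mdx") if p.stem not in expected)
```

A renamed notebook shows up here as one new expected slug plus one orphan under the old name, which is why renames and deletions can share the same cleanup path.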
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/scripts/cleanup_orphaned_mdx.py | New script implementing orphaned file detection and removal logic |
| .github/workflows/convert-notebooks.yml | Adds cleanup step before conversion and includes deletion tracking in PR body |
| Makefile | Adds cleanup target and integrates it into preview/all workflows |
- Remove redundant list comprehension in cleanup script
- Remove unused DELETED_LIST variable in workflow

… comments
- Replace hardcoded folder list with find command to auto-discover directories
- Remove self-explanatory comments from workflow (configure git, push branch, etc.)
- Makefile now also uses dynamic discovery via shell find command
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
    from pathlib import Path

    def get_expected_mdx_names(notebook_dirs: list[Path], exclude_prefix: str) -> set[str]:
The exclude_prefix parameter should be Optional[str] since line 30 checks if exclude_prefix which suggests it can be None or empty. Update the type hint to Optional[str] and import Optional from typing.
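A minimal sketch of the suggested signature follows. Only the function name, parameter names, and return type come from the diff above; the body is an assumption written for illustration, including the underscore-to-dash naming rule and the choice to treat excluded notebooks as producing no expected MDX file.

```python
from pathlib import Path
from typing import Optional


def get_expected_mdx_names(
    notebook_dirs: list[Path], exclude_prefix: Optional[str]
) -> set[str]:
    """Collect the MDX file names that the current notebooks should produce."""
    names: set[str] = set()
    for directory in notebook_dirs:
        for notebook in directory.glob("*.ipynb"):
            # A None or empty prefix disables the exclusion check entirely.
            if exclude_prefix and notebook.name.startswith(exclude_prefix):
                continue
            # Assumed naming rule: underscores in the stem become dashes.
            names.add(notebook.stem.replace("_", "-") + ".mdx")
    return names
```

With `Optional[str]`, the truthiness check `if exclude_prefix` matches the annotation: both `None` and `""` mean that nothing is excluded.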
    python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
      ${{ steps.find-dirs.outputs.dirs }} \
      --mdx-dir docs/tutorials/example-notebooks \
      --exclude-prefix Raster_Inference \
      --output-file /tmp/removed-mdx.txt \
      -v

    if [ -f /tmp/removed-mdx.txt ] && [ -s /tmp/removed-mdx.txt ]; then
      echo "has_deletions=true" >> $GITHUB_OUTPUT
      echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
      cat /tmp/removed-mdx.txt | sed 's/^/- /' >> $GITHUB_OUTPUT
Using a hardcoded path in /tmp could cause issues in environments where /tmp has restricted permissions or in concurrent workflow runs. Consider using mktemp to create a unique temporary file or use $RUNNER_TEMP which is a GitHub Actions environment variable for temporary files.
Suggested change

Before:

    python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
      ${{ steps.find-dirs.outputs.dirs }} \
      --mdx-dir docs/tutorials/example-notebooks \
      --exclude-prefix Raster_Inference \
      --output-file /tmp/removed-mdx.txt \
      -v
    if [ -f /tmp/removed-mdx.txt ] && [ -s /tmp/removed-mdx.txt ]; then
      echo "has_deletions=true" >> $GITHUB_OUTPUT
      echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
      cat /tmp/removed-mdx.txt | sed 's/^/- /' >> $GITHUB_OUTPUT

After:

    REMOVED_MDX_FILE="$(mktemp "${RUNNER_TEMP:-/tmp}/removed-mdx.XXXXXX")"
    python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
      ${{ steps.find-dirs.outputs.dirs }} \
      --mdx-dir docs/tutorials/example-notebooks \
      --exclude-prefix Raster_Inference \
      --output-file "$REMOVED_MDX_FILE" \
      -v
    if [ -f "$REMOVED_MDX_FILE" ] && [ -s "$REMOVED_MDX_FILE" ]; then
      echo "has_deletions=true" >> $GITHUB_OUTPUT
      echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
      cat "$REMOVED_MDX_FILE" | sed 's/^/- /' >> $GITHUB_OUTPUT
Before:

    # Notebook directories to convert
    NOTEBOOK_DIRS = Getting_Started/ Analyzing_Data/ Reading_and_Writing_Data/ Open_Data_Connections/ scala/

After:

    # Dynamically find all directories containing notebooks
    NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f | xargs -I {} dirname {} | sort -u | grep -v ".ipynb_checkpoints")
The shell command will fail if there are no notebooks found (xargs will receive no input). Consider adding find ... -print0 | xargs -0 for better handling of filenames with spaces, and add 2>/dev/null || true to handle the case when no notebooks exist gracefully.
Suggested change

Before:

    NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f | xargs -I {} dirname {} | sort -u | grep -v ".ipynb_checkpoints")

After:

    NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f -print0 2>/dev/null | xargs -0 -I {} dirname "{}" 2>/dev/null | sort -u | grep -v ".ipynb_checkpoints" || true)
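For comparison, the same discovery can be sketched in Python, where paths with spaces and the no-notebooks case need no special handling. This is only an illustration, not part of the PR; `find_notebook_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def find_notebook_dirs(root: Path) -> list[Path]:
    """Return sorted, de-duplicated directories under root that contain .ipynb files."""
    dirs = {
        nb.parent
        for nb in root.rglob("*.ipynb")
        # Skip Jupyter's autosave copies, mirroring the grep -v filter.
        if nb.is_file() and ".ipynb_checkpoints" not in nb.parts
    }
    return sorted(dirs)
```

When no notebooks exist the function returns an empty list, so callers do not need an `|| true` style fallback.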
…CI enforcement)
Combines the work from PR #125 and PR #118:
- Add cleanup_orphaned_mdx.py script to remove orphaned MDX files when notebooks are deleted or renamed
- Add check-notebooks.yml CI workflow to enforce that PRs modifying notebooks also update the navigation config
- Dynamically discover notebook directories instead of hardcoding them
- Use git add -A to stage deletions in the docs sync workflow
- Include deleted file list in auto-generated docs PR descriptions
- Add cleanup target to Makefile, run it before convert in preview/all
- Update README to reflect NOTEBOOK_LOCATIONS config and document the orphan cleanup behavior
Closing in favor of #129
Summary
Fixes a gap in PR #115 where deleted or moved notebooks were not automatically cleaned up from the docs repo.
When a notebook is deleted or renamed in wherobots-examples, the corresponding MDX file and images remain in wherobots/docs, creating orphaned content.

Changes
Add cleanup_orphaned_mdx.py script that:
- Detects and removes orphaned MDX files, along with their associated images
- Supports --dry-run for testing

Update workflow (convert-notebooks.yml):
- Runs the cleanup step before conversion
- Uses git add -A to stage deletions

Update Makefile:
- Adds a make cleanup target
- make preview and make all now run cleanup before convert

Behavior

Testing
Run locally with make cleanup --dry-run to preview what would be removed (once --dry-run is passed through).