feat: clean up orphaned MDX files when notebooks are deleted or renamed #118

Closed
kadolor wants to merge 35 commits into main from cleanup-orphaned-notebooks

Conversation

kadolor commented Jan 28, 2026

Summary

Fixes a gap in PR #115 where deleted or moved notebooks were not automatically cleaned up from the docs repo.

  • Problem: When a notebook is deleted or renamed in wherobots-examples, the corresponding MDX file and images remain in wherobots/docs, creating orphaned content.
  • Solution: Add a cleanup step that runs before conversion to detect and remove orphaned MDX files.

Changes

  • Add cleanup_orphaned_mdx.py script that:

    • Compares existing MDX files against source notebooks
    • Removes orphaned MDX files (no corresponding notebook)
    • Removes associated images (prefixed with notebook slug)
    • Supports --dry-run for testing
  • Update workflow (convert-notebooks.yml):

    • Run cleanup step before conversion
    • Use git add -A to stage deletions
    • Add cleanup script to trigger paths
  • Update Makefile:

    • Add make cleanup target
    • make preview and make all now run cleanup before convert
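
The detection logic described above can be sketched roughly as follows. This is not the actual cleanup_orphaned_mdx.py; the function names are hypothetical, the underscore-to-dash slug rule is taken from the earlier conversion commits, and the "<slug>-" image naming is an assumption about how the prefix is applied.

```python
from pathlib import Path
from typing import Iterable


def notebook_slug(notebook: Path) -> str:
    """Assumed naming rule: notebook stem with underscores turned into dashes."""
    return notebook.stem.replace("_", "-")


def find_orphans(notebooks: Iterable[Path], mdx_dir: Path) -> list[Path]:
    """Return MDX files in mdx_dir whose stem matches no notebook slug."""
    expected = {notebook_slug(nb) for nb in notebooks}
    return sorted(p for p in mdx_dir.glob("*.mdx") if p.stem not in expected)


def associated_images(mdx_file: Path, images_dir: Path) -> list[Path]:
    """Return images assumed to belong to mdx_file (named '<slug>-...')."""
    if not images_dir.is_dir():
        return []
    prefix = f"{mdx_file.stem}-"
    return sorted(p for p in images_dir.iterdir() if p.name.startswith(prefix))
```

One caveat of plain prefix matching: if one slug is itself a prefix of another (for example Part-1 and Part-1-Advanced), removing the shorter one could also match the longer notebook's images, so a real implementation may want a stricter separator.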

Behavior

| Scenario | Before | After |
| --- | --- | --- |
| Notebook deleted | Orphaned MDX remains | MDX and images removed |
| Notebook renamed | Old + new MDX exist | Old MDX removed, new created |

Testing

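To preview what would be removed, run the cleanup script locally with --dry-run. Note that make cleanup --dry-run does not do this yet: make interprets --dry-run itself and only prints the recipe without running it, so the Makefile first needs to pass the flag through to the script.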
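
For illustration, a command-line interface with a dry-run switch could look like the sketch below. The flags --mdx-dir, --exclude-prefix, --output-file, --dry-run and -v appear in this PR; everything else (the function names, the --verbose long form, the positional notebook_dirs argument name and the defaults) is an assumption, not the script's real code.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags mentioned in the PR (details assumed)."""
    parser = argparse.ArgumentParser(description="Remove orphaned MDX files")
    parser.add_argument("notebook_dirs", nargs="+", type=Path)
    parser.add_argument("--mdx-dir", type=Path, required=True)
    parser.add_argument("--exclude-prefix", default=None)
    parser.add_argument("--output-file", type=Path, default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


def remove_paths(paths: list[Path], dry_run: bool) -> list[Path]:
    """Delete each path unless dry_run is set; return what was (or would be) removed."""
    for path in paths:
        print(f"{'Would remove' if dry_run else 'Removing'}: {path}")
        if not dry_run:
            path.unlink(missing_ok=True)
    return paths
```

With this shape, a dry run reports the same list of files as a real run, which makes the preview easy to compare against the eventual deletions.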

kadolor and others added 30 commits January 14, 2026 14:11
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
- Rewrite convert_notebook_to_mdx.py to use nbconvert for proper output rendering
- Add sanitization to remove script/style tags and escape JSX special chars
- Fix execute_notebooks.py to use kernel_name=None for Wherobots runtime
- Add nbconvert>=7.0.0 and ipykernel dependencies to workflow
- Remove notebook execution (no longer needed)
- Convert notebooks to MDX with source code only (no outputs)
- Exclude Raster_Inference notebooks automatically
- Convert filenames from underscores to dashes
- Add docs/ folder with mint.json and 26 MDX notebooks
- Auto-update Mintlify navigation on each conversion
- Organize notebooks into categories: Getting Started, Analyzing Data,
  RasterFlow, Reading and Writing Data, Open Data Connections, Scala
- Remove docs/ folder from wherobots-examples (MDX lives in wherobots/docs)
- Update workflow to checkout and push to wherobots/docs repo
- Place MDX files in tutorials/example-notebooks/ folder
- Update docs.json navigation under 'Spatial Analytics Tutorials' tab
- Requires DOCS_REPO_TOKEN secret with write access to wherobots/docs
- Copy local file images (./assets/img/...) to docs repo
- Extract embedded attachment images (base64) from notebooks
- Save images to tutorials/example-notebooks/images/
- Update image paths in MDX to absolute paths from docs root
- Prefix image filenames with notebook name to avoid collisions
- Add cleanup_orphaned_mdx.py script to detect and remove orphaned MDX files
- Script also removes associated images (prefixed with notebook slug)
- Run cleanup step before conversion in workflow
- Use git add -A to properly stage deletions
- Add cleanup script to workflow trigger paths
- Integrate cleanup target into Makefile (runs before convert in all/preview)

gitnotebooks bot commented Jan 28, 2026

Resolve conflicts by keeping cleanup functionality:
- Add cleanup step to workflow
- Add cleanup target to Makefile
- Use git add -A for staging deletions
- Add --output-file option to cleanup script
- Capture deleted file names and pass to PR creation step
- Show 'Deleted MDX Files' section in PR body when notebooks are removed
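
A rough Python equivalent of the capture step is sketched here. The output names has_deletions and deleted_files and the "- " bullet prefix come from the workflow snippet quoted in this conversation; the function names, the has_deletions=false branch and the closing EOF line are assumptions (a closing delimiter is required by the multiline output syntax, but it is not visible in the quoted lines).

```python
from pathlib import Path


def write_removed(names: list[str], output_file: Path) -> None:
    """Write one removed file name per line (assumed --output-file format)."""
    output_file.write_text("".join(f"{name}\n" for name in names))


def format_outputs(output_file: Path, delimiter: str = "EOF") -> str:
    """Build the text a step would append to $GITHUB_OUTPUT."""
    if not output_file.is_file() or output_file.stat().st_size == 0:
        # Assumed behavior when nothing was removed.
        return "has_deletions=false\n"
    lines = output_file.read_text().splitlines()
    bullets = "".join(f"- {line}\n" for line in lines)
    return f"has_deletions=true\ndeleted_files<<{delimiter}\n{bullets}{delimiter}\n"
```

The PR creation step can then include the deleted_files value under a 'Deleted MDX Files' heading only when has_deletions is true.
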
kadolor requested a review from Copilot January 28, 2026 19:51
kadolor self-assigned this Jan 28, 2026

Copilot AI left a comment

Pull request overview

This PR adds automated cleanup of orphaned MDX files when notebooks are deleted or renamed in the wherobots-examples repository. Previously, when notebooks were removed or renamed, their corresponding MDX documentation and images remained in the docs repo, creating stale content.

Changes:

  • Adds a cleanup script that compares existing MDX files against source notebooks and removes orphans
  • Updates the GitHub Actions workflow to run cleanup before conversion and track deleted files in PR descriptions
  • Updates the Makefile to include cleanup in local development workflows

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/scripts/cleanup_orphaned_mdx.py | New script implementing orphaned file detection and removal logic |
| .github/workflows/convert-notebooks.yml | Adds cleanup step before conversion and includes deletion tracking in PR body |
| Makefile | Adds cleanup target and integrates it into preview/all workflows |

Comment thread .github/workflows/scripts/cleanup_orphaned_mdx.py Outdated
Comment thread .github/workflows/convert-notebooks.yml Outdated
- Remove redundant list comprehension in cleanup script
- Remove unused DELETED_LIST variable in workflow
kadolor requested a review from rbavery January 28, 2026 19:54
Comment thread .github/workflows/convert-notebooks.yml
Comment thread .github/workflows/convert-notebooks.yml Outdated
… comments

- Replace hardcoded folder list with find command to auto-discover directories
- Remove self-explanatory comments from workflow (configure git, push branch, etc.)
- Makefile now also uses dynamic discovery via shell find command
kadolor requested review from Copilot and rbavery January 28, 2026 20:15

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


from pathlib import Path


def get_expected_mdx_names(notebook_dirs: list[Path], exclude_prefix: str) -> set[str]:

Copilot AI Feb 3, 2026

The exclude_prefix parameter should be Optional[str] since line 30 checks if exclude_prefix which suggests it can be None or empty. Update the type hint to Optional[str] and import Optional from typing.
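
A sketch of the suggested signature is shown below. Only the function name, the parameter names and the return type come from the quoted diff; the body, the None default and the choice to match the prefix against the notebook file name are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def get_expected_mdx_names(
    notebook_dirs: list[Path], exclude_prefix: Optional[str] = None
) -> set[str]:
    """Collect the MDX stems expected for the notebooks under notebook_dirs."""
    names: set[str] = set()
    for directory in notebook_dirs:
        for notebook in directory.rglob("*.ipynb"):
            if ".ipynb_checkpoints" in notebook.parts:
                continue
            # A None or empty prefix disables the exclusion entirely.
            if exclude_prefix and notebook.name.startswith(exclude_prefix):
                continue
            # Assumed naming rule: underscores become dashes.
            names.add(notebook.stem.replace("_", "-"))
    return names
```

On Python 3.10 and later, `str | None` would express the same type without importing Optional.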

Comment thread .github/workflows/scripts/cleanup_orphaned_mdx.py
Comment on lines +51 to +61
python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
${{ steps.find-dirs.outputs.dirs }} \
--mdx-dir docs/tutorials/example-notebooks \
--exclude-prefix Raster_Inference \
--output-file /tmp/removed-mdx.txt \
-v

if [ -f /tmp/removed-mdx.txt ] && [ -s /tmp/removed-mdx.txt ]; then
echo "has_deletions=true" >> $GITHUB_OUTPUT
echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
cat /tmp/removed-mdx.txt | sed 's/^/- /' >> $GITHUB_OUTPUT

Copilot AI Feb 3, 2026

Using a hardcoded path in /tmp could cause issues in environments where /tmp has restricted permissions or in concurrent workflow runs. Consider using mktemp to create a unique temporary file or use $RUNNER_TEMP which is a GitHub Actions environment variable for temporary files.

Suggested change
- python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
- ${{ steps.find-dirs.outputs.dirs }} \
- --mdx-dir docs/tutorials/example-notebooks \
- --exclude-prefix Raster_Inference \
- --output-file /tmp/removed-mdx.txt \
- -v
- if [ -f /tmp/removed-mdx.txt ] && [ -s /tmp/removed-mdx.txt ]; then
- echo "has_deletions=true" >> $GITHUB_OUTPUT
- echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
- cat /tmp/removed-mdx.txt | sed 's/^/- /' >> $GITHUB_OUTPUT
+ REMOVED_MDX_FILE="$(mktemp "${RUNNER_TEMP:-/tmp}/removed-mdx.XXXXXX")"
+ python wherobots-examples/.github/workflows/scripts/cleanup_orphaned_mdx.py \
+ ${{ steps.find-dirs.outputs.dirs }} \
+ --mdx-dir docs/tutorials/example-notebooks \
+ --exclude-prefix Raster_Inference \
+ --output-file "$REMOVED_MDX_FILE" \
+ -v
+ if [ -f "$REMOVED_MDX_FILE" ] && [ -s "$REMOVED_MDX_FILE" ]; then
+ echo "has_deletions=true" >> $GITHUB_OUTPUT
+ echo "deleted_files<<EOF" >> $GITHUB_OUTPUT
+ cat "$REMOVED_MDX_FILE" | sed 's/^/- /' >> $GITHUB_OUTPUT
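
The same idea, a unique file under the runner's temporary directory with /tmp as fallback, can be sketched in Python should the script ever choose its own output location. The function name is hypothetical; RUNNER_TEMP is the variable named in the comment above, and the removed-mdx. prefix simply mirrors the suggested mktemp template.

```python
import os
import tempfile
from pathlib import Path


def make_removed_list_path() -> Path:
    """Create a unique, empty file for the removed-MDX list and return its path."""
    # Prefer the runner's temp directory; fall back to the system default.
    base = os.environ.get("RUNNER_TEMP") or tempfile.gettempdir()
    fd, name = tempfile.mkstemp(prefix="removed-mdx.", dir=base)
    os.close(fd)  # only the path is needed; callers reopen the file themselves
    return Path(name)
```

Because mkstemp creates the file atomically with a random suffix, two runs sharing a machine would not overwrite each other's list, which is the concern the review comment raises.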

Comment thread Makefile
- # Notebook directories to convert
- NOTEBOOK_DIRS = Getting_Started/ Analyzing_Data/ Reading_and_Writing_Data/ Open_Data_Connections/ scala/
+ # Dynamically find all directories containing notebooks
+ NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f | xargs -I {} dirname {} | sort -u | grep -v ".ipynb_checkpoints")

Copilot AI Feb 3, 2026

The shell command will fail if there are no notebooks found (xargs will receive no input). Consider adding find ... -print0 | xargs -0 for better handling of filenames with spaces, and add 2>/dev/null || true to handle the case when no notebooks exist gracefully.

Suggested change
- NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f | xargs -I {} dirname {} | sort -u | grep -v ".ipynb_checkpoints")
+ NOTEBOOK_DIRS = $(shell find . -name "*.ipynb" -type f -print0 2>/dev/null | xargs -0 -I {} dirname "{}" 2>/dev/null | sort -u | grep -v ".ipynb_checkpoints" || true)
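
For comparison, the same discovery could be done in Python, where spaces in paths and the no-notebooks case need no special quoting. This is only a sketch with a hypothetical function name, not something the PR contains.

```python
from pathlib import Path


def discover_notebook_dirs(root: Path) -> list[Path]:
    """Return sorted, de-duplicated directories (relative to root) that hold notebooks."""
    dirs = set()
    for notebook in root.rglob("*.ipynb"):
        relative = notebook.relative_to(root)
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in relative.parts:
            continue
        dirs.add(relative.parent)
    return sorted(dirs)
```

An empty repository simply yields an empty list, so callers can decide explicitly whether that is an error.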

kadolor added a commit that referenced this pull request Feb 21, 2026
…CI enforcement)

Combines the work from PR #125 and PR #118:

- Add cleanup_orphaned_mdx.py script to remove orphaned MDX files when
  notebooks are deleted or renamed
- Add check-notebooks.yml CI workflow to enforce that PRs modifying
  notebooks also update the navigation config
- Dynamically discover notebook directories instead of hardcoding them
- Use git add -A to stage deletions in the docs sync workflow
- Include deleted file list in auto-generated docs PR descriptions
- Add cleanup target to Makefile, run it before convert in preview/all
- Update README to reflect NOTEBOOK_LOCATIONS config and document
  the orphan cleanup behavior

kadolor commented Feb 21, 2026

Closing in favor of #129

kadolor closed this Feb 21, 2026