
perf: cache metadata directory scans for large workflows #58

Open
yfarjoun wants to merge 7 commits into fg-labs:main from yfarjoun:yf_performance-thread-investigation

Conversation

Contributor

@yfarjoun yfarjoun commented Mar 31, 2026

Summary

  • Each poll cycle (default 1s) was recursively stat-ing every file in .snakemake/metadata/. With 19k+ files this caused significant sluggishness.
  • Added _ScanCache that tracks directory mtimes (~256 dirs) and only rescans directories that have actually changed.
  • Reduces steady-state polling from ~19k stat calls to ~256 per cycle (~75x reduction).
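For readers unfamiliar with the approach, the mechanism above can be sketched roughly as follows. This is an illustrative standalone sketch, not snakesee's actual code; the class name ScanCacheSketch and its helpers are hypothetical:

```python
import os
from pathlib import Path


class ScanCacheSketch:
    """Illustrative directory-mtime scan cache (hypothetical, not snakesee's implementation)."""

    def __init__(self) -> None:
        self._root = None          # cached root directory, as str
        self._files = {}           # file path -> file mtime (ns)
        self._dir_mtimes = {}      # dir path -> dir mtime (ns)

    def _scan_dir(self, d):
        """Stat only the direct file children of one directory (non-recursive)."""
        found = {}
        with os.scandir(d) as it:
            for entry in it:
                if entry.is_file(follow_symlinks=False):
                    found[entry.path] = entry.stat().st_mtime_ns
        return found

    def _dir_tree_mtimes(self, root):
        """Collect mtimes of the root and every subdirectory (roughly one stat per dir)."""
        mtimes = {root: os.stat(root).st_mtime_ns}
        for dirpath, dirnames, _ in os.walk(root):
            for name in dirnames:
                p = os.path.join(dirpath, name)
                mtimes[p] = os.stat(p).st_mtime_ns
        return mtimes

    def get_files(self, root: Path):
        root_s = str(root)
        current = self._dir_tree_mtimes(root_s)
        if self._root != root_s:
            # First call or a different root: full scan of every directory.
            self._root = root_s
            self._files = {}
            for d in current:
                self._files.update(self._scan_dir(d))
        else:
            changed = [d for d, m in current.items() if self._dir_mtimes.get(d) != m]
            removed = set(self._dir_mtimes) - set(current)
            stale = set(changed) | removed
            # Evict only direct children of changed/removed dirs, then rescan changed ones.
            self._files = {p: m for p, m in self._files.items()
                           if os.path.dirname(p) not in stale}
            for d in changed:
                self._files.update(self._scan_dir(d))
        self._dir_mtimes = current
        return dict(self._files)
```

In the steady state the only filesystem work is one stat per directory; file-level stats happen only inside directories whose mtime moved.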

Test plan

  • All 978 tests pass
  • Manual testing with a large workflow directory (19k+ metadata files) to confirm responsiveness improvement

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Improved directory-scan performance with an incremental scan cache that preserves results and rescans only changed directories.
    • Added an option to bypass the cache for always-fresh scans.
    • More robust handling of directory changes and targeted invalidation to avoid unnecessary work.
  • Tests

    • Added comprehensive tests validating cache hits, invalidation on filesystem changes, targeted rescans, clearing behavior, and cross-root scenarios.

…f files

Each poll cycle (default 1s) was recursively scanning and stat-ing every
file in .snakemake/metadata/. With 19k+ metadata files this caused
significant CPU and I/O overhead.

The new _ScanCache tracks directory mtimes (~256 dirs) and only rescans
directories that have actually changed, reducing steady-state polling
from ~19k stat calls to ~256.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yfarjoun yfarjoun requested a review from nh13 as a code owner March 31, 2026 15:40
@coderabbitai

coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a directory-mtime-backed incremental scan cache to snakesee/utils.py; _scandir_files can now use the cache to avoid rescanning unchanged directories. Tests in tests/test_utils.py add fixtures and a TestScanCache suite validating cache reuse, eviction, rescans, and clear semantics.

Changes

  • Directory scan & cache (snakesee/utils.py): Added _ScanCache and get_scan_cache(); added _collect_dir_mtimes, _scan_single_dir, _full_scandir; changed _scandir_files to _scandir_files(directory: Path, *, use_scan_cache: bool = True) -> Sequence[_MetadataFileInfo] and updated sorting and iterator type hints.
  • Tests for scan cache (tests/test_utils.py): Added autouse fixtures clearing global scan/metadata caches; new TestScanCache tests for cache identity reuse, mtime-triggered rescans, partial eviction on removed subdirs, root switching behavior, and clear() semantics; adjusted existing iterate tests to use the cache flag and Sequence types.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant ScanCache as _ScanCache
    participant FS as Filesystem

    Caller->>ScanCache: request scan for root (use_cache?)
    alt cache miss or use_cache == False
        ScanCache->>FS: full recursive scan
        FS-->>ScanCache: files + dir mtimes
        ScanCache-->>Caller: full file list
    else cache hit
        ScanCache->>FS: collect dir mtimes (per-directory)
        FS-->>ScanCache: current dir mtimes
        ScanCache->>ScanCache: compare mtimes -> changed/removed dirs
        alt changed or removed dirs found
            ScanCache->>FS: scan changed dirs (non-recursive)
            FS-->>ScanCache: updated file metadata
            ScanCache-->>Caller: merged updated + cached file list
        else no changes detected
            ScanCache-->>Caller: cached file list
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇
I hopped through folders, counted each time,
Saved tiny ticks so rescans take less climb,
If mtimes changed, I rustled once more,
Else I returned the list I stored before —
Nibble, bound, and scan some more! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check: ❓ Inconclusive. The PR description includes a clear summary of the problem, solution, and impact, but is missing several required template sections: Type of Change, Checklist items, Testing details, and Related Issues. Resolution: complete the description template by filling in Type of Change (mark the appropriate checkbox), verifying the Checklist, linking Related Issues (#58, or removing the section if not applicable), and expanding the Testing section with manual testing confirmation.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title concisely and accurately summarizes the main change: adding caching for metadata directory scans to improve performance in large workflows.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 92.59%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
snakesee/utils.py (1)

210-210: Sort __slots__ alphabetically.

Static analysis flags that __slots__ is not sorted.

-    __slots__ = ("_files", "_dir_mtimes", "_lock")
+    __slots__ = ("_dir_mtimes", "_files", "_lock")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakesee/utils.py` at line 210, The __slots__ tuple in the class is not
alphabetically ordered; update the __slots__ declaration to list its members
alphabetically (e.g., order "_dir_mtimes", "_files", "_lock") so the __slots__ =
("_dir_mtimes", "_files", "_lock") ordering is used; modify the __slots__ line
accordingly where __slots__ is defined to satisfy the static analysis rule.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakesee/utils.py`:
- Around line 344-357: _scandir_files currently always uses the global
_scan_cache which prevents callers of iterate_metadata_files from bypassing the
directory-scan cache when they pass use_cache=False; change _scandir_files to
accept a parameter like use_scan_cache: bool = True and, when False, perform a
fresh recursive scan instead of calling _scan_cache.get_files (or call the
existing lower-level scanner directly), then update iterate_metadata_files to
pass its use_cache value as use_scan_cache to _scandir_files so use_cache=False
truly forces a fresh directory listing; keep the public behavior unchanged when
use_scan_cache is True to use _scan_cache.
- Around line 205-207: The caching logic that "Tracks directory mtimes to detect
when new files are added" must be changed to not assume metadata files are
append-only: update the code that builds/uses _MetadataFileInfo so it records
and compares each metadata file's stat (mtime, size, inode) rather than relying
solely on directory mtime; on rescans compare per-file stats and treat modified,
added, or removed files as cache-invalidating changes (missing files should mark
the cache stale), and ensure any cache entries for directories are invalidated
when any per-file stat differs. Address this in the module code that currently
uses directory mtime checks and in the _MetadataFileInfo handling so the cache
reflects file-level modifications and deletions.
- Around line 202-255: The get_files method in _ScanCache uses an unsafe prefix
check (str(f.path).startswith(d)) and accepts any directory but only stores one
global cache; update get_files to (1) replace the startswith check with a safe
comparison that treats directory boundaries correctly (e.g., check str(f.path)
== d or str(f.path).startswith(d + os.sep) or equivalent using Path methods)
when building kept, and (2) add a simple cached-root guard: store the initial
directory (e.g., self._root_dir) on first full scan and on subsequent calls
validate that the incoming directory equals that root (raise or document error)
so the cache cannot be silently corrupted when callers pass different
directories (symbols: class _ScanCache, method get_files, attributes _files and
_dir_mtimes, and the caller _scandir_files).
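The path-boundary bug called out above is easy to reproduce. This hypothetical is_within helper illustrates the fix the comment suggests (POSIX-style "/" separator assumed for the example; real code would use os.sep or Path methods):

```python
def is_within(path: str, directory: str) -> bool:
    """Boundary-aware containment check (illustrative helper, not snakesee code)."""
    return path == directory or path.startswith(directory + "/")

# The naive prefix check wrongly matches the sibling directory /foo/bar_baz:
assert "/foo/bar_baz/f.json".startswith("/foo/bar")
# The boundary-aware check does not:
assert not is_within("/foo/bar_baz/f.json", "/foo/bar")
assert is_within("/foo/bar/f.json", "/foo/bar")
```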

---

Nitpick comments:
In `@snakesee/utils.py`:
- Line 210: The __slots__ tuple in the class is not alphabetically ordered;
update the __slots__ declaration to list its members alphabetically (e.g., order
"_dir_mtimes", "_files", "_lock") so the __slots__ = ("_dir_mtimes", "_files",
"_lock") ordering is used; modify the __slots__ line accordingly where __slots__
is defined to satisfy the static analysis rule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf1a5995-a9a4-47f3-8463-2429400aeb9e

📥 Commits

Reviewing files that changed from the base of the PR and between 453d660 and aab0451.

📒 Files selected for processing (1)
  • snakesee/utils.py

Comment thread snakesee/utils.py Outdated
Comment thread snakesee/utils.py Outdated
Comment thread snakesee/utils.py Outdated
- Fix unsafe path-prefix check: use `d + os.sep` to avoid matching
  `/foo/bar_baz` when checking for `/foo/bar`
- Track cached root directory and reset on directory change
- Fix docstring: remove incorrect "append-only" claim; directory mtime
  correctly detects modifications and deletions too
- Add use_scan_cache bypass: when iterate_metadata_files is called with
  use_cache=False, also bypass the scan cache for a fresh listing
- Sort __slots__ alphabetically

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakesee/utils.py`:
- Around line 249-264: The current eviction logic removes entire subtrees by
checking startswith on changed_prefixes; change it so we only evict cached files
whose immediate parent directory equals a changed directory (i.e., don't drop
files in deeper subdirectories). Replace the kept comprehension that uses
changed_prefixes with one that keeps f in self._files if not
any(Path(f.path).parent == Path(dir_path) for dir_path in changed_dirs), so
_scan_single_dir (which only repopulates direct files) won't cause unrelated
subtrees to vanish; leave updating self._dir_mtimes/current_mtimes and new_files
logic unchanged.
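The difference between subtree eviction and direct-child eviction can be shown on toy paths (the /meta paths below are made up for illustration):

```python
from pathlib import Path

cached = [
    "/meta/a/f1.json",      # direct child of the changed dir /meta/a
    "/meta/a/sub/f2.json",  # deeper file: must NOT be evicted
    "/meta/b/f3.json",      # unrelated dir: kept either way
]
changed_dirs = {"/meta/a"}

# Subtree eviction (too aggressive): also drops /meta/a/sub/f2.json,
# which a non-recursive rescan of /meta/a would never repopulate.
subtree_kept = [p for p in cached
                if not any(p.startswith(d + "/") for d in changed_dirs)]

# Direct-child eviction: only files whose immediate parent changed are dropped.
direct_kept = [p for p in cached
               if str(Path(p).parent) not in changed_dirs]

assert "/meta/a/sub/f2.json" not in subtree_kept   # lost by subtree eviction
assert "/meta/a/sub/f2.json" in direct_kept        # preserved by direct-child eviction
assert "/meta/a/f1.json" not in direct_kept        # still rescanned as intended
```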
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c972074c-5728-477d-a9c8-82569ad47e98

📥 Commits

Reviewing files that changed from the base of the PR and between aab0451 and ef2d51b.

📒 Files selected for processing (1)
  • snakesee/utils.py

Comment thread snakesee/utils.py Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread snakesee/utils.py

def _scandir_files(directory: Path) -> list[_MetadataFileInfo]:
"""Recursively scan directory for files using os.scandir (faster than rglob).
class _ScanCache:
Collaborator


issue: can you add unit tests:

  • First call performs full scan
  • No-change call returns cached results
  • Adding/removing a file triggers rescan of that directory only
  • Removing an entire directory removes its files
  • Different root directory triggers full rescan
  • clear() resets state

Comment thread snakesee/utils.py
self._dir_mtimes = current_mtimes
return list(self._files)

def clear(self) -> None:
Collaborator


issue: needs to be called in the test infrastructure once unit tests are added, to avoid flaky tests

Comment thread snakesee/utils.py Outdated
# First call or different directory — full scan
self._root = dir_str
self._files, self._dir_mtimes = _full_scandir(directory)
return list(self._files)
Collaborator


issue (blocking):

There's a list() copy on every hot-path return here and below. A tuple return or a generation-counter approach could avoid this allocation on the hot path.

- Return tuples from _ScanCache.get_files() to avoid list() copy on
  every hot-path return (callers that need mutability convert explicitly)
- Fix subtree eviction: only evict direct children of changed dirs,
  preserving files in unchanged subdirectories
- Add _ScanCache unit tests: first call, cached results, add/remove
  files, remove directory, different root, clear, subdirectory
  preservation
- Clear global scan cache in test fixtures to prevent flaky tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
snakesee/utils.py (2)

358-375: ⚠️ Potential issue | 🟠 Major

Line 375 puts the full list() copy back on the steady-state path.

The cache now stores a tuple, but list(_scan_cache.get_files(directory)) still allocates and copies every cached result. That restores an O(n) hot-path cost for large metadata trees, which is exactly what this PR is trying to remove.

♻️ One way to keep the cached path allocation-free
-from collections.abc import Iterator
+from collections.abc import Iterator, Sequence
...
-def _scandir_files(directory: Path, *, use_scan_cache: bool = True) -> list[_MetadataFileInfo]:
+def _scandir_files(directory: Path, *, use_scan_cache: bool = True) -> Sequence[_MetadataFileInfo]:
...
     if not use_scan_cache:
         files, _ = _full_scandir(directory)
         return files
-    return list(_scan_cache.get_files(directory))
+    return _scan_cache.get_files(directory)
...
-    if sort_by_mtime:
-        files.sort(key=lambda f: f.mtime, reverse=newest_first)
+    if sort_by_mtime:
+        files = sorted(files, key=lambda f: f.mtime, reverse=newest_first)

The sequential/parallel helper annotations should be widened to Sequence[_MetadataFileInfo] too.
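The allocation point is simple to demonstrate in isolation (a generic sketch; get_files_cached is a hypothetical stand-in for the cached path):

```python
from collections.abc import Sequence

# Returning the cached tuple directly, typed as Sequence, avoids an O(n)
# list() copy on every cached call; callers that need a mutable or sorted
# view allocate explicitly, and only when they actually need it.
def get_files_cached(cache: tuple[str, ...]) -> Sequence[str]:
    return cache  # zero-copy: hands back the cached object itself

cached = ("b.json", "a.json")
result = get_files_cached(cached)
assert result is cached                        # same object, no allocation
assert sorted(result) == ["a.json", "b.json"]  # sorting allocates a new list on demand
```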

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakesee/utils.py` around lines 358 - 375, _scandir_files currently forces an
unnecessary allocation by returning list(_scan_cache.get_files(directory)),
undoing the cache's zero-copy intent; change the cached-path return to return
the cached sequence directly (i.e., return _scan_cache.get_files(directory)) and
update any helper/annotation types from list[_MetadataFileInfo] to
Sequence[_MetadataFileInfo] (and widen signatures/usages that reference
_scandir_files and helpers to Sequence[_MetadataFileInfo]) so the steady-state
path remains allocation-free while the non-cached path still calls _full_scandir
and returns a concrete list.

232-249: ⚠️ Potential issue | 🔴 Critical

Directory mtimes won't catch in-place metadata rewrites.

When current_mtimes matches _dir_mtimes, this reuses the old _MetadataFileInfo tuple without re-stat-ing any files. Parent directory mtimes change for add/remove/rename, not for rewriting an existing file, so a cached call can keep returning stale JSON after a.json is overwritten in place because _read_metadata_file() will also see the stale file_info.mtime/inode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakesee/utils.py` around lines 232 - 249, The code only compares directory
mtimes (current_mtimes vs self._dir_mtimes) so when they match it returns the
cached self._files, which misses in-place file rewrites; instead, when no
changed_dirs and no removed_dirs, iterate the existing cached entries in
self._files, re-stat each file (e.g., os.stat) and compare st_mtime and st_ino
against the stored values in the _MetadataFileInfo for that file; if any file's
stat differs, call _read_metadata_file(path) to refresh that file's entry and
update both self._files and the underlying per-file metadata (and optionally
self._dir_mtimes if needed). Ensure you reference and update the same
_MetadataFileInfo structure and keep using _read_metadata_file for consistency.
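The failure mode described here is easy to demonstrate: on POSIX filesystems a directory's mtime changes when entries are added, removed, or renamed, but not when an existing file is rewritten in place. A standalone demonstration (not snakesee code):

```python
import os
import tempfile
import time

d = tempfile.mkdtemp()
f = os.path.join(d, "a.json")
with open(f, "w") as fh:
    fh.write("old")

dir_before = os.stat(d).st_mtime_ns
file_before = os.stat(f).st_mtime_ns

time.sleep(0.05)
with open(f, "w") as fh:  # truncate and rewrite the same directory entry in place
    fh.write("new")

# The parent directory's mtime is untouched, so a dir-mtime cache sees "no change"...
assert os.stat(d).st_mtime_ns == dir_before
# ...but a per-file stat catches the rewrite.
assert os.stat(f).st_mtime_ns != file_before
```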
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@snakesee/utils.py`:
- Around line 358-375: _scandir_files currently forces an unnecessary allocation
by returning list(_scan_cache.get_files(directory)), undoing the cache's
zero-copy intent; change the cached-path return to return the cached sequence
directly (i.e., return _scan_cache.get_files(directory)) and update any
helper/annotation types from list[_MetadataFileInfo] to
Sequence[_MetadataFileInfo] (and widen signatures/usages that reference
_scandir_files and helpers to Sequence[_MetadataFileInfo]) so the steady-state
path remains allocation-free while the non-cached path still calls _full_scandir
and returns a concrete list.
- Around line 232-249: The code only compares directory mtimes (current_mtimes
vs self._dir_mtimes) so when they match it returns the cached self._files, which
misses in-place file rewrites; instead, when no changed_dirs and no
removed_dirs, iterate the existing cached entries in self._files, re-stat each
file (e.g., os.stat) and compare st_mtime and st_ino against the stored values
in the _MetadataFileInfo for that file; if any file's stat differs, call
_read_metadata_file(path) to refresh that file's entry and update both
self._files and the underlying per-file metadata (and optionally
self._dir_mtimes if needed). Ensure you reference and update the same
_MetadataFileInfo structure and keep using _read_metadata_file for consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41043795-db10-42d7-a8fd-7c95a451ca5c

📥 Commits

Reviewing files that changed from the base of the PR and between ef2d51b and 7e4b8a9.

📒 Files selected for processing (2)
  • snakesee/utils.py
  • tests/test_utils.py

yfarjoun and others added 2 commits April 14, 2026 16:54
- Return Sequence from _scandir_files instead of list, avoiding O(n)
  copy on every cached call. Widen helper signatures to match.
- Use sorted() instead of .sort() at the call site.
- Detect in-place file modifications by re-stat-ing cached files when
  no directory changes are found (catches mtime/inode shifts).
- Add test for in-place rewrite detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakesee/utils.py`:
- Around line 249-287: The fast-path currently still stats every cached file in
the block starting at "if not changed_dirs and not removed_dirs:" (using
self._files, stat(), and _MetadataFileInfo) which makes steady-state O(files)
and also inconsistently leaves in-place rewrites stale when any directory change
occurs; fix this by making the fast path directory-only: remove the per-file
stat loop and any_changed handling in that block so it simply returns
self._files when no directory mtimes changed, and extract the file-level
revalidation into a separate method (e.g., _revalidate_files or a configurable
"fresh_scan" mode) that callers can opt into when full file-level freshness is
required; ensure _dir_mtimes/current_mtimes logic and the existing eviction
logic (changed_dirs/removed_dirs, _scan_single_dir, kept/new_files) remain
unchanged and update any tests that assumed implicit per-file stats.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44d349a1-5d7b-41d1-9c2f-fb1d541d5021

📥 Commits

Reviewing files that changed from the base of the PR and between 3c3e015 and f8e48e2.

📒 Files selected for processing (2)
  • snakesee/utils.py
  • tests/test_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_utils.py

Comment thread snakesee/utils.py
Remove per-file stat() loop from the no-change fast path. Snakemake
metadata files are write-once, so in-place rewrites don't need
detection. Callers needing file-level freshness can bypass the cache
via use_scan_cache=False.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yfarjoun
Contributor Author

@coderabbitai help

@coderabbitai

coderabbitai bot commented Apr 15, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
  • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
  • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve merge conflict to automatically resolve merge conflicts.
  • @coderabbitai autofix to automatically fix issues identified in unresolved review comments.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@yfarjoun
Contributor Author

@nh13 I think this is ready for your review.

@coderabbitai

coderabbitai bot commented Apr 15, 2026

@yfarjoun: Sure! I'll kick off a review of PR #58 right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@yfarjoun
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Apr 15, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
tests/test_utils.py (2)

285-304: Remove redundant local time imports.

Lines 287 and 304 import time locally, but it's already imported at the module level (line 4). These local imports are unnecessary.

♻️ Proposed fix
     def test_sorts_by_mtime_newest_first(self, tmp_path: Path) -> None:
         """Test files are sorted by mtime, newest first by default."""
-        import time
-
         metadata_dir = tmp_path / "metadata"
     def test_sorts_oldest_first_when_requested(self, tmp_path: Path) -> None:
         """Test files sorted oldest first when newest_first=False."""
-        import time
-
         metadata_dir = tmp_path / "metadata"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils.py` around lines 285 - 304, Remove the redundant local
imports of the time module inside the test functions
test_sorts_by_mtime_newest_first and test_sorts_oldest_first_when_requested;
since time is already imported at module scope, delete the two local "import
time" lines so the tests use the top-level import and avoid
shadowing/redundancy.

437-456: Consider moving shutil import to module level.

The shutil import at line 449 could be moved to the top of the file with other imports for consistency.

♻️ Proposed fix

At the top of the file, add:

import shutil

Then remove line 449:

         # Remove subdirectory
-        import shutil
-
         time.sleep(0.05)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils.py` around lines 437 - 456, The test function
test_removing_directory_removes_its_files contains a local import of shutil;
move "import shutil" to the module top-level imports and remove the inline
import in test_removing_directory_removes_its_files so imports are consistent;
update the file imports block to include shutil and ensure no other references
rely on a local import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_utils.py`:
- Around line 285-304: Remove the redundant local imports of the time module
inside the test functions test_sorts_by_mtime_newest_first and
test_sorts_oldest_first_when_requested; since time is already imported at module
scope, delete the two local "import time" lines so the tests use the top-level
import and avoid shadowing/redundancy.
- Around line 437-456: The test function
test_removing_directory_removes_its_files contains a local import of shutil;
move "import shutil" to the module top-level imports and remove the inline
import in test_removing_directory_removes_its_files so imports are consistent;
update the file imports block to include shutil and ensure no other references
rely on a local import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a78fd24-62b6-4116-a07c-8bf06acf6e7b

📥 Commits

Reviewing files that changed from the base of the PR and between 453d660 and b1a9831.

📒 Files selected for processing (2)
  • snakesee/utils.py
  • tests/test_utils.py
