fix(rag): 20-item review pass — security, perf, arch, simplification #14
Conversation
…ctory

Splits the 1429-line mcp_bridge.rs into a mcp_bridge/ directory module:

- mod.rs: shared types (McpSource, BridgeConfig, SyncReport, BridgeDocument, WriteStatus), the McporterExecutor trait, the McpBridge<E> struct plus sync() dispatch, call_data(), private helper functions, and all tests
- executor.rs: ProcessMcporterExecutor implementation (59 lines)
- io.rs: write_bridge_document(), relative_output_path(), build_extraction(), and file I/O helpers (129 lines)
- linkding.rs: sync_linkding() + normalize_linkding_record() (120 lines)
- memos.rs: sync_memos() + normalize_memo_record() (86 lines)
- bytestash.rs: sync_bytestash() + normalize_bytestash_record() (105 lines)
- paperless.rs: sync_paperless() + fetch_paperless_lookup() + normalize_paperless_record() (127 lines)

Zero behavior change. Public API identical. All platform files < 200 lines. A sketch of the resulting module wiring is shown after this message.
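To make the layout concrete, here is a minimal sketch of what the new mod.rs could look like. Everything beyond the module names and the types listed above (field layouts, visibility, re-exports) is an assumption, not the actual file:

```rust
// mcp_bridge/mod.rs — hypothetical wiring for the split described above.
mod bytestash;
mod executor;
mod io;
mod linkding;
mod memos;
mod paperless;

pub use executor::ProcessMcporterExecutor;

// Shared types stay at the module root so the public API matches the
// old single-file mcp_bridge.rs.
pub enum McpSource {
    Linkding,
    Memos,
    Bytestash,
    Paperless,
}

pub trait McporterExecutor {
    // The real trait invokes mcporter and returns parsed JSON; its exact
    // signature is not shown in this PR excerpt.
}

pub struct McpBridge<E: McporterExecutor> {
    executor: E,
}
```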
…fice_zip_file
Replace bare entry.read_to_string() in the ODT/PPTX branch with the same
two-pass pattern already used by the DOCX branch: advisory entry.size() checks
remain as a fast-path, but actual reads are now capped by .take(per_entry_cap+1)
and measured against MAX_ODT_PPTX_PER_ENTRY_BYTES (10 MiB) and
MAX_ODT_PPTX_TOTAL_BYTES (50 MiB cumulative).
A crafted zip that declares tiny central-directory sizes but expands to
gigabytes is now rejected before allocating unbounded memory.
Tests: odt_decompression_bomb_rejected_by_measured_guard,
odt_cumulative_decompression_bomb_rejected,
odt_small_legitimate_file_parses_ok,
docx_entry_count_bomb_rejected
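The same guard, reduced to its core: a minimal sketch assuming the constant names from the commit message and a generic reader in place of the real zip entry type; the actual function and error type in parse_office_zip_file differ.

```rust
use std::io::Read;

// Caps from the commit message; exact constant names are assumptions.
const MAX_ODT_PPTX_PER_ENTRY_BYTES: u64 = 10 * 1024 * 1024; // 10 MiB
const MAX_ODT_PPTX_TOTAL_BYTES: u64 = 50 * 1024 * 1024; // 50 MiB cumulative

/// Read one decompressed entry under a measured cap. Taking cap + 1 bytes
/// lets an over-budget entry be detected without decompressing all of it,
/// regardless of what the (attacker-controlled) central directory claims.
fn read_entry_capped<R: Read>(entry: &mut R, total_so_far: u64) -> Result<String, String> {
    let remaining = MAX_ODT_PPTX_TOTAL_BYTES.saturating_sub(total_so_far);
    let per_entry_cap = MAX_ODT_PPTX_PER_ENTRY_BYTES.min(remaining);
    let mut xml = String::new();
    let read_bytes = entry
        .take(per_entry_cap + 1)
        .read_to_string(&mut xml)
        .map_err(|e| e.to_string())? as u64;
    if read_bytes > per_entry_cap {
        return Err("decompressed entry exceeds ODT/PPTX budget".into());
    }
    Ok(xml)
}
```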
…ion in chunk()
The token_estimate diagnostic field was previously computed by calling
tokenizer.encode() on every emitted chunk, doubling tokenizer invocations
on the spawn_blocking hot path (the splitter already runs the tokenizer
internally via ChunkConfig::with_sizer for accurate boundary placement).
ChunkCharIndex in text-splitter 0.25 does not expose a token count field,
so the fix replaces token_estimate(&text, tokenizer) with word_count(&text).
The splitter's boundary accuracy is unchanged; only the diagnostic metadata
field now uses a word-count approximation.
Also removes the now-unused token_estimate() helper to keep clippy clean.
Tests: chunk_token_estimate_uses_word_count_not_tokenizer,
chunk_empty_content_returns_empty
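For reference, the replacement diagnostic is just a whitespace split. A minimal sketch — the real helper lives in chunker.rs and its exact signature is an assumption:

```rust
/// Cheap token-count proxy for the diagnostic metadata field only.
/// Chunk boundaries still come from the tokenizer-backed splitter.
pub(crate) fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```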
…move direct field access
Pipeline::new still accepts RagConfig by value and wraps it with Arc::new internally — call sites are unchanged. WorkerContext.config: Arc<RagConfig> clones the Arc pointer (O(1)) instead of deep-cloning the config value per worker.
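A minimal sketch of the ownership change, with struct fields reduced to the relevant one (names other than Pipeline::new, WorkerContext, and RagConfig are assumptions):

```rust
use std::sync::Arc;

pub struct RagConfig {/* fields elided */}

pub struct Pipeline {
    config: Arc<RagConfig>,
}

pub struct WorkerContext {
    pub config: Arc<RagConfig>,
}

impl Pipeline {
    // Call sites still pass RagConfig by value; the Arc wrap is internal.
    pub fn new(config: RagConfig) -> Self {
        Self { config: Arc::new(config) }
    }

    // Each worker clones the Arc pointer (O(1)), not the config value.
    pub fn worker_context(&self) -> WorkerContext {
        WorkerContext { config: Arc::clone(&self.config) }
    }
}
```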
…se-in-depth design
…roots to spawn_workers

watch_roots is now computed in runtime::run(), passed directly to spawn_workers() as Arc<Vec<PathBuf>>, and forwarded to WorkerContext::from_pipeline() as a parameter. Eliminates the runtime panic from OnceLock::get().expect() and enforces the invariant at compile time instead.
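A sketch of the threading, assuming runtime::run() canonicalizes the roots once; names other than spawn_workers and WorkerContext are invented for illustration:

```rust
use std::path::PathBuf;
use std::sync::Arc;

fn run(configured_roots: Vec<PathBuf>) {
    // Computed once here instead of stashed in a OnceLock.
    let watch_roots: Arc<Vec<PathBuf>> = Arc::new(configured_roots);
    spawn_workers(Arc::clone(&watch_roots));
}

fn spawn_workers(watch_roots: Arc<Vec<PathBuf>>) {
    // Every worker receives the roots as a constructor argument, so a
    // missing-initialization panic is impossible by construction.
    let _ctx = WorkerContext { watch_roots };
}

struct WorkerContext {
    watch_roots: Arc<Vec<PathBuf>>,
}
```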
…-file dispatch overhead
…ocking_parse helper
t0 previously captured canonicalize + open + read_to_end time, then parse_ms was recorded before parse_file() was called — measuring I/O, not parse CPU. Split into io_t0 (file I/O through read_to_end) and parse_t0 (wrapping parse_file only). Add io_ms field to JobStats, total_io_ms to SessionCounters/CounterSnapshot, and emit both in the per-job tracing::info and the heartbeat log so operators can distinguish I/O-bound from parse-bound workloads.
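The split in miniature; local names follow the commit message (io_t0/parse_t0), while the surrounding function and parser are stand-ins:

```rust
use std::time::Instant;

fn process_one(path: &std::path::Path) -> std::io::Result<(u64, u64)> {
    // I/O phase: canonicalize + open + read_to_end.
    let io_t0 = Instant::now();
    let canonical = std::fs::canonicalize(path)?;
    let bytes = std::fs::read(&canonical)?;
    let io_ms = io_t0.elapsed().as_millis() as u64;

    // Parse phase: CPU only, so parse_ms no longer includes disk time.
    let parse_t0 = Instant::now();
    let _parsed = parse_file(&bytes);
    let parse_ms = parse_t0.elapsed().as_millis() as u64;

    Ok((io_ms, parse_ms))
}

fn parse_file(bytes: &[u8]) -> usize {
    bytes.len() // stand-in for the real parser
}
```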
…surface confinement violations

Replace RagError::Generic at all five validate_url_scheme call sites with RagError::UrlValidation so callers can pattern-match on URL-specific failures without parsing error strings. Replace the chunker spawn_blocking join-handle panic with RagError::WorkerPanic for the same reason.

Surface path confinement rejections to monitoring by calling ctx.counters.record_failure() before the silent Ok(default) return at the TOCTOU guard — confinement rejections are now visible in files_failed heartbeat metrics even though no Err is propagated.
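A sketch of the added variants, assuming the crate uses thiserror; the existing variants and exact payload types are not shown in this PR excerpt:

```rust
#[derive(Debug, thiserror::Error)]
pub enum RagError {
    #[error("URL validation failed: {0}")]
    UrlValidation(String),

    #[error("worker task panicked: {0}")]
    WorkerPanic(String),

    #[error("{0}")]
    Generic(String),
    // ...remaining variants elided
}

// Callers can now match on the failure class instead of parsing strings:
fn is_url_failure(err: &RagError) -> bool {
    matches!(err, RagError::UrlValidation(_))
}
```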
…eliminate 14 mut locals in build_point_payload
…in process_job

Add find_cached_branch() helper that walks ancestor directories of the already-canonical file path against the DashMap cache (O(depth) in-memory lookups, no I/O). If any ancestor matches a cached git root, return the stored branch immediately without spawning a blocking thread.

On cache miss, canonicalize the detected git root before storing it so future pre-checks via canonical path ancestors always match, regardless of symlinks or non-normalized input paths. Also cache the miss (None) at the immediate parent directory to skip re-stating .git for files outside any git repository.

This eliminates spawn_blocking for every file after the first in each git repository, which is the common case for large watched directories.
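A sketch of the ancestor-walk fast path, assuming a DashMap<PathBuf, Option<String>> cache where the inner None records a confirmed "not in any git repo" miss; the real helper's signature may differ:

```rust
use dashmap::DashMap;
use std::path::{Path, PathBuf};

/// Walk ancestors of an already-canonical file path against the cache.
/// O(depth) in-memory lookups; no filesystem I/O and no spawn_blocking.
fn find_cached_branch(
    cache: &DashMap<PathBuf, Option<String>>,
    canonical_file: &Path,
) -> Option<Option<String>> {
    for ancestor in canonical_file.ancestors().skip(1) {
        if let Some(entry) = cache.get(ancestor) {
            // Outer Some = cache hit; inner Option = branch or cached miss.
            return Some(entry.value().clone());
        }
    }
    None // no ancestor cached yet: caller falls back to blocking detection
}
```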
…iminate N-per-chunk clones
…iminate double-collect
📝 Walkthrough

The PR refactors noxa-rag by simplifying token estimation to a word-count approximation, modularizing MCP bridge implementations into platform-specific modules, adding XML entity-expansion security checks, improving embedding batching, making the parse module public with refactored payload construction, splitting pipeline timing metrics into I/O and parsing, converting shared config to Arc-based ownership, and introducing new error variants.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor User
participant Pipeline
participant Executor
participant MCPServer
participant FileIO
participant Store
User->>Pipeline: Configure MCP bridge (linkding/memos/bytestash/paperless)
Pipeline->>Executor: spawn_workers with watch_roots Arc
loop Sync Loop (per platform)
Executor->>MCPServer: Execute action (list/lookup)
MCPServer-->>Executor: JSON response
Executor->>Executor: Normalize record to BridgeDocument
Executor->>FileIO: write_bridge_document
FileIO->>FileIO: Compute relative_output_path
FileIO->>FileIO: Compare serialized bytes<br/>(avoid rewrite)
FileIO->>FileIO: Atomic write (temp + rename)
FileIO-->>Executor: WriteStatus
Executor->>Executor: Update SyncReport counters
end
Executor-->>Pipeline: SyncReport (fetched/written/skipped)
```

```mermaid
sequenceDiagram
participant Worker
participant Parser as ParseModule
participant Security
participant Chunker
participant Store
Worker->>Parser: parse_file (bytes, url, provenance)
alt Is XML/RSS/HTML
Parser->>Security: contains_xml_entity_expansion_risk(bytes)
Security-->>Parser: bool
alt Risk Detected
Parser-->>Worker: RagError::Parse
else Safe
Parser->>Parser: parse_html_file or parse_xml_file
end
else Binary (PDF/Office)
Parser->>Parser: Decompress with budget caps
alt Exceeds Budget
Parser-->>Worker: RagError::Parse
else Within Budget
Parser->>Parser: Extract text
end
end
Parser->>Chunker: Split text into chunks
Chunker->>Chunker: word_count approximation<br/>(replaces tokenizer)
loop For each chunk
Parser->>Parser: build_point_payload(chunk, FileMetadata)
Parser->>Store: Insert PointPayload
end
Parser-->>Worker: Emit JobStats with io_ms + parse_ms
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Pull request overview
This PR is a broad hardening + performance + refactor pass across noxa-rag’s ingestion pipeline (parsing → chunking → embedding → Qdrant upsert), plus a new MCP bridge module split intended to replace the previous monolithic mcp_bridge.rs.
Changes:
- Adds multiple ingestion safety guards (zip bomb caps, XML entity expansion pre-scan, startup-scan JSON size cap, safer logging) and improves error taxonomy.
- Improves ingestion throughput/latency (startup scan batching, avoids double tokenization, avoids redundant clones, git-branch cache fast path, improved timing metrics).
- Refactors parsing/provenance and pipeline internals (per-file metadata extraction, provenance application, watch roots handling, new MCP bridge module layout).
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/noxa-rag/src/types.rs | Removes IngestionContext and keeps provenance fields in PointPayload. |
| crates/noxa-rag/src/store/qdrant/vector_store.rs | Switches Qdrant upsert to wait=true to avoid post-upsert delete visibility races. |
| crates/noxa-rag/src/pipeline/worker.rs | Passes canonicalized watch_roots into worker context explicitly. |
| crates/noxa-rag/src/pipeline/startup_scan.rs | Batches startup_scan_key computations per spawn_blocking task to reduce overhead. |
| crates/noxa-rag/src/pipeline/scan.rs | Adds symlink-safety docs and a JSON size guard + tests for startup_scan_key. |
| crates/noxa-rag/src/pipeline/runtime.rs | Removes OnceLock watch-roots setup; threads watch_roots into worker spawn. |
| crates/noxa-rag/src/pipeline/process.rs | Separates IO vs parse timing, adds git-branch cache precheck, improves failure accounting, and uses new error variants. |
| crates/noxa-rag/src/pipeline/parse/text.rs | Uses shared word_count, makes HTML parse sync (spawned via helper), and adds XML entity-expansion pre-scan. |
| crates/noxa-rag/src/pipeline/parse/tests.rs | Adds tests ensuring XML/feed entity-expansion payloads are rejected. |
| crates/noxa-rag/src/pipeline/parse/rich.rs | Applies the XML entity-expansion pre-scan to feed parsing; uses shared word_count. |
| crates/noxa-rag/src/pipeline/parse/mod.rs | Adds FormatProvenance::apply, introduces FileMetadata, normalizes spawn-blocking usage, and simplifies payload building. |
| crates/noxa-rag/src/pipeline/parse/binary.rs | Adds measured ODT/PPTX decompression caps and tests; uses shared word_count. |
| crates/noxa-rag/src/pipeline/heartbeat.rs | Emits io_ms in heartbeat snapshots. |
| crates/noxa-rag/src/pipeline.rs | Adds IO timing counters, refactors counters API, wraps config in Arc, removes OnceLock watch-roots. |
| crates/noxa-rag/src/mcp_bridge/paperless.rs | Adds Paperless MCP normalization + sync logic. |
| crates/noxa-rag/src/mcp_bridge/mod.rs | New MCP bridge module root with shared helpers, executor trait, sync orchestration, and tests. |
| crates/noxa-rag/src/mcp_bridge/memos.rs | Adds Memos MCP sync + record normalization. |
| crates/noxa-rag/src/mcp_bridge/linkding.rs | Adds Linkding MCP sync + record normalization (including platform URL building). |
| crates/noxa-rag/src/mcp_bridge/io.rs | Adds bridge document storage format + atomic-ish write helper and path sanitization. |
| crates/noxa-rag/src/mcp_bridge/executor.rs | Adds ProcessMcporterExecutor to invoke mcporter as a subprocess and parse JSON output. |
| crates/noxa-rag/src/mcp_bridge/bytestash.rs | Adds Bytestash MCP sync + record normalization. |
| crates/noxa-rag/src/error.rs | Adds UrlValidation and WorkerPanic error variants. |
| crates/noxa-rag/src/embed/tei.rs | Avoids cloning batch texts and omits retry body logs when an auth token is configured. |
| crates/noxa-rag/src/chunker.rs | Makes word_count reusable, switches token estimate to word-count approximation, and adds tests to prevent re-tokenization regressions. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd4243a979
- Remove mcp_bridge.rs (was deleted on disk but not staged in commit 59a195c; caused E0761 duplicate module root alongside mcp_bridge/)
- mcp_bridge/io.rs: remove destination before rename so write_bridge_document is idempotent on Windows (rename fails if dst exists)
- parse/text.rs: scan raw bytes in contains_xml_entity_expansion_risk instead of from_utf8().unwrap_or("") — non-UTF-8 bytes no longer silently suppress the <!DOCTYPE/<!ENTITY guard (see the sketch after this list)
- pipeline/scan.rs: replace hardcoded MAX_SCAN_JSON_BYTES with a max_json_bytes parameter threaded from pipeline.config.pipeline.max_file_size_bytes
- pipeline/scan.rs: fix startup_scan_key_json_at_size_limit_is_processed to actually exercise the boundary condition (len == limit, not a tiny file)

Resolves review threads PRRT_kwDOR_mP6c59t3qL, PRRT_kwDOR_mP6c59t25j, PRRT_kwDOR_mP6c59t25d, PRRT_kwDOR_mP6c59t25O, PRRT_kwDOR_mP6c59t3qN, PRRT_kwDOR_mP6c59t25n
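The raw-byte entity scan mentioned above, in miniature. This is a sketch only: the real function may distinguish XML flavors, and the helper name windows_contains is invented:

```rust
/// Case-insensitive scan of raw bytes for DOCTYPE/ENTITY declarations.
/// Operating on bytes (not a lossy UTF-8 conversion) means invalid UTF-8
/// can no longer slip past the guard.
fn contains_xml_entity_expansion_risk(bytes: &[u8]) -> bool {
    let upper: Vec<u8> = bytes.iter().map(u8::to_ascii_uppercase).collect();
    windows_contains(&upper, b"<!DOCTYPE") || windows_contains(&upper, b"<!ENTITY")
}

fn windows_contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|window| window == needle)
}
```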
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
crates/noxa-rag/src/embed/tei.rs (2)
266-275: ⚠️ Potential issue | 🟡 Minor
Body still leaks via the non-retryable error path when `auth_token` is set.

The retry-log omission added above is undone here: on any non-success status that isn't 429/503/413 (or after `should_retry` returns false), the response body is read and a 512-char preview is embedded into `RagError::Embed { message }`. That error message is virtually always logged by upstream callers, so the same TEI body bytes the retry path now suppresses will still surface in logs whenever auth is configured. Since the PR's stated goal is "Omit TEI response body from retry logs when auth_token is configured", this is a consistency/security gap worth closing in the same place.

🛡️ Suggested fix

```diff
-let body = resp.text().await.unwrap_or_default();
-let preview: String = body.chars().take(512).collect();
-return Err(RagError::Embed {
-    message: format!("TEI /embed returned HTTP {status_u16}: {preview}"),
-    status: Some(status_u16),
-});
+let message = if self.auth_token.is_some() {
+    // Body omitted to avoid leaking auth-gated response content into error logs.
+    format!("TEI /embed returned HTTP {status_u16} (body omitted: auth token configured)")
+} else {
+    let body = resp.text().await.unwrap_or_default();
+    let preview: String = body.chars().take(512).collect();
+    format!("TEI /embed returned HTTP {status_u16}: {preview}")
+};
+return Err(RagError::Embed {
+    message,
+    status: Some(status_u16),
+});
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/embed/tei.rs` around lines 266-275, the non-retryable error path still reads and includes a 512-char TEI response body preview in RagError::Embed, leaking sensitive TEI content when an auth token is configured; change the branch that constructs the error so it does NOT read/attach the response body if the request used an auth token (e.g., check the same auth_token flag/variable used earlier), and instead produce a safe generic message (or omit the preview) in RagError::Embed; retain the body preview only when no auth token is present so behavior matches the earlier retry-path omission.
332-352: 🧹 Nitpick | 🔵 Trivial
Refactor is correct; the intermediate `Vec` is unnecessary.

Slicing `&texts[start..end]` inside `async move` cleanly avoids the previous per-batch `Vec<String>` clone, and the borrow is sound because `self` and `texts` outlive the `buffered(..).collect()` await. One optional nit: `futures::stream::iter` accepts any `IntoIterator`, so the `batch_ranges: Vec<(usize, usize, usize)>` materialization adds an extra allocation for no benefit — you can feed the range iterator directly.

♻️ Optional simplification

```diff
-let batch_ranges: Vec<(usize, usize, usize)> = (0..total_batches)
-    .map(|i| {
-        let start = i * BATCH_SIZE;
-        let end = (start + BATCH_SIZE).min(texts.len());
-        (i, start, end)
-    })
-    .collect();
-
-let results: Vec<Vec<Vec<f32>>> = futures::stream::iter(batch_ranges)
+let results: Vec<Vec<Vec<f32>>> = futures::stream::iter((0..total_batches).map(|i| {
+    let start = i * BATCH_SIZE;
+    let end = (start + BATCH_SIZE).min(texts.len());
+    (i, start, end)
+}))
     .map(|(batch_idx, start, end)| async move {
         self.embed_batch_adaptive(&texts[start..end], batch_idx, total_batches)
             .await
     })
     .buffered(EMBED_PIPELINE_DEPTH)
     .collect::<Vec<_>>()
     .await
     .into_iter()
     .collect::<Result<_, _>>()?;
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/embed/tei.rs` around lines 332-352, the intermediate Vec batch_ranges is an unnecessary allocation; remove the batch_ranges binding and pass the iterator directly to futures::stream::iter by using (0..total_batches).map(|i| { let start = i * BATCH_SIZE; let end = (start + BATCH_SIZE).min(texts.len()); (i, start, end) }) so the stream maps over those tuples and calls self.embed_batch_adaptive(&texts[start..end], batch_idx, total_batches).await, keeping the existing buffered(EMBED_PIPELINE_DEPTH).collect().await.into_iter().collect::<Result<_, _>>()? plumbing intact.

crates/noxa-rag/src/types.rs (1)
213-215: ⚠️ Potential issue | 🟡 Minor
Fix `cargo fmt` failure: extra blank line.

CI reported a `cargo fmt --check --all` diff. The double blank line between the closing `}` of `SearchMetadataFilter` and the `#[cfg(test)]` module looks like the culprit.

📐 Proposed fix

```diff
 }
-
 #[cfg(test)]
 mod tests {
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/types.rs` around lines 213-215, remove the extra blank line between the end of the SearchMetadataFilter type declaration (the closing "}" for SearchMetadataFilter) and the subsequent #[cfg(test)] module; ensure there is only a single newline separating them so the file matches cargo fmt formatting (i.e., place #[cfg(test)] immediately after one blank line following the SearchMetadataFilter closing brace).

crates/noxa-rag/src/pipeline/parse/text.rs (1)
171-199: ⚠️ Potential issue | 🟡 Minor
Fix `cargo fmt` failure on `parse_xml_file` signature/body wrapping.

CI reports a fmt diff in this region (likely the `Result<ParsedFile, RagError>` signature wrapping or the `Err(RagError::Parse(...))` block). Run `cargo fmt --all` and commit.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/pipeline/parse/text.rs` around lines 171-199, the function parse_xml_file has formatting that fails cargo fmt (signature/body wrapping around Result<ParsedFile, RagError> and the Err(RagError::Parse(...)) block); run `cargo fmt --all` to reformat the file, ensure the parse_xml_file signature and its Err return are wrapped/indented per rustfmt rules, then add and commit the formatted changes so CI passes (locate parse_xml_file and the Err(RagError::Parse(...)) return to verify formatting).

crates/noxa-rag/src/pipeline/scan.rs (1)
spawn_startup_scancall site wiringmax_json_bytesto config source.The doc note at lines 107-108 correctly explains the design, but the relationship is only documented in
startup_scan_key's function comment. At the call site inruntime.rs:99-106, the parameter is already correctly wired frompipeline.config.pipeline.max_file_size_bytes, but a defensive comment would make this coupling explicit and prevent accidental drift. Consider adding a comment above the call:// max_json_bytes wired from pipeline.max_file_size_bytes to stay in sync with process_job guardAlternatively, a debug_assert in
spawn_startup_scancould validate at runtime (e.g., assert the parameter is not zero if config is expected to be non-zero).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/pipeline/scan.rs` around lines 123 - 184, At the call site of spawn_startup_scan in runtime.rs (where you pass pipeline.config.pipeline.max_file_size_bytes into startup_scan_key), add an explicit comment stating that max_json_bytes is intentionally wired to pipeline.config.pipeline.max_file_size_bytes to keep it in sync with the process_job guard and startup_scan_key behavior; optionally, add a defensive debug_assert inside spawn_startup_scan (or at its caller) that the passed max_json_bytes is non-zero (or matches expected invariants) to catch accidental config drift at runtime.crates/noxa-rag/src/mcp_bridge/mod.rs (1)
106-117: 🧹 Nitpick | 🔵 Trivial
Remove the redundant `pub(self)` visibility modifier.

`pub(self)` provides no additional visibility compared to omitting the modifier entirely and is flagged by the `clippy::needless_pub_self` lint (enabled by default in Clippy 1.72.0+). Since `call_data` is used exclusively within submodule implementations via `self.call_data(...)`, the visibility marker is unnecessary.

♻️ Proposed fix

```diff
-    pub(self) async fn call_data(
+    async fn call_data(
         &self,
         source: McpSource,
         action: &str,
         params: Value,
     ) -> Result<Value, RagError> {
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@crates/noxa-rag/src/mcp_bridge/mod.rs` around lines 106-117, the method declaration uses the redundant visibility modifier `pub(self)` on async fn call_data; remove `pub(self)` so the signature becomes async fn call_data(&self, source: McpSource, action: &str, params: Value) -> Result<Value, RagError> and keep the body unchanged; this addresses the clippy::needless_pub_self lint while preserving usage from submodule callers and retaining the call to self.executor.call(...) and extract_mcporter_data(raw).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/noxa-rag/src/chunker.rs`:
- Line 132: Remove the unused import "use std::str::FromStr as _;" from
chunker.rs to silence the CI warning; the code that calls
str::parse::<Tokenizer>() does not require the trait to be imported, so delete
the FromStr import line (referenced symbol: FromStr as _ in chunker.rs) and run
cargo check to confirm the warning is gone.
In `@crates/noxa-rag/src/mcp_bridge/bytestash.rs`:
- Around line 5-10: CI indicates rustfmt import grouping/line-wrapping issues in
this file; run cargo fmt --all to apply rustfmt rules or manually reflow the use
statement that imports BridgeDocument, McpBridge, McpSource, McporterExecutor,
SyncReport, WriteStatus, array_field, join_base_url, join_non_empty,
optional_string, required_base_url, required_string, string_array and the nested
io::{build_extraction, write_bridge_document} so they follow rustfmt's grouping
and line length rules (split into multiple lines or group paths as rustfmt
expects) and re-run cargo fmt to verify the formatting passes.
- Around line 16-38: The sync_bytestash implementation currently makes a single
snippets.list call and loads all results; change it to paginate using
self.config.page_size by repeatedly calling call_data(McpSource::Bytestash,
"snippets.list", params) with pagination params (e.g. page_size and a page token
or offset depending on the Bytestash API) and accumulate records rather than
calling it once; inside the loop use array_field(&data, "snippets") or
data.as_array() to collect page results, update report counts as each record is
processed (normalize_bytestash_record, write_bridge_document, WriteStatus
handling remain the same), and stop when the API indicates there are no more
pages (no next_page_token / empty page), mirroring the pagination pattern used
in linkding.rs/memos.rs/paperless.rs.
In `@crates/noxa-rag/src/mcp_bridge/executor.rs`:
- Around line 48-53: The current error builds RagError::Generic by interpolating
String::from_utf8_lossy(&output.stderr).trim() which may leak secrets; instead,
debug!-log the full stderr (e.g., debug!("mcporter stderr: {:?}",
String::from_utf8_lossy(&output.stderr))) and return a sanitized
RagError::Generic that does not include the raw stderr—use a short safe summary
such as "mcporter call {selector} {action} failed (stderr redacted)" or include
only a non-sensitive prefix/hex-hash of output.stderr; update the code around
the existing Err(...) construction to perform the debug! call and replace the
interpolated stderr with the sanitized message while still referencing selector
and action.
- Around line 36-45: The mcporter invocation using Command::output().await can
hang indefinitely; add a configurable timeout field (e.g., timeout: Duration) to
ProcessMcporterExecutor, wrap the child spawn/await in tokio::time::timeout
using that field, and if the timeout elapses ensure the child process is killed
(send kill/terminate to the Child returned by Command::spawn) and return a
RagError indicating timeout; update the call site that currently uses
Command::new(&self.executable)...output().await to spawn the child, await its
output under timeout, and handle both timeout and child-kill error paths
accordingly.
In `@crates/noxa-rag/src/mcp_bridge/io.rs`:
- Around line 10-16: The filenames are unstable because stable_component_hash
(used by relative_output_path) relies on DefaultHasher; replace that hashing
with a deterministic 64-bit xxh3 implementation (xxhash_rust::xxh3::xxh3_64) so
the same external_id always yields the same 16‑hex filename. Update the
stable_component_hash function to compute xxh3_64(external_id.as_bytes()) and
keep the existing "{:016x}" formatting, and replace any other uses of
DefaultHasher in this file (e.g., other calls referenced around the 104-109
region) to use xxh3_64 to ensure on-disk paths remain stable across Rust
toolchain upgrades.
- Around line 37-43: The existing remove-then-rename sequence (using
temp_output_path, tokio::fs::write, tokio::fs::remove_file, tokio::fs::rename)
is intentional for reliable cross-platform behavior; update the inline comment
above that block to explicitly state why we keep remove_file: note that although
std::fs::rename may use MoveFileExW / FileRenameInfoEx on Windows, atomic
replacement semantics depend on Rust version and Windows 10+ API differences and
will fail if the destination is a directory, so without a guaranteed MSRV/target
Windows version we intentionally remove the destination first to ensure success;
leave the code and return WriteStatus::Written as-is.
In `@crates/noxa-rag/src/mcp_bridge/linkding.rs`:
- Around line 55-60: The check that decides when to stop paging currently does
two lookups of data.get("next") and can be simplified and made more efficient:
replace the condition `data.get("next").is_none() ||
data.get("next").is_some_and(Value::is_null)` with a single lookup like
`data.get("next").map_or(true, Value::is_null)` (or the equivalent
`!data.get("next").is_some_and(|v| !v.is_null())`) so the "next" field is read
once; keep the subsequent `if new_records == 0 { break; }` logic unchanged.
- Around line 8-13: The import block inside the super::{...} in linkding.rs is
not formatted to rustfmt/cargo fmt standards causing CI failures; run cargo fmt
--all (or rustfmt on that file) to canonicalize the grouping and line-wrapping
of the super::{ BridgeDocument, McpBridge, McpSource, McporterExecutor,
SyncReport, WriteStatus, array_field, as_u64_value, join_non_empty,
optional_string, required_string, required_value, string_array,
io::{build_extraction, write_bridge_document} } import so it matches the
project's formatting rules and removes the CI diff.
In `@crates/noxa-rag/src/mcp_bridge/memos.rs`:
- Around line 5-10: The import list in crates/noxa-rag/src/mcp_bridge/memos.rs
is not formatted to rustfmt's grouping/wrapping rules (symbols like
BridgeDocument, McpBridge, McpSource, McporterExecutor, SyncReport, WriteStatus,
array_field, first_line_title, join_base_url, optional_string,
required_base_url, required_string, string_array, and the io imports
build_extraction and write_bridge_document are flagged); run rustfmt by
executing cargo fmt --all (or rustfmt on that file) to reorder and wrap the use
statements to match the project's formatting, then re-commit and push the
updated file so CI passes.
In `@crates/noxa-rag/src/pipeline/parse/binary.rs`:
- Around line 167-183: The current read-and-check block (using remaining,
per_entry_cap, read_cap, xml_buf, .take(read_cap).read_to_string(...),
read_bytes and odt_pptx_measured_total) is correct functionally but needs two
fixes: 1) run cargo fmt --all to satisfy rustfmt so the (&mut
entry).take(read_cap).read_to_string(...) chain is formatted on one line; 2)
improve the RagError::Parse message emitted in the if read_bytes > per_entry_cap
branch to distinguish a per-entry limit hit from a cumulative-cap exhaustion
(i.e., when remaining==0 / per_entry_cap==0) — update the error text inside the
RagError::Parse formatting to mention either "exceeds per-entry decompression
limit" or "cumulative decompression budget exhausted" depending on per_entry_cap
== 0 (or include both pieces of context), then run tests and commit.
In `@crates/noxa-rag/src/pipeline/parse/tests.rs`:
- Around line 142-147: The CI failure is due to rustfmt reflowing short
function-call expressions (e.g., FileMetadata::from_result_and_provenance and
build_point_payload) and the sample_file_metadata helper; fix it by running
`cargo fmt --all` locally and committing the reformatted changes, ensuring the
call sites (FileMetadata::from_result_and_provenance(...) and
build_point_payload(...)) and the sample_file_metadata helper are formatted
according to rustfmt so the function-call wrapping matches CI expectations.
- Around line 416-430: The test parse_feed_rejects_entity_expansion currently
only checks result.is_err(); change it to mirror the XML test by unwrapping the
error (let err = result.unwrap_err().to_string()) and assert the error message
mentions DOCTYPE/ENTITY/entity expansion (e.g., assert!(err.contains("DOCTYPE")
|| err.contains("ENTITY") || err.contains("entity expansion"))), so that the
failure is specifically due to the entity-expansion guard in parse_file rather
than any other parse error.
In `@crates/noxa-rag/src/pipeline/scan.rs`:
- Around line 549-568: The at-limit test is currently asserting is_some() but
only passes because serde_json fails on trailing NULs and the code falls back to
mtime/size; change the test (the test covering the at-limit case that calls
startup_scan_key) to assert the returned Option is Some and that the string
begins with "mtime:" (e.g. unwrap() and starts_with("mtime:")) to explicitly
verify the mtime/size branch; locate the test that calls startup_scan_key at the
size limit and replace the loose is_some() check with this stronger assertion.
- Around line 580-584: Rustfmt is breaking the long chained call to
OpenOptions::new().write(true).open(&file); wrap the chained OpenOptions
invocation (the OpenOptions::new().write(true).open(&file) expression used to
create f) in parentheses or assign it to a temporary variable so the formatter
can break lines cleanly, then call set_len(LIMIT) on the resulting File; ensure
you still use std::fs::OpenOptions::new, .write(true), .open(&file) and
.set_len(LIMIT) (and keep the existing expect("open")/expect("set_len")
behavior).
---
Outside diff comments:
In `@crates/noxa-rag/src/embed/tei.rs`:
- Around line 266-275: The non-retryable error path still reads and includes a
512‑char TEI response body preview in RagError::Embed, leaking sensitive TEI
content when an auth token is configured; change the branch that constructs the
error so it does NOT read/attach the response body if the request used an auth
token (e.g., check the same auth_token flag/variable used earlier), and instead
produce a safe generic message (or omit the preview) in RagError::Embed; retain
the body preview only when no auth token is present so behavior matches the
earlier retry-path omission.
- Around line 332-352: The intermediate Vec batch_ranges is an unnecessary
allocation—remove the batch_ranges binding and pass the iterator directly to
futures::stream::iter by using (0..total_batches).map(|i| { let start = i *
BATCH_SIZE; let end = (start + BATCH_SIZE).min(texts.len()); (i, start, end) })
so the stream maps over those tuples and calls
self.embed_batch_adaptive(&texts[start..end], batch_idx, total_batches).await,
keeping the existing
buffered(EMBED_PIPELINE_DEPTH).collect().await.into_iter().collect::<Result<_,
_>>()? plumbing intact.
In `@crates/noxa-rag/src/mcp_bridge/mod.rs`:
- Around line 106-117: The method declaration uses the redundant visibility
modifier `pub(self)` on async fn call_data; remove `pub(self)` so the signature
becomes async fn call_data(&self, source: McpSource, action: &str, params:
Value) -> Result<Value, RagError> and keep the body unchanged; this addresses
the clippy::needless_pub_self lint while preserving usage from submodule callers
and retaining the call to self.executor.call(...) and
extract_mcporter_data(raw).
In `@crates/noxa-rag/src/pipeline/parse/text.rs`:
- Around line 171-199: The function parse_xml_file has formatting that fails
cargo fmt (signature/body wrapping around Result<ParsedFile, RagError> and the
Err(RagError::Parse(...)) block); run `cargo fmt --all` to reformat the file,
ensure the parse_xml_file signature and its Err return are wrapped/indented per
rustfmt rules, then add and commit the formatted changes so CI passes (locate
parse_xml_file and the Err(RagError::Parse(...)) return to verify formatting).
In `@crates/noxa-rag/src/pipeline/scan.rs`:
- Around line 123-184: At the call site of spawn_startup_scan in runtime.rs
(where you pass pipeline.config.pipeline.max_file_size_bytes into
startup_scan_key), add an explicit comment stating that max_json_bytes is
intentionally wired to pipeline.config.pipeline.max_file_size_bytes to keep it
in sync with the process_job guard and startup_scan_key behavior; optionally,
add a defensive debug_assert inside spawn_startup_scan (or at its caller) that
the passed max_json_bytes is non-zero (or matches expected invariants) to catch
accidental config drift at runtime.
In `@crates/noxa-rag/src/types.rs`:
- Around line 213-215: Remove the extra blank line between the end of the
SearchMetadataFilter type declaration (the closing "}" for SearchMetadataFilter)
and the subsequent #[cfg(test)] module; ensure there is only a single newline
separating them so the file matches cargo fmt formatting (i.e., place
#[cfg(test)] immediately after one blank line following the SearchMetadataFilter
closing brace).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3fd83159-dbcf-4405-adbb-2b03c2b2a28a
📒 Files selected for processing (24)
- crates/noxa-rag/src/chunker.rs
- crates/noxa-rag/src/embed/tei.rs
- crates/noxa-rag/src/error.rs
- crates/noxa-rag/src/mcp_bridge/bytestash.rs
- crates/noxa-rag/src/mcp_bridge/executor.rs
- crates/noxa-rag/src/mcp_bridge/io.rs
- crates/noxa-rag/src/mcp_bridge/linkding.rs
- crates/noxa-rag/src/mcp_bridge/memos.rs
- crates/noxa-rag/src/mcp_bridge/mod.rs
- crates/noxa-rag/src/mcp_bridge/paperless.rs
- crates/noxa-rag/src/pipeline.rs
- crates/noxa-rag/src/pipeline/heartbeat.rs
- crates/noxa-rag/src/pipeline/parse/binary.rs
- crates/noxa-rag/src/pipeline/parse/mod.rs
- crates/noxa-rag/src/pipeline/parse/rich.rs
- crates/noxa-rag/src/pipeline/parse/tests.rs
- crates/noxa-rag/src/pipeline/parse/text.rs
- crates/noxa-rag/src/pipeline/process.rs
- crates/noxa-rag/src/pipeline/runtime.rs
- crates/noxa-rag/src/pipeline/scan.rs
- crates/noxa-rag/src/pipeline/startup_scan.rs
- crates/noxa-rag/src/pipeline/worker.rs
- crates/noxa-rag/src/store/qdrant/vector_store.rs
- crates/noxa-rag/src/types.rs
- executor.rs: add 30s timeout to mcporter subprocess; truncate stderr to 512 chars to prevent large/sensitive output leaking into error msgs
- io.rs: replace DefaultHasher (not stable across Rust versions) with xxh3_64 for deterministic filenames in stable_component_hash (see the sketch after this list)
- linkding.rs: simplify next-page termination check to map_or(true, is_null)
- binary.rs: separate per-entry vs cumulative error messages in ODT/PPTX decompression guard so the rejection reason is explicit
- parse/tests.rs: strengthen parse_feed_rejects_entity_expansion to assert the error message mentions DOCTYPE/ENTITY (mirrors XML test)
- chunker.rs: remove unused `use std::str::FromStr as _` import (CI warning)
- cargo fmt --all: fix import grouping in bytestash.rs, linkding.rs, memos.rs, parse/tests.rs; fix OpenOptions chain wrapping in scan.rs

No action taken on:
- io.rs:43 rename/remove_file: kept as-is; cross-platform safe regardless
- bytestash.rs:38 pagination: no official pagination API confirmed by reviewer
- scan.rs:568 boundary test: reviewer acknowledged implementation is correct

Resolves review threads PRRT_kwDOR_mP6c59zaLc, PRRT_kwDOR_mP6c59zaLq, PRRT_kwDOR_mP6c59zaLs, PRRT_kwDOR_mP6c59zaL3, PRRT_kwDOR_mP6c59zaL5, PRRT_kwDOR_mP6c59zaL8, PRRT_kwDOR_mP6c59zaMA, PRRT_kwDOR_mP6c59zaMJ, PRRT_kwDOR_mP6c59zaMM, PRRT_kwDOR_mP6c59zaMR, PRRT_kwDOR_mP6c59zaMT, PRRT_kwDOR_mP6c59zaMm, PRRT_kwDOR_mP6c59zaMq, PRRT_kwDOR_mP6c59zaMs, PRRT_kwDOR_mP6c59zaMz
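The deterministic-hash swap in isolation; a sketch assuming the xxhash-rust crate with the xxh3 feature enabled (the surrounding path logic in io.rs is omitted):

```rust
use xxhash_rust::xxh3::xxh3_64;

/// 16-hex-digit filename component for an external record ID.
/// xxh3_64 is stable across Rust releases, unlike std's DefaultHasher,
/// so the same external_id always maps to the same on-disk file.
fn stable_component_hash(external_id: &str) -> String {
    format!("{:016x}", xxh3_64(external_id.as_bytes()))
}
```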
Summary
Comprehensive review pass on the noxa-rag ingestion pipeline addressing security vulnerabilities, performance bottlenecks, architectural issues, and code quality findings from a 4-agent review (security-sentinel, performance-oracle, architecture-strategist, code-simplicity-reviewer). 21 commits across 24 files. 92 tests passing (up from 81).
Security
- fix(noxa-5gf): ODT/PPTX decompression bomb — replace bare `read_to_string` with `.take(cap)` measured guard in `parse_office_zip_file`; matches the existing DOCX two-pass pattern
- fix(noxa-3b8): Add 50 MiB size guard to `startup_scan_key` before JSON deserialization (previously unbounded on startup)
- fix(noxa-qkg): Guard XML/OPML/RSS/Atom parsers against `<!DOCTYPE`/`<!ENTITY` expansion attacks (billion laughs)
- fix(noxa-byr): Omit TEI response body from retry logs when `auth_token` is configured

Performance
- fix(noxa-gs8): Eliminate double BPE tokenization in `chunk()` — `token_estimate` now uses the `word_count` approximation; the splitter already runs the tokenizer for boundary placement
- fix(noxa-bkq): Change Qdrant upsert to `wait=true` to guarantee segment visibility before the `delete_stale_by_url` filter runs
- perf(noxa-c28): Eliminate redundant chunk text clone in `TeiProvider::embed` — stream index-range tuples instead of materializing `Vec<String>` batches
- perf(noxa-rso): Check `git_branch_cache` before dispatching `spawn_blocking` — ancestor walk avoids a blocking task on cache hits
- perf(noxa-5tl): Batch `startup_scan` `spawn_blocking` calls (256 paths/batch vs. one per file)
- fix(noxa-u90): Split the `parse_ms` timer into `io_ms` + `parse_ms` — heartbeat now correctly measures parse CPU time separately from file I/O

Architecture
- chore(noxa-bd4): Delete dead `IngestionContext` struct (zero call sites, duplicated fields from `PointPayload`/`SearchResult`)
- refactor(noxa-4jt): Split 1429-line `mcp_bridge.rs` into a per-platform module directory (7 files, each <200 lines)
- refactor(noxa-26r): Wrap `RagConfig` in `Arc` for `WorkerContext` — consistent with all other shared fields
- refactor(noxa-qgq): Remove `OnceLock<watch_roots>` from `Pipeline` — pass canonicalized roots directly to `spawn_workers`, eliminating the runtime panic invariant
- refactor(noxa-mqm): Add `record_parse_failure()` to `SessionCounters`, remove direct atomic field access from `process.rs`
- refactor(noxa-3g7): Add `RagError::UrlValidation` and `WorkerPanic` variants; increment `files_failed` on path confinement violations (previously silent)
- refactor(noxa-ngd): Normalize `parse_html_file` to a sync fn dispatched via the `spawn_blocking_parse` helper (consistent with all other CPU-bound parsers)
- refactor(noxa-udb): Move the `FormatProvenance` match into an `apply()` method, eliminating 14 `mut` locals in `build_point_payload`
- perf(noxa-346): Extract a `FileMetadata` struct built once per document — eliminates N-per-chunk clones of title, author, git_branch, content_hash, etc.

Docs
- docs(noxa-dkl): Document the `is_indexable` symlink TOCTOU window and defense-in-depth design

Simplification (follow-up pass)
- refactor: Promote `word_count` to `pub(crate)`, replacing 8 inline `split_whitespace().count()` calls across parse files; replace a boilerplate test helper with `make_text_result`; remove redundant WHAT comments; eliminate a double-collect in `startup_scan`

Test plan
- `cargo test -p noxa-rag` — 92 tests pass
- `cargo clippy --workspace` — no new warnings
- Heartbeat and per-job logs emit the new `io_ms` and `parse_ms` fields

Summary by cubic
Hardens and speeds up the noxa-rag ingestion pipeline: blocks zip/XML bombs, reduces CPU/I/O overhead, and simplifies the worker/parse architecture for stability and clearer metrics. Adds deterministic mcp filenames and safer mcporter execution (30s timeout, truncated stderr).

Bug Fixes

- Startup scan JSON is size-capped (`pipeline.max_file_size_bytes`) before the `startup_scan_key` JSON parse.
- Qdrant upserts use `wait=true` to avoid races; TEI retries omit bodies when `auth_token` is set; Windows-safe, idempotent mcp writes; deterministic mcp filenames via stable hashing; the mcporter subprocess now has a 30s timeout and truncates stderr; remove the stray `mcp_bridge.rs` to fix a duplicate module.
- New `RagError::UrlValidation` and `WorkerPanic` variants; confinement rejections counted in `files_failed`; heartbeat now reports `io_ms` and `parse_ms`.

Refactors

- Remove dead `IngestionContext`; split `mcp_bridge` into focused modules; wrap config in `Arc`; remove the `OnceLock` for watch roots (pass them to workers); move provenance writes into `FormatProvenance::apply()`.
- Token estimate via word count (`word_count`); batch the startup scan; check the git-branch cache before spawning; stream ranges in TEI embed; extract `FileMetadata` to cut per-chunk clones.

Written for commit b8bab43. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
Release Notes
New Features
Security
Performance