
fix(rag): 20-item review pass — security, perf, arch, simplification #14

Open

jmagar wants to merge 23 commits into main from bd-work/rag-review-fixes

Conversation


jmagar (Owner) commented Apr 27, 2026

Summary

Comprehensive review pass on the noxa-rag ingestion pipeline addressing security vulnerabilities, performance bottlenecks, architectural issues, and code quality findings from a 4-agent review (security-sentinel, performance-oracle, architecture-strategist, code-simplicity-reviewer).

21 commits across 24 files. 92 tests passing (up from 81).

Security

  • P1 fix(noxa-5gf): ODT/PPTX decompression bomb — replace bare read_to_string with .take(cap) measured guard in parse_office_zip_file; matches the existing DOCX two-pass pattern
  • fix(noxa-3b8): Add 50 MiB size guard to startup_scan_key before JSON deserialization (previously unbounded on startup)
  • fix(noxa-qkg): Guard XML/OPML/RSS/Atom parsers against <!DOCTYPE/<!ENTITY expansion attacks (billion laughs); a sketch of the guard follows this list
  • fix(noxa-byr): Omit TEI response body from retry logs when auth_token is configured
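
A minimal sketch of the entity-expansion guard referenced in the noxa-qkg item above. The function name contains_xml_entity_expansion_risk appears in this PR's commits; the body below is an assumption, written against the follow-up commit's note that the guard scans raw (possibly non-UTF-8) bytes rather than a lossily decoded &str:

/// Sketch: reject XML/feed input that declares DOCTYPE or ENTITY nodes
/// before it reaches the parser (billion-laughs defense). Operating on raw
/// bytes means non-UTF-8 input cannot silently bypass the check.
fn contains_xml_entity_expansion_risk(bytes: &[u8]) -> bool {
    let upper: Vec<u8> = bytes.iter().map(u8::to_ascii_uppercase).collect();
    upper.windows(b"<!DOCTYPE".len()).any(|w| w == b"<!DOCTYPE")
        || upper.windows(b"<!ENTITY".len()).any(|w| w == b"<!ENTITY")
}

On a match, parsing stops with RagError::Parse before any expansion can allocate.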

Performance

  • P1 fix(noxa-gs8): Eliminate double BPE tokenization in chunk() — token_estimate now uses a word_count approximation; the splitter already runs the tokenizer for boundary placement
  • fix(noxa-bkq): Change Qdrant upsert to wait=true to guarantee segment visibility before delete_stale_by_url filter runs
  • perf(noxa-c28): Eliminate redundant chunk text clone in TeiProvider::embed — stream index-range tuples instead of materializing Vec<String> batches
  • perf(noxa-rso): Check git_branch_cache before dispatching spawn_blocking — ancestor walk avoids blocking task on cache hits
  • perf(noxa-5tl): Batch startup_scan spawn_blocking calls (256 paths/batch vs. one per file); sketched after this list
  • fix(noxa-u90): Split parse_ms timer into io_ms + parse_ms — heartbeat now correctly measures parse CPU time separately from file I/O
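
A rough sketch of the noxa-5tl batching shape flagged above; the startup_scan_key signature and the surrounding plumbing are placeholders, not the crate's actual API (the real function also takes a max_json_bytes limit):

use std::path::{Path, PathBuf};

// Placeholder for the real key derivation (mtime/size or JSON-derived).
fn startup_scan_key(_path: &Path) -> Option<String> {
    None
}

const SCAN_BATCH: usize = 256;

// Amortize blocking-task dispatch: one spawn_blocking per 256 paths
// instead of one per file.
async fn scan_in_batches(paths: Vec<PathBuf>) -> Vec<(PathBuf, Option<String>)> {
    let mut out = Vec::with_capacity(paths.len());
    for batch in paths.chunks(SCAN_BATCH) {
        let batch = batch.to_vec();
        let keys = tokio::task::spawn_blocking(move || {
            batch
                .into_iter()
                .map(|p| {
                    let key = startup_scan_key(&p);
                    (p, key)
                })
                .collect::<Vec<_>>()
        })
        .await
        .expect("scan batch task panicked");
        out.extend(keys);
    }
    out
}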

Architecture

  • chore(noxa-bd4): Delete dead IngestionContext struct (zero call sites, duplicated fields from PointPayload/SearchResult)
  • refactor(noxa-4jt): Split 1429-line mcp_bridge.rs into per-platform module directory (7 files, each <200 lines)
  • refactor(noxa-26r): Wrap RagConfig in Arc for WorkerContext — consistent with all other shared fields
  • refactor(noxa-qgq): Remove OnceLock<watch_roots> from Pipeline — pass canonicalized roots directly to spawn_workers, eliminating runtime panic invariant
  • refactor(noxa-mqm): Add record_parse_failure() to SessionCounters, remove direct atomic field access from process.rs
  • refactor(noxa-3g7): Add RagError::UrlValidation and WorkerPanic variants; increment files_failed on path confinement violations (previously silent)
  • refactor(noxa-ngd): Normalize parse_html_file to sync fn dispatched via spawn_blocking_parse helper (consistent with all other CPU-bound parsers)
  • refactor(noxa-udb): Move FormatProvenance match into apply() method, eliminating 14 mut locals in build_point_payload
  • perf(noxa-346): Extract FileMetadata struct built once per document — eliminates N-per-chunk clones of title, author, git_branch, content_hash, etc.

Docs

  • docs(noxa-dkl): Document is_indexable symlink TOCTOU window and defense-in-depth design

Simplification (follow-up pass)

  • refactor: Promote word_count to pub(crate), replace 8 inline split_whitespace().count() calls across parse files; replace boilerplate test helper with make_text_result; remove redundant WHAT comments; eliminate double-collect in startup_scan

Test plan

  • cargo test -p noxa-rag — 92 tests pass
  • cargo clippy --workspace — no new warnings
  • Index a local directory with mixed formats (.md, .pdf, .docx, .odt) and verify no OOM on crafted zip input
  • Verify heartbeat log shows both io_ms and parse_ms fields
  • Verify git_branch field populated correctly in Qdrant points

Summary by cubic

Hardens and speeds up the noxa-rag ingestion pipeline: blocks zip/XML bombs, reduces CPU/I/O overhead, and simplifies the worker/parse architecture for stability and clearer metrics. Adds deterministic mcp filenames and safer mcporter execution (30s timeout, truncated stderr).

  • Bug Fixes

    • Security: measured caps for ODT/PPTX zip reads; reject XML/feeds with DOCTYPE/ENTITY and scan raw bytes to catch non-UTF-8; 50 MiB guard (from pipeline.max_file_size_bytes) before startup_scan_key JSON parse.
    • Correctness/robustness: Qdrant upserts use wait=true to avoid races; TEI retries omit bodies when auth_token is set; Windows-safe, idempotent mcp writes; deterministic mcp filenames via stable hashing; mcporter subprocess now has a 30s timeout and truncates stderr; remove stray mcp_bridge.rs to fix duplicate module.
    • Errors/metrics: new RagError::UrlValidation and WorkerPanic; confinement rejections counted in files_failed; heartbeat now reports io_ms and parse_ms.
  • Refactors

    • Architecture: delete dead IngestionContext; split mcp_bridge into focused modules; wrap config in Arc; remove OnceLock for watch roots (pass to workers); move provenance writes into FormatProvenance::apply().
    • Performance: avoid double tokenization in chunking (use word_count); batch startup scan; check git-branch cache before spawning; stream ranges in TEI embed; extract FileMetadata to cut per-chunk clones.

Written for commit b8bab43. Summary will update on new commits.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for syncing documents from Linkding, Memos, Bytestash, and Paperless platforms
  • Security

    • Improved XML and RSS feed parsing with entity expansion attack prevention
  • Performance

    • Optimized document processing with improved word counting and batch handling
    • Enhanced caching for better indexing efficiency

jmagar added 21 commits April 26, 2026 19:19
…ctory

Splits the 1429-line mcp_bridge.rs into a mcp_bridge/ directory module:
- mod.rs: shared types (McpSource, BridgeConfig, SyncReport, BridgeDocument,
  WriteStatus), McporterExecutor trait, McpBridge<E> struct + sync() dispatch,
  call_data(), private helper functions, and all tests
- executor.rs: ProcessMcporterExecutor implementation (59 lines)
- io.rs: write_bridge_document(), relative_output_path(), build_extraction(),
  and file I/O helpers (129 lines)
- linkding.rs: sync_linkding() + normalize_linkding_record() (120 lines)
- memos.rs: sync_memos() + normalize_memo_record() (86 lines)
- bytestash.rs: sync_bytestash() + normalize_bytestash_record() (105 lines)
- paperless.rs: sync_paperless() + fetch_paperless_lookup() +
  normalize_paperless_record() (127 lines)

Zero behavior change. Public API identical. All platform files < 200 lines.
…fice_zip_file

Replace bare entry.read_to_string() in the ODT/PPTX branch with the same
two-pass pattern already used by the DOCX branch: advisory entry.size() checks
remain as a fast-path, but actual reads are now capped by .take(per_entry_cap+1)
and measured against MAX_ODT_PPTX_PER_ENTRY_BYTES (10 MiB) and
MAX_ODT_PPTX_TOTAL_BYTES (50 MiB cumulative).

A crafted zip that declares tiny central-directory sizes but expands to
gigabytes is now rejected before allocating unbounded memory.

Tests: odt_decompression_bomb_rejected_by_measured_guard,
       odt_cumulative_decompression_bomb_rejected,
       odt_small_legitimate_file_parses_ok,
       docx_entry_count_bomb_rejected
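
A condensed sketch of the measured guard this commit describes. The constant names come from the commit message; the error type and the surrounding zip-entry loop are simplified stand-ins:

use std::io::Read;

const MAX_ODT_PPTX_PER_ENTRY_BYTES: u64 = 10 * 1024 * 1024; // 10 MiB
const MAX_ODT_PPTX_TOTAL_BYTES: u64 = 50 * 1024 * 1024; // 50 MiB cumulative

/// Read one zip entry with a measured cap: the declared entry.size() is
/// advisory only, so count the bytes that actually decompress.
fn read_entry_measured<R: Read>(
    entry: &mut R,
    total_so_far: &mut u64,
) -> Result<String, String> {
    let remaining = MAX_ODT_PPTX_TOTAL_BYTES.saturating_sub(*total_so_far);
    let per_entry_cap = MAX_ODT_PPTX_PER_ENTRY_BYTES.min(remaining);
    let mut buf = String::new();
    // Take cap + 1: if more than cap bytes come out, the entry is over
    // budget no matter what the central directory claimed.
    entry
        .take(per_entry_cap + 1)
        .read_to_string(&mut buf)
        .map_err(|e| format!("zip entry read failed: {e}"))?;
    if buf.len() as u64 > per_entry_cap {
        return Err("decompressed entry exceeds measured cap".to_string());
    }
    *total_so_far += buf.len() as u64;
    Ok(buf)
}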
…ion in chunk()

The token_estimate diagnostic field was previously computed by calling
tokenizer.encode() on every emitted chunk, doubling tokenizer invocations
on the spawn_blocking hot path (the splitter already runs the tokenizer
internally via ChunkConfig::with_sizer for accurate boundary placement).

ChunkCharIndex in text-splitter 0.25 does not expose a token count field,
so the fix replaces token_estimate(&text, tokenizer) with word_count(&text).
The splitter's boundary accuracy is unchanged; only the diagnostic metadata
field now uses a word-count approximation.

Also removes the now-unused token_estimate() helper to keep clippy clean.

Tests: chunk_token_estimate_uses_word_count_not_tokenizer,
       chunk_empty_content_returns_empty
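
Given that description, the shared helper is presumably a one-liner along these lines (a sketch, not the exact crate source):

/// Whitespace-delimited word count, used as the cheap token_estimate
/// approximation. The splitter still runs the real tokenizer internally
/// for boundary placement, so chunk boundaries are unaffected.
pub(crate) fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}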
Pipeline::new still accepts RagConfig by value and wraps it with Arc::new
internally — call sites are unchanged. WorkerContext.config: Arc<RagConfig>
clones the Arc pointer (O(1)) instead of deep-cloning the config value per worker.
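
A sketch of the ownership change, with RagConfig and both contexts reduced to stand-ins:

use std::sync::Arc;

struct RagConfig { /* fields elided */ }

struct Pipeline {
    config: Arc<RagConfig>,
}

impl Pipeline {
    // Still takes RagConfig by value, so existing call sites compile unchanged.
    fn new(config: RagConfig) -> Self {
        Self { config: Arc::new(config) }
    }
}

struct WorkerContext {
    config: Arc<RagConfig>, // per-worker Arc::clone is O(1), no deep copy
}

impl WorkerContext {
    fn from_pipeline(pipeline: &Pipeline) -> Self {
        Self { config: Arc::clone(&pipeline.config) }
    }
}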
…roots to spawn_workers

watch_roots is now computed in runtime::run(), passed directly to spawn_workers()
as Arc<Vec<PathBuf>>, and forwarded to WorkerContext::from_pipeline() as a
parameter. Eliminates the runtime panic from OnceLock::get().expect() and
enforces the invariant at compile time instead.
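
The shape of the change, sketched with placeholder signatures:

use std::{path::PathBuf, sync::Arc};

// Before: a OnceLock<Vec<PathBuf>> on Pipeline, read at worker runtime via
// get().expect(), a latent panic if initialization order ever changed.
// After: roots are computed once in runtime::run() and passed by value, so
// "roots exist before workers" holds by construction.
fn run(watch_roots: Vec<PathBuf>) {
    let watch_roots = Arc::new(watch_roots);
    spawn_workers(watch_roots);
}

fn spawn_workers(watch_roots: Arc<Vec<PathBuf>>) {
    for _worker in 0..4 {
        // Each WorkerContext::from_pipeline(.., Arc::clone(&watch_roots))
        // receives the roots as a parameter instead of a global.
        let _roots = Arc::clone(&watch_roots);
    }
}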
t0 previously captured canonicalize + open + read_to_end time, and parse_ms
was then computed from it just before parse_file() ran — so it measured I/O,
not parse CPU. Split into io_t0 (file I/O through read_to_end) and parse_t0
(wrapping parse_file only). Add io_ms field to JobStats, total_io_ms to
SessionCounters/CounterSnapshot, and emit both in the per-job tracing::info
and the heartbeat log so operators can distinguish I/O-bound from
parse-bound workloads.
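
In outline, the corrected instrumentation (JobStats/SessionCounters plumbing elided; parse_file is a stub):

use std::time::Instant;

fn parse_file(_bytes: &[u8]) { /* CPU-bound parsing stub */ }

fn timed_job(path: &std::path::Path) -> std::io::Result<(u64, u64)> {
    let io_t0 = Instant::now();
    let bytes = std::fs::read(path)?; // canonicalize/open/read_to_end phase
    let io_ms = io_t0.elapsed().as_millis() as u64;

    let parse_t0 = Instant::now();
    parse_file(&bytes); // parse CPU time, now timed in isolation
    let parse_ms = parse_t0.elapsed().as_millis() as u64;

    Ok((io_ms, parse_ms))
}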
…surface confinement violations

Replace RagError::Generic at all five validate_url_scheme call sites with
RagError::UrlValidation so callers can pattern-match on URL-specific
failures without parsing error strings. Replace the chunker spawn_blocking
join-handle panic with RagError::WorkerPanic for the same reason.

Surface path confinement rejections to monitoring by calling
ctx.counters.record_failure() before the silent Ok(default) return at the
TOCTOU guard — confinement rejections are now visible in files_failed
heartbeat metrics even though no Err is propagated.
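
A sketch of the now-visible rejection; the boolean parameter and stub counter stand in for the real confinement check and SessionCounters in process.rs:

struct CountersStub;
impl CountersStub {
    fn record_failure(&self) { /* increments files_failed atomically */ }
}

// The TOCTOU guard still returns Ok so the worker loop keeps running, but
// the rejection is counted first, so files_failed reflects it in the
// heartbeat even though no Err is propagated.
fn process_job(confined: bool, counters: &CountersStub) -> Result<(), String> {
    if !confined {
        counters.record_failure(); // previously a silent early return
        return Ok(()); // Ok(default): no error surfaced to the caller
    }
    // ... normal parse/embed/upsert path ...
    Ok(())
}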
…eliminate 14 mut locals in build_point_payload
…in process_job

Add find_cached_branch() helper that walks ancestor directories of the
already-canonical file path against the DashMap cache (O(depth) in-memory
lookups, no I/O). If any ancestor matches a cached git root, return the
stored branch immediately without spawning a blocking thread.

On cache miss, canonicalize the detected git root before storing it so
future pre-checks via canonical path ancestors always match, regardless of
symlinks or non-normalized input paths. Also cache the miss (None) at the
immediate parent directory to skip re-stat()ing .git for files outside any
git repository.

This eliminates spawn_blocking for every file after the first in each git
repository, which is the common case for large watched directories.
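
A sketch of find_cached_branch under the assumptions stated above: a DashMap keyed by canonical directories, where Some(branch) is a hit and a stored None is a cached miss:

use dashmap::DashMap;
use std::path::{Path, PathBuf};

/// Walk the already-canonical file path's ancestors against the cache.
/// O(depth) in-memory lookups, no filesystem I/O: any hit (a branch or a
/// cached miss) means spawn_blocking can be skipped for this file.
fn find_cached_branch(
    cache: &DashMap<PathBuf, Option<String>>,
    canonical_file: &Path,
) -> Option<Option<String>> {
    canonical_file
        .ancestors()
        .skip(1) // start at the parent directory, not the file itself
        .find_map(|dir| cache.get(dir).map(|entry| entry.value().clone()))
}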
Copilot AI review requested due to automatic review settings April 27, 2026 03:37

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

The PR refactors noxa-rag by simplifying token estimation to word-count approximation, modularizing MCP bridge implementations into platform-specific modules, adding XML entity expansion security checks, improving embedding batching, making the parse module public with refactored payload construction, splitting pipeline timing metrics into I/O and parsing, converting shared config to Arc-based ownership, and introducing new error variants.

Changes

  • Token & Chunking Simplification (crates/noxa-rag/src/chunker.rs): token_estimate now uses a word-count approximation instead of tokenizer re-encoding; word_count is exposed as pub(crate) and unit tests are expanded.
  • Embedding & Error Types (crates/noxa-rag/src/embed/tei.rs, crates/noxa-rag/src/error.rs): batch processing refactored to compute index ranges and slice texts instead of cloning; retry logging omits the response body when an auth token is configured; UrlValidation and WorkerPanic error variants added.
  • MCP Bridge Modularization (crates/noxa-rag/src/mcp_bridge/mod.rs, executor.rs, io.rs, linkding.rs, memos.rs, bytestash.rs, paperless.rs): monolithic MCP bridge split into dedicated submodules; each implements pagination, normalization, and deduplication; IO utilities handle deterministic path generation, atomic writes, and extraction metadata building.
  • Parse Module Security & Refactoring (crates/noxa-rag/src/pipeline/parse/mod.rs, text.rs, binary.rs, rich.rs, tests.rs): module exposed as pub(crate); parse_html_file made synchronous; XML/RSS entity-expansion risk detection added; word counting now uses the unified chunker::word_count; office zip decompression budgeting added; build_point_payload refactored to accept a FileMetadata struct.
  • Pipeline Infrastructure & Timing (crates/noxa-rag/src/pipeline.rs, pipeline/heartbeat.rs, pipeline/process.rs, pipeline/runtime.rs, pipeline/worker.rs, pipeline/scan.rs, pipeline/startup_scan.rs): config ownership changed to Arc<RagConfig>; watch_roots passed explicitly to workers instead of via OnceLock; timing split into io_ms and parse_ms; startup scan now batches 256 paths per spawn_blocking call; git branch caching added; JSON file size checks added to the startup scan.
  • Storage & Type Cleanup (crates/noxa-rag/src/store/qdrant/vector_store.rs, crates/noxa-rag/src/types.rs): upsert now uses wait=true to ensure visibility before the delete filter runs; IngestionContext struct deleted.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Pipeline
    participant Executor
    participant MCPServer
    participant FileIO
    participant Store

    User->>Pipeline: Configure MCP bridge (linkding/memos/bytestash/paperless)
    Pipeline->>Executor: spawn_workers with watch_roots Arc
    
    loop Sync Loop (per platform)
        Executor->>MCPServer: Execute action (list/lookup)
        MCPServer-->>Executor: JSON response
        Executor->>Executor: Normalize record to BridgeDocument
        Executor->>FileIO: write_bridge_document
        FileIO->>FileIO: Compute relative_output_path
        FileIO->>FileIO: Compare serialized bytes<br/>(avoid rewrite)
        FileIO->>FileIO: Atomic write (temp + rename)
        FileIO-->>Executor: WriteStatus
        Executor->>Executor: Update SyncReport counters
    end
    
    Executor-->>Pipeline: SyncReport (fetched/written/skipped)
sequenceDiagram
    participant Worker
    participant Parser as ParseModule
    participant Security
    participant Chunker
    participant Store

    Worker->>Parser: parse_file (bytes, url, provenance)
    
    alt Is XML/RSS/HTML
        Parser->>Security: contains_xml_entity_expansion_risk(bytes)
        Security-->>Parser: bool
        alt Risk Detected
            Parser-->>Worker: RagError::Parse
        else Safe
            Parser->>Parser: parse_html_file or parse_xml_file
        end
    else Binary (PDF/Office)
        Parser->>Parser: Decompress with budget caps
        alt Exceeds Budget
            Parser-->>Worker: RagError::Parse
        else Within Budget
            Parser->>Parser: Extract text
        end
    end
    
    Parser->>Chunker: Split text into chunks
    Chunker->>Chunker: word_count approximation<br/>(replaces tokenizer)
    
    loop For each chunk
        Parser->>Parser: build_point_payload(chunk, FileMetadata)
        Parser->>Store: Insert PointPayload
    end
    
    Parser-->>Worker: Emit JobStats with io_ms + parse_ms

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐰 Chunkers hopping without tokenizers bright,
MCP bridges modularized just right,
XML entities checked with care,
Batched embeddings float through the air,
Watch roots shared in arcs so fair! 🌟

🚥 Pre-merge checks: ✅ 5 passed

  • Title check ✅ Passed: The title accurately captures the main intent: a comprehensive review pass addressing security, performance, architecture, and simplification across 24 files with 21 commits. It reflects the multi-faceted nature of the changes.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.



Copilot AI left a comment


Pull request overview

This PR is a broad hardening + performance + refactor pass across noxa-rag’s ingestion pipeline (parsing → chunking → embedding → Qdrant upsert), plus a new MCP bridge module split intended to replace the previous monolithic mcp_bridge.rs.

Changes:

  • Adds multiple ingestion safety guards (zip bomb caps, XML entity expansion pre-scan, startup-scan JSON size cap, safer logging) and improves error taxonomy.
  • Improves ingestion throughput/latency (startup scan batching, avoids double tokenization, avoids redundant clones, git-branch cache fast path, improved timing metrics).
  • Refactors parsing/provenance and pipeline internals (per-file metadata extraction, provenance application, watch roots handling, new MCP bridge module layout).

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file

  • crates/noxa-rag/src/types.rs: Removes IngestionContext and keeps provenance fields in PointPayload.
  • crates/noxa-rag/src/store/qdrant/vector_store.rs: Switches Qdrant upsert to wait=true to avoid post-upsert delete visibility races.
  • crates/noxa-rag/src/pipeline/worker.rs: Passes canonicalized watch_roots into worker context explicitly.
  • crates/noxa-rag/src/pipeline/startup_scan.rs: Batches startup_scan_key computations per spawn_blocking task to reduce overhead.
  • crates/noxa-rag/src/pipeline/scan.rs: Adds symlink-safety docs and a JSON size guard + tests for startup_scan_key.
  • crates/noxa-rag/src/pipeline/runtime.rs: Removes OnceLock watch-roots setup; threads watch_roots into worker spawn.
  • crates/noxa-rag/src/pipeline/process.rs: Separates I/O vs. parse timing, adds git-branch cache precheck, improves failure accounting, and uses new error variants.
  • crates/noxa-rag/src/pipeline/parse/text.rs: Uses shared word_count, makes HTML parse sync (spawned via helper), and adds XML entity-expansion pre-scan.
  • crates/noxa-rag/src/pipeline/parse/tests.rs: Adds tests ensuring XML/feed entity-expansion payloads are rejected.
  • crates/noxa-rag/src/pipeline/parse/rich.rs: Applies the XML entity-expansion pre-scan to feed parsing; uses shared word_count.
  • crates/noxa-rag/src/pipeline/parse/mod.rs: Adds FormatProvenance::apply, introduces FileMetadata, normalizes spawn-blocking usage, and simplifies payload building.
  • crates/noxa-rag/src/pipeline/parse/binary.rs: Adds measured ODT/PPTX decompression caps and tests; uses shared word_count.
  • crates/noxa-rag/src/pipeline/heartbeat.rs: Emits io_ms in heartbeat snapshots.
  • crates/noxa-rag/src/pipeline.rs: Adds I/O timing counters, refactors counters API, wraps config in Arc, removes OnceLock watch-roots.
  • crates/noxa-rag/src/mcp_bridge/paperless.rs: Adds Paperless MCP normalization + sync logic.
  • crates/noxa-rag/src/mcp_bridge/mod.rs: New MCP bridge module root with shared helpers, executor trait, sync orchestration, and tests.
  • crates/noxa-rag/src/mcp_bridge/memos.rs: Adds Memos MCP sync + record normalization.
  • crates/noxa-rag/src/mcp_bridge/linkding.rs: Adds Linkding MCP sync + record normalization (including platform URL building).
  • crates/noxa-rag/src/mcp_bridge/io.rs: Adds bridge document storage format + atomic-ish write helper and path sanitization.
  • crates/noxa-rag/src/mcp_bridge/executor.rs: Adds ProcessMcporterExecutor to invoke mcporter as a subprocess and parse JSON output.
  • crates/noxa-rag/src/mcp_bridge/bytestash.rs: Adds Bytestash MCP sync + record normalization.
  • crates/noxa-rag/src/error.rs: Adds UrlValidation and WorkerPanic error variants.
  • crates/noxa-rag/src/embed/tei.rs: Avoids cloning batch texts and omits retry body logs when an auth token is configured.
  • crates/noxa-rag/src/chunker.rs: Makes word_count reusable, switches token estimate to word-count approximation, and adds tests to prevent re-tokenization regressions.



chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd4243a979


- Remove mcp_bridge.rs (was deleted on disk but not staged in commit 59a195c;
  caused E0761 duplicate module root alongside mcp_bridge/)
- mcp_bridge/io.rs: remove destination before rename so write_bridge_document
  is idempotent on Windows (rename fails if dst exists)
- parse/text.rs: scan raw bytes in contains_xml_entity_expansion_risk instead
  of from_utf8().unwrap_or("") — non-UTF-8 bytes no longer silently suppress
  the <!DOCTYPE/<!ENTITY guard
- pipeline/scan.rs: replace hardcoded MAX_SCAN_JSON_BYTES with a max_json_bytes
  parameter threaded from pipeline.config.pipeline.max_file_size_bytes
- pipeline/scan.rs: fix startup_scan_key_json_at_size_limit_is_processed to
  actually exercise the boundary condition (len == limit, not a tiny file)

Resolves review thread PRRT_kwDOR_mP6c59t3qL
Resolves review thread PRRT_kwDOR_mP6c59t25j
Resolves review thread PRRT_kwDOR_mP6c59t25d
Resolves review thread PRRT_kwDOR_mP6c59t25O
Resolves review thread PRRT_kwDOR_mP6c59t3qN
Resolves review thread PRRT_kwDOR_mP6c59t25n
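
A sketch of the size-guarded key derivation after this commit. The "mtime:" fallback prefix matches what the boundary test asserts; the exact signature and the JSON-derived key format are assumptions (serde_json as the parser is taken from the review discussion):

use std::path::Path;
use std::time::UNIX_EPOCH;

fn startup_scan_key(path: &Path, max_json_bytes: u64) -> Option<String> {
    let meta = std::fs::metadata(path).ok()?;
    let mtime = meta
        .modified()
        .ok()?
        .duration_since(UNIX_EPOCH)
        .ok()?
        .as_secs();
    if meta.len() > max_json_bytes {
        // Over the guard (wired from pipeline.max_file_size_bytes):
        // never deserialize, key on mtime/size instead.
        return Some(format!("mtime:{}:{}", mtime, meta.len()));
    }
    let bytes = std::fs::read(path).ok()?;
    match serde_json::from_slice::<serde_json::Value>(&bytes) {
        Ok(_json) => Some(format!("json:{}", bytes.len())), // placeholder key
        Err(_) => Some(format!("mtime:{}:{}", mtime, meta.len())), // parse failure falls back
    }
}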

coderabbitai Bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
crates/noxa-rag/src/embed/tei.rs (2)

266-275: ⚠️ Potential issue | 🟡 Minor

Body still leaks via the non‑retryable error path when auth_token is set.

The retry-log omission added above is undone here: on any non‑success status that isn't 429/503/413 (or after should_retry returns false), the response body is read and a 512‑char preview is embedded into RagError::Embed { message }. That error message is virtually always logged by upstream callers, so the same TEI body bytes the retry path now suppresses will still surface in logs whenever auth is configured. Since the PR’s stated goal is "Omit TEI response body from retry logs when auth_token is configured", this is a consistency/security gap worth closing in the same place.

🛡️ Suggested fix
-            let body = resp.text().await.unwrap_or_default();
-            let preview: String = body.chars().take(512).collect();
-            return Err(RagError::Embed {
-                message: format!("TEI /embed returned HTTP {status_u16}: {preview}"),
-                status: Some(status_u16),
-            });
+            let message = if self.auth_token.is_some() {
+                // Body omitted to avoid leaking auth-gated response content into error logs.
+                format!("TEI /embed returned HTTP {status_u16} (body omitted: auth token configured)")
+            } else {
+                let body = resp.text().await.unwrap_or_default();
+                let preview: String = body.chars().take(512).collect();
+                format!("TEI /embed returned HTTP {status_u16}: {preview}")
+            };
+            return Err(RagError::Embed {
+                message,
+                status: Some(status_u16),
+            });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/embed/tei.rs` around lines 266 - 275, The non-retryable
error path still reads and includes a 512‑char TEI response body preview in
RagError::Embed, leaking sensitive TEI content when an auth token is configured;
change the branch that constructs the error so it does NOT read/attach the
response body if the request used an auth token (e.g., check the same auth_token
flag/variable used earlier), and instead produce a safe generic message (or omit
the preview) in RagError::Embed; retain the body preview only when no auth token
is present so behavior matches the earlier retry-path omission.

332-352: 🧹 Nitpick | 🔵 Trivial

Refactor is correct; the intermediate Vec is unnecessary.

Slicing &texts[start..end] inside async move cleanly avoids the previous per-batch Vec<String> clone, and the borrow is sound because self and texts outlive the buffered(..).collect() await. One optional nit: futures::stream::iter accepts any IntoIterator, so the batch_ranges: Vec<(usize, usize, usize)> materialization adds an extra allocation for no benefit — you can feed the range iterator directly.

♻️ Optional simplification
-        let batch_ranges: Vec<(usize, usize, usize)> = (0..total_batches)
-            .map(|i| {
-                let start = i * BATCH_SIZE;
-                let end = (start + BATCH_SIZE).min(texts.len());
-                (i, start, end)
-            })
-            .collect();
-
-        let results: Vec<Vec<Vec<f32>>> = futures::stream::iter(batch_ranges)
+        let results: Vec<Vec<Vec<f32>>> = futures::stream::iter((0..total_batches).map(|i| {
+            let start = i * BATCH_SIZE;
+            let end = (start + BATCH_SIZE).min(texts.len());
+            (i, start, end)
+        }))
             .map(|(batch_idx, start, end)| async move {
                 self.embed_batch_adaptive(&texts[start..end], batch_idx, total_batches)
                     .await
             })
             .buffered(EMBED_PIPELINE_DEPTH)
             .collect::<Vec<_>>()
             .await
             .into_iter()
             .collect::<Result<_, _>>()?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/embed/tei.rs` around lines 332 - 352, The intermediate
Vec batch_ranges is an unnecessary allocation—remove the batch_ranges binding
and pass the iterator directly to futures::stream::iter by using
(0..total_batches).map(|i| { let start = i * BATCH_SIZE; let end = (start +
BATCH_SIZE).min(texts.len()); (i, start, end) }) so the stream maps over those
tuples and calls self.embed_batch_adaptive(&texts[start..end], batch_idx,
total_batches).await, keeping the existing
buffered(EMBED_PIPELINE_DEPTH).collect().await.into_iter().collect::<Result<_,
_>>()? plumbing intact.
crates/noxa-rag/src/types.rs (1)

213-215: ⚠️ Potential issue | 🟡 Minor

Fix cargo fmt failure: extra blank line.

CI reported a cargo fmt --check --all diff. The double blank line between the closing } of SearchMetadataFilter and the #[cfg(test)] module looks like the culprit.

📐 Proposed fix
 }
 
-
 #[cfg(test)]
 mod tests {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/types.rs` around lines 213 - 215, Remove the extra blank
line between the end of the SearchMetadataFilter type declaration (the closing
"}" for SearchMetadataFilter) and the subsequent #[cfg(test)] module; ensure
there is only a single newline separating them so the file matches cargo fmt
formatting (i.e., place #[cfg(test)] immediately after one blank line following
the SearchMetadataFilter closing brace).
crates/noxa-rag/src/pipeline/parse/text.rs (1)

171-199: ⚠️ Potential issue | 🟡 Minor

Fix cargo fmt failure on parse_xml_file signature/body wrapping.

CI reports a fmt diff in this region (likely the Result<ParsedFile, RagError> signature wrapping or the Err(RagError::Parse(...)) block). Run cargo fmt --all and commit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/pipeline/parse/text.rs` around lines 171 - 199, The
function parse_xml_file has formatting that fails cargo fmt (signature/body
wrapping around Result<ParsedFile, RagError> and the Err(RagError::Parse(...))
block); run `cargo fmt --all` to reformat the file, ensure the parse_xml_file
signature and its Err return are wrapped/indented per rustfmt rules, then add
and commit the formatted changes so CI passes (locate parse_xml_file and the
Err(RagError::Parse(...)) return to verify formatting).
crates/noxa-rag/src/pipeline/scan.rs (1)

123-184: 🧹 Nitpick | 🔵 Trivial

Add explicit comment at spawn_startup_scan call site wiring max_json_bytes to config source.

The doc note at lines 107-108 correctly explains the design, but the relationship is only documented in startup_scan_key's function comment. At the call site in runtime.rs:99-106, the parameter is already correctly wired from pipeline.config.pipeline.max_file_size_bytes, but a defensive comment would make this coupling explicit and prevent accidental drift. Consider adding a comment above the call:

// max_json_bytes wired from pipeline.max_file_size_bytes to stay in sync with process_job guard

Alternatively, a debug_assert in spawn_startup_scan could validate at runtime (e.g., assert the parameter is not zero if config is expected to be non-zero).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/pipeline/scan.rs` around lines 123 - 184, At the call
site of spawn_startup_scan in runtime.rs (where you pass
pipeline.config.pipeline.max_file_size_bytes into startup_scan_key), add an
explicit comment stating that max_json_bytes is intentionally wired to
pipeline.config.pipeline.max_file_size_bytes to keep it in sync with the
process_job guard and startup_scan_key behavior; optionally, add a defensive
debug_assert inside spawn_startup_scan (or at its caller) that the passed
max_json_bytes is non-zero (or matches expected invariants) to catch accidental
config drift at runtime.
crates/noxa-rag/src/mcp_bridge/mod.rs (1)

106-117: 🧹 Nitpick | 🔵 Trivial

Remove the redundant pub(self) visibility modifier.

pub(self) provides no additional visibility compared to omitting the modifier entirely and is flagged by the clippy::needless_pub_self lint (enabled by default in Clippy 1.72.0+). Since call_data is used exclusively within submodule implementations via self.call_data(...), the visibility marker is unnecessary.

♻️ Proposed fix
-    pub(self) async fn call_data(
+    async fn call_data(
         &self,
         source: McpSource,
         action: &str,
         params: Value,
     ) -> Result<Value, RagError> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/noxa-rag/src/mcp_bridge/mod.rs` around lines 106 - 117, The method
declaration uses the redundant visibility modifier `pub(self)` on async fn
call_data; remove `pub(self)` so the signature becomes async fn call_data(&self,
source: McpSource, action: &str, params: Value) -> Result<Value, RagError> and
keep the body unchanged; this addresses the clippy::needless_pub_self lint while
preserving usage from submodule callers and retaining the call to
self.executor.call(...) and extract_mcporter_data(raw).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/noxa-rag/src/chunker.rs`:
- Line 132: Remove the unused import "use std::str::FromStr as _;" from
chunker.rs to silence the CI warning; the code that calls
str::parse::<Tokenizer>() does not require the trait to be imported, so delete
the FromStr import line (referenced symbol: FromStr as _ in chunker.rs) and run
cargo check to confirm the warning is gone.

In `@crates/noxa-rag/src/mcp_bridge/bytestash.rs`:
- Around line 5-10: CI indicates rustfmt import grouping/line-wrapping issues in
this file; run cargo fmt --all to apply rustfmt rules or manually reflow the use
statement that imports BridgeDocument, McpBridge, McpSource, McporterExecutor,
SyncReport, WriteStatus, array_field, join_base_url, join_non_empty,
optional_string, required_base_url, required_string, string_array and the nested
io::{build_extraction, write_bridge_document} so they follow rustfmt's grouping
and line length rules (split into multiple lines or group paths as rustfmt
expects) and re-run cargo fmt to verify the formatting passes.
- Around line 16-38: The sync_bytestash implementation currently makes a single
snippets.list call and loads all results; change it to paginate using
self.config.page_size by repeatedly calling call_data(McpSource::Bytestash,
"snippets.list", params) with pagination params (e.g. page_size and a page token
or offset depending on the Bytestash API) and accumulate records rather than
calling it once; inside the loop use array_field(&data, "snippets") or
data.as_array() to collect page results, update report counts as each record is
processed (normalize_bytestash_record, write_bridge_document, WriteStatus
handling remain the same), and stop when the API indicates there are no more
pages (no next_page_token / empty page), mirroring the pagination pattern used
in linkding.rs/memos.rs/paperless.rs.

In `@crates/noxa-rag/src/mcp_bridge/executor.rs`:
- Around line 48-53: The current error builds RagError::Generic by interpolating
String::from_utf8_lossy(&output.stderr).trim() which may leak secrets; instead,
debug!-log the full stderr (e.g., debug!("mcporter stderr: {:?}",
String::from_utf8_lossy(&output.stderr))) and return a sanitized
RagError::Generic that does not include the raw stderr—use a short safe summary
such as "mcporter call {selector} {action} failed (stderr redacted)" or include
only a non-sensitive prefix/hex-hash of output.stderr; update the code around
the existing Err(...) construction to perform the debug! call and replace the
interpolated stderr with the sanitized message while still referencing selector
and action.
- Around line 36-45: The mcporter invocation using Command::output().await can
hang indefinitely; add a configurable timeout field (e.g., timeout: Duration) to
ProcessMcporterExecutor, wrap the child spawn/await in tokio::time::timeout
using that field, and if the timeout elapses ensure the child process is killed
(send kill/terminate to the Child returned by Command::spawn) and return a
RagError indicating timeout; update the call site that currently uses
Command::new(&self.executable)...output().await to spawn the child, await its
output under timeout, and handle both timeout and child-kill error paths
accordingly.

In `@crates/noxa-rag/src/mcp_bridge/io.rs`:
- Around line 10-16: The filenames are unstable because stable_component_hash
(used by relative_output_path) relies on DefaultHasher; replace that hashing
with a deterministic 64-bit xxh3 implementation (xxhash_rust::xxh3::xxh3_64) so
the same external_id always yields the same 16‑hex filename. Update the
stable_component_hash function to compute xxh3_64(external_id.as_bytes()) and
keep the existing "{:016x}" formatting, and replace any other uses of
DefaultHasher in this file (e.g., other calls referenced around the 104-109
region) to use xxh3_64 to ensure on-disk paths remain stable across Rust
toolchain upgrades.
- Around line 37-43: The existing remove-then-rename sequence (using
temp_output_path, tokio::fs::write, tokio::fs::remove_file, tokio::fs::rename)
is intentional for reliable cross-platform behavior; update the inline comment
above that block to explicitly state why we keep remove_file: note that although
std::fs::rename may use MoveFileExW / FileRenameInfoEx on Windows, atomic
replacement semantics depend on Rust version and Windows 10+ API differences and
will fail if the destination is a directory, so without a guaranteed MSRV/target
Windows version we intentionally remove the destination first to ensure success;
leave the code and return WriteStatus::Written as-is.

In `@crates/noxa-rag/src/mcp_bridge/linkding.rs`:
- Around line 55-60: The check that decides when to stop paging currently does
two lookups of data.get("next") and can be simplified and made more efficient:
replace the condition `data.get("next").is_none() ||
data.get("next").is_some_and(Value::is_null)` with a single lookup like
`data.get("next").map_or(true, Value::is_null)` (or the equivalent
`!data.get("next").is_some_and(|v| !v.is_null())`) so the "next" field is read
once; keep the subsequent `if new_records == 0 { break; }` logic unchanged.
- Around line 8-13: The import block inside the super::{...} in linkding.rs is
not formatted to rustfmt/cargo fmt standards causing CI failures; run cargo fmt
--all (or rustfmt on that file) to canonicalize the grouping and line-wrapping
of the super::{ BridgeDocument, McpBridge, McpSource, McporterExecutor,
SyncReport, WriteStatus, array_field, as_u64_value, join_non_empty,
optional_string, required_string, required_value, string_array,
io::{build_extraction, write_bridge_document} } import so it matches the
project's formatting rules and removes the CI diff.

In `@crates/noxa-rag/src/mcp_bridge/memos.rs`:
- Around line 5-10: The import list in crates/noxa-rag/src/mcp_bridge/memos.rs
is not formatted to rustfmt's grouping/wrapping rules (symbols like
BridgeDocument, McpBridge, McpSource, McporterExecutor, SyncReport, WriteStatus,
array_field, first_line_title, join_base_url, optional_string,
required_base_url, required_string, string_array, and the io imports
build_extraction and write_bridge_document are flagged); run rustfmt by
executing cargo fmt --all (or rustfmt on that file) to reorder and wrap the use
statements to match the project's formatting, then re-commit and push the
updated file so CI passes.

In `@crates/noxa-rag/src/pipeline/parse/binary.rs`:
- Around line 167-183: The current read-and-check block (using remaining,
per_entry_cap, read_cap, xml_buf, .take(read_cap).read_to_string(...),
read_bytes and odt_pptx_measured_total) is correct functionally but needs two
fixes: 1) run cargo fmt --all to satisfy rustfmt so the (&mut
entry).take(read_cap).read_to_string(...) chain is formatted on one line; 2)
improve the RagError::Parse message emitted in the if read_bytes > per_entry_cap
branch to distinguish a per-entry limit hit from a cumulative-cap exhaustion
(i.e., when remaining==0 / per_entry_cap==0) — update the error text inside the
RagError::Parse formatting to mention either "exceeds per-entry decompression
limit" or "cumulative decompression budget exhausted" depending on per_entry_cap
== 0 (or include both pieces of context), then run tests and commit.

In `@crates/noxa-rag/src/pipeline/parse/tests.rs`:
- Around line 142-147: The CI failure is due to rustfmt reflowing short
function-call expressions (e.g., FileMetadata::from_result_and_provenance and
build_point_payload) and the sample_file_metadata helper; fix it by running
`cargo fmt --all` locally and committing the reformatted changes, ensuring the
call sites (FileMetadata::from_result_and_provenance(...) and
build_point_payload(...)) and the sample_file_metadata helper are formatted
according to rustfmt so the function-call wrapping matches CI expectations.
- Around line 416-430: The test parse_feed_rejects_entity_expansion currently
only checks result.is_err(); change it to mirror the XML test by unwrapping the
error (let err = result.unwrap_err().to_string()) and assert the error message
mentions DOCTYPE/ENTITY/entity expansion (e.g., assert!(err.contains("DOCTYPE")
|| err.contains("ENTITY") || err.contains("entity expansion"))), so that the
failure is specifically due to the entity-expansion guard in parse_file rather
than any other parse error.

In `@crates/noxa-rag/src/pipeline/scan.rs`:
- Around line 549-568: The at-limit test is currently asserting is_some() but
only passes because serde_json fails on trailing NULs and the code falls back to
mtime/size; change the test (the test covering the at-limit case that calls
startup_scan_key) to assert the returned Option is Some and that the string
begins with "mtime:" (e.g. unwrap() and starts_with("mtime:")) to explicitly
verify the mtime/size branch; locate the test that calls startup_scan_key at the
size limit and replace the loose is_some() check with this stronger assertion.
- Around line 580-584: Rustfmt is breaking the long chained call to
OpenOptions::new().write(true).open(&file); wrap the chained OpenOptions
invocation (the OpenOptions::new().write(true).open(&file) expression used to
create f) in parentheses or assign it to a temporary variable so the formatter
can break lines cleanly, then call set_len(LIMIT) on the resulting File; ensure
you still use std::fs::OpenOptions::new, .write(true), .open(&file) and
.set_len(LIMIT) (and keep the existing expect("open")/expect("set_len")
behavior).



📥 Commits

Reviewing files that changed from the base of the PR and between 5380a90 and 47d420b.

📒 Files selected for processing (24)
  • crates/noxa-rag/src/chunker.rs
  • crates/noxa-rag/src/embed/tei.rs
  • crates/noxa-rag/src/error.rs
  • crates/noxa-rag/src/mcp_bridge/bytestash.rs
  • crates/noxa-rag/src/mcp_bridge/executor.rs
  • crates/noxa-rag/src/mcp_bridge/io.rs
  • crates/noxa-rag/src/mcp_bridge/linkding.rs
  • crates/noxa-rag/src/mcp_bridge/memos.rs
  • crates/noxa-rag/src/mcp_bridge/mod.rs
  • crates/noxa-rag/src/mcp_bridge/paperless.rs
  • crates/noxa-rag/src/pipeline.rs
  • crates/noxa-rag/src/pipeline/heartbeat.rs
  • crates/noxa-rag/src/pipeline/parse/binary.rs
  • crates/noxa-rag/src/pipeline/parse/mod.rs
  • crates/noxa-rag/src/pipeline/parse/rich.rs
  • crates/noxa-rag/src/pipeline/parse/tests.rs
  • crates/noxa-rag/src/pipeline/parse/text.rs
  • crates/noxa-rag/src/pipeline/process.rs
  • crates/noxa-rag/src/pipeline/runtime.rs
  • crates/noxa-rag/src/pipeline/scan.rs
  • crates/noxa-rag/src/pipeline/startup_scan.rs
  • crates/noxa-rag/src/pipeline/worker.rs
  • crates/noxa-rag/src/store/qdrant/vector_store.rs
  • crates/noxa-rag/src/types.rs

- executor.rs: add 30s timeout to mcporter subprocess; truncate stderr
  to 512 chars to prevent large/sensitive output leaking into error msgs
- io.rs: replace DefaultHasher (not stable across Rust versions) with
  xxh3_64 for deterministic filenames in stable_component_hash
- linkding.rs: simplify next-page termination check to map_or(true, is_null)
- binary.rs: separate per-entry vs cumulative error messages in ODT/PPTX
  decompression guard so the rejection reason is explicit
- parse/tests.rs: strengthen parse_feed_rejects_entity_expansion to assert
  the error message mentions DOCTYPE/ENTITY (mirrors XML test)
- chunker.rs: remove unused `use std::str::FromStr as _` import (CI warning)
- cargo fmt --all: fix import grouping in bytestash.rs, linkding.rs, memos.rs,
  parse/tests.rs; fix OpenOptions chain wrapping in scan.rs

No action taken on:
- io.rs:43 rename/remove_file: kept as-is; cross-platform safe regardless
- bytestash.rs:38 pagination: no official pagination API confirmed by reviewer
- scan.rs:568 boundary test: reviewer acknowledged implementation is correct

Resolves review thread PRRT_kwDOR_mP6c59zaLc
Resolves review thread PRRT_kwDOR_mP6c59zaLq
Resolves review thread PRRT_kwDOR_mP6c59zaLs
Resolves review thread PRRT_kwDOR_mP6c59zaL3
Resolves review thread PRRT_kwDOR_mP6c59zaL5
Resolves review thread PRRT_kwDOR_mP6c59zaL8
Resolves review thread PRRT_kwDOR_mP6c59zaMA
Resolves review thread PRRT_kwDOR_mP6c59zaMJ
Resolves review thread PRRT_kwDOR_mP6c59zaMM
Resolves review thread PRRT_kwDOR_mP6c59zaMR
Resolves review thread PRRT_kwDOR_mP6c59zaMT
Resolves review thread PRRT_kwDOR_mP6c59zaMm
Resolves review thread PRRT_kwDOR_mP6c59zaMq
Resolves review thread PRRT_kwDOR_mP6c59zaMs
Resolves review thread PRRT_kwDOR_mP6c59zaMz
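
Two minimal sketches of the executor and filename fixes in this commit. RagError is replaced by String for self-containment; the xxhash_rust crate (with the xxh3 feature) is the assumed dependency:

use std::time::Duration;
use tokio::process::Command;
use xxhash_rust::xxh3::xxh3_64;

/// Deterministic filename component: xxh3_64 is stable across Rust
/// releases, unlike std's DefaultHasher, so on-disk paths never shift
/// after a toolchain upgrade.
fn stable_component_hash(external_id: &str) -> String {
    format!("{:016x}", xxh3_64(external_id.as_bytes()))
}

/// Bounded mcporter invocation: 30s wall-clock cap, child killed if the
/// timeout drops the future, stderr truncated to 512 chars before it can
/// reach error messages.
async fn run_mcporter(executable: &str, args: &[&str]) -> Result<Vec<u8>, String> {
    let mut cmd = Command::new(executable);
    cmd.args(args).kill_on_drop(true);

    let output = tokio::time::timeout(Duration::from_secs(30), cmd.output())
        .await
        .map_err(|_| "mcporter call timed out after 30s".to_string())?
        .map_err(|e| format!("mcporter spawn failed: {e}"))?;

    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        let preview: String = stderr.chars().take(512).collect();
        return Err(format!("mcporter exited nonzero: {preview}"));
    }
    Ok(output.stdout)
}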
