forked from 0xMassi/webclaw
fix(rag): 20-item review pass — security, perf, arch, simplification #14
Open: jmagar wants to merge 23 commits into main from bd-work/rag-review-fixes
23 commits
0f588de chore(noxa-bd4): delete dead IngestionContext struct from types.rs (jmagar)
59a195c refactor(noxa-4jt): split mcp_bridge.rs into per-platform module dire… (jmagar)
cf7eed2 fix(noxa-bkq): change upsert to wait=true to ensure ordering with del… (jmagar)
1b46471 fix(noxa-5gf): add measured decompression cap to ODT/PPTX in parse_of… (jmagar)
d11219b fix(noxa-gs8): replace double tokenization with word_count approximat… (jmagar)
c9c9b15 refactor(noxa-mqm): add record_parse_failure() to SessionCounters, re… (jmagar)
c80b2ec refactor(noxa-26r): wrap RagConfig in Arc for WorkerContext (jmagar)
0d99532 fix(noxa-3b8): add 50 MiB size guard to startup_scan_key before JSON … (jmagar)
8779914 perf(noxa-c28): eliminate redundant chunk text clone in TeiProvider::… (jmagar)
5b59687 docs(noxa-dkl): document is_indexable symlink TOCTOU window and defen… (jmagar)
bfa3b53 refactor(noxa-qgq): remove OnceLock<watch_roots> from Pipeline, pass … (jmagar)
9b220ca fix(noxa-byr): omit TEI response body from retry logs when auth_token… (jmagar)
debc760 perf(noxa-5tl): batch startup_scan spawn_blocking calls to reduce per… (jmagar)
35db1ad refactor(noxa-ngd): normalize parse_html_file to sync fn via spawn_bl… (jmagar)
c390dd3 fix(noxa-u90): split parse_ms timer into io_ms + parse_ms in process_job (jmagar)
3a9479b fix(noxa-qkg): guard XML/OPML/feed parsers against entity expansion a… (jmagar)
4e71049 refactor(noxa-3g7): add UrlValidation/WorkerPanic RagError variants, … (jmagar)
2a39b7d refactor(noxa-udb): move FormatProvenance match into apply() method, … (jmagar)
8d2cc7b perf(noxa-rso): check git_branch_cache before spawning blocking task … (jmagar)
8c09c8a perf(noxa-346): factor per-file metadata into FileMetadata struct, el… (jmagar)
bd4243a refactor: simplify — shared word_count, remove redundant comments, el… (jmagar)
47d420b fix: address PR #14 review comments (jmagar)
b8bab43 fix: address PR #14 round-2 review comments (coderabbitai) (jmagar)
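Commit 0d99532 describes a 50 MiB size guard applied before JSON parsing. Its title is truncated and its diff is not part of this excerpt, so the following is only a minimal sketch of that general pattern using the Rust standard library. The names `MAX_JSON_BYTES` and `read_capped` are hypothetical and are not taken from the PR.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap, mirroring the 50 MiB figure in the commit title.
const MAX_JSON_BYTES: u64 = 50 * 1024 * 1024;

/// Reads `path` only if it holds at most `max_bytes` bytes.
/// Returns `Ok(None)` for oversized files so the caller can skip them
/// before handing anything to a JSON parser.
fn read_capped(path: &Path, max_bytes: u64) -> io::Result<Option<Vec<u8>>> {
    let file = File::open(path)?;

    // Cheap early exit based on the reported file size.
    if file.metadata()?.len() > max_bytes {
        return Ok(None);
    }

    // Bound the read as well, in case the file grows after the check.
    let mut buf = Vec::new();
    file.take(max_bytes.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max_bytes {
        return Ok(None);
    }
    Ok(Some(buf))
}
```

A caller would pass `MAX_JSON_BYTES` as the limit and treat `None` as "skip this file", which keeps the memory cost of a single oversized input bounded.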
@@ -0,0 +1,105 @@

````rust
use serde_json::Value;

use crate::RagError;

use super::{
    BridgeDocument, McpBridge, McpSource, McporterExecutor, SyncReport, WriteStatus, array_field,
    io::{build_extraction, write_bridge_document},
    join_base_url, join_non_empty, optional_string, required_base_url, required_string,
    string_array,
};

impl<E> McpBridge<E>
where
    E: McporterExecutor,
{
    pub(super) async fn sync_bytestash(&self) -> Result<SyncReport, RagError> {
        let base_url = required_base_url(&self.config, McpSource::Bytestash)?;
        let data = self
            .call_data(McpSource::Bytestash, "snippets.list", serde_json::json!({}))
            .await?;
        let records = if let Some(array) = data.as_array() {
            array.iter().collect::<Vec<_>>()
        } else {
            array_field(&data, "snippets")?
        };

        let mut report = SyncReport::default();
        for record in records {
            let document = normalize_bytestash_record(record, base_url)?;
            report.fetched += 1;
            match write_bridge_document(&self.config.watch_dir, &document).await? {
                WriteStatus::Written => report.written += 1,
                WriteStatus::Unchanged => report.skipped += 1,
            }
        }

        Ok(report)
    }
}

pub fn normalize_bytestash_record(
    record: &Value,
    platform_base_url: &str,
) -> Result<BridgeDocument, RagError> {
    let id = required_string(record, "id")?;
    let title = optional_string(record, "title");
    let description = optional_string(record, "description");
    let language = optional_string(record, "language");
    let fragments = record
        .get("fragments")
        .and_then(Value::as_array)
        .ok_or_else(|| RagError::Parse("bytestash record missing fragments array".to_string()))?;

    let mut markdown_parts = Vec::new();
    if let Some(value) = title.as_deref() {
        markdown_parts.push(format!("# {value}"));
    }
    if let Some(value) = description.as_deref() {
        markdown_parts.push(value.to_string());
    }
    for fragment in fragments {
        let file_name = fragment
            .get("fileName")
            .or_else(|| fragment.get("file_name"))
            .and_then(Value::as_str)
            .unwrap_or("snippet");
        let code = fragment
            .get("code")
            .and_then(Value::as_str)
            .unwrap_or_default();
        markdown_parts.push(format!(
            "## {file_name}\n```{}\n{}\n```",
            language.clone().unwrap_or_default(),
            code
        ));
    }
    let plain_text = join_non_empty([
        title.clone(),
        description.clone(),
        Some(
            fragments
                .iter()
                .filter_map(|fragment| fragment.get("code").and_then(Value::as_str))
                .collect::<Vec<_>>()
                .join("\n\n"),
        ),
    ]);
    let url = join_base_url(platform_base_url, &format!("/api/snippets/{id}"))?;

    Ok(BridgeDocument {
        source: McpSource::Bytestash,
        external_id: format!("bytestash:{id}"),
        platform_url: Some(url.clone()),
        extraction: build_extraction(
            url,
            title,
            None,
            None,
            language,
            string_array(record.get("categories")),
            markdown_parts.join("\n\n"),
            plain_text,
        ),
    })
}
````

jmagar marked this conversation as resolved.
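To make the Markdown assembly in `normalize_bytestash_record` easier to follow in isolation, here is a minimal, std-only sketch of the same steps: an optional title heading, an optional description, then one fenced section per fragment. The `Fragment` struct and `render_markdown` function are hypothetical stand-ins introduced for illustration; the PR itself works on `serde_json::Value` and the bridge helpers shown above.

```rust
/// Hypothetical stand-in for one element of the `fragments` array.
struct Fragment<'a> {
    file_name: Option<&'a str>,
    code: Option<&'a str>,
}

/// Builds a Markdown document from optional metadata and code fragments,
/// following the same order as the diff: title, description, fragments.
fn render_markdown(
    title: Option<&str>,
    description: Option<&str>,
    language: Option<&str>,
    fragments: &[Fragment<'_>],
) -> String {
    // Build the fence marker at runtime so this example contains no
    // literal triple backtick of its own.
    let fence = "`".repeat(3);
    let lang = language.unwrap_or_default();

    let mut parts = Vec::new();
    if let Some(t) = title {
        parts.push(format!("# {t}"));
    }
    if let Some(d) = description {
        parts.push(d.to_string());
    }
    for fragment in fragments {
        // Missing file names fall back to "snippet"; missing code to "".
        let name = fragment.file_name.unwrap_or("snippet");
        let code = fragment.code.unwrap_or_default();
        parts.push(format!("## {name}\n{fence}{lang}\n{code}\n{fence}"));
    }

    // Sections are separated by a blank line.
    parts.join("\n\n")
}
```

As in the diff, a fragment without a `code` field still produces a section with an empty fenced block, and a record with no title, description, or fragments yields an empty string.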