Conversation
Long-form Notion pages (troubleshooting, create/edit observation, etc.) were silently dropping sections during automatic translation. The likely cause: feeding very large chunks to the model saturated its effective attention window, causing it to omit headings and paragraphs without raising an error.

Changes:
- Lower the proactive chunk ceiling from 500 K → 120 K chars so each translation request stays well within the model's reliable attention range
- Add structural completeness validation after every translation call: checks heading count, fenced code blocks, bullet/numbered lists, table lines, and severe length shrinkage (< 55 % of source)
- Retry with progressively smaller chunks (halved each attempt, floor 8 K chars) up to TRANSLATION_COMPLETENESS_MAX_RETRIES (2) times when incompleteness is detected, then surface a non-critical error

Closes #166
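The validate-then-retry flow described above can be sketched roughly as follows. The constant names mirror the PR; the function signature and callback shape are illustrative assumptions, not the actual implementation (which is async and chunk-based):

```typescript
// Illustrative constants mirroring the PR description; the real module
// layout and signatures are assumptions, not the actual code.
const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
const TRANSLATION_COMPLETENESS_MAX_RETRIES = 2;

// Translate, validate structural completeness, and retry with a halved
// chunk limit until the result passes or retries are exhausted.
function translateWithCompletenessRetry(
  source: string,
  translateChunks: (text: string, chunkLimit: number) => string,
  isIncomplete: (src: string, translated: string) => boolean,
): string {
  let limit = TRANSLATION_CHUNK_MAX_CHARS;
  for (
    let attempt = 0;
    attempt <= TRANSLATION_COMPLETENESS_MAX_RETRIES;
    attempt++
  ) {
    const translated = translateChunks(source, limit);
    if (!isIncomplete(source, translated)) return translated;
    limit = Math.max(TRANSLATION_MIN_CHUNK_MAX_CHARS, Math.floor(limit / 2));
  }
  // Surfaced as a non-critical error so the pipeline can log and continue.
  throw new Error("translation still incomplete after retries");
}
```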
🐳 Docker Image Published

Your Docker image has been built and pushed for this PR.

Image Reference:

Platforms: linux/amd64, linux/arm64

Testing

To test this image:

docker pull docker.io/communityfirst/comapeo-docs-api:pr-169
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-169

Built with commit 156021d
@codex review
Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

This increment adds data-URL image handling (routed through the processing pipeline instead of kept inline), removes the batch-level timeout, and improves failed-page accounting.
Issue Details
Files Reviewed (8 files)
Key Changes Verified
Previously Resolved Issues (carried forward)
Reviewed by minimax-m2.5-20260211 · 1,573,167 tokens
🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-169.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

This preview will update automatically when you push new commits to this PR.

Built with commit 156021d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33c1581bba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
… overhead
P1: TRANSLATION_COMPLETENESS_MAX_RETRIES was 2, so halving from 120k only
reached 60k → 30k before giving up. Reaching the 8k floor requires 4
halvings (120k→60k→30k→15k→8k), so raise the constant to 4.
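The arithmetic behind P1 can be sketched directly (constant names match the PR; the function itself is illustrative):

```typescript
// Hypothetical sketch of the halving schedule; only the constants are
// taken from the PR, the function is for illustration.
const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
const TRANSLATION_COMPLETENESS_MAX_RETRIES = 4;

// Chunk limits tried across the initial attempt plus each retry:
// 120k → 60k → 30k → 15k → 8k (the last halving clamps to the floor).
function halvingSchedule(): number[] {
  const limits = [TRANSLATION_CHUNK_MAX_CHARS];
  for (let i = 0; i < TRANSLATION_COMPLETENESS_MAX_RETRIES; i++) {
    limits.push(
      Math.max(
        TRANSLATION_MIN_CHUNK_MAX_CHARS,
        Math.floor(limits[limits.length - 1] / 2),
      ),
    );
  }
  return limits;
}
```

With the old value of 2 the schedule stops at 30k, which is why the constant had to be raised to 4 for the 8k floor to be reachable.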
P2: getChunkContentBudget was flooring the *content* budget at
TRANSLATION_MIN_CHUNK_MAX_CHARS (8k), ignoring prompt overhead (~2.6k).
This made the actual request larger than the documented 8k minimum.
Fix: subtract overhead from the total limit and floor the content budget
at 1; the 8k total-request floor is already enforced by the retry caller.
Update the "preserves heading structures" test to use a chunkLimit that
reflects a realistic total-request budget (3_200 chars) rather than a
raw content size (500 chars), which the old incorrect floor had masked.
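A minimal sketch of the corrected P2 budget computation, assuming the overhead is passed in (the ~2.6k figure comes from the commit above; the exact signature is an assumption):

```typescript
// Sketch of the corrected budget computation. Prompt overhead is
// subtracted from the total request limit; the 8k total-request floor
// is enforced by the retry caller, not here.
function getChunkContentBudget(
  chunkLimit: number,
  promptOverhead: number,
): number {
  // Floor the *content* budget at 1 char, not at the 8k total minimum.
  return Math.max(1, chunkLimit - promptOverhead);
}
```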
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd3b4d24b7
collectMarkdownStructureMetrics only matched ATX headings (# Heading). CommonMark/Docusaurus also accept setext headings (underline with === or ---). If the model reformats a heading into setext style the count would drop and translateText would incorrectly treat the translation as incomplete. Add a multiline regex for setext headings and include them in headingCount.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c36dfe9b16
The previous regex /^.+\n[=\-]{2,}\s*/gm matched any non-empty line
followed by --- or ===, which also matches list items before thematic
breaks (e.g. "- Item\n---"). This caused isSuspiciouslyIncompleteTranslation
to count spurious headings in the source and falsely flag complete
translations as incomplete.
Fix: only match === underlines (setext H1). The = character has no other
CommonMark meaning, so this is unambiguous. Setext H2 (--- underline) is
skipped because it cannot be distinguished from a thematic break without
a full parser. Notion content uses ATX headings exclusively anyway.
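A minimal sketch of the narrowed check (the regexes and function name are illustrative, not the exact implementation):

```typescript
// ATX headings (# Heading) plus setext H1 (=== underline) only;
// --- underlines are skipped because they collide with thematic breaks.
const ATX_HEADING = /^#{1,6}\s/gm;
const SETEXT_H1 = /^.+\n={2,}\s*$/gm;

function countHeadings(markdown: string): number {
  return (
    (markdown.match(ATX_HEADING)?.length ?? 0) +
    (markdown.match(SETEXT_H1)?.length ?? 0)
  );
}
```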
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4459a152a0
… check Setext H2 headings (Heading\n---): re-introduce detection with a negative lookahead that excludes lines starting with list markers or block-level prefixes, which avoids the thematic-break false positive while still catching genuine section headings.

Admonitions (:::type … :::): Docusaurus callout blocks can be silently dropped by the model without triggering any of the existing checks. Count opening and closing ::: pairs (like fenced code blocks) and treat a drop in admonitionCount as an incompleteness signal.
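One way to sketch both suggestions (the exact lookahead set and names are assumptions):

```typescript
// Setext H2: a text line underlined by ---, excluding lines that start
// with list markers, blockquotes, or whitespace (thematic-break guard).
const SETEXT_H2 = /^(?![-*+>\s]|\d+[.)]\s).+\n-{2,}\s*$/gm;
// Admonition fences: count opening and closing ::: lines, the same way
// fenced code blocks are counted.
const ADMONITION_FENCE = /^:{3,}/gm;

function countSetextH2(markdown: string): number {
  return markdown.match(SETEXT_H2)?.length ?? 0;
}

function countAdmonitionFences(markdown: string): number {
  return markdown.match(ADMONITION_FENCE)?.length ?? 0;
}
```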
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecdf12235a
…ection

P1 (table detection): the previous regex only matched GFM table rows with outer pipes (| A | B |). Models sometimes emit the pipeless form (A | B | C). Switch to matching GFM table *separator* rows instead; these are the unambiguous per-spec indicator of a table and work regardless of outer-pipe style. The threshold is lowered to 1 separator (from 2 data lines). The regex uses a simple character class plus .filter() to avoid ReDoS-unsafe nested quantifiers.

P2 (fenced code content): structural markers inside fenced code blocks (headings, list items, table rows, admonitions) were counted as real document structure. Strip fenced-block interiors before running all regex checks so that code samples do not inflate source counts and cause false-positive incompleteness failures.
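A rough sketch of both fixes (the regexes are illustrative simplifications of the approach described, not the committed code):

```typescript
// Count GFM table separator rows: lines made only of '-', ':', '|' and
// whitespace that contain at least one pipe and one dash, so both
// "| --- | --- |" and "--- | ---" count, while a lone thematic break
// "---" does not. A character class plus .filter() keeps the regex free
// of nested quantifiers.
function countTableSeparatorRows(markdown: string): number {
  return markdown
    .split("\n")
    .filter(
      (line) =>
        /^[\s:|-]+$/.test(line) && line.includes("|") && line.includes("-"),
    ).length;
}

// Blank out fenced code interiors (up to 3 spaces of indentation) while
// keeping the fence lines, so code samples don't inflate structure counts.
function stripFencedCodeContent(markdown: string): string {
  return markdown.replace(
    /^( {0,3}```[^\n]*\n)[\s\S]*?^( {0,3}```[ \t]*)$/gm,
    "$1$2",
  );
}
```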
@codex review
Fixes a TypeScript 'includes does not exist on type never' error and allows up to 3 spaces of optional indentation for fenced code blocks in stripFencedCodeContent.

Co-authored-by: Junie <junie@jetbrains.com>
Template literal contained unescaped backticks that broke the TypeScript parser. Also added explicit constraints forbidding JSX attribute wrapping and HTML/JSX tag modification to reduce future LLM corruption.
Replace flat per-file stats with a dynamic box-drawing metrics table (headings, images, lines, chars). Consolidate checks into a single human-readable block with inline timing and failure details. Trim logs section to three essential lines.
…calls
Tests that style={{...}}, className, and HTML block elements survive
translation intact across pt-BR and es. Skipped automatically when
OPENAI_API_KEY is absent. Loads .env for local runs. Serves as
acceptance criterion for the planned jsxPlaceholders protection layer.
Add three new test cases to `parseFrontmatterKeys` and integration tests:

1. parseFrontmatterKeys ignores body-level horizontal-rule blocks without anchor keys
2. parseFrontmatterKeys keeps real frontmatter and ignores later Video blocks
3. succeeds when a later chunk contains a body-level Video block

These tests validate that Video blocks appearing after markdown separators in the document body are correctly handled and do not interfere with frontmatter parsing or multi-chunk translation workflows.
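The anchoring behaviour these tests exercise can be sketched roughly like this. The real parseFrontmatterKeys is more involved; this illustration only shows why body-level --- separators are ignored:

```typescript
// Frontmatter only counts when the document *starts* with ---; any later
// --- separator (horizontal rules, Video blocks) in the body is ignored
// because the regex is anchored to the start of the string.
function parseFrontmatterKeys(markdown: string): string[] {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return [];
  return match[1]
    .split("\n")
    .map((line) => line.split(":")[0]?.trim() ?? "")
    .filter((key) => /^[A-Za-z_][\w-]*$/.test(key));
}
```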
…g i18n files

Docusaurus 3.7+ validates image paths during MDX compilation before remark plugins run, causing build failures when __locale_ref__ placeholders reach the i18n files on disk.

Add decodeLocaleImagePlaceholderPaths() (parallel to the existing remote variant) and call it in saveTranslatedContentToDisk() so translated files always contain canonical /images/... paths, never __locale_ref__ placeholders.

Also fix the two existing i18n files (creating-a-new-observation.md in pt and es) that were written by a previous pipeline run with unresolved placeholders.
…ests

- Gate JSX integration tests with an explicit RUN_LIVE_TRANSLATION_TESTS=1 flag instead of implicit OPENAI_API_KEY presence, preventing unintended live API calls in CI or on developer machines with .env files
- Fix heading chunking test: increase chunkLimit from 3200 to 3600 to account for ~3.05K TRANSLATION_PROMPT overhead; replace the fragile toHaveBeenCalledTimes(3) with behavioral assertions on translated headings and section-level chunking structure
- Fix Video/frontmatter test: increase chunkLimit from 3000 to 6000 to avoid budget=1, which caused splitByLines to drop newlines during character-level force-splitting, corrupting frontmatter in the reassembled markdown

Refs: #166
…ion test
The splitByLines function used text.split("\n") which consumed newline
characters. When lines didn't fit and current was flushed, the "\n"
separator between them was lost, making reassembly via join("") corrupt
frontmatter structure (e.g. "---\ntitle" became "---title").
Fix: use split(/(\n)/) to keep newlines as explicit tokens in the output
array, so join("") reconstructs the original text exactly. This mirrors
the approach already used by splitByParagraphs.
Add a regression test that uses the original chunkLimit: 3_000 which
produces a content budget of ~1 char, forcing character-level splitting.
The test asserts that frontmatter keys, Video blocks, and body headings
all survive the round-trip through many tiny chunks.
Refs: #166
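The core of the fix can be sketched as follows (tokenization step only; the real splitByLines also packs tokens into budgeted chunks, and the name here is illustrative):

```typescript
// split("\n") drops the separators, so join("") turns "---\ntitle" into
// "---title". A capturing group in the split pattern keeps each "\n" as
// its own token, making join("") an exact inverse of the split.
function splitIntoLineTokens(text: string): string[] {
  return text.split(/(\n)/).filter((token) => token.length > 0);
}
```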
…ocks

- remark-fix-image-paths: use Node from @types/unist via a local UnistNode = Node & Record<string, unknown> intersection instead of any
- generateBlocks.test.ts: narrow 5 new mock-call lambdas from any[] to unknown[], which TypeScript accepts since the bodies already guard with typeof
When loadCanonicalEnglishMarkdown returns null (e.g. docs/ not yet generated in CI), getSourceMarkdownForTranslation now logs the fallback to the Notion API instead of silently proceeding.
eval/ scripts use Bun-only .ts-extension dynamic imports and globalThis.fetch assignment that tsc rejects. They're standalone Bun scripts, not part of the compiled output.
…abels

ensureTranslatedFrontmatter was replacing title, sidebar_label, and pagination_label all with the same effectiveTitle, discarding any distinct values the LLM correctly translated.

Add an extractFrontmatterValue helper that reads sidebar_label and pagination_label from the LLM-translated content (translatedContent), falling back to effectiveTitle only when the key is absent or blank. title always uses effectiveTitle to guard against LLM hallucination.

Add 5 regression tests covering: distinct-label preservation, no-frontmatter fallback, blank-value fallback, title hallucination guard, and no-key-injection when the English canonical lacks the key.
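A plausible shape for the helper (the signature, regex, and quote handling are assumptions, not the committed code):

```typescript
// Read a frontmatter key's value from translated markdown; returns null
// when there is no frontmatter, the key is absent, or its value is blank,
// so the caller can fall back to effectiveTitle.
function extractFrontmatterValue(
  markdown: string,
  key: string,
): string | null {
  const frontmatter = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!frontmatter) return null;
  const line = frontmatter[1]
    .split("\n")
    .find((candidate) => candidate.startsWith(`${key}:`));
  const value = line
    ?.slice(key.length + 1)
    .trim()
    .replace(/^["']|["']$/g, "");
  return value ? value : null;
}
```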
…fallback

In the fallback path where canonical English markdown is unavailable and content is fetched from Notion directly, S3 image URLs have no local file equivalents. Switch imageHandling from placeholder-path to placeholder-text so the LLM receives [Image: alt] tokens instead of unresolvable /images/__remote_ref__/... paths.

Update the imageStabilization test to use a real S3 hostname and assert the text-placeholder behaviour for this code path.
…body content

recoverMissingTranslatedHeadings only checked aggregate completeness metrics before accepting a recovered document, so a heading could be reinserted with zero body content beneath it (content merged into an adjacent section) and the recovery would silently pass.

After the aggregate completeness guard, re-parse the recovered document and verify that every actually-inserted heading has a non-empty sectionContentLength. Track actually-inserted indexes in a separate set to exclude headings skipped by hasHeadingNearInsertionPoint (cached but never inserted), and apply toLocaleLowerCase on both sides of the lookup to match the normalization used elsewhere in the function.

Also add a security/detect-object-injection ESLint override for scripts/notion-translate/ (same rationale as the existing override for scripts/notion-fetch/generateBlocks.ts: controlled property access, not user input).
🧹 Preview Deployment Cleanup

The preview deployment for this PR has been cleaned up.

Preview URL was:

Note: Cloudflare Pages deployments follow automatic retention policies. Old previews are cleaned up automatically.
Problem
Closes #166
Long-form Notion pages (troubleshooting, create/edit observation, view/edit track, understanding exchange, etc.) were silently dropping sections during automatic translation. The root cause: feeding very large chunks (up to 500 K chars) to the model saturated its effective attention window, causing it to omit headings and paragraphs without raising any error.
Solution
Two complementary mechanisms:
1. Proactive aggressive chunking
Lowered TRANSLATION_CHUNK_MAX_CHARS from 500,000 → 120,000 chars. Each translation request now stays well within the model's reliable attention range, reducing the chance of content being dropped in the first place.

2. Structural completeness validation + retry
After every translation call, the result is compared against the source using structural markers:
If incompleteness is detected, the chunk limit is halved and the translation is retried (up to TRANSLATION_COMPLETENESS_MAX_RETRIES = 4 times, with a floor of TRANSLATION_MIN_CHUNK_MAX_CHARS = 8,000 chars). After exhausting retries, a non-critical error is surfaced so the pipeline can log and continue.

Test plan
token_overflow

bunx vitest run scripts/notion-translate/translateFrontMatter.test.ts