feat(translate): add no-overwrite strategy for human-reviewed translations#179
…tions (#171)

When a translation exists and the English source is newer, route to a new automated path instead of overwriting the existing human-reviewed content.

The automated path:
- Creates a timestamped versioned file in ./automated-translations/{lang}
- Writes a .notion.json sidecar (with parentId) for later Notion push
- Creates a new Notion page with Language = "PT - automated" / "ES - automated" using forceCreate=true to bypass DB title+language deduplication

Key additions:
- AUTOMATED_LANGUAGE_MAP, AUTOMATED_OUTPUT_DIRS, helper fns in constants.ts
- forceCreate param on createNotionPageWithBlocks in translateBlocks.ts
- updateKind discriminant field on TranslationUpdateResult
- saveAutomatedTranslationToDisk() and processAutomatedTranslation() in index.ts
- automatedTranslations counter wired through summary types and main()
- scripts/push-new-translation-to-notion.ts: push saved file to Notion
- scripts/run-single-page-translation.ts: local smoke-test script
- 7 integration tests covering all routing scenarios (no-overwrite.test.ts)

Fixes #171
🐳 Docker Image Published
Your Docker image has been built and pushed for this PR.
Image Reference:
Platforms: linux/amd64, linux/arm64
Testing
To test this image:
  docker pull docker.io/communityfirst/comapeo-docs-api:pr-179
  docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-179
Built with commit 46c2113

🚀 Preview Deployment
Your documentation preview is ready!
Preview URL: https://pr-179.comapeo-docs.pages.dev
🔄 Content: Regenerated 5 pages from Notion (script changes detected)
This preview will update automatically when you push new commits to this PR.
Built with commit 46c2113
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ce6d609b0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…e strategy

Creates and verifies real Notion pages with Language = "PT - automated" using forceCreate=true. Gated behind RUN_LIVE_NOTION_TESTS=1 so CI is safe.

Run:
  RUN_LIVE_NOTION_TESTS=1 bunx vitest run scripts/notion-translate/__tests__/no-overwrite.live.test.ts
  RUN_LIVE_NOTION_TESTS=1 NO_OVERWRITE_KEEP_PAGE=1 ... # keep pages for visual inspection

Two tests:
1. Single page created with correct Language property and parent relation
2. Two calls with forceCreate=true produce two distinct pages (dedup bypassed)
…Notion write

Notion returns `icon: null` inside block type sub-objects and `object` at the block root when reading pages, but rejects both on block creation. Stripping them from translateBlocksTree prevents API errors like:

  "body.children[N].paragraph.icon should be an object or undefined, instead was null"

Found during live end-to-end test of the no-overwrite automated path.
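A minimal sketch of the cleanup described above, assuming a simplified block shape. The `NotionBlock` type and the `sanitizeBlockForCreate` name are hypothetical and are not taken from `translateBlocks.ts`; the sketch only illustrates dropping the root `object` field and a null `icon` before a block is sent back to Notion.

```typescript
// Hypothetical, simplified block shape; real Notion block types are richer.
type NotionBlock = { type: string; object?: string; [key: string]: unknown };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a copy of the block without the read-only fields that the
// create endpoint rejects: `object` at the root and `icon: null` inside
// the type-specific sub-object. Nested children are cleaned recursively.
function sanitizeBlockForCreate(block: NotionBlock): NotionBlock {
  const cleaned: NotionBlock = { ...block };
  delete cleaned.object;

  const payload = cleaned[block.type];
  if (isRecord(payload)) {
    const payloadCopy: Record<string, unknown> = { ...payload };
    if (payloadCopy.icon === null) {
      delete payloadCopy.icon; // must be an object or undefined, never null
    }
    if (Array.isArray(payloadCopy.children)) {
      payloadCopy.children = payloadCopy.children.map((child) =>
        isRecord(child) && typeof child.type === "string"
          ? sanitizeBlockForCreate(child as NotionBlock)
          : child
      );
    }
    cleaned[block.type] = payloadCopy;
  }
  return cleaned;
}
```

The function copies rather than mutates, so the blocks read from the source page stay untouched if they are needed elsewhere.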
Add Scenario 8 (sibling matching via shared container) and Scenario 9 (batch-mode with container skip) to validate the no-overwrite translation flow handles the three-level Notion hierarchy correctly. Scenario 8: ChildEnglish resolves ChildPT through the shared ParentContainer relation without using --local-only. Scenario 9: fetchPublishedEnglishPages returns both ParentContainer and ChildEnglish; ParentContainer is skipped (no Parent item), ChildEnglish reaches the automated no-overwrite path. Introduces setupThreeLevelMocks() helper that returns multiple English pages for the Publish Status filter, matching real fetchPublishedEnglishPages behavior.
…ated path (Issue #171)

- Validate Notion database has required automated select options before creating pages, with graceful fallback when DATABASE_ID is absent
- Move toggle-page gate before processAutomatedTranslation so skippedTranslations counter increments correctly
- Skip Notion page creation when block translation fails or returns empty, still saving disk artifact to avoid data loss
- Make DATABASE_ID optional in notionClient (DATA_SOURCE_ID suffices)
- Add test scenarios 10–15 covering validation, caching, block failure, and empty-block edge cases
- Rewrite smoke-test script for page-specific no-overwrite verification
…ecar generation

- Add type-safe error handling in catch block: check `error instanceof Error` before accessing `.message`
- Fix Order property guard in processAutomatedTranslation: use `typeof orderProp?.number === "number"` to correctly handle falsy zero values (×2 locations)
- Replace hardcoded automated language strings with LANGUAGES-derived values in validateAutomatedLanguageOptions
- Add missing sourceProperties parameter to sidecar metadata for metadata preservation
- Add seconds precision to datetime suffix generation in sidecar output
- Refactor sidecar normalization to extract Language property and preserve Order and Tags metadata
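To illustrate why the Order guard matters, here is a small sketch that contrasts a truthiness check with the `typeof` check named in the commit. The `OrderProperty` type and both function names are hypothetical; only the `typeof orderProp?.number === "number"` idea comes from the commit message.

```typescript
// Hypothetical shape of a Notion number property as read from a page.
type OrderProperty = { number?: number | null } | undefined;

// Truthiness check: an Order of 0 is falsy, so it is treated as missing.
function readOrderTruthy(orderProp: OrderProperty): number | undefined {
  const value = orderProp?.number;
  return value ? value : undefined;
}

// typeof check: keeps 0, and still rejects null and undefined.
function readOrder(orderProp: OrderProperty): number | undefined {
  const value = orderProp?.number;
  return typeof value === "number" ? value : undefined;
}
```

With the truthiness version a page whose Order is 0 would lose its position; the `typeof` version preserves it.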
- Added `notion.databases.retrieve` mock to fix silent failure in no-overwrite strategy path
- Typed all filter callbacks, replacing `any` types with proper `NotionFilter` shape signatures
- Moved `afterEach` hook adjacent to `beforeEach` for test structure clarity
- Added human-reviewed skip unit test: verifies translation with Human Reviewed status and content is not marked for update
- Added human-reviewed integration test: validates end-to-end routing of Human Reviewed translation to automated path when English is newer
- Added three-level hierarchy test: ensures findSiblingTranslations correctly traverses immediate parent even in deep hierarchies (grandparent→parent→child)
- Added sidecar sourceProperties tests: validate metadata writing to .notion.json sidecar files, excluding Language field
…constant

- Language validation now computed from LANGUAGES constant via getAutomatedLanguageCode()
- Replaces hardcoded 'PT - automated' and 'ES - automated' check
- Adds sourceProperties passthrough support in sidecar for preserving source Notion properties
- Adds 4 new unit tests covering: sourceProperties passthrough, legacy sidecar array compatibility, Language property override, and partial properties handling
…s and model params
- Add tests for all 6 Issue-171 automated-language exports: AUTOMATED_OUTPUT_DIRS,
AUTOMATED_LANGUAGE_MAP, isAutomatedLanguageCode, getBaseLanguageCode,
getAutomatedLanguageCode, getAutomatedOutputDir
- Add coverage for getModelContextLimit, getMaxChunkChars, and GPT-5.2 fallback
- Fix brittle LANGUAGES.length assertion to use toBeGreaterThanOrEqual(2)
- Consolidate describe blocks under root describe("constants") for consistent
env guard inheritance
- Remove redundant tests, false-positive env-var checks, and improve test organization
- Fix import ordering (values before types)
Adds check for raw Notion S3 URLs that shouldn't be leaked.

Co-authored-by: Junie <junie@jetbrains.com>

Use ms precision for timestamp generation in saveAutomatedTranslationToDisk to avoid minute-long waits in tests. Require --page-id in run-single-page-translation.ts and adjust buffer to 100ms.

Co-authored-by: Junie <junie@jetbrains.com>
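As a rough illustration of millisecond-precision file naming, the sketch below derives a filename-safe suffix from an ISO timestamp. The function names and the `<base>.<timestamp>.md` pattern are assumptions; the actual naming used by `saveAutomatedTranslationToDisk` is not shown in this conversation.

```typescript
// Builds a filename-safe UTC timestamp with millisecond precision by
// replacing the ':' and '.' separators of the ISO string with '-'.
function timestampSuffix(date: Date = new Date()): string {
  return date.toISOString().replace(/[:.]/g, "-");
}

// Hypothetical naming pattern for a versioned translation file.
function automatedFileName(baseName: string, date: Date = new Date()): string {
  return `${baseName}.${timestampSuffix(date)}.md`;
}
```

Two writes that land in the same second but different milliseconds get distinct names, which is the collision the commit is concerned with.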
QA Checklist
✅ Verified locally (automated)
🔲 Live checks (need Notion access +
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6fa2669fb9
…line
indexOf('---', 3) matched '---' anywhere inside YAML values, truncating
the parsed slice early and producing an incorrect title/metadata when
pushed to Notion.
Replace with a line-by-line scan that stops only on a line whose trimmed
content is exactly '---'. Add a regression test with a title that contains
'---'.
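The line-by-line scan can be sketched as follows. This is an illustrative helper, not the code from the push script; `extractFrontmatter` is a hypothetical name, and it returns only the raw YAML text between the delimiters.

```typescript
// Returns the raw YAML between the opening and closing '---' lines,
// or null when the document has no well-formed frontmatter.
// A delimiter is a line whose trimmed content is exactly '---', so a
// '---' inside a YAML value does not end the block early.
function extractFrontmatter(markdown: string): string | null {
  const [first, ...rest] = markdown.split(/\r?\n/);
  if (first === undefined || first.trim() !== "---") return null;
  const closing = rest.findIndex((line) => line.trim() === "---");
  return closing === -1 ? null : rest.slice(0, closing).join("\n");
}
```

For a document whose title is `"A --- B"`, `indexOf('---', 3)` would stop inside the title, while the scan above continues to the real closing delimiter.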
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d46d423099
- validateAutomatedLanguageOptions: skip validation only when both DATABASE_ID and DATA_SOURCE_ID are absent; previously DATA_SOURCE_ID-only workflows bypassed schema validation entirely
- createNotionPageWithBlocks: set retryPageId when an existing page is found via DB query, avoiding a redundant re-query on retry
- push-new-translation: remove dead bare-array sidecar fallback that always threw due to missing parentId
- push-new-translation: replace fragile hand-rolled frontmatter parser with yaml package (already in deps) for correct YAML handling
The test set DATA_SOURCE_ID, so validateAutomatedLanguageOptions now validates via dataSources.retrieve rather than taking the early-return warning path. Update the assertion to confirm the validation warning is NOT emitted (instead of asserting it IS emitted), and update the test description to reflect the DATA_SOURCE_ID-only code path.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f4fd69c92
…wed translations

P1: Introduce `automated` source priority between `explicit` and `fallback`. `AUTOMATED_LANGUAGE_NAMES` tags "pt - automated"/"es - automated" pages so they can never displace a human-reviewed page for the same locale, regardless of Sub-item ordering. `SOURCE_RANK` map enforces the three-level hierarchy at module scope.

P2: Guard `fetchNotionPage()` against nullable DATABASE_ID by falling back to DATA_SOURCE_ID and throwing a descriptive error when both are unset.
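A minimal sketch of the three-level priority from P1, assuming that a lower rank wins. The numeric rank values, the `Candidate` shape, and the `classifySource` and `pickPreferred` helpers are hypothetical; only the names `SOURCE_RANK` and `AUTOMATED_LANGUAGE_NAMES` and the ordering explicit, automated, fallback come from the commit message.

```typescript
type SourceKind = "explicit" | "automated" | "fallback";

// Lower rank wins. The numbers are an assumption; only the ordering is given.
const SOURCE_RANK: Record<SourceKind, number> = {
  explicit: 0,
  automated: 1,
  fallback: 2,
};

const AUTOMATED_LANGUAGE_NAMES = new Set(["pt - automated", "es - automated"]);

// Hypothetical candidate page for one locale.
interface Candidate {
  pageId: string;
  language: string;
  source: SourceKind;
}

// Tags a page as automated when its Language value is one of the automated names.
function classifySource(language: string, isExplicit: boolean): SourceKind {
  if (AUTOMATED_LANGUAGE_NAMES.has(language.trim().toLowerCase())) return "automated";
  return isExplicit ? "explicit" : "fallback";
}

// Keeps the current candidate unless the next one has a strictly better rank,
// so the result does not depend on the order in which pages are visited.
function pickPreferred(current: Candidate | undefined, next: Candidate): Candidate {
  if (!current) return next;
  return SOURCE_RANK[next.source] < SOURCE_RANK[current.source] ? next : current;
}
```

Because the comparison is strict, an automated page seen first is replaced by a later human-reviewed page, and a human-reviewed page seen first is never replaced by an automated one.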
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc894fddec
…gate

The "Update Notion Status → Auto Translation Generated" step was gated on newTranslations || updatedTranslations, but automated translations only increment automatedTranslations. Automated-only runs therefore silently skipped the status update.

Parse and expose automated_translations from translation-summary.json (both JSON and legacy log-parse paths) and add it to the step condition.
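The corrected gate can be expressed as a small predicate. This is only a TypeScript illustration of the condition; the real gate lives in the workflow, and the `TranslationSummary` field names below are assumptions about the shape of `translation-summary.json`.

```typescript
// Hypothetical summary shape; the real keys in translation-summary.json may differ.
interface TranslationSummary {
  newTranslations: number;
  updatedTranslations: number;
  // Assumed optional so that summaries without the new counter still parse.
  automatedTranslations?: number;
}

// The status update should run when any of the three counters is non-zero.
function shouldUpdateNotionStatus(summary: TranslationSummary): boolean {
  const automated = summary.automatedTranslations ?? 0;
  return summary.newTranslations > 0 || summary.updatedTranslations > 0 || automated > 0;
}
```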
@codex review
Codex Review: Didn't find any major issues. 🎉
@tomasciccola this PR should be ready to merge, could you go through the QA which is documented here?
Summary
Fixes #171
When a translation already exists and the English source is newer, the pipeline previously overwrote the human-reviewed translation. This PR routes those cases to a safe automated path that creates a new versioned file instead of touching the existing one.
- … `Language = "PT - automated"`/`"ES - automated"`, using `forceCreate` to bypass DB deduplication
- `--local-only` + existing i18n file → automated path, no Notion writes

Key changes

- `scripts/constants.ts`: `AUTOMATED_LANGUAGE_MAP`, `AUTOMATED_OUTPUT_DIRS`, helper functions
- `scripts/notion-translate/translateBlocks.ts`: `forceCreate` 9th param on `createNotionPageWithBlocks`
- `scripts/notion-translate/index.ts`: `updateKind` discriminant on `TranslationUpdateResult`; `saveAutomatedTranslationToDisk()`; `processAutomatedTranslation()`; `automatedTranslations` counter wired through 4 locations
- `scripts/push-new-translation-to-notion.ts`: … `parentId` from `.notion.json` sidecar)
- `scripts/run-single-page-translation.ts`
- `scripts/notion-translate/__tests__/no-overwrite.test.ts`

Sidecar format
Automated translation runs now write a `.notion.json` sidecar alongside each `.md` file:

    {
      "parentId": "<notion-parent-block-id>",
      "blocks": [ ...translated Notion blocks... ]
    }

The push script reads `parentId` from here, so there is no need to add it to markdown frontmatter manually.

Test plan
- … `bunx vitest run scripts/notion-translate/__tests__/no-overwrite.test.ts`)
- … `constants.test.ts` model name check), 0 new failures
- `bunx tsc --noEmit`: clean
- `bun run notion:translate -- --local-only --page-id <id>` should write to `automated-translations/` when a translation already exists

Greptile Summary
This PR adds a no-overwrite strategy for human-reviewed translations: when an existing translation is found and the English source is newer, instead of overwriting it, the pipeline now creates a new versioned file in `automated-translations/` with millisecond-precision timestamps, optionally paired with a `.notion.json` sidecar for later Notion push. A new `processAutomatedTranslation` helper, `push-new-translation-to-notion.ts` CLI, and 7 integration tests cover all routing scenarios.

The three concerns raised in the previous review round (S3 URL leak in automated path, hardcoded page ID as default, same-second filename collision) are all addressed in the commit history.
Confidence Score: 5/5

Safe to merge: all three prior P0/P1 concerns are resolved; the one remaining finding is a P2 style query about an intentional guard removal.

The S3 URL leak in the automated path, hardcoded page ID default, and millisecond-timestamp collision are all addressed in the commit history. The only new finding is the removal of the `color: null` Notion write guard in `translateBlocks.ts`, which is speculative and does not constitute a confirmed current defect.

scripts/notion-translate/translateBlocks.ts: verify the intentional removal of the `color: null` cleanup guard.

Important Files Changed
- … `updateKind` discriminant, `processAutomatedTranslation`, `saveAutomatedTranslationToDisk`, and `getTranslatedBlocksForDisk`; wires `automatedTranslations` counter through four locations. S3 URL leak guard is now present in the automated path.
- … `forceCreate` 9th param to `createNotionPageWithBlocks`, refactors `retryPageId` initialisation; restructures null-property cleanup: moves `icon: null` guard but drops the `color: null` guard entirely, a potential regression.
- … `AUTOMATED_OUTPUT_DIRS`, `AUTOMATED_LANGUAGE_MAP`, and four helper functions; the `${code}-automated` fallback in `getAutomatedLanguageCode` is bounded by `validateAutomatedLanguageOptions` at call time.
- … `--language` against known automated codes, checks sidecar existence, and uses `forceCreate` to bypass deduplication.
- … `automatedTranslations` into the summary parse step and adds it to the Notion status-update gate condition; correctly set in `$GITHUB_OUTPUT`.

Flowchart
    %%{init: {'theme': 'neutral'}}%%
    flowchart TD
        A[processLanguageTranslations] --> B{localOnly?}
        B -- No --> C[findTranslationPage]
        C --> D[needsTranslationUpdate]
        D -- needsUpdate=false --> E[skip ⏭]
        D -- updateKind=no-translation\nor empty-translation --> F[processSinglePageTranslation\nnormal path]
        D -- updateKind=newer-english\nor verification-error\nAND translationPage≠null --> G[automatedPathNeeded=true]
        B -- Yes --> H{i18n file exists\non disk?}
        H -- No --> F
        H -- Yes --> G
        G --> I{sectionType=toggle?}
        I -- Yes --> E
        I -- No --> J[processAutomatedTranslation]
        J --> K[translateText + S3 guard]
        K --> L{localOnly?}
        L -- No --> M[translateNotionBlocks\ncreateNotionPageWithBlocks\nforceCreate=true]
        L -- Yes --> N[skip Notion write]
        M --> O[saveAutomatedTranslationToDisk\n+ .notion.json sidecar]
        N --> O
        O --> P[onNew ➜ automatedTranslations++]
        F --> Q[new/updated translation\nin i18n dir]
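The routing in the flowchart can be condensed into one decision function. The sketch below is a reading of the diagram, not the code in `index.ts`: `RoutingInput` and `routeTranslation` are hypothetical names, and the behaviour for a `newer-english` or `verification-error` result with no translation page (not drawn in the chart) is assumed to fall back to the normal path.

```typescript
type UpdateKind =
  | "no-translation"
  | "empty-translation"
  | "newer-english"
  | "verification-error";

type Route = "skip" | "normal" | "automated";

// Hypothetical flattened view of the inputs the flowchart branches on.
interface RoutingInput {
  localOnly: boolean;
  i18nFileExists: boolean;
  needsUpdate: boolean;
  updateKind?: UpdateKind;
  hasTranslationPage: boolean;
  sectionType?: string;
}

function routeTranslation(input: RoutingInput): Route {
  if (input.localOnly) {
    // Local-only mode: an existing i18n file on disk is never overwritten.
    if (!input.i18nFileExists) return "normal";
  } else {
    if (!input.needsUpdate) return "skip";
    const existingIsStale =
      input.updateKind === "newer-english" ||
      input.updateKind === "verification-error";
    // Assumption: without an existing translation page there is nothing
    // to protect, so the normal path is used.
    if (!(existingIsStale && input.hasTranslationPage)) return "normal";
  }
  // Automated path is needed; toggle sections are skipped before it runs.
  return input.sectionType === "toggle" ? "skip" : "automated";
}
```

Keeping the toggle check after the automated decision mirrors the gate placement described in the commits, where skipped toggle pages are counted before `processAutomatedTranslation` is reached.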
Reviews (13): Last reviewed commit: "fix(workflow): include automatedTranslat..."