Skip to content

feat(translate): add no-overwrite strategy for human-reviewed translations#179

Open
luandro wants to merge 24 commits intomainfrom
issue-171-translation-no-overwrite
Open

feat(translate): add no-overwrite strategy for human-reviewed translations#179
luandro wants to merge 24 commits intomainfrom
issue-171-translation-no-overwrite

Conversation

@luandro
Copy link
Copy Markdown
Contributor

@luandro luandro commented Apr 7, 2026

Summary

Fixes #171

When a translation already exists and the English source is newer, the pipeline previously overwrote the human-reviewed translation. This PR routes those cases to a safe automated path that creates a new versioned file instead of touching the existing one.

  • No existing translation → normal creation path (unchanged)
  • Existing translation, English is newer → automated path: new timestamped file + new Notion page with Language = "PT - automated" / "ES - automated", using forceCreate to bypass DB deduplication
  • Verification error on block check → same automated path (safe default)
  • --local-only + existing i18n file → automated path, no Notion writes

Key changes

File What changed
scripts/constants.ts AUTOMATED_LANGUAGE_MAP, AUTOMATED_OUTPUT_DIRS, helper functions
scripts/notion-translate/translateBlocks.ts forceCreate 9th param on createNotionPageWithBlocks
scripts/notion-translate/index.ts updateKind discriminant on TranslationUpdateResult; saveAutomatedTranslationToDisk(); processAutomatedTranslation(); automatedTranslations counter wired through 4 locations
scripts/push-new-translation-to-notion.ts New CLI to push a saved automated file to Notion (reads parentId from .notion.json sidecar)
scripts/run-single-page-translation.ts Local smoke-test script
scripts/notion-translate/__tests__/no-overwrite.test.ts 7 integration tests for all routing scenarios

Sidecar format

Automated translation runs now write a .notion.json sidecar alongside each .md file:

{
  "parentId": "<notion-parent-block-id>",
  "blocks": [ ...translated Notion blocks... ]
}

The push script reads parentId from here — no need to add it to markdown frontmatter manually.

Test plan

  • All 7 no-overwrite scenarios pass (bunx vitest run scripts/notion-translate/__tests__/no-overwrite.test.ts)
  • Full test suite: 3,224 passing, 1 pre-existing unrelated failure (constants.test.ts model name check), 0 new failures
  • TypeScript: bunx tsc --noEmit — clean
  • Lint: no new errors or warnings
  • Live end-to-end: bun run notion:translate -- --local-only --page-id <id> should write to automated-translations/ when a translation already exists

Greptile Summary

This PR adds a no-overwrite strategy for human-reviewed translations: when an existing translation is found and the English source is newer, instead of overwriting it the pipeline now creates a new versioned file in automated-translations/ with millisecond-precision timestamps, optionally paired with a .notion.json sidecar for later Notion push. A new processAutomatedTranslation helper, push-new-translation-to-notion.ts CLI, and 7 integration tests cover all routing scenarios.

The three concerns raised in the previous review round (S3 URL leak in automated path, hardcoded page ID as default, same-second filename collision) are all addressed in the commit history.

Confidence Score: 5/5

Safe to merge — all three prior P0/P1 concerns are resolved; the one remaining finding is a P2 style query about an intentional guard removal.

The S3 URL leak in the automated path, hardcoded page ID default, and millisecond-timestamp collision are all addressed in the commit history. The only new finding is the removal of the color: null Notion write guard in translateBlocks.ts, which is speculative and does not constitute a confirmed current defect.

scripts/notion-translate/translateBlocks.ts — verify the intentional removal of the color: null cleanup guard.

Important Files Changed

Filename Overview
scripts/notion-translate/index.ts Core routing logic: adds updateKind discriminant, processAutomatedTranslation, saveAutomatedTranslationToDisk, and getTranslatedBlocksForDisk; wires automatedTranslations counter through four locations. S3 URL leak guard is now present in the automated path.
scripts/notion-translate/translateBlocks.ts Adds forceCreate 9th param to createNotionPageWithBlocks, refactors retryPageId initialisation; restructures null-property cleanup — moves icon: null guard but drops the color: null guard entirely, a potential regression.
scripts/constants.ts Adds AUTOMATED_OUTPUT_DIRS, AUTOMATED_LANGUAGE_MAP, and four helper functions; the ${code}-automated fallback in getAutomatedLanguageCode is bounded by validateAutomatedLanguageOptions at call time.
scripts/push-new-translation-to-notion.ts New CLI tool to push a pre-translated sidecar file to Notion; validates --language against known automated codes, checks sidecar existence, and uses forceCreate to bypass deduplication.
scripts/notion-translate/tests/no-overwrite.test.ts 7 well-structured integration tests covering all routing scenarios (no translation, translation newer, English newer, empty translation, verification error, local-only, three-level hierarchy).
.github/workflows/translate-docs.yml Wires automatedTranslations into the summary parse step and adds it to the Notion status-update gate condition; correctly set in $GITHUB_OUTPUT.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[processLanguageTranslations] --> B{localOnly?}
    B -- No --> C[findTranslationPage]
    C --> D[needsTranslationUpdate]
    D -- needsUpdate=false --> E[skip ⏭]
    D -- updateKind=no-translation\nor empty-translation --> F[processSinglePageTranslation\nnormal path]
    D -- updateKind=newer-english\nor verification-error\nAND translationPage≠null --> G[automatedPathNeeded=true]
    B -- Yes --> H{i18n file exists\non disk?}
    H -- No --> F
    H -- Yes --> G
    G --> I{sectionType=toggle?}
    I -- Yes --> E
    I -- No --> J[processAutomatedTranslation]
    J --> K[translateText + S3 guard]
    K --> L{localOnly?}
    L -- No --> M[translateNotionBlocks\ncreateNotionPageWithBlocks\nforceCreate=true]
    L -- Yes --> N[skip Notion write]
    M --> O[saveAutomatedTranslationToDisk\n+ .notion.json sidecar]
    N --> O
    O --> P[onNew ➜ automatedTranslations++]
    F --> Q[new/updated translation\nin i18n dir]
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/notion-translate/translateBlocks.ts
Line: 250-254

Comment:
**`color: null` guard dropped in refactor**

The original code deleted both `icon: null` and `color: null` from block type objects before writing to Notion, with an explicit comment: *"Notion returns as null but rejects on append."* The refactor kept the `icon` guard but removed the `color` guard entirely. If the Notion API still rejects `color: null` on block-append calls, any block whose runtime `color` property is `null` will silently fail or return an API error on the first write after this change.

If the removal is intentional (e.g., Notion now accepts `null` for `color` or it was never actually null in practice), a brief comment explaining why would prevent future re-introduction of the guard.

```suggestion
      // Clean up unsupported properties that Notion API rejects on block creation.
      // The Notion API returns these fields when reading but rejects them as null on write.
      if ("icon" in typeObj && typeObj.icon === null) {
        delete typeObj.icon;
      }
      if ("color" in typeObj && typeObj.color === null) {
        delete typeObj.color;
      }
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (13): Last reviewed commit: "fix(workflow): include automatedTranslat..." | Re-trigger Greptile

…tions (#171)

When a translation exists and the English source is newer, route to a
new automated path instead of overwriting the existing human-reviewed
content. The automated path:

- Creates a timestamped versioned file in ./automated-translations/{lang}
- Writes a .notion.json sidecar (with parentId) for later Notion push
- Creates a new Notion page with Language = "PT - automated" / "ES - automated"
  using forceCreate=true to bypass DB title+language deduplication

Key additions:
- AUTOMATED_LANGUAGE_MAP, AUTOMATED_OUTPUT_DIRS, helper fns in constants.ts
- forceCreate param on createNotionPageWithBlocks in translateBlocks.ts
- updateKind discriminant field on TranslationUpdateResult
- saveAutomatedTranslationToDisk() and processAutomatedTranslation() in index.ts
- automatedTranslations counter wired through summary types and main()
- scripts/push-new-translation-to-notion.ts — push saved file to Notion
- scripts/run-single-page-translation.ts — local smoke-test script
- 7 integration tests covering all routing scenarios (no-overwrite.test.ts)

Fixes #171
greptile-apps[bot]

This comment was marked as off-topic.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🐳 Docker Image Published

Your Docker image has been built and pushed for this PR.

Image Reference: docker.io/communityfirst/comapeo-docs-api:pr-179

Platforms: linux/amd64, linux/arm64

Testing

To test this image:

docker pull docker.io/communityfirst/comapeo-docs-api:pr-179
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-179

Built with commit 46c2113

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-179.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

💡 Tip: Add label fetch-all-pages to test with full content, or fetch-10-pages for broader coverage.

This preview will update automatically when you push new commits to this PR.


Built with commit 46c2113

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ce6d609b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/push-new-translation-to-notion.ts
Comment thread scripts/notion-translate/index.ts Outdated
luandro and others added 7 commits April 7, 2026 11:39
…e strategy

Creates and verifies real Notion pages with Language = "PT - automated"
using forceCreate=true. Gated behind RUN_LIVE_NOTION_TESTS=1 so CI is safe.

Run:
  RUN_LIVE_NOTION_TESTS=1 bunx vitest run scripts/notion-translate/__tests__/no-overwrite.live.test.ts
  RUN_LIVE_NOTION_TESTS=1 NO_OVERWRITE_KEEP_PAGE=1 ...  # keep pages for visual inspection

Two tests:
1. Single page created with correct Language property and parent relation
2. Two calls with forceCreate=true produce two distinct pages (dedup bypassed)
…Notion write

Notion returns `icon: null` inside block type sub-objects and `object`
at the block root when reading pages, but rejects both on block creation.
Stripping them from translateBlocksTree prevents API errors like:
  "body.children[N].paragraph.icon should be an object or undefined, instead was null"

Found during live end-to-end test of the no-overwrite automated path.
Add Scenario 8 (sibling matching via shared container) and Scenario 9
(batch-mode with container skip) to validate the no-overwrite translation
flow handles the three-level Notion hierarchy correctly.

Scenario 8: ChildEnglish resolves ChildPT through the shared ParentContainer
relation without using --local-only.

Scenario 9: fetchPublishedEnglishPages returns both ParentContainer and
ChildEnglish; ParentContainer is skipped (no Parent item), ChildEnglish
reaches the automated no-overwrite path.

Introduces setupThreeLevelMocks() helper that returns multiple English pages
for the Publish Status filter, matching real fetchPublishedEnglishPages behavior.
…ated path (Issue #171)

- Validate Notion database has required automated select options before
  creating pages, with graceful fallback when DATABASE_ID is absent
- Move toggle-page gate before processAutomatedTranslation so
  skippedTranslations counter increments correctly
- Skip Notion page creation when block translation fails or returns
  empty, still saving disk artifact to avoid data loss
- Make DATABASE_ID optional in notionClient (DATA_SOURCE_ID suffices)
- Add test scenarios 10–15 covering validation, caching, block failure,
  and empty-block edge cases
- Rewrite smoke-test script for page-specific no-overwrite verification
…ecar generation

- Add type-safe error handling in catch block: check `error instanceof Error` before accessing `.message`
- Fix Order property guard in processAutomatedTranslation: use `typeof orderProp?.number === "number"` to correctly handle falsy zero values (×2 locations)
- Replace hardcoded automated language strings with LANGUAGES-derived values in validateAutomatedLanguageOptions
- Add missing sourceProperties parameter to sidecar metadata for metadata preservation
- Add seconds precision to datetime suffix generation in sidecar output
- Refactor sidecar normalization to extract Language property and preserve Order and Tags metadata
- Added `notion.databases.retrieve` mock to fix silent failure in no-overwrite strategy path
- Typed all filter callbacks, replacing `any` types with proper `NotionFilter` shape signatures
- Moved `afterEach` hook adjacent to `beforeEach` for test structure clarity
- Added human-reviewed skip unit test: verifies translation with Human Reviewed status and content is not marked for update
- Added human-reviewed integration test: validates end-to-end routing of Human Reviewed translation to automated path when English is newer
- Added three-level hierarchy test: ensures findSiblingTranslations correctly traverses immediate parent even in deep hierarchies (grandparent→parent→child)
- Added sidecar sourceProperties tests: validate metadata writing to .notion.json sidecar files, excluding Language field
…constant

- Language validation now computed from LANGUAGES constant via getAutomatedLanguageCode()
- Replaces hardcoded 'PT - automated' and 'ES - automated' check
- Adds sourceProperties passthrough support in sidecar for preserving source Notion properties
- Adds 4 new unit tests covering: sourceProperties passthrough, legacy sidecar array compatibility, Language property override, and partial properties handling
greptile-apps[bot]

This comment was marked as off-topic.

greptile-apps[bot]

This comment was marked as off-topic.

Comment thread scripts/notion-translate/index.ts
Comment thread scripts/run-single-page-translation.ts Outdated
Comment thread scripts/notion-translate/index.ts
luandro and others added 4 commits April 13, 2026 02:04
…s and model params

- Add tests for all 6 Issue-171 automated-language exports: AUTOMATED_OUTPUT_DIRS,
  AUTOMATED_LANGUAGE_MAP, isAutomatedLanguageCode, getBaseLanguageCode,
  getAutomatedLanguageCode, getAutomatedOutputDir
- Add coverage for getModelContextLimit, getMaxChunkChars, and GPT-5.2 fallback
- Fix brittle LANGUAGES.length assertion to use toBeGreaterThanOrEqual(2)
- Consolidate describe blocks under root describe("constants") for consistent
  env guard inheritance
- Remove redundant tests, false-positive env-var checks, and improve test organization
- Fix import ordering (values before types)
Adds check for raw Notion S3 URLs that shouldn't be leaked.

Co-authored-by: Junie <junie@jetbrains.com>
Use ms precision for timestamp generation in saveAutomatedTranslationToDisk to avoid minute-long waits in tests.

Require --page-id in run-single-page-translation.ts and adjust buffer to 100ms.

Co-authored-by: Junie <junie@jetbrains.com>
Copy link
Copy Markdown

@capy-ai capy-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 1 comment

Comment thread scripts/notion-translate/index.ts
@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 14, 2026

QA Checklist

✅ Verified locally (automated)

  • 15/15 routing scenarios passbunx vitest run scripts/notion-translate/__tests__/no-overwrite.test.ts
  • 4/4 push-CLI tests passscripts/push-new-translation-to-notion.test.ts
  • TypeScript cleanbunx tsc --noEmit
  • Lint clean — no new errors on changed files

🔲 Live checks (need Notion access + .env)

Precondition for checks 1–2: In Notion, verify ChildEnglish (26b1b08162d580e1aeeadbd31cbbe351) has last_edited_time newer than ChildPT (26b1b08162d580b0...). If not, open the English page, add a space and save.

1. Full-batch hierarchy ← most important

bun run notion:translate

Look for in terminal output:

  • Skipping "Ending a Project" - missing required Parent item relation (container skipped)
  • Processing: Completing or Ending a Project (child English enters pipeline)
  • Automated path fires for Portuguese

In Notion after run:

  • ParentContainer (2331b08162d58090ab6ad6c1c5f60dda) has no new PT/ES-automated child
  • A new PT - automated page exists under the same parent as ChildPT
  • ChildPT "Concluir um projeto" is untouched (same content/edited time)
  • File written under ./automated-translations/pt-BR/

2. Single-page automated flow

bun run notion:translate -- --page-id 26b1b08162d580e1aeeadbd31cbbe351

In Notion after run:

  • New PT - automated page exists under ChildPT's parent
  • ls automated-translations/pt-BR/ shows a new file
  • Existing ChildPT unchanged

3. DATABASE_ID="" degraded path

DATABASE_ID="" bun run notion:translate -- --page-id 26b1b08162d580e1aeeadbd31cbbe351
  • Log contains: Cannot verify automated language select options without DATABASE_ID — ensure they exist in Notion
  • Workflow completes without crash

4. Toggle-page skip

Find a Notion page with Element Type = toggle, Publish Status = Ready for translation, and an existing PT/ES translation older than English. Then:

bun run notion:translate -- --page-id <toggle-page-id>
  • Log contains: Skipping toggle page "..." — automated path does not handle toggles
  • No new Notion page or disk file created for that toggle

🧹 Post-QA cleanup

Archive stale PT - automated / ES - automated test pages under [TEST] Installation Guide (3031b08162d5819799b8f13cb9cd1ffa) in Notion.


Pass criteria: all 4 live checks green → PR ready to merge.

@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 14, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fa2669fb9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/push-new-translation-to-notion.ts Outdated
…line

indexOf('---', 3) matched '---' anywhere inside YAML values, truncating
the parsed slice early and producing an incorrect title/metadata when
pushed to Notion.

Replace with a line-by-line scan that stops only on a line whose trimmed
content is exactly '---'.  Add a regression test with a title that contains
'---'.
@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 14, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d46d423099

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/notion-translate/translateBlocks.ts Outdated
Comment thread scripts/notion-translate/index.ts Outdated
luandro added 2 commits April 15, 2026 08:14
- validateAutomatedLanguageOptions: skip validation only when both
  DATABASE_ID and DATA_SOURCE_ID are absent; previously DATA_SOURCE_ID-
  only workflows bypassed schema validation entirely
- createNotionPageWithBlocks: set retryPageId when an existing page is
  found via DB query, avoiding a redundant re-query on retry
- push-new-translation: remove dead bare-array sidecar fallback that
  always threw due to missing parentId
- push-new-translation: replace fragile hand-rolled frontmatter parser
  with yaml package (already in deps) for correct YAML handling
The test set DATA_SOURCE_ID, so validateAutomatedLanguageOptions now
validates via dataSources.retrieve rather than taking the early-return
warning path. Update the assertion to confirm the validation warning
is NOT emitted (instead of asserting it IS emitted), and update the
test description to reflect the DATA_SOURCE_ID-only code path.
@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 15, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f4fd69c92

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/notion-fetch/pageGrouping.ts
Comment thread scripts/notionClient.ts
…wed translations

P1: Introduce `automated` source priority between `explicit` and `fallback`.
`AUTOMATED_LANGUAGE_NAMES` tags "pt - automated"/"es - automated" pages so
they can never displace a human-reviewed page for the same locale, regardless
of Sub-item ordering. `SOURCE_RANK` map enforces the three-level hierarchy at
module scope.

P2: Guard `fetchNotionPage()` against nullable DATABASE_ID by falling back to
DATA_SOURCE_ID and throwing a descriptive error when both are unset.
@greptile-apps

This comment was marked as off-topic.

@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 15, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc894fddec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/notion-translate/index.ts
…gate

The "Update Notion Status → Auto Translation Generated" step was gated
on newTranslations || updatedTranslations, but automated translations
only increment automatedTranslations. Automated-only runs therefore
silently skipped the status update.

Parse and expose automated_translations from translation-summary.json
(both JSON and legacy log-parse paths) and add it to the step condition.
@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 15, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@luandro
Copy link
Copy Markdown
Contributor Author

luandro commented Apr 15, 2026

@tomasciccola this PR should be ready to merge, could you go through the QA which is documented here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Updating a translation page shouldn't overwrite the existing translation

1 participant