Fix whitespace preservation 8895408090764671849 by Azadron228 · Pull Request #901 · wagtail/wagtail-localize

Azadron228 · 2026-02-24T13:05:14Z

No description provided.

This change addresses a regression where multiple whitespaces between words or tags were collapsed during extraction and restoration of translatable strings. The root cause was BeautifulSoup's default behavior with the html.parser engine, which collapses whitespace at the root level of a fragment. By adding '[document]' to the 'preserve_whitespace_tags' list, we instruct the parser to keep all whitespaces across the fragment. Key changes: - Introduced a `get_soup` helper in `wagtail_localize/strings.py` that correctly configures BeautifulSoup. - Consistently used `get_soup` in all string extraction and restoration paths. - Added regression tests in `wagtail_localize/tests/test_strings.py`. Co-authored-by: Azadron228 <106530452+Azadron228@users.noreply.github.com>

- Configured BeautifulSoup to preserve whitespaces by including '[document]' in the preserve_whitespace_tags parameter. - Consistently used a get_soup helper in all string extraction and restoration paths. - Suppressed the MarkupResemblesLocatorWarning in get_soup to avoid warnings when parsing strings that look like URLs or file paths. - Added regression tests in wagtail_localize/tests/test_strings.py. Co-authored-by: Azadron228 <106530452+Azadron228@users.noreply.github.com>

Double whitespaces between words or tags were being lost during translation synchronization because BeautifulSoup's html.parser collapses whitespace-only nodes in certain contexts. This change introduces a more robust way to preserve all whitespace by configuring the BeautifulSoup builder with a custom preserve_whitespace_tags class that always returns True. Modified wagtail_localize/strings.py to centralize soup creation in a get_soup helper and updated all call sites. Also updated the dummy machine translator for consistency. Added regression tests in wagtail_localize/tests/test_strings.py. Co-authored-by: Azadron228 <106530452+Azadron228@users.noreply.github.com>

Marvinrose

Please see the review below

Marvinrose

Thanks for this contribution! Centralising BeautifulSoup calls into get_soup() is a clean improvement and the whitespace preservation approach is sensible.
However, the CI failure needs fixing. The pre-commit checks failed on ruff formatting and linting. You can fix this by running ' ruff check --fix . ' and ' ruff format . ' locally, then pushing the updated commit.
Suggestion: Whilst this is a great contribution, it'd be nice if you added a PR description. a description explaining what problem this solves, what caused it, or how to reproduce it. It would really help reviewers (and future contributors reading git history) to know: what specific scenario was losing whitespace, and which part of the pipeline was collapsing it?

Marvinrose · 2026-03-21T16:17:00Z

wagtail_localize/strings.py

 INLINE_TAGS = ["a", "abbr", "acronym", "b", "code", "em", "i", "strong", "br"]


+class PreserveWhitespaceTags:


This class returns True for every tag, meaning BeautifulSoup will now preserve whitespace inside all HTML elements including block-level ones like
and
. The existing INLINE_TAGS list suggests the codebase already distinguishes inline from block elements. Was it intentional to override this for all tags, or should PreserveWhitespaceTags only apply to inline tags?

google-labs-jules bot and others added 3 commits February 16, 2026 10:06

Marvinrose reviewed Mar 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix whitespace preservation 8895408090764671849#901

Fix whitespace preservation 8895408090764671849#901
Azadron228 wants to merge 3 commits intowagtail:mainfrom
Azadron228:fix-whitespace-preservation-8895408090764671849

Azadron228 commented Feb 24, 2026

Uh oh!

Marvinrose left a comment •

edited

Loading

Uh oh!

Marvinrose left a comment

Uh oh!

Marvinrose Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		INLINE_TAGS = ["a", "abbr", "acronym", "b", "code", "em", "i", "strong", "br"]


		class PreserveWhitespaceTags:

Conversation

Azadron228 commented Feb 24, 2026

Uh oh!

Marvinrose left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Marvinrose left a comment

Choose a reason for hiding this comment

Uh oh!

Marvinrose Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Marvinrose left a comment •

edited

Loading