
Fix cp1252 UnicodeEncodeError from emoji prints on Windows #203

Open
d123d wants to merge 1 commit into JuliusBrussee:main from d123d:fix/windows-cp1252-emojis

Conversation

d123d commented Apr 17, 2026

Problem

On Windows consoles with cp1252 encoding (default for many installs), print statements containing emoji raise UnicodeEncodeError: 'charmap' codec can't encode character '\u274c'. This hits users of the Python compress sub-skill and obscures real error messages.

The project's own CLAUDE.md already flags cross-platform hook concerns, and this is the same class of bug on the Python side.

Reproducer

Windows 11, stock cmd/PowerShell, cp1252 console:

python -m scripts /any/file.md
# if compression fails, the catch-all handler at cli.py:68 prints '\u274c Error: ...'
# encoding \u274c for the cp1252 console raises UnicodeEncodeError, so the real error is lost
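The failure can also be reproduced without a Windows machine. The sketch below is an illustration, not code from this PR: it assumes a cp1252 console behaves like a text stream that encodes with cp1252 and strict error handling, and the helper name `print_to_cp1252` is hypothetical.

```python
import io


def print_to_cp1252(message: str) -> bytes:
    """Print to an in-memory stream that encodes like a cp1252 console.

    Hypothetical helper: the stream stands in for sys.stdout on a stock
    Windows console (assumed to be cp1252 with strict error handling).
    """
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding="cp1252", errors="strict")
    print(message, file=console)
    console.flush()
    data = raw.getvalue()
    console.detach()  # keep the wrapper from closing the buffer later
    return data


# The emoji prefix cannot be encoded, so print() itself raises.
try:
    print_to_cp1252("\u274c Error: compression failed")
    emoji_ok = True
except UnicodeEncodeError as exc:
    emoji_ok = False
    emoji_error = exc

# The ASCII prefix encodes cleanly on the same stream.
ascii_bytes = print_to_cp1252("[ERROR] Error: compression failed")
```

Because the exception is raised inside `print`, a handler that prints the emoji while reporting another error ends up replacing that error with the `UnicodeEncodeError`.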

Fix

Replace emoji prints with ASCII equivalents in three files:

  • caveman-compress/scripts/cli.py — 4 sites (file-not-found, not-a-file, compression-failed, generic exception)
  • caveman-compress/scripts/compress.py — 3 sites (backup-exists warning, validation-failed, retries-exhausted)
  • caveman-compress/scripts/benchmark.py — 4 sites (pass/fail markers, not-found errors)

Mapping:

  • \u274c -> [ERROR]
  • \u26a0 -> [WARNING]
  • \u2705 -> [OK]

No behavior change beyond the message text. Output is now ASCII-only on all platforms, so Unix terminals show the bracketed tags in place of the emoji and Windows cp1252 terminals now display the intended message.

Alternative considered

sys.stdout.reconfigure(encoding='utf-8') at module load: this works, but it changes stdout behavior for the whole process and can interfere with tools that parse subprocess output. ASCII is safer.
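For comparison, a minimal sketch of the rejected alternative. It calls `reconfigure` (available on `io.TextIOWrapper` since Python 3.7) on an in-memory stand-in rather than on the real `sys.stdout`, so running it has no global side effects; the variable names are illustrative.

```python
import io

# Stand-in for sys.stdout on a cp1252 console (assumption for illustration).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

# The alternative: switch the stream to UTF-8 before anything is printed.
# Applied to sys.stdout, this would affect every later print in the process.
stream.reconfigure(encoding="utf-8")

print("\u274c Error: compression failed", file=stream)
stream.flush()
utf8_bytes = raw.getvalue()
```

The print succeeds, but the stream now carries UTF-8 bytes. A consumer that still decodes the output as cp1252 would not recover the original text, which is the interference concern noted above.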

Verify

python -m py_compile caveman-compress/scripts/{cli,compress,benchmark}.py

Passes.
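py_compile only confirms that the files still parse. A complementary check, sketched here under the assumption that the three scripts should end up ASCII-only, is to scan each file for remaining non-ASCII characters. `find_non_ascii` is a hypothetical helper and is not part of this PR.

```python
from pathlib import Path


def find_non_ascii(path):
    """Return (line, column, code point) for each non-ASCII character in a file."""
    hits = []
    text = Path(path).read_text(encoding="utf-8")
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

An empty list for each of cli.py, compress.py, and benchmark.py would indicate that no emoji prints remain.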

…or on Windows

Prior art: issue is mentioned in caveman's own CLAUDE.md guidance about
cross-platform hooks. Same class of bug hits the Python compress scripts.

Symptom: on Windows consoles with cp1252 encoding (default for many installs),
print statements containing emoji (\u274c, \u26a0, \u2705) raise:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u274c'

This obscures real error messages (e.g. in cli.py:68 catch-all handler),
making compress failures hard to debug on Windows.

Scope:
  - cli.py: 4 sites (file-not-found, not-a-file, compression-failed, catch-all)
  - compress.py: 3 sites (backup-exists, validation-failed, retries-exhausted)
  - benchmark.py: 4 sites (pass/fail markers, not-found errors)

Replacements:
  \u274c  -> [ERROR]
  \u26a0 -> [WARNING]
  \u2705 -> [OK]

No behavior change; ASCII-only prints on all platforms.

Verified: python -m py_compile passes on all three files.