fix(compress): atomic filepath write to prevent data loss on crash #205

Open
d123d wants to merge 1 commit into JuliusBrussee:main from d123d:fix/compress-atomic-write

d123d commented Apr 17, 2026

Problem

Current `compress_file` writes backup and target in sequence:

```python
backup_path.write_text(original_text)
filepath.write_text(compressed) # <-- if this raises, partial state
```

If the second `write_text` fails part-way (disk full, permission denied, antivirus lock on Windows, process killed by the OS), `filepath` may hold a truncated write or still hold the original, and nothing records which. The in-memory `compressed` string still proceeds to validation against `backup_path`, regardless of what actually landed on disk. Worst case: validation 'passes' while the file on disk contains garbage.

Same issue hits the retry-fix write (line 225) and the failure-restore write (line 216). The failure-restore is especially bad: if the restore write fails, `filepath` has the failed-validation compressed text AND the backup was already deleted — unrecoverable.

Fix

Small helper using `Path.replace()` (POSIX `rename(2)` / Windows `MoveFileEx` — atomic on same filesystem):

```python
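from pathlib import Path


def _atomic_write_text(target: Path, content: str) -> None:
    # Write to a sibling temp file, then swap it into place with one rename.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_text(content)
        tmp.replace(target)  # atomic when tmp and target share a filesystem
    except Exception:
        # Remove the temp file; target itself was never opened for writing.
        tmp.unlink(missing_ok=True)
        raise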
```

Invariant enforced: `target` always contains either the pre-existing content or the full new content — never a partial write.
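A minimal sketch of how the invariant could be checked locally. It reproduces the helper so the snippet is self-contained, and forces the rename step to fail with `unittest.mock`, standing in for a lock or permission error. The function name `simulate_failed_replace` and the file name `notes.md` are illustrative only and are not part of this PR.

```python
import tempfile
from pathlib import Path
from typing import Tuple
from unittest import mock


def _atomic_write_text(target: Path, content: str) -> None:
    # Same helper as above, repeated so this sketch runs on its own.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_text(content)
        tmp.replace(target)
    except Exception:
        tmp.unlink(missing_ok=True)
        raise


def simulate_failed_replace() -> Tuple[str, bool]:
    """Return (target content, whether the .tmp sibling survived) after a failed write."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "notes.md"  # hypothetical file name
        target.write_text("original")
        # Make the rename raise, as if the OS refused it.
        with mock.patch.object(Path, "replace", side_effect=OSError("simulated")):
            try:
                _atomic_write_text(target, "compressed")
            except OSError:
                pass  # the helper re-raises; the caller decides what to do
        tmp = target.with_suffix(target.suffix + ".tmp")
        return target.read_text(), tmp.exists()
```

Under these assumptions the target should still read back as its original content and no `.tmp` sibling should be left behind, since the helper unlinks it before re-raising.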

Applied in three places:

  • Initial compressed write
  • Retry fix-prompt write
  • Failure-restore write
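To make the three call sites concrete, here is a rough sketch of how they might sit in a compress-validate-retry flow. This is not the actual `compress_file`: the `compress`, `validate`, and `fix` callables, the `.bak` naming, and `max_retries` are all hypothetical stand-ins, and the sketch restores before removing the backup, which may differ from the real ordering.

```python
from pathlib import Path
from typing import Callable


def _atomic_write_text(target: Path, content: str) -> None:
    # Helper from this PR, repeated so the sketch is self-contained.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_text(content)
        tmp.replace(target)
    except Exception:
        tmp.unlink(missing_ok=True)
        raise


def compress_file_sketch(
    filepath: Path,
    compress: Callable[[str], str],        # hypothetical: produces compressed text
    validate: Callable[[str, str], bool],  # hypothetical: (original, compressed) -> ok?
    fix: Callable[[str, str], str],        # hypothetical: produces a corrected attempt
    max_retries: int = 2,                  # hypothetical default
) -> bool:
    original_text = filepath.read_text()
    backup_path = filepath.with_suffix(filepath.suffix + ".bak")  # assumed naming
    backup_path.write_text(original_text)

    compressed = compress(original_text)
    _atomic_write_text(filepath, compressed)  # 1. initial compressed write

    for attempt in range(max_retries + 1):
        if validate(original_text, compressed):
            return True  # backup handling on success is out of scope here
        if attempt == max_retries:
            break
        compressed = fix(original_text, compressed)
        _atomic_write_text(filepath, compressed)  # 2. retry fix-prompt write

    _atomic_write_text(filepath, original_text)  # 3. failure-restore write
    backup_path.unlink(missing_ok=True)
    return False
```

Each of the three writes either lands in full or leaves `filepath` as it was, so a failure at any step leaves a file that matches one of the known in-memory strings.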

Backward compat

Zero behavior change on the happy path. The `.tmp` sibling only exists during the write; `Path.replace()` is atomic. No API change.

Prior art

The project's own `hooks/caveman-config.js::safeWriteFlag` already uses temp + rename for similar reasons on the JS side. This extends the same invariant to the Python side.

Verify

```
python -m py_compile caveman-compress/scripts/compress.py
```

Passes.

Current behavior: backup and target are written in sequence:

    backup_path.write_text(original_text)
    filepath.write_text(compressed)

If the second write raises mid-way (disk full, permission denied,
antivirus lock, OS kill), filepath contains a partial/truncated
compressed text OR the original — but there's no way for the retry
loop to tell which. The in-memory 'compressed' string proceeds to
validation against backup_path regardless of what actually landed on
disk. Worst case: validation 'passes' against a truncated file.

Fix: add _atomic_write_text() helper using Path.replace() (atomic on
same filesystem) via a .tmp sibling. Invariant enforced: filepath
always contains either the pre-existing content or the full new
content — never a partial write.

Applied in three places:
  - Initial compressed write (line 198)
  - Retry fix-prompt write (line 231)
  - Failure-restore write (line 221)

The failure-restore in particular benefits: previously if the restore
write failed, filepath contained the failed-validation compressed
text AND backup was already deleted — unrecoverable. Now restore is
atomic.

No behavior change on the happy path. py_compile passes.