fix(compress): atomic filepath write to prevent data loss on crash #205
Open
d123d wants to merge 1 commit into JuliusBrussee:main
Current behavior: backup and target are written in sequence:
    backup_path.write_text(original_text)
    filepath.write_text(compressed)
If the second write raises mid-way (disk full, permission denied,
antivirus lock, OS kill), filepath contains a partial/truncated
compressed text OR the original — but there's no way for the retry
loop to tell which. The in-memory 'compressed' string proceeds to
validation against backup_path regardless of what actually landed on
disk. Worst case: validation 'passes' against a truncated file.
Fix: add _atomic_write_text() helper using Path.replace() (atomic on
same filesystem) via a .tmp sibling. Invariant enforced: filepath
always contains either the pre-existing content or the full new
content — never a partial write.
Applied in three places:
- Initial compressed write (line 198)
- Retry fix-prompt write (line 231)
- Failure-restore write (line 221)
The failure-restore in particular benefits: previously if the restore
write failed, filepath contained the failed-validation compressed
text AND backup was already deleted — unrecoverable. Now restore is
atomic.
No behavior change on the happy path. py_compile passes.
Problem
Currently, `compress_file` writes the backup and the target in sequence:
```python
backup_path.write_text(original_text)
filepath.write_text(compressed) # <-- if this raises, partial state
```
If the second `write_text` raises mid-way (disk full, permission denied, antivirus lock on Windows, OS kill), `filepath` contains either a truncated write or the original, and the retry loop has no way to tell which. The in-memory `compressed` string still proceeds to validation against `backup_path`, regardless of what actually landed on disk. Worst case: validation 'passes' while the file on disk contains garbage.
Same issue hits the retry-fix write (line 225) and the failure-restore write (line 216). The failure-restore is especially bad: if the restore write fails, `filepath` has the failed-validation compressed text AND the backup was already deleted — unrecoverable.
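To make the failure mode concrete, here is a minimal sketch of a write that dies part-way. `flaky_write_text` and `demo_partial_write` are hypothetical names used only for illustration and are not part of `compress.py`. The sketch assumes the failure happens after the file has already been opened for writing, at which point the old content is gone.

```python
import tempfile
from pathlib import Path


def flaky_write_text(path: Path, content: str, fail_after: int) -> None:
    """Hypothetical stand-in for a write that fails mid-way (e.g. disk full)."""
    # Opening in "w" mode truncates the file before any new data is written,
    # which is also how Path.write_text opens its target.
    with path.open("w", encoding="utf-8") as f:
        f.write(content[:fail_after])
        raise OSError("simulated failure mid-write")


def demo_partial_write() -> str:
    """Return what is left on disk after the simulated failure."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "notes.md"
        target.write_text("original content", encoding="utf-8")

        try:
            flaky_write_text(target, "compressed content", fail_after=4)
        except OSError:
            pass  # the caller only sees an exception, not the on-disk state

        # Neither the original nor the full new text survives.
        return target.read_text(encoding="utf-8")


print(repr(demo_partial_write()))  # 'comp'
```

The caller still holds the full `compressed` string in memory, so nothing in the exception itself reveals that the file on disk is now a four-character fragment.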
Fix
Small helper using `Path.replace()` (POSIX `rename(2)` / Windows `MoveFileEx` — atomic on same filesystem):
```python
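from pathlib import Path


def _atomic_write_text(target: Path, content: str) -> None:
    # Write to a sibling temp file in the same directory, so the rename
    # below stays on one filesystem.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_text(content)
        tmp.replace(target)  # swap the finished file into place in one step
    except Exception:
        # Clean up the temp file; target is left untouched.
        tmp.unlink(missing_ok=True)
        raise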
```
Invariant enforced: `target` always contains either the pre-existing content or the full new content — never a partial write.
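The invariant can be exercised with a small failure-injection sketch. The helper is repeated so the block is self-contained; `demo_failed_replace` and the use of `unittest.mock` to force the rename step to fail are assumptions made for illustration, not part of this PR.

```python
import tempfile
from pathlib import Path
from typing import Tuple
from unittest import mock


def _atomic_write_text(target: Path, content: str) -> None:
    # Same helper as above, repeated so this sketch runs on its own.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_text(content)
        tmp.replace(target)
    except Exception:
        tmp.unlink(missing_ok=True)
        raise


def demo_failed_replace() -> Tuple[str, bool]:
    """Return (target content, whether the .tmp sibling still exists)."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "notes.md"
        target.write_text("original content")

        # Hypothetical failure injection: make the rename step raise.
        with mock.patch.object(Path, "replace", side_effect=OSError("simulated")):
            try:
                _atomic_write_text(target, "compressed content")
            except OSError:
                pass

        tmp = target.with_suffix(target.suffix + ".tmp")
        return target.read_text(), tmp.exists()


print(demo_failed_replace())  # ('original content', False)
```

When the swap fails, the target still holds the pre-existing content and the temp file has been cleaned up, which is the behavior the retry loop can rely on.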
Applied in three places: the initial compressed write, the retry-fix write, and the failure-restore write.
Backward compat
Zero behavior change on the happy path. The `.tmp` sibling only exists during the write, and because it lives in the same directory as the target, `Path.replace()` stays on one filesystem, where it is atomic. No API change.
Prior art
The project's own `hooks/caveman-config.js::safeWriteFlag` already uses temp + rename for similar reasons on the JS side. This extends the same invariant to the Python side.
Verify
```
python -m py_compile caveman-compress/scripts/compress.py
```
Passes.