trim #10
lzma_action is c_int on MSVC and c_uint elsewhere, so the u32 parameter type rejected lzma_action constants on MSVC. Take lzma_action directly.
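The fix can be sketched as a per-target type alias. The alias and constant names follow liblzma's API; `is_known_action` is a hypothetical caller added only to show the parameter type in use:

```rust
// On MSVC targets, C enums are int-sized and signed; elsewhere this port's
// bindings map lzma_action to an unsigned int, so the alias follows the target.
#[cfg(target_env = "msvc")]
#[allow(non_camel_case_types)]
pub type lzma_action = std::os::raw::c_int;
#[cfg(not(target_env = "msvc"))]
#[allow(non_camel_case_types)]
pub type lzma_action = std::os::raw::c_uint;

pub const LZMA_RUN: lzma_action = 0;
pub const LZMA_FINISH: lzma_action = 3;

// Hypothetical caller: taking lzma_action directly (instead of u32) lets the
// constants type-check on both MSVC (c_int) and other targets (c_uint).
pub fn is_known_action(action: lzma_action) -> bool {
    matches!(action, LZMA_RUN | LZMA_FINISH)
}

fn main() {
    assert!(is_known_action(LZMA_RUN));
    assert!(is_known_action(LZMA_FINISH));
    assert!(!is_known_action(2));
}
```

With a fixed `u32` parameter, the MSVC build would need a cast at every call site; the alias pushes the target difference into one place.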
Walkthrough
Refactors xz-core allocation layers to introduce size- and type-aware internal alloc/free helpers and gate allocator entrypoints behind the `custom_allocator` feature.

Changes
- Allocator System Refactor (primary DAG)
- CI, Tests, Docs, and Metadata (independent DAG)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
xz-core/src/lzma/lzma_decoder.rs (1)
2370-2378: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Install the decoder `end` callback in the create path.
`lzma_lzma_decoder_create` now returns a non-null `(*lz).coder` on Lines 2371-2374, but the destructor isn't wired until Line 2419 inside `lzma_decoder_init`. That leaves a live decoder without an `end` callback if anyone tears it down after `create` but before `init`, which is especially risky now that cleanup no longer falls back to a generic free path.
Suggested fix
```diff
 pub unsafe fn lzma_lzma_decoder_create(
     lz: *mut lzma_lz_decoder,
     allocator: *const lzma_allocator,
     options: *const lzma_options_lzma,
     lz_options: *mut lzma_lz_options,
 ) -> lzma_ret {
     if (*lz).coder.is_null() {
         (*lz).coder = crate::alloc::internal_alloc_object::<lzma_lzma1_decoder>(allocator).cast();
         if (*lz).coder.is_null() {
             return LZMA_MEM_ERROR;
         }
         (*lz).code = lzma_decode as lzma_lz_decoder_code_function;
         (*lz).reset = Some(lzma_decoder_reset as unsafe fn(*mut c_void, *const c_void) -> ());
         (*lz).set_uncompressed = Some(lzma_decoder_uncompressed as unsafe fn(*mut c_void, lzma_vli, bool) -> ());
     }
+    (*lz).end = Some(lzma_decoder_end as unsafe fn(*mut c_void, *const lzma_allocator) -> ());
     (*lz_options).dict_size = (*options).dict_size as size_t;
     (*lz_options).preset_dict = (*options).preset_dict;
     (*lz_options).preset_dict_size = (*options).preset_dict_size as size_t;
     LZMA_OK
 }
```

Also applies to: 2419-2419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@xz-core/src/lzma/lzma_decoder.rs` around lines 2370 - 2378, The newly-allocated decoder object receives coder, code, reset and set_uncompressed in the create path but never gets its end callback set, so if someone calls the destructor before lzma_decoder_init runs the cleanup path misses the decoder-specific end routine; fix by installing the end callback at the same time you assign (*lz).coder in the create path (set (*lz).end = Some(lzma_decoder_end as unsafe fn(*mut c_void) -> ()))—mirror the assignment done in lzma_decoder_init so the decoder always has a valid end callback after lzma_lzma_decoder_create returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 16-29: The section heading "Migrate from xz2" is inconsistent with
the example which shows replacing liblzma with xz; update the heading text to
"Migrate from liblzma" (or otherwise match the examples) so the title aligns
with the code snippets that use liblzma -> xz; locate the heading string
"Migrate from xz2" and change it to "Migrate from liblzma" to avoid confusion in
the README.
In `@scripts/run_xz_c_tests_with_xz_sys.sh`:
- Around line 65-66: The new ctest invocation (ctest --test-dir
"$CMAKE_BUILD_DIR" --output-on-failure "$@") is blocking CI because it returns
non-zero when upstream tests fail; update the script to make running upstream xz
C tests opt-in or non-fatal: guard the ctest call with an environment flag like
RUN_UPSTREAM_XZ_TESTS (only run ctest when that var is set/true) or run it but
ignore failures (e.g., capture the exit code and continue) so CI doesn't fail on
exit code 8; refer to the ctest invocation and the CMAKE_BUILD_DIR variable when
making the change.
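The opt-in guard the comment asks for can be sketched as follows. `RUN_UPSTREAM_XZ_TESTS` is the flag name the review proposes; the `build` default for `CMAKE_BUILD_DIR` is an assumption, not the script's actual default:

```shell
#!/bin/sh
set -eu

CMAKE_BUILD_DIR="${CMAKE_BUILD_DIR:-build}"

# Opt-in: run the upstream xz C test suite only when explicitly requested,
# so a failing upstream test (e.g. ctest exit code 8) cannot block CI.
if [ "${RUN_UPSTREAM_XZ_TESTS:-0}" = "1" ]; then
    echo "Running upstream xz C tests against xz-sys..."
    ctest --test-dir "$CMAKE_BUILD_DIR" --output-on-failure "$@"
else
    echo "Skipping upstream xz C tests (set RUN_UPSTREAM_XZ_TESTS=1 to run them)"
fi
```

The non-fatal alternative (`ctest ... || echo "upstream tests failed (non-fatal)"`) keeps the signal visible in logs without failing the job; the flag approach keeps default CI runs fast as well.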
In `@xz-core/Cargo.toml`:
- Line 3: The Cargo.toml version field is set to 0.0.1 which is out-of-order
relative to the crate's previous 0.1.0 release; update the version value in
Cargo.toml (the version = "..." entry) to a semantically appropriate next
release (e.g., "0.1.1" for a patch or "0.2.0" for a minor change) or add a
note/commit message if the 0.0.1 choice is intentionally required, so crate
versioning remains consistent and non-confusing.
In `@xz-core/src/alloc/custom.rs`:
- Around line 155-164: Both internal_free_untyped and
internal_free_untyped_bytes currently call lzma_free which can fall back to the
C-backed c_free_ptr when allocator is null and thus can mismatch with internal
allocations made using the module's internal Rust allocator (the path used at
Line 73). Change these functions to use the module's internal free path — either
call the internal free helper used for internal allocations or invoke lzma_free
with the explicit internal allocator pointer (the same allocator instance used
for allocations at Line 73) instead of the possibly-null allocator parameter,
ensuring frees always use the matching internal allocator.
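The invariant the comment describes can be sketched in isolation. The helper names `internal_alloc_bytes`/`internal_free_bytes` mirror those in the review but are assumptions about the crate's API; the point is that alloc and free always pair up on the same Rust allocator path, with no fallback to a C-backed free:

```rust
use std::alloc::{alloc, dealloc, Layout};

// Internal allocation: always the Rust global allocator, never c_malloc.
unsafe fn internal_alloc_bytes(size: usize) -> *mut u8 {
    let layout = Layout::from_size_align(size.max(1), 1).unwrap();
    alloc(layout)
}

// Matching free: rebuilds the exact layout used at allocation time and
// releases through dealloc, never through a C free path, even when the
// caller's lzma_allocator pointer would have been null.
unsafe fn internal_free_bytes(ptr: *mut u8, size: usize) {
    if ptr.is_null() {
        return;
    }
    let layout = Layout::from_size_align(size.max(1), 1).unwrap();
    dealloc(ptr, layout);
}

fn main() {
    unsafe {
        let p = internal_alloc_bytes(64);
        assert!(!p.is_null());
        p.write(0xAB);
        assert_eq!(p.read(), 0xAB);
        internal_free_bytes(p, 64);
    }
}
```

Mixing the two families (Rust `alloc` with C `free`, or vice versa) is undefined behavior, which is why the null-allocator fallback in `internal_free_untyped` is dangerous.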
In `@xz-core/src/common/index.rs`:
- Around line 90-105: The allocation uses internal_alloc_array::<u8>
(byte-aligned) and then casts to *mut index_group, violating alignment for
index_group; update index_group_alloc to allocate with the alignment and element
size of index_group (e.g., call the allocator helper specialized for index_group
or pass size_of::<index_group>() and align_of::<index_group>() so the returned
pointer is properly aligned) and cast from that aligned allocation to *mut
index_group; ensure index_group_free uses the same size/alignment calculation
(index_group_size and align_of::<index_group> or the matching free helper) when
calling internal_free_array/internal_free so allocation and deallocation use the
same alignment and size semantics (refer to index_group_alloc, index_group_free,
index_group_size, internal_alloc_array/internal_free_array, and the index_group
type).
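A minimal sketch of the aligned header-plus-trailing-bytes allocation the comment asks for. `IndexGroup` here is a simplified stand-in for the crate's `index_group` (the real struct has more fields); the key point is allocating with `align_of::<IndexGroup>()` rather than `u8`'s 1-byte alignment, and freeing with the identical layout:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};

#[repr(C)]
struct IndexGroup {
    number_base: u64,
    allocated: usize,
    // trailing records follow in the same allocation
}

// Allocate the struct plus `extra_bytes` of trailing storage with the
// struct's own alignment, so the cast to *mut IndexGroup is sound.
unsafe fn index_group_alloc(extra_bytes: usize) -> (*mut IndexGroup, Layout) {
    let layout = Layout::from_size_align(
        size_of::<IndexGroup>() + extra_bytes,
        align_of::<IndexGroup>(),
    )
    .unwrap();
    (alloc(layout).cast(), layout)
}

// Free with the exact size/alignment used at allocation time.
unsafe fn index_group_free(ptr: *mut IndexGroup, layout: Layout) {
    dealloc(ptr.cast(), layout);
}

fn main() {
    unsafe {
        let (g, layout) = index_group_alloc(128);
        assert!(!g.is_null());
        assert_eq!(g as usize % align_of::<IndexGroup>(), 0);
        index_group_free(g, layout);
    }
}
```

In the crate itself the layout would come from the allocator helpers mentioned in the comment rather than `std::alloc` directly; this only illustrates the size/alignment pairing.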
In `@xz-core/src/common/stream_encoder_mt.rs`:
- Around line 491-495: Existing per-thread input buffer (*thr).in_0 is only
allocated when null but not resized, causing buffer overruns when
(*coder).block_size grows; modify the init/reuse path (where (*thr).in_0,
(*thr).in_alloc_size and (*coder).block_size are referenced) to check if
(*thr).in_alloc_size != (*coder).block_size and, if so, free/reallocate
(*thr).in_0 to the new (*coder).block_size (using the same allocator functions
used elsewhere) and update (*thr).in_alloc_size accordingly so stream_encode_in
always copies into a buffer at least as large as (*coder).block_size.
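The resize condition can be illustrated with a simplified safe-Rust analogue; the real code manages `(*thr).in_0` through raw allocator calls, but the grow-on-reuse logic is the same:

```rust
// Analogue of the reuse-path fix: grow the per-thread buffer whenever the
// configured block size exceeds the recorded allocation size, so a later
// init with a larger block_size can never overrun a stale buffer.
fn ensure_capacity(in_0: &mut Vec<u8>, block_size: usize) {
    if in_0.len() < block_size {
        // Mirrors reallocating (*thr).in_0 and updating (*thr).in_alloc_size.
        in_0.resize(block_size, 0);
    }
}

fn main() {
    let mut in_0 = vec![0u8; 1024]; // buffer left over from a previous init
    ensure_capacity(&mut in_0, 4096); // later init with a larger block_size
    assert_eq!(in_0.len(), 4096);
    ensure_capacity(&mut in_0, 2048); // smaller block_size needs no realloc
    assert_eq!(in_0.len(), 4096);
}
```

Checking `<` rather than `!=` avoids a pointless reallocation when the block size shrinks, though the review's exact-match check is also safe.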
In `@xz-core/src/lz/lz_decoder.rs`:
- Around line 177-180: The non-custom-allocator path currently only
debug_asserts that (*coder).lz.coder is null and thus can leak when
(*coder).lz.end is None in release; update the teardown logic (around the lz
fields referenced as (*coder).lz.coder and (*coder).lz.end) so that when feature
"custom_allocator" is disabled you still call the appropriate free routine
(e.g., crate::alloc::internal_free_bytes or the module's standard free) with the
same parameters (coder, size 0, allocator) instead of only asserting; ensure the
freeing mirrors the custom_allocator branch and runs in release builds as well.
In `@xz-core/src/simple/simple_coder.rs`:
- Around line 179-183: The allocation for lzma_simple_coder currently uses
internal_alloc_array::<u8>() and casts the result, which only guarantees 1-byte
alignment and can produce undefined behavior; change the allocation to
explicitly respect core::mem::align_of::<lzma_simple_coder>() (e.g., use or add
an allocator helper that allocates the struct portion with
align_of::<lzma_simple_coder>() and then appends trailing bytes, or call an
internal_alloc_array::<lzma_simple_coder>()-style API) so the returned pointer
is properly aligned for lzma_simple_coder; adjust the corresponding deallocation
path (internal_free_array or allocator usage around coder and allocated size) to
match the new allocation scheme.
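The `internal_alloc_array::<T>()`-style fix the comment suggests can be sketched like this. `LzmaSimpleCoder` is a simplified stand-in for the crate's `lzma_simple_coder`, and the helpers here wrap `std::alloc` only for illustration:

```rust
use std::alloc::{alloc, dealloc, Layout};

// Type-aware array allocation: Layout::array::<T>(n) carries align_of::<T>(),
// unlike an internal_alloc_array::<u8> call, which only guarantees 1-byte
// alignment and makes the subsequent cast undefined behavior.
unsafe fn internal_alloc_array<T>(n: usize) -> *mut T {
    let layout = Layout::array::<T>(n).unwrap();
    alloc(layout).cast()
}

// Deallocation must recompute the identical layout (same T, same n).
unsafe fn internal_free_array<T>(ptr: *mut T, n: usize) {
    if !ptr.is_null() {
        dealloc(ptr.cast(), Layout::array::<T>(n).unwrap());
    }
}

#[repr(C)]
struct LzmaSimpleCoder {
    next_offset: u64,
    is_encoder: bool,
}

fn main() {
    unsafe {
        let p = internal_alloc_array::<LzmaSimpleCoder>(1);
        assert!(!p.is_null());
        assert_eq!(p as usize % std::mem::align_of::<LzmaSimpleCoder>(), 0);
        internal_free_array(p, 1);
    }
}
```

For the real struct, which carries a trailing variable-length region, the alternative in the comment (struct-aligned allocation plus trailing bytes) would use an explicit `Layout::from_size_align` as in the `index_group` fix above.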
---
Outside diff comments:
In `@xz-core/src/lzma/lzma_decoder.rs`:
- Around line 2370-2378: The newly-allocated decoder object receives coder,
code, reset and set_uncompressed in the create path but never gets its end
callback set, so if someone calls the destructor before lzma_decoder_init runs
the cleanup path misses the decoder-specific end routine; fix by installing the
end callback at the same time you assign (*lz).coder in the create path (set
(*lz).end = Some(lzma_decoder_end as unsafe fn(*mut c_void) -> ()))—mirror the
assignment done in lzma_decoder_init so the decoder always has a valid end
callback after lzma_lzma_decoder_create returns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0bcb2e51-6b53-472c-a3b1-9b7718327b7e
📒 Files selected for processing (28)
- .github/workflows/main.yml
- README.md
- scripts/run_xz_c_tests_with_xz_sys.sh
- xz-core/Cargo.toml
- xz-core/src/alloc.rs
- xz-core/src/alloc/custom.rs
- xz-core/src/common/common.rs
- xz-core/src/common/filter_common.rs
- xz-core/src/common/filter_encoder.rs
- xz-core/src/common/index.rs
- xz-core/src/common/index_hash.rs
- xz-core/src/common/outqueue.rs
- xz-core/src/common/stream_decoder.rs
- xz-core/src/common/stream_decoder_mt.rs
- xz-core/src/common/stream_encoder.rs
- xz-core/src/common/stream_encoder_mt.rs
- xz-core/src/common/string_conversion.rs
- xz-core/src/delta/delta_decoder.rs
- xz-core/src/lz/lz_decoder.rs
- xz-core/src/lz/lz_encoder.rs
- xz-core/src/lzma/lzma2_decoder.rs
- xz-core/src/lzma/lzma2_encoder.rs
- xz-core/src/lzma/lzma_decoder.rs
- xz-core/src/lzma/lzma_encoder.rs
- xz-core/src/raw_alloc.rs
- xz-core/src/simple/simple_coder.rs
- xz-core/src/simple/simple_decoder.rs
- xz-core/src/types.rs
```sh
echo "Running upstream xz C tests against xz-sys..."
ctest --test-dir "$CMAKE_BUILD_DIR" --output-on-failure "$@"
```
The new CTest step is blocking CI.
The pipeline log shows ctest exiting 8, so this job will keep failing every run until the upstream test regression or the archive replacement flow is fixed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/run_xz_c_tests_with_xz_sys.sh` around lines 65 - 66, The new ctest
invocation (ctest --test-dir "$CMAKE_BUILD_DIR" --output-on-failure "$@") is
blocking CI because it returns non-zero when upstream tests fail; update the
script to make running upstream xz C tests opt-in or non-fatal: guard the ctest
call with an environment flag like RUN_UPSTREAM_XZ_TESTS (only run ctest when
that var is set/true) or run it but ignore failures (e.g., capture the exit code
and continue) so CI doesn't fail on exit code 8; refer to the ctest invocation
and the CMAKE_BUILD_DIR variable when making the change.
```diff
 [package]
 name = "xz-core"
-version = "0.1.0"
+version = "0.0.1"
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🌐 Web query:
For Cargo/crates.io pre-1.0 crates, what semver bump is recommended for breaking vs non-breaking changes, and what happens in dependency resolution if a crate publishes 0.0.1 after 0.1.0 when downstream users depend on ^0.1?
💡 Result:
For pre-1.0 crates on crates.io/Cargo:

Semver bump recommendations:
- In 0.y.z (y > 0): breaking changes require bumping y (e.g., 0.1.0 -> 0.2.0, treated as major). Non-breaking changes (bugfixes, new features) use patch bumps (e.g., 0.1.0 -> 0.1.1).
- In 0.0.z: every release is considered a major/breaking change; bump z (e.g., 0.0.1 -> 0.0.2). There are no compatibility guarantees across any 0.0.z versions.

This follows Cargo's convention where only changes in the left-most non-zero component are incompatible (e.g., versions within 0.1.x are compatible, while each 0.0.z is incompatible with every other).

If a crate publishes 0.0.1 after 0.1.0 exists, and downstream depends on ^0.1 (i.e., >=0.1.0, <0.2.0):
- 0.0.1 does NOT satisfy ^0.1.0, since 0.0.1 < 0.1.0 and its left-most non-zero component differs (0 for 0.0.1 vs 1 for 0.1.0).
- Dependency resolution ignores 0.0.1 and selects the highest compatible version >= 0.1.0 (e.g., the latest 0.1.z). There is no impact on users of ^0.1.0.
- Publishing out-of-order versions is allowed but discouraged; Cargo resolves by semver precedence (0.1.0 > 0.0.1), picking the highest compatible version.
Citations:
- 1: https://doc.rust-lang.org/cargo/reference/semver.html
- 2: https://rustprojectprimer.com/releasing/versioning.html
- 3: https://doc.rust-lang.org/1.60.0/cargo/reference/resolver.html
- 4: https://doc.rust-lang.org/1.60.0/cargo/reference/specifying-dependencies.html
- 5: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html
Out-of-order versioning is discouraged; ensure this change is intentional.
Publishing 0.0.1 after 0.1.0 is unconventional. While dependency resolution will not strand users on ^0.1 (Cargo ignores 0.0.1 and resolves to the highest compatible version ≥ 0.1.0), out-of-order versions create confusion and are generally discouraged. If this is a mistake, use 0.1.1 or 0.2.0 instead. If intentional, confirm the rationale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@xz-core/Cargo.toml` at line 3, The Cargo.toml version field is set to 0.0.1
which is out-of-order relative to the crate's previous 0.1.0 release; update the
version value in Cargo.toml (the version = "..." entry) to a semantically
appropriate next release (e.g., "0.1.1" for a patch or "0.2.0" for a minor
change) or add a note/commit message if the 0.0.1 choice is intentionally
required, so crate versioning remains consistent and non-confusing.
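The caret rule the query result describes can be sketched as a toy compatibility check. `caret_compatible` is a hypothetical helper, not Cargo's implementation; it only encodes the "left-most non-zero component is major" rule:

```rust
// Toy model of Cargo's caret requirement for (major, minor, patch) triples:
// only the left-most non-zero component of the requirement is "major", so a
// later 0.0.1 release can never satisfy a downstream ^0.1.0 requirement.
fn caret_compatible(req: (u64, u64, u64), ver: (u64, u64, u64)) -> bool {
    if req.0 != 0 {
        // ^1.2.3 accepts >=1.2.3, <2.0.0
        ver.0 == req.0 && ver >= req
    } else if req.1 != 0 {
        // ^0.1.0 accepts >=0.1.0, <0.2.0
        ver.0 == 0 && ver.1 == req.1 && ver >= req
    } else {
        // ^0.0.z: every release is breaking, only the exact version matches
        ver == req
    }
}

fn main() {
    assert!(caret_compatible((0, 1, 0), (0, 1, 5))); // ^0.1 accepts 0.1.5
    assert!(!caret_compatible((0, 1, 0), (0, 0, 1))); // 0.0.1 is ignored
    assert!(!caret_compatible((0, 1, 0), (0, 2, 0))); // 0.2.0 is breaking
    assert!(caret_compatible((1, 2, 3), (1, 9, 0))); // ^1.2.3 accepts 1.9.0
}
```

This is why the out-of-order 0.0.1 cannot strand ^0.1 users: the resolver simply never considers it.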
```rust
(*thr).in_0 = crate::alloc::internal_alloc_array::<u8>((*coder).block_size, allocator);
if (*thr).in_0.is_null() {
    return LZMA_MEM_ERROR;
}
(*thr).in_alloc_size = (*coder).block_size;
```
Reused worker buffers can become undersized after re-init with a larger block size.
Line 495 stores the allocation size, but existing threads are reused when the `threads` count stays the same. If a later init sets a larger `(*coder).block_size`, `stream_encode_in` can copy up to the new block size into an old, smaller `in_0` allocation.
Please reallocate `(*thr).in_0` when `(*thr).in_alloc_size != (*coder).block_size` before the thread is reused.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@xz-core/src/common/stream_encoder_mt.rs` around lines 491 - 495, Existing
per-thread input buffer (*thr).in_0 is only allocated when null but not resized,
causing buffer overruns when (*coder).block_size grows; modify the init/reuse
path (where (*thr).in_0, (*thr).in_alloc_size and (*coder).block_size are
referenced) to check if (*thr).in_alloc_size != (*coder).block_size and, if so,
free/reallocate (*thr).in_0 to the new (*coder).block_size (using the same
allocator functions used elsewhere) and update (*thr).in_alloc_size accordingly
so stream_encode_in always copies into a buffer at least as large as
(*coder).block_size.
```rust
#[cfg(feature = "custom_allocator")]
crate::alloc::internal_free_bytes((*coder).lz.coder, 0, allocator);
#[cfg(not(feature = "custom_allocator"))]
debug_assert!((*coder).lz.coder.is_null());
```
Non-custom-allocator fallback can leak lz.coder in release builds.
When (*coder).lz.end is None, the non-custom_allocator branch only asserts in debug and skips freeing in release. Keep a real fallback free path in both feature modes.
Suggested fix
} else {
- #[cfg(feature = "custom_allocator")]
- crate::alloc::internal_free_bytes((*coder).lz.coder, 0, allocator);
- #[cfg(not(feature = "custom_allocator"))]
- debug_assert!((*coder).lz.coder.is_null());
+ crate::alloc::internal_free_untyped((*coder).lz.coder, allocator);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@xz-core/src/lz/lz_decoder.rs` around lines 177 - 180, The
non-custom-allocator path currently only debug_asserts that (*coder).lz.coder is
null and thus can leak when (*coder).lz.end is None in release; update the
teardown logic (around the lz fields referenced as (*coder).lz.coder and
(*coder).lz.end) so that when feature "custom_allocator" is disabled you still
call the appropriate free routine (e.g., crate::alloc::internal_free_bytes or
the module's standard free) with the same parameters (coder, size 0, allocator)
instead of only asserting; ensure the freeing mirrors the custom_allocator
branch and runs in release builds as well.
♻️ Duplicate comments (1)
xz-core/src/common/stream_encoder_mt.rs (1)
491-495: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Reused worker buffers can still be undersized after block-size growth.
`in_alloc_size` is recorded at allocation time, but reused threads are not resized when `(*coder).block_size` increases. `stream_encode_in` can then copy up to the new block size into an older, smaller `(*thr).in_0`.
Proposed fix

```diff
diff --git a/xz-core/src/common/stream_encoder_mt.rs b/xz-core/src/common/stream_encoder_mt.rs
@@
 unsafe fn get_thread(coder: *mut lzma_stream_coder, allocator: *const lzma_allocator) -> lzma_ret {
@@
     if (*coder).thr.is_null() {
         if (*coder).threads_initialized == (*coder).threads_max {
             return LZMA_OK;
         }
         let ret__1: lzma_ret = initialize_new_thread(coder, allocator);
         if ret__1 != LZMA_OK {
             return ret__1;
         }
     }
+
+    // Ensure reused worker buffer matches current block size.
+    if (*(*coder).thr).in_alloc_size != (*coder).block_size {
+        let new_in = crate::alloc::internal_alloc_array::<u8>((*coder).block_size, allocator);
+        if new_in.is_null() {
+            return LZMA_MEM_ERROR;
+        }
+        crate::alloc::internal_free_array(
+            (*(*coder).thr).in_0,
+            (*(*coder).thr).in_alloc_size,
+            allocator,
+        );
+        (*(*coder).thr).in_0 = new_in;
+        (*(*coder).thr).in_alloc_size = (*coder).block_size;
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@xz-core/src/common/stream_encoder_mt.rs` around lines 491 - 495, The threaded worker allocation records in_alloc_size once but doesn't grow reused buffers when coder.block_size increases; update the allocation logic around where (*thr).in_0 is set so that if (*thr).in_0 is null OR (*thr).in_alloc_size < (*coder).block_size you (re)allocate a buffer of size (*coder).block_size via crate::alloc::internal_alloc_array::<u8>(), update (*thr).in_alloc_size to (*coder).block_size, and on allocation failure return LZMA_MEM_ERROR; this ensures stream_encode_in can't copy more bytes than the worker's in_0 can hold.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@xz-core/src/common/stream_encoder_mt.rs`:
- Around line 491-495: The threaded worker allocation records in_alloc_size once
but doesn't grow reused buffers when coder.block_size increases; update the
allocation logic around where (*thr).in_0 is set so that if (*thr).in_0 is null
OR (*thr).in_alloc_size < (*coder).block_size you (re)allocate a buffer of size
(*coder).block_size via crate::alloc::internal_alloc_array::<u8>(), update
(*thr).in_alloc_size to (*coder).block_size, and on allocation failure return
LZMA_MEM_ERROR; this ensures stream_encode_in can't copy more bytes than the
worker's in_0 can hold.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 487f7e22-b957-4ba3-8625-eb1af1287b12
📒 Files selected for processing (7)
- Cargo.lock.msrv
- README.md
- xz-core/src/alloc/custom.rs
- xz-core/src/common/index_hash.rs
- xz-core/src/common/stream_decoder_mt.rs
- xz-core/src/common/stream_encoder_mt.rs
- xz-core/src/lzma/lzma_decoder.rs
Summary by CodeRabbit
- Documentation
- Tests
- Bug Fixes
- Chores