Merged
Changes from 2 commits
Commits
26 commits
a57a504
Decouple liblzma-sys backend from bindgen
youknowone Apr 13, 2026
c6236a1
trim
youknowone Apr 13, 2026
5142a3d
Refine xz internals and restore systest perf builds
youknowone Apr 13, 2026
1f964de
Tidy trivial c2rust naming remnants
youknowone Apr 13, 2026
3627f85
Tidy remaining low-risk c2rust naming
youknowone Apr 13, 2026
1ebdc7b
Use ctest 0.5.1 raw identifier support
youknowone Apr 13, 2026
a93e21b
Reduce mutable statics and narrow unsafe helpers
youknowone Apr 13, 2026
d8d77f1
revert crc64
youknowone Apr 14, 2026
2d7323e
ifix
youknowone Apr 14, 2026
19e4fe3
cleanup
youknowone Apr 14, 2026
ce93d55
Reduce internal panic paths
youknowone Apr 16, 2026
0314bc7
Refine LZMA hot paths and backend comparison tooling
youknowone Apr 16, 2026
834e3fd
Tune performance probes and hot paths
youknowone Apr 17, 2026
c4b047d
Trim hot-path function-pointer checks
youknowone Apr 17, 2026
2211802
remove redundant
youknowone Apr 17, 2026
dc080a9
edition 2024
youknowone Apr 17, 2026
1c627e0
xz 5.8.3
youknowone Apr 18, 2026
146611d
Optino<fn>
youknowone Apr 18, 2026
e8a96e6
Replace c2rust state hashes with named control flow
youknowone Apr 18, 2026
b36bdd0
Refine allocator defaults and optimum helper flow
youknowone Apr 18, 2026
d1ae9b8
refine
youknowone Apr 19, 2026
db7ed70
Revert top-level src changes to origin
youknowone Apr 20, 2026
1a94603
1.88
youknowone Apr 20, 2026
674511a
fix systest
youknowone Apr 20, 2026
e944c6f
let-else
youknowone Apr 20, 2026
7a41ac9
Restore cold literal-after helpers in optimum encoder
youknowone Apr 20, 2026
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -51,7 +51,7 @@ wasm-bindgen-test = "0.3"
default = ["xz"]
xz = ["dep:xz"]
xz-sys = ["dep:xz-sys"]
liblzma-sys = ["dep:liblzma-sys", "liblzma-sys/bindgen"]
liblzma-sys = ["dep:liblzma-sys"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Cargo, when the same crate appears in [dependencies] and [dev-dependencies], are features unified for benches/examples/tests, and do default features from the dev-dependency stay enabled?

💡 Result:

Yes. When the same crate appears in both [dependencies] and [dev-dependencies], Cargo unifies the features for builds that include dev-dependencies (benches, examples, tests), using the union of the features specified in both sections. Default features from the dev-dependency declaration remain enabled unless explicitly disabled there (via default-features = false). For normal builds (cargo build on lib/bin), only the [dependencies] features apply; dev-dependencies are ignored.

This behavior depends on the feature resolver version (default "2" for edition 2021+, controlled by the top-level resolver field). Resolver v2 avoids unwanted unification but still includes dev-dependency features when building dev targets such as tests, examples, and benches.

Example from the docs, with [dependencies] serde = { version = "1.0", default-features = false } and [dev-dependencies] serde = { version = "1.0", features = ["std"] }:
- cargo build: serde without features (defaults off).
- cargo test: serde with "std" (union; defaults from the dev-dependency stay enabled because they are not disabled there).


🏁 Script executed:

# Check the structure of Cargo.toml, particularly around the lines mentioned
sed -n '20,60p' Cargo.toml | cat -n

Repository: youknowone/xz-rs

Length of output: 1680


liblzma-sys still enables bindgen for dev targets.

Line 45 defines a target-specific dev-dependency liblzma-sys = { path = "liblzma-sys" } without default-features = false. When benches/examples/tests are built with the liblzma-sys feature, Cargo unifies features across dependencies and dev-dependencies, re-enabling the default bindgen feature from the dev-dependency. This defeats the goal of disabling it at Line 25 and keeps the libclang requirement around for those builds.

♻️ Proposed fix
 [target.'cfg(not(target_family = "wasm"))'.dev-dependencies]
 xz-sys = { path = "xz-sys" }
-liblzma-sys = { path = "liblzma-sys" }
+liblzma-sys = { path = "liblzma-sys", default-features = false }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` at line 54, The dev-dependency entry for liblzma-sys still pulls
in its default features (re-enabling bindgen); update the liblzma-sys dependency
declarations so they set default-features = false (both the target-specific
dev-dependency and the liblzma-sys = ["dep:liblzma-sys"] mapping) to prevent the
bindgen feature from being activated for benches/examples/tests; look for the
liblzma-sys entries and add default-features = false (and adjust to a table form
if needed) to ensure the bindgen feature stays disabled.
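
The unification described in this comment can be pictured with a toy model. The sketch below is not Cargo's resolver; `Decl` and `effective_features` are hypothetical names, and it assumes the normal `liblzma-sys` declaration has default features switched off while the dev-dependency leaves them on. It only shows why one declaration without `default-features = false` is enough to bring the default set (and with it bindgen) back for test, bench, and example builds.

```rust
use std::collections::BTreeSet;

/// One declaration of a crate in a manifest section.
/// Toy model only; this is not Cargo's real data model.
struct Decl {
    features: &'static [&'static str],
    default_features: bool,
}

/// Features active for the crate: the union over every declaration in scope,
/// plus "default" if any in-scope declaration leaves default features on.
fn effective_features(decls: &[&Decl]) -> BTreeSet<&'static str> {
    let mut active = BTreeSet::new();
    for decl in decls {
        active.extend(decl.features.iter().copied());
        if decl.default_features {
            active.insert("default");
        }
    }
    active
}

fn main() {
    // Assumed [dependencies] entry: default features disabled.
    let normal = Decl { features: &[], default_features: false };
    // Dev-dependency written as `{ path = "liblzma-sys" }`: defaults stay on.
    let dev = Decl { features: &[], default_features: true };

    // A plain build only sees the normal declaration.
    println!("cargo build: {:?}", effective_features(&[&normal]));
    // Dev targets see both declarations, so "default" comes back.
    println!("cargo test:  {:?}", effective_features(&[&normal, &dev]));
}
```

In this model the proposed fix corresponds to setting `default_features: false` on the dev declaration, after which both builds resolve to the same set.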

static = ["liblzma-sys?/static"]
parallel = ["xz-sys?/parallel", "liblzma-sys?/parallel", "num_cpus"]
bindgen = ["liblzma-sys?/bindgen"] # only affects liblzma-sys
35 changes: 24 additions & 11 deletions benches/backend_comparison.rs
@@ -16,21 +16,22 @@ compile_error!("backend_comparison bench requires `xz`, `xz-sys`, or `liblzma-sy
#[cfg(feature = "liblzma-sys")]
use liblzma_sys::{
lzma_crc32, lzma_crc64, lzma_easy_buffer_encode, lzma_stream_buffer_bound,
lzma_stream_buffer_decode, LZMA_CHECK_CRC64,
lzma_stream_buffer_decode, LZMA_CHECK_CRC64, LZMA_OK,
};
#[cfg(feature = "xz")]
use xz::check::{crc32_fast::lzma_crc32, crc64_fast::lzma_crc64};
#[cfg(feature = "xz")]
use xz::common::{
easy_buffer_encoder::lzma_easy_buffer_encode, stream_buffer_decoder::lzma_stream_buffer_decode,
stream_buffer_encoder::lzma_stream_buffer_bound,
use xz::{
check::{crc32_fast::lzma_crc32, crc64_fast::lzma_crc64},
common::{
easy_buffer_encoder::lzma_easy_buffer_encode,
stream_buffer_decoder::lzma_stream_buffer_decode,
stream_buffer_encoder::lzma_stream_buffer_bound,
},
types::{LZMA_CHECK_CRC64, LZMA_OK},
};
#[cfg(feature = "xz")]
use xz::types::LZMA_CHECK_CRC64;
#[cfg(feature = "xz-sys")]
use xz_sys::{
lzma_crc32, lzma_crc64, lzma_easy_buffer_encode, lzma_stream_buffer_bound,
lzma_stream_buffer_decode, LZMA_CHECK_CRC64,
lzma_stream_buffer_decode, LZMA_CHECK_CRC64, LZMA_OK,
};

#[cfg(feature = "xz")]
@@ -56,7 +57,7 @@ unsafe fn backend_encode(input: &[u8]) -> Vec<u8> {
let bound = lzma_stream_buffer_bound(input.len());
let mut out = vec![0u8; bound];
let mut out_pos: usize = 0;
lzma_easy_buffer_encode(
let ret = lzma_easy_buffer_encode(
6,
LZMA_CHECK_CRC64,
ptr::null(),
@@ -66,6 +67,7 @@ unsafe fn backend_encode(input: &[u8]) -> Vec<u8> {
&mut out_pos,
out.len(),
);
assert_eq!(ret, LZMA_OK, "{BACKEND_NAME} encode failed with {ret}");
out.truncate(out_pos);
out
}
@@ -75,7 +77,7 @@ unsafe fn backend_decode(compressed: &[u8], out_size: usize) -> Vec<u8> {
let mut memlimit = u64::MAX;
let mut in_pos = 0usize;
let mut out_pos = 0usize;
lzma_stream_buffer_decode(
let ret = lzma_stream_buffer_decode(
&mut memlimit,
0,
ptr::null(),
@@ -86,6 +88,17 @@ unsafe fn backend_decode(compressed: &[u8], out_size: usize) -> Vec<u8> {
&mut out_pos,
out.len(),
);
assert_eq!(ret, LZMA_OK, "{BACKEND_NAME} decode failed with {ret}");
assert_eq!(
in_pos,
compressed.len(),
"{BACKEND_NAME} decode left trailing input: consumed {in_pos} of {} bytes",
compressed.len()
);
assert_eq!(
out_pos, out_size,
"{BACKEND_NAME} decode produced {out_pos} bytes, expected {out_size}"
);
out.truncate(out_pos);
out
}
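
The assertions added to `backend_decode` check three things after the single-call decode: the return code, how much input was consumed, and how much output was produced. As a minimal sketch of the same checks without the FFI call, the block below models the outcome as plain data. `DecodeOutcome` and `check_decode` are hypothetical names that do not appear in the PR, and the return code is assumed to be an integer where 0 means `LZMA_OK`.

```rust
/// Hypothetical summary of what a single-call decode reported back.
#[derive(Debug, Clone, Copy)]
struct DecodeOutcome {
    /// Backend return code; 0 is assumed to mean LZMA_OK.
    ret: u32,
    /// Number of input bytes the decoder consumed.
    in_pos: usize,
    /// Number of output bytes the decoder produced.
    out_pos: usize,
}

const OK: u32 = 0;

/// Applies the same three checks as the bench, but returns an error
/// message instead of panicking.
fn check_decode(o: DecodeOutcome, in_len: usize, expected_out: usize) -> Result<(), String> {
    if o.ret != OK {
        return Err(format!("decode failed with {}", o.ret));
    }
    if o.in_pos != in_len {
        return Err(format!(
            "decode left trailing input: consumed {} of {} bytes",
            o.in_pos, in_len
        ));
    }
    if o.out_pos != expected_out {
        return Err(format!(
            "decode produced {} bytes, expected {}",
            o.out_pos, expected_out
        ));
    }
    Ok(())
}

fn main() {
    let clean = DecodeOutcome { ret: OK, in_pos: 64, out_pos: 1024 };
    println!("{:?}", check_decode(clean, 64, 1024));

    // Same return code, but four input bytes were never consumed.
    let trailing = DecodeOutcome { ret: OK, in_pos: 60, out_pos: 1024 };
    println!("{:?}", check_decode(trailing, 64, 1024));
}
```

In a benchmark, panicking through `assert_eq!` is a reasonable choice because a silent decode failure would make the timings meaningless; the `Result` form is shown only to make the three conditions explicit.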
13 changes: 13 additions & 0 deletions liblzma-sys/src/manual.rs
@@ -75,6 +75,7 @@ pub const LZMA_PB_DEFAULT: u32 = 2;

pub const LZMA_BACKWARD_SIZE_MIN: lzma_vli = 4;
pub const LZMA_BACKWARD_SIZE_MAX: lzma_vli = 1 << 34;
pub const LZMA_STREAM_HEADER_SIZE: u32 = 12;

pub const LZMA_VLI_MAX: lzma_vli = u64::MAX / 2;
pub const LZMA_VLI_UNKNOWN: lzma_vli = u64::MAX;
@@ -99,6 +100,7 @@ pub struct lzma_allocator {
pub opaque: *mut c_void,
}

pub enum lzma_index {}
pub enum lzma_internal {}

#[repr(C)]
Expand Down Expand Up @@ -344,6 +346,17 @@ extern "C" {
b: *const lzma_stream_flags,
) -> lzma_ret;

pub fn lzma_index_buffer_decode(
i: *mut *mut lzma_index,
memlimit: *mut u64,
allocator: *const lzma_allocator,
input: *const u8,
in_pos: *mut size_t,
in_size: size_t,
) -> lzma_ret;
pub fn lzma_index_uncompressed_size(i: *const lzma_index) -> lzma_vli;
pub fn lzma_index_end(i: *mut lzma_index, allocator: *const lzma_allocator);

pub fn lzma_version_number() -> u32;
pub fn lzma_version_string() -> *const c_char;

12 changes: 11 additions & 1 deletion perf-probe/src/main.rs
@@ -225,7 +225,7 @@ fn usage() -> String {
let mut message = String::new();
message.push_str("Usage:\n");
message.push_str(
" cargo run -p perf-probe --release --no-default-features --features <liblzma-sys|xz-sys> -- \\\n",
" cargo run -p perf-probe --release --no-default-features --features <xz|xz-sys|liblzma-sys> -- \\\n",
);
message.push_str(" --workload <encode|decode|size|crc32|crc64> [options]\n\n");
message.push_str("Options:\n");
@@ -527,6 +527,16 @@ unsafe fn backend_decode(compressed: &[u8], out_size: usize) -> Vec<u8> {
)
};
assert_eq!(ret, LZMA_OK, "{BACKEND_NAME} decode failed with {ret}");
assert_eq!(
in_pos,
compressed.len(),
"{BACKEND_NAME} decode left trailing input: consumed {in_pos} of {} bytes",
compressed.len()
);
Comment on lines +534 to +539

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does liblzma's lzma_stream_buffer_decode allow trailing bytes by returning success and advancing in_pos only to the first unused byte?

💡 Result:

Yes. liblzma's lzma_stream_buffer_decode allows trailing bytes: after successfully decoding one complete .xz Stream from the input buffer, it returns LZMA_OK and advances *in_pos to the first unused byte. This follows liblzma's single-call decoding pattern: like lzma_code, it consumes only the bytes needed for one complete stream and leaves any trailing data unconsumed without reporting an error. The official documentation states that *in_pos and *out_pos are updated only if decoding succeeds (LZMA_OK), which is consistent with partial consumption on success. Streaming decoders built on liblzma behave analogously; for example, Python's LZMADecompressor exposes the bytes found after the end of the stream as unused_data.


🏁 Script executed:

cat -n perf-probe/src/main.rs | sed -n '525,540p'

Repository: youknowone/xz-rs

Length of output: 639


🏁 Script executed:

cat -n perf-probe/src/main.rs | sed -n '500,545p'

Repository: youknowone/xz-rs

Length of output: 1659


🏁 Script executed:

rg -n "backend_decode\(" perf-probe/src/main.rs

Repository: youknowone/xz-rs

Length of output: 281


🏁 Script executed:

cat -n perf-probe/src/main.rs | sed -n '355,385p'

Repository: youknowone/xz-rs

Length of output: 1357


🏁 Script executed:

rg -n "fn load_compressed_input" perf-probe/src/main.rs

Repository: youknowone/xz-rs

Length of output: 151


🏁 Script executed:

cat -n perf-probe/src/main.rs | sed -n '299,330p'

Repository: youknowone/xz-rs

Length of output: 1394


🏁 Script executed:

rg -n "compressed.input" perf-probe/src/main.rs | head -20

Repository: youknowone/xz-rs

Length of output: 1277


Don't require full input consumption for --compressed-input.

lzma_stream_buffer_decode can legitimately return success with in_pos < compressed.len() when the buffer contains trailing bytes or extra concatenated streams. The assertion at lines 530–535 enforces full input consumption, which rejects valid inputs the liblzma API itself accepts. This is a user-visible regression for externally supplied data via --compressed-input (see load_compressed_input, line 299–312). Keep the exact-consumption check for synthesized single-stream data only, or make it opt-in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@perf-probe/src/main.rs` around lines 530 - 535, The assertion requiring full
consumption (assert_eq!(in_pos, compressed.len(), ...)) is too strict for
lzma_stream_buffer_decode because it may validly leave trailing bytes; update
the logic around the lzma_stream_buffer_decode result (the in_pos check) to skip
enforcing exact consumption for external/compressed inputs loaded by
load_compressed_input and only assert exact-consumption for synthesized
single-stream data (or gate it behind an opt-in flag); identify and modify the
assert that compares in_pos and compressed.len(), and adjust call sites and any
tests that rely on full-consumption so that trailing bytes or concatenated
streams are accepted for --compressed-input while preserving the strict check
where appropriate.
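
One way to read the suggestion above is to make the consumption check a policy chosen by the caller. The sketch below is an illustration under that reading, not code from the PR: `Consumption` and `check_consumed` are hypothetical names. Synthesized single-stream data would use `Exact`, while data loaded through `--compressed-input` would use `AllowTrailing`.

```rust
/// Hypothetical policy for how much of the input a decode must consume.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Consumption {
    /// Every input byte must be consumed (synthesized single-stream data).
    Exact,
    /// Trailing bytes or further streams are tolerated (external input).
    AllowTrailing,
}

/// Returns the number of trailing bytes, or an error if the policy is violated.
fn check_consumed(in_pos: usize, in_len: usize, policy: Consumption) -> Result<usize, String> {
    // A position past the end of the buffer is wrong under either policy.
    if in_pos > in_len {
        return Err(format!("decoder reported in_pos {in_pos} past input length {in_len}"));
    }
    let trailing = in_len - in_pos;
    if policy == Consumption::Exact && trailing != 0 {
        return Err(format!(
            "decode left trailing input: consumed {in_pos} of {in_len} bytes"
        ));
    }
    Ok(trailing)
}

fn main() {
    // The same outcome is rejected for synthesized data but accepted for external data.
    println!("{:?}", check_consumed(60, 64, Consumption::Exact));
    println!("{:?}", check_consumed(60, 64, Consumption::AllowTrailing));
}
```

Returning the trailing byte count lets the caller log it, so tolerated leftovers do not go unnoticed in a profiling run.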

assert_eq!(
out_pos, out_size,
"{BACKEND_NAME} decode produced {out_pos} bytes, expected {out_size}"
);
out.truncate(out_pos);
out
}
2 changes: 1 addition & 1 deletion scripts/compare_api_workloads.sh
@@ -50,7 +50,7 @@ esac
RUST_BIN="$RUST_TARGET/release/examples/$EXAMPLE_NAME"
C_BIN="$C_TARGET/release/examples/$EXAMPLE_NAME"

env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release >/dev/null
env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release --no-default-features --features xz >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$C_TARGET" cargo build --example "$EXAMPLE_NAME" --release --no-default-features --features liblzma-sys >/dev/null

hyperfine \
2 changes: 1 addition & 1 deletion scripts/compare_backends.sh
@@ -70,7 +70,7 @@ SYSTEST_RUST_CMD="env CARGO_TARGET_DIR=$SYSTEST_RUST_TARGET cargo test -p systes
SYSTEST_C_CMD="env LZMA_API_STATIC=1 CARGO_TARGET_DIR=$SYSTEST_C_TARGET cargo test -p systest --release --no-default-features --features liblzma-sys -- --test-threads=1"

echo "prebuilding root test binaries..."
env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run >/dev/null
env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test -p liblzma --release --no-default-features --features xz --no-run >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null

hyperfine \
7 changes: 1 addition & 6 deletions scripts/profile_backend.sh
@@ -3,11 +3,10 @@ set -euo pipefail

if [[ $# -lt 2 ]]; then
cat <<'EOF' >&2
Usage: scripts/profile_backend.sh <c|liblzma-sys|xz|rust|xz-sys|both> <encode|decode|size|crc32|crc64> [backend_probe args...]
Usage: scripts/profile_backend.sh <c|liblzma-sys|xz|rust|xz-sys> <encode|decode|size|crc32|crc64> [backend_probe args...]

Examples:
scripts/profile_backend.sh xz decode --size 1048576 --iters 500 --warmup 50
scripts/profile_backend.sh both encode --input-kind random --size 8388608
scripts/profile_backend.sh xz size --input-kind random --size 1048576 --iters 800 --warmup 80

Environment:
@@ -40,10 +39,6 @@ case "$BACKEND" in
TARGET_DIR="target/profile-bench-xz-sys"
BACKEND_ENV=()
;;
both)
echo "profile_backend.sh profiles one backend at a time; use xz, xz-sys, or liblzma-sys" >&2
exit 2
;;
*)
echo "unknown backend: $BACKEND" >&2
exit 2
10 changes: 8 additions & 2 deletions xz/src/common/common.rs
@@ -64,7 +64,7 @@ unsafe fn free(ptr: *mut c_void) {
}
pub const LZMA_VERSION_MAJOR: u32 = 5;
pub const LZMA_VERSION_MINOR: u32 = 8;
pub const LZMA_VERSION_PATCH: u32 = 2;
pub const LZMA_VERSION_PATCH: u32 = 3;
pub const LZMA_VERSION_STABILITY: u32 = LZMA_VERSION_STABILITY_STABLE;
pub const LZMA_VERSION_STABILITY_STABLE: u32 = 2;
pub const LZMA_VERSION: c_uint = LZMA_VERSION_MAJOR * 10000000
@@ -76,7 +76,7 @@ pub fn lzma_version_number() -> u32 {
LZMA_VERSION as u32
}
pub fn lzma_version_string() -> *const c_char {
crate::c_str!("5.8.2")
crate::c_str!("5.8.3")
}
pub unsafe fn lzma_alloc(mut size: size_t, allocator: *const lzma_allocator) -> *mut c_void {
if size == 0 {
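
The patch bump changes both the version string and the packed number returned by `lzma_version_number`. The `LZMA_VERSION` expression is cut off by the hunk boundary above; the sketch below fills in the remaining terms from liblzma's `version.h` as I understand it (major * 10000000 + minor * 10000 + patch * 10 + stability), so treat that formula as an assumption about this crate. `version_number` is a hypothetical helper name.

```rust
/// Stability code for a stable release, matching LZMA_VERSION_STABILITY_STABLE.
const STABILITY_STABLE: u32 = 2;

/// Packs a version into a single integer.
/// Assumed layout: major * 10_000_000 + minor * 10_000 + patch * 10 + stability.
const fn version_number(major: u32, minor: u32, patch: u32, stability: u32) -> u32 {
    major * 10_000_000 + minor * 10_000 + patch * 10 + stability
}

fn main() {
    let before = version_number(5, 8, 2, STABILITY_STABLE);
    let after = version_number(5, 8, 3, STABILITY_STABLE);
    println!("5.8.2 -> {before}, 5.8.3 -> {after}");
    // The patch digit sits in the tens place, so the bump adds exactly 10.
    assert_eq!(after - before, 10);
}
```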
@@ -120,6 +120,12 @@ pub unsafe fn lzma_bufcpy(
out_pos: *mut size_t,
out_size: size_t,
) -> size_t {
if *in_pos > in_size || *out_pos > out_size {
return 0;
}
if (in_0.is_null() && *in_pos != in_size) || (out.is_null() && *out_pos != out_size) {
return 0;
}
debug_assert!(!in_0.is_null() || *in_pos == in_size);
debug_assert!(!out.is_null() || *out_pos == out_size);
debug_assert!(*in_pos <= in_size);
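
The new early returns turn out-of-range positions into a zero-byte copy in release builds, where the `debug_assert!` lines are compiled out. For comparison, here is a minimal safe-Rust sketch of the same contract using slices instead of raw pointers. `bufcpy` is a hypothetical stand-in, not the function from the PR, and with slices the null-pointer guard has no equivalent.

```rust
/// Copies as many bytes as fit from `input[*in_pos..]` into `output[*out_pos..]`,
/// advances both positions, and returns the number of bytes copied.
/// Out-of-range positions copy nothing and return 0.
fn bufcpy(input: &[u8], in_pos: &mut usize, output: &mut [u8], out_pos: &mut usize) -> usize {
    if *in_pos > input.len() || *out_pos > output.len() {
        return 0;
    }
    // The copy size is limited by whichever side has less room left.
    let n = (input.len() - *in_pos).min(output.len() - *out_pos);
    output[*out_pos..*out_pos + n].copy_from_slice(&input[*in_pos..*in_pos + n]);
    *in_pos += n;
    *out_pos += n;
    n
}

fn main() {
    let input = [1u8, 2, 3, 4, 5];
    let mut output = [0u8; 3];
    let (mut in_pos, mut out_pos) = (1usize, 1usize);

    let copied = bufcpy(&input, &mut in_pos, &mut output, &mut out_pos);
    println!("copied {copied}, output {output:?}, in_pos {in_pos}, out_pos {out_pos}");
}
```

Returning 0 for a bad position is indistinguishable from a full output buffer, so callers that need to tell the two apart still have to validate positions themselves.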