
Add backend performance workflow and tuning #5

Merged
youknowone merged 5 commits into main from tuning
Apr 13, 2026

Conversation

@youknowone
Owner

@youknowone youknowone commented Apr 9, 2026

Summary by CodeRabbit

  • New Features

    • Repeatable performance framework: CLI probe, comparison, profiling and code-inspection tools for gated backend benchmarking.
    • Default build now uses the Rust-backed xz backend; build enforces single-backend selection.
    • AArch64 CRC32 acceleration with runtime detection.
  • Examples

    • Added deterministic probes: QuickCheck-style, standard-file corpus, and trailing-bytes workloads.
  • Documentation

    • New performance workflow guide and README backend-selection instructions.

@coderabbitai

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a performance/profiling workflow: new perf-probe crate and CLI, benchmarking/profiling scripts and docs, example probes, bench refactor to single compile-time backend, crate renames (liblzma-rs→xz, liblzma-rs-sys→xz-sys), many low-level xz implementation optimizations and symbol-visibility tightenings.

Changes

Cohort / File(s) Summary
Workspace & Manifests
Cargo.toml, Cargo.lock.msrv, xz/Cargo.toml, xz-sys/Cargo.toml, systest/Cargo.toml
Renamed crates (liblzma-rs → xz, liblzma-rs-sys → xz-sys), updated workspace membership, swapped the default backend to xz, and rewired optional backend features and path dependencies.
Performance docs & guide
AGENTS.md, docs/performance-workflow.md, README.md
New performance workflow and gating docs; added operational constraints (deterministic gate, no mixed sys backends, profiler selection rules) and backend-selection guidance.
Perf tooling crate
perf-probe/Cargo.toml, perf-probe/src/main.rs
New perf-probe crate implementing workloads (encode/decode/size/crc32/crc64), single-backend compile-time selection, input generation, measurement loop, and backend FFI helpers.
Shell tooling
scripts/*
scripts/compare_backends.sh, scripts/compare_workloads.sh, scripts/compare_api_workloads.sh, scripts/profile_backend.sh, scripts/inspect_codegen.sh, scripts/run_root_test_bins.sh
New scripts to build feature-specific binaries, run hyperfine comparisons, profile backends (perf/samply/plain), inspect codegen, and run prebuilt root test binaries reproducibly; emit artifacts under target/perf-results/.
Benchmarks
benches/backend_comparison.rs
Refactored benchmark to a single compile-time-selected backend with compile_error! guards enforcing exactly one backend feature and consolidated BACKEND_NAME/BenchmarkId usage.
Example probes
examples/qc_probe.rs, examples/standard_files_probe.rs, examples/bufread_trailing_probe.rs
Added three example binaries (QC-style, corpus-based, bufread-trailing) with CLI, warmup/measure phases, deterministic inputs, digesting, and throughput/timing output.
Perf integration scripts
scripts/compare_workloads.sh, scripts/compare_api_workloads.sh, scripts/compare_backends.sh
Scripts build Rust vs C variants into separate target dirs, optionally generate compressed inputs, and run hyperfine to produce JSON/Markdown reports.
Sys wiring & tests
src/lib.rs, systest/*, tests/sys_equivalence.rs, src/{bufread.rs,stream.rs,write.rs}
Replaced legacy rust/c-backend gating with xz/xz-sys/liblzma-sys features, added conditional pub(crate) mod sys modules, updated conditional imports, and adjusted tests and include paths to new sys crates.
MaybeUninit & easy helpers
liblzma-rs/src/common/easy_*.rs, xz/src/common/easy_*.rs, xz/src/common/stream_*
Replaced zeroed struct literals with core::mem::MaybeUninit patterns for easy_* helpers, footer/flags handling, and filter arrays; adjusted pointer usage and deallocation.
Low-level perf & API visibility
xz/src/... (many files: check/*, crc32_fast.rs, crc64_fast.rs, types.rs, lz/*, lzma/*, simple/*, common/*)
Performance-focused changes: ARM64 CRC32 path with runtime crc detection, immutable CRC64 table, unaligned read/write helpers, pointer caching, more aggressive inlining, range-encoder refactor, rc_symbol alias change, and widespread tightening of exported symbols (pub → pub(crate) or removal of extern "C" linkage).
Examples/tests wiring & small edits
benches/*, examples/*, src/{lib.rs,bufread.rs,stream.rs,write.rs}, xz/src/simple/simple_coder.rs
Minor formatting, import aliases (use crate::sys as liblzma_sys), crate-level cfg attributes, and test gating adjustments to standardize backend symbol resolution.

Sequence Diagram(s)

sequenceDiagram
    participant Script as scripts/*.sh
    participant Cargo as cargo (build)
    participant Probe as perf-probe
    participant Backend as Selected Backend (xz / xz-sys / liblzma-sys)
    participant Profiler as hyperfine / perf / samply
    Script->>Cargo: build perf-probe with --features and target-dir
    Cargo->>Probe: produce binary
    Script->>Profiler: invoke profiler (hyperfine/perf/samply)
    Profiler->>Probe: execute workload iterations
    Probe->>Backend: call lzma_* FFI/API (encode/decode/crc/size)
    Backend-->>Probe: return results
    Probe-->>Profiler: report timings and digest
    Profiler-->>Script: write JSON/MD or profile artifacts (target/perf-results/)
    Script-->>User: print report paths

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"🐰
I hopped through crates and scripts galore,
Renamed the paths and added probes to score,
ARM64 CRCs now race with delight,
MaybeUninit saves zeroing’s plight,
Docs and scripts sing metrics bright!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): Docstring coverage is 54.74%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
Title Check (✅ Passed): The title 'Add backend performance workflow and tuning' accurately reflects the primary changes: it introduces a comprehensive performance workflow (AGENTS.md, docs/performance-workflow.md, multiple scripts) and backend tuning (optimizations in the xz crate).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
liblzma-rs/src/common/stream_decoder.rs (1)

176-188: ⚠️ Potential issue | 🔴 Critical

Fix unsound assume_init(): reserved fields are not initialized by lzma_stream_footer_decode().

The stream_flags_decode() function only initializes the version and check fields. The 14 reserved fields (4 enums + 8 bools + 2 ints) remain uninitialized when assume_init() is called at line 188, violating Rust's safety invariants and creating undefined behavior.

Options to fix:

  1. Zero-initialize the struct before passing to the decoder:
let mut footer_flags = MaybeUninit::<lzma_stream_flags>::zeroed();
let ret_3: lzma_ret = lzma_stream_footer_decode(footer_flags.as_mut_ptr(), ...);
  2. Or keep using raw pointers for the full decode cycle instead of assume_init().
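To make option 1 concrete, here is a minimal standalone sketch of the zeroed-init pattern. StreamFlags and decode_footer are simplified stand-ins for lzma_stream_flags and lzma_stream_footer_decode, not the real FFI types:

```rust
use core::mem::MaybeUninit;

// Stand-in for lzma_stream_flags: the decoder only writes version and check.
#[repr(C)]
struct StreamFlags {
    version: u32,
    check: u32,
    reserved: [u32; 14], // never written by the decoder
}

// Stand-in for lzma_stream_footer_decode: partially initializes the struct.
unsafe fn decode_footer(flags: *mut StreamFlags) {
    (*flags).version = 0;
    (*flags).check = 4;
}

fn main() {
    // zeroed() makes the later assume_init() sound: every byte, including
    // the reserved fields the decoder never touches, holds a valid value.
    let mut footer_flags = MaybeUninit::<StreamFlags>::zeroed();
    let flags = unsafe {
        decode_footer(footer_flags.as_mut_ptr());
        footer_flags.assume_init()
    };
    assert_eq!(flags.check, 4);
    assert!(flags.reserved.iter().all(|&r| r == 0));
}
```

With plain `MaybeUninit::uninit()` the same `assume_init()` would be undefined behavior, since the reserved fields would still be uninitialized.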
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liblzma-rs/src/common/stream_decoder.rs` around lines 176 - 188, The code
unsafely calls footer_flags.assume_init() after lzma_stream_footer_decode even
though lzma_stream_footer_decode only sets version and check and leaves reserved
fields uninitialized; change the flow so footer_flags is fully initialized
before assume_init() — e.g., create footer_flags with
MaybeUninit::<lzma_stream_flags>::zeroed() (or otherwise zero/init the struct)
before passing footer_flags.as_mut_ptr() to lzma_stream_footer_decode, or
alternatively keep using raw pointer access throughout instead of calling
assume_init(); update references to footer_flags, lzma_stream_footer_decode, and
the assume_init() call accordingly.
benches/backend_comparison.rs (1)

34-69: ⚠️ Potential issue | 🟠 Major

Check liblzma status codes in the benchmark helpers.

backend_encode and backend_decode ignore the return value and only trust out_pos. If a backend regresses or the feature wiring is wrong, this harness can still emit timings for partial/failed work instead of failing the comparison run.

Based on learnings: Use scripts/compare_backends.sh as the full deterministic regression gate for backend performance comparisons.
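A hedged sketch of the suggested check, using a stand-in LzmaRet type instead of the real backend_sys bindings (the constants mirror liblzma's conventional lzma_ret values):

```rust
// Stand-in for the lzma_ret status code returned by the FFI calls.
#[derive(Debug, PartialEq, Clone, Copy)]
struct LzmaRet(u32);
const LZMA_OK: LzmaRet = LzmaRet(0);
const LZMA_STREAM_END: LzmaRet = LzmaRet(1);

// Fail loudly on any unexpected status instead of trusting out_pos alone.
fn check_ret(op: &str, ret: LzmaRet, expected: &[LzmaRet]) {
    assert!(
        expected.contains(&ret),
        "{op} failed with lzma_ret code {}",
        ret.0
    );
}

fn main() {
    // e.g. after backend_sys::lzma_easy_buffer_encode(...) returns `ret`:
    check_ret("lzma_easy_buffer_encode", LZMA_OK, &[LZMA_OK]);
    // a decode call may legitimately report end-of-stream as well:
    check_ret("lzma_stream_buffer_decode", LZMA_STREAM_END, &[LZMA_OK, LZMA_STREAM_END]);
    // only after these checks pass would the harness truncate by out_pos
}
```

The point is that a regressed or miswired backend aborts the comparison run instead of contributing timings for partial work.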

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/backend_comparison.rs` around lines 34 - 69, backend_encode and
backend_decode currently ignore the lzma return codes and only use out_pos;
capture the return value from backend_sys::lzma_easy_buffer_encode and
backend_sys::lzma_stream_buffer_decode and check it against the expected success
codes (e.g. backend_sys::LZMA_OK or backend_sys::LZMA_STREAM_END as
appropriate), and if not successful panic or return an Err with a clear message
including the return code (and optionally translate via lzma error helper if
available); only truncate out by out_pos after confirming success so
failed/partial operations don't silently produce benchmarks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/standard_files_probe.rs`:
- Around line 240-261: The measure, fold_bytes, and print_measurement functions
are duplicated between standard_files_probe.rs and qc_probe.rs; extract these
shared benchmark utilities into a single module (e.g., examples/common or
bench_util) and have both probes import them. Move the functions (measure,
fold_bytes, print_measurement) into the new module, update both files to use the
new module's public functions, remove the duplicated implementations, and ensure
any required types (e.g., Measurement) are also exported from the new module so
callers compile without changes.
- Line 175: The code currently calls entry.file_name().into_string().unwrap(),
which will panic on non-UTF8 filenames; change this to handle non-UTF8 safely by
replacing the unwrap with a non-panicking conversion such as using
entry.file_name().to_string_lossy() (or explicitly match on into_string() and
fall back to a lossy/path display) and assign that result to filename so tests
don't panic on weird filenames.

In `@liblzma-rs/src/check/crc32_fast.rs`:
- Around line 357-424: aligned_read16le/32le/64le currently use
core::ptr::read_unaligned which yields native-endian values so on big-endian
AArch64 the CRC is wrong; change those helpers to read the unaligned integer and
convert to little-endian (e.g. read_unaligned(... ) then call u16::from_le /
u32::from_le / u64::from_le) so aligned_read16le/32le/64le always return
little-endian values, leaving lzma_crc32_arm64 logic unchanged.

In `@liblzma-rs/src/check/crc64_fast.rs`:
- Around line 1061-1069: The tail-loop rewrite is safe for empty input because
the check `if old_size == 0 { break; }` occurs before any dereference of `buf`;
add a short clarifying comment above the loop referencing `size`, `old_size`,
`buf`, `crc`, and `table0` that explains the break protects against zero-length
buffers and preserve the current order of operations (capture `old_size`,
decrement `size`, then test and break) so no dereference happens on empty input.

In `@liblzma-rs/src/common/common.rs`:
- Around line 123-126: Replace the debug-only assertions in lzma_bufcpy with
real runtime bounds checks: before doing any arithmetic like (in_size - *in_pos)
or calling copy_nonoverlapping, check that *in_pos <= in_size and *out_pos <=
out_size and that pointers are non-null; if any check fails, do an early
return/error from lzma_bufcpy (do not perform the subtraction or call
copy_nonoverlapping). Update lzma_bufcpy’s control flow so invalid inputs are
rejected safely (no underflow-driven large lengths), and ensure the change is
coherent with the public API paths that call it (e.g., simple_code /
simple_coder) by propagating or returning the error rather than performing an
unsafe copy.
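The suggested guard can be illustrated with a safe, self-contained analogue of a bufcpy-style helper; the names and signature here are illustrative, not the crate's actual API:

```rust
// Validate positions before any subtraction, so underflow can never
// produce a huge copy length; invalid input returns an error instead.
fn bufcpy(
    input: &[u8],
    in_pos: &mut usize,
    out: &mut [u8],
    out_pos: &mut usize,
) -> Result<usize, &'static str> {
    if *in_pos > input.len() || *out_pos > out.len() {
        return Err("position out of bounds");
    }
    // Safe now: both subtractions are guaranteed non-negative.
    let n = (input.len() - *in_pos).min(out.len() - *out_pos);
    out[*out_pos..*out_pos + n].copy_from_slice(&input[*in_pos..*in_pos + n]);
    *in_pos += n;
    *out_pos += n;
    Ok(n)
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 2];
    let (mut ip, mut op) = (0usize, 0usize);
    assert_eq!(bufcpy(&src, &mut ip, &mut dst, &mut op), Ok(2));
    assert_eq!(dst, [1, 2]);
    // an out-of-range position is rejected rather than underflowing
    ip = 10;
    assert!(bufcpy(&src, &mut ip, &mut dst, &mut op).is_err());
}
```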

In `@liblzma-rs/src/common/easy_buffer_encoder.rs`:
- Around line 14-18: The call to opt_easy.assume_init_mut() is unsound because
lzma_easy_preset() only initializes part of lzma_options_easy (filters[0..1]),
so create and use a raw pointer instead of making a &mut reference; keep using
opt_easy.as_mut_ptr() (the *mut lzma_options_easy returned from MaybeUninit) for
calls to lzma_easy_preset and any subsequent field access, and when you need to
touch fields use unsafe pointer reads/writes (e.g., core::ptr::addr_of_mut /
core::ptr::read/write or direct deref of the *mut in unsafe) so you never form a
mutable reference to the partially-initialized opt_easy; remove the
assume_init_mut() call and update code that referenced opt_easy to use the raw
pointer operations on the original MaybeUninit buffer.

In `@liblzma-rs/src/common/easy_decoder_memusage.rs`:
- Around line 4-8: The current code calls assume_init_mut() on opt_easy after
lzma_easy_preset(), but lzma_easy_preset() only partially initializes
lzma_options_easy so assume_init_mut() is UB; instead keep opt_easy as
MaybeUninit, use opt_easy.as_mut_ptr() to get a *mut lzma_options_easy, and
access the filters and other fields through unsafe deref (e.g., let opt_easy_ptr
= opt_easy.as_mut_ptr(); unsafe { (*opt_easy_ptr).filters[...] } ) without
calling assume_init_mut() or treating the whole struct as initialized; adjust
uses in the memusage helper (opt_easy, lzma_easy_preset) to read only the
initialized subfields via the raw pointer.

In `@liblzma-rs/src/common/easy_encoder_memusage.rs`:
- Around line 4-8: The code currently calls assume_init_mut() on the
MaybeUninit<lzma_options_easy> (opt_easy) after lzma_easy_preset(), causing
undefined behavior because the struct is only partially initialized; replace the
assume_init_mut() usage by accessing the fields via the raw pointer returned by
opt_easy.as_mut_ptr() (e.g., read/write the filters field through the pointer)
instead of materializing the whole struct; update the same pattern in
easy_decoder_memusage.rs, easy_encoder.rs, and easy_buffer_encoder.rs where
lzma_easy_preset(), lzma_options_easy, opt_easy, or opt_* MaybeUninit variables
are used so no assume_init*() is called on partially-initialized data.

In `@liblzma-rs/src/common/easy_encoder.rs`:
- Around line 9-13: The code currently calls assume_init_mut() on opt_easy after
lzma_easy_preset(), but lzma_easy_preset() only initializes part of
lzma_options_easy (opt_lzma, filters[0], and filters[1].id), so calling
assume_init_mut() yields undefined behavior; instead keep opt_easy as
MaybeUninit, obtain a raw pointer via opt_easy.as_mut_ptr(), and
access/initialize only the known initialized fields (e.g., use
(*opt_ptr).filters[..] or pointer arithmetic to read/write filters[0] and
filters[1].id and avoid creating references to the uninitialized remainder),
then only assume_init()/assume_init_mut() after all fields are properly
initialized or avoid assuming init at all and work through the raw pointer for
subsequent FFI calls such as lzma_easy_preset and any code that reads filters.
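The raw-pointer pattern these easy_* comments recommend can be sketched in isolation. OptionsEasy and easy_preset are stand-ins for lzma_options_easy and lzma_easy_preset; the key is touching only the initialized fields through as_mut_ptr(), never forming a &mut to the partially initialized struct:

```rust
use core::mem::MaybeUninit;

// Stand-in for lzma_options_easy; the preset helper leaves `reserved` untouched.
#[repr(C)]
struct OptionsEasy {
    preset: u32,
    filters: [u64; 2],
    reserved: [u8; 32], // never initialized
}

// Stand-in for lzma_easy_preset: initializes only preset and filters.
unsafe fn easy_preset(opt: *mut OptionsEasy, level: u32) {
    (*opt).preset = level;
    (*opt).filters[0] = 0x21; // e.g. an LZMA2 filter id
    (*opt).filters[1] = u64::MAX; // terminator
}

fn main() {
    let mut opt_easy = MaybeUninit::<OptionsEasy>::uninit();
    let opt_ptr = opt_easy.as_mut_ptr();
    unsafe {
        easy_preset(opt_ptr, 6);
        // Read the initialized fields through the raw pointer; no
        // assume_init_mut(), so the uninitialized remainder is never asserted valid.
        let filters = core::ptr::addr_of!((*opt_ptr).filters).read();
        assert_eq!(filters[1], u64::MAX);
        assert_eq!(core::ptr::addr_of!((*opt_ptr).preset).read(), 6);
    }
}
```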

In `@liblzma-rs/src/lz/lz_encoder_mf.rs`:
- Around line 623-624: Replace the plain subtraction used to compute delta2 and
delta3 with wrapping subtraction to match lzma_mf_hc4_find: compute delta2 as
pos.wrapping_sub(*hash.offset(hash_2_value as isize)) and delta3 as
pos.wrapping_sub(*hash.offset(hash_3_index as isize)). Update the uses in the
same block where pos, hash, hash_2_value and hash_3_index are referenced so
underflow due to wrapped positions is handled consistently.
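A minimal illustration of why wrapping subtraction matters here: with u32 position counters that can lap, plain subtraction underflows (and panics in debug builds), while wrapping_sub reproduces the modular arithmetic the C match-finder relies on. The values below are made up for illustration:

```rust
fn main() {
    let pos: u32 = 5;
    // a hash entry holding an older position that has wrapped past u32::MAX
    let hash_entry: u32 = 0xFFFF_FFF0;

    // `pos - hash_entry` would underflow and panic in a debug build;
    // wrapping_sub yields the distance modulo 2^32 instead.
    let delta2 = pos.wrapping_sub(hash_entry);
    assert_eq!(delta2, 21);
}
```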

In `@scripts/compare_backends.sh`:
- Around line 33-45: The usage/help block printed by the here-doc in
scripts/compare_backends.sh (the cat <<'EOF' block starting with "Usage:
scripts/compare_backends.sh [options]") is missing documentation for the
implemented --help|-h flag; update that help text by adding a line documenting
"-h, --help    Show this help message" (or similar) to the Options section so
the implemented --help|-h behavior is reflected in the usage output.

In `@scripts/compare_workloads.sh`:
- Around line 49-51: The call that generates COMPRESSED_INPUT uses
"${RAW_ARGS[@]}" then appends "--iters 1 --warmup 0", which can be overridden by
duplicate flags in RAW_ARGS; update the invocation around C_BIN so the intended
overrides are enforced by either (a) moving "--iters 1 --warmup 0" before
"${RAW_ARGS[@]}" or (b) sanitizing RAW_ARGS to remove any user-supplied --iters
and --warmup entries before using it; reference the COMPRESSED_INPUT variable,
the C_BIN invocation line that runs --workload encode, and RAW_ARGS when making
the change.
- Around line 53-64: The decode command arrays RUST_CMD and C_CMD currently omit
forwarding user-provided iteration/warmup options; update both arrays (RUST_CMD
and C_CMD) to include the iteration flags by appending --iters "$ITERS" and
--warmup "$WARMUP" (or the equivalent variables/argument-slice your script sets
for those user flags) so any provided --iters/--warmup values are passed through
to the decode workload.

In `@scripts/inspect_codegen.sh`:
- Around line 38-65: The option parsing loop reads "$2" for flags like
--package, --format, --features, and --target-dir without guarding argument
count, causing unbound-variable failures under set -u; update the case branches
in the while/case block to first check that $# -ge 2 (or that a next argument
exists) before assigning PACKAGE, FORMAT, FEATURES or TARGET_DIR, and if the
value is missing print a clear error (e.g., "missing value for --package") and
exit with non-zero status so the script fails with a user-friendly message
instead of an unbound-variable error.

In `@scripts/profile_backend.sh`:
- Around line 4-18: The usage help in scripts/profile_backend.sh states
"<c|rust|both>" but the runtime validation later rejects "both"; update the
script so help and validation match: either remove "both" from the usage/help
text (the heredoc shown in the file) or change the validation logic that
currently rejects "both" to accept it and implement the intended behavior for
handling both backends (the backend selection/dispatch logic). Locate the usage
heredoc and the backend validation block in scripts/profile_backend.sh and make
them consistent (modify the usage string if you don't intend to support both, or
add support for "both" in the validation and execution flow).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37b984f8-23b2-4fe8-8dd4-a4882f8ec6c6

📥 Commits

Reviewing files that changed from the base of the PR and between bb544f1 and 2891602.

📒 Files selected for processing (28)
  • AGENTS.md
  • Cargo.toml
  • benches/backend_comparison.rs
  • docs/performance-workflow.md
  • examples/qc_probe.rs
  • examples/standard_files_probe.rs
  • liblzma-rs/src/check/crc32_fast.rs
  • liblzma-rs/src/check/crc64_fast.rs
  • liblzma-rs/src/common/common.rs
  • liblzma-rs/src/common/easy_buffer_encoder.rs
  • liblzma-rs/src/common/easy_decoder_memusage.rs
  • liblzma-rs/src/common/easy_encoder.rs
  • liblzma-rs/src/common/easy_encoder_memusage.rs
  • liblzma-rs/src/common/stream_decoder.rs
  • liblzma-rs/src/common/stream_encoder.rs
  • liblzma-rs/src/delta/delta_decoder.rs
  • liblzma-rs/src/lz/lz_encoder_mf.rs
  • liblzma-rs/src/lzma/lzma_decoder.rs
  • liblzma-rs/src/lzma/lzma_encoder.rs
  • liblzma-rs/src/simple/simple_coder.rs
  • perf-probe/Cargo.toml
  • perf-probe/src/main.rs
  • scripts/compare_api_workloads.sh
  • scripts/compare_backends.sh
  • scripts/compare_workloads.sh
  • scripts/inspect_codegen.sh
  • scripts/profile_backend.sh
  • scripts/run_root_test_bins.sh

Context (examples/standard_files_probe.rs, around line 175):

            continue;
        }

        let filename = entry.file_name().into_string().unwrap();
Copy link
Copy Markdown


⚠️ Potential issue | 🟡 Minor

Potential panic on non-UTF8 filenames.

Line 175 uses .unwrap() on into_string(), which will panic if the filename contains non-UTF8 characters. While unlikely for test corpus files, consider handling gracefully:

🛡️ Proposed defensive fix
-        let filename = entry.file_name().into_string().unwrap();
+        let filename = match entry.file_name().into_string() {
+            Ok(s) => s,
+            Err(_) => continue, // Skip non-UTF8 filenames
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` at line 175, The code currently calls
entry.file_name().into_string().unwrap(), which will panic on non-UTF8
filenames; change this to handle non-UTF8 safely by replacing the unwrap with a
non-panicking conversion such as using entry.file_name().to_string_lossy() (or
explicitly match on into_string() and fall back to a lossy/path display) and
assign that result to filename so tests don't panic on weird filenames.

Comment on lines +240 to +261
fn measure<F>(bytes_per_iter: usize, iters: usize, warmup: usize, mut work: F) -> Measurement
where
    F: FnMut() -> u64,
{
    let mut digest = 0u64;
    for _ in 0..warmup {
        digest ^= black_box(work());
    }

    let start = Instant::now();
    for _ in 0..iters {
        digest ^= black_box(work());
    }
    let elapsed = start.elapsed();
    let mib = (bytes_per_iter as f64 * iters as f64) / (1024.0 * 1024.0);

    Measurement {
        elapsed,
        throughput_mib_s: mib / elapsed.as_secs_f64(),
        digest,
    }
}


🧹 Nitpick | 🔵 Trivial

Consider extracting shared benchmark utilities.

The measure, fold_bytes, and print_measurement functions are duplicated between qc_probe.rs and standard_files_probe.rs. While acceptable for examples, extracting to a shared module (e.g., examples/common/mod.rs or a bench_util module) would reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 240 - 261, The measure,
fold_bytes, and print_measurement functions are duplicated between
standard_files_probe.rs and qc_probe.rs; extract these shared benchmark
utilities into a single module (e.g., examples/common or bench_util) and have
both probes import them. Move the functions (measure, fold_bytes,
print_measurement) into the new module, update both files to use the new
module's public functions, remove the duplicated implementations, and ensure any
required types (e.g., Measurement) are also exported from the new module so
callers compile without changes.

Comment on lines +357 to +424
#[cfg(target_arch = "aarch64")]
#[inline(always)]
unsafe fn aligned_read16le(buf: *const u8) -> u16 {
    core::ptr::read_unaligned(buf as *const u16)
}

#[cfg(target_arch = "aarch64")]
#[inline(always)]
unsafe fn aligned_read32le(buf: *const u8) -> u32 {
    core::ptr::read_unaligned(buf as *const u32)
}

#[cfg(target_arch = "aarch64")]
#[inline(always)]
unsafe fn aligned_read64le(buf: *const u8) -> u64 {
    core::ptr::read_unaligned(buf as *const u64)
}

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "crc")]
unsafe fn lzma_crc32_arm64(mut buf: *const u8, mut size: size_t, mut crc: u32) -> u32 {
    crc = !crc;

    if size >= 8 {
        let align = (0usize.wrapping_sub(buf as usize)) & 7;

        if align & 1 != 0 {
            crc = __crc32b(crc, *buf);
            buf = buf.add(1);
        }

        if align & 2 != 0 {
            crc = __crc32h(crc, aligned_read16le(buf));
            buf = buf.add(2);
        }

        if align & 4 != 0 {
            crc = __crc32w(crc, aligned_read32le(buf));
            buf = buf.add(4);
        }

        size -= align;

        let limit = buf.add(size & !7);
        while buf < limit {
            crc = __crc32d(crc, aligned_read64le(buf));
            buf = buf.add(8);
        }

        size &= 7;
    }

    if size & 4 != 0 {
        crc = __crc32w(crc, aligned_read32le(buf));
        buf = buf.add(4);
    }

    if size & 2 != 0 {
        crc = __crc32h(crc, aligned_read16le(buf));
        buf = buf.add(2);
    }

    if size & 1 != 0 {
        crc = __crc32b(crc, *buf);
    }

    !crc
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Rust big-endian AArch64 targets, does cfg(target_arch = "aarch64") still apply, and does core::ptr::read_unaligned::<u16/u32/u64> return native-endian values such that from_le(...) is required to preserve little-endian byte order?

💡 Result:

  • Yes. Big-endian AArch64 targets (e.g. aarch64_be-unknown-linux-musl) are still AArch64, so cfg(target_arch = "aarch64") applies; endianness is a separate knob via cfg(target_endian = "big") / "little". The Rust Reference lists "aarch64" as a target_arch value and target_endian as "big"/"little" depending on the CPU endianness. [1] The compiler’s documented/checked target_arch values include aarch64 (there is no distinct aarch64_be target_arch value). [2] The aarch64_be-unknown-linux-musl target is explicitly “ARM64 Linux (big-endian)”. [3]

  • core::ptr::read_unaligned::<u16/u32/u64> reads a value of type T from memory (without alignment requirements) and thus yields the integer as interpreted in the platform’s native endianness (it’s just a typed load/bitwise read). [6] Therefore, if the bytes in memory are in little-endian order but the target is big-endian, you must convert (e.g. u32::from_le(v)) to interpret/preserve the intended little-endian meaning; from_le is defined as converting a little-endian integer to the target’s endianness. [5] (Portable alternatives are from_le_bytes/from_be_bytes, which explicitly define the byte order.) [4]

Sources: [1] https://doc.rust-lang.org/reference/conditional-compilation.html, [2] https://fuchsia.googlesource.com/third_party/rust/%2B/0285dab54f87120367f5693174136aff24b74640%5E1..0285dab54f87120367f5693174136aff24b74640/, [3] https://doc.rust-lang.org/beta/rustc/platform-support/aarch64_be-unknown-linux-musl.html, [4] https://doc.rust-lang.org/core/primitive.u32.html, [5] https://doc.rust-lang.org/std/primitive.i32.html, [6] https://doc.rust-lang.org/core/ptr/fn.read_unaligned.html


Fix the AArch64 loads to be explicitly little-endian.

The helper functions aligned_read16le/32le/64le use core::ptr::read_unaligned, which returns values in the platform's native endianness. On big-endian AArch64 targets, this will cause them to return reversed word values, producing a different CRC than the generic path and breaking correctness.

Suggested fix
 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read16le(buf: *const u8) -> u16 {
-    core::ptr::read_unaligned(buf as *const u16)
+    u16::from_le(core::ptr::read_unaligned(buf as *const u16))
 }

 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read32le(buf: *const u8) -> u32 {
-    core::ptr::read_unaligned(buf as *const u32)
+    u32::from_le(core::ptr::read_unaligned(buf as *const u32))
 }

 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read64le(buf: *const u8) -> u64 {
-    core::ptr::read_unaligned(buf as *const u64)
+    u64::from_le(core::ptr::read_unaligned(buf as *const u64))
 }
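As a quick sanity check of the from_le reasoning, here is a small standalone sketch (not project code): from_le is a no-op on little-endian targets and a byte swap on big-endian ones, so adding it costs nothing where the fast path already works and fixes aarch64_be.

```rust
fn main() {
    let bytes = [0x01u8, 0x02, 0x03, 0x04];

    // Read the four bytes as a little-endian u32 regardless of host
    // endianness: read_unaligned gives a native-endian load, from_le
    // then reinterprets it as little-endian data.
    let v = u32::from_le(unsafe {
        core::ptr::read_unaligned(bytes.as_ptr() as *const u32)
    });
    assert_eq!(v, 0x0403_0201);

    // Fully safe equivalent with explicit byte order:
    assert_eq!(u32::from_le_bytes(bytes), 0x0403_0201);
}
```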
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liblzma-rs/src/check/crc32_fast.rs` around lines 357 - 424,
aligned_read16le/32le/64le currently use core::ptr::read_unaligned which yields
native-endian values so on big-endian AArch64 the CRC is wrong; change those
helpers to read the unaligned integer and convert to little-endian (e.g.
read_unaligned(... ) then call u16::from_le / u32::from_le / u64::from_le) so
aligned_read16le/32le/64le always return little-endian values, leaving
lzma_crc32_arm64 logic unchanged.
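A quick standalone check of the endianness reasoning above (illustrative code, not the crate's helpers): wrapping the `read_unaligned` result in `from_le` makes the load match an explicit `from_le_bytes` decode on any host, a no-op on little-endian targets and a byte swap on big-endian ones.

```rust
// Illustrative helper, not the crate's code: an explicitly little-endian
// 32-bit load built from read_unaligned + from_le.
fn read32le(buf: &[u8]) -> u32 {
    assert!(buf.len() >= 4);
    // SAFETY: length checked above; read_unaligned tolerates any alignment.
    unsafe { u32::from_le(core::ptr::read_unaligned(buf.as_ptr() as *const u32)) }
}

fn main() {
    let bytes = [0x78u8, 0x56, 0x34, 0x12];
    // Both forms decode the same little-endian value regardless of host order.
    assert_eq!(read32le(&bytes), 0x1234_5678);
    assert_eq!(read32le(&bytes), u32::from_le_bytes(bytes));
    println!("ok");
}
```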

Comment on lines +1061 to 1069
loop {
let old_size = size;
size = size.wrapping_sub(1);
if old_size == 0 {
break;
}
crc = *table0.add((*buf as u64 ^ crc & 0xff as u64) as usize) ^ crc >> 8;
buf = buf.offset(1);
size -= 1;
}

🧹 Nitpick | 🔵 Trivial

Tail-loop rewrite remains safe for empty input.

The old_size == 0 break happens before dereferencing buf, so zero-length tails are still handled safely.

Based on learnings: Use scripts/compare_backends.sh as the full deterministic regression gate for backend performance comparisons.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liblzma-rs/src/check/crc64_fast.rs` around lines 1061 - 1069, The tail-loop
rewrite is safe for empty input because the check `if old_size == 0 { break; }`
occurs before any dereference of `buf`; add a short clarifying comment above the
loop referencing `size`, `old_size`, `buf`, `crc`, and `table0` that explains
the break protects against zero-length buffers and preserve the current order of
operations (capture `old_size`, decrement `size`, then test and break) so no
dereference happens on empty input.

Comment thread xz/src/common/common.rs
Comment on lines +123 to +126
debug_assert!(!in_0.is_null() || *in_pos == in_size);
debug_assert!(!out.is_null() || *out_pos == out_size);
debug_assert!(*in_pos <= in_size);
debug_assert!(*out_pos <= out_size);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Locate all lzma_bufcpy call sites with context:"
rg -n --type=rust -C4 '\blzma_bufcpy\s*\('

echo
echo "2) Check whether callers validate in/out positions before calling:"
rg -n --type=rust -C3 '\*in_pos\s*<=\s*in_size|\*out_pos\s*<=\s*out_size|\*in_pos\s*>\s*in_size|\*out_pos\s*>\s*out_size'

echo
echo "3) Confirm callers that ignore return value (risk amplification):"
rg -n --type=rust -C3 'lzma_bufcpy\s*\([^;]*\);'

Repository: youknowone/xz2-rs

Length of output: 23027


🏁 Script executed:

# Check simple_coder.rs line 15 context and caller chain
cat -n liblzma-rs/src/simple/simple_coder.rs | sed -n '1,25p'

echo "---"

# Check what calls this function
rg -n 'simple_code' --type=rust | head -20

Repository: youknowone/xz2-rs

Length of output: 2639


🏁 Script executed:

cat -n liblzma-rs/src/simple/simple_coder.rs | sed -n '50,120p'

Repository: youknowone/xz2-rs

Length of output: 2946


Restore runtime bounds checks in lzma_bufcpy to prevent underflow-driven out-of-bounds copy.

The function performs arithmetic (in_size - *in_pos) immediately after assertions. With debug_assert! only, release builds have no protection. If *in_pos > in_size or *out_pos > out_size (whether from invalid input or a bug), subtraction underflows to a very large value and feeds it into copy_nonoverlapping, corrupting memory.

The public API callback simple_code in liblzma-rs/src/simple/simple_coder.rs:50 does not validate input positions before calling the chain that reaches lzma_bufcpy. Additionally, at least one caller ignores the return value entirely (simple_coder.rs:15), so the function must be self-defending.

Proposed fix
 pub unsafe fn lzma_bufcpy(
     in_0: *const u8,
     in_pos: *mut size_t,
     in_size: size_t,
     out: *mut u8,
     out_pos: *mut size_t,
     out_size: size_t,
 ) -> size_t {
+    if *in_pos > in_size || *out_pos > out_size {
+        return 0;
+    }
     debug_assert!(!in_0.is_null() || *in_pos == in_size);
     debug_assert!(!out.is_null() || *out_pos == out_size);
     debug_assert!(*in_pos <= in_size);
     debug_assert!(*out_pos <= out_size);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liblzma-rs/src/common/common.rs` around lines 123 - 126, Replace the
debug-only assertions in lzma_bufcpy with real runtime bounds checks: before
doing any arithmetic like (in_size - *in_pos) or calling copy_nonoverlapping,
check that *in_pos <= in_size and *out_pos <= out_size and that pointers are
non-null; if any check fails, do an early return/error from lzma_bufcpy (do not
perform the subtraction or call copy_nonoverlapping). Update lzma_bufcpy’s
control flow so invalid inputs are rejected safely (no underflow-driven large
lengths), and ensure the change is coherent with the public API paths that call
it (e.g., simple_code / simple_coder) by propagating or returning the error
rather than performing an unsafe copy.
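The underflow hazard can be demonstrated in safe Rust with a hypothetical slice-based wrapper (not the crate's pointer-based API): `checked_sub` plays the role of the proposed runtime guard, rejecting an inconsistent position instead of letting the subtraction wrap into a huge copy length.

```rust
// Hypothetical safe analogue of lzma_bufcpy: copies as much as fits and
// advances both positions, rejecting out-of-range positions up front.
fn bufcpy(input: &[u8], in_pos: &mut usize, out: &mut [u8], out_pos: &mut usize) -> usize {
    // checked_sub returns None when *in_pos > input.len(), i.e. exactly the
    // case where unchecked subtraction would underflow.
    let in_avail = match input.len().checked_sub(*in_pos) {
        Some(n) => n,
        None => return 0,
    };
    let out_avail = match out.len().checked_sub(*out_pos) {
        Some(n) => n,
        None => return 0,
    };
    let n = in_avail.min(out_avail);
    out[*out_pos..*out_pos + n].copy_from_slice(&input[*in_pos..*in_pos + n]);
    *in_pos += n;
    *out_pos += n;
    n
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 2];
    let (mut ip, mut op) = (0usize, 0usize);
    assert_eq!(bufcpy(&src, &mut ip, &mut dst, &mut op), 2);
    assert_eq!(dst, [1, 2]);
    // An out-of-range input position is rejected rather than underflowing.
    let mut bad = 9usize;
    assert_eq!(bufcpy(&src, &mut bad, &mut dst, &mut op), 0);
    println!("ok");
}
```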

Comment on lines +33 to +45
cat <<'EOF'
Usage: scripts/compare_backends.sh [options]

Compare full test-suite wall clock time for the Rust and C backends using hyperfine.
Results are written under target/perf-results by default.

Options:
--runs <n> Number of measured runs per command (default: 5)
--warmup <n> Warmup runs per command (default: 1)
--root-repeats <n> Repeat the root test-binary bundle inside each timed command
--results-dir <dir> Where to write hyperfine reports
--skip-systest Only compare the root crate test suite
EOF

⚠️ Potential issue | 🟡 Minor

Help text is missing --help option documentation.

The --help|-h option is implemented (Line 32) but not documented in the usage text.

📝 Proposed fix
 Options:
   --runs <n>          Number of measured runs per command (default: 5)
   --warmup <n>        Warmup runs per command (default: 1)
   --root-repeats <n>  Repeat the root test-binary bundle inside each timed command
   --results-dir <dir> Where to write hyperfine reports
   --skip-systest      Only compare the root crate test suite
+  --help, -h          Show this help message
 EOF
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_backends.sh` around lines 33 - 45, The usage/help block
printed by the here-doc in scripts/compare_backends.sh (the cat <<'EOF' block
starting with "Usage: scripts/compare_backends.sh [options]") is missing
documentation for the implemented --help|-h flag; update that help text by
adding a line documenting "-h, --help    Show this help message" (or similar) to
the Options section so the implemented --help|-h behavior is reflected in the
usage output.

Comment on lines +49 to +51
COMPRESSED_INPUT="$RESULTS_DIR/decode-input-${SIZE}.xz"
"$C_BIN" \
--workload encode "${RAW_ARGS[@]}" --iters 1 --warmup 0 --save-output "$COMPRESSED_INPUT" >/dev/null

⚠️ Potential issue | 🟡 Minor

Potential argument conflict in decode input generation.

Line 51 appends --iters 1 --warmup 0 after ${RAW_ARGS[@]}, but if the user already specified --iters or --warmup in their args, the input generation may use the user's values instead of 1/0 (depending on how perf-probe handles duplicate flags).

Consider placing the override arguments before ${RAW_ARGS[@]} or filtering them out:

🔧 Proposed fix
   COMPRESSED_INPUT="$RESULTS_DIR/decode-input-${SIZE}.xz"
+  # Filter out --iters and --warmup from RAW_ARGS for input generation
+  ENCODE_ARGS=()
+  skip_next=0
+  for arg in "${RAW_ARGS[@]}"; do
+    if [[ $skip_next -eq 1 ]]; then
+      skip_next=0
+      continue
+    fi
+    if [[ "$arg" == "--iters" || "$arg" == "--warmup" ]]; then
+      skip_next=1
+      continue
+    fi
+    ENCODE_ARGS+=("$arg")
+  done
   "$C_BIN" \
-    --workload encode "${RAW_ARGS[@]}" --iters 1 --warmup 0 --save-output "$COMPRESSED_INPUT" >/dev/null
+    --workload encode "${ENCODE_ARGS[@]}" --iters 1 --warmup 0 --save-output "$COMPRESSED_INPUT" >/dev/null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_workloads.sh` around lines 49 - 51, The call that generates
COMPRESSED_INPUT uses "${RAW_ARGS[@]}" then appends "--iters 1 --warmup 0",
which can be overridden by duplicate flags in RAW_ARGS; update the invocation
around C_BIN so the intended overrides are enforced by either (a) moving
"--iters 1 --warmup 0" before "${RAW_ARGS[@]}" or (b) sanitizing RAW_ARGS to
remove any user-supplied --iters and --warmup entries before using it; reference
the COMPRESSED_INPUT variable, the C_BIN invocation line that runs --workload
encode, and RAW_ARGS when making the change.

Comment on lines +53 to +64
RUST_CMD=(
"$RUST_BIN"
--workload decode
--compressed-input "$COMPRESSED_INPUT"
--expected-size "$SIZE"
)
C_CMD=(
"$C_BIN"
--workload decode
--compressed-input "$COMPRESSED_INPUT"
--expected-size "$SIZE"
)

⚠️ Potential issue | 🟡 Minor

Decode commands don't forward --iters/--warmup args.

The decode commands (Lines 53-64) only specify --compressed-input and --expected-size, but don't forward user-provided --iters or --warmup arguments. The usage example on Line 10 shows --iters 50 --warmup 5, but these won't actually be passed to the decode benchmark.

🔧 Proposed fix to forward iteration args
+  # Extract --iters and --warmup values from RAW_ARGS
+  ITERS=""
+  WARMUP=""
+  for ((i = 0; i < ${#RAW_ARGS[@]}; i++)); do
+    if [[ "${RAW_ARGS[$i]}" == "--iters" ]]; then
+      ITERS="${RAW_ARGS[$((i + 1))]}"
+    elif [[ "${RAW_ARGS[$i]}" == "--warmup" ]]; then
+      WARMUP="${RAW_ARGS[$((i + 1))]}"
+    fi
+  done
+
   RUST_CMD=(
     "$RUST_BIN"
     --workload decode
     --compressed-input "$COMPRESSED_INPUT"
     --expected-size "$SIZE"
+    ${ITERS:+--iters "$ITERS"}
+    ${WARMUP:+--warmup "$WARMUP"}
   )
   C_CMD=(
     "$C_BIN"
     --workload decode
     --compressed-input "$COMPRESSED_INPUT"
     --expected-size "$SIZE"
+    ${ITERS:+--iters "$ITERS"}
+    ${WARMUP:+--warmup "$WARMUP"}
   )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_workloads.sh` around lines 53 - 64, The decode command arrays
RUST_CMD and C_CMD currently omit forwarding user-provided iteration/warmup
options; update both arrays (RUST_CMD and C_CMD) to include the iteration flags
by appending --iters "$ITERS" and --warmup "$WARMUP" (or the equivalent
variables/argument-slice your script sets for those user flags) so any provided
--iters/--warmup values are passed through to the decode workload.

Comment on lines +38 to +65
while [[ $# -gt 0 ]]; do
case "$1" in
--package)
PACKAGE="$2"
shift 2
;;
--format)
FORMAT="$2"
shift 2
;;
--features)
FEATURES="$2"
shift 2
;;
--target-dir)
TARGET_DIR="$2"
shift 2
;;
--help|-h)
print_usage
exit 0
;;
*)
echo "unknown argument: $1" >&2
exit 2
;;
esac
done

⚠️ Potential issue | 🟡 Minor

Guard option values before reading $2.

With set -u, a call like scripts/inspect_codegen.sh foo --package aborts with an unbound-variable error instead of a clear CLI error. Each flag that consumes a value should validate "$#" first.

Suggested fix
 while [[ $# -gt 0 ]]; do
   case "$1" in
     --package)
+      [[ $# -ge 2 ]] || { echo "--package requires a value" >&2; exit 2; }
       PACKAGE="$2"
       shift 2
       ;;
     --format)
+      [[ $# -ge 2 ]] || { echo "--format requires a value" >&2; exit 2; }
       FORMAT="$2"
       shift 2
       ;;
     --features)
+      [[ $# -ge 2 ]] || { echo "--features requires a value" >&2; exit 2; }
       FEATURES="$2"
       shift 2
       ;;
     --target-dir)
+      [[ $# -ge 2 ]] || { echo "--target-dir requires a value" >&2; exit 2; }
       TARGET_DIR="$2"
       shift 2
       ;;

Based on learnings: Use scripts/inspect_codegen.sh for optimized code inspection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/inspect_codegen.sh` around lines 38 - 65, The option parsing loop
reads "$2" for flags like --package, --format, --features, and --target-dir
without guarding argument count, causing unbound-variable failures under set -u;
update the case branches in the while/case block to first check that $# -ge 2
(or that a next argument exists) before assigning PACKAGE, FORMAT, FEATURES or
TARGET_DIR, and if the value is missing print a clear error (e.g., "missing
value for --package") and exit with non-zero status so the script fails with a
user-friendly message instead of an unbound-variable error.
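A minimal standalone sketch of the guard pattern under `set -u` (illustrative script, not the repository's parser): each value-taking flag checks `$#` before touching `$2`, so a missing value produces a clear error instead of an unbound-variable abort.

```shell
#!/usr/bin/env bash
set -euo pipefail

PACKAGE=""

parse() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --package)
        # Guard before reading "$2"; under set -u an unguarded read aborts.
        [[ $# -ge 2 ]] || { echo "--package requires a value" >&2; return 2; }
        PACKAGE="$2"
        shift 2
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
}

parse --package xz
echo "PACKAGE=$PACKAGE"

# A missing value is now a clean CLI error, not a crash.
if parse --package; then
  echo "missing value accepted"
else
  echo "missing value rejected"
fi
```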

Comment thread scripts/profile_backend.sh

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
xz/src/common/stream_decoder.rs (1)

238-267: ⚠️ Potential issue | 🟠 Major

Free filters_ptr before returning on header-decode failure.

Line 240 stores a stack-backed buffer into (*coder).block_options.filters, but the early return on Line 247 skips the cleanup on Lines 266-267. If lzma_block_header_decode allocated any filter options before failing, they leak, and block_options.filters is left dangling until the next overwrite.

Suggested fix
                 let ret_: lzma_ret = lzma_block_header_decode(
                     ::core::ptr::addr_of_mut!((*coder).block_options),
                     allocator,
                     ::core::ptr::addr_of_mut!((*coder).buffer) as *mut u8,
                 );
                 if ret_ != LZMA_OK {
+                    lzma_filters_free(filters_ptr, allocator);
+                    (*coder).block_options.filters = core::ptr::null_mut();
                     return ret_;
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/stream_decoder.rs` around lines 238 - 267, The early return
after lzma_block_header_decode leaves the stack-backed filters buffer referenced
by (*coder).block_options.filters and can leak allocated filter options; modify
the error path where ret_ != LZMA_OK to call lzma_filters_free(filters_ptr,
allocator) and then set (*coder).block_options.filters = core::ptr::null_mut()
before returning ret_ so any partially-allocated filter state is freed and the
pointer is not left dangling (refer to filters, filters_ptr,
(*coder).block_options, lzma_block_header_decode, and lzma_filters_free).
♻️ Duplicate comments (9)
xz/src/common/common.rs (1)

123-126: ⚠️ Potential issue | 🔴 Critical

Restore runtime bounds guards in lzma_bufcpy (release-safety regression).

At Line 123–Line 126, debug_assert! is not sufficient for release builds. Without a runtime guard, Line 128/129 can underflow (in_size - *in_pos, out_size - *out_pos) and Line 136 may perform out-of-bounds copy.

Proposed fix
 pub unsafe fn lzma_bufcpy(
     in_0: *const u8,
     in_pos: *mut size_t,
     in_size: size_t,
     out: *mut u8,
     out_pos: *mut size_t,
     out_size: size_t,
 ) -> size_t {
+    if *in_pos > in_size || *out_pos > out_size {
+        return 0;
+    }
     debug_assert!(!in_0.is_null() || *in_pos == in_size);
     debug_assert!(!out.is_null() || *out_pos == out_size);
     debug_assert!(*in_pos <= in_size);
     debug_assert!(*out_pos <= out_size);
#!/bin/bash
set -euo pipefail

FILE="xz/src/common/common.rs"

echo "Inspect lzma_bufcpy body:"
sed -n '115,145p' "$FILE"

echo
echo "Check for runtime guard vs. debug-only checks:"
rg -n 'debug_assert!\(\*in_pos <= in_size\)|debug_assert!\(\*out_pos <= out_size\)|if \*in_pos > in_size \|\| \*out_pos > out_size' "$FILE"

echo
echo "Locate arithmetic/copy sites that depend on validated positions:"
rg -n 'in_size - \*in_pos|out_size - \*out_pos|copy_nonoverlapping' "$FILE"

Expected verification result: subtraction/copy sites exist, and no runtime early-return guard is present before them in lzma_bufcpy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/common.rs` around lines 123 - 126, The debug_assert! checks in
lzma_bufcpy are insufficient for release; replace them with a runtime guard that
checks the positions and pointers (in_0, out, *in_pos, *out_pos, in_size,
out_size) and returns early with an error/appropriate status if *in_pos >
in_size || *out_pos > out_size or if required pointers are null, before
performing the subtractions (in_size - *in_pos, out_size - *out_pos) and the
copy_nonoverlapping call; ensure the check occurs at the start of lzma_bufcpy so
underflow/out-of-bounds copies cannot happen in release builds while preserving
the existing debug_asserts if you want extra debug diagnostics.
xz/src/lz/lz_encoder_mf.rs (1)

616-628: ⚠️ Potential issue | 🟡 Minor

Local caching optimization looks good, but use wrapping_sub for delta calculations.

The caching of hash, son, cyclic_pos, cyclic_size, and depth into locals is a valid optimization to reduce repeated struct dereferences.

However, lines 623-624 use plain subtraction while the analogous lzma_mf_hc4_find function (lines 228-229) uses wrapping_sub. Since pos can wrap around during long-running compression, the subtraction needs to handle cases where the hash value exceeds pos.

🔧 Proposed fix
-    let mut delta2: u32 = pos - *hash.offset(hash_2_value as isize);
-    let delta3: u32 = pos - *hash.offset(hash_3_index as isize);
+    let mut delta2: u32 = pos.wrapping_sub(*hash.offset(hash_2_value as isize));
+    let delta3: u32 = pos.wrapping_sub(*hash.offset(hash_3_index as isize));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/lz/lz_encoder_mf.rs` around lines 616 - 628, The local caching
introduced in lz_encoder_mf (variables hash, son, cyclic_pos, cyclic_size,
depth) is fine but delta calculations use plain subtraction which can underflow
when pos wraps; change the computations of delta2 and delta3 (currently using
pos - *hash.offset(...)) to use wrapping_sub on pos (e.g.,
pos.wrapping_sub(...)) so they match the behavior in lzma_mf_hc4_find and
correctly handle wrapped positions; update only the expressions that compute
delta2 and delta3 (referencing delta2, delta3, pos, and the hash.offset(...)
calls).
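The wraparound case can be seen with two hypothetical values (illustrative, not the encoder's real state): once `pos` has wrapped past `u32::MAX`, a stored hash entry can numerically exceed it, so plain `-` would panic in a debug build while `wrapping_sub` yields the intended modular distance.

```rust
fn main() {
    let pos: u32 = 5;               // position shortly after wraparound
    let stored: u32 = u32::MAX - 2; // older hash entry from before the wrap
    // Plain `pos - stored` would overflow-panic in debug builds;
    // wrapping_sub computes the distance modulo 2^32.
    let delta = pos.wrapping_sub(stored);
    assert_eq!(delta, 8);
    println!("delta = {delta}");
}
```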
xz/src/check/crc32_fast.rs (1)

357-373: ⚠️ Potential issue | 🟠 Major

Make AArch64 helper loads explicitly little-endian.

read_unaligned reads native-endian integer values. These *_le helpers should convert with from_le(...) to preserve correctness on big-endian AArch64 targets.

Suggested fix
 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read16le(buf: *const u8) -> u16 {
-    core::ptr::read_unaligned(buf as *const u16)
+    u16::from_le(core::ptr::read_unaligned(buf as *const u16))
 }

 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read32le(buf: *const u8) -> u32 {
-    core::ptr::read_unaligned(buf as *const u32)
+    u32::from_le(core::ptr::read_unaligned(buf as *const u32))
 }

 #[cfg(target_arch = "aarch64")]
 #[inline(always)]
 unsafe fn aligned_read64le(buf: *const u8) -> u64 {
-    core::ptr::read_unaligned(buf as *const u64)
+    u64::from_le(core::ptr::read_unaligned(buf as *const u64))
 }

Run this read-only check to validate the current helper implementation:

#!/bin/bash
set -euo pipefail
sed -n '350,380p' xz/src/check/crc32_fast.rs
rg -n "aligned_read(16|32|64)le|read_unaligned|from_le" xz/src/check/crc32_fast.rs

Expected result: aligned_read*le uses read_unaligned without from_le.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/check/crc32_fast.rs` around lines 357 - 373, The AArch64 helper
functions aligned_read16le, aligned_read32le, and aligned_read64le return
native-endian values from core::ptr::read_unaligned and must be converted to
little-endian; update each function to call from_le on the read_unaligned result
(e.g., u16::from_le(core::ptr::read_unaligned(...))) so the helpers preserve
little-endian semantics on big-endian AArch64 targets while keeping the same
unsafe, #[inline(always)] signatures.
xz/src/common/easy_decoder_memusage.rs (1)

4-10: ⚠️ Potential issue | 🔴 Critical

This has the same assume_init_mut() UB pattern as encoder memusage.

lzma_easy_preset partially initializes lzma_options_easy; materializing a mutable reference to the whole struct is undefined behavior.

Suggested fix
 pub fn lzma_easy_decoder_memusage(preset: u32) -> u64 {
     let mut opt_easy = MaybeUninit::<lzma_options_easy>::uninit();
-    if unsafe { lzma_easy_preset(opt_easy.as_mut_ptr(), preset) } {
+    let opt_easy_ptr = opt_easy.as_mut_ptr();
+    if unsafe { lzma_easy_preset(opt_easy_ptr, preset) } {
         return UINT32_MAX as u64;
     }
-    let opt_easy = unsafe { opt_easy.assume_init_mut() };
     unsafe {
-        lzma_raw_decoder_memusage(::core::ptr::addr_of_mut!(opt_easy.filters) as *mut lzma_filter)
+        lzma_raw_decoder_memusage(::core::ptr::addr_of!((*opt_easy_ptr).filters) as *const lzma_filter)
     }
 }

Run this read-only check to confirm the same partial-init pattern:

#!/bin/bash
set -euo pipefail
sed -n '1,20p' xz/src/common/easy_preset.rs
sed -n '1,20p' xz/src/common/easy_decoder_memusage.rs
rg -n "MaybeUninit|assume_init_mut|filters\\[|opt_lzma" xz/src/common/easy_preset.rs xz/src/common/easy_decoder_memusage.rs

Expected result: partial initialization in lzma_easy_preset with assume_init_mut() still used in decoder memusage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/easy_decoder_memusage.rs` around lines 4 - 10, The code
currently calls assume_init_mut() on opt_easy (lzma_options_easy) after
lzma_easy_preset, which is UB because lzma_easy_preset only partially
initializes the struct; instead, keep opt_easy as MaybeUninit, avoid creating a
mutable reference to the whole struct, and pass field pointers derived from the
raw uninitialized pointer when calling lzma_raw_decoder_memusage: use
opt_easy.as_mut_ptr() and compute the address of the filters field via
ptr::addr_of_mut!((*opt_easy.as_mut_ptr()).filters) (or equivalent raw-pointer
arithmetic) to call lzma_raw_decoder_memusage, and only call
assume_init()/assume_init_mut() after the struct is fully initialized by
lzma_easy_preset; replace assume_init_mut() usage with this raw-pointer approach
around lzma_raw_decoder_memusage to eliminate the partial-init UB.
xz/src/common/easy_encoder_memusage.rs (1)

4-10: ⚠️ Potential issue | 🔴 Critical

Avoid assume_init_mut() on partially initialized lzma_options_easy.

lzma_easy_preset does not fully initialize the whole struct, so creating a mutable reference with assume_init_mut() is undefined behavior.

Suggested fix
 pub fn lzma_easy_encoder_memusage(preset: u32) -> u64 {
     let mut opt_easy = MaybeUninit::<lzma_options_easy>::uninit();
-    if unsafe { lzma_easy_preset(opt_easy.as_mut_ptr(), preset) } {
+    let opt_easy_ptr = opt_easy.as_mut_ptr();
+    if unsafe { lzma_easy_preset(opt_easy_ptr, preset) } {
         return UINT32_MAX as u64;
     }
-    let opt_easy = unsafe { opt_easy.assume_init_mut() };
     unsafe {
-        lzma_raw_encoder_memusage(::core::ptr::addr_of_mut!(opt_easy.filters) as *mut lzma_filter)
+        lzma_raw_encoder_memusage(::core::ptr::addr_of!((*opt_easy_ptr).filters) as *const lzma_filter)
     }
 }

Run this read-only check to confirm the partial initialization + assume_init_mut pattern:

#!/bin/bash
set -euo pipefail
sed -n '1,20p' xz/src/common/easy_preset.rs
sed -n '1,20p' xz/src/common/easy_encoder_memusage.rs
rg -n "MaybeUninit|assume_init_mut|filters\\[|opt_lzma" xz/src/common/easy_preset.rs xz/src/common/easy_encoder_memusage.rs

Expected result: lzma_easy_preset initializes only opt_lzma, filters[0], and filters[1].id, while easy_encoder_memusage still calls assume_init_mut().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/easy_encoder_memusage.rs` around lines 4 - 10, The code
currently calls assume_init_mut() on opt_easy (MaybeUninit<lzma_options_easy>)
which is UB because lzma_easy_preset only partially initializes the struct;
instead avoid creating a &mut lzma_options_easy reference—keep opt_easy as
MaybeUninit, get a raw pointer to the initialized filters field and pass that to
lzma_raw_encoder_memusage. Concretely, after
lzma_easy_preset(opt_easy.as_mut_ptr(), preset) succeeds, use unsafe {
core::ptr::addr_of_mut!((*opt_easy.as_mut_ptr()).filters) as *mut lzma_filter }
when calling lzma_raw_encoder_memusage, and do not call
assume_init_mut()/assume_init() on opt_easy anywhere in easy_encoder_memusage.
xz/src/common/easy_buffer_encoder.rs (1)

14-20: ⚠️ Potential issue | 🔴 Critical

Unsound assume_init_mut() on partially initialized struct.

lzma_easy_preset() only initializes opt_lzma, filters[0], and filters[1]. The remaining filters array elements stay uninitialized. Calling assume_init_mut() creates a mutable reference to this partially initialized struct, which is undefined behavior in Rust.

Use raw pointers throughout to avoid creating a reference to uninitialized memory.

Proposed fix using raw pointers
     let mut opt_easy = MaybeUninit::<lzma_options_easy>::uninit();
-    if lzma_easy_preset(opt_easy.as_mut_ptr(), preset) {
+    let opt_easy_ptr = opt_easy.as_mut_ptr();
+    if lzma_easy_preset(opt_easy_ptr, preset) {
         return LZMA_OPTIONS_ERROR;
     }
-    let opt_easy = opt_easy.assume_init_mut();
     lzma_stream_buffer_encode(
-        ::core::ptr::addr_of_mut!(opt_easy.filters) as *mut lzma_filter,
+        ::core::ptr::addr_of_mut!((*opt_easy_ptr).filters) as *mut lzma_filter,
         check,
         allocator,
         in_0,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/easy_buffer_encoder.rs` around lines 14 - 20, The code unsafely
calls opt_easy.assume_init_mut() creating a Rust reference to a
partially-initialized lzma_options_easy (lzma_easy_preset only initializes some
fields/filters); instead, avoid assume_init_mut() and use raw pointers from
opt_easy.as_mut_ptr() when calling lzma_stream_buffer_encode so you never form a
reference to uninitialized memory — e.g., obtain a *mut lzma_options_easy via
opt_easy.as_mut_ptr(), compute the address of its filters field from that raw
pointer, pass that raw pointer to lzma_stream_buffer_encode, and only call
assume_init/assume_init_mut() (or transmute to initialized value) after the
entire struct is guaranteed initialized.
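The fix pattern above recurs across the easy_* comments in this review. A self-contained sketch of the field-pointer idiom, using a hypothetical `Options` struct as a stand-in for `lzma_options_easy` (the names and layout are illustrative, not the real ABI):

```rust
use core::mem::MaybeUninit;

// Hypothetical stand-in for lzma_options_easy: the "preset" step below
// initializes only `filters`, leaving `opts` uninitialized.
#[repr(C)]
struct Options {
    filters: [u32; 2],
    opts: [u8; 64],
}

fn sum_filters() -> u32 {
    let mut opt = MaybeUninit::<Options>::uninit();
    let p = opt.as_mut_ptr();
    unsafe {
        // Partial initialization, analogous to lzma_easy_preset.
        core::ptr::addr_of_mut!((*p).filters).write([7, 9]);
        // Read the initialized field through a raw field pointer instead of
        // calling assume_init_mut(), which would create a reference to the
        // still-uninitialized `opts` bytes.
        let filters = core::ptr::addr_of!((*p).filters) as *const u32;
        *filters.add(0) + *filters.add(1)
    }
}
```

The same shape applies to each of the easy_* call sites flagged above: keep the value as `MaybeUninit`, and only ever project the initialized field out of the raw pointer.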
scripts/inspect_codegen.sh (1)

38-65: ⚠️ Potential issue | 🟡 Minor

Guard option values before accessing $2.

With set -u enabled, calling the script with a flag but no value (e.g., scripts/inspect_codegen.sh foo --package) causes an unbound-variable error instead of a clear CLI error message. Each flag that consumes a value should validate $# first.

🛡️ Proposed fix
 while [[ $# -gt 0 ]]; do
   case "$1" in
     --package)
+      [[ $# -ge 2 ]] || { echo "--package requires a value" >&2; exit 2; }
       PACKAGE="$2"
       shift 2
       ;;
     --format)
+      [[ $# -ge 2 ]] || { echo "--format requires a value" >&2; exit 2; }
       FORMAT="$2"
       shift 2
       ;;
     --features)
+      [[ $# -ge 2 ]] || { echo "--features requires a value" >&2; exit 2; }
       FEATURES="$2"
       shift 2
       ;;
     --target-dir)
+      [[ $# -ge 2 ]] || { echo "--target-dir requires a value" >&2; exit 2; }
       TARGET_DIR="$2"
       shift 2
       ;;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/inspect_codegen.sh` around lines 38 - 65, The argument-parsing loop
in scripts/inspect_codegen.sh accesses $2 for flags (--package, --format,
--features, --target-dir) without checking there is a next argument, which fails
under set -u; update the case branches for PACKAGE, FORMAT, FEATURES and
TARGET_DIR to first verify that $# -ge 2 (or that "$2" is non-empty) and if not
print a clear error (or call print_usage) and exit with a non-zero code, before
assigning PACKAGE="$2" (and analogous assignments) and shifting; keep the
unknown-argument behavior as-is.
xz/src/common/easy_encoder.rs (1)

9-18: ⚠️ Potential issue | 🔴 Critical

Undefined behavior: assume_init_mut() on partially initialized struct.

lzma_easy_preset() only initializes opt_lzma, filters[0], and filters[1].id. The remaining fields in lzma_options_easy (including filters[2..4] and filters[1].options) stay uninitialized. Calling assume_init_mut() on line 13 creates a reference to this partially initialized memory, which is undefined behavior.

Access the initialized filters field through the raw pointer instead:

🐛 Proposed fix
 let mut opt_easy = MaybeUninit::<lzma_options_easy>::uninit();
-if lzma_easy_preset(opt_easy.as_mut_ptr(), preset) {
+let opt_easy_ptr = opt_easy.as_mut_ptr();
+if lzma_easy_preset(opt_easy_ptr, preset) {
     return LZMA_OPTIONS_ERROR;
 }
-let opt_easy = opt_easy.assume_init_mut();
 lzma_stream_encoder(
     strm,
-    ::core::ptr::addr_of_mut!(opt_easy.filters) as *mut lzma_filter,
+    ::core::ptr::addr_of_mut!((*opt_easy_ptr).filters) as *mut lzma_filter,
     check,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/easy_encoder.rs` around lines 9 - 18, The code unsafely calls
assume_init_mut() on opt_easy after lzma_easy_preset, but lzma_easy_preset only
partially initializes lzma_options_easy (making assume_init_mut UB); instead
keep opt_easy as MaybeUninit, pass opt_easy.as_mut_ptr() into lzma_easy_preset,
then use the raw pointer returned by opt_easy.as_mut_ptr() to access only the
initialized filters field (e.g. compute a pointer to
(*opt_easy.as_mut_ptr()).filters and pass that pointer to lzma_stream_encoder),
avoiding creating a Rust reference to the whole uninitialized struct; update
calls involving opt_easy (assume_init_mut, filters) to use the raw pointer
access pattern and then call lzma_stream_encoder with that filters pointer.
examples/standard_files_probe.rs (1)

177-177: ⚠️ Potential issue | 🟡 Minor

Avoid panic on non-UTF8 filenames.

into_string().unwrap() can panic on valid non-UTF8 filesystem entries.

Suggested fix
-        let filename = entry.file_name().into_string().unwrap();
+        let filename = entry.file_name().to_string_lossy().into_owned();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` at line 177,
entry.file_name().into_string().unwrap() can panic for non-UTF8 filenames;
replace the unwrap with a safe conversion using
entry.file_name().to_string_lossy() and store the owned String (or explicitly
handle the OsString/OsStr case and skip or log non-UTF8 names) so the code never
panics on valid non-UTF8 filesystem entries; update the variable assignment for
filename and any downstream use in this function to accept the lossy String or
the handled OsStr path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/standard_files_probe.rs`:
- Around line 54-84: The code currently proceeds to run and print measurements
when the loaded corpus `cases` is empty; add a fail-fast check after
loading/printing case info to treat zero selected cases as a configuration/input
error: if `cases.is_empty()` then write an explanatory message to stderr (e.g.
via `eprintln!`) and call `std::process::exit(1)` so the program doesn't
continue to call `measure`/`run_cases`; insert this check before the
`measure(total_bytes, ...)` call (and before `run_cases`/`print_measurement`) to
stop execution early.
- Around line 12-18: The current cfgs for BACKEND_NAME allow both backends or
none; add compile-time guards to reject the "both" and "no backend"
configurations by using cfg attributes and compile_error! so compilation fails
in those cases. Locate the BACKEND_NAME declarations (the const BACKEND_NAME:
&str and its cfg attributes) and add two mutually exclusive compile_error!
branches: one that triggers when both features feature = "rust-backend" and
feature = "c-backend" are set, and another that triggers when neither feature is
set, leaving only the existing single-backend cfg branches for "rust" and "c" to
define BACKEND_NAME.

---

Outside diff comments:
In `@xz/src/common/stream_decoder.rs`:
- Around line 238-267: The early return after lzma_block_header_decode leaves
the stack-backed filters buffer referenced by (*coder).block_options.filters and
can leak allocated filter options; modify the error path where ret_ != LZMA_OK
to call lzma_filters_free(filters_ptr, allocator) and then set
(*coder).block_options.filters = core::ptr::null_mut() before returning ret_ so
any partially-allocated filter state is freed and the pointer is not left
dangling (refer to filters, filters_ptr, (*coder).block_options,
lzma_block_header_decode, and lzma_filters_free).

---

Duplicate comments:
In `@examples/standard_files_probe.rs`:
- Line 177: entry.file_name().into_string().unwrap() can panic for non-UTF8
filenames; replace the unwrap with a safe conversion using
entry.file_name().to_string_lossy() and store the owned String (or explicitly
handle the OsString/OsStr case and skip or log non-UTF8 names) so the code never
panics on valid non-UTF8 filesystem entries; update the variable assignment for
filename and any downstream use in this function to accept the lossy String or
the handled OsStr path.

In `@scripts/inspect_codegen.sh`:
- Around line 38-65: The argument-parsing loop in scripts/inspect_codegen.sh
accesses $2 for flags (--package, --format, --features, --target-dir) without
checking there is a next argument, which fails under set -u; update the case
branches for PACKAGE, FORMAT, FEATURES and TARGET_DIR to first verify that $#
-ge 2 (or that "$2" is non-empty) and if not print a clear error (or call
print_usage) and exit with a non-zero code, before assigning PACKAGE="$2" (and
analogous assignments) and shifting; keep the unknown-argument behavior as-is.

In `@xz/src/check/crc32_fast.rs`:
- Around line 357-373: The AArch64 helper functions aligned_read16le,
aligned_read32le, and aligned_read64le return native-endian values from
core::ptr::read_unaligned and must be converted to little-endian; update each
function to call from_le on the read_unaligned result (e.g.,
u16::from_le(core::ptr::read_unaligned(...))) so the helpers preserve
little-endian semantics on big-endian AArch64 targets while keeping the same
unsafe, #[inline(always)] signatures.

In `@xz/src/common/common.rs`:
- Around line 123-126: The debug_assert! checks in lzma_bufcpy are insufficient
for release; replace them with a runtime guard that checks the positions and
pointers (in_0, out, *in_pos, *out_pos, in_size, out_size) and returns early
with an error/appropriate status if *in_pos > in_size || *out_pos > out_size or
if required pointers are null, before performing the subtractions (in_size -
*in_pos, out_size - *out_pos) and the copy_nonoverlapping call; ensure the check
occurs at the start of lzma_bufcpy so underflow/out-of-bounds copies cannot
happen in release builds while preserving the existing debug_asserts if you want
extra debug diagnostics.

In `@xz/src/common/easy_buffer_encoder.rs`:
- Around line 14-20: The code unsafely calls opt_easy.assume_init_mut() creating
a Rust reference to a partially-initialized lzma_options_easy (lzma_easy_preset
only initializes some fields/filters); instead, avoid assume_init_mut() and use
raw pointers from opt_easy.as_mut_ptr() when calling lzma_stream_buffer_encode
so you never form a reference to uninitialized memory — e.g., obtain a *mut
lzma_options_easy via opt_easy.as_mut_ptr(), compute the address of its filters
field from that raw pointer, pass that raw pointer to lzma_stream_buffer_encode,
and only call assume_init/assume_init_mut() (or transmute to initialized value)
after the entire struct is guaranteed initialized.

In `@xz/src/common/easy_decoder_memusage.rs`:
- Around line 4-10: The code currently calls assume_init_mut() on opt_easy
(lzma_options_easy) after lzma_easy_preset, which is UB because lzma_easy_preset
only partially initializes the struct; instead, keep opt_easy as MaybeUninit,
avoid creating a mutable reference to the whole struct, and pass field pointers
derived from the raw uninitialized pointer when calling
lzma_raw_decoder_memusage: use opt_easy.as_mut_ptr() and compute the address of
the filters field via ptr::addr_of_mut!((*opt_easy.as_mut_ptr()).filters) (or
equivalent raw-pointer arithmetic) to call lzma_raw_decoder_memusage, and only
call assume_init()/assume_init_mut() after the struct is fully initialized by
lzma_easy_preset; replace assume_init_mut() usage with this raw-pointer approach
around lzma_raw_decoder_memusage to eliminate the partial-init UB.

In `@xz/src/common/easy_encoder_memusage.rs`:
- Around line 4-10: The code currently calls assume_init_mut() on opt_easy
(MaybeUninit<lzma_options_easy>) which is UB because lzma_easy_preset only
partially initializes the struct; instead avoid creating a &mut
lzma_options_easy reference—keep opt_easy as MaybeUninit, get a raw pointer to
the initialized filters field and pass that to lzma_raw_encoder_memusage.
Concretely, after lzma_easy_preset(opt_easy.as_mut_ptr(), preset) succeeds, use
unsafe { core::ptr::addr_of_mut!((*opt_easy.as_mut_ptr()).filters) as *mut
lzma_filter } when calling lzma_raw_encoder_memusage, and do not call
assume_init_mut()/assume_init() on opt_easy anywhere in easy_encoder_memusage.

In `@xz/src/common/easy_encoder.rs`:
- Around line 9-18: The code unsafely calls assume_init_mut() on opt_easy after
lzma_easy_preset, but lzma_easy_preset only partially initializes
lzma_options_easy (making assume_init_mut UB); instead keep opt_easy as
MaybeUninit, pass opt_easy.as_mut_ptr() into lzma_easy_preset, then use the raw
pointer returned by opt_easy.as_mut_ptr() to access only the initialized filters
field (e.g. compute a pointer to (*opt_easy.as_mut_ptr()).filters and pass that
pointer to lzma_stream_encoder), avoiding creating a Rust reference to the whole
uninitialized struct; update calls involving opt_easy (assume_init_mut, filters)
to use the raw pointer access pattern and then call lzma_stream_encoder with
that filters pointer.

In `@xz/src/lz/lz_encoder_mf.rs`:
- Around line 616-628: The local caching introduced in lz_encoder_mf (variables
hash, son, cyclic_pos, cyclic_size, depth) is fine but delta calculations use
plain subtraction which can underflow when pos wraps; change the computations of
delta2 and delta3 (currently using pos - *hash.offset(...)) to use wrapping_sub
on pos (e.g., pos.wrapping_sub(...)) so they match the behavior in
lzma_mf_hc4_find and correctly handle wrapped positions; update only the
expressions that compute delta2 and delta3 (referencing delta2, delta3, pos, and
the hash.offset(...) calls).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed3d0cbd-a0c5-4aea-ab45-2aeba3c0e983

📥 Commits

Reviewing files that changed from the base of the PR and between 2891602 and 60dd817.

📒 Files selected for processing (108)
  • Cargo.lock.msrv
  • Cargo.toml
  • docs/performance-workflow.md
  • examples/bufread_trailing_probe.rs
  • examples/qc_probe.rs
  • examples/standard_files_probe.rs
  • perf-probe/Cargo.toml
  • perf-probe/src/main.rs
  • scripts/compare_api_workloads.sh
  • scripts/inspect_codegen.sh
  • systest/Cargo.toml
  • systest/build.rs
  • systest/src/main.rs
  • tests/sys_equivalence.rs
  • xz-sys/Cargo.toml
  • xz-sys/src/lib.rs
  • xz/Cargo.toml
  • xz/src/alloc.rs
  • xz/src/check/check.rs
  • xz/src/check/crc32_fast.rs
  • xz/src/check/crc64_fast.rs
  • xz/src/check/mod.rs
  • xz/src/check/sha256.rs
  • xz/src/common/alone_decoder.rs
  • xz/src/common/alone_encoder.rs
  • xz/src/common/auto_decoder.rs
  • xz/src/common/block_buffer_decoder.rs
  • xz/src/common/block_buffer_encoder.rs
  • xz/src/common/block_decoder.rs
  • xz/src/common/block_encoder.rs
  • xz/src/common/block_header_decoder.rs
  • xz/src/common/block_header_encoder.rs
  • xz/src/common/block_util.rs
  • xz/src/common/common.rs
  • xz/src/common/easy_buffer_encoder.rs
  • xz/src/common/easy_decoder_memusage.rs
  • xz/src/common/easy_encoder.rs
  • xz/src/common/easy_encoder_memusage.rs
  • xz/src/common/easy_preset.rs
  • xz/src/common/file_info.rs
  • xz/src/common/filter_buffer_decoder.rs
  • xz/src/common/filter_buffer_encoder.rs
  • xz/src/common/filter_common.rs
  • xz/src/common/filter_decoder.rs
  • xz/src/common/filter_encoder.rs
  • xz/src/common/filter_flags_decoder.rs
  • xz/src/common/filter_flags_encoder.rs
  • xz/src/common/hardware_cputhreads.rs
  • xz/src/common/hardware_physmem.rs
  • xz/src/common/index.rs
  • xz/src/common/index_decoder.rs
  • xz/src/common/index_encoder.rs
  • xz/src/common/index_hash.rs
  • xz/src/common/lzip_decoder.rs
  • xz/src/common/microlzma_decoder.rs
  • xz/src/common/microlzma_encoder.rs
  • xz/src/common/mod.rs
  • xz/src/common/outqueue.rs
  • xz/src/common/stream_buffer_decoder.rs
  • xz/src/common/stream_buffer_encoder.rs
  • xz/src/common/stream_decoder.rs
  • xz/src/common/stream_decoder_mt.rs
  • xz/src/common/stream_encoder.rs
  • xz/src/common/stream_encoder_mt.rs
  • xz/src/common/stream_flags_common.rs
  • xz/src/common/stream_flags_decoder.rs
  • xz/src/common/stream_flags_encoder.rs
  • xz/src/common/stream_mt.rs
  • xz/src/common/string_conversion.rs
  • xz/src/common/vli_decoder.rs
  • xz/src/common/vli_encoder.rs
  • xz/src/common/vli_size.rs
  • xz/src/delta/delta_common.rs
  • xz/src/delta/delta_decoder.rs
  • xz/src/delta/delta_encoder.rs
  • xz/src/delta/mod.rs
  • xz/src/lib.rs
  • xz/src/lz/lz_decoder.rs
  • xz/src/lz/lz_encoder.rs
  • xz/src/lz/lz_encoder_mf.rs
  • xz/src/lz/mod.rs
  • xz/src/lzma/fastpos_table.rs
  • xz/src/lzma/lzma2_decoder.rs
  • xz/src/lzma/lzma2_encoder.rs
  • xz/src/lzma/lzma_decoder.rs
  • xz/src/lzma/lzma_encoder.rs
  • xz/src/lzma/lzma_encoder_optimum_fast.rs
  • xz/src/lzma/lzma_encoder_optimum_normal.rs
  • xz/src/lzma/lzma_encoder_presets.rs
  • xz/src/lzma/mod.rs
  • xz/src/rangecoder/mod.rs
  • xz/src/rangecoder/price_table.rs
  • xz/src/simple/arm.rs
  • xz/src/simple/arm64.rs
  • xz/src/simple/armthumb.rs
  • xz/src/simple/ia64.rs
  • xz/src/simple/mod.rs
  • xz/src/simple/powerpc.rs
  • xz/src/simple/riscv.rs
  • xz/src/simple/simple_coder.rs
  • xz/src/simple/simple_decoder.rs
  • xz/src/simple/simple_encoder.rs
  • xz/src/simple/sparc.rs
  • xz/src/simple/x86.rs
  • xz/src/tuklib/mod.rs
  • xz/src/tuklib/tuklib_cpucores.rs
  • xz/src/tuklib/tuklib_physmem.rs
  • xz/src/types.rs

Comment thread examples/standard_files_probe.rs Outdated
Comment on lines +12 to +18
#[cfg(all(feature = "rust-backend", not(feature = "c-backend")))]
const BACKEND_NAME: &str = "rust";
#[cfg(all(feature = "c-backend", not(feature = "rust-backend")))]
const BACKEND_NAME: &str = "c";
#[cfg(all(feature = "rust-backend", feature = "c-backend"))]
const BACKEND_NAME: &str = "both";


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that dual-backend build is currently allowed in this file (and not compile-blocked).
rg -n 'cfg\(all\(feature = "rust-backend", feature = "c-backend"\)\)|compile_error!' examples/standard_files_probe.rs -C2

Repository: youknowone/xz-rs

Length of output: 269


Disallow dual-backend and no-backend builds for this probe.

The probe currently allows compilation with both backend features enabled ("both"), which violates the constraint that C and Rust backends cannot be compared in the same process due to conflicting symbol exports. Block this configuration and the no-backend case at compile time.

Suggested fix
 #[cfg(all(feature = "rust-backend", not(feature = "c-backend")))]
 const BACKEND_NAME: &str = "rust";
 #[cfg(all(feature = "c-backend", not(feature = "rust-backend")))]
 const BACKEND_NAME: &str = "c";
-#[cfg(all(feature = "rust-backend", feature = "c-backend"))]
-const BACKEND_NAME: &str = "both";
+#[cfg(all(feature = "rust-backend", feature = "c-backend"))]
+compile_error!("standard_files_probe must be built with exactly one backend feature enabled");
+#[cfg(not(any(feature = "rust-backend", feature = "c-backend")))]
+compile_error!("enable one backend feature: rust-backend or c-backend");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[cfg(all(feature = "rust-backend", not(feature = "c-backend")))]
const BACKEND_NAME: &str = "rust";
#[cfg(all(feature = "c-backend", not(feature = "rust-backend")))]
const BACKEND_NAME: &str = "c";
#[cfg(all(feature = "rust-backend", feature = "c-backend"))]
const BACKEND_NAME: &str = "both";
#[cfg(all(feature = "rust-backend", not(feature = "c-backend")))]
const BACKEND_NAME: &str = "rust";
#[cfg(all(feature = "c-backend", not(feature = "rust-backend")))]
const BACKEND_NAME: &str = "c";
#[cfg(all(feature = "rust-backend", feature = "c-backend"))]
compile_error!("standard_files_probe must be built with exactly one backend feature enabled");
#[cfg(not(any(feature = "rust-backend", feature = "c-backend")))]
compile_error!("enable one backend feature: rust-backend or c-backend");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 12 - 18, The current cfgs for
BACKEND_NAME allow both backends or none; add compile-time guards to reject the
"both" and "no backend" configurations by using cfg attributes and
compile_error! so compilation fails in those cases. Locate the BACKEND_NAME
declarations (the const BACKEND_NAME: &str and its cfg attributes) and add two
mutually exclusive compile_error! branches: one that triggers when both features
feature = "rust-backend" and feature = "c-backend" are set, and another that
triggers when neither feature is set, leaving only the existing single-backend
cfg branches for "rust" and "c" to define BACKEND_NAME.

Comment on lines +54 to +84
let cases = load_cases(&config).unwrap_or_else(|err| {
eprintln!("failed to load corpus: {err}");
std::process::exit(1);
});
let total_bytes: usize = cases.iter().map(|case| case.data.len()).sum();

println!(
"config: backend={} mode={:?} cases={} bytes={} iters={} warmup={}",
BACKEND_NAME,
config.mode,
cases.len(),
total_bytes,
config.iters,
config.warmup
);
if let Some(pattern) = &config.name_pattern {
println!("name_pattern: {pattern}");
}
if !cases.is_empty() {
let case_names = cases
.iter()
.map(|case| case.name.as_str())
.collect::<Vec<_>>()
.join(",");
println!("cases: {case_names}");
}

let measurement = measure(total_bytes, config.iters, config.warmup, || {
run_cases(black_box(&cases))
});
print_measurement(&measurement, config.iters);

⚠️ Potential issue | 🟠 Major

Fail fast when no corpus cases are selected.

If filters produce zero cases, the probe still exits successfully and emits benchmark numbers that are not meaningful. Treat this as an input/config error.

Suggested fix
     let cases = load_cases(&config).unwrap_or_else(|err| {
         eprintln!("failed to load corpus: {err}");
         std::process::exit(1);
     });
+    if cases.is_empty() {
+        eprintln!("no .xz corpus cases selected (check --corpus-dir / --mode / --name-pattern)");
+        std::process::exit(2);
+    }
     let total_bytes: usize = cases.iter().map(|case| case.data.len()).sum();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 54 - 84, The code currently
proceeds to run and print measurements when the loaded corpus `cases` is empty;
add a fail-fast check after loading/printing case info to treat zero selected
cases as a configuration/input error: if `cases.is_empty()` then write an
explanatory message to stderr (e.g. via `eprintln!`) and call
`std::process::exit(1)` so the program doesn't continue to call
`measure`/`run_cases`; insert this check before the `measure(total_bytes, ...)`
call (and before `run_cases`/`print_measurement`) to stop execution early.

- Replace the `rust-backend`/`c-backend` Cargo features with
  `xz-sys`/`liblzma-sys`, matching the underlying sys crate names.
  Default feature is `xz-sys`. CI, examples, probes, scripts,
  docs, benches, and `perf-probe` are updated to the new names.
- Alias the optional dependencies as `xz-sys-rs` and `liblzma-sys-c`
  in the workspace so both backends can resolve simultaneously.
- Add `lzma_*_rs` `#[inline(always)]` wrappers in `xz-sys` and route
  `src/lib.rs` and `src/stream.rs` through them when the `xz-sys`
  feature is on, bypassing the `extern "C"` thunks.
- Tighten visibility of internal coder/init helpers in the `xz`
  crate (many `pub` items become `pub(crate)`); drop the spurious
  `extern "C"` from `sha256::rotr_32`.
- Speed up `crc64_fast::lzma_crc64_generic` with an unrolled
  8-byte step, and reimplement `types::read32le`/`write32le` with
  `core::ptr::read_unaligned`/`write_unaligned`.
- Add a backend selection section to README.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
xz/src/common/block_decoder.rs (1)

165-278: 🧹 Nitpick | 🔵 Trivial

Run perf gates before merge to validate tuning impact.

Given this PR’s performance/tuning scope, please run the documented backend performance workflow and deterministic backend comparison gate before merge.

Based on learnings: Use the dedicated backend-performance workflow documented in docs/performance-workflow.md, and treat scripts/compare_backends.sh as the coarse deterministic regression gate for regression testing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/block_decoder.rs` around lines 165 - 278, The PR changes around
lzma_block_decoder_init (and related symbols block_decode, block_decoder_end,
lzma_raw_decoder_init) affect performance-sensitive decoder paths; before
merging, run the documented backend performance workflow
(docs/performance-workflow.md) and the deterministic regression gate script
scripts/compare_backends.sh to validate tuning impact and ensure no regressions;
if regressions appear, iterate on the changes to lzma_block_decoder_init (or
revert specific tuning) and repeat the workflows until comparisons are clean.
benches/backend_comparison.rs (1)

55-68: ⚠️ Potential issue | 🟠 Major

Fail fast on encode/decode errors before recording timings.

Both helpers ignore the return value from lzma_easy_buffer_encode / lzma_stream_buffer_decode. If one backend starts returning an error, the bench will time truncated output instead of surfacing the failure. Please check for LZMA_OK, and in backend_decode also assert that all input was consumed and out_pos == out_size.

Also applies to: 73-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/backend_comparison.rs` around lines 55 - 68, The helpers
backend_encode and backend_decode currently ignore the return values from
lzma_easy_buffer_encode and lzma_stream_buffer_decode; update both to check the
return code against LZMA_OK and fail fast (panic or return) before recording
timings if it is not OK. In backend_decode additionally assert that the decoder
consumed all input (input position/consumed count from
lzma_stream_buffer_decode) and that out_pos == out_size after decoding; if these
conditions are not met, fail immediately so the benchmark does not time
truncated output. Use the existing symbols (lzma_easy_buffer_encode,
lzma_stream_buffer_decode, LZMA_OK, out_pos, out_size, etc.) to locate where to
add the checks and error handling.
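A minimal, self-contained sketch of the fail-fast shape suggested here, using a stand-in encode function (the real helpers would call `lzma_easy_buffer_encode` / `lzma_stream_buffer_decode`; the status value and signature below are illustrative):

```rust
// Stand-in status code: 0 plays the role of LZMA_OK here.
fn fake_buffer_encode(input: &[u8], out: &mut Vec<u8>) -> i32 {
    out.extend_from_slice(input); // placeholder for real compression
    0
}

fn backend_encode_checked(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let ret = fake_buffer_encode(input, &mut out);
    // Surface backend failures before any timing is recorded, instead of
    // silently benchmarking truncated output.
    assert_eq!(ret, 0, "encode failed with status {ret}");
    out
}
```

The decode-side helper would add the two extra assertions named above (all input consumed, `out_pos == out_size`) in the same spot, before the measurement loop sees the result.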
♻️ Duplicate comments (9)
xz/src/lz/lz_encoder_mf.rs (1)

638-639: ⚠️ Potential issue | 🟠 Major

Use wrapping_sub for the cached hash deltas.

These positions are u32 values that can wrap during long-running compression. Plain subtraction here can overflow under overflow checks and diverges from the wraparound behavior already used by lzma_mf_hc4_find in this file.

🐛 Proposed fix
-    let mut delta2: u32 = pos - *hash.offset(hash_2_value as isize);
-    let delta3: u32 = pos - *hash.offset(hash_3_index as isize);
+    let mut delta2: u32 = pos.wrapping_sub(*hash.offset(hash_2_value as isize));
+    let delta3: u32 = pos.wrapping_sub(*hash.offset(hash_3_index as isize));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/lz/lz_encoder_mf.rs` around lines 638 - 639, The subtraction of cached
hash deltas can overflow; change both computations to use wrapping subtraction:
compute delta2 and delta3 as pos.wrapping_sub(...). Specifically update the
lines that set delta2 (using hash and hash_2_value) and delta3 (using hash and
hash_3_index) to call wrapping_sub on pos so the u32 wraparound semantics match
lzma_mf_hc4_find.
xz/src/common/common.rs (1)

123-126: ⚠️ Potential issue | 🔴 Critical

Reintroduce runtime bounds guards before size arithmetic in lzma_bufcpy.

Line 123–126 only enforce invariants in debug builds. In release, invalid *in_pos/*out_pos can underflow in_size - *in_pos / out_size - *out_pos and lead to unsafe out-of-bounds copy lengths.

Suggested fix
 pub unsafe fn lzma_bufcpy(
     in_0: *const u8,
     in_pos: *mut size_t,
     in_size: size_t,
     out: *mut u8,
     out_pos: *mut size_t,
     out_size: size_t,
 ) -> size_t {
+    if *in_pos > in_size || *out_pos > out_size {
+        return 0;
+    }
+    if (*in_pos < in_size && in_0.is_null()) || (*out_pos < out_size && out.is_null()) {
+        return 0;
+    }
     debug_assert!(!in_0.is_null() || *in_pos == in_size);
     debug_assert!(!out.is_null() || *out_pos == out_size);
     debug_assert!(*in_pos <= in_size);
     debug_assert!(*out_pos <= out_size);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/common.rs` around lines 123 - 126, The debug_asserts in
lzma_bufcpy (the debug_assert! lines validating in_pos/out_pos vs
in_size/out_size) only run in debug builds; add explicit runtime guards at the
start of lzma_bufcpy that check *in_pos <= in_size and *out_pos <= out_size and
that pointer/null combos (in_0/out) match the position invariants, and return an
Err or clamp/limit the copy length if they fail before performing any arithmetic
like in_size - *in_pos or out_size - *out_pos; ensure these checks reference the
same symbols (in_0, out, *in_pos, *out_pos, in_size, out_size) so unsafe length
calculations cannot underflow in release builds.
scripts/compare_backends.sh (1)

32-45: ⚠️ Potential issue | 🟡 Minor

Document the implemented --help|-h flag.

The option is handled on Lines 32-46, but the usage block still doesn't mention it.

Proposed fix
 Options:
   --runs <n>          Number of measured runs per command (default: 5)
   --warmup <n>        Warmup runs per command (default: 1)
   --root-repeats <n>  Repeat the root test-binary bundle inside each timed command
   --results-dir <dir> Where to write hyperfine reports
   --skip-systest      Only compare the root crate test suite
+  --help, -h          Show this help message
 EOF
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_backends.sh` around lines 32 - 45, The usage block printed by
the '--help|-h)' case does not document the short and long help flags; update
the heredoc under the '--help|-h)' branch so the Options section includes an
entry for "--help, -h" (or "--help|-h") with a one-line description like "Show
this help message and exit", ensuring both forms are mentioned and aligned with
the existing option formatting in the Usage output used by the script.
scripts/compare_workloads.sh (1)

49-64: ⚠️ Potential issue | 🟠 Major

Normalize decode args before generating the fixture and timed commands.

Line 51 appends --iters 1 --warmup 0 after raw user args, so duplicate-flag precedence decides whether those overrides actually win. Then Lines 53-64 drop the caller's --iters / --warmup entirely for the timed decode runs. That makes the decode path behave differently from the documented CLI and can skew comparisons.

Based on learnings, use scripts/compare_workloads.sh for focused backend workload performance testing and keep workload shape, iteration count, and profiler command stable while iterating on a hotspot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_workloads.sh` around lines 49 - 64, The script currently
appends --iters 1 --warmup 0 to RAW_ARGS when producing the compressed fixture
but then constructs RUST_CMD and C_CMD from RAW_ARGS without normalizing those
flags, causing inconsistent decode runs; normalize by stripping any
user-supplied --iters/--warmup from RAW_ARGS and then append the canonical
iteration flags used for the fixture to both the encode invocation
(COMPRESSED_INPUT generation) and the decode invocations (RUST_CMD and C_CMD) so
COMPRESSED_INPUT, RUST_CMD, and C_CMD all run with the same deterministic
--iters/--warmup values; locate the construction points for COMPRESSED_INPUT,
the encode call that uses "$C_BIN" with RAW_ARGS, and the arrays RUST_CMD and
C_CMD to implement this normalization (remove duplicate flags from RAW_ARGS,
then add the chosen --iters/--warmup explicitly).
scripts/profile_backend.sh (1)

4-18: ⚠️ Potential issue | 🟡 Minor

Help text still lists "both" as a valid backend despite it being rejected.

Lines 6 and 10 mention "both" as a valid backend, but lines 43-46 explicitly reject it with an error. Remove "both" from the usage text and the example to avoid confusion.

📝 Proposed fix
-Usage: scripts/profile_backend.sh <c|liblzma-sys|xz|rust|xz-sys|both> <encode|decode|size|crc32|crc64> [backend_probe args...]
+Usage: scripts/profile_backend.sh <c|liblzma-sys|xz|rust|xz-sys> <encode|decode|size|crc32|crc64> [backend_probe args...]
 
 Examples:
   scripts/profile_backend.sh xz decode --size 1048576 --iters 500 --warmup 50
-  scripts/profile_backend.sh both encode --input-kind random --size 8388608
+  scripts/profile_backend.sh liblzma-sys encode --input-kind random --size 8388608
   scripts/profile_backend.sh xz size --input-kind random --size 1048576 --iters 800 --warmup 80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile_backend.sh` around lines 4 - 18, The help/usage block in the
script still lists "both" as a valid backend even though the runtime validation
(the backend rejection logic around the lines that check and error on "both")
disallows it; update the usage text and example lines in the help block (the
here-doc printed when $# -lt 2) to remove "both" and any example lines that
reference "both" so the printed help matches the actual accepted backends.
examples/standard_files_probe.rs (3)

12-17: 🧹 Nitpick | 🔵 Trivial

Missing compile-time guard for backend feature exclusivity.

Same issue as other probe examples - add compile_error! guards to enforce exactly one backend feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 12 - 17, The three
BACKEND_NAME const definitions lack compile-time checks to ensure exactly one of
the features ("xz", "xz-sys", "liblzma-sys") is enabled; add #[cfg(...)]
compile_error! guards adjacent to those BACKEND_NAME declarations to (1) emit a
compile_error! when none of the three features are enabled and (2) emit a
compile_error! when more than one of the features is enabled, referencing the
same feature names so the compiler message clearly indicates the required
exclusive backend configuration.
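For reference, the guard pattern the comment asks for can be sketched as follows, assuming the probe's feature names ("xz", "xz-sys", "liblzma-sys"). The pairwise checks reject builds with more than one backend; the "none enabled" arm is shown commented out so this standalone snippet still compiles without any Cargo features.

```rust
// Pairwise exclusivity guards: any two backend features together fail the build.
#[cfg(all(feature = "xz", feature = "xz-sys"))]
compile_error!("enable exactly one backend feature: `xz` conflicts with `xz-sys`");
#[cfg(all(feature = "xz", feature = "liblzma-sys"))]
compile_error!("enable exactly one backend feature: `xz` conflicts with `liblzma-sys`");
#[cfg(all(feature = "xz-sys", feature = "liblzma-sys"))]
compile_error!("enable exactly one backend feature: `xz-sys` conflicts with `liblzma-sys`");
// The "none enabled" guard would be:
// #[cfg(not(any(feature = "xz", feature = "xz-sys", feature = "liblzma-sys")))]
// compile_error!("enable one backend feature: `xz`, `xz-sys`, or `liblzma-sys`");

#[cfg(feature = "xz")]
const BACKEND_NAME: &str = "xz";
#[cfg(not(feature = "xz"))]
const BACKEND_NAME: &str = "none"; // fallback so this sketch runs without features

fn main() {
    println!("backend: {BACKEND_NAME}");
}
```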

82-94: ⚠️ Potential issue | 🟡 Minor

Fail fast when no corpus cases are selected.

If filters produce zero cases, the probe proceeds to measure() with total_bytes=0, producing misleading metrics. Add an early exit.

🛡️ Proposed fix
     if !cases.is_empty() {
         let case_names = cases
             .iter()
             .map(|case| case.name.as_str())
             .collect::<Vec<_>>()
             .join(",");
         println!("cases: {case_names}");
+    } else {
+        eprintln!("no .xz corpus cases selected (check --corpus-dir / --mode / --name-pattern)");
+        std::process::exit(2);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 82 - 94, If filtering yields
zero cases, avoid calling measure() with total_bytes==0 by adding an early exit:
after computing cases (check cases.is_empty()), print a clear error message
(e.g., "no cases match filters") and return/exit (e.g., std::process::exit(1))
so run_cases, measure, and print_measurement are not invoked with an empty
corpus; update the block that currently prints case_names and the subsequent
call to measure(...) to short-circuit when cases.is_empty().

204-204: ⚠️ Potential issue | 🟡 Minor

Potential panic on non-UTF8 filenames.

into_string().unwrap() will panic if a filename contains non-UTF8 characters. Handle gracefully by skipping such files.

🛡️ Proposed fix
-        let filename = entry.file_name().into_string().unwrap();
+        let filename = match entry.file_name().into_string() {
+            Ok(s) => s,
+            Err(_) => continue, // Skip non-UTF8 filenames
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` at line 204, The code currently uses
entry.file_name().into_string().unwrap() which panics on non-UTF8 filenames;
change this to safely handle those cases by calling
entry.file_name().into_string().ok() and skipping the entry when None (e.g., if
let Some(filename) = entry.file_name().into_string().ok() { ... } else {
continue; }), so non-UTF8 file names are ignored instead of causing a panic;
update usage of the filename variable inside the same block (where filename is
referenced) accordingly.
examples/qc_probe.rs (1)

9-14: 🧹 Nitpick | 🔵 Trivial

Missing compile-time guard for backend feature exclusivity.

Same issue as in bufread_trailing_probe.rs - add compile_error! guards to enforce exactly one backend feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/qc_probe.rs` around lines 9 - 14, Add compile-time guards around the
BACKEND_NAME definitions to enforce exactly one backend feature: add conditional
compile_error! when none of "xz", "xz-sys", or "liblzma-sys" are enabled and add
conditional compile_error! when more than one of those features is enabled; keep
the existing BACKEND_NAME consts for each #[cfg(feature = "...")] branch
(referencing BACKEND_NAME and the feature names "xz", "xz-sys", "liblzma-sys")
so the compiler emits a clear error if zero or multiple backend features are
selected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/main.yml:
- Around line 79-82: The single cargo invocation runs tests/sys_equivalence.rs
which links both C symbol-exporting backends (liblzma_sys and xz_sys) in one
process and causes symbol conflicts; replace the single "cargo test --test
sys_equivalence" step with two separate test steps (or jobs) that run the same
test binary but only build one backend per process (one targeting liblzma_sys
and one targeting xz_sys), e.g. by setting the appropriate feature/ENV or cargo
flags so each run compiles and links only the desired provider (refer to
tests/sys_equivalence.rs, liblzma_sys, and xz_sys to ensure each step isolates a
single backend).

In `@Cargo.toml`:
- Around line 55-58: The liblzma-sys feature currently forces
liblzma-sys/bindgen by listing "liblzma-sys/bindgen" in the liblzma-sys feature
array; remove that entry so enabling --features liblzma-sys no longer implicitly
enables bindgen. Update the liblzma-sys feature declaration to only include
"dep:liblzma-sys" (leave static/parallel lines unchanged) and keep a separate
bindgen feature that maps to "liblzma-sys?/bindgen" so bindgen remains opt-in.

In `@examples/bufread_trailing_probe.rs`:
- Around line 8-13: The three BACKEND_NAME constant declarations assume exactly
one of the backend features is enabled; add compile-time guards around these
constants using cfg attributes and compile_error! so builds with none or more
than one of the features fail; specifically, add mutually-exclusive checks that
emit compile_error! when combinations of features like (feature = "xz" and
feature = "xz-sys"), (feature = "xz" and feature = "liblzma-sys"), and (feature
= "xz-sys" and feature = "liblzma-sys") are enabled, and a check that emits
compile_error! when none of the three features is enabled, ensuring BACKEND_NAME
remains defined only when exactly one backend feature is active.

In `@scripts/compare_backends.sh`:
- Around line 72-74: The Rust root test build is using workspace defaults so the
measured feature set can drift; update the first cargo invocation (the env
CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test line) to explicitly disable
defaults and request the intended feature set (e.g. add --no-default-features
--features xz-tests or the correct feature name for the root xz tests) so the
script deterministically builds the same Rust backend as labeled; leave the
C-side invocation as-is.

In `@src/stream.rs`:
- Around line 311-323: The duplicated doc comments for PRESET_DEFAULT and
PRESET_LEVEL_MASK should be consolidated: move a single doc comment for each
constant to sit once above both cfg variants (or use a single #[doc = "..."]
applied before the cfg blocks) so PRESET_DEFAULT and PRESET_LEVEL_MASK each have
one shared doc comment rather than repeating it for the #[cfg(feature = "xz")]
and #[cfg(not(feature = "xz"))] definitions; keep the same text and ensure the
comments remain immediately above the grouped definitions so documentation
generation still applies to both symbols.

In `@tests/sys_equivalence.rs`:
- Around line 8-9: The test currently imports both C and Rust sys backends
(liblzma_sys as c_sys and xz_sys as rs_sys) and runs both paths in one process,
which risks symbol conflicts; modify tests/sys_equivalence.rs so it only wires a
single backend per test process by removing one of the dual imports (either
delete or conditionally compile out the use of liblzma_sys or xz_sys) and ensure
cross-backend equivalence is performed via a separate process-isolated mechanism
(e.g., move the comparison logic into a script like scripts/compare_backends.sh
or run the test twice under different process invocations), updating any
references to c_sys or rs_sys symbols in functions like the equivalence test
harness so each test binary links only one backend.

In `@xz/src/common/stream_decoder_mt.rs`:
- Around line 569-574: comp_blk_size currently takes a raw pointer and
dereferences it inside an unsafe block but is declared as a safe fn; change its
signature to unsafe fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t
and keep the internal unsafe dereference or remove the inner unsafe block as
appropriate so callers must use an unsafe call; this makes the safety contract
explicit for the function that accesses raw pointers.

In `@xz/src/types.rs`:
- Around line 637-645: The read32le implementation currently calls .to_le() on
the value read by core::ptr::read_unaligned, which is wrong for converting
little-endian bytes into native endianness; update read32le to use
u32::from_le(...) (i.e. call from_le on the read value) so it converts from
little-endian rather than to it; leave write32le as-is since it correctly uses
.to_le().

---

Outside diff comments:
In `@benches/backend_comparison.rs`:
- Around line 55-68: The helpers backend_encode and backend_decode currently
ignore the return values from lzma_easy_buffer_encode and
lzma_stream_buffer_decode; update both to check the return code against LZMA_OK
and fail fast (panic or return) before recording timings if it is not OK. In
backend_decode additionally assert that the decoder consumed all input (input
position/consumed count from lzma_stream_buffer_decode) and that out_pos ==
out_size after decoding; if these conditions are not met, fail immediately so
the benchmark does not time truncated output. Use the existing symbols
(lzma_easy_buffer_encode, lzma_stream_buffer_decode, LZMA_OK, out_pos, out_size,
etc.) to locate where to add the checks and error handling.

In `@xz/src/common/block_decoder.rs`:
- Around line 165-278: The PR changes around lzma_block_decoder_init (and
related symbols block_decode, block_decoder_end, lzma_raw_decoder_init) affect
performance-sensitive decoder paths; before merging, run the documented backend
performance workflow (docs/performance-workflow.md) and the deterministic
regression gate script scripts/compare_backends.sh to validate tuning impact and
ensure no regressions; if regressions appear, iterate on the changes to
lzma_block_decoder_init (or revert specific tuning) and repeat the workflows
until comparisons are clean.

---

Duplicate comments:
In `@examples/qc_probe.rs`:
- Around line 9-14: Add compile-time guards around the BACKEND_NAME definitions
to enforce exactly one backend feature: add conditional compile_error! when none
of "xz", "xz-sys", or "liblzma-sys" are enabled and add conditional
compile_error! when more than one of those features is enabled; keep the
existing BACKEND_NAME consts for each #[cfg(feature = "...")] branch
(referencing BACKEND_NAME and the feature names "xz", "xz-sys", "liblzma-sys")
so the compiler emits a clear error if zero or multiple backend features are
selected.

In `@examples/standard_files_probe.rs`:
- Around line 12-17: The three BACKEND_NAME const definitions lack compile-time
checks to ensure exactly one of the features ("xz", "xz-sys", "liblzma-sys") is
enabled; add #[cfg(...)] compile_error! guards adjacent to those BACKEND_NAME
declarations to (1) emit a compile_error! when none of the three features are
enabled and (2) emit a compile_error! when more than one of the features is
enabled, referencing the same feature names so the compiler message clearly
indicates the required exclusive backend configuration.
- Around line 82-94: If filtering yields zero cases, avoid calling measure()
with total_bytes==0 by adding an early exit: after computing cases (check
cases.is_empty()), print a clear error message (e.g., "no cases match filters")
and return/exit (e.g., std::process::exit(1)) so run_cases, measure, and
print_measurement are not invoked with an empty corpus; update the block that
currently prints case_names and the subsequent call to measure(...) to
short-circuit when cases.is_empty().
- Line 204: The code currently uses entry.file_name().into_string().unwrap()
which panics on non-UTF8 filenames; change this to safely handle those cases by
calling entry.file_name().into_string().ok() and skipping the entry when None
(e.g., if let Some(filename) = entry.file_name().into_string().ok() { ... } else
{ continue; }), so non-UTF8 file names are ignored instead of causing a panic;
update usage of the filename variable inside the same block (where filename is
referenced) accordingly.

In `@scripts/compare_backends.sh`:
- Around line 32-45: The usage block printed by the '--help|-h)' case does not
document the short and long help flags; update the heredoc under the
'--help|-h)' branch so the Options section includes an entry for "--help, -h"
(or "--help|-h") with a one-line description like "Show this help message and
exit", ensuring both forms are mentioned and aligned with the existing option
formatting in the Usage output used by the script.

In `@scripts/compare_workloads.sh`:
- Around line 49-64: The script currently appends --iters 1 --warmup 0 to
RAW_ARGS when producing the compressed fixture but then constructs RUST_CMD and
C_CMD from RAW_ARGS without normalizing those flags, causing inconsistent decode
runs; normalize by stripping any user-supplied --iters/--warmup from RAW_ARGS
and then append the canonical iteration flags used for the fixture to both the
encode invocation (COMPRESSED_INPUT generation) and the decode invocations
(RUST_CMD and C_CMD) so COMPRESSED_INPUT, RUST_CMD, and C_CMD all run with the
same deterministic --iters/--warmup values; locate the construction points for
COMPRESSED_INPUT, the encode call that uses "$C_BIN" with RAW_ARGS, and the
arrays RUST_CMD and C_CMD to implement this normalization (remove duplicate
flags from RAW_ARGS, then add the chosen --iters/--warmup explicitly).

In `@scripts/profile_backend.sh`:
- Around line 4-18: The help/usage block in the script still lists "both" as a
valid backend even though the runtime validation (the backend rejection logic
around the lines that check and error on "both") disallows it; update the usage
text and example lines in the help block (the here-doc printed when $# -lt 2) to
remove "both" and any example lines that reference "both" so the printed help
matches the actual accepted backends.

In `@xz/src/common/common.rs`:
- Around line 123-126: The debug_asserts in lzma_bufcpy (the debug_assert! lines
validating in_pos/out_pos vs in_size/out_size) only run in debug builds; add
explicit runtime guards at the start of lzma_bufcpy that check *in_pos <=
in_size and *out_pos <= out_size and that pointer/null combos (in_0/out) match
the position invariants, and return an Err or clamp/limit the copy length if
they fail before performing any arithmetic like in_size - *in_pos or out_size -
*out_pos; ensure these checks reference the same symbols (in_0, out, *in_pos,
*out_pos, in_size, out_size) so unsafe length calculations cannot underflow in
release builds.

In `@xz/src/lz/lz_encoder_mf.rs`:
- Around line 638-639: The subtraction of cached hash deltas can overflow;
change both computations to use wrapping subtraction: compute delta2 and delta3
as pos.wrapping_sub(...). Specifically update the lines that set delta2 (using
hash and hash_2_value) and delta3 (using hash and hash_3_index) to call
wrapping_sub on pos so the u32 wraparound semantics match lzma_mf_hc4_find.
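As a standalone illustration (with made-up values) of the wrapping semantics the fix relies on: match-finder positions are u32 values that may wrap, so deltas must be computed modulo 2^32.

```rust
// Stand-in for the delta2/delta3 computation in lzma_mf_bt4 / lzma_mf_hc4.
fn hash_delta(pos: u32, cached: u32) -> u32 {
    // `pos - cached` would panic in a debug build whenever cached > pos;
    // wrapping_sub makes the modular u32 arithmetic explicit, matching
    // the semantics lzma_mf_hc4_find depends on
    pos.wrapping_sub(cached)
}

fn main() {
    assert_eq!(hash_delta(10, 5), 5); // normal case
    assert_eq!(hash_delta(5, 10), u32::MAX - 4); // wrapped case: 2^32 - 5
    println!("wrapping deltas ok");
}
```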

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 386f7292-7df4-47c9-b305-e1b76eeb1e94

📥 Commits

Reviewing files that changed from the base of the PR and between 60dd817 and 0540c53.

📒 Files selected for processing (59)
  • .github/workflows/main.yml
  • Cargo.toml
  • README.md
  • benches/backend_comparison.rs
  • docs/performance-workflow.md
  • examples/bufread_trailing_probe.rs
  • examples/qc_probe.rs
  • examples/standard_files_probe.rs
  • perf-probe/Cargo.toml
  • perf-probe/src/main.rs
  • scripts/compare_api_workloads.sh
  • scripts/compare_backends.sh
  • scripts/compare_workloads.sh
  • scripts/profile_backend.sh
  • src/bufread.rs
  • src/lib.rs
  • src/stream.rs
  • src/write.rs
  • systest/Cargo.toml
  • systest/build.rs
  • systest/src/main.rs
  • tests/sys_equivalence.rs
  • xz/src/alloc.rs
  • xz/src/check/crc64_fast.rs
  • xz/src/check/sha256.rs
  • xz/src/common/alone_decoder.rs
  • xz/src/common/block_buffer_encoder.rs
  • xz/src/common/block_decoder.rs
  • xz/src/common/block_encoder.rs
  • xz/src/common/common.rs
  • xz/src/common/file_info.rs
  • xz/src/common/index.rs
  • xz/src/common/index_decoder.rs
  • xz/src/common/index_encoder.rs
  • xz/src/common/index_hash.rs
  • xz/src/common/lzip_decoder.rs
  • xz/src/common/stream_decoder.rs
  • xz/src/common/stream_decoder_mt.rs
  • xz/src/common/stream_flags_decoder.rs
  • xz/src/common/stream_flags_encoder.rs
  • xz/src/delta/delta_common.rs
  • xz/src/delta/delta_decoder.rs
  • xz/src/delta/delta_encoder.rs
  • xz/src/lz/lz_encoder_mf.rs
  • xz/src/lzma/lzma2_decoder.rs
  • xz/src/lzma/lzma2_encoder.rs
  • xz/src/lzma/lzma_decoder.rs
  • xz/src/lzma/lzma_encoder.rs
  • xz/src/simple/arm.rs
  • xz/src/simple/arm64.rs
  • xz/src/simple/armthumb.rs
  • xz/src/simple/ia64.rs
  • xz/src/simple/powerpc.rs
  • xz/src/simple/riscv.rs
  • xz/src/simple/simple_decoder.rs
  • xz/src/simple/simple_encoder.rs
  • xz/src/simple/sparc.rs
  • xz/src/simple/x86.rs
  • xz/src/types.rs

Comment thread .github/workflows/main.yml
Comment thread Cargo.toml
Comment thread examples/bufread_trailing_probe.rs
Comment on lines +72 to +74
echo "prebuilding root test binaries..."
env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null

🛠️ Refactor suggestion | 🟠 Major

Make the Rust-side root build explicit instead of relying on defaults.

This script is the regression gate, but xz-tests on Line 73 is currently whatever the workspace default features resolve to. If defaults change, the label and measured backend diverge silently.

Proposed fix
-env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run >/dev/null
+env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run --no-default-features --features xz >/dev/null
 env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null

Based on learnings, use scripts/compare_backends.sh as the coarse deterministic regression gate for full backend comparison, and keep workload shape, iteration count, and profiler command stable while iterating on a hotspot.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "prebuilding root test binaries..."
env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null
echo "prebuilding root test binaries..."
env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run --no-default-features --features xz >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_backends.sh` around lines 72 - 74, The Rust root test build
is using workspace defaults so the measured feature set can drift; update the
first cargo invocation (the env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test
line) to explicitly disable defaults and request the intended feature set (e.g.
add --no-default-features --features xz-tests or the correct feature name for
the root xz tests) so the script deterministically builds the same Rust backend
as labeled; leave the C-side invocation as-is.

Comment thread src/stream.rs
Comment thread tests/sys_equivalence.rs Outdated
Comment on lines +569 to 574
fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t {
    unsafe {
        (vli_ceil4((*coder).block_options.compressed_size)
            + lzma_check_size((*coder).stream_flags.check) as lzma_vli) as size_t
    }
}

🧹 Nitpick | 🔵 Trivial

LGTM! Correctly removes C ABI for internal Rust helper.

The change from extern "C" fn to fn is appropriate since comp_blk_size is an internal Rust helper that's only called from Rust code (line 627) and not used as a C callback or FFI entry point.

Optional refinement: Consider marking as unsafe fn

Since this function dereferences a raw pointer, modern Rust convention suggests marking it as unsafe fn to make the safety contract explicit:

-fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t {
+unsafe fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t {
-    unsafe {
-        (vli_ceil4((*coder).block_options.compressed_size)
-            + lzma_check_size((*coder).stream_flags.check) as lzma_vli) as size_t
-    }
+    (vli_ceil4((*coder).block_options.compressed_size)
+        + lzma_check_size((*coder).stream_flags.check) as lzma_vli) as size_t
}

This is a stylistic improvement, not a correctness issue, since the function is already called from an unsafe context.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t {
unsafe {
(vli_ceil4((*coder).block_options.compressed_size)
+ lzma_check_size((*coder).stream_flags.check) as lzma_vli) as size_t
}
}
unsafe fn comp_blk_size(coder: *const lzma_stream_coder) -> size_t {
    (vli_ceil4((*coder).block_options.compressed_size)
        + lzma_check_size((*coder).stream_flags.check) as lzma_vli) as size_t
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/common/stream_decoder_mt.rs` around lines 569 - 574, comp_blk_size
currently takes a raw pointer and dereferences it inside an unsafe block but is
declared as a safe fn; change its signature to unsafe fn comp_blk_size(coder:
*const lzma_stream_coder) -> size_t and keep the internal unsafe dereference or
remove the inner unsafe block as appropriate so callers must use an unsafe call;
this makes the safety contract explicit for the function that accesses raw
pointers.
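For reference, a minimal stand-in (not the real lzma_stream_coder) showing the convention the comment describes: a helper that dereferences a raw pointer is itself marked `unsafe fn`, pushing the validity obligation to the caller.

```rust
// Hypothetical simplified coder type, for illustration only.
#[derive(Clone, Copy)]
struct Coder {
    compressed_size: u64,
}

unsafe fn comp_size(coder: *const Coder) -> usize {
    // explicit inner block keeps this valid under Rust 2024's
    // `unsafe_op_in_unsafe_fn` rules
    unsafe { (*coder).compressed_size as usize }
}

fn main() {
    let c = Coder { compressed_size: 42 };
    // the caller now has to acknowledge the pointer-validity contract
    let n = unsafe { comp_size(&c) };
    assert_eq!(n, 42);
    println!("{n}");
}
```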

Comment thread xz/src/types.rs
Comment on lines 637 to 645
 pub fn read32le(buf: *const u8) -> u32 {
-    return unsafe {
-        let mut num: u32 = *buf as u32;
-        num |= (*buf.offset(1) as u32) << 8;
-        num |= (*buf.offset(2) as u32) << 16;
-        num |= (*buf.offset(3) as u32) << 24;
-        num
-    };
+    unsafe { core::ptr::read_unaligned(buf.cast::<u32>()).to_le() }
 }
 #[inline]
 pub fn write32le(buf: *mut u8, num: u32) {
     unsafe {
-        *buf = num as u8;
-        *buf.offset(1) = (num >> 8) as u8;
-        *buf.offset(2) = (num >> 16) as u8;
-        *buf.offset(3) = (num >> 24) as u8;
+        core::ptr::write_unaligned(buf.cast::<u32>(), num.to_le());
     }
 }

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the original implementation and check if there are tests covering this
rg -n 'read32le|write32le' --type rust -C3

Repository: youknowone/xz-rs

Length of output: 13037


🌐 Web query:

What is the correct way to read little-endian u32 from unaligned bytes in Rust?

💡 Result:

Safe (recommended): copy 4 bytes, then decode

This works for any alignment because you never create an unaligned &u32; you just build a [u8; 4] and use u32::from_le_bytes. [1]

use std::convert::TryInto;

fn read_u32_le(input: &[u8]) -> Option<u32> {
    let b: [u8; 4] = input.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(b))
}

unsafe (often fastest): unaligned load + endian fixup

If you know you have at least 4 bytes, you can do an unaligned load with ptr::read_unaligned, then convert from little-endian to native with u32::from_le. [2]

fn read_u32_le_unaligned(input: &[u8]) -> Option<u32> {
    if input.len() < 4 { return None; }
    let p = input.as_ptr().cast::<u32>();
    Some(unsafe { u32::from_le(std::ptr::read_unaligned(p)) })
}

Common alternatives (when parsing lots of fields)

  • byteorder crate: ReadBytesExt::read_u32::<LittleEndian>() on a byte slice/reader. [3]
  • zerocopy crate: zerocopy::byteorder::U32<LE> has alignment 1 (explicitly designed for unaligned, endian-aware fields). [4]

Avoid: casting bytes to &u32 / *const u32 and dereferencing normally; creating an unaligned reference is undefined behavior in Rust (use read_unaligned or the safe copy approach instead). [2]

Sources: [1] Rust u32::from_le_bytes docs https://doc.rust-lang.org/core/primitive.u32.html • [2] std::ptr::read_unaligned docs https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html • [3] byteorder::ReadBytesExt docs https://docs.rs/byteorder/latest/byteorder/trait.ReadBytesExt.html • [4] zerocopy::byteorder::U32 docs https://google.github.io/zerocopy/zerocopy/byteorder/struct.U32.html


Fix endianness conversion in read32le—should use from_le(), not to_le().

The function reads bytes as a native-endian u32 via read_unaligned, then calls .to_le(). This inverts the byte order on big-endian machines. For reading little-endian data from bytes, use u32::from_le() instead to convert from little-endian representation to native endianness.

Correct implementation: u32::from_le(core::ptr::read_unaligned(buf.cast::<u32>()))

Note: write32le correctly uses .to_le() and does not require changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xz/src/types.rs` around lines 637 - 645, The read32le implementation
currently calls .to_le() on the value read by core::ptr::read_unaligned, which
is wrong for converting little-endian bytes into native endianness; update
read32le to use u32::from_le(...) (i.e. call from_le on the read value) so it
converts from little-endian rather than to it; leave write32le as-is since it
correctly uses .to_le().
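For reference, a self-contained check of the corrected form (a thin wrapper over the unaligned load; the assertions hold on both little- and big-endian hosts, since from_le is the identity on little-endian):

```rust
// Sketch of the corrected read32le; in real code the caller must guarantee
// that `buf` points at least 4 readable bytes.
fn read32le(buf: *const u8) -> u32 {
    // corrected: convert FROM little-endian after the unaligned load
    unsafe { u32::from_le(core::ptr::read_unaligned(buf.cast::<u32>())) }
}

fn main() {
    let bytes = [0x78u8, 0x56, 0x34, 0x12];
    let v = read32le(bytes.as_ptr());
    assert_eq!(v, 0x1234_5678);
    // matches the safe reference decoding
    assert_eq!(v, u32::from_le_bytes(bytes));
    println!("{v:#010x}");
}
```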


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

♻️ Duplicate comments (8)
.github/workflows/main.yml (1)

79-82: ⚠️ Potential issue | 🔴 Critical

sys_equivalence test requires both backends, causing symbol conflicts.

The sys_equivalence test (per context snippets) imports both xz_sys::* and liblzma_sys::* to compare constants and functions. Running cargo test --test sys_equivalence without explicit features will use the default xz feature, but the test file itself references both crates directly. This creates the symbol conflict scenario the learnings warn about.

The test needs both xz-sys and liblzma-sys features enabled to compile, but then links conflicting lzma_* symbols.

Based on learnings, never compare C and Rust sys backends in the same process due to conflicting lzma_* symbol exports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/main.yml around lines 79 - 82, The CI is running the
sys_equivalence test which imports both xz_sys and liblzma_sys and causes
conflicting lzma_* symbol exports; remove or change the single-step "cargo test
--test sys_equivalence" invocation so the test never loads both backends in the
same process — either stop running this test in that workflow step (remove the
cargo test --test sys_equivalence line) or replace it with two isolated runs
that enable only one backend per process (e.g., run the test once with features
enabling xz-sys only and once with liblzma-sys only), ensuring sys_equivalence,
xz_sys, liblzma_sys and lzma_* symbols are never linked together in one test
process.
scripts/compare_workloads.sh (1)

53-64: ⚠️ Potential issue | 🟠 Major

Forward the caller's decode settings to perf-probe.

In the decode branch, RAW_ARGS are only used to create COMPRESSED_INPUT. The measured commands on Line 53 and Line 59 drop --iters, --warmup, and any other decode-relevant probe settings, so the benchmark no longer matches the CLI the caller requested.

Based on learnings: Use scripts/compare_workloads.sh for focused backend workload performance testing, and keep workload shape, iteration count, and profiler command stable while iterating on a hotspot during performance optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_workloads.sh` around lines 53 - 64, The benchmark commands
RUST_CMD and C_CMD drop decode-related flags (like --iters, --warmup) because
only COMPRESSED_INPUT was derived from RAW_ARGS; modify the construction of
RUST_CMD and C_CMD to append the caller's decode settings (use RAW_ARGS or the
specific parsed flags) so perf-probe invokes the exact same CLI shape the caller
requested while still passing --compressed-input "$COMPRESSED_INPUT" and
--workload decode; ensure you reference and preserve the existing variables
RUST_CMD, C_CMD, RAW_ARGS, COMPRESSED_INPUT and include RAW_ARGS (or the subset
of decode flags) when building those arrays.
tests/sys_equivalence.rs (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Do not link both sys backends into this test binary.

This suite still binds xz_sys and liblzma_sys into one process and executes both side by side. That violates the backend-isolation constraint and can invalidate the whole equivalence run with lzma_* symbol collisions; keep each test binary single-backend and move cross-backend comparison to a process-isolated workflow.

Based on learnings: Never compare C and Rust sys backends in the same process due to conflicting lzma_* symbol exports, and use scripts/compare_backends.sh as the deterministic comparison gate.

Also applies to: 120-121, 203-203, 280-287, 323-324, 395-463

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sys_equivalence.rs` at line 1, This test binary currently ends up
linking both C and Rust sys backends (xz_sys and liblzma_sys) and must be
restricted to a single backend; update tests/sys_equivalence.rs and associated
test-build cfgs so the binary is compiled with only one backend enabled (e.g.,
gate code behind feature flags or target cfgs instead of including both
backends) and remove any direct references that cause simultaneous linking of
xz_sys and liblzma_sys; then move the cross-backend equivalence comparison out
of-process into the existing process-isolated comparison workflow (the
compare-backends script) so lzma_* symbol collisions cannot occur.
scripts/compare_backends.sh (1)

72-74: ⚠️ Potential issue | 🟠 Major

Make the Rust root build explicit.

Line 73 still uses workspace defaults, but the timed command is labeled xz-tests. If defaults move, this regression gate silently measures a different backend than its output claims.

Proposed fix
-env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run >/dev/null
+env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-default-features --features xz --no-run >/dev/null
 env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$ROOT_C_TARGET" cargo test --release --no-default-features --features liblzma-sys --no-run >/dev/null

Based on learnings: Use scripts/compare_backends.sh as the coarse deterministic regression gate for performance testing, and keep workload shape, iteration count, and profiler command stable while iterating on a hotspot during performance optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_backends.sh` around lines 72 - 74, The timed "xz-tests" Rust
build is still using workspace defaults so it may build the wrong crate; update
the cargo invocation that uses ROOT_RUST_TARGET (the line with env
CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo test --release --no-run) to
explicitly target the root crate by adding a manifest or package selector (e.g.
--manifest-path pointing at the root Cargo.toml or -p <root-package-name>) so
the script deterministically builds the intended root test binary instead of
relying on workspace defaults.
examples/standard_files_probe.rs (2)

204-204: ⚠️ Potential issue | 🟡 Minor

Avoid panicking on non-UTF8 corpus filenames.

into_string().unwrap() will abort the probe on a valid filesystem entry whose name is not UTF-8.

Suggested fix
-        let filename = entry.file_name().into_string().unwrap();
+        let filename = entry.file_name().to_string_lossy().into_owned();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` at line 204, The code uses
entry.file_name().into_string().unwrap(), which will panic on non-UTF-8
filenames; change this to safely handle OS strings (e.g., use
entry.file_name().to_string_lossy() to get a Cow<str> or match
entry.file_name().into_string().ok() and skip/log non-UTF8 entries) so the probe
doesn't abort on valid filesystem entries; update the usage of filename
accordingly wherever filename is used.

63-94: ⚠️ Potential issue | 🟠 Major

Fail fast when the filters select zero corpus cases.

With an empty cases set, the probe still emits timings and throughput for zero useful work. That makes the API regression report look valid even though no corpus path actually ran.

Suggested fix
     let cases = load_cases(&config).unwrap_or_else(|err| {
         eprintln!("failed to load corpus: {err}");
         std::process::exit(1);
     });
+    if cases.is_empty() {
+        eprintln!("no .xz corpus cases selected (check --corpus-dir / --mode / --name-pattern)");
+        std::process::exit(2);
+    }
     let total_bytes: usize = cases.iter().map(|case| case.data.len()).sum();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 63 - 94, After loading cases
with load_cases, add a fast-fail when the filtered corpus is empty: check
cases.is_empty() after the load and before printing measurements (and before
calling measure/run_cases) and print an error message (e.g., "no cases matched
filters" or include filter info) then exit with a non-zero code. Modify the
block around the variable cases (used by load_cases, cases.iter(), measure, and
run_cases) to perform this early return so you don't emit timing/throughput for
zero work.
Cargo.toml (1)

52-55: ⚠️ Potential issue | 🟠 Major

Keep bindgen opt-in for the liblzma-sys backend.

--features liblzma-sys now implicitly enables liblzma-sys/bindgen, so plain C-backend builds require the bindgen toolchain even when callers did not ask for it.

Suggested fix
-liblzma-sys = ["dep:liblzma-sys", "liblzma-sys/bindgen"]
+liblzma-sys = ["dep:liblzma-sys"]
 static = ["liblzma-sys?/static"]
 parallel = ["xz-sys?/parallel", "liblzma-sys?/parallel", "num_cpus"]
-bindgen = ["liblzma-sys?/bindgen"] # only affects liblzma-sys
+bindgen = ["liblzma-sys", "liblzma-sys?/bindgen"] # only affects liblzma-sys

Run this to verify the feature wiring and compare it with the rest of the workspace:

#!/bin/bash
set -euo pipefail

echo "== root Cargo feature section =="
sed -n '48,58p' Cargo.toml

echo
echo "== all Cargo.toml references to liblzma-sys/bindgen =="
fd -a Cargo.toml | while read -r file; do
  echo "-- $file --"
  rg -n 'liblzma-sys|bindgen' "$file" -C1 || true
  echo
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` around lines 52 - 55, The liblzma-sys feature currently enables
liblzma-sys/bindgen implicitly (liblzma-sys = ["dep:liblzma-sys",
"liblzma-sys/bindgen"]), causing plain C-backend builds to require bindgen;
remove "liblzma-sys/bindgen" from the liblzma-sys feature list so liblzma-sys is
opt-in without bindgen, and keep a separate top-level "bindgen" feature (bindgen
= ["liblzma-sys?/bindgen"] or similar) so callers must explicitly opt into
bindgen; update Cargo.toml entries around the liblzma-sys and bindgen feature
names to reflect this and run the provided verification script to ensure no
unintended wiring.
scripts/profile_backend.sh (1)

4-16: ⚠️ Potential issue | 🟡 Minor

Make the help text match the supported backend set.

The usage block still advertises both, but the runtime path rejects it. That keeps steering users into an unsupported mode.

Based on learnings: Never compare C and Rust sys backends in the same process due to conflicting lzma_* symbol exports.

Also applies to: 43-45

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile_backend.sh` around lines 4 - 16, Update the usage/help block
in scripts/profile_backend.sh to remove the unsupported "both" backend from the
supported backend list and all examples (so the header "Usage:
scripts/profile_backend.sh <c|liblzma-sys|xz|rust|xz-sys|both> ..." becomes
"...<c|liblzma-sys|xz|rust|xz-sys>..."), and likewise remove any example lines
that show "both"; also adjust any descriptive text that mentions comparing C and
Rust sys backends in the same process (per the runtime rejection) so the help no
longer suggests the unsupported mode—ensure you edit the Usage and Examples
sections (and the similar text referenced around lines 43-45) in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/backend_comparison.rs`:
- Around line 114-120: The benchmark currently precomputes compressed = unsafe {
backend_encode(&input) } with the backend under test, which lets encoder output
shape vary between builds and skews decode comparisons; instead produce a single
canonical compressed payload once (e.g., via a reference encoder or a checked-in
fixture) and use that same payload for every backend's bench input. Replace the
use of backend_encode in benches/backend_comparison.rs so group.bench_with_input
and the closure calling unsafe { backend_decode(black_box(data), size) } receive
the canonical_compressed bytes (keep BenchmarkId::new(BACKEND_NAME, label),
label, size, and iteration/profiler settings unchanged), ensuring
backend_decode, compressed variable, and bench harness remain otherwise
identical across backend builds.

In `@docs/performance-workflow.md`:
- Line 127: Update the sentence describing the "auto" profiler selection to
reflect the real selection order used by scripts/profile_backend.sh: it prefers
"samply" if installed, then falls back to "perf", and finally uses "plain"
(release debuginfo) if neither is available; also mention that when both tools
are present the script will still choose "samply" first and recommend using
scripts/profile_backend.sh for single-backend profiling analysis.

In `@examples/bufread_trailing_probe.rs`:
- Around line 218-230: The test helper run_decode_case uses decoder.read(&mut
decompressed) which may return a short read; replace that call with
decoder.read_exact(&mut decompressed) to guarantee the entire decompressed
buffer is filled before asserting contents and total_out. Update the code in
run_decode_case to call read_exact on bufread::XzDecoder (and handle the
unwrap/error the same way) so subsequent assertions on decompressed,
decoder.total_out(), and reading trailing bytes from decoder_reader remain
valid.

In `@examples/qc_probe.rs`:
- Around line 189-194: next_u64 implements a xorshift64* pseudorandom number
generator but currently lacks documentation; add a brief comment above the
next_u64 function stating that it is the xorshift64* PRNG (mention the shift
amounts 12, 25, 27 and the multiplier 0x2545F4914F6CDD1D), note input/output
(takes a u64 state and returns next u64), and include a short citation/link or
reference to Marsaglia/xorshift for maintainability.
- Around line 9-14: Add explicit compile-time checks to ensure exactly one
backend feature is enabled for the BACKEND_NAME constant: use cfg-based
compile_error! guards that trigger when none of the features ("xz", "xz-sys",
"liblzma-sys") are enabled and when more than one is enabled, so the compiler
emits a clear error rather than relying on implicit cfg fallback; update the
module around the BACKEND_NAME const to include these #[cfg(...)] compile_error!
cases (none enabled and multiple enabled) and keep the existing per-feature
BACKEND_NAME definitions unchanged.

In `@examples/standard_files_probe.rs`:
- Around line 278-283: The run_good_write function incorrectly pre-fills the
decoder's output buffer with decoded_seed (decoded_seed.to_vec()), causing
XzDecoder (created via write::XzDecoder::new_multi_decoder) to append decoded
data onto that seed and produce inflated outputs and wrong digests; fix by
supplying an empty buffer (e.g., Vec::new()) as the writer passed into
new_multi_decoder so the decoder only contains decoded(data) before calling
finish and fold_bytes uses the correct length/digest.

In `@perf-probe/src/main.rs`:
- Around line 224-230: The usage string in function usage() incorrectly omits
the "xz" backend; update the help text built in usage() (where message.push_str
is used) to include "xz" alongside "liblzma-sys|xz-sys" (e.g., list
"<liblzma-sys|xz-sys|xz>") so the CLI examples reflect all supported backends
and make default/backend examples valid.
- Around line 511-531: The backend_decode function currently treats any LZMA_OK
as success even if not all input was consumed or the expected output size wasn't
produced; update backend_decode to, after calling lzma_stream_buffer_decode,
verify that in_pos == compressed.len() (whole payload consumed) and out_pos ==
out_size (expected output produced) and fail/assert with a clear message
(including BACKEND_NAME and ret) if either check fails; keep these checks
adjacent to the existing assert_eq!(ret, LZMA_OK, ...) and return the truncated
buffer only when both validations pass.

In `@scripts/compare_api_workloads.sh`:
- Around line 53-54: The first build invocation (env
CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release)
relies on workspace defaults and can end up building the wrong backend; change
it to explicitly select the xz backend by passing the feature flags that select
it (e.g., --no-default-features --features xz) so the script consistently builds
the xz example binary instead of relying on workspace defaults.

In `@scripts/profile_backend.sh`:
- Around line 53-63: The script currently appends all passthrough args ("$@") to
COMMON_CMD which allows callers to override the fixed --workload; before adding
"$@" filter out any user-supplied --workload (and its value) so repeated
--workload flags cannot override the fixed --workload used in the
filename/labels. Update the logic around COMMON_CMD and the place where
COMMON_CMD+=("$@") is done to iterate over "$@" and skip any argument equal to
--workload and the following value (or any --workload=... form), then append
only the filtered args to COMMON_CMD.

---

Duplicate comments:
In @.github/workflows/main.yml:
- Around line 79-82: The CI is running the sys_equivalence test which imports
both xz_sys and liblzma_sys and causes conflicting lzma_* symbol exports; remove
or change the single-step "cargo test --test sys_equivalence" invocation so the
test never loads both backends in the same process — either stop running this
test in that workflow step (remove the cargo test --test sys_equivalence line)
or replace it with two isolated runs that enable only one backend per process
(e.g., run the test once with features enabling xz-sys only and once with
liblzma-sys only), ensuring sys_equivalence, xz_sys, liblzma_sys and lzma_*
symbols are never linked together in one test process.

In `@Cargo.toml`:
- Around line 52-55: The liblzma-sys feature currently enables
liblzma-sys/bindgen implicitly (liblzma-sys = ["dep:liblzma-sys",
"liblzma-sys/bindgen"]), causing plain C-backend builds to require bindgen;
remove "liblzma-sys/bindgen" from the liblzma-sys feature list so liblzma-sys is
opt-in without bindgen, and keep a separate top-level "bindgen" feature (bindgen
= ["liblzma-sys?/bindgen"] or similar) so callers must explicitly opt into
bindgen; update Cargo.toml entries around the liblzma-sys and bindgen feature
names to reflect this and run the provided verification script to ensure no
unintended wiring.

In `@examples/standard_files_probe.rs`:
- Line 204: The code uses entry.file_name().into_string().unwrap(), which will
panic on non-UTF-8 filenames; change this to safely handle OS strings (e.g., use
entry.file_name().to_string_lossy() to get a Cow<str> or match
entry.file_name().into_string().ok() and skip/log non-UTF8 entries) so the probe
doesn't abort on valid filesystem entries; update the usage of filename
accordingly wherever filename is used.
- Around line 63-94: After loading cases with load_cases, add a fast-fail when
the filtered corpus is empty: check cases.is_empty() after the load and before
printing measurements (and before calling measure/run_cases) and print an error
message (e.g., "no cases matched filters" or include filter info) then exit with
a non-zero code. Modify the block around the variable cases (used by load_cases,
cases.iter(), measure, and run_cases) to perform this early return so you don't
emit timing/throughput for zero work.

In `@scripts/compare_backends.sh`:
- Around line 72-74: The timed "xz-tests" Rust build is still using workspace
defaults so it may build the wrong crate; update the cargo invocation that uses
ROOT_RUST_TARGET (the line with env CARGO_TARGET_DIR="$ROOT_RUST_TARGET" cargo
test --release --no-run) to explicitly target the root crate by adding a
manifest or package selector (e.g. --manifest-path pointing at the root
Cargo.toml or -p <root-package-name>) so the script deterministically builds the
intended root test binary instead of relying on workspace defaults.

In `@scripts/compare_workloads.sh`:
- Around line 53-64: The benchmark commands RUST_CMD and C_CMD drop
decode-related flags (like --iters, --warmup) because only COMPRESSED_INPUT was
derived from RAW_ARGS; modify the construction of RUST_CMD and C_CMD to append
the caller's decode settings (use RAW_ARGS or the specific parsed flags) so
perf-probe invokes the exact same CLI shape the caller requested while still
passing --compressed-input "$COMPRESSED_INPUT" and --workload decode; ensure you
reference and preserve the existing variables RUST_CMD, C_CMD, RAW_ARGS,
COMPRESSED_INPUT and include RAW_ARGS (or the subset of decode flags) when
building those arrays.

In `@scripts/profile_backend.sh`:
- Around line 4-16: Update the usage/help block in scripts/profile_backend.sh to
remove the unsupported "both" backend from the supported backend list and all
examples (so the header "Usage: scripts/profile_backend.sh
<c|liblzma-sys|xz|rust|xz-sys|both> ..." becomes
"...<c|liblzma-sys|xz|rust|xz-sys>..."), and likewise remove any example lines
that show "both"; also adjust any descriptive text that mentions comparing C and
Rust sys backends in the same process (per the runtime rejection) so the help no
longer suggests the unsupported mode—ensure you edit the Usage and Examples
sections (and the similar text referenced around lines 43-45) in this file.

In `@tests/sys_equivalence.rs`:
- Line 1: This test binary currently ends up linking both C and Rust sys
backends (xz_sys and liblzma_sys) and must be restricted to a single backend;
update tests/sys_equivalence.rs and associated test-build cfgs so the binary is
compiled with only one backend enabled (e.g., gate code behind feature flags or
target cfgs instead of including both backends) and remove any direct references
that cause simultaneous linking of xz_sys and liblzma_sys; then move the
cross-backend equivalence comparison out-of-process into the existing
process-isolated comparison workflow (the compare-backends script) so lzma_*
symbol collisions cannot occur.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c873bdf4-36b3-4b52-96ae-041eecc119ad

📥 Commits

Reviewing files that changed from the base of the PR and between 0540c53 and 01674c9.

📒 Files selected for processing (22)
  • .github/workflows/main.yml
  • Cargo.toml
  • benches/backend_comparison.rs
  • docs/performance-workflow.md
  • examples/bufread_trailing_probe.rs
  • examples/qc_probe.rs
  • examples/standard_files_probe.rs
  • perf-probe/Cargo.toml
  • perf-probe/src/main.rs
  • scripts/compare_api_workloads.sh
  • scripts/compare_backends.sh
  • scripts/compare_workloads.sh
  • scripts/profile_backend.sh
  • src/bufread.rs
  • src/lib.rs
  • src/read.rs
  • src/stream.rs
  • src/write.rs
  • systest/Cargo.toml
  • systest/src/main.rs
  • tests/sys_equivalence.rs
  • xz-sys/src/lib.rs

Comment on lines +114 to +120
+let compressed = unsafe { backend_encode(&input) };
 group.throughput(Throughput::Bytes(size as u64));

-group.bench_with_input(BenchmarkId::new("c", label), &c_compressed, |b, data| {
-    b.iter(|| unsafe { c_decode(black_box(data), size) })
-});
 group.bench_with_input(
-    BenchmarkId::new("rust", label),
-    &rs_compressed,
-    |b, data| b.iter(|| unsafe { rs_decode(black_box(data), size) }),
+    BenchmarkId::new(BACKEND_NAME, label),
+    &compressed,
+    |b, data| b.iter(|| unsafe { backend_decode(black_box(data), size) }),

⚠️ Potential issue | 🟠 Major

Keep the decode fixture identical across backend builds.

Line 114 precomputes compressed with the backend under test, so cross-backend decode numbers can drift with encoder output shape instead of decoder cost. Use one canonical compressed payload for every backend build.

Based on learnings: Keep workload shape, iteration count, and profiler command stable while iterating on a hotspot during performance optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/backend_comparison.rs` around lines 114 - 120, The benchmark
currently precomputes compressed = unsafe { backend_encode(&input) } with the
backend under test, which lets encoder output shape vary between builds and
skews decode comparisons; instead produce a single canonical compressed payload
once (e.g., via a reference encoder or a checked-in fixture) and use that same
payload for every backend's bench input. Replace the use of backend_encode in
benches/backend_comparison.rs so group.bench_with_input and the closure calling
unsafe { backend_decode(black_box(data), size) } receive the
canonical_compressed bytes (keep BenchmarkId::new(BACKEND_NAME, label), label,
size, and iteration/profiler settings unchanged), ensuring backend_decode,
compressed variable, and bench harness remain otherwise identical across backend
builds.

- `--warmup <n>`
- `--preset <level>`

On macOS the script prefers `samply`; on Linux it falls back to `perf`; otherwise it runs the workload plainly with release debuginfo enabled.

⚠️ Potential issue | 🟡 Minor

Document the actual auto profiler selection order.

scripts/profile_backend.sh picks samply whenever it is installed, then falls back to perf, then plain. The current macOS/Linux wording is already stale for environments where both tools are present.

Based on learnings: Use scripts/profile_backend.sh for single-backend profiling analysis.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/performance-workflow.md` at line 127, Update the sentence describing the
"auto" profiler selection to reflect the real selection order used by
scripts/profile_backend.sh: it prefers "samply" if installed, then falls back to
"perf", and finally uses "plain" (release debuginfo) if neither is available;
also mention that when both tools are present the script will still choose
"samply" first and recommend using scripts/profile_backend.sh for single-backend
profiling analysis.

Comment on lines +218 to +230
fn run_decode_case(input: &[u8], trailing: &[u8], decode_case: &[u8]) -> u64 {
    let mut decoder_reader = decode_case;
    let mut decoder = bufread::XzDecoder::new(&mut decoder_reader);
    let mut decompressed = vec![0u8; input.len()];
    let read = decoder.read(&mut decompressed).unwrap();
    assert_eq!(read, input.len());
    assert_eq!(decompressed, input);
    assert_eq!(decoder.total_out(), input.len() as u64);

    let mut remaining = Vec::new();
    let trailing_read = decoder_reader.read_to_end(&mut remaining).unwrap();
    assert_eq!(trailing_read, trailing.len());
    assert_eq!(remaining, trailing);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does Rust's std::io::Read::read guarantee filling the provided buffer, or can it return fewer bytes than requested even when more data remains?

💡 Result:

No. std::io::Read::read does not guarantee it will fill the buffer; it may return any n where 0 < n <= buf.len() even if more data will be available later and the stream is not at EOF. (doc.rust-lang.org)

If you need the buffer fully filled (or an error if EOF happens first), use read_exact (or loop on read). (rust-lang.github.io)


🏁 Script executed:

# Check if the file exists and examine the exact code at lines 218-230
fd --type f "bufread_trailing_probe.rs"

Repository: youknowone/xz-rs

Length of output: 230


🏁 Script executed:

# Show the file with line numbers to verify the exact content
cat -n examples/bufread_trailing_probe.rs | sed -n '210,235p'

Repository: youknowone/xz-rs

Length of output: 1199


🏁 Script executed:

# Check imports at the top of the file
head -30 examples/bufread_trailing_probe.rs

Repository: youknowone/xz-rs

Length of output: 620


Replace read() with read_exact() to ensure the decompression buffer is fully populated.

Read::read may return short even when more decoded bytes are available, causing spurious assertion failures or data validation errors. Use read_exact instead to guarantee the buffer is filled before inspecting trailing bytes.

Proposed fix
 fn run_decode_case(input: &[u8], trailing: &[u8], decode_case: &[u8]) -> u64 {
     let mut decoder_reader = decode_case;
     let mut decoder = bufread::XzDecoder::new(&mut decoder_reader);
     let mut decompressed = vec![0u8; input.len()];
-    let read = decoder.read(&mut decompressed).unwrap();
-    assert_eq!(read, input.len());
+    decoder.read_exact(&mut decompressed).unwrap();
     assert_eq!(decompressed, input);
     assert_eq!(decoder.total_out(), input.len() as u64);
+    drop(decoder);
 
     let mut remaining = Vec::new();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 fn run_decode_case(input: &[u8], trailing: &[u8], decode_case: &[u8]) -> u64 {
     let mut decoder_reader = decode_case;
     let mut decoder = bufread::XzDecoder::new(&mut decoder_reader);
     let mut decompressed = vec![0u8; input.len()];
-    let read = decoder.read(&mut decompressed).unwrap();
-    assert_eq!(read, input.len());
+    decoder.read_exact(&mut decompressed).unwrap();
     assert_eq!(decompressed, input);
     assert_eq!(decoder.total_out(), input.len() as u64);
+    drop(decoder);

     let mut remaining = Vec::new();
     let trailing_read = decoder_reader.read_to_end(&mut remaining).unwrap();
     assert_eq!(trailing_read, trailing.len());
     assert_eq!(remaining, trailing);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/bufread_trailing_probe.rs` around lines 218 - 230, The test helper
run_decode_case uses decoder.read(&mut decompressed) which may return a short
read; replace that call with decoder.read_exact(&mut decompressed) to guarantee
the entire decompressed buffer is filled before asserting contents and
total_out. Update the code in run_decode_case to call read_exact on
bufread::XzDecoder (and handle the unwrap/error the same way) so subsequent
assertions on decompressed, decoder.total_out(), and reading trailing bytes from
decoder_reader remain valid.
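The short-read behavior driving this fix can be reproduced with a toy reader; OneByte below is a hypothetical illustration, not a type from the crate:

```rust
use std::io::{self, Read};

/// Returns at most one byte per `read` call, mimicking a decoder that
/// legally returns short reads even though more data remains.
struct OneByte<'a>(&'a [u8]);

impl<'a> Read for OneByte<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}

fn main() {
    // A single `read` may fill only part of the buffer...
    let mut buf = [0u8; 5];
    let n = OneByte(b"hello").read(&mut buf).unwrap();
    assert_eq!(n, 1);

    // ...while `read_exact` loops internally until the buffer is full.
    let mut buf = [0u8; 5];
    OneByte(b"hello").read_exact(&mut buf).unwrap();
    assert_eq!(&buf, b"hello");
    println!("read_exact filled the buffer");
}
```

This is why asserting on a single read's return value can fail spuriously even when the decoder is correct.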

Comment thread examples/qc_probe.rs
Comment on lines +9 to +14
#[cfg(feature = "xz")]
const BACKEND_NAME: &str = "xz";
#[cfg(feature = "xz-sys")]
const BACKEND_NAME: &str = "xz-sys";
#[cfg(feature = "liblzma-sys")]
const BACKEND_NAME: &str = "liblzma-sys";

🧹 Nitpick | 🔵 Trivial

Add compile-time enforcement for exactly one backend feature.

Unlike perf-probe/src/main.rs which uses compile_error! to enforce exactly one backend, this probe relies on implicit cfg behavior. Adding explicit enforcement improves error messages and consistency.

♻️ Proposed fix to add compile-time enforcement
 use liblzma::read;
 use liblzma::write;
 
+#[cfg(all(feature = "xz", any(feature = "xz-sys", feature = "liblzma-sys")))]
+compile_error!("Enable exactly one backend feature: xz, xz-sys, or liblzma-sys");
+#[cfg(all(feature = "xz-sys", feature = "liblzma-sys"))]
+compile_error!("Enable exactly one backend feature: xz, xz-sys, or liblzma-sys");
+#[cfg(not(any(feature = "xz", feature = "xz-sys", feature = "liblzma-sys")))]
+compile_error!("Enable one backend feature: xz, xz-sys, or liblzma-sys");
+
 #[cfg(feature = "xz")]
 const BACKEND_NAME: &str = "xz";

Based on learnings, perf-probe uses strict compile-time enforcement to ensure exactly one backend feature is enabled.

📝 Committable suggestion


Suggested change
+#[cfg(all(feature = "xz", any(feature = "xz-sys", feature = "liblzma-sys")))]
+compile_error!("Enable exactly one backend feature: xz, xz-sys, or liblzma-sys");
+#[cfg(all(feature = "xz-sys", feature = "liblzma-sys"))]
+compile_error!("Enable exactly one backend feature: xz, xz-sys, or liblzma-sys");
+#[cfg(not(any(feature = "xz", feature = "xz-sys", feature = "liblzma-sys")))]
+compile_error!("Enable one backend feature: xz, xz-sys, or liblzma-sys");
+
 #[cfg(feature = "xz")]
 const BACKEND_NAME: &str = "xz";
 #[cfg(feature = "xz-sys")]
 const BACKEND_NAME: &str = "xz-sys";
 #[cfg(feature = "liblzma-sys")]
 const BACKEND_NAME: &str = "liblzma-sys";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/qc_probe.rs` around lines 9 - 14, Add explicit compile-time checks
to ensure exactly one backend feature is enabled for the BACKEND_NAME constant:
use cfg-based compile_error! guards that trigger when none of the features
("xz", "xz-sys", "liblzma-sys") are enabled and when more than one is enabled,
so the compiler emits a clear error rather than relying on implicit cfg
fallback; update the module around the BACKEND_NAME const to include these
#[cfg(...)] compile_error! cases (none enabled and multiple enabled) and keep
the existing per-feature BACKEND_NAME definitions unchanged.

Comment thread examples/qc_probe.rs
Comment on lines +189 to +194
fn next_u64(mut x: u64) -> u64 {
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
x.wrapping_mul(0x2545F4914F6CDD1D)
}

🧹 Nitpick | 🔵 Trivial

Consider documenting the PRNG algorithm.

This is a xorshift64* PRNG (xorshift with a finalizing multiply). A brief comment would help maintainability.

📝 Proposed documentation
+/// xorshift64 PRNG for deterministic test case generation
 fn next_u64(mut x: u64) -> u64 {
     x ^= x >> 12;
     x ^= x << 25;
     x ^= x >> 27;
     x.wrapping_mul(0x2545F4914F6CDD1D)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/qc_probe.rs` around lines 189 - 194, next_u64 implements a
xorshift64* pseudorandom number generator but currently lacks documentation; add
a brief comment above the next_u64 function stating that it is the xorshift64*
PRNG (mention the shift amounts 12, 25, 27 and the multiplier
0x2545F4914F6CDD1D), note input/output (takes a u64 state and returns next u64),
and include a short citation/link or reference to Marsaglia/xorshift for
maintainability.
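To make the suggested documentation concrete, here is a standalone sketch of the same xorshift64* step with the determinism property the probe relies on (the function body matches the snippet above; the `main` harness and seed values are illustrative only):

```rust
/// xorshift64* PRNG step: three xorshifts (12, 25, 27) followed by a
/// multiply by 0x2545F4914F6CDD1D. Purely a function of its input, so the
/// same state always yields the same next value, which keeps generated
/// test cases reproducible across runs.
fn next_u64(mut x: u64) -> u64 {
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    x.wrapping_mul(0x2545F4914F6CDD1D)
}

fn main() {
    // Same seed, same stream: reruns of the probe see identical inputs.
    assert_eq!(next_u64(42), next_u64(42));
    // Distinct seeds diverge immediately.
    assert_ne!(next_u64(1), next_u64(2));
    // State 0 is a fixed point (every xorshift of 0 is 0), so seeds must
    // be nonzero.
    assert_eq!(next_u64(0), 0);
}
```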

Comment on lines +278 to +283
fn run_good_write(data: &[u8], decoded_seed: &[u8]) -> u64 {
let ret = decoded_seed.to_vec();
let mut w = write::XzDecoder::new_multi_decoder(ret);
w.write_all(data).unwrap();
let ret = w.finish().unwrap();
fold_bytes(data.len() + ret.len(), &ret)

⚠️ Potential issue | 🟠 Major

Don't prefill the write-decoder output buffer with decoded bytes.

write::XzDecoder appends decoded output onto the supplied writer. Starting from decoded_seed.to_vec() means every write-phase iteration returns decoded_seed + decoded(data), so both backend probes benchmark extra copy/append work and compute the wrong digest.

Suggested fix
 fn run_good_write(data: &[u8], decoded_seed: &[u8]) -> u64 {
-    let ret = decoded_seed.to_vec();
+    let ret = Vec::with_capacity(decoded_seed.len());
     let mut w = write::XzDecoder::new_multi_decoder(ret);
     w.write_all(data).unwrap();
     let ret = w.finish().unwrap();
+    debug_assert_eq!(ret.as_slice(), decoded_seed);
     fold_bytes(data.len() + ret.len(), &ret)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/standard_files_probe.rs` around lines 278 - 283, The run_good_write
function incorrectly pre-fills the decoder's output buffer with decoded_seed
(decoded_seed.to_vec()), causing XzDecoder (created via
write::XzDecoder::new_multi_decoder) to append decoded data onto that seed and
produce inflated outputs and wrong digests; fix by supplying an empty buffer
(e.g., Vec::new()) as the writer passed into new_multi_decoder so the decoder
only contains decoded(data) before calling finish and fold_bytes uses the
correct length/digest.
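The behavior this finding rests on is easy to check in isolation: writing into a `Vec<u8>` through `std::io::Write` appends after any existing contents rather than overwriting them, so a pre-seeded buffer inflates the result. A minimal standalone sketch, independent of the liblzma API:

```rust
use std::io::Write;

fn main() {
    // A pre-seeded Vec writer keeps its contents; writes are appended.
    let mut seeded: Vec<u8> = b"seed".to_vec();
    seeded.write_all(b"data").unwrap();
    assert_eq!(seeded, b"seeddata");

    // An empty Vec writer ends up holding only what was written, which is
    // why the fix passes an empty Vec into new_multi_decoder.
    let mut empty: Vec<u8> = Vec::new();
    empty.write_all(b"data").unwrap();
    assert_eq!(empty, b"data");
}
```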

Comment thread perf-probe/src/main.rs
Comment on lines +224 to +230
fn usage() -> String {
let mut message = String::new();
message.push_str("Usage:\n");
message.push_str(
" cargo run -p perf-probe --release --no-default-features --features <liblzma-sys|xz-sys> -- \\\n",
);
message.push_str(" --workload <encode|decode|size|crc32|crc64> [options]\n\n");

⚠️ Potential issue | 🟡 Minor

Include xz in the usage help.

The CLI supports xz, xz-sys, and liblzma-sys, but the help text only lists the two C-ABI backends. That makes the default/backend examples in this PR look invalid.

Suggested fix
-        "  cargo run -p perf-probe --release --no-default-features --features <liblzma-sys|xz-sys> -- \\\n",
+        "  cargo run -p perf-probe --release --no-default-features --features <xz|xz-sys|liblzma-sys> -- \\\n",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@perf-probe/src/main.rs` around lines 224 - 230, The usage string in function
usage() incorrectly omits the "xz" backend; update the help text built in
usage() (where message.push_str is used) to include "xz" alongside
"liblzma-sys|xz-sys" (e.g., list "<liblzma-sys|xz-sys|xz>") so the CLI examples
reflect all supported backends and make default/backend examples valid.

Comment thread perf-probe/src/main.rs
Comment on lines +511 to +531
unsafe fn backend_decode(compressed: &[u8], out_size: usize) -> Vec<u8> {
let mut out = vec![0u8; out_size];
let mut memlimit = u64::MAX;
let mut in_pos = 0usize;
let mut out_pos = 0usize;
let ret = unsafe {
lzma_stream_buffer_decode(
&mut memlimit,
0,
ptr::null(),
compressed.as_ptr(),
&mut in_pos,
compressed.len(),
out.as_mut_ptr(),
&mut out_pos,
out.len(),
)
};
assert_eq!(ret, LZMA_OK, "{BACKEND_NAME} decode failed with {ret}");
out.truncate(out_pos);
out

⚠️ Potential issue | 🟠 Major

Validate that decode consumed the whole payload and produced the expected size.

backend_decode currently treats any LZMA_OK as success, then truncates to out_pos. That means a bad --expected-size or trailing bytes in --compressed-input can still benchmark a partial decode and report it as valid work.

Suggested fix
 unsafe fn backend_decode(compressed: &[u8], out_size: usize) -> Vec<u8> {
     let mut out = vec![0u8; out_size];
     let mut memlimit = u64::MAX;
     let mut in_pos = 0usize;
     let mut out_pos = 0usize;
@@
     };
     assert_eq!(ret, LZMA_OK, "{BACKEND_NAME} decode failed with {ret}");
+    assert_eq!(
+        in_pos,
+        compressed.len(),
+        "{BACKEND_NAME} decode left trailing input: consumed {in_pos} of {} bytes",
+        compressed.len()
+    );
+    assert_eq!(
+        out_pos,
+        out_size,
+        "{BACKEND_NAME} decode produced {out_pos} bytes, expected {out_size}"
+    );
     out.truncate(out_pos);
     out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@perf-probe/src/main.rs` around lines 511 - 531, The backend_decode function
currently treats any LZMA_OK as success even if not all input was consumed or
the expected output size wasn't produced; update backend_decode to, after
calling lzma_stream_buffer_decode, verify that in_pos == compressed.len() (whole
payload consumed) and out_pos == out_size (expected output produced) and
fail/assert with a clear message (including BACKEND_NAME and ret) if either
check fails; keep these checks adjacent to the existing assert_eq!(ret, LZMA_OK,
...) and return the truncated buffer only when both validations pass.

Comment on lines +53 to +54
env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release >/dev/null
env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$C_TARGET" cargo build --example "$EXAMPLE_NAME" --release --no-default-features --features liblzma-sys >/dev/null

⚠️ Potential issue | 🟠 Major

Build the Rust probe with an explicit backend.

Line 53 still relies on workspace defaults, but this script reports that binary as xz. If defaults move, the benchmark silently measures the wrong backend.

Proposed fix
-env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release >/dev/null
+env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example "$EXAMPLE_NAME" --release --no-default-features --features xz >/dev/null
 env LZMA_API_STATIC=1 CARGO_TARGET_DIR="$C_TARGET" cargo build --example "$EXAMPLE_NAME" --release --no-default-features --features liblzma-sys >/dev/null

Based on learnings: Use scripts/compare_api_workloads.sh for high-level API workload performance testing, and keep workload shape, iteration count, and profiler command stable while iterating on a hotspot during performance optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_api_workloads.sh` around lines 53 - 54, The first build
invocation (env CARGO_TARGET_DIR="$RUST_TARGET" cargo build --example
"$EXAMPLE_NAME" --release) relies on workspace defaults and can end up building
the wrong backend; change it to explicitly select the xz backend by passing
--no-default-features --features xz (mirroring how the second invocation pins
liblzma-sys) so the script consistently builds the xz example binary instead of
relying on workspace defaults.

Comment on lines +53 to +63
COMMON_CMD=(
cargo run
-p perf-probe
--release
--no-default-features
--features "$FEATURE"
--
--workload "$WORKLOAD"
)
COMMON_CMD+=("$@")


⚠️ Potential issue | 🟠 Major

Block --workload from the passthrough args.

This script fixes the workload positionally, then appends "$@". perf-probe accepts repeated --workload flags, so callers can silently run a different workload than the one used in the output filename and console labeling.

Suggested fix
+for arg in "$@"; do
+  if [[ "$arg" == "--workload" ]]; then
+    echo "pass the workload as the second positional argument, not via --workload" >&2
+    exit 2
+  fi
+done
+
 COMMON_CMD=(
   cargo run
   -p perf-probe
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile_backend.sh` around lines 53 - 63, The script currently
appends all passthrough args ("$@") to COMMON_CMD which allows callers to
override the fixed --workload; before adding "$@" filter out any user-supplied
--workload (and its value) so repeated --workload flags cannot override the
fixed --workload used in the filename/labels. Update the logic around COMMON_CMD
and the place where COMMON_CMD+=("$@") is done to iterate over "$@" and skip any
argument equal to --workload and the following value (or any --workload=...
form), then append only the filtered args to COMMON_CMD.
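The filtering approach the prompt describes can be sketched as a plain Bash loop (illustrative only; the `set --` arguments and the `FILTERED`/`skip_next`/`RESULT` names are not from the script):

```shell
# Simulate passthrough args containing both forms of the flag.
set -- --iters 3 --workload encode --size 1024 --workload=decode

FILTERED=()
skip_next=0
for arg in "$@"; do
  if [[ $skip_next -eq 1 ]]; then
    skip_next=0            # drop the value that followed --workload
    continue
  fi
  case "$arg" in
    --workload)   skip_next=1 ;;   # drop the flag and its next argument
    --workload=*) ;;               # drop the inline --workload=... form
    *)            FILTERED+=("$arg") ;;
  esac
done

RESULT="${FILTERED[*]}"
echo "$RESULT"              # prints: --iters 3 --size 1024
```

Only the filtered list would then be appended to COMMON_CMD, so a caller-supplied --workload can never override the fixed one.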

@youknowone youknowone force-pushed the tuning branch 4 times, most recently from 9028173 to 41a45b9 on April 12, 2026 06:00
@youknowone youknowone merged commit 327e82d into main Apr 13, 2026
47 checks passed
youknowone added a commit that referenced this pull request Apr 13, 2026
* Add backend performance workflow and tuning

* Rename pure Rust backend crates to xz and xz-sys

* Rename backend features to xz-sys/liblzma-sys

- Replace the `rust-backend`/`c-backend` Cargo features with
  `xz-sys`/`liblzma-sys`, matching the underlying sys crate names.
  Default feature is `xz-sys`. CI, examples, probes, scripts,
  docs, benches, and `perf-probe` are updated to the new names.
- Alias the optional dependencies as `xz-sys-rs` and `liblzma-sys-c`
  in the workspace so both backends can resolve simultaneously.
- Add `lzma_*_rs` `#[inline(always)]` wrappers in `xz-sys` and route
  `src/lib.rs` and `src/stream.rs` through them when the `xz-sys`
  feature is on, bypassing the `extern "C"` thunks.
- Tighten visibility of internal coder/init helpers in the `xz`
  crate (many `pub` items become `pub(crate)`); drop the spurious
  `extern "C"` from `sha256::rotr_32`.
- Speed up `crc64_fast::lzma_crc64_generic` with an unrolled
  8-byte step, and reimplement `types::read32le`/`write32le` with
  `core::ptr::read_unaligned`/`write_unaligned`.
- Add a backend selection section to README.

* Split backends into xz, xz-sys, and liblzma-sys
@coderabbitai coderabbitai Bot mentioned this pull request Apr 13, 2026