Skip to content

zstream: avoid unnecessary checksum modifications#18293

Draft
GarthSnyder wants to merge 1 commit intoopenzfs:masterfrom
GarthSnyder:pr-zstream-conclusion
Draft

zstream: avoid unnecessary checksum modifications#18293
GarthSnyder wants to merge 1 commit intoopenzfs:masterfrom
GarthSnyder:pr-zstream-conclusion

Conversation

@GarthSnyder
Copy link
Contributor

The current zstream subcommands write end-of-record checksums for all DRR_END records. However, this behavior is inconsistent with the streams generated by zfs send; any DRR_END record generated by send_conclusion_record() in libzfs_sendrecv.c:2175 has the checksum set to zero. This PR adjusts zstream to mimic zfs send's behavior.

Checksums that are currently being modified include the doubled END in a replication stream as well as some stream-internal END cases. zfs receive doesn't mind the current behavior because it seems to ignore these checksums anyway.

The problem posed by the current behavior is that it breaks the idempotence of zstream commands. A zstream redup that's given no writes to redup should yield back the original input stream, and so for recompress. Currently, the output is not identical. That introduces noise that developers have to filter out to debug stream processing, and it prevents end-users from detecting changes with simple tools such as hashes.

This PR modifies dump_record() to replicate the behavior of zfs send. If a DRR_END record has a drr_toguid of zero and its incoming checksum is zero, then the checksum is left at zero.

Subcommands were formerly erasing all end-record checksums after validation, with comments remarking that this "needs to be done". The "needs to be done" part seems to stem from an assertion in dump_record() that requires outgoing records to initially have zero checksums.

I suspect the erasing and assertion are holdovers from libzfs, where they probably do serve some purpose. However, nothing within zstream currently seems to rely on this behavior. Removing that code allows the original checksums to flow through to dump_record(), where they can be inspected to determine which END records previously had waived checksums.

Without access to the original checksums, dump_record() would either need to maintain additional state about the stream or rely on the zero toguid as a marker. If either of those approaches is preferable, I will adjust.

The drr_toguid derives from ds_guid, which appears to be randomly generted and not explicitly checked against zero. Is zero a potentially valid GUID?

How Has This Been Tested?

I've run this round-trip on a selection of test streams using zstream recompress and zstream dedup/redup (dedup is a potential upcoming PR). The round-trip output is identical to the original input. The one-way files are accepted by zfs receive.

Types of changes

  • [ X] Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • [X ] My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly.
  • [ X] I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • [ X] All commit messages are properly formatted and contain Signed-off-by.

Copilot AI review requested due to automatic review settings March 7, 2026 03:33
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates zstream’s stream rewriting logic to better match zfs send output by avoiding unnecessary modifications to DRR_END record-end checksums, restoring idempotence for “no-op” transformations.

Changes:

  • Update dump_record() to preserve a zero record-end checksum for “conclusion” DRR_END records (toguid == 0 and incoming record-end checksum == 0).
  • Remove now-unnecessary checksum-zeroing of outgoing records in zstream subcommands that rewrite streams (redup/recompress/decompress).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
cmd/zstream/zstream_util.c Adds conclusion-DRR_END detection and preserves zero record-end checksums to emulate zfs send.
cmd/zstream/zstream_redup.c Removes pre-write checksum field zeroing so original checksums can flow into dump_record().
cmd/zstream/zstream_recompress.c Removes pre-write checksum field zeroing; relies on updated dump_record() behavior.
cmd/zstream/zstream_decompress.c Removes pre-write checksum field zeroing; relies on updated dump_record() behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GarthSnyder GarthSnyder changed the title zstream: remove unnecessary checksum modifications zstream: avoid unnecessary checksum modifications Mar 7, 2026
The current zstream subcommands write end-of-record checksums for all
DRR_END records. However, this behavior is inconsistent with the
streams generated by zfs send; any DRR_END record generated by
send_conclusion_record() in libzfs_sendrecv.c:2175 has the checksum set
to zero. This PR adjusts zstream to mimic zfs send's behavior.

Checksums that are currently being modified include the doubled END in a
replication stream as well as some stream-internal END cases. zfs
receive doesn't mind the current behavior because it seems to ignore
these checksums anyway.

The problem posed by the current behavior is that it breaks the
idempotence of zstream commands. A zstream redup that's given no writes
to redup should yield back the original input stream, and so for
recompress. Currently, the output is not identical. That introduces
noise that developers have to filter out to debug stream processing,
and it prevents end-users from detecting changes with simple tools like
hashes.

This PR modifies dump_record() to replicate the behavior of zfs send. If
a DRR_END record has a drr_toguid of zero and its incoming checksum is
zero, then the checksum is left at zero.

Subcommands were formerly erasing all end-record checksums after
validation, with comments remarking that this "needs to be done". The
"needs to be done" part seems to stem from an assertion in dump_record()
that requires outgoing records to start with zero checksums.

I suspect the erasing and assertion are holdovers from libzfs,
where they probably do serve some purpose. However, nothing within
zstream currently seems to rely on this behavior. Removing that code
allows the original checksums to flow through to dump_record(), where
they can be inspected to determine which END records previously had
waived checksums.

Without access to the original checksums, dump_record() would either
need to maintain additional state about the stream or rely on the zero
toguid as a marker. If either of those approaches is preferable, I will
adjust.

The drr_toguid derives from ds_guid, which appears to be randomly
generated and not explicitly checked against zero. Is zero a potentially
valid GUID?

Signed-off-by: Garth Snyder <garth@garthsnyder.com>

Claude test

Fix recompression order
@GarthSnyder GarthSnyder force-pushed the pr-zstream-conclusion branch from 617b9ea to 3e42b61 Compare March 8, 2026 05:01
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 13, 2026
@GarthSnyder GarthSnyder marked this pull request as draft March 14, 2026 23:44
@github-actions github-actions bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Mar 14, 2026
@GarthSnyder
Copy link
Contributor Author

I have a broader proposal coming up for some refactoring in zstream that would supersede this code, so I'm switching this PR to draft status. However, the underlying issue will still need to be resolved. I'll xref this from the other PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Work in Progress Not yet ready for general review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants