Streaming Compression support for RDB #3531
Open
sarthakaggarwal97 wants to merge 2 commits into valkey-io:unstable from
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
c77cc98 to 616d763
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #3531 +/- ##
============================================
+ Coverage 76.40% 76.79% +0.39%
============================================
Files 159 164 +5
Lines 79851 81681 +1830
============================================
+ Hits 61008 62728 +1720
- Misses 18843 18953 +110
Some perf improvements and bug fixes
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Today, rdbcompression only affects individual string payloads inside an otherwise normal RDB stream. This PR introduces the Valkey Compressed Stream (VKCS) format for RDB persistence, with lz4 as the first supported codec.
The RDB can now be wrapped in a VKCS envelope and compressed as a single stream at the rio layer. The default behavior remains unchanged (lzf), while rdb-compression-algo lz4 enables the new streaming format.
VKCS
A VKCS stream is identified by a header at the start of the file and carries enough metadata to classify the stream before loading it, including the stream kind and codec/checksum flags. On load, the RDB path probes for this envelope first. If present and valid, the input is transparently decompressed before normal RDB parsing continues. If absent, loading falls back to the existing uncompressed RDB path. If the envelope is malformed or incompatible, the load fails early.
Streaming Compression Model
The main structural change is that streaming compression is implemented as a rio wrapper rather than as part of object serialization. That keeps the RDB serializer working against a normal byte stream while the wrapper handles VKCS framing, compression/decompression state, buffering, and checksum behavior.
There are now effectively two RDB compression modes. lzf remains the default and preserves the old behavior of compressing individual string payloads inside the RDB. lz4 enables whole-stream compression for file-backed RDB persistence. When the streaming wrapper is active, the per-string LZF path is skipped so we do not layer payload compression on top of whole-stream compression.
Config
This PR adds:
- rdb-compression-algo with lzf and lz4
- rdb-compression-level for codecs that support levels, currently only lz4
This PR is intentionally scoped to RDB streaming compression and the persistence/load paths that need to understand it. It does not add diskless sync or replication compression APIs.
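For readers who want a concrete picture of the load-time probe described in the VKCS section, here is a minimal sketch of the three outcomes (no envelope, valid envelope, malformed envelope). It is not the code from this PR. The header layout, the constants, and every sketch_* / PROBE_* name are hypothetical and chosen only for illustration; the only thing taken from the description is the behavior: probe first, fall back when the envelope is absent, fail early when it is present but unusable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL layout, not the PR's actual wire format:
 * bytes 0-3 magic "VKCS", byte 4 stream kind, byte 5 codec id, byte 6 flags. */
#define SKETCH_MAGIC "VKCS"
#define SKETCH_MAGIC_LEN 4
#define SKETCH_HEADER_LEN 7
#define SKETCH_KIND_RDB 1       /* assumed value */
#define SKETCH_CODEC_LZ4 1      /* assumed value */
#define SKETCH_FLAG_CHECKSUM 0x01 /* assumed value */

typedef enum {
    PROBE_PLAIN_RDB,  /* no envelope: use the existing uncompressed RDB path */
    PROBE_COMPRESSED, /* valid envelope: decompress, then parse as normal RDB */
    PROBE_INVALID     /* envelope present but malformed or unsupported: fail early */
} probe_result;

typedef struct {
    uint8_t kind;
    uint8_t codec;
    uint8_t flags;
} sketch_header;

/* Classify the first bytes of an input before any RDB parsing happens.
 * 'out' is only filled in when a full header is available. */
probe_result sketch_probe(const unsigned char *buf, size_t len, sketch_header *out) {
    /* Magic missing or input too short to contain it: treat as a plain RDB. */
    if (len < SKETCH_MAGIC_LEN || memcmp(buf, SKETCH_MAGIC, SKETCH_MAGIC_LEN) != 0) {
        return PROBE_PLAIN_RDB;
    }
    /* The magic matched, so any problem from here on is an error, not a fallback. */
    if (len < SKETCH_HEADER_LEN) return PROBE_INVALID;

    out->kind = buf[4];
    out->codec = buf[5];
    out->flags = buf[6];

    if (out->kind != SKETCH_KIND_RDB) return PROBE_INVALID;
    if (out->codec != SKETCH_CODEC_LZ4) return PROBE_INVALID;
    /* Reject flag bits this sketch does not know about. */
    if ((out->flags & ~SKETCH_FLAG_CHECKSUM) != 0) return PROBE_INVALID;

    return PROBE_COMPRESSED;
}
```

A caller would peek at the first few bytes of the file, call sketch_probe, and then either hand the input to the normal RDB loader, wrap it in a decompressing reader first, or abort the load. Separating "magic absent" from "magic present but invalid" is what lets older uncompressed files keep loading while corrupted compressed files are rejected before any parsing starts.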
Benchmarks
Benchmarked on r7g.2xlarge (Graviton, 8 vCPUs, 61GB RAM, NVMe). All results averaged over 3 repeats.
Datasets
Notes: