Streaming Compression support for RDB #3531

Open
sarthakaggarwal97 wants to merge 2 commits into valkey-io:unstable from sarthakaggarwal97:streaming-compression-rio-pr

@sarthakaggarwal97 sarthakaggarwal97 commented Apr 17, 2026

Today, rdbcompression only affects individual string payloads inside an otherwise normal RDB stream. This PR introduces the Valkey Compressed Stream (VKCS) format for RDB persistence, with lz4 as the first supported codec.
The RDB can now be wrapped in a VKCS envelope and compressed as a single stream at the rio layer. The default behavior remains unchanged (lzf), while rdb-compression-algo lz4 enables the new streaming format.

VKCS

A VKCS stream is identified by a header at the start of the file. The header carries enough metadata to classify the stream before loading it, including the stream kind and codec/checksum flags. On load, the RDB path probes for this envelope first. If it is present and valid, the input is transparently decompressed before normal RDB parsing continues. If it is absent, loading falls back to the existing (non-streaming) RDB path. If the envelope is malformed or incompatible, the load fails early.
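
For reviewers, here is a rough sketch of the three-way probe outcome described above (absent, valid, malformed). The header layout, field sizes, constants, and the vkcsProbe name are all made up for illustration; none of them are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, for illustration only: the real VKCS header fields
 * and sizes may differ. */
#define VKCS_MAGIC "VKCS"
#define VKCS_MAGIC_LEN 4
#define VKCS_HEADER_LEN 8 /* magic + version + kind + codec + flags */
#define VKCS_VERSION 1
#define VKCS_KIND_RDB 1
#define VKCS_CODEC_LZ4 1
#define VKCS_FLAG_CHECKSUM 0x01

typedef enum {
    VKCS_PROBE_ABSENT,  /* no envelope: caller falls back to the plain RDB path */
    VKCS_PROBE_OK,      /* envelope present and understood */
    VKCS_PROBE_INVALID  /* envelope present but malformed or incompatible */
} vkcsProbeResult;

typedef struct {
    uint8_t version, kind, codec, flags;
} vkcsHeader;

/* Classify the first bytes of an input without consuming them. */
vkcsProbeResult vkcsProbe(const unsigned char *buf, size_t len, vkcsHeader *out) {
    assert(buf != NULL && out != NULL);
    /* No magic means this is not a VKCS stream at all. */
    if (len < VKCS_MAGIC_LEN || memcmp(buf, VKCS_MAGIC, VKCS_MAGIC_LEN) != 0)
        return VKCS_PROBE_ABSENT;
    /* Magic matched, so anything wrong from here on is a hard failure. */
    if (len < VKCS_HEADER_LEN) return VKCS_PROBE_INVALID;
    vkcsHeader h = {buf[4], buf[5], buf[6], buf[7]};
    if (h.version != VKCS_VERSION) return VKCS_PROBE_INVALID;
    if (h.kind != VKCS_KIND_RDB) return VKCS_PROBE_INVALID;
    if (h.codec != VKCS_CODEC_LZ4) return VKCS_PROBE_INVALID;
    if (h.flags & ~VKCS_FLAG_CHECKSUM) return VKCS_PROBE_INVALID; /* unknown flag bits */
    *out = h;
    return VKCS_PROBE_OK;
}
```

The point of the sketch is that "absent" and "invalid" are distinct outcomes: only a missing magic triggers the fallback, while a recognized magic followed by a bad header stops the load.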

Streaming Compression Model

The main structural change is that streaming compression is implemented as a rio wrapper rather than as part of object serialization. That keeps the RDB serializer working against a normal byte stream while the wrapper handles VKCS framing, compression/decompression state, buffering, and checksum behavior.
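
To illustrate the layering, here is a minimal sketch of a wrapper stream that sits in front of an inner stream, buffers writes, and emits length-prefixed frames. It is not the rio API and not the VKCS framing: the stream struct, the 8-byte frame size, and the 1-byte length prefix are assumptions, and the compression step is a pass-through placeholder where a real codec such as lz4 would run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A stripped-down stand-in for rio: a write callback plus opaque state. */
typedef struct stream {
    int (*write)(struct stream *s, const void *buf, size_t len); /* 1 = ok, 0 = error */
    void *state;
} stream;

/* In-memory sink playing the role of the file-backed target. */
typedef struct {
    unsigned char data[256];
    size_t used;
} memSink;

int memWrite(stream *s, const void *buf, size_t len) {
    memSink *m = s->state;
    if (len > sizeof(m->data) - m->used) return 0;
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    return 1;
}

#define FRAME_CAP 8 /* tiny on purpose so the example is easy to follow */

typedef struct {
    stream *inner;                /* where finished frames are written */
    unsigned char buf[FRAME_CAP]; /* pending, not yet framed bytes */
    size_t used;
} frameWrapper;

/* Emit the pending bytes as one frame: 1-byte length, then the payload. */
int frameFlush(frameWrapper *w) {
    assert(w->used <= FRAME_CAP);
    if (w->used == 0) return 1;
    unsigned char hdr = (unsigned char)w->used;
    /* Placeholder: a real wrapper would compress w->buf before writing. */
    if (!w->inner->write(w->inner, &hdr, 1)) return 0;
    if (!w->inner->write(w->inner, w->buf, w->used)) return 0;
    w->used = 0;
    return 1;
}

/* The serializer calls this exactly like a write to a plain byte stream. */
int frameWrite(stream *s, const void *buf, size_t len) {
    frameWrapper *w = s->state;
    const unsigned char *p = buf;
    while (len > 0) {
        size_t n = FRAME_CAP - w->used;
        if (n > len) n = len;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        len -= n;
        if (w->used == FRAME_CAP && !frameFlush(w)) return 0;
    }
    return 1;
}
```

The caller only ever sees the write callback, so the serializer code does not change when the wrapper is swapped in or out; a final frameFlush is needed to push out the last partial frame.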

There are now effectively two RDB compression modes. lzf remains the default and preserves the old behavior of compressing individual string payloads inside the RDB. lz4 enables whole-stream compression for file-backed RDB persistence. When the streaming wrapper is active, the per-string LZF path is skipped so we do not layer payload compression on top of whole-stream compression.
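
A rough sketch of the gating, under stated assumptions: shouldTryLzfString and its parameters are hypothetical names, and the 20-byte minimum is meant to mirror the existing length check in the per-string save path rather than anything introduced by this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: should this string payload go through per-string LZF?
 * rdb_compression stands for the existing rdbcompression setting and
 * stream_wrapper_active for "the streaming wrapper is compressing this rio". */
int shouldTryLzfString(int rdb_compression, int stream_wrapper_active, size_t len) {
    if (!rdb_compression) return 0;
    /* Whole-stream compression already covers this payload, so do not
     * compress it a second time. */
    if (stream_wrapper_active) return 0;
    /* Assumed threshold: very short strings are not worth compressing. */
    return len > 20;
}

/* Usage: the same payload is handled differently depending on the mode. */
void lzfGateExamples(void) {
    assert(shouldTryLzfString(1, 0, 100) == 1); /* lzf mode: per-string path */
    assert(shouldTryLzfString(1, 1, 100) == 0); /* streaming active: skipped */
    assert(shouldTryLzfString(0, 0, 100) == 0); /* compression disabled */
}
```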

Config

This PR adds:

  • rdb-compression-algo with lzf and lz4
  • rdb-compression-level for codecs that support levels, currently only lz4
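
As an illustration of how the two settings relate, here is a small sketch that maps the algorithm name to an enum and reports whether a level applies. The enum and function names are hypothetical and the real config plumbing in src/config.c will look different; only the two accepted values (lzf, lz4) and the fact that levels currently apply to lz4 alone come from this PR.

```c
#include <assert.h>
#include <string.h>

typedef enum { RDB_ALGO_LZF, RDB_ALGO_LZ4, RDB_ALGO_INVALID } rdbAlgo;

/* Map an rdb-compression-algo value to an enum (hypothetical helper). */
rdbAlgo rdbAlgoFromName(const char *name) {
    assert(name != NULL);
    if (strcmp(name, "lzf") == 0) return RDB_ALGO_LZF;
    if (strcmp(name, "lz4") == 0) return RDB_ALGO_LZ4;
    return RDB_ALGO_INVALID;
}

/* rdb-compression-level only matters for codecs that support levels,
 * which is currently just lz4. */
int rdbAlgoUsesLevel(rdbAlgo algo) {
    return algo == RDB_ALGO_LZ4;
}
```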

This PR is intentionally scoped to RDB streaming compression and the persistence/load paths that need to understand it. It does not add compression APIs for diskless sync or replication.

Benchmarks

Benchmarked on r7g.2xlarge (Graviton, 8 vCPUs, 61GB RAM, NVMe). All results averaged over 3 repeats.

Datasets

  • Improved Realistic JSON: Synthetic JSON with natural-language text, varied field types, and a per-key PRNG for high variety. Tested at 100B–10KB value sizes.
    [Image: final_improved_combined]
  • BlockMesh Tweets: 1M/5M unique real tweets from BlockMesh/tweets. Multilingual, avg ~270B, zero cross-key repetition.
    [Image: final_blockmesh_combined]

Notes:

  1. LZ4 streaming beats LZF on every metric at every size tested: 30-77% faster saves, 24-73% faster loads, and 45-73% smaller RDBs.
  2. The LZ4 library is currently vendored from https://github.com/lz4/lz4. That decision was made in "[NEW] RDB Compression via LZ4, Batching, and Batch-Level Dictionary" #1962 (comment).

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
sarthakaggarwal97 force-pushed the streaming-compression-rio-pr branch from c77cc98 to 616d763 on April 17, 2026 23:39
codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 93.04164% with 132 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.79%. Comparing base (b2d08c9) to head (616d763).
⚠️ Report is 2 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/compression_stream.c | 90.27% | 39 Missing ⚠️ |
| src/compression_rio.c | 79.20% | 26 Missing ⚠️ |
| src/valkey-check-rdb.c | 61.29% | 24 Missing ⚠️ |
| src/rio.c | 77.50% | 18 Missing ⚠️ |
| src/rdb.c | 87.09% | 12 Missing ⚠️ |
| src/compression.c | 91.54% | 6 Missing ⚠️ |
| src/aof.c | 78.94% | 4 Missing ⚠️ |
| src/compression_lz4.c | 97.59% | 2 Missing ⚠️ |
| src/unit/test_compression.cpp | 99.89% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3531      +/-   ##
============================================
+ Coverage     76.40%   76.79%   +0.39%     
============================================
  Files           159      164       +5     
  Lines         79851    81681    +1830     
============================================
+ Hits          61008    62728    +1720     
- Misses        18843    18953     +110     

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/config.c | 78.77% <100.00%> (+0.43%) | ⬆️ |
| src/rdb.h | 100.00% <ø> (ø) | |
| src/rio.h | 100.00% <ø> (ø) | |
| src/server.h | 100.00% <ø> (ø) | |
| src/unit/test_compression.cpp | 99.89% <99.89%> (ø) | |
| src/compression_lz4.c | 97.59% <97.59%> (ø) | |
| src/aof.c | 80.29% <78.94%> (-0.02%) | ⬇️ |
| src/compression.c | 91.54% <91.54%> (ø) | |
| src/rdb.c | 77.07% <87.09%> (-0.05%) | ⬇️ |
| src/rio.c | 84.98% <77.50%> (+0.72%) | ⬆️ |

... and 3 more

... and 23 files with indirect coverage changes


sarthakaggarwal97 changed the title from "Compression: Add Streaming RIO-based RDB compression" to "Add Streaming Compression support for RDB" on Apr 20, 2026
sarthakaggarwal97 changed the title from "Add Streaming Compression support for RDB" to "Streaming Compression support for RDB" on Apr 20, 2026
sarthakaggarwal97 marked this pull request as ready for review on April 21, 2026 16:13
some perf improvements and bug fixes

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>