Streaming Compression support for RDB by sarthakaggarwal97 · Pull Request #3531 · valkey-io/valkey

sarthakaggarwal97 · 2026-04-17T23:35:39Z

Today, rdbcompression only affects individual string payloads inside an otherwise normal RDB stream. This PR introduces the Valkey Compressed Stream (VKCS) format for RDB persistence, with lz4 as the first supported codec.
With VKCS, the RDB can be wrapped in an envelope and compressed as a single stream at the rio layer. The default behavior (lzf) remains unchanged; setting rdb-compression-algo lz4 enables the new streaming format.

`VKCS`

A VKCS stream is identified by a header at the start of the file and carries enough metadata to classify the stream before loading it, including the stream kind and codec/checksum flags. On load, the RDB path probes for this envelope first. If present and valid, the input is transparently decompressed before normal RDB parsing continues. If absent, loading falls back to the existing uncompressed RDB path. If the envelope is malformed or incompatible, the load fails early.
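The three load outcomes described above (envelope absent, valid, or malformed/incompatible) can be sketched as a small probe function. This is only an illustration: the header layout, the `VKCS_*` constants, and `vkcs_probe` are hypothetical names invented for this sketch, not the on-disk format or API from the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only:
 * magic[4] | version | kind | codec | flags */
#define VKCS_MAGIC "VKCS"
#define VKCS_HEADER_LEN 8
#define VKCS_VERSION 1
#define VKCS_KIND_RDB 1
#define VKCS_CODEC_LZ4 1
#define VKCS_FLAG_CHECKSUM 0x01

typedef enum { VKCS_ABSENT, VKCS_VALID, VKCS_INVALID } vkcs_probe_result;

typedef struct {
    uint8_t version, kind, codec, flags;
} vkcs_header;

/* Classify the first bytes of an input before any RDB parsing happens. */
static vkcs_probe_result vkcs_probe(const unsigned char *buf, size_t len, vkcs_header *out) {
    /* No magic: not an envelope, so the caller falls back to the regular RDB loader. */
    if (len < 4 || memcmp(buf, VKCS_MAGIC, 4) != 0) return VKCS_ABSENT;
    /* Magic present but header truncated: malformed, fail early. */
    if (len < VKCS_HEADER_LEN) return VKCS_INVALID;

    out->version = buf[4];
    out->kind = buf[5];
    out->codec = buf[6];
    out->flags = buf[7];

    /* Unknown version, kind, codec, or flag bits: incompatible, fail early. */
    if (out->version != VKCS_VERSION) return VKCS_INVALID;
    if (out->kind != VKCS_KIND_RDB) return VKCS_INVALID;
    if (out->codec != VKCS_CODEC_LZ4) return VKCS_INVALID;
    if (out->flags & ~VKCS_FLAG_CHECKSUM) return VKCS_INVALID;
    return VKCS_VALID;
}
```

A loader built on this would continue with the existing parser on `VKCS_ABSENT`, install the decompressing wrapper on `VKCS_VALID`, and abort on `VKCS_INVALID`.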

Streaming Compression Model

The main structural change is that streaming compression is implemented as a rio wrapper rather than as part of object serialization. That keeps the RDB serializer working against a normal byte stream while the wrapper handles VKCS framing, compression/decompression state, buffering, and checksum behavior.
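To make the layering concrete, here is a minimal sketch of the wrapper idea using a simplified byte-sink interface. It is not the real `rio` struct or the PR's code: `sink`, `mem_sink`, `wrap_sink`, the 1-byte length-prefix framing, and the pass-through `copy_encode` codec are all assumptions chosen to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a rio-like stream: one write callback. */
typedef struct sink sink;
struct sink {
    int (*write)(sink *s, const void *buf, size_t len); /* 1 = ok, 0 = error */
};

/* In-memory sink standing in for the output file. */
typedef struct {
    sink base;
    unsigned char data[256];
    size_t used;
} mem_sink;

static int mem_write(sink *s, const void *buf, size_t len) {
    mem_sink *m = (mem_sink *)s; /* valid: base is the first member */
    if (len > sizeof(m->data) - m->used) return 0;
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    return 1;
}

/* Codec callback: encodes one block, returns bytes produced (0 on error). */
typedef size_t (*encode_fn)(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_cap);

/* Placeholder codec that copies bytes through; a real codec would compress. */
static size_t copy_encode(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap) {
    if (in_len > out_cap) return 0;
    memcpy(out, in, in_len);
    return in_len;
}

#define BLOCK_SIZE 16

/* Wrapper: buffers plain bytes, encodes full blocks, forwards framed output. */
typedef struct {
    sink base;
    sink *inner;
    encode_fn encode;
    unsigned char block[BLOCK_SIZE];
    size_t fill;
} wrap_sink;

/* Encode whatever is buffered and emit it as [length byte][payload]. */
static int wrap_flush(wrap_sink *w) {
    unsigned char out[2 * BLOCK_SIZE];
    if (w->fill == 0) return 1;
    size_t n = w->encode(w->block, w->fill, out, sizeof(out));
    if (n == 0) return 0;
    unsigned char frame_len = (unsigned char)n;
    w->fill = 0;
    return w->inner->write(w->inner, &frame_len, 1) &&
           w->inner->write(w->inner, out, n);
}

static int wrap_write(sink *s, const void *buf, size_t len) {
    wrap_sink *w = (wrap_sink *)s;
    const unsigned char *p = buf;
    while (len > 0) {
        size_t room = BLOCK_SIZE - w->fill;
        size_t n = len < room ? len : room;
        memcpy(w->block + w->fill, p, n);
        w->fill += n;
        p += n;
        len -= n;
        if (w->fill == BLOCK_SIZE && !wrap_flush(w)) return 0;
    }
    return 1;
}

static void wrap_init(wrap_sink *w, sink *inner, encode_fn encode) {
    w->base.write = wrap_write;
    w->inner = inner;
    w->encode = encode;
    w->fill = 0;
}
```

The point of the design is that the producer only ever calls `write` on a `sink`; whether that sink is the raw file or the framing wrapper is decided once, when the stream is set up, which mirrors how the serializer can stay unaware of VKCS.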

There are now effectively two RDB compression modes. lzf remains the default and preserves the old behavior of compressing individual string payloads inside the RDB. lz4 enables whole-stream compression for file-backed RDB persistence. When the streaming wrapper is active, the per-string LZF path is skipped so we do not layer payload compression on top of whole-stream compression.
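The rule that payload compression is skipped under whole-stream compression can be expressed as a single predicate. The sketch below is an assumption about how such a check might look: `should_lzf_string` and its parameters are hypothetical, and the 20-byte minimum is assumed to mirror the existing per-string LZF heuristic rather than taken from this PR.

```c
#include <stddef.h>

/* Assumed minimum payload size below which LZF is not attempted. */
#define LZF_MIN_LEN 20

/* Decide whether a string payload should go through per-string LZF.
 * stream_wrapper_active: nonzero when the whole-stream wrapper is in use. */
static int should_lzf_string(int rdbcompression, int stream_wrapper_active, size_t len) {
    if (!rdbcompression) return 0;       /* compression disabled entirely */
    if (stream_wrapper_active) return 0; /* avoid compressing twice */
    return len > LZF_MIN_LEN;            /* tiny strings are stored raw */
}
```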

Config

This PR adds:

rdb-compression-algo, which accepts lzf (the default) or lz4
rdb-compression-level, for codecs that support levels (currently only lz4)
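
As an illustration, enabling the streaming format in a config file might look like the fragment below. The option names come from this PR; the level value is a placeholder assumption, since the valid range and default are not stated here.

```
# Switch RDB persistence to whole-stream compression (VKCS + lz4).
rdb-compression-algo lz4
# Placeholder value; check the PR for the supported range and default.
rdb-compression-level 1
```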

This PR is intentionally scoped to RDB streaming compression and the persistence/load paths that need to understand it. It does not add diskless sync or replication compression APIs.

Benchmarks

Benchmarked on r7g.2xlarge (Graviton, 8 vCPUs, 61GB RAM, NVMe). All results averaged over 3 repeats.

Datasets

Improved Realistic JSON: Synthetic JSON with natural language text, varied field types, per-key PRNG for high variety. Tested at 100B–10KB value sizes.

BlockMesh Tweets: 1M/5M unique real tweets from BlockMesh/tweets. Multilingual, avg ~270B, zero cross-key repetition.

Notes:

LZ4 streaming beats LZF on every metric at every size tested: 30-77% faster saves, 24-73% faster loads, and 45-73% smaller RDBs.
The LZ4 library is currently vendored from https://github.com/lz4/lz4. The decision to vendor it was made in "[NEW] RDB Compression via LZ4, Batching, and Batch-Level Dictionary" #1962 (comment).

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

codecov · 2026-04-18T00:11:47Z

Codecov Report

❌ Patch coverage is 93.04164% with 132 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.79%. Comparing base (b2d08c9) to head (616d763).
⚠️ Report is 2 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/compression_stream.c | 90.27% | 39 Missing ⚠️ |
| src/compression_rio.c | 79.20% | 26 Missing ⚠️ |
| src/valkey-check-rdb.c | 61.29% | 24 Missing ⚠️ |
| src/rio.c | 77.50% | 18 Missing ⚠️ |
| src/rdb.c | 87.09% | 12 Missing ⚠️ |
| src/compression.c | 91.54% | 6 Missing ⚠️ |
| src/aof.c | 78.94% | 4 Missing ⚠️ |
| src/compression_lz4.c | 97.59% | 2 Missing ⚠️ |
| src/unit/test_compression.cpp | 99.89% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3531      +/-   ##
============================================
+ Coverage     76.40%   76.79%   +0.39%     
============================================
  Files           159      164       +5     
  Lines         79851    81681    +1830     
============================================
+ Hits          61008    62728    +1720     
- Misses        18843    18953     +110

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/config.c | `78.77% <100.00%> (+0.43%)` | ⬆️ |
| src/rdb.h | `100.00% <ø> (ø)` | |
| src/rio.h | `100.00% <ø> (ø)` | |
| src/server.h | `100.00% <ø> (ø)` | |
| src/unit/test_compression.cpp | `99.89% <99.89%> (ø)` | |
| src/compression_lz4.c | `97.59% <97.59%> (ø)` | |
| src/aof.c | `80.29% <78.94%> (-0.02%)` | ⬇️ |
| src/compression.c | `91.54% <91.54%> (ø)` | |
| src/rdb.c | `77.07% <87.09%> (-0.05%)` | ⬇️ |
| src/rio.c | `84.98% <77.50%> (+0.72%)` | ⬆️ |

... and 3 more

... and 23 files with indirect coverage changes



github-actions Bot assigned sarthakaggarwal97 Apr 17, 2026

compression: add streaming rio-based RDB compression

616d763

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 force-pushed the streaming-compression-rio-pr branch from c77cc98 to 616d763 Compare April 17, 2026 23:39

sarthakaggarwal97 changed the title ~~Compression: Add Streaming RIO-based RDB compression~~ Add Streaming Compression support for RDB Apr 20, 2026

sarthakaggarwal97 changed the title ~~Add Streaming Compression support for RDB~~ Streaming Compression support for RDB Apr 20, 2026

sarthakaggarwal97 marked this pull request as ready for review April 21, 2026 16:13

some perf improvements and bug fixes

53f7ed8

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 wants to merge 2 commits into valkey-io:unstable from sarthakaggarwal97:streaming-compression-rio-pr
