fix: key encode use little-endian #262

Open
happy-v587 wants to merge 2 commits into arana-db:main from happy-v587:fix-encode-decode

Conversation

happy-v587 (Collaborator) commented Apr 12, 2026

Summary by CodeRabbit

  • Documentation

    • Added comprehensive doc describing key layout and on-disk encoding for all data types.
  • New Features

    • Introduced a ZSet score→member key format and corresponding types/APIs for encoding, decoding, and seek operations.
  • Refactor

    • Reorganized internal formatting modules and public exports for consistent naming.
  • Bug Fixes

    • Fixed byte-ordering for version/score encoding/parsing to ensure correct ordering.
  • Tests

    • Expanded unit tests for score/key ordering and many edge cases (NaN, infinities, ±0, boundaries).
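The byte-ordering fix above can be illustrated with a small, hedged sketch (the helper names are hypothetical, not this crate's API): an f64 score is stored as its IEEE-754 bit pattern in little-endian order, which round-trips exactly, including NaN and signed zero:

```rust
// Hypothetical helpers illustrating the little-endian score encoding;
// the crate's real API (ZSetsScoreKey) wraps this in full key encoding.
fn encode_score_le(score: f64) -> [u8; 8] {
    // Least significant byte first, as put_u64_le would write it.
    score.to_bits().to_le_bytes()
}

fn decode_score_le(bytes: [u8; 8]) -> f64 {
    f64::from_bits(u64::from_le_bytes(bytes))
}

fn main() {
    for s in [0.0, -0.0, 1.5, f64::NAN, f64::INFINITY, f64::NEG_INFINITY] {
        let back = decode_score_le(encode_score_le(s));
        // Compare bit patterns: NaN != NaN numerically, but the bits are preserved.
        assert_eq!(back.to_bits(), s.to_bits());
    }
    println!("score round trips preserved bit-for-bit");
}
```

Note that raw little-endian bytes do not sort numerically under memcmp, which is why the PR also touches the custom comparator.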

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

This change renames many internal format modules to format_*, adds a new format_zset_score_key.rs with ZSet score-member encoding/decoding and tests, removes the old zset module, switches MemberDataKey version endianness to little-endian, updates imports/usages across storage, and adds doc/01-key编解码.md.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Documentation | doc/01-key编解码.md | Added key-encoding document describing RocksDB-style key layouts and per-type on-disk formats and access flows. |
| ZSet score key (new) | src/storage/src/format_zset_score_key.rs, src/storage/src/custom_comparator.rs | Added ZSetsScoreKey/ParsedZSetsScoreKey and ScoreMember types, plus extensive encode/decode and comparator tests (NaN, ±Inf, ±0, versions, ordering). |
| ZSet score key (removed) | src/storage/src/zset_score_key_format.rs | Removed the previous zset score key implementation (functionality moved to format_zset_score_key.rs). |
| Module renames & re-exports | src/storage/src/lib.rs | Renamed many modules from `*_format` to `format_*`, added `mod format_zset_score_key`, and rewired public re-exports. |
| Member key endianness | src/storage/src/format_member_data_key.rs | Changed version serialization/deserialization from big-endian to little-endian. |
| Score decoding & comparator tests | src/storage/src/redis_multi.rs, src/storage/src/custom_comparator.rs | Changed score byte decoding to little-endian and added comparator/unit tests covering ordering semantics. |
| Sets: use MemberDataKey API | src/storage/src/redis_sets.rs | Replaced manual BytesMut member-key construction with MemberDataKey::new(...).encode() and encode_seek_key() for prefixes. |
| Import path updates | src/storage/src/... | Updated many files to import from `format_*` modules (data_compaction_filter.rs, meta_compaction_filter.rs, expiration_manager.rs, format_base_*.rs, redis_*.rs, storage.rs, storage_impl.rs, statistics.rs, tests/*, etc.). |
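Since little-endian f64 bytes do not compare numerically (memcmp sees the least significant bytes first, and negative floats have larger bit patterns than positive ones), a custom comparator must impose the ordering. One common mapping, shown here as a generic sketch and not necessarily the comparator the PR ships, converts each score into a u64 whose integer order matches numeric order:

```rust
// Map an f64 to a u64 whose unsigned ordering matches numeric ordering.
// Generic "ordered float bits" trick; not necessarily the PR's comparator.
fn orderable_bits(score: f64) -> u64 {
    let bits = score.to_bits();
    if bits & (1 << 63) != 0 {
        !bits // negative numbers: flip all bits so larger values get larger keys
    } else {
        bits | (1 << 63) // non-negatives: set the sign bit to sort above all negatives
    }
}

fn main() {
    let scores = [f64::NEG_INFINITY, -1.5, -0.0, 0.0, 1.5, f64::INFINITY];
    for pair in scores.windows(2) {
        // Each adjacent pair must be strictly ordered under the mapping.
        assert!(orderable_bits(pair[0]) < orderable_bits(pair[1]));
    }
    println!("total order over scores holds");
}
```

Under this mapping even -0.0 sorts strictly below 0.0, which matters for the ±0 edge cases the new tests cover.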

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

☢️ Bug

Suggested reviewers

  • marsevilspirit

Poem

🐰 I hopped through modules, swift and keen,
Renamed the paths to keep things clean,
Little-endian tuck, ZSets learn to sing,
Keys encoded snug — a joyful spring,
🥕 Hop, parse, and bloom in bytes I bring.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: key encode use little-endian' directly relates to the main objective of this PR, which standardizes key encoding across multiple modules to use little-endian format instead of big-endian, as evidenced by changes in MemberDataKey, ZSetsScoreKey, and corresponding updates throughout the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


happy-v587 force-pushed the fix-encode-decode branch 2 times, most recently from ec39730 to 6bf161a on April 12, 2026 at 01:14

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/storage/src/format_member_data_key.rs (1)

98-125: ⚠️ Potential issue | 🟠 Major

Return Err for short keys instead of panicking.

This parser still subtracts SUFFIX_RESERVE_LENGTH and uses expect before validating the buffer size. A truncated or corrupted key will panic in scan paths rather than surface InvalidFormat, which is much harder to recover from in production.
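A minimal sketch of the requested defensive pattern (the constants, layout, and error type are simplified stand-ins, not the crate's real definitions; the real key also carries the user key and member between the reserves): validate the length up front and return an error instead of slicing blindly:

```rust
use std::mem::size_of;

// Simplified stand-ins; the real crate defines its own lengths and error type.
const PREFIX_RESERVE_LENGTH: usize = 8;
const SUFFIX_RESERVE_LENGTH: usize = 16;

#[derive(Debug, PartialEq)]
struct InvalidFormat;

/// Read the little-endian u64 version assumed (in this sketch) to sit just
/// before the suffix, returning Err on short or truncated buffers.
fn parse_version(encoded_key: &[u8]) -> Result<u64, InvalidFormat> {
    let min_len = PREFIX_RESERVE_LENGTH + size_of::<u64>() + SUFFIX_RESERVE_LENGTH;
    if encoded_key.len() < min_len {
        return Err(InvalidFormat); // surface the error; never index out of range
    }
    let end = encoded_key.len() - SUFFIX_RESERVE_LENGTH;
    let start = end - size_of::<u64>();
    let bytes: [u8; 8] = encoded_key[start..end]
        .try_into()
        .map_err(|_| InvalidFormat)?; // no expect(): corrupt keys become errors
    Ok(u64::from_le_bytes(bytes))
}

fn main() {
    assert_eq!(parse_version(&[0u8; 5]), Err(InvalidFormat));
    let mut buf = vec![0u8; PREFIX_RESERVE_LENGTH];
    buf.extend_from_slice(&42u64.to_le_bytes());
    buf.extend_from_slice(&[0u8; SUFFIX_RESERVE_LENGTH]);
    assert_eq!(parse_version(&buf), Ok(42));
    println!("short keys rejected, valid keys parsed");
}
```

The same check-before-slice shape applies to the delimiter seek and the reserve copies the comment mentions.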

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/format_member_data_key.rs` around lines 98 - 125, The
new(encoded_key: &[u8]) parser may panic on short/truncated buffers; before
taking slices or calling try_into/expect you must validate encoded_key.len() is
at least PREFIX_RESERVE_LENGTH + SUFFIX_RESERVE_LENGTH + size_of::<u64>() and
that seek_userkey_delim(...) returns a delimiter within the available range; if
any check fails return Err(InvalidFormat) instead of slicing/expect-ing.
Concretely, in new ensure end_idx is not less than start_idx, validate
key_end_idx <= end_idx and version_end_idx <= end_idx, and replace the
version_slice.try_into().expect(...) with a safe conversion that returns
Err(InvalidFormat) when the slice length is wrong; perform similar bounds checks
before copying reserve slices (reserve1/reserve2) and decoding the user key
(decode_user_key) so no slice/try_into can panic.
src/storage/src/redis_strings.rs (1)

2189-2213: ⚠️ Potential issue | 🟠 Major

Unreachable else if drops non-string keys from scan_keys/random_key.

At Line 2189 and Line 2201 (and similarly Line 2311 and Line 2320), both branches use ParsedBaseKey::new(...). Since that parse succeeds for valid meta keys too, the else if branch never executes, so hash/set/zset/list meta-value paths are skipped.

🔧 Suggested fix (apply same pattern in both functions)
-            if let Ok(base_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..]) {
-                let key_str = String::from_utf8_lossy(base_key.key());
-
-                // Check if key is not expired
-                if let Ok(parsed) = ParsedStringsValue::new(&value_bytes[..]) {
-                    if !parsed.is_stale() {
-                        // Simple pattern matching - in a full implementation, we'd support glob patterns
-                        if pattern == "*" || key_str.contains(pattern) {
-                            keys.push(key_str.to_string());
-                        }
-                    }
-                }
-            } else if let Ok(meta_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..])
-            {
-                let key_str = String::from_utf8_lossy(meta_key.key());
-
-                // Check if meta key is not expired using appropriate parser
-                if !value_bytes.is_empty() {
-                    let is_valid = if let Ok(meta) =
-                        crate::format_base_meta_value::ParsedBaseMetaValue::new(&value_bytes[..])
-                    {
-                        !meta.is_stale() && meta.count() > 0
-                    } else if let Ok(list_meta) =
-                        crate::format_list_meta_value::ParsedListsMetaValue::new(&value_bytes[..])
-                    {
-                        !list_meta.is_stale() && list_meta.count() > 0
-                    } else {
-                        false
-                    };
-
-                    if is_valid {
-                        // Simple pattern matching
-                        if pattern == "*" || key_str.contains(pattern) {
-                            keys.push(key_str.to_string());
-                        }
-                    }
-                }
-            }
+            if let Ok(parsed_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..]) {
+                let key_str = String::from_utf8_lossy(parsed_key.key());
+                let is_valid = if let Ok(parsed) = ParsedStringsValue::new(&value_bytes[..]) {
+                    !parsed.is_stale()
+                } else if let Ok(meta) =
+                    crate::format_base_meta_value::ParsedBaseMetaValue::new(&value_bytes[..])
+                {
+                    !meta.is_stale() && meta.count() > 0
+                } else if let Ok(list_meta) =
+                    crate::format_list_meta_value::ParsedListsMetaValue::new(&value_bytes[..])
+                {
+                    !list_meta.is_stale() && list_meta.count() > 0
+                } else {
+                    false
+                };
+
+                if is_valid && (pattern == "*" || key_str.contains(pattern)) {
+                    keys.push(key_str.to_string());
+                }
+            }

Use the same single-parse/value-dispatch structure in random_key so non-string live keys are eligible there too.

Also applies to: 2311-2329

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis_strings.rs` around lines 2189 - 2213, The bug is that
ParsedBaseKey::new(...) is called twice so the first branch always wins and
meta-type keys (hash/set/zset/list) never reach their parsers; fix both
scan_keys and random_key by parsing the base key only once (call
ParsedBaseKey::new(...) a single time), then attempt value parsing in order: try
ParsedStringsValue::new(...) and handle non-stale string values, otherwise try
the meta parsers (ParsedBaseMetaValue::new, ParsedListsMetaValue::new,
ParsedHashMetaValue::new, etc.) and handle their non-stale/count>0 checks to
include the key; replace the duplicate else-if that re-parses ParsedBaseKey with
this single-parse/value-dispatch pattern so all non-string live keys are
considered.
🧹 Nitpick comments (1)
src/storage/src/format_zset_score_key.rs (1)

67-86: Minor inefficiency: user key encoded twice.

The user key is encoded at line 69 for capacity estimation and again at line 79. Consider reusing the first encoding.

♻️ Suggested optimization
     pub fn encode(&self) -> Result<BytesMut> {
-        let mut encoded_key_buf = BytesMut::new();
-        encode_user_key(&self.key, &mut encoded_key_buf)?;
         let estimated_cap = PREFIX_RESERVE_LENGTH
-            + encoded_key_buf.len()
+            + self.key.len() * 2 + 2  // worst case: every byte escaped + delimiter
             + size_of::<u64>()  // version
             + size_of::<f64>()  // score
             + self.member.len()
             + SUFFIX_RESERVE_LENGTH;
         let mut dst = BytesMut::with_capacity(estimated_cap);

         dst.put_slice(&self.reserve1);
         encode_user_key(&self.key, &mut dst)?;
         // Use little-endian for version and score (custom comparator expects this)
         dst.put_u64_le(self.version);
         dst.put_u64_le(self.score.to_bits());
         dst.put_slice(&self.member);
         dst.put_slice(&self.reserve2);
         Ok(dst)
     }

Alternatively, keep the current approach if exact capacity is preferred over performance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/format_zset_score_key.rs` around lines 67 - 86, In encode()
the user key is encoded twice (encoded_key_buf created by
encode_user_key(&self.key, &mut encoded_key_buf) and then
encode_user_key(&self.key, &mut dst) again); replace the second encode call by
reusing encoded_key_buf (e.g., dst.put_slice(&encoded_key_buf)) so you only
encode the key once, keep the existing capacity calculation (use
encoded_key_buf.len() already present), and ensure the rest of writes
(dst.put_u64_le(self.version), dst.put_u64_le(self.score.to_bits()),
dst.put_slice(&self.member), dst.put_slice(&self.reserve2)) remain the same in
the encode() method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/01-key编解码.md`:
- Line 16: Fix the terminology typos in the key-format descriptions in 01-key编解码.md: change every occurrence of "Filed"/"filed" to "Field"/"field" and "memer" to "member" (e.g., in the sentence describing the Hash data structure, "每个 Filed 对应一条普通数据", and similar passages); keep capitalization consistent and verify no other typos remain so the description stays uniform and clear.

In `@doc/文档与代码实现差异分析.md`:
- Around line 84-85: The document shows "初始右边界: `9223372036854775808`
(INT64_MIN)" which is ambiguous; update the text to correctly state signed
INT64_MIN as `-9223372036854775808` if you mean the signed minimum, or
explicitly note that `9223372036854775808` is the unsigned/u64 bit-pattern
equivalent of INT64_MIN (two's‑complement) if that is the intended meaning;
ensure you reference the symbols INT64_MIN and INT64_MAX in the clarification so
readers know which interpretation is used.

In `@src/storage/src/format_zset_score_key.rs`:
- Around line 20-22: The import list in format_zset_score_key.rs includes an
unused symbol ENCODED_KEY_DELIM_SIZE; remove ENCODED_KEY_DELIM_SIZE from the use
declaration that currently imports ENCODED_KEY_DELIM_SIZE, decode_user_key,
encode_user_key, seek_userkey_delim so only the actually used symbols
(decode_user_key, encode_user_key, seek_userkey_delim) remain to eliminate the
unused import warning.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a5f5014-3b52-4029-a257-2603e7cdfc36

📥 Commits

Reviewing files that changed from the base of the PR and between 5e51942 and 285da3f.

⛔ Files ignored due to path filters (6)
  • doc/images/01-cf 使用.png is excluded by !**/*.png
  • doc/images/01-hash编码.png is excluded by !**/*.png
  • doc/images/01-list编码.png is excluded by !**/*.png
  • doc/images/01-set 编码.png is excluded by !**/*.png
  • doc/images/01-string 编码.png is excluded by !**/*.png
  • doc/images/01-zset编码.png is excluded by !**/*.png
📒 Files selected for processing (28)
  • doc/01-key编解码.md
  • doc/文档与代码实现差异分析.md
  • src/storage/src/custom_comparator.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/format_base_key.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/format_lists_data_key.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_zset_score_key.rs
  • src/storage/src/lib.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/redis_multi.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/statistics.rs
  • src/storage/src/storage.rs
  • src/storage/src/storage_impl.rs
  • src/storage/src/zset_score_key_format.rs
  • src/storage/tests/redis_basic_test.rs
💤 Files with no reviewable changes (1)
  • src/storage/src/zset_score_key_format.rs

Comment thread doc/01-key编解码.md

## Hash 结构的存储

Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 Filed 对应一条普通数据。具体格式如下图所示。

⚠️ Potential issue | 🟡 Minor

Fix terminology typos in key format descriptions.

There are several typos (Filed, filed, memer) that make the encoding spec harder to follow.

✏️ Suggested doc patch
-Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 Filed 对应一条普通数据。具体格式如下图所示。
+Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 field 对应一条普通数据。具体格式如下图所示。

-普通数据主要就是指的同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段,编码后的 user key,元数据中的最新 version,filed 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。
+普通数据主要就是指同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段、编码后的 user key、元数据中的最新 version、field 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。

-zset 由两部分构成,元数据(meta_key, meta_value), 普通数据(data_key, data_value)。元数据中存储的主要是 zset 集合的一些信息, 比如说当前 zset 集合中 member 的数量以及当前 zset 集合的版本号和过期时间(用做秒删功能), 而普通数据就是指的 zset 中每个 member 以及对应的 score。由于 zset 这种数据结构比较特殊,需要按照 memer 进行排序,也需要按照 score 进行排序,所以我们对于每一个 zset 我们会按照不同的格式存储两份普通数据, 在这里我们称为 member to score 和 score to member,作为具体最后 RocksDB 落盘的 KV 格式,具体格式如下:
+zset 由两部分构成,元数据(meta_key, meta_value) 和普通数据(data_key, data_value)。元数据中存储的主要是 zset 集合的一些信息,比如当前 zset 集合中 member 的数量以及当前 zset 集合的版本号和过期时间(用于秒删功能),而普通数据就是指 zset 中每个 member 以及对应的 score。由于 zset 这种数据结构比较特殊,需要按照 member 排序,也需要按照 score 排序,所以我们会按不同格式存储两份普通数据,在这里称为 member to score 和 score to member。作为最终 RocksDB 落盘的 KV 格式,具体如下:

Also applies to: 22-22, 50-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/01-key编解码.md` at line 16, fix the terminology typos in the key-format descriptions: change every occurrence of "Filed"/"filed" to "Field"/"field" and "memer" to "member" (e.g., in the sentence describing the Hash data structure, "每个 Filed 对应一条普通数据", and similar passages); keep capitalization consistent and verify no other typos remain so the description stays uniform and clear.

Comment on lines +84 to +85
- 初始左边界: `9223372036854775807` (INT64_MAX)
- 初始右边界: `9223372036854775808` (INT64_MIN)

⚠️ Potential issue | 🟡 Minor

Clarify INT64_MIN representation to avoid numeric ambiguity.

Line 85 currently shows a positive literal with an INT64_MIN label. For signed semantics, INT64_MIN is -9223372036854775808. If you mean the u64 bit-pattern representation, please state that explicitly.
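The relationship is easy to verify in Rust: the positive literal is exactly the two's-complement bit pattern of i64::MIN reinterpreted as u64:

```rust
fn main() {
    // INT64_MIN as a signed value.
    assert_eq!(i64::MIN.to_string(), "-9223372036854775808");
    // The same two's-complement bit pattern read as unsigned (0x8000_0000_0000_0000).
    assert_eq!(i64::MIN as u64, 9_223_372_036_854_775_808);
    println!("both interpretations share one bit pattern");
}
```

So the doc is not numerically wrong if it means the u64 bit pattern, but it should say which interpretation it intends.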

📝 Suggested clarification
-- 初始右边界: `9223372036854775808` (INT64_MIN)
+- 初始右边界: `-9223372036854775808` (INT64_MIN,按 i64 语义)
+- 若按 u64 存储其补码位型,则对应为 `9223372036854775808`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- 初始左边界: `9223372036854775807` (INT64_MAX)
- 初始右边界: `9223372036854775808` (INT64_MIN)
- 初始左边界: `9223372036854775807` (INT64_MAX)
- 初始右边界: `-9223372036854775808` (INT64_MIN,按 i64 语义)
- 若按 u64 存储其补码位型,则对应为 `9223372036854775808`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/文档与代码实现差异分析.md` around lines 84 - 85, The document shows "初始右边界:
`9223372036854775808` (INT64_MIN)" which is ambiguous; update the text to
correctly state signed INT64_MIN as `-9223372036854775808` if you mean the
signed minimum, or explicitly note that `9223372036854775808` is the
unsigned/u64 bit-pattern equivalent of INT64_MIN (two's‑complement) if that is
the intended meaning; ensure you reference the symbols INT64_MIN and INT64_MAX
in the clarification so readers know which interpretation is used.

Comment thread src/storage/src/format_member_data_key.rs
Comment thread src/storage/src/format_zset_score_key.rs Outdated
Comment thread src/storage/src/redis_multi.rs Outdated

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/storage/src/redis_sets.rs (1)

718-766: ⚠️ Potential issue | 🔴 Critical

SMOVE incorrectly deletes the member when source == destination.

Line 765 deletes the source member first, but in the same-key case the branch at Line 781 never writes it back, causing data loss. The same-key case must short-circuit before reaching the delete path.

🐛 Proposed fix
         let mut batch = self.create_batch()?;
 
+        // Same source/destination should be a no-op to avoid destructive delete.
+        if source == destination {
+            return Ok(true);
+        }
+
         // Remove from source
         batch.delete(ColumnFamilyIndex::SetsDataCF, &source_member_key)?;
         let new_source_count = source_meta.count() - 1;

Also applies to: 780-806

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis_sets.rs` around lines 718 - 766, The SMOVE
implementation deletes the source member before handling the destination and
doesn't short-circuit when source == destination, causing data loss; update the
logic in the SMOVE handling (around MemberDataKey::new(...).encode(),
ParsedSetsMetaValue handling, and the batch delete/insert flow) to detect when
source and destination refer to the same encoded key/version (or identical set
identity) and return early (e.g., no-op with Ok(false/true as appropriate) or
skip the delete/write) so you never delete then skip rewriting in same-key
scenarios; ensure this short-circuit occurs before executing
batch.delete(ColumnFamilyIndex::SetsDataCF, &source_member_key) and before
modifying source_meta.count().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e25050a0-eb1e-47c3-a224-ac1f1e0bedb0

📥 Commits

Reviewing files that changed from the base of the PR and between 285da3f and 6bf161a.

⛔ Files ignored due to path filters (6)
  • doc/images/01-cf 使用.png is excluded by !**/*.png
  • doc/images/01-hash编码.png is excluded by !**/*.png
  • doc/images/01-list编码.png is excluded by !**/*.png
  • doc/images/01-set 编码.png is excluded by !**/*.png
  • doc/images/01-string 编码.png is excluded by !**/*.png
  • doc/images/01-zset编码.png is excluded by !**/*.png
📒 Files selected for processing (27)
  • doc/01-key编解码.md
  • src/storage/src/custom_comparator.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/format_base_key.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/format_lists_data_key.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_zset_score_key.rs
  • src/storage/src/lib.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/redis_multi.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/statistics.rs
  • src/storage/src/storage.rs
  • src/storage/src/storage_impl.rs
  • src/storage/src/zset_score_key_format.rs
  • src/storage/tests/redis_basic_test.rs
💤 Files with no reviewable changes (1)
  • src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (15)
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/storage.rs
  • src/storage/src/redis.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/tests/redis_basic_test.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/statistics.rs
  • src/storage/src/redis_zsets.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/storage/src/redis_multi.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/lib.rs
  • src/storage/src/custom_comparator.rs
  • src/storage/src/format_zset_score_key.rs

Comment thread doc/01-key编解码.md
Comment on lines +20 to +24
元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间,version 字段用于实现秒删功能。

普通数据主要就是指的同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段,编码后的 user key,元数据中的最新 version,filed 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。

当执行读写以及删除单个 field 操作时,首先获取元信息,之后解析出 version,再将主 key,version 和 field字段拼接得到最终的 data key,最后读写 data key。

⚠️ Potential issue | 🟡 Minor

Please state the byte order of the version field (little-endian) explicitly in the document.

The text mentions concatenating version several times but never specifies its byte order, which invites ambiguity across modules and languages. State u64 little-endian at the first definition and have each structure reuse that convention.

📝 Suggested doc patch
-元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间,version 字段用于实现秒删功能。
+元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间。version 字段使用 u64 little-endian 编码,并用于实现秒删功能。

Also applies to: 38-43, 50-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/01-key编解码.md` around lines 20 - 24, The docs omit the byte order for the
version field; update the first definition of the "version" field in the
metadata section to state it is a u64 in little-endian (little-endian byte
order) and add a short sentence reminding readers that every reuse of "version"
in keys (e.g., meta key, data key assembly where you concatenate main key +
version + field) follows this u64 little-endian convention; ensure the same
clarification is added to the other referenced sections (lines referenced in the
comment) that describe "version", "元数据", "data key" and "普通数据" key composition
so all usages are explicit.

Signed-off-by: happy-v587 <code4happy@gmail.com>

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/storage/src/redis_multi.rs (1)

474-549: ⚠️ Potential issue | 🔴 Critical

Change helper methods to use little-endian encoding for consistency with format modules.

The commit "fix: key encode use little-endian" updated format_member_data_key.rs to use put_u64_le() for version encoding, but the newly added helper methods in redis_multi.rs still use to_be_bytes():

  • Line 480: encode_hash_data_prefix uses version.to_be_bytes()
  • Line 492: encode_hash_data_key uses version.to_be_bytes()
  • Line 507: encode_list_data_prefix uses version.to_be_bytes()
  • Lines 519, 522: encode_list_data_key uses version.to_be_bytes() and index.to_be_bytes()
  • Line 534: encode_zsets_member_prefix uses version.to_be_bytes()
  • Line 546: encode_zsets_score_prefix uses version.to_be_bytes()

These should be changed to to_le_bytes() to align with the format modules and the PR's intent.
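The fix is mechanical. As a hedged sketch (the helper below is a hypothetical stand-in for encode_hash_data_prefix, not the crate's code), `to_le_bytes()` produces exactly the bytes `BufMut::put_u64_le` would write, so prefixes built either way stay byte-compatible:

```rust
// Hypothetical prefix builder mirroring the helpers named above,
// switched to little-endian to match the format_* modules.
fn encode_prefix_le(user_key: &[u8], version: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(user_key.len() + 8);
    out.extend_from_slice(user_key);
    // version.to_le_bytes() yields the same 8 bytes as BufMut::put_u64_le.
    out.extend_from_slice(&version.to_le_bytes());
    out
}

fn main() {
    let le = encode_prefix_le(b"k", 1);
    // Little-endian: least significant byte first.
    assert_eq!(&le[1..], &[1u8, 0, 0, 0, 0, 0, 0, 0][..]);
    // Big-endian would have reversed the byte order for nonzero versions.
    assert_ne!(1u64.to_le_bytes(), 1u64.to_be_bytes());
    println!("prefix encoded little-endian");
}
```

Mixing endianness between these helpers and the format modules would make seek prefixes miss keys written by the other path, which is the consistency risk the comment flags.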

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis_multi.rs` around lines 474 - 549, Several helper
encoders in redis_multi.rs still use big-endian bytes (to_be_bytes()) for
version/index while format modules and format_member_data_key use little-endian;
update each occurrence in encode_hash_data_prefix, encode_hash_data_key,
encode_list_data_prefix, encode_list_data_key (both version and index),
encode_zsets_member_prefix, and encode_zsets_score_prefix to use to_le_bytes()
so encoding is consistent across modules. Ensure you replace
version.to_be_bytes() and index.to_be_bytes() with
version.to_le_bytes()/index.to_le_bytes() respectively and keep the same
push/extend order.
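The failure mode behind this finding is concrete: a prefix built with `to_be_bytes()` will never byte-match a key that was written with `to_le_bytes()`, so prefix scans silently miss everything. A std-only sketch (hypothetical flat layout, no reserve bytes):

```rust
// Build a key as user_key + version in little-endian, as the format modules do.
fn key_with_le_version(user_key: &[u8], version: u64) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.extend_from_slice(&version.to_le_bytes());
    k
}

fn main() {
    let version = 0x0102_0304_0506_0708u64;
    let stored = key_with_le_version(b"h", version); // written little-endian

    // A prefix built with the wrong (big-endian) byte order misses the key...
    let mut be_prefix = b"h".to_vec();
    be_prefix.extend_from_slice(&version.to_be_bytes());
    assert!(!stored.starts_with(&be_prefix));

    // ...while the little-endian prefix matches as expected.
    let mut le_prefix = b"h".to_vec();
    le_prefix.extend_from_slice(&version.to_le_bytes());
    assert!(stored.starts_with(&le_prefix));
}
```

Note that a palindromic version value (such as 0) encodes identically in both orders, which is why mixed-endianness bugs like this can survive naive tests.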
🧹 Nitpick comments (1)
src/storage/src/format_zset_score_key.rs (1)

65-84: Consider removing redundant encode_user_key call for efficiency.

encode_user_key is called twice: once at line 67 for capacity estimation and again at line 77 for actual encoding. This duplicates work.

♻️ Suggested optimization
 pub fn encode(&self) -> Result<BytesMut> {
-    let mut encoded_key_buf = BytesMut::new();
-    encode_user_key(&self.key, &mut encoded_key_buf)?;
-    let estimated_cap = PREFIX_RESERVE_LENGTH
-        + encoded_key_buf.len()
-        + size_of::<u64>()  // version
-        + size_of::<f64>()  // score
-        + self.member.len()
-        + SUFFIX_RESERVE_LENGTH;
-    let mut dst = BytesMut::with_capacity(estimated_cap);
-
-    dst.put_slice(&self.reserve1);
-    encode_user_key(&self.key, &mut dst)?;
+    // Encode user key once, then copy to final buffer
+    let mut encoded_key_buf = BytesMut::new();
+    encode_user_key(&self.key, &mut encoded_key_buf)?;
+    
+    let estimated_cap = PREFIX_RESERVE_LENGTH
+        + encoded_key_buf.len()
+        + size_of::<u64>()  // version
+        + size_of::<f64>()  // score
+        + self.member.len()
+        + SUFFIX_RESERVE_LENGTH;
+    let mut dst = BytesMut::with_capacity(estimated_cap);
+
+    dst.put_slice(&self.reserve1);
+    dst.put_slice(&encoded_key_buf);
     // Use little-endian for version and score (custom comparator expects this)
     dst.put_u64_le(self.version);

The same optimization applies to encode_seek_key().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/format_zset_score_key.rs` around lines 65 - 84, The encode()
implementation redundantly calls encode_user_key(&self.key, ...) twice (once
into encoded_key_buf for sizing and again when writing into dst); to fix, call
encode_user_key once into a temporary BytesMut (e.g., encoded_user_key), use
encoded_user_key.len() when computing estimated_cap, and then write
encoded_user_key into dst instead of calling encode_user_key a second time;
apply the same change pattern to encode_seek_key() so both functions reuse the
single encoded_user_key buffer for sizing and output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2817b46-4e04-4194-af5a-df0f0b108db7

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf161a and 114c64d.

⛔ Files ignored due to path filters (6)
  • doc/images/01-cf 使用.png is excluded by !**/*.png
  • doc/images/01-hash编码.png is excluded by !**/*.png
  • doc/images/01-list编码.png is excluded by !**/*.png
  • doc/images/01-set 编码.png is excluded by !**/*.png
  • doc/images/01-string 编码.png is excluded by !**/*.png
  • doc/images/01-zset编码.png is excluded by !**/*.png
📒 Files selected for processing (27)
  • doc/01-key编解码.md
  • src/storage/src/custom_comparator.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/format_base_key.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/format_lists_data_key.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_zset_score_key.rs
  • src/storage/src/lib.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/redis_multi.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/statistics.rs
  • src/storage/src/storage.rs
  • src/storage/src/storage_impl.rs
  • src/storage/src/zset_score_key_format.rs
  • src/storage/tests/redis_basic_test.rs
💤 Files with no reviewable changes (1)
  • src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (18)
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/tests/redis_basic_test.rs
  • src/storage/src/storage.rs
  • src/storage/src/redis.rs
  • src/storage/src/storage_impl.rs
  • src/storage/src/statistics.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/custom_comparator.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/lib.rs

Comment thread src/storage/src/format_member_data_key.rs
Comment thread src/storage/src/custom_comparator.rs
Signed-off-by: happy-v587 <code4happy@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/storage/src/redis_multi.rs (1)

26-26: ⚠️ Potential issue | 🔴 Critical

Add the missing rocksdb::IteratorMode and rocksdb::Direction imports.

This file uses IteratorMode::From and Direction::Forward at multiple locations, but neither type is imported from rocksdb, causing compilation errors.

Suggested fix
-use rocksdb::{WriteBatch, WriteOptions, ReadOptions};
+use rocksdb::{Direction, IteratorMode, ReadOptions, WriteBatch, WriteOptions};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis_multi.rs` at line 26, The code references
rocksdb::IteratorMode::From and rocksdb::Direction::Forward but they aren't
imported; update the existing use statement that currently imports WriteBatch,
WriteOptions, ReadOptions to also import IteratorMode and Direction (e.g., add
IteratorMode and Direction to the use rocksdb::{...} list) so symbols like
IteratorMode::From and Direction::Forward resolve in functions using them.
♻️ Duplicate comments (3)
doc/01-key编解码.md (2)

20-24: ⚠️ Potential issue | 🟡 Minor

Please state explicitly that version is encoded as u64 little-endian.

The whole point of this PR is the switch to little-endian, yet the doc still describes version generically, and the List section's prose describes the data key as just key + Index. All key/value descriptions that reuse version need to be pinned down uniformly here, otherwise cross-module implementations will keep drifting.

Also applies to: 42-42, 54-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/01-key编解码.md` around lines 20 - 24, The docs currently leave the encoding
of the version ambiguous; update all occurrences (including the metadata
description, the ordinary data key/value description, and the List segment where
data key was described as "key + Index") to explicitly state that the version
field is encoded as u64 little-endian and that every place referencing version
in keys/values uses this exact encoding; ensure you specify for the meta key
(metadata), the data key (key + version + field/index), and any List/index keys
that the version bytes are appended/parsed as a u64 little-endian value so
cross-module implementations use the same binary layout.

16-22: ⚠️ Potential issue | 🟡 Minor

Fix the misspelled terminology in the doc so implementers don't misread it.

There are still term errors such as Filed, filed, and memer; in an encoding-spec document these directly raise the cost of understanding the field/member layout.

Also applies to: 50-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/01-key编解码.md` around lines 16 - 22, Replace the misspelled terminology in
the document: change all occurrences of "Filed" and "filed" to "Field" and
"field", and "memer" to "member" so the descriptions of Hash, meta_key,
meta_value, data_key, data_value, version, and related key/value layout language
are accurate and consistent; ensure the key/value paragraph and any captions
(including the terms used around meta_key/meta_value and data_key/data_value,
and mentions of version/field/member) are updated accordingly and also apply the
same fixes in the related "50-50" document.
src/storage/src/custom_comparator.rs (1)

372-385: ⚠️ Potential issue | 🔴 Critical

Do not lock in Equal for 0.0/-0.0 or NaN bit-pattern collisions.

These assertions bake the comparator bug in: the encoded keys are byte-distinct, but zsets_score_key_compare() can still return Ordering::Equal when the numeric comparison collapses them and member is the same. A RocksDB comparator must not report equality for distinct keys. Either canonicalize -0.0/NaN during encoding, or use the raw score bits as a tie-breaker after numeric comparison.

Does RocksDB require a custom comparator to return equality only for keys that are truly identical in ordering, and what are the recommended approaches for handling distinct floating-point encodings such as `0.0` vs `-0.0` or different NaN payloads?

Also applies to: 455-463

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/custom_comparator.rs` around lines 372 - 385, The comparator
zsets_score_key_compare currently can return Ordering::Equal for keys whose
encoded bytes differ (e.g., 0.0 vs -0.0 or distinct NaN payloads), which breaks
RocksDB's requirement that equality implies identical keys; fix by either
canonicalizing floating scores during encoding in ZSetsScoreKey::encode
(normalize -0.0 to 0.0 and collapse/normalize NaN payloads) or, more
conservatively, keep numeric comparison but add a deterministic tie-breaker
using the raw f64 bit pattern (e.g., compare f64::to_bits()) and then the member
bytes so that zsets_score_key_compare never returns Equal for distinct byte
sequences (also update tests like
test_zsets_score_key_compare_positive_negative_zero accordingly).
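The tie-breaker variant of the fix can be sketched std-only: compare scores numerically first, then fall back to the raw bit pattern and the member bytes, so Equal is only ever returned for byte-identical encodings. This is an assumed shape — the real comparator also parses reserve bytes, key, and version before reaching the score:

```rust
use std::cmp::Ordering;

fn compare_score_member(
    a_score: f64,
    a_member: &[u8],
    b_score: f64,
    b_member: &[u8],
) -> Ordering {
    // Numeric comparison drives the user-visible score ordering.
    match a_score.partial_cmp(&b_score) {
        Some(Ordering::Equal) | None => {
            // Numeric tie (or NaN involved): fall back to the raw bit pattern
            // so 0.0 vs -0.0 and distinct NaN payloads never compare Equal,
            // then break remaining ties on the member bytes.
            a_score
                .to_bits()
                .cmp(&b_score.to_bits())
                .then_with(|| a_member.cmp(b_member))
        }
        Some(ord) => ord,
    }
}

fn main() {
    // 0.0 and -0.0 are numerically equal but must not be Equal as keys.
    assert_ne!(compare_score_member(0.0, b"m", -0.0, b"m"), Ordering::Equal);
    // Byte-identical encodings still compare Equal.
    assert_eq!(compare_score_member(1.5, b"m", 1.5, b"m"), Ordering::Equal);
    // Members break ties at equal scores.
    assert_eq!(compare_score_member(1.0, b"a", 1.0, b"b"), Ordering::Less);
}
```

The resulting order for the collapsed cases (bit-pattern order among numerically-equal floats) is arbitrary but deterministic, which is what a RocksDB comparator needs: a total order in which Equal implies identical keys.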

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 700a826c-1ee3-4257-8bc0-d54ccd82f4f9

📥 Commits

Reviewing files that changed from the base of the PR and between 114c64d and 2e58693.

⛔ Files ignored due to path filters (6)
  • doc/images/01-cf使用.png is excluded by !**/*.png
  • doc/images/01-hash编码.png is excluded by !**/*.png
  • doc/images/01-list编码.png is excluded by !**/*.png
  • doc/images/01-set编码.png is excluded by !**/*.png
  • doc/images/01-string编码.png is excluded by !**/*.png
  • doc/images/01-zset编码.png is excluded by !**/*.png
📒 Files selected for processing (27)
  • doc/01-key编解码.md
  • src/storage/src/custom_comparator.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/format_base_key.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/format_lists_data_key.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_zset_score_key.rs
  • src/storage/src/lib.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/redis_multi.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/redis_strings.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/statistics.rs
  • src/storage/src/storage.rs
  • src/storage/src/storage_impl.rs
  • src/storage/src/zset_score_key_format.rs
  • src/storage/tests/redis_basic_test.rs
💤 Files with no reviewable changes (1)
  • src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (15)
  • src/storage/src/format_base_value.rs
  • src/storage/src/format_strings_value.rs
  • src/storage/src/format_base_data_value.rs
  • src/storage/src/redis.rs
  • src/storage/src/storage.rs
  • src/storage/src/expiration_manager.rs
  • src/storage/src/format_base_meta_value.rs
  • src/storage/tests/redis_basic_test.rs
  • src/storage/src/redis_zsets.rs
  • src/storage/src/meta_compaction_filter.rs
  • src/storage/src/redis_lists.rs
  • src/storage/src/data_compaction_filter.rs
  • src/storage/src/redis_sets.rs
  • src/storage/src/format_list_meta_value.rs
  • src/storage/src/redis_strings.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/storage/src/storage_impl.rs
  • src/storage/src/redis_hashes.rs
  • src/storage/src/format_member_data_key.rs
  • src/storage/src/lib.rs

Comment on lines +18 to +22
use crate::error::InvalidFormatSnafu;
use crate::error::Result;
use crate::storage_define::{PREFIX_RESERVE_LENGTH, SUFFIX_RESERVE_LENGTH};
use crate::storage_define::{decode_user_key, encode_user_key, seek_userkey_delim};
use bytes::{BufMut, Bytes, BytesMut};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n '\bsize_of::<[^>]+>\(\)' src/storage/src/format_zset_score_key.rs
rg -n 'std::mem::size_of|use .*size_of' src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 941


🏁 Script executed:

#!/bin/bash
# Search for the test mentioned in the review comment
rg -n 'test_exact_capacity_estimation|capacity\(\) ==' src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Read the test implementation
sed -n '551,580p' src/storage/src/format_zset_score_key.rs | cat -n

Repository: arana-db/kiwi

Length of output: 1511


🏁 Script executed:

#!/bin/bash
# Verify the encode() function uses size_of internally
rg -B5 -A15 'fn encode\(&self\)' src/storage/src/format_zset_score_key.rs | head -40

Repository: arana-db/kiwi

Length of output: 888


size_of is not in scope here, so this new module will not compile.

Rust does not import size_of implicitly. Every unqualified size_of::<...>() in this file is unresolved unless you import std::mem::size_of or fully qualify each call.

Suggested fix
+use std::mem::size_of;
+
 use crate::error::InvalidFormatSnafu;
 use crate::error::Result;
 use crate::storage_define::{PREFIX_RESERVE_LENGTH, SUFFIX_RESERVE_LENGTH};
 use crate::storage_define::{decode_user_key, encode_user_key, seek_userkey_delim};
 use bytes::{BufMut, Bytes, BytesMut};

Also applies to: 68-73, 89-92, 136-169

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/format_zset_score_key.rs` around lines 18 - 22, The module
fails to compile because size_of is not imported; add an import for
std::mem::size_of (e.g., add use std::mem::size_of;) at the top of the module so
all unqualified size_of::<T>() calls in functions like the zset score key
encoder/decoder and any other locations (the blocks around the previously noted
ranges) resolve correctly, or alternatively qualify each call as
std::mem::size_of::<T>()—make the change consistently wherever size_of is used.

Comment on lines +551 to +589
fn test_exact_capacity_estimation() {
    // Verify accurate capacity estimation and ensure no extra memory allocation
    let test_cases = vec![
        (b"simple" as &[u8], b"member" as &[u8]),
        (b"key\x00with\x00nulls", b"mem\x00ber"),
        (b"", b""),
        (b"a", b"b"),
    ];

    for (key, member) in test_cases {
        let score_key = ZSetsScoreKey::new(key, 1, 1.0, member);
        let encoded = score_key.encode().expect("encode failed");

        // Calculate expected length using actual encoded key length
        let mut encoded_key_buf = BytesMut::new();
        encode_user_key(key, &mut encoded_key_buf).expect("encode key failed");
        let expected_len = PREFIX_RESERVE_LENGTH
            + encoded_key_buf.len()
            + size_of::<u64>() // version
            + size_of::<f64>() // score
            + member.len()
            + SUFFIX_RESERVE_LENGTH;

        // Verify actual encoded length matches expected length
        assert_eq!(
            encoded.capacity(),
            expected_len,
            "Capacity mismatch for key={:?}, member={:?}",
            key,
            member
        );

        // Verify BytesMut capacity is sufficient (allocator may allocate more for alignment)
        assert!(
            encoded.capacity() >= expected_len,
            "Capacity underallocated for key={:?}, member={:?}",
            key,
            member
        );

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file
fd format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 97


🏁 Script executed:

# Read the specific lines mentioned in the review comment
wc -l src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 101


🏁 Script executed:

# Read the test function to see current state
sed -n '551,589p' src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 1616


🌐 Web query:

BytesMut with_capacity guarantee exact or minimum

💡 Result:

BytesMut::with_capacity(capacity) guarantees a minimum capacity of the specified value, not an exact capacity. The returned BytesMut can hold at least capacity bytes without reallocating, but the actual capacity may be larger depending on Vec::with_capacity behavior (which may round up to a power of 2 or similar for efficiency). This is confirmed by the official documentation: "The returned BytesMut will be able to hold at least capacity bytes without reallocating." Examples show BytesMut::with_capacity(64) has capacity == 64, but the guarantee is ">=". Source code implements it as BytesMut::from_vec(Vec::with_capacity(capacity)), inheriting Vec's minimum capacity behavior. Note: BytesMut is from the bytes crate (used by Tokio), not std::io::BytesMut (which doesn't exist).


🏁 Script executed:

# Let's also check if there's an encode() method and understand what BytesMut len represents
rg -A 5 "fn encode\(" src/storage/src/format_zset_score_key.rs | head -30

Repository: arana-db/kiwi

Length of output: 342


🏁 Script executed:

# Check what encode_user_key does and if it sets the actual len
rg -B 3 -A 10 "fn encode_user_key" src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 39


🏁 Script executed:

# Search for encode_user_key definition
rg "encode_user_key" src/storage/src/format_zset_score_key.rs -B 2 -A 15

Repository: arana-db/kiwi

Length of output: 7365


🏁 Script executed:

# Get more context on the encode() method
sed -n '500,550p' src/storage/src/format_zset_score_key.rs

Repository: arana-db/kiwi

Length of output: 1598


Replace exact capacity assertion with length assertion.

BytesMut::with_capacity(n) guarantees at least n bytes, not exactly n. The exact capacity check makes this test allocator- and version-dependent and will fail if the allocator rounds up. Check len() instead to verify the actual encoded data layout, and keep capacity() >= expected_len for the allocation guarantee.

Suggested fix
-            assert_eq!(
-                encoded.capacity(),
-                expected_len,
-                "Capacity mismatch for key={:?}, member={:?}",
-                key,
-                member
-            );
+            assert_eq!(
+                encoded.len(),
+                expected_len,
+                "Encoded length mismatch for key={:?}, member={:?}",
+                key,
+                member
+            );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/format_zset_score_key.rs` around lines 551 - 589, The test
test_exact_capacity_estimation currently asserts encoded.capacity() ==
expected_len which is wrong because BytesMut::with_capacity may round up; change
the strict equality to assert_eq!(encoded.len(), expected_len) to verify the
actual encoded data layout, and keep or add the existing
assert!(encoded.capacity() >= expected_len) to preserve the allocation
guarantee; update the assertions around ZSetsScoreKey::new(...).encode(),
encode_user_key(), PREFIX_RESERVE_LENGTH and SUFFIX_RESERVE_LENGTH accordingly
so the test checks encoded.len() for exact byte layout and capacity() for
allocation sufficiency.
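The same minimum-capacity (not exact) guarantee holds for `Vec`, which `BytesMut` wraps, so the point is easy to see std-only — `len()` is the value the encoder controls, while `capacity()` is only bounded from below:

```rust
fn main() {
    let requested = 21; // deliberately not a round number
    let mut buf: Vec<u8> = Vec::with_capacity(requested);
    buf.extend_from_slice(&[0u8; 21]);

    // The length reflects exactly what was written...
    assert_eq!(buf.len(), requested);
    // ...while capacity is only guaranteed to be at least what was requested,
    // and the allocator is free to round it up.
    assert!(buf.capacity() >= requested);
}
```

Asserting `len()` therefore tests the encoded byte layout, and `capacity() >= expected_len` tests the allocation estimate, without coupling the test to allocator behavior.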

Comment on lines +399 to +407
let score_raw: [u8; 8] = score_bytes
    .get(0..8)
    .and_then(|b| b.try_into().ok())
    .ok_or_else(|| {
        StorageError::InvalidFormat(
            "Invalid zset score payload length (expected 8 bytes)".to_string(),
        )
    })?;
let score = f64::from_le_bytes(score_raw);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'fn encode_.*(prefix|key)|to_be_bytes\(' src/storage/src/redis_multi.rs
rg -n 'put_u64_le|from_le_bytes|decode_fixed::<u64>' src/storage/src/format_zset_score_key.rs src/storage/src/custom_comparator.rs

Repository: arana-db/kiwi

Length of output: 2816


🏁 Script executed:

#!/bin/bash
# Check what decode_fixed does
rg -A 5 'fn decode_fixed|macro.*decode_fixed' src/storage/src/custom_comparator.rs

# Check if IteratorMode/Direction are imported or used
rg -n 'IteratorMode|Direction|use.*rocksdb' src/storage/src/redis_multi.rs | head -20

# Check if helpers are called in rename/del methods
rg -B 2 -A 10 'fn rename|fn del\(' src/storage/src/redis_multi.rs | head -40

Repository: arana-db/kiwi

Length of output: 1524


🏁 Script executed:

#!/bin/bash
# Get the full del and rename methods
rg -A 50 'pub fn del\(&self' src/storage/src/redis_multi.rs | head -100

# Get full rename method
rg -A 80 'pub fn rename\(&self' src/storage/src/redis_multi.rs | head -120

# Check decode_fixed source - likely from rocksdb or local crate
rg 'fn decode_fixed|macro_rules.*decode_fixed' --type rust

Repository: arana-db/kiwi

Length of output: 7137


🏁 Script executed:

#!/bin/bash
# Get the decode_fixed implementation to check endianness
rg -A 10 'pub fn decode_fixed' src/storage/src/coding.rs

# Check FixedInt trait to understand the implementation
rg -B 5 -A 15 'trait FixedInt' src/storage/src/coding.rs

# Also check what put_u64_le does vs to_be_bytes for comparison
rg -B 2 -A 8 'fn put_u64_le|fn put_u64_be' src/storage/src/coding.rs

Repository: arana-db/kiwi

Length of output: 1041


Two critical compile and runtime blockers must be fixed:

  1. Missing imports: IteratorMode and Direction are used at lines 310, 334, 358, 382, and 466 but not imported. Add them to the rocksdb import at line 26.

  2. Endianness mismatch in helper encoders: The helpers encode_hash_data_prefix, encode_hash_data_key, encode_list_data_prefix, encode_list_data_key, encode_zsets_member_prefix, and encode_zsets_score_prefix all write version (and index in the list key helper) using to_be_bytes() (big-endian). However, decode_fixed in src/storage/src/coding.rs reads all fixed integers using from_le_bytes() (little-endian). When del() and rename() call these helpers to construct prefixes for scanning, they will not match the actual keys written elsewhere, breaking prefix deletion and key enumeration.

Suggested fixes

For the imports, change line 26:

-use rocksdb::{WriteBatch, WriteOptions, ReadOptions};
+use rocksdb::{WriteBatch, WriteOptions, ReadOptions, IteratorMode, Direction};

For each occurrence of to_be_bytes() in the six helper functions (lines 487, 499, 514, 526, 529, 541, 553):

-        result.extend_from_slice(&version.to_be_bytes());
+        result.extend_from_slice(&version.to_le_bytes());

Apply the same change to line 529 for index.to_be_bytes().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis_multi.rs` around lines 399 - 407, Add the missing
imports IteratorMode and Direction to the rocksdb import so the uses in
redis_multi.rs compile; then fix the endianness mismatch by changing all
occurrences of to_be_bytes() to to_le_bytes() in the helper encoder functions
encode_hash_data_prefix, encode_hash_data_key, encode_list_data_prefix,
encode_list_data_key (also change index.to_be_bytes() to index.to_le_bytes()),
encode_zsets_member_prefix, and encode_zsets_score_prefix so keys/prefixes are
encoded in little-endian matching decode_fixed in coding.rs and allowing prefix
scans (del/rename) to find the correct keys.
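The slice-to-array parse pattern quoted in this thread can be exercised std-only, with a plain `String` error standing in for `StorageError` (hypothetical error type, same control flow):

```rust
// Parse a little-endian f64 score from the front of a byte slice,
// rejecting payloads shorter than 8 bytes instead of panicking.
fn parse_score_le(score_bytes: &[u8]) -> Result<f64, String> {
    let score_raw: [u8; 8] = score_bytes
        .get(0..8)
        .and_then(|b| b.try_into().ok())
        .ok_or_else(|| "Invalid zset score payload length (expected 8 bytes)".to_string())?;
    Ok(f64::from_le_bytes(score_raw))
}

fn main() {
    // Round-trips a score written with to_le_bytes().
    assert_eq!(parse_score_le(&2.5f64.to_le_bytes()), Ok(2.5));
    // A short payload is rejected rather than indexed out of bounds.
    assert!(parse_score_le(&[0u8; 4]).is_err());
}
```

The `get(0..8)` + `try_into()` combination is what makes the parse total: indexing with `score_bytes[0..8]` would panic on short input, while this version surfaces a decodable error.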
