Conversation
📝 Walkthrough

This change renames many internal format modules.

Sequence Diagram(s): (omitted)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
ec39730 to 6bf161a (compare)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/src/format_member_data_key.rs (1)
98-125: ⚠️ Potential issue | 🟠 Major — Return `Err` for short keys instead of panicking.

This parser still subtracts `SUFFIX_RESERVE_LENGTH` and uses `expect` before validating the buffer size. A truncated or corrupted key will panic in scan paths rather than surface `InvalidFormat`, which is much harder to recover from in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/format_member_data_key.rs` around lines 98-125: The `new(encoded_key: &[u8])` parser may panic on short/truncated buffers; before taking slices or calling `try_into`/`expect` you must validate that `encoded_key.len()` is at least `PREFIX_RESERVE_LENGTH + SUFFIX_RESERVE_LENGTH + size_of::<u64>()` and that `seek_userkey_delim(...)` returns a delimiter within the available range; if any check fails return `Err(InvalidFormat)` instead of slicing/expecting. Concretely, in `new` ensure `end_idx` is not less than `start_idx`, validate `key_end_idx <= end_idx` and `version_end_idx <= end_idx`, and replace the `version_slice.try_into().expect(...)` with a safe conversion that returns `Err(InvalidFormat)` when the slice length is wrong; perform similar bounds checks before copying reserve slices (`reserve1`/`reserve2`) and decoding the user key (`decode_user_key`) so no slice/`try_into` can panic.

src/storage/src/redis_strings.rs (1)
2189-2213: ⚠️ Potential issue | 🟠 Major — Unreachable `else if` drops non-string keys from `scan_keys`/`random_key`.

At Line 2189 and Line 2201 (and similarly Line 2311 and Line 2320), both branches use `ParsedBaseKey::new(...)`. Since that parse succeeds for valid meta keys too, the `else if` branch never executes, so hash/set/zset/list meta-value paths are skipped.

🔧 Suggested fix (apply same pattern in both functions)
```diff
-            if let Ok(base_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..]) {
-                let key_str = String::from_utf8_lossy(base_key.key());
-
-                // Check if key is not expired
-                if let Ok(parsed) = ParsedStringsValue::new(&value_bytes[..]) {
-                    if !parsed.is_stale() {
-                        // Simple pattern matching - in a full implementation, we'd support glob patterns
-                        if pattern == "*" || key_str.contains(pattern) {
-                            keys.push(key_str.to_string());
-                        }
-                    }
-                }
-            } else if let Ok(meta_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..])
-            {
-                let key_str = String::from_utf8_lossy(meta_key.key());
-
-                // Check if meta key is not expired using appropriate parser
-                if !value_bytes.is_empty() {
-                    let is_valid = if let Ok(meta) =
-                        crate::format_base_meta_value::ParsedBaseMetaValue::new(&value_bytes[..])
-                    {
-                        !meta.is_stale() && meta.count() > 0
-                    } else if let Ok(list_meta) =
-                        crate::format_list_meta_value::ParsedListsMetaValue::new(&value_bytes[..])
-                    {
-                        !list_meta.is_stale() && list_meta.count() > 0
-                    } else {
-                        false
-                    };
-
-                    if is_valid {
-                        // Simple pattern matching
-                        if pattern == "*" || key_str.contains(pattern) {
-                            keys.push(key_str.to_string());
-                        }
-                    }
-                }
-            }
+            if let Ok(parsed_key) = crate::format_base_key::ParsedBaseKey::new(&key_bytes[..]) {
+                let key_str = String::from_utf8_lossy(parsed_key.key());
+                let is_valid = if let Ok(parsed) = ParsedStringsValue::new(&value_bytes[..]) {
+                    !parsed.is_stale()
+                } else if let Ok(meta) =
+                    crate::format_base_meta_value::ParsedBaseMetaValue::new(&value_bytes[..])
+                {
+                    !meta.is_stale() && meta.count() > 0
+                } else if let Ok(list_meta) =
+                    crate::format_list_meta_value::ParsedListsMetaValue::new(&value_bytes[..])
+                {
+                    !list_meta.is_stale() && list_meta.count() > 0
+                } else {
+                    false
+                };
+
+                if is_valid && (pattern == "*" || key_str.contains(pattern)) {
+                    keys.push(key_str.to_string());
+                }
+            }
```

Use the same single-parse/value-dispatch structure in `random_key` so non-string live keys are eligible there too.

Also applies to: 2311-2329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/redis_strings.rs` around lines 2189 - 2213, The bug is that ParsedBaseKey::new(...) is called twice so the first branch always wins and meta-type keys (hash/set/zset/list) never reach their parsers; fix both scan_keys and random_key by parsing the base key only once (call ParsedBaseKey::new(...) a single time), then attempt value parsing in order: try ParsedStringsValue::new(...) and handle non-stale string values, otherwise try the meta parsers (ParsedBaseMetaValue::new, ParsedListsMetaValue::new, ParsedHashMetaValue::new, etc.) and handle their non-stale/count>0 checks to include the key; replace the duplicate else-if that re-parses ParsedBaseKey with this single-parse/value-dispatch pattern so all non-string live keys are considered.
🧹 Nitpick comments (1)
src/storage/src/format_zset_score_key.rs (1)
67-86: Minor inefficiency: user key encoded twice.

The user key is encoded at line 69 for capacity estimation and again at line 79. Consider reusing the first encoding.
♻️ Suggested optimization
```diff
 pub fn encode(&self) -> Result<BytesMut> {
-    let mut encoded_key_buf = BytesMut::new();
-    encode_user_key(&self.key, &mut encoded_key_buf)?;
     let estimated_cap = PREFIX_RESERVE_LENGTH
-        + encoded_key_buf.len()
+        + self.key.len() * 2 + 2 // worst case: every byte escaped + delimiter
         + size_of::<u64>() // version
         + size_of::<f64>() // score
         + self.member.len()
         + SUFFIX_RESERVE_LENGTH;
     let mut dst = BytesMut::with_capacity(estimated_cap);

     dst.put_slice(&self.reserve1);
     encode_user_key(&self.key, &mut dst)?;

     // Use little-endian for version and score (custom comparator expects this)
     dst.put_u64_le(self.version);
     dst.put_u64_le(self.score.to_bits());

     dst.put_slice(&self.member);
     dst.put_slice(&self.reserve2);

     Ok(dst)
 }
```

Alternatively, keep the current approach if exact capacity is preferred over performance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/format_zset_score_key.rs` around lines 67 - 86, In encode() the user key is encoded twice (encoded_key_buf created by encode_user_key(&self.key, &mut encoded_key_buf) and then encode_user_key(&self.key, &mut dst) again); replace the second encode call by reusing encoded_key_buf (e.g., dst.put_slice(&encoded_key_buf)) so you only encode the key once, keep the existing capacity calculation (use encoded_key_buf.len() already present), and ensure the rest of writes (dst.put_u64_le(self.version), dst.put_u64_le(self.score.to_bits()), dst.put_slice(&self.member), dst.put_slice(&self.reserve2)) remain the same in the encode() method.
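The encode-once pattern suggested above can be sketched without the `bytes` crate; this uses plain `Vec<u8>` and an illustrative escaping encoder standing in for `encode_user_key`, so the names and layout are assumptions, not the crate's real API:

```rust
// Hypothetical escaping encoder: escapes 0x00 bytes, then appends a delimiter.
fn encode_user_key(key: &[u8], dst: &mut Vec<u8>) {
    for &b in key {
        dst.push(b);
        if b == 0 { dst.push(0xFF); }
    }
    dst.extend_from_slice(&[0x00, 0x01]); // delimiter
}

fn encode(key: &[u8], version: u64, member: &[u8]) -> Vec<u8> {
    // Encode the user key once into a scratch buffer...
    let mut encoded_key = Vec::new();
    encode_user_key(key, &mut encoded_key);

    // ...use its exact length for capacity, then copy it instead of re-encoding.
    let mut dst = Vec::with_capacity(encoded_key.len() + 8 + member.len());
    dst.extend_from_slice(&encoded_key);
    dst.extend_from_slice(&version.to_le_bytes());
    dst.extend_from_slice(member);
    dst
}

fn main() {
    let k = encode(b"myzset", 7, b"alice");
    assert_eq!(&k[..6], b"myzset");
    println!("encoded {} bytes", k.len());
}
```

This keeps exact capacity and pays only one extra memcpy instead of a second escaping pass.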
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doc/01-key编解码.md`:
- Line 16: Fix the terminology typos in the key-format descriptions: in 01-key编解码.md, change every occurrence of "Filed"/"filed" to "Field"/"field" and "memer" to "member" (for example in the Hash data-structure sentence "每个 Filed 对应一条普通数据" and similar passages), keep capitalization consistent, and verify no stray typos remain so the descriptions stay uniform and clear.
In `@doc/文档与代码实现差异分析.md`:
- Around line 84-85: The document shows "初始右边界: `9223372036854775808`
(INT64_MIN)" which is ambiguous; update the text to correctly state signed
INT64_MIN as `-9223372036854775808` if you mean the signed minimum, or
explicitly note that `9223372036854775808` is the unsigned/u64 bit-pattern
equivalent of INT64_MIN (two's‑complement) if that is the intended meaning;
ensure you reference the symbols INT64_MIN and INT64_MAX in the clarification so
readers know which interpretation is used.
In `@src/storage/src/format_zset_score_key.rs`:
- Around line 20-22: The import list in format_zset_score_key.rs includes an
unused symbol ENCODED_KEY_DELIM_SIZE; remove ENCODED_KEY_DELIM_SIZE from the use
declaration that currently imports ENCODED_KEY_DELIM_SIZE, decode_user_key,
encode_user_key, seek_userkey_delim so only the actually used symbols
(decode_user_key, encode_user_key, seek_userkey_delim) remain to eliminate the
unused import warning.
---
Outside diff comments:
In `@src/storage/src/format_member_data_key.rs`:
- Around line 98-125: The new(encoded_key: &[u8]) parser may panic on
short/truncated buffers; before taking slices or calling try_into/expect you
must validate encoded_key.len() is at least PREFIX_RESERVE_LENGTH +
SUFFIX_RESERVE_LENGTH + size_of::<u64>() and that seek_userkey_delim(...)
returns a delimiter within the available range; if any check fails return
Err(InvalidFormat) instead of slicing/expect-ing. Concretely, in new ensure
end_idx is not less than start_idx, validate key_end_idx <= end_idx and
version_end_idx <= end_idx, and replace the version_slice.try_into().expect(...)
with a safe conversion that returns Err(InvalidFormat) when the slice length is
wrong; perform similar bounds checks before copying reserve slices
(reserve1/reserve2) and decoding the user key (decode_user_key) so no
slice/try_into can panic.
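A minimal sketch of the bounds-checked parsing described above, assuming an illustrative layout (`[prefix:8][user_key]['\0'][version:8 LE][suffix:8]`); the real constants and delimiter escaping in the crate differ:

```rust
// Layout assumed for illustration (not the crate's real constants):
const PREFIX_RESERVE_LENGTH: usize = 8;
const SUFFIX_RESERVE_LENGTH: usize = 8;

#[derive(Debug, PartialEq)]
enum ParseError { InvalidFormat }

fn parse(encoded: &[u8]) -> Result<(&[u8], u64), ParseError> {
    // Reject buffers too short to hold the fixed parts before any slicing.
    let min = PREFIX_RESERVE_LENGTH + 1 + 8 + SUFFIX_RESERVE_LENGTH;
    if encoded.len() < min {
        return Err(ParseError::InvalidFormat);
    }
    let end = encoded.len() - SUFFIX_RESERVE_LENGTH;
    let body = &encoded[PREFIX_RESERVE_LENGTH..end];
    // Delimiter search confined to the valid range.
    let delim = body.iter().position(|&b| b == 0).ok_or(ParseError::InvalidFormat)?;
    let version_bytes = body.get(delim + 1..delim + 9).ok_or(ParseError::InvalidFormat)?;
    // Map the conversion error instead of using expect(), so corrupted
    // input surfaces as Err rather than a panic.
    let version = u64::from_le_bytes(
        version_bytes.try_into().map_err(|_| ParseError::InvalidFormat)?,
    );
    Ok((&body[..delim], version))
}

fn main() {
    let mut key = vec![0xAA; PREFIX_RESERVE_LENGTH];
    key.extend_from_slice(b"user");
    key.push(0);
    key.extend_from_slice(&42u64.to_le_bytes());
    key.extend_from_slice(&[0xBB; SUFFIX_RESERVE_LENGTH]);
    assert_eq!(parse(&key), Ok((&b"user"[..], 42)));
    // A truncated key returns Err instead of panicking in a scan path.
    assert_eq!(parse(&key[..5]), Err(ParseError::InvalidFormat));
}
```

Every slice is guarded by a length check or `get(..)`, so no input can reach a panicking index or `expect`.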
In `@src/storage/src/redis_strings.rs`:
- Around line 2189-2213: The bug is that ParsedBaseKey::new(...) is called twice
so the first branch always wins and meta-type keys (hash/set/zset/list) never
reach their parsers; fix both scan_keys and random_key by parsing the base key
only once (call ParsedBaseKey::new(...) a single time), then attempt value
parsing in order: try ParsedStringsValue::new(...) and handle non-stale string
values, otherwise try the meta parsers (ParsedBaseMetaValue::new,
ParsedListsMetaValue::new, ParsedHashMetaValue::new, etc.) and handle their
non-stale/count>0 checks to include the key; replace the duplicate else-if that
re-parses ParsedBaseKey with this single-parse/value-dispatch pattern so all
non-string live keys are considered.
---
Nitpick comments:
In `@src/storage/src/format_zset_score_key.rs`:
- Around line 67-86: In encode() the user key is encoded twice (encoded_key_buf
created by encode_user_key(&self.key, &mut encoded_key_buf) and then
encode_user_key(&self.key, &mut dst) again); replace the second encode call by
reusing encoded_key_buf (e.g., dst.put_slice(&encoded_key_buf)) so you only
encode the key once, keep the existing capacity calculation (use
encoded_key_buf.len() already present), and ensure the rest of writes
(dst.put_u64_le(self.version), dst.put_u64_le(self.score.to_bits()),
dst.put_slice(&self.member), dst.put_slice(&self.reserve2)) remain the same in
the encode() method.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8a5f5014-3b52-4029-a257-2603e7cdfc36
⛔ Files ignored due to path filters (6)
- `doc/images/01-cf 使用.png` is excluded by `!**/*.png`
- `doc/images/01-hash编码.png` is excluded by `!**/*.png`
- `doc/images/01-list编码.png` is excluded by `!**/*.png`
- `doc/images/01-set 编码.png` is excluded by `!**/*.png`
- `doc/images/01-string 编码.png` is excluded by `!**/*.png`
- `doc/images/01-zset编码.png` is excluded by `!**/*.png`
📒 Files selected for processing (28)
`doc/01-key编解码.md`, `doc/文档与代码实现差异分析.md`, `src/storage/src/custom_comparator.rs`, `src/storage/src/data_compaction_filter.rs`, `src/storage/src/expiration_manager.rs`, `src/storage/src/format_base_data_value.rs`, `src/storage/src/format_base_key.rs`, `src/storage/src/format_base_meta_value.rs`, `src/storage/src/format_base_value.rs`, `src/storage/src/format_list_meta_value.rs`, `src/storage/src/format_lists_data_key.rs`, `src/storage/src/format_member_data_key.rs`, `src/storage/src/format_strings_value.rs`, `src/storage/src/format_zset_score_key.rs`, `src/storage/src/lib.rs`, `src/storage/src/meta_compaction_filter.rs`, `src/storage/src/redis.rs`, `src/storage/src/redis_hashes.rs`, `src/storage/src/redis_lists.rs`, `src/storage/src/redis_multi.rs`, `src/storage/src/redis_sets.rs`, `src/storage/src/redis_strings.rs`, `src/storage/src/redis_zsets.rs`, `src/storage/src/statistics.rs`, `src/storage/src/storage.rs`, `src/storage/src/storage_impl.rs`, `src/storage/src/zset_score_key_format.rs`, `src/storage/tests/redis_basic_test.rs`
💤 Files with no reviewable changes (1)
- src/storage/src/zset_score_key_format.rs
> ## Hash 结构的存储
>
> Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 Filed 对应一条普通数据。具体格式如下图所示。
Fix terminology typos in key format descriptions.
There are several typos (Filed, filed, memer) that make the encoding spec harder to follow.
✏️ Suggested doc patch
```diff
-Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 Filed 对应一条普通数据。具体格式如下图所示。
+Hash 类型数据结构由两部分构成,元数据(meta_key,meta_value)和普通数据(data_key,data_value)。每个 Hash 类型数据对应一条元数据,每个 field 对应一条普通数据。具体格式如下图所示。
-普通数据主要就是指的同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段,编码后的 user key,元数据中的最新 version,filed 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。
+普通数据主要就是指同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段、编码后的 user key、元数据中的最新 version、field 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。
-zset 由两部分构成,元数据(meta_key, meta_value), 普通数据(data_key, data_value)。元数据中存储的主要是 zset 集合的一些信息, 比如说当前 zset 集合中 member 的数量以及当前 zset 集合的版本号和过期时间(用做秒删功能), 而普通数据就是指的 zset 中每个 member 以及对应的 score。由于 zset 这种数据结构比较特殊,需要按照 memer 进行排序,也需要按照 score 进行排序,所以我们对于每一个 zset 我们会按照不同的格式存储两份普通数据, 在这里我们称为 member to score 和 score to member,作为具体最后 RocksDB 落盘的 KV 格式,具体格式如下:
+zset 由两部分构成,元数据(meta_key, meta_value) 和普通数据(data_key, data_value)。元数据中存储的主要是 zset 集合的一些信息,比如当前 zset 集合中 member 的数量以及当前 zset 集合的版本号和过期时间(用于秒删功能),而普通数据就是指 zset 中每个 member 以及对应的 score。由于 zset 这种数据结构比较特殊,需要按照 member 排序,也需要按照 score 排序,所以我们会按不同格式存储两份普通数据,在这里称为 member to score 和 score to member。作为最终 RocksDB 落盘的 KV 格式,具体如下:
```

Also applies to: 22-22, 50-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@doc/01-key编解码.md` at line 16, 修正文档中关于键格式的术语拼写错误:在文件 01-key编解码.md 中将所有出现的
“Filed” 和 “filed” 改为 “Field/field”,并将 “memer” 改为 “member”(例如出现在描述 Hash 类型数据结构的那句
“每个 Filed 对应一条普通数据” 以及其它相似段落),保持大小写一致性并校验其它处没有残留拼写错误以确保描述统一清晰。
> - 初始左边界: `9223372036854775807` (INT64_MAX)
> - 初始右边界: `9223372036854775808` (INT64_MIN)
Clarify INT64_MIN representation to avoid numeric ambiguity.
Line 85 currently shows a positive literal with an INT64_MIN label. For signed semantics, INT64_MIN is -9223372036854775808. If you mean the u64 bit-pattern representation, please state that explicitly.
📝 Suggested clarification
```diff
-- 初始右边界: `9223372036854775808` (INT64_MIN)
+- 初始右边界: `-9223372036854775808` (INT64_MIN,按 i64 语义)
+- 若按 u64 存储其补码位型,则对应为 `9223372036854775808`
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
- 初始左边界: `9223372036854775807` (INT64_MAX)
- 初始右边界: `-9223372036854775808` (INT64_MIN,按 i64 语义)
- 若按 u64 存储其补码位型,则对应为 `9223372036854775808`
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@doc/文档与代码实现差异分析.md` around lines 84 - 85, The document shows "初始右边界:
`9223372036854775808` (INT64_MIN)" which is ambiguous; update the text to
correctly state signed INT64_MIN as `-9223372036854775808` if you mean the
signed minimum, or explicitly note that `9223372036854775808` is the
unsigned/u64 bit-pattern equivalent of INT64_MIN (two's‑complement) if that is
the intended meaning; ensure you reference the symbols INT64_MIN and INT64_MAX
in the clarification so readers know which interpretation is used.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/src/redis_sets.rs (1)
718-766: ⚠️ Potential issue | 🔴 Critical — `SMOVE` incorrectly deletes the member when `source == destination`.

Line 765 deletes the source member first, but in the same-key case the branch at Line 781 never writes it back, so the data is lost. Short-circuit the same-key case so the delete path is never entered.
🐛 Proposed fix
```diff
 let mut batch = self.create_batch()?;
+// Same source/destination should be a no-op to avoid destructive delete.
+if source == destination {
+    return Ok(true);
+}
+
 // Remove from source
 batch.delete(ColumnFamilyIndex::SetsDataCF, &source_member_key)?;
 let new_source_count = source_meta.count() - 1;
```

Also applies to: 780-806
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/redis_sets.rs` around lines 718 - 766, The SMOVE implementation deletes the source member before handling the destination and doesn't short-circuit when source == destination, causing data loss; update the logic in the SMOVE handling (around MemberDataKey::new(...).encode(), ParsedSetsMetaValue handling, and the batch delete/insert flow) to detect when source and destination refer to the same encoded key/version (or identical set identity) and return early (e.g., no-op with Ok(false/true as appropriate) or skip the delete/write) so you never delete then skip rewriting in same-key scenarios; ensure this short-circuit occurs before executing batch.delete(ColumnFamilyIndex::SetsDataCF, &source_member_key) and before modifying source_meta.count().
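The proposed short-circuit can be modeled with an in-memory toy; a `HashMap` of `HashSet`s stands in for the RocksDB-backed sets, so the function and its return convention are illustrative, not the crate's real API:

```rust
use std::collections::{HashMap, HashSet};

// Toy model of SMOVE: same-key moves must not reach the delete path.
fn smove(db: &mut HashMap<String, HashSet<String>>, src: &str, dst: &str, member: &str) -> bool {
    let src_has = db.get(src).map_or(false, |s| s.contains(member));
    if !src_has {
        return false;
    }
    // Short-circuit BEFORE deleting: a same-key move is a successful no-op.
    if src == dst {
        return true;
    }
    db.get_mut(src).unwrap().remove(member);
    db.entry(dst.to_string()).or_default().insert(member.to_string());
    true
}

fn main() {
    let mut db = HashMap::new();
    db.insert("s".to_string(), HashSet::from(["a".to_string()]));
    // Moving within the same set keeps the member instead of deleting it.
    assert!(smove(&mut db, "s", "s", "a"));
    assert!(db["s"].contains("a"));
    // A normal move still transfers the member.
    assert!(smove(&mut db, "s", "d", "a"));
    assert!(db["d"].contains("a") && !db["s"].contains("a"));
}
```

This matches Redis semantics, where `SMOVE key key member` succeeds without changing the set.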
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doc/01-key编解码.md`:
- Around line 20-24: The docs omit the byte order for the version field; update
the first definition of the "version" field in the metadata section to state it
is a u64 in little-endian (little-endian byte order) and add a short sentence
reminding readers that every reuse of "version" in keys (e.g., meta key, data
key assembly where you concatenate main key + version + field) follows this u64
little-endian convention; ensure the same clarification is added to the other
referenced sections (lines referenced in the comment) that describe "version",
"元数据", "data key" and "普通数据" key composition so all usages are explicit.
---
Outside diff comments:
In `@src/storage/src/redis_sets.rs`:
- Around line 718-766: The SMOVE implementation deletes the source member before
handling the destination and doesn't short-circuit when source == destination,
causing data loss; update the logic in the SMOVE handling (around
MemberDataKey::new(...).encode(), ParsedSetsMetaValue handling, and the batch
delete/insert flow) to detect when source and destination refer to the same
encoded key/version (or identical set identity) and return early (e.g., no-op
with Ok(false/true as appropriate) or skip the delete/write) so you never delete
then skip rewriting in same-key scenarios; ensure this short-circuit occurs
before executing batch.delete(ColumnFamilyIndex::SetsDataCF, &source_member_key)
and before modifying source_meta.count().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e25050a0-eb1e-47c3-a224-ac1f1e0bedb0
⛔ Files ignored due to path filters (6)
- `doc/images/01-cf 使用.png` is excluded by `!**/*.png`
- `doc/images/01-hash编码.png` is excluded by `!**/*.png`
- `doc/images/01-list编码.png` is excluded by `!**/*.png`
- `doc/images/01-set 编码.png` is excluded by `!**/*.png`
- `doc/images/01-string 编码.png` is excluded by `!**/*.png`
- `doc/images/01-zset编码.png` is excluded by `!**/*.png`
📒 Files selected for processing (27)
`doc/01-key编解码.md`, `src/storage/src/custom_comparator.rs`, `src/storage/src/data_compaction_filter.rs`, `src/storage/src/expiration_manager.rs`, `src/storage/src/format_base_data_value.rs`, `src/storage/src/format_base_key.rs`, `src/storage/src/format_base_meta_value.rs`, `src/storage/src/format_base_value.rs`, `src/storage/src/format_list_meta_value.rs`, `src/storage/src/format_lists_data_key.rs`, `src/storage/src/format_member_data_key.rs`, `src/storage/src/format_strings_value.rs`, `src/storage/src/format_zset_score_key.rs`, `src/storage/src/lib.rs`, `src/storage/src/meta_compaction_filter.rs`, `src/storage/src/redis.rs`, `src/storage/src/redis_hashes.rs`, `src/storage/src/redis_lists.rs`, `src/storage/src/redis_multi.rs`, `src/storage/src/redis_sets.rs`, `src/storage/src/redis_strings.rs`, `src/storage/src/redis_zsets.rs`, `src/storage/src/statistics.rs`, `src/storage/src/storage.rs`, `src/storage/src/storage_impl.rs`, `src/storage/src/zset_score_key_format.rs`, `src/storage/tests/redis_basic_test.rs`
💤 Files with no reviewable changes (1)
- src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (15)
- src/storage/src/format_base_value.rs
- src/storage/src/format_base_data_value.rs
- src/storage/src/format_strings_value.rs
- src/storage/src/format_base_meta_value.rs
- src/storage/src/format_list_meta_value.rs
- src/storage/src/storage.rs
- src/storage/src/redis.rs
- src/storage/src/expiration_manager.rs
- src/storage/tests/redis_basic_test.rs
- src/storage/src/meta_compaction_filter.rs
- src/storage/src/format_member_data_key.rs
- src/storage/src/redis_lists.rs
- src/storage/src/data_compaction_filter.rs
- src/storage/src/statistics.rs
- src/storage/src/redis_zsets.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/storage/src/redis_multi.rs
- src/storage/src/redis_hashes.rs
- src/storage/src/redis_strings.rs
- src/storage/src/lib.rs
- src/storage/src/custom_comparator.rs
- src/storage/src/format_zset_score_key.rs
> 元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间,version 字段用于实现秒删功能。
>
> 普通数据主要就是指的同一个 hash 表中一一对应的 field 和 value,普通数据的 key 字段依次拼接了保留字段,编码后的 user key,元数据中的最新 version,filed 字段以及后缀保留字段。value 则拼接了保留字段以及 field 写入时间。
>
> 当执行读写以及删除单个 field 操作时,首先获取元信息,之后解析出 version,再将主 key,version 和 field字段拼接得到最终的 data key,最后读写 data key。
Please make the byte order of `version` explicit in the doc (little-endian).

The text repeatedly mentions concatenating `version` but never specifies its byte order, which invites ambiguity for implementers working across modules or languages. State u64 little-endian at the first definition and reuse that convention in every structure.
📝 Suggested doc patch
```diff
-元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间,version 字段用于实现秒删功能。
+元数据中的 key 由前缀保留字段,编码后的 user key 以及后缀保留字段构成,value 中记录了 hash 中元素个数,最新版本号,保留字段,数据写入时间以及数据过期时间。version 字段使用 u64 little-endian 编码,并用于实现秒删功能。
```

Also applies to: 38-43, 50-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@doc/01-key编解码.md` around lines 20 - 24, The docs omit the byte order for the
version field; update the first definition of the "version" field in the
metadata section to state it is a u64 in little-endian (little-endian byte
order) and add a short sentence reminding readers that every reuse of "version"
in keys (e.g., meta key, data key assembly where you concatenate main key +
version + field) follows this u64 little-endian convention; ensure the same
clarification is added to the other referenced sections (lines referenced in the
comment) that describe "version", "元数据", "data key" and "普通数据" key composition
so all usages are explicit.
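The convention the comment asks the doc to state can be sketched with illustrative helpers (reserve fields and delimiter escaping are omitted, and the names are hypothetical):

```rust
// data_key = user key + version (u64, little-endian) + field.
fn make_data_key(user_key: &[u8], version: u64, field: &[u8]) -> Vec<u8> {
    let mut k = Vec::with_capacity(user_key.len() + 8 + field.len());
    k.extend_from_slice(user_key);
    k.extend_from_slice(&version.to_le_bytes()); // the convention to document
    k.extend_from_slice(field);
    k
}

// A reader must use the same byte order to recover the version.
fn read_version(data_key: &[u8], key_len: usize) -> u64 {
    u64::from_le_bytes(data_key[key_len..key_len + 8].try_into().unwrap())
}

fn main() {
    let k = make_data_key(b"user", 3, b"field");
    assert_eq!(read_version(&k, 4), 3);
    // Little-endian: the least-significant byte comes first.
    assert_eq!(k[4], 3);
    assert_eq!(k[5], 0);
}
```

Stating this once in the doc pins down the layout that every data-key assembly (hash, list, set, zset) reuses.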
6bf161a to 114c64d (compare)

Signed-off-by: happy-v587 <code4happy@gmail.com>

114c64d to 8a9196c (compare)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/src/redis_multi.rs (1)
474-549: ⚠️ Potential issue | 🔴 Critical — Change helper methods to use little-endian encoding for consistency with format modules.

The commit "fix: key encode use little-endian" updated `format_member_data_key.rs` to use `put_u64_le()` for version encoding, but the newly added helper methods in `redis_multi.rs` still use `to_be_bytes()`:

- Line 480: `encode_hash_data_prefix` → `version.to_be_bytes()`
- Line 492: `encode_hash_data_key` → `version.to_be_bytes()`
- Line 507: `encode_list_data_prefix` → `version.to_be_bytes()`
- Lines 519, 522: `encode_list_data_key` → `version.to_be_bytes()`, `index.to_be_bytes()`
- Line 534: `encode_zsets_member_prefix` → `version.to_be_bytes()`
- Line 546: `encode_zsets_score_prefix` → `version.to_be_bytes()`

These should be changed to `to_le_bytes()` to align with the format modules and the PR's intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/redis_multi.rs` around lines 474 - 549, Several helper encoders in redis_multi.rs still use big-endian bytes (to_be_bytes()) for version/index while format modules and format_member_data_key use little-endian; update each occurrence in encode_hash_data_prefix, encode_hash_data_key, encode_list_data_prefix, encode_list_data_key (both version and index), encode_zsets_member_prefix, and encode_zsets_score_prefix to use to_le_bytes() so encoding is consistent across modules. Ensure you replace version.to_be_bytes() and index.to_be_bytes() with version.to_le_bytes()/index.to_le_bytes() respectively and keep the same push/extend order.
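The consequence of mixing `to_be_bytes()` writers with little-endian readers is easy to demonstrate:

```rust
fn main() {
    let version: u64 = 1;
    // Writer uses little-endian...
    let written = version.to_le_bytes();
    // ...a reader that assumes big-endian reconstructs a different value.
    let misread = u64::from_be_bytes(written);
    assert_eq!(u64::from_le_bytes(written), 1);
    assert_eq!(misread, 1u64 << 56); // 72057594037927936
    // Keys encoded with mixed endianness are byte-distinct, so prefix
    // scans built from one convention never match keys written with the other.
    assert_ne!(version.to_le_bytes(), version.to_be_bytes());
}
```

Since the prefix helpers feed iterator seeks, a big-endian prefix silently matches nothing once the data keys are little-endian.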
🧹 Nitpick comments (1)
src/storage/src/format_zset_score_key.rs (1)
65-84: Consider removing the redundant `encode_user_key` call for efficiency.

`encode_user_key` is called twice: once at line 67 for capacity estimation and again at line 77 for actual encoding. This duplicates work.

♻️ Suggested optimization

```diff
 pub fn encode(&self) -> Result<BytesMut> {
-    let mut encoded_key_buf = BytesMut::new();
-    encode_user_key(&self.key, &mut encoded_key_buf)?;
-    let estimated_cap = PREFIX_RESERVE_LENGTH
-        + encoded_key_buf.len()
-        + size_of::<u64>() // version
-        + size_of::<f64>() // score
-        + self.member.len()
-        + SUFFIX_RESERVE_LENGTH;
-    let mut dst = BytesMut::with_capacity(estimated_cap);
-
-    dst.put_slice(&self.reserve1);
-    encode_user_key(&self.key, &mut dst)?;
+    // Encode user key once, then copy to final buffer
+    let mut encoded_key_buf = BytesMut::new();
+    encode_user_key(&self.key, &mut encoded_key_buf)?;
+
+    let estimated_cap = PREFIX_RESERVE_LENGTH
+        + encoded_key_buf.len()
+        + size_of::<u64>() // version
+        + size_of::<f64>() // score
+        + self.member.len()
+        + SUFFIX_RESERVE_LENGTH;
+    let mut dst = BytesMut::with_capacity(estimated_cap);
+
+    dst.put_slice(&self.reserve1);
+    dst.put_slice(&encoded_key_buf);
     // Use little-endian for version and score (custom comparator expects this)
     dst.put_u64_le(self.version);
```

The same optimization applies to `encode_seek_key()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/format_zset_score_key.rs` around lines 65 - 84, The encode() implementation redundantly calls encode_user_key(&self.key, ...) twice (once into encoded_key_buf for sizing and again when writing into dst); to fix, call encode_user_key once into a temporary BytesMut (e.g., encoded_user_key), use encoded_user_key.len() when computing estimated_cap, and then write encoded_user_key into dst instead of calling encode_user_key a second time; apply the same change pattern to encode_seek_key() so both functions reuse the single encoded_user_key buffer for sizing and output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/storage/src/redis_multi.rs`:
- Around line 474-549: Several helper encoders in redis_multi.rs still use
big-endian bytes (to_be_bytes()) for version/index while format modules and
format_member_data_key use little-endian; update each occurrence in
encode_hash_data_prefix, encode_hash_data_key, encode_list_data_prefix,
encode_list_data_key (both version and index), encode_zsets_member_prefix, and
encode_zsets_score_prefix to use to_le_bytes() so encoding is consistent across
modules. Ensure you replace version.to_be_bytes() and index.to_be_bytes() with
version.to_le_bytes()/index.to_le_bytes() respectively and keep the same
push/extend order.
---
Nitpick comments:
In `@src/storage/src/format_zset_score_key.rs`:
- Around line 65-84: The encode() implementation redundantly calls
encode_user_key(&self.key, ...) twice (once into encoded_key_buf for sizing and
again when writing into dst); to fix, call encode_user_key once into a temporary
BytesMut (e.g., encoded_user_key), use encoded_user_key.len() when computing
estimated_cap, and then write encoded_user_key into dst instead of calling
encode_user_key a second time; apply the same change pattern to
encode_seek_key() so both functions reuse the single encoded_user_key buffer for
sizing and output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2817b46-4e04-4194-af5a-df0f0b108db7
⛔ Files ignored due to path filters (6)
- `doc/images/01-cf 使用.png` is excluded by `!**/*.png`
- `doc/images/01-hash编码.png` is excluded by `!**/*.png`
- `doc/images/01-list编码.png` is excluded by `!**/*.png`
- `doc/images/01-set 编码.png` is excluded by `!**/*.png`
- `doc/images/01-string 编码.png` is excluded by `!**/*.png`
- `doc/images/01-zset编码.png` is excluded by `!**/*.png`
📒 Files selected for processing (27)
`doc/01-key编解码.md`, `src/storage/src/custom_comparator.rs`, `src/storage/src/data_compaction_filter.rs`, `src/storage/src/expiration_manager.rs`, `src/storage/src/format_base_data_value.rs`, `src/storage/src/format_base_key.rs`, `src/storage/src/format_base_meta_value.rs`, `src/storage/src/format_base_value.rs`, `src/storage/src/format_list_meta_value.rs`, `src/storage/src/format_lists_data_key.rs`, `src/storage/src/format_member_data_key.rs`, `src/storage/src/format_strings_value.rs`, `src/storage/src/format_zset_score_key.rs`, `src/storage/src/lib.rs`, `src/storage/src/meta_compaction_filter.rs`, `src/storage/src/redis.rs`, `src/storage/src/redis_hashes.rs`, `src/storage/src/redis_lists.rs`, `src/storage/src/redis_multi.rs`, `src/storage/src/redis_sets.rs`, `src/storage/src/redis_strings.rs`, `src/storage/src/redis_zsets.rs`, `src/storage/src/statistics.rs`, `src/storage/src/storage.rs`, `src/storage/src/storage_impl.rs`, `src/storage/src/zset_score_key_format.rs`, `src/storage/tests/redis_basic_test.rs`
💤 Files with no reviewable changes (1)
- src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (18)
- src/storage/src/format_base_value.rs
- src/storage/src/format_strings_value.rs
- src/storage/src/format_list_meta_value.rs
- src/storage/src/format_base_data_value.rs
- src/storage/src/expiration_manager.rs
- src/storage/src/format_base_meta_value.rs
- src/storage/tests/redis_basic_test.rs
- src/storage/src/storage.rs
- src/storage/src/redis.rs
- src/storage/src/storage_impl.rs
- src/storage/src/statistics.rs
- src/storage/src/redis_zsets.rs
- src/storage/src/redis_lists.rs
- src/storage/src/data_compaction_filter.rs
- src/storage/src/meta_compaction_filter.rs
- src/storage/src/redis_hashes.rs
- src/storage/src/redis_sets.rs
- src/storage/src/custom_comparator.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/storage/src/format_member_data_key.rs
- src/storage/src/redis_strings.rs
- src/storage/src/lib.rs
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/src/redis_multi.rs (1)
26-26: ⚠️ Potential issue | 🔴 Critical — Add the missing `rocksdb::IteratorMode` and `rocksdb::Direction` imports.

This file uses `IteratorMode::From` and `Direction::Forward` at multiple locations, but neither type is imported from `rocksdb`, causing compilation errors.

Suggested fix

```diff
-use rocksdb::{WriteBatch, WriteOptions, ReadOptions};
+use rocksdb::{Direction, IteratorMode, ReadOptions, WriteBatch, WriteOptions};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/redis_multi.rs` at line 26, The code references rocksdb::IteratorMode::From and rocksdb::Direction::Forward but they aren't imported; update the existing use statement that currently imports WriteBatch, WriteOptions, ReadOptions to also import IteratorMode and Direction (e.g., add IteratorMode and Direction to the use rocksdb::{...} list) so symbols like IteratorMode::From and Direction::Forward resolve in functions using them.
♻️ Duplicate comments (3)
doc/01-key编解码.md (2)
20-24: ⚠️ Potential issue | 🟡 Minor

Explicitly document the `version` encoding as u64 little-endian.

The core of this PR is the switch to little-endian, but the doc still describes `version` generically, and the List section still describes the data key as just key + Index. Every key/value description that reuses `version` needs to be pinned down uniformly here, otherwise cross-module implementations will keep drifting.

Also applies to: 42-42, 54-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/01-key编解码.md` around lines 20 - 24, The docs currently leave the encoding of the version ambiguous; update all occurrences (including the metadata description, the ordinary data key/value description, and the List segment where data key was described as "key + Index") to explicitly state that the version field is encoded as u64 little-endian and that every place referencing version in keys/values uses this exact encoding; ensure you specify for the meta key (metadata), the data key (key + version + field/index), and any List/index keys that the version bytes are appended/parsed as a u64 little-endian value so cross-module implementations use the same binary layout.
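A minimal sketch of the layout this comment asks the doc to pin down: `version` written and read as a u64 in little-endian. The helper names below are illustrative, not the crate's API, and the parser returns `None` on short input rather than panicking:

```rust
// Illustrative helpers (not the crate's API): version is a u64
// appended and parsed as 8 little-endian bytes.
fn append_version_le(buf: &mut Vec<u8>, version: u64) {
    buf.extend_from_slice(&version.to_le_bytes());
}

fn read_version_le(bytes: &[u8]) -> Option<u64> {
    // A truncated buffer yields None instead of panicking.
    let raw: [u8; 8] = bytes.get(0..8)?.try_into().ok()?;
    Some(u64::from_le_bytes(raw))
}

fn main() {
    let mut data_key = b"user_key".to_vec();
    append_version_le(&mut data_key, 7);
    // Little-endian: the low byte comes first.
    assert_eq!(&data_key[8..], &[7, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(read_version_le(&data_key[8..]), Some(7));
    assert_eq!(read_version_le(&[1, 2, 3]), None);
}
```

Stating this convention once, for meta key, data key, and List index key alike, removes the ambiguity the comment describes.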
16-22: ⚠️ Potential issue | 🟡 Minor

Fix the misspelled terminology in the doc so implementers don't misread it.

It still contains errors such as `Filed`, `filed`, and `memer`; in an encoding-spec document these directly raise the cost of understanding the field/member layout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/01-key编解码.md` around lines 16 - 22, Replace the misspelled terminology in the document: change all occurrences of "Filed" and "filed" to "Field" and "field", and "memer" to "member" so the descriptions of Hash, meta_key, meta_value, data_key, data_value, version, and related key/value layout language are accurate and consistent; ensure the key/value paragraph and any captions (including the terms used around meta_key/meta_value and data_key/data_value, and mentions of version/field/member) are updated accordingly and also apply the same fixes in the related "50-50" document.src/storage/src/custom_comparator.rs (1)
372-385: ⚠️ Potential issue | 🔴 Critical

Do not lock in `Equal` for `0.0`/`-0.0` or NaN bit-pattern collisions.

These assertions bake the comparator bug in: the encoded keys are byte-distinct, but `zsets_score_key_compare()` can still return `Ordering::Equal` when the numeric comparison collapses them and `member` is the same. A RocksDB comparator must not report equality for distinct keys. Either canonicalize `-0.0`/NaN during encoding, or use the raw score bits as a tie-breaker after numeric comparison.

Does RocksDB require a custom comparator to return equality only for keys that are truly identical in ordering, and what are the recommended approaches for handling distinct floating-point encodings such as `0.0` vs `-0.0` or different NaN payloads?

Also applies to: 455-463
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/src/custom_comparator.rs` around lines 372 - 385, The comparator zsets_score_key_compare currently can return Ordering::Equal for keys whose encoded bytes differ (e.g., 0.0 vs -0.0 or distinct NaN payloads), which breaks RocksDB's requirement that equality implies identical keys; fix by either canonicalizing floating scores during encoding in ZSetsScoreKey::encode (normalize -0.0 to 0.0 and collapse/normalize NaN payloads) or, more conservatively, keep numeric comparison but add a deterministic tie-breaker using the raw f64 bit pattern (e.g., compare f64::to_bits()) and then the member bytes so that zsets_score_key_compare never returns Equal for distinct byte sequences (also update tests like test_zsets_score_key_compare_positive_negative_zero accordingly).
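Of the two options named above, the tie-breaker one can be sketched with `f64::total_cmp`, which returns `Equal` only for identical bit patterns, so byte-distinct scores (`0.0` vs `-0.0`, differing NaN payloads) never collapse. This is a sketch of the idea, not the crate's comparator:

```rust
use std::cmp::Ordering;

// Sketch: total_cmp is a total order over all f64 bit patterns, so it
// can serve as the score comparison (or as the tie-breaker after a
// plain numeric compare) without reporting Equal for distinct encodings.
fn cmp_scores(a: f64, b: f64) -> Ordering {
    a.total_cmp(&b)
}

fn main() {
    // Numerically equal but byte-distinct: no longer Equal.
    assert_ne!(cmp_scores(0.0, -0.0), Ordering::Equal);
    // Identical encodings still compare Equal.
    assert_eq!(cmp_scores(1.5, 1.5), Ordering::Equal);
    // Ordinary ordering is preserved.
    assert_eq!(cmp_scores(1.0, 2.0), Ordering::Less);
}
```

Note that `total_cmp` orders `-0.0` before `0.0` and gives NaNs fixed positions, so range scans would need the same convention on the read path; canonicalizing `-0.0`/NaN at encode time is the alternative that keeps plain numeric ordering.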
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/src/format_zset_score_key.rs`:
- Around line 18-22: The module fails to compile because size_of is not
imported; add an import for std::mem::size_of (e.g., add use std::mem::size_of;)
at the top of the module so all unqualified size_of::<T>() calls in functions
like the zset score key encoder/decoder and any other locations (the blocks
around the previously noted ranges) resolve correctly, or alternatively qualify
each call as std::mem::size_of::<T>()—make the change consistently wherever
size_of is used.
- Around line 551-589: The test test_exact_capacity_estimation currently asserts
encoded.capacity() == expected_len which is wrong because
BytesMut::with_capacity may round up; change the strict equality to
assert_eq!(encoded.len(), expected_len) to verify the actual encoded data
layout, and keep or add the existing assert!(encoded.capacity() >= expected_len)
to preserve the allocation guarantee; update the assertions around
ZSetsScoreKey::new(...).encode(), encode_user_key(), PREFIX_RESERVE_LENGTH and
SUFFIX_RESERVE_LENGTH accordingly so the test checks encoded.len() for exact
byte layout and capacity() for allocation sufficiency.
In `@src/storage/src/redis_multi.rs`:
- Around line 399-407: Add the missing imports IteratorMode and Direction to the
rocksdb import so the uses in redis_multi.rs compile; then fix the endianness
mismatch by changing all occurrences of to_be_bytes() to to_le_bytes() in the
helper encoder functions encode_hash_data_prefix, encode_hash_data_key,
encode_list_data_prefix, encode_list_data_key (also change index.to_be_bytes()
to index.to_le_bytes()), encode_zsets_member_prefix, and
encode_zsets_score_prefix so keys/prefixes are encoded in little-endian matching
decode_fixed in coding.rs and allowing prefix scans (del/rename) to find the
correct keys.
---
Outside diff comments:
In `@src/storage/src/redis_multi.rs`:
- Line 26: The code references rocksdb::IteratorMode::From and
rocksdb::Direction::Forward but they aren't imported; update the existing use
statement that currently imports WriteBatch, WriteOptions, ReadOptions to also
import IteratorMode and Direction (e.g., add IteratorMode and Direction to the
use rocksdb::{...} list) so symbols like IteratorMode::From and
Direction::Forward resolve in functions using them.
---
Duplicate comments:
In `@doc/01-key编解码.md`:
- Around line 20-24: The docs currently leave the encoding of the version
ambiguous; update all occurrences (including the metadata description, the
ordinary data key/value description, and the List segment where data key was
described as "key + Index") to explicitly state that the version field is
encoded as u64 little-endian and that every place referencing version in
keys/values uses this exact encoding; ensure you specify for the meta key
(metadata), the data key (key + version + field/index), and any List/index keys
that the version bytes are appended/parsed as a u64 little-endian value so
cross-module implementations use the same binary layout.
- Around line 16-22: Replace the misspelled terminology in the document: change
all occurrences of "Filed" and "filed" to "Field" and "field", and "memer" to
"member" so the descriptions of Hash, meta_key, meta_value, data_key,
data_value, version, and related key/value layout language are accurate and
consistent; ensure the key/value paragraph and any captions (including the terms
used around meta_key/meta_value and data_key/data_value, and mentions of
version/field/member) are updated accordingly and also apply the same fixes in
the related "50-50" document.
In `@src/storage/src/custom_comparator.rs`:
- Around line 372-385: The comparator zsets_score_key_compare currently can
return Ordering::Equal for keys whose encoded bytes differ (e.g., 0.0 vs -0.0 or
distinct NaN payloads), which breaks RocksDB's requirement that equality implies
identical keys; fix by either canonicalizing floating scores during encoding in
ZSetsScoreKey::encode (normalize -0.0 to 0.0 and collapse/normalize NaN
payloads) or, more conservatively, keep numeric comparison but add a
deterministic tie-breaker using the raw f64 bit pattern (e.g., compare
f64::to_bits()) and then the member bytes so that zsets_score_key_compare never
returns Equal for distinct byte sequences (also update tests like
test_zsets_score_key_compare_positive_negative_zero accordingly).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 700a826c-1ee3-4257-8bc0-d54ccd82f4f9
⛔ Files ignored due to path filters (6)
- `doc/images/01-cf使用.png` is excluded by `!**/*.png`
- `doc/images/01-hash编码.png` is excluded by `!**/*.png`
- `doc/images/01-list编码.png` is excluded by `!**/*.png`
- `doc/images/01-set编码.png` is excluded by `!**/*.png`
- `doc/images/01-string编码.png` is excluded by `!**/*.png`
- `doc/images/01-zset编码.png` is excluded by `!**/*.png`
📒 Files selected for processing (27)
- doc/01-key编解码.md
- src/storage/src/custom_comparator.rs
- src/storage/src/data_compaction_filter.rs
- src/storage/src/expiration_manager.rs
- src/storage/src/format_base_data_value.rs
- src/storage/src/format_base_key.rs
- src/storage/src/format_base_meta_value.rs
- src/storage/src/format_base_value.rs
- src/storage/src/format_list_meta_value.rs
- src/storage/src/format_lists_data_key.rs
- src/storage/src/format_member_data_key.rs
- src/storage/src/format_strings_value.rs
- src/storage/src/format_zset_score_key.rs
- src/storage/src/lib.rs
- src/storage/src/meta_compaction_filter.rs
- src/storage/src/redis.rs
- src/storage/src/redis_hashes.rs
- src/storage/src/redis_lists.rs
- src/storage/src/redis_multi.rs
- src/storage/src/redis_sets.rs
- src/storage/src/redis_strings.rs
- src/storage/src/redis_zsets.rs
- src/storage/src/statistics.rs
- src/storage/src/storage.rs
- src/storage/src/storage_impl.rs
- src/storage/src/zset_score_key_format.rs
- src/storage/tests/redis_basic_test.rs
💤 Files with no reviewable changes (1)
- src/storage/src/zset_score_key_format.rs
✅ Files skipped from review due to trivial changes (15)
- src/storage/src/format_base_value.rs
- src/storage/src/format_strings_value.rs
- src/storage/src/format_base_data_value.rs
- src/storage/src/redis.rs
- src/storage/src/storage.rs
- src/storage/src/expiration_manager.rs
- src/storage/src/format_base_meta_value.rs
- src/storage/tests/redis_basic_test.rs
- src/storage/src/redis_zsets.rs
- src/storage/src/meta_compaction_filter.rs
- src/storage/src/redis_lists.rs
- src/storage/src/data_compaction_filter.rs
- src/storage/src/redis_sets.rs
- src/storage/src/format_list_meta_value.rs
- src/storage/src/redis_strings.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- src/storage/src/storage_impl.rs
- src/storage/src/redis_hashes.rs
- src/storage/src/format_member_data_key.rs
- src/storage/src/lib.rs
```rust
use crate::error::InvalidFormatSnafu;
use crate::error::Result;
use crate::storage_define::{PREFIX_RESERVE_LENGTH, SUFFIX_RESERVE_LENGTH};
use crate::storage_define::{decode_user_key, encode_user_key, seek_userkey_delim};
use bytes::{BufMut, Bytes, BytesMut};
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
rg -n '\bsize_of::<[^>]+>\(\)' src/storage/src/format_zset_score_key.rs
rg -n 'std::mem::size_of|use .*size_of' src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 941

🏁 Script executed:

```bash
#!/bin/bash
# Search for the test mentioned in the review comment
rg -n 'test_exact_capacity_estimation|capacity\(\) ==' src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 102

🏁 Script executed:

```bash
#!/bin/bash
# Read the test implementation
sed -n '551,580p' src/storage/src/format_zset_score_key.rs | cat -n
```

Repository: arana-db/kiwi

Length of output: 1511

🏁 Script executed:

```bash
#!/bin/bash
# Verify the encode() function uses size_of internally
rg -B5 -A15 'fn encode\(&self\)' src/storage/src/format_zset_score_key.rs | head -40
```

Repository: arana-db/kiwi

Length of output: 888
`size_of` is not in scope here, so this new module will not compile on toolchains predating Rust 1.80 (which added `size_of` to the prelude). If the crate's MSRV is older, every unqualified `size_of::<...>()` in this file is unresolved unless you import `std::mem::size_of` or fully qualify each call.
Suggested fix

```diff
+use std::mem::size_of;
+
 use crate::error::InvalidFormatSnafu;
 use crate::error::Result;
 use crate::storage_define::{PREFIX_RESERVE_LENGTH, SUFFIX_RESERVE_LENGTH};
 use crate::storage_define::{decode_user_key, encode_user_key, seek_userkey_delim};
 use bytes::{BufMut, Bytes, BytesMut};
```

Also applies to: 68-73, 89-92, 136-169
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/storage/src/format_zset_score_key.rs` around lines 18 - 22, The module
fails to compile because size_of is not imported; add an import for
std::mem::size_of (e.g., add use std::mem::size_of;) at the top of the module so
all unqualified size_of::<T>() calls in functions like the zset score key
encoder/decoder and any other locations (the blocks around the previously noted
ranges) resolve correctly, or alternatively qualify each call as
std::mem::size_of::<T>()—make the change consistently wherever size_of is used.
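For reference, a fully qualified call resolves on any toolchain without touching the imports (`version_width` is just an illustrative wrapper, not a name from the crate):

```rust
// Fully qualifying size_of makes the call resolve without any import.
fn version_width() -> usize {
    std::mem::size_of::<u64>()
}

fn main() {
    // A u64 version field always occupies 8 bytes in the encoded key.
    assert_eq!(version_width(), 8);
    assert_eq!(std::mem::size_of::<f64>(), 8);
}
```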
```rust
fn test_exact_capacity_estimation() {
    // Verify accurate capacity estimation and ensure no extra memory allocation
    let test_cases = vec![
        (b"simple" as &[u8], b"member" as &[u8]),
        (b"key\x00with\x00nulls", b"mem\x00ber"),
        (b"", b""),
        (b"a", b"b"),
    ];

    for (key, member) in test_cases {
        let score_key = ZSetsScoreKey::new(key, 1, 1.0, member);
        let encoded = score_key.encode().expect("encode failed");

        // Calculate expected length using actual encoded key length
        let mut encoded_key_buf = BytesMut::new();
        encode_user_key(key, &mut encoded_key_buf).expect("encode key failed");
        let expected_len = PREFIX_RESERVE_LENGTH
            + encoded_key_buf.len()
            + size_of::<u64>() // version
            + size_of::<f64>() // score
            + member.len()
            + SUFFIX_RESERVE_LENGTH;

        // Verify actual encoded length matches expected length
        assert_eq!(
            encoded.capacity(),
            expected_len,
            "Capacity mismatch for key={:?}, member={:?}",
            key,
            member
        );

        // Verify BytesMut capacity is sufficient (allocator may allocate more for alignment)
        assert!(
            encoded.capacity() >= expected_len,
            "Capacity underallocated for key={:?}, member={:?}",
            key,
            member
        );
```
🧩 Analysis chain
🏁 Script executed:

```bash
# First, let's find and examine the file
fd format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 97

🏁 Script executed:

```bash
# Read the specific lines mentioned in the review comment
wc -l src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 101

🏁 Script executed:

```bash
# Read the test function to see current state
sed -n '551,589p' src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 1616
🌐 Web query:
BytesMut with_capacity guarantee exact or minimum
💡 Result:
BytesMut::with_capacity(capacity) guarantees a minimum capacity of the specified value, not an exact capacity. The returned BytesMut can hold at least capacity bytes without reallocating, but the actual capacity may be larger depending on Vec::with_capacity behavior (which may round up to a power of 2 or similar for efficiency). This is confirmed by the official documentation: "The returned BytesMut will be able to hold at least capacity bytes without reallocating." Examples show BytesMut::with_capacity(64) has capacity == 64, but the guarantee is ">=". Source code implements it as BytesMut::from_vec(Vec::with_capacity(capacity)), inheriting Vec's minimum capacity behavior. Note: BytesMut is from the bytes crate (used by Tokio), not std::io::BytesMut (which doesn't exist).
Citations:
- 1: https://docs.rs/bytes/latest/bytes/struct.BytesMut.html
- 2: https://docs.rs/bytes/0.4.8/bytes/struct.BytesMut.html
- 3: https://docs.rs/bytes/1.10.1/x86_64-pc-windows-msvc/src/bytes/bytes_mut.rs.html
- 4: https://people.eecs.berkeley.edu/~pschafhalter/pub/erdos/doc/bytes/struct.BytesMut.html
- 5: https://durch.github.io/rust-goauth/bytes/struct.BytesMut.html
- 6: https://codebrowser.dev/tokio/crates/bytes-1.5.0/src/bytes_mut.rs.html
- 7: https://tikv.github.io/doc/bytes/struct.BytesMut.html
- 8: https://cseweb.ucsd.edu/classes/sp22/cse223B-a/tribbler/bytes/struct.BytesMut.html
🏁 Script executed:

```bash
# Let's also check if there's an encode() method and understand what BytesMut len represents
rg -A 5 "fn encode\(" src/storage/src/format_zset_score_key.rs | head -30
```

Repository: arana-db/kiwi

Length of output: 342

🏁 Script executed:

```bash
# Check what encode_user_key does and if it sets the actual len
rg -B 3 -A 10 "fn encode_user_key" src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 39

🏁 Script executed:

```bash
# Search for encode_user_key definition
rg "encode_user_key" src/storage/src/format_zset_score_key.rs -B 2 -A 15
```

Repository: arana-db/kiwi

Length of output: 7365

🏁 Script executed:

```bash
# Get more context on the encode() method
sed -n '500,550p' src/storage/src/format_zset_score_key.rs
```

Repository: arana-db/kiwi

Length of output: 1598
Replace exact capacity assertion with length assertion.
BytesMut::with_capacity(n) guarantees at least n bytes, not exactly n. The exact capacity check makes this test allocator- and version-dependent and will fail if the allocator rounds up. Check len() instead to verify the actual encoded data layout, and keep capacity() >= expected_len for the allocation guarantee.
Suggested fix

```diff
-        assert_eq!(
-            encoded.capacity(),
-            expected_len,
-            "Capacity mismatch for key={:?}, member={:?}",
-            key,
-            member
-        );
+        assert_eq!(
+            encoded.len(),
+            expected_len,
+            "Encoded length mismatch for key={:?}, member={:?}",
+            key,
+            member
+        );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/storage/src/format_zset_score_key.rs` around lines 551 - 589, The test
test_exact_capacity_estimation currently asserts encoded.capacity() ==
expected_len which is wrong because BytesMut::with_capacity may round up; change
the strict equality to assert_eq!(encoded.len(), expected_len) to verify the
actual encoded data layout, and keep or add the existing
assert!(encoded.capacity() >= expected_len) to preserve the allocation
guarantee; update the assertions around ZSetsScoreKey::new(...).encode(),
encode_user_key(), PREFIX_RESERVE_LENGTH and SUFFIX_RESERVE_LENGTH accordingly
so the test checks encoded.len() for exact byte layout and capacity() for
allocation sufficiency.
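The distinction driving this fix can be shown with std's `Vec`, whose `with_capacity` contract `BytesMut` inherits: the requested capacity is a minimum, and only `len` reflects bytes actually written. (`Vec` is used here so the sketch needs no external crate; `encode_demo` is an illustrative stand-in for the real encoder.)

```rust
// Sketch of the capacity-vs-length contract the fixed test relies on.
fn encode_demo(payload: &[u8], reserve: usize) -> (usize, usize) {
    let mut buf: Vec<u8> = Vec::with_capacity(reserve);
    buf.extend_from_slice(payload);
    // (bytes written, allocated capacity)
    (buf.len(), buf.capacity())
}

fn main() {
    let (len, cap) = encode_demo(b"abc", 10);
    // len is exact: it is what a layout test should assert on.
    assert_eq!(len, 3);
    // capacity is only guaranteed to be at least the request.
    assert!(cap >= 10);
}
```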
```rust
let score_raw: [u8; 8] = score_bytes
    .get(0..8)
    .and_then(|b| b.try_into().ok())
    .ok_or_else(|| {
        StorageError::InvalidFormat(
            "Invalid zset score payload length (expected 8 bytes)".to_string(),
        )
    })?;
let score = f64::from_le_bytes(score_raw);
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
rg -n 'fn encode_.*(prefix|key)|to_be_bytes\(' src/storage/src/redis_multi.rs
rg -n 'put_u64_le|from_le_bytes|decode_fixed::<u64>' src/storage/src/format_zset_score_key.rs src/storage/src/custom_comparator.rs
```

Repository: arana-db/kiwi

Length of output: 2816

🏁 Script executed:

```bash
#!/bin/bash
# Check what decode_fixed does
rg -A 5 'fn decode_fixed|macro.*decode_fixed' src/storage/src/custom_comparator.rs
# Check if IteratorMode/Direction are imported or used
rg -n 'IteratorMode|Direction|use.*rocksdb' src/storage/src/redis_multi.rs | head -20
# Check if helpers are called in rename/del methods
rg -B 2 -A 10 'fn rename|fn del\(' src/storage/src/redis_multi.rs | head -40
```

Repository: arana-db/kiwi

Length of output: 1524

🏁 Script executed:

```bash
#!/bin/bash
# Get the full del and rename methods
rg -A 50 'pub fn del\(&self' src/storage/src/redis_multi.rs | head -100
# Get full rename method
rg -A 80 'pub fn rename\(&self' src/storage/src/redis_multi.rs | head -120
# Check decode_fixed source - likely from rocksdb or local crate
rg 'fn decode_fixed|macro_rules.*decode_fixed' --type rust
```

Repository: arana-db/kiwi

Length of output: 7137

🏁 Script executed:

```bash
#!/bin/bash
# Get the decode_fixed implementation to check endianness
rg -A 10 'pub fn decode_fixed' src/storage/src/coding.rs
# Check FixedInt trait to understand the implementation
rg -B 5 -A 15 'trait FixedInt' src/storage/src/coding.rs
# Also check what put_u64_le does vs to_be_bytes for comparison
rg -B 2 -A 8 'fn put_u64_le|fn put_u64_be' src/storage/src/coding.rs
```

Repository: arana-db/kiwi

Length of output: 1041
Two critical compile and runtime blockers must be fixed:

- Missing imports: `IteratorMode` and `Direction` are used at lines 310, 334, 358, 382, and 466 but not imported. Add them to the `rocksdb` import at line 26.
- Endianness mismatch in helper encoders: The helpers `encode_hash_data_prefix`, `encode_hash_data_key`, `encode_list_data_prefix`, `encode_list_data_key`, `encode_zsets_member_prefix`, and `encode_zsets_score_prefix` all write `version` (and `index` in the list key helper) using `to_be_bytes()` (big-endian). However, `decode_fixed` in `src/storage/src/coding.rs` reads all fixed integers using `from_le_bytes()` (little-endian). When `del()` and `rename()` call these helpers to construct prefixes for scanning, they will not match the actual keys written elsewhere, breaking prefix deletion and key enumeration.

Suggested fixes

For the imports, change line 26:

```diff
-use rocksdb::{WriteBatch, WriteOptions, ReadOptions};
+use rocksdb::{WriteBatch, WriteOptions, ReadOptions, IteratorMode, Direction};
```

For each occurrence of `to_be_bytes()` in the six helper functions (lines 487, 499, 514, 526, 529, 541, 553):

```diff
-    result.extend_from_slice(&version.to_be_bytes());
+    result.extend_from_slice(&version.to_le_bytes());
```

Apply the same change to line 529 for `index.to_be_bytes()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/storage/src/redis_multi.rs` around lines 399 - 407, Add the missing
imports IteratorMode and Direction to the rocksdb import so the uses in
redis_multi.rs compile; then fix the endianness mismatch by changing all
occurrences of to_be_bytes() to to_le_bytes() in the helper encoder functions
encode_hash_data_prefix, encode_hash_data_key, encode_list_data_prefix,
encode_list_data_key (also change index.to_be_bytes() to index.to_le_bytes()),
encode_zsets_member_prefix, and encode_zsets_score_prefix so keys/prefixes are
encoded in little-endian matching decode_fixed in coding.rs and allowing prefix
scans (del/rename) to find the correct keys.
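The scan breakage can be reproduced in miniature: a prefix built with `to_be_bytes()` never byte-matches a key written with `to_le_bytes()`. The `key_le`/`key_be` helpers below are illustrative, not the crate's encoders:

```rust
// Illustrative writers, one per endianness: user key bytes followed
// by an 8-byte version.
fn key_le(user_key: &[u8], version: u64) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.extend_from_slice(&version.to_le_bytes());
    k
}

fn key_be(user_key: &[u8], version: u64) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.extend_from_slice(&version.to_be_bytes());
    k
}

fn main() {
    let stored = key_le(b"h", 2); // what a little-endian write path persists
    // A big-endian prefix misses the stored key entirely...
    assert!(!stored.starts_with(&key_be(b"h", 2)));
    // ...while the matching-endianness prefix finds it.
    assert!(stored.starts_with(&key_le(b"h", 2)));
}
```

This is why the prefix scans in `del()`/`rename()` come up empty even though the data is present.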
Summary by CodeRabbit
Documentation
New Features
Refactor
Bug Fixes
Tests