feat: support inverted index for nested path by fengys1996 · Pull Request #8075 · GreptimeTeam/greptimedb

fengys1996 · 2026-05-07T04:26:01Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR requires documentation updates.
API changes are backward compatible.
Schema or data changes are backward compatible.

gemini-code-assist

Code Review

This pull request introduces support for nested paths in inverted index targets by extending the IndexTarget enum and updating the InvertedIndexer to handle these paths. The review identifies several performance issues related to repeated string allocations and formatting within loops, recommending that target_key be pre-calculated. Additionally, the review points out a logic gap where nested paths are skipped in do_update and raises a design concern regarding potential parsing collisions due to the lack of escaping for delimiters in the encoded target string.

gemini-code-assist · 2026-05-07T04:30:20Z

+        for target in &self.indexed_targets {
+            let Some(column_meta) = self.metadata.column_by_id(target.column_id()) else {
                debug!(
                    "Column {} not found in the metadata during building inverted index",
-                    col_id
+                    target.column_id()
                );
                continue;
            };
-            let column_name = &column_meta.column_schema.name;
-            if let Some(column_array) = batch.column_by_name(column_name) {
+            let column_name = if let Some(path) = target.path() {
+                format!("{}.{}", column_meta.column_schema.name, path.join("."))
+            } else {
+                column_meta.column_schema.name.clone()
+            };


Calculating target_key and column_name inside the row loop (or repeatedly in the target loop) is inefficient. target.to_string() involves string joining and formatting, which should be done once per target per batch. Ideally, these should be pre-calculated and stored in the InvertedIndexer struct, similar to how target_key was handled before this change.

Suggested change

-        for target in &self.indexed_targets {
-            let Some(column_meta) = self.metadata.column_by_id(target.column_id()) else {
-                debug!(
-                    "Column {} not found in the metadata during building inverted index",
-                    col_id
-                    target.column_id()
-                );
-                continue;
-            };
-            let column_name = &column_meta.column_schema.name;
-            if let Some(column_array) = batch.column_by_name(column_name) {
-            let column_name = if let Some(path) = target.path() {
-                format!("{}.{}", column_meta.column_schema.name, path.join("."))
-            } else {
-                column_meta.column_schema.name.clone()
-            };
+        for target in &self.indexed_targets {
+            let target_key = target.to_string();
+            let Some(column_meta) = self.metadata.column_by_id(target.column_id()) else {
+                debug!(
+                    "Column {} not found in the metadata during building inverted index",
+                    target.column_id()
+                );
+                continue;
+            };
+            let column_name = match target.path() {
+                Some(path) => format!("{}.{}", column_meta.column_schema.name, path.join(".")),
+                None => column_meta.column_schema.name.clone(),
+            };
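The suggestion above still builds the strings once per target per batch; the comment's ideal is to compute them once and keep them on `InvertedIndexer`. A minimal sketch of that caching, assuming a simplified, hypothetical `IndexTarget` struct (the PR uses an enum whose exact shape is not shown here), a `Display` encoding of `col_id` or `col_id:a.b` inferred from the `col_id:a.b.c` example in the review, and hypothetical `prepare_targets`/`keys_for_rows` helpers:

```rust
use std::fmt;

/// Hypothetical stand-in for the PR's `IndexTarget`: a column id plus an
/// optional nested path. The real type and its encoding may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexTarget {
    pub column_id: u32,
    pub path: Option<Vec<String>>,
}

impl IndexTarget {
    pub fn column_id(&self) -> u32 {
        self.column_id
    }
    pub fn path(&self) -> Option<&[String]> {
        self.path.as_deref()
    }
}

impl fmt::Display for IndexTarget {
    // Assumed encoding: `col_id` for plain columns, `col_id:a.b` for nested paths.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.path {
            Some(path) => write!(f, "{}:{}", self.column_id, path.join(".")),
            None => write!(f, "{}", self.column_id),
        }
    }
}

/// Per-target strings computed once (e.g. when the indexer is created)
/// instead of once per row. Hypothetical type.
pub struct PreparedTarget {
    pub target: IndexTarget,
    pub target_key: String,
    pub column_name: String,
}

/// Resolves each target's key and (possibly nested) column name up front.
/// Targets whose column is unknown are skipped, like the `continue` in the diff.
pub fn prepare_targets(
    targets: &[IndexTarget],
    column_name_of: impl Fn(u32) -> Option<String>,
) -> Vec<PreparedTarget> {
    targets
        .iter()
        .filter_map(|target| {
            let base = column_name_of(target.column_id())?;
            let column_name = match target.path() {
                Some(path) => format!("{}.{}", base, path.join(".")),
                None => base,
            };
            Some(PreparedTarget {
                target_key: target.to_string(),
                column_name,
                target: target.clone(),
            })
        })
        .collect()
}

/// Stand-in for the row loop: records which key each row would be pushed
/// under. Every row borrows the cached key, so no `String` is built per row.
pub fn keys_for_rows(prepared: &[PreparedTarget], num_rows: usize) -> Vec<&str> {
    let mut pushed = Vec::with_capacity(prepared.len() * num_rows);
    for p in prepared {
        for _ in 0..num_rows {
            pushed.push(p.target_key.as_str());
        }
    }
    pushed
}
```

With such a cache, the row loop would pass `&p.target_key` to `push_with_name`/`push_with_name_n` rather than `&target.to_string()`, turning a per-row allocation into a borrow.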

gemini-code-assist · 2026-05-07T04:30:20Z

                    if value_ref.is_null() {
                        self.index_creator
-                            .push_with_name(target_key, None)
+                            .push_with_name(&target.to_string(), None)


Use the pre-calculated target_key here to avoid repeated string allocations in the row loop.

Suggested change

-                            .push_with_name(&target.to_string(), None)
+                            .push_with_name(&target_key, None)

gemini-code-assist · 2026-05-07T04:30:20Z

                        .context(EncodeSnafu)?;
                        self.index_creator
-                            .push_with_name(target_key, Some(&self.value_buf))
+                            .push_with_name(&target.to_string(), Some(&self.value_buf))


Use the pre-calculated target_key here to avoid repeated string allocations in the row loop.

Suggested change

-                            .push_with_name(&target.to_string(), Some(&self.value_buf))
+                            .push_with_name(&target_key, Some(&self.value_buf))

gemini-code-assist · 2026-05-07T04:30:20Z


                    self.index_creator
-                        .push_with_name_n(target_key, elem, *count)
+                        .push_with_name_n(&target.to_string(), elem, *count)


Use the pre-calculated target_key here.

Suggested change

-                        .push_with_name_n(&target.to_string(), elem, *count)
+                        .push_with_name_n(&target_key, elem, *count)

gemini-code-assist · 2026-05-07T04:30:20Z

+        for target in &self.indexed_targets {
+            if target.path().is_some() {
+                continue;
+            }


Pre-calculate target_key here and note that nested paths are currently skipped in do_update. Since do_update is the primary method used for indexing during regular SST writing, this means nested indices will not be created for most SSTs. If mito2::read::Batch does not contain flattened columns, the indexer should be updated to extract nested values from the JSON vectors.

        for target in &self.indexed_targets {
            let target_key = target.to_string();
            if target.path().is_some() {
                continue;
            }
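If `mito2::read::Batch` only carries the top-level JSON column, `do_update` would need to walk each row's JSON value along the target path before pushing it. A minimal sketch of that lookup, using a self-contained, hypothetical `Json` enum and hypothetical `get_path`/`extract_column` helpers in place of GreptimeDB's JSON vector API, which this PR does not show. Rows where the path is missing yield `None`, which an indexer could push as a null entry, mirroring the existing `value.is_null()` branch:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded JSON value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Follows `path` one key at a time. Returns `None` if a key is missing or a
/// non-object value is reached before the path ends.
pub fn get_path<'a>(value: &'a Json, path: &[String]) -> Option<&'a Json> {
    let mut current = value;
    for key in path {
        current = match current {
            Json::Object(map) => map.get(key)?,
            _ => return None,
        };
    }
    Some(current)
}

/// Extracts the value at `path` for every row of a JSON column.
pub fn extract_column<'a>(rows: &'a [Json], path: &[String]) -> Vec<Option<&'a Json>> {
    rows.iter().map(|row| get_path(row, path)).collect()
}
```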

gemini-code-assist · 2026-05-07T04:30:20Z


                    self.index_creator
-                        .push_with_name_n(target_key, value, n)
+                        .push_with_name_n(&target.to_string(), value, n)


Use the pre-calculated target_key here.

Suggested change

-                        .push_with_name_n(&target.to_string(), value, n)
+                        .push_with_name_n(&target_key, value, n)

gemini-code-assist · 2026-05-07T04:30:20Z

                        if value.is_null() {
                            self.index_creator
-                                .push_with_name(target_key, None)
+                                .push_with_name(&target.to_string(), None)


Use the pre-calculated target_key here.

Suggested change

-                                .push_with_name(&target.to_string(), None)
+                                .push_with_name(&target_key, None)

gemini-code-assist · 2026-05-07T04:30:20Z

                            .context(EncodeSnafu)?;
                            self.index_creator
-                                .push_with_name(target_key, Some(&self.value_buf))
+                                .push_with_name(&target.to_string(), Some(&self.value_buf))


Use the pre-calculated target_key here.

Suggested change

-                                .push_with_name(&target.to_string(), Some(&self.value_buf))
+                                .push_with_name(&target_key, Some(&self.value_buf))

gemini-code-assist · 2026-05-07T04:30:20Z

+
+        let nested_path_str = nested_path.trim();
+        ensure!(!nested_path_str.is_empty(), InvalidPathSnafu { key });
+        // FIXME(fys): do we need to handle special characters in here and encode method?


The use of . and : as delimiters in the encoded IndexTarget string without escaping is problematic. JSON keys frequently contain these characters. For example, a nested path like ["a.b", "c"] would be encoded as col_id:a.b.c, which the decode method would incorrectly parse as ["a", "b", "c"]. Consider implementing an escaping mechanism or using a more robust serialization format for the target key to avoid collisions.
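One way to remove the ambiguity is to backslash-escape the delimiters inside each path segment before joining. A minimal sketch, assuming the `col_id:seg.seg` layout described in the comment; `encode` and `decode` are hypothetical free functions, not the PR's `IndexTarget` methods, and they assume a non-empty path, matching the `ensure!` on an empty nested path in the diff (an empty path and a single empty segment would both encode to `col_id:`):

```rust
/// Encodes a column id and nested path as `col_id:seg1.seg2...`, escaping
/// `\`, `.` and `:` inside each segment with a backslash.
pub fn encode(column_id: u32, path: &[String]) -> String {
    let mut out = column_id.to_string();
    out.push(':');
    for (i, segment) in path.iter().enumerate() {
        if i > 0 {
            out.push('.');
        }
        for ch in segment.chars() {
            if matches!(ch, '\\' | '.' | ':') {
                out.push('\\');
            }
            out.push(ch);
        }
    }
    out
}

/// Inverse of `encode`. Returns `None` for malformed input (no `:`, a
/// non-numeric column id, or a dangling trailing backslash).
pub fn decode(encoded: &str) -> Option<(u32, Vec<String>)> {
    let (id, rest) = encoded.split_once(':')?;
    let column_id: u32 = id.parse().ok()?;
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut chars = rest.chars();
    while let Some(ch) = chars.next() {
        match ch {
            // An escaped character is kept literally, even if it is `.` or `:`.
            '\\' => current.push(chars.next()?),
            // An unescaped `.` ends the current segment.
            '.' => segments.push(std::mem::take(&mut current)),
            _ => current.push(ch),
        }
    }
    segments.push(current);
    Some((column_id, segments))
}
```

With escaping, `["a.b", "c"]` under column 7 encodes to `7:a\.b.c` while `["a", "b", "c"]` stays `7:a.b.c`, so the two keys no longer collide. A length-prefixed or JSON-array encoding, the "more robust serialization format" the comment mentions, would avoid escaping at the cost of a less readable key.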

feat: support inverted index for nested path

cb55624

github-actions Bot added size/S and docs-not-required labels May 7, 2026

gemini-code-assist Bot reviewed May 7, 2026


add advance api for index-related trait

803c94d

github-actions Bot added size/M and removed size/S labels May 7, 2026

fengys1996 wants to merge 2 commits into GreptimeTeam:main from fengys1996:feat/inverted-index-for-nested-path


