feat: try support build index for nested path by ai #8042
fengys1996 wants to merge 2 commits into GreptimeTeam:main from
Conversation
Code Review
This pull request introduces support for indexing nested sub-fields within Struct or Json columns. It adds a `SubField` variant to `IndexTarget`, updates region options to configure these targets, and implements the extraction and casting logic in the `InvertedIndexer`. Feedback highlights that sub-field indexing currently misses primary key (tag) columns. Additionally, the extraction logic for JSON sub-fields is inefficient due to repeated parsing, and contains a bug where intermediate objects or arrays cause traversal to fail. There is also an opportunity to optimize performance by grouping extractions for multiple sub-fields on the same column.
```rust
for indexed_target in &self.indexed_targets {
    if matches!(indexed_target.target, IndexTarget::ColumnId(_)) {
        let col_id = match &indexed_target.target {
            IndexTarget::ColumnId(col_id) => *col_id,
            IndexTarget::SubField { .. } => unreachable!(),
        };
```
In `do_update`, `SubField` targets are only handled in the `else` block (starting at line 444), which exclusively uses `batch.field_col_value(col_id)`. This implementation fails to support sub-field indexing for columns that are part of the primary key (tags), as those are stored separately in the `Batch`, and `field_col_value` will return `None` for them. This means nested fields within tag columns won't be indexed during flush or compaction.
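To make the failure mode concrete, here is a toy model of the problem. All names below (`Batch`, `tag_cols`, `field_cols`, `any_col_value`) are hypothetical simplifications, not the actual GreptimeDB API; the point is only that a lookup restricted to field columns misses values stored on the tag side, so the fix needs a combined lookup.

```rust
use std::collections::HashMap;

// Toy model (hypothetical names, not the real Batch API): tag (primary key)
// columns and field columns are stored separately, so a lookup that only
// consults field columns returns None for tag columns.
struct Batch {
    tag_cols: HashMap<u32, String>,
    field_cols: HashMap<u32, String>,
}

impl Batch {
    // Check field columns first, then fall back to tag columns, so that
    // sub-field extraction also covers primary key columns.
    fn any_col_value(&self, col_id: u32) -> Option<&String> {
        self.field_cols
            .get(&col_id)
            .or_else(|| self.tag_cols.get(&col_id))
    }
}

fn main() {
    let batch = Batch {
        tag_cols: HashMap::from([(1, "tag-json".to_string())]),
        field_cols: HashMap::from([(2, "field-json".to_string())]),
    };
    // A field-only lookup would miss column 1; the combined lookup finds it.
    assert_eq!(batch.any_col_value(1).map(String::as_str), Some("tag-json"));
    assert_eq!(batch.any_col_value(2).map(String::as_str), Some("field-json"));
}
```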
```rust
fn extract_subfield_value(root: ValueRef<'_>, path: &[String]) -> Option<Value> {
    let mut current = Value::from(root);
    for segment in path {
        current = match current {
            Value::Struct(struct_value) => {
                let fields = struct_value.struct_type().fields();
                let index = fields.iter().position(|field| field.name() == segment)?;
                struct_value.items().get(index).cloned()?
            }
            Value::Json(_) => {
                let mut json: serde_json::Value = current.try_into().ok()?;
                let obj = json.as_object_mut()?;
                json_value_to_value(obj.remove(segment)?)?
            }
            _ => return None,
        };
    }
    Some(current)
}
```
The current implementation of `extract_subfield_value` has significant efficiency and correctness issues:

- Efficiency: `Value::from(root)` clones the entire structure (e.g., a large `Struct`) at the start. For `Json` columns, it parses the JSON string into a `serde_json::Value` at every segment of the path traversal. This will cause severe performance degradation during index building.
- Correctness: `json_value_to_value` (called on line 592) returns `None` for `Object` and `Array`. This means that if a path is `a.b` and `a` is an object within a JSON column, the traversal will stop at `a` and return `None`, failing to extract `b`. Nested paths in JSON columns are effectively unsupported by this logic.

It is recommended to use `ValueRef` for traversal to avoid cloning, and for JSON columns, parse the string once and traverse the `serde_json::Value` tree for all remaining segments.
```rust
for i in 0..n {
    self.value_buf.clear();
    let value = values.data.get_ref(i);
    let Some(value) = extract_subfield_value(value, path)
        .and_then(|value| cast_value_for_index_type(value, value_type))
    else {
        self.index_creator
            .push_with_name(target_key, None)
            .await
            .context(PushIndexValueSnafu)?;
        continue;
    };

    IndexValueCodec::encode_nonnull_value(
        value.as_value_ref(),
        &sort_field,
        &mut self.value_buf,
    )
    .context(EncodeSnafu)?;
    self.index_creator
        .push_with_name(target_key, Some(&self.value_buf))
        .await
        .context(PushIndexValueSnafu)?;
}
```
Processing subfield targets row-by-row for each target independently leads to redundant work. If multiple subfield indexes are defined on the same column (especially a JSON column), the column value is extracted and processed (including JSON parsing) multiple times per row. Consider grouping subfield extractions by column_id to parse the JSON once per row per column.
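The grouping step can be sketched with simplified types (a target as a `(column_id, path)` pair; both the function name and the tuple shape are assumptions for illustration). Grouping by `column_id` means each column value is extracted, and a JSON value parsed, once per row, then probed for every path registered on that column:

```rust
use std::collections::HashMap;

// Sketch of the suggested optimization (simplified, hypothetical types):
// collect all sub-field paths per column id so the column value can be
// decoded once per row and reused for every path on that column.
fn group_targets_by_column(
    targets: &[(u32, Vec<String>)],
) -> HashMap<u32, Vec<&Vec<String>>> {
    let mut groups: HashMap<u32, Vec<&Vec<String>>> = HashMap::new();
    for (col_id, path) in targets {
        groups.entry(*col_id).or_default().push(path);
    }
    groups
}

fn main() {
    let targets = vec![
        (1, vec!["a".to_string(), "b".to_string()]),
        (1, vec!["c".to_string()]),
        (2, vec!["d".to_string()]),
    ];
    let groups = group_targets_by_column(&targets);
    // Column 1 carries two paths: its JSON value would be parsed once per
    // row and traversed twice, instead of parsed twice.
    assert_eq!(groups[&1].len(), 2);
    assert_eq!(groups[&2].len(), 1);
}
```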
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
RUN CI ONLY, NO MERGE
PR Checklist
Please convert it to a draft if some of the following conditions are not met.