feat: flow inc query terminal metrics transport#8045
Conversation
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
There was a problem hiding this comment.
Code Review
This pull request introduces OutputMetrics and OutputWithMetrics to track and expose terminal metrics, such as region watermarks, from query results across the client, frontend, and gRPC layers. It updates the gRPC Flight stream handling to support interleaved metrics messages and adds new API methods to retrieve these metrics. A critical issue was identified in the client's stream processing where a valid RecordBatch could be dropped if a subsequent Metrics message is malformed; yielding the batch before processing the next message is recommended to prevent data loss.
| FlightMessage::RecordBatch(arrow_batch) => { | ||
| yield Ok(RecordBatch::from_df_record_batch( | ||
| let result_to_yield = RecordBatch::from_df_record_batch( | ||
| schema_cloned.clone(), | ||
| arrow_batch, | ||
| )) | ||
| ); | ||
|
|
||
| if let Some(next_flight_message_result) = | ||
| flight_message_stream.next().await | ||
| { | ||
| match next_flight_message_result { | ||
| Ok(FlightMessage::Metrics(s)) => { | ||
| match parse_terminal_metrics(&s) { | ||
| Ok(m) => { | ||
| metrics_ref.swap(Some(Arc::new(m))); | ||
| } | ||
| Err(e) => { | ||
| yield Err(BoxedError::new(e)) | ||
| .context(ExternalSnafu); | ||
| break; | ||
| } | ||
| }; | ||
| } | ||
| Ok(FlightMessage::RecordBatch(rb)) => { | ||
| buffered_message = Some(FlightMessage::RecordBatch(rb)); | ||
| } | ||
| Ok(_) => { | ||
| yield IllegalFlightMessagesSnafu {reason: "A RecordBatch message can only be succeeded by a Metrics message or another RecordBatch message"} | ||
| .fail() | ||
| .map_err(BoxedError::new) | ||
| .context(ExternalSnafu); | ||
| break; | ||
| } | ||
| Err(e) => { | ||
| yield Err(BoxedError::new(e)).context(ExternalSnafu); | ||
| break; | ||
| } | ||
| } | ||
| } else { | ||
| stream_ended = true; | ||
| } | ||
|
|
||
| yield Ok(result_to_yield) | ||
| } |
There was a problem hiding this comment.
The current implementation for handling FlightMessage::RecordBatch can lead to data loss. If an error occurs while processing the message following a RecordBatch (e.g., an invalid Metrics message), the stream yields an error and terminates, but the last valid RecordBatch that was received is dropped without being yielded to the consumer.
While the test test_invalid_terminal_metrics_after_record_batch_fails_before_yielding_batch seems to confirm this behavior, dropping valid data due to subsequent invalid metadata is not ideal for clients. The consumer should receive all valid data batches, and then an error if one occurs.
To fix this, the RecordBatch should be yielded immediately after it's created, before processing the next message in the stream. This ensures no data is lost if a subsequent error occurs.
FlightMessage::RecordBatch(arrow_batch) => {
let result_to_yield = RecordBatch::from_df_record_batch(
schema_cloned.clone(),
arrow_batch,
);
yield Ok(result_to_yield);
if let Some(next_flight_message_result) =
flight_message_stream.next().await
{
match next_flight_message_result {
Ok(FlightMessage::Metrics(s)) => {
match parse_terminal_metrics(&s) {
Ok(m) => {
metrics_ref.swap(Some(Arc::new(m)));
}
Err(e) => {
yield Err(BoxedError::new(e))
.context(ExternalSnafu);
}
};
break;
}
Ok(FlightMessage::RecordBatch(rb)) => {
buffered_message = Some(FlightMessage::RecordBatch(rb));
}
Ok(_) => {
yield IllegalFlightMessagesSnafu {reason: "A RecordBatch message can only be succeeded by a Metrics message or another RecordBatch message"}
.fail()
.map_err(BoxedError::new)
.context(ExternalSnafu);
break;
}
Err(e) => {
yield Err(BoxedError::new(e)).context(ExternalSnafu);
break;
}
}
} else {
break;
}
}Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Avoid routing Flow-specific query extensions through comma-separated hints so checkpoint JSON values remain intact over Flight. Signed-off-by: discord9 <discord9@163.com>
|
|
||
| const FLOW_EXTENSIONS_METADATA_KEY: &str = "x-greptime-flow-extensions"; | ||
|
|
||
| #[derive(Debug, Clone, Default)] |
There was a problem hiding this comment.
Alaways document the pub structs
| } | ||
| Some(FlightMessage::Metrics(_)) => { | ||
| return IllegalFlightMessagesSnafu { | ||
| reason: "'AffectedRows' Flight metadata already carries Metrics and cannot be followed by another Metrics message".to_string(), |
There was a problem hiding this comment.
| reason: "'AffectedRows' Flight metadata already carries Metrics and cannot be followed by another Metrics message".to_string(), | |
| reason: "'AffectedRows' Flight metadata already carries Metrics and cannot be followed by another Metrics message", |
| .map(OutputWithMetrics::into_output) | ||
| } | ||
|
|
||
| pub async fn sql_with_terminal_metrics<S>( |
There was a problem hiding this comment.
Document the new pub function.
| Schema(SchemaRef), | ||
| RecordBatch(DfRecordBatch), | ||
| AffectedRows(usize), | ||
| AffectedRows { |
There was a problem hiding this comment.
Is it a breaking change in protocol?
| fn should_collect_region_watermark( | ||
| return_region_seq: bool, | ||
| has_incremental_after_seqs: bool, | ||
| ) -> bool { | ||
| return_region_seq || has_incremental_after_seqs | ||
| } |
There was a problem hiding this comment.
Is it necessary to wrap this into a function?
There was a problem hiding this comment.
Is it necessary to wrap this into a function?
it's used twice so kind of necessary?
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR wires terminal record-batch metrics through the Flight/client/Flow consumer path.
It builds on #8015, which introduced
RecordBatchMetrics.region_watermarksand query-side terminal metric collection.Changes
Metricsmessages.OutputWithMetrics/OutputMetricshelpers so callers can consume terminal metrics without breaking existingOutputAPIs.Scope
This PR is limited to terminal metrics transport and consumption:
src/servers/src/grpc/flight.rssrc/client/src/database.rssrc/flow/src/batching_mode/frontend_client.rsIt intentionally does not include later stale-cursor, incremental-after-seq, benchmark.
Compatibility
Existing client APIs (
sql,query,create,alter, etc.) continue to return plainOutput.Terminal metrics are opt-in through the new metrics-aware helper path.
Malformed terminal metrics are rejected as transport/parsing errors instead of being silently ignored.
Tests
cargo test -p client terminal_metrics --libcargo test -p flow query_with_terminal_metrics --libCoverage includes:
PR Checklist
Please convert it to a draft if some of the following conditions are not met.