PMM-12832 Timeouts for exporters.#5134

Open
JiriCtvrtka wants to merge 101 commits into v3 from PMM-12832-exporter-timeouts

Conversation

Contributor

@JiriCtvrtka JiriCtvrtka commented Mar 12, 2026

PMM-12832

FB: Percona-Lab/pmm-submodules#4275

Summary

  1. Add support for configuring a custom connection timeout for exporters
  2. Apply the configured timeout consistently across supported exporter types, including RDS and Azure flows
  3. Ensure generated MySQL and PostgreSQL config/DSN values handle whole-second timeout requirements correctly
  4. Increase the default exporter connection timeout from 1 second to 2 seconds
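
For point 3 of the summary, one way to meet a whole-second requirement is to round the configured timeout up before rendering it, so that a sub-second value never collapses to 0. This is a minimal sketch under assumptions: `ceilToSeconds` and the `connect_timeout` rendering are hypothetical names used for illustration, not code from this PR, and the PR may round differently.

```go
package main

import (
	"fmt"
	"time"
)

// ceilToSeconds rounds d up to the next whole second.
// Hypothetical helper for illustration; non-positive input yields 0.
func ceilToSeconds(d time.Duration) time.Duration {
	if d <= 0 {
		return 0
	}
	if rem := d % time.Second; rem != 0 {
		d += time.Second - rem
	}
	return d
}

func main() {
	inputs := []time.Duration{
		500 * time.Millisecond,
		2 * time.Second,
		2500 * time.Millisecond,
	}
	for _, d := range inputs {
		// Render the rounded value as an integer number of seconds.
		secs := int64(ceilToSeconds(d) / time.Second)
		fmt.Printf("%v -> connect_timeout=%d\n", d, secs)
	}
}
```

Rounding up rather than down is a design choice here: truncating 500ms would give 0, which many clients interpret as "no timeout".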

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 104 out of 105 changed files in this pull request and generated 1 comment.

Comment on lines +25 to +31
// FromProto converts a protobuf Duration to a time.Duration.
func FromProto(d *durationpb.Duration) time.Duration {
	if d == nil {
		return 0
	}

	return d.AsDuration()
Copilot AI Apr 8, 2026

FromProto returns d.AsDuration() without validating the protobuf Duration or guarding against time.Duration overflow. A user can provide a very large (but protobuf-valid) duration that overflows time.Duration and becomes a negative/small value, leading to incorrect exporter timeouts. Consider validating (CheckValid) and adding an explicit upper bound / overflow-safe conversion (reject or clamp values that can’t fit into time.Duration).

Contributor Author

@JiriCtvrtka JiriCtvrtka Apr 8, 2026

@copilot d.CheckValid() is already done in pb files.
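
For reference, an explicit range check on the converted value could look like the sketch below. It assumes a hypothetical upper bound (`maxExporterTimeout`) and a hypothetical `validateTimeout` helper; neither appears in this PR. As far as I know, the protobuf-go documentation describes `AsDuration` as returning the closest representable duration on overflow, so a bound like this would mainly be a policy choice rather than overflow protection.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// maxExporterTimeout is an assumed upper bound, chosen only for illustration.
const maxExporterTimeout = 10 * time.Minute

var errTimeoutOutOfRange = errors.New("connection timeout out of range")

// validateTimeout rejects negative values and values above maxExporterTimeout.
// Hypothetical helper; the PR relies on generated validation instead.
func validateTimeout(d time.Duration) (time.Duration, error) {
	if d < 0 || d > maxExporterTimeout {
		return 0, fmt.Errorf("%w: %v", errTimeoutOutOfRange, d)
	}
	return d, nil
}

func main() {
	for _, d := range []time.Duration{2 * time.Second, -time.Second, time.Hour} {
		got, err := validateTimeout(d)
		fmt.Println(d, got, err)
	}
}
```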

Comment on lines +1201 to +1207
if params.ExporterOptions.ConnectionTimeout != nil {
	if pointer.GetDuration(params.ExporterOptions.ConnectionTimeout) == 0 {
		row.ExporterOptions.ConnectionTimeout = nil
	} else {
		row.ExporterOptions.ConnectionTimeout = params.ExporterOptions.ConnectionTimeout
	}
}
Contributor

Suggested change
- if params.ExporterOptions.ConnectionTimeout != nil {
- 	if pointer.GetDuration(params.ExporterOptions.ConnectionTimeout) == 0 {
- 		row.ExporterOptions.ConnectionTimeout = nil
- 	} else {
- 		row.ExporterOptions.ConnectionTimeout = params.ExporterOptions.ConnectionTimeout
- 	}
- }
+ if params.ExporterOptions.ConnectionTimeout != nil {
+ 	row.ExporterOptions.ConnectionTimeout = params.ExporterOptions.ConnectionTimeout
+ }

}

// EffectiveDialTimeout returns the timeout configured for this agent's exporter.
// Only for exporters that have connection timeout support in DSN/config generation.
Contributor

This looks confusing: for exporters that don't support a timeout in DSN/config, it returns a default constant that may be treated as a supported timeout.
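
One way to make that distinction explicit would be to return a second value saying whether the agent type supports a connection timeout at all. The sketch below uses simplified, hypothetical stand-ins (`agent`, `agentType`, `dialTimeout`, `unsupportedExporter`) rather than the PR's real model types, and takes the 2-second default from the PR summary.

```go
package main

import (
	"fmt"
	"time"
)

// agentType and its values are simplified, hypothetical stand-ins.
type agentType string

const (
	mongoDBExporter     agentType = "mongodb_exporter"
	unsupportedExporter agentType = "unsupported_exporter"
)

// defaultDialTimeout mirrors the 2-second default from the PR summary.
const defaultDialTimeout = 2 * time.Second

type agent struct {
	Type              agentType
	ConnectionTimeout *time.Duration
}

// dialTimeout returns the timeout to use and whether this agent type
// supports a connection timeout in its DSN/config at all.
func (a agent) dialTimeout() (time.Duration, bool) {
	switch a.Type {
	case mongoDBExporter:
		if a.ConnectionTimeout != nil && *a.ConnectionTimeout > 0 {
			return *a.ConnectionTimeout, true
		}
		return defaultDialTimeout, true
	default:
		// No default is returned here, so callers cannot mistake it
		// for a supported timeout.
		return 0, false
	}
}

func main() {
	custom := 5 * time.Second
	for _, a := range []agent{
		{Type: mongoDBExporter},
		{Type: mongoDBExporter, ConnectionTimeout: &custom},
		{Type: unsupportedExporter},
	} {
		d, ok := a.dialTimeout()
		fmt.Println(a.Type, d, ok)
	}
}
```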

  }
  env := []string{
- 	fmt.Sprintf("MONGODB_URI=%s", exporter.DSN(service, models.DSNParams{DialTimeout: time.Second, Database: database}, tdp, pmmAgentVersion)),
+ 	fmt.Sprintf("MONGODB_URI=%s", exporter.DSN(service, models.DSNParams{DialTimeout: exporter.EffectiveDialTimeout(), Database: database}, tdp, pmmAgentVersion)),
Contributor

What about the rest of the Mongo agents (QAN, RTA)? They support a timeout in the DSN as well.

Contributor Author

Yes, other non-exporter agents support timeouts too, but the scope of the ticket is exporters only.

Comment on lines +139 to +152
case exporter.AzureOptions.ClientID != "":
	if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
		roundedConnectionTimeout = *exporter.ExporterOptions.ConnectionTimeout
	} else {
		roundedConnectionTimeout = postgresRemoteCloudDefaultDialTimeout
	}
case node.NodeType == models.RemoteRDSNodeType:
	if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
		roundedConnectionTimeout = *exporter.ExporterOptions.ConnectionTimeout
	} else {
		roundedConnectionTimeout = postgresRemoteCloudDefaultDialTimeout
	}
default:
	roundedConnectionTimeout = exporter.EffectiveDialTimeout()
Contributor

Suggested change
- case exporter.AzureOptions.ClientID != "":
- 	if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
- 		roundedConnectionTimeout = *exporter.ExporterOptions.ConnectionTimeout
- 	} else {
- 		roundedConnectionTimeout = postgresRemoteCloudDefaultDialTimeout
- 	}
- case node.NodeType == models.RemoteRDSNodeType:
- 	if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
- 		roundedConnectionTimeout = *exporter.ExporterOptions.ConnectionTimeout
- 	} else {
- 		roundedConnectionTimeout = postgresRemoteCloudDefaultDialTimeout
- 	}
- default:
- 	roundedConnectionTimeout = exporter.EffectiveDialTimeout()
+ case exporter.AzureOptions.ClientID != "",
+ 	node.NodeType == models.RemoteRDSNodeType:
+ 	if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
+ 		roundedConnectionTimeout = *exporter.ExporterOptions.ConnectionTimeout
+ 	} else {
+ 		roundedConnectionTimeout = postgresRemoteCloudDefaultDialTimeout
+ 	}
+ default:
+ 	roundedConnectionTimeout = exporter.EffectiveDialTimeout()

connectionTimeout := exporter.ExporterOptions.ConnectionTimeout
if pointer.GetDuration(connectionTimeout) == 0 {
	connectionTimeout = pointer.ToDuration(defaultValkeyTimeout)
}
Contributor

Suggested change
- }
+ connectionTimeout := &defaultValkeyTimeout
+ if pointer.GetDuration(exporter.ExporterOptions.ConnectionTimeout) != 0 {
+ 	connectionTimeout = exporter.ExporterOptions.ConnectionTimeout
+ }

Contributor

In general: there are many places where pointer.GetDuration(exporterOptions.ConnectionTimeout) is used.
Q: if we have validation at the API level (gte:0), then at the service/model level exporterOptions.ConnectionTimeout can only be nil or > 0. So is there any reason to use pointer.GetDuration() at all?
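
To illustrate the question, the sketch below normalizes the value once at the boundary, so that downstream code only has to distinguish nil from a positive timeout. `normalizeTimeout` and `timeoutOr` are hypothetical helpers, not functions from this PR. Note that a `gte:0` rule on its own would still admit an explicit zero, which is the case the normalization step absorbs.

```go
package main

import (
	"fmt"
	"time"
)

// normalizeTimeout maps nil and non-positive values to nil, so callers
// further down only see "unset" or a positive timeout.
func normalizeTimeout(p *time.Duration) *time.Duration {
	if p == nil || *p <= 0 {
		return nil
	}
	return p
}

// timeoutOr returns *p when it is set and def otherwise.
func timeoutOr(p *time.Duration, def time.Duration) time.Duration {
	if p == nil {
		return def
	}
	return *p
}

func main() {
	zero := time.Duration(0)
	custom := 5 * time.Second
	def := 2 * time.Second

	fmt.Println(timeoutOr(normalizeTimeout(nil), def))
	fmt.Println(timeoutOr(normalizeTimeout(&zero), def))
	fmt.Println(timeoutOr(normalizeTimeout(&custom), def))
}
```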

		return nil
	}

	return pointer.ToDuration(d.AsDuration())
Contributor

By the way, there is a shorter generic method, pointer.To[T any](t T) *T, that returns a pointer to the passed value.
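
A generic helper of that shape takes only a few lines. The sketch below defines a local stand-in named `to` (a hypothetical name, not the library function itself) to show how one helper covers every type instead of a separate function per type. It needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"time"
)

// to returns a pointer to a copy of v. Local stand-in for a generic
// pointer helper; the name is hypothetical.
func to[T any](v T) *T {
	return &v
}

func main() {
	timeout := to(2 * time.Second) // *time.Duration
	retries := to(3)               // *int
	fmt.Println(*timeout, *retries)
}
```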

Contributor Author

Not sure which one is "faster", but I changed it because, yes, it is shorter and simpler to read.

}
}

func optionalDurationFromProto(d *durationpb.Duration) *time.Duration {
Contributor

Move to managed/utils/duration?

Contributor Author

Moved to utils.

ServiceID: service.ServiceID,
AzureOptions: models.AzureOptionsFromRequest(req),
ExporterOptions: models.ExporterOptions{
	ConnectionTimeout: pointer.ToDuration(duration.FromProto(req.ConnectionTimeout)),
Contributor

  1. optionalDurationFromProto?
  2. I wonder why some Create* funcs receive ConnectionTimeout *time.Duration but others ConnectionTimeout time.Duration

Contributor Author

  1. Yeah, right. It was used for inventory, so I moved it from inventory to utils (as you also requested in PMM-12832 Timeouts for exporters. #5134 (comment)), and now it is used everywhere in management as well.
  2. My fault, that was an inconsistency. Fixed.

LogLevel log_level = 5;
// Expose the node_exporter process on all public interfaces
bool expose_exporter = 6;
// Connection timeout for exporter (if set).
Member

If I'm not wrong, the validation below will not allow me to omit the connection timeout, right?

		ConnectionTimeout: &custom,
	},
}
assert.Equal(t, custom, a.EffectiveDialTimeout())
Member

I think we should only use assert in very rare cases, where we want the test to continue running. In this particular case and downwards, IMO we should use require.

if node.NodeType == models.RemoteRDSNodeType {
	dsnParams.DialTimeout = 5 * time.Second
}
var roundedConnectionTimeout time.Duration
Member

Suggested change
- var roundedConnectionTimeout time.Duration
+ var connectionTimeout time.Duration

I'd keep it short and simple :)

4 participants