Zane/fix fluentd procstat pattern#1662

Merged
zanejohnson-azure merged 5 commits into ci_prod from
zane/fix-fluentd-procstat-pattern
Apr 28, 2026
@zanejohnson-azure
Contributor

@zanejohnson-azure zanejohnson-azure commented Apr 24, 2026

problem 1:

Both the ruby and the fluentd processes have the keywords "ruby" and "fluentd" in their cmdline, so a pattern on either keyword can't tell them apart. Using exe as the filter does not work either: fluentd is a gem that runs under ruby, so the exe of the fluentd process is also ruby.
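To make the ambiguity concrete, here is a minimal Go sketch (Go because procstat patterns are Go regexes). The two command lines are hypothetical, shaped after the patterns mentioned in the commits of this PR; they are not copied from the container.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical cmdlines, assumed for illustration only.
const (
	supervisorCmdline = "ruby /usr/bin/fluentd -c /hypothetical/fluent.conf"
	workerCmdline     = "/usr/bin/ruby -Eascii-8bit:ascii-8bit /usr/bin/fluentd -c /hypothetical/fluent.conf --under-supervisor"
)

// matchesBoth reports whether the pattern matches both command lines,
// i.e. whether it fails to tell the two processes apart.
func matchesBoth(pattern string) bool {
	re := regexp.MustCompile(pattern)
	return re.MatchString(supervisorCmdline) && re.MatchString(workerCmdline)
}

func main() {
	for _, p := range []string{"ruby", "fluentd"} {
		fmt.Printf("pattern %q matches both processes: %v\n", p, matchesBoth(p))
	}
}
```

With cmdlines of this shape, neither keyword isolates a single process, which is the behaviour described above.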

fix: use procstat.filter. Doc: https://docs.influxdata.com/telegraf/v1/input-plugins/procstat/#:~:text=This%20plugin%20allows%20to%20monitor,service%20that%20started%20the%20process.

Tested: the values reported for fluentd and ruby now match what is shown inside the container.

image

manual check
fluentd (supervisor): 47.9 MB
ruby: 82.5 MB

problem 2:

The main telegraf and the process-metrics collection telegraf both use telegraf.exe, so exe can't differentiate whether a telegraf process is main or process-metrics.

fix: the only difference between them is the telegraf config file each one loads, so we can use pattern to differentiate between them.
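A rough Go sketch of the idea, assuming each telegraf process carries its config file path on its command line. The process-metrics file name comes from the file touched in this PR; the main config name, the directories, the binary path, and the labels are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// One positive pattern per config file. "main-telegraf.conf" is a
// hypothetical name; only the process-metrics file name is from this PR.
var (
	mainPattern           = regexp.MustCompile(`main-telegraf\.conf`)
	processMetricsPattern = regexp.MustCompile(`telegraf-ama-logs-process-metrics\.conf`)
)

// classifyTelegraf returns a label for a telegraf cmdline based on the
// config file it references, or "" if neither pattern matches.
func classifyTelegraf(cmdline string) string {
	switch {
	case processMetricsPattern.MatchString(cmdline):
		return "telegraf-process-metrics"
	case mainPattern.MatchString(cmdline):
		return "telegraf-main"
	default:
		return ""
	}
}

func main() {
	// Hypothetical cmdlines: same executable, different config files.
	cmdlines := []string{
		"/hypothetical/bin/telegraf --config /hypothetical/main-telegraf.conf",
		"/hypothetical/bin/telegraf --config /hypothetical/telegraf-ama-logs-process-metrics.conf",
	}
	for _, c := range cmdlines {
		fmt.Printf("%s -> %s\n", c, classifyTelegraf(c))
	}
}
```

Because the executable is identical, only the config path in the cmdline separates the two processes in this sketch.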

problem 3:

Add a ProcessName field to make queries easier.

test

image

zanejohnson-azure and others added 2 commits April 24, 2026 16:30
Previously exe=ruby matched both fluentd PIDs (supervisor + worker)
since both resolve to /usr/bin/ruby, producing identical metrics.
Now uses distinct pattern matching:
- fluentd supervisor: pattern = "fluentd(?!.*under-supervisor)"
- fluentd worker: pattern = "fluentd.*under-supervisor"

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Go regex (used by procstat) does not support lookaheads (?!...).
Change supervisor pattern from fluentd(?!.*under-supervisor) to
"ruby /usr/bin/fluentd" which only matches the supervisor cmdline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
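The lookahead limitation can be checked directly with Go's standard `regexp` package. This is a small sketch, not the plugin's code; the two cmdlines are hypothetical and only shaped after the patterns named in the commit messages.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical cmdlines, assumed for illustration only.
const (
	supervisorCmdline = "ruby /usr/bin/fluentd -c /hypothetical/fluent.conf"
	workerCmdline     = "/usr/bin/ruby -Eascii-8bit:ascii-8bit /usr/bin/fluentd -c /hypothetical/fluent.conf --under-supervisor"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// matches reports whether the pattern compiles and matches the cmdline.
func matches(pattern, cmdline string) bool {
	re, err := regexp.Compile(pattern)
	return err == nil && re.MatchString(cmdline)
}

func main() {
	// Negative lookahead is rejected at compile time by Go's RE2-style engine.
	fmt.Println("lookahead compiles:", compiles(`fluentd(?!.*under-supervisor)`))

	// Lookahead-free patterns from the commit messages.
	supervisor := `ruby /usr/bin/fluentd`
	worker := `fluentd.*under-supervisor`
	fmt.Println("supervisor pattern on supervisor:", matches(supervisor, supervisorCmdline))
	fmt.Println("supervisor pattern on worker:", matches(supervisor, workerCmdline))
	fmt.Println("worker pattern on worker:", matches(worker, workerCmdline))
	fmt.Println("worker pattern on supervisor:", matches(worker, supervisorCmdline))
}
```

Whether the literal supervisor pattern really excludes the worker depends on the actual cmdlines in the container; the sketch only shows that it does for cmdlines of the assumed shape.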
@zanejohnson-azure zanejohnson-azure requested a review from a team as a code owner April 24, 2026 23:52
…/worker

Replace pattern-based matching with [[inputs.procstat.filter]] using
process_names which matches against /proc/pid/comm. This cleanly
separates fluentd supervisor (comm=fluentd) from worker (comm=ruby).

Validated via fast-test on zane-ama-logs-helm-test cluster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
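For intuition on why matching on the process name separates the two, here is a Go sketch of name-based matching. It is a stand-in, not Telegraf's implementation: `path.Match` is used as an assumed approximation of wildcard matching, and the comm values are the ones stated in the commit message above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// commName normalizes the raw contents of /proc/<pid>/comm, which end
// with a trailing newline.
func commName(raw string) string {
	return strings.TrimSuffix(raw, "\n")
}

// matchesProcessName reports whether comm matches any of the given names.
// Names may contain shell-style wildcards (an assumption of this sketch).
func matchesProcessName(comm string, names []string) bool {
	for _, n := range names {
		if ok, err := path.Match(n, comm); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// comm values as described in the commit message:
	// supervisor has comm=fluentd, worker has comm=ruby.
	supervisor := commName("fluentd\n")
	worker := commName("ruby\n")

	fmt.Println("fluentd filter matches supervisor:", matchesProcessName(supervisor, []string{"fluentd"}))
	fmt.Println("fluentd filter matches worker:", matchesProcessName(worker, []string{"fluentd"}))
	fmt.Println("ruby filter matches worker:", matchesProcessName(worker, []string{"ruby"}))
}
```

Unlike the cmdline, the comm value is a single short name per process, so each filter in this sketch selects exactly one of the two.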
@zanejohnson-azure
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@zanejohnson-azure
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread build/linux/installer/conf/telegraf-ama-logs-process-metrics.conf
@zanejohnson-azure
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@zanejohnson-azure zanejohnson-azure enabled auto-merge (squash) April 28, 2026 20:52
@zanejohnson-azure zanejohnson-azure merged commit 2fa77af into ci_prod Apr 28, 2026
19 checks passed
2 participants