Spec: Fusion GPU metrics collection#7022

Open
pditommaso wants to merge 9 commits into master from 260410-fusion-gpu-metrics-v2

@pditommaso
Member

Summary

  • Adds feature specification for collecting GPU metrics from Fusion trace.json on task completion
  • GPU metrics block sent to Seqera Platform as a transient field on TraceRecord, following the resourceAllocation pattern
  • Covers all Fusion-enabled executors (AWS Batch, Google Batch, Azure Batch, K8s, Seqera, SLURM)

Spec highlights

  • Metrics collected irrespective of task status (success/failure)
  • Graceful handling when trace.json is missing or malformed
  • Forward-compatible: entire gpu block sent as a map, not fixed fields
  • Includes full example trace.json format
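The graceful-handling and forward-compatibility points above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Groovy code in this PR. It assumes the `gpu` block is a top-level key of trace.json; the function name `read_gpu_block` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def read_gpu_block(trace_file: Path) -> Optional[Dict[str, Any]]:
    """Return the `gpu` block of a Fusion trace file as a plain map, or None.

    Assumption: `gpu` is a top-level key of the JSON document.
    """
    try:
        data = json.loads(trace_file.read_text())
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file;
        # ValueError covers malformed JSON (json.JSONDecodeError subclasses it).
        return None
    if not isinstance(data, dict):
        return None
    gpu = data.get("gpu")
    # The block is passed through untouched, so fields added by a later
    # Fusion version need no change on the reading side.
    return gpu if isinstance(gpu, dict) else None
```

Returning `None` rather than raising keeps a missing or malformed file from affecting task completion, and returning the whole map rather than picking named fields is what makes the approach forward-compatible.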

Test plan

  • Review spec for completeness and accuracy
  • Validate alignment with Platform API expectations

pditommaso and others added 3 commits April 7, 2026 23:06
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify

netlify bot commented Apr 10, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit cc2b559
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69d922752934de00089295a0

@pditommaso
Member Author

Ok, adding the Plan and tasks (do not merge please)

pditommaso and others added 5 commits April 10, 2026 17:52
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Add support for reading GPU metrics from Fusion .fusion/trace.json
file on task completion. The gpu block is carried as a transient
gpuMetrics field on TraceRecord (following the resourceAllocation
pattern) and included in the task payload sent to Seqera Platform.

- Add gpuMetrics transient field with getter/setter to TraceRecord
- Add parseFusionTraceFile() to extract gpu block from trace JSON
- Read .fusion/trace.json in TaskHandler.getTraceRecord(), gated by
  executor.isFusionEnabled()
- Include gpuMetrics in TowerClient.makeTaskMap0() task payload
- Add unit tests for TraceRecord and TowerClient

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
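
The flow described in the commit message above (read the metrics only when Fusion is enabled, carry them as a transient field, include them in the task payload) can be sketched roughly as follows. This is a Python illustration, not the Groovy code in this PR. The names `TraceRecordSketch`, `build_trace_record`, and `make_task_map` are hypothetical stand-ins, and omitting the key when no metrics were read is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class TraceRecordSketch:
    # Regular trace fields that are persisted with the record.
    store: Dict[str, Any] = field(default_factory=dict)
    # Transient field: held next to the store, never written into it.
    gpu_metrics: Optional[Dict[str, Any]] = None


def build_trace_record(
    store: Dict[str, Any],
    fusion_enabled: bool,
    load_gpu: Callable[[], Optional[Dict[str, Any]]],
) -> TraceRecordSketch:
    record = TraceRecordSketch(store=dict(store))
    # The trace file is only consulted when the executor has Fusion enabled.
    if fusion_enabled:
        record.gpu_metrics = load_gpu()
    return record


def make_task_map(record: TraceRecordSketch) -> Dict[str, Any]:
    payload = dict(record.store)
    # Assumption: the key is omitted when no GPU metrics were read.
    if record.gpu_metrics is not None:
        payload["gpuMetrics"] = record.gpu_metrics
    return payload
```

Because the read does not depend on the task status, a failed task on a Fusion-enabled executor would carry its metrics in the payload just as a successful one does.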
- Use JsonSlurper.parse(Path) via SlurperEx instead of file.text + parseText
- Add TaskRun.FUSION_TRACE constant for .fusion/trace.json path
- Remove narrating comments, align with existing .command.trace style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso pditommaso requested a review from bentsherman April 10, 2026 16:17
@pditommaso
Member Author

Ready. TO BE TESTED END TO END

5 participants