Refine PerfEvaluation to compare network perf data with sku cap #4430

Open
LiliDeng wants to merge 1 commit into `main` from `lildeng/perf_evaluation`

Conversation

@LiliDeng (Collaborator)

Description

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

| Image | VM Size | Result (PASSED / FAILED / SKIPPED) |
| ----- | ------- | ---------------------------------- |

Copilot AI review requested due to automatic review settings April 21, 2026 15:28
@github-actions

🤖 AI Test Selection

No test cases were selected for this PR.

Copilot AI left a comment (Contributor)

Pull request overview

This PR updates the PerfEvaluation notifier to evaluate network throughput metrics against Azure SKU bandwidth capabilities (via cached location/SKU data) using a peak-based validation approach, and enriches failure summaries with more detailed context.

Changes:

  • Added configurable post_peak_drop_ratio and implemented peak-based throughput validation (post-peak drop rule + SKU 90% peak rule).
  • Added Azure SKU capability cache lookup (azure_locations_{location}.json) to drive throughput evaluation when available.
  • Expanded performance-failure summaries to include per-metric failure reasons.
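The two peak-based rules named above can be illustrated with a minimal sketch. The function name `check_throughput_peak` and the exact failure-reason strings are hypothetical, not the LISA implementation; it only assumes time-ordered throughput samples and a SKU bandwidth cap in Gbps:

```python
# Hypothetical sketch of the peak-based rules; not the actual LISA code.
def check_throughput_peak(
    samples: list,
    bw_cap_gbps: float,
    post_peak_drop_ratio: float = 0.02,
) -> list:
    """Return a list of failure reasons; an empty list means pass."""
    failures = []
    peak = max(samples)

    # Post-peak drop rule: samples after the peak must stay within the
    # allowed drop ratio of the observed peak (default 2%).
    post_peak = samples[samples.index(peak):]
    min_after_peak = min(post_peak)
    if min_after_peak < peak * (1 - post_peak_drop_ratio):
        failures.append(
            f"post-peak drop {1 - min_after_peak / peak:.1%} exceeds "
            f"allowed {post_peak_drop_ratio:.0%}"
        )

    # SKU 90% peak rule: the peak must reach at least 90% of the SKU's
    # advertised bandwidth cap.
    if peak < 0.9 * bw_cap_gbps:
        failures.append(
            f"peak {peak:.2f} Gbps below 90% of SKU cap {bw_cap_gbps:.2f} Gbps"
        )
    return failures
```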
Comments suppressed due to low confidence (1)

lisa/notifiers/perfevaluation/perfevaluation.py:1207

  • _modify_message appends detail_msg into TestResultMessage.message, which can easily exceed the 2048-char cap applied when the message is first created (TestResult._send_result_message). This can cause downstream consumers to receive unexpectedly large messages. Consider keeping message.message concise (and/or truncating detail_msg to a safe length) while storing full details only in perf_evaluation_summary.
```python
summary_msg = (
    f"Performance evaluation failed: {len(failed_metrics)} "
    f"metric(s) did not meet criteria. Details: {detail_msg}"
)
message.perf_evaluation_summary = {
    "failed_metrics_count": len(failed_metrics),
    "failed_metrics": failed_metrics,
    "summary": summary_msg,
}
runbook = cast(PerfEvaluationSchema, self.runbook)
fail_on_perf = getattr(
    runbook, "fail_test_on_performance_failure", False
)
if fail_on_perf and message.status != TestStatus.FAILED:
    original_message = message.message or ""
    perf_summary = (
        f"Performance criteria failed: {len(failed_metrics)} "
        f"metrics. {detail_msg}"
    )
    message.status = TestStatus.FAILED
    if original_message:
        message.message = f"{original_message}\n{perf_summary}"
```
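As a rough illustration of the reviewer's suggestion, the combined message could be capped before assignment. `MAX_MESSAGE_LENGTH` and the truncation marker are assumptions for this sketch, not constants from the LISA codebase:

```python
# Sketch: keep message.message under the 2048-char cap applied when the
# message is first created, while full details live in
# perf_evaluation_summary. MAX_MESSAGE_LENGTH is an assumed constant.
MAX_MESSAGE_LENGTH = 2048

def build_failure_message(original: str, detail_msg: str) -> str:
    combined = f"{original}\n{detail_msg}" if original else detail_msg
    if len(combined) <= MAX_MESSAGE_LENGTH:
        return combined
    marker = "...[truncated]"
    return combined[: MAX_MESSAGE_LENGTH - len(marker)] + marker
```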

Comment on lines +26 to +27
```python
# Allowed post-peak throughput drop ratio, e.g. 0.02 means 2% drop allowed.
post_peak_drop_ratio: float = 0.02
```
Copilot AI Apr 21, 2026

The PR description template is still empty (Description/Related Issue/Type of Change/Checklist). Please fill it in so reviewers understand the intent and validation plan for the new PerfEvaluation behavior.

Comment on lines +840 to +858
```python
# Step 1: if this is a throughput metric and SKU bandwidth is available,
# use the peak-based check exclusively — skip perf_criteria entirely.
throughput_prefixes = (
    "throughput_in_gbps",
    "tx_throughput_in_gbps",
    "rx_throughput_in_gbps",
)
if (
    any(metric_name.startswith(p) for p in throughput_prefixes)
    and perf_message.metric_relativity
    == messages.MetricRelativity.HigherIsBetter
):
    bw_spec = self._lookup_az_max_bandwidth(
        perf_message.location, perf_message.vmsize or ""
    )
    if bw_spec is not None:
        self._update_bw_peak(perf_message, metric_name, actual_value, bw_spec)
        return
```

Copilot AI Apr 21, 2026

The SKU-based throughput path returns early after _update_bw_peak(...) and does not record a passing evaluation result. If all perf metrics in a run are throughput metrics and they pass, _evaluation_results can remain empty and finalize() will log "No performance evaluations performed." Consider appending a pass result per throughput metric (and/or for the SKU 90% check) so reports/JSON output remain consistent with the non-SKU criteria path.
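One way to address this, sketched below under assumed names: record a pass entry whenever the SKU-based path evaluates a throughput metric. The `_evaluation_results` entry shape here is hypothetical, not the actual LISA schema:

```python
# Sketch: append a result entry for the SKU 90% check so
# _evaluation_results is never empty when all metrics are throughput
# metrics that pass. The dict keys are illustrative assumptions.
def record_throughput_result(
    evaluation_results: list, metric_name: str, actual: float, cap_gbps: float
) -> None:
    evaluation_results.append(
        {
            "metric": metric_name,
            "actual": actual,
            "criterion": f">= 90% of SKU cap ({cap_gbps} Gbps)",
            "passed": actual >= 0.9 * cap_gbps,
        }
    )
```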

```python
try:
    with open(cache_path, "r", encoding="utf-8") as f:
        self._az_cap_cache[location] = json.load(f)
except Exception:
```
Copilot AI Apr 21, 2026

_lookup_az_max_bandwidth catches Exception and silently falls back to {} on any failure reading/parsing the cache file. This can mask real issues (e.g., JSON corruption) and make troubleshooting difficult. Prefer catching specific exceptions (e.g., OSError, json.JSONDecodeError, ValueError) and logging the exception (at least debug) with the cache path before falling back.

Suggested change:

```diff
-except Exception:
+except (OSError, json.JSONDecodeError, ValueError) as e:
+    self._log.debug(
+        f"Failed to read Azure capability cache from {cache_path}: "
+        f"{e}. Verify the cache file exists, is readable, and "
+        "contains valid JSON. Falling back to an empty cache."
+    )
```
