Refine PerfEvaluation to compare network perf data with sku cap #4430

Open
LiliDeng wants to merge 1 commit into `main` from `lildeng/perf_evaluation`

Conversation

@LiliDeng (Collaborator)

Description

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

| Image | VM Size | Result (PASSED / FAILED / SKIPPED) |
| ----- | ------- | ---------------------------------- |

Copilot AI review requested due to automatic review settings April 21, 2026 15:28
@github-actions

🤖 AI Test Selection

No test cases were selected for this PR.

Copilot AI left a comment (Contributor)

Pull request overview

This PR updates the PerfEvaluation notifier to evaluate network throughput metrics against Azure SKU bandwidth capabilities (via cached location/SKU data) using a peak-based validation approach, and enriches failure summaries with more detailed context.

Changes:

  • Added configurable post_peak_drop_ratio and implemented peak-based throughput validation (post-peak drop rule + SKU 90% peak rule).
  • Added Azure SKU capability cache lookup (azure_locations_{location}.json) to drive throughput evaluation when available.
  • Expanded performance-failure summaries to include per-metric failure reasons.
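The two peak-based rules named above can be illustrated with a minimal sketch. The function name `check_throughput_peak` and the exact failure-reason strings are hypothetical, not the LISA implementation; it only assumes time-ordered throughput samples and a SKU bandwidth cap in Gbps:

```python
# Hypothetical sketch of the peak-based rules; not the actual LISA code.
def check_throughput_peak(
    samples: list,
    bw_cap_gbps: float,
    post_peak_drop_ratio: float = 0.02,
) -> list:
    """Return a list of failure reasons; an empty list means pass."""
    failures = []
    peak = max(samples)

    # Post-peak drop rule: samples after the peak must stay within the
    # allowed drop ratio of the observed peak (default 2%).
    post_peak = samples[samples.index(peak):]
    min_after_peak = min(post_peak)
    if min_after_peak < peak * (1 - post_peak_drop_ratio):
        failures.append(
            f"post-peak drop {1 - min_after_peak / peak:.1%} exceeds "
            f"allowed {post_peak_drop_ratio:.0%}"
        )

    # SKU 90% peak rule: the peak must reach at least 90% of the SKU's
    # advertised bandwidth cap.
    if peak < 0.9 * bw_cap_gbps:
        failures.append(
            f"peak {peak:.2f} Gbps below 90% of SKU cap {bw_cap_gbps:.2f} Gbps"
        )
    return failures
```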
Comments suppressed due to low confidence (1)

lisa/notifiers/perfevaluation/perfevaluation.py:1207

  • _modify_message appends detail_msg into TestResultMessage.message, which can easily exceed the 2048-char cap applied when the message is first created (TestResult._send_result_message). This can cause downstream consumers to receive unexpectedly large messages. Consider keeping message.message concise (and/or truncating detail_msg to a safe length) while storing full details only in perf_evaluation_summary.
```python
summary_msg = (
    f"Performance evaluation failed: {len(failed_metrics)} "
    f"metric(s) did not meet criteria. Details: {detail_msg}"
)
message.perf_evaluation_summary = {
    "failed_metrics_count": len(failed_metrics),
    "failed_metrics": failed_metrics,
    "summary": summary_msg,
}
runbook = cast(PerfEvaluationSchema, self.runbook)
fail_on_perf = getattr(
    runbook, "fail_test_on_performance_failure", False
)
if fail_on_perf and message.status != TestStatus.FAILED:
    original_message = message.message or ""
    perf_summary = (
        f"Performance criteria failed: {len(failed_metrics)} "
        f"metrics. {detail_msg}"
    )
    message.status = TestStatus.FAILED
    if original_message:
        message.message = f"{original_message}\n{perf_summary}"
```
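As a rough illustration of the reviewer's suggestion, the combined message could be capped before assignment. `MAX_MESSAGE_LENGTH` and the truncation marker are assumptions for this sketch, not constants from the LISA codebase:

```python
# Sketch: keep message.message under the 2048-char cap applied when the
# message is first created, while full details live in
# perf_evaluation_summary. MAX_MESSAGE_LENGTH is an assumed constant.
MAX_MESSAGE_LENGTH = 2048

def build_failure_message(original: str, detail_msg: str) -> str:
    combined = f"{original}\n{detail_msg}" if original else detail_msg
    if len(combined) <= MAX_MESSAGE_LENGTH:
        return combined
    marker = "...[truncated]"
    return combined[: MAX_MESSAGE_LENGTH - len(marker)] + marker
```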

Comment on lines +26 to +27
```python
# Allowed post-peak throughput drop ratio, e.g. 0.02 means 2% drop allowed.
post_peak_drop_ratio: float = 0.02
```
Copilot AI Apr 21, 2026

The PR description template is still empty (Description/Related Issue/Type of Change/Checklist). Please fill it in so reviewers understand the intent and validation plan for the new PerfEvaluation behavior.

Comment on lines +840 to +858
```python
# Step 1: if this is a throughput metric and SKU bandwidth is available,
# use the peak-based check exclusively — skip perf_criteria entirely.
throughput_prefixes = (
    "throughput_in_gbps",
    "tx_throughput_in_gbps",
    "rx_throughput_in_gbps",
)
if (
    any(metric_name.startswith(p) for p in throughput_prefixes)
    and perf_message.metric_relativity
    == messages.MetricRelativity.HigherIsBetter
):
    bw_spec = self._lookup_az_max_bandwidth(
        perf_message.location, perf_message.vmsize or ""
    )
    if bw_spec is not None:
        self._update_bw_peak(perf_message, metric_name, actual_value, bw_spec)
        return
```

Copilot AI Apr 21, 2026

The SKU-based throughput path returns early after _update_bw_peak(...) and does not record a passing evaluation result. If all perf metrics in a run are throughput metrics and they pass, _evaluation_results can remain empty and finalize() will log "No performance evaluations performed." Consider appending a pass result per throughput metric (and/or for the SKU 90% check) so reports/JSON output remain consistent with the non-SKU criteria path.
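One way to address this, sketched below under assumed names: record a pass entry whenever the SKU-based path evaluates a throughput metric. The `_evaluation_results` entry shape here is hypothetical, not the actual LISA schema:

```python
# Sketch: append a result entry for the SKU 90% check so
# _evaluation_results is never empty when all metrics are throughput
# metrics that pass. The dict keys are illustrative assumptions.
def record_throughput_result(
    evaluation_results: list, metric_name: str, actual: float, cap_gbps: float
) -> None:
    evaluation_results.append(
        {
            "metric": metric_name,
            "actual": actual,
            "criterion": f">= 90% of SKU cap ({cap_gbps} Gbps)",
            "passed": actual >= 0.9 * cap_gbps,
        }
    )
```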

```python
try:
    with open(cache_path, "r", encoding="utf-8") as f:
        self._az_cap_cache[location] = json.load(f)
except Exception:
```
Copilot AI Apr 21, 2026

_lookup_az_max_bandwidth catches Exception and silently falls back to {} on any failure reading/parsing the cache file. This can mask real issues (e.g., JSON corruption) and make troubleshooting difficult. Prefer catching specific exceptions (e.g., OSError, json.JSONDecodeError, ValueError) and logging the exception (at least debug) with the cache path before falling back.

Suggested change:

```diff
-except Exception:
+except (OSError, json.JSONDecodeError, ValueError) as e:
+    self._log.debug(
+        f"Failed to read Azure capability cache from {cache_path}: "
+        f"{e}. Verify the cache file exists, is readable, and "
+        "contains valid JSON. Falling back to an empty cache."
+    )
```
