ch_tests_tool: run all perf tests in one go#4419

Open
anirudhrb wants to merge 4 commits into microsoft:main from anirudhrb:perf_tests_oneshot

Conversation

@anirudhrb (Collaborator)

This pull request significantly refactors the Cloud Hypervisor performance metrics test logic to simplify test execution, improve result collection, and remove legacy per-test configuration. The main focus is on running all metrics subtests in a single invocation, collecting results from a generated JSON report, and eliminating no-longer-needed per-test environment variables and logic.

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

Image VM Size Result
PASSED / FAILED / SKIPPED

total_idle_time should be reset whenever there is log growth. There is
no point in adding to the total idle time if progress was eventually made. In
long-running tests, it is expected that there is no log growth for a
while. The existing logic would keep adding to total_idle_time and
eventually time out even if the tests were not stuck.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
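
A minimal sketch of the reset behavior this commit describes, assuming hypothetical names (log_size, CHECK_INTERVAL, IDLE_TIMEOUT); the real tool's loop also exits when the test process completes:

import time
from typing import Callable

CHECK_INTERVAL = 30   # seconds between log-growth checks (assumed value)
IDLE_TIMEOUT = 1800   # give up only after this much consecutive idle time

def wait_for_progress(log_size: Callable[[], int]) -> None:
    total_idle_time = 0
    last_size = log_size()
    while True:
        time.sleep(CHECK_INTERVAL)
        current_size = log_size()
        if current_size > last_size:
            # Log grew, so progress was made: reset the idle counter
            # instead of letting quiet periods accumulate across the run.
            total_idle_time = 0
            last_size = current_size
        else:
            total_idle_time += CHECK_INTERVAL
            if total_idle_time >= IDLE_TIMEOUT:
                raise TimeoutError("no log growth; test appears stuck")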
This is already checked by LISA's tool installation logic and we would
never reach here if dev_cli.sh didn't exist.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
Copilot AI review requested due to automatic review settings April 16, 2026 11:28
Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the Cloud Hypervisor performance metrics test flow to run all metrics subtests in a single invocation, then derive per-subtest pass/fail and metrics from a generated JSON report, removing legacy per-test env/config handling.

Changes:

  • Run metrics tests once with --continue-on-failure and collect per-subtest results from a JSON report.
  • Consolidate one-time host setup into _ensure_host_setup() and remove per-test policies/env var logic.
  • Remove MS-CLH variables for per-test block-size tuning that are no longer used.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py | Reworks metrics execution into one run plus JSON report parsing; adjusts host setup and diagnostics behavior.
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests.py | Removes now-unused variables for metrics block-size configuration passed into the tool.

Comment on lines +460 to +468
result = None
try:
    result = self._run_with_enhanced_diagnostics(
        cmd_args=cmd_args,
        timeout=cmd_timeout,
        log_path=log_path,
        test_name=test_name,
        numa_cmd=numa_cmd,
    )

Copilot AI Apr 16, 2026


Unlike the previous per-subtest execution path, any exception from _run_with_enhanced_diagnostics() (e.g., timeout, SSH failure) will currently abort run_metrics_tests() before per-subtest results are reported, defeating the purpose of --continue-on-failure. Consider catching expected execution exceptions here, recording a failed status for each requested testcase, and still running the artifact/report collection logic for debugging.
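
A hedged, self-contained sketch of that suggestion; run_once and report stand in for _run_with_enhanced_diagnostics() and the per-subtest result reporting, and the caught exception types are assumptions:

from typing import Callable, List, Optional

def run_with_fallback(
    testcases: List[str],
    run_once: Callable[[], object],
    report: Callable[[str, str, str], None],
) -> Optional[object]:
    try:
        return run_once()
    except (TimeoutError, OSError) as e:  # expected execution failures
        # Record a FAILED status for every requested subtest so results
        # are still published, then let the caller collect artifacts.
        for tc in testcases:
            report(tc, "FAILED", f"metrics run aborted: {e}")
        return None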

Comment on lines +511 to +513
# Store result for log writing
self._last_result = result


Copilot AI Apr 16, 2026


self._last_result is assigned after the per-testcase loop, but _write_testcase_log() (called inside the loop) relies on _last_result to include stdout/stderr. This can lead to per-testcase logs containing stale output (from a previous run) or only the fallback trace. Consider setting _last_result = result before the loop (or passing result into _write_testcase_log() explicitly) to keep per-testcase logs accurate.
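
For illustration, the second option the reviewer mentions, with write_log standing in for _write_testcase_log() (whose real signature may differ):

from typing import Callable, Dict

def report_subtests(
    per_test_results: Dict[str, dict],
    result: object,
    write_log: Callable[[str, object], None],
) -> None:
    for testcase in per_test_results:
        # The run result travels with the call, so each per-testcase log
        # reflects this run's stdout/stderr rather than stale state.
        write_log(testcase, result)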

Comment on lines +478 to +480
except Exception as e:
    self._log.warning(
        f"Failed to retrieve metrics report: {e}. "

Copilot AI Apr 16, 2026


The report retrieval/parsing wraps everything in a broad except Exception, which can mask programming errors and makes it harder to differentiate "report missing" from "report malformed". Consider catching the specific exceptions expected here (e.g., FileNotFoundError/OSError from copy_back, and json.JSONDecodeError from parsing), and include the remote/local report paths in the warning for easier investigation.

Suggested change
-except Exception as e:
-    self._log.warning(
-        f"Failed to retrieve metrics report: {e}. "
+except OSError as e:
+    self._log.warning(
+        f"Failed to retrieve metrics report from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file was generated and the copy_back "
+        "operation succeeded. All subtests will be marked as failed."
+    )
+    per_test_results = {}
+except json.JSONDecodeError as e:
+    self._log.warning(
+        f"Failed to parse metrics report copied from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file contains valid JSON. "

Comment on lines 971 to 974
echo "[env] CH_HANG_KILL_SECS=${{CH_HANG_KILL_SECS:-1800}}"
echo "[env] CH_CHECK_INTERVAL=${{CH_CHECK_INTERVAL:-30}}"
echo "[env] MIGRATABLE_VERSION=${{MIGRATABLE_VERSION:-not_set}}"
test -x scripts/dev_cli.sh || {{ echo "[error] scripts/dev_cli.sh missing"; exit 98; }}


Copilot AI Apr 16, 2026


The wrapper still relies on scripts/dev_cli.sh being present/executable, but the explicit pre-flight check for it was removed in this hunk. Without it, failures become a generic "No such file or directory" later in the script. Consider restoring an early test -x scripts/dev_cli.sh check with a clear message/exit code so missing checkout/layout issues are immediately actionable.

anirudhrb (Collaborator, Author)

Existence of dev_cli.sh is verified during tool initialization. If there is no dev_cli.sh it will never reach here.
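
A generic sketch of the check-once-at-initialization pattern being described; the class and attribute names here are illustrative, not LISA's actual tool-installation hook:

import os

class ChTestsToolSketch:
    DEV_CLI = "scripts/dev_cli.sh"

    def __init__(self, repo_root: str) -> None:
        path = os.path.join(repo_root, self.DEV_CLI)
        if not os.access(path, os.X_OK):
            # Fail at setup with an actionable message, so later command
            # wrappers can assume the script exists and is executable.
            raise FileNotFoundError(f"{path} is missing or not executable")
        self._dev_cli = path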

Comment thread lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py
Comment on lines 631 to 641
 def _build_metrics_cmd_args(
-    self, testcase: str, hypervisor: str, subtest_timeout: Optional[int]
+    self, hypervisor: str, subtest_timeout: Optional[int]
 ) -> str:
     """Build command arguments for metrics test."""
     cmd_args = (
         f"tests --hypervisor {hypervisor} --metrics -- --"
-        f" --test-filter {testcase}"
+        " --continue-on-failure"
+        f" --report-file /cloud-hypervisor/{self.METRICS_REPORT_FILE}"
     )

-    # Use subtest_timeout if provided, otherwise check for MQ test extension
     if subtest_timeout:
         cmd_args = f"{cmd_args} --timeout {subtest_timeout}"
-    elif self._is_multi_queue_test(testcase):
-        # Lengthen measure window for MQ tests to ~90s
-        # This reduces influence of transient queue skew
-        mq_timeout = self.perf_mq_test_timeout
-        if mq_timeout > 0:
-            cmd_args = f"{cmd_args} --timeout {mq_timeout}"

     return cmd_args

Copilot AI Apr 16, 2026


run_metrics_tests() still accepts only/skip, and _prepare_metrics_subtests() filters the list, but _build_metrics_cmd_args() no longer applies any filter/skip arguments to the actual Cloud Hypervisor invocation. As a result, the tool will execute all metrics tests even when the caller requested a subset. Either pass an appropriate filter/skip option through to the CLI (if supported) or remove/rename these parameters to avoid misleading behavior.
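
One hedged way to close that gap, reusing the --test-filter flag the pre-refactor code already passed; whether the harness accepts the flag repeatedly is an assumption to verify:

from typing import List, Optional

def build_metrics_cmd_args(
    hypervisor: str, testcases: Optional[List[str]] = None
) -> str:
    cmd_args = (
        f"tests --hypervisor {hypervisor} --metrics -- --"
        " --continue-on-failure"
        " --report-file /cloud-hypervisor/metrics_report.json"
    )
    # Thread the caller's subset through instead of silently running all.
    for tc in testcases or []:
        cmd_args += f" --test-filter {tc}"
    return cmd_args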

log_path parameter is unused in several places. Remove it.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
@anirudhrb force-pushed the perf_tests_oneshot branch from e3b3163 to 80ec8fe on April 20, 2026 13:05
Copilot AI review requested due to automatic review settings April 23, 2026 18:40
@anirudhrb force-pushed the perf_tests_oneshot branch from 80ec8fe to ca8df13 on April 23, 2026 18:40
@anirudhrb marked this pull request as ready for review on April 23, 2026 18:41
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:539

  • run_metrics_tests no longer treats the overall command exit status as authoritative. If the metrics runner exits non-zero (e.g., one or more subtests failed) but the JSON report still contains entries, the function can end up passing because failed_testcases stays empty and there is no result.exit_code/result.is_timeout assertion. Please fail the test when result.is_timeout is true or result.exit_code != 0, and/or cross-check the report contents against the expected testcase list so missing/failed subtests can’t be silently ignored.
        for testcase, entry in per_test_results.items():
            status, metrics, trace = self._classify_test_result(testcase, entry)

            if status == TestStatus.FAILED:
                failed_testcases.append(testcase)

            self._send_metrics_test_result(
                test_result, testcase, status, metrics, trace
            )

        if len(failed_testcases) > 0:
            diagnostic_info = self._extract_diagnostic_info(
                log_path, test_name, result
            )

            self._log.error(
                f"Test cases failed: {failed_testcases} "
                f"Diagnostics: {diagnostic_info}"
            )

        self._check_test_panic_from_logs(
            test_result=test_result,
            content=result.stdout,
            stage="metrics tests",
            test_name="run_metrics_tests",
        )

        self._save_kernel_logs(log_path)

        # Check for kernel panic after all tests complete
        if self.node.features.is_supported(SerialConsole):
            self.node.features[SerialConsole].check_panic(
                saved_path=log_path, force_run=True, test_result=test_result
            )

        assert_that(
            failed_testcases, f"Failed Testcases: {failed_testcases}"
        ).is_empty()
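
A self-contained sketch of that cross-check; ExecResult is a stand-in for the real command result object:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ExecResult:
    exit_code: int
    is_timeout: bool

def run_failed(
    expected: List[str],
    per_test_results: Dict[str, dict],
    result: ExecResult,
) -> bool:
    # Subtests absent from the report must not pass silently, and a
    # timeout or non-zero exit fails the run even with a healthy report.
    missing = [tc for tc in expected if tc not in per_test_results]
    return bool(missing) or result.is_timeout or result.exit_code != 0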

Comment on lines +681 to +698
def _classify_test_result(
self,
testcase: str,
result_entry: Dict[str, Any],
) -> Tuple[TestStatus, str, str]:
"""Process the result of a metrics test."""
if result.exit_code == 0:
status = TestStatus.PASSED
metrics = self._process_perf_metric_test_result(result.stdout)
trace = ""
else:
status = TestStatus.FAILED
metrics = ""
# Get enhanced diagnostic information for better error reporting
diagnostic_info = self._extract_diagnostic_info(log_path, test_name, result)
if diagnostic_info:
trace = f"Testcase '{testcase}' failed: {diagnostic_info}"
else:
trace = (
f"Testcase '{testcase}' failed with exit code "
f"{result.exit_code}"
)
"""Determine pass/fail for a single test from the metrics run.

Returns (status, metrics, trace).
"""
if result_entry.get("status") == "FAILED":
return (
TestStatus.FAILED,
"",
f"Testcase '{testcase}' reported status FAILED.",
)

return status, metrics, trace
metrics = f'"results": [{json.dumps(result_entry)}]'
return TestStatus.PASSED, metrics, ""

Copilot AI Apr 23, 2026


_classify_test_result checks result_entry.get("status") == "FAILED", but _parse_metrics_report’s documented schema shows each entry only has name/mean/std_dev/max/min (no status). With the current logic, a failed test may be reported as PASSED (or omitted entirely) unless the report format actually includes a status field. Please align classification with the real report schema (e.g., use an explicit failed flag if present, or rely on the overall exit code and presence/absence of expected result entries).
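
A hedged sketch that aligns classification with the schema the reviewer cites (entries carrying only name/mean/std_dev/max/min): presence of the entry, plus any explicit status field if one exists, decides the outcome:

from typing import Any, Dict, Optional, Tuple

def classify_entry(
    testcase: str, entry: Optional[Dict[str, Any]]
) -> Tuple[str, str]:
    if entry is None:
        # Missing from the report: never assume it passed.
        return "FAILED", f"Testcase '{testcase}' missing from report"
    if entry.get("status") == "FAILED":
        # Honored only when the report format actually includes a status.
        return "FAILED", f"Testcase '{testcase}' reported status FAILED"
    return "PASSED", ""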

The current implementation lists all the perf tests and runs them
one by one, i.e. one dev_cli.sh invocation per test. This is too slow:
it takes around 5.5 hours for all the tests to run.

Instead, run all the tests in one go and use the --continue-on-failure
and --report-file parameters. --continue-on-failure ensures that we
continue running the tests even after a test failure. --report-file
produces a JSON file with all the test results.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
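
A minimal end-to-end sketch of this flow; the hypervisor value, report path, and report schema here are assumptions, while the dev_cli.sh subcommand and the two flags come from the diff above:

import json
import subprocess
from typing import Any, Dict

def run_all_metrics_tests(report_file: str = "metrics_report.json") -> Dict[str, Any]:
    cmd = [
        "scripts/dev_cli.sh", "tests", "--hypervisor", "kvm", "--metrics",
        "--", "--",
        "--continue-on-failure",       # keep running past subtest failures
        "--report-file", report_file,  # emit one JSON report at the end
    ]
    # Exit status is not authoritative on its own; results come from the
    # report (see the review discussion about cross-checking both).
    subprocess.run(cmd, check=False)
    with open(report_file) as f:
        report = json.load(f)
    # Index per-subtest entries by name for classification.
    return {entry["name"]: entry for entry in report.get("results", [])}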
@anirudhrb force-pushed the perf_tests_oneshot branch from ca8df13 to 6788aeb on April 28, 2026 15:40