ch_tests_tool: run all perf tests in one go #4419
anirudhrb wants to merge 4 commits into microsoft:main
Conversation
total_idle_time should be reset whenever there is log growth. There is no point in adding to total idle time if progress was eventually made. In long running tests, it is expected that there is no log growth for a while. The existing logic would continue adding to total_idle_time and eventually time out even if the tests are not stuck.
Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
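For context, a minimal sketch of the idle-time accounting this commit describes; the function and variable names here (`wait_for_progress`, `check_interval`, `hang_kill_secs`) are illustrative, not the tool's actual code:

```python
# Illustrative sketch only (not the PR's code): idle time accumulates while
# the log stops growing and resets as soon as growth (progress) is observed.
import os
import time


def wait_for_progress(log_file: str, check_interval: int = 30,
                      hang_kill_secs: int = 1800) -> None:
    total_idle_time = 0
    last_size = os.path.getsize(log_file)
    while True:
        time.sleep(check_interval)
        current_size = os.path.getsize(log_file)
        if current_size > last_size:
            # Progress was made, so previously accumulated idle time is
            # irrelevant; reset instead of letting it build up to a timeout.
            total_idle_time = 0
            last_size = current_size
        else:
            total_idle_time += check_interval
            if total_idle_time >= hang_kill_secs:
                raise TimeoutError("no log growth; test run appears hung")
```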
This is already checked by LISA's tool installation logic and we would never reach here if dev_cli.sh didn't exist.
Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
Pull request overview
This PR refactors the Cloud Hypervisor performance metrics test flow to run all metrics subtests in a single invocation, then derive per-subtest pass/fail and metrics from a generated JSON report, removing legacy per-test env/config handling.
Changes:
- Run metrics tests once with `--continue-on-failure` and collect per-subtest results from a JSON report.
- Consolidate one-time host setup into `_ensure_host_setup()` and remove per-test policies/env var logic.
- Remove MS-CLH variables for per-test block-size tuning that are no longer used.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py | Reworks metrics execution to one run + JSON report parsing; adjusts host setup and diagnostics behavior. |
| lisa/microsoft/testsuites/cloud_hypervisor/ch_tests.py | Removes now-unused variables for metrics block-size configuration passed into the tool. |
```python
result = None
try:
    result = self._run_with_enhanced_diagnostics(
        cmd_args=cmd_args,
        timeout=cmd_timeout,
        log_path=log_path,
        test_name=test_name,
        numa_cmd=numa_cmd,
    )
```
Unlike the previous per-subtest execution path, any exception from _run_with_enhanced_diagnostics() (e.g., timeout, SSH failure) will currently abort run_metrics_tests() before per-subtest results are reported, defeating the purpose of --continue-on-failure. Consider catching expected execution exceptions here, recording a failed status for each requested testcase, and still running the artifact/report collection logic for debugging.
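A hedged sketch of what catching those exceptions could look like; `LisaTimeoutException` and `SshException` are placeholder exception types, while `_send_metrics_test_result`, `TestStatus`, and `testcases` follow names quoted elsewhere in this PR:

```python
# Sketch only: keep --continue-on-failure meaningful by reporting every
# requested subtest as failed when the whole run aborts, then fall through
# to the artifact/report collection below.
result = None
try:
    result = self._run_with_enhanced_diagnostics(
        cmd_args=cmd_args,
        timeout=cmd_timeout,
        log_path=log_path,
        test_name=test_name,
        numa_cmd=numa_cmd,
    )
except (LisaTimeoutException, SshException) as e:  # placeholder exception types
    self._log.error(f"Metrics run aborted before the report was generated: {e}")
    for testcase in testcases:
        self._send_metrics_test_result(
            test_result, testcase, TestStatus.FAILED, "", str(e)
        )
```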
```python
# Store result for log writing
self._last_result = result
```
self._last_result is assigned after the per-testcase loop, but _write_testcase_log() (called inside the loop) relies on _last_result to include stdout/stderr. This can lead to per-testcase logs containing stale output (from a previous run) or only the fallback trace. Consider setting _last_result = result before the loop (or passing result into _write_testcase_log() explicitly) to keep per-testcase logs accurate.
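One possible shape of that fix, with `_write_testcase_log`'s signature assumed for illustration:

```python
# Sketch only: make the fresh command output visible to per-testcase log
# writing by assigning it (or passing it) before the loop, not after.
self._last_result = result
for testcase, entry in per_test_results.items():
    status, metrics, trace = self._classify_test_result(testcase, entry)
    # Alternatively, pass the result explicitly (signature assumed):
    self._write_testcase_log(log_path, testcase, result=result)
```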
```python
except Exception as e:
    self._log.warning(
        f"Failed to retrieve metrics report: {e}. "
```
The report retrieval/parsing wraps everything in a broad except Exception, which can mask programming errors and makes it harder to differentiate "report missing" from "report malformed". Consider catching the specific exceptions expected here (e.g., FileNotFoundError/OSError from copy_back, and json.JSONDecodeError from parsing), and include the remote/local report paths in the warning for easier investigation.
```diff
-except Exception as e:
-    self._log.warning(
-        f"Failed to retrieve metrics report: {e}. "
+except OSError as e:
+    self._log.warning(
+        f"Failed to retrieve metrics report from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file was generated and the copy_back "
+        "operation succeeded. All subtests will be marked as failed."
+    )
+    per_test_results = {}
+except json.JSONDecodeError as e:
+    self._log.warning(
+        f"Failed to parse metrics report copied from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file contains valid JSON. "
```
| echo "[env] CH_HANG_KILL_SECS=${{CH_HANG_KILL_SECS:-1800}}" | ||
| echo "[env] CH_CHECK_INTERVAL=${{CH_CHECK_INTERVAL:-30}}" | ||
| echo "[env] MIGRATABLE_VERSION=${{MIGRATABLE_VERSION:-not_set}}" | ||
| test -x scripts/dev_cli.sh || {{ echo "[error] scripts/dev_cli.sh missing"; exit 98; }} | ||
|
|
The wrapper still relies on scripts/dev_cli.sh being present/executable, but the explicit pre-flight check for it was removed in this hunk. Without it, failures become a generic "No such file or directory" later in the script. Consider restoring an early test -x scripts/dev_cli.sh check with a clear message/exit code so missing checkout/layout issues are immediately actionable.
Existence of dev_cli.sh is verified during tool initialization. If there is no dev_cli.sh it will never reach here.
```diff
     def _build_metrics_cmd_args(
-        self, testcase: str, hypervisor: str, subtest_timeout: Optional[int]
+        self, hypervisor: str, subtest_timeout: Optional[int]
     ) -> str:
         """Build command arguments for metrics test."""
         cmd_args = (
             f"tests --hypervisor {hypervisor} --metrics -- --"
-            f" --test-filter {testcase}"
+            " --continue-on-failure"
+            f" --report-file /cloud-hypervisor/{self.METRICS_REPORT_FILE}"
         )

         # Use subtest_timeout if provided, otherwise check for MQ test extension
         if subtest_timeout:
             cmd_args = f"{cmd_args} --timeout {subtest_timeout}"
-        elif self._is_multi_queue_test(testcase):
-            # Lengthen measure window for MQ tests to ~90s
-            # This reduces influence of transient queue skew
-            mq_timeout = self.perf_mq_test_timeout
-            if mq_timeout > 0:
-                cmd_args = f"{cmd_args} --timeout {mq_timeout}"

         return cmd_args
```
run_metrics_tests() still accepts only/skip, and _prepare_metrics_subtests() filters the list, but _build_metrics_cmd_args() no longer applies any filter/skip arguments to the actual Cloud Hypervisor invocation. As a result, the tool will execute all metrics tests even when the caller requested a subset. Either pass an appropriate filter/skip option through to the CLI (if supported) or remove/rename these parameters to avoid misleading behavior.
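A possible way to honor the requested subset in the single invocation; `--test-filter` is the flag the pre-refactor code used per test, but whether it may be repeated in one command is an assumption:

```python
from typing import List, Optional


# Sketch only: emit one --test-filter per requested testcase instead of
# silently running everything.
def _build_metrics_cmd_args(
    self, testcases: List[str], hypervisor: str, subtest_timeout: Optional[int]
) -> str:
    cmd_args = (
        f"tests --hypervisor {hypervisor} --metrics -- --"
        " --continue-on-failure"
        f" --report-file /cloud-hypervisor/{self.METRICS_REPORT_FILE}"
    )
    for testcase in testcases:
        cmd_args = f"{cmd_args} --test-filter {testcase}"
    if subtest_timeout:
        cmd_args = f"{cmd_args} --timeout {subtest_timeout}"
    return cmd_args
```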
log_path parameter is unused in several places. Remove it.
Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
Force-pushed from e3b3163 to 80ec8fe
Force-pushed from 80ec8fe to ca8df13
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:539
`run_metrics_tests` no longer treats the overall command exit status as authoritative. If the metrics runner exits non-zero (e.g., one or more subtests failed) but the JSON report still contains entries, the function can end up passing because `failed_testcases` stays empty and there is no `result.exit_code`/`result.is_timeout` assertion. Please fail the test when `result.is_timeout` is true or `result.exit_code != 0`, and/or cross-check the report contents against the expected testcase list so missing/failed subtests cannot be silently ignored.
```python
for testcase, entry in per_test_results.items():
    status, metrics, trace = self._classify_test_result(testcase, entry)
    if status == TestStatus.FAILED:
        failed_testcases.append(testcase)
    self._send_metrics_test_result(
        test_result, testcase, status, metrics, trace
    )

if len(failed_testcases) > 0:
    diagnostic_info = self._extract_diagnostic_info(
        log_path, test_name, result
    )
    self._log.error(
        f"Test cases failed: {failed_testcases} "
        f"Diagnostics: {diagnostic_info}"
    )

self._check_test_panic_from_logs(
    test_result=test_result,
    content=result.stdout,
    stage="metrics tests",
    test_name="run_metrics_tests",
)

self._save_kernel_logs(log_path)

# Check for kernel panic after all tests complete
if self.node.features.is_supported(SerialConsole):
    self.node.features[SerialConsole].check_panic(
        saved_path=log_path, force_run=True, test_result=test_result
    )

assert_that(
    failed_testcases, f"Failed Testcases: {failed_testcases}"
).is_empty()
```
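A sketch of the cross-check suggested above; `result`, `per_test_results`, `failed_testcases`, and `_send_metrics_test_result` are the names from the quoted code, and `testcases` (the expected subtest list) is assumed to be in scope:

```python
# Sketch only: treat the overall exit status as authoritative and flag any
# expected subtest that is missing from the report.
if result.is_timeout or result.exit_code != 0:
    failed_testcases.append(
        f"metrics run exited with {result.exit_code} (timeout={result.is_timeout})"
    )

for testcase in sorted(set(testcases) - set(per_test_results)):
    failed_testcases.append(testcase)
    self._send_metrics_test_result(
        test_result,
        testcase,
        TestStatus.FAILED,
        "",
        f"Testcase '{testcase}' is missing from the metrics report.",
    )
```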
```diff
     def _classify_test_result(
         self,
         testcase: str,
         result_entry: Dict[str, Any],
     ) -> Tuple[TestStatus, str, str]:
-        """Process the result of a metrics test."""
-        if result.exit_code == 0:
-            status = TestStatus.PASSED
-            metrics = self._process_perf_metric_test_result(result.stdout)
-            trace = ""
-        else:
-            status = TestStatus.FAILED
-            metrics = ""
-            # Get enhanced diagnostic information for better error reporting
-            diagnostic_info = self._extract_diagnostic_info(log_path, test_name, result)
-            if diagnostic_info:
-                trace = f"Testcase '{testcase}' failed: {diagnostic_info}"
-            else:
-                trace = (
-                    f"Testcase '{testcase}' failed with exit code "
-                    f"{result.exit_code}"
-                )
-
-        return status, metrics, trace
+        """Determine pass/fail for a single test from the metrics run.
+
+        Returns (status, metrics, trace).
+        """
+        if result_entry.get("status") == "FAILED":
+            return (
+                TestStatus.FAILED,
+                "",
+                f"Testcase '{testcase}' reported status FAILED.",
+            )
+
+        metrics = f'"results": [{json.dumps(result_entry)}]'
+        return TestStatus.PASSED, metrics, ""
```
_classify_test_result checks result_entry.get("status") == "FAILED", but _parse_metrics_report’s documented schema shows each entry only has name/mean/std_dev/max/min (no status). With the current logic, a failed test may be reported as PASSED (or omitted entirely) unless the report format actually includes a status field. Please align classification with the real report schema (e.g., use an explicit failed flag if present, or rely on the overall exit code and presence/absence of expected result entries).
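If the report entries indeed carry only name/mean/std_dev/max/min, classification could key off the presence of those fields instead; a sketch under that assumption (everything beyond the field names is illustrative, not the PR's code):

```python
import json
from typing import Any, Dict, Tuple

# Field names taken from the schema quoted in the review comment above.
REQUIRED_FIELDS = ("mean", "std_dev", "max", "min")


def _classify_test_result(
    self, testcase: str, result_entry: Dict[str, Any]
) -> Tuple["TestStatus", str, str]:
    failed_flag = result_entry.get("status") == "FAILED"
    if failed_flag or not all(f in result_entry for f in REQUIRED_FIELDS):
        return (
            TestStatus.FAILED,
            "",
            f"Testcase '{testcase}' has no usable metrics in the report.",
        )
    metrics = f'"results": [{json.dumps(result_entry)}]'
    return TestStatus.PASSED, metrics, ""
```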
The current implementation lists all the perf tests and runs them one by one, i.e. one dev_cli.sh invocation per test. This process is too slow: it takes around 5.5 hours for all the tests to run. Instead, run all the tests in one go and use the --continue-on-failure and --report-file parameters. --continue-on-failure ensures that we continue running the tests even after a test failure. --report-file produces a JSON file with all the test results.
Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
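For illustration, the argument string the refactored `_build_metrics_cmd_args` produces would look roughly like this; the hypervisor value and report file name are assumptions:

```python
# Illustrative only: the single dev_cli.sh invocation now carries the report
# and continue-on-failure flags instead of a per-test --test-filter.
hypervisor = "kvm"  # assumed value
metrics_report_file = "metrics_report.json"  # assumed METRICS_REPORT_FILE
cmd_args = (
    f"tests --hypervisor {hypervisor} --metrics -- --"
    " --continue-on-failure"
    f" --report-file /cloud-hypervisor/{metrics_report_file}"
)
print(cmd_args)
```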
Force-pushed from ca8df13 to 6788aeb
This pull request significantly refactors the Cloud Hypervisor performance metrics test logic to simplify test execution, improve result collection, and remove legacy per-test configuration. The main focus is on running all metrics subtests in a single invocation, collecting results from a generated JSON report, and eliminating no-longer-needed per-test environment variables and logic.
Related Issue
Type of Change
Checklist
Test Validation
Key Test Cases:
Impacted LISA Features:
Tested Azure Marketplace Images:
Test Results