ch_tests_tool: run all perf tests in one go#4419

Open
anirudhrb wants to merge 4 commits into microsoft:main from anirudhrb:perf_tests_oneshot

Conversation

@anirudhrb (Collaborator)

This pull request significantly refactors the Cloud Hypervisor performance metrics test logic to simplify test execution, improve result collection, and remove legacy per-test configuration. The main focus is on running all metrics subtests in a single invocation, collecting results from a generated JSON report, and eliminating no-longer-needed per-test environment variables and logic.

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

Image VM Size Result
PASSED / FAILED / SKIPPED

total_idle_time should be reset whenever there is log growth. There is
no point in adding to the total idle time if progress was eventually made. In
long-running tests, it is expected that there is no log growth for a
while. The existing logic would keep adding to total_idle_time and
eventually time out even if the tests were not stuck.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
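
A minimal sketch of the reset behavior this commit describes, assuming hypothetical names (log_size, CHECK_INTERVAL, IDLE_TIMEOUT); the real tool's loop also exits when the test process completes:

import time
from typing import Callable

CHECK_INTERVAL = 30   # seconds between log-growth checks (assumed value)
IDLE_TIMEOUT = 1800   # give up only after this much consecutive idle time

def wait_for_progress(log_size: Callable[[], int]) -> None:
    total_idle_time = 0
    last_size = log_size()
    while True:
        time.sleep(CHECK_INTERVAL)
        current_size = log_size()
        if current_size > last_size:
            # Log grew, so progress was made: reset the idle counter
            # instead of letting quiet periods accumulate across the run.
            total_idle_time = 0
            last_size = current_size
        else:
            total_idle_time += CHECK_INTERVAL
            if total_idle_time >= IDLE_TIMEOUT:
                raise TimeoutError("no log growth; test appears stuck")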
This is already checked by LISA's tool installation logic and we would
never reach here if dev_cli.sh didn't exist.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
Copilot AI review requested due to automatic review settings April 16, 2026 11:28
Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the Cloud Hypervisor performance metrics test flow to run all metrics subtests in a single invocation, then derive per-subtest pass/fail and metrics from a generated JSON report, removing legacy per-test env/config handling.

Changes:

  • Run metrics tests once with --continue-on-failure and collect per-subtest results from a JSON report.
  • Consolidate one-time host setup into _ensure_host_setup() and remove per-test policies/env var logic.
  • Remove MS-CLH variables for per-test block-size tuning that are no longer used.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py | Reworks metrics execution into one run plus JSON report parsing; adjusts host setup and diagnostics behavior.
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests.py | Removes now-unused variables for metrics block-size configuration passed into the tool.

Comment on lines +460 to +468
result = None
try:
    result = self._run_with_enhanced_diagnostics(
        cmd_args=cmd_args,
        timeout=cmd_timeout,
        log_path=log_path,
        test_name=test_name,
        numa_cmd=numa_cmd,
    )

Copilot AI Apr 16, 2026


Unlike the previous per-subtest execution path, any exception from _run_with_enhanced_diagnostics() (e.g., timeout, SSH failure) will currently abort run_metrics_tests() before per-subtest results are reported, defeating the purpose of --continue-on-failure. Consider catching expected execution exceptions here, recording a failed status for each requested testcase, and still running the artifact/report collection logic for debugging.
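
A hedged, self-contained sketch of that suggestion; run_once and report stand in for _run_with_enhanced_diagnostics() and the per-subtest result reporting, and the caught exception types are assumptions:

from typing import Callable, List, Optional

def run_with_fallback(
    testcases: List[str],
    run_once: Callable[[], object],
    report: Callable[[str, str, str], None],
) -> Optional[object]:
    try:
        return run_once()
    except (TimeoutError, OSError) as e:  # expected execution failures
        # Record a FAILED status for every requested subtest so results
        # are still published, then let the caller collect artifacts.
        for tc in testcases:
            report(tc, "FAILED", f"metrics run aborted: {e}")
        return None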

Comment on lines +511 to +513
# Store result for log writing
self._last_result = result


Copilot AI Apr 16, 2026


self._last_result is assigned after the per-testcase loop, but _write_testcase_log() (called inside the loop) relies on _last_result to include stdout/stderr. This can lead to per-testcase logs containing stale output (from a previous run) or only the fallback trace. Consider setting _last_result = result before the loop (or passing result into _write_testcase_log() explicitly) to keep per-testcase logs accurate.
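
For illustration, the second option the reviewer mentions, with write_log standing in for _write_testcase_log() (whose real signature may differ):

from typing import Callable, Dict

def report_subtests(
    per_test_results: Dict[str, dict],
    result: object,
    write_log: Callable[[str, object], None],
) -> None:
    for testcase in per_test_results:
        # The run result travels with the call, so each per-testcase log
        # reflects this run's stdout/stderr rather than stale state.
        write_log(testcase, result)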

Comment on lines +478 to +480
except Exception as e:
    self._log.warning(
        f"Failed to retrieve metrics report: {e}. "

Copilot AI Apr 16, 2026


The report retrieval/parsing wraps everything in a broad except Exception, which can mask programming errors and makes it harder to differentiate "report missing" from "report malformed". Consider catching the specific exceptions expected here (e.g., FileNotFoundError/OSError from copy_back, and json.JSONDecodeError from parsing), and include the remote/local report paths in the warning for easier investigation.

Suggested change
-except Exception as e:
-    self._log.warning(
-        f"Failed to retrieve metrics report: {e}. "
+except OSError as e:
+    self._log.warning(
+        f"Failed to retrieve metrics report from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file was generated and the copy_back "
+        "operation succeeded. All subtests will be marked as failed."
+    )
+    per_test_results = {}
+except json.JSONDecodeError as e:
+    self._log.warning(
+        f"Failed to parse metrics report copied from remote path "
+        f"{remote_report} to local path {local_report}: {e}. "
+        "Verify the report file contains valid JSON. "

Comment on lines 971 to 974
echo "[env] CH_HANG_KILL_SECS=${{CH_HANG_KILL_SECS:-1800}}"
echo "[env] CH_CHECK_INTERVAL=${{CH_CHECK_INTERVAL:-30}}"
echo "[env] MIGRATABLE_VERSION=${{MIGRATABLE_VERSION:-not_set}}"
test -x scripts/dev_cli.sh || {{ echo "[error] scripts/dev_cli.sh missing"; exit 98; }}


Copilot AI Apr 16, 2026


The wrapper still relies on scripts/dev_cli.sh being present/executable, but the explicit pre-flight check for it was removed in this hunk. Without it, failures become a generic "No such file or directory" later in the script. Consider restoring an early test -x scripts/dev_cli.sh check with a clear message/exit code so missing checkout/layout issues are immediately actionable.

anirudhrb (Collaborator, Author)

Existence of dev_cli.sh is verified during tool initialization. If there is no dev_cli.sh it will never reach here.
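
A generic sketch of the check-once-at-initialization pattern being described; the class and attribute names here are illustrative, not LISA's actual tool-installation hook:

import os

class ChTestsToolSketch:
    DEV_CLI = "scripts/dev_cli.sh"

    def __init__(self, repo_root: str) -> None:
        path = os.path.join(repo_root, self.DEV_CLI)
        if not os.access(path, os.X_OK):
            # Fail at setup with an actionable message, so later command
            # wrappers can assume the script exists and is executable.
            raise FileNotFoundError(f"{path} is missing or not executable")
        self._dev_cli = path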

Comment thread lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py
Comment on lines 631 to 641
 def _build_metrics_cmd_args(
-    self, testcase: str, hypervisor: str, subtest_timeout: Optional[int]
+    self, hypervisor: str, subtest_timeout: Optional[int]
 ) -> str:
     """Build command arguments for metrics test."""
     cmd_args = (
         f"tests --hypervisor {hypervisor} --metrics -- --"
-        f" --test-filter {testcase}"
+        " --continue-on-failure"
+        f" --report-file /cloud-hypervisor/{self.METRICS_REPORT_FILE}"
     )

-    # Use subtest_timeout if provided, otherwise check for MQ test extension
     if subtest_timeout:
         cmd_args = f"{cmd_args} --timeout {subtest_timeout}"
-    elif self._is_multi_queue_test(testcase):
-        # Lengthen measure window for MQ tests to ~90s
-        # This reduces influence of transient queue skew
-        mq_timeout = self.perf_mq_test_timeout
-        if mq_timeout > 0:
-            cmd_args = f"{cmd_args} --timeout {mq_timeout}"

     return cmd_args

Copilot AI Apr 16, 2026


run_metrics_tests() still accepts only/skip, and _prepare_metrics_subtests() filters the list, but _build_metrics_cmd_args() no longer applies any filter/skip arguments to the actual Cloud Hypervisor invocation. As a result, the tool will execute all metrics tests even when the caller requested a subset. Either pass an appropriate filter/skip option through to the CLI (if supported) or remove/rename these parameters to avoid misleading behavior.
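
One hedged way to close that gap, reusing the --test-filter flag the pre-refactor code already passed; whether the harness accepts the flag repeatedly is an assumption to verify:

from typing import List, Optional

def build_metrics_cmd_args(
    hypervisor: str, testcases: Optional[List[str]] = None
) -> str:
    cmd_args = (
        f"tests --hypervisor {hypervisor} --metrics -- --"
        " --continue-on-failure"
        " --report-file /cloud-hypervisor/metrics_report.json"
    )
    # Thread the caller's subset through instead of silently running all.
    for tc in testcases or []:
        cmd_args += f" --test-filter {tc}"
    return cmd_args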

log_path parameter is unused in several places. Remove it.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
@anirudhrb force-pushed the perf_tests_oneshot branch from e3b3163 to 80ec8fe on April 20, 2026 13:05
Copilot AI review requested due to automatic review settings April 23, 2026 18:40
@anirudhrb force-pushed the perf_tests_oneshot branch from 80ec8fe to ca8df13 on April 23, 2026 18:40
@anirudhrb marked this pull request as ready for review on April 23, 2026 18:41
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:539

  • run_metrics_tests no longer treats the overall command exit status as authoritative. If the metrics runner exits non-zero (e.g., one or more subtests failed) but the JSON report still contains entries, the function can end up passing because failed_testcases stays empty and there is no result.exit_code/result.is_timeout assertion. Please fail the test when result.is_timeout is true or result.exit_code != 0, and/or cross-check the report contents against the expected testcase list so missing/failed subtests can’t be silently ignored.
        for testcase, entry in per_test_results.items():
            status, metrics, trace = self._classify_test_result(testcase, entry)

            if status == TestStatus.FAILED:
                failed_testcases.append(testcase)

            self._send_metrics_test_result(
                test_result, testcase, status, metrics, trace
            )

        if len(failed_testcases) > 0:
            diagnostic_info = self._extract_diagnostic_info(
                log_path, test_name, result
            )

            self._log.error(
                f"Test cases failed: {failed_testcases} "
                f"Diagnostics: {diagnostic_info}"
            )

        self._check_test_panic_from_logs(
            test_result=test_result,
            content=result.stdout,
            stage="metrics tests",
            test_name="run_metrics_tests",
        )

        self._save_kernel_logs(log_path)

        # Check for kernel panic after all tests complete
        if self.node.features.is_supported(SerialConsole):
            self.node.features[SerialConsole].check_panic(
                saved_path=log_path, force_run=True, test_result=test_result
            )

        assert_that(
            failed_testcases, f"Failed Testcases: {failed_testcases}"
        ).is_empty()
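
A self-contained sketch of that cross-check; ExecResult is a stand-in for the real command result object:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ExecResult:
    exit_code: int
    is_timeout: bool

def run_failed(
    expected: List[str],
    per_test_results: Dict[str, dict],
    result: ExecResult,
) -> bool:
    # Subtests absent from the report must not pass silently, and a
    # timeout or non-zero exit fails the run even with a healthy report.
    missing = [tc for tc in expected if tc not in per_test_results]
    return bool(missing) or result.is_timeout or result.exit_code != 0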

Comment on lines +681 to +698
def _classify_test_result(
self,
testcase: str,
result_entry: Dict[str, Any],
) -> Tuple[TestStatus, str, str]:
"""Process the result of a metrics test."""
if result.exit_code == 0:
status = TestStatus.PASSED
metrics = self._process_perf_metric_test_result(result.stdout)
trace = ""
else:
status = TestStatus.FAILED
metrics = ""
# Get enhanced diagnostic information for better error reporting
diagnostic_info = self._extract_diagnostic_info(log_path, test_name, result)
if diagnostic_info:
trace = f"Testcase '{testcase}' failed: {diagnostic_info}"
else:
trace = (
f"Testcase '{testcase}' failed with exit code "
f"{result.exit_code}"
)
"""Determine pass/fail for a single test from the metrics run.

Returns (status, metrics, trace).
"""
if result_entry.get("status") == "FAILED":
return (
TestStatus.FAILED,
"",
f"Testcase '{testcase}' reported status FAILED.",
)

return status, metrics, trace
metrics = f'"results": [{json.dumps(result_entry)}]'
return TestStatus.PASSED, metrics, ""

Copilot AI Apr 23, 2026


_classify_test_result checks result_entry.get("status") == "FAILED", but _parse_metrics_report’s documented schema shows each entry only has name/mean/std_dev/max/min (no status). With the current logic, a failed test may be reported as PASSED (or omitted entirely) unless the report format actually includes a status field. Please align classification with the real report schema (e.g., use an explicit failed flag if present, or rely on the overall exit code and presence/absence of expected result entries).
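
A hedged sketch that aligns classification with the schema the reviewer cites (entries carrying only name/mean/std_dev/max/min): presence of the entry, plus any explicit status field if one exists, decides the outcome:

from typing import Any, Dict, Optional, Tuple

def classify_entry(
    testcase: str, entry: Optional[Dict[str, Any]]
) -> Tuple[str, str]:
    if entry is None:
        # Missing from the report: never assume it passed.
        return "FAILED", f"Testcase '{testcase}' missing from report"
    if entry.get("status") == "FAILED":
        # Honored only when the report format actually includes a status.
        return "FAILED", f"Testcase '{testcase}' reported status FAILED"
    return "PASSED", ""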

The current implementation lists all the perf tests and runs them
one by one, i.e. one dev_cli.sh invocation per test. This is too slow:
it takes around 5.5 hours for all the tests to run.

Instead, run all the tests in one go and use the --continue-on-failure
and --report-file parameters. --continue-on-failure ensures that we
continue running the tests even after a test failure. --report-file
produces a JSON file with all the test results.

Signed-off-by: Anirudh Rayabharam <anrayabh@microsoft.com>
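
A minimal end-to-end sketch of this flow; the hypervisor value, report path, and report schema here are assumptions, while the dev_cli.sh subcommand and the two flags come from the diff above:

import json
import subprocess
from typing import Any, Dict

def run_all_metrics_tests(report_file: str = "metrics_report.json") -> Dict[str, Any]:
    cmd = [
        "scripts/dev_cli.sh", "tests", "--hypervisor", "kvm", "--metrics",
        "--", "--",
        "--continue-on-failure",       # keep running past subtest failures
        "--report-file", report_file,  # emit one JSON report at the end
    ]
    # Exit status is not authoritative on its own; results come from the
    # report (see the review discussion about cross-checking both).
    subprocess.run(cmd, check=False)
    with open(report_file) as f:
        report = json.load(f)
    # Index per-subtest entries by name for classification.
    return {entry["name"]: entry for entry in report.get("results", [])}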
@anirudhrb force-pushed the perf_tests_oneshot branch from ca8df13 to 6788aeb on April 28, 2026 15:40