
Fix flaky timing assertion in gloo transport multiproc tests #505

Open: yahayaohinoyi wants to merge 1 commit into pytorch:main from yahayaohinoyi:export-D102326105

@yahayaohinoyi

Summary:
The `IoErrors` and `UnboundIoErrors` tests assert that error propagation after `SIGKILL` completes within `kMultiProcTimeout * 2` (6s). On loaded CI machines, delays in TCP error detection and process scheduling can push the elapsed wall-clock time well beyond this limit (~10.6s has been observed), causing spurious failures. Relax the multiplier from 2x to 4x (a 12s bound) to absorb CI variability while still catching genuine hangs.
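A minimal sketch of the arithmetic behind the change, assuming `kMultiProcTimeout` is 3s (inferred from "`kMultiProcTimeout * 2` (6s)"). The helpers `withinBound` and `assertCompletesWithinBound` are hypothetical and do not come from gloo's test sources; they only show how a 2x bound rejects the ~10.6s CI observation while a 4x bound accepts it.

```cpp
#include <cassert>
#include <chrono>

// ASSUMPTION: kMultiProcTimeout = 3s, inferred from "kMultiProcTimeout * 2 (6s)".
// This is a stand-in, not gloo's actual definition.
constexpr std::chrono::milliseconds kMultiProcTimeout{3000};

// Multipliers before and after this change.
constexpr int kOldMultiplier = 2;
constexpr int kNewMultiplier = 4;

// Hypothetical helper (not part of gloo): does a measured error-propagation
// time stay under kMultiProcTimeout * multiplier?
constexpr bool withinBound(std::chrono::milliseconds elapsed, int multiplier) {
  return elapsed < kMultiProcTimeout * multiplier;
}

// Elapsed time observed on a loaded CI machine (~10.6s).
constexpr std::chrono::milliseconds kObservedOnCi{10600};

// The arithmetic behind the change, checked at compile time.
static_assert(kMultiProcTimeout * kOldMultiplier == std::chrono::seconds(6), "old bound: 6s");
static_assert(kMultiProcTimeout * kNewMultiplier == std::chrono::seconds(12), "new bound: 12s");
static_assert(!withinBound(kObservedOnCi, kOldMultiplier), "~10.6s fails the 2x bound");
static_assert(withinBound(kObservedOnCi, kNewMultiplier), "~10.6s passes the 4x bound");

// Hypothetical shape of the runtime check: time a step with a monotonic
// clock and assert it finished under the (relaxed) bound. In the real tests
// the step would be the SIGKILL followed by waiting for the error to surface.
template <typename Fn>
void assertCompletesWithinBound(Fn&& step, int multiplier = kNewMultiplier) {
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  step();
  const auto elapsed =
      std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
  assert(withinBound(elapsed, multiplier));
}
```

The sketch uses `std::chrono::steady_clock` because it is monotonic, so a system-clock adjustment during a slow CI run cannot distort the measured interval. Raising the multiplier keeps a hard upper bound: a genuine hang still exceeds 12s and fails.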


overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: testing_frameworks

Differential Revision: D102326105


meta-codesync Bot commented Apr 24, 2026

@yahayaohinoyi has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102326105.
