[Repo Assist] test: increase sample size in linear classification auto-assignment test #1466
Conversation
Fixes test flakiness. The `_generate_linear_classification_data()` function used only 100 samples, which made the cross-validation scores for LogisticRegression and SVC very close. When the BETTER quality auto-assignment includes SVC in its candidate pool, SVC could occasionally win on small random data, causing the test `test_given_linear_classification_problem_when_auto_assign_causal_models_with_better_quality_returns_linear_model` to fail even with `@flaky(max_runs=3)`. Increasing to 500 samples makes the linear signal strong and stable enough that LogisticRegression reliably outperforms SVC in cross-validation, eliminating the spurious SVC wins.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Stabilizes an intermittently failing linear classification auto-assignment test by increasing the generated dataset size used for cross-validation.
Changes:
- Increase `_generate_linear_classification_data()` sample count from 100 to 500.
- Add an inline comment explaining the rationale (stabilize CV so LogisticRegression beats SVC on linear data).
```python
X = np.random.normal(0, 1, (500, 5))
Y = (np.sum(X * np.random.uniform(-5, 5, X.shape[1]), axis=1) > 0).astype(str)
```
This test data generator still relies on the global NumPy RNG state, so outcomes can remain non-deterministic (and potentially influenced by other tests consuming randomness). To fully eliminate flakiness, consider using a local, fixed-seed generator (e.g., `rng = np.random.default_rng(seed)` and calling `rng.normal(...)`, `rng.uniform(...)`) inside `_generate_linear_classification_data()` so the dataset is reproducible across runs.
```diff
- X = np.random.normal(0, 1, (500, 5))
- Y = (np.sum(X * np.random.uniform(-5, 5, X.shape[1]), axis=1) > 0).astype(str)
+ rng = np.random.default_rng(0)
+ X = rng.normal(0, 1, (500, 5))
+ Y = (np.sum(X * rng.uniform(-5, 5, X.shape[1]), axis=1) > 0).astype(str)
```
🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.
Problem
The test `test_given_linear_classification_problem_when_auto_assign_causal_models_with_better_quality_returns_linear_model` was failing intermittently in CI, even with `@flaky(max_runs=3)`. CI run 24607026398 (build 3.12, group 3) shows the failure.

Root Cause

`_generate_linear_classification_data()` used only 100 samples with randomly generated class weights. With this small sample count:

- the cross-validation scores for LogisticRegression and SVC are very close, so SVC can occasionally win
- `BETTER` quality includes `SVC` in the candidate pool
- `@flaky(max_runs=3)` retries don't help because the same small random dataset is regenerated each attempt, and consistently favours SVC

By contrast, `_generate_non_linear_classification_data()` already uses 1,000 samples, which is why those tests are stable.

Fix

Increase `_generate_linear_classification_data()` from 100 → 500 samples. With 500 samples, the linear signal is strong and stable enough that LogisticRegression reliably outperforms SVC in cross-validation.

Test Status
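The claim can be sanity-checked outside the test suite with a quick standalone comparison. This is an illustrative sketch, not DoWhy's actual auto-assignment code: the data generation mirrors the two lines changed in this PR, and the choice of `cv=5` and seed 0 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Regenerate 500-sample linear data as in the patched test helper,
# but with a fixed seed so the comparison is reproducible.
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (500, 5))
Y = (np.sum(X * rng.uniform(-5, 5, X.shape[1]), axis=1) > 0).astype(str)

# Compare mean cross-validation accuracy of the two candidate models.
log_reg_score = cross_val_score(LogisticRegression(), X, Y, cv=5).mean()
svc_score = cross_val_score(SVC(), X, Y, cv=5).mean()
print(f"LogisticRegression CV accuracy: {log_reg_score:.3f}")
print(f"SVC CV accuracy:                {svc_score:.3f}")
```

On this linearly separable data, LogisticRegression's score sits at or above SVC's, matching the rationale for the fix.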
This is a test-only change. No production code was modified.