[Repo Assist] fix: correct categorical groupby KeyError and pandas FutureWarning in dummy_outcome_refuter#1460

Open
github-actions[bot] wants to merge 4 commits into main from repo-assist/fix-issue-1316-pandas-groupby-warnings-0b1ea40ed2c81e89

@github-actions
Contributor

🤖 This is an automated pull request from Repo Assist, an AI assistant. Please review all changes carefully before merging.

Fixes a KeyError bug and a pandas FutureWarning in DummyOutcomeRefuter._get_treatment_groups().

Closes #1316


Root Cause

In dummy_outcome_refuter.py, the _get_treatment_groups() method has two branches, one per treatment type: continuous/float (groups by the "bins" column produced by pd.cut) and categorical (groups by the treatment column directly). The categorical branch contained a copy-paste error:

# Before (categorical branch) — BUGGY
groups = data.groupby(treatment_variable_name)   # ✓ correct
groups = data.groupby("bins")                    # ✗ "bins" never exists here → KeyError

This caused an immediate KeyError whenever DummyOutcomeRefuter was used with a categorical treatment variable.
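The failure mode can be reproduced outside DoWhy with a few lines of pandas. The following is a minimal sketch on made-up data (the column names `treatment` and `y` are illustrative assumptions, not taken from the refuter): grouping by the treatment column works, while grouping by a "bins" column that was never created raises KeyError as soon as groupby is called.

```python
import pandas as pd

# Hypothetical toy data: a categorical treatment column and an outcome.
data = pd.DataFrame(
    {
        "treatment": pd.Categorical(["a", "b", "a", "c"]),
        "y": [1.0, 2.0, 3.0, 4.0],
    }
)

# Grouping by the existing treatment column succeeds.
groups = data.groupby("treatment", observed=True)
group_sizes = groups.size().to_dict()

# Grouping by a column that does not exist fails eagerly with KeyError.
try:
    data.groupby("bins")
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

Because the error is raised when the groupby object is constructed, the second call fails even if its result is never used.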

Additionally, both groupby calls relied on the default observed=False. pandas has deprecated that default (since pandas 2.1), so grouping on CategoricalDtype columns (such as the one created by pd.cut) triggered a FutureWarning about the changing default.
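To make the effect of the observed argument concrete, here is a small sketch on made-up values (not data from the refuter). pd.cut yields a categorical column, and one of the three bins receives no rows, so the two settings return a different number of groups.

```python
import pandas as pd

values = pd.Series([0.1, 0.2, 0.9])

# pd.cut returns a categorical; the middle bin (0.33, 0.66] gets no rows.
bins = pd.cut(values, bins=[0.0, 0.33, 0.66, 1.0])
frame = pd.DataFrame({"bins": bins, "v": values})

# observed=False reports every category, including the empty bin.
n_all = len(frame.groupby("bins", observed=False).size())

# observed=True reports only the bins that actually contain rows.
n_observed = len(frame.groupby("bins", observed=True).size())
```

Passing observed explicitly, whichever value is chosen, is what avoids depending on the deprecated default.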


Fix

  1. Removed the erroneous second data.groupby("bins") in the categorical branch.
  2. Added observed=True to both groupby calls ("bins" for continuous and treatment column for categorical) to suppress the FutureWarning and adopt the modern pandas default. This retains correct behaviour since only non-empty bins are relevant.
# After — fixed
if treatment_is_continuous:
    groups = data.groupby("bins", observed=True)
else:
    groups = data.groupby(treatment_variable_name, observed=True)
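For readers who want to try the two branches in isolation, the snippet below is a self-contained sketch of the same pattern. The function `get_treatment_groups`, its `num_bins` parameter, and the toy DataFrame are hypothetical stand-ins; this is not the DoWhy implementation, only an illustration of the branch structure shown above.

```python
import pandas as pd


def get_treatment_groups(data, treatment_variable_name, treatment_is_continuous, num_bins=3):
    """Hypothetical stand-in for the refuter method, for illustration only."""
    if treatment_is_continuous:
        # Continuous treatment: discretise into bins first, then group by them.
        data = data.copy()
        data["bins"] = pd.cut(data[treatment_variable_name], num_bins)
        return data.groupby("bins", observed=True)
    # Categorical treatment: group by the treatment column directly.
    return data.groupby(treatment_variable_name, observed=True)


# Made-up data with one continuous and one categorical treatment column.
df = pd.DataFrame(
    {
        "t_cont": [0.0, 0.5, 1.0, 0.9],
        "t_cat": ["x", "y", "x", "y"],
    }
)

cont_groups = get_treatment_groups(df, "t_cont", treatment_is_continuous=True)
cat_groups = get_treatment_groups(df, "t_cat", treatment_is_continuous=False)
```

In this sketch the "bins" column only ever exists inside the continuous branch, which is why the categorical branch must not reference it.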

Test Status

A regression test test_refutation_dummy_outcome_refuter_default_categorical_treatment was added to tests/causal_refuters/test_dummy_outcome_refuter.py. This test uses the existing categorical_treatment_testsuite infrastructure and would previously raise KeyError: 'bins'. The test now passes correctly.

Format/lint checks:

  • black --check
  • flake8 --select=E9,F63,F7,F82

Full test suite could not be run due to environment constraints (missing system-level graphviz for pygraphviz), but the fix is surgical: the only changes are removing the erroneous line and adding observed=True to two groupby calls.
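Since the full suite was not run, one way to check the warning side of the fix locally is to record warnings around a groupby call. This is a sketch on made-up data using only the standard warnings module and pandas; it is not part of the PR's test file.

```python
import warnings

import pandas as pd

# Made-up frame with a categorical "bins" column produced by pd.cut.
frame = pd.DataFrame({"bins": pd.cut([0.1, 0.2, 0.9], 3), "y": [1, 2, 3]})

with warnings.catch_warnings(record=True) as caught:
    # Record every warning instead of letting filters hide repeats.
    warnings.simplefilter("always")
    means = frame.groupby("bins", observed=True)["y"].mean()

# Collect any FutureWarning emitted while grouping.
future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
```

With observed=True passed explicitly, the list of recorded FutureWarning entries is expected to be empty.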

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@11c9a2c442e519ff2b427bf58679f5a525353f76

…g in dummy_outcome_refuter

Two issues fixed in _get_treatment_groups():

1. Bug (categorical branch): line 767 overwrote the correct groupby result
   with data.groupby('bins') even though 'bins' was never created for
   categorical treatments.  This caused a KeyError whenever the refuter
   was run with a categorical treatment variable.  Fixed by removing
   the dead line so the function correctly returns groupby on the
   treatment column.

2. FutureWarning (pandas): groupby calls on columns with CategoricalDtype
   (pd.cut output 'bins', and categorical treatment) did not set
   observed=True, triggering a pandas FutureWarning about the changing
   default.  Added observed=True to both groupby calls; this retains
   existing behaviour because only non-empty bins are meaningful for
   the dummy-outcome estimator.

Test added: test_refutation_dummy_outcome_refuter_default_categorical_treatment
  — regression test that previously raised KeyError.

Refs #1316

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot added automation bug Something isn't working repo-assist labels Apr 16, 2026
@emrekiciman emrekiciman marked this pull request as ready for review April 17, 2026 03:58
@emrekiciman emrekiciman requested a review from Copilot April 17, 2026 03:58
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes DummyOutcomeRefuter._get_treatment_groups() for categorical treatments by removing an erroneous groupby("bins") call and explicitly setting observed=... on groupby to address a pandas FutureWarning.

Changes:

  • Remove incorrect categorical groupby("bins") that could raise KeyError.
  • Add observed=True to groupby(...) for both continuous ("bins") and categorical (treatment column) branches.
  • Add regression test covering categorical treatment refutation path (issue #1316).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dowhy/causal_refuters/dummy_outcome_refuter.py | Fixes categorical grouping KeyError and sets observed=True on groupby to avoid the pandas warning. |
| tests/causal_refuters/test_dummy_outcome_refuter.py | Adds a regression test to ensure categorical treatments don’t trigger the previous KeyError. |

Comment thread dowhy/causal_refuters/dummy_outcome_refuter.py Outdated
Comment thread dowhy/causal_refuters/dummy_outcome_refuter.py Outdated
Comment thread tests/causal_refuters/test_dummy_outcome_refuter.py
refuter_tester.binary_treatment_testsuite(tests_to_run="atleast-one-common-cause", num_samples=num_samples)

@mark.parametrize(
["error_tolerence", "estimator_method", "num_samples"],

Copilot AI Apr 17, 2026

The parameter name error_tolerence is misspelled (tolerence vs tolerance). Since this is newly added test code, consider renaming the local test parameter to error_tolerance for readability (keeping the passed value the same).

Copilot uses AI. Check for mistakes.
Member

@copilot apply changes based on this feedback

[(0.1, "backdoor.propensity_score_matching", 1000)],
)
def test_refutation_dummy_outcome_refuter_default_categorical_treatment(
self, error_tolerence, estimator_method, num_samples

Copilot AI Apr 17, 2026

The parameter name error_tolerence is misspelled (tolerence vs tolerance). Since this is newly added test code, consider renaming the local test parameter to error_tolerance for readability (keeping the passed value the same).

Member

@copilot apply changes based on this feedback

):
# Regression test for #1316: the categorical branch had a copy-paste error that tried to
# groupby("bins") which doesn't exist for categorical treatments, raising KeyError.
refuter_tester = SimpleRefuter(error_tolerence, estimator_method, "dummy_outcome_refuter")

Copilot AI Apr 17, 2026

The parameter name error_tolerence is misspelled (tolerence vs tolerance). Since this is newly added test code, consider renaming the local test parameter to error_tolerance for readability (keeping the passed value the same).

Member

@copilot apply changes based on this feedback

Contributor

Fixed in b7d0e71 — renamed the parameter from error_tolerence to error_tolerance in the newly added categorical treatment test (parametrize list, function signature, and body).

emrekiciman and others added 3 commits April 19, 2026 02:23
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Emre Kıcıman <emrek@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Emre Kıcıman <emrek@microsoft.com>
…al test

Signed-off-by: Copilot <copilot@github.com>

Agent-Logs-Url: https://github.com/py-why/dowhy/sessions/227e1739-c29d-4256-9664-7e83d209eb69

Co-authored-by: emrekiciman <5982160+emrekiciman@users.noreply.github.com>

Labels

automation bug Something isn't working repo-assist

Development

Successfully merging this pull request may close these issues.

Warnings using causal.model

3 participants