[Repo Assist] fix: support multiple treatments in PlaceboTreatmentRefuter (closes #251)#1467

Draft
github-actions[bot] wants to merge 1 commit into main from
repo-assist/fix-issue-251-placebo-refuter-multi-treatment-8cbf89c6951064fd

🤖 Created by Repo Assist, an automated AI assistant.

## Summary

Fixes `ValueError: Wrong number of items passed N, placement implies 1` when `placebo_treatment_refuter` is used with a multi-treatment `CausalModel` (issue #251, open since 2021).

Closes #251

## Root Cause

`_refute_once` always assigned a single placebo column, regardless of how many treatments were in the model:

```python
new_treatment = data[treatment_names].iloc[permuted_idx].values  # shape (n, 3) for 3 treatments
new_data = data.assign(placebo=new_treatment)  # ← raises ValueError
```

`pandas.DataFrame.assign` cannot accept a 2-D array as a single-column value. The same issue affected the `DEFAULT` (random data) path, which only generated data for `treatment_names[0]`.

## Fix

Two small helpers + minimal surgical changes:

| Helper | Purpose |
|--------|---------|
| `_get_placebo_names(treatment_names)` | Returns `["placebo"]` for 1 treatment (backward-compatible) and `["placebo_<name>"]` per treatment otherwise |
| `_generate_random_placebo(data, treatment_name, type_dict)` | Generates a random `pd.Series` for a single treatment column, respecting its dtype (float / bool / int / category) |

`_refute_once` now iterates over treatments, creating one placebo column per treatment.  
`refute_placebo_treatment` sets `identified_estimand.treatment_variable` to the full list of placebo column names.

Single-treatment behavior is unchanged (column is still named `"placebo"`).

## Tests

Two new parametrised tests in `tests/causal_refuters/test_placebo_refuter.py`:

```
test_placebo_refuter_multiple_treatments[permute]
test_placebo_refuter_multiple_treatments[Random Data]
```

Both use a 3-treatment linear dataset and verify that the refuter completes without error and that the placebo effect is smaller than the original estimate.
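
The idea behind these tests can be illustrated without DoWhy. The following is a minimal sketch under stated assumptions, not the PR's test code: it builds a synthetic 3-treatment linear dataset, estimates the effects with ordinary least squares via NumPy, then re-estimates after permuting the treatment rows. The coefficients, sample size, seed, and the function name `fit_linear_effects` are all illustrative.

```python
import numpy as np


def fit_linear_effects(treatments: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Return the OLS coefficient of each treatment column (intercept dropped)."""
    design = np.column_stack([np.ones(len(outcome)), treatments])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1:]


rng = np.random.default_rng(0)
n = 2000
true_effects = np.array([2.0, 3.0, 4.0])  # assumed values, for illustration only

treatments = rng.normal(size=(n, 3))
outcome = treatments @ true_effects + rng.normal(size=n)

original_estimate = fit_linear_effects(treatments, outcome)

# Permuting the rows keeps each placebo column's marginal distribution
# but breaks its link to the outcome.
placebo_treatments = treatments[rng.permutation(n)]
placebo_estimate = fit_linear_effects(placebo_treatments, outcome)

print("original:", original_estimate.round(2))
print("placebo: ", placebo_estimate.round(2))
```

If the placebo coefficients were not much closer to zero than the original ones, that would suggest the original estimate is picking up something other than the treatments.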

## Test Status

All 7 placebo refuter tests pass (5 pre-existing + 2 new). Pre-existing test failures in `test_dummy_outcome_refuter.py` ("read-only array") and econml tests are unrelated to this change and were present on `main` before this PR.

With multiple treatments, _refute_once was assigning a 2-D array as a
single 'placebo' column, raising:

  ValueError: Wrong number of items passed N, placement implies 1

Root cause: the single-column assign(placebo=...) pattern cannot accept
a 2-D array or multi-column DataFrame.

Fix:
* Add _get_placebo_names(): returns ['placebo'] for one treatment
  (backward-compatible) and ['placebo_<name>'] per treatment otherwise.
* Extract _generate_random_placebo(): generates a per-treatment random
  Series respecting the original dtype (float/bool/int/category).
* _refute_once now iterates over treatments, creating one placebo column
  per treatment using the above helpers.
* refute_placebo_treatment sets identified_estimand.treatment_variable
  to the full list of placebo column names.
* Add two parametrized tests covering 3-treatment permute and random
  data cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR fixes placebo_treatment_refuter failing on multi-treatment models by generating one placebo column per treatment (instead of trying to assign a 2-D array into a single "placebo" column), addressing issue #251.

Changes:

  • Add helpers to derive placebo column names for single vs multi-treatment cases.
  • Update placebo generation to create one placebo column per treatment for both permute and "Random Data" modes.
  • Add regression tests covering multi-treatment placebo refutation for both placebo types.
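
As a rough illustration of the first two changes, the sketch below mirrors the naming rule from the PR description (`"placebo"` for one treatment, `"placebo_<name>"` otherwise) and writes one permuted column per treatment. The function names `placebo_names_for` and `add_permuted_placebos` are hypothetical stand-ins, not the PR's helpers, and the toy DataFrame is made up. It uses numeric columns only, so converting each column to a NumPy array is harmless here.

```python
from typing import List

import numpy as np
import pandas as pd


def placebo_names_for(treatment_names: List[str]) -> List[str]:
    """Hypothetical helper: one placebo column name per treatment."""
    if len(treatment_names) == 1:
        return ["placebo"]  # single-treatment column name stays unchanged
    return [f"placebo_{name}" for name in treatment_names]


def add_permuted_placebos(
    data: pd.DataFrame, treatment_names: List[str], rng: np.random.Generator
) -> pd.DataFrame:
    """Return a copy of `data` with one row-permuted placebo column per treatment."""
    permuted_idx = rng.permutation(len(data))
    new_columns = {
        pname: data[tname].to_numpy()[permuted_idx]  # 1-D array per column
        for tname, pname in zip(treatment_names, placebo_names_for(treatment_names))
    }
    return data.assign(**new_columns)


data = pd.DataFrame({"t1": [0.1, 0.2, 0.3, 0.4], "t2": [1, 0, 1, 0], "y": [1.0, 2.0, 3.0, 4.0]})

# The failing pattern: a 2-D value passed as a single column.
try:
    data.assign(placebo=data[["t1", "t2"]].to_numpy())
except ValueError as err:
    print("single-column assign failed:", err)

multi = add_permuted_placebos(data, ["t1", "t2"], np.random.default_rng(0))
single = add_permuted_placebos(data, ["t1"], np.random.default_rng(0))
print(list(multi.columns))
print(list(single.columns))
```

Because every placebo column is indexed with the same `permuted_idx`, the treatments are shuffled jointly and their relationship to each other is kept.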

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| `dowhy/causal_refuters/placebo_treatment_refuter.py` | Introduces placebo naming + per-treatment placebo generation and updates estimand treatment variables accordingly. |
| `tests/causal_refuters/test_placebo_refuter.py` | Adds a parametrized regression test to ensure multi-treatment placebo refutation runs and yields a small placebo effect. |

Comment on lines +156 to +160
permuted_values = data[treatment_names].iloc[permuted_idx].values
new_data = data.copy()
for i, pname in enumerate(placebo_names):
    col = permuted_values[:, i] if len(treatment_names) > 1 else permuted_values.ravel()
    new_data[pname] = col

Copilot AI Apr 19, 2026

In the PERMUTE path, data[treatment_names].iloc[permuted_idx].values converts the permuted columns to a NumPy array, which can silently coerce dtypes (e.g., categorical -> object, mixed dtypes -> object). That can change estimator behavior compared to the original treatment columns. Consider permuting each treatment column as a pandas Series (preserving dtype) and then resetting its index to data.index before assignment so the permutation applies by position without index alignment undoing it.
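
One way the suggestion above could look is sketched here. This is an assumption-laden illustration, not code from the PR: the helper name `permute_column_preserving_dtype`, the toy columns, and the fixed permutation are all made up for the example.

```python
import numpy as np
import pandas as pd


def permute_column_preserving_dtype(column: pd.Series, permuted_idx: np.ndarray) -> pd.Series:
    """Shuffle `column` by position and re-label it with the original index."""
    shuffled = column.iloc[permuted_idx]
    # Without re-labelling, assignment would align on index labels and
    # silently restore the original order.
    return shuffled.set_axis(column.index)


data = pd.DataFrame(
    {
        "dose": pd.Categorical(
            ["low", "high", "mid", "low"], categories=["low", "mid", "high"], ordered=True
        ),
        "exposure": [0.5, 1.5, 2.5, 3.5],
    }
)
permuted_idx = np.array([2, 0, 3, 1])  # fixed permutation for the example

# Going through a single NumPy array forces one common dtype for both columns.
as_array = data[["dose", "exposure"]].iloc[permuted_idx].values
print(as_array.dtype)

new_data = data.copy()
for name in ["dose", "exposure"]:
    new_data[f"placebo_{name}"] = permute_column_preserving_dtype(data[name], permuted_idx)
print(new_data.dtypes)
```

Handling each column as a `Series` keeps the ordered categorical and the float column as they were, at the cost of a short loop instead of one array slice.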

Comment on lines +97 to +124
def _generate_random_placebo(data: pd.DataFrame, treatment_name: str, type_dict: Dict) -> pd.Series:
    """Generate a single random placebo column matching the dtype of *treatment_name*."""
    dtype_name = type_dict[treatment_name].name
    n = data.shape[0]
    if "float" in dtype_name:
        logger.info(
            "Using a Normal Distribution with Mean:{} and Variance:{}".format(
                DEFAULT_MEAN_OF_NORMAL,
                DEFAULT_STD_DEV_OF_NORMAL,
            )
        )
        return pd.Series(
            np.random.randn(n) * DEFAULT_STD_DEV_OF_NORMAL + DEFAULT_MEAN_OF_NORMAL,
            index=data.index,
        )
    elif "bool" in dtype_name:
        logger.info(
            "Using a Binomial Distribution with {} trials and {} probability of success".format(
                DEFAULT_NUMBER_OF_TRIALS,
                DEFAULT_PROBABILITY_OF_BINOMIAL,
            )
        )
        return pd.Series(
            np.random.binomial(DEFAULT_NUMBER_OF_TRIALS, DEFAULT_PROBABILITY_OF_BINOMIAL, n).astype(bool),
            index=data.index,
        )
    elif "int" in dtype_name:
        logger.info(

Copilot AI Apr 19, 2026

_generate_random_placebo infers dtype via substring checks on type_dict[treatment_name].name (e.g., "int" in dtype_name). This is brittle for pandas extension dtypes like Int64/Float64 (case mismatch) and can lead to the new ValueError even when the column is a valid numeric type. Prefer using pandas.api.types helpers (is_float_dtype/is_integer_dtype/is_bool_dtype/is_categorical_dtype) on the actual dtype/Series for a robust check.
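
A dtype check along those lines might look like the sketch below. It is illustrative only: `classify_treatment_dtype` is a hypothetical name, and the returned labels simply echo the four branches named in the PR description. It uses `isinstance(dtype, pd.CategoricalDtype)` in place of `is_categorical_dtype`, which, to the best of my knowledge, is deprecated in recent pandas releases.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_treatment_dtype(column: pd.Series) -> str:
    """Hypothetical helper: map a treatment column to a placebo-generation branch."""
    dtype = column.dtype
    # Check categoricals first: is_bool_dtype can also report True for a
    # categorical whose categories are boolean.
    if isinstance(dtype, pd.CategoricalDtype):
        return "category"
    if ptypes.is_bool_dtype(dtype):
        return "bool"
    if ptypes.is_integer_dtype(dtype):
        return "int"
    if ptypes.is_float_dtype(dtype):
        return "float"
    raise ValueError(f"Unsupported treatment dtype: {dtype}")


nullable_int = pd.Series([1, 2, None], dtype="Int64")
print("int" in nullable_int.dtype.name)        # the substring check misses "Int64"
print(classify_treatment_dtype(nullable_int))  # the dtype helper still recognises it
```

The ordering of the checks matters only for the categorical case; the remaining three helpers do not overlap for ordinary NumPy and nullable dtypes.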

Comment on lines +134 to +136
categories = data[treatment_name].unique()
logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
return pd.Series(np.random.choice(categories, size=n), index=data.index).astype("category")

Copilot AI Apr 19, 2026

For categorical treatments, sampling from data[treatment_name].unique() and then .astype('category') can drop the original category set/order and produce a different categorical dtype than the source column. It’s safer to preserve the original categories (e.g., use .cat.categories / original dtype) when generating the placebo so downstream estimators that rely on consistent categories don’t see a dtype change.

Suggested change
categories = data[treatment_name].unique()
logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
return pd.Series(np.random.choice(categories, size=n), index=data.index).astype("category")
treatment = data[treatment_name]
categories = treatment.cat.categories
logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
return pd.Series(
    pd.Categorical(
        np.random.choice(categories, size=n),
        categories=categories,
        ordered=treatment.cat.ordered,
    ),
    index=data.index,
)

Development

Successfully merging this pull request may close these issues.

Placebo refuter with multiple treatments