[Repo Assist] fix: support multiple treatments in PlaceboTreatmentRefuter (closes #251) by github-actions[bot] · Pull Request #1467 · py-why/dowhy

github-actions · 2026-04-19T13:19:43Z

🤖 Created by Repo Assist, an automated AI assistant.

## Summary

Fixes `ValueError: Wrong number of items passed N, placement implies 1` when `placebo_treatment_refuter` is used with a multi-treatment `CausalModel` (issue #251, open since 2021).

Closes #251

## Root Cause

`_refute_once` always assigned a single `placebo` column, regardless of how many treatments were in the model:

```python
new_treatment = data[treatment_names].iloc[permuted_idx].values  # shape (n, 3) for 3 treatments
new_data = data.assign(placebo=new_treatment)  # ← raises ValueError
```

`pandas.DataFrame.assign` cannot accept a 2-D array as a single-column value. The same issue affected the `DEFAULT` (random data) path, which only generated data for `treatment_names[0]`.

## Fix

Two small helpers + minimal surgical changes:

| Helper | Purpose |
|--------|---------|
| `_get_placebo_names(treatment_names)` | Returns `["placebo"]` for 1 treatment (backward-compatible) and `["placebo_<name>"]` per treatment otherwise |
| `_generate_random_placebo(data, treatment_name, type_dict)` | Generates a random `pd.Series` for a single treatment column, respecting its dtype (float / bool / int / category) |

`_refute_once` now iterates over treatments, creating one placebo column per treatment.  
`refute_placebo_treatment` sets `identified_estimand.treatment_variable` to the full list of placebo column names.

Single-treatment behavior is unchanged (column is still named `"placebo"`).

## Tests

Two new parametrised tests in `tests/causal_refuters/test_placebo_refuter.py`:

```
test_placebo_refuter_multiple_treatments[permute]
test_placebo_refuter_multiple_treatments[Random Data]
```

Both use a 3-treatment linear dataset and verify that the refuter completes without error and that the placebo effect is smaller than the original estimate.

## Test Status

All 7 placebo refuter tests pass (5 pre-existing + 2 new). Pre-existing test failures in `test_dummy_outcome_refuter.py` ("read-only array") and the econml tests are unrelated to this change and were present on `main` before this PR.


With multiple treatments, _refute_once was assigning a 2-D array as a single 'placebo' column, raising:

ValueError: Wrong number of items passed N, placement implies 1

Root cause: the single-column assign(placebo=...) pattern cannot accept a 2-D array or multi-column DataFrame.

Fix:
* Add _get_placebo_names(): returns ['placebo'] for one treatment (backward-compatible) and ['placebo_<name>'] per treatment otherwise.
* Extract _generate_random_placebo(): generates a per-treatment random Series respecting the original dtype (float/bool/int/category).
* _refute_once now iterates over treatments, creating one placebo column per treatment using the above helpers.
* refute_placebo_treatment sets identified_estimand.treatment_variable to the full list of placebo column names.
* Add two parametrized tests covering 3-treatment permute and random data cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Copilot

Pull request overview

This PR fixes placebo_treatment_refuter failing on multi-treatment models by generating one placebo column per treatment (instead of trying to assign a 2-D array into a single "placebo" column), addressing issue #251.

Changes:

Add helpers to derive placebo column names for single vs multi-treatment cases.
Update placebo generation to create one placebo column per treatment for both permute and "Random Data" modes.
Add regression tests covering multi-treatment placebo refutation for both placebo types.
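The failure mode and the per-treatment approach summarized above can be reproduced outside dowhy with a few lines of pandas. This is a minimal sketch, not the PR's code: `placebo_names` is a hypothetical stand-in that only mirrors the naming rule described in the PR, and the column names `t1`, `t2`, `y` are made up for illustration.

```python
import numpy as np
import pandas as pd


def placebo_names(treatment_names):
    """Hypothetical stand-in for the naming rule described in the PR."""
    if len(treatment_names) == 1:
        return ["placebo"]  # backward-compatible single-treatment name
    return [f"placebo_{name}" for name in treatment_names]


rng = np.random.default_rng(0)
data = pd.DataFrame(
    {"t1": rng.normal(size=5), "t2": rng.normal(size=5), "y": rng.normal(size=5)}
)
treatment_names = ["t1", "t2"]
permuted_idx = rng.permutation(len(data))
permuted = data[treatment_names].iloc[permuted_idx].to_numpy()  # shape (5, 2)

# Old pattern: one column name, 2-D values. pandas rejects this.
try:
    data.assign(placebo=permuted)
    raised = False
except ValueError:
    raised = True

# Per-treatment pattern: one 1-D column per placebo name.
names = placebo_names(treatment_names)
new_data = data.assign(**{name: permuted[:, i] for i, name in enumerate(names)})
```

The exact wording of the `ValueError` depends on the pandas version, so the sketch only checks that one is raised.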

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| `dowhy/causal_refuters/placebo_treatment_refuter.py` | Introduces placebo naming + per-treatment placebo generation and updates estimand treatment variables accordingly. |
| `tests/causal_refuters/test_placebo_refuter.py` | Adds parametrized regression test to ensure multi-treatment placebo refutation runs and yields a small placebo effect. |


Copilot · 2026-04-19T16:33:43Z

+        permuted_values = data[treatment_names].iloc[permuted_idx].values
+        new_data = data.copy()
+        for i, pname in enumerate(placebo_names):
+            col = permuted_values[:, i] if len(treatment_names) > 1 else permuted_values.ravel()
+            new_data[pname] = col


In the PERMUTE path, data[treatment_names].iloc[permuted_idx].values converts the permuted columns to a NumPy array, which can silently coerce dtypes (e.g., categorical -> object, mixed dtypes -> object). That can change estimator behavior compared to the original treatment columns. Consider permuting each treatment column as a pandas Series (preserving dtype) and then resetting its index to data.index before assignment so the permutation applies by position without index alignment undoing it.
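One way to read this suggestion is sketched below. It is an illustration under assumptions, not the code in the PR: the frame, the column names `dose` and `arm`, and the fixed permutation are invented, and the `placebo_<name>` naming simply follows the rule described in the PR summary.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "dose": [0.1, 0.2, 0.3, 0.4],
        "arm": pd.Categorical(["a", "b", "a", "c"], categories=["a", "b", "c"]),
    },
    index=[10, 11, 12, 13],
)
treatment_names = ["dose", "arm"]
permuted_idx = np.array([2, 0, 3, 1])  # fixed permutation for illustration

# Going through .values on mixed-dtype columns yields one object array.
as_array = data[treatment_names].iloc[permuted_idx].values

new_data = data.copy()
for name in treatment_names:
    permuted = data[name].iloc[permuted_idx]
    # Re-label the shuffled rows with the original index so the assignment is
    # positional; otherwise pandas aligns on labels and undoes the shuffle.
    permuted = permuted.set_axis(data.index)
    new_data[f"placebo_{name}"] = permuted
```

Each placebo column keeps the dtype of its source column, including the categorical one.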

Copilot · 2026-04-19T16:33:44Z

+def _generate_random_placebo(data: pd.DataFrame, treatment_name: str, type_dict: Dict) -> pd.Series:
+    """Generate a single random placebo column matching the dtype of *treatment_name*."""
+    dtype_name = type_dict[treatment_name].name
+    n = data.shape[0]
+    if "float" in dtype_name:
+        logger.info(
+            "Using a Normal Distribution with Mean:{} and Variance:{}".format(
+                DEFAULT_MEAN_OF_NORMAL,
+                DEFAULT_STD_DEV_OF_NORMAL,
+            )
+        )
+        return pd.Series(
+            np.random.randn(n) * DEFAULT_STD_DEV_OF_NORMAL + DEFAULT_MEAN_OF_NORMAL,
+            index=data.index,
+        )
+    elif "bool" in dtype_name:
+        logger.info(
+            "Using a Binomial Distribution with {} trials and {} probability of success".format(
+                DEFAULT_NUMBER_OF_TRIALS,
+                DEFAULT_PROBABILITY_OF_BINOMIAL,
+            )
+        )
+        return pd.Series(
+            np.random.binomial(DEFAULT_NUMBER_OF_TRIALS, DEFAULT_PROBABILITY_OF_BINOMIAL, n).astype(bool),
+            index=data.index,
+        )
+    elif "int" in dtype_name:
+        logger.info(


_generate_random_placebo infers dtype via substring checks on type_dict[treatment_name].name (e.g., "int" in dtype_name). This is brittle for pandas extension dtypes like Int64/Float64 (case mismatch) and can lead to the new ValueError even when the column is a valid numeric type. Prefer using pandas.api.types helpers (is_float_dtype/is_integer_dtype/is_bool_dtype/is_categorical_dtype) on the actual dtype/Series for a robust check.
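A rough sketch of the dtype dispatch this comment points toward follows. `placebo_kind` is a hypothetical helper, not part of dowhy. It uses `isinstance(dtype, pd.CategoricalDtype)` for the categorical case because `is_categorical_dtype` is deprecated in recent pandas releases.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def placebo_kind(series: pd.Series) -> str:
    """Hypothetical helper: classify a treatment column for placebo generation."""
    dtype = series.dtype
    if is_bool_dtype(dtype):
        return "bool"
    if is_integer_dtype(dtype):
        return "int"  # covers NumPy ints and the nullable Int64 family
    if is_float_dtype(dtype):
        return "float"  # covers NumPy floats and the nullable Float64 family
    if isinstance(dtype, pd.CategoricalDtype):
        return "category"
    raise ValueError(f"Unsupported treatment dtype: {dtype}")


nullable_int = pd.Series([1, 2, 3], dtype="Int64")
# The substring check misses this column because the dtype name is "Int64".
substring_match = "int" in nullable_int.dtype.name
kind = placebo_kind(nullable_int)
```

With this shape, the nullable extension dtypes take the same branch as their NumPy counterparts instead of reaching the error path.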

Copilot · 2026-04-19T16:33:44Z

+        categories = data[treatment_name].unique()
+        logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
+        return pd.Series(np.random.choice(categories, size=n), index=data.index).astype("category")


For categorical treatments, sampling from data[treatment_name].unique() and then .astype('category') can drop the original category set/order and produce a different categorical dtype than the source column. It’s safer to preserve the original categories (e.g., use .cat.categories / original dtype) when generating the placebo so downstream estimators that rely on consistent categories don’t see a dtype change.

Suggested change

-        categories = data[treatment_name].unique()
-        logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
-        return pd.Series(np.random.choice(categories, size=n), index=data.index).astype("category")
+        treatment = data[treatment_name]
+        categories = treatment.cat.categories
+        logger.info("Using a Discrete Uniform Distribution with the following categories:{}".format(categories))
+        return pd.Series(
+            pd.Categorical(
+                np.random.choice(categories, size=n),
+                categories=categories,
+                ordered=treatment.cat.ordered,
+            ),
+            index=data.index,
+        )
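The dtype drift described in this comment can be seen on a toy column. The sketch below uses invented data and, as an alternative to the suggested change, draws random codes and rebuilds the column with `pd.Categorical.from_codes`. It is an illustration only, not what the PR does.

```python
import numpy as np
import pandas as pd

treatment = pd.Series(
    pd.Categorical(["low", "high", "low"], categories=["low", "mid", "high"], ordered=True)
)
rng = np.random.default_rng(0)
n = len(treatment)

# Sampling observed values and re-casting: the unused "mid" category and the
# ordering are both lost, so the dtype no longer matches the source column.
observed = np.asarray(treatment.unique())
lossy = pd.Series(rng.choice(observed, size=n), index=treatment.index).astype("category")

# Sampling integer codes against the original dtype keeps categories and order.
codes = rng.integers(len(treatment.cat.categories), size=n)
placebo = pd.Series(
    pd.Categorical.from_codes(codes, dtype=treatment.dtype), index=treatment.index
)
```

Sampling codes also gives every declared category a chance of appearing, including ones that do not occur in the data.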

github-actions bot added automation repo-assist labels Apr 19, 2026

emrekiciman requested a review from Copilot April 19, 2026 16:30

Copilot started reviewing on behalf of emrekiciman April 19, 2026 16:30

Copilot AI reviewed Apr 19, 2026

github-actions bot mentioned this pull request Apr 20, 2026

[Repo Assist] Monthly Activity 2026-04 #1433

github-actions[bot] wants to merge 1 commit into `main` from `repo-assist/fix-issue-251-placebo-refuter-multi-treatment-8cbf89c6951064fd`
