[Repo Assist] fix: update second-stage model estimand when pre-instantiated in TwoStageRegression by github-actions[bot] · Pull Request #1462 · py-why/dowhy

github-actions · 2026-04-17T13:30:15Z

🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.

## Root Cause

When a user passes a pre-instantiated `CausalEstimator` as `second_stage_model` in `TwoStageRegressionEstimator`, the instance is used as-is. That instance holds a `_target_estimand` with `identifier_method="mediation"` (from the NIE estimand) and `default_backdoor_id=None`.

During `fit()`, the second-stage model calls `get_backdoor_variables()`:

```python
elif self.backdoor_variables is not None and len(self.backdoor_variables) > 0:
    return self.backdoor_variables[self.default_backdoor_id]  # None → KeyError!
```

Because `default_backdoor_id` is `None`, this dictionary lookup raises `KeyError: None`.
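
The failure can be reproduced without DoWhy. The sketch below uses a hypothetical stand-in class, `EstimandStub`, that mirrors only the lookup quoted above; the class name, the `"backdoor1"` key, and the variable `"W"` are illustrative assumptions, not the library's actual objects.

```python
class EstimandStub:
    """Hypothetical stand-in that mirrors only the lookup quoted above."""

    def __init__(self, backdoor_variables, default_backdoor_id=None):
        self.backdoor_variables = backdoor_variables
        self.default_backdoor_id = default_backdoor_id

    def get_backdoor_variables(self):
        if self.backdoor_variables is not None and len(self.backdoor_variables) > 0:
            # Indexing a dict with None raises KeyError unless None is a key.
            return self.backdoor_variables[self.default_backdoor_id]
        return []


# Mediation-like estimand: backdoor sets exist, but no default id is selected.
mediation_like = EstimandStub({"backdoor1": ["W"]}, default_backdoor_id=None)
try:
    mediation_like.get_backdoor_variables()
    error_repr = None
except KeyError as exc:
    error_repr = repr(exc)  # the missing key is None

# Backdoor-like estimand: the default id points at an existing entry.
backdoor_like = EstimandStub({"backdoor1": ["W"]}, default_backdoor_id="backdoor1")
backdoor_vars = backdoor_like.get_backdoor_variables()
```

The second call succeeds only because its default id names an existing entry, which is the state the backdoor-modified estimand is described as providing.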

The path where a class (not an instance) is passed already worked correctly: `TwoStageRegressionEstimator` constructed the model using `modified_target_estimand` (which has `identifier_method="backdoor"` and the correct `backdoor_variables`). Only the pre-instantiated path was missing this update.

## Fix

In `TwoStageRegressionEstimator.__init__`, when `second_stage_model` is a pre-instantiated `CausalEstimator`, explicitly update its `_target_estimand` to `modified_target_estimand`:

```python
if isinstance(second_stage_model, CausalEstimator):
    self._second_stage_model = second_stage_model
    self._second_stage_model._target_estimand = modified_target_estimand
```

## Trade-offs

- This mutates the user-provided estimator instance's `_target_estimand`. This is acceptable: `TwoStageRegressionEstimator` already mutates properties of `_second_stage_model` during `fit()` (e.g. `treatment_variable`). A user passing a pre-instantiated estimator to `TwoStageRegressionEstimator` implicitly delegates its lifecycle management.
- The behaviour when passing a class (not an instance) is unchanged.
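
The mutation described in the first bullet is visible to the caller, because the wrapper and the caller hold the same object. The sketch below illustrates this with hypothetical stand-in classes (`EstimatorStub`, `TwoStageStub`) and plain strings in place of real estimands; it also shows that a caller who wants to keep the original instance untouched could pass a shallow copy. The copy is an assumption for illustration, not something this PR does.

```python
import copy


class EstimatorStub:
    """Hypothetical stand-in for a pre-instantiated estimator."""

    def __init__(self, estimand):
        self._target_estimand = estimand


class TwoStageStub:
    """Hypothetical stand-in that rebinds the estimand on the instance it is given."""

    def __init__(self, second_stage_model, modified_target_estimand):
        self._second_stage_model = second_stage_model
        self._second_stage_model._target_estimand = modified_target_estimand


# Passing the instance directly: the caller's object is mutated.
shared = EstimatorStub("mediation")
TwoStageStub(shared, "backdoor")
shared_after = shared._target_estimand

# Passing a shallow copy: rebinding the attribute on the copy leaves the original alone.
original = EstimatorStub("mediation")
TwoStageStub(copy.copy(original), "backdoor")
original_after = original._target_estimand
```

A shallow copy is enough here only because the wrapper rebinds an attribute rather than modifying the estimand object in place.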

## Test Status

Two new regression tests added to `tests/causal_estimators/test_two_stage_regression_estimator.py`:

1. `test_nie_with_preinstantiated_second_stage_no_keyerror` — verifies that `estimate_effect` completes without `KeyError` when a pre-instantiated `GeneralizedLinearModelEstimator` is passed.
2. `test_nie_preinstantiated_second_stage_estimand_updated` — verifies that after construction, `_second_stage_model._target_estimand.identifier_method == "backdoor"`.

All 10 tests in the file pass:
```
======================= 10 passed, 78 warnings in 5.09s ========================
```

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@11c9a2c442e519ff2b427bf58679f5a525353f76

…tageRegression (closes #1335)

When second_stage_model is passed as a pre-instantiated CausalEstimator, the estimator's _target_estimand was never updated to modified_target_estimand (which has identifier_method='backdoor' and the correct backdoor_variables). Instead the original mediation estimand (with identifier_method='mediation', default_backdoor_id=None) was used, causing KeyError: None when the second-stage model called get_backdoor_variables() during fit().

Fix: explicitly update _target_estimand to modified_target_estimand when a pre-instantiated CausalEstimator is supplied.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Copilot

Pull request overview

Fixes a mediation/two-stage regression failure when second_stage_model is provided as a pre-instantiated estimator instance by ensuring its internal _target_estimand is rewritten to the backdoor-modified estimand (avoiding default_backdoor_id=None lookup issues).

Changes:

- Update `TwoStageRegressionEstimator.__init__` to overwrite `_second_stage_model._target_estimand` when a pre-instantiated estimator instance is supplied.
- Add regression tests covering the pre-instantiated `second_stage_model` path for NIE mediation.
- Minor formatting cleanup in test graph string `.replace()` calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `dowhy/causal_estimators/two_stage_regression_estimator.py` | Updates handling of pre-instantiated `second_stage_model` to use the backdoor-modified estimand. |
| `tests/causal_estimators/test_two_stage_regression_estimator.py` | Adds regression tests for the pre-instantiated second-stage estimator path and cleans up string formatting. |


Copilot · 2026-04-18T02:45:14Z

+    def test_nie_with_preinstantiated_second_stage_no_keyerror(self):
+        """Passing a pre-instantiated second_stage_model must not raise KeyError."""
+        import statsmodels.api as sm
+
+        from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator
+
+        df = _make_mediation_data()
+        model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML)
+        estimand = model.identify_effect(
+            estimand_type=EstimandType.NONPARAMETRIC_NIE,
+            proceed_when_unidentifiable=True,
+        )
+        second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian())
+        # This must not raise KeyError: None
+        estimate = model.estimate_effect(
+            identified_estimand=estimand,
+            method_name="mediation.two_stage_regression",
+            method_params={"second_stage_model": second_stage},
+        )
+        assert np.isfinite(estimate.value)
+
+    def test_nie_preinstantiated_second_stage_estimand_updated(self):
+        """The pre-instantiated second_stage_model's _target_estimand is updated to backdoor."""
+        import statsmodels.api as sm
+
+        from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator
+
+        df = _make_mediation_data()
+        model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML)
+        estimand = model.identify_effect(
+            estimand_type=EstimandType.NONPARAMETRIC_NIE,
+            proceed_when_unidentifiable=True,
+        )
+        second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian())
+        estimator = TwoStageRegressionEstimator(
+            identified_estimand=estimand,
+            second_stage_model=second_stage,
+        )
+        assert estimator._second_stage_model._target_estimand.identifier_method == "backdoor"


The new regression coverage exercises a pre-instantiated second_stage_model, but the same failure mode can occur when first_stage_model is passed as a pre-instantiated estimator instance (its _target_estimand isn’t currently rewritten to the backdoor-modified estimand). Adding a test that passes a pre-instantiated first-stage estimator (and asserts no exception / correct identifier_method) would help prevent a partial regression fix.

Copilot · 2026-04-18T02:45:14Z

        if second_stage_model is not None:
-            self._second_stage_model = (
-                second_stage_model
-                if isinstance(second_stage_model, CausalEstimator)
-                else second_stage_model(
+            if isinstance(second_stage_model, CausalEstimator):
+                self._second_stage_model = second_stage_model
+                # Update the estimand so the second-stage model uses the correct
+                # backdoor configuration rather than the original mediation estimand.
+                self._second_stage_model._target_estimand = modified_target_estimand
+            else:
+                self._second_stage_model = second_stage_model(


The pre-instantiated-model handling was fixed for second_stage_model, but first_stage_model still accepts a pre-instantiated CausalEstimator without updating its _target_estimand to the backdoor-modified estimand. If a user passes a pre-instantiated RegressionEstimator (e.g., GeneralizedLinearModelEstimator) for the first stage with a mediation estimand, RegressionEstimator.fit() will raise (identifier_method='mediation') or hit the same default_backdoor_id=None issue. Consider applying the same pattern used below for second_stage_model to the first_stage_model branch as well (i.e., set self._first_stage_model._target_estimand to the first-stage modified_target_estimand).
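
One way to apply the same pattern to both stages is a small shared helper. The sketch below is an assumption about how that could look, written against hypothetical stand-ins (`CausalEstimatorStub`, `bind_estimand`, `TwoStageStub`) with strings in place of real estimands; it is not the code in this PR or in DoWhy.

```python
class CausalEstimatorStub:
    """Hypothetical stand-in for a CausalEstimator base class."""

    def __init__(self, identified_estimand):
        self._target_estimand = identified_estimand


def bind_estimand(model, estimand):
    """Hypothetical helper: accept an instance or a class, return a bound instance."""
    if isinstance(model, CausalEstimatorStub):
        # Pre-instantiated: reuse the instance, but rebind its estimand.
        model._target_estimand = estimand
        return model
    # A class was passed: construct it with the modified estimand directly.
    return model(identified_estimand=estimand)


class TwoStageStub:
    """Hypothetical container that treats both stages the same way."""

    def __init__(self, first_stage_model, second_stage_model, first_estimand, second_estimand):
        self._first_stage_model = bind_estimand(first_stage_model, first_estimand)
        self._second_stage_model = bind_estimand(second_stage_model, second_estimand)


two_stage = TwoStageStub(
    first_stage_model=CausalEstimatorStub("mediation"),  # pre-instantiated
    second_stage_model=CausalEstimatorStub,              # class
    first_estimand="backdoor-first-stage",
    second_estimand="backdoor-second-stage",
)
```

Routing both stages through one helper means a regression test for either stage exercises the same branch, which addresses the partial-fix concern raised above.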

Signed-off-by: Emre Kiciman <emrek@microsoft.com>

github-actions bot added the automation, bug, and repo-assist labels Apr 17, 2026

github-actions bot mentioned this pull request Apr 18, 2026 in [Repo Assist] Monthly Activity 2026-04 #1433

emrekiciman marked this pull request as ready for review April 18, 2026 02:37

emrekiciman requested a review from Copilot April 18, 2026 02:37

Copilot started reviewing on behalf of emrekiciman April 18, 2026 02:41

Copilot AI reviewed Apr 18, 2026

fix formatting

5ee3d4f

Signed-off-by: Emre Kiciman <emrek@microsoft.com>

github-actions[bot] wants to merge 2 commits into main from repo-assist/fix-issue-1335-twostage-preinstantiated-estimand-fd3a39553a65d491


2 participants
