Skip to content

[Repo Assist] fix: update second-stage model estimand when pre-instantiated in TwoStageRegression#1462

Open
github-actions[bot] wants to merge 2 commits intomainfrom
repo-assist/fix-issue-1335-twostage-preinstantiated-estimand-fd3a39553a65d491
Open

[Repo Assist] fix: update second-stage model estimand when pre-instantiated in TwoStageRegression#1462
github-actions[bot] wants to merge 2 commits intomainfrom
repo-assist/fix-issue-1335-twostage-preinstantiated-estimand-fd3a39553a65d491

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.

Closes #1335

Root Cause

When a user passes a pre-instantiated CausalEstimator as second_stage_model in TwoStageRegressionEstimator, the instance is used as-is. The pre-instantiated estimator holds a _target_estimand with identifier_method="mediation" (from the NIE estimand) and default_backdoor_id=None.

During fit(), the second-stage model calls get_backdoor_variables():

elif self.backdoor_variables is not None and len(self.backdoor_variables) > 0:
    return self.backdoor_variables[self.default_backdoor_id]  # None → KeyError!

default_backdoor_id=None caused KeyError: None.

The fix when a class (not an instance) is passed already worked correctly — TwoStageRegressionEstimator constructed the model using modified_target_estimand (which has identifier_method="backdoor" and the correct backdoor_variables). Only the pre-instantiated path was missing this update.

Fix

In TwoStageRegressionEstimator.__init__, when second_stage_model is a pre-instantiated CausalEstimator, explicitly update its _target_estimand to modified_target_estimand:

if isinstance(second_stage_model, CausalEstimator):
    self._second_stage_model = second_stage_model
    self._second_stage_model._target_estimand = modified_target_estimand
```

## Trade-offs

- This mutates the user-provided estimator instance's `_target_estimand`. This is acceptable: `TwoStageRegressionEstimator` already mutates properties of `_second_stage_model` during `fit()` (e.g. `treatment_variable`). A user passing a pre-instantiated estimator to `TwoStageRegressionEstimator` implicitly delegates its lifecycle management.
- The behaviour when passing a class (not an instance) is unchanged.

## Test Status

Two new regression tests added to `tests/causal_estimators/test_two_stage_regression_estimator.py`:

1. `test_nie_with_preinstantiated_second_stage_no_keyerror`verifies that `estimate_effect` completes without `KeyError` when a pre-instantiated `GeneralizedLinearModelEstimator` is passed.
2. `test_nie_preinstantiated_second_stage_estimand_updated`verifies that after construction, `_second_stage_model._target_estimand.identifier_method == "backdoor"`.

All 10 tests in the file pass:
```
======================= 10 passed, 78 warnings in 5.09s ========================

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@11c9a2c442e519ff2b427bf58679f5a525353f76

…tageRegression (closes #1335)

When second_stage_model is passed as a pre-instantiated CausalEstimator, the
estimator's _target_estimand was never updated to modified_target_estimand (which
has identifier_method='backdoor' and the correct backdoor_variables). Instead the
original mediation estimand (with identifier_method='mediation',
default_backdoor_id=None) was used, causing KeyError: None when the second-stage
model called get_backdoor_variables() during fit().

Fix: explicitly update _target_estimand to modified_target_estimand when a
pre-instantiated CausalEstimator is supplied.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot added automation bug Something isn't working repo-assist labels Apr 17, 2026
@emrekiciman emrekiciman marked this pull request as ready for review April 18, 2026 02:37
@emrekiciman emrekiciman requested a review from Copilot April 18, 2026 02:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a mediation/two-stage regression failure when second_stage_model is provided as a pre-instantiated estimator instance by ensuring its internal _target_estimand is rewritten to the backdoor-modified estimand (avoiding default_backdoor_id=None lookup issues).

Changes:

  • Update TwoStageRegressionEstimator.__init__ to overwrite _second_stage_model._target_estimand when a pre-instantiated estimator instance is supplied.
  • Add regression tests covering the pre-instantiated second_stage_model path for NIE mediation.
  • Minor formatting cleanup in test graph string .replace() calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
dowhy/causal_estimators/two_stage_regression_estimator.py Updates handling of pre-instantiated second_stage_model to use the backdoor-modified estimand.
tests/causal_estimators/test_two_stage_regression_estimator.py Adds regression tests for the pre-instantiated second-stage estimator path and cleans up string formatting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +325 to +363
def test_nie_with_preinstantiated_second_stage_no_keyerror(self):
"""Passing a pre-instantiated second_stage_model must not raise KeyError."""
import statsmodels.api as sm

from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator

df = _make_mediation_data()
model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML)
estimand = model.identify_effect(
estimand_type=EstimandType.NONPARAMETRIC_NIE,
proceed_when_unidentifiable=True,
)
second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian())
# This must not raise KeyError: None
estimate = model.estimate_effect(
identified_estimand=estimand,
method_name="mediation.two_stage_regression",
method_params={"second_stage_model": second_stage},
)
assert np.isfinite(estimate.value)

def test_nie_preinstantiated_second_stage_estimand_updated(self):
"""The pre-instantiated second_stage_model's _target_estimand is updated to backdoor."""
import statsmodels.api as sm

from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator

df = _make_mediation_data()
model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML)
estimand = model.identify_effect(
estimand_type=EstimandType.NONPARAMETRIC_NIE,
proceed_when_unidentifiable=True,
)
second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian())
estimator = TwoStageRegressionEstimator(
identified_estimand=estimand,
second_stage_model=second_stage,
)
assert estimator._second_stage_model._target_estimand.identifier_method == "backdoor"
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new regression coverage exercises a pre-instantiated second_stage_model, but the same failure mode can occur when first_stage_model is passed as a pre-instantiated estimator instance (its _target_estimand isn’t currently rewritten to the backdoor-modified estimand). Adding a test that passes a pre-instantiated first-stage estimator (and asserts no exception / correct identifier_method) would help prevent a partial regression fix.

Copilot uses AI. Check for mistakes.
Comment on lines 118 to +125
if second_stage_model is not None:
self._second_stage_model = (
second_stage_model
if isinstance(second_stage_model, CausalEstimator)
else second_stage_model(
if isinstance(second_stage_model, CausalEstimator):
self._second_stage_model = second_stage_model
# Update the estimand so the second-stage model uses the correct
# backdoor configuration rather than the original mediation estimand.
self._second_stage_model._target_estimand = modified_target_estimand
else:
self._second_stage_model = second_stage_model(
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-instantiated-model handling was fixed for second_stage_model, but first_stage_model still accepts a pre-instantiated CausalEstimator without updating its _target_estimand to the backdoor-modified estimand. If a user passes a pre-instantiated RegressionEstimator (e.g., GeneralizedLinearModelEstimator) for the first stage with a mediation estimand, RegressionEstimator.fit() will raise (identifier_method='mediation') or hit the same default_backdoor_id=None issue. Consider applying the same pattern used below for second_stage_model to the first_stage_model branch as well (i.e., set self._first_stage_model._target_estimand to the first-stage modified_target_estimand).

Copilot uses AI. Check for mistakes.
Signed-off-by: Emre Kiciman <emrek@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automation bug Something isn't working repo-assist

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with Logistic Regression in Mediation Analysis

2 participants