[Repo Assist] fix: update second-stage model estimand when pre-instantiated in TwoStageRegression#1462
Conversation
…tageRegression (closes #1335) When second_stage_model is passed as a pre-instantiated CausalEstimator, the estimator's _target_estimand was never updated to modified_target_estimand (which has identifier_method='backdoor' and the correct backdoor_variables). Instead the original mediation estimand (with identifier_method='mediation', default_backdoor_id=None) was used, causing KeyError: None when the second-stage model called get_backdoor_variables() during fit(). Fix: explicitly update _target_estimand to modified_target_estimand when a pre-instantiated CausalEstimator is supplied. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a mediation/two-stage regression failure when second_stage_model is provided as a pre-instantiated estimator instance by ensuring its internal _target_estimand is rewritten to the backdoor-modified estimand (avoiding default_backdoor_id=None lookup issues).
Changes:
- Update
TwoStageRegressionEstimator.__init__to overwrite_second_stage_model._target_estimandwhen a pre-instantiated estimator instance is supplied. - Add regression tests covering the pre-instantiated
second_stage_modelpath for NIE mediation. - Minor formatting cleanup in test graph string
.replace()calls.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
dowhy/causal_estimators/two_stage_regression_estimator.py |
Updates handling of pre-instantiated second_stage_model to use the backdoor-modified estimand. |
tests/causal_estimators/test_two_stage_regression_estimator.py |
Adds regression tests for the pre-instantiated second-stage estimator path and cleans up string formatting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_nie_with_preinstantiated_second_stage_no_keyerror(self): | ||
| """Passing a pre-instantiated second_stage_model must not raise KeyError.""" | ||
| import statsmodels.api as sm | ||
|
|
||
| from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator | ||
|
|
||
| df = _make_mediation_data() | ||
| model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML) | ||
| estimand = model.identify_effect( | ||
| estimand_type=EstimandType.NONPARAMETRIC_NIE, | ||
| proceed_when_unidentifiable=True, | ||
| ) | ||
| second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian()) | ||
| # This must not raise KeyError: None | ||
| estimate = model.estimate_effect( | ||
| identified_estimand=estimand, | ||
| method_name="mediation.two_stage_regression", | ||
| method_params={"second_stage_model": second_stage}, | ||
| ) | ||
| assert np.isfinite(estimate.value) | ||
|
|
||
| def test_nie_preinstantiated_second_stage_estimand_updated(self): | ||
| """The pre-instantiated second_stage_model's _target_estimand is updated to backdoor.""" | ||
| import statsmodels.api as sm | ||
|
|
||
| from dowhy.causal_estimators.generalized_linear_model_estimator import GeneralizedLinearModelEstimator | ||
|
|
||
| df = _make_mediation_data() | ||
| model = CausalModel(data=df, treatment="X", outcome="Y", graph=_MEDIATION_GML) | ||
| estimand = model.identify_effect( | ||
| estimand_type=EstimandType.NONPARAMETRIC_NIE, | ||
| proceed_when_unidentifiable=True, | ||
| ) | ||
| second_stage = GeneralizedLinearModelEstimator(identified_estimand=estimand, glm_family=sm.families.Gaussian()) | ||
| estimator = TwoStageRegressionEstimator( | ||
| identified_estimand=estimand, | ||
| second_stage_model=second_stage, | ||
| ) | ||
| assert estimator._second_stage_model._target_estimand.identifier_method == "backdoor" |
There was a problem hiding this comment.
The new regression coverage exercises a pre-instantiated second_stage_model, but the same failure mode can occur when first_stage_model is passed as a pre-instantiated estimator instance (its _target_estimand isn’t currently rewritten to the backdoor-modified estimand). Adding a test that passes a pre-instantiated first-stage estimator (and asserts no exception / correct identifier_method) would help prevent a partial regression fix.
| if second_stage_model is not None: | ||
| self._second_stage_model = ( | ||
| second_stage_model | ||
| if isinstance(second_stage_model, CausalEstimator) | ||
| else second_stage_model( | ||
| if isinstance(second_stage_model, CausalEstimator): | ||
| self._second_stage_model = second_stage_model | ||
| # Update the estimand so the second-stage model uses the correct | ||
| # backdoor configuration rather than the original mediation estimand. | ||
| self._second_stage_model._target_estimand = modified_target_estimand | ||
| else: | ||
| self._second_stage_model = second_stage_model( |
There was a problem hiding this comment.
The pre-instantiated-model handling was fixed for second_stage_model, but first_stage_model still accepts a pre-instantiated CausalEstimator without updating its _target_estimand to the backdoor-modified estimand. If a user passes a pre-instantiated RegressionEstimator (e.g., GeneralizedLinearModelEstimator) for the first stage with a mediation estimand, RegressionEstimator.fit() will raise (identifier_method='mediation') or hit the same default_backdoor_id=None issue. Consider applying the same pattern used below for second_stage_model to the first_stage_model branch as well (i.e., set self._first_stage_model._target_estimand to the first-stage modified_target_estimand).
Signed-off-by: Emre Kiciman <emrek@microsoft.com>
🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.
Closes #1335
Root Cause
When a user passes a pre-instantiated
CausalEstimatorassecond_stage_modelinTwoStageRegressionEstimator, the instance is used as-is. The pre-instantiated estimator holds a_target_estimandwithidentifier_method="mediation"(from the NIE estimand) anddefault_backdoor_id=None.During
fit(), the second-stage model callsget_backdoor_variables():default_backdoor_id=NonecausedKeyError: None.The fix when a class (not an instance) is passed already worked correctly —
TwoStageRegressionEstimatorconstructed the model usingmodified_target_estimand(which hasidentifier_method="backdoor"and the correctbackdoor_variables). Only the pre-instantiated path was missing this update.Fix
In
TwoStageRegressionEstimator.__init__, whensecond_stage_modelis a pre-instantiatedCausalEstimator, explicitly update its_target_estimandtomodified_target_estimand: