[Repo Assist] feat: auto-select default estimation method when method_name is omitted #1464
When method_name is None, choose a sensible default:
- binary treatment + backdoor estimand → propensity_score_stratification
- continuous treatment + backdoor estimand → linear_regression
- IV-only graph → iv.instrumental_variable
- frontdoor-only graph → frontdoor.two_stage_regression
- nothing identified → raise ValueError with actionable message
- no directed path → return zero estimate immediately (avoids NameError)

An INFO log message records the auto-selected method name.

Previously method_name=None caused an UnboundLocalError on identifier_name because the if/pass block left it unset.

This fulfils the feature design agreed in #365 (comment by @amit-sharma, Jan 2022).

Adds three tests covering the binary-treatment, continuous-treatment, and no-valid-estimand paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
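The selection rules listed in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name, its boolean parameters, and the plain list of treatment values are hypothetical stand-ins for the private helper and the identified estimand, and checking IV before frontdoor is an assumption (the message only covers the IV-only and frontdoor-only cases).

```python
from typing import Sequence


def select_default_method_name(
    backdoor_identified: bool,
    iv_identified: bool,
    frontdoor_identified: bool,
    treatment_values: Sequence[float],
) -> str:
    """Hypothetical stand-in for the PR's private helper (not dowhy's code)."""
    if backdoor_identified:
        # A treatment with at most two distinct values is treated as binary.
        if len(set(treatment_values)) <= 2:
            return "backdoor.propensity_score_stratification"
        return "backdoor.linear_regression"
    if iv_identified:
        # Assumption: IV is checked before frontdoor when both are available.
        return "iv.instrumental_variable"
    if frontdoor_identified:
        return "frontdoor.two_stage_regression"
    # Nothing identified: fail with a message that tells the user what to do.
    raise ValueError(
        "No valid estimand was identified; pass method_name explicitly."
    )
```

Keeping the rules in one small pure function makes each branch easy to test on its own, which is the shape the three added tests appear to rely on.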
Pull request overview
Adds automatic default estimator selection in CausalModel.estimate_effect() when method_name is omitted, fixing a crash (UnboundLocalError) and aligning behavior with the long-standing TODO/issue discussion.
Changes:
- Implemented _select_default_method_name() and wired it into CausalModel.estimate_effect() when method_name is None.
- Added an early-return zero-effect CausalEstimate for the no_directed_path + method_name is None case.
- Added tests covering default selection for binary vs continuous treatment and the “no valid estimand” error path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| dowhy/causal_model.py | Implements default-method auto-selection, early return for no-directed-path, and updates the estimate_effect docstring. |
| tests/test_causal_model.py | Adds tests validating default estimator choice and the explicit error when no valid estimand exists. |
    treatment_col = self._treatment[0]
    if self._data[treatment_col].nunique() <= 2:
        return "backdoor.propensity_score_stratification"
    return "backdoor.linear_regression"
_select_default_method_name infers binary/continuous treatment using only self._treatment[0]. For multivariate treatments (supported by parse_state and dataset generators), this can misclassify the treatment type and auto-select an estimator that’s inappropriate or fails later. Consider either (a) requiring method_name explicitly when len(self._treatment) > 1, or (b) checking all treatment columns and treating the treatment as binary only if every treatment has ≤2 unique values (and otherwise falling back to a continuous-safe default).
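Option (b) from this comment could look something like the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, and the data is modelled as a plain mapping from column name to values so the example runs without pandas (the PR itself calls nunique() on a DataFrame column).

```python
from typing import Mapping, Sequence


def all_treatments_binary(
    columns: Mapping[str, Sequence[object]],
    treatment_names: Sequence[str],
) -> bool:
    """Return True only if every treatment column has at most two distinct values."""
    return all(len(set(columns[name])) <= 2 for name in treatment_names)


def default_backdoor_method(
    columns: Mapping[str, Sequence[object]],
    treatment_names: Sequence[str],
) -> str:
    """Hypothetical variant of the backdoor default that inspects all treatments."""
    if all_treatments_binary(columns, treatment_names):
        return "backdoor.propensity_score_stratification"
    # Any non-binary treatment falls back to the continuous-safe default.
    return "backdoor.linear_regression"


data = {"t1": [0, 1, 0, 1], "t2": [0.2, 1.5, 3.1, 0.7]}
single = default_backdoor_method(data, ["t1"])        # binary treatment only
mixed = default_backdoor_method(data, ["t1", "t2"])   # binary plus continuous
```

Option (a) would instead raise when len(treatment_names) > 1, which is stricter but avoids guessing an estimator for multivariate treatments at all.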
| if method_name is None: | ||
| # TODO add propensity score as default backdoor method, iv as default iv method, add an informational message to show which method has been selected. | ||
| pass | ||
| else: | ||
| # TODO add dowhy as a prefix to all dowhy estimators | ||
| num_components = len(method_name.split(".")) | ||
| str_arr = method_name.split(".", maxsplit=1) | ||
| identifier_name = str_arr[0] | ||
| estimator_name = str_arr[1] | ||
| # This is done as all dowhy estimators have two parts and external ones have two or more parts | ||
| if num_components > 2: | ||
| estimator_package = estimator_name.split(".")[0] | ||
| if estimator_package == "dowhy": # For updated dowhy methods | ||
| estimator_method = estimator_name.split(".", maxsplit=1)[ | ||
| 1 | ||
| ] # discard dowhy from the full package name | ||
| causal_estimator_class = causal_estimators.get_class_object(estimator_method + "_estimator") | ||
| else: | ||
| third_party_estimator_package = estimator_package | ||
| causal_estimator_class = causal_estimators.get_class_object( | ||
| third_party_estimator_package, estimator_name | ||
| ) | ||
| if method_params is None: | ||
| method_params = {} | ||
| # Define the third-party estimation method to be used | ||
| method_params[third_party_estimator_package + "_estimator"] = estimator_name | ||
| else: # For older dowhy methods | ||
| self.logger.info(estimator_name) | ||
| # Process the dowhy estimators | ||
| causal_estimator_class = causal_estimators.get_class_object(estimator_name + "_estimator") | ||
|
|
||
| if method_params is not None and (num_components <= 2 or estimator_package == "dowhy"): | ||
| extra_args = method_params.get("init_params", {}) | ||
| else: | ||
| extra_args = {} | ||
| if method_params is None: | ||
| method_params = {} | ||
|
|
||
| identified_estimand.set_identifier_method(identifier_name) | ||
|
|
||
| # If not fit_estimator, attempt to retrieve existing estimator. | ||
| # Keep original behaviour to create new estimator if none found. | ||
| causal_estimator = None | ||
| if not fit_estimator: | ||
| causal_estimator = self.get_estimator(method_name) | ||
|
|
||
| if causal_estimator is None: | ||
| causal_estimator = causal_estimator_class( | ||
| if identified_estimand.no_directed_path: | ||
| self.logger.warning( | ||
| "No directed path from %s to %s. Causal effect is zero.", self._treatment, self._outcome | ||
| ) | ||
| return CausalEstimate( | ||
| None, | ||
| None, | ||
| None, | ||
| 0, | ||
| identified_estimand, | ||
| test_significance=test_significance, | ||
| evaluate_effect_strength=evaluate_effect_strength, | ||
| confidence_intervals=confidence_intervals, | ||
| **method_params, | ||
| **extra_args, | ||
| None, | ||
| control_value=control_value, | ||
| treatment_value=treatment_value, | ||
| ) |
The new early-return path for method_name is None + identified_estimand.no_directed_path is behaviorally important (it avoids the previous UnboundLocalError), but it isn’t covered by the added tests. Adding a focused test that constructs a graph with no directed path and asserts estimate.value == 0 (and no exception) would prevent regressions in this branch.
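The shape of such a regression test might resemble the sketch below. To stay self-contained it does not import dowhy: StubEstimand, StubEstimate, and estimate_effect_sketch are hypothetical stand-ins that model only the guard order, so a real test would build a CausalModel on a graph with no directed path instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubEstimand:
    """Hypothetical stand-in for an identified estimand."""
    no_directed_path: bool


@dataclass
class StubEstimate:
    """Hypothetical stand-in for a causal estimate."""
    value: float


def estimate_effect_sketch(
    estimand: StubEstimand, method_name: Optional[str] = None
) -> StubEstimate:
    # Guard first: with no directed path the effect is zero, so no method
    # name has to be parsed or auto-selected before returning.
    if estimand.no_directed_path:
        return StubEstimate(value=0)
    raise NotImplementedError("estimation itself is outside this sketch")


def test_no_directed_path_returns_zero_without_method_name():
    # method_name is left at its default of None on purpose.
    estimate = estimate_effect_sketch(StubEstimand(no_directed_path=True))
    assert estimate.value == 0
```

The assertion only passes if the early return runs before any code that depends on a method name, which is the ordering this branch needs to preserve.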
Signed-off-by: Emre Kiciman <emrek@microsoft.com>
🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.
Closes #365
Root Cause
CausalModel.estimate_effect() would raise UnboundLocalError: local variable 'identifier_name' referenced before assignment whenever method_name=None (the default) was passed. This happened because the if method_name is None: branch was a bare pass, leaving identifier_name and causal_estimator undefined before they were consumed by the call to the standalone estimate_effect() helper.

A companion bug: when there is no directed path from treatment to outcome and method_name=None, the method would also crash (same unset variables), rather than returning the zero-effect estimate that the downstream helper would produce.

Fix
Implements the design agreed by @amit-sharma in the original issue comment (Jan 2022):

New private helper _select_default_method_name(identified_estimand):

| Scenario | Default |
|---|---|
| Binary treatment + backdoor estimand | backdoor.propensity_score_stratification |
| Continuous treatment + backdoor estimand | backdoor.linear_regression |
| IV-only graph | iv.instrumental_variable |
| Frontdoor-only graph | frontdoor.two_stage_regression |
| Nothing identified | ValueError (actionable message) |

Early return for no_directed_path: when method_name=None and there is no causal path, the method now returns a CausalEstimate(value=0) directly (matching the existing behaviour when a method_name was provided).

Logging: an INFO-level message records the auto-selected method so users can see what was chosen without having to inspect the source.

The docstring for estimate_effect is updated to document the new auto-selection behaviour.

Tests Added
Three new tests in TestCausalModel:
- test_estimate_effect_default_method_binary_treatment: binary treatment → PropensityScoreStratificationEstimator
- test_estimate_effect_default_method_continuous_treatment: continuous treatment → LinearRegressionEstimator
- test_estimate_effect_default_method_no_valid_estimand_raises: all estimands None → ValueError with "method_name explicitly" in message

Trade-offs
- [...] method_name explicitly; the INFO log reminds them.
- The frontdoor default (two_stage_regression) is included for completeness but untested here, since frontdoor identification is less common. Maintainers may choose to limit auto-selection to backdoor/IV only.

Test Status
Code quality: _select_default_method_name is straightforward and low-risk
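The root cause described in this PR can be reproduced in isolation. The function below is a simplified, hypothetical stand-in for the pre-fix branch structure, not dowhy's actual code; it only shows how a bare pass in one branch leaves a local name unassigned.

```python
from typing import Optional


def identifier_from_method_name(method_name: Optional[str] = None) -> str:
    """Simplified stand-in for the pre-fix control flow (not dowhy's code)."""
    if method_name is None:
        pass  # no default is chosen, so identifier_name is never assigned
    else:
        identifier_name = method_name.split(".", maxsplit=1)[0]
    # Reading the name here fails whenever the `pass` branch ran.
    return identifier_name


try:
    identifier_from_method_name()
except UnboundLocalError as exc:
    # The exact wording of the message varies between Python versions.
    print(f"UnboundLocalError: {exc}")
```

Because UnboundLocalError is a subclass of NameError, this is also the error the commit message refers to when it says the early return "avoids NameError".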