Skip to content

[Repo Assist] feat: auto-select default estimation method when method_name is omitted#1464

Open
github-actions[bot] wants to merge 2 commits intomainfrom
repo-assist/feature-365-smart-method-defaults-20260418-6cc4ff3716959e80
Open

[Repo Assist] feat: auto-select default estimation method when method_name is omitted#1464
github-actions[bot] wants to merge 2 commits intomainfrom
repo-assist/feature-365-smart-method-defaults-20260418-6cc4ff3716959e80

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant. Please review carefully before merging.

Closes #365


Root Cause

CausalModel.estimate_effect() would raise UnboundLocalError: local variable 'identifier_name' referenced before assignment whenever method_name=None (the default) was passed. This happened because the if method_name is None: branch was a bare pass, leaving identifier_name and causal_estimator undefined before they were consumed by the call to the standalone estimate_effect() helper.

A companion bug: when there is no directed path from treatment to outcome and method_name=None, the method would also crash (same unset variables), rather than returning the zero-effect estimate that the downstream helper would produce.

Fix

Implements the design agreed by @amit-sharma in the original issue comment (Jan 2022):

New private helper _select_default_method_name(identified_estimand):

Identified estimand Treatment Auto-selected method
backdoor binary (≤ 2 unique values) backdoor.propensity_score_stratification
backdoor continuous backdoor.linear_regression
IV any iv.instrumental_variable
frontdoor any frontdoor.two_stage_regression
none valid any ValueError (actionable message)

Early return for no_directed_path: when method_name=None and there is no causal path, the method now returns a CausalEstimate(value=0) directly (matching the existing behaviour when a method_name was provided).

Logging: an INFO-level message records the auto-selected method so users can see what was chosen without having to inspect the source.

The docstring for estimate_effect is updated to document the new auto-selection behaviour.

Tests Added

Three new tests in TestCausalModel:

  1. test_estimate_effect_default_method_binary_treatment – binary treatment → PropensityScoreStratificationEstimator
  2. test_estimate_effect_default_method_continuous_treatment – continuous treatment → LinearRegressionEstimator
  3. test_estimate_effect_default_method_no_valid_estimand_raises – all estimands None → ValueError with "method_name explicitly" in message

Trade-offs

  • The auto-selection logic is heuristic (binary ≤ 2 unique values). Power users who want a specific estimator should always supply method_name explicitly; the INFO log reminds them.
  • Frontdoor default (two_stage_regression) is included for completeness but untested here since frontdoor identification is less common. Maintainers may choose to limit auto-selection to backdoor/IV only.

Test Status

⚠️ The test runner could not be executed in this environment (no Poetry/virtual env available). The changes were validated by careful code review.

Code quality:

  • ✅ All new lines ≤ 127 characters (flake8 limit)
  • ✅ No new imports required
  • ✅ No changes to public API signatures — existing callers unaffected
  • ✅ Logic of _select_default_method_name is straightforward and low-risk

Generated by 🌈 Repo Assist, see workflow run. Learn more.

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@11c9a2c442e519ff2b427bf58679f5a525353f76

When method_name is None, choose a sensible default:
- binary treatment + backdoor estimand → propensity_score_stratification
- continuous treatment + backdoor estimand → linear_regression
- IV-only graph → iv.instrumental_variable
- frontdoor-only graph → frontdoor.two_stage_regression
- nothing identified → raise ValueError with actionable message
- no directed path → return zero estimate immediately (avoids NameError)

An INFO log message records the auto-selected method name.

Previously method_name=None caused an UnboundLocalError on identifier_name
because the if/pass block left it unset.  This fulfils the feature design
agreed in #365 (comment by @amit-sharma, Jan 2022).

Adds three tests covering the binary-treatment, continuous-treatment, and
no-valid-estimand paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot added automation bug Something isn't working enhancement New feature or request repo-assist labels Apr 18, 2026
@emrekiciman emrekiciman marked this pull request as ready for review April 18, 2026 02:36
@emrekiciman emrekiciman requested a review from Copilot April 18, 2026 02:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds automatic default estimator selection in CausalModel.estimate_effect() when method_name is omitted, fixing a crash (UnboundLocalError) and aligning behavior with the long-standing TODO/issue discussion.

Changes:

  • Implemented _select_default_method_name() and wired it into CausalModel.estimate_effect() when method_name is None.
  • Added an early-return zero-effect CausalEstimate for the no_directed_path + method_name is None case.
  • Added tests covering default selection for binary vs continuous treatment and the “no valid estimand” error path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
dowhy/causal_model.py Implements default-method auto-selection, early return for no-directed-path, and updates the estimate_effect docstring.
tests/test_causal_model.py Adds tests validating default estimator choice and the explicit error when no valid estimand exists.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dowhy/causal_model.py
Comment on lines +414 to +417
treatment_col = self._treatment[0]
if self._data[treatment_col].nunique() <= 2:
return "backdoor.propensity_score_stratification"
return "backdoor.linear_regression"
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_select_default_method_name infers binary/continuous treatment using only self._treatment[0]. For multivariate treatments (supported by parse_state and dataset generators), this can misclassify the treatment type and auto-select an estimator that’s inappropriate or fails later. Consider either (a) requiring method_name explicitly when len(self._treatment) > 1, or (b) checking all treatment columns and treating the treatment as binary only if every treatment has ≤2 unique values (and otherwise falling back to a continuous-safe default).

Copilot uses AI. Check for mistakes.
Comment thread dowhy/causal_model.py
Comment on lines 311 to 325
if method_name is None:
# TODO add propensity score as default backdoor method, iv as default iv method, add an informational message to show which method has been selected.
pass
else:
# TODO add dowhy as a prefix to all dowhy estimators
num_components = len(method_name.split("."))
str_arr = method_name.split(".", maxsplit=1)
identifier_name = str_arr[0]
estimator_name = str_arr[1]
# This is done as all dowhy estimators have two parts and external ones have two or more parts
if num_components > 2:
estimator_package = estimator_name.split(".")[0]
if estimator_package == "dowhy": # For updated dowhy methods
estimator_method = estimator_name.split(".", maxsplit=1)[
1
] # discard dowhy from the full package name
causal_estimator_class = causal_estimators.get_class_object(estimator_method + "_estimator")
else:
third_party_estimator_package = estimator_package
causal_estimator_class = causal_estimators.get_class_object(
third_party_estimator_package, estimator_name
)
if method_params is None:
method_params = {}
# Define the third-party estimation method to be used
method_params[third_party_estimator_package + "_estimator"] = estimator_name
else: # For older dowhy methods
self.logger.info(estimator_name)
# Process the dowhy estimators
causal_estimator_class = causal_estimators.get_class_object(estimator_name + "_estimator")

if method_params is not None and (num_components <= 2 or estimator_package == "dowhy"):
extra_args = method_params.get("init_params", {})
else:
extra_args = {}
if method_params is None:
method_params = {}

identified_estimand.set_identifier_method(identifier_name)

# If not fit_estimator, attempt to retrieve existing estimator.
# Keep original behaviour to create new estimator if none found.
causal_estimator = None
if not fit_estimator:
causal_estimator = self.get_estimator(method_name)

if causal_estimator is None:
causal_estimator = causal_estimator_class(
if identified_estimand.no_directed_path:
self.logger.warning(
"No directed path from %s to %s. Causal effect is zero.", self._treatment, self._outcome
)
return CausalEstimate(
None,
None,
None,
0,
identified_estimand,
test_significance=test_significance,
evaluate_effect_strength=evaluate_effect_strength,
confidence_intervals=confidence_intervals,
**method_params,
**extra_args,
None,
control_value=control_value,
treatment_value=treatment_value,
)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new early-return path for method_name is None + identified_estimand.no_directed_path is behaviorally important (it avoids the previous UnboundLocalError), but it isn’t covered by the added tests. Adding a focused test that constructs a graph with no directed path and asserts estimate.value == 0 (and no exception) would prevent regressions in this branch.

Copilot uses AI. Check for mistakes.
Signed-off-by: Emre Kiciman <emrek@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automation bug Something isn't working enhancement New feature or request repo-assist

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CausalModel.estimate_effect - UnboundLocalError: local variable 'identifier_name' referenced before assignment

2 participants