[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter #1431
base: main
Changes from 1 commit
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,6 +1,8 @@ | ||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||
| from math import log | ||||||||||||||||||||||||
| import numpy as np | ||||||||||||||||||||||||
| from scipy.stats import chi2 | ||||||||||||||||||||||||
| from dowhy.causal_refuter import CausalRefutation, CausalRefuter | ||||||||||||||||||||||||
| from dowhy.utils.cit import conditional_MI, partial_corr | ||||||||||||||||||||||||
| @@ -56,14 +58,27 @@ def partial_correlation(self, x=None, y=None, z=None): | ||||||||||||||||||||||||
| self._results[key] = [p_value, True] | ||||||||||||||||||||||||
| def conditional_mutual_information(self, x=None, y=None, z=None): | ||||||||||||||||||||||||
| cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z)) | ||||||||||||||||||||||||
| cmi_bits = conditional_MI(data=self._data, x=x, y=y, z=list(z)) | ||||||||||||||||||||||||
| key = (x, y) + (z,) | ||||||||||||||||||||||||
| if cmi_val <= 0.05: | ||||||||||||||||||||||||
| n = len(self._data) | ||||||||||||||||||||||||
| # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0) | ||||||||||||||||||||||||
| g_stat = 2 * n * cmi_bits * log(2) | ||||||||||||||||||||||||
| # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations | ||||||||||||||||||||||||
| x_card = self._data[x].nunique() | ||||||||||||||||||||||||
| y_card = self._data[y].nunique() | ||||||||||||||||||||||||
| z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1 | ||||||||||||||||||||||||
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) | ||||||||||||||||||||||||
| p_value = float(chi2.sf(g_stat, df=df)) | ||||||||||||||||||||||||
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) | |
| p_value = float(chi2.sf(g_stat, df=df)) | |
| df = (x_card - 1) * (y_card - 1) * z_card | |
| if x_card <= 1 or y_card <= 1 or df <= 0: | |
| # Degenerate contingency structure: the chi-squared approximation is not meaningful. | |
| # Treat this as a non-rejection instead of forcing df=1. | |
| p_value = 1.0 | |
| else: | |
| p_value = float(chi2.sf(g_stat, df=df)) |
Copilot
AI
Apr 6, 2026
This change alters the statistical decision rule by introducing a chi-squared p-value computation, but existing tests only assert overall pass/fail outcomes for random graphs. Adding a focused unit test that checks (1) an (approximately) independent discrete pair yields p_value >= 0.05 and (2) a dependent pair yields p_value < 0.05 would prevent regressions in the p-value computation (including df and unit conversions).
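A minimal sketch of what such a focused test could look like, written against standalone helpers rather than the real `GraphRefuter`. The names `cmi_bits` and `g_test_p_value` are hypothetical and are not part of DoWhy; they only mirror the conversion and degrees-of-freedom logic shown in the diff, and they assume a single discrete conditioning variable. The data is built deterministically (a balanced factorial design for the independent case, `y = x` for the dependent case) so the assertions do not depend on a random seed.

```python
from collections import Counter
from math import log

from scipy.stats import chi2


def cmi_bits(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits (hypothetical helper, not DoWhy's conditional_MI)."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        # p(x,y,z) * log2[ p(x,y|z) / (p(x|z) * p(y|z)) ], expressed with raw counts
        ratio = c * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        total += (c / n) * log(ratio, 2)
    return total


def g_test_p_value(x, y, z):
    """Mirror of the PR logic: CMI in bits -> G statistic -> chi-squared p-value."""
    n = len(x)
    g_stat = 2 * n * cmi_bits(x, y, z) * log(2)  # log(2) converts bits to nats
    x_card, y_card, z_card = len(set(x)), len(set(y)), len(set(z))
    df = (x_card - 1) * (y_card - 1) * z_card
    if df <= 0:
        # Degenerate contingency structure: treat as non-rejection
        return 1.0
    return float(chi2.sf(g_stat, df=df))


# Independent case: every (x, y, z) combination appears equally often, so CMI is 0.
cells = [(xi, yi, zi) for zi in (0, 1) for xi in (0, 1) for yi in (0, 1)] * 50
x_ind, y_ind, z_ind = (list(col) for col in zip(*cells))
p_indep = g_test_p_value(x_ind, y_ind, z_ind)

# Dependent case: y copies x within each stratum of z.
pairs = [(xi, zi) for zi in (0, 1) for xi in (0, 1)] * 100
x_dep = [xi for xi, _ in pairs]
z_dep = [zi for _, zi in pairs]
y_dep = list(x_dep)
p_dep = g_test_p_value(x_dep, y_dep, z_dep)

# Degenerate case: x is constant, so df is 0 and the guard applies.
p_degenerate = g_test_p_value([0] * len(y_dep), y_dep, z_dep)
```

A real regression test would call `GraphRefuter.conditional_mutual_information` on a small DataFrame instead of these helpers. The deterministic construction avoids the roughly 5% of random seeds where truly independent data would fall below the 0.05 threshold by chance.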