[Repo Assist] fix: use string key for column lookup in conditional_MI (fixes KeyError with multi-char column names)#1455

Open
github-actions[bot] wants to merge 1 commit into main from
repo-assist/fix-issue-949-graph-refute-column-names-cit-a62690f05e7d1267

Conversation

@github-actions
Contributor

🤖 This is an automated fix from Repo Assist.

Closes #949

Root Cause

In dowhy/utils/cit.py, conditional_MI used data[list(x)] and data[list(y)] to access columns. When x and y are string column names (as they always are when called from GraphRefuter.conditional_mutual_information), list(x) iterates over individual characters:

list('Foo')  # → ['F', 'o', 'o']
data[['F', 'o', 'o']]  # → KeyError: "None of [Index(['F', 'o', 'o'], ...)] are in the [columns]"

This means a column name longer than one character triggers the error whenever its individual characters are not themselves column names.

Fix

Change data[list(x)] → data[x] and data[list(y)] → data[y], which performs a standard single-column Series lookup. This is correct since x and y are scalar string column names.
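
The difference between the two lookups can be reproduced with pandas alone. The sketch below is illustrative and is not the dowhy code: `select_with_list` and `select_with_key` are hypothetical helpers that mirror the old and new indexing patterns, and the DataFrames are made-up data. It also shows one case worth knowing about: if every character of the name happens to be a column, the old pattern raises nothing and returns the wrong columns.

```python
import pandas as pd


def select_with_list(df, name):
    # Old pattern: list("Foo") becomes ["F", "o", "o"], so pandas
    # looks for one column per character.
    return df[list(name)]


def select_with_key(df, name):
    # New pattern: the string is used as a single column label.
    return df[name]


# Case 1: no column is named after an individual character.
data = pd.DataFrame({"Foo": [0, 1, 1], "Bar": [1, 0, 1]})
try:
    select_with_list(data, "Foo")
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Case 2: the characters of "ab" are column names themselves, so the
# old pattern silently selects columns "a" and "b" instead of "ab".
shadowed = pd.DataFrame({"ab": [1, 1], "a": [0, 1], "b": [1, 0]})
wrong_selection = select_with_list(shadowed, "ab")
right_selection = select_with_key(shadowed, "ab")
```

In the second case `wrong_selection` is a two-column DataFrame while `right_selection` is the Series labelled "ab", which is why a string key is the safer lookup for scalar column names.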

Trade-offs

  • The fix is minimal and surgical — no behaviour change for the entropy calculation (iterating a Series yields values, which is what the downstream zip / format-string logic expects).
  • Single-character column names continue to work correctly.
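
The point about iteration can be checked in isolation. This is a minimal pandas sketch with made-up data, not the entropy code from dowhy/utils/cit.py: iterating a Series yields its values, whereas iterating a DataFrame yields its column labels.

```python
import pandas as pd

data = pd.DataFrame({"treatment": [0, 1, 1], "outcome": [1, 0, 1]})

# Iterating a Series walks over the values in row order.
series_items = list(data["treatment"])

# Iterating a DataFrame walks over the column labels instead.
frame_items = list(data[["treatment"]])

# zip over two Series therefore pairs values row by row.
row_pairs = list(zip(data["treatment"], data["outcome"]))
```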

Test Status

New tests added in tests/causal_refuters/test_graph_refuter.py:

  • test_conditional_mi_multi_char_column_names — regression test confirming multi-char names no longer raise KeyError
  • test_conditional_mi_single_char_column_names — ensures single-char names still work
  • test_graph_refuter_with_multi_char_columns — end-to-end refute_model call with multi-char columns
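
For readers who want the shape of such a regression test without the dowhy dependency, here is a hedged sketch. `column_as_int` is a hypothetical stand-in for the column lookup, and the test names and data are illustrative; they are not the tests added in this PR.

```python
import pandas as pd


def column_as_int(data, x):
    """Hypothetical stand-in: select one column by its string label."""
    return data[x].astype(int)


def test_multi_char_column_name():
    # A multi-character label must be treated as a single key.
    data = pd.DataFrame({"Foo": [0, 1, 1, 0], "Bar": [1, 1, 0, 0]})
    result = column_as_int(data, "Foo")
    assert isinstance(result, pd.Series)
    assert result.tolist() == [0, 1, 1, 0]


def test_single_char_column_name():
    # Single-character labels should keep working.
    data = pd.DataFrame({"x": [0, 1], "y": [1, 0]})
    result = column_as_int(data, "x")
    assert isinstance(result, pd.Series)
    assert result.tolist() == [0, 1]
```

Both functions follow the pytest naming convention, so a runner such as pytest would collect them automatically.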

All new tests pass locally. Formatting (black, isort) and linting (flake8) pass on changed files.

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@11c9a2c442e519ff2b427bf58679f5a525353f76

When x or y are string column names passed to conditional_MI,
calling list(x) iterates over individual characters (e.g. 'Foo'
becomes ['F','o','o']) rather than treating the string as a key.

Fix: use data[x] and data[y] (Series lookup) instead of
data[list(x)] and data[list(y)] (multi-key DataFrame lookup).

Add tests to cover multi-character column names in GraphRefuter.

Closes #949

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Error handling column names in CIT as used by CausalModel.graph_refute method