Better provenance metadata by agriyakhetarpal · Pull Request #22 · conda-incubator/pytest-conda-solvers

agriyakhetarpal · 2026-03-11T10:52:05Z

This PR updates the provenance metadata for all tests. We now record provenance as a structured object that includes the test's node ID, the commit hash, and a direct URL to the relevant source lines in https://github.com/conda/conda/.
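For illustration, a structured provenance record of this kind could look roughly like the following. This is a minimal sketch, not necessarily how the `Provenance` struct in this PR is defined. The field names `node_id`, `commit`, and `url` and the `from_lines` helper are assumptions. The URL uses the same `blob/<commit>/<path>#L<start>-L<end>` permalink form that the script below writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; the field names are assumptions."""

    node_id: str  # pytest node ID, e.g. "path/to/test_file.py::test_name"
    commit: str  # full conda/conda commit hash the URL is pinned to
    url: str  # permalink to the test's source lines at that commit

    @classmethod
    def from_lines(cls, node_id: str, commit: str, start: int, end: int) -> "Provenance":
        """Build a record whose URL points at lines start..end of the test's file."""
        if not 1 <= start <= end:
            raise ValueError(f"invalid line range {start}-{end}")
        filepath = node_id.split("::", 1)[0]
        url = f"https://github.com/conda/conda/blob/{commit}/{filepath}#L{start}-L{end}"
        return cls(node_id=node_id, commit=commit, url=url)


# Hypothetical test file and function, pinned to the commit used in this PR
record = Provenance.from_lines(
    "tests/test_example.py::test_example",
    "03329e0f4a627c9b9aa92ef34f7f93b9aa83e438",
    10,
    20,
)
print(record.url)
# → https://github.com/conda/conda/blob/03329e0f4a627c9b9aa92ef34f7f93b9aa83e438/tests/test_example.py#L10-L20
```

Making the record frozen keeps it hashable and prevents a test from mutating its own provenance after it has been loaded.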

AI disclosure

After updating a few test file links manually, I quickly realised this was going to be tedious, so the update (see the second commit) was performed by a script that Claude Code (using the Sonnet 4.6 model) wrote at my request. Here is the script:

import ast
import glob
import re
import urllib.request

COMMIT = "03329e0f4a627c9b9aa92ef34f7f93b9aa83e438"
BASE = f"https://raw.githubusercontent.com/conda/conda/{COMMIT}"
NODE_ID_RE = re.compile(r"\s+node_id:\s+(.+)")
URL_RE = re.compile(r"(\s+)url:\s+https://github\.com/conda/conda/blob/\S+")


def split_node_id(node_id):
    """Split 'path::Class::test[param]' (or 'path::Class.test') into ('path', 'Class.test')."""
    node_id = node_id.strip().strip("'\"")
    filepath, _, rest = node_id.partition("::")
    # Drop any pytest parametrization suffix and accept both '::' and '.' as class separators
    qualname = re.sub(r"\[.*\]$", "", rest).replace("::", ".")
    return filepath, qualname


# 1. Collect all node_ids from the YAML files
yaml_files = sorted(glob.glob("conda-solver-tests/*.yaml"))
node_ids = set()
for yf in yaml_files:
    with open(yf, encoding="utf-8") as f:
        for line in f:
            m = NODE_ID_RE.match(line)
            if m:
                node_ids.add(m.group(1))

# 2. Group the qualified test names by source file
file_funcs = {}
for nid in node_ids:
    filepath, qualname = split_node_id(nid)
    file_funcs.setdefault(filepath, set()).add(qualname)

# 3. Fetch each file at COMMIT and parse it with the AST to find each test's line range
func_lines = {}
for filepath, funcs in file_funcs.items():
    with urllib.request.urlopen(f"{BASE}/{filepath}") as resp:
        tree = ast.parse(resp.read().decode("utf-8"))
    # Only look at module-level definitions: ast.walk() would also visit methods and
    # nested functions, which could shadow a top-level function with the same name
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Class methods (e.g. SolverTests.test_iopro_mkl)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    qualified = f"{node.name}.{item.name}"
                    if qualified in funcs:
                        func_lines[(filepath, qualified)] = (item.lineno, item.end_lineno)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Top-level functions
            if node.name in funcs:
                func_lines[(filepath, node.name)] = (node.lineno, node.end_lineno)

# 4. Patch the YAML files: rewrite each url line from the node_id that precedes it
for yf in yaml_files:
    with open(yf, encoding="utf-8") as f:
        lines = f.readlines()
    new_lines = []
    key = None
    for line in lines:
        m = NODE_ID_RE.match(line)
        if m:
            key = split_node_id(m.group(1))
        url_m = URL_RE.match(line)
        if url_m and key in func_lines:
            start, end = func_lines[key]
            new_url = f"https://github.com/conda/conda/blob/{COMMIT}/{key[0]}#L{start}-L{end}"
            new_lines.append(f"{url_m.group(1)}url: {new_url}\n")
            key = None  # each node_id updates at most one url
            continue
        new_lines.append(line)
    with open(yf, "w", encoding="utf-8") as f:
        f.writelines(new_lines)
    print(f"Updated {yf}")
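To make step 3 concrete, here is the same kind of AST lookup run on an inline snippet instead of a fetched file. The snippet and the `function_ranges` helper are hypothetical. The helper only inspects module-level definitions. Since Python 3.8, a decorated function's `lineno` is its `def` line, so decorators such as `@pytest.mark.parametrize` fall outside the linked range.

```python
import ast

# Hypothetical stand-in for a test file fetched from conda/conda
SOURCE = '''\
import pytest


def test_top_level():
    assert True


class SolverTests:
    def test_iopro_mkl(self):
        x = 1
        assert x == 1
'''


def function_ranges(source: str) -> dict[str, tuple[int, int]]:
    """Map top-level functions and class methods to their (first, last) line numbers."""
    ranges = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            ranges[node.name] = (node.lineno, node.end_lineno)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    ranges[f"{node.name}.{item.name}"] = (item.lineno, item.end_lineno)
    return ranges


print(function_ranges(SOURCE))
# → {'test_top_level': (4, 5), 'SolverTests.test_iopro_mkl': (9, 11)}
```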

We could turn the above script into a custom pre-commit hook, so that I don't have to come back and do all of this by hand again later. Thoughts, @jaimergp?
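As a rough sketch of what such a hook could look like, assume the line ranges from step 3 are already available as a `{(path, qualname): (start, end)}` mapping. `rewrite_urls` and `check` are hypothetical names. `check` follows the usual pre-commit convention of fixing the files passed to it and returning non-zero when it changed anything.

```python
import re
from pathlib import Path

# Similar YAML line patterns to the script above (a node_id line, then its url line)
NODE_ID_RE = re.compile(r"\s+node_id:\s+(.+)")
URL_RE = re.compile(r"(\s+)url:\s+https://github\.com/conda/conda/blob/\S+")

Ranges = dict[tuple[str, str], tuple[int, int]]


def rewrite_urls(text: str, ranges: Ranges, commit: str) -> str:
    """Rewrite each provenance url from the node_id line that precedes it."""
    out = []
    key = None
    for line in text.splitlines(keepends=True):
        if m := NODE_ID_RE.match(line):
            filepath, _, qualname = m.group(1).strip().strip("'\"").partition("::")
            key = (filepath, qualname.replace("::", "."))
        elif (m := URL_RE.match(line)) and key in ranges:
            start, end = ranges[key]
            eol = "\n" if line.endswith("\n") else ""
            url = f"https://github.com/conda/conda/blob/{commit}/{key[0]}#L{start}-L{end}"
            line = f"{m.group(1)}url: {url}{eol}"
            key = None  # each node_id updates at most one url
        out.append(line)
    return "".join(out)


def check(paths: list[str], ranges: Ranges, commit: str) -> int:
    """Fix the given YAML files in place; return 1 if any file changed, else 0."""
    status = 0
    for path in map(Path, paths):
        old = path.read_text(encoding="utf-8")
        new = rewrite_urls(old, ranges, commit)
        if new != old:
            path.write_text(new, encoding="utf-8")
            print(f"Updated {path}")
            status = 1
    return status
```

A real hook would still need to fetch and parse the upstream files to build `ranges` (as in step 3) and finish with something like `raise SystemExit(check(sys.argv[1:], ranges, COMMIT))`. That network fetch is the step most likely to make such a hook flaky.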

jaimergp · 2026-03-11T11:09:46Z

I kind of like it 😂 You mean a pre-commit hook, right? As long as it's not flaky, yep, why not. We can also add a meta-test specifically for that, as you wish. Maybe that's easier than a pre-commit hook.

agriyakhetarpal · 2026-03-11T11:13:32Z

Ah, yes, I do mean a pre-commit hook. I realise I missed writing that in the PR description after the word "custom" 😅 Let me explore whether a meta-test is easier or better; maybe it is.

That said, the linters are not running on this PR. Could you please install the pre-commit.ci app on this repository for me?

agriyakhetarpal · 2026-03-11T13:48:01Z

I spent some time adding a meta-test that runs a set of checks on the provenance data. Writing it was tedious, so agentic coding tools came in handy here. I am aware that it adds more code than we might have expected; we could loosen the validations if you'd like. I prefer this slightly over a pre-commit hook, since it runs together with the tests.
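The checks such a meta-test can run might look like the sketch below. The record keys (`node_id`, `commit`, `url`) and the specific rules are assumptions for illustration, not the validations in the third commit. The sketch requires the commit to be a full 40-character SHA-1. It also requires the URL to be a line-anchored conda/conda permalink whose commit and file path agree with the rest of the record.

```python
import re

COMMIT_RE = re.compile(r"[0-9a-f]{40}")
PERMALINK_RE = re.compile(
    r"https://github\.com/conda/conda/blob/(?P<commit>[0-9a-f]{40})/"
    r"(?P<path>[^#]+)#L(?P<start>\d+)-L(?P<end>\d+)"
)


def provenance_errors(record: dict) -> list[str]:
    """Return the problems found in one provenance record (empty if it looks valid)."""
    errors = []
    node_id = record.get("node_id", "")
    commit = record.get("commit", "")
    url = record.get("url", "")
    if "::" not in node_id:
        errors.append(f"node_id {node_id!r} is not of the form 'path::name'")
    if not COMMIT_RE.fullmatch(commit):
        errors.append(f"commit {commit!r} is not a full 40-character SHA-1")
    m = PERMALINK_RE.fullmatch(url)
    if m is None:
        errors.append(f"url {url!r} is not a line-anchored conda/conda permalink")
        return errors
    if m["commit"] != commit:
        errors.append("url commit does not match the recorded commit")
    if m["path"] != node_id.split("::", 1)[0]:
        errors.append("url path does not match the file in node_id")
    if not 1 <= int(m["start"]) <= int(m["end"]):
        errors.append(f"invalid line range L{m['start']}-L{m['end']}")
    return errors
```

In a pytest suite, `provenance_errors` would typically be parametrized over every record loaded from `conda-solver-tests/*.yaml`, with `assert provenance_errors(record) == []` as the test body. Returning a list of messages rather than failing on the first problem makes a broken record easier to fix in one pass.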

agriyakhetarpal added 2 commits March 11, 2026 15:33

Create a Provenance struct (6509fcf)

Update all provenance metadata (d93925d)

agriyakhetarpal requested a review from jaimergp March 11, 2026 10:52

Add some provenance validation tests (0d8d433)

agriyakhetarpal wants to merge 3 commits into main from provenance-updates