Better provenance metadata #22

Open
agriyakhetarpal wants to merge 3 commits into main from provenance-updates

@agriyakhetarpal
Collaborator

@agriyakhetarpal agriyakhetarpal commented Mar 11, 2026

This PR updates the provenance metadata for all tests. We now record provenance as a structured object that includes the test's node ID, the commit hash, and a direct URL to the relevant source lines in https://github.com/conda/conda/.
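For illustration, a record of this shape could be modelled as below. This is only a sketch: the `node_id` and `url` keys appear in the YAML files the script patches, while the `commit` field name, the `make_provenance` helper, and the example test path and line numbers are assumptions made for this example.

```python
from dataclasses import asdict, dataclass

REPO_URL = "https://github.com/conda/conda"


@dataclass(frozen=True)
class Provenance:
    node_id: str  # pytest-style ID, e.g. "path/to/test_file.py::test_name"
    commit: str   # full commit hash the URL is pinned to (field name assumed)
    url: str      # permalink to the source lines of the test


def make_provenance(node_id: str, commit: str, start: int, end: int) -> Provenance:
    """Build a record whose URL points at lines start..end of the test's file."""
    path = node_id.split("::", 1)[0]
    url = f"{REPO_URL}/blob/{commit}/{path}#L{start}-L{end}"
    return Provenance(node_id=node_id, commit=commit, url=url)


# Hypothetical test path and line range, for illustration only.
record = make_provenance(
    "tests/core/test_solve.py::test_example",
    "03329e0f4a627c9b9aa92ef34f7f93b9aa83e438",
    10,
    20,
)
print(asdict(record))
```

Pinning the URL to a commit hash rather than a branch name keeps the line range valid even after the upstream file changes.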

AI disclosure

After updating a few test file links manually, I quickly realised that this was going to be tedious, so the update (see the second commit) was performed by a script that Claude Code, using the Sonnet 4.6 model, wrote at my request. Here is the script:

import ast
import glob as g
import re
import urllib.request

COMMIT = "03329e0f4a627c9b9aa92ef34f7f93b9aa83e438"
BASE = f"https://raw.githubusercontent.com/conda/conda/{COMMIT}"

# 1. Collect all node_ids from YAML files
yaml_files = g.glob("conda-solver-tests/*.yaml")
node_ids = set()
for yf in yaml_files:
    with open(yf) as f:
        for line in f:
            m = re.match(r'\s+node_id:\s+(.+)', line)
            if m:
                node_ids.add(m.group(1).strip())

# 2. Group by source file
file_funcs = {}
for nid in node_ids:
    parts = nid.split("::")
    filepath, func_part = parts[0], parts[1]
    file_funcs.setdefault(filepath, set()).add(func_part)

# 3. Fetch files and parse with AST to find function line ranges
func_lines = {}
for filepath, funcs in file_funcs.items():
    url = f"{BASE}/{filepath}"
    with urllib.request.urlopen(url) as resp:
        source = resp.read().decode("utf-8")
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Class methods (e.g. SolverTests.test_iopro_mkl)
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    qualified = f"{node.name}.{item.name}"
                    if qualified in funcs:
                        func_lines[(filepath, qualified)] = (item.lineno, item.end_lineno)
        # Functions matched by bare name (ast.walk also visits methods and nested functions)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name in funcs:
                func_lines[(filepath, node.name)] = (node.lineno, node.end_lineno)

# 4. Patch YAML files: update url lines based on preceding node_id
for yf in yaml_files:
    with open(yf) as f:
        lines = f.readlines()
    new_lines = []
    current_node_id = None
    for line in lines:
        m = re.match(r'(\s+)node_id:\s+(.+)', line)
        if m:
            current_node_id = m.group(2).strip()
        url_m = re.match(r'(\s+)url:\s+(https://github\.com/conda/conda/blob/.+)', line)
        if url_m and current_node_id:
            indent = url_m.group(1)
            parts = current_node_id.split("::")
            key = (parts[0], parts[1])
            if key in func_lines:
                start, end = func_lines[key]
                new_url = f"https://github.com/conda/conda/blob/{COMMIT}/{parts[0]}#L{start}-L{end}"
                new_lines.append(f"{indent}url: {new_url}\n")
                current_node_id = None
                continue
        new_lines.append(line)
    with open(yf, 'w') as f:
        f.writelines(new_lines)
    print(f"Updated {yf}")

We could restructure the script above into a custom pre-commit hook by hand, so that I don't have to redo all of this later. Thoughts, @jaimergp?
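As a rough idea of what such a hook might look like, here is a minimal sketch. It assumes the hook receives the YAML file names as arguments (which is how pre-commit invokes hooks) and that a `resolve` callable maps a node ID to its up-to-date URL, or `None` when it cannot. `patch_text`, `main`, and `resolve` are hypothetical names and not part of this PR.

```python
import re
from typing import Callable, Optional, Sequence

NODE_ID_RE = re.compile(r"\s+node_id:\s+(.+)")
URL_RE = re.compile(r"(\s+)url:\s+(\S+)")

Resolver = Callable[[str], Optional[str]]


def patch_text(text: str, resolve: Resolver) -> str:
    """Replace the url line that follows each node_id with resolve(node_id)."""
    out = []
    current = None
    for line in text.splitlines(keepends=True):
        node_match = NODE_ID_RE.match(line)
        if node_match:
            current = node_match.group(1).strip()
        url_match = URL_RE.match(line)
        if url_match and current:
            new_url = resolve(current)
            if new_url is not None:
                line = f"{url_match.group(1)}url: {new_url}\n"
            current = None  # each node_id updates at most one url line
        out.append(line)
    return "".join(out)


def main(argv: Sequence[str], resolve: Resolver) -> int:
    """Rewrite the given files in place; return 1 if any file changed."""
    changed = False
    for path in argv:
        with open(path, encoding="utf-8") as f:
            old = f.read()
        new = patch_text(old, resolve)
        if new != old:
            with open(path, "w", encoding="utf-8") as f:
                f.write(new)
            print(f"Updated {path}")
            changed = True
    return 1 if changed else 0
```

Returning a non-zero exit code when a file was rewritten follows the usual pre-commit convention: the hook fails once, the developer stages the fixed files, and the next run passes. The resolver is kept separate so that the network fetch and AST lookup from the script can be swapped for a stub in tests.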

@jaimergp
Member

I kind of like it 😂 You mean a pre-commit hook, right? As long as it's not flaky, yep, why not. We can also add a meta-test specifically for that, as you wish. Maybe that's easier than a pre-commit hook.

@agriyakhetarpal
Collaborator Author

> I kind of like it 😂 You mean a pre-commit hook, right? As long as it's not flaky, yep, why not. We can also add a meta-test specifically for that, as you wish. Maybe that's easier than a pre-commit hook.

Ah, yes. I do mean a pre-commit hook. I realise I missed writing that in the PR description after the word "custom" 😅 Let me explore if a meta-test is easier/better; maybe it is.

That said, the linters are not running in this PR. Could you please install the pre-commit.ci app on this repository for me?

@agriyakhetarpal
Collaborator Author

I spent some time adding a meta-test of sorts that runs a number of checks on the provenance data. This was tedious, so agentic coding tools came in handy here. However, I am aware that there is more code than we'd imagine; we could loosen the validations if you'd like. I like this better than a pre-commit hook, since it's executed with the tests.
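To give a sense of the kind of validation a meta-test like this can perform, here is a small sketch. It is not the test added in this PR: the specific checks, the `provenance_problems` helper, the `commit` key, and the example record are all assumptions made for illustration.

```python
import re

# A commit-pinned permalink with a line range into conda/conda.
URL_RE = re.compile(
    r"https://github\.com/conda/conda/blob/"
    r"(?P<commit>[0-9a-f]{40})/(?P<path>[^#]+)#L(?P<start>\d+)-L(?P<end>\d+)"
)


def provenance_problems(record: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the record looks fine."""
    problems = []
    node_id = record.get("node_id", "")
    if "::" not in node_id:
        problems.append("node_id has no '::' separator")
    match = URL_RE.fullmatch(record.get("url", ""))
    if match is None:
        problems.append("url is not a commit-pinned link with a line range")
        return problems
    if match["path"] != node_id.split("::", 1)[0]:
        problems.append("url path does not match the node_id file")
    if int(match["start"]) > int(match["end"]):
        problems.append("line range is reversed")
    if "commit" in record and record["commit"] != match["commit"]:
        problems.append("commit field does not match the commit in the url")
    return problems


def test_example_record_is_valid():
    # Hypothetical record; a real meta-test would load these from the YAML files.
    commit = "03329e0f4a627c9b9aa92ef34f7f93b9aa83e438"
    record = {
        "node_id": "tests/test_a.py::test_x",
        "commit": commit,
        "url": f"https://github.com/conda/conda/blob/{commit}/tests/test_a.py#L10-L20",
    }
    assert provenance_problems(record) == []
```

Collecting problems in a list rather than asserting on the first failure lets a single run report every stale or malformed record, and the checks need no network access.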
