
Add validation for include filters in CodeList #573

Open
dc-almeida wants to merge 20 commits into IAMconsortium:main from dc-almeida:feature/include-validation

Conversation

@dc-almeida
Contributor

@dc-almeida dc-almeida commented Mar 17, 2026

Closes #551. Introduce validation for include filters used in nomenclature.yaml for codelists and mappings to ensure that all specified filters correspond to existing codes.
For codelists, the include filter matches on code attributes; for mappings, it matches on model name.

@dc-almeida dc-almeida added the enhancement New feature or request label Mar 17, 2026
@dc-almeida dc-almeida self-assigned this Mar 17, 2026
@dc-almeida
Contributor Author

dc-almeida commented Mar 24, 2026

Added validation for mappings as a necessary step when reading nomenclature.yaml (raises if mapping files include regions not present in the external repos).
Logs a warning if no model mappings are found in an external repository, but doesn't raise (e.g. legacy-definitions).
Refactored RegionAggregationMapping instantiation from file when creating a RegionProcessor to raise an exception group instead of raising on the first exception.
Updated ruff because it was crashing in my VSCode and I realised it was quite an old version.
Added a test suggested by Copilot mocking fetch_repos which made me think might be a good method to apply in other tests to avoid the overhead of importing entire repos.
Added pydantic validation that closes #517

@dc-almeida dc-almeida marked this pull request as ready for review March 24, 2026 11:40
@phackstock
Contributor

As an aside to this PR, my two cents on mocking external resources:

Added a test suggested by Copilot mocking fetch_repos which made me think might be a good method to apply in other tests to avoid the overhead of importing entire repos.

I'm always torn when it comes to mocking external resources/services/etc... On the one hand I agree that in order to speed up the tests, it's a good idea to mock things. On the other hand sometimes what you want is precisely an integration type test where you actually test the interaction of your piece of code with external resources.
So in short, in my opinion, mocking can be a good idea but not always. Sometimes you can unknowingly skip bugs by not actually interacting with external services and resources.

Contributor

Copilot AI left a comment

Pull request overview

Adds validation to catch typos/mismatches in nomenclature.yaml include/exclude filters by ensuring include filters/patterns actually match existing codes/mappings, with accompanying regression tests.

Changes:

  • Add validation that repository include filters in CodeList match at least one code, raising an ExceptionGroup otherwise.
  • Add validation that mapping include patterns match at least one model in the mappings repository.
  • Add tests and new fixture config files to cover wrong nesting level, missing codes, missing hierarchies, and missing mappings.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| nomenclature/codelist.py | Validates include filters against loaded codes and updates filtering types/signature. |
| nomenclature/config.py | Adds config-level validation for mapping include patterns after fetching repos. |
| nomenclature/processor/region.py | Collects mapping parse errors from external repos instead of failing fast. |
| tests/test_codelist.py | Adds tests asserting missing-code/missing-hierarchy includes raise grouped errors. |
| tests/test_config.py | Adds tests for include/exclude at wrong YAML nesting level and missing mapping includes. |
| tests/data/config/*.yaml | New config fixtures for the new validation test cases. |
| pyproject.toml / poetry.lock | Updates ruff dev dependency and lockfile. |

Comment thread nomenclature/config.py Outdated
Comment thread tests/test_codelist.py Outdated
Comment thread tests/test_codelist.py Outdated
Comment thread nomenclature/codelist.py Outdated
Comment thread nomenclature/codelist.py
Comment thread nomenclature/codelist.py
@dc-almeida
Contributor Author

As an aside to this PR, my two cents on mocking external resources:

Added a test suggested by Copilot mocking fetch_repos which made me think might be a good method to apply in other tests to avoid the overhead of importing entire repos.

I'm always torn when it comes to mocking external resources/services/etc... On the one hand I agree that in order to speed up the tests, it's a good idea to mock things. On the other hand sometimes what you want is precisely an integration type test where you actually test the interaction of your piece of code with external resources. So in short, in my opinion, mocking can be a good idea but not always. Sometimes you can unknowingly skip bugs by not actually interacting with external services and resources.

Agreed. The suggestion here would be to apply this only to those tests that import external repos purely to access their data, where what matters is the data and not the connection.
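
As a rough illustration of that kind of test, the sketch below patches a `fetch_repos` method with `unittest.mock.patch.object` so that no repository is cloned. The `Config` class here is a hypothetical stand-in written for this example, not the package's actual class, and the repository name and URL are made up.

```python
from unittest import mock


class Config:
    """Hypothetical stand-in for a config object that clones repositories."""

    def __init__(self, repositories: dict[str, str]):
        self.repositories = repositories

    def fetch_repos(self, target_folder: str) -> None:
        # The real method would clone or update each repository here.
        raise RuntimeError("network access is not available in unit tests")

    def load(self, target_folder: str) -> list[str]:
        self.fetch_repos(target_folder)
        return sorted(self.repositories)


def test_load_without_network() -> None:
    config = Config({"example-repo": "https://example.com/repo.git"})
    # Replace fetch_repos on the class so no clone is attempted.
    with mock.patch.object(Config, "fetch_repos") as fetch:
        assert config.load("definitions") == ["example-repo"]
    fetch.assert_called_once_with("definitions")


test_load_without_network()
```

A test like this only exercises the logic downstream of the fetch; an integration test that really clones the repository would still be needed to cover the connection itself.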

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Comment thread nomenclature/codelist.py Outdated
Comment thread nomenclature/config.py Outdated
Comment thread nomenclature/config.py
Comment thread tests/test_config.py Outdated
dc-almeida and others added 4 commits March 25, 2026 12:23
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

@danielhuppmann danielhuppmann left a comment

I used this feature this morning when implementing iiasa/prisma-workflow#18, very useful! But the error messages could be more precise, see some suggestions inline. And it's not clear to me how models are validated.

Comment thread nomenclature/codelist.py Outdated
Comment thread tests/data/config/include_nonexistent_code.yaml Outdated
Comment thread tests/test_codelist.py Outdated
for exp in expected:
assert excinfo.group_contains(
ValueError,
match=r"No codes found for include filter: " + exp,
Member

Suggested change
- match=r"No codes found for include filter: " + exp,
+ match=r"No regions found for include-filter: " + exp,

Comment thread tests/test_codelist.py Outdated
)
try:
with pytest.RaisesGroup(
ValueError, ValueError, match="Include filter validation failed"
Member

Suggested change
- ValueError, ValueError, match="Include filter validation failed"
+ ValueError, ValueError, match="Importing regions from external repository failed"

Contributor Author

I think it will have to be a more generic "Importing definitions..." because other dimensions can also be used in the include filter and the method is applicable to any type of codelist. It still distinguishes definitions from mappings, and whether a name is a variable or a region will be self-evident.

Comment thread tests/test_codelist.py Outdated

try:
with pytest.RaisesGroup(
ValueError, match="Include filter validation failed"
Member

Suggested change
- ValueError, match="Include filter validation failed"
+ ValueError, match="Importing mappings from external repository failed"

Comment thread tests/test_codelist.py Outdated
for exp in expected:
assert excinfo.group_contains(
ValueError,
match=r"No codes found for include filter: " + exp,
Member

Suggested change
- match=r"No codes found for include filter: " + exp,
+ match=r"No mapping found for include-filter: " + exp,

Comment thread nomenclature/config.py
repo.fetch_repo(target_folder / repo_name)

def validate_mapping_includes(self) -> None:
"""Validate that all mapping include patterns match at least one model."""
Member

This is not clear to me, where is the list of models coming from?

Contributor Author

It comes from the all_models variable: the list of models is compiled by reading all mapping files in the external repos. The check then goes through all the models listed in the imported mapping files, and raises if any filter doesn't apply to any of them.

It's not the most efficient way (when a RegionProcessor is instantiated, the mapping files are read again), but without some big refactoring it was the only way I saw of raising the error as soon as possible (when nomenclature.yaml is read) and at a point where all the necessary info is available (the include filter and the repo mappings to match against).

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure that items included via nomenclature.yaml exist

4 participants