21 commits
a00b0bd
Add ProcessorConfig to nomenclature.yaml
dc-almeida Mar 2, 2026
36848fe
Add `ProcessorConfig` and update docstrings
dc-almeida Mar 5, 2026
124667f
Refactor aggregation in `RegionProcessor`
dc-almeida Mar 6, 2026
9de72d6
Fix type hint for models in NutsProcessor class
dc-almeida Mar 9, 2026
a346ded
Implement NutsProcessor for NUTS region aggregation in scenario proce…
dc-almeida Mar 10, 2026
6e36695
Add validation checks and improve aggregation logic in `NutsProcessor`
dc-almeida Mar 10, 2026
4f1ed40
Add first test case for `NutsProcessor`
dc-almeida Mar 10, 2026
3c70ad8
Add documentation for NutsProcessor class and its methods
dc-almeida Mar 11, 2026
bd01d41
Implement EU27 aggregation logic
dc-almeida Mar 16, 2026
990c6b1
Add tests for EU27(+UK) aggregation
dc-almeida Mar 16, 2026
7cb2fbc
Add initial version of API documentation
dc-almeida Mar 16, 2026
219381a
Allow `process` to instantiate processors from config file
dc-almeida Mar 17, 2026
96154a6
Update `process` docstring
dc-almeida Mar 17, 2026
0a88906
Add info log message for skipped processors
dc-almeida Mar 17, 2026
716542a
Update documentation for NutsProcessor
dc-almeida Mar 17, 2026
ebef5f3
Add configuration section for processors in nomenclature.yaml
dc-almeida Mar 17, 2026
970b6f4
Change log info to `ValueError` to avoid unintended behaviour
dc-almeida Apr 17, 2026
bfddede
Apply suggested changes
dc-almeida Apr 17, 2026
5e4f364
Update tests
dc-almeida Apr 17, 2026
bcb88a0
Improve NUTS aggregation example in documentation
dc-almeida Apr 17, 2026
79e593e
Add docstrings and fix formatting
dc-almeida Apr 17, 2026
2 changes: 1 addition & 1 deletion docs/api/countries.rst
@@ -6,7 +6,7 @@ A common list of countries
==========================

Having an agreed list of country names including a mapping to alpha-3 and alpha-2 codes
-(also know as ISO3 and ISO2 codes) is an important prerequisite for scenario analysis
+(also known as ISO3 and ISO2 codes) is an important prerequisite for scenario analysis
and model comparison.

The :class:`nomenclature` package builds on the :class:`pycountry` package
78 changes: 74 additions & 4 deletions docs/api/nuts.rst
@@ -23,8 +23,78 @@ The full list of NUTS regions is accessible via the Eurostat website (`xlsx, 500

from nomenclature import nuts

-# list of NUTS region codes
-nuts.codes
+# Access NUTS region information
+nuts.codes  # List of all NUTS codes
+nuts.names  # List of all NUTS region names

-# list of NUTS region names
-nuts.names
+# Query specific NUTS levels
+nuts.get(level=3)  # Get all NUTS3 regions
+
+# Query by country
+nuts.get(country_code="AT")  # Get all NUTS regions in Austria

.. currentmodule:: nomenclature.processor.nuts

**NutsProcessor**
=================

The :class:`NutsProcessor` class provides automated aggregation of scenario data
across NUTS regions. It performs hierarchical aggregation in the following order:

1. NUTS3 → NUTS2
2. NUTS2 → NUTS1
3. NUTS1 → Country
4. Country → European Union (if ≥ 23 of the 27 EU member states are present)
5. Country + UK → European Union and United Kingdom (if the United Kingdom is also present)

The EU-level aggregations (steps 4-5) are only performed if the corresponding
target regions (``European Union`` and ``European Union and United Kingdom``) are
defined in the project's region codelist. If fewer than 23 EU member states are
present in the data, the EU aggregation is skipped silently.

The processor ensures that regional data is consistently aggregated and validated
according to the configured NUTS regions and variable code lists.
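
The aggregation order listed above can be illustrated with a small stand-alone sketch. It relies only on the structure of NUTS codes (a two-letter country prefix followed by one character per NUTS level) and is not the ``NutsProcessor`` implementation, which operates on a ``pyam.IamDataFrame``. The helper name ``roll_up``, the plain summation, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def roll_up(nuts3_values):
    """Sum NUTS3 values into their NUTS2, NUTS1 and country-code ancestors.

    Hypothetical helper for illustration only: it assumes plain summation and
    reports the country by its two-letter code rather than by its name.
    """
    totals = defaultdict(float)
    for code, value in nuts3_values.items():
        if len(code) != 5:
            raise ValueError(f"Not a NUTS3 code: {code!r}")
        # Truncating the code moves up the hierarchy:
        # NUTS3 (5 chars) -> NUTS2 (4) -> NUTS1 (3) -> country (2)
        for length in (4, 3, 2):
            totals[code[:length]] += value
    return dict(totals)


# Made-up values for three Austrian NUTS3 regions
sample = {"AT111": 1.0, "AT112": 2.0, "AT121": 4.0}
aggregated = roll_up(sample)
print(aggregated)
```

Summing every NUTS3 value directly into each ancestor gives the same totals as aggregating level by level, which is why the sketch needs only one pass over the data.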

Consider the example below for configuring a project using NUTS aggregation.
The *nomenclature.yaml* in the project directory is as follows:

.. code:: yaml

  dimensions:
    - region
    - variable
  definitions:
    region:
      nuts:
        nuts-3: [ AT ]
      country: true
  processors:
    nuts: [ Model A ]

With this configuration, calling :func:`process` will automatically instantiate
and apply the :class:`NutsProcessor`.

.. code:: python

  import pyam
  from nomenclature import DataStructureDefinition, process

  df = pyam.IamDataFrame(data="path/to/file.csv")
  dsd = DataStructureDefinition("definitions")
  aggregated_data = process(df, dsd)

The data is aggregated for the applicable variables, creating the common region
``Austria`` (AT) from its constituent NUTS subregions.
The country-level regions must be defined in a region definition file or by setting
*definitions.region.country* to *true* in the configuration file
(see :ref:`adding-countries`).

.. note::

  Only models listed under ``processors.nuts`` in *nomenclature.yaml* are processed
  by :class:`NutsProcessor`. Data for other models is passed through unchanged.
  If a NUTS region appears in the data for a listed model but the corresponding
  country is missing from ``definitions.region.nuts``, a ``ValueError`` is raised.

.. autoclass:: NutsProcessor
  :members: from_definition, apply
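
The conditions for the EU-level steps (4 and 5) described in this file can be summarised in a short sketch. This is a reading of the documented rules, not the ``NutsProcessor`` code: the function name ``eu_targets`` is hypothetical, countries are identified by English name as an assumption, and the sketch assumes that the EU+UK aggregate also requires the 23-country threshold.

```python
# The 27 EU member states, identified by name (assumption for this sketch)
EU27 = frozenset({
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
    "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg",
    "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia",
    "Slovenia", "Spain", "Sweden",
})
EU_THRESHOLD = 23  # minimum number of member states, as documented above


def eu_targets(countries_present, defined_regions):
    """Return the EU-level regions that would be aggregated (hypothetical helper)."""
    present = set(countries_present)
    if len(present & EU27) < EU_THRESHOLD:
        return []  # too few member states: EU aggregation is skipped
    targets = []
    # A target is only produced if it is defined in the region codelist
    if "European Union" in defined_regions:
        targets.append("European Union")
    if (
        "United Kingdom" in present
        and "European Union and United Kingdom" in defined_regions
    ):
        targets.append("European Union and United Kingdom")
    return targets


members = sorted(EU27)
codelist = {"European Union", "European Union and United Kingdom"}
print(eu_targets(members[:22], codelist))
print(eu_targets(members + ["United Kingdom"], codelist))
```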
41 changes: 41 additions & 0 deletions docs/user_guide/config.rst
@@ -114,6 +114,8 @@ the nomenclature package will add all countries to the *region* codelist.

More details on the list of countries can be found here: :ref:`countries`.

.. _adding-countries:

Adding NUTS to the region codelist
----------------------------------

@@ -174,3 +176,42 @@ the filtering for definitions.

The above example retrieves only the model mapping for *MESSAGEix-GLOBIOM 2.1-M-R12*
from the common-definitions repository.

Configuring processors
----------------------

The ``processors`` section of *nomenclature.yaml* allows processors to be declared
directly in the configuration file, so they are applied automatically when calling
:func:`process` without passing an explicit ``processor`` argument.

Region processor
^^^^^^^^^^^^^^^^

Setting *processors.region-processor* to *true* will automatically create a
:class:`RegionProcessor` from the project's default ``mappings/`` directory:

.. code:: yaml

  processors:
    region-processor: true

This is equivalent to calling:

.. code:: python

  from nomenclature.processor import RegionProcessor
  processor = RegionProcessor.from_directory("mappings", dsd)

NUTS processor
^^^^^^^^^^^^^^

Setting *processors.nuts* to a list of model names will automatically create a
:class:`NutsProcessor` and apply NUTS hierarchical aggregation (NUTS3 → NUTS2 →
NUTS1 → Country → EU27) for those models:

.. code:: yaml

  processors:
    nuts: [ Model A, Model B ]

More details on NUTS aggregation can be found here: :ref:`nuts`.
1 change: 1 addition & 0 deletions nomenclature/__init__.py
@@ -14,6 +14,7 @@
from nomenclature.nuts import nuts  # noqa
from nomenclature.processor import (  # noqa
    RegionAggregationMapping,  # noqa
    NutsProcessor,
    RegionProcessor,
    RequiredDataValidator,
)
64 changes: 58 additions & 6 deletions nomenclature/config.py
@@ -27,12 +27,24 @@


class CodeListFromRepository(BaseModel):
    """
    Configuration for a codelist from an external repository.

    The `include` and `exclude` filters allow selecting which definitions to import.
    """

    name: str
    include: list[dict[str, Any]] = [{"name": "*"}]
    exclude: list[dict[str, Any]] = Field(default_factory=list)


class CodeListConfig(BaseModel):
    """Configuration for a dimension's codelist.

    This class lists external repositories for codelists, importing definitions
    from remote sources.
    """

    dimension: str | None = None
    repositories: list[CodeListFromRepository] = Field(
        default_factory=list, alias="repository"
@@ -60,6 +72,12 @@ def repository_dimension_path(self) -> str:


class RegionCodeListConfig(CodeListConfig):
    """
    Configuration for a region's codelist.

    This class allows importing the definitions for ISO3 countries and NUTS regions.
    """

    country: bool = False
    nuts: dict[str, str | list[str] | bool] | None = None

@@ -77,11 +95,12 @@ def check_nuts(


class Repository(BaseModel):
    """Configuration for an external codelist repository."""

    url: str
    hash: str | None = None
    release: str | None = None
    local_path: Path | None = Field(default=None, validate_default=True)
-    # defined via the `repository` name in the configuration

    @model_validator(mode="after")
    @classmethod
@@ -150,13 +169,16 @@ def check_external_repo_double_stacking(self):


class DataStructureConfig(BaseModel):
-    """A class for configuration of a DataStructureDefinition
+    """
+    Configuration class for the data structure definition.

-    Attributes
-    ----------
-    region : RegionCodeListConfig
-        Attributes for configuring the RegionCodeList
+    This class defines the configuration for the main IAMC dimensions:
+    - model
+    - scenario
+    - region
+    - variable

+    Each dimension can be configured with its own code list and repository sources.
    """

    model: CodeListConfig = Field(default_factory=CodeListConfig)
@@ -179,6 +201,8 @@ def repos(self) -> dict[str, str]:


class MappingRepository(BaseModel):
    """Configuration for a mapping repository."""

    name: str
    include: list[str] = ["*"]

@@ -196,6 +220,8 @@ def match_models(self, models: list[str]) -> list[str]:


class RegionMappingConfig(BaseModel):
    """Configuration for region mapping/aggregation external repositories."""

    repositories: list[MappingRepository] = Field(
        default_factory=list, alias="repository"
    )
@@ -217,7 +243,20 @@ def convert_to_set_of_repos(cls, v):
        return v


class ProcessorConfig(BaseModel):
    """Configuration for region processor settings."""

    nuts: list[str] | None = None
    region_processor: bool = Field(False, alias="region-processor")

    model_config = ConfigDict(
        validate_by_name=True, validate_by_alias=True, extra="forbid"
    )


class TimeDomainConfig(BaseModel):
    """Configuration for time domain validation settings."""

    year_allowed: bool = Field(default=True, alias="year")
    datetime_allowed: bool = Field(default=False, alias="datetime")
    timezone: str | None = Field(
@@ -305,6 +344,9 @@ class NomenclatureConfig(BaseModel):
    repositories: dict[str, Repository] = Field(default_factory=dict)
    definitions: DataStructureConfig = Field(default_factory=DataStructureConfig)
    mappings: RegionMappingConfig = Field(default_factory=RegionMappingConfig)
    processor: ProcessorConfig = Field(
        default_factory=ProcessorConfig, alias="processors"
    )
    illegal_characters: list[str] = Field(
        default=[":", ";", '"'], alias="illegal-characters"
    )
@@ -326,6 +368,7 @@ def check_illegal_chars(cls, v: str | list[str]) -> list[str]:
    def check_definitions_repository(
        cls, v: "NomenclatureConfig"
    ) -> "NomenclatureConfig":
        """Check that all repositories referenced in definitions and mappings exist."""
        mapping_repos = {"mappings": v.mappings.repositories} if v.mappings else {}
        repos: dict[str, list[MappingRepository]] = {
            **v.definitions.repos,
@@ -337,6 +380,15 @@ def check_definitions_repository(
                raise ValueError((f"Unknown repository {unknown_repos} in '{use}'."))
        return v

    @model_validator(mode="after")
    @classmethod
    def check_nuts_consistency(cls, v: "NomenclatureConfig") -> "NomenclatureConfig":
        if v.processor and v.processor.nuts and not v.definitions.region.nuts:
            raise ValueError(
                "`nuts` region processor set but no NUTS regions in `definitions`."
            )
        return v

    def fetch_repos(self, target_folder: Path):
        for repo_name, repo in self.repositories.items():
            repo.fetch_repo(target_folder / repo_name)
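
The behaviour that this file adds for the `processors` section (hyphenated key alias, rejection of unknown keys, and the NUTS consistency check) can be sketched without pydantic. The following is a standard-library analogue, not the code in this PR: `ProcessorSettings`, `parse_processors` and `KEY_TO_FIELD` are hypothetical names, and the sketch accepts only the YAML spelling of each key.

```python
from dataclasses import dataclass
from typing import Optional

# Maps YAML keys to attribute names, mimicking the "region-processor" alias
KEY_TO_FIELD = {"nuts": "nuts", "region-processor": "region_processor"}


@dataclass
class ProcessorSettings:
    """Stand-in for the two fields of `ProcessorConfig` (hypothetical name)."""

    nuts: Optional[list] = None
    region_processor: bool = False


def parse_processors(raw: dict) -> ProcessorSettings:
    """Build settings from the `processors` mapping, rejecting unknown keys."""
    unknown = sorted(set(raw) - set(KEY_TO_FIELD))
    if unknown:
        # Comparable in spirit to `extra="forbid"` on the pydantic model
        raise ValueError(f"Unknown processor option(s): {unknown}")
    return ProcessorSettings(**{KEY_TO_FIELD[k]: v for k, v in raw.items()})


def check_nuts_consistency(settings: ProcessorSettings, region_nuts: Optional[dict]) -> None:
    """Raise if NUTS processing is requested without NUTS region definitions."""
    if settings.nuts and not region_nuts:
        raise ValueError(
            "`nuts` region processor set but no NUTS regions in `definitions`."
        )


settings = parse_processors({"nuts": ["Model A"], "region-processor": True})
check_nuts_consistency(settings, {"nuts-3": ["AT"]})  # consistent: no error
```

Running the consistency check at configuration time means a missing `definitions.region.nuts` entry is reported when the project is loaded rather than when data is processed.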
43 changes: 34 additions & 9 deletions nomenclature/core.py
@@ -5,6 +5,7 @@

from nomenclature.definition import DataStructureDefinition
from nomenclature.processor import Processor, RegionProcessor
from nomenclature.processor.nuts import NutsProcessor

logger = logging.getLogger(__name__)

@@ -21,11 +22,13 @@ def process(
    This function is the recommended way of using the nomenclature package. It performs
    the following operations:

-    * Validation against the codelists and criteria of a DataStructureDefinition
-    * Region-processing, which can consist of three parts:
-      1. Model native regions not listed in the model mapping will be dropped
-      2. Model native regions can be renamed
-      3. Aggregation from model native regions to "common regions"
+    * Validation against the codelists and criteria of a :class:`DataStructureDefinition`
+    * Region processing, which can occur via one or more :class:`Processor` instances. This can be:
+      * Region aggregation (via :class:`RegionProcessor`), which renames and aggregates based on user-provided mappings.
+        1. Model native regions not listed in the model mapping will be dropped
+        2. Model native regions can be renamed
+        3. Aggregation from model native regions to "common regions"
+      * NUTS aggregation (via :class:`NutsProcessor`), which aggregates NUTS3 -> NUTS2 -> NUTS1 -> Country -> EU27(+UK)
    * Validation of consistency across the variable hierarchy

Parameters
@@ -36,9 +39,9 @@ def process(
        Codelists that are used for validation.
    dimensions : list, optional
        Dimensions to be used in the validation, defaults to all dimensions defined in
-        `dsd`
-    processor : :class:`RegionProcessor`, optional
-        Region processor to perform region renaming and aggregation (if given)
+        ``dsd``.
+    processor : :class:`Processor` or list of :class:`Processor`, optional
+        One or more processors to apply. These run before any config-declared processors.

    Returns
    -------
@@ -56,8 +59,30 @@ def process(

    dimensions = dimensions or dsd.dimensions

    # Auto-instantiate processors declared in nomenclature.yaml under 'processors'
    # Explicit processors take precedence; config-based ones are appended after.
    if dsd.config.processor.region_processor:
        if any(isinstance(p, RegionProcessor) for p in processor):
            logger.info(
                "Config declares 'region-processor: true' but an explicit "
                "RegionProcessor was provided -- skipping config-defined processor."
            )
        else:
            processor = processor + [
                RegionProcessor.from_directory(dsd.project_folder / "mappings", dsd)
            ]

    if dsd.config.processor.nuts is not None:
        if any(isinstance(p, NutsProcessor) for p in processor):
            logger.info(
                "Config declares 'nuts' processor but an explicit NutsProcessor "
                "was provided -- skipping config-defined processor."
            )
        else:
            processor = processor + [NutsProcessor.from_definition(dsd)]

    if (
-        any(isinstance(p, RegionProcessor) for p in processor)
+        any(isinstance(p, (RegionProcessor, NutsProcessor)) for p in processor)
        and "region" in dimensions
    ):
        dimensions.remove("region")
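
The ordering rule in this hunk (explicit processors first, config-declared ones appended unless a processor of the same type was already passed) can be isolated in a small sketch. It mirrors the skip behaviour shown in this version of the diff and uses stub classes in place of the real processors; all `Stub...` names and `merge_processors` are hypothetical.

```python
class StubProcessor:
    """Hypothetical stand-in for `Processor`."""


class StubRegionProcessor(StubProcessor):
    """Hypothetical stand-in for `RegionProcessor`."""


class StubNutsProcessor(StubProcessor):
    """Hypothetical stand-in for `NutsProcessor`."""


def merge_processors(explicit, region_from_config, nuts_models):
    """Return explicit processors followed by config-declared ones.

    A config-declared processor is skipped if an explicit processor of the
    same type is already in the list.
    """
    merged = list(explicit or [])
    if region_from_config and not any(
        isinstance(p, StubRegionProcessor) for p in merged
    ):
        merged.append(StubRegionProcessor())
    if nuts_models is not None and not any(
        isinstance(p, StubNutsProcessor) for p in merged
    ):
        merged.append(StubNutsProcessor())
    return merged


explicit_nuts = StubNutsProcessor()
result = merge_processors(
    [explicit_nuts], region_from_config=True, nuts_models=["Model A"]
)
print([type(p).__name__ for p in result])
```

Building a new list instead of appending to the caller's list keeps the explicit `processor` argument unmodified, in the same way the hunk uses `processor + [...]` rather than `append`.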
1 change: 1 addition & 0 deletions nomenclature/processor/__init__.py
@@ -3,6 +3,7 @@
    RegionAggregationMapping,
    RegionProcessor,
)
from nomenclature.processor.nuts import NutsProcessor # noqa
from nomenclature.processor.required_data import RequiredDataValidator # noqa
from nomenclature.processor.data_validator import DataValidator # noqa
from nomenclature.processor.aggregator import Aggregator # noqa