1,733 changes: 1,733 additions & 0 deletions demo/extras/filter_tests.ipynb

Large diffs are not rendered by default.

Binary file added demo/figures/semantic_temp_filter_I.png
Binary file added demo/figures/semantic_temp_filter_II.png
31 changes: 28 additions & 3 deletions demo/processor.ipynb
@@ -3131,6 +3131,31 @@
"It should be noted that in our demos only data loaded from locally stored GeoTIFF files are analysed. This is sort of the worst case for demonstrating the benefits of caching since the data is stored locally and is therefore quickly accessible. Keep in mind, however, that caching is designed for and particularly beneficial in case of STACCubes when loading data over the internet."
]
},
{
"cell_type": "markdown",
"id": "6b429e49",
"metadata": {},
"source": [
"## Filtering data layers temporally\n",
"\n",
"The execution of the QueryProcessor via `recipe.execute()` is optionally preceded by a FilterProcessor, which is switched on by default (`filter_check=True`). The FilterProcessor evaluates the recipe for temporal filter operations in order to determine which data actually needs to be loaded to calculate the results. Without the FilterProcessor, the temporal extent of the loaded data is determined exclusively by the context parameter `time` that is passed to the QueryProcessor. If the data is subset further in time within the recipe, all data is first loaded and only then filtered temporally. The FilterProcessor instead evaluates the recipe semantically and pre-filters the metadata to the temporal extent that is actually required. To do so, it evaluates the entire recipe for temporal filter operations and keeps track of the data layers affected by them. If a recipe contains several results, the union of the temporal extents required for all results is determined, which gives the minimum extent required to evaluate the entire recipe. Some specifics of how the FilterProcessor works are illustrated in the following figures.\n",
"\n",
"* The order of filter operations within a recipe is generally irrelevant when they are chained with other verbs. This means that in both cases 1.a) and 1.b), the temporal filter is recognised and taken into account when the data is later loaded via the QueryProcessor.\n",
"* An exception to the irrelevance of verb order described above are the verbs `smooth`, `shift` and `fill`. These verbs can directly change the temporal extent of the data to be loaded by including temporally neighbouring values. However, which values are temporally neighbouring cannot be determined without an actual content-based evaluation of the recipe. For example, depending on a previous semantic filter operation (e.g. according to the entity clouds), the next non-null value may lie at a different distance in time. Since a universal evaluation of the temporal effect of these verbs is therefore not possible, the FilterProcessor can't analyse them or any subsequent verbs with regard to their filter effect. As shown in 2.a) & 2.b), this means that, when creating recipes, temporal filter operations must be placed before these verbs so that they are taken into account by the FilterProcessor.\n",
"\n",
"![FilterProcessor - temporal filtering, part A](figures/semantic_temp_filter_I.png)\n",
"\n",
"* Because of how some verbs work in general, when several data layers are processed (e.g. by merging different entities), a data layer can also be filtered indirectly by a temporal filter in the other part of the recipe. This is possible with the verbs `filter`, `groupby`, `evaluate`, `concatenate` and `merge`, as shown in 3.a). There, the data in the part of the recipe that has not yet been filtered is automatically filtered in time as well (partly via implicit align). This only happens if the data used for filtering, grouping, evaluating, concatenating or merging still has timestamps itself. If these no longer exist, e.g. after a `reduce` as shown in 3.b), there is no implicit filter effect either. In this case, the temporal filter operation must be called explicitly in both parts of the recipe so that all loaded data has a reduced temporal extent, which is then recognised by the FilterProcessor (3.c).\n",
"\n",
"![FilterProcessor - temporal filtering, part B](figures/semantic_temp_filter_II.png)\n",
"\n",
"Some other things to be considered:\n",
"* The FilterProcessor evaluates all forms of temporal filter operations (regardless of whether the operation is called via the shortcut `filter_time()` or via `filter(sq.self().extract().evaluate())`).\n",
"* By definition, `filter_time` does not remove the filtered coordinates but sets them to NaN. To actually remove them, a `trim` operation needs to follow `filter_time`. The FilterProcessor looks at the recipe's results and treats them as if a final `trim` operation were called. This means that coordinates with NaNs caused by temporal filters are filtered out even though no explicit `trim` has been included in the recipe. Users who want to filter the data temporally but keep the filtered data as NaNs need to disable the FilterProcessor evaluation (`filter_check=False`). Evaluating a hypothetical `trim` at the very end of a recipe's result is a design choice that matches the common case in which users do not care about NaNs. The fact that the `trim` is evaluated as a final operation (and not as a direct successor of `filter_time` calls) ensures that other operations which rely on the existence of the NaNs are not affected. For example, if the user calls `extract_time` after the temporal filter to get the timestamps of all data points (including those where the data is filtered out but not trimmed), the FilterProcessor won't modify the result of `extract_time`, since the NaNs aren't filtered out until the very end. The final `trim` ensures that NaNs are considered superfluous only if they are part of the recipe's result, not if they occur somewhere along the way. This also implies that users can benefit from making a manual `trim` call after a temporal filter if they want to be certain that the intermediate processing result is trimmed right away.\n",
"* Custom verbs are generally assumed not to change the temporality of the data to be loaded. This means that the FilterProcessor does not treat custom verbs as terminating verbs (such as `smooth`, `shift` and `fill`), and it does not evaluate them in terms of their potential effect on temporal filtering. If a custom verb does modify the temporality of the data, it is up to the user to switch off the FilterProcessor (`filter_check=False`) so that the data is not pre-filtered incorrectly.\n",
"* Recipe results representing spatial dimensions (something produced by calling `extract_space`, which is not processed further e.g. by evaluating it in a filter operation) are ignored in the FilterProcessor evaluation. Temporal filters are not considered here as those results are completely time-independent."
]
},
{
"cell_type": "markdown",
"id": "bc13ea19",
@@ -3177,9 +3202,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:semantique]",
"display_name": "semantique",
"language": "python",
"name": "conda-env-semantique-py"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -3191,7 +3216,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.1"
}
},
"nbformat": 4,
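Note on the "Filtering data layers temporally" cell added to demo/processor.ipynb: the ordering rule and the union over several results can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the semantique implementation. The names `TERMINATING_VERBS`, `required_years` and `required_years_for_recipe`, as well as the representation of a result as a list of `(verb, argument)` tuples with years as time steps, are hypothetical and chosen purely for illustration.

```python
# Toy model (hypothetical, not the semantique API): each recipe result is a
# list of (verb, argument) tuples applied to one data layer.
TERMINATING_VERBS = {"smooth", "shift", "fill"}


def required_years(chain, context_years):
    """Return the years that must be loaded for a single verb chain."""
    years = set(context_years)
    for verb, arg in chain:
        if verb in TERMINATING_VERBS:
            # These verbs may pull in temporal neighbours, so the static
            # evaluation stops here and later filters are ignored.
            break
        if verb == "filter_time":
            # A temporal filter narrows the extent that has to be loaded.
            years &= set(arg)
    return years


def required_years_for_recipe(recipe, context_years):
    """Return the union of the extents needed by all results of a recipe."""
    needed = set()
    for chain in recipe.values():
        needed |= required_years(chain, context_years)
    return needed


context = range(2018, 2024)  # stands in for the `time` context parameter
recipe = {
    # Filter before or after a non-terminating verb: both are recognised.
    "filter_first": [("filter_time", {2020, 2021}), ("evaluate", None)],
    "filter_last": [("evaluate", None), ("filter_time", {2020, 2021})],
    # Filter placed before a terminating verb: still recognised.
    "filter_then_smooth": [("filter_time", {2022}), ("smooth", None)],
}
# Filter placed after a terminating verb: not recognised, full extent needed.
smooth_then_filter = [("smooth", None), ("filter_time", {2022})]

print(sorted(required_years_for_recipe(recipe, context)))
print(sorted(required_years(smooth_then_filter, context)))
```

In this model the first three results together require only the union of their filtered years, while a chain that places the temporal filter after `smooth` falls back to the full context extent. This mirrors the notebook's advice to place temporal filters before `smooth`, `shift` and `fill`.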