-
Notifications
You must be signed in to change notification settings - Fork 47
Support one-file open_dataset #1479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -353,7 +353,7 @@ def list_grid_names( | |
|
|
||
| def open_dataset( | ||
| grid_filename_or_obj: str | os.PathLike[Any] | dict | Dataset, | ||
| filename_or_obj: str | os.PathLike[Any], | ||
| filename_or_obj: str | os.PathLike[Any] | None = None, | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While technically there is nothing wrong with this, I'd like to share a few thoughts on use-case logic:
|
||
| chunks=None, | ||
| chunk_grid: bool = True, | ||
| use_dual: bool | None = False, | ||
|
|
@@ -368,10 +368,12 @@ def open_dataset( | |
| Strings and Path objects are interpreted as a path to a grid file. Xarray Datasets assume that | ||
| each member variable is in the UGRID conventions and will be used to create a ``ux.Grid``. Similarly, a dictionary | ||
| containing UGRID variables can be used to create a ``ux.Grid`` | ||
| filename_or_obj : str | os.PathLike[Any] | ||
| filename_or_obj : str | os.PathLike[Any], optional | ||
| String or Path object as a path to a netCDF file or an OpenDAP URL that | ||
| stores the actual data set. It is the same ``filename_or_obj`` in | ||
| ``xarray.open_dataset``. | ||
| ``xarray.open_dataset``. If omitted, ``grid_filename_or_obj`` is also | ||
| used as the data source, allowing combined grid-and-data files to be | ||
| opened with a single argument. | ||
| chunks : int, dict, 'auto' or None, default: None | ||
| If provided, used to load the grid into dask arrays. | ||
|
|
||
|
|
@@ -406,17 +408,50 @@ def open_dataset( | |
|
|
||
| >>> import uxarray as ux | ||
| >>> ux_ds = ux.open_dataset("grid_file.nc", "data_file.nc") | ||
|
|
||
| Open a dataset stored in a single combined grid-and-data file | ||
|
|
||
| >>> ux_ds = ux.open_dataset("combined_file.nc") | ||
| """ | ||
| if grid_kwargs is None: | ||
| grid_kwargs = {} | ||
|
|
||
| # Construct a Grid, validate parameters, and correct chunks | ||
| uxgrid, corrected_chunks = _get_grid( | ||
| grid_filename_or_obj, chunks, chunk_grid, use_dual, grid_kwargs, **kwargs | ||
| ) | ||
| if filename_or_obj is None: | ||
| if isinstance(grid_filename_or_obj, (str, os.PathLike)): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Limiting what was originally possible (dict, xr.dataset) doesn't look like a great idea. I am not sure if we could support dict with this, but I am pretty sure we should for xr.dataset. I think the main check should be on whether a good Grid object could be opened out of the file object here or not. |
||
| if os.path.isdir(grid_filename_or_obj): | ||
| raise ValueError( | ||
| "Directory-based grids require a separate data file when calling ux.open_dataset()." | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this to cover the multiple grid files case? |
||
| ) | ||
|
|
||
| # Load the data as a Xarray Dataset | ||
| ds = _open_dataset_with_fallback(filename_or_obj, chunks=corrected_chunks, **kwargs) | ||
| filename_or_obj = grid_filename_or_obj | ||
|
|
||
| if "latlon" in kwargs: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this check? |
||
| warn( | ||
| "'latlon is no longer a supported parameter", | ||
| DeprecationWarning, | ||
| stacklevel=2, | ||
| ) | ||
| grid_kwargs["latlon"] = kwargs["latlon"] | ||
|
|
||
| corrected_chunks = match_chunks_to_ugrid(grid_filename_or_obj, chunks) | ||
| ds = _open_dataset_with_fallback( | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am a bit confused about |
||
| filename_or_obj, chunks=corrected_chunks, **kwargs | ||
| ) | ||
| uxgrid = open_grid(ds, use_dual=use_dual, **grid_kwargs) | ||
| else: | ||
| raise ValueError( | ||
| "If filename_or_obj is omitted, grid_filename_or_obj must be a file path." | ||
| ) | ||
| else: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe rather than splitting the whole logic into two parts as it is in this if-else block now, we'd only need to split the grid opening task as follows:
|
||
| # Construct a Grid, validate parameters, and correct chunks | ||
| uxgrid, corrected_chunks = _get_grid( | ||
| grid_filename_or_obj, chunks, chunk_grid, use_dual, grid_kwargs, **kwargs | ||
| ) | ||
|
|
||
| # Load the data as a Xarray Dataset | ||
| ds = _open_dataset_with_fallback( | ||
| filename_or_obj, chunks=corrected_chunks, **kwargs | ||
| ) | ||
|
|
||
| # Map original dimensions to the UGRID conventions | ||
| ds = _map_dims_to_ugrid(ds, uxgrid._source_dims_dict, uxgrid) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need another test case to make sure the single file has proper grid definitions in it (i.e. If not throw exception)