Pull request overview
Adds support for opening combined grid-and-data files via a single-argument ux.open_dataset(file) call, aligning with Issue #345’s request to handle datasets stored in one file.
Changes:
- Made filename_or_obj optional in uxarray.core.api.open_dataset, defaulting to the grid file path when omitted.
- Updated the open_dataset docstring to document and exemplify the one-file usage.
- Added a unit test covering the new one-argument open_dataset behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uxarray/core/api.py | Makes filename_or_obj optional and adds logic/docs for single-file grid+data opening. |
| test/core/test_api.py | Adds a regression test for calling ux.open_dataset() with a single argument. |
    if filename_or_obj is None:
        if isinstance(grid_filename_or_obj, (str, os.PathLike)):
            filename_or_obj = grid_filename_or_obj
        else:
            raise ValueError(
                "If filename_or_obj is omitted, grid_filename_or_obj must be a file path."
            )
When filename_or_obj is omitted, this sets it to grid_filename_or_obj for any path-like input. If grid_filename_or_obj is a directory (supported by open_grid for FESOM2 ASCII grids), _open_dataset_with_fallback() will later try to open the directory as a NetCDF file and fail with a confusing xarray error. Consider explicitly detecting os.path.isdir(grid_filename_or_obj) here and raising a clear error that a separate data file is required for directory-based grids (or otherwise handling that case).
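A minimal sketch of the suggested guard, written as a standalone function so it can be run in isolation. The helper name `resolve_data_source` and the wording of the directory error are assumptions for illustration only; the actual check would live inside `open_dataset` and may be phrased differently.

```python
import os


def resolve_data_source(grid_filename_or_obj, filename_or_obj=None):
    """Hypothetical helper: choose the data source for the one-argument mode."""
    # An explicit data source always wins.
    if filename_or_obj is not None:
        return filename_or_obj

    # Only path-like grid inputs can double as the data source.
    if not isinstance(grid_filename_or_obj, (str, os.PathLike)):
        raise ValueError(
            "If filename_or_obj is omitted, grid_filename_or_obj must be a file path."
        )

    # Directory-based grids cannot be reopened as a data file, so fail early
    # with a clear message (assumed wording) instead of a later backend error.
    if os.path.isdir(grid_filename_or_obj):
        raise ValueError(
            "grid_filename_or_obj is a directory; pass a separate data file "
            "as filename_or_obj for directory-based grids."
        )

    return grid_filename_or_obj
```

With this shape, a directory passed on its own raises immediately, while a file path is returned unchanged and reused as the data source.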
    if filename_or_obj is None:
        if isinstance(grid_filename_or_obj, (str, os.PathLike)):
            filename_or_obj = grid_filename_or_obj
        else:
With the new one-argument mode, the same file will be opened once to construct the grid and then opened again to load the data. For large combined files this can add noticeable I/O overhead. If feasible, consider reusing a single opened xr.Dataset when the grid and data source are the same path (e.g., open once, build Grid from that dataset, then wrap it as the data dataset).
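The open-once idea can be sketched independently of uxarray's internals. In the example below, `open_grid_and_data` and its `opener` parameter are hypothetical: `opener` stands in for whatever function actually opens a file (for instance something like `xr.open_dataset`), and comparing normalized paths is only one assumed way of deciding that the grid and data sources are the same.

```python
import os


def open_grid_and_data(grid_path, data_path, opener):
    """Hypothetical sketch: open a combined file once and reuse the handle."""
    grid_ds = opener(grid_path)

    # Treat the two sources as identical when their absolute paths match.
    same_source = os.path.abspath(os.fspath(grid_path)) == os.path.abspath(
        os.fspath(data_path)
    )

    # Reuse the already-opened dataset instead of reading the file twice.
    data_ds = grid_ds if same_source else opener(data_path)
    return grid_ds, data_ds
```

When both paths point to the same file, `opener` is called once and both return values refer to the same object; otherwise each file is opened separately as before.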
    file_path = gridpath("mpas", "QU", "mesh.QU.1920km.151026.nc")

    uxds_single = ux.open_dataset(file_path)
    uxds_pair = ux.open_dataset(file_path, file_path)
This test is intended to validate opening a combined grid-and-data file, but it uses mesh.QU.1920km.151026.nc, which appears to be a mesh-only file (e.g., it lacks typical output variables like ssh, while oQU480.231010.nc includes them). Consider switching the fixture to a file that definitely contains both topology and data (such as oQU480.231010.nc) and/or asserting the presence of at least one known data variable to ensure the one-argument path truly covers the combined-file use case.
Suggested change:

    - file_path = gridpath("mpas", "QU", "mesh.QU.1920km.151026.nc")
    - uxds_single = ux.open_dataset(file_path)
    - uxds_pair = ux.open_dataset(file_path, file_path)
    + # Use a known combined grid-and-data MPAS file
    + file_path = gridpath("mpas", "oQU480", "oQU480.231010.nc")
    + uxds_single = ux.open_dataset(file_path)
    + uxds_pair = ux.open_dataset(file_path, file_path)
    + # Ensure that the single-argument path actually loads data variables
    + assert len(uxds_single.data_vars) > 0
Closes #345 by allowing ux.open_dataset(file) for combined grid-and-data files.