Support one-file open_dataset #1479

Draft
rajeeja wants to merge 1 commit into main from rajeeja/onefile_open_dataset

rajeeja commented Mar 30, 2026

Closes #345 by allowing ux.open_dataset(file) for combined grid-and-data files.

Copilot AI left a comment

Pull request overview

Adds support for opening combined grid-and-data files via a single-argument ux.open_dataset(file) call, aligning with Issue #345’s request to handle datasets stored in one file.

Changes:

  • Made filename_or_obj optional in uxarray.core.api.open_dataset, defaulting to the grid file path when omitted.
  • Updated open_dataset docstring to document and exemplify the one-file usage.
  • Added a unit test covering the new one-argument open_dataset behavior.
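
The defaulting described in the first bullet can be pictured with a small stand-in. This is only a sketch: `resolve_sources` is a hypothetical helper, not uxarray's code, and it opens nothing. It mirrors the behavior quoted in this review, where an omitted `filename_or_obj` falls back to the grid path.

```python
import os


def resolve_sources(grid_filename_or_obj, filename_or_obj=None):
    """Return (grid_source, data_source) using the one-file fallback.

    Hypothetical stand-in for the defaulting step only.
    """
    if filename_or_obj is None:
        # One-argument form: only a file path can double as the data source.
        if not isinstance(grid_filename_or_obj, (str, os.PathLike)):
            raise ValueError(
                "If filename_or_obj is omitted, grid_filename_or_obj must be a file path."
            )
        filename_or_obj = grid_filename_or_obj
    return grid_filename_or_obj, filename_or_obj
```

Under this sketch, `resolve_sources("combined.nc")` and `resolve_sources("combined.nc", "combined.nc")` return the same pair, which is the equivalence the new unit test checks.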

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| uxarray/core/api.py | Makes filename_or_obj optional and adds logic/docs for single-file grid+data opening. |
| test/core/test_api.py | Adds a regression test for calling ux.open_dataset() with a single argument. |

Comment on lines +419 to +425
if filename_or_obj is None:
    if isinstance(grid_filename_or_obj, (str, os.PathLike)):
        filename_or_obj = grid_filename_or_obj
    else:
        raise ValueError(
            "If filename_or_obj is omitted, grid_filename_or_obj must be a file path."
        )

Copilot AI Apr 1, 2026

When filename_or_obj is omitted, this sets it to grid_filename_or_obj for any path-like input. If grid_filename_or_obj is a directory (supported by open_grid for FESOM2 ASCII grids), _open_dataset_with_fallback() will later try to open the directory as a NetCDF file and fail with a confusing xarray error. Consider explicitly detecting os.path.isdir(grid_filename_or_obj) here and raising a clear error that a separate data file is required for directory-based grids (or otherwise handling that case).
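
One way the suggested guard might look, as a sketch rather than a proposed patch: `require_data_file` is a hypothetical helper name, and the error wording is illustrative. It relies only on `os.path.isdir` and `os.fspath` from the standard library.

```python
import os


def require_data_file(path):
    """Return `path` unchanged, or raise if it points at a directory.

    Hypothetical helper: directory-based grids cannot double as the
    data source, so the one-argument form is rejected early.
    """
    if os.path.isdir(path):
        raise ValueError(
            f"{os.fspath(path)!r} is a directory-based grid; "
            "pass a separate data file as filename_or_obj."
        )
    return path
```

Calling such a check before the fallback assignment would surface the problem at the API boundary instead of inside the NetCDF reader.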

Comment on lines +419 to +422
if filename_or_obj is None:
    if isinstance(grid_filename_or_obj, (str, os.PathLike)):
        filename_or_obj = grid_filename_or_obj
    else:

Copilot AI Apr 1, 2026

With the new one-argument mode, the same file will be opened once to construct the grid and then opened again to load the data. For large combined files this can add noticeable I/O overhead. If feasible, consider reusing a single opened xr.Dataset when the grid and data source are the same path (e.g., open once, build Grid from that dataset, then wrap it as the data dataset).
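
The open-once idea could be sketched as follows. Everything here is an assumption for illustration: `open_grid_and_data` is a hypothetical function, and `opener` stands in for whatever callable actually reads the file (for example `xarray.open_dataset`). Paths are compared with `os.path.abspath`, which does not resolve symlinks.

```python
import os


def open_grid_and_data(grid_path, data_path, opener):
    """Open both sources, reusing one handle when the paths match.

    `opener` is any callable taking a path and returning a
    dataset-like object; it is injected so the sketch stays testable.
    """
    same_file = os.path.abspath(os.fspath(grid_path)) == os.path.abspath(
        os.fspath(data_path)
    )
    grid_ds = opener(grid_path)
    # Reuse the already opened dataset instead of reading the file twice.
    data_ds = grid_ds if same_file else opener(data_path)
    return grid_ds, data_ds
```

Whether this fits depends on how the grid constructor consumes its dataset; if it mutates or closes it, sharing a handle would need more care.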

Comment on lines +46 to +50
file_path = gridpath("mpas", "QU", "mesh.QU.1920km.151026.nc")

uxds_single = ux.open_dataset(file_path)
uxds_pair = ux.open_dataset(file_path, file_path)

Copilot AI Apr 1, 2026

This test is intended to validate opening a combined grid-and-data file, but it uses mesh.QU.1920km.151026.nc, which appears to be a mesh-only file (e.g., it lacks typical output variables like ssh, while oQU480.231010.nc includes them). Consider switching the fixture to a file that definitely contains both topology and data (such as oQU480.231010.nc) and/or asserting the presence of at least one known data variable to ensure the one-argument path truly covers the combined-file use case.

Suggested change
- file_path = gridpath("mpas", "QU", "mesh.QU.1920km.151026.nc")
- uxds_single = ux.open_dataset(file_path)
- uxds_pair = ux.open_dataset(file_path, file_path)
+ # Use a known combined grid-and-data MPAS file
+ file_path = gridpath("mpas", "oQU480", "oQU480.231010.nc")
+ uxds_single = ux.open_dataset(file_path)
+ uxds_pair = ux.open_dataset(file_path, file_path)
+ # Ensure that the single-argument path actually loads data variables
+ assert len(uxds_single.data_vars) > 0
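
The assertions could be tightened a little further. The helper below is a hypothetical sketch, not part of the PR: it accepts any mappings keyed by variable name (xarray's `Dataset.data_vars` is one), so it can be exercised without a fixture file.

```python
def check_one_file_open(single_vars, pair_vars, required=()):
    """Check that one- and two-argument opens expose the same variables.

    Hypothetical test helper; arguments are mappings keyed by
    variable name, such as `Dataset.data_vars`.
    """
    single_names = set(single_vars)
    # A mesh-only file would fail here, which is the point of the check.
    assert single_names, "one-argument open loaded no data variables"
    assert single_names == set(pair_vars), "variable sets differ"
    missing = set(required) - single_names
    assert not missing, f"missing expected variables: {sorted(missing)}"
```

In the test it might be called as `check_one_file_open(uxds_single.data_vars, uxds_pair.data_vars, required=["ssh"])`, assuming the chosen fixture really does contain `ssh` as the comment above suggests.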

Development

Successfully merging this pull request may close these issues.

Investigate allowing the construction of UxDataset from a single grid & data file
