Skip to content

Add dataset_path existence check in DatasetParams finalize_and_validate#2320

Open
idoudali wants to merge 2 commits intoidoudali/config-exceptionsfrom
idoudali/dataset-file-check
Open

Add dataset_path existence check in DatasetParams finalize_and_validate#2320
idoudali wants to merge 2 commits intoidoudali/config-exceptionsfrom
idoudali/dataset-file-check

Conversation

@idoudali
Copy link
Copy Markdown
Contributor

@idoudali idoudali commented Mar 26, 2026

Description

Depends on #2319

Raise ConfigFileNotFoundError early when dataset_path points to a
nonexistent file or directory, giving users a clear error before
training begins.

Fixes OPE-1849

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@idoudali idoudali requested a review from oelachqar March 26, 2026 14:19
@idoudali idoudali self-assigned this Mar 26, 2026
Copilot AI review requested due to automatic review settings March 26, 2026 14:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds early validation to catch invalid dataset_path values during config finalization, improving user-facing failures before training starts.

Changes:

  • Added dataset_path existence validation in DatasetParams.__finalize_and_validate__.
  • Added unit tests covering nonexistent paths, existing files/directories, and missing dataset_path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/oumi/core/configs/params/data_params.py Raises ConfigFileNotFoundError when dataset_path is set but does not exist.
tests/unit/core/configs/params/test_data_params.py Adds tests validating the new dataset_path existence check behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/oumi/core/configs/params/data_params.py
@idoudali idoudali force-pushed the idoudali/config-exceptions branch from 14e626c to 35745d5 Compare March 26, 2026 16:19
@idoudali idoudali force-pushed the idoudali/dataset-file-check branch from 3fa4bc3 to cf2ca7e Compare March 26, 2026 16:22
@idoudali idoudali force-pushed the idoudali/config-exceptions branch from 35745d5 to 9c851f3 Compare March 27, 2026 11:56
@idoudali idoudali force-pushed the idoudali/dataset-file-check branch from cf2ca7e to 5bb15aa Compare March 27, 2026 13:55
…rror handling

Introduce dedicated config exception types and add explicit
configuration checks to catch invalid or missing values early. Wire
these failures into the CLI global handler so users get clearer,
consistent error messages. Improve current file checks during
configuration parsing.

Fixes OPE-1848
@idoudali idoudali force-pushed the idoudali/config-exceptions branch from 9c851f3 to e5f549f Compare March 27, 2026 14:33
# If `model_name` specifies a LoRA adapter dir without the base model
# present, set it to the base model name found in the adapter config,
# if present. Error otherwise.
if len(list(adapter_dir.glob("config.json"))) == 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move this to a separate "load_model_config" function?

+ "\n".join(f"- {path}" for path in sorted(removed_paths))
)

try:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, it might be useful to get the stacktrack to know where the call originated from, i'd suggest skipping for now

Raise ConfigFileNotFoundError early when dataset_path points to a
nonexistent file or directory, giving users a clear error before
training begins.

Fixes OPE-1849
@idoudali idoudali force-pushed the idoudali/dataset-file-check branch from 5bb15aa to f1d3d4f Compare March 27, 2026 15:47
@idoudali idoudali force-pushed the idoudali/config-exceptions branch 3 times, most recently from e46d02e to 4804801 Compare April 17, 2026 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants