[FEAT] Use dates for train/val/test split#1473

Open
marcopeix wants to merge 3 commits into main from feat/split_by_dates

Conversation

@marcopeix
Contributor

Add the option to use dates instead of number of time steps to perform the train/val/test split in cross-validation.
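As a rough illustration of what a date-based split implies, the sketch below converts a cutoff date into the equivalent number of time steps. It is not the code from this PR: the helper name `size_from_cutoff` is hypothetical, the `unique_id` and `ds` column names follow the usual Nixtla long-format convention, and treating the cutoff as exclusive (only later timestamps count) is an assumption.

```python
import pandas as pd


def size_from_cutoff(df, cutoff, id_col="unique_id", time_col="ds"):
    """Hypothetical helper: number of time steps strictly after `cutoff`.

    Every series must have the same number of steps after the cutoff,
    otherwise a single integer size cannot represent the split.
    """
    cutoff = pd.Timestamp(cutoff)
    # Count, per series, the rows whose timestamp falls after the cutoff.
    counts = (df[time_col] > cutoff).groupby(df[id_col]).sum()
    if counts.nunique() != 1:
        raise ValueError("series disagree on the number of steps after the cutoff")
    return int(counts.iloc[0])


# Two daily series covering 2011-01-01 through 2011-01-10.
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 10 + ["b"] * 10,
        "ds": list(pd.date_range("2011-01-01", periods=10, freq="D")) * 2,
        "y": list(range(20)),
    }
)
test_size = size_from_cutoff(df, "2011-01-07")  # counts Jan 8, 9 and 10
```

The resulting integer could then be passed wherever `test_size` or `val_size` is accepted today, which is one way to keep the existing integer-based API as the single source of truth.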

@marcopeix marcopeix marked this pull request as ready for review March 5, 2026 21:09
Contributor

@elephaint elephaint left a comment

I get the intuitive part 😄 .... but:

  • Input validation is tough:
    • What if df doesn't have timestamps
    • What if validation date is later than test date
    • Different timestamp formats
  • Kinda breaks API consistency across all the repos (we'd have to add it in cv endpoints sf / mf / nixtla too) - doesn't mean we shouldn't implement it, but just noting.
  • val_size and test_size are used quite extensively throughout the library; if we support this in the cv endpoint, would make sense to make it consistent and support it in all endpoints where test_size and val_size are provided as an argument.

If kept very consistent throughout NF and strong input validation, we could use it, but it remains hard....

E.g. "2011-03-02", how does it know what date that is if the format isn't supplied? So, it should then check against the format of the df I guess?
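The validation concerns listed above could be checked roughly as follows. This is only a sketch under stated assumptions: `check_cutoffs` and the `test_cutoff` argument are hypothetical names (the diff in this PR only shows `validation_cutoff`), the time column is assumed to be `ds`, and string parsing is delegated to `pd.Timestamp`, which accepts ISO 8601 strings such as "2011-03-02".

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def check_cutoffs(df, validation_cutoff, test_cutoff, time_col="ds"):
    """Hypothetical validation for date-based split arguments."""
    # 1. The frame must actually carry timestamps (not integer steps).
    if not is_datetime64_any_dtype(df[time_col]):
        raise TypeError(f"column {time_col!r} must have a datetime dtype")

    # 2. Normalize both cutoffs to the same type before comparing.
    val_ts = pd.Timestamp(validation_cutoff)
    test_ts = pd.Timestamp(test_cutoff)

    # 3. The validation window must start before the test window.
    if val_ts >= test_ts:
        raise ValueError("validation cutoff must be earlier than test cutoff")

    # 4. Both cutoffs must fall inside the observed time range.
    start, end = df[time_col].min(), df[time_col].max()
    if val_ts < start or test_ts >= end:
        raise ValueError("cutoffs must lie within the range of the data")
    return val_ts, test_ts


# Daily data from 2011-01-01 through 2011-03-31.
df = pd.DataFrame({"ds": pd.date_range("2011-01-01", periods=90, freq="D")})
val_ts, test_ts = check_cutoffs(df, "2011-02-01", "2011-03-02")
```

Even with these checks, ambiguous strings such as "02/03/2011" would still be parsed by guesswork, which is the fragility the review points out.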

Comment thread neuralforecast/core.py
step_size: int = 1,
val_size: Optional[int] = 0,
test_size: Optional[int] = None,
validation_cutoff: Optional[Any] = None,
Contributor

Allowing Any here is a validation nightmare, especially with dates that can be timestamp, pandas offsets, strings....
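One possible way to narrow the annotation, sketched here as an assumption and not as what the PR does, is to accept only strings and datetime-like objects and to require an explicit format for strings. The alias `CutoffLike` and the helper `normalize_cutoff` are hypothetical names.

```python
from datetime import datetime
from typing import Union

import pandas as pd

# Hypothetical replacement for `Optional[Any]`: a closed set of accepted types.
CutoffLike = Union[str, datetime, pd.Timestamp]


def normalize_cutoff(value: CutoffLike, fmt: str = "%Y-%m-%d") -> pd.Timestamp:
    """Convert a supported cutoff value to a pandas Timestamp."""
    if isinstance(value, str):
        # Strings must match `fmt` exactly; strptime raises ValueError otherwise.
        return pd.Timestamp(datetime.strptime(value, fmt))
    if isinstance(value, datetime):
        # pd.Timestamp is a datetime subclass, so this branch covers both.
        return pd.Timestamp(value)
    # Offsets, integers and anything else are rejected up front.
    raise TypeError(f"unsupported cutoff type: {type(value).__name__}")


cutoff = normalize_cutoff("2011-03-02")
```

Rejecting unsupported types at the boundary keeps the rest of the split logic working with a single type, at the cost of asking users to state the format when it is not ISO-like.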

@marcopeix
Contributor Author

> I get the intuitive part 😄 .... but:
>
> * Input validation is tough:
>   * What if df doesn't have timestamps
>   * What if validation date is later than test date
>   * Different timestamp formats
> * Kinda breaks API consistency across all the repos (we'd have to add it in cv endpoints sf / mf / nixtla too) - doesn't mean we shouldn't implement it, but just noting.
> * val_size and test_size are used quite extensively throughout the library; if we support this in the cv endpoint, would make sense to make it consistent and support it in all endpoints where test_size and val_size are provided as an argument.
>
> If kept very consistent throughout NF and strong input validation, we could use it, but it remains hard....
>
> E.g. "2011-03-02", how does it know what date that is if the format isn't supplied? So, it should then check against the format of the df I guess?

I agree, and I'm not sure I like this feature either haha 😅. I knew validation would be hard, and I still find it fragile. To be honest, it hasn't been a widely requested feature. Let's set it aside for now and see if it's something the community really wants.

Development

Successfully merging this pull request may close these issues.

[Core] Split sets by date AutoNBEATS/AutoNHITS - Questions on model features

2 participants