I get the intuitive part 😄 .... but:
- Input validation is tough:
  - What if df doesn't have timestamps?
  - What if the validation date is later than the test date?
  - What about different timestamp formats?
- Kinda breaks API consistency across all the repos (we'd have to add it to the cv endpoints in sf / mf / nixtla too). That doesn't mean we shouldn't implement it, just noting it.
- val_size and test_size are used quite extensively throughout the library; if we support this in the cv endpoint, it would make sense to be consistent and support it in every endpoint that takes test_size and val_size as arguments.
If we keep it very consistent throughout NF and add strong input validation, we could use it, but it remains hard....
E.g. "2011-03-02": how does it know what date that is if the format isn't supplied? So it should then check against the format of the df, I guess?
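To make the validation concerns above concrete, here is a rough sketch of what the checks could look like. `cutoffs_to_sizes`, its `test_cutoff` argument, and the default `time_col="ds"` are hypothetical names for illustration, not anything in the PR or in NF. It assumes each cutoff is the inclusive start of its split, and it sidesteps the format question by requiring a datetime column and letting `pd.Timestamp` parse the cutoff.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def cutoffs_to_sizes(df, validation_cutoff, test_cutoff, time_col="ds"):
    """Hypothetical helper: map two date cutoffs to (val_size, test_size).

    Assumption: each cutoff is the first timestamp of its split (inclusive).
    """
    if time_col not in df.columns:
        raise ValueError(f"column {time_col!r} not found in df")
    # Refuse non-datetime columns instead of guessing a string format.
    if not is_datetime64_any_dtype(df[time_col]):
        raise TypeError(f"column {time_col!r} must be datetime to use date cutoffs")

    val_cut = pd.Timestamp(validation_cutoff)
    test_cut = pd.Timestamp(test_cutoff)
    if val_cut >= test_cut:
        raise ValueError("validation_cutoff must be earlier than test_cutoff")

    # Count distinct time steps, since several series share the same dates.
    times = df[time_col].drop_duplicates()
    test_size = int((times >= test_cut).sum())
    val_size = int(((times >= val_cut) & (times < test_cut)).sum())
    if test_size == 0:
        raise ValueError("test_cutoff is later than the last timestamp in df")
    return val_size, test_size
```

Even this small version leaves gaps: it doesn't handle tz-aware columns compared against naive cutoffs, integer time indexes, or series with different date ranges, which is pretty much the point about it being hard.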
    step_size: int = 1,
    val_size: Optional[int] = 0,
    test_size: Optional[int] = None,
    validation_cutoff: Optional[Any] = None,
Allowing Any here is a validation nightmare, especially with dates that can be timestamp, pandas offsets, strings....
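One possible way to narrow it, purely as a sketch: accept only strings and datetime-like objects and normalize them up front. `CutoffLike` and `normalize_cutoff` are hypothetical names, not part of the PR, and rejecting ints and pandas offsets outright is an assumption about what we'd want.

```python
import datetime
from typing import Optional, Union

import pandas as pd

# Hypothetical alias: the only cutoff inputs this sketch accepts.
CutoffLike = Union[str, datetime.datetime, pd.Timestamp]


def normalize_cutoff(cutoff: Optional[CutoffLike]) -> Optional[pd.Timestamp]:
    """Hypothetical helper: coerce a cutoff to pd.Timestamp or fail loudly."""
    if cutoff is None:
        return None
    # pd.Timestamp subclasses datetime.datetime, so this check covers it too.
    if not isinstance(cutoff, (str, datetime.datetime)):
        raise TypeError(
            f"cutoff must be a str or datetime-like, got {type(cutoff).__name__}"
        )
    try:
        ts = pd.Timestamp(cutoff)
    except ValueError as exc:
        raise ValueError(f"could not parse cutoff {cutoff!r} as a date") from exc
    if pd.isna(ts):
        raise ValueError(f"cutoff {cutoff!r} did not resolve to a valid date")
    return ts
```

Ambiguous strings like "03/02/2011" would still parse under pandas' default rules, so this narrows the type without settling the format question.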
I agree, and I'm not sure I like this feature either haha 😅. I knew validation would be hard, and I still find it fragile. To be honest, it hasn't been a widely requested feature. Let's set it aside for now and see if it's something the community really wants.
Add the option to use dates instead of number of time steps to perform the train/val/test split in cross-validation.