…at needs to be done in the outer loop; refactoring `genericHybrid` is needed for that.
Code Review
This pull request introduces GPU support and refactors the data pipeline to separate predictors and forcings, updating the training, evaluation, and checkpointing logic accordingly. Feedback identifies several critical issues, including a typo in the loss computation that will cause an UndefVarError, a statistically incorrect R² formula, and flawed NaN masking logic that could lead to incorrect loss values in multi-target models. Additionally, improvements are suggested for configuration type safety, the removal of leftover comments, and fixing a broken warning check in the data preparation logic.
```julia
y_t = y[target] # _get_target_y(y, target)
ŷ_t = ŷ[target] # _get_target_ŷ(ŷ, y_t, target)
_apply_loss(ŷ_t, y_t, y_nan, loss_spec)
# _apply_loss(ŷ_t, y_t, _get_target_nan(y_nan, target), loss_spec)
```
There is a typo on line 109: ŷ (y + U+0302) is used instead of the argument ŷ (U+0177), which will cause an UndefVarError. Additionally, y_nan should be target-specific (e.g., y_nan[target]) to ensure that NaNs in one target do not invalidate observations for other targets during loss calculation.
Suggested change:

```julia
y_t = y[target]
ŷ_t = ŷ[target]
_apply_loss(ŷ_t, y_t, y_nan[target], loss_spec)
```

The quoted `:r2` loss returns the squared Pearson correlation:

```julia
function loss_fn(ŷ, y, y_nan, ::Val{:r2})
    r = cor(ŷ[y_nan], y[y_nan])
    return r * r
```

Suggested change (the denominator must center on the observations, `mean(y[y_nan])`, not on the predictions):

```julia
    return 1 - sum((y[y_nan] .- ŷ[y_nan]).^2) / sum((y[y_nan] .- mean(y[y_nan])).^2)
```
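To see why `r * r` differs from the coefficient of determination, compare the two on biased predictions. This is a standalone sketch using only the `Statistics` stdlib; the function names are illustrative, not from the PR:

```julia
using Statistics

# Squared Pearson correlation: invariant to shift and scale of ŷ.
r2_pearson(ŷ, y) = cor(ŷ, y)^2

# Coefficient of determination: penalizes systematic bias.
r2_score(ŷ, y) = 1 - sum(abs2, y .- ŷ) / sum(abs2, y .- mean(y))

y = [1.0, 2.0, 3.0, 4.0]
ŷ = y .+ 10.0        # perfectly correlated, but offset by 10

r2_pearson(ŷ, y)     # ≈ 1.0  (looks perfect despite the bias)
r2_score(ŷ, y)       # → -79.0 (correctly flags the offset)
```

Reporting `r * r` as "R²" silently forgives any constant or multiplicative bias in the model output, which is why the proper residual-based formula matters here.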
```julia
is_no_nan = falses(length(first(y))) |> cfg.gdev
for vec in y
    is_no_nan = is_no_nan .|| .!isnan.(vec)
end
```
The current logic computes a global NaN mask by OR-ing all targets. This is problematic for multi-target models: if target A is NaN but target B is valid at index i, the mask will be true, and the loss function will attempt to compute a loss on the NaN value for target A, resulting in a NaN total loss. The mask should be kept per-target (e.g., as a NamedTuple of masks).
Suggested change:

```julia
is_no_nan = map(vec -> .!isnan.(vec), y)
!any(map(any, is_no_nan)) && continue
```

The same OR-ed pattern appears for the train/validation masks:

```julia
is_no_nan_t = falses(length(first(y_train))) |> cfg.gdev
for vec in y_train
    is_no_nan_t = is_no_nan_t .|| .!isnan.(vec)
end
is_no_nan_v = falses(length(first(y_val))) |> cfg.gdev
for vec in y_val
    is_no_nan_v = is_no_nan_v .|| .!isnan.(vec)
end
```
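A minimal sketch of per-target masking on plain vectors (the target names `gpp` and `reco` are hypothetical, not from the PR), contrasted with the poisoned loss a global OR mask produces:

```julia
# Toy targets as a NamedTuple of vectors; each target has a NaN in a
# different position.
y = (gpp = [1.0, NaN, 3.0], reco = [0.5, 0.6, NaN])
ŷ = (gpp = [1.1, 2.0, 2.9], reco = [0.4, 0.7, 0.8])

# Per-target masks: no OR across targets.
y_nan = map(vec -> .!isnan.(vec), y)

mse(a, b) = sum(abs2, a .- b) / length(a)

# Each target is scored only on its own valid observations.
losses = map(t -> mse(ŷ[t][y_nan[t]], y[t][y_nan[t]]), keys(y))
total = sum(losses)                           # ≈ 0.02, and not NaN

# With the OR-ed global mask every index is "valid" somewhere, so
# target `gpp` gets evaluated on its NaN at index 2:
bad_mask = y_nan.gpp .|| y_nan.reco
isnan(mse(ŷ.gpp[bad_mask], y.gpp[bad_mask]))  # true: loss is poisoned
```

Keeping `y_nan` as a NamedTuple mirroring `y` also lets `_apply_loss` receive `y_nan[target]` directly, matching the suggestion in the earlier comment.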
```julia
gdev = gpu_device()

"Set the `cpu_device`, useful for sending back to the cpu model parameters"
cdev = cpu_device()
```
The fields gdev and cdev should have explicit type annotations (e.g., Lux.AbstractLuxDevice) to improve code clarity and potentially help with compiler optimizations. Additionally, ensure that gpu_device() is the intended default for all instances of DataConfig, as it may trigger device initialization.
"Select a gpu_device or default to cpu if none available"
gdev::Lux.AbstractLuxDevice = gpu_device()
"Set the `cpu_device`, useful for sending back to the cpu model parameters"
cdev::Lux.AbstractLuxDevice = cpu_device()| # predicto | ||
| # predictors_forcing = unique(predictors_forcing) |
| if isempty(predictors_forcing) | ||
| @warn "Note that you don't have predictors or forcing variables." | ||
| end |
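The benefit of annotating the fields can be shown with stand-in types (these are hypothetical, not the Lux API): without an annotation, `Base.@kwdef` leaves the fields as `::Any`, so every access is type-unstable; an abstract supertype bound keeps accesses inferable while still allowing either device.

```julia
# Stand-in device types, illustrating the shape of the suggestion.
abstract type AbstractDevice end
struct CPUDevice <: AbstractDevice end
struct GPUDevice <: AbstractDevice end

# Stand-in for `gpu_device()`: fall back to CPU when no GPU exists.
# Deferring this to construction time also avoids eager device
# initialization at struct-definition time, per the comment above.
pick_device(has_gpu::Bool) = has_gpu ? GPUDevice() : CPUDevice()

Base.@kwdef struct DataConfig
    # "Select a gpu_device or default to cpu if none available"
    gdev::AbstractDevice = pick_device(false)
    # "Set the `cpu_device`, useful for sending back to the cpu"
    cdev::AbstractDevice = CPUDevice()
end

cfg = DataConfig()                      # defaults apply
cfg_gpu = DataConfig(gdev = GPUDevice()) # override per instance
```

In the real struct the annotation would be the Lux device supertype named in the comment, with `gpu_device()`/`cpu_device()` as the defaults.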
This follows from #256.