…at needs to be done in the outer loop, refactoring genericHybrid is needed for that
Code Review
This pull request introduces GPU support and refactors data handling to separate predictors and forcings throughout the training pipeline. Key changes include adding CUDA dependencies, updating configuration objects with device selectors, and modifying data loaders, splitters, and model forward passes to accommodate a new nested tuple input structure. Feedback highlights a mathematical error in the R-squared calculation, potential shape mismatches and incorrect NaN masking in the epoch loop, and several instances of dead code or typos. Additionally, a logic error was identified in a warning check within the data preparation module.
```diff
 function loss_fn(ŷ, y, y_nan, ::Val{:r2})
-    r = cor(ŷ[y_nan], y[y_nan])
-    return r * r
+    return 1 - sum((y[y_nan] .- ŷ[y_nan]).^2) / sum((y[y_nan] .- mean(ŷ[y_nan])).^2)
```
The R-squared calculation is incorrect. The denominator should use the mean of the observed values (`y`), not the predicted values (`ŷ`). The standard definition of R² is:

```julia
return 1 - sum((y[y_nan] .- ŷ[y_nan]).^2) / sum((y[y_nan] .- mean(y[y_nan])).^2)
```

```julia
is_no_nan = falses(length(first(y))) |> cfg.gdev
for vec in y
    is_no_nan = is_no_nan .|| .!isnan.(vec)
end
```
This logic has two significant issues:
- Shape mismatch: `falses(length(first(y)))` creates a 1D array. If the targets are multi-dimensional (e.g., `(time, batch)`), the broadcasted OR `.||` will fail. Use `size(first(y))` instead of `length`.
- Incorrect masking: Computing a single global `is_no_nan` mask by ORing all targets is problematic. If target A has a NaN at an index where target B is valid, the global mask will be `true` at that index. Consequently, the loss for target A will be computed using the NaN value, resulting in a `NaN` total loss. Masks should be computed and applied per-target.
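A minimal sketch of per-target masking, assuming `y` and `ŷ` are collections of equally-shaped arrays keyed by target name and a sum-of-squares loss (the function name and structure here are illustrative, not the PR's actual API):

```julia
# Illustrative only: compute and apply the NaN mask per target, so a NaN
# in one target never leaks into another target's loss.
function per_target_loss(ŷ, y)
    total = 0.0
    for k in keys(y)
        mask = .!isnan.(y[k])                          # per-target mask
        total += sum(abs2, y[k][mask] .- ŷ[k][mask])   # NaNs excluded per target
    end
    return total
end
```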
```diff
         # predicto
         # predictors_forcing = unique(predictors_forcing)
         @warn "Note that you don't have target names."
     end
-    return predictors_forcing, targets
+    return predictors, forcings, targets
```
The function now returns predictors and forcings separately, but the warning check at line 115 (visible in context) still references `predictors_forcing`. Since `predictors_forcing` is initialized as an empty array at line 89 and never populated in the new logic, this warning will be triggered on every call. The check should be updated to verify whether both `predictors` and `forcings` are empty.
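A minimal sketch of the corrected check, assuming the warning should fire only when no variable names were found at all (the warning text here is illustrative, not the module's actual message):

```julia
# Sketch: test the two lists that are actually populated now, instead of
# the unused `predictors_forcing` array.
if isempty(predictors) && isempty(forcings)
    @warn "No predictor or forcing variable names found."
end
```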
```julia
y_t = y[target]# _get_target_y(y, target)
ŷ_t = ŷ[target]#_get_target_ŷ(ŷ, y_t, target)
```
There is a typo in line 109: `ŷ` (`y` followed by U+0302 combining circumflex) is used instead of the argument `ŷ` (U+0177) defined at line 105. While Julia normalizes identifiers to NFC, mixing these characters is confusing and can lead to issues in environments with different normalization rules. Additionally, the commented-out code should be removed.
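The two spellings are visually identical in most editors but differ at the codepoint level, which a short Julia snippet makes concrete:

```julia
using Unicode

s1 = "y\u0302"   # 'y' + U+0302 COMBINING CIRCUMFLEX ACCENT (two codepoints)
s2 = "\u0177"    # U+0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX (one codepoint)

s1 == s2                           # false: the raw codepoint sequences differ
Unicode.normalize(s1, :NFC) == s2  # true: NFC folds them to the same string
```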
```julia
y_t = y[target]
ŷ_t = ŷ[target]
```

```julia
ps, st = LuxCore.setup(Random.default_rng(), model) |> cfg.gdev
ps = ps |> cfg.gdev
```
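The hunk above appears to move parameters to the configured device after setup. A minimal sketch of that pattern, assuming Lux.jl and a device selector such as `gpu_device()` from MLDataDevices standing in for `cfg.gdev` (it falls back to a CPU device when no functional GPU backend is loaded):

```julia
using Lux, Random
using MLDataDevices: gpu_device

gdev = gpu_device()   # stands in for cfg.gdev in this sketch

model = Dense(4 => 2)
ps, st = Lux.setup(Random.default_rng(), model)
ps = ps |> gdev   # move parameters to the device
st = st |> gdev   # state must move too, or the forward pass mixes devices
```

Piping `ps` and `st` separately avoids transferring the whole setup tuple twice and makes it explicit which pieces live on the device.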