
gpu support continue #257

Closed
lazarusA wants to merge 6 commits into main from ac_la/gpu_continue

Conversation

@lazarusA (Member) commented Apr 7, 2026

this follows from #256

@lazarusA mentioned this pull request Apr 7, 2026

@gemini-code-assist bot left a comment

Code Review

This pull request introduces GPU support and refactors the data pipeline to separate predictors and forcings, updating the training, evaluation, and checkpointing logic accordingly. Feedback identifies several critical issues, including a typo in the loss computation that will cause an UndefVarError, a statistically incorrect R² formula, and flawed NaN masking logic that could lead to incorrect loss values in multi-target models. Additionally, improvements are suggested for configuration type safety, the removal of leftover comments, and fixing a broken warning check in the data preparation logic.

Comment on lines +108 to +111
y_t = y[target]# _get_target_y(y, target)
ŷ_t = ŷ[target]#_get_target_ŷ(ŷ, y_t, target)
_apply_loss(ŷ_t, y_t, y_nan, loss_spec)
# _apply_loss(ŷ_t, y_t, _get_target_nan(y_nan, target), loss_spec)

critical

There is a typo on line 109: the variable is written as y followed by a combining circumflex (U+0302) instead of the precomposed argument ŷ (U+0177), which will cause an UndefVarError. Additionally, y_nan should be target-specific (e.g., y_nan[target]) so that NaNs in one target do not invalidate observations for other targets during loss calculation.

                y_t = y[target]
                ŷ_t = ŷ[target]
                _apply_loss(ŷ_t, y_t, y_nan[target], loss_spec)

function loss_fn(ŷ, y, y_nan, ::Val{:r2})
r = cor(ŷ[y_nan], y[y_nan])
return r * r
return 1 - sum((y[y_nan] .- ŷ[y_nan]).^2) / sum((y[y_nan] .- mean(ŷ[y_nan])).^2)

high

The $R^2$ formula is statistically incorrect. The denominator (Total Sum of Squares) should be calculated using the mean of the observed values y, not the predicted values ŷ.

    return 1 - sum((y[y_nan] .- ŷ[y_nan]).^2) / sum((y[y_nan] .- mean(y[y_nan])).^2)
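As a quick sanity check, the corrected formula can be exercised in isolation. This is a minimal sketch; `r2_score` and the sample vectors are made up for illustration and are not part of the PR:

```julia
using Statistics

# Coefficient of determination: the total sum of squares in the denominator
# uses the mean of the observed values y, never of the predictions ŷ.
function r2_score(ŷ, y, y_nan)
    yv, ŷv = y[y_nan], ŷ[y_nan]
    return 1 - sum((yv .- ŷv).^2) / sum((yv .- mean(yv)).^2)
end

y = [1.0, 2.0, 3.0, 4.0]
ŷ = [1.1, 1.9, 3.2, 3.8]
mask = trues(length(y))

r2_score(ŷ, y, mask)  # 0.98: close to 1 for a good fit
r2_score(y, y, mask)  # exactly 1.0 for a perfect fit
```

Using `mean(ŷv)` in the denominator instead would bias the score whenever the predictions are offset from the observations.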

Comment on lines +5 to +8
is_no_nan = falses(length(first(y))) |> cfg.gdev
for vec in y
is_no_nan = is_no_nan.|| .!isnan.(vec)
end

high

The current logic computes a global NaN mask by OR-ing all targets. This is problematic for multi-target models: if target A is NaN but target B is valid at index i, the mask will be true, and the loss function will attempt to compute a loss on the NaN value for target A, resulting in a NaN total loss. The mask should be kept per-target (e.g., as a NamedTuple of masks).

        is_no_nan = map(vec -> .!isnan.(vec), y)
        !any(map(any, is_no_nan)) && continue
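To see why per-target masks matter, here is a self-contained sketch; the target names and values are hypothetical, and `mse` stands in for whatever `_apply_loss` computes:

```julia
# Hypothetical two-target batch: :gpp has a NaN at index 2 where :nee is valid.
y = (gpp = [1.0, NaN, 3.0], nee = [0.5, 0.6, 0.7])
ŷ = (gpp = [1.2, 2.0, 2.8], nee = [0.4, 0.7, 0.6])

# Per-target masks, as suggested above: each target keeps its own valid indices.
is_no_nan = map(vec -> .!isnan.(vec), y)

mse(ŷv, yv, m) = sum(abs2, ŷv[m] .- yv[m]) / sum(m)
per_target_loss = sum(values(map(mse, ŷ, y, is_no_nan)))  # finite, ≈ 0.05

# A global OR-mask marks index 2 as valid because :nee is valid there,
# so :gpp's NaN leaks into the loss and poisons the total.
gmask = .!isnan.(y.gpp) .| .!isnan.(y.nee)
global_loss = mse(ŷ.gpp, y.gpp, gmask) + mse(ŷ.nee, y.nee, gmask)  # NaN
```

Because `map` over NamedTuples preserves the target names, the per-target masks can be indexed the same way as `y` and `ŷ` in the loss loop.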

Comment on lines +46 to +53
is_no_nan_t = falses(length(first(y_train))) |> cfg.gdev
for vec in y_train
is_no_nan_t = is_no_nan_t .|| .!isnan.(vec)
end
is_no_nan_v = falses(length(first(y_val))) |> cfg.gdev
for vec in y_val
is_no_nan_v = is_no_nan_v .|| .!isnan.(vec)
end

high

Similar to the training loop, the NaN masks for evaluation should be per-target to avoid cross-contamination of NaN values between different target variables.

    is_no_nan_t = map(vec -> .!isnan.(vec), y_train)
    is_no_nan_v = map(vec -> .!isnan.(vec), y_val)

Comment on lines +61 to +64
gdev = gpu_device()

"Set the `cpu_device`, useful for sending back to the cpu model parameters"
cdev = cpu_device()

medium

The fields gdev and cdev should have explicit type annotations (e.g., Lux.AbstractLuxDevice) to improve code clarity and potentially help with compiler optimizations. Additionally, ensure that gpu_device() is the intended default for all instances of DataConfig, as it may trigger device initialization.

    "Select a gpu_device or default to cpu if none available"
    gdev::Lux.AbstractLuxDevice = gpu_device()

    "Set the `cpu_device`, useful for sending back to the cpu model parameters"
    cdev::Lux.AbstractLuxDevice = cpu_device()

Comment on lines +112 to +113
# predicto
# predictors_forcing = unique(predictors_forcing)

medium

Remove leftover comments and incomplete words.

Comment on lines 115 to 117
if isempty(predictors_forcing)
@warn "Note that you don't have predictors or forcing variables."
end

medium

The warning check uses predictors_forcing, which is an empty Symbol[] initialized at line 89 and never updated. The logic should check if both predictors and forcings are empty instead.

    if isempty(predictors) && isempty(forcings)
        @warn "Note that you don't have predictors or forcing variables."
    end

@lazarusA closed this Apr 8, 2026
