Add self-supervised 3D-Var-based AI data assimilation #194
Add self-supervised 3D-Var-based AI data assimilation #194SOHAMPAL23 wants to merge 11 commits intoopenclimatefix:mainfrom
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
hey @jacobbieker Can you review this |
jacobbieker
left a comment
There was a problem hiding this comment.
Please don't include AI-generated code or documentation.
Also, this PR is way too large, it needs to be split up.
|
|
||
|
|
||
| class SelfAttentionLayer(nn.Module): | ||
| """Self-attention layer for processing point cloud features. |
There was a problem hiding this comment.
Please split these just doc changes into a separate PR, this is a huge PR that is hard to review and has a lot of unrelated changes. Do that, and then I can review just the implementation.
| return H | ||
|
|
||
|
|
||
| def generate_synthetic_data(batch_size=32, grid_size=(10, 10), num_channels=1): |
There was a problem hiding this comment.
This generation for testing shouldn't be in the model file, only in the unit tests
There was a problem hiding this comment.
These model training loops, etc. stuff shouldn't be in the /models/ directory, but in a subdirectory, like /models/data_assimilation/ or something similar
There was a problem hiding this comment.
These feel like more generic things for training that shouldn't be in a model PR. If you want to add more visualizations, feel free, but that should be separate.
There was a problem hiding this comment.
I don't think this evaluation should be included in this PR, a lot of the computation already exists somewhere in the repo, and is more generally useful. Please refactor or remove.
| return analysis | ||
|
|
||
|
|
||
| class SimpleDataAssimilationModel(nn.Module): |
There was a problem hiding this comment.
I would prefer just the model implementation from the paper, not a second, simpler one for this.
|
|
||
| self.network = nn.Sequential(*layers) | ||
|
|
||
| def forward(self, background, observations): |
There was a problem hiding this comment.
In this repo, we want the models to take in points and output points, so they are all compatible. Please make this work with the same kind of interface as the GraphWeatherForecaster, or GenCast implementations.
| return sample | ||
|
|
||
|
|
||
| def create_synthetic_assimilation_dataset( |
There was a problem hiding this comment.
This might be needed, but needs changes. The models being added should be compatible with taking in points and lat/lon locations, and so should this one. This might make it not necessary to have this.
for more information, see https://pre-commit.ci
|
Sure @jacobbieker |
|
Closing this as #196 is a duplicate |
Pull Request
Description
This PR introduces a self-supervised AI-based data assimilation prototype
The implementation replaces supervised learning with a physics-based 3D-Var cost function, allowing the model to learn directly from:
Sparse/noisy observations
Scope
Focused on demonstrating the feasibility and correctness of the self-supervised assimilation approach
Intended as an experimental framework, not a full operational replacement
Checklist
Implements the 3D-Var objective as the training loss
Supports both forecast-based and cold-start first-guess states
Modular design for experimenting with different observation operators and error assumptions
Suitable as a foundation for extending to real-world variables