Skip to content

Chore/undefined aware base model#159

Open
kelvin-aipolabs wants to merge 3 commits intomainfrom
chore/undefined-aware-base-model
Open

Chore/undefined aware base model#159
kelvin-aipolabs wants to merge 3 commits intomainfrom
chore/undefined-aware-base-model

Conversation

@kelvin-aipolabs
Copy link
Copy Markdown
Contributor

@kelvin-aipolabs kelvin-aipolabs commented Oct 8, 2025

📝 Description

  • Adding a base class for help differentiating "update to null" and "no change" during inputs for partial DB updates.

Summary by cubic

Adds an UndefinedAwareBaseModel to clearly separate “set to null” from “no change” in partial DB updates. It also safeguards model_dump usage to avoid accidentally including unset fields.

  • New Features
    • New Pydantic base model with non-nullable field validation via a custom model_validator.
    • Uses model_fields_set to detect which fields were explicitly provided.
    • Overrides model_dump to require/encourage exclude_unset; configurable WARN or ERROR behavior.

Summary by CodeRabbit

  • New Features
    • Smarter handling of undefined vs. null fields in data models for more predictable validation.
    • Configurable behavior when exporting data without specifying unset handling, with optional warnings or errors to prevent accidental data loss.
  • Documentation
    • Expanded inline documentation explaining new model behavior and configuration.
  • Chores
    • Added logging to surface misconfigurations during data export.
    • Introduced internal configuration points to control serialization and validation behavior.

@vercel
Copy link
Copy Markdown

vercel bot commented Oct 8, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
gate22 Ready Ready Preview Comment Oct 8, 2025 5:39pm

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 8, 2025

Walkthrough

Introduces a new Pydantic base model that distinguishes undefined vs. null, enforces non-nullable checks only for fields explicitly provided, and customizes model_dump to warn or raise when exclude_unset is omitted via a configurable enum and logging.

Changes

Cohort / File(s) Summary
Pydantic undefined-aware base model
backend/aci/common/schemas/undefined_aware_base_model.py
Added UndefinedAwareBaseModel implementing _non_nullable_fields and _dump_without_exclude_unset_behavior, a BehaviorOnDumpWithoutExcludeUnset enum, post-validation validate_non_nullable_fields (runs only for fields in model_fields_set), and an overridden model_dump that warns or raises if exclude_unset is not provided. Includes logger and inline docs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Model as UndefinedAwareBaseModel
  participant Pydantic as Pydantic Core

  Caller->>Model: instantiate(**data)
  Model->>Pydantic: validate input
  Pydantic-->>Model: validated instance
  Note over Model: Post-validation hook
  Model->>Model: validate_non_nullable_fields()\n(only checks fields in model_fields_set)
  Model-->>Caller: instance
Loading
sequenceDiagram
  autonumber
  actor Caller
  participant Model as UndefinedAwareBaseModel
  participant Logger as Logger

  Caller->>Model: model_dump(exclude_unset=?)
  alt exclude_unset provided
    Model-->>Caller: dict per args
  else exclude_unset not provided
    Model->>Model: check _dump_without_exclude_unset_behavior
    alt Behavior = WARN
      Model->>Logger: warn about missing exclude_unset
      Model-->>Caller: dict via super().model_dump(**kwargs)
    else Behavior = RAISE
      Model-->>Caller: raise RuntimeError / ValueError
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I hop through fields of None and set,
I know which keys you gave, not yet—
I nudge with warnings, or loudly cry,
When dumps forget exclude_unset’s eye.
A rabbit validator, tidy and spry. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title references the new undefined-aware base model, which is the core change introduced, but the “Chore/” prefix and its brevity make it somewhat generic and omit the model’s purpose around explicit null and unset field handling.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/undefined-aware-base-model

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/aci/common/schemas/undefined_aware_base_model.py (2)

27-45: Consider using Pydantic 2.12's MISSING sentinel instead.

While the custom validator approach works, Pydantic 2.12 introduced a MISSING sentinel specifically to distinguish absent values from None. This provides a cleaner, more idiomatic solution without requiring custom validation logic.

Based on learnings

Example using MISSING:

from pydantic import BaseModel, MISSING, Field

class MyModel(BaseModel):
    # Fields that must be provided (if set) but can be None
    field1: str | None = MISSING
    
    # Fields that are optional with a default
    field2: int = 0

With this approach:

  • MISSING clearly indicates "no value provided"
  • None is an explicit value
  • No custom validator needed
  • model_fields_set still works as expected
  • Validation behavior is built-in and well-tested

This would simplify the base model significantly and align with Pydantic's intended usage patterns.


63-64: Address or track the TODO comment.

The question about intercepting __getattribute__() is valid. Direct field access via model.field bypasses the model_dump safeguards. Consider:

  1. Whether this is a concern for the use cases
  2. If needed, implementing __getattribute__ to warn/error on unset field access
  3. Documenting the expected usage pattern to avoid this scenario

Would you like me to open a new issue to track this decision, or would you prefer to address it in this PR?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ceeb03 and e25f4cd.

📒 Files selected for processing (1)
  • backend/aci/common/schemas/undefined_aware_base_model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/aci/common/schemas/undefined_aware_base_model.py (1)
backend/aci/common/logging_setup.py (1)
  • get_logger (44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🔇 Additional comments (3)
backend/aci/common/schemas/undefined_aware_base_model.py (3)

1-9: LGTM!

Imports and logger setup are appropriate for the module's functionality.


11-14: LGTM!

The enum clearly defines the two supported behaviors for handling model_dump calls without exclude_unset.


47-61: Add or confirm Python ≥3.10 requirement
The match case syntax in model_dump (backend/aci/common/schemas/undefined_aware_base_model.py:47-61) requires Python 3.10+. No requires-python was found in pyproject.toml, setup.py, or setup.cfg—please update or verify the project’s interpreter requirement.

Comment on lines +22 to +25
_non_nullable_fields: list[str] = []
_dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = (
BehaviorOnDumpWithoutExcludeUnset.WARN
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use PrivateAttr() for internal configuration fields.

The fields _non_nullable_fields and _dump_without_exclude_unset_behavior are currently treated as regular Pydantic model fields, which means they:

  • Will be included in model_dump() output
  • Will be validated during instantiation
  • Will appear in the JSON schema

Since these are internal configuration attributes, they should use PrivateAttr() from Pydantic v2.

Additionally, the mutable default [] for _non_nullable_fields is a Python anti-pattern that could cause modifications in one instance to affect others.

Apply this diff to fix the issue:

+from pydantic import BaseModel, model_validator, PrivateAttr
-from pydantic import BaseModel, model_validator

 class UndefinedAwareBaseModel(BaseModel):
     """
     A base model that allows all fields to be nullable and use a custom validator to check for
     non-nullable fields.
     """

-    _non_nullable_fields: list[str] = []
+    _non_nullable_fields: list[str] = PrivateAttr(default_factory=list)
-    _dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = (
-        BehaviorOnDumpWithoutExcludeUnset.WARN
-    )
+    _dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = PrivateAttr(
+        default=BehaviorOnDumpWithoutExcludeUnset.WARN
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_non_nullable_fields: list[str] = []
_dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = (
BehaviorOnDumpWithoutExcludeUnset.WARN
)
from pydantic import BaseModel, model_validator, PrivateAttr
class UndefinedAwareBaseModel(BaseModel):
"""
A base model that allows all fields to be nullable and use a custom validator to check for
non-nullable fields.
"""
_non_nullable_fields: list[str] = PrivateAttr(default_factory=list)
_dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = PrivateAttr(
default=BehaviorOnDumpWithoutExcludeUnset.WARN
)
# …rest of class…
🤖 Prompt for AI Agents
In backend/aci/common/schemas/undefined_aware_base_model.py around lines 22-25,
the internal config attributes _non_nullable_fields and
_dump_without_exclude_unset_behavior are currently regular Pydantic fields;
change them to Pydantic PrivateAttr()s so they are not validated, serialized, or
exposed in JSON schema. Specifically, replace the list default with
PrivateAttr(default_factory=list) to avoid a shared mutable default, and set the
enum config with PrivateAttr(default=BehaviorOnDumpWithoutExcludeUnset.WARN) so
it remains an internal attribute.

As there is no easy way to differentiate between "None" and "Undefined" with Pydantic.
We don't know whether caller do not provide a value for a field or want to explicitly set
it to None. We use a workaround as follow:
- We allow all fields to be nullable and default to None ewhen defining the pydantic model.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in docstring.

"ewhen" should be "when".

-    - We allow all fields to be nullable and default to None ewhen defining the pydantic model.
+    - We allow all fields to be nullable and default to None when defining the pydantic model.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- We allow all fields to be nullable and default to None ewhen defining the pydantic model.
- We allow all fields to be nullable and default to None when defining the pydantic model.
🤖 Prompt for AI Agents
In backend/aci/common/schemas/undefined_aware_base_model.py around line 33, the
docstring contains a typo: "ewhen" should be "when"; update the docstring to
replace "ewhen" with "when" so the sentence reads correctly.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
backend/aci/common/schemas/undefined_aware_base_model.py (2)

22-25: Past review comment remains unaddressed: Use PrivateAttr() for internal configuration.

The internal fields _non_nullable_fields and _dump_without_exclude_unset_behavior are still defined as regular Pydantic fields instead of private attributes. This means they will be serialized, validated, and exposed in the JSON schema. Additionally, the mutable default [] can cause shared state bugs.


33-33: Past review comment remains unaddressed: Fix typo in docstring.

The typo "ewhen" should be "when" in the docstring.

🧹 Nitpick comments (2)
backend/aci/common/schemas/undefined_aware_base_model.py (2)

63-64: Resolve or track the incomplete TODO.

The TODO suggests that direct field access (e.g., model.field) may bypass the undefined-awareness protections. This could undermine the model's purpose if users access fields directly rather than through model_dump(exclude_unset=True).

Would you like me to:

  1. Generate code to intercept __getattribute__ with proper undefined handling?
  2. Open a new issue to track this as a follow-up enhancement?
  3. Document in the class docstring that users must use model_dump(exclude_unset=True) or check model_fields_set rather than direct access?

16-61: Consider Pydantic 2.12's MISSING sentinel for cleaner undefined handling.

Pydantic 2.12 introduced a native MISSING sentinel specifically to distinguish "no value" from None, which aligns with your use case. This could eliminate the need for the workaround approach (making all fields nullable with custom validation).

Based on learnings.

Example pattern with MISSING:

from pydantic import BaseModel, MISSING

class MyModel(BaseModel):
    field: int | None = MISSING  # absent by default
    
# field not in instance.model_fields_set -> value is MISSING
# field explicitly set to None -> value is None
# field set to 5 -> value is 5

This approach would:

  • Eliminate the need for _non_nullable_fields tracking
  • Remove the custom validator complexity
  • Provide clearer semantics for API consumers
  • Work naturally with model_dump(exclude_unset=True)

If adopting this, you could still keep the model_dump override to enforce exclude_unset usage.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e25f4cd and 8be703e.

📒 Files selected for processing (1)
  • backend/aci/common/schemas/undefined_aware_base_model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/aci/common/schemas/undefined_aware_base_model.py (1)
backend/aci/common/logging_setup.py (1)
  • get_logger (44-45)

Comment on lines +22 to +45
_non_nullable_fields: list[str] = []
_dump_without_exclude_unset_behavior: BehaviorOnDumpWithoutExcludeUnset = (
BehaviorOnDumpWithoutExcludeUnset.WARN
)

@model_validator(mode="after")
def validate_non_nullable_fields(self) -> "UndefinedAwareBaseModel":
"""
As there is no easy way to differentiate between "None" and "Undefined" with Pydantic.
We don't know whether caller do not provide a value for a field or want to explicitly set
it to None. We use a workaround as follow:
- We allow all fields to be nullable and default to None ewhen defining the pydantic model.
- We use a custom validator to check for non-nullable fields.
- Only when caller provided a value, field name will be in `model.model_fields_set`.
- When updating to database, we either check `model.model_fields_set` or use
`model_dump(exclude_unset=True)` to exclude unset fields.
- If model_dump is called without exclude_unset, we console warn (default) or raise error.
"""

non_nullable_fields = self._non_nullable_fields
for field in self.model_fields_set:
if field in non_nullable_fields and getattr(self, field) is None:
raise ValueError(f"{field} cannot be None if it is provided.")
return self
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Unclear how subclasses populate _non_nullable_fields.

The validation logic relies on _non_nullable_fields, but there's no documented or type-safe mechanism for subclasses to declare which fields should be non-nullable. Users may not understand how to properly configure this base model.

Consider one of these approaches:

  1. Class-level configuration using ConfigDict (if feasible with your design):
class ConfigDict:
    non_nullable_fields: list[str] = []
  1. Document the expected usage pattern in the docstring:
class MyModel(UndefinedAwareBaseModel):
    field1: str | None = None
    field2: int | None = None
    _non_nullable_fields: list[str] = PrivateAttr(default_factory=lambda: ["field1"])
  1. Consider leveraging Pydantic 2.12's new MISSING sentinel (based on learnings), which provides a native way to distinguish undefined from None without requiring this workaround.

Comment on lines +57 to +60
raise SyntaxError(
"model_dump is called without providing `exclude_unset` args. This may "
"accidentally include unset fields."
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use ValueError instead of SyntaxError.

SyntaxError is semantically incorrect for runtime validation errors—it's reserved for syntax parsing failures. Use ValueError or define a custom exception.

Apply this diff:

-                    raise SyntaxError(
+                    raise ValueError(
                         "model_dump is called without providing `exclude_unset` args. This may "
                         "accidentally include unset fields."
                     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raise SyntaxError(
"model_dump is called without providing `exclude_unset` args. This may "
"accidentally include unset fields."
)
raise ValueError(
"model_dump is called without providing `exclude_unset` args. This may "
"accidentally include unset fields."
)
🤖 Prompt for AI Agents
In backend/aci/common/schemas/undefined_aware_base_model.py around lines 57 to
60, the code currently raises SyntaxError when model_dump is called without
exclude_unset; replace this with ValueError (or a project-specific custom
exception) to correctly represent a runtime validation error, updating the raise
statement message to use ValueError(...) and keeping the existing message text
unchanged.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant