
add transformations #13

Open
emptymalei wants to merge 6 commits into main from lm/more-transformations

Conversation

@emptymalei
Owner

No description provided.

Copilot AI review requested due to automatic review settings August 15, 2025 07:49

Copilot AI left a comment

Pull Request Overview

This PR adds nine new transformation classes to haferml/transforms/dataframe.py, expanding the available data preprocessing capabilities. The new utility transformations cover common data manipulation tasks such as shuffling, type conversion, value replacement, and statistical operations.

Key changes include:

  • Added 9 new transformation classes extending TransformBase
  • Introduced a loguru logger for tracking transformations
  • Added utility classes for data preprocessing workflows
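To illustrate the transform pattern the overview describes, here is a minimal sketch of two such transforms. TransformBase, Shuffle, and ConvertType below are hypothetical stand-ins written for this example, not the classes added in this PR, and stdlib logging is swapped in for loguru only to keep the sketch dependency-light.

```python
import logging

import pandas as pd

# Assumption: stdlib logging stands in for the loguru logger used in the PR.
logger = logging.getLogger(__name__)


class TransformBase:
    """Hypothetical stand-in for haferml's TransformBase: a callable on a DataFrame."""

    def __call__(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class Shuffle(TransformBase):
    """Hypothetical: shuffle rows; a fixed random_state keeps the order reproducible."""

    def __init__(self, random_state: int = 42):
        self.random_state = random_state

    def __call__(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        logger.debug("Shuffling %d rows", len(dataframe))
        # frac=1 samples every row exactly once, i.e. a permutation
        return dataframe.sample(frac=1, random_state=self.random_state)


class ConvertType(TransformBase):
    """Hypothetical: cast one column to a dtype and return a new DataFrame."""

    def __init__(self, column_name: str, dtype):
        self.column_name = column_name
        self.dtype = dtype

    def __call__(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        logger.debug("Casting %s to %s", self.column_name, self.dtype)
        return dataframe.astype({self.column_name: self.dtype})


# Transforms compose by applying them in sequence.
df = pd.DataFrame({"a": ["1", "2", "3"]})
for transform in [Shuffle(random_state=0), ConvertType("a", int)]:
    df = transform(df)
```

Because each transform returns a new DataFrame instead of mutating its input, a list of transforms can be applied in order as a simple pipeline.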


Comment thread haferml/transforms/dataframe.py Outdated
Comment thread haferml/transforms/dataframe.py Outdated
Comment thread haferml/transforms/dataframe.py
Comment thread haferml/transforms/dataframe.py Outdated
Default is to overwrite original dt_column
"""

def __init__(self, column_name: str, target_column: Optional[str] = None):

Copilot AI Aug 15, 2025

The type hint Optional[str] is used but Optional is not imported. You need to add Optional to the imports from typing.
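A minimal sketch of the fix, assuming the surrounding class only needs the import added. ConvertColumnExample is a hypothetical name chosen here to mirror the reviewed signature; it is not the class in haferml.

```python
from typing import Optional


class ConvertColumnExample:
    """Hypothetical class mirroring the reviewed __init__ signature."""

    def __init__(self, column_name: str, target_column: Optional[str] = None):
        self.column_name = column_name
        # Fall back to overwriting the source column when no target is given,
        # matching the docstring note "Default is to overwrite original".
        self.target_column = target_column if target_column is not None else column_name
```

On Python 3.10 and later, `str | None` is an equivalent annotation that needs no import from typing.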

:param lambda_filter: a callable that specifies which row to filter
:param column_to_replace: which column to replace values with
:param replacement_value: the value to replace with

Copilot AI Aug 15, 2025

The docstring example shows ReplaceValues but this is the AddColumnWithCondition class. The example should demonstrate AddColumnWithCondition usage instead.

Suggested change
:param replacement_value: the value to replace with
Example:
```python
# Suppose you want to add a column "is_bad" that is True if "indicator_column" == "bad_value"
lambda_compute = lambda x: x["indicator_column"] == "bad_value"
add_col = AddColumnWithCondition(
lambda_compute=lambda_compute,
target_column="is_bad"
)
df = add_col(df)
```
:param lambda_compute: a callable that computes the value for each row (applied with DataFrame.apply, axis=1)
:param target_column: the name of the column to add or overwrite
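The suggested docstring example can be exercised end to end. The class below is a hypothetical re-implementation written only to show the behavior the docstring describes (row-wise DataFrame.apply with axis=1, result stored in target_column); the real AddColumnWithCondition in haferml/transforms/dataframe.py may differ.

```python
import pandas as pd


class AddColumnWithCondition:
    """Hypothetical re-implementation for illustration only."""

    def __init__(self, lambda_compute, target_column: str):
        self.lambda_compute = lambda_compute
        self.target_column = target_column

    def __call__(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        dataframe = dataframe.copy()  # avoid mutating the caller's frame
        # axis=1 passes each row (as a Series) to lambda_compute
        dataframe[self.target_column] = dataframe.apply(self.lambda_compute, axis=1)
        return dataframe


df = pd.DataFrame({"indicator_column": ["good_value", "bad_value", "good_value"]})
add_col = AddColumnWithCondition(
    lambda_compute=lambda x: x["indicator_column"] == "bad_value",
    target_column="is_bad",
)
df = add_col(df)
```

Row-wise apply is flexible but slow on large frames; for a simple comparison like this one, the vectorized `df["indicator_column"].eq("bad_value")` gives the same column.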

)
```

:param lambda_filter: a callable that specifies which row to filter

Copilot AI Aug 15, 2025

The parameter documentation refers to lambda_filter but the actual parameter name is lambda_compute. The documentation should match the parameter name.

Suggested change
:param lambda_filter: a callable that specifies which row to filter
lambda_compute = lambda x: x["indicator_column"] == "bad_value"
replace_val = ReplaceValues(
lambda_compute=lambda_compute,
column_to_replace="value_a_column",
replacement_value=np.nan
)
```
:param lambda_compute: a callable that specifies which row to filter
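The ReplaceValues example in this suggestion boils down to a boolean mask plus a .loc assignment. The function below is a hypothetical sketch that takes the argument names used in the suggestion; it is not the ReplaceValues class from the PR, whose actual parameter names may differ.

```python
import numpy as np
import pandas as pd


def replace_values(dataframe, lambda_compute, column_to_replace, replacement_value):
    """Hypothetical sketch mirroring the suggested ReplaceValues arguments."""
    dataframe = dataframe.copy()
    # Boolean mask: True for rows where lambda_compute(row) holds
    mask = dataframe.apply(lambda_compute, axis=1).astype(bool)
    # Overwrite only the selected rows of the chosen column
    dataframe.loc[mask, column_to_replace] = replacement_value
    return dataframe


df = pd.DataFrame(
    {"indicator_column": ["ok", "bad_value", "ok"], "value_a_column": [1.0, 2.0, 3.0]}
)
df = replace_values(
    df,
    lambda_compute=lambda x: x["indicator_column"] == "bad_value",
    column_to_replace="value_a_column",
    replacement_value=np.nan,
)
```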

```

:param lambda_filter: a callable that specifies which row to filter
:param column_to_replace: which column to replace values with

Copilot AI Aug 15, 2025

The parameter documentation refers to column_to_replace but the actual parameter name is target_column. The documentation should match the parameter name.

Suggested change
:param column_to_replace: which column to replace values with
:param target_column: which column to replace values with

:param lambda_filter: a callable that specifies which row to filter
:param column_to_replace: which column to replace values with
:param replacement_value: the value to replace with

Copilot AI Aug 15, 2025

The parameter documentation refers to replacement_value but this parameter doesn't exist in the AddColumnWithCondition class. This documentation appears to be copied from another class.

Suggested change
:param replacement_value: the value to replace with
"""Add a calculated column based on a lambda function.
Example:
```python
# Adds a new column 'is_bad' based on a condition
lambda_compute = lambda x: x["indicator_column"] == "bad_value"
add_col = AddColumnWithCondition(
lambda_compute=lambda_compute,
target_column="is_bad"
)
```
:param lambda_compute: a callable that computes the value for each row
:param target_column: the name of the column to add or overwrite
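Several of these review comments flag Sphinx-style `:param` entries that drifted from the real __init__ signature. A small check could catch such drift automatically, for example in a unit test. The helper below is a hypothetical sketch built on inspect.signature and a regular expression; it assumes `:param name:` docstring syntax, and Example is a made-up class reproducing the stale docs.

```python
import inspect
import re


def undocumented_and_stale_params(cls):
    """Hypothetical helper: compare `:param name:` docs with cls.__init__'s signature."""
    params = {
        name
        for name, p in inspect.signature(cls.__init__).parameters.items()
        if name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }
    docs = (cls.__doc__ or "") + (cls.__init__.__doc__ or "")
    documented = set(re.findall(r":param\s+(\w+):", docs))
    # (in signature but undocumented, documented but not in signature)
    return params - documented, documented - params


class Example:
    """:param lambda_filter: a callable that specifies which row to filter
    :param column_to_replace: which column to replace values with
    """

    def __init__(self, lambda_compute, target_column):
        self.lambda_compute = lambda_compute
        self.target_column = target_column


missing, stale = undocumented_and_stale_params(Example)
```

Here `missing` holds the real parameters lacking docs and `stale` holds documented names that no longer exist, which are exactly the two kinds of mismatch raised in the comments.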

Comment thread haferml/transforms/dataframe.py Outdated
emptymalei and others added 5 commits August 15, 2025 07:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

2 participants