Limit EIA-861 years in the fast ETL #4568
base: main
Changes from 17 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,17 +47,17 @@ | |
|
|
||
| The changes are applied locally to EIA 861 tables. | ||
|
|
||
| - * `id` (int): EIA balancing authority identifier (`balancing_authority_id_eia`). | ||
| - * `from` (int): Reference year, to use as a template for target years. | ||
| - * `to` (List[int]): Target years, in the closed interval format [minimum, maximum]. | ||
| - Rows in `core_eia861__yearly_balancing_authority` are added (if missing) for every target year | ||
| + * ``id`` (int): EIA balancing authority identifier (``balancing_authority_id_eia``). | ||
| + * ``from`` (int): Reference year, to use as a template for target years. | ||
| + * ``to`` (List[int]): Target years, in the closed interval format [minimum, maximum]. | ||
| + Rows in ``core_eia861__yearly_balancing_authority`` are added (if missing) for every target year | ||
| with the attributes from the reference year. | ||
| - Rows in `core_eia861__assn_balancing_authority` are added (or replaced, if existing) | ||
| + Rows in ``core_eia861__assn_balancing_authority`` are added (or replaced, if existing) | ||
| for every target year with the utility associations from the reference year. | ||
| - Rows in `core_eia861__yearly_service_territory` are added (if missing) for every target year | ||
| + Rows in ``core_eia861__yearly_service_territory`` are added (if missing) for every target year | ||
| with the nearest year's associated utilities' counties. | ||
| - * `exclude` (Optional[List[str]]): Utilities to exclude, by state (two-letter code). | ||
| - Rows are excluded from `core_eia861__assn_balancing_authority` with target year and state. | ||
| + * ``exclude`` (Optional[List[str]]): Utilities to exclude, by state (two-letter code). | ||
| + Rows are excluded from ``core_eia861__assn_balancing_authority`` with target year and state. | ||
| """ | ||
|
|
||
| UTILITIES: list[dict[str, Any]] = [ | ||
|
|
@@ -76,14 +76,14 @@ | |
|
|
||
| The changes are applied locally to EIA 861 tables. | ||
|
|
||
| - * `id` (int): EIA balancing authority (BA) identifier (`balancing_authority_id_eia`). | ||
| - Rows for `id` are removed from `core_eia861__yearly_balancing_authority`. | ||
| - * `reassign` (Optional[bool]): Whether to reassign utilities to parent BAs. | ||
| - Rows for `id` as BA in `core_eia861__assn_balancing_authority` are removed. | ||
| - Utilities assigned to `id` for a given year are reassigned | ||
| - to the BAs for which `id` is an associated utility. | ||
| - * `replace` (Optional[bool]): Whether to remove rows where `id` is a utility in | ||
| - `core_eia861__assn_balancing_authority`. Applies only if `reassign=True`. | ||
| + * ``id`` (int): EIA balancing authority (BA) identifier (``balancing_authority_id_eia``). | ||
| + Rows for ``id`` are removed from ``core_eia861__yearly_balancing_authority``. | ||
| + * ``reassign`` (Optional[bool]): Whether to reassign utilities to parent BAs. | ||
| + Rows for ``id`` as BA in ``core_eia861__assn_balancing_authority`` are removed. | ||
| + Utilities assigned to ``id`` for a given year are reassigned | ||
| + to the BAs for which ``id`` is an associated utility. | ||
| + * ``replace`` (Optional[bool]): Whether to remove rows where ``id`` is a utility in | ||
| + ``core_eia861__assn_balancing_authority``. Applies only if ``reassign=True``. | ||
| """ | ||
|
|
||
| ################################################################################ | ||
|
|
@@ -186,28 +186,35 @@ def filled_core_eia861__yearly_balancing_authority( | |
| """Modified core_eia861__yearly_balancing_authority table. | ||
|
|
||
| This function adds rows for each balancing authority-year pair missing from the | ||
| - cleaned core_eia861__yearly_balancing_authority table, using a dictionary of manual fixes. It | ||
| - uses the reference year as a template. The function also removes balancing | ||
| - authorities that are manually categorized as utilities. | ||
| + cleaned :ref:`core_eia861__yearly_balancing_authority` table, using a dictionary | ||
| + of manual fixes. It uses the reference year as a template. The function also removes | ||
| + balancing authorities that are manually categorized as utilities. | ||
| """ | ||
| df = core_eia861__yearly_balancing_authority | ||
| index = ["balancing_authority_id_eia", "report_date"] | ||
| dfi = df.set_index(index) | ||
| # Prepare reference rows | ||
| - keys = [(fix["id"], pd.Timestamp(fix["from"], 1, 1)) for fix in ASSOCIATIONS] | ||
| + eia861_years = df["report_date"].dt.year.unique() | ||
| + keys = [ | ||
| + (fix["id"], pd.Timestamp(fix["from"], 1, 1)) | ||
| + for fix in ASSOCIATIONS | ||
| + if fix["from"] in eia861_years | ||
| + ] | ||
|
Comment on lines +198 to +202
Contributor
Consider defining a local version of `ASSOCIATIONS` to use both here and at line 209.
||
| refs = dfi.loc[keys].reset_index().to_dict("records") | ||
| # Build table of new rows | ||
| # Insert row for each target balancing authority-year pair | ||
| # missing from the original table, using the reference year as a template. | ||
| rows: list[dict[str, Any]] = [] | ||
| - for ref, fix in zip(refs, ASSOCIATIONS, strict=True): | ||
| + for ref, fix in zip( | ||
| + refs, [fx for fx in ASSOCIATIONS if fx["from"] in eia861_years], strict=True | ||
| + ): | ||
| for year in range(fix["to"][0], fix["to"][1] + 1): | ||
| key = (fix["id"], pd.Timestamp(year, 1, 1)) | ||
|
Contributor
Would it make sense to skip iterations here when `year` is not in `eia861_years`, instead of filtering at line 216?
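A rough sketch of what skipping inside the loop might look like, assuming the loaded years are available as a set of integers. The helper `target_years`, the sample fix, the `existing` set, and the `report_year` column are hypothetical stand-ins; the real code keys on `report_date` timestamps and checks `dfi.index`.

```python
from typing import Any


def target_years(fix: dict[str, Any], eia861_years: set[int]) -> list[int]:
    """Expand the closed interval fix["to"], skipping years that are not loaded."""
    first, last = fix["to"]
    return [year for year in range(first, last + 1) if year in eia861_years]


# Hypothetical fix and loaded years, for illustration only.
fix = {"id": 1, "from": 2019, "to": [2018, 2022]}
eia861_years = {2019, 2020, 2021}

# (BA id, year) pairs already present in the table; stands in for dfi.index.
existing = {(1, 2019), (1, 2020)}

rows = []
for year in target_years(fix, eia861_years):
    key = (fix["id"], year)
    if key not in existing:
        rows.append({"balancing_authority_id_eia": key[0], "report_year": key[1]})
print(rows)
```

With the years filtered up front, no rows are built for unloaded years, so a later filter on the concatenated frame would have nothing left to remove.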
||
| if key not in dfi.index: | ||
| rows.append({**ref, "report_date": key[1]}) | ||
| - df = pd.concat( | ||
| - [df, apply_pudl_dtypes(pd.DataFrame(rows), group="eia")], axis="index" | ||
| - ) | ||
| + new_rows = apply_pudl_dtypes(pd.DataFrame(rows), group="eia") | ||
| + new_rows = new_rows[new_rows["report_date"].dt.year.isin(eia861_years)] | ||
| + df = pd.concat([df, new_rows], axis="index") | ||
| # Remove balancing authorities treated as utilities | ||
| mask = df["balancing_authority_id_eia"].isin([util["id"] for util in UTILITIES]) | ||
| return apply_pudl_dtypes(df[~mask], group="eia") | ||
|
|
@@ -219,10 +226,10 @@ def filled_core_eia861__assn_balancing_authority( | |
| """Modified core_eia861__assn_balancing_authority table. | ||
|
|
||
| This function adds rows for each balancing authority-year pair missing from the | ||
| - cleaned core_eia861__assn_balancing_authority table, using a dictionary of manual fixes. | ||
| - It uses the reference year as a template. The function also reassigns balancing | ||
| - authorities that are manually categorized as utilities to their parent balancing | ||
| - authorities. | ||
| + cleaned :ref:`core_eia861__assn_balancing_authority` table, using a dictionary of | ||
| + manual fixes. It uses the reference year as a template. The function also reassigns | ||
| + balancing authorities that are manually categorized as utilities to their parent | ||
| + balancing authorities. | ||
| """ | ||
| df = core_eia861__assn_balancing_authority | ||
| # Prepare reference rows | ||
|
|
@@ -249,7 +256,10 @@ def filled_core_eia861__assn_balancing_authority( | |
| tables.append(ref.assign(report_date=key[1])) | ||
| replaced |= mask | ||
| # Append to original table with matching rows removed | ||
| - df = pd.concat([df[~replaced], apply_pudl_dtypes(pd.concat(tables), group="eia")]) | ||
| + new_rows = apply_pudl_dtypes(pd.concat(tables), group="eia") | ||
| + eia861_years = df["report_date"].dt.year.unique() | ||
| + new_rows = new_rows[new_rows["report_date"].dt.year.isin(eia861_years)] | ||
| + df = pd.concat([df[~replaced], new_rows], axis="index") | ||
| # Remove balancing authorities treated as utilities | ||
| mask = np.zeros(df.shape[0], dtype=bool) | ||
| tables = [] | ||
|
|
@@ -300,20 +310,22 @@ def filled_service_territory_eia861( | |
| """Modified core_eia861__yearly_service_territory table. | ||
|
|
||
| This function adds rows for each balancing authority-year pair missing from the | ||
| - cleaned core_eia861__yearly_service_territory table, using a dictionary of manual fixes. It also | ||
| - drops utility-state combinations which are missing counties across all years of | ||
| - data, fills records missing counties with the nearest year of county data for the | ||
| - same utility and state. | ||
| + cleaned :ref:`core_eia861__yearly_service_territory` table, using a dictionary of | ||
| + manual fixes. It also drops utility-state combinations which are missing counties | ||
| + across all years of data, fills records missing counties with the nearest year of | ||
| + county data for the same utility and state. | ||
|
|
||
| """ | ||
| index = ["utility_id_eia", "state", "report_date"] | ||
| # Select relevant balancing authority-utility associations | ||
| assn = filled_core_eia861__assn_balancing_authority( | ||
| core_eia861__assn_balancing_authority | ||
| ) | ||
| + eia861_years = core_eia861__yearly_service_territory["report_date"].dt.year.unique() | ||
| selected = np.zeros(assn.shape[0], dtype=bool) | ||
| for fix in ASSOCIATIONS: | ||
| years = [fix["from"], *range(fix["to"][0], fix["to"][1] + 1)] | ||
| - dates = [pd.Timestamp(year, 1, 1) for year in years] | ||
| + dates = [pd.Timestamp(year, 1, 1) for year in years if year in eia861_years] | ||
| mask = assn["balancing_authority_id_eia"].eq(fix["id"]).to_numpy(bool) | ||
| mask[mask] = assn["report_date"][mask].isin(dates) | ||
| selected |= mask | ||
|
|
||
Consider adding an early exit here if `eia861_years` is empty.
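As a hedged illustration of the early-exit idea: the sketch below returns immediately when no years are loaded and otherwise collects the (BA id, year) pairs that the selection loop would match. The function `selected_pairs` and the sample fix are hypothetical, and what the real function should return on an early exit is not stated in this thread. Note that `len(...) == 0` is used instead of a bare truth test because `eia861_years` is a NumPy array in the PR, and the truth value of a multi-element array is ambiguous.

```python
from collections.abc import Sequence
from typing import Any


def selected_pairs(
    fixes: list[dict[str, Any]], eia861_years: Sequence[int]
) -> set[tuple[int, int]]:
    """Return the (BA id, year) pairs to select, or an empty set if no years are loaded."""
    if len(eia861_years) == 0:  # early exit; len() also works for a NumPy array
        return set()
    loaded = set(eia861_years)
    pairs: set[tuple[int, int]] = set()
    for fix in fixes:
        # Reference year plus every year in the closed target interval.
        years = [fix["from"], *range(fix["to"][0], fix["to"][1] + 1)]
        pairs.update((fix["id"], year) for year in years if year in loaded)
    return pairs


# Hypothetical fix, for illustration only.
fixes = [{"id": 1, "from": 2019, "to": [2020, 2021]}]
print(selected_pairs(fixes, []))
print(sorted(selected_pairs(fixes, [2019, 2020])))
```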