WIP: Add FERC Form 1 identification table to PUDL#5008
Draft
katie-lamb
reviewed
Apr 2, 2026
Comment on lines +3157 to +3162
df[
    ["office_street_address", "office_city", "office_state", "office_zip_code"]
] = pd.DataFrame(
    df["office_street_address"].apply(parse_address).tolist(),
    index=df.index,
)
a few suggested cleaning steps:
- it might be nice to make the address fields lowercase or title case, but not a big deal. that will be done in splink during the match anyway
- expand address suffixes to their full names and strip punctuation, or vice versa (e.g. st. -> street). this could also happen in a separate cleaning step within the match. i believe i do this for the SEC addresses and can dig it up
taking these cleaning steps also depends on how useful address actually is for the match. maybe zip or state is really more useful
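The cleaning steps suggested above could look roughly like the sketch below. It is only an illustration, not the cleaning used for the SEC addresses: the function name `clean_street_address` and the `SUFFIX_EXPANSIONS` map are hypothetical, and the suffix list is deliberately tiny.

```python
import re

# Hypothetical, deliberately small suffix map. A real one would be much longer,
# and "st" is ambiguous (street vs. saint), so treat this as an assumption.
SUFFIX_EXPANSIONS = {
    "st": "street",
    "ave": "avenue",
    "blvd": "boulevard",
    "rd": "road",
    "dr": "drive",
}


def clean_street_address(address):
    """Lowercase, strip punctuation, and expand common street suffixes."""
    if address is None:
        return None
    # Lowercase, then replace anything that is not a word character or
    # whitespace with a space so "St." and "Ave.," lose their punctuation.
    text = re.sub(r"[^\w\s]", " ", address.lower())
    # Splitting on whitespace also collapses repeated spaces.
    tokens = [SUFFIX_EXPANSIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

If address turns out to matter less than zip or state for the match, this step could be skipped or moved into the match-time cleaning instead.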
katie-lamb
reviewed
Apr 2, 2026
Comment on lines +2612 to +2614
parsed.get("PlaceName", None),
parsed.get("StateName", None),
parsed.get("ZipCode", None),
it seems like we can maybe do the match without state and zip, but it's probably nice to have those columns in this table anyways.
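For reference, a minimal sketch of how the `.get` lookups above keep the city, state, and zip columns aligned even when a component is missing. It assumes `parsed` is a plain dict of address components keyed by the labels shown in the diff; the helper name `extract_locality` is hypothetical and is not the PR's `parse_address`.

```python
def extract_locality(parsed):
    """Return (city, state, zip) from a dict of tagged address components.

    dict.get returns None for a missing key, so every address yields a
    3-tuple and the resulting columns line up row by row.
    """
    return (
        parsed.get("PlaceName"),
        parsed.get("StateName"),
        parsed.get("ZipCode"),
    )


# Example input is made up for illustration; the zip code is missing on purpose.
example = {"PlaceName": "Springfield", "StateName": "IL"}
locality = extract_locality(example)
```

Keeping state and zip as their own columns costs little here, since rows without those components just carry None.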
Overview
Addresses first step in #5150
What problem does this address?
In #4975, we updated the utility matching method to auto-match utilities when they have identical (cleaned) utility names. We want to extend this method to also consider utility addresses when matching.
The first step to doing this will be to bring in the FERC Form 1 identification table, which includes data on utility addresses necessary for matching.
What did you change?
core_ferc1__yearly_identification_and_certification
Todos
Documentation
Make sure to update relevant aspects of the documentation:
- docs/data_sources/templates
- src/metadata
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
- dbt tests.
- pixi run pre-commit-run to run linters and static code analysis checks.
- pixi run pytest-ci locally to ensure that the merge queue will accept your PR.
- build-deploy-pudl GitHub Action manually and ensure that it succeeds.