Skip to content

Switch from pre-commit to prek; add security pre-commit hooks#5141

Merged
zaneselvans merged 4 commits intomainfrom
trufflehog
Apr 1, 2026
Merged

Switch from pre-commit to prek; add security pre-commit hooks#5141
zaneselvans merged 4 commits intomainfrom
trufflehog

Conversation

@zaneselvans
Copy link
Copy Markdown
Member

@zaneselvans zaneselvans commented Mar 29, 2026

Overview

  • Adds two new pre-commit hooks: trufflehog and detect-secrets
  • These will help prevent us from accidentally committing passwords, API keys, and access tokens to the repository.
  • PR includes results of a baseline scan with detect-secrets with the identified suspicious strings marked as false-positives after review, so that they are ignored in the future.
  • Partly this was inspired by setting up my own dotfiles repo to track my local configurations, and partly by just how easy it is to create multi-thousand line PRs with the help of coding agents... and it would be real bad if they snuck authentication tokens into the codebase.
  • We are also starting to deploy more services to cloud resources, so there's an increasing risk of doing this.

Documentation

  • Review and update any other aspects of the documentation that might be affected by this PR.

To-do list

  • Run pixi run prek --all-files and fix the issues it raises
  • Run pixi run prek-run to run linters and static code analysis checks.
  • Review the PR yourself and call out any questions or issues you have.

@zaneselvans zaneselvans self-assigned this Mar 29, 2026
@zaneselvans zaneselvans added developer experience Things that make the developers' lives easier, but don't necessarily directly improve the data. testing Writing tests, creating test data, automating testing, etc. labels Mar 29, 2026
@zaneselvans zaneselvans moved this from New to In progress in Catalyst Megaproject Mar 29, 2026
@zaneselvans zaneselvans added the cloud Stuff that has to do with adapting PUDL to work in cloud computing context. label Mar 29, 2026
# happen to only be available via PyPI.
"catalystcoop.pudl" = { path = ".", editable = true }
# Furo is pure python and the conda package is like a year out of date.
detect-secrets = ">=1.5"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to have this in our environment for running or re-running baseline scans, but it's not strictly necessary since that's rare and the hook runs in its own isolated environment. It's not available on conda-forge, so we don't get to clean this section after the docs PR merges.

autoupdate_commit_msg: "[pre-commit.ci] pre-commit autoupdate"
autoupdate_schedule: weekly
skip: [unit-tests, nb-output-clear, shellcheck]
skip: [unit-tests, nb-output-clear, shellcheck, trufflehog, detect-secrets]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need some resources not available on pre-commit.ci so they don't work there.

- "--baseline"
- ".secrets.baseline"
- "--exclude-files"
- "(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a full scan with no exclusions first to see where it would find false positives, and created this filter based on those results.

Copy link
Copy Markdown
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Nothing blocking.

I think probably we can add a workload_identity_provider: rule which means we have fewer special cases to go through - here's the diff if you want to apply that.

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 46e130920..bdbf2184b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -112,6 +112,8 @@ repos:
         args:
           - "--baseline"
           - ".secrets.baseline"
+          - "--exclude-lines"
+          - "workload_identity_provider:"
           - "--exclude-files"
           - "(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
 
diff --git a/.secrets.baseline b/.secrets.baseline
index 587616db2..85262e0a2 100644
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -130,39 +130,15 @@
       "pattern": [
         "(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
       ]
+    },
+    {
+      "path": "detect_secrets.filters.regex.should_exclude_line",
+      "pattern": [
+        "workload_identity_provider:"
+      ]
     }
   ],
   "results": {
-    ".github/workflows/build-deploy-ferceqr.yml": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": ".github/workflows/build-deploy-ferceqr.yml",
-        "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
-        "is_verified": false,
-        "line_number": 63,
-        "is_secret": false
-      }
-    ],
-    ".github/workflows/build-deploy-pudl.yml": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": ".github/workflows/build-deploy-pudl.yml",
-        "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
-        "is_verified": false,
-        "line_number": 110,
-        "is_secret": false
-      }
-    ],
-    ".github/workflows/deploy-pudl.yml": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": ".github/workflows/deploy-pudl.yml",
-        "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
-        "is_verified": false,
-        "line_number": 68,
-        "is_secret": false
-      }
-    ],
     "docker/dagster.yaml": [
       {
         "type": "Secret Keyword",
@@ -240,5 +216,5 @@
       }
     ]
   },
-  "generated_at": "2026-03-29T17:53:48Z"
+  "generated_at": "2026-04-01T15:17:31Z"
 }

@jdangerx jdangerx moved this from In progress to In review in Catalyst Megaproject Apr 1, 2026
@zaneselvans zaneselvans moved this from In review to In progress in Catalyst Megaproject Apr 1, 2026
Switch from pre-commit to prek, a parallelized Rust-based drop-in
replacement, to speed up hook execution during development. Also fixes
several unquoted shell variable warnings (SC2086) in GitHub Actions
workflows and updates the dataset label sed script to handle any
whitespace.

- Replace `pre-commit` dependency with `prek` in pyproject.toml
- Rename pixi tasks: pre-commit-{run,install,autoupdate} → prek-{run,install,autoupdate}
- Update AGENTS.md and dev_setup.rst to reference prek commands and docs
- Update update-lockfiles.yml to call prek-autoupdate
- Fix unquoted $GITHUB_OUTPUT and dataset arg in update-dois.yml
- Update sed pattern to handle any whitespace: `sed -E 's/[[:space:]]+/, /g'`
- Regenerate pixi.lock file to reflect the new dependencies.
- Add an exception to security screening pre-commit hook for WIF tokens
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@zaneselvans
Copy link
Copy Markdown
Member Author

@jdangerx I've replaced pre-commit with prek if you'd like to take another look.

@zaneselvans zaneselvans requested a review from jdangerx April 1, 2026 19:30
run: |
DEST_DIR=${{ case(github.ref == 'refs/heads/main', 'latest', github.ref) }}
echo DEST_DIR=${DEST_DIR#refs/heads/} > $GITHUB_ENV
echo DEST_DIR=${DEST_DIR#refs/heads/} > "$GITHUB_ENV"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed some linting issues that came up when I ran prek run --all-files

run: |
DATASETS="${{ github.event.inputs.datasets }}"
LABELS=$(echo "$DATASETS" | sed 's/ \+/, /g')
# shellcheck disable=SC2001
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellCheck is like "Why you wanna use sed for this?"

pixi update --json | pixi exec pixi-diff-to-markdown >> diff.md
pixi install --locked
pixi run pre-commit-autoupdate
pixi run prek-autoupdate
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only place we're programmatically using it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this guy snuck in without clearing the outputs.

Comment on lines +115 to +116
- "--exclude-lines"
- "workload_identity_provider:"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclude the WIF

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the (now) false WIF positives.

Comment on lines +97 to +99
grpcio = "==1.78.1"
grpcio-health-checking = "==1.78.1"
grpcio-status = "==1.78.1"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaking this pin update in so I don't forget.

@zaneselvans zaneselvans changed the title Add trufflehog and detect-secrets to pre-commit hooks Switch from pre-commit to prek; add security pre-commit hooks Apr 1, 2026
@zaneselvans zaneselvans moved this from In progress to In review in Catalyst Megaproject Apr 1, 2026
@zaneselvans zaneselvans added this pull request to the merge queue Apr 1, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 1, 2026
@zaneselvans zaneselvans added this pull request to the merge queue Apr 1, 2026
@zaneselvans
Copy link
Copy Markdown
Member Author

Popped this out and back into the queue because it got hung up installing the pixi environment for some reason

Merged via the queue into main with commit 884494c Apr 1, 2026
15 checks passed
@zaneselvans zaneselvans deleted the trufflehog branch April 1, 2026 22:07
@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cloud Stuff that has to do with adapting PUDL to work in cloud computing context. developer experience Things that make the developers' lives easier, but don't necessarily directly improve the data. testing Writing tests, creating test data, automating testing, etc.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants