Switch from pre-commit to prek; add security pre-commit hooks #5141
zaneselvans merged 4 commits into main
# happen to only be available via PyPI.
"catalystcoop.pudl" = { path = ".", editable = true }
# Furo is pure python and the conda package is like a year out of date.
detect-secrets = ">=1.5"
It's useful to have this in our environment for running or re-running baseline scans, but it's not strictly necessary since that's rare and the hook runs in its own isolated environment. It's not available on conda-forge, so we don't get to clean this section after the docs PR merges.
autoupdate_commit_msg: "[pre-commit.ci] pre-commit autoupdate"
autoupdate_schedule: weekly
- skip: [unit-tests, nb-output-clear, shellcheck]
+ skip: [unit-tests, nb-output-clear, shellcheck, trufflehog, detect-secrets]
The trufflehog and detect-secrets hooks need resources that aren't available on pre-commit.ci, so they don't work there.
- "--baseline"
- ".secrets.baseline"
- "--exclude-files"
- "(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
I did a full scan with no exclusions first to see where it would find false positives, and created this filter based on those results.
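To make the filter easier to reason about, here is a minimal Python sketch that applies the same pattern to a handful of paths. The sample paths are hypothetical, chosen only to exercise each alternative, and the sketch assumes the pattern is applied as an unanchored search against the file path; it does not call detect-secrets itself.

```python
import re

# Same pattern as the hook's --exclude-files argument, once YAML has
# unescaped "\\." to "\.". The (?x) flag enables verbose mode.
EXCLUDE_FILES = re.compile(
    r"(?x)(dbt/package-lock\.yml|.*\.ipynb|docs/.*\.html|migrations/.*|skills-lock\.json)"
)


def is_excluded(path: str) -> bool:
    """Return True if the pattern matches anywhere in the path (assumed search semantics)."""
    return EXCLUDE_FILES.search(path) is not None


# Hypothetical paths for illustration; they are not taken from the repository.
sample_paths = [
    "dbt/package-lock.yml",
    "notebooks/example.ipynb",
    "docs/_build/index.html",
    "migrations/versions/0001_example.py",
    "skills-lock.json",
    "src/pudl/settings.py",
    ".github/workflows/deploy-pudl.yml",
]

for path in sample_paths:
    status = "excluded" if is_excluded(path) else "scanned"
    print(f"{path}: {status}")
```

Because the search is unanchored in this sketch, a nested directory such as `src/migrations/` would also be excluded, which is worth keeping in mind when extending the pattern.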
jdangerx left a comment
Nice! Nothing blocking.
I think we can add a `workload_identity_provider:` line-exclusion rule, which leaves fewer special cases to track in the baseline. Here's the diff if you want to apply it.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 46e130920..bdbf2184b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -112,6 +112,8 @@ repos:
args:
- "--baseline"
- ".secrets.baseline"
+ - "--exclude-lines"
+ - "workload_identity_provider:"
- "--exclude-files"
- "(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
diff --git a/.secrets.baseline b/.secrets.baseline
index 587616db2..85262e0a2 100644
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -130,39 +130,15 @@
"pattern": [
"(?x)(dbt/package-lock\\.yml|.*\\.ipynb|docs/.*\\.html|migrations/.*|skills-lock\\.json)"
]
+ },
+ {
+ "path": "detect_secrets.filters.regex.should_exclude_line",
+ "pattern": [
+ "workload_identity_provider:"
+ ]
}
],
"results": {
- ".github/workflows/build-deploy-ferceqr.yml": [
- {
- "type": "Base64 High Entropy String",
- "filename": ".github/workflows/build-deploy-ferceqr.yml",
- "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
- "is_verified": false,
- "line_number": 63,
- "is_secret": false
- }
- ],
- ".github/workflows/build-deploy-pudl.yml": [
- {
- "type": "Base64 High Entropy String",
- "filename": ".github/workflows/build-deploy-pudl.yml",
- "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
- "is_verified": false,
- "line_number": 110,
- "is_secret": false
- }
- ],
- ".github/workflows/deploy-pudl.yml": [
- {
- "type": "Base64 High Entropy String",
- "filename": ".github/workflows/deploy-pudl.yml",
- "hashed_secret": "99359e3ddedb1602025485e4ff9017fa7d71cc77",
- "is_verified": false,
- "line_number": 68,
- "is_secret": false
- }
- ],
"docker/dagster.yaml": [
{
"type": "Secret Keyword",
@@ -240,5 +216,5 @@
}
]
},
- "generated_at": "2026-03-29T17:53:48Z"
+ "generated_at": "2026-04-01T15:17:31Z"
}

Switch from pre-commit to prek, a parallelized Rust-based drop-in
replacement, to speed up hook execution during development. Also fixes
several unquoted shell variable warnings (SC2086) in GitHub Actions
workflows and updates the dataset label sed script to handle any
whitespace.
- Replace `pre-commit` dependency with `prek` in pyproject.toml
- Rename pixi tasks: pre-commit-{run,install,autoupdate} → prek-{run,install,autoupdate}
- Update AGENTS.md and dev_setup.rst to reference prek commands and docs
- Update update-lockfiles.yml to call prek-autoupdate
- Fix unquoted $GITHUB_OUTPUT and dataset arg in update-dois.yml
- Update sed pattern to handle any whitespace: `sed -E 's/[[:space:]]+/, /g'`
- Regenerate pixi.lock file to reflect the new dependencies.
- Add an exception to security screening pre-commit hook for WIF tokens
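
The whitespace change to the sed pattern can be illustrated with a short Python sketch. This is an analogue rather than the workflow's code: `re.sub` with `\s+` stands in for `sed -E 's/[[:space:]]+/, /g'`, a spaces-only pattern stands in for the earlier `sed 's/ \+/, /g'`, and the dataset names are placeholders.

```python
import re


def labels_spaces_only(datasets: str) -> str:
    """Analogue of the old pattern: only runs of literal spaces become ', '."""
    return re.sub(r" +", ", ", datasets)


def labels_any_whitespace(datasets: str) -> str:
    """Analogue of the new pattern: any run of whitespace becomes ', '."""
    return re.sub(r"\s+", ", ", datasets)


# Placeholder input mixing a tab and a double space between names.
datasets = "eia860\teia923  ferc1"

# The spaces-only version leaves the tab in place between the first two names.
print(repr(labels_spaces_only(datasets)))
# The whitespace version separates all three names with ', '.
print(repr(labels_any_whitespace(datasets)))
```

Python's `\s` is slightly broader than POSIX `[[:space:]]` for Unicode input, so treat this as an approximation of the sed behavior.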
@jdangerx I've replaced
run: |
DEST_DIR=${{ case(github.ref == 'refs/heads/main', 'latest', github.ref) }}
- echo DEST_DIR=${DEST_DIR#refs/heads/} > $GITHUB_ENV
+ echo DEST_DIR=${DEST_DIR#refs/heads/} > "$GITHUB_ENV"
Fixed some linting issues that came up when I ran prek run --all-files
run: |
DATASETS="${{ github.event.inputs.datasets }}"
- LABELS=$(echo "$DATASETS" | sed 's/ \+/, /g')
+ # shellcheck disable=SC2001
ShellCheck is like "Why you wanna use sed for this?"
pixi update --json | pixi exec pixi-diff-to-markdown >> diff.md
pixi install --locked
- pixi run pre-commit-autoupdate
+ pixi run prek-autoupdate
I think this is the only place we're programmatically using it.
Not sure how this guy snuck in without clearing the outputs.
- "--exclude-lines"
- "workload_identity_provider:"
Removed the (now) false WIF positives.
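For context, here is a rough Python sketch of what a line-exclusion rule like this does: any line matching the pattern is dropped before the secret detectors see it. The workflow snippet and its values are invented for illustration, and the real filtering happens inside detect-secrets (the baseline diff above references `detect_secrets.filters.regex.should_exclude_line`); this only models the idea.

```python
import re

# Pattern taken from the --exclude-lines argument in the hook config.
EXCLUDE_LINES = re.compile(r"workload_identity_provider:")


def lines_to_scan(text: str) -> list[str]:
    """Return only the lines that would still be passed on for scanning."""
    return [line for line in text.splitlines() if not EXCLUDE_LINES.search(line)]


# Hypothetical workflow fragment; the provider and account values are made up.
workflow_snippet = """\
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: "projects/000000/locations/global/example"
          service_account: "example@example.iam.gserviceaccount.com"
"""

for line in lines_to_scan(workflow_snippet):
    print(line)
```

The trade-off is that a rule keyed on the field name skips every such line, so it trades three per-file baseline entries for one broader exception.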
grpcio = "==1.78.1"
grpcio-health-checking = "==1.78.1"
grpcio-status = "==1.78.1"
Sneaking this pin update in so I don't forget.
Popped this out and back into the queue because it got hung up installing the pixi environment for some reason.
Overview
Documentation
To-do list
- `pixi run prek --all-files` and fix the issues it raises
- `pixi run prek-run` to run linters and static code analysis checks.