feat: aligners & variant callers #282
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,141 @@ | ||
| # AGENTS.md | ||
|
|
||
| Agent guidance for the St. Jude Cloud bioinformatics workflows repository. | ||
|
|
||
| ## What this repo is | ||
|
|
||
| A collection of WDL 1.1 pipelines and tool wrappers for genomic analysis. Primary artifacts are `.wdl` files; there is no traditional build step. Python and R exist only as scripts embedded in Docker images under `scripts/`. | ||
|
|
||
| ## Layout | ||
|
|
||
| ``` | ||
| workflows/ # End-to-end pipelines (rnaseq/, dnaseq/, chipseq/, methylation/, etc.) | ||
| tools/ # Individual WDL task wrappers (one tool per file) | ||
| data_structures/ # WDL struct definitions | ||
| docker/ # Dockerfiles + package.json version metadata per image | ||
| scripts/ # Python and R scripts (not standalone; embedded in Docker images) | ||
| tests/ # pytest-workflow YAML test definitions | ||
|   tools/ # Per-tool test YAMLs | ||
|   workflows/ # Per-workflow test YAMLs | ||
|   input_json/ # Input JSON fixtures | ||
|   input/ # Binary fixtures (BAMs, FASTQs, etc.), tracked via git-lfs | ||
| template/ # task-examples.wdl (canonical patterns) + common-parameter-meta.txt | ||
| developer_scripts/ | ||
|   run_sprocket_or_miniwdl.sh # Unified test runner | ||
|   update_container_tags.sh # Rewrites container tags for branch testing (do not commit) | ||
| ``` | ||
|
|
||
| ## Developer commands | ||
|
|
||
| ### Setup | ||
|
|
||
| ```bash | ||
| uv sync # Python deps (canonical; use this, not pip) | ||
| git lfs pull # Required after clone to get test fixture files | ||
| ``` | ||
|
|
||
| ### WDL lint / format / check (Sprocket) | ||
|
|
||
| ```bash | ||
| sprocket lint . # Lint all WDL | ||
| sprocket format tools/bwa.wdl # Format a file | ||
| sprocket check . # Thorough check (validates inputs, etc.) | ||
| sprocket validate <wdl> <inputs> # Validate inputs against a workflow | ||
| ``` | ||
|
|
||
| `sprocket.toml` disables the `ContainerUri` rule and sets `deny_notes = true` (notes in WDL cause check failure). | ||
|
|
||
| ### Python (scripts/ only) | ||
|
|
||
| ```bash | ||
| uv run ruff format --check --diff scripts/ | ||
| uv run ruff format scripts/ | ||
| uv run ruff check scripts/ | ||
| ``` | ||
|
|
||
| ### R (scripts/ only) | ||
|
|
||
| ```r | ||
| styler::style_dir() # format | ||
| lintr::lint_dir() # lint | ||
| ``` | ||
|
|
||
| ### Tests | ||
|
|
||
| ```bash | ||
| # All tests | ||
| uv run pytest --kwdof --wt $(nproc) | ||
|
|
||
| # Single test by tag | ||
| pytest --tag bwa | ||
|
|
||
| # Single test by name | ||
| pytest -k bwa_aln | ||
|
|
||
| # Choose runner (defaults to sprocket) | ||
| RUNNER=miniwdl pytest --tag bwa | ||
| ``` | ||
|
|
||
| `pytest.ini` always injects `--git-aware --symlink`. Use `--kwdof` to keep outputs on failure. | ||
|
|
||
| Every test calls `./developer_scripts/run_sprocket_or_miniwdl.sh` internally — do not call runners directly in test commands. | ||
|
|
||
| ### Run a WDL task directly (outside pytest) | ||
|
|
||
| ```bash | ||
| # sprocket | ||
| sprocket run --output-dir output --target bwa_aln tools/bwa.wdl <input_file> | ||
|
|
||
| # miniwdl | ||
| miniwdl run --task bwa_aln --verbose --dir output/. -i tests/tools/input_json/bwa_aln.json tools/bwa.wdl | ||
| ``` | ||
|
|
||
| ## CI pipeline | ||
|
|
||
| All jobs trigger on push. Key facts: | ||
| - `reference` and `slow` tags are **excluded** from the CI test matrix. | ||
| - CI deletes `slow`-tagged tests from YAML files in-place before running — do not replicate this destructively locally. | ||
| - CI builds Docker images and runs `update_container_tags.sh` to rewrite tags before pytest — do not commit the rewritten WDL files. | ||
| - Every test runs against both `sprocket` and `miniwdl` runners as a matrix. | ||
| - `sprocket` is built from source (`cargo install sprocket --locked`) in CI. | ||
| - `requirements-ci.txt` pins older versions than `pyproject.toml`; use `uv` locally. | ||
|
|
||
| ## WDL conventions (non-obvious) | ||
|
|
||
| - All WDL files must be `version 1.1`. | ||
| - All tasks must include `set -euo pipefail` when using pipes or multiple commands. | ||
| - Multi-core tasks: accept `use_all_cores: Boolean = false` and `ncpu: Int = 2`; use `$(nproc)` when `use_all_cores` is true. `use_all_cores` must be the last Boolean in the input block; `ncpu` precedes memory/disk inputs. | ||
| - Resource inputs (`ncpu`, `modify_memory_gb`, `modify_disk_size_gb`) must be overridable. | ||
| - Single output → `outfile_name`; multiple outputs or extension matters → `prefix`. | ||
| - BAM/BAI companion pairs must be localized via `ln -s` to CWD and cleaned up with `rm` at task end. | ||
| - Scripts go in `scripts/`; never embed Python/R directly in WDL `command` blocks. | ||
| - Imports within the repo must use **relative paths**. External imports must pin a **tagged release** (not `main`/`master`) — enforced by `pull-check.yaml`. | ||
| - Deprecated tasks: add `deprecated: true` and `warning: "**[DEPRECATED]**..."` to `meta`. Never add `deprecated: false`. | ||
| - Update the relevant `CHANGELOG.md` when modifying any WDL under a subdirectory. | ||
|
|
||
| ## Parameter meta conventions | ||
|
|
||
| Prefer names from `template/common-parameter-meta.txt`. Key ones: | ||
| - `bam` (not `input_bam`, `in_bam`) | ||
| - `bam_index` (not `bai`) | ||
| - `read_one_fastq_gz` / `read_two_fastq_gz` (not `read1`/`read2`) | ||
| - `paired_end` (not `paired`) | ||
|
|
||
| ## Docker image versioning | ||
|
|
||
| Each image lives in `docker/<tool>/` with `Dockerfile` and `package.json`. `version` = underlying tool version; `revision` starts at `0`, increments for image-only changes, resets to `0` on tool version upgrades. Prefer BioContainers images over creating new custom images. | ||
|
|
||
| ## Testing quirks | ||
|
|
||
| - Binary fixtures in `tests/input/` are in **git-lfs** — run `git lfs pull` before testing. | ||
| - Test files prefixed with `_` (e.g., `_test_methylation-preprocess.yaml`) are disabled. | ||
| - Sprocket outputs land in `output/runs/*/_latest/outputs.json`, then get copied to `output/outputs.json`. miniwdl outputs go directly to `output/`. Test YAMLs reference `output/outputs.json`. | ||
| - Fixture inputs are downsampled to small chromosomes (chrY/chrM, chr9/chr22) for speed. | ||
|
|
||
| ## Existing instruction sources | ||
|
|
||
| - `.github/instructions/wdl.instructions.md` — applies to `**/*.wdl`; references CONTRIBUTING.md, best-practices.md, template/task-examples.wdl, and template/common-parameter-meta.txt. | ||
| - `template/task-examples.wdl` — canonical WDL task patterns; read before writing new tasks. | ||
| - `template/common-parameter-meta.txt` — required/banned parameter_meta strings. | ||
| - `CONTRIBUTING.md` — general coding style. | ||
| - `best-practices.md` — WDL-specific best practices. |
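The parameter naming rules in the "Parameter meta conventions" section of the file above lend themselves to a mechanical check. The following is a minimal sketch, not part of the PR: `suggest_renames` and the `PREFERRED` table are hypothetical names, and the table contains only the pairs quoted in the diff. The full list lives in `template/common-parameter-meta.txt`, which is not shown here.

```python
# Hypothetical helper (not in the repository): maps discouraged input names
# to the preferred names quoted in AGENTS.md. Only the pairs visible in the
# diff are included; the real list is in template/common-parameter-meta.txt.
PREFERRED = {
    "input_bam": "bam",
    "in_bam": "bam",
    "bai": "bam_index",
    "read1": "read_one_fastq_gz",
    "read2": "read_two_fastq_gz",
    "paired": "paired_end",
}


def suggest_renames(input_names):
    """Return {discouraged_name: preferred_name} for names that should change."""
    return {name: PREFERRED[name] for name in input_names if name in PREFERRED}


if __name__ == "__main__":
    # "bam" and "ncpu" are already acceptable, so only two names are reported.
    print(suggest_renames(["bam", "bai", "read1", "ncpu"]))
```

A check like this would only catch names from the short list above; anything beyond that would still need the template file.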
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| FROM quay.io/biocontainers/samtools:1.17--h00cdaf9_0 AS samtools | ||
| FROM quay.io/biocontainers/bwa-mem2:2.3--he70b90d_0 | ||
|
|
||
| COPY --from=samtools /usr/local/bin/ /usr/local/bin/ | ||
| COPY --from=samtools /usr/local/lib/ /usr/local/lib/ | ||
| COPY --from=samtools /usr/local/libexec/ /usr/local/libexec/ | ||
|
|
||
| ENTRYPOINT [ "bwa-mem2" ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| { | ||
| "name": "bwamem2", | ||
| "version": "2.3", | ||
| "revision": "0" | ||
| } |
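The `version`/`revision` rule stated in AGENTS.md (revision starts at `0`, increments for image-only changes, resets to `0` on a tool upgrade) can be illustrated against this metadata file. This is a sketch under assumptions: `bump` is a hypothetical helper, `"2.4"` is a made-up version used only for illustration, and the revision is kept as a string because that is how it appears in the `package.json` above.

```python
import json


def bump(metadata, new_tool_version=None):
    """Return updated metadata following the revision rule from AGENTS.md.

    Hypothetical helper: a tool upgrade sets the new version and resets the
    revision to "0"; any other change to the image increments the revision.
    """
    updated = dict(metadata)
    if new_tool_version is not None and new_tool_version != metadata["version"]:
        updated["version"] = new_tool_version
        updated["revision"] = "0"
    else:
        updated["revision"] = str(int(metadata["revision"]) + 1)
    return updated


if __name__ == "__main__":
    current = json.loads('{"name": "bwamem2", "version": "2.3", "revision": "0"}')
    print(bump(current))         # image-only change: revision goes from "0" to "1"
    print(bump(current, "2.4"))  # hypothetical tool upgrade: revision stays at "0"
```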
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| FROM quay.io/biocontainers/samtools:1.17--h00cdaf9_0 AS samtools | ||
| FROM quay.io/biocontainers/hisat2:2.2.1--hdbdd923_7 | ||
|
|
||
| COPY --from=samtools /usr/local/bin/ /usr/local/bin/ | ||
| COPY --from=samtools /usr/local/lib/ /usr/local/lib/ | ||
| COPY --from=samtools /usr/local/libexec/ /usr/local/libexec/ | ||
|
|
||
| ENTRYPOINT [ "hisat2" ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| { | ||
| "name": "hisat2", | ||
| "version": "2.2.1", | ||
| "revision": "0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| FROM quay.io/biocontainers/samtools:1.17--h00cdaf9_0 AS samtools | ||
| FROM quay.io/biocontainers/minimap2:2.30--h577a1d6_0 | ||
|
|
||
| COPY --from=samtools /usr/local/bin/ /usr/local/bin/ | ||
| COPY --from=samtools /usr/local/lib/ /usr/local/lib/ | ||
| COPY --from=samtools /usr/local/libexec/ /usr/local/libexec/ | ||
|
|
||
| ENTRYPOINT [ "minimap2" ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| { | ||
| "name": "minimap2", | ||
| "version": "2.30", | ||
| "revision": "0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| FROM eclipse-temurin:21 | ||
Code scanning (Snyk Container) flagged the following vulnerable packages introduced by this file:
- rust-coreutils (medium): CVE-2026-35371, CVE-2026-35364, CVE-2026-35351, CVE-2026-35345, CVE-2026-35360, CVE-2026-35363, CVE-2026-35344, CVE-2026-35352, CVE-2026-35359, CVE-2026-35377, CVE-2026-35350, CVE-2026-35357, CVE-2026-35374, CVE-2026-35373, CVE-2026-35370, CVE-2026-35341, CVE-2026-35367, CVE-2026-35368, CVE-2026-35354, CVE-2026-35348
- glibc (medium): CVE-2026-4437, CVE-2026-4438, CVE-2026-4046
- tar (medium): Directory Traversal; Unrestricted Upload of File with Dangerous Type
- util-linux (medium): Time-of-check Time-of-use (TOCTOU); Authentication Bypass
- sed (medium): Time-of-check Time-of-use (TOCTOU)
- wget (medium): Open Redirect
- openjdk-jre (medium): Heap-based Buffer Overflow
- expat (medium): Algorithmic Complexity
- gnutls28: Algorithmic Complexity (medium); Stack-based Buffer Overflow (low)
- binutils (low): Allocation of Resources Without Limits or Throttling
- libgcrypt20 (low): Covert Timing Channel
- shadow (low): CVE-2024-56433
|
||
|
|
||
|
|
||
| RUN wget https://github.com/NGSEP/NGSEPcore/releases/download/v5.1.0/NGSEPcore_5.1.0.jar -O /usr/local/bin/NGSEPcore.jar | ||
|
|
||
| ENTRYPOINT [ "java", "-jar", "/usr/local/bin/NGSEPcore.jar" ] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| { | ||
| "name": "ngsep", | ||
| "version": "5.1.0", | ||
| "revision": "0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| # Mutect2 fixtures | ||
|
|
||
| Inputs and intermediate artifacts used by the per-task tests in `tools/test/mutect2.yaml` for `tools/mutect2.wdl`. Files are downsampled to `chrY` and `chrM` to match the reference at `test/fixtures/reference/GRCh38.chrY_chrM.fa` and the BAMs at `test/fixtures/bams/test.bwa_aln_pe*.chrY_chrM.bam`. | ||
|
|
||
| The following list is sorted alphabetically: | ||
|
|
||
| ## af-only-gnomad.hg38.chrY_chrM.vcf.gz | ||
|
|
||
| GATK best-practices germline resource (`af-only-gnomad.hg38.vcf.gz` from the GATK resource bundle), subset to `chrY` and `chrM`. Used as the `germline_resource_vcf` input for the `mutect2` task and as both `intervals` and `variants` inputs for the `get_pileup_summaries` task. | ||
|
|
||
| ## af-only-gnomad.hg38.chrY_chrM.vcf.gz.tbi | ||
|
|
||
| Tabix index for `af-only-gnomad.hg38.chrY_chrM.vcf.gz`. | ||
|
|
||
| ## test.bwa_aln_pe.chrY_chrM_pileup_summaries.table | ||
|
|
||
| Output of `gatk GetPileupSummaries` run on `test/fixtures/bams/test.bwa_aln_pe.chrY_chrM.bam` (normal sample) over the sites in `af-only-gnomad.hg38.chrY_chrM.vcf.gz`. Used as the `normal_pileups` input for the `calculate_contamination` task. | ||
|
|
||
| ## test.bwa_aln_pe.with_variants.chrY_chrM_pileup_summaries.contamination.table | ||
|
|
||
| Output of `gatk CalculateContamination` run on the tumor and normal pileup tables (matched-normal mode). Used as the `contamination_table` input for the `filter_mutect` task. | ||
|
|
||
| ## test.bwa_aln_pe.with_variants.chrY_chrM_pileup_summaries.segments.table | ||
|
|
||
| Tumor segmentation output of the same `gatk CalculateContamination` invocation that produced the contamination table. Used as the `maf_segments` input for the `filter_mutect` task. | ||
|
|
||
| ## test.bwa_aln_pe.with_variants.chrY_chrM_pileup_summaries.table | ||
|
|
||
| Output of `gatk GetPileupSummaries` run on `test/fixtures/bams/test.bwa_aln_pe.with_variants.chrY_chrM.bam` (tumor sample) over the sites in `af-only-gnomad.hg38.chrY_chrM.vcf.gz`. Used as the `tumor_pileups` input for the `calculate_contamination` task. |
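The README above describes a chain: two `GetPileupSummaries` runs (tumor and normal) feed one `CalculateContamination` run in matched-normal mode, which produces both the contamination table and the segments table. The sketch below assembles plausible command lines for that chain. It is an illustration, not the commands actually used to build the fixtures: the helper names are hypothetical, file paths are shortened to bare names, and the flags (`-I`, `-V`, `-L`, `-O`, `-matched`, `--tumor-segmentation`) are the standard GATK4 options as I understand them, so they should be checked against the installed GATK version.

```python
# Sites file used as both the variants (-V) and intervals (-L) input,
# as the README states for the get_pileup_summaries task.
SITES = "af-only-gnomad.hg38.chrY_chrM.vcf.gz"


def pileup_cmd(bam, out_table):
    """Build a GetPileupSummaries argv list (hypothetical helper)."""
    return ["gatk", "GetPileupSummaries",
            "-I", bam, "-V", SITES, "-L", SITES, "-O", out_table]


def contamination_cmd(tumor_table, normal_table, out_table, segments_table):
    """Build a matched-normal CalculateContamination argv list (hypothetical helper)."""
    return ["gatk", "CalculateContamination",
            "-I", tumor_table, "-matched", normal_table,
            "-O", out_table, "--tumor-segmentation", segments_table]


if __name__ == "__main__":
    normal = "test.bwa_aln_pe.chrY_chrM"
    tumor = "test.bwa_aln_pe.with_variants.chrY_chrM"
    steps = [
        pileup_cmd(normal + ".bam", normal + "_pileup_summaries.table"),
        pileup_cmd(tumor + ".bam", tumor + "_pileup_summaries.table"),
        contamination_cmd(
            tumor + "_pileup_summaries.table",
            normal + "_pileup_summaries.table",
            tumor + "_pileup_summaries.contamination.table",
            tumor + "_pileup_summaries.segments.table",
        ),
    ]
    for argv in steps:
        print(" ".join(argv))
```

The output file names follow the fixture names listed in the README, which makes it easy to see which step produces which file.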
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| #<METADATA>SAMPLE=test | ||
| contig position ref_count alt_count other_alt_count allele_frequency |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| sample contamination error | ||
| test 0.0 1.0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| #<METADATA>SAMPLE=test | ||
| contig start end minor_allele_fraction |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| #<METADATA>SAMPLE=test | ||
| contig position ref_count alt_count other_alt_count allele_frequency |
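The fixture tables above share a simple layout: optional `#<METADATA>KEY=VALUE` lines, one header row, then data rows (none in the pileup and segments fixtures). A minimal reader for that layout might look like the sketch below. `parse_gatk_table` is a hypothetical helper, and it splits on any whitespace because the delimiter is not recoverable from this page; the real files are presumably tab-delimited.

```python
def parse_gatk_table(text):
    """Parse a GATK-style table into (metadata, header, rows).

    Hypothetical helper: metadata lines start with "#<METADATA>", the first
    remaining line is the header, and every later line is a data row.
    """
    metadata, header, rows = {}, None, []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#<METADATA>"):
            key, _, value = line[len("#<METADATA>"):].partition("=")
            metadata[key] = value
        elif header is None:
            header = line.split()
        else:
            rows.append(dict(zip(header, line.split())))
    return metadata, header, rows


if __name__ == "__main__":
    pileups = ("#<METADATA>SAMPLE=test\n"
               "contig position ref_count alt_count other_alt_count allele_frequency\n")
    contamination = "sample contamination error\ntest 0.0 1.0\n"
    print(parse_gatk_table(pileups))
    print(parse_gatk_table(contamination))
```

Values are left as strings; converting `contamination` and `error` to floats would be a natural next step for a real consumer.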