This README is pulled from a default template for workflows.
- The `lib` directory contains general libraries that may be referenced by multiple workflows, for instance Cromwell configs and Python configs. Currently nothing in this directory is used.
- Each pipeline is a full analysis. Think of it like the heading of a methods section in a paper. For instance, if this were a genetic summary statistics workflow, a pipeline might be "fine-mapping", which does both conditional and credible set analysis. Another pipeline may be "colocalization".
- Pipelines may have numbers prior to their name (e.g., `example_pipeline_1` to `0025-example_pipeline_1`). These numbers do not mean anything, but are merely used to keep pipelines in their general order of execution. They are optional.
- A pipeline consists of:
  - A workflow.
  - A `scripts` directory with all scripts referenced by that workflow (unless a general lib script is called). Scripts may have numbers prior to their name. These numbers do not mean anything, but are merely used to keep scripts in their general order of execution. They are optional.
  - A `docs` directory that contains documentation of the default parameters, written in a style that is publishable as methods in a paper (including citations). Within the `docs` directory there may be a `reference` directory with any additional reference materials.
  - An `example_runtime_setup` directory that contains files giving an example of actual config files and any other files used to run the pipeline.
- A `studies` directory should either exist within the workflow repo or be a separate repo that has the same name as the workflow repo, but with `studies` appended to it (e.g., `template-workflow` becomes `template-workflow-studies`).
- If there is a standard set of plots that will always look the same way, a pipeline should generate such plots. Otherwise, all code to analyze the results of a pipeline run should be in the `studies` directory. For instance, if this were a genetic summary statistics workflow, `studies` may contain a `t2d` directory and a `weight` directory.
- Within a study is either a Jupyter notebook (with either a Python or R kernel) or an R Markdown file. Nearly all plots/analysis of the results of running the various pipelines should be done in the notebook/markdown file.
- A study may also contain a `scripts` directory with scripts to aggregate data for a one-off analysis (if the analysis is going to be repeated, consider making a new pipeline or adding it to an existing pipeline) or for special plots that cannot be done in the notebook/markdown file.
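Putting these conventions together, a pipeline directory might look like the following sketch (all file and directory names here are illustrative, not prescribed by the template):

```
0010-example_pipeline/
├── workflow            # the workflow definition (e.g., a WDL file, if using Cromwell)
├── scripts/
│   ├── 01-prepare_input.py
│   └── 02-run_analysis.py
├── docs/
│   ├── methods.md
│   └── reference/
├── example_runtime_setup/
│   └── example.config
└── environment.yml
```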
- Documentation
- Environment version control
- Pipeline version control
- Git branches
- Code review
Be sure to document your code!
The analysis environment is controlled using conda. Each pipeline should have an `environment.yml` file with all of the packages used. If a required package or library is missing from conda (and therefore not in the `environment.yml`), it should be noted in the README.md of the pipeline.
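For illustration, a minimal `environment.yml` might look like the following (the channels, packages, and versions here are hypothetical):

```
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - pandas=2.1
  - r-base=4.3
```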
The `environment.yml` can be generated with:

```shell
conda env export --no-builds | grep -v prefix | grep -v name > environment.yml
```

Each pipeline within this workflow uses bumpversion for automatic semantic versioning.
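bumpversion reads its settings from a config file (commonly `.bumpversion.cfg` or `setup.cfg`); a minimal sketch, with hypothetical values:

```
[bumpversion]
current_version = 0.1.0
commit = True
tag = True
```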
```shell
# bump the appropriate increment
bumpversion patch --verbose --dry-run
bumpversion minor --verbose --dry-run
bumpversion major --verbose --dry-run
# commit with tags
git push --tags
```

Forking the repository allows developers to work independently while retaining well-maintained code on the master fork. For instructions on how to fork, follow the Fork a repo instructions.
After forking the repo, clone it to your local desktop:
```shell
# to use SSH
git clone git@github.com:<username>/template-workflow.git
# to use HTTPS
git clone https://github.com/<username>/template-workflow.git
```

This creates a replica of the remote repository on your local desktop. Note: when you clone, git also registers the remote repository (typically as `origin`). So your local master branch is simply `master`, while the remote master branch is `origin/master`. You can also add multiple remote repositories. For instance, say our main repository is under the remote repository `my_repo`. We will want to add it as a remote repository so we can fetch the most up-to-date code. You could add it by:
```shell
# Add the my_repo remote repo to your local desktop -- this will allow you to pull and push branches on the my_repo repository
git remote add my_repo git@github.com:my_repo/template-workflow.git
```

Branching is how git actually tracks code development. For more information, see the Git Branch tutorial on Atlassian. If you want to add a new feature or pipeline, or fix a bug, a common workflow would look like this:
```shell
# Update your local copy of the master branch to make sure you have the most up-to-date code
git pull
# Create the branch on your local machine and switch to it
git checkout -b [name_of_your_new_branch]
# Push the branch to GitHub
git push origin [name_of_your_new_branch]
```

As you develop, you want to commit your work to your branch so you don't lose it all if something happens!
```shell
# Confirm we're on the right branch
git branch -a
# Stage all your work to be tracked (Note: there are many ways to add specific files; see https://git-scm.com/docs/git-add for more information). The following command adds everything in your current directory.
git add .
# Commit your work to the branch with a message describing what's in the commit
git commit -m "Created the scATAC-seq pipeline!"
# Push the commit to your branch on GitHub
git push origin [name_of_your_new_branch]
# You can add the -u flag to set the upstream for future pushes;
# alternatively, git will prompt you when you do your first push:
# git push -u origin [name_of_your_new_branch]
# Pushing HEAD also works -- it pushes everything up to the most recent commit on the current branch:
# git push origin HEAD
```

Create a GitHub Pull Request. A PR gives other developers a chance to go through and comment on lines of code they believe can be improved. In addition, it will tell you if the code you are trying to merge into the my_repo fork actually conflicts with code that already exists there, so you don't overwrite someone else's work.
Once another developer approves the PR, you have the go-ahead to merge your code! Congrats, you finished your feature!
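The branch-commit-merge cycle described above can be rehearsed end to end in a throwaway local repository; here is a sketch, assuming git ≥ 2.28 (for `init -b`) and using a local, throwaway git identity:

```shell
set -e
# Create a scratch repository so nothing here touches real remotes
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master
git config user.email you@example.com
git config user.name "Example Dev"
git commit -q --allow-empty -m "initial commit"
# Feature branch: do some work and commit it
git checkout -q -b add_feature
echo "work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
# Merge back into master (done locally here; normally this happens via an approved PR)
git checkout -q master
git merge -q --no-ff -m "Merge add_feature" add_feature
git log --oneline   # three commits: merge, feature, initial
```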
Note: There are some cases where you may just want to push directly to the my_repo fork, thereby avoiding code review. For instance, if you're working on a one-off project that you want people to be able to see, but that no one else is necessarily working on, you can always push directly to branches on the my_repo fork. Or you can still go through the steps of a PR, but simply merge your own code without review.