Chapter 15 Self Documenting Help

As we build up our Snakemake workflow the Snakefile becomes increasingly complex. It is easy to imagine that a future version of you, or a co-author, might have difficulty understanding what each rule is doing. This chapter introduces an approach to add some comments to the Snakefile to make it easier to navigate. We also add a new help rule so that we can get a summary of the comments printed to screen.

15.1 Our Commenting Syle

If we look at the Snakefile in it’s current state, we see that there are a bare minimum amount of comments already integrated. Because Snakemake is a Python library all comments must begin with one #. We then added some additional formatting of comments to create two types.21 Our first type of comment is a ‘double hash’ - ##. We use this to represent information that someone who reads our code might find helpful to build understanding of the file. For example, the head of the Snakefile contains information on the name of the workflow and the authors:

## Snakemake - MRW Replication
##
## @yourname
##

Second, we have broken up the Snakefile somewhat into sections such as Dictionaries, Build Rules and Clean Rules using the # --- Something --- # notation.

15.2 Adding Further Comments

The information we have added in comments so far is not really enough to help us remember much. What we want is a simple one or two line summary of what each of our rules do so that we can look back at them in the future. We are going to make comments above each rule using the ## notation. The structure will follow a common pattern: rule_name : some description For example, for the textbook_solow rule:

## textbook_solow: construct a table of regression estimates for textbook solow model
rule textbook_solow:
    input:
        script = "src/tables/tab01_textbook_solow.R",
        models = expand("out/analysis/{iModel}_ols_{iSubset}.rds",
                            iModel = MODELS,
                            iSubset = DATA_SUBSET)),
    params:
        filepath   = "out/analysis/",
        model_expr = "model_solow*.rds"
    output:
        table = "out/tables/tab01_textbook_solow.tex"
    shell:
        "Rscript {input.script} \
            --filepath {params.filepath} \
            --models {params.model_expr} \
            --out {output.table}"

Exercise: Adding Comments

Add some comments using our suggested style to each of the Snakemake rules we have constructed so far.

15.3 A help Rule

The Snakefile is now much easier to understand when reading through it. Future you will be grateful for the effort. We can go one step further: construct a rule that prints the output of these comments to the screen. Let’s call this rule help, so that if we run snakemake help all our useful comments will be printed to screen. We usually put this help rule at the bottom of our Snakefile, so that we don’t keep bumping into it as we work. The rule has the following structure:

# --- Help Rules --- #
## help               : prints help comments for Snakefile
rule help:
    input: "Snakefile"
    shell:
        "sed -n 's/^##//p' {input}"

The rule depends on the Snakefile itself - because if the file changes, the might too. The shell comand that we use is a little confusing when we look at it the first time. Intuitively heres what it is doing:

  • The comments we want to print are those that have the double hash notation, ##.
  • sed is a progam that can perform text manipulations to some input
    • We want it to find all lines starting with ## in our input file …
    • … and print out the remainder of those lines

Let’s see it in action:

$ snakemake help
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   help
    1

[Wed Feb  6 19:13:51 2019]
rule help:
    input: Snakefile
    jobid: 0

 Snakemake - MRW Replication

 @yourname

 all                : builds all final outputs
 augment_solow      : construct a table of estimates for augmented solow model
 textbook_solow     : construct a table of regression estimates for textbook solow model
 make_figs          : builds all figures
 figures            : recipe for constructing a figure (cannot be called)
 estimate_models    : estimates all regressions
 ols_models         : recipe for estimating a single regression (cannot be called)
 gen_regression_vars: creates variables needed to estimate a regression
 rename_vars        : creates meaningful variable names
 clean              : removes all content from out/ directory
 help               : prints help comments for Snakefile
[Wed Feb  6 19:13:51 2019]
Finished job 0.
1 of 1 steps (100%) done

We see that after the usual Snakemake output telling us what it is doing, our comments are printed to screen.

Exercise: Saving help output to file

Modify the help rule so that the printed content is saved in a file called HELP.txt


  1. Do notice that the exact formatting of all this help is totally arbitrary. Noone tells you that two #’s or a mix of #’s and -’s are meaningful, correct or standard ways of putting comments in a file. Our choice reflects path dependence in the way we started writing Snakefiles, and we have found them useful and persisted with our initial choice. Feel free to create your own. The only important thing is the line starts with one # so that Snakemake doesnt try and execute that line as code.↩︎