Chapter 13 Adding Parameters
In our analysis pipeline so far we have always passed files to our R scripts.
While we promote this practice, and wrapping important information into .json files, sometimes it seems like overkill to write a new json file to contain one line of configuration.
An alternative to passing these json files is to use Snakemake’s built-in params arguments, which are rule specific, to store information that we want to pass to our R script.
The goal of this chapter is to show how to use params to pass across a piece of information.
13.1 Motivating Example: Constructing a Regression Table from OLS Results
So far, we have estimated a series of OLS regressions and stored their output inside the out/analysis directory.
Typically once we have estimated one or more models, we want to format the output into a regression table that we can insert into a written document like a paper or set of presentation slides.
If we list the contents of the folder src/tables/ we can see that there is a series of R scripts:
tab01_textbook_solow.R tab03_ucc_solow.R tab05_cc_aug_solow.R
tab02_augment_solow.R tab04_cc_solow.R tab06_cc_aug_solow_restr.R
This shows that the example is designed to build 6 tables.
Each table has its own script that constructs it.
We will start by constructing Table 1, from tab01_textbook_solow.R.
Let’s have a look at what information this script expects us to pass using the help flag:
Usage: src/tables/tab01_textbook_solow.R [options]

Options:
    -fp CHARACTER, --filepath=CHARACTER
        A directory path where models are saved

    -m CHARACTER, --models=CHARACTER
        A regex of the models to load

    -o CHARACTER, --out=CHARACTER
        output file name [default = out.tex]

    -h, --help
        Show this help message and exit
From this we learn that we need to pass:
- --filepath, which is the directory where our OLS models are stored
- --models, a regular expression to tell R which models within the filepath to work with
- --out, a .tex file where we want to direct the output
Now we will work on constructing this rule.
13.2 Creating a Rule with params
We are going to use the params option to pass across the filepath and the models regular expression into R.
A sketch of the rule we want to create is:
rule textbook_solow:
    input:
        script = ,
        models =
    params:
        filepath = ,
        model_expr =
    output:
        table = ,
    shell:
        "Rscript {input.script} \
            --filepath {params.filepath} \
            --models {params.model_expr} \
            --out {output.table}"
There are two important points to notice about how we added params to our rule:
- params are added to the rule in a similar way to inputs and outputs
- params are referenced identically to inputs and outputs in the shell command
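To make the second point concrete, here is a minimal Python sketch of how the placeholders in the shell string are filled in. It is only an analogy built on str.format: Snakemake's own formatting does more (for example, it joins lists of files with spaces). The names rule_input, rule_params and rule_output are hypothetical stand-ins for the objects Snakemake provides, and the values are copied from the completed rule shown later in this chapter.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the input, params and output objects of a rule.
rule_input = SimpleNamespace(script="src/tables/tab01_textbook_solow.R")
rule_params = SimpleNamespace(
    filepath="out/analysis/",
    model_expr="model_solow*.rds",
)
rule_output = SimpleNamespace(table="out/tables/tab01_textbook_solow.tex")

# The same placeholders as in the rule's shell directive.
template = (
    "Rscript {input.script} "
    "--filepath {params.filepath} "
    "--models {params.model_expr} "
    "--out {output.table}"
)

# params entries are looked up by name, exactly like inputs and outputs.
command = template.format(
    input=rule_input, params=rule_params, output=rule_output
)
print(command)
```

The printed line is the command that would be handed to the shell, which shows that a param is just a named value substituted into the command, not a file that Snakemake tracks.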
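Now we need to decide what information needs to be entered into each line of our rule.
The script input is the path to the R script that builds Table 1.
The models input lists the estimated OLS models, so that Snakemake knows the table depends on them and estimates them first if they are missing.
The filepath param is the directory where those models are saved, and model_expr is the pattern that selects the models we want in this table.
The table output is the .tex file that the script writes.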
Our rule then becomes:
rule textbook_solow:
    input:
        script = "src/tables/tab01_textbook_solow.R",
        models = expand("out/analysis/{iModel}_ols_{iSubset}.rds",
                        iModel = MODELS,
                        iSubset = DATA_SUBSET),
    params:
        filepath = "out/analysis/",
        model_expr = "model_solow*.rds"
    output:
        table = "out/tables/tab01_textbook_solow.tex"
    shell:
        "Rscript {input.script} \
            --filepath {params.filepath} \
            --models {params.model_expr} \
            --out {output.table}"
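The help text describes --models as a regex, while the value model_solow*.rds reads like a shell-style wildcard. How the R script interprets the value is not shown in this chapter, so the Python sketch below only illustrates why the distinction is worth checking. The file names are taken from the dry-run output later in the chapter; explicit_regex is a hypothetical alternative, not something the original rule uses.

```python
import fnmatch
import re

# A few file names taken from the dry-run output in this chapter.
files = [
    "model_solow_ols_subset_oecd.rds",
    "model_solow_restr_ols_subset_oecd.rds",
    "model_aug_solow_ols_subset_oecd.rds",
    "model_ucc_ols_subset_oecd.rds",
]

pattern = "model_solow*.rds"

# Interpretation 1: shell-style wildcard, where * means "any run of characters".
as_glob = [f for f in files if fnmatch.fnmatchcase(f, pattern)]

# Interpretation 2: regular expression, where w* means "zero or more w"
# and . means "any single character".
as_regex = [f for f in files if re.search(pattern, f)]

# Hypothetical regular expression that spells out the wildcard's intent.
explicit_regex = r"^model_solow.*\.rds$"
as_explicit = [f for f in files if re.search(explicit_regex, f)]

print("glob:          ", as_glob)
print("regex:         ", as_regex)
print("explicit regex:", as_explicit)
```

Read as a wildcard, the pattern selects the two files that start with model_solow. Read as a regular expression it selects none of them, whereas the explicit form selects the same two files as the wildcard. If Table 1 comes out empty, this is one place to look.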
There are two ways to run this rule:
1. Tell snakemake to run this rule explicitly:
   snakemake textbook_solow
   - Because it is not the first rule of the Snakefile it isn't run by default
2. Add the output of this rule to the all rule.
   - Adds creating this table to our complete analysis pipeline
We prefer (2).
Hence we also update the all rule as follows:
rule all:
    input:
        figs = expand("out/figures/{iFigure}.pdf",
                      iFigure = FIGURES),
        models = expand("out/analysis/{iModel}_ols_{iSubset}.rds",
                        iModel = MODELS,
                        iSubset = DATA_SUBSET),
        tab01 = "out/tables/tab01_textbook_solow.tex"
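The expand() calls in these rules build one file name for every combination of the listed values. The following Python sketch imitates that behaviour with itertools.product; expand_sketch is a hypothetical helper, not Snakemake's implementation, and the MODELS list is shortened to two entries inferred from the file names in the dry-run output below (the real lists are defined earlier in the Snakefile).

```python
from itertools import product

# Assumed, shortened values inferred from the dry-run output in this chapter.
MODELS = ["model_solow", "model_aug_solow"]
DATA_SUBSET = ["subset_oecd", "subset_nonoil", "subset_intermediate"]


def expand_sketch(pattern, **wildcards):
    """Fill pattern with every combination of the wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


model_files = expand_sketch(
    "out/analysis/{iModel}_ols_{iSubset}.rds",
    iModel=MODELS,
    iSubset=DATA_SUBSET,
)

# One path per (model, subset) pair, with the model varying slowest.
for path in model_files:
    print(path)
```

With two models and three subsets the sketch yields six paths. The same logic explains why the dry run lists one .rds file for each pairing of a model with a data subset.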
If we then do a dry run to see what Snakemake plans to do [20]:
Building DAG of jobs...
Job counts:
count jobs
1 all
1 textbook_solow
2
[Tue Feb 5 17:42:59 2019]
rule textbook_solow:
input: src/tables/tab01_textbook_solow.R, out/analysis/model_solow_ols_subset_oecd.rds, out/analysis/model_solow_ols_subset_nonoil.rds, out/analysis/model_solow_ols_subset_intermediate.rds, out/analysis/model_aug_cc_restr_ols_subset_oecd.rds, out/analysis/model_aug_cc_restr_ols_subset_nonoil.rds, out/analysis/model_aug_cc_restr_ols_subset_intermediate.rds, out/analysis/model_solow_restr_ols_subset_oecd.rds, out/analysis/model_solow_restr_ols_subset_nonoil.rds, out/analysis/model_solow_restr_ols_subset_intermediate.rds, out/analysis/model_cc_ols_subset_oecd.rds, out/analysis/model_cc_ols_subset_nonoil.rds, out/analysis/model_cc_ols_subset_intermediate.rds, out/analysis/model_ucc_ols_subset_oecd.rds, out/analysis/model_ucc_ols_subset_nonoil.rds, out/analysis/model_ucc_ols_subset_intermediate.rds, out/analysis/model_aug_solow_restr_ols_subset_oecd.rds, out/analysis/model_aug_solow_restr_ols_subset_nonoil.rds, out/analysis/model_aug_solow_restr_ols_subset_intermediate.rds, out/analysis/model_aug_cc_ols_subset_oecd.rds, out/analysis/model_aug_cc_ols_subset_nonoil.rds, out/analysis/model_aug_cc_ols_subset_intermediate.rds, out/analysis/model_aug_solow_ols_subset_oecd.rds, out/analysis/model_aug_solow_ols_subset_nonoil.rds, out/analysis/model_aug_solow_ols_subset_intermediate.rds
output: out/tables/tab01_textbook_solow.tex
jobid: 27
[Tue Feb 5 17:42:59 2019]
localrule all:
input: out/figures/conditional_convergence.pdf, out/figures/unconditional_convergence.pdf, out/figures/aug_conditional_convergence.pdf, out/analysis/model_solow_ols_subset_oecd.rds, out/analysis/model_solow_ols_subset_nonoil.rds, out/analysis/model_solow_ols_subset_intermediate.rds, out/analysis/model_aug_cc_restr_ols_subset_oecd.rds, out/analysis/model_aug_cc_restr_ols_subset_nonoil.rds, out/analysis/model_aug_cc_restr_ols_subset_intermediate.rds, out/analysis/model_solow_restr_ols_subset_oecd.rds, out/analysis/model_solow_restr_ols_subset_nonoil.rds, out/analysis/model_solow_restr_ols_subset_intermediate.rds, out/analysis/model_cc_ols_subset_oecd.rds, out/analysis/model_cc_ols_subset_nonoil.rds, out/analysis/model_cc_ols_subset_intermediate.rds, out/analysis/model_ucc_ols_subset_oecd.rds, out/analysis/model_ucc_ols_subset_nonoil.rds, out/analysis/model_ucc_ols_subset_intermediate.rds, out/analysis/model_aug_solow_restr_ols_subset_oecd.rds, out/analysis/model_aug_solow_restr_ols_subset_nonoil.rds, out/analysis/model_aug_solow_restr_ols_subset_intermediate.rds, out/analysis/model_aug_cc_ols_subset_oecd.rds, out/analysis/model_aug_cc_ols_subset_nonoil.rds, out/analysis/model_aug_cc_ols_subset_intermediate.rds, out/analysis/model_aug_solow_ols_subset_oecd.rds, out/analysis/model_aug_solow_ols_subset_nonoil.rds, out/analysis/model_aug_solow_ols_subset_intermediate.rds, out/tables/tab01_textbook_solow.tex
jobid: 0
Job counts:
count jobs
1 all
1 textbook_solow
2
We see that Snakemake only needs to create the table from our newly created rule.
Now run snakemake to build the table.
When it has finished, listing the contents of out/tables shows that our new regression table has been created:
tab01_textbook_solow.tex
Exercise: Building Table 2
Using the same rule format as above, incorporate params into a new rule called augment_solow that constructs Table 2.
[20] An alternative would be to run snakemake --summary and examine the output.