2  How to understand snakemake?

In this course, we would like to show you the mechanics of the aMeta workflow by creating a step-by-step tutorial.

The purpose of this section is to give information about each snakemake type of rule, and to allow you to execute them manually.

Here is a typical snakemake rule from the pipeline:

rule FastQC_BeforeTrimming:
    """Run fastq before trimming"""
    output:
        html="results/FASTQC_BEFORE_TRIMMING/{sample}_fastqc.html",
        zip="results/FASTQC_BEFORE_TRIMMING/{sample}_fastqc.zip",
    input:
        fastq=lambda wildcards: samples.loc[wildcards.sample].fastq,
    conda:
        "../envs/fastqc.yaml"
    envmodules:
        *config["envmodules"]["fastqc"],
    benchmark:
        "benchmarks/FASTQC_BEFORE_TRIMMING/{sample}.benchmark.txt"
    message:
        "FastQC_BeforeTrimming: RUNNING QUALITY CONTROL WITH FASTQC FOR SAMPLE {input.fastq} BEFORE TRIMMING ADAPTERS"
    log:
        "logs/FASTQC_BEFORE_TRIMMING/{sample}.log",
    threads: 2
    shell:
        "fastqc {input.fastq} --threads {threads} --nogroup --outdir results/FASTQC_BEFORE_TRIMMING &> {log}"

There are specific snakemake describers in here. To improve the readability, we won’t show the full rule in the tutorial.

However, in this section, we explain briefly the parameters that will be ignored in the tutorial.

2.1 Using conda package manager

Each rule can use a specific conda subenvironment from the main aMeta conda environment. So, if you run snakemake with the --use-conda flag, the specified environment will be installed and used. In this case, the rule is using the fastqc conda subenvironment.

    conda:
        "../envs/fastqc.yaml"

2.2 Using environment modules

If you want to use environmental modules from your server instead of conda, you need to use the envmodules flag when running snakemake. In this case, you will have to create an envmodule file containing the information on how to load the environmental modules, as explained in the official aMeta GitHub.

This snakemake parameter looks for the specific information on how to load the right environment modules for the fastqc rule.

 envmodules:
        *config["envmodules"]["fastqc"]

2.3 Benchmarks

Some snakemake rules can generate benchmarking statistics and they store it in the benchmarks folder.

    benchmark:
        "benchmarks/FASTQC_BEFORE_TRIMMING/{sample}.benchmark.txt"

2.4 Logs

If you have problems with your snakemake run, each rule will generate a log file. So you can check this file for any error message.

    log:
        "logs/FASTQC_BEFORE_TRIMMING/{sample}.log",

So much for the information about the specific parameters of snakemake in our rules!