Assembly Quality Control
subworkflows/local/qc_assembly
Steps
- Generate assembly quality metrics: QUAST is used to generate summary assembly metrics such as the N50 value, number of contigs, average depth of coverage, and genome size.
- Assembly filtering: Using a custom Nextflow DSL (Groovy) script, assemblies are filtered to meet quality thresholds.
  - See the quast_filter section of nextflow.config to view the defaults that are currently implemented, or to set your own.
- Contamination detection: CheckM is run to estimate a percent contamination score, which provides evidence of possible contamination in a sample.
  - CheckM can be skipped by adding --skip_checkm to the command-line options, since the data it generates may not be needed and it can have a long run time.
- Classic seven-gene MLST: mlst is run, and its outputs are included in the final report.
  - This step can be skipped by adding --skip_mlst to the command-line options.
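To make the metrics from the first step concrete, here is a minimal Python sketch of how the number of contigs, total assembly length (a rough proxy for genome size), and N50 can be derived from a contig FASTA. It is an illustration, not QUAST's implementation: the function names are hypothetical, it counts every contig rather than applying QUAST's default minimum contig length of 500 bp, and it does not estimate depth of coverage, which requires the reads.

```python
"""Sketch of QUAST-style contig metrics (illustrative, not QUAST's code)."""


def contig_lengths(fasta_text):
    """Return the length of each record in a FASTA string, in file order."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total of contigs, sorted
    longest first, first reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached when there are no contigs


def summarize(lengths):
    """Contig count, total length and N50 for a list of contig lengths."""
    return {"contigs": len(lengths), "total_length": sum(lengths), "N50": n50(lengths)}


if __name__ == "__main__":
    print(summarize([5000, 3000, 2000, 1000, 500]))
    # {'contigs': 5, 'total_length': 11500, 'N50': 3000}
```

In the example, the two longest contigs (5000 + 3000 = 8000 bp) are the first to cover at least half of the 11500 bp total, so the N50 is 3000.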
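The assembly filtering step is implemented in Groovy inside the pipeline; the Python sketch below only illustrates the general idea of checking each assembly's QUAST metrics against thresholds. Everything specific here is an assumption: the threshold names and values (MIN_N50, MAX_CONTIGS) are hypothetical, and the input is assumed to be a QUAST-style tab-separated report with one row per assembly and Assembly, N50 and # contigs columns. The pipeline's real defaults are defined in the quast_filter section of nextflow.config.

```python
import csv
import io

# Hypothetical thresholds for illustration only; the pipeline's actual
# defaults live in the quast_filter section of nextflow.config.
MIN_N50 = 2000
MAX_CONTIGS = 500


def filter_assemblies(tsv_text, min_n50=MIN_N50, max_contigs=MAX_CONTIGS):
    """Split a QUAST-like tab-separated report into passing and failing assemblies.

    Assumes one row per assembly with 'Assembly', 'N50' and '# contigs' columns.
    An assembly passes when its N50 is at least min_n50 and it has no more
    than max_contigs contigs.
    """
    passed, failed = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        n50 = int(row["N50"])
        contigs = int(row["# contigs"])
        if n50 >= min_n50 and contigs <= max_contigs:
            passed.append(row["Assembly"])
        else:
            failed.append(row["Assembly"])
    return passed, failed


if __name__ == "__main__":
    report = (
        "Assembly\t# contigs\tN50\n"
        "sample_A\t120\t85000\n"
        "sample_B\t900\t1500\n"
    )
    print(filter_assemblies(report))  # (['sample_A'], ['sample_B'])
```

Returning the failing assemblies alongside the passing ones, rather than silently dropping them, keeps it possible to report why a sample was excluded.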
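For the seven-gene MLST step, mlst's default tab-separated output has one line per input file: the file name, the matched scheme, the sequence type (ST), and then one gene(allele) field per locus. A minimal parser for that layout might look like the sketch below; the function name and the example line are illustrative, and it assumes the default tab-separated output rather than mlst's other output formats. Allele calls are kept as raw strings, so any non-numeric markers mlst emits pass through unchanged.

```python
import re

# One allele field, e.g. "arcc(2)": gene name followed by the call in parentheses.
ALLELE_FIELD = re.compile(r"^(?P<gene>[^()]+)\((?P<allele>[^()]*)\)$")


def parse_mlst_line(line):
    """Parse one tab-separated mlst result line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    file_name, scheme, st = fields[0], fields[1], fields[2]
    alleles = {}
    for field in fields[3:]:
        match = ALLELE_FIELD.match(field)
        if match:
            alleles[match.group("gene")] = match.group("allele")
    return {"file": file_name, "scheme": scheme, "st": st, "alleles": alleles}


if __name__ == "__main__":
    example = "sample.fasta\tsaureus\t239\tarcc(2)\taroe(3)\tglpf(1)\tgmk_(1)\tpta_(4)\ttpi_(4)\tyqil(3)"
    result = parse_mlst_line(example)
    print(result["scheme"], result["st"], len(result["alleles"]))  # saureus 239 7
```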
Input
- Cleaned reads (fastq) from the FinalReads directory: the final reads file from the last step of the Clean Reads workflow (taking into account any skip flags that have been used).
- Contig file (fasta) from the FinalAssembly directory: the final contig file from the last step of the CleanAssemble workflow (taking into account any skip flags that have been used).
Outputs
- Assembly
- Quality
- CheckM
- SAMPLE
- bins
- storage
- tree
- SAMPLE
- Quast
- SAMPLE
- CheckM
- Subtyping
- SevenGeneMLST
- mlst