Assembly Quality Control

subworkflows/local/qc_assembly

Steps

  1. Generate assembly quality metrics: QUAST is used to generate summary assembly metrics such as the N50 value, number of contigs, average depth of coverage and genome size.

  2. Assembly filtering: Using a custom Nextflow DSL (Groovy) script, assemblies are filtered to meet quality thresholds. See the quast_filter section of nextflow.config to check which defaults are currently implemented, or to set your own.

  3. Contamination detection: CheckM is run to estimate a percent contamination score and build up evidence for signs of contamination in a sample. CheckM can be skipped by adding --skip_checkm to the command-line options, as the data it generates may not be needed and it can have a long run time.

  4. Classic seven-gene MLST: mlst is run and its outputs are included in the final report. This step can be skipped by adding --skip_mlst to the command-line options.
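
To make the metric and filtering steps concrete, here is a minimal Python sketch of how an N50 value is derived from contig lengths and how an assembly might then be checked against thresholds. It is an illustration only, not the pipeline's Groovy filter script: the function names and the threshold keys (min_n50, max_contigs, min_genome_size, max_genome_size) are hypothetical and are not the actual parameter names in the quast_filter section of nextflow.config.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly length."""
    if not contig_lengths:
        return 0
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0


def summarize(contig_lengths: List[int]) -> Dict[str, int]:
    """Collect a few of the summary metrics that QUAST reports."""
    return {
        "n50": n50(contig_lengths),
        "n_contigs": len(contig_lengths),
        "genome_size": sum(contig_lengths),
    }


def passes_filter(metrics: Dict[str, int], thresholds: Dict[str, int]) -> bool:
    """Check metrics against thresholds (hypothetical keys, for illustration)."""
    return (
        metrics["n50"] >= thresholds["min_n50"]
        and metrics["n_contigs"] <= thresholds["max_contigs"]
        and thresholds["min_genome_size"]
        <= metrics["genome_size"]
        <= thresholds["max_genome_size"]
    )


# Hypothetical thresholds; the real defaults live in nextflow.config.
EXAMPLE_THRESHOLDS = {
    "min_n50": 250,
    "max_contigs": 10,
    "min_genome_size": 500,
    "max_genome_size": 2000,
}

if __name__ == "__main__":
    metrics = summarize([100, 200, 300, 400])
    print(metrics, passes_filter(metrics, EXAMPLE_THRESHOLDS))
```

With contigs of 100, 200, 300 and 400 bp, the total length is 1000 bp. Counting from the longest contig, the running total reaches half of that (500 bp) at the 300 bp contig, so the N50 is 300.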

Input

  • Cleaned reads (FASTQ) from the FinalReads directory
    • This is the final reads file from the last step of the Clean Reads workflow, taking into account any skip flags that have been used.
  • Contig file (FASTA) from the FinalAssembly directory
    • This is the final contig file from the last step of the CleanAssemble workflow, taking into account any skip flags that have been used.

Outputs

  • Assembly
  • Quality
    • CheckM
      • SAMPLE
        • bins
        • storage
        • tree
    • Quast
      • SAMPLE
  • Subtyping
    • SevenGeneMLST
      • mlst