Tool Specific Parameters

To access tool-specific parameters from the command line, you must use the dot operator. For organization and readability, the documentation below is nested to indicate where the dot operator is used. For example:

- quast
    - min_contig_length NUM

Translates to --quast.min_contig_length NUM on the CLI.
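The translation from nested documentation to a dotted flag can be sketched in Python. This is only an illustration of the naming rule: the helper `to_cli_flags` and the value 1000 are hypothetical and are not part of mikrokondo.

```python
def to_cli_flags(params, prefix=""):
    """Flatten a nested dict of parameters into dot-operator CLI flags."""
    flags = []
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Each nesting level in the documentation adds one dot.
            flags.extend(to_cli_flags(value, name))
        else:
            flags.extend([f"--{name}", str(value)])
    return flags


# Hypothetical value; the documentation only specifies NUM.
print(" ".join(to_cli_flags({"quast": {"min_contig_length": 1000}})))
```

The printed string can then be appended to the pipeline's command line.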

Note: Easily changed parameters are bolded. Sensible defaults are provided.

Abricate

Screens contigs for antimicrobial and virulence genes. If you wish to use a different Abricate database you may need to update the container you use.

abricate
singularity: Abricate singularity container
docker: Abricate docker container
args: A string of additional command line arguments to pass to Abricate.
report_tag: determines the name of the Abricate output in the final summary file. Do not change this unless doing pipeline development.
header_p: This field tells the report module that the Abricate output contains headers. Do not change this unless doing pipeline development.

Raw Read Metrics

A custom Python script that gathers quality metrics for each fastq file.

raw_reads
high_precision: When set to true, output values are computed with high floating point precision, accurate down to very small decimal places. It is recommended to leave this setting as false (standard floats), as this is much faster and such precise decimal places are not needed for this module.
report_tag: this field determines the name of the Raw Read Metric field in the final summary report. Do not change this unless doing pipeline development.

Coreutils

In cases where a process uses bash scripting only, Nextflow by default will utilize system binaries when they are available and no container is specified. For reproducibility, we have chosen to use containers in such cases. When a better container is available, you can direct the pipeline to use it via the parameters below:

coreutils
singularity: coreutils singularity container
docker: coreutils docker container

Python

Some scripts require Python3, so a well-tested Python3 container is provided for reproducibility. However, as all the scripts within mikrokondo use only the standard library, you can swap these containers to use any Python interpreter version. For instance, swapping in pypy3 may result in a large performance boost for the scripts, though this is currently untested.

python3
singularity: Python3 singularity container
docker: Python3 docker container

KAT

KAT was previously used to estimate genome size. However, at the time of writing KAT appears to be updated only infrequently, and newer versions had issues running and sometimes gave incorrect output due to failures in peak recognition. Therefore, KAT has been removed from the pipeline. Its code still remains, but it will be removed in the future.

Seqtk

Seqtk is used in mikrokondo for both the sub-sampling of reads and the conversion of fasta files to fastq files. Converting a fasta to a fastq with seqtk is needed for certain typing tools that require reads as input (this was a design decision to keep the pipeline generalizable).

seqtk
singularity: Singularity container for seqtk
docker: Docker container for seqtk
seed: A seed value for sub-sampling
reads_ext: Extension of reads after sub-sampling. Do not alter this unless doing pipeline development.
assembly_fastq: Extension of the fastas after being converted to fastq files. Do not change this unless doing pipeline development.
report_tag: Name of seqtk data in the final summary report. Do not change this unless doing pipeline development.

Rasusa

For long-read data, Rasusa is used for down-sampling as it takes read length into consideration.

rasusa
singularity: singularity container for rasusa
docker: docker container for rasusa
seed: A seed value for sub-sampling
reads_ext: The extension of the generated fastq files. Do not change this unless doing pipeline development.
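To illustrate why read length matters when down-sampling long reads, here is a minimal Python sketch of length-aware sub-sampling with a seed. It is not Rasusa's actual implementation; the function name, read lengths and target are hypothetical.

```python
import random


def subsample_to_bases(read_lengths, target_bases, seed):
    """Return indices of reads kept so their summed length reaches target_bases."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    kept, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        kept.append(idx)
        # Count bases rather than reads, so long reads fill the target sooner.
        total += read_lengths[idx]
    return sorted(kept)


# Hypothetical read lengths (in bases) and target.
lengths = [1000, 5000, 200, 12000, 800]
kept = subsample_to_bases(lengths, target_bases=6000, seed=42)
print(kept, sum(lengths[i] for i in kept))
```

Running the function twice with the same seed selects the same reads, which is the purpose of the seed parameter.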

FastP

Fastp is a fast and widely used program for gathering read quality metrics, adapter trimming, read filtering and read trimming. FastP has extensive configuration options, which are detailed in its documentation, but sensible defaults have been set. Adapter trimming in Fastp is performed using overlap analysis; however, if you do not trust this, you can specify the sequencing adapters directly in the additional arguments for Fastp.

fastp
singularity: singularity container for FastP
docker: docker container for FastP
fastq_ext: Extension of the trimmed reads output by FastP. Do not change this unless doing pipeline development.
html_ext: Extension of the html report output by FastP. Do not change this unless doing pipeline development.
json_ext: Extension of the json report output by FastP. Do not change this unless doing pipeline development.
report_tag: Title of FastP data in the summary report.
average_quality_e: If a read/read-pair quality is less than this value it is discarded. Can be set from the command line with --fp_average_quality.
cut_tail_mean_quality: The mean quality threshold for a sliding window below which trailing bases are trimmed from the reads. Can be set from the command line with --fp_cut_tail_mean_quality
cut_tail_window_size: The window size to cut a tail with. Can be set from the command line with --fp_cut_tail_window_size.
complexity_threshold: The threshold for the low complexity filter. Can be set from the command line with --fp_complexity_threshold.
qualified_quality_phred: The Phred quality a base must have to count as qualified when filtering by unqualified bases. Can be set from the command line with --fp_qualified_phred.
unqualified_percent_limit: The percent amount of bases that are allowed to be unqualified in a read. This parameter is affected by the above qualified_quality_phred parameter and can be specified from the command line with --fp_unqualified_percent_limit.
polyg_min_len: The minimum length to detect a polyG tail. This value can be set from the command line with --fp_polyg_min_len.
polyx_min_len: The minimum length to detect a polyX tail. This value can be set from the command line with --fp_polyx_min_len.
illumina_length_min: The minimum read length to be allowed in illumina data. This value can be set from the command line with --fp_illumina_length_min.
illumina_length_max: The maximum read length allowed for illumina data. This value can be set from the command line with --fp_illumina_length_max.
single_end_length_min: the minimum read length allowed in Pacbio or Nanopore data. This value can be set from the command line with --fp_single_end_length_min.
dedup_reads: A parameter to be turned on to allow for deduplication of reads. This value can be set from the command line with --fp_dedup_reads.
illumina_args: The command string passed to FastP when using Illumina data. If you override this parameter, other set parameters such as average_quality_e must be specified in the override as well, because the command string is passed to FastP as written.
single_end_args: The command string passed to FastP if single-end data is used, e.g. Pacbio or Nanopore data. If this option is overridden, you must specify all parameters passed to FastP, as this string is passed to FastP as written.
report_exclude_fields: Fields in the summary json to be excluded from the final aggregated report. Do not alter this field unless doing pipeline development.
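The caveat about overriding illumina_args can be sketched in Python. This is an assumed illustration, not the pipeline's actual template: the function, parameter values and placeholder adapter are hypothetical, and only a few of the fastp options are shown.

```python
def build_illumina_args(params, override=None):
    """Return the argument string handed to fastp.

    If `override` is given it is used verbatim, so values in `params`
    are ignored unless the caller repeats them in the override string.
    """
    if override is not None:
        return override
    return (
        f"--average_qual {params['average_quality_e']} "
        f"--cut_tail "
        f"--cut_tail_mean_quality {params['cut_tail_mean_quality']} "
        f"--cut_tail_window_size {params['cut_tail_window_size']}"
    )


# Hypothetical values, chosen only for illustration.
params = {"average_quality_e": 25, "cut_tail_mean_quality": 15, "cut_tail_window_size": 4}
default_args = build_illumina_args(params)
# The override replaces the whole string; average_quality_e is no longer applied.
custom_args = build_illumina_args(params, override="--adapter_sequence YOUR_ADAPTER_SEQUENCE")
print(default_args)
print(custom_args)
```

In this sketch the custom string drops every quality setting, which is why an override must restate any parameter you still want applied.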

Chopper

Chopper was originally used for trimming of Nanopore reads, but FastP was able to do the same work so Chopper is no longer used. Its code currently remains but it cannot be run in the pipeline.

Flye

Flye is used for assembly of Nanopore data.

flye
nanopore
- raw: corresponds to the Flye option --nano-raw
- corr: corresponds to the Flye option --nano-corr
- hq: corresponds to the Flye option --nano-hq
pacbio
- raw: corresponds to the Flye option --pacbio-raw
- corr: corresponds to the Flye option --pacbio-corr
- hifi: corresponds to the Flye option --pacbio-hifi
singularity: Singularity container for Flye
docker: Docker container for Flye
fasta_ext: The file extension for fasta files. Do not alter this field unless doing pipeline development
gfa_ext: The file extension for gfa files. Do not alter this field unless doing pipeline development
gv_ext: The file extension for gv files. Do not alter this field unless doing pipeline development
txt_ext: the file extension for txt files. Do not alter this field unless doing pipeline development
log_ext: the file extension for the Flye log files. Do not alter this field unless doing pipeline development
json_ext: the file extension for the Flye json files. Do not alter this field unless doing pipeline development
polishing_iterations: The number of polishing iterations for Flye.
ext_args: Extra command line options to pass to Flye
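The correspondence between the nanopore/pacbio entries above and Flye's read-type options can be written as a small lookup table. The Python below is only a sketch of that mapping; the names `FLYE_READ_FLAGS` and `flye_flag` are hypothetical, and how the pipeline selects an entry internally is not shown here.

```python
# Mapping taken from the parameter list above: platform -> read type -> Flye option.
FLYE_READ_FLAGS = {
    "nanopore": {"raw": "--nano-raw", "corr": "--nano-corr", "hq": "--nano-hq"},
    "pacbio": {"raw": "--pacbio-raw", "corr": "--pacbio-corr", "hifi": "--pacbio-hifi"},
}


def flye_flag(platform, read_type):
    """Look up the Flye option for a platform and read type (raises KeyError if undefined)."""
    return FLYE_READ_FLAGS[platform][read_type]


print(flye_flag("nanopore", "hq"))
```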

Spades

Spades is used for paired-end read assembly.

spades
singularity: Singularity container for spades
docker: Docker container for spades
scaffolds_ext: The file extension for the scaffolds file. Do not alter this field unless doing pipeline development
contigs_ext: The file extension containing assembled contigs. Do not alter this field unless doing pipeline development
transcripts_ext: The file extension for the assembled transcripts. Do not alter this field unless doing pipeline development
assembly_graphs_ext: the file extension of the assembly graphs. Do not alter this field unless doing pipeline development
log_ext: The file extension for the log files. Do not alter this field unless doing pipeline development
outdir: The name of the output directory for assemblies. Do not alter this field unless doing pipeline development

FastQC

This is a default tool added to nf-core pipelines. This feature will likely be removed in the future but for those fond of it, the outputs of FastQC still remain.

fastqc
html_ext: The file extension of the fastqc html file. Do not alter this field unless doing pipeline development
zip_ext: The file extension of the zipped FastQC outputs. Do not alter this field unless doing pipeline development

Quast

Quast is used to gather assembly metrics, to which automated quality control criteria are then applied.

quast
singularity: Singularity container for quast.
docker: Docker container for quast.
suffix: The suffix attached to quast outputs. Do not alter this field unless doing pipeline development.
report_base: The base term for output quast files to be used in reporting. Do not alter this field unless doing pipeline development.
report_prefix: The prefix of the quast outputs to be used in reporting. Do not alter this field unless doing pipeline development.
min_contig_length: The minimum length of contigs to be used in Quast's generation of metrics. Do not alter this field unless doing pipeline development. This argument can be set from the command line with --qt_min_contig_length.
args: A command string to pass to Quast. Altering this is not advised, as certain options may affect your reporting output. This string will be passed to Quast verbatim. Do not alter this field unless doing pipeline development.
header_p: This tells the pipeline that the Quast report output contains a header. Do not alter this field unless doing pipeline development.

Quast Filter

Assemblies can be prevented from going into further analyses based on the Quast output. The options for the mentioned filter are listed here.

quast_filter
n50_field: The name of the field to search for and filter. Do not alter this field unless doing pipeline development.
n50_value: The minimum value the field specified is allowed to contain.
nr_contigs_field: The name of the field in the Quast report to filter on. Do not alter this field unless doing pipeline development.
nr_contigs_value: The minimum number of contigs an assembly must have to proceed further through the pipeline.
sample_header: The column name in the Quast output containing the sample information. Do not alter this field unless doing pipeline development.
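The filter described by these options can be sketched in Python as follows. This is an assumed reading of the parameters, not the pipeline's code: both thresholds are treated as inclusive minimums, and the field names ("N50", "# contigs") and threshold values are hypothetical examples.

```python
def passes_quast_filter(report_row, n50_field, n50_value, nr_contigs_field, nr_contigs_value):
    """Return True if an assembly meets both minimum thresholds.

    report_row maps Quast report column names to their (string) values.
    """
    n50 = int(report_row[n50_field])
    nr_contigs = int(report_row[nr_contigs_field])
    # Assumption: both comparisons are inclusive minimums.
    return n50 >= n50_value and nr_contigs >= nr_contigs_value


# Hypothetical report row and thresholds.
row = {"Assembly": "sample1", "N50": "120000", "# contigs": "45"}
print(passes_quast_filter(row, "N50", 100000, "# contigs", 1))
```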

CheckM2

CheckM2 is used within the pipeline for assessing contamination in assemblies.

checkm2
singularity: Singularity container containing CheckM2
docker: Docker container containing CheckM2
download_link: The link used to pull the checkm2 model if selected.
report_tag: Name of the data outputs in the final report. Do not touch this field.
header_p: Denotes that the result used by the pipeline in generation of the summary report contains a header. Do not alter this field unless doing pipeline development.

Kraken2

Kraken2 can be used as a substitute for Mash in speciation of samples, and it is used to bin contigs of metagenomic samples.

kraken
singularity: Singularity container for Kraken2.
docker: Docker container for Kraken2.
classified_suffix: Suffix for classified data from Kraken2. Do not alter this field unless doing pipeline development.
unclassified_suffix: Suffix for unclassified data from Kraken2. Do not alter this field unless doing pipeline development.
report_suffix: The name of the report output by Kraken2.
output_suffix: The name of the output file from Kraken2. Do not alter this field unless doing pipeline development.
tophit_level: The taxonomic level to classify a sample at. The default is S for species, but you could use, for example, S1 or F.
save_output_fastqs: Option to save the output fastq files from Kraken2. Do not alter this field unless doing pipeline development.
save_read_assignments: Option to save how Kraken2 assigns reads. Do not alter this field unless doing pipeline development.
run_kraken_quick: This option can be set to true if one wishes to run Kraken2 in quick mode.
report_tag: The name of the Kraken2 data in the final report. Do not alter this field unless doing pipeline development.
header_p: Tell the pipeline that the file used for reporting does or does not contain header data. Do not alter this field unless doing pipeline development.
headers: A list of headers in the Kraken2 report. Do not alter this field unless doing pipeline development.
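To show what a taxonomic level such as S or F refers to, the Python sketch below picks the highest-percentage entry at a given rank code from a standard six-column Kraken2 report. How mikrokondo itself selects the top hit is an assumption here, and the report lines are made-up data for illustration.

```python
def top_hit(report_text, level="S"):
    """Return (name, percent) of the highest-percentage taxon at the given rank code."""
    best = None
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        # Columns: percent, clade reads, direct reads, rank code, taxid, indented name.
        percent, rank, name = float(fields[0]), fields[3], fields[5].strip()
        if rank == level and (best is None or percent > best[1]):
            best = (name, percent)
    return best


# Made-up report lines in Kraken2's tab-separated report layout.
report = "\n".join([
    " 5.00\t50\t50\tU\t0\tunclassified",
    "95.00\t950\t10\tR\t1\troot",
    "60.00\t600\t600\tS\t562\t        Escherichia coli",
    "30.00\t300\t300\tS\t28901\t        Salmonella enterica",
])
print(top_hit(report, level="S"))
```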

Seven Gene MLST

Run Torsten Seemann's seven gene MLST program.

mlst
singularity: Singularity container for mlst.
docker: Docker container for mlst.
args: Additional arguments to pass to mlst.
tsv_ext: Extension of the mlst tabular file. Do not alter this field unless doing pipeline development.
json_ext: Extension of the mlst output JSON file. Do not alter this field unless doing pipeline development.
report_tag: Name of the data outputs in the final report. Do not alter this field unless doing pipeline development.

Mash

Mash is used repeatedly throughout the pipeline for estimation of genome size from reads, contamination detection and for determining the final species of an assembly.

mash
singularity: Singularity container for mash.
docker: Docker container for mash.
mash_ext: Extension of the mash screen file. Do not alter this field unless doing pipeline development.
output_reads_ext: Extension of mash outputs when run on reads. Do not alter this field unless doing pipeline development.
output_taxa_ext: Extension of mash output when run on contigs. Do not alter this field unless doing pipeline development.
mash_sketch: The GTDB sketch used by the pipeline. This sketch is special as it contains the taxonomic paths used in the classification step of the pipeline. As of 2023-10-05 it can be found here: https://zenodo.org/record/8408361
sketch_ext: File extension of a mash sketch. Do not alter this field unless doing pipeline development.
json_ext: File extension of json data output by Mash. Do not alter this field unless doing pipeline development.
sketch_kmer_size: The size of the kmers used in the sketching in genome size estimation.
min_kmer: The minimum number of kmer copies required to pass the noise filter. This value is used in estimation of genome size from reads. The default value is 10 as it seems to work well for Illumina data. This value can be set from the command line by setting --mh_min_kmer.
final_sketch_name: (To be removed.) This parameter was originally part of a subworkflow included in the pipeline for generation of the GTDB sketch, but that subworkflow has been removed and replaced with scripting.
report_tag: Report tag for Mash in the summary report. Do not alter this field unless doing pipeline development.
header_p: Tells the pipeline if the output data contains headers. Do not alter this field unless doing pipeline development.
headers: A list of the headers the output of mash should contain. Do not alter this field unless doing pipeline development.

Mash Meta

This process is used to determine if a sample is metagenomic or not.

mash_meta
report_tag: The name of this output field in the summary report. Do not alter this field unless doing pipeline development.

top_hit_species

As Kraken2 or Mash can be used in the pipeline for determining the species present, they share a common report tag.

top_hit_species
report_tag: The name of the determined species in the final report. Do not alter this field unless doing pipeline development.

Contamination Removal

This step removes contaminants from read data. It exists to perform dehosting and removal of kitomes.

r_contaminants
singularity: Singularity container used to perform dehosting, this container contains minimap2 and samtools.
docker: Docker container used to perform dehosting, this container contains minimap2 and samtools.
phix_fa: The path to the file containing the phiX fasta.
homo_sapiens_fa: The path to the file containing the human genome fasta.
pacbio_mg: The path to the file containing the Pacbio sequencing control.
output_ext: The extension of the deconned fastq files. Do not alter this field unless doing pipeline development.
mega_mm2_idx: The path to the minimap2 index used for dehosting. Do not alter this field unless doing pipeline development.
mm2_illumina: The arguments passed to minimap2 for Illumina data. Do not alter this field unless doing pipeline development.
mm2_pac: The arguments passed to minimap2 for Pacbio Data. Do not alter this field unless doing pipeline development.
mm2_ont: The arguments passed to minimap2 for Nanopore data. Do not alter this field unless doing pipeline development.
samtools_output_ext: The extension of the output from samtools. Do not alter this field unless doing pipeline development.
samtools_singletons_ext: The extension of singleton reads from samtools. Do not alter this field unless doing pipeline development.
output_ext: The name of the files output from samtools. Do not alter this field unless doing pipeline development.
output_dir: The directory where deconned reads are placed. Do not alter this field unless doing pipeline development.

Minimap2

Minimap2 is used frequently throughout the pipeline for decontamination and mapping reads back to assemblies for polishing.

minimap2
singularity: The singularity container for minimap2, the same one is used for contamination removal.
docker: The Docker container for minimap2, the same one is used for contamination removal.
index_outdir: The directory where created indices are output. Do not alter this field unless doing pipeline development.
index_ext: The file extension of created indices. Do not alter this field unless doing pipeline development.

Samtools

Samtools is used for sam to bam conversion in the pipeline.

samtools
singularity: The Singularity container containing samtools, the same container is used as the one in contamination removal.
docker: The Docker container containing samtools, the same container is used as the one in contamination removal.
bam_ext: The extension of the bam file from samtools. Do not alter this field unless doing pipeline development.
bai_ext: The extension of the bam index from samtools. Do not alter this field unless doing pipeline development.

Racon

Racon is used as a first pass for polishing assemblies.

racon
singularity: The Singularity container containing racon.
docker: The Docker container containing racon.
consensus_suffix: The suffix for Racon's outputs. Do not alter this field unless doing pipeline development.
consensus_ext: The file extension for the racon consensus sequence. Do not alter this field unless doing pipeline development.
outdir: The directory containing the polished sequences. Do not alter this field unless doing pipeline development.

Pilon

Pilon was added to the pipeline, but it is run iteratively, which at the time of writing this pipeline was not well supported in Nextflow, so a separate script and containers are provided to utilize Pilon. The code for Pilon remains in the pipeline so that iterative Pilon polishing can be integrated directly into the pipeline once this becomes easy to do.

Pilon Iterative Polishing

This process is a wrapper around minimap2, samtools and Pilon for iterative polishing. Containers are built for it, but if you ever have problems with this step, disabling polishing will fix your issue (at the cost of polishing).

pilon_iterative
singularity: The container containing the iterative pilon program. If you ever have issues with the singularity image you can use the Docker image as Nextflow will automatically convert the docker image into a singularity image.
docker: The Docker container for the Pilon iterative polisher.
outdir: The directory where polished data is output. Do not alter this field unless doing pipeline development.
fasta_ext: File extension for the fasta to be polished. Do not alter this field unless doing pipeline development.
fasta_outdir: The output directory name for the polished fastas. Do not alter this field unless doing pipeline development.
vcf_ext: File extension for the VCF output by Pilon. Do not alter this field unless doing pipeline development.
vcf_outdir: output directory containing the VCF files from Pilon. Do not alter this field unless doing pipeline development.
bam_ext: Bam file extension. Do not alter this field unless doing pipeline development.
bai_ext: Bam index file extension. Do not alter this field unless doing pipeline development.
changes_ext: File extensions for the pilon output containing the changes applied to the assembly. Do not alter this field unless doing pipeline development.
changes_outdir: The output directory for the pilon changes. Do not alter this field unless doing pipeline development.
max_memory_multiplier: On failure this program will try again with more memory, the multiplier is the factor that the amount of memory passed to the program will be increased by. Do not alter this field unless doing pipeline development.
max_polishing_illumina: Number of iterations for polishing an illumina assembly with illumina reads.
max_polishing_nanopore: Number of iterations to polish a Nanopore assembly with (will use illumina reads if provided).
max_polishing_pacbio: Number of iterations to polish a Pacbio assembly with (will use illumina reads if provided).

Medaka Polishing

Medaka is used for polishing of Nanopore assemblies. Make sure you specify a Medaka model when using the pipeline so the correct settings are applied. If you have issues with Medaka running, try disabling resume, or alternatively disable polishing, as Medaka can be troublesome to run.

medaka
singularity: Singularity container with Medaka.
docker: Docker container with Medaka.
model: This parameter will be auto filled with the model specified at the top level by the nanopore_chemistry option. Do not alter this field unless doing pipeline development.
fasta_ext: Polished fasta output. Do not alter this field unless doing pipeline development.
batch_size: The batch size passed to medaka, this can improve performance. Do not alter this field unless doing pipeline development.

Unicycler

Unicycler is an option provided for hybrid assembly. It is a great option and outputs an excellent assembly, but it requires a lot of resources, which is why the alternate hybrid assembly option using Flye->Racon->Pilon is available. In addition, Unicycler can generate a fairly cryptic Spades error that usually relates to memory usage; it will typically say something involving tputs.

unicycler
singularity: The Singularity container containing Unicycler.
docker: The Docker container containing Unicycler.
scaffolds_ext: The scaffolds file extension output by unicycler. Do not alter this field unless doing pipeline development.
assembly_ext: The assembly extension output by Unicycler. Do not alter this field unless doing pipeline development.
log_ext: The log file output by Unicycler. Do not alter this field unless doing pipeline development.
outdir: The output directory the Unicycler data is sent to. Do not alter this field unless doing pipeline development.
mem_modifier: Specifies a high amount of memory for Unicycler to prevent a common spades error that is fairly cryptic. Do not alter this field unless doing pipeline development.
threads_increase_factor: Factor to increase the number of threads passed to Unicycler. Do not alter this field unless doing pipeline development.

Mob-suite Recon

Mob-suite recon provides annotation of plasmids in the assembly data.

mobsuite_recon
singularity: The singularity container containing mob-suite recon.
docker: The Docker container containing mob-suite recon.
args: Additional arguments to pass to mobsuite.
fasta_ext: The file extension for FASTAs. Do not alter this field unless doing pipeline development.
results_ext: The file extension for results in mob-suite. Do not alter this field unless doing pipeline development.
mob_results_file: The final results to be included in the final report by mob-suite. Do not alter this field unless doing pipeline development.
report_tag: The field name of mob-suite data in the final report. Do not alter this field unless doing pipeline development.
header_p: Default is true and indicates that the results output by mob-suite contains a header. Do not alter this field unless doing pipeline development.

StarAMR

StarAMR provides annotation of antimicrobial resistance genes within your data. The process will alter the FASTA headers of input files to ensure each header is less than 50 characters long.

staramr
singularity: The singularity container containing staramr.
docker: The Docker container containing StarAMR.
db: The database for StarAMR. The default value of null tells the pipeline to use the database included in the StarAMR container. However you can specify a path to a valid StarAMR database and use that instead.
tsv_ext: File extension of the reports from StarAMR. Do not alter this field unless doing pipeline development.
txt_ext: File extension of the text reports from StarAMR. Do not alter this field unless doing pipeline development.
xlsx_ext: File extension of the Excel spreadsheet from StarAMR. Do not alter this field unless doing pipeline development.
args: Additional arguments to pass to StarAMR. Do not alter this field unless doing pipeline development.
point_finder_dbs: A list containing the valid databases StarAMR supports for pointfinder. The way they are structured matches what StarAMR needs for input. Do not alter this field unless doing pipeline development.
report_tag: The field name of StarAMR in the final summary report. Do not alter this field unless doing pipeline development.
header_p: Indicates the final report from StarAMR contains a header line. Do not alter this field unless doing pipeline development.
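One way to keep FASTA headers under 50 characters is simple truncation, sketched below in Python. The pipeline's actual renaming scheme is not described here, so treat this purely as an assumed illustration; the function name is hypothetical, and the length is counted without the leading ">".

```python
MAX_HEADER_LEN = 49  # strictly fewer than 50 characters


def shorten_headers(fasta_text, max_len=MAX_HEADER_LEN):
    """Truncate every FASTA header line to at most max_len characters."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Keep the leading ">" and truncate the rest of the header.
            line = ">" + line[1:][:max_len]
        out.append(line)
    return "\n".join(out)


# Hypothetical input with an over-long header.
fasta = ">" + "contig_" * 12 + "\nACGTACGT"
print(shorten_headers(fasta))
```

Note that plain truncation can produce duplicate headers when long names share a prefix, so a real implementation would need to guard against collisions.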

Bakta

Bakta is used to provide annotation of genomes. It is very reliable, but it can be slow.

bakta
singularity: The singularity container containing Bakta.
docker: The Docker container containing Bakta.
db: The path to the downloaded Bakta database. This can be set from the command line using the argument --bakta_db.
output_dir: The name of the folder where Bakta data is saved to. Do not alter this field unless doing pipeline development.
embl_ext: File extension of embl file. Do not alter this field unless doing pipeline development.
faa_ext: File extension of faa file. Do not alter this field unless doing pipeline development.
ffn_ext: File extension of the ffn file. Do not alter this field unless doing pipeline development.
fna_ext: File extension of the fna file. Do not alter this field unless doing pipeline development.
gbff_ext: File extension of gbff file. Do not alter this field unless doing pipeline development.
gff_ext: File extension of GFF file. Do not alter this field unless doing pipeline development.
threads: Number of threads for Bakta to use, remember more is not always better. Do not alter this field unless doing pipeline development.
hypotheticals_tsv_ext: File extension for hypothetical genes. Do not alter this field unless doing pipeline development.
hypotheticals_faa_ext: File extension of hypothetical genes fasta. Do not alter this field unless doing pipeline development.
tsv_ext: The file extension of the final bakta tsv report. Do not alter this field unless doing pipeline development.
txt_ext: The file extension of the txt report. Do not alter this field unless doing pipeline development.
min_contig_length: The minimum contig length to be annotated by Bakta. This can be set from the command line using the argument --ba_min_contig_length.

Bandage

Bandage is included to make bandage plots of the initial assemblies e.g. Spades, Flye or Unicycler. These images can be useful in determining the quality of an assembly.

bandage
singularity: The path to the singularity image containing bandage.
docker: The path to the Docker image containing bandage.
svg_ext: The extension of the SVG file created by bandage. Do not alter this field unless doing pipeline development.
outdir: The output directory of the bandage images.

Subtyping Report

All subtyping tools share a common report tag so that they can be identified by the program.

subtyping_report
report_tag: Subtyping report name. Do not alter this field unless doing pipeline development.

ECTyper

ECTyper is used to perform in-silico typing of Escherichia and is automatically triggered by the pipeline.

ectyper
singularity: The path to the singularity container containing ECTyper.
docker: The path to the Docker container containing ECTyper.
log_ext: File extension of the ECTyper log file. Do not alter this field unless doing pipeline development.
tsv_ext: File extension of the ECTyper text file. Do not alter this field unless doing pipeline development.
txt_ext: Text file extension of ECTyper output. Do not alter this field unless doing pipeline development.
report_tag: Report tag for ECTyper data. Do not alter this field unless doing pipeline development.
header_p: denotes if the table output from ECTyper contains a header. Do not alter this field unless doing pipeline development.
ec_opid: The minimum percent identity to determine an O antigen's presence. It must be an integer.
ec_opcov: The minimum percent coverage of the O antigen. It must be an integer.
ec_hpid: The minimum percent identity to determine an H antigen's presence. It must be an integer.
ec_hcov: The minimum percent coverage of the H antigen. It must be an integer.
ec_enable_verification: A boolean value to enable species verification in ECTyper.

Kleborate

Kleborate performs automatic typing of Klebsiella.

kleborate
singularity: The path to the singularity container containing Kleborate.
docker: The path to the docker container containing Kleborate.
txt_ext: The file extension of the Kleborate text output. Do not alter this field unless doing pipeline development.
report_tag: The report tag for Kleborate. Do not alter this field unless doing pipeline development.
header_p: Denotes the Kleborate table contains a header. Do not alter this field unless doing pipeline development.

Spatyper

Performs typing of Staphylococcus species.

spatyper
singularity: The path to the singularity container containing Spatyper.
docker: The path to the Docker container containing Spatyper.
tsv_ext: The file extension of the Spatyper output. Do not alter this field unless doing pipeline development.
report_tag: The report tag for Spatyper. Do not alter this field unless doing pipeline development.
header_p: denotes whether or not the output table contains a header. Do not alter this field unless doing pipeline development.
repeats: An optional file specifying repeats can be passed to Spatyper.
repeat_order: An optional file containing a repeat order to pass to Spatyper.
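
Since repeats and repeat_order are optional, a launcher script only needs to emit flags for the values that are actually set. The sketch below applies the dot operator described at the top of this document. The helper name dotted_args and the file name repeats.fasta are hypothetical.

```python
def dotted_args(tool, options):
    """Turn {"param": value} into ["--tool.param", "value", ...] for the CLI."""
    args = []
    for key, value in options.items():
        if value is None:
            # Optional parameter left unset: emit nothing for it.
            continue
        args += [f"--{tool}.{key}", str(value)]
    return args


# repeats.fasta is a placeholder path used only for this example.
print(dotted_args("spatyper", {"repeats": "repeats.fasta", "repeat_order": None}))
```

The resulting list can be appended to the rest of the pipeline command line.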

SISTR

In-silico Salmonella serotype prediction.

sistr
singularity: The path to the singularity container containing SISTR.
docker: The path to the Docker container containing SISTR.
tsv_ext: The file extension of the SISTR output. Do not alter this field unless doing pipeline development.
allele_fasta_ext: The extension of the alleles identified by SISTR. Do not alter this field unless doing pipeline development.
allele_json_ext: The extension to the output JSON file from SISTR. Do not alter this field unless doing pipeline development.
cgmlst_tag: The extension of the CGMLST file from SISTR. Do not alter this field unless doing pipeline development.
report_tag: The report tag for SISTR. Do not alter this field unless doing pipeline development.
header_p: Denotes whether or not the output table contains a header. Do not alter this field unless doing pipeline development.

Lissero

In-silico Listeria typing.

lissero
singularity: The path to the singularity container containing Lissero.
docker: The path to the docker container containing Lissero.
tsv_ext: The file extension of the Lissero output. Do not alter this field unless doing pipeline development.
report_tag: The report tag for Lissero. Do not alter this field unless doing pipeline development.
header_p: Denotes if the output table of Lissero contains a header. Do not alter this field unless doing pipeline development.

Shigeifinder

In-silico Shigella typing.

NOTE: It is unlikely that this subtyper will be triggered, as GTDB has merged E. coli and Shigella in an updated sketch. An updated version of ECTyper will be released soon to address the shortfalls of this sketch. If you rely on Shigella detection, add --run_kraken true to your command line or update the value in the .nextflow.config, as Kraken2 (while slower) can still detect Shigella.

shigeifinder
singularity: The Singularity container containing Shigeifinder.
docker: The path to the Docker container containing Shigeifinder.
container_version: The Shigeifinder version number. Update it whenever the containers are changed, as Shigeifinder does not currently have a version number tracked in the command.
tsv_ext: Extension of output report.
report_tag: The name of the output report for Shigeifinder.
header_p: Denotes that the output from Shigeifinder includes header values.

Shigatyper (Replaced with Shigeifinder)

The Shigatyper code remains in the pipeline, but it will likely be removed in a later release.

shigatyper
singularity: The Singularity container containing Shigatyper.
docker: The path to the Docker container containing Shigatyper.
tsv_ext: The tsv file extension. Do not alter this field unless doing pipeline development.
report_tag: The report tag for Shigatyper. Do not alter this field unless doing pipeline development.
header_p: Denotes if the report output contains a header. Do not alter this field unless doing pipeline development.

Kraken2 Contig Binning

Bins contigs based on the Kraken2 output for contaminated/metagenomic samples. This is implemented by using a custom script.

kraken_bin
taxonomic_level: The taxonomic level to bin contigs at. Binning at the species level is not recommended. The default is to bin at the genus level, which is specified by the character G. To bin at a higher level such as family, you would specify F.
fasta_ext: The extension of the fasta files output. Do not alter this field unless doing pipeline development.
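
The single-letter values for taxonomic_level follow the rank codes used in Kraken2 reports. As a rough sketch, a script could translate a rank name into its code before passing it to the pipeline. The source only names G and F explicitly; the other codes below come from the Kraken2 report convention and are assumed, not confirmed, to be accepted by the binning script. The function name rank_code is hypothetical.

```python
# Rank codes as they appear in Kraken2 report output.
RANK_CODES = {
    "domain": "D",
    "phylum": "P",
    "class": "C",
    "order": "O",
    "family": "F",
    "genus": "G",
    "species": "S",
}


def rank_code(rank):
    """Look up the single-letter code for a rank name (case-insensitive)."""
    try:
        return RANK_CODES[rank.lower()]
    except KeyError:
        raise ValueError(f"unknown taxonomic rank: {rank!r}") from None


print(rank_code("family"))
```

The returned character is the value to supply as --kraken_bin.taxonomic_level on the command line.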

Locidex (Allele Calling)

Parameters for use of locidex in allele calling.

locidex
singularity: The Singularity container containing Locidex.
docker: The path to the Docker container containing Locidex.
private_repository: The path to the Docker container containing Locidex in a private repository (this helps in cloud execution environments).
min_evalue: See --lx_min_evalue.
min_dna_len: See --lx_min_dna_len.
min_aa_len: See --lx_min_aa_len.
max_dna_len: See --lx_max_dna_len.
max_aa_len: See --lx_max_aa_len.
min_dna_ident: See --lx_min_dna_ident.
min_aa_ident: See --lx_min_aa_ident.
min_dna_match_cov: See --lx_min_dna_match_cov.
min_aa_match_cov: See --lx_min_aa_match_cov.
max_target_seqs: See --lx_max_target_seqs.
extraction_mode: See --lx_extraction_mode.
report_mode: See --lx_report_mode.
report_prop: See --lx_report_prop.
report_max_ambig: See --lx_report_max_ambig.
report_max_stop: See --lx_report_max_stop.
allele_database: See --lx_allele_database.
date_format_string: The date format used in parsing the locidex manifest.json file. Do not alter this field unless doing pipeline development.
manifest_db_path: Do not alter this field unless doing pipeline development.
manifest_config_key: The name of key holding config data. Do not alter this field unless doing pipeline development.
manifest_config_name: The name field to use in the locidex manifest.json file for db identification. Do not alter this field unless doing pipeline development.
manifest_config_version: Config key field containing the version information for locidex. Do not alter this field unless doing pipeline development.
manifest_name: The name of the manifest.json file for locidex. Do not alter this field unless doing pipeline development.
config_data_file: The name of the locidex database file containing config information. Do not alter this field unless doing pipeline development.
database_config_value_date: Name of the field containing the date in the locidex manifest.json. Do not alter this field unless doing pipeline development.
extracted_seqs_suffix: Extracted sequences file suffix. Do not alter this field unless doing pipeline development.
seq_store_suffix: Seq store suffix. Do not alter this field unless doing pipeline development.
gbk_suffix: Extension name of the generated GBK file. Do not alter this field unless doing pipeline development.
extraction_dir: Directory name of the locidex extract outputs. Do not alter this field unless doing pipeline development.
report_suffix: Report suffix of the locidex outputs. Do not alter this field unless doing pipeline development.
db_config_output_name: Output name of the selected database used for locidex. Do not alter this field unless doing pipeline development.
report_tag: The report tag for Locidex Report. Do not alter this field unless doing pipeline development.
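
Sixteen of the Locidex fields above point to a top-level flag that is the field name prefixed with lx_. A small lookup, sketched below, makes that relationship explicit for anyone scripting pipeline runs. The function name top_level_flag is hypothetical, and the list is copied from the fields documented in this section.

```python
# Nested Locidex fields that are documented as "See --lx_<name>".
LOCIDEX_KEYS = [
    "min_evalue", "min_dna_len", "min_aa_len", "max_dna_len", "max_aa_len",
    "min_dna_ident", "min_aa_ident", "min_dna_match_cov", "min_aa_match_cov",
    "max_target_seqs", "extraction_mode", "report_mode", "report_prop",
    "report_max_ambig", "report_max_stop", "allele_database",
]


def top_level_flag(key):
    """Return the top-level CLI flag documented for a nested Locidex field."""
    if key not in LOCIDEX_KEYS:
        raise KeyError(f"{key} has no documented --lx_ flag")
    return f"--lx_{key}"


print(top_level_flag("min_evalue"))
```

Fields outside this list, such as report_tag or manifest_name, have no --lx_ counterpart in this section and are meant for pipeline development only.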

Locidex Summary

The information used in creating a summary of the locidex outputs.

locidex_summary
report_tag: The report tag for the locidex summary. Do not alter this field unless doing pipeline development.
data_key: The field containing the relevant data to summarize. Do not alter this field unless doing pipeline development.
data_profile_key: The key containing the profile information. Do not alter this field unless doing pipeline development.
data_sample_key: The name of the key containing the sample info. Do not alter this field unless doing pipeline development.
missing_allele_value: The field used for the missing allele value. Do not alter this field unless doing pipeline development.
reportable_alleles: A list of alleles whose presence or absence is shown in the final output.
report_exclude_fields: Fields to exclude from the final summary report. Do not alter this field unless doing pipeline development.
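
To illustrate how reportable_alleles and missing_allele_value could interact, the sketch below marks an allele as present when a sample profile holds any value other than the missing-allele marker. This is not the pipeline's implementation: the profile layout, the "-" marker, and the allele names are all hypothetical.

```python
def allele_presence(profile, reportable_alleles, missing_allele_value="-"):
    """Map each reportable allele to True when the profile has a called value."""
    return {
        # An allele absent from the profile is treated the same as a missing call.
        allele: profile.get(allele, missing_allele_value) != missing_allele_value
        for allele in reportable_alleles
    }


# Hypothetical profile for one sample: locus name -> allele call.
profile = {"stx1": "12", "stx2": "-", "eae": "7"}
print(allele_presence(profile, ["stx1", "stx2", "wzx"]))
```

In this example only stx1 is reported as present: stx2 carries the missing marker and wzx does not appear in the profile at all.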