HyDRA

HyDRA is an annotated reference-based bioinformatics pipeline that analyzes next-generation sequencing data to genotype HIV-1 drug resistance mutations. It uses an annotated HXB2 sequence for reference mapping with Bowtie2, and applies stringent data quality assurance and variant calling to identify HIV drug resistance (HIVDR) associated mutations based on the Stanford HIV Drug Resistance Database and the 2009 WHO list for Surveillance of Transmitted HIVDR. All HIVDR mutations found in the pol gene regions, protease (PR), reverse transcriptase (RT), and integrase (IN), are reported according to the classifications outlined in the Stanford Surveillance Drug Resistance Mutation list.

Basic Usage

quasitools hydra [OPTIONS] <FORWARD READS> [REVERSE READS]

Arguments

Forward Reads

The forward or single-end FASTQ-format reads. HyDRA will attempt to identify known drug resistant mutations in these reads.

Reverse Reads

This parameter is optional. If provided, these reads will be assumed to be paired with the provided forward reads and HyDRA will attempt to identify known drug resistant mutations in both sets of reads.

Options

Output

-o, --output DIRECTORY

The location of the output directory, which will contain several files created from HyDRA's operation, including any identified drug resistant mutations.

Mutation Database

-m, --mutation_db FILE

The mutation database describes specific mutations within the named genetic regions defined in the BED4 file. When a mutation database is provided to the tool, the entries in its "genetic regions" column must match the "names" column of the BED4 file. Please refer to Data Formats for more information.

Reporting Threshold

-rt, --reporting_threshold INTEGER

The minimum number of observations required in the read data (FASTQ file) for an entry to be reported in the drug resistance report. Mutations with a number of observations less than this will not be reported. The default value is 1.

Generate Consensus

-gc, --generate_consensus

When this flag is set, the consensus sequence of the provided reads will be reported in the HyDRA output.

Consensus Percentage

-cp, --consensus_pct INTEGER

The minimum percentage of observations required for a base to be incorporated into the consensus. This option must be used together with the -gc/--generate_consensus flag.

This option causes the consensus construction to operate in one of two distinct modes. When the percentage is set to exactly 100, the consensus is generated by taking the most abundant base at each position. In contrast, when the percentage is less than 100, the consensus is generated by comparing the frequency of each base at a position against the threshold. The default value is 100.

These two modes of operation are described in greater detail within the description for quasitools consensus.
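The two modes can be illustrated for a single alignment column with a minimal Python sketch. This is not the quasitools implementation: the function name consensus_base is hypothetical, and the use of IUPAC ambiguity codes when several bases meet the threshold (and "N" when none do) is an assumption made for illustration only.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(bases, consensus_pct=100):
    """Return a consensus character for one position (hypothetical helper).

    bases: the bases observed at this position, e.g. "AAAG".
    consensus_pct: mirrors the -cp/--consensus_pct option.
    """
    counts = Counter(bases)
    if consensus_pct == 100:
        # Mode 1: take the most abundant base at the position.
        return counts.most_common(1)[0][0]
    # Mode 2: keep every base whose frequency meets the threshold.
    total = sum(counts.values())
    kept = frozenset(
        base for base, count in counts.items()
        if 100 * count / total >= consensus_pct
    )
    # ASSUMPTION: several kept bases are written as an IUPAC ambiguity code,
    # and a position where no base meets the threshold is written as "N".
    return IUPAC.get(kept, "N")
```

With these assumptions, a column of three A and one G gives "A" at the default of 100, and "R" at a threshold of 20, because both A (75%) and G (25%) meet it.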

Quiet

-q, --quiet

This flag is used to suppress all standard output throughout the pipeline. However, this does not affect any file generation.

Trim Reads

-tr, --trim_reads

When this flag is enabled, the pipeline will iteratively trim the ends of reads until they either meet filter values or become too short. If trimmed reads become too short, they will be discarded. When this flag is not enabled, the pipeline will remove reads which do not meet filter values.
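As a rough illustration of iterative trimming, the Python sketch below removes one base at a time from the end of a read until the read meets a quality filter or becomes too short. The function name trim_read is hypothetical, and the details (trimming a single base from the 3' end per iteration, and using the median quality as the filter value) are assumptions, not a description of the actual HyDRA code.

```python
from statistics import median


def trim_read(seq, quals, score_cutoff, length_cutoff):
    """Trim a read until it passes the filter or is too short (hypothetical helper).

    seq: the read sequence.
    quals: per-base quality scores, same length as seq.
    Returns the (seq, quals) pair that passes, or None if the read is discarded.
    """
    # ASSUMPTION: one base is removed from the 3' end per iteration.
    while seq and len(seq) >= length_cutoff:
        # ASSUMPTION: the filter value is the median quality of the read.
        if median(quals) >= score_cutoff:
            return seq, quals
        seq, quals = seq[:-1], quals[:-1]
    # The read became shorter than the length cutoff, so it is discarded.
    return None
```

For example, a six-base read with qualities 30, 30, 30, 2, 2, 2 has a median of 16. Removing the last base raises the median to 30, so with a score cutoff of 30 the five-base read is kept.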

Mask Reads

-mr, --mask_reads

This option will mask low quality regions with "N" if they are below the minimum read quality threshold. This option and N-filtering cannot be enabled simultaneously.

Minimum Read Quality

-rq, --min_read_qual INTEGER

When read masking is enabled, this parameter is the minimum quality that a position in a read must have. If the quality is below this threshold, the position will be masked as an N.
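Masking can be pictured with a short Python sketch that replaces each base whose quality is below the threshold with "N". The function name mask_read is hypothetical and the sketch works on a plain sequence string and a list of integer quality scores; it is an illustration, not the pipeline's own code.

```python
def mask_read(seq, quals, min_read_qual):
    """Mask low-quality positions of a read with "N" (hypothetical helper).

    seq: the read sequence.
    quals: per-base quality scores, same length as seq.
    min_read_qual: mirrors the -rq/--min_read_qual option.
    """
    # A base is kept when its quality meets the threshold and masked otherwise.
    return "".join(
        base if qual >= min_read_qual else "N"
        for base, qual in zip(seq, quals)
    )
```

For the read ACGT with qualities 40, 10, 30, 5 and a minimum read quality of 30, the sketch returns ANGN.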

Length Cutoff

-lc, --length_cutoff INTEGER

Reads which fall short of the specified length will be filtered out.

Mean or Median Score

-me, --median / -mn, --mean

This determines whether the pipeline will use a mean or median (default) score as a cutoff for read filtering.

Score Cutoff

-sc, --score_cutoff INTEGER

Reads that have a median or mean quality score (depending on the score type specified) less than the score cutoff value will be filtered out.

Filter Ns

-n, --ns

If enabled, the pipeline will discard any reads that contain N characters.
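The length, score, and N filters described above can be combined in a small Python sketch. The function name passes_filters and its keyword arguments are hypothetical, and the order in which the checks run is an assumption; the sketch only shows how the documented cutoffs relate to a single read.

```python
from statistics import mean, median


def passes_filters(seq, quals, score_cutoff, length_cutoff,
                   use_median=True, filter_ns=False):
    """Decide whether a read survives filtering (hypothetical helper).

    use_median: True mirrors -me/--median, False mirrors -mn/--mean.
    filter_ns: mirrors the -n/--ns flag.
    """
    # Length cutoff: reads shorter than the cutoff are filtered out.
    if not seq or len(seq) < length_cutoff:
        return False
    # Score cutoff: compare the median or mean quality against the cutoff.
    score = median(quals) if use_median else mean(quals)
    if score < score_cutoff:
        return False
    # N filtering: discard reads that contain N characters.
    if filter_ns and "N" in seq:
        return False
    return True
```

The choice of score type can change the outcome: a read with qualities 40, 40, 40, 0 has a median of 40 but a mean of 30, so with a score cutoff of 35 it passes under the median and fails under the mean.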

Error Rate

-e, --error_rate FLOAT

The estimated substitution sequencing error rate for the sequencing platform.

Minimum Variant Quality

-vq, --min_variant_qual INTEGER

Minimum quality for an amino acid variant to be included in the produced AAVF (Amino Acid Variant Format) file. This minimum will affect which amino acid variants are reported.

Minimum Depth

-md, --min_dp INTEGER

The minimum required read depth for observed nucleotide variants to be included for processing in the pipeline.

Minimum Allele Count

-ma, --min_ac INTEGER

The minimum required allele count for observed nucleotide variants to be included for processing in the pipeline.

Minimum Frequency

-mf, --min_freq FLOAT

The minimum required frequency for observed amino acid variants to be included for processing in the pipeline.
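The depth, allele count, and frequency thresholds can be read as simple lower bounds, as in the Python sketch below. Both function names are hypothetical, the threshold values in the usage note are chosen only for illustration, and treating each threshold as an inclusive minimum is an assumption.

```python
def keep_nt_variant(depth, allele_count, min_dp, min_ac):
    """Check a nucleotide variant against -md/--min_dp and -ma/--min_ac.

    ASSUMPTION: both thresholds are inclusive lower bounds.
    """
    return depth >= min_dp and allele_count >= min_ac


def keep_aa_variant(frequency, min_freq):
    """Check an amino acid variant against -mf/--min_freq.

    ASSUMPTION: the threshold is an inclusive lower bound.
    """
    return frequency >= min_freq
```

For example, with an illustrative minimum depth of 100 and minimum allele count of 5, a nucleotide variant seen 4 times at depth 150 would be excluded, while one seen 6 times at the same depth would be kept.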

ID

-i, --id TEXT

This is used to specify a FASTA sequence identifier to be used in the consensus report output.

Output

By default, the output directory is created in the current directory and is named data/. It will contain the following output files:

align.bam
align.bam.bai
consensus.fasta
coverage_file.csv
dr_report.csv
filtered.fastq
hydra.vcf
mutation_report.aavf
stats.txt

Example

Data

The following example data may be used to run the tool:

variant.fastq

Command

quasitools hydra variant.fastq -o output

Output

Standard Output

# Performing quality control on reads...
# Mapping reads...
# Loading read mappings...
# Identifying variants...
# Masking filtered variants...
# Building amino acid census...
# Finding amino acid mutations...
# Writing drug resistant mutation report...

mutation_report.aavf

##reference=hxb2_pol.fas
##source=quasitools:hydra
##fileformat=AAVFv1.0
##fileDate=20190215
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM  GENE    POS     REF     ALT     FILTER  ALT_FREQ        COVERAGE        INFO
hxb2_pol        PR      85      I       S       PASS    1.0000  144     SRVL=.;AC=aGt;CAT=.;ACF=1.0000;RC=att
hxb2_pol        RT      91      Q       H       PASS    1.0000  131     SRVL=.;AC=caT;CAT=.;ACF=1.0000;RC=caa
hxb2_pol        PR      40      G       E       PASS    1.0000  122     SRVL=.;AC=gAa;CAT=.;ACF=1.0000;RC=gga
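For downstream processing, a data line from mutation_report.aavf can be split into named fields. The Python sketch below is based only on the column header shown in the example above; the function name parse_aavf_line is hypothetical, and it assumes whitespace-separated columns and an INFO field made of KEY=VALUE pairs joined by semicolons.

```python
# Column names taken from the "#CHROM ..." header line of the example output.
AAVF_COLUMNS = [
    "CHROM", "GENE", "POS", "REF", "ALT",
    "FILTER", "ALT_FREQ", "COVERAGE", "INFO",
]


def parse_aavf_line(line):
    """Parse one AAVF data line into a dictionary (hypothetical helper)."""
    # ASSUMPTION: columns are separated by whitespace (tabs or spaces).
    fields = line.split()
    if len(fields) != len(AAVF_COLUMNS):
        raise ValueError(f"expected {len(AAVF_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(AAVF_COLUMNS, fields))
    # Convert the numeric columns to numbers.
    record["POS"] = int(record["POS"])
    record["ALT_FREQ"] = float(record["ALT_FREQ"])
    record["COVERAGE"] = int(record["COVERAGE"])
    # ASSUMPTION: INFO is a semicolon-separated list of KEY=VALUE pairs.
    record["INFO"] = dict(
        item.split("=", 1) for item in record["INFO"].split(";")
    )
    return record
```

Applied to the first data line of the example, the sketch yields a record with GENE "PR", POS 85, REF "I", ALT "S", and an INFO dictionary in which AC is "aGt" and RC is "att". Header lines beginning with "#" are not handled and should be skipped before calling it.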