HyDRA
HyDRA is an annotated reference-based bioinformatics pipeline that analyzes next-generation sequencing data to genotype HIV-1 drug resistance mutations. It uses an annotated HXB2 sequence as the reference for read mapping with Bowtie2, and applies stringent data quality assurance and variant calling to identify HIV drug resistance (HIVDR)-associated mutations based on the Stanford HIV Drug Resistance Database and the 2009 WHO list for Surveillance of Transmitted HIVDR. All HIVDR mutations found in the three pol gene regions, protease (PR), reverse transcriptase (RT), and integrase (IN), are reported according to the classifications outlined in the Stanford Surveillance Drug Resistance Mutation list.
Basic Usage
quasitools hydra [OPTIONS] <FORWARD READS> [REVERSE READS]
Arguments
Forward Reads
The forward or single-end FASTQ-format reads. HyDRA will attempt to identify known drug resistance mutations in these reads.
Reverse Reads
This parameter is optional. If provided, these reads are assumed to be paired with the provided forward reads, and HyDRA will attempt to identify known drug resistance mutations in both sets of reads.
Options
Output
-o, --output DIRECTORY
The location of the output directory, which will contain the files generated by HyDRA, including the report of any identified drug resistance mutations.
Mutation Database
-m, --mutation_db FILE
The mutation database describes specific mutations within the named genetic regions defined in the BED4 file. When a mutation database is provided to the tool, the entries in its "genetic regions" column must match the "names" column of the provided BED4 file. Please refer to Data Formats for more information.
Reporting Threshold
-rt, --reporting_threshold INTEGER
The minimum number of observations required in the read data (FASTQ file) for an entry to be reported in the drug resistance report. Mutations with a number of observations less than this will not be reported. The default value is 1.
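As a minimal sketch of how a reporting threshold acts on mutation counts, the Python below keeps only mutations observed at least the threshold number of times. The function name `apply_reporting_threshold`, the (gene, position, amino acid) key, and the counts are hypothetical illustrations, not part of quasitools.

```python
from typing import Dict, List, Tuple

# Hypothetical key for a mutation: (gene, position, alternate amino acid).
Mutation = Tuple[str, int, str]


def apply_reporting_threshold(counts: Dict[Mutation, int],
                              threshold: int = 1) -> List[Mutation]:
    """Return mutations observed at least `threshold` times, sorted for stable output."""
    return sorted(m for m, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical observation counts, for illustration only.
    counts = {("PR", 85, "S"): 144, ("RT", 184, "V"): 3}
    print(apply_reporting_threshold(counts, threshold=5))  # [('PR', 85, 'S')]
```

With the default threshold of 1, every mutation observed at least once would be kept.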
Generate Consensus
-gc, --generate_consensus
When this flag is set, the consensus sequence of the provided reads will be reported in the HyDRA output.
Consensus Percentage
-cp, --consensus_pct INTEGER
The minimum percentage of observations that a base must reach at a position to be incorporated into the consensus. This option must be used with the -gc/--generate_consensus flag turned on.
This option causes the consensus construction to operate in one of two distinct modes. When the percentage is set to exactly 100, the consensus is generated by taking the most abundant base at each position. In contrast, when the percentage is less than 100, the consensus is generated by comparing the frequency of each base at a position against the threshold. The default value is 100.
These two modes of operation are described in greater detail within the description for quasitools consensus.
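The two modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the quasitools implementation: it assumes that in threshold mode every base meeting the percentage contributes, that several qualifying bases are combined into an IUPAC ambiguity code, and that a position where no base qualifies becomes `N`. The names `consensus_base`, `consensus`, and `IUPAC` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Assumed IUPAC ambiguity codes for sets of bases that pass the threshold.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(column: Iterable[str], pct: int = 100) -> str:
    """Call one consensus base from the bases observed at a single position."""
    counts = Counter(b for b in column if b in "ACGT")
    if not counts:
        return "N"  # no usable observations
    if pct == 100:
        # Mode 1: most abundant base (ties go to the base seen first).
        return counts.most_common(1)[0][0]
    # Mode 2: every base whose percentage meets the threshold contributes.
    total = sum(counts.values())
    passing = frozenset(b for b, n in counts.items() if 100 * n / total >= pct)
    return IUPAC.get(passing, "N")  # nothing passes -> N


def consensus(columns: List[str], pct: int = 100) -> str:
    """Build a consensus from per-position base columns."""
    return "".join(consensus_base(col, pct) for col in columns)


if __name__ == "__main__":
    columns = ["AAAC", "AACC", "ACGT"]  # bases observed at three positions
    print(consensus(columns, 30))  # AMN
```

Lowering the percentage lets minority bases appear as ambiguity codes, which is why the threshold mode can report mixtures that the majority mode hides.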
Quiet
-q, --quiet
This flag is used to suppress all standard output throughout the pipeline. However, this does not affect any file generation.
Trim Reads
-tr, --trim_reads
When this flag is enabled, the pipeline will iteratively trim the ends of reads until they either meet filter values or become too short. If trimmed reads become too short, they will be discarded. When this flag is not enabled, the pipeline will remove reads which do not meet filter values.
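The trimming behaviour can be sketched in Python as a loop that shortens a read until it passes the filter or drops below the length cutoff. This sketch assumes that one base is removed from the 3' end per iteration and that the filter is a median quality cutoff; the actual trimming strategy in quasitools may differ, and `trim_read` is a hypothetical name.

```python
from statistics import median
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], length_cutoff: int,
              score_cutoff: float) -> Optional[Tuple[str, List[int]]]:
    """Trim the 3' end one base at a time until the read passes, else discard it.

    Returns the (possibly trimmed) read, or None if it became too short.
    """
    while seq and len(seq) >= length_cutoff:
        if median(quals) >= score_cutoff:
            return seq, quals  # read now meets the filter
        seq, quals = seq[:-1], quals[:-1]  # assumed: drop the last base
    return None  # too short: the read is discarded


if __name__ == "__main__":
    print(trim_read("ACGTAC", [30, 30, 30, 5, 5, 5], 3, 30))  # ('ACGTA', [30, 30, 30, 5, 5])
```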
Mask Reads
-mr, --mask_reads
When this flag is enabled, positions in a read whose quality falls below the minimum read quality threshold will be masked with "N". This option and N-filtering cannot be enabled simultaneously.
Minimum Read Quality
-rq, --min_read_qual INTEGER
When read masking is enabled, this parameter is the minimum quality that a position in a read must have. Positions below this threshold will be masked as an N.
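Masking can be illustrated with a short Python sketch that decodes FASTQ qualities and replaces low-quality bases with `N`. It assumes Phred+33 quality encoding, and the helpers `phred33` and `mask_read` are hypothetical names rather than quasitools functions.

```python
from typing import List


def phred33(qual_string: str) -> List[int]:
    """Decode a FASTQ quality string assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def mask_read(seq: str, quals: List[int], min_read_qual: int) -> str:
    """Replace every base whose quality is below min_read_qual with 'N'."""
    return "".join(base if q >= min_read_qual else "N"
                   for base, q in zip(seq, quals))


if __name__ == "__main__":
    print(mask_read("ACGT", phred33("I+D4"), 20))  # ANGN
```

Unlike trimming or filtering, masking keeps the read at its full length, so its mapping position is unaffected while the unreliable bases no longer count as evidence.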
Length Cutoff
-lc, --length_cutoff INTEGER
Reads which fall short of the specified length will be filtered out.
Mean or Median Score
-me, --median / -mn, --mean
This determines whether the pipeline will use a mean or median (default) score as a cutoff for read filtering.
Score Cutoff
-sc, --score_cutoff INTEGER
Reads that have a median or mean quality score (depending on the score type specified) less than the score cutoff value will be filtered out.
Filter Ns
-n, --ns
If enabled, the pipeline will discard any reads that contain N characters.
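The length, score, and N filters described above can be combined into a single read check. The Python below is a minimal sketch that assumes per-base Phred scores are already decoded into integers; `keep_read` and its parameters are hypothetical names that mirror the options.

```python
from statistics import mean, median
from typing import List


def keep_read(seq: str, quals: List[int], length_cutoff: int,
              score_cutoff: float, use_median: bool = True,
              filter_ns: bool = False) -> bool:
    """Return True if a read passes the length, score, and N filters."""
    if not seq or len(seq) < length_cutoff:
        return False  # --length_cutoff
    score = median(quals) if use_median else mean(quals)
    if score < score_cutoff:
        return False  # --score_cutoff with --median / --mean
    if filter_ns and "N" in seq.upper():
        return False  # --ns
    return True


if __name__ == "__main__":
    quals = [10, 10, 40, 40, 40]  # median 40, mean 28
    print(keep_read("ACGTA", quals, 5, 30))                    # True
    print(keep_read("ACGTA", quals, 5, 30, use_median=False))  # False
```

The example shows why the choice between mean and median matters: a few very low scores pull the mean below the cutoff while the median stays above it.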
Error Rate
-e, --error_rate FLOAT
The estimated substitution sequencing error rate for the sequencing platform.
Minimum Variant Quality
-vq, --min_variant_qual INTEGER
Minimum quality for an amino acid variant to be included in the produced AAVF (Amino Acid Variant Format) file. This minimum will affect which amino acid variants are reported.
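To illustrate how an error rate can be turned into a quality score that is then compared with a minimum quality threshold, the Python below computes a Phred-scaled probability that sequencing errors alone would produce at least the observed number of alternate calls. This is an assumption for illustration: it uses a Poisson error model with mean `error_rate * depth`, which is not confirmed to be HyDRA's method, and `poisson_upper_tail`, `variant_quality`, and the threshold value are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term so small tails stay accurate."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # no errors expected, so any observation is impossible under the null
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    for i in range(k, k + 10_000):
        total += term
        term *= lam / (i + 1)  # now P(X = i + 1)
        if term <= total * 1e-17:
            break  # remaining terms are negligible
    return min(total, 1.0)


def variant_quality(alt_count: int, depth: int, error_rate: float,
                    max_qual: float = 100.0) -> float:
    """Phred-scaled chance that alt_count or more errors explain the variant."""
    lam = error_rate * depth  # expected number of erroneous base calls here
    p = poisson_upper_tail(alt_count, lam)
    if p <= 0.0:
        return max_qual  # probability underflowed: cap the quality
    return min(max_qual, -10.0 * math.log10(p))


if __name__ == "__main__":
    min_variant_qual = 30  # hypothetical threshold
    for alt_count in (1, 5, 10):
        q = variant_quality(alt_count, depth=100, error_rate=0.01)
        print(alt_count, round(q, 1), q >= min_variant_qual)
```

Under this model, a higher error rate raises the expected number of erroneous calls, so more supporting observations are needed before a variant clears the quality threshold.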
Minimum Depth
-md, --min_dp INTEGER
The minimum required read depth for observed nucleotide variants to be included for processing in the pipeline.
Minimum Allele Count
-ma, --min_ac INTEGER
The minimum required allele count for observed nucleotide variants to be included for processing in the pipeline.
Minimum Frequency
-mf, --min_freq FLOAT
The minimum required frequency for observed amino acid variants to be included for processing in the pipeline.
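The depth, allele count, and frequency thresholds can be sketched as two simple predicates, one for nucleotide variants (-md, -ma) and one for amino acid variants (-mf). The function names and example counts are hypothetical; the sketch only shows how each threshold compares against the observed values.

```python
def passes_nt_filters(depth: int, allele_count: int,
                      min_dp: int, min_ac: int) -> bool:
    """Nucleotide variant filter: enough coverage and enough supporting reads."""
    return depth >= min_dp and allele_count >= min_ac


def passes_aa_filter(aa_count: int, coverage: int, min_freq: float) -> bool:
    """Amino acid variant filter: observed frequency must reach min_freq."""
    if coverage <= 0:
        return False
    return aa_count / coverage >= min_freq


if __name__ == "__main__":
    # Hypothetical counts and thresholds, for illustration only.
    print(passes_nt_filters(depth=144, allele_count=3, min_dp=100, min_ac=5))  # False
    print(passes_aa_filter(aa_count=12, coverage=144, min_freq=0.01))          # True
```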
ID
-i, --id TEXT
This is used to specify a FASTA sequence identifier to be used in the consensus report output.
Output
The output directory location will default to the current directory and will be called data/. It will include the following output files:
- align.bam
- align.bam.bai
- consensus.fasta
- coverage_file.csv
- dr_report.csv
- filtered.fastq
- hydra.vcf
- mutation_report.aavf
- stats.txt
Example
Data
The following example data may be used to run the tool:
Command
quasitools hydra variant.fastq -o output
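When HyDRA is run from a script, the command line can be assembled programmatically. The Python below is a minimal sketch that covers only a few of the options documented above and places them before the read files, following the Basic Usage form. `hydra_command` is a hypothetical helper, and actually running the command requires quasitools to be installed.

```python
import shlex
from typing import List, Optional


def hydra_command(forward: str, reverse: Optional[str] = None,
                  output: str = "output", reporting_threshold: Optional[int] = None,
                  generate_consensus: bool = False,
                  consensus_pct: Optional[int] = None,
                  trim_reads: bool = False, quiet: bool = False) -> List[str]:
    """Assemble a `quasitools hydra` argument list from a few documented options."""
    if consensus_pct is not None and not generate_consensus:
        raise ValueError("-cp/--consensus_pct requires -gc/--generate_consensus")
    cmd = ["quasitools", "hydra", "-o", output]
    if reporting_threshold is not None:
        cmd += ["-rt", str(reporting_threshold)]
    if generate_consensus:
        cmd.append("-gc")
    if consensus_pct is not None:
        cmd += ["-cp", str(consensus_pct)]
    if trim_reads:
        cmd.append("-tr")
    if quiet:
        cmd.append("-q")
    cmd.append(forward)
    if reverse is not None:
        cmd.append(reverse)
    return cmd


if __name__ == "__main__":
    cmd = hydra_command("variant.fastq", output="output", generate_consensus=True)
    print(shlex.join(cmd))  # quasitools hydra -o output -gc variant.fastq
    # To execute it: subprocess.run(cmd, check=True)
```

Rejecting -cp without -gc up front mirrors the documented requirement that the consensus percentage is only meaningful when consensus generation is enabled.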
Output
Standard Output
# Performing quality control on reads...
# Mapping reads...
# Loading read mappings...
# Identifying variants...
# Masking filtered variants...
# Building amino acid census...
# Finding amino acid mutations...
# Writing drug resistant mutation report...
mutation_report.aavf
##reference=hxb2_pol.fas
##source=quasitools:hydra
##fileformat=AAVFv1.0
##fileDate=20190215
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM GENE POS REF ALT FILTER ALT_FREQ COVERAGE INFO
hxb2_pol PR 85 I S PASS 1.0000 144 SRVL=.;AC=aGt;CAT=.;ACF=1.0000;RC=att
hxb2_pol RT 91 Q H PASS 1.0000 131 SRVL=.;AC=caT;CAT=.;ACF=1.0000;RC=caa
hxb2_pol PR 40 G E PASS 1.0000 122 SRVL=.;AC=gAa;CAT=.;ACF=1.0000;RC=gga
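The data lines of mutation_report.aavf can be read back for downstream analysis. The Python below is a minimal parsing sketch that assumes the nine columns shown in the #CHROM header and no whitespace inside any field. AAVF, like VCF, is expected to be tab-delimited; splitting on any whitespace also handles the space-separated rendering above. `AavfRecord`, `parse_aavf_line`, and `read_aavf` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class AavfRecord(NamedTuple):
    chrom: str
    gene: str
    pos: int
    ref: str
    alt: str
    filter_status: str
    alt_freq: float
    coverage: int
    info: Dict[str, str]


def parse_aavf_line(line: str) -> AavfRecord:
    """Parse one AAVF data line into a record; INFO becomes a key/value dict."""
    chrom, gene, pos, ref, alt, filt, freq, cov, info = line.split()
    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")  # flags without '=' map to ''
        info_dict[key] = value
    return AavfRecord(chrom, gene, int(pos), ref, alt, filt,
                      float(freq), int(cov), info_dict)


def read_aavf(text: str) -> List[AavfRecord]:
    """Parse every non-header line of an AAVF document."""
    return [parse_aavf_line(line) for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


if __name__ == "__main__":
    line = "hxb2_pol PR 85 I S PASS 1.0000 144 SRVL=.;AC=aGt;CAT=.;ACF=1.0000;RC=att"
    rec = parse_aavf_line(line)
    print(rec.gene, rec.pos, rec.ref, rec.alt, rec.info["AC"])  # PR 85 I S aGt
```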