aavar

Call amino acid mutations from a BAM alignment file and a supplied reference file. Please refer to Data Formats for detailed information about the expected input formats for this tool.

Basic Usage

quasitools call aavar [options] <BAM file> <reference file> <bed file> [variants file] [mutation db]

Arguments

BAM File

A BAM file (.bam) of sequences aligned to a related reference. A BAM index file (.bai) is also required and should have the same name as the BAM file, with the ".bam" extension replaced by ".bai".
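
As a quick pre-flight check, the naming rule above can be sketched in Python. The helper names `expected_index_path` and `has_index` are hypothetical and are not part of quasitools; they only encode the rule as stated here.

```python
from pathlib import Path


def expected_index_path(bam_path):
    """Return the index path implied by the rule above: ".bam" -> ".bai"."""
    bam = Path(bam_path)
    if bam.suffix != ".bam":
        raise ValueError(f"expected a .bam file, got: {bam.name}")
    return bam.with_suffix(".bai")


def has_index(bam_path):
    """True if the expected index file exists next to the BAM file."""
    return expected_index_path(bam_path).is_file()
```

For example, `expected_index_path("variant.bam")` yields the path `variant.bai`.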

Reference File

A reference file related to the aligned sequences in the BAM file. The provided reference file must be the same reference file used when producing the BAM and BAM index files.

BED File

A BED file that specifies the coordinates of genes, with respect to the provided reference. This BED file must be a BED4+ file and therefore contain at least the first 4 BED file columns. The "names" of these genetic regions in the BED4 file must be the same names used in the "genetic regions" column of the mutation database. Please refer to Data Formats for more information.
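
The name-matching requirement can be checked before running the tool. The following is a minimal sketch, assuming whitespace-separated BED lines whose fourth column holds the region name; `bed4_names` and `unmatched_regions` are hypothetical helpers, and how the region names are read out of the mutation database is left to the caller because that layout is described in Data Formats rather than here.

```python
def bed4_names(bed_text):
    """Collect the name (4th) column from BED4+ text."""
    names = set()
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] in ("track", "browser"):
            continue
        if len(fields) < 4:
            raise ValueError(f"not a BED4+ line: {line!r}")
        names.add(fields[3])
    return names


def unmatched_regions(bed_text, db_regions):
    """Return database region names that have no matching BED4 name."""
    return sorted(set(db_regions) - bed4_names(bed_text))
```

An empty list from `unmatched_regions` means every region named in the database has a counterpart in the BED file.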

Variants File

A VCF (Variant Call Format) file specifying the variants identified in the BAM file, with respect to the passed reference file. Providing this file reduces the running time of the program. The variants file should be generated using the same BAM and reference files passed as parameters to the program. The VCF output of call ntvar may be used as input to this program.

Mutation Database

The mutation database describes specific mutations within the named genetic regions specified previously in the BED4 file. When provided to the tool, the surveillance and drug resistance category information are included in the output mutation annotation.

The entries in the "genetic regions" column of this database must match the "names" column of the provided BED4 file. Please refer to Data Formats for more information.

Options

Minimum Frequency

-f, --min_freq FLOAT

The minimum observed frequency for a variant to be reported. The default frequency is 0.01.

Error Rate

-e, --error_rate FLOAT

This is the expected substitution sequencing error rate. The default value is 0.0021 substitutions per sequenced base.

Output

-o, --output FILENAME

The file to which the identified amino acid mutations are written.
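
When the tool is driven from a script, the arguments and options above can be assembled into an argument list. This is a sketch only: `build_aavar_command` is a hypothetical helper, it uses the `quasitools call aavar` form shown in the examples, and it assumes the optional arguments are strictly positional, so a mutation database can only be given together with a variants file.

```python
def build_aavar_command(bam, reference, bed, variants=None, mutation_db=None,
                        min_freq=None, error_rate=None, output=None):
    """Build an argument list for the aavar command (assumed layout)."""
    cmd = ["quasitools", "call", "aavar"]
    # Options are only added when set, so the tool's defaults still apply.
    if min_freq is not None:
        cmd += ["--min_freq", str(min_freq)]
    if error_rate is not None:
        cmd += ["--error_rate", str(error_rate)]
    if output is not None:
        cmd += ["--output", str(output)]
    cmd += [str(bam), str(reference), str(bed)]
    if mutation_db is not None and variants is None:
        # Assumption: the database is the second optional positional argument.
        raise ValueError("a variants file is needed to pass a mutation db")
    if variants is not None:
        cmd.append(str(variants))
    if mutation_db is not None:
        cmd.append(str(mutation_db))
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the Python standard library.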

Output

The amino acid mutations that exceed the minimum frequency threshold will be output in AAVF format. By default, the results will be printed to standard output. The user may direct the output to a file by specifying a file name with the -o / --output option.
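
For downstream processing, the AAVF text can be read into simple records. The sketch below is not an official AAVF parser: `parse_aavf` is a hypothetical helper that assumes the column names shown in the example output in this document and whitespace-separated fields with no embedded spaces.

```python
def parse_aavf(text):
    """Parse AAVF text into a list of dicts keyed by the header columns."""
    columns = None
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # blank line or "##" metadata line
        if line.startswith("#"):
            # Column header line, e.g. "#CHROM GENE POS ..."
            columns = line.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("data line found before the column header")
        values = line.split()
        if len(values) != len(columns):
            raise ValueError(f"unexpected number of fields: {line!r}")
        record = dict(zip(columns, values))
        # Convert the numeric columns named in the example output.
        record["POS"] = int(record["POS"])
        record["ALT_FREQ"] = float(record["ALT_FREQ"])
        record["COVERAGE"] = int(record["COVERAGE"])
        records.append(record)
    return records
```

Each record can then be filtered or tabulated, for instance by keeping only those with `record["FILTER"] == "PASS"`.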

Examples

Data

The following example data may be used to run the tool:

Example: No Database

Command

quasitools call aavar variant.bam hiv.fasta hiv.bed

Output

##reference=hiv.fasta
##source=quasitools:aavar
##fileformat=AAVFv1.0
##fileDate=20190206
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM  GENE    POS REF ALT FILTER  ALT_FREQ    COVERAGE    INFO
AF033819.3  gag 339 P   Q   PASS    1.0000  133 SRVL=.;AC=cAa;CAT=.;ACF=1.0000;RC=cca
AF033819.3  env 67  N   T   PASS    1.0000  116 SRVL=.;AC=aCt;CAT=.;ACF=1.0000;RC=aat
AF033819.3  gag 441 Y   S   PASS    1.0000  109 SRVL=.;AC=tCc;CAT=.;ACF=1.0000;RC=tac
AF033819.3  pol 141 I   S   PASS    1.0000  145 SRVL=.;AC=aGt;CAT=.;ACF=1.0000;RC=att
AF033819.3  vpu 34  L   I   PASS    1.0000  118 SRVL=.;AC=Ata;CAT=.;ACF=1.0000;RC=tta
AF033819.3  env 302 N   Y   PASS    1.0000  138 SRVL=.;AC=Tat;CAT=.;ACF=1.0000;RC=aat
AF033819.3  gag 3   A   P   PASS    1.0000  140 SRVL=.;AC=Ccg;CAT=.;ACF=1.0000;RC=gcg
AF033819.3  pol 246 Q   H   PASS    1.0000  139 SRVL=.;AC=caT;CAT=.;ACF=1.0000;RC=caa
AF033819.3  gag 230 E   D   PASS    1.0000  126 SRVL=.;AC=gaT;CAT=.;ACF=1.0000;RC=gaa
AF033819.3  pol 96  G   E   PASS    1.0000  128 SRVL=.;AC=gAa;CAT=.;ACF=1.0000;RC=gga

Observe that there is no information under the INFO column of the output for SRVL (surveillance) and CAT (drug resistance category). This is because a mutation database was not provided.

Example: With Database

Command

quasitools call aavar variant.bam hiv.fasta hiv.bed variant.vcf hiv_db.tsv

Output

##reference=hiv.fasta
##source=quasitools:aavar
##fileformat=AAVFv1.0
##fileDate=20190206
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM  GENE    POS REF ALT FILTER  ALT_FREQ    COVERAGE    INFO
AF033819.3  gag 339 P   Q   PASS    1.0000  133 SRVL=.;AC=cAa;CAT=.;ACF=1.0000;RC=cca
AF033819.3  env 67  N   T   PASS    1.0000  116 SRVL=No;AC=aCt;CAT=minor;ACF=1.0000;RC=aat
AF033819.3  gag 441 Y   S   PASS    1.0000  109 SRVL=No;AC=tCc;CAT=minor;ACF=1.0000;RC=tac
AF033819.3  pol 141 I   S   PASS    1.0000  145 SRVL=.;AC=aGt;CAT=.;ACF=1.0000;RC=att
AF033819.3  vpu 34  L   I   PASS    1.0000  118 SRVL=Yes;AC=Ata;CAT=major;ACF=1.0000;RC=tta
AF033819.3  env 302 N   Y   PASS    1.0000  138 SRVL=.;AC=Tat;CAT=.;ACF=1.0000;RC=aat
AF033819.3  gag 3   A   P   PASS    1.0000  140 SRVL=Yes;AC=Ccg;CAT=major;ACF=1.0000;RC=gcg
AF033819.3  pol 246 Q   H   PASS    1.0000  139 SRVL=No;AC=caT;CAT=minor;ACF=1.0000;RC=caa
AF033819.3  gag 230 E   D   PASS    1.0000  126 SRVL=.;AC=gaT;CAT=.;ACF=1.0000;RC=gaa
AF033819.3  pol 96  G   E   PASS    1.0000  128 SRVL=Yes;AC=gAa;CAT=major;ACF=1.0000;RC=gga

Observe that, under the INFO column of the output, the SRVL and CAT information is included in the annotation for each mutation that was specified in the mutation database.
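
To pick out the annotated mutations programmatically, the INFO column can be split into key/value pairs. This is a minimal sketch, assuming the `KEY=value;KEY=value` layout and the "." placeholder seen in the output above; `parse_info` and `is_annotated` are hypothetical helpers, not part of quasitools.

```python
def parse_info(info):
    """Split an INFO string such as 'SRVL=No;AC=aCt;CAT=minor' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def is_annotated(info):
    """True if SRVL or CAT carries a value other than the '.' placeholder."""
    fields = parse_info(info)
    return fields.get("SRVL", ".") != "." or fields.get("CAT", ".") != "."
```

Applied to the rows above, `is_annotated` is true for the env 67 record and false for the gag 339 record.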