Signature Format
The signatures produced by Neptune are output in FASTA format with additional information in the description line. Signatures are output in the following format:
>[ID] [SCORE] [IN SCORE] [EX SCORE] [LENGTH] [REF] [POS]
[SEQUENCE]
The following is an example:
>425 score=0.86 in=0.98 ex=-0.13 len=31 ref=ecoli pos=160
TGTCATTCTCCTGTTCTGCCTGTATCACTGC
Where:
Item | Full Name | Description |
---|---|---|
[ID] | ID | An arbitrary, run-unique ID assigned to the signature. |
[SCORE] | Score | The total signature score. This is the sum of the inclusion (sensitivity) and exclusion (specificity) scores. |
[IN SCORE] | Inclusion Score | The positive inclusion component of signature score (sensitivity). |
[EX SCORE] | Exclusion Score | The negative exclusion component of signature score (specificity). |
[LENGTH] | Length | The signature length in bases. |
[REF] | Reference | The unique identifier of the contig from which the signature was extracted. |
[POS] | Position | The starting position of the signature in the reference. |
[SEQUENCE] | Sequence | The sequence content of the signature. |
ID
The signature ID is an arbitrary, run-unique ID assigned to the signature. The signatures within the same FASTA file will have unique IDs, relative to each other. However, signatures within multiple output files will have overlapping signature IDs. This will be the case when using multiple references or not specifying any reference files. The signatures within the consolidated.fasta
output will have unique signature IDs.
Total Score
Signatures are assigned a score corresponding to their highest-scoring BLAST alignments with all inclusion and exclusion targets, which is a sum of the positive inclusion score (sensitivity) and the negative exclusion component (specificity). This score is maximized when all inclusion targets contain a region exactly matching the entire signature and there exists no exclusion targets that match the signature.
Inclusion Score
The inclusion score is a non-negative number between 0.00 and 1.00 and relates to the signature's sensitivity. This score is determined by the signature's highest-scoring BLAST alignments with all inclusion targets. The inclusion score is maximized (good) when the signature is found exactly and completely in all inclusion targets and minimized (bad) when the signature is not found whatsoever in any inclusion targets.
Exclusion Score
The exclusion score is a non-positive number between -1.00 and 0.00 and relates to the signature's specificity. This score is determined by the signature's highest-scoring BLAST alignments with all exclusion targets. The exclusion score is maximized (bad) when the signature is found exactly and completely in all exclusion targets and minimized (good) when the signature is not found whatsoever in any exclusion targets.
Length
The length describes the length of the signature in bases. Although this can be calculated from the sequence, it is included in the FASTA description to accommodate other tools.
Reference
The reference describes the sequence identifier of the contig the signature was extracted from. This is useful for determining where the signature lies and what sequence surrounds it.
Position
The position describes the base position of the signature within the contig reference it was extracted from. This is useful for determining where the signature lies and what sequence surrounds it.
Sequence
The sequence describes the sequence content of the signature and follows the specifications of FASTA format. However, the sequence will not contain line breaks, regardless of the sequence length.