Output

Neptune's output directory contains the following items:

Item Type Description
candidates directory The directory containing signature candidates in extracted order.
filtered directory The directory containing filtered signature candidates in extracted order.
sorted directory The directory containing filtered signatures in signature-score sorted order.
consolidated directory The directory containing the consolidate signatures from multiple sorted-signature reference files.
database directory The directory containing Neptune's BLAST constructed databases.
aggregate.kmers file The k-mer file containing all observed k-mers.
receipt.txt file The file containing Neptune's run receipt.

A file with the same name as each reference will be placed in each output directory (candidates, filtered, sorted), corresponding to the reference file from which it was derived.

Candidates

The candidate signatures are the sequences produced from the signature extraction step. These signatures will relatively sensitive, but not necessarily specific. This is because signature extraction is done using exact k-mer matches. The candidate signatures are guaranteed to contain no more exact matches with any exclusion k-mer than specified by the --exhits parameter. However, there may be inexact matches with exclusion targets.

Filtered

The filtering step is designed to remove signatures which are not interesting enough to warrant further investigation, because the negative component of their score is prohibitively large. The filtering step removes signatures that align sufficiently with any exclusion target. The filtered signatures are a subset of the candidate signatures.

Sorted

The sorted signatures files are organized as FASTA records containing the same signatures as their filtered signatures counterparts. However, the signatures are listed in descending order by their signature score. Signatures are assigned a score corresponding to their highest-scoring BLAST alignments with all inclusion and exclusion targets, which is a sum of a positive inclusion component and a negative exclusion component. This score is maximized when all inclusion targets contain a region exactly matching the entire signature and there exists no exclusion targets that match the signature.

Consolidated

The sorted signatures from all references are combined into a single "consolidated.fasta" file, located within the "consolidated" directory. Signatures are added to the consolidated signatures file in a greedy manner by selecting the next highest scoring signature available from all references. While effort is taken to prevent signatures from overlapping entirely, it is possible for consolidate signatures to have a small amount of overlap. In many circumstances, this output might be considered the final output of Neptune.

Databases

The databases directory contains BLAST databases constructed from the inclusion and exclusion files.

Aggregate k-mers

The aggregated k-mers file, aggregated.kmers, contains a list of all k-mers observed in the inclusion and exclusion groups. These k-mers are sorted and followed by two integers: the number of inclusion and exclusion targets the k-mer appears in, respectively.

Run Receipt

The run receipt contains information about the Neptune execution. It contains a list of all the files in the inclusion and exclusion group, and the command line parameters used for the execution.