Using the SISTR Pipeline

This guide describes how to use the Salmonella in-silico Typing Resource (SISTR) pipeline within IRIDA. The pipeline identifies the serovar and cgMLST types of Salmonella whole genome sequencing (WGS) data by comparing the data against a large database (10,000+) of Salmonella genomes from NCBI.

Pipeline Overview

The SISTR pipeline implemented within IRIDA uses the following steps to translate WGS data into typing information:

  1. Paired-end reads are merged with FLASH.
  2. Merged and un-merged reads are assembled de novo using SPAdes.
  3. Low-coverage and small contigs are removed from the generated assembly.
  4. The assembled genome is passed to sistr_cmd, a command-line program for comparing genomes against the SISTR database.
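Step 3 can be illustrated with a minimal Python sketch. This is not the code IRIDA runs. It assumes the assembly is a FASTA file whose contig headers follow the SPAdes naming pattern (for example NODE_1_length_5000_cov_25.3), and the function names and the min_length and min_coverage thresholds are hypothetical values chosen for illustration only.

```python
import re
from typing import Iterable, Iterator, Tuple

# SPAdes contig headers look like "NODE_1_length_5000_cov_25.3";
# the number after "_cov_" is the k-mer coverage of the contig.
_COVERAGE = re.compile(r"_cov_(\d+(?:\.\d+)?)")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]],
    min_length: int = 1000,      # hypothetical threshold
    min_coverage: float = 5.0,   # hypothetical threshold
) -> Iterator[Tuple[str, str]]:
    """Keep contigs that are long enough and sufficiently covered."""
    for header, sequence in records:
        if len(sequence) < min_length:
            continue  # drop small contigs
        match = _COVERAGE.search(header)
        # Contigs without a parseable coverage value are judged on length alone.
        if match is not None and float(match.group(1)) < min_coverage:
            continue  # drop low-coverage contigs
        yield header, sequence
```

With a file on disk, the two functions could be chained as filter_contigs(parse_fasta(open("contigs.fasta"))), where the file name is again only an example.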

Running the Pipeline

The SISTR pipeline can be set up to run using two separate methods.

1. Automated Execution

The SISTR pipeline can be set to run automatically on upload of new sequencing data to particular projects.

Project Settings

Automated SISTR analysis can be enabled (or disabled) from the project Settings page after a project is created.

project-settings-sistr.png

If automated execution of SISTR has been enabled for a project, a SISTR analysis is scheduled whenever new sequencing data is uploaded to that project. The results are accessible from the project's Project > Analyses page.

sistr-typing-analysis-page.png

Clicking the Automated SISTR Typing link brings you to the appropriate analysis page for SISTR.

sistr-typing-status.png

2. Manual Execution

To execute SISTR manually, please refer to the IRIDA/SISTR Tutorial.

SISTR Results

A successful SISTR run should produce the following page as output. There are three possible Quality Control Statuses: Pass, Warning, and Fail.

sistr-results.png

The results are broken up into three different sections (SISTR Information, cgMLST330, and Mash).

To view the output files and/or download the outputs, click the Output Files tab.

sistr-results-outputs.png

Report

Interpretation of the produced output is as follows:

1. SISTR Information

Basic information on the sample and quality of the SISTR results.

Serovar Predictions:

The in silico serovar predictions generated from SISTR.

2. cgMLST330

The results of additional predictions made using the SISTR cgMLST330 schema.

3. Mash

The results of predictions made through comparisons using the software Mash. Generally, cgMLST results are preferred over Mash.
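The preference for cgMLST over Mash can be expressed as a simple fallback rule. The sketch below assumes the results for one sample have been loaded into a dictionary. The key names serovar_cgmlst and mash_serovar are assumptions made for illustration and should be checked against the actual output files before use.

```python
from typing import Mapping, Optional

# Keys are listed in order of preference: cgMLST first, Mash as a fallback.
# Both key names are assumptions, not confirmed field names.
_PREFERENCE_ORDER = ("serovar_cgmlst", "mash_serovar")


def preferred_serovar(result: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the cgMLST-based serovar if present, otherwise the Mash-based one."""
    for key in _PREFERENCE_ORDER:
        value = result.get(key)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None
```

A caller would pass in the parsed record for one sample. The function returns None when neither method produced a prediction, which leaves the decision about how to report that case to the caller.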

Output Files

In addition to the report, the SISTR pipeline produces the following files available for download.

More information on the interpretation of these files is available on the sistr_cmd page.