SORTMERNA

NAME

sortmerna − tool for filtering, mapping and OTU-picking NGS reads

SYNOPSIS

sortmerna −−ref db.fasta,db.idx −−reads file.fa −−aligned base_name_output [OPTIONS]

DESCRIPTION

SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads. The core algorithm is based on approximate seeds and allows for fast and sensitive analyses of nucleotide sequences. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. Additional applications include OTU-picking and taxonomy assignment, available through QIIME v1.9+ (http://qiime.org - v1.9.0-rc1).

SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or more rRNA database files, and sorts the reads into rRNA reads and rejected reads, written to two files specified by the user. Optionally, it can provide high-quality local alignments of the rRNA reads against the rRNA database. SortMeRNA works with Illumina, 454, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

OPTIONS

MANDATORY OPTIONS
−−ref STRING,STRING

FASTA reference file and its index file, separated by ’,’
Example:
−−ref /path/to/file1.fasta,/path/to/index1
To pass multiple reference files, separate the FASTA,index pairs by ’:’
Example:
−−ref /path/f1.fasta,/path/index1:/path/f2.fasta,/path/index2
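The −−ref value packs one or more FASTA/index pairs into a single string. The helper parse_ref below is hypothetical (it is not part of SortMeRNA); it is a minimal sketch of how such a value splits, with ’:’ separating pairs and ’,’ separating the FASTA path from the index path inside a pair.

```python
from typing import List, Tuple


def parse_ref(arg: str) -> List[Tuple[str, str]]:
    """Split a --ref value into (fasta_path, index_path) pairs.

    Hypothetical helper: ':' separates pairs, ',' separates the two paths.
    """
    pairs = []
    for item in arg.split(":"):
        fields = item.split(",")
        # Each pair must contain exactly two non-empty paths.
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"expected FASTA,INDEX but got {item!r}")
        pairs.append((fields[0], fields[1]))
    return pairs
```

For example, parse_ref("/path/f1.fasta,/path/index1:/path/f2.fasta,/path/index2") yields two pairs. Because ’:’ is the pair separator, the sketch assumes that no path itself contains ’:’.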

−−reads STRING

FASTA/FASTQ reads file

−−aligned STRING

file path and base name for the aligned reads (the appropriate extension is appended)

COMMON OPTIONS
−−other STRING

file path and base name for the rejected reads (the appropriate extension is appended)

−−fastx BOOL

output a FASTA/FASTQ file (default: off, for aligned and/or rejected reads)

−−sam BOOL

output SAM alignments (default: off, for aligned reads only)

−−SQ BOOL

add SQ tags to the SAM file (default: off)

−−blast INT

output alignments in one of several BLAST−like formats:
0 − pairwise
1 − tabular (BLAST −m 8 format)
2 − tabular + column for CIGAR
3 − tabular + columns for CIGAR and query coverage
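Tabular formats 1 to 3 start with the 12 standard BLAST −m 8 columns (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E−value, bit score). Following the format list, format 2 appends a CIGAR column and format 3 appends a query coverage column after it. The sketch below parses one such line; BlastHit and parse_blast_line are hypothetical names, and lines written for unaligned reads under −−print_all_reads may use placeholder values this sketch does not handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    # The 12 standard BLAST -m 8 columns, in order.
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float
    # Extra columns: CIGAR (--blast 2 and 3), query coverage (--blast 3).
    cigar: Optional[str] = None
    qcov: Optional[float] = None


def parse_blast_line(line: str) -> BlastHit:
    """Parse one tab-separated tabular alignment line (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected at least the 12 standard -m 8 columns")
    hit = BlastHit(
        cols[0], cols[1], float(cols[2]), int(cols[3]), int(cols[4]),
        int(cols[5]), int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), float(cols[10]), float(cols[11]),
    )
    if len(cols) > 12:
        hit.cigar = cols[12]
    if len(cols) > 13:
        hit.qcov = float(cols[13])
    return hit
```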

−−log BOOL

output overall statistics (default: off)

−−num_alignments INT

report the first INT alignments per read that reach the E−value threshold (default: -1; −−num_alignments 0 signifies that all alignments will be output)

or (default)
−−best INT

report the INT best alignments per read that reach the E−value threshold (default: 1), chosen by searching −−min_lis INT candidate alignments (−−best 0 signifies that all candidate alignments will be searched)

−−min_lis INT

search all alignments having the first INT longest LIS (default: 2). LIS stands for Longest Increasing Subsequence; it is computed from the seeds’ positions to expand hits into longer matches prior to Smith−Waterman alignment.
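To illustrate the LIS step, the sketch below assumes each seed hit is a (read position, reference position) pair on one reference sequence and strand. After sorting by read position, it finds the longest chain in which reference positions also strictly increase, using the classic O(n log n) patience-sorting method. This is a generic LIS implementation, not SortMeRNA’s own code, and longest_increasing_chain is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Tuple


def longest_increasing_chain(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Longest subsequence of (read_pos, ref_pos) hits with both coordinates
    strictly increasing (hypothetical helper)."""
    # Sort by read position; ties are ordered by descending ref position so
    # two hits at the same read position can never both join one chain.
    hits = sorted(hits, key=lambda h: (h[0], -h[1]))
    tails: List[int] = []     # tails[k]: smallest ref_pos ending a chain of length k+1
    tail_idx: List[int] = []  # index in `hits` of that chain's last element
    prev = [-1] * len(hits)   # back-pointers used to rebuild the chain
    for i, (_, ref_pos) in enumerate(hits):
        k = bisect_left(tails, ref_pos)  # first tail >= ref_pos (strict increase)
        if k > 0:
            prev[i] = tail_idx[k - 1]
        if k == len(tails):
            tails.append(ref_pos)
            tail_idx.append(i)
        else:
            tails[k] = ref_pos
            tail_idx[k] = i
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]
```

With hits [(0, 100), (10, 110), (5, 500), (20, 120), (30, 130), (25, 50)], the chain is [(0, 100), (10, 110), (20, 120), (30, 130)]: the two off-diagonal seeds are discarded, and every kept seed places the read start near reference position 100.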

−−print_all_reads

output null alignment strings for non−aligned reads to the SAM and/or BLAST tabular files (default: off)

−−paired_in BOOL

put both paired−end reads in the −−aligned FASTA/FASTQ file (default: off; interleaved reads only, see Section 4.2.4 of the User Manual)

−−paired_out BOOL

put both paired−end reads in the −−other FASTA/FASTQ file (default: off; interleaved reads only, see Section 4.2.4 of the User Manual)
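The two paired-end options matter when only one read of a pair aligns. The sketch below models the assumed routing: by default each mate goes to its own file (splitting the pair), −−paired_in keeps the pair together in the −−aligned file, and −−paired_out keeps it together in the −−other file. route_pair is a hypothetical name, and rejecting both flags at once is an assumption of this sketch; consult Section 4.2.4 of the User Manual for the exact rules.

```python
from typing import Tuple


def route_pair(r1_aligned: bool, r2_aligned: bool,
               paired_in: bool = False, paired_out: bool = False) -> Tuple[str, str]:
    """Return the destination ("aligned" or "other") of each mate.

    Hypothetical model of --paired_in / --paired_out behaviour.
    """
    if paired_in and paired_out:
        # Assumption: this sketch treats the two options as mutually exclusive.
        raise ValueError("use at most one of paired_in / paired_out")
    if r1_aligned == r2_aligned:
        # Both mates agree, so the options make no difference.
        dest = "aligned" if r1_aligned else "other"
        return dest, dest
    if paired_in:
        return "aligned", "aligned"
    if paired_out:
        return "other", "other"
    # Default: each mate follows its own result, so the pair is split.
    return ("aligned" if r1_aligned else "other",
            "aligned" if r2_aligned else "other")
```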

−−match INT

Smith−Waterman (SW) score (positive integer) for a match (default: 2)

−−mismatch INT

SW penalty (negative integer) for a mismatch (default: -3)

−−gap_open INT

SW penalty (positive integer) for introducing a gap (default: 5)

−−gap_ext INT

SW penalty (positive integer) for extending a gap (default: 2)

−N INT

SW penalty for ambiguous letters (N’s) (default: scored as −−mismatch)
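The scoring options above can be read as the parameters of a Smith−Waterman local alignment with affine gap penalties. The sketch below computes the best local score with Gotoh’s recurrences using the documented defaults. It assumes that a gap of length k costs gap_open + (k − 1) × gap_ext (some tools charge gap_open + k × gap_ext instead) and that any pair involving an N scores n_score, which defaults to the mismatch score as −N does. sw_score is a hypothetical name; SortMeRNA’s own aligner may differ in detail.

```python
from typing import Optional

NEG_INF = float("-inf")


def sw_score(query: str, ref: str, match: int = 2, mismatch: int = -3,
             gap_open: int = 5, gap_ext: int = 2,
             n_score: Optional[int] = None) -> int:
    """Best local alignment score (Smith-Waterman, affine gaps, Gotoh).

    Assumed gap model: a gap of length k costs gap_open + (k - 1) * gap_ext.
    """
    if n_score is None:
        n_score = mismatch  # like -N: ambiguous letters scored as a mismatch
    q, r = query.upper(), ref.upper()
    cols = len(r) + 1
    h_prev = [0] * cols        # H, previous row: best alignment ending at (i-1, j)
    f_prev = [NEG_INF] * cols  # F, previous row: alignments ending in a gap in ref
    best = 0
    for a in q:
        h_cur = [0] * cols
        f_cur = [NEG_INF] * cols
        e = NEG_INF            # E: alignments ending in a gap in the query
        for j in range(1, cols):
            b = r[j - 1]
            if a == "N" or b == "N":
                s = n_score
            else:
                s = match if a == b else mismatch
            e = max(h_cur[j - 1] - gap_open, e - gap_ext)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_ext)
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Under these defaults, aligning "A"×8 + "C"×8 against "A"×8 + "GG" + "C"×8 scores 16 × 2 − (5 + 2) = 25: one two-base gap is cheaper than two mismatches plus the matches they would cost.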

−F BOOL

search only the forward strand (default: off)

−R BOOL

search only the reverse−complementary strand (default: off)

−a INT

number of threads to use (default: 1)

−e DOUBLE

E−value threshold (default: 1)
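
One way to read the E−value threshold is through standard Karlin-Altschul statistics, E = K·m·n·e^(−λS), where S is the raw alignment score, m the read length and n the database length. The sketch assumes SortMeRNA follows this model; the λ and K values below are placeholders, not the statistics SortMeRNA computes for its scoring scheme, and evalue and min_score are hypothetical names.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, K: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= score."""
    return K * m * n * math.exp(-lam * score)


def min_score(e_threshold: float, m: int, n: int, lam: float, K: float) -> float:
    """Smallest raw score whose E-value is <= e_threshold."""
    # Solve K*m*n*exp(-lam*S) = E for S.
    return math.log(K * m * n / e_threshold) / lam


# Placeholder statistics (NOT SortMeRNA's real lambda/K),
# a 150 nt read and a 10 Mnt reference database.
LAM, K = 0.6, 0.1
s_default = min_score(1.0, 150, 10_000_000, LAM, K)
s_strict = min_score(1e-5, 150, 10_000_000, LAM, K)
```

With these placeholder values the minimum raw score is roughly 31 for −e 1 and roughly 51 for −e 1e−5: a stricter threshold, or a larger database, raises the score an alignment needs.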

−m INT

memory, in Mbytes, used for loading the reads (default: 1024; maximum: 5872)

−v BOOL

verbose (default: off)

OTU PICKING OPTIONS
−−id DOUBLE

%id similarity threshold (the alignment must still pass the E−value threshold, default: 0.97)

−−coverage DOUBLE

%query coverage threshold (the alignment must still pass the E−value threshold, default: 0.97)

−−de_novo_otu BOOL

output a FASTA/FASTQ file of reads matching the database with < %id (set using −−id) and < %cov (set using −−coverage); the alignment must still pass the E−value threshold (default: off)

−−otu_map BOOL

output OTU map (input to QIIME’s make_otu_table.py, default: off)
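The sketch below illustrates how the −−id and −−coverage thresholds might partition reads. It assumes %id is the number of matching positions divided by the alignment length and %cov is the number of aligned query bases divided by the read length. It also assumes a read is assigned to an OTU only when it meets both thresholds, with every other read that passes the E−value going to the de novo file; this reading is an assumption, since the −−de_novo_otu wording could also be taken as requiring both thresholds to be missed. classify_read is a hypothetical name.

```python
def classify_read(matches: int, aln_len: int, aligned_query_len: int,
                  read_len: int, passes_evalue: bool,
                  min_id: float = 0.97, min_cov: float = 0.97) -> str:
    """Return "unaligned", "otu" or "de_novo" (hypothetical helper)."""
    if not passes_evalue:
        return "unaligned"  # never counted toward OTU picking
    if aln_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    pct_id = matches / aln_len         # assumed definition of %id
    cov = aligned_query_len / read_len  # assumed definition of %cov
    if pct_id >= min_id and cov >= min_cov:
        return "otu"
    return "de_novo"
```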

ADVANCED OPTIONS
See the SortMeRNA user manual for more details.
−−passes INT,INT,INT

three intervals at which to place the seed on the read (L is the seed length set in indexdb_rna(1), default: L,L/2,3)
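To make the three intervals concrete, the sketch below lists the start positions of seeds of length L placed every `step` nucleotides along a read, for the default passes L, L/2 and 3. It assumes that seeds must fit entirely inside the read and that later passes are only tried when earlier ones find too few seed matches. L = 18 is just an example value, and seed_starts and default_passes are hypothetical names.

```python
from typing import List


def seed_starts(read_len: int, seed_len: int, step: int) -> List[int]:
    """Start positions of seeds of length seed_len placed every `step` nt."""
    if step <= 0:
        raise ValueError("step must be positive")
    # A seed starting at p covers p .. p + seed_len - 1, so it must fit in the read.
    return list(range(0, read_len - seed_len + 1, step))


def default_passes(seed_len: int) -> List[int]:
    """Default --passes intervals: L, L/2 and 3."""
    return [seed_len, seed_len // 2, 3]
```

For a 100 nt read and L = 18, the three passes place 5, 10 and 28 seeds respectively, so each later pass trades speed for a denser, more sensitive search.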

−−edges INT

number (or percentage, if INT is followed by a % sign) of nucleotides to add to each edge of the read prior to SW local alignment (default: 4)
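A minimal sketch of what −−edges does to the reference region handed to SW alignment, assuming the read’s predicted start on the reference is already known (for example from the LIS chain) and the window is clipped to the reference bounds. Rounding a percentage down to whole nucleotides is an assumption, as are the names edge_nts and ref_window.

```python
def edge_nts(edges: str, read_len: int) -> int:
    """Translate an --edges value such as "4" or "10%" into a nucleotide count."""
    if edges.endswith("%"):
        # Assumption: percentages are taken of the read length, rounded down.
        return int(read_len * float(edges[:-1]) / 100)
    return int(edges)


def ref_window(ref_start: int, read_len: int, ref_len: int, edges: str = "4"):
    """Half-open [lo, hi) reference window: the read span plus padding on each edge."""
    pad = edge_nts(edges, read_len)
    lo = max(0, ref_start - pad)
    hi = min(ref_len, ref_start + read_len + pad)
    return lo, hi
```

For a 150 nt read predicted to start at position 100 of a 1500 nt reference, the default gives the window (96, 254), while "10%" pads by 15 nt on each side and gives (85, 265). The extra margin lets the local alignment absorb small indels near the read ends.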

−−num_seeds INT

number of seeds matched before searching for candidate LIS (default: 2)

−−full_search BOOL

search for all 0−error and 1−error seed matches in the index rather than stopping after finding a 0−error match (<1% gain in sensitivity with up to a four−fold decrease in speed, default: off)

−−pid BOOL

add the process ID (pid) to output file names (default: off)

−h BOOL

help

−−version BOOL

SortMeRNA version number
