hisat2-align-s − graph-based alignment of short nucleotide reads to many genomes, wrapper script
HISAT2 version 2.0.5 by Daehwan Kim (infphilo AT gmail DOT com, www.ccb.jhu.edu/people/infphilo)
Usage:
hisat2 [options]* −x <ht2−idx> {−1 <m1> −2 <m2> | −U <r>} [−S <sam>]
<ht2−idx>
Index filename prefix (minus trailing .X.ht2).
<m1>
Files with #1 mates, paired with files in <m2>. Could be gzip’ed (extension: .gz) or bzip2’ed (extension: .bz2).
<m2>
Files with #2 mates, paired with files in <m1>. Could be gzip’ed (extension: .gz) or bzip2’ed (extension: .bz2).
<r>
Files with unpaired reads. Could be gzip’ed (extension: .gz) or bzip2’ed (extension: .bz2).
<sam>
File for SAM output (default: stdout)
<m1>, <m2>, <r> can be comma−separated lists (no whitespace) and can be specified many times. E.g. ’−U file1.fq,file2.fq −U file3.fq’.
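The usage line above can also be assembled from a script. The following is a minimal Python sketch, not part of HISAT2: the helper name build_hisat2_cmd, the index prefix and the file names are all hypothetical. It uses only the options shown in the synopsis and joins several read files with commas and no whitespace, as described above. The code uses the ASCII hyphen that a shell expects.

```python
from typing import List, Optional, Sequence


def build_hisat2_cmd(index_prefix: str,
                     mates1: Sequence[str] = (),
                     mates2: Sequence[str] = (),
                     unpaired: Sequence[str] = (),
                     sam_out: Optional[str] = None) -> List[str]:
    """Assemble an argv list that follows the usage synopsis (hypothetical helper)."""
    if len(mates1) != len(mates2):
        raise ValueError("each #1 mate file needs a matching #2 mate file")
    if not mates1 and not unpaired:
        raise ValueError("need paired files (-1/-2) or unpaired files (-U)")
    cmd = ["hisat2", "-x", index_prefix]
    if mates1:
        # Several files become one comma-separated argument without whitespace.
        cmd += ["-1", ",".join(mates1), "-2", ",".join(mates2)]
    if unpaired:
        cmd += ["-U", ",".join(unpaired)]
    if sam_out is not None:
        # Without -S the SAM output goes to stdout.
        cmd += ["-S", sam_out]
    return cmd


cmd = build_hisat2_cmd("genome_idx", unpaired=["file1.fq", "file2.fq"], sam_out="out.sam")
print(" ".join(cmd))  # hisat2 -x genome_idx -U file1.fq,file2.fq -S out.sam
```

On a machine where hisat2 is installed and the index exists, the returned list could be handed to subprocess.run(cmd, check=True). Keeping the arguments as a list avoids shell quoting problems with unusual file names.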
Options (defaults in parentheses):
Input:
−q
query input files are FASTQ .fq/.fastq (default)
−−qseq
query input files are in Illumina’s qseq format
−f
query input files are (multi−)FASTA .fa/.mfa
−r
query input files are raw one−sequence−per−line
−c
<m1>, <m2>, <r> are sequences themselves, not files
−s/−−skip <int>
skip the first <int> reads/pairs in the input (none)
−u/−−upto <int>
stop after first <int> reads/pairs (no limit)
−5/−−trim5 <int>
trim <int> bases from 5’/left end of reads (0)
−3/−−trim3 <int>
trim <int> bases from 3’/right end of reads (0)
−−phred33
qualities are Phred+33 (default)
−−phred64
qualities are Phred+64
−−int−quals
qualities encoded as space−delimited integers
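The three quality encodings named above differ only in how a score is written in the read file. As an illustration (a sketch, not HISAT2 code; the function names decode_qualities and parse_int_quals are hypothetical), Phred+33 and Phred+64 strings can be decoded by subtracting the offset from each character code, while the integer form is simply split on whitespace:

```python
from typing import List


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Turn an ASCII quality string into Phred scores (offset 33 or 64)."""
    scores = [ord(ch) - offset for ch in qual]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: wrong offset for this quality string?")
    return scores


def parse_int_quals(qual: str) -> List[int]:
    """Parse qualities written as space-delimited integers."""
    return [int(token) for token in qual.split()]


print(decode_qualities("II5!"))        # [40, 40, 20, 0]
print(decode_qualities("hhT@", 64))    # [40, 40, 20, 0]
print(parse_int_quals("40 40 20 0"))   # [40, 40, 20, 0]
```

Decoding a Phred+33 string with offset 64 usually produces negative values, which is why the sketch raises an error in that case instead of returning them.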
Alignment:
−−n−ceil <func>
func for max # non−A/C/G/Ts permitted in aln (L,0,0.15)
−−ignore−quals
treat all quality values as 30 on Phred scale (off)
−−nofw
do not align forward (original) version of read (off)
−−norc
do not align reverse−complement version of read (off)
Spliced Alignment:
−−pen−cansplice <int>
penalty for a canonical splice site (0)
−−pen−noncansplice <int>
penalty for a non−canonical splice site (12)
−−pen−canintronlen <func>
penalty for long introns (G,−8,1) with canonical splice sites
−−pen−noncanintronlen <func>
penalty for long introns (G,−8,1) with noncanonical splice sites
−−min−intronlen <int>
minimum intron length (20)
−−max−intronlen <int>
maximum intron length (500000)
−−known−splicesite−infile <path>
provide a list of known splice sites
−−novel−splicesite−outfile <path>
report a list of splice sites
−−novel−splicesite−infile <path>
provide a list of novel splice sites
−−no−temp−splicesite
disable the use of splice sites found
−−no−spliced−alignment
disable spliced alignment
−−rna−strandness <string>
Specify strand−specific information (unstranded)
−−tmo
Reports only those alignments within known transcriptome
−−dta
Reports alignments tailored for transcript assemblers
−−dta−cufflinks
Reports alignments tailored specifically for cufflinks
Scoring:
−−ma <int>
match bonus (0 for −−end−to−end, 2 for −−local)
−−mp <int>,<int>
max and min penalties for mismatch; lower qual = lower penalty (6,2)
−−sp <int>,<int>
max and min penalties for soft−clipping; lower qual = lower penalty (2,1)
−−no−softclip
no soft−clipping
−−np <int>
penalty for non−A/C/G/Ts in read/ref (1)
−−rdg <int>,<int>
read gap open, extend penalties (5,3)
−−rfg <int>,<int>
reference gap open, extend penalties (5,3)
−−score−min <func>
min acceptable alignment score w/r/t read length (L,0.0,−0.2)
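Several options above take a <func> argument such as L,0.0,−0.2 or G,−8,1. The sketch below assumes the Bowtie 2 style convention for such arguments: a type letter (C constant, L linear, S square root, G natural log), then a constant term B and a coefficient A, giving f(x) = B + A * g(x), where x is typically the read length. That convention is an assumption here, since this page does not spell it out, and the name eval_func is hypothetical. How HISAT2 rounds or clamps the result is not covered.

```python
import math

# g(x) for each function type letter (assumed convention, see lead-in).
_SHAPES = {
    "C": lambda x: 0.0,   # constant: only the constant term B counts
    "L": lambda x: x,     # linear
    "S": math.sqrt,       # square root
    "G": math.log,        # natural log
}


def eval_func(spec: str, x: float) -> float:
    """Evaluate a 'TYPE,B,A' spec at x as B + A * g(x)."""
    # Accept the typographic minus (U+2212) used when copying from this page.
    parts = spec.replace("\u2212", "-").split(",")
    kind = parts[0].strip().upper()
    if kind not in _SHAPES:
        raise ValueError("unknown function type: " + kind)
    const = float(parts[1])
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    return const + coeff * _SHAPES[kind](x)


read_len = 100
print(eval_func("L,0.0,-0.2", read_len))   # default of --score-min at 100 bases
print(eval_func("L,0,0.15", read_len))     # default of --n-ceil at 100 bases
```

Under this reading, the default −−score−min of L,0.0,−0.2 gives a minimum score of 0.0 + (−0.2 × 100) = −20 for a 100-base read, so longer reads are allowed proportionally more penalty.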
Reporting:
−k <int>
report up to <int> alns per read (default: 5)
Paired−end:
−I/−−minins <int>
minimum fragment length (0), only valid with −−no−spliced−alignment
−X/−−maxins <int>
maximum fragment length (500), only valid with −−no−spliced−alignment
−−fr/−−rf/−−ff
−1, −2 mates align fw/rev, rev/fw, fw/fw (−−fr)
−−no−mixed
suppress unpaired alignments for paired reads
−−no−discordant
suppress discordant alignments for paired reads
Output:
−t/−−time
print wall−clock time taken by search phases
−−un <path>
write unpaired reads that didn’t align to <path>
−−al <path>
write unpaired reads that aligned at least once to <path>
−−un−conc <path>
write pairs that didn’t align concordantly to <path>
−−al−conc <path>
write pairs that aligned concordantly at least once to <path>
(Note: for −−un, −−al, −−un−conc, or −−al−conc, add ’−gz’ to the option name, e.g. −−un−gz <path>, to gzip compress output, or add ’−bz2’ to bzip2 compress output.)
−−quiet
print nothing to stderr except serious errors
−−met−file <path>
send metrics to file at <path> (off)
−−met−stderr
send metrics to stderr (off)
−−met <int>
report internal counters & metrics every <int> secs (1)
−−no−head
suppress header lines, i.e. lines starting with @
−−no−sq
suppress @SQ header lines
−−rg−id <text>
set read group id, reflected in @RG line and RG:Z: opt field
−−rg <text>
add <text> ("lab:value") to @RG line of SAM header. Note: @RG line only printed when −−rg−id is set.
−−omit−sec−seq
put ’*’ in SEQ and QUAL fields for secondary alignments.
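The output options above shape the SAM stream that a downstream script has to read. The following minimal Python sketch is not HISAT2 code: the name iter_alignments is hypothetical and the sample lines are hand-written for illustration, not real HISAT2 output. It skips header lines (those starting with @) and shows what a secondary alignment looks like when SEQ and QUAL are replaced by ’*’. It relies on the SAM convention that FLAG bit 0x100 marks a secondary alignment and that SEQ and QUAL are the 10th and 11th tab-separated fields.

```python
from typing import Dict, Iterable, Iterator

SECONDARY = 0x100  # SAM FLAG bit for a secondary alignment


def iter_alignments(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield selected fields from SAM alignment lines, skipping @ header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        yield {
            "qname": fields[0],
            "flag": int(fields[1]),
            "rname": fields[2],
            "seq": fields[9],
            "qual": fields[10],
        }


# Hand-written sample: two header lines, one primary and one secondary record.
sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t101\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "read1\t256\tchr1\t501\t1\t4M\t*\t0\t0\t*\t*\tNH:i:2",
]

records = list(iter_alignments(sample))
for rec in records:
    kind = "secondary" if rec["flag"] & SECONDARY else "primary"
    print(rec["qname"], kind, rec["seq"], rec["qual"])
```

A consumer that needs the bases of a secondary alignment has to look them up from the primary record of the same read when ’*’ is present.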
Performance:
−o/−−offrate <int>
override offrate of index; must be >= index’s offrate
−p/−−threads <int>
number of alignment threads to launch (1)
−−reorder
force SAM output order to match order of input reads
−−mm
use memory−mapped I/O for index; many ’hisat2’s can share
Other:
−−qc−filter
filter out reads that are bad according to QSEQ filter
−−seed <int>
seed for random number generator (0)
−−non−deterministic
seed rand. gen. arbitrarily instead of using read attributes
−−remove−chrname
remove ’chr’ from reference names in alignment
−−add−chrname
add ’chr’ to reference names in alignment
−−version
print version information and quit
−h/−−help
print this usage message
64−bit
Built on Debian unstable
Fri Dec 9 12:45:26 UTC 2016
Compiler: gcc version 6.2.1 20161202 (Ubuntu 6.2.1−5ubuntu2)
Options: −O3 −funroll−loops −g3 −Wdate−time −D_FORTIFY_SOURCE=2 −DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}