fastahack - indexing and sequence extraction from FASTA files
fastahack [options] <fasta reference>
fastahack is a small application for indexing and extracting sequences and subsequences from FASTA files. The included Fasta.cpp library provides a FASTA reader and indexer that can be embedded into applications which would benefit from directly reading subsequences from FASTA files. The library automatically handles index file generation and use.
Features:
FASTA index (.fai) generation for FASTA files
Sequence extraction
Subsequence extraction
Sequence statistics (currently only entropy is provided)
Sequence and subsequence extraction use fseek64 to provide fastest-possible extraction without RAM-intensive file loading operations. This makes fastahack a useful tool for bioinformaticists who need to quickly extract many subsequences from a reference FASTA sequence.
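The seek-based extraction described above can be illustrated with a minimal Python sketch. It is not fastahack's implementation: it assumes the common .fai convention of recording, for each sequence, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line, and it assumes that all lines of a sequence except the last have the same width. The function names build_index and fetch are hypothetical.

```python
import os
import tempfile


def build_index(fasta_path):
    """Return {name: (length, offset, line_bases, line_width)} for a FASTA file.

    Assumes every sequence line except the last has the same width.
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the header text up to the first whitespace.
                name = line[1:].split()[0].decode()
                # fh.tell() now points at the first base of this sequence.
                index[name] = [0, fh.tell(), 0, 0]
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                entry = index[name]
                if entry[2] == 0:
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) with a single seek."""
    length, offset, line_bases, line_width = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError("region out of bounds")
    first, last = start - 1, end - 1
    # Whole lines before the target contribute line_width bytes each
    # (bases plus newline); the remainder is the column within the line.
    begin = offset + (first // line_bases) * line_width + first % line_bases
    stop = offset + (last // line_bases) * line_width + last % line_bases + 1
    with open(fasta_path, "rb") as fh:
        fh.seek(begin)
        raw = fh.read(stop - begin)
    # Drop the newlines that fall inside the byte range.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">chr1 demo\nACGTACGTAC\nGTACGTACGT\nAC\n>chr2\nTTTTGGGG\n")
    try:
        idx = build_index(tmp.name)
        print(fetch(tmp.name, idx, "chr1", 9, 12))
    finally:
        os.remove(tmp.name)
```

Only the bytes covering the requested range are read, so the cost of a lookup does not depend on the size of the reference file.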
-i, --index
generate FASTA index <fasta reference>.fai
-r, --region REGION
print the specified region
-c, --stdin
read a stream of line-delimited region specifiers on stdin and print the corresponding sequence for each on stdout
-e, --entropy
print the Shannon entropy of the specified region
-d, --dump
print the FASTA file in the form 'seq_name <tab> sequence'
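As an illustration of the statistic reported by the entropy option, the following Python sketch computes the Shannon entropy of a sequence string. It assumes a base-2 logarithm over single-character frequencies, with upper and lower case counted as distinct symbols; the exact calculation and units used by fastahack may differ. The function name shannon_entropy is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Return the Shannon entropy of a string in bits per symbol."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # H = -sum(p * log2(p)) over the observed symbol frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under these assumptions a homopolymer such as AAAA has zero entropy, while a region with all four bases equally represented reaches the maximum of 2 bits per base.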
REGION is of the form
<seq>, <seq>:<start>[sep]<end>, <seq1>:<start>[sep]<seq2>:<end>
where start and end are 1-based, and the region includes the end position. [sep] is "-" or ".."
Specifying a sequence name alone will return the entire sequence, specifying a range will return that range, and specifying a single coordinate, e.g. <seq>:<start>, will return just that base.
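The single-sequence forms of REGION can be parsed along the following lines. This is a hedged Python sketch, not the parser used by fastahack: the function name parse_region is hypothetical, the two-sequence form <seq1>:<start>[sep]<seq2>:<end> is not handled, and the text after the last colon is assumed to hold the coordinates.

```python
import re

# One coordinate, optionally followed by "-" or ".." and a second coordinate.
_COORDS = re.compile(r"^(\d+)(?:(?:-|\.\.)(\d+))?$")


def parse_region(region):
    """Split a region specifier into (name, start, end).

    start and end are 1-based and inclusive; both are None when only a
    sequence name is given, meaning the entire sequence.
    """
    name, colon, coords = region.rpartition(":")
    if not colon:
        return region, None, None
    match = _COORDS.match(coords)
    if match is None or not name:
        raise ValueError("malformed region: %r" % region)
    start = int(match.group(1))
    # A single coordinate selects exactly one base.
    end = int(match.group(2)) if match.group(2) else start
    if start < 1 or end < start:
        raise ValueError("invalid coordinates in region: %r" % region)
    return name, start, end
```

Because the coordinates are 1-based and inclusive, a caller using 0-based half-open slicing would take sequence[start - 1:end].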
This software was written by Erik Garrison <erik DOT garrison AT bc DOT edu>.
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.