MIRABAIT

NAME

mirabait − a grep−like tool to select reads with k−mers up to 256 bp

SYNOPSIS

mirabait [options] {−b baitfile [−b ...] | −B file | −j joblibrary} {−p file_1 file_2 | −P file3}* [file4 ...]

DESCRIPTION

mirabait selects reads from a read collection that are partly similar or equal to sequences defined as target baits. Similarity is defined by finding a user−adjustable number of common k−mers (sequences of k consecutive bases) shared between the bait sequences and the screened sequences, either in forward direction only or in both forward and reverse complement direction. A DUST−like filter for repeats of up to 4 bases can optionally be added.

When used on paired files, mirabait selects sequences where at least one mate matches.
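The k−mer comparison described above can be sketched in Python. This is a minimal illustration and not mirabait's actual implementation: the function names (revcomp, kmers, count_bait_hits) are hypothetical, and the DUST−like filter and the handling of ambiguous bases are left out.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_bait_hits(read, bait, k, check_reverse=True):
    """Count k-mer positions in read whose k-mer also occurs in bait.

    With check_reverse=True the reverse complement of the bait is
    searched as well; check_reverse=False mimics the idea behind -r.
    """
    bait_kmers = kmers(bait, k)
    if check_reverse:
        bait_kmers |= kmers(revcomp(bait), k)
    read = read.upper()
    return sum(
        1 for i in range(len(read) - k + 1) if read[i:i + k] in bait_kmers
    )


bait = "ACGTTGCAAGGCTA"
forward_read = "TTTTGCAAGGCTTT"        # shares several 5-mers with the bait
reverse_read = revcomp(forward_read)   # the same read seen from the other strand

print(count_bait_hits(forward_read, bait, k=5))
print(count_bait_hits(reverse_read, bait, k=5))
print(count_bait_hits(reverse_read, bait, k=5, check_reverse=False))
```

A read sequenced from the opposite strand yields the same hit count as its forward version only while the reverse complement check is active, which is why that check is on unless −r is given.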

OPTIONS

Main options:
−b file

Load bait sequences from file (multiple −b allowed)

−B file

Load baits from a kmer statistics file, not from sequence files. Only one −B allowed, cannot be combined with −b. (See −K for creating such a file.)

−j job

Set options for a predefined job from the supplied MIRA library. Currently available jobs:

rrna Bait rRNA sequences

−p file1 file2

Load paired sequences to search from file1 and file2. Files must contain the same number of sequences, and sequence names must be in the same order. Multiple −p allowed, but must come before non−paired files.

−P file

Load paired sequences from file. File must be interleaved: pairs must follow each other, non−pairs are not allowed. Multiple −P allowed, but must come before non−paired files.
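The two paired input layouts can be pictured as follows. This is only a sketch of the constraints stated above: records are represented as plain strings, the function names are hypothetical, and the check that sequence names agree between the two files is omitted because this page does not describe the naming scheme.

```python
def pairs_from_two_files(first, second):
    """Pair records from two separate files by position (-p style input)."""
    if len(first) != len(second):
        raise ValueError("paired files must contain the same number of sequences")
    return list(zip(first, second))


def pairs_from_interleaved(records):
    """Group an interleaved record list into pairs (-P style input)."""
    if len(records) % 2 != 0:
        raise ValueError("interleaved input must contain complete pairs only")
    return [(records[i], records[i + 1]) for i in range(0, len(records), 2)]


print(pairs_from_two_files(["r1_fwd", "r2_fwd"], ["r1_rev", "r2_rev"]))
print(pairs_from_interleaved(["r1_fwd", "r1_rev", "r2_fwd", "r2_rev"]))
```

Both calls produce the same list of pairs, which illustrates that the two options differ only in how the mates are laid out on disk.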

−k int

kmer length of bait in bases (<=256, default=31)

−n int

If >0: minimum number of k−mer baits needed (default=1)
If <=0: allowed number of missed kmers over sequence length
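One possible reading of the two −n modes is sketched below. It assumes that "missed kmers over sequence length" means the number of k−mer positions in the screened sequence that have no bait hit; this interpretation, the function name is_selected, and the treatment of sequences shorter than k are assumptions and are not confirmed by this page.

```python
def is_selected(hits, seq_len, k, n=1):
    """Decide whether a sequence counts as a bait match.

    hits    -- number of k-mer positions in the sequence that hit a bait
    seq_len -- length of the screened sequence in bases
    k       -- k-mer length
    n       -- value given to -n (assumed semantics, see text above)
    """
    total_kmers = seq_len - k + 1
    if total_kmers <= 0:
        return False          # assumption: too short to contain any k-mer
    if n > 0:
        return hits >= n      # at least n k-mers must hit
    missed = total_kmers - hits
    return missed <= -n       # at most |n| k-mers may miss


# A 100 bp read screened with k=31 contains 70 k-mer positions.
print(is_selected(hits=1, seq_len=100, k=31))          # default -n 1
print(is_selected(hits=9, seq_len=100, k=31, n=10))    # needs 10 hits
print(is_selected(hits=68, seq_len=100, k=31, n=-2))   # 2 misses tolerated
```

Under this reading, −n 0 would require every k−mer of the sequence to be found in the baits.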

−d

Do not use kmers with microrepeats (DUST−like, see also −D)

−D int

Set length of microrepeats in kmers to discard from bait.

− int > 0 microrepeat len in percentage of kmer length. E.g.: −k 17 −D 67 −−> 11.39 bases −−> 12 bases.
− int < 0 microrepeat len in bases.
− int != 0 implies −d, int=0 turns DUST filter off.
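The arithmetic behind −D can be written out as a small Python sketch. Rounding up is inferred from the single example above (11.39 bases becoming 12 bases); the function name microrepeat_length is hypothetical.

```python
import math


def microrepeat_length(k, d):
    """Translate a -D value into a microrepeat length in bases.

    d > 0: percentage of the k-mer length, rounded up (assumed from the
           example "-k 17 -D 67" giving 12 bases)
    d < 0: length given directly in bases
    d == 0: DUST filter off, represented here as None
    """
    if d > 0:
        return math.ceil(k * d / 100)
    if d < 0:
        return -d
    return None


print(microrepeat_length(17, 67))   # percentage form
print(microrepeat_length(31, -4))   # absolute form, 4 bases
print(microrepeat_length(31, 0))    # filter off
```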

−i

Selects sequences that do not hit bait

−I

Selects sequences that hit and do not hit bait (to different files)

−r

No checking of reverse complement direction

−t

Number of threads to use (default=0 −> up to 4 CPU cores)

Options for output definition:
Normally mirabait writes separate result files (named ’bait_match_*’ and ’bait_miss_*’) for each input to the current directory. To change this behaviour and other output−related settings, use these options:

−c

No case change of sequence to denote bait hits

−l int

length of a line (FASTA only, default 0=unlimited)

−K file

Save kmer statistics to ’file’ (see also −B)

−N name

Change the prefix ’bait’ to <name>. Has no effect if −o/−O is used and targets are not directories.

−o <path>

Save sequences matching bait to path. If path is a directory, write separate files into this directory. If not, combine all matching sequences from the input file(s) into a single file specified by the path.

−O <path>

Like −o, but for sequences not matching

Other options:

−T dir

Use ’dir’ as directory for temporary files instead of current working directory.

−m integer

Memory to use for computing kmer statistics
0..100 = use percentage of free system memory
>100 = amount of MiB to use (e.g. 16384 for 16 GiB)
Default 75 (75% of free system memory).
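The two ranges of −m can be illustrated with a short Python sketch. The function name kmer_memory_budget_mib is hypothetical, and the free memory figure is passed in as a parameter because this page does not say how mirabait measures it.

```python
def kmer_memory_budget_mib(m, free_mib):
    """Return the memory budget in MiB implied by a -m value.

    m in 0..100 is read as a percentage of free system memory,
    m > 100 as an absolute amount in MiB.
    """
    if m < 0:
        raise ValueError("-m must not be negative")
    if m <= 100:
        return free_mib * m // 100
    return m


free = 32000  # example value: 32000 MiB of free memory
print(kmer_memory_budget_mib(75, free))      # default: 75% of free memory
print(kmer_memory_budget_mib(16384, free))   # fixed 16 GiB
```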

Defining files types to load/save:

Normally mirabait recognises file types by their file extension (even when packed). In cases where you need to force a certain file type because the file extension is non−standard, use the EMBOSS notation: <filetype>::<name_of_file>. E.g., to tell mirabait that "somefile.dat" is FASTQ, use: fastq::somefile.dat. Recognised types are: caf, fasta, fastq, gbf, gbk, gbff, maf and phd.
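How such a file argument breaks down into type and name can be sketched in Python. The function name split_file_spec and the list of packing suffixes are assumptions made for illustration; this page does not list which compression formats are recognised, nor any extension aliases beyond the type names themselves.

```python
import os

RECOGNISED_TYPES = {"caf", "fasta", "fastq", "gbf", "gbk", "gbff", "maf", "phd"}
PACKED_SUFFIXES = {".gz", ".bz2"}  # assumed; not taken from this page


def split_file_spec(spec):
    """Return (filetype, filename) for a file argument.

    An explicit "<filetype>::<name_of_file>" prefix wins; otherwise the
    type is guessed from the extension, skipping one packing suffix.
    """
    if "::" in spec:
        filetype, name = spec.split("::", 1)
        filetype = filetype.lower()
        if filetype not in RECOGNISED_TYPES:
            raise ValueError(f"unknown file type: {filetype}")
        return filetype, name
    root, ext = os.path.splitext(spec)
    if ext.lower() in PACKED_SUFFIXES:
        root, ext = os.path.splitext(root)
    filetype = ext.lstrip(".").lower()
    if filetype not in RECOGNISED_TYPES:
        raise ValueError(f"cannot determine file type of {spec}")
    return filetype, spec


print(split_file_spec("fastq::somefile.dat"))
print(split_file_spec("b2.gbk"))
print(split_file_spec("reads.fastq.gz"))
```

A name such as "somefile.dat" without a prefix raises an error in this sketch, which is the situation the EMBOSS notation is meant to resolve.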

MIRABAIT will write files in the same file type as the corresponding input files.

Examples:
mirabait −b b.fasta file.fastq
mirabait −I −j rrna −p file_1.fastq file_2.fastq
mirabait −b b1.fasta −b b2.gbk file.fastq
mirabait −b fasta::baits.dat −p fastq::file_1.dat fastq::file_2.dat
mirabait −b b.fasta −p file_1.fastq file_2.fastq −P file3.fasta file4.caf
mirabait −I −b b.fasta −p file_1.fastq file_2.fastq −P file3.fasta file4.caf
mirabait −k 27 −n 10 −b b.fasta file.fastq
mirabait −b fasta::b.dat fastq::file.dat
mirabait −o /dev/shm/ −b b.fasta −p file_1.fastq file_2.fastq
mirabait −o /dev/shm/match −b b.fasta −p file_1.fastq file_2.fastq
mirabait −b human_genome.fasta −K HG_kmerstats.mhs.gz −p file1.fastq file2.fastq
mirabait −B HG_kmerstats.mhs.gz −p file1.fastq file2.fastq
mirabait −d −B HG_kmerstats.mhs.gz −p file1.fastq file2.fastq

SEE ALSO

mira(1), miraconvert(1)

More extensive documentation is provided in the MIRA manual, available online at

http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html

On Debian, this can be installed with the mira-doc package and can then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html. On other systems, you may want to check in /usr/local/share/mira/doc or run "locate DefinitiveGuideToMIRA" to find it locally.

You can also subscribe to one of the MIRA mailing lists at

http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list:

mira_talk AT freelists DOT org

BUGS

To report bugs or ask for features, please use the ticketing system at:

http://sourceforge.net/projects/mira-assembler/

AUTHOR

Bastien Chevreux <bach AT chevreux DOT org>

This manual page was written by Bastien Chevreux <bach AT chevreux DOT org> but can be freely used for any documentation purpose.
