SUMTREES

NAME

sumtrees − Phylogenetic Tree Summarization and Annotation

SYNOPSIS

sumtrees [−i FORMAT] [−b BURNIN] [−−force−rooted] [−−force−unrooted]

DESCRIPTION

SumTrees is a program to summarize non-parametric bootstrap or Bayesian posterior probability support for splits or clades on phylogenetic trees.

The basis of the support assessment is typically given by a set of non-parametric bootstrap replicate tree samples produced by programs such as GARLI or RAxML, or by a set of MCMC tree samples produced by programs such as MrBayes or BEAST. The proportion of sampled trees in which a particular split is found is taken to be the degree of support for that split. The samples that are the basis of the support can be distributed across multiple files, and a burn-in option allows an initial number of trees in each file to be excluded from the analysis if they are not considered to be drawn from the true support distribution.
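The support calculation described above can be sketched in Python as follows. This is a minimal illustration, not SumTrees' own code: it assumes each tree has already been parsed and is represented as a set of clades (each clade a frozenset of taxon labels), and each source file as a list of such trees. The function name clade_support is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # hypothetical representation: a tree as its set of clades


def clade_support(sources: List[List[Tree]], burnin: int = 0) -> dict:
    """Return the proportion of retained trees containing each clade.

    The first `burnin` trees of *each* source are skipped, mirroring the
    per-file burn-in described for the '-b' option.
    """
    counts: Counter = Counter()
    n_trees = 0
    for trees in sources:
        for tree in trees[burnin:]:
            counts.update(tree)
            n_trees += 1
    if n_trees == 0:
        raise ValueError("no trees left after burn-in")
    return {clade: n / n_trees for clade, n in counts.items()}


ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
file1 = [{ac}, {ab, cd}, {ab, cd}]
file2 = [{ac}, {ab}, {ab, cd}]
support = clade_support([file1, file2], burnin=1)
print(support[ab], support[cd], ac in support)  # 1.0 0.75 False
```

With a burn-in of 1, the first tree of each file (the only ones containing A+C) is discarded, so four trees remain and the A+B clade is found in all of them.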

SumTrees summarizes collections of trees (e.g., MCMC samples from a posterior distribution or non−parametric bootstrap replicates), mapping the posterior probability, support, or frequency with which splits/clades are found in the source set of trees onto a target tree.

OPTIONS

Source Options:
TREE−FILEPATH

Source(s) of trees to summarize. At least one valid source of trees must be provided. Use ’−’ to specify reading from standard input (note that this requires the input file format to be explicitly set using the ’−−source−format’ option).

−i FORMAT, −−input−format FORMAT, −−source−format FORMAT

Format of all input trees (defaults to handling either NEXUS or NEWICK through inspection; it is more efficient to explicitly specify the format if it is known).

−b BURNIN, −−burnin BURNIN

Number of trees to skip from the beginning of *each* tree file when counting support (default: 0).

−−force−rooted, −−rooted

Treat source trees as rooted.

−−force−unrooted, −−unrooted

Treat source trees as unrooted.

−v, −−ultrametricity−precision, −−branch−length−epsilon

Precision to use when validating ultrametricity (default: 1e−05; specify ’0’ to disable validation).
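One plausible reading of this check is that every root-to-tip distance must agree to within the given precision. The sketch below assumes that interpretation (the exact test SumTrees applies may differ) and represents a tree as a hypothetical mapping from each node to its (parent, edge length) pair; the function names are illustrative only.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: node -> (parent, length of the edge above node).
ParentMap = Dict[str, Tuple[str, float]]


def root_to_tip_distances(parent: ParentMap, tips: Iterable[str]) -> Dict[str, float]:
    distances = {}
    for tip in tips:
        total, node = 0.0, tip
        while node in parent:  # walk up until the root (which has no parent entry)
            node, length = parent[node]
            total += length
        distances[tip] = total
    return distances


def is_ultrametric(parent: ParentMap, tips: Iterable[str], epsilon: float = 1e-5) -> bool:
    if epsilon == 0:
        return True  # a precision of 0 disables validation
    d = list(root_to_tip_distances(parent, tips).values())
    return max(d) - min(d) <= epsilon


# ((A:1,B:1):1,C:2);  every tip is 2.0 from the root
tree = {"A": ("n1", 1.0), "B": ("n1", 1.0), "n1": ("root", 1.0), "C": ("root", 2.0)}
print(is_ultrametric(tree, ["A", "B", "C"]))  # True
```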

−−weighted−trees

Use weights of trees (as indicated by the ’[&W m/n]’ comment token) to weight the contribution of splits found on each tree to the overall split frequencies.
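A minimal sketch of weighted split frequencies, assuming the weight in a ’[&W m/n]’ token is the ratio m/n, that trees without the token count with weight 1, and that frequencies are normalized by the total weight. These conventions, the regular expression, and the function names tree_weight and weighted_support are illustrative assumptions rather than SumTrees' documented behavior.

```python
import re
from collections import defaultdict

# Matches e.g. "[&W 1/4]" or "[&W 0.25]" (assumed syntax).
_WEIGHT_TOKEN = re.compile(r"\[&W\s+([^\]/\s]+)(?:\s*/\s*([^\]\s]+))?\s*\]")


def tree_weight(newick: str) -> float:
    match = _WEIGHT_TOKEN.search(newick)
    if match is None:
        return 1.0  # assumption: an unweighted tree counts once
    numerator = float(match.group(1))
    denominator = float(match.group(2)) if match.group(2) else 1.0
    return numerator / denominator


def weighted_support(weighted_trees):
    """weighted_trees: iterable of (weight, set_of_clades) pairs."""
    totals = defaultdict(float)
    total_weight = 0.0
    for weight, clades in weighted_trees:
        total_weight += weight
        for clade in clades:
            totals[clade] += weight
    return {clade: w / total_weight for clade, w in totals.items()}


ab, ac = frozenset("AB"), frozenset("AC")
trees = [(tree_weight("[&W 3/4] ((A,B),C);"), {ab}),
         (tree_weight("[&W 1/4] ((A,C),B);"), {ac})]
print(weighted_support(trees))
```

Unweighted counting would give each of these two clades a frequency of 0.5; with the weights applied, A+B receives 0.75 and A+C 0.25.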

−−preserve−underscores

Do not convert unprotected (unquoted) underscores to spaces when reading NEXUS/NEWICK format trees.
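In NEWICK and NEXUS, an unquoted underscore in a label conventionally stands for a space, while underscores inside single-quoted labels are literal. The following sketch illustrates that rule on a raw NEWICK string. It is a simplified, hypothetical helper (it also leaves bracketed comments untouched and does not strip the quotes) and not the parser DendroPy uses.

```python
def convert_unquoted_underscores(newick: str, preserve_underscores: bool = False) -> str:
    """Replace '_' with ' ' outside quoted labels and [comments]."""
    if preserve_underscores:  # analogous to '--preserve-underscores'
        return newick
    out = []
    in_quote = in_comment = False
    for ch in newick:
        if in_comment:
            in_comment = ch != "]"
        elif ch == "'":
            in_quote = not in_quote  # a doubled '' toggles twice, staying quoted
        elif ch == "[" and not in_quote:
            in_comment = True
        elif ch == "_" and not in_quote:
            ch = " "
        out.append(ch)
    return "".join(out)


print(convert_unquoted_underscores("(Homo_sapiens,'Pan_troglodytes')[&x_y];"))
# (Homo sapiens,'Pan_troglodytes')[&x_y];
```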

−−taxon−name−filepath FILEPATH

Path to a file listing all the taxon names or labels that will be found across the entire set of source trees. This file should be a plain text file with a single name on each line. This file is only read when multiprocessing (’−M’ or ’−m’) is requested. When multiprocessing using the ’−M’ or ’−m’ options, all taxon names need to be defined in advance of any actual tree analysis. By default this is done by reading the first tree in the first tree source and extracting the taxon names. At best, this is inefficient, as it involves an extraneous reading of the tree. At worst, this can be erroneous, if the first tree does not contain all the taxa. Explicitly providing the taxon names via this option avoids these issues.

Target Tree Topology Options:
−t FILE, −−target−tree−filepath FILE

Summarize support and other information from the source trees to the topology or topologies given by the tree(s) described in FILE. If no user−specified target topologies are given, then a summary topology will be used as the target. Use the ’−s’ or ’−−summary−target’ option to specify the type of summary tree to use.

−s SUMMARY−TYPE, −−summary−target SUMMARY−TYPE

Construct and summarize support and other information from the source trees to one of the following summary topologies:

− ’consensus’

A consensus tree. The minimum frequency threshold of clades to be included can be specified using the ’−f’ or ’−−min−clade−freq’ flags. This is the DEFAULT if a user−specified target tree is not given through the ’−t’ or ’−−target−tree−filepath’ options.

− ’mcct’

The maximum clade credibility tree. The tree from the source set that maximizes the *product* of clade posterior probabilities.

− ’msct’

The maximum sum of clade credibilities tree. The tree from the source set that maximizes the *sum* of clade posterior probabilities.
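The difference between the product criterion (’mcct’) and the sum criterion (’msct’) can be sketched as follows, assuming each tree is a set of clades (frozensets of taxon labels) and clade probabilities are the sample frequencies. The product is computed as a sum of logarithms to avoid underflow on large trees. The function names and the 40-tree toy sample are hypothetical; the sample is chosen so that the two criteria pick different trees.

```python
import math
from collections import Counter


def clade_frequencies(trees):
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def tree_score(tree, freqs, criterion):
    probs = [freqs[clade] for clade in tree]
    if criterion == "mcct":  # product of clade probabilities, via logs
        return sum(math.log(p) for p in probs)
    if criterion == "msct":  # sum of clade probabilities
        return sum(probs)
    raise ValueError(f"unknown criterion: {criterion!r}")


def select_target(trees, criterion="mcct"):
    freqs = clade_frequencies(trees)
    return max(trees, key=lambda tree: tree_score(tree, freqs, criterion))


def c(labels):
    return frozenset(labels)


# Toy sample of 40 rooted four-taxon trees (counts chosen for illustration).
sample = ([{c("AB"), c("ABC")}] * 9 + [{c("AB"), c("ABD")}] * 8
          + [{c("AB"), c("CD")}] * 7 + [{c("AC"), c("BD")}] * 15
          + [{c("AD"), c("BC")}])
print(sorted(map(sorted, select_target(sample, "mcct"))))  # [['A', 'C'], ['B', 'D']]
print(sorted(map(sorted, select_target(sample, "msct"))))  # [['A', 'B'], ['A', 'B', 'C']]
```

Here A+B is frequent (0.6) but its best companion clade is rare (0.225), so that tree wins on the sum (0.825 vs 0.75) while the tree with two moderately supported clades (0.375 each) wins on the product (0.141 vs 0.135).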

Target Tree Supplemental Options:
−f #.##, −−min−consensus−freq #.##, −−min−freq #.##, −−min−clade−freq #.##

If using a consensus tree summarization strategy, then this is the minimum frequency or probability for a clade or a split to be included in the resulting tree (default: > 0.5).
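A minimal majority-rule consensus sketch: keep clades whose frequency strictly exceeds the threshold, then nest them into a NEWICK string labeled with support. It assumes clade frequencies are already known (the values below are hypothetical) and relies on the fact that clades each found in more than half of the trees are mutually compatible; with a threshold below 0.5 a real implementation would also have to check compatibility, which this sketch does not do. The function names are illustrative.

```python
def consensus_clades(freqs, min_freq=0.5):
    # Strict '>' mirrors the "(default: > 0.5)" wording.
    return {clade: f for clade, f in freqs.items() if f > min_freq}


def to_newick(taxa, clades):
    """Nest compatible clades into a rooted NEWICK string with support labels."""
    def build(members, available):
        inside = [cl for cl in available if cl < members]
        maximal = [cl for cl in inside if not any(cl < other for other in inside)]
        covered = set().union(*maximal)
        parts = [build(cl, inside) for cl in sorted(maximal, key=sorted)]
        parts += sorted(members - covered)  # taxa attached directly to this node
        label = f"{clades[members]:g}" if members in clades else ""
        return "(" + ",".join(parts) + ")" + label

    return build(frozenset(taxa), list(clades)) + ";"


freqs = {frozenset("AB"): 0.8, frozenset("CD"): 0.55,
         frozenset("ABC"): 0.3, frozenset("AC"): 0.2}
print(to_newick("ABCD", consensus_clades(freqs)))  # ((A,B)0.8,(C,D)0.55);
```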

−−allow−unknown−target−tree−taxa

Do not fail with error if target tree(s) have taxa not previously encountered in source trees or defined in the taxon discovery file.

Target Tree Rooting Options:
−−root−target−at−outgroup TAXON−LABEL

Root target tree(s) using specified taxon as outgroup.

−−root−target−at−midpoint

Root target tree(s) at midpoint.

−−set−outgroup TAXON−LABEL

Rotate the target trees such that the specified taxon is in the outgroup position, but do not explicitly change the target tree rooting.

Target Tree Edge Options:
−e STRATEGY, −−set−edges STRATEGY, −−edges STRATEGY

Set the edge lengths of the target or summary trees based on the specified summarization STRATEGY:

− ’mean−length’

Edge lengths will be set to the mean of the lengths of the corresponding split or clade in the source trees.

− ’median−length’

Edge lengths will be set to the median of the lengths of the corresponding split or clade in the source trees.

− ’mean−age’

Edge lengths will be adjusted so that the age of subtended nodes will be equal to the mean age of the corresponding split or clade in the source trees. Source trees will need to be ultrametric for this option.

− ’median−age’

Edge lengths will be adjusted so that the age of subtended nodes will be equal to the median age of the corresponding split or clade in the source trees. Source trees will need to be ultrametric for this option.

− ’support’

Edge lengths will be set to the support value for the split represented by the edge.

− ’keep’

Do not change the existing edge lengths. This is the DEFAULT if target tree(s) are sourced from an external file using the ’−t’ or ’−−target−tree−filepath’ option.

− ’clear’

Edge lengths will be cleared from the target trees if they are present.

Note that the default setting varies according to the following, in order of preference:

(1) If target trees are specified using the ’−t’ or ’−−target−tree−filepath’ option, then the default edge summarization strategy is: ’keep’.

(2) If target trees are not specified, but the ’−−summarize−node−ages’ option is specified, then the default edge summarization strategy is: ’mean−age’.

(3) If no target trees are specified and the node ages are NOT specified to be summarized, then the default edge summarization strategy is: ’mean−length’.
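The per-clade statistics behind the ’mean−length’, ’median−length’, ’mean−age’, and ’median−age’ strategies, and the default-selection rule just described, can be sketched as below. The representation (each source tree as a mapping from a clade key to a value) and all function names are hypothetical; for the age-based strategies the sketch assumes edge lengths are derived as the parent's summarized age minus the child's.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Optional


def summarize_per_clade(source_trees: List[Dict[str, float]], stat: str) -> Dict[str, float]:
    """Mean or median of a per-clade value (edge length or node age) across trees."""
    reducer = {"mean": mean, "median": median}[stat]
    pooled = defaultdict(list)
    for tree in source_trees:
        for clade, value in tree.items():
            pooled[clade].append(value)  # only trees containing the clade contribute
    return {clade: reducer(values) for clade, values in pooled.items()}


def lengths_from_ages(parent: Dict[str, str], ages: Dict[str, float]) -> Dict[str, float]:
    """For age-based strategies: edge length = age(parent) - age(child)."""
    return {child: ages[p] - ages[child] for child, p in parent.items()}


def default_edge_strategy(target_tree_filepath: Optional[str], summarize_node_ages: bool) -> str:
    if target_tree_filepath is not None:  # rule (1)
        return "keep"
    if summarize_node_ages:  # rule (2)
        return "mean-age"
    return "mean-length"  # rule (3)


lengths = [{"AB": 0.5, "CD": 1.0}, {"AB": 1.0}, {"AB": 3.0, "CD": 2.0}]
print(summarize_per_clade(lengths, "mean"))    # {'AB': 1.5, 'CD': 1.5}
print(summarize_per_clade(lengths, "median"))  # {'AB': 1.0, 'CD': 1.5}
print(lengths_from_ages({"AB": "root", "A": "AB", "B": "AB"},
                        {"root": 3.0, "AB": 1.0, "A": 0.0, "B": 0.0}))
# {'AB': 2.0, 'A': 1.0, 'B': 1.0}
```

The mean and median can differ noticeably when a clade's edge length is skewed across the sample, as for A+B here.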

−−force−minimum−edge−length FORCE_MINIMUM_EDGE_LENGTH

(If setting edge lengths) force all edges to be at least this length.

−−collapse−negative−edges

(If setting edge lengths) force each parent node to be at least as old as its oldest child when summarizing node ages.
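The constraint described for ’−−collapse−negative−edges’ can be illustrated with a post-order pass that raises any parent whose summarized age is younger than its oldest child, so that the offending edge ends up with length zero rather than negative. The tree representation (a children mapping) and the function name are hypothetical.

```python
from typing import Dict, List


def collapse_negative_edges(children: Dict[str, List[str]], ages: Dict[str, float],
                            root: str) -> Dict[str, float]:
    """Return ages where every parent is at least as old as its oldest child."""
    fixed = dict(ages)

    def visit(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:  # post-order: settle children before their parent
            visit(kid)
        if kids:
            fixed[node] = max(fixed[node], max(fixed[kid] for kid in kids))

    visit(root)
    return fixed


children = {"root": ["n1", "C"], "n1": ["A", "B"]}
mean_ages = {"root": 1.0, "n1": 1.4, "A": 0.0, "B": 0.0, "C": 0.0}  # root younger than n1
print(collapse_negative_edges(children, mean_ages, "root")["root"])  # 1.4
```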

Target Tree Annotation Options:
−−summarize−node−ages, −−ultrametric, −−node−ages

Assume that source trees are ultrametric and summarize node ages (distances from tips).

−l {support,keep,clear}, −−labels {support,keep,clear}

Set the node labels of the summary or target tree(s):

− ’support’

Node labels will be set to the support value for the clade represented by the node. This is the DEFAULT.

− ’keep’

Do not change the existing node labels.

− ’clear’

Node labels will be cleared from the target trees if they are present.

−−suppress−annotations, −−no−annotations

Do NOT annotate nodes and edges with any summarization information metadata, such as support values, edge length and/or node age summary statistics, etc.

Support Expression Options:
−p, −−percentages

Indicate branch support as percentages (otherwise, will report as proportions by default).

−d #, −−decimals #

Number of decimal places in indication of support values (default: 8).
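A sketch of how a support proportion might be rendered under ’−p’ and ’−d’. The helper name format_support and the use of fixed-point rounding are assumptions, not taken from SumTrees.

```python
def format_support(proportion: float, percentages: bool = False, decimals: int = 8) -> str:
    value = proportion * 100 if percentages else proportion
    return f"{value:.{decimals}f}"  # fixed-point with the requested decimal places


print(format_support(0.75))                                # 0.75000000
print(format_support(0.75, percentages=True, decimals=2))  # 75.00
```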

Output Options:
−o FILEPATH, −−output−tree−filepath FILEPATH, −−output FILEPATH

Path to output file (if not specified, will print to standard output).

−F {nexus,newick,phylip,nexml}, −−output−tree−format {nexus,newick,phylip,nexml}

Format of the output tree file (if not specified, defaults to input format, if this has been explicitly specified, or ’nexus’ otherwise).

−x PREFIX, −−extended−output PREFIX

If specified, extended summarization information will be generated, consisting of the following files:

− ’<PREFIX>.topologies.trees’

A collection of topologies found in the sources reported with their associated posterior probabilities as metadata annotations.

− ’<PREFIX>.bipartitions.trees’

A collection of bipartitions, each represented as a tree, with associated information as metadata annotations.

− ’<PREFIX>.bipartitions.tsv’

Table listing bipartitions, with a group pattern as the key column and information regarding each of the bipartitions in the remaining columns.

− ’<PREFIX>.edge−lengths.tsv’

List of bipartitions and corresponding edge lengths. Only generated if edge lengths are summarized.

− ’<PREFIX>.node−ages.tsv’

List of bipartitions and corresponding ages. Only generated if node ages are summarized.
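The group pattern key of the bipartitions table can be pictured as a string with one character per taxon. The sketch below assumes ’*’ marks taxa on one side of the split and ’.’ the rest, with taxa in a fixed order; for unrooted splits it arbitrarily picks the side that excludes the first taxon so that a split and its complement share one key. These characters, the column names, and the helpers group_pattern and bipartition_table are illustrative assumptions about the file layout.

```python
import csv
import io
from typing import Dict, FrozenSet, Sequence


def group_pattern(taxa: Sequence[str], clade: FrozenSet[str], unrooted: bool = False) -> str:
    members = set(clade)
    if unrooted and taxa[0] in members:
        members = set(taxa) - members  # canonical side: the one without taxa[0]
    return "".join("*" if taxon in members else "." for taxon in taxa)


def bipartition_table(taxa: Sequence[str], freqs: Dict[FrozenSet[str], float]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pattern", "support"])  # hypothetical column names
    for clade, freq in sorted(freqs.items(), key=lambda item: -item[1]):
        writer.writerow([group_pattern(taxa, clade), freq])
    return buf.getvalue()


taxa = ["A", "B", "C", "D", "E"]
print(group_pattern(taxa, frozenset("BC")))                  # .**..
print(group_pattern(taxa, frozenset("ADE"), unrooted=True))  # .**..
print(bipartition_table(taxa, {frozenset("BC"): 0.75, frozenset("DE"): 0.5}), end="")
```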

−−no−taxa−block

When writing NEXUS format output, do not include a taxa block in the output treefile (otherwise will create taxa block by default).

−−no−analysis−metainformation, −−no−meta−comments

Do not include meta−information describing the summarization parameters and execution details.

−c ADDITIONAL_COMMENTS, −−additional−comments ADDITIONAL_COMMENTS

Additional comments to be added to the summary file.

−r, −−replace

Replace/overwrite output file without asking if it already exists.

Parallel Processing Options:
−M, −−maximum−multiprocessing

Run in parallel mode using as many processors as available, up to the number of sources.

−m NUM−PROCESSES, −−multiprocessing NUM−PROCESSES

Run in parallel mode with up to a maximum of NUM−PROCESSES processes (’max’ or ’#’ means to run in as many processes as there are cores on the local machine; i.e., same as specifying ’−M’ or ’−−maximum−multiprocessing’).
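The parallel mode can be pictured as one worker per source, each counting clades independently, followed by a merge of the counts; the process count is capped at the number of sources, as described for ’−M’. This sketch uses Python's concurrent.futures and assumes pre-parsed sources (in practice each worker would read its own file); the function names count_source and clade_support_parallel are hypothetical.

```python
import os
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_source(job):
    """Count clades in one source after burn-in; returns (counts, trees kept)."""
    trees, burnin = job
    kept = trees[burnin:]
    return Counter(clade for tree in kept for clade in tree), len(kept)


def clade_support_parallel(sources, burnin=0, processes=None):
    jobs = [(trees, burnin) for trees in sources]
    # "up to the number of sources": never start more workers than there are jobs
    workers = min(processes or os.cpu_count() or 1, len(jobs))
    if workers <= 1:
        results = [count_source(job) for job in jobs]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(count_source, jobs))
    total, n_trees = Counter(), 0
    for counts, kept in results:
        total.update(counts)
        n_trees += kept
    return {clade: n / n_trees for clade, n in total.items()}
```

When more than one worker is used, call clade_support_parallel from inside an `if __name__ == "__main__":` block, since platforms that spawn worker processes re-import the main module.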

Program Logging Options:
−g LOG−FREQUENCY, −−log−frequency LOG−FREQUENCY

Tree processing progress logging frequency (default: 500; set to 0 to suppress).

−q, −−quiet

Suppress ALL logging, progress and feedback messages.

Program Error Options:
−−ignore−missing−support

Ignore missing support tree files (note that at least one must exist).

Program Information Options:
−h, −−help

Show help information for program and exit.

−−citation

Show citation information for program and exit.

−−usage−examples

Show usage examples of program and exit.

−−describe

Show information regarding your DendroPy and Python installations and exit.

AUTHORS

Jeet Sukumaran and Mark T. Holder

SEE ALSO

If any stage of your work or analyses relies on code or programs from this library, either directly or indirectly (e.g., through usage of your own or third−party programs, pipelines, or toolkits which use, rely on, incorporate, or are otherwise primarily derivative of code/programs in this library), please cite:

Sukumaran, J and MT Holder. 2010. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26: 1569−1571.

Sukumaran, J and MT Holder. SumTrees: Phylogenetic Tree Summarization. 4.0.0 (Jan 31 2015). Available at https://github.com/jeetsukumaran/DendroPy.

Note that, in the interests of scientific reproducibility, you should describe in the text of your publications not only the specific version of the SumTrees program, but also the DendroPy library used in your analysis. For your information, you are running DendroPy 4.0.2.
