Data from: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees

Kevin Liu, Tandy J. Warnow, Mark T. Holder, Serita M. Nelesen, Jiaye Yu, Alexandros P. Stamatakis & C. Randal Linder
Highly accurate estimation of phylogenetic trees for large datasets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Co-estimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large datasets in practical time frames (Liu et al., 2009b). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I)...

Data from: Understanding angiosperm diversification using small and large phylogenetic trees

Stephen A. Smith, Jeremy M. Beaulieu, Alexandros Stamatakis & Michael J. Donoghue
How will the emerging possibility of inferring ultra-large phylogenies influence our ability to identify shifts in diversification rate? For several large angiosperm clades (Angiospermae, Monocotyledonae, Orchidaceae, Poaceae, Eudicotyledonae, Fabaceae, and Asteraceae), we explore this issue by contrasting two approaches: (1) using small backbone trees with an inferred number of extant species assigned to each terminal clade and (2) using a mega-phylogeny of 55473 seed plant species represented in GenBank. The mega-phylogeny approach assumes that the...

Data from: An efficient independence sampler for updating branches in Bayesian Markov chain Monte Carlo sampling of phylogenetic trees

Andre J. Aberer, Alexandros Stamatakis & Fredrik Ronquist
Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that...

Data from: Phylogenomics and the evolution of hemipteroid insects

Kevin P. Johnson, Christopher H. Dietrich, Frank Friedrich, Rolf G. Beutel, Benjamin Wipfler, Ralph S. Peters, Julie M. Allen, Malte Petersen, Alexander Donath, Kimberly K. O. Walden, Alexey M. Kozlov, Lars Podsiadlowski, Christoph Mayer, Karen Meusemann, Alexandros Vasilikopoulos, Robert M. Waterhouse, Stephen L. Cameron, Christiane Weirauch, Daniel R. Swanson, Diana M. Percy, Nate B. Hardy, Irene Terry, Shanlin Liu, Xin Zhou, Bernhard Misof … & Kazunori Yoshizawa
Hemipteroid insects (Paraneoptera), with over 10% of all known insect diversity, are a major component of terrestrial and aquatic ecosystems. Previous phylogenetic analyses have not consistently resolved the relationships among major hemipteroid lineages. We provide maximum likelihood-based phylogenomic analyses of a taxonomically comprehensive dataset comprising sequences of 2,395 single-copy, protein-coding genes for 193 samples of hemipteroid insects and outgroups. These analyses yield a well-supported phylogeny for hemipteroid insects. Monophyly of each of the three hemipteroid...

Data from: Decisive datasets in phylogenomics: lessons from studies on the phylogenetic relationships of primarily wingless insects

Emiliano Dell'Ampio, Karen Meusemann, Nikolaus U. Szucsich, Ralph S. Peters, Benjamin Meyer, Janus Borner, Malte Petersen, Andre J. Aberer, Alexandros Stamatakis, Manfred G. Walzl, Bui Quang Minh, Arndt Von Haeseler, Ingo Ebersberger, Günther Pass & Bernhard Misof
Phylogenetic relationships of the primarily wingless insects are still considered unresolved. Even the most comprehensive phylogenomic studies that addressed this question did not yield congruent results. In order to get a grip on these problems, we here analyzed the sources of incongruence in these phylogenomic studies using an extended transcriptome dataset. Our analyses showed that unevenly distributed missing data can be severely misleading by inflating node support despite the absence of phylogenetic signal. In consequence, only...

Data from: Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice

Andre J. Aberer, Denis Krompass & Alexandros Stamatakis
The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive web-service implementing this algorithm. Compared to our previous method, the new algorithm is up to four orders of magnitude faster, while returning qualitatively identical results. Because of this significant...

Data from: Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies

Adam D. Leaché, Barbara L. Banbury, Joseph Felsenstein, Adrián Nieto-Montes De Oca & Alexandros Stamatakis
Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD...

Data from: Phylogenomics resolves the timing and pattern of insect evolution

Bernhard Misof, Shanlin Liu, Karen Meusemann, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen, Jessica Ware, Tomas Flouri, Rolf G. Beutel, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco, Torsten Wappler, Jes Rust, Andre J. Aberer, Ulrike Aspöck, Horst Aspöck, Daniela Bartel, Alexander Blanke, Simon Berger, Alexander Böhm, Thomas Buckley, Brett Calcott, Junqing Chen … & Xin Zhou
Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight...

Data from: EPA-ng: massively parallel evolutionary placement of genetic sequences

Pierre Barbera, Alexey M. Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomas Flouri & Alexandros Stamatakis
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are being increasingly used for this purpose. However, due to the steady...

Quartet-based computations of internode certainty provide robust measures of phylogenetic incongruence

Xiaofan Zhou, Sarah Lutteropp, Lucas Czech, Alexandros Stamatakis, Moritz Von Looz & Antonis Rokas
Incongruence, or topological conflict, is prevalent in genome-scale data sets. Internode certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internal branch among a set of phylogenetic trees and complement regular branch support measures (e.g., bootstrap, posterior probability) that instead assess the statistical confidence of inference. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of...

