Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies

Aurélie Bonin, Alessia Guerrieri & Gentile F. Ficetola
Clustering approaches are pivotal for handling the many sequence variants obtained in DNA metabarcoding datasets, and they have therefore become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into Molecular Operational Taxonomic Units (MOTUs), each of which ideally represents a homogeneous taxonomic entity, e.g. a species or a genus. However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or...
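
To make the role of the similarity threshold concrete, the following is a minimal sketch of greedy, centroid-based threshold clustering. It is not the procedure used in this study: the function names (`edit_distance`, `similarity`, `cluster_motus`), the similarity definition (one minus the edit distance divided by the longer sequence length), and the toy sequences are all assumptions chosen for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def cluster_motus(sequences: List[str], threshold: float) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first centroid it matches
    at or above the threshold, otherwise it founds a new cluster."""
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for seq in sequences:
        for idx, centroid in enumerate(centroids):
            if similarity(seq, centroid) >= threshold:
                clusters[idx].append(seq)
                break
        else:
            # No existing centroid is similar enough: start a new MOTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical toy data: the first two sequences differ by one substitution.
toy_sequences = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
loose = cluster_motus(toy_sequences, threshold=0.85)
strict = cluster_motus(toy_sequences, threshold=0.95)
print(len(loose), len(strict))
```

In this toy case the looser threshold merges the two near-identical sequences into one MOTU, whereas the stricter threshold keeps them apart, which is the kind of threshold-dependent splitting or merging the abstract refers to. Real pipelines typically rely on alignment-based identity and abundance-sorted input rather than this simplified measure.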
