Fasta files for InpactorDB: A Plant classified lineage-level LTR retrotransposon reference library for free-alignment methods based on Machine Learning

Romain Guyot, Simon Orozco-Arias & Gustavo Isaza
Here, we present InpactorDB, a semi-curated dataset composed of 130,511 elements from 195 plant genomes belonging to 108 plant species, classified down to the lineage level. This dataset has been used to train two deep neural networks (one fully connected and one convolutional) for fast classification of elements. In lineage-level classification approaches, we obtain F1-score, precision, and recall values above 98%. In order to classify elements of the ‘LTR_STRUC’ and ‘EDTA’ datasets,...
