MLgsc: A Maximum-Likelihood General Sequence Classifier
Thomas Junier, Vincent Herve, Tina Wunderlin & Pilar Junier
Abstract |
We present a software package for classifying protein or nucleotide
sequences to user-specified sets of reference sequences. The
software trains a model using a multiple sequence alignment and a
phylogenetic tree, both supplied by the user. The latter is used to
guide model construction and as a decision tree to speed up the
classification process. The software was evaluated on all the 16S
rRNA gene sequences of the reference dataset found in the
GreenGenes database. On this dataset, the software was shown to
achieve an error rate of around 1% at genus level. Examples of
applications based on the nitrogenase subunit NifH gene and a
protein-coding gene found in endospore-forming Firmicutes are also
presented. The programs in the package have a simple,
straightforward command-line interface for the Unix shell, and are
free and open-source. The package has minimal dependencies and thus
can be easily integrated into command-line-based classification
pipelines. |
Keywords |
Databases, Genetic; Databases, Protein; *Likelihood Functions; Nucleotides/chemistry; *Phylogeny; Proteins/chemistry; Software |
Citation | Junier, T., Herve, V., Wunderlin, T., & Junier, P. (2015). MLgsc: A Maximum-Likelihood General Sequence Classifier. PLoS One, 10(7). |
Type | Journal article (English) |
Publication date | 2015 |
Journal name | PLoS One |
Volume | 10 |
Issue | 7 |