LAF Barcoding: classifying DNA Barcode multi-locus sequences with feature vectors and supervised approaches

Emanuel Weitschek;Giulia Fiscon;Paola Bertolazzi;Giovanni Felici
2015

Abstract

DNA barcodes (one or more very short gene sequences) have proven effective for assigning a specimen to its species. In the plant and fungus kingdoms this task requires multi-locus DNA barcode data together with suitable sequence analysis techniques, which poses new challenges. In this work, we describe LAF-BARCODING, a Logic Alignment Free technique that counts the occurrences of fixed-length substrings (k-mers) in the input sequences, represents the counts as feature vectors, and classifies these vectors with a rule-based approach in order to assign multi-locus DNA barcode sequences to their corresponding species. We use LAF to classify several sets of DNA barcode sequences from the plant and fungus kingdoms, obtaining compact and meaningful classification models (if-then rules) with high accuracy rates. In contrast to the widespread alignment-based methods (e.g., character-, tree-, and similarity-based), LAF can be applied successfully to multi-locus DNA barcode sequences.
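
The k-mer counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_vector` and `multilocus_vector`, the restriction to the ACGT alphabet, and the idea of concatenating per-locus vectors for multi-locus data are all assumptions made here for illustration, and the sequences are toy strings rather than real barcode data.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are indexed


def kmer_vector(sequence, k):
    """Count every overlapping k-mer of `sequence`.

    Returns a list of counts ordered lexicographically by k-mer, so every
    sequence maps to a vector of the same length (4**k).
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # k-mers containing other symbols (e.g. N) are never looked up, so they are ignored.
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def multilocus_vector(loci, k):
    """Hypothetical multi-locus encoding: concatenate the per-locus k-mer vectors."""
    vector = []
    for locus_sequence in loci:
        vector.extend(kmer_vector(locus_sequence, k))
    return vector


# Toy example: 2-mer counts for a single short sequence.
features = kmer_vector("ACGTAC", 2)
```

A rule-based classifier would then be trained on such vectors; an if-then rule in this setting would test thresholds on individual k-mer counts, although the exact rule form used by LAF is not specified in this record.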
Istituto di Analisi dei Sistemi ed Informatica ''Antonio Ruberti'' - IASI
ISBN: 9788890643798
Keywords: DNA Barcoding; alignment-free; classification; supervised machine learning

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/290386