LAF Barcoding: classifying DNA Barcode multi-locus sequences with feature vectors and supervised approaches

Emanuel Weitschek; Giulia Fiscon; Paola Bertolazzi; Giovanni Felici
2015

Abstract

DNA barcodes (one or more very short gene sequences) have proven effective for assigning a specimen to its species. Handling this task in the plant and fungus kingdoms requires multi-locus DNA barcode data as well as appropriate sequence analysis techniques, which poses new challenges. In this work, we describe LAF-BARCODING, a Logic Alignment Free technique that counts the fixed-length substrings (k-mers) of the input sequences, represents these counts as feature vectors, and classifies them with a rule-based approach, specifically to assign multi-locus DNA barcode sequences to their corresponding species. We use LAF to classify several sets of DNA barcode sequences from the plant and fungus kingdoms, obtaining compact and meaningful classification models (if-then rules) with high accuracy rates. In contrast to the widespread alignment-based methods (e.g., character-based, tree-based, and similarity-based), we highlight that LAF can be successfully applied to multi-locus DNA barcode sequences.
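The pipeline named in the abstract (count k-mers, build feature vectors, apply if-then rules) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the ACGT alphabet, overlapping k-mers, concatenation of per-locus counts into one named feature vector, and first-match evaluation of rules. The locus names (matK, rbcL), the helper names, and every rule threshold are hypothetical; LAF obtains its rules from training data with a supervised approach, and that learning step is not shown here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vector(seq: str, k: int) -> list[int]:
    """Count overlapping k-mers of `seq` over the ACGT alphabet.

    Returns 4**k counts in lexicographic k-mer order, so vectors from
    sequences of different lengths are directly comparable. k-mers that
    contain ambiguous bases (e.g. N) are simply not counted.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def multilocus_features(loci: dict[str, str], k: int, order: list[str]) -> dict[str, int]:
    """Concatenate per-locus k-mer counts into one named feature vector.

    Feature names look like "matK:AAT". A locus missing from `loci`
    contributes all-zero counts (an assumption of this sketch).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    features: dict[str, int] = {}
    for locus in order:
        for kmer, count in zip(kmers, kmer_vector(loci.get(locus, ""), k)):
            features[f"{locus}:{kmer}"] = count
    return features


# A rule is (conditions, species); each condition is (feature, op, threshold)
# with op either ">" or "<=", i.e. an if-then rule on k-mer counts.
Rule = tuple[list[tuple[str, str, int]], str]


def classify(features: dict[str, int], rules: list[Rule], default: str = "unclassified") -> str:
    """Return the species of the first rule whose conditions all hold."""
    for conditions, species in rules:
        if all(
            (features[name] > threshold) if op == ">" else (features[name] <= threshold)
            for name, op, threshold in conditions
        ):
            return species
    return default


# Hypothetical rules for illustration only: these thresholds are invented
# and are not rules reported for LAF.
RULES: list[Rule] = [
    ([("matK:AAT", ">", 2), ("rbcL:GCG", "<=", 0)], "species_A"),
    ([("matK:AAT", "<=", 2)], "species_B"),
]

if __name__ == "__main__":
    specimen = {"matK": "AATAATAATCG", "rbcL": "ATGTCACCAC"}
    feats = multilocus_features(specimen, k=3, order=["matK", "rbcL"])
    print(classify(feats, RULES))  # species_A
```

Because every locus is mapped to a fixed-length count vector, no alignment between specimens is needed, and a rule can mix features from different loci, which is what makes the representation convenient for multi-locus barcodes.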
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" (IASI)
Language: English
12th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics
1
6
ISBN 9788890643798
http://bioinfo.na.iac.cnr.it/cibb2015/
Peer review: yes, type not specified
Keywords: DNA Barcoding; alignment-free; classification; supervised machine learning
4
02 Contribution in Volume::02.01 Contribution in volume (Chapter or Essay)
268
none
Emanuel Weitschek; Giulia Fiscon; Valerio Cestarelli; Paola Bertolazzi; Giovanni Felici
info:eu-repo/semantics/bookPart

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/290386