A Study of Compression-Based Methods for the Analysis of Barcode Sequences

Riccardo Rizzo;Alfonso Urso;Antonino Fiannaca;Massimo La Rosa
2012

Abstract

This paper introduces a new methodology for the analysis of barcode sequences. Barcode DNA is a very short nucleotide sequence, which in the animal kingdom corresponds to the mitochondrial gene cytochrome c oxidase subunit 1, and which acts as a unique element for identification and taxonomic purposes. Traditional barcode analysis relies on well-established bioinformatics techniques such as sequence alignment, computation of evolutionary distances, and construction of phylogenetic trees. The proposed alignment-free approach uses two different compression-based approximations of the Universal Similarity Metric to compute dissimilarity matrices among the barcode sequences of 20 datasets belonging to different species. From these matrices, phylogenetic trees are computed and compared, in terms of topology and branch length, with trees built from evolutionary distances. The results show high similarity between compression-based and evolutionary-based trees, which suggests that the compression-based methodology is worth employing for the study of barcode sequences.
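To illustrate how a compression-based dissimilarity matrix can be obtained, the sketch below uses the Normalized Compression Distance (NCD), a widely used approximation of the Universal Similarity Metric. The abstract does not name the two approximations or the compressors used in the paper, so the choice of NCD, the use of `zlib`, and the function names `compressed_size`, `ncd` and `dissimilarity_matrix` are all assumptions made for illustration only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two nucleotide strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_matrix(sequences: list[str]) -> list[list[float]]:
    """Pairwise NCD matrix with a zero diagonal.

    With a real compressor C(xy) and C(yx) may differ slightly, so each
    pair is computed once and mirrored to keep the matrix symmetric.
    """
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix of this kind could then be passed to any distance-based tree-building method, such as neighbor joining. A general-purpose compressor like `zlib` adds a fixed overhead that is noticeable on sequences only a few hundred bases long, so a compressor tailored to short DNA strings may behave differently.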
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Compression;Barcode sequences

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/215059