Comparison of Genomic Sequences Clustering using Normalized Compression Distance and Evolutionary Distance

La Rosa Massimo; Rizzo Riccardo; Urso Alfonso
2009

Abstract

Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. Recently, the normalised compression distance was introduced as a method to calculate the distance between two generic digital objects, and it seems a suitable way to compare genomic strings. In this paper, the clusterings and the non-linear mappings obtained using the evolutionary distance and the compression distance are compared in order to determine whether the two sets of distances are similar.
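As an illustration of the compression-based measure mentioned in the abstract, the normalised compression distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed size of s and xy is the concatenation of the two strings. The sketch below is a minimal example under stated assumptions: it uses Python's zlib as the compressor, which is not necessarily the one used in the paper, and the two toy sequences are made up for illustration. The names compressed_size, ncd, repetitive and shuffled are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data, invented for illustration only (not taken from the paper).
rng = random.Random(0)
repetitive = b"ACGTTGCA" * 100
shuffled = bytes(rng.choice(b"ACGT") for _ in range(800))

d_same = ncd(repetitive, repetitive)  # a sequence compared with itself
d_diff = ncd(repetitive, shuffled)    # compared with an unrelated sequence
```

With a real compressor the value for identical inputs is small but usually not exactly zero, and the measure is only approximately symmetric, so results depend on the compressor chosen.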
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Genomic Sequence Clustering
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/36643