Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The challenge of the proposed approach is to address the problem of the intrinsic quadratic complexity of similarity joins, with the aim of bounding the elaboration time, by involving an increasing number of computational nodes as the dataset size grows. To test the scalability of MCAN+, we used a real-life dataset of color features extracted from one million images of the Flickr photo sharing website.

Scalable similarity self join in a metric DHT system

Gennaro C
2009

Abstract

Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The challenge of the proposed approach is to address the problem of the intrinsic quadratic complexity of similarity joins, with the aim of bounding the elaboration time, by involving an increasing number of computational nodes as the dataset size grows. To test the scalability of MCAN+, we used a real-life dataset of color features extracted from one million images of the Flickr photo sharing website.
2009
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-88-6122-154-3
Similarity Join
Content-Addressable Network
Metric Space
File in questo prodotto:
File Dimensione Formato  
prod_92003-doc_62415.pdf

accesso aperto

Descrizione: Scalable similarity self join in a metric DHT system
Tipologia: Versione Editoriale (PDF)
Dimensione 230 kB
Formato Adobe PDF
230 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/62350
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact