In this paper we present DCI, a new data mining algorithm for frequent set counting. We also discuss in depth the parallelization strategies used in the design of ParDCI, the distributed and multi-threaded algorithm derived from DCI. Multiple heuristics strategies are adopted within DCI, so that the algorithm is able to adapt its behavior not only to the features of the specific computing platform, but also to the features of the dataset being processed. Our approach turned out to be highly scalable and very efficient for mining both short and long patterns present in real and synthetically generated datasets. The experimental results showed that DCI outperforms others previously proposed algorithms under a variety of conditions. ParDCI, the parallel version of DCI, is explicitly devised for targeting clusters of SMP nodes: shared memory and message passing paradigms were used at intra- and inter-node level, respectively. Due to the broad similarity between DCI and Apriori , we were able to adapt effective parallelization strategies previously proposed for Apriori. As a result, ParDCI reaches near optimal speedups.

A scalable multi-strategy algorithm for counting frequent sets

Palmerini P;Perego R;Silvestri F
2002

Abstract

In this paper we present DCI, a new data mining algorithm for frequent set counting. We also discuss in depth the parallelization strategies used in the design of ParDCI, the distributed and multi-threaded algorithm derived from DCI. Multiple heuristics strategies are adopted within DCI, so that the algorithm is able to adapt its behavior not only to the features of the specific computing platform, but also to the features of the dataset being processed. Our approach turned out to be highly scalable and very efficient for mining both short and long patterns present in real and synthetically generated datasets. The experimental results showed that DCI outperforms others previously proposed algorithms under a variety of conditions. ParDCI, the parallel version of DCI, is explicitly devised for targeting clusters of SMP nodes: shared memory and message passing paradigms were used at intra- and inter-node level, respectively. Due to the broad similarity between DCI and Apriori , we were able to adapt effective parallelization strategies previously proposed for Apriori. As a result, ParDCI reaches near optimal speedups.
2002
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
DCI
File in questo prodotto:
File Dimensione Formato  
prod_91525-doc_122708.pdf

accesso aperto

Descrizione: A scalable multi-strategy algorithm for counting frequent sets
Tipologia: Versione Editoriale (PDF)
Dimensione 1.17 MB
Formato Adobe PDF
1.17 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/114011
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact