Extracting frequent itemsets is an important task in many data mining applications. When data are very large, it becomes mandatory to perform the mining task by using an external memory algorithm, but only a few of these algorithms have been proposed so far. Since also the result set of all the frequent itemsets is likely to be undesirably large, condensed representations, such as closed itemsets, have recently gained a lot of attention. In this paper we discuss the limitations of the partitioning techniques adopted by external memory algorithms for extracting all the frequent itemsets, when applied to closed itemsets mining. The main issue is that the closedness of an itemset cannot be evaluated only using the local knowledge available in a single partition of the input dataset. A further step is thus needed to correctly merge the partial results. We introduce the first algorithm for mining closed itemsets out of core. The algorithm exploits a divide-et-impera approach, where the input dataset is split into smaller partitions, such that not only they can be loaded, but also they can be mined entirely into the main memory. Moreover, we devised a simple technique based on a new theoretical result that allows us to reduce the problem of merging partial solutions to an external memory sorting problem.

Mining frequent closed itemsets out-of-core

Perego R
2006

Abstract

Extracting frequent itemsets is an important task in many data mining applications. When data are very large, it becomes mandatory to perform the mining task by using an external memory algorithm, but only a few of these algorithms have been proposed so far. Since also the result set of all the frequent itemsets is likely to be undesirably large, condensed representations, such as closed itemsets, have recently gained a lot of attention. In this paper we discuss the limitations of the partitioning techniques adopted by external memory algorithms for extracting all the frequent itemsets, when applied to closed itemsets mining. The main issue is that the closedness of an itemset cannot be evaluated only using the local knowledge available in a single partition of the input dataset. A further step is thus needed to correctly merge the partial results. We introduce the first algorithm for mining closed itemsets out of core. The algorithm exploits a divide-et-impera approach, where the input dataset is split into smaller partitions, such that not only they can be loaded, but also they can be mined entirely into the main memory. Moreover, we devised a simple technique based on a new theoretical result that allows us to reduce the problem of merging partial solutions to an external memory sorting problem.
2006
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
089871611X
Frequent itemsets mining
Out of core algorithms
File in questo prodotto:
File Dimensione Formato  
prod_91344-doc_130343.pdf

solo utenti autorizzati

Descrizione: Mining frequent closed itemsets out-of-core
Tipologia: Versione Editoriale (PDF)
Dimensione 311.91 kB
Formato Adobe PDF
311.91 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/62254
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact