In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We figure out an environment where a transactional dataset is horizontally partitioned and stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge. This paper shows how frequent closed itemsets, mined independently in each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partition. In order to avoid an expensive post-processing phase, needed to compute exact global results, we employ a method to approximate the supports of closed itemsets. This approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.

Mining frequent closed itemsets from distributed repositories

Lucchese C;Orlando S;Perego R;
2007

Abstract

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We figure out an environment where a transactional dataset is horizontally partitioned and stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge. This paper shows how frequent closed itemsets, mined independently in each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partition. In order to avoid an expensive post-processing phase, needed to compute exact global results, we employ a method to approximate the supports of closed itemsets. This approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.
2007
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-0-387-37830-5
Closed frequent itemsets
File in questo prodotto:
File Dimensione Formato  
prod_139015-doc_129769.pdf

solo utenti autorizzati

Descrizione: Mining frequent closed itemsets from distributed repositories
Tipologia: Versione Editoriale (PDF)
Dimensione 453.36 kB
Formato Adobe PDF
453.36 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/97878
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact