CNR Institutional Research Information System

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We figure out an environment where a transactional dataset is horizontally partitioned and stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge. This paper shows how frequent closed itemsets, mined independently in each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partition. In order to avoid an expensive post-processing phase, needed to compute exact global results, we employ a method to approximate the supports of closed itemsets. This approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.

Mining frequent closed itemsets from distributed repositories

Lucchese C;Orlando S;Perego R;Silvestri C

2007

Abstract

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We figure out an environment where a transactional dataset is horizontally partitioned and stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge. This paper shows how frequent closed itemsets, mined independently in each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partition. In order to avoid an expensive post-processing phase, needed to compute exact global results, we employ a method to approximate the supports of closed itemsets. This approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2007
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Codice ISBN
	
				978-0-387-37830-5
			
	Parole chiave
	
				Closed frequent itemsets
			
	Appare nelle tipologie:
	
				02.01 Contributo in volume (Capitolo o Saggio)

File in questo prodotto:

File	Dimensione	Formato
prod_139015-doc_129769.pdf solo utenti autorizzati Descrizione: Mining frequent closed itemsets from distributed repositories Tipologia: Versione Editoriale (PDF) Dimensione 453.36 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	453.36 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/97878

Citazioni

ND

ND

ND

social impact