A generative pattern model for mining binary datasets

Lucchese C; Perego R; Orlando S
2009

Abstract

In many application fields, huge binary datasets modeling real-life phenomena are produced daily. The dataset records are usually associated with observations of some events, and people are often interested in mining these datasets in order to recognize recurrent patterns. However, discovering the most important patterns is very challenging. For example, these patterns may overlap, or may be related only to a particular subset of the observations. Moreover, the mining can be hindered by the presence of noise. In this paper, we introduce a generative pattern model and an associated cost model for evaluating the quality of a set of patterns extracted from a binary dataset. We propose an efficient algorithm, named atopk, for discovering the most important patterns according to the model. We show that the proposed model generalizes other approaches and supports the discovery of higher-quality patterns.
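The abstract does not spell out the generative or cost model, so the following is only a rough illustration under stated assumptions. It assumes that each pattern is a "tile" (a set of items together with the set of records that contain them), that the dataset is generated as the Boolean OR of the tiles, and that the cost of a pattern set adds the number of mismatched cells (treated as noise) to the number of items and records needed to describe the patterns. The names `reconstruct`, `cost`, and `greedy_top_k` are hypothetical, and the greedy selection is a generic stand-in, not the atopk algorithm.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a pattern is (items, rows), meaning every listed row contains every listed item.
Pattern = Tuple[FrozenSet[int], FrozenSet[int]]


def reconstruct(patterns: List[Pattern], n_rows: int, n_cols: int) -> List[List[int]]:
    """Generate a binary matrix as the Boolean OR of the pattern rectangles."""
    model = [[0] * n_cols for _ in range(n_rows)]
    for items, rows in patterns:
        for r in rows:
            for c in items:
                model[r][c] = 1
    return model


def cost(data: List[List[int]], patterns: List[Pattern]) -> int:
    """Hypothetical cost: mismatched cells (noise) plus total pattern description size."""
    n_rows, n_cols = len(data), len(data[0])
    model = reconstruct(patterns, n_rows, n_cols)
    noise = sum(data[r][c] != model[r][c] for r in range(n_rows) for c in range(n_cols))
    description = sum(len(items) + len(rows) for items, rows in patterns)
    return noise + description


def greedy_top_k(data: List[List[int]], candidates: List[Pattern], k: int):
    """Greedily add the candidate that lowers the cost most; stop after k picks or no gain."""
    chosen: List[Pattern] = []
    best = cost(data, chosen)
    for _ in range(k):
        scored = [(cost(data, chosen + [p]), i)
                  for i, p in enumerate(candidates) if p not in chosen]
        if not scored:
            break
        new_cost, i = min(scored)
        if new_cost >= best:
            break
        chosen.append(candidates[i])
        best = new_cost
    return chosen, best


# Toy dataset: a 3x3 block of ones with one missing cell (noise), plus an isolated one.
data = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
A: Pattern = (frozenset({0, 1, 2}), frozenset({0, 1, 2}))
B: Pattern = (frozenset({0, 1}), frozenset({0, 1}))
C: Pattern = (frozenset({3}), frozenset({3}))

chosen, best = greedy_top_k(data, [A, B, C], k=2)
print(chosen, best)  # only A is selected, with total cost 8
```

With this toy cost, the large block A pays for itself (cost drops from 9 to 8), while the isolated cell C is cheaper to treat as noise than to describe as a separate pattern, which is the kind of trade-off a cost model for noisy, overlapping patterns has to arbitrate.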
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Database Applications. Data Mining
Algorithms
Top-k Pattern Mining
Matrix Decomposition

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/167607