A generative pattern model for mining binary datasets

Lucchese C; Perego R; Orlando S
2010

Abstract

In many application fields, huge binary datasets modeling real-life phenomena are produced daily. These datasets record observations of events, and people are often interested in mining them to recognize recurrent patterns. However, discovering the most important patterns is very challenging: the patterns may overlap, or may relate only to a particular subset of the observations, and the mining can be hindered by the presence of noise. In this paper, we introduce a generative pattern model and an associated cost model for evaluating the goodness of the set of patterns extracted from a binary dataset. We propose an efficient algorithm, named GPM, for discovering the most relevant patterns according to the model. We show that the proposed model generalizes other approaches and supports the discovery of high-quality patterns.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-1-60558-639-7
Database Applications. Data mining
Frequent pattern mining

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/86051