CNR Institutional Research Information System

A Unifying Framework for Mining Approximate Top-k Binary Patterns

Lucchese C; Orlando S; Perego R

2014

Abstract

A major mining task for binary matrices is the extraction of approximate top-k patterns that concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms and discuss PANDA(+), an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and the transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, to qualitatively evaluate the usefulness of the discovered patterns, we used PANDA(+) to detect overlapping communities in a bipartite network. The results show that PANDA(+) is able to discover high-quality patterns in both synthetic and real-world datasets.
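To make the optimization view in the abstract concrete, the following is a minimal sketch of one possible cost function for a set of patterns over a binary matrix. It is an assumption made for illustration, not the formulation defined in the paper: a pattern is modeled as a pair (set of item columns, set of transaction rows), and the cost is the total size of the patterns plus the number of cells where the patterns disagree with the data. The names Pattern and description_cost are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (item columns, transaction rows).
Pattern = Tuple[Set[int], Set[int]]


def description_cost(data: List[List[int]], patterns: List[Pattern]) -> int:
    """Illustrative cost: total pattern size plus number of mismatched cells."""
    n_rows = len(data)
    n_cols = len(data[0]) if data else 0

    # Cells set to 1 by at least one pattern (patterns may overlap).
    covered = set()
    for items, transactions in patterns:
        for t in transactions:
            for i in items:
                covered.add((t, i))

    # Noise: cells where the pattern-based reconstruction differs from the data.
    noise = sum(
        1
        for t in range(n_rows)
        for i in range(n_cols)
        if (data[t][i] == 1) != ((t, i) in covered)
    )

    # Cost of describing the patterns themselves.
    pattern_size = sum(len(items) + len(transactions) for items, transactions in patterns)
    return pattern_size + noise


if __name__ == "__main__":
    # A 3x3 block of ones plus one isolated one.
    data = [
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 1],
    ]
    print(description_cost(data, []))
    print(description_cost(data, [({0, 1, 2}, {0, 1, 2})]))
```

Under this assumed cost, describing the example matrix with no patterns costs 10 (every 1 is counted as noise), while the single 3x3 pattern costs 7 (6 for the pattern plus 1 uncovered cell). A greedy algorithm of the kind the abstract mentions would keep adding or extending patterns only while such a cost decreases.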

Year: 2014
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Mining methods and algorithms; 0-1 data; approximate top-k patterns; communities in bipartite networks; MDL
Appears in types: 01.01 Journal article

Files in this record:

File	Size	Format
prod_295717-doc_84959.pdf (authorized users only; description: A unifying framework for mining approximate top-k binary patterns; type: publisher's version (PDF))	1.67 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/265647
