CNR Institutional Research Information System

Supervised evaluation of top-k itemset mining algorithms

Lucchese C.; Perego R.; Orlando S.

2015

Abstract

A major mining task for binary matrices is the extraction of approximate top-k patterns that concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., one measuring the accuracy of the data description. In this work, we review several greedy state-of-the-art algorithms, namely Asso, Hyper+, and PaNDa+, and propose a methodology to compare the extracted patterns. In evaluating the set of mined patterns, we aim to go beyond the usual assessment methodology, which only measures the cost function being minimized. Thus, we evaluate how good the extracted models/patterns are at unveiling supervised knowledge on the data. To this end, we test the algorithms and diverse cost functions on several datasets from the UCI repository. As a contribution, we show that PaNDa+ performs best in the majority of cases, since the classifiers built over the mined patterns, used as dataset features, are the most accurate.

Year: 2015
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s): English
External supervisors and coordinators: Sanjay Madria, Takahiro Hara
Volume title: Big Data Analytics and Knowledge Discovery: 17th International Conference, DaWaK 2015, Valencia, Spain, September 1-4, 2015, Proceedings
Conference title: Big Data Analytics and Knowledge Discovery. 17th International Conference
First page: 82
Last page: 94
ISBN: 978-3-319-22729-0
DOI: https://dx.doi.org/10.1007/978-3-319-22729-0_7
URL: https://link.springer.com/chapter/10.1007%2F978-3-319-22729-0_7
Peer-reviewed: Yes, but type not specified
Conference dates: 1-4 September 2015
Conference location: Valencia, Spain
Keywords: Approximate patterns
Other information: The correct CNR module is 2103 - ICT.P09.006.001 - 074 - Tecnologie avanzate, Sistemi e Servizi per Grid, which is not present in the list
Scopus ID: 2-s2.0-84943594058
Web of Science ID: WOS:000363583200007
Number of authors: 2
Full text: restricted
All authors: Lucchese C.; Perego R.; Orlando S.
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Project identifier:
Project title: Europeana Cloud: Unlocking Europe's Research via The Cloud
Acronym: eCloud
Funding: FP7
Contract no.: 325091
Appears in types: 04.01 Contribution in conference proceedings

Files in this record:

File	Description	Type	Access	Size	Format
prod_342627-doc_107196.pdf	Supervised evaluation of top-k itemset mining algorithms	Publisher's version (PDF)	Authorized users only	664.17 kB	Adobe PDF
prod_342627-doc_107197.pdf	Supervised evaluation of top-k itemset mining algorithms	Publisher's version (PDF)	Authorized users only	420.37 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/303278
