CNR Institutional Research Information System

Evaluating top-K approximate patterns via text clustering

Lucchese, C; Orlando, S; Perego, R

2016

Abstract

This work investigates how approximate binary patterns can be objectively evaluated, using as a proxy measure the quality achieved by a text clustering algorithm whose document features are derived from such patterns. Specifically, we exploit approximate patterns within the well-known FIHC (Frequent Itemset-based Hierarchical Clustering) algorithm, which was originally designed to employ exact frequent itemsets to achieve a concise and informative representation of text data. We analyze several state-of-the-art algorithms for approximate pattern mining; in particular, we measure their ability to extract patterns that characterize the document topics well, in terms of the clustering quality obtained by FIHC. Extensive and reproducible experiments, conducted on publicly available text corpora, show that approximate itemsets provide a better representation than exact ones.
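
The distinction between exact and approximate patterns drawn in the abstract can be illustrated with a minimal sketch. It is not FIHC and not any of the mining algorithms evaluated in the paper: the function names, the toy documents, and the `min_coverage` threshold are all hypothetical, chosen only to show how tolerating missing terms changes the pattern-derived document features that a clustering algorithm would consume.

```python
from typing import FrozenSet, List

Document = FrozenSet[str]   # a document as a set of terms (binary representation)
Pattern = FrozenSet[str]    # a pattern as a set of terms (an itemset)


def coverage(doc: Document, pattern: Pattern) -> float:
    """Fraction of the pattern's terms that occur in the document."""
    return len(doc & pattern) / len(pattern)


def pattern_features(
    docs: List[Document], patterns: List[Pattern], min_coverage: float
) -> List[List[int]]:
    """Binary document-by-pattern matrix.

    An entry is 1 when the document contains at least `min_coverage`
    of the pattern's terms. With min_coverage=1.0 this is exact
    itemset containment; lower values tolerate missing terms
    (an assumed, simplified notion of approximate matching).
    """
    return [
        [1 if coverage(d, p) >= min_coverage else 0 for p in patterns]
        for d in docs
    ]


# Hypothetical toy corpus and patterns.
docs = [
    frozenset({"pattern", "mining", "itemset"}),
    frozenset({"pattern", "mining", "cluster"}),
    frozenset({"text", "cluster", "document"}),
]
patterns = [
    frozenset({"pattern", "mining", "itemset"}),
    frozenset({"text", "cluster", "document"}),
]

exact = pattern_features(docs, patterns, min_coverage=1.0)
approx = pattern_features(docs, patterns, min_coverage=0.6)
```

In this toy setting the second document matches no pattern exactly, so it has an all-zero feature vector, while the approximate matching assigns it to the first pattern. Whether such tolerance improves clustering on real corpora is the question the paper studies experimentally.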

Year: 2016
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s): English
External supervisors and coordinators: Madria, S.; Hara, T.
Conference title: Big Data Analytics and Knowledge Discovery. 18th International Conference
First page: 114
Last page: 127
DOI: https://dx.doi.org/10.1007/978-3-319-43946-4_8
URL: http://link.springer.com/chapter/10.1007/978-3-319-43946-4_8
Conference dates: 5-8 September 2016
Conference location: Porto, Portugal
Keywords: Pattern Mining
Scopus ID: 2-s2.0-84981169044
Web of Science ID: WOS:000389020800008
Number of authors: 3
Full text: restricted
All authors: Lucchese, C; Orlando, S; Perego, R
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Project title: SoBigData Research Infrastructure
Project acronym: SoBigData
Funding: H2020
Contract no.: 654024
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
prod_367080-doc_121332.pdf (authorized users only; description: Evaluating top-K approximate patterns via text clustering; type: publisher's version, PDF)	257.37 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/331890
