CNR Institutional Research Information System

Direct local pattern sampling by efficient two-step random procedures

Boley M; Lucchese C; Paurat D; Gartner T

2011

Abstract

We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve the efficiency as well as the controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non-process-simulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern as well as an exactly controlled distribution of the produced patterns. Namely, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models similar to frequent sets and often also lead to substantial gains in terms of scalability.
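As an illustration of what a direct two-step procedure can look like for the frequency case, here is a minimal sketch. It assumes the first step draws a transaction with weight 2^|D| and the second step returns a uniformly random subset of it, so that an itemset F is produced with probability proportional to the number of transactions containing it. The names `sample_by_frequency` and `transactions` and the toy data are hypothetical and not taken from this record; the other measures named in the abstract (area, squared frequency, discriminativity) would need different weights and are not shown.

```python
import random
from collections import Counter


def sample_by_frequency(transactions, rng):
    """Draw one itemset with probability proportional to its support.

    Assumption: each transaction is a tuple of distinct items.
    Step 1: pick a transaction D with weight 2**len(D), i.e. the
            number of subsets of D.
    Step 2: keep each item of D independently with probability 1/2,
            which yields a uniformly random subset of D.
    An itemset F is then returned with probability
    sum over D containing F of (2**|D| / Z) * (1 / 2**|D|) = support(F) / Z.
    """
    weights = [2 ** len(t) for t in transactions]
    chosen = rng.choices(transactions, weights=weights, k=1)[0]
    return frozenset(item for item in chosen if rng.random() < 0.5)


# Hypothetical toy database with three transactions.
transactions = [("a", "b"), ("a", "b", "c"), ("c",)]

# Draw a few patterns and show how often each one came up.
demo_rng = random.Random(42)
demo_counts = Counter(sample_by_frequency(transactions, demo_rng) for _ in range(1000))
for pattern, count in demo_counts.most_common(3):
    print(sorted(pattern), count)
```

Note that no Markov chain is simulated: every call produces an independent sample after a single weighted draw and one pass over the chosen transaction. The empty set is also a possible outcome in this sketch, since it is contained in every transaction.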

Year: 2011
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s): English
Volume title: KDD 2011
Conference title: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD'11
First page: 582
Last page: 590
ISBN: 978-1-4503-0813-7
DOI: https://dx.doi.org/10.1145/2020408.2020500
URL: http://dl.acm.org/citation.cfm?id=2020500&CFID=61806564&CFTOKEN=64940966
Publisher: ACM Press
Publisher city: New York
Publisher country: United States of America
Peer reviewed: Yes, but type not specified
Conference dates: 21-24 August 2011
Conference location: San Diego, USA
Keywords: Local pattern discovery; Sampling; Pattern-based classification; Frequent sets
Other information: Evaluation area 01 - Mathematical and computer sciences. Boley, Mario; Lucchese, Claudio; Paurat, Daniel; Gartner, Thomas
Scopus ID: 2-s2.0-80052653056
Number of authors: 1
Full text: restricted
All authors: Boley M.; Lucchese C.; Paurat D.; Gartner T.
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Project identifier:
Project title: Using Local Inference in Massively Distributed Systems
Acronym: LIFT
Funding: FP7
Contract no.: 255951
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
prod_206289-doc_46354.pdf (authorized users only; Description: Direct local pattern sampling by efficient two-step random procedures; Type: Publisher's version (PDF))	277.15 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/174086
