
Direct local pattern sampling by efficient two-step random procedures

Lucchese C;
2011

Abstract

We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve both the efficiency and the controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non-process-simulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern and an exactly controlled distribution of the produced patterns. Namely, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models similarly to frequent sets and often also lead to substantial gains in terms of scalability.
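The abstract names the idea of direct, two-step sampling without spelling it out. As an illustration only, here is a minimal Python sketch of how a two-step procedure can draw itemsets with probability exactly proportional to frequency or to area, assuming a transaction database given as a list of item sets. It is not the authors' implementation; the function names `sample_by_frequency` and `sample_by_area` and the toy data are hypothetical. For frequency, step 1 draws a record t with weight 2^|t| and step 2 returns a uniformly random subset of t, so P(F) = sum over records t containing F of (2^|t| / Z) * 2^(-|t|) = freq(F) / Z, with Z = sum_t 2^|t|. For area (|F| * freq(F)), the record weight becomes |t| * 2^(|t|-1) and the subset is drawn with probability proportional to its size.

```python
import math
import random
from collections import Counter
from typing import FrozenSet, List

# Illustrative sketch only; not the paper's reference implementation.
Record = FrozenSet[str]


def sample_by_frequency(data: List[Record], rng: random.Random) -> Record:
    """Draw one itemset F with probability freq(F) / Z, where Z = sum_t 2^|t|.

    Step 1: pick a record t with weight 2^|t| (the number of subsets of t).
    Step 2: return a uniformly random subset of t by keeping each item
    independently with probability 1/2. The empty set can be returned.
    """
    weights = [2 ** len(t) for t in data]
    t = rng.choices(data, weights=weights, k=1)[0]
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


def sample_by_area(data: List[Record], rng: random.Random) -> Record:
    """Draw one itemset F with probability |F| * freq(F) / Z'.

    Step 1: pick a record t with weight sum_{F subset of t} |F| = |t| * 2^(|t|-1).
    Step 2: pick a size k with weight k * C(|t|, k), then a uniform k-subset,
    so each subset F of t is drawn with probability |F| / (|t| * 2^(|t|-1)).
    """
    # Empty records contribute no area, so they get weight 0.
    weights = [len(t) * 2 ** (len(t) - 1) if t else 0 for t in data]
    t = rng.choices(data, weights=weights, k=1)[0]
    items = sorted(t)
    n = len(items)
    sizes = list(range(1, n + 1))
    k = rng.choices(sizes, weights=[s * math.comb(n, s) for s in sizes], k=1)[0]
    return frozenset(rng.sample(items, k))


if __name__ == "__main__":
    # Hypothetical toy transaction database (not from the paper).
    data = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
    rng = random.Random(0)
    draws = 20000
    z = sum(2 ** len(t) for t in data)  # normalizer for frequency sampling
    counts = Counter(sample_by_frequency(data, rng) for _ in range(draws))
    for itemset, c in sorted(counts.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        freq = sum(1 for t in data if itemset <= t)
        print(sorted(itemset), f"empirical={c / draws:.3f}", f"exact={freq / z:.3f}")
```

Running the script prints empirical and exact probabilities side by side for the toy data. Exact integer weights keep the sketch short but are only practical for short records, since `random.choices` converts the total weight to a float. Squared frequency and the class discriminativity measure mentioned in the abstract are not covered by this sketch.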
2011
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
KDD 2011
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD'11
pp. 582-590
ISBN 978-1-4503-0813-7
http://dl.acm.org/citation.cfm?id=2020500&CFID=61806564&CFTOKEN=64940966
ACM Press
New York
UNITED STATES OF AMERICA
Yes, but type not specified
21-24 August 2011
San Diego, USA
Local pattern discovery
Sampling
Pattern-based classification
Frequent sets
Evaluation area 01 - Mathematical and computer sciences
Boley, Mario; Lucchese, Claudio; Paurat, Daniel; Gartner, Thomas
1
restricted
Boley M.; Lucchese C.; Paurat D.; Gartner T.
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   Using Local Inference in Massively Distributed Systems
   LIFT
   FP7
   255951
Files in this item:
File Size Format
prod_206289-doc_46354.pdf 277.15 kB Adobe PDF

Authorized users only

Description: Direct local pattern sampling by efficient two-step random procedures
Type: Publisher's version (PDF)
Size: 277.15 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/174086
Citations
  • PubMed Central ND
  • Scopus 81
  • Web of Science (ISI) ND