CNR Institutional Research Information System

The sequence mining problem consists in nding frequent sequential patterns in a database of time-stamped events. Several application domains require limiting the maximum temporal gap between events occurring in the input sequences. However pushing down such constraint is critical for most sequence mining algorithms. In this paper we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the troubles usually related to this kind of constraints. CCSM adopts an innovative approach based on kway intersections of idlists to compute the support of candidate sequences. Our k-way intersection method is enhanced by the use of an effective cache that stores intermediate idlists for future reuse. The reuse of intermediate results entails a surprising reduction in the actual number of join operations performed on idlists. CCSM has been experimentally compared with cSPADE, a state of the art algorithm, on several synthetically generated datasets, obtaining better or similar results in most cases.

A new algorithm for gap constrained sequence mining

Orlando S;Perego R;Silvestri C

2004

Abstract

The sequence mining problem consists in nding frequent sequential patterns in a database of time-stamped events. Several application domains require limiting the maximum temporal gap between events occurring in the input sequences. However pushing down such constraint is critical for most sequence mining algorithms. In this paper we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the troubles usually related to this kind of constraints. CCSM adopts an innovative approach based on kway intersections of idlists to compute the support of candidate sequences. Our k-way intersection method is enhanced by the use of an effective cache that stores intermediate idlists for future reuse. The reuse of intermediate results entails a surprising reduction in the actual number of join operations performed on idlists. CCSM has been experimentally compared with cSPADE, a state of the art algorithm, on several synthetically generated datasets, obtaining better or similar results in most cases.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2004
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Codice ISBN
	
				1-58113-812-1
			
	Parole chiave
	
				Data mining
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_91034-doc_24819.pdf solo utenti autorizzati Descrizione: articolo pubblicato Tipologia: Versione Editoriale (PDF) Dimensione 280.76 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	280.76 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/56763

Citazioni

ND

33

ND

social impact