Most of the complexity of common data mining tasks is due to the unknown amount of information contained in the data being mined. The more patterns and corelations are contained in such data, the more resources are needed to extract them. This is confirmed by the fact that in general there is not a single best algorithm for a given data mining task on any possible kind of input dataset. Rather, in order to achieve good performances, strategies and optimizations have to be adopted according to the dataset specific characteristics. For example one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows for a characterization of the dataset complexity. We show how such characterization can be used in many fields, from performance prediction to optimization.

Statistical properties of transactional databases

Palmerini P;Perego R
2004

Abstract

Most of the complexity of common data mining tasks is due to the unknown amount of information contained in the data being mined. The more patterns and corelations are contained in such data, the more resources are needed to extract them. This is confirmed by the fact that in general there is not a single best algorithm for a given data mining task on any possible kind of input dataset. Rather, in order to achieve good performances, strategies and optimizations have to be adopted according to the dataset specific characteristics. For example one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows for a characterization of the dataset complexity. We show how such characterization can be used in many fields, from performance prediction to optimization.
2004
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
1-58113-812-1
Data mining
File in questo prodotto:
File Dimensione Formato  
prod_91112-doc_125551.pdf

solo utenti autorizzati

Descrizione: Statistical properties of transactional databases
Tipologia: Versione Editoriale (PDF)
Dimensione 355.43 kB
Formato Adobe PDF
355.43 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/57569
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 15
  • ???jsp.display-item.citation.isi??? ND
social impact