UNA BREVE INTRODUZIONE ALLE TECNICHE DI DATA MINING (A Brief Introduction to Data Mining Techniques)

Falavigna Greta
2021

Abstract

This publication aims to introduce students to basic tools for analyzing large amounts of data. The first section defines Data Mining and Knowledge Discovery in Databases, explains the most common techniques, and lists the main operational applications. The second section illustrates the three phases that precede the application of Data Mining techniques: Selection/Sampling, Pre-processing/Cleaning, and Transformation/Reduction of data. These preliminary data analysis techniques are essential because the results of Data Mining models depend on the correctness of the data. The third section presents some applications of the methodologies. Here the technical aspect is given less weight than the operational one, since the aim is to explain how these techniques are used; nevertheless, the most common Data Mining models are listed and explained. The fourth section is devoted to Text Mining and Web Mining, two methodologies used to analyze texts and websites. It presents the main problems of textual analysis and the techniques that can be used to obtain effective searches. Finally, two appendices are included: a Statistical Appendix, which reports some technical insights that may be useful for understanding Data Mining systems, and a Short Glossary of the main Data Mining terms used in the text.
2021
Istituto di Ricerca sulla Crescita Economica Sostenibile - IRCrES
978-88-98193-23-3
Artificial Neural Networks
Data Mining
Text mining
Classification
Clustering

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/422116