A BRIEF INTRODUCTION TO DATA MINING TECHNIQUES (UNA BREVE INTRODUZIONE ALLE TECNICHE DI DATA MINING)
Falavigna Greta
2021
Abstract
This publication aims to introduce students to basic tools for analyzing large amounts of data. The first section defines Data Mining and Knowledge Discovery in Databases, explains the most common techniques, and lists the main operational applications. The second section illustrates the three phases that precede the application of Data Mining techniques: Selection/Sampling, Pre-processing/Cleaning, and Transformation/Reduction of data. These preliminary data analysis steps are essential because the results of Data Mining models depend on the correctness of the data. The third section presents some applications of these methodologies. Here the technical aspect is given less weight than the operational one, since the aim is to explain how the techniques are used; nevertheless, the most common Data Mining models are listed and explained. The fourth section is devoted to Text Mining and Web Mining, two methodologies used to analyze texts and websites. It presents the main problems of textual analysis and the techniques that can be used to obtain effective searches. Finally, two appendices are included: a Statistical Appendix, which reports some technical insights that may be useful for understanding Data Mining systems, and a Short Glossary of the main Data Mining terms used in the text.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.