Solving imbalanced learning with outlier detection and features reduction

Guidotti R
2023

Abstract

Class imbalance is a critical problem in several real-world applications. In contexts such as fraud detection or medical diagnostics, standard machine learning models fail because they are designed to handle balanced class distributions. Existing solutions typically increase the number of rare-class instances by generating synthetic records until the class distribution is balanced. However, these procedures generate implausible data and tend to introduce unnecessary noise. We propose a change of perspective: instead of relying on resampling techniques, we rely on unsupervised feature engineering approaches to represent records with a combination of features that helps the classifier capture the differences among classes, even in the presence of imbalanced data. Specifically, we combine a large array of outlier detection, feature projection, and feature selection approaches to augment the expressiveness of the dataset population. We show the effectiveness of our proposal in a broad and in-depth set of benchmarking experiments as well as in real case studies.
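
As a rough illustration of the idea in the abstract, the sketch below augments each record with an outlier score and projected features, then filters the combined features without using class labels. It is not the paper's implementation: the k-nearest-neighbour distance score, the Gaussian random projection, the variance filter, and all function names (`knn_outlier_score`, `random_projection`, `select_by_variance`, `augment`) are assumptions chosen as simple stand-ins for the "large array" of methods the paper combines, and the toy data is made up.

```python
import math
import random


def knn_outlier_score(X, k=3):
    """Outlier score: mean Euclidean distance to the k nearest neighbours."""
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def random_projection(X, n_components=2, seed=0):
    """Feature projection: multiply each record by a fixed Gaussian matrix."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0) / math.sqrt(n_components) for _ in range(d)]
         for _ in range(n_components)]
    return [[sum(w * v for w, v in zip(row, x)) for row in W] for x in X]


def select_by_variance(X, threshold=1e-12):
    """Unsupervised feature selection: drop (near-)constant columns."""
    keep = []
    for j, col in enumerate(zip(*X)):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > threshold:
            keep.append(j)
    return keep, [[x[j] for j in keep] for x in X]


def augment(X, k=3, n_components=2, seed=0):
    """Original features + outlier score + projection, then filtered."""
    scores = knn_outlier_score(X, k)
    proj = random_projection(X, n_components, seed)
    combined = [list(x) + [s] + p for x, s, p in zip(X, scores, proj)]
    _, reduced = select_by_variance(combined)
    return reduced


# Hypothetical toy data: 20 majority records on a small grid, 2 rare records
# far away, and a constant third column that carries no information.
X = [(0.1 * (i % 5), 0.1 * (i // 5), 1.0) for i in range(20)]
X += [(5.0, 5.0, 1.0), (5.2, 5.1, 1.0)]

X_aug = augment(X)
print(len(X_aug), "records,", len(X_aug[0]), "features after augmentation")
```

The augmented records would then be passed to any standard classifier in place of a resampled training set; no synthetic records are created at any point, which is the change of perspective the abstract argues for.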
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Imbalanced data learning
Outlier detection
Features reduction
Features selection
Classification framework

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/452168