An empirical comparison of classification algorithms for imbalanced credit scoring datasets

Nardini FM; Renso C
2019

Abstract

The profitability of banks depends heavily on credit scoring models, which support the decision to approve a customer's loan. State-of-the-art credit scoring models are based on machine learning methods. These methods must cope with class imbalance, since credit scoring datasets usually contain mostly paid loans and few defaults (unpaid loans). New imbalanced learning techniques have recently been proposed in the literature, and they can improve credit scoring results. Motivated by this scenario, we evaluate several classification approaches to credit scoring. We also assess some preprocessing methods for handling skewed datasets. To this end, we use three public real-world credit scoring datasets. In our experiments, we progressively increase the class imbalance in each of these datasets by randomly undersampling the minority class of defaulters, in order to identify how predictive power is affected. The results indicate that random forest and extreme gradient boosting perform very well at all imbalance levels. We also find that a complete grid search step can increase the predictive power of classification approaches on highly imbalanced datasets.
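The undersampling protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `undersample_minority`, the 0/1 label encoding (1 = default), the toy class sizes, and the target minority fractions are all assumptions chosen for illustration, since the abstract does not state the actual imbalance levels used.

```python
import random


def undersample_minority(labels, target_minority_fraction, seed=0):
    """Return sorted row indices that keep every majority sample (label 0)
    and a random subset of minority samples (label 1), so that the minority
    share is approximately `target_minority_fraction`.

    Hypothetical helper for illustration; assumes binary 0/1 labels.
    """
    if not 0 < target_minority_fraction < 1:
        raise ValueError("target_minority_fraction must be in (0, 1)")
    majority = [i for i, y in enumerate(labels) if y == 0]
    minority = [i for i, y in enumerate(labels) if y == 1]
    # Solve n_min / (n_min + n_maj) = f for n_min.
    f = target_minority_fraction
    n_keep = round(f * len(majority) / (1 - f))
    # Undersampling can only remove defaulters, never add them.
    n_keep = min(n_keep, len(minority))
    rng = random.Random(seed)
    kept_minority = rng.sample(minority, n_keep)
    return sorted(majority + kept_minority)


# Toy labels (assumed sizes): 700 paid loans (0) and 300 defaults (1).
labels = [0] * 700 + [1] * 300

# Progressively stronger imbalance (assumed levels).
for fraction in (0.20, 0.10, 0.05, 0.01):
    idx = undersample_minority(labels, fraction)
    n_defaults = sum(labels[i] for i in idx)
    print(f"target {fraction:.0%}: {n_defaults} defaults among {len(idx)} loans")
```

In an experiment of this kind, each reduced index set would be used to subset the feature matrix before training and evaluating the classifiers, so that the same models can be compared across imbalance levels.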
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
9781728145495
Benchmarking
Classification
Credit scoring
Imbalanced datasets

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/380869