The profitability of banks depends heavily on credit scoring models, which support the decision to approve a loan to a customer. State-of-the-art credit scoring models are based on learning methods. These methods must cope with class imbalance, since credit scoring datasets usually contain mostly paid loans and few defaults (unpaid ones). New imbalanced learning techniques have recently been proposed in the literature, and they can improve credit scoring results. Motivated by this scenario, we evaluate several classification approaches to credit scoring. We also assess several preprocessing methods for handling skewed datasets. To this end, we use three public real-world credit scoring datasets. In our experiments, we progressively increase the class imbalance in each of these datasets by randomly undersampling the minority class of defaulters, to identify how predictive power is affected. The results indicate that random forest and extreme gradient boosting perform very well at all imbalance levels. We also find that a complete grid search step can increase the predictive power of classification approaches on highly imbalanced datasets.
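The undersampling protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `undersample_minority`, the `target_ratio` parameter, the label convention (1 = default), and the synthetic data are all assumptions made for the example, and the abstract does not state which imbalance levels were used.

```python
import numpy as np


def undersample_minority(X, y, target_ratio, minority_label=1, seed=0):
    """Randomly drop minority-class rows until they make up roughly
    `target_ratio` of the dataset. All majority-class rows are kept.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be strictly between 0 and 1")

    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)

    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    # Solve n_min / (n_min + n_maj) = target_ratio for n_min.
    n_keep = int(round(target_ratio * len(majority_idx) / (1.0 - target_ratio)))
    # Undersampling can only remove rows, never add them.
    n_keep = min(n_keep, len(minority_idx))

    kept_minority = rng.choice(minority_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([majority_idx, kept_minority]))
    return X[keep], y[keep]


if __name__ == "__main__":
    # Synthetic stand-in for a credit dataset: 700 paid loans, 300 defaults.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 5))
    y = np.array([0] * 700 + [1] * 300)

    # Assumed imbalance levels, chosen only for demonstration.
    for ratio in (0.20, 0.10, 0.05, 0.01):
        X_s, y_s = undersample_minority(X, y, target_ratio=ratio)
        print(f"target={ratio:.2f}  rows={len(y_s)}  minority share={y_s.mean():.3f}")
```

Each reduced dataset would then be passed to the classifiers under comparison, so that performance can be tracked as the share of defaulters shrinks.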
An empirical comparison of classification algorithms for imbalanced credit scoring datasets
Nardini FM; Renso C
2019