Lenders, such as banks and credit card companies, use credit scoring models to evaluate the potential risk posed by lending money to customers, and therefore to mitigate losses due to bad credit. The profitability of the banks thus highly depends on the models used to decide on the customer's loans. State-of-the-art credit scoring models are based on machine learning and statistical methods. One of the major problems of this field is that lenders often deal with imbalanced datasets that usually contain many paid loans but very few not paid ones (called defaults). Recently, dynamic selection methods combined with ensemble methods and preprocessing techniques have been evaluated to improve classification models in imbalanced datasets presenting advantages over the static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of each base classifier. Then, the technique selects only competent classifiers to predict the query sample. In this paper, we evaluate the suitability of dynamic selection techniques for credit scoring problem, and we present Reduced Minority k-Nearest Neighbors (RMkNN), an approach that enhances state of the art in defining the local region of dynamic selection techniques for imbalanced credit scoring datasets. This proposed technique has a superior prediction performance in imbalanced credit scoring datasets compared to state of the art. Furthermore, RMkNN does not need any preprocessing or sampling method to generate the dynamic selection dataset (called DSEL). Additionally, we observe an equivalence between dynamic selection and static selection classification. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private one. Experiments show that RMkNN improves the classification performance of the evaluated datasets regarding AUC, balanced accuracy, H-measure, G-mean, F-measure, and Recall.
A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems
Nardini F. M.;Renso C.;Trani R.;
2020
Abstract
Lenders, such as banks and credit card companies, use credit scoring models to evaluate the potential risk posed by lending money to customers, and therefore to mitigate losses due to bad credit. The profitability of the banks thus highly depends on the models used to decide on the customer's loans. State-of-the-art credit scoring models are based on machine learning and statistical methods. One of the major problems of this field is that lenders often deal with imbalanced datasets that usually contain many paid loans but very few not paid ones (called defaults). Recently, dynamic selection methods combined with ensemble methods and preprocessing techniques have been evaluated to improve classification models in imbalanced datasets presenting advantages over the static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of each base classifier. Then, the technique selects only competent classifiers to predict the query sample. In this paper, we evaluate the suitability of dynamic selection techniques for credit scoring problem, and we present Reduced Minority k-Nearest Neighbors (RMkNN), an approach that enhances state of the art in defining the local region of dynamic selection techniques for imbalanced credit scoring datasets. This proposed technique has a superior prediction performance in imbalanced credit scoring datasets compared to state of the art. Furthermore, RMkNN does not need any preprocessing or sampling method to generate the dynamic selection dataset (called DSEL). Additionally, we observe an equivalence between dynamic selection and static selection classification. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private one. Experiments show that RMkNN improves the classification performance of the evaluated datasets regarding AUC, balanced accuracy, H-measure, G-mean, F-measure, and Recall.File | Dimensione | Formato | |
---|---|---|---|
prod_440239-doc_157972.pdf
non disponibili
Descrizione: published version
Tipologia:
Versione Editoriale (PDF)
Dimensione
1.1 MB
Formato
Adobe PDF
|
1.1 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
prod_440239-doc_157973.pdf
accesso aperto
Descrizione: preprint version
Tipologia:
Versione Editoriale (PDF)
Dimensione
683.48 kB
Formato
Adobe PDF
|
683.48 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.