This article compares the performance of credit scoring models built with different Machine Learning (ML) techniques for classifying payers in bank financing of companies, using 5 432 historical records. Clients were labeled “non-default” or “default” according to their default index: 4 238 were “non-default” and 1 194 “default”. Each record contains 10 variables (features). First, random undersampling was applied to address the class imbalance. The variables were then coded in two ways: Code I (categorical variables) and Code II (binary, or dummy, variables). Feature selection methods were then applied to identify the most important variables. Finally, three ML classifiers were compared: Bayesian Networks (BN), Decision Tree (DT) and Support Vector Machine (SVM). All techniques were implemented in WEKA (Waikato Environment for Knowledge Analysis). The best performance, 95.2%, was obtained with balanced classes, binary-coded attributes and the SVM classifier. The proposed methodology can therefore automatically classify new instances as “non-default” or “default” with high performance.
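The two preprocessing steps named above, random undersampling and binary (dummy) coding, can be illustrated with a minimal Python sketch. It is not the WEKA workflow used in the study. The records are synthetic, the `sector` feature and the function names `random_undersample` and `dummy_encode` are hypothetical, and only the class counts (4 238 and 1 194) are taken from the text.

```python
import random
from collections import Counter


def random_undersample(records, label_key, seed=0):
    """Randomly drop records from larger classes until every class
    has as many records as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sampling without replacement keeps `target` records per class.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def dummy_encode(records, feature):
    """Replace one categorical feature with 0/1 indicator columns,
    one column per category observed in the data."""
    categories = sorted({rec[feature] for rec in records})
    encoded = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != feature}
        for cat in categories:
            row[f"{feature}={cat}"] = 1 if rec[feature] == cat else 0
        encoded.append(row)
    return encoded


# Synthetic data: only the class sizes come from the article;
# the "sector" feature and its values are invented for illustration.
sectors = ["industry", "retail", "services"]
records = [{"sector": sectors[i % 3], "label": "non-default"} for i in range(4238)]
records += [{"sector": sectors[i % 3], "label": "default"} for i in range(1194)]

balanced = random_undersample(records, "label")
class_counts = Counter(rec["label"] for rec in balanced)
encoded = dummy_encode(balanced, "sector")
```

After undersampling, both classes have the size of the minority class, so a classifier trained on `encoded` sees each class equally often. The price is that most of the majority-class records are discarded.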