Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana

Journal of Information Technology and Computer Science, 2026, Vol 2, No 1. DOI: 10.70062/globalscience.v2i1.181

Abstract

This research benchmarks multiple machine learning (ML) algorithms for large-scale loan default prediction using a real-world dataset of 255,000 borrower records, in which default cases represent only about 9–12% of observations. The study addresses the persistent gap in comparative analyses of ML models that balance predictive accuracy, interpretability, and computational efficiency for credit risk assessment. Six algorithmic families (Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, and Artificial Neural Networks (ANN)) and a Stacked Ensemble were evaluated using standardized preprocessing, hybrid imbalance handling (SMOTE, class weighting, under-sampling), and comprehensive evaluation metrics (AUC, F1, Recall, Precision, PR-AUC, and Brier Score). Empirical results show that Logistic Regression achieved the highest AUC (0.732), outperforming the nonlinear models under the baseline configuration, while LightGBM attained perfect recall (1.0) but low precision (0.116), indicating over-prediction of defaults. Gradient boosting models demonstrated robust calibration (Brier ≈ 0.114–0.116) and the best computational efficiency, with LightGBM showing the fastest training and lowest memory use. CatBoost exhibited strong recall but the slowest computation, and the ANN underperformed on tabular data (AUC ≈ 0.56). The Stacked Ensemble delivered balanced results, with AUC = 0.664 and improved overall stability. These findings indicate that boosting-based models, particularly LightGBM and CatBoost, offer superior scalability and calibration, whereas Logistic Regression remains a valuable interpretable baseline. The study concludes that effective default prediction requires integrating rebalancing, calibration, and threshold optimization to improve recall and the reliability of operational deployment in large-scale credit ecosystems.
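The recall and precision reported for LightGBM (1.0 and 0.116) match what a classifier would produce if it flagged every borrower as a default at a default rate of about 11.6%, which lies within the stated 9–12% range. The sketch below illustrates that arithmetic on hypothetical toy labels, not on the study's data. The function name and the 1,000-borrower sample are assumptions made for illustration, and the example does not claim that the study's LightGBM model behaved exactly this way.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy sample: 1,000 borrowers, 116 of whom default (11.6%).
y_true = [1] * 116 + [0] * 884

# A degenerate classifier that predicts "default" for everyone.
y_all_default = [1] * len(y_true)

precision, recall, f1 = precision_recall_f1(y_true, y_all_default)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

When every case is flagged, recall is 1.0 by construction and precision collapses to the default rate. This is one reason the abstract's call for threshold optimization matters when reading recall figures on imbalanced data.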

Keywords

loan default prediction; machine learning; LightGBM; benchmarking; credit risk analytics

How to Cite

Devianto, et al. (2026). Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data. Journal of Information Technology and Computer Science, 2(1). https://doi.org/10.70062/globalscience.v2i1.181

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana, "Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data," Journal of Information Technology and Computer Science, vol. 2, no. 1, 2026.

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana. "Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data." Journal of Information Technology and Computer Science, vol. 2, no. 1, 2026.

Devianto, et al. (2026) 'Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data', Journal of Information Technology and Computer Science, 2(1). doi: 10.70062/globalscience.v2i1.181.

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana. Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data. Journal of Information Technology and Computer Science. 2026;2(1).



Publication Details

Journal: Journal of Information Technology and Computer Science

Accreditation: International journal indexed in international databases (non-Scopus)

Date: 08 Mar 2026

Volume: 2

Number: 1

Language: English (eng)

Type: info:eu-repo/semantics/article

DOI: 10.70062/globalscience.v2i1.181
