SciRepID - Scientific Publication Search

Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction

Setiadi, De Rosal Ignatius Moses; Muslikh, Ahmad Rofiqul; Iriananda, Syahroni Wahyu; Warto, Warto; Gondohanindijo, Jutono +1 more

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Credit approval prediction is a critical challenge in the financial industry, where the accuracy and efficiency of credit decisions can significantly affect business risk. This study proposes an outlier detection method based on the Gaussian Mixture Model (GMM), combined with Extreme Gradient Boosting (XGBoost), to improve prediction accuracy. GMM detects outliers probabilistically, which allows finer-grained anomaly identification than distance- or density-based methods. The data cleaned by GMM are then processed by XGBoost, a decision-tree-based boosting algorithm that handles complex datasets efficiently. The study compares XGBoost combined with various outlier detection methods (LOF, CBLOF, DBSCAN, IF, and K-Means) and also compares it with other machine learning and deep learning classification algorithms. Experimental results show that the combination of GMM and XGBoost performs best, with an accuracy of 95.493%, a recall of 91.650%, and an AUC of 95.145%, outperforming the other models for credit approval prediction on an imbalanced dataset. The proposed method was shown to reduce prediction errors and improve the model's reliability in detecting eligible credit applications.

https://doi.org/10.62411/jcta.11638

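The GMM outlier-filtering step described in the abstract above can be illustrated with a minimal sketch. It assumes a single numeric feature, a fixed number of mixture components, quantile-based initialisation, and a hypothetical `contamination` fraction that marks the lowest-likelihood points as outliers; the paper's exact thresholding rule is not stated in the abstract, and all function names are illustrative. Points flagged `True` would be dropped before training XGBoost.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic init: spread the initial means across the data's quantiles
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    variances = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow for far-away points
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


def log_likelihood(x, weights, means, variances):
    """Log density of x under the fitted mixture."""
    density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(density + 1e-300)


def gmm_outlier_mask(xs, k=2, contamination=0.05):
    """Flag the lowest-likelihood fraction of points as outliers (True = outlier)."""
    params = fit_gmm_1d(xs, k)
    scores = [log_likelihood(x, *params) for x in xs]
    n_out = max(int(contamination * len(xs)), 1)
    threshold = sorted(scores)[n_out - 1]
    return [s <= threshold for s in scores]
```

For real multivariate credit data one would more likely fit a multivariate mixture, for example scikit-learn's `GaussianMixture`, whose `score_samples` method returns per-sample log-likelihoods that can be thresholded in the same way.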

Effects of Data Resampling on Predicting Customer Churn via a Comparative Tree-based Random Forest and XGBoost

Ako, Rita Erhovwo; Aghware, Fidelis Obukohwo; Okpor, Margaret Dumebi; Akazue, Maureen Ifeanyi; Yoro, Rume Elizabeth +7 more

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Customer attrition has become a focus for many businesses today, since the online marketplace continues to offer customers various choices and alternatives in goods, services, and products for their money. Businesses must seek to improve value, meet customers' evolving demands and needs, strengthen their customer-retention strategies, and monetize better. The study compares the effects of data resampling schemes on predicting customer churn with both Random Forest (RF) and XGBoost ensembles. The resampling schemes are (a) default mode, (b) random under-sampling (RUS), (c) synthetic minority oversampling technique (SMOTE), and (d) SMOTE with edited nearest neighbors (SMOTEEN). Both tree-based ensembles were constructed and trained with chi-square feature selection to assess their performance. The results show that RF achieved F1 0.9898, Accuracy 0.9973, Precision 0.9457, and Recall 0.9698 for the default, RUS, SMOTE, and SMOTEEN resampling, respectively. XGBoost outperformed Random Forest with F1 0.9945, Accuracy 0.9984, Precision 0.9616, and Recall 0.9890 for the default, RUS, SMOTE, and SMOTEEN, respectively. The findings support that SMOTEEN resampling outperforms the other schemes, while XGBoost's enhanced performance is attributed to hyperparameter tuning of its decision trees. Recency-frequency-monetization retention strategies were applied and found to curb churn and improve monetization policies, helping business managers stay ahead of customer churn.

https://doi.org/10.62411/jcta.10562

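Two of the resampling schemes listed in the abstract above, RUS and SMOTE, can be sketched in plain Python. This is an illustrative version under stated assumptions (hypothetical function names, Euclidean distance, a uniformly random interpolation gap), not the study's code; the edited-nearest-neighbor cleaning that SMOTEEN adds on top of SMOTE is omitted.

```python
import math
import random


def random_under_sample(X, y, seed=0):
    """RUS: randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def smote(minority, n_new, k=5, seed=0):
    """SMOTE core step: interpolate between a minority sample and one of its
    k nearest minority neighbours (requires at least two minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic
```

Each synthetic point lies on a line segment between two real minority samples, which is why SMOTE adds minority density without simply duplicating rows as random oversampling would.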

Enhanced Vision Transformer and Transfer Learning Approach to Improve Rice Disease Recognition

Rachman, Rahadian Kristiyanto; Setiadi, De Rosal Ignatius Moses; Susanto, Ajib; Nugroho, Kristiawan; Islam, Hussain Md Mehedul

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

In the evolving landscape of agricultural technology, recognizing rice diseases with computational models is a critical challenge, one predominantly addressed with Convolutional Neural Networks (CNNs). However, the localized feature extraction of CNNs often falls short in complex scenarios, which motivates models capable of global contextual understanding. The Vision Transformer (ViT) is a deep learning model that uses a self-attention mechanism to overcome these limitations of CNNs by capturing image features in a global context. This research refines and adapts the ViT Base (B) transfer learning model for the nuanced task of rice disease recognition. Through reconfiguration, additional layers, and hyperparameter tuning, the study evaluates the model on both balanced and imbalanced datasets, where it outperforms traditional CNN models, including VGG, MobileNet, and EfficientNet. The proposed ViT model achieved superior recall (0.9792), precision (0.9815), specificity (0.9938), F1-score (0.9791), and accuracy (0.9792) on challenging datasets and established a new benchmark in rice disease recognition. These results demonstrate the ViT model's strong performance and stability across diverse tasks and datasets and its potential to transform rice disease recognition, setting the stage for future work on agricultural AI applications.

https://doi.org/10.62411/jcta.10459

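The global context that the abstract above attributes to self-attention comes from every patch token attending to every other token. The sketch below shows single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, in plain Python, plus the patch count for a 224 x 224 input split into 16 x 16 patches. The image and patch sizes, the weight matrices, and all names are assumptions for illustration; the study's ViT-B configuration may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    """Plain-Python matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a list of token vectors."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d = len(k[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k] for qr in q]
    weights = [softmax(r) for r in scores]  # each token attends to every token
    return matmul(weights, v), weights


def num_patches(image_size=224, patch_size=16):
    """Number of non-overlapping square patches a ViT cuts the image into."""
    return (image_size // patch_size) ** 2
```

Under these assumed sizes the image becomes 196 patch tokens (197 with a class token), and the attention weight matrix is 197 x 197, so even the first layer mixes information from the whole image, unlike a small convolution kernel.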

Enhancing Lung Cancer Classification Effectiveness Through Hyperparameter-Tuned Support Vector Machine

Gomiasti, Fita Sheila; Warto, Warto; Kartikadarma, Etika; Gondohanindijo, Jutono; Setiadi, De Rosal Ignatius Moses

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

This research aims to improve lung cancer classification performance using Support Vector Machines (SVM) with hyperparameter tuning. The Radial Basis Function (RBF) kernel lets the SVM handle non-linear problems, while hyperparameter tuning is performed with Random Grid Search to find the best combination of parameters. The best parameter settings are C = 10, Gamma = 10, and Probability = True. Test results show that the tuned SVM significantly improves accuracy, precision, specificity, and F1-score, although recall decreased slightly, by 0.02. Even though recall is one of the most important metrics in disease classification, especially on imbalanced datasets, specificity also plays a vital role in avoiding misidentification of negative cases. Without hyperparameter tuning, specificity is so poor that considering both metrics becomes essential. Overall, the proposed method achieves 0.99 accuracy, 1.00 precision, 0.98 recall, 0.99 F1-score, and 1.00 specificity. This research confirms the potential of tuned SVMs for complex data classification challenges and offers important insights for medical diagnostic applications.

https://doi.org/10.62411/jcta.10106

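The RBF kernel and the randomized search over a parameter grid described in the abstract above might look like the sketch below. `score_fn` is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with the given `C` and `gamma`; the grid values are illustrative, and `Probability = True` is treated as a fixed setting rather than a searched one.

```python
import itertools
import math
import random


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def random_grid_search(grid, score_fn, n_iter=10, seed=0):
    """Evaluate n_iter randomly chosen combinations from the full grid and
    return the best (params, score) pair."""
    keys = sorted(grid)
    combos = [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]
    rng = random.Random(seed)
    sampled = rng.sample(combos, min(n_iter, len(combos)))
    best = max(sampled, key=score_fn)
    return best, score_fn(best)
```

A larger `gamma` makes the kernel decay faster with distance, giving a more flexible (and more overfitting-prone) boundary, which is why it is tuned jointly with the regularisation constant `C`. With scikit-learn, the equivalent search would typically use `RandomizedSearchCV` over `SVC(kernel="rbf", probability=True)`.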

Dataset Analysis and Feature Characteristics to Predict Rice Production based on eXtreme Gradient Boosting

Wijayanti, Ella Budi; Setiadi, De Rosal Ignatius Moses; Setyoko, Bimo Haryo

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Rice is the main food source for almost half of the global population, contributing more than 21% of the total calories humans need. Production predictions are important for setting import-export policies. This research proposes XGBoost to predict global rice harvests using FAO and World Bank datasets. Feature analysis, removal of duplicate data, and parameter tuning were carried out to support XGBoost's performance. The results showed excellent performance, reaching 0.99. Evaluation with metrics such as MSE and MAE under k-fold cross-validation shows that XGBoost predicts crop yields more accurately than other regression methods such as Random Forest (RF), Gradient Boosting (GB), Bagging Regressor (BR), and K-Nearest Neighbor (KNN). In addition, an ablation study compared the performance of each model across various feature sets and against state-of-the-art methods. The results confirm the superiority of the proposed XGBoost method. Given its consistent results and better performance, this model can effectively support agricultural sustainability, especially rice production.

https://doi.org/10.62411/jcta.10057

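The k-fold evaluation with MSE and MAE mentioned in the abstract above can be sketched as follows. `fit` and `predict` are hypothetical callbacks standing in for the XGBoost regressor or any of the baselines; folds are formed by shuffling the indices and striding through them, and the function assumes at least `k` samples so that no fold is empty.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split it into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5):
    """Return (mean MSE, mean MAE) over k folds; fit/predict are caller-supplied."""
    mses, maes = [], []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        mses.append(mse(y_test, preds))
        maes.append(mae(y_test, preds))
    return sum(mses) / k, sum(maes) / k
```

Because every sample is held out exactly once, the averaged errors are less sensitive to one lucky or unlucky train/test split than a single hold-out score would be.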

Exploring DQN-Based Reinforcement Learning in Autonomous Highway Navigation Performance Under High-Traffic Conditions

Nugroho, Sandy; Setiadi, De Rosal Ignatius Moses; Islam, Hussain Md Mehedul

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Driving in a straight line is one of the fundamental tasks for autonomous vehicles, but it can become complex and challenging on high-speed highways with dense traffic. This research explores the Deep Q-Network (DQN) model, a reinforcement learning (RL) method, in a highway environment. DQN was chosen for its ability to handle complex data through neural network function approximation, which makes it suitable for high-complexity environments. DQN simulations were conducted across four scenarios in which the agent operated at speeds ranging from 60 to nearly 100 km/h. The simulations featured a variable number of vehicles/obstacles, from 20 to 80, and each simulation lasted 40 seconds in the Highway-Env simulator. In testing, the DQN method exhibited excellent performance, achieving its highest reward in the first scenario, 35.6117 out of a maximum of 40, and a success rate of 90.075%.

https://doi.org/10.62411/jcta.9929

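The core of DQN that the abstract above relies on can be sketched independently of Highway-Env: an epsilon-greedy action choice, a replay buffer, and the temporal-difference target y = r + gamma * max_a' Q_target(s', a') for non-terminal transitions (y = r when the episode ends). Here `target_q` is a hypothetical function returning one Q-value per action; a real agent would use a neural network and take a gradient step on (y - Q(s, a))^2, which is not shown.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, target_q, gamma=0.99):
    """DQN targets for (state, action, reward, next_state, done) tuples."""
    return [r if done else r + gamma * max(target_q(s_next))
            for (_, _, r, s_next, done) in batch]


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest ones are discarded first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling random mini-batches from the buffer breaks the correlation between consecutive highway frames, and computing targets with a separate, slowly updated target network keeps the regression target from shifting at every step.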

Music-Genre Classification using Bidirectional Long Short-Term Memory and Mel-Frequency Cepstral Coefficients

27 Citations

Wijaya, Nantalira Niar; Setiadi, De Rosal Ignatius Moses; Muslikh, Ahmad Rofiqul

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Music genre classification is a challenging part of the music recommendation process. This research proposes classifying music genres using Bidirectional Long Short-Term Memory (BiLSTM) with Mel-Frequency Cepstral Coefficient (MFCC) features. The method was tested on the GTZAN and ISMIR2004 datasets. For ISMIR2004, each track was cut to seconds 31 to 60 so that it had the same 30-second duration as GTZAN. During preprocessing, silent parts were removed and stretching was applied to obtain normalized input. In testing, the proposed method achieved a test accuracy of 93.10% on GTZAN and 93.69% on ISMIR2004.

https://doi.org/10.62411/jcta.9655

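The ISMIR2004 duration cut and the silence removal described in the abstract above can be sketched on a raw list of audio samples. The sketch assumes that "seconds 31 to 60" means the half-open interval from 30 s to 60 s (a 30-second window) and uses a hypothetical frame length and energy threshold for silence; the stretching step, MFCC extraction, and the BiLSTM classifier would come from audio and deep learning libraries and are not shown.

```python
def trim_segment(samples, sample_rate, start_s=30.0, end_s=60.0):
    """Keep the samples between start_s and end_s seconds (30 s by default)."""
    return samples[int(start_s * sample_rate):int(end_s * sample_rate)]


def remove_silence(samples, frame_len=1024, threshold=1e-4):
    """Drop frames whose mean energy is below threshold; keep the rest in order."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept
```

Cutting both datasets to the same duration keeps the number of MFCC frames per clip comparable, so the BiLSTM sees input sequences of similar length from GTZAN and ISMIR2004.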