Resolving Data Imbalance Using a Bi-Directional Long-Short Term Memory for Enhanced Diabetes Mellitus Detection
(Andrew Okonji Eboka, Christopher Chukwufunaya Odiakaose, Joy Agboi, Margaret Dumebi Okpor, Paul Avweresuoghene Onoma, Tabitha Chukwudi Aghaunor, Arnold Adimabua Ojugo, Eferhire Valentine Ugbotu, Asuobite ThankGod Max-Egba, Victor Ochuko Geteloma, Amaka Patience Binitie, Christopher Chukwudi Onochie, Rita Erhovwo Ako)
DOI: 10.62411/faith.3048-3719-73
Volume: 2, Issue: 1 | Citations: 0 | 08-May-2025
Last: 31-Jul-2025
Abstract:
Diabetes is the body's inability to efficiently break down sugar, or to secrete enough of the insulin required to process glucose, which supports normal bodily functions. As a prevalent chronic disorder, diabetes has contributed to numerous underlying health challenges among those living with it and is classified by the WHO as the world's deadliest disease and a silent killer. Its non-communicable and often asymptomatic nature makes early diagnosis difficult, and the condition occurs in several forms: type I, type II, pre-diabetes, and gestational diabetes. This challenge is further compounded by the imbalanced nature of diabetes datasets, which leads to high misclassification rates, poor generalization, and reduced accuracy. This study predicts diabetes using a bidirectional long short-term memory (BiLSTM) model on two datasets: (a) the PIMA Indian Diabetes (PID) dataset and (b) the Iraqi Society Dataset (ISD). It evaluates the impact and effectiveness of six known data-balancing techniques. Results show that for PID, the SMOTE-Tomek-fused BiLSTM outperforms the other balancing schemes, with F1, accuracy, precision, recall, and specificity of 0.9182, 0.9198, 0.9128, 0.9248, and 0.9208, respectively. For ISD, it also achieves the best performance, with values of 0.9367, 0.9369, 0.9386, 0.9388, and 0.9313, respectively. The other balancing approaches yielded F1 scores from 0.6751 to 0.9347, accuracy from 0.684 to 0.9358, precision from 0.6851 to 0.9296, recall from 0.6639 to 0.9356, and specificity from 0.6658 to 0.9298. These results suggest that the BiLSTM is resilient to the vanishing-gradient problem and can classify diabetes cases effectively with enhanced performance.
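All five scores reported above derive from the binary confusion matrix. The following is a minimal plain-Python sketch that assumes the diabetic class is the positive label. The `ConfusionCounts` dataclass, the `binary_metrics` function, and the counts are hypothetical illustrations, not values or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diabetic cases predicted as diabetic
    fp: int  # non-diabetic cases predicted as diabetic
    tn: int  # non-diabetic cases predicted as non-diabetic
    fn: int  # diabetic cases predicted as non-diabetic


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the five scores used in the abstract (positive class = diabetic)."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0  # sensitivity
    specificity = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "f1": f1,
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only.
scores = binary_metrics(ConfusionCounts(tp=90, fp=10, tn=85, fn=15))
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Specificity is reported alongside recall because, on imbalanced data, a model can score high recall on the minority class while misclassifying many negatives.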
Integrating Hybrid Statistical and Unsupervised LSTM-Guided Feature Extraction for Breast Cancer Detection
(De Rosal Ignatius Moses Setiadi, Arnold Adimabua Ojugo, Octara Pribadi, Etika Kartikadarma, Bimo Haryo Setyoko, Suyud Widiono, Robet Robet, Tabitha Chukwudi Aghaunor, Eferhire Valentine Ugbotu)
DOI: 10.62411/jcta.12698
Volume: 2, Issue: 4 | Citations: 0 | 05-May-2025
Abstract:
Breast cancer is the most prevalent cancer among women worldwide, and early, accurate diagnosis is required to reduce mortality. This study proposes a hybrid classification pipeline that integrates Hybrid Statistical Feature Selection (HSFS) with unsupervised LSTM-guided feature extraction for breast cancer detection using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. First, 20 features were selected with HSFS based on mutual information, chi-square, and Pearson correlation. To address class imbalance, the training set was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). An LSTM encoder then extracted non-linear latent features from the selected features. The statistical and latent features were fused by concatenation, and the top 30 features were re-selected from the fused set. Final classification was performed by a Support Vector Machine (SVM) with an RBF kernel and evaluated using 5-fold cross-validation and a held-out test set. The proposed method achieved an average training accuracy of 98.13%, an F1-score of 98.13%, and an AUC-ROC of 99.55%. On the held-out test set, the model reached an accuracy of 99.30%, a precision of 100%, an F1-score of 99.05%, and an AUC-ROC of 0.9973. The proposed pipeline demonstrates improved generalization and interpretability compared with existing methods such as LightGBM-PSO, DHH-GRU, and ensemble deep networks. These results highlight the effectiveness of combining statistical selection and LSTM-based latent feature encoding in a balanced classification framework.
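Because SMOTE is applied only to the training set, synthetic samples never reach the held-out evaluation data. The sketch below illustrates SMOTE's interpolation step in NumPy, assuming Euclidean distance and k = 5 minority neighbours. The `smote` function and the toy data are hypothetical, and a maintained implementation such as imbalanced-learn's `SMOTE` would normally be preferred.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Usage: oversample the minority class of a toy training split.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(120, 4))
y_train = np.r_[np.zeros(100, dtype=int), np.ones(20, dtype=int)]
X_syn = smote(X_train[y_train == 1], n_new=80)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.r_[y_train, np.ones(len(X_syn), dtype=int)]
print(np.bincount(y_bal))  # [100 100]
```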
Feature Fusion with Albumentation for Enhancing Monkeypox Detection Using Deep Learning Models
(Nizar Rafi Pratama, De Rosal Ignatius Moses Setiadi, Imanuel Harkespan, Arnold Adimabua Ojugo)
DOI: 10.62411/jcta.12255
Volume: 2, Issue: 3 | Citations: 0 | 21-Feb-2025
Abstract:
Monkeypox is a zoonotic disease caused by a virus of the genus Orthopoxvirus, and its visual similarity to other dermatological conditions makes it clinically challenging to identify. Early and accurate detection is crucial to prevent further transmission, yet conventional diagnostic methods are often resource-intensive and time-consuming. This study proposes a deep learning-based classification model that integrates Xception and InceptionV3 through feature fusion to improve the classification of Monkeypox skin lesions. Given the limited availability of annotated medical images, data augmentation was applied using Albumentation to improve model generalization. The proposed model was trained and evaluated on the Monkeypox Skin Lesion Dataset (MSLD), achieving 85.96% accuracy, 86.47% precision, 85.25% recall, 78.43% specificity, and an AUC of 0.8931, outperforming existing methods. Notably, data augmentation improved recall from 81.23% to 85.25%, demonstrating its effectiveness in increasing sensitivity to positive cases. Ablation studies further showed that augmentation increased overall accuracy from 82.02% to 85.96%, underscoring its role in improving model robustness. Comparative analysis with other models confirmed the superiority of our approach. This research advances automated Monkeypox detection, offering a robust and efficient tool for low-resource clinical settings. The findings reinforce the potential of feature fusion and augmentation for improving deep-learning-based medical image classification, facilitating more reliable and accessible disease identification.
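The abstract does not list the exact augmentation transforms. As an illustration only, the NumPy sketch below applies a random horizontal flip, a random 90-degree rotation, and brightness/contrast jitter. These transforms are assumptions standing in for an Albumentations pipeline, not the study's configuration. `augment` is a hypothetical helper, and it assumes square images because a 90-degree rotation swaps height and width.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random flip, 90-degree rotation, and brightness/contrast jitter.

    `image` is a square H x W x C uint8 array; the output keeps its shape
    and dtype. The transform set is an assumption for illustration only.
    """
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    alpha = rng.uniform(0.8, 1.2)   # contrast factor
    beta = rng.uniform(-20.0, 20.0)  # brightness shift
    out = alpha * out + beta
    return np.clip(out, 0, 255).astype(np.uint8)


# Usage: eight augmented views of one synthetic 224 x 224 RGB "lesion" image.
rng = np.random.default_rng(0)
lesion = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)
batch = np.stack([augment(lesion, rng) for _ in range(8)])
print(batch.shape, batch.dtype)  # (8, 224, 224, 3) uint8
```

Geometric transforms leave the lesion's label unchanged, which is why they are a common, low-risk choice when annotated medical images are scarce.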
High-Performance Face Spoofing Detection using Feature Fusion of FaceNet and Tuned DenseNet201
(Leygian Reyhan Zuama, De Rosal Ignatius Moses Setiadi, Ajib Susanto, Stefanus Santosa, Hong-Seng Gan, Arnold Adimabua Ojugo)
DOI: 10.62411/faith.3048-3719-62
Volume: 1, Issue: 4 | Citations: 0 | 12-Feb-2025
Abstract:
Face spoofing detection is critical for biometric security systems to prevent unauthorized access. This study proposes a deep learning-based approach integrating FaceNet and DenseNet201 to enhance face spoofing detection performance. FaceNet generates identity-based embeddings, ensuring robust facial feature representation, while DenseNet201 extracts complementary texture-based features. These features are fused using the Concatenate function to form a more comprehensive representation for improved classification. The proposed method is evaluated on two widely used face spoofing datasets, NUAA Photograph Imposter and LCC-FASD, achieving 100% accuracy on NUAA and 99% on LCC-FASD. Ablation studies reveal that data augmentation does not always enhance performance, particularly on high-complexity datasets such as LCC-FASD, where augmentation increases the False Rejection Rate (FRR). Conversely, DenseNet201 benefits more from augmentation, while the proposed method performs best without augmentation. Comparative analysis with previous studies further confirms the superiority of the proposed approach in reducing error rates, particularly Half Total Error Rate (HTER), False Acceptance Rate (FAR), and FRR. These findings indicate that combining identity-based embeddings and texture-based feature extraction significantly improves spoofing detection and enhances model robustness across different attack scenarios. This study advances biometric security by introducing an efficient feature fusion strategy that strengthens deep learning-based spoof detection. Future research may explore further optimization strategies and evaluate the approach on more diverse datasets to enhance generalization.
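FAR, FRR, and HTER are simple ratios over attack and bona fide attempts, with HTER defined as the mean of FAR and FRR. The sketch below is minimal and assumes that label 1 means a real face and 0 a spoof; conventions vary between benchmarks. The `spoof_error_rates` helper and the predictions in the example are hypothetical.

```python
from typing import Sequence


def spoof_error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute FAR, FRR, and HTER for a face anti-spoofing classifier.

    Label convention (an assumption for this sketch): 1 = bona fide (real
    face), 0 = spoof attack; a prediction of 1 means "accept as real".
    """
    attacks = [p for t, p in zip(y_true, y_pred) if t == 0]
    genuine = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not attacks or not genuine:
        raise ValueError("need at least one attack and one bona fide sample")
    far = sum(p == 1 for p in attacks) / len(attacks)  # spoofs wrongly accepted
    frr = sum(p == 0 for p in genuine) / len(genuine)  # real faces wrongly rejected
    return {"FAR": far, "FRR": frr, "HTER": (far + frr) / 2}


# Hypothetical predictions: 1 of 4 attacks accepted, 1 of 5 real faces rejected.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1]
print(spoof_error_rates(y_true, y_pred))
# {'FAR': 0.25, 'FRR': 0.2, 'HTER': 0.225}
```

Because HTER averages the two rates, a change such as augmentation can lower FAR while raising FRR and still leave HTER roughly unchanged. This is why the abstract reports FRR separately.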
Hypertension Detection via Tree-Based Stack Ensemble with SMOTE-Tomek Data Balance and XGBoost Meta-Learner
(Christopher Chukwufunaya Odiakaose, Fidelis Obukohwo Aghware, Margaret Dumebi Okpor, Andrew Okonji Eboka, Amaka Patience Binitie, Arnold Adimabua Ojugo, De Rosal Ignatius Moses Setiadi, Ayei Egu Ibor, Rita Erhovwo Ako, Victor Ochuko Geteloma, Eferhire Valentine Ugbotu, Tabitha Chukwudi Aghaunor)
DOI: 10.62411/faith.3048-3719-43
Volume: 1, Issue: 3 | Citations: 0 | 01-Dec-2024
Abstract:
High blood pressure (hypertension) is a causative factor in a plethora of other ailments. Because it often masks those ailments, they become difficult to diagnose and to manage effectively with a targeted treatment plan. Some patients living with high blood pressure manage their condition effectively through lifestyle adjustments, monitoring, and follow-up treatment. Others remain in denial, which leads to unreported instances, mishandled cases, and, in now rampant cases, death. Even with the use of machine learning in medicine, two significant issues remain: (a) the datasets used to construct the models often yield less-than-perfect scores, and (b) complex deep learning models that have yielded improved accuracy often require large datasets. To address these issues, our study explores a tree-based stacking ensemble with Decision Tree, Adaptive Boosting, and Random Forest as base learners and XGBoost as the meta-learner. On the retrieved Kaggle dataset, the stacking ensemble yields a prediction accuracy of 1.00 and an F1-score of 1.00, correctly classifying all instances of the test dataset.
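A stacking ensemble trains a meta-learner on the out-of-fold predictions of its base learners. The scikit-learn sketch below mirrors the described structure on synthetic data. As an assumption, scikit-learn's `GradientBoostingClassifier` stands in for the XGBoost meta-learner so the block runs without the `xgboost` package. The data and hyperparameters are illustrative, not those of the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle hypertension data (not the real dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    # Stand-in meta-learner; xgboost.XGBClassifier could be used if installed.
    final_estimator=GradientBoostingClassifier(random_state=0),
    # The meta-learner is trained on 5-fold out-of-fold base-learner predictions,
    # so it never sees predictions made on a base learner's own training rows.
    cv=5,
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.3f}  f1={f1_score(y_te, pred):.3f}")
```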
Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction
(De Rosal Ignatius Moses Setiadi, Ahmad Rofiqul Muslikh, Syahroni Wahyu Iriananda, Warto Warto, Jutono Gondohanindijo, Arnold Adimabua Ojugo)
DOI: 10.62411/jcta.11638
Volume: 2, Issue: 2 | Citations: 0 | 01-Nov-2024
Abstract:
Credit approval prediction is one of the critical challenges in the financial industry, where the accuracy and efficiency of credit decision-making can significantly affect business risk. This study proposes an outlier detection method that combines the Gaussian Mixture Model (GMM) with Extreme Gradient Boosting (XGBoost) to improve prediction accuracy. GMM detects outliers probabilistically, allowing finer-grained anomaly identification than distance- or density-based methods. The data cleaned by GMM is then processed with XGBoost, a decision tree-based boosting algorithm that handles complex datasets efficiently. This study compares XGBoost combined with various outlier detection methods, including Local Outlier Factor (LOF), Cluster-Based Local Outlier Factor (CBLOF), DBSCAN, Isolation Forest (IF), and K-Means. It also compares various other machine learning and deep learning classifiers. Experimental results show that the combination of GMM and XGBoost provides the best performance, with an accuracy of 95.493%, a recall of 91.650%, and an AUC of 95.145%, outperforming other models for credit approval prediction on an imbalanced dataset. The proposed method was shown to reduce prediction errors and improve the model's reliability in detecting eligible credit applications.
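The GMM outlier step can be sketched in three stages: fit a mixture to the data, score each sample by its log-likelihood, and discard the least likely fraction before training the classifier. This scikit-learn sketch assumes a two-component mixture and a 1% contamination quantile as the cut-off. Both choices, the helper name `gmm_inlier_mask`, and the toy data are hypothetical; the abstract does not state the study's exact thresholding rule.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_inlier_mask(X: np.ndarray, n_components: int = 2,
                    contamination: float = 0.01, seed: int = 0) -> np.ndarray:
    """Return a boolean mask that is False for the least likely samples.

    Samples whose log-likelihood under the fitted mixture falls below the
    `contamination` quantile are treated as outliers (an assumed rule).
    """
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    log_lik = gmm.score_samples(X)  # per-sample log density
    threshold = np.quantile(log_lik, contamination)
    return log_lik > threshold


# Two clusters of "applicants" plus three planted anomalies (toy data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 2)),
    rng.normal(8.0, 1.0, size=(150, 2)),
    [[4.0, -6.0], [-6.0, 4.0], [14.0, -2.0]],
])
mask = gmm_inlier_mask(X)
print("flagged indices:", np.flatnonzero(~mask))
X_clean = X[mask]  # the cleaned set would then be passed to the classifier
```

Unlike a fixed distance cut-off, the log-likelihood accounts for each component's spread and weight, which is the probabilistic advantage the abstract describes.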
Exploring Machine Learning and Deep Learning Techniques for Occluded Face Recognition: A Comprehensive Survey and Comparative Analysis
(Keny Muhamada, De Rosal Ignatius Moses Setiadi, Usman Sudibyo, Budi Widjajanto, Arnold Adimabua Ojugo)
DOI: 10.62411/faith.2024-30
Volume: 1, Issue: 2 | Citations: 0 | 26-Sep-2024
Abstract:
Recognizing faces partially covered by occlusions, such as glasses or shadows, remains a challenge in many security and surveillance applications. This study analyzes the performance of various machine learning and deep learning techniques in face recognition scenarios with occlusions. We evaluate KNN (standard and FisherFace), CNN, DenseNet, Inception, and FaceNet combined with a pre-trained DeepFace model on three public datasets: YALE, Essex Grimace, and Georgia Tech. The results show that KNN maintains the highest accuracy, reaching 100% on two datasets (Essex Grimace and YALE), even in the presence of occlusions. CNN also performs strongly, remaining at 100% accuracy on YALE both with and without occlusions, although its performance drops slightly on Essex Grimace (94% with occlusion). DenseNet and Inception show a larger drop in accuracy under occlusion: on Essex Grimace, DenseNet falls from 81% to 72% and Inception from 100% to 92%. FaceNet + DeepFace excels on the larger dataset (Georgia Tech) with 98% accuracy, but its performance drops sharply to 53% and 70% on Essex Grimace and YALE with occlusion. These findings indicate that while deep learning methods achieve high accuracy under ideal conditions, machine learning methods such as KNN are more flexible and robust to occlusion in face recognition.
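To make the occlusion setting concrete, the NumPy sketch below runs a plain k-nearest-neighbour classifier on raw pixels. The probe images have a blacked-out horizontal band. The band occlusion, the random 16x16 "faces", and the helper names `occlude_band` and `knn_predict` are hypothetical. The FisherFace (LDA-projected) variant evaluated in the study is not shown.

```python
import numpy as np


def occlude_band(img: np.ndarray, top: int, height: int) -> np.ndarray:
    """Black out a horizontal band, a crude stand-in for glasses or a shadow."""
    out = img.copy()
    out[top:top + height, :] = 0.0
    return out


def knn_predict(gallery: np.ndarray, labels: np.ndarray,
                probes: np.ndarray, k: int = 1) -> np.ndarray:
    """Classify flattened probe images by majority vote of k nearest gallery images."""
    g = gallery.reshape(len(gallery), -1)
    p = probes.reshape(len(probes), -1)
    # Squared Euclidean distance between every probe and every gallery image.
    d = ((p[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(labels[row]).argmax() for row in nearest])


# Toy "faces": one random 16x16 template per identity (not real data).
rng = np.random.default_rng(0)
gallery = rng.random((5, 16, 16))
labels = np.arange(5)
probes = np.stack([
    occlude_band(face + rng.normal(0, 0.1, face.shape), top=5, height=3)
    for face in gallery
])
pred = knn_predict(gallery, labels, probes)
print("accuracy under occlusion:", (pred == labels).mean())
```

A small occluded band only adds a bounded amount to each distance, so nearest-neighbour matching on the remaining pixels can still find the right identity. This offers one intuition for KNN's robustness in the reported results.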
Pilot Study on Enhanced Detection of Cues over Malicious Sites Using Data Balancing on the Random Forest Ensemble
(Margaret Dumebi Okpor, Fidelis Obukohwo Aghware, Maureen Ifeanyi Akazue, Andrew Okonji Eboka, Rita Erhovwo Ako, Arnold Adimabua Ojugo, Christopher Chukwufunaya Odiakaose, Amaka Patience Binitie, Victor Ochuko Geteloma, Patrick Ogholuwarami Ejeh)
DOI: 10.62411/faith.2024-14
Volume: 1, Issue: 2 | Citations: 0 | 07-Sep-2024
Abstract:
The digital revolution has rippled across society. A wide range of web content is shared online as users seek to promote monetization and asset exchange, and clients constantly seek better alternatives at lower cost to meet their value demands. From item upgrades to replacements, businesses deploy retention strategies to curb customer attrition. The advent of smartphones has brought mobility, ease of access, and portability, which in turn have fueled their adoption while exposing device vulnerabilities, as users are quite susceptible to phishing. Some users are more susceptible than others owing to their online presence and personality traits. Studies have therefore sought to reveal the lures and cues adversaries exploit to increase phishing success, and to classify web content as genuine or malicious. Our study uses the tree-based Random Forest to identify phishing cues via sentiment analysis of phishing-website datasets scraped from user accounts on social network sites. The dataset was scraped with a Python Google scraper and divided into training and test subsets. Data balancing and feature selection techniques were then applied to classify content as genuine or malicious. With Random Forest as the machine learning method of choice, the ensemble yields a prediction accuracy of 97 percent and an F1-score of 98.19%, correctly classifying 2089 instances of the test dataset and misclassifying 85.
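The instance counts reported above determine the test-set accuracy directly. The arithmetic is shown below and works out to about 96.1 percent.

```python
correct, incorrect = 2089, 85  # test-set counts reported in the abstract
accuracy = correct / (correct + incorrect)
print(f"{correct + incorrect} test instances, accuracy = {accuracy:.4f}")
# 2174 test instances, accuracy = 0.9609
```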
Effects of Data Resampling on Predicting Customer Churn via a Comparative Tree-based Random Forest and XGBoost
(Rita Erhovwo Ako, Fidelis Obukohwo Aghware, Margaret Dumebi Okpor, Maureen Ifeanyi Akazue, Rume Elizabeth Yoro, Arnold Adimabua Ojugo, De Rosal Ignatius Moses Setiadi, Chris Chukwufunaya Odiakaose, Reuben Akporube Abere, Frances Uche Emordi, Victor Ochuko Geteloma, Patrick Ogholuwarami Ejeh)
DOI: 10.62411/jcta.10562
Volume: 2, Issue: 1 | Citations: 0 | 27-Jun-2024
Abstract:
Customer attrition has become the focus of many businesses today, because the online marketplace continues to offer customers a wide range of choices and alternatives in goods, services, and products for their money. Businesses must improve value, meet customers' evolving demands and needs, strengthen their customer retention strategies, and monetize better. This study compares the effects of data resampling schemes on predicting customer churn with both Random Forest (RF) and XGBoost ensembles. The resampling schemes used are: (a) default mode, (b) random under-sampling (RUS), (c) synthetic minority oversampling technique (SMOTE), and (d) SMOTE with edited nearest neighbours (SMOTEEN). Both tree-based ensembles were constructed and trained with chi-square feature selection to assess their performance. The results show that RF achieved F1 0.9898, Accuracy 0.9973, Precision 0.9457, and Recall 0.9698 for the default, RUS, SMOTE, and SMOTEEN resampling, respectively. XGBoost outperformed Random Forest with F1 0.9945, Accuracy 0.9984, Precision 0.9616, and Recall 0.9890 for the default, RUS, SMOTE, and SMOTEEN, respectively. The results indicate that SMOTEEN resampling outperforms the other schemes, and the study attributes XGBoost's enhanced performance to hyperparameter tuning of its decision trees. Recency-frequency-monetary (RFM) retention strategies were applied and found to curb churn and improve monetization policies, placing business managers ahead of customer churn.
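Random under-sampling (RUS), one of the four schemes compared, discards randomly chosen majority-class rows until the classes are equal in size. Below is a minimal NumPy sketch with hypothetical toy churn labels; `random_undersample` is an illustrative helper, and imbalanced-learn's `RandomUnderSampler` is the usual library implementation.

```python
import numpy as np


def random_undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Randomly drop rows so every class keeps only the minority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Toy churn labels: 1 = churned (minority), 0 = retained.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
y = (rng.random(1000) < 0.15).astype(int)
X_rus, y_rus = random_undersample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_rus))
```

RUS balances classes cheaply but throws away majority-class information. Oversampling schemes such as SMOTE and SMOTEEN avoid that loss.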
Analyzing Quantum Feature Engineering and Balancing Strategies Effect on Liver Disease Classification
(Achmad Nuruddin Safriandono, De Rosal Ignatius Moses Setiadi, Akhmad Dahlan, Farah Zakiyah Rahmanti, Iwan Setiawan Wibisono, Arnold Adimabua Ojugo)
DOI: 10.62411/faith.2024-12
Volume: 1, Issue: 1 | Citations: 0 | 01-Jun-2024
Abstract:
This research aims to improve the accuracy of liver disease classification using Quantum Feature Engineering (QFE) and the Synthetic Minority Over-sampling Technique with Tomek Links (SMOTE-Tomek) data-balancing technique. Four machine learning models were compared on the Indian Liver Patient Dataset (ILPD): eXtreme Gradient Boosting (XGB), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR). QFE is applied to capture correlations and complex patterns in the data, while SMOTE-Tomek is used to address data imbalance. The results showed that QFE significantly improved LR performance, raising recall and specificity to as much as 99%, which is very important in medical diagnosis. The combination of QFE and SMOTE-Tomek gives the best results for XGB, with an accuracy of 81%, a recall of 90%, and an F1-score of 83%. This study concludes that QFE and data balancing techniques can improve liver disease classification performance in general.
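SMOTE-Tomek combines SMOTE oversampling with a cleaning step that removes Tomek links: pairs of opposite-class samples that are each other's nearest neighbour. The NumPy sketch below covers only the cleaning step, on hypothetical 1-D data; `tomek_links` and `remove_tomek_links` are illustrative helpers. Removing both members of each link is one common variant, while some implementations remove only the majority-class member.

```python
import numpy as np


def tomek_links(X: np.ndarray, y: np.ndarray) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, that form Tomek links.

    Two samples form a Tomek link when they belong to different classes and
    each is the other's nearest neighbour (Euclidean distance).
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X: np.ndarray, y: np.ndarray):
    """Drop both members of every Tomek link (one common cleaning variant)."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = np.array([i for i in range(len(y)) if i not in drop], dtype=int)
    return X[keep], y[keep]


# 1-D toy example: 2.0 (class 0) and 2.4 (class 1) are mutual nearest neighbours.
X = np.array([[0.0], [1.0], [2.0], [2.4], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(tomek_links(X, y))          # [(2, 3)]
X_clean, y_clean = remove_tomek_links(X, y)
print(X_clean.ravel().tolist())   # [0.0, 1.0, 5.0, 6.0]
```

Cleaning after oversampling removes ambiguous points on the class boundary, including synthetic ones that SMOTE may have placed too close to the other class.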