Classification Performance of Stacking Ensemble with Meta-Model of Categorical Principal Component Logistic Regression on Food Insecurity Data
(Dhita Elsha Pangestika, Anwar Fitrianto, Kusman Sadik)
DOI : 10.15294/sji.v11i4.15315
- Volume: 11,
Issue: 4,
Citations : 0 03-Mar-2025
| Last.10-Jul-2025
Abstract:
Purpose: Stacking is a type of ensemble whose base-models use different algorithms. The classification results from the base-models are categorical and tend to be associated with one another, and they become the input to the stacking meta-model. However, there are currently no definite rules for choosing the classifier that serves as the meta-model in stacking. Recent research has found that categorical principal component logistic regression (CATPCA-LR) can work well on categorical predictor variables that are associated with one another. This study therefore focuses on the classification performance of the stacking algorithm with a CATPCA-LR meta-model.
Methods: The study compared the classification performance of stacking with a CATPCA-LR meta-model against stacking with other meta-models (random forest, gradient boost, and logistic regression) and against its base-models (random forest, gradient boost, extreme gradient boost, extra trees, and light gradient boost). The research used food insecurity data from March 2022.
Result: Stacking with the CATPCA-LR meta-model performs better on the food insecurity data in terms of sensitivity, balanced accuracy, F1-Score, and G-Means. This model gives a sensitivity of 46.28%, a balanced accuracy of 59.82%, an F1-Score of 37.82%, and a G-Means of 58.26%. For specificity, the light gradient boost (LGB) algorithm gives the highest value among the algorithms compared, at 88.40%. Overall, stacking with the CATPCA-LR meta-model provides the best performance among the algorithms compared on the food insecurity data.
Novelty: This research explores the classification performance of stacking with CATPCA-LR as the meta-model.
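The metrics named in the Result paragraph can all be derived from a binary confusion matrix. The sketch below is only an illustration of the standard definitions, not the authors' code; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return imbalance-aware metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = binary_metrics(tp=40, fn=60, fp=100, tn=300)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

As a rough consistency check on the abstract's figures: a sensitivity of 46.28% and a balanced accuracy of 59.82% imply a specificity of about 73.36%, and the square root of 0.4628 × 0.7336 is roughly 0.583, close to the reported G-Means of 58.26%.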
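Perbandingan Algoritma Klasterisasi dengan Principal Component Analysis pada Indikator Sosial Ekonomi Kesehatan Jawa Timur [Comparison of Clustering Algorithms with Principal Component Analysis on Socioeconomic and Health Indicators of East Java]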
(Uswatun Hasanah, Monica Rahma Fauziah, Anwar Fitrianto, Erfiani Erfiani, L.M. Risman Dwi Jumansyah)
DOI : 10.62411/tc.v23i4.11534
- Volume: 23,
Issue: 4,
Citations : 0 27-Nov-2024
| Last.31-Jul-2025
Abstract:
K-Means and K-Medoids are used to assess socioeconomic and health indicators in East Java Province in 2023 through clustering. Using Principal Component Analysis (PCA) to reduce the dimensionality of the variables, this study groups regions by their socioeconomic and health characteristics. The data analyzed include life expectancy, poverty rate, unemployment, and access to health services. The novelty of this study lies in combining PCA with K-Medoids to produce clusters that are more accurate and more robust to outliers than approaches that typically use a single clustering technique or no dimensionality reduction. The results show that K-Medoids with PCA yields clusters that are more coherent and better separated than those of K-Means, especially in the presence of outliers. According to the Elbow and Silhouette methods, four to five clusters is the best choice. By reducing the complexity of the data, PCA improves the accuracy and efficiency of clustering, which results in better clusters. These findings are expected to help the government design better policies to address health and socioeconomic inequality in East Java.
Keywords: Clustering, Outlier, Principal Component Analysis (PCA)
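The Silhouette method mentioned in the abstract scores a clustering by comparing, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b). The following is a minimal sketch of that standard coefficient with Euclidean distance, not the authors' implementation; the function name `silhouette_score` and the toy points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself contributes 0 to the sum)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy 2-D points for illustration only (not data from the study).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
well_separated = [0, 0, 0, 1, 1, 1]
mixed_up = [0, 1, 0, 1, 0, 1]
print(silhouette_score(points, well_separated))
print(silhouette_score(points, mixed_up))
```

Values near 1 indicate compact, well-separated clusters, so in practice the score is computed for several candidate numbers of clusters and the highest-scoring ones are preferred.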
Performance of Ensemble Learning in Diabetic Retinopathy Disease Classification
(Anisa Nurizki, Anwar Fitrianto, Agus Mohamad Soleh)
DOI : 10.15294/sji.v11i2.4725
- Volume: 11,
Issue: 2,
Citations : 0 29-May-2024
| Last.10-Jul-2025
Abstract:
Purpose: This study explores diabetic retinopathy (DR), a complication of diabetes leading to blindness, emphasizing early diagnostic interventions. Leveraging Macular OCT scan data, it aims to optimize prevention strategies through tree-based ensemble learning.
Methods: Data from RSKM Eye Center Padang (October-December 2022) were categorized into four scenarios based on physician certificates: Negative & non-diagnostic DR versus Positive DR, Negative versus Positive DR, Non-Diagnosis versus Positive DR, and Negative DR versus non-Diagnosis versus Positive DR. The suitability of each scenario for ensemble learning was assessed. Class imbalance was addressed with SMOTE, while potential underfitting in random forest models was investigated. Models (RF, ET, XGBoost, DRF) were compared based on accuracy, precision, recall, and speed.
Results: Tree-based ensemble learning effectively classifies DR, with RF performing exceptionally well (80% recall, 78.15% precision). ET demonstrates superior speed. Scenario III, encompassing positive and undiagnosed DR, emerges as optimal, with the highest recall and precision values. These findings underscore the practical utility of tree-based ensemble learning in DR classification, notably in Scenario III.
Novelty: This research distinguishes itself with its unique approach to validating tree-based ensemble learning for DR classification. This validation was accomplished using Macular OCT data and physician certificates, with ETDRS scores demonstrating promising classification capabilities.
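The Methods paragraph states that class imbalance was addressed with SMOTE. The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below illustrates only that idea in plain Python; it is not the authors' pipeline or a library implementation, and the function name `smote`, its parameters, and the sample values are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen sample (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical two-feature minority class (not data from the study).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 2.0)]
new_points = smote(minority, n_new=4)
print(new_points)
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling of this kind should be applied to the training split only, so that the evaluation data contain no synthetic records.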
Village Potential Mapping: Comprehensive Cluster Analysis of Continuous and Categorical Variables with Missing Values and Outliers Dataset in Bogor, West Java, Indonesia
(Nafisa Berliana Indah Pratiwi, Indahwati, Anwar Fitrianto)
DOI : 10.15294/sji.v11i2.3903
- Volume: 11,
Issue: 2,
Citations : 0 21-May-2024
| Last.10-Jul-2025
Abstract:
Purpose: This research emphasizes the need to map village conditions and identify village potential, evaluate the effectiveness of development capability, and address the rural-urban development gap using clustering algorithms. The study employs village development index (IPD) indicators obtained from the village potential dataset, which includes various numerical and categorical indicators, to capture both tangible and intangible aspects of village potential. IPD data collection is affected by missing data and outliers. The study aims to evaluate how effectively clustering algorithms, with integrated and separate imputation processes, handle these data issues, and to track the development of villages in Bogor Regency, West Java, Indonesia, based on the village potential (PODES) dataset.
Methods: Three clustering algorithms are compared: k-prototype, simple k-medoids, and Clustering of Mixed Numerical and Categorical Data with Missing Values (k-CMM). For the first two algorithms, imputation is carried out as a separate pre-processing step, while k-CMM has an integrated imputation process. Both imputation stages use tree-based algorithms. Clusters are evaluated with internal and external criteria. The k-prototype and simple k-medoids clusterings are selected using internal validity indices and then compared with k-CMM using external validity indices for several numbers of clusters (k = 3, 4, 5).
Result: Data exploration shows that the IPD dataset for Bogor Regency, West Java, Indonesia contains about 5% outliers and six missing values in some of the chosen variables. Tree-based imputation is applied separately for k-prototype and simple k-medoids, and jointly with clustering in k-CMM. The elbow and gap statistic methods indicate an optimum number of clusters of k = 3. The internal validity indices for k-prototype and simple k-medoids also indicate that three clusters (k = 3) are optimal. Trials with several numbers of clusters (k = 3, 4, 5) for the three algorithms, evaluated with external validity indices, show that k-prototype with k = 3 performs best and is the most stable of the three on the IPD dataset, which contains many outliers.
Novelty: This research addresses issues commonly found in mixed datasets, including outliers and missing values, and how to treat them before and during cluster analysis. An improvement of Gower distance is applied in the medoid-based clustering algorithm, and the k-CMM algorithm is the first algorithm to integrate the imputation process and clustering analysis, which makes its performance in cluster analysis worth exploring.
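Medoid-based clustering of mixed data relies on a dissimilarity that handles numerical and categorical variables together. The sketch below shows the plain Gower distance (range-scaled absolute difference for numeric features, simple mismatch for categorical ones), not the improved variant the abstract refers to. The function name `gower_distance`, the village records, and the feature ranges are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Plain Gower distance between two mixed-type records.

    numeric_ranges maps the index of each numeric feature to its range
    (max - min over the dataset); all other features are treated as
    categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            total += abs(x - y) / numeric_ranges[i]  # range-scaled difference
        else:
            total += 0.0 if x == y else 1.0  # simple mismatch
    return total / len(a)


# Hypothetical records: (population, km to market, road type, has clinic).
village_a = (1200, 4.0, "asphalt", "yes")
village_b = (3200, 9.0, "gravel", "yes")
ranges = {0: 8000, 1: 20.0}  # assumed ranges of the two numeric features
print(gower_distance(village_a, village_b, ranges))
```

This sketch assumes complete records; handling missing values, as k-CMM is described as doing, would need additional logic that is not shown here.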
Eksplorasi dan Klasifikasi K-NN Terhadap Kejadian Luar Biasa Diare di Jawa Barat [Exploration and K-NN Classification of Extraordinary Diarrhea Outbreak Events in West Java]
(Tahira Fulazzaky, Yully Sofyah Waode, Anwar Fitrianto, Erfiani Erfiani, Alfa Nugraha Pradana)
DOI : 10.33633/tc.v22i4.9281
- Volume: 22,
Issue: 4,
Citations : 0 28-Nov-2023
| Last.31-Jul-2025
Abstract:
The aim of this study is to examine how water quality and sanitation affect Extraordinary Events (Kejadian Luar Biasa, KLB) of diarrhea in West Java Province, Indonesia, using data from the 2021 Village Potential Data Collection (Pendataan Potensi Desa, PODES). Diarrhea is a serious public health problem in Indonesia, especially among children under five, and poor water quality and sanitation are among its main causes. We apply the K-Nearest Neighbors (K-NN) algorithm to classify areas that experienced a diarrhea KLB. Data exploration shows significant variation in the number of diarrhea cases across regencies and cities in West Java. To handle data imbalance, we apply Random Under Sampling, Random Over Sampling, and the Synthetic Minority Oversampling Technique (SMOTE). The analysis shows that the K-NN model with SMOTE achieves the highest accuracy, 71.28%. Nevertheless, the F1 scores of all models tend to be low, which indicates that classifying areas with a diarrhea KLB remains challenging. This study provides important insight into the relationship between water quality, sanitation, and diarrhea KLB in West Java, and identifies areas that need more attention in efforts to prevent and control diarrheal disease. The results can serve as a basis for designing more effective health programs in areas with high diarrhea incidence.
Keywords: K-Nearest Neighbors (K-NN) algorithm, SMOTE, Data imbalance and resampling techniques, Water quality and sanitation, Diarrhea prevention and control programs.
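K-NN classifies a new observation by majority vote among the k closest training observations. The following minimal sketch shows that rule with Euclidean distance; it is not the authors' model, and the function name `knn_predict`, the two features, and all values are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up features: (share of households with clean water,
#                    share of households with proper sanitation).
train_x = [(0.90, 0.80), (0.85, 0.90), (0.80, 0.75),
           (0.30, 0.20), (0.40, 0.35), (0.25, 0.30)]
train_y = ["no outbreak", "no outbreak", "no outbreak",
           "outbreak", "outbreak", "outbreak"]

print(knn_predict(train_x, train_y, (0.35, 0.25)))
```

With imbalanced classes, the majority vote tends to favor the larger class, which is one reason resampling such as SMOTE is applied to the training data before fitting and why F1 is reported alongside accuracy.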