SciRepID - Scientific Publication Search

Public Sentiment Analysis of the Hashtag #kaburajadulu Using a Combination of Support Vector Machine (SVM) and Random Forest Algorithms

Yuma Akbar; Frencis Matheos Sarimolle; Dwi Swasono Rachmad; Muhammad Derry Oktaviandi

International Journal of Applied Mathematics and Computing • 2026 • Asosiasi Riset Ilmu Matematika dan Sains Indonesia

This study aims to analyze public sentiment toward the hashtag #KaburAjaDulu, which has circulated widely on the social media platform X (formerly Twitter). The hashtag reflects the growing anxiety among the public, especially younger generations, regarding socio-political issues in Indonesia. The data were collected using web scraping techniques, focusing on user-generated tweets that contain the hashtag. A comprehensive text preprocessing phase was conducted to clean the raw data by removing irrelevant elements such as URLs, emojis, numbers, and punctuation. The research applies a hybrid classification approach using a combination of Support Vector Machine (SVM) and Random Forest algorithms to categorize sentiment into three classes: positive, negative, and neutral. The performance of the model was evaluated using metrics such as accuracy, precision, recall, and F1-score to determine the effectiveness of the classification. The study aims to demonstrate that combining algorithms can improve classification performance compared to using a single algorithm. This research contributes to the field of sentiment analysis and provides valuable insights for researchers, policymakers, and social observers in understanding public opinion trends in digital media.

https://doi.org/10.62951/ijamc.v2i3.129


Comparison of Machine Learning Models for Predicting Telecommunications Customer Churn

Santo Dewatmoko; Nadia Rizky Vindiazhari; Zaenal Muttaqien

Jurnal Manajemen Riset Inovasi • 2026 • Pusat Riset dan Inovasi Nasional

This study examines customer churn prediction in subscription-based telecommunications from a digital marketing perspective using machine learning. The analysis utilizes a secondary dataset of 7,043 customer records that simulate behavioral, contractual, and financial attributes commonly found in telecom services. Three classification algorithms (Logistic Regression, Random Forest, and Gradient Boosting) are applied to model churn behavior. Data preprocessing includes handling missing values, encoding categorical variables, and splitting data into training and testing sets. Model performance is evaluated using accuracy, recall, and ROC-AUC, with emphasis on recall due to its importance in identifying at-risk customers. The results show that Gradient Boosting achieves the highest overall performance with an ROC-AUC of 0.84, while Logistic Regression provides relatively higher recall. Key drivers of churn include short-term contracts, higher monthly charges, and lower service engagement. However, recall remains moderate, indicating limitations in capturing complex behavioral factors. These findings suggest the need to combine predictive models with behavioral insights and highlight the importance of early customer engagement and long-term retention strategies.
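
For reference, an ROC-AUC value such as the one reported above can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way on toy data. The function name `roc_auc` and the example labels and scores are illustrative assumptions, not the study's data or code.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC for binary labels 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels (1 = churned) and model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.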

https://doi.org/10.55606/mri.v4i2.8976


Explainable Artificial Intelligence Framework for Interpretable Fault Diagnosis and Remaining Useful Life Prediction in Smart Industrial Rotating Machinery

Suyahman Suyahman; Deny Prasetyo; Ahmad Budi Trisnawan; Ardy Wicaksono; Muhamad Furqon

International Journal of Mechanical, Industrial and Control Systems Engineering • 2026 • Asosiasi Riset Ilmu Teknik Indonesia

Predictive maintenance (PdM) plays a crucial role in modern industrial systems by minimizing downtime, reducing maintenance costs, and optimizing asset performance. However, many predictive models operate as “black box” systems, limiting transparency and making it difficult for operators to interpret their outputs. This study aims to integrate Explainable Artificial Intelligence (XAI) techniques with Remaining Useful Life (RUL) prediction models to improve both accuracy and interpretability. Various machine learning and deep learning approaches, including Support Vector Machines (SVM), Random Forest (RF), XGBoost, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), are employed to predict RUL using real-time sensor data from rotating machinery. XAI methods such as SHAP, LIME, and attention mechanisms are applied to provide human-understandable explanations of model predictions. The models are evaluated based on accuracy, Root Mean Square Error (RMSE), and interpretability scores. The results show that XAI-enhanced models outperform traditional approaches in predictive performance while offering greater transparency. These explanations help maintenance engineers better understand the factors influencing predictions, thereby improving decision-making and trust in the system. Nevertheless, the integration of XAI introduces additional computational complexity, which may pose challenges for large-scale industrial implementation. Overall, this study highlights the potential of combining XAI with RUL prediction to develop more reliable, transparent, and effective predictive maintenance solutions.

https://doi.org/10.61132/ijmicse.v1i1.402


Identification of Housing Eligibility Status Using Family Data in Samarinda City

Antonieta Aryuka Paskalia Nggotu; Hamdani, Hamdani; Anindita Septiarini

International Journal of Applied Mathematics and Computing • 2026 • Asosiasi Riset Ilmu Matematika dan Sains Indonesia

The issue of uninhabitable houses still requires an accurate identification mechanism because the manual data collection process has the potential to be time-consuming, costly, and subject to subjectivity in determining aid priorities. This study aims to develop a classification model to identify habitable and uninhabitable houses based on family socioeconomic data using the Random Forest algorithm. The research method includes data preprocessing, data division using stratified split in three scenarios, baseline model development, and optimization through hyperparameter tuning using GridSearchCV with 3-fold cross-validation and balanced class_weight parameters. The data used includes variables such as education type, employment status, occupation type, number of family members, and family insurance type. The test results show that the 70:30 data division scenario after tuning provides the best performance with a recall value of 0.5797 for uninhabitable houses and an F1-score of 0.4746. Feature importance analysis shows that education type and employment status are the most influential variables in the classification. The results of this study show that the model built is capable of increasing sensitivity in detecting uninhabitable houses to support more objective field survey prioritization.
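
As a side note on the balanced `class_weight` setting mentioned above: assuming the study used scikit-learn (where `GridSearchCV` is defined), the `"balanced"` option weights each class by `n_samples / (n_classes * class_count)`, so the rarer class counts for more during training. The plain-Python sketch below reproduces that formula. The helper name `balanced_class_weights` and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced sample: 8 habitable houses, 2 uninhabitable.
toy_labels = ["habitable"] * 8 + ["uninhabitable"] * 2
print(balanced_class_weights(toy_labels))
# -> {'habitable': 0.625, 'uninhabitable': 2.5}
```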

https://doi.org/10.62951/ijamc.v3i2.294


Transparent AI for Welfare Programs: Explainable Fraud Detection Using Publicly Available Administrative Data

Sutrisno, Sutrisno; Winny, Purbaratri

Journal of Information Technology and Computer Science • 2026 • International Forum of Researchers and Lecturers

This study examines the application of Transparent Artificial Intelligence (AI) for fraud detection in public welfare programs using publicly available administrative data. Persistent challenges in welfare governance, such as misallocation, fraud, and data inaccuracy, necessitate analytical frameworks that are both effective and explainable. The research aims to design and evaluate an interpretable anomaly detection system capable of identifying irregularities in welfare distribution while maintaining transparency and accountability. Methodologically, the study employs two unsupervised models, Isolation Forest (IF) and Local Outlier Factor (LOF), to detect anomalies in sub-district-level welfare data, incorporating features such as population size, number of beneficiaries, and coverage ratio. An Explainable AI (XAI) framework integrating surrogate Random Forests, Permutation Feature Importance (PFI), and local linear surrogates (LIME-like) is applied to ensure interpretability of both global and local model behaviors. Findings reveal that receivers per 1000 population and percentage coverage are dominant determinants of anomaly scores. Fifteen administrative units were flagged for potential inconsistencies suggesting over- or under-reporting of beneficiaries. Cross-validation between the IF and LOF models confirmed consistency in identifying anomalous regions. The integrated XAI explanations enhance transparency, enabling policymakers and auditors to trace the rationale behind detected anomalies. In conclusion, the proposed Transparent AI framework demonstrates that combining anomaly detection with interpretability tools can strengthen accountability and fairness in welfare administration. It offers a reproducible, ethical, and data-driven approach to social program monitoring, reinforcing public trust and supporting responsible AI governance.

https://doi.org/10.70062/globalscience.v2i1.184


Benchmarking Machine Learning Models for Large-Scale Loan Default Prediction Using Real Data

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana

Journal of Information Technology and Computer Science • 2026 • International Forum of Researchers and Lecturers

This research benchmarks multiple machine learning (ML) algorithms for large-scale loan default prediction using a real-world dataset of 255,000 borrower records, where default cases represent only ~9–12% of total observations. The study addresses the persistent gap in comparative analyses of ML models that balance predictive accuracy, interpretability, and computational efficiency for credit risk assessment. Six algorithmic families (Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, and Artificial Neural Networks (ANN)) and a Stacked Ensemble were evaluated using standardized preprocessing, hybrid imbalance handling (SMOTE, class weighting, under-sampling), and comprehensive evaluation metrics (AUC, F1, Recall, Precision, PR-AUC, and Brier Score). Empirical results show Logistic Regression achieved the highest AUC of 0.732, outperforming nonlinear models under the baseline configuration, while LightGBM attained perfect recall (1.0) but low precision (0.116), indicating over-prediction of defaults. Gradient boosting models demonstrated robust calibration (Brier ≈ 0.114–0.116) and the best computational efficiency, with LightGBM showing the fastest training and lowest memory use. CatBoost exhibited strong recall but the slowest computation, and ANN underperformed on tabular data (AUC ≈ 0.56). The Stacked Ensemble delivered balanced results with AUC = 0.664 and improved overall stability. These findings confirm that boosting-based models, particularly LightGBM and CatBoost, offer superior scalability and calibration, whereas Logistic Regression remains a valuable interpretable baseline. The study concludes that effective default prediction requires integrating rebalancing, calibration, and threshold optimization to enhance recall and operational deployment reliability in large-scale credit ecosystems.
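
One way to read the combination of perfect recall and precision of 0.116 is that a classifier flagging every record as a default reaches recall 1.0 while its precision equals the base rate of defaults. The sketch below illustrates this on synthetic labels. The 11.6% default rate is an assumption chosen to fall inside the reported ~9–12% range, and the example does not establish how the study's LightGBM model actually behaved.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels where 1 marks a default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: 116 defaults among 1,000 borrowers (11.6% base rate).
y_true = [1] * 116 + [0] * 884
y_pred = [1] * len(y_true)  # flag every borrower as a default

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # precision equals the base rate; recall is 1.0
```

This is why the abstract's call for threshold optimization matters: raising the decision threshold trades some recall for higher precision.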

https://doi.org/10.70062/globalscience.v2i1.181


Comparison of SVM and Random Forest Algorithms in Sentiment Analysis of Shopee Reviews on the Google Play Store with ANOVA

Eko Susanto; Sharipuddin Sharipuddin; Benni Purnama

Prosiding Seminar Nasional Ilmu Teknik • 2026 • Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce in Indonesia, particularly the Shopee platform, has generated a large volume of user reviews on the Google Play Store, which can be analyzed to understand consumer sentiment. This study aims to compare the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in binary sentiment classification (positive and negative) on Shopee reviews, as well as to statistically test the significance of their differences using One-Way ANOVA. A total of 400,498 reviews were collected via web scraping, preprocessed through text normalization, tokenization, and Indonesian language stemming, and then feature-extracted using TF-IDF and Count Vectorizer. Evaluation results show that SVM achieved an accuracy of 91.77%, precision of 91.49%, recall of 91.77%, and F1-Score of 91.56%, while RF achieved an accuracy of 90.07%, precision of 91.68%, recall of 90.07%, and F1-Score of 90.55%. ANOVA confirmed that the performance difference between the two algorithms is statistically significant (p-value = 0.0007) with a large effect size (η² = 0.1815). Therefore, SVM is recommended as a more optimal and consistent algorithm for automated sentiment analysis of Indonesian e-commerce reviews, while also providing a replicable methodological framework for similar future research.
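
For context on the effect size reported above, η² in a one-way ANOVA is the between-group sum of squares divided by the total sum of squares, and a common rule of thumb (Cohen) treats values of about 0.14 or more as large. The sketch below computes η² for two groups of scores. The abstract does not list the underlying per-run scores, so the accuracies used here are hypothetical, as is the function name `eta_squared`.

```python
def eta_squared(groups):
    """Effect size for one-way ANOVA: SS_between / SS_total."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Variation explained by group membership.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Total variation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    if ss_total == 0:
        raise ValueError("all values are identical; eta squared is undefined")
    return ss_between / ss_total


# Hypothetical accuracies from repeated runs of each classifier.
svm_scores = [0.92, 0.91, 0.93]
rf_scores = [0.90, 0.89, 0.91]
print(round(eta_squared([svm_scores, rf_scores]), 4))
```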

https://doi.org/10.61132/prosemnasproit.v2i2.177


Machine Learning Model for Customer Loyalty Classification Using Random Forest

Tengku Syahvina Rival Dini; Rani Chantika; Pebi Mina Husania; Puji Sri Alhirani

Prosiding Seminar Nasional Ilmu Teknik • 2026 • Asosiasi Riset Ilmu Teknik Indonesia

This research develops a machine learning model to classify customer loyalty using the Random Forest algorithm. Customer churn is a critical issue that reduces revenue and increases acquisition costs. A dataset of 50,000 customers from global e-commerce and subscription platforms was processed through data cleaning, imputation, outlier handling, and class balancing with SMOTE. The Random Forest model was built as a baseline and optimized with hyperparameter tuning. Evaluation using accuracy, precision, recall, and F1-score shows that the optimized model achieved 90.81% accuracy and 83.87% F1-score, outperforming previous Naïve Bayes approaches. Feature importance analysis highlights customer service interactions, lifetime value, and demographic factors as key predictors of churn. These findings demonstrate Random Forest’s effectiveness in churn prediction and provide practical insights for customer retention strategies.

https://doi.org/10.61132/prosemnasproit.v2i2.202


Classification Analysis of the Effect of Digital Payment Method Failures and Limitations on Customer Churn Using Decision Tree

Dewa Ayu Putu Angelina Dewi; I Wayan Sudiarsa; Ni Made Dwi Junita Sariyani; Yuvensia Armelia Sumu; Gusti Ngurah Abhimanyu

Jurnal Bisnis Inovatif dan Digital • 2026 • Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

The rapid development of digital technology has led to increased adoption of digital payment methods in online transaction-based businesses. However, in practice, failures and limitations in the implementation of digital payment systems still occur, potentially disrupting transaction processes and reducing customer convenience. Payment-related obstacles may result in transaction cancellations and increase the risk of customer churn. This study aims to analyze the impact of failures and limitations in digital payment methods on customer churn using a classification-based approach. The data used in this research are secondary e-commerce customer data obtained from the Kaggle platform, including transaction information, payment methods, customer behavior, and historical transaction records. The research methodology consists of data preprocessing, time-based feature engineering, and classification modeling using logistic regression, decision tree, and random forest algorithms. Model performance is evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The results indicate that the decision tree model demonstrates superior capability in identifying churned customers compared to the other models, although it does not always achieve the highest accuracy. In addition to digital payment methods, other factors such as purchase value, transaction frequency, purchase timing patterns, and product return rates also influence customer churn. The findings highlight the importance of optimizing digital payment systems as part of customer experience enhancement strategies and customer retention efforts in online transaction-based businesses.

https://doi.org/10.61132/jubid.v3i1.1232


Analysis and Prediction of Customer Churn on Subscription-Based Streaming Platforms Using the Random Forest Method

Imakulata Kresnawati M Bili; I Wayan Sudiarta; Maria Yuditia Wungabelen; Ni Kadek Alika Rosdiana; Putri Rafiana

Jurnal Bisnis Inovatif dan Digital • 2026 • Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

Customer churn is a strategic challenge for digital streaming platforms because it directly impacts revenue and business sustainability. This study aims to analyze the factors influencing customer churn and develop a churn prediction model using the Random Forest algorithm. The study uses a quantitative approach with an explanatory design and utilizes secondary data from the Netflix Customer Churn and Engagement Dataset available on Kaggle. The dataset consists of 1,000 customer records with 16 variables covering demographic characteristics, service usage behavior, financial condition, and customer satisfaction level. The data were processed through preprocessing, one-hot encoding, and a 70:30 split between training and test data. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The results show that the Random Forest model produces an accuracy of 53.7%, precision of 56.3%, recall of 63.6%, F1-score of 59.7%, and ROC-AUC of 0.534, indicating moderate predictive ability that is only slightly better than random classification. Feature importance analysis revealed that user engagement levels, such as viewing duration and frequency of interactions, were the most dominant factors influencing churn, followed by economic factors and customer satisfaction. The results of this study are expected to provide a basis for streaming platforms to design more effective customer retention strategies.
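
As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_from` is an illustrative assumption; only the precision and recall inputs come from the abstract.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 56.3% and recall 63.6% as reported in the abstract.
print(round(f1_from(0.563, 0.636), 3))  # -> 0.597, matching the reported 59.7%
```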

https://doi.org/10.61132/jubid.v3i1.1226


Analysis of Retail Business Sales Prediction Using Decision Tree and Random Forest Methods

Agung Narayana Adhi Putra; I Wayan Sudiarsa; I Kadek Adi Gunawan; Kadek Bagus Karunia Dwi Dharmayasa; I Wayan Eka Saputra

Saturnus: Jurnal Teknologi dan Sistem Informasi • 2026 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The retail industry generates an extremely large and continuously growing volume of transactional data along with the advancement of digital technology, thereby requiring sophisticated and systematic data analysis approaches to support effective and evidence-based business decision-making. This study aims to analyze retail sales data by utilizing the Retail Sales Dataset obtained from the Kaggle platform, which consists of 100,000 transaction records and broadly represents the characteristics of retail transactions. The main focus of this study is to classify product categories and predict customer segments, including the identification of high-spending customers (high spenders), based on demographic attributes such as age and gender, as well as various transaction-related features. The research methodology includes data preprocessing, label encoding, and feature engineering to generate additional variables, including Age_Group, Is_Holiday, and Spender_Group, which are expected to enhance the predictive capability of the models. Several machine learning algorithms, namely Decision Tree, Random Forest, and XGBoost, were implemented and evaluated to compare their respective performance. The experimental results indicate that multiclass product category classification achieves relatively low accuracy, ranging from 27% to 34%. These findings suggest the high complexity of retail data and highlight the need for further model optimization, class balancing techniques, and feature refinement to improve predictive performance in future studies.

https://doi.org/10.61132/saturnus.v4i1.1409


Analysis of Customer Churn Prediction in the E-Commerce Sector Based on Transaction Behavior Using a Machine Learning Approach

Nadeerah Hani’ Fauziyyah; I Wayan Sudiarsa; Ida Ayu Eka Sastradewi; Kadek Agustine Yueyin Parisya; Sartika Sartika

Jurnal Manajemen Bisnis Digital Terkini • 2026 • Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

Customer churn is a critical issue for the e-commerce industry because it directly impacts revenue, customer loyalty, and long-term business sustainability. High churn rates indicate that a business is unable to retain existing customers and must bear the higher cost of acquiring new ones. Therefore, a precise analytical approach is needed to identify behavior patterns of customers who are likely to churn. This study analyzes and predicts customer churn using machine learning methods. It uses the E-Commerce Customer Churn 2025 dataset obtained from Kaggle, which consists of 10,000 customer records and fifteen variables covering transaction behavior, customer characteristics, and churn status. The research comprised data preprocessing, descriptive analysis, exploratory data analysis (EDA), and classification model development using Logistic Regression and Random Forest algorithms. Model evaluation was conducted using a confusion matrix and the Receiver Operating Characteristic (ROC) curve to assess each model's accuracy and its ability to distinguish between churned and non-churned customers. The results showed that the Random Forest model performed better than Logistic Regression, with an ROC-AUC of 1.00. Furthermore, feature importance analysis revealed that the days_since_last_purchase variable was the most dominant factor in predicting customer churn. These findings are expected to help e-commerce companies design more effective, data-driven customer retention strategies.

https://doi.org/10.61132/jumbidter.v3i1.1228


Android Malware Detection Using Machine Learning with SMOTE-Tomek Data Balancing

Masari, Maryam Sufiyanu; Danladi, Maiauduga Abdullahi; Onyinye, Ilori Loretta; Tohomdet, Loreta Katok

Journal of Computing Theories and Applications • 2026 • Universitas Dian Nuswantoro

This study presents a comprehensive comparative analysis of four traditional machine learning algorithms (Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine) for Android malware detection using the preprocessed TUANDROMD dataset, which comprises 4,465 instances and 241 features representing both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in managing imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to ensure balanced training data and fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and models were assessed using key performance metrics including accuracy, precision, recall, F1-score, and ROC AUC. The findings revealed that the proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998, surpassing state-of-the-art approaches. These results affirm the superior predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when integrated with class-balancing techniques, provide reliable and efficient malware detection across imbalanced datasets. For future research, the study recommends exploring advanced hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or other meta-heuristic optimization techniques to further enhance detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.

https://doi.org/10.62411/jcta.15084


Performance Analysis of Machine Learning Approaches for Coffee Plant Leaf Disease Detection

Purnomo, Rosyana Fitria; Yodhi Yuniarthe; Hilda Dwi Yunita; Fatimah Fahurian +1 more

Jurnal Elektronika dan Komputer • 2026 • STEKOM PRESS

Detection and identification of plant diseases is critical to the success and efficiency of agricultural production. Plant disease outbreaks are becoming more frequent throughout the world, and the presence of these diseases in cultivated plants has a significant impact on productivity. Researchers are therefore focusing on developing effective and reliable plant disease detection methods, so that farmers can use early detection to minimize future losses. This article discusses machine learning approaches, namely decision trees, K-nearest neighbors, naive Bayes, support vector machines (SVM), and random forests, for detecting coffee leaf diseases using leaf images. These classifiers were evaluated and compared to determine the most suitable plant disease prediction model with the highest accuracy. Compared with the other classification algorithms, the SVM algorithm achieves the highest accuracy of 99.75%. The trained models are intended to help farmers quickly detect and classify new diseases in images at an early stage as a preventive measure.

https://doi.org/10.51903/elkom.v18i2.3302
