SciRepID - Scientific Publication Search

Publication Search

41,520 articles from 397 journals · 1,447 citations tracked

Yuma Akbar; Frencis Matheos Sarimolle; Dwi Swasono Rachmad; Muhammad Derry Oktaviandi

International Journal of Applied Mathematics and Computing 2026 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

This study aims to analyze public sentiment toward the hashtag #KaburAjaDulu, which has circulated widely on the social media platform X (formerly Twitter). The hashtag reflects the growing anxiety among the public, especially younger generations, regarding socio-political issues in Indonesia. The data were collected using web scraping techniques, focusing on user-generated tweets that contain the hashtag. A comprehensive text preprocessing phase was conducted to clean the raw data by removing irrelevant elements such as URLs, emojis, numbers, and punctuation. The research applies a hybrid classification approach using a combination of Support Vector Machine (SVM) and Random Forest algorithms to categorize sentiment into three classes: positive, negative, and neutral. The performance of the model was evaluated using metrics such as accuracy, precision, recall, and F1-score to determine the effectiveness of the classification. The study aims to demonstrate that combining algorithms can improve classification performance compared to using a single algorithm. This research contributes to the field of sentiment analysis and provides valuable insights for researchers, policymakers, and social observers in understanding public opinion trends in digital media.
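The cleaning step described in this abstract (removing URLs, emojis, numbers, and punctuation) can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline: the function name `clean_tweet`, the emoji code-point ranges, and the sample tweet are all assumptions made for the example.

```python
import re
import string

# Assumption: URLs are anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: a rough emoji range (pictographs, dingbats, variation selector).
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")
DIGIT_RE = re.compile(r"\d+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Strip URLs, emojis, digits, and punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # URLs first, before punctuation is removed
    text = EMOJI_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


# Hypothetical sample tweet, invented for illustration.
sample = "Capek banget 😩 #KaburAjaDulu 2025!!! https://t.co/abc123"
cleaned = clean_tweet(sample)
print(cleaned)
```

URLs are removed before punctuation so that the dots and slashes inside a link do not leave fragments behind.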

Santo Dewatmoko; Nadia Rizky Vindiazhari; Zaenal Muttaqien

Jurnal Manajemen Riset Inovasi 2026 Pusat Riset dan Inovasi Nasional

This study examines customer churn prediction in subscription-based telecommunications from a digital marketing perspective using machine learning. The analysis utilizes a secondary dataset of 7,043 customer records that simulate behavioral, contractual, and financial attributes commonly found in telecom services. Three classification algorithms (Logistic Regression, Random Forest, and Gradient Boosting) are applied to model churn behavior. Data preprocessing includes handling missing values, encoding categorical variables, and splitting data into training and testing sets. Model performance is evaluated using accuracy, recall, and ROC-AUC, with emphasis on recall due to its importance in identifying at-risk customers. The results show that Gradient Boosting achieves the highest overall performance with an ROC-AUC of 0.84, while Logistic Regression provides relatively higher recall. Key drivers of churn include short-term contracts, higher monthly charges, and lower service engagement. However, recall remains moderate, indicating limitations in capturing complex behavioral factors. These findings suggest the need to combine predictive models with behavioral insights and highlight the importance of early customer engagement and long-term retention strategies.

Suyahman Suyahman; Deny Prasetyo; Ahmad Budi Trisnawan; Ardy Wicaksono; Muhamad Furqon

Predictive maintenance (PdM) plays a crucial role in modern industrial systems by minimizing downtime, reducing maintenance costs, and optimizing asset performance. However, many predictive models operate as “black box” systems, limiting transparency and making it difficult for operators to interpret their outputs. This study aims to integrate Explainable Artificial Intelligence (XAI) techniques with Remaining Useful Life (RUL) prediction models to improve both accuracy and interpretability. Various machine learning and deep learning approaches, including Support Vector Machines (SVM), Random Forest (RF), XGBoost, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), are employed to predict RUL using real-time sensor data from rotating machinery. XAI methods such as SHAP, LIME, and attention mechanisms are applied to provide human-understandable explanations of model predictions. The models are evaluated based on accuracy, Root Mean Square Error (RMSE), and interpretability scores. The results show that XAI-enhanced models outperform traditional approaches in predictive performance while offering greater transparency. These explanations help maintenance engineers better understand the factors influencing predictions, thereby improving decision-making and trust in the system. Nevertheless, the integration of XAI introduces additional computational complexity, which may pose challenges for large-scale industrial implementation. Overall, this study highlights the potential of combining XAI with RUL prediction to develop more reliable, transparent, and effective predictive maintenance solutions.

Antonieta Aryuka Paskalia Nggotu; Hamdani, Hamdani; Anindita Septiarini

International Journal of Applied Mathematics and Computing 2026 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

The issue of uninhabitable houses still requires an accurate identification mechanism because the manual data collection process has the potential to be time-consuming, costly, and subject to subjectivity in determining aid priorities. This study aims to develop a classification model to identify habitable and uninhabitable houses based on family socioeconomic data using the Random Forest algorithm. The research method includes data preprocessing, data division using stratified split in three scenarios, baseline model development, and optimization through hyperparameter tuning using GridSearchCV with 3-fold cross-validation and balanced class_weight parameters. The data used includes variables such as education type, employment status, occupation type, number of family members, and family insurance type. The test results show that the 70:30 data division scenario after tuning provides the best performance with a recall value of 0.5797 for uninhabitable houses and an F1-score of 0.4746. Feature importance analysis shows that education type and employment status are the most influential variables in the classification. The results of this study show that the model built is capable of increasing sensitivity in detecting uninhabitable houses to support more objective field survey prioritization.

Sutrisno, Sutrisno; Winny, Purbaratri

Journal of Information Technology and Computer Science 2026 International Forum of Researchers and Lecturers

This study examines the application of Transparent Artificial Intelligence (AI) for fraud detection in public welfare programs using publicly available administrative data. Persistent challenges in welfare governance such as misallocation, fraud, and data inaccuracy necessitate analytical frameworks that are both effective and explainable. The research aims to design and evaluate an interpretable anomaly detection system capable of identifying irregularities in welfare distribution while maintaining transparency and accountability. Methodologically, the study employs two unsupervised models, Isolation Forest (IF) and Local Outlier Factor (LOF), to detect anomalies in sub-district-level welfare data, incorporating features such as population size, number of beneficiaries, and coverage ratio. An Explainable AI (XAI) framework integrating surrogate Random Forests, Permutation Feature Importance (PFI), and local linear surrogates (LIME-like) is applied to ensure interpretability of both global and local model behaviors. Findings reveal that receivers per 1000 population and percentage coverage are dominant determinants of anomaly scores. Fifteen administrative units were flagged for potential inconsistencies suggesting over- or under-reporting of beneficiaries. Cross-validation between the IF and LOF models confirmed consistency in identifying anomalous regions. The integrated XAI explanations enhance transparency, enabling policymakers and auditors to trace the rationale behind detected anomalies. In conclusion, the proposed Transparent AI framework demonstrates that combining anomaly detection with interpretability tools can strengthen accountability and fairness in welfare administration. It offers a reproducible, ethical, and data-driven approach to social program monitoring, reinforcing public trust and supporting responsible AI governance.

Devianto, Yudo; Saragih, Rusmin; Cahyana, Yana

Journal of Information Technology and Computer Science 2026 International Forum of Researchers and Lecturers

This research benchmarks multiple machine learning (ML) algorithms for large-scale loan default prediction using a real-world dataset of 255,000 borrower records, where default cases represent only ~9–12% of total observations. The study addresses the persistent gap in comparative analyses of ML models that balance predictive accuracy, interpretability, and computational efficiency for credit risk assessment. Six algorithmic families were evaluated: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, Artificial Neural Networks (ANN), and Stacked Ensemble. All were run with standardized preprocessing, hybrid imbalance handling (SMOTE, class weighting, under-sampling), and comprehensive evaluation metrics (AUC, F1, Recall, Precision, PR-AUC, and Brier Score). Empirical results show Logistic Regression achieved the highest AUC of 0.732, outperforming nonlinear models under the baseline configuration, while LightGBM attained perfect recall (1.0) but low precision (0.116), indicating over-prediction of defaults. Gradient boosting models demonstrated robust calibration (Brier ≈ 0.114–0.116) and the best computational efficiency, with LightGBM showing the fastest training and lowest memory use. CatBoost exhibited strong recall but the slowest computation, and ANN underperformed on tabular data (AUC ≈ 0.56). The Stacked Ensemble delivered balanced results with AUC = 0.664 and improved overall stability. These findings confirm that boosting-based models, particularly LightGBM and CatBoost, offer superior scalability and calibration, whereas Logistic Regression remains a valuable interpretable baseline. The study concludes that effective default prediction requires integrating rebalancing, calibration, and threshold optimization to enhance recall and operational deployment reliability in large-scale credit ecosystems.
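The pairing of perfect recall (1.0) with precision of 0.116 is what one would see if a classifier flagged every borrower as a default on data with an 11.6% default rate, which falls inside the ~9–12% range the abstract reports. The sketch below illustrates that arithmetic only. The toy labels are hypothetical, and the abstract does not state that LightGBM literally predicted every case as positive.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 116 defaults among 1,000 borrowers (11.6% base rate).
y_true = [1] * 116 + [0] * 884
# A degenerate classifier that predicts "default" for everyone.
y_pred = [1] * len(y_true)

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

When every case is predicted positive, recall is 1.0 by construction and precision collapses to the base rate, which is why the abstract's call for threshold optimization matters.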

Eko Susanto; Sharipuddin Sharipuddin; Benni Purnama

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce in Indonesia, particularly the Shopee platform, has generated a large volume of user reviews on the Google Play Store, which can be analyzed to understand consumer sentiment. This study aims to compare the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in binary sentiment classification (positive and negative) on Shopee reviews, as well as to statistically test the significance of their differences using One-Way ANOVA. A total of 400,498 reviews were collected via web scraping, preprocessed through text normalization, tokenization, and Indonesian language stemming, and then feature-extracted using TF-IDF and Count Vectorizer. Evaluation results show that SVM achieved an accuracy of 91.77%, precision of 91.49%, recall of 91.77%, and F1-Score of 91.56%, while RF achieved an accuracy of 90.07%, precision of 91.68%, recall of 90.07%, and F1-Score of 90.55%. ANOVA confirmed that the performance difference between the two algorithms is statistically significant (p-value = 0.0007) with a large effect size (η² = 0.1815). Therefore, SVM is recommended as a more optimal and consistent algorithm for automated sentiment analysis of Indonesian e-commerce reviews, while also providing a replicable methodological framework for similar future research.
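For both models the reported recall equals the reported accuracy (91.77% for SVM, 90.07% for RF). That pattern is expected when recall is averaged across classes weighted by class support, because support-weighted recall is algebraically identical to accuracy. The abstract does not say which averaging was used, so this is an assumption. The toy labels below are invented for the demonstration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (hits / n)
    return score


# Hypothetical imbalanced toy labels (8 positive, 2 negative reviews).
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 7 + ["neg"] + ["neg", "pos"]

acc = accuracy(y_true, y_pred)
w_rec = weighted_recall(y_true, y_pred)
print(acc, w_rec)
```

Each class contributes (n / total) × (hits / n) = hits / total, so the weighted sum is the overall hit rate.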

Tengku Syahvina Rival Dini; Rani Chantika; Pebi Mina Husania; Puji Sri Alhirani

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

This research develops a machine learning model to classify customer loyalty using the Random Forest algorithm. Customer churn is a critical issue that reduces revenue and increases acquisition costs. A dataset of 50,000 customers from global e-commerce and subscription platforms was processed through data cleaning, imputation, outlier handling, and class balancing with SMOTE. The Random Forest model was built as a baseline and optimized with hyperparameter tuning. Evaluation using accuracy, precision, recall, and F1-score shows that the optimized model achieved 90.81% accuracy and 83.87% F1-score, outperforming previous Naïve Bayes approaches. Feature importance analysis highlights customer service interactions, lifetime value, and demographic factors as key predictors of churn. These findings demonstrate Random Forest’s effectiveness in churn prediction and provide practical insights for customer retention strategies.

Dewa Ayu Putu Angelina Dewi; I Wayan Sudiarsa; Ni Made Dwi Junita Sariyani; Yuvensia Armelia Sumu; Gusti Ngurah Abhimanyu

Jurnal Bisnis Inovatif dan Digital 2026 Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

The rapid development of digital technology has led to an increased adoption of digital payment methods in online transaction-based businesses. However, in practice, failures and limitations in the implementation of digital payment systems still occur, potentially disrupting transaction processes and reducing customer convenience. Payment-related obstacles may result in transaction cancellations and increase the risk of customer churn. This study aims to analyze the impact of failures and limitations in digital payment methods on customer churn using a classification-based approach. The data used in this research are secondary e-commerce customer data obtained from the Kaggle platform, including transaction information, payment methods, customer behavior, and historical transaction records. The research methodology consists of data preprocessing, time-based feature engineering, and classification modeling using logistic regression, decision tree, and random forest algorithms. Model performance is evaluated using accuracy, precision, recall, F1-score, and confusion matrix metrics. The results indicate that the decision tree model demonstrates superior capability in identifying churn customers compared to the other models, although it does not always achieve the highest accuracy. In addition to digital payment methods, other factors such as purchase value, transaction frequency, purchase timing patterns, and product return rates also influence customer churn. The findings highlight the importance of optimizing digital payment systems as part of customer experience enhancement strategies and customer retention efforts in online transaction-based businesses.

Imakulata Kresnawati M Bili; I Wayan Sudiarta; Maria Yuditia Wungabelen; Ni Kadek Alika Rosdiana; Putri Rafiana

Jurnal Bisnis Inovatif dan Digital 2026 Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

Customer churn is a strategic challenge for digital streaming platforms because it directly impacts revenue and business sustainability. This study aims to analyze the factors influencing customer churn and develop a churn prediction model using the Random Forest algorithm. The study uses a quantitative approach with an explanatory design and utilizes secondary data from the Netflix Customer Churn and Engagement Dataset available on Kaggle. The dataset consists of 1,000 customer records with 16 variables covering demographic characteristics, service usage behavior, financial condition, and customer satisfaction level. The data was processed through preprocessing, one-hot encoding, and a 70:30 split between training and test data. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The results show that the Random Forest model produces an accuracy of 53.7%, precision of 56.3%, recall of 63.6%, F1-score of 59.7%, and ROC-AUC of 0.534, indicating moderate predictive ability that is only slightly better than random classification. Feature importance analysis revealed that user engagement levels, such as viewing duration and frequency of interactions, were the most dominant factors influencing churn, followed by economic factors and customer satisfaction. The results of this study are expected to provide a basis for streaming platforms to design more effective customer retention strategies.
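The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the figures quoted in the abstract; the helper name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision 56.3% and recall 63.6%, as reported in the abstract.
f1 = f1_score(0.563, 0.636)
print(round(f1 * 100, 1))  # → 59.7
```

The result matches the 59.7% stated in the abstract, so the three reported figures are mutually consistent.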

Agung Narayana Adhi Putra; I Wayan Sudiarsa; I Kadek Adi Gunawan; Kadek Bagus Karunia Dwi Dharmayasa; I Wayan Eka Saputra

Saturnus: Jurnal Teknologi dan Sistem Informasi 2026 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The retail industry generates an extremely large and continuously growing volume of transactional data along with the advancement of digital technology, thereby requiring sophisticated and systematic data analysis approaches to support effective and evidence-based business decision-making. This study aims to analyze retail sales data by utilizing the Retail Sales Dataset obtained from the Kaggle platform, which consists of 100,000 transaction records and broadly represents the characteristics of retail transactions. The main focus of this study is to classify product categories and predict customer segments, including the identification of high-spending customers (high spenders), based on demographic attributes such as age and gender, as well as various transaction-related features. The research methodology includes data preprocessing, label encoding, and feature engineering to generate additional variables, including Age_Group, Is_Holiday, and Spender_Group, which are expected to enhance the predictive capability of the models. Several machine learning algorithms, namely Decision Tree, Random Forest, and XGBoost, were implemented and evaluated to compare their respective performance. The experimental results indicate that multiclass product category classification achieves relatively low accuracy, ranging from 27% to 34%. These findings suggest the high complexity of retail data and highlight the need for further model optimization, class balancing techniques, and feature refinement to improve predictive performance in future studies.

Nadeerah Hani’ Fauziyyah; I Wayan Sudiarsa; Ida Ayu Eka Sastradewi; Kadek Agustine Yueyin Parisya; Sartika Sartika

Jurnal Manajemen Bisnis Digital Terkini 2026 Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

Because it directly impacts revenue, customer loyalty, and long-term business sustainability, customer churn is a critical issue for the e-commerce industry. High churn rates indicate that a business is unable to retain existing customers, which means it is more expensive to acquire new customers. Therefore, a precise analytical approach is needed to identify customer behavior patterns that are likely to churn. Using machine learning methods, this study analyzes and predicts customer churn. For this study, the E-Commerce Customer Churn 2025 dataset, obtained from Kaggle, was used. This dataset consists of 10,000 customer records and contains fifteen variables covering transaction behavior, customer characteristics, and churn status. The research comprised data preprocessing, descriptive analysis, exploratory data analysis (EDA), and classification model development using Logistic Regression and Random Forest algorithms. Model evaluation was conducted using a Confusion Matrix and Receiver Operating Characteristic (ROC) Curve to evaluate the model's accuracy and ability to distinguish between churned and non-churned customers. The results showed that the Random Forest model performed better than Logistic Regression, with an ROC-AUC of 1.00. Furthermore, feature importance analysis revealed that the days_since_last_purchase variable was the most dominant factor in predicting customer churn. These findings are expected to help e-commerce companies design more effective, data-driven customer retention strategies.

Masari, Maryam Sufiyanu; Danladi, Maiauduga Abdullahi; Onyinye, Ilori Loretta; Tohomdet, Loreta Katok

Journal of Computing Theories and Applications 2026 Universitas Dian Nuswantoro

This study presents a comprehensive comparative analysis of four traditional machine learning algorithms (Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine) for Android malware detection using the preprocessed TUANDROMD dataset comprising 4,465 instances and 241 features representing both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in managing imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to ensure balanced training data and fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and models were assessed using key performance metrics including accuracy, precision, recall, F1-score, and ROC AUC. The findings revealed that the proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998, surpassing state-of-the-art approaches. These results affirm the superior predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when integrated with class-balancing techniques, provide reliable and efficient malware detection across imbalanced datasets. For future research, the study recommends exploring advanced hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or other meta-heuristic optimization techniques to further enhance detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.

Purnomo, Rosyana Fitria; Yodhi Yuniarthe; Hilda Dwi Yunita; Fatimah Fahurian +1 more

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

Detection and identification of plant diseases is critical to the success and efficiency of agricultural production. Plant disease outbreaks are becoming more frequent throughout the world, and the presence of these diseases in cultivated plants has a significant impact on productivity. Therefore, researchers are focusing on developing effective and reliable plant disease detection methods, so that farmers can take advantage of early detection to minimize future losses. This article discusses machine learning approaches, namely decision trees, K-nearest neighbors, naive Bayes, support vector machines (SVM), and random forests, for detecting coffee leaf diseases using leaf images. These classifiers were studied and compared to determine the most suitable plant disease prediction model with the highest accuracy. Compared with the other classification algorithms, the SVM algorithm achieves the highest accuracy of 99.75%. As a preventive measure, farmers can use the trained models to quickly identify and classify new diseases in images.

Siska Nar; Ahmad Nugroho; Ahmad Subhan Yazid; Helmi Wibowo; Alyauma Hajjah

Background: The development of industrial technology in the Industry 4.0 era has encouraged the implementation of intelligent monitoring systems to improve machine reliability and operational efficiency. However, machine fault diagnosis systems based on artificial intelligence often face limitations in terms of interpretability because the models used are complex and difficult to explain. Objective: This study aims to develop a deep learning-based industrial machine fault diagnosis system integrated with an Explainable Artificial Intelligence (XAI) approach to improve diagnostic accuracy while providing interpretable insights for users. Method: The research method involves collecting data from industrial machine sensors consisting of vibration signals, temperature measurements, and acoustic signals, followed by data preprocessing and feature extraction processes. The processed data are then used to train a deep learning-based diagnostic model, after which explainability methods such as SHAP or LIME are applied to analyze the contribution of each feature to the model’s prediction results. Model performance is evaluated using accuracy, precision, recall, and F1-score metrics. Results: The results indicate that the proposed deep learning model achieves better performance compared to conventional machine learning methods such as Support Vector Machine and Random Forest. Furthermore, the explainability analysis reveals that vibration amplitude, increases in machine component temperature, and anomalies in acoustic signals are the main factors influencing machine fault detection. Therefore, the proposed system not only improves the accuracy of machine fault diagnosis but also provides transparency in the decision-making process, thereby supporting the implementation of predictive maintenance in smart manufacturing environments.

Yustinus Liguori; I Wayan Sudiarsa; I Made Jagat Dita; I Gusti Ngurah Galih Jimbar Baskara; Pande Wisnu Wijaya Putra

Router : Jurnal Teknik Informatika dan Terapan 2025 Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

The rapid development of smartphone technology today creates challenges for consumers and manufacturers in determining an objective price range based on highly varied technical specifications. This study aims to implement the Random Forest algorithm in classifying smartphone price ranges into four main categories, namely low, mid-range, high, and flagship. The research method was carried out systematically through the stages of loading a dataset of 2,000 entries, exploratory data analysis (EDA) to ensure data integrity, and model training with a training and testing data split of 80:20. The results showed that the Random Forest model achieved a significant overall accuracy rate of 89%. Based on feature importance analysis, it was found that RAM capacity was the most dominant determining factor, contributing 47% to prediction accuracy, followed by battery power and screen resolution as supporting features. These findings have strategic implications for manufacturers to prioritize memory capacity upgrades in determining product pricing in the market, as well as providing guidance for consumers in assessing the fairness of a device's price based on its technical capabilities.

Maria Rosario Borroek; Jasmir Jasmir; Fachruddin Fachruddin; Marrylinteri Istoningtyas; Yosefina Venus

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Software development effort estimation is crucial as it is one of the key factors for successful software development. This research employs Random Forest to estimate software development effort. To achieve better results, the study combines the Random Forest method with Genetic Algorithm. The results show that the China dataset provides more accurate estimation compared to the Desharnais dataset, because the China dataset uses relevant feature selection for estimation.

Agung Islamy Aryanto; Yovi Pratama; Afrizal Nehemia Toscany

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

ARP spoofing attacks are a serious threat to network security, particularly in vulnerable Internet of Things (IoT) environments. This final project aims to detect ARP spoofing attacks on IoT networks using a combination of Random Forest (RF) and Robust PCA methods. RF is chosen for its classification capabilities and handling of non-linear data, while Robust PCA is used for dimensionality reduction and handling outliers in the data. The dataset used is "MITMArpSpoofing.pcap.csv," which contains network traffic data. The data is processed by performing preprocessing, feature scaling, and converting labels to binary (0 for benign, 1 for ARP spoofing). Subsequently, Robust PCA is applied to reduce data dimensions, and then the data is trained using the RF model. The test results show that the RF model with Robust PCA achieves an accuracy of 96.02% in detecting ARP spoofing attacks. This method has proven effective in identifying and classifying ARP spoofing attacks on IoT networks.

Fransiskus Dapot Sihaloho; Jasmir Jasmir; Gunardi Gunardi

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce platforms in Indonesia, particularly Tokopedia, has resulted in a large volume of consumer reviews containing valuable information regarding customer perceptions and satisfaction. However, manual analysis of such reviews is inefficient and prone to subjectivity, necessitating an automated approach based on machine learning. This study aims to classify the sentiment of sports product reviews on Tokopedia into positive, negative, and neutral categories by applying Logistic Regression, Support Vector Machine (SVM), and Random Forest using the Term Frequency–Inverse Document Frequency (TF-IDF) approach. The data were collected through web scraping of Indonesian-language sports product reviews and processed through several preprocessing stages, including data cleaning, case folding, tokenization, stopword removal, and stemming. Feature representation was performed using TF-IDF to transform textual data into numerical vectors, after which the dataset was divided into training and testing sets with an 80:20 ratio. Model performance was evaluated using accuracy, precision, recall, and F1-score metrics. The results indicate that the application of TF-IDF significantly improves the performance of all models, with SVM consistently achieving the most optimal performance compared to Logistic Regression and Random Forest. These findings demonstrate that classical machine learning algorithms combined with TF-IDF remain highly effective for sentiment analysis of Indonesian-language text. The implications of this study are expected to assist sellers in understanding customer opinions, support consumers in making informed purchasing decisions, and serve as a foundation for the development of sentiment analysis and recommendation systems on e-commerce platforms.

R. Zaevan Khazafi Putra; Riza Pahlevi; Ronald Naibaho; Agus Nugroho

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The dynamic changes in weather patterns in Jambi City require an accurate temperature prediction system. This study compares the performance of the Random Forest and Support Vector Regression (SVR) algorithms in predicting daily maximum temperatures, using 2020–2024 weather data obtained from OpenMeteo and feature engineering that includes lag and rolling-window features. The test results indicate that the SVR model with a Radial Basis Function (RBF) kernel, optimized using Grid Search (C=10, epsilon=0.2, gamma=0.01), significantly outperforms Random Forest according to a paired t-test (p-value < 0.05). SVR yields an R-squared (R²) value of 87.46%, a Mean Absolute Error (MAE) of 0.3818 °C, and a Root Mean Squared Error (RMSE) of 0.4964 °C, compared with an R² of 84.05% for Random Forest. The previous day's temperature (lag) and the three-day rolling average were identified as the most dominant predictors, and SVR is recommended as the more effective method for temperature prediction in the study area.
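The two predictors singled out in this abstract, the previous day's temperature and a three-day rolling average, can be built from a plain list of daily values. The sketch below is illustrative only. The function name, the sample temperatures, and the choice to compute the rolling mean from the three days before the target day (so the target never leaks into its own features) are assumptions, not details taken from the study.

```python
def lag_and_rolling_features(series, lag=1, window=3):
    """Build one feature row per day that has enough history behind it."""
    rows = []
    start = max(lag, window)
    for i in range(start, len(series)):
        rows.append({
            "target": series[i],                                 # value to predict
            "lag": series[i - lag],                              # previous day's value
            "rolling_mean": sum(series[i - window:i]) / window,  # mean of prior days only
        })
    return rows


# Hypothetical daily maximum temperatures in °C, invented for illustration.
temps = [31.0, 32.0, 33.0, 34.0, 32.0]
rows = lag_and_rolling_features(temps)
for row in rows:
    print(row)
```

The first `window` days are dropped because they lack a full history, which is the usual cost of lagged features on a time series.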