SciRepID - Scientific Publication Search

Publication Search

29,653 articles from 386 journals · 1,447 citations tracked

Showing 1-20 of 57


Yuma Akbar; Frencis Matheos Sarimolle; Dwi Swasono Rachmad; Muhammad Derry Oktaviandi

International Journal of Applied Mathematics and Computing 2026 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

This study aims to analyze public sentiment toward the hashtag #KaburAjaDulu, which has circulated widely on the social media platform X (formerly Twitter). The hashtag reflects the growing anxiety among the public, especially younger generations, regarding socio-political issues in Indonesia. The data were collected using web scraping techniques, focusing on user-generated tweets that contain the hashtag. A comprehensive text preprocessing phase was conducted to clean the raw data by removing irrelevant elements such as URLs, emojis, numbers, and punctuation. The research applies a hybrid classification approach using a combination of Support Vector Machine (SVM) and Random Forest algorithms to categorize sentiment into three classes: positive, negative, and neutral. The performance of the model was evaluated using metrics such as accuracy, precision, recall, and F1-score to determine the effectiveness of the classification. The study aims to demonstrate that combining algorithms can improve classification performance compared to using a single algorithm. This research contributes to the field of sentiment analysis and provides valuable insights for researchers, policymakers, and social observers in understanding public opinion trends in digital media.
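
The cleaning step described in this abstract (removing URLs, emojis, numbers, and punctuation) might look roughly like the sketch below. The function name `clean_tweet`, the regular expressions, and the sample tweet are illustrative assumptions, not the authors' implementation.

```python
import re

def clean_tweet(text):
    """Strip URLs, emojis, punctuation, and digits from a tweet (illustrative only)."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs first, so their digits go with them
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation and emojis (non-word characters)
    text = re.sub(r"\d+", " ", text)           # remaining numbers
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace

# Invented sample tweet, not taken from the study's dataset.
sample = "Capek banget 😩 #KaburAjaDulu!!! https://t.co/abc123 2025"
print(clean_tweet(sample))
```

Note that this sketch treats `#` as punctuation, so the hashtag survives only as a plain word; whether the study kept hashtag markers is not stated.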

Untung Surapati; Dadang Iskandar Mulyana; Dedi Gunawan; Anggit Purnama

International Journal of Applied Mathematics and Computing 2026 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

Early detection of a potential heart attack is a crucial step in preventing sudden death from heart disease. This research aims to develop an Internet of Things (IoT)-based health monitoring system capable of measuring vital body data in real time and predicting the likelihood of a heart attack from CSV data obtained from sensors, integrated through RapidMiner as learning data using a machine learning algorithm, the Support Vector Machine (SVM). The system was built using an ESP32 microcontroller connected to a MAX30102 sensor to measure heart rate and finger oxygen levels (SpO₂), as well as a DHT22 sensor to measure temperature and humidity. The resulting data is sent to the Blynk application to display real-time data according to its parameters. The initial prediction logic was developed using a rule-based method based on medical thresholds for four vital parameters. The data was then used to train an SVM model as a classification system to detect potential heart attacks. Test results showed that the system can identify abnormal conditions with a good level of accuracy and provide early warnings based on changes in vital parameters in real time. This system is expected to be an initial solution for personal health monitoring, especially for individuals at risk of heart disease. It can be further developed with cloud integration and automatic notifications to users' devices.

Sutisna Sutisna; Tri Wahyudi; Dwi Swasono Rachmad; Fachrur Rozi

International Journal of Information Engineering and Science 2026 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Social media X (Twitter) has become the main platform for the Indonesian public to express opinions, including on the trend of 'kabur aja dulu' (let's just run away for a bit). This research aims to classify the sentiments of the public using the Naïve Bayes and Support Vector Machine (SVM) methods, and to compare the accuracy of both in sentiment analysis. Data was collected via the Twitter API with the hashtag #kaburajadulu, resulting in 2,067 tweets, which, after the cleansing process and manual labeling, left 385 data points. The analysis process followed the CRISP-DM stages, which include business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Model evaluation was conducted using a confusion matrix with accuracy, precision, and recall metrics. The classification results show that 82% of tweets have a positive sentiment and 18% negative. The Naïve Bayes algorithm achieved an accuracy of 86.49%, slightly lower than SVM, which reached 88.05%. In conclusion, Support Vector Machine is more effective in sentiment classification on public opinion data. This research contributes to the digital mapping of public opinion and recommends the development of automatic labeling methods as well as the exploration of advanced algorithms in the future.
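
As a rough illustration of how accuracy, precision, and recall follow from a binary confusion matrix, consider the sketch below. The helper `binary_metrics` and the counts passed to it are hypothetical; they are not the confusion matrix reported in the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts for a small test split (not from the paper).
acc, prec, rec = binary_metrics(tp=60, fp=5, fn=4, tn=8)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```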

Aura Rahayu Aksa Radiana; Fathoni Mahardika; Dani Indra Junaedi

Merkurius : Jurnal Riset Sistem Informasi dan Teknik Informatika 2026 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to develop a sentiment classification method for YouTube user comments related to the game Love and Deepspace using the Naïve Bayes algorithm, focusing on improving the text data processing and understanding user perceptions. Comment data were collected through scraping from YouTube videos, followed by preprocessing including text cleaning, normalization, stopword removal, stemming, and translation into English. Initial labeling was conducted using TextBlob, then the data were randomly sampled for training the Naïve Bayes model. Evaluation involved comparing sentiment distributions and visualization using Word Cloud and bar charts. The Naïve Bayes model achieved an accuracy of 77.36% in sentiment classification. The sentiment distribution shows differences between TextBlob (positive: 1,011, neutral: 1,312, negative: 575) and Naïve Bayes (positive: 901, neutral: 1,627, negative: 370), with Naïve Bayes being more conservative. The Word Cloud visualization identifies dominant words such as "bang," "game," and "main," while the bar chart shows the largest proportion of neutral sentiment. Naïve Bayes is effective for sentiment classification on informal comment data, with significant differences from rule-based methods like TextBlob. This research contributes to the development of text data processing techniques and user perception analysis, as well as opening up optimization opportunities with other algorithms like SVM for better accuracy.
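
The two label distributions quoted in this abstract can be compared directly. The sketch below uses only the counts given in the abstract; the variable names are arbitrary.

```python
# Label counts as reported in the abstract.
textblob = {"positive": 1011, "neutral": 1312, "negative": 575}
naive_bayes = {"positive": 901, "neutral": 1627, "negative": 370}

total_textblob = sum(textblob.values())
total_nb = sum(naive_bayes.values())

# Per-label change when moving from TextBlob labels to Naive Bayes predictions.
shifts = {label: naive_bayes[label] - textblob[label] for label in textblob}

for label, shift in shifts.items():
    share = naive_bayes[label] / total_nb
    print(f"{label}: {shift:+d} comments, {share:.1%} of Naive Bayes labels")
```

Both distributions cover the same number of comments, and the shift is toward the neutral class, which matches the abstract's description of Naïve Bayes as more conservative.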

Susanto, Eko; Sharipuddin; Purnama, Benni

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce in Indonesia, particularly the Shopee platform, has generated a large volume of user reviews on the Google Play Store, which can be analyzed to understand consumer sentiment. This study aims to compare the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in binary sentiment classification (positive and negative) on Shopee reviews, as well as to statistically test the significance of their differences using One-Way ANOVA. A total of 400,498 reviews were collected via web scraping, preprocessed through text normalization, tokenization, and Indonesian language stemming, and then feature-extracted using TF-IDF and Count Vectorizer. Evaluation results show that SVM achieved an accuracy of 91.77%, precision of 91.49%, recall of 91.77%, and F1-Score of 91.56%, while RF achieved an accuracy of 90.07%, precision of 91.68%, recall of 90.07%, and F1-Score of 90.55%. ANOVA confirmed that the performance difference between the two algorithms is statistically significant (p-value = 0.0007) with a large effect size (η² = 0.1815). Therefore, SVM is recommended as a more optimal and consistent algorithm for automated sentiment analysis of Indonesian e-commerce reviews, while also providing a replicable methodological framework for similar future research.

Putri Ramadani; Nur Aisyah Pandia; Salsabila Putri Hati Siregar

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

The spread of hoax news in digital media is a serious problem because it can affect public opinion and social stability. This study aims to classify hoax news using the Support Vector Machine (SVM) algorithm. The dataset used is a hoax clarification dataset from the Ministry of Communication and Digital (Komdigi) of the Republic of Indonesia, totaling 1,872 records. The research process includes data collection, text pre-processing, feature extraction using TF-IDF, and classification using the SVM algorithm. Implementation was carried out using Google Colaboratory (Google Colab). Test results show that the SVM algorithm is able to provide good performance in classifying hoax news based on its topic with satisfactory accuracy, precision, recall, and F1-score values.
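
TF-IDF feature extraction, which this and several other abstracts on this page rely on, can be sketched in plain Python as follows. This is a minimal, unsmoothed variant (term frequency normalized by document length, idf = log(N / df)); the `tfidf` helper and the toy corpus are assumptions for illustration, and library implementations typically use smoothed formulas that give different numbers.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute unsmoothed TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus, invented for illustration.
docs = [
    "vaccine claim is false".split(),
    "vaccine schedule is official".split(),
    "lottery prize claim is false".split(),
]
weights = tfidf(docs)
```

A term that occurs in every document (here "is") gets weight zero, while a term unique to one document (here "lottery") is weighted most heavily within it.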

Afif Lustyo Muji; Aziz Musthofa; Dihin Muriyatmoko

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

Since the announcement of the policy plan for a name transfer system in the sale of used mobile phones, the issue has attracted widespread public attention and discussion. People have expressed their opinions on social media platforms, particularly TikTok. This study aims to classify the sentiment of TikTok users using Naive Bayes and Support Vector Machine (SVM) algorithms. The data were collected through a comment scraping technique on related content. The research stages include text preprocessing, sentiment labeling into positive, negative, and neutral categories, and feature extraction using TF-IDF. The classification process employs Naive Bayes and Support Vector Machine algorithms, which are then evaluated based on accuracy, precision, recall, and F1-score. The results of this study indicate that both methods are capable of classifying sentiment effectively. However, the Support Vector Machine method is superior to the Naive Bayes method with an accuracy rate of 99.57% compared to 94.30%. This study is expected to help the government understand public responses to the planned policy of the used mobile phone name transfer system.

Abubakar, Mustapha; Ibrahim, Yusuf; Ajayi, Ore-Ofe; Saminu, Sani Saleh

Journal of Computing Theories and Applications 2026 Universitas Dian Nuswantoro

The integration of Artificial Intelligence (AI) into precision agriculture has significantly improved plant disease recognition; however, many existing deep learning models remain computationally expensive and feature-redundant, limiting their deployment on low-power and edge devices. To address these limitations, this study proposes a lightweight framework for maize leaf disease recognition based on serial deep feature extraction, dimensionality reduction, and machine-learning–based classification. A pre-trained MobileNetV2 network is employed as a fixed feature extractor to obtain discriminative visual representations, while Principal Component Analysis (PCA) is applied to reduce feature dimensionality by approximately 76%, retaining 95% of the original variance and improving computational efficiency. The compressed features are subsequently classified using a Radial Basis Function Support Vector Machine (RBF-SVM), optimized via grid search and cross-validation. Experiments conducted on a four-class maize leaf disease dataset (Northern Leaf Blight, Common Rust, Gray Leaf Spot, and Healthy), with class imbalance handled during training, demonstrate that the proposed MobileNetV2–PCA–SVM pipeline achieves 97.58% accuracy, 96.60% precision, 96.59% recall, and 96.59% F1-score, outperforming the DenseNet201 + Bayesian-optimized SVM baseline (94.60%, 94.40%, 94.40%, and 94.40%, respectively). This improvement corresponds to a 2.98 percentage-point accuracy gain, a 55% reduction in error rate, an 86% reduction in model parameters (20.31M to 2.75M), and an 85% reduction in model size (81 MB to 12 MB). These results indicate that the proposed framework provides a compact and efficient solution with strong potential for deployment in resource-constrained agricultural environments.
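
The improvement figures in this abstract can be reproduced from the numbers it reports. In the sketch below, the helper `relative_reduction` is an illustrative name; all inputs are taken from the abstract.

```python
def relative_reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before

acc_proposed, acc_baseline = 97.58, 94.60  # accuracies (%) from the abstract

gain_points = acc_proposed - acc_baseline                               # percentage points
error_cut = relative_reduction(100 - acc_baseline, 100 - acc_proposed)  # error rate
param_cut = relative_reduction(20.31, 2.75)                             # parameters, millions
size_cut = relative_reduction(81, 12)                                   # model size, MB

print(f"accuracy gain: {gain_points:.2f} percentage points")
print(f"error-rate reduction: {error_cut:.0%}")
print(f"parameter reduction: {param_cut:.0%}")
print(f"size reduction: {size_cut:.0%}")
```

The accuracy gain is an absolute difference in percentage points, whereas the error-rate, parameter, and size figures are relative reductions.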

Purnomo, Rosyana Fitria; Yodhi Yuniarthe; Hilda Dwi Yunita; Fatimah Fahurian +1 more

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

Detection and identification of plant diseases is critical to the success and efficiency of agricultural production. Plant disease outbreaks are becoming more frequent throughout the world, and the presence of these diseases in cultivated plants has a significant impact on productivity. Researchers are therefore focusing on developing effective and reliable plant disease detection methods, so that farmers can use early detection to minimize future losses. This article compares machine learning classifiers, namely decision trees, K-nearest neighbors, naive Bayes, support vector machines (SVM), and random forests, for detecting coffee leaf diseases using leaf images. The classifiers were studied and compared to determine the most suitable plant disease prediction model with the highest accuracy. Compared with the other classification algorithms, the SVM algorithm achieves the highest accuracy of 99.75%. The trained models are intended to help farmers quickly identify and classify new diseases in images at an early stage, as a prevention strategy.

Egi Rangga Maulana

Uranus: Jurnal Ilmiah Teknik Elektro, Sains dan Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study presents a high-accuracy, real-time soft-failure detection framework for large-scale fiber-to-the-home (FTTH) optical access networks using a hybrid ensemble of Isolation Forest and One-Class Support Vector Machine (OCSVM). The proposed model was trained and validated on a real-world multivariate performance dataset comprising more than 1.8 million samples collected at 5-minute intervals from 50 Optical Line Terminals (OLTs) and over 3,000 Optical Network Terminals (ONTs) across a five-month period (June-October 2025). Ground-truth validation was performed using 111 confirmed network incidents in October 2025 affecting 12,990 customers. The hybrid ensemble achieved a precision of 0.940 and a recall of 0.982, with an average detection delay of only 7.8 minutes, an 87.7% reduction compared to conventional manual response (63.5 minutes). The framework significantly outperforms traditional thresholding and recent ML-based methods while demonstrating practical deployability in live operational environments.
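
The delay reduction quoted in this abstract follows from the two reported delays, and an F1-score can be derived from the reported precision and recall. The F1 value computed below is a derivation for illustration only; the abstract itself does not report it.

```python
precision, recall = 0.940, 0.982       # reported in the abstract
delay_model, delay_manual = 7.8, 63.5  # minutes, reported in the abstract

# Harmonic mean of precision and recall (derived here, not reported).
f1 = 2 * precision * recall / (precision + recall)

# Relative reduction in average detection delay.
delay_reduction = (delay_manual - delay_model) / delay_manual

print(f"derived F1: {f1:.3f}")
print(f"delay reduction: {delay_reduction:.1%}")
```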

Ryzal Nur Alvandy; Arita Witianti

Jurnal Elektronika dan Komputer 2025 STEKOM PRESS

The rapid expansion of e-commerce in Indonesia has resulted in a significant rise in the number of customer reviews, which serve as a valuable source of insight for understanding consumer satisfaction. This study aims to classify sentiments in product reviews on the Tokopedia platform into three categories using the Support Vector Machine (SVM) algorithm. The data were ethically collected through web scraping and include review text, ratings, and the number of “likes.” The preprocessing stage involved several NLP techniques, and the data representation was generated using the Term Frequency–Inverse Document Frequency (TF-IDF) method, while the issue of class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Based on the test results, the SVM model achieved an accuracy of 79.48% on the test data using a linear kernel, showing the best performance in classifying positive sentiments. However, the classification of neutral and negative sentiments still requires improvement. This study demonstrates that the combination of the TF-IDF method, additional numerical features, and data balancing techniques can produce an efficient sentiment analysis model within the e-commerce domain.

Hamza, Ali; Hussain, Wahid; Iftikhar, Hassan; Ahmad, Aziz; Shamim, Alamgir Md

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The rapid growth of open-source software (OSS) in machine learning (ML) has intensified the need for reliable, automated methods to assess project quality, particularly as OSS increasingly underpins critical applications in science, industry, and public infrastructure. This study evaluates the effectiveness of a diverse set of machine learning and deep learning (ML/DL) algorithms for classifying GitHub OSS ML projects as engineered or non-engineered using a SMOTE-enhanced and explainable modeling pipeline. The dataset used in this research includes both numerical and categorical attributes representing documentation, testing, architecture, community engagement, popularity, and repository activity. After handling missing values, standardizing numerical features, encoding categorical variables, and addressing the inherent class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), seven different classifiers—K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), Support Vector Machine (SVM), and a Deep Neural Network (DNN)—were trained and evaluated. Results show that LR (84%) and DNN (85%) outperform all other models, indicating that both linear and moderately deep non-linear architectures can effectively capture key quality indicators in OSS ML projects. Additional explainability analysis using SHAP reveals consistent feature importance across models, with documentation quality, unit testing practices, architectural clarity, and repository dynamics emerging as the strongest predictors. These findings demonstrate that automated, explainable ML/DL-based quality assessment is both feasible and effective, offering a practical pathway for improving OSS sustainability, guiding contributor decisions, and enhancing trust in ML-based systems that depend on open-source components.

Sipasulta, Angelica Mailen; Bayu, Teguh Indra

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Bea Cukai has recently been in the public spotlight, especially regarding the supervision of goods from abroad. News and public responses regarding Bea Cukai's supervision create pros and cons, thus triggering a variety of responses from the public. This study aims to analyze the sentiment of Indonesian people towards the performance of Bea Cukai in monitoring goods from abroad by utilizing Twitter social media. In this research, the Support Vector Machine (SVM) algorithm is applied to classify public comments on Twitter into positive or negative sentiments. Through the crawling process carried out from June 1, 2023, to May 12, 2024, 9,051 entries of data were collected. The analysis results showed an accuracy of 93.87%, precision 94%, recall 93%, and F1-score 94%. These results show that the SVM method is effective in analyzing public sentiment, especially related to Bea Cukai's supervision.

Gunawan, Ricardho; Hendry, Hendry

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Sentiment analysis of guest reviews is a crucial aspect in improving the quality of hotel services. This study aims to analyze the sentiment of guest reviews regarding the services of Grand Diamond Hotel Yogyakarta using a machine learning approach with the Support Vector Machine (SVM) algorithm. SVM was chosen because it can handle high-dimensional data such as text and is capable of forming an optimal separating hyperplane between sentiment classes. The research data was obtained through web scraping from Traveloka, yielding 1,119 reviews, which were processed through preprocessing, translation, and sentiment labeling using the TextBlob library. After TF-IDF weighting, the data was divided into 80% for training and 20% for testing. The linear kernel SVM model achieved 80% accuracy in classifying the reviews into positive, negative, and neutral categories. The results of this study were implemented in a web-based application equipped with data visualization and model evaluation features, allowing hotel management to efficiently monitor and analyze guest sentiment and support data-driven service quality improvement.
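
The 80/20 division mentioned in this abstract could be done with a simple shuffled split, sketched below. The helper `train_test_split`, the seed, the rounding rule, and the placeholder review strings are all assumptions; the abstract does not say how the split was drawn or whether it was stratified by class.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # rounds down
    return shuffled[:cut], shuffled[cut:]

# Placeholder items standing in for the 1,119 scraped reviews.
reviews = [f"review-{i}" for i in range(1119)]
train, test = train_test_split(reviews)
print(len(train), len(test))
```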

Wahyu Saputro

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Human Resource Management (HRM) plays a strategic role in improving organizational competitiveness through proper management of employee placement, training, and performance evaluation. To support the achievement of these goals, a predictive model is needed that can provide an accurate picture of employee performance. This study utilizes a Human Resource Management (HRM) dataset of 1,200 records and applies several classification algorithms to compare their effectiveness, namely J48 or C4.5, Random Forest, Naive Bayes, K-Nearest Neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM). To obtain more optimal results, this study uses resampling techniques and attribute selection methods with a correlation attribute eval approach, so that class distribution can be more balanced and model accuracy increases. From the test results, the Decision Tree J48 algorithm showed the best performance with an accuracy level reaching 95.41%, a kappa value of 0.8925, a mean absolute error (MAE) of 0.0432, a precision of 0.955, a recall of 0.954, and an area under the ROC curve of 0.964. These findings indicate that J48 has excellent predictive capabilities compared to other algorithms. Furthermore, this study also found that the most influential variables in determining employee performance include the percentage of the last salary increase (EmpLast Salary Hike Percent), the level of work environment satisfaction (Emp Environment Satisfaction), the length of time since the last promotion (Years Since Last Promotion), and experience in the current role (Experience Years in Current Role). Overall, the results of the study indicate that the C4.5 algorithm with the application of the resampling technique can be an optimal solution in building an employee performance prediction system. Thus, this model has the potential to be a strong basis for managerial decision-making, particularly in designing HR development strategies and policies to improve organizational performance.

Prashanthan, Amirthanathan

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The study presents a comprehensive framework for optimizing customer retention budget by integrating clustering, classification, and mathematical optimization techniques. The study begins with the IBM Telco dataset, which is prepared through data cleansing, encoding, and scaling. In the preliminary phase, customer segmentation is performed using K-Means clustering, with k = 3 and k = 4 identified as optimal based on the elbow method and Silhouette score. The configurations produced three (Premium, Standard, Low) and four (Premium, Standard Plus, Standard, Low) customer segments based on purchase preferences, which served as input features for churn prediction. In the second phase, the dataset was divided into training and test sets in an 80:20 ratio, followed by data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN). Multiple classification algorithms were evaluated, including Naive Bayes (NB), Random Forest (RF), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) using F1-score as the performance metric. CatBoost and LightGBM, with k values of 3 and 4, respectively, were the highest-performing classification models, with only minimal differences in performance. Ultimately, customer segmentation established customer prioritization, whereas churn prediction assessed customer churn likelihood. Four distinct configurations were assessed utilizing mixed-integer linear programming (MILP) to optimise retention budget allocation within uniform budget constraints, discount amounts, and churn thresholds. In both the k = 3 and k = 4 scenarios, CatBoost surpassed LightGBM, with CatBoost at k = 3 effectively discounting 66% of at-risk consumers across all three segments, hence improving the intervention's efficacy and budget allocation, making it the ideal choice for maximizing customer retention. The results demonstrate the importance of segmentation in enhancing retention budgeting and budget optimization, particularly concerning parameter sensitivity.

Rosa Ratri Kusuma Hariningsih; Diwahana Mutiara Candrasari; Endang Setyawati; Syamsu Wahidin; Jevon Nataniel Putra

International Journal of Computer Technology and Science 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Dengue Fever (DF) continues to be a major public health threat in Indonesia, especially in urban areas with high population density, such as Purwokerto City. This study aims to develop a predictive model to identify high-risk areas for DF outbreaks by integrating Machine Learning (ML) algorithms and Geographic Information Systems (GIS). The research utilizes historical dengue case data, meteorological parameters (rainfall, temperature, humidity), and population density as predictive variables. Three ML classification algorithms—Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM)—were implemented to develop risk prediction models. Extensive data preprocessing, feature selection, and spatial integration were applied to ensure model robustness. The results show that the SVM model outperformed other methods, achieving the highest accuracy, precision, recall, and F1-score in classifying dengue risk zones. Risk maps generated through GIS visualization successfully identify priority areas for targeted interventions. The novelty of this research lies in the combination of local epidemiological data, multi-algorithm comparison, and geospatial mapping to improve early warning systems for DF in Purwokerto. This integrated approach is expected to support more effective prevention strategies and enhance public health preparedness.

Eugenea Chiquita Zahrani Assyarif; I Kadek Dwi Nuryana

Modem : Jurnal Informatika dan Sains Teknologi 2025 Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to conduct customer segmentation and develop a classification model to predict the clusters of new customers at Monex Toys Abadi Bekasi, a micro, small, and medium enterprise (MSME). Segmentation was performed using the K-Means Clustering algorithm, incorporating parameters such as Recency, Frequency, Monetary (RFM), purchased products, payment methods, shipping cost discounts, and the total number of products purchased by customers. The segmentation results revealed two clusters: (1) Discount Hunters and (2) Loyal Customers. Subsequently, a classification process was conducted to predict customer clusters using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. Evaluation results indicated that all models achieved high accuracy exceeding 98%. The best-performing model was obtained with SVM using a 70:30 data split, achieving an accuracy of 98.81%. This classification model was then implemented into a Streamlit-based cluster prediction application, enabling users to identify customer segments in real-time. The findings of this research are expected to assist MSMEs in understanding customer behavior, enhancing service quality, and supporting more effective marketing strategies.

Sarassati, Dwi Sinta; Joko Prasetyo, Sri Yulianto

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Tidal flooding is a natural phenomenon in which seawater rises onto land because of changes in sea tides, causing waterlogging around coastal areas. Such flooding has hit the Demak-Semarang area, especially Sayung District, hampering community life. This study analyzes public sentiment regarding the impact of tidal flooding in Demak Regency using data obtained from social media, so that the results can serve as an evaluation for the government and related parties in formulating more responsive and effective policies to address tidal flooding. The Support Vector Machine (SVM) method is used to classify the sentiment of each data item into positive, negative, or neutral categories. Of 3,580 initial data items, 3,147 remained after preprocessing, with 1,581 neutral, 1,257 negative, and 309 positive opinions. Most opinions are neutral, indicating that people consider tidal flooding a natural phenomenon and are used to dealing with it. However, the substantial share of negative opinions indicates dissatisfaction with the government's handling, while positive opinions are very few. SVM achieved 84.44 percent accuracy, 86.7 percent precision, and 97.8 percent recall. The study recommends improvements in flood mitigation, assistance for affected communities, and infrastructure improvements.

Fitri Dwianasari; Rohmah Diah Yani; Karlina Novianto Laksono; Nurhafillah Mujaliza; Riza Fahlapi

Kajian Ekonomi dan Akuntansi Terapan 2025 Asosiasi Riset Ekonomi dan Akuntansi Indonesia

Mining activities in the Raja Ampat area have sparked various public reactions, both supportive and critical, particularly on social media platforms such as Twitter. This study aims to analyze public sentiment regarding the mining operations by employing two classification algorithms. A total of 500 tweets related to Raja Ampat were collected from the X platform, and after data cleaning, 168 were identified as positive sentiments and 303 as negative. Sentiment analysis was conducted using text mining techniques by comparing two algorithms: Support Vector Machine (SVM) and Naïve Bayes. To address the issue of data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. The analysis results showed that SVM achieved an accuracy of 80%, outperforming Naïve Bayes, which reached only 68%. This indicates that SVM performed better in classifying sentiment. Additionally, the application of SMOTE effectively enhanced both algorithms’ abilities to detect positive sentiment, as reflected in the precision, recall, and F1-score metrics. For SVM, precision reached 85%, recall 80%, and F1-score 80%, while Naïve Bayes recorded a precision and recall of 69%, and an F1-score of 68%.
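
The core idea of SMOTE, as applied here to the smaller positive class, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified version in plain Python; the `smote` helper, the choice of k, and the 2-D toy points are assumptions, and real text features would be high-dimensional TF-IDF vectors handled by a library implementation.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy 2-D minority samples, invented for illustration.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5), (3.0, 1.5)]
new_points = smote(minority, n_new=4)
```

With the class counts from the abstract (168 positive, 303 negative), balancing the classes this way would require 135 synthetic positive samples.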