SciRepID - Scientific Publication Search

Publication Search

18,135 articles from 385 journals · 1,447 citations tracked

Tengku Syahvina Rival Dini; Rani Chantika; Pebi Mina Husania; Puji Sri Alhirani

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

This research develops a machine learning model to classify customer loyalty using the Random Forest algorithm. Customer churn is a critical issue that reduces revenue and increases acquisition costs. A dataset of 50,000 customers from global e-commerce and subscription platforms was processed through data cleaning, imputation, outlier handling, and class balancing with SMOTE. The Random Forest model was built as a baseline and optimized with hyperparameter tuning. Evaluation using accuracy, precision, recall, and F1-score shows that the optimized model achieved 90.81% accuracy and 83.87% F1-score, outperforming previous Naïve Bayes approaches. Feature importance analysis highlights customer service interactions, lifetime value, and demographic factors as key predictors of churn. These findings demonstrate Random Forest’s effectiveness in churn prediction and provide practical insights for customer retention strategies.
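
The four evaluation metrics named in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of that arithmetic; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for one positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only
m = binary_metrics(tp=80, fp=20, fn=10, tn=90)
```

Because F1 is the harmonic mean of precision and recall, it can sit well below accuracy on imbalanced data, which may be why the abstract reports both figures.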

Afif Lustyo Muji; Aziz Musthofa; Dihin Muriyatmoko

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

Since the announcement of the policy plan for a name transfer system in the sale of used mobile phones, the issue has attracted widespread public attention and discussion. People have expressed their opinions on social media platforms, particularly TikTok. This study aims to classify the sentiment of TikTok users using Naive Bayes and Support Vector Machine (SVM) algorithms. The data were collected through a comment scraping technique on related content. The research stages include text preprocessing, sentiment labeling into positive, negative, and neutral categories, and feature extraction using TF-IDF. The classification process employs Naive Bayes and Support Vector Machine algorithms, which are then evaluated based on accuracy, precision, recall, and F1-score. The results of this study indicate that both methods are capable of classifying sentiment effectively. However, the Support Vector Machine method is superior to the Naive Bayes method, with an accuracy rate of 99.57% compared to 94.30%. This study is expected to help the government understand public responses to the planned policy of the used mobile phone name transfer system.

Dihin Muriyatmoko; Aziz Musthafa; Yusuf Al Banna

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

Sentiment analysis on social media is widely used to represent public perceptions of sports performance, particularly in international competitions. This study aims to analyze the sentiment of YouTube user comments regarding the performance of the Indonesian National Football Team during the FIFA World Cup 2026 Asian Qualifiers. The data were collected from user comments on videos related to the matches and analyzed using a machine learning–based sentiment analysis approach. Sentiment classification was performed using the Naive Bayes algorithm. The results indicate that the proposed approach is able to effectively identify public sentiment toward the national team’s performance during the qualification matches. The findings of this study are expected to provide insights into public perceptions and contribute to sentiment analysis research in the field of sports.

Purnomo, Rosyana Fitria; Yodhi Yuniarthe; Hilda Dwi Yunita; Fatimah Fahurian +1 more

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

Detection and identification of plant diseases is critical to the success and efficiency of agricultural production. Plant disease outbreaks are becoming more frequent throughout the world, and the presence of these diseases in cultivated plants has a significant impact on productivity. Researchers are therefore focusing on developing effective and reliable plant disease detection methods so that farmers can use early detection to minimize future losses. This article discusses machine learning approaches, namely decision trees, K-nearest neighbors, naive Bayes, support vector machines (SVM), and random forests, for detecting coffee leaf diseases using leaf images. These classifiers were compared to determine the most suitable plant disease prediction model with the highest accuracy. Among them, the SVM algorithm achieves the highest accuracy of 99.75%. The trained models are intended to help farmers quickly identify and classify diseases in new leaf images as a preventive measure.

Aditya Abdulloh Masykur; Rino Raihan Gumilang; Harun Al Rosyid

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

The performance of the Indonesian National Team (Timnas) in the 2026 World Cup qualifications has triggered massive and diverse responses on social media, particularly on platform X. This study aims to identify and classify public sentiment regarding Timnas Indonesia's performance into positive, negative, and neutral categories using a data mining approach. Text data was processed through pre-processing stages, term weighting using TF-IDF, and the application of the Synthetic Minority Over-sampling Technique (SMOTE) to address significant class distribution imbalance. The classification algorithm employed was Multinomial Naïve Bayes. Model performance evaluation was conducted by comparing two training-testing data split scenarios: 90:10 and 80:20 ratios. The results indicate that public opinion is dominated by negative sentiment at 73.2%, reflecting public disappointment. In terms of model performance, the 90:10 ratio scenario yielded the best accuracy of 80%, outperforming the 80:20 ratio which recorded an accuracy of 75%. These findings demonstrate that combining Multinomial Naïve Bayes with the SMOTE technique is effective in handling imbalanced text data and is capable of accurately mapping public perception.
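
The oversampling step mentioned in this abstract can be illustrated with a simplified sketch of the SMOTE idea: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function `smote_sketch`, its parameters, and the toy points are hypothetical; this is not the authors' implementation, and it omits details of the full technique.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy 2-D minority-class points, for illustration only
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the test data used for the reported accuracy.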

Claudia K. Hamsi; I Wayan Sudiarsa; Vinsensia P.K Abu; Sarling C. Dhai; Maria A. Serero

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The rapid development of digital streaming platforms such as Netflix has generated a large volume of content data with diverse characteristics, thereby requiring effective analytical methods to understand emerging patterns and trends. This study aims to classify Netflix content into two main categories, namely movies and television shows, and to analyze genre trends and content characteristics using a data mining approach with the Naive Bayes algorithm. The dataset used in this study is the Netflix Shows dataset, consisting of 8,809 content entries, with the primary features analyzed including genre, rating, and country of production. The research process begins with data exploration and preprocessing stages, including data cleaning, handling missing values, and transforming categorical features to enable effective model construction. Subsequently, the dataset is divided into training and testing sets to objectively and systematically build and evaluate the Naive Bayes classification model. Model performance is evaluated using accuracy, precision, recall, and F1-score metrics to assess the model’s ability to accurately distinguish between Netflix content types. The experimental results demonstrate that the Naive Bayes algorithm is able to classify Netflix content into Movie and TV Show categories with accuracy, precision, recall, and F1-score each reaching 100%. The confusion matrix indicates that no misclassification occurred, suggesting that genre, rating, and country of production features provide a very clear separation between content classes. These findings indicate that the Naive Bayes algorithm can achieve exceptionally high classification performance with optimal evaluation results. The results further reveal distinct differences in characteristics between movies and television shows based on genre and production attributes. Therefore, this study is expected to contribute to the development of content recommendation systems and strategic content management within the streaming industry.

Firdaus, Muhammad; Rosyidah, Ulya Anisatur; Handayani, Luluk

Router : Jurnal Teknik Informatika dan Terapan 2025 Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

Sugar consumption in Indonesia remains high, with diabetes affecting 20.4 million people. This condition has prompted the government to introduce an excise policy on sugar-sweetened packaged beverages (Minuman Berpemanis Dalam Kemasan, MBDK) to reduce sugar intake. Social media, particularly the X platform, serves as a medium for the public to express their opinions regarding this policy. This study aims to analyze public sentiment toward the MBDK excise policy using a lexicon-based approach for data labeling and the Multinomial Naive Bayes algorithm with unigram and bigram feature extraction. The initial results show that the highest performance was achieved using 5-Fold Cross Validation, with an average accuracy of 83%, precision of 84%, recall of 75%, and an F1-Score of 77%. After applying data balancing using Stratified Cross Validation combined with Borderline-SMOTE and limiting the features to the 700 most frequent terms, the model’s performance improved. The best results were obtained with 10-Fold Cross Validation, achieving 86% accuracy, 84% precision, 83% recall, and an F1-Score of 83%. These findings indicate that the Multinomial Naive Bayes model can effectively classify public sentiment regarding the MBDK excise policy after the data balancing process.
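
As an illustration of the classifier described above, the following is a minimal sketch of Multinomial Naive Bayes over unigram and bigram counts with Laplace smoothing. The helper names (`ngrams`, `train_mnb`, `predict_mnb`), the smoothing value, and the four toy documents are assumptions for illustration; the study's lexicon-based labeling, feature limit, and balancing steps are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def ngrams(text):
    """Unigram and bigram features from whitespace-tokenised text."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def train_mnb(docs, labels, alpha=1.0):
    """Collect class counts and per-class feature counts."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        feature_counts[label].update(ngrams(text))
    vocab = {f for counts in feature_counts.values() for f in counts}
    return class_counts, feature_counts, vocab, alpha

def predict_mnb(model, text):
    """Pick the class with the highest log prior plus log likelihoods."""
    class_counts, feature_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        total = sum(feature_counts[label].values()) + alpha * len(vocab)
        score = math.log(n / n_docs)
        for f in ngrams(text):
            if f in vocab:  # ignore features unseen in training
                score += math.log((feature_counts[label][f] + alpha) / total)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy English documents, for illustration only
docs = ["tax is good for health", "good policy",
        "bad policy hurts sellers", "tax is bad"]
labels = ["positive", "positive", "negative", "negative"]
model = train_mnb(docs, labels)
```

Calling `predict_mnb(model, "good for health")` on this toy data favours the positive class, since every matching unigram and bigram was seen only in positive documents.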

Ricardus Mba Dala Pati; Eka Kusuma Pratama; Tuslaela Tuslaela

Repeater : Publikasi Teknik Informatika dan Jaringan 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

JakLingko is a digital-based public transportation integration system developed to facilitate access to various transportation modes in Jakarta. Along with the increasing number of users, reviews on the JakLingko application reflect user experiences and perceptions. This study aims to analyze the sentiment of user reviews on the Google Play Store using the Naïve Bayes method. Data collection was conducted through web scraping, resulting in 3,260 reviews. The data were preprocessed, sentiment-labeled, and classified using Orange Data Mining. The research applied a quantitative experimental approach with a machine learning framework. The classification results showed that neutral sentiment dominated user reviews, followed by negative and positive sentiments. The Naïve Bayes model achieved 100% accuracy based on the confusion matrix and other evaluation metrics such as precision, recall, and F1-score. The findings highlight that Naïve Bayes can be a reliable approach for analyzing public opinion and serve as a reference for evaluating and improving digital service applications.

Selvinus Dakku; Vinsensius Aprila Kore Dima; Diana Reby Sabawaly

Router : Jurnal Teknik Informatika dan Terapan 2025 Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

The Family Hope Program (PKH) is a conditional social assistance program provided by the government to improve the quality of life of underprivileged families through support in the education, health, and social welfare sectors. In its implementation, the process of determining prospective PKH recipients at the West Sumba Regency Social Service often encounters obstacles, especially with regard to objectivity, targeting accuracy, and limitations in managing complex data. Thus, a decision support system (Sistem Pendukung Keputusan, SPK) is needed that can assist the agency in selecting prospective recipients more effectively, efficiently, and on target. This study proposes the application of the Naive Bayes method in the development of an SPK to determine PKH recipients. The Naive Bayes method was chosen because of its ability to classify data based on probability and to handle large volumes of data with a good degree of accuracy. The criteria applied in the classification include household income level, number of dependents, housing condition, children's education, and the health of family members. The research process includes needs analysis, system design, data collection, application of the Naive Bayes algorithm, and system testing. The findings of the study show that an SPK based on Naive Bayes can provide recommendations for PKH recipients with better accuracy than manual methods. In addition, the system is able to improve transparency, fairness, and speed in the recipient selection procedure. With this system, it is hoped that the distribution of PKH in West Sumba Regency can be more orderly, balanced, and on target in accordance with the goals of government programs.

Wahyu Saputro

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Human Resource Management (HRM) plays a strategic role in improving organizational competitiveness through proper management of employee placement, training, and performance evaluation. To support the achievement of these goals, a predictive model is needed that can provide an accurate picture of employee performance. This study utilizes an HRM dataset of 1,200 records and applies several classification algorithms to compare their effectiveness, namely J48 (C4.5), Random Forest, Naive Bayes, K-Nearest Neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM). To obtain better results, this study uses resampling techniques and attribute selection with a correlation-based attribute evaluation approach, so that the class distribution is more balanced and model accuracy increases. From the test results, the J48 decision tree algorithm showed the best performance, with an accuracy of 95.41%, a kappa value of 0.8925, a mean absolute error (MAE) of 0.0432, a precision of 0.955, a recall of 0.954, and an area under the ROC curve of 0.964. These findings indicate that J48 has excellent predictive capabilities compared to the other algorithms. Furthermore, this study also found that the most influential variables in determining employee performance include the percentage of the last salary increase (EmpLast Salary Hike Percent), the level of work environment satisfaction (Emp Environment Satisfaction), the length of time since the last promotion (Years Since Last Promotion), and experience in the current role (Experience Years in Current Role). Overall, the results of the study indicate that the C4.5 algorithm combined with resampling can be an optimal solution for building an employee performance prediction system. Thus, this model has the potential to be a strong basis for managerial decision-making, particularly in designing HR development strategies and policies to improve organizational performance.
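
The kappa value reported alongside accuracy measures agreement beyond what chance alone would produce. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the function name and the 2x2 matrix are hypothetical and do not come from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: actual class, columns: predicted class)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Expected agreement: sum of (row total * column total) per class
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)

# Hypothetical 2x2 matrix, for illustration only
kappa = cohens_kappa([[45, 5], [10, 40]])
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is 0.70, noticeably lower than the raw accuracy.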

Farendika Rezzi

Uranus: Jurnal Ilmiah Teknik Elektro, Sains dan Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The rapid growth of e-commerce platforms has significantly transformed the way consumers share and access product feedback. One of the widely used platforms in Indonesia is Shopee, where customers actively provide reviews of various products, including local skincare brands such as Kahf facial wash. Customer reviews on e-commerce platforms contain valuable information that can be analyzed to understand consumer opinions and preferences. Sentiment analysis, as a branch of natural language processing, enables the classification of textual data into categories such as positive, negative, or neutral. This study aims to classify Shopee user sentiments regarding Kahf facial wash products by implementing the Multinomial Naïve Bayes algorithm, a well-known probabilistic classifier suitable for text categorization. The research methodology consisted of several preprocessing stages, including data cleansing, case folding, tokenizing, stopword removal, and stemming, to prepare raw review texts for further analysis. For feature representation, the Term Frequency–Inverse Document Frequency (TF-IDF) method was applied to capture the importance of words across documents. To evaluate the classification performance, K-Fold cross-validation was employed with K values of 4, 5, 6, and 10 to ensure model reliability and robustness. Considering the issue of imbalanced datasets in user-generated reviews, the Synthetic Minority Over-sampling Technique (SMOTE) was utilized to balance the distribution of sentiment classes. Based on the confusion matrix, the Multinomial Naïve Bayes algorithm demonstrated effective performance in classifying sentiments, achieving satisfactory levels of accuracy, precision, and recall across different folds. These results indicate that the algorithm is capable of handling sentiment analysis tasks for local product reviews effectively. The findings of this study are expected to provide meaningful insights for businesses in understanding consumer perceptions, thereby supporting decision-making processes in product development, marketing strategies, and customer engagement for local brands.
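
The K-Fold evaluation described in this abstract partitions the data into K folds and uses each fold once as the test set. The sketch below generates such index splits for the K values the abstract lists; the function name `k_fold_indices` and the sample count of 20 are hypothetical, and shuffling or stratification is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = (list(range(0, start))
                     + list(range(start + size, n_samples)))
        yield train_idx, test_idx
        start += size

# Splits for the K values named in the abstract, on 20 toy samples
splits = {k: list(k_fold_indices(20, k)) for k in (4, 5, 6, 10)}
```

Every sample appears in exactly one test fold per K, so the averaged score reflects performance on the whole dataset rather than on a single held-out split.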

Muhammad Azlan; Elvi Rahmi

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to analyze the sentiment of customer reviews of the Grand Jatra Hotel Pekanbaru on the Google Review platform using the Naïve Bayes algorithm. Social media and online review platforms are increasingly becoming the primary source of information for potential customers in making purchasing decisions, particularly in the hospitality sector. Therefore, sentiment analysis of customer reviews is crucial for understanding consumer perceptions and providing strategic input for hotels in improving service quality. The research data was collected using web scraping techniques to obtain publicly available customer reviews. The obtained data was then processed through text preprocessing stages including case folding, tokenizing, normalization, stopword removal, and stemming. The Term Frequency-Inverse Document Frequency (TF-IDF) method was then used to weight each word, so that more relevant words have a greater influence in the classification process. The sentiment classification process was carried out into two main categories, namely positive and negative. The Naïve Bayes model was trained using training data and then tested with test data to measure the algorithm's performance in classifying sentiment. The evaluation results show that the model built is able to achieve an accuracy level of 98%, with a precision value of 97% and a recall of 100% in the positive class, and 92% in the negative class. These findings confirm that the Naïve Bayes algorithm can be effectively used in analyzing customer sentiment towards hotel services and facilities. Practically, the results of this study are expected to provide insight for the management of Grand Jatra Hotel Pekanbaru in understanding customer perceptions, identifying service strengths and weaknesses, and formulating more targeted marketing strategies. In addition, this study can also be a reference for the development of similar studies in the hotel industry and other service sectors.
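
The TF-IDF weighting step described above can be sketched in its textbook form, where a term's weight is its relative frequency in a document multiplied by the log of the inverse document frequency. The function `tf_idf` and the three toy reviews are hypothetical; library implementations often use smoothed or normalised variants, so their values may differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using
    tf = count / document length and idf = log(N / document frequency)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy reviews, for illustration only
reviews = ["room clean staff friendly", "room dirty", "staff friendly room"]
w = tf_idf(reviews)
```

In this toy corpus "room" occurs in every review and so receives a weight of zero, while "dirty" occurs in only one and is weighted highest there, which is the behaviour the abstract attributes to the method.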

Bambang Minto Basuki

Jupiter: Publikasi Ilmu Keteknikan Industri, Teknik Elektro dan Informatika 2025 Asosiasi Riset Ilmu Teknik Indonesia

The Paiton Steam Power Plant (PLTU) is one of the main sources of electrical energy in East Java, which plays a vital role in maintaining a sustainable electricity supply. The reliability of generator units is a key element in maintaining stable energy distribution. However, the high frequency of sudden generator failures poses serious challenges, such as increased downtime and increased maintenance costs. To address these challenges, this study aims to design a generator maintenance prediction model based on the Naive Bayes algorithm with a predictive maintenance approach. This study uses historical maintenance data and key sensor parameters such as temperature, oil pressure, and vibration as input. The data is analyzed through several stages, namely data preprocessing, selection of relevant features, and labeling generator conditions into three categories: Normal, Warning, and Critical. The Naive Bayes model is trained to classify the data probabilistically to generate predictions of future generator conditions. Model evaluation using accuracy metrics and a confusion matrix shows that the model successfully achieved an accuracy rate of 89% and was able to provide early warnings of potential failures up to 3 days before failure occurs. The implementation of this system is expected to support the shift in maintenance strategies from reactive and scheduled systems to data-driven predictive systems. Implementing failure predictions allows the technical team at the Paiton PLTU to conduct planned maintenance, avoid sudden disruptions, and extend equipment lifespan. Thus, this model has the potential to reduce operational downtime by up to 25%, while providing significant savings in operational and logistics costs. This research also shows that integrating machine learning technology into energy facility management can improve the efficiency and resilience of the overall electric power system.

Prashanthan, Amirthanathan

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The study presents a comprehensive framework for optimizing customer retention budget by integrating clustering, classification, and mathematical optimization techniques. The study begins with the IBM Telco dataset, which is prepared through data cleansing, encoding, and scaling. In the preliminary phase, customer segmentation is performed using K-Means clustering, with k = 3 and k = 4 identified as optimal based on the elbow method and Silhouette score. The configurations produced three (Premium, Standard, Low) and four (Premium, Standard Plus, Standard, Low) customer segments based on purchase preferences, which served as input features for churn prediction. In the second phase, the dataset was divided into training and test sets in an 80:20 ratio, followed by data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN). Multiple classification algorithms were evaluated, including Naive Bayes (NB), Random Forest (RF), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP), using F1-score as the performance metric. CatBoost and LightGBM, with k values of 3 and 4, respectively, were the highest-performing classification models, with only minimal differences in performance. Ultimately, customer segmentation established customer prioritization, whereas churn prediction assessed customer churn likelihood. Four distinct configurations were assessed utilizing mixed-integer linear programming (MILP) to optimize retention budget allocation within uniform budget constraints, discount amounts, and churn thresholds. In both the k = 3 and k = 4 scenarios, CatBoost surpassed LightGBM, with CatBoost at k = 3 effectively discounting 66% of at-risk consumers across all three segments, hence improving the intervention's efficacy and budget allocation, making it the ideal choice for maximizing customer retention. The results demonstrate the importance of segmentation in enhancing retention budgeting and budget optimization, particularly concerning parameter sensitivity.

Lailiah, Badariatul; saadah, Rabiatus; Rizka Dahlia

Jurnal Elektronika dan Komputer 2025 STEKOM PRESS

Technological advancements have brought fundamental changes in the way we interact with digital images and photography. One significant milestone in this development is the Photoshop Express Photo Editor, which has become a primary platform for image processing and editing. Datasets are used to analyze sentiment and are utilized during the accuracy testing phase. Based on the testing results, the Convolutional Neural Network (CNN) algorithm achieved an average accuracy value of 86.50%, compared to the Naïve Bayes (NB) algorithm, which achieved an average accuracy value of 75%. The results of the research conclude that the choice of sentiment analysis method should be tailored to the needs and limitations of the system. If a fast, light, and easy-to-understand process is required, the Naive Bayes method is the right choice. However, if accuracy and context understanding are the top priorities, then CNN is a superior approach, although it requires more resources. Additionally, based on the Wordcloud data, it is known that the majority of comments are positive, indicating that the reviews or texts analyzed contain many positive expressions related to quality, usability, and ease of use.

Eka Wulansari Fidayanthie; Asep Sayfulloh; Mardiana Rafa Alzena; Nilam Kurnia Sari

Saturnus: Jurnal Teknologi dan Sistem Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Lungs are vital organs in the human respiratory system, responsible for fulfilling the body's oxygen needs. If the lungs experience health problems, it can have adverse effects on the human respiratory system. Common causes of lung diseases are usually due to inhaling air contaminated by dust, smoke, viruses, and bacteria. This study aims to compare the performance of two classification algorithms, namely Random Forest and Naive Bayes, in predicting lung diseases. The data used was obtained from the Kaggle website and processed using RapidMiner software. The attributes involved include smoking habits, pre-existing conditions, staying up late, exercise activities, age, and outcomes. Based on the test results, the Random Forest algorithm demonstrated the best performance with an accuracy of 93%, while the Naive Bayes algorithm achieved an accuracy of 87%. These findings indicate that the Random Forest algorithm outperforms the Naive Bayes algorithm in terms of lung disease prediction accuracy.

Rosa Ratri Kusuma Hariningsih; Diwahana Mutiara Candrasari; Endang Setyawati; Syamsu Wahidin; Jevon Nataniel Putra

International Journal of Computer Technology and Science 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Dengue Fever (DF) continues to be a major public health threat in Indonesia, especially in urban areas with high population density, such as Purwokerto City. This study aims to develop a predictive model to identify high-risk areas for DF outbreaks by integrating Machine Learning (ML) algorithms and Geographic Information Systems (GIS). The research utilizes historical dengue case data, meteorological parameters (rainfall, temperature, humidity), and population density as predictive variables. Three ML classification algorithms—Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM)—were implemented to develop risk prediction models. Extensive data preprocessing, feature selection, and spatial integration were applied to ensure model robustness. The results show that the SVM model outperformed other methods, achieving the highest accuracy, precision, recall, and F1-score in classifying dengue risk zones. Risk maps generated through GIS visualization successfully identify priority areas for targeted interventions. The novelty of this research lies in the combination of local epidemiological data, multi-algorithm comparison, and geospatial mapping to improve early warning systems for DF in Purwokerto. This integrated approach is expected to support more effective prevention strategies and enhance public health preparedness.

Seli, Francelia Regina; A. Ineke Pakereng, Magdalena

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Continuing technological advances have changed the way people carry out various activities, including online buying and selling. Various e-commerce platforms serve the Indonesian market, including TikTok, a popular social media application. This study examines the satisfaction of TikTok Shop users with the UI/UX using the Naïve Bayes algorithm. The study follows the CRISP-DM method, whose stages are business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Processing 60 test records in RapidMiner yielded an accuracy of 88.33% for the user interface and 76.67% for the user experience. This shows that the user interface and user experience are factors that influence the satisfaction of TikTok Shop users.

Saputri, Eliana

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

The importance of data mining in Indonesia is increasing along with the growth of big data in various strategic sectors. Data mining plays an important role in transforming complex data into useful information to support data-driven decision making, which is urgently needed in the face of competitive challenges and operational complexity. This research aims to examine the development of data mining techniques and applications in Indonesia over the last decade (2015-2024). Through a systematic literature review approach, data was collected from academic publications in SCOPUS indexed databases. From the initial 95 papers found, a further selection was made based on accessibility, title, and abstract until 64 papers were included in the article review. The results show that techniques such as K-Means, Naive Bayes, and Decision Tree are most commonly used. In the business sector, clustering through K-Means is widely applied for market segmentation and consumer pattern analysis. The healthcare sector mainly utilizes classification techniques, such as Naive Bayes and Decision Tree, for disease risk prediction and early diagnosis. Meanwhile, the education sector uses data mining to assess student performance and predict potential dropouts, assisting institutions in optimizing learning strategies.

Yayang Tika Robiatush Sholiha; Lubna Asjad Muhda Nabilah; Imron Imron

Saturnus: Jurnal Teknologi dan Sistem Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to evaluate user sentiment toward the Liputan6.com application available on the Google Play Store. In the digital era, user reviews serve as a significant indicator in assessing the quality of an application. However, the inconsistency between rating scores and review content renders manual analysis less objective. To address this issue, a machine learning approach was adopted by comparing two algorithms, namely Support Vector Machine (SVM) and Naïve Bayes (NB). A total of 2,500 reviews were collected through a web scraping process and automatically labeled based on the rating (positive if ≥ 3, negative if < 3). The data preprocessing stages included cleaning, case folding, tokenizing, stopword removal, and token filtering. Subsequently, word weighting was carried out using the TF-IDF method, followed by classification using 10-Fold Cross Validation in RapidMiner. The evaluation results indicate that, in the positive class, NB demonstrated superior precision (89.47%), whereas SVM achieved higher recall (98.94%) and F1-score (90.96%). In the negative class, SVM performed better in terms of precision (66.15%), while NB attained higher recall (65.65%) and F1-score (36.34%). Further evaluation based on AUC and accuracy positioned SVM in the good category (AUC 0.842; accuracy 83.82%), while NB was categorized as fail (AUC 0.505; accuracy 60.87%). Overall, SVM is considered to be more effective than NB.
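
The automatic labeling rule stated in this abstract (positive if the rating is at least 3, negative otherwise) is simple enough to express directly. The function name and the sample ratings below are hypothetical.

```python
def label_from_rating(rating):
    """Positive for ratings of 3 or more, negative otherwise."""
    return "positive" if rating >= 3 else "negative"

# Sample star ratings, for illustration only
ratings = [5, 3, 2, 1, 4]
labels = [label_from_rating(r) for r in ratings]
```

A rule like this treats a 3-star review as positive even when its text is critical, which is the kind of mismatch between rating and review content that the abstract raises as a concern.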