SciRepID - Scientific Publication Search

Publication Search

49,117 articles from 425 journals · 1,447 citations tracked

Showing 1-20 of 34


Mesra Betty Yel; Sopan Adrianto; Rasiban Rasiban; Eva Widiyanti

International Journal of Information Engineering and Science 2026 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The growth of information technology has driven changes in consumer behavior, one of which is through e-commerce platforms such as Shopee. This phenomenon has generated a large number of customer reviews, including those for local cosmetic products such as Wardah. These reviews serve as an important source of information for understanding customer perceptions and satisfaction levels. However, manual analysis of large and linguistically diverse datasets is inefficient and potentially subjective. This study aims to implement the multi-category Naive Bayes algorithm to classify the sentiment of Wardah product reviews on Shopee into three categories: positive, negative, and neutral. The data were collected using a web scraping technique and processed through a series of preprocessing stages including case folding, tokenization, stopword removal, stemming, and text cleaning. Subsequently, term weighting was performed using the TF-IDF method prior to classification. Model performance was evaluated using a confusion matrix as well as accuracy, precision, and recall metrics. The results indicate that the multi-category Naive Bayes algorithm achieved an accuracy of 86.00%, a precision of 86.63%, and a recall of 98.24%. This approach can assist business practitioners in objectively understanding customer opinions and support decision-making in business strategy and product development.
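The classification step described above can be illustrated with a minimal multinomial Naive Bayes sketch using Laplace (add-one) smoothing. This is not the authors' implementation. The toy token lists and labels are hypothetical, preprocessing (case folding, tokenization, stopword removal, stemming) is assumed to be done already, and raw term counts stand in for the TF-IDF weights the study used. Multinomial Naive Bayes also accepts fractional weights, so TF-IDF values could be substituted for the counts.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Return log priors and Laplace-smoothed log likelihoods per class."""
    vocab = {token for doc in docs for token in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Add-alpha smoothing so unseen (class, token) pairs never get zero probability
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Pick the class with the highest log posterior; out-of-vocabulary tokens are skipped."""
    log_prior, log_likelihood, vocab = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already-preprocessed training reviews
train_docs = [
    ["good", "cheap"], ["good", "fast"],
    ["bad", "broken"], ["broken", "slow"],
    ["okay", "average"], ["okay", "fine"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["good", "cheap"]))  # positive
```

Working in log space avoids floating-point underflow when reviews contain many tokens.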

Untung Surapati; Veri Arinal; Tri Wahyudi; Ahmad Fauzan

International Journal of Applied Mathematics and Computing 2026 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

The rise of social media has created a digital public sphere that enables users to express their opinions on social and political issues openly and in real time. One of the most discussed topics on the social media platform X is the trending hashtag #IndonesiaGelap, which reflects public concern and criticism regarding various governmental and societal conditions. This study aims to conduct sentiment analysis on tweets containing the hashtag to determine the overall sentiment trend among users. The method employed in this research is the Naive Bayes classification algorithm, known for its simplicity and effectiveness in text classification. To enhance the model’s performance, Particle Swarm Optimization (PSO) is applied to optimize feature selection and parameter tuning. The dataset consists of public tweets collected via the Twitter API, followed by preprocessing, feature extraction using TF-IDF, and sentiment classification into three categories: positive, negative, and neutral. The results indicate that the integration of PSO significantly improves the classification accuracy of the Naive Bayes model compared to the baseline. The majority of tweets related to #IndonesiaGelap exhibit a negative sentiment, indicating widespread public dissatisfaction and criticism. This research is expected to contribute to a better understanding of public perception and serve as valuable input for stakeholders in addressing social issues in the digital age.

Eko Susanto; Sharipuddin Sharipuddin; Benni Purnama

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce in Indonesia, particularly the Shopee platform, has generated a large volume of user reviews on the Google Play Store, which can be analyzed to understand consumer sentiment. This study aims to compare the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in binary sentiment classification (positive and negative) on Shopee reviews, as well as to statistically test the significance of their differences using One-Way ANOVA. A total of 400,498 reviews were collected via web scraping, preprocessed through text normalization, tokenization, and Indonesian language stemming, and then feature-extracted using TF-IDF and Count Vectorizer. Evaluation results show that SVM achieved an accuracy of 91.77%, precision of 91.49%, recall of 91.77%, and F1-Score of 91.56%, while RF achieved an accuracy of 90.07%, precision of 91.68%, recall of 90.07%, and F1-Score of 90.55%. ANOVA confirmed that the performance difference between the two algorithms is statistically significant (p-value = 0.0007) with a large effect size (η² = 0.1815). Therefore, SVM is recommended as a more optimal and consistent algorithm for automated sentiment analysis of Indonesian e-commerce reviews, while also providing a replicable methodological framework for similar future research.
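The ANOVA statistics reported above can be sketched in plain Python. The abstract does not say exactly which samples were compared, so this sketch assumes each group is a list of scores for one model (for example, per-run accuracies); the group values below are hypothetical. It computes the F statistic and the effect size eta squared (η² = SS_between / SS_total). The p-value additionally requires the F distribution, which `scipy.stats.f_oneway` provides.

```python
def one_way_anova(*groups):
    """Return (F statistic, eta squared) for two or more groups of scores."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership vs. variation inside each group
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


f_stat, eta_sq = one_way_anova([1, 2, 3], [4, 5, 6])
print(f_stat, round(eta_sq, 4))  # 13.5 0.7714
```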

Putri Ramadani; Nur Aisyah Pandia; Salsabila Putri Hati Siregar

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

The spread of hoax news in digital media is a serious problem because it can affect public opinion and social stability. This study aims to classify hoax news using the Support Vector Machine (SVM) algorithm. The dataset is a hoax clarification dataset from the Ministry of Communication and Digital (Komdigi) of the Republic of Indonesia, totaling 1,872 records. The research process includes data collection, text preprocessing, feature extraction using TF-IDF, and classification using the SVM algorithm. Implementation was carried out using Google Colaboratory (Google Colab). Test results show that the SVM algorithm performs well in classifying hoax news by topic, with satisfactory accuracy, precision, recall, and F1-score values.

Afif Lustyo Muji; Aziz Musthofa; Dihin Muriyatmoko

Prosiding Seminar Nasional Ilmu Teknik 2026 Asosiasi Riset Ilmu Teknik Indonesia

Since the announcement of the policy plan for a name transfer system in the sale of used mobile phones, the issue has attracted widespread public attention and discussion. People have expressed their opinions on social media platforms, particularly TikTok. This study aims to classify the sentiment of TikTok users using Naive Bayes and Support Vector Machine (SVM) algorithms. The data were collected through a comment scraping technique on related content. The research stages include text preprocessing, sentiment labeling into positive, negative, and neutral categories, and feature extraction using TF-IDF. The classification process employs Naive Bayes and Support Vector Machine algorithms, which are then evaluated based on accuracy, precision, recall, and F1-score. The results of this study indicate that both methods are capable of classifying sentiment effectively. However, the Support Vector Machine method is superior to the Naive Bayes method with an accuracy rate of 99.57% compared to 94.30%. This study is expected to help the government understand public responses to the planned policy of the used mobile phone name transfer system.

Rangga Wahyu Dealova; Deo Pradana; Ali Akbar Ramadhan; Safrizal Safrizal

Jurnal Kendali Teknik dan Sains 2026 International Forum of Researchers and Lecturers

Educator certificates are official documents that play a crucial role for teachers, as they serve as legal proof of professional competence and are required for various administrative purposes, such as professional allowance applications, promotion, transfer, and institutional accreditation. Along with the increasing number of educators in Indonesia, the volume of educator certificate data managed by educational institutions has also grown significantly. However, certificate management is still largely conducted in a conventional manner, functioning merely as digital or physical archives without an effective search mechanism, resulting in inefficiencies and difficulties in retrieving relevant documents. Therefore, an information retrieval approach is needed to support fast and accurate document searching. This study aims to analyze and implement an information retrieval system for educator certificates using the Cosine Similarity method. The research data consist of educator certificate documents, including professional educator certificates, training certificates, and competency certificates. The retrieval process involves text preprocessing, term weighting using TF-IDF, and similarity measurement using Cosine Similarity. The results show that document d1 (Professional Mathematics Educator Certificate) has the highest similarity value to the query “educator certificate,” as it contains all query terms with relatively high TF-IDF weights. Document d3 ranks second due to partial term similarity, while document d2 has the lowest similarity value because it shares only one common term with the query. These findings indicate that the Cosine Similarity method is effective in ranking educator certificate documents based on their content relevance in an objective and measurable manner. The proposed system can improve the efficiency and accuracy of educator certificate document management and retrieval in educational institutions.
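The retrieval pipeline described above (TF-IDF weighting followed by cosine similarity ranking) can be sketched as follows. The three-document corpus and query are hypothetical and are not the study's documents. Raw term frequency is multiplied by a smoothed IDF, ln(N/df) + 1; this variant is an assumption, since the abstract does not give the exact weighting. With the plain ln(N/df) form, a term occurring in every document would receive zero weight.

```python
import math
from collections import Counter


def build_idf(corpus):
    """Smoothed IDF, ln(N / df) + 1, computed over a dict of tokenized documents."""
    n = len(corpus)
    df = Counter(term for doc in corpus.values() for term in set(doc))
    return {term: math.log(n / d) + 1.0 for term, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(corpus, query):
    """Return (document name, score) pairs sorted by descending similarity."""
    idf = build_idf(corpus)
    q = tfidf_vector(query, idf)
    scores = [(name, cosine(q, tfidf_vector(doc, idf))) for name, doc in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-preprocessed documents
corpus = {
    "d1": ["professional", "educator", "certificate"],
    "d2": ["training", "certificate"],
    "d3": ["educator", "competency", "workshop"],
}
ranked = rank(corpus, ["educator", "certificate"])
print(ranked[0][0])  # d1, the only document containing both query terms
```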

Achmad Faris Fadhlulah; Dika Arif Sihombing; Muhammad Fahri Rinanda; Rizki Riandi; Sotar Ferdinand Hutabarat

Jurnal Kendali Teknik dan Sains 2026 International Forum of Researchers and Lecturers

The Indonesia Smart Program (Program Indonesia Pintar/PIP) is a government initiative aimed at ensuring equal access to education for students from underprivileged families, including those at the junior high school (SMP) level. However, at the school level, the management of PIP recipient data still faces several challenges, particularly in data searching and utilization, due to the increasing volume of data and the use of simple or manual search methods. These conditions can lead to delays in obtaining information and reduce the accuracy of decision-making. Therefore, an effective information retrieval system is needed to manage and search PIP recipient data efficiently. This study aims to design and develop an Information Retrieval System for PIP recipient data at the junior high school level using the Term Frequency–Inverse Document Frequency (TF-IDF) method. The TF-IDF method is applied to assign weights to terms in each document, enabling the system to identify and rank documents based on their relevance to user queries. The test results show that the system is able to measure document relevance accurately, where documents D3 and D4 obtain the highest similarity value of 0.099586089 and are classified as highly relevant, while other documents show lower similarity values down to zero. These results are also supported by graphical visualization, which helps users compare relevance levels more clearly. Thus, the implementation of the TF-IDF method has proven to be effective in supporting accurate, efficient, and systematic searching and management of PIP recipient data at the junior high school level.
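For reference, one common TF-IDF weighting is shown below. The abstract does not state which variant (log base, smoothing, or length normalization) the system used, so this formula is an assumption rather than the authors' exact definition.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \times \log_{10}\!\left(\frac{N}{\mathrm{df}_t}\right)
```

Here tf is the number of times term t occurs in document d, N is the number of documents, and df is the number of documents containing t. For example, a term occurring 3 times in a document and appearing in 2 of 10 documents gets w = 3 × log10(5) ≈ 3 × 0.699 ≈ 2.097.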

Aditya Abdulloh Masykur; Rino Raihan Gumilang; Harun Al Rosyid

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

The performance of the Indonesian National Team (Timnas) in the 2026 World Cup qualifications has triggered massive and diverse responses on social media, particularly on platform X. This study aims to identify and classify public sentiment regarding Timnas Indonesia's performance into positive, negative, and neutral categories using a data mining approach. Text data was processed through pre-processing stages, term weighting using TF-IDF, and the application of the Synthetic Minority Over-sampling Technique (SMOTE) to address significant class distribution imbalance. The classification algorithm employed was Multinomial Naïve Bayes. Model performance evaluation was conducted by comparing two training-testing data split scenarios: 90:10 and 80:20 ratios. The results indicate that public opinion is dominated by negative sentiment at 73.2%, reflecting public disappointment. In terms of model performance, the 90:10 ratio scenario yielded the best accuracy of 80%, outperforming the 80:20 ratio which recorded an accuracy of 75%. These findings demonstrate that combining Multinomial Naïve Bayes with the SMOTE technique is effective in handling imbalanced text data and is capable of accurately mapping public perception.
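The core idea behind SMOTE can be sketched without external libraries. Each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a simplified illustration with hypothetical 2-D points, not the study's code. In practice, TF-IDF vectors would usually be resampled with a library implementation such as imbalanced-learn's `SMOTE(...).fit_resample(X, y)`, applied to the training split only so that synthetic samples do not leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample within the minority class, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay within the region the minority class already occupies.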

Elin Tamaya; Sharipuddin Sharipuddin; Nurhadi Nurhadi

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Budget efficiency is an important issue in state financial management because it is directly related to government spending priorities and their impact on public service programs. Discussions about budget efficiency policies are widespread on the social media platform X, generating diverse public responses, thus necessitating an automated approach to understand public opinion trends more quickly and objectively. This research aims to analyze the sentiment of Indonesian people toward budget efficiency policies and compare the performance of the Naïve Bayes and Support Vector Machine (SVM) algorithms in classifying sentiment. The research data consisted of 10,909 Indonesian-language tweets sourced from a public dataset, which were then processed through preprocessing stages including cleaning, case folding, normalization, tokenization, stopword removal, and stemming. Sentiment labeling was performed automatically using the Indonesian Sentiment Lexicon (InSet) approach to categorize data into positive, negative, and neutral sentiments. Feature extraction was performed using Term Frequency–Inverse Document Frequency (TF-IDF), and then the data was divided into training and testing sets with an 80:20 ratio. Model performance evaluation was conducted using a confusion matrix and the metrics of accuracy, precision, recall, and F1-score. The research results show that sentiment distribution is dominated by negative sentiment at 56.78%, followed by positive sentiment at 37.40%, and neutral sentiment at 5.83%. In the classification stage, SVM performed best with an accuracy of 86%, while Naïve Bayes achieved an accuracy of 74%. These findings indicate that SVM is more optimal for sentiment classification on social media text data and can be utilized to more effectively support the analysis of public response to budget efficiency policies.
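The evaluation step described above (a confusion matrix plus accuracy, precision, recall, and F1-score) can be sketched for a three-class problem. The labels and predictions below are hypothetical. The abstract does not say how per-class scores were averaged, so this sketch reports macro averages as an assumption; weighted averages are also common.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def classification_metrics(y_true, y_pred, labels):
    """Return the confusion matrix, accuracy, per-class (P, R, F1) and their macro averages."""
    cm = confusion_matrix(y_true, y_pred, labels)
    size = len(labels)
    accuracy = sum(cm[i][i] for i in range(size)) / len(y_true)
    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(size))  # column sum
        actual = sum(cm[i])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)
    macro = tuple(sum(scores[j] for scores in per_class.values()) / size for j in range(3))
    return cm, accuracy, per_class, macro


# Hypothetical gold labels and predictions
y_true = ["positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "positive"]
cm, accuracy, per_class, macro = classification_metrics(
    y_true, y_pred, ["positive", "negative", "neutral"]
)
print(cm)        # [[1, 1, 0], [0, 2, 0], [1, 0, 0]]
print(accuracy)  # 0.6
```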

Fransiskus Dapot Sihaloho; Jasmir Jasmir; Gunardi Gunardi

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce platforms in Indonesia, particularly Tokopedia, has resulted in a large volume of consumer reviews containing valuable information regarding customer perceptions and satisfaction. However, manual analysis of such reviews is inefficient and prone to subjectivity, necessitating an automated approach based on machine learning. This study aims to classify the sentiment of sports product reviews on Tokopedia into positive, negative, and neutral categories by applying Logistic Regression, Support Vector Machine (SVM), and Random Forest using the Term Frequency–Inverse Document Frequency (TF-IDF) approach. The data were collected through web scraping of Indonesian-language sports product reviews and processed through several preprocessing stages, including data cleaning, case folding, tokenization, stopword removal, and stemming. Feature representation was performed using TF-IDF to transform textual data into numerical vectors, after which the dataset was divided into training and testing sets with an 80:20 ratio. Model performance was evaluated using accuracy, precision, recall, and F1-score metrics. The results indicate that the application of TF-IDF significantly improves the performance of all models, with SVM consistently achieving the most optimal performance compared to Logistic Regression and Random Forest. These findings demonstrate that classical machine learning algorithms combined with TF-IDF remain highly effective for sentiment analysis of Indonesian-language text. The implications of this study are expected to assist sellers in understanding customer opinions, support consumers in making informed purchasing decisions, and serve as a foundation for the development of sentiment analysis and recommendation systems on e-commerce platforms.

Srikandi Alifya; Jasmir Jasmir; Elvi Yanti

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The growth of e-commerce in Indonesia has led to an increase in product reviews, including for beauty products on Tokopedia and Shopee. These reviews serve as important sources of information to assess consumer satisfaction; however, manually analyzing thousands of reviews daily is impractical. This study applies Natural Language Processing (NLP) with the Naive Bayes, C4.5, and XGBoost algorithms to classify sentiment in Indonesian-language reviews. The dataset consists of 76,256 reviews labeled as positive, negative, or neutral. The research stages include text preprocessing, feature representation using Bag-of-Words (BoW) and TF-IDF, data balancing through SMOTE, and model performance evaluation based on accuracy, precision, and recall. Differences in results among the algorithms were analyzed using ANOVA. The results show that Naive Bayes achieved the highest accuracy at 67.71%, followed by XGBoost at 65.91% and C4.5 at 58.39%. Naive Bayes performed best in identifying positive and negative sentiments, while XGBoost and C4.5 handled more complex data patterns effectively. These findings provide guidance for sentiment analysis in Indonesian and support businesses in obtaining automated insights from customer reviews to improve product quality and services.

Nanda Mediya Sari; Jasmir Jasmir; Elvi Yanti

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Sentiment analysis is a Natural Language Processing (NLP) technique used to identify users' opinion tendencies based on textual reviews. This study analyzes user reviews of the Maxim application on the Google Play Store and compares three machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and CatBoost, in classifying sentiment. The research stages include data collection, text preprocessing, feature extraction using TF-IDF and Chi-Square, class balancing using SMOTE, and performance evaluation through accuracy, precision, recall, and F1-score. ANOVA is used to examine the influence of feature selection on model performance. The results show that each model exhibits different performance levels across the tested feature combinations. CatBoost achieved the highest accuracy, 99.26%, and demonstrated the most stable performance. Meanwhile, the Naïve Bayes and SVM models showed performance decreases in the experiments, especially after applying SMOTE. These findings indicate that the choice of algorithm, feature extraction method, and class balancing technique significantly affects classification outcomes. Overall, CatBoost is identified as the best-performing model, providing more consistent classification results given the characteristics of the user reviews.

Noronha, Marcelino Caetano; Dwiasnati, Saruni; Helena P Panjaitan, Cherlina

Journal of Information Technology and Computer Science 2025 International Forum of Researchers and Lecturers

The rapid diffusion of Generative Artificial Intelligence (AI) has intensified public debate regarding its benefits, risks, and societal implications. This study investigates public sentiment and thematic structures surrounding Generative AI by analyzing Twitter discourse as a representation of large-scale, real-time public perception. The research addresses two main problems: how public sentiment toward Generative AI is distributed and what dominant themes shape this perception. Accordingly, the objective is to map both emotional polarity and thematic narratives embedded in social media conversations. A computational mixed-methods approach was employed using a dataset of 12,470 tweets collected on 17 December 2024. Sentiment classification was conducted using a transformer-based DistilBERT model, while semantic representations were generated with Sentence-BERT. Topic modeling was performed using BERTopic, integrating HDBSCAN clustering and class-based TF-IDF to extract coherent and interpretable topics. Human-in-the-loop validation supported the interpretive robustness of topic labeling. The findings reveal that public sentiment toward Generative AI is predominantly positive (41.8%), particularly in relation to productivity enhancement, education, and creative applications. Neutral sentiment (31.4%) reflects informational discourse, while negative sentiment (26.8%) centers on ethical concerns, privacy risks, misinformation, and AI hallucinations. Seven dominant topics were identified, with clear topic–sentiment alignment showing optimism in utility-driven themes and skepticism in ethics- and risk-related discussions. In conclusion, public perception of Generative AI is dualistic, characterized by strong enthusiasm alongside persistent caution. These results provide empirical insights for AI governance, responsible innovation, and future research on socio-technical impacts of Generative AI.

Ryzal Nur Alvandy; Arita Witianti

Jurnal Elektronika dan Komputer 2025 STEKOM PRESS

The rapid expansion of e-commerce in Indonesia has resulted in a significant rise in the number of customer reviews, which serve as a valuable source of insight for understanding consumer satisfaction. This study aims to classify sentiments in product reviews on the Tokopedia platform into three categories (positive, neutral, and negative) using the Support Vector Machine (SVM) algorithm. The data were ethically collected through web scraping and include review text, ratings, and the number of “likes.” The preprocessing stage involved several NLP techniques, after which data representations were generated using the Term Frequency–Inverse Document Frequency (TF-IDF) method, while class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Based on the test results, the SVM model with a linear kernel achieved an accuracy of 79.48% on the test data, performing best in classifying positive sentiments. However, the classification of neutral and negative sentiments still requires improvement. This study demonstrates that combining the TF-IDF method, additional numerical features, and data balancing techniques can produce an efficient sentiment analysis model in the e-commerce domain.

Gunawan, Ricardho; Hendry, Hendry

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Sentiment analysis of guest reviews is a crucial aspect in improving the quality of hotel services. This study aims to analyze the sentiment of guest reviews regarding the services of Grand Diamond Hotel Yogyakarta using a machine learning approach with the Support Vector Machine (SVM) algorithm. SVM was chosen because it can handle high-dimensional data such as text and is capable of forming an optimal separating hyperplane between sentiment classes. The research data was obtained through web scraping from Traveloka, yielding 1,119 reviews, which were processed through preprocessing, translation, and sentiment labeling using the TextBlob library. After TF-IDF weighting, the data was divided into 80% for training and 20% for testing. The linear kernel SVM model achieved 80% accuracy in classifying the reviews into positive, negative, and neutral categories. The results of this study were implemented in a web-based application equipped with data visualization and model evaluation features, allowing hotel management to efficiently monitor and analyze guest sentiment and support data-driven service quality improvement.

Devi Daniyanti; Belsana Butar Butar

Jurnal Sistem Informasi dan Ilmu Komputer 2025 International Forum of Researchers and Lecturers

This research aims to analyze GoPay user sentiments on the X social media platform (formerly known as Twitter) using the Naive Bayes Classifier algorithm. Sentiment analysis was conducted to understand user perceptions and satisfaction levels towards GoPay digital payment services based on their shared comments and reviews. Data were collected by crawling tweets containing the keyword "GoPay" within a specific period. The research stages included data preprocessing (case folding, tokenizing, filtering, and stemming), sentiment labeling (positive, negative), word weighting using TF-IDF, and classification using the Naive Bayes algorithm. Of the 1,431 tweets analyzed, 797 were labeled positive and 643 negative, and the classifier reached an accuracy of 82.94%. The most frequently praised factors were ease of use and promotional offers, while the main complaints related to technical issues and customer service. This research provides insights for GoPay developers to improve services according to user feedback.

Farendika Rezzi

Uranus: Jurnal Ilmiah Teknik Elektro, Sains dan Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The rapid growth of e-commerce platforms has significantly transformed the way consumers share and access product feedback. One of the widely used platforms in Indonesia is Shopee, where customers actively provide reviews of various products, including local skincare brands such as Kahf facial wash. Customer reviews on e-commerce platforms contain valuable information that can be analyzed to understand consumer opinions and preferences. Sentiment analysis, as a branch of natural language processing, enables the classification of textual data into categories such as positive, negative, or neutral. This study aims to classify Shopee user sentiments regarding Kahf facial wash products by implementing the Multinomial Naïve Bayes algorithm, a well-known probabilistic classifier suitable for text categorization. The research methodology consisted of several preprocessing stages, including data cleansing, case folding, tokenizing, stopword removal, and stemming, to prepare raw review texts for further analysis. For feature representation, the Term Frequency–Inverse Document Frequency (TF-IDF) method was applied to capture the importance of words across documents. To evaluate the classification performance, K-Fold cross-validation was employed with K values of 4, 5, 6, and 10 to ensure model reliability and robustness. Considering the issue of imbalanced datasets in user-generated reviews, the Synthetic Minority Over-sampling Technique (SMOTE) was utilized to balance the distribution of sentiment classes. Based on the confusion matrix, the Multinomial Naïve Bayes algorithm demonstrated effective performance in classifying sentiments, achieving satisfactory levels of accuracy, precision, and recall across different folds. These results indicate that the algorithm is capable of handling sentiment analysis tasks for local product reviews effectively. 
The findings of this study are expected to provide meaningful insights for businesses in understanding consumer perceptions, thereby supporting decision-making processes in product development, marketing strategies, and customer engagement for local brands.
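The K-fold evaluation described above can be sketched as a small index splitter for the K values the study tested (4, 5, 6, and 10). This version uses contiguous folds without shuffling, and the sample count of 23 is hypothetical. The fold sizes mirror the unshuffled behaviour of scikit-learn's `KFold`; with imbalanced sentiment classes, `StratifiedKFold` is often preferred, and oversampling such as SMOTE should be fitted inside each training fold only.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for k in (4, 5, 6, 10):
    sizes = [len(test) for _, test in k_fold_indices(23, k)]
    print(k, sizes)
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every review once for testing.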

Muhammad Azlan; Elvi Rahmi

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to analyze the sentiment of customer reviews of the Grand Jatra Hotel Pekanbaru on the Google Review platform using the Naïve Bayes algorithm. Social media and online review platforms are increasingly becoming the primary source of information for potential customers in making purchasing decisions, particularly in the hospitality sector. Therefore, sentiment analysis of customer reviews is crucial for understanding consumer perceptions and providing strategic input for hotels in improving service quality. The research data was collected using web scraping techniques to obtain publicly available customer reviews. The obtained data was then processed through text preprocessing stages including case folding, tokenizing, normalization, stopword removal, and stemming. The Term Frequency-Inverse Document Frequency (TF-IDF) method was then used to weight each word, so that more relevant words have a greater influence in the classification process. The sentiment classification process was carried out into two main categories, namely positive and negative. The Naïve Bayes model was trained using training data and then tested with test data to measure the algorithm's performance in classifying sentiment. The evaluation results show that the model built is able to achieve an accuracy level of 98%, with a precision value of 97% and a recall of 100% in the positive class, and 92% in the negative class. These findings confirm that the Naïve Bayes algorithm can be effectively used in analyzing customer sentiment towards hotel services and facilities. Practically, the results of this study are expected to provide insight for the management of Grand Jatra Hotel Pekanbaru in understanding customer perceptions, identifying service strengths and weaknesses, and formulating more targeted marketing strategies. In addition, this study can also be a reference for the development of similar studies in the hotel industry and other service sectors.

Yayang Tika Robiatush Sholiha; Lubna Asjad Muhda Nabilah; Imron Imron

Saturnus: Jurnal Teknologi dan Sistem Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to evaluate user sentiment toward the Liputan6.com application available on the Google Play Store. In the digital era, user reviews serve as a significant indicator in assessing the quality of an application. However, the inconsistency between rating scores and review content renders manual analysis less objective. To address this issue, a machine learning approach was adopted by comparing two algorithms, namely Support Vector Machine (SVM) and Naïve Bayes (NB). A total of 2,500 reviews were collected through a web scraping process and automatically labeled based on the rating (positive if ≥ 3, negative if < 3). The data preprocessing stages included cleaning, case folding, tokenizing, stopword removal, and token filtering. Subsequently, word weighting was carried out using the TF-IDF method, followed by classification using 10-Fold Cross Validation in RapidMiner. The evaluation results indicate that, in the positive class, NB demonstrated superior precision (89.47%), whereas SVM achieved higher recall (98.94%) and F1-score (90.96%). In the negative class, SVM performed better in terms of precision (66.15%), while NB attained higher recall (65.65%) and F1-score (36.34%). Further evaluation based on AUC and accuracy placed SVM in the good category (AUC 0.842; accuracy 83.82%), while NB fell into the fail category (AUC 0.505; accuracy 60.87%). Overall, SVM is considered to be more effective than NB.
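The rating-based labeling rule stated above can be written as a short function. The 1 to 5 range matches Google Play's star scale, and the ratings in the usage line are hypothetical. Under this rule, 3-star reviews fall into the positive class.

```python
def label_from_rating(rating):
    """Map a 1-5 star rating to a binary label: positive if >= 3, negative if < 3."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    return "positive" if rating >= 3 else "negative"


ratings = [5, 3, 2, 1, 4]  # hypothetical star ratings
print([label_from_rating(r) for r in ratings])
# ['positive', 'positive', 'negative', 'negative', 'positive']
```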

Jasmine Aulia Mumtaz; Kinaya Khairunnisa Komariansyah; Wildan Holik; Muhammad Galuh Gumelar; Reza Pratama +1 more

Jurnal Rumpun Ilmu Bahasa dan Pendidikan 2025 Asosiasi Periset Bahasa Sastra Indonesia

Digital learning applications like HeyJapan are increasingly popular. User reviews on platforms such as Google Play Store contain valuable information on user perceptions and experiences. To process this information systematically, this study employs a Natural Language Processing (NLP) approach to analyze sentiment toward the HeyJapan application. Data were collected using web scraping techniques with Python and the google-play-scraper library, yielding the 1,000 most recent user reviews. The analysis included data collection, preprocessing, sentiment labeling using TextBlob, visualization, modeling with Logistic Regression, and evaluation. After preprocessing, 923 valid reviews were classified into three sentiment categories based on polarity: positive, neutral, and negative. Results showed that 71.4% of reviews were positive, 26.1% neutral, and 2.5% negative. Visualizations in pie charts and word clouds provided an overview of user perceptions. Modeling with TF-IDF and Logistic Regression achieved 88% accuracy, with the highest F1-score in the positive sentiment category. Evaluation indicates the model is fairly reliable in classifying sentiments, especially for positive and neutral categories, though negative sentiment classification needs improvement. This study shows that the NLP approach can evaluate user perceptions of educational applications based on reviews and serve as a basis for improving the quality of foreign language learning apps.
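Polarity-based labeling, as described above, maps a continuous score to three classes. TextBlob exposes such a score as `TextBlob(text).sentiment.polarity`, a float in [-1.0, 1.0]. The zero thresholds below are an assumption, since the abstract does not state the cut-offs used, and the scores are hypothetical.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a three-way label, using zero as the assumed threshold."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# Hypothetical polarity scores, e.g. as returned by TextBlob(text).sentiment.polarity
scores = [0.8, 0.0, -0.35, 0.1]
print([label_from_polarity(s) for s in scores])
# ['positive', 'neutral', 'negative', 'positive']
```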