SciRepID - Scientific Publication Search

Classification of Sales of Best-Selling Products in Ira Store Using Naive Bayes Algorithm and K-Nearest Neighbor Algorithm

Yuma Akbar; Kiki Setiawan; Muhammad Joko Umbaran Kharis Bahrudin; Intan Purwasih

International Journal of Electrical Engineering, Mathematics and Computer Science• 2024 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Competition in today's retail and technology sectors is fierce. As retail businesses multiply within a region, consumer needs grow, and retailers compete to develop their businesses by using existing technology. Daily sales transaction data keeps accumulating and takes up a lot of storage space. Toko Ira has more than 228 sales transaction records from 2023 to 2024 that have not been put to effective use. To address this problem, this research uses data mining to classify sales transaction data and determine which items sell best. The research is a case study with a qualitative approach, conducted with the Naive Bayes method in RapidMiner. The classification divides products into best-selling and non-selling categories. The results of this research show that the K-Nearest Neighbors (KNN) algorithm with a 50:50 data division is more effective in predicting and classifying sales of best-selling and non-selling products at the Ira store. The results show that the Naive Bayes algorithm has an accuracy of 89.91%, while the K-Nearest Neighbors (KNN) algorithm has an accuracy of 60.09%.

https://doi.org/10.62951/ijeemcs.v1i4.13

Open Access

Optimizing Heart Disease Prediction: A Comparative Study of Machine Learning Models Using Clinical Data

Budiman Budiman; Nur Alamsyah; Elia Setiana; Valencia Claudia Jennifer Kaunang; Syahira Putri Himmaniah

International Journal of Science and Mathematics Education• 2024 •Asosiasi Riset Ilmu Matematika dan Sains Indonesia

Cardiovascular disease is a leading cause of death globally, necessitating effective predictive systems. This research aims to analyze the effectiveness of various machine learning (ML) models—Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Support Vector Classifier (SVC), and K-Nearest Neighbors (KNN)—in predicting heart disease using publicly available health data. The study involved pre-processing data, training models, and evaluating them using accuracy, precision, recall, F1-score, and G-Mean metrics. The results show that KNN is the most reliable model, with the highest accuracy of 92%. Significant health features were identified, such as chest pain type and maximum heart rate. The study contributes to improving clinical decision support systems by identifying optimal ML models for heart disease prediction.

https://doi.org/10.62951/ijsme.v1i4.96

Open Access
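The heart disease abstract above lists accuracy, precision, recall, F1-score, and G-Mean as its evaluation metrics. As a minimal sketch of how these relate, the snippet below computes them from binary confusion-matrix counts. It assumes the common definition of G-Mean as the geometric mean of recall (sensitivity) and specificity, which the abstract does not spell out, and the counts are hypothetical, not taken from the study.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion counts.

    Assumes every denominator is non-zero; a production version would guard
    against empty classes.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }

# Hypothetical, imbalanced counts (10 positive cases, 90 negative cases).
m = binary_metrics(tp=5, fp=5, fn=5, tn=85)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, accuracy is high because the negative class dominates, while recall and G-Mean are noticeably lower, which is one reason to report more than accuracy on clinical data.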

Identification of Traditional Herbal Leaves and Their Benefits Using K-Nearest Neighbors (KNN)

Nur Rahma Ditta Zahra; Kanaya Sabila Azzahra; Nur Iman Nugraha; Muhammad Ilham Nurfajri; Nabil Malik Al Hapid +2 more

International Journal of Multilingual Education and Applied Linguistics• 2024 •Asosiasi Periset Bahasa Sastra Indonesia

This study presents a web-based system for identifying traditional herbal leaves using K-Nearest Neighbors (KNN) and image processing techniques focused on analyzing leaf shape and color. The dataset used consists of images of various types of herbal leaves, providing a basis for classification and medicinal benefit information retrieval. The system was tested with multiple leaf samples to assess accuracy, speed, and effectiveness in identifying leaf types based on visual characteristics. Results show that the system can recognize different types of herbal leaves and display information on their medicinal properties in a user-friendly interface.

https://doi.org/10.61132/ijmeal.v1i4.113

Open Access

Analisis KNN untuk Tempat Rekomendasi Tempat Wisata Sumba Barat

Paschal Wungo; Gergorius Kopong Pati; Karolus Wulla Rato

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi• 2024 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The growth of the internet has influenced the tourism industry: it makes it easier for individuals to find reviews of places to visit, and it gives tourist site managers a tool for assessing the quality of their offerings. An increase of almost two million tourists in just three years in West Sumba is evidence of this influence. Social media is a tool that people use to interact with each other online; some people have multiple accounts on platforms such as Instagram, WhatsApp, Facebook, Telegram, Twitter, and so on. Through a recommendation system that uses the KNN algorithm, tourists can receive recommendations for tourist attractions based on price and the type of trip desired. Three factors were used in this research: activity, type of tourism, and type of price. Testing with the KNN algorithm in the RapidMiner application with a K value of 5 yielded an accuracy of 63.16%. The analysis results show that the K-Nearest Neighbor (K-NN) approach can be used as a guideline for recommending tourist destinations to visitors in West Sumba.

https://doi.org/10.61132/neptunus.v2i4.408

Open Access

Prediksi Pengaruh Kegiatan MBKM terhadap Mahasiswa menggunakan Metode K-Nearest Neighbor

Farida Hanum; Yani Maulita; I Gusti Prahmana

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

The Merdeka Belajar Kampus Merdeka (MBKM) program provides students the opportunity to study for one semester outside of their major, aiming to develop the soft and hard skills required in the workforce. One key component of this program is internships or practical work, which gives students hands-on experience in the professional world and the chance to build professional networks. This research uses the K-Nearest Neighbor (K-NN) method to predict the impact of MBKM activities on undergraduate students at STMIK Kaputama. Using the RapidMiner application, student data was tested to obtain the accuracy of predicting students' engagement in the MBKM program in the future. The test results show that the K-NN model has an accuracy of 75.34%, indicating that the model is fairly good at predicting the impact of the MBKM program on students.    

https://doi.org/10.62951/bridge.v2i4.249

Open Access

Klasifikasi Citra Penyakit Gigi Menggunakan Metode K-Nearest Neighbor

Sri Dewi Novita; Achmad Fauzi; Victor Maruli Pakpahan

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

Dental disease must be treated quickly and correctly, but not every dental team can do so, because few dental experts are on duty at a workplace or hospital 24 hours a day. In addition, the public knows very little about dental disease, so people have to consult a dentist for treatment. Classifying images of dental disease requires feature extraction, which captures the characteristics of an object that describe the image. One commonly used image feature is Red, Green, Blue (RGB) color, which is often used to identify or classify images. The dental image data used in the classification process cover tooth abrasion, anterior crossbite, cavities, and gingivitis. K-Nearest Neighbor is one of the simplest data mining algorithms. Its aim is to classify each object according to its closest distances. To determine the distances, the data are first divided into two parts, training data and testing data, and the Euclidean distance from each testing sample to the training data is then calculated. The K-Nearest Neighbors method can be applied to classify dental disease from images of dental disease types using MATLAB software. In the image data training process, 40 images were input and the training result obtained was 100%.

https://doi.org/10.62951/bridge.v2i4.244

Open Access
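The dental image abstract above describes the K-Nearest Neighbor procedure in words: split the data into training and testing sets, compute the Euclidean distance from each test sample to the training samples, and classify by the closest ones. The following is a minimal sketch of that idea in plain Python rather than the MATLAB implementation the paper mentions. The function names, the choice of k = 3, and the mean RGB feature values are all hypothetical and only illustrate the steps.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda sample: euclidean(sample[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical mean (R, G, B) values per image; not data from the paper.
train_X = [(200, 180, 170), (210, 190, 175), (120, 60, 50), (130, 70, 55)]
train_y = ["abrasion", "abrasion", "gingivitis", "gingivitis"]

prediction = knn_predict(train_X, train_y, query=(125, 65, 52), k=3)
print(prediction)
```

Because every prediction scans the whole training set, this brute-force form is only practical for small datasets such as the 40 images mentioned in the abstract.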

Pemodelan K-Nearest Neighbor Untuk Identifikasi Pola Kepuasan Mahasiswa Terhadap Pelayanan Kampus (Studi Kasus : STMIK Kaputama)

Muhammad Rizky R Ritonga; Marto Sihombing; Selfira Selfira

Modem : Jurnal Informatika dan Sains Teknologi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This research focuses on using the K-Nearest Neighbor (KNN) algorithm to model student satisfaction with campus services. The study finds that the quality of the dataset strongly influences the accuracy of the KNN classification results. Factors such as data cleanliness, balanced class distribution, and sufficient training data volume are highlighted as crucial for a successful model. The research also emphasizes the significance of proper feature selection in enhancing classification performance, suggesting that irrelevant features can introduce noise and decrease model accuracy. The model was evaluated using a dataset of 1032 data points and K=5, achieving an accuracy of 93.72%. While the model performed well for certain classes such as "Very Good" and "None", challenges were encountered in classifying the "Fair" and "Deficient" classes. The study concludes that KNN is effective in identifying student satisfaction patterns but highlights the need for improvements in accurately classifying these challenging classes. Ultimately, the research underscores the importance of data quality and feature selection in enhancing the performance of classification models for student satisfaction analysis.

https://doi.org/10.62951/modem.v2i4.238

Open Access

Penerapan Algoritma K-Nearest Neighbor untuk Klasifikasi Usaha Masyarakat Berdasarkan Jenis Izin Usaha

Brema Daniel Ginting; Yusfrizal Yusfrizal; Lina Arliana Nur Kadim

Modem : Jurnal Informatika dan Sains Teknologi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

Business legality is the identity that makes a business legal so that it is recognized by the community. It must be valid under the applicable laws and regulations so that the business is protected by documents that are valid in the eyes of the law. The legal standing of a business is one of the factors that support its sustainability. The business permits that the community must hold are a business establishment deed, a business entity NPWP, a trade business license (SIUP), a company domicile certificate (SKDP), and a business registration number (NIB). The growth of community businesses in Sei Bingai District, Langkat Regency, has produced many business permits that are not directly supervised by the local government. Community business permits are important documents for supervising how these businesses are run. The types of businesses in Sei Bingai District also vary, such as tourism, C mining, trade, factories and so on.

https://doi.org/10.62951/modem.v2i4.233

Open Access

Emotion Detection Using Contextual Embeddings for Indonesian Product Review Texts on E-commerce Platform

1 Citation

Ariyanto, Amelia Devi Putri; Fari Katul Fikriah; Arif Fitra Setyawan

JURNAL ILMIAH KOMPUTER GRAFIS• 2024 •UNIVERSITAS STEKOM

The advancement of e-commerce has changed the way people shop. However, there is a mismatch between the actual quality of a product and the seller’s description. Product reviews are an important source of information for making purchasing decisions. However, processing large numbers of reviews manually is difficult. This research aims to detect emotions in Indonesian language product review texts using contextual embeddings. The public dataset used was PRDECT-ID, which comprises five emotion labels. The methods used include data preprocessing, feature extraction using contextual embeddings such as Bidirectional Encoder Representations from Transformers (BERT), and classification using Decision Tree, Naïve Bayes, and k-Nearest Neighbors (KNN). Among the compared models, the KNN model demonstrated the highest improvement, achieving a 15.09% enhancement over the decision tree results. This research provides insights into the effectiveness of contextual embeddings in detecting emotions in Indonesian language product review texts.

https://doi.org/10.51903/pixel.v17i1.2010

Open Access

The CLASSIFICATION OF PROSPECTIVE POLICY HOLDERS FOR SELECTING INSURANCE PRODUCTS USING A COMPARISON OF THE K-NEAREST NEIGHBOR METHOD AND THE NAIVE BAYES METHO

irfan, Irfan Nurdiansyah; Ari Hidayatullah

Jurnal Elektronika dan Komputer• 2024 •STEKOM PRESS

An insurance company's business is to offer its own insurance products. Every insurance product carries a premium payment, and premiums are the company's income, set according to the amount insured. The problem at PT BNI Life Insurance is that many premium payments stop, for example through policy redemptions caused by errors in the benefits received or by selecting the wrong insurance product, and this can reduce the company's achievement of its targets. The aim of this research is to find out which classification algorithm, K-Nearest Neighbor or Naive Bayes, is better at predicting the type of insurance product that customers will choose. Data mining methods are applied to compare the two. The accuracy of the K-Nearest Neighbor method is 80% and that of the Naïve Bayes method is 70.53%, which means that the K-Nearest Neighbor method is the better method to apply to an insurance product classification system based on the demographics of prospective customers.

https://doi.org/10.51903/elkom.v17i1.1922

Open Access

Implementasi Metode CNN Dan K-Nearest Neighbor Untuk Klasifikasi Tingkat Kematangan Tanaman Cabai Rawit

Muhammad Rifki Bahrul Ulum; Basuki Rahmat; Made Hanindia Prami Swari

Modem : Jurnal Informatika dan Sains Teknologi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

Identifying the ripeness level of cayenne peppers is an important step in cultivation and post-harvest handling. Because it depends on individual farmers, whose visual judgment and perception of ripeness differ, manual sorting produces subjective harvest outcomes. The manual process is also prone to inconsistent results, as humans have limited time, become fatigued, and sometimes lose concentration when sorting for long periods. To minimize these issues, technological intervention is needed to classify the ripeness level of cayenne peppers mechanically. This research aims to develop a classification model for the maturity level of cayenne pepper plants. It proposes using the CNN method for feature extraction and KNN to classify the data based on the features extracted by the CNN. In the test scenarios carried out, KNN classification based on CNN feature extraction achieved a best accuracy of 99.33%, while the CNN classification model achieved a best accuracy of 87.33%.

https://doi.org/10.62951/modem.v2i3.131

Open Access

Implementasi Algoritma Machine Learning Untuk Prediksi Awal Stunting Pada Anak Usia Dini Berdasarkan Tinggi Badan Dan Berat Badan

Yunni Adiyantari

Modem : Jurnal Informatika dan Sains Teknologi• 2024 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to apply the K-Nearest Neighbors (KNN) algorithm to predict stunting status in young children based on height and weight data. Stunting is a growth failure condition caused by chronic malnutrition that negatively impacts children's physical and mental development. The dataset includes height, weight, and stunting status of children. The results show that the KNN model with k=3 achieved 100% accuracy on the test data. Evaluation using the confusion matrix and classification report indicates perfect precision, recall, and F1-score for each class. Data normalization with StandardScaler improved the model's performance by ensuring all features are on the same scale. The KNN algorithm proves to be a simple yet effective method for predicting stunting, demonstrating significant potential for early detection and health intervention in children. This study recommends using a larger and more diverse dataset, as well as incorporating additional relevant features to enhance model accuracy. Implementing the model in a web or mobile application is also suggested to assist healthcare professionals in the field.

https://doi.org/10.62951/modem.v2i3.130

Open Access
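The stunting abstract above notes that normalization with StandardScaler puts all features on the same scale before KNN is applied. As a rough illustration of what that scaling does, the sketch below standardizes each feature column to zero mean and unit variance in plain Python, using the population standard deviation as scikit-learn's StandardScaler does. The height and weight values are hypothetical, not taken from the study's dataset.

```python
import statistics

def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance (z-scores)."""
    mean = statistics.mean(column)
    # Population standard deviation (divides by n, not n - 1).
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]

# Hypothetical measurements for three children.
heights_cm = [70.0, 80.0, 90.0]
weights_kg = [8.0, 10.0, 12.0]

z_heights = standardize(heights_cm)
z_weights = standardize(weights_kg)

print(z_heights)
print(z_weights)
```

Before scaling, a 10 cm height difference outweighs a 2 kg weight difference in any Euclidean distance. After scaling, both features contribute comparably, which matters for a distance-based method such as KNN.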

Sentiment Analysis Twitter Sentiment Analysis of the 2024 Indonesian Presidential Candidates Using the KNN Method

Rizal, Adetya Rizal Permana Putra; Jati Sasongko Wibowo

Jurnal Elektronika dan Komputer• 2024 •STEKOM PRESS

In 2024, Indonesia will hold simultaneous general elections covering the presidential election and the election of people's representatives throughout Indonesia. The public has responded to this event with mixed feelings, sharing their thoughts on social media sites such as Twitter. A sentiment analysis study of the 2024 Indonesian presidential candidates was conducted in connection with this event. A total of 1458 tweets were used in this study. With 40.31% of respondents expressing a positive attitude and 43.46% expressing negative sentiment, the analysis findings show a balance between the two sentiments. Using the phrase "calon presiden" ("presidential candidate"), a Python program on the Google Colab website retrieved the Twitter data. The K-Nearest Neighbor approach was used in the classification process. The data were split 6:4, with 60% training data and 40% test data. The evaluation values obtained from testing the model with the K-Nearest Neighbor technique were an accuracy of 90.95%, a precision of 62.17%, a recall of 62.33%, and an F-Measure of 61.87%.

https://doi.org/10.51903/elkom.v17i1.1603

Open Access

Implementasi Algoritma K-Nearest Neighbor (KNN) untuk Identifikasi Penyakit pada Tanaman Jeruk Berdasarkan Citra Daun

Abiyan Naufal Hilmi; Eva Yulia Puspaningrum; Henni Endah Wahanani

Router : Jurnal Teknik Informatika dan Terapan• 2024 •Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

Advances in image processing technology make it possible to build systems that recognize digital images effectively, including systems for plant disease identification in agriculture. Citrus plants suffer reduced productivity due to pathogen attacks on leaves such as Black Spot, Canker, and CVDP, so disease identification is needed. The K-Nearest Neighbor (K-NN) algorithm can be used to classify images because it is simple and has high accuracy in image processing. This study aims to implement the K-NN algorithm and determine its performance in identifying citrus plant diseases based on leaf images. The research uses a dataset of 1,096 images from the Kaggle website. There are 12 research scenarios, formed from four splits of training and test data (90% training + 10% test, 80% training + 20% test, 70% training + 30% test, 60% training + 40% test) and three random state values (42, 32, 22). The results showed that the K-NN algorithm is very effective in identifying citrus plant diseases, with the highest accuracy of 98.5% in the scenario with 90% training data and 10% test data and K = 2.

https://doi.org/10.62951/router.v2i2.78

Open Access
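The citrus leaf abstract above describes 12 scenarios built from four train/test splits and three random state values. The arithmetic can be sketched as a simple grid, shown below. The split fractions, random states, and image count come from the abstract. The rounding rule used to turn a fraction into a test-set size is an assumption, and the paper's tooling may round differently.

```python
from itertools import product

n_images = 1096                        # dataset size stated in the abstract
test_fractions = [0.10, 0.20, 0.30, 0.40]
random_states = [42, 32, 22]

scenarios = []
for test_fraction, random_state in product(test_fractions, random_states):
    # Assumed rounding rule: nearest integer number of test images.
    n_test = round(n_images * test_fraction)
    scenarios.append(
        {
            "test_fraction": test_fraction,
            "random_state": random_state,
            "n_test": n_test,
            "n_train": n_images - n_test,
        }
    )

print(len(scenarios))
for scenario in scenarios[:3]:
    print(scenario)
```

Each scenario would then be trained and evaluated separately, so the reported best accuracy is the maximum over this grid rather than an average.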

Prediction Of Laptop Sales Using The K-Nearest Neighbor Method At The MVP Computer Mawar Store, In Takengon

Adi Kurniawan; Rayuwati Rayuwati; Ira Zulfa

International Journal of Economics and Management Sciences• 2024 •Asosiasi Riset Ekonomi dan Akuntansi Indonesia

This research relates to predictions of laptop sales in computer shops in Central Aceh, with a focus on laptop brands Acer, Asus, HP and Lenovo. Over the last three years, sales of these laptops have reached 1,629 units, with a monthly average of between 108 and 150 units. Business owners today prefer brands with the highest percentage of sales, but this can lead to dead stock problems. Therefore, the author proposes using data mining techniques, especially the K-Nearest Neighbor (K-NN) method, to make recommendations for the number of products to be purchased by business owners based on past sales data. The K-NN method requires complete, structured and continuous sales data. It is important to choose an appropriate K value, and other factors such as weather, seasons, promotions, and special events also affect laptop sales. K-NN models may need to be combined with other data to improve prediction accuracy. It is hoped that this research will provide academic benefits in expanding knowledge about the use of the K-NN method in sales prediction, as well as practical benefits for business owners in planning their sales strategies. The research conclusions highlight the importance of good data collection, choosing the right K value, and considering external factors in the laptop sales prediction process.      

https://doi.org/10.61132/ijems.v1i2.33

Open Access

Enhancing the Random Forest Model via Synthetic Minority Oversampling Technique for Credit-Card Fraud Detection

Aghware, Fidelis Obukohwo; Ojugo, Arnold Adimabua; Adigwe, Wilfred; Odiakaose, Christopher Chukwufumaya; Ojei, Emma Obiajulu +3 more

Journal of Computing Theories and Applications• 2024 •Universitas Dian Nuswantoro

Fraudsters increasingly exploit unauthorized credit card information for financial gain, targeting unsuspecting users, especially as financial institutions expand their services to semi-urban and rural areas. This, in turn, has continued to ripple across society, causing huge financial losses and lowering user trust, with implications for all cardholders. Thus, banks and other financial institutions are now poised to implement fraud detection schemes. Five algorithms were trained with and without the application of the Synthetic Minority Over-sampling Technique (SMOTE) to assess their performance. These algorithms included Random Forest (RF), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Support Vector Machines (SVM), and Logistic Regression (LR). The methodology was implemented and tested through an API using Flask and Streamlit in Python. Before applying SMOTE, the RF classifier outperformed the others with an accuracy of 0.9802, while the accuracies for LR, KNN, NB, and SVM were 0.9219, 0.9435, 0.9508, and 0.9008, respectively. After the application of SMOTE, RF achieved a prediction accuracy of 0.9919, whereas LR, KNN, NB, and SVM attained accuracies of 0.9805, 0.9210, 0.9125, and 0.8145, respectively. These results highlight the effectiveness of combining RF with SMOTE to enhance prediction accuracy in credit card fraud detection.

https://doi.org/10.62411/jcta.10323

Open Access
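The fraud detection abstract above relies on SMOTE to balance the classes before training. The core idea of SMOTE is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python. It is not the authors' pipeline, and the function name, the value of k, and the sample points are hypothetical.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbour."""
    rng = rng or random.Random(0)
    base_index = rng.randrange(len(minority))
    base = minority[base_index]
    # Rank the other minority points by squared Euclidean distance to `base`.
    others = [p for i, p in enumerate(minority) if i != base_index]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
    neighbour = rng.choice(others[:k])
    # Pick a random position on the segment from `base` to `neighbour`.
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))

# Hypothetical minority-class (fraud) feature vectors.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]

rng = random.Random(42)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
for point in synthetic:
    print(point)
```

Because the synthetic points lie between existing minority samples, oversampling should be applied to the training split only. Otherwise near-copies of test points can leak into training and inflate the reported accuracy.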

Strategic Feature Selection for Enhanced Scorch Prediction in Flexible Polyurethane Form Manufacturing

43 Citations

Omoruwou, Felix; Ojugo, Arnold Adimabua; Ilodigwe, Solomon Ebuka

Journal of Computing Theories and Applications• 2024 •Universitas Dian Nuswantoro

The occurrence of scorch during the production of flexible polyurethane is a significant issue that negatively impacts foam products' resilience and generally jeopardizes their integrity. The likelihood of foam product failure can be decreased by optimizing production variables based on machine learning algorithms used to predict the occurrence of scorch. Because prevention is the best approach to this problem, investigating such technology is required. Hence, machine learning algorithms were trained to predict the occurrence of scorch using the thermodynamic profile of polyurethane foam, which is made up of recorded production variables. A variety of heuristic algorithms were trained and assessed for how well they performed, namely XGBoost, Decision Trees, Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machines, and Logistic Regression. The XGBoost ensemble was found to perform best. It outperformed the others with an accuracy of 98.3% (i.e., 0.983), followed by logistic regression, decision tree, random forest, K-nearest neighbors, and naïve Bayes, yielding training accuracies of 88.1%, 66.7%, 84.2%, 87.5%, and 67.5%, respectively. XGBoost was finally used, yielding two distinct cases: occurrence and non-occurrence of scorch. The ensemble proves quite capable and is an effective way to predict the occurrence of scorch.

https://doi.org/10.62411/jcta.9539

Open Access

Dataset Analysis and Feature Characteristics to Predict Rice Production based on eXtreme Gradient Boosting

Wijayanti, Ella Budi; Setiadi, De Rosal Ignatius Moses; Setyoko, Bimo Haryo

Journal of Computing Theories and Applications• 2024 •Universitas Dian Nuswantoro

Rice plays a vital role as the main food source for almost half of the global population, contributing more than 21% of the total calories humans need. Production predictions are important for determining import-export policies. This research proposes the XGBoost method to predict rice harvests globally using FAO and World Bank datasets. Feature analysis, removal of duplicate data, and parameter tuning were carried out to support the performance of the XGBoost method. The results showed excellent performance, with the reported score reaching 0.99. Evaluation of model performance using metrics such as MSE and MAE, measured by k-fold validation, shows that XGBoost predicts crop yields more accurately than other regression methods such as Random Forest (RF), Gradient Boost (GB), Bagging Regressor (BR), and K-Nearest Neighbor (KNN). In addition, an ablation study was carried out by comparing the performance of each model with various features and with the state of the art. The results prove the superiority of the proposed XGBoost method. Because its results are consistent and its performance is better, this model can effectively support agricultural sustainability, especially rice production.

https://doi.org/10.62411/jcta.10057

Open Access