SciRepID - Scientific Publication Search

Analisis dan Penerapan Algoritma Naïve Bayes Untuk Klasifikasi Penyakit Diabetes Melitus

M Daffa Adrian; Pareza Alam Jusia; Rudolf Sinaga; Azzahra Raihana Adriansyah; Mutammimah Mutammimah

Prosiding Seminar Nasional Ilmu Teknik • 2025 • Asosiasi Riset Ilmu Teknik Indonesia

Diabetes Mellitus is a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both. Hyperglycemia, a rise in blood glucose beyond normal limits, is characteristic of several conditions, most notably Diabetes Mellitus. Diabetes Mellitus is currently a global health threat. Classification is a data mining technique that can help predict diabetes type; this study applies it using the Naïve Bayes algorithm. Testing was carried out with five evaluation setups: RapidMiner with three options (use training set, 5-fold cross-validation, and 10-fold cross-validation), Microsoft Excel, and Python. The Excel evaluation gave the highest accuracy, 89.00%, while the Python evaluation gave the lowest, 86.36%. The Naïve Bayes algorithm can be said to be one of the most effective algorithms, both in its calculations and in its final results, and the test results can serve as a basis for diabetes mellitus classification, given that accuracy exceeds 85%.

https://doi.org/10.61132/prosemnasproit.v2i2.114


Analisis Kelayakan Pemberian Kredit dengan Algoritma Naïve Bayes untuk Antisipasi Risiko Kredit Bermasalah Pada BPR Ukabima Lestari Cabang Jambi

Anggi Saputra; Setiawan Assegaff; Benni Purnama

Prosiding Seminar Nasional Ilmu Teknik • 2025 • Asosiasi Riset Ilmu Teknik Indonesia

This study analyzes creditworthiness assessment and predicts non-performing loan (NPL) risk using the Naïve Bayes algorithm at BPR Ukabima Lestari, Jambi Branch. A quantitative data mining approach with probabilistic classification is applied. The dataset includes borrower attributes such as age, occupation, income, loan amount, tenor, collateral, and repayment history. Research stages comprise data preprocessing, model development, and performance evaluation using accuracy, precision, recall, and F1-score implemented in RapidMiner. The results indicate that the Naïve Bayes model achieves 99.58% accuracy, demonstrating strong capability to predict potential problem loans accurately and efficiently, supporting data-driven credit decisions and strengthening credit risk management in microbanking institutions.

https://doi.org/10.61132/prosemnasproit.v2i2.96


Penerapan Metode K-Means Clustering Untuk Menentukan Faktor Resiko Pada Penderita Diabetes Melitus

Melda Septriani; Pareza Alam Jusia; Rudolf Sinaga; Shinta Renova Putri; Firyal Najla 'Afifah

Prosiding Seminar Nasional Ilmu Teknik • 2025 • Asosiasi Riset Ilmu Teknik Indonesia

Diabetes Mellitus is a disease caused by the failure of the pancreas to produce sufficient insulin, which leads to elevated blood sugar levels. This study discusses the application of the k-means clustering method to determine risk factors for diabetes mellitus. Clustering groups the data into several clusters, and this study compares the results obtained with several data mining tools: RapidMiner, SPSS, WEKA, and Python. The comparison produced five calculations, among them: the manual calculation, in which cluster 1 is the first priority with a ratio value of 73%; RapidMiner, in which cluster 3 is the first priority with a ratio value of 58%; SPSS, in which cluster 2 is the first priority with a ratio value of 34%; and Python, in which cluster 1 is the first priority with a ratio value of 55%.

https://doi.org/10.61132/prosemnasproit.v2i2.94


Implementasi Data Mining dengan Teknik Smote dan Fitur Gain Ratio Untuk Klasifikasi Kelayakan Siswa Penerima PIP di Kota Jambi

Dea Sabrina Candra; Jasmir Jasmir; Elvi Yanti

Prosiding Seminar Nasional Ilmu Teknik • 2025 • Asosiasi Riset Ilmu Teknik Indonesia

The Indonesia Pintar Program (PIP) is an educational assistance program for students from underprivileged families, but determining recipient eligibility is still hampered by subjectivity and data imbalance. This study aims to classify the eligibility of high school students for PIP in Jambi City using data mining methods. The SMOTE technique was applied to overcome class imbalance, and Gain Ratio feature selection was used to identify important attributes. The dataset consisted of 19,596 student records, split into 70% training data and 30% testing data. Classification used the Naïve Bayes, Decision Tree (J48), and Random Forest algorithms under the Use Training Set, 5-Fold, and 10-Fold Cross Validation testing schemes. The results show that SMOTE improves model performance, whereas feature selection reduces accuracy in some cases. Overall, Random Forest without feature selection gives the best results, with an accuracy of 93.33%, and is recommended as the most effective model for objectively determining PIP recipient eligibility.
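
As a rough illustration of the SMOTE idea mentioned in this abstract, the sketch below generates synthetic minority-class rows by interpolating between a sample and one of its nearest minority neighbours. It is a minimal pure-Python sketch, not the authors' implementation: the function name `smote`, the parameter defaults, and the sample rows are all assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class rows (two numeric features), for illustration only.
minority_rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_rows = smote(minority_rows, n_new=6)
print(len(new_rows), "synthetic rows")
```

Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region the minority class already occupies. A common practice, assumed here rather than taken from the abstract, is to oversample only the training portion of a split so that the test data remain untouched.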

https://doi.org/10.61132/prosemnasproit.v2i2.66


Klasifikasi Penggunaan Daya Listrik Rumah Tangga dengan Menggunakan Metode Naive Bayes

An Nisa Ziah Putri; Dodo Zaenal Abidin; Errissya Rasywir; Ibni Faiq Athallah

Prosiding Seminar Nasional Ilmu Teknik • 2025 • Asosiasi Riset Ilmu Teknik Indonesia

Data mining draws on several fields of science to find previously unknown relationships in a data warehouse and turn them into usable information. Unwise use of electricity leads to high consumption, so every community is expected to understand how to use electricity wisely. The authors therefore apply data mining to electricity usage data to determine which households fall into the small, medium, and large usage categories. The data consist of 200 responses to an electricity usage questionnaire, converted into ARFF format and analyzed with WEKA. The method used is Naive Bayes classification. The highest accuracy, 80.5% correctly classified, was obtained with the Use Training Set option, followed by 75% with 5-Fold Cross Validation and 74% with 10-Fold Cross Validation. Attribute selection with the classifier attribute evaluation algorithm (ClassifierAttributeEval) indicates that the most influential attribute for classifying electrical power usage is Electronic Goods.

https://doi.org/10.61132/prosemnasproit.v2i2.60


Analisis Machine Learning pada Data Netflix Shows untuk Mengklasifikasikan Tren Genre dan Karakteristik Film

Claudia K. Hamsi; I Wayan Sudiarsa; Vinsensia P.K Abu; Sarling C. Dhai; Maria A. Serero

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The rapid development of digital streaming platforms such as Netflix has generated a large volume of content data with diverse characteristics, thereby requiring effective analytical methods to understand emerging patterns and trends. This study aims to classify Netflix content into two main categories, namely movies and television shows, and to analyze genre trends and content characteristics using a data mining approach with the Naive Bayes algorithm. The dataset used in this study is the Netflix Shows dataset, consisting of 8,809 content entries, with the primary features analyzed including genre, rating, and country of production. The research process begins with data exploration and preprocessing stages, including data cleaning, handling missing values, and transforming categorical features to enable effective model construction. Subsequently, the dataset is divided into training and testing sets to objectively and systematically build and evaluate the Naive Bayes classification model. Model performance is evaluated using accuracy, precision, recall, and F1-score metrics to assess the model’s ability to accurately distinguish between Netflix content types. The experimental results demonstrate that the Naive Bayes algorithm is able to classify Netflix content into Movie and TV Show categories with accuracy, precision, recall, and F1-score values of 100% each. The confusion matrix indicates that no misclassification occurred, suggesting that genre, rating, and country of production features provide a very clear separation between content classes. These findings indicate that the Naive Bayes algorithm can achieve exceptionally high classification performance with optimal evaluation results. The results further reveal distinct differences in characteristics between movies and television shows based on genre and production attributes. Therefore, this study is expected to contribute to the development of content recommendation systems and strategic content management within the streaming industry.

https://doi.org/10.61132/mars.v3i6.1389


Analisis K-Means Clustering Wilayah Asal Pasien dan Fasilitas Pelayanan Kesehatan Tujuan Berdasarkan Permintaan Layanan Ambulans Transportasi di Kota Semarang

Aisya Mardatila; Ahmad Zaini; Rheni Prihanti

Jurnal Riset Ilmu Farmasi dan Kesehatan • 2025 • Asosiasi Riset Ilmu Kesehatan Indonesia

This study aims to analyze the spatial patterns of ambulance transport demand in Semarang City based on patients’ origin subdistricts, origin villages, and destination healthcare facilities. The analysis employed the K-Means Clustering algorithm as a data mining method to group areas according to similarities in the volume of ambulance requests. The dataset consisted of ambulance transport service records from January 2024 to September 2025, obtained from the Semarang City Health Office. The analytical procedures included data cleaning, normalization, determination of the optimal number of clusters using the Elbow Method, and cluster formation using K-Means. The results show two main clusters for subdistricts and destination healthcare facilities. High-demand subdistricts were generally densely populated areas such as Banyumanik and Pedurungan, with an average of 1,256 requests, while RSUP Dr. Kariadi emerged as the dominant referral facility with 3,893 requests. Meanwhile, village-level origins formed three clusters, with average demands of 549 (high), 190 (medium), and 36 (low). These findings are expected to support strategic planning for equitable ambulance fleet distribution and improved efficiency of patient transportation services in Semarang City.
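
The procedure named in this abstract (normalization, Elbow Method, K-Means) can be sketched in a few lines. The example below is a minimal pure-Python sketch on one-dimensional data, not the authors' code: the request counts are invented for illustration, and the helper names `min_max` and `kmeans_1d` are hypothetical.

```python
import random


def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def kmeans_1d(xs, k, iters=100, seed=0):
    """Lloyd's algorithm on 1-D data; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(xs)), k)  # k distinct starting centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            # Assign each point to its closest center.
            idx = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[idx].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: sum of squared distances from each point to its closest center.
    inertia = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, inertia


# Hypothetical request counts per area, for illustration only.
counts = [1300, 1210, 1256, 150, 90, 200, 60, 175]
scaled = min_max(counts)
inertias = {k: kmeans_1d(scaled, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 4))
```

In the Elbow Method, the inertia is inspected as k grows, and the value of k after which the decrease flattens out is taken as a reasonable number of clusters.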

https://doi.org/10.61132/obat.v3i6.1944


Customer Data Management Analysis for Customer Segmentation Using K-Means Clustering Method

Andre Leto; Reza Aminullah; Ani Dijah Rahajoe

International Journal of Information Engineering and Science • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to examine customer segmentation through K-Means clustering from a customer data management perspective, emphasizing the interpretive value of analytical results rather than solely their computational outcomes. The research addresses a critical issue in contemporary data-driven organizations, where customer analytics is often reduced to technical modeling without sufficient translation into managerial insights. To respond to this gap, the study adopts a qualitative interpretive approach embedded within a quantitative clustering process, positioning clustering as part of a broader information management cycle. The empirical analysis is based on the Mall Customers Dataset obtained from Kaggle, consisting of 200 customer records with numerical attributes representing age, annual income, and spending score. Quantitative processing using K-Means clustering was employed to identify customer segments, while qualitative interpretation was applied to analyze the managerial meaning of each cluster. Data interpretation was supported by analytical documentation, visualization outputs, and reflective analysis of cluster characteristics. The findings reveal four distinct customer segments with different behavioral and economic profiles, each carrying specific strategic implications for customer relationship management and marketing decision-making. The study demonstrates that the primary value of clustering lies not merely in segment formation, but in its ability to transform raw customer data into actionable managerial knowledge. In conclusion, this research contributes to customer analytics literature by integrating data mining techniques with qualitative interpretation, offering a more human-centered and decision-oriented framework for customer data management. Future research is encouraged to extend this approach using organizational case studies or participatory decision-making contexts.

https://doi.org/10.62951/ijies.v2i4.345


Development Strategy System Information Management in Support Organizational Digital Transformation

Hendra Jatnika; Mia Kusmiati

International Journal of Management Science and Entrepreneurship • 2025 • International Forum of Researchers and Lecturers

Purpose – This study explores strategic approaches to developing a Management Information System (Sistem Informasi Manajemen, SIM) as an integral part of supporting the digital transformation of modern organizations. It emphasizes the importance of integrating information technology, managing data effectively, and improving the digital competence of the human resources who operate the system. Design/methodology/approach – This conceptual article uses a literature review, analyzing relevant academic works and technical manuals, particularly those on SIM implementation in the public and private sectors. The study draws on the works of Jatnika et al. (2022–2024), including the use of Microsoft Office applications as a basic skill supporting organizational digital literacy. Findings – The findings show that SIM development is not merely a technical effort but a strategic necessity in supporting digital transformation. Key strategies include modular system design, data mining integration, user-based training programs, and periodic system evaluation. These components allow an organization to build a responsive and adaptive SIM ecosystem. Practical implications – Organizations pursuing digital transformation need to invest in developing the digital capabilities of their human resources and to ensure that the SIM is used effectively; a strategically developed SIM can become a main driving force in increasing efficiency, accuracy, and decision-making capability across work units. Originality/value – This study offers a structured conceptual model of SIM development in the context of digital transformation, based on applied literature and the needs of real-world organizations. The article provides practical insight for policymakers, IT managers, and HR developers.

https://doi.org/10.70062/globalmanagement.v2i4.428


Implementasi Klasifikasi Datamining dengan Algoritma C4.5 untuk Rekomendasi Pemilihan Fakultas Perguruan Tinggi Berdasarkan Minat dan Bakat Siswa SMK

Senna Hendrian; V.H Valentino; Wisdariah Wisdariah; Riezca Talita Trista; Dudi Parulian

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Selecting a faculty that aligns with students’ interests and talents is a strategic step in determining the success of higher education and future career paths. However, most vocational high school (SMK) students still face difficulties in identifying the most suitable faculty due to the lack of data-driven analysis. This study implements the C4.5 classification algorithm within data mining techniques to build an automatic and measurable faculty recommendation system. The dataset consists of attributes such as SMK major, interest level, aptitude test results, academic grade average, and gender, with the output being the recommended faculty. The C4.5 algorithm was chosen for its ability to generate a transparent and interpretable decision tree, which helps both guidance counselors and students understand the rationale behind the recommendations. The experimental results show that the constructed classification model achieved an accuracy rate of 88%, based on cross-validation testing using data from 12th-grade students. The implementation of this system is expected to serve as an objective tool in the faculty selection process and to promote a data-driven decision-making approach in secondary education environments.

https://doi.org/10.61132/neptunus.v3i4.1159


Pengelompokan Data Siswa SMP dalam Mendeteksi Kesehatan Remaja Menggunakan Algoritma K-Means

Delvi Kibina Br Sembiring; Khairul Khairul; Melda Pita Uli Sitompul

Merkurius : Jurnal Riset Sistem Informasi dan Teknik Informatika • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Technological advancements in education have led to major transformations, particularly with the implementation of the Merdeka Curriculum, which emphasizes learning flexibility, student-centered approaches, and educator autonomy in developing innovative teaching methods. One of its essential aspects is the integration of technology for managing educational data, including student health records. At SMP IT Mutia Rahma, biannual student health monitoring has generated a growing volume of data, making it difficult to identify students experiencing psychological challenges. Adolescent mental health problems—such as learning stress, anxiety, and social pressure—can negatively affect academic performance if left unaddressed. This study aims to group students based on their mental health conditions to support more effective intervention strategies. The K-Means Algorithm, a data mining technique for clustering data by similarity, was employed to analyze student health data. The results show that in a three-cluster model, Cluster 2 represents students in a stable condition characterized by high resilience and low counseling needs, indicating good mental health and academic engagement. Meanwhile, Clusters 1 and 3 include students requiring further attention and support. This research demonstrates that the K-Means Algorithm can serve as an effective tool in identifying and categorizing student mental health conditions to improve school-based health management and early intervention programs.

https://doi.org/10.61132/merkurius.v3i6.1142


Analisis Sentimen pada Ulasan Aplikasi JakLingko Menggunakan Metode Naïve Bayes

Ricardus Mba Dala Pati; Eka Kusuma Pratama; Tuslaela Tuslaela

Repeater : Publikasi Teknik Informatika dan Jaringan • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

JakLingko is a digital-based public transportation integration system developed to facilitate access to various transportation modes in Jakarta. Along with the increasing number of users, reviews on the JakLingko application reflect user experiences and perceptions. This study aims to analyze the sentiment of user reviews on the Google Play Store using the Naïve Bayes method. Data collection was conducted through web scraping, resulting in 3,260 reviews. The data were preprocessed, sentiment-labeled, and classified using Orange Data Mining. The research applied a quantitative experimental approach with a machine learning framework. The classification results showed that neutral sentiment dominated user reviews, followed by negative and positive sentiments. The Naïve Bayes model achieved 100% accuracy based on the confusion matrix and other evaluation metrics such as precision, recall, and F1-score. The findings highlight that Naïve Bayes can be a reliable approach for analyzing public opinion and serve as a reference for evaluating and improving digital service applications.

https://doi.org/10.62951/repeater.v3i4.638


Implementasi Algoritma K-Means Clustering dengan Python untuk Analisis Produksi Bawang Merah di Indonesia

Jafar Pahrudin; Sri Mulyeni

SOSIAL: Jurnal Ilmiah Pendidikan IPS • 2025 • Asosiasi Peneliti Dan Pengajar Ilmu Sosial Indonesia

Shallots are one of the most strategic horticultural commodities in Indonesia, with high demand and varying production levels across regions. Differences in productivity between areas often create challenges in managing distribution and formulating national food policies. This study aims to analyze shallot production data in Indonesia by applying the K-Means Clustering algorithm using Python. The production data were collected from official agricultural statistics publications, followed by preprocessing, normalization, and determination of the optimal number of clusters using the Elbow method and Silhouette Score. The clustering results show the formation of several groups representing regions with high, medium, and low production levels. Visualization of the clustering results reveals the distribution patterns of shallot production, which can serve as a basis for supporting policy formulation in the development of shallot production centers in Indonesia. Thus, the application of K-Means Clustering with Python proves to be an effective approach to provide clearer insights into regional production variations and can be utilized as an analytical tool to support decision-making in the agricultural sector.

https://doi.org/10.62383/sosial.v3i4.1228


Predicting First-Year Student Performance with SMOTE-Enhanced Stacking Ensemble and Association Rule Mining for University Success Profiling

Kikunda, Philippe Boribo; Kasongo, Issa Tasho; Nsabimana, Thierry; Ndikumagenge, Jérémie; Ndayisaba, Longin +2 more

Journal of Computing Theories and Applications • 2025 • Universitas Dian Nuswantoro

This study examines the application of Educational Data Mining (EDM) to predict the academic performance of first-year students at the Catholic University of Bukavu and the Higher Institute of Education (ISP) in the Democratic Republic of Congo. The primary objective is to develop a model that can identify at-risk students early, providing the university with a tool to enhance student support and academic guidance. To address the challenges posed by data imbalance (where successful cases outnumber failures), the study adopts a hybrid methodological approach. First, the SMOTE algorithm was applied to balance the dataset. Then, a stacking classification model was developed to combine the predictive power of multiple algorithms. The variables used for prediction include the National Exam score (PEx), the secondary school track (Humanities), and the type of prior institution (public, private, or religious-affiliated schools), as well as age and sex. The results demonstrate that this approach is highly effective. The model is not only capable of predicting success or failure but also of forecasting students' performance levels (e.g., honors or distinctions). Moreover, the use of the Apriori association rule mining algorithm allowed the identification of faculty-specific success profiles, transforming prediction into an interpretable decision-support tool. This research makes several significant contributions. Practically, it provides the University of Bukavu with a tool for student orientation and early risk detection. Methodologically, it illustrates the effectiveness of a combined approach to EDM in an African context. However, the study acknowledges certain limitations, including the non-public nature of the data and the geographical specificity of the sample. It therefore proposes avenues for future research, such as the integration of Explainable AI (XAI) techniques for more refined and transparent analysis of the results.

https://doi.org/10.62411/jcta.14043


Prediksi Tingkat Pelanggaran Lalu Lintas Menggunakan Algoritma Naive Bayes

Bintang Dwi Atmaja; Yani Maulita; Novriyenni Novriyenni

Merkurius : Jurnal Riset Sistem Informasi dan Teknik Informatika • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Traffic violations are one of the serious problems frequently occurring in various regions, including Binjai City. Various types of violations, such as disobeying road signs and markings, incomplete vehicle documents, and violations that threaten the safety of drivers and other road users, continue to increase despite preventive and repressive efforts carried out by the authorities. This condition indicates that handling traffic violations cannot rely solely on field enforcement but also requires the support of technology capable of analyzing data more comprehensively. This study aims to predict the level of traffic violations by applying the Naïve Bayes method through data mining techniques. The dataset used consists of traffic violation records in 2023 from the Binjai City Police Department, with the main variables including violations of traffic signs and markings, document completeness, and safety-related violations. The Naïve Bayes method was selected because of its ability to perform classification with good accuracy, simplicity, and efficiency in processing large amounts of data. The implementation of this research is realized by developing a web-based application using Visual Studio Code as the development environment and MySQL as the database system. The results of this study are expected to provide structured information regarding traffic violation patterns, support authorities in making more effective decisions, and serve as an alternative solution in the prevention and handling of traffic violations in Binjai City.

https://doi.org/10.61132/merkurius.v3i5.1069


Perencanaan Pelatihan dalam Rangka Pelatihan Kerajinan Wayang Kulit

Shofikatul Umma; Heri Prabowo; Sapto Budoyo; Agus Sutono

Jurnal Pelayanan Masyarakat • 2025 • Lembaga Pengembangan Kinerja Dosen

Shadow puppet craft training is a strategic intervention in preserving cultural heritage and strengthening the creative economy sector in Indonesia. To ensure the effectiveness and efficiency of training, a planning approach is needed that is not only conventional, but also based on quantitative analysis and intelligent systems. This community service proposes a training planning strategy using an interdisciplinary approach involving Operation Research, Design of Experiment (DoE), Simulation, Metaheuristic Algorithms, and Data Mining. This study begins with the identification of key training variables, such as duration, number of participants, initial competency level, teaching materials, and instructor resources. Through the DoE approach, various combinations of variables are systematically tested to identify the optimal training design. Next, Simulation is used to model the dynamics of training implementation and evaluate implementation scenarios. To predict training needs and participant behavior, Data Mining techniques are applied to historical data of arts community training. In the final stage, Metaheuristic algorithms such as Genetic Algorithm and Simulated Annealing are used to solve complex and large-scale scheduling and resource allocation problems. The results of the integration of these approaches show an increase in training efficiency of up to 27% as well as increased participant satisfaction and the quality of work results. This activity demonstrates that applying a quantitative, data-driven approach to traditional crafts training planning can provide significant added value. This model can be replicated in other training programs based on local wisdom and other creative industry sectors.

https://doi.org/10.62951/jpm.v2i3.2118


Perbandingan Algoritma C4.5 dan Naïve Bayes dalam Prediksi Kualitas Tidur pada Kesehatan

Fakhruddin Fakhruddin; Sefrika Entas

Jurnal ilmu Kesehatan Umum • 2025 • Asosiasi Riset Ilmu Kesehatan Indonesia

Sleep is a fundamental human need that plays a crucial role in maintaining both physical and mental health. Poor sleep quality can trigger a variety of health problems, ranging from decreased concentration to an increased risk of chronic diseases. The complexity of factors influencing sleep quality (such as stress levels, heart rate, blood pressure, physical activity, and lifestyle) makes its assessment difficult through direct observation alone. Therefore, data mining approaches are increasingly utilized to identify relevant patterns in sleep-related data. This study aims to compare the performance of the C4.5 (Decision Tree) algorithm and the Naïve Bayes algorithm in predicting sleep quality using the Sleep Health and Lifestyle dataset, which contains information from 374 respondents. The research method applied is a quantitative comparative approach employing classification techniques with 10-fold cross-validation to ensure robust evaluation. Model performance is assessed using accuracy, precision, and recall metrics to provide a comprehensive understanding of the effectiveness of each algorithm. The findings indicate that the C4.5 algorithm achieves an accuracy of 96.26% and offers advantages in terms of interpretability through its decision tree visualization, enabling easier understanding of variable relationships. In contrast, the Naïve Bayes algorithm demonstrates superior predictive performance, achieving an accuracy of 98.66% along with consistently high precision and recall across nearly all classes. These results suggest that Naïve Bayes is more effective for predictive tasks involving sleep quality, while C4.5 remains highly valuable when the goal is to interpret variable interactions and decision rules. Overall, this research highlights the potential of data mining techniques in health informatics, particularly in improving the understanding and prediction of sleep quality, which in turn can contribute to better prevention and management of sleep-related health issues.

https://doi.org/10.61132/vitamin.v3i4.1773


Penggunaan Metode Rough Set untuk Menentukan Tingkat Kesiapan Siswa dalam Menghadapi ANBK di SMP Negeri 2 Kuala

Harninda Br Keliat; Novriyenni Novriyenni; Tio Ria Pasaribu

Repeater : Publikasi Teknik Informatika dan Jaringan • 2025 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The Computer-Based National Assessment (ANBK) is an essential instrument designed to comprehensively measure student competence, including literacy, numeracy, and character aspects. However, in practice, many students still face various challenges during preparation, such as cognitive limitations, psychological readiness, and technical barriers, which affect their overall readiness to participate in ANBK. This study aims to analyze the readiness level of students at SMP Negeri 2 Kuala by employing the Rough Set method. The variables examined include digital literacy, subject matter understanding, psychological readiness, and school facility support. Data were collected from 250 ninth-grade students through structured questionnaires and subsequently processed using the Rosetta software to perform attribute reduction and generate decision rules. The findings indicate that digital literacy, subject matter understanding, and psychological readiness are the most influential variables in determining student readiness, while facility support serves only as a complementary factor. The extraction process generated seven decision rules with an accuracy level of 100%, which effectively classified students into three readiness categories: highly ready, ready, and less ready. These results confirm that the Rough Set method is highly effective for identifying dominant factors and producing decision rules that can guide schools in developing targeted strategies to enhance student readiness for ANBK.

https://doi.org/10.62951/repeater.v3i3.619


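Pemanfaatan Data Mining untuk Klasifikasi Penyakit Daun pada Tebu dan Cara Pencegahan Penyakit dengan Metode Algoritma K-Nearest Neighbors [Using Data Mining to Classify Sugarcane Leaf Diseases and Disease Prevention Measures with the K-Nearest Neighbors Algorithm]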

Muhammad Akmal Ar Rasid; Catur Pranomo; Elkin Rilvani

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi• 2025 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to utilize data mining techniques, specifically the K-Nearest Neighbors (KNN) algorithm, to classify leaf diseases in sugarcane (Saccharum officinarum). Early and accurate detection of leaf disease types is a crucial step in prevention and control strategies, thereby reducing potential crop losses caused by pathogen attacks. Leaf diseases in sugarcane, such as leaf scald, rust, and mosaic virus, are known to affect photosynthesis, inhibit growth, and reduce the quality and quantity of sugarcane produced. The classification process in this study was carried out through image analysis of infected sugarcane leaves, where features such as color, texture, and shape were extracted using digital image processing techniques. The KNN algorithm was chosen because of its non-parametric nature, ease of implementation, and its ability to provide accurate classification results even with limited data size. The working principle of KNN is to determine the class of a new sample based on the majority class of its k nearest neighbors in the feature space, making it very suitable for the case of leaf disease image classification. In addition to building a classification model, this study also examines disease prevention strategies based on the identification results. These strategies include the use of disease-resistant sugarcane varieties, the implementation of appropriate planting patterns, land moisture management, regular plantation sanitation, and the measured and environmentally friendly use of pesticides or fungicides. Model performance evaluation was conducted using accuracy, precision, recall, and F1-score metrics to assess model effectiveness across various data scenarios. The results of this study are expected to not only contribute to the development of decision support systems for farmers and related parties but also support the application of artificial intelligence-based technology in the agricultural sector.
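The majority-vote principle described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the three-value feature vectors (standing in for color, texture, and shape scores), the training samples, and the function name `knn_predict` are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter

# Hypothetical (color, texture, shape) feature vectors with disease labels.
TRAIN = [
    ((0.9, 0.2, 0.1), "rust"),
    ((0.8, 0.3, 0.2), "rust"),
    ((0.2, 0.8, 0.3), "mosaic"),
    ((0.1, 0.9, 0.2), "mosaic"),
    ((0.3, 0.2, 0.9), "leaf scald"),
    ((0.2, 0.1, 0.8), "leaf scald"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    # Sort by Euclidean distance in feature space and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps the label that was counted first (the nearer one).
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (0.85, 0.25, 0.15)))  # prints "rust"
```

Because KNN stores the training samples and defers all computation to prediction time, the choice of k and of the distance metric are the main tuning decisions.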

https://doi.org/10.62951/bridge.v3i3.580


IoT-Based Home Electricity Monitoring and Consumption Forecasting using k-NN Regression for Efficient Energy Management

Angdresey, Apriandy; Sitanayah, Lanny; Rumpesak, Zefanya Marieke Philia; Ooi, Jing-Quan

Journal of Computing Theories and Applications• 2025 •Universitas Dian Nuswantoro

Electricity has emerged as an essential requirement in modern life. As demand escalates, electricity costs rise, making wastefulness a drain on financial resources. Consequently, forecasting electricity usage can improve how consumption is managed. This study presents an IoT-based monitoring and forecasting system for electricity consumption. The system comprises two NodeMCU microcontrollers, a PZEM-004T sensor for collecting real-time power data, and three relays that regulate the current flow to three distinct electrical appliances. The gathered data are transmitted to a web application that uses the k-Nearest Neighbor (k-NN) algorithm to forecast future electricity usage based on historical patterns. We evaluated the system's performance using four weeks of electricity consumption data. The results indicated that predictions were most accurate when the user’s daily consumption pattern remained stable, achieving a Mean Absolute Error (MAE) of approximately 1 watt and a Mean Absolute Percentage Error (MAPE) ranging from 1% to 1.7%. Additionally, predictions were notably precise during the early morning hours (3:00 AM to 8:00 AM) when k=6 was employed. This study demonstrates the effectiveness of integrating IoT-based systems with machine learning for real-time energy monitoring and forecasting. Furthermore, it emphasizes the application of data mining techniques within embedded IoT environments, providing valuable insights into the implementation of lightweight machine learning for smart energy systems.
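To make the forecasting and error metrics concrete, here is a minimal sketch of k-NN regression together with MAE and MAPE. The abstract does not state which features the system uses, so this example assumes the simplest case: the hour of day is the only feature and the forecast is the mean power of the k nearest historical readings. The readings and the function names `knn_regress`, `mae`, and `mape` are hypothetical.

```python
def knn_regress(history, query_hour, k=6):
    """Forecast watts as the mean of the k readings closest in hour of day."""
    nearest = sorted(history, key=lambda rec: abs(rec[0] - query_hour))[:k]
    return sum(watts for _, watts in nearest) / len(nearest)


def mae(actual, predicted):
    """Mean Absolute Error, in the same unit as the data (watts here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error; actual values must be non-zero."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (hour, watts) readings, not the study's measurements.
HISTORY = [(3, 100.0), (4, 102.0), (5, 98.0), (6, 100.0), (7, 104.0), (8, 96.0), (20, 300.0)]

forecast = knn_regress(HISTORY, query_hour=5, k=3)
print(forecast)
print(mae([100.0, 200.0], [101.0, 198.0]), mape([100.0, 200.0], [101.0, 198.0]))
```

MAE reports the typical error in watts, while MAPE scales each error by the actual value, which is why a stable consumption pattern can yield both a small MAE and a small MAPE.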

https://doi.org/10.62411/jcta.13602
