SciRepID - Scientific Publication Search

Implementasi Data Mining Menggunakan Metode RapidMiner Untuk Optimasi Manajemen Akademik Di SMK Secang

Syufa’a, Niha; Juwari, Juwari; Yamin, Muhammad Ikrar; Soderi, Ahmad; Rinaldo, Rinaldo

Teknik: Jurnal Ilmu Teknik dan Informatika• 2026 •LPPM Sekolah Tinggi Ilmu Ekonomi - Studi Ekonomi Modern

Education in vocational high schools (SMKs) requires effective data management to improve students’ academic achievement and discipline. At SMK Islam Secang, students’ academic scores and attendance data have so far functioned merely as administrative archives, making it difficult to identify patterns of student performance. This study aims to classify students based on academic achievement and discipline by applying the K-Means Clustering algorithm using RapidMiner. The data used in this study consist of scores from six subjects and attendance records of 35 students from the Light Vehicle Engineering (TKR) department over two semesters. The data were obtained from original school records, compiled using Microsoft Excel, and processed in RapidMiner. The clustering process employed four clusters for academic achievement and two clusters for discipline, with Euclidean Distance used as the similarity measure. The results show that in the first semester, students were grouped into four academic achievement clusters: high achievement (6 students), moderate achievement (7 students), potentially problematic (14 students), and problematic (8 students). In the second semester, the distribution changed to high achievement (19 students), moderate achievement (14 students), potentially problematic (4 students), and problematic (1 student). Meanwhile, student discipline was divided into two clusters: disciplined (31 students) and undisciplined (4 students). These results demonstrate that K-Means Clustering is effective in mapping student conditions, revealing patterns in academic performance and attendance, and supporting educational evaluation, learning planning, and early detection of students who require academic or disciplinary intervention.

Keywords: Data Mining, K-Means Clustering, Academic Achievement, Discipline, RapidMiner, Vocational High School (SMK)

https://doi.org/10.51903/teknik.v6i1.1214
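
The abstract above describes K-Means clustering with Euclidean distance as the similarity measure. The following is a minimal sketch of that idea in plain Python, not the RapidMiner process the authors used. The attendance values, the function names `euclidean` and `kmeans`, and the choice of initial centroids are all assumptions made for illustration; RapidMiner's own initialization may differ.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members, until stable."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical attendance records (days absent, days late); NOT the study's data.
attendance = [(0, 1), (1, 0), (2, 1), (1, 2), (14, 9), (12, 11)]

# Two clusters, as in the discipline analysis; initial centroids chosen by hand.
labels, centroids = kmeans(attendance, centroids=[attendance[0], attendance[4]])
print(labels)     # cluster index per student
print(centroids)  # final cluster centers
```

With these made-up values the first four students fall into one cluster and the last two into the other. Which cluster is read as "disciplined" or "undisciplined" is an interpretation step that happens after clustering.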

Pengembangan Smart Health Monitoring Berbasis IoT dengan Prediksi Serangan Jantung Menggunakan Algoritma SVM (Support Vector Machine)

Untung Surapati; Dadang Iskandar Mulyana; Dedi Gunawan; Anggit Purnama

International Journal of Applied Mathematics and Computing• 2026 •Asosiasi Riset Ilmu Matematika dan Sains Indonesia

Early detection of a potential heart attack is a crucial step in preventing sudden death from heart disease. This research aims to develop an Internet of Things (IoT)-based health monitoring system that measures vital signs in real time and predicts the likelihood of a heart attack. Sensor data exported as CSV are used in RapidMiner as training data for a machine learning algorithm, the Support Vector Machine (SVM). The system was built using an ESP32 microcontroller connected to a MAX30102 sensor to measure heart rate and blood oxygen saturation (SpO₂) at the finger, as well as a DHT22 sensor to measure temperature and humidity. The resulting data are sent to the Blynk application, which displays each parameter in real time. The initial prediction logic was developed using a rule-based method based on medical thresholds for four vital parameters. The data were then used to train an SVM model as a classification system to detect potential heart attacks. Test results showed that the system can identify abnormal conditions with a good level of accuracy and provide early warnings based on changes in vital parameters in real time. This system is expected to be an initial solution for personal health monitoring, especially for individuals at risk of heart disease. It can be further developed with cloud integration and automatic notifications to users' devices.

https://doi.org/10.62951/ijamc.v2i3.128

Analisis Tingkat Kepuasan Pengguna QRIS Berdasarkan Pengalaman Dan Persepsi Pengguna twitter/ X Menggunakan Naive Baiyes

Veri Arinal; Satria Wira Yudha; Muhammad Joko Umbaran Kharis Bahrudin; Dessyanti Ryantina

International Journal of Information Engineering and Science• 2026 •Asosiasi Riset Teknik Elektro dan Infomatika Indonesia

QRIS (Quick Response Code Indonesian Standard) has become a widely used national digital payment standard. User satisfaction with this service needs to be monitored continuously to ensure its sustainability. This study aims to predict the level of QRIS user satisfaction based on their experiences and perceptions expressed organically on the Twitter social media platform. The method used is sentiment analysis with the Naive Bayes classification algorithm implemented using RapidMiner software. The research data was obtained from Twitter user comments collected through web scraping techniques. The text data then went through a preprocessing stage that included cleansing, stopword filtering, stemming, and tokenizing to be prepared as features ready to be processed by the model. The data was divided into training (80%) and testing (20%) subsets for model training and validation. The results showed that the Naive Bayes model was able to predict user satisfaction sentiment with an accuracy of 80.99%. These findings indicate that the model is highly accurate in identifying satisfied comments and sufficiently sensitive in detecting dissatisfaction. This study concludes that sentiment analysis of Twitter UGC data using Naive Bayes is an effective and efficient approach for predicting QRIS user satisfaction in real time. The practical implication of this study is to provide an automatic feedback system for service providers to monitor public sentiment and take targeted corrective actions.

https://doi.org/10.62951/ijies.v2i4.53

Klasterisasi Daerah Rawan Demam Berdarah Dengue (DBD) menggunakan Algoritma K-Means di Purwodadi Grobogan

Elsa Syahriza Putri; Andri Triyono; Kartika Imam Santoso

Router : Jurnal Teknik Informatika dan Terapan• 2026 •Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

Dengue fever is a disease commonly found in tropical and subtropical regions. It can cause severe symptoms, such as very high fever accompanied by nausea, vomiting, headache, abdominal pain, and leukopenia (a decrease in white blood cells). This infectious disease, known as dengue hemorrhagic fever (DHF), is a viral infection transmitted by the Aedes aegypti mosquito. This study aims to group dengue-prone areas using the K-Means algorithm and to group the factors that cause dengue in Purwodadi District, Grobogan Regency. Clustering 266 records with the K-Means algorithm in RapidMiner produced 3 clusters: cluster 0 (blue) with 138 patients, dominated by Kuripan, Purwodadi, and Ngambak villages; cluster 1 (green) with 31 patients in Ngraji, Nambuhan, and Cingkrong villages; and cluster 2 (orange) with 97 patients in Danyang, Kalongan, and Pulorejo villages. This study is expected to provide additional information for stakeholders in controlling dengue cases and to increase awareness of the importance of environmental cleanliness as a preventive measure.

https://doi.org/10.62951/router.v4i1.846

Prediksi Kelulusan Mahasiswa UNU Lampung Menggunakan Algoritma Decision Tree Berbasis Data Akademik Menggunakan Rapidminer

Nuari Anisa Sivi; Imam Mualim; Roro Fatikhin; Dhea Nova Ariskha

JURNAL PENELITIAN TEKNOLOGI INFORMASI DAN SAINS (JPTIS)• 2026 •Institut Teknologi dan Bisnis (ITB) Semarang

This study aims to predict student graduation at Universitas Nahdlatul Ulama (UNU) Lampung using academic data with the Decision Tree C4.5 algorithm in RapidMiner. This research is based on the problem that some students graduate late and academic data is not fully used for decision-making. The method used is a quantitative approach with an experimental design following the CRISP-DM stages, which include business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The data used consists of 389 student records from four study programs. The variables include GPA, semester GPA, gender, and study program, while graduation status is used as the target variable. The results show that GPA is the most important factor affecting graduation. Students with GPA ≤ 3.00 tend to graduate late. The model produced an accuracy of 85.34%, precision of 87.50%, and recall of 97.03%. Therefore, it can function as an early warning mechanism to support academic programs in increasing on-time graduation rates.

https://doi.org/10.54066/jptis.v3i1.3794

Analisis sentimen masyarakat terhadap putusan Mahkamah Konstitusi tentang batasan usia calon Presiden dan Wakil Presiden di media sosial Twitter

Noviolen Jehovan Dieksa; Pakereng, Ineke

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi• 2026 •Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

This study evaluates public sentiment toward Constitutional Court Decision No. 90/PUU-XXI/2023 regarding the age limit for presidential and vice-presidential candidates, a controversial issue closely related to Indonesia’s democratic dynamics. Understanding public opinion on Twitter, as a major platform for political expression, is essential for informing electoral policy formulation. Data were collected using Tweet Harvest through Google Colab and analyzed using the Naïve Bayes algorithm as the primary sentiment classification method, with RapidMiner employed to support and streamline the analytical process. The analysis process included data cleaning, text normalization, stopword removal, manual labeling of 80 tweets as training data, and automatic sentiment classification to identify positive and negative sentiments. From a total of 151 analyzed tweets, 84 (55.63%) were classified as negative and 67 (44.37%) as positive, with the model achieving an accuracy of 66.67%. These findings suggest a tendency toward public opposition to the decision, reflecting dissatisfaction among Twitter users. The study demonstrates that Naïve Bayes is reasonably effective for sentiment classification with limited datasets and provides insights for policymakers in understanding public responses to election-related regulations.

https://doi.org/10.24246/itexplore.v5i1.2026.pp1-10

Penggunaan Algoritma K-Means Clustering Aplikasi Rapid Miner untuk Menganalisis Tingkat Kematian Pasien Diabetes Mellitus di Rumah Sakit Umum Daerah Ibnu Sina Gresik

Suci Ariani; Resta Dwi Yuliani; Auliyaur Rabbani

VitaMedica : Jurnal Rumpun Kesehatan Umum• 2026 •STIKES Columbia Asia Medan

Diabetes Mellitus is one of the chronic diseases with high morbidity and mortality rates, making data-driven analysis necessary to understand patient mortality patterns. This study aims to analyze the mortality rate of Diabetes Mellitus patients based on age and length of hospitalization using a data mining approach with the K-Means Clustering method. The study employs a quantitative approach using secondary data obtained from the medical records of Diabetes Mellitus patients at Ibnu Sina Regional General Hospital, Gresik Regency, in December 2022. The dataset consists of 266 patient records with variables including age, length of stay, and final patient status. Data analysis was conducted through preprocessing stages, including data cleaning, transformation, and normalization, followed by the clustering process using the K-Means algorithm with the assistance of the RapidMiner application. The results show that patient data are divided into three clusters based on age ranges: 0–40 years, 41–55 years, and 56–90 years. The cluster with the age range of 56–90 years has the highest number of patient deaths compared to the other clusters. Meanwhile, the length of hospitalization does not show a significant effect on patient mortality. This study is expected to serve as a consideration for hospitals and health institutions in efforts to prevent and manage Diabetes Mellitus, particularly among the elderly population.

https://doi.org/10.62027/vitamedica.v4i1.635

Application of the K-Means Method for Grouping Product Data Based on Sales Level

Matuan, Helson; Dude, Esau; Mallo, Atius; Yowey, Herlina; Patey, Yusuf Selius +7 more

JUISI : Jurnal Ilmiah Sistem Informasi• 2026 •LPPM Universitas Sains dan Teknologi Komputer

Modern retail in Indonesia is growing rapidly, with an increasingly complex range of products, which makes sales data management important for managerial decision-making. This study aims to group products at Indomaret Kotaraja by sales behavior to support decisions on inventory, shelf arrangement, and promotion. The method used is K-Means clustering implemented in RapidMiner. The dataset covers one year of monthly sales for packaged food and beverage products. Before modeling, preprocessing was carried out, comprising data cleaning, type validation, duplicate removal, missing-value handling, and feature normalization, with the January to December sales variables as numeric inputs, while product identity was retained for interpretation. Candidate cluster counts K = 2, 3, and 4 were evaluated using the Davies-Bouldin Index, the Silhouette coefficient, and the SSE or WCSS trend. The results show that K = 4 gives the best separation and compactness compared with K = 2 and K = 3. The final model divides 99 products into clusters of 16, 27, 31, and 25 items. The centroid profiles reveal distinct patterns: one cluster has high and relatively stable sales in the fourth quarter (fast moving), one cluster shows a strong year-end surge (promotion-sensitive or seasonal), one cluster is low but stable (slow moving), and one cluster is more volatile and therefore requires tight control. This study presents a repeatable workflow and easily interpreted cluster profiles for operational action, such as prioritizing replenishment for fast movers, targeted promotion for the mid-range or seasonal group, and tighter stock control and shelf optimization for the slow or volatile groups, thereby helping to simplify the complexity of sales data and improve data-driven decisions.

https://doi.org/10.51903/53pfrd78

Application of KNN & Decision Tree Algorithms in Predicting Diabetes Using Rapid Miner

Fatah, Zaehol; Anam, Baitul

JUISI : Jurnal Ilmiah Sistem Informasi• 2026 •LPPM Universitas Sains dan Teknologi Komputer

Diabetes prediction is an important step in supporting early detection and preventing the long-term complications caused by chronic disease. This study aims to compare the performance of the K-Nearest Neighbor (KNN) and Decision Tree algorithms in predicting diabetes using the Pima Indian Diabetes dataset in RapidMiner. The dataset consists of 768 records with eight main health attributes related to diabetes risk. The research method comprises data preprocessing, normalization, missing-value handling, and model evaluation using K-Fold Cross Validation. The results show that the Decision Tree algorithm achieved an accuracy of 74.68%, higher than KNN, which achieved only 68.18%. The Decision Tree's advantage is attributed to its ability to capture patterns in the data better and to produce a decision structure that is easy to interpret. This study contributes to health analytics by providing empirical evidence comparing the algorithms and by demonstrating the effectiveness of RapidMiner in developing prediction models for early detection of diabetes.

https://doi.org/10.51903/37naet22

Predictive Analysis of Dengue Outbreak Trends Using RapidMiner-Based Machine Learning Models

Herriyawan, Herriyawan; Timur, Muhammad Bagus Bintang; Wibowo, Arief

Dinamik• 2026 •Universitas Stikubank

Dengue hemorrhagic fever is a recurring public health challenge in tropical regions, including Indonesia. This study aims to predict the annual number of cases using five machine learning algorithms: Linear Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Neural Network. Historical data from 2017–2024 were processed with a time-series windowing technique to produce lag features suitable for supervised learning. Performance was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). The Decision Tree model showed the best performance on most indicators. The prediction for 2025 indicates a moderate increase in the number of cases. However, the low R² values across all models point to the need for a more complex multivariate approach that takes climatic, environmental, and demographic factors into account. These results underline the importance of data quality and appropriate feature selection in epidemiological forecasting to support more effective health planning.

https://doi.org/10.35315/dinamik.v31i1.10356
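
The abstract above mentions time-series windowing to build lag features and evaluation with MAE, MSE, RMSE, and R². The sketch below illustrates those two steps in plain Python. The yearly case counts are invented for illustration and are not the study's data, the helper names `make_windows` and `regression_metrics` are hypothetical, and the "mean of the lag window" predictor is only a naive stand-in for the five models the authors compared.

```python
import math

def make_windows(series, window):
    """Turn a series into (lag features, next value) pairs for supervised learning."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and the coefficient of determination R^2."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

# Hypothetical yearly case counts (eight values, like 2017-2024); NOT real data.
cases = [120, 150, 130, 170, 160, 190, 180, 210]

pairs = make_windows(cases, window=2)
actual = [target for _, target in pairs]
# Naive baseline (an assumption): predict the mean of the two lagged values.
predicted = [sum(lags) / len(lags) for lags, _ in pairs]

mae, mse, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

With only eight yearly observations, a window of two leaves six training pairs, which shows how little data such a setup has to learn from.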

Implementasi Algoritma FP-Growth Untuk Menemukan Pola Hubungan Antar Barang pada Transaksi Penjualan

Nadia, Nadia; Tripasha, Ghina; Atya, Nur; Sutejo, Heru; Nadia, Nadia +3 more

JUISI : Jurnal Ilmiah Sistem Informasi• 2026 •LPPM Universitas Sains dan Teknologi Komputer

This study is motivated by a problem at Toko Polirindo, where sales transaction data are stored only as archives and have not been used for analysis, resulting in unstable product availability, recurring stock shortages, and difficulty predicting customer purchasing behavior. The research therefore aims to identify items that frequently occur together by applying Association Rule Mining with the FP-Growth algorithm, which extracts frequent itemsets efficiently without generating candidate combinations as the Apriori algorithm does. The dataset consists of sales transactions recorded from January to September 2025. It undergoes several stages, including preprocessing, binary transformation, and analysis using RapidMiner to generate frequent itemsets and association rules, evaluated using support, confidence, and lift metrics. The results reveal that item 3 consistently appears as the most dominant consequent across almost all generated rules, with confidence values ranging from 0.322 to 0.347, indicating that this item is most strongly associated with other items and frequently appears as a complementary product in customer transactions. These findings offer practical insights for optimizing stock management, improving product placement, and developing promotional strategies based on actual purchasing patterns. They also demonstrate that the FP-Growth algorithm is an effective analytical tool to support data-driven decision-making aimed at enhancing operational efficiency and customer satisfaction in retail environments.

https://doi.org/10.51903/tc3ne886
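
The abstract above evaluates association rules with support, confidence, and lift. The sketch below computes those three metrics for a single rule by direct counting. It does not implement FP-Growth itself (no FP-tree is built), the transactions are invented for illustration, and `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)       # contain antecedent
    n_c = sum(1 for t in transactions if consequent <= t)       # contain consequent
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n              # share of transactions containing both sides
    confidence = n_both / n_a         # P(consequent | antecedent)
    lift = confidence / (n_c / n)     # confidence relative to the consequent's base rate
    return support, confidence, lift

# Hypothetical transactions; NOT the store's data.
transactions = [
    {"item 1", "item 3"},
    {"item 1", "item 2"},
    {"item 1", "item 3", "item 4"},
    {"item 2", "item 3"},
    {"item 1"},
]

support, confidence, lift = rule_metrics(transactions, {"item 1"}, {"item 3"})
print(support, confidence, lift)
```

A lift below 1, as in this made-up example, means the consequent appears less often alongside the antecedent than its overall frequency would suggest. A confidence figure is therefore usually read together with lift.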

Analisis Penerapan Data Mining Untuk Klasifikasi Penjualan Makanan Terlaris Menggunakan Algoritma Decision Tree (C4.5)

Falentina, Falerina Gita; Wabdaron, Gabriel Yohan Yoseph; Andiyani, Dwi; Wondiwoi, Melki Sendoni; Sutejo, Heru +5 more

JUISI : Jurnal Ilmiah Sistem Informasi• 2026 •LPPM Universitas Sains dan Teknologi Komputer

The culinary business sector continues to grow rapidly, creating a strong need for data-driven decision-making to support operational efficiency. This study aims to classify the sales performance of menu items at Warung Makan Lalapan Haris by applying the C4.5 decision tree algorithm and the KDD methodology. A total of 500 data records containing attributes such as menu type, number of customers, number of items sold, and sales status were processed through several stages: data selection, preprocessing, transformation, data mining, and evaluation. The decision tree model was built using RapidMiner 2026.0.1 with a 70% training and 30% testing split. The results show that the C4.5 algorithm successfully formed a classification structure that categorizes menu items into Best Seller, Medium Seller, and Low Seller groups. Ayam Goreng was consistently identified as a Best Seller and Ayam Bakar as a Medium Seller, while Lele Goreng and Lele Bakar were classified as Low Seller items with a more complex splitting pattern influenced mainly by the number of customers. The evaluation shows an accuracy of 84%, with perfect precision and recall for Ayam Goreng and Ayam Bakar, while performance for Lele Goreng and Lele Bakar varied. These findings indicate that the C4.5 decision tree algorithm is effective for analyzing sales patterns and can help business owners plan inventory and optimize menu management strategies.

https://doi.org/10.51903/f2jegk76

Analisis Kelayakan Pemberian Kredit dengan Algoritma Naïve Bayes untuk Antisipasi Risiko Kredit Bermasalah Pada BPR Ukabima Lestari Cabang Jambi

Anggi Saputra; Setiawan Assegaff; Benni Purnama

Prosiding Seminar Nasional Ilmu Teknik• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

This study analyzes creditworthiness assessment and predicts non-performing loan (NPL) risk using the Naïve Bayes algorithm at BPR Ukabima Lestari, Jambi Branch. A quantitative data mining approach with probabilistic classification is applied. The dataset includes borrower attributes such as age, occupation, income, loan amount, tenor, collateral, and repayment history. Research stages comprise data preprocessing, model development, and performance evaluation using accuracy, precision, recall, and F1-score implemented in RapidMiner. The results indicate that the Naïve Bayes model achieves 99.58% accuracy, demonstrating strong capability to predict potential problem loans accurately and efficiently, supporting data-driven credit decisions and strengthening credit risk management in microbanking institutions.

https://doi.org/10.61132/prosemnasproit.v2i2.96

Penerapan Metode K-Means Clustering Untuk Menentukan Faktor Resiko Pada Penderita Diabetes Melitus

Melda Septriani; Pareza Alam Jusia; Rudolf Sinaga; Shinta Renova Putri; Firyal Najla 'Afifah

Prosiding Seminar Nasional Ilmu Teknik• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

Diabetes Mellitus is a disease in which the pancreas fails to produce enough of the hormone insulin, causing blood sugar levels to rise. This study applies the k-means clustering method to determine risk factors for diabetes mellitus. The clustering method groups the data into several clusters, and the study compares the results obtained with several data mining tools: RapidMiner, SPSS, WEKA, and Python. The comparison yielded five calculations. The manual calculation gave cluster 1 first priority with a ratio value of 73%, RapidMiner gave cluster 3 first priority with a ratio value of 58%, SPSS gave cluster 2 first priority with a ratio value of 34%, and Python gave cluster 1 first priority with a ratio value of 55%.

https://doi.org/10.61132/prosemnasproit.v2i2.94

Implementation of Naïve Bayes Algorithm on the Eligibility of Kartu Indonesia Pintar Scholarship (Case Study: University of Sepuluh Nopember Papua)

Siahaan, Daniel Bienfield Manahan; Bagre, Estevina Carolina; Wanda, Jered Imanuel; Silahooy, Grisye; Sutejo, Heru +5 more

JUISI : Jurnal Ilmiah Sistem Informasi• 2025 •LPPM Universitas Sains dan Teknologi Komputer

The KIP Kuliah program aims to widen fair access to higher education, but manual selection at campus level is often subjective and hard to audit. This study offers data-driven selection support using a Naive Bayes classifier to assist decision-making at Universitas Sepuluh Nopember Papua. The objectives are: (1) to design and implement a transparent and replicable model for predicting scholarship eligibility, and (2) to evaluate its performance with standard classification metrics. The method follows the KDD workflow in RapidMiner, covering data import, quality control, missing-value imputation, attribute role assignment, and feature encoding; the model is trained with Laplace smoothing. The dataset contains 543 applicants from the 2024–2025 period with socioeconomic attributes (parents' occupation and income, number of dependents, DTKS status, P3KE decile), while the target label is historical eligibility. Evaluation was performed on a test set of 50 records. The results show 94% accuracy with a confusion matrix of TP=45, FP=2, FN=1, TN=2; for the Layak (eligible) class, precision is 95.74% and recall is 97.83%; an AUC of 0.891 indicates strong class separation. These findings show that the approach can reliably identify eligible candidates, while highlighting limited sensitivity for the minority Tidak Layak (not eligible) class. The contribution of the study is a lightweight, auditable analytics pipeline design that speeds up screening, reduces subjectivity, and strengthens accountability through measurable outputs. As an implication, the model can serve as a first-stage filter to focus committee review; future improvements include class balancing, threshold tuning, and periodic retraining to maintain fairness and efficiency.

https://doi.org/10.51903/rdzdm469
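
The confusion matrix reported in the abstract above (TP=45, FP=2, FN=1, TN=2) is enough to recompute the quoted accuracy, precision, and recall. The sketch below does that arithmetic using the standard definitions. The function name `binary_metrics` and the `negative_recall` key are illustrative choices, and treating the eligible (Layak) class as the positive class is an assumption consistent with the reported figures.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (positive class = eligible)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # Recall of the negative (not eligible) class, also called specificity.
        "negative_recall": tn / (tn + fp),
    }

m = binary_metrics(tp=45, fp=2, fn=1, tn=2)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
# accuracy: 94.00%
# precision: 95.74%
# recall: 97.83%
# negative_recall: 50.00%
```

The first three values match the abstract. The last line shows why sensitivity on the minority class is limited: only two of the four not-eligible applicants in the test set were recognized as such.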

Chronic Kidney Disease Prediction Model Using Naïve Bayes (Case Study: Jayapura City)

Rumbairusy, Grace Adelin Rumbairusy; Manda, Manda; Payungallo, Yulan Nanda Sandira; Sutejo, Heru; Rumbairusy, Grace Adelin +3 more

JUISI : Jurnal Ilmiah Sistem Informasi• 2025 •LPPM Universitas Sains dan Teknologi Komputer

Chronic Kidney Disease (CKD) is a critical health problem because it is progressive and often shows no symptoms in its early stages, so many patients are diagnosed at an advanced stage. In Jayapura City, the number of CKD cases continues to rise owing to hypertension, diabetes, and limited access to early detection services. This study aims to build a CKD prediction model using the Naïve Bayes algorithm and to analyze the relationships among clinical variables that influence CKD in patients in Jayapura. The study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework, comprising business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The dataset consists of 500 patient records with 13 medical attributes, including blood pressure, blood glucose, serum creatinine, hemoglobin, albumin, and urine condition. All data had been cleaned before modeling and therefore required no further preprocessing. Modeling was performed in RapidMiner, and the Naïve Bayes algorithm achieved an accuracy of 94.40% with high precision and recall for both the CKD and non-CKD classes. These results show that Naïve Bayes is effective in identifying CKD patterns in local clinical data. The main contribution of this study is its use of real data from patients in Jayapura City, yielding a regionally relevant prediction model and new insight into the dominant medical factors. The implications include the potential to integrate the model into clinical decision support systems or health monitoring applications to support early detection of CKD and improve the quality of health services.

https://doi.org/10.51903/k47t6677

Penerapan Metode C4.5 dan K-Nearest Neighbor untuk Klasifikasi Kelulusan Mahasiswa Berdasarkan Data Akademik

Dina Amalia Putri; Naza Sefti Prianita; Elkin Rilvani

Jupiter: Publikasi Ilmu Keteknikan Industri, Teknik Elektro dan Informatika• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

On-time graduation is an important indicator of the quality and effectiveness of higher education. The on-time graduation rate not only affects institutional accreditation but is also a concern for campus management in designing learning strategies and academic guidance. This study applies and compares two data mining classification algorithms, C4.5 and K-Nearest Neighbor (KNN), for predicting whether students graduate on time. Predictions are based on academic attributes used as input variables: Grade Point Average (GPA), number of credits earned, and Semester Grade Point Average (IPS). The method follows Knowledge Discovery in Databases (KDD), which includes data selection, preprocessing, transformation, data mining, and evaluation of results. The study was conducted in RapidMiner with a dataset of 279 Informatics Study Program students from the 2015 to 2019 intakes. The data were classified into two categories: "graduated on time" and "not graduated on time". The test results show that KNN performed better than C4.5: KNN achieved an accuracy of 76.08%, a precision of 73.11%, and a recall of 41.92%, while C4.5 achieved an accuracy of 73.49%, a precision of 64.62%, and a recall of 41.89%. This difference in accuracy indicates that KNN captures patterns in the data more effectively in this context. KNN can therefore be considered the more suitable method for helping universities predict on-time graduation, enabling early intervention for students at risk of graduating late. This research also contributes to the development of data mining-based academic decision support systems in higher education.

https://doi.org/10.61132/jupiter.v3i4.1032

Prediksi Status Pesanan Menggunakan Metode Classification C.45 Pada Toko Stuftech.Id di Shopee

Muhamad Arief Firdaus; Fadli Rahman Latarissa; Yanuar Dzaky; Hidayanti Murtina; Fadli Rahman Latarissa +2 more

Jurnal Elektronika dan Komputer• 2025 •STEKOM PRESS

The growth of transactions on e-commerce platforms such as Shopee calls for an accurate order status prediction system to optimize service and reduce cancellations and delivery delays. This study aims to build a classification model for order status (completed or cancelled) at the Stuftech.Id store using the C4.5 algorithm. The data consist of order transactions covering payment method, shipping region category, and shipping cost. Classification was carried out in RapidMiner through preprocessing, decision tree construction, and model evaluation. The analysis shows that the attribute “Kategori Pulau” (island category) has the highest gain value and was therefore chosen as the root node. The resulting model achieved an accuracy of 86%, with a recall of 100% for completed orders but only 6.67% for cancelled orders. These findings indicate that the C4.5 algorithm is effective at predicting successful orders but needs improvement in detecting potential cancellations. Implementing this model can help business owners make operational decisions proactively.

https://doi.org/10.51903/m06ehx47

Analisis Perbandingan Algoritma Random Forest dan Algoritma Naive Bayes untuk Memprediksi Penyakit Paru-Paru di Indonesia

Eka Wulansari Fidayanthie; Asep Sayfulloh; Mardiana Rafa Alzena; Nilam Kurnia Sari

Saturnus: Jurnal Teknologi dan Sistem Informasi• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Lungs are vital organs in the human respiratory system, responsible for fulfilling the body's oxygen needs. If the lungs experience health problems, it can have adverse effects on the human respiratory system. Common causes of lung diseases are usually due to inhaling air contaminated by dust, smoke, viruses, and bacteria. This study aims to compare the performance of two classification algorithms, namely Random Forest and Naive Bayes, in predicting lung diseases. The data used was obtained from the Kaggle website and processed using RapidMiner software. The attributes involved include smoking habits, pre-existing conditions, staying up late, exercise activities, age, and outcomes. Based on the test results, the Random Forest algorithm demonstrated the best performance with an accuracy of 93%, while the Naive Bayes algorithm achieved an accuracy of 87%. These findings indicate that the Random Forest algorithm outperforms the Naive Bayes algorithm in terms of lung disease prediction accuracy.

https://doi.org/10.61132/saturnus.v3i3.956

Analisis Sentimen Aplikasi Liputan6.Com pada Ulasan Pengguna di Google Playstore dengan Menggunakan Algoritma Support Vector Machine (Svm) dan Naïve Bayes

Yayang Tika Robiatush Sholiha; Lubna Asjad Muhda Nabilah; Imron Imron

Saturnus: Jurnal Teknologi dan Sistem Informasi• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to evaluate user sentiment toward the Liputan6.com application available on the Google Play Store. In the digital era, user reviews serve as a significant indicator in assessing the quality of an application. However, the inconsistency between rating scores and review content renders manual analysis less objective. To address this issue, a machine learning approach was adopted by comparing two algorithms, namely Support Vector Machine (SVM) and Naïve Bayes (NB). A total of 2,500 reviews were collected through a web scraping process and automatically labeled based on the rating (positive if ≥ 3, negative if < 3). The data preprocessing stages included cleaning, case folding, tokenizing, stopword removal, and token filtering. Subsequently, word weighting was carried out using the TF-IDF method, followed by classification using 10-Fold Cross Validation in RapidMiner. The evaluation results indicate that, in the positive class, NB demonstrated superior precision (89.47%), whereas SVM achieved higher recall (98.94%) and F1-score (90.96%). In the negative class, SVM performed better in terms of precision (66.15%), while NB attained higher recall (65.65%) and F1-score (36.34%). Further evaluation based on AUC and accuracy positioned SVM in the good category (AUC 0.842; accuracy 83.82%), while NB was categorized as fail (AUC 0.505; accuracy 60.87%). Overall, SVM is considered to be more effective than NB.

https://doi.org/10.61132/saturnus.v3i3.867
