SciRepID - Scientific Publication Search

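Analisis Machine Learning pada Data Netflix Shows untuk Mengklasifikasikan Tren Genre dan Karakteristik Film (Machine Learning Analysis of Netflix Shows Data to Classify Genre Trends and Film Characteristics)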

Claudia K. Hamsi; I Wayan Sudiarsa; Vinsensia P.K Abu; Sarling C. Dhai; Maria A. Serero

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The rapid development of digital streaming platforms such as Netflix has generated a large volume of content data with diverse characteristics, requiring effective analytical methods to understand emerging patterns and trends. This study aims to classify Netflix content into two main categories, movies and television shows, and to analyze genre trends and content characteristics using a data mining approach with the Naive Bayes algorithm. The dataset used is the Netflix Shows dataset, consisting of 8,809 content entries; the primary features analyzed are genre, rating, and country of production. The research process begins with data exploration and preprocessing, including data cleaning, handling missing values, and transforming categorical features to enable effective model construction. The dataset is then divided into training and testing sets to build and evaluate the Naive Bayes classification model objectively and systematically. Model performance is evaluated using accuracy, precision, recall, and F1-score to assess the model's ability to distinguish between Netflix content types. The experimental results show that the Naive Bayes algorithm classifies Netflix content into Movie and TV Show categories with accuracy, precision, recall, and F1-score each reaching 100%. The confusion matrix shows no misclassifications, suggesting that the genre, rating, and country of production features provide a very clear separation between content classes. These findings indicate that Naive Bayes can achieve exceptionally high classification performance on this task. The results further reveal distinct differences in characteristics between movies and television shows based on genre and production attributes.
Therefore, this study is expected to contribute to the development of content recommendation systems and strategic content management within the streaming industry.

https://doi.org/10.61132/mars.v3i6.1389
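
A minimal sketch of the classification step, assuming a categorical Naive Bayes with Laplace smoothing over the three features the abstract names (genre, rating, country). The abstract does not say which Naive Bayes variant or feature encoding the authors used, and the training rows below are hypothetical rather than taken from the dataset; `train_categorical_nb` and `predict` are illustrative names.

```python
import math
from collections import Counter, defaultdict

def train_categorical_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature values (smoothed at prediction time)."""
    class_counts = Counter(labels)
    # value_counts[c][j][v]: number of class-c rows whose feature j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values observed for each feature j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab, alpha

def predict(model, row):
    """Return the class with the highest log posterior for a categorical row."""
    class_counts, value_counts, vocab, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior P(c)
        for j, v in enumerate(row):
            k = len(vocab[j])  # number of distinct values of feature j
            score += math.log((value_counts[c][j][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical (genre, rating, country) rows, for illustration only
rows = [
    ("Dramas", "PG-13", "United States"),
    ("Comedies", "R", "United States"),
    ("Dramas", "R", "India"),
    ("Docuseries", "TV-MA", "United Kingdom"),
    ("Kids' TV", "TV-Y", "Japan"),
    ("Docuseries", "TV-14", "United States"),
]
labels = ["Movie", "Movie", "Movie", "TV Show", "TV Show", "TV Show"]
model = train_categorical_nb(rows, labels)
print(predict(model, ("Dramas", "PG-13", "India")))  # Movie
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term `alpha` keeps a feature value never seen with a class from driving that class's probability to zero.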

Development Strategy System Information Management in Support Organizational Digital Transformation

Hendra Jatnika; Mia Kusmiati

International Journal of Management Science and Entrepreneurship• 2025 •International Forum of Researchers and Lecturers

Goals – This study explores strategic approaches to developing Management Information Systems (SIM) as an integral part of supporting the digital transformation of modern organizations. It emphasizes the importance of integrating information technology, managing data effectively, and improving the digital competence of the human resources who operate these systems. Design/methodology/approach – This conceptual article uses a literature review method, analyzing relevant academic works and technical manuals, particularly those related to implementing SIM in the public and private sectors. It draws on the works of Jatnika et al. (2022–2024), including the use of Microsoft Office applications as a foundational skill for organizational digital literacy. Findings – The findings show that SIM development is not merely a technical effort but a strategic necessity for supporting digital transformation. Key strategies include modular system design, data mining integration, user-based training programs, and periodic system evaluation. Together, these components allow an organization to build a responsive and adaptive SIM ecosystem. Practical implications – Organizations pursuing digital transformation need to invest in the digital capabilities of their human resources and ensure that the SIM they develop is used strategically; doing so can make it a main driver of greater efficiency, accuracy, and decision-making capability across work units. Originality/value – This study offers a structured conceptual model of SIM development in the context of digital transformation, based on the literature, applications, and real-world organizational needs. The article provides practical insights for policymakers, IT managers, and HR developers.

https://doi.org/10.70062/globalmanagement.v2i4.428

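Implementasi Klasifikasi Datamining dengan Algoritma C4.5 untuk Rekomendasi Pemilihan Fakultas Perguruan Tinggi Berdasarkan Minat dan Bakat Siswa SMK (Implementing Data Mining Classification with the C4.5 Algorithm to Recommend University Faculty Choices Based on the Interests and Talents of Vocational High School Students)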

Senna Hendrian; V.H Valentino; Wisdariah, Wisdariah; Riezca Talita Trista; Dudi Parulian

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Selecting a faculty that aligns with students’ interests and talents is a strategic step in determining the success of higher education and future career paths. However, most vocational high school (SMK) students still face difficulties in identifying the most suitable faculty due to the lack of data-driven analysis. This study implements the C4.5 classification algorithm within data mining techniques to build an automatic and measurable faculty recommendation system. The dataset consists of attributes such as SMK major, interest level, aptitude test results, academic grade average, and gender, with the output being the recommended faculty. The C4.5 algorithm was chosen for its ability to generate a transparent and interpretable decision tree, which helps both guidance counselors and students understand the rationale behind the recommendations. The experimental results show that the constructed classification model achieved an accuracy rate of 88%, based on cross-validation testing using data from 12th-grade students. The implementation of this system is expected to serve as an objective tool in the faculty selection process and to promote a data-driven decision-making approach in secondary education environments.

https://doi.org/10.61132/neptunus.v3i4.1159
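
C4.5 chooses the attribute to split on by gain ratio: information gain divided by the split information of the attribute, which penalizes attributes with many distinct values. A minimal sketch of that criterion for categorical attributes follows; it omits C4.5's handling of continuous attributes, missing values, and pruning, and the student records and faculty labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion for a categorical attribute: gain / split information."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical records: "interest" separates the faculties, "gender" does not
rows = [
    {"interest": "IT", "gender": "M"},
    {"interest": "IT", "gender": "F"},
    {"interest": "Business", "gender": "M"},
    {"interest": "Business", "gender": "F"},
]
labels = ["Computer Science", "Computer Science", "Economics", "Economics"]
best = max(["interest", "gender"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # interest
```

A full tree is grown by applying this choice recursively to each branch until a node's labels are pure or no attributes remain; each root-to-leaf path then reads as an interpretable rule.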

Analisis Sentimen pada Ulasan Aplikasi JakLingko Menggunakan Metode Naïve Bayes (Sentiment Analysis of JakLingko App Reviews Using the Naïve Bayes Method)

Ricardus Mba Dala Pati; Eka Kusuma Pratama; Tuslaela Tuslaela

Repeater : Publikasi Teknik Informatika dan Jaringan• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

JakLingko is a digital-based public transportation integration system developed to facilitate access to various transportation modes in Jakarta. Along with the increasing number of users, reviews on the JakLingko application reflect user experiences and perceptions. This study aims to analyze the sentiment of user reviews on the Google Play Store using the Naïve Bayes method. Data collection was conducted through web scraping, resulting in 3,260 reviews. The data were preprocessed, sentiment-labeled, and classified using Orange Data Mining. The research applied a quantitative experimental approach with a machine learning framework. The classification results showed that neutral sentiment dominated user reviews, followed by negative and positive sentiments. The Naïve Bayes model achieved 100% accuracy based on the confusion matrix and other evaluation metrics such as precision, recall, and F1-score. The findings highlight that Naïve Bayes can be a reliable approach for analyzing public opinion and serve as a reference for evaluating and improving digital service applications.

https://doi.org/10.62951/repeater.v3i4.638
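
The study ran Naïve Bayes inside Orange Data Mining. As a rough stand-in for what such a text classifier computes, here is a multinomial Naive Bayes over whitespace tokens with Laplace smoothing. The tokenization, the toy reviews, and the function names are assumptions for illustration, not the authors' preprocessing pipeline.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count documents per class and word occurrences per class."""
    priors = Counter(labels)
    word_counts = {c: Counter() for c in priors}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.lower().split())
    vocab = set().union(*word_counts.values())
    return priors, word_counts, vocab, alpha

def classify(model, doc):
    """Return the sentiment label with the highest log posterior."""
    priors, word_counts, vocab, alpha = model
    n_docs = sum(priors.values())
    scores = {}
    for c in priors:
        total = sum(word_counts[c].values())
        score = math.log(priors[c] / n_docs)
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical labelled reviews
docs = ["app is great and fast", "very helpful app", "app keeps crashing",
        "login error again", "topped up card today", "checked the route map"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = train_multinomial_nb(docs, labels)
print(classify(model, "app crashing error"))  # negative
```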

Predicting First-Year Student Performance with SMOTE-Enhanced Stacking Ensemble and Association Rule Mining for University Success Profiling

Kikunda, Philippe Boribo; Kasongo, Issa Tasho; Nsabimana, Thierry; Ndikumagenge, Jérémie; Ndayisaba, Longin +2 more

Journal of Computing Theories and Applications• 2025 •Universitas Dian Nuswantoro

This study examines the application of Educational Data Mining (EDM) to predict the academic performance of first-year students at the Catholic University of Bukavu and the Higher Institute of Education (ISP) in the Democratic Republic of Congo. The primary objective is to develop a model that can identify at-risk students early, providing the university with a tool to enhance student support and academic guidance. To address the challenges posed by data imbalance (where successful cases outnumber failures), the study adopts a hybrid methodological approach. First, the SMOTE algorithm was applied to balance the dataset. Then, a stacking classification model was developed to combine the predictive power of multiple algorithms. The variables used for prediction include the National Exam score (PEx), the secondary school track (Humanities), and the type of prior institution (public, private, or religious-affiliated schools), as well as age and sex. The results demonstrate that this approach is highly effective. The model is not only capable of predicting success or failure but also of forecasting students' performance levels (e.g., honors or distinctions). Moreover, the use of the Apriori association rule mining algorithm allowed the identification of faculty-specific success profiles, transforming prediction into an interpretable decision-support tool. This research makes several significant contributions. Practically, it provides the University of Bukavu with a tool for student orientation and early risk detection. Methodologically, it illustrates the effectiveness of a combined approach to EDM in an African context. However, the study acknowledges certain limitations, including the non-public nature of the data and the geographical specificity of the sample. It therefore proposes avenues for future research, such as the integration of Explainable AI (XAI) techniques for more refined and transparent analysis of the results.

https://doi.org/10.62411/jcta.14043
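
SMOTE balances classes by creating synthetic minority samples at random points on the line segments between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a simplified, numeric-only variant; categorical inputs such as school track would need encoding or a variant like SMOTE-NC. The feature values and the `smote` helper are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a random minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class (failing) students: (national exam score, age)
minority = [(55.0, 18.0), (58.0, 19.0), (60.0, 18.0), (52.0, 20.0)]
new_points = smote(minority, n_synthetic=6, k=2)
balanced_minority = minority + new_points
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.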

Penggunaan Metode Rough Set untuk Menentukan Tingkat Kesiapan Siswa dalam Menghadapi ANBK di SMP Negeri 2 Kuala (Using the Rough Set Method to Determine Students' Readiness for ANBK at SMP Negeri 2 Kuala)

Harninda Br Keliat; Novriyenni Novriyenni; Tio Ria Pasaribu

Repeater : Publikasi Teknik Informatika dan Jaringan• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The Computer-Based National Assessment (ANBK) is an essential instrument designed to comprehensively measure student competence, including literacy, numeracy, and character aspects. However, in practice, many students still face various challenges during preparation, such as cognitive limitations, psychological readiness, and technical barriers, which affect their overall readiness to participate in ANBK. This study aims to analyze the readiness level of students at SMP Negeri 2 Kuala by employing the Rough Set method. The variables examined include digital literacy, subject matter understanding, psychological readiness, and school facility support. Data were collected from 250 ninth-grade students through structured questionnaires and subsequently processed using the Rosetta software to perform attribute reduction and generate decision rules. The findings indicate that digital literacy, subject matter understanding, and psychological readiness are the most influential variables in determining student readiness, while facility support serves only as a complementary factor. The extraction process generated seven decision rules with an accuracy level of 100%, which effectively classified students into three readiness categories: highly ready, ready, and less ready. These results confirm that the Rough Set method is highly effective for identifying dominant factors and producing decision rules that can guide schools in developing targeted strategies to enhance student readiness for ANBK.

https://doi.org/10.62951/repeater.v3i3.619
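
Rough set attribute reduction rests on indiscernibility classes (objects with identical values on the chosen attributes) and the dependency degree, the share of objects whose decision is fully determined by those attributes. A minimal sketch with a hypothetical five-student decision table; Rosetta's own reduct and rule-generation algorithms are not reproduced here.

```python
from collections import defaultdict

def indiscernibility_classes(table, attrs):
    """Group object ids that share identical values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj_id)
    return list(classes.values())

def dependency(table, attrs, decision):
    """Rough set dependency degree: |positive region| / |universe|."""
    decision_classes = defaultdict(set)
    for obj_id, row in table.items():
        decision_classes[row[decision]].add(obj_id)
    positive = set()
    for block in indiscernibility_classes(table, attrs):
        # A block is in the positive region if it lies inside a single decision class
        if any(block <= target for target in decision_classes.values()):
            positive |= block
    return len(positive) / len(table)

# Hypothetical decision table
table = {
    1: {"literacy": "high", "understanding": "high", "facility": "good", "readiness": "highly ready"},
    2: {"literacy": "high", "understanding": "high", "facility": "poor", "readiness": "highly ready"},
    3: {"literacy": "low", "understanding": "high", "facility": "good", "readiness": "ready"},
    4: {"literacy": "low", "understanding": "low", "facility": "good", "readiness": "less ready"},
    5: {"literacy": "low", "understanding": "low", "facility": "poor", "readiness": "less ready"},
}
print(dependency(table, ["literacy", "understanding"], "readiness"))  # 1.0
print(dependency(table, ["facility"], "readiness"))  # 0.0
```

A reduct is a minimal attribute subset with the same dependency degree as the full attribute set; in this toy table, {literacy, understanding} already reaches 1.0 while neither attribute alone does, so facility can be dropped.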

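Pemanfaatan Data Mining untuk Klasifikasi Penyakit Daun pada Tebu dan Cara Pencegahan Penyakit dengan Metode Algoritma K-Nearest Neighbors (Using Data Mining to Classify Sugarcane Leaf Diseases and Their Prevention with the K-Nearest Neighbors Algorithm)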

Muhammad Akmal Ar Rasid; Catur Pranomo; Elkin Rilvani

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi• 2025 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to utilize data mining techniques, specifically the K-Nearest Neighbors (KNN) algorithm, to classify leaf diseases in sugarcane (Saccharum officinarum). Early and accurate detection of leaf disease types is a crucial step in prevention and control strategies, thereby reducing potential crop losses caused by pathogen attacks. Leaf diseases in sugarcane, such as leaf scald, rust, and mosaic virus, are known to affect photosynthesis, inhibit growth, and reduce the quality and quantity of sugarcane produced. The classification process in this study was carried out through image analysis of infected sugarcane leaves, where features such as color, texture, and shape were extracted using digital image processing techniques. The KNN algorithm was chosen because of its non-parametric nature, ease of implementation, and its ability to provide accurate classification results even with limited data size. The working principle of KNN is to determine the class of a new sample based on the majority class of its k nearest neighbors in the feature space, making it very suitable for the case of leaf disease image classification. In addition to building a classification model, this study also examines disease prevention strategies based on the identification results. These strategies include the use of disease-resistant sugarcane varieties, the implementation of appropriate planting patterns, land moisture management, regular plantation sanitation, and the measured and environmentally friendly use of pesticides or fungicides. Model performance evaluation was conducted using accuracy, precision, recall, and F1-score metrics to assess model effectiveness across various data scenarios. The results of this study are expected to not only contribute to the development of decision support systems for farmers and related parties but also support the application of artificial intelligence-based technology in the agricultural sector.

https://doi.org/10.62951/bridge.v3i3.580
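
The KNN rule described above (assign the majority class among the k nearest samples in feature space) fits in a few lines. The three-number feature vectors stand in for extracted color, texture, and shape descriptors; they and the labels are hypothetical, and the features are assumed to be scaled to comparable ranges before distances are computed.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders ties by first occurrence, i.e. the closer neighbour's class
    return votes.most_common(1)[0][0]

# Hypothetical (hue, texture contrast, lesion-area ratio) vectors
train = [
    ((0.10, 0.20, 0.05), "healthy"), ((0.12, 0.22, 0.04), "healthy"),
    ((0.60, 0.70, 0.40), "rust"), ((0.62, 0.68, 0.45), "rust"),
    ((0.30, 0.90, 0.80), "leaf scald"), ((0.28, 0.88, 0.82), "leaf scald"),
]
print(knn_predict(train, (0.61, 0.69, 0.42)))  # rust
```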

IoT-Based Home Electricity Monitoring and Consumption Forecasting using k-NN Regression for Efficient Energy Management

Angdresey, Apriandy; Sitanayah, Lanny; Rumpesak, Zefanya Marieke Philia; Ooi, Jing-Quan

Journal of Computing Theories and Applications• 2025 •Universitas Dian Nuswantoro

Electricity has emerged as an essential requirement in modern life. As demand escalates, electricity costs rise, making wastefulness a drain on financial resources. Forecasting electricity usage can therefore help users manage their consumption. This study presents an IoT-based monitoring and forecasting system for electricity consumption. The system comprises two NodeMCU microcontrollers, a PZEM-004T sensor for collecting real-time power data, and three relays that regulate the current flow to three distinct electrical appliances. The data gathered is transmitted to a web application that uses the k-Nearest Neighbor (k-NN) algorithm to forecast future electricity usage based on historical patterns. We evaluated the system's performance using four weeks of electricity consumption data. The results indicated that predictions were most accurate when the user's daily consumption pattern remained stable, achieving a Mean Absolute Error (MAE) of approximately 1 watt and a Mean Absolute Percentage Error (MAPE) ranging from 1% to 1.7%. Additionally, predictions were notably precise during the early morning hours (3:00 AM to 8:00 AM) when k=6 was employed. This study demonstrates the effectiveness of integrating IoT-based systems with machine learning for real-time energy monitoring and forecasting. Furthermore, it emphasizes the application of data mining techniques within embedded IoT environments, providing valuable insights into the implementation of lightweight machine learning for smart energy systems.

https://doi.org/10.62411/jcta.13602
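
For forecasting, k-NN acts as a regressor: the prediction is the mean consumption of the k most similar historical records, and accuracy is summarized with MAE and MAPE. A minimal sketch, assuming a feature vector of (hour of day, day of week); the study's actual features are not given in the abstract, and the readings below are hypothetical. The default k=6 follows the value the abstract reports for the early-morning window.

```python
import math

def knn_regress(history, query, k=6):
    """Predict the mean target of the k historical feature vectors nearest to query.
    Note: hour is treated as a plain number here, ignoring the 23 -> 0 wraparound."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

def mae(actual, predicted):
    """Mean absolute error, in the same unit as the data (watts here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical history: ((hour, day_of_week), watts)
history = [((3, 0), 100.0), ((3, 1), 102.0), ((5, 0), 110.0), ((20, 0), 300.0)]
print(knn_regress(history, (3, 0), k=2))  # 101.0
print(mae([100.0, 200.0], [101.0, 198.0]))  # 1.5
```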

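Korelasi Model Pembelajaran Terhadap Peningkatan Prestasi Belajar Siswa SMA Negeri 1 Kuala Menggunakan Metode Apriori (Correlation Between Learning Models and Improved Student Achievement at SMA Negeri 1 Kuala Using the Apriori Method)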

Ame Ananda Br Ginting; Novriyenni Novriyenni; Tio Ria Pasaribu

Repeater : Publikasi Teknik Informatika dan Jaringan• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to analyze the correlation between learning models and student achievement at SMA Negeri 1 Kuala by applying the Apriori algorithm in data mining, using RapidMiner software as the primary tool for analysis. The research is motivated by the shift in educational approaches from conventional teacher-centered methods toward more innovative strategies such as project-based learning and cooperative learning, which are expected to foster higher levels of student engagement and improve academic outcomes. In many schools, particularly at the secondary level, the choice of learning model, availability of facilities, and attendance rates are crucial factors that shape learning effectiveness and student performance. The data collected in this study include student grades, the types of learning models implemented, school facility conditions, and attendance rates for the 2023/2024 academic year, covering a total of 680 students. The Apriori algorithm was employed to discover hidden patterns and associations among these variables, enabling the identification of relationships between learning factors and academic achievement. Using RapidMiner, the research systematically generated association rules that reflect meaningful correlations in the dataset. The results indicated that, for the Indonesian language subject, the combination of a cooperative learning model, adequate and complete school facilities, and good student attendance was strongly associated with attaining an A grade. This finding was supported by a support level of 53.33% and a confidence level of 100%, suggesting a robust and reliable relationship between these factors. Applying data mining techniques through RapidMiner not only allowed for efficient data processing but also provided practical recommendations for educators and school administrators in designing effective instructional strategies.

https://doi.org/10.62951/repeater.v3i3.616
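
The two figures the abstract reports are standard association-rule measures: support(A ∪ B) is the share of records containing every item, and confidence(A -> B) = support(A ∪ B) / support(A). A compact Apriori sketch follows, assuming each student record is encoded as a set of attribute=value items; the four records are hypothetical, and RapidMiner's implementation details are not reproduced.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    singletons = {frozenset([item]) for basket in baskets for item in basket}
    level = {s for s in singletons if support(s) >= min_support}
    frequent, size = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        size += 1
        # Join: merge pairs of frequent itemsets into candidates one item larger
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every (size-1)-subset must already be frequent, then check support
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, size - 1))
                 and support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A | B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical student records as attribute=value items
transactions = [
    {"model=cooperative", "facility=complete", "attendance=good", "grade=A"},
    {"model=cooperative", "facility=complete", "attendance=good", "grade=A"},
    {"model=project", "facility=complete", "attendance=good", "grade=A"},
    {"model=conventional", "facility=limited", "attendance=poor", "grade=B"},
]
frequent = apriori(transactions, min_support=0.5)
rule_conf = confidence(frequent, {"model=cooperative", "facility=complete", "attendance=good"}, {"grade=A"})
print(rule_conf)  # 1.0
```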

Penerapan Metode C4.5 dan K-Nearest Neighbor untuk Klasifikasi Kelulusan Mahasiswa Berdasarkan Data Akademik (Applying C4.5 and K-Nearest Neighbor to Classify Student Graduation Based on Academic Data)

Dina Amalia Putri; Naza Sefti Prianita; Elkin Rilvani

Jupiter: Publikasi Ilmu Keteknikan Industri, Teknik Elektro dan Informatika• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

Students' time to graduation is an important indicator of the quality and effectiveness of higher education at universities. The on-time graduation rate not only affects institutional accreditation but is also a concern for campus management in designing learning strategies and academic guidance. This study applies and compares two data mining classification algorithms, C4.5 and K-Nearest Neighbor (KNN), in predicting whether students graduate on time. Predictions are based on academic attributes used as input variables: Grade Point Average (GPA), number of credits earned, and Semester Grade Point Average (IPS). The study follows the Knowledge Discovery in Databases (KDD) process, which includes data selection, preprocessing, transformation, data mining, and evaluation of results. The study was conducted using RapidMiner, with a dataset of 279 Informatics Study Program students from the 2015 to 2019 intakes. The data were classified into two categories: "graduated on time" and "not graduated on time". The test results showed that KNN performed better than C4.5. KNN produced an accuracy of 76.08%, with a precision of 73.11% and a recall of 41.92%. Meanwhile, C4.5 produced an accuracy of 73.49%, with a precision of 64.62% and a recall of 41.89%. This difference in accuracy indicates that KNN captured patterns in the data more effectively and gave more accurate predictions in this context. Thus, KNN can be considered the more suitable method to help universities predict which students will graduate on time, enabling early intervention for students at risk of late graduation. This research also contributes to the development of data mining-based academic decision support systems in higher education.

https://doi.org/10.61132/jupiter.v3i4.1032
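
Accuracy counts correct predictions for both classes, whereas precision and recall are computed for one designated positive class, which is how a model can reach 76.08% accuracy with only 41.92% recall. A minimal sketch computing all four metrics from predictions; the abstract does not say which class was treated as positive, so the `positive` argument and the toy labels are assumptions.

```python
def binary_metrics(actual, predicted, positive):
    """Accuracy, precision, recall and F1 for one designated positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical labels with "late" as the positive class
actual = ["late", "late", "on time", "on time", "late"]
predicted = ["late", "on time", "on time", "on time", "late"]
acc, prec, rec, f1 = binary_metrics(actual, predicted, positive="late")
```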

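Penggunaan Algoritma K-Means dalam Pengelompokan Pasien Diabetes Mellitus Berdasarkan Parameter Klinis di Puskesmas Brebes (Using the K-Means Algorithm to Cluster Diabetes Mellitus Patients Based on Clinical Parameters at Puskesmas Brebes)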

Feronika, Fadia; Ariesanto Ramdhan, Nur; Mohamad Herdian Bhakti, Raden

Jurnal Elektronika dan Komputer• 2025 •STEKOM PRESS

Diabetes mellitus is a chronic disease whose number of sufferers grows every year, including in the area served by Puskesmas Brebes (a community health center). The large number of patients with diverse clinical conditions calls for a method of grouping patients by severity. This study aims to apply the K-Means algorithm to cluster diabetes mellitus patients using several clinical parameters: fasting blood glucose (Gula Darah Puasa, GDP), HbA1c level, total cholesterol (CHOL), and systolic and diastolic blood pressure. The study uses a descriptive quantitative approach with a data mining method based on the K-Means algorithm. The data were obtained from medical records at Puskesmas Brebes. The clustering process produced three groups: low, medium, and high risk. The results show that the K-Means algorithm was able to group patient data accurately according to severity. The results were then visualized through a web-based system intended to make it easier for the health center to analyze patient conditions and to support more effective medical decision-making.

https://doi.org/10.51903/elkom.v18i1.2947
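
A minimal sketch of K-Means (Lloyd's algorithm) on five clinical parameters per patient. For a reproducible example it seeds centroids with farthest-first selection, a deterministic simplification rather than the random or k-means++ seeding most libraries use; the patient values are hypothetical. Features on very different scales are usually standardized first; the abstract does not say whether that was done.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: assign points to the nearest centroid, move each centroid
    to the mean of its cluster, and stop once the centroids no longer move."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical patients: (fasting glucose, HbA1c, cholesterol, systolic, diastolic)
low = [(95, 5.6, 180, 115, 75), (100, 5.8, 170, 120, 78)]
mid = [(150, 7.5, 210, 135, 85), (160, 7.8, 220, 138, 88)]
high = [(250, 10.5, 260, 160, 100), (270, 11.0, 280, 165, 102)]
centroids, clusters = kmeans(low + mid + high, k=3)
```

Labeling the resulting clusters as low, medium, or high risk is a separate interpretive step, for example ordering centroids by fasting glucose or HbA1c; the abstract does not say which rule the authors applied.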

Pemetaan Wilayah Rawan Kecelakaan Lalu Lintas di Kabupaten Brebes Menggunakan Algoritma K-Means (Mapping Traffic Accident-Prone Areas in Brebes Regency Using the K-Means Algorithm)

Agung Permana, Tegar; Saeful Bachri, Otong; Herdian Bhakti, RM

Jurnal Elektronika dan Komputer• 2025 •STEKOM PRESS

Traffic accidents in Brebes Regency are a critical problem because of the high frequency of incidents in the area. This study aims to identify accident-prone areas using the K-Means clustering algorithm, supporting data-driven decision-making. The main issue explored is how the K-Means algorithm can be implemented to group accident-prone zones and raise public awareness of road safety. The methodology comprises data collection through literature review, direct observation, and interviews, followed by applying the K-Means algorithm to classify accident data by number of incidents, fatalities, and injuries. The findings show that the K-Means algorithm effectively groups accident-prone locations into three distinct risk levels: high, medium, and low. This classified information can help the relevant authorities improve traffic safety measures and educate the public about high-risk areas. The results are expected to contribute to more informed and strategic traffic safety policies in Brebes Regency.

https://doi.org/10.51903/elkom.v18i1.2929

Analisis Klasifikasi Risiko Dropout Mahasiswa Menggunakan Algoritma Decision Tree dan Random Forest (Classification Analysis of Student Dropout Risk Using Decision Tree and Random Forest Algorithms)

Abdah Syakiroh Gustian; Fathoni Mahardika

Jupiter: Publikasi Ilmu Keteknikan Industri, Teknik Elektro dan Informatika• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

This study aims to develop an accurate predictive model for identifying students at risk of academic dropout using Decision Tree and Random Forest algorithms. The research utilizes a publicly available dataset sourced from Kaggle, which includes academic and demographic features such as GPA, attendance, credit load, financial aid status, and exam scores. The methodology involves several stages: data collection, preprocessing (handling missing values, encoding categorical variables, and feature scaling), model training, and evaluation using performance metrics such as Accuracy, Precision, Recall, F1-Score, and Confusion Matrix. Results show that the Random Forest algorithm outperforms Decision Tree in terms of accuracy and robustness, with notable feature importance on math, reading, and writing scores. The findings highlight the potential of machine learning in early detection of dropout risks and provide actionable insights for academic institutions to design timely interventions. This research contributes to the growing field of educational data mining and supports data-driven decision-making processes in higher education management.

https://doi.org/10.61132/jupiter.v3i4.980

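Analisis Perbandingan Algoritma Random Forest dan Algoritma Naive Bayes untuk Memprediksi Penyakit Paru-Paru di Indonesia (Comparative Analysis of the Random Forest and Naive Bayes Algorithms for Predicting Lung Disease in Indonesia)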

Eka Wulansari Fidayanthie; Asep Sayfulloh; Mardiana Rafa Alzena; Nilam Kurnia Sari

Saturnus: Jurnal Teknologi dan Sistem Informasi• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The lungs are vital organs of the human respiratory system, responsible for meeting the body's oxygen needs. Health problems in the lungs can have adverse effects on the respiratory system. Lung diseases are commonly caused by inhaling air contaminated by dust, smoke, viruses, and bacteria. This study aims to compare the performance of two classification algorithms, Random Forest and Naive Bayes, in predicting lung diseases. The data were obtained from the Kaggle website and processed using RapidMiner software. The attributes involved include smoking habits, pre-existing conditions, staying up late, exercise activities, age, and outcomes. Based on the test results, the Random Forest algorithm demonstrated the best performance with an accuracy of 93%, while the Naive Bayes algorithm achieved an accuracy of 87%. These findings indicate that the Random Forest algorithm outperforms the Naive Bayes algorithm in terms of lung disease prediction accuracy.

https://doi.org/10.61132/saturnus.v3i3.956

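Pengelompokan Tindak Kejahatan Berdasarkan Tempat Kejadian Perkara di Kota Binjai Menggunakan Metode Clustering: Studi kasus: Polres Binjai (Clustering Crimes by Crime Scene Location in Binjai City Using a Clustering Method: A Case Study of Polres Binjai)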

Herdina Putri Ahmadi; Magdalena Simanjuntak; Muammar Khadapi

Saturnus: Jurnal Teknologi dan Sistem Informasi• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Crime is a social issue that continues to evolve alongside increasing community activity and regional development. This study aims to cluster crime data in Binjai City based on the location of incidents using the K-Means algorithm and the Cross Industry Standard Process for Data Mining (CRISP-DM) approach. The data were obtained from the Binjai Police Department, with attributes including the type of crime, time of occurrence, and location, categorized by district. A comprehensive data preprocessing stage was carried out, involving the extraction of information from raw data, normalization of crime type labels, and conversion of categorical data into numerical form using label encoding. The optimal number of clusters was determined using the Silhouette score method, which yielded the best result at K = 10. The clustering results were further evaluated using the Davies-Bouldin Index (DBI) to ensure cluster quality. The analysis revealed that Binjai Utara District has the highest number of crimes, particularly aggravated theft (curat), which frequently occurs from early morning to late morning. This clustering is expected to provide valuable insights for authorities in formulating more targeted and data-driven regional security strategies.

https://doi.org/10.61132/saturnus.v3i3.933
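
The silhouette score used to choose K compares, for each point, its mean distance a to the other members of its own cluster with its mean distance b to the nearest other cluster: s = (b - a) / max(a, b), averaged over all points. A minimal sketch; it is typically computed for each candidate K, keeping the K with the highest mean. The one-dimensional points below are hypothetical stand-ins for label-encoded crime records.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 1-D points: a sensible and a poor two-cluster labelling
points = [(0,), (1,), (10,), (11,)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```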

Penerapan K-Means Clustering untuk Segmentasi Pelanggan (Applying K-Means Clustering for Customer Segmentation)

Fathoni Dwi Atmoko

Uranus: Jurnal Ilmiah Teknik Elektro, Sains dan Informatika• 2025 •Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Public transportation, with Transjakarta as its main pillar, requires a deep understanding of customer behavior to improve service quality and maintain loyalty. This study aims to segment Transjakarta customers using data mining techniques, specifically the K-Means Clustering algorithm, based on the RFM (Recency, Frequency, Monetary/Value) behavioral model. A total of 37,900 rows of raw transaction data were processed into a clean database, yielding 1,917 unique customers for analysis. The RFM metrics were then normalized using a Min-Max Scaler. The optimal number of clusters was evaluated using the Elbow Curve and Silhouette Score methods, which led to the choice of k = 4 clusters. The segmentation results identified four customer groups requiring specific strategies: Cluster 3 (Champions) with high R, F, and V (requiring rewards and retention); Cluster 0 (Active, Low Value) with high R and F but low V (requiring upselling and cross-selling); Cluster 1 (Potential/At-Risk); and Cluster 2 (Dormant/Lost). Exploratory data analysis (EDA) showed that nearly half of customers (49.3%) used Bank DKI cards, that customers were dominated by the productive age group (25–45 years old), and that the Rusun Kapuk Muara–Penjaringan route was the busiest. The main managerial recommendation is to strengthen the partnership with Bank DKI and optimize services in this busy corridor.

https://doi.org/10.61132/uranus.v3i2.1214
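
Before clustering, each customer's transactions are reduced to RFM values and scaled to [0, 1] with Min-Max normalization. A minimal sketch, assuming transactions arrive as (customer, date, amount) tuples; the customer IDs, dates, and amounts are hypothetical, and both helper names are illustrative. Raw recency in days is low for recent customers; a common convention is to invert the scaled value (for example 1 - r) so that a high R means recent, but the abstract does not say which convention was used.

```python
from datetime import date

def rfm_table(transactions, reference_date):
    """Per customer: Recency (days since last trip), Frequency (trips), Monetary (total spent)."""
    stats = {}
    for customer, day, amount in transactions:
        last, count, total = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), count + 1, total + amount)
    return {c: ((reference_date - last).days, count, total)
            for c, (last, count, total) in stats.items()}

def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs)) for row in rows]

# Hypothetical trips
transactions = [
    ("C1", date(2024, 1, 1), 3500.0),
    ("C1", date(2024, 1, 20), 3500.0),
    ("C2", date(2024, 1, 5), 3500.0),
]
rfm = rfm_table(transactions, reference_date=date(2024, 1, 31))
print(rfm)  # {'C1': (11, 2, 7000.0), 'C2': (26, 1, 3500.0)}
scaled = min_max_scale(list(rfm.values()))
print(scaled)  # [(0.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
```

The scaled rows can then be passed to any K-Means implementation, so that no single RFM dimension dominates the distance computation.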

Teknik dan aplikasi data mining di Indonesia: tinjauan literatur satu dekade (2015-2024) (Data mining techniques and applications in Indonesia: a one-decade literature review, 2015-2024)

Saputri, Eliana

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi• 2025 •Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

The importance of data mining in Indonesia is increasing along with the growth of big data in various strategic sectors. Data mining plays an important role in transforming complex data into useful information to support data-driven decision making, which is urgently needed in the face of competitive challenges and operational complexity. This research examines the development of data mining techniques and applications in Indonesia over the last decade (2015-2024). Using a systematic literature review approach, data were collected from academic publications in Scopus-indexed databases. Of the 95 papers initially found, further screening by accessibility, title, and abstract left 64 papers for review. The results show that techniques such as K-Means, Naive Bayes, and Decision Tree are the most commonly used. In the business sector, clustering through K-Means is widely applied for market segmentation and consumer pattern analysis. The healthcare sector mainly uses classification techniques, such as Naive Bayes and Decision Tree, for disease risk prediction and early diagnosis. Meanwhile, the education sector uses data mining to assess student performance and predict potential dropouts, helping institutions optimize learning strategies.

https://doi.org/10.24246/itexplore.v4i2.2025.pp138-149

Explainable Bayesian Network Recommender for Personalized University Program Selection

Kikunda, Philippe Boribo; Ndikumagenge, Jérémie; Ndayisaba, Longin; Nsabimana, Thierry

Journal of Computing Theories and Applications• 2025 •Universitas Dian Nuswantoro

In a context where students face increasingly complex academic choices, this work proposes a recommendation system based on Bayesian networks to guide new baccalaureate holders in choosing a university program. Using a dataset containing variables such as secondary school section, gender, type of school, percentage obtained, age, and first-year honors, we constructed a probabilistic model that captures the dependencies between these characteristics and the option chosen. The data were collected at the Catholic University of Bukavu, the Official University of Bukavu, and the Higher Institute of Education of Bukavu, preprocessed, and then used to learn the network structure via the hill-climbing algorithm with the BIC score, using R's bnlearn package. The model estimates the probability that a candidate will choose a given stream, depending on their profile. The approach was validated using BIC, cross-validation, and bootstrap, and offers a good compromise between interpretability and predictive performance. The results highlight the potential of Bayesian networks for building explainable recommendation systems in academic guidance. The system produces orientation probability maps for each candidate, which enrollment service advisers can use, as well as an ordered list of options relevant to the candidate's profile. With strong performance on a test sample (precision@k = 0.85, recall@k = 0.61, NDCG = 0.8, and MAP = 0.88), it constitutes an effective lever for reducing the risk of students being misdirected at universities in South Kivu, in the Democratic Republic of the Congo.
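The ranking metrics reported above can be illustrated with a short Python sketch. The abstract does not define its exact variants, so this assumes binary relevance (an option either is or is not relevant to a candidate), the common log2 position discount for NDCG, and AP@k normalized by min(number of relevant options, k); the programme names and candidate lists are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Run = Tuple[List[str], Set[str]]  # (ranked recommendations, relevant options)


def hits_in_top_k(ranked: Sequence[str], relevant: Set[str], k: int) -> int:
    return sum(1 for item in ranked[:k] if item in relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommendations that are relevant."""
    return hits_in_top_k(ranked, relevant, k) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant options that appear in the top k."""
    return hits_in_top_k(ranked, relevant, k) / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the list divided by the DCG of an ideal ordering."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average of precision@i over the positions i where a relevant item appears."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            total += hits / (pos + 1)
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def mean_average_precision(runs: Sequence[Run], k: int) -> float:
    """MAP: mean of AP@k over all candidates in the test sample."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Hypothetical ranked programme lists for two candidates (not data from the study).
runs = [
    (["Law", "Medicine", "Economics", "Computer Science"], {"Medicine", "Computer Science"}),
    (["Medicine", "Law", "Economics"], {"Medicine"}),
]
ranked, relevant = runs[0]
print(precision_at_k(ranked, relevant, 3))  # 0.3333333333333333
print(recall_at_k(ranked, relevant, 3))     # 0.5
print(mean_average_precision(runs, 3))      # 0.625
```

NDCG and MAP both reward placing relevant options near the top of the list, whereas precision@k and recall@k only count whether they appear within the cutoff.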

https://doi.org/10.62411/jcta.12720

Formulating Tracing and Linking Algorithm For Identifying Cyber Terrorism Website

Wan Ahmad Ramzi W.Y; Aznizah Ab.Karim; Mohamad Shaufi Kambaruddin; Afzanizam Alias; Mohd Faizal Yahaya

International Journal of Mechanical, Electrical and Civil Engineering• 2025 •Asosiasi Riset Ilmu Teknik Indonesia

The growing number of cyber terrorists around the world who use the Internet as a medium is a serious concern. Although cyber terrorists publish their posts openly and publicly, their main communication channels are difficult to recognize, because they may hide their agenda behind sympathetic-looking websites. This project therefore aims to formulate a tracing and linking algorithm, using data mining techniques, to identify the relationships between cyber terrorism components that may point to a cyber terrorism website.

https://doi.org/10.61132/ijmecie.v2i2.266

Allergen Detection in Food Products Using the Support Vector Machines (SVM) Algorithm

Siska Narulita; Sekarlangit Sekarlangit; Milka Putri Novianingrum

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi• 2025 •Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

Food allergies are medical conditions caused by specific immunological reactions triggered by exposure to certain foods. Food allergies can occur in all age groups, although prevalence differs between children and adults, with children affected more frequently. Food ingredients or substances that can trigger allergies are known as allergens. This project aims to determine whether a food product contains allergens by applying the SVM data mining method to a public dataset of food products and allergens obtained from Kaggle. Benefits of the SVM method include high accuracy, efficient memory use, and the ability to handle data that are not normally distributed. The research process begins with data collection, followed by data preprocessing, which includes data transformation, handling missing values, and copying objects. Validation comes next: the study compares three schemes, namely split validation with 90% training and 10% testing data, 10-fold cross-validation, and split validation with an 80%–20% ratio. After validation, the SVM method is applied, and a confusion matrix is used for the final evaluation step. In this comparison, SVM achieved an accuracy of 97.24% with 10-fold cross-validation, 97.50% with the 90%–10% split, and 98.75% with the 80%–20% split.
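The three validation schemes compared above can be sketched in plain Python. This is an illustrative sketch, not the study's pipeline: the labels are synthetic, the `seed` parameter is an assumption (the abstract does not state how data were shuffled), and `majority_baseline` is a hypothetical placeholder marking where an SVM would be trained on the training indices and scored on the test indices.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def holdout_split(n: int, train_ratio: float, seed: int = 0) -> Split:
    """Split validation: shuffle once, then cut at train_ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_ratio)
    return idx[:cut], idx[cut:]


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Split]:
    """k-fold cross-validation: every index is used for testing exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def mean_score(splits: Sequence[Split],
               train_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Average the test score over all splits (a single split for holdout)."""
    scores = [train_and_score(train, test) for train, test in splits]
    return sum(scores) / len(scores)


# Toy labels standing in for "contains allergen" (1) or not (0); not the Kaggle data.
labels = [1 if i % 3 == 0 else 0 for i in range(400)]


def majority_baseline(train: List[int], test: List[int]) -> float:
    """Hypothetical stand-in for an SVM: predict the most common training label."""
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test) / len(test)


cv10 = mean_score(k_fold_splits(len(labels), 10), majority_baseline)
split_90_10 = mean_score([holdout_split(len(labels), 0.9)], majority_baseline)
split_80_20 = mean_score([holdout_split(len(labels), 0.8)], majority_baseline)
print(f"10-fold CV: {cv10:.4f}, 90/10: {split_90_10:.4f}, 80/20: {split_80_20:.4f}")
```

A single holdout split scores the model on one random partition, so its accuracy depends on which rows happen to land in the test set; 10-fold cross-validation averages over ten disjoint test folds. Differences of one or two percentage points between the schemes may therefore partly reflect the particular split rather than a genuine change in model quality.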

https://doi.org/10.62951/bridge.v3i1.393
