SciRepID - Scientific Publication Search

Implementasi Data Mining Prediksi Perminatan Jurusan Siswa pada SMK Negeri 1 Waikabubak dengan Metode Algoritma C4.5

Marten Sudi; Gergorius Kopong Pati; Lidia Lali Momo

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi • 2024 • Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Educational institutions admit new students at the start of every academic year, and the number of prospective students grows from year to year (Muwardah and Pramunendar, 2015). Admission takes place at each transition: from elementary to middle school, and from middle school to high school or vocational school. This research focuses on the registration of new students at a vocational high school (Sekolah Menengah Kejuruan, SMK). An SMK offers many majors, which leaves prospective students unsure which major is right for them, so choosing one can take a long time. The study uses C4.5 as its classification algorithm. C4.5 is a popular algorithm for building decision trees: it divides a dataset into smaller subsets based on attribute values, forming an easy-to-understand tree structure. The resulting decision tree gives a clear visualization of the decision-making process and of the variables that contribute to students' choices.

https://doi.org/10.61132/neptunus.v2i4.410
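
The abstract above describes C4.5 as splitting a dataset into subsets by attribute value. A minimal sketch of the criterion C4.5 uses to pick the splitting attribute (gain ratio) is given here for categorical attributes only. The student records and the attribute names `math`, `hobby`, and `major` are hypothetical illustrations, not data from the paper, and the sketch omits the handling of continuous attributes, missing values, and pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    total = len(rows)
    # Group the target labels by the value each row has for `attribute`.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(
        len(s) / total * log2(len(s) / total) for s in subsets.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Attribute with the highest gain ratio: the next split of the tree."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical records: here `math` separates the majors, `hobby` does not.
students = [
    {"math": "high", "hobby": "sports", "major": "software"},
    {"math": "high", "hobby": "music", "major": "software"},
    {"math": "low", "hobby": "sports", "major": "accounting"},
    {"math": "low", "hobby": "music", "major": "accounting"},
]

print(best_attribute(students, ["math", "hobby"], "major"))
```

A full tree would be built by applying `best_attribute` recursively to each subset until the labels in a subset are pure or no attributes remain.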


Penerapan Algoritma C4.5 Dalam Klasifikasi Prestasi Atlet: Studi Kasus Pada Daftar Nama Penerima Penghargaan Tahun 2023

Andi Diah Kuswanto; Hotman Nicolas Badjo; Septian Kharist; Muhammad Zayyid Mubarok; Riski Saputra +1 more

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi • 2024 • Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to apply the C4.5 algorithm in classifying athlete performance based on the 2023 award recipient list. The C4.5 algorithm was chosen for its ability to construct decision trees that can identify patterns and characteristics distinguishing high-performing athletes. The data used in this study includes various attributes such as gender, age, sport, number of medals, and level of competition participation. The results show that the C4.5 algorithm can classify athletes with high accuracy. The resulting decision tree provides valuable insights into the key factors contributing to athlete performance. The implementation of this algorithm is expected to assist sports organizations in more effectively identifying and developing potential talents.    

https://doi.org/10.62951/bridge.v2i3.115


Penerapan Algoritma C4.5 dalam Klasifikasi Prestasi Atlet: Studi Kasus pada Daftar Nama Penerima Penghargaan Tahun 2023

Andi Diah Kuswanto; Hotman Nicolas Badjo; Septian Kharist; Muhammad Zayyid Mubarok; Riski Saputra +1 more

Modem : Jurnal Informatika dan Sains Teknologi • 2024 • Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to apply the C4.5 algorithm in classifying athlete performance based on the 2023 award recipient list. The C4.5 algorithm was chosen for its ability to construct decision trees that can identify patterns and characteristics distinguishing high-performing athletes. The data used in this study includes various attributes such as gender, age, sport, number of medals, and level of competition participation. The results show that the C4.5 algorithm can classify athletes with high accuracy. The resulting decision tree provides valuable insights into the key factors contributing to athlete performance. The implementation of this algorithm is expected to assist sports organizations in more effectively identifying and developing potential talents.

https://doi.org/10.62951/modem.v2i3.110


Effects of Data Resampling on Predicting Customer Churn via a Comparative Tree-based Random Forest and XGBoost

Ako, Rita Erhovwo; Aghware, Fidelis Obukohwo; Okpor, Margaret Dumebi; Akazue, Maureen Ifeanyi; Yoro, Rume Elizabeth +7 more

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Customer attrition has become a focus for many businesses, since online marketplaces offer customers many choices and alternatives in goods, services, and products. Businesses must improve value, meet customers' pressing demands, strengthen their customer retention strategies, and monetize better. The study compares the effects of data resampling schemes on customer churn prediction for two tree-based ensembles, Random Forest (RF) and XGBoost. The resampling schemes used are: (a) default mode, (b) random under-sampling (RUS), (c) synthetic minority oversampling technique (SMOTE), and (d) SMOTE-edited nearest neighbor (SMOTEEN). Both ensembles were trained with chi-square feature selection and then assessed on their performance. The results show that RF achieved F1 0.9898, Accuracy 0.9973, Precision 0.9457, and Recall 0.9698 for the default, RUS, SMOTE, and SMOTEEN resampling, respectively. XGBoost outperformed RF with F1 0.9945, Accuracy 0.9984, Precision 0.9616, and Recall 0.9890 for the default, RUS, SMOTE, and SMOTEEN, respectively. The study finds that SMOTEEN resampling outperforms the other schemes, and it attributes XGBoost's enhanced performance to hyper-parameter tuning of its decision trees. Recency-frequency-monetization retention strategies were used and found to curb churn and improve monetization policies, helping business managers stay ahead of customer churn.

https://doi.org/10.62411/jcta.10562
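
Of the resampling schemes named in the abstract above, random under-sampling (RUS) is the simplest to illustrate. The sketch below is a plain-Python illustration of the general technique, not the authors' pipeline. The function name `random_under_sample`, the `seed` parameter, and the toy churn labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    # Bucket the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Every class is reduced to the size of the rarest class.
    target = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        for x in rng.sample(items, target):  # sampling without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced data: 8 retained customers (0), 2 churners (1).
customers = list(range(10))
churned = [0] * 8 + [1] * 2

balanced_x, balanced_y = random_under_sample(customers, churned)
print(Counter(balanced_y))
```

Under-sampling discards majority-class records, which is why oversampling schemes such as SMOTE are often compared against it. Resampling is normally applied to the training split only, so that evaluation still reflects the original class balance.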


Integrating Structural Causal Model Ontologies with LIME for Fair Machine Learning Explanations in Educational Admissions

Igoche, Bern Igoche; Matthew, Olumuyiwa; Bednar, Peter; Gegov, Alexander

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

This study employed knowledge discovery in databases (KDD) to extract and discover knowledge from the Benue State Polytechnic (Benpoly) admission database and used a structural causal model (SCM) ontological framework to represent the admission process in the Nigerian polytechnic education system. The SCM ontology identified important causal relations in features needed to model the admission process and was validated using the conditional independence test (CIT) criteria. The SCM ontology was further employed to identify and constrain input features causing bias in the local interpretable model-agnostic explanations (LIME) framework applied to machine learning (ML) black-box predictions. The ablation process produced more stable LIME explanations devoid of fairness bias compared to LIME without ablation, with higher prediction accuracy (91% vs. 89%) and F1 scores (95% vs. 94%). The study also compared the performance of different ML models, including Gaussian Naïve Bayes, Decision Trees, and Logistic Regression, before and after ablation. The limitation is that the SCM ontology is qualitative and context-specific, so the fair-LIME framework can only be extrapolated to similar contexts. Future work could compare other explanation frameworks like Shapley on the same dataset. Overall, this study demonstrates a novel approach to enforcing fairness in ML explanations by integrating qualitative SCM ontologies with quantitative ML/LIME methods.

https://doi.org/10.62411/jcta.10501


A Comparative Analysis of Machine Learning Models for Predictive Analytics in Finance

Jose Miguel Reyes; Lea Patricia Santos; Antonino Perez

International Journal of Applied Mathematics and Computing • 2024 • Asosiasi Riset Ilmu Matematika dan Sains Indonesia

This paper compares various machine learning models in their ability to predict financial trends, with a focus on time-series analysis. We evaluate models such as linear regression, decision trees, support vector machines, and deep learning, measuring their performance based on accuracy, computational cost, and interpretability. Our results reveal that deep learning models offer superior accuracy but are less interpretable, while simpler models, though less accurate, provide better insight into the underlying data. This research provides guidelines for selecting suitable models based on specific financial applications.

https://doi.org/10.62951/ijamc.v1i1.3


Strategic Feature Selection for Enhanced Scorch Prediction in Flexible Polyurethane Form Manufacturing

43 Citations

Omoruwou, Felix; Ojugo, Arnold Adimabua; Ilodigwe, Solomon Ebuka

Journal of Computing Theories and Applications • 2024 • Universitas Dian Nuswantoro

Scorch during the production of flexible polyurethane is a significant problem: it reduces the resilience of foam products and jeopardizes their integrity. The likelihood of foam product failure can be decreased by optimizing production variables with machine learning algorithms that predict the occurrence of scorch. Because prevention is the best approach to this problem, predictive technology is worth investigating. Hence, machine learning algorithms were trained to predict the occurrence of scorch from the thermodynamic profile of polyurethane foam, which consists of recorded production variables. Several algorithms were trained and assessed, namely XGBoost, Decision Trees, Random Forest, K-nearest neighbors, Naive Bayes, Support Vector Machines, and Logistic Regression. The XGBoost ensemble performed best, with an accuracy of 98.3% (i.e., 0.983), followed by logistic regression, decision tree, random forest, K-nearest neighbors, and naïve Bayes, yielding a training accuracy of 88.1%, 66.7%, 84.2%, 87.5%, and 67.5% respectively. XGBoost was finally used to classify two distinct cases: occurrence and non-occurrence of scorch. The ensemble proved capable and effective at predicting the occurrence of scorch.

https://doi.org/10.62411/jcta.9539
