SciRepID - Scientific Publication Search

Publication Search

41,520 articles from 397 journals · 1,447 citations tracked



Siska Nar; Ahmad Nugroho; Ahmad Subhan Yazid; Helmi Wibowo; Alyauma Hajjah

Background: The development of industrial technology in the Industry 4.0 era has encouraged the implementation of intelligent monitoring systems to improve machine reliability and operational efficiency. However, machine fault diagnosis systems based on artificial intelligence often face limitations in terms of interpretability because the models used are complex and difficult to explain. Objective: This study aims to develop a deep learning-based industrial machine fault diagnosis system integrated with an Explainable Artificial Intelligence (XAI) approach to improve diagnostic accuracy while providing interpretable insights for users. Method: The research method involves collecting data from industrial machine sensors consisting of vibration signals, temperature measurements, and acoustic signals, followed by data preprocessing and feature extraction processes. The processed data are then used to train a deep learning-based diagnostic model, after which explainability methods such as SHAP or LIME are applied to analyze the contribution of each feature to the model’s prediction results. Model performance is evaluated using accuracy, precision, recall, and F1-score metrics. Results: The results indicate that the proposed deep learning model achieves better performance compared to conventional machine learning methods such as Support Vector Machine and Random Forest. Furthermore, the explainability analysis reveals that vibration amplitude, increases in machine component temperature, and anomalies in acoustic signals are the main factors influencing machine fault detection. Therefore, the proposed system not only improves the accuracy of machine fault diagnosis but also provides transparency in the decision-making process, thereby supporting the implementation of predictive maintenance in smart manufacturing environments.
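The four evaluation metrics named in this abstract can be illustrated with a minimal sketch for the binary case. The labels below are made-up toy data, not results from the study, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (1 = fault, 0 = healthy); purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For multi-class fault diagnosis these quantities are usually computed per class and then averaged; the abstract does not say which averaging scheme was used.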

Yustinus Liguori; I Wayan Sudiarsa; I Made Jagat Dita; I Gusti Ngurah Galih Jimbar Baskara; Pande Wisnu Wijaya Putra

Router : Jurnal Teknik Informatika dan Terapan 2025 Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

The rapid development of smartphone technology today creates challenges for consumers and manufacturers in determining an objective price range based on highly varied technical specifications. This study aims to implement the Random Forest algorithm in classifying smartphone price ranges into four main categories, namely low, mid-range, high, and flagship. The research method was carried out systematically through the stages of loading a dataset of 2,000 entries, exploratory data analysis (EDA) to ensure data integrity, and model training with a training and testing data split of 80:20. The results showed that the Random Forest model achieved a significant overall accuracy rate of 89%. Based on feature importance analysis, it was found that RAM capacity was the most dominant determining factor, contributing 47% to prediction accuracy, followed by battery power and screen resolution as supporting features. These findings have strategic implications for manufacturers to prioritize memory capacity upgrades in determining product pricing in the market, as well as providing guidance for consumers in assessing the fairness of a device's price based on its technical capabilities.

Maria Rosario Borroek; Jasmir Jasmir; Fachruddin Fachruddin; Marrylinteri Istoningtyas; Yosefina Venus

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Software development effort estimation is crucial as it is one of the key factors for successful software development. This research employs Random Forest to estimate software development effort. To achieve better results, the study combines the Random Forest method with Genetic Algorithm. The results show that the China dataset provides more accurate estimation compared to the Desharnais dataset, because the China dataset uses relevant feature selection for estimation.

Agung Islamy Aryanto; Yovi Pratama; Afrizal Nehemia Toscany

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

ARP spoofing attacks are a serious threat to network security, particularly in vulnerable Internet of Things (IoT) environments. This final project aims to detect ARP spoofing attacks on IoT networks using a combination of Random Forest (RF) and Robust PCA methods. RF is chosen for its classification capabilities and handling of non-linear data, while Robust PCA is used for dimensionality reduction and handling outliers in the data. The dataset used is "MITMArpSpoofing.pcap.csv," which contains network traffic data. The data is processed by performing preprocessing, feature scaling, and converting labels to binary (0 for benign, 1 for ARP spoofing). Subsequently, Robust PCA is applied to reduce data dimensions, and the RF model is then trained on the reduced data. The test results show that the RF model with Robust PCA achieves an accuracy of 96.02% in detecting ARP spoofing attacks. This method has proven effective in identifying and classifying ARP spoofing attacks on IoT networks.

Fransiskus Dapot Sihaloho; Jasmir Jasmir; Gunardi Gunardi

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The rapid growth of e-commerce platforms in Indonesia, particularly Tokopedia, has resulted in a large volume of consumer reviews containing valuable information regarding customer perceptions and satisfaction. However, manual analysis of such reviews is inefficient and prone to subjectivity, necessitating an automated approach based on machine learning. This study aims to classify the sentiment of sports product reviews on Tokopedia into positive, negative, and neutral categories by applying Logistic Regression, Support Vector Machine (SVM), and Random Forest using the Term Frequency–Inverse Document Frequency (TF-IDF) approach. The data were collected through web scraping of Indonesian-language sports product reviews and processed through several preprocessing stages, including data cleaning, case folding, tokenization, stopword removal, and stemming. Feature representation was performed using TF-IDF to transform textual data into numerical vectors, after which the dataset was divided into training and testing sets with an 80:20 ratio. Model performance was evaluated using accuracy, precision, recall, and F1-score metrics. The results indicate that the application of TF-IDF significantly improves the performance of all models, with SVM consistently achieving the most optimal performance compared to Logistic Regression and Random Forest. These findings demonstrate that classical machine learning algorithms combined with TF-IDF remain highly effective for sentiment analysis of Indonesian-language text. The implications of this study are expected to assist sellers in understanding customer opinions, support consumers in making informed purchasing decisions, and serve as a foundation for the development of sentiment analysis and recommendation systems on e-commerce platforms.
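The TF-IDF step described in this abstract can be sketched as follows. This is a minimal illustration using the plain `tf * ln(N / df)` variant; the abstract does not state which weighting variant or library was used, and libraries such as scikit-learn apply smoothing by default, so values will differ. The tokenized reviews are invented examples and `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented, already preprocessed (case-folded, tokenized) reviews.
reviews = [
    ["sepatu", "bagus", "nyaman"],
    ["sepatu", "jelek"],
    ["raket", "bagus"],
]
vectors = tf_idf(reviews)
print(vectors[0])
```

Terms that occur in fewer reviews ("nyaman") receive a higher weight than terms shared across reviews ("sepatu"), which is the property the classifiers rely on.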

R. Zaevan Khazafi Putra; Riza Pahlevi; Ronald Naibaho; Agus Nugroho

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Dynamic changes in weather patterns in Jambi City call for an accurate temperature prediction system. This study compares the performance of the Random Forest and Support Vector Regression (SVR) algorithms in predicting daily maximum temperatures, using 2020–2024 weather data obtained from OpenMeteo and feature engineering that includes lag and rolling-window features. The test results indicate that the SVR model with a Radial Basis Function (RBF) kernel, optimized using Grid Search (C=10, epsilon=0.2, gamma=0.01), significantly outperforms Random Forest according to a paired t-test (p-value < 0.05). SVR yields an R-squared (R²) value of 87.46%, a Mean Absolute Error (MAE) of 0.3818 °C, and a Root Mean Squared Error (RMSE) of 0.4964 °C, compared with an R² of 84.05% for Random Forest. The previous day's temperature (lag) and the three-day rolling average were identified as the most dominant predictors, and SVR is recommended as the more effective method for temperature prediction in the study area.
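The lag and rolling-window features mentioned in this abstract might be built roughly as below. This is a sketch under assumptions: the rolling mean is taken over the three days before the target day (to avoid leaking the target into its own features), which the abstract does not specify, and the temperatures are invented values.

```python
def add_lag_and_rolling(temps, lag=1, window=3):
    """Build (lag, rolling mean, target) rows from a daily series.

    Only past observations feed the features for day i.
    """
    rows = []
    start = max(lag, window)
    for i in range(start, len(temps)):
        previous = temps[i - window:i]  # the `window` days before day i
        rows.append({
            "lag": temps[i - lag],
            "rolling_mean": sum(previous) / window,
            "target": temps[i],
        })
    return rows


# Invented daily maximum temperatures in degrees Celsius.
daily_max = [31.0, 32.0, 33.0, 34.0, 32.0]
rows = add_lag_and_rolling(daily_max)
for row in rows:
    print(row)
```

The first `max(lag, window)` days are dropped because they lack a full history.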

Riza Pahlevi; Wilujeng Niar Raharjanto; Lies Aryani; Roby Setiawan

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Jambi Province is one of the largest natural rubber producing regions in Indonesia; however, rubber factories under GAPKINDO Jambi still face productivity issues, particularly the gap between production capacity and actual output, and productivity assessment that is still conducted manually by GAPKINDO Jambi. This study employs Decision Tree, Random Forest, KNN, and SVM algorithms within a structured pipeline involving preprocessing, feature selection, standardization, data balancing using SMOTE, and hyperparameter tuning. The proposed solution applies productivity level classification both individually and through paired combinations (ensemble voting). The results show that the Decision Tree + Random Forest model achieves the best performance with an accuracy of 0.84 and an F1-score of 0.83, confirming the effectiveness of ensemble methods in supporting productivity improvement decisions.

Caterina Paras Dewi; Jasmir Jasmir; Willy Riyadi; Alya Rafina

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Chronic Kidney Disease (CKD) is a heterogeneous disorder that gradually affects the structure and function of the kidneys and is difficult to reverse; the body becomes unable to maintain metabolism or fluid and electrolyte balance, leading to increased urea levels. Because it is not yet known which algorithm best classifies CKD, this study compares two classification algorithms, the Naïve Bayes Classifier (NBC) and Random Forest, on chronic kidney disease data obtained from Kaggle. Both algorithms are evaluated using accuracy, precision, recall, and the confusion matrix. On the dataset of 400 samples, NBC obtained an accuracy of 94%, while Random Forest obtained 93%. On the small dataset (158 records), Random Forest achieved the better accuracy, 87% compared with 78% for NBC. Based on these results, Random Forest performs more stably on small datasets, while NBC provides higher performance on the larger dataset in the context of chronic kidney disease classification.

Dea Sabrina Candra; Jasmir Jasmir; Yanti, Elvi

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

The Indonesia Pintar Program (PIP) is an educational assistance program for students from underprivileged families, but determining the eligibility of recipients still faces obstacles in the form of subjectivity and data imbalance. This study aims to classify the eligibility of high school students receiving PIP in Jambi City using data mining methods. The SMOTE technique was applied to overcome class imbalance, and Gain Ratio feature selection was used to determine important attributes. The dataset consisted of 19,596 student records, split 70% for training and 30% for testing. The classification process used the Naïve Bayes, Decision Tree (J48), and Random Forest algorithms with the Use Training Set, 5-Fold, and 10-Fold Cross Validation testing schemes. The results show that SMOTE improves model performance, but feature selection in some cases reduces accuracy. Overall, Random Forest without feature selection provides the best results with an accuracy of 93.33% and is recommended as the most effective model for objectively determining PIP recipient eligibility.

Eni Rohaini; Gunardi, Gunardi; Nurhayati Nurhayati; Jasmir Jasmir; Zahra Prisdian Tiararosa

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Imbalanced data remains a significant issue in heart disease classification using machine learning, as it tends to cause models to overestimate the majority class while ignoring minority classes with high clinical value. This can lead to a decrease in accuracy and the model's ability to accurately detect disease cases. Therefore, this study aims to assess the effectiveness of oversampling techniques, namely Random Oversampling and Synthetic Minority Oversampling Technique (SMOTE), in improving the performance of the K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF) algorithms. The dataset used comes from Kaggle and consists of 918 records with 12 attributes representing patient information related to heart disease prediction. The research stages include data preprocessing, baseline model testing, and re-evaluation using the two oversampling methods. Experimental results show that oversampling can improve the performance of all algorithms. KNN achieved the best results with SMOTE, with an accuracy of 72.98% and an F1-score of 75.39%. In the Naive Bayes algorithm, both oversampling techniques produced relatively stable performance, with the highest F1-score of 73.56% using SMOTE. Meanwhile, Random Forest showed the most optimal performance when combined with Random Oversampling, with an accuracy of 79.19% and an F1-score of 81.51%. These findings confirm that the success of data balancing techniques is strongly influenced by the characteristics of the classification algorithm used, and provide a practical contribution in determining strategies for handling imbalanced data in health research.
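The difference between the two oversampling techniques compared in this abstract can be sketched in plain Python. This is a simplified illustration, not the study's implementation: `smote_like` is a hypothetical helper that interpolates between a minority point and one of its `k` nearest minority neighbours, which is the core idea of SMOTE, and the data points are invented.

```python
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until target_size."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def smote_like(minority, target_size, rng, k=2):
    """Create synthetic points on segments between minority neighbours.

    Assumes at least two distinct minority samples.
    """
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        base = rng.choice(minority)
        # k nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return minority + synthetic


# Invented two-feature minority samples.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
rng = random.Random(0)
duplicated = random_oversample(minority, 6, rng)
balanced = smote_like(minority, 6, rng)
print(duplicated)
print(balanced)
```

Random oversampling only repeats existing records, whereas the SMOTE-style variant adds new points inside the region spanned by the minority class, which may explain why the two techniques suit different classifiers.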

Ichwanuddin, Yazid; Maria Rosario B; Erissya Rasywir

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Gestational Diabetes Mellitus (GDM) is a pregnancy-related metabolic disorder that poses health risks to both mother and fetus if not detected early, requiring accurate prediction methods for early screening and clinical decision-making. This study applies the Random Forest algorithm to detect GDM risk using clinical data from the Pima Indian Dataset. Data preprocessing included handling missing values, standardization, feature engineering, and a 70:30 train–test split. Two models were developed: a baseline and an optimized model using GridSearchCV hyperparameter tuning, validated with 5-fold cross-validation. Performance was assessed using a classification report, confusion matrix, and ROC–AUC. Results show that the optimized model outperforms the baseline, achieving 88% accuracy, an AUC of 93%, and average recall of 81%–85%. Compared to previous studies, this approach demonstrates improved predictive performance. The findings indicate that combining Random Forest with comprehensive preprocessing, feature engineering, and model optimization is effective and feasible for developing a medical decision support system for early GDM risk screening.

Rachmatika, Rinna; Desyani, Teti; Khoirudin

Journal of Information Technology and Computer Science 2025 International Forum of Researchers and Lecturers

Diseases in primary health services exhibit complex spatial-temporal dynamics due to urbanization and population mobility. Conventional surveillance approaches struggle to capture these patterns adaptively. Machine learning (ML) based on spatio-temporal modeling offers a solution, with the ability to detect disease clusters automatically and with high precision. Research Objectives: This research aims to develop a machine learning model to detect disease hotspots from primary service data in Indonesia, with a focus on improving prediction accuracy, interpretability, and health policy relevance. Methodology: The primary service dataset for 2024 (5,343 entries) was analyzed using three ML models, Gradient Boosting Machine (GBM), Temporal Random Forest (TRF), and Multi-EigenSpot, with spatial (village) and temporal (week, month) features. Performance evaluation includes predictive (AUC, F1-score) and spatial (Moran's I, Spatio-Temporal Correlation Index) metrics. Results: Multi-EigenSpot achieved the best performance (AUC=0.91; F1=0.86), detecting dominant hotspots in Sungai Asam and Beringin Villages. A Moran's I value of 0.63 indicates strong spatial autocorrelation, while STCI=0.57 indicates moderate temporal stability. Conclusions: ML-based spatio-temporal models are effective in identifying hidden disease patterns and have the potential to be integrated into national digital surveillance systems. This approach supports precision public health by providing a scientific basis for real-time location- and time-based intervention policies.
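Moran's I, the spatial autocorrelation statistic reported in this abstract, can be computed as in the sketch below. It uses the standard global formula; the four "villages", their case counts, and the adjacency weights are invented for illustration, and the abstract does not describe how the study built its weight matrix.

```python
def morans_i(values, weights):
    """Global Moran's I for values with a spatial weight matrix.

    I = (n / sum(w)) * sum_ij w_ij (x_i - mean)(x_j - mean)
        / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / w_sum) * cross / variance_sum


# Four hypothetical villages along a road; neighbours share a border.
cases = [10, 9, 2, 1]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_value = morans_i(cases, adjacency)
print(round(i_value, 4))
```

Positive values indicate that neighbouring areas tend to have similar case counts, which is how a value such as the reported 0.63 is read as clustering.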

Freyro Dobry Sianipar; Ruth Amelia Vega S Meliala; Yoseph Christian Sitanggang; Adidtya Perdana

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Information system security faces serious challenges due to increasingly complex cyber attacks. Intrusion Detection Systems (IDS) require efficient approaches to handle high-dimensional data such as the NSL-KDD dataset with 41 features. This study aims to implement the Genetic Algorithm (GA) for feature selection on the NSL-KDD dataset to improve the efficiency and accuracy of network attack detection. The method used is computational experimental research, involving data preprocessing, GA implementation for feature selection, building a classification model using Random Forest, and performance evaluation based on accuracy, precision, recall, F1-score, and computation time. The results show that GA successfully reduced features from 41 to 12 features (70.7% reduction), significantly improving computational efficiency. However, model accuracy slightly decreased from 0.4973 to 0.4951, indicating that while GA is effective for feature selection, the elimination of certain features may reduce classification capability. The implication of this study is that GA can be used as a tool to simplify intrusion detection models, but it should be combined with parameter optimization and data imbalance handling to achieve more optimal performance.  

Henrydunan, John Bush; Purba, Jogi; Amanah, Fadilla; Perdana, Adidtya

Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Accurate wind turbine power curve modeling plays a crucial role in performance evaluation, energy yield estimation, and data-driven control strategies. However, actual power curves often exhibit non-linear behavior influenced by atmospheric variability, measurement noise, and SCADA anomalies, making conventional modeling approaches less effective. This study proposes an optimized logistic power curve model whose parameters are tuned using Particle Swarm Optimization (PSO) to improve predictive accuracy. The analysis uses the Wind Turbine SCADA Dataset from Kaggle, which undergoes extensive preprocessing including physical rule filtering, outlier detection with the Interquartile Range (IQR) method, anomaly removal, and smoothing of the power signal. A three-parameter logistic model is selected due to its ability to capture the typical S-shaped relationship between wind speed and power output. PSO is applied to identify optimal model parameters by minimizing the Mean Squared Error (MSE), utilizing 40 particles over 200 iterations. The optimized model achieves strong predictive performance with RMSE of 404.09, MAE of 179.96, and R² of 0.904 on the test set, indicating that more than 90% of the variability in actual power can be explained by wind speed. Residual analysis reveals heteroscedastic patterns and slight overestimation in mid-range wind speeds, yet overall model consistency remains high. Comparative evaluation against Linear Regression, Random Forest, and logistic modeling using curve_fit shows that the Logistic–PSO approach provides the most accurate and stable predictions. These findings demonstrate that combining logistic modeling with PSO offers an effective and robust method for data-driven wind turbine power curve optimization.
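The approach in this abstract, fitting a three-parameter logistic power curve by Particle Swarm Optimization, might look roughly like the sketch below. Several things here are assumptions rather than details from the study: the parameterization `capacity / (1 + exp(-steepness * (v - midpoint)))`, the inertia and acceleration coefficients (0.7, 1.5, 1.5), the search bounds, and the noise-free synthetic data. Only the particle count (40), iteration count (200), and MSE objective come from the abstract.

```python
import math
import random


def logistic_power(v, capacity, steepness, midpoint):
    """Assumed three-parameter logistic power curve."""
    return capacity / (1.0 + math.exp(-steepness * (v - midpoint)))


def mse(params, data):
    return sum((logistic_power(v, *params) - p) ** 2 for v, p in data) / len(data)


def pso(data, bounds, n_particles=40, n_iter=200, seed=0):
    """Minimal global-best PSO minimizing MSE inside box bounds."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_err = [mse(p, data) for p in pos]
    g = min(range(n_particles), key=best_err.__getitem__)
    global_pos, global_err = best_pos[g][:], best_err[g]
    initial_err = global_err

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (global_pos[d] - pos[i][d]))
                # Clamp to the search box after each move.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            err = mse(pos[i], data)
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < global_err:
                    global_pos, global_err = pos[i][:], err
    return global_pos, global_err, initial_err


# Synthetic curve from known parameters (illustrative, not SCADA data).
curve_data = [(v, logistic_power(v, 3600.0, 0.7, 9.0)) for v in range(25)]
search_bounds = [(1000.0, 5000.0), (0.1, 2.0), (0.0, 25.0)]
fitted, final_err, initial_err = pso(curve_data, search_bounds)
print(fitted, final_err)
```

Because the swarm keeps the best position seen so far, the final error can never exceed the best error of the initial random swarm; how close it gets to the true parameters depends on the coefficients and bounds chosen.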

Hamza, Ali; Hussain, Wahid; Iftikhar, Hassan; Ahmad, Aziz; Shamim, Alamgir Md

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The rapid growth of open-source software (OSS) in machine learning (ML) has intensified the need for reliable, automated methods to assess project quality, particularly as OSS increasingly underpins critical applications in science, industry, and public infrastructure. This study evaluates the effectiveness of a diverse set of machine learning and deep learning (ML/DL) algorithms for classifying GitHub OSS ML projects as engineered or non-engineered using a SMOTE-enhanced and explainable modeling pipeline. The dataset used in this research includes both numerical and categorical attributes representing documentation, testing, architecture, community engagement, popularity, and repository activity. After handling missing values, standardizing numerical features, encoding categorical variables, and addressing the inherent class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), seven different classifiers—K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), Support Vector Machine (SVM), and a Deep Neural Network (DNN)—were trained and evaluated. Results show that LR (84%) and DNN (85%) outperform all other models, indicating that both linear and moderately deep non-linear architectures can effectively capture key quality indicators in OSS ML projects. Additional explainability analysis using SHAP reveals consistent feature importance across models, with documentation quality, unit testing practices, architectural clarity, and repository dynamics emerging as the strongest predictors. These findings demonstrate that automated, explainable ML/DL-based quality assessment is both feasible and effective, offering a practical pathway for improving OSS sustainability, guiding contributor decisions, and enhancing trust in ML-based systems that depend on open-source components.

Khoirudin, Khoirudin; Pungkasanti, Prind Triajeng; Hidayati, Nurtriana

Systematic Literature Review Journal 2025 International Forum of Researchers and Lecturers

In response to the worldwide need for food security solutions, data fusion technology that combines climate data with satellite imagery greatly improves the accuracy of agricultural yield predictions. This study examines the advancements, methods, and key contributions in this area. It applies the systematic literature review (SLR) methodology to 62 papers retrieved from Scopus. The selection process used Boolean keywords and filtered by document type, data source, open access, subject area, and year of publication (2020–2024). Bibliometric analysis was performed to assess publication patterns, the efficacy of machine learning models, and key contributions. The analysis identifies an upward trend in publications, particularly after 2023. Machine learning models such as Random Forest and Gradient Boosting have been highly successful at integrating geographical and temporal data. Data resolution, integration of data from multiple sources, and a real-time framework remain gaps that limit the generalization of research outcomes. The study identifies more complex data fusion approaches, multiregional datasets, and advanced machine learning models as directions for future work toward more accurate agricultural predictions. Multidisciplinary collaboration is also crucial to further innovation in agricultural yield prediction.

Listyaningrum, Heni Dwi

Jurnal Ilmiah Komputerisasi Akuntansi 2025 Universitas Sains dan Teknologi Komputer

The rapid growth of social media has yielded vast digital traces with high potential for improving corporate forensic auditing. Their use, however, is held back by concerns over technological reliability, privacy, and legal compliance. The aim of this study is to explore the effective use of social media digital traces in forensic auditing and to develop a functional framework that compromises neither technological efficiency nor adherence to law and ethics. A mixed-method design was used, combining quantitative machine learning analysis with qualitative document analysis and insights from semi-structured interviews. Quantitative data drawn from social media digital traces were processed using the Random Forest algorithm with SMOTE for class balancing, while qualitative data were processed using thematic analysis. The results indicated high model performance, with 91.3% accuracy and an AUC-ROC of 0.94, together with three emergent themes: digital integration, ethics and privacy, and regulation and legality. The results demonstrate that digital footprints may serve as an early and reliable indicator for fraud detection, provided they are accompanied by clear regulatory and ethical frameworks. The study's principal contribution is an operational model that combines machine learning with legal and ethical perspectives, a new strategy that advances both methodological refinement and practical application in today's forensic auditing.

Ugbotu, Eferhire Valentine; Emordi, Frances Uchechukwu; Ugboh, Emeke; Anazia, Kizito Eluemunor; Odiakaose, Christopher Chukwufunaya +13 more

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The daily exchange of information over the Internet has eased the proliferation of resources and improved the accessibility, availability, and interoperability of accompanying devices. In addition, the recent widespread adoption of smartphones alongside other computing devices has continued to advance features such as miniaturization, portability, ease of data access, and mobility. It has also given rise to adversarial attacks that target network infrastructures and aim to exploit interconnected and shared resources. These exploits seek to compromise an unsuspecting user's device or unit. The increased susceptibility to, and success rate of, these attacks have been traced to users' personality traits and behaviours, which render them repeatedly vulnerable to such exploits, especially those spread across spoofed websites as malicious content. Our study proposes a stacked, transfer learning approach that seeks to classify malicious content deployed by adversaries on spoofed phishing websites. Our stacked approach uses three base classifiers, namely Cultural Genetic Algorithm, Random Forest, and Korhonen Modular Neural Network, whose outputs are used as input for an XGBoost meta-learner. A major challenge with learning schemes is the selection of appropriate features for estimation, along with the imbalanced nature of the explored dataset, in which the target class is often under-represented. Our study resolved the dataset imbalance using SMOTE-Tomek, while predictor selection was resolved using relief rank feature selection. Results show that our hybrid yields an F1 of 0.995, Accuracy of 0.997, Recall of 0.998, Precision of 1.000, AUC-ROC of 0.997, and Specificity of 1.000 in classifying the 2,764 cases of its held-out test dataset. Results affirm that it outperformed benchmark ensembles. The proposed model was evaluated on the UCI Phishing Website dataset and effectively classified phishing (cues and lures) content on websites.

Mufti Ari Bianto; Hanif Azhar Ramadhan; Ardian Hudi Ramadhani; Tsalits Wildan Hamid

Jurnal Riset Rumpun Ilmu Teknik 2025 Pusat riset dan Inovasi Nasional

This study proposes the integration of a Hybrid Recommendation method (combining Content-Based and Collaborative Filtering) with Random Forest Regression (RFR) to improve the accuracy of stay duration prediction in web-based boarding house booking systems. The main issue in online boarding booking systems is the inaccuracy of predicting user stay duration, affecting room allocation efficiency and customer satisfaction. The dataset was sourced from the hotel sector due to its attribute similarities and data validity. The research process includes data preprocessing (missing value imputation, normalization, and one-hot encoding), temporal and contextual feature engineering, hybrid recommendation system construction with CBF and CF score weighting, and RFR model training optimized through Grid Search and 10-fold cross-validation. Evaluation was conducted using MAE, RMSE, R² metrics, as well as recommendation metrics such as Precision@5, Recall@5, and Mean Reciprocal Rank (MRR). Results show that this integrated model achieved an R² of 0.7239 and an MAE of 1.0537 days, as well as a Precision@5 of 0.9636. This integration proves effective in improving prediction accuracy and recommendation relevance and contributes to the development of AI-based intelligent systems in the accommodation domain.
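The recommendation metrics named in this abstract (Precision@5, Recall@5, and Mean Reciprocal Rank) can be sketched as below. The item identifiers and relevance sets are invented for illustration; the function names are hypothetical.

```python
def precision_at_k(recommended, relevant, k=5):
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=5):
    """Share of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(all_recommended, all_relevant):
    """Average of 1 / rank of the first relevant item per user."""
    total = 0.0
    for recommended, relevant in zip(all_recommended, all_relevant):
        for rank, item in enumerate(recommended, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(all_recommended)


# Invented rankings for two users.
user1_recs, user1_relevant = ["A", "B", "C", "D", "E"], {"B", "E", "Z"}
user2_recs, user2_relevant = ["X", "Y"], {"X"}

p5 = precision_at_k(user1_recs, user1_relevant)
r5 = recall_at_k(user1_recs, user1_relevant)
mrr = mean_reciprocal_rank([user1_recs, user2_recs],
                           [user1_relevant, user2_relevant])
print(p5, r5, mrr)
```

Users with no relevant item in their list contribute 0 to the MRR sum, which is one common convention.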

Wahyu Saputro

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Human Resource Management (HRM) plays a strategic role in improving organizational competitiveness through proper management of employee placement, training, and performance evaluation. To support these goals, a predictive model is needed that can provide an accurate picture of employee performance. This study uses an HRM dataset of 1,200 records and applies several classification algorithms to compare their effectiveness, namely J48 (C4.5), Random Forest, Naive Bayes, K-Nearest Neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM). To obtain better results, the study applies resampling and attribute selection using the correlation attribute eval approach, so that the class distribution is more balanced and model accuracy increases. In the tests, the Decision Tree J48 algorithm showed the best performance, with an accuracy of 95.41%, a kappa value of 0.8925, a mean absolute error (MAE) of 0.0432, a precision of 0.955, a recall of 0.954, and an area under the ROC curve of 0.964. These findings indicate that J48 has excellent predictive capabilities compared to the other algorithms. The study also found that the most influential variables in determining employee performance include the percentage of the last salary increase (EmpLast Salary Hike Percent), the level of work environment satisfaction (Emp Environment Satisfaction), the length of time since the last promotion (Years Since Last Promotion), and experience in the current role (Experience Years in Current Role). Overall, the results indicate that the C4.5 algorithm combined with resampling can be an optimal solution for building an employee performance prediction system. The model therefore has the potential to be a strong basis for managerial decision-making, particularly in designing HR development strategies and policies to improve organizational performance.
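The kappa value reported alongside accuracy in this abstract measures agreement beyond chance. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the 2x2 matrix is invented and does not reproduce the study's figures.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = actual)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total ** 2)
    return (observed - expected) / (1 - expected)


# Invented confusion matrix for two performance classes.
confusion = [
    [40, 10],
    [5, 45],
]
kappa = cohens_kappa(confusion)
print(round(kappa, 4))
```

Here accuracy is 0.85 but kappa is lower, because half of that agreement would be expected by chance with these class proportions.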