SciRepID - Scientific Publication Search

Publication Search

29,653 articles from 386 journals · 1,447 citations tracked

Showing 1-20 of 24


Egi Rangga Maulana

Uranus: Jurnal Ilmiah Teknik Elektro, Sains dan Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study presents a high-accuracy, real-time soft-failure detection framework for large-scale fiber-to-the-home (FTTH) optical access networks using a hybrid ensemble of Isolation Forest and One-Class Support Vector Machine (OCSVM). The proposed model was trained and validated on a real-world multivariate performance dataset comprising more than 1.8 million samples collected at 5-minute intervals from 50 Optical Line Terminals (OLTs) and over 3,000 Optical Network Terminals (ONTs) across a five-month period (June-October 2025). Ground-truth validation was performed using 111 confirmed network incidents in October 2025 affecting 12,990 customers. The hybrid ensemble achieved a precision of 0.940 and a recall of 0.982, with an average detection delay of only 7.8 minutes, representing an 87.7% reduction compared to conventional manual response (63.5 minutes). The framework significantly outperforms traditional thresholding and recent ML-based methods while demonstrating practical deployability in live operational environments.
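As a quick check on the figures in this abstract, the sketch below recomputes the relative delay reduction and derives the F1-score implied by the stated precision and recall. The F1 value is not reported in the abstract; it is derived here from the standard harmonic-mean formula, and all variable names are illustrative.

```python
# Figures taken from the abstract above.
precision = 0.940
recall = 0.982
manual_delay_min = 63.5    # conventional manual response, in minutes
detected_delay_min = 7.8   # hybrid ensemble detection delay, in minutes

# Relative reduction in detection delay.
reduction = (manual_delay_min - detected_delay_min) / manual_delay_min

# F1 is the harmonic mean of precision and recall (derived, not reported).
f1 = 2 * precision * recall / (precision + recall)

print(f"delay reduction: {reduction:.1%}")
print(f"implied F1: {f1:.3f}")
```

The recomputed reduction agrees with the 87.7% stated in the abstract.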

Ryzal Nur Alvandy; Arita Witianti

Jurnal Elektronika dan Komputer 2025 STEKOM PRESS

The rapid expansion of e-commerce in Indonesia has resulted in a significant rise in the number of customer reviews, which serve as a valuable source of insight for understanding consumer satisfaction. This study aims to classify sentiments from product reviews on the Tokopedia platform into three categories using the Support Vector Machine (SVM) algorithm. The data were ethically collected through web scraping and include review text, ratings, and the number of “likes.” The preprocessing stage involved several NLP techniques, and the data representation was generated using the Term Frequency–Inverse Document Frequency (TF-IDF) method, while the issue of class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Based on the test results, the SVM model achieved an accuracy of 79.48% on the test data using a linear kernel, showing the best performance in classifying positive sentiments. However, the classification of neutral and negative sentiments still requires improvement. This study demonstrates that the combination of the TF-IDF method, additional numerical features, and data balancing techniques can produce an efficient sentiment analysis model within the e-commerce domain.
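To illustrate the TF-IDF weighting mentioned in this abstract, here is a minimal sketch using the textbook definition (term frequency times the natural log of inverse document frequency). The study's exact variant and tooling are not stated, so the formula, the function name `tf_idf`, and the sample reviews are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Textbook TF-IDF: tf = term count / doc length, idf = ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical reviews, not taken from the study's dataset.
reviews = [
    "good product fast delivery",
    "broken product slow delivery",
    "very good product",
]
weights = tf_idf(reviews)
print(weights[1])
```

A term that appears in every review ("product" here) receives weight zero, while rarer terms such as "broken" are weighted more heavily, which is why TF-IDF vectors are a common input to a linear SVM.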

Hamza, Ali; Hussain, Wahid; Iftikhar, Hassan; Ahmad, Aziz; Shamim, Alamgir Md

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The rapid growth of open-source software (OSS) in machine learning (ML) has intensified the need for reliable, automated methods to assess project quality, particularly as OSS increasingly underpins critical applications in science, industry, and public infrastructure. This study evaluates the effectiveness of a diverse set of machine learning and deep learning (ML/DL) algorithms for classifying GitHub OSS ML projects as engineered or non-engineered using a SMOTE-enhanced and explainable modeling pipeline. The dataset used in this research includes both numerical and categorical attributes representing documentation, testing, architecture, community engagement, popularity, and repository activity. After handling missing values, standardizing numerical features, encoding categorical variables, and addressing the inherent class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), seven different classifiers—K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), Support Vector Machine (SVM), and a Deep Neural Network (DNN)—were trained and evaluated. Results show that LR (84%) and DNN (85%) outperform all other models, indicating that both linear and moderately deep non-linear architectures can effectively capture key quality indicators in OSS ML projects. Additional explainability analysis using SHAP reveals consistent feature importance across models, with documentation quality, unit testing practices, architectural clarity, and repository dynamics emerging as the strongest predictors. These findings demonstrate that automated, explainable ML/DL-based quality assessment is both feasible and effective, offering a practical pathway for improving OSS sustainability, guiding contributor decisions, and enhancing trust in ML-based systems that depend on open-source components.

Sipasulta, Angelica Mailen; Bayu, Teguh Indra

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Bea Cukai has recently been in the public spotlight, especially regarding the supervision of goods from abroad. News and public responses regarding Bea Cukai's supervision create pros and cons, thus triggering a variety of responses from the public. This study aims to analyze the sentiment of Indonesian people towards the performance of Bea Cukai in monitoring goods from abroad by utilizing Twitter social media. In this research, the Support Vector Machine (SVM) algorithm is applied to classify public comments on Twitter into positive or negative sentiments. Through the crawling process carried out from June 1, 2023, to May 12, 2024, 9,051 entries of data were collected. The analysis results showed an accuracy of 93.87%, precision 94%, recall 93%, and F1-score 94%. These results show that the SVM method is effective in analyzing public sentiment, especially related to Bea Cukai's supervision.

Gunawan, Ricardho; Hendry, Hendry

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Sentiment analysis of guest reviews is a crucial aspect in improving the quality of hotel services. This study aims to analyze the sentiment of guest reviews regarding the services of Grand Diamond Hotel Yogyakarta using a machine learning approach with the Support Vector Machine (SVM) algorithm. SVM was chosen because it can handle high-dimensional data such as text and is capable of forming an optimal separating hyperplane between sentiment classes. The research data was obtained through web scraping from Traveloka, yielding 1,119 reviews, which were processed through preprocessing, translation, and sentiment labeling using the TextBlob library. After TF-IDF weighting, the data was divided into 80% for training and 20% for testing. The linear kernel SVM model achieved 80% accuracy in classifying the reviews into positive, negative, and neutral categories. The results of this study were implemented in a web-based application equipped with data visualization and model evaluation features, allowing hotel management to efficiently monitor and analyze guest sentiment and support data-driven service quality improvement.

Wahyu Saputro

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Human Resource Management (HRM) plays a strategic role in improving organizational competitiveness through proper management of employee placement, training, and performance evaluation. To support these goals, a predictive model is needed that can provide an accurate picture of employee performance. This study utilizes an HRM dataset of 1,200 records and compares the effectiveness of several classification algorithms, namely J48 (C4.5), Random Forest, Naive Bayes, K-Nearest Neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM). To obtain better results, the study applies resampling and attribute selection with a correlation attribute evaluation approach, so that the class distribution is more balanced and model accuracy increases. In testing, the J48 decision tree algorithm showed the best performance, with an accuracy of 95.41%, a kappa value of 0.8925, a mean absolute error (MAE) of 0.0432, a precision of 0.955, a recall of 0.954, and an area under the ROC curve of 0.964. These findings indicate that J48 has excellent predictive capabilities compared to the other algorithms. Furthermore, the study found that the most influential variables in determining employee performance include the percentage of the last salary increase (EmpLast Salary Hike Percent), the level of work environment satisfaction (Emp Environment Satisfaction), the length of time since the last promotion (Years Since Last Promotion), and experience in the current role (Experience Years in Current Role). Overall, the results indicate that the C4.5 algorithm combined with resampling can be an effective solution for building an employee performance prediction system. This model therefore has the potential to be a strong basis for managerial decision-making, particularly in designing HR development strategies and policies to improve organizational performance.

Prashanthan, Amirthanathan

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The study presents a comprehensive framework for optimizing customer retention budgets by integrating clustering, classification, and mathematical optimization techniques. The study begins with the IBM Telco dataset, which is prepared through data cleansing, encoding, and scaling. In the preliminary phase, customer segmentation is performed using K-Means clustering, with k = 3 and k = 4 identified as optimal based on the elbow method and Silhouette score. The configurations produced three (Premium, Standard, Low) and four (Premium, Standard Plus, Standard, Low) customer segments based on purchase preferences, which served as input features for churn prediction. In the second phase, the dataset was divided into training and test sets in an 80:20 ratio, followed by data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN). Multiple classification algorithms were evaluated, including Naive Bayes (NB), Random Forest (RF), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP), using F1-score as the performance metric. CatBoost and LightGBM, with k values of 3 and 4, respectively, were the highest-performing classification models, with only minimal differences in performance. Ultimately, customer segmentation established customer prioritization, whereas churn prediction assessed customer churn likelihood. Four distinct configurations were assessed using mixed-integer linear programming (MILP) to optimize retention budget allocation under uniform budget constraints, discount amounts, and churn thresholds. In both the k = 3 and k = 4 scenarios, CatBoost surpassed LightGBM, with CatBoost at k = 3 effectively discounting 66% of at-risk customers across all three segments, thereby improving the intervention's efficacy and budget allocation and making it the ideal choice for maximizing customer retention. The results demonstrate the importance of segmentation in enhancing retention budgeting and budget optimization, particularly concerning parameter sensitivity.

Rosa Ratri Kusuma Hariningsih; Diwahana Mutiara Candrasari; Endang Setyawati; Syamsu Wahidin; Jevon Nataniel Putra

International Journal of Computer Technology and Science 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Dengue Fever (DF) continues to be a major public health threat in Indonesia, especially in urban areas with high population density, such as Purwokerto City. This study aims to develop a predictive model to identify high-risk areas for DF outbreaks by integrating Machine Learning (ML) algorithms and Geographic Information Systems (GIS). The research utilizes historical dengue case data, meteorological parameters (rainfall, temperature, humidity), and population density as predictive variables. Three ML classification algorithms—Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM)—were implemented to develop risk prediction models. Extensive data preprocessing, feature selection, and spatial integration were applied to ensure model robustness. The results show that the SVM model outperformed other methods, achieving the highest accuracy, precision, recall, and F1-score in classifying dengue risk zones. Risk maps generated through GIS visualization successfully identify priority areas for targeted interventions. The novelty of this research lies in the combination of local epidemiological data, multi-algorithm comparison, and geospatial mapping to improve early warning systems for DF in Purwokerto. This integrated approach is expected to support more effective prevention strategies and enhance public health preparedness.

Eugenea Chiquita Zahrani Assyarif; I Kadek Dwi Nuryana

Modem : Jurnal Informatika dan Sains Teknologi 2025 Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to conduct customer segmentation and develop a classification model to predict the clusters of new customers at Monex Toys Abadi Bekasi, a micro, small, and medium enterprise (MSME). Segmentation was performed using the K-Means Clustering algorithm, incorporating parameters such as Recency, Frequency, Monetary (RFM), purchased products, payment methods, shipping cost discounts, and the total number of products purchased by customers. The segmentation results revealed two clusters: (1) Discount Hunters and (2) Loyal Customers. Subsequently, a classification process was conducted to predict customer clusters using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. Evaluation results indicated that all models achieved high accuracy exceeding 98%. The best-performing model was obtained with SVM using a 70:30 data split, achieving an accuracy of 98.81%. This classification model was then implemented into a Streamlit-based cluster prediction application, enabling users to identify customer segments in real-time. The findings of this research are expected to assist MSMEs in understanding customer behavior, enhancing service quality, and supporting more effective marketing strategies.

Sarassati, Dwi Sinta; Joko Prasetyo, Sri Yulianto

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Tidal flooding is a natural phenomenon in which sea water rises onto land due to changes in sea tides, causing waterlogging in coastal areas. Tidal floods have hit the Demak-Semarang area, especially Sayung District, disrupting community life. The purpose of this study is to analyze public sentiment regarding the impact of tidal flooding in Demak Regency using data obtained from social media, so that the results can serve as an evaluation for the government and related parties in formulating more responsive and effective policies to address tidal flooding. The Support Vector Machine (SVM) method is used to classify each data item into positive, negative, or neutral sentiment. Of 3,580 initial data items, 3,147 remained after preprocessing, comprising 1,581 neutral, 1,257 negative, and 309 positive opinions. Most opinions are neutral, indicating that people consider tidal flooding a natural phenomenon and are used to dealing with it. However, the substantial share of negative opinions indicates dissatisfaction with the government's handling, while positive opinions are very few. SVM achieved 84.44% accuracy, 86.7% precision, and 97.8% recall. The study recommends improvements in flood mitigation, assistance for affected communities, and infrastructure improvements.
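The class counts reported in this abstract can be checked and turned into shares with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Sentiment counts after preprocessing, as reported in the abstract.
counts = {"neutral": 1581, "negative": 1257, "positive": 309}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

print(f"total opinions: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The three classes sum to the 3,147 items retained after preprocessing, and the strong skew away from the positive class is worth keeping in mind when reading the reported precision and recall.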

Fitri Dwianasari; Rohmah Diah Yani; Karlina Novianto Laksono; Nurhafillah Mujaliza; Riza Fahlapi

Kajian Ekonomi dan Akuntansi Terapan 2025 Asosiasi Riset Ekonomi dan Akuntansi Indonesia

Mining activities in the Raja Ampat area have sparked various public reactions, both supportive and critical, particularly on social media platforms such as Twitter. This study aims to analyze public sentiment regarding the mining operations by employing two classification algorithms. A total of 500 tweets related to Raja Ampat were collected from the X platform, and after data cleaning, 168 were identified as positive sentiments and 303 as negative. Sentiment analysis was conducted using text mining techniques by comparing two algorithms: Support Vector Machine (SVM) and Naïve Bayes. To address the issue of data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. The analysis results showed that SVM achieved an accuracy of 80%, outperforming Naïve Bayes, which reached only 68%. This indicates that SVM performed better in classifying sentiment. Additionally, the application of SMOTE effectively enhanced both algorithms’ abilities to detect positive sentiment, as reflected in the precision, recall, and F1-score metrics. For SVM, precision reached 85%, recall 80%, and F1-score 80%, while Naïve Bayes recorded a precision and recall of 69%, and an F1-score of 68%.

Yayang Tika Robiatush Sholiha; Lubna Asjad Muhda Nabilah; Imron Imron

Saturnus: Jurnal Teknologi dan Sistem Informasi 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study aims to evaluate user sentiment toward the Liputan6.com application available on the Google Play Store. In the digital era, user reviews serve as a significant indicator in assessing the quality of an application. However, the inconsistency between rating scores and review content renders manual analysis less objective. To address this issue, a machine learning approach was adopted by comparing two algorithms, namely Support Vector Machine (SVM) and Naïve Bayes (NB). A total of 2,500 reviews were collected through a web scraping process and automatically labeled based on the rating (positive if ≥ 3, negative if < 3). The data preprocessing stages included cleaning, case folding, tokenizing, stopword removal, and token filtering. Subsequently, word weighting was carried out using the TF-IDF method, followed by classification using 10-Fold Cross Validation in RapidMiner. The evaluation results indicate that, in the positive class, NB demonstrated superior precision (89.47%), whereas SVM achieved higher recall (98.94%) and F1-score (90.96%). In the negative class, SVM performed better in terms of precision (66.15%), while NB attained higher recall (65.65%) and F1-score (36.34%). Further evaluation based on AUC and accuracy positioned SVM in the good category (AUC 0.842; accuracy 83.82%), while NB was categorized as fail (AUC 0.505; accuracy 60.87%). Overall, SVM is considered to be more effective than NB.
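The automatic labeling rule described in this abstract (positive if the rating is at least 3, negative otherwise) can be sketched as a small function. The function name and the sample ratings are hypothetical; the study itself performed its workflow in RapidMiner.

```python
def label_from_rating(rating: int) -> str:
    """Rule stated in the abstract: positive if rating >= 3, else negative."""
    return "positive" if rating >= 3 else "negative"

# Hypothetical star ratings, not taken from the study's data.
ratings = [5, 3, 2, 1, 4]
labels = [label_from_rating(r) for r in ratings]
print(labels)
```

Because a rating of exactly 3 counts as positive under this rule, lukewarm reviews end up in the positive class, which may contribute to the mismatch between ratings and review content that the abstract mentions.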

Arfian Hendro Priyono; Ema Utami; Dhani Ariatmanto

International Journal of Information Engineering and Science 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

As the primary raw material for sugar and ethanol production, sugarcane is a highly significant plantation commodity. However, its relatively long growing period of approximately one year makes it more susceptible to diseases. Machine learning technology has been applied to the identification of sugarcane leaf diseases, including pre-processing methods and disease classification models based on Convolutional Neural Network (CNN) and Support Vector Machine (SVM) approaches. However, these methods exhibit limitations in terms of accuracy. The objective of this study is therefore to enhance the accuracy of sugarcane leaf disease identification by utilizing VGG-16. The dataset consists of 2,521 sugarcane leaf images categorized into five classes. The results indicate an accuracy improvement from 97.78% to 99.14%, an increase of 1.36 percentage points.
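The improvement reported in this abstract is an absolute gain in percentage points. The sketch below recomputes it and also derives the corresponding relative reduction in error rate, which is not stated in the abstract and is shown only as an illustrative derived figure.

```python
baseline_acc = 97.78   # accuracy before, in percent (from the abstract)
vgg16_acc = 99.14      # accuracy with VGG-16, in percent (from the abstract)

# Absolute gain in percentage points.
gain_pp = vgg16_acc - baseline_acc

# Relative reduction in error rate (derived, not reported in the abstract).
error_reduction = gain_pp / (100 - baseline_acc)

print(f"gain: {gain_pp:.2f} percentage points")
print(f"relative error reduction: {error_reduction:.1%}")
```

When the baseline is already near 100%, a small absolute gain can correspond to a large share of the remaining errors being removed, so the two figures are best read together.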

Setiawan, Dita; Ali Muhammad; Siti Herawati Fransiska Dewi

Teknik: Jurnal Ilmu Teknik dan Informatika 2025 LPPM Sekolah Tinggi Ilmu Ekonomi - Studi Ekonomi Modern

Coronary heart disease (CHD) remains a leading cause of mortality worldwide. Early detection is essential to reduce complications and improve patient outcomes. This study aims to develop a classification model using machine learning algorithms to predict CHD risk based on clinical symptoms. The dataset used is the Cleveland Heart Disease dataset from the UCI Machine Learning Repository, consisting of 303 patient records with 14 clinical features. The preprocessing stage involved handling missing values, normalizing features, and transforming categorical variables. Four classification algorithms were applied: K-Nearest Neighbors (K-NN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Each model was trained using stratified 10-fold cross-validation to ensure generalizability. Evaluation using accuracy, precision, recall, F1-score, and ROC-AUC metrics showed that the Random Forest algorithm achieved the highest performance with 87.2% accuracy. Feature importance analysis indicated that chest pain type, resting blood pressure, cholesterol, and ST depression were the most influential indicators. These results demonstrate that machine learning, particularly Random Forest, can effectively support early diagnosis of CHD in clinical settings and has the potential to be integrated into clinical decision support systems (CDSS).

Dwi Andre Vebriansyah; Budi Eko Soetjipto; Ludi Wisnuwardhana

Riset Ilmu Manajemen Bisnis dan Akuntansi 2025 Asosiasi Riset Ilmu Manajemen Kewirausahaan dan Bisnis Indonesia

This research conducted a systematic literature review of studies related to analyzing service quality based on user reviews with a machine learning approach. A total of 15 international and national journals were analyzed to identify challenges, methods, and trends in research in this aspect. The review results show that Natural Language Processing (NLP) and Sentiment Analysis techniques are the dominant approaches, with machine learning models such as Deep Learning, Naive Bayes, and Support Vector Machine (SVM) being commonly used. The review also identifies research gaps and provides recommendations for future research directions.

Setiadi, De Rosal Ignatius Moses; Ojugo, Arnold Adimabua; Pribadi, Octara; Kartikadarma, Etika; Setyoko, Bimo Haryo +4 more

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

Breast cancer is the most prevalent cancer among women worldwide, requiring early and accurate diagnosis to reduce mortality. This study proposes a hybrid classification pipeline that integrates Hybrid Statistical Feature Selection (HSFS) with unsupervised LSTM-guided feature extraction for breast cancer detection using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Initially, 20 features were selected using HSFS based on Mutual Information, Chi-square, and Pearson Correlation. To address class imbalance, the training set was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Subsequently, an LSTM encoder extracted non-linear latent features from the selected features. A fusion strategy was applied by concatenating the statistical and latent features, followed by re-selection of the top 30 features. The final classification was performed using a Support Vector Machine (SVM) with RBF kernel and evaluated using 5-fold cross-validation and a held-out test set. Experimental results showed that the proposed method achieved an average training accuracy of 98.13%, F1-score of 98.13%, and AUC-ROC of 99.55%. On the held-out test set, the model reached an accuracy of 99.30%, precision of 100%, and F1-score of 99.05%, with an AUC-ROC of 0.9973. The proposed pipeline demonstrates improved generalization and interpretability compared to existing methods such as LightGBM-PSO, DHH-GRU, and ensemble deep networks. These results highlight the effectiveness of combining statistical selection and LSTM-based latent feature encoding in a balanced classification framework.

Taopik Hidayat; Daniati Uki Eka Saputri; Faruq Aziz; Nurul Khasanah

International Journal of Computer Technology and Science 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Image classification is a key field in digital image processing with broad applications, such as object recognition and disease detection. The use of artificial neural network architectures, such as MobileNetV2, has significantly advanced pattern recognition in large datasets. However, in small datasets, challenges related to accuracy and generalization are often encountered. This study explores an RGB-based approach utilizing MobileNetV2 for image feature extraction and Support Vector Machine (SVM) as the classifier. MobileNetV2 is applied to extract features from RGB images, which are then further processed by SVM to determine image classes. The results indicate that this model achieves an accuracy of 91.67%, precision of 0.9163, recall of 0.9167, and F1-score of 0.9161. Based on the confusion matrix analysis, the model effectively distinguishes between classes, despite slight overlaps. This research contributes to the development of intelligent image classification systems that can be applied in various fields, including the food industry. With these achievements, the RGB approach integrating MobileNetV2 and SVM has proven effective in enhancing image classification accuracy, even with relatively small datasets. These findings open opportunities for applying similar methods in other image processing tasks that require high accuracy in object or disease detection and classification.

Abdi Prayogi; Novriyenny Novriyenny; I Gusti Prahmana

Repeater : Publikasi Teknik Informatika dan Jaringan 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Communication is the process of exchanging information, ideas, thoughts, and feelings between individuals or groups through words, signs, or actions. It can be verbal or non-verbal and involves various media and channels, such as face-to-face conversations, writing, gestures, facial expressions, and digital technology. This research was conducted at STMIK Kaputama Binjai, using a WhatsApp group between lecturers and students. The study uses the Support Vector Machine (SVM) method, a supervised machine learning algorithm that requires labeled sample data and was developed by Boser, Guyon, and Vapnik in 1992. SVM builds on concepts from earlier computational theories and can transform training data into a higher-dimensional space to capture non-linear patterns. The SVM classification yielded 16 positive, 40 neutral, and 71 negative sentiments. The accuracy was 67% and the margin of error 39%. Precision was 75% for positive predictions, 83% for neutral predictions, and 88% for negative predictions.

Siska Narulita; Sekarlangit Sekarlangit; Milka Putri Novianingrum

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi 2025 Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

Food allergies are medical conditions caused by specific immunological reactions triggered by exposure to certain foods. All age groups can experience food allergies, although children experience them more frequently than adults. Food ingredients or substances that can trigger allergies are known as allergens. This project attempts to determine whether or not a food contains allergens by applying the SVM data mining method to a public dataset of food products and allergens acquired from Kaggle. High accuracy, effective memory use, and the ability to handle non-normally distributed data are some of the benefits of the SVM method. Data collection is the first step in the research process. Data pre-processing, which includes data transformation, handling missing values, and copy objects, comes next, followed by validation. Three validation schemes were compared in this study: split validation with 90% training data and 10% testing data, split validation with an 80%–20% ratio, and 10-fold cross validation. The SVM method is applied after the dataset has passed validation, and the confusion matrix is used for the final evaluation step. SVM achieved an accuracy of 97.24% with 10-fold cross validation and 97.50% with split validation at a 90%–10% ratio of training to testing data. In contrast, split validation with an 80%–20% ratio achieved an accuracy of 98.75%.
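The three validation schemes compared in this abstract differ only in how sample indices are partitioned. The following sketch shows the partitioning logic without shuffling; the dataset size `n` and both helper names are hypothetical, since the abstract does not state the number of records or the tool used.

```python
def holdout_split(n_samples, test_ratio):
    """Split validation: the last `test_ratio` share of indices is held out."""
    n_test = round(n_samples * test_ratio)
    indices = list(range(n_samples))
    return indices[:n_samples - n_test], indices[n_samples - n_test:]

def k_fold_splits(n_samples, k):
    """k-fold cross validation: each fold serves as the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

n = 400  # hypothetical dataset size
train_90, test_10 = holdout_split(n, 0.10)
train_80, test_20 = holdout_split(n, 0.20)
folds = k_fold_splits(n, 10)
print(len(test_10), len(test_20), len(folds))
```

A single split evaluates the model on one held-out subset, whereas 10-fold cross validation averages over ten disjoint test folds, so accuracy figures from the two schemes are not measured on the same basis.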

Muhammad Fikry; Bustami Bustami; Ella Suzanna

Proceeding of the International Conference on Electrical Engineering and Informatics 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

This study conducts an exploratory data analysis combined with machine learning techniques to identify early signs of student depression. We investigated various factors affecting mental health among students, including sleep duration, dietary patterns, history of suicidal thoughts, family history of mental illness, and their relationships with depression across age groups and academic pressure. The study also examined the influence of gender on academic stress levels. Three machine learning models, namely Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), were used to predict depression. The models achieved accuracy rates of 84.97% for Random Forest, 84.85% for SVM, and 81.16% for KNN. The findings highlight the effectiveness of these models in predicting student depression and underscore the importance of targeted mental health interventions based on key factors influencing mental health among students.