SciRepID - Scientific Publication Search

Publication Search

41,520 articles from 397 journals · 1,447 citations tracked

Showing 1-20 of 43

Masari, Maryam Sufiyanu; Danladi, Maiauduga Abdullahi; Onyinye, Ilori Loretta; Tohomdet, Loreta Katok

Journal of Computing Theories and Applications 2026 Universitas Dian Nuswantoro

This study presents a comprehensive comparative analysis of four traditional machine learning algorithms (Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine) for Android malware detection using the preprocessed TUANDROMD dataset, which comprises 4,465 instances and 241 features representing both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in managing imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to ensure balanced training data and fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and models were assessed using key performance metrics including accuracy, precision, recall, F1-score, and ROC AUC. The findings revealed that the proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998, surpassing state-of-the-art approaches. These results affirm the superior predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when integrated with class-balancing techniques, provide reliable and efficient malware detection across imbalanced datasets. For future research, the study recommends exploring advanced hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or other meta-heuristic optimization techniques to further enhance detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.

Purnomo, Rosyana Fitria; Yodhi Yuniarthe; Hilda Dwi Yunita; Fatimah Fahurian +1 more

Jurnal Elektronika dan Komputer 2026 STEKOM PRESS

Detection and identification of plant diseases is critical to the success and efficiency of agricultural production. Plant disease outbreaks are becoming more frequent throughout the world, and the presence of these diseases in cultivated plants has a significant impact on productivity. Researchers are therefore focusing on developing effective and reliable plant disease detection methods, so that farmers can use early detection to minimize future losses. This article discusses machine learning approaches such as decision trees, K-nearest neighbors, naive Bayes, support vector machines (SVM), and random forests for detecting coffee leaf diseases using leaf images. These classifiers were studied and compared to determine the most suitable plant disease prediction model with the highest accuracy. Compared with the other classification algorithms, the SVM algorithm achieves the highest accuracy, 99.75%. As a preventive measure, farmers can use the trained models to quickly detect and classify new diseases in images.

Eni Rohaini; Gunardi, Gunardi; Nurhayati Nurhayati; Jasmir Jasmir; Zahra Prisdian Tiararosa

Prosiding Seminar Nasional Ilmu Teknik 2025 Asosiasi Riset Ilmu Teknik Indonesia

Imbalanced data remains a significant issue in heart disease classification using machine learning, as it tends to cause models to overestimate the majority class while ignoring minority classes with high clinical value. This can lead to a decrease in accuracy and in the model's ability to detect disease cases. Therefore, this study aims to assess the effectiveness of oversampling techniques, namely Random Oversampling and Synthetic Minority Oversampling Technique (SMOTE), in improving the performance of the K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF) algorithms. The dataset used comes from Kaggle and consists of 918 records with 12 attributes representing patient information related to heart disease prediction. The research stages include data preprocessing, baseline model testing, and re-evaluation using the two oversampling methods. Experimental results show that oversampling can improve the performance of all algorithms. KNN achieved the best results with SMOTE, with an accuracy of 72.98% and an F1-score of 75.39%. In the Naive Bayes algorithm, both oversampling techniques produced relatively stable performance, with the highest F1-score of 73.56% using SMOTE. Meanwhile, Random Forest performed best when combined with Random Oversampling, with an accuracy of 79.19% and an F1-score of 81.51%. These findings confirm that the success of data balancing techniques is strongly influenced by the characteristics of the classification algorithm used, and provide a practical contribution in determining strategies for handling imbalanced data in health research.
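As a rough illustration of the two balancing techniques named in this abstract, the sketch below contrasts random oversampling (duplicating minority rows) with a SMOTE-style step (interpolating between a minority row and one of its nearest minority neighbors). It is a simplified, standard-library-only sketch and not the study's code; the function names, the toy data, and `k=2` are assumptions made for illustration.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Duplicate randomly chosen minority rows (random oversampling)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """Create synthetic rows on the segment between a minority row and
    one of its k nearest minority neighbors (SMOTE-style interpolation)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


rng = random.Random(0)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]  # toy data (assumed)
duplicates = random_oversample(minority, 3, rng)
synthetic = smote_like(minority, 3, k=2, rng=rng)
```

Random oversampling only repeats existing rows, whereas the interpolation step produces new points that stay inside the region spanned by the minority class. That difference is one plausible reason the two techniques can suit different classifiers.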

Achhmad Agam; Supatman

Jurnal Elektronika dan Komputer 2025 STEKOM PRESS

Manual quality assessment of Platelet Concentrate (TC) is highly subjective and inconsistent, necessitating an objective, automated classification system. This study aims to develop a computationally efficient, low-cost model for TC quality classification using Histogram Features extracted from grayscale images combined with the K-Nearest Neighbor (KNN) algorithm. The methodology employed critical preprocessing steps, including StandardScaler for normalization and SMOTE for balancing the training data, followed by optimization across K=1 to K=30. The optimal model achieved a maximum accuracy of 69.23% at K=6, with an F1-Score of 71.43%, confirming robust performance on the imbalanced testing set. The results validate the effectiveness of the Histogram-KNN approach as a consistent and reliable decision support system for rapid TC quality screening in resource-limited settings.

Hamza, Ali; Hussain, Wahid; Iftikhar, Hassan; Ahmad, Aziz; Shamim, Alamgir Md

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The rapid growth of open-source software (OSS) in machine learning (ML) has intensified the need for reliable, automated methods to assess project quality, particularly as OSS increasingly underpins critical applications in science, industry, and public infrastructure. This study evaluates the effectiveness of a diverse set of machine learning and deep learning (ML/DL) algorithms for classifying GitHub OSS ML projects as engineered or non-engineered using a SMOTE-enhanced and explainable modeling pipeline. The dataset used in this research includes both numerical and categorical attributes representing documentation, testing, architecture, community engagement, popularity, and repository activity. After handling missing values, standardizing numerical features, encoding categorical variables, and addressing the inherent class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), seven different classifiers—K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), Support Vector Machine (SVM), and a Deep Neural Network (DNN)—were trained and evaluated. Results show that LR (84%) and DNN (85%) outperform all other models, indicating that both linear and moderately deep non-linear architectures can effectively capture key quality indicators in OSS ML projects. Additional explainability analysis using SHAP reveals consistent feature importance across models, with documentation quality, unit testing practices, architectural clarity, and repository dynamics emerging as the strongest predictors. These findings demonstrate that automated, explainable ML/DL-based quality assessment is both feasible and effective, offering a practical pathway for improving OSS sustainability, guiding contributor decisions, and enhancing trust in ML-based systems that depend on open-source components.

Zehy Fadia; Yani Maulita; Husnul Khair

Merkurius : Jurnal Riset Sistem Informasi dan Teknik Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Anxiety disorders are common mental health problems in society, often unrecognized by the sufferer. Identifying the type of anxiety disorder and its influencing factors is crucial for proper treatment. This research aims to apply the K-Nearest Neighbor (K-NN) method in identifying types of anxiety disorders based on influencing factors, focusing on patient data from Sylvani Hospital, Binjai. The K-NN method was chosen because of its ability to classify based on data proximity. This study used medical record data of patients with anxiety disorders, which were processed using MATLAB and Microsoft Excel software. The results show that the K-NN method is effective in identifying types of anxiety disorders, with a high level of accuracy, especially in the identification of Panic Disorder (K05) and Social Anxiety Disorder (K03). The use of MATLAB simplified the identification process by automating results, while data processing in Excel improved classification accuracy. This study concludes that the K-NN method can be an effective alternative in identifying anxiety disorder types based on the factors that influence them. It is recommended for future research to involve more variables and mental health experts for a more comprehensive validation of the results.

Wahyu Saputro

Mars: Jurnal Teknik Mesin, Industri, Elektro Dan Ilmu Komputer 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

Human Resource Management (HRM) plays a strategic role in improving organizational competitiveness through proper management of employee placement, training, and performance evaluation. To support these goals, a predictive model is needed that can provide an accurate picture of employee performance. This study uses an HRM dataset of 1,200 records and applies several classification algorithms to compare their effectiveness, namely J48 (C4.5), Random Forest, Naive Bayes, K-Nearest Neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM). To improve the results, this study uses resampling techniques and attribute selection with a correlation attribute evaluation approach, so that the class distribution is more balanced and model accuracy increases. In testing, the J48 decision tree algorithm showed the best performance, with an accuracy of 95.41%, a kappa value of 0.8925, a mean absolute error (MAE) of 0.0432, a precision of 0.955, a recall of 0.954, and an area under the ROC curve of 0.964. These findings indicate that J48 has excellent predictive capabilities compared to the other algorithms. Furthermore, this study found that the most influential variables in determining employee performance include the percentage of the last salary increase (EmpLast Salary Hike Percent), the level of work environment satisfaction (Emp Environment Satisfaction), the length of time since the last promotion (Years Since Last Promotion), and experience in the current role (Experience Years in Current Role). Overall, the results indicate that the C4.5 algorithm combined with resampling can be an effective solution for building an employee performance prediction system. This model therefore has the potential to be a strong basis for managerial decision-making, particularly in designing HR development strategies and policies to improve organizational performance.

Muhammad Akmal Ar Rasid; Catur Pranomo; Elkin Rilvani

Bridge : Jurnal Publikasi Sistem Informasi dan Telekomunikasi 2025 Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to utilize data mining techniques, specifically the K-Nearest Neighbors (KNN) algorithm, to classify leaf diseases in sugarcane (Saccharum officinarum). Early and accurate detection of leaf disease types is a crucial step in prevention and control strategies, thereby reducing potential crop losses caused by pathogen attacks. Leaf diseases in sugarcane, such as leaf scald, rust, and mosaic virus, are known to affect photosynthesis, inhibit growth, and reduce the quality and quantity of sugarcane produced. The classification process in this study was carried out through image analysis of infected sugarcane leaves, where features such as color, texture, and shape were extracted using digital image processing techniques. The KNN algorithm was chosen because of its non-parametric nature, ease of implementation, and its ability to provide accurate classification results even with limited data size. The working principle of KNN is to determine the class of a new sample based on the majority class of its k nearest neighbors in the feature space, making it very suitable for the case of leaf disease image classification. In addition to building a classification model, this study also examines disease prevention strategies based on the identification results. These strategies include the use of disease-resistant sugarcane varieties, the implementation of appropriate planting patterns, land moisture management, regular plantation sanitation, and the measured and environmentally friendly use of pesticides or fungicides. Model performance evaluation was conducted using accuracy, precision, recall, and F1-score metrics to assess model effectiveness across various data scenarios. The results of this study are expected to not only contribute to the development of decision support systems for farmers and related parties but also support the application of artificial intelligence-based technology in the agricultural sector.
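The working principle described in this abstract (assign a new sample the majority class of its k nearest neighbors in feature space) can be sketched as follows. This is a minimal standard-library sketch, not the study's implementation; the two-dimensional feature vectors and the class labels are hypothetical stand-ins for extracted color, texture, and shape features.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    k_labels = [label for _, label in distances[:k]]
    # most_common(1) yields the label with the highest vote count.
    return Counter(k_labels).most_common(1)[0][0]


# Hypothetical feature vectors (e.g. a color score and a texture score).
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
labels = ["rust", "rust", "rust", "mosaic", "mosaic", "mosaic"]

print(knn_predict(points, labels, (0.2, 0.2), k=3))  # rust
print(knn_predict(points, labels, (0.9, 0.9), k=3))  # mosaic
```

An odd `k` avoids ties in two-class problems; with more classes a tie-breaking rule would also be needed, which this sketch leaves to `Counter`'s ordering.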

Angdresey, Apriandy; Sitanayah, Lanny; Rumpesak, Zefanya Marieke Philia; Ooi, Jing-Quan

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

Electricity has emerged as an essential requirement in modern life. As demand escalates, electricity costs rise, making wastefulness a drain on financial resources. Consequently, forecasting electricity usage can enhance our management of consumption. This study presents an IoT-based monitoring and forecasting system for electricity consumption. The system comprises two NodeMCU micro-controllers, a PZEM-004T sensor for collecting real-time power data, and three relays that regulate the current flow to three distinct electrical appliances. The data gathered is transmitted to a web application utilizing the k-Nearest Neighbor (k-NN) algorithm to forecast future electricity usage based on historical patterns. We evaluated the system's performance using four weeks of electricity consumption data. The results indicated that predictions were most accurate when the user’s daily consumption pattern remained stable, achieving a Mean Absolute Error (MAE) of approximately 1 watt and a Mean Absolute Percentage Error (MAPE) ranging from 1% to 1.7%. Additionally, predictions were notably precise during the early morning hours (3:00 AM to 8:00 AM) when k=6 was employed. This study demonstrates the effectiveness of integrating IoT-based systems with machine learning for real-time energy monitoring and forecasting. Furthermore, it emphasizes the application of data mining techniques within embedded IoT environments, providing valuable insights into the implementation of lightweight machine learning for smart energy systems.
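For readers unfamiliar with the two error measures reported here, the sketch below shows how MAE and MAPE are conventionally computed. The readings are invented purely for illustration and are not taken from the study; the function names are likewise assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same unit as the data (e.g. watts)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the actual value, in percent.

    Assumes no actual value is zero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical power readings in watts (not data from the study).
actual_watts = [100.0, 200.0, 50.0, 400.0]
predicted_watts = [110.0, 190.0, 45.0, 420.0]

mae = mean_absolute_error(actual_watts, predicted_watts)
mape = mean_absolute_percentage_error(actual_watts, predicted_watts)
```

MAE is expressed in the unit of the measurement, while MAPE is scale-free, which is why the two are often reported together.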

Dina Amalia Putri; Naza Sefti Prianita; Elkin Rilvani

Jupiter: Publikasi Ilmu Keteknikan Industri, Teknik Elektro dan Informatika 2025 Asosiasi Riset Ilmu Teknik Indonesia

Whether students graduate on time is an important indicator of the quality and effectiveness of higher education in universities. The on-time graduation rate not only affects institutional accreditation but is also a concern for campus management in designing learning strategies and academic guidance. This study aims to apply and compare two classification algorithms in data mining, namely C4.5 and K-Nearest Neighbor (KNN), in predicting whether students graduate on time. Predictions are made based on academic attributes such as Grade Point Average (GPA), number of credits earned, and Semester Grade Point Average (IPS) as input variables. The method used in this study is Knowledge Discovery in Databases (KDD), which includes data selection, preprocessing, transformation, data mining, and evaluation of results. The study was conducted using the RapidMiner tool, with a dataset of 279 Informatics Study Program students from the 2015 to 2019 intakes. The data was classified into two categories: "graduated on time" and "not graduated on time". The test results showed that the KNN algorithm performed better than C4.5. KNN produced an accuracy of 76.08%, with a precision of 73.11% and a recall of 41.92%. Meanwhile, the C4.5 algorithm produced an accuracy of 73.49%, with a precision of 64.62% and a recall of 41.89%. This difference in accuracy indicates that KNN is more effective in capturing patterns in the data and providing more accurate predictions in this context. Thus, the KNN algorithm can be considered the more suitable method to help universities predict on-time graduation, enabling early intervention for students at risk of late graduation. This research also contributes to the development of data mining-based academic decision support systems in higher education.

Prashanthan, Amirthanathan

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

The study presents a comprehensive framework for optimizing customer retention budget by integrating clustering, classification, and mathematical optimization techniques. The study begins with the IBM Telco dataset, which is prepared through data cleansing, encoding, and scaling. In the preliminary phase, customer segmentation is performed using K-Means clustering, with k = 3 and k = 4 identified as optimal based on the elbow method and Silhouette score. The configurations produced three (Premium, Standard, Low) and four (Premium, Standard Plus, Standard, Low) customer segments based on purchase preferences, which served as input features for churn prediction. In the second phase, the dataset was divided into training and test sets in an 80:20 ratio, followed by data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN). Multiple classification algorithms were evaluated, including Naive Bayes (NB), Random Forest (RF), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) using F1-score as the performance metric. CatBoost and LightGBM, with k values of 3 and 4, respectively, were the highest-performing classification models, with only minimal differences in performance. Ultimately, customer segmentation established customer prioritization, whereas churn prediction assessed customer churn likelihood. Four distinct configurations were assessed utilizing mixed-integer linear programming (MILP) to optimise retention budget allocation within uniform budget constraints, discount amounts, and churn thresholds. In both the k = 3 and k = 4 scenarios, CatBoost surpassed LightGBM, with CatBoost at k = 3 effectively discounting 66% of at-risk consumers across all three segments, hence improving the intervention's efficacy and budget allocation, making it the ideal choice for maximizing customer retention. The results demonstrate the importance of segmentation in enhancing retention budgeting and budget optimization, particularly concerning parameter sensitivity.

Eugenea Chiquita Zahrani Assyarif; I Kadek Dwi Nuryana

Modem : Jurnal Informatika dan Sains Teknologi 2025 Asosiasi Profesi Telekomunikasi Dan Informatika Indonesia

This study aims to conduct customer segmentation and develop a classification model to predict the clusters of new customers at Monex Toys Abadi Bekasi, a micro, small, and medium enterprise (MSME). Segmentation was performed using the K-Means Clustering algorithm, incorporating parameters such as Recency, Frequency, Monetary (RFM), purchased products, payment methods, shipping cost discounts, and the total number of products purchased by customers. The segmentation results revealed two clusters: (1) Discount Hunters and (2) Loyal Customers. Subsequently, a classification process was conducted to predict customer clusters using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. Evaluation results indicated that all models achieved high accuracy exceeding 98%. The best-performing model was obtained with SVM using a 70:30 data split, achieving an accuracy of 98.81%. This classification model was then implemented into a Streamlit-based cluster prediction application, enabling users to identify customer segments in real-time. The findings of this research are expected to assist MSMEs in understanding customer behavior, enhancing service quality, and supporting more effective marketing strategies.

Rayga Rayyan; Marice Simarmata

Jurnal Riset Ilmu Hukum, Sosial dan Politik 2025 Asosiasi Peneliti dan Pengajar Ilmu Hukum Indonesia

The utilization of Artificial Intelligence (AI) in healthcare services and medical diagnosis in Indonesia has grown rapidly alongside the digital transformation of the health sector. AI technology has been employed to improve service efficiency, accelerate diagnostic processes, and enhance disease detection accuracy, particularly through medical imaging and ECG data analysis. Algorithms such as K-Nearest Neighbor (KNN) and Chi-Square have shown effectiveness in heart disease classification. However, despite its benefits, AI implementation presents legal challenges. The absence of specific regulations regarding legal liability in cases of AI-based diagnostic errors creates uncertainty for both medical professionals and patients. Additionally, the lack of national standards, weak patient data protection, and digital literacy gaps present significant obstacles. Adaptive policies, the establishment of dedicated regulations, and collaboration between government, medical practitioners, technology developers, and academics are essential to develop a legal framework that accommodates AI advancements responsibly. With clear legal certainty, AI technology can be optimally utilized to support more inclusive and high-quality healthcare services.

Iorzua, Joseph Tersoo; Moses, Timothy; Eke, Christopher Ifeanyi; Agushaka, Ovre Jeffery; Kwaghtyo, Dekera Kenneth +1 more

Journal of Computing Theories and Applications 2025 Universitas Dian Nuswantoro

Learners are continually faced with choosing appropriate courses or making career choices due to increased educational opportunities. The emergence of machine learning-based course and career recommender systems has the potential to address this issue, offering personalized course recommendations tailored to individual learning pathways, preferences, and learning history. Despite significant advances in this area, previous research has not collectively examined optimization and feature engineering techniques together with practical deployment environments. Furthermore, previous research has rarely synthesized how these technical components help students choose appropriate courses and careers. This systematic review was carried out to investigate the current state of machine learning-based course and career recommender systems, focusing on key elements such as primary data sources, feature engineering methods, algorithms, optimization techniques, evaluation metrics, and the environments where the existing course recommendation models are deployed. The PRISMA method for conducting a systematic review was used to select studies that met the inclusion and exclusion criteria. The study findings show significant reliance on interpretable and traditional machine learning algorithms, such as K-Nearest Neighbor and Random Forest, to develop recommender models. Feature engineering remains basic, as most studies rely on normalization, while optimization processes are often underreported. Evaluation metrics also varied widely, impeding comparability, and most of the recommender models are deployed in e-learning environments, leaving traditional learning environments underrepresented. Furthermore, the findings identified issues including data sparsity and diversity, data security and privacy, and changes in learner preferences that may affect the performance of recommender systems. The study recommends that further research use standardized optimization methods, automated domain-informed feature engineering frameworks, and benchmark and annotated datasets to develop models that prioritize learners' success and educational relevance.

Setiawan, Dita; Ali Muhammad; Siti Herawati Fransiska Dewi

Teknik: Jurnal Ilmu Teknik dan Informatika 2025 LPPM Sekolah Tinggi Ilmu Ekonomi - Studi Ekonomi Modern

Coronary heart disease (CHD) remains a leading cause of mortality worldwide. Early detection is essential to reduce complications and improve patient outcomes. This study aims to develop a classification model using machine learning algorithms to predict CHD risk based on clinical symptoms. The dataset used is the Cleveland Heart Disease dataset from the UCI Machine Learning Repository, consisting of 303 patient records with 14 clinical features. The preprocessing stage involved handling missing values, normalizing features, and transforming categorical variables. Four classification algorithms were applied: K-Nearest Neighbors (K-NN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Each model was trained using stratified 10-fold cross-validation to ensure generalizability. Evaluation using accuracy, precision, recall, F1-score, and ROC-AUC metrics showed that the Random Forest algorithm achieved the highest performance with 87.2% accuracy. Feature importance analysis indicated that chest pain type, resting blood pressure, cholesterol, and ST depression were the most influential indicators. These results demonstrate that machine learning, particularly Random Forest, can effectively support early diagnosis of CHD in clinical settings and has the potential to be integrated into clinical decision support systems (CDSS).

Fikri Muhamad Fahmi; Budiman Budiman; Nur Alamsyah

International Journal of Science and Mathematics Education 2025 Asosiasi Riset Ilmu Matematika dan Sains Indonesia

Given the increasing prevalence of mental health challenges in digital work settings, especially among IT remote workers, early detection mechanisms have become critically important. This study aims to improve the prediction accuracy of mental health conditions among IT remote workers by integrating feature engineering techniques within machine learning models. Five algorithms were evaluated: Random Forest, Logistic Regression, K-Nearest Neighbors, Decision Tree, and Naive Bayes. The Random Forest model achieved the best performance, with 83% accuracy, 83% precision, 100% recall, and a 90% F1-score, followed closely by Logistic Regression with 82% accuracy. Overall, the results demonstrate the feasibility of applying machine learning to support the early detection of mental health risks, offering a strong foundation for future research in predictive analytics and the development of intelligent support systems within digital work environments.
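The F1-score is the harmonic mean of precision and recall, so the reported figures can be sanity-checked with a few lines. The sketch below plugs in the rounded precision (83%) and recall (100%) from the abstract. The result is about 0.91, close to the reported 90%; a small gap is expected because the inputs are themselves rounded. The function name is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract above.
f1_from_reported = f1_score(0.83, 1.00)
print(round(f1_from_reported, 3))  # 0.907
```

Note that perfect recall combined with precision equal to accuracy is the pattern a classifier shows when it labels every sample as positive, so figures like these are worth reading alongside the class distribution.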

M. Bimo Prasetyo; Dwi Oktarina

Merkurius : Jurnal Riset Sistem Informasi dan Teknik Informatika 2025 Asosiasi Riset Teknik Elektro dan Informatika Indonesia

The online gaming industry continues to grow rapidly in Indonesia, with many users purchasing digital items through third-party top-up services such as Pitopup.com. One of the main challenges faced by Pitopup.com is the difficulty of classifying the sales of each available game item. This research applies the K-Nearest Neighbor (KNN) method to predict the sales category of each game item, with the aim of improving stock efficiency. The dataset was obtained from historical sales data on Pitopup.com from June to September 2024. The research stages include data processing, normalization using Min-Max Scaling, data transformation using label encoding, splitting the data into training and test sets at a ratio of 80:20, and evaluating the model with a confusion matrix. The test results show that the KNN algorithm is able to classify game item sales on the Pitopup.com website with 100% accuracy in each category: marketable, moderately sellable, and not sellable.
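The preprocessing stages listed in this abstract (Min-Max scaling, label encoding, and an 80:20 split) can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the column values, the game names, the helper function names, and the fixed seed are all hypothetical and do not come from the study.

```python
import random


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def label_encode(categories):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {name: index for index, name in enumerate(sorted(set(categories)))}
    return [mapping[name] for name in categories], mapping


def train_test_split(rows, train_ratio=0.8, seed=42):
    """Shuffle a copy of rows and cut it at the training ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


units_sold = [5, 20, 50, 35, 10, 0, 45, 25, 15, 40]         # hypothetical
games = ["A", "B", "A", "C", "B", "C", "A", "B", "C", "A"]  # hypothetical

scaled = min_max_scale(units_sold)
encoded, mapping = label_encode(games)
rows = list(zip(scaled, encoded))
train, test = train_test_split(rows)
```

Scaling matters for KNN in particular because the algorithm relies on distances, so an unscaled feature with a large numeric range would otherwise dominate the neighbor search.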

Edhy Poerwandono; M. Endang Taufik

Router : Jurnal Teknik Informatika dan Terapan 2025 Asosiasi Profesi Telekomunikasi dan Informatika Indonesia

Because of the wide variety of flower types and the effort of keeping track of each one, plant lovers and cultivators find it difficult to determine the type of a flower, and identification takes a very long time when relying only on the five senses. Applying the K-Nearest Neighbor algorithm with color and texture feature extraction helps image processing identify flowers more easily and in less time. Flower identification was carried out successfully, with the highest accuracy of 71% obtained at K=7.

Ujianto, Nur Tulus; Gunawan; Fadillah, Haris; Fanti, Azizah Permata; Saputra, Aryan Dandi +1 more

IT-Explore: Jurnal Penerapan Teknologi Informasi dan Komunikasi 2025 Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

This study aims to optimize the implementation of the K-Nearest Neighbors (K-NN) algorithm for medical image classification by focusing on selecting the optimal K parameter and applying dimensionality reduction techniques to improve accuracy and efficiency. The data used was sourced from public medical image repositories such as The Cancer Imaging Archive (TCIA) and Medical Image Analysis datasets, covering various diseases, including brain tumors, lung cancer, and kidney lesions. The research process involves data collection, data preprocessing, dimensionality reduction using Principal Component Analysis (PCA), applying the K-NN algorithm with Euclidean, Minkowski, and Cosine distance metrics, and performance evaluation using accuracy, precision, recall, and F1-score. Experimental results demonstrate that K=5 with the Euclidean distance metric provides the best performance, achieving an accuracy of 90%. Additionally, PCA effectively reduces computational time by 30% without significantly compromising accuracy. This study proves that K-NN is an effective method for medical image classification. However, further research is needed to integrate K-NN with deep learning models to enhance performance and feature extraction capabilities.
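The three distance metrics compared in this abstract have standard definitions, sketched below for plain numeric vectors. The function names and the default Minkowski order `p=3` are assumptions; the abstract does not state which order was used.

```python
import math


def euclidean(a, b):
    """Straight-line distance (Minkowski distance of order 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p=3):
    """Generalized distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(euclidean((0, 0), (3, 4)))        # 5.0
print(minkowski((0, 0), (3, 4), p=1))   # 7.0
print(cosine_distance((1, 0), (0, 1)))  # 1.0
```

Euclidean and Minkowski distances depend on vector magnitude, while cosine distance depends only on direction, so the choice interacts with how the image features are scaled before and after PCA.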