📄
Abstract
This study presents a comprehensive comparative analysis of four traditional machine learning algorithms Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine for Android malware detection using the preprocessed TUANDROMD dataset comprising 4,465 instances and 241 features representing both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in managing imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to ensure balanced training data and fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and models were assessed using key performance metrics including accuracy, precision, recall, F1-score, and ROC AUC. The findings revealed that the proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998 surpassing state-of-the-art approaches. These results affirm the superior predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when integrated with class-balancing techniques, provide reliable and efficient malware detection across imbalanced datasets. For future research, the study recommends exploring advanced hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or other meta-heuristic optimization techniques to further enhance detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.