(Aji Purwinarko, Kholiq Budiman, Arif Widiyatmoko, Fitri Arum Sasi, Wahyu Hardyanto)
- Volume: 12,
Issue: 1,
Citations: 0
Abstract:
Purpose: Breast cancer remains a significant cause of mortality among women, so accurate diagnostic methods are essential. Traditional classification models often lose accuracy when the data contain missing values and irrelevant features. This study improves breast cancer classification by combining the C4.5 algorithm with K-Nearest Neighbor (KNN) imputation and Relief feature selection, which improves both data quality and classification performance.
Methods: The Wisconsin Breast Cancer Database (WBCD) was used to evaluate the proposed method. KNN imputation filled in missing values, and Relief selected the most relevant features. The C4.5 algorithm was trained on train/test splits of 70:30, 80:20, and 90:10, and its performance was measured by accuracy, precision, recall, and F1-score.
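The preprocessing and evaluation steps described in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`knn_impute`, `relief_weights`, `binary_metrics`) are hypothetical, the toy rows are invented and are not WBCD records, the Relief variant visits every instance once instead of drawing a random sample, and C4.5 training itself is omitted.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    def dist(a, b):
        # Root-mean-square distance over features observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                # Only rows that actually observe feature j can donate a value.
                donors = [r for n, r in enumerate(rows)
                          if n != i and r[j] is not None]
                donors.sort(key=lambda r: dist(row, r))
                nearest = donors[:k]
                if nearest:
                    filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled

def relief_weights(X, y):
    """Basic two-class Relief; assumes complete data and >= 2 rows per class."""
    n, d = len(X), len(X[0])
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*X)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    weights = [0.0] * d
    for i in range(n):  # assumption: visit every instance, no random sampling
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for f in range(d):
            # Reward features that separate classes, penalise ones that
            # vary within the same class.
            weights[f] += (diff(X[i], miss, f) - diff(X[i], hit, f)) / n
    return weights

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Invented toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[1.0, 5.0], [2.0, 1.0], [1.5, None],
        [8.0, 4.0], [9.0, 2.0], [8.5, 3.0]]
labels = [0, 0, 0, 1, 1, 1]

filled = knn_impute(rows, k=2)
weights = relief_weights(filled, labels)
ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

In a full pipeline, the top-ranked features would be kept, the data split at one of the stated ratios, and a C4.5-style decision tree trained on the training portion before `binary_metrics` is applied to its test predictions.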
Result: The proposed method achieved a highest classification accuracy of 98.57%, exceeding several existing models: PSO-C4.5 (96.49%), EBL-RBFNN (98.40%), Gaussian Naïve Bayes (97.50%), and t-SNE (98.20%). The corresponding gains are 2.08, 0.17, 1.07, and 0.37 percentage points, respectively. These results indicate that the method handles missing values and selects relevant features effectively.
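The reported gains are absolute differences in accuracy, that is, percentage points rather than relative percentages. A short check of that arithmetic, using only the figures quoted in the abstract (the variable names are illustrative):

```python
# Accuracies (in %) as quoted in the abstract.
proposed = 98.57
baselines = {
    "PSO-C4.5": 96.49,
    "EBL-RBFNN": 98.40,
    "Gaussian Naive Bayes": 97.50,
    "t-SNE": 98.20,
}

# Absolute gain of the proposed method over each baseline, in percentage points.
gains = {name: round(proposed - acc, 2) for name, acc in baselines.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```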
Novelty: Unlike prior studies that addressed missing values and feature selection separately, this research integrates both techniques, enhancing classification accuracy and computational efficiency. The findings suggest that this approach provides a reliable breast cancer diagnosis method. Future work could explore deep learning integration and validation on larger datasets to improve generalizability.