An Enhancement of Jiang, Z., et al.’s Compression-Based Classification Algorithm Applied to News Article Categorization

📅 14 January 2025
DOI: 10.62951/iceei.v1i1.35

Proceeding of the International Conference on Electrical Engineering and Informatics
Asosiasi Riset Teknik Elektro dan Informatika Indonesia (ARTEII)

📄 Abstract

This study enhances Jiang et al.'s compression-based classification algorithm by addressing its limitations in detecting semantic similarities between text documents. The proposed improvements focus on unigram extraction and optimized concatenation, removing the need to compress entire documents. By compressing only the extracted unigrams, the algorithm mitigates the sliding-window limitation inherent to gzip, improving compression efficiency and similarity detection. The optimized concatenation strategy replaces direct concatenation with the union of unigrams, reducing redundancy and improving the accuracy of Normalized Compression Distance (NCD) calculations. Experimental results across datasets of varying sizes and complexities show an average accuracy improvement of 5.73%, with gains of up to 11% on datasets containing longer documents. The improvements are more pronounced on datasets with high label diversity and complex text structures. The method achieves these results while maintaining computational efficiency, making it suitable for resource-constrained environments. The study thus offers a scalable approach to text classification in which lightweight preprocessing yields more efficient compression, which in turn enables more accurate classification.
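
The pipeline described in the abstract (extract unigrams, compress them, and use the union of two documents' unigrams in place of direct concatenation when computing NCD) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenization rule, the sorted space-joined serialization, the 1-nearest-neighbor step, the toy training data, and all function names are assumptions made for the example. It uses the standard NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the gzip-compressed length.

```python
import gzip
import re


def unigrams(text):
    """Return the set of distinct lowercase word tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compressed_len(tokens):
    """gzip-compressed size in bytes of the tokens, sorted and space-joined.

    Sorting is an assumption made here so the result is deterministic.
    """
    data = " ".join(sorted(tokens)).encode("utf-8")
    return len(gzip.compress(data))


def ncd_unigram(doc_a, doc_b):
    """NCD where the 'concatenation' term is the union of the two unigram sets."""
    ua, ub = unigrams(doc_a), unigrams(doc_b)
    ca, cb = compressed_len(ua), compressed_len(ub)
    cab = compressed_len(ua | ub)  # union instead of direct concatenation
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query, labeled_docs):
    """Hypothetical 1-nearest-neighbor step: label of the closest training doc."""
    return min(labeled_docs, key=lambda pair: ncd_unigram(query, pair[0]))[1]


# Toy data, invented purely for illustration.
train = [
    ("stocks fell as the bank raised interest rates", "business"),
    ("the team won the game in overtime", "sports"),
]
query = "the bank cut interest rates and stocks rose"
print(classify(query, train))
```

Because repeated words collapse into a single unigram, two documents with the same vocabulary get a distance of exactly zero under this sketch, regardless of word order or length. That reflects the redundancy reduction the abstract attributes to the union step, at the cost of discarding word order and frequency.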

🔖 Keywords

Classification; Compression; News Article; Preprocessing; Unigrams

ℹ️ Publication Information

Publication Date
14 January 2025
Volume / Issue / Year
Volume 1, Issue 1, 2025

📝 HOW TO CITE

Cid Antonio F Masapol; Sean Lester C Benavides; Jonathan C Morano; Khatalyn E Mata, "An Enhancement of Jiang, Z., et al.’s Compression-Based Classification Algorithm Applied to News Article Categorization," Proceeding of the International Conference on Electrical Engineering and Informatics, vol. 1, no. 1, Jan. 2025.
