SciRepID - Enhanced Named Entity Recognition Algorithm for Filipino Cultural and Heritage Texts

📅 06 January 2025
DOI: 10.62951/iceei.v1i2.29

Proceeding of the International Conference on Electrical Engineering and Informatics
Asosiasi Riset Teknik Elektro dan Informatika Indonesia (ARTEII)

📄 Abstract

Named Entity Recognition (NER) is a core natural language processing task that extracts named entities from unstructured text and classifies them into predefined categories. Existing NER methods perform well in general domains but often struggle in specialized contexts such as Filipino cultural and historical texts, owing to the unique linguistic features and diverse naming conventions of those texts. This research introduces an enhanced rule-based NER approach that addresses these challenges. At its core, the system uses the curated Corpus of Historical Filipino and Philippine English (COHFIE), which serves as both training and evaluation data. The approach builds on pattern-learning methods, incorporates character and token features, and uses positive and negative example sets. To enrich the classification process, we used the International Committee for Documentation – Conceptual Reference Model (CIDOC-CRM), a cultural heritage framework, to provide a more nuanced categorization of entities based on their historical and cultural significance. Evaluated against existing Filipino-based models (calamanCy and RoBERTa Tagalog), the enhanced model shows improvement in identifying entities related to Filipino culture (CUL) and historical terms (PER, ORG, LOC).
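
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: rules learned from positive examples, built from a character-level feature (token shape) and a token-level feature (the preceding word), and pruned by negative examples. The function names, the rule format, and the toy sentences are all assumptions made for illustration; they are not taken from the paper or from COHFIE.

```python
import re
from collections import Counter

def token_shape(token):
    """Character-level feature: coarse shape, e.g. 'Rizal' -> 'Xx', '1898' -> 'd'."""
    shape = re.sub(r"[A-ZÑ]", "X", token)
    shape = re.sub(r"[a-zñ]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)  # collapse repeated symbols

def rule_key(tokens, i):
    """Hypothetical rule format: (previous token lowercased, shape of current token)."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return (prev, token_shape(tokens[i]))

def learn_rules(positives, negatives):
    """Keep patterns seen in positive examples that never fire on a negative example."""
    blocked = {rule_key(tokens, i) for tokens, i in negatives}
    votes = {}
    for tokens, i, label in positives:
        key = rule_key(tokens, i)
        if key not in blocked:
            votes.setdefault(key, Counter())[label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

def tag(tokens, rules):
    """Label each token with the matching rule's label, or 'O' if no rule matches."""
    return [(tok, rules.get(rule_key(tokens, i), "O")) for i, tok in enumerate(tokens)]

# Toy data, invented for this sketch: (tokens, index of entity, label).
positives = [
    ("Si Rizal ay isinilang sa Calamba".split(), 1, "PER"),
    ("Si Bonifacio ang nagtatag".split(), 1, "PER"),
    ("Nagpunta siya sa Cavite".split(), 3, "LOC"),
    ("Ang Katipunan ay itinatag".split(), 1, "ORG"),
]
# Negative example: a capitalized word after "Ang" that is not an entity,
# so the over-general ("ang", "Xx") pattern gets pruned.
negatives = [
    ("Ang Lunes ay unang araw".split(), 1),
]

rules = learn_rules(positives, negatives)
tagged = tag("Si Mabini ay nagtungo sa Malolos".split(), rules)
print(tagged)
```

In this toy run the negative example removes the pattern that would have tagged any capitalized word after "Ang", while the patterns after "Si" and "sa" survive and generalize to unseen names. A real system would need far richer features and a category scheme such as the CIDOC-CRM-informed classes the abstract mentions.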

🔖 Keywords

Named Entity Recognition; Natural Language Processing; Filipino Corpus

ℹ️ Publication Information

Publication Date
06 January 2025
Volume / Number / Year
Volume 1, Number 2, 2025

📝 HOW TO CITE

Jhan Lou P Robantes; Andreo A Serrano, "Enhanced Named Entity Recognition Algorithm for Filipino Cultural and Heritage Texts," Proceeding of the International Conference on Electrical Engineering and Informatics, vol. 1, no. 2, Jan. 2025.
