Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages

Darnoto, Brian Rizqi Paradisiaca; Firmawan, Dony Bahtera

doi:10.62411/jcta.15975

Journal of Computing Theories and Applications, Vol. 3, No. 4 (2026). DOI: 10.62411/jcta.15975

Abstract

Sentiment analysis for Indonesian regional languages faces two persistent challenges: labeled training data is extremely limited for most regional varieties, and transformer models pre-trained on Bahasa Indonesia do not generalize reliably to languages with substantially different morphological structures. Prior work on the NusaX benchmark has primarily relied on direct fine-tuning, treating each regional language independently and without exploiting linguistic proximity between related languages as a transfer signal. This paper proposes Language-Similarity-Guided Transfer (LSGT), a sequential fine-tuning strategy that first adapts a pre-trained model to a pivot language selected using character trigram similarity, followed by fine-tuning on the target language. Four transformer models are evaluated across all 12 NusaX languages using the official train/validation/test splits: IndoBERT, NusaBERT, mBERT, and XLM-R. Performance is evaluated using four metrics: accuracy, macro F1, macro precision, and macro recall. Experimental results show that LSGT improves macro F1 in 44 of 48 model-language combinations, demonstrating that the fine-tuning strategy itself is a major factor in low-resource cross-lingual sentiment classification. XLM-R benefits most strongly from LSGT, achieving an average improvement of +0.137 macro F1 and a peak gain of +0.298 on Madurese. SHAP-based token attribution analysis further reveals that predictions rely heavily on named entities and domain-specific nouns rather than sentiment-bearing vocabulary, indicating a dataset-level bias inherited from the original SmSA corpus and propagated through the NusaX translation pipeline.
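The pivot-selection step described in the abstract can be illustrated with a minimal sketch. The abstract names character trigram similarity but does not state the exact measure, so the cosine similarity over trigram counts used here is an assumption, as are the function names (`char_trigrams`, `cosine_similarity`, `select_pivot`) and the toy strings, which are placeholders rather than NusaX data.

```python
from collections import Counter
from math import sqrt


def char_trigrams(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count profiles (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_pivot(target: str, corpora: dict[str, str]) -> str:
    """Return the non-target language whose trigram profile is closest to the target's."""
    target_profile = char_trigrams(corpora[target])
    scores = {
        lang: cosine_similarity(target_profile, char_trigrams(text))
        for lang, text in corpora.items()
        if lang != target
    }
    return max(scores, key=scores.get)


# Hypothetical toy strings standing in for per-language training text.
toy_corpora = {
    "target": "makan nasi goreng enak",
    "near": "makan nasi goreng sedap",
    "far": "xyz qrs tuv wxy",
}
pivot = select_pivot("target", toy_corpora)
```

Under LSGT as summarized above, the model would then be fine-tuned first on the selected pivot language and afterwards on the target language. That training loop is omitted here because the abstract does not give its hyperparameters.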

Keywords

Indonesian regional languages; Low-resource NLP; NusaX; Pre-trained language models; Sentiment analysis; SHAP explainability; Transfer learning; XLM-R

How to Cite

Darnoto, et al. (2026). Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages. Journal of Computing Theories and Applications, 3(4). https://doi.org/10.62411/jcta.15975

Darnoto, Brian Rizqi Paradisiaca; Firmawan, Dony Bahtera, "Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages," Journal of Computing Theories and Applications, vol. 3, no. 4, 2026.

Darnoto, Brian Rizqi Paradisiaca; Firmawan, Dony Bahtera. "Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages." Journal of Computing Theories and Applications, vol. 3, no. 4, 2026.

Darnoto, Brian Rizqi Paradisiaca; Firmawan, Dony Bahtera. "Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages." Journal of Computing Theories and Applications 3, no. 4 (2026).

Darnoto, et al. (2026) 'Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages', Journal of Computing Theories and Applications, 3(4). doi: 10.62411/jcta.15975.

Darnoto, Brian Rizqi Paradisiaca; Firmawan, Dony Bahtera. Language-Similarity-Guided Transfer Fine-Tuning of Pre-trained Transformer Models for Sentiment Analysis Across 12 Indonesian Regional Languages. Journal of Computing Theories and Applications. 2026;3(4).
