JIMR - Journal of International Multidisciplinary Research - Vol. 1 Issue. 1 (2023)

Preliminary Evaluation of Gaussian Naive Bayes for Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter

Tri Pratiwi Handayani, Wahyudin Hasyim, Nursetia Wati



Abstract

Automatic detection of hate speech and abusive language is crucial for combating online toxicity. This study evaluates Gaussian Naive Bayes for multi-label classification of hate speech and abusive language on Indonesian Twitter, where the labels cover the target, category, and level of hate speech. We combined TF-IDF features with contextual BERT embeddings. The model achieved balanced performance on general hate speech and detected non-abusive language well. However, it handled imbalanced data and specific hate speech types poorly: across labels, the classifier consistently favored the majority class (non-hateful/non-abusive) and struggled most with less frequent labels such as HS_Gender and HS_Physical. This suggests difficulty detecting less frequent but potentially severe hate speech, likely due to limited training data. Overall accuracy and F1-scores indicate that, while Gaussian Naive Bayes is efficient, it lacks robustness for nuanced multi-label classification on imbalanced datasets. Alternative approaches should therefore be explored for detecting specific and less frequent hate speech.
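The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary-relevance scheme (one independent Gaussian Naive Bayes model per label), uses TF-IDF features only (the BERT embeddings are omitted, and how the two feature sets were combined is not specified here), and runs on invented placeholder sentences. The label names `HS` and `Abusive`, the `var_floor` smoothing constant, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from whitespace-tokenized docs."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(doc, vocab, idf):
    """Dense TF-IDF vector; tokens outside the vocabulary are ignored."""
    toks = doc.lower().split()
    tf = Counter(toks)
    return [tf[t] / max(len(toks), 1) * idf[t] for t in vocab]

class GaussianNB:
    """Gaussian Naive Bayes with a fixed variance floor (an assumed constant)."""
    def __init__(self, var_floor=1e-3):
        self.var_floor = var_floor
        self.stats = {}

    def fit(self, X, y):
        for c in sorted(set(y)):
            rows = [x for x, yi in zip(X, y) if yi == c]
            means, variances = [], []
            for col in zip(*rows):  # iterate over features
                m = sum(col) / len(col)
                means.append(m)
                variances.append(sum((v - m) ** 2 for v in col) / len(col) + self.var_floor)
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_joint(self, x, c):
        """log P(c) + sum over features of log N(x_i | mean_ci, var_ci)."""
        log_prior, means, variances = self.stats[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, x):
        return max(self.stats, key=lambda c: self.log_joint(x, c))

def f1_binary(y_true, y_pred):
    """F1 for the positive class of a single label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented placeholder data; the tokens and label columns are hypothetical.
docs = [
    "have a nice day friend",
    "lovely weather today friend",
    "you are a badword",
    "badword badword go away",
    "groupx are hateword",
    "hateword groupx go away badword",
]
Y = {"HS": [0, 0, 0, 0, 1, 1], "Abusive": [0, 0, 1, 1, 0, 1]}

vocab, idf = fit_tfidf(docs)
X = [tfidf_vector(d, vocab, idf) for d in docs]
models = {name: GaussianNB().fit(X, y) for name, y in Y.items()}  # binary relevance

def predict_labels(text):
    x = tfidf_vector(text, vocab, idf)
    return {name: model.predict(x) for name, model in models.items()}
```

The `f1_binary` helper shows why accuracy alone can hide the majority-class bias noted above: a model that predicts the negative class for every example of a label with one positive in four reaches 0.75 accuracy, yet `f1_binary([1, 0, 0, 0], [0, 0, 0, 0])` is 0.0 for the positive class.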







DOI:

Citations: 0

P-ISSN:

E-ISSN: 3026-6874

Date created (Crossref): 31-May-2024

Date issued: 29-Nov-2023

Date published: 29-Nov-2023

Date published online: 29-Nov-2023




License: https://creativecommons.org/licenses/by-sa/4.0