Tasya Nurdin; Dodo Zaenal Abidin; Kurniabudi Kurniabudi
This study performs sentiment analysis of Indonesian-language user reviews of the CapCut application using IndoBERT and compares two evaluation schemes: a single 80/20 train–test split and stratified 5-fold cross-validation. A total of 1,048,575 reviews were collected from the Google Play Store through web scraping and labeled into three sentiment classes based on star rating: negative (1–2), neutral (3), and positive (4–5). After preprocessing (cleaning, case folding, banned-word removal, and normalization) and duplicate removal, 517,962 reviews were retained. IndoBERT Base P1 was fine-tuned with fixed hyperparameters (batch size 32, learning rate 2e-5, up to 4 epochs, early stopping with patience 2), and undersampling was applied to the training set to address class imbalance. Performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC, supported by confusion-matrix and ROC-curve visualizations. The single split achieved an accuracy of 0.756, whereas cross-validation yielded a mean accuracy of 0.740. Under both schemes, the positive class performed best (F1-score 0.850; ROC-AUC 0.918–0.919), while the neutral class remained the most challenging (precision 0.198–0.206; F1-score 0.280–0.283). Overall, cross-validation is recommended for reporting because it reduces dependence on a single partition and gives a more representative estimate across multiple splits.
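The labeling and evaluation setup described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`rating_to_label`, `stratified_folds`, `undersample`), the toy ratings, and the random seed are hypothetical, and the sketch assumes that undersampling reduces every class to the size of the smallest class and is applied to the training indices of each fold only, which the abstract does not specify. It uses only the Python standard library and omits the IndoBERT fine-tuning step.

```python
import random
from collections import Counter, defaultdict


def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to the three sentiment classes from the abstract."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def stratified_folds(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs, keeping class proportions roughly equal per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin across the k folds.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def undersample(indices, labels, seed=42):
    """Assumption: randomly keep as many examples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)


# Toy ratings for illustration only (hypothetical, not data from the study).
ratings = [5] * 60 + [4] * 20 + [3] * 10 + [2] * 10 + [1] * 20
labels = [rating_to_label(r) for r in ratings]

for fold, (train_idx, test_idx) in enumerate(stratified_folds(labels, k=5)):
    balanced_train = undersample(train_idx, labels)  # test fold is left untouched
    train_counts = Counter(labels[i] for i in balanced_train)
    test_counts = Counter(labels[i] for i in test_idx)
    print(fold, dict(train_counts), dict(test_counts))
```

Leaving the test fold at its original class distribution means the reported metrics reflect the imbalance seen in real reviews, which is one plausible reason the neutral class shows low precision despite balanced training data.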