Abstract
Methods: This study compared three machine learning models, Random Forest, LSTM, and CNN-BiLSTM-Attention, to identify the best-performing one. The datasets came from several secondary sources. The hotel room occupancy rate (TPK) was obtained from BPS-Statistics Indonesia, while additional data were collected by web scraping online travel agency websites such as Tripadvisor.com and from the Google Trends Index (IGT) using the keywords “IKN”, “hotel”, and “banjir”. For the sentiment variable derived from online reviews, lag effects of one, two, and three months were analyzed to measure the correlation with TPK. The lag with the highest correlation was included in the prediction model for all machine learning methods.
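The lag selection described above can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes monthly series of equal length, a Pearson correlation, and selection by largest absolute correlation; the function names `pearson`, `lagged_correlations`, and `best_lag` are hypothetical. Lag 0 (the current month) is included alongside lags of one to three months.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(sentiment, tpk, lags=(0, 1, 2, 3)):
    """Correlate TPK in month t with sentiment in month t - lag.

    Assumption: both lists are monthly values aligned on the same months.
    """
    result = {}
    for lag in lags:
        if lag == 0:
            x, y = sentiment, tpk
        else:
            # Drop the last `lag` sentiment values and the first `lag` TPK
            # values so that sentiment[t - lag] is paired with tpk[t].
            x, y = sentiment[:-lag], tpk[lag:]
        result[lag] = pearson(x, y)
    return result


def best_lag(correlations):
    """Return the lag whose correlation has the largest absolute value."""
    return max(correlations, key=lambda lag: abs(correlations[lag]))
```

Calling `best_lag(lagged_correlations(sentiment, tpk))` on the two monthly series returns the lag that would be carried into the prediction models under these assumptions.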
Result: The use of IGT and online traveler reviews improved the accuracy of the forecasting models. The best model for nowcasting hotel TPK was Random Forest Regression, which achieved the lowest MAPE of 5.37%, corresponding to an accuracy of 94.63%.
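For reference, the two reported figures are related by simple arithmetic. The sketch below assumes the standard definition of MAPE and that accuracy is taken as 100% minus MAPE, which matches the reported pair (5.37% and 94.63%); the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: no actual value is zero (occupancy rates are positive).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy_from_mape(mape_percent):
    """Accuracy in percent, assumed here to be 100 minus MAPE."""
    return 100.0 - mape_percent
```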
Novelty: The proposed method shows strong potential for improving hotel TPK prediction by drawing on new technology and extensive data sources. The correlation between review sentiment and TPK decreases as the time lag of sentiment increases; the sentiment of reviews in the current month therefore has a higher correlation with TPK than the sentiment from one, two, or three months earlier.