Application of machine learning algorithms to predict hotel occupancy
Abstract
The development and availability of information technology, together with the possibility of deep integration between internal and external IT systems, create a powerful opportunity to analyze data online using external data providers. Machine learning algorithms have recently come to play a significant role in predicting a wide range of processes. This research applies several machine learning algorithms to predict high-frequency (daily) occupancy at a Chinese hotel. Five machine learning models (bagged CART, bagged MARS, XGBoost, random forest, SVM) were optimized and applied to predict occupancy. All models were compared with one another using several accuracy measures and against an ARDL model chosen as the benchmark. The bagged CART model showed the strongest results among the machine learning models (R² > 0.50) in all periods, but it could not beat the traditional ARDL model. Thus, despite the novel use of machine learning algorithms for this regression task, the models used in this research were not more effective than the benchmark model. In addition, variable importance was used to test the hypothesis that the Baidu search index and its components can be used in machine learning models to predict hotel occupancy.
Keywords: bagged CART, bagged MARS, XGBoost, random forest, SVM, hotel occupancy
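The abstract compares models by accuracy measures and reports R² for the bagged CART model. As a minimal sketch of how such a comparison might be computed, the Python function below returns R² alongside RMSE, MAE and MAPE. The abstract names only R²; the other three measures are assumed here because they are common in forecasting work. The function name `accuracy_measures` and all data values are hypothetical and are not taken from the study.

```python
import math


def accuracy_measures(actual, predicted):
    """Return RMSE, MAE, MAPE (percent) and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch
    # assumes occupancy rates are strictly positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2 = 1 - SS_res / SS_tot, measured against the mean of the actuals.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical daily occupancy rates (as fractions) and two sets of forecasts.
actual = [0.60, 0.70, 0.80, 0.90]
model_a = [0.65, 0.70, 0.75, 0.90]   # illustrative model forecasts
model_b = [0.75, 0.75, 0.75, 0.75]   # constant forecast at the sample mean

scores_a = accuracy_measures(actual, model_a)
scores_b = accuracy_measures(actual, model_b)
print(scores_a)
print(scores_b)
```

A forecast that always predicts the sample mean has an R² of zero, so a threshold such as R² > 0.50 means the model explains more than half of the variance in observed occupancy over the evaluation period.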
This work is licensed under a Creative Commons Attribution 4.0 International License.