Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement
Abstract
1. Introduction
2. Social Media Analytics for Tourism and Hospitality
3. Materials and Methods
3.1. Dataset Collection and Preparation
- Cleaning: removing content unrelated to the reviews themselves, such as advertisements, HTML tags, and other extraneous information;
- Tokenization: splitting the review text into individual tokens, such as words or phrases, to facilitate analysis;
- Stop-Word Removal: removing commonly used words (e.g., “and”, “the”, and “is”) that do not contribute to sentiment analysis, using a predefined stop-word list;
- Normalization: converting all text to lowercase to avoid case sensitivity issues and standardizing different spellings of the same word to ensure uniformity.
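The four preprocessing steps above can be sketched as a small pipeline. This is a minimal illustration using only the Python standard library, not the authors' implementation: the regular expressions, the tiny `STOP_WORDS` set, and the function name `preprocess` are assumptions made for the example, and spelling standardization is omitted. Lowercasing is applied before stop-word removal here so that matching against the lowercase list works regardless of the original casing.

```python
import html
import re

# Illustrative stop-word list (assumption); a real run would use a fuller predefined list.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <br> or <p class="x">
TOKEN_RE = re.compile(r"[a-z0-9']+")   # simple word tokens after lowercasing


def preprocess(review: str) -> list[str]:
    """Clean, normalize, tokenize, and filter one review."""
    text = TAG_RE.sub(" ", review)     # cleaning: drop HTML tags
    text = html.unescape(text)         # cleaning: decode entities like &amp;
    text = text.lower()                # normalization: lowercase
    tokens = TOKEN_RE.findall(text)    # tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(preprocess("<p>The breakfast was GREAT &amp; the pool is clean!</p>"))
```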
3.2. Research Framework
- Keyword Extraction: After preprocessing the dataset, the next step involved extracting keywords using a zero-shot keyword extraction technique. We employed KeyBERT, which uses the pretrained model RoBERTa (Robustly optimized BERT approach). RoBERTa was chosen over other pretrained models like BERT or GPT-3 because of its improved training methodology, which includes larger mini-batches, removing the next sentence prediction objective, and dynamically changing the masking pattern applied to the training data. These enhancements enable RoBERTa to achieve better performance in understanding the context and semantics of the text, making it highly suitable for keyword extraction.
- Aspect Candidate Preservation: After keyword extraction, we filtered the identified keywords by their frequency of appearance. We applied a simple threshold, defined as the mean keyword frequency minus the standard deviation. Keywords with a frequency above the threshold were retained as aspect candidates. These keywords represented the various elements of tourist experiences discussed in the reviews, serving as the foundation for further analysis.
- Clustering of Keywords: The next step involved clustering these keywords to form more abstract aspects. K-means clustering was employed for this purpose, using word embeddings generated by RoBERTa (accessed on 5 August 2024 at https://huggingface.co/docs/transformers/en/model_doc/roberta) to calculate the similarity among keywords. RoBERTa’s robust contextual embeddings facilitated the identification of semantically similar keywords, ensuring meaningful clusters.
- Visualization and Construction of Abstract Aspects: To visualize the keyword clusters, t-SNE (t-distributed stochastic neighbor embedding) was used. t-SNE is effective in reducing the dimensionality of high-dimensional data, making it easier to visualize clusters in a two-dimensional space. This visualization helped in constructing more abstract aspects by grouping semantically similar words. Each abstract aspect represented a set of related keywords, providing a comprehensive view of the main themes and sentiments expressed in the tourist reviews.
- Segment Detection: Next, we identified segments of each review that were related to the preserved keywords. For this purpose, sentence embeddings were employed to capture the contextual meaning of sentences. Zero-shot learning was utilized again, using Sentence-BERT (SBERT) with the same pretrained RoBERTa model. Sentence-BERT effectively maps sentences to fixed-size vectors, allowing for efficient similarity calculations. This method ensures that the segments most relevant to the identified keywords are accurately detected.
- Sentiment Polarity Measurement: Each identified segment, along with its corresponding keywords, was then subjected to sentiment polarity measurement using VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is specifically designed for sentiment analysis in social media texts, providing sentiment scores (positive, negative, or neutral) for each segment. This step ensured that each keyword was associated with a corresponding sentiment, reflecting the tourists’ opinions expressed in the reviews.
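Two of the steps above reduce to simple rules that can be sketched directly. The first function applies the frequency threshold (mean minus standard deviation) to keyword counts; whether the population or sample standard deviation was used is not stated, so the population form (`statistics.pstdev`) is an assumption. The second maps a VADER compound score to a polarity label using the ±0.05 cut-offs conventionally recommended for VADER; the text does not say which cut-offs were applied, so these are also an assumption. The function names and toy counts are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def select_aspect_candidates(keywords: list[str]) -> tuple[dict[str, int], float]:
    """Keep keywords whose frequency exceeds mean - standard deviation."""
    counts = Counter(keywords)
    freqs = list(counts.values())
    # Assumption: population standard deviation; the source does not specify.
    threshold = mean(freqs) - pstdev(freqs)
    kept = {kw: n for kw, n in counts.items() if n > threshold}
    return kept, threshold


def polarity_label(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a label (assumed cut-offs)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Toy keyword occurrences (made up for illustration).
extracted = ["food"] * 5 + ["hotel"] * 4 + ["pool"] * 3 + ["wayang"] * 1
kept, threshold = select_aspect_candidates(extracted)
print(sorted(kept), round(threshold, 3))
print(polarity_label(0.62), polarity_label(-0.40), polarity_label(0.0))
```

In practice the compound score would come from a VADER implementation, for example `SentimentIntensityAnalyzer().polarity_scores(segment)["compound"]` in the `vaderSentiment` package.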
3.3. Zero-Shot Learning Using BERT Language Model
3.3.1. BERT Architecture
3.3.2. Pretrained Model for BERT
3.3.3. BERT Embedding
3.3.4. KeyBERT for Keyword Extraction
- Embedding Generation: BERT generates embeddings for each word in the text. These embeddings are rich, contextual representations of the words, capturing their meanings based on the surrounding words in the sentence.
- Keyword Scoring: KeyBERT calculates the similarity between the document embedding (a representation of the entire text) and individual word embeddings. This similarity score indicates how relevant each word is to the overall content of the document. Several similarity measures (e.g., cosine or Jaccard) can be employed.
- Keyword Extraction: The top-scoring keywords, as determined by their similarity scores, are selected as the most relevant keywords representing the document. These keywords effectively summarize the main themes and topics of the text.
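The scoring and selection steps can be illustrated with cosine similarity on toy vectors. This is a sketch of the idea only: the three-dimensional embeddings below are invented stand-ins for language-model outputs, and `cosine` and `top_keywords` are hypothetical helpers rather than KeyBERT's API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_keywords(doc_vec, word_vecs, top_n=2):
    """Rank candidate words by similarity to the document embedding."""
    scored = [(word, cosine(doc_vec, vec)) for word, vec in word_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented embeddings (assumption): real ones would come from the language model.
doc_vec = [0.9, 0.4, 0.1]
word_vecs = {
    "breakfast": [0.8, 0.5, 0.0],
    "pool": [0.1, 0.2, 0.9],
    "hotel": [0.7, 0.1, 0.3],
}
for word, score in top_keywords(doc_vec, word_vecs):
    print(f"{word}: {score:.3f}")
```

With the `keybert` package, the corresponding call is `KeyBERT().extract_keywords(text, top_n=...)`, which handles embedding, scoring, and selection internally.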
3.4. Analyzing Segment Sentiment with VADER
4. Results
4.1. Aspect Extraction
4.2. Aspect-Based Sentiment Analysis Results
5. Discussion
6. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Guo, X.; Pesonen, J.; Komppula, R. Comparing Online Travel Review Platforms as Destination Image Information Agents. Inf. Technol. Tour. 2021, 23, 159–187.
- Garner, B.; Kim, D. Analyzing User-Generated Content to Improve Customer Satisfaction at Local Wine Tourism Destinations: An Analysis of Yelp and TripAdvisor Reviews. Consum. Behav. Tour. Hosp. 2022, 17, 413–435.
- Álvarez-Carmona, M.Á.; Aranda, R.; Rodríguez-Gonzalez, A.Y.; Fajardo-Delgado, D.; Sánchez, M.G.; Pérez-Espinosa, H.; Díaz-Pacheco, Á. Natural Language Processing Applied to Tourism Research: A Systematic Review and Future Research Directions. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 10125–10144.
- Abbasi-Moud, Z.; Vahdat-Nejad, H.; Sadri, J. Tourism Recommendation System Based on Semantic Clustering and Sentiment Analysis. Expert Syst. Appl. 2021, 167, 114324.
- Mehraliyev, F.; Chan, I.C.C.; Kirilenko, A.P. Sentiment Analysis in Hospitality and Tourism: A Thematic and Methodological Review. Int. J. Contemp. Hosp. Manag. 2022, 34, 46–77.
- Raghunathan, N.; Saravanakumar, K. Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access 2023, 11, 69626–69642.
- Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowl. Data Eng. 2022, 35, 11019–11038.
- Jain, A.; Bansal, A.; Tomar, S. Aspect-Based Sentiment Analysis of Online Reviews for Business Intelligence. Int. J. Inf. Technol. Syst. Approach IJITSA 2022, 15, 1–21.
- Jiang, Q.; Chen, L.; Xu, R.; Ao, X.; Yang, M. A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6280–6285.
- Nazir, A.; Rao, Y.; Wu, L.; Sun, L. Issues and Challenges of Aspect-Based Sentiment Analysis: A Comprehensive Survey. IEEE Trans. Affect. Comput. 2020, 13, 845–863.
- Wang, W.; Zheng, V.W.; Yu, H.; Miao, C. A Survey of Zero-Shot Learning: Settings, Methods, and Applications. ACM Trans. Intell. Syst. Technol. TIST 2019, 10, 1–37.
- Shu, L.; Xu, H.; Liu, B.; Chen, J. Zero-Shot Aspect-Based Sentiment Analysis. arXiv 2022, arXiv:2202.01924.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
- Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-Based Sentiment Analysis Using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; pp. 187–196.
- Park, H.; Jiang, S.; Lee, O.K.D.; Chang, Y. Exploring the attractiveness of service robots in the hospitality industry: Analysis of online reviews. Inf. Syst. Front. 2024, 26, 41–61.
- Kim, W.; Kim, S.B.; Park, E. Mapping tourists’ destination (dis) satisfaction attributes with user-generated content. Sustainability 2021, 13, 12650.
- Çevrimkaya, M.; Çavus, Ş.; Şengel, Ü. Assessment of hotels’ online complaints in domestic tourism: Mixed analysis approach. Int. J. Tour. Cities, 2024; in press.
- Yan, Q.; Jiang, T.; Zhou, S.; Zhang, X. Exploring tourist interaction from user-generated content: Topic analysis and content analysis. J. Vacat. Mark. 2024, 30, 327–344.
- Ghosh, P.; Mukherjee, S. Understanding tourist behaviour towards destination selection based on social media information: An evaluation using unsupervised clustering algorithms. J. Hosp. Tour. Insights 2023, 6, 754–778.
- Mirzaalian, F.; Halpenny, E. Exploring destination loyalty: Application of social media analytics in a nature-based tourism setting. J. Destin. Mark. Manag. 2021, 20, 100598.
- Qin, Y.; Wang, X.; Xu, Z. Ranking tourist attractions through online reviews: A novel method with intuitionistic and hesitant fuzzy information based on sentiment analysis. Int. J. Fuzzy Syst. 2022, 24, 755–777.
- Skotis, A.; Livas, C. A data-driven analysis of experience in urban historic districts. Ann. Tour. Res. Empir. Insights 2022, 3, 100052.
- Taecharungroj, V.; Stoica, I.S. Assessing place experiences in Luton and Darlington on Twitter with topic modelling and AI-generated lexicons. J. Place Manag. Dev. 2024, 17, 49–73.
- Chen, Y.; Zhong, Y.; Yu, S.; Xiao, Y.; Chen, S. Exploring bidirectional performance of hotel attributes through online reviews based on sentiment analysis and Kano-IPA model. Appl. Sci. 2022, 12, 692.
- Ayeh, J.K.; Au, N.; Law, R. “Do we believe in TripAdvisor?” Examining credibility perceptions and online travelers’ attitude toward using user-generated content. J. Travel Res. 2013, 52, 437–452.
- Filieri, R.; Acikgoz, F.; Ndou, V.; Dwivedi, Y. Is TripAdvisor still relevant? The influence of review credibility, review usefulness, and ease of use on consumers’ continuance intention. Int. J. Contemp. Hosp. Manag. 2021, 33, 199–223.
- Chen, C.Y.; Li, C.T. ZS-BERT: Towards Zero-Shot Relation Extraction with Attribute Representation Learning. arXiv 2021, arXiv:2104.04697.
- Wang, Y.; Wu, L.; Li, J.; Liang, X.; Zhang, M. Are the BERT Family Zero-Shot Learners? A Study on Their Potential and Limitations. Artif. Intell. 2023, 322, 103953.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. arXiv 2017, arXiv:1706.03762.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108.
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942.
- Selva Birunda, S.; Kanniga Devi, R. A Review on Word Embedding Techniques for Text Classification. In Innovative Data Communication Technologies and Application: Proceedings of ICIDCA 2020; Springer Nature: Berlin, Germany, 2021; pp. 267–281.
- Puccetti, G.; Miaschi, A.; Dell’Orletta, F. How Do BERT Embeddings Organize Linguistic Knowledge? In Proceedings of the Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, Dublin, Ireland, 10 June 2021; pp. 48–57.
- Wiedemann, G.; Remus, S.; Chawla, A.; Biemann, C. Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings. arXiv 2019, arXiv:1909.10430.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
- Lee, J.S.; Hsiang, J. Patent Classification by Fine-Tuning BERT Language Model. World Pat. Inf. 2020, 61, 101965.
- Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225.
- Bonta, V.; Kumaresh, N.; Janardhan, N. A Comprehensive Study on Lexicon Based Approaches for Sentiment Analysis. Asian J. Comput. Sci. Technol. 2019, 8, 1–6.
- Toubes, D.R.; Araújo Vila, N.; Fraiz Brea, J.A. Changes in Consumption Patterns and Tourist Promotion after the COVID-19 Pandemic. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 1332–1352.
- Kalia, P.; Mladenović, D.; Acevedo-Duque, Á. Decoding the Trends and the Emerging Research Directions of Digital Tourism in the Last Three Decades: A Bibliometric Analysis. Sage Open 2022, 12, 21582440221128179.
- Rachão, S.; Breda, Z.; Fernandes, C.; Joukes, V. Food Tourism and Regional Development: A Systematic Literature Review. Eur. J. Tour. Res. 2019, 21, 33–49.
- Naruetharadhol, P.; Gebsombut, N. A Bibliometric Analysis of Food Tourism Studies in Southeast Asia. Cogent Bus. Manag. 2020, 7, 1733829.
- Kalnaovakul, K.; Promsivapallop, P. Hotel Service Quality Dimensions and Attributes: An Analysis of Online Hotel Customer Reviews. Tour. Hosp. Res. 2023, 23, 420–440.
- Ali, B.J.; Gardi, B.; Othman, B.J.; Ahmed, S.A.; Ismael, N.B.; Hamza, P.A.; Anwar, G. Hotel Service Quality: The Impact of Service Quality on Customer Satisfaction in Hospitality. Int. J. Eng. Bus. Manag. 2021, 5, 14–28.
- Chen, J.; Park, H.; Fan, P.; Tian, L.; Ouyang, Z.; Lafortezza, R. Cultural Landmarks and Urban Landscapes in Three Contrasting Societies. Sustainability 2021, 13, 4295.
- Cicerali, E.E.; Kaya Cicerali, L.; Saldamlı, A. Linking Psycho-Environmental Comfort Factors to Tourist Satisfaction Levels: Application of a Psychology Theory to Tourism Research. J. Hosp. Mark. Manag. 2017, 26, 717–734.
- PJ, S.; Singh, K.; Kokkranikal, J.; Bharadwaj, R.; Rai, S.; Antony, J. Service Quality and Customer Satisfaction in Hospitality, Leisure, Sport and Tourism: An Assessment of Research in Web of Science. J. Qual. Assur. Hosp. Tour. 2023, 24, 24–50.
Model Name | Architecture | Training Data | Performance |
---|---|---|---|
BERT-Base [13] | 12 layers, 768 hidden units, 12 attention heads | BooksCorpus and English Wikipedia (16 GB) | Good baseline performance on NLP tasks |
BERT-Large [13] | 24 layers, 1024 hidden units, 16 attention heads | BooksCorpus and English Wikipedia (16 GB) | Good baseline performance on NLP tasks |
RoBERTa [30] | Similar to BERT (Base and Large variants) | 160 GB of diverse text data | Superior performance on NLP benchmarks |
DistilBERT [31] | 6 layers, 768 hidden units, 12 attention heads | Same data as BERT | Slightly lower accuracy, much faster, and more efficient |
ALBERT [32] | Parameter sharing across layers, factorized embeddings | Same data as BERT | Comparable to BERT-Large with reduced memory usage |
Aspect Name | Keywords | Frequency |
---|---|---|
Food in general | foods, food, menu, meal, dishes, breakfast | 5914 |
Food items (specified) | beef, meat, pork, seafood, noodles, pasta, pizza, steak, sushi, vegetarian, soup, chicken, fish, lamb, mushroom | 3087 |
Specified Indonesian cuisine | bakso, gudeg, soto | 303 |
Desserts and beverage | cake, dessert, snacks, cafe, milk, chocolate, coffee | 1561 |
Dining and cooking | bar, kitchen, cook | 289 |
Restaurants service | restaurant, restaurants, chef | 5522 |
Shopping | supermarket, mall, shopping, shop, shops | 780 |
Landmarks | borobudur, sunrise, sunset | 370 |
Wellness | massage, spa | 257 |
Transportation | airport, car, travel agent | 152 |
Travel and tour | holiday, trip, vacation | 342 |
Hospitality | hospitality, guest, guests, receptionist, waitress | 616 |
Accommodation | hostel, hotel, hotels, favehotel, cleanliness, pool | 4246 |
Cultural items | batik, wayang | 110 |
Toiletries | bath, bathroom, toilet, towels | 1064 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nawawi, I.; Ilmawan, K.F.; Maarif, M.R.; Syafrudin, M. Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement. Information 2024, 15, 499. https://doi.org/10.3390/info15080499