2. Materials and Methods
Search engines and OTR sites are powerful tools because they index and organize vast amounts of information. For our research, we analyzed reviews from TripAdvisor, the most popular OTR site and the largest online user-generated review site in the tourism industry (see, e.g., Similarweb.com). The data were extracted using the Apify platform (apify.com).
We collected a sample of 2619 reviews in the “Attractions” category for the destination of Kastoria [22]. This category contains 27 entries. After checking them, we removed 7 entries (three cafes, one internet cafe, and three fur shops) because they did not belong to the category, and two further entries had no reviews. The remaining 2577 reviews are written in 17 different languages by 1631 tourists from at least 45 countries who visited Kastoria between 2012 and 2024. From this sample, we obtained results on both the spatio-temporal distribution of tourists and the evaluative and emotional dimensions of TDI.
2.1. Content Analysis
Reviews play a crucial role in sharing opinions about products, places, or experiences. Whether positive or negative, they provide valuable insights that help others make informed decisions. Analyzing the content of such a large number of texts written by different people can be painstaking and lengthy. Content analysis, however, is a systematic and reproducible technique that compresses many keywords or key phrases into a few content categories, allowing researchers to navigate large datasets with relative ease [23].
In this study, we used quantitative content analysis in two phases: word splitting and frequency analysis. In addition, we wrote a Python [24] program to build word clouds for each language (Figure 1). Stop words were removed automatically using a stop-word file created for each language. The space character and other word separators, such as commas and question marks, were treated as delimiters. Matching was case-insensitive, and we used two counters: one for the total number of words in the text, including stop words, and another for the unique keywords.
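The word-splitting and counting steps described above can be sketched as follows. This is a minimal illustration, not the authors’ program: the delimiter pattern (any run of non-word characters), the function name `count_words`, and the tiny stop-word set are assumptions. Stop words must be case-folded like the text for the lookup to work.

```python
import re
from collections import Counter

# Any run of non-word characters (spaces, commas, question marks, ...) is a delimiter.
# This single pattern is an assumption; the study's exact delimiter list is not given.
DELIMITER = re.compile(r"\W+")


def count_words(text, stop_words):
    """Return (total_words, keyword_counts) for one text.

    total_words counts every token, stop words included; keyword_counts is a
    Counter over the tokens that are not stop words (its keys are the unique keywords).
    """
    tokens = [t for t in DELIMITER.split(text.casefold()) if t]
    keywords = Counter(t for t in tokens if t not in stop_words)
    return len(tokens), keywords


# Hypothetical English stop-word list; the study loads one file per language.
STOP_WORDS_EN = {"the", "a", "and", "is", "of", "to", "with", "very"}

total, keywords = count_words("The lake is beautiful, and the cave? Very beautiful!", STOP_WORDS_EN)
print(total)                    # 9
print(len(keywords))            # 3 unique keywords
print(keywords.most_common(2))  # [('beautiful', 2), ('lake', 1)]
```

Because `\W` is Unicode-aware for Python string patterns, the same splitting works for Greek or Russian text.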
A limitation of word frequency analysis with open-source software is that it cannot automatically handle compound terms or word groups whose meaning differs depending on whether the words are used together or separately (for example, “Dragon Cave” as the name of an attraction versus “dragon” and “cave” as separate words); a custom program, for instance one written in Python, is needed for this. Despite this limitation, we assume that the most frequently mentioned words reflect the primary concerns of the tourists who wrote the reviews and thus point researchers to key points of interest.
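One way to capture multi-word expressions such as “Dragon Cave” is to count word pairs (bigrams) in addition to single words. The sketch below uses a hypothetical helper, `count_ngrams` (our name, not part of the study), that counts n-word sequences in each review separately, so that a pair never spans two reviews.

```python
import re
from collections import Counter


def count_ngrams(text, n=2):
    """Count contiguous n-word sequences (n-grams) in one text, case-insensitively."""
    tokens = [t for t in re.split(r"\W+", text.casefold()) if t]
    # zip(tokens, tokens[1:], ...) yields each window of n consecutive tokens.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Toy reviews for illustration only.
reviews = [
    "The Dragon Cave is amazing.",
    "We visited the dragon cave and the lake.",
]
bigrams = Counter()
for review in reviews:  # count per review so pairs never span two reviews
    bigrams.update(count_ngrams(review))

print(bigrams[("dragon", "cave")])  # 2
```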
In addition, we used TextBlob [25], a Python library for natural language processing (NLP) developed by Steven Loria and built on NLTK (Natural Language Toolkit). For each input sentence, the analysis returns two scores: polarity and subjectivity. The polarity score ranges from −1 to 1, where −1 indicates highly negative language (e.g., “disgusting” or “awful”) and 1 indicates highly positive language (e.g., “excellent” or “best”). The subjectivity score ranges from 0 to 1 and reflects the degree of personal opinion in the sentence; scores closer to 1 indicate that subjective opinion predominates over factual content.
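A minimal sketch of sentence-level scoring with TextBlob follows, assuming the `textblob` package is installed; `TextBlob(text).sentences` and each sentence’s `sentiment.polarity` and `sentiment.subjectivity` belong to TextBlob’s public API. The function names are ours, and averaging sentence scores into one score per review is an assumption, since the text does not say how sentence scores were combined.

```python
from statistics import mean


def sentence_sentiments(text):
    """Return a (polarity, subjectivity) pair for each sentence of `text`.

    Needs `pip install textblob`; sentence splitting also needs the NLTK data
    installed with `python -m textblob.download_corpora`.
    """
    from textblob import TextBlob  # lazy import: review_score below works without it

    return [(s.sentiment.polarity, s.sentiment.subjectivity) for s in TextBlob(text).sentences]


def review_score(scores):
    """Average per-sentence (polarity, subjectivity) pairs into one pair per review.

    Averaging is an illustrative assumption; the text does not say how
    sentence scores were combined.
    """
    for polarity, subjectivity in scores:
        if not (-1.0 <= polarity <= 1.0 and 0.0 <= subjectivity <= 1.0):
            raise ValueError(f"score out of range: {(polarity, subjectivity)}")
    return mean(p for p, _ in scores), mean(s for _, s in scores)


# Made-up scores for a two-sentence review, standing in for sentence_sentiments(...).
print(review_score([(0.75, 1.0), (0.25, 0.5)]))  # (0.5, 0.75)
```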
2.2. Image Analysis
Of the 2577 reviews we extracted from TripAdvisor, 537 were accompanied by 2598 photos uploaded by users. We analyzed the images through three checks. First, we sorted the photos into two groups: those containing people and those without. Photos featuring people were further divided into three groups: (a) portraits of the photographers themselves, (b) people close to them, and (c) random passers-by.
The second check used AI, specifically the imagerecognize tool, which relies on a convolutional neural network (CNN) whose output is a single vector of probability scores arranged along the depth dimension. Using an API key, we submitted the first image of each review for object recognition. For each image, the tool returned a list of up to 10 descriptive words, keeping only those with a confidence score above 80%.
Finally, we wrote a Python program that uses the k-means clustering algorithm and the PIL, tqdm, NumPy, and sklearn.cluster libraries to generate a PNG file with the 10 most dominant color clusters of each photo. We used these palettes to associate colors with emotions. Using the above libraries together with pandas, webcolors, and collections, and based on the illustration of different emotion correlations and Mikels’ emotion wheel [19], we created a second Python program that takes as input the 10-color palette of each photo and returns the three emotions, out of the 23 emotions we associated with the 10 main colors, that result from that palette, as shown in the script below:
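from collections import Counter

def get_emotions_from_colors(colors):
    """Return the three main emotions associated with a list of color names."""
    color_emotions = {
        "red": ["passion", "energy", "anger"],
        "orange": ["enthusiasm", "creativity", "warning"],
        "yellow": ["happiness", "optimism", "anxiety"],
        "green": ["calm", "nature", "envy"],
        "blue": ["peace", "trust", "sadness"],
        "purple": ["luxury", "mystery", "spirituality"],
        "pink": ["love", "compassion", "immaturity"],
        "brown": ["stability", "nature", "dullness"],
        "black": ["power", "sophistication", "mourning"],
        "white": ["purity", "innocence", "emptiness"],
    }
    # Tally the emotions of every palette color and keep the three most frequent
    # (this completion of the excerpt is an assumption; unknown names are skipped).
    tally = Counter()
    for color in colors:
        tally.update(color_emotions.get(color, []))
    return [emotion for emotion, _ in tally.most_common(3)]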
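The script above expects color names, so each photo’s palette must first be extracted and named. The sketch below illustrates that step under stated assumptions: it implements k-means (Lloyd’s algorithm) in plain Python on a list of (R, G, B) tuples so that it runs without extra packages, whereas the study used sklearn.cluster, whose `KMeans` class is the practical choice for full-size photos. The reference RGB values in `BASIC_COLORS` and the nearest-color rule are hypothetical stand-ins for the webcolors-based naming used by the authors.

```python
import math
import random
from collections import Counter

# Hypothetical reference RGB values for the ten colors of the emotion table.
# The study used webcolors; its exact color-naming rule is not given.
BASIC_COLORS = {
    "red": (255, 0, 0), "orange": (255, 165, 0), "yellow": (255, 255, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "purple": (128, 0, 128),
    "pink": (255, 192, 203), "brown": (139, 69, 19), "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def _nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance in RGB space)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans_palette(pixels, k=10, iterations=20, seed=0):
    """Lloyd's k-means on RGB tuples; returns cluster centers, largest cluster first."""
    rng = random.Random(seed)
    distinct = sorted(set(pixels))
    centers = rng.sample(distinct, min(k, len(distinct)))  # random initial centers
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in pixels:
            clusters[_nearest(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
    sizes = Counter(_nearest(p, centers) for p in pixels)
    return [centers[i] for i in sorted(range(len(centers)), key=lambda i: -sizes[i])]


def nearest_color_name(rgb):
    """Name of the BASIC_COLORS entry closest to `rgb`."""
    return min(BASIC_COLORS, key=lambda name: math.dist(rgb, BASIC_COLORS[name]))


# Toy "photo": 60 brownish and 40 greenish pixels.
pixels = [(140, 70, 20)] * 60 + [(10, 120, 10)] * 40
palette = kmeans_palette(pixels, k=2)
print([nearest_color_name(c) for c in palette])  # ['brown', 'green']
```

The resulting list of names is the kind of input that `get_emotions_from_colors` consumes.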
2.3. Research Questions
The proposed method includes the following phases: (a) data collection, (b) metadata mining, and (c) quantitative analysis. As the world’s largest source of user-generated content (UGC) in the tourism sector, TripAdvisor hosts an enormous volume of user reviews and therefore provides a vast open dataset. Additionally, TripAdvisor’s reputation management system enhances transparency by giving access to user profiles, other reviews, votes, and ratings, and it encourages users to submit credible reviews [26].
This paper addresses the following research questions:
RQ1. Has there been an increase in reviews over time?
RQ2. Is there a change in the number of reviews written about Kastoria depending on the time of year?
RQ3. Which attractions are most visited by tourists from different parts of the world?
RQ4. What appears more often in the photos, people or landscapes, and which sights (e.g., the lake, the mansions) do visitors most often choose to include in their reviews?
RQ5. Does the existence of photos affect the likelihood of interaction with other TripAdvisor users?
RQ6. What is the general impression tourists get from visiting Kastoria?
3. Results
The analysis of the 2577 reviews revealed a strong concentration on two main attractions: the Lake of Kastoria, with 33.1% (n = 853) of the reviews, and the Dragon Cave, with 29.2% (n = 752), which together account for 62.3% of all reviews of the region’s attractions. The third most-reviewed location was the Panagia Mavriotissa Monastery, with 14% (n = 360) of the reviews, followed by the Kastoria Aquarium with 7.8% (n = 202). A detailed breakdown of reviews per attraction is provided in Table 1.
Of the 2546 reviews that indicate when the trip took place, 19.8% (n = 503) concern visits between October 2016 and September 2017. Regarding our first research question (RQ1), the number of reviews increased steadily from May 2011, when the first review was recorded, until April 2017, and then declined gradually until February 2020. In March 2020, with the onset of the COVID-19 pandemic, Kastoria was one of the first regions to enforce strict mobility restrictions due to outbreaks [27]. Notably, no reviews were posted on TripAdvisor for Kastoria’s attractions from October 2020 to May 2021. Figure 2 illustrates the temporal distribution of visits to Kastoria.
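To trace the number of reviews over time and locate periods without any reviews, such as the October 2020 to May 2021 gap, visit dates can be grouped by month with empty months filled in. A minimal sketch with hypothetical helper names and toy dates (not the study’s data):

```python
from collections import Counter
from datetime import date


def monthly_counts(visit_dates):
    """Reviews per (year, month) from the first to the last month, zero-filled."""
    counts = Counter((d.year, d.month) for d in visit_dates)
    year, month = min(counts)
    last = max(counts)
    series = {}
    while (year, month) <= last:
        series[(year, month)] = counts.get((year, month), 0)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


def empty_periods(series):
    """(first, last) month of every run of consecutive months without reviews."""
    runs, start, previous = [], None, None
    for month, n in series.items():
        if n == 0 and start is None:
            start = month  # a gap begins
        elif n > 0 and start is not None:
            runs.append((start, previous))  # the gap ended in the previous month
            start = None
        previous = month
    return runs  # the series starts and ends with non-empty months, so no gap stays open


# Toy visit dates, for illustration only.
series = monthly_counts([date(2020, 9, 5), date(2020, 9, 20), date(2021, 6, 1)])
print(empty_periods(series))  # [((2020, 10), (2021, 5))]
```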
Our second research question (RQ2) examines when tourists typically visit Kastoria. Based on the temporal distribution, the months with the most visits are, in descending order, December (11.5%, n = 294), August (10.7%, n = 272), October (10.4%, n = 264), and January (10.1%, n = 257). Conversely, the fewest visits occur in June (4.5%, n = 115) and July (5.9%, n = 151). According to our review data, visits peak in the winter months of December and January and are also high in August, October, and April, which indicates that tourism in Kastoria is highly seasonal. Figure 3 provides a detailed breakdown of visit percentages by month.
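The monthly shares reported for RQ2 are frequency counts divided by the number of dated reviews. A short sketch (the helper name `month_shares` is hypothetical) that also reproduces one of the reported percentages:

```python
from collections import Counter


def month_shares(visit_months):
    """(month, count, percent) for each calendar month (1-12), most visits first."""
    counts = Counter(visit_months)
    total = sum(counts.values())
    return [(m, n, round(100 * n / total, 1)) for m, n in counts.most_common()]


print(month_shares([12, 12, 8, 1]))  # [(12, 2, 50.0), (8, 1, 25.0), (1, 1, 25.0)]

# The same arithmetic reproduces a reported figure: December, 294 of 2546 dated reviews.
print(round(100 * 294 / 2546, 1))  # 11.5
```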
To address our third research question (RQ3), which concerns the homogeneity of reviews for each attraction with respect to the language in which they were written, we used the non-parametric chi-square test of independence. The analysis revealed a statistically significant dependence between the attraction reviewed and the language used, at a significance level of α = 0.05 (χ² = 20,666.420, df = 16, p < 0.001). Consequently, we reject the null hypothesis (that there is no association between language and the attraction visited) and conclude that tourists from different parts of the world tend to visit different attractions in the city. Detailed data for the top eight languages are provided in Table 2a,b, while Figure 4 illustrates the preferences of Greek and Russian tourists.
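To show what the test computes, the sketch below derives the Pearson chi-square statistic and its degrees of freedom from a contingency table of review counts (languages × attractions). It runs on a toy table and is not the study’s analysis. For a p-value, `scipy.stats.chi2_contingency(table, correction=False)` returns the statistic, p-value, degrees of freedom, and expected counts; its default `correction=True` applies Yates’ continuity correction when df = 1. A common rule of thumb is that expected counts should be at least 5, which matters when many languages have few reviews.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Rows could be review languages and columns attractions. Rows or columns that
    sum to zero must be removed first, because their expected counts would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Toy 2 x 2 table (two languages x two attractions).
chi2, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), dof)  # 6.667 1
```

With df = 1, a statistic of about 6.67 exceeds the 5% critical value of 3.841, so independence would be rejected for this toy table.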
The fourth research question (RQ4) examines the subjects most commonly shown by visitors in the photos accompanying their reviews, such as people, sights, and landscapes. To investigate it, we analyzed 536 of the 2598 photos, namely the first photo that users uploaded with each review. The image analysis involved three checks.
First, we manually categorized the photos into those with and without people and found that 84.9% (n = 455) did not include people. We divided the 81 photos with people into three categories, which are not mutually exclusive (so their shares sum to more than 100%): (a) portraits of the visitors themselves (7.4%, n = 6), (b) close companions (35.8%, n = 29), and (c) random passers-by (59.3%, n = 48). Only six photos (7.4%) were self-portraits: one solo selfie (1.2%), one showing the visitor with a friend (1.2%), and four showing couples (4.9%), taken either as selfies or with someone’s help. Of the 29 photos (35.8%) showing close companions, five (6.2%) also showed the visitor, three of which (3.7%) also included random passers-by. In the other 24 photos (29.6%), only close companions were visible in front of an attraction, and two of these (2.5%) also included random passers-by. Finally, 48 photos (59.3%) captured random passers-by near the attractions.
The second check used AI to analyze the photos and produced 4248 descriptive words with a confidence above 80%. These words included 338 unique terms; the 10 most frequent are summarized in Table 3.
The AI tool identified more photos as containing people than our manual check did, because it mistakenly recognized wax museum exhibits and hagiographies as people. The words returned by the AI’s API to describe people, ranked by frequency, were Person (n = 103), Female (n = 18), Woman (n = 17), Head (n = 12), Man (n = 11), Male (n = 8), People (n = 4), Child (n = 1), Girl (n = 1), and Lady (n = 1).
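The confidence filter and the word tallies reported above (total words, unique terms, most frequent terms) can be sketched as follows. The (label, confidence) input format and the helper names are hypothetical; the actual response format of the imagerecognize API is not described in the text.

```python
from collections import Counter


def confident_labels(predictions, threshold=0.80, limit=10):
    """Labels with confidence above `threshold`, most confident first, at most `limit`.

    `predictions` is a list of (label, confidence) pairs, a hypothetical stand-in
    for the image-recognition API's response format.
    """
    kept = sorted((p for p in predictions if p[1] > threshold), key=lambda p: p[1], reverse=True)
    return [label for label, _ in kept[:limit]]


def label_summary(labels_per_photo, top=10):
    """Total number of labels, number of distinct labels, and the `top` most frequent."""
    counts = Counter(label for labels in labels_per_photo for label in labels)
    return sum(counts.values()), len(counts), counts.most_common(top)


# Toy API responses for two photos.
photo_a = [("Lake", 0.97), ("Water", 0.93), ("Person", 0.62)]
photo_b = [("Cave", 0.91), ("Water", 0.85)]
labels = [confident_labels(photo_a), confident_labels(photo_b)]
print(labels)                        # [['Lake', 'Water'], ['Cave', 'Water']]
print(label_summary(labels, top=1))  # (4, 3, [('Water', 2)])
```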
Finally, we analyzed the emotions evoked by the colors in the 536 photos (Figure 5), according to different emotion correlations and Mikels’ emotion wheel [19]. Of the 1608 responses (the three primary emotions identified for each of the 536 photos), 72% (n = 1152) fell into the following emotions, ranked by frequency: Nature 17% (n = 272), Dullness and Stability 15% each (n = 249 each), and Compassion and Love 12% each (n = 191 each). This reflects the predominance of brown and green in the palettes, with compassion and love corresponding to pink.
In response to our fifth research question (RQ5), whether the presence of photos in a review generates more interaction on TripAdvisor, we found the following. In our sample, 79.2% (n = 2041) of the reviews were not accompanied by photos, while the remaining 20.8% (n = 536) included a total of 2666 photos (mean = 4.97 photos per review). Among the reviews with photos, 479 (18.6% of all reviews) contained between 1 and 10 photos, whereas 57 (2.2%) contained between 11 and 63 photos. The 2577 reviews received 1575 helpful votes in total. Of these, 47% (n = 737) went to the 20.8% (n = 536) of reviews that included photos. Notably, all 11 reviews with more than 10 helpful votes had photos, and together they accounted for 27% (n = 426) of all helpful votes. Conversely, 1826 reviews received no helpful votes; 19% (n = 341) of these included photos and 81% (n = 1485) did not. These findings suggest that photos play an important role in increasing user interaction and attracting helpful votes.
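The comparison for RQ5 amounts to splitting reviews by whether they have photos and summing helpful votes per group. A minimal sketch, assuming each review is available as a (photo_count, helpful_votes) pair (a hypothetical layout; the helper name is ours):

```python
def helpful_vote_summary(reviews):
    """Helpful-vote statistics for reviews with and without photos.

    `reviews` is a list of (photo_count, helpful_votes) pairs, a hypothetical
    layout of the scraped data. Results are keyed by True (has photos) / False.
    """
    total_votes = sum(votes for _, votes in reviews)
    summary = {}
    for has_photos in (True, False):
        group = [votes for photos, votes in reviews if (photos > 0) == has_photos]
        group_votes = sum(group)
        summary[has_photos] = {
            "reviews": len(group),
            "votes": group_votes,
            "share_of_all_votes_pct": 100 * group_votes / total_votes if total_votes else 0.0,
            "votes_per_review": group_votes / len(group) if group else 0.0,
            "no_vote_reviews": group.count(0),
        }
    return summary


# Toy data: (number of photos, helpful votes) for five reviews.
summary = helpful_vote_summary([(3, 4), (0, 0), (1, 0), (0, 2), (0, 0)])
print(summary[True]["votes"], summary[False]["votes"])  # 4 2

# The reported share follows from the totals given in the text: 737 of 1575 votes.
print(round(100 * 737 / 1575))  # 47
```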
In our final research question (RQ6), we aimed to determine the overall impression tourists have of their visit to Kastoria. Analyzing the review scores, we found that the majority of visitors leave with a very good impression of the city (66%, n = 1696). Additionally, 26% (n = 682) leave with a good impression, while smaller percentages leave with neutral (5%, n = 138), bad (1%, n = 33), or very bad impressions (1%, n = 28).
Along with the ratings that visitors assigned to the attractions they visited, we also analyzed the comments included in their reviews. We developed a Python program that combines the TextBlob library for natural language processing with the langdetect and googletrans libraries: langdetect identifies the language of each comment (it supports 55 languages), and googletrans translates it through Google Translate, allowing TextBlob to analyze the sentiment of the translated text. After mapping polarity values to a 5-point scale for comparison with TripAdvisor scores, the analysis of the 2577 visitor comments revealed the following distribution: 0.1% (n = 2) of the comments reflected very bad impressions, 1.4% (n = 37) bad impressions, 28.2% (n = 727) neutral impressions, 56.6% (n = 1458) good impressions, and only 13.7% (n = 353) very good impressions. These findings reveal a substantial discrepancy between user-assigned scores and the sentiment scores extracted from the comments: on average, the sentiment scores were 0.72 points lower than the users’ ratings. Only 27.4% (n = 706) of the comments showed no difference between the user’s TripAdvisor score and the sentiment score derived from the comment. Among the remaining comments, the sentiment score was one point lower than the TripAdvisor score for 48.6% (n = 1252), two points lower for 15.6% (n = 402), three points lower for 0.6% (n = 16), and four points lower for 0.1% (n = 2). Conversely, it was one point higher for 6.1% (n = 157) and two to four points higher for 1.6% (n = 42). Figure 6 illustrates the distribution of the TextBlob results.
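The text does not state how polarity values were mapped to the 5-point scale. The sketch below assumes five equal-width bins over [−1, 1] (the edges in `EDGES` are hypothetical) and shows how per-review differences and their mean can be computed. Its last lines use only the two score distributions reported in the text; their difference of means reproduces the reported 0.72-point gap, which suggests that this figure is the difference between the average TextBlob-derived score and the average user rating.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bin edges splitting [-1, 1] into five equal-width bins (1 = very bad,
# 5 = very good); the mapping actually used in the study is not specified.
EDGES = (-0.6, -0.2, 0.2, 0.6)


def polarity_to_stars(polarity):
    """Map a TextBlob polarity in [-1, 1] to a 1-5 score."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    return bisect_left(EDGES, polarity) + 1  # number of edges below polarity, plus one


def score_differences(user_stars, polarities):
    """Tally (text-derived score - user rating) per review and the mean difference."""
    if len(user_stars) != len(polarities):
        raise ValueError("need exactly one polarity per rating")
    diffs = [polarity_to_stars(p) - u for u, p in zip(user_stars, polarities)]
    return Counter(diffs), sum(diffs) / len(diffs)


def mean_score(distribution):
    """Mean of a {score: number_of_reviews} distribution."""
    return sum(s * n for s, n in distribution.items()) / sum(distribution.values())


# Toy example with four reviews.
tally, mean_diff = score_differences([5, 5, 4, 3], [0.9, 0.3, 0.3, -0.5])
print(dict(tally), mean_diff)  # {0: 2, -1: 2} -0.5

# The distributions reported in the text reproduce the 0.72-point mean difference.
user = {5: 1696, 4: 682, 3: 138, 2: 33, 1: 28}   # TripAdvisor ratings
text = {5: 353, 4: 1458, 3: 727, 2: 37, 1: 2}    # TextBlob-derived scores
print(round(mean_score(text) - mean_score(user), 2))  # -0.72
```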
4. Discussion
Our research sheds light on visitor trends in the city of Kastoria over time and demonstrates the potential of using big data and AI to enable small municipalities to monitor these trends effectively. This can help them tailor promotional activities to align better with their target audiences.
We also found that tourism traffic in Kastoria has not mirrored the growth observed nationally and globally since the COVID-19 crisis. This finding aligns with the results of [28] but contrasts with the more optimistic picture that may have been conveyed. Our analysis also revealed that attitudes toward the monuments visited and the evaluation of services differ with the nationality of the visitors.
Although the reviews are generally positive and focus on Kastoria’s strong points, such as its natural beauty and monuments, in many cases reviewers mentioned that they had been unaware of the attractions available in the area. As shown in the word clouds (Figure 1), the most prominent words are “lake”, “Kastoria”, and “cave” in English; “lake”, “worth”, and “cave” in Greek; and “lake”, “location”, and “Kastoria” in German. Moreover, key assets of the region are missing from TripAdvisor because none of the Destination Management Organizations (DMOs) have ensured their inclusion. This gap in communicating the region’s tourism offerings highlights opportunities for professionals working in tourism communication to address and capitalize on these shortcomings.
We also identified distinct patterns in the photos users post on TripAdvisor compared with other social media platforms. Specifically, more than 90% of the photos in our sample did not feature the users themselves or their close companions. This may be explained by the nature of TripAdvisor, which is used mainly to present attractions and to provide useful information to other travelers, whereas platforms such as Instagram are more commonly used for selfies and self-promotion.
Lastly, our findings on the use of AI reflect both its current potential and its limitations. Although many tools and free libraries allow researchers to use AI to ease their work, the quality of their output can be relatively low and biased, especially for less widely used languages such as Greek. For instance, TextBlob assigned the most negative polarity score, −1, to the comment “Panoramic view of the city and the whole lake!! magic!!!! Visit it for sure for photography! Perfect for climbing on foot or else by road! There is also a perfect cafe to rest in a warm atmosphere and drink a hot coffee in all this cold!!!!!” despite its overtly positive sentiment. Conversely, the following comment was mistakenly assigned the most positive polarity score, 1: “The cave beautiful though small! The guide told us almost nothing, he ran until we got to the end, let us take 2 pictures and gave us directions on how to get to the exit! The shortest tour ever!”
As for the discrepancies between user ratings and the polarity scores of the review texts, linguistic nuance appears to make it very difficult for AI tools to decide whether a review is good or very good, bad or very bad, and, most importantly, neutral or good/bad. Nevertheless, 92% of the reviews are rated by users as good or very good, and around 70% are categorized by the AI tool as good or very good, which can be considered acceptable. Despite these discrepancies, the state-of-the-art capabilities of AI in natural language processing, sentiment analysis, and computer vision make it a very promising and valuable tool for researchers.
5. Conclusions
The proposed method allows the destination image perceived by travelers, as transmitted by other travelers through electronic word of mouth (eWOM), to be analyzed and measured. The number of OTRs analyzed (11,328 in total from 137 listings in the Kastoria region, 2577 of them in the attractions category) constitutes a robust sample from which reliable insights and actionable business intelligence can be derived.
The method is reliable for several reasons. Quantitative content analysis of stored big data has a low probability of error, and the source of information, user-generated content (UGC), is widely regarded as reliable. Furthermore, the abundance of information freely available on travel-related websites that host travel blogs and online travel reviews (OTRs) makes data collection both extensive and straightforward. A significant advantage of research on UGC data is its relative freedom from the biases often associated with questionnaire surveys, because the data come from individuals who voluntarily express their opinions rather than respond to structured survey questions; this provides more authentic, unprompted input for analysis.
The analysis of the data focused on identifying the spatio-temporal distribution of reviews, as well as determining the most popular and best-rated attractions. Our findings revealed that the most popular attractions are concentrated along the lakeside road. This insight could be leveraged by destination management organizations (DMOs) to design targeted promotional strategies for less-visited attractions. The temporal dimension of the data also provides valuable information on the evolution of the tourist destination, including seasonal variations in visitor numbers. Additionally, content analysis of reviews could uncover recurring issues, such as inadequate opening hours, insufficient guide training, or the lack of information available in certain languages, allowing for timely interventions.
The case of Kastoria tourists’ impressions shows how this approach can be applied to other destinations worldwide. Using UGC from platforms such as TripAdvisor, researchers and DMOs can identify tourist behavior, preferences, and emotional responses to attractions. By combining sentiment analysis, spatio-temporal trends, and content analysis, the approach provides a framework for understanding global tourism dynamics. It can reveal seasonal patterns and underperforming attractions and link emotional responses to specific features of a destination, which is valuable for comparing destinations and for designing tailored marketing and communication.
In the future, conducting similar research on other platforms that host user reviews, such as Google Maps, would be beneficial, as many regional attractions are not listed on TripAdvisor. Expanding the scope to include these platforms would provide a more comprehensive understanding of the region’s tourism landscape.