4.1. Model Effectiveness
This section presents the evaluation of the robertaSentimentFT_Tripadvisor model on a test dataset of 770 reviews randomly selected from the augmented dataset of 7969 reviews. The test dataset contains 98 negative reviews, 220 neutral reviews, and 452 positive reviews.
Figure 7 shows the confusion matrix for the model trained with data augmentation and random oversampling. The confusion matrix visually illustrates the effectiveness of the selected model for this multi-class classification problem. The model assigns each review to one of three classes: negative, neutral, and positive, denoted in the confusion matrix by 0, 1, and 2, respectively. Therefore, the confusion matrix in Figure 7 contains three rows (actual or true classes) and three columns (predicted classes). For example, the first row of the confusion matrix shows that the vast majority of actual negative reviews in the test dataset (92/98) were classified correctly, five were wrongly classified as neutral, and only one was wrongly classified as positive. Similarly, in the second and third rows, there were only a handful of misclassifications of actual neutral and positive reviews. The columns of the confusion matrix show that the model classified 95/770 reviews as negative, 232/770 as neutral, and 443/770 as positive, which is close to the actual distribution of reviews.
For class 0 (negative reviews), we calculated the values of $TP = 92$ (92 instances were correctly classified as class 0), $FN = 6$ (5 + 1 class 0 instances were wrongly classified as class 1 or class 2), $FP = 3$ (3 + 0 class 1 and class 2 instances were wrongly classified as class 0), and $TN = 669$ (209 + 8 + 18 + 434 class 1 or class 2 instances were classified as class 1 or class 2, that is, not as class 0), which gave a false-negative rate of $FNR = FN/(TP + FN) = 6.1\%$ and a false-positive rate of $FPR = FP/(FP + TN) = 0.44\%$. For class 1, we found the values of $TP = 209$, $FN = 11$, $FP = 23$, and $TN = 527$, resulting in $FNR = 5\%$ and $FPR = 4.18\%$. Lastly, class 2 has $TP = 434$, $FN = 18$, $FP = 9$, and $TN = 309$, as well as $FNR = 3.98\%$ and $FPR = 2.83\%$. Therefore, the false-negative rate was similar for all three classes, but the false-positive rates for class 1 and class 2 were much higher than that of class 0. Because of the nuances and ambiguities between neutral and positive reviews, the model sometimes classified an actual class 1 review as class 2 and vice versa.
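The one-vs-rest counts above can be reproduced with a short script. The following is a minimal sketch, assuming the 3 × 3 matrix reconstructed from the counts quoted in the text (rows are actual classes, columns are predicted classes); the names `CONFUSION`, `one_vs_rest_counts`, and `error_rates` are illustrative and are not taken from the authors' code.

```python
# Confusion matrix reconstructed from the counts quoted in the text.
# Rows = actual class, columns = predicted class
# (0 = negative, 1 = neutral, 2 = positive).
CONFUSION = [
    [92, 5, 1],
    [3, 209, 8],
    [0, 18, 434],
]


def one_vs_rest_counts(matrix, k):
    """Return (TP, FN, FP, TN) for class k against all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    tn = total - tp - fn - fp                # everything that does not involve k
    return tp, fn, fp, tn


def error_rates(matrix, k):
    """Return (false-negative rate, false-positive rate) for class k."""
    tp, fn, fp, tn = one_vs_rest_counts(matrix, k)
    return fn / (tp + fn), fp / (fp + tn)


for k in range(3):
    tp, fn, fp, tn = one_vs_rest_counts(CONFUSION, k)
    fnr, fpr = error_rates(CONFUSION, k)
    print(f"class {k}: TP={tp} FN={fn} FP={fp} TN={tn} "
          f"FNR={fnr:.2%} FPR={fpr:.2%}")
```

The printed rates agree with the values reported above up to rounding in the last digit.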
Table 5 provides detailed performance metrics of the model, including the accuracy, precision, recall, and $F_1$ measure. Because this is a multi-class classification problem, we first calculated the precision, recall, and $F_1$ score for each of the three classes; the values presented in Table 5 are the arithmetic means of these per-class values. The performance of the model is given for both raw and preprocessed data using data augmentation and/or random oversampling methods. An analysis of the obtained results showed that the model performs very well in recognizing sentiments in reviews when both random oversampling and data augmentation are applied. In this case, all four performance metrics reach 95% (highlighted in bold in Table 5), which can also be verified from the confusion matrix presented in Figure 7. However, it must be noted that the individual precision values differ across classes: 96.84% for class 0, 90.09% for class 1, and 97.97% for class 2. The individual recall values, in contrast, are quite similar: 93.88% for class 0, 95% for class 1, and 96.02% for class 2.
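The averaging described above can be checked directly against the confusion matrix. The sketch below is illustrative only: it assumes the matrix reconstructed from the counts quoted in Section 4.1, and the helper names are hypothetical rather than part of the authors' pipeline.

```python
# Confusion matrix reconstructed from the counts quoted in Section 4.1
# (rows = actual class, columns = predicted class).
CONFUSION = [
    [92, 5, 1],
    [3, 209, 8],
    [0, 18, 434],
]


def per_class_scores(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    size = len(matrix)
    scores = []
    for k in range(size):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(size))  # column sum
        actual_k = sum(matrix[k])                             # row sum
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f1))
    return scores


def macro_average(scores):
    """Arithmetic mean of precision, recall, and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def accuracy(matrix):
    """Share of reviews on the main diagonal of the matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


scores = per_class_scores(CONFUSION)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"accuracy={accuracy(CONFUSION):.2%} precision={macro_p:.2%} "
      f"recall={macro_r:.2%} F1={macro_f1:.2%}")
```

All four values round to 95%, and the per-class precision and recall values match those quoted above.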
4.2. Sentiment Analysis of Reviews Written between 1 January 2022 and 31 December 2023
This section presents the results of the sentiment analysis of the previously excluded reviews, which were written in the period from 1 January 2022 to 31 December 2023. A total of 371 reviews were analyzed and classified. An exploratory data analysis was conducted to create a series of charts for a better understanding of the data.
Figure 8 shows the distribution of the sentiments that our model assigned to the reviews. The majority of visitors described their experience in Dubrovnik positively (92.7%), while neutral (4%) and negative (3.2%) reviews are rarer. This is in line with the rest of the initial dataset of reviews from 2017 to 2021, as presented in Figure 3.
The actual ratings play an important role in reflecting the opinions of visitors. The results in Figure 9 show that the majority of visitors rated their experience in Dubrovnik as excellent, with 265 reviews rated as 5. This is followed by 64 reviews rated as 4 and 28 reviews rated as 3. Ratings of 2 and 1 are less common, with only 8 and 6 reviews, respectively. Overall, 88.7% of the reviews (329/371) were rated 4 or 5 by users, whereas the model classified 92.7% of the reviews as positive. This means that the model also found positive sentiments in the texts of some reviews rated as 3, 2, or 1.
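The arithmetic behind this comparison can be made explicit. The following sketch uses the rating counts reported above; the number of positive reviews is not stated directly, so it is inferred here from the reported 92.7% share, which is an assumption. The variable names are illustrative.

```python
# Rating counts as reported in the text for the 371 reviews from 2022-2023.
rating_counts = {5: 265, 4: 64, 3: 28, 2: 8, 1: 6}

total_reviews = sum(rating_counts.values())
high_rated = rating_counts[4] + rating_counts[5]
share_high_rated = high_rated / total_reviews

# Assumption: the positive count is inferred from the reported 92.7% share.
positive_reviews = round(0.927 * total_reviews)

# Even if every review rated 4 or 5 were classified as positive, at least this
# many positive reviews must carry a rating of 3 or lower.
min_low_rated_positive = max(0, positive_reviews - high_rated)

print(f"{high_rated}/{total_reviews} = {share_high_rated:.1%} rated 4 or 5")
print(f"at least {min_low_rated_positive} positive reviews rated 3 or lower")
```

The lower bound is only as reliable as the inferred positive count; the exact split by rating is shown in the figure discussed next.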
The relationship between user ratings and detected sentiments can be examined in more detail. Figure 10 shows how the sentiments detected by the model change across the different ratings. As expected, reviews with a rating of 5 are dominated by positive sentiments. Positive sentiments also predominate in reviews with a rating of 4, with a smaller number of neutral reviews. A rating of 3 shows a mix of positive, neutral, and negative sentiments, but positive sentiments are still the most prevalent. Lower ratings (1 and 2) contain predominantly negative opinions, albeit in smaller numbers, but occasionally also positive and neutral sentiments.
Figure 11 shows the distribution of sentiments over time. In 2022, positive sentiments remain the strongest during the pre-season despite the smaller number of reviews. There is a sharp increase in positive reviews at the start of the season, in late March and April, while the number of negative and neutral reviews remains relatively low. Positive sentiments continue to increase over the course of the season, even after a slight decline in August. The number of negative reviews peaks in August, while September stands out with the most positive reviews. At the beginning of the post-season, in late September and October, there is a sudden drop in positive sentiments, as expected given the smaller number of visitors in Dubrovnik; positive sentiments weaken by the end of the year but still remain higher than neutral and negative sentiments. In contrast to 2022, 2023 shows a larger increase in positive sentiments during the pre-season. However, lower growth and a lower number of positive reviews were recorded at the start of the season. During the 2023 season, positive sentiments continue to rise slightly until July, after which they stagnate until August. Notably, June has the largest number of neutral reviews, July the largest number of positive reviews, and September the largest number of negative reviews. At the beginning of the post-season, positive sentiments decrease sharply until the end of the year.
Figure 12 shows the differences in sentiments between 2022 and 2023. The absolute number of reviews with a positive sentiment decreased by 45%, while the numbers of reviews with neutral and negative sentiments each increased by 100%. In relative terms, the proportion of reviews with positive sentiments decreased from 96.1% in 2022 to 87.1% in 2023, the proportion with neutral sentiments increased from 2.1% to 7.1%, and the proportion with negative sentiments increased from 1.7% to 5.7%.
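The absolute and relative changes can be related with a small calculation. This is a sketch under stated assumptions: the 2023 counts and the 2022 positive count follow from the per-visitor-type counts reported later in this section, whereas the 2022 neutral and negative counts are inferred from the reported shares and the total of 371 reviews. The function names are illustrative.

```python
# Review counts per year and sentiment.
# Assumption: the 2022 neutral and negative counts (5 and 4) are inferred from
# the reported shares; the remaining counts follow from the per-visitor-type
# counts reported later in this section.
counts = {
    2022: {"positive": 222, "neutral": 5, "negative": 4},
    2023: {"positive": 122, "neutral": 10, "negative": 8},
}


def shares(year_counts):
    """Percentage of each sentiment within one year."""
    total = sum(year_counts.values())
    return {label: 100 * n / total for label, n in year_counts.items()}


def relative_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


for label in ("positive", "neutral", "negative"):
    old, new = counts[2022][label], counts[2023][label]
    print(f"{label}: {old} -> {new} ({relative_change(old, new):+.1f}%), "
          f"share {shares(counts[2022])[label]:.1f}% -> "
          f"{shares(counts[2023])[label]:.1f}%")
```

With these counts, the changes match the reported figures, and the shares agree with the reported percentages to within rounding.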
It was also important to examine how the length of the reviews differed according to sentiments. Figure 13 shows the average character length of the reviews for each sentiment. In 2022, the longest reviews, on average, were negative (448.75 characters), while neutral (342.6 characters) and positive (304.5 characters) ones were considerably shorter. In 2023, however, the length of reviews increased: the average length of reviews with negative sentiments increased by 12.77% (to 506 characters), that of reviews with neutral sentiments by 19.23% (to 408.5 characters), and that of reviews with positive sentiments by a substantial 49.01% (to 453.5 characters).
In addition to the character length by sentiments, the average number of words in the reviews by sentiments was also examined in more detail, as presented in Figure 14. In 2022, reviews with a negative sentiment contained an average of 79.25 words, neutral reviews 59.6 words, and positive reviews 55.84 words. In 2023, however, the average number of words increased by 17.75% for negative reviews (to 93.25 words), by 25.84% for neutral reviews (to 74.90 words), and by 48.93% for positive reviews (to 83.04 words). This increase in word count and review length indicates that reviews became more extensive and detailed in 2023.
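As a rough check, the percentage increases can be recomputed from the rounded averages quoted above. This is only a sketch: the recomputed values differ from the reported ones by less than 0.25 percentage points, which would be consistent with the reported percentages having been derived from unrounded averages (an assumption, not something the text states).

```python
# Average words per review as quoted in the text: (2022, 2023).
avg_words = {
    "negative": (79.25, 93.25),
    "neutral": (59.6, 74.90),
    "positive": (55.84, 83.04),
}

# Percentage increases as reported in the text.
reported_increase = {"negative": 17.75, "neutral": 25.84, "positive": 48.93}


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100 * (new - old) / old


for label, (old, new) in avg_words.items():
    recomputed = percent_increase(old, new)
    gap = abs(recomputed - reported_increase[label])
    print(f"{label}: recomputed {recomputed:.2f}%, "
          f"reported {reported_increase[label]:.2f}%, gap {gap:.2f} pp")
```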
A word cloud is a visual representation of the most frequently occurring words as a group of words that are displayed in different sizes. The larger and more emphasized a word is in the graphical representation, the more frequently it is repeated in the reviews.
Figure 15 shows the word clouds for the positive, neutral, and negative sentiments.
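The word frequencies that determine the word sizes in such a cloud can be computed with the standard library. The following is a minimal sketch; the stop-word list, the tokenization rule, and the two example reviews are hypothetical and serve only as an illustration, not as the preprocessing used in this study.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "and", "of", "is", "to", "in", "was", "it"}


def word_frequencies(reviews, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs across all reviews."""
    counter = Counter()
    for text in reviews:
        # Lowercase the text and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counter.update(t for t in tokens if t not in stop_words)
    return counter


# Hypothetical example reviews, not taken from the dataset.
example_reviews = [
    "The walls offer a stunning view of the old town",
    "Stunning view, but the walk was crowded",
]

frequencies = word_frequencies(example_reviews)
print(frequencies.most_common(2))
```

A plotting library can then scale each word by its count; computing the frequencies separately for the positive, neutral, and negative reviews would yield one cloud per sentiment.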
Data on the type of visitors were used to gain a deeper insight into the reviews. In 2022, the largest proportion of visitors were of the “unknown” type, with a share of 71.9%; while couples accounted for 16%; friends, 5.6%; families, 5.2%; and individuals, 1.3%. In 2023, the proportion of couples rose significantly to 40%, while the proportion of unknown visitors fell to 34.3%. Friends accounted for 11.4% of visitors; families, 8.6%; individuals, 5%; while the business segment had a share of 0.7%.
Figure A1 shows the shares of visitor types in 2022 and 2023.
Regarding the distribution of sentiments by visitor type, in 2022, most positive reviews were given by unknown visitors (160 positive), while couples had 37 positive reviews. Friends had 12 positive reviews and 1 negative review; families, 11 positive and 1 negative; and individuals, 2 positive and 1 neutral. In 2023, couples had the most positive (46), neutral (5), and negative (5) reviews. Unknown visitors had 44 positive, 2 neutral, and 2 negative reviews. Friends had 14 positive reviews, 1 neutral review, and 1 negative review. Families had 11 positive reviews and 1 neutral review, individuals had 6 positive reviews and 1 neutral review, while the business segment had 1 positive review.
Figure A2 shows the distribution of sentiments by visitor type.
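The 2023 visitor-type shares quoted earlier follow from the per-type sentiment counts. The sketch below uses the 2023 counts exactly as reported in the text; the dictionary layout and the function name are illustrative.

```python
# 2023 review counts per visitor type as reported in the text:
# (positive, neutral, negative).
counts_2023 = {
    "couples": (46, 5, 5),
    "unknown": (44, 2, 2),
    "friends": (14, 1, 1),
    "families": (11, 1, 0),
    "individuals": (6, 1, 0),
    "business": (1, 0, 0),
}


def visitor_type_shares(counts):
    """Percentage of all reviews contributed by each visitor type."""
    totals = {kind: sum(c) for kind, c in counts.items()}
    grand_total = sum(totals.values())
    return {kind: 100 * t / grand_total for kind, t in totals.items()}


shares_2023 = visitor_type_shares(counts_2023)
for kind, share in shares_2023.items():
    print(f"{kind}: {share:.1f}%")
```

The resulting shares reproduce the 2023 percentages given above, which confirms that the two paragraphs describe the same 140 reviews.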
Another feature that can be compared between 2022 and 2023 is the origin of the visitors. The differences between the sample and the population indicate a bias due to the language selection of the reviews, with English-speaking countries most strongly represented in the reviews.
Figure A3 highlights the 10 most represented countries in the 2022 and 2023 reviews.
In 2022, most users came from the United Kingdom, for a 47.8% share of the total. The United States followed, with a share of 19.6%, while the proportion of users whose origin was unknown was 18.2%. Other countries include Ireland (3.8%), Canada (2.4%), Australia (1.9%), Germany (1.9%), the Netherlands (1.4%), Italy (1.4%), and Romania (1.4%). In 2023, the proportion of visitors from the United Kingdom fell to 39.7%, while the proportion of visitors of unknown origin rose to 23.1%. The United States also recorded a decline in its share to 15.7%. Increases were recorded by countries such as Australia (5%), Ireland (4.1%), the Netherlands (3.3%), and Canada (3.3%). The emergence of new countries should be highlighted, including, among others, Turkey (2.5%), France (1.7%), and Brazil (1.7%).
Figure A4 provides the distribution of sentiments according to the users’ countries of origin (a total of 584) in 2022 and 2023.
In 2022, users from the UK had the most positive reviews. In the United States of America, Ireland, Canada, Australia, and Romania, the majority of reviews were also positive, albeit in smaller numbers. In several European countries, including Germany, Italy, and the Netherlands, sentiments were mixed, with both neutral and negative reviews. However, there were some changes in the distribution of sentiments in 2023. In the UK, the number of positive reviews decreased, while the number of neutral and negative reviews increased. A similar trend can be observed in the US, where the number of positive reviews decreased and the number of neutral reviews increased. In Australia and Canada, reviews remained predominantly positive, while there was a change in neutral and negative reviews in Ireland. It is important to note that in new countries, such as Turkey and Brazil, positive sentiments predominate, while in France, the positive and negative sentiments are balanced.
To illustrate the results of the model, the most positive, the most neutral, and the most negative reviews are presented in Table 6, Table 7, and Table 8, along with the sentiment values produced by the model; the highest sentiment value is highlighted in bold.
We can see how the model made it possible to classify the reviews into different sentiments. The negative, neutral, and positive sentiment values depend on what the user wrote in the review. To better understand the model and its results, we compared the sentiments with the ratings of the individual reviews, as shown below.
Table 9 shows the review that was rated 1 by the user but received the highest positive value from the model. In this case, the prediction of the model appears to be correct, whereas the rating given by the user does not match the text of the review. This could be an unintentional error by the user, and analysts can later decide how to evaluate such inconsistent reviews or whether to exclude them from further analysis.
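Such inconsistent reviews could be flagged automatically for manual inspection. The following is a minimal sketch, not the procedure used in this study: the rating-to-sentiment mapping, the score threshold, the `Review` record, and the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Review:
    review_id: str
    rating: int       # user rating from 1 to 5
    predicted: str    # "negative", "neutral", or "positive"
    score: float      # model score of the predicted class, between 0 and 1


def expected_sentiment(rating):
    """Hypothetical mapping from a user rating to the expected sentiment."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def flag_inconsistent(reviews, min_score=0.9):
    """Return reviews whose confident prediction contradicts the rating."""
    return [
        r for r in reviews
        if r.predicted != expected_sentiment(r.rating) and r.score >= min_score
    ]


# Hypothetical example records, not taken from the dataset.
examples = [
    Review("r1", 1, "positive", 0.98),  # low rating, confidently positive text
    Review("r2", 5, "positive", 0.99),  # rating and sentiment agree
    Review("r3", 3, "positive", 0.60),  # mismatch, but below the threshold
]

flagged = flag_inconsistent(examples)
print([r.review_id for r in flagged])
```

Restricting the flags to confident predictions keeps the list short, so that analysts only review cases where a user error is the more plausible explanation.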