Combining Cluster-Based Profiling Based on Social Media Features and Association Rule Mining for Personalised Recommendations of Touristic Activities
Abstract
1. Introduction
- We introduce a method for analysing data from social media to build user profiles that encapsulate their travel preferences and habits.
- We present insights into the potential benefits of the combination of cluster analysis and association rule mining in tourism recommenders.
- We provide the results of in-depth experiments using a comprehensive set of numerical evaluation metrics to gauge the benefits of social media-based clustering for user profiling.
2. Related Work
2.1. Social Media User Profiling
2.2. Improving Recommendations with Clustering and Association Rule Mining
3. A Recommender of Tourist Activities Based on Clustering and Rule Mining
3.1. System Overview
3.2. Data Collection and Pre-Processing
3.2.1. About Twitter
3.2.2. Identification of Tweets from Tourists
Algorithm 1 Identification of tourists’ tweets (pseudocode)
3.2.3. Activity Identification
- Open Street Map (OSM) [51]: It is an open-source map server which includes cartographic documentation of roads, streets, water bodies, buildings, etc. It also provides geocoding and geoparsing services. As it is open source, it has become the first choice for academic research. In the OSM database, physical features (buildings, roads, etc.) are represented by tags, which describe the geographic attributes of those features [52]. These tags provide information about an element, such as its “name” and “purpose”. For example, a dedicated tag is used to identify a beach. We have used these tags to create a tree structure to categorise the activities experienced by tourists. A comprehensive list of tags and their descriptions can be found at OSM taginfo [52,53]. It is important to note that tags may change over time or be discontinued and replaced by OSM. The activity tree has a root node named “Activities”. Its children are the main categories, which were inspired by an ontology we developed in previous work [54]: Routes, Sports, Gastronomy, Leisure, Accommodation, Transportation, Nature and Culture. The tree also includes numerous subcategories that are descendants of the main categories. The leaves of the tree correspond to the OSM tags. Figure 2 shows a sample section of the activity tree. The complete tree consists of 32 subcategories and a total of 175 OSM tags in the leaves. The complete tree is detailed in Appendix A.
- Overpass turbo [55]: It is a query server for requesting specific features in the OSM database. Overpass provides a query language (Overpass QL [56]), similar to Structured Query Language (SQL), to help users gain access to specific information in the OSM database. For example, physical features within a certain radius from a coordinate pair can be requested and filtered by their OSM tags. We used the Overpass query language to request all POIs within a certain range around the coordinates of a touristic tweet. These POIs were OSM map features categorised as Nodes, Ways, Relations or Areas. Nodes are single structures, such as office buildings, which include coordinates to represent their locations. Ways consist of several nodes with individual coordinates, which represent structures such as roads, highways, streets, pathways, plazas, fountains, parks and steps. Relations are compound structures which include several nodes and ways. For example, complex attractions comprising multiple buildings, such as Sagrada Familia, are relations. Finally, Areas are large physical features that are represented by bounding boxes. Areas contain several nodes, ways and relations. For example, the Port Aventura theme park in Spain is categorised as an area, because it contains several attractions over a large area. The Overpass query requests all named nodes, ways and relations within a 50 m radius from the coordinates of a tweet, also including the areas if they are not cities, countries, towns or time-zone boundaries. Figure 3 shows the code snippet written in Overpass QL. In line 1, [out:json] sets the query output as JSON and [timeout:1000] sets the wait time in seconds before the query is terminated. In line 2, 'nwr' requests all nodes, ways and relations 'around' <Lat>,<Lon> within a radius of <displacement> metres, and [~"^name(:.*)?$"~"."] filters out unnamed map features. Line 3 requests all areas bordering the <Lat>,<Lon>. Lines 4 and 5 filter out all areas that are cities, towns, countries and time-zone boundaries. Finally, line 6 formats the results: 'out geom' gets the full geometry of the results, 'tags' includes the IDs and tags of the results and 'qt' sorts the results by quadtile index, i.e., roughly by location.
- NLTK [57]: Natural Language Toolkit (NLTK) is an open-source toolkit for Natural Language Processing written in Python. It is widely used because it includes a large number of tools for text analysis and it is very well documented. The NLTK library is used to pre-process the text of each tweet. The text is first stripped of stop words based on its language, and then separated into tokens with the NLTK tweet tokenizer. URLs and links of any form are removed, and hashtags composed of several capitalised words are split (e.g., #SagradaFamilia). Finally, numbers, icons, accents, punctuation, user mentions (i.e., user tags beginning with “@”) and excess letters in words, such as “funnnn”, are also eliminated. Once tweets have been processed, they can be compared with the names of the POIs returned from Overpass to find matches. The NLTK everygrams utility is used to build n-grams of the POI names returned from Overpass. In this way it is possible to detect hashtags that contain POI names which could not be split in the pre-processing step.
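The Overpass request described above can be sketched as a small query builder. This is a minimal illustration, not the query of Figure 3: the function name `build_poi_query` and its parameters are hypothetical, only the named-feature lookup around the tweet is reproduced, and the area lookup and area filtering steps are omitted because their exact expressions are not given in the text. The ordering of the `out` options is likewise an assumption.

```python
def build_poi_query(lat: float, lon: float, radius_m: int = 50,
                    timeout_s: int = 1000) -> str:
    """Build an Overpass QL query for named features near a coordinate pair.

    Hypothetical helper: it covers only the 'nwr ... around' part of the
    request described in the text, not the area handling.
    """
    # Keep only features that have a "name" or "name:<lang>" key with a value.
    name_filter = '[~"^name(:.*)?$"~"."]'
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f"nwr{name_filter}(around:{radius_m},{lat},{lon});\n"
        "out tags geom qt;"
    )


if __name__ == "__main__":
    # Coordinates are an arbitrary example point in Barcelona.
    print(build_poi_query(41.4036, 2.1744))
```

The resulting string can be pasted into Overpass turbo or sent to an Overpass API endpoint; the radius defaults to the 50 m used in the text.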
- Step 1: The Overpass server is queried to return POIs within a 50 m radius of each tweet in the dataset.
- Step 2: Names of POIs returned from Overpass are analysed to find matches with the tweet text or the place name provided by Twitter if it is a POI. The tokens from the tweet are compared with the POI names returned by Overpass, and also with the n-grams made from the POI names that are between 2 and 5 words.
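The matching in Step 2 can be illustrated with a dependency-free sketch. It stands in for the NLTK tweet tokenizer and the everygrams utility with standard-library code, skips stop-word removal, and treats an n-gram as matched when its words, joined without spaces, equal a tweet token (the case of a hashtag that could not be split). All function names are hypothetical and the matching rule is an assumption about details the text does not spell out.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics, e.g. 'Família' -> 'Familia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tweet_tokens(text: str) -> list:
    """Lower-cased word tokens without URLs, mentions, digits or punctuation."""
    text = strip_accents(text)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)                     # drop user mentions
    tokens = []
    for word in text.split():
        # Split capitalised hashtags: '#SagradaFamilia' -> 'Sagrada', 'Familia'.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+", word) if word.startswith("#") else [word]
        for part in parts:
            part = re.sub(r"[^a-z]", "", part.lower())    # letters only
            part = re.sub(r"(.)\1{2,}", r"\1", part)      # 'funnnn' -> 'fun'
            if part:
                tokens.append(part)
    return tokens


def poi_matches(tweet_text: str, poi_name: str, min_n: int = 2, max_n: int = 5) -> bool:
    """True if the POI name, or one of its 2- to 5-word n-grams, occurs in the tweet."""
    tokens = tweet_tokens(tweet_text)
    name = tweet_tokens(poi_name)
    if not name:
        return False
    k = len(name)
    # Full name as a contiguous sequence of tweet tokens.
    if any(tokens[i:i + k] == name for i in range(len(tokens) - k + 1)):
        return True
    # N-grams joined without spaces, to catch unsplit hashtags.
    grams = {"".join(name[i:i + n])
             for n in range(min_n, max_n + 1)
             for i in range(len(name) - n + 1)}
    return any(tok in grams for tok in tokens)
```

For example, `poi_matches("love #sagradafamilia", "Basílica de la Sagrada Família")` succeeds through the 2-gram "sagrada familia", even though the lower-case hashtag cannot be split.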
Algorithm 2 Activity identification (pseudocode)
Input: Geolocated tweets with exact coordinates
Output: Category of each tweet
- One Match: When only one POI matches the text, the tags of that POI are checked against the activity tree, and the best suited category based on Table 2 is assigned to that tweet.
- Multiple Matches: When more than one POI matches the text, the tags of all matching POIs are checked against the activity tree, and the best suited category based on Table 2 is assigned to that tweet.
- No Match: When no match is found in the text, the tags of the returned POIs are checked against the activity tree, and the category with the highest priority rank based on Table 2 is assigned to the tweet.
- No POI: If the text analysis did not return any POI, the tweet is not assigned to any category.
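The four cases above can be sketched as a single selection function. This is only an illustration under stated assumptions: the tag-to-category mapping and the priority order below are made-up stand-ins for the activity tree and for Table 2, the "best suited" category is assumed to be the one with the highest priority rank, and `assign_category` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence


def assign_category(matched_pois: Sequence[Sequence[str]],
                    returned_pois: Sequence[Sequence[str]],
                    tag_category: Dict[str, str],
                    priority: List[str]) -> Optional[str]:
    """Pick one activity category for a tweet.

    Each POI is given as a list of OSM tags. If some POIs matched the tweet
    text, only their tags are considered (one match / multiple matches);
    otherwise all returned POIs are considered (no match). If no tag maps to
    a category, the tweet stays unassigned (no POI).
    """
    candidates = matched_pois if matched_pois else returned_pois
    categories = {tag_category[tag]
                  for poi in candidates
                  for tag in poi
                  if tag in tag_category}
    if not categories:
        return None
    rank = {category: position for position, category in enumerate(priority)}
    # Lower rank position means higher priority; unknown categories go last.
    return min(categories, key=lambda c: rank.get(c, len(rank)))


# Hypothetical mapping and priority order, for illustration only.
TAG_CATEGORY = {"tourism=museum": "Culture",
                "natural=beach": "Nature",
                "amenity=restaurant": "Gastronomy"}
PRIORITY = ["Culture", "Nature", "Gastronomy"]
```

With these illustrative inputs, a tweet whose text matches only a restaurant is labelled Gastronomy, while a tweet with no textual match falls back to the highest-priority category among all nearby POIs.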
3.2.4. Data Summary
- Tweets from Barcelonian citizens, not from tourists.
- Tweets that could not be assigned to any category in the activity identification process.
- Users with fewer than three tweets and their tweets.
3.3. Cluster Analysis
3.3.1. Clustering Features
- Activity interest features. These features embed the users’ interests in different kinds of touristic activities. They represent different levels of abstraction in the activity tree, as the analysis would be too general if we only considered the eight main categories of the first level. The activities associated with higher percentages of users in Barcelona were selected for the clustering process. All these features were scored as the percentages of tweets by users that were related to the particular types of activity. The selected features were the following:
- Top-tier features. These features represent some of the main categories in the activity tree. They are %Routes, %Sports, %Accommodation, %Transportation and %Nature.
- Middle-tier features. These are activity features selected from the subcategories of the activity tree. They are %Food, %Enotourism, %AmusementParks, %RecreationFacilities, %Beach, %Health&Care, %NightLife, %Shopping, %Viewpoint, %CulturalAmenities, %Historic and %Religious.
- Bottom-tier features. These activity features are OSM tags represented as leaves of the activity tree. They are %tourism_museum, %amenity_arts_centre and %tourism_gallery.
- Other features related to the activity tree. The analysis showed that the OSM tag {tourism, artwork} was quite popular in our data set, but it did not reveal what type of artwork was being experienced. Thus, it was decided to break down this tag into several features that represent the type of artwork, using other tags associated with the POIs. These features are %artwork_type_sculpture, %artwork_type_architecture, %artwork_type_statue and %other_artwork. The last one represents cases of undetermined type or works of art that do not belong to the other three types. Figure 4 shows the 24 activity features and the percentages of users that visited them in Barcelona (according to the content and the location of their tweets). It may be seen that the top categories were RecreationFacilities, Religious, Historic and Food, followed by Museums and Accommodation.
- Travel features. These features are related to the travel habits of the user. Concretely, they contain information on the durations of trips to Barcelona and the degrees of mobility within the city. They are the following:
- Length of stay. It is the maximum number of consecutive days in which the user posted tweets from Barcelona.
- Tweet distance maximum and average features. Twtdistance_max and Twtdistance_avg are the maximum and average distances between the locations of the tweets of the user in Barcelona. They constitute contextual information on the user’s ability to explore the city.
- Popularity features: These features represent the interest of the user in visiting the most well-known and popular POIs. In order to obtain a popularity order, the POIs were ranked according to the numbers of users in the database that had visited them. Popularity is split into five features. The first four (%top10_tweets, %top10–20_tweets, %top20–50_tweets and %top50–100_tweets) are the percentages of tweets of the user from POIs in positions 1–10, 11–20, 21–50 and 51–100 of the ranking. The feature %top100_tweets codifies the percentage of tweets that were not sent from any of the top 100 POIs in the city.
- Temporal features: These features embed the times of the day favoured by the user during their trips. There are four features representing the percentages of the user’s tweets posted in each period of the day. The features are: %Dawn_tweets (00:00–07:00), %Morning_tweets (07:00–12:00), %Afternoon_tweets (12:00–20:00) and %Night_tweets (20:00–00:00).
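As an illustration of how two of these feature groups could be derived from tweet timestamps, the sketch below computes the four temporal features and the length of stay. It assumes that timestamps have already been converted to local time and that percentages are expressed on a 0 to 100 scale; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Period boundaries (start hour inclusive, end hour exclusive), as listed above.
PERIODS = [("Dawn", 0, 7), ("Morning", 7, 12), ("Afternoon", 12, 20), ("Night", 20, 24)]


def temporal_features(timestamps):
    """Percentage of a user's tweets posted in each period of the day."""
    counts = Counter()
    for ts in timestamps:
        for name, start, end in PERIODS:
            if start <= ts.hour < end:
                counts[name] += 1
                break
    total = len(timestamps)
    return {f"%{name}_tweets": (100.0 * counts[name] / total if total else 0.0)
            for name, _, _ in PERIODS}


def length_of_stay(timestamps):
    """Maximum number of consecutive calendar days with at least one tweet."""
    days = sorted({ts.date() for ts in timestamps})
    best = run = 0
    previous = None
    for day in days:
        consecutive = previous is not None and day - previous == timedelta(days=1)
        run = run + 1 if consecutive else 1
        best = max(best, run)
        previous = day
    return best
```

A user who tweets on 1, 2 and 4 July, for instance, has a length of stay of two days, because the gap on 3 July breaks the run.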
3.3.2. Clustering Parameters
- Algorithm. The k-means algorithm was selected because of its speed and ability to work with large data sets.
- Feature scaling. In cluster analysis it is necessary to ensure the data are scaled appropriately, as features with different scales would distort the distance computations on which the clustering relies. The clustering features were standardised using the z-score.
- Number of clusters (k). The k-means algorithm requires the number of clusters as an input parameter. Clustering is by nature an unsupervised analysis process, and therefore, the optimal number of clusters is case-dependent. In this work we were not concerned about having equally sized clusters with clear dividing boundaries, but rather clusters that represent different combinations of the clustering features in order to create user profiles with different interests and contexts. After some experimentation, we found clusters was a suitable number for our data set.
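The scaling and clustering choices above can be illustrated with a small dependency-free sketch: z-score standardisation of each feature column followed by a plain Lloyd-style k-means with Euclidean distance. It is not the implementation used in the experiments; in practice a library such as scikit-learn (`StandardScaler`, `KMeans`) would be the natural choice. The function names, the random initialisation and the seed are assumptions.

```python
import math
import random


def zscore_columns(rows):
    """Standardise each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]          # constant columns keep std 1
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:
                centroids[j] = [sum(values) / len(members) for values in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Standardising first matters because features such as tweet distances and tweet percentages live on very different scales and would otherwise dominate the Euclidean distance unevenly.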
3.4. Association Rule Mining
3.4.1. ARM Parameters
- Pre-processing: In MBA, the analysis is usually performed on shopping sessions; i.e., one user may have multiple baskets from different shopping sessions. In our case, it was decided to split each user’s history into baskets of POIs experienced on the same day. This was beneficial because the system aimed to recommend POIs for daily trip planning. However, this decision also led to some loss of information, because we dropped the days in which fewer than two POIs were experienced.
- Frequent itemset mining algorithm: Frequent itemset mining (detecting sets of items that frequently appear together) is the main step in ARM. There are a variety of similar algorithms with which to perform this step. The Apriori algorithm [60] was chosen in this work because of its popularity and widespread acceptance. This algorithm requires a minimum support to be provided, which is the minimum frequency with which an itemset has to occur for it to be considered frequent. This parameter was given a low value because the data set is sparse and the algorithm needed some leeway to function. The algorithm also requires the maximum length of the itemsets (the maximum size of the sets of items frequently appearing together). The values chosen for these parameters are shown in Table 4.
- Association rule parameters: In ARM, multiple metrics are computed for each mined rule to evaluate its performance (they are detailed in the following subsection). In order to provide useful rules, these metrics may be used as filters; rules that do not reach a given threshold are discarded. The usual choices are confidence, support or lift, but in our case this filtering step was not relevant because the subsequent selection algorithm performed a ranking of the rules, as will be shown later. Thus, the filtering parameters were set as shown in Table 4.
3.4.2. ARM Metrics
- Support. It indicates how frequently an itemset occurs in a data set. It is the fraction of times an itemset appears among all the transactions being analysed. It can be denoted as the probability of occurrence of the itemset, $P(X)$ for an itemset $X$. The support of a rule is the percentage of times that the antecedent and the consequent of the rule appear together.
- Confidence. It indicates how often a rule is found to be true. It is the proportion of times the consequent is found in the same transaction as the antecedent. It can be denoted as the conditional probability of the consequent appearing in the same transaction given that the antecedent is present, $P(Y \mid X)$ for a rule $X \rightarrow Y$.
- Lift. It was introduced to address a limitation of confidence, which depends only on the support of the rule and of its antecedent. Confidence treats the consequent as dependent on the antecedent, so it is sensitive to the direction of the rule and can be skewed when the consequent is frequent on its own. The lift metric corrects this by considering the support of both the antecedent and the consequent, so the order of the antecedent and the consequent in the rule does not matter.
- Create separate baskets with the POIs visited in the same day by each user.
- Mine frequent itemsets of visited POIs with a maximum length of 3 and a minimum support of 0.001.
- Build association rules from the itemsets uncovered in the previous step.
- Compute the previous metrics for each rule.
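The four steps above can be sketched end to end in plain Python. Note that this sketch enumerates itemsets by brute force instead of using Apriori's candidate pruning, so it is only suitable for small examples; a library such as MLxtend provides Apriori and rule generation for real data. The function names and the `(user, day, poi)` input layout are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_baskets(visits, min_size=2):
    """Group (user, day, poi) records into one basket per user and day."""
    grouped = defaultdict(set)
    for user, day, poi in visits:
        grouped[(user, day)].add(poi)
    # Days with fewer than two POIs carry no co-occurrence information.
    return [frozenset(pois) for pois in grouped.values() if len(pois) >= min_size]


def frequent_itemsets(baskets, min_support=0.001, max_len=3):
    """Support of every itemset of up to max_len items reaching min_support."""
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(basket), size):
                counts[frozenset(combo)] += 1
    total = len(baskets)
    return {items: n / total for items, n in counts.items() if n / total >= min_support}


def mine_rules(supports):
    """Build antecedent -> consequent rules with support, confidence and lift."""
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = support / supports[antecedent]
                rules.append({"antecedent": antecedent,
                              "consequent": consequent,
                              "support": support,
                              "confidence": confidence,
                              "lift": confidence / supports[consequent]})
    return rules
```

The default values mirror the maximum itemset length of 3 and the minimum support of 0.001 stated above. A lift above 1 indicates that the antecedent and the consequent co-occur more often than expected if they were independent.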
3.5. Personalised Recommendations of Touristic Activities
- It should contain only POIs that have been visited by other members of the same cluster.
- It should fit the user’s interests regarding the preferred types of activities and the attraction towards popular items.
- It should reflect the causality of the association rules of the cluster, in order to recommend POIs with high affinity.
- Preference ratio: This metric evaluates if the POIs that appear in a rule belong to any of the activity categories preferred by the user. The activity categories coincide with the activity features used in the clustering process. The ratio is defined over the user’s preferred categories, i.e., the activity categories for which the user has at least one tweet. An indicator function takes value 1 if a category is among the preferred ones and 0 otherwise, and a lookup function returns the category to which a POI belongs according to the activity tree. A further function gives the user’s degree of interest in a preferred category, where the degree of interest is the percentage of the user’s visited POIs belonging to that category. Given the set of POIs in a rule, the preference ratio is calculated as follows:
- Popularity ratio: This is the percentage of popular POIs in a rule. Let $T$ be the top 10 POIs extracted from the data set, $P_r$ the set of POIs in a rule $r$, and $\mathbb{1}_T(p)$ the indicator function with value 1 if $p \in T$ and 0 otherwise. The popularity ratio is computed with the following formula: $\frac{1}{|P_r|} \sum_{p \in P_r} \mathbb{1}_T(p)$.
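As a small worked example of the popularity ratio, the function below computes the share of a rule's POIs that belong to a given set of top POIs. The name `popularity_ratio` and the treatment of an empty rule as 0 are assumptions made for illustration.

```python
def popularity_ratio(rule_pois, top_pois):
    """Fraction of the POIs in a rule that belong to the set of top POIs."""
    pois = set(rule_pois)
    if not pois:
        return 0.0          # assumption: an empty rule has no popular POIs
    return len(pois & set(top_pois)) / len(pois)
```

For a rule containing one top-10 POI and one POI outside the top 10, the ratio is 0.5.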
- The user that desires a recommendation is assigned to a cluster. To make this assignment, first the values of the clustering features are extracted from the analysis of the Twitter history of the user (in the future, a survey will be used to gauge the user’s preferences). Then the user is assigned to the closest cluster by comparing the Euclidean distances between the user’s features and the mean of the members of each cluster. The Euclidean distance was used because it was the distance metric employed in the previous k-means clustering process.
- The system takes the association rules of the user’s cluster and their metrics.
- The popularity and preference metrics are computed for each rule, based on the user’s data.
- The user’s personalised weights are computed as described in Equation (7).
- The metrics in Table 5 are combined using the WA operator to give an overall score for each rule.
- A final selection procedure (see Algorithm 3) is used to select the set of items to be recommended to the user.
Algorithm 3 Selection pseudocode
Input: ARM rules
Output: Set of recommended items R
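Two of the steps listed above lend themselves to a short sketch: assigning a user to the nearest cluster centroid and combining the metrics of a rule with a weighted average (WA). The weights of Equation (7) and the metrics of Table 5 are not reproduced here, so the metric names and weights in the example are purely illustrative, the metrics are assumed to be on comparable scales, and all function names are hypothetical. The final selection of Algorithm 3 is not covered.

```python
import math


def nearest_cluster(user_features, centroids):
    """Index of the centroid closest to the user in Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(user_features, centroids[j]))


def weighted_average(metrics, weights):
    """WA operator: sum(w_i * x_i) / sum(w_i) over the metrics of one rule."""
    total = sum(weights[name] for name in metrics)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weights[name] * value for name, value in metrics.items()) / total


def rank_rules(rules, weights):
    """Sort rules by their overall WA score, best first."""
    return sorted(rules,
                  key=lambda rule: weighted_average(rule["metrics"], weights),
                  reverse=True)


if __name__ == "__main__":
    rules = [{"id": "r1", "metrics": {"confidence": 0.2, "preference": 0.9}},
             {"id": "r2", "metrics": {"confidence": 0.9, "preference": 0.1}}]
    # Illustrative weights for a user who cares mostly about preferred categories.
    print([rule["id"] for rule in rank_rules(rules, {"confidence": 1, "preference": 3})])
```

The user's feature vector must be standardised with the same scaling as the clustering features before the distance comparison, otherwise the distances to the centroids are not comparable.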
4. Experiments and Results
4.1. Experimentation Details
4.2. Evaluation Metrics
- Average Precision (AP): It is the ratio of correct POI recommendations made to the users in the test set. A correct recommendation was determined by the user’s degree of preference in the category of the POI. AP is formulated as follows: The formula relies on a function that returns the category of a POI and on an indicator function that has value 1 if that category is preferred by the user and 0 otherwise. A category is preferred by a user if at least one of the user’s tweets has been associated with it.
- Average Category Recall (ACR): It is the ratio of preferred recommendable POIs that are actually recommended. ACR is formulated as:
- Average Item Recall (AIR): It is the ratio of visited POIs that are actually recommended. AIR is formulated as:
- Unified Item Recall (UIR): It is the unified fraction of times in which the POIs recommended to a user were among the POIs actually visited by the user. The UIR is formulated as:
- Similarity: It measures how similar the POIs recommended to a user and the POIs visited by the user are in terms of their paths in the activity tree. It is formulated as:
- Coverage: It measures the breadth of the recommendations. It is the percentage of items recommended to the users in the test set with respect to the total set of recommendable items. It is formulated as:
- Popularity: It measures the degree to which the popularity of recommended POIs matches the user’s popularity preference. It compares the fraction of top 10 POIs among those visited by a user with the fraction of top 10 POIs among those recommended to that user.
- Personalisation: It measures the mean dissimilarity of users in the test set, based on their recommended POIs. It is formulated as:
- Diversity: It is the pairwise dissimilarity of POIs recommended to users in the test set, modelled after the diversity measure used in [61]. Borràs et al. replaced the arithmetic mean with an ordered weighted average (OWA) aggregation operator to avoid situations in which high values compensate for low values (e.g., aggregating (0,0,0,1,1,1) and (0.5,0.5,0.5,0.5,0.5,0.5) with the arithmetic mean gives the same result, 0.5, although they are very different situations). The OWA weights are defined using a regular increasing monotone linguistic quantifier, which invokes a disjunctive policy where the lowest similarity values have higher weights. If $(a_1, \ldots, a_n)$ is the set of values to be aggregated (in decreasing order) and $W = (w_1, \ldots, w_n)$ is the weighting vector, where $w_j \in [0,1]$ and $\sum_{j=1}^{n} w_j = 1$, then OWA is defined as: $\mathrm{OWA}(a_1, \ldots, a_n) = \sum_{j=1}^{n} w_j a_j$.
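The effect of the OWA operator on the example above can be checked with a short sketch. It derives the weights from a regular increasing monotone quantifier of the form Q(r) = r^alpha, a common choice in the OWA literature; the quantifier actually used in [61] and its parameter are not given here, so `alpha = 2` is only an illustrative assumption, as is the function name.

```python
def owa(values, alpha=2.0):
    """Ordered weighted average with weights w_j = Q(j/n) - Q((j-1)/n), Q(r) = r**alpha.

    Values are sorted in decreasing order before weighting, so alpha > 1
    gives more weight to the lowest values.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot aggregate an empty list")
    weights = [(j / n) ** alpha - ((j - 1) / n) ** alpha for j in range(1, n + 1)]
    return sum(w * a for w, a in zip(weights, ordered))
```

With `alpha = 2`, the list (0,0,0,1,1,1) aggregates to 0.25 while six values of 0.5 still aggregate to 0.5, so the two situations that the arithmetic mean cannot tell apart are now distinguished.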
4.3. Experiment Results and Discussion
5. Conclusions and Future Work
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
RS | Recommender System |
UNWTO | United Nations World Tourism Organization |
GPS | Global Positioning System |
POI | Point of Interest |
OSM | Open Street Map |
LBSN | Location-Based Social Networks |
URL | Uniform Resource Locator |
JSON | JavaScript Object Notation |
UTC | Coordinated Universal Time |
UTF | Unicode Transformation Format |
BCP | Best Current Practices |
SQL | Structured Query Language |
QL | Query Language |
NLTK | Natural Language Toolkit |
ARM | Association Rule Mining |
MBA | Market Basket Analysis |
OWA | Ordered Weighted Average |
WA | Weighted Average |
Appendix A
References
- UNWTO World Tourism Barometer and Statistical Annex, January 2020. UNWTO World Tour. Barom. 2020, 18, 1–48.
- International Tourism 2019 and Outlook for 2020. Available online: https://webunwto.s3.eu-west-1.amazonaws.com/s3fs-public/2020-01/Barometro-Jan-2020-EN-pre.pdf (accessed on 6 April 2021).
- Rathod, A.; Indiramma, M. A Survey of Personalized Recommendation System with User Interest in Social Network. Int. J. Comput. Sci. Inf. Technol. 2015, 6, 413–415.
- Haruna, K.; Akmar Ismail, M.; Suhendroyono, S.; Damiasih, D.; Pierewan, A.C.; Chiroma, H.; Herawan, T. Context-Aware Recommender System: A Review of Recent Developmental Process and Future Research Direction. Appl. Sci. 2017, 7, 1211.
- Quadrana, M.; Cremonesi, P.; Jannach, D. Sequence-Aware Recommender Systems. ACM Comput. Surv. 2018, 51.
- Dara, S.; Chowdary, R.C.; Kumar, C. A survey on group recommender systems. J. Intell. Inf. Syst. 2020, 54, 271–295.
- Burke, R. Hybrid Recommender Systems: Survey and Experiments. User Model User-Adap. Inter. 2002, 12, 331–370.
- Borràs, J.; Moreno, A.; Valls, A. Intelligent tourism recommender systems: A survey. Expert Syst. Appl. 2014, 41, 7370–7389.
- Massimo, D.; Ricci, F. Harnessing a Generalised User Behaviour Model for Next-POI Recommendation. In RecSys ’18, Proceedings of the 12th ACM Conference on Recommender Systems; Association for Computing Machinery: New York, NY, USA, 2018; pp. 402–406.
- Ma, S.; Kirilenko, A. How Reliable Is Social Media Data? Validation of TripAdvisor Tourism Visitations Using Independent Data Sources. In Information and Communication Technologies in Tourism 2021; Wörndl, W., Koo, C., Stienmetz, J.L., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 286–293.
- Anandhan, A.; Shuib, L.; Ismail, M.A.; Mujtaba, G. Social Media Recommender Systems: Review and Open Research Issues. IEEE Access 2018, 6, 15608–15628.
- Tsai, C.Y.; Paniagua, G.; Chen, Y.J.; Lo, C.C.; Yao, L. Personalized Tour Recommender through Geotagged Photo Mining and LSTM Neural Networks. MATEC Web Conf. 2019, 292, 01003.
- Dietz, L.W.; Sen, A.; Roy, R.; Wörndl, W. Mining Trips from Location-Based Social Networks for Clustering Travelers and Destinations; Springer: Berlin/Heidelberg, Germany, 2020; Volume 22, pp. 131–166.
- Van der Zee, E.; Bertocchi, D. Finding patterns in urban tourist behaviour: A social network analysis approach based on TripAdvisor reviews. Inf. Technol. Tour. 2018, 20, 153–180.
- Manca, M.; Boratto, L.; Morell Roman, V.; Martori i Gallissà, O.; Kaltenbrunner, A. Using social media to characterize urban mobility patterns: State-of-the-art survey and case-study. Online Soc. Netw. Media 2017, 1, 56–69.
- Berndt, J.O.; Rodermund, S.C.; Lorig, F.; Timm, I.J. Modeling User Behavior in Social Media with Complex Agents. In Proceedings of the HUSO 2017—The Third International Conference on Human and Social Analytics, Nice, France, 23–27 July 2017; pp. 18–24.
- Ishanka, U.A.; Yukawa, T. User Emotion and Personality in Context-aware Travel Destination Recommendation. In Proceedings of the 2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA), Krabi, Thailand, 14–17 August 2018; pp. 13–18.
- Jabreel, M.; Huertas, A.; Moreno, A. Semantic analysis and the evolution towards participative branding: Do locals communicate the same destination brand values as DMOs? PLoS ONE 2018, 13, e0206572.
- Jabreel, M.; Moreno, A.; Huertas, A. Do Local Residents and Visitors Express the Same Sentiments on Destinations through Social Media? In Information and Communication Technologies in Tourism 2017; Schegg, R., Stangl, B., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 655–668.
- Huang, F.; Qiao, S.; Peng, J.; Guo, B.; Han, N. STPR: A Personalized Next Point-of-Interest Recommendation Model with Spatio-Temporal Effects Based on Purpose Ranking. IEEE Trans. Emerg. Top. Comput. 2019, 9, 994–1005.
- Massimo, D.; Ricci, F. Next-POI Recommendations Matching User’s Visit Behaviour. In Information and Communication Technologies in Tourism 2021; Wörndl, W., Koo, C., Stienmetz, J.L., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 45–57.
- Baral, R.; Iyengar, S.S.; Li, T.; Balakrishnan, N. CLoSe: Contextualized Location Sequence Recommender. In RecSys ’18, Proceedings of the 12th ACM Conference on Recommender Systems; Association for Computing Machinery: New York, NY, USA, 2018; pp. 470–474.
- He, R.; Kang, W.C.; McAuley, J. Translation-Based Recommendation. In RecSys ’17, Proceedings of the Eleventh ACM Conference on Recommender Systems; Association for Computing Machinery: New York, NY, USA, 2017; pp. 161–169.
- Li, C.T.; Chen, H.Y.; Chen, R.H.; Hsieh, H.P. On route planning by inferring visiting time, modeling user preferences, and mining representative trip patterns. Knowl. Inf. Syst. 2018, 56, 581–611.
- Bustamante, A.; Sebastia, L.; Onaindia, E. Can Tourist Attractions Boost Other Activities Around? A Data Analysis through Social Networks. Sensors 2019, 19, 2612.
- Farnadi, G.; Tang, J.; De Cock, M.; Moens, M.F. User Profiling through Deep Multimodal Fusion. In WSDM ’18, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining; Association for Computing Machinery: New York, NY, USA, 2018; pp. 171–179.
- Orlandi, F.; Breslin, J.; Passant, A. Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web. In I-SEMANTICS’12, Proceedings of the 8th International Conference on Semantic Systems; Association for Computing Machinery: New York, NY, USA, 2012; pp. 41–48.
- DBpedia. Available online: https://wiki.dbpedia.org/ (accessed on 29 April 2021).
- Esmaeili, L.; Mardani, S.; Golpayegani, S.A.H.; Madar, Z.Z. A novel tourism recommender system in the context of social commerce. Expert Syst. Appl. 2020, 149, 113301.
- Liji, U.; Chai, Y.; Chen, J. Improved personalized recommendation based on user attributes clustering and score matrix filling. Comput. Stand. Interfaces 2018, 57, 59–67.
- Ma, X.; Lu, H.; Gan, Z.; Zhao, Q. An exploration of improving prediction accuracy by constructing a multi-type clustering based recommendation framework. Neurocomputing 2016, 191, 388–397.
- Nguyen, L.V.; Jung, J.J.; Hwang, M. OurPlaces: Cross-Cultural Crowdsourcing Platform for Location Recommendation Services. ISPRS Int. J. Geo-Inf. 2020, 9, 711.
- Nguyen, L.V.; Hong, M.S.; Jung, J.J.; Sohn, B.S. Cognitive Similarity-Based Collaborative Filtering Recommendation System. Appl. Sci. 2020, 10, 4183.
- Fränti, P.; Waga, K.; Khurana, C. Can Social Network Be Used for Location-aware Recommendation? In Proceedings of the 11th International Conference on Web Information Systems and Technologies—WEBIST, Lisbon, Portugal, 20–22 May 2015; INSTICC, SciTePress: Setúbal, Portugal, 2015; pp. 558–565.
- Fränti, P.; Mariescu-Istodor, R.; Waga, K. Similarity of Mobile Users Based on Sparse Location History. In Artificial Intelligence and Soft Computing; Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 593–603.
- Najafabadi, M.K.; Mahrin, M.N.; Chuprat, S.; Sarkan, H.M. Improving the accuracy of collaborative filtering recommendations using clustering and association rules mining on implicit data. Comput. Hum. Behav. 2017, 67, 113–128.
- Pandya, S.; Shah, J.; Joshi, N.; Ghayvat, H.; Mukhopadhyay, S.C.; Yap, M.H. A novel hybrid based recommendation system based on clustering and association mining. In Proceedings of the 2016 10th International Conference on Sensing Technology (ICST), Nanjing, China, 11–13 November 2016; pp. 1–6.
- Jalalimanesh, A.; Mansoury, M.; Gandomi, H. Recommender system based on data mining: Interlibrary case study. In Proceedings of the 20th Iranian Conference on Electrical Engineering (ICEE2012), Tehran, Iran, 15–17 May 2012; pp. 806–809.
- Fenza, G.; Fischetti, E.; Furno, D.; Loia, V. A hybrid context aware system for tourist guidance based on collaborative filtering. In Proceedings of the 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011), Taipei, Taiwan, 27–30 June 2011; pp. 131–138.
- Twitter Wikipedia. Available online: https://en.wikipedia.org/wiki/Twitter (accessed on 28 April 2021).
- Twitter Revenue and Usage Statistics. 2020. Available online: https://www.businessofapps.com/data/twitter-statistics/#:~:text=We%20saw%20a%20recovery%20to,%2435.01%20billion%20in%20September%202019/ (accessed on 30 March 2021).
- Similarweb Twitter Traffic Overview. Available online: https://www.similarweb.com/website/twitter.com/ (accessed on 30 March 2021).
- Twitter API. Available online: https://developer.twitter.com/en/docs/twitter-api (accessed on 28 April 2021).
- Twitter Object Attributes. Available online: https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet (accessed on 28 April 2021).
- Lalicic, L.; Huertas, A.; Moreno, A.; Jabreel, M. Emotional brand communication on Facebook and Twitter: Are DMOs successful? J. Dest. Mark. Manag. 2020, 16, 100350.
- Lalicic, L.; Huertas, A.; Moreno, A.; Jabreel, M. Which emotional brand values do my followers want to hear about? An investigation of popular European tourist destinations. Inf. Technol. Tour. 2019, 21, 63–81.
- Enzensberger, H.M. A Theory of Tourism. New Ger. Crit. 1996, 117–135.
- Neff, J.C. Santa Fe and the Tourist. New Mex. Q. 1938, 8. Available online: https://digitalrepository.unm.edu/nmq/vol8/iss2/12 (accessed on 8 July 2021).
- Waga, K.; Tabarcea, A.; Fränti, P. Recommendation of points of interest from user generated data collection. In Proceedings of the 8th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Pittsburgh, PA, USA, 14–17 October 2012; pp. 550–555.
- Mariescu-Istodor, R.; Ungureanu, R.; Fränti, P. Real-time destination prediction for mobile users. Adv. Cartogr. Gisci. Int. Cartogr. Assoc. 2019, 2, 1–7.
- OpenStreetMap. Available online: https://www.openstreetmap.org/ (accessed on 29 April 2021).
- OpenStreetMap Map Features. Available online: https://wiki.openstreetmap.org/wiki/Map_features (accessed on 30 April 2021).
- OpenStreetMap Taginfo. Available online: https://taginfo.openstreetmap.org/tags (accessed on 30 April 2021).
- Moreno, A.; Valls, A.; Isern, D.; Marin, L.; Borràs, J. SigTur/E-Destination: Ontology-based personalized recommendation of Tourism and Leisure Activities. Eng. Appl. Artif. Intell. 2013, 26, 633–651.
- Overpass Turbo EU. Available online: https://overpass-turbo.eu/ (accessed on 29 April 2021).
- Overpass QL. Available online: https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL (accessed on 30 April 2021).
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2009.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Raschka, S. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. J. Open Source Softw. 2018, 3.
- Agrawal, R.; Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases. In VLDB’94, Proceedings of the 20th International Conference on Very Large Data Bases; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1994; pp. 487–499.
- Borràs, J.; Moreno, A.; Valls, A. Diversification of recommendations through semantic clustering. Multimed. Tools Appl. 2017, 76, 24165–24201.
- Golden, B.; Levy, L.; Vohra, R. The orienteering problem. Nav. Res. Logist. 1987, 34, 307–318.
Attribute | Data Type | Short Description |
---|---|---|
Created at | String | UTC time when the tweet was created. |
Id str | String | Unique identifier of the tweet. |
Text | String | Actual UTF-8 text of the tweet. |
User | User object | Data dictionary containing information about the user, including id, screen_name and geo_enabled. |
Coordinates | Coordinates | Geographical location of the tweet, if shared by the user. |
Place | Place object | Geographical data dictionary that indicates the place from which the tweet was sent. It can be a country, a region or even a POI. |
Lang | String | BCP 47 language identifier of the machine-detected language of the text of the tweet. |
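As a rough illustration of how the attributes in this table might be read from a raw tweet, the sketch below parses a Twitter API v1.1-style JSON payload using only the Python standard library. The sample payload, the `TweetRecord` class and the `parse_tweet` helper are hypothetical and are not part of the system described in this paper; only the attribute names come from the table. Coordinates are assumed to follow the GeoJSON convention of longitude first, then latitude.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Timestamp layout of v1.1 tweet objects, e.g. "Wed Apr 28 10:15:00 +0000 2021".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


@dataclass
class TweetRecord:
    """Hypothetical container for the attributes listed in the table."""
    tweet_id: str
    created_at: datetime
    text: str
    user_id: str
    lang: str
    lon_lat: Optional[Tuple[float, float]]  # None when no exact point was shared
    place_name: Optional[str]               # None when no place was attached


def parse_tweet(raw: str) -> TweetRecord:
    """Extract the attributes of interest from one JSON-encoded tweet."""
    tweet = json.loads(raw)
    coords = tweet.get("coordinates")  # GeoJSON point: [longitude, latitude]
    place = tweet.get("place")
    return TweetRecord(
        tweet_id=tweet["id_str"],
        created_at=datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        text=tweet["text"],
        user_id=tweet["user"]["id_str"],
        lang=tweet.get("lang", "und"),
        lon_lat=tuple(coords["coordinates"]) if coords else None,
        place_name=place["full_name"] if place else None,
    )


# Invented sample payload, for illustration only.
SAMPLE = json.dumps({
    "created_at": "Wed Apr 28 10:15:00 +0000 2021",
    "id_str": "1",
    "text": "Morning walk on the beach",
    "user": {"id_str": "42", "screen_name": "example_user"},
    "coordinates": {"type": "Point", "coordinates": [2.1894, 41.3784]},
    "place": {"full_name": "Barcelona, Spain"},
    "lang": "en",
})

record = parse_tweet(SAMPLE)
print(record.created_at.isoformat(), record.lon_lat, record.place_name)
```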
Category | Distance | Priority |
---|---|---|
Culture | 50 m | 1 |
Leisure | 25 m | 2 |
Accommodation | 35 m | 3 |
Gastronomy | 25 m | 4 |
Nature | 15 m | 5 |
Routes | 15 m | 6 |
Sports | 15 m | 7 |
Transportation | 15 m | 8 |
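One possible reading of this table is that a geolocated tweet is linked to a point of interest only when the two lie within the distance threshold of the POI's category, and that, when several candidates qualify, the category with the lowest priority number wins. The sketch below illustrates that reading. The matching rule itself, the `match_activity` helper and the sample coordinates are assumptions made for illustration; only the thresholds and priorities are taken from the table.

```python
import math
from typing import Iterable, Optional, Tuple

# category -> (distance threshold in metres, priority); values from the table.
CATEGORY_RULES = {
    "Culture": (50, 1), "Leisure": (25, 2), "Accommodation": (35, 3),
    "Gastronomy": (25, 4), "Nature": (15, 5), "Routes": (15, 6),
    "Sports": (15, 7), "Transportation": (15, 8),
}

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_activity(
    tweet_pos: Tuple[float, float],
    pois: Iterable[Tuple[str, str, float, float]],
) -> Optional[Tuple[str, str]]:
    """Return (poi_name, category) of the preferred candidate, or None.

    Assumed rule: keep POIs within their category's threshold, then prefer
    the lowest priority number, breaking ties by distance.
    """
    candidates = []
    for name, category, lat, lon in pois:
        max_dist, priority = CATEGORY_RULES[category]
        dist = haversine_m(tweet_pos[0], tweet_pos[1], lat, lon)
        if dist <= max_dist:
            candidates.append((priority, dist, name, category))
    if not candidates:
        return None
    _, _, name, category = min(candidates)
    return name, category


# Invented positions (latitude, longitude), for illustration only.
tweet = (41.4036, 2.1744)
pois = [
    ("Museum", "Culture", 41.4039, 2.1744),            # roughly 33 m away
    ("Cafe", "Gastronomy", 41.40365, 2.1744),          # roughly 6 m away
    ("Bus stop", "Transportation", 41.4040, 2.1744),   # roughly 44 m away
]
print(match_activity(tweet, pois))
```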
Statistics | Value |
---|---|
Total number of tweets in Barcelona | 1,523,801 |
Total number of users in Barcelona | 108,515 |
Statistics after filtering | |
Total number of tweets in Barcelona | 37,302 |
Total number of users in Barcelona | 6066 |
Apriori: Min support | Apriori: Max length | Association rules: Metric | Association rules: Minimum value |
---|---|---|---|
0.001 | 3 | Lift | 0 |
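To make the role of these parameters concrete, the sketch below mines frequent itemsets level by level in the spirit of Apriori and then derives rules filtered by lift. It is a dependency-free stand-in written for illustration, not the MLxtend implementation cited in the references, and the function names and toy transactions are hypothetical. The defaults (minimum support 0.001, maximum itemset length 3, minimum lift 0) echo the table.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.001, max_len=3):
    """Level-wise search: return {frozenset: support} for itemsets up to max_len."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    current = [frozenset([i]) for i in sorted({i for t in sets for i in t})]
    frequent = {}
    k = 1
    while current and k <= max_len:
        level = {}
        for cand in current:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        current = list({
            a | b
            for a, b in combinations(list(level), 2)
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        })
        k += 1
    return frequent


def association_rules(frequent, min_lift=0.0):
    """Derive antecedent -> consequent rules with support, confidence and lift."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                confidence = support / frequent[ante]
                lift = confidence / frequent[cons]
                if lift >= min_lift:
                    rules.append({
                        "antecedent": ante, "consequent": cons,
                        "support": support, "confidence": confidence,
                        "lift": lift,
                    })
    return rules


# Toy activity sets per user, invented for illustration.
transactions = [
    {"Beach", "Restaurant"},
    {"Beach", "Restaurant", "Museum"},
    {"Museum", "Park"},
    {"Beach", "Park"},
]
itemsets = frequent_itemsets(transactions)
rules = association_rules(itemsets)
for rule in sorted(rules, key=lambda r: -r["lift"])[:3]:
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          round(rule["lift"], 3))
```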
Case | Preference Ratio | Lift | Confidence | Support | Antecedent Support | Consequent Support | Popularity Ratio |
---|---|---|---|---|---|---|---|
 | 0.5 | 0.15 | 0.15 | 0.1 | 0.05 | 0.05 | 0.0 |
 | 0.3 | 0.05 | 0.05 | 0.1 | 0.1 | 0.1 | 0.3 |
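The values in each row of this table sum to 1, which suggests that they act as weights over the seven rule features. The sketch below shows one way such weights could be combined into a single score, assuming a plain weighted sum over feature values already scaled to [0, 1]. Both the linear form and the scaling are assumptions, the `rule_score` helper and the sample feature values are hypothetical, and the rows are indexed by position because the case labels are not reproduced in the table.

```python
import math

FEATURES = (
    "preference_ratio", "lift", "confidence", "support",
    "antecedent_support", "consequent_support", "popularity_ratio",
)

# Weight rows in the order they appear in the table.
WEIGHT_ROWS = [
    (0.5, 0.15, 0.15, 0.1, 0.05, 0.05, 0.0),
    (0.3, 0.05, 0.05, 0.1, 0.1, 0.1, 0.3),
]


def rule_score(features: dict, weights) -> float:
    """Weighted sum of feature values, each assumed to be scaled to [0, 1]."""
    return sum(w * features[name] for name, w in zip(FEATURES, weights))


# Invented feature values for one candidate rule, for illustration only.
example_rule = {
    "preference_ratio": 0.8, "lift": 0.4, "confidence": 0.6, "support": 0.1,
    "antecedent_support": 0.2, "consequent_support": 0.3,
    "popularity_ratio": 0.9,
}

for weights in WEIGHT_ROWS:
    # Each row is a convex combination: the weights sum to 1.
    assert math.isclose(sum(weights), 1.0)
    print(round(rule_score(example_rule, weights), 3))
```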
Case | AP | ACR | AIR | UIR | Similarity | Popularity | Coverage | Personalisation | Diversity |
---|---|---|---|---|---|---|---|---|---|
Experiment 1 | |||||||||
Without clustering | 0.707 | 0.036 | 0.223 | 0.161 | 0.785 | 0.779 | 0.039 | 0.654 | 0.220 |
With clustering | 0.735 | 0.039 | 0.197 | 0.484 | 0.791 | 0.826 | 0.172 | 0.733 | 0.208 |
Experiment 2 | |||||||||
Without clustering | 0.712 | 0.038 | 0.223 | 0.164 | 0.784 | 0.780 | 0.041 | 0.676 | 0.210 |
With clustering | 0.762 | 0.040 | 0.181 | 0.543 | 0.794 | 0.806 | 0.206 | 0.753 | 0.191 |
Experiment 3 | |||||||||
Without clustering | 0.705 | 0.035 | 0.206 | 0.142 | 0.788 | 0.794 | 0.038 | 0.684 | 0.213 |
With clustering | 0.742 | 0.039 | 0.178 | 0.519 | 0.792 | 0.822 | 0.199 | 0.758 | 0.199 |
Experiment average | |||||||||
Without clustering | 0.708 | 0.036 | 0.217 | 0.156 | 0.786 | 0.785 | 0.039 | 0.671 | 0.214 |
With clustering | 0.746 | 0.039 | 0.185 | 0.516 | 0.792 | 0.818 | 0.192 | 0.748 | 0.199 |
Difference (diff) | 0.038 | 0.003 | −0.032 | 0.360 | 0.006 | 0.034 | 0.153 | 0.077 | −0.015 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Orama, J.A.; Borràs, J.; Moreno, A. Combining Cluster-Based Profiling Based on Social Media Features and Association Rule Mining for Personalised Recommendations of Touristic Activities. Appl. Sci. 2021, 11, 6512. https://doi.org/10.3390/app11146512