The Value of Web Data Scraping: An Application to TripAdvisor
Abstract
1. Introduction
2. Materials and Methods
2.1. WDS: A Big Data Source
2.2. FpK: An Ease of Use Tool
3. Results
4. Discussion
4.1. Towards Cognitive Scraping
5. Conclusions
5.1. Other Trends
5.2. Implications and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
TA Profiles | Page Performance Index | Fans | Follower Growth (%) | Post Interaction | Posts Per Day |
---|---|---|---|---|---|
TikTok | 10.0% | 1127 | 1843.1% | 1.31% | 0.052 |
(profile name missing) | 10.0% | 228,048 | 8.6% | 0.0006% | 1.038 |
(profile name missing) | 8.0% | 3,424,208 | 0.97% | 0.0005% | 6.917 |
(profile name missing) | 14.0% | 2,769,720 | 12.38% | 0.06% | 1.057 |
(profile name missing) | 9.0% | 7,626,233 | 0.94% | 0.0008% | 4.608 |
Average | 10.2% | 2,809,867.2 | 373.2% | 0.27% | 2.734 |
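
The Average row of the table above appears to be a plain arithmetic mean of each column over the five profiles. The following minimal sketch reproduces that arithmetic from the tabulated values. The variable and function names (`rows`, `columns`, `column_averages`) are illustrative assumptions, not part of the original study, and percentages are stored as plain numbers (10.0 stands for 10.0%).

```python
from statistics import mean

# Metric values copied from the five profile rows of the table, in row order.
# Percent columns are kept as plain numbers (10.0 means 10.0%).
rows = [
    # (page performance index %, fans, follower growth %, post interaction %, posts per day)
    (10.0, 1127, 1843.1, 1.31, 0.052),
    (10.0, 228_048, 8.6, 0.0006, 1.038),
    (8.0, 3_424_208, 0.97, 0.0005, 6.917),
    (14.0, 2_769_720, 12.38, 0.06, 1.057),
    (9.0, 7_626_233, 0.94, 0.0008, 4.608),
]

# Hypothetical column keys, chosen here only for readability.
columns = [
    "page_performance_index",
    "fans",
    "follower_growth",
    "post_interaction",
    "posts_per_day",
]


def column_averages(data):
    """Return the arithmetic mean of each column as a dict keyed by column name."""
    return {name: mean(col) for name, col in zip(columns, zip(*data))}


averages = column_averages(rows)
for name, value in averages.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision used in the table, the computed means agree with the Average row. The follower growth mean is dominated by the single TikTok value of 1843.1%, so a median may be a more representative summary for that column.
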
Web Scraper | Key Features |
---|---|
Octoparse | Anonymous web data scraping behind login forms |
ParseHub | Efficiency of data extraction from complex web pages |
Mozenda | Job sequencers to collect web data in real-time; highly scalable |
Webhose.io | Fast content indexing; get machine-readable data sets |
Content Grabber | Allows building web apps and offers a wide variety of formats |
Common Crawl | Support for non-code usage; has resources to teach data analysis |
Scrapy | Open-source and easily extensible; middleware modules available for integrating useful tools |
Scraperapi | Easy to integrate; allows price scraping, search engine scraping |
Scrape-it.Cloud | Easy integration in other systems; scrapers for popular sites |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barbera, G.; Araujo, L.; Fernandes, S. The Value of Web Data Scraping: An Application to TripAdvisor. Big Data Cogn. Comput. 2023, 7, 121. https://doi.org/10.3390/bdcc7030121