Detecting Fake Reviews in Google Maps—A Case Study
Abstract
1. Introduction
2. Related Work
2.1. Existing Datasets
2.2. Detection of Fake Reviews
- “Review-centric”, which concentrates mostly on the content of the review itself and relies on natural language processing (NLP) methods to separate fake reviews from genuine ones.
- “Reviewer-centric”, which looks more broadly at the reviewer’s account and its general statistics, such as the percentage of positive reviews or the number of reviews written within a given time period.
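As an illustration of the reviewer-centric approach, the two account statistics mentioned above can be computed from an account’s ratings and review dates. The following Python sketch is hypothetical: the function names, the 4-star threshold for a “positive” review, and the one-day window are illustrative assumptions, not settings taken from the cited work.

```python
from datetime import datetime, timedelta


def percent_positive(ratings, threshold=4):
    """Percentage of ratings at or above `threshold` stars (threshold is an assumption)."""
    if not ratings:
        return 0.0
    return 100.0 * sum(r >= threshold for r in ratings) / len(ratings)


def max_reviews_in_window(dates, window=timedelta(days=1)):
    """Largest number of reviews posted within any time span of length `window`."""
    times = sorted(dates)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Move the left edge forward until times[start]..t fits inside the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


dates = [
    datetime(2023, 1, 1, 10),
    datetime(2023, 1, 1, 12),
    datetime(2023, 1, 1, 20),
    datetime(2023, 1, 5, 9),
]
print(percent_positive([5, 5, 4, 1]))  # 75.0
print(max_reviews_in_window(dates))    # 3
```

The sliding-window count is one simple way to flag bursts of reviews posted by a single account within a short time.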
3. Creating a Dataset of Google Maps Reviews
3.1. Scraping Google Maps to Identify Fake and Genuine Data
- Find online services offering fake reviews.
- Search their website for examples of reviews they have written.
- Use these samples to locate the corresponding reviews and the reviewed objects on Google Maps.
- If the reviewer’s profile is not set to private, save the account’s name and URL in a temporary database.
- Using our automated Google Maps scraper, check the places reviewed by these accounts for groups of reviewers who have reviewed the same places.
- Label the accounts belonging to such groups as fake.
- Scrape more detailed data about the accounts presumed to be fake and save it to the dataset.
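The grouping step in the procedure above can be sketched as a graph problem: link two accounts when the sets of places they reviewed overlap sufficiently, then treat each connected component containing at least two accounts as a group. This is a minimal sketch under assumptions. The procedure does not state how large an overlap counts as a group, so the `min_shared` threshold and the function name are hypothetical.

```python
from itertools import combinations


def co_review_groups(places_by_account, min_shared=3):
    """Group accounts that reviewed at least `min_shared` common places (threshold is hypothetical).

    Accounts are linked pairwise when their sets of reviewed places overlap by
    `min_shared` or more; groups are the connected components of that graph.
    """
    parent = {a: a for a in places_by_account}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(places_by_account, 2):
        if len(places_by_account[a] & places_by_account[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for a in places_by_account:
        groups.setdefault(find(a), set()).add(a)
    # A single account is not a group: keep only components with two or more accounts.
    return [g for g in groups.values() if len(g) > 1]


places = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2", "p3", "p4"},
    "C": {"p2", "p3", "p4"},
    "D": {"p9"},
}
print(co_review_groups(places))  # one group containing A, B and C
```

Note that A and C share only two places, yet end up in the same group through B. The pairwise comparison is quadratic in the number of accounts, which is acceptable for a few hundred candidates but would call for an inverted index (place to accounts) at larger scale.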
3.2. Challenges Faced When Crawling Google Maps Data
3.3. Database Validation
3.4. Clustering Object Types
3.5. Features Collected
3.6. Dataset Statistics
4. Initial Experiments and Results
4.1. Detection of Fake Accounts
- Local guide level;
- Number of reviews;
- Basic rating statistics (mean, median, and variance);
- Percentage of reviews with responses given.
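A minimal sketch of how these basic features might be computed for one account, assuming its scraped reviews are available as a list of star ratings and a parallel list of flags indicating whether each review received an owner response. The function name is hypothetical, and treating a null local guide level or review count as 0 and using the population variance are assumptions.

```python
from statistics import mean, median, pvariance


def basic_account_features(local_guide_level, number_of_reviews, ratings, responded):
    """Basic feature set for one account (hypothetical helper).

    `ratings` holds the star ratings of the account's reviews (at least one);
    `responded` holds one bool per review: True if the place owner replied.
    """
    return {
        # Null values from the scraper are mapped to 0 (assumption).
        "local_guide_level": local_guide_level or 0,
        "number_of_reviews": number_of_reviews or 0,
        "rating_mean": mean(ratings),
        "rating_median": median(ratings),
        "rating_variance": pvariance(ratings),  # population variance (assumption)
        "pct_responded": 100.0 * sum(responded) / len(responded),
    }


features = basic_account_features(2, 40, [5, 5, 5, 1], [True, False, False, False])
```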
4.2. Additional Features
4.2.1. Name Score
Algorithm 1: Algorithm for calculating name score.
Input: name_of_account (string)
Output: name score (int)
1: number of words in name_of_account
2: if () then
3:   first(name_of_account)
4:   second(name_of_account)
5:
6:   return
7: else
8:   first word of name_of_account
9:   second word of name_of_account
10:
11:
12:
13:
14:
15:
16:
17:   return
18: end if
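The expressions inside Algorithm 1 did not survive extraction, so its exact branches and weights cannot be reproduced here. The sketch below is a hypothetical scoring rule in the same spirit: it checks the first word of the account name against a set of known forenames and the second word against a set of known surnames (lists of this kind are published for the PESEL register). The function name, the one-point-per-match weights, and the handling of one-word names are assumptions.

```python
def name_score(name_of_account, forenames, surnames):
    """Hypothetical name score: one point for a known forename, one for a known surname.

    `forenames` and `surnames` are sets of casefolded (lower-case) names.
    """
    words = name_of_account.split()
    if len(words) < 2:
        # One-word (or empty) names: only check whether the word is a known forename.
        return int(bool(words) and words[0].casefold() in forenames)
    first, second = words[0].casefold(), words[1].casefold()
    return int(first in forenames) + int(second in surnames)


forenames = {"anna", "jan"}
surnames = {"kowalski", "nowak"}
print(name_score("Jan Kowalski", forenames, surnames))  # 2
print(name_score("Reviews4U", forenames, surnames))     # 0
```

Casefolding both sides keeps the lookup insensitive to capitalization, including for Polish letters such as “Ł”.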
4.2.2. Geographic Dispersion
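The dispersion measure is not defined in this section. One plausible way to quantify how spread out an account’s reviewed places are is the mean great-circle (haversine) distance of those places from their centroid, using the `localization` field (“lat” and “lon”) of each review. The measure and the function names are assumptions; averaging raw coordinates to obtain the centroid is adequate for places within one country but breaks down near the antimeridian.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def geographic_dispersion(localizations):
    """Hypothetical measure: mean distance (km) of reviewed places from their naive centroid."""
    if not localizations:
        return 0.0
    n = len(localizations)
    lat_c = sum(p["lat"] for p in localizations) / n
    lon_c = sum(p["lon"] for p in localizations) / n
    return sum(haversine_km(p["lat"], p["lon"], lat_c, lon_c) for p in localizations) / n
```

An account whose reviews all concern places in one town scores close to 0, while one reviewing places scattered across the country scores in the hundreds of kilometres.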
4.3. Results of Detection of Fake Accounts Using the Extended Feature Set
4.4. Detection of Fake Reviews
- Name score of the authoring account;
- Local guide level of the authoring account;
- Number of reviews of the authoring account;
- Rating awarded;
- Whether the review was responded to.
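One way to assemble these five features for a single review from the dataset fields described in the property tables (`rating`, `response_content`, `local_guide_level`, `number_of_reviews`), assuming the authoring account’s name score has already been computed. The function name and the mapping of null values to 0 are hypothetical.

```python
def review_features(review, account, account_name_score):
    """Feature vector for one review, in the order of the list above (hypothetical helper)."""
    return [
        account_name_score,
        account.get("local_guide_level") or 0,  # null mapped to 0 (assumption)
        account.get("number_of_reviews") or 0,  # null mapped to 0 (assumption)
        review["rating"],
        # A review counts as responded to when the owner's response is present.
        int(review.get("response_content") is not None),
    ]
```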
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Bondielli, A.; Marcelloni, F. A survey on fake news and rumour detection techniques. Inf. Sci. 2019, 497, 38–55.
2. Calleja, N.; AbdAllah, A.; Abad, N.; Ahmed, N.; Albarracin, D.; Altieri, E.; Anoko, J.N.; Arcos, R.; Azlan, A.A.; Bayer, J.; et al. A Public Health Research Agenda for Managing Infodemics: Methods and Results of the First WHO Infodemiology Conference. JMIR Infodemiol. 2021, 1, e30979.
3. Gradoń, K.T.; Hołyst, J.A.; Moy, W.R.; Sienkiewicz, J.; Suchecki, K. Countering misinformation: A multidisciplinary approach. Big Data Soc. 2021, 8, 1–14.
4. Google Maps. Google Maps. 2023. Available online: https://www.google.pl/maps (accessed on 20 May 2023).
5. Jindal, N.; Liu, B. Opinion spam and analysis. In Proceedings of the International Conference on Web Search and Data Mining (WSDM 2008), Palo Alto, CA, USA, 11–12 February 2008; pp. 219–229.
6. Ott, M.; Choi, Y.; Cardie, C.; Hancock, J.T. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. arXiv 2011, arXiv:1107.4557.
7. Yoo, K.H.; Gretzel, U. Comparison of Deceptive and Truthful Travel Reviews. In Proceedings of the Information and Communication Technologies in Tourism, Amsterdam, The Netherlands, 28–30 January 2009; Springer: Vienna, Austria, 2009; pp. 37–47.
8. Sandulescu, V.; Ester, M. Detecting singleton review spammers using semantic similarity. In Proceedings of the 24th International Conference on World Wide Web (WWW 2015), Florence, Italy, 18–22 May 2015; Association for Computing Machinery, Inc.: New York, NY, USA, 2015; Volume 5, pp. 971–976.
9. Amazon. Amazon Reviews Dataset. 2014. Available online: https://jmcauley.ucsd.edu/data/amazon/ (accessed on 31 March 2023).
10. Ni, J.; Li, J.; McAuley, J. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 3–7 November 2019; pp. 188–197.
11. Salminen, J.; Kandpal, C.; Kamel, A.M.; Jung, S.G.; Jansen, B.J. Creating and detecting fake reviews of online products. J. Retail. Consum. Serv. 2022, 64, 102771.
12. Yelp Inc. Yelp Reviews and Users Dataset. 2022. Available online: https://www.yelp.com/dataset/documentation/main (accessed on 31 March 2023).
13. Wang, J.; Kan, H.; Meng, F.; Mu, Q.; Shi, G.; Xiao, X. Fake review detection based on multiple feature fusion and rolling collaborative training. IEEE Access 2020, 8, 182625–182639.
14. Barbado, R.; Araque, O.; Iglesias, C.A. A framework for fake review detection in online consumer electronics retailers. Inf. Process. Manag. 2019, 56, 1234–1244.
15. Li, H.; Chen, Z.; Liu, B.; Wei, X.; Shao, J. Spotting Fake Reviews via Collective Positive-Unlabeled Learning. In Proceedings of the IEEE International Conference on Data Mining, Shenzhen, China, 14–17 December 2014.
16. Crawford, M.; Khoshgoftaar, T.M.; Prusa, J.D.; Richter, A.N.; Najada, H.A. Survey of review spam detection using machine learning techniques. J. Big Data 2015, 2, 23.
17. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
18. Hajek, P.; Barushka, A.; Munk, M. Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Comput. Appl. 2020, 32, 17259–17274.
19. Li, Y.; Feng, X.; Zhang, S. Detecting Fake Reviews Utilizing Semantic and Emotion Model. In Proceedings of the 3rd International Conference on Information Science and Control Engineering (ICISCE 2016), Beijing, China, 8–10 July 2016; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2016; Volume 10, pp. 317–320.
20. Online Reviews Statistics and Trends: A 2022 Report by ReviewTrackers. 2022. Available online: https://www.reviewtrackers.com/reports/online-reviews-survey/ (accessed on 20 May 2023).
21. Paget, S. Local Consumer Review Survey 2023. 2023. Available online: https://www.brightlocal.com/research/local-consumer-review-survey/ (accessed on 31 March 2023).
22. Zarzycki, P. Ile kosztuje dobra opinia w internecie—Statystyki, przykłady, [memy] (How much does a good online reputation cost—Statistics, examples [memes]). In Proceedings of the Oh My H@ck 2020, Poland (Online), 27–28 November 2020.
23. Rong, X. Word2vec Parameter Learning Explained. arXiv 2014, arXiv:1411.2738.
24. Rehurek, R.; Sojka, P. Gensim–Python Framework for Vector Space Modelling; NLP Centre, the Faculty of Informatics, Masaryk University: Brno, Czech Republic, 2011; Volume 3.
25. Dadas, S. A Repository of Polish NLP Resources. 2019. Available online: https://github.com/sdadas/polish-nlp-resources/ (accessed on 20 May 2023).
26. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. arXiv 2016, arXiv:1607.04606.
27. Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of Tricks for Efficient Text Classification. arXiv 2016, arXiv:1607.01759.
28. Kocoń, J. KGR10 FastText Polish Word Embeddings. CLARIN-PL Digital Repository. 2018. Available online: http://hdl.handle.net/11321/606 (accessed on 20 May 2023).
29. Agglomerative Clustering Algorithm Explained by scikit-learn. Available online: https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering (accessed on 31 March 2023).
30. Müllner, D. Modern hierarchical, agglomerative clustering algorithms. arXiv 2011, arXiv:1109.2378.
31. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
32. Serwis Rzeczypospolitej Polskiej (Republic of Poland Service). Lista Imion i Nazwisk Występujących w Rejestrze PESEL (List of Forenames and Surnames Appearing in the Polish PESEL Register). Available online: https://dane.gov.pl/en/dataset/1667,lista-imion-wystepujacych-w-rejestrze-pesel-osoby-zyjace (accessed on 31 March 2023).
33. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C 1979, 28, 100–108.
34. Leader, I. How Reviews on Google Maps Work. 2022. Available online: https://blog.google/products/maps/how-google-maps-reviews-work/ (accessed on 31 March 2023).
35. GMR-PL Fake Reviews Dataset. Available online: https://www.kaggle.com/datasets/pawegryka/gmr-pl-fake-reviews-dataset (accessed on 20 May 2023).
Name | Authors | No. of Records | Fake Percentage | Labeled | Natural | Public | Reference |
---|---|---|---|---|---|---|---|
Amazon | Jindal and Liu | 5.8 M | 0.95% | Some | Yes | No | [5] |
TripAdvisor | Ott et al. | 800 | 50% | Yes | No | Yes | [6] |
TripAdvisor (Marriott subset) | Yoo and Gretzel | 82 | 51% | Yes | No | No | [7] |
Trustpilot | Trustpilot | 9000 | “Balanced” | Yes | Yes | No | [8] |
Yelp reviews | Yelp | 6.99 M | Unknown | Some | Yes | Yes | [12] |
Yelp users | Yelp | 1.99 M | Unknown | Some | Yes | Yes | [12] |
Amazon review dataset | Amazon | 83.68 M | Unknown | No | Yes | Yes | [9] |
Amazon review dataset | Ni, Li, McAuley | 233.1 M | Unknown | No | Yes | Yes | [10] |
GPT-2 dataset | Salminen et al. | 40,000 | 50% | Yes | Half | Yes | [11] |
Timestamp of Review | Error Margin |
---|---|
1–59 min ago | 1 min |
1–23 h ago | 1 h |
1–6 days ago | 1 day |
1–4 weeks ago | 1 week |
1–11 months ago | 1 month |
1–X years ago | 1 year |
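
Since reviews carry only relative timestamps such as those in the table above, each review date has to be estimated from the scraping time, with an error margin of one unit of the displayed granularity. A minimal sketch, assuming English-language strings of the form shown in the table and approximating a month as 30 days and a year as 365 days; the function and constant names are hypothetical, and the scraper’s actual interface language may differ.

```python
import re
from datetime import datetime, timedelta

# Length of one unit, which is also the error margin; months and years are approximated (assumption).
UNIT_LENGTH = {
    "min": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}

# Matches e.g. "59 min ago", "5 h ago", "1 day ago", "3 weeks ago".
PATTERN = re.compile(r"(\d+)\s*(min|h|day|week|month|year)s?\s+ago")


def approximate_date(relative, scraped_at):
    """Turn a relative timestamp into (estimated datetime, error margin)."""
    m = PATTERN.fullmatch(relative.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {relative!r}")
    count, unit = int(m.group(1)), m.group(2)
    step = UNIT_LENGTH[unit]
    return scraped_at - count * step, step


estimate, margin = approximate_date("3 weeks ago", datetime(2023, 2, 10, 12))
print(estimate, margin)  # 2023-01-20 12:00:00 7 days, 0:00:00
```

The margin grows with the age of the review, which is why older review dates in the dataset are only accurate to within a month or a year.
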
Property Name | Description | Property Type |
---|---|---|
name | Name of the account. Most often composed of a name and surname. | String |
reviewer_id | ID of the account assigned by Google Maps; a part of the account’s URL. | Numerical string |
local_guide_level | Level of the account in the Google Maps local guide program. | Integer or null |
number_of_reviews | Number of reviews by the account on the day of scraping. | Integer or null |
reviewer_url | URL of the account (for a private account, the page contains no useful information). | String |
fake_service | Name of the fake review service provider through which the account was found. 1 | String |
is_private | If true, the account is not available for scraping, as the information is hidden. | Boolean |
is_deleted | If true, the account has been deleted and its information is no longer accessible. 2 | Boolean |
Property Name | Description | Property Type |
---|---|---|
review_id | ID of the review assigned by Google Maps. | Numerical string |
place_name | Name of the reviewed place. | String |
rating | Rating in stars from 1 to 5. | Integer |
content | Content of the review. | String |
reviewer_url | URL of the account that gave the review. | String |
reviewer_id | ID of the account that gave the review. | Numerical string |
place_url | URL of the reviewed place. | String |
localization | Geographical coordinates of the reviewed place. | “lat” and “lon” object |
photos_urls | List of URLs of the photos attached to the review, if any. | List of strings or null |
type_of_object | Type of the reviewed place, as assigned by Google Maps (e.g., Restaurant, Park, etc.). | String |
response_content | Content of the place owner’s response to the review, if a response was given. | String or null |
date | Approximate date when the review was given. See Section 3.2. | datetime object |
cluster | More general type of object given by us. See Section 3.4. | String |
is_real | Determines whether the review came from a real account. | Boolean |
Property | Value |
---|---|
Number of accounts | 605 |
Number of real accounts | 286 |
Number of fake accounts | 319 |
Number of private accounts at the start | 162 |
Number of private accounts at the end | 89 |
Number of deleted accounts at the end | 246 |
Number of reviews | 17,979 |
Number of fake reviews | 2436 |
Number of real reviews | 15,543 |
Oldest review | 11 February 2013 22:03:25 |
Newest review | 9 February 2023 23:05:17 |
Metric | Basic Features | Extended Features |
---|---|---|
Accuracy | 0.907 | 0.930 |
Prevalence | 0.425 | 0.586 |
F1 score | 0.892 | 0.916 |
Precision | 0.899 | 0.904 |
Recall | 0.885 | 0.929 |
Metric | Value |
---|---|
Accuracy | 0.932 |
Prevalence | 0.136 |
F1 score | 0.736 |
Precision | 0.698 |
Recall | 0.778 |
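
The metrics in the two tables above follow standard confusion-matrix definitions. The sketch below computes them from raw counts, treating fake as the positive class (an assumption about how the tables were produced); the function names are hypothetical. The `f1` helper can also be used to check that each reported F1 score is the harmonic mean of the reported precision and recall.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics with 'fake' as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,  # share of samples that are actually fake
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
    }


print(round(f1(0.899, 0.885), 3))  # 0.892
```

The result matches the basic-feature F1 score; the extended-feature column (0.904 and 0.929 giving 0.916) and the review-level table (0.698 and 0.778 giving 0.736) agree in the same way.
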
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gryka, P.; Janicki, A. Detecting Fake Reviews in Google Maps—A Case Study. Appl. Sci. 2023, 13, 6331. https://doi.org/10.3390/app13106331