Detecting Fake Accounts on Instagram Using Machine Learning and Hybrid Optimization Algorithms
Abstract
1. Introduction
- Identify fake accounts with high precision;
- Reduce false alarms while detecting fake accounts;
- Propose a hybrid optimization technique to improve the accuracy of fake account detection.
2. Related Work
3. Materials and Methods
3.1. Methodology
3.2. Feature Selection Using Optimization Methods
3.2.1. Particle Swarm Optimization (PSO)
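As a point of reference, the canonical PSO update (in the spirit of Kennedy and Eberhart [35], with the inertia weight that was added in later refinements) can be sketched as below. This is a minimal illustration under stated assumptions, not the configuration used in this article: the `sphere` objective, the search range, and the values of `w`, `c1`, and `c2` are all hypothetical choices made only so the example runs.

```python
import random

def sphere(x):
    """Toy objective (assumption for illustration only): sum of squares."""
    return sum(v * v for v in x)

def pso(objective, dim=5, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic inertia-weight PSO.

    Returns the best position, its objective value, and the history of
    global-best values (one entry per iteration, plus the initial value).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity = inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Calling `pso(sphere)` returns a global-best history that never increases, because the global best is only replaced by a strictly better value. For feature selection, the continuous positions would still need to be mapped to a 0/1 mask over the features.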
3.2.2. Grey Wolf Optimization (GWO) Algorithm
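For orientation, the standard position update of the Grey Wolf Optimizer as introduced by Mirjalili et al. [34] is summarised below. The notation is an assumption made for this sketch: $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$ are the three best wolves found so far, $\vec{r}_1$ and $\vec{r}_2$ are random vectors in $[0,1]$, $\circ$ is the element-wise product, and $a$ decreases linearly from 2 to 0 over the iterations.

```latex
\begin{aligned}
\vec{A}_k &= 2a\,\vec{r}_1 - a, \qquad \vec{C}_k = 2\,\vec{r}_2, \qquad k \in \{1, 2, 3\},\\
\vec{D}_\alpha &= \lvert \vec{C}_1 \circ \vec{X}_\alpha - \vec{X}(t) \rvert, \quad
\vec{D}_\beta = \lvert \vec{C}_2 \circ \vec{X}_\beta - \vec{X}(t) \rvert, \quad
\vec{D}_\delta = \lvert \vec{C}_3 \circ \vec{X}_\delta - \vec{X}(t) \rvert,\\
\vec{X}_1 &= \vec{X}_\alpha - \vec{A}_1 \circ \vec{D}_\alpha, \quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \circ \vec{D}_\beta, \quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \circ \vec{D}_\delta,\\
\vec{X}(t+1) &= \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}.
\end{aligned}
```

Fresh random vectors are drawn for each $k$, so $\vec{A}_1$, $\vec{A}_2$, $\vec{A}_3$ differ from one another. Large $\lvert \vec{A} \rvert$ early in the run favours exploration, while small values late in the run favour exploitation around the three leaders.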
Binary GWO for Feature Selection
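A common way to turn the continuous GWO position into a feature mask is to pass each coordinate through an S-shaped transfer function and compare it with a uniform random number. The sketch below illustrates that idea together with a typical wrapper-style fitness that trades classification error against the fraction of selected features. It is an illustration only: the steepness `10`, the offset `0.5`, the weight `alpha=0.99`, and all function names are assumptions, not values confirmed by this article.

```python
import math
import random

def sigmoid_transfer(x):
    """S-shaped transfer function mapping a real value into (0, 1).

    The steepness (10) and offset (0.5) are assumed constants; the exponent
    is clamped so extreme positions cannot overflow math.exp.
    """
    z = max(-60.0, min(60.0, 10.0 * (x - 0.5)))
    return 1.0 / (1.0 + math.exp(-z))

def binarize(position, rng):
    """Map a continuous wolf position to a 0/1 feature mask, one bit per feature."""
    return [1 if sigmoid_transfer(x) >= rng.random() else 0 for x in position]

def fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classifier error and fraction of selected features.

    Lower is better. `error_rate` would come from training a classifier on
    the features where mask == 1 (not shown here).
    """
    beta = 1.0 - alpha
    return alpha * error_rate + beta * sum(mask) / len(mask)
```

For example, `binarize([0.9, 0.1, 0.5, 1.2], random.Random(42))` yields a four-bit mask in which coordinates well above 0.5 are very likely to be selected, and `fitness(0.1, [1, 0, 1, 0])` combines a 10% error rate with half of the features being kept.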
3.2.3. Hybridization of Binary GWO and PSO
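One form in which GWO and PSO are combined in the literature, following the general style of the hybrid described by Singh and Singh [37], is to let the three leader-guided positions from GWO replace the personal-best and global-best terms in the PSO velocity update. The equations below are a sketch of that idea and not necessarily the exact update used in this article; the inertia weight $w$, the coefficients $c_1, c_2, c_3$, and the random numbers $r_1, r_2, r_3 \in [0,1]$ are notation assumed here.

```latex
\begin{aligned}
\vec{D}_\alpha &= \lvert \vec{C}_1 \circ \vec{X}_\alpha - w\,\vec{x}^{\,k} \rvert, \quad
\vec{D}_\beta = \lvert \vec{C}_2 \circ \vec{X}_\beta - w\,\vec{x}^{\,k} \rvert, \quad
\vec{D}_\delta = \lvert \vec{C}_3 \circ \vec{X}_\delta - w\,\vec{x}^{\,k} \rvert,\\
\vec{X}_1 &= \vec{X}_\alpha - \vec{A}_1 \circ \vec{D}_\alpha, \quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \circ \vec{D}_\beta, \quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \circ \vec{D}_\delta,\\
\vec{v}^{\,k+1} &= w \left( \vec{v}^{\,k}
  + c_1 r_1 (\vec{X}_1 - \vec{x}^{\,k})
  + c_2 r_2 (\vec{X}_2 - \vec{x}^{\,k})
  + c_3 r_3 (\vec{X}_3 - \vec{x}^{\,k}) \right),\\
\vec{x}^{\,k+1} &= \vec{x}^{\,k} + \vec{v}^{\,k+1}.
\end{aligned}
```

In a binary variant for feature selection, the updated position would additionally be mapped to a 0/1 mask, for instance through an S-shaped transfer function, before the selected features are passed to the classifier.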
3.3. Classification Methods
4. Results
4.1. Summary of Results
4.2. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Hegde, P.; Saurabh, N.; Salian, P. Detection and Classification of Genuine User Profile Based on Machine Learning Techniques. In Proceedings of the 2022 International Conference on Intelligent Technologies (CONIT), Hubli, India, 24–26 June 2022.
2. Kaubiyal, J.; Jain, A.K. A Feature Based Approach to Detect Fake Profiles in Twitter. In Proceedings of the 3rd International Conference on Big Data and Internet of Things (BDIOT 2019), Melbourne, Australia, 22–24 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 135–139.
3. Purba, K.; Asirvatham, D.; Murugesan, R.K. Classification of instagram fake users using supervised machine learning algorithms. Int. J. Electr. Comput. Eng. (IJECE) 2020, 10, 2763–2772.
4. Dey, A.; Reddy, H.; Dey, M.; Sinha, N. Detection of Fake Accounts in Instagram Using Machine Learning. Int. J. Comput. Sci. Inf. Technol. 2019, 11, 83–90.
5. Efthimion, P.G.; Payne, S.; Proferes, N. Supervised Machine Learning Bot Detection Techniques to Identify Social Twitter Bots. SMU Data Sci. Rev. 2018, 1, 5.
6. Saranya Shree, S.; Subhiksha, C.; Subhashini, R. Prediction of Fake Instagram Profiles Using Machine Learning. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3802584 (accessed on 19 September 2024).
7. Meshram, E.P.; Bhambulkar, R.; Pokale, P.; Kharbikar, K.; Awachat, A. Automatic Detection of Fake Profile Using Machine Learning on Instagram. Int. J. Sci. Res. Sci. Technol. 2021, 8, 117–127.
8. Sheikhi, S. An Efficient Method for Detection of Fake Accounts on the Instagram Platform. Rev. D’intelligence Artif. 2020, 34, 429–436.
9. Jain, M.; Saihjpal, V.; Singh, N.; Singh, S.B. An Overview of Variants and Advancements of PSO Algorithm. Appl. Sci. 2022, 12, 8392.
10. Li, Y.; Lin, X.; Liu, J. An Improved Gray Wolf Optimization Algorithm to Solve Engineering Problems. Sustainability 2021, 13, 3208.
11. Pellet, H.; Shiaeles, S.; Stavrou, S. Localising social network users and profiling their movement. Comput. Secur. 2019, 81, 49–57.
12. Egele, M.; Stringhini, G.; Kruegel, C.; Vigna, G. Towards Detecting Compromised Accounts on Social Networks. IEEE Trans. Dependable Secur. Comput. 2017, 14, 447–460.
13. Agarwal, N.; Jabin, S.; Hussain, S.Z. Analyzing Real and Fake users in Facebook Network based on Emotions. In Proceedings of the 2019 11th International Conference on Communication Systems & Networks (COMSNETS), Bangalore, India, 7–11 January 2019; pp. 110–117.
14. Chen, B.; Xiong, Z.; Zhao, Y.; Zhang, J. Transformation of Mg-Bearing Minerals and its Effect on Slagging During the High-Alkali Coal Combustion. Available online: https://ssrn.com/abstract=4941607 (accessed on 19 September 2024).
15. Khaled, S.; El-Tazi, N.; Mokhtar, H.M.O. Detecting Fake Accounts on Social Media. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 3672–3681.
16. Kondeti, P.; Yerramreddy, L.P.; Pradhan, A.; Swain, G. Fake Account Detection Using Machine Learning. In Evolutionary Computing and Mobile Sustainable Networks; Suma, V., Bouhmala, N., Wang, H., Eds.; Lecture Notes on Data Engineering and Communications Technologies; Springer: Singapore, 2021; pp. 791–802.
17. Suganya, R.; Muthulakshmi, S.; Venmuhilan, B.; Kumar, K.V.V.; Vignesh, G. Detect Fake Identities Using Improved Machine Learning Algorithm. 2021. Available online: https://www.semanticscholar.org/paper/Detect-fake-identities-using-improved-Machine-Suganya-Muthulakshmi/4b4e968545cb233b351249c2cee884be37fcf0bc (accessed on 26 September 2022).
18. Bindu, P.V.; Mishra, R.; Thilagam, P.S. Discovering spammer communities in twitter. J. Intell. Inf. Syst. 2018, 51, 503–527.
19. Patil, A.P.; Remulkar, V.; Hardik; Shirole, U. Social Networks Fake Detection. Int. J. Recent Adv. Multidiscip. Top. 2022, 3, 98–100.
20. Prabhu Kavin, B.; Karki, S.; Hemalatha, S.; Singh, D.; Vijayalakshmi, R.; Thangamani, M.; Haleem, S.L.A.; Jose, D.; Tirth, V.; Kshirsagar, P.R.; et al. Machine Learning-Based Secure Data Acquisition for Fake Accounts Detection in Future Mobile Communication Networks. Wirel. Commun. Mob. Comput. 2022, 2022, e6356152.
21. Kadhim, A.; Abdullah, A. Fake accounts detection on social media using stack ensemble system. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 3013–3022.
22. Benabbou, F.; Boukhouima, H.; Sael, N. Fake accounts detection system based on bidirectional gated recurrent unit neural network. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 3129.
23. David, I.; Siordia, O.S.; Moctezuma, D. Features combination for the detection of malicious Twitter accounts. In Proceedings of the 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, Mexico, 9–11 November 2016; pp. 1–6.
24. Sowmya, P.; Chatterjee, M. Detection of Fake and Cloned Profiles in Online Social Networks. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3349673 (accessed on 19 September 2024).
25. Bharti, K.K.; Pandey, S. Fake account detection in twitter using logistic regression with particle swarm optimization. Soft Comput. 2021, 25, 11333–11345.
26. Homsi, A.; Al-Nemri, J.; Naimat, N.; Kareem, H.; Al-Fayoumi, M.; Snober, M.A. Detecting Twitter Fake Accounts using Machine Learning and Data Reduction Techniques. In Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021), Online, 6–8 July 2021; pp. 88–95.
27. SudalaiMuthu, T.; Reddy, C.D.K.; Reddy, B.S.; Sahithya, M.L.; Visalaxi, S. Detecting spammer and fake user on social networks using machine learning approach. AIP Conf. Proc. 2022, 2385, 050010.
28. Awan, M.J.; Khan, M.A.; Ansari, Z.K.; Yasin, A.; Shehzad, H.M.F. Fake profile recognition using big data analytics in social media platforms. Int. J. Comput. Appl. Technol. 2022, 68, 215–222.
29. Munga, J.B.; Mohandas, P. Feature Selection for Identification of Fake Profiles on Facebook. In Proceedings of the 6th Kuala Lumpur International Conference on Biomedical Engineering 2021, Online, 28–29 July 2021; Usman, J., Liew, Y.M., Ahmad, M.Y., Ibrahim, F., Eds.; IFMBE Proceedings; Springer International Publishing: Cham, Switzerland, 2022; pp. 489–497.
30. Gupta, A.; Kaushal, R. Towards detecting fake user accounts in facebook. In Proceedings of the 2017 ISEA Asia Security and Privacy (ISEASP), Surat, India, 29 January–1 February 2017; pp. 1–6.
31. Akyon, F.C.; Kalfaoglu, M.E. Instagram Fake and Automated Account Detection. In Proceedings of the 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), Izmir, Turkey, 31 October–2 November 2019; pp. 1–7.
32. Durga, P.; Sudhakar, D.T. The use of supervised machine learning classifiers for the detection of fake instagram accounts. J. Pharm. Negat. Results 2023, 14, 267–279.
33. My Information Bubble Project. Available online: http://mib.projects.iit.cnr.it/ (accessed on 29 March 2024).
34. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
35. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; pp. 1942–1948.
36. Talbi, E.-G. A Taxonomy of Hybrid Metaheuristics. J. Heuristics 2002, 8, 541–564.
37. Singh, N.; Singh, S.B. Hybrid algorithm of particle swarm optimization and Grey Wolf optimizer for improving convergence performance. J. Appl. Math. 2017, 2017, 2030489.
Reference | Data | Methods and Performance | Limitations and Research Gaps |
---|---|---|---|
Khaled et al. [15] | Twitter (X) accounts and bots | Feature selection with SVM, Neural Networks, and a combination of SVM-NN with accuracy of 98% in training dataset | Efficiency in the test dataset is lower than training dataset |
Kondeti et al. [16] | Twitter (X) accounts and bots | SVM, Logistic Regression, Random Forest, and KNN with a peak accuracy of 98%. Used Z-Score and Min–Max normalization | Small dataset and very few features |
Suganya et al. [17] | Twitter (X) accounts | SVM, Decision Tree, Random Forest, Naïve Bayes with an accuracy of 97% for SVM | Small dataset and very few features |
Kaubiyal and Jain [2] | Twitter (X) accounts | Logistic Regression, Random Forest, and SVM with a maximum accuracy of 97.9% by Random Forest | Imbalanced data with large number of genuine accounts and spambots |
Bindu, Mishra, and Thilagam [18] | Twitter (X) accounts | Spammer communities on Twitter using unsupervised method | |
Patil et al. [19] | Twitter (X) accounts | Propose a model for detection | Implementation and results are not given |
Prabhu Kavin et al. [20] | Twitter (X) accounts | Spam detection using Logistic Regression, SVM, Random Forest, Neural Networks. Images were used to identify inappropriate content and fake images | Did not detect fake accounts |
Kadhim and Abdullah [21] | Twitter (X) accounts from the My Information Bubble (MIB) dataset | Feature selection with Spearman’s correlation coefficient and the chi-square test. Random Forest, SVM, and Naïve Bayes were combined as a stack ensemble, with Logistic Regression as the meta-classifier, for a combined accuracy of 97.8% | A bigger dataset would give a better understanding of the methods applied |
Benabbou, Boukhouima, and Sael [22] | Twitter user profiles as legitimate or fake | Bidirectional Gated Recurrent Unit (BiGRU) model proposed and compared with LSTM and CNN with an accuracy of 99.44% | Tweets were gathered in a single file and transformed into a vector space using the GloVe word-embedding technique in order to preserve the semantic and syntactic contexts |
David, Siordia, and Moctezuma [23] | Twitter (X) accounts | 71 descriptive features were extracted from the profiles. Random Forest and Naïve Bayes gave an accuracy of 94% and 91%, respectively | Less than 432 followers and 1.8% of responses were secured |
Sowmya and Chatterjee [24] | Data extraction from Facebook and Twitter | Similarity between the attributes and the network was used to detect cloned and fake profiles | Results are not presented |
Bharti and Pandey [25] | Twitter (X) accounts | Logistic Regression and Particle Swarm Optimization to classify accounts as real or fake with an accuracy of 96% | Requires more features to be tested |
Homsi et al. [26] | Twitter (X) accounts, data from My Information Bubble (MIB) [33] | Random Forest, J48, Naïve Bayes, and KNN were used as classifiers, with PCA and correlation as data reduction techniques. An accuracy of 98.6% was achieved with Random Forest and the correlation method | More features to be added |
SudalaiMuthu et al. [27] | Facebook dataset | Random Forest, Neural Networks, and SVM with an accuracy of 87% | Limited features |
Awan et al. [28] | 4000 Facebook accounts | Random Forest with 93% accuracy | Limited features |
Munga and Mohandas [29] | Facebook accounts with limited profile information | Feature selection using entropy and information gain. Decision Tree, Naïve Bayes, and Random Forest were used without feature selection. Neural Networks, Random Forest, SVM were used with information gain. Highest accuracy was achieved with SVM and information gain with 99.64% accuracy | Limited features |
Gupta and Kaushal [30] | OSN Facebook accounts | 12 machine learning classifiers were used for accuracy testing, including SVM, minimum-order optimization, Naïve Bayes, KNN, Decision Tree, Random Forest, with highest accuracy of 79% | Limited access to data due to privacy and security |
Purba, Asirvatham, and Murugesan [3] | Instagram dataset included fake accounts purchased from different sources and authentic users | Random Forest algorithm produced the highest accuracy for the classification of 2 classes (authentic, fake) and 4 classes (authentic, active fake user, inactive fake user, spam) with an accuracy of 91.76% | Performance improvement desired |
Dey et al. [4] | Instagram dataset | Logistic Regression and Random Forest accuracy of 90.8% and 92.5%, respectively | Performance improvement desired |
Akyon and Kalfaoglu [31] | Instagram dataset | Naïve Bayes, Logistic Regression, Support Vector Machine, and Neural Network, with genetic algorithm giving the highest accuracy of 96% | Biased features in the automated account dataset |
Saranya Shree, Subhiksha, and Subhashini [6] | Instagram web-scraping dataset, labeled for training | Combining image recognition and natural language processing to identify fake accounts. CNN gave an accuracy of 91.5% | Limited accounts and performance |
Meshram et al. [7] | Collected 1002 real Instagram profiles and 201 fake profiles through a web crawler | Neural Networks, Random Forest, and Logistic Regression. Random Forest gives the highest accuracy of 96.94% | Detection of fake profiles and automated accounts |
Sheikhi [8] | A dataset of legitimate and fake accounts was extracted on Instagram using a crawler | Bagged Decision Tree was the proposed method that was compared to Random Tree, J48, SVM, Naïve Bayes, and Hoeffding Tree. Bagged Tree achieved an accuracy of 98.45% | Data and features were extracted by the crawler to create a dataset |
Durga and Sudhakar [32] | Instagram dataset | KNN, Logistic Regression, and Decision Tree were used and the best accuracy of 96% was achieved by Decision Tree. | Small dataset and few features |
Classifier | Optimization | Accuracy (%), 60:40 | AUC (%), 60:40 | Accuracy (%), 70:30 | AUC (%), 70:30 | Accuracy (%), 80:20 | AUC (%), 80:20 | Accuracy (%), 90:10 | AUC (%), 90:10 |
---|---|---|---|---|---|---|---|---|---|
SVM | PSO | 89.75 | 91.18 | 89.35 | 90.42 | 88.55 | 89.54 | 87.68 | 91.34 |
SVM | BGWO | 92.96 | 93.45 | 86.64 | 89.23 | 89.96 | 92.34 | 87.44 | 89.88 |
SVM | BGWOPSO | 93.42 | 94.48 | 92.56 | 95.23 | 97.92 | 98.87 | 96.45 | 97.25 |
K-Nearest Neighbor | PSO | 85.32 | 89.65 | 88.74 | 91.83 | 83.42 | 91.35 | 82.90 | 87.35 |
K-Nearest Neighbor | BGWO | 88.24 | 93.75 | 90.65 | 92.34 | 89.25 | 94.64 | 92.95 | 99.71 |
K-Nearest Neighbor | BGWOPSO | 91.22 | 94.54 | 93.96 | 96.38 | 96.27 | 98.28 | 98.25 | 94.12 |
Artificial Neural Network | PSO | 88.52 | 90.11 | 85.27 | 91.84 | 96.24 | 97.24 | 85.36 | 92.88 |
Artificial Neural Network | BGWO | 85.23 | 92.73 | 94.49 | 96.15 | 97.5 | 98.93 | 92.64 | 94.33 |
Artificial Neural Network | BGWOPSO | 92.74 | 93.47 | 95.89 | 98.75 | 99.1 | 100.0 | 93.69 | 96.35 |
Logistic Regression | PSO | 83.24 | 86.27 | 83.12 | 87.46 | 89.78 | 90.75 | 87.54 | 89.18 |
Logistic Regression | BGWO | 86.75 | 88.56 | 88.43 | 90.36 | 90.23 | 91.78 | 88.23 | 90.55 |
Logistic Regression | BGWOPSO | 89.74 | 90.57 | 90.17 | 93.12 | 92.55 | 93.36 | 90.14 | 92.23 |
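
The two metrics reported per train:test ratio can be made concrete with a small sketch. The helpers below are hypothetical (their names and the toy labels are assumptions, not the article's evaluation code). They compute accuracy from hard predictions and AUC from classifier scores, using the rank interpretation of AUC. Note that AUC computed this way lies in [0, 1], so a value such as 98.87 corresponds to 0.9887 multiplied by 100.

```python
def split_index(n_samples, train_fraction):
    """Index at which to cut a shuffled dataset, e.g. 0.8 for an 80:20 split."""
    return int(round(n_samples * train_fraction))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive (label 1) outscores
    a random negative (label 0); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (made-up labels and scores, 1 = fake account, 0 = genuine)
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.55, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc = accuracy(y_true, y_pred)   # 5 of 6 correct
auc = roc_auc(y_true, scores)    # 8 of 9 positive-negative pairs ranked correctly
```

The pairwise AUC above is quadratic in the number of samples, which is fine for illustration; a sort-based version would be preferable for large test sets.
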
Classifier | Optimization Technique | Ratio | Accuracy (%) | AUC |
---|---|---|---|---|
SVM | BGWOPSO | 80:20 | 97.92 | 0.9887 |
K-Nearest Neighbor | BGWOPSO | 90:10 | 98.25 | 0.9412 |
K-Nearest Neighbor | BGWO | 90:10 | 92.95 | 0.9971 |
ANN | BGWOPSO | 80:20 | 99.1 | 1 |
Logistic Regression | BGWOPSO | 80:20 | 92.55 | 0.9336 |
The State-of-the-Art Literature | Method | Accuracy | AUC |
---|---|---|---|
Purba, Asirvatham, and Murugesan [3] | Random Forest | 91.76% | |
Dey et al. [4] | Random Forest | 92.5% | |
Akyon and Kalfaoglu [31] | Neural Network with genetic algorithm as feature selection | 96% | |
Saranya Shree, Subhiksha, and Subhashini [6] | CNN | 91.5% | |
Meshram et al. [7] | Random Forest | 96.94% | |
Sheikhi [8] | Bagged Decision Tree | 98.45% | |
Durga and Sudhakar [32] | Decision Tree | 96% | |
This work | Artificial Neural Network with BGWOPSO feature selection | 99.1% | 1.0 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Azami, P.; Passi, K. Detecting Fake Accounts on Instagram Using Machine Learning and Hybrid Optimization Algorithms. Algorithms 2024, 17, 425. https://doi.org/10.3390/a17100425