NLP-Based Digital Forensic Analysis for Online Social Network Based on System Security
Abstract
1. Introduction
- This research applies natural language processing (NLP) techniques to analyze online social network (OSN) data in detail;
- A key aspect of this research is its multi-source data input, which makes the approach competitive with other published results;
- The main focus of this research is a system security method that stores the OSN information in a blockchain framework.
2. Related Work
2.1. Digital Forensics Challenges in Blockchain
2.2. Forensic Attainment of Social Media Content
3. Proposed NLP-Based Digital Forensic Analysis for Online Social Network
3.1. NLP-Based Digital Forensic Analysis
3.2. Blockchain-Based Digital Forensic Analysis
4. Experimental Results and Development Environment
4.1. Data Representation and Collection
4.2. Performance Evaluation of the Proposed Online Digital Forensic Analysis
4.3. Security Analysis of Online Digital Forensic Based on Blockchain
- The first step is digital evidence identification, which aims to identify the digital fingerprint of each piece of evidence. One fingerprint is generated per claim so that the corresponding event can be examined;
- The fingerprint records, together with their timestamp and additional information, are written into an evidence block, which is appended to the blockchain;
- In the blockchain network, every participant holds a copy of the evidence blockchain.
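The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes SHA-256 as the fingerprint function, and the block layout, `append_evidence`, `verify`, and the sample claims are all hypothetical names introduced for the example.

```python
import copy
import hashlib
import json
import time


def fingerprint(evidence: bytes) -> str:
    """Return a SHA-256 digest used as the evidence fingerprint (assumed hash)."""
    return hashlib.sha256(evidence).hexdigest()


def block_hash(block: dict) -> str:
    """Hash every field of a block except its own stored hash."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def append_evidence(chain, claim_id, evidence, info, timestamp=None):
    """Write one fingerprint record into a new block and append it to the chain."""
    block = {
        "index": len(chain),
        "timestamp": time.time() if timestamp is None else timestamp,
        "claim_id": claim_id,
        "fingerprint": fingerprint(evidence),
        "info": info,
        # Link to the previous block; the first block links to all zeros.
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def verify(chain) -> bool:
    """Check that every block is unmodified and correctly linked."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return True


# Hypothetical sample evidence, one fingerprint per claim.
chain = []
append_evidence(chain, "claim-1", b"post text A", {"source": "blog"}, timestamp=1.0)
append_evidence(chain, "claim-2", b"post text B", {"source": "news"}, timestamp=2.0)

# Every participant holds its own copy of the evidence chain.
participant_copy = copy.deepcopy(chain)
```

Because each block stores the hash of its predecessor, altering a fingerprint in any copy makes `verify` fail for that copy, which is what lets participants detect tampering by comparing against their own chain.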
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Di Domenico, G.; Sit, J.; Ishizaka, A.; Nunan, D. Fake news, social media and marketing: A systematic review. J. Bus. Res. 2021, 124, 329–341.
- Grubl, T.; Lallie, H.S. Applying Artificial Intelligence for Age Estimation in Digital Forensic Investigations. arXiv 2022, arXiv:2201.03045.
- Suryanto, H.; Degeng, I.N.S.; Djatmika, E.T.; Kuswandi, D. The effect of creative problem solving with the intervention social skills on the performance of creative tasks. Creat. Stud. 2021, 14, 323–335.
- Shahbazi, Z.; Byun, Y.C. Analyzing the Performance of User Generated Contents in B2B Firms Based on Big Data and Machine Learning. Soft Comput. Mach. Intell. J. 2021, 1, 1–9.
- Shahbazi, Z.; Byun, Y.C. Twitter Sentiment Analysis Using Natural Language Processing and Machine Learning Techniques. In Proceedings of the KIIT Conference, Jeju, Korea, 15 December 2021; pp. 42–44.
- Shahbazi, Z.; Byun, Y.C. Deep Learning Method to Estimate the Focus Time of Paragraph. Int. J. Mach. Learn. Comput. 2020, 10, 75–80.
- Heckmann, T.; Souvignet, T.; Sauveron, D.; Naccache, D. Medical Equipment Used for Forensic Data Extraction: A low-cost solution for forensic laboratories not provided with expensive diagnostic or advanced repair equipment. Forensic Sci. Int. Digit. Investig. 2021, 36, 301092.
- Patil, A.; Banerjee, S.; Jadhav, D.; Borkar, G. Roadmap of Digital Forensics Investigation Process with Discovery of Tools. Cyber Secur. Digit. Forensics 2022, 241–269.
- Rouzbahani, H.M.; Dehghantanha, A.; Choo, K.K.R. Big Data Analytics and Forensics: An Overview. In Handbook of Big Data Analytics and Forensics; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–5.
- Li, S.; Sun, Q.; Xu, X. Forensic analysis of digital images over smart devices and online social networks. In Proceedings of the 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Exeter, UK, 28–30 June 2018; pp. 1015–1021.
- Javed, A.R.; Ahmed, W.; Alazab, M.; Jalil, Z.; Kifayat, K.; Gadekallu, T.R. A Comprehensive Survey on Computer Forensics: State-of-the-art, Tools, Techniques, Challenges, and Future Directions. IEEE Access 2022, 10, 11065–11089.
- Lorch, B.; Scheler, N.; Riess, C. Compliance Challenges in Forensic Image Analysis Under the Artificial Intelligence Act. arXiv 2022, arXiv:2203.00469.
- Hemdan, E.E.D.; Manjaiah, D. An efficient digital forensic model for cybercrimes investigation in cloud computing. Multimed. Tools Appl. 2021, 80, 14255–14282.
- Alnajjar, I.A.; Mahmuddin, M. Feature indexing and search optimization for enhancing the forensic analysis of mobile cloud environment. Inf. Secur. J. Glob. Perspect. 2021, 30, 235–256.
- Bhagat, S.P.; Meshram, B.B. Digital Forensic Tools for Cloud Computing Environment. In ICT with Intelligent Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 49–57.
- Salamh, F.E.; Karabiyik, U.; Rogers, M.K.; Matson, E.T. A comparative uav forensic analysis: Static and live digital evidence traceability challenges. Drones 2021, 5, 42.
- Khalid Alabdulsalam, S.; Duong, T.Q.; Raymond Choo, K.K.; Le-Khac, N.A. An efficient IoT forensic approach for the evidence acquisition and analysis based on network link. Log. J. IGPL 2022.
- Loli, M.; Mitoulis, S.A.; Tsatsis, A.; Manousakis, J.; Kourkoulis, R.; Zekkos, D. Flood characterization based on forensic analysis of bridge collapse using UAV reconnaissance and CFD simulations. Sci. Total Environ. 2022, 822, 153661.
- Li, S.; Qin, T.; Min, G. Blockchain-based digital forensics investigation framework in the internet of things and social systems. IEEE Trans. Comput. Soc. Syst. 2019, 6, 1433–1441.
- Alsulami, H. Implementation analysis of reliable unmanned aerial vehicles models for security against cyber-crimes: Attacks, tracebacks, forensics and solutions. Comput. Electr. Eng. 2022, 100, 107870.
- Misra, S.; Arumugam, C. Illumination of Artificial Intelligence in Cybersecurity and Forensics; Springer: Berlin/Heidelberg, Germany, 2022; Volume 109.
- Kaushik, K.; Dahiya, S.; Sharma, R. Role of Blockchain Technology in Digital Forensics. In Blockchain Technology; CRC Press: Boca Raton, FL, USA, 2022; pp. 235–246.
- Kebande, V.R.; Ikuesan, R.A.; Karie, N.M. Review of Blockchain Forensics Challenges. In Blockchain Security in Cloud Computing; Springer: Berlin/Heidelberg, Germany, 2022; pp. 33–50.
- Li, S.; Choo, K.K.R.; Sun, Q.; Buchanan, W.J.; Cao, J. IoT forensics: Amazon echo as a use case. IEEE Internet Things J. 2019, 6, 6487–6497.
- Li, S.; Zhao, S.; Yang, P.; Andriotis, P.; Xu, L.; Sun, Q. Distributed consensus algorithm for events detection in cyber-physical systems. IEEE Internet Things J. 2019, 6, 2299–2308.
- Ganesh, N.; Venkatesh, N.; Prasad, D. A Systematic Literature Review on Forensics in Cloud, IoT, AI & Blockchain. Illum. Artif. Intell. Cybersecur. Forensics 2022, 109, 197–229.
- Rajawat, A.S.; Rawat, R.; Barhanpurkar, K. Security Improvement Technique for Distributed Control System (DCS) and Supervisory Control-Data Acquisition (SCADA) Using Blockchain at Dark Web Platform. Cyber Secur. Digit. Forensics 2022, 317–333.
- Shahbazi, Z.; Byun, Y.C. Blockchain-based Event Detection and Trust Verification Using Natural Language Processing and Machine Learning. IEEE Access 2021, 10, 5790–5800.
- Ryu, J.H.; Sharma, P.K.; Jo, J.H.; Park, J.H. A blockchain-based decentralized efficient investigation framework for IoT digital forensics. J. Supercomput. 2019, 75, 4372–4387.
- Siddiqi, A.S.; Alam, M.; Mehta, D.; Zafar, S. Machine Learning-Based Predictive Analysis to Abet Climatic Change Preparedness. In Cyber Security and Digital Forensics; Springer: Berlin/Heidelberg, Germany, 2022; pp. 541–550.
- Mishra, A.; Khan, M.; Khan, W.; Khan, M.Z.; Srivastava, N.K. A Comparative Study on Data Mining Approach Using Machine Learning Techniques: Prediction Perspective. In Pervasive Healthcare; Springer: Berlin/Heidelberg, Germany, 2022; pp. 153–165.
- Shahbazi, Z.; Byun, Y.C. Fake media detection based on natural language processing and blockchain approaches. IEEE Access 2021, 9, 128442–128453.
- Coffey, C.A.; Batastini, A.B.; Vitacco, M.J. Clues from the digital world: A survey of clinicians’ reliance on social media as collateral data in forensic evaluations. Prof. Psychol. Res. Pract. 2018, 49, 345.
- Baror, S.O.; Venter, H.S.; Adeyemi, R. A natural human language framework for digital forensic readiness in the public cloud. Aust. J. Forensic Sci. 2021, 53, 566–591.
- Barik, K.; Abirami, A.; Konar, K.; Das, S. Research Perspective on Digital Forensic Tools and Investigation Process. In Illumination of Artificial Intelligence in Cybersecurity and Forensics; Springer: Berlin/Heidelberg, Germany, 2022; pp. 71–95.
- Shahbazi, Z.; Byun, Y.C. Agent-Based Recommendation in E-Learning Environment Using Knowledge Discovery and Machine Learning Approaches. Mathematics 2022, 10, 1192.
- Kaur, R.; Singh, S.; Kumar, H. Authorship analysis of online social media content. In Proceedings of the 2nd International Conference on Communication, Computing and Networking, Haldia, India, 15–16 November 2019; pp. 539–549.
- Montasari, R.; Hill, R. Next-generation digital forensics: Challenges and future paradigms. In Proceedings of the 2019 IEEE 12th International Conference on Global Security, Safety and Sustainability (ICGS3), London, UK, 16–18 January 2019; pp. 205–212.
- McGuire, J.C.; Leung, W.S. Enhancing digital forensic investigations into emails through sentiment analysis. In Proceedings of the ECCWS 2018 17th European Conference on Cyber Warfare and Security V2, Oslo, Norway, 28–29 June 2018; p. 288.
- Shahbazi, Z.; Byun, Y.C. Computing focus time of paragraph using deep learning. In Proceedings of the 2019 IEEE Transportation Electrification Conference and Expo, Asia-Pacific (ITEC Asia-Pacific), Seogwipo, Korea, 8–10 May 2019; pp. 1–4.
- Shahbazi, Z.; Byun, Y.C. LDA Topic Generalization on Museum Collections. In Smart Technologies in Data Science and Communication; Springer: Berlin/Heidelberg, Germany, 2020; pp. 91–98.
- Mouhssine, E.; Khalid, C. Social big data mining framework for extremist content detection in social networks. In Proceedings of the 2018 International Symposium on Advanced Electrical and Communication Technologies (ISAECT), Rabat, Morocco, 21–23 November 2018; pp. 1–5.
- Dhaliwal, P. Comprehensive Exploration of Machine Learning based models in Digital Forensics—A plunge into Hate Speech Detection. In Proceedings of the 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 17–18 December 2021; pp. 1933–1938.
- Iqbal, F.; Debbabi, M.; Fung, B. Artificial intelligence and digital forensics. In Machine Learning for Authorship Attribution and Cyber Forensics; Springer: Berlin/Heidelberg, Germany, 2020; pp. 139–150.
- Horan, C.; Saiedian, H. Cyber Crime Investigation: Landscape, Challenges, and Future Research Directions. J. Cybersecur. Priv. 2021, 1, 580–596.
- Shahbazi, Z.; Byun, Y.C.; Lee, D.C. Toward representing automatic knowledge discovery from social media contents based on document classification. Int. J. Adv. Sci. Technol. 2020, 29, 14089–14096.
- Shahbazi, Z.; Byun, Y.C. Topic prediction and knowledge discovery based on integrated topic modeling and deep neural networks approaches. J. Intell. Fuzzy Syst. 2021, 41, 1–17.
- Seckiner, D.; Mallett, X.; Roux, C.; Meuwly, D.; Maynard, P. Forensic image analysis—CCTV distortion and artefacts. Forensic Sci. Int. 2018, 285, 77–85.
- Sanyasi, M.; Kumar, P. Digital Forensics Investigation for Attacks on Artificial Intelligence. SPAST Abstr. 2021, 1, 1.
- Khan, A.A.; Shaikh, A.A.; Laghari, A.A.; Dootio, M.A.; Rind, M.M.; Awan, S.A. Digital forensics and cyber forensics investigation: Security challenges, limitations, open issues, and future direction. Int. J. Electron. Secur. Digit. Forensics 2022, 14, 124–150.
- Krishnan, S.; Shashidhar, N.; Varol, C.; Islam, A.R. Evidence Data Preprocessing for Forensic and Legal Analytics. Int. J. Comput. Linguist. (IJCL) 2021, 12, 24.
- Choi, J.; Yu, J.; Hyun, S.; Kim, H. Digital forensic analysis of encrypted database files in instant messaging applications on Windows operating systems: Case study with KakaoTalk, NateOn and QQ messenger. Digit. Investig. 2019, 28, S50–S59.
- Zhang, H.; Chen, L.; Liu, Q. Digital forensic analysis of instant messaging applications on android smartphones. In Proceedings of the 2018 International Conference on Computing, Networking and Communications (ICNC), Maui, HI, USA, 5–8 March 2018; pp. 647–651.
- Du, X.; Hargreaves, C.; Sheppard, J.; Anda, F.; Sayakkara, A.; Le-Khac, N.A.; Scanlon, M. SoK: Exploring the state of the art and the future potential of artificial intelligence in digital forensic investigation. In Proceedings of the 15th International Conference on Availability, Reliability and Security, Virtual Event, Ireland, 25–28 August 2020; pp. 1–10.
- Xiao, J.; Li, S.; Xu, Q. Video-based evidence analysis and extraction in digital forensic investigation. IEEE Access 2019, 7, 55432–55442.
- Al-Jadir, I.; Wong, K.W.; Fung, C.C.; Xie, H. Enhancing digital forensic analysis using memetic algorithm feature selection method for document clustering. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; pp. 3673–3678.
- Venugopalan, M.; Gupta, D. An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis. Knowl.-Based Syst. 2022, 246, 108668.
- Palimkar, P.; Shaw, R.N.; Ghosh, A. Machine learning technique to prognosis diabetes disease: Random forest classifier approach. In Advanced Computing and Intelligent Technologies; Springer: Berlin/Heidelberg, Germany, 2022; pp. 219–244.
Author | Proposed Approach | Advantages | Limitations |
---|---|---|---|
Choi et al. (2019) [52] | Digital forensic analysis of KakaoTalk encrypted data. | Data recovery without requiring a password from the user. | Difficult to protect user-sensitive information. |
Zhang et al. (2018) [53] | Digital forensic analysis of smartphone instant messaging. | Investigates the history of user messages in four Android mobile applications. | Communication mode limited to one-to-one contacts. |
Du et al. (2020) [54] | Future of artificial intelligence in digital forensic investigation. | Survey of automated evidence-processing methods based on AI techniques. | Low-quality image data is difficult to train on and process further. |
Xiao et al. (2019) [55] | Video-based evidence analysis in digital forensic investigation. | Identification of forensics and link establishment to investigate the objects. | Difficulties with human face recognition, motion detection, etc. |
Al-Jadir et al. (2018) [56] | Digital forensic enhancement for document clustering. | Enhances document clustering performance for partitioning criminal reports and text datasets. | Processing becomes challenging as the data volume increases. |
Before Feature Selection
Samples | topic 1 | topic 2 | topic 3 | topic 4 | topic 5 | y (label) |
---|---|---|---|---|---|---|
Vector 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Vector 2 | 0 | 0 | 0.7 | 0.2 | 0.4 | 1 |
Vector 3 | 0.5 | 0.3 | 0.2 | 0 | 0.4 | 0 |
Vector 4 | 0 | 0.6 | 0 | 0 | 0.6 | 1 |
After Feature Selection
Vectors | topic 1 | topic 2 | topic 3 | y (label) |
---|---|---|---|---|
Vector 1 | 1 | 0 | 0 | 0 |
Vector 2 | 0 | 0 | 0.2 | 1 |
Vector 3 | 0.5 | 0.3 | 0 | 0 |
Vector 4 | 0 | 0.6 | 0 | 1 |
Module | Component | Description |
---|---|---|
Machine Learning | Operating System | Microsoft Windows 10 |
Machine Learning | CPU | Intel(R) Core(TM) [email protected] GHz |
Machine Learning | Main Memory | 16 GB RAM |
Machine Learning | Core Programming Language | Python |
Machine Learning | IDE | PyCharm Professional 2020 |
Machine Learning | ML Algorithm | Random Forest |
Blockchain Framework | Operating System | Ubuntu Linux 18.04 LTS |
Blockchain Framework | Docker Engine | Version 18.06.1-ce |
Blockchain Framework | Docker Compose | Version 1.13.0 |
Blockchain Framework | IDE | Composer Playground |
Blockchain Framework | Programming Language | Node.js |
Data Type | Total Records |
---|---|
5000 | |
6500 | |
Blogs | 6600 |
News | 5500 |
Training Set | 80% |
Testing Set | 20% |
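The 80%/20% partition in the table above can be illustrated with a short, standard-library-only sketch. The shuffling strategy, the fixed seed, and the helper name `train_test_split` are assumptions made for the example; the source states only the proportions and the per-source record counts.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability (assumed)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Record counts of the four data sources, copied from the table.
counts = [5000, 6500, 6600, 5500]

# Integer placeholders stand in for the actual records.
records = list(range(sum(counts)))

train, test = train_test_split(records)
print(len(train), len(test))  # 18880 4720
```

With 23,600 records in total, the split yields 18,880 training and 4,720 testing records. Whether the original experiments split each source separately or the pooled data is not stated, so this pooled version is only one plausible reading.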
Name of Operators | Details |
---|---|
Tweet Cloud | Object correlation method to provide the fast overview of users’ tweet topics. |
Hashtag Cloud | Object correlation based on hashtags of user tweets. |
Interaction Graph | Subject and object correlation for sorting contacts between the social graph of users with the highest communication frequency. |
Interaction Frequency Analysis | Subject and objective correlation to perform the frequency analysis between two users and identify the relationship of the users’ communication. |
Views Similarity | Rule-based correlation for nearest user-opinion identification. |
Trace Operator | Linking the evidence to the entity. |
Temporal Activity Graph | Using temporal correlation to analyze the user activity patterns in a defined period. |
Geo-location Activity Graph | Object correlation for sorting the location based on the tagged online content. |
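Two of the operators in the table, Hashtag Cloud and Interaction Frequency Analysis, reduce to frequency counting. The sketch below shows one way they might be computed; the sample posts, the use of `@mentions` as the interaction signal, and both function names are hypothetical and not taken from the source.

```python
import re
from collections import Counter

# Hypothetical toy posts as (author, text) pairs; not from the paper's dataset.
posts = [
    ("alice", "Meeting @bob at noon #project #deadline"),
    ("bob", "@alice confirmed #project"),
    ("alice", "@carol see the report #deadline"),
    ("alice", "@bob thanks #project"),
]


def hashtag_cloud(posts):
    """Count hashtag occurrences across all posts (case-insensitive)."""
    counts = Counter()
    for _, text in posts:
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts


def interaction_frequency(posts):
    """Count mentions exchanged between each unordered pair of users."""
    counts = Counter()
    for author, text in posts:
        for mentioned in re.findall(r"@(\w+)", text):
            if mentioned != author:
                pair = tuple(sorted((author, mentioned)))
                counts[pair] += 1
    return counts


tags = hashtag_cloud(posts)
pairs = interaction_frequency(posts)
print(tags.most_common())    # [('project', 3), ('deadline', 2)]
print(pairs.most_common(1))  # [(('alice', 'bob'), 3)]
```

Sorting the pair counts by frequency gives the ordering that the Interaction Graph operator describes, with the most frequently communicating contacts first.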
Fold | Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
---|---|---|---|---|---|---|
1 | P | 0.6595 | 0.8254 | 0.9486 | 0.9846 | 0.6487 |
1 | R | 0.7511 | 0.9700 | 0.7111 | 0.6911 | 0.7348 |
1 | F1 | 0.6667 | 0.6174 | 0.8611 | 0.7811 | 0.7794 |
2 | P | 0.7198 | 0.3541 | 0.6736 | 0.8947 | 0.4955 |
2 | R | 0.5511 | 0.7511 | 0.4711 | 0.5948 | 0.6564 |
2 | F1 | 0.6944 | 0.5656 | 0.5611 | 0.7182 | 0.5836 |
3 | P | 0.6111 | 0.6993 | 0.8793 | 0.9831 | 0.6939 |
3 | R | 0.6311 | 0.9300 | 0.7511 | 0.8334 | 0.8479 |
3 | F1 | 0.6825 | 0.6111 | 0.7622 | 0.7939 | 0.7749 |
4 | P | 0.5968 | 0.7986 | 0.8611 | 0.9444 | 0.6232 |
4 | R | 0.8711 | 0.8711 | 0.7911 | 0.7746 | 0.7498 |
4 | F1 | 0.6929 | 0.6477 | 0.7994 | 0.8337 | 0.6949 |
5 | P | 0.6374 | 0.4058 | 0.7929 | 0.8478 | 0.5498 |
5 | R | 0.5111 | 0.8711 | 0.7111 | 0.6964 | 0.7699 |
5 | F1 | 0.5990 | 0.6566 | 0.7633 | 0.7982 | 0.6479 |
Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
---|---|---|---|---|---|
P | 0.6449 | 0.5367 | 0.6393 | 0.9443 | 0.6279 |
R | 0.6631 | 0.9191 | 0.6871 | 0.6943 | 0.7432 |
F1 | 0.6673 | 0.6197 | 0.7334 | 0.7611 | 0.6745 |
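For the Decision Tree column, the averaged precision and recall above equal the arithmetic mean of the five per-fold values. The short check below illustrates that calculation; it covers only the Decision Tree precision and recall, and the variable names are chosen for the example.

```python
from statistics import mean

# Decision Tree precision and recall per fold, copied from the 5-fold table.
dt_precision = [0.6595, 0.7198, 0.6111, 0.5968, 0.6374]
dt_recall = [0.7511, 0.5511, 0.6311, 0.8711, 0.5111]

# Average over the folds and round to the four decimals used in the tables.
avg_p = round(mean(dt_precision), 4)
avg_r = round(mean(dt_recall), 4)

print(avg_p, avg_r)  # 0.6449 0.6631
```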
Setting | Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
---|---|---|---|---|---|---|
With feature selection | P | 0.6449 | 0.6367 | 0.8513 | 0.9443 | 0.6279 |
With feature selection | R | 0.6631 | 0.9191 | 0.6871 | 0.6943 | 0.7432 |
With feature selection | F1 | 0.6673 | 0.6197 | 0.7534 | 0.7611 | 0.6745 |
Without feature selection | P | 0.6293 | 0.6176 | 0.8122 | 0.8321 | 0.4574 |
Without feature selection | R | 0.6171 | 0.5351 | 0.6791 | 0.5467 | 0.6831 |
Without feature selection | F1 | 0.6372 | 0.5779 | 0.6974 | 0.6998 | 0.5445 |
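The effect of feature selection can be read off the table above as a per-model difference in F1. The sketch below performs that subtraction on the reported values; the dictionary names are chosen for the example, and nothing beyond the table's own numbers is assumed.

```python
# F1 scores copied from the table, with and without feature selection.
f1_with = {
    "Decision Tree": 0.6673,
    "Naive Bayes": 0.6197,
    "Logistic Regression": 0.7534,
    "Random Forest": 0.7611,
    "Support Vector Machine": 0.6745,
}
f1_without = {
    "Decision Tree": 0.6372,
    "Naive Bayes": 0.5779,
    "Logistic Regression": 0.6974,
    "Random Forest": 0.6998,
    "Support Vector Machine": 0.5445,
}

# Absolute F1 gain attributable to feature selection, per model.
gain = {model: round(f1_with[model] - f1_without[model], 4) for model in f1_with}
largest_gain = max(gain, key=gain.get)

for model, delta in gain.items():
    print(f"{model}: +{delta:.4f}")
```

Every model improves with feature selection. Random Forest keeps the highest F1 in both settings (0.7611 versus 0.6998), while the Support Vector Machine shows the largest absolute gain.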
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shahbazi, Z.; Byun, Y.-C. NLP-Based Digital Forensic Analysis for Online Social Network Based on System Security. Int. J. Environ. Res. Public Health 2022, 19, 7027. https://doi.org/10.3390/ijerph19127027