Social Media Analytics on Russia–Ukraine Cyber War with Natural Language Processing: Perspectives and Challenges
Abstract
1. Introduction
- Russia has engaged in cyber warfare as a part of its military strategy in the ongoing conflict with Ukraine.
- The cyber attacks conducted by Russia have targeted a wide range of Ukrainian organizations, including government agencies, media outlets, and critical infrastructure.
- The cyber attacks have caused significant damage to Ukrainian organizations, including disruptions to IT networks, power outages, and data theft.
- Russian cyber operations have sought to influence political events in Ukraine, including attempts to manipulate election results and conduct surveillance on political figures.
- Ukrainian hackers have retaliated against Russian cyber attacks by targeting Russian organizations, including media outlets and government agencies.
- Hacktivist groups have also targeted Belarusian infrastructure as a means of disrupting Russian troop movements towards Ukraine.
- Despite efforts by researchers to decipher Russian cyber strategies, the exact goals and motivations behind these attacks remain unclear.
- This paper reported the first critical analysis of Twitter-based cyber analytics on the Russia–Ukraine cyber war, using data obtained through the Twitter API.
- By applying natural language processing (NLP) algorithms to live tweets, including language detection, translation, sentiment analysis, latent Dirichlet allocation (LDA), term frequency–inverse document frequency (TF-IDF), Porter stemming, and n-grams, this study reported an innovative approach to social-media-based cyber intelligence.
- Drawing on a comprehensive literature survey, this paper generated a four-dimensional cyber intelligence framework composed of a geopolitical and socioeconomic perspective; a targeted victim perspective; a psychological and societal perspective; and a national priority and concerns perspective.
- This paper used 37,386 tweets that originated from 30,706 users in 54 different languages from 13 October 2022 to 6 April 2023, for automatically generating a cyber intelligence report in four cyber dimensions.
- Finally, this paper reported 12 different challenges of using NLP algorithms on social media to obtain reliable social-media-based cyber intelligence.
2. Background Context and Literature
2.1. Context of Russia–Ukraine Cyber War
2.2. Multidimensional Analysis of Cyber Threats
2.3. NLP-Based Analysis of Tweets
3. Materials and Methods
- Obtain tweets with the keywords “cyber” or “hack”: T = getTweetsWithKeywords(“cyber”, “hack”)
- Categorize tweets into English and non-English tweets: Tenglish = {t ∈ T | L(t) = “English”}; Tnon_english = {t ∈ T | L(t) ≠ “English”}
- Translate non-English tweets to English: Ttranslated = {translateToEnglish(t, L(t)) | t ∈ Tnon_english}
- Perform sentiment analysis on English and translated tweets: S(t) = performSentimentAnalysis(t) for t ∈ Tenglish ∪ Ttranslated
- Group tweets by country names: G = groupTweetsByCountry(T)
- Perform term frequency calculation on each country group: TF(g) = calculateTermFrequency(g) for g ∈ G
- Perform LDA topic analysis on each country group’s term frequency data: LDA(g) = performLDATopicAnalysis(TF(g)) for g ∈ G
Algorithm 1: Pseudocode of NLP-Based Cyber Intelligence Solution Using Twitter Feed
    FUNCTION socialMediaAnalytics(keyword)
        tweets = getTweetsByKeyword(keyword) // Obtain tweets with the given keyword
        // Initialize data structures to store categorized tweets
        englishTweets = []
        nonEnglishTweets = []
        FOR EACH tweet IN tweets
            language = detectTweetLanguage(tweet) // Detect the language of the tweet
            IF language == “English”
                sentiment = performSentimentAnalysis(tweet) // Perform sentiment analysis on English tweets
                englishTweets.append({tweet: sentiment})
            ELSE
                translatedTweet = translateToEnglish(tweet, language) // Translate non-English tweets to English
                sentiment = performSentimentAnalysis(translatedTweet) // Perform sentiment analysis on translated tweets
                nonEnglishTweets.append({tweet: sentiment})
            END IF
        END FOR
        // Group tweets by country names
        countryGroups = groupTweetsByCountry(tweets)
        // Initialize data structures to store analyzed country-grouped tweets
        analyzedCountryGroups = []
        FOR EACH countryGroup IN countryGroups
            termFrequency = calculateTermFrequency(countryGroup) // Calculate term frequency for the tweets
            ldaTopicAnalysis = performLDATopicAnalysis(termFrequency) // Perform LDA topic analysis on the term frequency data
            analyzedCountryGroups.append({countryGroup: ldaTopicAnalysis})
        END FOR
        RETURN analyzedCountryGroups // Return the final analyzed data for each country group
    END FUNCTION
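The flow of Algorithm 1 can be illustrated with a short Python sketch. It is not the author's implementation: the paper relies on Microsoft Text Analytics for the external steps, whereas here language detection, translation, and sentiment scoring are passed in as plain callables. The names `analyse_tweets`, `detect_language`, `translate`, and `sentiment`, the `"en"` language code, and the toy stand-in functions are all assumptions made so that the example runs offline. The per-country step stops at term counts; LDA is left out.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List

COUNTRIES = ("Russia", "Ukraine")  # assumption: the two country groups of interest


def analyse_tweets(
    tweets: Iterable[str],
    detect_language: Callable[[str], str],
    translate: Callable[[str, str], str],
    sentiment: Callable[[str], float],
) -> Dict[str, dict]:
    """Translate, score, group by country, and count terms per group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for text in tweets:
        lang = detect_language(text)
        english = text if lang == "en" else translate(text, lang)
        record = {"original": text, "english": english,
                  "language": lang, "sentiment": sentiment(english)}
        # Simple substring match; a tweet naming both countries joins both groups.
        for country in COUNTRIES:
            if country.lower() in english.lower():
                groups[country].append(record)

    report = {}
    for country, records in groups.items():
        terms = Counter(
            word for r in records for word in r["english"].lower().split()
        )
        report[country] = {
            "tweets": len(records),
            "mean_sentiment": sum(r["sentiment"] for r in records) / len(records),
            "top_terms": terms.most_common(5),
        }
    return report


# Toy stand-ins (hypothetical) so the sketch runs without any external API.
def toy_detect(text: str) -> str:
    return "uk" if "Україна" in text else "en"


def toy_translate(text: str, lang: str) -> str:
    return text.replace("Україна", "Ukraine").replace("кібер", "cyber")


def toy_sentiment(text: str) -> float:
    return -1.0 if "attack" in text.lower() else 0.0


sample = [
    "Russia blamed for cyber attack",
    "Україна кібер",
    "Cyber attack on Ukraine and Russia",
    "hack the planet",
]
result = analyse_tweets(sample, toy_detect, toy_translate, toy_sentiment)
```

Passing the three services in as callables keeps the control flow testable without network access; a real deployment would substitute client calls to whichever text-analytics service is in use.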
Algorithm 2: Using NLP on Tweets Concerning Cyber Issues for Russia and Ukraine
    For Each xi Tweet in N, Multilingual Tweets
        If Detect_Language(xi) <> ‘English Language’
            yi = Perform_English_Translation(xi)
        Else
            yi = xi
        End If
    End Loop
    For Each yi Tweet in N, English Tweets
        si = Analyse_Sentiment(yi)
        If yi Contains ‘Russia’ Or ‘Ukraine’
            {, yi, } = yi
        End If
    End Loop
    For Each cr in C, Country Names (i.e., for Russia & Ukraine)
        {{, }, …} = Perform_TF-IDF(Tokenization(yi))
        {{, }, …} = Perform_PorterStemming(Tokenization(yi))
        {{, }, …} = Perform_N-Gram(Tokenization(yi))
        {{, {{}…}}, …} = Perform_LDA(Tokenization(yi))
    End Loop
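Two of the per-country steps in Algorithm 2, TF-IDF weighting and n-gram extraction, are simple enough to sketch with the Python standard library alone. This is a minimal illustration under stated assumptions, not the paper's implementation: the regex tokenizer, the function names, and the unsmoothed weighting tf = count / document length, idf = ln(N / df) are choices made here (the source does not say which TF-IDF variant was used). Porter stemming and LDA are omitted; in practice they would normally come from an NLP library.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphabetic runs only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous windows of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Per-document TF-IDF with tf = count / len(doc) and idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy tweets (invented for illustration, not taken from the data set).
tweets = [
    "Russian cyber attack blamed",
    "Ukraine cyber defence report",
    "Russian hack story",
]
docs = [tokenize(t) for t in tweets]
weights = tf_idf(docs)
bigrams = Counter(bg for doc in docs for bg in ngrams(doc, 2))
```

In the toy corpus, "cyber" appears in two of the three tweets and therefore receives a lower weight than "attack", which appears in only one; this is the down-weighting of common terms that TF-IDF is meant to provide.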
4. Results
5. Discussion
5.1. Analysis of Russian Cyber-Related Tweets
5.1.1. Russian Topic 1
5.1.2. Russian Topic 2
5.1.3. Russian Topic 3
5.1.4. Russian Topic 4
5.1.5. Russian Topic 5
5.1.6. Russian Topic 6
5.1.7. Russian Topic 7
5.2. Analysis of Ukrainian Cyber-Related Tweets
5.2.1. Ukrainian Topic 1
5.2.2. Ukrainian Topic 2
5.2.3. Ukrainian Topic 3
5.2.4. Ukrainian Topic 4
5.2.5. Ukrainian Topic 5
5.2.6. Ukrainian Topic 6
5.2.7. Ukrainian Topic 7
5.3. Overall Outcome of Topic Analysis
5.4. Challenges of Social-Media-Based Cyber Intelligence
6. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Willett, M. The Cyber Dimension of the Russia–Ukraine War. Survival 2022, 64, 7–26.
- Lewis, J.A. Cyber War and Ukraine. 16 June 2022. Available online: https://www.csis.org/analysis/cyber-war-and-ukraine (accessed on 2 May 2023).
- Gibney, E. Where is Russia’s cyberwar? Researchers decipher its strategy. Nature 2022, 603, 775–776.
- Bateman, J. Russia’s Wartime Cyber Operations in Ukraine: Military Impacts, Influences, and Implications. 16 December 2022. Available online: https://carnegieendowment.org/2022/12/16/russia-s-wartime-cyber-operations-in-ukraine-military-impacts-influences-and-implications-pub-88657 (accessed on 3 May 2023).
- Pearson, J.; Bing, C. The Cyber War between Ukraine and Russia: An Overview. 10 May 2022. Available online: https://www.reuters.com/world/europe/factbox-the-cyber-war-between-ukraine-russia-2022-05-10/ (accessed on 3 May 2023).
- Rudenko, O. Authorities: Hackers Foiled in Bid to Rig Ukraine Presidential Election Results. 2014. Available online: https://www.kyivpost.com/post/7672 (accessed on 2 April 2023).
- BBC News. Hackers Caused Power Cut in Western Ukraine—US. 2016. Available online: https://www.bbc.com/news/technology-35297464 (accessed on 2 April 2023).
- Banerjea, A. NotPetya: How a Russian Malware Created the World’s Worst Cyberattack Ever. 27 August 2018. Available online: https://www.business-standard.com/article/technology/notpetya-how-a-russian-malware-created-the-world-s-worst-cyberattack-ever-118082700261_1.html (accessed on 2 April 2023).
- Microsoft Security. Destructive Malware Targeting Ukrainian Organizations. 15 January 2022. Available online: https://www.microsoft.com/en-us/security/blog/2022/01/15/destructive-malware-targeting-ukrainian-organizations/ (accessed on 2 April 2023).
- Boutilier, A.; Stephenson, M. Global Affairs Canada Suffers ‘Cyber Attack’ Amid Russia-Ukraine Tensions: Sources. 24 January 2022. Available online: https://globalnews.ca/news/8533835/global-affairs-hit-with-significant-multi-day-disruption-to-it-networks-sources/ (accessed on 2 April 2023).
- Microsoft Security. ACTINIUM Targets Ukrainian Organizations. 4 February 2022. Available online: https://www.microsoft.com/en-us/security/blog/2022/02/04/actinium-targets-ukrainian-organizations/ (accessed on 2 April 2023).
- Kovacs, E. Ukraine Separatists, Politicians Targeted in Surveillance Operation. Security Week. 19 May 2016. Available online: https://www.securityweek.com/ukraine-separatists-politicians-targeted-surveillance-operation/ (accessed on 3 May 2023).
- Shamanska, A. Hackers in Ukraine Deface Separatist Websites To Mark Victory Day. Radio Free Europe. 9 May 2016. Available online: https://www.rferl.org/a/hackers-ukraine-deface-separatist-websites-victory-day-opmay9/27724532.html (accessed on 3 May 2023).
- Inform Napalm. Ukrainian Hackers Break into the Russian Channel One. 6 November 2016. Available online: https://informnapalm.org/en/ru-channel-one/ (accessed on 3 May 2023).
- Walker, S. Kremlin Puppet Master’s Leaked Emails Are Price of Return to Political Frontline. The Guardian. 26 October 2016. Available online: https://www.theguardian.com/world/2016/oct/26/kremlin-puppet-masters-leaked-emails-vladislav-surkov-east-ukraine (accessed on 3 May 2023).
- Pietsch, B. Hacking Group Claims Control of Belarusian Railroads in Move to ‘Disrupt’ Russian Troops Heading near Ukraine. Washington Post. 25 January 2022. Available online: https://www.washingtonpost.com/world/2022/01/25/belarus-railway-hacktivist-russia-ukraine-cyberattack/ (accessed on 3 May 2023).
- Sufi, F. A New Social Media-Driven Cyber Threat Intelligence. Electronics 2023, 12, 1242.
- Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, K.; Martinez-Hernandez, V.; Perez-Meana, H.; Olivares-Mercado, J.; Sanchez, V. Social Sentiment Sensor in Twitter for Predicting Cyber-Attacks Using ℓ1 Regularization. Sensors 2018, 18, 1380.
- Sufi, F. Algorithms in Low-Code-No-Code for Research Applications: A Practical Review. Algorithms 2023, 16, 108.
- Pattnaik, N.; Li, S.; Nurse, J.R. Perspectives of non-expert users on cyber security and privacy: An analysis of online discussions on twitter. Comput. Secur. 2023, 125, 103008.
- Geetha, R.; Karthika, S. Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets. In Computational Intelligence in Data Science, ICCIDS 2020, IFIP Advances in Information and Communication Technology, Chennai, India, 20–22 February 2020; Springer: Berlin/Heidelberg, Germany, 2020; Volume 578.
- Sufi, F. A New AI-Based Semantic Cyber Intelligence Agent. Futur. Internet 2023, 15, 231.
- Shah, R.; Aparajit, S.; Chopdekar, R.; Patil, R. Machine Learning based Approach for Detection of Cyberbullying Tweets. Int. J. Comput. Appl. 2020, 175, 51–56.
- Rawat, R.; Mahor, V.; Chirgaiya, S.; Shaw, R.N.; Ghosh, A. Analysis of Darknet Traffic for Criminal Activities Detection Using TF-IDF and Light Gradient Boosted Machine Learning Algorithm. In Innovations in Electrical and Electronic Engineering: Proceedings of ICEEE 2021; Lecture Notes in Electrical Engineering book series; Springer: Singapore, 2021; Volume 756, pp. 671–681.
- Lanier, H.D.; Diaz, M.I.; Saleh, S.N.; Lehmann, C.U.; Medford, R.J. Analyzing COVID-19 disinformation on Twitter using the hashtags #scamdemic and #plandemic: Retrospective study. PLoS ONE 2022, 17, e0268409.
- Hagen, R.A. Unraveling the Complexity of Cyber Security Threats: A Multidimensional Approach. 15 April 2023. Available online: https://www.linkedin.com/pulse/unraveling-complexity-cyber-security-threats-approach-hagen/ (accessed on 25 April 2023).
- Correia, V.J. An Explorative Study into the Importance of Defining and Classifying Cyber Terrorism in the United Kingdom. SN Comput. Sci. 2021, 3, 84.
- Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186.
- Agrafiotis, I.; Nurse, J.R.C.; Goldsmith, M.; Creese, S.; Upton, D. A taxonomy of cyber-harms: Defining the impacts of cyber-attacks and understanding how they propagate. J. Cybersecur. 2018, 4, tyy006.
- Bhaskar, R. Better Cybersecurity Awareness through Research. 2022. Available online: https://www.isaca.org/resources/isaca-journal/issues/2022/volume-3/better-cybersecurity-awareness-through-research (accessed on 1 April 2023).
- Humayun, M.; Niazi, M.; Jhanjhi, N.; Alshayeb, M.; Mahmood, S. Cyber Security Threats and Vulnerabilities: A Systematic Mapping Study. Arab. J. Sci. Eng. 2020, 45, 3171–3189.
- Alkhalil, Z.; Hewage, C.; Nawaf, L.; Khan, I. Phishing Attacks: A Recent Comprehensive Study and a New Anatomy. Front. Comput. Sci. 2021, 3, 563060.
- Alim, S. Analysis of Tweets Related to Cyberbullying: Exploring Information Diffusion and Advice Available for Cyberbullying Victims. Int. J. Cyber Behav. Psychol. Learn. 2015, 5, 31–52.
- Microsoft Documentation. Text Analytics: A Collection of Features from AI Language that Extract, Classify, and Understand Text within Documents. 2023. Available online: https://azure.microsoft.com/en-us/products/ai-services/text-analytics (accessed on 6 August 2023).
- Sufi, F. Novel Application of Open-Source Cyber Intelligence. Electronics 2023, 12, 3610.
- Sufi, F.K.; Khalil, I. Automated Disaster Monitoring from Social Media Posts Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Trans. Comput. Soc. Syst. 2022, in press.
- Sufi, F.K. AI-SocialDisaster: An AI-based software for identifying and analyzing natural disasters from social media. Softw. Impacts 2022, 11, 100319.
- Sufi, F.K.; Alsulami, M. Automated Multidimensional Analysis of Global Events with Entity Detection, Sentiment Analysis and Anomaly Detection. IEEE Access 2021, 9, 152449–152460.
- Sufi, F.K. AI-GlobalEvents: A Software for analyzing, identifying and explaining global events with Artificial Intelligence. Softw. Impacts 2022, 11, 100218.
- Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up?: Sentiment classification using machine learning techniques. arXiv 2002, arXiv:0205070.
- Turney, P.D. Thumbs up or thumbs down?: Semantic orientation applied. arXiv 2002, arXiv:0212032.
- Naseem, U.; Razzak, I.; Khushi, M.; Eklund, P.W.; Kim, J. COVIDSenti: A Large-Scale Benchmark Twitter. IEEE Trans. Comput. Soc. Syst. 2020, 8, 1003–1015.
- Li, L.; Zhang, Q.; Wang, X.; Zhang, J.; Wang, T.; Gao, T.-L.; Duan, W.; Tsoi, K.K.-F.; Wang, F.-Y. Characterizing the Propagation of Situational Information in Social Media During COVID-19 Epidemic: A Case Study on Weibo. IEEE Trans. Comput. Soc. Syst. 2020, 7, 556–562.
- Cameron, D.; Smith, G.A.; Daniulaityte, R.; Sheth, A.P.; Dave, D.; Chen, L.; Anand, G.; Carlson, R.; Watkins, K.Z.; Falck, R. PREDOSE: A semantic web platform for drug abuse epidemiology using social media. J. Biomed. Inform. 2013, 46, 985–997.
- Chen, X.; Faviez, C.; Schuck, S.; Lillo-Le-Louët, A.; Texier, N.; Dahamna, B.; Huot, C.; Foulquié, P.; Pereira, S.; Leroux, V.; et al. Mining Patients’ Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate. Front. Pharmacol. 2018, 9, 541.
- McNaughton, E.C.; Black, R.A.; Zulueta, M.G.; Budman, S.H.; Butler, S.F. Measuring online endorsement of prescription opioids abuse: An integrative methodology. Pharmacoepidemiol. Drug Saf. 2012, 21, 1081–1092.
- Al-Twairesh, N.; Al-Negheimish, H. Surface and Deep Features Ensemble for Sentiment Analysis of Arabic Tweets. IEEE Access 2019, 7, 84122–84131.
- Vashisht, G.; Sinha, Y.N. Sentimental study of CAA by location-based tweets. Int. J. Inf. Technol. 2021, 13, 1555–1567.
- Ebrahimi, M.; Yazdavar, A.H.; Sheth, A. Challenges of Sentiment Analysis for Dynamic Events. IEEE Intell. Syst. 2017, 32, 70–75.
- Evangelatos, P.; Iliou, C.; Mavropoulos, T.; Apostolou, K.; Tsikrika, T.; Vrochidis, S.; Kompatsiaris, I. Named Entity Recognition in Cyber Threat Intelligence Using Transformer-based Models. In Proceedings of the 2021 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece, 26–28 July 2021.
- Wu, H.; Li, X.; Gao, Y. An Effective Approach of Named Entity Recognition for Cyber Threat Intelligence. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020.
- Batbaatar, E.; Ryu, K.H. Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach. Int. J. Environ. Res. Public Health 2019, 16, 3628.
- Khandpur, R.P.; Ji, T.; Jan, S.; Wang, G.; Lu, C.-T.; Ramakrishnan, N. Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media. In Proceedings of the CIKM ’17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017.
- Koloveas, P.; Chantzios, T.; Alevizopoulou, S.; Skiadopoulos, S.; Tryfonopoulos, C. inTIME: A Machine Learning-Based Framework for Gathering and Leveraging Web Data to Cyber-Threat Intelligence. Electronics 2021, 10, 818.
- Shin, H.-S.; Kwon, H.-Y.; Ryu, S.-J. A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter. Electronics 2020, 9, 1527.
- Zhao, J.; Yan, Q.; Li, J.; Shao, M.; He, Z.; Li, B. TIMiner: Automatically extracting and analyzing categorized cyber threat intelligence from social data. Comput. Secur. 2020, 95, 101867.
- Schellekens, J. Release the bots of war: Social media and Artificial Intelligence as international cyber attack. Przegląd Eur. 2021, 4, 163–179.
- Sun, N.; Zhang, J.; Gao, S.; Zhang, L.Y.; Camtepe, S.; Xiang, Y. Data Analytics of Crowdsourced Resources for Cybersecurity Intelligence. In Network and System Security: 14th International Conference, NSS 2020, Melbourne, VIC, Australia, 25–27 November 2020, Proceedings 14; Springer International Publishing: New York, NY, USA, 2020; Volume 12570, pp. 3–21.
- Subroto, A.; Apriyana, A. Cyber risk prediction through social media big data analytics and statistical machine learning. J. Big Data 2019, 6, 50.
- Oosthoek, K.; Doerr, C. Cyber Threat Intelligence: A Product Without a Process? Int. J. Intell. Counterintelligence 2021, 34, 300–315.
- Van Hee, C.; Jacobs, G.; Emmery, C.; Desmet, B.; Lefever, E.; Verhoeven, B.; De Pauw, G.; Daelemans, W.; Hoste, V. Automatic detection of cyberbullying in social media text. PLoS ONE 2018, 13, e0203794.
- Paradise, A.; Shabtai, A.; Puzis, R.; Elyashar, A.; Elovici, Y.; Roshandel, M.; Peylo, C. Creation and Management of Social Network Honeypots for Detecting Targeted Cyber Attacks. IEEE Trans. Comput. Soc. Syst. 2017, 4, 65–79.
- Carley, K.M. Social cybersecurity: An emerging science. Comput. Math. Organ. Theory 2020, 26, 365–381.
- Yuvaraj, N.; Srihari, K.; Dhiman, G.; Somasundaram, K.; Sharma, A.; Rajeskannan, S.; Soni, M.; Gaba, G.S.; AlZain, M.A.; Masud, M. Nature-Inspired-Based Approach for Automated Cyberbullying Classification on Multimedia Social Networking. Math. Probl. Eng. 2021, 2021, 6644652.
- Shu, K.; Sliva, A.; Sampson, J.; Liu, H. Understanding Cyber Attack Behaviors with Sentiment Information on Social Media. In Social, Cultural, and Behavioral Modeling: 11th International Conference, SBP-BRiMS 2018, Washington, DC, USA, 10–13 July 2018, Proceedings 11; Springer International Publishing: New York, NY, USA, 2018; Volume 10899, pp. 377–388.
- Sliva, A.; Shu, K.; Liu, H. Using Social Media to Understand Cyber Attack Behavior. In Advances in Human Factors, Business Management and Society: Proceedings of the AHFE 2018 International Conference on Human Factors, Business Management and Society, Orlando, FL, USA, 21–25 July 2018; Springer: New York, NY, USA, 2019; Volume 783, pp. 636–645.
- Du, Y.; Huang, C.; Liang, G.; Fu, Z.; Li, D.; Ding, Y. ExpSeeker: Extract public exploit code information from social media. Appl. Intell. 2022, 53, 15772–15786.
- Alves, F.; Bettini, A.; Ferreira, P.M.; Bessani, A. Processing tweets for cybersecurity threat awareness. Inf. Syst. 2020, 95, 101586.
- Mughaid, A.; Al-Zu’bi, S.; AL Arjan, A.; Al-Amrat, R.; Alajmi, R.; Abu Zitar, R.; Abualigah, L. An intelligent cybersecurity system for detecting fake news in social media websites. Soft Comput. 2022, 26, 5577–5591.
- Fang, Y.; Gao, J.; Liu, Z.; Huang, C. Detecting Cyber Threat Event from Twitter Using IDCNN and BiLSTM. Appl. Sci. 2020, 10, 5922.
- Tundis, A.; Ruppert, S.; Mühlhäuser, M. On the Automated Assessment of Open-Source Cyber Threat Intelligence Sources. In Proceedings of the Computational Science—ICCS 2020, Amsterdam, The Netherlands, 3–5 June 2020; Volume 12138.
- Sangwan, S.R.; Bhatia, M.P.S. Soft computing for abuse detection using cyber-physical and social big data in cognitive smart cities. Expert Syst. 2021, 39, e12766.
- Jacobs, G.; Van Hee, C.; Hoste, V. Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text? Nat. Lang. Eng. 2022, 28, 141–166.
- Sánchez, J.R.; Campo-Archbold, A.; Rozo, A.Z.; Díaz-López, D.; Pastor-Galindo, J.; Mármol, F.G.; Díaz, J.A. Uncovering Cybercrimes in Social Media through Natural Language Processing. Complexity 2021, 2021, 7955637.
- Ho, S.M.; Li, W. “I know you are, but what am I?” Profiling cyberbullying based on charged language. Comput. Math. Organ. Theory 2022, 28, 293–320.
- Rezvan, M.; Shekarpour, S.; Alshargi, F.; Thirunarayan, K.; Shalin, V.L.; Sheth, A. Analyzing and learning the language for different types of harassment. PLoS ONE 2020, 15, e0227330.
- De Boer, M.H.T.; Bakker, B.J.; Boertjes, E.; Wilmer, M.; Raaijmakers, S.; van der Kleij, R. Text Mining in Cybersecurity: Exploring Threats and Opportunities. Multimodal Technol. Interact. 2019, 3, 62.
- Mendhurwar, S.; Mishra, R. Integration of social and IoT technologies: Architectural framework for digital transformation and cyber security challenges. Enterp. Inf. Syst. 2019, 15, 565–584.
- Basheer, R.; Alkhatib, B. Threats from the Dark: A Review over Dark Web Investigation Research for Cyber Threat Intelligence. J. Comput. Netw. Commun. 2021, 2021, 1302999.
- Mittal, S.; Das, P.K.; Mulwad, V.; Joshi, A.; Finin, T. CyberTwitter: Using Twitter to generate alerts for cybersecurity threats and vulnerabilities. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, 18–21 August 2016.
- Thakur, K.; Hayajneh, T.; Tseng, J. Cyber Security in Social Media: Challenges and the Way Forward. IT Prof. 2019, 21, 41–49.
- Rodriguez, A.; Okamura, K. Social Media Data Mining for Proactive Cyber Defense. J. Inf. Process. 2020, 28, 230–238.
- Le, B.-D.; Wang, G.; Nasim, M.; Babar, M.A. Gathering Cyber Threat Intelligence from Twitter Using Novelty Classification. arXiv 2019, arXiv:1907.01755.
- Maisano, R.; Foresti, G.L. A Sentiment Analysis Anomaly Detection System for Cyber Intelligence. Int. J. Neural Syst. 2022, 33, 2350003.
- Lau, R.Y.K.; Xia, Y.; Ye, Y. A Probabilistic Generative Model for Mining Cybercriminal Networks from Online Social Media. IEEE Comput. Intell. Mag. 2014, 9, 31–43.
- Alevizopoulou, S.; Koloveas, P.; Tryfonopoulos, C.; Raftopoulou, P. Social Media Monitoring for IoT Cyber-Threats. In Proceedings of the 2021 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece, 26–28 July 2021.
- Syed, R. Cybersecurity vulnerability management: A conceptual ontology and cyber intelligence alert system. Inf. Manag. 2020, 57, 103334.
- Lima, A.Q.; Keegan, B. Chapter 3—Challenges of using machine learning algorithms for cybersecurity: A study of threat-classification models applied to social media communication data. In Cyber Influence and Cognitive Threats; Academic Press: Cambridge, MA, USA, 2020; pp. 33–52.
- Chen, B.; Chen, X. MAUIL: Multi-level Attribute Embedding for Semi-supervised User Identity Linkage. Inf. Sci. 2022, 593, 527–545.
- Zannettou, S.; Caulfield, T.; Bradlyn, B.; De Cristofaro, E.; Stringhini, G.; Blackburn, J. Characterizing the Use of Images in State-Sponsored Information Warfare Operations by Russian Trolls on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Atlanta, GA, USA, 1–5 June 2020; Volume 14.
- Zannettou, S.; Caulfield, T.; De Cristofaro, E.; Sirivianos, M.; Stringhini, G.; Blackburn, J. Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and Their Influence on the Web. arXiv 2019, arXiv:1801.09288.
Cyber Dimensions | Existing Literature |
---|---|
1. Geopolitical and socioeconomic | [26] |
2. Targeted victim | [27,28] |
3. Psychological and societal | [29] |
4. National priority and concerns | [29,30] |
Notation | Definition |
---|---|
T | Set of all tweets containing the keywords “cyber” or “hack” |
Tenglish | Set of English tweets in T |
Tnon_english | Set of non-English tweets in T |
Ttranslated | Set of translated English tweets obtained from Tnon_english |
S(t) | Sentiment score of tweet t |
L(t) | Language of tweet t |
C(t) | Country name mentioned in tweet t |
G | Set of country groups, where each group contains tweets that belong to a specific country based on the country name mentioned in the tweet |
Name of Steps | Use of External API | Name of the Algorithm | References |
---|---|---|---|
Analyzing Sentiment | ✔ | Microsoft Text Analytics [34] | [18,20,36,37,38,39] |
English Translation | ✔ | Microsoft Text Analytics [34] | [36,37,38,39] |
Modeling Topics | X | LDA | [20,21] |
Analyzing Term Frequency | X | TF-IDF | [18,20,21,23,24] |
Analyzing Term Frequency | X | Porter Stemming | [18] |
Analyzing Term Frequency | X | N-Gram | [18,20,21] |
Month | Tweets | Users | Geo-Spatial Locations | No. of Languages | Retweets | Avg. Negative Sentiment | Avg. Neutral Sentiment | Avg. Positive Sentiment | English Translations |
---|---|---|---|---|---|---|---|---|---|
Oct-22 | 3954 | 3556 | 1588 | 38 | 3,727,756 | 0.36 | 0.43 | 0.21 | 941 |
Nov-22 | 6470 | 5875 | 2358 | 38 | 9,981,856 | 0.34 | 0.43 | 0.23 | 1283 |
Dec-22 | 6512 | 5544 | 2225 | 42 | 7,565,946 | 0.35 | 0.42 | 0.23 | 1533 |
Jan-23 | 6685 | 5785 | 2364 | 40 | 7,802,301 | 0.36 | 0.40 | 0.24 | 1419 |
Feb-23 | 5976 | 5053 | 2114 | 43 | 4,276,479 | 0.37 | 0.42 | 0.21 | 1373 |
Mar-23 | 6634 | 5749 | 2357 | 41 | 4,799,540 | 0.36 | 0.43 | 0.21 | 1469 |
Apr-23 | 1155 | 1083 | 538 | 27 | 713,083 | 0.40 | 0.41 | 0.20 | 258 |
Total | 37,386 | 30,706 | 10,178 | 54 | 38,866,961 | 0.36 | 0.42 | 0.22 | 8199 |
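The Total row can be cross-checked from the monthly rows with a short calculation. The sketch below copies the tweet counts and average sentiment values from the table and computes a tweet-weighted mean for each sentiment column. Whether the paper's totals were computed with this weighting is an assumption, and the monthly averages are already rounded to two decimals, so only agreement at two decimals is meaningful. The names `MONTHLY`, `weighted_mean`, and `overall` are introduced here for illustration.

```python
# Rows copied from the table above: (month, tweets, avg_negative, avg_neutral, avg_positive)
MONTHLY = [
    ("Oct-22", 3954, 0.36, 0.43, 0.21),
    ("Nov-22", 6470, 0.34, 0.43, 0.23),
    ("Dec-22", 6512, 0.35, 0.42, 0.23),
    ("Jan-23", 6685, 0.36, 0.40, 0.24),
    ("Feb-23", 5976, 0.37, 0.42, 0.21),
    ("Mar-23", 6634, 0.36, 0.43, 0.21),
    ("Apr-23", 1155, 0.40, 0.41, 0.20),
]

# Tweet counts are additive across months.
total_tweets = sum(row[1] for row in MONTHLY)


def weighted_mean(column: int) -> float:
    """Average of a sentiment column, weighting each month by its tweet count."""
    return sum(row[1] * row[column] for row in MONTHLY) / total_tweets


overall = {
    name: round(weighted_mean(col), 2)
    for name, col in (("negative", 2), ("neutral", 3), ("positive", 4))
}
print(total_tweets, overall)
```

Under this weighting the computed values agree with the Total row for tweets and for all three sentiment averages. Columns such as Users and No. of Languages count distinct values over the whole period, so they are not expected to equal the sum of the monthly figures.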
Country | No 1 | Wgt | No 2 | Wgt | No 3 | Wgt | No 4 | Wgt | No 5 | Wgt | No 6 | Wgt | No 7 | Wgt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Russia | Russian | 72 | cyber | 65 | Russian | 65 | hack | 70 | Russia | 60 | Russia | 97 | Russian | 60 |
Russia | cyber | 65 | Russian | 44 | hack | 25 | Russia | 36 | invades | 50 | hacked | 19 | Putin | 59 |
Russia | attack | 49 | Ukraine | 24 | Shellenberger MD | 14 | Russian | 30 | cyber | 42 | cyber | 18 | using | 58 |
Russia | blame | 27 | McGonigal | 22 | hacking | 14 | Russians | 27 | attacks | 33 | helped | 16 | Trump | 57 |
Russia | threat | 26 | FBI | 19 | amp | 13 | DNC | 16 | Darth Putin KGB | 26 | new | 16 | story | 57 |
Ukraine | state | 3 | The Study of War | 2 | says | 3 | role | 3 | country | 5 | Ukraine | 117 | leaks | 2 |
Ukraine | absolutely | 2 | FBI | 2 | GicAriana | 2 | OMC Ukraine | 2 | loser | 3 | cyber | 76 | cyber warfare | 2 |
Ukraine | threat | 2 | air | 2 | need | 2 | anonymous link | 2 | brigade | 2 | Russian | 31 | cyber attacks | 2 |
Ukraine | report | 2 | infrastructure | 2 | do not | 2 | council | 2 | hacker | 2 | Ukrainian | 28 | red | 2 |
Ukraine | cross | 2 | one | 2 | security | 2 | Ukraine–Russia War | 2 | awareness | 2 | hack | 28 | never | 2 |
Performance Parameters | Measured Values for Russia | Measured Values for Ukraine |
---|---|---|
LogLikelihood | −57,933.967 | −23,251.897 |
Perplexity | 458.384 | 1016.203 |
Average tokens | 1165.143 | 392 |
Average document_entropy | 4.495 | 4.364 |
Average word-length | 6.143 | 7.229 |
Average coherence | −13.754 | −14.672 |
Average uniform_dist | 2.677 | 2.009 |
Average corpus_dist | 1.614 | 1.925 |
Average eff_num_words | 98.33 | 179.378 |
Average token-doc-diff | 0.001 | 0.007 |
Average rank_1_docs | 0.772 | 0.174 |
Average allocation_count | 0.85 | 0.16 |
Average exclusivity | 0.597 | 0.461 |
AlphaSum | 0.118 | 8.434 |
Beta | 0.127 | 0.642 |
BetaSum | 386.22 | 1039.923 |
References | Data Quality | Privacy Concern | Bias | Data Overload | Limited Scope | Technical Limitation | Complexity | Inaccuracies | Noise | Security | Legal Limitations | Data Relevance |
---|---|---|---|---|---|---|---|---|---|---|---|---|
[53] | X | |||||||||||
[54] | X | |||||||||||
[55] | X | |||||||||||
[56] | X | |||||||||||
[57] | X | |||||||||||
[58] | ||||||||||||
[59] | X | |||||||||||
[60] | X | X | X | X | ||||||||
[61] | X | |||||||||||
[62] | X | |||||||||||
[63] | X | |||||||||||
[64] | X | |||||||||||
[65] | X | |||||||||||
[66] | X | |||||||||||
[67] | X | |||||||||||
[68] | X | |||||||||||
[69] | X | |||||||||||
[70] | X | |||||||||||
[71] | X | |||||||||||
[72] | ||||||||||||
[73] | X | |||||||||||
[74] | X | |||||||||||
[75] | X | |||||||||||
[76] | ||||||||||||
[77] | X | |||||||||||
[78] | X | |||||||||||
[50] | X | |||||||||||
[79] | X | |||||||||||
[80] | X | |||||||||||
[81] | X | X | X | X | ||||||||
[82] | ||||||||||||
[83] | X | |||||||||||
[84] | X | |||||||||||
[85] | X | |||||||||||
[86] | X | |||||||||||
[87] | X | |||||||||||
[88] | X | |||||||||||
[51] | X |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sufi, F. Social Media Analytics on Russia–Ukraine Cyber War with Natural Language Processing: Perspectives and Challenges. Information 2023, 14, 485. https://doi.org/10.3390/info14090485