The mixed analysis extracts the main information from the five most cited documents in the deepfake domain (each with more than 200 citations) and from 10 randomly selected papers with between 30 and 200 citations. Additionally, an analysis of the connections between countries, authors, keywords, and affiliations is provided.
3.6.1. Review and Overview of the Five Most Cited Papers
One of the best ways to understand a domain is to analyze its most globally cited documents, since citations, together with each author's number of publications, are among the best performance metrics [34].
Table 11 contains the five most cited documents, together with some performance metrics, such as total citations, total citations per year, and normalized total citations.
In 2019, in the “California Law Review”, Chesney and Citron [35] published the most cited document globally at the time of this research. With 258 citations, an average of 43 citations per year, and 3.90 normalized total citations, the paper discusses the deepfake area, in which video and audio of people can be fabricated so that they appear to say and do things they never said or did, owing to advances in machine learning. Deepfake content also spreads rapidly. The scope of the analysis is to explain the main factors and risks of this technological evolution and how deepfakes can be detected; the authors created a questionnaire and collected information from various respondents, asking them about criminal penalties, civil liability, economic sanctions, the importance of technological solutions, and many other issues. Researchers at the University of Washington demonstrated a deepfake using a neural network tool that alters videos and makes speakers say things different from what they originally said, presenting a video of Barack Obama in which he appeared to discuss things he never talked about. The evolution of machine learning toward neural network methods significantly improved the realism of generated images and audio. Generative adversarial networks, also known as GANs, combine two neural networks that work simultaneously: the first draws on the dataset to produce a mimicked sample, and the second evaluates the success of the first. Audio and video deepfakes will have a much greater impact in the future, and social media platforms will facilitate their distribution and amplify their effects. At the moment, legal and policy frameworks are not yet adequate, and because of that, the risks are high. Deep learning tools are the technology behind deepfakes and other fake media, and forensic methods are among the main approaches for detecting fake images and videos.
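To make the GAN mechanism concrete, below is a minimal sketch of the generator–discriminator training loop described above, written in PyTorch. It is an illustration only, not code from any of the reviewed papers; the layer sizes and the `latent_dim`/`data_dim` values are arbitrary assumptions.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # hypothetical sizes, e.g., flattened 28x28 images

# Generator: maps random noise to a mimicked data sample.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
# Discriminator: scores how real a sample looks (one real/fake logit).
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch: torch.Tensor) -> None:
    b = real_batch.size(0)
    # Discriminator step: evaluate real samples against generated ones.
    fake = generator(torch.randn(b, latent_dim)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(b, 1)) + \
             bce(discriminator(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: produce samples the discriminator scores as real.
    fake = generator(torch.randn(b, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.randn(8, data_dim))  # one step on a dummy "real" batch
```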
In 2020, Verdoliva [38] published a paper in the “IEEE Journal of Selected Topics in Signal Processing”, which has 245 citations, an average of 49 citations per year, and normalized total citations of 6.27. The paper traces the evolution of techniques that manipulate content and can edit information in a very realistic way. The technology has many benefits, offering new features in advertising, the arts, film production, and video games, but it is also extremely dangerous for society. The software applications are free and very simple to use, allowing almost anyone to create fake images and videos. The risks are a serious threat: deepfakes can manipulate public opinion during elections, discredit or blackmail individuals, or facilitate fraud. It is necessary to find, as quickly as possible, tools that automatically detect false multimedia and reduce the spread of false information. The research focuses on presenting various methods for verifying the integrity of visual media and detecting manipulated videos and images. Substantial funding has gone into major research initiatives aimed at finding the best methods of deepfake detection, but it is difficult to forecast whether these efforts will ensure strong information security in the future. Numerous methods to combat deepfakes already exist, but they are not yet strong enough to identify all of them.
In 2020, in “Information Fusion”, Tolosana et al. [36] published one of the most cited documents, with 206 citations, an average of 41.20 citations per year, and 2.57 normalized total citations. The availability of free databases, together with the evolution of deep learning algorithms, especially generative adversarial networks, has enabled realistic fake images and videos with a serious impact on society. The research reviews techniques of facial image manipulation, including deepfake methods, and analyzes four of them: entire face synthesis, identity swapping, attribute manipulation, and expression swapping. For each one, the available public databases, the manipulation algorithms, and the metrics used to evaluate detection rates are presented. Entire face synthesis creates a non-existent face image; using generative adversarial networks, impressive results are achieved, producing high-quality facial images very similar to real pictures. Identity swapping replaces the face of a person in a video with the face of someone else; this type of deepfake is widely used in the film industry, but in numerous cases the approach serves malicious purposes. Attribute manipulation edits a face by changing some of its attributes, such as gender, age, or hair color; one of the best-known attribute manipulation applications is FaceApp, available on mobile, which allows users to add cosmetics, makeup, glasses, or hairstyles to pictures. Expression swapping edits the facial expression of a person, the best-known techniques being Face2Face and NeuralTextures; one of the most common examples is the video in which Mark Zuckerberg appeared to discuss things he never actually said.
In 2020, in “Social Media + Society”, Vaccari and Chadwick [37] published a paper with 204 citations, an average of 40.80 citations per year, and 5.22 normalized total citations. Artificial intelligence enables the mass creation of synthetic videos and images that are very close to reality. Political deepfakes are very popular on the internet, spreading disinformation, seriously affecting journalism, and degrading the quality of democracy. Deepfakes are one of the main vehicles of disinformation: they increase uncertainty, reduce trust in news on social media, and raise the cost of obtaining reliable information. The authors performed an experimental analysis of political deepfakes, and the results showed a trend of steadily declining trust in news on social media, with people less willing to cooperate in contexts where trust is low. This behavior can lead to less collaborative and responsible social media users, further reducing trust in news on social media. Citizens may also refuse to read the news altogether in order to reduce their stress levels. Public debate will become more and more difficult, as society must remain vigilant and able to spot every possible manipulation. Another problem is the trend toward illiberal policies that promise to clear the internet of deepfakes.
In 2021, in “ACM Computing Surveys”, Mirsky and Lee [5] published a paper with 203 citations, an average of 50.75 citations per year, and normalized total citations of 13.67. In 2018, generative deep learning technology began to be used for malicious applications and for spreading misinformation, and deepfakes have evolved significantly since then. The term “deepfake” combines “deep learning” and “fake”, denoting fake content created by artificial neural networks, most commonly videos and images. Deep learning methods also have useful and productive applications, such as reanimating historical figures or dubbing foreign films in a realistic manner. At the end of 2017, the first deepfake video appeared on Reddit, showing a celebrity in an adult movie; since then, the number of deepfakes has increased exponentially. In 2018, BuzzFeed presented a video of Barack Obama in which the former president talked about a subject he had never actually discussed, raising serious concerns over identity theft. Deepfakes are also used to clone voices: a voice can be cloned from as little as five seconds of audio, and with this technique a company CEO was scammed out of USD 250,000. Deep learning can also generate realistic human fingerprints, which could be used to unlock devices. It is important to understand that not all of this technology is dangerous and that its purpose is not always malicious; the key is to identify the best methods for detecting fake news.
Table 11. Five most cited papers.
| No. | Paper (First Author, Year, Journal, Reference) | Number of Authors | Region | Total Citations (TC) | Total Citations per Year (TCY) | Normalized TC (NTC) |
|---|---|---|---|---|---|---|
| 1 | Chesney, Bobby, 2019, California Law Review [35] | 2 | USA | 258 | 43.00 | 3.90 |
| 2 | Verdoliva, Luisa, 2020, IEEE Journal of Selected Topics in Signal Processing [38] | 1 | Italy | 245 | 49.00 | 6.27 |
| 3 | Tolosana, Ruben, 2020, Information Fusion [36] | 5 | Spain | 206 | 41.20 | 2.57 |
| 4 | Vaccari, Cristian, 2020, Social Media + Society [39] | 2 | UK | 204 | 40.80 | 5.22 |
| 5 | Mirsky, Yisroel, 2021, ACM Computing Surveys [5] | 2 | USA | 203 | 50.75 | 13.67 |
3.6.2. Review and Overview of 10 Randomly Selected Papers
In order to reflect the research areas of the other papers in the database, we randomly selected 10 papers with between 30 and 200 citations. We chose papers with at least 30 citations because they represent the top 8% of the papers in the database by citation count, and thus attract relatively high interest from the research community. Furthermore, as the database contains 47 papers with more than 30 citations, it would have been impractical to review all of them.
Table 12 contains the information regarding the 10 selected papers. Only two countries are represented, China and the USA, reflecting their interest in deepfakes, which pose a great risk to their economies and politics. The most cited of these documents has 69 citations, which is expected for a domain that is still new to the academic community; this document also has small values for total citations per year and normalized total citations. Similar to the analysis of Rana et al. [12], the most used database was FaceForensics++, and CNN models were the most commonly used.
Chesney and Citron [39] presented, in a philosophical manner, what a photo can express, noting that nowadays video and audio recordings are even more relevant. Audio and video recordings allow people to witness an event even if they were not physically present. Thanks to social media platforms, it is now much easier to capture, publish, or share a video or photo. However, people have to decide whether they trust every single person who has access to their posts; otherwise, they could face serious problems affecting their social lives. Chesney and Citron provided examples of what deepfakes can produce: a private conversation between an Israeli prime minister and a colleague discussing an imminent attack on Tehran, or an audio recording of an Iranian official describing a plan to attack a region in Iraq. All of these could be faked using various deep learning tools, making them almost impossible to distinguish from real recordings. One of the most used algorithms is the generative adversarial network, or GAN, which pairs two algorithms: the first creates content based on the source data, and the second tries to pick out the artificial content. Deepfakes also have numerous benefits, such as recreating the audio or video of historical figures, or even restoring speech to people who have lost their voices. Unfortunately, deepfakes are commonly used for darker purposes, such as placing people's faces into compromising situations without their consent, or trying to blackmail, intimidate, or sabotage them.
Bimber and de Zuniga [40] believe democracy could be harmed by fake information, with social media being one of the major channels for sharing deepfakes. The Xinhua news agency used AI to create realistic synthetic-video news based on Barack Obama's speech. In 2018, the Wall Street Journal became the first news company to flag the risks of fake video, creating a dedicated team to investigate whether photos or videos the company might publish are fake. Solutions to counter deepfakes could mainly be applied by social media companies, for example by authenticating users at account creation and requiring individuals to disclose their identity. Social media shapes public opinion about what people think, discuss, like, and want to do, and there are numerous methods for exploiting individuals' vulnerabilities. Anonymity and pseudonymity on social media facilitate deception and are among the main enablers of deepfakes. Deepfakes weaken democracies by manipulating messages. In 2018, the British Conservative Party used social media armies to promote its messages and manipulate opinion.
Yang et al. [41] describe convolutional neural network discriminators and discuss multi-scale texture differences, one of the key elements in face forgery detection, which significantly facilitates distinguishing fake photos from real ones with high accuracy. Because of technological advances, it has become practically impossible for humans to tell fake photos and videos from real ones. The authors created a new multi-scale texture difference model, MTD-Net, for face forgery detection. The model leverages central difference convolution (CDC) and atrous spatial pyramid pooling (ASPP). CDC merges pixel intensity information and gradient information, offering a stationary description of texture difference information. The analysis was performed on multiple databases, i.e., FaceForensics++, DeeperForensics-1.0, Celeb-DF, and DFDC. The experimental results showed great potential, with higher accuracy than existing methods, and demonstrated that MTD-Net is more robust to image distortion.
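As an illustration of the CDC idea, the sketch below blends a vanilla 3×3 convolution (pixel intensity information) with a central-difference term (gradient information), following the commonly published formulation of the operator. This is not the authors' MTD-Net code; the `theta` value and layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Central difference convolution: vanilla conv minus theta * gradient term."""
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta  # balances intensity vs. gradient information

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)  # intensity term: ordinary 3x3 convolution
        # Gradient term: convolving x with the kernel's spatial sum is
        # equivalent to applying the kernel to (neighbor - center) differences.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_diff = F.conv2d(x, kernel_sum, padding=0)
        return out - self.theta * out_diff

feat = CentralDifferenceConv2d(3, 32)(torch.randn(1, 3, 224, 224))
```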
Fletcher [42] presented the historical evolution of deepfakes. The ML tools that create deepfakes appeared in late 2017, and it took governments and social media companies a few months to begin understanding their possible impact. Face-swapping effects can be achieved easily with AI, thanks to various applications that appeared starting in January 2018. Deep learning is a specialized form of ML whose algorithms operate as neural nets, loosely modeled on biological neurons. Deep neural nets could be trained to provide correct medical diagnoses, design more efficient medicines, or completely change urban development. FakeApp is a desktop application that makes face swapping in videos extremely easy, so that even users without coding knowledge can use it: they simply upload two videos, and within a few hours the face-swapping process is complete. The program Lyrebird uses deep learning techniques to create fake speeches in famous voices, such as those of Donald Trump or Barack Obama. Numerous research institutions make their AI technology available and publish open-source software libraries; one of the best-known examples is Google's TensorFlow, which opens up innovation to any programmer. The evolution of ML algorithms is visible in individuals' daily activities, mainly on social media platforms, where users receive suggested ads or videos to watch.
Guo et al. [43] described how face image manipulation (FIM) techniques such as Face2Face and Deepfake have spread fake images across the internet, creating serious problems and concerns for society. Researchers have made significant progress with fake face detection algorithms, but much remains to be improved, since FIM is becoming more and more sophisticated. CNNs learn the content representation of images but have limitations, capturing only part of the manipulation traces. The adaptive manipulation trace extraction network (AMTEN) is a pre-processing module that suppresses image content and highlights manipulation traces, using its convolutional layers to predict the manipulation traces of images. AMTENnet, a fake face detector, was built by integrating AMTEN with a CNN. Existing CNN models fall into three categories: stacking standard CNN modules for a particular type of fake image, using hand-crafted residual features in different models, and improving the form of the convolutional layer to force the CNN to learn features from tampering traces. The analysis showed good results for AMTEN, with AMTENnet achieving an average accuracy of 98.52%, outperforming state-of-the-art works. When the dataset contains face images with unknown post-processing operations, the algorithm still achieved an average accuracy of 95.17%. Even though the forensics cases were simulated using post-processing methods, they differ significantly from real cases, in which AI-generated images circulate across social media platforms.
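The trace-extraction idea can be sketched as follows. This is an assumption based on the summary above (a shallow convolution whose output, minus the input, exposes manipulation residuals while suppressing content), not the authors' released implementation; the `TraceExtractor` name and layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class TraceExtractor(nn.Module):
    """Suppress image content and keep low-level manipulation residuals."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual map: predicted content minus the input image, so smooth
        # content cancels out and tampering traces remain.
        return self.conv(x) - x

# The residual map (not the raw image) is what a downstream CNN detector consumes.
traces = TraceExtractor()(torch.randn(1, 3, 128, 128))
```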
Yang et al. [44] pointed out the risks of fake videos on the internet, which affect individuals' activities and relationships, pollute the web environment, stir up public opinion, and, in some cases, can become a national security threat. Most existing algorithms are based on convolutional neural networks, which learn the feature differences between real and fake frames. The purpose of the analysis is to create a multi-scale self-texture attention generative network (MSTA-Net) able to track potential texture traces in images and eliminate the interference of deepfake post-processing. First, a generator performs encoding–decoding disassembly to visualize the traces, and the generated trace images are then merged with the originals as input to a ResNet-based classifier. The second component is the self-texture attention (STA) mechanism, which operates on the skip connection between the encoder and decoder. The final step proposes a loss function known as Prob-tuple loss, restricted by classification probability, to amend the generation of forgery traces. Several experiments were performed to check the model's performance, showing that the algorithm performs well on the FaceForensics++, Celeb-DF, DeeperForensics, and DFDC databases and demonstrating its feasibility.
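The overall pipeline shape can be sketched as follows. Everything here is an assumption for illustration: a generic encoder–decoder stands in for the trace generator, torchvision's `resnet18` stands in for the classifier, "merging" is interpreted as channel concatenation, and the STA mechanism and Prob-tuple loss are omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Stand-in encoder-decoder that renders forgery-trace images from frames.
trace_generator = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
# ResNet classifier adapted to accept the 6-channel merged input.
classifier = resnet18(num_classes=2)
classifier.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(4, 3, 224, 224)                # a batch of face frames
traces = trace_generator(x)                     # visualized forgery traces
logits = classifier(torch.cat([x, traces], 1))  # merged input -> real/fake logits
```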
Rini [45] provided an explanation of deepfakes and their effects and introduced the idea of an epistemic backstop. Video and image recordings act as an epistemic backstop: they are widely available and regulate testimonial practices. Deepfakes could undermine democratic information sharing and debate once key figures can be deepfaked. Unfortunately, deepfakes have spread beyond the journalistic domain into computer science, legal, and even academic domains. Recordings can facilitate the correction of errors in past testimony and the regulation of ongoing practices. In the summer of 2019, Deeptrace, a digital security company, tracked over 15,000 deepfakes on the web, almost double the number from earlier that year, of which 96% were videos. The most dangerous applications of deepfakes are in politics. A few journalistic outlets published fake political news in which, for instance, Barack Obama insulted Donald Trump. In May 2018, the Flemish Socialist Party published a deepfake video in which Donald Trump urged Belgium to withdraw from environmental treaties; the party later explained that the purpose of the video was merely to raise interest in the subject, not to fool anyone. In January 2018, John Winseman described what may have been the first deepfake attempt with a political purpose, concerning gay rights in Russia.
Yang et al. [46] consider fake detection an acute problem and investigated the texture differences between real and fake images. Technology has improved our lives significantly, but it also brings threats, one of which is cybercrime. Deepfakes fabricate fake events that are shared on the internet, causing far-reaching problems and chaos. Using forensic methods, the authors developed a deepfake detection algorithm that compares real and fake images in terms of image saliency, extracting the facial texture differences. ResNet18, a classification network, was trained to identify the differences between images and then tested on distinguishing real from fake face images, with its accuracy evaluated. The process is divided into two parts: the first trains on full images, while the second uses only face images, which are added to the training dataset and then fed to the ResNet18 model. The evaluation was performed on 14,000 images and 140 videos, comprising 2800 real images and 11,200 fake images. The trained Xception model achieved an accuracy of approximately 0.52, MesoNet 0.72, and Cozzolino only 0.34, while the Guided model created by the researchers reached 0.8.
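A minimal sketch of this kind of training setup is shown below, assuming a stock torchvision ResNet18 as the binary classifier; the paper's saliency- and texture-guided components are not reproduced, and `full_images`, `face_crops`, and `labels` are hypothetical tensors standing in for the datasets.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=2)          # binary classifier: real vs. fake
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

full_images = torch.randn(8, 3, 224, 224)  # stage 1: full frames
face_crops = torch.randn(8, 3, 224, 224)   # stage 2: cropped face regions
labels = torch.randint(0, 2, (16,))        # 0 = real, 1 = fake

# Face crops are added to the full-image training set, as the summary describes.
batch = torch.cat([full_images, face_crops], dim=0)
loss = criterion(model(batch), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```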
Yu et al. [47] tried to improve face video forgery detection, in particular its generalization. Numerous face forgery algorithms already exist, and the traces they leave in videos share similarities. The purpose of the research is to achieve better generalization in detecting unknown forgery methods. First, a model called the Specific Forgery Feature Extractor (SFFExtractor) was trained separately for each given forgery algorithm; using a U-net structure with various possible losses, the SFFExtractors were tested on detecting their corresponding forgery methods. In the next step, another model, the Common Forgery Feature Extractor (CFFExtractor), was trained on the outputs of the SFFExtractors, exploring the similarities between forgery methods. The results on FaceForensics++ showed the great success of the SFFExtractors in face forgery detection. The CFFExtractor was also run on multiple databases, and the results showed that commonality learning is a good approach for improving generalization and developing an effective strategy.
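A conceptual sketch of the two-stage scheme is given below. It rests on assumptions: a toy convolutional stack stands in for the U-net extractors, and training the common extractor against the averaged outputs of the specific extractors is one plausible reading of "taking into consideration the results of SFFExtractors", not the paper's actual loss or schedule.

```python
import torch
import torch.nn as nn

def make_extractor() -> nn.Module:
    # Toy stand-in for a U-net extractor: input image -> forgery-trace map.
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 3, padding=1))

forgery_methods = ["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]
# Stage 1: one specific extractor (SFFExtractor) per known forgery method.
sff_extractors = {m: make_extractor() for m in forgery_methods}
# Stage 2: a common extractor (CFFExtractor) trained to capture what the
# specific extractors share, aiming to generalize to unseen methods.
cff_extractor = make_extractor()

x = torch.randn(2, 3, 128, 128)
target = torch.stack([sff_extractors[m](x) for m in forgery_methods]).mean(0)
loss = nn.functional.mse_loss(cff_extractor(x), target.detach())
```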
Johnson [48] explores the impact of AI on strategic decision-making and stability, presenting the risks and the adaptability of military forces to the latest technology. An advantage of AI is that it replaces humans in decision-making, removing influences such as empathy, creativity, intuition, or other external factors. A generative adversarial network (GAN) could produce deepfakes capable of provoking a crisis between two or more nuclear states simply by generating an image or video of a military leader issuing fake orders, sowing tension and confusion. Deepfakes are already a tool of disinformation and deception, and during a crisis it is very difficult to discern the attacker's intent. China's fear of a US attack has led it to prioritize false-negative scenarios over false-positive ones: a false negative means misidentifying a nuclear weapon as non-nuclear, while a false positive means misidentifying a non-nuclear weapon as nuclear. Technology provides not just deepfakes but also bots and fake news, which exploit human psychology by creating false narratives, intensifying false alarms, and attempting to destabilize.
Table 13 presents the 10 selected documents with more than 30 citations, together with the titles of the papers, the data used in the research, and the scope of each document.
Considering the entire database, a thematic map was generated from the titles of the papers using bigram word extraction. The results of this approach are visualized in Figure 14.
The thematic map is divided into four quadrants representing niche, motor, emerging or declining, and basic themes. Within the basic themes, several key areas can be identified. Some focus on various types of deepfake detection, such as video deepfake detection (e.g., “video detection”, “deepfake video/videos” bigrams) and image forgery detection (e.g., “forgery detection”, “image forgery” bigrams). Others discuss the methods used to address deepfakes, including machine learning, deep learning, artificial intelligence, neural networks, and convolutional neural networks (e.g., “machine learning”, “deep learning”, “artificial intelligence”, “neural network”, “convolutional neural” bigrams).
Motor themes prominently feature generative adversarial networks, identified through bigrams such as “generative adversarial” and “adversarial networks”, positioned at the borderline between motor and basic themes. Additionally, manipulation detection is highlighted as an emerging theme.