1. Introduction
In recent years, deep learning models have been extensively utilized across a diverse array of industries [1,2]. These models enable the rapid and efficient analysis of substantial quantities of data, thereby significantly improving convenience in both personal and professional contexts. Furthermore, advancements in computer technology have rendered social interactions more accessible, encouraging a growing number of individuals to share their photographs on social networks. Nevertheless, web scraping technologies have developed just as quickly. Traditional scraping techniques, when combined with deep learning models, can analyze and extract images or other information from social media with a high degree of accuracy and efficiency [3,4]. Regrettably, certain malevolent actors exploit these technologies to unlawfully acquire sensitive user information, including photographs, preferences, geographical locations, and social connections. This unauthorized and malevolent data acquisition constitutes a significant threat to user privacy, making the exploration of defense mechanisms against such intrusions increasingly urgent [5].
Szegedy et al. were the first to introduce the concept of adversarial examples (AEs), positing that these examples are generated by adding specific adversarial perturbations to the original images. These AEs can mislead deep learning models, resulting in incorrect outcomes [6]. In practical applications, AEs constitute a significant security threat to deep learning systems. On the other hand, AEs also have advantageous aspects; for instance, social platforms can apply minor adversarial perturbations during user photo uploads to deter attackers attempting to exploit deep learning models for identification. This methodology can also be broadly applied in the domain of privacy protection [7,8] and in counteracting malicious algorithms [9,10,11].
Based on different attack conditions, adversarial attacks can typically be categorized into two types: white-box attacks and black-box attacks. White-box attacks allow complete access to the information of the deep learning model, including network architecture and weights. Such attacks are typically rapid and exhibit a high success rate. Conversely, black-box attacks typically restrict access to the outputs produced by the deep learning model or the corresponding probability scores. In black-box settings, adversarial attack algorithms are unable to access the internal details of the deep learning model, leading to comparatively slower attacks and reduced success rates. However, it is crucial to acknowledge that black-box attack conditions more accurately reflect real-world scenarios.
The Fast Gradient Sign Method (FGSM), proposed by Goodfellow et al., represents a classic white-box attack method [12]. This method utilizes the loss function and the sign function to rapidly generate adversarial perturbations. Following this, several researchers have performed further optimizations based on FGSM [13,14]. Furthermore, certain researchers have employed this classic white-box attack method to investigate the robustness of deep learning models [15,16].
In contrast to the gradient-based methods employed in white-box attacks, black-box attack methods resemble a simulated optimization of the gradient of the loss function. The One-Pixel attack, proposed by Su et al., generates adversarial examples (AEs) by modifying a single pixel under black-box conditions, utilizing a differential evolution algorithm to optimize the solution [17]. Similarly, the Scratch attack, proposed by Jere et al. [18], generates distortions resembling scratches using a differential evolution algorithm for adversarial attack purposes. Ran et al. proposed a black-box attack method based on image quality assessment [19], aimed at generating adversarial samples with high image quality.
However, although the adversarial perturbations generated by these classic adversarial attack methods are typically minor, the degradation of image data remains unavoidable. Even minor perturbations can cause failures in specific computer vision tasks, particularly in critical areas such as medical imaging, military applications, and digital forensics [20,21]. Data hiding technologies embed secret data into a cover medium [22], while reversible data hiding (RDH) enables the exact restoration of the original image with the aid of embedded auxiliary information. Reversible adversarial example (RAE) generation integrates AE generation with RDH technologies [23], inheriting the functions of AE generation while enabling the exact restoration of the original images.
Liu et al. [24] proposed two methods for generating reversible adversarial examples (RAEs) by combining RDH technology with several classic AE generation techniques. However, the significant distortions induced by the RDH algorithm when embedding auxiliary information within AEs can cause a loss of adversarial effectiveness, ultimately culminating in attack failure. Yin et al. [25] proposed a method for generating RAEs using reversible image transformations, which avoids the failure of AEs due to the embedding of auxiliary information in RDH. Nonetheless, this method is not entirely reversible, and there may still be deviations in image recovery.
Zhang et al. [26] proposed a partially reversible adversarial attack method. This method integrates adversarial attacks and restoration models into a unified task, utilizing a dimensionality reduction technique to optimize the distribution of adversarial perturbations, thereby reducing restoration error while maximizing attack capability. Cao et al. proposed a method for generating RAEs, known as W-RAE [27], which transforms the task of generating RAEs into an image steganography task, accomplished by embedding a specific image watermark. However, these methods can only approximately restore, rather than exactly restore, the original images.
This paper proposes a novel approach for RAE generation based on evolutionary algorithms to generate minimal adversarial perturbations. The primary contributions are summarized as follows.
(1) We propose an RAE generation method that achieves zero-bit error, inheriting the functions of AEs while enabling the exact restoration of the original images. This facilitates recognition control in computer vision and restricts the recognition capabilities of unauthorized systems.
(2) By introducing dual-color space detection of perturbed pixels (D-CSDPP), perturbed pixel locations can be automatically detected from the differences between each pixel and its adjacent pixels. The location information therefore does not need to be embedded as auxiliary information, which saves embedding capacity and thereby improves both the image quality and the attack success rate (ASR).
(3) Experimental validation demonstrates that the RAEs generated by the proposed method exhibit high image quality with respect to the original images and achieve a greater adversarial preservation rate (APR) relative to state-of-the-art (SOTA) methods.
2. Preliminary
This section provides a brief overview of the one-pixel attack and the differential evolution algorithm, which are employed in our method.
2.1. One-Pixel Attack
Typically, the modifications to AEs involve multiple perturbations that jointly alter the overall structure of the image, causing deep learning models to make incorrect judgments. In contrast, a one-pixel attack modifies the image using only a single perturbation.
AE generation typically involves accumulating multiple perturbations until certain conditions are satisfied. In a one-pixel attack, however, the problem is simplified to finding the optimal pixel to modify within the constraints of the entire image. By focusing on a small number of pixels, the AE can achieve the desired adversarial task without constraining the modification intensity.
2.2. Differential Evolution
Differential evolution solves complex multimodal optimization problems. This algorithm relies on the variation within a population, and is particularly effective in black-box conditions compared with some gradient-based methods for white-box scenarios. Specifically, in each iteration, offspring individuals are generated from their parents, and then all of the offspring individuals and their parents are evaluated together; the individuals with higher likelihoods of survival (higher fitness values) are selected and preserved. This approach allows both parents and offspring individuals to pursue the goal of improving fitness, and maintains the diversity within the population.
Due to the absence of gradient-based iterations, the differential evolution algorithm does not require the target function to be differentiable. Consequently, it has a wider scope of application than gradient-based approaches. It has two main advantages for generating AEs, as follows.
(1) Enhanced global optimization.
Unlike gradient descent or greedy algorithms, which are constrained by the objective function and may converge to local optima, the differential evolution algorithm is less affected by such limitations and has a higher likelihood of finding the global optimum, especially in highly challenging problems.
(2) Fewer information requirements.
The differential evolution algorithm does not rely on prior knowledge of the model for optimization. This aspect is particularly important in the context of generating AEs. Firstly, some models are not differentiable, rendering gradient-based methods inapplicable. Secondly, certain internal information is inaccessible during the optimization process.
In AE generation based on a one-pixel attack, a large number of iterations is required to search for a single perturbed pixel that impacts the image structure. The differential evolution algorithm relieves the limitations of local optima and efficiently identifies perturbed pixels. Moreover, in many real scenarios, we cannot access the internal details of models. Employing the differential evolution algorithm to generate AEs can maintain a high ASR, even against black-box models. Based on these advantages, the proposed method utilizes the differential evolution algorithm.
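To make the population mechanics above concrete, the following is a minimal sketch of differential evolution, assuming the pooled parent-plus-offspring elitist selection described in this section (classic DE's per-index crossover step is omitted). The toy objective and all parameter values are illustrative, not the paper's settings.

```python
import random

random.seed(0)  # for reproducibility of this sketch

def differential_evolution(fitness, bounds, pop_size=40, F=0.5, iters=100):
    """Minimal differential evolution: mutation plus pooled elitist selection.

    fitness: maps a candidate vector to a score (higher is better).
    bounds:  list of (low, high) pairs, one per dimension.
    """
    dim = len(bounds)
    clip = lambda v: [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        children = []
        for _ in range(pop_size):
            # Offspring = parent r1 plus a scaled difference of parents r2, r3.
            r1, r2, r3 = random.sample(range(pop_size), 3)
            children.append(clip([pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                                  for d in range(dim)]))
        # Parents and offspring compete together; the fittest pop_size survive.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Toy usage: maximize -(x - 3)^2 - (y + 1)^2, whose optimum is at (3, -1).
best = differential_evolution(lambda v: -(v[0] - 3) ** 2 - (v[1] + 1) ** 2,
                              bounds=[(-10, 10), (-10, 10)])
```

Note that the fitness function is only ever called, never differentiated, which is exactly why this search strategy carries over to black-box models.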
3. The Proposed Method
The framework, illustrated in Figure 1, consists of two phases: the generation phase and the restoration phase. The processes represented by blue arrows indicate RAE generation, while those represented by red arrows indicate the restoration of the original image from its RAE.
In the generation phase, adversarial perturbations are generated using a differential evolution algorithm and added to the original image to obtain the AE. The original RGB values of the perturbed pixels are recorded and treated as auxiliary information. This auxiliary information is then embedded into the AE using differential expanded histogram shifting (DEHS) to generate an RAE.
In the restoration phase, the embedded auxiliary information is extracted from the RAE using DEHS, thereby enabling the recovery of the AE. The D-CSDPP is employed to detect the locations x and y of the perturbed pixels in the AE. By utilizing the detected location information along with the auxiliary information, the AE can be restored to the original image with zero-bit error.
3.1. Adversarial Example Generation
The adversarial perturbations are encoded as vectors (candidate solutions) and optimized by the differential evolution algorithm. Each candidate solution contains a predefined number of perturbed pixels; for example, in a one-pixel attack, the number of perturbed pixels is 1. Each perturbed pixel has the modification values (one value in a gray image, or three values of R, G, and B in a color image) and the coordinates (x, y). Random numbers obeying a normal distribution are used as the initial R, G, and B values of the perturbed pixels, so each solution can be represented as a vector (x, y, r, g, b). The initial population has 400 candidate solutions. In each iteration, an additional 400 offspring candidate solutions are produced. The differential evolution formula is

s_i(t + 1) = s_{r1}(t) + F · (s_{r2}(t) − s_{r3}(t)),    (1)

where s_i represents a candidate solution, i.e., the i-th perturbed pixel; t denotes the iteration count; and F is a scaling coefficient with a preset value of 0.5, which restricts the magnitude of the differential term s_{r2}(t) − s_{r3}(t). The values r1, r2, and r3 are three random indices of the selected parent individuals that produce the offspring individual. Once the offspring individuals are produced, all of the offspring individuals and their parents are evaluated together, and the 400 individuals with higher likelihoods of survival (higher fitness values) are selected and preserved. A small population size hinders the objective function from finding the ideal optimal solution, while a large size increases the computation time. A size of 400 is sufficient to find good solutions for most images at an acceptable computational cost.
The image sizes of some datasets, e.g., ImageNet, are large. The method can therefore be extended to multi-pixel attacks, where the number of perturbed pixels n is larger than 1. This does not imply that altering just one pixel cannot execute an attack: if the computation budget is sufficient, i.e., the epoch number is large, even perturbing a single pixel can lead to a successful attack. In the case of an n-pixel attack, RAE generation is fast, i.e., the epoch number is small, and each solution can be represented as a vector (x_1, y_1, r_1, g_1, b_1, …, x_n, y_n, r_n, g_n, b_n). Algorithm 1 presents the workflow for generating adversarial perturbations using differential evolution.
Algorithm 1 Differential evolution for adversarial perturbation.
1: Input: original image, black-box model, iteration number T
2: Initialize: population P of 400 random candidate solutions
3: for t = 1 to T do
4:   # Mutation Phase
5:   for each solution in P do
6:     Randomly select unique r1, r2, r3 from P
7:     Produce an offspring s_{r1} + F · (s_{r2} − s_{r3})
8:   end for
9:   # Crossover and Selection
10:  Pool the 400 parents and the 400 offspring
11:  Evaluate fitness of all solutions
12:  Select top 400 solutions with highest fitness
13:  Update P
14: end for
15: Return: best solution
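The fitness evaluated in Algorithm 1 only needs the model's output probabilities. Below is a sketch of how a one-pixel candidate (x, y, r, g, b) could be applied and scored; the `toy_model` stand-in classifier, `apply_candidate`, and `fitness` are illustrative names, not functions from the paper.

```python
import numpy as np

def apply_candidate(image, candidate):
    """Apply a one-pixel perturbation encoded as a vector (x, y, r, g, b)."""
    x, y, r, g, b = (int(round(v)) for v in candidate)
    perturbed = image.copy()
    perturbed[x, y] = (r, g, b)
    return perturbed

def fitness(candidate, image, predict_proba, true_label):
    """Untargeted-attack fitness: lower true-class confidence is fitter.

    predict_proba stands in for the black-box model; it maps an image to a
    vector of class probabilities (only outputs are needed, no gradients).
    """
    probs = predict_proba(apply_candidate(image, candidate))
    return 1.0 - probs[true_label]

# Toy stand-in classifier: "class 0" confidence grows with mean brightness.
def toy_model(img):
    p0 = img.mean() / 255.0
    return np.array([p0, 1.0 - p0])

img = np.full((32, 32, 3), 200, dtype=np.uint8)
# Darkening a single pixel lowers class-0 confidence, so fitness rises.
score = fitness((5, 7, 0, 0, 0), img, toy_model, true_label=0)
```

Working on a copy of the image keeps every candidate evaluation independent, which matters because hundreds of candidates are scored per iteration.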
3.2. Reversible Adversarial Example Generation
One-pixel AE generation and multi-pixel AE generation share the same steps of RAE generation and original image restoration. The difference is that the lengths of the vectors, which represent the individuals, are 1 × 5 for one-pixel attacks and n × 5 for multi-pixel attacks, respectively. For simplification, one-pixel AE generation is specified, which consists of two steps: auxiliary information encoding and data embedding.
3.2.1. Auxiliary Information Encoding
Each pixel can be represented as a vector (x, y, r, g, b), where (x, y) represents the coordinates. If the image size is 32 × 32, then x, y ∈ [0, 31]. The values r, g, and b represent the values in the red, green, and blue channels, respectively, with r, g, b ∈ [0, 255].
As shown in Figure 2, after one-pixel AE generation, the perturbed pixel is represented as a vector (x, y, r′, g′, b′). To restore the original image, we have to record and save the pixel values at the same position in the original image, i.e., (r, g, b), where r, g, b ∈ [0, 255]. Each value is converted from decimal to a fixed-length bit string: three 8-bit codes are used for r, g, and b, respectively, giving a 24-bit code in total.
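The fixed-length encoding above can be sketched in a few lines; `encode_rgb` and `decode_rgb` are illustrative helper names, not identifiers from the paper.

```python
def encode_rgb(r, g, b):
    """Concatenate three fixed-length 8-bit codes into one 24-bit string."""
    return f"{r:08b}{g:08b}{b:08b}"

def decode_rgb(bits):
    """Split a 24-bit string back into the three original channel values."""
    assert len(bits) == 24
    return tuple(int(bits[i:i + 8], 2) for i in (0, 8, 16))

bits = encode_rgb(173, 5, 240)  # "10101101" + "00000101" + "11110000"
```

The fixed length is what makes decoding unambiguous: no separators need to be embedded alongside the payload.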
3.2.2. Data Embedding
In each channel, the difference in the AE is

D(x, y) = A(x, y + 1) − A(x, y),    (2)

where A represents the AE image, and A(x, y) represents the pixel value at coordinates (x, y) in the image. The image size is W × H. The three difference matrices in the three channels are D_i, where i ∈ {R, G, B} denotes the red, green, and blue channels, respectively. The differential computation process is illustrated in Figure 3.
The histograms of D_R, D_G, and D_B are generated. Figure 4 shows a histogram. The highest bar in the histogram represents the most frequent difference value (usually 0) and is denoted as P, as shown in Figure 4a. The bars to the right of the highest bar are shifted to the right by one unit, as shown in Figure 4b. The binary information can be embedded into the emptied bin, as shown in Figure 4c.
The corresponding operation on D_i is to increase the values that are greater than P, generating D′_i as

D′_i(x, y) = D_i(x, y) + 1, if D_i(x, y) > P,
D′_i(x, y) = D_i(x, y),     otherwise.    (3)

The first pixel with the value of P is found. If the embedded bit is 0, the value of the current P-pixel is unchanged; if the embedded bit is 1, the value of the current P-pixel is increased by 1. The subsequent pixels after the P-pixel in the same row are also increased by 1 to maintain the differences between adjacent pixels. The next P-pixel is then found and processed according to the next embedded bit. After data embedding, the difference matrices in the three channels are D″_R, D″_G, and D″_B. Because the operations in the three channels are the same, the subscript i is omitted in the following equations. D″ can be obtained according to Equation (4), where k represents the value of the bit that is currently being embedded:

D″(x, y) = D′(x, y) + k, if D′(x, y) = P,
D″(x, y) = D′(x, y),     otherwise.    (4)

The modified differences in the three channels are applied to the AE A, and we obtain the RAE A* as

A*(x, 0) = A(x, 0),  A*(x, y + 1) = A*(x, y) + D″(x, y).    (5)
We take an instance to illustrate how the data are embedded into a difference histogram. In Figure 5, a small matrix represents a block in one channel of an AE A. Its difference matrix D is computed according to Equation (2). Here, P is 0, and the P-pixels are labeled in yellow. D′ represents the values after the histogram shifting in Figure 4, and D″ is obtained after the data “1010” are embedded. Finally, the RAE A* is obtained according to Equation (5). The pixels changed by data embedding are labeled in red.
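The row-wise embedding and its inverse can be sketched as follows. This is a simplified single-row illustration of the difference-histogram-shifting idea, assuming P = 0 and ignoring pixel overflow handling; `embed_row` and `extract_row` are illustrative names, not the paper's DEHS implementation.

```python
import numpy as np

def embed_row(row, bits, peak=0):
    """Embed bits into one image row via difference-histogram shifting.

    Differences d[j] = row[j+1] - row[j] are scanned left to right. A
    difference equal to `peak` carries one bit (0: keep, 1: add 1); every
    difference greater than `peak` is shifted right by one to free the bin.
    Pixel overflow/underflow handling is omitted for brevity.
    """
    row = row.astype(np.int64)
    d = np.diff(row)
    k = 0
    for j in range(len(d)):
        if d[j] > peak:
            d[j] += 1
        elif d[j] == peak and k < len(bits):
            d[j] += bits[k]
            k += 1
    assert k == len(bits), "row does not have enough embedding capacity"
    return np.concatenate([row[:1], row[0] + np.cumsum(d)])

def extract_row(marked, n_bits, peak=0):
    """Inverse of embed_row: recover the embedded bits and the original row."""
    marked = marked.astype(np.int64)
    d = np.diff(marked)
    bits = []
    for j in range(len(d)):
        if len(bits) < n_bits and d[j] in (peak, peak + 1):
            bits.append(int(d[j] - peak))
            d[j] = peak
        elif d[j] > peak:
            d[j] -= 1
    return bits, np.concatenate([marked[:1], marked[0] + np.cumsum(d)])

row = np.array([100, 100, 101, 101, 100, 103])
marked = embed_row(row, [1, 0])
recovered_bits, restored = extract_row(marked, 2)
```

Because shifting and embedding are both invertible in the difference domain, the round trip reproduces the input row bit for bit, which is the property the zero-bit-error claim rests on.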
3.3. Original Image Restoration
Restoring the RAE back to the original image involves two steps. The first step is perturbed pixel detection, and the second step is data extraction and image restoration.
3.3.1. Dual-Color Space Detection of Perturbed Pixels
When generating the RAEs, only the original values of the perturbed pixels are recorded. Our method can automatically detect the perturbed pixel in an RAE, so the perturbed pixel location is discarded rather than recorded. This saves embedding capacity for the auxiliary information and helps images whose embedding capacity would otherwise be insufficient.
The differences between the perturbed pixel and its adjacent pixels are remarkable, so we leverage these differences to detect the perturbed pixel. In Figure 6, the RAE A* is split into six channels: three based on the RGB color space (A*_R, A*_G, A*_B) and three based on the HSV color space (A*_H, A*_S, A*_V). In fact, using only the RGB channels is sufficient to successfully detect the perturbed pixel in the majority of RAEs. However, by incorporating the HSV channels, the detection success rate can be increased to 99%. The feasibility of this approach is also validated through experiments in the following sections.
A high-pass filter, the Laplacian operator, is applied to each channel to generate six response matrices: L_R, L_G, L_B and L_H, L_S, L_V. In Figure 6, the matrices have been normalized for visualization; the actual differences between the perturbed pixel and its adjacent pixels are larger than the visualization suggests. Next, the generated response matrices are subtracted from their respective channels, and the absolute values of the differences are taken, giving E_c = |A*_c − L_c| for c ∈ {R, G, B} and c ∈ {H, S, V}.
Finally, the two kinds of difference matrices are summed to obtain S_RGB and S_HSV according to Equations (6) and (7), using preset channel weights:

S_RGB = w_R · E_R + w_G · E_G + w_B · E_B,    (6)
S_HSV = w_H · E_H + w_S · E_S + w_V · E_V.    (7)

The overall difference matrix S is obtained according to Equation (8) with a preset weight λ:

S = S_RGB + λ · S_HSV.    (8)

The location of the pixel with the highest value in S is (x̂, ŷ), which is labeled by the red box and considered as the location of the detected perturbed pixel in A*. The location of the perturbed pixel in the original image is (x, y). Generally, (x̂, ŷ) = (x, y). The workflow of the D-CSDPP is shown in Algorithm 2.
Algorithm 2 Dual-color space detection of perturbed pixels.
1: Input: RAE image A*
2: # Split Image into Channels
3: A*_R, A*_G, A*_B ← RGB channels of A*
4: A*_H, A*_S, A*_V ← HSV channels of A*
5: # Apply High-Pass Laplacian Filter
6: for each channel do
7:   # Generate high-pass response matrices
8:   Create L_R, L_G, L_B for RGB
9:   Create L_H, L_S, L_V for HSV
10: end for
11: # Calculate Difference Matrices
12: E_c = |A*_c − L_c| for RGB
13: E_c = |A*_c − L_c| for HSV
14: # Combine Difference Matrices
15: S_RGB ← w_R · E_R + w_G · E_G + w_B · E_B
16: S_HSV ← w_H · E_H + w_S · E_S + w_V · E_V
17: S ← S_RGB + λ · S_HSV
18: # Locate Perturbed Pixel
19: Find (x̂, ŷ) with max value in S
20: Output: Detected Perturbed Pixel Location (x̂, ŷ)
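The detection pipeline of Algorithm 2 can be sketched as follows. This is a simplified illustration, assuming all channel weights and λ equal 1 (the tuned values are not reproduced here), a zero-padded 4-neighbour Laplacian, and, for the toy example, the same array standing in for both color spaces (a real implementation would convert RGB to HSV first).

```python
import numpy as np

def laplacian(channel):
    """4-neighbour Laplacian high-pass response, with zero padding."""
    c = channel.astype(np.float64)
    p = np.pad(c, 1)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * c

def detect_perturbed_pixel(rgb, hsv, lam=1.0):
    """Score every pixel by |channel - Laplacian response| in both spaces.

    rgb and hsv are H x W x 3 representations of the same RAE in the two
    colour spaces; the pixel with the highest combined score is returned.
    """
    score = np.zeros(rgb.shape[:2])
    for img, weight in ((rgb, 1.0), (hsv, lam)):
        for ch in range(3):
            channel = img[..., ch].astype(np.float64)
            score += weight * np.abs(channel - laplacian(channel))
    return np.unravel_index(np.argmax(score), score.shape)

# A flat block with one outlier pixel: detection should return its location.
rae = np.full((8, 8, 3), 10, dtype=np.uint8)
rae[3, 4] = (200, 200, 200)
loc = detect_perturbed_pixel(rae, rae)  # HSV conversion omitted in this toy
```

An isolated outlier produces a large negative Laplacian response, so its |channel − response| score dwarfs that of smooth regions, which is what makes the location recoverable without embedding it.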
3.3.2. Data Extraction and Image Restoration
Data extraction and data embedding are inverse operations. As shown in Figure 7, first, the RAE A* is used to generate the difference matrix D″ in the same way as in embedding. Next, D″ is scanned in the same order as in embedding. If a pixel with the value of P is found, the extracted bit is “0”, while if a pixel with the value of P + 1 is found, the extracted bit is “1”. In this way, all the embedded data “1010” are extracted. Finally, we restore all of the differences in the matrix that are equal to P + 1 back to P, and shift the differences larger than P + 1 to the left by one unit. In this way, we obtain the matrix D. The AE A is recovered from the matrix D and the RAE A* as

A(x, 0) = A*(x, 0),  A(x, y + 1) = A(x, y) + D(x, y).    (9)

The location of the perturbed pixel is detected by D-CSDPP, and its original values are obtained from the extracted auxiliary information. Finally, the RAE is restored to the original image without any loss.
5. Conclusions and Future Works
A novel reversible adversarial example generation method under black-box conditions is proposed. A differential evolution algorithm is utilized to generate minimal adversarial perturbations on the original image, and RAEs are generated based on the D-CSDPP algorithm. This method not only misleads deep learning models in image classification tasks but also allows the RAEs to be exactly restored to the original images, a feature lacking in many existing methods. Furthermore, this method enables recognition control in computer vision: the proposed reversible method functions as a form of encryption for computer vision, so only authorized models are allowed to recognize the images.
The D-CSDPP can automatically detect the perturbed pixels, meaning that location information is not needed for original image restoration, resulting in a decrease in embedded data. As a result, APR is improved by more than 20%. Comparative experiments with different approaches demonstrate the effectiveness and detection performance of the proposed D-CSDPP.
The PSNR and SSIM between the RAEs and the original images can reach up to 48.32 dB and 0.9986, respectively, indicating that the images generated by the proposed method are of high quality. This is also reflected in the visual results of the RAEs.
The proposed method demonstrates effectiveness across various advanced models and datasets, highlighting its generalizability. Compared to SOTA methods, the proposed method achieves superior APR due to the high image quality of the RAEs.
In the future, we will attempt to propose methods for generating more robust and efficient adversarial perturbations, and to improve the efficiency of the optimization process. We also aim to further compress the auxiliary information required for image restoration.