Dynamic-Max-Value ReLU Functions for Adversarially Robust Machine Learning Models

Sooksatra, Korn; Rivas, Pablo

doi:10.3390/math12223551

Open AccessArticle

Dynamic-Max-Value ReLU Functions for Adversarially Robust Machine Learning Models

by

Korn Sooksatra

^*

and

Pablo Rivas

^*

Department of Computer Science, Baylor University, Waco, TX 76706, USA

^*

Authors to whom correspondence should be addressed.

Mathematics 2024, 12(22), 3551; https://doi.org/10.3390/math12223551

Submission received: 15 October 2024 / Revised: 10 November 2024 / Accepted: 11 November 2024 / Published: 13 November 2024

(This article belongs to the Special Issue Advances in Trustworthy and Robust Artificial Intelligence)

Download

Browse Figures

Versions Notes

Abstract

:

The proliferation of deep learning has transformed artificial intelligence, demonstrating prowess in domains such as image recognition, natural language processing, and robotics. Nonetheless, deep learning models are susceptible to adversarial examples, well-crafted inputs that can induce erroneous predictions, particularly in safety-critical contexts. Researchers actively pursue countermeasures such as adversarial training and robust optimization to fortify model resilience. This vulnerability is notably accentuated by the ubiquitous utilization of ReLU functions in deep learning models. A previous study proposed an innovative solution to mitigate this vulnerability, presenting a capped ReLU function tailored to bolster neural network robustness against adversarial examples. However, the approach had a scalability problem. To address this limitation, a series of comprehensive experiments are undertaken across diverse datasets, and we introduce the dynamic-max-value ReLU function to address the scalability problem.

Keywords:

machine learning; adversarial machine learning; robustness; trustworthy AI; adversarial examples

MSC:

68T07; 68Q32; 62H30; 65K10

1. Introduction

Over the past few years, the adoption of deep machine learning models across various sectors has significantly increased. This is attributed to their superior performance in numerous tasks, and some have outperformed human capabilities. These tasks span from medical diagnosis to autonomous driving, where the accuracy and reliability of machine learning predictions are crucial. In the context of autonomous vehicles, for instance, the robustness of these systems is non-negotiable, as any failure could potentially endanger lives.

However, deep learning models exhibit a critical vulnerability to adversarial examples, i.e., subtle and deliberately engineered modifications to input data crafted to mislead the model into making erroneous decisions. Figure 1 illustrates an adversarial example that can fool an image classifier into predicting the image as a cat instead of as a dog. This susceptibility was first identified and discussed in seminal papers [1,2], highlighting significant challenges in deploying these models in environments demanding high security and reliability.

In our work, we show that the Static-Max-Value ReLU (S-ReLU) function designed in [3] is theoretically more robust than a traditional ReLU function. However, it does not work well on large datasets. Based on that analysis, we introduce the Dynamic-Max-Value ReLU (D-ReLU) function, an advanced activation function designed to dynamically adjust based on input data. This innovation enhances the model’s robustness, particularly when applied to larger and more complex datasets. By tailoring the activation mechanism to the specific characteristics of the input, D-ReLU aims to improve the overall performance of deep learning models.

Additionally, we explore the integration of D-ReLU into state-of-the-art pre-trained deep learning models. Through this integration, we demonstrate how modifications to traditional ReLU functions, alongside the addition of dense layers, can significantly enhance model security and reliability. This approach not only modernizes the activation function but also contributes to the stability and performance of established architectures.

To validate the effectiveness of D-ReLU, we conduct comprehensive experiments across large datasets, such as CIFAR-10, CIFAR-100, and TinyImagenet. These experiments aim to evaluate the practical performance and robustness improvements that D-ReLU can offer. The findings from these studies underscore the potential of D-ReLU for safer public deployment of deep learning models, making a compelling case for its adoption in various applications.

The rest of this paper is organized as follows. Section 2 discusses works with the same goals as our approach. Section 3 mentions and further analyzes the work inspiring us to propose our approach. Section 4 defines our approach. Section 5 shows the setups of our experiments. Section 6 details experimental applications of our approach and baselines against white-box attacks. Section 7 details experimental applications of our approach and baselines against white-box attacks. Section 8 reports on experiments with the models trained on augmented datasets against adversarial attacks. Section 9 discusses how much our approach generalizes in several perturbation bounds. Section 10 explains the limitations of our approach. Section 11 demonstrates the broader impact of our approach. Section 12 concludes everything and discusses future works.

2. Related Works

Addressing the vulnerabilities posed by adversarial examples has led to a plethora of research endeavors [2,4,5,6,7,8,9]. Among the proposed solutions, adversarial training has emerged as a foremost strategy due to its relatively straightforward implementation and proven effectiveness [7]. This approach involves training the model on a dataset supplemented with adversarially modified examples, thereby improving the model’s resilience to similar attacks. However, the technique significantly extends the training duration and computational demands.

Moreover, integrating autoencoders and generative adversarial networks (GANs) has been explored to preprocess and potentially cleanse adversarial perturbations from inputs [10,11]. These methods aim to improve the robustness of machine learning models by denoising or altering the input data before the model processes them. Figure 2 demonstrates the autoencoder method, which preprocesses an input to ensure that the classifier or target model receives a clean input. This autoencoder was specifically trained to denoise adversarial examples, effectively reducing the impact of adversarial perturbations. However, these solutions necessitate additional model training and are challenged by the complexity of handling large-scale data. The extra computational overhead and the need for extensive training make these methods less feasible for real-time applications and large datasets.

In addition, some strategies have focused on leveraging the outputs from machine learning models to promote robustness. One notable approach is randomized smoothing [12], which involves injecting random noise, such as Gaussian noise, into the input and generating multiple noisy versions of the input. Each version is then passed through the machine learning model, and the final prediction is determined by a majority vote among the predictions from the noisy inputs. While randomized smoothing provides a form of certified robustness, it has significant drawbacks. The method requires multiple passes through the model for each prediction, which is computationally expensive and impractical for real-time systems, where quick responses are essential. Figure 3 illustrates the randomized smoothing method. In this example, the system generates four predictions: three predict the input as a dog, and one predicts it as a cat. Based on the majority vote, the final output of the system is a dog.

Another method, defensive distillation, aims to reduce a model’s sensitivity to input variations by training the model to output softened probabilities rather than hard classifications [13]. Despite its initial promise, defensive distillation is vulnerable to more sophisticated adversarial attacks, as demonstrated by Carlini and Wagner [14]. This finding indicates that while defensive distillation can provide some level of robustness, it is not a comprehensive solution and only offers absolute protection against some types of adversarial techniques.

Many works have also attempted to create detectors for adversarial examples, aiming to filter out adversarial inputs before they reach the machine learning model [15,16,17,18,19,20]. These detectors can identify potentially malicious inputs and prevent them from affecting the model’s predictions. However, these approaches do not inherently improve the robustness of the underlying machine learning models. Furthermore, some detector-based methods rely on additional machine learning models, which can be vulnerable to adversarial attacks, allowing attackers to bypass the detectors and compromise the target models. Figure 4 depicts this technique. Samples detected as adversarial examples are ignored. Another drawback is that there will be no input for the machine learning model if there are only adversarial examples in the real world.

All the techniques mentioned above focus on preprocessing, post processing, or augmenting the inputs and outputs rather than directly modifying the models’ architectures. However, the architecture of the models—mainly the activation functions—significantly contributes to their vulnerability to adversarial examples. As illustrated in Figure 1, these tiny perturbations are imperceptible to the human eye. Despite this, certain activation functions, such as ReLU, enable these perturbations to amplify as they propagate through the layers of machine learning models. Ultimately, this can lead to changes in the output-layer values, consequently affecting the final prediction.

While some research [21,22,23,24,25,26,27,28] has aimed to customize the models’ architectures, only a few works have specifically targeted the customization of activation functions. One such effort by [21] involved quantizing activation functions, which can significantly reduce the precision of activations and potentially degrade the model’s performance and robustness. Another example is the ReLU6 activation function used in MobilenetV2 [29], which caps the activation values at 6. Although ReLU6 was introduced to improve robustness, its potential must be further explored to enhance robustness against adversarial attacks.

Amidst these challenges, we observe a potential opportunity with respect to conventional solutions concerning the activation functions within deep learning architectures—in particular, the ReLU function, which is known to contribute to vulnerabilities against adversarial examples [2]. A ReLU function is formulated as

\max (x, 0)

, where x is an input and

\max (\cdot, \cdot)

outputs the maximum value between two parameters. Our research aims to directly address this by enhancing the design of ReLU functions to improve model robustness without compromising accuracy.

3. Static-Max-ReLU Function

This novel activation function is termed the static-max-value ReLU function (S-ReLU) and is defined as follows:

S - ReLU (x, m) = \max (0, \min (m, x)),

where x represents the incoming input and m is a predefined maximum value. Figure 5 shows an example of this activation function, where the max value is 2. We can see that the function is capped after the input is 2. Theoretical analyses are presented to demonstrate the enhanced robustness of this proposed function compared to a general ReLU function. Furthermore, empirical experiments are detailed in the subsequent sections to empirically validate its improved robustness.

3.1. Theoretical Analysis

We previously introduced S-ReLU. Next, we aim to theoretically demonstrate how S-ReLU can neutralize adversarial perturbations at each layer in this section.

Theorem 1.

The outputs of S-ReLU functions always have an equal number of perturbations or fewer relative to the outputs of ReLU functions, given the same inputs. Note that perturbations are the additional noises affected by the corrupted input (also see Appendix A).

The utilization of the static-max-value ReLU function (S-ReLU) is likely associated with a reduction in the Lipschitz constant, denoted as K. This observation is substantiated by the findings presented in Theorem 1, which indicate a diminished discrepancy between the outputs of a layer when processing a clean sample and an adversarial example, especially when contrasted with the behavior of the standard ReLU activation function. The Lipschitz inequality is expressed as

d_{Y} (f (x), f (x^{*})) \leq K \cdot d_{X} (x, x^{*}),

where x is a clean sample,

x^{*}

is its adversarial example,

f (\cdot)

is a classifier,

d_{X} (\cdot, \cdot)

is a distance function (e.g.,

L_{2}

norm and

L_{\infty}

norm) for an input, and

d_{Y} (\cdot, \cdot)

is a distance function (e.g.,

L_{2}

norm) for an output. The consequential reduction in the Lipschitz constant, a consequence of employing S-ReLU, signifies an enhancement in the model’s robustness, as a lower Lipschitz constant is indicative of reduced sensitivity to input perturbations and, consequently, increased resilience against adversarial examples.

Next, we theoretically show how the max value (denoted by m) affects the amount of adversarial perturbations in a layer.

Corollary 1.

When the max value (m) of S-ReLU in a layer reduces, the layer’s outputs of clean samples and adversarial examples are closer (also see Appendix B).

According to Corollary 1, we can reduce the max values of S-ReLU to reduce the Lipschitz constant and eventually improve robustness. However, this technique may harm the overall performance if the max values are too low.

3.2. Limitations

In this section, we discuss the limitations of S-ReLU. While S-ReLU successfully enhances adversarial robustness in MNIST classifiers, its performance falters when applied to more extensive datasets like CIFAR-10. The challenge arises from the substantial number of layers and zero gradients, leading to what is commonly known as the gradient vanishing problem. The upcoming sections explain a compelling solution to address and overcome this issue, revolutionizing the capability of classifiers on larger datasets.

4. Dynamic-Max-ReLU Functions

Sooksatra et al. [3] demonstrated the effectiveness of S-ReLU in enhancing both model performance and adversarial robustness. They conducted a series of experiments that showcased improvements in resistance to adversarial attacks facilitated by the S-ReLU function. We also showed the theoretical results in the previous section. However, we observed challenges when applying S-ReLU to larger datasets beyond MNIST, primarily due to issues related to gradient vanishing.

To address these challenges, this section introduces a new variant, the dynamic-max-value ReLU function (D-ReLU). This modified function aims to retain the advantages of S-ReLU while mitigating its limitations on larger datasets. This approach uses the same activation functions as S-ReLU. However, the max values (i.e., m) of those functions are learnable. Therefore, at first, we set those values to be high, then try to minimize them during training to improve the robustness such that the optimizer can adjust the models with low max values. We minimize the max values because Table A1 shows that low max values (i.e., m) lead to small output differences and improve robustness. Therefore, the loss function can be formulated as

l (F (x, θ), y) + λ \sum_{i} m_{i}^{2},

(1)

where

F (x, θ)

is a classifier, x is an input,

θ

represents the parameters of F, y is the true label,

m_{i}

is the max value of neuron i with D-ReLU as its activation function, and

λ

balances the model’s performance and adversarial robustness. Note that the value of

m_{i}

is the number of neurons whose activation functions are D-ReLU. Next, we illustrate how D-ReLU can enhance adversarial robustness through a series of experiments. Before presenting our findings, we first describe the experimental setup.

5. Experimental Setup

In this section, we provide a comprehensive breakdown of the methodologies and resources utilized to configure and conduct our experimental studies. The components detailed here are crucial for replicating our results and understanding the efficacy of our proposed modifications in terms of model robustness.

First, we discuss the datasets employed in our experiments. These datasets were carefully selected to cover a variety of scenarios and complexity levels, which helps in testing the resilience of our modified models across different data distributions and task complexities.

Secondly, we elaborate on the specific training details, which include the configuration of the machine learning models, the choice of hyperparameters, and the training procedures we adopted.

Next, we delve into the robustness evaluations. Here, we define the metrics and methodologies used to assess the robustness of the models against adversarial attacks. This includes a description of how adversarial examples were generated and the criteria used to evaluate the model’s performance in the face of such perturbations.

Finally, we outline the baselines for comparison. This includes a discussion of the existing models and techniques against which our proposed modifications were benchmarked. Describing these baselines provides context for the improvements our research introduces and furnishes a clear contrast to demonstrate the incremental gains in robustness attributed to our enhancements.

Each of these elements plays a vital role in shaping the experimental design and is critical for assessing the practical impact of our research in enhancing the robustness of deep learning models against sophisticated adversarial threats.

5.1. Datasets

We used four datasets in this experiment: MNIST [30], CIFAR10 [31], CIFAR100 [31], and TinyImagenet [32]. MNIST consists of 60,000 training images and 10,000 testing images, each 28 × 28 pixels, representing handwritten digits from 0 to 9. Figure 6 shows some examples from this dataset.

CIFAR10 is a dataset commonly used for machine learning and computer vision tasks. CIFAR-10 consists of 60,000 32 × 32 color images in 10 different classes, with each class representing a distinct object or animal category. The dataset is divided into 50,000 training images and 10,000 testing images. It is widely used as a benchmark for developing and evaluating image classification algorithms and models. Figure 7 shows some examples of this dataset.

The CIFAR100 dataset is a collection of 60,000 32 × 32 color images across 100 different classes, with each class containing 600 images. It serves as a benchmark for image classification tasks, where each image belongs to one of the 100 fine-grained object classes. This dataset is commonly used for evaluating machine learning algorithms and models due to its diverse set of classes and relatively small image size. Figure 8 shows some examples of this dataset.

TinyImagenet is a subset of the large-scale Imagenet dataset designed for the training of deep neural networks with smaller computational resources. It consists of 200 diverse classes, with each class having 100,000 training images and 10,000 test images. Each image has dimensions of 64 × 64 pixels, representing a wide range of object categories, making it a useful dataset for tasks like classification, detection, and segmentation. TinyImagenet serves as a more manageable alternative to the full Imagenet dataset for researchers and practitioners working on computer vision tasks. We partitioned the training set using an 80/20 ratio for validation. Figure 9 shows some examples of this dataset.

5.2. Training Details

We used Tensorflow for this implementation. The optimization process employed the Adam optimizer [33] with the initial learning rate set to

10^{- 3}

. Additionally, we implemented the ReduceLROnPlateau callback with a decay factor of 0.5 and a patience of 5, as well as the EarlyStopping callback with patience of 10 based on the validation loss. The ReduceLROnPlateau callback reduces the learning rate by multiplying it by its decay factor when the validation loss does not improve for the patience epochs. The EarlyStopping callback stops the training when the validation loss does not improve for the patience epochs. The maximum number of epochs for the training procedure was set to 2000. We conducted three independent training sessions for each model type. All subsequent results presented in the following sections represent the average performance obtained from these three trained models.

We also added a dense layer before the output layer. Incorporating a dense layer before the output layer is motivated by findings from the experiment conducted in [3]. The study demonstrated that employing S-ReLU in the last hidden layer yields superior results compared to its placement in earlier layers. This layer’s activation function is D-ReLU for our approach, as shown in Figure 10, while it is a general ReLU for other approaches.

5.3. Adversarial Attacks

We employed diverse adversarial attack strategies to compute the robust accuracy for trained targeted models by using the test samples. The selected attacks encompass the following methodologies:

Fast Gradient Sign Method (FGSM) [2]: This attack creates adversarial examples by perturbing input data in the direction that maximizes the model’s loss, utilizing the sign of the gradients and a small constant.
Project Gradient Descent (PGD) [7]: This attack is an iterative approach, repeatedly updating the input by taking small steps in the gradient direction and projecting the result back into a small neighborhood around the original data. While FGSM is computationally less intensive and involves a single step, PGD is generally more effective and robust, requiring multiple iterations but producing adversarial examples that are harder to defend against.
Auto Projected Gradient Descent with DLR (APGD_DLR) [34]: This variant of APGD_CE maintains the same underlying principles as APGD_CE but employs the Difference of Logit Ratio Loss (DLR) [34] as the loss function.
Carlini and Wagner Attack with L2 Norm (CW_L2) [6]: Diverging from the optimization-based approach of the preceding attacks, CW_L2 is characterized by its slower adversarial example discovery process. However, its potency in generating robust adversarial examples is noteworthy. It directly minimizes the difference between clean samples and adversarial examples with the L2 norm and maximizes the misclassification confidence as well.
Square [35]: This is a black-box attack that utilizes random initialization with vertical stripes to perturb images within a specified range. By focusing on sparse updates grouped in a square pattern, the attack strategically alters the input, aiming to induce subtle yet significant changes in image components. This method leverages the sensitivity of convolutional networks to high-frequency perturbations and is designed to generate successful perturbations within a limited radius, ensuring distinct differences relative to the original image. By strategically manipulating color channels and employing sparse updates, the attack aims to maximize perturbation impact while adhering to image constraints and network sensitivities.

5.4. SOTA Methods for Robustness

To justify our approach’s novelty, we also compared it to state-of-the-art methods for adversarial robustness. We selected the following popular and effective methods:

Adversarial Training [7]: This method retrains a model with adversarial examples after its successful natural training. We retrained the models for 10 epochs.
TRADES [36]: This method balances the performance and robustness of a model by customizing the loss function. The loss function consists of two parts. The first part increases the performance, and the other part improves the robustness by computing the difference between the output distributions between the clean samples and their adversarial counterparts. Please be aware that this method utilizes a parameter denoted as $β$ to strike a balance between performance and robustness. We adopted the same values of $β$ as those employed by the original authors—specifically, $β = 1$ and 6.

We used PGD to generate adversarial examples for all the mentioned methods.

6. White-Box Attack Experiments

6.1. Experimental Results for MNIST

We created two models for the MNIST dataset. The first one is a two-hidden-layer dense network, and the other one is a shallow convolutional network. These networks are enough to evaluate the MNIST dataset. We set the perturbation bound to 0.1 for FGSM, PGD, PGD_CE, and PGD_DLR. We also set the perturbation bound to 18 for CW_L2.

The outcomes of tuning the balancer, denoted as

λ

in (1), are illustrated in Figure 11. Note that at a balancer value of zero, the models were naturally trained, and they were not robust against attacks at all. Through experimentation on both a dense network and a shallow CNN, it was observed that elevating the balancer led to increased accuracies on adversarial examples generated by FGSM, PGD, APGD_CE, and APGD_DLR. Interestingly, this improvement in adversarial accuracy occurred while the accuracy on clean samples remained relatively stable. This outcome aligns with our expectations. However, in the case of adversarial examples generated by CW_L2, the accuracy did not exhibit a similar increase. This anomaly can be attributed to the strength of the CW_L2 attack, where the applied perturbation may remain consistent across all samples.

Table 1 presents the performance (accuracy on clean samples) and robustness (accuracy on adversarial examples) achieved by training models using both state-of-the-art methods and our proposed approach. We carefully selected the optimal trade-off between performance and robustness for our approach, with the corresponding balancer values detailed in the table. Notably, our approach outperforms other methods across various scenarios, except for the accuracy of the dense model on both clean samples and adversarial examples generated by CW_L2. Importantly, our method achieves this superior performance without the need to compute adversarial examples during the training process. This observation underscores the efficacy of our approach in endowing machine learning models with adversarial robustness without compromising overall performance.

6.2. Experimental Results for CIFAR10

We trained six types of models: two-hidden-layer dense networks, shallow convolutional neural networks (CNN), ResNet50 [37], ResNet101 [37], MobilenetV2 [29], and InceptionV3 [38]. We set the perturbation bounds to 0.01 for FGSM, PGD, APGD_CE, and APGE_DLR. Moreover, the bound for CW_L2 was set to 18.

Figure 12 provides a detailed visualization of the performance outcomes for various models that employ different balancer values under multiple adversarial attack scenarios. This figure enables a comparative analysis, particularly focusing on how these models withstand adversarial perturbations when adjusted with varying balancer levels.

Consistent with our prior observations on the MNIST dataset, we noted a similar trend in the CIFAR-10 dataset. Specifically, as the balancer values increase, there is a noticeable enhancement in robustness against several attacks. This pattern aligns with our expectations and demonstrates that carefully calibrated balancer values can significantly improve a model’s resistance to certain types of adversarial attacks. However, it is important to highlight that while higher balancer values enhance robustness, there is a threshold beyond which further increases can negatively impact overall model performance. This suggests a trade-off where excessively high balancer values may lead to the deminishment of accuracy or other performance metrics under standard conditions.

In light of these findings, the D-ReLU mechanism appears to be particularly effective. For medium-sized datasets such as CIFAR10 and advanced models including ResNet, Mobilenet, and Inception, D-ReLU strikes a balance that optimizes robustness without excessively compromising overall performance. This makes D-ReLU a promising choice for practitioners looking to enhance model robustness in practical applications.

The implications of these results are multifaceted. First, they underscore the importance of balancing robustness and performance. While enhancing defense mechanisms against adversarial attacks is crucial, maintaining high levels of accuracy and performance in non-adversarial scenarios is equally important. This balance ensures that the models remain useful and effective in real-world applications where both adversarial and benign inputs are encountered.

Secondly, the trend observed with escalating balancer values offers insights into the tuning process for adversarial robustness. It suggests that there is a critical balancer value range that optimizes defense mechanisms without significantly degrading the model’s general performance. Identifying this optimal range can guide the development of more resilient machine learning systems.

Furthermore, the suitability of D-ReLU for state-of-the-art models such as ResNet, Mobilenet, and Inception indicates its potential for broader adoption. These models are widely used in various applications due to their performance and efficiency. Enhancing their robustness with D-ReLU can make them more reliable in adversarial settings, thereby extending their applicability in security-sensitive domains such as autonomous driving, medical imaging, and financial forecasting.

We also experimented with placing the additional convolutional layer with D-ReLU after the input layer instead of incorporating it in the dense layer before the output layer. Figure 13 presents the outcomes, illustrating the impact on several CNN architectures when the D-ReLU layer is added at the beginning of the network.

The results indicate that positioning the D-ReLU layer early in the network does not yield the same level of effectiveness as when placed in deeper layers. For the Shallow CNN (Figure 13a), MobilenetV2 (Figure 13b), and InceptionV3 (Figure 13c), there is a notable decline in adversarial robustness across different attack types (FGSM, PGD, APGD_CE, APGD_DLR, and CW_L2) as compared to when the D-ReLU layer is situated deeper in the network. This trend suggests that the D-ReLU function, when applied later in the model, significantly enhances the model’s ability to withstand adversarial attacks while maintaining high accuracy on clean samples.

The implications of these findings are significant for the design of robust neural network architectures. Incorporating D-ReLU in deeper layers allows the network to better leverage its properties for to improve adversarial robustness. This highlights the importance of strategic layer placement within CNNs, particularly for applications requiring high resilience to adversarial perturbations without compromising performance on clean data.

Table 2 provides a comprehensive comparison of accuracy metrics and rankings for various robust training schemes applied to different models on the CIFAR10 dataset. The table reveals that D-ReLU consistently achieves an optimal balance between performance on clean samples and robustness against adversarial attacks, particularly excelling in the context of deep networks like ResNet and InceptionV3.

Interestingly, while TRADES with

β = 6

demonstrated superior robustness for the dense network, it did so at the expense of performance on clean samples. In contrast, our D-ReLU approach significantly outperformed other methods in generalizing to adversarial examples, and it did so without the need to compute adversarial examples during training. This characteristic is particularly advantageous, as it simplifies the training process and reduces computational overhead.

Moreover, D-ReLU’s ability to maintain high performance on clean samples is noteworthy. Unlike other robust training schemes that often sacrifice accuracy on clean data to gain adversarial robustness, D-ReLU preserves the integrity of clean sample performance, making it a highly efficient and practical approach for enhancing model robustness without compromising overall accuracy. This makes D-ReLU a highly effective method for deploying robust models in real-world scenarios where maintaining high accuracy on clean data is crucial.

Additionally, we performed an ANOVA test and obtained an F score of

17.4

, surpassing the critical value of

3.92

at

α = 0.01

. Given that the F score was significantly higher than the critical value, we reject the null hypothesis and conclude that there are significant differences among the approaches with

99 %

confidence. Moreover, considering the average accuracy, it is evident that D-ReLU significantly enhances model robustness.

Furthermore, we conducted a non-parametric test, specifically the Friedman test, to assess the differences between the results. This test uses the ranks provided in Table 2. The test yielded a

χ^{2}

score of

39.43

and an

F_{F}

score of

20.12

. The critical value (

α = 0.01

) ranged between

2.13

and

2.18

for 3 and 105 degrees of freedom, respectively. Given that both the chi-square and

F_{F}

scores were significantly higher than the critical value, we reject the null hypothesis. Consequently, we conclude that the models differ significantly from each other with a confidence level of

99 %

.

Subsequently, we employed the Nemenyi test [39] to pinpoint which pairs of classifiers exhibited significant differences. The computed critical difference was

0.656

. The differences in the average ranks between D-ReLU and the other techniques are reported as follows:

0.86

for adversarial training,

1.89

for TRADES-1, and

1.14

for TRADES-6. Each of these differences surpasses the critical difference. Therefore, we conclude that classifiers utilizing D-ReLU demonstrate significantly greater robustness compared to those using all other methods.

6.3. Experimental Results for CIFAR100

Figure 14 illustrates the accuracy of various CNN architectures on clean CIFAR100 samples and adversarial examples generated by different white-box attacks. The figures reveal several important trends. Across all models, we observe a general pattern where the accuracy on clean samples remains relatively stable or slightly decreases as the balancer value increases. This stability indicates that the addition of the D-ReLU layer does not significantly compromise the model’s performance on clean data, which is crucial for maintaining the overall utility of the model in non-adversarial settings.

There is a notable improvement in robustness with increasing balancer values for adversarial examples. This trend is consistent across all considered types of white-box attacks: FGSM, PGD, APGD_CE, APGD_DLR, and CW_L2. The accuracy on adversarial examples shows a significant upward trajectory, especially for higher balancer values, suggesting that the D-ReLU function effectively mitigates the impact of adversarial perturbations. This improvement in robustness is particularly pronounced in more complex models like ResNet50, ResNet101, MobilenetV2, and InceptionV3.

Table 3 shows a comparison between our approach and the other baselines concerning performance and robustness. Although the baselines outperform our approach in three architectures, our approach can provide more robust models than the other baselines in every case. Particularly in the cases of MobilenetV2 and InceptionV3, our approach exhibits notably superior performance compared to the other baselines.

6.4. Experimental Results for TinyImagenet

Figure 15 presents the accuracy of several neural network architectures on clean TinyImagenet samples and adversarial examples produced by various white-box attacks. The assessed networks include Dense and Shallow CNN, ResNet50, ResNet101, MobilenetV2, and InceptionV3. The experiments involved integrating a dense layer with a D-ReLU function before the output layer and varying the balancer value to observe its impact on model performance and robustness.

The graphs demonstrate a consistent pattern across all models, indicating the efficacy of the D-ReLU layer in enhancing adversarial robustness. On clean TinyImagenet samples, the accuracy generally remains stable or exhibits minor fluctuations as the balancer value changes. This stability suggests that the addition of the D-ReLU layer does not significantly impair the model’s ability to correctly classify clean samples, maintaining its utility in standard scenarios.

For adversarial examples generated by white-box attacks (FGSM, PGD, APGD_CE, APGD_DLR, and CW_L2), there is a clear trend of improved robustness with increasing balancer values. The accuracy on these adversarial examples improves markedly, especially at higher balancer values, indicating that the D-ReLU function effectively counteracts the adversarial perturbations. This improvement is particularly evident in complex models like ResNet50, ResNet101, MobilenetV2, and InceptionV3, which show substantial gains in accuracy against adversarial attacks.

Table 4 shows the performance and robustness of our approach and the other baselines on the TinyImagenet dataset. The table also shows the ranking of the approaches in each architecture. Our approach struggles to find a balance between performance and robustness. However, in MobilenetV2, our approach outperforms the other ones in terms of performance and robustness.

6.5. Discussion

The consistent improvements in adversarial robustness across the MNIST, CIFAR10, CIFAR100, and TinyImagenet datasets highlight several key implications.

First, the D-ReLU layer’s effectiveness across different datasets and model architectures indicates its broad applicability. It suggests that this technique can be reliably used to enhance the adversarial robustness of various neural networks without specific tailoring to individual datasets.

Second, despite the significant gains in adversarial robustness, the performance on clean samples remains largely unaffected. This balance ensures that the models remain useful and reliable in standard conditions, which is critical for practical deployment.

Third, the approach scales well with model complexity. More advanced models like ResNet and InceptionV3, which are typically used in real-world applications, benefit greatly from the addition of a D-ReLU layer, showing substantial improvements in defending against sophisticated white-box attacks.

Moreover, by effectively countering a range of white-box attacks, the D-ReLU layer enhances the overall security of neural networks. This makes it a valuable addition to the suite of techniques aimed at protecting models against adversarial threats.

The integration of a dense layer with a D-ReLU function before the output layer provides a robust defense mechanism against white-box attacks across the MNIST, CIFAR10, CIFAR100, and TinyImagenet datasets. This approach ensures that neural networks can maintain high performance on clean samples while significantly improving their resilience to adversarial perturbations, thereby enhancing their reliability and security in various applications.

7. Black-Box Attack Experiments

In addition to the promising results against white-box attacks, we also evaluated the performance of the D-ReLU function in enhancing the robustness of CNNs against black-box attacks, specifically the Square attack. Figure 16, Figure 17 and Figure 18 offer valuable insights into how D-ReLU impacts various models across different datasets under black-box attack scenarios.

7.1. Experimental Results for CIFAR10

In Figure 16, the accuracy of several network types on clean CIFAR10 data and adversarial examples generated by the black-box attack is depicted. For dense networks (Figure 16a), the accuracy on clean samples remains relatively stable across different balancer values. However, the accuracy against adversarial examples shows a notable improvement with increasing balancer values, indicating enhanced robustness. Shallow CNNs (Figure 16b) display a similar pattern, with a significant improvement in adversarial robustness at higher balancer values, while the clean accuracy remains consistent.

ResNet50 and ResNet101 (Figure 16c,d) both demonstrate substantial gains in adversarial robustness with increasing balancer values. This trend suggests that deeper networks benefit more from the D-ReLU layer in terms of adversarial resilience. MobilenetV2 (Figure 16e) also shows consistent improvement in adversarial accuracy with higher balancer values, despite slight fluctuations in clean accuracy. InceptionV3 (Figure 16f) exhibits a strong increase in adversarial robustness with higher balancer values while maintaining high accuracy on clean samples.

7.2. Experimental Results for CIFAR100

Figure 17 presents the accuracy metrics for CIFAR100. Dense networks (Figure 17a) show moderate improvement in adversarial robustness with the addition of the D-ReLU layer, though clean accuracy remains largely unaffected. Shallow CNNs (Figure 17b) follow a clear trend of increasing adversarial accuracy with higher balancer values, indicating the D-ReLU layer’s effectiveness in enhancing robustness.

For deeper networks like ResNet50 and ResNet101 (Figure 16c and Figure 17d), there is improved adversarial robustness with increasing balancer values, though a slight decrease in clean accuracy is observed at higher balancer values. MobilenetV2 (Figure 17e) displays marked improvement in adversarial robustness with higher balancer values, with minimal fluctuations in clean accuracy. InceptionV3 (Figure 17f) shows the highest gains in adversarial robustness, maintaining strong performance on clean samples.

7.3. Experimental Results for TinyImagenet

In Figure 18, the results for TinyImagenet are detailed. Dense networks (Figure 18a) show a significant increase in adversarial robustness with higher balancer values, while clean accuracy remains stable. Shallow CNNs (Figure 18b) exhibit improved adversarial accuracy with higher balancer values, though clean accuracy shows some variability.

Deeper networks like ResNet50 and ResNet101 (Figure 18c,d) benefit significantly in terms of adversarial robustness with increasing balancer values, with slight fluctuations in clean accuracy. MobilenetV2 (Figure 18e) demonstrates notable improvement in adversarial robustness with higher balancer values, with clean accuracy remaining relatively unaffected. InceptionV3 (Figure 18f) shows the most substantial gains in adversarial robustness among all tested architectures, with clean accuracy remaining high.

7.4. Comparison to Other Baselines

Table 5 provides accuracy metrics and rankings for various neural network models trained under different robust training schemes and evaluated on clean samples, as well as adversarial examples generated by a black-box attack (denoted as Square), on the CIFAR10, CIFAR100, and TinyImagenet datasets. The displayed values are percentages, with the highest accuracy metrics highlighted in bold for each specific model among the different training methods.

The TRADES-6 strategy demonstrates superior performance across most scenarios in the dense network. In the Shallow CNN architecture, the D-ReLU method showcases a competitive edge over TRADES-based approaches specifically on the CIFAR10 dataset. However, TRADES-6 surpasses D-ReLU in other instances. For the ResNet50, MobilenetV2, and InceptionV3 models, D-ReLU stands out as the top performer on the CIFAR10 and CIFAR100 datasets. Nevertheless, its efficiency on the TinyImagenet dataset falls short in comparison to the TRADES-based techniques, highlighting a trade-off between performance and robustness. ResNet101 presents a mix of results, showcasing variability in its performance outcomes.

7.5. Discussion

The effectiveness of D-ReLU against black-box attacks has several important implications. First, it highlights the potential of D-ReLU to provide robust defenses in more realistic adversarial settings where attackers lack full knowledge of the model’s parameters and architecture. This makes D-ReLU a valuable tool for real-world applications where security and reliability are paramount.

Second, the consistent improvement in robustness across different architectures and datasets suggests that D-ReLU can be widely applied to various deep learning models, making it a versatile and scalable solution for enhancing adversarial defenses.

Lastly, the ability of D-ReLU to improve robustness without compromising performance on clean samples is particularly noteworthy, especially on the CIFAR10 and CIFAR100 datasets. This balance between robustness and accuracy ensures that models remain effective for their intended tasks while being resilient to adversarial perturbations. However, it is still difficult to train the model with D-ReLU on a large dataset like the TinyImagenet dataset.

Overall, the findings underscore the robustness of the D-ReLU function against black-box attacks, further validating its utility in strengthening the security of deep learning models in diverse and practical scenarios. This reinforces the importance of integrating such robust functions into model architectures to safeguard against a wide range of adversarial threats.

8. Experiments with Augmented Dataset

The study conducted by Wang et al. [40], as highlighted within the extensive literature review, has brought to light the significant impact of incorporating the elucidating diffusion model (EDM) proposed by Karras et al. [41] as a means to effectively mitigate the prevalent issue of overfitting encountered during adversarial training processes. By augmenting the training dataset with the EDM, promising results have been observed in terms of enhancing the robustness and generalization capabilities of the learning model. Against this backdrop, the subsequent analysis presented in this section undertakes a comprehensive evaluation through comparative studies between our proposed methodology and the renowned TRADES technique introduced by Zhang et al. [36]. This comparative analysis is conducted utilizing the augmented training samples, demonstrating the efficacy and superiority of our approach in bolstering the resilience of the learning system against adversarial attacks and enhancing overall performance metrics.

In every epoch, a combination of generated samples and original training samples is utilized. As outlined in the research conducted by Wang et al. [40], a specific configuration is followed for the CIFAR10 and CIFAR100 datasets. Here, a random selection process is employed to choose samples from both the original dataset and the generated samples. Approximately 30% of the training samples are sourced from the original dataset, while the remaining samples are from the generated dataset. It is imperative to note that despite this mixing process, the overall size of the training dataset remains constant.

Furthermore, the research also stipulates the use of a hyperparameter value of

β = 5

for the TRADES method. Moving on to the TinyImagenet dataset, a slightly different approach is adopted. In this case, 20% of the training samples are sourced from the original dataset, with the remaining samples coming from the generated dataset. Consistent with the literature by Wang et al. (2023) [40], a value of

β = 8

is utilized for the TRADES method in this context. To ensure a fair comparison, the same

β = 5

value is also utilized in this scenario.

8.1. Experimental Results

The visual representations displayed in Figure 19 for CIFAR10 and Figure 20 for CIFAR100 offer an insightful analysis of the performance and robustness of various architectures trained with D-ReLU under white-box attacks, leveraging a training dataset enriched with generated samples from the EDM. The fusion of D-ReLU with the EDM showcases impressive results on both the CIFAR10 and CIFAR100 datasets, particularly demonstrating significant efficacy when applied to deep architectures. Notably, the combined approach of D-ReLU plus EDM exhibits remarkable performance and robustness; especially noteworthy is how it outperforms instances where D-ReLU is employed without the integration of the EDM.

Intriguingly, even at higher values of m, such as

m = 100

, the performance and robustness metrics do not exhibit a notable decline as observed with the utilization of D-ReLU in isolation, underscoring the added value and efficacy of incorporating EDM-generated samples into the training set. This observation highlights the positive impact of integrating EDM in the training process, particularly in enhancing the overall performance and robustness of deep architectures across the CIFAR10 and CIFAR100 datasets. Such findings provide valuable insights into the effectiveness of synergistic methods like D-ReLU plus EDM in improving the learning capabilities and resilience of neural network models.

Table 6 and Table 7 provide a comparative analysis between our approach using D-ReLU and the TRADES method with generated samples from the EDM across the CIFAR10 and CIFAR100 datasets, respectively. The tables also show the rankings for comparison. They also provide a comparative analysis of our approach using D-ReLU with the TRADES method with generated samples from the EDM across the CIFAR10 and CIFAR100 datasets. When considering the CIFAR10 dataset, it is evident that D-ReLU generally surpasses TRADES regarding the robustness of the models in a majority of the scenarios. The exception lies in cases involving smaller network architectures such as Dense and Shallow CNNs, where TRADES demonstrates noticeably superior performance compared to D-ReLU. In contrast, D-ReLU shows its strengths in deeper network architectures, where its performance is on par with or even exceeds that of TRADE. This trend of comparative performance is not isolated to the CIFAR10 dataset but is also observable in the results for the CIFAR100 dataset.

For deeper evaluations, the performance differential between D-ReLU and TRADES across different network depths highlights the significance of choosing appropriate defensive techniques depending on the complexity and depth of the employed models. Further insights suggest that while TRADES tends to be more effective with simpler, less deep networks, D-ReLU offers competitive advantages, primarily in more complex architectures. This pattern suggests that the underlying mechanisms of D-ReLU might be better tuned for managing the higher complexities and intricacies associated with deeper networks. Hence, assessing the networks’ architecture becomes crucial when implementing robust training methods, as the choice between D-ReLU and TRADES could significantly impact the effectiveness of model robustness against adversarial attacks.

The graphical representation provided in Figure 21 presents a detailed evaluation of the outcomes derived from implementing D-ReLU in conjunction with the EDM on the TinyImagenet dataset. Interestingly, the results indicate noticeable discrepancies in both performance and robustness compared to scenarios where solely D-ReLU is deployed. This inferior performance observed in the approach combining D-ReLU with the EDM can be attributed to a crucial factor: the generated samples utilized for augmentation originate from data points that are external to the test dataset.

The discrepancy in results between the D-ReLU with EDM method and the standalone D-ReLU approach on the TinyImagenet dataset underscores the significance of the source of generated samples in the training process. By incorporating samples that do not align closely with the original dataset, the model may encounter challenges in effectively generalizing and adapting to the unseen data during inference. This discrepancy highlights the critical aspect of data-source relevance in the augmentation process, emphasizing the importance of utilizing samples that are representative of the original dataset to ensure optimal performance and robustness in model training.

Table 8 presents a detailed comparison of the D-ReLU and TRADES training methodologies using samples generated from the EDM approach, particularly within the context of the TinyImagenet dataset. Upon examining the results, it becomes noticeable that the performance of D-ReLU in smaller network structures, such as Dense and Shallow CNNs, is substantially deficient. When employing D-ReLU in these compact network configurations, the results indicate a stark underperformance compared to its counterpart, TRADES, which appears to better handle the constraints and demands posed by smaller neural networks.

Conversely, in the context of more elaborate and deep network architectures, D-ReLU demonstrates a marked superiority, substantially outperforming TRADES. This significant enhancement in performance with deep networks suggests that D-ReLU is particularly well suited to leverage the complex structures and layers involved in such models, potentially exploiting deeper features and more intricate decision boundaries that deeper architectures facilitate.

Figure 22, Figure 23 and Figure 24 visualize the accuracy on the clean and adversarial samples under several architectures on the CIFAR10, CIFAR100, and TinyImagenet datasets. These results follow the same patterns as in the white-box attacks.

Table 9 presents a comparative analysis between the D-ReLU and TRADES methodologies, utilizing samples generated from the EDM approach while assessing the performance under a black-box attack across three distinct datasets: CIFAR10, CIFAR100, and TinyImagenet. In smaller network configurations such as those typified by the Dense and Shallow CNN architectures, the results observed under a black-box attack align closely with those obtained under white-box attacks, indicating consistent behavior across different types of adversarial attacks in these simpler network models. This consistency is crucial for validating the robustness of training methodologies against varied adversarial strategies.

Expanding the evaluation to deeper network architectures, particularly within the CIFAR10 and CIFAR100 datasets, D-ReLU demonstrates commendable competitiveness with TRADES. This indicates that D-ReLU can effectively leverage the complexities inherent in larger and deeper models to enhance robustness against black-box attacks, thereby suggesting its suitability in scenarios where maintaining integrity against external manipulations in data is critical.

Interestingly, in the TinyImagenet dataset, which typically requires handling of a more extensive and complex set of classes and image variations, D-ReLU not only competes well but also noticeably outperforms TRADES. This superior performance underscores D-ReLU’s potential advantage in more challenging and diverse datasets where the depth and complexity of the network can be turned into a strategic asset to counter adversarial attacks more effectively.

8.2. Discussion

In the context of the CIFAR10 and CIFAR100 datasets, the integration of generated samples from the EDM approach appears to notably enhance the performance and robustness of both the D-ReLU and TRADES training methodologies. This improvement is primarily due to the diversification of data samples provided by EDM, which broadens the array of scenarios that the models encounter during training. Such enhanced variety promotes better generalization capabilities within machine learning models, equipping them to handle a wider range of inputs and reducing overfitting on the training data.

Furthermore, D-ReLU demonstrates a capacity to surpass TRADES in several state-of-the-art (SOTA) networks deployed on these datasets. This superior performance of D-ReLU suggests that its mechanisms might be more effectively aligned with the innate characteristics and challenges presented by the CIFAR10 and CIFAR100 datasets when combined with the enriched diversity of training instances generated through the EDM.

However, the scenario shifts quite dramatically when considering the TinyImagenet dataset. Both D-ReLU and TRADES exhibit significantly diminished performance compared to methodologies that do not employ EDM-generated samples. The core issue stems from the EDM’s inability to produce new samples that accurately reflect the distribution inherent to the test dataset of TinyImagenet. The discrepancy between the training data augmented by the EDM and the actual data distribution encountered in testing hinders the model’s ability to generalize effectively, resulting in poorer performance.

Despite these challenges with the TinyImagenet dataset, it is notable that D-ReLU still maintains a considerable performance edge over TRADES. This indicates that while the overall effectiveness of both methodologies is compromised by the limitations of the EDM in this context, D-ReLU’s approach still manages to adapt more successfully than TRADES, leveraging its strengths to achieve better results even under less-than-ideal conditions.

Such findings underscore the importance of contextual suitability of data augmentation techniques like the EDM in training robust machine learning models. While the EDM proves advantageous in datasets like CIFAR10 and CIFAR100 by enhancing model generalization through diverse examples, its effectiveness is contingent upon the relevance and fidelity of the generated samples to the test environments. Tailoring the choice of augmentation strategies to the specific characteristics of the dataset is crucial in optimizing model performance and robustness. This nuanced approach to training can significantly influence the successful deployment of machine learning models across various real-world applications.

9. Perturbation-Bound Generalization

This section demonstrates how D-ReLU and other baseline methods perform across various perturbation bounds. We choose APGE_CE as the adversarial attack in this experiment because it is the most widely used and one of the strongest attacks.

9.1. Experimental Results

Figure 25 presents the accuracy of various approaches, including the baselines and our proposed methods, on the CIFAR10 dataset under an APGD_CE attack with different levels of perturbation. For a small network like Shallow CNN, our approaches, D-ReLU and D-ReLU with the EDM outperform the other baselines under very small perturbations, except TRADES-5 with the EDM. However, as the perturbation level increases, D-ReLU and D-ReLU with the EDM consistently surpass all the baselines, demonstrating their superior robustness.

Figure 26 and Figure 27 depict similar results for the CIFAR100 and TinyImagenet datasets, respectively. We observe a comparable trend to that of the CIFAR10 dataset, where D-ReLU and D-ReLU with the EDM exhibit enhanced performance over the baselines. Although our approaches show slightly diminished performance on larger datasets, they still generalize well across different perturbation bounds. This consistency across varying perturbation levels highlights our methods’ robustness and adaptability.

9.2. Discussion

Our approaches, D-ReLU and D-ReLU with the EDM, demonstrate significant improvements in accuracy and robustness compared to baseline methods across different datasets and perturbation levels. These results indicate the potential of our techniques to enhance the reliability of machine learning models in adversarial settings, particularly in image classification tasks. Our methods maintain high accuracy under small perturbations and exhibit strong generalization capabilities as the perturbation bound increases, proving their effectiveness in real-world applications where robustness is critical.

10. Limitations

Despite the successful results of D-ReLU, this activation function may be more difficult than ReLU to harness because it has two hyperparameters. The first one is the balancer that was tuned in our experiments. Noticeably, the best balancer in the CIFAR10 dense network is different from the that in the CIFAR10 MobilenetV2 network. Therefore, it is tricky to find the best balancer. Moreover, the second hyperparameter is the initial max value of D-ReLU. We set it to 100 for the MNIST, CIFAR10, CIFAR100, and TinyImagenet datasets. It is clear that the results of our approach on the TinyImagenet are not very satisfying due to the large values before the D-ReLU layer, which cause several areas of zero gradients for training. Therefore, in large datasets, we may need to set it to a higher value. However, the results with an initial max value of 100 are satisfactory. It is noteworthy that if this value is ridiculously high, the training time will significantly increase because the optimizer takes much more time to reduce this max value.

11. Broader Impact

Our research is substantial, offering a transformative solution to the problem of adversarial vulnerability in machine learning systems by customizing activation functions within the model architecture. This enhancement in security was designed to be achieved without significantly affecting the model’s performance on clean, non-adversarial samples. This is a critical advantage for machine learning practitioners who need to ensure that the pursuit of robustness does not come at the expense of efficiency and overall model accuracy.

The potential applications of this technology extend far beyond academic research; it has practical, real-world implications across various sectors utilizing artificial intelligence. Industries ranging from finance and healthcare to autonomous vehicle technology and cybersecurity can greatly benefit from the integration of our findings into their AI development cycles. By implementing our advanced techniques, these sectors can enhance the reliability and security of their systems against adversarial attacks, safeguarding sensitive data and critical operational functions.

Furthermore, our approach is expected to set a significant precedent for future research and development in adversarial robustness. By providing a versatile framework that can be adapted to diverse AI models and applications, our methodology promises to serve as a strong baseline for ongoing efforts in the mitigation of adversarial examples. Researchers and developers can leverage our proven strategies to explore further innovations in the field, potentially leading to even more sophisticated defenses against increasingly complex adversarial attacks.

Finally, the broader impacts of this research are multi-faceted, providing not only a practical method for enhancing the adversarial robustness of machine learning models but also contributing to the elevation of standards for the trustworthiness and security of AI systems in industrial applications. This work supports the important goal of advancing technology that is both powerful and resistant to evolving threats, thereby fostering a safer and more reliable digital future.

12. Conclusions and Future Works

We introduced the D-ReLU function to overcome the gradient vanishing issue observed with S-ReLU. We conducted various experiments demonstrating that D-ReLU enhances adversarial robustness in larger datasets than MNIST. The results indicate that D-ReLU not only performed well but, in some instances, surpassed or matched the performance of TRADES under both white-box and black-box attack scenarios. Our statistical tests on the CIFAR10 dataset also show that D-ReLU significantly outperforms the other baselines.

Moreover, even when testing with augmented samples from the EDM, D-ReLU continued to show superior performance or remained competitive with TRADES. Notably, D-ReLU exhibited robust generalization across various perturbation bounds, a feature that TRADES struggled with. Integrating D-ReLU into a machine learning model offers a favorable balance between performance and robustness, making it a compelling option for enhancing model resilience against adversarial attacks.

In the future, we plan to design and implement a series of controlled experiments aimed at systematically evaluating how different initial maximum settings influence the performance and robustness of machine learning models, especially when applied to large-scale datasets. By manipulating this parameter, we aim to uncover deeper insights into how subtle changes can improve or impair a model’s ability to withstand adversarial attacks, thereby refining the robustness of the activation function.

The anticipated outcome of these future investigations is a more nuanced understanding of the relationship between hyperparameters of the D-ReLU and the overall efficacy of the model. This will not only contribute to the academic literature but also provide practical guidelines that can be applied to enhance the security and reliability of machine learning systems in real-world applications. Through rigorous experimentation and analysis, we believe these efforts will pave the way for the development of more sophisticated, adaptive, and resilient machine learning architectures.

Author Contributions

Conceptualization, K.S. and P.R.; methodology, K.S. and P.R.; formal analysis, K.S. and P.R.; investigation, K.S. and K.S.; resources, K.S. and P.R.; data curation, K.S.; writing—original draft preparation, K.S.; writing—review and editing, K.S. and P.R.; visualization, K.S.; supervision, P.R.; project administration, P.R.; funding acquisition, P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was executed while P.R. and K.S. were funded by the National Science Foundation under grants NSF CISE—CNS Award 2136961 and 2210091.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

Proof.

Suppose that we have a feedforward network.

o_{i}^{l}

denotes the output of neuron i in layer l, and

w_{i j}^{l}

is the parameter from neuron i in layer l to neuron j in layer

l + 1

. Then, the output of neuron j in layer l with an activation function (denoted by

act (\cdot)

) is

\begin{matrix} o_{j}^{l} = act (\sum_{i} w_{i j}^{l - 1} \cdot o_{i}^{l - 1}) . \end{matrix}

When a previous layer has some perturbations (i.e.,

δ^{l - 1}

), the output is

\begin{matrix} o_{j}^{l^{*}} & = act (\sum_{i} w_{i j}^{l - 1} \cdot (o_{i}^{l - 1} + δ_{i}^{l - 1})) \\ = act (\underset{A}{\underset{︸}{\sum_{i} w_{i j}^{l - 1} \cdot o_{i}^{l - 1}}} + \underset{B}{\underset{︸}{\sum_{i} w_{i j}^{l - 1} \cdot δ_{i}^{l - 1}}}), \end{matrix}

where

o^{*}

means that the output has a perturbation induced by the previous layers.

Say that

A = \sum_{i} w_{i j}^{l - 1} \cdot o_{i}^{l - 1}

and

B = \sum_{i} w_{i j}^{l - 1} \cdot δ_{i}^{l - 1}

. Then,

o_{j}^{l} = act (A)

, and

o_{j}^{l^{*}} = act (A + B)

. Suppose that we would like to compare the differences between

o_{j}^{l}

and

o_{j}^{l^{*}}

of ReLU and S-ReLU functions. Six cases can happen as follows:

Case 1: $A \leq 0$ and $A + B > m$ → $| o_{j}^{l} - o_{j}^{l^{*}} | = | 0 - (B - A) | = | A - B |$ for ReLU and $| o_{j}^{l} - o_{j}^{l^{*}} | = | 0 - m | = | m |$ for S-ReLU. The perturbations in the output of S-ReLU are smaller than ReLU because $m < | A + B | < | A - B |$ . The inequality is true, since A is negative and B is positive due to the conditions.
Case 2: $0 < A \leq m$ and $A + B > m$ → $| o_{j}^{l} - o_{j}^{l^{*}} | = | A - (A + B) | = | B |$ for ReLU and $| o_{j}^{l} - o_{j}^{l^{*}} | = | A - m | = | A - m |$ for S-ReLU. The perturbations in the output of S-ReLU are smaller than those of ReLU because $B > m - A$ according to the conditions. Also, since both B and $m - A$ are positive due to the conditions, $| B | > | A - m |$ .
Case 3: $A > m$ and $A + B > m$ → $| o_{j}^{l} - o_{j}^{l^{*}} | = | A - (A + B) | = | B |$ for ReLU and $| o_{j}^{l} - o_{j}^{l^{*}} | = | m - m | = 0$ for S-ReLU. The perturbations in the output of S-ReLU are smaller than that of ReLU because $| B | > 0$ .
Case 4: $A > m$ and $0 < A + B \leq m$ → $| o_{j}^{l} - o_{j}^{l^{*}} | = | A - (A + B) | = | B |$ for ReLU and $| o_{j}^{l} - o_{j}^{l^{*}} | = | m - (A + B) | = | B + A - m |$ for S-ReLU. The perturbations in the output of S-ReLU are smaller than ReLU because $A - m$ is positive due to the conditions, and B is negative. Then, $B + A - m$ is greater than B. Thus, $| B + A - m |$ is less than $| B |$ .
Case 5: $A > m$ and $A + B \leq 0$ → $| o_{j}^{l} - o_{j}^{l^{*}} | = | A - 0 | = | A |$ for ReLU and $| o_{j}^{l} - o_{j}^{l^{*}} | = | m - 0 | = | m |$ for S-ReLU. The perturbations in the output of S-ReLU are smaller than that of ReLU because one of the conditions is $A > m$ . Then, $| m | < | A |$ .
Case 6: $A \leq m$ and $A + B \leq m$ → $| o_{j}^{l} - o_{j}^{l^{*}} |$ for both ReLU and S-ReLU because S-ReLU behaves the same as ReLU.

These results are summarized in Table A1 and show that the output of S-ReLU never exceeds that of ReLU. Therefore, the theorem is valid. □

Table A1. The difference between the outputs of a layer in a model on a clean sample and a sample injected by small perturbations under possible conditions.

Conditions	Output Difference
Conditions	ReLU	S-ReLU
$A \leq 0$ and $A + B > m$	$\| A - B \|$	$\| m \|$
$0 < A \leq m$ and $A + B > m$	$\| B \|$	$\| A - m \|$
$A > m$ and $A + B > m$	$\| B \|$	0
$A > m$ and $0 < A + B \leq m$	$\| B \|$	$\| B + A - m \|$
$A > m$ and $A + B \leq 0$	$\| A \|$	$\| m \|$
$A \leq m$ and $A + B \leq m$	Same

Appendix B. Proof of Corollary 1

Proof.

This corollary can be easily proven by the information in Table A1 summarized from the proof of Theorem 1. When m decreases, S-ReLU’s

| o_{j}^{l} - o_{j}^{l^{*}} |

also decreases or remains the same. For example, in case 3, suppose that m′ < m. Therefore,

| B + A - m' | < | B + A - m |

because

B + A \leq m

and

B + A \leq m'

according to the condition. □

References

Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
Sooksatra, K.; Hamerly, G.; Rivas, P. Is ReLU Adversarially Robust? In Proceedings of the LatinX in AI Workshop at ICML 2023, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar] [CrossRef]
Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533. [Google Scholar]
Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387. [Google Scholar]
Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box adversarial attacks with limited queries and information. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2137–2146. [Google Scholar]
Sooksatra, K.; Rivas, P. Enhancing Adversarial Examples on Deep Q Networks with Previous Information. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 5–7 December 2021; pp. 1–7. [Google Scholar]
Meng, D.; Chen, H. Magnet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147. [Google Scholar]
Samangouei, P.; Kabkab, M.; Chellappa, R. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Cohen, J.; Rosenfeld, E.; Kolter, Z. Certified adversarial robustness via randomized smoothing. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 1310–1320. [Google Scholar]
Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [Google Scholar]
Carlini, N.; Wagner, D. Defensive distillation is not robust to adversarial examples. arXiv 2016, arXiv:1607.04311. [Google Scholar]
Wong, E.; Kolter, Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5286–5295. [Google Scholar]
Pang, T.; Du, C.; Dong, Y.; Zhu, J. Towards robust detection of adversarial examples. Adv. Neural Inf. Process. Syst. 2018, 31, 4579–4589. [Google Scholar]
Liu, J.; Zhang, W.; Zhang, Y.; Hou, D.; Liu, Y.; Zha, H.; Yu, N. Detection based defense against adversarial examples from the steganalysis point of view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4825–4834. [Google Scholar]
Zhao, Z.; Chen, G.; Wang, J.; Yang, Y.; Song, F.; Sun, J. Attack as defense: Characterizing adversarial examples using robustness. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, 11–17 July 2021; pp. 42–55. [Google Scholar]
Nesti, F.; Biondi, A.; Buttazzo, G. Detecting adversarial examples by input transformations, defense perturbations, and voting. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1329–1341. [Google Scholar] [CrossRef] [PubMed]
Li, Q.; Chen, J.; He, K.; Zhang, Z.; Du, R.; She, J.; Wang, X. Model-agnostic Adversarial Example Detection via High-Frequency Amplification. In Computers & Security; Elsevier: Amsterdam, The Netherlands, 2024; p. 103791. [Google Scholar]
Rakin, A.S.; Yi, J.; Gong, B.; Fan, D. Defend deep neural networks against adversarial examples via fixed and dynamic quantized activation functions. arXiv 2018, arXiv:1807.06714. [Google Scholar]
Qian, H.; Wegman, M.N. L2-nonexpansive neural networks. arXiv 2018, arXiv:1802.07896. [Google Scholar]
Xie, C.; Wu, Y.; van der Maaten, L.; Yuille, A.L.; He, K. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 501–509. [Google Scholar]
Sehwag, V.; Wang, S.; Mittal, P.; Jana, S. Hydra: Pruning adversarially robust neural networks. Adv. Neural Inf. Process. Syst. 2020, 33, 19655–19666. [Google Scholar]
Wu, B.; Chen, J.; Cai, D.; He, X.; Gu, Q. Do wider neural networks really help adversarial robustness? Adv. Neural Inf. Process. Syst. 2021, 34, 7054–7067. [Google Scholar]
Chen, X.; Li, X.; Zhou, Y.; Yang, T. DDDM: A Brain-Inspired Framework for Robust Classification. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, Austria, 23–29 July 2022. [Google Scholar]
Bubeck, S.; Li, Y.; Nagaraj, D.M. A law of robustness for two-layers neural networks. In Proceedings of the Conference on Learning Theory, PMLR, Boulder, CO, USA, 15–19 August 2021; pp. 804–820. [Google Scholar]
Bubeck, S.; Sellke, M. A universal law of robustness via isoperimetry. Adv. Neural Inf. Process. Syst. 2021, 34, 28811–28822. [Google Scholar] [CrossRef]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
Le, Y.; Yang, X.S. Tiny ImageNet Visual Recognition Challenge. 2015. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Croce, F.; Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the International Conference on MACHINE Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 2206–2216. [Google Scholar]
Andriushchenko, M.; Croce, F.; Flammarion, N.; Hein, M. Square attack: A query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 484–501. [Google Scholar]
Zhang, H.; Yu, Y.; Jiao, J.; Xing, E.; El Ghaoui, L.; Jordan, M. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 7472–7482. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
Wang, Z.; Pang, T.; Du, C.; Lin, M.; Liu, W.; Yan, S. Better diffusion models further improve adversarial training. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 36246–36263. [Google Scholar]
Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the design space of diffusion-based generative models. Adv. Neural Inf. Process. Syst. 2022, 35, 26565–26577. [Google Scholar]

Figure 1. Adversarial example that misleads an image classifier to predict the image as a cat.

Figure 2. Denoised autoencoder for preprocessing of an adversarial example to create a clean/denoised sample. The solid line is the process with the autoencoder, and the dashed line is the process without the autoencoder.

Figure 3. Randomized smoothing method, where the most common predictions are picked as the output. In this example, four noises are generated by the noise generator.

Figure 4. Adversarial example detection technique where the detected samples are thrown away.

Figure 5. An example of S-ReLU with a max value of 2.

Figure 6. Examples of the MNIST dataset.

Figure 7. Examples of the CIFAR10 dataset.

Figure 8. Examples of the CIFAR100 dataset.

Figure 9. Examples of the TinyImagenet dataset.

Figure 10. Architecture of our approach with an added layer (in red) with D-ReLU before the output layer.

Figure 11. Accuracy of two types of networks on clean MNIST and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.

Figure 12. Accuracy of several types of networks on clean CIFAR10 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.

Figure 13. Accuracy of several types of CNNs on clean CIFAR10 and adversarial examples when adding a convolutional layer with a D-ReLU function after the input layer.

Figure 14. Accuracy of several types of networks on clean CIFAR100 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.

Figure 15. Accuracy of several types of networks on clean TinyImagenet and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.

Figure 16. Accuracy of several types of networks on clean CIFAR10 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.

Figure 17. Accuracy of several types of networks on clean CIFAR100 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.

Figure 18. Accuracy of several types of networks on clean TinyImagenet and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.

Figure 19. Accuracy of several types of networks on clean CIFAR10 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 20. Accuracy of several types of networks on clean CIFAR100 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 21. Accuracy of several types of networks on clean TinyImagenet and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 22. Accuracy of several types of networks on clean CIFAR10 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 23. Accuracy of several types of networks on clean CIFAR100 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 24. Accuracy of several types of networks on clean TinyImagenet and adversarial examples generated by black-box attacks when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.

Figure 25. Accuracy of several approaches on the CIFAR10 dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.

Figure 26. Accuracy of several approaches on the CIFAR100 dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.

Figure 27. Accuracy of several approaches on the TinyImagenet dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.

Table 1. Accuracy metrics for dense networks and shallow CNNs under various robust training schemes, evaluating them on both clean samples and adversarial examples generated by different attacks on the MNIST dataset. Note that AP_CE is APGD_CE, AP_DLR is APGD_DLR, the accuracy metrics in bold are the highest in a specific model among the different training methods, the numbers in parentheses are the ranks for training methods under an architecture, TRADES-k indicates the TRADES approach with

β = k