Article

Enhancing Adversarial Robustness through Stable Adversarial Training

1 Beijing Institute of Technology, Beijing 100081, China
2 Foreign Language Department, Information and Engineering University, Zhengzhou 450000, China
3 The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(10), 1363; https://doi.org/10.3390/sym16101363
Submission received: 4 September 2024 / Revised: 8 October 2024 / Accepted: 10 October 2024 / Published: 14 October 2024
(This article belongs to the Section Computer)

Abstract: Deep neural network models are vulnerable to adversarial attacks, such as gradient attacks: even small perturbations can cause significant differences in their predictions. Adversarial training (AT) aims to improve a model's adversarial robustness against gradient attacks by generating adversarial samples and optimizing the model's adversarial training objective. Existing methods mainly focus on improving robust accuracy, balancing natural and robust accuracy, and suppressing robust overfitting. They rarely approach the AT problem from the characteristics of deep neural networks themselves, such as their stability properties under certain conditions. From a mathematical perspective, deep neural networks with stable training processes may be better able to suppress overfitting, as their training is smoother and avoids sudden drops in performance. We provide a proof of the existence of Ulam stability for deep neural networks. Ulam stability not only establishes the existence of a solution for an operator inequality, but also provides an error bound between the exact and approximate solutions. The feature subspace of a deep neural network with Ulam stability can be accurately characterized and constrained by a function with special properties and a controlled error-boundary constant. This restricted feature subspace leads to a more stable training process. Based on these properties, we propose an adversarial training framework called Ulam stability adversarial training (US-AT). This framework can incorporate different Ulam stability conditions and benchmark AT models, optimize the construction of the optimal feature subspace, and consistently improve the model's robustness and training stability. US-AT is simple and easy to use, and it can be readily integrated with existing multi-class AT models, such as GradAlign and TRADES. Experimental results show that US-AT methods consistently improve the robust accuracy and training stability of benchmark models.

1. Introduction

Through continuous development, deep neural networks have been widely used in various fields, such as natural language processing, computer vision, signal processing, and speech recognition. Although deep neural networks have played a significant role in many application scenarios, they also have a weakness: they are prone to adversarial attacks, and even under small adversarial perturbations, the performance of a model may degrade. Adversarial attack techniques exploit the sensitivity of models to perturbations in the gradient direction to generate adversarial samples, causing significant degradation of model accuracy even under small perturbations and thus achieving the goal of attacking the model [1]. Conversely, adversarial training techniques generate adversarial samples for auxiliary training, and by training the model on these adversarial samples, the model acquires the ability to robustly recognize them. This problem can be defined as follows:
$$\min_{N} \mathbb{E}_{(x,y)\sim D} \max_{x+\delta \in B(x,\epsilon)} L(N, x+\delta, y),$$
where $N$ is a deep neural network, $\epsilon > 0$, $(x, y) \sim D$ denotes real samples and labels drawn from the distribution $D$, $B(x, \epsilon) := \{x + \delta : \|\delta\| \le \epsilon\}$, and $L$ is an objective function.
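To make the inner maximization above concrete, the following minimal PyTorch sketch (our own naming and default hyperparameters) approximates it with projected gradient ascent; the clamp projects the perturbation back onto $B(x, \epsilon)$.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate the inner max: find a perturbation delta with
    ||delta||_inf <= eps that maximizes the loss, by projected gradient ascent."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # ascend along the gradient sign, then project back onto the l_inf ball
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # keep the adversarial input in the valid pixel range
    return (x + delta).clamp(0, 1).detach()
```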
Many solutions have emerged to address this issue, and they continue to evolve in different directions. Goodfellow et al. [2] proposed the fast gradient sign method (FGSM) to generate adversarial attack samples, which gave rise to an important class of adversarial training techniques. Kurakin et al. [3] introduced a basic iterative method by directly generalizing the FGSM. Maini et al. [4] combined several AT models based on Projected Gradient Descent (PGD) to obtain a complete AT method that is robust against different attacks simultaneously. Traditional metric learning has been introduced to improve robust accuracy [5]. The GradAlign method, which uses gradient alignment over perturbation sets, was proposed to address the phenomenon of catastrophic overfitting [6]. Herrmann et al. [7] introduced the pyramid adversarial training (PyramidAT) method to improve the natural accuracy of vision transformers (ViTs). Li et al. [8] constructed a class of adversarial training methods for pre-trained ViT models that balance natural accuracy and robust accuracy. In typical application scenarios, Sun et al. [9] considered the problem of deep-neural-network-based signal individual classification under adversarial attacks and showed the effectiveness of the model against attacks in different scenarios. In paper [10], the authors provided a communication signal adversarial attack model using Minimum Power Adversarial Attacks (PAAs) and achieved a better attack effect on their simulation dataset. Consistency regularization techniques have been introduced into adversarial training models to suppress robust overfitting [11,12].
Although more and more methods can alleviate the challenge of adversarial attacks to some extent, these methods typically focus on specific targets, such as improving robust accuracy, balancing robust accuracy with natural accuracy, or accelerating training. Each method is based on its own specific observations and theoretical guarantees, and each continuously improves the model's adversarial robustness using different optimization strategies. These different perspectives have greatly enriched the progress of adversarial robustness technology, which is also the main reason why the adversarial defense community remains active. In this article, we consider adversarial defense from the perspective of stability. We theoretically establish the internal relationship between a special type of Ulam stability and the adversarial robustness of neural network models, and we experimentally validate the effectiveness of improved adversarial defense methods based on Ulam stability constraints. The Ulam stability constraint can limit the feature subspace of the network to a bounded region with special functional properties [13,14]. In this optimal special subspace, adversarial-learning-based neural network models can achieve relatively stable optimization. Our main contributions in this article include the following aspects:
(1)
We present a proof of the Ulam stability theorem for deep neural networks. Through the theorem, the Ulam stability not only accurately characterizes the boundaries of feature subspaces for deep neural networks, but it also provides a new theoretical perspective for finding the optimal feature subspace for adversarial training.
(2)
We present a Ulam-stability-based adversarial training method (US-AT) that can be seamlessly integrated with existing classical methods to further strengthen the robustness of benchmark models against adversarial attacks. Our US-AT framework utilizes various Ulam stability conditions to construct bounded optimal feature subspaces with special properties, thereby enhancing the stability and robustness of adversarial training. Additionally, this approach allows for the combination of different types of adversarial training strategies, such as regularization and data augmentation, to further enhance the model’s ability to withstand adversarial attacks.
(3)
The experimental results demonstrate that the US-AT method can not only enhance the robust accuracy of the model, but it also improves its training stability. Furthermore, the combination of this method with multiple types of models further confirms the strong compatibility between the Ulam stability framework and other adversarial training methods.

2. Related Work

2.1. Adversarial Training

Adversarial training is an important technique in the field of intelligent security, which can effectively resist the accuracy degradation of models caused by adversarial attacks. Zhang et al. [15] gave the tightest possible upper bound, uniform over all possible distributions and measurable predictors, using classification-calibration theory, and they introduced a new AT method (TRADES) to balance natural accuracy and robust accuracy. Huang et al. [16] presented a new study on the robustness of residual-type deep neural networks from the perspective of network architecture, including factors such as topology, kernel size, activation, and normalization. Bai et al. [17] discussed the general problems and challenges in the field of adversarial training. Using a random initialization method, Wong et al. [18] proposed a modified fast gradient sign method (FGSM) that was as effective as PGD-type methods but had lower computational costs. Shafahi et al. [19] proposed a free adversarial training algorithm that reuses gradient information and reduces the computational cost caused by the generation of adversarial samples. Li et al. [20] presented a subspace adversarial training (Sub-AT) method, which constrains the adversarial training model to a specific subspace to improve robustness. Liu et al. [21] proposed the mutual adversarial training (MAT) method, in which multiple models are trained together and share their knowledge of adversarial examples to achieve improved robustness. Due to the significant structural differences between ViTs and CNNs, Gong et al. [22] proposed the Random Entangled Tokens (ReiT) technique to improve the robust accuracy of ViT models.

2.2. Robustness Accuracy and Robustness Overfitting

Robust accuracy refers to the recognition accuracy of deep neural networks under adversarial attacks, such as gradient-based attacks. The adversarial training strategy based on adversarially generated samples is an effective defense technique. Rice et al. [23] proposed an adversarial training strategy based on early stopping to address the phenomenon of overfitting in adversarial training. Li et al. [20] proposed a subspace adversarial training model that utilizes the intrinsic relationship between gradient growth and robust overfitting, achieving good results. By analyzing the difference in distribution between weak and strong adversarial approaches, Yu et al. [24] proposed a minimum-loss-constrained adversarial training model (MLCAT) that significantly enhances the robust accuracy of the AT model. Tang et al. [25] presented the Test-Time Pixel-Level Adversarial Purification (TPAP) method, a novel defense strategy that uses an FGSM robust-overfitting network and adversarial purification at the testing phase for robust defense against unknown adversarial attacks. A continual adversarial defense model with anisotropic and isotropic pseudo-replay was proposed to solve the problem of catastrophic forgetting [26]. Some work on adversarial robustness involves pre-trained vision-language models and multi-modal models [27,28].

3. Ulam Stability Adversarial Training

3.1. Some Results on Ulam Stability of Neural Networks

Ulam stability is an important research topic in nonlinear functional analysis; it characterizes the stability and existence of solutions of functional equations in abstract Banach spaces [29,30,31]. If an operator equation is Ulam stable, there exists a mapping (solution) whose distance from a given approximate solution is no more than an upper bound determined by the Ulam constant. This property can be used to define the boundary of the latent feature subspace of deep neural networks: the subspace of a deep neural network satisfying Ulam stability is confined, by a certain function, to a region within a certain error. In papers [13,14], the authors introduced Ulam stability into deep neural networks to solve domain adaptation problems. This application of Ulam stability in domain adaptation motivated us to explore its potential for deeper problems in deep learning. In classical functional analysis, a standard method for proving Ulam stability is the perturbation method, in which the operator equation has an exact solution under small perturbations, and the error between the exact and approximate solutions can be controlled by a bounded control function. Adversarial training can use this to improve the stability of models. In this paper, we aim to equip the network with a similar ability to control stability, which makes the model more robust under adversarial training. Below, we provide the necessary definitions and theorem for this paper.
Definition 1. 
Let $x = (x_1, x_2, \ldots, x_n) \in X \subseteq \mathbb{R}^n$ be an element of a normed space X. The $l_\infty$-norm of x is defined as
$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|,$$
where $|\cdot|$ is the absolute value function.
Definition 2. 
Let $x = (x_{ij}) \in X \subseteq \mathbb{R}^{m \times n}$ be an element of a normed space X. The $l_\infty$-norm of x is defined as
$$\|x\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |x_{ij}|,$$
where $|\cdot|$ is the absolute value function.
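As a quick numerical illustration of Definitions 1 and 2 (the example values are our own):

```python
import torch

x = torch.tensor([1.0, -3.0, 2.0])
vec_norm = x.abs().max()             # Definition 1: max_i |x_i| = 3.0

A = torch.tensor([[1.0, -2.0],
                  [0.5,  0.5]])
mat_norm = A.abs().sum(dim=1).max()  # Definition 2: max row sum = 3.0
```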
Definition 3 
((Generalized Ulam Stability) [29]). Let X and Y both be Banach spaces (or Euclidean spaces) equipped with the norm $\|\cdot\|$. Set functions $\psi, \phi: X \to \mathbb{R}_+$. Assume that $f: X \to Y$ is a function and O is an operator. If
$$\|O[f](x)\| < \psi(x), \quad \forall x \in X,$$
then there exists a mapping $g: X \to Y$ such that
$$O[g](x) = 0$$
and
$$\|f(x) - g(x)\| \le \phi(x), \quad \forall x \in X.$$
We say that the operator O is generalized Ulam stable.
Remark 1. 
Definition 3 defines the generalized Ulam stability of abstract functional spaces. For the subspace composed of deep neural networks, we hope that deep neural networks likewise have good properties similar to such functionals, or have good approximation properties with quantifiable boundaries that can be estimated.
In the following analysis, we will prove that if a deep neural network is Ulam stable, it can be approximated by a good function, and its boundaries can be estimated by that function.
If $\psi(x) = \phi(x) = \delta$, the operator is simply called Ulam stable. Let O be an abstract operator. We provide definitions for the five types of inequality constraints considered in this article.
Definition 4. 
The family of Ulam stability conditions $\mathcal{F}$ is defined as follows.
(H1) The $\delta$-additive transformation is defined as
$$\|O[f](x, y)\| := \|f(x + y) - f(x) - f(y)\| < \delta,$$
where $x, y \in X$ and X is a Banach space.
(H2) The $\delta$-quadratic transformation is defined as
$$\|O[f](x, y)\| := \|f(x + y) + f(x - y) - 2f(x) - 2f(y)\| < \delta,$$
where $x, y \in X$ and X is a Banach space.
(H3) The $\delta$-isometric transformation is defined as
$$|O[f](x, y)| := \big|\,\|f(x) - f(y)\| - \|x - y\|\,\big| < \delta,$$
where $x, y \in X$ and X is a Euclidean vector space or a Hilbert space.
(H4) The $\delta$-Hosszù transformation is defined as
$$\|O[f](x, y)\| := \|f(x + y - xy) + f(xy) - f(x) - f(y)\| < \delta,$$
where $x, y \in X$ and X is a Banach space.
(H5) The $\delta$-Jensen transformation is defined as
$$\|O[f](x, y)\| := \|f(x/2 + y/2) - f(x)/2 - f(y)/2\| < \delta,$$
where $x, y \in X$ and X is a Banach space.
Remark 2. 
Conditions (H1)–(H5) are only a subset of the Ulam stability conditions. It is easy to verify that (H1) represents the family of approximately additive functions, (H2) the family of approximately quadratic functions, (H3) the family of approximately isometric functions, (H4) the family of approximate solutions of a composite functional equation, and (H5) the family of approximately convex (Jensen) functions.
In fact, conditions involving exponential functions, logarithmic functions, and hyperfunctions may also have good properties. There are many stability conditions in the sense of Ulam stability, which can be expressed not only in the form of differences but also in differential, integral, and discretized forms. For convenience of form, this article tests the five stability conditions above. Assume that f is a deep neural network; the operator O is symmetric ($O[f](x, y) = O[f](y, x)$) in definitions (H1), (H2), (H3), and (H5), but it is not necessarily symmetric in (H4) when $xy \neq yx$.
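Operationally, each of (H1)–(H5) defines a residual $O[f](x, y)$ that can be evaluated on a pair of inputs and penalized during training. A minimal sketch (our own naming; f is any network whose input and output shapes allow these algebraic combinations, and the product in (H4) is read elementwise) might look as follows:

```python
import torch

def ulam_residual(f, x, y, cond="H5"):
    """Residual O[f](x, y) for the Ulam stability conditions (H1)-(H5)."""
    if cond == "H1":    # delta-additive: f(x+y) - f(x) - f(y)
        r = f(x + y) - f(x) - f(y)
    elif cond == "H2":  # delta-quadratic: f(x+y) + f(x-y) - 2f(x) - 2f(y)
        r = f(x + y) + f(x - y) - 2 * f(x) - 2 * f(y)
    elif cond == "H3":  # delta-isometric: | ||f(x)-f(y)|| - ||x-y|| |
        return (torch.norm(f(x) - f(y)) - torch.norm(x - y)).abs()
    elif cond == "H4":  # delta-Hosszù, with xy taken elementwise
        r = f(x + y - x * y) + f(x * y) - f(x) - f(y)
    elif cond == "H5":  # delta-Jensen: f(x/2 + y/2) - f(x)/2 - f(y)/2
        r = f(x / 2 + y / 2) - f(x) / 2 - f(y) / 2
    else:
        raise ValueError(cond)
    return torch.norm(r)
```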
Theorem 1. 
If a deep neural network N satisfies any of the conditions (H1)–(H5) in Definition 4, then N is generalized Ulam stable, and there is a solution $N_\infty: X \to X$ satisfying Equation (5) and Inequality (6).
Proof. 
The stabilities under conditions (H1)–(H5) can be proven through direct construction methods [32], and the proof processes are quite similar; only the constructions differ. Below, we provide a proof of Ulam stability under the approximate isometry condition (H3); the other conditions can be obtained through similar proofs. The proof is divided into four parts:
(1) The first step is to prove the following conclusion. Let X be an abstract Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$. Let $N(x)$ be a $\delta$-isometry with $N(0) = 0$. Then the limit $N_\infty(x) = \lim_{n\to\infty} N(2^n x)/2^n$ exists for every x in X, and $N_\infty$ is an isometry.
We have
$$\big|\,\|N(x)\| - \|x\|\,\big| < \delta$$
and
$$\big|\,\|N(x) - N(2x)\| - \|x\|\,\big| < \delta.$$
Let $y_0 = N(2x)/2$; then $\big|\,\|x\| - \|y_0\|\,\big| < \delta/2$. Assume that $S_1 = \{y : \|y\| < \|x\| + \delta\}$ and $S_2 = \{y : \|y - 2y_0\| < \|x\| + \delta\}$. Then $N(x)$ lies in the intersection, and for any point y of $S_1 \cap S_2$ we can obtain
$$2\|y - y_0\|^2 = 2\|y\|^2 + 2\|y_0\|^2 - 4\langle y, y_0 \rangle,$$
$$\|y - 2y_0\|^2 = \|y\|^2 + 4\|y_0\|^2 - 4\langle y, y_0 \rangle < (\|x\| + \delta)^2,$$
and $\|y\|^2 < (\|x\| + \delta)^2$.
Then, we have
$$2\|y - y_0\|^2 < (\|x\| + \delta)^2 + \|y\|^2 - 2\|y_0\|^2 < 2(\|x\| + \delta)^2 - 2\|y_0\|^2 < 2(\|x\| + \delta)^2 - 2\left(\|x\| - \frac{\delta}{2}\right)^2 = 6\delta\|x\| + \frac{3}{2}\delta^2.$$
If $\|x\| \ge \delta$, then $\left\|N(x) - \frac{N(2x)}{2}\right\| < 2(\delta\|x\|)^{1/2}$. If $\|x\| < \delta$, we have $\left\|N(x) - \frac{N(2x)}{2}\right\| < 2\delta$.
The following inequality
$$\left\|N\!\left(\frac{x}{2}\right) - \frac{N(x)}{2}\right\| < 2^{-1/2} k \|x\|^{1/2} + 2\delta$$
is satisfied, where $k = 2\delta^{1/2}$ and $x \in X$.
By mathematical induction, we can obtain the following inequality:
$$\left\|\frac{N(2^n x)}{2^n} - N(x)\right\| < 2^{-n/2} k \|x\|^{1/2} \sum_{i=0}^{n-1} 2^{i/2} + 4\delta.$$
If $m, p \in \mathbb{Z}_+$, then we have
$$\left\|2^{-m} N(2^m x) - 2^{-m-p} N(2^{m+p} x)\right\| = 2^{-m}\left\|N\!\left(\frac{2^{m+p} x}{2^p}\right) - \frac{N(2^{m+p} x)}{2^p}\right\| < 2^{-m/2} a \|x\|^{1/2} + 2^{2-m}\delta,$$
where $a = k \sum_{i=0}^{\infty} 2^{-i/2}$ and $x \in X$.
Since X is a Hilbert space, the limit
$$N_\infty(x) = \lim_{n\to\infty} \frac{N(2^n x)}{2^n}$$
exists for $x \in X$. It is easy to verify that there exists a set $A \subseteq X$ such that $N_\infty(x) = 0$ for $x \in A$. We call A a completely nonlinear set, on which the neural network N is bounded. Since $N_\infty(x) = 0$ is trivial in the sense of Ulam stability, in the following proof we only consider the set $X' := X \setminus A$.
Then, for $x, y \in X'$, the equation
$$\|N_\infty(x) - N_\infty(y)\| = \|x - y\|$$
follows easily from
$$\big|\,\|N(2^n x) - N(2^n y)\| - 2^n \|x - y\|\,\big| < \delta.$$
(2) The second step is to prove the following conclusion. If u and x are any points of X such that $\|u\| = 1$ and $\langle x, u \rangle = 0$, then $\langle N_\infty(x), N_\infty(u) \rangle \le 3\delta$.
For an arbitrary integer n, put $z = 2^n u$. Let $y \in S_n := B(z, 2^n)$; then $\|y - z\| = \|z\|$. It follows that $\langle y, u \rangle = 2^{-n-1} \langle y, y \rangle$. Since N is a $\delta$-isometry, we have
$$\|N(y) - N(z)\| = \|N(z)\| + \eta_{y,z},$$
where $|\eta_{y,z}| < 2\delta$.
Squaring and dividing by $2^{n+1}$, we obtain the equality
$$\left\langle N(y), \frac{N(2^n u)}{2^n} \right\rangle = \frac{1}{2^{n+1}}\left(\langle N(y), N(y) \rangle - \eta_{y,z}^2\right) - \eta_{y,z}\left\|\frac{N(2^n u)}{2^n}\right\|.$$
Set $x \in X$ such that $\langle x, u \rangle = 0$. Then $y = x + ru$, where $r = 2^n - (2^{2n} - \|x\|^2)^{1/2}$, is a point of the sphere $S_n$: we have $\|y - z\|^2 = \|z\|^2$. Moreover, $\|y - x\| = r \to 0$ as $n \to \infty$. The limit $t = \lim_{n\to\infty} N(2^n u)/2^n$ exists and is a unit vector. Then we can obtain
$$\langle N(x), t \rangle \le \left\langle N(x), t - \frac{N(2^n u)}{2^n}\right\rangle + \left\langle N(y), \frac{N(2^n u)}{2^n}\right\rangle + \left\langle N(x) - N(y), \frac{N(2^n u)}{2^n}\right\rangle < \epsilon + 3\delta(1 + \epsilon),$$
where $\epsilon$ is an arbitrary positive number. It follows that
$$\langle N(x), N_\infty(u) \rangle = \langle N(x), t \rangle \le 3\delta.$$
(3) The third step is to prove the following conclusion. If $N(X) = X$, then $N_\infty(X) = X$.
For $z \in X$, let $N^{-1}(z)$ denote any point whose N-image is z. We call $N^{-1}$ the ideal decoder of the neural network N. Then $N^{-1}$ is a $\delta$-isometric mapping, the limit $N^*(z) = \lim_{n\to\infty} N^{-1}(2^n z)/2^n$ exists, and the mapping $N^*$ is also an isometry on X. We have
$$\left\|2^n z - N(2^n N^*(z))\right\| < \left\|N^{-1}(2^n z) - 2^n N^*(z)\right\| + \delta.$$
Dividing by $2^n$ and letting $n \to \infty$, we obtain $z = N_\infty(N^*(z))$. Therefore, $N_\infty(X) = X$. Moreover, $N_\infty$ is surjective and linear by the classic Mazur–Ulam theorem.
(4) Finally, for $x \in X$, we only need to prove that the inequality
$$\|N(x) - N_\infty(x)\| < 10\delta$$
holds.
For any $x \neq 0$, assume that M is the linear manifold orthogonal to x. Then $N_\infty$ is an isometric transformation mapping X onto the whole of X. Hence, $N_\infty(M)$ is the linear manifold orthogonal to $N_\infty(x)$. Let w be the projection of $N(x)$ on $N_\infty(M)$. If $w = 0$, set $t = 0$; otherwise, let $t = w / \|w\|$. The inequality $\langle N(x), t \rangle \le 3\delta$ holds. Set $\nu = N_\infty(x) / \|x\|$. Then $\nu$ is a unit vector orthogonal to t and coplanar with $N(x)$ and t. Using the Pythagorean theorem, one can obtain
$$\|N(x) - N_\infty(x)\|^2 = \langle N(x), t \rangle^2 + \big(\|x\| - \langle N(x), \nu \rangle\big)^2.$$
Let $z_n = 2^n x$. If the projection $\mu_n$ of $N(z_n)$ on $N_\infty(M)$ is not zero, then we set $t_n = \mu_n / \|\mu_n\|$; otherwise, let $t_n = 0$. In both cases $\langle t_n, \nu \rangle = 0$ and $\langle N(z_n), t_n \rangle \le 3\delta$. If $\|N(z_n)\| < 3\delta$, then
$$\|N(z_n)\| - \langle N(z_n), \nu \rangle \le 3\delta.$$
If $\|N(z_n)\| \ge 3\delta$, then we have
$$0 \le \|N(z_n)\| - \langle N(z_n), \nu \rangle = \|N(z_n)\| - \big(\|N(z_n)\|^2 - \langle N(z_n), t_n \rangle^2\big)^{1/2} \le 3\delta.$$
Hence, the inequality
$$\big|\,\|z_n\| - \langle N(z_n), \nu \rangle\,\big| < 4\delta$$
holds, since $\|z_n\| < \|N(z_n)\| + \delta$.
Consider the following two situations. If $\langle N(x), \nu \rangle \ge 0$, set $n = 0$ in (23) and (21). Then we have $\|N(x) - N_\infty(x)\| < 5\delta$. If $\langle N(x), \nu \rangle < 0$, then for some integer $m \ge 0$ we must have $\langle N(z_m), \nu \rangle < 0$ and $\langle N(2z_m), \nu \rangle \ge 0$, since $\langle N_\infty(x), \nu \rangle$ is positive and $N_\infty(x) = \lim_{n\to\infty} N(z_n)/2^n$. Hence, we have
$$\|N(2z_m) - N(z_m)\| \ge \langle N(2z_m), \nu \rangle - \langle N(z_m), \nu \rangle > 3\|z_m\| - 8\delta.$$
But we know that $\|N(2z_m) - N(z_m)\| < \|z_m\| + \delta$; then we have $\|x\| \le \|z_m\| < (9/2)\delta$, and $\|N(x) - N_\infty(x)\| < 2\|x\| + \delta \le 10\delta$ for $x \in X$. □
Remark 3. 
In fact, slightly changing the conditions in Theorem 1 can lead to stronger (or weaker) conclusions about hyperstability (or weak stability). Hyperstable neural networks have stronger control boundaries and significantly different asymptotic properties; we leave their study as future work.
If the operator O acts on the neural network, then by Theorem 1, we can obtain the following corollary:
Corollary 1. 
Assume N is a deep neural network satisfying Theorem 1. Then there exist a function $N_\infty$ and a control function $\phi: X \to \mathbb{R}_+$ such that $N(x) \in B(N_\infty(x), \phi(x))$, $\forall x \in X$.
Proof. 
(I) If O[N] satisfies condition (H1), then there exists an additive function A such that $\|N(x) - A(x)\| \le \phi(x)$, $\forall x \in X$. In this case, the boundary of the range of N is fixed by the additive mapping A and a sphere with bounded perturbation $\phi(x)$; that is, $N(x) \in B(A(x), \phi(x))$. We call a neural network satisfying condition (H1) an approximately additive neural network.
(II) If O[N] satisfies condition (H2), then there is a quadratic function Q such that $\|N(x) - Q(x)\| \le \phi(x)$, $\forall x \in X$. In this case, the boundary of the range of N is fixed by the quadratic mapping Q and a sphere with bounded perturbation $\phi(x)$; that is, $N(x) \in B(Q(x), \phi(x))$. We call a neural network satisfying condition (H2) an approximately quadratic neural network.
(III) If O[N] satisfies condition (H3), then there is an isometric function I such that $\|N(x) - I(x)\| \le \phi(x)$, $\forall x \in X$. In this case, the boundary of the range of N is fixed by the isometric mapping I and a sphere with bounded perturbation $\phi(x)$; that is, $N(x) \in B(I(x), \phi(x))$. We call a neural network satisfying condition (H3) an approximately isometric neural network.
(IV) If O[N] satisfies condition (H4), then there is a Hosszù function H such that $\|N(x) - H(x)\| \le \phi(x)$, $\forall x \in X$. In this case, the boundary of the range of N is fixed by the Hosszù mapping H and a sphere with bounded perturbation $\phi(x)$; that is, $N(x) \in B(H(x), \phi(x))$. We call a neural network satisfying condition (H4) an approximately Hosszù neural network.
(V) If O[N] satisfies condition (H5), then there is a Jensen function J such that $\|N(x) - J(x)\| \le \phi(x)$, $\forall x \in X$. In this case, the boundary of the range of N is fixed by the Jensen mapping J and a sphere with bounded perturbation $\phi(x)$; that is, $N(x) \in B(J(x), \phi(x))$. We call a neural network satisfying condition (H5) an approximately Jensen neural network.  □

3.2. Ulam Stability Adversarial Training

In this section, we establish the connection between Ulam stability theory and adversarial training. Associated with different Ulam stability conditions, one can induce optimal feature subspaces with different properties, in which the adversarial training model is more stable and robust.
Assume that $x_{adv} \in X_{adv}$ is an adversarial sample associated with a sample $x \in X$. It is easy to see that $X_{adv} \subseteq B(X, \epsilon)$, where $B(X, \epsilon) := \bigcup_{x \in X}\{x + \delta : \|\delta\| \le \epsilon\}$. Let N be a deep neural network. Furthermore, assume that a benchmark adversarial training model is defined in the following form:
$$\min_N \mathbb{E}_{(x,y)\sim D} \max_{x_{adv} \in B(x,\epsilon)} L_{AT}(N, x, x_{adv}, y),$$
where $L_{AT}$ is an objective function, $x, y \in X$, $x_{adv} \in X_{adv}$, and $\epsilon > 0$.
Under the framework of the adversarial training model (25), we introduce the following unconstrained optimization problem (26) and define the corresponding Ulam stability objective function $L_{US}$:
$$\min_N L_{US}(N, x, x_{adv}) := \|O[N](x, x_{adv})\|,$$
where O is an Ulam stability condition, $x \in X$, and $x_{adv} \in X_{adv}$. In this article, we focus on conditions (H1)–(H5). From the optimization problem (26), it can be observed that if $\|O[N](x, x_{adv})\| < \delta$, the abstract operator O is generalized Ulam stable, and the error boundary of the range of the neural network $N(x)$ can be estimated by a mapping $N_\infty(x)$ with certain properties. At this point, the neural network N can effectively resist attacks from the gradient direction. Our improved Ulam stability adversarial training (US-AT) thus becomes a joint optimization of problems (25) and (26).
If $L_{AT}$ is a regular-type loss function, then the overall loss function $L_{Total}$ of our newly constructed US-AT can be defined as
$$L_{Total} = L_{AT} + \lambda_1 \cdot L_{US},$$
where $x \in X$, $x_{adv} \in X_{adv}$, and $\lambda_1 > 0$.
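A minimal training-step sketch of this joint objective, reusing the pgd_attack and ulam_residual helpers from the earlier sketches and assuming a plain cross-entropy $L_{AT}$ (the weight $\lambda_1 = 0.1$ is one of the values explored in Table 7, not a prescribed default):

```python
import torch.nn.functional as F

def us_at_step(model, x, y, optimizer, lambda1=0.1, cond="H5"):
    x_adv = pgd_attack(model, x, y)                 # inner maximization of (25)
    loss_at = F.cross_entropy(model(x_adv), y)      # benchmark AT loss L_AT
    loss_us = ulam_residual(model, x, x_adv, cond)  # Ulam stability loss L_US of (26)
    loss = loss_at + lambda1 * loss_us              # total US-AT loss L_Total
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```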
The US-AT does not define its own method for generating adversarial samples; it relies on an adversarial training loss function $L_{AT}$. Therefore, US-AT is not an independent adversarial training model but rather a strategy for enhancing or improving adversarial training that further improves the performance of the benchmark model. However, this raises a natural question: are the Ulam-stability-induced loss function $L_{US}$ and $L_{AT}$ compatible? That is, is the optimal feature subspace induced by the Ulam condition also suitable for adversarial training?
We will theoretically analyze and experimentally demonstrate that not all the Ulam stability conditions are compatible with the AT. However, some of these different Ulam stability conditions with induced feature subspaces can effectively enhance the AT ability of the benchmark model, not only improving its stability but also enhancing its robust accuracy.
In theory, if condition (H1) holds, the stability is called first-order additive stability. If $x \in \mathbb{R}$, then the feature space of N can be represented by an additive function $A(x) = ax + b$; that is,
$$ax + b - \delta \le N(x) \le ax + b + \delta,$$
where a and b are fixed constants. In this case, N has strong linearity and is susceptible to gradient attack methods. If condition (H2) holds, the stability is called quadratic stability. If $x \in \mathbb{R}$, then the feature space of N can be represented by a quadratic function $Q(x) = ax^2 + bx + c$; that is,
$$ax^2 + bx + c - d\delta \le N(x) \le ax^2 + bx + c + d\delta,$$
where a, b, c, and d are fixed constants. In this case, N behaves similarly to a polynomial function and is easily influenced by gradient attack methods. In both cases, the feature subspace is constrained within a smooth banded region induced by a polynomial function. This makes the network behave like a polynomial, yielding a smoother gradient, leaving it vulnerable to attacks, and making it difficult to combine with other types of gradient adversarial defense methods. However, if conditions (H3)–(H5) hold, the models are restricted to the strip regions of the corresponding isometric function, solution of the Hosszù equation, and convex function, respectively.

3.3. Model Enhancement for US-AT

Although the US-AT model can optimize the feature subspace of the neural network so that the adversarial training model gains stronger adversarial robustness, the lack of an optimized design for the adversarial training strategy functions limits the potential improvement of the model. Therefore, in this section, we further enhance the adversarial robustness of the US-AT method.

3.3.1. Model Augmentation Method Based on Algebraic Operation

Revisiting conditions (H1)–(H5), one can see from their definitions that, when these conditions are applied to the US-AT model, auxiliary sample data always appear in conditions (H1), (H2), (H4), and (H5). That is, the loss is computed from combinations of the sample x and the adversarial sample y, such as $x + y$ in condition (H1) and $x + y$ as well as $x - y$ in condition (H2). Condition (H3), by contrast, does not introduce any new auxiliary samples. In order to fully utilize adversarial samples, we propose a new enhanced adversarial training method.
Taking condition (H5) as an example, we improve the US-AT model using the newly induced auxiliary sample data $x/2 + x_{adv}/2$. Let $x \in X$ be a sample and $y = x_{adv} \in X_{adv}$ be an adversarial sample associated with x. Assume that N is a well-trained neural network. Then $x_{aux} = \frac{x + x_{adv}}{2}$ naturally becomes auxiliary data and, in a sense, a new adversarial sample. Similar to the GAT method, we introduce the following loss function to suppress differences between samples and improve the model's adaptability to adversarial samples:
$$L_{aus} = \|\mathrm{softmax}(N(x)) - \mathrm{softmax}(N(x_{adv}))\| + \|\mathrm{softmax}(N(x_{aux})) - \mathrm{softmax}(N(x_{adv}))\| + \|\mathrm{softmax}(N(x)) - \mathrm{softmax}(N(x_{aux}))\|,$$
where $\|\cdot\|$ is a matrix norm. The overall loss function of the improved model therefore becomes
$$L_{Total_1} = L_{AT} + \lambda_1 \cdot L_{US} + \lambda_2 \cdot L_{aus},$$
where $\lambda_1, \lambda_2 \ge 0$. We call this method the US-AT-1 type method.
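A sketch of the auxiliary loss $L_{aus}$ under the same assumptions as the earlier sketches (softmax outputs compared with the Frobenius norm; names are ours):

```python
import torch
import torch.nn.functional as F

def aux_loss(model, x, x_adv):
    """L_aus: pairwise output differences between the clean sample,
    the adversarial sample, and the Jensen-midpoint auxiliary sample."""
    x_aux = (x + x_adv) / 2  # newly induced auxiliary (adversarial) sample
    p = F.softmax(model(x), dim=1)
    p_adv = F.softmax(model(x_adv), dim=1)
    p_aux = F.softmax(model(x_aux), dim=1)
    return (torch.norm(p - p_adv) + torch.norm(p_aux - p_adv)
            + torch.norm(p - p_aux))

# US-AT-1 total loss:
# loss = loss_at + lambda1 * ulam_residual(model, x, x_adv, "H5") \
#        + lambda2 * aux_loss(model, x, x_adv)
```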

3.3.2. Model Enhancement Method Based on Regularization

In this section, we combine specific types of regularization techniques with Ulam stability methods to further strengthen the model's ability to defend against adversarial attacks. In papers [11,12], the authors utilized regularization and data augmentation methods to improve the model's robustness against adversarial attacks. Building on this, we conducted additional experiments to assess the impact of consistency regularization under Ulam stability conditions on the robust accuracy of adversarial training. The consistency regularization loss function, as described in [11], can be expressed as follows:
$$L_{CR} = \frac{1}{2}\big(L_{AT}(x_1, y_1, N) + L_{AT}(x_2, y_2, N)\big) + \lambda \cdot JS\big(\mathrm{softmax}(N(x_1)), \mathrm{softmax}(N(x_2))\big),$$
where $y_1, y_2 \in X_{adv}$ are a pair of adversarial samples associated with $x_1, x_2 \in X$, $\lambda \ge 0$, $L_{AT}$ is an adversarial training loss function, and JS is the Jensen–Shannon divergence. After introducing the defined Ulam stability constraint, the total objective (loss) function of the improved regularized AT method based on Ulam stability is defined as
$$L_{Total_2} = L_{CR} + \lambda_1 \cdot L_{US},$$
where $\lambda_1 > 0$. We call this method the US-AT-2 type method.
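A sketch of the consistency term, with a small-epsilon-stabilized Jensen–Shannon divergence (our own implementation of JS; the pairing of clean and adversarial views follows the notation of $L_{CR}$ above):

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two batches of softmax distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps).log() - (b + eps).log())).sum(dim=1)
    return (0.5 * (kl(p, m) + kl(q, m))).mean()

def cr_loss(model, x1, x2, y1_adv, y2_adv, labels, lam=1.0):
    """L_CR: averaged AT losses on the two adversarial samples plus
    a JS consistency term between the two views."""
    l_at = 0.5 * (F.cross_entropy(model(y1_adv), labels)
                  + F.cross_entropy(model(y2_adv), labels))
    p1 = F.softmax(model(x1), dim=1)
    p2 = F.softmax(model(x2), dim=1)
    return l_at + lam * js_divergence(p1, p2)

# US-AT-2 total loss:
# loss = cr_loss(...) + lambda1 * ulam_residual(model, x1, y1_adv, "H5")
```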

4. Experiments

4.1. Datasets

CIFAR-10 Dataset. The CIFAR-10 dataset consists of 60,000 images in 10 classes, including 6000 images per class [33]. It includes 50,000 training samples and 10,000 test samples. Each sample is a 3 × 32 × 32 color image.
CIFAR-100 Dataset. CIFAR-100 consists of 100 categories (divided into 20 superclasses, each containing five classes), with 600 images per class (500 training and 100 test images) [33].
RADIOML 2016.04C Dataset (2016.04C) and RADIOML 2016.10A Dataset (2016.10A). These two datasets are commonly used for modulation classification and recognition in signal processing; the 10A version is a regularized version of 04C. The data contain 11 types of modulation signals, with signal-to-noise ratios ranging from −18 to 20. Data noise includes frequency offset, multipath effects, etc. [34].
Modulation Recognition Simulation Dataset (MSR). This dataset was simulated and generated using radio simulation software (GNU Radio) in paper [13]. Each sample is an IQ signal with a length of 128 sampling points. The dataset contains 11 commonly used modulation styles, and the numbers of training, validation, and test samples are 55,000, 13,200, and 22,000, respectively.

4.2. Implementation Details

Our US-AT method was trained on four NVIDIA P100 GPUs and two NVIDIA A5000 GPUs, using PyTorch 2.1.2+cu121. The models in the following experiments all adopted the PreActResnet18 and PreActResnet34 network structures without pre-training. The learning rate (lr = 0.1) decreased to 1/10 of its value every 50 epochs. The maximum number of epochs was set to 200; for the PreActResnet34 network structure, the maximum was set to 100. Detailed training and testing parameters for adversarial training on the different datasets are given below:
(1)
CIFAR-10 and CIFAR-100 experiments. For the image classification tasks on the CIFAR-10 and CIFAR-100 datasets, we set the batch size to 512. The neural networks adopted the PreActResnet18 structure. For multi-step AT, we used the $l_\infty$-norm for training the models, and we let $\alpha_0 = 1.25\epsilon_0$ as in [18]. The perturbation radius was set to $\epsilon_0 = 8/255$. We set the number of attack iterations to 10 and the attack step size to 2. For testing the models, the number of attack iterations was set to 20, and the batch size to 1024.
(2)
RADIOML 2016.04C experiment. For the modulation recognition tasks, we used the $l_\infty$-norm for the AT models, and we let $\alpha_1 = 1.25\epsilon_1$ as in [18]. The perturbation radius was set to $\epsilon_1 = 4/255$. We set the number of attack iterations to 10 and the attack step size to 2. For testing the AT models, the number of attack iterations was set to 20, and the batch size to 1024.
(3)
RADIOML 2016.10A and MSR experiments. For the modulation recognition tasks, we used the $l_\infty$-norm for the AT models, and we let $\alpha_2 = 1.25\epsilon_2$ as in [18]. The perturbation radius was set to $\epsilon_2 = 2/255$. We set the number of attack iterations to 10 and the attack step size to 2. For testing the AT models, the number of attack iterations was set to 20, and the batch size to 1024 or 512; a sketch of this test-time evaluation follows this list.
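For reference, a sketch of the PGD-based test-time evaluation described above, reusing pgd_attack from the sketch in the Introduction (the CIFAR settings $\epsilon = 8/255$ and 20 attack iterations are those stated in item (1); the loop itself is our own):

```python
import torch

def robust_accuracy(model, loader, eps=8/255, alpha=2/255, steps=20, device="cuda"):
    """Robust accuracy (Robust Acc.) under a PGD attack with `steps` iterations."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, eps=eps, alpha=alpha, steps=steps)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```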
Comparison Methods. Our comparative experiments involved the following algorithms: GradAlign [6], GAT [35], PGD [23], TRADES [15], and Consistency [11].

4.3. Effectiveness Analysis of Ulam Stability Framework

We used robust accuracy (Robust Acc.) to assess the performance of the models. We selected the best AT model and the average of the last five final AT models on the validation sets to verify performance. To measure the stability of a model, we defined the following indicator, where $\Delta_{Rob}$ and $\Delta_{Nat}$ denote the final-minus-best robust and natural accuracies: the smaller the average of their magnitudes, the more stable the AT model. Set
$$\Delta_{stab} = \mathrm{Average}\big(|\Delta_{Rob}| + |\Delta_{Nat}|\big) = \frac{|\Delta_{Rob}| + |\Delta_{Nat}|}{2}.$$
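Concretely, the indicator can be computed as follows (our reading, which reproduces the $\Delta_{stab}$ values reported in Table 1):

```python
def delta_stab(rob_best, rob_final, nat_best, nat_final):
    """Stability indicator: average absolute drift between the best and final models."""
    d_rob = rob_final - rob_best
    d_nat = nat_final - nat_best
    return (abs(d_rob) + abs(d_nat)) / 2

# e.g., PGD on CIFAR-10 (Table 1): delta_stab(44.60, 38.48, 76.08, 80.75) -> 5.395 ~ 5.40
```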
We evaluated the performance of our improved model based on the Ulam stability framework (US-AT) on two-dimensional image datasets and one-dimensional modulation recognition signal datasets, respectively. In both cases, we used the Jensen-type Ulam loss function (H5). As shown in Table 1, the newly proposed models achieved higher robust accuracy and better (smaller) stability indicators $\Delta_{stab}$, which shows that the Ulam stability condition improves both the adversarial robustness and the training stability of the models.

4.4. US-AT Associated with Different Ulam Conditions

We also tested the other types of stability conditions. For conditions (H1) and (H2), corresponding to additive and quadratic functions, respectively, the US-AT model failed to train on the four benchmark models, leaving the model unable to defend against gradient attacks. Therefore, we only list the experimental results for (H3)–(H5) (see Table 2 and Table 3). Under the isometric (Mazur–Ulam-type), Jensen, and Hosszù conditions, the robust accuracy and stability indicators of the model all improved, indicating good adversarial robustness under these three conditions. The accuracy improvement is most significant under the isometric condition. Optimizing the Ulam stability conditions can thus benefit both the adversarial robustness and the training stability of the model.

4.5. Stability Analysis

In this experiment, we compared the stability of our proposed models with the PGD, GradAlign, GAT, and TRADES models on the CIFAR-10 and CIFAR-100 datasets. As shown in Figure 1 and Figure 2, the robust accuracies of Ulam-type adversarial training are better than those of the benchmark AT methods, and the training stabilities are all higher: Ulam-type models do not experience significant performance degradation during adversarial training. This shows that the Jensen-type AT method can enhance the adversarial robustness of benchmark methods and improve their training stability.

4.6. Analysis of Model Enhancement Methods

This experiment was divided into two parts. The first part tested the model enhancement method based on algebraic operations, which we call the US-AT-1 method; we chose the Jensen-type Ulam stability condition. Based on Table 4 and Table 5, the US-AT-1 method improves robust accuracy on all four benchmark models compared to both the original method and the basic US-AT method. The stability of the US-AT-1 method lies between those of the US-AT and benchmark models: better than the original model but lower than the basic US-AT model. Overall, the US-AT-1 method still achieves relatively competitive results.
In the second part, concerning the US-AT-2 method, we used a combination of the consistency regularization method and the TRADES method to test the compatibility between the regularization method and the Ulam method. Based on Table 6, adding the consistency regularization loss function improves the robust accuracy of the US-AT-2 models, but their stability degrades (the $\Delta_{stab}$ indicator increases). The experimental results indicate that, in the sense of the defined stability indicator $\Delta_{stab}$, consistency regularization does not further increase the stability of the US-AT model, although it still outperforms the stability indicators of the benchmark method.
Based on the above two experiments, our results reveal that adding a new enhancement loss function can further improve the robust accuracy of the model, but, constrained by the direction of loss-function optimization, the stability of the final training is affected to some extent. This also indicates the complexity of model training stability and shows that the US-AT model's role in maintaining stability is nontrivial.

5. Hyperparameter Analysis

From Table 7, it can be seen that $\lambda_1$ affects the adversarial robustness of the model. In this paper, the $\lambda_1$ parameter for GAT was chosen as 0.5, a compromise between adversarial accuracy and natural accuracy.

6. Complexity Analysis and Execution Efficiency Evaluation

The proposed method adds computational overhead compared to the benchmark methods, mainly due to the calculation of the Ulam loss function and the gradient propagation through the auxiliary variables. The detailed experimental comparison results are shown in Table 8, where we compare the per-step time cost of the Ulam-based adversarial training methods with that of the original adversarial training methods. The experimental results show that Ulam-type adversarial methods require more computational cost, but the time consumption is within an acceptable range.

7. Conclusions

In this article, we introduced a new type of adversarial training framework, called US-AT, which is based on the generalized Ulam stability theorem. This theorem allows us to precisely constrain the feature subspaces of our US-AT models, resulting in improved stability and robustness. Through experimentation, we found that certain Ulam stability conditions, such as the isometric, Jensen, and Hosszù conditions, are effective in improving the model's defense against gradient-based adversarial attacks.
Furthermore, our framework can be combined with different types of adversarial training methods, resulting in improved performance. The diversity of Ulam stability conditions also allows for the construction of various US-AT models with different properties. However, there is still room for improvement in finding the optimal Ulam stability condition or combination of conditions for our framework. Therefore, our future research will focus on optimizing the design of stability training strategies to further improve the robust accuracy of our US-AT models. In addition, hyperstability is another direction worth exploring in adversarial training, and the adversarial robustness of deep neural networks with hyperstability is the next topic we will study.

Author Contributions

Formal analysis, K.Y.; Investigation, K.Y.; Writing—original draft, L.Y.; Writing—review and editing, K.Y., L.Y., W.R. and Z.Y.; Visualization, W.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Outstanding Member of Youth Innovation Promotion Association CAS Y2022052 and Key Projects of National Natural Science Foundation of China 62131019.

Data Availability Statement

The dataset is sourced from publicly available data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lei, H.; Tsai, Y.Y.; Chen, P.Y.; Ho, T.Y. Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 24658–24667. [Google Scholar]
  2. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572. [Google Scholar]
  3. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial Machine Learning at Scale. arXiv 2016, arXiv:1611.01236. [Google Scholar]
  4. Maini, P.; Wong, E.; Kolter, J.Z. Adversarial Robustness Against the Union of Multiple Perturbation Models. In Proceedings of the International Conference on Machine Learning, Online, 12–18 July 2020. [Google Scholar]
  5. Mao, C.Z.; Zhong, Z.Y.; Yang, J.F.; Vondrick, C.; Ray, B. Metric Learning for Adversarial Robustness. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  6. Andriushchenko, M.; Flammarion, N. Understanding and Improving Fast Adversarial Training. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020. [Google Scholar]
  7. Herrmann, C.; Sargent, K.; Jiang, L.; Zabih, R.; Chang, H.W.; Liu, C.; Krishnan, D.; Sun, D.Q. Pyramid Adversarial Training Improves ViT Performance. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13409–13419. [Google Scholar] [CrossRef]
  8. Li, Y.X.; Xu, C. Trade-off between Robustness and Accuracy of Vision Transformers. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7558–7568. [Google Scholar]
  9. Sun, L.T.; Ke, D.; Wang, X.; Huang, Z.T.; Huang, K.Z. Robustness of Deep Learning-Based Specific Emitter Identification under Adversarial Attacks. Remote Sens. 2022, 14, 4996. [Google Scholar] [CrossRef]
  10. Ke, D.; Wang, X.; Huang, K.Z.; Wang, H.Y.; Wang, Z.T. Minimum Power Adversarial Attacks in Communication Signal Modulation Classification with Deep Learning. Cogn. Comput. 2023, 15, 580–589. [Google Scholar] [CrossRef]
  11. Tack, J.; Yu, S.; Jeong, J.; Kim, M.; Hwang, S.J.; Shin, J. Consistency Regularization for Adversarial Robustness. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Virtual Event, 22 February–1 March 2022. [Google Scholar]
  12. Zhang, S.D.; Gao, H.C.; Zhang, T.W.; Zhou, Y.Y.; Wu, Z.H. Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization. arXiv 2022, arXiv:2205.11744v1. [Google Scholar]
  13. Ren, W.J.; Chen, Q.; Yang, Z.P. Adversarial discriminative domain adaptation for modulation classification based on Ulam stability. IET Radar Sonar Navig. 2023, 17, 1175–1181. [Google Scholar] [CrossRef]
  14. Ren, W.J.; Yang, Z.P.; Wang, X. A two-branch symmetric domain adaptation neural network based on Ulam stability theory. Inf. Sci. 2023, 628, 424–438. [Google Scholar] [CrossRef]
  15. Zhang, H.Y.; Yu, Y.D.; Jiao, J.T.; Xing, E.P.; Ghaoui, L.E.; Jordan, M.I. Theoretically Principled Trade-off between Robustness and Accuracy. arXiv 2019, arXiv:1901.08573. [Google Scholar]
  16. Huang, S.H.; Lu, Z.C.; Deb, K.; Boddeti, V.N. Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  17. Bai, T.; Luo, J.Q.; Zhao, J.; Wen, B.H.; Wang, Q. Recent Advances in Adversarial Training for Adversarial Robustness. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–26 August 2021; pp. 4312–4321. [Google Scholar]
  18. Wong, E.; Rice, L.; Kolter, J.Z. Fast is better than free: Revisiting adversarial training. arXiv 2020, arXiv:2001.03994. [Google Scholar]
  19. Shafahi, A.; Najibi, M.; Ghiasi, A.; Xu, Z.; Dickerson, J.; Studer, C.; Davis, L.S.; Taylor, G.; Goldstein, T. Adversarial Training for Free! arXiv 2019, arXiv:1904.12843. [Google Scholar]
  20. Li, T.; Wu, Y.; Chen, S.; Fang, K.; Huang, X. Subspace Adversarial Training. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13399–13408. [Google Scholar] [CrossRef]
  21. Liu, J.; Lau, C.P.; Souri, H.; Feizi, S.; Chellappa, R. Mutual Adversarial Training: Learning Together is Better Than Going Alone. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2364–2377. [Google Scholar] [CrossRef]
  22. Gong, H.H.; Dong, M.J.; Ma, S.Q.; Camtepe, S.; Nepal, S.; Xu, C. Random Entangled Tokens for Adversarially Robust Vision Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–24 June 2024. [Google Scholar]
  23. Rice, L.; Wong, E.; Kolter, Z. Overfitting in Adversarially Robust Deep Learning. In Proceedings of the International Conference on Machine Learning, Online, 12–18 July 2020. [Google Scholar]
  24. Yu, C.J.; Han, B.; Shen, L.; Yu, J.; Gong, C.; Gong, M.M.; Liu, T.L. Understanding Robust Overfitting of Adversarial Training and Beyond. In Proceedings of the International Conference on Machine Learning, Hangzhou, China, 23–25 September 2022. [Google Scholar]
  25. Tang, L.Y.; Zhang, L. Robust Overfitting Does Matter: Test-Time Adversarial Purification with FGSM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–24 June 2024. [Google Scholar]
  26. Zhou, Y.H.; Hua, Z.Y. Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–24 June 2024. [Google Scholar]
  27. Li, L.; Guan, H.Y.; Qiu, J.N.; Spratling, M. One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–24 June 2024. [Google Scholar]
  28. Wang, Y.T.; Fu, H.Y.; Zou, W.; Jia, J.Y. MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–24 June 2024. [Google Scholar]
  29. Jung, S.M. Hyers-Ulam-Rassias Stability of Jensen Equation and Its Applications. Proc. Am. Math. Soc. 1998, 126, 3137–3142. [Google Scholar] [CrossRef]
  30. Hyers, D.H. On The Stability of Linear Functional Equation. Proc. Natl. Acad. Sci. USA 1941, 27, 222–224. [Google Scholar] [CrossRef] [PubMed]
  31. Xu, T.Z.; Yang, Z.P. A Fixed Point Approach to the Stability of Functional Equations on Noncommutative Spaces. Results Math. 2017, 72, 1639–1651. [Google Scholar] [CrossRef]
  32. Hyers, D.H.; Ulam, S.M. On Approximate Isometries. Bull. Amer. Math. Soc. 1945, 51, 288–292. [Google Scholar] [CrossRef]
  33. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical Report. 2009. Available online: https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086 (accessed on 3 September 2024).
  34. O’Shea, T.J. Deepsig. Available online: http://www.deepsig.io/ (accessed on 3 September 2024).
  35. Sriramanan, G.; Addepalli, S.; Baburaj, A.; Babu, R.V. Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
Figure 1. The robust accuracy curves of the four benchmark model experiments on the CIFAR-10 dataset. Based on the curves, the newly proposed method has significant advantages in robust accuracy and also improves training stability.
Figure 2. The robust accuracy curves of the four benchmark model experiments on the CIFAR-100 dataset. Based on the curves, the newly proposed method has significant advantages in robust accuracy and also improves training stability.
Table 1. Performance comparisons of multi-step AT on the CIFAR-10 and CIFAR-100 datasets. The robust accuracy was evaluated under a PGD-20 attack.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | PGD | 44.60 | 38.48 | -6.12 | 76.08 | 80.75 | 4.67 | 5.40 |
| | +US-AT | 47.23 | 45.60 | -1.63 | 78.90 | 79.70 | 0.8 | 1.22 |
| | TRADES | 48.91 | 46.09 | -2.82 | 79.42 | 80.17 | 0.75 | 1.79 |
| | +US-AT | 49.69 | 48.99 | -0.7 | 78.08 | 78.12 | 0.04 | 0.37 |
| | GradAlign | 46.03 | 38.90 | -7.13 | 76.38 | 81.41 | 5.03 | 6.08 |
| | +US-AT | 47.60 | 43.93 | -3.67 | 80.52 | 80.53 | 0.01 | 1.84 |
| | GAT | 27.84 | 22.21 | -5.63 | 84.20 | 88.37 | 4.17 | 4.87 |
| | +US-AT | 31.38 | 30.42 | -0.96 | 85.29 | 86.29 | 1.00 | 0.98 |
| CIFAR-100 | PGD | 21.26 | 16.32 | -4.94 | 42.31 | 49.49 | 7.18 | 6.06 |
| | +US-AT | 25.47 | 24.60 | -0.87 | 49.20 | 50.13 | 0.93 | 0.90 |
| | TRADES | 23.75 | 22.38 | -1.37 | 49.93 | 49.47 | -0.46 | 0.92 |
| | +US-AT | 25.37 | 25.36 | -0.01 | 49.71 | 49.71 | 0.00 | 0.01 |
| | GradAlign | 21.77 | 16.90 | -4.87 | 45.48 | 50.21 | 4.73 | 4.80 |
| | +US-AT | 24.26 | 22.08 | -2.18 | 50.35 | 50.84 | 0.49 | 1.34 |
| | GAT | 11.05 | 9.71 | -1.34 | 58.96 | 59.29 | 0.33 | 0.84 |
| | +US-AT | 14.02 | 13.58 | -0.44 | 56.79 | 55.91 | -0.88 | 0.66 |
Table 2. Performance comparisons of multi-step AT on CIFAR-10 based on PreActResnet34. The robustness was evaluated under a PGD-20 attack.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | PGD | 46.13 | 43.06 | -3.07 | 75.47 | 79.97 | 4.5 | 3.79 |
| | +US-AT | 47.64 | 46.55 | -1.09 | 77.16 | 78.32 | 1.16 | 1.13 |
| | TRADES | 47.72 | 46.56 | -1.16 | 77.67 | 78.23 | 0.56 | 0.86 |
| | +US-AT | 47.88 | 47.67 | -0.21 | 75.41 | 75.56 | 0.15 | 0.62 |
| | GradAlign | 46.06 | 42.71 | -3.35 | 76.10 | 79.97 | 3.87 | 3.61 |
| | +US-AT | 45.97 | 45.04 | -0.93 | 74.23 | 73.84 | -0.39 | 0.66 |
| | GAT | 27.45 | 23.51 | -3.94 | 84.20 | 85.78 | 1.58 | 2.76 |
| | +US-AT | 31.34 | 28.96 | -2.38 | 83.60 | 83.62 | 0.02 | 1.2 |
Table 3. Performance comparisons of multi-step AT on CIFAR-10, associated with different Ulam conditions. The robustness was evaluated under a PGD-20 attack.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | PGD | 44.60 | 38.48 | -6.12 | 76.08 | 80.75 | 4.67 | 5.40 |
| | +US-AT-Isometric | 48.73 | 46.79 | -1.94 | 76.72 | 79.05 | 2.33 | 2.14 |
| | +US-AT-Hosszù | 48.21 | 45.74 | -2.47 | 78.50 | 79.32 | 0.82 | 1.65 |
| | +US-AT-Jensen | 47.23 | 45.60 | -1.63 | 78.90 | 79.70 | 0.8 | 1.22 |
| | TRADES | 48.91 | 46.09 | -2.82 | 79.42 | 80.17 | 0.75 | 1.79 |
| | +US-AT-Isometric | 49.19 | 46.99 | -2.20 | 77.97 | 77.71 | -0.26 | 1.23 |
| | +US-AT-Hosszù | 48.18 | 46.28 | -1.90 | 77.25 | 77.74 | 0.49 | 1.20 |
| | +US-AT-Jensen | 49.69 | 48.99 | -0.7 | 78.08 | 78.12 | 0.04 | 0.37 |
| | GradAlign | 46.03 | 38.90 | -7.13 | 76.38 | 81.41 | 5.03 | 6.08 |
| | +US-AT-Isometric | 48.61 | 47.14 | -1.47 | 76.94 | 77.96 | 1.02 | 1.25 |
| | +US-AT-Hosszù | 48.05 | 45.54 | -2.51 | 77.83 | 79.27 | 1.46 | 1.99 |
| | +US-AT-Jensen | 47.60 | 43.93 | -3.67 | 80.52 | 80.53 | 0.01 | 1.84 |
| | GAT | 27.84 | 22.21 | -5.63 | 84.20 | 88.37 | 4.17 | 4.87 |
| | +US-AT-Isometric | 46.56 | 44.30 | -2.26 | 78.82 | 80.01 | 1.19 | 1.73 |
| | +US-AT-Hosszù | 46.44 | 44.31 | -2.13 | 78.78 | 80.07 | 1.29 | 1.71 |
| | +US-AT-Jensen | 31.38 | 30.42 | -0.96 | 85.29 | 86.29 | 1.00 | 0.98 |
Table 4. Performance comparisons based on US-AT-1 methods.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | PGD | 21.26 | 16.32 | -4.94 | 42.31 | 49.49 | 7.18 | 6.06 |
| | +US-AT | 25.47 | 24.60 | -0.87 | 49.20 | 50.13 | 0.93 | 0.90 |
| | +US-AT-1 | 26.01 | 25.87 | -0.14 | 46.01 | 46.09 | 0.08 | 0.11 |
| | TRADES | 23.75 | 22.38 | -1.37 | 49.93 | 49.47 | -0.46 | 0.92 |
| | +US-AT | 25.37 | 25.36 | -0.01 | 49.71 | 49.71 | 0.00 | 0.01 |
| | +US-AT-1 | 25.78 | 25.68 | -0.10 | 48.94 | 48.58 | -0.36 | 0.23 |
| | GradAlign | 21.77 | 16.90 | -4.87 | 45.48 | 50.21 | 4.73 | 4.80 |
| | +US-AT | 24.26 | 22.08 | -2.18 | 50.35 | 50.84 | 0.49 | 1.34 |
| | +US-AT-1 | 25.46 | 25.52 | 0.06 | 44.27 | 45.84 | 1.57 | 0.82 |
| | GAT | 11.05 | 9.71 | -1.34 | 58.96 | 59.29 | 0.33 | 0.84 |
| | +US-AT | 14.02 | 13.58 | -0.44 | 56.79 | 55.91 | -0.88 | 0.66 |
| | +US-AT-1 | 15.14 | 14.31 | -0.83 | 56.41 | 55.52 | -0.89 | 0.86 |
| CIFAR-100 | PGD | 21.26 | 16.32 | -4.94 | 42.31 | 49.49 | 7.18 | 6.06 |
| | +US-AT | 25.47 | 24.60 | -0.87 | 49.20 | 50.13 | 0.93 | 0.90 |
| | +US-AT-1 | 26.01 | 25.87 | -0.14 | 46.01 | 46.09 | 0.08 | 0.11 |
| | TRADES | 23.75 | 22.38 | -1.37 | 49.93 | 49.47 | -0.46 | 0.92 |
| | +US-AT | 25.37 | 25.36 | -0.01 | 49.71 | 49.71 | 0.00 | 0.01 |
| | +US-AT-1 | 25.78 | 25.68 | -0.10 | 48.94 | 48.58 | -0.36 | 0.23 |
| | GradAlign | 21.77 | 16.90 | -4.87 | 45.48 | 50.21 | 4.73 | 4.80 |
| | +US-AT | 24.26 | 22.08 | -2.18 | 50.35 | 50.84 | 0.49 | 1.34 |
| | +US-AT-1 | 25.46 | 25.52 | 0.06 | 44.27 | 45.84 | 1.57 | 0.82 |
| | GAT | 11.05 | 9.71 | -1.34 | 58.96 | 59.29 | 0.33 | 0.84 |
| | +US-AT | 14.02 | 13.58 | -0.44 | 56.79 | 55.91 | -0.88 | 0.66 |
| | +US-AT-1 | 15.14 | 14.31 | -0.83 | 56.41 | 55.52 | -0.89 | 0.86 |
Table 5. Performance comparisons based on US-AT-1 methods.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| 2016.04C | PGD | 31.72 | 27.67 | -4.05 | 23.43 | 21.66 | -1.77 | 2.91 |
| | +US-AT | 32.08 | 31.40 | -0.68 | 39.86 | 39.98 | 0.12 | 0.40 |
| | +US-AT-1 | 32.22 | 32.20 | -9.58 | 36.09 | 34.81 | -0.86 | 5.22 |
| | TRADES | 26.71 | 9.28 | -17.43 | 43.41 | 42.48 | -0.93 | 9.18 |
| | +US-AT | 28.54 | 18.96 | -9.58 | 43.42 | 42.56 | -0.86 | 5.22 |
| | +US-AT-1 | 28.66 | 18.39 | -10.27 | 46.48 | 41.43 | -5.05 | 7.66 |
| | GradAlign | 24.78 | 8.27 | -16.51 | 33.30 | 31.90 | -1.4 | 8.90 |
| | +US-AT | 27.33 | 27.26 | -0.07 | 36.81 | 36.75 | 0.06 | 0.79 |
| | +US-AT-1 | 31.55 | 30.81 | -0.74 | 35.44 | 34.90 | -0.54 | 0.64 |
| | GAT | 16.99 | 6.50 | -10.49 | 43.26 | 43.50 | 0.24 | 5.37 |
| | +US-AT | 23.56 | 17.50 | -6.06 | 43.16 | 51.39 | 8.23 | 7.15 |
| | +US-AT-1 | 26.68 | 21.94 | -4.74 | 45.18 | 48.20 | 3.02 | 3.88 |
| 2016.10A | PGD | 35.33 | 33.58 | -1.75 | 42.56 | 42.79 | 0.23 | 0.99 |
| | +US-AT | 36.33 | 34.28 | -2.05 | 43.08 | 42.80 | -0.28 | 1.17 |
| | +US-AT-1 | 37.61 | 32.55 | -5.06 | 43.70 | 45.18 | 1.48 | 3.27 |
| | TRADES | 31.15 | 29.54 | -1.61 | 50.46 | 50.48 | 0.02 | 0.82 |
| | +US-AT | 31.24 | 28.28 | -2.96 | 50.73 | 51.06 | 0.33 | 1.65 |
| | +US-AT-1 | 33.29 | 32.68 | -0.61 | 50.50 | 51.52 | 1.02 | 0.82 |
| | GradAlign | 33.79 | 29.59 | -4.20 | 43.88 | 46.39 | 2.51 | 3.56 |
| | +US-AT | 34.30 | 31.43 | -2.97 | 43.46 | 44.66 | 1.20 | 2.09 |
| | +US-AT-1 | 37.38 | 31.96 | -5.42 | 43.41 | 44.87 | 1.37 | 3.40 |
| | GAT | 31.38 | 30.14 | -1.24 | 46.62 | 48.47 | 1.85 | 1.55 |
| | +US-AT | 31.43 | 30.50 | -0.93 | 44.07 | 50.51 | 6.44 | 3.69 |
| | +US-AT-1 | 34.15 | 33.84 | -0.31 | 49.72 | 49.50 | -0.22 | 0.27 |
| MSR | PGD | 39.33 | 34.91 | -4.42 | 28.31 | 29.22 | 0.91 | 2.67 |
| | +US-AT | 42.74 | 43.29 | 0.55 | 48.36 | 48.90 | 0.54 | 0.55 |
| | +US-AT-1 | 40.90 | 41.13 | 0.23 | 46.75 | 47.03 | 0.28 | 0.26 |
| | TRADES | 32.15 | 15.17 | -16.98 | 43.51 | 48.48 | 4.97 | 10.98 |
| | +US-AT | 35.33 | 27.97 | -7.36 | 52.49 | 48.55 | -3.94 | 5.65 |
| | +US-AT-1 | 27.65 | 28.04 | 0.48 | 44.88 | 49.57 | 4.69 | 2.59 |
| | GradAlign | 36.24 | 11.35 | -24.89 | 21.59 | 30.39 | 8.80 | 16.85 |
| | +US-AT | 41.10 | 43.26 | 2.16 | 46.78 | 47.93 | 1.15 | 1.66 |
| | +US-AT-1 | 40.56 | 40.08 | -0.48 | 45.46 | 45.77 | 0.31 | 0.40 |
| | GAT | 32.86 | 14.13 | -18.73 | 50.05 | 48.41 | -2.04 | 10.39 |
| | +US-AT | 32.10 | 34.50 | 2.4 | 50.38 | 53.79 | 3.44 | 2.92 |
| | +US-AT-1 | 35.37 | 35.14 | -0.23 | 49.36 | 54.39 | 5.03 | 2.63 |
Table 6. Comparison of US-AT-2 methods and consistency regularization.

| Dataset | Method | Robust Best | Robust Final | Δ_Rob | Natural Best | Natural Final | Δ_Nat | Δ_stab |
|---|---|---|---|---|---|---|---|---|
| 2016.04C | TRADES | 30.39 | 10.62 | -19.77 | 38.86 | 48.48 | 9.62 | 14.70 |
| | Consistency+TRADES | 30.50 | 11.56 | -18.94 | 41.35 | 48.73 | 7.38 | 13.16 |
| | TRADES+US-AT | 36.52 | 30.22 | -6.3 | 44.91 | 46.44 | 1.53 | 3.92 |
| | TRADES+US-AT-2 | 36.54 | 28.87 | -7.67 | 44.96 | 45.78 | 0.82 | 4.25 |
| 2016.10A | TRADES | 30.49 | 27.65 | -2.84 | 49.07 | 49.86 | 0.79 | 1.82 |
| | Consistency+TRADES | 30.34 | 10.05 | -20.17 | 49.46 | 52.98 | 3.52 | 11.85 |
| | TRADES+US-AT | 33.35 | 32.67 | -0.68 | 50.55 | 52.16 | 1.61 | 1.15 |
| | TRADES+US-AT-2 | 38.32 | 30.69 | -7.63 | 48.86 | 52.56 | 3.7 | 5.67 |
| MSR | TRADES | 32.15 | 15.17 | -16.98 | 43.51 | 48.48 | 4.97 | 10.98 |
| | Consistency+TRADES | 46.16 | 45.31 | -0.85 | 53.96 | 54.22 | 0.26 | 0.56 |
| | TRADES+US-AT | 35.33 | 27.97 | -7.36 | 52.49 | 48.55 | -3.94 | 5.65 |
| | TRADES+US-AT-2 | 41.91 | 40.18 | -1.73 | 50.52 | 50.91 | 0.39 | 1.06 |
Table 7. Performance comparisons of multi-step AT on CIFAR-10. The robustness was evaluated under a PGD-20 attack.

| Dataset | Method | Robust Best | Robust Final | Δ_stab |
|---|---|---|---|---|
| CIFAR-10 | GAT | 27.84 | 22.21 | -5.63 |
| | GAT-λ1 = 0.1 | 31.38 | 30.42 | -0.96 |
| | GAT-λ1 = 0.5 | 34.17 | 34.08 | -0.09 |
| | GAT-λ1 = 1.0 | 31.96 | 32.06 | 0.10 |
Table 8. Time complexity comparison results (s/epoch). The robustness was evaluated under a PGD-20 attack.

| Dataset | GAT | +US-AT | GradAlign | +US-AT | PGD | +US-AT | TRADES | +US-AT |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | 47 | 56 | 147 | 175 | 81 | 100 | 150 | 175 |
| CIFAR-100 | 40 | 52 | 147 | 178 | 81 | 151 | 150 | 175 |
| 2016.04C | 9 | 35 | 80 | 96 | 32 | 41 | 57 | 68 |
| 2016.10A | 44 | 56 | 54 | 248 | 58 | 105 | 242 | 281 |
| MSR | 58 | 69 | 112 | 135 | 71 | 89 | 58 | 108 |
