3.1. Some Results on Ulam Stability of Neural Networks
Ulam stability is an important topic in nonlinear functional analysis, characterizing the stability and existence of solutions of functional equations in abstract Banach spaces [29,30,31]. If an operator equation is Ulam stable, then around any approximate solution there exists a mapping (an exact solution) whose error is bounded above by the Ulam constant. This property can be used to bound the latent feature subspace of deep neural networks: the feature subspace of a deep neural network satisfying Ulam stability is confined, up to a prescribed error, to a region determined by a certain function. In [13,14], the authors introduced Ulam stability into deep neural networks to solve domain adaptation problems. This application of Ulam stability to domain adaptation motivated us to explore its potential for deeper problems in deep learning. In classical functional analysis, a standard method for proving Ulam stability is the perturbation method: under small perturbations the operator equation still has an exact solution, and the error between the exact solution and the approximate solution can be controlled by a bounded control function. This mechanism can be exploited by adversarial training (AT) to improve the stability of models. In this paper, we aim to equip the network with a similar ability to control stability, which makes the model more robust under adversarial training. Below, we provide the necessary definitions and theorems for this paper.
Definition 1. Let x be an element in a normed space X. The -norm of x can be defined as follows: where  is the absolute value function.

Definition 2. Let x be an element in a normed space X. The -norm of x can be defined as follows: where  is the absolute value function.

Definition 3 (Generalized Ulam Stability [29]). Let X and Y both be Banach spaces (or Euclidean spaces) equipped with the norm . Set functions . Assume that  is a function, and  is an operator. If then there exists a mapping  such that and We say that the operator  is generalized Ulam stable.

Remark 1. Definition 3 defines generalized Ulam stability on abstract function spaces. For the subspace composed of deep neural networks, we hope that deep neural networks also possess good properties similar to those of functionals, or have good approximation properties and quantifiable boundaries that can be estimated.
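For orientation, the standard shape of a generalized (Hyers–Ulam–Rassias) stability statement can be sketched as follows; the operator \mathcal{T}, the control functions \varphi and \Phi, and the number of arguments are illustrative notation of ours and may differ from the precise formulation intended in Definition 3.
\[
\|(\mathcal{T}f)(x,y)\| \le \varphi(x,y)\ \ \text{for all } x,y \in X
\;\Longrightarrow\;
\exists\, F\colon X \to Y \ \text{with}\ \mathcal{T}F = 0 \ \text{and}\ \|f(x)-F(x)\| \le \Phi(x)\ \ \text{for all } x \in X.
\]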
In the following analysis, we prove that if a deep neural network is Ulam stable, then it can be approximated by a well-behaved function, and its boundary can be estimated in terms of that function.
If , the generalized Ulam stability is simply called Ulam stability. Let  be an abstract operator. We now provide definitions for the five types of inequality constraints considered in this article.
Definition 4. The family of Ulam stability conditions is defined as follows.

(H1) The δ-additive transformation is defined as where  and X constitute a Banach space.

(H2) The δ-quadratic transformation is defined as where  and X constitute a Banach space.

(H3) The δ-isometric transformation is defined as where  and X constitute a Euclidean vector space or a Hilbert space.

(H4) The δ-Hosszù transformation is defined as where  and X constitute a Banach space.

(H5) The δ-Jensen transformation is defined as where  and X constitute a Banach space.

Remark 2. Conditions (H1)–(H5) form only a part of the Ulam stability conditions. It is easy to verify that (H1) represents the family of approximately additive functions, (H2) the family of approximately quadratic functions, (H3) the family of approximately isometric functions, (H4) the family of approximate complex functional equations, and (H5) the family of approximately convex functions.
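For orientation, the δ-perturbed functional inequalities that usually define these five families in the Ulam stability literature are sketched below; the precise forms used in this paper (in particular, how the abstract operator is applied) may differ, so the following should be read as standard reference forms rather than the authors' exact definitions.
\begin{align*}
\text{(H1)}\quad & \|f(x+y)-f(x)-f(y)\| \le \delta,\\
\text{(H2)}\quad & \|f(x+y)+f(x-y)-2f(x)-2f(y)\| \le \delta,\\
\text{(H3)}\quad & \bigl|\,\|f(x)-f(y)\| - \|x-y\|\,\bigr| \le \delta,\\
\text{(H4)}\quad & \|f(x+y-xy)+f(xy)-f(x)-f(y)\| \le \delta,\\
\text{(H5)}\quad & \bigl\|\,2f\!\left(\tfrac{x+y}{2}\right)-f(x)-f(y)\,\bigr\| \le \delta,
\end{align*}
for all admissible x, y in X and a fixed δ > 0.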
In fact, conditions based on exponential functions, logarithmic functions, and hyperfunctions may also have good properties. There are many stability conditions in the sense of Ulam stability, which can be expressed not only in the form of differences but also in differential, integral, and discretized forms. For convenience of form, this article tests the above five stability conditions. Assume that f is a deep neural network; the operator  is symmetric () in conditions (H1), (H2), (H3), and (H5), but it is not necessarily symmetric in (H4) when .
Theorem 1. If a deep neural network N satisfies any of the above conditions (H1)–(H5) in Definition 4, then N is generalized Ulam stable, and there is a solution satisfying Equation (5) and Inequality (6).

Proof. The stabilities under conditions (H1)–(H5) can be proven through direct construction methods [32], and the proof processes are quite similar; only the constructions differ. Below, we provide a proof of Ulam stability under the approximate isometry condition (H3); the other conditions can be handled through similar proofs. The proof is divided into four parts:
(1) The first step is to prove the following conclusion. Let X be an abstract Hilbert space equipped with the inner product . Let  be a -isometry transformation with . Then the limit  exists for every x in X, and  is an isometry mapping.
We have  and . Let ; then, . Assume that  and . Then,  is in the intersection, and for any point y of  we can obtain  and . Then, we have . If , then . If , we have . The following inequality  is satisfied, where , and .

By using mathematical induction, we can obtain the following inequality: If , then we have  where , and .

Since X is a Hilbert space, the limit  exists for . It is easy to verify that there exists a set  such that the equation  holds for . We call  a completely nonlinear set, on which the neural network N has an upper bound. Since  is trivial in the sense of Ulam stability, in the following proof we only consider the set . Then, for , the following equation  is easy to obtain by .
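For comparison, the same direct-construction scheme applied in the simpler additive case (H1) yields the classical Hyers theorem, stated here in our own notation as a reference point:
\[
\|N(x+y)-N(x)-N(y)\| \le \delta \ \text{on a Banach space } X
\;\Longrightarrow\;
A(x) := \lim_{n\to\infty} 2^{-n} N(2^{n}x)\ \text{exists, is additive, and}\ \|N(x)-A(x)\| \le \delta.
\]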
(2) The second step is to prove the following conclusion. If u and x are any points of  such that  and , then .
For an arbitrary integer n, put . Let . Then, . It follows that . Since N is an isometry, we have  where . Dividing by , we obtain the equality . Set  such that . Then, , where  is a point of the sphere . We have . Moreover,  as . Set , which exists and is a unit vector. Then, we can obtain  where  is an arbitrary positive number. It follows that .
(3) The third step is to prove the following conclusion. If , then .
For , let  denote any point whose N-image is z. We call  the ideal decoder of the neural network N. Then,  is an -isometric mapping. The limit  exists, and the mapping  is also an isometry on . We have . Upon dividing by  and letting , we can obtain . Therefore,  for . Moreover,  is surjective and linear by the classic Mazur–Ulam theorem.
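For completeness, we recall the Mazur–Ulam theorem invoked in this step, stated in our own notation: every surjective isometry between real normed spaces that fixes the origin is linear.
\[
T\colon X \to Y \ \text{surjective},\quad \|T(u)-T(v)\| = \|u-v\|\ \ \forall\, u,v \in X,\quad T(0)=0
\;\Longrightarrow\;
T(\alpha u + \beta v) = \alpha T(u) + \beta T(v)\ \ \forall\, u,v \in X,\ \alpha,\beta \in \mathbb{R}.
\]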
(4) Finally, for , we only need to prove that the inequality  holds.

For any , assume that  is the linear manifold orthogonal to x. Then,  is an isometric transformation that maps  into the whole . Hence,  is the linear manifold orthogonal to . Let w be the projection of  on . If , set . Otherwise, let . The inequality  holds. Set . Then,  is a unit vector associated with t and is coplanar with  and t. Using the Pythagorean theorem, one can obtain .

Let . If the projection  of  on  is not zero, then we set . Otherwise, let . In both cases , and . If , then . If , then we have . Hence, the inequality  holds, since .

Consider the following two situations. If , set  in (23) and (21). Then, we have . If , then for some integer , we must have  and , since  is positive and . Hence, we have . But we know that ; then, we have , and  for . □
Remark 3. In fact, slightly changing the conditions in Theorem 1 can lead to stronger (weaker) conclusions concerning hyperstability (weak stability). Hyperstable neural networks have tighter control boundaries and significantly different asymptotic properties; we leave their study to future work.
If the operator acts on the neural network, then by Theorem 1, we can obtain the following corollary:
Corollary 1. Assume that N is a deep neural network satisfying Theorem 1. Then there exist a function  and a control function  such that .
Proof. (I) If satisfies condition (H1), then there exists an additive function A such that . In this case, the boundary of the value range of N is fixed by an additive mapping A and a sphere with a bounded perturbation , that is, . We call the neural network satisfying the condition (H1) an approximately additive neural network.
(II) If satisfies condition (H2), then there is a quadratic function Q such that . In this case, the boundary of the value range of N is fixed by a quadratic mapping Q and a sphere with a bounded perturbation , that is, . We call the neural network satisfying the condition (H2) an approximately quadratic neural network.
(III) If satisfies condition (H3), then there is an isometric function I such that . In this case, the boundary of the value range of N is fixed by an isometric mapping I and a sphere with a bounded perturbation , that is, . We call the neural network satisfying the condition (H3) an approximately isometric neural network.
(IV) If  satisfies condition (H4), then there is a Hosszù function H such that . In this case, the boundary of the value range of N is fixed by a Hosszù mapping H and a sphere with a bounded perturbation , that is, . We call the neural network satisfying the condition (H4) an approximately Hosszù neural network.
(V) If  satisfies condition (H5), then there is a Jensen function J such that . In this case, the boundary of the value range of N is fixed by a Jensen mapping J and a sphere with a bounded perturbation , that is, . We call the neural network satisfying the condition (H5) an approximately Jensen neural network. □
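The five cases of Corollary 1 share a common shape, which can be summarized as follows; here F and Φ are our placeholder symbols for the approximating mapping and the control function, respectively:
\[
\|N(x)-F(x)\| \le \Phi(x)\ \ \text{for all } x, \qquad F \in \{A,\, Q,\, I,\, H,\, J\},
\]
so the range of N lies within a Φ-neighborhood of the range of the corresponding well-behaved mapping F.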
3.2. Ulam Stability Adversarial Training
In this section, we establish the connection between Ulam stability theory and adversarial training. By imposing different Ulam stability conditions, one can induce optimal feature subspaces with different properties, in which the adversarially trained model is more stable and robust.
Assume that  is an adversarial sample associated with a sample . It is easy to see that , where . Let N be a deep neural network. Furthermore, assume that a benchmark adversarial training model is defined in the following form:  where  is an objective function, , , and .
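For concreteness, a benchmark adversarial training model of this kind is commonly written as a min–max problem in the style of PGD-based AT; the loss ℓ, the perturbation budget ε, and the parameters θ below are our illustrative notation and may differ from the exact formulation in (25):
\[
\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\Bigl[\max_{\|x_{\mathrm{adv}}-x\|_{p}\le\epsilon}\ \ell\bigl(N_{\theta}(x_{\mathrm{adv}}),\,y\bigr)\Bigr].
\]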
Under the framework of the adversarial training model (25), we introduce the following unconstrained optimization problem (26) and define the corresponding Ulam stability objective function : where  is a Ulam stability condition,  and . In this article, we will focus on the conditions (H1)–(H5). From the optimization problem (26), it can be observed that if , the abstract operator  is generalized Ulam stable, and the error boundary of the value range of the neural network  can be estimated by a certain mapping  with certain properties. At this point, the neural network N can effectively resist attacks from the gradient direction. Our improved Ulam stability adversarial training (US-AT) is therefore formulated as a joint optimization of problems (25) and (26).
If  is a regular-type loss function, then the overall loss function  of our newly constructed US-AT can be defined as  where , , and .
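To make the joint objective concrete, the following sketch shows one way the US-AT loss could be assembled in PyTorch, using the (H1) additive condition as the Ulam stability penalty. The function names, the use of PGD to generate adversarial samples, and the weighting coefficient lam are our illustrative assumptions and do not reproduce the authors' implementation.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard L-infinity PGD attack (illustrative choice of adversarial-sample generator).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def ulam_h1_penalty(model, x, x_adv):
    # (H1)-style additive residual ||N(x + x_adv) - N(x) - N(x_adv)||_2, averaged over the batch.
    # Only one illustrative instantiation of the abstract Ulam stability operator;
    # assumes the model returns logits of shape (batch, num_classes).
    residual = model(x + x_adv) - model(x) - model(x_adv)
    return residual.norm(p=2, dim=1).mean()

def us_at_loss(model, x, y, lam=0.1):
    # Joint US-AT objective: benchmark AT term (problem (25)) plus an Ulam stability
    # regularizer (problem (26)), weighted by a hypothetical trade-off coefficient lam.
    x_adv = pgd_attack(model, x, y)
    at_term = F.cross_entropy(model(x_adv), y)
    us_term = ulam_h1_penalty(model, x, x_adv)
    return at_term + lam * us_term

In this sketch, the (H1) residual could be swapped for any of the (H2)–(H5) conditions, and lam plays the role of the weights appearing in the combined loss above.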
The US-AT does not define its own method for generating adversarial samples; it relies on an adversarial training loss function . Therefore, US-AT is not an independent adversarial training model, but rather a strategy for enhancing or improving adversarial training, further improving the performance of the benchmark model. However, a natural question arises: are the Ulam-stability-induced loss function  and  compatible? That is, is the optimal feature subspace induced by the Ulam condition also suitable for adversarial training?
We will show, both theoretically and experimentally, that not all Ulam stability conditions are compatible with AT. However, the feature subspaces induced by some of these conditions can effectively enhance the AT ability of the benchmark model, not only improving its stability but also enhancing its robust accuracy.
In theory, if condition (H1) holds, the stability is called first-order additive stability. If , then the feature space of N can be represented by an additive function , that is,  where a and b are fixed constants. In this case, N has strong linearity and is susceptible to gradient attack methods. If condition (H2) holds, the stability is called quadratic stability. If , then the feature space of N can be represented by a quadratic function , that is,  where , and d are fixed constants. In this case, N behaves similarly to a polynomial function and is easily influenced by gradient attack methods. In both cases, the feature subspace is constrained within a smooth banded region induced by a polynomial function. This also causes the properties of the neural network to resemble those of a polynomial function, resulting in smoother gradients and making it vulnerable to attacks, while also making it difficult to combine with other types of gradient-based adversarial defense methods. However, if conditions (H3)–(H5) hold, the models are restricted to the strip regions induced by the corresponding isometric function, solution of the Hosszù equation, and convex function, respectively.
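For intuition, a one-dimensional caricature of the two representations mentioned above can be written, in our own placeholder notation, as
\[
N(x) \approx A(x) = a\,x + b \quad \text{under (H1)}, \qquad N(x) \approx Q(x) = c\,x^{2} + d \quad \text{under (H2)},
\]
whose gradients are constant or linear in the input, which is why such feature subspaces are easy for gradient-based attacks to follow; the exact multivariate forms intended by the displayed equations in this paragraph may contain additional terms.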