1. Introduction
High-dimensional (dimension ≥ 3) nonlinear partial differential equations (PDEs) attract a great deal of interest and are widely used in a variety of fields. Many practical problems require a high-dimensional PDE, for example: the Schrödinger equation of the quantum many-body problem, whose dimension is roughly three times the number of electrons or quantum particles in the system; the Black-Scholes equation used to price financial derivatives, where the dimension of the PDE is the number of relevant financial assets under consideration; and the Hamilton-Jacobi-Bellman equation in dynamic programming. High-dimensional nonlinear partial differential equations are thus very practical, but because explicit solutions rarely exist or the analytical expressions are too complicated, numerical techniques are frequently required to solve them. However, in practical applications, high-dimensional nonlinear partial differential equations are usually very difficult to solve and remain one of the challenging topics in the academic community. The difficulty is that, due to the “curse of dimensionality” [1], the time complexity of traditional numerical methods grows exponentially with the dimension, thus requiring extensive computing resources. Nevertheless, we urgently need to approximate the numerical solutions of these high-dimensional nonlinear partial differential equations because these equations can resolve a variety of real-world issues.
In the field of finance, methods for solving high-dimensional PDEs are frequently employed. However, numerous empirical studies have demonstrated the “heavy tails” of financial markets. “Abnormal” events such as new inventions, economic policies, wars or other news, as well as various arbitrage and hedging activities, can lead to sudden and intermittent changes in asset prices. These sudden changes in prices and price fluctuations intuitively reflect the jump effects that exist in financial markets. Although jump behavior has a low likelihood of occurring, when it does occur it causes significant harm and often leaves investors with enormous losses or even bankrupt. How to build adequate models to depict the jump behavior of asset prices has become a hot topic for academics as financial market research has become more sophisticated and fine-grained. Therefore, it is necessary to incorporate common jump behaviors in financial problems, such as defaults on asset purchases (sales), corporate bankruptcies, operational failures, insurance events, etc., into the numerical solution of high-dimensional PDEs.
Backward stochastic differential equations (BSDEs) are widely used in stochastic control and financial mathematics. One important reason is that a BSDE can be expressed as the stochastic counterpart of a PDE through the Feynman-Kac formula. El Karoui et al. [2] provided a detailed introduction to the different properties of BSDEs and their applications to finance. Zhang [3] proposed a numerical scheme for a class of BSDEs with possibly path-dependent terminal values and proved its convergence. Recently, Barigou and Delong [4] also examined the pricing of equity-linked life insurance contracts by framing the price as the solution of a system of non-linear PDEs, reformulating the problem as a BSDE with jumps and solving it numerically using efficient neural networks.
In recent years, deep learning algorithms have rapidly gained popularity and success in various application sectors. As a result, several researchers have successfully used deep learning methods to handle high-dimensional PDE problems. In 2017, Weinan E, Jiequn Han et al. [5,6] first proposed the deep BSDE method and systematically applied deep learning to general high-dimensional PDEs. Following that, Han et al. (2020) [7] applied the deep BSDE method to the study of stochastic games and also obtained promising results. The method was extended to 2BSDEs and the corresponding fully nonlinear PDEs by Beck et al. (2019) [8]. Raissi et al. (2017) [9] used recent developments in probabilistic machine learning to infer governing equations represented by parametric linear operators, modifying the prior of a Gaussian process according to the special form of the operator, which is then used to infer the parameters of the linear equation from scarce and possibly noisy observations. To approximate the solution of a high-dimensional PDE, Sirignano et al. (2018) [10] suggested a deep neural network technique in which the network is trained to satisfy the boundary conditions, the initial condition and the differential operator on batches of randomly sampled time and space points. The achievements listed above show that it is feasible to solve high-dimensional PDEs with deep learning-based methods.
In this paper, we mainly apply a deep learning algorithm to a special class of high-dimensional nonlinear partial differential equations with jumps and obtain numerical solutions of such equations. Specifically, through the mathematical derivation of the equivalence between a high-dimensional PDE and a backward stochastic differential equation, the problem of solving the partial differential equation is recast as the problem of solving a BSDE, which is then transformed into a stochastic control problem. A deep neural network framework is then constructed to address this problem. Therefore, this method can be used to solve high-dimensional PDEs and the corresponding backward stochastic differential equations simultaneously.
The deep learning-based method introduced in this paper for solving a class of high-dimensional PDEs with jumps is mainly applied in the financial field, where it has important applications in financial derivatives pricing, insurance investment decision-making, financing for small and micro enterprises, risk measurement mechanisms and other issues.
2. Background Knowledge
2.1. A Class of PDE
We consider a class of semilinear parabolic PDEs with a jump term in the following form. Let $T > 0$ and $d \in \mathbb{N}$, and let the coefficient functions $\mu$ and $\sigma$ and the functions $f$ and $g$ be continuous. For all $t \in [0,T]$ and $x \in \mathbb{R}^d$, the function to be solved, $u(t,x)$, has the terminal value condition $u(T,x) = g(x)$ and satisfies Equation (1), in which two operators are introduced: a local second-order (diffusion) operator and a nonlocal (jump) operator. Here $t$ is the time variable, $x$ is the $d$-dimensional space variable, $f$ is the nonlinear part of the equation, $\nabla_x u$ represents the gradient of $u$ with respect to $x$, and $\operatorname{Hess}_x u$ represents the Hessian matrix of $u$ with respect to $x$. In particular, this equation can be regarded as a special case of a partial integro-differential equation. What we are interested in is the solution at $t = 0$, $x = \xi$, which is $u(0,\xi)$ [11,12].
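For orientation, a sketch of the standard semilinear parabolic PIDE of this type, written under the assumption of a single compensated Poisson jump with intensity $\lambda$ and jump coefficient $\beta$ (the exact statement of Equation (1) may differ in its arguments), reads:
$$
\frac{\partial u}{\partial t}(t,x) + \big\langle \mu(t,x), \nabla_x u(t,x)\big\rangle + \tfrac{1}{2}\operatorname{Tr}\!\big(\sigma\sigma^{\top}(t,x)\operatorname{Hess}_x u(t,x)\big) + \lambda\big[u(t,x+\beta(x)) - u(t,x) - \langle \beta(x), \nabla_x u(t,x)\rangle\big] + f\big(t,x,u(t,x),\sigma^{\top}(t,x)\nabla_x u(t,x), u(t,x+\beta(x)) - u(t,x)\big) = 0,
$$
with the terminal condition $u(T,x) = g(x)$.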
2.2. Backward Stochastic Differential Equations with Jumps
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a complete probability space, $W_t$ be the $d$-dimensional standard Brownian motion in this probability space, and $\{\mathcal{F}_t\}_{t \in [0,T]}$ be the normal filtration generated by $W_t$ in the space $(\Omega, \mathcal{F}, \mathbb{P})$. The processes $X_t$, $Y_t$, $Z_t$ and $U_t$ are integrable $\mathcal{F}_t$-adapted stochastic processes. We consider a class of forward-backward stochastic differential equations with jump terms driven by $W_t$ and a compensated Poisson measure $\tilde{N}$, where $\tilde{N}_t = N_t - \lambda t$ is a centralized Poisson process with compensator intensity $\lambda$ such that $\mathbb{E}[\tilde{N}_t] = 0$, and $N_t$ is the Poisson process in the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\mathbb{E}[N_t] = \lambda t$. Under the standard Lipschitz assumptions on the coefficients $\mu$, $\sigma$, $\beta$, $f$ and $g$, the existence and uniqueness of the solution have been proved [13].
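A sketch of the standard form of such a forward-backward system driven by $W_t$ and the compensated Poisson process $\tilde{N}_t$ (the exact arguments and conventions of the paper's formulation may differ) is:
$$
X_t = \xi + \int_0^t \mu(s, X_s)\,\mathrm{d}s + \int_0^t \sigma(s, X_s)\,\mathrm{d}W_s + \int_0^t \beta(X_{s-})\,\mathrm{d}\tilde{N}_s,
$$
$$
Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s, U_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s - \int_t^T U_s\,\mathrm{d}\tilde{N}_s.
$$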
2.3. The Generalized Nonlinear Feynman-Kac Formula
Under appropriate regularity assumptions, if $u(t,x)$ satisfies Equation (1) together with a linear growth condition, then the following relationships between $u$ and the processes $(Y_t, Z_t, U_t)$ hold almost everywhere, and the resulting triple is the unique solution of the BSDE [11,12].
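The relationships in question are commonly stated as follows (a sketch under the usual assumptions, with notation as in Section 2.2; the paper's exact statement may differ slightly):
$$
Y_t = u(t, X_t), \qquad Z_t = \sigma^{\top}(t, X_t)\,\nabla_x u(t, X_t), \qquad U_t = u\big(t, X_{t-} + \beta(X_{t-})\big) - u(t, X_{t-}).
$$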
2.4. Improvement of Neural Network Parameter Optimization Algorithm
The Adam (Adaptive Moment Estimation) algorithm combines the RMSProp algorithm with classical momentum from physics. It uses the first-order and second-order moment estimates of the gradient to dynamically adjust the learning rate of each parameter. The Adam optimizer is one of the most popular classical optimizers in deep learning, and it also shows excellent performance in practice [14].
Although Adam combines RMSProp with momentum, adaptive moment estimation with Nesterov acceleration is often superior to plain momentum. Therefore, we consider incorporating the Nesterov acceleration effect [15] into the Adam algorithm, that is, using the NAdam (Nesterov-accelerated Adaptive Moment Estimation) optimization algorithm. The update formulas of NAdam are stated below, after the description of the quantities involved.
For the parameters $\theta$ of the neural network, $g_t$ is the gradient of the loss with respect to $\theta$ at step $t$. Moreover, $m_t$ and $v_t$ are the first-order and second-order moment estimates of the gradient, respectively, which can be regarded as estimates of the expectations $\mathbb{E}[g_t]$ and $\mathbb{E}[g_t^2]$, and $\beta_1$ and $\beta_2$ are their attenuation rates. Furthermore, $\hat{m}_t$ and $\hat{v}_t$ are the bias corrections of $m_t$ and $v_t$, which can be regarded as approximately unbiased estimates of the corresponding expectations. It can be seen that NAdam imposes a stronger constraint on the learning rate and has a more direct influence on the gradient update. As a result, we apply this new optimization algorithm in our method and compare the outcomes with those of the traditional Adam algorithm.
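For reference, a standard statement of the NAdam update (a sketch following the common simplified formulation, ignoring the momentum-decay schedule used in some implementations; the variant used in the experiments may differ in minor details) is:
$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,
$$
$$
\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}},
$$
$$
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}\left(\beta_1 \hat{m}_t + \frac{(1-\beta_1)\,g_t}{1-\beta_1^{t}}\right).
$$
In PyTorch, switching optimizers amounts to replacing torch.optim.Adam with torch.optim.NAdam.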
3. Main Theorem
3.1. Basic Ideas
In this paper we propose a deep learning-based PDE numerical solution for nonlinear PDEs with jump terms in the form of Equation (1). The basic ideas of the algorithm are as follows:
- (1)
By using the generalized nonlinear Feynman-Kac formula, the PDEs to be solved can be equivalently constructed using BSDEs with jumps.
- (2)
Taking the gradient operator and the jump term of the solution of the function to be solved as policy functions, the problem of solving the numerical solution of BSDEs can be considered as a stochastic control problem, which can be further considered as a reinforcement learning problem.
- (3)
Two different deep neural networks are used to approximate this pair of high-dimensional strategy functions, and the neural networks are trained through a deep learning method to obtain the numerical solution of the original equation.
3.2. Transforming the Nonlinear PDE Numerical Solution Problem into a Stochastic Control Problem
It is well known that the generalized nonlinear Feynman-Kac formula connects a PDE and a BSDE under appropriate assumptions. Thus, the solution $(Y_t, Z_t, U_t)$ of the above BSDE is equivalent to the viscosity solution $u$ of the semilinear parabolic partial differential equation with jump term in Equation (1) [16,17]. Because traditional numerical methods for partial differential equations do not perform well in high dimensions, we estimate the solution $u(0,\xi)$ of Equation (1) by estimating the solution of Equation (4). Since BSDEs developed out of the study of stochastic control problems and the two share common characteristics and internal relations, the nonlinear PDE problem can also be transformed into a stochastic control problem through the nonlinear Feynman-Kac formula.
For the stochastic control problem related to Equation (4), the corresponding state dynamics hold almost everywhere for all $t \in [0,T]$ [18].
Under an appropriate regularity assumption on the nonlinear function $f$, the group of solution functions composed of $Y_0$, $\{Z_t\}$ and $\{U_t\}$ is the unique global minimizer of the expected squared error between the terminal value $Y_T$ generated by the forward dynamics and the terminal condition $g(X_T)$, as sketched at the end of this subsection.
We regard $Z_t$ and $U_t$ as the policy functions of a deep learning problem and use DNNs to approximate them. In this way, the stochastic process $Y_t$ corresponds to the solution function of the stochastic control problem and can be computed from the policy functions $Z_t$ and $U_t$; thus, the numerical solution problem for the nonlinear partial differential equation is transformed into a stochastic control problem.
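Concretely, the minimization problem can be sketched as follows (the precise arguments of $f$ and the sign conventions follow the sketch in Section 2.2 and may differ from the paper's exact statement):
$$
\inf_{y_0,\, \{Z_t\},\, \{U_t\}} \; \mathbb{E}\Big[\big|g(X_T) - Y_T^{\,y_0, Z, U}\big|^2\Big], \qquad \text{where } Y_t^{\,y_0,Z,U} = y_0 - \int_0^t f(s, X_s, Y_s, Z_s, U_s)\,\mathrm{d}s + \int_0^t Z_s\,\mathrm{d}W_s + \int_0^t U_s\,\mathrm{d}\tilde{N}_s.
$$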
3.3. Forward Discretization of the Backward Stochastic Differential Equations with Jumps
For all $t \in [0,T]$, the following equations hold almost everywhere. By substituting (6) and (7) into (19), we obtain the forward representation of the process $Y$. Now, we discretize in the time dimension and divide the time interval $[0,T]$ into $N$ partitions, so that $0 = t_0 < t_1 < \cdots < t_N = T$ and $\Delta t_n = t_{n+1} - t_n$. For $N$ sufficiently large, according to (18) and (19) we obtain the discrete updates (21) and (22), where $\Delta W_n = W_{t_{n+1}} - W_{t_n}$ and $\Delta \tilde{N}_n = \tilde{N}_{t_{n+1}} - \tilde{N}_{t_n}$ [5,18].
The above is the Euler scheme for discretization. In this way, we can start from the initial value of (4) and finally obtain the approximation of the Euler scheme of (4) at N partitions.
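As an illustration, a minimal NumPy sketch of this forward Euler scheme for a single path (assuming, for concreteness, coefficient functions mu, sigma, beta, a driver f, a constant jump intensity lam and callables z and u giving the policy values at each step; these names are placeholders rather than the paper's notation) could look like this:

import numpy as np

def euler_forward(x0, y0, mu, sigma, beta, f, z, u, lam, T, N, rng):
    """One sample path of the Euler scheme for the FBSDE with jumps.

    x0, y0  : initial values of X and Y
    mu, sigma, beta : coefficient functions
    f       : driver f(t, x, y, z, u)
    z, u    : callables giving the policy values Z and U at (t, x)
    lam     : Poisson intensity, T : terminal time, N : number of time steps
    """
    dt = T / N
    x, y = x0, y0
    for n in range(N):
        t = n * dt
        dW = rng.normal(0.0, np.sqrt(dt))      # Brownian increment
        dN = rng.poisson(lam * dt)             # Poisson increment
        dN_tilde = dN - lam * dt               # compensated increment
        zn, un = z(t, x), u(t, x)
        # forward Euler step of Y (forward version of the backward equation)
        y = y - f(t, x, y, zn, un) * dt + zn * dW + un * dN_tilde
        # forward Euler step of X (compensated jumps; the paper's forward SDE
        # may instead use the raw Poisson increment)
        x = x + mu(t, x) * dt + sigma(t, x) * dW + beta(x) * dN_tilde
    return x, y

Here rng would be a NumPy random generator, e.g. np.random.default_rng().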
3.4. General Framework for Neural Networks
Two feedforward neural networks are established at each time step $t_n$ in (21) and (22). One is used to approximate the gradient of the unknown solution, that is, to approximate the process $Z_{t_n}$; this neural network is denoted by $Z_n^{\theta_n}$, where $\theta_n$ represents all parameters of this network and the subscript $n$ indicates that it approximates $Z$ at time $t_n$. The other is used to approximate the jump process of the unknown solution, that is, to approximate the process $U_{t_n}$; this neural network is denoted by $U_n^{\phi_n}$, where $\phi_n$ represents all parameters of this network and the subscript $n$ indicates that it approximates $U$ at time $t_n$. For convenience of expression, we abbreviate the time-$t_n$ values $X_{t_n}$, $Y_{t_n}$, $Z_{t_n}$ and $U_{t_n}$ as $X_n$, $Y_n$, $Z_n$ and $U_n$, respectively. As shown in Figure 1, all sub-neural networks are stacked together to form a complete neural network.
$X_0$ in the initial layer is a random variable. $Y_0$, $Z_0$ and $U_0$ in the initial layer are unknown, and we treat them as parameters of the neural network. The values $Y_n$, $Z_n$ and $U_n$ then correspond to the values of the BSDE with jumps through the discrete updates (21) and (22). In this neural network, the $X_n$ of the current layer is related to the $X_{n-1}$ of the previous layer, and at the same time the $Y_n$ of the current layer is related to the $X_{n-1}$, $Y_{n-1}$, $Z_{n-1}$ and $U_{n-1}$ of the previous layer. However, $Z_n$ and $U_n$ are not directly available in the current layer. Therefore, as shown in Figure 1, our solution is to start from the $X_n$ of the current layer and build two sub-neural networks to represent these two values. In addition, the construction of the final loss function can be obtained from the given terminal value condition $u(T,x) = g(x)$, that is, from $Y_N \approx g(X_N)$ in the neural network, which also corresponds to $Y_T = g(X_T)$ in the nonlinear Feynman-Kac formula.
Specifically, for $n = 0, 1, \ldots, N-1$, we can take $Y_0$, $Z_0$ and $U_0$ as parameters and propagate the Euler approximation of the process forward: for all appropriate $n$, we use $Z_n \approx Z_n^{\theta_n}(X_n)$ and $U_n \approx U_n^{\phi_n}(X_n)$. Then, we can obtain a suitable approximation $\hat{Y}_N$ of $Y_T$.
The mean squared error between the final output $\hat{Y}_N$ of the neural network and the true value $g(X_N)$ at the terminal time is selected as the loss function, where $\theta$ denotes the set of all training parameters of the above system. This loss function is appropriate because $Y_T = g(X_T)$ holds for the exact solution. The expectation in the loss function in (28) is taken over all sample paths; since there are infinitely many sample paths, it is not possible to traverse the entire training set. Therefore, we adopt an optimization method based on stochastic gradient descent: in each iteration, only a portion of the samples (a mini-batch) is selected to estimate the loss function, thereby obtaining the gradient of the loss with respect to all parameters and performing a one-step update of the neural network parameters.
Accordingly, backpropagation through this neural network uses an optimizer to update the parameters layer by layer. Once the DNN has been trained, the parameter representing $Y_0$ is fixed; taking this parameter out gives the required value $u(0,\xi)$.
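The whole stacked system can be sketched in PyTorch roughly as follows (a simplified sketch under our reading of Figure 1: y0, z0 and u0 are trainable parameters, z_nets[n] and u_nets[n] are the sub-networks at time $t_{n+1}$, and the forward path X together with the increments dW and dN_tilde is precomputed; names and shapes are illustrative rather than the paper's exact implementation):

import torch
import torch.nn as nn

class DeepBSDEJumpSolver(nn.Module):
    """Stacked network: trainable (Y0, Z0, U0) plus one (Z, U) sub-network pair per time step."""

    def __init__(self, d, N, make_subnet):
        super().__init__()
        self.y0 = nn.Parameter(torch.rand(1))        # initial value, estimate of u(0, xi)
        self.z0 = nn.Parameter(torch.rand(1, d))     # initial gradient term
        self.u0 = nn.Parameter(torch.rand(1))        # initial jump term
        self.z_nets = nn.ModuleList([make_subnet(d, d) for _ in range(N - 1)])
        self.u_nets = nn.ModuleList([make_subnet(d, 1) for _ in range(N - 1)])

    def forward(self, X, dW, dN_tilde, f, dt):
        # X: (batch, N+1, d) forward paths; dW: (batch, N, d); dN_tilde: (batch, N, 1)
        batch = X.shape[0]
        y = self.y0.expand(batch, 1)
        z = self.z0.expand(batch, -1)
        u = self.u0.expand(batch, 1)
        for n in range(X.shape[1] - 1):
            t = n * dt
            # forward Euler step of Y along each path
            y = (y - f(t, X[:, n], y, z, u) * dt
                   + (z * dW[:, n]).sum(dim=1, keepdim=True)
                   + u * dN_tilde[:, n])
            if n < len(self.z_nets):                 # sub-networks for t_1, ..., t_{N-1}
                z = self.z_nets[n](X[:, n + 1])
                u = self.u_nets[n](X[:, n + 1])
        return y                                     # approximation of Y_T

def terminal_loss(y_T, x_T, g):
    """Mean squared error between the network's Y_T and the terminal value g(X_T)."""
    return torch.mean((y_T - g(x_T)) ** 2)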
3.5. Details of the Algorithms
The detailed steps of our proposed deep learning-based algorithm for numerically solving the BSDE with jumps are presented as follows.
According to our algorithm, which uses a deep learning-based neural network solver, the BSDE with jumps in the form of Equation (4) can be numerically solved in the following way:
- Simulate sample paths using the standard Monte Carlo method;
- Use deep neural networks (DNNs) to approximate $Z_t$ and $U_t$, and then plug them into the BSDE with jumps to perform a forward iterative operation in time.
For simplicity, here we use the one-dimensional case as an example; the high-dimensional case is similar. We divide the time interval $[0,T]$ into $N$ partitions, so that $0 = t_0 < t_1 < \cdots < t_N = T$ and $\Delta t_n = t_{n+1} - t_n$, and we write $\Delta W_n = W_{t_{n+1}} - W_{t_n}$ and $\Delta \tilde{N}_n = \tilde{N}_{t_{n+1}} - \tilde{N}_{t_n}$. The detailed steps are as follows:
- (1)
Monte Carlo paths of the diffusion process $X_t$ are sampled using the Euler scheme. This step is the same as in the standard Monte Carlo method. Other discretization schemes can also be used, such as the logarithmic Euler discretization or the Milstein discretization.
- (2)
At $t = 0$, the initial values $Y_0$, $Z_0$ and $U_0$ are randomly selected as parts of the neural network parameters; they are constant random numbers selected from an empirical range.
- (3)
At each time step $t_n$, given $X_{t_n}$, the values $Z_{t_n}$ and $U_{t_n}$ are approximated using deep neural networks, respectively. Note that at every time $t_n$, the same two sub-neural networks are used for all Monte Carlo paths. In our one-dimensional case, as described in Section 3.4, we introduce two sub-deep neural networks, $Z_n^{\theta_n}$ such that $Z_{t_n} \approx Z_n^{\theta_n}(X_{t_n})$, and $U_n^{\phi_n}$ such that $U_{t_n} \approx U_n^{\phi_n}(X_{t_n})$, for $n = 1, \ldots, N-1$.
- (4)
For $n = 0, 1, \ldots, N-1$, we apply the discrete update (21). According to this formula, we can directly calculate $Y_{t_{n+1}}$ at the next time step; this propagation does not require any parameters to be optimized. In this way, the BSDE with jumps propagates forward in the time direction from $t = 0$ to $t = T$. Along each Monte Carlo path, as the BSDE with jumps propagates forward in time from 0 to $T$, $Y_T$ is estimated as $\hat{Y}_T^{\,\theta}$, where $\theta$ collects all the trainable parameters of the $N-1$ layers of sub-neural networks.
- (5)
Calculate the loss function, i.e., the mean squared error between $\hat{Y}_T$ and the terminal value $g(X_T)$, where $g$ is the terminal function.
- (6)
Use a stochastic optimization algorithm to minimize the loss function over all trainable parameters (a schematic training loop is sketched after this list). The estimated value of $Y_0$ obtained after training is the desired initial value $u(0,\xi)$ at time $t = 0$.
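A schematic training loop corresponding to steps (1)-(6), using the solver sketched in Section 3.4 (the sampling routine simulate_paths, the driver f and the terminal function g are assumed to be supplied by the user; this is an illustrative sketch, not the exact code used for the experiments):

import torch

def train(model, simulate_paths, f, g, dt, n_iter=4000, batch_size=64, lr=1e-2):
    # NAdam can be swapped for torch.optim.Adam to reproduce the baseline optimizer
    optimizer = torch.optim.NAdam(model.parameters(), lr=lr)
    for step in range(n_iter):
        # step (1): sample a mini-batch of forward paths and their increments
        X, dW, dN_tilde = simulate_paths(batch_size)
        # steps (2)-(4): forward pass through the stacked network gives Y_T
        y_T = model(X, dW, dN_tilde, f, dt)
        # step (5): mean squared error against the terminal condition g(X_T)
        loss = torch.mean((y_T - g(X[:, -1])) ** 2)
        # step (6): one stochastic-optimization update of all parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # the trained parameter y0 is the estimate of u(0, xi)
    return model.y0.detach()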
4. Numerical Results
In this section, we use deep neural networks to implement the theoretical framework of the proposed algorithm. Many problems in the financial field can be solved through numerical solutions of high-dimensional PDEs and the corresponding BSDEs. Among them, the jump process can capture sudden external shocks and thus accurately describe a class of uncertain events, a situation that is most common in financial markets. We therefore extend the numerical solution problem of a high-dimensional PDE to the case of jump diffusion. In this paper, three classical high-dimensional PDEs with important applications in finance-related fields are selected for numerical simulation: financial derivative pricing under the jump-diffusion model, bond pricing under the Vasicek model with jumps, and the Hamilton-Jacobi-Bellman equation.
The numerical experiments in this paper use a 64-bit Windows 10 operating system, based on the PyTorch deep learning framework in Python. All examples in this section are calculated based on 1000 sample paths, each with a time partition of N = 100. We ran each example 10 times independently and report the average result. All neural network parameters are initialized from a uniform distribution. Each sub-neural network at each time node contains 4 layers: one d-dimensional input layer, two (d + 10)-dimensional hidden layers and one d-dimensional output layer. We use the rectified linear unit (ReLU) as the activation function. Batch normalization (BN) is applied after each linear transformation and before each activation.
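For reference, one sub-network with the structure described above can be sketched as follows (input dimension d_in, two hidden layers of width d_in + 10, batch normalization after each linear transformation and before each ReLU activation; the output dimension is left as a parameter since the Z- and U-sub-networks may differ, so this is one consistent reading of the stated setup rather than the exact implementation):

import torch.nn as nn

def make_subnet(d_in, d_out):
    """Feedforward sub-network: d_in -> (d_in + 10) -> (d_in + 10) -> d_out, with BN + ReLU."""
    h = d_in + 10
    return nn.Sequential(
        nn.Linear(d_in, h),
        nn.BatchNorm1d(h),
        nn.ReLU(),
        nn.Linear(h, h),
        nn.BatchNorm1d(h),
        nn.ReLU(),
        nn.Linear(h, d_out),
    )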
4.1. Pricing of Financial Derivatives under Jump Diffusion Model
In this section, we apply the proposed method to the pricing of derivatives based on a 100-dimensional jump diffusion model. In this model, the stock price $S_t$ satisfies a jump diffusion model [19], where $r$ is the constant discount rate, $\lambda$ is the average number of jumps per unit time of the stock price, $\sigma$ is the constant volatility, and $V$ represents the jump magnitude. Assuming that the jump amplitude is fixed so that $V$ is constant, the jump term can be rewritten in terms of the compensated Poisson process, and the equation takes the form of the forward equation in Section 2.2.
It is known that the European call option with the stock as the underlying asset satisfies a partial differential equation of the form of Equation (1) for all $t \in [0,T)$ and $x \in \mathbb{R}^{100}$. The constant model parameters, the terminal time and the payoff parameters are fixed, and the components of the initial value $\xi$ are randomly selected from a fixed interval.
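As an illustration of the path simulation for this model, here is a sketch of an Euler scheme for a jump-diffusion stock price with constant relative jump size (the parameter values r, sigma, lam, jump_size and s0 below are placeholders, not the values used in the experiments):

import numpy as np

def simulate_jump_diffusion(s0, r, sigma, lam, jump_size, T, N, n_paths, seed=0):
    """Euler simulation of a jump-diffusion stock price with constant jump magnitude."""
    rng = np.random.default_rng(seed)
    dt = T / N
    S = np.empty((n_paths, N + 1))
    S[:, 0] = s0
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        dN = rng.poisson(lam * dt, size=n_paths)
        # drift compensated by lam * jump_size so the expected growth rate stays r
        S[:, n + 1] = S[:, n] * (1.0 + (r - lam * jump_size) * dt
                                 + sigma * dW + jump_size * dN)
    return S

# hypothetical usage: 1000 one-dimensional paths on [0, 1] with 100 time steps
paths = simulate_jump_diffusion(s0=1.0, r=0.05, sigma=0.2, lam=0.3,
                                jump_size=0.1, T=1.0, N=100, n_paths=1000)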
Table 1 and Table 2 show the main numerical results of solving the derivative pricing problem for the 100-dimensional jump diffusion model with the Adam and NAdam optimizers, respectively, including, as the number of iteration steps changes, the mean and standard deviation of the estimated initial value, the mean and standard deviation of the loss function, and the running time. Only the numerical results at selected iteration steps are displayed as typical examples. We take the initial value $u(0,\xi)$ obtained from the classical Monte Carlo method as the exact solution and compare it with the numerical results.
It can be seen from Table 1 and Table 2 that, for the same number of iteration steps, the computation times required by the two optimizers are not very different. As the iteration progresses, the loss function value of the NAdam optimizer is smaller than that of the Adam optimizer at the same iteration step, and the final loss function value is also smaller.
Figure 2 and Figure 3 show how the initial value estimate and the loss function change with the number of iteration steps when the Adam and NAdam optimizers are used. It can be seen that the initial value estimates and loss functions of both methods converge rapidly; however, the latter converges noticeably faster than the former, both for the initial value estimate and for the loss function. The former tends to converge after about 1500 iterations, while the latter tends to converge after about 700 iterations. After 4000 iterations, the numerical solution obtained with the Adam algorithm is 0.32143775, while that obtained with the NAdam algorithm is 0.32273737; the two are similar and not significantly different from the exact value simulated by the Monte Carlo method.
4.2. Bond Pricing under the Jumping Vasicek Model
In this section, we use the proposed method to solve the pricing problem of a class of bonds whose interest rates follow the Vasicek model with jumps [20,21,22]. In this model, the short-term interest rate $r_t$ obeys a stochastic differential equation of jump-diffusion type; that is, each component of $r_t$ follows a mean-reverting diffusion augmented by a compensated Poisson jump term.
For all $t \in [0,T]$, the constant model parameters (the mean-reversion speed, the long-run level, the volatility and the jump intensity) and the maturity are fixed, and the components of the initial short rate are randomly selected from a fixed interval. Then, the zero-coupon bond price $P(t,r)$ paying 1 at maturity $T$ under the above jump Vasicek model satisfies $P(T,r) = 1$ for all $r$, together with a partial differential equation of the form of Equation (1).
Table 3 and Table 4 show the main numerical results of solving this class of bond pricing problems, with interest rates following the Vasicek model with jumps, using the Adam and NAdam optimizers, respectively, including, as the number of iteration steps changes, the mean and standard deviation of the estimated initial value, the mean and standard deviation of the loss function, and the running time. Only the numerical results at selected iteration steps are displayed as typical examples. We take the initial value obtained from the classical Monte Carlo method as the exact solution and compare it with the numerical results.
It can be seen from Table 3 and Table 4 that, as the iteration progresses, the loss function value of the NAdam optimizer is smaller than that of the Adam optimizer at the same iteration step, and the final loss function value is also smaller, but at the cost of a longer running time.
Figure 4 and Figure 5 show how the initial value estimate and the loss function change with the number of iteration steps when the Adam and NAdam optimizers are used. It can be seen that the initial value estimates and loss functions of both methods converge rapidly; however, the latter converges noticeably faster and more stably than the former, both for the initial value estimate and for the loss function. After 4000 iterations, the numerical solution obtained with the Adam algorithm is 0.25530547, while that obtained with the NAdam algorithm is 0.27403888; the two are similar and not significantly different from the exact value simulated by the Monte Carlo method.
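A simulation sketch for the short rate under a Vasicek model with compensated Poisson jumps, together with a crude Monte Carlo reference for the bond price (the parameters a, b, sigma, lam and eta below are illustrative placeholders, not the parameters of the experiment):

import numpy as np

def simulate_jump_vasicek(r0, a, b, sigma, lam, eta, T, N, n_paths, seed=0):
    """Euler simulation of dr = a*(b - r)*dt + sigma*dW + eta*d(compensated Poisson)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    r = np.empty((n_paths, N + 1))
    r[:, 0] = r0
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        dN_tilde = rng.poisson(lam * dt, size=n_paths) - lam * dt  # compensated jumps
        r[:, n + 1] = r[:, n] + a * (b - r[:, n]) * dt + sigma * dW + eta * dN_tilde
    return r

def bond_price_mc(r_paths, T):
    """Monte Carlo reference P(0) = E[exp(-integral of r)] for a zero-coupon bond paying 1 at T."""
    dt = T / (r_paths.shape[1] - 1)
    integral = r_paths[:, :-1].sum(axis=1) * dt   # left Riemann sum of the rate path
    return np.exp(-integral).mean()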
4.3. Hamilton-Jacobi-Bellman (HJB) Equation
In fields such as finance, investment and risk management, optimization and control problems are frequently encountered, and these problems are often formulated as stochastic optimal control models. One way to solve this kind of control problem is to derive and solve the Hamilton-Jacobi-Bellman equation (HJB equation for short) of the corresponding control problem according to the dynamic programming principle. In this section, we use the proposed method to solve a class of 100-dimensional HJB equations [12].
For all $t \in [0,T]$ and $x \in \mathbb{R}^{100}$, the constant coefficients of the equation and the terminal function are fixed, and the components of the initial value $\xi$ are randomly selected from a fixed interval. The value function to be solved satisfies a terminal value condition at $t = T$, together with a partial differential equation of HJB type that fits into the framework of Equation (1).
The main numerical results of solving this class of HJB equations with the Adam and NAdam optimizers are shown in Table 5 and Table 6, respectively, including, as the number of iteration steps changes, the mean and standard deviation of the estimated initial value, the mean and standard deviation of the loss function, and the running time. Only the numerical results at selected iteration steps are displayed as typical examples. We take the initial value $u(0,\xi)$ obtained from the classical Monte Carlo method as the exact solution and compare it with the numerical results.
It can be seen from Table 5 and Table 6 that, as the iteration progresses, the loss function value of the NAdam optimizer is smaller than that of the Adam optimizer at the same iteration step, and the final loss function value is also smaller, but the running time is longer.
Figure 6 and Figure 7 show how the initial value estimate and the loss function change with the number of iteration steps when the Adam and NAdam optimizers are used. It can be seen that the latter converges noticeably faster than the former, both for the initial value estimate and for the loss function. The former tends to converge after about 7000 iterations, while the latter tends to converge after about 5500 iterations. After 8000 iterations, the numerical solution obtained with the Adam algorithm is 2.73136686, while that obtained with the NAdam algorithm is 2.6456454; the two are similar and not significantly different from the exact value simulated by the Monte Carlo method.
5. Conclusions
In this paper, we propose an algorithm that can be used to solve a class of partial differential equations with jump terms and their corresponding backward stochastic differential equations. Through the nonlinear Feynman-Kac formula, the above-mentioned high-dimensional nonlinear PDE with jumps and its corresponding BSDE can be expressed equivalently, and the numerical solution problem can be regarded as a stochastic control problem. Next, we treat the gradient and the jump process of the unknown solution as policy functions and use two neural networks at each time step to approximate them, respectively. In this way, we can use deep learning to overcome the “curse of dimensionality” caused by high-dimensional PDEs with jumps and obtain numerical solutions. In addition, we replace the traditional Adam algorithm with the NAdam stochastic optimization algorithm and apply it to our method.
Next, focusing on the financial field, we applied our algorithm to three common high-dimensional problems in finance-related fields and compared the results. We concluded that our algorithm performs well in numerical simulation. We also found that after applying the new optimizer to the deep learning algorithm, the convergence of the model is mostly faster and the generalization ability is significantly improved, at the cost of a possibly longer running time.
The algorithm proposed in this paper can be used for a wide variety of problems in finance, insurance, actuarial modeling and other fields, such as the pricing problem of some special path-dependent financial derivatives, the bond pricing problem under the jump diffusion model of interest rates, the pricing of equity-linked life insurance contracts and so on.
6. Discussion and Limitations
The algorithm proposed in this article targets only a special class of nonlinear parabolic PDEs with jump terms, so its scope of application is limited. We can consider improving the algorithm to broaden this scope to other special forms of PDEs. For example, the improved deep learning-based algorithm could be extended to solve reflected PDEs, using the penalty function method to approximate reflected PDEs by PDEs of the general form. We could also consider delayed PDEs, restricted PDEs, and so on. In addition, this paper only considers the most common Poisson process, not general jump processes; we hope to apply this algorithm to other jump processes in the future. Moreover, this paper does not address changes in the intensity of the jump process; we hope to extend the algorithm to general integro-partial differential equations in future work.
This paper mainly focuses on numerical simulations of examples from finance. In the future, more cases can be considered to test the practical effect of the proposed algorithm on other jump-diffusion instances. In addition, this paper only presents the specific steps of the algorithm and does not analyze its convergence, error estimates or the robustness of the stochastic simulation; we hope to address these points in future research.