1. Introduction
The study of complex physical systems described by Hamiltonian mechanics often requires a deep understanding of their underlying dynamics. The classical Hamiltonian formulation provides a mathematical framework for characterizing the evolution of systems in a deterministic manner. However, in many real-world scenarios, the dynamics are subject to unpredictable fluctuations and noise, rendering traditional Hamiltonian models insufficient to capture their behavior accurately. To address this challenge, the emerging field of stochastic Hamiltonian system (SHS) inference is gaining attention [1,2,3]. Stochastic Hamiltonians account for the inherent randomness and uncertainties present in physical systems, allowing for a more comprehensive modeling approach. Stochastic Hamiltonian systems are prevalent in the real world. As described in [4], Langevin introduced the Langevin equation, a stochastic Hamiltonian system, to explain the motion of particles in a fluid caused by random pulses of colliding fluid molecules and to show that the viscous drag is the average effect of these pulses. In control theory, stochastic Hamiltonian systems play an important role in the extension of optimal control to stochastic processes [5]. These systems also find applications in biology. For instance, in the study of ion channel selectivity in biological systems, Seifi et al. [6] employed the Spin–Boson model and a stochastic Hamiltonian model under classical noise to simulate the behavior of ion channels. Inferring stochastic Hamiltonian systems from observed data has proven to be a complex task due to the nonlinearity and high dimensionality of the underlying dynamics. In recent years, advances in deep learning and neural networks have opened up new avenues for tackling this problem. This paper focuses on the application of neural networks to infer SHSs from available data, aiming to learn the underlying dynamics of the observed processes. The goal is to estimate the drift and diffusion Hamiltonian functions that govern the evolution of the SHSs.
Remarkable advancements have been achieved in the realm of learning deterministic Hamiltonian systems. Chen et al. [7] introduced Neural ODEs as a method for learning ordinary differential equations (ODEs) from observed time series of states, which are assumed to be generated by the underlying ODE flow. The approach utilizes adjoint augmented ODE solvers to compute the gradient of the loss function with respect to the network parameters. For Hamiltonian systems, Greydanus et al. [8] proposed Hamiltonian Neural Networks (HNNs). HNNs employ a neural network to learn the Hamiltonian function instead of directly learning the vector field, so that the symplectic structure as well as certain conserved quantities related to the structure of the underlying system can be extracted from data. The network assumes access to the time derivatives of the positions and momenta and is trained to match these derivatives along the phase flow. In their work [9], Chen et al. introduced Symplectic Recurrent Neural Networks (SRNNs), which combine the principles of Neural ODEs and recurrent neural networks to learn the Hamiltonian from time series data in the state space. SRNNs exhibit robustness and demonstrate good performance in learning separable Hamiltonian dynamics. Zhu et al. [10] proved that Hamiltonian networks based on symplectic integrators possess network targets and exhibit superior generalization ability and prediction accuracy compared to non-symplectic networks. Jin et al. [11] developed SympNets, a novel class of symplectic neural networks for learning Hamiltonian systems; SympNets can simulate any symplectic mapping given appropriate activation functions. In their work [12], Xiong et al. addressed the problem of learning nonseparable Hamiltonian systems using a neural ODE with an augmented Hamiltonian system. Tong et al. [13] developed a lightweight and efficient symplectic neural network based on a Taylor series expansion and a fourth-order symplectic Runge–Kutta integrator; this network is also designed to learn separable Hamiltonian systems. In [14], David et al. developed a symplectic learning approach for HNNs: they apply symplectic schemes in the loss function to train a modified Hamiltonian and then deduce the true Hamiltonian using its relationship with the modified one. Chen and Tao proposed the learning of exactly symplectic maps in [15].
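To make the HNN idea concrete, a minimal PyTorch sketch is given below: a scalar network approximates the Hamiltonian, and its symplectic gradient is matched to the observed time derivatives. The architecture, names, and hyperparameters are illustrative placeholders, not the settings used later in this paper.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Minimal Hamiltonian neural network: a scalar function H_theta(q, p)."""
    def __init__(self, dim=1, width=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, width), nn.Tanh(),
                                 nn.Linear(width, 1))

    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1))

def hnn_loss(model, q, p, dq_dt, dp_dt):
    """Match the symplectic gradient of H_theta to the observed derivatives."""
    q = q.requires_grad_(True)
    p = p.requires_grad_(True)
    H = model(q, p).sum()
    dH_dq, dH_dp = torch.autograd.grad(H, (q, p), create_graph=True)
    # Hamilton's equations: dq/dt = dH/dp,  dp/dt = -dH/dq
    return ((dH_dp - dq_dt) ** 2 + (-dH_dq - dp_dt) ** 2).mean()
```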
In recent years, a variety of approaches for inferring stochastic differential equations (SDEs) from data have arisen. Based on the framework of [7], several authors extended the method to neural stochastic differential equations (neural SDEs) by incorporating stochasticity into the dynamics. Tzen et al. [16] introduced a novel framework that combines neural networks and Gaussian latent variables to model SDEs in the diffusion limit. Ref. [17] presented a method for efficiently computing gradients and performing variational inference in models involving SDEs. Yildiz et al. [18] combined Gaussian processes and neural networks to estimate the drift and diffusion coefficients of SDEs from sparse and noisy data. Jia et al. [19] introduced neural jump SDEs, which integrate neural networks with jump diffusion processes to model rare events and non-Gaussian dynamics. Kong et al. [20] introduced the SDE-Net, treating the forward propagation of a deep neural network as the evolution of a stochastic dynamical system driven by Brownian motion; by adopting the generative model of the SDE framework, they successfully addressed various image-related challenges. Archibald et al. [21] presented a sample-wise back-propagation method for training stochastic neural networks through a stochastic optimal control framework, utilizing a discretized stochastic differential equation structure, and provided convergence analysis with and without convexity assumptions. Gobet et al. [22] developed a neural network-based method to estimate the parameters of scalar SDEs from low-frequency data, improving traditional estimation techniques. Ref. [23] introduced a novel stochastic differential equation framework for generative modeling that smoothly transforms data distributions using noise injection, leveraging score-based generative modeling and diffusion probabilistic modeling to achieve record-breaking performance in image generation tasks. Liu et al. [24] proposed a supervised learning framework for training generative models using a novel score-based diffusion model, avoiding issues of unsupervised training by generating labeled data through a training-free score estimation method, resulting in improved sampling efficiency and performance. Meanwhile, some scholars have explored the integration of popular models with SDEs. Ref. [25] treated neural SDEs as Bayesian neural networks of infinite depth, utilizing Bayesian inference to infer the drift and diffusion coefficients. Kidger et al. [26] employed neural SDEs as the generator of a GAN while using a neural controlled differential equation (CDE) as the discriminator. These methods parameterize the drift and diffusion coefficients with neural networks, but different perspectives and inference methods have led to various neural SDE models, such as those based on the Fokker–Planck equations associated with the underlying SDEs [27,28,29,30,31,32], where [28,29,30,31] extended the driving process to non-Gaussian Lévy processes. There are also other approaches leveraging the probability density of the latent processes for inference, such as the variational approaches [33,34], the numerics-informed maximum likelihood method [35], and normalizing flow techniques [36,37,38,39,40]. Methods extending the ResNet approach and flow map operators to the stochastic context appear in [41,42], and an approach employing neural ODEs via the dynamics of the moments of SDE solutions can be found in [43].
For stochastic Hamiltonian systems, we propose a neural network learning approach for extracting the drift and diffusion Hamiltonian functions from observational data. It is based on expressing the solution in an integral form and then applying numerical quadrature formulae to approximate the integrals. To deal with the stochasticity, we employ moment calculations of the solutions. Our contribution mainly lies in the generalization of HNNs to the stochastic context, with a quadrature-based model structure that can leverage numerical quadrature theory to improve the accuracy of HNN learning. Meanwhile, the moment calculations provide a direct and simple denoised loss and training strategy, which may benefit the generalization of the algorithm. Numerical experiments on four Hamiltonian systems support the proposed learning approach, showcase better learning accuracy and generalization ability compared to certain existing methods, and suggest its potential for improving the learning effect by utilizing quadrature formulae of higher accuracy.
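To give a flavor of the moment-based denoising idea, the sketch below (with hypothetical tensor names and shapes) computes empirical first and second moments of state increments over many realizations started from the same initial value; these averaged quantities serve as essentially noise-free training targets, which the loss in Section 3 combines with quadrature approximations of the integral form of the solution.

```python
import torch

def increment_moments(trajectories):
    """trajectories: tensor of shape (n_samples, n_steps, dim), containing many
    realizations of the SHS started from one initial value.  Returns empirical
    first and second moments of the increments, averaging out the noise."""
    dX = trajectories[:, 1:, :] - trajectories[:, :-1, :]     # increments per step
    m1 = dX.mean(dim=0)                                       # ~ E[X_{t+h} - X_t]
    m2 = torch.einsum('nsi,nsj->sij', dX, dX) / dX.shape[0]   # ~ E[dX dX^T]
    return m1, m2
```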
The rest of the contents are organized as follows. Section 2 provides an introduction to the problem statement for stochastic Hamiltonian systems. Section 3 presents the design of the learning model and the network settings, including the data acquisition, the learning strategy, and the derivation of the loss functions. Section 4 conducts numerical experiments on four different Hamiltonian systems, illustrating the performance of the learning algorithm. Section 5 lists some limitations of the current study, followed by a brief conclusion in Section 6.
4. Numerical Experiments
The experiments in this section are conducted on a system running Windows 11, within a virtual environment created with Anaconda that includes Python 3.9.7 and Torch 1.12.0.
4.1. Example 1
This example aims to compare the learning accuracy as well as the prediction ability of the corrected SHNN method [14] and our quadrature-based learning model. The underlying Hamiltonian system is the harmonic oscillator (24). We approximate the Hamiltonian by a fully connected neural network in the quadrature-based model and by another one in the corrected SHNN model. Both networks have a single hidden layer of width 16 and share the same activation function, and we use Adam as the optimizer. For training, the number of initial values is fixed, and the initial points are uniformly sampled from a fixed region of the phase space. For the quadrature-based model, we choose three different quadrature formulae: Simpson's rule, the two-point Gaussian quadrature, and the three-point Gaussian quadrature. Their weights and nodes on the reference interval are listed in Table 1.
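For reference, the standard weights and nodes of these three formulae on the normalized interval [0, 1] are given in the snippet below; Table 1 may state them in a different normalization of the integration interval.

```python
import math

# Quadrature rules on [0, 1]:  int_0^1 f(s) ds  ~  sum_i w[i] * f(c[i])
SIMPSON = ([1/6, 4/6, 1/6], [0.0, 0.5, 1.0])
GAUSS2  = ([1/2, 1/2], [0.5 - 0.5 / math.sqrt(3), 0.5 + 0.5 / math.sqrt(3)])
GAUSS3  = ([5/18, 4/9, 5/18],
           [0.5 - 0.5 * math.sqrt(3/5), 0.5, 0.5 + 0.5 * math.sqrt(3/5)])

def quad(rule, f):
    weights, nodes = rule
    return sum(w * f(c) for w, c in zip(weights, nodes))
```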
Denoting the six distinct nodes of these formulae other than the left endpoint, taken in increasing order, as the node sequence, we generate the training data set by applying numerical methods to the harmonic oscillator (24) with a tiny time step, starting from each sampled initial value.
The loss function of the corrected SHNN model is built by using the midpoint method as the symplectic integrator, applied to a pair of consecutive observation times. After training, the learned Hamiltonian needs to be corrected according to the relation given in [14] in order to bring it closer to the true Hamiltonian H.
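The essence of placing a symplectic integrator inside the loss can be sketched as follows: given consecutive observations, the one-step residual of the implicit midpoint rule applied to the network Hamiltonian is minimized. This is only a schematic of the training step; the post-training correction of [14], which recovers the true Hamiltonian from the learned modified one, is omitted, and all names are illustrative.

```python
import torch

def midpoint_residual_loss(H_tilde, x0, x1, h):
    """One-step residual of the implicit midpoint rule.
    x0, x1: observed states at consecutive times, shape (batch, 2*dim), x = (q, p);
    H_tilde: network approximating the *modified* Hamiltonian."""
    xm = (0.5 * (x0 + x1)).detach().requires_grad_(True)
    grad = torch.autograd.grad(H_tilde(xm).sum(), xm, create_graph=True)[0]
    dH_dq, dH_dp = grad.chunk(2, dim=-1)
    J_grad = torch.cat([dH_dp, -dH_dq], dim=-1)   # J * grad H at the midpoint
    return (((x1 - x0) / h - J_grad) ** 2).mean()
```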
In this example, we use the average learning error (25) defined in [14] to measure the accuracy of approximating H by the learned Hamiltonian in the corrected SHNN method and in our quadrature-based learning model; the definition involves the average value of a function f over a region V. This error representation aims to eliminate the additional constant arising from deriving the Hamiltonians from their gradients in the learning process. In our numerical experiment, V is a fixed bounded region of the phase space.
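One plausible implementation of such a constant-removed error on points sampled from V is sketched below; the exact norm used in [14] may differ.

```python
import torch

def average_learning_error(H_learned, H_true, grid):
    """grid: points sampled (e.g., uniformly) from the region V.
    Subtracting the mean difference removes the additive constant that is
    not determined when the Hamiltonian is learned from its gradient."""
    with torch.no_grad():
        diff = H_learned(grid).squeeze() - H_true(grid).squeeze()
        return (diff - diff.mean()).abs().mean()
```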
Table 2 lists the average errors of learning the Hamiltonian resulting from the quadrature-based model with the three different quadrature formulae, as well as from the corrected SHNN method. Hereby, the learning errors have been calculated according to the error formulae above, where h is the time step used in the loss functions. It can be seen that the three quadrature-based models provide more accurate approximations than the corrected SHNN method, and that the learning accuracy increases when more precise quadrature formulae are used in the quadrature-based models.
Figure 1 compares the evolution of the solution predicted by the corrected SHNN model and by the three quadrature-based models with the true solution over the test time interval. Figure 2 compares the predictions of the corrected SHNN method and of the three-point Gaussian quadrature-based model with the true solution over a longer time interval. It is clear that the quadrature-based models behave better in long-term prediction than the corrected SHNN, and that the three-point Gaussian quadrature provides the best learning accuracy among the three quadrature formulae.
4.2. Example 2
In this example, we consider the stochastic Kubo oscillator [2], which is a benchmark stochastic Hamiltonian system. For the training, we fix the number of initial values and sample the initial points uniformly from a fixed region. We generate the training data set by applying numerical methods to the Kubo oscillator system with appropriately tiny step sizes, starting from each sampled initial point.
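As an illustration of this data-generation step, the sketch below integrates the Kubo oscillator with a stochastic midpoint scheme. We assume here the commonly used Hamiltonians H0 = a(p^2 + q^2)/2 and H1 = sigma(p^2 + q^2)/2 with illustrative constants a and sigma, which need not coincide with the exact parameters of this example.

```python
import numpy as np

def kubo_midpoint(x0, a=1.0, sigma=0.5, h=1e-3, n_steps=1000, rng=None):
    """Stochastic midpoint scheme for the Kubo oscillator (Stratonovich form)
        dq = a*p dt + sigma*p o dW,   dp = -a*q dt - sigma*q o dW,
    solved per step by a few fixed-point iterations."""
    rng = rng if rng is not None else np.random.default_rng()
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        x = traj[-1]
        dW = rng.normal(0.0, np.sqrt(h))
        x_new = x.copy()
        for _ in range(5):                       # fixed-point iteration
            x_mid = 0.5 * (x + x_new)
            x_new = x + (a * h + sigma * dW) * (J @ x_mid)
        traj.append(x_new)
    return np.stack(traj)                        # shape (n_steps + 1, 2)
```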
The two sub-networks of the model, which approximate the drift and the diffusion Hamiltonian, respectively, consist of fully connected layers with bias vectors whose sizes and initial values are fixed at the start of training; a smooth activation function is chosen, and the network is trained with the corresponding hyperparameters.
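Since the exact layer sizes and initial values are implementation details, the following is one plausible minimal architecture with two scalar sub-networks for the drift and diffusion Hamiltonians; the width and activation below are placeholders.

```python
import torch.nn as nn

class SHSNet(nn.Module):
    """Two scalar-valued networks approximating the drift Hamiltonian H0
    and the diffusion Hamiltonian H1 of a stochastic Hamiltonian system."""
    def __init__(self, dim=1, width=16):
        super().__init__()
        self.H0 = nn.Sequential(nn.Linear(2 * dim, width), nn.Sigmoid(),
                                nn.Linear(width, 1))
        self.H1 = nn.Sequential(nn.Linear(2 * dim, width), nn.Sigmoid(),
                                nn.Linear(width, 1))

    def forward(self, x):
        return self.H0(x), self.H1(x)
```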
The three panels in Figure 3, from left to right, depict the comparison in the phase space between the true solutions and the predicted solutions generated by the neural network using the Simpson quadrature, the two-point and the three-point Gaussian quadrature, respectively. Hereby, both the predicted and the true solutions are obtained by applying the midpoint scheme to the learned and the true system, respectively, with suitable time steps for prediction and for simulating the true solution. It can be seen that the three-point Gaussian quadrature yields the most accurate prediction.
Figure 4 compares the evolution of the predicted and true solution components over the test time interval, whereby the network predictions in the left, middle and right panels are based on the Simpson, the two-point and the three-point Gaussian quadrature, respectively. Again, the prediction accuracy increases from left to right.
Figure 5 visualizes the learning error distributions of the drift Hamiltonian (panels (a1), (b1), (c1)) and the diffusion Hamiltonian (panels (a2), (b2), (c2)) in the phase space, whereby the results in panels (ai), (bi), and (ci) (i = 1, 2) are obtained using the Simpson, the two-point and the three-point Gaussian quadrature, respectively. Here, the error function (28) between a learned Hamiltonian and the corresponding true Hamiltonian is defined as in [14]; it is a function on the phase space and equals the average learning error (25) after averaging over the region V. In this experiment, V is a fixed region of the phase space. It can be seen from the figures that the learning errors of the Hamiltonian functions are distributed almost uniformly in the phase space, meaning that there are no particular regions exhibiting significantly large or small errors. Meanwhile, it is clear that utilizing the two- and three-point Gaussian quadrature produces better learning accuracy.
The average learning errors (25) of the drift and diffusion Hamiltonians on the region V, obtained with the three different quadrature formulae, are listed in Table 3. The reported errors are the average results of ten tests. It is evident that the three-point Gaussian quadrature yields the best accuracy, while the error of Simpson's formula is the largest among the three.
Now we compare our method with that of [48], whose idea is as follows. For stochastic differential equations whose drift and diffusion coefficients carry learnable parameters, the Kramers–Moyal formula (30) expresses these coefficients as limits of conditional moments of the state increments. Based on (30), the loss functions are built by matching the parameterized drift and diffusion to the corresponding finite-time moments of the training data, which are numerically generated with a small time step from initial points uniformly sampled from a fixed region. Both the drift and the diffusion networks are single-layer, fully connected neural networks with a width of 32. The training epoch is 1000, and the learning rate of Adam is 0.001. For convenience, we refer to the above method of learning SDEs as the Kramers–Moyal method. Next, we compare the predicted solution obtained by the Kramers–Moyal method with that of our quadrature model.
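Schematically, a Kramers–Moyal-type loss matches the parameterized drift and squared diffusion to finite-difference conditional moments of the data, as in the sketch below; the actual loss of [48] may weight or organize these terms differently, and the tensor names are ours.

```python
import torch

def kramers_moyal_loss(mu_net, sigma_net, x0, x1, dt):
    """x0, x1: observed states at times t and t + dt, shape (batch, dim).
    Kramers-Moyal formula, for small dt:
        E[X_{t+dt} - X_t | X_t = x]                  ~ mu(x) * dt
        E[(X_{t+dt} - X_t)(X_{t+dt} - X_t)^T | X_t]  ~ sigma(x) sigma(x)^T * dt"""
    dx = x1 - x0
    drift_res = mu_net(x0) - dx / dt                          # drift residual
    sig = sigma_net(x0)                                       # shape (batch, dim, dim)
    diff_res = sig @ sig.transpose(-1, -2) - torch.einsum('bi,bj->bij', dx, dx) / dt
    return (drift_res ** 2).mean() + (diff_res ** 2).mean()
```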
Figure 6 draws the predicted solution over the test time interval arising from the two learning methods, as well as from the true system. It is obvious that the three-point Gaussian quadrature-based model provides better learning accuracy and possesses stronger generalization ability than the Kramers–Moyal method.
4.3. Example 3
In this subsection, we consider a stochastic Hamiltonian system with non-polynomial drift and diffusion Hamiltonians. The setup of the training data and the neural network is the same as in Example 2.
Figure 7 displays the comparison between the learned and the true Hamiltonian functions obtained with the three quadrature formulae, where the panels from left to right correspond to Simpson's rule, the two-point and the three-point Gaussian quadrature formula, respectively.
The left, middle and right panels in Figure 8 illustrate the comparison between the true and predicted phase trajectories generated by the neural network using Simpson's rule, the two-point and the three-point Gaussian quadrature, respectively, over the test time interval. The predicted and true solutions are simulated by the midpoint discretization of the learned and the true system, respectively, with suitable time steps for prediction and for the true solution. Obviously, the Gaussian quadrature formulae provide better predictions than Simpson's rule, and the three-point Gaussian quadrature behaves best.
In Figure 9, the spread of the learning error function (28) in the phase space for the drift Hamiltonian resulting from Simpson's rule, the two-point, and the three-point Gaussian quadrature formulae is illustrated in panels (a1), (b1) and (c1), respectively, and that for the diffusion Hamiltonian is shown in panels (a2), (b2) and (c2), accordingly.
The average learning errors (25) of the drift and diffusion Hamiltonians on the region V, obtained with the three different quadrature formulae, are listed in Table 4. The reported errors are the mean values of ten tests. The advantage of the Gaussian quadrature formulae, especially the three-point one, is evident.
4.4. Example 4
In this example, we compare our method with the numerics-informed likelihood method given in [35], which simulates the probability density function of the solution based on the Euler–Maruyama scheme, and we also compare with the Kramers–Moyal method. The considered stochastic Hamiltonian system is the same as that in Example 3, and it can be written explicitly in the Itô form (33)–(34), which is equivalent to its Stratonovich form. As described in [35], when using the likelihood method, Equation (33) is assumed to be known, while the drift coefficient b and the diffusion coefficient are unknown.
In the experiment, we build two fully connected neural networks to approximate b and the diffusion coefficient, respectively. Both networks have 2 layers, each with 50 neurons, and employ the exponential linear unit (ELU) as the activation function. The loss function for training them by the likelihood method is the negative log-likelihood induced by the numerical scheme, for which we fix the number of initial values. As in [35], the initial observations are sampled uniformly from a fixed region. The subsequent observations are generated as in [35] by applying an approximated adjoint symplectic Euler–Maruyama method to (33) and (34) with a tiny step size over 10 steps, where the fixed initial observations are used to approximate the corresponding states appearing in the 10-step numerical simulation. The training epoch is 200, the learning rate of Adam is 0.01, and the batch size is 32. The setup of the training data and the neural network for the Kramers–Moyal method is the same as in Example 2. Next, we compare the b and the diffusion coefficient obtained by the above likelihood method and by the Kramers–Moyal method with the results of our method in Example 3.
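The principle behind the likelihood loss can be sketched with a plain Euler–Maruyama step, under which the one-step transition density is Gaussian and its negative log-likelihood is minimized. The method of [35] instead uses an approximated adjoint symplectic Euler–Maruyama scheme over several steps and applies it to the components containing the unknown coefficients, so the snippet below is only an illustration with hypothetical names.

```python
import math
import torch

def em_negative_log_likelihood(b_net, sigma_net, x0, x1, h):
    """Gaussian transition density implied by one Euler-Maruyama step:
        X_{t+h} | X_t = x0  ~  N( x0 + b(x0) * h,  sigma(x0)^2 * h )."""
    mean = x0 + b_net(x0) * h
    var = sigma_net(x0) ** 2 * h + 1e-8              # small jitter for stability
    nll = 0.5 * ((x1 - mean) ** 2 / var + torch.log(2 * math.pi * var))
    return nll.sum(dim=-1).mean()
```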
Figure 10 compares the learned b and the learned diffusion coefficient obtained by the likelihood method, the Kramers–Moyal method and the three-point Gaussian quadrature-based method with their true counterparts, in the left and right panels, respectively.
Figure 11 draws the predicted solution over the test time interval arising from the three learning methods as well as from the true system. It is obvious that the three-point Gaussian quadrature-based model provides better learning accuracy and possesses stronger generalization ability than the likelihood method and the Kramers–Moyal method.
Figure 12 illustrates the probability density functions (PDFs) of the solution at three time points, estimated using 100 trajectories generated from 100 initial points, for the systems learned by the likelihood method and by the quadrature-based model. It can be seen that the three-point Gaussian quadrature-based model produces good estimates of the PDFs for both small and large time points t, while the likelihood method only gives an approximate PDF for small t.
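Such PDF estimates can be obtained from the sampled trajectories with a kernel density estimator, for instance via SciPy (not part of the environment listed above; the array names are hypothetical):

```python
import numpy as np
from scipy.stats import gaussian_kde

def pdf_at_time(trajectories, step_index, grid, component=0):
    """trajectories: array of shape (n_paths, n_steps, dim).
    Returns a kernel density estimate of one state component at a time point."""
    samples = trajectories[:, step_index, component]
    return gaussian_kde(samples)(grid)

# usage sketch:
# grid = np.linspace(-3.0, 3.0, 200)
# density = pdf_at_time(learned_trajectories, step_index=100, grid=grid)
```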
4.5. Example 5
In the last example, we consider a system with non-polynomial Hamiltonian functions, on which we compare our method with the Kramers–Moyal method. The configuration of the training data and the neural network is the same as in Example 1, except for a smaller choice of one training parameter for the quadrature model. The setup of the training data and the neural network for the Kramers–Moyal method is the same as in Example 2. Next, we illustrate the predicted solutions of the Kramers–Moyal method and of our quadrature-based model.
Figure 13 draws the predicted solution over the test time interval arising from the two learning methods, as well as from the true system. It is evident that the three-point Gaussian quadrature-based model behaves better in accuracy and generalization ability than the Kramers–Moyal method.
4.6. Summary
Examples 1, 2, 4 and 5 compare our method with the existing corrected SHNN method, the numerics-informed likelihood method and the Kramers–Moyal method, applied to deterministic/stochastic harmonic Hamiltonian oscillators as well as to two other SHSs with non-polynomial Hamiltonian functions. The experimental results show the superiority of our method in learning accuracy and long-time prediction, which may essentially result from the quadrature-based model structure and the moment denoising technique. Examples 2 and 3 compare the application of three different quadrature formulae in the model, namely Simpson's rule and the two-point and three-point Gaussian quadrature formulae. It is shown that the learning accuracy and the prediction ability of our method increase when more precise quadrature formulae are employed.
6. Conclusions
We proposed a neural network learning method for detecting the drift and diffusion Hamiltonian functions of stochastic Hamiltonian systems from data, where the loss functions are built upon the integral formulation of the system solution, with the integrals approximated by numerical quadrature formulae and the stochasticity 'removed' via moment calculations. Numerical experiments show the effectiveness of the methodology and indicate that the learning accuracy of the proposed models can be improved by employing quadrature formulae of higher accuracy. Moreover, the numerical comparison with the corrected SHNN method, the numerics-informed likelihood method and the Kramers–Moyal method on four concrete Hamiltonian systems shows better accuracy and stronger generalization ability of our models. This may mainly be owed, on the one hand, to the integral formulation of the loss, which enables a more feasible reduction of the learning error via precise quadrature formulae instead of through the complex Hamilton–Jacobi approach used by classical models, and, on the other hand, to the simple moment fitting method rather than an intricate denoising technique, which enlarges the possibility of better generalization.
The proposed models generalize the HNNs for deterministic Hamiltonian systems to the stochastic context and provide a new way of improving the accuracy and prediction ability of HNNs from the integral point of view.
Acknowledging the prevalence of multiple noise phenomena in real-world scenarios, our future work will consider expanding the method to SHSs featuring high-dimensional noises. Moreover, the theoretical convergence analysis of the algorithm for a deeper exploration of the inherent nature of the method will be a topic of our further investigation.