Article

Using Machine Learning for Climate Modelling: Application of Neural Networks to a Slow-Fast Chaotic Dynamical System as a Case Study

by Sergei Soldatenko * and Yaromir Angudovich
Arctic and Antarctic Research Institute, St. Petersburg 199397, Russia
* Author to whom correspondence should be addressed.
Climate 2024, 12(11), 189; https://doi.org/10.3390/cli12110189
Submission received: 8 October 2024 / Revised: 10 November 2024 / Accepted: 12 November 2024 / Published: 15 November 2024
(This article belongs to the Special Issue Addressing Climate Change with Artificial Intelligence Methods)

Abstract: This paper explores the capabilities of two types of recurrent neural networks, unidirectional and bidirectional long short-term memory (LSTM) networks, to build a surrogate model for a coupled fast–slow dynamical system and to predict its nonlinear chaotic behaviour. The dynamical system in question, comprising two versions of the classical Lorenz model with a small time-scale separation factor, is treated as an atmosphere–ocean research simulator. In numerical experiments, the number of hidden layers varied from 1 to 5 and the number of nodes in each hidden layer from 16 to 256. The basic configuration of the surrogate model, determined experimentally, has three hidden layers, each comprising between 16 and 128 nodes. The findings revealed the advantage of bidirectional neural networks over unidirectional ones in terms of forecasting accuracy. As the forecast horizon increases, forecast accuracy deteriorates, as expected, primarily because of the chaotic behaviour of the fast subsystem. All other things being equal, increasing the number of neurons in the hidden layers improves forecast accuracy; the results indicate that the quality of short-term forecasts with lead times of up to 0.75 model time units (MTU) improves most significantly. The predictability limit of the fast subsystem (the "atmosphere") is somewhat greater than the Lyapunov time.

1. Introduction

Predicting climate change caused by natural and human-made factors is one of the key problems of contemporary climatology [1,2]. Since the Earth's climate system (ECS) has a number of rather specific features [3], researchers predict and project future climate with fairly complex climate models, which describe, among other things, the general circulation of the atmosphere and oceans [4,5,6]. In general terms, climate models are systems of partial differential equations (DEs) that express the laws of physics, chemistry, thermodynamics, and fluid mechanics. Since these equations cannot be solved analytically, numerical methods are usually used for this purpose. Building and running climate models require enormous human and computational resources [7], and researchers have long looked for ways to overcome these critical issues. Recent advances in artificial intelligence (AI), and machine learning (ML) in particular, have allowed scientists to create so-called surrogate models (emulators or meta-models), which are comparable in accuracy to traditional "physically based" climate models but are computationally far more efficient (e.g., [8,9,10]). Surrogate models are greatly simplified versions of complex models of physical and other systems. Typically, they map input data to output data, assuming that the inputs and outputs are (nonlinearly) related to each other. Supervised ML is a suitable and effective tool for building such emulators.
Over the last few years, there has been a steady increase in the number of publications considering various aspects of applying ML, and in particular deep learning (DL), a subset of ML, to problems in meteorology, hydrology, and climatology. This demonstrates the growing interest of climate modelling experts in AI. The main areas of AI application in climate research and numerical weather prediction (NWP) have been discussed in a number of review articles (e.g., [11,12,13,14,15,16,17,18,19,20,21,22]). Experts agree that AI technologies, and ML algorithms in particular, have great potential for bias correction of in situ and remote observations, data assimilation, developing parameterization schemes for sub-grid physical processes, processing satellite images, estimating NWP and climate model parameters, post-processing NWP and climate model outputs, building surrogate weather prediction and climate models, etc.
As mentioned above, all existing physically based (classical) NWP and climate models are built on a unified methodological basis, since the DEs used in these models to describe the evolution of the ECS and its components (e.g., the atmosphere and oceans) are essentially the same. Meanwhile, in the study of complex systems, an interdisciplinary approach called surrogate modelling has recently become increasingly widespread (e.g., [23,24,25,26,27,28]). The essence of this approach is to build an emulator that can replace a physically based model of a system with a comparable degree of adequacy. The process of building an emulator is based on ML, including the use of multilayer artificial neural networks (ANNs). Meta-models vastly (by orders of magnitude) reduce computational time; in other words, they are significantly more computationally efficient than physically based models.
To date, the most notable research advances have been achieved in developing ML-based sub-grid-scale parameterization of physics, data pre-processing, quality control and assimilation, and post-processing of modelling and forecasting results (e.g., [8,9,10,19,20,21,22,29] and references therein). In recent years, a number of studies have demonstrated the capabilities of DL to model global temperature trends (e.g., [30]) and phenomena, such as the El Niño–Southern Oscillation using coupled atmosphere–ocean models (e.g., [31,32,33,34]). However, science is never settled; therefore, the potential for using new ML technologies to build surrogate NWP and climate models, comparable in quality to classical models, is actively explored on an ongoing basis (e.g., [8,9,35,36,37,38,39,40,41,42,43,44]).
The ECS is inherently nonlinear. Its components, such as the atmosphere and oceans, as well as various internal dynamical, physical, and chemical processes and cycles, are also nonlinear. For this reason, building a neural network-based ECS model is a challenging problem that requires, among other things, high computing power and extensive tuning via a large number of numerical experiments [7].
A fairly large and continuously expanding variety of ANNs has been developed to date. They can be broadly classified by architecture and topology, by the methods used to train them and determine their weights, and by the problems they are intended to solve (see [45,46,47] and references therein). ANNs within each class may differ in the number of hidden layers and neurons, the activation function, the optimization algorithm, and other details that establish how input data are processed into output data. In general, the choice of ANN structure depends on the problem being solved, the available data, accuracy requirements, experience, and other factors. In this context, reduced-order climate models can be viewed as testing tools that allow researchers to determine and evaluate the capabilities of ANNs with different architectures for building reliable data-driven surrogate models of the climate system (e.g., [48]).
It is important to note that some reduced-order models describing the essential properties of the ECS can also reproduce the complex and chaotic behaviour of the atmosphere, which is the most unstable and fluctuating component of the climate system [49,50].
This paper aims to explore the capabilities of two types of recurrent neural networks (RNNs), namely, long short-term memory (LSTM) and bidirectional LSTM (BiLSTM), to mimic the nonlinear chaotic behaviour of a coupled fast–slow dynamical system, which can be viewed as an atmosphere–ocean research simulator [51,52,53,54,55,56]. In doing so, we also demonstrate a methodological approach to constructing an emulator that replaces the original, physically based model of a dynamical system, reproducing processes with two different time scales.

2. Materials and Methods

2.1. Coupled Nonlinear Dynamical System with Two Time Scales

In this paper, a low-order coupled fast–slow nonlinear dynamical system is considered as a physically based representation of the atmosphere–ocean system. A corresponding surrogate is constructed for it in order to estimate the ability of neural networks to predict the nonlinear, chaotic dynamics of complex systems such as the ocean and atmosphere. The dynamical system combines two versions of the well-known Lorenz model [49] (Lorenz'63) with a small time-scale separation factor τ. Using lowercase letters x, y, and z for the fast subsystem variables and uppercase letters X, Y, and Z for the slow subsystem variables, we write the basic set of model equations as follows [51,52,53,54,55,56]:
$$\frac{dx}{dt} = \sigma(y - x) - c(aX + k), \tag{1}$$
$$\frac{dy}{dt} = rx - y - xz + c(aY + k), \tag{2}$$
$$\frac{dz}{dt} = xy - bz + c_z Z, \tag{3}$$
$$\frac{dX}{dt} = \tau\sigma(Y - X) - c(x + k), \tag{4}$$
$$\frac{dY}{dt} = \tau(rX - Y - aXZ) + c(y + k), \tag{5}$$
$$\frac{dZ}{dt} = \tau(aXY - bZ) - c_z z, \tag{6}$$
where $\sigma > 0$, $b > 0$, and $r > 0$ are the Lorenz model parameters (recall that their standard values, which produce chaotic dynamics, are $\sigma = 10$, $b = 8/3$, and $r = 28$); $c$ is the coupling strength parameter for the $x$, $y$, $X$, and $Y$ variables, while $c_z$ is the coupling strength parameter for the $z$ and $Z$ variables; $k$ is an offset ("uncentering") parameter; and $a$ is an amplitude scale factor (if $a = 1$, the fast and slow subsystems have the same amplitude scale). The parameters $c$ and $c_z$ regulate the interconnection strength between the fast and slow subsystems: the larger $c$ and $c_z$, the stronger the interdependence between the two subsystems.
The dynamic behaviour of the coupled system is controlled by the values of the parameters $\sigma$, $b$, $r$, $c$, $c_z$, $k$, $a$, and $\tau$. In the calculations, the default values of 10, 8/3, and 28 are taken for $\sigma$, $b$, and $r$, respectively [49]. Without loss of generality, we assume that $k = 1$, $a = 1$, and $c = c_z$. The time-scale separation factor $\tau$ is taken to be 0.1, meaning that the fast subsystem is 10 times "faster" than the slow subsystem.
Particular attention should be paid to the coupling strength parameter $c$ due to its decisive role in shaping the dynamic behaviour of the system (1)–(6) [54,56]. The stability of dynamical systems in their phase space is characterized by the Lyapunov exponents, whose number equals the dimensionality of the system's phase space. The existence of a positive exponent in the Lyapunov spectrum indicates chaotic behaviour of the dynamical system in question. Let us note in passing that the phase-space trajectories of climatic systems are unstable in the sense of Lyapunov [57]. Since the dynamical system under consideration is six-dimensional, the general patterns of its behaviour are determined by the six-component Lyapunov spectrum $\lambda_1, \ldots, \lambda_6$, which depends on the parameter $c$ [54,56]. Calculations show that for $c \in [0.1, 0.9]$ the maximal (largest) Lyapunov exponent $\lambda_1$ is positive, and hence the dynamical system (1)–(6) exhibits the behaviour known as deterministic chaos. To build the emulator, we chose the value $c = 0.15$ [54], for which $\lambda_1 = 8.60 \times 10^{-1}$.
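To make the setup concrete, the right-hand side of system (1)–(6) is easy to code directly. Below is a minimal Python sketch (ours, not the authors' code), using the parameter values stated in the text and assuming the reconstruction of Equations (1)–(6) given above.

```python
import numpy as np

# Parameter values used in this paper: sigma = 10, b = 8/3, r = 28,
# k = a = 1, c = c_z = 0.15, tau = 0.1 (the fast subsystem is 10x faster).
SIGMA, B, R = 10.0, 8.0 / 3.0, 28.0
K, A, C, C_Z, TAU = 1.0, 1.0, 0.15, 0.15, 0.1

def rhs(s):
    """Right-hand side of Equations (1)-(6); s = (x, y, z, X, Y, Z)."""
    x, y, z, X, Y, Z = s
    return np.array([
        SIGMA * (y - x) - C * (A * X + K),            # (1) fast
        R * x - y - x * z + C * (A * Y + K),          # (2) fast
        x * y - B * z + C_Z * Z,                      # (3) fast
        TAU * SIGMA * (Y - X) - C * (x + K),          # (4) slow
        TAU * (R * X - Y - A * X * Z) + C * (y + K),  # (5) slow
        TAU * (A * X * Y - B * Z) - C_Z * z,          # (6) slow
    ])
```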

2.2. Building a Surrogate Model of a Nonlinear Dynamical System with Two Time Scales: Problem Statement

Various ML techniques are used to build surrogate models, ranging from relatively simple (e.g., regression) to complex (e.g., neural networks). We build a neural network-based surrogate model, applying a supervised ML approach to "tune" it. Supervised ML requires training and validation datasets; sometimes a testing dataset is also required. In this paper, the training and validation datasets are generated by numerically integrating the set of Equations (1)–(6) with the fourth-order Runge–Kutta method and a time step of $\Delta t = 10^{-2}$. First, Equations (1)–(6) are integrated from $t_T = -2^{14}\Delta t$ to $t_0 = 0$ from a random initial condition. During this time period, the coupled system reaches its attractor, which allows the initial transient to be excluded from consideration. Then, the state vector $\mathbf{x}_0 = (x_0, y_0, z_0, X_0, Y_0, Z_0)$, calculated at time $t_0$, is taken as the initial condition for solving Equations (1)–(6) over the time interval $[t_0, t_{end}]$, where $t_{end} = 0.85 \times 10^4 \Delta t = 85$ model time units (MTU). The resulting numerical solution forms the dataset $D$, which is split into training and validation samples. Note that $D$ is a multivariate time series consisting of six interrelated variables x, y, z, X, Y, and Z, each of which is an equally spaced (regular) one-dimensional time series. We use 80% of the data for training and the remaining 20% for validation.
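A sketch of this data-generation procedure, continuing the snippet above, might look as follows; the function names, the random initial condition, and the seed are our illustrative choices.

```python
def rk4_step(f, s, dt):
    """One step of the fourth-order Runge-Kutta scheme (dt = 1e-2 here)."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def generate_dataset(dt=1e-2, spinup_steps=2**14, n_steps=8500, seed=0):
    """Spin up from a random state so the trajectory reaches the attractor,
    then record 8500 steps (85 MTU) as the dataset D."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(6)
    for _ in range(spinup_steps):          # transient, excluded from D
        s = rk4_step(rhs, s, dt)
    D = np.empty((n_steps, 6))
    for i in range(n_steps):
        s = rk4_step(rhs, s, dt)
        D[i] = s
    return D

D = generate_dataset()
n_train = int(0.8 * len(D))                # 6800 training / 1700 validation rows
train_raw, valid_raw = D[:n_train], D[n_train:]
```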
Supervised learning is based on the assumption that there is some unknown relationship between the inputs and outputs of the model under test. Only a finite set of precedents (model input–output pairs), called the training set, is known. It is the set of precedents that is used to determine the relationship between the input and output of the model, which, if successfully trained, can then serve predictive purposes [58].
The emulator building problem is stated as follows: let a training sample $S_n = \{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}$ be given, obtained using the model $\mathbf{y} = f(\mathbf{x})$, where $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^k$ is the input vector, $\mathbf{y} \in \mathcal{Y} \subseteq \mathbb{R}^l$ is the output vector, and $(\mathbf{x}_i, \mathbf{y}_i) \in \mathcal{X} \times \mathcal{Y}$ is a precedent (an "input–output" pair). It is required to determine an unknown empirical relationship $\hat{f}(\mathbf{x}) = \hat{f}(\mathbf{x} \mid S_n)$ based on the training sample $S_n$, such that $\hat{f}(\mathbf{x}) = f(\mathbf{x}) + \epsilon$ for all $\mathbf{x} \in \mathcal{X}$, where $\epsilon$ is a "random" error term that can be considered small.
The empirical relationship obtained in this way is considered suitable for practical purposes if the generalization error $E_{gen}$ between the predicted and actual values of the validation sample $S_m = \{(\mathbf{x}_j, \mathbf{y}_j)\}_{j=1}^{m}$,
$$E_{gen} = \frac{1}{m} \sum_{j=1}^{m} \left\| \mathbf{y}_j - \hat{f}(\mathbf{x}_j) \right\|^2,$$
is fairly small.
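Continuing the sketches above, this criterion is a one-liner in code; here we read the deviation as a Euclidean norm over the output vector, which is one plausible reading of the formula.

```python
def generalization_error(y_true, y_pred):
    """E_gen: mean squared deviation over the m validation precedents."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))
```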
To build the surrogate model, we use LSTM and BiLSTM neural networks, which are types of RNN [59,60]. Previous studies have examined LSTMs in varying detail, including their advantages and disadvantages (e.g., [61,62,63,64]). Without going into detail, we note that one of the main advantages of LSTMs is their ability to detect nonlinear patterns and long-term dependencies in data. The accuracy of time series forecasts produced using neural networks is influenced by the volume and quality of the training data, the number of hidden layers and the number of neurons in them, the number of epochs (iterations), the type of activation function, the forecasting time horizon, etc. The complexity of a neural network and its performance are largely determined by the number of hidden layers $N_h$ and the number of nodes in each layer $N_{h,n}$. Unfortunately, there are no clear, theoretically grounded approaches to determining the size and number of hidden layers in a neural network, so the parameters $N_h$ and $N_{h,n}$ are usually determined heuristically, by trial and error; some recommendations on this matter can be found, for example, in [65]. Thus, to build the emulator we must specify the number of hidden layers and their sizes, the type of activation function, the optimization algorithm, the forecast lead time, and the number of training cycles (epochs). The surrogate model is built using the TensorFlow platform with its high-level Keras API, as sketched below.
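The following is a hedged sketch of how such a (Bi)LSTM emulator could be assembled with Keras. The text fixes the number of hidden layers, their sizes, the sigmoid activation, and the Adam optimizer (see Section 3); the input window length, layer arrangement, and names are our illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_emulator(n_hidden=3, units=128, window=32, bidirectional=True):
    """Stacked (Bi)LSTM mapping a window of the 6 model variables
    to the 3 'atmospheric' outputs x, y, z."""
    model = models.Sequential()
    model.add(layers.Input(shape=(window, 6)))       # 6 input features
    for i in range(n_hidden):
        cell = layers.LSTM(units, activation="sigmoid",
                           return_sequences=(i < n_hidden - 1))
        model.add(layers.Bidirectional(cell) if bidirectional else cell)
    model.add(layers.Dense(3))                       # predicted x, y, z
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```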

3. Results

As mentioned earlier, the surrogate model is built on RNN architectures: the unidirectional LSTM and its extension, the bidirectional LSTM. It should be noted that the advantage of BiLSTM over LSTM cannot be proven theoretically, as no appropriate proof methods exist; it must be assessed empirically. Likewise, when determining the basic hyperparameters of the neural networks used to simulate the nonlinear dynamics of the system (1)–(6), we recognize that there is as yet no rigorous theoretical basis for their choice. This applies, in particular, to the number of hidden layers $N_h$ and their sizes $N_{h,n}$. Therefore, in the numerical experiments the number of hidden layers varied from 1 to 5, with the number of neurons in them ranging from 16 to 256. Along with the hidden layers, the neural networks contain two attributive layers: the input layer and the output layer. The input layer has six neurons (matching the number of dependent variables x, y, z, X, Y, and Z of the model under consideration), while the output layer has three neurons, because we are interested only in the "atmospheric" variables x, y, and z, i.e., in a "weather prediction". The learning accuracy and rate of neural networks depend on the optimizer used to update the network weights. The Keras library has several built-in optimizers. The calculations were made with the "Adam" (Adaptive Moment Estimation) optimizer, one of the most efficient optimization algorithms for recurrent neural network training. Its advantages are the effective handling of both large and small gradients arising during training and the ability to partially mitigate the over-training problem.
The sigmoid function, defined by the formula $f(\varphi) = 1/(1 + e^{-\varphi})$, where $\varphi$ is the input variable, is used as the activation function. The sigmoid function compresses inputs into a bounded range between 0 and 1. It should be noted that the activation function type affects the neural network learning rate, since the weights change during the learning process, and the extent of these changes depends on the value of the activation function $f(\varphi)$ and its derivative $f'(\varphi)$ at the given point $\varphi = \varphi^*$. The training rate also depends on the selected initial values of the weights. Since no universal way of initializing neural network weights exists, random numbers from the interval $[-0.5, 0.5]$ are taken as the initial weights in the problem under consideration. The number of epochs was determined experimentally from the behaviour of the loss function, represented by the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{k} \sum_{i=1}^{k} \left\| \boldsymbol{\varphi}_{A,i} - \hat{\boldsymbol{\varphi}}_{A,i} \right\|^2,$$
where $\boldsymbol{\varphi}_A = (x, y, z)^T$ and $\hat{\boldsymbol{\varphi}}_A = (\hat{x}, \hat{y}, \hat{z})^T$ are the vectors of "observed" and predicted values of the "atmospheric" variables, respectively, and $k$ is the number of forecasts being evaluated.
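As an aside, the uniform weight initialization described above could be expressed in Keras as follows (a sketch; Keras distinguishes kernel, recurrent, and bias initializers, and the text does not specify which of them were customized).

```python
from tensorflow.keras import initializers, layers

# Uniform initial weights in [-0.5, 0.5], as described in the text.
uniform_init = initializers.RandomUniform(minval=-0.5, maxval=0.5)
lstm_layer = layers.LSTM(128,
                         activation="sigmoid",
                         kernel_initializer=uniform_init,
                         recurrent_initializer=uniform_init)
```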
The forecast time horizon (forecast lead time), i.e., the time period over which predictions of the "atmospheric" variables were made, varied from 0.05 to 1.5 MTU. It was established experimentally that further increasing the lead time makes no sense, since forecast quality deteriorates sharply. This was quite expected, since we considered the chaotic regime of the dynamical system (1)–(6). Note that the predictability limit of chaotic systems is characterized by the Lyapunov time $T_L$, the reciprocal of the largest Lyapunov exponent; in our case, $T_L = 1/\lambda_1 = 1.15$ MTU.
It should be recalled that to obtain the training and validation samples, Equations (1)–(6) were integrated over 8500 forward steps, which corresponds to 85 MTU. Thus, for each of the variables x, y, z, X, Y, and Z, the training and validation sample sizes were 6800 and 1700, respectively. Before model training or prediction, data scaling was applied to bring all values into the range [0, 1].
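A sketch of the scaling and windowing step (ours, continuing the earlier snippets): we scale with training-sample statistics to avoid information leakage, and the input window length is an assumption, since the text does not state it.

```python
def scale_01(data, ref):
    """Min-max scale each column of `data` into [0, 1] using `ref` statistics."""
    lo, hi = ref.min(axis=0), ref.max(axis=0)
    return (data - lo) / (hi - lo)

def make_windows(series, window=32, lead=5):
    """Pair each input window with the 3 fast variables `lead` steps ahead."""
    xs, ys = [], []
    for i in range(len(series) - window - lead + 1):
        xs.append(series[i:i + window])
        ys.append(series[i + window + lead - 1, :3])   # columns 0-2: x, y, z
    return np.stack(xs), np.stack(ys)

train_s = scale_01(train_raw, train_raw)
valid_s = scale_01(valid_raw, train_raw)
X_tr, y_tr = make_windows(train_s, lead=5)    # 5 steps = 0.05 MTU
X_va, y_va = make_windows(valid_s, lead=5)
```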
It was found that with the selected hyperparameters, no over-training effect was observed, and applying a dropout procedure actually worsened the results, so dropout was not used.
Let us consider some of the results obtained in the present study. Figure 1 shows the time variation of the variables x, y, z, X, Y, and Z obtained by numerically integrating Equations (1)–(6). We categorize these data as the "observational data" used to form the training and verification samples. The corresponding attractors of the "fast" and "slow" subsystems are shown in Figure 2. Model testing (in other words, evaluating forecast accuracy) was performed only for the "atmospheric" variables x, y, and z. Forecast accuracy was estimated using the root mean squared error $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$ and the mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} \left| \boldsymbol{\varphi}_{A,i} - \hat{\boldsymbol{\varphi}}_{A,i} \right|.$$
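Both metrics are straightforward to compute; a sketch (ours), reading |·| as the Euclidean norm of the three-component error vector:

```python
def rmse_mae(phi_true, phi_pred):
    """RMSE and MAE over k forecasts of the atmospheric vector (x, y, z)."""
    err = np.linalg.norm(np.asarray(phi_true) - np.asarray(phi_pred), axis=-1)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(err))
```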
The mean absolute percentage error (MAPE) is also often used to estimate prediction quality. However, MAPE is not suitable for values close to zero: when the denominator in its scaling formula (not shown here) approaches zero, the result of the division tends to infinity. Therefore, only the RMSE and MAE metrics were used to estimate forecast quality.
It is well known that absolutely accurate weather forecasts are fundamentally impossible, although their quality is continuously improving. All other things being equal, forecast accuracy depends largely on lead time; the shorter the lead time, the higher the accuracy of weather forecasts. Therefore, our attention was focused on how forecast accuracy changes with increasing forecast time horizon.
To determine the basic configuration of the neural networks used for forecasting, a large number of experiments were conducted with different numbers of hidden layers and nodes per hidden layer. Based on the analysis of these experiments, networks with three hidden layers were chosen as the basic configuration. In particular, Figure 3, which summarizes the results of testing LSTMs and BiLSTMs with different numbers of hidden layers, shows that the three-layer LSTM and BiLSTM networks are preferable to the others.
The accuracy metrics of forecasts made over different time horizons using the LSTM and BiLSTM in the basic configuration are given in Table 1 and Table 2. The analysis of these tables reveals the advantage of bidirectional neural networks over unidirectional ones in terms of forecasting accuracy. This result seems very important, since BiLSTMs are mainly used for natural language processing rather than for simulating the chaotic behaviour of dynamical systems. The data in Table 1 and Table 2 also show that forecast accuracy decreases with increasing lead time, which is quite expected, mainly due to the chaotic behaviour of the fast subsystem. Thus, the accuracy of "weather" predictions tends to decrease as the forecast time horizon extends, and forecasts beyond 1.5 MTU lead times have no value due to their low accuracy. Overall, it can be concluded that the predictability limit of the fast subsystem (the "atmosphere") is somewhat greater than the Lyapunov time and is about 1.5 MTU.
All other things being equal, increasing the number of neurons in the hidden layers improves forecast accuracy. The results indicate that the quality of short-term forecasts with lead times of up to 0.75 MTU improves most significantly.
Figure 4, Figure 5 and Figure 6 provide a clear indication of the correspondence between the “observed” values of dependent variables and the predicted values produced by the LSTM and BiLSTM models for different forecast horizons. Although the forecasted series generally approximate the original ones quite well, the quality of forecasts obtained using BiLSTM models is better than forecasts obtained using LSTM models, which is especially noticeable as the forecast horizon increases. To determine the weights of neural networks, 50–100 epochs are sufficient, since a further increase in their number leads to the over-training of neural networks.
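Putting the earlier sketches together, training with a bounded epoch budget could look like the following; the batch size and the EarlyStopping callback are our assumptions rather than the authors' stated procedure, but early stopping is one standard way to halt before over-training sets in.

```python
from tensorflow.keras.callbacks import EarlyStopping

model = build_emulator(n_hidden=3, units=128, bidirectional=True)
history = model.fit(
    X_tr, y_tr,
    validation_data=(X_va, y_va),
    epochs=100,                      # 50-100 epochs suffice per the text
    batch_size=64,                   # batch size is our assumption
    callbacks=[EarlyStopping(monitor="val_loss", patience=10,
                             restore_best_weights=True)],
)
rmse, mae = rmse_mae(y_va, model.predict(X_va))
```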

4. Discussion and Conclusions

Developing and using climate models to improve our understanding of the climate system and predict future climate change require enormous computational and intellectual resources. However, a powerful technique called surrogate modelling significantly reduces these resource requirements, since detailed, physically based climate models are approximated by much simpler algorithms trained on simulation data. Neural network-based deep learning is an important tool for building surrogate models of physical systems and processes described by differential equations. In this context, it is of the utmost importance to study the ability of neural networks to predict solutions to sets of differential equations of the form $d\mathbf{x}/dt = f(\mathbf{x})$ from a sample of the state vector $\mathbf{x}$, i.e., without knowing the function $f(\mathbf{x})$. It should be highlighted that DL can lead to positive results if the development of surrogate models is based on an understanding of the physics of the processes that actually shape the climate. This emphasizes the need to continue and intensify research into all aspects of surrogate modelling of climate processes and phenomena. In this regard, low-order models of dynamical systems that describe the essential features of the object under study, in particular its chaotic behaviour, can serve as a useful research tool. Simplified climate models can be used on their own (at least as a qualitative research tool) to study physical climate processes, and also as an instrument for testing new approaches to modelling complex systems and processes, such as machine learning technologies and algorithms, deep learning in particular.
We have shown the procedure, and in essence the methodology, of building a surrogate model (meta-model) that “replaces” the original, physically based, simplified model of the coupled “ocean–atmosphere” system whose behaviour becomes chaotic under certain conditions. The dynamical system, comprising two versions of the classical Lorenz’63 model with a small time-scale separation factor, is treated as a physically based model for which a surrogate model is built. A brief and intelligible description of the process of building a meta-model allows one to form a holistic understanding of the main and related problems that may arise.
The capabilities of two types of recurrent neural networks, unidirectional and bidirectional LSTMs, to build a surrogate model and predict the original system's nonlinear chaotic behaviour were explored. The results of numerous numerical experiments have shown that LSTMs can successfully be used to build surrogate models of nonlinear systems and predict solutions to differential equations without knowing their right-hand sides. On this basis, it can be concluded that the ability of neural networks to predict the future behaviour of nonlinear dynamical systems over certain time periods encourages their use in developing emulators of the climate system and its individual components, as well as parameterization schemes for complex models employed to simulate the global climate system. However, the accuracy of surrogate models is affected not only by the architecture and structure (hyperparameters) of the neural networks but also by the availability and quality of training data. Climate models project climate using observational data and some other factors as inputs, which are inherently uncertain. It is therefore important to assess how the uncertainty of training data affects the quality of climate modelling and projection. In this regard, exploring the capabilities of deep learning methods to represent climate model outputs in probabilistic form appears to be a very significant problem; it will be the subject of further research.

Author Contributions

Supervision, S.S.; administration, S.S.; funding acquisition, S.S.; conceptualization, S.S.; methodology, S.S.; software, S.S. and Y.A.; validation, S.S. and Y.A.; writing—original draft preparation, S.S.; writing—review and editing, S.S.; visualization, S.S. and Y.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by RSF, grant number 23-47-10003.

Data Availability Statement

All data used in this study are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. IPCC. Climate Change 2013: The Physical Science Basis; Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change; Stocker, T.F., Qin, D., Plattner, G.-K., Tignor, M., Allen, S.K., Boschung, J., Nauels, A., Xia, Y., Bex, V., Midgley, P.M., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2013; p. 1535. [Google Scholar]
  2. IPCC. Climate Change 2021: The Physical Science Basis; Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S.L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M.I., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2021; p. 2391. [Google Scholar]
  3. Soldatenko, S.; Bogomolov, A.; Ronzhin, A. Mathematical Modelling of Climate Change and Variability in the Context of Outdoor Ergonomics. Mathematics 2021, 9, 2920. [Google Scholar] [CrossRef]
  4. Trenberth, K. Climate System Modeling; Cambridge University Press: Cambridge, UK, 2010; p. 820. [Google Scholar]
  5. Neelin, D. Climate Change and Climate Modeling; Cambridge University Press: Cambridge, UK, 2011; p. 304. [Google Scholar]
  6. Lloyd, E.A.; Winsberg, E. (Eds.) Climate Modelling: Philosophical and Conceptual Issues; Palgrave Macmillan: London, UK, 2019; p. 497. [Google Scholar]
  7. Palmer, T. Modelling: Build imprecise supercomputers. Nature 2015, 526, 32–33. [Google Scholar] [CrossRef]
  8. Lu, D.; Ricciuto, D. Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques. Geosci. Model Dev. 2019, 12, 1791–1807. [Google Scholar] [CrossRef]
  9. Bocquet, M. Surrogate modeling for the climate sciences dynamics with machine learning and data assimilation. Front. Appl. Math. Stat. 2023, 9, 1133226. [Google Scholar] [CrossRef]
  10. Weber, T.; Corotan, A.; Hutchinson, B.; Kravitz, B.; Link, R. Technical note: Deep learning for creating surrogate models of precipitation in Earth system models. Atmos. Chem. Phys. 2020, 20, 2303–2317. [Google Scholar] [CrossRef]
  11. Huntingford, C.; Jeffers, E.S.; Bonsall, M.B.; Christensen, H.M.; Lees, T.; Yang, H. Machine learning and artificial intelligence to aid climate change research and preparedness. Environ. Res. Lett. 2019, 14, 124007. [Google Scholar] [CrossRef]
  12. Boukabara, S.A.; Krasnopolsky, V.; Penny, S.G.; Stewart, J.Q.; McGovern, A.; Hall, D.; Hoeve, J.E.T.; Hickey, J.; Huang, H.-L.A.; Williams, J.K. Outlook for exploiting artificial intelligence in the earth and environmental sciences. Bull. Am. Meteorol. Soc. 2021, 102, E1016–E1032. [Google Scholar] [CrossRef]
  13. Dewitte, S.; Cornelis, J.P.; Müller, R.; Munteanu, A. Artificial intelligence revolutionises weather forecast, climate monitoring and decadal prediction. Remote Sens. 2021, 13, 3209. [Google Scholar] [CrossRef]
  14. Kashinath, K.; Mustafa, M.; Albert, A.; Wu, J.L.; Jiang, C.; Esmaeilzadeh, S.; Azizzadenesheli, K.; Wang, R.; Chattopadhyay, A.; Singh, A. Physics-informed machine learning: Case studies for weather and climate modelling. Phil. Trans. R. Soc. 2021, 379, 20200093. [Google Scholar] [CrossRef] [PubMed]
  15. Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Phil. Trans. R. Soc. A 2021, 379, 20200097. [Google Scholar]
  16. Bochenek, B.; Ustrnul, Z. Machine learning in weather prediction and climate analyses—Applications and perspectives. Atmosphere 2022, 13, 180. [Google Scholar] [CrossRef]
  17. Sun, Z.; Sandoval, L.; Crystal-Ornelas, R.; Mousavi, S.M.; Wang, J.; Lin, C.; Cristea, N.; Tong, D.; Carande, W.H.; Ma, X. A review of earth artificial intelligence. Comput. Geosci. 2022, 159, 105034. [Google Scholar] [CrossRef]
  18. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538. [Google Scholar] [CrossRef] [PubMed]
  19. de Burgh-Day, C.O.; Leeuwenburg, T. Machine learning for numerical weather and climate modelling: A review. Geosci. Model Dev. 2023, 16, 6433–6477. [Google Scholar] [CrossRef]
  20. Schneider, T.; Behera, S.; Boccaletti, G.; Deser, C.; Emanuel, K.; Ferrari, R.; Leung, L.R.; Lin, N.; Müller, T.; Navarra, A. Harnessing AI and computing to advance climate modelling and prediction. Nat. Clim. Change 2023, 13, 887–889. [Google Scholar] [CrossRef]
  21. Krasnopolsky, V. Applying Machine Learning in Numerical Weather and Climate Modeling Systems. Climate 2024, 12, 78. [Google Scholar] [CrossRef]
  22. Soldatenko, S.A. Artificial Intelligence and Its Application in Numerical Weather Prediction. Meteorol. Hydrol. 2024, 49, 283–298. [Google Scholar] [CrossRef]
  23. Forrester, A.; Sóbester, A.; Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide; Wiley: Chichester, UK, 2008; p. 240. [Google Scholar]
  24. Bhosekar, A.; Ierapetritou, M. Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput. Chem. Eng. 2018, 108, 250–267. [Google Scholar] [CrossRef]
  25. Gramacy, R.B. Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences; Chapman Hall/CRC: Boca Raton, FL, USA, 2020; p. 543. [Google Scholar]
  26. Jiang, P.; Zhou, Q.; Shao, X. Surrogate Model-Based Engineering Design and Optimization; Springer: Singapore, 2020; p. 240. [Google Scholar]
  27. Koziel, S.; Pietrenko-Dabrowska, A. Performance-Driven Surrogate Modeling of High-Frequency Structures; Springer: Berlin/Heidelberg, Germany, 2020; p. 419. [Google Scholar]
  28. Alizadeh, R.; Allen, J.K.; Mistree, F. Managing computational complexity using surrogate models: A critical review. Res. Eng. Des. 2020, 31, 275–298. [Google Scholar] [CrossRef]
  29. McDermott, P.L.; Wikle, C.K. Deep echo state networks with uncertainty quantification for spatio-temporal forecasting. Environmetrics 2018, 30, e2553. [Google Scholar] [CrossRef]
  30. Pasini, A.; Racca, P.; Amendola, S.; Cartocci, G.; Cassardo, C. Attribution of recent temperature behaviour reassessed by a neural-network method. Sci. Rep. 2017, 7, 17681. [Google Scholar] [CrossRef] [PubMed]
  31. Ham, Y.G.; Kim, J.H.; Luo, J.J. Deep learning for multi-year ENSO forecasts. Nature 2019, 573, 568–572. [Google Scholar] [CrossRef] [PubMed]
  32. Mu, B.; Qin, B.; Yuan, S.J. ENSO-ASC 1.0.0: ENSO deep learning forecast model with a multivariate air–sea coupler. Geosci. Model Dev. 2021, 14, 6977–6999. [Google Scholar] [CrossRef]
  33. Zhu, Y.C.; Zhang, R.-H.; Moum, J.N.; Wang, F.; Li, X.F.; Li, D.L. Physics-informed deep-learning parameterization of ocean vertical mixing improves climate simulations. Natl. Sci. Rev. 2022, 9, nwac044. [Google Scholar] [CrossRef] [PubMed]
  34. Zhou, L.; Zhang, R.-H. A self-attention-based neural network for three-dimensional multivariate modeling and its skillful ENSO predictions. Sci. Adv. 2023, 9, eadf2827. [Google Scholar] [CrossRef] [PubMed]
  35. Field, R.V.; Constantine, P.; Boslough, M. Statistical Surrogate Models for Prediction of High-Consequence Climate Change; Sandia National Laboratories: Albuquerque, NM, USA; Livermore, CA, USA, 2008; p. 38. [Google Scholar]
  36. Prieß, M.; Piwonski, J.; Koziel, S.; Slawig, T. Parameter identification in climate models using surrogate-based optimization. J. Comput. Methods Sci. Eng. 2012, 12, 47–62. [Google Scholar] [CrossRef]
  37. Rasp, S.; Pritchard, M.S.; Gentine, P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA 2018, 115, 9684–9689. [Google Scholar] [CrossRef] [PubMed]
  38. Brenowitz, N.D.; Bretherton, C.S. Spatially extended tests of a neural network parameterization trained by coarse-graining. J. Adv. Model. Earth Syst. 2019, 11, 2727–2744. [Google Scholar]
  39. Chattopadhyay, A.; Ashesh, K.; Hassanzadeh, P.; Subramanian, D.; Palem, K.; Jiang, C.; Subel, A. Data-Driven Surrogate Models for Climate Modeling: Application of Echo State Networks, RNN-LSTM and ANN to the Multi-Scale Lorenz System as a Test Case. ICML Workshop on Climate Change, Long Beach, CA, USA. 2019. Available online: https://www.climatechange.ai/papers/icml2019/22 (accessed on 10 July 2024).
  40. Hudson, B.; Nijweide, F.; Sebenius, I. Computationally-Efficient Climate Predictions using Multi-Fidelity Surrogate Modelling. arXiv 2021, arXiv:2109.07468. [Google Scholar]
  41. Yuval, J.; O’Gorman, P.A. Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions. Nat. Commun. 2020, 11, 3295. [Google Scholar] [CrossRef] [PubMed]
  42. Pawar, S.; San, O. Equation-free surrogate modeling of geophysical flows at the intersection of machine learning and data assimilation. J. Adv. Model. Earth Syst. 2022, 14, e2022MS003170. [Google Scholar] [CrossRef]
  43. Jin, Q.; Jiang, X.; Hua, F.; Yang, Y.; Jiang, S.; Yu, C.; Song, Z. GWSM4C: A global wave surrogate model for climate simulation based on a convolutional architecture. Ocean. Eng. 2024, 309, 118458. [Google Scholar] [CrossRef]
  44. Durand, C.; Finn, T.S.; Farchi, A.; Bocquet, M.; Boutin, G.; Ólason, E. Data-driven surrogate modeling of high-resolution sea-ice thickness in the Arctic. Cryosphere 2024, 18, 1791–1815. [Google Scholar] [CrossRef]
  45. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning (Adaptive Computation and Machine Learning Series); MIT Press: Cambridge, MA, USA, 2016; p. 800. [Google Scholar]
  46. Aggarwal, C.C. Neural Networks and Deep Learning; Springer: Cham, Switzerland, 2023; p. 529. [Google Scholar]
  47. Bishop, C.M.; Bishop, H. Deep Learning: Foundation and Concepts; Springer: Cham, Switzerland, 2024; p. 649. [Google Scholar]
  48. Balogh, B.; Saint-Martin, D.; Ribes, A. A toy model to investigate stability of AI-based dynamical systems. Geophys. Res. Lett. 2021, 48, e2020GL092133. [Google Scholar] [CrossRef]
  49. Lorenz, E.N. Deterministic nonperiodic flow. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  50. Pasini, A.; Langone, R.; Maimone, F.; Pelino, V. Energy-based predictions in Lorenz system by a unified formalism and neural network modelling. Nonlinear Process. Geophys. 2010, 17, 809–815. [Google Scholar] [CrossRef]
  51. Boffetta, G.; Crisanti, A.; Paparella, F.; Provenzale, A.; Vulpiani, A. Slow and fast dynamics in coupled systems: A time series analysis view. Phys. D 1998, 116, 301–312. [Google Scholar] [CrossRef]
  52. Boffetta, G.; Giuliani, P.; Paladin, G.; Vulpiani, A. An Extension of the Lyapunov Analysis for the Predictability Problem. J. Atmos. Sci. 1998, 55, 3409–3416. [Google Scholar] [CrossRef]
  53. Peña, M.; Kalnay, E. Separating fast and slow modes in coupled chaotic systems. Nonlinear Process. Geophys. 2004, 11, 319–327. [Google Scholar] [CrossRef]
  54. Siqueira, L.; Kirtman, B. Predictability of a low-order interactive ensemble. Nonlinear Process. Geophys. 2012, 19, 273–282. [Google Scholar] [CrossRef]
  55. Soldatenko, S.; Steinle, P.; Tingwell, C.; Chichkine, S. Some aspects of sensitivity analysis in variational data assimilation for coupled dynamical systems. Adv. Meteorol. 2015, 2015, 1–22. [Google Scholar] [CrossRef]
  56. Soldatenko, S.; Chichkine, D. Correlation and Spectral Properties of a Coupled Nonlinear Dynamical System in the Context of Numerical Weather Prediction and Climate Modeling. Discret. Dyn. Nat. Soc. 2014, 2014, 498184. [Google Scholar] [CrossRef]
  57. Dymnikov, V.P.; Filatov, A.N. Mathematics of Climate Modeling; Birkhauser: Boston, MA, USA, 1997; p. 264. [Google Scholar]
  58. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: Hoboken, NJ, USA, 2021; p. 1168. [Google Scholar]
  59. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  60. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Montreal, QC, Canada, 31 July–4 August 2005; Volume 4, pp. 2047–2052. [Google Scholar]
  61. Van Houdt, G.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  62. Han, D.; Liu, P.; Xie, K.; Li, H.; Xia, Q.; Cheng, Q.; Wang, Y.; Yang, Z.; Zhang, Y.; Xia, J. An attention-based LSTM model for long-term runoff forecasting and factor recognition. Environ. Res. Lett. 2023, 18, 024004. [Google Scholar] [CrossRef]
  63. Rubasinghe, O.; Zhang, X.; Chau, T.K.; Chow, Y.H.; Fernando, T.; Iu, H.H.C. A novel sequence to sequence data modelling based CNN-LSTM algorithm for three years ahead monthly peak load forecasting. IEEE Trans. Power Syst. 2024, 39, 1932–1947. [Google Scholar] [CrossRef]
  64. Song, Y.; Tsai, W.P.; Gluck, J.; Rhoades, A.; Zarzycki, C.; McCrary, R.; Lawson, K.; Shen, C. LSTM-based data integration to improve snow water equivalent prediction and diagnose error sources. J. Hydrometeorol. 2024, 25, 223–237. [Google Scholar] [CrossRef]
  65. Heaton, J. Introduction to Neural Networks with Java; Heaton Research: St. Louis, MO, USA, 2008; p. 439. [Google Scholar]
Figure 1. Time evolution of the fast and slow dynamic variables: (a) variables x and X; (b) variables y and Y; (c) variables z and Z.
Figure 2. Trajectories of dynamical systems: (a) fast subsystem; (b) slow subsystem.
Figure 3. Comparison of prediction accuracy using LSTM and BiLSTM with different numbers of hidden layers, comprising 16 and 128 nodes: (a) Root mean square error; (b) Mean absolute error; 1—LSTM with 16 nodes; 2—LSTM with 128 nodes; 3—BiLSTM with 16 nodes; 4—BiLSTM with 128 nodes.
Figure 4. Testing models with 3 hidden layers comprising 128 nodes for 5-time step prediction horizon: (a) LSTM model; (b) BiLSTM model.
Figure 5. Testing models with 3 hidden layers comprising 128 nodes for 50-time step prediction horizon: (a) LSTM model; (b) BiLSTM model.
Figure 6. Testing models with 3 hidden layers comprising 128 nodes for 100-time step prediction horizon: (a) LSTM model; (b) BiLSTM model.
Table 1. Results of testing the model built on LSTM with 3 hidden layers, each comprising 16 or 128 nodes.

Forecast Lead Time (Time Steps) | 16 Nodes RMSE | 16 Nodes MAE | 128 Nodes RMSE | 128 Nodes MAE
5   | 0.010 | 0.011 | 0.010 | 0.005
10  | 0.025 | 0.014 | 0.009 | 0.025
50  | 0.136 | 0.070 | 0.136 | 0.057
75  | 0.141 | 0.075 | 0.144 | 0.070
100 | 0.202 | 0.137 | 0.151 | 0.092
150 | 0.253 | 0.192 | 0.221 | 0.157
Table 2. Results of testing the model built on BiLSTM with 3 hidden layers, each comprising 16 or 128 nodes.

Forecast Lead Time (Time Steps) | 16 Nodes RMSE | 16 Nodes MAE | 128 Nodes RMSE | 128 Nodes MAE
5   | 0.006 | 0.004 | 0.002 | 0.002
10  | 0.005 | 0.004 | 0.004 | 0.002
50  | 0.051 | 0.026 | 0.037 | 0.012
75  | 0.110 | 0.058 | 0.100 | 0.041
100 | 0.118 | 0.058 | 0.136 | 0.059
150 | 0.237 | 0.165 | 0.223 | 0.126