2.2. Season, Daily Cycle, Solar and Geomagnetic Activity
To predict foF2, we first need to identify the parameters that drive its variations. Williscroft and Poole [15] noted that foF2 is closely related to the peak electron density of the ionosphere and is influenced by various factors, including solar and magnetic activity, geographic location, local time and season. Following their suggestion, the day of year (DOY) is used to represent the season. A problem with using the raw DOY, however, is that December 31st and January 1st are only one day apart yet differ numerically by almost a full year, so a neural network struggles to treat these days as contiguous. To address this, the DOY variable is split into two separate cyclic inputs [16,17].
In the same manner, the hour of the day is encoded as a pair of cyclic inputs so that consecutive hours remain numerically adjacent across the midnight boundary. We use local time (LT) rather than universal time (UT) to better represent the actual time at the observing stations.
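A common way to split each cyclic quantity into two inputs is a sine/cosine pair; the period constants below (365.25 days and 24 h) are illustrative choices rather than values prescribed by [16,17]:

$$
\mathrm{DOY}_s = \sin\!\left(\frac{2\pi \cdot \mathrm{DOY}}{365.25}\right), \quad
\mathrm{DOY}_c = \cos\!\left(\frac{2\pi \cdot \mathrm{DOY}}{365.25}\right), \qquad
\mathrm{LT}_s = \sin\!\left(\frac{2\pi \cdot \mathrm{LT}}{24}\right), \quad
\mathrm{LT}_c = \cos\!\left(\frac{2\pi \cdot \mathrm{LT}}{24}\right).
$$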
Solar-cycle-related indices are clearly essential inputs for any ionospheric model. Because long, continuous records of extreme-ultraviolet-specific indices are not available, researchers have relied on two solar indices from different wavelength ranges, namely the sunspot number and the solar radio flux at 10.7 cm (F10.7) [18,19]. Despite the strong correlation between these two parameters, they are often used together when building models [20]. E.O. Joshua [21] posited that foF2 variations are too complex to capture fully and therefore focused solely on the sunspot number (SSN). The findings of Liu et al. [22], however, indicate that incorporating F10.7 improves prediction accuracy and that it should also be used in modeling foF2. Furthermore, Cao et al. [23] found that the dependence of foF2 on the SSN and F10.7 is most pronounced in years of moderate solar activity and weakest in years of high solar activity. Bai et al. [24] noted that, compared with F10.7, the SSN has a more decisive influence on foF2 during quiet and active periods, whereas F10.7 plays the more important role when solar activity is moderate. In a recent study, Mursula et al. [25] highlighted significant long-term changes in the relationship between the F10.7 index and sunspot numbers, emphasizing that the two parameters should be considered separately in long-term solar studies because their asynchronous evolution reflects different aspects of solar atmospheric change. In short, the SSN and F10.7 are both indispensable, and employing a hybrid neural network enhances the model's ability to capture the resulting variations in foF2 across different levels of solar activity.
Similarly, Mikhailov et al. [26] concluded that changes in foF2 clearly depend on geomagnetic activity. The level of magnetic activity can be measured with several indices, among which the Ap index and the interplanetary magnetic field Bz component (IMF Bz) are particularly relevant for predicting foF2. In addition, the Disturbance Storm Time (Dst) index is employed to distinguish more accurately between quiet conditions and geomagnetic storms [27,28].
It is noteworthy that in equatorial and low-latitude regions the ionosphere exhibits plasma-density irregularities that give rise to ionospheric scintillation, and these irregularities complicate predictions of ionospheric conditions in those areas [29]. A single deep learning model therefore falls short of capturing such intricate ionospheric variations accurately, whereas a hybrid neural network model can learn and predict the intrinsic relationships between the ionospheric response and the input data more effectively, enhancing forecast accuracy.
Therefore, the model takes F10.7 and the SSN, together with the geomagnetic indices Ap, Dst and IMF Bz, as well as the encoded season and local time, as its essential inputs. Taking 2009 and 2014 as examples of a solar-minimum and a solar-maximum year, the hourly values of Dst, Ap, F10.7, IMF Bz and the SSN are shown in Figure 1.
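To make the input layout concrete, the sketch below assembles one hourly input vector from these drivers; the feature ordering, function names and numerical values are illustrative placeholders, not the configuration used in the paper (which was implemented in MATLAB):

```python
import numpy as np

def encode_cyclic(value, period):
    """Map a cyclic quantity (DOY or local time) to a sine/cosine pair."""
    angle = 2.0 * np.pi * value / period
    return np.sin(angle), np.cos(angle)

def build_input_sample(doy, local_time, f107, ssn, ap, dst, imf_bz):
    """Assemble one hourly input vector from the drivers discussed above."""
    doy_s, doy_c = encode_cyclic(doy, 365.25)
    lt_s, lt_c = encode_cyclic(local_time, 24.0)
    return np.array([doy_s, doy_c, lt_s, lt_c, f107, ssn, ap, dst, imf_bz])

# Example: one hour of input (all values are placeholders, not measured indices)
sample = build_input_sample(doy=172, local_time=14, f107=150.2, ssn=120,
                            ap=12, dst=-25, imf_bz=-3.1)
print(sample.shape)  # (9,)
```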
2.3. Structure of the Hybrid Prediction Model
The foF2 prediction model uses a hybrid architecture that combines a CNN, a BiLSTM and a temporal pattern attention mechanism. Compared with traditional neural networks, a CNN is distinguished by local connections and weight sharing, and its convolutional layers, in which the convolution operation is fundamental, form its core [30]. The strength of this operation lies in its ability to automatically detect and analyze local patterns within the data, such as those found in ionospheric variations. By sliding filters across the input, the CNN captures the intricate characteristics of ionospheric variations, identifying important features across multiple scales while remaining computationally efficient. In addition, the pooling operation reduces the data dimension while preserving key information, which allows the model to focus on the most significant features of ionospheric changes.
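As a minimal NumPy sketch of the local, weight-shared convolution described above (the filter count and kernel length here are arbitrary, not the paper's settings):

```python
import numpy as np

def conv1d_valid(series, filters):
    """Slide each filter over the time axis of a multivariate series.

    series  : array of shape (n_features, n_steps)
    filters : array of shape (n_filters, n_features, kernel_len)
    returns : array of shape (n_filters, n_steps - kernel_len + 1)
    """
    n_features, n_steps = series.shape
    n_filters, _, kernel_len = filters.shape
    out_len = n_steps - kernel_len + 1
    out = np.zeros((n_filters, out_len))
    for j in range(n_filters):
        for t in range(out_len):
            # The same filter weights are reused at every position (weight sharing).
            out[j, t] = np.sum(series[:, t:t + kernel_len] * filters[j])
    return out

rng = np.random.default_rng(0)
series = rng.normal(size=(9, 48))      # e.g. 9 input drivers over 48 hours
filters = rng.normal(size=(4, 9, 3))   # 4 filters spanning 3 consecutive hours
features = conv1d_valid(series, filters)
print(features.shape)                  # (4, 46)
```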
A recurrent neural network (RNN) retains information from previous time steps, which is its primary difference from traditional feed-forward networks. It can handle inputs of any length, and its architecture does not change as the input length grows. However, RNNs struggle to use information from the distant past: the long chains of partial-derivative multiplications make long-term dependencies difficult to capture, which limits their usefulness for time series. The long short-term memory network (LSTM) is a distinct type of RNN that uses gating to control how memory is weighted across time steps and internal shortcut connections to mitigate the vanishing-gradient problem, easing the training difficulties of the RNN through its special structure [31,32]. However, a unidirectional LSTM processes a sequence only from past to future, so the model captures limited feature information [33], which increases prediction errors. In contrast, a BiLSTM processes the sequence in both directions and therefore achieves better predictions by additionally training on the reversed sequence.
This study utilized MATLAB's Deep Learning Toolbox to construct the BiLSTM framework. For an individual LSTM, the critical components are calculated as follows [34]:

$$
\begin{aligned}
f_t &= \sigma\left(W_{fx} x_t + W_{fh} h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_{ix} x_t + W_{ih} h_{t-1} + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_{cx} x_t + W_{ch} h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_{ox} x_t + W_{oh} h_{t-1} + b_o\right)
\end{aligned}
$$

where $f_t$ is the forget gate; it applies the sigmoid activation function $\sigma$ to the input $x_t$ and the previous hidden state $h_{t-1}$, combined with its own bias $b_f$. $W_{fx}$ and $W_{fh}$ are the weight matrices acting on the current input and the previous hidden state, respectively. Similarly, $W_{ix}$ and $W_{ih}$, $W_{ox}$ and $W_{oh}$, and $W_{cx}$ and $W_{ch}$ are the weight matrices of the input gate, the output gate and the new (candidate) cell state, respectively, and $b_i$, $b_o$ and $b_c$ are their respective biases. $i_t$ is the input gate, which determines what new information is added to the cell state. $c_t$ is the cell state, the core memory of the LSTM unit, and it is updated in two steps: first, the old cell state $c_{t-1}$ is attenuated by the forget gate; second, the input gate $i_t$ admits the new candidate values $\tilde{c}_t$, obtained by applying $\tanh$ to the current input and the previous hidden state. The output gate $o_t$ regulates the transfer of the cell state to the subsequent hidden state, using the sigmoid function to determine which components of the cell state are passed on. The output $h_t$ of the LSTM hidden layer is then calculated from $o_t$ and the $\tanh$ activation applied to the cell state:

$$
h_t = o_t \odot \tanh(c_t)
$$
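For readers who prefer code, the NumPy sketch below implements one LSTM step following the equations above and then runs it forward and backward to illustrate the bidirectional pass of the BiLSTM; the dimensions, random initialization and the weight sharing between directions are simplifications for brevity, not the MATLAB implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above."""
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])    # forget gate
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])    # input gate
    c_hat = np.tanh(W["cx"] @ x_t + W["ch"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                            # cell state update
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

def run_lstm(xs, W, b, n_hidden):
    """Run the step function over a sequence and return all hidden states."""
    h, c, hs = np.zeros(n_hidden), np.zeros(n_hidden), []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 9, 16, 24
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_in if k.endswith("x") else n_hidden))
     for k in ["fx", "fh", "ix", "ih", "cx", "ch", "ox", "oh"]}
b = {k: np.zeros(n_hidden) for k in ["f", "i", "c", "o"]}
xs = rng.normal(size=(n_steps, n_in))

# BiLSTM: run forward and backward (weights shared here only for brevity)
# and concatenate the hidden states at each time step.
h_fwd = run_lstm(xs, W, b, n_hidden)
h_bwd = run_lstm(xs[::-1], W, b, n_hidden)[::-1]
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)
print(h_bi.shape)  # (24, 32)
```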
The attention mechanism draws on the way human attention works: during reading, for example, we focus on the important information. For the model, the weights of the inputs change dynamically, and the attention mechanism learns these weights, that is, which inputs matter most. In the typical attention mechanism for an RNN, the previous hidden states are denoted $h_i$ and collected as the columns of a matrix $H$. The context vector $v_t$ is derived from these preceding states as a weighted sum of the column vectors $h_i$, encapsulating the information relevant to the current time step. This allows the model to focus on the most pertinent features from the recent past, enhancing the prediction accuracy for the current time step.

Consider a scoring function $f(h_i, h_t)$ that quantifies the correlation between the input vectors. The attention weight $\alpha_i$ is calculated in the manner of the softmax function, which normalizes the similarities between variables into a probability distribution used to weight the input vectors, and $v_t$ is then computed through the equations below:

$$
\alpha_i = \frac{\exp\left(f(h_i, h_t)\right)}{\sum_{j} \exp\left(f(h_j, h_t)\right)}, \qquad
v_t = \sum_{i} \alpha_i h_i
$$
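A compact NumPy sketch of this standard attention step, with a simple dot-product scoring function and arbitrary dimensions, is:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_context(H, h_t):
    """Standard attention: score each previous state against the current one.

    H   : previous hidden states, shape (n_hidden, n_prev), one per column
    h_t : current hidden state, shape (n_hidden,)
    """
    scores = H.T @ h_t            # dot-product scoring function f(h_i, h_t)
    alphas = softmax(scores)      # normalize similarities into a distribution
    v_t = H @ alphas              # weighted sum of the column vectors h_i
    return v_t, alphas

rng = np.random.default_rng(1)
H = rng.normal(size=(16, 24))     # 24 previous hidden states of dimension 16
h_t = rng.normal(size=16)
v_t, alphas = attention_context(H, h_t)
print(v_t.shape, alphas.sum())    # (16,) 1.0
```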
Many scholars have already incorporated attention mechanisms into neural network models for predicting ionospheric parameters [35,36]. Although the traditional attention mechanism can detect relationships between input vectors, it has shortcomings when applied to an RNN for multivariate time series prediction. It selects recently relevant information by using $v_t$, the weighted sum of the preceding states, which makes it well suited to tasks with simple data at each time step. When each time step contains multiple variables, however, it has difficulty excluding the variables that only contribute noise, which diminishes forecast accuracy. Furthermore, when confronted with many time steps, the traditional attention mechanism tends to average the information. These problems prevent traditional attention mechanisms from identifying the temporal patterns that are valuable for prediction [37,38].
Temporal pattern attention (TPA) was proposed to overcome these shortcomings of the traditional attention mechanism in multivariate time series prediction [39]. Assume that the multivariate time series supplied to the LSTM described above is $X = \{x_1, x_2, \ldots, x_{t-1}\}$, where $x_i$ denotes the actual value at time $i$. The aim is to predict the value $x_{t-1+\Delta}$, where $\Delta$ is an adjustable increment, and the corresponding predicted value is denoted $y_{t-1+\Delta}$. Only the window $\{x_{t-w}, x_{t-w+1}, \ldots, x_{t-1}\}$ is used to predict $x_{t-1+\Delta}$, where $w$ is a predefined, adjustable window size.
Figure 2 illustrates the TPA’s architecture.
In TPA, the input still needs to pass through the CNN first. We use $k$ filters $C_j \in \mathbb{R}^{1\times T}$, where $T$ can be regarded as the number of column vectors contained within the window. Convolving these filters with the row vectors of the hidden-state matrix $H$ yields $H^C$, whose element $H^C_{i,j}$ denotes the result of the convolution between the $i$-th row vector and the $j$-th filter. The formula is as follows:

$$
H^C_{i,j} = \sum_{l=1}^{w} H_{i,(t-w-1+l)} \times C_{j,(T-w+l)}
$$
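The following NumPy sketch illustrates this row-wise temporal convolution under the simplifying assumption that the filter length $T$ equals the window length $w$, in which case each convolution reduces to a dot product; the shapes are placeholders:

```python
import numpy as np

def tpa_temporal_conv(H, C):
    """Row-wise temporal convolution used by TPA.

    H : hidden-state matrix, shape (m, w)  -- one row per hidden dimension
    C : filters, shape (k, T) with T == w  -- each filter spans the window
    returns H_C of shape (m, k), H_C[i, j] = sum_l H[i, l] * C[j, l]
    """
    return H @ C.T

rng = np.random.default_rng(2)
m, w, k = 32, 24, 8            # hidden size, window length, number of filters
H = rng.normal(size=(m, w))    # row i is the time series of hidden dimension i
C = rng.normal(size=(k, w))    # k temporal filters
H_C = tpa_temporal_conv(H, C)
print(H_C.shape)               # (32, 8)
```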
As before, $v_t$ is a weighted sum, now taken over the row vectors of $H^C$. The relevance of each row $H^C_i$ to the current hidden state $h_t$ is first calculated using the scoring function

$$
f\left(H^C_i, h_t\right) = \left(H^C_i\right)^{\mathsf{T}} W_a h_t ,
$$

where $W_a$ is a learned weight matrix. The following equation gives the attention weight $\alpha_i$:

$$
\alpha_i = \mathrm{sigmoid}\left(f\left(H^C_i, h_t\right)\right).
$$

The context vector $v_t$ is obtained as the weighted sum of the attention weights and the row vectors of $H^C$:

$$
v_t = \sum_{i} \alpha_i H^C_i .
$$

The final prediction is obtained by combining $v_t$ and $h_t$:

$$
h'_t = W_h h_t + W_v v_t , \qquad y_{t-1+\Delta} = W_{h'} h'_t ,
$$

where $W_h$, $W_v$ and $W_{h'}$ are the weight matrices acting on the hidden state $h_t$, the context vector $v_t$ and the newly computed hidden state $h'_t$, respectively, and $y_{t-1+\Delta}$ is the final predicted value.
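Continuing the sketch, the scoring, weighting and output-combination steps can be written as follows; the matrices standing in for $W_a$, $W_h$, $W_v$ and $W_{h'}$ are random placeholders here rather than trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tpa_output(H_C, h_t, W_a, W_h, W_v, W_hp):
    """Scoring, weighting and output combination of temporal pattern attention.

    H_C : convolved hidden-state matrix, shape (m, k)
    h_t : current hidden state, shape (m,)
    """
    scores = H_C @ (W_a @ h_t)        # f(H_C_i, h_t) = H_C_i^T W_a h_t for each row i
    alphas = sigmoid(scores)          # attention weights, one per row
    v_t = H_C.T @ alphas              # weighted sum of the rows of H_C, shape (k,)
    h_prime = W_h @ h_t + W_v @ v_t   # combine hidden state and context vector
    return W_hp @ h_prime             # final prediction

rng = np.random.default_rng(3)
m, k, d_out = 32, 8, 1
H_C = rng.normal(size=(m, k))
h_t = rng.normal(size=m)
W_a = rng.normal(size=(k, m))
W_h = rng.normal(size=(m, m))
W_v = rng.normal(size=(m, k))
W_hp = rng.normal(size=(d_out, m))
print(tpa_output(H_C, h_t, W_a, W_h, W_v, W_hp))   # single predicted value
```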
Figure 3 provides a detailed depiction of the structure of the hybrid neural network prediction model proposed in this paper. The input data first pass through a convolutional layer to extract spatial features, then through a sequence expansion layer before entering the BiLSTM layer, which captures the temporal characteristics. The temporal pattern attention (TPA) then focuses on the time-related information and, after a series of processing steps, the output is generated. The inputs to the model are the encoded season and local time together with the solar and geomagnetic indices:

$$
X = \left[\mathrm{DOY}_s,\ \mathrm{DOY}_c,\ \mathrm{LT}_s,\ \mathrm{LT}_c,\ F10.7,\ \mathrm{SSN},\ Ap,\ Dst,\ \mathrm{IMF}\ B_z\right]
$$

The output layer combines the three stages, CNN, BiLSTM and temporal pattern attention, and the model output is the serial composition of the three:

$$
\hat{y} = f_{\mathrm{TPA}}\left(f_{\mathrm{BiLSTM}}\left(f_{\mathrm{CNN}}(X)\right)\right)
$$
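As an illustration only (the authors implemented their model with MATLAB's Deep Learning Toolbox), the serial CNN-BiLSTM-attention arrangement can be sketched in PyTorch; the layer sizes, kernel length and simplified attention head are placeholder choices, not the paper's configuration:

```python
import torch
import torch.nn as nn

class HybridFoF2Net(nn.Module):
    """Illustrative CNN -> BiLSTM -> temporal attention -> dense pipeline."""

    def __init__(self, n_features=9, n_filters=16, hidden=32):
        super().__init__()
        self.conv = nn.Conv1d(n_features, n_filters, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(n_filters, hidden, batch_first=True,
                              bidirectional=True)
        self.score = nn.Linear(2 * hidden, 2 * hidden, bias=False)
        self.out = nn.Linear(4 * hidden, 1)

    def forward(self, x):
        # x: (batch, window, n_features); Conv1d expects (batch, channels, time)
        z = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h_seq, _ = self.bilstm(z)              # (batch, window, 2*hidden)
        h_t = h_seq[:, -1, :]                  # last hidden state
        # Simplified temporal attention: score every step against h_t,
        # turn the scores into weights, and build a context vector.
        scores = torch.einsum("btd,bd->bt", self.score(h_seq), h_t)
        weights = torch.sigmoid(scores)        # per-step weights (TPA-style)
        context = torch.einsum("bt,btd->bd", weights, h_seq)
        return self.out(torch.cat([h_t, context], dim=-1)).squeeze(-1)

model = HybridFoF2Net()
dummy = torch.randn(8, 24, 9)                  # batch of 8 one-day input windows
print(model(dummy).shape)                      # torch.Size([8])
```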
The model input is thus processed by the CNN layer, handed to the BiLSTM layer, and finally processed by the TPA layer to produce the output. The quality of the predictions is assessed with several evaluation metrics: the RMSE, MAE and MAPE. The root mean square error (RMSE) is the square root of the average of the squared differences between the predicted and actual values; it is sensitive to outliers and gives greater weight to large errors. The mean absolute error (MAE) is the straightforward average of the absolute errors, treating all errors equally regardless of magnitude. The mean absolute percentage error (MAPE) expresses the error as a percentage of the actual values, which makes it easy to interpret in relative terms. They are calculated as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,
$$

where $y_i$ represents the actual observed value of foF2 and $\hat{y}_i$ is the value predicted by the model.
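A small NumPy sketch of these three metrics, using made-up foF2 values purely for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical observed and predicted foF2 values in MHz (placeholders only)
y_obs = np.array([6.2, 7.1, 8.4, 9.0, 7.8])
y_hat = np.array([6.0, 7.4, 8.1, 9.3, 7.9])
print(rmse(y_obs, y_hat), mae(y_obs, y_hat), mape(y_obs, y_hat))
```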