1. Introduction
Earthquakes are profound natural disasters that threaten economic development and people’s lives and property. Pre-seismic anomalies have been found before almost all major earthquakes [
1], such as the 1995 Kobe
[
2,
3] and the 2008 Wenchuan
[
4]. Therefore, the research for extracting and detecting pre-seismic anomalies has been a top priority in recent studies.
In the past 20 years, seismologists have conducted ample research on seismic-related data, including electromagnetic data (EM data) [
5,
6,
7], geoacoustic data [
8,
9], radon [
10,
11,
12], stress [
13], and seismicity indicators [
14,
15,
16,
17,
18,
19,
20,
21,
22]. Among them, EM data and seismicity indicators are often employed for pre-seismic anomaly detection.
Seismicity indicators are usually calculated from earthquake catalogs by using Gutenberg Richter’s law [
16], which describes the relationship between the magnitude and frequency of earthquakes. The parameter
b in this law is considered a statistical parameter and is related to the area’s physical characteristics [
18,
19]. Reyes et al. [
14] calculated seven seismicity indicators based on the value of
b as input into artificial neuronal networks (ANN) for pre-seismic anomaly detection. Asencio-Cortés [
20] implemented these seven indicators, but he modified
T,
and
c. Fault density (FD) [
21] was proposed to describe the density of active faults in the study zone. The authors thought that there is a higher probability of a large earthquake occurring along active fault zones or in their proximity. Salam et al. [
22] calculated seven seismicity indicators based on an earthquake catalog for California. They predicted a specific magnitude in the following fifteen days. However, the reliability of the results is yet to be examined, and taking pre-seismic anomaly detection as a binary classification is a choice.
The EM data are considered to be closely related to earthquakes [
23,
24,
25]. Chen et al. [
26] summarized seismo-electromagnetic phenomena in the lithosphere, the lower atmosphere and the upper atmosphere. Hayakawa [
3] found a seismo-ionospheric perturbation before the 1995 Kobe earthquake (
Ms7.2), where the amplitude or phase of the very-low-frequency (VLF) reached a minimum at sunrise and sunset. It is worth mentioning that the same phenomenon was found in [
27]. Hayakawa analyzed the statistical correlation between VLF anomalies and earthquakes. He discovered that earthquakes with a source depth of less than 40 km are usually more correlated with VLF anomalies. Moustra et al. [
28] utilized magnitude data and seismic electric signal (SES) to predict earthquakes. Li et al. [
24] discovered that the anomalies of earthquakes appeared 20 days before the earthquake by analyzing the ultra-low-frequency (ULF) and VLF bands of electrical data.
Geographic activities are complex, which increases the difficulty of pre-seismic anomaly detection. A survey of available research is listed in
Table 1. Some seismicity indicators calculated from earthquake catalogs are treated as inputs of pre-seismic detection models [
14,
20,
21,
22,
29]. However, these studies mainly focus on the choice of seismicity indicators, but some small earthquakes may be a mistake due to the lack of sensor accuracy. It is necessary to calculate the minimum completeness magnitude and then ignore earthquakes smaller than the minimum completeness magnitude [
30]. In addition, some research on EM data ignores the effects of magnetic storms, which usually yield anomalous fluctuations [
31]. Sometimes the EM data during a magnetic storm is excluded, which leads to missing data. Therefore, how to deal with missing data is still a problem. Moreover, the above studies detected pre-seismic anomalies by using one kind of data. It is a promising direction to use multiparameteric data. It can make full use of the complementarity between different related data and improve the accuracy of pre-seismic anomaly detection.
A novel approach based on the fusion of seismicity indicators and EM data is proposed in this paper. Considering that there may exist missing data, the earthquake cross partial multi-view data fusion approach (EQ-CPM) is proposed based on a partial multi-view fusion method [
23]. This approach tolerates the absence of data and complements the missing part in fusion. Firstly, the effectiveness of EM data and seismicity indicators was analyzed based on two earthquake case studies. Then, EQ-CPM was applied to acquire fusion data from seismicity indicators and EM data. Finally, a comparison of pre-seismic anomaly detection between fused data and two original data was conducted based on four machine learning methods: logistic regression, AdaBoost based on decision trees, stochastic gradient descent (SGD) and support vector machines (SVM). The main contributions of this paper are summarized as follows:
The effectiveness of EM data and seismicity indicators was further validated in pre-seismic anomaly detection based on the 2017 Jiuzhaigou 7.0 earthquake and the 2019 Changning 6.0 earthquake in China.
Propose an earthquake data fusion approach that leverages the temporal characteristic of earthquake data by convolution. The proposed approach tolerates the absence of data and complements the missing part in fusion.
The following content is organized as follows:
Section 2 introduces the datasets used in this paper, including EM data and seismicity indicators;
Section 3 specifies the data fusion algorithm;
Section 4 analyzes EM data, seismicity indicators and the fused data separately and compares the pre-seismic anomaly detection of fused data and two original data;
Section 5 presents a summary of this paper.
2. EM Data and Seismicity Indicators
This section introduces the data used to detect pre-seismic anomalies, including EM data and seismicity indicators. The details of the data are presented in each subsection.
2.1. EM Data
The Key Laboratory of Integrated Microsystems (IMS) of Peking University Shenzhen Graduate School deployed the acoustic and electromagnetics on an artificial intelligence (AETA) system. AETA provides the EM data used in our study. The EM data are collected at 0.1 Hz. There are 159 AETA stations in Sichuan and Yunnan (22° N–34° N, 98° E–107° E). Unfortunately, the datasets are incomplete due to weather or equipment failure. Therefore, stations with more than 30% missing data were filtered out. As a result, 74 stations were selected for this study, which is shown in
Figure 1.
There are 51 types of EM data features provided by AETA, which are shown in
Figure 2. The features are divided into three categories: time-domain-related features, frequency-domain-related features and wavelet-transform-related features.
The following is some description of these features, and more details can be found on the homepage of AETA.
Some basic statistical features, such as mean value and maximum value, were extracted. The variance is the expectation of the squared deviation of the original signal from its population mean or sample mean. The power means the energy of the signal, and in most cases, the power of the time series is reckoned in terms of the square of the signal. The skewness indicates the asymmetry of the observed signal. The kurtosis represents how often outliers occur and reflects anomalies of the signal to a certain extent. The energy of the signal of different frequency bands is expressed by power_rate_atob. levelx means the reconstruction of the xth level while wavelet transforming. The features with as a prefix represent relative features in ultra-low frequencies (300 Hz to 3 kHz).
Solar activity can cause disturbances in the magnetosphere of Earth, which are generally known as magnetic storms. Strong storms can cause significant disruptions in EM data, leading to many false alarms. The magnetic storm can be measured through the disturbance storm time (Dst) index. Magnetic storms can be divided into five categories: weak magnetic storms (), medium magnetic storms (), strong magnetic storms (), intense magnetic storms () and giant magnetic storms ( nT). In our study, the data were excluded when nT, even if the data are viewed as anomalies.
2.2. Seismicity Indicators
Earthquake catalogs typically contain the time, latitude and longitude of the epicenter, magnitude and depth of the source. The earthquake catalog is available from the National Earthquake Data Center (NEDC). The Gutenberg–Richter law [
16] is used when calculating seismicity indicators from the earthquake catalog, which states the relationship between magnitude and frequency of earthquakes:
where
a and
b is constant and
N is the number of earthquakes with a magnitude greater or equal to
M. The
b value is a vital seismicity indicator. Reyes et al. [
14] calculated seven seismicity indicators for pre-seismic anomaly detection based on the
b value. Maximum likelihood method is used to estimate the
b value [
32]:
where
represents the average of magnitude,
is the completeness magnitude and
is the magnitude accuracy (equals 0.1 in most cases). The calculation is affected by the completeness of the earthquake catalog [
30]. Therefore, we have to check the completeness of the earthquake catalog and get the completeness magnitude. Then, the earthquake events smaller than the completeness magnitude should be ignored.
The maximum curvature method (MAXC) was used to calculate the completeness magnitude of the earthquake catalog as
0.3, as shown in
Figure 3. The blue box represents the number of earthquakes larger than the current magnitude; the orange triangle is the number of earthquakes equal to the current magnitude. The green curve is the fitted curve. The red pentagram is the maximum curvature point, and the magnitude
0.3 is the completeness magnitude of this earthquake catalog.
A sliding window was used to generate seismicity indicators for each AETA station, and the length of the window was one day. Seismicity indicators calculated from the earthquake catalog are listed in
Table 2.
b and a are the constants of the Gutenberg–Richter Law; they imply the activity of seismic events to some extent. T represents the time interval between N earthquakes, and this indicator partially reflects the frequency of earthquakes. is the average magnitude of n seismic events. describes the release of seismic energy; seismic energy is continuously released from the ground through small earthquakes; and a seismic quiet period is reached when the energy stops being released. means the maximum magnitude in the past seven days. is the probability of an earthquake with magnitude six or greater. represents the deviation value of b. indicates the correspondence of the real data distribution and the Gutenberg–Richter law. is the difference between the expected maximum magnitude and the observed maximum magnitude. represents the time between two earthquakes with magnitude greater or equal to .
3. Method
This section introduces the fusion method, including the network design and loss function design, as shown in
Figure 4. The network is a generative model and converts latent representations into seismic-related data. Reconstruction and classification loss functions are applied to update the network parameters and the latent representations.
3.1. Network
The network is designed as a generative model with two layers: the 1D convolution layer and the densely-connected layer. Each view (dataset) corresponds to a model with the same structure but different parameters, as shown in
Figure 5.
The 1D convolution layer can extract deep features of latent representations, which takes advantage of the temporal characteristic of the sequence. The size of the convolution kernel is 3. The features of the adjacent three days are converged when the convolution kernel slides. Zeros are padded evenly into the upstream and downstream parts of the input such that the output has the same length as the input. The convolution layer is followed by a densely-connected layer, whose main roles are feature aggregation and mapping. The output is the generated seismic-related data, including EM data and seismicity indicators. The network parameters and the latent representations are updated by comparing output with true data.
The network can fuse missing data because the data are not used directly but in the process of backpropagation. In other words, the problem of missing is put into the part of the loss function.
3.2. Loss Function
Given a dataset
X with multiple views:
where
V is the number of views and
N is the number of samples. The latent representations of
X are expressed as
, where
and
.
Reconstruction and classification loss functions are applied to update the network parameters and the latent representations, as shown in
Figure 4. The reconstruction loss function is mainly used to update network parameters. Both loss functions are used to update the latent representations.
Matrix
describes the completeness of
X, and
denotes the
j-th view of the
i-th sample data that are available or missing. For the convenience of expression,
is used to describe the completeness of the
v-th view for all samples. Missing parts of the data are ignored by multiplying
when calculating the reconstruction loss (see Equation (
4) for details).
where
is the forward propagation of the network and
are the parameters of the
v-th view.
Classification loss is derived from triplet loss [
33], which increases the distance between different classes by clustering.
where
is cosine similarity indicating the similarity of
and
, and the higher the similarity, the greater the value.
is the prediction of the
i-th sample. The expression of
represents all the samples that belong to class
. A margin distance denoted as
in
is aimed at avoiding the training process stopping before convergence.
Classification loss is controlled by
, and the combined loss function is:
is used to update the parameters of the network, and L is used to update the latent representations. During training, update H and alternately.
4. Results
This section presents the results and analysis of seismicity indicators, EM data and the fused data. Firstly, EM data and seismicity indicators are assessed, which is a prerequisite for fusion. Then, a comparison between the fused data with two original data is presented.
4.1. Assessment of Seismicity Indicators
The earthquake catalog from 2009 to 2017 was selected to analyze the changes in
b before the 8th of August 2017, Jiuzhaigou Ms7.0 earthquake in the study area of 32.5° N to 34° N, 103° E to 104.5° E, as shown in
Figure 6a.
The epicenter of the Jiuzhaigou
earthquake was located at the intersection of the Kunlun Fault Zone and the Minshan Tectonic Zone. In this area, the most recent earthquake of magnitude seven or greater occurred in 1976, well before the occurrence of Jiuzhaigou
. In other words, there was a seismic quiet period between 1976 and 2017. It is worth mentioning that no earthquakes with a magnitude greater than four occurred from 2009 to 2017. In the region’s surrounding areas, the Wenchuan
earthquake occurred on 12 May 2008, and the Lushan
earthquake occurred on 20 April 2013. The strong earthquakes in the surrounding areas increased the coulomb stress in this region and promoted the occurrence of Jiuzhaigou
. This geotectonic situation can be reflected to some extent by the trend in the b value in
Figure 6b.
Changning
occurred in June 2019. Earthquakes that occurred between 2018 and 2019 (in
Figure 6c) were selected to study the value of b before Changning
. The value of
b increased before Changning
and decreased after the earthquake, as shown in
Figure 6d. The trend matches the change pattern of
b presented in [
17]. However, the change trend was subtle and not as obvious as before Jiuzhaigou
. The reasons are complex. Unlike the Jiuzhaigou area, there are more earthquakes near Changning area, including earthquakes of magnitude 5 or higher. There were more than ten thousand earthquakes within two years after the 2018 event around Changning. On the contrary, only 5561 earthquakes occurred between 2009 and 2017 around Jiuzhaigou. Namely, the coulomb stress accumulated around Jiuzhaigou, and this contributed in part to the
earthquake. Contrary to this, the coulomb stress was released through continuous earthquakes before Changning
, such as
in December 2018 and
in January 2019, and these two earthquakes may have caused the change in
b to not be obvious.
4.2. Assessment of EM Data
The Dst index shows no apparent magnetic storms before the occurrence of Jiuzhaigou
, 2017. We found that anomalies in EM data existed both before and after the earthquake. The types of anomalies varied from station to station, which can be roughly classified as period anomalies, amplitude anomalies and waveform-disturbance anomalies. The different anomalies may be related to the distance between each station and the epicenter. As shown in
Figure 7:
The periodic disturbance anomalies were found when the distance between the station and the epicenter was closer than 200 km; see
Figure 7(ai).
The amplitude anomalies appeared when the distance between the station and the epicenter was about 400 km; see
Figure 7(aii,aiii).
The waveform-disturbance anomalies appeared when the distance between the station and the epicenter was farther than 1000 km; see
Figure 7(aiv,av).
To analyze the statistical characteristics of the EM data, firstly, we generated the subseries of the EM data for each station by sliding windows. Then, the fractal dimension of each subseries was calculated using the Higuchi fractal algorithm [
34]. Finally, the fractal dimensions were analyzed by using the interquartile range (IQR) method. We took Changning
as the earthquake case.
Stations 48 and 77 were selected for analysis because their epicenter distances were lower than 100 km. The fractal dimension is shown in
Figure 8a. It can be seen that before the Changning
earthquake, simultaneous fractal dimension anomalies occurred at the two stations on 20 April 2019 and 15 May 2019. It is worth mentioning that almost the values are negative, which means the fractal dimension decreased before the earthquake [
35]. In addition, Station 48 had more anomalies from June 17 to early July, and one of possible reasons is that it is closer to the epicenter, as shown in
Figure 8b. In contrast, Station 77 had only one anomaly of a smaller magnitude from June 17 to early July. A clear hypothesis is that the closer the epicenter, the more the EM data can reflect the associated anomalies.
4.3. Comparison between Fused Data and Single-Modal Data
The ultimate goal of data fusion is to improve the accuracy of detection tasks. In this paper, pre-seismic anomaly detection is viewed as a binary classification problem, where positive samples are pre-seismic anomalies and negative samples are background data. The fused data are high-dimensional data and challenging to visualize. The t-SNE method [
36] reduced the data to two dimensions for data visualization.
After dimensionality reduction, the data are only a projection of a two-dimensional plane, without any physical meaning. However, we can visualize the two-dimensional data to analyze the distribution of positive samples and negative samples. When there is an overlap of positive and negative samples, the negative samples will cover the positive samples. The more red points (positive samples), the easier it is to distinguish between positive and negative samples.
It can be seen in
Figure 9 that it is easier to detect positive samples from fused data, because there are more red points. As a pre-processing step for pre-seismic anomaly detection, the data fusion is meaningful. The overlap of positive and negative samples may lead to important information being hidden. In contrast, the distribution of fused data is more dispersed, probably because in the process of reconstruction, the loss function
increases the distance of different classes and influences all samples meanwhile.
Four machine learning methods were applied to compare the fused data with EM data and seismicity indicators. The dataset was split into a training set and a test set in the ratio of 8:2. The SMOTE method was used to alleviate the data imbalance problem. Five metrics were used to measure the detection result: PPV (positive predictive value), NPV (negative predictive value), recall, TNR (true negative rate), MCC (Matthews correlation coefficient) and AVG, where AVG is the average of PPV, NPV, recall and TNR.
Table 3 shows the detection performance of the fused data, EM data and seismicity indicators of station 47. As
Table 3 depicts, the values of six metrics of fused data are large except
based on decision tree method. The good performance mainly resulted from increasing the distances of different classes through
. The bad result of the decision tree method was probably caused by the imbalance of the dataset. The proposed method, EQ-CPM, complements the missing part of data through
, and fully utilizes the complementary information between different views (different dataset). Therefore, in two overall metrics,
and
, the value of the fused far exceeds those of the two original data.
There were 74 stations selected for the study, and the data fusion results of these 74 stations are given in
Table 4. For the imbalance dataset, the recall metric is more important, and a lower recall value in pre-seismic anomaly detection means a lower leakage rate and a better detection effect. With the recall rate as the main index, the effect of fusion is considered better when the recall rate of the fused data is higher than that of the EM data and the seismicity indicators. All four methods had better fusion effects for more than 50% of the stations, indicating that the method proposed in this paper has better generalization for seismic data.
The results in this paper are compared with those of existing studies, as shown in
Table 5. The fused data are better than EM data in all indexes but inferior to the seismicity indicators in some indexes, such as NPV and recall. A possible reason is that earthquake catalogs are different in different regions. However, the performance of the fused data in both AVG and MCC is much better than that of the EM data and seismicity indicators. That is to say, the fused data has a better overall performance. Since the distances of some classes are larger, the false positives and false negatives, which are related to the metrics PPV, NPV, recall and TNR, can be significantly reduced.
5. Conclusions
In this paper, we leveraged the advantage of multimodal data for pre-seismic anomaly detection. EM data and seismicity indicators were considered. First, the effectiveness of EM data and seismicity indicators with related earthquake cases was investigated. The anomalies were further confirmed by focusing on the b-value change curves before the occurrence of the Jiuzhaigou , 2017 earthquake. At the same time, different EM data anomalies were discovered from different AETA stations before the Jiuzhaigou , 2017 earthquake. Multiple anomalies in the fractal dimension of the EM data were found in relation to the Changning , 2019 earthquake.
There are few relevant studies that have detected pre-seismic anomalies using multiparameteric data. We proposed a method called EQ-CPM to fuse EM data and seismicity indicators. The proposed method tolerates the absence of data and complements the missing data by fusion. Therefore, we can improve the use of a dataset with missing parts via EQ-CPM. Meanwhile, the proposed method brings together complementary information between different datasets during the reconstruction and leverages the advantage of multimodality. The fused data surpassed the original data in performance when used by four classical machine learning algorithms. Most importantly, more relevant datasets can be fused by EQ-CPM if possible.