1. Introduction
After a period of relatively low seismicity, South Korea suddenly experienced two moderate earthquakes, the 16 September 2016 Gyeongju Earthquake and the 15 November 2017 Pohang Earthquake [
1,
2]. These events ignited multiple discussions on the vulnerability of South Korean infrastructure to seismic events, especially considering that both earthquakes occurred near nuclear power plants. The state-of-practice in evaluating how earthquakes can impact engineered infrastructure is seismic hazard analysis [
3,
4,
5,
6,
7]. Seismic hazard analyses on major projects are typically conducted using a probabilistic approach, whereby stochastic models are used as components in solving the hazard integral. These models, such as modern ground motion prediction equations and recurrence relationships, are generally calibrated to moment magnitude, M
W [
8,
9,
10,
11,
12]. However, earthquake catalogs used to develop tools for use in probabilistic seismic hazard analysis tend to have earthquakes listed with differing magnitude types, such as surface wave magnitude, M
S, short-period body wave magnitude, m
b, the local magnitude, M
L, also commonly known as the magnitude on the Richter scale [
13,
14,
15,
16]. Notably, M
L requires a local calibration process that takes into account site and path effects that make it unique to a region [
17]. Accordingly, using a variety of earthquake magnitude types in earthquake research or practical applications is cumbersome and computationally inconsistent as some agencies may use different standards for the same magnitude [
18].
To overcome this issue, seismologists and earthquake engineers conduct magnitude homogenization. Magnitude homogenization entails developing relationships between the various magnitude types to a single magnitude type. These relationships typically relate M
w from M
s, m
b, M
L, or any other additional magnitude types relevant to the region of interest [
19,
20,
21]. Magnitude homogenization is not a trivial matter as there are considerations with regression techniques and relationship priority, which might or might not influence calculations [
20,
22,
23]. The reader is encouraged to refer to the work of Pujols [
24] as it provides a good review on regression issues stemming from magnitude homogenization. The work essentially suggests Deming regression may not always be the best method in predicting earthquake magnitudes. Moreover, few low magnitude earthquake events list an M
W, making it difficult for certain types of application.
Given this backdrop, this study proposes a variety of relationships to homogenize earthquake magnitudes estimated within the South Korean region to MW. This involves developing an earthquake catalog, which should include events within and immediately outside of South Korea, approximately 200 km away, to allow the results to be applicable and consistent with seismic hazard studies. Regression techniques, such as ordinary least squares and total squares regression, will be applied to this expanded catalog to help understand how these techniques influence ranking and priority amongst the various magnitude types.
2. Materials and Methods
Several earthquake catalogs will be required to develop appropriate magnitude homogenization relationships for the South Korean region. Specific catalogs used in this study include:
International Seismological Centre (ISC), Reviewed ISC Bulletin. This is a global catalog of seismic events recorded from all over the world starting from 1900. The ISC bulletin is updated periodically [
25,
26]. Many international seismological agencies submit data to the ISC. These agencies include the United States Geologic Survey National Earthquake Information Center (NEIC) as well as the Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization International Data Center (IDC). The NEIC was formerly known as the National Earthquake Information Service (NEIS), while the IDC was formerly known as the Experimental International Data Center (EIDC).
Global Centroid Moment Tensor catalog (GCMT). This is an earthquake moment tensor catalog for global earthquakes [
27]. Data starts from 1976.
ISC-GEM Global Instrumental Earthquake Catalog. This catalog focuses primarily on large earthquakes in the ISC Bulletin, where the data is more reliable, to improve, extract, and constrain certain earthquake parameters [
19,
28,
29,
30]. When available, ISC-GEM uses M
W stated by GCMT. Data starts from 1904.
Korea Meteorological Administration (KMA). This is a local catalog of seismic events local to the Korean peninsula [
31]. Most of the data is recorded as M
L due to historical reasons [
21]. Data starts from 1978. This online database is in Korean.
A search is made within these catalogs to compile a set of earthquakes relevant to the South Korean region. Events from the beginning of 1900 until the end of 2020 were considered. The search region was restricted to within 200 km of the mainland border and the islands of South Korea within 31° to 41° N and 122° to 134° E. Depth was limited to 40 km as any earthquake deeper could be below the crust and from the mantle. Earthquakes originating in the mantle have different wave propagation behavior and mechanisms than those within the crust, concerns typically not accounted for in seismic hazard analyses. Additionally, when the option was available, a minimum magnitude of 2.5 was selected as it is assumed earthquakes with magnitudes at 2.5 or less would not influence magnitude homogenization model development. It is rare to see an earthquake catalog with listed events at MW below three. Moreover, there will be no distinction between a main shock and foreshocks or aftershocks.
Using the compiled earthquake catalog, earthquakes with a listed MW in addition to other magnitude types will be plotted. Data of the same magnitude type from related seismological agencies will be grouped into one representative agency. For example, MS from NEIC will include MS data from NEIC as well as NEIS, and mb data from IDC will include mb data from both IDC and EIDC.
Continuing with the previous examples, this suggests a few agencies will have listed events with an M
W, such as ISC-GEM and GCMT. Data from GCMT is used as the base reference [
19,
20] and therefore this study will treat M
W,GCMT as the base M
W. Additionally, although not an international seismological agency, NIED is considered to have accurate constraints of the moment tensors for earthquakes in Japan and the surrounding regions [
19,
20]. Therefore, earthquakes listed with a M
W from NIED will be included in a base list. This base list will contain M
W data as the dependent variable in regression, with M
W,GCMT having first priority, M
W,ISC-GEM having second priority, and M
W,NIED having third priority. Earthquake data where there are five or fewer earthquakes from a single representative agency of one non-M
W magnitude type will not be considered in the regression.
Two regression approaches are considered; ordinary least squares (OLS) and total least squares (TLS) regression. Whereas OLS regression minimizes the residuals of the dependent variable, TLS regression attempts to minimize a distance metric that is a function of residuals between the dependent and independent variables. When this function is the Euclidean distance, the method is called orthogonal regression. TLS herein refers to orthogonal regression. An effective TLS technique, Deming regression, is not considered here as the bivariate method requires errors with known variances. Although ISC provides a magnitude error estimate in their Bulletin, other seismological agencies do not. The reader should note that when the ratio of the known variances is one, Deming regression is the same as orthogonal regression. The resultant regressions for MW will be termed MW,proxy herein. Regressions will not be performed on agencies that list four or fewer earthquakes for a specific magnitude type.
To help prioritize magnitude relations, the standard deviation will be used as a metric. For OLS, σOLS is defined as the standard deviation of the residuals. For TLS, σTLS is defined as the standard deviation of the Euclidean distances between data points and the regressed model. These standard deviations will serve as a metric to help prioritize and rank the magnitude homogenization relationships.
3. Results
The resultant earthquake catalog has 1544 earthquakes with contributions from multiple agencies.
Table 1 describes the seismological agencies and their country of operation. As previously shown, these agencies are typically described by a three or four letter code. Note that BJI and PEK are treated as the same agency, while NEIC, NEIS, and CGS are also treated as the same agency.
Table 2 describes the base list and shows the number of events with an assigned M
W per seismological agency. There are 15 and 14 events in GCMT and ISC-GEM, respectively, with an overlap of five events. These five events shared the same magnitudes. NIED listed 135 events with a reported M
W, however only nine events shared a M
W,GCMT listing. As a result, there are 149 events with a listed M
W by ISC-GEM, GCMT, or NIED. Surprisingly, there are only eight and seven events with a reported M
W by NEIC and JMA, respectively.
Table 2 also lists the number of earthquakes listed by ISC-GEM, GCMT, NIED, NEIC, and JMA that share an event listed in GCMT. These data suggest most of the ISC-GEM and NIED data are not listed within the GCMT database.
A plot of M
W versus M
W,NEIC and M
W,JMA is presented in
Figure 1 to help ascertain how correlated the M
W data set are. Data compiled from a previous study in Africa is also shown in the background of
Figure 1 to show that the dataset are within previous work [
20]. Overall, M
W,NEIC and M
W,JMA show a general one-to-one relation, but there is some slight scatter. Performing linear and total squares regression on the data set results in:
With coefficients of determination R2 = 0.941 and 0.986 for Equations (1) and (2), respectively. The regression coefficients are similar with the σOLS being greater than σTLS for both relationships. These relationships also suggest MW,JMA is a better indicator of MW than MW,NEIC.
To help increase the base list of M
W, some statistical hypothesis testing is conducted to determine if Equations (1a) and (2a) are similar to a one-to-one line (slope = 1, intercept = 0) representing what M
W estimates should be. Hypotheses testing performed on Equation (1a) found no significant difference against a slope of 1,
tcrit(6,0.05) = 2.45,
p = 0.34 and no significant difference against an intercept of 0,
tcrit(6,0.05) = 2.45,
p = 0.33. Hypotheses testing performed on Equation (2a) found no significant difference against a slope of 1,
tcrit(4,0.05) = 2.78,
p = 0.82 and no significant difference against an intercept of 1,
tcrit(4,0.05) = 2.78,
p = 0.82. This implies the relationships described by Equations (1a) and (2a) are not different from a one-to-one line. Hypotheses testing is not conducted on the TLS results as the residuals are not well-defined within the framework. As such, it is assumed the results from statistical hypothesis testing encompass the results from TLS. Due to this, the M
W,NEIC and M
W,JMA data are considered as M
W and placed in the base list, with M
W,JMA having priority over M
W,NEIC. In actuality, only one additional event was added to the base list, for a total of 149 earthquakes with a listed M
W. Given a baseline M
W,
Table 3 presents the number of events that contain M
W and any magnitude (i.e., M
S, m
b, M
L etc.) according to agency. The table shows most of the data for analyses are from IDC and JMA with very limited data sets from KEA, a North Korean agency.
A plot of M
w and M
S pairs from multiple agencies is shown in
Figure 2. M
S data from ISC, NEIC, BJI, MOS, and IDC is shown in
Figure 2a–e, respectively. The figures show relatively little scatter and the data appear linear with R
2 = 0.935, 0.725, 0.940, 0.899, and 0.908 for ISC, NEIC, BJI, MOS, and IDC, respectively. Furthermore, the M
S data does not show signs of saturation, a phenomenon where magnitude does not change as M
W continues to increase. Mathematically, M
S saturation tends to reveal itself with earthquakes approaching M
S~8. On the lower end of the magnitude range, the figure shows NEIC, BJI, and MOS to have no data pairs M
S < 4, while ISC and IDC have data pairs for M
S < 4. For comparison, the ISC data generally plots closer to the right-side boundary of a global earthquake catalog data set [
19].
OLS regressions on the M
S data result in the following relationships:
Additionally, TLS regressions on the M
S data result in the following relationships:
Figure 2 shows that both OLS and TLS regressions produce similar relationships for the magnitude ranges presented.
Similar to
Figure 2, a plot of M
w and m
b pairs from multiple agencies is shown in
Figure 3. m
b data from ISC, NEIC, BJI, MOS, and IDC is shown in
Figure 3a–e, respectively. The data show some scatter but appear linear with R
2 = 0.891, 0.806, 0.814, 0.821, and 0.804 for ISC, NEIC, BJI, MOS, and IDC, respectively. The data also hint at potential magnitude saturation at m
b ~ 6, which is what is expected. Similar to the M
S data, the figure shows NEIC, BJI, and MOS to have no data pairs m
b < 4, while ISC and IDC have data pairs for m
b < 4. Additionally, the ISC data generally plot closer to the right-side boundary of a global earthquake catalog data set [
19], similar to that shown in
Figure 2.
OLS regressions on the m
b data result in the following relationships:
Additionally, TLS regressions on the m
b data result in the following relationships:
Figure 3 shows TLS regressions on NEIC, BJI, MOS, and IDC data to be steeper relative to OLS regressions most likely due to the 2005 March 20 M
W 6.6 Fukuoka earthquake.
Figure 4 plots M
W and M
L pairs from multiple agencies. Data from KMA, JMA, BJI, KEA, and IDC are shown in
Figure 4a–e, respectively. In the ISC Bulletin, magnitudes listed as M should be treated equivalent to M
L [
16]. Since the local magnitude in the JMA seismological bulletin is listed as M
JMA, the local magnitude used by JMA will be termed M
JMA herein for consistency. Data from JMA, BJI, and KEA appear linear with decent scatter having R
2 = 0.895, 0.838, and 0.930 for JMA, BJI, and KEA, respectively. The KMA data appear relatively scattered with R
2 = 0.780 while the IDC data appear the most scattered with R
2 = 0.408.
The figures show there is a lack of M
L data pairs for BJI and KEA at M
L < 4, while KMA, JMA, and IDC are showing a good amount of data for M
L < 4. Interestingly,
Figure 4a appears to show a type of magnitude saturation where, as M
L decreases, M
W stays within a small range. Considering this, a bilinear relationship is proposed where an OLS and TLS regression is applied on magnitude pair data above M
L = 3.6, but kept constant for M
L < 3.6. In this approach, R
2 for data where M
L < 3.6 is 0.810, a moderate increase from 0.780, but still suggesting a linear relationship.
OLS regressions on the M
L data result in the following relationships:
Additionally, TLS regressions on the M
L data result in the following relationships:
In addition to M
W and M
JMA, JMA also uses a displacement magnitude, M
D,JMA, and a velocity magnitude, M
V,JMA [
32,
33,
34]. Earthquakes with these JMA displacement and velocity-based magnitudes and a corresponding M
W are plotted in
Figure 5. The data for M
D,JMA is not as plentiful, ranging from M
D,JMA~4.0 to 6.0. On the other end, M
V,JMA appears to be associated with events at the lower magnitude range, with M
V,JMA~3.5 to 5.0. Both magnitudes show linear behavior and little scatter.
OLS regressions on these magnitude types result in the following relationships:
Additionally, TLS regressions on these magnitude types result in the following relationships:
As stated previously, the σ’s will be used to prioritize magnitude relations. Note that σ
TLS treats the residual as the shortest distance from data points to the regressed line.
Table 4 compares the ranking of each relationship in ascending order for OLS and TLS. The table shows that based on σ, M
V,JMA, M
D,JMA, M
S,MOS, and M
S,BJI have the highest, second, third, and fourth highest priorities, respectively. However, after these first four magnitude types, there appears to be a difference in rankings depending on the type of regression used. For example, the M
S,ISC to M
W relationship has the fifth lowest σ
OLS, but is ranked ninth when considering σ
TLS. The largest discrepancy appears to be with m
b,BJI, where it is ranked fourteenth under OLS regression but ranked eighth under TLS regression. Interestingly, the magnitude type by the most local agency, M
L,KMA, ranked thirteenth under OLS and fifteenth under TLS conditions. The table also suggests m
b data are more scattered relative to M
S data as both OLS and TLS regressions show higher σ’s.
4. Discussion
In terms of performance, residuals from M
S regressions are plotted in
Figure 6. These plots show the OLS and TLS residual biases for each regression are essentially zero. Additionally, TLS residuals also show less variability relative to OLS residuals. The regression residuals for M
S,IDC appear more scattered especially at M
S,IDC < 4, with three events beyond 2σ for both OLS and TLS regressions. Even so, these plots suggest a linear model can be appropriate for M
S regressions.
Residuals from m
b regressions are plotted in
Figure 7. These plots show the OLS and TLS residual biases for each regression are close to zero, however there appears to be considerable scatter within the residuals for each regression, with each regression having about three events beyond 2σ for OLS regressions. Interestingly, the TLS regression for m
b,NEIC in
Figure 7b had no events outside of 2σ. Although the figure shows decent scatter across magnitudes, there does not appear to be any trend, suggesting a linear model can be appropriate for m
b regression.
Residuals from M
L regressions are plotted in
Figure 8. Residual bias from OLS and TLS regressions are zero with TLS residuals showing less variability relative to OLS residuals. The proposed regression for M
L,KMA was bilinear to account for the stagnation at lower magnitudes and the residuals appear to agree in
Figure 8a. It is not known if this saturation effect is a characteristic of lower magnitudes or an artifact of M
L calibrated for Japan [
21,
35]. Interestingly, there are several events between 4.5 < M
L,KMA < 5 where the bilinear model appears to underestimate magnitudes. Residuals for M
JMA also appear to show events at M
JMA~3.7 and 5.2 to be underestimated as shown in
Figure 8b, with five events beyond 2σ for both OLS and TLS regressions, the highest amongst all models. However, this may be due in part to the relatively higher number of earthquakes assigned this magnitude type.
Figure 8e shows the M
L,IDC residuals that appear the most scattered across the magnitude range, resulting in relatively large σ. Even with a large σ, there are four and three events beyond 2σ for OLS and TLS regressions, respectively. Moreover, the decreasing behavior with M
L,IDC suggest a linear model for regression may not have been appropriate. As such, the proposed regression relationship herein is not recommended for homogenization purposes but is presented for completeness.
Interestingly, the residuals for M
V,JMA and M
D,JMA appear the most linear of all the regressions presented herein as shown in
Figure 9. This may be perhaps due to the limited number of earthquake events. However, the reason why there are so few events is that JMA started using M
D,JMA and M
V,JMA after discontinuing the use of M
JMA in 2016. Nonetheless, all the data points fall within 2σ and suggest a linear model to be appropriate to the data.
Overall, these findings reinforce the notion that MW estimates from magnitude type regressions are dependent on regression type. A linear trend model appears to serve the data well with perhaps the exceptions of ML,KMA and ML,IDC. For magnitude types that exhibited more linear behavior, either OLS or TLS regression would yield similar results. Differences between OLS and TLS relationships appear when the data show events that seem to be significant outliers, relative to a linear trend.
An additional comparison is made between the proposed homogenization regressions herein against regressions from previous studies in
Figure 10. As previously mentioned, a study based on global ISC data provided regressions for M
S,ISC and m
b,ISC [
19]. Another study focused on the continent of Africa also offered homogenization relationships M
S,ISC, m
b,ISC, M
S,NEIC, and m
b,NEIC [
20]. Both studies derived their relationships using orthogonal regression, which is comparable to the TLS approach used here.
Figure 10a,b show the residuals to both global and African relationships for M
S,ISC and m
b,ISC, respectively. The global relationship appears to overestimate M
S,ISC for the compiled data and shows a linearly increasing trend for the m
b,ISC residuals, suggesting an inappropriate model.
Figure 10c,d show the residuals from the African relationships for M
S,NEIC and m
b,NEIC, respectively. The residuals from the African model plot slightly above and away from the TLS M
S,NEIC residuals, while the residuals from the African model plot mostly lower and away from the TLS m
b,NEIC residuals.
A South Korean study presenting a local magnitude calibration also showed magnitude type relationships between M
W, M
L, and M
L,KMA [
21]. They separated between horizontal and vertical M
L derivations, but each one showed similar results. The residuals from the horizontal model are presented in
Figure 10e. These residuals show significantly more scatter relative to the TLS residuals shown herein and seem to suggest bias, especially at M
L < 3.6. As mentioned previously, KMA used an M
L scale calibrated for Japan, which may be a source for much of observed scatter. Additionally, the M
W relationships provided in the South Korean study were compared against their proposed M
L scale.