1. Introduction
A new Vertical Reference Frame EVRF 2007 and European Vertical Reference System 2007 were created in Europe. Currently, efforts are being made to implement the system in particular European countries. The system is a kinematic height reference system [
1]. The leveling data from 27 countries were included in a common epoch (2000). Both the system and the frame are still being modified in relation to constant updates [
2]. Levelling campaigns are conducted on average every 20 years or they are carried out as supplementary or control measurements [
3,
4]. A long time between measurement epochs influences the validity of regional and global models.
Due to the character of satellite navigation systems and types of errors that occur in them [
5], the ellipsoidal height is determined with a far worse precision and accuracy than the horizontal components. For vertical crustal movements, the situation is the opposite to the levelling data. In addition, there are numerous external factors that influence the determination of height and velocity. These factors are often difficult to identify, for example: the influence of soil freezing, type and place of stabilization, continents’ vertical movements, changes in seas and oceans level, and earthquakes [
6]. Moreover, there are no explicit relations between the value of these errors for the adjacent stations of CORS (continuously operating reference stations). This is why we have concentrated only on the changes in height velocity. The models of vertical crustal movements can be determined using geodetic methods—starting from the leveling data, through mareographic data, to GNSS. Each group of data is characterized with the assumed error in estimating the movement [
4,
7,
8]. Mareographic data and GNSS data are constantly collected owing to how far it is possible to precisely analyze the changes that appear in time series. The vertical crustal movement models and deformation models can be determined on the basis of time details, e.g., the absolute model (absolute vertical movements of the station) or relative models (vertical movements between given stations) [
9]. Regardless of the chosen solution, the data must be reliable.
Extending operation periods of permanent GNSS stations provides the possibility to determine the velocity of vertical movements at the stations with accuracy ranges from
mm/yr to
mm/yr for three-year-long observations [
7,
9].
From the statistical point of view, longer observation makes the estimated vertical movement velocities more precise. However, the data in time series (heights or height differences) must be verified by applying proper mathematical and statistical tools. Higher precision can be obtained by elimination of external factors (e.g., the influence of temperature). As shown in [
10], there are inaccuracies in time series caused by technical, human and environment factors (breaks in functioning of stations, changes of antennas, changes of stabilization, software and receivers modification, changes in the troposphere and ionosphere, changes in the system of coordinates, changes in the base, subsidizing of buildings on top of which the antennas are stabilized, tectonic movements, and velocity caused by human activity) [
11]. The inaccuracies in time series can be characterized with (
Figure 1):
discontinuity of epochs—lack of coordinates in measurement epoch (panel 1),
discontinuous data—quasi constant relocation of coordinates in the following measurement epochs towards the coordinates of the previous epochs (panel 2 and 3),
Data with small jumps that are difficult to detect due to the data noise (panel 4).
The discontinuity of epochs is identified on the basis of the lack of data in the time series at a certain epoch. The discontinuity of data (jumps) can be identified, for example, by knowing the date of the event that caused the discontinuity [
12] or by the automatic discontinouity detection process [
11,
13,
14]. Additionally, the identification can be done visually, on the basis of heights diagrams [
15]. Various approaches to the solution of the time series discontinuity problem can be found in the literature. In [
13], Bayesian, variational, and Wavelet methods were applied to detect discontinuities in permanent GPS stations coordinate time-series. The discontinuities were detected without estimation of the magnitude of “jumps”. The parameters of the functional model containing discontinuities were estimated in [
11]. Recursive application of the DIA (Detection, Identification and Adaptation) procedure [
16] enables the detection of the discontinuities present in the time-series and the removal of the observations affected by outliers. A review of various methods of discontinuity detection can be found in [
14], where the authors conclude that further work needs to be done and metadata should be used in both manual and automated approaches. In the approach presented in this paper, an automated approach to discontinuity detection is presented. It does not require any metadata and its numerical cost is relatively low, which implies that it be applied in real-time applications. Regardless of the selected approach, the discontinuity of data that is caused by “a random incident” caused by human activity or the environment is difficult to identify. The jump identification methods can be grouped into the following categories:
visual changes in the graph,
changes documented by the system operator (e.g., in log files),
use of dedicated algorithms,
combination of the above methods.
The aim of the article is to verify and improve the discontinuity detection algorithm based on the switching edge detector (SWE) presented in the work by Rapinski and Kowalczyk [
17], which supports the estimation of vertical crustal movement model on the basis of GNSS data. The limits of the algorithm were verified in relation to the identification of inaccuracies in data (jumps) and values of the jumps with a simultaneous calculation of the velocity of vertical movements on the basis of GNSS data obtained from the polish national CORS network named ASG-EUPOS.
2. Methodology
Verification of an algorithm is usually based on runs against specific test cases. These test cases can be preconditioned (for example, using Monte Carlo method). Another approach is to use real life data and test the results using independent tools and criteria. During development of the algorithm, we have performed many tests on simulated data (
Figure 2). These tests allowed us to draw a conclusion that the presented approach can be used in a real life situation.
The assumptions of the algorithm were presented in the article by Rapinski and Kowalczyk (2016) [
17]. While conducting the tests, it was noticed that the algorithm had given false results when the dataset was contaminated with outliers (
Figure 3). In such a case, short jumps of unpredictable values were noticed. In order to solve this issue, the algorithm was preceded with outlier detection and removal procedure. Many tests for outliers can be found in the literature. The simplest one is a
test, which is applied in many data processing strategies. Since this test works poorly for gross measurement errors [
18], the Grubbs’ test for outliers was used [
19,
20]. The test allows for detection of outlier values in the data sets. On the basis of the test results, the elimination of a single outlier can be done in few ways. One of them is to replace the outlier with the mean of two (or more) adjacent observations. The other option is to remove the outlier from the dataset. In the case of a long time series (about 1900 observations in our case), the differences between resulting parameters for both options are negligibly small. The flowchart of the algorithm is presented in
Figure 4.
Grubbs’ test is defined for the hypothesis:
The Grubbs’ test statistic is defined as:
where
is a current sample (window),
is a mean value and
s is a standard deviation.
The Grubbs’ Test is meant to be used for small sets (up to 25 observations). Due to the amount of data in time series, the moving window of the Grubbs Test was used. For each range from to (where i is i-th observation and n is the Grubbs test window size), a single Grubbs Test was conducted. In a single Grubbs test, the p-value is understood as a chance of a certain point being so far from other points was set to 0.05. If any information in the current window was identified as an outlier, the value of the rank, which described the probability that the point was the outlier observation, rose by 1. Next, i increased by 1 and the test was conducted again. After considering all the points, the rank threshold, over which all the values were considered outliers, was determined. For the purpose of this elaboration, the rank threshold above which the point was considered outlier was set to 5. This means that if, during the 25 steps of moving Grubbs window the point was identified five times as an outlier, it was removed from the dataset. In order to eliminate the situations when two adjacent observations were outliers, the test was conducted several times.
To detect places in which “steps” occur, the switching edge detector algorithm was used [
21]. Assuming a single time series point
h defined at time point
i, moving averages are computed using the
n-point window (
2):
The next step is to construct moving variances (
3) on the basis of the moving averages (
2):
The output function of the switching edge detector is defined as:
where
and
, are the switching factors defined as:
where
r is an arbitrary selected exponent with a value
. The switching edge detector results in a
C matrix containing zeros and ones. This matrix describes at which epoch a step was detected. It has as many columns as there are epochs and as many rows as number of detected “discontinuities”. In each row (related with a certain jump), there are zeroes before the jump and ones after it. The mathematical model used in this article [
17] is a straight line with steps in particular epochs:
where:
t—epoch,
—height difference in epoch t,
v—velocity between stations,
—height difference at epoch 0,
—elements of C matrix,
—magnitude of “jumps”.
or, assuming the following matrices:
Equation (
6) can be written as:
Then, the model can be fitted into the data using least squares (LS) adjustment.
In the case of mining operation areas or seismically active areas, the velocities of vertical movement can change during the station operation. In such a situation, a more sophisticated method of model parameters estimation than LS should be used. For example,
estimation can be used to fit multiple functional models into single data set [
22,
23,
24].
3. Data Used in the Experiment
In order to avoid the influence of GNSS network adjustment on the resulting vertical movements model, unadjusted vectors between adjacent GNSS stations were used. The time series of height differences between Polish ASG EUPOS permanent stations (
Figure 5) and several stations from adjacent countries such as: Lithuania (LITPOS), Germany (SAPOS), Czech Republic (CZEPOS) and Slovakia (SKPOS) were used as data in the article. The stations are parts of the networks: IGS (International GNSS Service), EPN (EUREF Permanent Network), CEGRN (Central European GPS Reference Network), and EUVN (European Unified Vertical Network). The differences were calculated at the Military University of Technology EPN Local Analysis Centre (MUT LAC) using ETRF2000 reference frame [
25]. All together, daily data from 71 vectors were analyzed. The estimated height differences caused time series between GNSS stations located in central Europe, on the Western European platform, in the Sudetes, the Carpathian Mountains region and in the Eastern European platform.
Figure 6 depicts the availability of data for each GNSS station. It shows the number of epochs in the time series and the number of epochs from the logs of the station operator. These numbers are not equal, which can indicate discontinuities in time series. The number of epochs in time series for particular vectors is in a range from 920 to 1776 (from 3.1 to 5.1 years of operation) and the median is 5 years. Only the time series that had discontinuities were selected for the tests. The discontinuity is characterized with a single jump or even several dozens of jumps in epochs.
The CORS operation log files contain the following information: site identification of the GNSS monument, site location information, GNSS receiver information, GNSS antenna information (date installed and date removed), surveyed local ties, frequency standard, collocation information, meteorological instrumentation, local ongoing conditions possibly affecting computed position, local episodic effects possibly affecting data quality, and an on-site point of contact agency information. Apart from log files, there are also change logs that contain information about failures and operation breaks, and other changes made on reference stations together with the date of the event.
4. The Input Parameters of the Algorithm
The algorithm was tested on various differences of heights between GNSS stations. Such an attitude allows for the application of verification vectors and points that should not be used in estimation of the vertical crustal movements directly on permanent stations, for example: checking the loops misclosure. Additionally, it allows for later adjustment of network of the velocity of vertical movements. The verification was made with the use of four methods:
Verification I. Comparison of the velocity of vertical movements calculated with the use of the algorithm with the velocities calculated manually on the basis of the same time series (using visual evaluation).
Verification II. Comparison of the velocity of vertical movements estimated with the algorithm with the use of adjacent parameters.
Verification III. Comparison of the jumps identified by the algorithm with the information contained in .log files.
Verification IV. Estimating the value of loops misclosure.
A proper verification and then a possible modification are possible only on the basis of true data gained in the measurement. In order to check the calculations, one should determine the Grubbs Test parameters, the size of the threshold and the size of the window for SWE. The search window of the maximal value for the test, i.e., 25, was used as a parameter in the Grubbs Test. The test was repeated twenty times in order to eliminate the outliers located next to each other. Due to a large variance, the confidence level was 0.01, which was the value of triple standard deviation.
The SWE parameters are connected with two values: the size of the moving window, and the threshold value outside which the values can be regarded as a jump. The threshold value was connected with the expected precision of height changes. The analysis of the observations of permanent GPS stations indicated that the precision of the estimated heights of daily observations were from 7 mm to 9 mm [
7]. However, analyzing the observations from the last three years with the linear regression method, it was seen that there was an error of about
mm/yr in ellipsoidal height [
7]. As it was presented in [
9], in the Polish station, the error was about
mm/yr in four years and
mm/yr in five years. This is why the assumed threshold was
mm. This value exceeds the expected mean error of ellipsoidal heights.
The size of the SWE window was was estimated using the spectral series analysis. The analysis showed that the value of the main period in time series was one year (
Figure 7).
The final result of the algorithm was the estimated velocity of changes of
v, difference of heights at the beginning of the series
, location and magnitude of a jumps
. The final fit was made using least squares so the covariance matrix of estimated parameters is also available. The results of the algorithm calculations were shown in
Figure 8.
Verification I. The comparison between the velocity of vertical movements estimated with the algorithm and the velocity calculated manually (using visual evaluation) on the basis of the same time series. In the manual processing, the “obvious” outliers visible on a time series plot were removed from the time series. Visible jumps were removed by subtracting the approximate magnitude of jump from all the data points following the jump. The vertical movement velocity was estimated with LS line fit.
The manual calculations were collated with the results of the calculations made with the algorithm. The estimated absolute values of the differences are presented in
Figure 9.
Discontinuities undetected in manual processing resulted in a different value of estimated vertical movement in 46 of 71 processed vectors. The nature of the jumps remains unknown. They can be caused by poor fixing of the GNSS antenna, construction works in the vicinity of station, or other factors. A large drawback of manual identification of discontinuities is its dependency on operator experience. The estimated differences fluctuated between 0.0 mm/y to 0.2 mm/y. They were satisfactory and they were within the limits of
of assumed vertical movement accuracy. The differences were influenced by the difficulty of presenting the jump values on a diagram (
Figure 10).
Verification II. The comparison between height changes estimated with the algorithm using adjacent parameter values. The verification of correct identification of a jump and automatic calculations of the velocity of vertical movements were estimated with the use of SWE threshold of adjacent values, i.e.,
mm,
mm and
mm. The estimated results are shown in
Figure 11.
The majority of the jumps identified with SWE threshold of 0.5 mm were the same as those calculated with adjacent values of the threshold. The differences appeared mainly in the vectors between stations located in mining areas, in the Sudetes, the Carpathian Mountains region and the area that is seemingly stable (Eastern European platform, next to the Polish-Russian border). This area showed greater changes in heights than other adjacent areas, which was reflected in the estimations of levelling data [
4,
6]. The greatest differences in estimating the velocity of vertical movements were obtained with the use of SWE of 0.4 mm on the stations located in areas of rich deposits and those with the smallest number of epoch (CTRU-KLDZ, GANP-SKSV, JLGR-CLIB, JLGR-CTRU, PRZM-SULP, USDL-SKSV, ZARY-GLOG). On the vectors between these stations, the error of velocity was bigger than in other cases (from
mm/yr to
mm/yr).
Verification III. Checking the jumps identified using the algorithm with the information given in .log files. The estimated number of jumps and their locations were compared with the number of jumps identified on the basis of log files. The estimated number of jumps from automatic calculations and information received from log files is depicted in
Figure 12.
All the identified jumps in .log files were reflected in calculations with the use of the algorithm. The algorithm showed a larger number of jumps in time series than it was identified in log files (
Table 1).
The higher number of jumps indicated by the algorithm shows that the algorithm identifies jumps not only from the changes in the equipment, but also from other unidentified reasons. A large number of jumps may also show that the particular station was not stable or other factors had an influence on the values of height differences. The example of a time series with a few jumps is shown in
Figure 13 (green—original data, black—Grubss test).
GNSS stations, on the basis of which the time series was estimated, were located in the Sudetes on the areas with strong mining activity.
Verification IV. Checking the loops misclosure. The estimated velocities of vertical movements were analyzed by
loops misclosure analysis. Such verification allowed for a mutual control of the estimated adjacent velocities of changes in heights. The estimation of a greater than expected value of loops misclosure may indicate a false identification of jumps that resulted from an incorrect selection of SWE parameters. To determine the criterion for the value of loops misclosure, the authors chose the solutions presented in [
9]:
where
n is the number of vectors in the loop (
N is indicating that the trend is unadjusted) and:
in which
is a time span between consecutive epochs. Equation (
11) was derived on the basis of empirical time series analysis performed in [
9]. It is describing the dependency between stations’ time of operation and predicted vertical crustal movement accuracy.
Instead of standard deviation presented in Equations (
7) and (
8), one can take a theoretical (a priori) and practical (estimated a posteriori) value of standard error.
The estimated values of changes in heights (values near the vectors) and loops misclosure (the bold values inside the triangles) were presented in
Figure 14.
In the majority of cases, the estimated loops misclosures ranged from
mm/yr to
mm/yr. In four cases, the misclosure was from
mm/yr to
mm/yr. The greatest value of loops misclosure appeared in mining areas. The number of the identified jumps (from 8 to 23 jumps) in the time series exceeded the number of jumps identified in other time series. These were the time series in which the number of epochs was lower (about three years). Regardless of the higher values of loops misclosure, they satisfied the assumptions of equation (
10). However, for those vectors, in the further process of adjusting the network of vertical movements, one ought to select changes of a different rank.
5. Conclusions
According to the presented verifications, algorithm results are appropriate for the elaboration of vertical crustal movements. The obtained results and the verification results prove that it works properly. The indicated differences in the estimated results on given vectors show that there are reasons for jumps other than those presented in log files. The nature of these jumps is unknown and can be caused either by unnoticed human actions or some natural phenomena. For example, the improper antenna assembly on the roof of the building (where most of the stations reside) along with extreme temperatures can cause sudden movements of antenna. The higher number of jumps identified may be caused by unidentified environmental or technical factors. One should strive to identify the reasons for this or determine their influence on the number of jumps and, consequently, on the final calculation of the unadjusted trend on the vector. One can also eliminate or limit the series’ influence in the further adjustment process. The estimated increase and uniqueness of the area velocities in terms of changes in heights may be local.
Compared to other algorithms, the presented approach can simultaneously detect discontinuities and estimate the magnitude of “jumps” along with model parameters. It also has relatively low computational cost—it is not a recursive or iterative algorithm.
On the basis of the estimated values, one can more deeply analyze the time series in which the number of jumps is definitely higher. It is possible that, due to the nature of GNSS data, the time series of fluent periodic changes in heights can appear and they will not be identified by the algorithm as jumps. The methods of mutual verification that were suggested in the article allow for their verification.