All following considerations are based on data from the Austrian standard track recording car. It delivers several signals, such as the longitudinal level, the track gauge, and the alignment, two to four times a year, depending on the importance of the track [12]. As mentioned before, the condition and behaviour of the track are mostly described by the development of the standard deviation of the longitudinal level. In order to determine accurate deterioration models, deterioration branches have to be bounded by maintenance actions that affect the longitudinal level. Therefore, the input data for this research is the longitudinal level in the wavelength range of 3 to 25 m, described as the D1 signal in the European Standards [13]. For all three algorithms, it is important that the input signals are synchronised, as they are only roughly positioned in the database. As described by Fellinger [14], this works most effectively by shifting the measurement runs with the aim of minimising the Euclidean distance d (Formula (1)) between them, whereby the latest valid measurement run before a renewal or the latest measurement run in the database, respectively, forms the reference signal.
The calculation of the Euclidean distance between two measurement points also includes the positional value in the longitudinal track dimension, described by x. As this term is the result of the synchronisation process, it is not relevant for the calculation; therefore, the one-dimensional distance of y between two measurement points is sufficient for the synchronisation process. The sum of the distances $D_{M_1|M_2}$ is then a kind of quality index for the synchronisation of two measurement signals, $M_1$ and $M_2$, with a length L, shown in Formula (2).
When shifting one of the two measurement signals, the shift with the minimum distance $D_{M_1|M_2}$ can be found. This shift represents the distance in the longitudinal direction by which the signal has to be moved, which in most cases lies in the range of a few metres. The upper part of Figure 1 shows two unsynchronised signals; these are then synchronised through the described process, with the result displayed in the lower part of Figure 1.
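A minimal sketch of this shift search is given below, assuming two uniformly sampled, overlapping measurement runs; the function name, the shift range, and the use of the summed absolute one-dimensional distance as the quality index of Formula (2) are assumptions of this sketch.

```python
import numpy as np

def synchronise(reference: np.ndarray, signal: np.ndarray, max_shift: int = 20) -> int:
    """Return the shift (in samples) that minimises the summed distance
    between two measurement runs (cf. Formula (2))."""
    best_shift, best_distance = 0, np.inf
    for shift in range(-max_shift, max_shift + 1):
        # Overlap the two runs for the current trial shift.
        if shift >= 0:
            a, b = reference[shift:], signal[:signal.size - shift]
        else:
            a, b = reference[:shift], signal[-shift:]
        n = min(a.size, b.size)
        if n == 0:
            continue
        # One-dimensional distance of the y-values only; the distance is
        # normalised by the overlap length so that shifts remain comparable.
        distance = np.abs(a[:n] - b[:n]).sum() / n
        if distance < best_distance:
            best_shift, best_distance = shift, distance
    return best_shift
```

Applying the returned shift to one of the runs reproduces the alignment shown in the lower part of Figure 1.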
In the following, three different approaches to detect executed maintenance actions derived from the longitudinal level in the wavelength range of 3 to 25 m are presented. The first two algorithms are cross-section-based, whereas the third method uses the principle of the cumulative sum of the longitudinal level.
2.1. SEARCH Algorithm
The SEARCH algorithm, first described by Fellinger [11] and further developed for the aim of this comparison, can be employed to detect unrecorded tamping actions. The decision-making process is based on five conditions, all of which represent possible developments of the standard deviation of the longitudinal level D1. Moreover, the algorithm’s precision can be enhanced by incorporating recorded tamping actions.
Basically, the SEARCH algorithm operates on a cross-sectional basis and can be implemented across the entire line by iterating over each cross-section. For every cross-section, a loop is executed, whereby a new measurement point, including its corresponding date, is appended to a temporary dataset in each iteration. Subsequently, the five conditions, presented in the following in numerical order, are checked to ascertain whether a tamping action may have occurred. If this is the case, the proposed date of the action is saved. This date is calculated as the mean of the measurement dates before and after the predicted maintenance. Furthermore, all data points before the predicted maintenance are deleted from the temporary dataset. If no condition is fulfilled, or after the temporary dataset has been cleared, a new point is added to the dataset. The loop continues until the latest measurement has been added to the temporary dataset and evaluated.
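The loop can be sketched as follows; the `Detection` container and the `keep_from` bookkeeping are illustrative assumptions, not the authors' implementation, and the individual rule checks are passed in as functions.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Detection:
    date: datetime   # proposed date of the tamping action
    keep_from: int   # index from which the temporary dataset is retained

def search_cross_section(measurements, rules):
    """Run the SEARCH loop for a single cross-section.

    measurements: chronologically sorted (date, sigma_D1) tuples.
    rules: the condition checks (Rules 1 to 5), each of which returns
    a Detection or None.
    """
    temporary, detected = [], []
    for point in measurements:
        temporary.append(point)                  # add the next measurement
        for rule in rules:
            result = rule(temporary)
            if result is not None:
                detected.append(result.date)     # save the proposed date
                # Delete all points before the predicted maintenance.
                temporary = temporary[result.keep_from:]
                break
    return detected
```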
2.1.1. Condition/Rule 1
The application of Rule 1 is limited to cases where the temporary dataset comprises precisely two measurement points. In the event that the second measurement point exhibits a lower quality, i.e., a higher standard deviation, than the first, no action is required, given that it is reasonable to expect a decline in track quality over time. Conversely, if the standard deviation of the initial measurement exceeds that of the subsequent measurement by a defined value, a tamping action is identified. The threshold value is set at 0.25 mm, which allows for the reasonable assumption that a significant improvement can be attributed to track work and not to an issue with the data or other influences. Rule 1, like Rules 2–4, is illustrated graphically in Figure 2.
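Reduced to a Boolean check for brevity, Rule 1 can be sketched as follows; representing the temporary dataset by its list of D1 standard deviations (in mm) is an assumption of this sketch.

```python
def rule_1(sigmas: list, threshold: float = 0.25) -> bool:
    """Rule 1: only applicable when the temporary dataset holds exactly
    two points; detects tamping if the first standard deviation exceeds
    the second by more than the 0.25 mm threshold."""
    if len(sigmas) != 2:
        return False
    # A rising standard deviation (declining quality) is the expected
    # behaviour and requires no action; a large drop indicates tamping.
    return sigmas[0] - sigmas[1] > threshold
```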
2.1.2. Condition/Rule 2
Rule 2 is applied to a dataset comprising three to five data points. All data points, with the exception of the final one, are employed in the calculation of a linear regression, which is subsequently utilised to forecast the value of the last point. In the event that the final point exhibits a quality improvement of greater than 0.25 mm relative to the prediction, and the subsequent measurement point also demonstrates a quality enhancement of at least 0.1 mm in comparison to the same prediction, a tamping action is identified before the last point of the temporary dataset. Incorporating the subsequent data point serves to reduce the probability of an outlier being erroneously identified as maintenance work.
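A sketch of Rule 2, assuming dates are numeric (e.g., days since the first run); that the subsequent point is compared against the same forecast value, rather than a forecast extrapolated to its own date, is one possible reading of the description above.

```python
import numpy as np

def rule_2(dates: np.ndarray, sigmas: np.ndarray, next_sigma: float) -> bool:
    """Rule 2: for three to five points, fit a regression to all but the
    last point and test the last point (and the subsequent run) against
    the forecast."""
    if not 3 <= len(sigmas) <= 5:
        return False
    # Least-squares line through all points except the final one.
    slope, intercept = np.polyfit(dates[:-1], sigmas[:-1], deg=1)
    forecast = slope * dates[-1] + intercept
    # Tamping: the last point is more than 0.25 mm better than the
    # forecast, and the following run is at least 0.1 mm better than it.
    return (forecast - sigmas[-1] > 0.25) and (forecast - next_sigma > 0.1)
```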
2.1.3. Condition/Rule 3
The third rule can only be applied if a tamping action has already been identified and the first two data points of the new temporary dataset have not fulfilled any of the other conditions. The initial data point of the temporary set is excluded if three conditions are met:
The absolute increase in quality from temporary point 1 to temporary point 2 is greater than the absolute quality increase from the final point in the preceding deterioration branch to temporary point 1.
Temporary point 2 exhibits higher quality than that observed at temporary point 1.
The absolute increase in quality from temporary point 1 to temporary point 2 is greater than 0.05 mm.
Rule 3 is applied in order to eliminate outliers at the beginning of deterioration branches, thereby ensuring stable regressions.
2.1.4. Condition/Rule 4
Rule 4 is similar to Rule 3, as it also requires an already detected tamping action, but with three measurement points in the new temporary dataset. Additionally, three conditions must be met:
The absolute increase in quality from temporary point 2 to temporary point 3 is greater than the absolute quality increase from the final point in the preceding deterioration branch to temporary point 2.
Temporary point 3 is of a higher quality than temporary point 2.
The absolute increase in quality from temporary point 2 to temporary point 3 is greater than 0.05 mm.
The objective of this rule is to ensure stable regression for deterioration branches by eliminating outliers at the start of those branches. If the aforementioned conditions are met, the first two points of the temporary dataset will be excluded.
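Rules 3 and 4 can be sketched together; representing quality by the D1 standard deviation (lower values mean higher quality) and returning the number of leading points to exclude are assumptions of this sketch.

```python
def exclude_branch_start_outliers(prev_last: float, temporary: list,
                                  min_step: float = 0.05) -> int:
    """Rules 3 and 4: after a detected tamping action, drop outliers at
    the start of the new deterioration branch. prev_last is the standard
    deviation of the final point of the preceding branch; temporary holds
    the standard deviations of the new branch. Returns 0, 1, or 2 leading
    points to exclude."""
    if len(temporary) == 2:                       # Rule 3
        p1, p2 = temporary
        if (abs(p1 - p2) > abs(prev_last - p1)    # larger gain than the branch step
                and p2 < p1                       # point 2 has the higher quality
                and p1 - p2 > min_step):          # gain exceeds 0.05 mm
            return 1
    if len(temporary) == 3:                       # Rule 4
        p1, p2, p3 = temporary
        if (abs(p2 - p3) > abs(prev_last - p2)
                and p3 < p2
                and p2 - p3 > min_step):
            return 2
    return 0
```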
2.1.5. Condition/Rule 5
While Rules 1 through 4 are necessary in specific instances to ensure the proper functionality of Rule 5, Rule 5 can be regarded as the primary rule for detecting tamping actions. The temporary dataset must comprise a minimum of four data points, and no other rule may have identified a tamping action. All points, with the exception of the final one, are used to calculate a linear regression model. This is employed to forecast the value of the final point in the temporary dataset within a confidence interval at a confidence level of 0.995. Consequently, it is possible to ascertain whether the measurement point is consistent with the linear regression.
Should the quality of the measurement exceed the predicted value, it may be indicative of either an outlier or the execution of a tamping action. Should the subsequent measurement point also exceed the predicted quality range (confidence interval), a tamping action will be recorded, and all points except the final one will be excluded from the temporary dataset. In the event that the final point is identified as an outlier, it is excluded from subsequent calculations.
Figure 3 illustrates an exemplary dataset with its linear regression and confidence interval. As is evident from this example, the predicted standard deviation is not significantly different from the actual value. Furthermore, the stability of the regression is of importance, as the widening of the confidence interval is dependent on the scattering of the data and the time span between the last and the predicted point.
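A sketch of the core check of Rule 5; the confidence interval described above is read here as a prediction interval for the new observation, and the confirmation by the subsequent measurement point (distinguishing outliers from tamping actions) is assumed to be handled outside this function.

```python
import numpy as np
from scipy import stats

def rule_5(dates: np.ndarray, sigmas: np.ndarray, alpha: float = 0.005) -> bool:
    """Rule 5: fit a regression to all but the last point and test whether
    the last point falls below the prediction interval (dates numeric)."""
    if len(sigmas) < 4:
        return False
    x, y = dates[:-1], sigmas[:-1]
    n = len(x)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    s = np.sqrt(residuals @ residuals / (n - 2))   # residual standard error
    x_new = dates[-1]
    # The interval widens with the scatter of the data and the time span
    # between the observed points and the predicted one.
    se = s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum())
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    lower_bound = slope * x_new + intercept - t * se
    # A value below the lower bound is an improvement the regression
    # cannot explain: either an outlier or a tamping action.
    return sigmas[-1] < lower_bound
```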
2.1.6. Condition/Rule 0—Adaptation of the Algorithm
After completion of the loop, the boundaries of all deterioration branches and outliers are known. If there are also known maintenance actions, these can be compared to the calculated ones. Furthermore, if both a detected and a recorded tamping action lie between two measurement points, the calculated one is overwritten by the recorded one. On the other hand, if no tamping action has been detected there, a recorded one can be added.
In case the recorded data of tamping actions can be trusted, the algorithm will function more effectively. As Rule 0, already known maintenance actions will be included by default, instead of comparing and adding those afterwards. Upon the addition of a new measurement point to the temporary dataset, a verification is conducted to ascertain whether a recorded tamping action exists between the newly introduced point and the preceding one. If this condition is satisfied, only the most recent data point will remain in the temporary dataset, and the corresponding recorded maintenance action will be saved. The use of trusted tamping actions serves to enhance the algorithm, reducing the likelihood of missed outliers, unstable linear regressions, and adjacent deterioration branches whose behaviour differs only slightly and is therefore not detectable.
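A sketch of the Rule 0 check performed when a new point is added, assuming recorded tamping actions are available as a list of dates.

```python
def rule_0(temporary: list, recorded_actions: list):
    """Rule 0: if a recorded tamping action lies between the newly added
    point and the preceding one, keep only the newest point and return
    the recorded action date. temporary holds (date, sigma) tuples."""
    if len(temporary) < 2:
        return temporary, None
    previous_date, new_date = temporary[-2][0], temporary[-1][0]
    for action_date in recorded_actions:
        if previous_date < action_date <= new_date:
            # Restart the deterioration branch at the newest point.
            return [temporary[-1]], action_date
    return temporary, None
```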
Figure 4 illustrates the application of the five rules to a fictional cross-section. While Rules 1 (yellow) and 2 (blue) are applied, they never result in a detection. Conversely, Rules 3 (and 4, green) do detect an outlier, and another outlier is identified by Rule 5 (red) in the first deterioration branch. As is the case with the majority of cross-sections, nearly all tamping actions are detected by Rule 5.
For better understanding, the workflow of the SEARCH algorithm is depicted in a flow chart in Figure 5.
2.2. Cross-Section- and RANSAC-Based (CRAB) Algorithm
The second method provides the option of incorporating recorded maintenance data into the process, in a manner analogous to that of the SEARCH algorithm. If maintenance data is available, the period during which measurement data is available is divided into two or more rooms, i.e., time segments, with the number of rooms depending on the number of maintenance activities. In the absence of maintenance data, all measurement data is treated as pertaining to a single room. The fundamental tenets of this algorithm are derived from the core principles of the RANSAC (Random Sample Consensus) algorithm [15], which is an iterative method employed to estimate the parameters of a mathematical model from a dataset that may contain outliers. The method operates by repeatedly selecting random subsets of the data, fitting a model to them, and evaluating which model has the greatest number of inliers. In this case, for each defined room, bounded by maintenance actions or the beginning and end of data recording, respectively, every possible combination of two measurement points, represented by the standard deviation of the longitudinal level D1, is selected iteratively. The primary objective is not to detect outliers but to identify individual deterioration branches, which occurs in two steps. In the first step, the two chosen data points establish a straight line around which an interval is traversed. The size of the interval is defined by the standard deviation of the data points in the respective room, whereby a value of 1/3 of the standard deviation has been found to be a sensible choice.
Figure 6 shows that with an interval range of 1/3 (0.33) of the standard deviation of the longitudinal level D1, the highest F1 score can be reached when applying the algorithm to a calibration data set, which is further introduced in Section 2.3. The value by which the standard deviation of the longitudinal level D1 is multiplied is plotted on the x-axis.
All data points that fall within the interval are then labelled and saved in a list (green points in Figure 7). Once all potential combinations within the designated room have been processed, the set of data points that were most frequently identified as contiguous is defined as the set for a segregated deterioration branch.
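A condensed sketch of this first step for a single room, assuming numeric dates; counting, per point, how often it falls inside the band is a simplification of the bookkeeping of whole inlier sets described above.

```python
from itertools import combinations
import numpy as np

def most_supported_branch(dates: np.ndarray, sigmas: np.ndarray,
                          factor: float = 1 / 3) -> np.ndarray:
    """First CRAB step: try every pair of points as a line and count how
    often each point falls inside the band around that line. Returns the
    indices of the most frequently labelled points."""
    band = factor * sigmas.std()        # interval size: 1/3 of the room's std
    counts = np.zeros(len(sigmas), dtype=int)
    for i, j in combinations(range(len(sigmas)), 2):
        if dates[i] == dates[j]:
            continue                    # no unique line through these points
        slope = (sigmas[j] - sigmas[i]) / (dates[j] - dates[i])
        line = sigmas[i] + slope * (dates - dates[i])
        counts[np.abs(sigmas - line) <= band] += 1
    return np.flatnonzero(counts == counts.max())
```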
Data points that fall within the specified room but are not part of this set are discarded as outliers (orange points in Figure 7); those that fall outside the room (red points) are used for the subsequent determination of a deterioration branch. To prevent rooms with an excess of outliers from being erroneously designated as a segregated deterioration branch, the ratio of the time span of the identified room to the number of labelled inliers must not exceed 1.5 times the overall inspection interval in this cross-section. If no further measuring points are available for which a room affiliation can be determined, the system proceeds with the next room initially defined by existing maintenance activities, or with the next cross-section. The procedure aims to fragment the cross-section into deterioration branches. Nonetheless, issues predominantly arise when the maintenance interval is shortened towards the end of the service life or when the enhancement in the standard deviation of the longitudinal level subsequent to maintenance is minimal. Consequently, a further refinement is conducted in the second step, with the outcomes of the initial step serving as the basis for this process. In the second step, three measurement points at a time are employed across the room, with the first and third measurement points establishing a straight line. Subsequently, the vertical distance between the second data point and the corresponding point on the line is calculated, as shown in Figure 8.
It has been concluded empirically that if the second data point lies below the established line by between half and twice the entire room’s standard deviation, maintenance was executed prior to the second data point. If the measuring point is situated even further below the line, beyond this range, the data point is designated as an outlier. Upon completion of both steps, the algorithm returns the detected maintenance actions for each cross-section, while the date of the maintenance is defined as the midpoint between the two adjacent measurement runs.
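A sketch of this second step, again assuming numeric dates; the empirical bounds of half and twice the room's standard deviation are taken from the description above.

```python
import numpy as np

def refine_room(dates: np.ndarray, sigmas: np.ndarray):
    """Second CRAB step: slide a window of three points over the room,
    draw a line through the first and third point, and test the vertical
    distance of the second point below that line."""
    room_std = sigmas.std()
    maintenance, outliers = [], []
    for k in range(1, len(sigmas) - 1):
        slope = (sigmas[k + 1] - sigmas[k - 1]) / (dates[k + 1] - dates[k - 1])
        expected = sigmas[k - 1] + slope * (dates[k] - dates[k - 1])
        drop = expected - sigmas[k]             # distance below the line
        if 0.5 * room_std <= drop <= 2.0 * room_std:
            maintenance.append(k)               # maintenance before point k
        elif drop > 2.0 * room_std:
            outliers.append(k)                  # too far below: outlier
    return maintenance, outliers
```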
For better understanding, the workflow of the CRAB algorithm is depicted in a flow chart in Figure 9.
2.3. Cumulative Track Geometry-Based Algorithm
The third algorithm employs the cumulative track geometry index, initially proposed by Loidolt for the assessment of turnout condition [16]. In that publication, the cumulative sum of the root mean squares (RMS) with an influence length of 3 m is used to represent the average track geometry quality of a turnout or parts of a turnout. For the aim of this paper, the approach is slightly modified, and the cumulative sum of the square roots instead of the root mean squares of the longitudinal level is used. The calculated index is called the Cumulative Index (CI) and is defined in Formula (3). The length L of the section can be selected arbitrarily, as will be demonstrated in the following explanations. CI is therefore described as a function of the position and can be seen for multiple measurement runs in the upper part of Figure 10.
The local gradient of the cumulated curves reflects the track geometry quality of the respective location, with high gradients depicting poor quality. Deviations between two cumulated curves indicate either track deterioration or executed maintenance. In order to capture the gradient differences across a range of CIs for detecting maintenance, the difference between each consecutive CI is calculated and referred to as the difference signal (DCI). Subsequently, the DCI for each position is smoothed by calculating the moving mean with a span of 100 metres with the objective of minimising excessive scattering. The DCI signal is depicted in the lower part of Figure 10.
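Since Formula (3) is not reproduced here, the following sketch assumes that the CI cumulates the square roots of the squared D1 values (their magnitudes) along the track, and that the signal is sampled every 0.25 m; both are assumptions.

```python
import numpy as np

def cumulative_index(d1: np.ndarray) -> np.ndarray:
    """Cumulative Index (CI) over position, cf. Formula (3); here the
    square roots of the squared D1 values are summed."""
    return np.cumsum(np.sqrt(d1 ** 2))

def difference_signal(ci_before: np.ndarray, ci_after: np.ndarray,
                      sample_spacing_m: float = 0.25,
                      span_m: float = 100.0) -> np.ndarray:
    """DCI between two consecutive measurement runs, smoothed with a
    moving mean over a span of 100 m."""
    dci = ci_after - ci_before
    window = max(1, int(span_m / sample_spacing_m))
    kernel = np.ones(window) / window
    return np.convolve(dci, kernel, mode="same")
```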
Three scenarios are presented. In Case 1, no maintenance is performed between the two measurement runs, and track geometry deteriorates at a rapid rate as the track is potentially approaching the end of its service life. This accelerated deterioration results in an increase in the amplitude of the longitudinal level, which in turn leads to a positive gradient in the DCI signal (blue). In contrast, Case 2 involves a maintenance activity between the measurement runs, which serves to reduce the longitudinal level amplitudes. Therefore, the CI signal after the maintenance has a lower gradient than the CI signal of the measurement before the maintenance. Consequently, the gradient of the DCI signal in the area where maintenance has been carried out (magenta) is negative. After a few measurement runs, which are shown in grey, a track renewal was executed. As expected, the CI signal of the first measurement run after the renewal (31 March 2021) has a flat gradient. New, undamaged components result in a minimal deterioration of track geometry and no need for maintenance. Consequently, the CI signals display gradients that are almost identical (Case 3) and flatter than the gradients in Case 1. Furthermore, the DCI signal also has a low but positive gradient.
In this instance, the gradient is approximated via the secant of the DCI over a length of 100 m. The length of the secant was determined through an investigation in which secant lengths of 50 m, 75 m, 100 m, 125 m, 150 m, 200 m, and 250 m were analysed. The relation of the true positive rate to the false positive rate (Figure 11) reveals that a secant length of 100 m yields the best results, as a good balance between a low false positive rate and a high true positive rate can then be achieved. This analysis was conducted on four sections with comprehensive maintenance documentation, providing a ground truth that enabled the determination of the ideal secant length.
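The secant approximation can be sketched as follows, under the same sampling assumption; at the edges of the section the secant is simply shortened.

```python
import numpy as np

def secant_gradient(dci: np.ndarray, sample_spacing_m: float = 0.25,
                    secant_length_m: float = 100.0) -> np.ndarray:
    """Approximate the DCI gradient at every position via the secant over
    the given length (half of it to either side)."""
    half = max(1, int(secant_length_m / (2 * sample_spacing_m)))
    idx = np.arange(dci.size)
    lo = np.clip(idx - half, 0, dci.size - 1)
    hi = np.clip(idx + half, 0, dci.size - 1)
    return (dci[hi] - dci[lo]) / ((hi - lo) * sample_spacing_m)
```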
Given that the gradient of the DCI can now be described at each point using the secant gradient, the subsequent step is to ascertain in which areas the gradient of the DCI signal is negative. Given the varying gradients before and after the maxima and minima of the DCI (as depicted in Figure 12, where the gradient before the minimum is steeper than after), the selected range may be either too long or too short due to the secant length of 100 m. Accordingly, the precise location of the maxima and minima, which delineate the commencement and conclusion of the maintenance section, is subsequently determined (Figure 12).
If the algorithm were stopped at this point, maintenance measures would be incorrectly assigned to an excessive number of sections. The sections are therefore subjected to a more detailed analysis in two stages. The first step is to ascertain whether the negative slope of the DCI is due to potentially incorrectly synchronised data. For this purpose, a signal correlation analysis is conducted for each identified section. The final signal preceding the identified maintenance measure must exhibit a linear correlation of at least 0.7 with one of the two preceding measurement runs; it is assumed that no further maintenance was carried out during this period. The process is then repeated with the initial measurement signal following the identified maintenance activity, comparing it to the subsequent two signals. If either the correlation value before or after the maintenance is too low, the detected section is identified as an erroneous detection and subsequently excluded from further consideration. The threshold value of 0.7 was determined using the same data set as for the influence length and is based on the interpretation of Figure 13. It was ensured that the ratio of true negatives (correctly labelled as a section without maintenance; green curve in Figure 13) to false negatives (incorrectly labelled as a section without maintenance; red curve in Figure 13) is high and that the number of false negatives is low, so that the precision of the filtering is high. These requirements are best met by a correlation coefficient of 0.7, as the number of false negatives is small up to this point and increases sharply thereafter (red curve).
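The correlation filter can be sketched as follows; Pearson's correlation coefficient on equally long, synchronised D1 signals is assumed as the linear correlation measure.

```python
import numpy as np

def passes_correlation_filter(last_before: np.ndarray, two_runs_before: list,
                              first_after: np.ndarray, two_runs_after: list,
                              threshold: float = 0.7) -> bool:
    """The last signal before the detected maintenance must correlate with
    one of the two preceding runs, and the first signal after it with one
    of the two subsequent runs; otherwise the detection is discarded."""
    def correlates(signal, candidates):
        return any(np.corrcoef(signal, other)[0, 1] >= threshold
                   for other in candidates)
    return (correlates(last_before, two_runs_before)
            and correlates(first_after, two_runs_after))
```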
Secondly, the steepness of the DCI must also be taken into account when considering the result. To achieve this, the difference between the highest and lowest points of the DCI for the specific maintenance section is calculated and related to the length of the maintenance section. This allows for the consideration of the overall reduction in the longitudinal level. The value of 700 has proven to be an appropriate choice when applied to the previously described test data set. This was determined using the ROC curve, the progressions of the negative and positive curves, and the F1 score (Figure 14). The ROC curve in Figure 14b illustrates that the optimal threshold value should be situated within the range of 600 to 1000, as evidenced by the comparable distances to the diagonal in this range. Additionally, Figure 14a demonstrates that the F1 score exhibits minimal growth from a threshold value of 700 onwards, whereas the enhancement in the F1 score up to 700 is considerable. This is further corroborated by the comparison of true negatives (dark green) and false positives (light red) in Figure 14c, which also indicates a flattening of both curves at this value. One evaluation alone would not allow a clear statement to be made about the most appropriate value, but when the results of all three evaluations are considered collectively, it becomes evident that the value of 700 is the most appropriate.
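A sketch of this second criterion; that detections below the threshold are discarded is an assumption based on the description above, and the units follow from the CI definition.

```python
def steep_enough(dci_section, section_length_m: float,
                 threshold: float = 700.0) -> bool:
    """Relate the DCI range of a detected section to its length and keep
    the detection only if the ratio exceeds the empirical value of 700."""
    dci_range = max(dci_section) - min(dci_section)
    return dci_range / section_length_m > threshold
```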
Once all the mentioned steps have been completed, the algorithm generates a list of identified maintenance tasks, including the commencement and conclusion of each section, as well as the estimated date. In contrast to the preceding two algorithms, the definition of maintenance work is not cross-section-based but rather line-wide.
For better understanding, the workflow of the CTG-based algorithm is depicted in a flow chart in Figure 15.
2.4. Method Performance Comparison
The three algorithms described are applied to four track sections in the network of the ÖBB-Infrastruktur AG, all of which have an average age of around 20 years. Apart from that, the sections have the following boundary conditions:
Section 1: The first section has an average daily load of approximately 85,000 gross tons and is predominantly composed of concrete sleepers on 60E1 rails, extending for approximately 12 kilometres. The area encompasses 12 turnouts, 17 bridges, and two station areas.
Section 2: In contrast to the first section, the second section has an average load of only 17,000 gross tons per day. Approximately 2/5 of the 60E1 rails are installed on concrete sleepers, while an equal number are installed on wooden sleepers. The remaining rails are installed on concrete sleepers with under-sleeper pads. The section includes six turnouts, 26 bridges, two short tunnels, and four station areas. The total length of the section is 11 kilometres.
Section 3: The third section, spanning approximately 10 kilometres, is primarily composed of 60E1 rails on concrete sleepers. The track is subjected to approximately 50,000 gross tons per day. The section incorporates 10 turnouts, 25 bridges, and three station areas.
Section 4: The fourth section, which extends approximately five kilometres, is primarily composed of 60E1 rails on concrete sleepers. The track bears a load of approximately 67,000 gross tons per day and encompasses nine bridges, one station area, and no turnouts within the specified region.
The selection of sections is based on the consideration of enabling a comparison of sections with disparate loads and expected deterioration. The evaluation is based on data from the Austrian track recording car dating back to 2003 (Section 4), 2005 (Section 1), 2006 (Section 2), and 2012 (Section 3). The data necessary for the evaluation are the longitudinal level D1, which describes the vertical track geometry in the wavelength range from 3 to 25 m (used for the CTG-based algorithm), and the sliding standard deviation of this signal with an influence length of 100 m for the SEARCH and CRAB algorithms. For the recording of the track geometry, the Austrian track recording car utilises an inertial measurement unit (IMU) paired with an optical track gauge measurement system and a navigation system. The measuring principles and data output of the track recording car comply with European regulations (EN 13848) [13]. The modified maintenance database serves as the reference case for assessing the precision of the algorithms. For this purpose, the recorded maintenance work was augmented and corrected with manually recorded sections through a process of visual inspection of the measurement signals and the TQIs derived from them.
In order to evaluate the three algorithms, the following metrics are employed: precision, recall, and F-score. The classifications required for this analysis (true positive, true negative, false positive, false negative) are determined on a cross-sectional basis. Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class. The precision for a class is the number of true positives divided by the total number of elements labelled as belonging to the positive class. As it is not reasonable to use recall and precision as the sole criteria, the two parameters are combined using the F-score:

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
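As a compact illustration of how these metrics combine (using the balanced F1 score, which is used throughout this paper):

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Precision, recall, and their harmonic mean (F1 score), computed
    from cross-section-based classification counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```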