In order for airplanes to be located with acceptable accuracy, it is necessary to develop the method of obtaining a sufficiently accurate TDoA estimate. In radiolocation systems, common practice is to determine the relative delay of signals by determining their cross-correlation function. However, it should be borne in mind that transmitters in these systems emit signals, the form of which is known on the receiving side. In addition, the waveforms of these signals are designed in such a way that they have good correlation properties giving a clearly marked unambiguous maximum suitable for time measurements. In the described VCS system, analog speech signals are used. These signals are generally of variable waveform so their correlation properties are also variable and what is more: the quality of correlation measurements of such signals depends on many factors, such as transmitted messages and voice characteristics of the speaking person. These signals are generally of unknown waveform and have a narrowband character different from noise. For such signals, an additional processing step is required to obtain a clear maximum of the correlation function.
Another issue concerning TDoA determination that needed to be solved was the limited time resolution of the correlation function. The signals received at the reference stations are sampled at a frequency of 40 kHz, which corresponds to a time interval of successive samples of 25 μs. In this case, TDoA time resolution would enable the determination of differences in the distance of the object from a pair of reference stations with a resolution of approximately 7500 m. Such a low resolution does not allow for determining the position of the aircraft with a sufficiently small error. To increase the time resolution of TDoA measurements, interpolation of the correlation function was used.
Selected digital signal processing modules in radio stations and in the central server are described in detail later in this section.
5.1. Preprocessing of Speech Signals
Before calculating the correlation function of the signals received by the two reference stations, preprocessing of these signals is performed to eliminate components other than those associated with the transmission of speech signals.
First, noise segments at the beginning and end of the file containing the audio samples are detected and removed. The noise segments occur at the beginning and end of the actual voice message and they are results of both transient effects in onboard transmitter—this component should be correlated in all ground radio stations and transient effects in receivers which are not correlated nor synchronized between radio stations and must be removed. They can also appear in pauses between statements, but these segments are not removed.
Spectrogram analysis shows that a major part of the speech signal power during air-to-ground communication is concentrated in the frequency band up to 3–3.5 kHz, which is mainly limited by characteristics of a human voice. On the other hand, in segments containing noise, the bandwidth goes up to 8–8.5 kHz, which is limited by the bandwidth of IF filters in the receiver. An exemplary spectrogram is shown in
Figure 12.
Hence, the detection of noise segments is based on an analysis of the signal strength in the 3.5 kHz to 8.5 kHz band. First, the signal is filtered with a bandpass filter with such a passband. It is a filter with a finite impulse response, and its order is 1000. Another element of digital signal processing is the envelope detector. Thereafter, filtering of the signal is performed by a lowpass filter of 1000, with a cutoff frequency of 50 Hz. At the output of this filter, we have a signal which may be understood as short-term (tens of milliseconds) amplitude of noise component in samples of signal from AM receiver. The signal formed in this way is then normalized to the maximum value in a time window of 300 s to achieve automatic adaptation of detection threshold in case of variable noise level caused by industrial interferences. An example of the noise signal form after smoothing and normalization is shown in
Figure 13.
It was assumed that noise occurs in those file segments where the value of the metric shown in
Figure 13 exceeds the level of 0.3. Thus, noise at the beginning of the file occurs from the beginning of the file until it is below the threshold. In turn, at the end of the file, the noise covers its last segment where the value does not fall below the threshold. In addition, a margin of 1500 samples on each side is also removed to get rid of any transient effect which may occur at the beginning of signal reception (for example, distortion caused by the different behavior of automatic gain control in VHF AM receivers in different stations) and at the end (noise recorded after the end of transmission before squelch unit switched off signal recording). However, taking into account both operations, in each audio file, a different number of initial samples may be removed. Therefore, before computing the correlation function, timestamps are updated to be the same as the sampling time that occurs first in the file.
In the next preprocessing step, the waveform from the audio file, after cutting the starting and ending noise fragments, is filtered with a filter with a passband from 500 Hz to 3 kHz. This is to eliminate distortion outside the speech signal band. A filter with a finite impulse response, order 1000, is used.
5.2. Correlation of Speech Signals
Signals from VHF aviation radio are recorded in reference stations starting from the moment of their envelope detection. Due to the different strengths of the signals reaching the stations, the moment of their detection, and thus the start of recording, may differ. Nevertheless, information on the start of recording of each signal is available, relative to a common clock synchronized with the GPS receivers. The moment of starting the recording is determined with an accuracy of 1 μs.
Before determining the correlation function, the waveforms of the signals are aligned on the time axis. In the next step, circular correlation is determined, consisting of cyclically shifting one of the signals by one sample to find the shift for which the correlation reaches its maximum, i.e., for which the signals are closest to each other. Cyclic correlation requires that the signals have the same number of samples. In case one of them is shorter, it is padded at the end with zeros (zero padding).
The cyclic correlation is computed in the frequency domain using the fast Fourier transform (FFT). It is a method that has less computational complexity than computing time-domain correlations. The FFT correlation is based on the calculation of the cross-spectrum of two signals, i.e., the product of the spectrum of one signal and the complex conjugation of the spectrum of the other signal, then computing the inverse fast Fourier transform (IFFT) and determining the modulus from the result.
As mentioned, speech signals are not of a noise nature, and therefore there may not be one clearly marked maximum in the function of their correlation. Therefore, in the case of such signals, the generalized cross correlation is used [
31]. It involves deliberate, preliminary shaping of the signals before calculating the correlation. It corresponds to passing the signals through a filter with a frequency response inverse to the signal spectrum, resulting in flattening the spectrum and thus imparting noise characteristics to the signal.
Figure 14 shows a diagram of the procedure for determining the generalized cross-correlation function with the use of the fast Fourier transform.
The cross spectrum of
x1[
n] and
x2[
n] is determined as
and generalized correlation
where
H[
k] is the frequency response of the shaping filter.
Several variants of filtration are described in the literature, e.g., Roth, PHAT, and SCOT [
23,
31]. The last of them was chosen for implementation in the VCS system because it shows the most clearly marked maximum correlation. Its frequency response is determined as follows
where
Xii[
k] is the auto spectrum of the
i-th signal, determined analogously as in (20).
It has been observed that the generalized correlation function may show a false global maximum for ∆n = 0 in the case where the denominator values of the filter frequency response are close to zero. To prevent this, the lowest denominator value H[k] is increased to the level equal to 1% of the maximum value of this denominator.
Figure 15 shows a comparison of the normal and generalized correlation functions in the vicinity of the maximum. As can be seen, in the case of speech signals, there are several local maxima in the standard correlation function, and the peak width around the global maximum is significantly greater than in the case of the generalized correlation.
5.3. Interpolation of the Correlation Function
Increasing the accuracy of TDoA estimation requires increasing the time resolution of signal analysis. An interpolation method is used to estimate the sample values between the original samples of band-limited signals.
Two variants are possible—interpolation of received signals in the time domain or interpolation of the correlation function. The first of these increases the processing effort due to the need to calculate the long FFT transforms. Hence, the second option is preferred, as it enables us to obtain similar results with much lower computational complexity. Additionally, in this case the processing effort is limited because only a fragment of the correlation function is analyzed in the vicinity of its global maximum—there is no need to interpolate the entire correlation function. A segment of ±30 samples from the maximum is assumed to be interpolated.
The first step in the interpolation process is oversampling. A fixed number of zeros samples (Nzer) are inserted after each original sample. This number depends on the expected increase in time resolution. For example, a tenfold increase in resolution requires nine zeros.
The next stage of interpolation is the filtering of the oversampled waveform. The interpolated signal is obtained at the output of the low-pass filter with a cut-off frequency of 5 kHz.
Obtaining an appropriately selective filtration requires the proper selection of the filter length with a finite impulse response. The empirically selected length is 400·(Nzer + 1). A 100-fold oversampling of the correlation function was adopted; therefore, a FIR filter of 40,000 was used.
Zero-valued vectors with a length equal to one-half of the filter length are added at the beginning and end to the segment of the correlation function before filtering. In turn, many zeros are added to the impulse response of the filter so that it has the same number of samples as the waveform to be filtered. The filtration process itself is performed in the frequency domain using the FFT transform. After filtration, excess samples at the start and end of the waveform are removed to restore the original length of the oversampled correlation segment.
Figure 16 shows an example of the correlation function in the vicinity of the maximum. Red represents the oversampled waveform and blue represents the same waveform after lowpass filtration. There is a noticeable asymmetry of the correlation function around the maximum. In such a case, the maximum of the function after interpolation is closer to the actual delay between the analyzed signals—an increase in time accuracy is obtained.
The final value of the TDoA estimate is the sum of two components. The first is the difference of the timestamps associated with the times when the first samples of the correlated waveforms were taken. The second component is the shift of the signals corresponding to the maximum of the correlation function after interpolation.
5.4. Results of Measurement Tests
The first step in the aircraft positioning process is selecting a set of recorded signals for the same audio transmission received by different radio (reference) stations (RS). The grouping of signals is based on the comparison of the timestamps of their recording start (these markers are saved in the database).
The set includes signals that markers differ by no more than a preset time interval. This interval is defined by the operator and is one of the configuration parameters of the TDoA determination program.
Originally, the value of the maximum interval between the timestamps was set to 2 s. However, problems with incorrect grouping of data were observed with this setting, especially in the case of series of short voice messages lasting 1–2 s only. TDoA values derived from such a data set quite often exceeded one second, indicating that the signals could not be related to the same transmission. For this reason, it was decided to reduce the allowable timestamp difference to 500 ms. At RSs, the recording start timestamp is set with a step of 40 ms, which corresponds to a segment of 1600 samples at a sampling rate of 40 kHz. Possible timestamp differences are therefore a multiple of this interval.
For each set of voice data, a time difference between the latest and the earliest marker is determined. The distribution of these values is presented in
Figure 17.
The distribution was based on 6180 sets of voice data. As shown in
Figure 17, over 80% of cases are the scatter values of 0 ms and 40 ms, and almost 90% of the values do not exceed 80 ms. It can be seen that relatively small scatter is the most common, which indicates that the voice data is grouped correctly. Grouped records were used to estimate TDoA values, but we limited possible length of voice signals to a range between 2 and 6 s (80 to 240 thousand samples). Messages shorter than 2 s were rejected because they resulted in large width of main correlation peak due to limited signal-to-noise ratio after correlation. On the other side, messages longer than 6 s were rejected because commercial planes, in 6 s, can travel more than 1.5 km which significantly changes time delay between reception of beginning and end of transmission in ground stations; therefore, main correlation peak tends to increase width.
Tests for determining the TDoA value were performed in two stages. The first stage concerned the evaluation of the TDoA scatter of the signal emitted by the ground transmitter (ATC ground transmitter for ground-to-air communication in the approach phase at the airport in Gdansk) and received by a pair of RSs no. 3 and no. 5. Data from one selected observation day were analyzed. The transmissions from the ground station were identified through direct listening—52 messages were identified. Due to the unknown location of the transmitting station, it was not possible to determine the actual TDoA value; however, approximately constant values were to be expected.
In the first approach, TDoA values of signals from ground ATC transmitter were calculated using standard correlation in time domain. Obtained results were oscillating around 190 μs, but they had a large scatter—the minimum value was 168 μs, and the maximum was 203 μs. The standard deviation was 5.3 μs. A few times smaller scatter was expected; therefore, modifications were made at the software code level for TDoA determination. These modifications included:
Introduction of filtration to eliminate disturbances outside the speech signal band;
Adding detection and elimination of the beginning and end of the transmission, containing noise related to switching on and off the transmission;
Generalized correlation estimated in frequency domain instead of standard correlation in time domain.
After making the above changes, the scatter of estimated TDoA values decreased significantly. The variability of these values is shown in
Figure 18. This time, a standard deviation of approximately 1 µs was obtained, and the difference between the maximum and minimum TDoA values did not exceed 5 µs. These are the values to be expected with the good quality transmission of signals received by radio stations.
The second stage of verifying the correctness of TDoA determination was to compare the TDoA values calculated by the server with the values corresponding to the actual positions of the aircraft. Information on reference (actual) position was obtained from the ADS–B system data stored in the central server database.
Data from nearly one month were analyzed. Unfortunately, these tests were conducted after COVID lockdown so the number of flights in the supervised area was significantly reduced and so the amount of data for system performance evaluation was also limited. First, sets of TDoA values related to the same voice transmission were mapped to the ADS–B data as follows:
Based on the position from the ADS–B system for a given second (the same as the TDoA timestamp) and the coordinates of the position of the RS, the reference TDoA values were calculated;
Then the ADS–B position for which the estimated TDoA values were most similar to the reference ones was selected;
An aircraft callsign associated with the selected ADS–B position was searched in the database.
The initial linkage of the TDoA data to ADS–B was further verified by determining whether the aircraft callsign provided in the voice message matched that contained in the ADS–B data. TDoA and ADS–B data set pairs where the callsign was not compliant or could not be verified were rejected. For the remaining data, the RMSE for TDoA was determined
where
N is the number of the radio station pair between which TDoA is determined,
TDoAest is the value estimated by the central server, and
TDoAADS-B is the value determined based on the position from ADS–B.
The TDoA error histogram is presented in
Figure 19. It was determined on the basis of 2078 pairs of TDoA–ADS–B data sets, where the RMSE did not exceed 300 μs. As may be seen, the most common RMSE TDoA is between 8 μs and 11 μs. Based on this data, the mean TDoA error in each pair of RSs was calculated. This was to determine whether the TDoA error was systematic and whether it could be compensated by adding corrections to the timestamps. The presence of TDoA errors limited to a few microseconds is fully justified by the different delays of signals in filters inside VHF AM receivers, especially 2nd IF filters. These systematic time measurement errors may be compensated by system calibration.
Table 2 gives the values of the mean error of TDoA estimation. Ideally, the error determined for the station pair
M,N should be the opposite of the error for the station pair
N,M. Results presented in
Table 2 were grouped depending on which radio station assigned earlier timestamp to the recorded stream of samples.
Those cells are marked in green where the discrepancy between errors, after changing the order of stations, in terms of the absolute value, does not exceed 1 μs. Yellow indicates a discrepancy of 1 to 2 μs, orange—between 2 and 3 μs—and red—more than 3 μs. As can be seen, in the case of 7 out of 10 pairs of reference stations, there is a clear error symmetry (the discrepancy does not exceed 2 μs). This indicates that the TDoA error may be systematic and it is desirable to try to compensate for it.
To determine the timestamp corrections for individual RS, based on the values in
Table 2, a system of linear equations was created in the following matrix form
Ax =
bIn matrix
A, the value 1 is entered in the column corresponding to the reference radio station index and the value -1 in the column corresponding to the index of the second RS station in the pair. The rest of the matrix is zero. The elements of the vector
x are the desired timestamp corrections for RS from no. 1 to no. 5 and vector b is created from corresponding TDoA error values from
Table 2. The system of equations has been solved by the least squares method
where
A+ is a pseudo-inverse matrix to
A (pinv() function in MATLAB environment). The correction values determined in this way are included in
Table 3.
The accuracy of aircraft position estimation by the technology demonstrator was determined by comparing it with the positions reported in the ADS–B system messages transmitted from ADS–B equipment onboard aircraft. Information about the current position of the aircraft in ADS–B messages comes from the onboard GPS receiver. Thus, the accuracy of the aircraft position used as reference data in our investigation is determined by the class of the GPS receiver used in the aircraft, method of data processing in ADS–B transmitter, and probably also delay between position estimation and message transmission. It results from [
32,
33] that the accuracy of the position reported by the ADS–B system in the horizontal plane cannot be worse than 0.05 nautical mile, that is approximately 90 m. Thus, it is at least an order of magnitude better than the tested VCS-MLAT system.
The basic analysis performed was the estimation of the error in the horizontal plane related to the longitude and latitude of the localized object. The results for the two position estimation algorithms, Foy and SI, were obtained based on data from the two-month observation period.
Figure 20 and
Figure 21 show the cumulative distribution function (CDF) of the 2D position estimation error (horizontally) for the two above-mentioned algorithms. There are two curves in each figure—the red one indicates the result after applying timestamp corrections, while the blue line is for the case without the use of time corrections. It can be seen from these figures that corrected results of position estimation are worse than results of simulations taken with assumption that TDOA errors are limited to ±2 μs (
Figure 5 and
Figure 6). However, simulation results obtained for TDOA errors in range ±5 μs (
Figure 7 and
Figure 8) are in quite good agreement with cumulative distribution charts of position estimation error from real field measurements.
It should be added that the results of aircraft position measurements obtained from the system tests in real conditions did not undergo any additional processing. The inability to automatically identify the aircraft on the basis of the recorded voice signals does not allow combining the results of subsequent measurements and applying averaging or filtering the position; therefore, each result of position estimation must be treated separately.
The results clearly indicate that the SI algorithm shows a significantly greater error in the position estimation compared to the Foy algorithm. This effect may be caused by the non-Gaussian distribution of errors in measured TDoA values and possibly correlation between measurement errors while least-squares algorithms expect input data to have uncorrelated Gaussian error characteristics. Additional degradation in position calculation quality may occur with an overdetermined set of equations where signals are received simultaneously by more than three ground receivers. Inconsistent measurement errors, in this case, emphasize the differences between different position calculation algorithms. It is also worth mentioning that TDoA measurements were made in a 3D environment, and the position estimation was limited to the 2D case only. In this case, a nonlinear transformation of the error distribution takes place, which may affect the positioning algorithms in different ways. Future work may include estimating the error distribution and optimizing the position computation algorithm for voice-based multilateration.
Although presentation of position estimation results on a map may provide useful information, e.g., on distribution of errors and systematic shifts, we found that plotting over 6000 processed records on one map makes it unreadable. Instead, we present in
Figure 22 one example of aircraft position estimation using recorded voice communication in VHF band (red dot) together with true coordinates from the ADS–B system of all aircraft which were present at the same time in controlled airspace around Gdansk airport (blue dots). In this example the position estimation error was equal to 3.7 km. However, even such large error is smaller than minimal aircraft separation defined for this part of airspace (5 km) and it is clearly visible from the map which aircraft transmitted voice messages. Therefore, our solution may be used for quick verification of transmitting aircraft even when pilots skip aircraft callsigns in their transmission.