1. Introduction
Global Navigation Satellite Systems (GNSS) provide all-day, all-weather Positioning, Navigation, and Timing (PNT) services to users worldwide, and their positioning errors do not accumulate over time [1,2]. However, in complex urban scenarios satellite signals are frequently blocked and receivers may even lose lock, so reliable positioning cannot be guaranteed [3]. An Inertial Navigation System (INS) is highly autonomous and resistant to interference, and it delivers high-precision navigation and positioning results over short periods [4]. However, INS errors accumulate over time, and a long-term standalone solution suffers from reduced accuracy or even divergence [5]. GNSS and INS are therefore highly complementary, and integrating the two systems can effectively overcome the adverse effects of harsh environments [6]. As a result, GNSS/INS integrated navigation is widely used in naturalistic driving, high-precision vehicle navigation, Intelligent Transportation Systems (ITS), and autonomous driving [7,8,9,10]. According to the integration method, GNSS/INS integrated navigation can be divided into loosely coupled, tightly coupled, and deeply coupled schemes. Among them, the tightly coupled GNSS/INS system can maintain continuous positioning even when too few satellites are visible, and it has a comparatively simple integration structure that is easier to implement, which has been widely recognized by scholars [11,12,13].
In vehicle dynamic positioning applications, a continuous, reliable, and high-precision positioning method is urgently required. Balsa-Barreiro (2013) implemented an innovative methodology based on vehicle speed for geo-referencing naturalistic driving data, which can overcome the problems related to the lack of positioning data [14]. However, this method requires extra geographic information. In addition, some scholars combine GNSS with multiple sensors, which integrates the advantages of each sensor and yields continuous and reliable positioning information in urban scenes [15,16,17]. However, multi-sensor systems are costly, their weights are difficult to determine, and the computational load is large. Therefore, this paper chooses GNSS/INS tightly coupled positioning as the method to obtain the vehicle position in urban scenarios. Nevertheless, the performance of GNSS/INS tightly coupled positioning is inevitably degraded by various outliers in complex urban scenarios [18]. For the GNSS receiver, the multipath effect and non-line-of-sight (NLOS) signals are the most common adverse factors restricting GNSS positioning performance, which in turn degrades the positioning capability of the GNSS/INS integrated system [19,20]. The multipath effect occurs when a signal travels from the satellite to the receiver antenna through multiple paths, that is, through the direct path plus one or more reflected paths. Multipath interference can also affect the phase detection characteristics of the receiver tracking loop, resulting in tracking and measurement errors [21]. NLOS signal reception occurs when the direct path from the satellite to the receiver is blocked and the signal can only be received through a reflected path [22]. Owing to its relatively low cost and its ability to provide an accurate time reference and absolute coordinate information, GNSS is still an important and preferred technical method in the field of high-precision navigation and location services. Unfortunately, the degradation of GNSS positioning accuracy may negatively affect the performance of the entire system [23]. The errors of standalone GNSS positioning therefore need to be mitigated with innovative signal-processing methods in order to improve the performance of GNSS/INS integration.
Accurate multipath/NLOS detection and subsequent processing are the basis for signal quality control and positioning strategy optimization in the GNSS/INS integrated navigation system [24]. With the rapid development of artificial intelligence, more and more scholars have begun to employ machine-learning or deep-learning methods to identify multipath/NLOS signals. These methods construct a mapping between multiple feature parameters and GNSS signal categories, which reduces the operating cost of traditional methods and improves the availability of the algorithm, and they have achieved outstanding results [25,26,27,28]. These approaches are classifiers based on supervised learning, so the training samples must be labeled in advance, and the accuracy of the labels directly determines the performance of the classifiers. In practical applications, it is challenging and expensive to obtain an accurately labeled dataset that covers multiple scenarios and represents all signal types. To avoid these limitations, scholars have begun to use unsupervised learning techniques for signal classification, which can fully exploit the information contained in the observation data itself [29,30,31,32]. Compared with supervised multipath/NLOS detection algorithms, unsupervised techniques offer better availability and environmental applicability. Furthermore, the positioning accuracy after excluding contaminated GNSS satellites is significantly better than that of the traditional threshold method and the classic receiver autonomous integrity monitoring (RAIM) algorithm.
Because multipath/NLOS signals restrict the dynamic positioning accuracy and reliability not only of standalone GNSS positioning but also of GNSS/INS integrated systems, multipath/NLOS detection and mitigation methods for GNSS/INS integrated systems have been developed in recent years [33,34,35]. However, these technologies depend on geographic information data or external sensor equipment, such as 3D building models, cameras, and LiDAR, which have shortcomings in terms of availability, cost, and security. In addition, the equivalent weight model has been employed to construct robust algorithms that weaken the influence of gross observation errors on the positioning accuracy [36,37]. This method uses a robust factor to adjust the filter gain matrix or the observation noise in GNSS/INS integrated positioning, which suppresses multipath and NLOS errors to a certain extent. However, robust estimation has difficulty handling multiple outliers in the same epoch and relies heavily on the correctness of the robust model. Therefore, when the original observations are seriously contaminated by multipath/NLOS signals in harsh environments, the reliability of the algorithm cannot be guaranteed.
This paper aims to further improve the accuracy of GNSS/INS tightly coupled positioning by using unsupervised techniques to detect multipath/NLOS signals. In an offline system, a clustering algorithm is used to label GNSS data as normal or abnormal observations, the latter mainly caused by multipath/NLOS signals. The clustering criteria obtained from the offline training dataset are then applied to detect multipath/NLOS signals in online data, which can enhance the performance of real-time GNSS/INS positioning. Additionally, the approach offers a new perspective for research on GNSS signal quality control in vehicle positioning systems operating in highly complex urban areas.
The rest of this paper is organized as follows. Section 2 presents the mathematical methodology for K-means clustering and the GNSS/INS tightly coupled positioning algorithm. Section 3 describes the data collection and experimental analysis, which validate the accuracy and reliability improvements of the proposed method. Finally, the discussion is given in Section 4, and the conclusions and outlook are given in Section 5.
2. Methodology
2.1. Feature Extraction
Reasonable feature values are critical to the capability of machine-learning algorithms, and this paper draws on the feature parameters used for multipath/NLOS signal detection in supervised learning classifiers. Most current machine-learning methods for multipath/NLOS detection adopt features at the observation data level. We extract the feature parameters only from the RINEX file output by the GNSS receiver, including pseudorange, carrier phase, carrier-to-noise ratio (signal strength), and Doppler frequency shift, all of which are closely related to the GNSS signal type. However, no single feature can classify GNSS signals effectively. Hence, a combination of different features is needed to improve the classification accuracy [22,38,39,40].
(1) Satellite elevation angle: Weighting each observation according to the satellite elevation angle is a common way to reduce the influence of multipath and NLOS reception on the positioning results. Generally speaking, satellite signals from high elevation angles are less likely to be blocked or reflected by buildings, but this is not always the case. Owing to the height and distribution of buildings in urban areas, signals at high elevation angles may also be NLOS signals, while signals at low elevation angles may be direct signals. Nonetheless, the satellite elevation angle is still an important feature for distinguishing NLOS signals.
(2) Carrier-to-noise ratio: The GNSS receiver outputs the signal strength of each tracked satellite. According to signal propagation theory, additional propagation distance and reflection increase the path loss of the GNSS signal. As an important indicator of signal quality, the C/N0 observation is also a common parameter for mitigating the multipath effect. Like the elevation angle, the carrier-to-noise ratio corresponds to the signal type to a certain degree: the signal strength received by a survey antenna is usually higher in an open environment. However, the magnitude of C/N0 has no clear correspondence with the type of GNSS signal in a multipath environment, because constructive multipath increases the received signal strength, while destructive multipath reduces it.
(3) Pseudorange residual: When there are more observation equations than unknown parameters and the position estimate is accurate enough, the magnitude of the pseudorange residual reflects the inconsistency between the pseudorange measurement and the geometric distance to the satellite. In addition, multi-constellation GNSS positioning increases the number of available satellites and the observation redundancy. Therefore, the pseudorange residual can be used as an indicator of GNSS signal quality.
(4) Pseudorange rate consistency: The pseudorange observations originate from the receiver code tracking loop, while the Doppler shift of the signal is determined by the receiver frequency tracking loop. Multipath/NLOS signals have less influence on the frequency tracking loop than on the code tracking loop, so the consistency between the pseudorange change rate and the Doppler frequency shift reflects the degree of interference from reflected signals. It is expressed as:
$$\zeta = \left| \frac{\Delta\rho}{\Delta t} - \dot{\rho} \right|$$
where $\Delta\rho$ and $\Delta t$ represent the pseudorange variation and the time interval between adjacent epochs, respectively. According to the Doppler effect, the pseudorange rate $\dot{\rho}$ is calculated from the Doppler shift:
$$\dot{\rho} = -\lambda_i D_i$$
where $\lambda_i$ and $D_i$ indicate the wavelength of frequency i and the Doppler shift in Hz, respectively.
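The pseudorange rate consistency check can be illustrated with a minimal Python sketch. The function name, the argument names, and the numerical values are hypothetical, and the sign convention (a positive Doppler shift means an approaching satellite, as in RINEX observation files) is an assumption that should be checked against the receiver in use.

```python
C_LIGHT = 299_792_458.0   # speed of light in m/s
GPS_L1_HZ = 1_575.42e6    # GPS L1 carrier frequency in Hz

def pseudorange_rate_consistency(rho_prev, rho_curr, dt, doppler_hz, freq_hz):
    """Absolute gap (m/s) between the code-derived and Doppler-derived range rates."""
    wavelength = C_LIGHT / freq_hz            # carrier wavelength in metres
    code_rate = (rho_curr - rho_prev) / dt    # pseudorange change rate between epochs
    # Assumed sign convention: positive Doppler means a decreasing range.
    doppler_rate = -wavelength * doppler_hz
    return abs(code_rate - doppler_rate)

# Hypothetical clean signal: the range shrinks by 500 m in 1 s and the Doppler agrees.
doppler = 500.0 / (C_LIGHT / GPS_L1_HZ)
clean = pseudorange_rate_consistency(21_000_500.0, 21_000_000.0, 1.0, doppler, GPS_L1_HZ)

# Hypothetical reflected signal: the code measurement carries an extra 15 m jump.
reflected = pseudorange_rate_consistency(21_000_500.0, 21_000_015.0, 1.0, doppler, GPS_L1_HZ)
```

In this toy case the clean observation gives a value close to zero, while the 15 m code jump shows up directly as a consistency value of about 15 m/s.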
Since each of the above features is uncertain and the features are intertwined with one another for NLOS signals, no single feature can classify GNSS signals effectively, and NLOS signals must be identified from a combination of features. This paper therefore selects the above four parameters to form the feature vector for cluster analysis. The data are then standardized so that the different units and scales of the features do not bias the clustering results, that is, each feature has zero mean and unit variance after processing.
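The standardization step can be sketched as a per-feature z-score. The helper name `standardize` and the sample values below are hypothetical; the sketch also assumes the population standard deviation is used, which the paper does not specify.

```python
from statistics import fmean, pstdev

def standardize(samples):
    """Z-score each feature column of a list of equal-length feature rows."""
    columns = list(zip(*samples))
    means = [fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in samples]

# Hypothetical rows: elevation (deg), C/N0 (dB-Hz), residual (m), rate consistency (m/s)
raw = [
    [65.0, 47.0, 0.8, 0.1],
    [20.0, 31.0, 12.5, 2.3],
    [42.0, 40.0, 3.1, 0.6],
    [15.0, 28.0, 20.4, 3.9],
]
scaled = standardize(raw)
```

After this step every column has zero mean and unit standard deviation, so a feature measured in degrees no longer dominates one measured in metres when Euclidean distances are computed.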
2.2. K-Means Clustering Algorithm and Its Evaluation Indicator
GNSS signals in complex environments are generally divided into two main types, direct signals and indirect signals (including multipath and NLOS signals), and each type has a certain internal relationship with the above four feature parameters. Based on this characteristic, the K-means algorithm is used for signal clustering: each sample is assigned to the class whose cluster center is closest to it.
For a given sample set $D = \{x_1, x_2, \ldots, x_m\}$, where $x_i$ represents the standardized feature vector of satellite elevation angle, carrier-to-noise ratio, pseudorange residual, and pseudorange rate consistency, this paper assigns corresponding weights to the feature parameters based on experience [22]. The K-means algorithm divides the samples into k clusters $C = \{C_1, C_2, \ldots, C_k\}$ so that the sum of squared Euclidean distances from each data point to its nearest cluster center is minimized, namely:
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \left\| x - \mu_i \right\|_2^2$$
where $\mu_i$ is the mean vector of the cluster $C_i$, and can be expressed as:
$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
The basic process of the K-means algorithm is as follows [41]:
(1) Randomly select k samples as the initial cluster centers;
(2) Assign each remaining sample to the cluster whose center is nearest to it;
(3) For each cluster, calculate the mean of all its samples as the new cluster center;
(4) Repeat steps (2) and (3) until the cluster centers no longer change.
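Steps (1) to (4) can be sketched in plain Python as follows. This is an illustrative implementation rather than the authors' code: the function names, the fixed random seed, the iteration cap, and the toy two-dimensional data are all assumptions, and feature weighting is omitted.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(samples, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step (1): pick k distinct samples as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = []
    for _ in range(max_iter):
        # Step (2): assign every sample to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(s, centers[c])) for s in samples]
        # Step (3): recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        # Step (4): stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical standardized samples forming two loose groups.
points = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, labels = kmeans(points, k=2)
```

Because the initial centers are drawn at random, the numbering of the clusters is arbitrary; only the grouping itself is meaningful.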
Based on the above calculations, all GNSS signals are classified into different clusters. The K-means algorithm requires the value of k to be specified in advance, and it is usually set to 2 or 3 by default. However, owing to the complexity of the scenario and the correlation between GNSS signals, 2 and 3 are often not the optimal k values. Additionally, the difference between the clustering results corresponding to different k values is not obvious. Therefore, this paper chooses the Davies–Bouldin Index (DBI) as the internal evaluation index of the clustering effect [42].
The DBI is defined as the average similarity between each cluster $C_i$, $i = 1, \ldots, k$, and its most similar cluster $C_j$, where the similarity is expressed as the ratio of the intra-cluster distance to the inter-cluster distance. The minimum value of the DBI is 0, and the smaller the value, the better the clustering effect. It is calculated as:
$$\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij}$$
where k is the number of clusters, and $R_{ij}$ denotes the similarity, which can be constructed by the following simple choice so that it is nonnegative and symmetric:
$$R_{ij} = \frac{s_i + s_j}{d_{ij}}$$
where $s_i$ and $s_j$ are the average distances between each point of cluster i or j and the centroid of that cluster, also known as the cluster diameter, and $d_{ij}$ is the distance between the centroids of clusters i and j, which represents the degree of separation between the two clusters.
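The DBI described above can be computed with a short Python sketch. The function name and the toy data are hypothetical, and the sketch assumes at least two non-empty clusters with distinct centroids.

```python
from math import dist

def davies_bouldin(samples, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case similarity."""
    clusters = sorted(set(labels))
    groups = {c: [s for s, lab in zip(samples, labels) if lab == c] for c in clusters}
    # Centroid of each cluster.
    centroids = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    # Average distance of the members to their own centroid (cluster diameter).
    scatter = {c: sum(dist(s, centroids[c]) for s in g) / len(g) for c, g in groups.items()}
    total = 0.0
    for i in clusters:
        # Similarity to the most similar other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

# Hypothetical data: the same four points with a compact and a poor labelling.
pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
dbi_good = davies_bouldin(pts, [0, 0, 1, 1])   # clusters separated along x
dbi_bad = davies_bouldin(pts, [0, 1, 0, 1])    # clusters stretched across x
```

The compact labelling scores 0.2 and the stretched one scores 5.0, which matches the rule that a smaller DBI indicates a better clustering. In practice the index would be evaluated for several candidate k values and the k with the smallest DBI retained.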
2.3. GNSS/INS Tightly Coupled Positioning Model
The GNSS/INS integrated navigation system adopts an Extended Kalman Filter (EKF) for data fusion, and achieves high-precision navigation and positioning in complex urban areas by effectively detecting and rejecting multipath/NLOS signals. In vehicle navigation, the pseudorange/INS system is highly reliable and has good real-time performance, so this paper employs an integrated positioning solution based on GNSS double-differenced pseudorange (DGNSS) and INS observations.
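The system state model depends on the INS error model and on the description of the inertial sensor errors. The INS error equations based on the psi angle are adopted in this paper [43]:
$$\begin{aligned}
\delta\dot{r}^n &= -\omega_{en}^n \times \delta r^n + \delta v^n \\
\delta\dot{v}^n &= -\left(2\omega_{ie}^n + \omega_{en}^n\right) \times \delta v^n + f^n \times \delta\psi + C_b^n\, \delta f^b \\
\delta\dot{\psi} &= -\left(\omega_{ie}^n + \omega_{en}^n\right) \times \delta\psi - C_b^n\, \delta\omega_{ib}^b
\end{aligned}$$
where $\delta r^n$, $\delta v^n$, and $\delta\psi$ indicate the position error, velocity error, and attitude angle error, respectively; $C_b^n$ is the rotation matrix from the body frame (b-frame) to the navigation frame (n-frame); $\delta f^b$ and $\delta\omega_{ib}^b$ are the accelerometer and gyroscope error vectors in the b-frame, respectively; in addition, the specific force vector measured by the accelerometer, the rotation rate of the Earth, and the transfer rate are represented by $f^n$, $\omega_{ie}^n$, and $\omega_{en}^n$, respectively.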
The accelerometer and gyroscope errors are the main factors affecting the accuracy of the GNSS/INS tightly coupled system, and the bias errors are modeled as random walk processes. Their specific forms can be expressed as:
$$\dot{b}_a = w_a, \qquad \dot{b}_g = w_g$$
where $b_a$ is the bias of the accelerometer; $b_g$ is the bias of the gyroscope; and $w_a$ and $w_g$ are the corresponding white noise terms.
The equation of the state of the system is as follows:
where
is the state transition matrix;
and
represent the dynamic noise matrix and the noise vector;
is the state parameter,
.
Based on the INS error model,
and
are optimized and presented as follows:
where
indicates the unit matrix;
and
represent state coefficients of position;
and
represent coefficients of velocity;
and
represent coefficients of attitude. The specific derivation of the above symbols can be found in [33].
The difference between the satellite-to-ground distance predicted by the INS and the GNSS double-differenced pseudorange is used as the EKF measurement to achieve high-precision positioning with the GNSS/INS tightly coupled system. The measurement equation is written in matrix form:
where
represents the measurement vector at time epoch
k,
, and
indicates the satellite-to-ground distance predicted by INS; “*” denotes the satellite system, which is GPS or BDS in this paper;
is the measurement model coefficient matrix;
and
are the pseudorange observation noise and INS observation noise, respectively.
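The final GNSS/INS tightly coupled positioning results can then be obtained with the following EKF procedure.
Prediction stage:
$$\hat{x}_{k,k-1} = \Phi_{k,k-1}\, \hat{x}_{k-1}$$
$$P_{k,k-1} = \Phi_{k,k-1}\, P_{k-1}\, \Phi_{k,k-1}^{T} + Q_{k-1}$$
Update stage:
$$K_k = P_{k,k-1} H_k^{T} \left( H_k P_{k,k-1} H_k^{T} + R_k \right)^{-1}$$
$$\hat{x}_k = \hat{x}_{k,k-1} + K_k \left( z_k - H_k \hat{x}_{k,k-1} \right)$$
$$P_k = \left( I - K_k H_k \right) P_{k,k-1}$$
where $\hat{x}_k$, $\Phi_{k,k-1}$, and $P_k$ express the state vector estimate, the state transition matrix, and the error covariance matrix at time epoch k, respectively; $Q_{k-1}$ represents the system noise covariance matrix at time epoch k−1; $R_k$ indicates the measurement noise covariance matrix at time epoch k; $H_k$ denotes the measurement matrix at time epoch k; $z_k$ is the measurement vector at time epoch k; $K_k$ represents the EKF gain matrix at time epoch k; and $I$ is the unit matrix. In addition, the subscript k,k−1 denotes propagation of the corresponding matrix or vector from time epoch k−1 to k.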
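The prediction and update stages of a Kalman filter can be sketched in plain Python as below. This is a deliberately small illustration, not the paper's filter: it assumes a two-state toy model (position error and velocity error) and a single scalar measurement, so the matrix inverse in the gain reduces to a division. The helper names and all numerical values are hypothetical, and correlated double-differenced pseudoranges would require the full matrix form.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def predict(x, P, Phi, Q):
    """Propagate the state (n x 1) and covariance (n x n) from epoch k-1 to k."""
    x_pred = matmul(Phi, x)
    P_pred = add(matmul(matmul(Phi, P), transpose(Phi)), Q)
    return x_pred, P_pred

def update_scalar(x_pred, P_pred, h, z, r):
    """Update with one scalar measurement z = h x + noise of variance r."""
    H = [h]                                   # 1 x n measurement matrix
    PHt = matmul(P_pred, transpose(H))        # n x 1
    s = matmul(H, PHt)[0][0] + r              # innovation variance
    K = [[row[0] / s] for row in PHt]         # n x 1 gain
    innovation = z - matmul(H, x_pred)[0][0]
    x_new = [[xi[0] + ki[0] * innovation] for xi, ki in zip(x_pred, K)]
    n = len(x_pred)
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)] for i in range(n)]
    P_new = matmul(I_KH, P_pred)
    return x_new, P_new

# Hypothetical constant-velocity model with a 1 s step.
Phi = [[1.0, 1.0], [0.0, 1.0]]
Q = [[0.01, 0.0], [0.0, 0.01]]
x0 = [[0.0], [1.0]]
P0 = [[1.0, 0.0], [0.0, 1.0]]

x_pred, P_pred = predict(x0, P0, Phi, Q)
x_new, P_new = update_scalar(x_pred, P_pred, h=[1.0, 0.0], z=1.5, r=0.5)
```

The update pulls the predicted position toward the measurement and reduces the position variance, which is the behavior the gain matrix is designed to produce.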
2.4. Overview of the Proposed Method
The flowchart of the proposed method is shown in Figure 1. First, four essential features are extracted from the raw GNSS observation data, namely satellite elevation angle, carrier-to-noise ratio, pseudorange residual, and pseudorange rate consistency, which are used together to enhance the classification accuracy. Second, satellite signals received by GNSS receivers in complex scenarios are generally divided into two categories, direct and indirect signals, the latter including multipath and NLOS signals, and each type of signal has a certain internal relationship with the above four features. Based on this characteristic, this paper adopts the K-means clustering algorithm to cluster the signals and selects the DBI as the internal evaluation index of the clustering effect. Finally, the raw INS measurements and the GNSS double-differenced pseudorange observations are tightly integrated with an EKF, resulting in reliable and high-precision positioning results.
It should be pointed out that the base station GNSS receiver operates in an open environment with no surrounding obstructions, as shown in Figure 2. Additionally, a Trimble GNSS-Ti geodetic choke-ring antenna is installed. This paper therefore assumes that the base station receives no NLOS signals and that its multipath is well suppressed, so that it is small relative to that of the rover and does not affect the subsequent positioning solution. Consequently, multipath/NLOS signal detection is not required for the base station GNSS observations.
4. Discussion
Multipath effects and NLOS signals are the main factors restricting the accuracy and reliability of GNSS/INS positioning, especially in challenging environments such as urban canyons and tree-shaded areas. Given this interference, this paper proposes an outlier detection method, composed of an offline learning system and an online learning system, for GNSS/INS tightly coupled positioning in urban areas. We consider that GNSS signals in complex environments are generally divided into three categories: line-of-sight (LOS), multipath, and NLOS signals. Each type of signal has a certain internal relationship with the four feature parameters of satellite elevation angle, carrier-to-noise ratio, pseudorange residual, and pseudorange rate consistency. Based on this characteristic, the K-means algorithm is used for signal clustering, and each sample is assigned to the class whose cluster center is closest to it. In the offline system, the K-means clustering algorithm is employed to detect observation outliers and construct a labeled offline training set, without resorting to 3D building models or external sensors. On this basis, owing to the good scalability of the K-means clustering algorithm, the resulting model is then used to identify multipath/NLOS signals in the online observation dataset for real-time positioning.
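The online stage can be pictured as a nearest-center lookup against the cluster centers learned offline. The sketch below is only illustrative: the function name, the center values, and the choice of which cluster is flagged as abnormal are hypothetical, and it assumes the online features have been standardized with the same means and standard deviations as the offline training set.

```python
from math import dist

def classify_online(feature, centers, abnormal_clusters):
    """Return the nearest cluster index and whether that cluster is flagged as abnormal."""
    nearest = min(range(len(centers)), key=lambda c: dist(feature, centers[c]))
    return nearest, nearest in abnormal_clusters

# Hypothetical offline centers in the standardized feature space:
# (elevation, C/N0, pseudorange residual, pseudorange rate consistency)
offline_centers = [
    [0.8, 0.9, -0.7, -0.6],    # high elevation, strong signal, small errors
    [-1.1, -1.2, 1.3, 1.0],    # low elevation, weak signal, large errors
]
abnormal = {1}                 # assumed multipath/NLOS cluster

obs_a = classify_online([0.5, 0.7, -0.4, -0.5], offline_centers, abnormal)
obs_b = classify_online([-0.9, -1.0, 1.5, 0.8], offline_centers, abnormal)
```

Observations flagged in this way would be excluded from the measurement update of the tightly coupled filter for that epoch.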
As can be seen from Figure 11 and Figure 12, after GNSS observation outliers are excluded, the number of satellites participating in the position calculation decreases and the geometric dilution of precision (GDOP) value generally increases. Despite this, the positioning accuracy is improved. However, the continuity of the dynamic positioning results is as crucial as their accuracy. Although correct outlier detection and exclusion can effectively improve positioning performance, directly removing abnormal observations reduces the number of available GNSS satellites and weakens the satellite geometry. This degrades the positioning performance to a certain extent, especially when few satellites are available. Therefore, it is not advisable to pursue positioning accuracy blindly at the cost of losing a large number of originally valid epochs, although there is more room for optimizing signal selection when enough GNSS constellations are available.
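The effect of satellite exclusion on GDOP can be reproduced with a small Python sketch. It assumes a single constellation with one receiver clock unknown and computes GDOP as the square root of the trace of the inverse normal matrix; the function names and the azimuth/elevation values are hypothetical, and a degenerate geometry would make the inversion fail.

```python
from math import cos, sin, radians, sqrt

def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gdop(sats):
    """GDOP from (azimuth_deg, elevation_deg) pairs in a local east-north-up frame."""
    G = []
    for az, el in sats:
        a, e = radians(az), radians(el)
        # Line-of-sight unit vector plus the receiver clock column.
        G.append([cos(e) * sin(a), cos(e) * cos(a), sin(e), 1.0])
    N = [[sum(row[i] * row[j] for row in G) for j in range(4)] for i in range(4)]
    Qm = invert(N)
    return sqrt(sum(Qm[i][i] for i in range(4)))

# Hypothetical sky plot; the last two satellites are assumed to be flagged as outliers.
all_sats = [(0, 80), (45, 30), (135, 25), (225, 35), (90, 55), (315, 20), (270, 60)]
kept_sats = all_sats[:5]

gdop_all = gdop(all_sats)
gdop_kept = gdop(kept_sats)
```

Removing observations can never lower the GDOP, so `gdop_kept` is larger than `gdop_all`; the accuracy gain reported in the paper comes from discarding biased measurements, not from better geometry.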
Although this paper has made some meaningful explorations in multipath/NLOS detection and elimination, the work still needs to be improved because of the complexity of multipath/NLOS signals. For example, more GNSS/INS observation data could be used to establish the offline labeled datasets, so that the training set covers more scenarios and satellite constellations and the generalization ability of the classification model improves. In addition, a more suitable detection method for observation outliers, based on different assumptions about the anomaly distribution, should be pursued while maintaining the positioning accuracy. We have made preparations for this and will study it further in the future.
5. Conclusions
GNSS/INS integrated navigation has excellent characteristics and therefore plays a significant role in meeting vehicle positioning requirements. However, the performance of GNSS/INS integration suffers from frequent unexpected GNSS outliers, such as multipath/NLOS signals, in dense urban areas.
This paper puts forward a K-means-based algorithm for detecting multipath/NLOS observations in urban vehicle GNSS data, which can effectively improve the accuracy of GNSS/INS tightly coupled positioning results. The method is essentially an offline learning system that can be used for post-processing GNSS/INS observation data. At the same time, K-means is employed to detect observation outliers and obtain LOS/NLOS classification rules, which can be extended to real-time vehicle positioning with GNSS/INS integrated navigation. The proposed method obtains the signal type labels by fully exploiting the information in the GNSS observation data itself, without the assistance of external software or hardware. Based on the good scalability of the K-means clustering algorithm, the model is used to identify multipath/NLOS signals in online observation data for real-time positioning. As a result, it can effectively enhance the performance of the GNSS/INS tightly coupled system with higher availability and environmental adaptability.
In future work, we will continue to investigate how the GNSS signal distribution patterns in different scenarios and test data affect the positioning performance of the GNSS/INS tightly coupled system, and we will study a more robust rule for determining the outlier boundary. Additionally, when the number of visible satellites is relatively small, simply excluding multipath/NLOS signals deteriorates the satellite geometry, which reduces the positioning accuracy or even makes the positioning solution impossible. Therefore, future research will also consider more reasonable multipath/NLOS processing strategies, such as optimizing the stochastic model of the observation equation [45].