The fingerprint fusion positioning system based on wireless signal distribution characteristics consists of three main stages. First, to improve the accuracy of the Bayesian positioning algorithm, the distribution characteristics of the different signals are calculated. Second, indexes are obtained from regional and temporal information to access the designated fingerprint library, forming a multi-level fingerprint structure. Finally, multi-source data are used for the final localization according to the relationships between the different positioning results [9].
2.1. Distribution Properties of the Signal
The Bayesian positioning algorithm based on signal distribution characteristics must first collect the received signal data at each reference point in turn and build a fingerprint library. During online positioning, the probability that the target is located at each reference point is calculated from the signal received in real time, and the estimated coordinates of the target position are derived from these probabilities [10]. Both building the fingerprint model and calculating the probabilities require a signal distribution model. The traditional Bayesian algorithm usually uses a Gaussian distribution model, but the actual distribution of the signals can take many forms. If an unsuitable distribution model is selected, positioning performance is seriously degraded.
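The online Bayesian step described above can be sketched as follows. This is a minimal illustration, assuming the traditional Gaussian model mentioned in the text, independent signals across APs, and a uniform prior over reference points; the function name and array layout are hypothetical. The estimate is taken as the posterior-weighted mean of the reference-point coordinates, which is one common choice (taking the most probable reference point is another).

```python
import numpy as np

def bayesian_position(online_rssi, fp_mean, fp_std, ref_coords):
    """Posterior-weighted position estimate over reference points (sketch).

    online_rssi: (n_ap,) RSSI measured online, in dBm
    fp_mean, fp_std: (n_ref, n_ap) assumed Gaussian fingerprint parameters
    ref_coords: (n_ref, 2) reference-point coordinates
    """
    x = np.asarray(online_rssi, dtype=float)
    mu = np.asarray(fp_mean, dtype=float)
    sd = np.asarray(fp_std, dtype=float)
    # Log-likelihood under independent Gaussians per AP (constant 2*pi term dropped)
    log_lik = -0.5 * np.sum(((x - mu) / sd) ** 2 + 2.0 * np.log(sd), axis=1)
    # Uniform prior: posterior is proportional to the likelihood;
    # subtracting the maximum keeps exp() numerically stable
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    # Estimated coordinates: posterior-weighted mean of the reference points
    position = post @ np.asarray(ref_coords, dtype=float)
    return position, post
```

Replacing the Gaussian log-likelihood with one of the per-signal models below (polynomial, multimodal Gaussian, or Rayleigh) is what the distribution-aware variant changes.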
In this paper, extensive statistics were collected on the RSSI of Wi-Fi, FM, and DTMB signals. To reduce the influence of confounding factors on the statistical results, both the receiving position and the signal characteristic were held fixed during each observation. Here, the signal characteristic refers to a particular AP or signal frequency. Based on these observations, the distribution models of the three signals were finally determined as follows.
Figure 1 shows part of the statistical results for the Wi-Fi signal.
As shown in Figure 1, the distribution of the Wi-Fi signal is difficult to describe with known distribution models [11], so a polynomial model was chosen to fit the discrete Wi-Fi data. The assumed polynomial model is shown in Equation (1).
The parameters a0, a1, …, an can be estimated by the least squares method. In the offline stage, these parameters are stored in the position fingerprint library as the distribution features of the model. During positioning, the probability values corresponding to the different reference points are calculated by substituting the online data into the polynomial model.
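The offline fit and online evaluation can be sketched with NumPy's least-squares polynomial routines. This assumes Equation (1) has the form P(x) = a0 + a1·x + … + an·x^n, consistent with the parameters named above, and that the fit targets the relative frequency of each observed RSSI value; the function names, the default degree, and the probability floor are hypothetical choices.

```python
import numpy as np

def fit_poly_distribution(rssi_samples, degree=4):
    """Offline: least-squares polynomial fit to the empirical RSSI distribution."""
    samples = np.asarray(rssi_samples, dtype=float)
    values, counts = np.unique(samples, return_counts=True)
    freq = counts / counts.sum()  # relative frequency of each RSSI reading
    # np.polyfit returns coefficients highest power first: [an, ..., a1, a0]
    return np.polyfit(values, freq, degree)

def poly_probability(coeffs, rssi, floor=1e-6):
    """Online: evaluate the stored polynomial at the measured RSSI."""
    p = np.polyval(coeffs, rssi)
    # A fitted polynomial can dip below zero; clip to a small positive floor
    # (an assumption) so the value can still be used as a likelihood
    return np.maximum(p, floor)
```

The stored coefficient array is the per-reference-point fingerprint feature; keeping the degree low limits both storage and oscillation of the fitted curve between observed RSSI values.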
Figure 2 shows part of the statistical results for the FM signal.
As shown in Figure 2, the distribution of the FM signal can be described by a multimodal Gaussian model. A multimodal Gaussian curve can be regarded as the superposition of several unimodal Gaussian curves. Taking the bimodal Gaussian model as an example, its probability density function is shown in Equation (2).
In the above equation, a, b, and c are the model parameters. As the number of peaks increases, so does the number of parameters, and all of them must be stored in the database as the position fingerprint features of the multimodal Gaussian model. Therefore, to avoid wasting storage space, the number of peaks should be kept as low as possible while still meeting the positioning requirements. An m-peak Gaussian model can fit any distribution with at most m peaks, so the unimodal distribution can be regarded as a special case of the multimodal distribution.
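A bimodal fit of this kind can be sketched with SciPy's nonlinear least squares. The sketch assumes Equation (2) takes the common form f(x) = a1·exp(−((x − b1)/c1)²) + a2·exp(−((x − b2)/c2)²), which matches the a, b, c parameter names in the text but is not confirmed by it; `gauss2`, `fit_gauss2`, and the initial guess `p0` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2(x, a1, b1, c1, a2, b2, c2):
    """Assumed bimodal form: sum of two Gaussian terms with parameters a, b, c."""
    return (a1 * np.exp(-((x - b1) / c1) ** 2)
            + a2 * np.exp(-((x - b2) / c2) ** 2))

def fit_gauss2(rssi_values, freq, p0):
    """Least-squares fit of the six parameters stored as the FM fingerprint.

    p0 is an initial guess (e.g. taken from the two histogram peaks);
    nonlinear fits of sums of Gaussians are sensitive to it.
    """
    params, _cov = curve_fit(gauss2, rssi_values, freq, p0=p0, maxfev=10000)
    return params
```

Extending to m peaks adds three parameters per peak, which is the storage cost the text warns about.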
Notably, the distribution of the FM signal differs between time periods, as shown in Figure 3. However, tests of the FM signal in the same time period over three consecutive days showed that, although the fitted distribution parameters were not identical, the overall shape of the distribution did not change significantly and the number of peaks remained almost unchanged, as shown in Figure 4.
Figure 5 shows part of the statistical results for the DTMB signal.
As shown in Figure 5, a “trailing” phenomenon repeatedly appeared in the statistics of the received DTMB signal, and this distribution is closer to a Rayleigh distribution than to a Gaussian distribution. The Rayleigh distribution is commonly used to describe the time-varying envelope of a received signal composed of independent multipath components, or of a flat-fading signal [12]; its probability density function is shown in Equation (3).
The parameter of the Rayleigh distribution in this formula (its scale parameter) is the feature information that must be stored in the location fingerprint database. The variable x must be positive, but the received RSSI is expressed in dBm and is generally negative. Therefore, the data must be preprocessed before the distribution model is fitted: a bias parameter b is added, converting Equation (3) into the form of Equation (4).
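The biased Rayleigh model can be sketched as follows, assuming Equation (4) has the form f(x) = ((x + b)/σ²)·exp(−(x + b)²/(2σ²)) for x + b > 0, where σ is the scale parameter (the symbol is an assumption). With b fixed, the maximum-likelihood estimate of σ has the closed form σ² = Σ(xᵢ + b)²/(2N). Treating b as a fixed, user-chosen offset (for example, 100 dB to map typical dBm readings to positive values) is a hypothetical design choice; the function names are illustrative.

```python
import numpy as np

def fit_shifted_rayleigh(rssi_dbm, b):
    """MLE of the Rayleigh scale for y = rssi + b, with bias b fixed in advance."""
    y = np.asarray(rssi_dbm, dtype=float) + b
    if np.any(y <= 0):
        raise ValueError("bias b too small: shifted samples must be positive")
    # Closed-form maximum-likelihood estimate: sigma^2 = sum(y^2) / (2N)
    return float(np.sqrt(np.mean(y ** 2) / 2.0))

def shifted_rayleigh_pdf(rssi_dbm, sigma, b):
    """Density of the biased Rayleigh model; zero where rssi + b <= 0."""
    y = np.asarray(rssi_dbm, dtype=float) + b
    pdf = (y / sigma ** 2) * np.exp(-y ** 2 / (2.0 * sigma ** 2))
    return np.where(y > 0, pdf, 0.0)
```

Only σ (and the chosen b) needs to be stored per reference point, making this the most compact of the three fingerprint models.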
Unlike the FM signal, the DTMB signal showed no clear time-dependent differences in its distribution, so the fingerprint library was not segmented by time period for DTMB-based localization.
2.2. Multistage Fingerprint Structure
This paper adopts a multi-level structure for the location fingerprint database: during online positioning, indexes are used to access only part of the fingerprint data, which saves computing resources. First, for signals whose distribution varies over time (such as FM), the fingerprint data of different periods are labeled, and during positioning the time index is obtained directly from the current time.
In addition, when the positioning area covers a large physical space, each fingerprint match would require comparison against the fingerprint data of the entire positioning space [13], which wastes considerable resources. Therefore, before positioning, a pre-trained classifier is used to preliminarily identify the region in which the target is located and obtain a region index; the target fingerprint set is then searched using the region index together with the time index, which reduces the matching time of fingerprint localization. The multi-level fingerprint localization process is shown in Figure 6.
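The index lookup can be sketched as a dictionary keyed by (region, time period). The period boundaries, the "all" key used for time-invariant sources such as DTMB, and all names here are hypothetical; the paper does not specify how periods are delimited.

```python
from datetime import datetime

# Hypothetical time periods (hour ranges); not specified in the paper
PERIODS = {
    "night": range(0, 6),
    "morning": range(6, 12),
    "afternoon": range(12, 18),
    "evening": range(18, 24),
}

def time_index(ts: datetime) -> str:
    """Map a timestamp to its time-period label."""
    for name, hours in PERIODS.items():
        if ts.hour in hours:
            return name
    raise ValueError("hour not covered by any period")

def select_fingerprints(library: dict, region: str, ts: datetime, time_varying: bool):
    """Return only the fingerprint subset for (region index, time index).

    Sources whose distribution varies with time (e.g. FM) use the time index;
    time-invariant sources (e.g. DTMB) share a single "all" period.
    """
    period = time_index(ts) if time_varying else "all"
    return library[(region, period)]
```

Matching then runs only over the returned subset instead of the whole positioning space.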
Obtaining a region index does not require determining the exact location of the target, only its approximate area. In this paper, a support vector machine (SVM) was used for region classification. Using map information, the positioning space was divided into regions, each corresponding to one label. In the offline stage, the RSSI of the wireless signals is used as the input to train the region classifier.
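A region classifier of this kind can be sketched with scikit-learn's `SVC`. The RBF kernel, `C=10.0`, and the function names are illustrative assumptions, not settings reported in the paper; each training row is an RSSI vector labeled with its map region.

```python
import numpy as np
from sklearn.svm import SVC

def train_region_classifier(rssi_vectors, region_labels):
    """Offline: train an SVM that maps an RSSI vector to a region label."""
    # Kernel and C are illustrative choices; gamma="scale" adapts to feature variance
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(np.asarray(rssi_vectors, dtype=float), np.asarray(region_labels))
    return clf

def region_index(clf, online_rssi):
    """Online: predict the coarse region (the region index) for one RSSI vector."""
    x = np.asarray(online_rssi, dtype=float).reshape(1, -1)
    return clf.predict(x)[0]
```

Because only a coarse region is needed, a misclassification near a region boundary costs less than it would in fine-grained matching, though overlapping regions could further reduce that risk.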
2.3. Multi-Source Data Self-Decision Fusion Localization Algorithm
After the specified subset of fingerprint data is obtained, the final positioning stage begins. First, the main and auxiliary positioning sources are selected. In this paper, Wi-Fi is used as the main source, and FM and DTMB are used jointly as auxiliary sources to avoid the overcompensation that can arise from relying on a single auxiliary source.
In the positioning phase, a primary positioning result is obtained from the main positioning source, and the auxiliary positioning sources are also used for positioning. As a first step to reduce the influence of signal fluctuations on positioning accuracy, each auxiliary source is used to compute n positioning results within a short time. Because each round of data collection and positioning takes time, n should not be too large. After the n results of an auxiliary source are obtained, the sum of the distances from each result's coordinates to all the remaining results is calculated, as shown in Equation (5).
After the distance sum of each positioning result has been calculated, the results are sorted by distance sum in ascending order. The smaller the distance sum, the more reliable the positioning result; results with large distance sums are treated as invalid. A parameter k is set, the first k results in this ranking are placed in the positioning result set of the signal source, and each is assigned a weight according to its distance sum [14]. The weights are calculated as shown in Equation (6).
The weighted positioning coordinates of the auxiliary source are then obtained as shown in Equation (7).
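The three steps (distance sums, top-k selection, weighted coordinates) can be sketched as follows. The exact form of Equation (6) is not reproduced in this text, so the sketch assumes weights inversely proportional to each retained result's distance sum and normalized to 1; the `eps` guard and the function name are hypothetical.

```python
import numpy as np

def auxiliary_weighted_position(results, k, eps=1e-9):
    """Weighted coordinates of one auxiliary source from n quick fixes (sketch).

    results: (n, 2) positions computed in quick succession from one source
    k: number of most mutually consistent results to keep
    """
    pts = np.asarray(results, dtype=float)
    # Eq. (5): for each result, the sum of its distances to all other results
    # (the distance of a point to itself is zero, so including it is harmless)
    diff = pts[:, None, :] - pts[None, :, :]
    dist_sum = np.linalg.norm(diff, axis=2).sum(axis=1)
    # Rank ascending and keep the k results with the smallest distance sums
    keep = np.argsort(dist_sum)[:k]
    # Eq. (6), assumed form: weight inversely proportional to the distance sum;
    # eps avoids division by zero when all retained results coincide
    inv = 1.0 / (dist_sum[keep] + eps)
    w = inv / inv.sum()
    # Eq. (7): weighted coordinates of the auxiliary source
    return w @ pts[keep]
```

In the usage below, the outlier at (10, 10) has by far the largest distance sum and is discarded before weighting.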
The reliability of the auxiliary sources is then determined. Assume that the weighted localization results of auxiliary data sources a and b are $l_{aw}$ and $l_{bw}$, respectively; that their distances from the primary localization result are $d_{am}$ and $d_{bm}$, respectively; and that the distance between the two weighted auxiliary results is $d_{ab}$. A threshold is set manually, and different decisions are made depending on how these distances compare with it. Four situations are possible:
(1) Neither $d_{am}$ nor $d_{bm}$ exceeds the threshold. The positioning results of both auxiliary sources are close to the main positioning result, so auxiliary sources a and b can both compensate for the main positioning source and jointly determine the final positioning result.
(2) One of $d_{am}$ and $d_{bm}$ does not exceed the threshold, and the other does. Only one of the two auxiliary sources is close to the main positioning result. Assuming that auxiliary source a is the close one, the positioning result of source b is considered invalid, so only the main positioning source and auxiliary source a are used to decide the result.
(3) Both $d_{am}$ and $d_{bm}$ exceed the threshold, but $d_{ab}$ does not. The results of the auxiliary sources deviate from the main result but agree with each other. The main source can therefore be considered to have a large error while the auxiliary sources are reliable, and the two auxiliary sources take part in deciding the final fused location.
(4) $d_{am}$, $d_{bm}$, and $d_{ab}$ all exceed the threshold. Because the auxiliary sources disagree strongly, their reliability cannot be ensured, so only the coordinates of the main positioning source are taken as the global positioning result.
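The four-case decision can be sketched as a single function. The weights of Equation (8) are not reproduced in this text, so the sketch assumes equal weights among the sources retained in each case; in case (3) it uses only the two auxiliary results, which is one reading of the text. The function name and return convention (fused position plus case number) are hypothetical.

```python
import numpy as np

def fuse_positions(l_main, l_aw, l_bw, threshold):
    """Self-decision fusion of the main result and two weighted auxiliary results.

    Returns (fused_position, case_number). Equal weights among the retained
    sources are an assumption standing in for Equation (8).
    """
    l_main, l_aw, l_bw = (np.asarray(p, dtype=float) for p in (l_main, l_aw, l_bw))
    d_am = np.linalg.norm(l_aw - l_main)
    d_bm = np.linalg.norm(l_bw - l_main)
    d_ab = np.linalg.norm(l_aw - l_bw)
    if d_am <= threshold and d_bm <= threshold:
        used, case = [l_main, l_aw, l_bw], 1      # both auxiliaries agree with main
    elif d_am <= threshold or d_bm <= threshold:
        close = l_aw if d_am <= threshold else l_bw
        used, case = [l_main, close], 2           # drop the distant auxiliary
    elif d_ab <= threshold:
        used, case = [l_aw, l_bw], 3              # main deviates; auxiliaries agree
    else:
        used, case = [l_main], 4                  # auxiliaries unreliable
    return np.mean(used, axis=0), case
```

Replacing the equal-weight mean with distance-dependent weights would give a closer analogue of the automatic weight adjustment described next.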
In conclusion, before the multiple localization results are fused by weighting, the fusion weights are adjusted automatically according to the distance relationships between the localization results of the different signal sources. The multi-source data self-decision fusion positioning algorithm is expressed in Equation (8).