1. Introduction
With the rapid development of micro-electro-mechanical system (MEMS) sensors, smartphones are now equipped with an inertial measurement unit (IMU), a barometer, and a magnetometer, which provide new and inexpensive approaches to smartphone-based pedestrian positioning services. Such services have penetrated all aspects of people’s lives. However, because of the complex scenarios faced by the smartphone, obtaining a high-precision position from a smartphone is still a challenge. These complex scenarios include the diversity of smartphone carrying modes and pedestrian movement modes, which, together with the accuracy limitations of the smartphone’s built-in sensors, all affect the accuracy of pedestrian positioning. Contextual information is very important to a positioning system: it not only affects the types of available signals, but also provides additional information for positioning and an important basis for selecting positioning methods and fusion algorithms and for failure detection. Therefore, it is necessary to identify different scenarios and choose a coping strategy for each of them to obtain high-precision pedestrian positioning results.
At present, pedestrian motion mode recognition is mainly divided into two research directions. One is based on image processing technology, which converts the input image or video into feature vectors and then recognizes the motion mode [1,2]. However, it easily infringes on personal privacy and relies heavily on lighting conditions [3]. The other is based on various sensors, such as accelerometers, gyroscopes, gravimeters, and barometers: sensor data are collected, various features are extracted and classified, and various methods are then used to recognize the movement pattern. Machine learning methods are mostly used for sensor-based motion pattern recognition, such as support vector machine (SVM) [
4], k-nearest neighbor algorithm (KNN) [
3], Gaussian naive Bayes (GNB), and artificial neural network (ANN). The average recognition success rate can reach more than 80%. For example, Sun Bingyi et al. [
5] proposed a behavior recognition method based on the SC-HMM algorithm, which can classify up and down stairs and elevators with a classification accuracy of more than 80%. Jin-Shyan Lee et al. [
6] proposed a threshold-based algorithm that classifies the phone carrying mode from acceleration values, which is very simple and easy to implement. Qinglin Tian et al. [
7] proposed a finite state machine (FSM) to classify the smartphone mode with a classification accuracy of more than 89%.
Other scholars have tried to combine different models to improve recognition performance. Liu Bin et al. [
8] combined four typical methods: k-nearest neighbor algorithm, support vector machine, naive Bayesian network, and the AdaBoost algorithm based on a naive Bayesian network to create a human activity recognition model. The optimal human activity recognition model was obtained through model decision-making, and the accuracy reached 92%. Using support vector machine (SVM) and decision tree (DT), any combination of motion state and mobile phone postures could be successfully identified [
9] with an average success rate of 92.4%. Other scholars combined a convolutional neural network (CNN) and a long short-term memory (LSTM) network to recognize walking, sitting, and lying behaviors for wearable (tied-to-the-waist) devices, with a success rate of over 96% [10,11]. In addition, some scholars used other methods to realize motion pattern recognition, such as the hidden Markov model [
12,
13], the sensor data interception method based on last bit matching [
14], and the voting method [
15]. Some scholars have also studied the influence of window length on human motion pattern recognition, to choose the optimal window length [
16,
17].
Ichikawa et al. [
18] studied the ways people like to use mobile phones. The common locations of mobile phones include trouser pockets, clothing pockets, and the hand. Scholars have explored various methods to identify which of these locations the mobile phone is in. Yang et al. [
19] proposed PACP (Parameters Adjustment-Corresponding-to-smartphone position), a method that is independent of the smartphone mode. It uses the SVM (support vector machine) model to identify the smartphone mode with an accuracy rate of 91%. Deng et al. [
20] proposed to recognize the location of mobile phones based on accelerometer features, and tested the recognition results based on SVM, Bayesian network, and random forest. Noy et al. [
21] used KNN, decision tree, and XGBoost to test and compare, and showed that XGBoost has the best recognition success rate. Wang [
22] proposed a recognition method of the superimposed model, which combines the six models of AdaBoost, DT, KNN, LightGBM, SVM, and XGBoost to realize the location recognition of smartphones, and the recognition accuracy can reach 98.37%.
In general, the methods for scenario recognition mainly focus on machine learning methods, such as SVM, CNN, and KNN. These methods have low recognition accuracy when applied to raw data, and when applied to sensor features they depend strongly on feature selection. Fusing multiple models can improve the recognition accuracy, but it increases the computational complexity and cost and requires a large number of samples.
To solve this problem, we designed a DT-BP (decision tree-Bayesian probability) scenario recognition algorithm that uses a single decision tree model and a Bayesian state transition model and targets the motion mode and the smartphone mode. This method is simpler and less computationally expensive, and can obtain the same recognition accuracy as multi-model machine learning methods. The contributions of this study are as follows:
We designed a decoupling analysis method to analyze the relationship between different kinds of scenarios, so as to determine the identification order. Because different scenario categories interact with each other, they can adversely affect scenario recognition. Therefore, the decoupling relationship analysis method decouples the different scenario categories and determines the sequence in which the scenario types are identified;
We designed a DT-BP (decision tree-Bayesian probability) scenario recognition algorithm that uses a single decision tree model and a Bayesian state transition model and targets the motion mode and the smartphone mode. This method is simpler and less computationally expensive, and can obtain the same recognition accuracy as multi-model machine learning methods;
We designed the corresponding decision tree criteria and probability allocation method for smartphone mode and motion mode. We carried out experiments for each scenario and compared them with the methods in the references.
The rest of this paper is organized as follows: Section 2 introduces the presented algorithm, including the decoupling analysis, feature extraction, and the scenario recognition algorithm. Section 3 presents the experimental setup, results, and discussion. Finally, Section 4 concludes the paper.
2. Methodology
2.1. Decoupling Analysis of Scenario Category
It is necessary to analyze the decoupling relationship of different scenario categories to determine their independence and correlation. For example, if there are $n_k$ motion modes and $m_k$ smartphone modes, there are $n_k \times m_k$ kinds of situations when the two kinds of context are combined arbitrarily. Identifying all the combined scenarios is too complicated and redundant. Moreover, because different scenario categories interact with each other, they can adversely affect scenario recognition. Therefore, a decoupling relationship analysis method was designed to decouple the different scenario categories and determine the sequence of scenario type recognition.
The decoupling of smartphone modes and motion modes needs to be analyzed in three parts:
The correlation coefficient of the same motion mode in different smartphone modes;
The correlation coefficient of different motion modes in the same smartphone mode;
The correlation coefficient between different smartphone modes and different motion modes.
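We used Pearson’s correlation coefficient to analyze the decoupling of the data. The correlation coefficient is calculated as follows:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $n$ is the length of the data, and $x$ and $y$, with mean values $\bar{x}$ and $\bar{y}$, are two different time series.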
As the data sampling length differs between situations and the data contain the periodic behavior of pedestrian movement, it was necessary to establish a time-lag series [
23]. The sequence
, after moving forward and backward by m sampling points is:
If the time-shifted sequences are correlated, there must exist a shift m that maximizes their correlation coefficient.
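The time-lag search described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation; the function names `pearson` and `max_lagged_pearson` and the parameter `max_lag` are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    # A constant series has no defined correlation; report 0 by convention here.
    return 0.0 if denom == 0 else float(np.sum(dx * dy) / denom)

def max_lagged_pearson(x, y, max_lag):
    """Shift y by m samples for m in [-max_lag, max_lag] and return
    (best_r, best_m), the largest coefficient and the shift that gives it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_r, best_m = -np.inf, 0
    for m in range(-max_lag, max_lag + 1):
        if m >= 0:
            xs, ys = x[: len(x) - m], y[m:]      # pairs x[i] with y[i + m]
        else:
            xs, ys = x[-m:], y[: len(y) + m]     # pairs x[i + |m|] with y[i]
        k = min(len(xs), len(ys))
        if k < 2:
            continue
        r = pearson(xs[:k], ys[:k])
        if r > best_r:
            best_r, best_m = r, m
    return best_r, best_m
```

For two periodic signals that differ only by a delay, the search returns a coefficient close to 1 at the shift equal to that delay, which is why the shifted comparison is less sensitive to where each recording happens to start.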
To decouple different scenario categories, the following analysis method was used:
- (1)
To avoid dependence on feature selection, raw data were selected for data analysis;
To ensure a full analysis of different scenario categories, it was necessary to exclude other factors as far as possible, such as differences between pedestrians and between smartphone brands. Therefore, the window length $n$ shall cover the pedestrian movement cycle, which is generally 0.5–1.2 s. The window length $n$ is calculated as follows:
$$n = \left[\frac{T_w}{T_s}\right] \tag{3}$$
where $T_w$ is the window period, generally bigger than 2 s, $T_s$ is the sampling period, which depends on the smartphone’s brand and model, and $[\cdot]$ denotes rounding to an integer.
- (2)
To ensure the integrity of pedestrian motion cycle, the forward and backward sampling points m should be selected as:
- (3)
To ensure the analysis is not disturbed by abnormal data, the sliding window is used to calculate the correlation coefficient
r, which is
, and
N is the sampling length of data. The decoupling analysis correlation coefficient is
- (4)
To analyze the decoupling correlation between different scenario categories, we set
. Where
i and
u represent the smartphone mode,
j and
v represent the motion mode.
and
are two kinds of scenario. To obtain the analysis result we needed to analyze three situations as follows:
The test results are shown in
Table 1, which gives the correlation calculation results of a total of nine scenarios composed of three motion modes and three smartphone modes. The raw data include GNSS sensor, accelerometer, gyroscope, magnetometer, barometer, and Bluetooth.
According to the test results in
Table 1, the decoupling correlation can be summarized as follows:
From Formula (7), different motion modes show a certain correlation under the same smartphone mode. Across different smartphone modes, the correlation of the same motion mode is greater than 0.5. The correlation between different smartphone modes and different motion modes is very low, at less than 0.3. Therefore, smartphone modes have little influence on motion mode recognition, whereas motion modes have a great influence on smartphone mode recognition.
According to the above analysis, during scenario recognition we recognize the motion mode first and, once it is determined, recognize the smartphone mode.
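The windowed part of the decoupling analysis can be sketched as below. This is only an illustration under stated assumptions: the window length is taken as the window period divided by the sampling period and rounded, windows are non-overlapping unless a `step` is given, and how the per-window coefficients are combined into a single decoupling coefficient is left to the caller, because that formula is not reproduced here. All function and parameter names are hypothetical.

```python
import numpy as np

def window_length(window_period_s, sampling_period_s):
    """Number of samples in one analysis window (rounded to an integer)."""
    return int(round(window_period_s / sampling_period_s))

def sliding_correlations(x, y, n, step=None):
    """Pearson coefficient of x and y in each window of n samples.

    `step` is the hop between windows (assumed equal to n if omitted).
    Windows in which either series is constant are skipped, since the
    coefficient is undefined there.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = n if step is None else step
    total = min(len(x), len(y))
    coeffs = []
    for start in range(0, total - n + 1, step):
        xw = x[start:start + n]
        yw = y[start:start + n]
        if xw.std() == 0 or yw.std() == 0:
            continue
        coeffs.append(float(np.corrcoef(xw, yw)[0, 1]))
    return np.array(coeffs)
```

Working window by window limits the effect of a short burst of abnormal data to the windows it falls in, instead of letting it distort one coefficient computed over the whole recording.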
2.2. Feature Extraction
In this paper, we extracted features from different sensor data in both the time domain and the frequency domain. Time-domain extraction refers to computing mathematical-statistical characteristics of the sensor measurement data within a certain window length, such as the variance, mean, and amplitude. Frequency-domain extraction refers to calculating the Fourier transform and frequency-domain entropy of the sensor measurement data within a certain window length, from which features such as the dominant frequency, energy, and frequency difference are then extracted.
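As a rough illustration of these two feature families, the sketch below computes a few of the named features for one data window with NumPy. The exact definitions are assumptions made for the example (amplitude as peak-to-peak range, energy as mean spectral power, entropy as the Shannon entropy of the normalized power spectrum) and may differ from those used in this paper; the function names are hypothetical.

```python
import numpy as np

def time_domain_features(window):
    """Mean, variance, and peak-to-peak amplitude of one data window."""
    w = np.asarray(window, dtype=float)
    return {
        "mean": float(w.mean()),
        "variance": float(w.var()),
        "amplitude": float(w.max() - w.min()),
    }

def frequency_domain_features(window, fs):
    """Dominant frequency (Hz), energy, and spectral entropy of one window.

    `fs` is the sampling rate in Hz. The mean is removed first so that the
    constant (gravity or bias) component does not dominate the spectrum.
    """
    w = np.asarray(window, dtype=float)
    w = w - w.mean()
    power = np.abs(np.fft.rfft(w)) ** 2
    freqs = np.fft.rfftfreq(len(w), d=1.0 / fs)
    total = power.sum()
    if total == 0:
        return {"dominant_frequency": 0.0, "energy": 0.0, "entropy": 0.0}
    p = power / total
    nz = p[p > 0]
    return {
        "dominant_frequency": float(freqs[np.argmax(power)]),
        "energy": float(total / len(w)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }
```

A pure periodic signal concentrates its power in one frequency bin and therefore has an entropy close to zero, whereas irregular motion spreads power over many bins and raises the entropy.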
As shown in
Table 2, the time-domain features extracted from active sensors were used in this paper. Where
is the length of the data window,
is the sampling data,
is the mean value,
i,j,k are the times at different sampling points,
is the value that the data in the window length
is greater and smaller than the threshold value
.
The height gradient value [
25,
26] is calculated from the raw barometer data as:
where
t is the temperature, and the unit is °C.
is the reference air pressure and
p is the output of the barometer.
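For illustration only, the sketch below converts barometer readings to height with the widely used hypsometric (international barometric) formula and takes the height change across a window as the gradient. Both choices are assumptions: the formula may differ from the one used in this paper, and the function names and default values are hypothetical.

```python
def barometric_height(p_hpa, p0_hpa=1013.25, t_celsius=15.0):
    """Height in metres above the reference pressure level.

    Uses the common hypsometric approximation
        h = ((p0 / p) ** (1 / 5.257) - 1) * (t + 273.15) / 0.0065
    with pressures in hPa and temperature in degrees Celsius.
    """
    return ((p0_hpa / p_hpa) ** (1.0 / 5.257) - 1.0) * (t_celsius + 273.15) / 0.0065

def height_gradient(pressures_hpa, p0_hpa=1013.25, t_celsius=15.0):
    """Height change (m) between the first and last sample of a window."""
    first = barometric_height(pressures_hpa[0], p0_hpa, t_celsius)
    last = barometric_height(pressures_hpa[-1], p0_hpa, t_celsius)
    return last - first
```

Near sea level a pressure drop of about 0.12 hPa corresponds to roughly 1 m of ascent, so the sign of the gradient over a window indicates whether the pedestrian is moving up or down.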
2.3. Scenario Recognition Algorithm
2.3.1. Design of Scenario Recognition Algorithm Based on DT-BP
The decision tree (DT) establishes nodes by exploring the high-value data features in the overall data and constructs the branches of the tree according to the required research contents. By repeatedly establishing branch nodes, it presents the classification results and the contents of the decision set in a tree structure [27,28]. The decision tree has low computational complexity, is insensitive to missing intermediate values, and can handle irrelevant feature data [29]. Decision trees also have shortcomings, such as low detection accuracy and the need to preprocess time-sequential data. In a real environment, owing to the complexity of the environment, varying interference, differences in device performance, and the error accumulation of the sensors themselves, the error in the identification process is relatively large, so the usability is not high. To deal with this problem, we designed a context recognition method based on the decision tree and the Bayesian state transition probability (decision tree-Bayesian probability, DT-BP).
Bayesian theory is a common method in model decision-making. The basic idea is that, given the parametric expression of the conditional probability density and the prior probability, Bayes’ formula converts them into a posterior probability, which is then used for decision classification.
Let $P(A)$ be the prior probability or marginal probability of $A$. The conditional probability of $A$ after the occurrence of $B$ is $P(A \mid B)$, which is called the posterior probability of $A$. $P(B \mid A)$ is the conditional probability of $B$ after the occurrence of $A$, which is called the posterior probability of $B$. $P(B)$ is the prior probability of $B$ [30]. Then
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{9}$$
For the situation recognition in this paper, we denote the set of situation categories by $C = \{C_1, C_2, \ldots, C_U\}$, where $U$ is the number of situations, and the feature quantity set by $F = \{F_1, F_2, \ldots, F_V\}$, where $V$ is the number of features. If the features in $F$ all belong to $C_i$, the probability is $P(C_i \mid F)$. When $P(C_i \mid F) = \max_{1 \le u \le U} P(C_u \mid F)$ is satisfied, the situation is considered to be $C_i$, which means the recognition is successful. Therefore, we only need to calculate $\max_{u} P(C_u \mid F)$ to recognize the context. Formula (9) is changed to:
$$P(C_i \mid F) = \frac{P(F \mid C_i)\,P(C_i)}{P(F)} \tag{10}$$
To obtain the conditional probability of the scenario $P(C_i \mid F)$, we need to calculate $P(F \mid C_i)$, $P(C_i)$, and $P(F)$.
The principle of probability allocation in this article is as follows: the number of feature quantities is $V$, and the probability of each feature quantity is the same, which means $P(F)$ is a constant. So $P(C_i \mid F)$ is the largest when $P(F \mid C_i)\,P(C_i)$ is the largest. That is
$$\hat{C} = \arg\max_{C_i} P(F \mid C_i)\,P(C_i) \tag{11}$$
where $P(C_i)$ is the state probability. Its value at the current moment is related to the number of scenarios to be detected and the probability at the previous moment. It is designed independently according to the scenario category and the number of scenarios $U$. $P(F \mid C_i)$ is the conditional probability of the feature vector $F$, which is obtained from the DT rules. The algorithm designed in this paper to obtain it is as follows:
where
represents the number of features related to the category
.
is the judgment value of each feature. If the judgment condition is met, it is
, otherwise is
.
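The decision step can be sketched as follows: each candidate scenario receives a score equal to a feature-based conditional probability multiplied by its state probability, and the scenario with the largest score is selected. In this sketch the conditional probability is assumed to be the fraction of a scenario's decision-tree judgments that are satisfied; this averaged form, the function names, and the example scenarios and probabilities are illustrative assumptions rather than the design used in this paper.

```python
def feature_likelihood(judgments):
    """Fraction of a scenario's decision-tree judgments that are met.

    `judgments` is a list of booleans, one per feature related to the
    scenario (an assumed, simplified form of the conditional probability).
    """
    if not judgments:
        return 0.0
    return sum(1 for met in judgments if met) / len(judgments)

def recognize(judgments_by_scenario, state_prob):
    """Return (best_scenario, scores), where each score is the feature
    likelihood multiplied by the scenario's state probability."""
    scores = {
        scenario: feature_likelihood(judgments) * state_prob.get(scenario, 0.0)
        for scenario, judgments in judgments_by_scenario.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical example: two scenarios, three feature judgments each.
example_judgments = {
    "walking": [True, True, False],
    "static": [True, False, False],
}
```

With equal state probabilities the scenario with more satisfied judgments wins, while a strong state probability carried over from the previous moment can outweigh a single noisy feature judgment, which is the intended benefit of combining the two.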
2.3.2. Recognition of Smartphone Mode Based on DT-BP
- (1)
Algorithm design based on decision tree
To recognize different smartphone modes, it is necessary to detect the transition between modes, that is, to determine whether the smartphone mode is changing or fixed. When the smartphone is in a fixed position, the specific smartphone mode is determined. Mobile phone location recognition therefore has two parts: transition recognition and current location recognition. In this paper, we took six common smartphone modes [
18] as examples to design the specific decision tree, including texting, calling, pants front pocket, clothes pocket, pants back pocket and hand swing, as shown in
Figure 1.
According to the data characteristic analysis and feature analysis, the variance of the first-order norm of acceleration was used as the criterion for judging the position change of the mobile phone. The first-level decision criterion for the decision tree is:
where
Thre is the threshold. When
, the location is changing, otherwise it is not.
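A minimal sketch of this first-level check is given below. It reads "first-order norm" literally as the L1 norm (the sum of the absolute values of the three axes); if the Euclidean magnitude is intended, only the norm line changes. The function name and the threshold value used in any call are hypothetical.

```python
import numpy as np

def is_position_changing(acc_xyz, threshold):
    """First-level decision: is the phone position changing in this window?

    `acc_xyz` has shape (n, 3), one row per accelerometer sample.
    Returns True when the variance of the per-sample L1 norm exceeds
    `threshold`, which must be tuned on recorded data.
    """
    acc = np.asarray(acc_xyz, dtype=float)
    norm1 = np.abs(acc).sum(axis=1)
    return bool(norm1.var() > threshold)
```

A phone resting in a fixed carrying position yields a nearly constant norm and a small variance, whereas moving the phone between positions produces a short burst of large variance.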
When the position is fixed, it is necessary to determine whether there is periodic oscillation, which means there are other periodic motions besides walking and swinging with the pedestrian. We use the second main frequency amplitude of the acceleration amplitude to determine it. The second-level decision criterion of the decision tree is:
where
is the threshold. When
, the mobile phone position has periodic movement; otherwise, it does not. If the smartphone location has periodic movement, the judgment criterion is as follows:
where
is the threshold. When
, the position is the pants pocket (pp); otherwise, it is swinging. As the features of the front pants pocket (fpp) and the back pants pocket (bpp) are similar, there is a new branch for them, and the decision criterion is:
where
M is the window length,
is the threshold.
To recognize the fixed smartphone mode without periodic motion, we used the first-order norm of acceleration, peaks, and wave as the features. The determination rule for designing a fixed smartphone mode decision tree is:
where
Thre1,
Thre2,
Thre3 are thresholds.
- (2)
Probabilistic design based on DT-BP method
According to the design of the DT-BP method, we needed to design
and
to calculate the probability distribution principle. The number of scenarios
U is 7, including 6 smartphone positions and the change process.
is designed as shown in
Table 3. According to the design of the decision tree, the number of features
V is 10. The design of
in Formula (12) is as follows in
Table 4.
2.3.3. Recognition of Motion Mode Based on DT-BP
- (1)
Design of motion mode recognition algorithm based on decision tree
In this paper, the recognition algorithm is designed by taking the motion modes of static, walking, turning, going upstairs and downstairs, escalator, and elevator as examples. According to the analysis of the extracted features, dynamic and static states are distinguished by the acceleration variance. We used the positive and negative zero-crossing rates of the acceleration and of the air pressure gradient as the features for static, elevator, and escalator recognition. Dynamic motion includes walking on level ground, going upstairs, going downstairs, and turning. Walking, turning, and going upstairs and downstairs are coupled movements; that is, turning and going up and down stairs also contain walking movements. Therefore, the main work was to distinguish turning and going up and down stairs from walking. The amplitude of the angular velocity is used to recognize turning. We used the auto-correlation coefficient, the Fourier dominant frequency, and the elevation gradient to distinguish going up and down stairs from walking, as shown in
Figure 2.
The process of motion mode recognition based on the decision tree method is as follows:
- (a)
If the acceleration variance is greater than the threshold, the state is considered dynamic; otherwise, it is static.
- (b)
When the state is static, it is necessary to distinguish between standing still, the elevator, and the escalator. We use the zero-crossing rate of the acceleration amplitude as the judgment, and the criterion is:
where
,
is the amplitude of acceleration and
g is the acceleration of gravity.
and
is the positive and negative zero-crossing rate.
and
are the zero-crossing judgment threshold. If
, it is going up or down elevator.
- (c)
When the motion is the elevator, we judge whether going up or down. If and the previous state is not the elevator, the state is going up elevator. If and the previous state is the elevator, the state is going down elevator. If and the previous state is not the elevator, the state is going down elevator. If and the previous state is the elevator, the state is going up elevator.
- (d)
If
, it is further recognized whether it is an escalator, and the recognition criteria are as follows:
where
is the pressure gradient,
and
are the positive and negative zero-crossing rates of the pressure gradient.
and
are the thresholds. When
, it is the static state. If
, the escalator is going up. If
, the escalator is going down.
- (e)
When pedestrians are in a dynamic state, we mainly distinguish turning, stairs, and walking. The angular velocity amplitude is used to recognize turning. When , it is turning.
- (f)
We use the auto-correlation coefficient and the Fourier transform to distinguish stairs from walking. The auto-correlation coefficients at the offsets k = 2 and k = 4 are used as the judgment values, and the main frequency of the Fourier transform is used as the judgment condition. The criterion is as follows:
where
is the correlation coefficient of the acceleration amplitude offset
k = 2.
represents the main frequency of the Fourier transform.
is the main frequency judgment threshold. If
, the pedestrian is going up or down stairs; otherwise, the pedestrian is walking on level ground.
- (g)
When pedestrians are going up and down the stairs, we use the height gradient value to make judgments. The criterion is as follows:
where
and
are the judgment thresholds of the height gradient. If
, it is going up the stairs. If
, it is going down the stairs.
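Two of the features used in the steps above, the positive and negative zero-crossing rates and the auto-correlation coefficient at a given offset, can be sketched as follows. The normalizations are assumptions made for the example (crossings are counted per sample interval, and the auto-correlation is normalized by the lag-0 sum of squares), and the function names are hypothetical; thresholds are deliberately left out because they must be tuned on recorded data.

```python
import numpy as np

G = 9.80665  # standard gravity in m/s^2

def zero_crossing_rates(acc_magnitude, g=G):
    """Positive and negative zero-crossing rates of (|a| - g).

    A positive crossing is a change from below zero to above zero, a
    negative crossing the reverse. Samples that are exactly zero are ignored.
    """
    d = np.asarray(acc_magnitude, dtype=float) - g
    s = np.sign(d)
    s = s[s != 0]
    if len(s) < 2:
        return 0.0, 0.0
    pos = int(np.sum((s[:-1] < 0) & (s[1:] > 0)))
    neg = int(np.sum((s[:-1] > 0) & (s[1:] < 0)))
    intervals = len(d) - 1
    return pos / intervals, neg / intervals

def autocorr_coefficient(x, k):
    """Auto-correlation coefficient of x at offset k (k >= 0)."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    denom = np.sum(dx ** 2)
    if denom == 0 or k >= len(x):
        return 0.0
    if k == 0:
        return 1.0
    return float(np.sum(dx[:-k] * dx[k:]) / denom)
```

Both features operate on the acceleration amplitude within one window, so they can be computed once per window and then passed to the threshold comparisons of the decision tree.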
- (2)
Probabilistic design based on DT-BP method
For the nine motion modes in the above example, the probability
of each state in the DT-BP method is determined according to the previous state. The transition probability between each motion mode is shown in
Table 5. According to
Figure 2 and the decision tree rules, the number of features
V is 12, as shown in
Table 6 for the design
in Formula (12).