1. Introduction
Human activity and posture-transition recognition provides users with valuable situational awareness and has thus become a hotspot in many fields, such as medical care, human-computer interaction, film and television production, and motion analysis [
1]. The two dominant approaches to human activity classification in the literature are Vision-based systems and Wearable Sensor-based systems. Vision-based systems are widely used for the detection of human body parts and the identification of daily activities [
2]. These systems process the collected visual data for activity classification.
Wearable Sensor-based systems consist of multiple inertial sensors connected to a human sensor network; after receiving and executing system commands, they feed back raw human body data [
3,
4]. Inertial measurement units (accelerometers and gyroscopes) are used to measure the triaxial angular velocity and triaxial acceleration signals generated during human body movement [
5]. Sensors available in smartphones, such as temperature sensors and pressure sensors, are useful for sensing the surroundings [
6]. The data collected from the sensors attached to the user and the sensors installed in the surroundings are processed to provide situational awareness to the user [
7]. One problem with using an accelerometer to detect the motion of an object is that its measurements are affected by the gravitational field, whose magnitude (g = 9.81 m/s²) is relatively large. However, many studies have found that the gravity component can be separated from body motion by filtering. When a three-axis accelerometer is used, the sensed gravity vector can also help determine the orientation of the object relative to the gravity axis [
8]. The gyroscope measures orientation indirectly: it first estimates the angular velocity and then integrates it to obtain the orientation. However, a reference initial angular position is needed to obtain the orientation from the gyroscope [
9]. Gyroscopes are also prone to noise, resulting in drifting offsets that can be mitigated by filtering.
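The gravity separation and tilt estimation described above can be sketched as follows. This is a minimal illustration using a first-order low-pass filter; the function names and the smoothing coefficient are our own choices for demonstration, not the filtering scheme used in any cited work.

```python
import math

def separate_gravity(acc_samples, alpha=0.9):
    """Split raw accelerometer samples (x, y, z) into gravity and body
    components with a simple first-order low-pass filter. The smoothing
    factor alpha is illustrative; real systems tune it to the sample rate."""
    gravity = list(acc_samples[0])  # initialize with the first reading
    gravities, body = [], []
    for ax, ay, az in acc_samples:
        # Low-pass: gravity tracks the slowly varying component.
        gravity = [alpha * g + (1 - alpha) * a
                   for g, a in zip(gravity, (ax, ay, az))]
        gravities.append(tuple(gravity))
        # High-pass residual: body acceleration = raw - gravity.
        body.append(tuple(a - g for a, g in zip((ax, ay, az), gravity)))
    return gravities, body

def tilt_angle(gravity):
    """Angle (radians) between the device z-axis and the gravity vector,
    derived from the sensed gravity direction."""
    gx, gy, gz = gravity
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    # Clamp to avoid domain errors from floating-point rounding.
    return math.acos(max(-1.0, min(1.0, gz / norm)))
```

For a stationary device lying flat, the body component converges to zero and the tilt angle to zero, which matches the intuition that the filtered signal is pure gravity.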
At present, many scholars have studied the problem of human behavior recognition based on video data [
10]. In [
11], the authors proposed a depth video-based HAR system that utilizes skeleton-joint features indoors. They used processed depth maps to track human silhouettes and produce body-joint information in the form of skeletons; a hidden Markov model was then trained on features calculated from the joint information. The trained model recognized various human activities with a mean rate of 84.33% over nine daily routine activities of the elderly. Basbiker M, et al. [
12] developed an intelligent human recognition system. In multiple stages of the system, a series of digital image processing techniques was used to extract human activity feature data from frame sequences, and a robust neural network was established to classify the activity models using a multi-layer feedforward perceptron network. However, vision-based HAR is limited by spatial location, video data are relatively complex, and privacy leakage is more likely. In contrast, data from inertial measurement units avoid these problems well, so IMU-based recognition is becoming a new trend in HAR.
Human activity recognition systems use three types of feature extraction methods: temporal features, frequency features, and a combination of the two [
13]. The authors of [
14] put forward an algorithm named S-ELM-KRSL, which is well suited to processing large-scale data with noise or outliers to identify body motion sequences. Experiments showed that the scheme could detect symptoms of mild cognitive impairment and dementia with satisfactory accuracy. In [
15], Zhu et al. proposed a semi-supervised deep learning approach using temporal ensembling of deep long short-term memory networks to extract high-level features for human activity recognition. They investigated temporal ensembling with some randomness to enhance the generalization of the neural networks. Besides the ensemble approach based on both labeled and unlabeled data, they also combined supervised and unsupervised losses and demonstrated the effectiveness of the semi-supervised learning scheme experimentally. The authors of [
16] brought up a novel ensemble extreme learning machine (ELM) algorithm, in which Gaussian random projection is employed to initialize the input weights of the base ELMs, generating greater diversity to boost the performance of ensemble learning. The algorithm achieved recognition accuracies of 97.35% and 98.88% on two datasets; however, its training time is slightly longer. In [
17], a feature selection algorithm based on fast correlation filtering was developed for data preprocessing, and the classification accuracy was shown to reach up to 100%. However, the classification model only used the AIRS2 algorithm, which may not be suitable for other classifiers. Feature selection applies well-defined evaluation criteria to the original feature set, eliminating weakly correlated and unnecessary features. The selected features do not change the original representation of the feature set, and feature selection makes online classification more flexible [
18].
Most human behavior recognition systems developed in the past ignored posture transitions because their incidence is lower and their duration shorter than those of other basic physical activities [
19]. However, these assumptions depend on the application and do not hold when multiple activities must be performed in a short period of time. On the other hand, in many practical scenarios, such as fitness or disability monitoring systems, determining posture transitions is critical because in these cases the user performs multiple tasks in a short period of time [
In fact, when a human behavior recognition system must also perceive transient postures, the classification changes slightly, and the absence of the specified posture transitions may lead to poor system performance [
21].
A posture transition is a finite-duration event determined by its start and end times. In general, the time required for a posture transition differs between individuals. A posture transition is bounded by two other activities and represents the transition period between them [
22]. Basic activities like standing and walking can extend over a longer period of time than posture transitions. Data collection also differs between the two activity types: a posture transition must be repeated to obtain each separate sample, whereas basic activities are continuous, so multiple window samples can be obtained from a single trial within its time range [
23].
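The windowed sampling of continuous activities described above can be illustrated with a short sketch. The function name and the default 50% overlap are our own illustrative choices, not parameters taken from the cited works.

```python
def sliding_windows(signal, window_size, overlap=0.5):
    """Segment a continuous signal into fixed-size, possibly overlapping
    windows. Long basic activities (standing, walking) yield many windows
    per trial; a short posture transition typically fills at most one."""
    step = max(1, int(window_size * (1 - overlap)))
    windows = []
    start = 0
    while start + window_size <= len(signal):
        windows.append(signal[start:start + window_size])
        start += step
    return windows
```

For example, a 10-sample trial split into windows of 4 samples with 50% overlap yields four windows, which shows why a single continuous recording of a basic activity can supply multiple training samples.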
Other works related to this paper are discussed in [
24,
25]. In previous work, we studied a large number of features for HAR assisted by an inertial measurement unit. The various activity features were classified hierarchically, and six basic activities could be identified with an average accuracy of 96.4%. However, the transition periods between activities were not taken into account.
This paper focuses on Human Activity Recognition with postural transition awareness. In this work, the motion of the human body is sensed by the accelerometer and gyroscope of an inertial measurement unit. The magnitude and direction of acceleration can be measured by arranging the sensors orthogonally in three-dimensional space. Such a unit can also be built on a single chip, and three-axis accelerometers are now common in commercial electronic devices [
26]. First, we analyzed the six-axis signal data acquired by the inertial measurement unit and then preprocessed them to obtain a variety of signals that represent the motion. From these preprocessed signals, features were extracted in the time and frequency domains using various standard and novel measures to characterize each activity sample. Thereafter, we performed feature selection for the specific classification task using several feature selection algorithms. A variety of machine learning methods were compared for classification, and the one with the highest accuracy was selected. Finally, we used a support vector machine to classify the postures, optimizing the model with different kernel functions and specific parameters.
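The time- and frequency-domain feature extraction step can be sketched as follows. This is an illustrative subset of features under our own naming; the paper's full feature set and measures are not reproduced here, and a real system would use an FFT rather than the naive DFT shown.

```python
import math

def extract_features(window):
    """Compute a few standard time- and frequency-domain features for one
    window of a single-axis signal (an illustrative subset only)."""
    n = len(window)
    # Time domain: mean and standard deviation.
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    # Frequency domain: naive DFT magnitudes for bins 0 .. n/2 - 1.
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(-2 * math.pi * k * i / n)
                 for i, x in enumerate(window))
        im = sum(x * math.sin(-2 * math.pi * k * i / n)
                 for i, x in enumerate(window))
        mags.append(math.hypot(re, im))
    # Dominant non-DC frequency bin and average spectral energy.
    dominant_freq_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    energy = sum(m * m for m in mags) / len(mags)
    return {"mean": mean, "std": math.sqrt(var),
            "dominant_freq_bin": dominant_freq_bin,
            "spectral_energy": energy}
```

Applied to a pure sinusoid spanning two periods per window, the extractor recovers a near-zero mean and a dominant frequency at bin 2, confirming that the time- and frequency-domain measures capture complementary aspects of a motion signal.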
Figure 1 shows the framework followed in this paper for Activity Recognition. The framework consists of four modules: Data Preprocessing, Feature Extraction and Selection, Classifier Selection, and Classifier Evaluation. The details of each module are given in the following sections. In
Section 2, we describe the data preprocessing, feature extraction, and feature selection.
Section 3 is focused on the Classifier Selection. In
Section 4, we discuss the classifier evaluation and results. We conclude the paper in
Section 5.