1. Introduction
Tracking a human subject’s kinematics and characterizing the subject’s behavior at home play a significant role in robotics, computer engineering, the physical and natural sciences, medicine, and industrial and academic settings [1,2,3,4,5]. In medicine, pervasive human motion monitoring can improve the rehabilitation process by moving treatment from clinics to homes [6], and facilitate the design of treatment plans and follow-up monitoring [7], reducing the time and cost of patients’ round-trips to hospitals. It can improve the accuracy of diagnoses and the overall quality of treatment of neurological disorders [8]. It can remotely monitor the life activities of the elderly in their homes [9]; in particular, it can detect, in real time, risk situations for the elderly population such as falls [10], and can assist medical staff in hospitals in monitoring patients, even at night. Aggregating context information with real-time daily life activities can help provide better services, service suggestions, and changes in system behavior for better healthcare [11].
Human subject kinematic assessment can be obtained by estimating Body Part (BP) positions over time using contact technologies such as an Inertial Navigation System (INS) [12], or using non-contact systems. This estimation can be used to classify activity level, performance, and activity type. Such analyses can assist in forming an objective score to gauge the severity of symptoms of neurological disorders such as Parkinson’s disease (PD) [12]. The kinematic patterns of PD subjects can be used to assess PD severity, for instance in a Freezing of Gait (FOG) situation, where effective forward stepping is involuntarily arrested [13].
Unlike contact sensors, non-contact sensors are not worn, do not require battery replacement, and can provide information on all body parts. The most widely used non-contact methods for motion acquisition are video based, relying on optical technology. Video recording systems can track human subjects [14], extract various kinematic features [15], track their activity [16,17], and analyze the subject’s behavior [18]. Typically using markers attached to the patient’s body, such systems are accurate and deployed in many gait analysis laboratories [19], but they are expensive, limited in range to the laboratory, and restricted to Line of Sight (LoS) conditions and to the markers’ locations. Existing marker-less video systems are less accurate, require multiple synchronized cameras [17], and sometimes still require an expert’s analysis of the resulting video streams [20].
Electromagnetic [21,22] and ultrasonic [23] technologies can also be deployed for activity monitoring. A narrow-band radar has been suggested [24] for the detection and classification of patients’ movements using a gait signature [25]. An Ultra-Wide-Band (UWB) radar was suggested for the acquisition of BPs’ displacement and motion kinematics [26] and for detecting human activity in urban environments [27]. A sonar system, based on acoustic technology similar to radar, produces an acoustic signature that has been successfully used to classify various activity types [23]. However, sonar and radar technologies suffer from multi-path fading, are generally limited in range, and cannot provide precise information about the absolute location of different BPs in time and space [28].
The Microsoft Kinect™ (Kinect) [29] combines optical- and radar-based technologies for human detection, tracking, and activity recognition. The Kinect [30] is an active system for human activity acquisition that combines an optical video camera with infrared radar technology. It contains a standard Red, Green, and Blue (RGB) camera, a depth sensor based on infrared radar, and a microphone array, simultaneously providing streams of depth signals, RGB images, and audio signals. The Kinect software, which processes and aggregates the images from the infrared line-of-sight reflections together with the related color video streams, can reconstruct skeleton-like parts over time [31] and enables the capture of human 3-D motion [32].
The Kinect’s ability to assess human kinematic data was validated by comparison to optical marker-based 3-D motion analysis (e.g., [33]). The accuracy of the Microsoft Kinect sensor for measuring movement in people with Parkinson’s disease was shown to be high compared to the Vicon motion capture system [19]. The Kinect demonstrated varied success in measuring spatial characteristics, ranging from excellent for gross movements such as sit-to-stand (Intra-class Correlation Coefficient (ICC) = 0.989) to very poor for fine movements such as hand clasping (ICC = 0.012) [19,34]. The capability of the Kinect to assess reliable spatiotemporal gait variables was shown in [35]. Activity recognition at home with the Kinect, using depth maps and a cuboid similarity feature, was described in [36]. Shape and dynamic motion features were derived from the Kinect for activity recognition by using temporal continuity constraints of human motion information for each activity together with a Hidden Markov Model (HMM) [37]. Skeleton-based activity detection and localization using the Kinect enables effective ubiquitous monitoring [38]. In [39], an algorithm to monitor activities of daily living has the capacity to detect abnormal events in the home (e.g., falling).
Despite its relatively high accuracy and its ability to provide full-body kinematic information, the Kinect (versions 1 and 2) still possesses the following deficiencies [5]: (1) Its range of coverage is limited: when a subject comes too near or moves too far from the coverage range, the resulting information is distorted or unavailable [39,40]; and (2) when multiple people cross through the Kinect range, or when one person is closer than or hides another, the current Kinect application begins an automatic re-detection process. This results in erroneous human identity detection, which leads to an inaccurate interpretation of the data [41]. Deployment of multiple Kinect sensors at various locations in the environment can extend the coverage range [42], but requires an additional deployment procedure, and the synchronization of the sensors and the aggregation of their data remain a challenging task [32,42]. Furthermore, the ambiguity among multiple subject estimations, and a reference for detecting distortion of the skeleton estimation, have not yet been addressed.
In [43], inspired by radar [26] and sonar [23] signatures, we proposed using a Kinect Signature (KS) to differentiate between subjects. The KS is based on features such as the subject’s size, the proportions between different BPs, kinematic profiles, and, when possible, the subject’s color and voice features. These KS attributes can be assessed in a separate calibration phase, or using a priori knowledge about the Subject of Interest (SoI). This paper extends the work in [43] and proposes a set of static features, accompanied by computational techniques, to overcome some of the existing Kinect drawbacks, forming a major step towards Kinect-based continuous kinematic feature assessment at home. The feasibility of the new technology is demonstrated in an experiment setup composed of three sets: (1) Five people were recorded walking in a room; their static features were extracted and used to validate the capability to distinguish between the different subjects based on their signatures; (2) tracking complex gait patterns of a subject by simulating a PD patient walking with another subject (simulating a medical care assistant or a family member walking with the subject); and (3) hand-tapping on a surface, to demonstrate the capability of the technology to assess kinematic features of the SoI.
This paper makes three contributions: (1) It suggests a new set of skeleton-based static features that form a unique subject signature; (2) it provides a mechanism to map the different Kinect instances to the SoI; and (3) it presents a methodology to detect and exclude noisy skeleton estimations (due to shadowing or approaching the upper coverage limit of the Kinect), thereby improving the quality of the kinematic features. Utilization of the suggested system and procedure has the potential to enable reliable kinematic feature assessment at home during daily life activities. The kinematic features can be sent in real time to a medical care center to objectively monitor the patient’s medical condition and assist in follow-up treatments [44]. This paper is organized as follows: Section 2 describes the methods used in this study. Section 3 describes the results. Section 4 summarizes the results and suggests directions for future research.
2. Materials and Methods
The proposed real-time scheme is composed of the following stages: joint estimation, static feature estimation based on the joint estimations, feature selection, artifact detection and exclusion in the feature domain, identification of the SoI’s skeleton based on the KS, and derivation of common kinematic features that can be used for kinematic analysis. All of the data analysis stages are summarized in Figure 1.
2.1. Continuous Body Segments Tracking Using Kinect
The Kinect utilizes independent color (RGB) and depth image streams at a frame rate of $F_s$, whose sampling interval, $1/F_s$, is usually set to 1/30 s [32]. The color and depth images can be aggregated to provide estimations of the 3-D joint coordinates [45]. The joint coordinates at time instance $m$, $\hat{\mathbf{J}}_m$, can be estimated recursively, similar to [45], by:

$$\hat{\mathbf{J}}_m = f\left(\hat{\mathbf{J}}_{m-1}, \mathbf{C}_m, \mathbf{D}_m\right) + \mathbf{e}_m + \mathbf{n}_m \quad (1)$$

where $\hat{\mathbf{J}}_m$ and $\hat{\mathbf{J}}_{m-1}$ are the 3-D Kinect joint location estimation vectors at time instances $m$ and $m-1$, $\mathbf{C}_m$ and $\mathbf{D}_m$ are the color and depth images at time instance $m$, $\mathbf{e}_m$ and $\mathbf{n}_m$ are the skeleton joints’ distortion and measurement noise [46], and $f(\cdot)$ is a function that maximizes the joint matching probability based on a very large database of people [5].
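For reference in the sketches that follow, a minimal Python representation of one tracked skeleton frame is assumed; the field names and layout are our own illustrative choices, not part of the Kinect SDK.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SkeletonFrame:
    """One Kinect skeleton observation for a single Kinect Instance (KI)."""
    time_index: int      # frame index m
    ki_index: int        # active-set index assigned blindly by the Kinect
    joints: np.ndarray   # (n_joints, 3) estimated 3-D joint coordinates J_m
    tracked: np.ndarray  # (n_joints,) bool: joint tracked vs. interpolated
```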
Much of the image interpretation is performed independently on each frame, which enhances the system’s ability to recover from tracking errors [45]. Human subjects are identified by matching their skeleton and depth estimations to a training data set (built from many subjects by Microsoft). The temporary distortions in the skeleton estimations, captured by the term $\mathbf{e}_m$, are therefore due to the low quality of the measurements and to the discrepancy between the Kinect’s trained model and its observations, and lead to temporal artifacts in the skeleton.
The identified subject becomes part of the current active set, which is restricted to a maximum of six people [5]. The number of joints in the skeleton varies between 20 (Kinect v1) and 25 (Kinect v2) [5]. Still, when one subject hides behind another, or moves in and out of the Kinect range, the skeleton can be distorted. In some cases the skeleton estimation is no longer valid, and a new identification and assignment process is initiated. This can result in an erroneous assignment of a subject to an index (Kinect subject instance) that was previously associated with a different subject, causing erroneous tracking and degrading activity diagnosis.
2.2. Human Kinect Signature (KS)
For the purposes of this study, the Kinect subject active set is defined as the set of subjects in a given scene. The subject index is assigned blindly by the Kinect to an arbitrary available index. Whenever a subject is excluded from the active set (due to shadowing, or by going outside of the Kinect range), its index becomes available for another subject. Each new subject-index assignment forms a new Kinect Instance (KI), which can cause mis-assignment of subjects to different KIs.
To consistently assign each identified KI to the right subject in the Kinect database, and in particular to the Subject of Interest (SoI), we define a human Kinect Signature (KS), similar to radar [21,22] and sonar [23] signatures. The KS is composed of a set of features whose typical values can be used to characterize each subject. These characteristic feature values can be obtained through calibration or training sessions, with or without deployment of a priori knowledge about the SoI, and form a KS database of the SoIs [47]. The KS can be used to identify different subjects.
2.3. KS’s Feature Set
The KS can be based on static or dynamic features [23]. Static features can be based on: (1) Body dimensions (such as BPs’ lengths), body volume, or BPs’ proportions [48]; (2) consistency of BPs’ colors; (3) facial features; or (4) static features of the voice. Dynamic features are related to the gait pattern, stride length, asymmetry in posture, dynamics of facial expressions, and activity profile.
The usage of color-based features is limited to the period during which the subject wears the same clothing, and is affected by changes in lighting conditions. The color of the skin, in areas not covered by clothing, such as the face, can be used alternatively [5]. The use of Kinect facial features requires advanced and accurate registration, and is not available in many Kinect settings. Skeleton-based features are more reliable, incorporate the rich database that was used to build the Kinect, and include a priori information about human movements.
The focus of this work is, without loss of generality, on a KS based on skeleton-based static features, which have the advantage that their values are known a priori to be consistent over time. This enables using a small number of features and eases the processing stages. This work focuses on three main sets of skeleton-based static features in three domains: BPs’ lengths, ratios between BPs’ sizes (both based on estimated joint locations), and the body’s color at these joint locations (derived from the image color related to this set of joints). The first two are based on the assumption of a rigid body [41], so that the relative locations (proportions) and lengths of the subject’s joints are preserved and should thus have similar values over time. The color consistency of skin color is high [5], and that of clothing is usually restricted to a period of one day.
As the length of each BP is assumed to be preserved, it can be used to identify each subject [49]. The subject’s BP spread and body dimensions, as derived from depth images, were shown to be informative body-joint features for recognizing different human poses [50]. Here we suggest incorporating the sum of BPs’ lengths, based on the joint estimations in Equation (1), to provide a dimension and length feature. The length feature can be defined as:

$$L_m = \sum_{(i,j) \in \mathcal{J}} \left\| \hat{\mathbf{J}}_m^{(i)} - \hat{\mathbf{J}}_m^{(j)} \right\| \quad (2)$$

where the operator $\|\cdot\|$ is the Euclidean distance metric, $\mathcal{J}$ is the full set of joint index pairs, and $\| \hat{\mathbf{J}}_m^{(i)} - \hat{\mathbf{J}}_m^{(j)} \|$ is the length of the BP between joints $i$ and $j$, which is denoted as $BP_{i,j}$.
Another domain of skeleton features can be based on the relative joint position relationships [51], estimated by the ratio between BP lengths. The ratio can be used to differentiate between subjects based on differences in their BP proportions. The ratio feature at time instance $m$ can be defined as a subset of ratios between a set of BPs. For a subset of two BPs, the ratio between $BP_{i,j}$ and $BP_{k,l}$ is defined by:

$$R_m = \frac{\left\| \hat{\mathbf{J}}_m^{(i)} - \hat{\mathbf{J}}_m^{(j)} \right\|}{\left\| \hat{\mathbf{J}}_m^{(k)} - \hat{\mathbf{J}}_m^{(l)} \right\|} \quad (3)$$
The color features, $C_m$, can be obtained from the color coordinates of a subset of body areas, based on their related skeleton joints [52]. They can use the RGB colors of the joints directly, or separate the RGB components into color intensity and the ratios between the RGB colors.
The set of static KS features of the $n$’th subject for $M$ consecutive samples is given by:

$$\mathbf{S}_m^n = \left[ L_m^n, R_m^n, C_m^n \right], \quad m = 1, \ldots, M \quad (4)$$

where $L_m^n$ and $R_m^n$ are the length and ratio features computed for the $n$’th KI in Equations (2) and (3), and $C_m^n$ are its color features.
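As an illustration of Equations (2)–(4), the following is a minimal Python sketch of the static feature computation; the joint-pair lists, the data layout (one 3-D row per joint), and all function names are our own assumptions, not part of the Kinect SDK.

```python
import numpy as np

# Hypothetical joint index pairs for a few body parts (illustrative only).
BP_PAIRS = [(0, 1), (1, 2), (2, 3), (4, 5)]   # (joint_i, joint_j) per body part
RATIO_PAIRS = [((0, 1), (2, 3))]              # ratios between pairs of body parts

def bp_length(joints, pair):
    """Euclidean length of one body part; `joints` is an (n_joints, 3) array."""
    i, j = pair
    return np.linalg.norm(joints[i] - joints[j])

def static_features(joints, rgb_at_joints):
    """Build one static KS feature vector S_m (Equation (4)) for a single frame."""
    # Length feature: sum of body-part lengths (Equation (2)).
    L = sum(bp_length(joints, p) for p in BP_PAIRS)
    # Ratio features between chosen body parts (Equation (3)).
    R = [bp_length(joints, a) / bp_length(joints, b) for a, b in RATIO_PAIRS]
    # Color features: RGB values sampled at the joint locations.
    C = rgb_at_joints.ravel()
    return np.concatenate([[L], R, C])
```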
Kinect-based skeleton estimation suffers from three main distortion sources: (1) the subject moving out of the Kinect’s effective range, where the Kinect resorts to inaccurate interpolation; (2) erroneous merging of the skeleton with nearby subjects or objects; and (3) inaccurate interpolation when the subject is partially hidden by an object in the room.
The temporal distortion in the skeleton estimations in Equation (1) leads to temporal artifacts in the KS features relative to those derived from noise-free skeleton measurements. Since the KS features are static features that, in the absence of distortion, are invariant to different body postures and positions, they can be modeled as a signal distributed around the distortion-free feature values, which can be seen as the features’ reference values. The KS feature set related to the $n$’th KI at time instance $m$ can be modeled as:

$$\mathbf{S}_m^n = \bar{\mathbf{S}}^n + \mathbf{v}_m^n \quad (5)$$

where $\bar{\mathbf{S}}^n$ is the reference KS, calculated based on the reference value of the skeleton, and $\mathbf{v}_m^n$ is a noise vector the size of the feature set, which can be assumed to be colored noise due to the correlation between consecutive Kinect estimations.
2.4. KS’s Feature Pre-Processing: Artifact Detection and Exclusion
An artifact detection algorithm is applied to the KS features to estimate and exclude their temporal distortion factor, $\mathbf{d}_m^n$, by using the reference KS values. The reference KS features, $\bar{\mathbf{S}}^n$, can be estimated using another sensor modality such as video, with markers set at points similar to those of the Kinect skeleton. The work in [19] has shown that the Kinect joint estimations can be accurate, with errors in the range of a few millimeters, under the following conditions: (1) optimal range from the Kinect (2–3 m); (2) the subject’s whole body is included in the Kinect frame; and (3) the subject is in frontal orientation to the Kinect. Under the assumption that the distortion tracking interval is long enough and its distribution is ergodic, the reference KS features can be estimated by the median of the KS feature values along the life span of the KI. Thus the distortion factor of the $n$’th KI at the $m$’th time instance can be estimated by:

$$\hat{\mathbf{d}}_m^n = \mathbf{S}_m^n - \tilde{\mathbf{S}}^n \quad (6)$$

where $\tilde{\mathbf{S}}^n$ is the median vector of the KS features along the $n$’th KI’s life span in Equation (4).
In this work, without loss of generality, we use a binary quality measure to identify low-quality measurements that should be excluded from the feature set used for identification. The binary quality measure of the $n$’th KS at time instance $m$ can be defined as a two-valued confidence level of the estimations:

$$Q_m^n = \begin{cases} 1, & \left\| \hat{\mathbf{d}}_m^n \right\| \leq T_d \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

where $T_d$ is the distortion threshold, which is a function of the standard deviation, $\sigma$, of the KS’s joint estimation error.
The distortion threshold should be tuned to maximize the artifact detection probability, and typically takes values between 0.5 and 3 standard deviations, according to the environment [43]. Tuning the distortion threshold, $T_d$, should be done so as to minimize the identification error in the target environment. Its value increases with deviations from the optimal Kinect range and with the rate of shadowing and rapid movements, where more joints are interpolated, since interpolated joints are likely to be more distorted and are therefore less reliable.
More advanced algorithms can aggregate into the quality measure higher statistical moments, such as abnormal skewness or kurtosis values [53]. In addition, the artifact detection algorithm can incorporate knowledge about invalid postures, using a priori constraints of the human body [54], about the operating range (whether it resides in the optimal coverage), and about features derived by interpolation, mainly when parts of the human body are out of the Kinect frame. The low-quality, erroneous features suspected to be artifacts are excluded from the KS feature set prior to subject identification. The low-quality measures can be replaced by the KI’s median value, $\tilde{\mathbf{S}}^n$, which amounts to median filtering. Another way to mitigate the artefactual features is to impose physiological constraints on the skeleton, as in [55].
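The following is a minimal Python sketch of the artifact detection and exclusion step of Equations (6) and (7), assuming the features of one KI are stacked in an (n_frames, n_features) array; the per-feature threshold rule and all names are illustrative assumptions.

```python
import numpy as np

def artifact_mask(features, k=1.5):
    """Binary quality measure (Equation (7)) per frame for one KI.

    `features`: (n_frames, n_features) array of static KS features.
    `k`: distortion threshold in units of the features' standard deviation
         (typical values 0.5-3, tuned per environment).
    """
    ref = np.median(features, axis=0)            # reference KS (Equation (6))
    sigma = np.std(features, axis=0)
    d = features - ref                           # distortion factor d_m
    # A frame is reliable (Q=1) if its distortion stays within k sigma.
    return np.all(np.abs(d) <= k * sigma + 1e-12, axis=1)

def exclude_artifacts(features, mask):
    """Replace low-quality frames by the KI median (median filtering)."""
    if not mask.any():          # nothing reliable to anchor on
        return features.copy()
    cleaned = features.copy()
    cleaned[~mask] = np.median(features[mask], axis=0)
    return cleaned
```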
2.5. Feature Selection
After feature cleaning, a feature selection algorithm can be applied to choose the features that are most robust for subject identification and least sensitive to the joint estimation error. An unsupervised feature selection algorithm such as sparse Principal Component Analysis (PCA), which chooses significant features in the principal-component representation based on the features’ variability [56], can be used when no training data are available. Alternatively, supervised feature selection can be used, such as AdaBoost, which minimizes the training error by using a set of weak classifiers over partial sets of the features [57], or the Fisher criterion, which is optimal for Fisher’s linear discriminant classifier and has computational and statistical scalability [58].
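As one concrete option, a small Python sketch of supervised selection with the Fisher criterion follows, scoring each feature by between-class over within-class scatter; the two-class setting and all names are our illustrative assumptions.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher criterion per feature for a two-class problem.

    `X`: (n_samples, n_features) cleaned KS features.
    `y`: (n_samples,) binary labels (e.g., SoI vs. other subject).
    Higher score means better class separation for that feature.
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2    # between-class scatter
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12     # within-class scatter
    return num / den

def select_top_k(X, y, k=2):
    """Indices of the k most discriminative features."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```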
2.6. Identification of the SoI’s Skeleton
The identification of the SoI’s skeleton among the KIs can be seen as a classification problem that assigns the KIs’ features to the different subjects. The classification uses the KS features of the different KIs. It can be further improved by taking into account the temporal quality of the measurements in Equation (7).
The classification problem can be reduced to detecting only the SoI’s KIs out of all the KIs in the active set. Without loss of generality, we choose one SoI (the $k$’th subject) out of the total number of subjects in the Kinect active set. A detection criterion, based on a minimal square-root error criterion for choosing the KS of the SoI from the $N$-size active set of KSs at time instance $m$, is:

$$\hat{k}_m = \underset{n \in \{1, \ldots, N\}}{\arg\min} \left\| \mathbf{S}^{SoI} - \tilde{\mathbf{S}}_m^n \right\|, \quad \text{subject to} \quad \left\| \mathbf{S}^{SoI} - \tilde{\mathbf{S}}_m^{\hat{k}_m} \right\| \leq T_{id} \quad (8)$$

where the $\|\cdot\|$ operator is the Euclidean distance metric, $\mathbf{S}^{SoI}$ is the SoI’s KS, which can be estimated by a reference sensor or by the median of the feature values in a calibration process under optimal Kinect conditions, $\tilde{\mathbf{S}}_m^n$ is the $n$’th KI’s set of features in the active set after artifact exclusion, and $T_{id}$ is the detection threshold for subjects in the database.
The detection threshold can be tuned to minimize the identification error for subjects of interest in the database, sometimes referred to as subjects in a white list. In case an artifact was not filtered out, or the subject does not exist in the data set, the estimated index, $\hat{k}_m$, would be null.
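A minimal Python sketch of the detection criterion in Equation (8) follows, assuming the calibrated SoI KS and the per-KI artifact-cleaned feature vectors are already available; the names and the null-return convention are illustrative.

```python
import numpy as np

def identify_soi(ks_soi, ki_features, t_id):
    """Pick the KI whose features are closest to the SoI's KS (Equation (8)).

    `ks_soi`: (n_features,) calibrated SoI signature.
    `ki_features`: list of (n_features,) artifact-cleaned feature vectors,
                   one per KI in the active set.
    `t_id`: detection threshold; returns None if no KI is close enough.
    """
    if not ki_features:
        return None
    dists = [np.linalg.norm(ks_soi - f) for f in ki_features]
    k = int(np.argmin(dists))
    return k if dists[k] <= t_id else None
```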
2.7. Kinematic Features
To enable analysis of the patient’s kinematics at home, after the SoI’s identification, the related Kinect estimations can be aggregated into one database, from which skeleton-based features for motion and activity analysis can be derived. To demonstrate kinematic analysis based on Kinect features in a home-like environment, we use the common features of joint angular orientation [59] and velocity [60]. Other, more advanced features for kinematic analysis, such as pose changes over time [61], asymmetry measures, and kinetic patterns, can be used to enhance the kinematic analysis, and can also enhance the subject identification by forming a kinematic signature [62].
The angular position is based on the angle between two BPs and a common joint (vertex). It can be a relative angle, such as the angle between the intersecting lines of two BPs and their common joint, or an absolute angle, relative to a reference system. The relative angle between the body parts $\mathbf{BP}_1$ and $\mathbf{BP}_2$, in degrees, is defined as:

$$\theta_m = \frac{180}{\pi} \arctan\left( \frac{\left\| \mathbf{BP}_1 \times \mathbf{BP}_2 \right\|}{\mathbf{BP}_1 \cdot \mathbf{BP}_2} \right) \quad (9)$$

where the operators $\times$ and $\cdot$ are the cross and dot products, and $\|\cdot\|$ is the Euclidean norm.
The subject’s absolute velocity can be estimated from the torso in the $x$-$y$ plane (usually the ground plane) by:

$$v_m = F_s \sqrt{\left( x_m - x_{m-1} \right)^2 + \left( y_m - y_{m-1} \right)^2} \quad (10)$$

where $(x_m, y_m)$ are the subject’s torso $x$ and $y$ coordinates at time instance $m$.
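A short Python sketch of the two kinematic features in Equations (9) and (10) follows, assuming 3-D body-part vectors and 30 Hz sampling; the array layout is our assumption.

```python
import numpy as np

FS = 30.0  # Kinect frame rate in Hz

def relative_angle_deg(bp1, bp2):
    """Relative angle between two body-part vectors (Equation (9)), in degrees."""
    cross = np.linalg.norm(np.cross(bp1, bp2))
    dot = np.dot(bp1, bp2)
    return np.degrees(np.arctan2(cross, dot))

def torso_speed(torso_xy):
    """Planar torso speed per frame (Equation (10)).

    `torso_xy`: (n_frames, 2) torso ground-plane coordinates.
    Returns an (n_frames - 1,) array of speeds.
    """
    steps = np.diff(torso_xy, axis=0)
    return FS * np.linalg.norm(steps, axis=1)
```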
Following the stages above ensures that the kinematic features are those of the SoI. The quality measures of time instances with lower reliability can be fed into the kinematic analysis algorithm. Alternatively, artefactual time instances of features with low estimation quality can be mitigated by interpolation using spatial-temporal correlations, or the mitigation can be performed in the joints domain, using an algorithm that enforces physiological constraints on the skeleton, as in [61], or using statistical information on the skeleton data [62], and then recalculating the features. The “clean” kinematic data can then be used for more accurate activity recognition [63], by applying methods such as dictionary learning over the set of activities [26].
2.8. Implementation Scheme
The implementation is separated into two phases: a training phase to assess the SoI’s KS, and a real-time SoI tracking phase, with estimation of kinematic features that can be used for the kinematic analysis. A more general implementation would continuously update the Kinect signature from the tracking data, together with additional kinematic features. When multiple Kinect sensors are available in the environment, the same implementation scheme can be applied to each sensor separately, and the results combined by choosing the Kinect sensor in which the subject is best observed. A sketch of this two-phase pipeline is given below.
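The following Python sketch ties the previous steps into the two-phase scheme; buffer sizes, thresholds, and function names reuse the illustrative helpers defined above and are not part of the original implementation.

```python
import numpy as np

def calibrate_ks(calib_features):
    """Training phase: estimate the SoI's KS as the median over a calibration
    window recorded under optimal Kinect conditions."""
    return np.median(calib_features, axis=0)

def track_soi(ks_soi, frame_stream, t_id, k=1.5):
    """Real-time phase: per block, clean the features of every KI in the
    active set and detect which KI (if any) is the SoI."""
    for ki_feature_buffers in frame_stream:   # one feature buffer per KI
        cleaned = []
        for feats in ki_feature_buffers:      # feats: (buffer_len, n_features)
            mask = artifact_mask(feats, k)    # Equation (7)
            rep = (np.median(feats[mask], axis=0) if mask.any()
                   else np.median(feats, axis=0))
            cleaned.append(rep)
        yield identify_soi(ks_soi, cleaned, t_id)   # Equation (8)
```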
2.9. Experiment Setup
The motion sensor was a Microsoft Kinect (Kinect v1), which consists of a depth sensor, based on an infrared (IR) projector and an IR camera, and a color camera [32]. Since all versions of the Kinect (v1, v2) lack skeleton identification and suffer from skeleton distortion, the same methods can be applied to all Kinect models, and to other skeleton-based sensing modalities. The sensor has a practical range of 0.8–3.5 m. The angular field of view was 57° horizontally and 43° vertically, and the sensor could be tilted up to 27° either up or down. The analysis software was Matlab (version 2015.a, MathWorks Inc., Natick, MA, USA), with the corresponding Kinect SDK version 1.6 (Microsoft, Redmond, WA, USA) used to collect the Kinect estimations. The RGB frame rate, Fs, was 30 Hz, and the image resolution was 640 × 480 pixels. The trigger time was set to 5 s for calibration and 3 s for tracking.
For evaluation of the quality of the KS features, an offline calibration phase (similar to taking an identification (ID) photo) is performed once before the start of tracking. It is sufficient to estimate only the SoI’s KS, as other subjects’ data can be excluded from the analysis based on their distance from the SoI’s KS in Equation (8). In this phase, the SoI stands in front of the Kinect, at optimal range from the camera (around 3 m [32]), where the full skeleton is included in the image frame under line-of-sight conditions and the skeleton distortion is minimal. The features are smoothed, and their mean and standard deviation are calculated. The KS can be estimated by the median of the KS features. Feature selection is used to choose the best ratio and spread features. This stage results in the SoI’s KS and in confusion matrices that indicate the success rate of the identification. The SoI’s KS is then stored in a database for the tracking stage [63].
In the tracking phase, a double-buffer mechanism is implemented, wherein the time between each data collection is used for calculating and storing the static features in a buffer for all the skeletons in the active set. Only the skeleton data are stored, to minimize memory use. In cases where the activity’s context is necessary for the analysis, the stream of video frames can be used after the desired compression. The artifacts are identified and removed, and the estimations are classified as reliable or unreliable using Equation (7). Then the KI of the SoI is identified using the criterion in Equation (8). The identification and artifact removal can be performed using k-means clustering with a delay equal to the number of frames in the block [64], and the data can be sent in real time to a central analysis center for continuous assessment of the medical condition [44].
The human tracking mode was set to standing/walking postures. The feasibility of the new technology was demonstrated with three experiment sets in two locations. The first experiment set took place in a single room (at Tel Aviv University, Tel Aviv, Israel), where five adult subjects (three males and two females; the fifth subject is one of the authors (GB)) were recorded by the Kinect separately. The second and third experiment sets were recorded in a room at the Neurology Department, Tel-Hashomer Hospital, Ramat Gan, Israel, and included two of the authors (GB, YM). The environment simulated a home, challenging the Kinect with different coverage ranges, movement in and out of the camera range, and shadowing.
2.10. Experiment Sets
The feasibility of the new technology and the validation of the methods were demonstrated in an experiment setup with three sets that represent daily activities commonly used to examine the condition of PD patients. The first experiment set included five subjects walking randomly in a room and was designed to produce statistics for the subject identification algorithm and to verify the significance of the chosen features. In this experiment, the subjects’ static features were extracted and used to validate the capability to distinguish between the different subjects based on their signatures. The second experiment set included two subjects who maneuvered around a hall and performed different activities. It was designed to demonstrate the capabilities of the suggested methods to: (1) Detect the set of features used for classification; (2) identify and re-identify the SoI based on the SoI’s KS from the calibration phase; (3) reject different artifacts (going in and out of the Kinect range, and shadowing); (4) give an indication of the tracking quality and reliability based on skeleton distortion; and (5) validate Freezing of Gait (FOG) detection during the turnaround.
In the calibration phase, the two subjects stood for 7 s. Their KSs were derived and stored for identifying the SoI in the tracking phase. Figure 2 shows snapshots from the Kinect video camera of the calibration and tracking phases. The subjects walked randomly in the room for around 30 s, performing the following activities: (1) walking in a complex pattern in a relatively limited space; and (2) repetitive hand tapping. These activities are similar to experimental paradigms which have been recently employed to test the motor function of subjects with PD [34]. One subject simulated a patient (the SoI), and the other simulated an accompanying person (an interfering subject), such as medical staff (in gait labs) or a non-expert family member (at home). In the tapping experiment, the SoI was the second subject (Figure 2a), and the other subject simulated an interfering subject in the scene. The path was marked by bowling pins (Figure 2b). The third experiment set consisted of a hand-tapping activity.
Tapping is widely used as a measure of bradykinesia in Parkinson’s disease (PD). In addition, parkinsonian subjects generally tap more slowly than non-PD subjects, tap more slowly with the more affected arm, and do not benefit as much from continued practice as normal subjects do, due to the impairment of procedural motor learning in PD [64]. The hand-tapping on a surface, shown in Figure 2c, was designed to demonstrate the capability of the technology to assess the SoI’s kinematic features for commonly used kinematic analyses. The accuracy of the kinematic features, in particular those of spatiotemporal gait variables [35] or of the limbs, at time instances where the Kinect estimations are of high quality, can be assumed to be high and can be used to assist in the diagnosis of the PD condition [19].
4. Conclusions
In this paper, we suggest a new procedure and computational techniques to handle some of the known drawbacks of the Kinect system (v1 and v2): (1) The Kinect does not include subject identification, and thus can assign the Kinect estimations to the wrong subject; and (2) it can suffer from distortions, due to its limited coverage range and shadowing, that can degrade the quality of the subject’s kinematic assessment.
This work provided a limited set of static skeleton-based features, the ratios and lengths of different BPs, which can be used, based on their time-invariance, to identify subjects. A relatively small number of these features was used to build a Kinect Signature (KS) of the Subjects of Interest (SoIs). Distorted estimations were detected and excluded in the feature domain based on the prior knowledge that the features are consistent over time, which enabled identification of the KIs of the SoI using a k-means classification algorithm. Kinematic features based on the SoI’s joint estimations were then derived to demonstrate a kinematic analysis.
The feasibility of the suggested technology was shown in three experiment sets. In the first, the validity of the features for identifying different subjects was verified using a DFA classifier. The identification accuracy for five different subjects was around 99% with all 34 skeleton features, and decreased to around 90% with only two skeleton features. The second experiment demonstrated the applicability of the methods in a challenging environment, in which the SoI, the other subject’s KI, and a distorted skeleton were successfully detected. It enabled merging the SoI’s different KIs into one, and detecting and removing, when needed, low-quality tracking features and time instances. The third experiment set demonstrated the use of dynamic features, such as body-plane velocity and relative upper- and lower-limb angles, to collect kinematic statistics for advanced kinematic analysis. A similar data analysis and implementation scheme can be tailored to the case of multiple Kinect sensors to increase coverage. The KS estimation and reliability criterion can be applied to each sensor separately, and the results combined by choosing the Kinect sensor in which the subject is best observed. More complex approaches can be used in the future to aggregate multiple Kinect sensors into single joint estimations.
In the future, the suggested technology should be tested with more activities and more subjects; more features, such as pose and facial features, should be added to the KS feature set to increase identification accuracy; dynamic features, such as BPs’ standard deviation, changes in posture, stride length, and asymmetry of gait, can be aggregated into the KS; more advanced feature selection and classification algorithms can be applied; multiple synchronized Kinect sensors are planned to be used to increase tracking accuracy and coverage; smart interpolation with additional sensor data should be used; and clinical experiments with patients with different diseases should be conducted to evaluate patients’ condition in the home environment.
Utilization of the suggested system and procedure can enable the Kinect to be a reliable resource for collecting at-home kinematic information related to daily life activities. The Kinect analysis results can then be sent in real time, directly from the patient’s home, to a patient database in the hospital for use in continuous monitoring and follow-up treatments. This technology can further assist in the detection of life-threatening situations, such as falls, fainting, or stroke, among elderly people in their homes.