1. Introduction
Nowadays, in developed countries, significant population aging is observed: the percentage of elderly people in the population is higher than the percentage of young people. In these countries, the current 20% proportion of people aged 60 years and above is expected to increase to 32% by the year 2050. Over the 50 years between 1950 and 2000, the median age rose from 29.0 to 37.3 years, and its continued growth is estimated to reach 45.5 years by 2050 [1].
These figures force the governments of developed countries to take adequate action. Such action mainly consists of monitoring health parameters and physical activity to prevent all types of diseases and life risks, such as falls and frailty due to the absence of systematic, individually selected physical exercise. Taking care of people who need special treatment (the elderly, people with disabilities, or those recovering from injuries, accidents, or serious illnesses) is not limited to satisfying their physiological or material needs, but first of all involves physical, psychological, and social stimulation [2]. As early as ancient times, not without reason, Aristotle said that “movement is life—life is movement”. Thus, all attempts and efforts to provide practical support for such people by encouraging their psychomotor autonomy are of great importance.
To meet the above needs, technical solutions proposed worldwide aim at the non-invasive, convenient, and secure monitoring of supervised human vital signs [1,3]. Such monitoring is expected to reduce the costs of expensive medical equipment and specialized medical and rehabilitation staff and to assist non-professional caregivers in taking continuous care of ill people.
Every approach to an assisted living system raises three issues:
Adequacy of the applied sensor set;
Intrusion of measurement devices in the subject’s environment and behavior;
Violation of the subject’s privacy and vulnerability of the collected data.
With the rising demand for applying new technical solutions in the field of ambient assisted living, scientific works and their outcomes are widely presented [4]. Various approaches to ambient sensor-based monitoring technologies for detecting elderly events (activities of daily living and falls) can therefore be found in the current literature, such as non-contact sensor technologies (motion, pressure, video, object-contact, and sound sensors), multicomponent technologies (combinations of ambient sensors with wearable sensors), smart technologies, and sensors in robot-based elderly care.
With the aim of non-intrusively monitoring human wellbeing at home, domestic energy supplies can also be disaggregated in order to detect appliance usage by means of machine learning and signal processing [5]. This enables identifying behavioral routines, detecting anomalies in human behavior, and facilitating early intervention.
To support the independent life of seniors and people with chronic conditions and potential health-related emergencies, an Internet of Things (IoT) network has been implemented for continuous monitoring [6]. The solution is based on a network that includes mobile phones and unknown third-party mobile relays to transmit the data generated by the IoT sensors to a cloud server.
Since the home environment is usually monitored by sensors producing a vast volume of data, computational methods should process it in an appropriate time [7]. This implies the need for an event-driven framework to detect unusual patterns in such environments.
Another important point is designing and implementing an indoor location and motion tracking system in a smart home setup [8]. The role of such a system is to track the location of the supervised person, based on the room in which they are present at a given time, and to recognize their current activity.
Since human behavior in real daily life is not so predictable, a hybrid framework for human behavior modeling could play a major role in managing the changing nature of activity and behavior. A feedback-based mechanism could be significant for recursively appending new events and behaviors and classifying them as normal or abnormal [9].
Due to the rapid evolution of Ambient Assisted Living (AAL), there is also a need for standardization, uniformity, and facilitation in system design [10]. The paper presents the latest survey of AAL system models and architectures. The authors investigated AAL system requirements and implementation challenges, along with Reference Model (RM) and Reference Architecture (RA) definitions, demands, and specifications.
Simple unimodal approaches propose using a motor signal that adequately describes the state and behavior of the monitored person. This type of measurement allows not only initiating an alarm in a dangerous or unusual situation [11,12] but also specifying the degree [13] and type of daily physical activity [14]. It has also been found helpful in evaluating rehabilitation progress and providing biofeedback to support the growth of psychological motivation and engagement in physical exercises [15].
In multimodal approaches, the activity sensors use various physical measurements and data fusion methods to provide consistent information about the subject’s activity. This usually raises a question about the adequate usage of particular sensor types according to their advantages in specific scenarios. Having studied numerous papers on ambient assisted living, drawing on longstanding personal experience, and being inspired by the rules of nerve sensitivity modulation in humans, we were motivated to propose a multisensor system with an adaptive contribution of particular sensors to the final behavior classification, according to the present and most probable future actions. The scope of the reported research includes the analysis of the performance of the four most commonly applied assisted living sensors (three of them wearable) in six elementary reversible human activity types. Based on this analysis, background rules of sensor contribution have been proposed and applied to build an auto-optimizing multimodal surveillance system. The main purpose of the work is to confirm the complementary competencies of the sensors and the benefits resulting from their adaptive contribution in realistic assisted-living scenarios.
Consequently, the main novelty presented in this paper is the concept of a system for the recognition of human daily activity that adapts the process of multimodal data fusion following the criteria of sensitive, selective, non-intrusive, and privacy-protective measurements (Section 3).
To this point, we tested basic behavioral measurements with a custom-built multimodal surveillance system (Section 4), registered and interpreted many different vital signs from supervised people with low-cost and easy-to-use sensors, and compared their sensitivity and selectivity of action recognition. Elements of this system have been developed as the result of different previous projects focused on single sensing modalities, such as control of the living environment with eye movements [16], motor cortex rhythm [17], facial information [18], and sound recognition [19]. The cooperation of several sensors with different characteristics has been proposed in two other projects dedicated to the supervision of humans during sleep [20,21]. We also contributed to research aimed at the development of sensor networks for supervising humans in motion based on motion patterns from wall-mounted cameras [12,22] or data from wearable devices [23,24]. Finally, two approaches to sensor data fusion from multimodal sensing systems have been proposed in [25,26].
This research, summarized in Section 5, paved the way to propose two algorithms for continuous modulation of the extent of influence of each particular sensor on the final recognition (Section 6). Section 7 presents the case studies, Section 8 contains the discussion, and Section 9 the concluding remarks.
3. Concept of Adaptive Sensing
The concept of continuous adaptation of sensors’ contribution in a multimodal system originates from the rules of information propagation in living neural systems. Let us briefly recall the two different types of chemical synapses: ionotropic, with a quick and short synaptic response, specialized in fast sensory or executory, excitatory or inhibitory pulse messaging; and metabotropic, with a delayed and long-standing response, primarily responsible for the modulation of pulse conduction. Thanks to these two complementary types of synaptic junctions, all mammals select the dominating and auxiliary senses they actually use to perceive their surroundings.
Mimicking the above-mentioned natural rule of neural modulation in a technical multisensor assisted living environment requires solving two issues:
Determining competence areas and performance hierarchy in a given sensor set;
Specifying data stream modulation rules, allowing to adapt each sensor’s contribution to a final decision.
Initially, we assume each sensor to have an exclusive sector of competence area, where no other sensor is applicable, and its complementary sector, where it competes with one or more other sensors. Although the accuracy and reliability are most naturally selected as competence criteria, a variety of other parameters are applicable in a real surveillance system: availability, intrusiveness, energy consumption, etc. Moreover, the cooperation of two sensors in a common competence sector yields valuable information about the coherence of their data streams, which may be useful in other scenarios to assess the quality of measurements relying only on the auxiliary sensor (e.g., when the principal sensor data are unavailable).
In the following sections, we develop this concept by examining the sensor set and sensor-specific preprocessing software (Section 4) in an experimental detection of human motor activities (Section 5). The discussion of the experiment outcome is followed by a proposal of two data stream adaptation algorithms (Section 6) and the presentation of a use case (Section 7). The discussion and future remarks (Section 8) conclude the paper.
6. Reliability-Driven Sensor Data Fusion
6.1. General Assumptions and System Design
The general architecture of a multisensory environment for assisted living consists of sensors, dedicated feature extraction methods, and modality selectors. The proposed innovation replaces the selector with a modulator using weight coefficients Wk (Figure 6) to prefer the most pertinent features while discriminating against the others. As the sensors use specific signals (muscular, pressure, acceleration, and video), one consequence of replacing the feature selector with a modulator is the necessity of a uniform representation of all features. To this point, the feature calculation step unifies the information update rate and normalizes the feature values. The output of each sensor is given as a probability-ordered list of activities {Ai, pi} (see Figure 6 and Figure 7).
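As a minimal sketch of this uniform representation (the function name is ours, not part of the system), raw per-activity scores from any sensor can be normalized into such a probability-ordered list:

```python
def to_activity_list(scores):
    """Convert raw per-activity scores from one sensor into a
    probability-ordered list [(A_i, p_i), ...], highest probability first."""
    total = sum(scores.values())
    probs = {a: s / total for a, s in scores.items()}
    return sorted(probs.items(), key=lambda ap: ap[1], reverse=True)

# Illustrative scores; any sensor-specific scale works after normalization.
ranked = to_activity_list({"sitting": 2.0, "standing": 1.0, "bending": 1.0})
```

Because every sensor emits the same list format, matching and fusion can then operate purely on the list level, regardless of the underlying signal type.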
Three coefficients are proposed to modulate the influence of each sensor on the final decision about the detected activity. These are listed and shortly explained below.
Hk is an activity-independent coefficient characterizing each sensor’s cost, including hardware, installation, and maintenance, as well as human factors like the acceptance of each particular sensor set (cameras at home, accelerometer belts or bands, electrodes, etc.); we consider all these factors to be constant in time, so these values need to be evaluated only once per subject. In order to efficiently adapt the choice of sensors, extreme values of Hk should be avoided.
Rk(A) is an activity-dependent reliability factor; as demonstrated in Section 5, sensors show different performance in the detection of basic human daily activities; accordingly, in the system paradigm, Rk(A) is the primary factor adapting the contribution from the sensors to the current activity of the monitored subject.
L(n) is a penalty factor that discriminates the influence of sensors depending on their position n in the reliability ranking for determining the activity A by sensor k; the actual penalty factor is calculated based on a coefficient p: low values of p equalize the ranking list, which makes the system work mostly with multiple sensors while avoiding the worst ones, whereas a high value of p prefers the winner as the unique working sensor.
The contribution of each sensor k may thus be determined from these coefficients and normalized over the whole set of sensor weighting coefficients.
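A minimal sketch of this weight computation follows; the penalty form L(n) = p^(-(n-1)) is our assumption (it matches the described behavior of p, with rank 1 unpenalized), not necessarily the paper’s exact formula:

```python
def penalty(n, p):
    """Assumed penalty L(n) = p**-(n-1): the top-ranked sensor (n = 1) is
    unpenalized; larger p suppresses lower-ranked sensors more strongly."""
    return p ** (-(n - 1))

def sensor_weights(H, R_A, rank_A, p):
    """W_k = H_k * R_k(A) * L(n_k), normalized to sum to 1 over all sensors.
    H: cost/acceptance factors; R_A: per-sensor reliability for activity A;
    rank_A: per-sensor position in the reliability ranking for A."""
    raw = {k: H[k] * R_A[k] * penalty(rank_A[k], p) for k in H}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

weights = sensor_weights(H={"B": 1.0, "D": 1.0},
                         R_A={"B": 0.9, "D": 0.9},
                         rank_A={"B": 1, "D": 2}, p=2.0)
```

With p close to 1 the ranking flattens and several sensors contribute; with large p the weight mass concentrates on the top-ranked sensor, effectively switching to single-sensor operation.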
In accordance with the currently detected subject’s action, the system automatically adapts the feature set (Table 2) to optimally detect the present action. The optimization criteria may be freely selected from the variables presented in Section 4.4 and used jointly with other attributes (including non-technical ones such as acceptance, usage cost, etc.). To keep the presentation simple, we use the correctness of recognition (given in Table 3). In a real system, besides the subject’s action, the selection of sensors also takes into account constant factors like cost, availability, or acceptance of a sensor by individual subjects.
Instead of applying recognition correctness generalized for all volunteers, an individual table, equivalent to Table 3, may be built for each supervised subject. The personalization of the multisensor environment improves individual performance (compare columns in Table 4) but requires a set of exercises performed under the supervision of a human assistant who annotates the activities and checks the recognition correctness (or other optimization criteria).
Based on the selected optimization criterion (in our example: generalized correctness of recognition, Table 3), a hierarchy of feature vectors is built for each detected activity. Taking the action “bending” (5a) as an example, we have the following sensor set hierarchy:
(highest) BDE;
(BD, BE, DE, CDE);
(BCD, BCDE);
(CE, BCE);
CD;
(lowest) BC.
It is noteworthy that BD yields better results than BCD; therefore, using more sensors does not necessarily lead to better results, and adding a sensor (C in this case) may degrade the recognition correctness.
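Such a tiered hierarchy can be derived automatically from any scalar criterion. A sketch with illustrative scores (not the actual Table 3 values) that reproduces the BD-above-BCD situation:

```python
from itertools import groupby

def build_hierarchy(correctness):
    """Group sensor sets into tiers of equal score, best tier first."""
    ranked = sorted(correctness.items(), key=lambda kv: -kv[1])
    return [[name for name, _ in tier]
            for _, tier in groupby(ranked, key=lambda kv: kv[1])]

tiers = build_hierarchy({"BDE": 1.00, "BD": 0.95, "BCD": 0.90, "BC": 0.50})
```

Rebuilding the hierarchy per detected activity is cheap, since only the score table changes between activities.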
The modulation of the sensors’ contribution presented above is confirmative: the detection is first roughly made with a possibly non-optimal sensor set and then confirmed with an adapted set. The modification closes the information loop and, like all kinds of feedback, raises a stability issue if the action detected with the adapted features does not match the one initially detected. The other drawback of confirmative detection is that a possibly erroneous first detection may lead to an even less optimal sensor set that confirms the erroneous decision.
6.2. Stability Condition for Modulated Sensor Set
The stability issue in a sensor set with modulated contribution can be solved by limiting the weight modulation range. Let f be a function A = f(Sk; Wk) assigning a unique subject’s action A to specific sensor outputs Sk modulated by Wk. This means that all probability values pi of given activities Ai from sensor k are multiplied by Wk.
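In symbols, with N_A denoting the number of activity classes, the scaling reads (a direct transcription of the sentence above):

```latex
p_i' = W_k \, p_i, \qquad i = 1, \dots, N_A
```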
Let m be a function Wk = m(A) modulating the contributions from the sensors Sk to maximize the reliability of the recognition of A. The modulator is therefore stable if the modulation does not influence the current recognition result.
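A plausible formal statement of this condition, reconstructed from the definitions of f and m above, is:

```latex
f\bigl(S_k;\, m(A)\bigr) = A, \qquad \text{where } A = f(S_k;\, W_k)
```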
Since we cannot expect the recognition result to be a linear function of the modulation depth, we propose an iterative trial-and-error algorithm that finds the modulation limits. To find the value of Wk between the original Wk1 and the desired target Wk2, the algorithm repeatedly bisects the interval and selects for further processing the subinterval whose ends yield different actions.
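The bisection can be sketched as follows for a single scalar weight; `recognize` stands in for the whole recognition function f, and the names are ours:

```python
def stable_weight(recognize, w_orig, w_target, tol=1e-3):
    """Move the weight from w_orig toward w_target as far as possible
    without changing the recognized action, bisecting the boundary.
    recognize(w) maps a weight value to a detected action label."""
    a_ref = recognize(w_orig)
    if recognize(w_target) == a_ref:
        return w_target  # full modulation is already stable
    lo, hi = w_orig, w_target  # interval ends yield different actions
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2.0
        if recognize(mid) == a_ref:
            lo = mid  # stable side grows toward the boundary
        else:
            hi = mid
    return lo  # closest tested weight that preserves the action
```

In the multisensor case, the same search could be applied along the line segment between the original and target weight vectors.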
All necessary steps of the modulation algorithm are performed within the subject state sampling interval. New data gathered from the sensors are processed with optimized sensors’ contribution and confirm the detected subject’s action.
The stability issue can also be avoided by applying a sensor set consistency rule. This rule uses the past sensor set as a reference and requires the new set to be as similar to it as possible. Continuing the example given in Section 6.1, if “bending” has been detected with the BE sensors and “straightening the trunk” (5b) occurs thereafter, the sensor set hierarchy is the following:
(BD, BDE);
(BE, DE);
(CE, BCD, BCE, CDE, BCDE);
CD;
BC.
Maintaining the BE configuration is preferred over changing to DE, despite their equal performance, for the stability reason.
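The consistency rule amounts to a tie-break within a tier of equally performing sensor sets; a minimal sketch, with sensor set names as in the example above:

```python
def most_consistent(tier, previous):
    """Among equally performing sensor sets, keep the one that shares
    the most sensors with the previously used set."""
    prev = set(previous)
    return max(tier, key=lambda s: len(set(s) & prev))

# "Bending" was detected with BE; for "straightening the trunk",
# BE and DE perform equally well, so BE is kept.
choice = most_consistent(["BE", "DE"], previous="BE")
```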
6.3. Predictive Modulation of Sensors’ Contribution
One may question the purpose of an optimization that only confirms a recognition result already made. Fortunately, in most assisted living environments, where the prevention of dangerous events is stressed as a primary goal, the architecture usually includes an artificial intelligence-based system for learning the subject’s habits and detecting unusual behavior as a potential sign of danger. Such systems gather information about individual habits in the form of a database learned and updated from records of real past behavior. Such a database provides activity statistics but, more interestingly, for each given activity the most probable next activity can be determined. We propose to use the information from the individual’s habits database to predict the subject’s upcoming action and to adjust the sensors’ contribution accordingly (Figure 7). The modulation is still made according to the stability requirements (see Section 6.2), but the sensors’ contribution now adapts to the most probable next action of the subject.
Introducing the habits database in the feedback path has two benefits:
Prediction of the upcoming action takes into account multimodal time series instead of single points, which stabilizes the prediction in case of a singular recognition error;
Focusing on optimal recognition for current action makes the system conservative (i.e., expecting a stable status), whereas optimizing for future action makes it progressive (i.e., awaiting changes of the status).
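A first-order sketch of such a habits database, assuming simple transition counting between consecutive activities (the class and its methods are hypothetical, not the system’s actual interface):

```python
from collections import Counter, defaultdict

class HabitsDB:
    """Learn first-order activity transitions from past behavior and
    predict the most probable next activity."""
    def __init__(self):
        self._transitions = defaultdict(Counter)
        self._last = None

    def observe(self, activity):
        """Update transition counts with a newly recognized activity."""
        if self._last is not None:
            self._transitions[self._last][activity] += 1
        self._last = activity

    def predict_next(self, activity):
        """Most probable successor of `activity`, or None if unseen."""
        counts = self._transitions[activity]
        return counts.most_common(1)[0][0] if counts else None

db = HabitsDB()
for a in ["sit", "stand", "walk", "sit", "stand", "walk", "sit", "lie"]:
    db.observe(a)
```

The predicted successor, rather than the current activity, then drives the weight modulation, subject to the stability limits of Section 6.2.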
8. Discussion
The results showed that it is possible to recognize the selected motor activities of everyday life with high reliability by using different kinds of individual sensors as well as their 2-, 3-, or 4-element sets. Although some activities are recognized less reliably with some sensors, in such cases it is possible to successfully use the data from other sensors (see the discussion and conclusions in [50]) or sensor sets for which the outcome is more reliable. As can be observed from Table 3 and Figure 4, recognition using sensor sets very often reaches higher values (94.1–100%) than recognition using individual sensors, for any type of activity. The same observation can also be drawn from Table 4a, Table 4b, and Figure 5, which very often show better results for sensor sets (88.3–100%) than for individual sensors, for any volunteer. Opposite cases sometimes occur, but only when an individual sensor with lower recognition for some activity or volunteer is included in a sensor set. In such a situation, this sensor decreases the recognition of the sensor set below that of the other individual sensor with higher recognition.
To sum up, the individual sensors have complementary scopes of competence, and exchanging them depending on the current situation yields better results than using a rigidly defined sensor set.
Studying the sensors’ performance in the recognition of six elementary daily living activities, we confirmed that particular sensors show their optimal recognition accuracy for different movements (Table 3). Consequently, due to the complementary competencies of the sensors, combining information from multiple different sensors is expected to give more reliable recognition. Unfortunately, in compound actions, the true recognition falls into a border area or actually moves from the area of competence of one sensor to another. This remark was the foundation of the presented concept, design, and prototype of an assisted living system with adaptive sensor contribution.
Based on the comparison of the accuracy of activity recognition by four different assisted living sensors, we built activity-specific sensor priority lists and proposed a multimodal surveillance system with adaptive sensor contribution. We used the setup as a model of a sensorized environment in which multiple sensors of possibly different paradigms and performance cooperate in the surveillance of a human. We assumed that sensors not only differ in reliability depending on the subject’s action but may also give consistent or contradictory results. We proved this assumption in experiments showing that adding sensors may decrease the correctness of recognition (Table 3).
Since the sensor data differ in form and refresh rate, sensor-specific data processing was applied first to provide data in a uniform format before fusion. The sensor-independent format was a list of activities ordered by descending detection probability. Activity data matching and fusion are performed on the list level, which also allows for continuous adaptation of the sensors’ contribution to the final result of the network. This proposal has been inspired by a neuromodulatory mechanism which, although far more complicated, also leads to modulation of the information flow from the senses to the brain.
Biomimetic modulation of a sensor’s contribution in a multisensory assisted living environment puts forward its advantages according to the subject’s behavior. Being aware of the limitations present in any human behavior model, we took selected daily living activities as samples in a continuous space of possible behaviors and tried to represent the actual behavior with a measure of similarity to these primitives [34]. In this paper, we showed that sensors, due to the specificity of their working principles, are somewhat ‘specialized’ in the recognition of particular poses or activities. Consequently, if a compound activity is represented by a set of elementary poses of varying contributions (see Section 7.3), the surveillance system, besides other limitations (see Section 7.2), should optimize the flow of sensor data seamlessly.
Regarding the related works, the main novelty of this paper is the ongoing adaptation of the sensor set depending on the subject’s behavior. Since the range of activities is virtually unlimited and the prediction of the most probable future action is uncertain, optimization rules had to be proposed; they were implemented as:
Sensor cost—to balance the sensor usage;
Penalty factor—to balance between multimodal and single mode-switching system;
Stability check—to maintain decision on detected activity while modifying sensors’ contribution.
Since human activity is a dynamic process, the contribution of the sensors needs to be considered time-varying. To this point, in the design of the multimodal assisted living system with adaptive sensor contribution, we proposed considering conservative and predictive adaptation. Conservative adaptation assumes the sensor contribution is adapted after the activity recognition and, in case the adapted system issues a different result, raises the stability issue, which can be solved in several ways (e.g., see Section 6.2). Predictive adaptation requires the use of a subject’s habits database, which has to be created and trained, but which already contains a personalization factor. Moreover, the prediction of behavior is never 100% accurate, which needs to be taken into consideration in the design of the adaptation rules.
We used four different sensors with quite good performance in the given experimental setup. However, one should consider more difficult or unstable conditions (e.g., lighting) and simplified sensors (e.g., when energy consumption is taken into consideration). The maximum error the system will make in activity recognition is expected to equal the error of the second-best sensor.
Conservative adaptation in the two-sensor mode (p > 1) may give erroneous recognition, which (according to Table 3) may be inaccurate in 5.9% of cases (activity 4b, sensors C and E). The stability check in conservative adaptation prevents the system from changing the recognition decision based on an inappropriate change of sensors. The proposed new sensor set is applied in a subsequent sensing step, and if the previous activity is maintained and the new settings are appropriate, a more accurate recognition will be issued.
In predictive adaptation, unexpected behavior may affect the sensor set adaptation, making the newly proposed set inappropriate. In this case, again, one should consider that a less accurate sensor may be proposed and the overall reliability will decrease. Unlike in the conservative case, the subject’s history (represented in Figure 7 as the “habits” database) helps to avoid the adaptation mismatch. However, it is worth noting that we used only a single-step prediction (i.e., the next most probable activity was taken as the background for sensor adaptation), and future studies are necessary to potentially extend the prediction range to a tree of n future activities.
Our studies presented here were performed with data recorded from specific sensors (including custom sensor-specific software, Figure 2) in the test environment described by Smoleń [50]. With different sensors, particular findings (such as Table 3) may differ significantly, but the general rule of building sensor set hierarchies is universal and worth following up by other scientists developing multimodal human activity sensing systems. Therefore, we found it more reasonable to present the system operation in four case studies than to give a quantitative evaluation of setup-specific activity detection efficiency.
Building such a prototype system combining wearable and infrastructural sensors is the aim of our next project. Also, the question of the initial personalization of recognition and data flow rules needs to be reconsidered in the context of a working prototype.