1. Introduction
Amid the expansion of the Internet of Medical Things (IoMT), a significant paradigm shift in tele-healthcare has been observed. In particular, the monitoring of vital signs is gaining attention and is no longer regarded as a luxury but as a necessity in this field. Four principal vital signs are used to establish an early warning score, namely body temperature, blood pressure, heart rate (HR) and respiratory rate (RR). This score is the gold standard for quantifying the degree of illness of patients [1,2].
The ballistocardiogram (BCG) signal is a noninvasive, unobtrusive measure of the ballistic force generated by the circulation of blood in the body. Several sensors have been designed to extract the BCG signal, using, for instance, cameras [3], accelerometers [4], weight scales [5,6], ear wearables [7] and piezoelectric sensors [8]. Recently, a microbend Fiber Optic Sensor (FOS) placed under mattresses has attracted interest; this noninvasive and unobtrusive method is boosting the use of the BCG in telemedicine [9], since it provides more flexibility and eliminates the inconveniences of invasive or minimally invasive methods [10].
Several works have studied the extraction of vital signs from BCG signals. For instance, in [11], the authors developed different methods for the extraction of the RR, such as the Harmonic and Noise Model (HNM) and the Wavelet Transform (WT), with promising results. In [12], the authors proposed a machine learning-based approach for the estimation of the beat-to-beat HR from a highly volatile BCG signal: features describing the BCG waveform were extracted, and a modified k-means algorithm was used as an inference tool. Other works, such as [13], implemented a template matching approach to detect the J-peaks of the BCG waveform. In [14], two different methods were implemented for HR detection, namely the Maximum Overlap Discrete Wavelet Transform (MODWT) and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN).
However, these works considered signal segments with preset properties describing ideal conditions (a subject lying on the mattress and breathing normally) and are thus distant from the challenges met when real-life data are processed. These challenges arise from spontaneous or illness-related activities of the subject during the recording, such as movement, coughing or holding one's breath.
The main focus of this paper is to extend the systems developed in previous works for measuring HR and RR by adding a preprocessing stage, made of classification and segmentation tools, that decides in which portions of the BCG signal HR and RR can be measured. The full approach also allows the precise determination of the start and end points of the different kinds of human body activities that can be identified from the BCG signal.
We adopted a progressive approach, the flowchart of which is presented in Figure 1. After the BCG signal preprocessing, the first step, content-based exploration and understanding of the BCG signal, consists of the temporal and spectral analysis of the signal during different human body activities (for example, resting, coughing and movement). This analysis led us to consider two spectral features, namely the Spectral Flatness Measure (SFM) and the Spectral Centroid (SC). Both are computed frame by frame, that is, the BCG signal is decomposed into slices of predefined duration. After analyzing the behavior of these features, a clustering task, which is the core of many segmentation methods, is carried out to determine how useful the selected features are for splitting the BCG signal into several classes. The number of classes and the frame size hyper-parameter are optimized using Gaussian Mixture Model (GMM) clustering. Next, to establish the link between these classes and the nature of the human body activity, a supervised classification is proposed in two forms. The first is a pair of two-class problems: one decides whether a frame permits the measurement of HR (binary flag: cardiac activity detection (CAD)/NoCAD), and the other whether it permits the measurement of RR (respiratory activity detection (RAD)/NoRAD). Such binary outputs are useful to drive HR and RR detectors; during coughing, for example, there is no point in trying to measure the RR, since respiratory activity information is absent. The second supervised classification is a refined one, since it identifies the type of human body activity (resting, coughing, expiration, movement, etc.). For it, we consider coarse and fine approaches: the coarse approach uses features calculated for each frame, while the fine approach performs a sample-by-sample classification. To this end, frame-by-frame features are converted into time-series using a novel idea presented in this paper. The segmentation then consists of identifying the boundaries of each human body activity.
2. Material and Methods
2.1. Data Collection
2.1.1. Acquisition Process
The project was carried out at the Centre Hospitalier de l’Université de Montréal (CHUM) in Montréal, Canada. The protocol was approved by the institutional ethics committees of the École de technologie supérieure (ÉTS), TELUQ University, the University of Sherbrooke and CHUM.
The study was conducted on six healthy subjects (three males and three females) aged between 20 and 35 years, without known cardio-respiratory disorders. Once the subjects agreed to participate in the study, their demographic and anthropometric details (age, weight, height, etc.) were collected.
The system used for collecting data included a small FOS mattress and a module to gather optical data from the mattress. More details about this process and about the feasibility, design and performance of FOS microbend sensors are available in the literature (for example, [11] or [15]). The FOS mattress was fixed on the back of a regular office chair, as shown in Figure 2. The raw data were sampled at 50 Hz by the module. During the whole experiment, the subject wore a Hexoskin smart shirt and was asked to sit still on the chair for a duration of 5 min. During this time, a Raspberry Pi 3B recorded the data streamed on the serial port of the FOS module.
2.1.2. Experimental Protocol
To simulate common human body activities, subjects were asked to follow an experimental protocol composed of the following steps: (1) normal breathing for 30 s, (2) coughing three times, (3) normal breathing for 300 s, (4) holding breath for 30 s, (5) expiration, (6) normal breathing for 60 s, (7) coughing 10 times with a between-cough interval of 5 s, (8) normal breathing for 30 s, (9) coughing 10 times with a between-cough interval of 2 s, (10) normal breathing for 120 s, (11) standing up, (12) sitting down, (13) steps 10 and 11 repeated 4 times and (14) standing up.
Thus, besides remaining in a still position, the subjects performed other common human body activities that can alter vital sign measurements from BCG signals. We explored (i) coughing, a frequent activity that induces internal disturbances in the respiratory and cardiac rhythms; (ii) holding one’s breath, during which no respiratory activity can be measured while cardiac activity can still be extracted; and (iii) standing up and sitting down, as examples of movement.
Figure 3 illustrates the BCG signal during the experimental protocol activities. This is discussed in detail later.
The ground truth, i.e., the human body activities, was annotated manually by experts, who adjusted the activity boundaries to make it precise. These labels were later transferred to the frames and, through them, to the spectral features.
2.2. Ballistocardiogram Signal Description
2.2.1. Cardiac Information
The BCG signal reflects the acceleration of blood through the vascular system. In broader terms, it is a measure of the ballistic force generated by the heart. For a person placed on a bed with minimal friction and minimal movement artifacts, one ideal cardiac cycle in the BCG signal is illustrated in the enlarged part of Figure 4. It contains eight waves, annotated F, G, H, I, J, K, L and M, which are divided into three groups: pre-ejection (F, G, H), ejection (I, J, K) and diastolic (L, M). The F wave (rarely seen in the BCG representation) is closely related to pre-systolic events of the heart. The G wave corresponds to a trough preceding the systolic waves. The H wave is the first one in the systolic cycle; it corresponds to the maximum peak recorded (which is synchronous to the isovolumetric contraction) [17]. The I, J and K waves are the most recognized waves of the BCG, occurring during systole. The I wave reflects the acceleration of blood in the ascending aorta and pulmonary arteries; the J wave is the main positive wave of systole, reflecting the acceleration of blood through the second part of the aorta; and the K wave occurs before the end of systole. L and M are the diastolic waves, which represent headward deflections following the K wave [18,19].
2.2.2. Respiratory Information
It is worth mentioning that the described waveform depends greatly on the acquisition system used. In fact, multiple sensors based on different electrical, optical or mechanical devices are currently available [20]. For example, optical fiber-based systems extract the aforementioned waveform, but the resulting signal is believed to also contain a respiratory component, which corresponds to the movement of the patient's thoracic cage [11,21]. The BCG signal studied in this work was acquired using a microbend FOS, which provides a new way of acquiring the mechanical activity of the human body. More precisely, we used Juvo’s non-skin-contact sensor, which benefits from the patented invention of fiber-optic sleep and vital signs sensors [21,22]. The BCG signal is formed by the attenuation of the intensity of the light passing through an optical fiber in response to a mechanical stimulus on the fiber, as shown in Figure 5.
The FOS, typically placed at the level of the thoracic cage, captures the longitudinal BCG as well as the inspiration and expiration of the body [23]. During inspiration, the inhalation of air causes a downward trend in the BCG signal, which corresponds to the increase of the force exerted on the sensor placed behind the thoracic cage. During expiration, the exhalation of air causes a decrease in the force on the sensor, leading to an upward trend. The succession of these downward and upward trends constitutes the respiration waveform of the BCG [21,24]. An example of a respiratory cycle in the BCG signal acquired with the FOS system is illustrated in the upper part of Figure 4.
2.3. Spectral Features for the Identification of Various Activities
2.3.1. Spectral Features Motivation
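The cardiac and respiratory activities are (quasi-)periodic. Let $x_p(t)$ denote any one of the periodic signals generated in the BCG signal by the cardiac or the respiratory activity, with period $T_0$ and fundamental frequency $f_0 = 1/T_0$. Because of its periodicity, its Fourier transform is a sum of Dirac pulses equally spaced on the frequency axis at frequencies that are multiples of $f_0$:
$$X_p(f) = \sum_{n=-\infty}^{+\infty} c_n\,\delta(f - n f_0),$$
where $c_n$ is the $n$th Fourier series coefficient, calculated over one period according to the following formula:
$$c_n = \frac{1}{T_0}\int_{0}^{T_0} x_p(t)\, e^{-j 2\pi n f_0 t}\, dt.$$
Thus, a periodic signal can be represented as a sum of sinusoids, and its Fourier transform is spiky: the energy is concentrated in a few narrow lines. Activities that break this periodicity (coughing, movement) are expected to spread the energy over a wider band, which motivates features that quantify how flat the spectrum of a frame is and where it is centered.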
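A quick numerical check of this line-spectrum property is sketched below, assuming NumPy and a synthetic signal rather than recorded BCG data. It tiles one arbitrary 50-sample cycle (1 s at the 50 Hz rate of the FOS module) ten times and keeps only the DFT bins with non-negligible magnitude; the helper name `occupied_bins` and its relative tolerance are illustrative choices. Only multiples of the fundamental (1 Hz here) survive, which is the discrete counterpart of the equally spaced Dirac pulses described above.

```python
import numpy as np

def occupied_bins(x, rel_tol=1e-9):
    """Indices of rfft bins whose magnitude exceeds rel_tol times the largest one."""
    mag = np.abs(np.fft.rfft(x))
    return np.flatnonzero(mag > rel_tol * mag.max())

fs = 50.0                          # sampling rate (Hz)
rng = np.random.default_rng(0)
cycle = rng.standard_normal(50)    # one synthetic, non-sinusoidal 1 s cycle
x = np.tile(cycle, 10)             # strictly periodic signal: 10 cycles, 500 samples

bins = occupied_bins(x)
freqs = bins * fs / len(x)         # bin k corresponds to k * fs / N Hz
print(freqs)                       # only integer multiples of f0 = 1 Hz appear
```

A real BCG is only quasi-periodic, so its lines are broadened rather than perfectly sharp, but the contrast with broadband activity remains.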
We used two features, the Spectral Flatness Measure (SFM) and the Spectral Centroid (SC), computed as follows. First, the discrete-time BCG signal, denoted $x(n)$, is decomposed into short frames. These frames should be long enough to carry information about the activity, but not so long that two or more different activities overlap within one frame. In the frequency domain, the short-time Fourier transform is calculated and its magnitude is extracted; it is denoted $|X(m,k)|$, where $m$ is the frame index and $k$ is the discrete frequency index.
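The frame decomposition and magnitude spectrum can be sketched with NumPy as follows. This is an illustrative implementation, not the authors' code: the frame length (1024 samples), hop (64 samples) and Hamming window follow the values given in Section 2.6, the 50 Hz rate follows Section 2.1.1, and the function name `stft_magnitude` is hypothetical.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=64, fs=50.0):
    """Hamming-windowed frames of x and their magnitude spectra |X(m, k)|.

    Returns (mag, freqs): mag has shape (n_frames, frame_len // 2 + 1) and
    freqs holds the bin frequencies f_k in Hz.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row m indexes samples m*hop ... m*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx] * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return mag, freqs

rng = np.random.default_rng(0)
bcg = rng.standard_normal(15000)   # placeholder for 5 min of 50 Hz data
mag, freqs = stft_magnitude(bcg)
print(mag.shape)                   # (219, 513)
```

For a 5 min recording at 50 Hz (15,000 samples), this yields 219 frames of 513 frequency bins, spanning 0 to 25 Hz.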
2.3.2. Spectral Flatness Measure (SFM)
This is also known as Wiener entropy [25]. It is a signal processing measure used to describe the flatness of the spectrum of a signal, defined as the ratio of the geometric mean to the arithmetic mean of the spectrum:
$$\mathrm{SFM}(m) = \frac{\left(\prod_{k=0}^{N-1} |X(m,k)|\right)^{1/N}}{\frac{1}{N}\sum_{k=0}^{N-1} |X(m,k)|},$$
where $m$ is the frame number, $k$ is the frequency bin index and $N$ is the number of frequency bins.
By the inequality of arithmetic and geometric means, $0 \le \mathrm{SFM}(m) \le 1$. If the spectrum of the frame is flat, $\mathrm{SFM}(m)$ will be close to 1; for ideal white noise, the spectrum is constant and the SFM is equal to one. Conversely, a spiky spectrum, such as that of a periodic signal, yields an SFM close to 0. The SFM is mostly used in audio-related fields, for example in the detection of voiced and unvoiced speech frames [25], the analysis and recognition of whispered speech [26] and emotional speech synthesis and transformation [27]. However, few works have used such features in the context of vital signs, which makes their use in this work novel. We believe that the pertinence of this feature is supported by the intuition of its physical significance.
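The ratio of geometric to arithmetic mean can be computed per frame as in the sketch below. It assumes NumPy and a magnitude array of shape (frames, bins) such as $|X(m,k)|$; the function name `spectral_flatness` and the small constant `eps` guarding against log(0) are illustrative choices, not taken from the text.

```python
import numpy as np

def spectral_flatness(mag, eps=1e-12):
    """Per-frame SFM: geometric mean / arithmetic mean of |X(m, k)| over the bins.

    mag: array of shape (n_frames, n_bins). The geometric mean is taken in the
    log domain for numerical stability; eps (an assumption) avoids log(0).
    """
    mag = np.asarray(mag, dtype=float) + eps
    geometric = np.exp(np.mean(np.log(mag), axis=-1))
    arithmetic = np.mean(mag, axis=-1)
    return geometric / arithmetic

flat = np.ones((1, 513))      # perfectly flat spectrum
spike = np.zeros((1, 513))
spike[0, 20] = 1.0            # all energy in a single bin (ideal periodic case)
print(spectral_flatness(flat), spectral_flatness(spike))   # about 1 and about 0
```

Computing the geometric mean as the exponential of the mean logarithm avoids the underflow that a direct product of 513 small magnitudes would cause.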
2.3.3. Spectral Centroid (SC)
The Spectral Centroid indicates the location of the center of mass of the spectrum. It is defined as follows:
$$\mathrm{SC}(m) = \frac{\sum_{k=0}^{N-1} f_k\,|X(m,k)|}{\sum_{k=0}^{N-1} |X(m,k)|},$$
where $f_k$ is the frequency in hertz of the frequency bin $k$. Because the Spectral Centroid is a good predictor of the “brightness” of a sound, it has been widely used in digital audio and music processing, for example in musical genre classification [28]. To the best of our knowledge, its use for a BCG signal is also novel.
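The centroid is a magnitude-weighted average of the bin frequencies, which the following NumPy sketch computes for every frame at once. The function name, the `eps` guard for all-zero frames and the toy 0 to 5 Hz bins are assumptions made for illustration.

```python
import numpy as np

def spectral_centroid(mag, freqs, eps=1e-12):
    """Per-frame SC: magnitude-weighted mean of the bin frequencies f_k (Hz).

    mag: (n_frames, n_bins) magnitudes; freqs: (n_bins,) bin frequencies in Hz.
    eps (an assumption) avoids division by zero for all-zero frames.
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return (mag @ freqs) / (mag.sum(axis=-1) + eps)

freqs = np.arange(6.0)                              # hypothetical bins at 0..5 Hz
mag = np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0],     # single line at 3 Hz
                [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]])    # equal lines at 0 and 5 Hz
print(spectral_centroid(mag, freqs))                # approximately [3.0, 2.5]
```

Combined with the SFM, this gives each frame a two-component feature vector: how peaky the spectrum is and where its energy sits.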
2.4. Unsupervised Classification for the Exploration of the Number of Activity Classes
We conducted an exploratory data analysis in order to determine the different classes of activities in the signal. First, the available features were described and modeled in an unsupervised manner, without taking into account any prior knowledge about the BCG signal. This analysis confirmed the existence of distinct classes and allowed us to define the hyper-parameters needed for an effective classification.
2.4.1. Features Histograms and GMM Modeling
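The Gaussian Mixture Model (GMM) is an unsupervised learning model used to describe clusters of points. Each cluster is assigned a Gaussian function, whose mean represents the center of the cluster and whose covariance describes the spread within it. We denote by $\mathbf{x}_m = [\mathrm{SFM}(m), \mathrm{SC}(m)]^{\top}$ the two-component feature vector of frame $m$. Its distribution can be modeled using the GMM as follows:
$$p(\mathbf{x}_m) = \sum_{i=1}^{K} \pi_i\, \frac{1}{2\pi\,|\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}_m-\boldsymbol{\mu}_i)^{\top}\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}_m-\boldsymbol{\mu}_i)\right),$$
where $K$ is the number of clusters (number of Gaussian components), $\pi_i$ is the weight of the $i$th Gaussian (the weights sum to one), $\boldsymbol{\mu}_i$ is its mean, $\boldsymbol{\Sigma}_i$ is its covariance matrix and $|\boldsymbol{\Sigma}_i|$ is the determinant of $\boldsymbol{\Sigma}_i$.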
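To make the mixture density concrete, the sketch below evaluates it directly with NumPy at one feature vector, for any dimension $d$ (for the two features used here, the normalization reduces to $2\pi|\boldsymbol{\Sigma}_i|^{1/2}$). The function name and the toy parameters are hypothetical; in practice, the weights, means and covariances would be estimated with the expectation-maximization algorithm, for example through scikit-learn's `GaussianMixture`.

```python
import numpy as np

def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x) = sum_i pi_i N(x; mu_i, Sigma_i) at one point x."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    density = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = x - np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via a linear solve.
        maha = diff @ np.linalg.solve(cov, diff)
        # (2 pi)^(d/2) |Sigma|^(1/2); equals 2 pi |Sigma|^(1/2) when d = 2.
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
        density += w * np.exp(-0.5 * maha) / norm
    return density

# Hypothetical two-component model in the (SFM, SC) plane.
weights = [0.7, 0.3]
means = [np.array([0.2, 3.0]), np.array([0.6, 8.0])]
covs = [np.diag([0.01, 1.0]), np.diag([0.02, 2.0])]
print(gmm_pdf([0.2, 3.0], weights, means, covs))
```

Solving a linear system rather than inverting $\boldsymbol{\Sigma}_i$ explicitly is the numerically safer way to evaluate the quadratic form.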
Classically, and regardless of the classification problem, several criteria are defined to formally determine the number of Gaussian components, for example the Akaike Information Criterion (AIC) [29] and the Bayesian Information Criterion (BIC) [30]. They are defined as follows:
$$\mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad \mathrm{BIC} = p\ln M - 2\ln\hat{L},$$
where $\hat{L}$ is the maximum likelihood of the model and $p$ is its number of free parameters. Compared with the AIC, the Bayesian Information Criterion replaces the penalty term $2p$ with $p\ln M$, where $M$ is the number of training examples. As for the AIC, low values of the BIC indicate a good fit of the model.
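Both criteria only require the maximized log-likelihood and a parameter count. The sketch below assumes a full-covariance GMM (the covariance type is not stated in the text) and NumPy; the helper names are hypothetical. As far as we know, scikit-learn's `GaussianMixture` exposes `aic(X)` and `bic(X)` methods based on the same definitions, so in practice one would fit models for a range of K and keep the one with the lowest value.

```python
import numpy as np

def gmm_num_params(n_components, n_features):
    """Free parameters p of a full-covariance GMM."""
    weights = n_components - 1                                         # weights sum to one
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2   # symmetric matrices
    return weights + means + covariances

def aic(log_likelihood, n_params):
    """AIC = 2p - 2 ln L_hat."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """BIC = p ln M - 2 ln L_hat."""
    return n_params * np.log(n_samples) - 2 * log_likelihood

# For the two features (SFM, SC), p = 6K - 1.
print([gmm_num_params(k, 2) for k in range(1, 6)])   # [5, 11, 17, 23, 29]
```

Because $\ln M > 2$ as soon as $M \ge 8$, the BIC penalizes extra components more heavily than the AIC for any realistic number of frames and therefore tends to select fewer clusters.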
2.4.2. Frame Size Optimization
GMM modeling requires the optimization of the frame size. The criterion used is the silhouette, which measures how well each point fits its assigned cluster compared with the other clusters [31]. It takes into account both the within-cluster and the between-cluster distances, which makes it particularly suitable for an unsupervised learning context. The silhouette $s(i)$ of point $i$ is given by
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$
where $a(i)$ is the within-class distance, $b(i)$ is the between-class distance, $C_i$ is the cluster containing point $i$ and $|\cdot|$ is the cardinality operator. $a(i)$ and $b(i)$ are computed as follows:
$$a(i) = \frac{1}{|C_i| - 1}\sum_{j \in C_i,\, j \neq i} d(i,j)$$
and
$$b(i) = \min_{C \neq C_i} \frac{1}{|C|}\sum_{j \in C} d(i,j),$$
where $d(i,j)$ is the distance between points $i$ and $j$. After calculating $s(i)$ for each frame under different frame sizes, the mean value is determined for each frame size.
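The silhouette computation can be sketched in NumPy as follows, assuming a Euclidean distance for $d(i,j)$; the function name is hypothetical, and scikit-learn's `silhouette_samples` and `silhouette_score` provide equivalent, more scalable implementations. Averaging the returned values gives the score for one frame size; presumably the frame size with the highest mean silhouette is retained, since larger values indicate tighter, better-separated clusters.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), Euclidean d(i, j).

    Requires at least two clusters. Points alone in their cluster get s(i) = 0.
    Builds an O(n^2) distance matrix, which is fine for a few thousand frames.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                     # treat 1-D input as a single feature
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # d(i, j)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        a = D[i, same].sum() / (n_same - 1)   # D[i, i] = 0, so i itself adds nothing
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

s = silhouette_values([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
print(s.mean())   # about 0.90: two compact, well-separated clusters
```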
2.5. Supervised Binary Classification for Vital Signs Detection
Supervised classification was performed using the K-Nearest Neighbors (KNN) algorithm because of its simplicity and because other classification techniques had been tested and had shown similar or poorer performance. KNN is a non-parametric classification method. The variant adopted was Fine KNN, the finest variant of KNN, which assigns to a new input the label of its single nearest neighbor. The BCG signal frames were classified according to whether they allow the heart rate and the respiratory rate to be detected. To this end, each frame was labeled both as cardiac activity detection (CAD) or no cardiac activity detection (NoCAD), and as respiratory activity detection (RAD) or no respiratory activity detection (NoRAD).
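A single-neighbor (Fine KNN) rule is short enough to sketch directly in NumPy. The function name, the toy (SFM, SC) values and the choice to z-score features with training statistics are assumptions for illustration: standardization matters here because SC is expressed in hertz while the SFM lies in [0, 1], so unscaled Euclidean distances would be dominated by SC.

```python
import numpy as np

def fine_knn_predict(X_train, y_train, X_test, standardize=True):
    """1-nearest-neighbour ("Fine KNN") labels for X_test, Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    if standardize:
        # Assumption: z-score each feature with training statistics, because SC
        # (in Hz) and SFM (in [0, 1]) live on very different scales.
        mu = X_train.mean(axis=0)
        sd = X_train.std(axis=0)
        sd[sd == 0] = 1.0
        X_train = (X_train - mu) / sd
        X_test = (X_test - mu) / sd
    # Squared distances between every test and training point: (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(y_train)[d2.argmin(axis=1)]

# Hypothetical (SFM, SC) training frames with CAD/NoCAD labels.
X_train = [[0.1, 2.0], [0.9, 10.0]]
y_train = ["CAD", "NoCAD"]
print(fine_knn_predict(X_train, y_train, [[0.15, 2.5], [0.8, 9.0]]))   # ['CAD' 'NoCAD']
```

Two such classifiers would be trained on the same features, one with CAD/NoCAD labels and one with RAD/NoRAD labels, to produce the two binary flags.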
2.6. Multi-Class Classification and Segmentation
The BCG signal frames were then classified according to human body activities. Seven classes were used: normal activity in still position, coughing, post-coughing, holding breath, expiration, movement and others. Two different classification approaches were adopted: the first assigns a label to each frame (frame-by-frame classification), and the second assigns a label to each sample (sample-by-sample classification).
Enframing is a classical method in signal processing that consists of splitting a signal, here the BCG signal, into temporal segments. Features are extracted for each frame, and a single classification decision is made for the whole segment.
The approach used for sample-by-sample classification was as follows. Instead of using the raw per-frame values of SFM and SC and facing their limitations (coarse decisions for long frames, epochs straddling activity boundaries, etc.), we created time-series from the SFM and SC values computed for each frame. The adopted approach is described below and illustrated in Figure 6.
First, the original BCG signal was decomposed into frames of 1024 samples with an overlap of 960 samples, i.e., a hop of 64 samples from frame to frame (at 50 Hz, 20.48 s frames shifted by 1.28 s). A Hamming window was applied to each frame. An example of the original BCG signal, the moving Hamming window and the generated frames is illustrated in the top part of Figure 6.
The features, namely $\mathrm{SFM}(m)$ and $\mathrm{SC}(m)$, were extracted for each frame.
Each scalar feature $\mathrm{SFM}(m)$ (resp. $\mathrm{SC}(m)$) was used to construct a “sub-time-series”: a constant vector whose value is $\mathrm{SFM}(m)$ (resp. $\mathrm{SC}(m)$) and whose length equals the frame size. The constant sub-series are shown in Figure 6 below the previous steps (horizontal lines).
The whole set of sub-series was fed to an “overlap and add” module. This step is the inverse of the windowing and frame decomposition; its aim is to construct, from the features, time-series that represent the spectral content of the BCG signal. These time-series have the same length as the BCG signal and are denoted $s_{\mathrm{SFM}}(n)$ and $s_{\mathrm{SC}}(n)$ for the Spectral Flatness Measure and the Spectral Centroid, respectively.
The classification was then conducted on each sample of the new time-series. This process is called sample-by-sample classification.
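A minimal sketch of the feature-to-time-series conversion is given below, assuming NumPy. Two details are assumptions not stated in the text: the summed sub-series are divided by the number of frames covering each sample, so that the result stays on the scale of the feature instead of growing with the overlap, and samples at the end of the signal that no frame covers are left as NaN. The function name is hypothetical.

```python
import numpy as np

def features_to_time_series(values, n_samples, frame_len=1024, hop=64):
    """Overlap-add per-frame feature values into a per-sample time-series.

    Frame m covers samples [m*hop, m*hop + frame_len). Each value becomes a
    constant sub-series of length frame_len; the sub-series are summed and the
    sum is divided by the number of frames covering each sample (assumption).
    Samples covered by no frame are returned as NaN (assumption).
    """
    acc = np.zeros(n_samples)
    count = np.zeros(n_samples)
    for m, value in enumerate(np.asarray(values, dtype=float)):
        start = m * hop
        stop = min(start + frame_len, n_samples)
        acc[start:stop] += value          # add the constant sub-series
        count[start:stop] += 1
    out = np.full(n_samples, np.nan)
    covered = count > 0
    out[covered] = acc[covered] / count[covered]
    return out

# Toy example: two frames of 4 samples with a hop of 2.
print(features_to_time_series([1.0, 3.0], n_samples=6, frame_len=4, hop=2))
# [1. 1. 2. 2. 3. 3.]
```

With 1024-sample frames and a 64-sample hop, each interior sample averages 16 overlapping frames, so the resulting series change smoothly at activity transitions instead of jumping once per frame. Each sample's pair of values can then be classified, for instance with the Fine KNN rule of Section 2.5.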
2.7. Evaluation Metrics
During this study, several common classification criteria were used; for lack of space, only some of them are given here. For binary classification, we mainly retain the True Positive Rate (TPR) and the Positive Predictive Value (PPV). The TPR, also called sensitivity or recall, measures the proportion of actual positives (true positives and false negatives) that are detected, while the PPV, also called precision, measures the proportion of detected positives (true positives and false positives) that are actually positive.
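In binary classification, the reported confusion matrix contains the TPR and PPV values on the diagonal, while complementary rates such as the False Discovery Rates (FDR) and the False Omission Rates (FOR) are also reported. In terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), these evaluation metrics are:
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \quad \mathrm{PPV} = \frac{TP}{TP + FP}, \quad \mathrm{FDR} = \frac{FP}{TP + FP} = 1 - \mathrm{PPV}, \quad \mathrm{FOR} = \frac{FN}{FN + TN}.$$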
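The binary metrics can be computed from label arrays as in the following NumPy sketch. The function name is hypothetical, and the positive class (for example CAD or RAD) is assumed to be encoded as True; FOR uses its standard definition FN / (FN + TN).

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """TPR, PPV, FDR and FOR for a binary flag (True = positive class)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return {
        "TPR": tp / (tp + fn),   # sensitivity / recall
        "PPV": tp / (tp + fp),   # precision
        "FDR": fp / (tp + fp),   # = 1 - PPV
        "FOR": fn / (fn + tn),   # false omission rate
    }

print(binary_metrics([1, 1, 1, 0, 0], [1, 0, 0, 0, 1]))
```

In this toy example, one of three actual positives is detected (TPR = 1/3) and one of two detections is correct (PPV = 1/2).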
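For more than two classes, confusion matrices expressed in terms of TPR and PPV are used. Each term of the TPR matrix is written as follows:
$$\mathrm{TPR}_{ij} = \frac{n_{ij}}{\sum_{l=1}^{L} n_{lj}},$$
where $n_{ij}$ is the number of predictions of class $i$ that belong to class $j$ and $L$ is the number of classes. The PPV confusion matrix is given by
$$\mathrm{PPV}_{ij} = \frac{n_{ij}}{\sum_{l=1}^{L} n_{il}}.$$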
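The two normalizations can be sketched with NumPy as below, following the convention of the text that $n_{ij}$ counts predictions of class $i$ whose true class is $j$ (rows: predicted class, columns: true class). Note that some tools, such as scikit-learn's `confusion_matrix`, place the true class on the rows, which transposes the matrices. The function names are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """n[i, j] = number of predictions of class i whose true class is j."""
    n = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(n, (np.asarray(y_pred), np.asarray(y_true)), 1)
    return n

def tpr_ppv_matrices(n):
    """TPR_ij = n_ij / sum_l n_lj (per true class); PPV_ij = n_ij / sum_l n_il (per predicted class)."""
    n = np.asarray(n, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):   # empty classes give NaN
        tpr = n / n.sum(axis=0, keepdims=True)
        ppv = n / n.sum(axis=1, keepdims=True)
    return tpr, ppv

n = confusion_counts(y_true=[0, 0, 1, 1, 2], y_pred=[0, 1, 1, 1, 2], n_classes=3)
tpr, ppv = tpr_ppv_matrices(n)
print(np.diag(tpr), np.diag(ppv))   # recall per class: 0.5, 1, 1; precision: 1, 2/3, 1
```

The diagonals give the per-class TPR and PPV, while the off-diagonal terms show which activities are confused with one another.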
4. Conclusions
In this work, we proposed an approach that generates two binary flags indicating the frames in which the cardiac and respiratory rates can be measured from a BCG signal. We also presented a refined BCG signal segmentation technique based on the phenomena that occur while the BCG signal is recorded. These phenomena represent real-life challenges for BCG signal applications, since their properties differ from the ideal conditions assumed in previous works. Addressing them makes this work novel and improves its adaptability to practical applications.
This work began with a temporal and spectral analysis of the acquired BCG signals. This was followed by unsupervised learning to understand the patterns present in the dataset, through which the segmentation parameters were explored and rigorously determined. Next, binary classification was carried out to create flags indicating whether cardiac and respiratory activities can be measured. Finally, two multi-class classification methods were proposed; the sample-by-sample classification showed more promising results than the frame-by-frame classification.