1. Introduction
Accurate measurement of vital signs and physiological parameters, such as body temperature, pulse rate, blood pressure, and respiratory rate, plays a pivotal role in healthcare and patient management. Among these, the respiratory rate ($f_R$) is still considered the neglected vital sign in both clinical practice and sports activity monitoring [1,2]. Temporal changes in the respiratory rate may indicate relevant variations in the physiological status of the subject, even better than other vital signs (e.g., pulse rate) [2], and the respiratory rate has been found to discriminate between stable and unstable patients better than pulse rate [1].
In a clinical setting, the respiratory rate is an early indicator of physiological deterioration [3] and a predictor of potentially dangerous adverse events [1]. Indeed, respiratory rate is an important predictor of cardiac arrest and of unplanned intensive care unit admission [1], as well as an independent prognostic marker for risk assessment after acute myocardial infarction [4]. Moreover, it is fundamental in the early detection and diagnosis of dangerous conditions such as sleep apnea [5], sudden infant death syndrome, chronic obstructive pulmonary disease, and respiratory depression in post-surgical patients [6]. In intensive care units, the respiratory waveform and $f_R$ are typically recorded. In mechanically ventilated patients, such data can be obtained directly from the mechanical ventilator traces [7] or retrieved from pulse oximetry sensors [8]. However, outside this ward, $f_R$ is typically collected by operators at regular intervals (i.e., every 8–10 h), and it is often neglected in home-monitored people and patients [1].
Conventional methods for measuring respiratory parameters require sensing elements in contact with the patient [9]. These methods are mainly based on the analysis of several parameters sampled from the inspiratory and/or expiratory flow. Alternatively, approaches based on the measurement of respiratory-related chest and abdominal movements have also been adopted [10]. Sensors may be attached directly to the torso [11] or integrated into clothing fibers; resistive, capacitive, and inductive sensors have all been used. Such monitoring systems must be worn and powered [11]. Additionally, they may cause undesirable skin irritation and discomfort, especially when long-term monitoring is required or during sleep. Substantial evidence indicates that all these contact-based measurement techniques may influence the very physiological parameters being measured [12].
Contactless monitoring systems may overcome both the issues of placing sensors on patients and the influence of those sensors on the measurand [13]. Mainly, solutions based on time-of-flight sensors measuring depth changes of the torso during breathing [14], low-power ultra-wideband impulse radio radar [15,16], and laser Doppler vibrometers [17,18,19] have been designed and tested. The principal limitations of such solutions are the high cost of the instrumentation, the need for specialized operators, and, in some cases, a low signal-to-noise ratio. Contactless monitoring systems based on optical sensors are gaining prominence in respiratory monitoring, mainly because of recent progress in video technology. Commercial and industrial cameras may be attractive solutions, as they provide low-cost and easy-to-use non-contact approaches for measuring and monitoring physiological signals [4]. Some attempts have been made to record respiratory parameters from breathing-related movements of the thoraco-abdominal area, the face, the edge of the shoulder, and the pit of the neck [20,21,22,23,24,25]. Different approaches have then been used to post-process the video and extract the respiratory-related signal, mainly based on image subtraction [26], optical flow analysis [27], Eulerian Video Magnification [24], and Independent Component Analysis (ICA) applied to pixel intensity changes [28]. A review of the literature reveals a lack of results on the accuracy of such methods in monitoring the eupneic respiratory pattern and $f_R$: most of the cited studies present proofs of concept and preliminary tests without an accuracy evaluation. When accuracy is reported, a frequency-domain analysis is typically carried out to extract the frequency content of the respiratory-related video signal and to measure the average respiratory rate. Since these techniques rely on recording the torso movement, clothing can influence the data quality and the validity of the methods. However, no studies have focused on such potential influences on respiratory pattern and $f_R$ measurement; only a preliminary study by our research group has investigated this influencing factor [29].
In this paper, we present a measuring system capable of non-contact monitoring of the respiratory pattern using the RGB video signal acquired from a single built-in high-definition webcam. The aim of this study is threefold: (i) the development of the measuring system and the related algorithm for the extraction of breath-by-breath values; (ii) the evaluation of the error between the breath-by-breath values retrieved with the proposed measuring system and those recorded with a reference instrument; and (iii) the analysis of the influence of clothing (i.e., slim-fit and loose-fit) and sex on the performance of the proposed method.
2. Measuring System
The proposed measuring system is composed of a hardware module (i.e., a built-in webcam) for video recording and an algorithm for (i) preprocessing of the video to obtain a respiratory signal, and (ii) event detection, segmentation and extraction of breath-by-breath values. The working principle of the method used to extract respiratory information from a video is explained in the following section.
2.1. Light Intensity Changes Caused by Respiration
Each video can be considered a series of frames (i.e., polychromatic images) indexed by f. Each frame is composed of three images, one for each of the red (R), green (G), and blue (B) channels, and each channel image is a matrix of pixels. The size of the matrix (x pixels along the x-axis and y pixels along the y-axis) depends on the resolution of the camera used for data collection. Each pixel takes a value representing the light intensity in that color channel: 0 means no intensity (black), whereas the maximum value means full intensity. The range of pixel values depends on the number of bits used to represent each R, G, B channel. For commercial cameras with 8 bits per channel (24 bits for RGB color), the maximum value is $2^8 - 1 = 255$ (i.e., 256 intensity levels including zero).
When an object is recorded on video, the pixels of each frame take intensity levels determined by the light reflected from the object onto a two-dimensional grid of pixels. In the RGB color model, separate intensity signals corresponding to each channel, $V_R(f)$, $V_G(f)$, and $V_B(f)$, can be recorded at each frame f. The measured intensity of any reflected light (V) can be decomposed into two components: (i) the intensity of illumination (I), and (ii) the reflectance of the surface (R):

$$V = I \cdot R \qquad (1)$$
Respiratory activity causes a periodic movement of the chest wall. During inspiration, the ribcage expands, resulting in an upward movement of the thorax; during expiration, the opposite occurs. Considering the clothing-covered chest wall as the surface framed by the camera, and assuming an almost constant illumination intensity, the changes in reflected light intensity between two consecutive frames can be attributed to the movement of the chest surface. Breathing-related chest movements are transmitted to the clothing (e.g., t-shirts, sweaters), so the resulting changes of V can be used to record respiratory patterns and events indirectly. Loose- and slim-fit clothing adhere differently to the skin. With slim-fit clothing, we can hypothesize a complete transfer of the chest wall movement to the side of the t-shirt framed by the camera, whereas with loose-fit clothing only a partial transfer is expected.
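As a short worked step, assuming the multiplicative model V = I·R holds separately for each color channel c and frame f (here $R_c$ denotes reflectance in channel c, not the red channel), the frame-to-frame change in measured intensity splits into a reflectance term and an illumination term; with constant illumination, only the reflectance term survives:

```latex
\begin{aligned}
V_c(f) &= I_c(f)\,R_c(f), \qquad c \in \{\mathrm{R},\mathrm{G},\mathrm{B}\} \\
V_c(f) - V_c(f-1) &= I_c(f)\,\bigl[R_c(f) - R_c(f-1)\bigr] + R_c(f-1)\,\bigl[I_c(f) - I_c(f-1)\bigr] \\
I_c(f) = I_c(f-1) = I_c \;\Rightarrow\; V_c(f) - V_c(f-1) &= I_c\,\bigl[R_c(f) - R_c(f-1)\bigr]
\end{aligned}
```

Under this model, a slowly drifting illumination would enter through the second term of the middle line, which is the kind of slow, non-respiratory variation that the low cut-off of the band-pass stage described below is meant to remove.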
2.2. Hardware for Video Data Recording
The proposed system needs to record a video of a person seated in front of the camera (Figure 1). The hardware module consists of a built-in CCD RGB webcam (iSight camera) integrated into a MacBook Pro laptop (Apple Inc., California, USA). This camera records video with a resolution of 1280 × 720 pixels. Video images are recorded at 24-bit RGB with three channels, 8 bits per channel. A bespoke interface was developed in Matlab (MathWorks, Massachusetts, USA) to record the video and pre-process the data (i.e., images) collected with the camera. The video is collected for 120 s at a frame rate of 30 Hz, which is sufficient to capture breathing movements: the corresponding Nyquist frequency (15 Hz) is well above the 2 Hz upper cut-off later used to isolate the respiratory content.
2.3. Algorithm for the Preprocessing of the Video
The preprocessing of the recorded video is performed off-line via a bespoke algorithm developed in Matlab, which is an upgraded version of the algorithm presented in our previous papers [29,30]. Several steps must be followed, as shown in Figure 1.
Basically, after the video is loaded, the user (i.e., the person designated to analyze the data) is asked to select one pixel, with coordinates $(x_0, y_0)$, at the level of the jugular notch (i.e., the anatomical point near the suprasternal notch) in the first frame of the video. This anatomical marker was chosen because it is easily identifiable (see Figure 1).
A rectangular region of interest (ROI) is then delineated automatically; its dimensions, $X_{ROI} \times Y_{ROI}$, are computed from x and y, the x-axis and y-axis frame dimensions (related to camera resolution), respectively.
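The formula giving the ROI size and its exact placement relative to the selected pixel is not reproduced in this text, so the following Python/NumPy sketch is only illustrative. It assumes a hypothetical placement (centred horizontally on the selected pixel and extending downward from it, towards the thorax) and takes the ROI width and height as hypothetical parameters `roi_w` and `roi_h`.

```python
import numpy as np


def extract_roi(frame: np.ndarray, x0: int, y0: int, roi_w: int, roi_h: int) -> np.ndarray:
    """Crop a roi_h x roi_w window anchored at the selected pixel (x0, y0).

    Hypothetical placement: the window is centred horizontally on x0 and
    extends downward from y0 (towards the thorax). It is shifted, not shrunk,
    when it would cross the frame border.
    frame has shape (height, width, 3): rows are y, columns are x.
    """
    height, width = frame.shape[:2]
    roi_w = min(roi_w, width)
    roi_h = min(roi_h, height)
    # Clamp the top-left corner so that the whole window stays inside the frame
    left = int(np.clip(x0 - roi_w // 2, 0, width - roi_w))
    top = int(np.clip(y0, 0, height - roi_h))
    return frame[top:top + roi_h, left:left + roi_w, :]


# Usage on a blank 1280 x 720 frame with a hypothetical 320 x 240 ROI
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
frame[200, 640, 0] = 255  # mark the selected "jugular notch" pixel in the red channel
roi = extract_roi(frame, x0=640, y0=200, roi_w=320, roi_h=240)
print(roi.shape)  # (240, 320, 3)
```

Shifting rather than shrinking the window keeps the number of lines and columns constant across subjects, which keeps the later line-wise statistics comparable. This is a design choice of the sketch, not a documented property of the original algorithm.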
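The selected ROI is then split into three same-size images corresponding to the red, green, and blue channels. At each frame f, the intensity components of each channel, $I_c(x, y, f)$, are obtained, where c is the color channel (i.e., red (R), green (G), and blue (B)). Then, the intensity components are averaged for each line y of the ROI according to Equation (3):

$$\bar{I}_c(y, f) = \frac{1}{X_{ROI}} \sum_{x=1}^{X_{ROI}} I_c(x, y, f) \qquad (3)$$

where $X_{ROI}$ and $Y_{ROI}$ are the ROI width and height in pixels, and $1 \le y \le Y_{ROI}$.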
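A minimal NumPy sketch of the line-wise averaging, assuming the ROI frames have already been stacked into one array of shape `(n_frames, roi_h, roi_w, 3)`; the function name `line_means` is hypothetical.

```python
import numpy as np


def line_means(roi_frames: np.ndarray) -> np.ndarray:
    """Average every ROI line y over its columns x, per channel c and frame f.

    roi_frames: array of shape (n_frames, roi_h, roi_w, 3) with RGB values.
    Returns an array of shape (n_frames, roi_h, 3): one value per
    (frame f, line y, channel c).
    """
    # Convert to float64 so that later steps (mean removal) are not done in uint8
    return roi_frames.astype(np.float64).mean(axis=2)


# Usage: 2 frames of a 3-line, 4-column ROI
frames = np.zeros((2, 3, 4, 3), dtype=np.uint8)
frames[0, 1, :, 0] = [0, 10, 20, 30]  # red channel, line y = 1 of frame 0
means = line_means(frames)
print(means.shape, means[0, 1, 0])  # (2, 3, 3) 15.0
```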
From each $\bar{I}_c(y, f)$, the mean of the signal is subtracted (i.e., the signal is detrended), and the standard deviation of each detrended signal is calculated. The 5% of the $\bar{I}_c(y, f)$ signals with the highest standard deviations are selected. The 5% threshold was chosen empirically, using data from previous experiments on volunteers aimed at calibrating the algorithm. The selected lines are then averaged at each frame, yielding the signal $V(f)$. Next, $V(f)$ is filtered. To emphasize the respiratory content, adequate cut-off frequencies and bandwidth need to be defined. A bandpass configuration was chosen, with a low cut-off frequency of around 0.05 Hz, to remove slow signal variations unrelated to respiratory movements, and a high cut-off frequency of around 2 Hz. In this way, the changes generated by the respiratory movements recorded by the webcam sensor can be adequately isolated and passed to the subsequent processing stages. A third-order Butterworth digital filter was employed. Finally, the filtered signal $\tilde{V}(f)$ is normalized to obtain $V_n(f)$, as reported in Equation (4):

$$V_n(f) = \frac{\tilde{V}(f) - \mu_{\tilde{V}}}{\sigma_{\tilde{V}}} \qquad (4)$$

where $\mu_{\tilde{V}}$ and $\sigma_{\tilde{V}}$ are the mean and standard deviation of the filtered signal $\tilde{V}(f)$, respectively.
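The selection, filtering, and normalization steps can be sketched in Python with NumPy and SciPy as follows. This is a sketch under stated assumptions, not the authors' Matlab code: the 5% selection is assumed to be pooled over all lines and all three channels, the filter is assumed to be applied forward and backward (zero phase) with `sosfiltfilt`, and the function name `respiratory_signal` is hypothetical. Note that SciPy's band-pass design with order 3 yields a 6th-order filter overall; how the original order is counted is not stated.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def respiratory_signal(line_signals: np.ndarray, fs: float = 30.0,
                       top_fraction: float = 0.05,
                       band: tuple = (0.05, 2.0)) -> np.ndarray:
    """Select the most variable lines, band-pass filter their mean, and normalise.

    line_signals: array (n_frames, n_lines, n_channels) of line-averaged
    intensities. Assumptions: selection pooled over lines and channels,
    zero-phase (forward-backward) filtering.
    """
    n_frames = line_signals.shape[0]
    series = line_signals.reshape(n_frames, -1)     # one column per (line, channel)
    series = series - series.mean(axis=0)           # remove each series' mean
    stds = series.std(axis=0)
    k = max(1, int(round(top_fraction * series.shape[1])))
    selected = np.argsort(stds)[-k:]                # indices of the k most variable series
    averaged = series[:, selected].mean(axis=1)     # mean of selected series at each frame
    # Third-order Butterworth band-pass design (0.05-2 Hz by default)
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, averaged)
    # Normalisation as in Equation (4): zero mean, unit standard deviation
    return (filtered - filtered.mean()) / filtered.std()


# Synthetic check: 120 s at 30 Hz, 40 lines x 3 channels, breathing at 0.25 Hz (15 bpm)
rng = np.random.default_rng(0)
t = np.arange(3600) / 30.0
lines = 100.0 + 0.1 * rng.standard_normal((3600, 40, 3))
lines[:, :2, :] += 5.0 * np.sin(2 * np.pi * 0.25 * t)[:, None, None]  # 6 "breathing" series
v = respiratory_signal(lines)
freqs = np.fft.rfftfreq(v.size, d=1 / 30.0)
peak_hz = freqs[1 + np.argmax(np.abs(np.fft.rfft(v))[1:])]
print(round(peak_hz, 3))  # 0.25
```

Zero-phase filtering is chosen here so that the filter does not shift the zero crossings later used for event detection; a causal filter would introduce a frequency-dependent delay.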
The normalized video-derived signal $V_n(f)$ is used to extract respiratory temporal information (i.e., period duration, $T_R$, and respiratory rate, $f_R$), since it is expected to be proportional to the changes in the intensity component and thus to the underlying respiratory signal of interest (Figure 2). A 60 s window is shown in Figure 2B. This figure does not show the apnea phase of about 5 s used to synchronize the reference signal and the video-derived signal in the experimental trials (see Section 3.1).
3. Tests and Experimental Trials
3.1. Participants and Tests
In this study, we enrolled 12 participants (6 males and 6 females; mean age 24 ± 4 years, mean height 165 ± 15 cm, mean body mass 60 ± 10 kg). All participants provided informed consent. We created a data set for the evaluation of the proposed system, aiming to cover normal breathing (i.e., respiratory frequency in the range 8–25 breaths/min), abnormal breathing (i.e., tachypnea), and apnea stages.
Each participant was invited to sit on a chair in front of the webcam at a distance of about 1.2 m. The user adjusted the laptop screen in order to frame the trunk area (as shown in Figure 1). All experiments were carried out indoors (in a laboratory room) under a stable amount of light, with neon lights and three windows as sources of illumination. The participants' shoulders were turned towards the furnishings of the room. The windows were lateral to the scene recorded by the camera. Other people were in the room during data collection but were not allowed to pass near the recorded area.
Participants were asked to keep still and seated, and to breathe spontaneously while facing the webcam. Each volunteer was asked to breathe quietly for around 5 s, simulate an apnea lasting less than 10 s, and then breathe quietly at a self-paced rate for the rest of the trial (120 s). Each volunteer carried out two trials with the same experimental design: in the first trial, the participant wore a loose-fit t-shirt; in the second trial, a slim-fit t-shirt. Two volunteers were also invited to simulate abnormal breathing (i.e., tachypnea), which is characterized by high respiratory rate values (>35 bpm).
At the same time, the respiratory pattern was recorded with the reference instrument described in Section 3.2.
3.2. Reference Instrument and Signal
To record the reference pattern, a head-mounted wearable device was used; we have already used this system in a similar scenario [31]. This device is based on recording the pressure drop ($\Delta P$) that occurs at the level of the nostrils during the expiratory and inspiratory phases of respiration. The device consists of a cannula fixed to the jaw with tape: one end of the cannula is placed at the nostrils to collect part of the nasal flow, while the other end is connected to a static port of a differential digital pressure sensor (Sensirion, model SDP610, pressure range up to ±125 Pa). The pressure data were recorded at a sampling rate of 100 Hz with a dedicated printed circuit board described in [31], sent to a remote laptop via a wireless connection, and archived.
Negative pressure was recorded during the expiratory phase and positive pressure during the inspiratory phase, as can be seen in Figure 2A. A cumulative trapezoidal numerical integration of the $\Delta P$ signal over time was then carried out to obtain a smooth respiratory signal for further analysis ($P_i$) and to emphasize its maximum and minimum peaks. Afterward, the integrated signal $P_i$ was filtered using a bandpass Butterworth filter in the frequency range 0.05–2 Hz and normalized as in Equation (4), yielding $P_n$. This $P_n$ is the reference respiratory pattern signal, which is then used to extract the breath-by-breath reference respiratory rate values (i.e., $f_R^{ref}$).
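The cumulative trapezoidal integration of the pressure-drop signal can be sketched in NumPy as below, assuming a uniformly sampled signal at 100 Hz; the function name is hypothetical. For uniform sampling, it should match SciPy's `scipy.integrate.cumulative_trapezoid(dp, dx=1/fs, initial=0)`.

```python
import numpy as np


def cumulative_trapezoid_integral(dp: np.ndarray, fs: float = 100.0) -> np.ndarray:
    """Cumulative trapezoidal integral of a pressure-drop signal sampled at fs Hz.

    The output starts at 0 and has the same length as the input; sample k holds
    the area under dp between t = 0 and t = k / fs.
    """
    dt = 1.0 / fs
    increments = 0.5 * (dp[1:] + dp[:-1]) * dt   # trapezoid area between adjacent samples
    return np.concatenate(([0.0], np.cumsum(increments)))


# Synthetic check: integrating cos(2*pi*f*t) should give sin(2*pi*f*t) / (2*pi*f)
t = np.arange(6000) / 100.0
dp = np.cos(2 * np.pi * 0.25 * t)            # synthetic pressure drop at 0.25 Hz
p = cumulative_trapezoid_integral(dp, fs=100.0)
expected = np.sin(2 * np.pi * 0.25 * t) / (2 * np.pi * 0.25)
print(np.max(np.abs(p - expected)) < 1e-3)   # True
```

Because positive $\Delta P$ occurs during inspiration, the integral rises during inspiration and falls during expiration, which is the behavior described next for the reference signal.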
As shown in Figure 2B, one breath is the portion of the signal between the start of an inspiration and the end of the following expiration. During the inspiratory phase, the $\Delta P$ signal goes from 0 to positive values (grey area in Figure 2A), and the reference signal $P_n$ increases. During the expiratory phase, the opposite occurs: $\Delta P$ goes from 0 to negative values (green area in Figure 2A), and $P_n$ decreases.
3.3. Respiratory Rate Calculation
The breathing rate can be extracted from both the reference signal $P_n$ and the video-derived signal $V_n$, either in the frequency domain or in the time domain [21,32]. Time-domain analysis requires the identification of specific points on the signal. Mainly, two approaches may be used: (i) identification of the maximum and minimum points; or (ii) identification of the zero-crossing points of the signal. In this work, we used a zero-crossing-based algorithm, applying the same algorithm for event detection on both $P_n$ and $V_n$. The algorithm detects the zero-crossing points of the signal using the signum function, which allows determining the onset of each respiratory cycle, characterized by a positive-going zero crossing. The signum function of a real number x is defined as in Equation (5):

$$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \qquad (5)$$

When applied to the signal, x is the sample value $x_i$ at frame index i, and the onset of a respiratory cycle corresponds to a frame index i at which the sign changes from negative to non-negative. Then, the algorithm locates the local minimum points of the signal, and their indices, between the respiratory cycle onsets determined in the first step.
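A minimal NumPy sketch of the two-step event detection, assuming that an onset is the first sample at or above zero after a negative sample, and that the minimum of each cycle is the global minimum between two consecutive onsets; the function name is hypothetical.

```python
import numpy as np


def cycle_onsets_and_minima(x: np.ndarray):
    """Detect positive-going zero crossings (cycle onsets) and the minimum between them.

    An onset is a sample i where sgn(x[i-1]) < 0 and sgn(x[i]) >= 0.
    For each pair of consecutive onsets, the index of the smallest sample
    in between is returned as a minimum.
    """
    s = np.sign(x)
    onsets = np.flatnonzero((s[:-1] < 0) & (s[1:] >= 0)) + 1
    minima = np.array([a + np.argmin(x[a:b]) for a, b in zip(onsets[:-1], onsets[1:])],
                      dtype=int)
    return onsets, minima


# Synthetic 60 s signal at 30 Hz with a 4 s breathing period (15 bpm)
fs = 30.0
t = np.arange(1800) / fs
x = np.sin(2 * np.pi * 0.25 * t + 0.5)
onsets, minima = cycle_onsets_and_minima(x)
print(len(onsets), len(minima))  # 15 14
```

Requiring a strictly negative previous sample avoids counting the same crossing twice when a sample falls exactly on zero.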
The duration of each i-th breath, $T_R(i)$, is then calculated as the time elapsed between two consecutive minimum points (expressed in s). Consequently, the i-th breath-by-breath breathing rate $f_R(i)$, expressed in breaths per minute (bpm), is calculated as in Equation (6):

$$f_R(i) = \frac{60}{T_R(i)} \qquad (6)$$
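Given the frame indices of consecutive minima and the sampling rate, breath durations and breath-by-breath rates follow directly from the relation rate = 60 / duration. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np


def breath_by_breath_rate(minima_idx, fs: float):
    """Breath durations T_R (s) between consecutive minima and rates f_R = 60 / T_R (bpm)."""
    minima_t = np.asarray(minima_idx, dtype=float) / fs   # minima times in s
    t_r = np.diff(minima_t)                                 # duration of each breath
    return t_r, 60.0 / t_r                                  # breaths per minute


# Minima at 0 s, 4 s, and 9 s for a 30 Hz signal
t_r, f_r = breath_by_breath_rate([0, 120, 270], fs=30.0)
print(t_r, f_r)  # [4. 5.] [15. 12.]
```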
3.4. Data Analysis
We recorded the breath-by-breath respiratory rate with our system and with the reference instrument and evaluated the discrepancies between them. First, the reference signal $P_n$ and the video-derived signal $V_n$ were synchronized so that they could be compared directly: the apnea stage was used to detect a common event on both signals. All analyses were carried out on the portions of $P_n$ and $V_n$ following the first end-expiratory point after the apnea stage. The breath-by-breath values were compared between instruments by extracting them with the time-domain analysis from $P_n$ (i.e., $f_R^{ref}$) and from $V_n$ (i.e., $f_R^{vid}$).
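To compare the values gathered by the reference instrument with those computed by the video-based method, we used the mean absolute error (MAE), as in Equation (7):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| f_R^{vid}(i) - f_R^{ref}(i) \right| \qquad (7)$$

where $f_R^{vid}(i)$ and $f_R^{ref}(i)$ are the i-th breath-by-breath values from the video-based method and the reference instrument, respectively, and n is the number of breaths recognized by the algorithm for each subject in the trial.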
Then, the standard error of the mean (SE) was calculated as in Equation (8):

$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} \qquad (8)$$

where σ is the standard deviation of the absolute differences between the estimated and reference values, $|f_R^{vid}(i) - f_R^{ref}(i)|$. The standard error was used to provide a simple estimate of uncertainty.
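Both metrics can be computed from paired breath-by-breath values as in the NumPy sketch below. Whether the population or the sample standard deviation was used is not stated; the sketch assumes the sample standard deviation (`ddof=1`), and the function name is hypothetical.

```python
import numpy as np


def mae_and_se(f_video, f_ref):
    """Mean absolute error and its standard error for paired breath-by-breath rates (bpm)."""
    abs_err = np.abs(np.asarray(f_video, dtype=float) - np.asarray(f_ref, dtype=float))
    n = abs_err.size
    mae = abs_err.mean()
    se = abs_err.std(ddof=1) / np.sqrt(n)   # sample SD (ddof=1) is an assumption
    return mae, se


mae, se = mae_and_se([15, 12, 18], [14, 12, 20])
print(round(mae, 3), round(se, 3))  # 1.0 0.577
```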
Lastly, the percentage difference between instruments was calculated for each volunteer, as in Equation (9).
Additionally, we used Bland–Altman analysis to investigate the agreement between the proposed method and the reference over the whole range of $f_R$ measurements. With this graphical method, we investigated whether the differences between the two techniques, plotted against their averages, showed a trend across the different $f_R$ values collected during the trials. The Bland–Altman analysis was used to obtain the mean of differences (MOD) and the limits of agreement (LOAs) [33], which are typically reported in other studies and are extremely useful when comparing our results with the relevant scientific literature [2].
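The Bland–Altman quantities can be sketched in NumPy as below, assuming the conventional limits of agreement MOD ± 1.96·SD of the paired differences, with differences taken as video minus reference (the sign convention is an assumption). The x-axis of the plot is the per-breath average of the two methods; the function name is hypothetical.

```python
import numpy as np


def bland_altman(f_video, f_ref, z: float = 1.96):
    """Mean of differences (MOD), limits of agreement (MOD +/- z*SD), and per-pair means."""
    a = np.asarray(f_video, dtype=float)
    b = np.asarray(f_ref, dtype=float)
    diff = a - b                           # y-axis of the Bland-Altman plot
    means = 0.5 * (a + b)                  # x-axis of the Bland-Altman plot
    mod = diff.mean()
    half_width = z * diff.std(ddof=1)      # sample SD of the differences
    return mod, mod - half_width, mod + half_width, means


mod, loa_low, loa_high, means = bland_altman([15, 12, 18, 10], [14, 12, 20, 10])
print(mod)  # -0.25
```

A trend across the `means` axis (e.g., differences growing with the rate) would indicate a rate-dependent bias, which is what the graphical inspection described above looks for.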
To fulfill the aims of this paper, we carried out three separate analyses using these metrics. First, we used the data collected with slim-fit and loose-fit clothing to investigate the influence of clothing on the performance of the proposed method, using both male and female data. Then, we separately used the data collected from males and from females to investigate the influence of sex on performance. Lastly, the overall performance of the proposed measuring system was tested considering all the breath-by-breath values (n = 411). Preliminary tests were also performed using data collected from two volunteers during tachypnea.
5. Discussion
In this paper, a single built-in camera system is proposed for the extraction of the respiratory pattern and the estimation of breath-by-breath $f_R$. The built-in camera of a commercial laptop allows non-intrusive, ecological, and low-cost recording of chest wall movement. The proposed system allows (i) recording the chest wall video at a sufficient frame rate (i.e., 30 Hz), (ii) selecting a pixel for the subsequent semi-automatic selection of a ROI in which pixel intensity changes are measured, in order to extract the video-based respiratory pattern, and (iii) post-processing the signal to estimate breath-by-breath $f_R$ values. The proposed system has been tested on healthy participants. Tests were carried out on male and female participants wearing both slim-fit and loose-fit t-shirts to simulate real respiratory monitoring conditions (e.g., a subject at home, a patient in a medical room). In the literature, authors rarely take into account the influence of sex and clothing when camera-based methods are used. Additionally, in this paper we used an unobtrusive head-mounted wearable device as the reference instrument, so as not to obstruct the area recorded by the camera.
Signals obtained with the proposed method allow clear identification of the apnea stages and of the breathing pattern at quiet pace and during tachypnea in all the trials. Considering the breath-by-breath $f_R$ values, we obtained comparable MAE and SE values in the two groups (slim-fit vs. loose-fit). From the Bland–Altman analysis, we found slightly better results in volunteers wearing slim-fit clothing (LOAs of ±0.98 bpm against ±1.07 bpm with loose-fit clothing). These results confirm those obtained in [29]. Regarding sex, results show good performance in both males and females, with a slightly lower bias in females (−0.01 ± 0.73 bpm) than in males (0.01 ± 1.22 bpm). Considering all 414 breaths, the Bland–Altman analysis shows a bias of −0.01 ± 1.02 bpm for the proposed method compared with the $f_R$ values gathered by the reference instrument. The method proposed in [20] achieves a bias of −0.32 ± 1.61 bpm when tested in a similar setting and with similar participants. The bias we found is also comparable with that reported in [34] (i.e., −0.02 ± 0.83 bpm), where pseudo-Wigner–Ville time–frequency analysis was used (with an $f_R$ resolution of 0.7324 bpm). Our performance is better than that obtained in [35], where average $f_R$ values were considered (bias of 0.37 ± 1.04 bpm) and advanced signal and video processing techniques, including video magnification, complete ensemble empirical mode decomposition with adaptive noise, and canonical correlation analysis, were used in the post-processing phase. Compared with depth sensors used on participants in the supine position [16], our method achieves comparable results (∼0.01 ± 0.96 bpm in [16]) with a simpler, lower-cost setup. Despite the absence of contact with the subject, the proposed method shows overall performance similar to that of wearable devices for $f_R$ monitoring that require direct contact with the torso (e.g., a garment with optical fibers showed a bias of −0.02 ± 2.04 bpm during quiet breathing [36]). In contrast to other research studies, we did not place a background behind the user, in order to test the system in conditions resembling real application scenarios. Further tests might focus on extracting respiratory volumes by using a more structured environment during video collection, as in [37].
One of the main limitations of this study is the small number of subjects included in the analysis. For this reason, we did not perform any statistical analysis, because the population size does not allow statistically significant conclusions. Additionally, we tested the proposed method at only one camera–subject distance (i.e., 1.2 m).
Further efforts will mainly be devoted to addressing these points. Tests will be carried out on many subjects to investigate the performance of the system in different scenarios and at different subject–camera distances. Furthermore, the method will be tested on a wide range of atypical respiratory patterns (e.g., tachypnea, deep breaths, Cheyne–Stokes breathing) and in extracting additional respiratory parameters (e.g., duration of the expiratory and inspiratory phases, inter-breath variations). We are already testing additional techniques based on pixel flow analysis to remove movements unrelated to breathing. Additionally, we are working on feature selection approaches to use the proposed method for respiratory monitoring when the user makes small movements. We hope to use the proposed measuring system for respiratory monitoring even in the presence of undesired subject motion, also by implementing a fully automatic process to detect the ROI from video frames. These steps will allow automatic and long-term data collection.