The feature database was built by looping over all subjects and using their respective radar and CNAP® data. For each subject, the whole radar recording was divided into individual pulse waves. Sometimes, the waveforms appear to be inverted; the cause of this phenomenon remains a matter for future research. If the data were indeed inverted, the whole waveform series was multiplied by −1 to revert the inversion. Determining whether a waveform chain was inverted was performed manually; automating the detection of inversion will be implemented in the future.
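The manual inversion correction described above amounts to a sign flip of the whole series. A minimal sketch is given below; the flag `is_inverted` stands in for the manual per-recording decision (automatic detection is left for future work, as noted above):

```python
import numpy as np

def correct_inversion(waveform_series: np.ndarray, is_inverted: bool) -> np.ndarray:
    """Revert an inverted waveform series by multiplying it by -1.

    `is_inverted` is determined manually per recording; automatic
    detection of inversion is not implemented here.
    """
    return -waveform_series if is_inverted else waveform_series
```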
This section first gives an overview of the algorithm’s skin displacement extraction results. It then outlines the filtering results. Finally, the results from training neural networks for regression are presented.
6.1. Skin Displacement Extraction
With the radar attached to a subject’s wrist, the skin displacement extraction has proven capable of capturing even the smallest movements, as small as a few µm. An example of live extraction of the skin displacement of subject 39 can be seen in Figure 16.
The data acquisition worked exceptionally well for this subject, producing periodic pulse waves of roughly 50 µm amplitude. The algorithm captured the pressure morphology faithfully, extracting not only the systolic peak location but also the dicrotic notch and the diastolic peak. The fact that the DOA was ° supports the claim that a DOA in the range [ °, °] is sufficient.
To demonstrate that the extracted waves are well above the noise level, Figure 17 depicts an example where the radar device is placed on a table, antenna side up. Not only is the DOA at 20°, but the pulse wave signal also shows only random, non-periodic movement of at most 3 µm.
Figure 18 shows a plot of the [0, 1]-scaled extracted pulse waves from both the radar and the CNAP® reference blood pressure device. The figure clearly shows that the skin displacement matches the expected wave morphology from the reference device; only the diastolic peak is slightly higher.
These figures demonstrate that the algorithm is very capable of extracting skin displacement that shows a meaningful resemblance to the expected pressure waveform morphology.
However, the algorithm did not perform equally well for every subject. For some subjects, skin displacement extraction worked remarkably well, as demonstrated above. Other extractions were slightly noisy, as for subject 9 in Figure 19.
Some recordings produced such deficient results that the radial artery was likely not located well enough, since it seems that only noise was recorded; see Figure 20.
Not only is the amplitude of extracted skin displacement extremely low, but it also does not resemble the expected morphology.
Nonetheless, it must be noted that, for this subject specifically, manually locating the artery by using the index finger to locate the pulsation origin was also difficult.
It can thus be concluded that extracting the skin displacement using radar is indeed possible with the provided signal processing algorithm. Sometimes the extracted waves are slightly noisy, most likely when the artery is not located well enough. Under some circumstances, extraction seems very difficult. The possible reasons for that difficulty are small arterial pulsations of the subject, i.e., very small skin displacement; not locating the radial artery well enough; and the distance from the radar to the skin. Since the sensor is even closer to the skin than the first range bin (here cm), it is very likely that this has a tremendous impact on signal quality. Analyzing the reasons stated above remains a matter of future research.
6.3. Neural Network Training Results for Blood Pressure Value Regression
Neural networks were used to learn the mapping from the extracted features to blood pressure values. In order to find a good neural network for predicting blood pressure from the extracted features, 126 fully connected feedforward networks were trained.
Databases for different correlation thresholds were created; the thresholds were 0.7, 0.8, and 0.9. This was carried out in order to analyze whether it is more beneficial for learning to use more, but likely lower-quality, data, or fewer data that fulfil higher quality standards. Essentially, this approach probes the bias-variance tradeoff, where more data means more variance and fewer data mean less variance and potentially higher bias. The extracted features were not affected by this thresholding, only their assigned labels. The database for training the neural networks contained only those pulse waves that were classified as usable, i.e., those whose correlation exceeded the threshold.
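Assembling such a per-threshold database can be sketched as a simple filter, assuming each pulse wave carries a precomputed correlation score with the reference; the names `features` and `correlations` are illustrative:

```python
import numpy as np

def build_database(features, correlations, threshold):
    """Keep only the pulse waves whose correlation with the reference
    exceeds the given threshold (here 0.7, 0.8, or 0.9)."""
    features = np.asarray(features)
    correlations = np.asarray(correlations)
    mask = correlations > threshold
    return features[mask]
```

Raising the threshold shrinks the database, which is exactly the bias-variance tradeoff discussed above.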
The input features were preprocessed first by normalizing each feature column so that its values lie in the range [0, 1]. One reason for applying normalization is that the features vary in scale; bringing all features to the same range helps to stabilize the fitting procedure [51].
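This column-wise min-max scaling can be sketched as follows (a minimal numpy version, with a guard for constant columns added for robustness):

```python
import numpy as np

def minmax_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each feature column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span
```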
Subsequently, the total database was shuffled and split into a training set and a testing set. The random state was set to 42; this value controls the splitting of the data and ensures that the same split is obtained when the split is repeated.
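A reproducible shuffled split with a fixed random state can be sketched with a seeded permutation. The 80/20 ratio below is only an illustrative placeholder, since the exact fractions are not restated here; in practice this is commonly done with scikit-learn's `train_test_split(random_state=42)`:

```python
import numpy as np

def shuffled_split(X, y, test_fraction=0.2, random_state=42):
    """Shuffle the database and split it into training and testing sets.

    Fixing `random_state` ensures the identical split on every run.
    The 0.2 test fraction is an illustrative placeholder.
    """
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```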
Several network topologies were used for each of these thresholds to fit the input data to the expected output. Only two hidden layers were used, since this work presents a study of general feasibility. For further optimization, more hidden layers could be utilized. A general two hidden layer network topology is sketched in
Figure 21.
The input layer has 25 input nodes, one for each feature. The first hidden layer had a size varying from 30 to 60 nodes, in increments of 5 nodes. The second hidden layer had either 5, 10, or 15 nodes, and the output layer had two nodes, one for the SBP prediction and one for the DBP prediction. In order to decrease the weight of potentially unimportant features, L1 regularization can be applied to the input layer. Therefore, every network architecture was trained twice: once without regularization and once with L1 regularization. In total, 42 networks were trained per correlation threshold. The chosen activation function was the ReLU, since it produces a non-negative output. The training process minimizes the prediction error by iteratively adapting the model parameters.
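The 42 configurations per threshold arise from the grid of 7 first-hidden-layer sizes, 3 second-hidden-layer sizes, and the two regularization settings; enumerating them:

```python
# Enumerate all network architectures trained per correlation threshold:
# 25 input nodes, two hidden layers, 2 output nodes (SBP and DBP),
# each architecture trained both without and with L1 regularization.
architectures = [
    (25, h1, h2, 2, l1_reg)
    for h1 in range(30, 61, 5)    # first hidden layer: 30, 35, ..., 60
    for h2 in (5, 10, 15)         # second hidden layer
    for l1_reg in (False, True)   # without / with L1 regularization
]
```

With three correlation thresholds, this grid yields the 126 trained networks mentioned above.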
As regression is used to predict blood pressure pairs, the mean squared error between the predicted values and the ground truth was used as the loss function. The model parameters were optimized using the Adam optimizer with a validation split, i.e., a fraction of the training data was withheld from the training process. Instead, model performance is evaluated after each epoch by testing on this validation set. Each network was trained for 50 epochs with a batch size of 10. This means that the model iterates through the training set 50 times, and during each epoch, ten samples at a time are fed into the network and used to update the model parameters until the model has seen all samples of the training set.
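The epoch/batch scheme (50 passes over the training set, ten samples per update) can be sketched as a plain mini-batch iterator; the actual training was done with a deep learning framework, so this is only an illustration of the iteration order:

```python
import numpy as np

def iterate_minibatches(X, y, n_epochs=50, batch_size=10):
    """Yield (epoch, X_batch, y_batch) tuples: one pass over the whole
    training set per epoch, ten samples at a time."""
    n = len(X)
    for epoch in range(n_epochs):
        for start in range(0, n, batch_size):
            yield epoch, X[start:start + batch_size], y[start:start + batch_size]
```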
The root mean squared error (RMSE) was chosen as the regression metric; metrics are used to monitor the model’s performance during training. The errors are calculated as the differences between ground truth and prediction; these errors are squared, averaged, and the square root of the average is taken. Finally, the predictive quality was tested by applying the trained model to the previously unseen testing data.
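The RMSE computation described above is:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error: square the errors, average them,
    then take the square root of the mean."""
    errors = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(errors ** 2)))
```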
Training for the different thresholds was performed in order to determine whether more data of lower quality or fewer data of better quality would produce better results. Increasing the correlation threshold leads to fewer data being used to train the networks; hence, the data variance is decreased and the bias is increased.
The AAMI has additional requirements for the validation of non-invasive blood pressure measuring devices [50]. These are summarized in Table 5.
The best network per correlation threshold is chosen by selecting the neural network that produces the smallest RMSE for that threshold.
Table 6 gives an overview of the results of applying the best model per correlation threshold to its test set, where a bold number means that the error is within the AAMI bounds for non-invasive blood pressure monitoring.
Additionally, the performance of our small networks was compared to the British Hypertension Society (BHS) standard for BP measurement [52], which is summarized in Table 7.
We compared our minimal network models to the BHS standard in Table 8; for the chosen small model architectures, the results fell short of the BHS standard. In particular, the percentage requirement for errors below 5 mmHg was not reached. Here, the authors would like to point out that reaching this standard was not a priority of the presented work; the priority was rather supplying the reader with an in-depth radar signal processing pipeline and a set of meaningful features. For a very shallow network and limited training time, these results still demonstrate the feasibility of the approach. Deeper network architectures are very likely to improve these results.
For correlation threshold 0.7, the best neural network was “25-45-10-2” (thus, two hidden layers with 45 and 10 nodes, respectively), trained with L1 regularization, where the RMSE was 11.7 mmHg. This network returned mmHg (mean ± standard deviation) error for systolic pressure values and mmHg for diastolic pressure values.
Figure 22 shows the results, loss, and metrics of the chosen network for correlation threshold 0.7.
Figure 22a,b show the scatter plots of systolic and diastolic values, respectively.
For each pulse wave in the testing set, the ground truth value (x-axis) is plotted against its predicted value (y-axis). For a perfect mapping, the predictions would lie on the bisecting line of the coordinate system. For easier visualization, a line, shown in blue, is fitted through the scatter plot; that way, the deviation from the ideal bisecting line is visualized more clearly. The output of the predicted diastolic values seems to be restricted to a certain range, since values below 60 mmHg are always mapped to roughly 60 mmHg and do not follow the bisecting line. Similarly, all values above 90 mmHg seem to be clipped to 90 mmHg. The r2-score for the systolic and diastolic predictions is indicated in Figure 22. A perfect model that captures the data variability optimally has an r2-score of 1. As visible from Figure 22, the r2-score for systolic values is higher than for diastolic values.
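The r2-score reported in the scatter plots is the coefficient of determination, which can be computed as follows:

```python
import numpy as np

def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 minus the ratio of residual
    variance to total variance; a perfect model scores 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A model that always predicts the mean of the ground truth scores 0; worse-than-mean predictions yield negative values.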
For correlation threshold 0.8, the best neural network was “25-50-15-2”, trained without regularization, where the RMSE was 11.0 mmHg. This network returned mmHg (mean ± standard deviation) error for systolic pressure values and mmHg for diastolic pressure values.
Figure 23 shows the training results obtained during the training of network “25-50-15-2” for correlation threshold 0.8. Again, the systolic (Figure 23a) and diastolic (Figure 23b) scatter plots show that the predictions follow the bisecting line sufficiently well. The mapping of diastolic values is better than for the network in Figure 22, since the fitted scatter plot line is closer to the ideal bisecting line.
This claim is additionally backed by the fact that the r2-score is higher for both systolic and diastolic predictions.
For correlation threshold 0.9, the best neural networks were “25-45-15-2” and “25-60-10-2”, both trained without regularization and with an RMSE of 11.7 mmHg. The former network returned mmHg (mean ± standard deviation) error for systolic pressure values and mmHg for diastolic pressure values, and the latter mmHg and mmHg, respectively.
Figure 24 shows the results of applying the network “25-45-15-2” from correlation threshold 0.9 to the test set. Here, the scatter plots show that the systolic (Figure 24a) and diastolic (Figure 24b) predictions fall slightly better onto the bisecting line. The clipping at higher blood pressure values seems to have been resolved, while the r2-score remained the same as for correlation threshold 0.8. Only the predictions of lower blood pressure values still appear to be limited by a lower bound.
As discussed previously, the prediction of blood pressure values seems to be restricted to a certain output range. At first glance, this could look as though the network is producing bad predictions by enforcing a minimum output and clipping the output at a maximum. However, upon closer inspection of the ground truth, it becomes apparent that it contains data in an implausible range, e.g., a systolic BP reading of 40 mmHg.
Therefore, it can be argued that the reference device returned bad blood pressure estimates in such cases, e.g., when the finger cuff was not tight enough. Hence, the radar predictions might be even better than the reported errors suggest.
Next, the factors contributing to successful blood pressure prediction are discussed. For that, the correlations of the features with the ground truth systolic pressure and diastolic pressure are listed in Table 9 and Table 10, respectively. A higher correlation of a feature with the ground truth value signifies a higher feature importance, and the tables serve to visualize this importance to the reader. This analysis was only performed for the database that used 0.9 as the correlation threshold, since it showed the most promising results.
The reason for this assumption is that training neural networks on this database produced the most results where the mean systolic and diastolic errors were below 5 mmHg.
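The per-feature correlations listed in Tables 9 and 10 correspond to Pearson correlation coefficients between each feature column and the target. A minimal sketch of computing and ranking them (the function name is illustrative):

```python
import numpy as np

def feature_correlations(X: np.ndarray, y: np.ndarray):
    """Pearson correlation of each feature column with the target,
    with an ordering by absolute value so top contributors come first."""
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(corrs))
    return order, corrs
```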
For the systolic pressure, height, age, gender, weight, and systolic upstroke time are the top five contributors (see Table 9). All five show a positive correlation, i.e., the SBP increases when these features increase. This was an expected result, since the four calibration parameters all relate to an increase in BP; e.g., it is well known that BP increases with age. The features that contribute least to systolic BP prediction are SW25, SW10, DW33, DW75, and DW50.
For the diastolic pressure, height, pulse pressure, diastolic height, point of reflection, and systolic height are the top five contributors (see Table 10). Only the diastolic height shows a negative correlation; this is correct and expected, since the pulse wave starts at a negative value due to the calibration procedure: the lower the diastolic point height is, the greater the DBP is in turn. The positive contribution of height was expected as well, as previously discussed. The fact that pulse pressure and systolic height are positively correlated with the DBP is likely explained by the following relation: systolic BP has a much wider range than diastolic BP, so when the pulse pressure is large, the systolic height is likely to be large as well. It thus appears that higher systolic values allow for higher diastolic ones.
The features that least contribute to diastolic BP prediction are DW25, diastolic time, SW10, SW25, and DW33.