Another analysis identified the main healthcare applications addressed by the applied techniques.
Figure 3 illustrates the mapping of the healthcare applications found among the selected studies. Fall detection represents more than 32% of the analyzed studies, followed by cardiovascular health monitoring, epileptic seizure detection, heart anomaly detection, and heart disease prediction. The remaining articles cover other applications, such as food intake monitoring, stroke prediction, and blood pressure estimation. It is important to note that Yazici et al. in [
20] discussed more than one application, so that study appears twice in the mapping.
4.1. Fall Detection
As stated by Paramasivam et al., falls are accidental events caused by a loss of center of gravity resulting from either a lack of an active effort or an insufficient effort to restore balance [
10]. Various factors can cause falls, such as imbalance, poor posture, vision impairment, foot problems, muscle weakness, and others. Supporting this, Ghosh et al. highlighted that almost 40% of injury-related deaths among elderly citizens result from falls [
33]. Given this context, it is evident why this is the most extensively researched topic in the field.
A significant challenge in detecting falls lies in distinguishing between a fall and a non-harmful common event, such as a user simply lying down. Additionally, the necessity to alert caregivers as quickly as possible is paramount.
In the domain of wearable devices, there is no standard hardware configuration among the articles; each study employs different hardware, although almost all incorporate an accelerometer as a key component.
Table 4 details the hardware utilized in each study.
In the edge layer, Raspberry Pi boards were mentioned in most articles, with an STM board also appearing as the option used by Campanella et al. in [
30]. In contrast, Ghosh et al. in [
33] utilized the user’s smartphone for this purpose, while Baktir et al. in [
18] simulated the edge layer on a computer. Notably, Chetcuti et al. in [
11] does not specify the hardware used for the edge layer. Detailed information on edge layer implementation can be found in
Table 5.
Among machine learning (ML) algorithms, no definitive conclusion can be drawn regarding which is the most effective. However, neural networks are frequently utilized. Convolutional Neural Networks (CNNs) are considered the optimal choice by Paramasivam et al. in [
10] (where they were combined with a Long Short-Term Memory Network (LSTM)) and Yazici et al. in [
20], while standalone LSTMs were selected by Sarabia-Jacome et al. in [
26] and Queralta et al. in [
31]. The Feedforward Neural Network (FFNN) was employed by Campanella et al. in [
30].
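To make the frequently chosen CNN+LSTM combination more concrete, the sketch below classifies a fixed-length window of tri-axial accelerometer samples as fall or non-fall. It is a minimal illustration rather than a reproduction of the architectures in [10] or [20]; the window length of 128 samples, the channel counts, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class FallDetector(nn.Module):
    """Minimal CNN+LSTM sketch for binary fall detection.

    Input: a batch of accelerometer windows shaped (batch, 3, 128),
    i.e. 3 axes x 128 samples per window (window length is an assumption).
    """
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        # 1D convolutions extract short-term motion features from each window.
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=3, out_channels=32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        # The LSTM models the temporal evolution of those features.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        # Final linear layer maps the last hidden state to fall / no-fall logits.
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.conv(x)              # (batch, 32, 64)
        feats = feats.permute(0, 2, 1)    # (batch, 64, 32) for the LSTM
        _, (h_n, _) = self.lstm(feats)
        return self.classifier(h_n[-1])   # (batch, 2)

# Example: classify a batch of 8 random windows (stand-in for real sensor data).
if __name__ == "__main__":
    model = FallDetector()
    windows = torch.randn(8, 3, 128)
    print(model(windows).shape)  # torch.Size([8, 2])
```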
Other notable approaches include the combination of Federated Learning with a Hidden Markov Model (HMM) and an LSTM by Ghosh et al. in [
33], primarily to ensure personal data security. Additionally, XGBoost was applied instead of neural network algorithms by Chetcuti et al. in [
11], and a Support Vector Machine (SVM) was used by Baktir et al. in [
18]. The Deep Gated Recurrent Unit (DGRU) was utilized by Al-Rakhami et al. in [
23].
Regarding training data, no consistent pattern was discernible. Some datasets were used in multiple studies, such as the MobiAct dataset, which was utilized by Queralta et al. in [
31] and Ghosh et al. in [
33], and the SisFall dataset, which was employed by Sarabia-Jacome et al. in [
26] and Campanella et al. in [
30]. Another common practice was the use of data collected by the researchers themselves. In some studies, this was the sole data source, while in others, such as Al-Rakhami et al. in [
23], Campanella et al. in [
30], and Ghosh et al. in [
33], it was combined with external datasets.
Information about the datasets and ML algorithms used is presented in
Table 6. Since all studies used accuracy as a metric, this information is also included.
Accuracy is a commonly used metric for assessing machine learning models, especially in detection tasks. It quantifies the proportion of correctly classified instances, including both true positives and true negatives, relative to the total number of instances.
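Expressed in terms of the standard confusion-matrix counts (true positives TP, true negatives TN, false positives FP, and false negatives FN), this definition corresponds to the following formula:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]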
4.2. Cardiovascular Health Monitoring
In contrast to the majority of the selected articles, Utsha et al. in [
16] and Talha et al. in [
25] present monitoring applications focused on the early detection of cardiac diseases, providing a platform for both patient and doctor to monitor cardiac health. Utsha et al. asserts that detecting diseases at earlier stages improves treatment outcomes [
16]. The continuous analysis of vital signs (heart rate, respiratory rate, oxygen saturation, and blood pressure) can predict or detect neonatal pathophysiology, offering the potential to improve outcomes and mitigate neonatal diseases using big data analytics.
In the wearable hardware domain, Utsha et al. in [
16] relies on ECG electrodes connected to an ECG module and a microcontroller board, while Talha et al. in [
25] does not specify any hardware but suggests the use of biosensors to measure body temperature, heart rate, blood pressure, respiration rate, and blood oxygen saturation.
For the edge layer implementation, Utsha et al. in [
16] uses the user’s smartphone to perform the necessary computations, whereas Talha et al. in [
25] lacks specific definitions regarding the edge layer implementation.
Table 7 summarizes this information for both articles.
In the area of machine learning algorithms, both articles tested various models to find the best fit. Utsha et al. in [
16] evaluated Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Long Short-Term Memory Networks (LSTM), while Talha et al. in [
25] assessed Logistic Regression, Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Trees (DT), and Random Forest (RF).
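Comparisons of this kind are usually run as a loop over candidate classifiers evaluated on a held-out split. The sketch below illustrates the idea for the classical models assessed by Talha et al.; the synthetic feature matrix (standing in for vital-sign readings), the labels, and the default hyperparameters are placeholders rather than the actual data and settings from [25].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Placeholder data: rows = patients, columns = vital signs
# (e.g., heart rate, respiratory rate, SpO2, temperature, blood pressure).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 2, size=500)  # 1 = at-risk, 0 = normal (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Train each candidate and report held-out accuracy, as in the surveyed comparisons.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
```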
Both articles relied on datasets, with Utsha et al. in [
16] utilizing the MIT-BIH Arrhythmia dataset and Talha et al. in [
25] employing the MIMIC-III dataset.
Additionally, both articles used accuracy as a metric. Utsha et al. in [
16] found that the LSTM model provided the best results, achieving an accuracy of 95.94%, while Talha et al. in [
25] identified RF as the best option, reaching an accuracy of 95%.
Table 8 presents the results for each article.
4.3. Epileptic Seizure Detection
Baghersalimi et al. stated that epilepsy is a prevalent neurological disorder affecting approximately 65 million individuals of all ages worldwide [
19]. This condition manifests in various forms, with severity ranging from mild to severe, and encompasses a spectrum of seizure types, each with distinct consequences. Consequently, individuals with epilepsy and their families encounter a wide array of challenges and experiences specific to their condition. Among these challenges are issues such as access to quality healthcare, adequate information and coordination of services, and societal stigma. A particularly grave concern is SUDEP (sudden unexpected death in epilepsy), a rare but potentially fatal occurrence that typically transpires during or after a seizure, leading to unexpected deaths within the epilepsy community.
Ingolfsson et al. also notes that the continuous monitoring of brain activity is essential for personalizing patient treatments, which can be performed using electroencephalography (EEG) techniques [
12]. Given this context, the necessity for a wearable and fast-response solution becomes apparent.
In the context of wearable devices, both articles focus on EEG systems due to the nature of the detection required, and both are based on hardware proposed in other studies. Despite these similarities, each study proposes different hardware: Ingolfsson et al. in [
12] based the application on a brain–computer interface wearable called BioWolf, proposed by Kartsch et al. in [
38], while Baghersalimi et al. in [
19] suggests the use of e-Glass, a wearable device with four electrodes developed by Sopic et al. in [
39].
For the edge layer, each study adopts a different approach. The BioWolf device, as described by Ingolfsson et al. in [
12], includes an integrated microprocessor, making the edge layer entirely self-contained within the wearable device. Conversely, Baghersalimi et al. in [
19] tested two different platforms (Kendryte K210 and Raspberry Pi Zero) and ultimately selected the Kendryte K210 microcontroller platform.
Table 9 presents these data.
Regarding machine learning algorithms, the studies also diverge. Baghersalimi et al. in [
19] employed deep neural networks combined with Federated Learning (FL), whereas Ingolfsson et al. in [
12] evaluated several algorithms, including Support Vector Machine (SVM), Random Forest (RF), Extra Trees (ETs), and AdaBoost Classifier.
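The Federated Learning setup in [19] is motivated by keeping raw EEG recordings on the patient's device; only model updates are shared and combined. A minimal sketch of the central aggregation step (federated averaging) is given below; the toy model, the number of clients, and the round structure are illustrative assumptions and do not reproduce the training procedure of [19].

```python
import copy
import torch
import torch.nn as nn

def federated_average(client_models):
    """Average the parameters of locally trained client models (FedAvg)."""
    avg_state = copy.deepcopy(client_models[0].state_dict())
    for key in avg_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in client_models])
        avg_state[key] = stacked.mean(dim=0)
    return avg_state

def make_model():
    # Toy stand-in for a seizure-detection network operating on 16 EEG features.
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

global_model = make_model()
clients = [copy.deepcopy(global_model) for _ in range(3)]  # e.g., three patients' devices

for communication_round in range(5):
    for client in clients:
        client.load_state_dict(global_model.state_dict())
        # ... local training on this patient's own EEG data would happen here ...
    # Only the averaged weights leave the devices, never the raw recordings.
    global_model.load_state_dict(federated_average(clients))
```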
Both studies utilized pre-existing datasets: Ingolfsson et al. in [
12] used the CHB-MIT dataset, while Baghersalimi et al. in [
19] used a combination of three datasets: EPILEPSIAE, TUSZ, and MIT-BIH.
The results from both studies appear promising. As shown in
Table 10, both achieved an accuracy of 100%. Specifically, Ingolfsson et al. in [
12] attained this accuracy with the RF and ET algorithms.
4.4. Heart Anomaly Detection
Congenital heart anomalies are defects in the structure of the heart or great vessels. They are usually present at birth but can manifest later in life. These anomalies are classified as cardiovascular diseases and can lead to serious health problems. In this context, Firouzi et al. in [
22] and Yazici et al. in [
20] explored the application of machine learning (ML) algorithms in edge computing for detecting such anomalies.
Regarding wearable hardware, Firouzi et al. in [
22] does not specify any particular microchip or sensor, only emphasizing the necessity of using ECG signals for detection. In contrast, the system proposed by Yazici et al. in [
20] is based on the AD8232 ECG module.
In terms of the edge layer, each study proposes a unique approach. Firouzi et al. in [
22] proposes a task offloading strategy built on fog computing nodes but does not specify any hardware configuration. Yazici et al. in [
20], on the other hand, utilizes a Raspberry Pi Zero as the edge layer, as presented in
Table 11.
Different machine learning algorithms were employed in these studies. Firouzi et al. in [
22] focused on Convolutional Neural Networks (CNNs) as the ML algorithm, whereas Yazici et al. in [
20] tested three classifiers for heart anomaly detection: Deep Neural Networks (DNNs), Random Forest (RF), and CNN, with RF yielding the best results.
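Neither article details how the trained model is packaged for its edge hardware; one common route for running a neural network on a board as constrained as the Raspberry Pi Zero used in [20] is to convert it to TensorFlow Lite. The sketch below assumes a Keras-trained model; the placeholder architecture and file name are illustrative and are not taken from either study.

```python
import tensorflow as tf

# Placeholder Keras model standing in for a trained ECG classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(187, 1)),                  # e.g., one ECG beat per window
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # normal vs. anomalous beat
])

# Convert to TensorFlow Lite so the model can run on a constrained edge board.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_model = converter.convert()

with open("ecg_classifier.tflite", "wb") as f:
    f.write(tflite_model)

# On the edge device, the .tflite file is loaded with the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```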
Both studies utilized pre-existing datasets: Firouzi et al. in [
22] used the MIT-BIH dataset, and Yazici et al. in [
20] used the MHEALTH dataset for heart anomaly detection. Accuracy was used as the metric for evaluating the trained ML models in both studies, with the results presented in
Table 12.
4.5. Heart Disease Prediction
As stated by Jenifer et al., cardiovascular diseases claim 17.9 million lives globally each year [
28]. The lives of individuals affected by sudden heart damage could be saved if such events were predicted before their occurrence. Research in this specific area is highlighted by Chakraborty et al. in [
24] and Jenifer et al. in [
28].
In the domain of wearable devices, the articles adopted different approaches. Chakraborty et al. in [
24] provides few details about the wearable implementation but suggests using a combination of a blood pressure sensor, a fasting blood sugar sensor, and a heart rate sensor, recommending commercial brands for the latter two without specifying the device names. Conversely, Jenifer et al. in [
28] details the use of a temperature sensor (DS18B20), an accelerometer (ADXL1335), and an unspecified pulse sensor.
For the edge layer, Chakraborty et al. in [
24] does not explicitly choose an implementation, merely indicating the use of fog nodes to perform this function. In contrast, Jenifer et al. in [
28] employs a Raspberry Pi B+ device equipped with a specific Ethernet interface.
Table 13 shows this information.
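Neither article specifies how readings travel from the wearable sensors to the fog/edge node. Purely as an illustration of that link, the sketch below has the wearable side (assumed here to run Python) push JSON-encoded readings to an HTTP endpoint on the edge device; the address, endpoint path, and payload fields are assumptions rather than details taken from [24] or [28].

```python
import json
import time
import urllib.request

EDGE_URL = "http://192.168.1.50:8080/vitals"  # hypothetical endpoint on the edge node

def read_sensors() -> dict:
    """Placeholder for reading the wearable's sensors."""
    return {"heart_rate": 72, "temperature_c": 36.6, "timestamp": time.time()}

def send_to_edge(reading: dict) -> None:
    """POST one JSON-encoded reading to the edge node."""
    body = json.dumps(reading).encode("utf-8")
    req = urllib.request.Request(
        EDGE_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()  # the response body is ignored in this sketch

# Periodically push readings; the edge node runs the ML model and
# forwards alerts or summaries to the cloud/hospital platform.
for _ in range(10):
    send_to_edge(read_sensors())
    time.sleep(1.0)
```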
Regarding machine learning algorithms, both articles tested various options to identify the best choice. Chakraborty et al. in [
24] evaluated Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN), while Jenifer et al. in [
28] selected NB, Decision Trees (DTs), KNN, and SVM. The algorithms were applied to different datasets: Chakraborty et al. in [
24] compiled data from various hospitals and health institutions to generate their dataset, while Jenifer et al. in [
28] initially trained the ML model with the Human Gait dataset and validated it with data collected during their research. Both studies used accuracy as a metric, with Chakraborty et al. in [
24] finding the best results using RF and Jenifer et al. in [
28] achieving the highest accuracy with the DT algorithm.
Table 14 summarizes their findings and the accuracies achieved.
4.6. Other Applications
Other articles have explored various healthcare applications, each employing different approaches and proposed systems.
In the cardiac domain, Ingolfsson et al. in [
34] focused on arrhythmia detection; Odema et al. in [
32] focused on myocardial infarction detection, and Petroni et al. in [
13] proposed a system for atrial fibrillation detection.
Other applications focus on detection: Pazienza and Monte in [
14] presented a system aiming to identify COVID-19/influenza, Gokul et al. in [
29] addressed freezing of gait detection, and Nandy et al. in [
36] focused on communicable disease detection.
Contrasting these detection systems, two articles proposed estimation systems: Pankaj et al. in [
17] focused on heart rate estimation, and Banerjee et al. in [
21] focused on blood pressure estimation.
Another application type we found was monitoring systems. Three articles were identified in this category: Rachakonda et al. in [
15] introduced a food intake monitoring system to track calorie ingestion; Kasaeyan et al. in [
35] introduced a pain-monitoring system; and Jiang et al. in [
37] introduced a stress-monitoring system.
Furthermore, Elbagoury et al. in [
27] proposed a stroke prediction system.
Given the diverse application objectives and types, a variety of hardware is proposed for wearables. The majority of the articles do not specify the models or devices, but some details are given about the types of sensors useful for the proposed systems.
Table 15 shows these definitions.
Petroni et al. in [
13], Banerjee et al. in [
21], and Ingolfsson et al. in [
34] based their studies on ECG sensors. Elbagoury et al. in [
27] and Kasaeyan et al. in [
35] proposed the usage of a body area network (BAN) composed of EMG, ECG, and galvanic skin response (GSR) sensors. Specifically, Elbagoury et al. in [
27] also included an EEG sensor, while Gokul et al. in [
29] relied on a group of accelerometers attached to the patient’s body. Within the BAN group, Nandy et al. in [
36] does not specify the sensors but states the necessity of including data from the pulse rate, body temperature, and other common health monitoring metrics. Additionally, Rachakonda et al. in [
15] proposed the use of glasses equipped with cameras.
Among the more specific configurations, Pankaj et al. in [
17] based their work on a photoplethysmogram optical sensor, the GRAVITY SEN0203 device, while Jiang et al. [
37] employed a commercial device called RespiBAN, which includes ECG, electrodermal activity (EDA), EMG, respiratory signal, temperature, and accelerometer sensors. Another notable article on this matter is that by Pazienza and Monte in [
14], which describes a multi-sensor hardware installed on a 3D-printed mask. Conversely, Ghosh et al. in [
33] did not provide any information about the wearables. A clearer overview of this information can be found in
Table 16.
Regarding the edge layer, most articles are specific about the implementation. It is noteworthy that Pazienza and Monte in [
14], Elbagoury et al. in [
27], and Jiang et al. in [
37] proposed using the user’s smartphone as the edge layer, connected directly to the wearables. Other articles, such as those by Petroni et al. in [
13], Pankaj et al. in [
17], Gokul et al. in [
29], Odema et al. in [
32], Ingolfsson et al. in [
34], Kasaeyan et al. in [
35], and Nandy et al. in [
36], focused on commercial microcontrollers. For instance, Gokul et al. in [
29] tested two different boards and identified the Raspberry Pi 3 Model B as the best option. Meanwhile, Rachakonda et al. in [
15] and Banerjee et al. in [
21] proposed using a “common computer” as the edge layer. All this information is presented in
Table 17.
In terms of machine learning algorithms, neural networks are predominantly used. Banerjee et al. in [
21], Gokul et al. in [
29], Kasaeyan et al. in [
35], Nandy et al. in [
36], and Jiang et al. in [
37] tested at least two different algorithms, with only Kasaeyan et al. in [
35] finding two algorithms with the same results. Additionally, Pazienza and Monte in [
14] utilized the XGBoost algorithm, while Rachakonda et al. in [
15] employed the Single-Shot MultiBox Detector, which suits the proposed application. Ingolfsson et al. in [
34] adopted a Temporal Convolutional Network (TCN); Petroni et al. in [
13] used Deep L1-PCA, an algorithm originally presented for brain connectivity measurements; and Odema et al. in [
32] proposed a CNN combined with Early Exit (EEx) and the Neural Architecture Search (NAS) technique. Elbagoury et al. in [
27] based the application on two different algorithms: the GMDH Neural Network for stroke prediction and the Sparse Autoencoder for stroke diagnosis.
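The early-exit idea applied by Odema et al. deserves a brief illustration: an auxiliary classifier is attached partway through the network, and inference stops there whenever that classifier is already confident, which saves computation and energy on the edge device. The sketch below is a generic early-exit 1D CNN, not the NAS-derived architecture of [32]; the layer sizes, input length, and confidence threshold are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitCNN(nn.Module):
    """1D CNN with one early exit; input shape (batch, 1, 256)."""
    def __init__(self, num_classes: int = 2, threshold: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.block1 = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4)
        )
        self.exit1 = nn.Linear(16 * 64, num_classes)   # early-exit classifier
        self.block2 = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4)
        )
        self.exit2 = nn.Linear(32 * 16, num_classes)   # final classifier

    def forward(self, x: torch.Tensor):
        h1 = self.block1(x)                             # (batch, 16, 64)
        logits1 = self.exit1(h1.flatten(1))
        conf1 = F.softmax(logits1, dim=1).max(dim=1).values
        # If every sample in the batch is confidently classified, stop early.
        if bool((conf1 >= self.threshold).all()):
            return logits1, "early exit"
        h2 = self.block2(h1)                            # (batch, 32, 16)
        return self.exit2(h2.flatten(1)), "full network"

model = EarlyExitCNN()
logits, exit_taken = model(torch.randn(4, 1, 256))
print(exit_taken, logits.shape)
```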
For data collection, only Rachakonda et al. in [
15], Pankaj et al. in [
17], and Elbagoury et al. in [
27] gathered data through experiments, while the other articles relied on pre-existing datasets.
Since the selected articles cover a wide range of applications, there is no consistent pattern in the use of evaluation metrics. For those studies that report accuracy, it is used as a key metric and is presented in
Table 18. For studies that employ different metrics, these are listed in a separate column. Four additional metrics commonly used in the reviewed studies are described below (their formulas are given after the list):
Recall: Measures the completeness of the model’s output, indicating how many relevant instances it correctly identified.
Precision: Measures the exactness of the model’s output, indicating how many of the instances it identified as positive were actually relevant.
F1-score: Combines precision and recall into a single metric by taking their harmonic mean.
Mean absolute error (MAE): Measures the average absolute difference between predicted and actual values, providing a simple measure of error magnitude.
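Using the same confusion-matrix counts as for accuracy (TP, TN, FP, FN), and denoting the predicted and actual values by \(\hat{y}_i\) and \(y_i\) over \(n\) instances, these metrics correspond to the standard formulas:

\[ \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{F1} = 2\cdot\frac{\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert \]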
It is important to note that these metrics should not be used to directly compare results across different studies, as they target varied applications and objectives (e.g., prediction versus detection tasks). Instead, they are presented to showcase the specific evaluation methods and outcomes reported in each article.