1. Introduction
Class imbalance is a prevalent challenge in machine learning (ML), particularly in classification tasks where the distribution of data classes is significantly skewed. This occurs when one class, referred to as the majority class, vastly outnumbers another, known as the minority class [1,2]. Such imbalances can lead to models that perform well on the majority class but poorly on the minority class. For example, in a dataset where 95% of samples belong to one class and only 5% to another, a model could achieve 95% accuracy by always predicting the majority class, while completely failing to identify instances of the minority class.
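This accuracy paradox is easy to reproduce. The following minimal sketch (an illustrative 95/5 split, assuming scikit-learn is available) scores a baseline that always predicts the majority class:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Illustrative 95/5 imbalanced labels: 950 majority (0), 50 minority (1)
y = np.array([0] * 950 + [1] * 50)
X = np.zeros((1000, 1))  # features are irrelevant for this baseline

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

print(accuracy_score(y, y_pred))  # 0.95 -- looks strong
print(recall_score(y, y_pred))    # 0.0  -- every minority case is missed
```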
Studies have shown that ML methods trained on imbalanced datasets tend to favor the majority class, resulting in poor predictive accuracy for the minority class, which is often the class of greatest interest [3,4]. This issue is particularly critical in domains such as healthcare, where rare conditions or events are of high importance. For instance, in asthma management, severe class imbalance can hinder the performance of models predicting asthma attacks [5,6]. Similarly, in remote sensing imagery classification, datasets often exhibit significant imbalance, necessitating careful consideration of evaluation metrics and classification techniques [7,8].
To address class imbalance, researchers employ data-level and algorithm-level techniques. Data-level methods include oversampling techniques like the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic samples for the minority class, and undersampling methods that reduce the number of majority class samples [5,7]. Algorithm-level methods involve modifying the learning algorithm itself, such as cost-sensitive learning, which assigns higher penalties for misclassifying minority class samples [7,8].
SMOTE, in particular, generates synthetic examples of the minority class rather than replicating existing instances [4,6]. It selects a minority class instance and creates synthetic examples along the line segments connecting that instance to its nearest minority class neighbors [9,10]. This approach expands the minority class's feature space, enabling classifiers to identify decision regions more favorable to minority instances. Unlike simple oversampling methods, which risk overfitting by duplicating data, SMOTE generates diverse, unique samples, helping models generalize better to unseen data [10].
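The interpolation step can be sketched in a few lines. The following is a minimal illustration of the core SMOTE idea only, not the full published algorithm (which also handles per-instance neighbor selection and the desired oversampling ratio):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, k=5, rng=np.random.default_rng(0)):
    """Generate one synthetic minority sample by SMOTE-style interpolation."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: a point is its own neighbor
    i = rng.integers(len(X_min))                         # pick a minority instance
    _, idx = nn.kneighbors(X_min[i : i + 1])
    j = rng.choice(idx[0][1:])                           # pick one of its k neighbors
    lam = rng.random()                                   # interpolation factor in [0, 1]
    return X_min[i] + lam * (X_min[j] - X_min[i])        # point on the connecting segment

X_min = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2],
                  [1.1, 2.1], [1.3, 2.0], [1.0, 1.8]])
print(smote_sample(X_min))
```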
SMOTE has demonstrated its effectiveness across various domains, including fraud detection, medical diagnosis, and risk management. For example, in medical datasets where certain diseases are rare, SMOTE can improve detection rates, ultimately contributing to better patient outcomes [11]. Additionally, SMOTE is algorithm-independent, making it a versatile preprocessing step for any classification algorithm. Its simplicity and effectiveness have led to the development of extensions like Borderline-SMOTE, which focuses on generating synthetic samples near decision boundaries to improve classification in critical regions [12].
Despite its advantages, SMOTE has limitations. The introduction of synthetic data can sometimes lead to overlapping between classes, potentially reducing model performance. To address this, hybrid approaches like SMOTE combined with Edited Nearest Neighbors (SMOTEENN) have been proposed [11].
SMOTEENN integrates SMOTE with the Edited Nearest Neighbors (ENN) algorithm to enhance classifier performance on imbalanced data [11]. SMOTE generates synthetic minority class examples by interpolating between existing minority instances and their nearest neighbors, increasing the minority class population without duplicating samples. ENN then refines the dataset by removing noisy and ambiguous instances from both the majority and minority classes, particularly those misclassified by their k-nearest neighbors. This sequential process not only balances the class distribution but also improves data quality by reducing noise, leading to better classification performance compared to using SMOTE or ENN alone [11].
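In practice, this sequential resampling is available off the shelf. A minimal sketch using the imbalanced-learn library on an illustrative synthetic dataset (not the exact configuration used in this study):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN

# Illustrative 95/5 imbalanced classification data
X, y = make_classification(n_samples=1000, weights=[0.95], flip_y=0.02,
                           random_state=42)
print("Before:", Counter(y))  # roughly {0: 950, 1: 50}

# SMOTE oversamples the minority class, then ENN prunes noisy/ambiguous points
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))
```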
Building on the work of Joseph et al., who applied regressive machine learning algorithms for the real-time monitoring of bedridden patients at risk of falls [13], this study aims to enhance predictive performance using the SMOTEENN technique. Joseph et al. analyzed six unique movements commonly performed by patients on hospital beds and developed an automated risk assessment system to reduce the need for constant monitoring by hospital staff [14]. They addressed class imbalance in their dataset using SMOTE to improve model performance.
This study evaluates the effectiveness of SMOTE and SMOTEENN in addressing class imbalance and improving predictive accuracy. By applying SMOTEENN to the same dataset used by Joseph et al., the goal is to determine whether this hybrid approach provides improvements over SMOTE alone. Assessing the performance of SMOTEENN in this context has the potential to contribute to the development of more accurate and reliable real-time monitoring systems for fall risk assessment in bedridden patients, ultimately reducing the incidence of falls without requiring constant supervision by hospital staff.
This study focuses on demonstrating the practical efficacy of existing resampling algorithms, namely SMOTE and SMOTEENN, in addressing class imbalance within the context of regression-based ML models. While these techniques are not novel in themselves, their application and performance in the specific domain of fall risk assessment for bedridden patients using real-time data have not been extensively explored. By conducting a comprehensive comparative analysis across various sample sizes and regression models, this research aims to provide insights into the effectiveness of SMOTE and SMOTEENN in improving predictive accuracy and model generalization in this critical healthcare application. The findings of this study can contribute to the development of more reliable and accurate real-time monitoring systems, ultimately reducing the need for constant patient supervision and enhancing patient safety.
3. Results
This section presents the results of the comparative analysis between SMOTE and SMOTEENN in addressing class imbalance. The first subsection examines the learning curves of both methods across various sampling strategies, providing insights into their generalization capabilities. The second subsection compares the performance of SMOTE and SMOTEENN at different sample sizes, analyzing their impact on model accuracy, R2 values, and mean squared error. The final subsection discusses the cross-validation analysis, highlighting the mean accuracy and standard deviation of both methods across different sampling levels.
3.1. Learning Curves
Recognizing healthy learning curves is essential for developing robust and reliable models across various applications. A learning curve typically plots a model’s performance, such as accuracy or loss, against the number of training instances or epochs. A healthy learning curve demonstrates a model’s ability to learn effectively from the data while generalizing well to unseen instances.
A healthy learning curve is characterized by a steady increase in both training and validation performance over time, eventually plateauing at a high level. The validation performance should consistently trail slightly behind the training performance, indicating the model’s capacity to generalize without severe overfitting. This gap between training and validation curves is crucial, as a large gap may suggest overfitting, while a non-existent or very small gap could indicate underfitting or data leakage.
Recognizing healthy learning curves is vital for several reasons. First, it allows researchers and practitioners to select the most appropriate model for a given task. By analyzing learning curves, one can identify models that exhibit a balanced trade-off between training and validation performance, ensuring that the chosen model can generalize well to new data. Second, learning curves serve as a guide for hyperparameter tuning, enabling adjustments to optimize a model’s performance and convergence.
Moreover, learning curves are essential for detecting overfitting, a common challenge in machine learning where a model performs exceptionally well on the training data but poorly on unseen data. By monitoring learning curves, practitioners can identify early signs of overfitting and implement regularization techniques or modify the model architecture to improve generalization.
In high-stake applications, such as medical diagnosis or financial risk assessment, recognizing healthy learning curves is particularly crucial. Models with healthy learning curves are more likely to capture meaningful patterns in the data rather than memorizing noise or dataset-specific characteristics. This reliability is essential for deploying machine learning models in real-world scenarios where accurate predictions can have significant consequences.
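As an illustration of how such curves can be produced, the following is a generic sketch with scikit-learn's learning_curve utility (illustrative model and data, not the exact configuration used in this study):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=600, random_state=0)

# Training/validation scores at increasing training-set sizes, 5-fold CV
sizes, train_scores, val_scores = learning_curve(
    GradientBoostingClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")  # healthy: val trails train slightly
```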
Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5 compare the learning curves for SMOTE and SMOTEENN across different sampling strategies (100, 150, 200, 250, and 300 instances). The validation accuracy for SMOTE shows some improvement as the number of instances increases, eventually converging to the training score. However, the absence of a small gap between the training and validation curves, in which the validation curve trails slightly below the training curve, is a concern. This lack of a gap suggests that the model may not generalize well to unseen data, despite the convergence. Hence, the SMOTE learning curves are unhealthy across all sampling strategies, even though the validation score converges to the training score. The faster convergence at higher sampling strategies (e.g., 300 instances) does not necessarily indicate a healthier learning curve, as the absence of a gap remains a critical issue.
The validation accuracy for SMOTEENN shows a more promising trend. It gradually increases with the number of instances and appears to be approaching a plateau, particularly at a sampling strategy of 200 instances (Figure 3). The presence of a small gap between the training and validation curves, where the validation curve is slightly lower than the training curve, indicates better generalization capabilities. The steady increase in the validation score is best observed at a sampling strategy of 200 instances. However, for values higher than 200 instances, the validation score starts to worsen again, suggesting a potential trade-off between sample size and model performance.
3.2. Performance Comparison at Different Sample Sizes
Across all sample sizes (100 to 300), SMOTEENN consistently outperforms SMOTE in terms of accuracy for all three models (Decision Tree Regression (DTR), Gradient Boosting Regression (GBR), and Bayesian Regression (BR)). The difference in accuracy between SMOTE and SMOTEENN is most pronounced at the smallest sample size (100), with SMOTEENN achieving an accuracy up to 13.5 percentage points higher (e.g., BR with SMOTEENN at 0.96 vs. BR with SMOTE at 0.825). As the sample size increases, the accuracy gap between SMOTE and SMOTEENN narrows, but SMOTEENN still maintains a higher accuracy across all models and sample sizes.
For both training and test sets, SMOTE consistently achieves perfect or near-perfect R2 values (1.0 or 0.99) across all models and sample sizes, indicating potential overfitting. SMOTEENN also achieves high R2 values, but they are slightly lower than SMOTE’s, particularly for the training set. This suggests that SMOTEENN may be less prone to overfitting compared to SMOTE. The test set R2 values for SMOTEENN are generally higher than those for SMOTE, indicating better generalization performance.
When comparing SMOTE to SMOTEENN, differences in Mean Squared Error (MSE) can reflect how well each approach handles imbalanced class distributions. A high MSE indicates that the model has larger errors in its predictions and its estimated values deviate significantly from the true targets. A low MSE suggests that the model’s predictions closely match the actual values, reflecting better accuracy. SMOTEENN consistently achieves lower MSE values for both training and test sets across all models and sample sizes compared to SMOTE. The difference in MSE between SMOTE and SMOTEENN is most significant for the test set, with SMOTEENN achieving up to 81.6% lower MSE (e.g., DTR with SMOTEENN at 0.07 vs. DTR with SMOTE at 0.38 for sample size 100). As the sample size increases, the MSE values for both SMOTE and SMOTEENN generally decrease, but SMOTEENN maintains a lower MSE across all models and sample sizes.
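For reference, the MSE over $n$ predictions follows the standard definition

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ denotes the true target and $\hat{y}_i$ the model's prediction for instance $i$.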
The numerical results presented in Table 1, Table 2, Table 3, Table 4 and Table 5 demonstrate that SMOTEENN consistently outperforms SMOTE across all three models (decision tree regression, gradient boosting regression, and Bayesian regression) and across all sampling strategies (from 100 to 300 instances). For example, at the smallest sample size of 100 instances, SMOTEENN achieved accuracies ranging from 0.93 to 0.96 across the three models, compared to accuracies ranging from 0.81 to 0.825 for SMOTE. Similarly, the MSE for SMOTEENN ranged from 0.04 to 0.07 on the test set, while SMOTE had MSE values between 0.15 and 0.38. These results highlight the superior performance of SMOTEENN in handling class imbalance and improving predictive accuracy in the context of fall risk assessment for bedridden patients.
3.3. Cross-Validation Analysis
Table 6 presents the cross-validation results for both SMOTE and SMOTEENN across different sampling strategies (100 to 300 instances). The table includes the mean accuracy and standard deviation for each method at each sampling level. SMOTEENN consistently achieves higher mean accuracy compared to SMOTE across all sampling strategies, with the exception of the 250-instance level, where both methods have the same mean accuracy of 0.97. The highest mean accuracy for SMOTEENN is observed at the 150-instance level (0.99), while for SMOTE, it is at the 250- and 300-instance levels (0.97). SMOTE shows a steady increase in mean accuracy as the number of instances increases, from 0.94 at 100 instances to 0.97 at 300 instances. SMOTEENN's mean accuracy peaks at 150 instances and then slightly decreases at higher instance levels, suggesting an optimal point for this method.
The standard deviation for SMOTE ranges from 0.010 to 0.030 across different sampling strategies, indicating relatively consistent performance. SMOTEENN’s standard deviation ranges from 0.008 to 0.029, with the lowest value observed at the 200-instance level (0.008). Both methods show a decrease in standard deviation as the number of instances increases, suggesting improved stability in performance with larger sample sizes. The peak mean accuracy of SMOTEENN at the 150-instance level indicates that this sampling strategy may be optimal for achieving the best predictive performance. The slight decrease in mean accuracy at higher instance levels for SMOTEENN suggests that there may be a trade-off between sample size and model performance, as also observed in the learning curves discussed earlier.
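A generic sketch of how per-method mean accuracy and standard deviation can be obtained under cross-validation (illustrative, not the study's exact protocol; placing the resampler inside an imbalanced-learn pipeline re-fits it on each training fold only, so synthetic points do not leak into validation folds):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from imblearn.combine import SMOTEENN
from imblearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

# Resampling inside the pipeline is applied to each training fold only
pipe = make_pipeline(SMOTEENN(random_state=0),
                     DecisionTreeClassifier(random_state=0))
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```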
4. Discussion
Recent advances in addressing class imbalance for regression tasks have built upon techniques originally devised for imbalanced classification. One common approach is to generate synthetic data for underrepresented classes using variants of SMOTE. Although SMOTE was initially proposed for classification, researchers have adapted it to regression settings by interpolating both feature inputs and the target variable, aiming to produce realistic data points that improve the model's ability to learn from scarce examples [22]. Extensions of SMOTE for regression, such as SMOGN (SMOTE for Regression with Gaussian Noise), incorporate noise within the interpolation step to better capture the variability inherent to continuous target variables [23]. These strategies generally align with findings that data-level manipulations can substantially increase performance when class or target distributions are heavily skewed.
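A minimal sketch of this idea, interpolating both features and the continuous target with an optional Gaussian perturbation in the spirit of SMOGN (a simplified illustration only; the published algorithm additionally weighs "rare" target regions via a relevance function):

```python
import numpy as np

def regression_oversample(X, y, i, j, noise_std=0.0, rng=np.random.default_rng(0)):
    """Interpolate instance i toward neighbor j in both X and y, plus optional noise."""
    lam = rng.random()
    x_new = X[i] + lam * (X[j] - X[i])
    y_new = y[i] + lam * (y[j] - y[i])   # target interpolated alongside the features
    if noise_std > 0:                    # SMOGN-style Gaussian perturbation
        x_new = x_new + rng.normal(0.0, noise_std, size=x_new.shape)
        y_new = y_new + rng.normal(0.0, noise_std)
    return x_new, y_new

X = np.array([[0.0, 1.0], [0.2, 1.1], [0.1, 0.9]])
y = np.array([10.0, 12.0, 11.0])
print(regression_oversample(X, y, 0, 1, noise_std=0.05))
```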
Beyond data-level methods, cost-sensitive learning has gained traction as another potent strategy for tackling class imbalance in regression. In cost-sensitive approaches, the learning algorithm's loss function is modified to penalize errors on the minority (or difficult-to-predict) segments of the regression span more heavily [24,25]. This technique effectively shifts the model's emphasis toward minimizing high-cost errors, thus promoting more balanced predictions across uneven target distributions. By customizing the cost function for imbalanced regression, models can focus on critical or rare target ranges without unduly sacrificing performance in better-represented regions [26]. Although cost-sensitive learning is widely applied in classification scenarios, multiple studies have underscored its adaptability and success in imbalanced regression problems as well [25,27].
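One simple realization of this idea is to pass larger sample weights for rare target ranges to a standard regressor. The sketch below is illustrative only: the cited works modify the loss function itself, and the upweighting rule (10x weight above the 95th percentile) is an assumption chosen for demonstration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

# Illustrative rule: upweight the rare upper tail of the target distribution
weights = np.where(y > np.quantile(y, 0.95), 10.0, 1.0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X, y, sample_weight=weights)  # errors on rare targets now cost 10x more
```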
A growing body of work also explores ensemble-based techniques, such as boosting and bagging, specifically tailored for imbalanced regression [28]. These methods combine multiple weak learners and adaptively focus on regions of the feature space where performance is poorer. For instance, recent studies have integrated oversampling or undersampling processes within a boosting architecture, allowing each iteration to learn more effectively from challenging subsets of the data [28]. Similarly, hybrid methods merge data augmentation strategies (like SMOTE variants) with cost-sensitive ensembles, yielding promising results in predictive power and model stability [22,28]. Overall, the continued refinement of synthetic oversampling, cost-sensitive optimization, and ensemble methods underscores the field's commitment to developing robust solutions for imbalanced regression tasks across diverse real-world applications.
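As one concrete classification-side example of resampling inside a boosting loop, imbalanced-learn provides RUSBoost, which randomly undersamples the majority class in each boosting round (a brief illustration on synthetic data; the regression-oriented methods in the cited work follow a similar pattern with continuous targets):

```python
from sklearn.datasets import make_classification
from imblearn.ensemble import RUSBoostClassifier

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

# Random undersampling of the majority class inside each boosting iteration
clf = RUSBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```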
SMOTEENN combines SMOTE's synthetic data generation with Edited Nearest Neighbors (ENN) to remove noisy and borderline instances from the enriched dataset, leading to more refined and balanced training samples [17]. This dual-stage process not only addresses the core issue of minority class underrepresentation but also reduces confusion between classes by pruning ambiguous data points, an essential step often omitted in other SMOTE variants [22]. As a result, SMOTEENN has been shown to produce cleaner decision boundaries and more robust performance metrics, including higher accuracy and lower error rates in both classification and regression tasks [22,28]. Studies indicate that this enhanced data purity and increased representativeness allow learning algorithms to generalize more effectively, ultimately making SMOTEENN a superior resampling choice for heavily imbalanced datasets compared to stand-alone SMOTE or other SMOTE-based techniques.
The SMOTE learning curves consistently exhibit unhealthy characteristics across all sampling strategies due to the absence of a small gap between training and validation performance, despite the validation score eventually converging to the training score. This convergence suggests that the model may be overfitting and struggling to generalize effectively to unseen data. In contrast, the SMOTEENN learning curves demonstrate healthier traits, particularly at a sampling strategy of 200 instances, where a small gap between training and validation performance is evident, and the validation score shows a steady increase. This indicates a more balanced learning curve and improved generalization capabilities. However, the validation score worsens at higher sampling strategies, suggesting an optimal point for the number of instances beyond which the model’s performance may degrade.
The differences in values between SMOTE and SMOTEENN across Table 1, Table 2, Table 3, Table 4 and Table 5 suggest that SMOTEENN is a more effective technique for handling class imbalance in the context of this study on fall risk assessment for bedridden patients. The higher accuracy and lower MSE values achieved by SMOTEENN indicate that it can lead to more accurate and reliable predictive models for fall risk assessment compared to using SMOTE alone. This improvement in predictive performance is crucial for developing effective real-time monitoring systems that can reduce the need for constant patient supervision. The slightly lower R2 values for SMOTEENN on the training set and the higher R2 values on the test set suggest that SMOTEENN may help the models generalize better when using unseen data. This is important for ensuring that the fall risk assessment models can perform well on new, real-world patient data.
The consistent superior performance of SMOTEENN across different sample sizes (from 100 to 300) demonstrates its robustness and effectiveness in handling class imbalance, regardless of the number of instances per class. This is particularly valuable in the context of this particular dataset, as it suggests that SMOTEENN can be applied successfully to datasets with varying levels of class imbalance. By mitigating overfitting, SMOTEENN can contribute to the development of more reliable and generalizable fall risk assessment models.
The performance of SMOTEENN at a sampling strategy of 300 instances is lower compared to the optimal point at 200 instances. This finding underscores the importance of identifying the right balance in data augmentation when using SMOTEENN. This study demonstrates that more instances do not always lead to better performance, and there is an optimal point that needs to be carefully determined. The slight decrease in validation score at higher sampling strategies, such as 300 instances, suggests that the model may be overfitting to the training data, leading to reduced generalization capabilities. This highlights the need for practitioners to thoroughly evaluate the impact of different sampling strategies on model performance when applying SMOTEENN to their specific datasets.
The decreasing standard deviation for both methods as the number of instances increases suggests improved stability and generalization abilities with larger sample sizes. The lower standard deviation values for SMOTEENN, particularly at the 200-instance level, indicate that this method may provide more consistent and reliable performance across different cross-validation folds. The superior performance of SMOTEENN in terms of mean accuracy and stability suggests that this method could contribute to the development of more accurate and reliable real-time monitoring systems for fall risk assessment in bedridden patients.
Despite its effectiveness in balancing skewed datasets, SMOTEENN has several limitations that could pose challenges for certain machine learning tasks. For one, this technique relies heavily on a proper parameter configuration, such as the number of nearest neighbors used both in the synthetic oversampling step (from SMOTE) and in the data-cleaning step (from ENN). Incorrect parameter choices can either fail to eliminate noisy or borderline samples or, conversely, remove too many informative data points, thus impairing model accuracy. Moreover, the reliance on nearest-neighbor searches increases computational costs, particularly for large-scale or high-dimensional datasets. These overheads can be significant and might not be justified in some real-world applications where real-time computation is critical.
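The two neighbor counts in question can be set explicitly when constructing the resampler. A sketch with imbalanced-learn follows; the values shown are the library defaults, not tuned recommendations:

```python
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours

# k_neighbors controls SMOTE's interpolation; n_neighbors controls ENN's cleaning.
# Too-small values may leave noise in place; too-large values may prune
# informative borderline samples.
resampler = SMOTEENN(
    smote=SMOTE(k_neighbors=5, random_state=0),
    enn=EditedNearestNeighbours(n_neighbors=3),
    random_state=0,
)
```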
Another drawback pertains to potential data distortion. While SMOTE creates synthetic samples by interpolating between existing minority samples, these newly generated instances might oversimplify the true data distribution, especially in complex, multimodal feature spaces. Although the Edited Nearest Neighbors step filters out some misaligned data points, it may also remove borderline examples that are actually representative of natural transitions in the dataset. This balancing act can sometimes create gaps in the data space, ultimately affecting a model's ability to generalize to unseen scenarios [19]. Beyond that, SMOTEENN's focus on minority-class oversampling and noise removal may not adequately address other subtleties in real-world datasets, such as mislabeled examples or concept drift, where the underlying data distribution can change over time.
The primary focus of this study is to demonstrate the practical efficacy of SMOTE and SMOTEENN in addressing class imbalance within the context of regression-based machine learning models for fall risk assessment in bedridden patients. The study systematically evaluates the performance of these techniques across smaller, imbalanced datasets. These techniques are specifically designed to address the challenges posed by class imbalance in such datasets by generating synthetic data for the minority class. In scenarios where large, balanced datasets are available, the need for synthetic data generation diminishes, as traditional ML algorithms can effectively learn from the abundant data. However, in real-world applications, particularly in domains such as healthcare where rare conditions or events are of high importance, smaller datasets with class imbalance are prevalent.
The focus of future studies will be on comparing SMOTE and SMOTEENN with other techniques currently in use. This comparison is important for evaluating the practical application of these methods in real-world scenarios. The findings from such studies can guide practitioners in selecting the most appropriate resampling technique for their specific use case, particularly when dealing with imbalanced datasets and regression models.
5. Conclusions
In summary, this study demonstrates that SMOTEENN, a hybrid approach combining oversampling and undersampling techniques, consistently outperforms SMOTE in handling class imbalance across various ML models and sampling strategies. SMOTEENN achieved higher accuracy, lower mean squared error, and better generalization capabilities, as evidenced by the healthier learning curves and cross-validation results. The optimal performance of SMOTEENN was observed at a sampling strategy of 200 instances, highlighting the importance of finding the right balance in data augmentation. These findings suggest that SMOTEENN can significantly improve the reliability and accuracy of predictive models in imbalanced classification tasks, potentially leading to more effective applications in diverse fields.
Future research would benefit from exploring an even wider array of hybrid systems that extend beyond SMOTE and Edited Nearest Neighbors. For instance, approaches that integrate ensemble learning techniques (e.g., boosting or bagging) with modern data-level or algorithm-level interventions could yield powerful synergistic effects. One possibility is to combine cost-sensitive learning or active learning frameworks with SMOTE-based oversampling to further optimize the selection and generation of synthetic examples. By doing so, researchers could reduce the likelihood of committing errors in borderline regions of the dataset, an issue that sometimes arises with SMOTE-only or SMOTEENN approaches.
Evaluating the performance of SMOTEENN in various real-world domains beyond medical monitoring systems also remains an open direction. While this study highlighted its usefulness in fall risk assessment for bedridden patients, other fields often suffer from skewed data distributions. A systematic evaluation in these different contexts would uncover new insights into how data characteristics (e.g., noise levels and feature dimensionality) influence the effectiveness of SMOTEENN. Such comparative studies could also reveal domain-specific adaptations or parameter settings that maximize performance. Ultimately, expanding the breadth of applications tested will help solidify SMOTEENN’s standing as a robust and generalizable resampling solution for modern machine learning tasks.