Article

SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models

1
Department of Anatomy, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
2
Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
3
The Ferrara Center for Patient Safety and Clinical Simulation, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
*
Author to whom correspondence should be addressed.
Algorithms 2025, 18(1), 37; https://doi.org/10.3390/a18010037
Submission received: 20 December 2024 / Revised: 8 January 2025 / Accepted: 9 January 2025 / Published: 10 January 2025
(This article belongs to the Special Issue Algorithms in Data Classification (2nd Edition))

Abstract

Class imbalance is a prevalent challenge in machine learning that arises from skewed data distributions in one class over another, causing models to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine accurate predictions in real-world scenarios, highlighting the importance of the robust handling of imbalanced data for dependable results. This study examines one such scenario of real-time monitoring systems for fall risk assessment in bedridden patients where class imbalance may compromise the effectiveness of machine learning. It compares the effectiveness of two resampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTEENN), in mitigating class imbalance and improving predictive performance. Using a controlled sampling strategy across various instance levels, the performance of both methods in conjunction with decision tree regression, gradient boosting regression, and Bayesian regression models was evaluated. The results indicate that SMOTEENN consistently outperforms SMOTE in terms of accuracy and mean squared error across all sample sizes and models. SMOTEENN also demonstrates healthier learning curves, suggesting improved generalization capabilities, particularly for a sampling strategy with a given number of instances. Furthermore, cross-validation analysis reveals that SMOTEENN achieves higher mean accuracy and lower standard deviation compared to SMOTE, indicating more stable and reliable performance. These findings suggest that SMOTEENN is a more effective technique for handling class imbalance, potentially contributing to the development of more accurate and generalizable predictive models in various applications.

1. Introduction

Class imbalance is a prevalent challenge in machine learning (ML), particularly in classification tasks where the distribution of data classes is significantly skewed. This occurs when one class, referred to as the majority class, vastly outnumbers another, known as the minority class [1,2]. Such imbalances can lead to models that perform well on the majority class but poorly on the minority class. For example, in a dataset where 95% of samples belong to one class and only 5% to another, a model could achieve 95% accuracy by always predicting the majority class, while completely failing to identify instances of the minority class.
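The 95%/5% example above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a toy label vector and a trivial model that always predicts the majority class; the variable names are illustrative and do not come from the study.

```python
# 95 majority samples (label 0) and 5 minority samples (label 1), as in the example above.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * len(y_true)  # trivial model: always predict the majority class

# Fraction of all samples predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of minority samples that were actually identified (minority-class recall).
minority_hits = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = minority_hits / y_true.count(1)

print(accuracy, minority_recall)
```

The accuracy is high even though no minority instance is ever detected, which is why accuracy alone can be misleading on imbalanced data.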
Studies have shown that ML methods trained on imbalanced datasets tend to favor the majority class, resulting in poor predictive accuracy for the minority class, which is often the class of the greatest interest [3,4]. This issue is particularly critical in domains such as healthcare, where rare conditions or events are of high importance. For instance, in asthma management, severe class imbalance can hinder the performance of models predicting asthma attacks [5,6]. Similarly, in remote sensing imagery classification, datasets often exhibit significant imbalance, necessitating the careful consideration of evaluation metrics and classification techniques [7,8].
To address class imbalance, researchers employ data-level and algorithm-level techniques. Data-level methods include oversampling techniques like the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic samples for the minority class, and undersampling methods that reduce the number of majority class samples [5,7]. Algorithm-level methods involve modifying the learning algorithm itself, such as cost-sensitive learning which assigns higher penalties for misclassifying minority class samples [7,8].
SMOTE, in particular, generates synthetic examples of the minority class rather than replicating existing instances [4,6]. It selects a minority class instance and creates synthetic examples along the line segments connecting that instance to its nearest minority class neighbors [9,10]. This approach expands the minority class’ feature space, enabling classifiers to identify decision regions more favorable to minority instances. Unlike simple oversampling methods, which risk overfitting by duplicating data, SMOTE generates diverse, unique samples, helping models generalize better to unseen data [10].
SMOTE has demonstrated its effectiveness across various domains, including fraud detection, medical diagnosis, and risk management. For example, in medical datasets where certain diseases are rare, SMOTE can improve detection rates, ultimately contributing to better patient outcomes [11]. Additionally, SMOTE is algorithm-independent, making it a versatile preprocessing step for any classification algorithm. Its simplicity and effectiveness have led to the development of extensions like Borderline-SMOTE, which focuses on generating synthetic samples near decision boundaries to improve classification in critical regions [12].
Despite its advantages, SMOTE has limitations. The introduction of synthetic data can sometimes lead to overlapping between classes, potentially reducing model performance. To address this, hybrid approaches like SMOTE combined with Edited Nearest Neighbors (SMOTEENN) have been proposed [11].
SMOTEENN integrates SMOTE with the Edited Nearest Neighbors (ENN) algorithm to enhance classifier performance on imbalanced data [11]. SMOTE generates synthetic minority class examples by interpolating between existing minority instances and their nearest neighbors, increasing the minority class population without duplicating samples. ENN then refines the dataset by removing noisy and ambiguous instances from both the majority and minority classes, particularly those misclassified by their k-nearest neighbors. This sequential process not only balances the class distribution but also improves data quality by reducing noise, leading to better classification performance compared to using SMOTE or ENN alone [11].
Building on the work of Joseph et al., who applied regressive machine learning algorithms for the real-time monitoring of bedridden patients at risk of falls [13], this study aims to enhance predictive performance using the SMOTEENN technique. Joseph et al. analyzed six unique movements commonly performed by patients on hospital beds and developed an automated risk assessment system to reduce the need for constant monitoring by hospital staff [14]. They addressed class imbalance in their dataset using SMOTE to improve model performance.
This study evaluates the effectiveness of SMOTE and SMOTEENN in addressing class imbalance and improving predictive accuracy. By applying SMOTEENN to the same dataset used by Joseph et al., the goal is to determine whether this hybrid approach provides improvements over SMOTE alone. Assessing the performance of SMOTEENN in this context has the potential to contribute to the development of more accurate and reliable real-time monitoring systems for fall risk assessment in bedridden patients, ultimately reducing the incidence of falls without requiring constant supervision by hospital staff.
This study focuses on demonstrating the practical efficacy of existing resampling algorithms, namely SMOTE and SMOTEENN, in addressing class imbalance within the context of regression-based ML models. While these techniques are not novel in themselves, their application and performance in the specific domain of fall risk assessment for bedridden patients using real-time data have not been extensively explored. By conducting a comprehensive comparative analysis across various sample sizes and regression models, this research aims to provide insights into the effectiveness of SMOTE and SMOTEENN in improving predictive accuracy and model generalization in this critical healthcare application. The findings of this study can contribute to the development of more reliable and accurate real-time monitoring systems, ultimately reducing the need for constant patient supervision and enhancing patient safety.

2. Materials and Methods

This section outlines the methodologies employed to compare the effectiveness of SMOTE and SMOTEENN in addressing class imbalance. The first subsection describes the data collection methods used to gather the dataset analyzed in this study. The second subsection discusses the machine learning models and sampling strategies applied to evaluate the performance of SMOTE and SMOTEENN. The third and fourth subsections provide detailed explanations of the implementation of SMOTE and SMOTEENN, respectively, including the algorithms’ steps and parameters.

2.1. Data Collection

Following the methodology of Joseph et al., this research uses their dataset, derived from a high-fidelity medical mannequin simulating common movements of bedridden patients. Joseph et al. adapted a mannequin, typically used in medical training simulations, to replicate inpatient movements and positions. Movement data were captured using strategically placed Movella DOT sensors, wearable platforms known for their signal processing and sensor fusion capabilities. Each sensor, comprising an accelerometer, gyroscope, and magnetometer, recorded velocities, accelerations, and Euler angles, transmitting data wirelessly via Bluetooth. Sensor fusion algorithms combined these data to precisely determine the mannequin’s orientation, generating a robust dataset of patient movements. These movements were categorized as “Roll right”, “Roll left”, “Drop right”, “Drop left”, “Breathing”, and “Seizure” [14]. While Joseph et al. employed SMOTE to address class imbalance within this dataset, our study investigates whether SMOTEENN, a hybrid resampling method, offers superior performance for predictive model accuracy in fall risk assessment, potentially contributing to more effective real-time monitoring systems and a reduced need for constant patient supervision.
The dataset included velocities, accelerations, and Euler angles at specific time intervals from a high-fidelity mannequin equipped with Movella DOT sensors [15]. Mayer et al. used sensor fusion algorithms to combine these components in order to accurately calculate the mannequin’s orientation [16]. Both adult-sized and infant-sized mannequins were used to simulate various patient demographics. Each movement type was repeated around 100 times for substantial data collection. Breathing data were collected by having the mannequin perform one full tidal volume inhalation–exhalation cycle for 3 min straight, while seizure data were gathered from a 10 min continuous seizure simulation. For rolling and falling off the bed movements, the infant mannequin was used, starting in the supine position and rolling approximately 90 degrees to either side before returning to the original position (repeated 100 times per side). Drop data were collected by rolling the mannequin beyond its left side until it fell off the bed, also repeated 100 times. These movements were categorized by Joseph et al. into six key labels in the dataset as follows: “Roll right” (0), “Roll left” (1), “Drop right” (2), “Drop left” (3), “Breathing” (4), and “Seizure” (5). Each label captures a common patient action or position, offering a comprehensive view of patient behavior in a bed setting.

2.2. Machine Learning

We employed a controlled sampling strategy for both SMOTE and SMOTEENN to ensure the balanced comparison of their performance across different levels of class representation. We aimed to evaluate the effectiveness of each technique, not just on a single balanced dataset, but across a range of sample sizes for each class within the target variable. This approach allowed us to observe how the performance of each resampling method scales with the number of instances per class, providing a more nuanced understanding of their strengths and weaknesses.
Specifically, we configured both SMOTE and SMOTEENN to generate synthetic samples such that each class in the target variable reached five distinct instance levels as follows: 100, 150, 200, 250, and 300. This systematic variation in sample size allowed us to assess the impact of the number of instances on the performance of each oversampling technique. For example, we could observe whether SMOTE or SMOTEENN was more beneficial in dealing with extremely limited minority class representation (100 instances) compared to a moderately imbalanced scenario (250 instances). This controlled approach ensured a fair comparison between SMOTE and SMOTEENN, and it provided insights into the optimal application scenarios for each method.
The synthetic instances generated by SMOTE and SMOTEENN are exclusively used to augment the training dataset, ensuring a more balanced representation of the minority class during model training. These synthetic samples are not included in the testing dataset to maintain the integrity of the evaluation process. By using the original, unaltered testing data, we can accurately assess the generalization capabilities of the trained models and their performance on unseen instances. This approach allows us to evaluate the effectiveness of the resampling techniques in improving predictive accuracy without introducing bias into the testing phase.
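The ordering described above (split first, then resample only the training portion) can be sketched as follows. This is an illustration rather than the study's code: the 20% test fraction, the function names, and the `duplicate_everything` placeholder (which is not SMOTE) are assumptions made for this example.

```python
import random

def split_then_resample(X, y, resample, test_fraction=0.2, seed=0):
    """Hold out a test set first, then apply `resample` to the training split only."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    # Synthetic samples are added to the training data only; the test data stay untouched.
    X_train, y_train = resample(X_train, y_train)
    return X_train, y_train, X_test, y_test

def duplicate_everything(X, y):
    """Hypothetical placeholder resampler (NOT SMOTE): simply doubles the training set."""
    return X + X, y + y

# Hypothetical toy data: ten one-dimensional samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_train, y_train, X_test, y_test = split_then_resample(X, y, duplicate_everything)
print(len(X_train), len(X_test))
```

Because the resampler never sees the held-out samples, every test instance is an original observation.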
The machine learning algorithms were run in Visual Studio Code version 1.86.2 with Python 3.12.3.

2.3. Implementation of SMOTE

The SMOTE algorithm was implemented using the following steps:
  • Identify minority samples in the dataset.
  • For each minority sample, find its k-nearest neighbors (typically k = 5).
  • Generate synthetic samples by interpolating between the minority sample and its randomly chosen nearest neighbors.
  • Repeat the process until the desired balance between classes is achieved.

2.3.1. Identifying Minority Samples

The initial step is identifying all samples belonging to the minority class within the dataset. The dataset D is formally defined as follows:
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$
where $x_i$ represents a sample and $y_i$ its corresponding class label. The minority class $S_{\min}$ is then defined as follows:
$$S_{\min} = \{(x_i, y_i) \in D \mid y_i = c_{\min}\}$$
Here, $c_{\min}$ denotes the minority class. This step is crucial as it lays the foundation for the subsequent oversampling process, ensuring that the algorithm focuses its efforts on the underrepresented class(es).
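As a minimal sketch of this step, the snippet below picks the least frequent label as the minority class and collects its samples. The function name and the toy data are hypothetical; with several underrepresented classes, as in the six-label dataset, the same selection would be repeated for each class below the target count.

```python
from collections import Counter

def minority_samples(X, y):
    """Return (c_min, S_min): the least frequent label and its (x_i, y_i) pairs."""
    counts = Counter(y)
    c_min = min(counts, key=counts.get)  # ties resolve to the label seen first
    s_min = [(x, label) for x, label in zip(X, y) if label == c_min]
    return c_min, s_min

# Hypothetical toy data: label 1 is the minority class.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 0, 1, 1]
c_min, s_min = minority_samples(X, y)
print(c_min, len(s_min))
```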

2.3.2. k-Nearest Neighbors

Once the minority samples are identified, SMOTE employs the k-Nearest Neighbor (KNN) algorithm to find similar samples in the feature space. This step is critical for maintaining the underlying distribution of the minority class while generating synthetic samples. The value of k, typically set to 5, is a hyperparameter that can significantly influence the quality of the synthetic samples. The distance between two samples is calculated using the Euclidean distance metric. The k-nearest neighbors of a sample $x_i$ are then defined as the following set:
$$\mathrm{KNN}(x_i) = \{x_j \in S_{\min} \mid d(x_i, x_j) \text{ is among the } k \text{ smallest distances}\},$$
where $x_j$ denotes a minority-class instance in $S_{\min}$ whose distance $d(x_i, x_j)$ to the query point $x_i$ ranks among the $k$ shortest. Only those $x_j$ yielding the $k$ smallest distances $d(x_i, x_j)$ are included in $\mathrm{KNN}(x_i)$. In other words, while $x_j$ may initially range over all samples in $S_{\min}$, the algorithm restricts the final set of neighbors to exactly $k$ points once it ranks their distances from $x_i$. This step is a standard feature of the KNN algorithm [17].
This KNN approach allows SMOTE to create synthetic samples that are statistically similar to existing minority samples, thereby preserving the inherent characteristics of the minority class [17].
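The neighbor search can be illustrated with a brute-force sketch that ranks minority points by Euclidean distance. The function name and sample coordinates are assumptions made for this example; library implementations typically use faster index structures.

```python
import math

def k_nearest_neighbors(x_i, s_min, k=5):
    """Return the k points of s_min closest to x_i (Euclidean distance), excluding x_i itself."""
    candidates = [x_j for x_j in s_min if x_j is not x_i]
    candidates.sort(key=lambda x_j: math.dist(x_i, x_j))
    return candidates[:k]

# Hypothetical minority-class feature vectors.
s_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [10.0, 10.0]]
neighbors = k_nearest_neighbors(s_min[0], s_min, k=2)
print(neighbors)
```

With k = 2, the query point [0.0, 0.0] keeps only its two closest minority neighbors, and the distant point [10.0, 10.0] never contributes to interpolation.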

2.3.3. Synthetic Sample Generation

The core of the SMOTE algorithm lies in its synthetic sample generation process. This step is where the algorithm demonstrates its power in addressing class imbalance by creating new, artificial samples of the minority class. For each minority sample $x_i$, SMOTE generates N synthetic samples, where N is determined by the desired oversampling rate. The process for generating each synthetic sample $x_{\text{new}}$ is as follows:
  • Randomly select $x_j$ from $\mathrm{KNN}(x_i)$.
  • Generate a synthetic sample as follows:
    $$x_{\text{new}} = x_i + \lambda (x_j - x_i)$$
    where λ is a random number in [0, 1].
This approach ensures that the synthetic samples are created along the line segment between the minority sample and its neighbor, effectively populating the feature space in the vicinity of the existing minority samples. By using a random λ, SMOTE introduces variability in the synthetic samples, preventing the creation of exact duplicates and reducing the risk of overfitting [18].
If λ were fixed, all synthetic samples derived from the same pair of minority points would end up in the same location, risking overfitting or poor generalization. By contrast, a random λ spreads the generated points along the segment between the two samples, approximating the underlying minority data distribution more closely. As a result, this approach can help machine learning models form more balanced decision boundaries, ultimately improving predictive performance on imbalanced datasets, as seen in previous studies [19,20].
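The interpolation step can be sketched in a few lines. The function name, the fixed seed, and the two sample points are assumptions for illustration; note that Python's `random()` draws from the half-open interval [0, 1), which is a negligible difference from [0, 1] here.

```python
import random

def interpolate(x_i, x_j, rng):
    """Return x_new = x_i + lam * (x_j - x_i), with lam drawn uniformly from [0, 1)."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
x_i, x_j = [1.0, 2.0], [3.0, 6.0]  # hypothetical minority sample and one of its neighbors
samples = [interpolate(x_i, x_j, rng) for _ in range(5)]
for s in samples:
    print(s)
```

Each generated point lies on the segment between the two samples, and repeated draws for the same pair land at different positions along it.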

2.3.4. Oversampling Rate

This synthetic sample generation process is repeated for each minority sample and as many times as needed to achieve the desired minority-to-majority class ratio. The number of synthetic samples to generate can be calculated based on the required oversampling percentage. By iteratively augmenting the minority class, the algorithm adjusts the class distribution as follows:
$$\text{Desired Minority Samples} = \text{Original Minority Samples} \times \left(1 + \frac{\text{Oversampling Percentage}}{100}\right)$$
Repeating the interpolation ensures that the synthetic samples are diverse and spread across the minority class feature space, enhancing the classifier’s ability to generalize [20].
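The oversampling formula and its inverse can be worked through numerically. The sketch below uses a hypothetical minority class of 60 original samples; the function names and the counts are illustrative and are not taken from the dataset.

```python
def desired_minority_samples(original, oversampling_percentage):
    """Apply desired = original * (1 + percentage / 100), rounded to a whole sample count."""
    return round(original * (1 + oversampling_percentage / 100))

def percentage_for_target(original, target):
    """Invert the formula: the oversampling percentage needed to reach a target count."""
    return (target / original - 1) * 100

# Hypothetical minority class with 60 original samples.
after_150_percent = desired_minority_samples(60, 150)  # 60 * (1 + 1.5)
needed_for_300 = percentage_for_target(60, 300)        # (300 / 60 - 1) * 100
print(after_150_percent, needed_for_300)
```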

2.4. Implementation of SMOTEENN

SMOTEENN is a two-step technique used to better handle imbalanced datasets by combining the benefits of oversampling with data cleaning. In the first step, SMOTE generates new samples for the minority class. This approach helps counteract the issue of having too few training instances in the minority class and mitigates the risk of overfitting that can arise with the naive replication of rare samples. However, placing newly generated data in areas near class decision boundaries or within noisy regions can sometimes lower model performance because misaligned synthetic points may blur class distinctions [19].
Once these synthetic samples have been added, the second step of SMOTEENN applies Edited Nearest Neighbors (ENN) to refine the data further. Here, every instance, whether it is an original minority sample, a synthetic point created by SMOTE, or a majority-class sample, undergoes a nearest-neighbor check. If an instance is misclassified when evaluated against its k-nearest neighbors, ENN removes it from the dataset. This helps eliminate not only incorrectly placed synthetic points but also outliers and borderline cases from the original data that could obscure decision boundaries. ENN thus serves as a cleaning mechanism, ensuring the final training set is more coherent and less susceptible to noise [21]. While SMOTE addresses imbalanced class distributions by increasing the representation of the minority class, ENN safeguards data quality by removing problematic points, making SMOTEENN a powerful tool for improving classification performance on highly skewed datasets.

Applying Edited Nearest Neighbors (ENN) Undersampling

  • Define ENN Parameters: Determine the number of nearest neighbors to consider (usually denoted as k) when applying ENN.
  • Identify Noisy and Borderline Instances: For each sample in the dataset (after applying SMOTE), find its k nearest neighbors and examine the class labels.
  • Remove Misclassified Samples: If a sample’s class label differs from the majority of its neighbors’ class labels, consider it misclassified or noisy and remove it from the dataset.
  • Clean the Dataset: ENN helps in cleaning overlapping or ambiguous areas in the feature space, reducing the potential for misclassification due to noisy data [7].
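The ENN steps listed above can be sketched as a brute-force filter. This is a simplified illustration, assuming k = 3, Euclidean distance, and toy data; the function name is hypothetical, and each removal decision is made against the full post-SMOTE dataset rather than a progressively edited one.

```python
import math
from collections import Counter

def edited_nearest_neighbors(X, y, k=3):
    """Drop every sample whose label differs from the majority label of its k nearest neighbors."""
    keep = []
    for i, x_i in enumerate(X):
        # Indices of all other samples, ordered by Euclidean distance to x_i.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(x_i, X[j]))
        votes = Counter(y[j] for j in others[:k])
        majority_label = votes.most_common(1)[0][0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: a label-1 point sits inside the label-0 cluster.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
X_clean, y_clean = edited_nearest_neighbors(X, y, k=3)
print(len(X_clean), y_clean)
```

Only the point at [0.5, 0.5] is removed, since all of its nearest neighbors carry the other label, while both clusters are otherwise left intact.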

3. Results

This section presents the results of the comparative analysis between SMOTE and SMOTEENN in addressing class imbalance. The first subsection examines the learning curves of both methods across various sampling strategies, providing insights into their generalization capabilities. The second subsection compares the performance of SMOTE and SMOTEENN at different sample sizes, analyzing their impact on model accuracy, R2 values, and mean squared error. The final subsection discusses the cross-validation analysis, highlighting the mean accuracy and standard deviation of both methods across different sampling levels.

3.1. Learning Curves

Recognizing healthy learning curves is essential for developing robust and reliable models across various applications. A learning curve typically plots a model’s performance, such as accuracy or loss, against the number of training instances or epochs. A healthy learning curve demonstrates a model’s ability to learn effectively from the data while generalizing well to unseen instances.
A healthy learning curve is characterized by a steady increase in both training and validation performance over time, eventually plateauing at a high level. The validation performance should consistently trail slightly behind the training performance, indicating the model’s capacity to generalize without severe overfitting. This gap between training and validation curves is crucial, as a large gap may suggest overfitting, while a non-existent or very small gap could indicate underfitting or data leakage.
Recognizing healthy learning curves is vital for several reasons. First, it allows researchers and practitioners to select the most appropriate model for a given task. By analyzing learning curves, one can identify models that exhibit a balanced trade-off between training and validation performance, ensuring that the chosen model can generalize well to new data. Second, learning curves serve as a guide for hyperparameter tuning, enabling adjustments to optimize a model’s performance and convergence.
Moreover, learning curves are essential for detecting overfitting, a common challenge in machine learning where a model performs exceptionally well on the training data but poorly on unseen data. By monitoring learning curves, practitioners can identify early signs of overfitting and implement regularization techniques or modify the model architecture to improve generalization.
In high-stake applications, such as medical diagnosis or financial risk assessment, recognizing healthy learning curves is particularly crucial. Models with healthy learning curves are more likely to capture meaningful patterns in the data rather than memorizing noise or dataset-specific characteristics. This reliability is essential for deploying machine learning models in real-world scenarios where accurate predictions can have significant consequences.
Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5 compare the learning curves for SMOTE and SMOTEENN across different sampling strategies (100, 150, 200, 250, and 300 instances). The validation accuracy for SMOTE shows some improvement as the number of instances increases, eventually converging to the training score. However, the absence of a small gap between the training and validation curves, where the validation curve is slightly lower than the training curve, is a concern. This lack of a gap suggests that the model may not generalize well to unseen data, despite the convergence. Hence, the SMOTE learning curves are unhealthy across all sampling strategies due to the lack of a small gap between training and validation performance, even though the validation score does converge to the training score. The faster convergence with higher sampling strategies (e.g., 300 instances) does not necessarily indicate the improved health of the learning curve, as the absence of a gap remains a critical issue.
The validation accuracy for SMOTEENN shows a more promising trend. It gradually increases with the number of instances and appears to be approaching a plateau, particularly at a sampling strategy of 200 instances (Figure 3). The presence of a small gap between the training and validation curves, where the validation curve is slightly lower than the training curve, indicates better generalization capabilities. The steady increase in the validation score is best observed at a sampling strategy of 200 instances. However, for values higher than 200 instances, the validation score starts to worsen again, suggesting a potential trade-off between sample size and model performance.

3.2. Performance Comparison at Different Sample Sizes

Table 1, Table 2, Table 3, Table 4 and Table 5 report the results for each sample size. The differences between the two methods and their implications are analyzed below.
Across all sample sizes (100 to 300), SMOTEENN consistently outperforms SMOTE in terms of accuracy for all three models (Decision Tree Regression (DTR), Gradient Boosting Regression (GBR), and Bayesian Regression (BR)). The difference in accuracy between SMOTE and SMOTEENN is most pronounced at the smallest sample size (100), with SMOTEENN achieving up to 13.5 percentage points higher accuracy (e.g., BR with SMOTEENN at 0.96 vs. BR with SMOTE at 0.825). As the sample size increases, the accuracy gap between SMOTE and SMOTEENN narrows, but SMOTEENN still maintains a higher accuracy across all models and sample sizes.
For both training and test sets, SMOTE consistently achieves perfect or near-perfect R2 values (1.0 or 0.99) across all models and sample sizes, indicating potential overfitting. SMOTEENN also achieves high R2 values, but they are slightly lower than SMOTE’s, particularly for the training set. This suggests that SMOTEENN may be less prone to overfitting compared to SMOTE. The test set R2 values for SMOTEENN are generally higher than those for SMOTE, indicating better generalization performance.
When comparing SMOTE to SMOTEENN, differences in Mean Squared Error (MSE) can reflect how well each approach handles imbalanced class distributions. A high MSE indicates that the model has larger errors in its predictions and its estimated values deviate significantly from the true targets. A low MSE suggests that the model’s predictions closely match the actual values, reflecting better accuracy. SMOTEENN consistently achieves lower MSE values for both training and test sets across all models and sample sizes compared to SMOTE. The difference in MSE between SMOTE and SMOTEENN is most significant for the test set, with SMOTEENN achieving up to 81.6% lower MSE (e.g., DTR with SMOTEENN at 0.07 vs. DTR with SMOTE at 0.38 for sample size 100). As the sample size increases, the MSE values for both SMOTE and SMOTEENN generally decrease, but SMOTEENN maintains a lower MSE across all models and sample sizes.
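For reference, the MSE definition and the relative reduction quoted above can be reproduced with a short sketch. The helper names and the toy labels are assumptions for illustration; the values 0.38 and 0.07 are the DTR test-set figures cited in the text.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100

# Hypothetical targets and predictions: one prediction is off by 2.
toy_mse = mean_squared_error([0, 1, 2, 3], [0, 1, 2, 5])

# DTR test-set MSE at sample size 100, as quoted in the text.
reduction = relative_reduction(0.38, 0.07)
print(toy_mse, round(reduction, 1))
```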
The numerical results presented in Table 1, Table 2, Table 3, Table 4 and Table 5 demonstrate that SMOTEENN consistently outperforms SMOTE across all three models (decision tree regression, gradient boosting regression, and Bayesian regression) and across all sampling strategies (from 100 to 300 instances). For example, at the smallest sample size of 100 instances, SMOTEENN achieved an accuracy from 0.93 to 0.96 across the three models, compared to an accuracy from 0.81 to 0.825 for SMOTE. Similarly, the MSE for SMOTEENN ranged from 0.04 to 0.07 on the test set, while SMOTE had MSE values between 0.15 and 0.38. These results highlight the superior performance of SMOTEENN in handling class imbalance and improving predictive accuracy in the context of fall risk assessment for bedridden patients.

3.3. Cross-Validation Analysis

Table 6 presents the cross-validation results for both SMOTE and SMOTEENN across different sampling strategies (100 to 300 instances). The table includes the mean accuracy and standard deviation for each method at each sampling level. SMOTEENN consistently achieves higher mean accuracy compared to SMOTE across all sampling strategies, with the exception of the 250-instance level, where both methods have the same mean accuracy of 0.97. The highest mean accuracy for SMOTEENN is observed at the 150-instance level (0.99), while for SMOTE, it is at the 250 and 300-instance levels (0.97). SMOTE shows a steady increase in mean accuracy as the number of instances increases, from 0.94 at 100 instances to 0.97 at 300 instances. SMOTEENN’s mean accuracy peaks at 150 instances and then slightly decreases at higher instance levels, suggesting an optimal point for this method.
The standard deviation for SMOTE ranges from 0.010 to 0.030 across different sampling strategies, indicating relatively consistent performance. SMOTEENN’s standard deviation ranges from 0.008 to 0.029, with the lowest value observed at the 200-instance level (0.008). Both methods show a decrease in standard deviation as the number of instances increases, suggesting improved stability in performance with larger sample sizes. The peak mean accuracy of SMOTEENN at the 150-instance level indicates that this sampling strategy may be optimal for achieving the best predictive performance. The slight decrease in mean accuracy at higher instance levels for SMOTEENN suggests that there may be a trade-off between sample size and model performance, as also observed in the learning curves discussed earlier.
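The mean accuracy and standard deviation reported in a cross-validation table are summaries of per-fold scores. The sketch below shows how such a summary could be computed, assuming five contiguous folds, a trivial majority-label predictor standing in for the regression models, and the population standard deviation; none of these choices is confirmed by the study.

```python
import statistics
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# Hypothetical labels; the "model" simply predicts the most common training label.
y = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    majority = Counter(y[i] for i in train_idx).most_common(1)[0][0]
    scores.append(sum(y[i] == majority for i in val_idx) / len(val_idx))

mean_accuracy = statistics.mean(scores)
std_accuracy = statistics.pstdev(scores)  # population standard deviation (an assumption)
print(mean_accuracy, std_accuracy)
```

A lower standard deviation across folds indicates that performance depends less on which samples happen to fall in the validation fold.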

4. Discussion

Recent advances in addressing class imbalance for regression tasks have built upon techniques originally devised for imbalanced classification. One common approach is to generate synthetic data for underrepresented classes using variants of SMOTE. Although SMOTE was initially proposed for classification, researchers have adapted it to regression settings by interpolating both feature inputs and the target variable, aiming to produce realistic data points that improve the model’s ability to learn from scarce examples [22]. Extensions of SMOTE for regression, such as SMOGN (SMOTE for Regression with Gaussian Noise), incorporate noise within the interpolation step to better capture the variability inherent to continuous target variables [23]. These strategies generally align with findings that data-level manipulations can substantially increase performance when class or target distributions are heavily skewed.
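One simple way to interpolate both the features and a continuous target is to reuse the same random weight for each. The sketch below is an illustration under that assumption and is not the exact procedure of any cited method; the function name and values are hypothetical.

```python
import random

def interpolate_with_target(x_i, y_i, x_j, y_j, rng):
    """Interpolate the features and the continuous target with the same weight lam."""
    lam = rng.random()  # uniform in [0, 1)
    x_new = [a + lam * (b - a) for a, b in zip(x_i, x_j)]
    y_new = y_i + lam * (y_j - y_i)
    return x_new, y_new

rng = random.Random(1)  # fixed seed only so that the sketch is reproducible
x_new, y_new = interpolate_with_target([0.0, 0.0], 2.0, [4.0, 2.0], 4.0, rng)
print(x_new, y_new)
```

Sharing the weight keeps the synthetic target at the same relative position between the two original targets as the synthetic features are between the two original feature vectors.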
Beyond data-level methods, cost-sensitive learning has gained traction as another potent strategy for tackling class imbalance in regression. In cost-sensitive approaches, the learning algorithm’s loss function is modified to penalize errors on the minority (or difficult-to-predict) segments of the regression span more heavily [24,25]. This technique effectively shifts the model’s emphasis toward minimizing high-cost errors, thus promoting more balanced predictions across uneven target distributions. By customizing the cost function for imbalanced regression, models can focus on critical or rare target ranges without unduly sacrificing performance in better-represented regions [26]. Although cost-sensitive learning is widely applied in classification scenarios, multiple studies have underscored its adaptability and success in imbalanced regression problems as well [25,27].
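The idea of penalizing errors on rare target ranges more heavily can be sketched as a weighted squared-error loss. The weights, values, and function name below are hypothetical, and the cited works may define their cost functions differently.

```python
def weighted_mse(y_true, y_pred, weights):
    """Squared errors scaled by per-sample weights, normalized by the total weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

# Hypothetical targets: the third sample lies in a rare, high-value range.
y_true = [1.0, 1.0, 10.0]
y_pred = [1.0, 2.0, 8.0]

uniform_loss = weighted_mse(y_true, y_pred, [1.0, 1.0, 1.0])
cost_sensitive_loss = weighted_mse(y_true, y_pred, [1.0, 1.0, 5.0])  # rare sample weighted 5x
print(uniform_loss, cost_sensitive_loss)
```

Raising the weight of the rare sample increases its share of the loss, so an optimizer minimizing this quantity is pushed to reduce the error on that sample first.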
A growing body of work also explores ensemble-based techniques, such as boosting and bagging, specifically tailored for imbalanced regression [28]. These methods combine multiple weak learners and adaptively focus on regions of the feature space where performance is poorer. For instance, recent studies have integrated oversampling or undersampling processes within a boosting architecture, allowing each iteration to learn more effectively from challenging subsets of the data [28]. Similarly, hybrid methods merge data augmentation strategies (like SMOTE variants) with cost-sensitive ensembles, yielding promising results in predictive power and model stability [22,28]. Overall, the continued refinement of synthetic oversampling, cost-sensitive optimization, and ensemble methods underscores the field’s commitment to developing robust solutions for imbalanced regression tasks across diverse real-world applications.
SMOTEENN seamlessly combines SMOTE’s synthetic data generation with Edited Nearest Neighbors (ENN) to remove noisy and borderline instances from the enriched dataset, leading to more refined and balanced training samples [17]. This dual-stage process not only addresses the core issue of minority class underrepresentation but also reduces confusion between classes by pruning ambiguous data points, an essential step often omitted in other SMOTE variants [22]. As a result, SMOTEENN has been shown to produce cleaner decision boundaries and more robust performance metrics, including higher accuracy and lower error rates in both classification and regression tasks [22,28]. Studies indicate that this enhanced data purity and increased representativeness allow learning algorithms to generalize more effectively, ultimately making SMOTEENN a superior resampling choice for heavily imbalanced datasets compared to stand-alone SMOTE or other SMOTE-based techniques.
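The two stages can be sketched in plain Python as shown below. This is a simplified illustration under stated assumptions, not the code used in this study: the helper names, the toy data, the neighbour counts (`k=2` for oversampling, `k=3` for cleaning), and the majority-vote cleaning rule are all choices made here for clarity.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(idx, X, candidates, k):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    pool = [j for j in candidates if j != idx]
    return sorted(pool, key=lambda j: sq_dist(X[idx], X[j]))[:k]

def smote(X, y, minority, n_target, k=2, seed=0):
    """Stage 1: add interpolated minority points until the class has n_target samples."""
    rng = random.Random(seed)
    X, y = list(X), list(y)
    members = [i for i, label in enumerate(y) if label == minority]  # original points only
    for _ in range(max(0, n_target - len(members))):
        i = rng.choice(members)
        j = rng.choice(nearest(i, X, members, k))
        t = rng.random()
        X.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
        y.append(minority)
    return X, y

def enn(X, y, k=3):
    """Stage 2: drop every sample whose label disagrees with the majority of its k neighbours."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in nearest(i, X, everyone, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: six majority points near the origin, three minority points near (3, 3),
# and one majority-labelled point sitting inside the minority cluster (noise).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.0],
     [3.05, 3.05],
     [3.0, 3.0], [3.2, 3.1], [3.1, 3.3]]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = smote(X, y, minority=1, n_target=7)
X_clean, y_clean = enn(X_res, y_res, k=3)
```

In this toy case the oversampling step brings the minority class up to seven samples, and the cleaning step removes the majority-labelled point at (3.05, 3.05) because all of its nearest neighbours belong to the minority class. The two `k` arguments correspond to the neighbour counts that need tuning in practice.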
The SMOTE learning curves consistently exhibit unhealthy characteristics across all sampling strategies due to the absence of a small gap between training and validation performance, despite the validation score eventually converging to the training score. This convergence suggests that the model may be overfitting and struggling to generalize effectively to unseen data. In contrast, the SMOTEENN learning curves demonstrate healthier traits, particularly at a sampling strategy of 200 instances, where a small gap between training and validation performance is evident, and the validation score shows a steady increase. This indicates a more balanced learning curve and improved generalization capabilities. However, the validation score worsens at higher sampling strategies, suggesting an optimal point for the number of instances beyond which the model’s performance may degrade.
The differences in values between SMOTE and SMOTEENN across Tables 1–5 suggest that SMOTEENN is a more effective technique for handling class imbalance in the context of this study on fall risk assessment for bedridden patients. The higher accuracy and lower test MSE values achieved by SMOTEENN indicate that it can lead to more accurate and reliable predictive models for fall risk assessment compared to using SMOTE alone. This improvement in predictive performance is crucial for developing effective real-time monitoring systems that can reduce the need for constant patient supervision. The generally higher R² values for SMOTEENN on the test set, alongside comparable R² values on the training set, suggest that SMOTEENN may help the models generalize better to unseen data. This is important for ensuring that the fall risk assessment models can perform well on new, real-world patient data.
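For reference, the two regression metrics compared here can be computed as in the sketch below, which uses the standard definitions of MSE and the coefficient of determination. The toy predictions are hypothetical; the train/test gap at the end uses the decision tree regression rows reported in Table 3.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical test targets and predictions.
y_test = [0.0, 1.0, 2.0, 3.0]
y_hat = [0.0, 1.0, 2.0, 2.0]
test_mse = mse(y_test, y_hat)
test_r2 = r2(y_test, y_hat)

# Train/test R² gap for DTR at 200 instances (values from Table 3).
gap_smote = 1.0 - 0.91
gap_smoteenn = 1.0 - 0.96
```

A smaller gap between training and test R² is one simple indicator that a model generalizes better; in this example the gap is roughly 0.09 for SMOTE and 0.04 for SMOTEENN.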
The consistently superior performance of SMOTEENN across different sample sizes (from 100 to 300) demonstrates its robustness and effectiveness in handling class imbalance, regardless of the number of instances per class. This is particularly valuable for this dataset, as it suggests that SMOTEENN can be applied successfully to datasets with varying levels of class imbalance. By mitigating overfitting, SMOTEENN can contribute to the development of more reliable and generalizable fall risk assessment models.
The performance of SMOTEENN at a sampling strategy of 300 instances is lower compared to the optimal point at 200 instances. This finding underscores the importance of identifying the right balance in data augmentation when using SMOTEENN. This study demonstrates that more instances do not always lead to better performance, and there is an optimal point that needs to be carefully determined. The slight decrease in validation score at higher sampling strategies, such as 300 instances, suggests that the model may be overfitting to the training data, leading to reduced generalization capabilities. This highlights the need for practitioners to thoroughly evaluate the impact of different sampling strategies on model performance when applying SMOTEENN to their specific datasets.
The generally lower standard deviation at larger instance counts, seen most clearly for SMOTE, suggests improved stability and generalization abilities with larger sample sizes. The low standard deviation values for SMOTEENN, particularly at the 200-instance level (0.008), indicate that this method may provide more consistent and reliable performance across different cross-validation folds. The superior performance of SMOTEENN in terms of mean accuracy and stability suggests that this method could contribute to the development of more accurate and reliable real-time monitoring systems for fall risk assessment in bedridden patients.
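The mean and standard deviation discussed here summarize the per-fold scores of a cross-validation run. A minimal sketch of that summary is given below; the fold scores are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the manuscript does not state which estimator was used.

```python
from statistics import mean, pstdev

def summarize_folds(scores):
    """Return the mean and the population standard deviation of per-fold scores."""
    return mean(scores), pstdev(scores)

# Hypothetical accuracies from a five-fold cross-validation run.
fold_scores = [0.96, 0.98, 0.97, 0.99, 0.95]
cv_mean, cv_std = summarize_folds(fold_scores)
```

A lower standard deviation means that the score changes less from fold to fold, which is the sense in which the text describes a method as more stable.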
Despite its effectiveness in balancing skewed datasets, SMOTEENN has several limitations that could pose challenges for certain machine learning tasks. For one, this technique relies heavily on a proper parameter configuration, such as the number of nearest neighbors used both in the synthetic oversampling step (from SMOTE) and in the data-cleaning step (from ENN). Incorrect parameter choices can either fail to eliminate noisy or borderline samples or, conversely, remove too many informative data points, thus impairing model accuracy. Moreover, the reliance on nearest-neighbor searches increases computational costs, particularly for large-scale or high-dimensional datasets. These overheads can be significant and might not be justified in some real-world applications where real-time computation is critical.
Another drawback pertains to potential data distortion. While SMOTE creates synthetic samples by interpolating between existing minority samples, these newly generated instances might oversimplify the true data distribution, especially in complex, multimodal feature spaces. Although the Edited Nearest Neighbors step filters out some misaligned data points, it may also remove borderline examples that are actually representative of natural transitions in the dataset. This balancing act can sometimes create gaps in the data space, ultimately affecting a model’s ability to generalize to unseen scenarios [19]. Beyond that, SMOTEENN’s focus on minority-class oversampling and noise removal may not adequately address other subtleties in real-world datasets, such as mislabeled examples or concept drift, where the underlying data distribution can change over time.
The primary focus of this study is to demonstrate the practical efficacy of SMOTE and SMOTEENN in addressing class imbalance within the context of regression-based machine learning models for fall risk assessments in bedridden patients. The study systematically evaluates the performance of these techniques across smaller, imbalanced datasets. Both resampling algorithms are designed to address the challenges posed by class imbalance in such datasets by generating synthetic data for the minority class. In scenarios where large, balanced datasets are available, the need for synthetic data generation diminishes, as traditional ML algorithms can effectively learn from the abundant data. However, in real-world applications, particularly in domains such as healthcare where rare conditions or events are of high importance, smaller datasets with class imbalance are prevalent.
The focus of future studies will be on comparing SMOTE and SMOTEENN with other techniques currently in use. This comparison is important for evaluating the practical application of these methods in real-world scenarios. The findings from such studies can guide practitioners in selecting the most appropriate resampling technique for their specific use case, particularly when dealing with imbalanced datasets and regression models.

5. Conclusions

This study demonstrates that SMOTEENN, a hybrid approach combining oversampling and undersampling techniques, consistently outperforms SMOTE in handling class imbalance across various ML models and sampling strategies. SMOTEENN achieved higher accuracy, lower mean squared error, and better generalization capabilities, as evidenced by the healthier learning curves and cross-validation results. The optimal performance of SMOTEENN was observed at a sampling strategy of 200 instances, highlighting the importance of finding the right balance in data augmentation. These findings suggest that SMOTEENN can significantly improve the reliability and accuracy of predictive models in imbalanced classification tasks, potentially leading to more effective applications in diverse fields.
Future research would benefit from exploring an even wider array of hybrid systems that extend beyond SMOTE and Edited Nearest Neighbors. For instance, approaches that integrate ensemble learning techniques (e.g., boosting or bagging) with modern data-level or algorithm-level interventions could yield powerful synergistic effects. One possibility is to combine cost-sensitive learning or active learning frameworks with SMOTE-based oversampling to further optimize the selection and generation of synthetic examples. By doing so, researchers could reduce the likelihood of committing errors in borderline regions of the dataset, an issue that sometimes arises with SMOTE-only or SMOTEENN approaches.
Evaluating the performance of SMOTEENN in various real-world domains beyond medical monitoring systems also remains an open direction. While this study highlighted its usefulness in fall risk assessment for bedridden patients, other fields often suffer from skewed data distributions. A systematic evaluation in these different contexts would uncover new insights into how data characteristics (e.g., noise levels and feature dimensionality) influence the effectiveness of SMOTEENN. Such comparative studies could also reveal domain-specific adaptations or parameter settings that maximize performance. Ultimately, expanding the breadth of applications tested will help solidify SMOTEENN’s standing as a robust and generalizable resampling solution for modern machine learning tasks.

Author Contributions

Conceptualization, G.H., D.N., R.J., J.M., M.B., T.D. and M.T.; methodology, G.H., D.N. and M.T.; software, G.H. and D.N.; validation, G.H., D.N., R.J., J.M., M.B., T.D. and M.T.; formal analysis, G.H. and D.N.; investigation, G.H., D.N. and M.T.; resources, M.T.; data curation, G.H. and D.N.; writing—original draft preparation, G.H. and D.N.; writing—review and editing, G.H., D.N., R.J., J.M., M.B., T.D. and M.T.; visualization, G.H. and D.N.; supervision, M.T.; project administration, M.T.; funding acquisition, M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SMOTE: Synthetic Minority Oversampling Technique
SMOTEENN: SMOTE combined with Edited Nearest Neighbors
ML: Machine Learning
MSE: Mean Squared Error
DTR: Decision Tree Regression
GBR: Gradient Boosting Regression
BR: Bayesian Regression
ENN: Edited Nearest Neighbors
KNN: k-Nearest Neighbors

References

  1. Chen, Y.; Calabrese, R.; Martin-Barragan, B. Interpretable machine learning for imbalanced credit scoring datasets. Eur. J. Oper. Res. 2024, 312, 357–372.
  2. van den Goorbergh, R.; van Smeden, M.; Timmerman, D.; Van Calster, B. The harm of class imbalance corrections for risk prediction models: Illustration and simulation using logistic regression. J. Am. Med. Inform. Assoc. 2022, 29, 1525–1534.
  3. Tan, J.K.; Quan, L.; Salim, N.N.M.; Tan, J.H.; Goh, S.Y.; Thumboo, J.; Bee, Y.M. Machine Learning–Based Prediction for High Health Care Utilizers by Using a Multi-Institutional Diabetes Registry: Model Training and Evaluation. JMIR AI 2024, 3, e58463.
  4. Chen, W.; Yang, K.; Yu, Z.; Shi, Y.; Chen, C.L.P. A survey on imbalanced learning: Latest research, applications and future directions. Artif. Intell. Rev. 2024, 57, 137.
  5. Budiarto, A.; Sheikh, A.; Wilson, A.; Price, D.B.; Shah, S.A. Handling Class Imbalance in Machine Learning-based Prediction Models: A Case Study in Asthma Management. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–5.
  6. Budhathoki, N.; Bhandari, R.; Bashyal, S.; Lee, C. Predicting asthma using imbalanced data modeling techniques: Evidence from 2019 Michigan BRFSS data. PLoS ONE 2023, 18, e0295427.
  7. Kalnak, N.; Nakeva von Mentzer, C. Listening and Processing Skills in Young School Children with a History of Developmental Phonological Disorder. Healthcare 2024, 12, 359.
  8. Tao, Y.; Zhang, Z.; Wang, B.; Ren, J. Motality prediction of ICU rheumatic heart disease with imbalanced data based on machine learning. Big Data Inf. Anal. 2024, 8, 43–64.
  9. Yang, Y.; Khorshidi, H.A.; Aickelin, U. A review on over-sampling techniques in classification of multi-class imbalanced datasets: Insights for medical problems. Front. Digit. Health 2024, 6, 1430245.
  10. Matharaarachchi, S.; Domaratzki, M.; Muthukumarana, S. Enhancing SMOTE for imbalanced data with abnormal minority instances. Mach. Learn. Appl. 2024, 18, 100597.
  11. Bounab, R.; Zarour, K.; Guelib, B.; Khlifa, N. Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN. IEEE Access 2024, 12, 54382–54396.
  12. Cheah, P.C.Y.; Yang, Y.; Lee, B.G. Enhancing Financial Fraud Detection through Addressing Class Imbalance Using Hybrid SMOTE-GAN Techniques. Int. J. Financ. Stud. 2023, 11, 110.
  13. Toma, M.; Husain, G. Algorithm Selection and Data Utilization in Machine Learning for Medical Imaging Classification. In Proceedings of the 2024 IEEE Long Island Systems, Applications and Technology Conference (LISAT), Holtsville, NY, USA, 15 November 2024; pp. 1–6.
  14. Joseph, P.; Ali, H.; Matthew, D.; Thomas, A.; Jose, R.; Mayer, J.; Bekbolatova, M.; Devine, T.; Toma, M. Regressive Machine Learning for Real-Time Monitoring of Bed-Based Patients. Appl. Sci. 2024, 14, 9978.
  15. Mayer, J.; Jose, R.; Kurgansky, G.; Singh, P.; Coletti, C.; Devine, T.; Toma, M. Evaluating the Feasibility of Euler Angles for Bed-Based Patient Movement Monitoring. Signals 2023, 4, 788–799.
  16. Mayer, J.; Jose, R.; Bekbolatova, M.; Coletti, C.; Devine, T.; Toma, M. Enhancing patient safety through integrated sensor technology and machine learning for bed-based patient movement detection in inpatient care. Artif. Intell. Health 2024, 1, 132.
  17. Wang, A.X.; Chukova, S.S.; Nguyen, B.P. Synthetic minority oversampling using edited displacement-based k-nearest neighbors. Appl. Soft Comput. 2023, 148, 110895.
  18. Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance. Inf. Sci. 2019, 505, 32–64.
  19. Wang, S.; Dai, Y.; Shen, J.; Xuan, J. Research on expansion and classification of imbalanced data based on SMOTE algorithm. Sci. Rep. 2021, 11, 24039.
  20. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  21. Alejo, R.; Sotoca, J.M.; Valdovinos, R.M.; Toribio, P. Edited Nearest Neighbor Rule for Improving Neural Networks Classifications. In Advances in Neural Networks—ISNN 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 303–310.
  22. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
  23. Kuo, P.H.; Chen, Y.T.; Yau, H.T. SMOGN, MFO, and XGBoost Based Excitation Current Prediction Model for Synchronous Machine. Comput. Syst. Sci. Eng. 2023, 46, 2687–2709.
  24. Elkan, C. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence—Volume 2, Seattle, WA, USA, 4–10 August 2001; IJCAI’01. pp. 973–978.
  25. Abe, N.; Zadrozny, B.; Langford, J. An iterative method for multi-class cost-sensitive learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; KDD04. pp. 3–11.
  26. He, H.; Garcia, E. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  27. Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
  28. Araf, I.; Idri, A.; Chairi, I. Cost-sensitive learning for imbalanced medical data: A review. Artif. Intell. Rev. 2024, 57, 80.
Figure 1. Learning curves for (a) SMOTE and (b) SMOTEENN with a sampling strategy of 100 instances.
Figure 2. Learning curves for (a) SMOTE and (b) SMOTEENN with a sampling strategy of 150 instances.
Figure 3. Learning curves for (a) SMOTE and (b) SMOTEENN with a sampling strategy of 200 instances.
Figure 4. Learning curves for (a) SMOTE and (b) SMOTEENN with a sampling strategy of 250 instances.
Figure 5. Learning curves for (a) SMOTE and (b) SMOTEENN with a sampling strategy of 300 instances.
Table 1. Comparison of SMOTE and SMOTEENN across a sample size of 100.
| Model | Accuracy | R² (Train) | R² (Test) | MSE (Train) | MSE (Test) |
|---|---|---|---|---|---|
| DTR with SMOTE | 0.825 | 1.0 | 0.87 | 0 | 0.38 |
| GBR with SMOTE | 0.81 | 0.99 | 0.95 | 0.02 | 0.15 |
| BR with SMOTE | 0.825 | 0.99 | 0.94 | 0.04 | 0.16 |
| DTR with SMOTEENN | 0.93 | 1 | 0.97 | 0 | 0.07 |
| GBR with SMOTEENN | 0.94 | 0.999 | 0.89 | 0.002 | 0.05 |
| BR with SMOTEENN | 0.96 | 0.995 | 0.99 | 0.01 | 0.04 |
Table 2. Comparison of SMOTE and SMOTEENN across a sample size of 150.
| Model | Accuracy | R² (Train) | R² (Test) | MSE (Train) | MSE (Test) |
|---|---|---|---|---|---|
| DTR with SMOTE | 0.90 | 1.0 | 0.95 | 0 | 0.14 |
| GBR with SMOTE | 0.88 | 0.99 | 0.96 | 0.03 | 0.12 |
| BR with SMOTE | 0.89 | 0.98 | 0.96 | 0.04 | 0.12 |
| DTR with SMOTEENN | 0.94 | 1.0 | 0.96 | 0 | 0.098 |
| GBR with SMOTEENN | 0.94 | 0.998 | 0.96 | 0.006 | 0.107 |
| BR with SMOTEENN | 0.944 | 0.99 | 0.96 | 0.015 | 0.099 |
Table 3. Comparison of SMOTE and SMOTEENN across a sample size of 200.
| Model | Accuracy | R² (Train) | R² (Test) | MSE (Train) | MSE (Test) |
|---|---|---|---|---|---|
| DTR with SMOTE | 0.89 | 1.0 | 0.91 | 0 | 0.23 |
| GBR with SMOTE | 0.91 | 0.99 | 0.95 | 0.03 | 0.14 |
| BR with SMOTE | 0.91 | 1.0 | 0.97 | 0 | 0.09 |
| DTR with SMOTEENN | 0.97 | 1.0 | 0.96 | 0 | 0.11 |
| GBR with SMOTEENN | 0.95 | 0.997 | 0.98 | 0.009 | 0.05 |
| BR with SMOTEENN | 0.96 | 0.99 | 0.98 | 0.017 | 0.06 |
Table 4. Comparison of SMOTE and SMOTEENN across a sample size of 250.
| Model | Accuracy | R² (Train) | R² (Test) | MSE (Train) | MSE (Test) |
|---|---|---|---|---|---|
| DTR with SMOTE | 0.89 | 1.0 | 0.93 | 0 | 0.18 |
| GBR with SMOTE | 0.89 | 0.99 | 0.96 | 0.03 | 0.12 |
| BR with SMOTE | 0.89 | 0.99 | 0.97 | 0.04 | 0.10 |
| DTR with SMOTEENN | 0.93 | 1.0 | 0.97 | 0 | 0.07 |
| GBR with SMOTEENN | 0.94 | 0.999 | 0.98 | 0.002 | 0.05 |
| BR with SMOTEENN | 0.96 | 0.995 | 0.986 | 0.113 | 0.04 |
Table 5. Comparison of SMOTE and SMOTEENN across a sample size of 300.
| Model | Accuracy | R² (Train) | R² (Test) | MSE (Train) | MSE (Test) |
|---|---|---|---|---|---|
| DTR with SMOTE | 0.89 | 1.0 | 0.92 | 0 | 0.23 |
| GBR with SMOTE | 0.91 | 0.99 | 0.97 | 0.03 | 0.08 |
| BR with SMOTE | 0.91 | 0.99 | 0.97 | 0.04 | 0.09 |
| DTR with SMOTEENN | 0.96 | 1.0 | 0.99 | 0 | 0.04 |
| GBR with SMOTEENN | 0.96 | 0.995 | 0.98 | 0.014 | 0.04 |
| BR with SMOTEENN | 0.97 | 0.99 | 0.98 | 0.02 | 0.05 |
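Table 6. Cross Validation Results.
| Sample Generation Method | Sampling Number | Mean | Standard Deviation |
|---|---|---|---|
| SMOTE | 100 | 0.94 | 0.020 |
| SMOTE | 150 | 0.95 | 0.030 |
| SMOTE | 200 | 0.96 | 0.020 |
| SMOTE | 250 | 0.97 | 0.010 |
| SMOTE | 300 | 0.97 | 0.010 |
| SMOTEENN | 100 | 0.97 | 0.029 |
| SMOTEENN | 150 | 0.99 | 0.009 |
| SMOTEENN | 200 | 0.977 | 0.008 |
| SMOTEENN | 250 | 0.97 | 0.029 |
| SMOTEENN | 300 | 0.98 | 0.009 |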
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Husain, G.; Nasef, D.; Jose, R.; Mayer, J.; Bekbolatova, M.; Devine, T.; Toma, M. SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models. Algorithms 2025, 18, 37. https://doi.org/10.3390/a18010037