Article

Adaptive Oversampling via Density Estimation for Online Imbalanced Classification

1 SAP Labs Korea, Seoul 06578, Republic of Korea
2 Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea
* Author to whom correspondence should be addressed.
Information 2025, 16(1), 23; https://doi.org/10.3390/info16010023
Submission received: 28 November 2024 / Revised: 19 December 2024 / Accepted: 22 December 2024 / Published: 5 January 2025

Abstract:
Online learning is a framework for processing and learning from sequential data in real time, offering benefits such as promptness and low memory usage. However, it faces critical challenges, including concept drift, where data distributions evolve over time, and class imbalance, which significantly hinders the accurate classification of minority classes. Addressing these issues simultaneously remains a challenging research problem. This study introduces a novel algorithm that integrates adaptive weighted kernel density estimation (awKDE) and a conscious biasing mechanism to efficiently manage memory, while enhancing the classification performance. The proposed method dynamically detects the minority class and employs a biasing strategy to prioritize its representation during training. By generating synthetic minority samples using awKDE, the algorithm adaptively balances class distributions, ensuring robustness in evolving environments. Experimental evaluations across synthetic and real-world datasets demonstrated that the proposed method achieved up to a 13.3 times improvement in classification performance over established oversampling methods and up to a 1.66 times better performance over adaptive rebalancing approaches, while requiring significantly less memory. These results underscore the method’s scalability and practicality for real-time online learning applications.

1. Introduction

Online learning refers to a learning framework where input data are processed sequentially, and the output is provided before the next input arrives. Online learning is well-suited for problems that require immediate solutions and low memory usage. For example, online learning can be used to detect defective sensor signals in real time from incremental data in hardware engineering [1]. It is also widely applied in various fields such as software engineering [2], anomaly detection [3], social media [4], risk management [5], and finance [6].
One of the challenges to consider in online learning is that sequential environments can change over time, leading to variations in the distribution of data. For example, the characteristics of anomalies may change, user preferences can shift due to certain events, and signal patterns can be affected when sensors are relocated. Several studies have been conducted to address scenarios where concept drift occurs in online learning environments.  Ref. [7] provided a comprehensive review of the methods and challenges associated with online learning with concept drift.  Ref. [8] examined how the diversity of the ensemble impacts the precision of online learning amid concept drift. The challenges in online learning where concept drift occurs and class labels are unavailable were addressed by [9].
Another challenge in online environments is the difficulty in effectively addressing class imbalances in classification problems. Although batch learning methods for imbalanced classification data have been extensively studied [10], they may not be suitable for online scenarios that require limited memory and fast processing. In contrast, Ref. [11] introduced a structured approach to effectively address class imbalance in online learning environments. Furthermore, Ref. [2] introduced an algorithm designed to adaptively address class imbalance in online learning environments for various fault detection scenarios.
This paper aimed to propose an online learning algorithm capable of addressing the aforementioned challenges, including concept drift and imbalanced classification data. In prior work, the study by [12] proposed a method for handling dynamic class imbalances in data streams using adaptive resampling and time-decayed metrics, while [13] introduced a method that maintains a class balance in online learning by selectively incorporating subsets of majority and minority class examples in nonstationary data distributions. In this paper, we propose a new approach that incorporates kernel density estimation and a conscious biasing procedure to efficiently manage memory, distinguishing it from the methods introduced in prior studies.
The rest of this paper is organized as follows. Section 2 introduces the challenges in online learning that this paper aimed to address. Section 3 explains the motivation, core methodology, and algorithm of our proposed approach. In Section 4, we present extensive experiments conducted using both simulated and real-world data. Finally, as the conclusion of the paper, we present a discussion on the results and the significance of our method in Section 5.

2. Related Works

This section is organized as follows. In Section 2.1, we introduce the concept of online learning. Section 2.2 defines and categorizes the types of concept drift that occur in data streams. In Section 2.3, we discuss the issue of class imbalance in online learning and define time-decayed class sizes. Finally, Section 2.4 reviews various methods for addressing class imbalance in online learning, including data-level, algorithm-level, and ensemble-based approaches.

2.1. Online Learning Framework

A data stream denotes a sequential data generating process. Learning a data stream requires great care, in that the learning process differs from ordinary data analysis. The binary classification problem of a data stream can be formulated as follows. Consider data or an instance $S_t = \{(x_i^t, y_i^t)\}_{i=1}^{n}$ at time step t, with input data $x^t \in \mathbb{R}^d$ and $y^t \in \{0, 1\}$, where n refers to the number of instances at time t. Online learning is a sequential learning framework, where the new data arrive one-by-one $(n = 1)$. If $S_t$ contains more than one instance $(n \geq 2)$, it becomes batch learning [7,8,14]. In situations that demand immediate results in real time, an online framework is more suitable than batch learning [15].
The term incremental learning is frequently encountered in discussions about online learning. Incremental learning focuses solely on the most recent instance, without considering previously received data [16]. Typical online incremental learning systems discard new instances after using them to update and adapt the model, enabling them to handle high-speed data streams [15]. While online incremental learning is a specific subset of online learning, this study considers the broader framework of online learning, which allows for storing data points within the constraints of limited memory.
The learning procedure of online classification can be described as a single loop. The fundamental assumption of online learning is that the input x t and the true label y t arrive at different time points. At time t, the online classifier first receives the input vector x t and generates the predicted label y ^ t before the true label y t becomes available. Once the true label y t has been received, the classifier evaluates its performance using a predefined loss function and updates the model based on this loss. This model update process is repeated for each S t . There are no restrictions on the time intervals between steps, allowing them to be irregular or uneven [14,15].
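As a rough illustration, the predict-then-update loop described above can be sketched as follows (a minimal sketch; `MajorityClassifier` is a hypothetical toy model used only to make the loop runnable, not a method from this paper):

```python
def online_loop(classifier, stream):
    """Prequential loop: predict first, then learn from the true label."""
    correct = 0
    for t, (x_t, y_t) in enumerate(stream, start=1):
        y_hat = classifier.predict(x_t)   # prediction made before y_t arrives
        correct += int(y_hat == y_t)      # evaluate on the just-seen example
        classifier.update(x_t, y_t)       # update the model with the true label
    return correct / t

class MajorityClassifier:
    """Toy model: always predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {0: 0, 1: 0}
    def predict(self, x):
        return max(self.counts, key=self.counts.get)
    def update(self, x, y):
        self.counts[y] += 1
```

Note that evaluation happens on every instance before it is used for training, which is exactly what makes prequential evaluation possible without a held-out test set.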

2.2. Concept Drift in Online Learning

If the data generation process extends over a long period, it would be unrealistic to assume a static probability distribution $p_t(x, y)$ for the stream $S_t$. These unknown distributional changes are referred to as concept drift. Ref. [9] provided a formal definition of concept drift as below.
Definition 1.
For a given data stream S, we say that concept drift occurs between two distinct points in time, t and $t + \Delta$, if and only if
$$\exists x : p_t(x, y) \neq p_{t+\Delta}(x, y),$$
where $p_t(x, y)$ denotes the joint distribution at time t between the set of input attributes and the class label.
Concept drift is categorized into two different types: real concept drift and virtual concept drift [17,18,19]. As described in Figure 1,  real drift denotes a change in the true decision boundary. Any changes that do not alter the true decision boundary are then categorized as virtual drift.
Concept drift is also categorized according to its patterns of change, such as speed, severity, or cycle nature. Ref. [20] provided the most commonly discussed categorizations: abrupt, gradual, incremental, and recurring. These different characteristics are depicted in Figure 2.
Abrupt concept drift occurs when the data-generating function changes suddenly, without a transition from the current function. In contrast, gradual and incremental drift evolve over time, with only incremental drift passing through intermediate transitional concepts. Recurring drift refers to temporary changes in the data distribution. Further detailed elaborations can be found in [21].
In such diverse changing environments, it is extremely challenging to find a universal method that can handle all types and characteristics of drift simultaneously. Therefore, like most existing studies, this research aimed to address the problem of real concept drift. Specifically, we focused on real abrupt concept drift, where the distribution changes suddenly at a certain point, resulting in a shift in the true decision boundary.
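As an illustration of real abrupt drift, the following hedged sketch generates a one-dimensional stream whose true decision boundary jumps at a chosen time step (the boundary values 0.3 and 0.7 are arbitrary choices for illustration, not values from this paper):

```python
import random

def abrupt_drift_stream(n_steps, drift_at, seed=0):
    """1-D stream whose true decision boundary jumps at drift_at.

    Before drift_at the positive region is x > 0.3; afterwards x > 0.7,
    i.e., a real, abrupt change in the decision boundary.
    """
    rng = random.Random(seed)
    for t in range(n_steps):
        boundary = 0.3 if t < drift_at else 0.7   # abrupt boundary shift
        x = rng.random()
        yield x, int(x > boundary)
```

Because the labeling rule itself changes, a classifier trained before the drift becomes systematically wrong afterwards unless it adapts.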

2.3. Class Imbalance in Online Learning

Class imbalance denotes a situation where the classes in the data have different magnitudes of representativeness. The dominance of the majority class hinders the classifier from properly learning the characteristics of each class and deteriorates the accurate prediction of the minority class.
However, if infinite sequences are assumed, defining class imbalance solely based on class size is not appropriate in an online setting, and there is no consensus in the literature on the definition of class imbalance in online settings [15]. To meet the demand for real-time imbalance detection, Ref. [11] proposed the following metric, a time-decayed class size.
Definition 2.
A time-decayed class size $w_k(t)$ is updated when a new example $y_t$ arrives, where $0 < \theta < 1$ and $I[y_t = k] = 1$ if $y_t = k$, and 0 otherwise:
$$w_k(t) = \theta w_k(t-1) + (1 - \theta) \cdot I[y_t = k].$$
The time-decayed class size is updated at each time step, and a class imbalance is observed when $w_i(t) - w_j(t) > 0$ for different classes $i, j$. This dynamic metric enables automatic detection of the current imbalance status.
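Definition 2 translates directly into a constant-memory update. The sketch below feeds a roughly 9:1 label sequence and recovers the minority class from the decayed sizes (the stream and the $\theta$ value are illustrative):

```python
def update_class_sizes(w, y_t, theta=0.9):
    """One update of the time-decayed class sizes in Definition 2.

    w: dict mapping class label -> current decayed size
    y_t: newly arrived true label
    """
    return {k: theta * wk + (1 - theta) * (1 if y_t == k else 0)
            for k, wk in w.items()}

# Feed a 9:1 imbalanced label sequence; w converges toward the class priors.
w = {0: 0.0, 1: 0.0}
for t in range(1000):
    y_t = 0 if t % 10 else 1          # class 1 appears once every 10 steps
    w = update_class_sizes(w, y_t, theta=0.99)
minority = min(w, key=w.get)          # class with the smaller decayed size
```

After enough steps, $w_0 \approx 0.9$ and $w_1 \approx 0.1$, so the minority class is detected automatically without storing any history.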

2.4. Online Learning Methods for Class Imbalance

Most of the leading approaches proposed for addressing class imbalance in classification tasks can be broadly classified into two types: data-level methods and algorithm-level methods. Data-level methods involve the manipulation of the training data necessary to reduce the skewed learning focus towards one or a few classes. Resampling techniques are among the most popular of these methods. For example, oversampling generates synthetic examples for the minority class, while undersampling reduces the number of majority class examples; often, a strategy that combines these methods is adopted. SMOTE [22] has become one of the most well-adopted and famous techniques in this category. However, most of the works on imbalanced data are in batch learning. Thus, the research development of intelligent resampling techniques for online learning is still in its early stage [23]. Online adaptations of the SMOTE algorithm have their own limitations, such as keeping all minority class instances to perform oversampling, which makes them impractical to deploy within online frameworks [24]. Ref. [13] proposed an adaptive rebalancing (AREBA) method that dynamically adjusts the queue size for each class in an online setting. By reducing the majority class instances to maintain balance with the minority class, AREBA effectively acts as an undersampling approach.
Algorithm-level methods, on the other hand, adjust the classifier itself, without modifying the original data. Most of them are model-dependent due to their modification of learning components, such as cost functions. Online cost-sensitive learning is the most popular algorithm-level method, where popular examples include CSOGD [25] and ONN [26].
Regarding how to tackle imbalanced data streams, many solutions adopt ensemble techniques that incorporate multiple classifiers. Ref. [12] proposed oversampling-based online bagging (OOB), which monitors the current imbalance status with a time-decayed metric and then decides the degree of oversampling using prequential recall. OOB further conducts random oversampling by updating the classifiers K times, and K is drawn from a Poisson distribution with specified mean. However, despite its effectiveness, the computational overhead of OOB makes it difficult to scale in high-speed online learning scenarios.

3. The Proposed Method

In this section, we present the motivation, methodology, and implementation details of the proposed method.

3.1. Motivation

To address both the concept drift problem and the class imbalance issue, we propose a two-step data-level approach: (a) introducing a density-based resampling mechanism, and (b) applying conscious biasing to the learning procedure.
Traditional neighbor-based oversampling algorithms, such as SMOTE, have weak mathematical foundations and are extremely sensitive to the structure of the data. This can cause suboptimal performance in some cases, especially when online learning is necessary, where access to data is highly limited. In this respect, our approach embeds a mathematically sound density-based algorithm for resampling.
Moreover, we introduce a strategic bias in the classifier by oversampling minority class examples to slightly exceed the count of majority class examples. This adjustment enables the classifier to better capture the characteristics of the minority class, accounting for the faster observation rate of majority class data.

3.2. Adaptive Weighted Kernel Density Estimator

Kernel density estimation (KDE) is one of the most popular nonparametric density estimation methods [27]. KDE learns the underlying probability density by smoothing the cumulative kernel function on each data point. A standard kernel density estimator can be expressed as below:
$$\hat{f}(x) = \sum_{i=1}^{n} \frac{1}{n h^d} K\left(\frac{x - X_i}{h}\right) = \sum_{i=1}^{n} \frac{1}{n} K_h(x - X_i),$$
where $K : \mathbb{R}^d \to \mathbb{R}$ is a kernel function and $h > 0$ is the smoothing parameter that controls the amount of smoothing. Ref. [28] summarized the concept of KDE as smoothing out each data point into a smooth bump whose shape is determined by the kernel function $K(\cdot)$. Refer to Figure 3 for an illustration of the KDE method: a Gaussian kernel function is applied to each data point, and by averaging these kernels, the red density function can be estimated.
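For concreteness, a one-dimensional version of the estimator above with a Gaussian kernel can be written as follows (illustrative values; not the paper's implementation):

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    """Standard 1-D kernel density estimate f_hat(x) with bandwidth h."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

data = [0.0, 0.1, -0.1, 2.0]
density_near_cluster = kde(0.0, data, h=0.3)   # inside the dense region
density_far = kde(5.0, data, h=0.3)            # far from all data points
```

The estimate is high where data points cluster and decays smoothly away from them, which is the "bump-averaging" intuition mentioned above.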
A weighted kernel density estimator (wKDE) weights the kernel function to add more flexibility, which can be expressed as follows:
$$\hat{f}(x) = \sum_{i=1}^{n} w(X_i, Z_i) K_h(x - X_i),$$
where $Z_i$ represents the covariates, if they exist. Note that this reduces to standard KDE when $w(X_i, Z_i) = \frac{1}{n}$. This paper adopts a Gaussian kernel, which is the most common choice, as it is known that the choice of kernel function has little impact on the results of density estimation [28].
In KDE, the success of the results depends on the selection of the smoothing parameter, known as bandwidth. The choice of bandwidth in KDE plays a crucial role in shaping the final form of the density function. Numerous studies have focused on identifying optimal and robust bandwidth selection methods. As part of the effort to determine an appropriate bandwidth, Ref. [29] proposed an adaptive weighted kernel density estimator (awKDE). They pointed out the drawbacks of using a fixed global bandwidth, which can obscure information in dense areas and makes the final density sensitive to noise. The awKDE method addresses this by adjusting the bandwidth size based on data sparsity. It begins by estimating the overall density with an initial bandwidth, then refines this by narrowing the bandwidth in dense regions and expanding it in sparse regions. The awKDE algorithm is given in Algorithm 1.
Algorithm 1: Adaptive weighted kernel density estimation (awKDE)
Input: $K(x)$: kernel function; we choose the Gaussian function
Output: Estimated density
1. Find a pilot weighted KDE $\hat{f}$ with bandwidth $h_n$, $h_e$, or $h_p$.
2. Define the local bandwidth factor $\lambda_i$ by
$$\lambda_i = \left\{ \hat{f}(X_i) / g \right\}^{-\alpha},$$
where $\log g = n^{-1} \sum_{i=1}^{n} \log \hat{f}(X_i)$ and the sensitivity parameter $0 \leq \alpha \leq 1$.
3. Define the adaptive kernel estimate $\tilde{f}$ by
$$\tilde{f}(x) = \sum_{i=1}^{n} w(X_i, Z_i) \frac{1}{h \lambda_i} K\left(\frac{x - X_i}{h \lambda_i}\right).$$
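The three steps of Algorithm 1 can be sketched in one dimension as follows (a hedged reconstruction: the pilot bandwidth, data, and uniform weights are illustrative, and uniform weights reduce the pilot to a standard KDE):

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def awkde_fit(data, h, alpha=0.5, weights=None):
    """Sketch of Algorithm 1: pilot KDE, local bandwidth factors, adaptive KDE."""
    n = len(data)
    w = weights or [1.0 / n] * n
    # Step 1: pilot (weighted) KDE evaluated at each data point
    pilot = [sum(wj * gaussian((xi - xj) / h) / h
                 for xj, wj in zip(data, w)) for xi in data]
    # Step 2: local bandwidth factors lambda_i = (f_hat(X_i) / g)^(-alpha),
    # where log g is the mean of log f_hat(X_i)
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    lam = [(p / g) ** (-alpha) for p in pilot]
    # Step 3: adaptive estimate with per-point bandwidth h * lambda_i
    def f_tilde(x):
        return sum(wi * gaussian((x - xi) / (h * li)) / (h * li)
                   for xi, li, wi in zip(data, lam, w))
    return f_tilde

f = awkde_fit([0.0, 0.05, -0.05, 3.0], h=0.5, alpha=0.5)
```

Points in dense regions receive $\lambda_i < 1$ (narrower kernels, preserving detail), while isolated points receive $\lambda_i > 1$ (wider kernels, suppressing noise), which is the adaptive behavior described above.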

3.3. Online Biased Learning with awKDE Oversampling

We present a new algorithm that incorporates the awKDE method into an online learning framework. The algorithm first has several memory queues for different classes, organized based on class membership, to construct training sets. Additionally, a backup queue is employed to store the samples generated from the estimated density. The algorithm is further equipped to automatically identify the minority class using time-decayed class sizes, thus removing the need for manually specifying it. The proposed algorithm operates in two major stages: the balancing stage and the bias stage.
The balancing stage aims to balance the training set by resampling the minority class. This stage runs until the memory queues for all classes have been totally filled. In this period, the algorithm estimates the density of the minority class using awKDE and synthesizes samples from the estimated density. We use α = 0.5 for the local bandwidth for the implementation of awKDE. Density estimation is only performed when a new instance of the minority class is observed, in order to optimize computational efficiency. Finally, to improve the data diversity, the algorithm randomly selects different instances to add to the training set every time. Figure 4 illustrates an example of the developments within the balancing stage over time: each colored box represents one new instance, and letters within the boxes represent the arrival order of instances.
When both class memory queues are full, the conscious bias stage begins. In this stage, the minority class is defined as the class with the smaller time-decayed size, and the training set is constructed in a way that is biased toward it. To ensure the minority class is emphasized, samples from the backup queue are added to the training set, increasing the number of minority class instances beyond that of the majority class. This strategy helps to counterbalance the rapid influx of majority class observations. Importantly, the backup queue is only updated when a new instance of the minority class is observed, minimizing unnecessary computational overhead. The extent of the bias is managed through the bias ratio $\gamma$ $(\gamma > 0)$. The number of oversampled instances is determined by multiplying the queue size by $\gamma$. For example, if $\gamma = 0.5$, the number of minority class instances added will equal half the size of the queue.
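For instance, the number of synthetic instances added from the backup queue can be computed as below (rounding to the nearest integer is an assumption; the text only specifies multiplication by $\gamma$):

```python
def n_bias_samples(queue_size, gamma):
    """Extra minority samples drawn from the backup queue in the bias stage.

    Rounding is an assumption; the text specifies queue size times gamma.
    """
    return int(round(gamma * queue_size))
```

With a queue size of 20 per class and $\gamma = 0.5$, ten synthetic minority instances are added, so the minority side of the training set (30) exceeds the majority side (20).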
This mechanism is adaptive; if a role reversal occurs between the majority and minority classes due to concept drift or other factors, the bias stage dynamically adjusts to prioritize the new minority class. Figure 5 illustrates the progression of the bias stage over time, showing the interaction between instances and the adjustment of class representation. Specifically, Figure 5 illustrates a case where the minority and majority classes are reversed after concept drift occurs. Subsequently, the awKDE for the new minority class is estimated, and samples are added from the backup queue based on this estimation. The detailed pseudocode for this process is presented in Algorithm 2.
Algorithm 2: Online biased learning with awKDE oversampling
Input: f: classifier
   W: max queue size, $W \geq 2$
   $\gamma$: bias ratio, $\gamma > 0$
   $\theta$: time-decayed factor for class size metrics, $0 < \theta \leq 1$
Output: Updated classifier
(The full pseudocode is rendered as a figure in the original article.)
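Since the published pseudocode is rendered as a figure, the following is only a hedged reconstruction of one time step, assembled from the prose description in Section 3.3; the interface of `fit_density` and the binary 0/1 class labels are assumptions:

```python
import random

def ob_awkde_step(x_t, y_t, queues, backup, w_sizes, W=20, gamma=0.5,
                  theta=0.99, fit_density=None, rng=random):
    """One time step of the described two-stage procedure (sketch).

    queues: dict class -> list of stored instances (bounded by W)
    backup: list of synthetic minority samples drawn from the awKDE
    w_sizes: dict class -> time-decayed class size (Definition 2)
    fit_density: callable fitting the awKDE and returning a sampler (assumed)
    """
    # Update time-decayed class sizes and identify the current minority class
    w_sizes = {k: theta * wk + (1 - theta) * (1 if y_t == k else 0)
               for k, wk in w_sizes.items()}
    minority = min(w_sizes, key=w_sizes.get)

    # Store the new instance in its class queue (bounded memory)
    q = queues[y_t]
    q.append(x_t)
    if len(q) > W:
        q.pop(0)

    # Refresh the backup queue only when a minority instance arrives
    if y_t == minority and fit_density is not None and len(queues[minority]) > 1:
        sampler = fit_density(queues[minority])
        backup = [sampler() for _ in range(W)]

    balancing = any(len(queues[k]) < W for k in queues)
    if balancing:
        # Balancing stage: top up the minority side to match the majority
        n_extra = max(0, len(queues[1 - minority]) - len(queues[minority]))
    else:
        # Bias stage: exceed the majority count by gamma * W synthetic samples
        n_extra = int(round(gamma * W))
    extra = rng.sample(backup, min(n_extra, len(backup))) if backup else []
    train_x = queues[0] + queues[1] + extra
    return train_x, queues, backup, w_sizes
```

Because the minority class is recomputed from the decayed sizes at every step, a role reversal after concept drift automatically redirects the oversampling toward the new minority class, as described above.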

4. Comparative Experiments

This section covers the experimental data, the specifics of the experimental setup, the comparison methods, and the final results of the comparisons.

4.1. Data

We performed extensive experiments with three synthetic datasets and two real-world datasets. Specifically, we designed diverse concept drift scenarios with class imbalance for the synthetic datasets. Below, we provide a detailed description of the synthetic datasets and the types of concept drift they exhibited.

4.1.1. Synthetic Dataset

  • Sine [30]: This contains two relevant variables, $x \in [0, 2\pi]$ and $y \in [-1, 1]$. Before the drift, all points under $y = \sin(x)$ are labeled as the minority (positive) class. After the drift, the true label is reversed. Both x and y are normalized before classification, so that $x, y \in [0, 1]$.
  • Circle [30]: This contains two relevant variables, $x, y \in [0, 1]$. We construct a circle as the true boundary, and all points inside the circle $(x - x_c)^2 + (y - y_c)^2 = r_c^2$ are labeled as the minority (positive) class. Before the drift, we set $(x_c, y_c, r_c) = (0.4, 0.5, 0.2)$, and after the drift it varies to $(0.6, 0.5, 0.25)$.
  • SEA [31]: This contains two relevant variables, $x, y \in [0, 10]$. All points under $x + y \leq 7$ are labeled as the minority (positive) class. The true decision boundary changes to $x + y \leq 3$ after the drift. Both x and y are normalized before classification, so that $x, y \in [0, 1]$.
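The three labeling rules above can be written compactly on the raw (pre-normalization) coordinates; a small sketch, with the positive class as the minority:

```python
import math

def sine_label(x, y, drifted):
    """Sine: positive below y = sin(x); labels reverse after the drift."""
    lab = int(y < math.sin(x))
    return 1 - lab if drifted else lab

def circle_label(x, y, drifted):
    """Circle: positive inside the circle; center and radius shift at drift."""
    xc, yc, rc = (0.6, 0.5, 0.25) if drifted else (0.4, 0.5, 0.2)
    return int((x - xc) ** 2 + (y - yc) ** 2 <= rc ** 2)

def sea_label(x, y, drifted):
    """SEA: positive when x + y is under the threshold; 7 drops to 3 at drift."""
    threshold = 3.0 if drifted else 7.0
    return int(x + y <= threshold)
```

Note that in each case the drift changes the true decision boundary itself, so all three benchmarks exhibit real abrupt drift in the sense of Section 2.2.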
Finally, we investigated the borderline distributional type for three different imbalance ratios: 10%, 1%, and 0.1%. The borderline type denotes situations where examples near the true decision boundary (the borderline) overlap between classes. To align with the definition of the borderline type in [32], we ensured that more than 50% of the minority class instances overlapped with the majority class. We generated $t = 5000$ data points, namely 5000 time steps for each targeted simulation situation, and initiated the distributional change at $t = 2500$.

4.1.2. Real-World Dataset

The two real-world datasets are described below.
  • Gesture phase segmentation [33,34]: This dataset consists of seven videos recorded using Microsoft Kinect sensors, capturing gestures performed by three participants narrating comic strips. Each video is annotated with five gesture phases: ‘Rest’, ‘Preparation’, ‘Stroke’, ‘Hold’, and ‘Retraction’. For our experiments, we focused on distinguishing between the ‘Rest’ and ‘Hold’ phases, which are challenging to classify because both involve stationary hand positions. Furthermore, the imbalance ratio varies between individuals, with some having ‘Hold’ as the majority class and others having it as the minority class.
  • Fraud [6]: This dataset contains transactions made with credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, and the positive class (frauds) accounts for 0.172% of all transactions. The exact imbalance and drift types are unknown; however, both are expected to be present, given the massive dynamics of financial environments.

4.2. Classifier

We set a fully-connected neural network as the classifier for the synthetic data. It contained one hidden layer with eight hidden nodes. We initialized the weights with He normal weight initialization [35] and optimized with Adam optimization [36], with a learning rate of 0.01. The activation function for the hidden layer was LeakyReLU [37], while the final activation function was sigmoid. We set binary cross-entropy as the loss function. The epoch for each time step was set to one, which meant the classifier was updated only once per time step (one-pass learning).

4.3. Competing Methods

To compare the performance of the proposed method with existing approaches, we selected both data-storing methods and model-ensemble methods. Specifically, our objective was to evaluate whether differences in data-storing methods impacted the performance by comparing approaches that stored incoming data as-is with those that incorporated balancing, as well as those that combined balancing with biasing mechanisms. Additionally, we compared methods that rely on ensemble models to highlight the trade-offs between computational complexity and performance. The methods were chosen as follows:
  • Baseline model: An online incremental learning algorithm with no specific treatments. This method uses only a single instance at $t - 1$ to predict the label at time t.
  • Sliding queue: This method trains the classifier using the most recent instances stored in a queue of size 100. While it leverages more data than the proposed method, it does so without considering the quality or adaptability of the stored instances.
  • Oversampling-based Online Bagging (OOB) [12]: This method employs a bagging ensemble to oversample the minority class. The ensemble size is set to 30, requiring significantly more memory to store multiple models and their corresponding data.
  • Adaptive REBalancing method (AREBA) [13]: This method adaptively adjusts the queue size to balance each class, with a maximum queue size of 20 per class, as specified by the authors as the optimal configuration. Since AREBA stores data separately for each class (positive and negative), the total memory usage effectively amounts to 40.
  • OB-awKDE: The proposed method. To achieve memory optimization, the maximum queue size for each class (positive, negative, and KDE-generated samples) was set to 20, resulting in a total memory usage of 60. We set the bias ratio to 0.5 $(\gamma = 0.5)$.

4.4. Performance Measure for Online Learning Methods

In online classification, a new performance evaluation metric is essential to address the dynamic nature of streaming data. A widely used conventional metric for imbalanced classification is the G-mean, which is defined as
$$\text{G-mean} = \sqrt{\text{recall} \times \text{specificity}},$$
where $\text{recall} = \frac{TP}{TP + FN}$ and $\text{specificity} = \frac{TN}{TN + FP}$. In this definition, $TP$, $TN$, $FP$, and $FN$ refer to true positive, true negative, false positive, and false negative, respectively, as calculated from the confusion matrix shown in Table 1.
Ref. [38] proposed the prequential evaluation method to assess class-specific accuracy in online learning scenarios. This approach incorporates a forgetting mechanism that reduces the influence of older results, while emphasizing recent performance. We employed the prequential error metric with a fading factor θ = 0.99 to compute the class-specific accuracy, referred to as prequential recall and specificity. The final evaluation metric, prequential G-mean, was then computed as the geometric mean of the prequential recall and specificity.
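One plausible way to maintain prequential recall, specificity, and G-mean with a fading factor is via exponentially faded confusion-matrix counts (this bookkeeping is an assumption consistent with the fading-factor idea, not the exact formulas of [38]):

```python
import math

class PrequentialGMean:
    """Prequential recall/specificity with a fading factor, combined as G-mean.

    Faded counts follow s_t = theta * s_{t-1} + indicator, so older outcomes
    contribute exponentially less to the current estimate.
    """
    def __init__(self, theta=0.99):
        self.theta = theta
        self.tp = self.fn = self.tn = self.fp = 0.0

    def update(self, y_true, y_pred):
        th = self.theta
        self.tp = th * self.tp + int(y_true == 1 and y_pred == 1)
        self.fn = th * self.fn + int(y_true == 1 and y_pred == 0)
        self.tn = th * self.tn + int(y_true == 0 and y_pred == 0)
        self.fp = th * self.fp + int(y_true == 0 and y_pred == 1)

    def g_mean(self):
        recall = self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0
        specificity = self.tn / (self.tn + self.fp) if self.tn + self.fp else 0.0
        return math.sqrt(recall * specificity)
```

Because both class-specific accuracies are tracked separately, a model that ignores the minority class scores zero recall and hence a zero G-mean, regardless of its overall accuracy.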

4.5. Results

4.5.1. Synthetic Dataset

For the synthetic data experiments, each scenario was conducted over 100 independent runs, to ensure a robust evaluation. The prequential G-mean at each time step was computed as the average across these 100 runs. To assess the specific impact of concept drift on model performance, the evaluation metrics were reset to 0 at the moment of drift ( t = 2500 ), ensuring that the post-drift performance was not influenced by pre-drift outcomes. Notably, the classifiers themselves were not reset, thereby emphasizing their adaptability to changing data distributions. Statistical analysis of the evaluation metrics was performed using ANOVA, with Tukey’s post hoc tests employed to identify significant pairwise differences among methods and scenarios.
Table 2 summarizes the final prequential G-mean values, along with their standard deviations, calculated at the final time step across the 100 runs. Tukey’s post hoc test results, including statistically significant differences (p-values), are included to highlight the superiority of OB-awKDE under the various experimental conditions.
According to Table 2, the OB-awKDE method outperformed almost all competing approaches across the three datasets, particularly under severe class imbalance and abrupt concept drift. For the Sine dataset, which features a dynamic decision boundary and abrupt drift, OB-awKDE achieved the highest final G-mean of 0.752 under the 10% imbalance ratio, significantly surpassing OOB (0.694) and AREBA (0.618). This performance can be attributed to the density-based resampling mechanism, which adaptively generated synthetic samples that aligned with the complex, non-linear data structure.
For the Circle dataset, OB-awKDE achieved a G-mean of 0.734 under the 1% imbalance ratio, far exceeding OOB (0.405) and AREBA (0.386). These results demonstrate the advantage of OB-awKDE’s density estimation approach in generating high-quality synthetic samples that effectively model the underlying distribution, even when majority and minority class boundaries are heavily intertwined. In contrast, methods like OOB, which rely on random oversampling, struggled to capture this overlap, leading to a reduced classification performance.
For the SEA dataset, characterized by a linear decision boundary and abrupt drift, the OOB method showed the best performance under a 10% imbalance ratio. This is likely because ensemble-based methods have an advantage in relatively straightforward scenarios such as linear decision boundaries. However, OB-awKDE attained a final G-mean of 0.799, closely matching OOB (0.813), while requiring significantly less memory. In more extreme cases where the imbalance ratio dropped below 1%, the proposed method once again demonstrated superior performance.
Figure 6 provides a temporal visualization of the prequential recall, specificity, and G-mean dynamics, offering insight into the adaptation process and stability of each method over time. As presented in Section 4.4, the G-mean is defined as the geometric mean of the recall and specificity. Therefore, the geometric mean of the values in the first and second plots in each row of Figure 6 corresponds to the values in the third plot.
We observed that, with the occurrence of concept drift, the values of all metrics dropped immediately across the methods. However, OB-awKDE demonstrated a rapid recovery in learning ability after the concept drift, while the other methods were slower to regain their performance level. For the SEA dataset, as shown in Figure 6c, the baseline method and the sliding queue method, which stored a larger number of samples (100 memory slots), showed a significant decline in recall after the drift, indicating the limitations of simple mechanical processing methods. Conversely, OB-awKDE maintained a stable recall and specificity by leveraging its adaptive resampling mechanism to preserve representative samples in memory.
A further advantage of OB-awKDE lies in its superior minority class capturing capability, a core challenge in imbalanced classification. Across datasets such as Sine and Circle, where overlapping or non-linear decision boundaries posed challenges for the traditional methods, OB-awKDE effectively generated synthetic samples that were both representative and diverse. This enabled the model to robustly learn the minority class, even under severe imbalance. In contrast, for the SEA dataset, which features a simpler linear decision boundary and a 10% imbalance ratio, OB-awKDE closely competed with the memory-intensive methods like OOB, while utilizing significantly fewer resources. This balance between efficiency and minority class representation highlights its scalability and practical applicability.

4.5.2. Real-World Dataset

The real-world experiments were conducted using two datasets: Gesture Phase Segmentation and Fraud Detection. The Gesture dataset primarily focuses on short-term performance measurement, as it involves relatively small-scale data with distinct segments and limited time steps. In contrast, the Fraud dataset provides a robust evaluation of long-term performance, due to its large-scale, highly imbalanced data and extended time horizon.
The Gesture dataset consists of three subsets (A, B, and C), all involving the classification of ‘Rest’ and ‘Hold’ phases. Since each volunteer’s recording was observed over a different duration, the number of time steps varies accordingly. Person A’s dataset includes 2084 time steps, with the minority class being ‘Hold’ at 11.66% of the data, representing the most severe imbalance. Person B’s dataset consists of 678 instances, of which 38.34% belong to the ‘Rest’ class. Person C’s dataset contains 977 poses, with 34.49% classified as ‘Hold’. To apply a consistent criterion across subsets, recall was calculated by treating the ‘Hold’ class as ‘Positive’, as shown in Table 1.
The Fraud dataset, with its extensive time steps of 284,807 and a minority class (fraudulent transactions) accounting for only 0.172% of all instances, serves as a benchmark for evaluating model stability and effectiveness in long-term, real-world online learning scenarios.
Table 3 summarizes the final prequential G-mean values for both datasets.
In the Gesture dataset, OB-awKDE achieved consistently high G-mean values. In the most severe imbalance scenario, presented by Person A, OB-awKDE achieved a G-mean of 0.626, significantly outperforming AREBA (0.378) and OOB (0.565). For Person B, OB-awKDE achieved a G-mean of 0.854, surpassing its closest competitor, OOB (0.829). For Person C, OB-awKDE attained a G-mean of 0.732, outperforming AREBA (0.724), which had otherwise shown good results for this subset.
For the Fraud dataset, OB-awKDE achieved the highest G-mean of 0.877, significantly surpassing both AREBA (0.770) and OOB (0.066). This result highlights OB-awKDE’s ability to maintain high-quality minority class predictions in scenarios with extreme imbalance and extensive data.
Figure 7 and Figure 8 visualize the performance changes over time. On the Gesture dataset, OB-awKDE consistently demonstrated superior recall, highlighting its effectiveness in capturing the minority class. For Person B, where the minority class is ‘Rest’, the specificity plot reflects the model’s ability to handle the minority class, because ‘Rest’ is the ‘Negative’ class under the convention of Table 1.
Additionally, in the long-term evaluation using the Fraud dataset, while the specificity trends of most methods did not show significant differences, OB-awKDE demonstrated strong recall performance in capturing the minority class. This highlights its robustness in handling highly imbalanced and large-scale scenarios.
In summary, OB-awKDE excelled in both short-term and long-term performance evaluations, consistently demonstrating superior adaptability, stability, and minority class representation. These results validate its efficacy in addressing real-world challenges in online imbalanced classification.
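All of the evaluations above follow the prequential (test-then-train) protocol, in which each arriving example is first used to test the current model and only then used to update it. A minimal sketch of that loop is shown below; the `predict`/`learn` method names and the `metric` interface are assumptions for illustration, not the paper's API.

```python
def prequential_run(model, stream, metric):
    """Prequential (test-then-train) evaluation over a data stream.

    model:  any online learner exposing predict(x) and learn(x, y)
            (hypothetical method names).
    stream: iterable of (x, y) pairs arriving one at a time.
    metric: accumulator exposing update(y_true, y_pred), e.g. a
            running G-mean tracker.
    """
    for x, y in stream:
        y_hat = model.predict(x)   # test on the new example first ...
        metric.update(y, y_hat)    # ... record the prequential metric ...
        model.learn(x, y)          # ... then train on that same example
    return metric
```

Because every example is scored before the model sees its label, the resulting metric curves (as in Figures 6–8) reflect genuine online performance rather than in-sample fit.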

5. Discussion and Concluding Remarks

Table 2 presents the final performance results for the synthetic datasets. The proposed method demonstrated consistently superior prequential G-mean values compared to the alternative approaches across most scenarios, with statistically significant differences. Furthermore, Figure 6 provides a detailed visualization of the prequential G-mean values over all time steps, highlighting the sustained performance advantage of the proposed method throughout the learning process. The results of the real-world datasets summarized in Table 3 further validate the effectiveness of the proposed approach. At the final time step, the method achieved significantly higher prequential G-mean values relative to the competing methods. The performance trends illustrated in Figure 7 and Figure 8 display the results across all time steps, and the proposed method showed a trend of progressively improving performance as the time steps advanced.
As a concluding remark, this study introduced a novel algorithm designed for online imbalanced classification under concept drift conditions. The proposed approach leveraged an adaptive weighted kernel density estimation technique for oversampling, applying a slight excess to minority class instances to address their slower update rates during the learning process. Through comprehensive experimental evaluations on both synthetic and real-world datasets, the proposed method was shown to outperform existing techniques, consistently achieving higher prequential G-mean values. This is attributed to the OB-awKDE method’s smoother adaptation and its density-based resampling mechanism, which dynamically adjusts to evolving distributions without excessive memory usage. These results underscore the method’s robustness and its potential applicability in a wide range of real-world online learning tasks.
As future research directions, methodologies for addressing gradual and recurring drift patterns could be explored. Furthermore, the issue of multi-class imbalance in real-time learning could also be considered.

Author Contributions

D.L. provided the core idea of the study, programmed the method, conducted the data experiment, and wrote the manuscript. H.K. set the overall direction for the paper, led the process accordingly, and performed manuscript writing and editing. Conceptualization, D.L. and H.K.; methodology, D.L.; software, D.L.; validation, H.K.; formal analysis, D.L.; investigation, D.L. and H.K.; resources, H.K.; data curation, D.L.; writing—original draft preparation, D.L.; writing—review and editing, H.K.; visualization, D.L.; supervision, H.K.; project administration, H.K.; funding acquisition, H.K. All authors have read and agreed to the published version of the manuscript.

Funding

Hyunjoong Kim’s work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ICAN (ICT Challenge and Advanced Network of HRD) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2023-00259934) and by the National Research Foundation of Korea (NRF) grant funded by the Korean government (No. 2016R1D1A1B02011696).

Data Availability Statement

The data are publicly available and can be downloaded from the relevant sites.

Conflicts of Interest

Author Daeun Lee was employed by the company SAP Labs Korea. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Types of concept drift. The yellow and blue points represent the data of each class, and the straight line represents the true classification boundary.
Figure 2. Characteristics of concept drift. The yellow and blue points represent two different states.
Figure 3. Illustration of kernel density estimation using Gaussian kernel: blue dots indicate data points, gray lines represent individual kernels, and the red line shows the estimated density.
Figure 4. Example of balancing stage.
Figure 5. Example of conscious bias stage.
Figure 6. Simulation results on synthetic dataset with 10% imbalance ratio and borderline imbalance type.
Figure 7. Performance metrics for Gesture dataset over time.
Figure 8. Performance metrics for Fraud dataset over time.
Table 1. Confusion matrix for binary classification.

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | TP (True Positive)  | FN (False Negative) |
| Actual Negative | FP (False Positive) | TN (True Negative)  |
Table 2. Final-step prequential G-means and statistical test results.

| Dataset | IR   | Baseline              | Sliding Queue         | OOB                   | AREBA                 | OB-awKDE      |
|---------|------|-----------------------|-----------------------|-----------------------|-----------------------|---------------|
| Sine    | 10%  | 0.207 ± 0.136 (0.000) | 0.592 ± 0.086 (0.000) | 0.694 ± 0.003 (0.001) | 0.618 ± 0.171 (0.000) | 0.752 ± 0.034 |
| Sine    | 1%   | 0.0 ± 0.0 (0.000)     | 0.114 ± 0.114 (0.000) | 0.395 ± 0.135 (0.000) | 0.329 ± 0.118 (0.000) | 0.712 ± 0.079 |
| Sine    | 0.1% | 0.0 ± 0.0 (0.000)     | 0.0 ± 0.0 (0.000)     | 0.023 ± 0.107 (0.000) | 0.074 ± 0.196 (0.000) | 0.261 ± 0.316 |
| Circle  | 10%  | 0.028 ± 0.058 (0.000) | 0.703 ± 0.027 (0.000) | 0.701 ± 0.033 (0.000) | 0.666 ± 0.155 (0.000) | 0.749 ± 0.033 |
| Circle  | 1%   | 0.0 ± 0.0 (0.000)     | 0.103 ± 0.146 (0.000) | 0.405 ± 0.109 (0.000) | 0.386 ± 0.155 (0.000) | 0.734 ± 0.058 |
| Circle  | 0.1% | 0.0 ± 0.0 (0.000)     | 0.0 ± 0.0 (0.000)     | 0.029 ± 0.120 (0.000) | 0.060 ± 0.176 (0.000) | 0.285 ± 0.335 |
| SEA     | 10%  | 0.302 ± 0.059 (0.000) | 0.575 ± 0.044 (0.000) | 0.813 ± 0.026 (0.842) | 0.680 ± 0.197 (0.000) | 0.799 ± 0.030 |
| SEA     | 1%   | 0.0 ± 0.0 (0.000)     | 0.088 ± 0.116 (0.000) | 0.568 ± 0.097 (0.000) | 0.333 ± 0.201 (0.000) | 0.758 ± 0.064 |
| SEA     | 0.1% | 0.0 ± 0.0 (0.000)     | 0.0 ± 0.0 (0.000)     | 0.038 ± 0.134 (0.000) | 0.056 ± 0.172 (0.000) | 0.194 ± 0.290 |
Table 3. Final-step prequential G-means for real-world datasets.

| Dataset      | Baseline      | Sliding Queue | OOB           | AREBA         | OB-awKDE      |
|--------------|---------------|---------------|---------------|---------------|---------------|
| Gesture of A | 0.382 ± 0.259 | 0.106 ± 0.256 | 0.565 ± 0.314 | 0.378 ± 0.246 | 0.626 ± 0.251 |
| Gesture of B | 0.767 ± 0.234 | 0.748 ± 0.253 | 0.829 ± 0.213 | 0.697 ± 0.293 | 0.854 ± 0.262 |
| Gesture of C | 0.136 ± 0.162 | 0.597 ± 0.280 | 0.189 ± 0.172 | 0.724 ± 0.224 | 0.732 ± 0.213 |
| Fraud        | 0.424 ± 0.159 | 0.064 ± 0.202 | 0.066 ± 0.205 | 0.770 ± 0.121 | 0.877 ± 0.128 |