1. Introduction
Electrocardiography (ECG) is a noninvasive, globally accessible tool for the initial diagnosis of cardiovascular diseases [1,2]. Many forms of cardiovascular disease can be identified from a patient's ECG recording, including arrhythmia [3]. Because of the sheer volume of ECG examinations, researchers have been developing a range of tools to classify ECG signals effectively. Arrhythmia, the most common form of heart disease, refers to a condition in which heart rhythms are irregular, and it can be detected from abnormalities in the ECG trace. However, analyzing these abnormalities is time-consuming and difficult without expert supervision. Moreover, the morphology and characteristics of ECG signals vary across patients because of differing physiological conditions, and the same disease can even produce different patterns in the ECG [4], which makes the signals difficult to interpret and annotate when generating training samples for deep learning models. Recently, deep learning (DL) has emerged with strong baseline capabilities in many real-life applications, and DL-based ECG classification has become mainstream with the growth of AI in disease diagnosis. Because DL can learn features from large-scale clinical data under varied conditions, several successful DL-based diagnostic tools have been designed [3]. Typically, CNNs and RNNs are used to learn high-level features from raw time-series ECG data, and these features are then used for classification. However, such models do not generalize to data from new subjects at test time, and in most real-time applications they fail to deliver the desired classification precision.
ECG classification can be organized into two schemes: intrasubject and intersubject. In the intrasubject scheme, the training and test examples may be differently distributed but belong to the same subjects. The main limitation of this scheme is that models do not generalize well to new subjects, leading to a noticeable performance drop. In the intersubject scheme, by contrast, the training and test examples do not overlap and belong to different subjects, so the samples carry individual differences such as sex, age, and physiological condition, and classification performance varies from subject to subject. These differences between training and testing examples make generalization difficult; hence, the intersubject scheme is considered the most challenging and realistic setting for developing ECG classification models [5].
Even though DL methods have made great advances in the intrasubject scheme, they still suffer considerable performance degradation in the intersubject scheme, which is the more realistic setting for developing applications because each subject or patient has distinct morphological characteristics in the ECG signal. Considering application reliability, the subject-independent approach trains a model on labeled training samples and tests it strictly on samples from new subjects; however, model performance then deteriorates by a significant margin, which cannot be relied on when building models. The problem can be mitigated by subject-specific training, in which the scheme is divided into two phases, initial training and fine-tuning [5]. In initial training, the model is trained on the training samples; in the second stage, it is fine-tuned with partially labeled samples drawn from the test subjects, which increases its capacity to adapt to intersubject variation. However, this scheme has several underlying issues: developing a model for each individual subject or patient is not feasible for real-time deployment where computational resources are limited, and annotating such a large amount of subject-wise data is expensive and laborious. Instead of annotating in this conventional way, researchers adopt domain adaptation (DA) methods to address these limitations. Domain shift refers to the distribution discrepancy between training and test samples [6], and DA is a prominent approach for resolving it; in this setting, the training and test samples are treated as the source and target samples, respectively.
However, ECG datasets are often limited for real-time testing, and they also suffer from class imbalance. For example, the MIT-BIH arrhythmia dataset is imbalanced: normal-class beats far outnumber those of rare disease classes, and a similarly sparse representation appears in other datasets. Another challenge is collecting well-representative data for model training. Deep learning models show an inherent bias toward classes that are more common in training [7], so their performance on minority classes is poor even when the overall accuracy appears acceptable. We therefore need a well-designed data augmentation method that creates synthetic data consistent with the original data for the minority classes; simple augmentation, however, yields only a marginal performance boost that may not be sufficient for medical diagnosis. Generative models, in contrast, can learn the distribution of the training data and generate samples that follow the same distribution, thereby improving classification performance on minority classes.
Most unsupervised domain adaptation (UDA) methods require a common connection between the source and target distributions [8] and also need access to the source data during real-time deployment. In medical settings, this is highly impractical because of privacy concerns surrounding patient data and the associated ethical implications: most UDA methods and other traditional models access patient data directly during real-time adaptation, which breaks data confidentiality. Traditional approaches are also at risk of failing to preserve anonymization, since deep learning models tend to memorize patient-specific information. Further ethical concerns relate to data privacy, such as data sharing without explicit consent, bias, and discrimination. Moreover, it is not feasible to train a model with limited computational resources, and an accidental loss of the source data makes it impossible for traditional UDA techniques to generalize to new subjects. Therefore, many challenges remain even in the UDA setting for ECG classification. In Figure 1, we illustrate the source and target samples before adaptation, as well as the effect of source-free adaptation with generated samples.
To tackle the aforementioned issues of traditional DL-based models and UDA methods in ECG classification, we adopt a novel problem setting: source-free adaptation. Source-free adaptation (SFA) [8,9] is a recent extension of DA/UDA in which the source dataset is not required during real-time adaptation, making it well suited to the open challenges in ECG classification. We therefore bring the SFA idea into ECG classification, which is an entirely novel scenario for this problem, and propose a source-free intersubject method, SF-ECG, for ECG classification. Our framework improves the intersubject performance of DNNs without requiring source data during adaptation. It consists of three modules: data adjustment, local clustering, and data adaptation. Further details can be found in Section 3.
Our contributions are fourfold:
To the best of our knowledge, we are the first to incorporate the source-free adaptation strategy to address interpatient ECG classification.
We propose the SF-ECG framework, an unsupervised domain adaptation method that predicts labels for unlabeled target samples without requiring source data in the test-time adaptation setting.
We adopt a generative model that generates high-quality ECG training samples for the source classifier, alleviating the class-wise data insufficiency problem and yielding a strong source classifier that is utilized in test-time adaptation.
Empirical experiments show that our framework classifies interpatient ECG samples with high precision even in this novel scenario, outperforming many state-of-the-art methods under the same settings with better generalization ability.
Organization. The remainder of the paper is organized as follows: Section 2 discusses related work; Section 3 describes our proposed framework in detail; Section 4 presents the experimental settings, results, and ablation studies; and finally, Section 5 concludes the paper with a discussion.
3. Proposed Method
Our framework for source-free intersubject ECG classification is divided into three modules, namely, data adjustment, local structure learning, and the adaptation network. In the data adjustment module, we use a generative model to generate synthetic samples based on the source sample distribution. The local structure module learns the target features and clusters them based on their neighborhoods. An illustration of the overall framework is shown in
Figure 2.
3.1. Preliminaries
In the source-free domain adaptation (SFDA) task, we consider a labeled source set $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, where $n_s$ is the number of labeled source samples, and $x_i^s$ and $y_i^s$ are the source samples and their corresponding labels, respectively. We also have a target set $\mathcal{D}_t = \{x_i^t\}_{i=1}^{n_t}$, where $n_t$ is the number of target samples, which have no corresponding labels. In the SFDA setting, the source set is available only during model pretraining; it is not available during real-time adaptation.
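To make this setting concrete, the following minimal PyTorch sketch shows one way the labeled source set and the unlabeled target set could be wrapped for pretraining and adaptation. The class name, tensor shapes, and batch size are illustrative assumptions and not part of the proposed framework.

```python
# Illustrative sketch of the SFDA data setup (assumed names and shapes).
import torch
from torch.utils.data import Dataset, DataLoader

class ECGBeatDataset(Dataset):
    """Wraps fixed-length ECG beat segments; labels are optional (target set)."""
    def __init__(self, beats, labels=None):
        self.beats = torch.as_tensor(beats, dtype=torch.float32)       # (N, 1, L)
        self.labels = None if labels is None else torch.as_tensor(labels)

    def __len__(self):
        return len(self.beats)

    def __getitem__(self, i):
        if self.labels is None:                   # unlabeled target sample x_i^t
            return self.beats[i]
        return self.beats[i], self.labels[i]      # labeled source pair (x_i^s, y_i^s)

# D_s is used only for pretraining; D_t stays unlabeled during adaptation.
# source_loader = DataLoader(ECGBeatDataset(x_s, y_s), batch_size=64, shuffle=True)
# target_loader = DataLoader(ECGBeatDataset(x_t), batch_size=64, shuffle=True)
```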
3.2. Generative Model for Data Adjustment
In various cases, ECG datasets face data imbalance in the minority classes. To generate synthetic data, we use a generative model, a GAN [25,26,27]. A GAN is generally formulated with two modules: (1) a generator and (2) a discriminator. The generator takes the source samples together with random noise $z$, learns the distribution of the source data $\mathcal{D}_s = \{x_i^s\}_{i=1}^{n_s}$, and outputs synthetic samples $\tilde{\mathcal{D}}_s = \{\tilde{x}_i^s\}$. The discriminator takes both the synthetic and the original data as input, and its objective is to distinguish them as precisely as possible, while the generator is trained to produce synthetic data that the discriminator classifies as original. The training objective of the GAN can be described by the value function $V(D, G)$, which the discriminator seeks to maximize and the generator seeks to minimize:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big].$$

Here, $D(x)$ denotes the probability that $x$ belongs to the original data distribution $p_{\mathrm{data}}(x)$, and $G(z)$ is the mapping function from the noise vector to the generated sample. The optimal parameters are obtained by maximizing the value function with respect to the discriminator and minimizing it with respect to the generator. The discriminator $D$ is trained with both positive and negative ECG samples in each iteration, and the generator $G$ is updated with policy gradients twice for each update of $D$; we adopt this strategy from [25]. Training converges once the synthetic samples become indistinguishable from the original ones. The generated synthetic samples in the class-deficient categories alleviate the imbalance issue and make the dataset considerably more balanced. We use the balanced dataset in the local structure module (Section 3.3). Adversarial loss and cross-entropy loss are adopted for the GAN module.
Table 1 depicts the model description.
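As an illustration of this module, the following PyTorch sketch sets up a small 1-D GAN for minority-class ECG beats with the 2:1 generator/discriminator update schedule described above. The layer sizes, beat length, noise dimension, and optimizer settings are assumptions and do not reproduce the architecture in Table 1.

```python
# Minimal 1-D GAN sketch for synthesizing minority-class ECG beats (assumed sizes).
import torch
import torch.nn as nn

L = 256  # assumed beat length

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, L), nn.Tanh(),          # synthetic beat scaled to [-1, 1]
        )
    def forward(self, z):
        return self.net(z).unsqueeze(1)            # (B, 1, L)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 7, stride=2, padding=3), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 32, 7, stride=2, padding=3), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(32 * (L // 4), 1),
        )
    def forward(self, x):
        return self.net(x)                         # real/synthetic logit

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_beats):
    """real_beats: (B, 1, L) original minority-class beats."""
    b = real_beats.size(0)
    # Discriminator: one update on original (positive) and synthetic (negative) beats.
    z = torch.randn(b, 100)
    fake = G(z).detach()
    loss_d = bce(D(real_beats), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: two updates per discriminator update, as described above.
    for _ in range(2):
        z = torch.randn(b, 100)
        loss_g = bce(D(G(z)), torch.ones(b, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```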
3.3. Local Structure Module
In traditional settings, UDA methods rely on aligning source and target features, but in the SFA setting it is not feasible to access the source data. For each target sample we obtain a class prediction and a feature embedding from the model. Following [9], the local structure method aims to shift the target features toward the source domain. Owing to the expected domain shift, some target features may deviate far from the source feature region; nevertheless, we assume that the feature space still retains the clusters formed by the classes. We therefore measure distances and move each data point toward its most likely nearby cluster. Because of the domain shift between the source and target interpatient data, the classifier may produce wrong predictions, so it is essential that target features belonging to the same class be clustered together. Nearest-neighbor target features tend to share category labels; as a result, clustered features are most likely to lean jointly toward a common label. As shown in Figure 2, this scheme helps to correctly classify target features that would otherwise be misclassified.
To ensure semantically close neighborhoods, we use a feature bank $\mathcal{F} = \{f(x_i^t)\}_{i=1}^{n_t}$ to accumulate the target features. We then utilize a neighborhood selection procedure to form the best clusters among the target features, which explicitly encourages correct classification: we take a few nearest neighbors from the feature bank and apply a consistency regularization. To store the softmax prediction scores, we use a score bank $\mathcal{S} = \{g(f(x_i^t))\}_{i=1}^{n_t}$, where $g$ and $f$ are the classifier and the feature extractor, respectively. We adopt a ResNet-50 model with a fully connected layer as the feature extractor $f$, and an additional fully connected layer as the classifier $g$. Local clustering is obtained by adopting the loss in Equation (2), which enforces consistent predictions among the k-nearest features [9]:

$$\mathcal{L}_{lc} = -\frac{1}{n_t}\sum_{i=1}^{n_t}\sum_{j \in \mathcal{N}_K(x_i^t)} \log\big(p_i^{\top}\mathcal{S}_j\big) + \mathrm{KL}\big(\bar{p}\,\|\,q\big). \qquad (2)$$
Here, $\mathcal{N}_K(x_i^t)$ denotes the $K$ nearest neighbors of the target feature $f(x_i^t)$ stored in the feature bank, computed using cosine similarities, and $p_i = g(f(x_i^t))$ is the prediction of the target sample $x_i^t$. The negative log of the dot product between the stored prediction scores $\mathcal{S}_j$ and the prediction $p_i$ is minimized, so Equation (2) enforces consistent, correct predictions between each feature and its nearest neighbors. The terms $\bar{p}$ and $q$ denote the empirical label distribution and the uniform distribution, respectively. Finally, the old items in the banks are replaced with the new items from the corresponding minibatch.
Figure 3 illustrates the local structure module.
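The sketch below shows one possible PyTorch implementation of the neighborhood-consistency objective in Equation (2): it maintains a feature bank and a score bank and attracts each prediction toward those of its cosine-similarity neighbors. The bank update strategy, the value of $K$, and the unweighted diversity term are assumptions rather than the released configuration.

```python
# Hedged sketch of the local-structure (neighborhood clustering) loss.
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_banks(feat_bank, score_bank, idx, feats, probs):
    """Replace the stored items of the current minibatch in the feature/score banks."""
    feat_bank[idx] = F.normalize(feats, dim=1)
    score_bank[idx] = probs

def local_structure_loss(feats, probs, feat_bank, score_bank, k=5):
    """feats: (B, D) target features; probs: (B, C) softmax predictions.
    feat_bank: (N, D) normalized features; score_bank: (N, C) stored predictions."""
    feats_n = F.normalize(feats, dim=1)
    sim = feats_n @ feat_bank.T                    # cosine similarity to all stored features
    _, knn_idx = sim.topk(k + 1, dim=1)            # drop the closest hit (typically the sample itself)
    knn_scores = score_bank[knn_idx[:, 1:]]        # (B, k, C) predictions of the K neighbors
    # Attraction: negative log of the dot product between p_i and neighbor scores S_j.
    dots = torch.einsum("bc,bkc->bk", probs, knn_scores).clamp_min(1e-8)
    attract = -dots.log().mean()
    # Diversity: keep the empirical label distribution close to uniform (KL term).
    p_bar = probs.mean(dim=0)
    q = torch.full_like(p_bar, 1.0 / p_bar.numel())
    diversity = (p_bar * (p_bar.clamp_min(1e-8).log() - q.log())).sum()
    return attract + diversity
```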
3.4. Source-Free Domain Adaptation Module
Our domain adaptation module has the feature extractor from the local structure module, which is a shared feature extractor with source and target features aligned. We also have a classifier network and a discriminator; the whole setting is replicated from [
36]. As we know, in the SFDA setting, source data are not accessible during the adaptation. So, in this case, using the GAN framework we generated identically distributed samples from the source samples to use in the training time with
. As we already have the feature extractor, which exhibits the neighborhood-based domain-invariant target feature alignment with the source samples, the discriminator tends to separate the generated samples from the unlabeled target samples. The classifier network is utilized to classify the target samples while it is trained and fine-tuned with the
. As such, no source database
is required in the adaptation time, since we are only using the generated source samples. All the feature extractors, classifiers, and discriminators have learnable parameters. However, in the adaptation task, fixed numbers of generated source samples from
are taken; hence, the adaptation performance is based on the number of source samples that are used during the adaptation.
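For clarity, the following sketch outlines one plausible wiring of the adaptation network. The small 1-D convolutional extractor is only a lightweight stand-in for the ResNet-50-based extractor described in Section 3.3, and the feature dimension and number of classes are assumptions.

```python
# Structural sketch of the adaptation network (assumed wiring, not the exact architecture).
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Stand-in shared feature extractor f for 1-D ECG beats."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, feat_dim)
    def forward(self, x):                       # x: (B, 1, L)
        return self.fc(self.conv(x))            # (B, feat_dim)

feature_extractor = FeatureExtractor()
classifier = nn.Linear(256, 5)                  # e.g., 5 heartbeat classes (assumption)
domain_disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
# Adaptation uses only generated source-like beats (from the GAN) plus unlabeled
# target beats; the original source database D_s is never accessed.
```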
3.5. Objective Function
Following the work in [36], we use four types of loss functions to train our framework. Each of them is described below.
Adversarial Loss ($\mathcal{L}_{adv}$): The adversarial loss is used when training the GAN; it helps the discriminator to distinguish between synthetic and original data. It comprises two terms, $\mathcal{L}_{G}$ for the generator and $\mathcal{L}_{D}$ for the discriminator. The generator loss $\mathcal{L}_{G}$ is denoted as

$$\mathcal{L}_{G} = \mathbb{E}_{z \sim p_z,\, y \sim p_y}\big[\log\big(1 - D(G(z, y))\big)\big],$$

where $z$ and $y$ are the random noise and the generated class labels, respectively. The discriminator loss $\mathcal{L}_{D}$ is denoted as

$$\mathcal{L}_{D} = -\,\mathbb{E}_{x^t \sim p_t}\big[\log D(x^t)\big] - \mathbb{E}_{z \sim p_z,\, y \sim p_y}\big[\log\big(1 - D(G(z, y))\big)\big].$$

Here, $x^t$ is the target data, sampled from the target distribution $p_t$.
Cross-Entropy Loss ($\mathcal{L}_{ce}$): This loss is computed when the generated samples are fed to the pretrained classifier; it ensures consistent parameters, as the pretrained classifier itself is not updated. $\mathcal{L}_{ce}$ is denoted as

$$\mathcal{L}_{ce} = \frac{1}{n_g}\sum_{i=1}^{n_g} \ell_{ce}\big(C_s(\tilde{x}_i), \tilde{y}_i\big),$$

where $C_s$ and $\ell_{ce}$ are the pre-trained classifier and the traditional cross-entropy loss [36], respectively, $(\tilde{x}_i, \tilde{y}_i)$ is a generated sample and its class label, and $n_g$ is the number of synthetically generated samples.
Discriminative Loss ($\mathcal{L}_{d}$): Domain-invariant features are obtained from the feature extractor, and the discriminative loss is applied to it. This is a binary domain loss between the generated source samples and the target samples, and the feature extractor is trained through a gradient reversal layer [37]. $\mathcal{L}_{d}$ is denoted as

$$\mathcal{L}_{d} = -\frac{1}{n}\sum_{i=1}^{n}\Big[d_i \log D_f\big(f(x_i)\big) + (1 - d_i)\log\big(1 - D_f\big(f(x_i)\big)\big)\Big],$$

where $D_f$ is the domain discriminator, $n$ is the total number of generated and target samples, and $d_i$ are the domain labels.
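Since the discriminative loss is back-propagated through a gradient reversal layer, the sketch below shows a standard PyTorch GRL together with a binary domain loss over generated and target features. The domain-label convention (generated = 0, target = 1) and the lambda scaling are assumptions.

```python
# Standard gradient reversal layer (GRL) sketch plus the binary domain loss.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradients are negated (and scaled) on the way back to the feature extractor.
        return -ctx.lambd * grad_output, None

def discriminative_loss(domain_disc, feats_gen, feats_tgt, lambd=1.0):
    """Binary domain loss; domain_disc outputs one logit per feature vector."""
    feats = torch.cat([feats_gen, feats_tgt], dim=0)
    logits = domain_disc(GradReverse.apply(feats, lambd))
    labels = torch.cat([torch.zeros(len(feats_gen), 1, device=feats.device),
                        torch.ones(len(feats_tgt), 1, device=feats.device)], dim=0)
    return F.binary_cross_entropy_with_logits(logits, labels)
```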
Classification Loss ($\mathcal{L}_{cls}$): This is the classification loss of the classifier when trained with the generated samples; its gradient is also used to train the feature extractor. It is denoted as

$$\mathcal{L}_{cls} = \frac{1}{n_g}\sum_{i=1}^{n_g} \ell_{ce}\big(C(f(\tilde{x}_i)), \tilde{y}_i\big),$$

where $C$ is the classifier and $n_g$ is the number of synthetically generated samples.
Total Loss: The total loss is expressed as

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{ce} + \lambda_3 \mathcal{L}_{d} + \lambda_4 \mathcal{L}_{cls},$$

where $\lambda_1, \ldots, \lambda_4$ are tuning factors. Three of the weights are set to 1, while the remaining weight is set to 0 for the first 50 epochs and to 1 afterwards.
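A minimal sketch of combining the four terms is given below. The weights follow the schedule described above, but which term receives the 50-epoch warm-up is an assumption (here, the classification term).

```python
# Hedged sketch of the total training objective (assumed warm-up on the classification term).
def total_loss(l_adv, l_ce, l_disc, l_cls, epoch, warmup_epochs=50):
    lam_adv, lam_ce, lam_disc = 1.0, 1.0, 1.0
    lam_cls = 0.0 if epoch < warmup_epochs else 1.0   # warm-up schedule (assumption)
    return lam_adv * l_adv + lam_ce * l_ce + lam_disc * l_disc + lam_cls * l_cls
```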
5. Conclusions
In this paper, we propose a novel domain adaptation technique called SF-ECG and introduce a new domain adaptation task in ECG classification, namely source-free domain adaptation. Our framework is designed for interpatient adaptation, primarily arrhythmia classification, and consists of three modules: data adjustment, local clustering, and source-free adaptation. With a more balanced performance across categories, we achieve results comparable to contemporary state-of-the-art methods. To efficiently improve the deep learning model, we use four loss functions to reduce the performance-degrading distribution discrepancies across records. In the inference phase, our method requires neither additional data nor the source data, and it does not necessitate annotating new records. The proposed approach adapts to new data and has the potential to significantly enhance the interpatient performance of deep learning models. It also preserves data privacy, generalizes to unseen samples, and yields more stable results in minority categories where samples are deficient.