Article

Transformative Noise Reduction: Leveraging a Transformer-Based Deep Network for Medical Image Denoising

1 Department of AI and Robotics, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea
2 Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries, Sejong 30148, Republic of Korea
3 Division of Software Convergence, Sangmyung University, Seoul 03016, Republic of Korea
4 School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2024, 12(15), 2313; https://doi.org/10.3390/math12152313
Submission received: 2 July 2024 / Revised: 18 July 2024 / Accepted: 20 July 2024 / Published: 24 July 2024

Abstract: Medical image denoising has numerous real-world applications. Despite their widespread use, existing medical image denoising methods fail to address complex noise patterns and typically generate artifacts in numerous cases. This paper proposes a novel medical image denoising method that learns denoising using an end-to-end learning strategy. Furthermore, the proposed model introduces a novel deep–wider residual block to capture long-distance pixel dependencies for medical image denoising. Additionally, this study proposes leveraging multi-head attention-guided image reconstruction to effectively denoise medical images. Experimental results illustrate that the proposed method outperforms state-of-the-art models in both qualitative and quantitative evaluations across numerous medical image modalities, with a cumulative PSNR gain of 8.79 dB over its counterparts. The proposed method can also denoise noisy real-world medical images and improve the performance of clinical applications such as abnormality detection.

1. Introduction

Noise is widespread in medical images because of the characteristics of medical imaging, including the image acquisition process and respiratory movement. Such arbitrary modifications to the acquired images can significantly degrade the perceptual quality by incorporating numerous artifacts and obscuring salient details. Consequently, the performance of image analysis algorithms, such as segmentation, registration, and classification, is affected. Additionally, these degraded images directly affect the decision-making processes of medical practitioners. Despite its numerous real-world implications, medical image denoising (MID) is challenging, as it necessitates the preservation of crucial diagnostic information while effectively reducing noise [1,2,3].
MID, which is a challenging topic, has been widely investigated by the vision community. Initially, classical image processing techniques such as non-local self-similarity [4], sparse coding [5], and filter-based approaches [6,7,8] were employed for MID. However, the current state-of-the-art denoising methods involve deep learning using two learning strategies: learning denoising as image-to-image translation and learning residual noise from noisy images. Although these learning-based denoising methods have shown promising results compared with classical approaches, their performance remains limited, and they fail in extreme cases (e.g., Gaussian noise at σ = 50), as shown in Figure 1.
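To make the distinction concrete, the following minimal PyTorch sketch (illustrative only, not code from any of the cited works) contrasts the two learning strategies for an arbitrary denoising network:

```python
import torch
import torch.nn as nn

def image_to_image_denoise(net: nn.Module, noisy: torch.Tensor) -> torch.Tensor:
    # Strategy 1: the network directly maps the noisy image to a clean image.
    return net(noisy)

def residual_denoise(net: nn.Module, noisy: torch.Tensor) -> torch.Tensor:
    # Strategy 2: the network predicts the residual noise, which is then
    # subtracted from the noisy input (DnCNN-style residual learning).
    return noisy - net(noisy)
```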
The current image-to-image denoising methods can effectively remove noise from images. However, these methods typically result in oversmoothing in cases involving complex spatial distributions. Consequently, essential features and details are not preserved, which is a critical issue in medical imaging. Meanwhile, residual denoising strategies are exempt from smooth spatial representations. However, these strategies may generate visually disturbing artifacts with desaturated complex structures. In general, both existing MID methods fail to reconstruct detailed, natural-appearing images similar to the reference ground truths. This underscores the urgency and significance of our study for accurately preserving the details of medical images.
Despite the severe limitations of existing denoising methods, MID presents numerous real-world implications. The widespread applicability and importance of MID motivated us to develop an effective method for managing the diverse and high levels of imaging noise present in numerous medical imaging modalities. Additionally, the recent success of transformer [14,15] models inspired us to investigate transformer-based attention in MID to shift the paradigm of medical image denoising research.
This study proposes a novel deep method to effectively learn MID and address its limitations, thus steering MID research in a new direction. The proposed method introduces a novel deep and wide residual (DWR) block to learn the underlying noise in medical images. Despite being deep in architecture, the proposed block leverages a dilated convolution operation to capture long-distance intrapixel dependencies while rendering the block more efficient. The proposed DWR block addresses smoothing artifacts, which are widespread in existing image-to-image translation-based denoising methods. Additionally, the proposed method leverages a multihead attention (MHA) block [14] to reconstruct denoised images in the latter half of the proposed network. The proposed MHA addresses the limitations of residual denoising approaches by reconstructing plausible, high-quality images without generating visual artifacts. The proposed method was extensively investigated and compared with existing MID methods on different medical imaging modalities. The practicability of the proposed method was evaluated using noisy real-world medical images. The contributions of the current study are as follows:
  • A novel transformer-attention-based deep architecture is proposed that can address the limitations of existing MID methods.
  • A novel DWR module is proposed to learn long-distance pixel dependencies in order to perform MID efficiently. Additionally, this study proposes to leverage MHA in the decoder to mitigate artifacts from denoised images.
  • Extensive experiments conducted on numerous medical modalities show that the proposed method substantially outperforms existing MID methods in both qualitative and quantitative comparisons.
  • The effectiveness of the proposed method is investigated based on real-world noisy medical images, and its practicability is analyzed for real-world usage.
The remainder of this paper is organized as follows: Section 2 reviews the related studies, Section 3 details the data simulation and learning strategy, Section 4 presents an analysis of the experimental results, and Section 5 concludes this paper.

2. Related Studies

MID is considered one of the most challenging enhancement tasks in medical imaging. Hence, numerous novel approaches for addressing MID have been introduced in recent years. Among them, learning-based methods are superior to their classical counterparts. This section briefly reviews the learning-based approaches.

2.1. Image-to-Image Translation

Deep learning is widely used in MID [3,16,17]. The most recent studies have considered denoising to be an image-to-image translation task. Gondara et al. proposed a convolutional autoencoder (CAE [9]), and Walid et al. proposed a denoising autoencoder (DAE [13]) to learn additive denoising. Chen et al. and Fan et al. proposed a residual encoder–decoder convolutional neural network [18] and a quadratic autoencoder [19], respectively, to denoise low-dose computed tomography (CT) images. Hyun et al. used U-Net denoising and k-space correction simultaneously to denoise magnetic resonance images [20]. Similarly, Kidoh et al. designed a shrinkage convolutional neural network (SCNN) and a deep-learning-based reconstruction (dDLR) network to denoise brain magnetic resonance images [21]. Rawat et al. proposed a feature-guided denoising convolutional neural network for learning additive noise reduction [22].
A few recent studies have leveraged adversarial training strategies. For example, Ghahremani et al. [23] comprehensively investigated MID using a U-Net and adversarial guidance. Zhou et al. proposed a unified motion correction and denoising adversarial network [24]. Li et al. utilized a conditional generative adversarial network to reduce random noise in CT images [25]. Similarly, Jianning et al. [26] proposed a multilevel discriminator to denoise CT images. Notably, such image-to-image translation approaches typically yield smoother outputs with less-detailed edges and textures compared with their conventional counterparts.

2.2. Residual-Noise Estimation

Recent studies have addressed the challenge of denoising noisy medical images by learning the underlying noise patterns instead of relying solely on image-to-image translation techniques. Jiang et al. [11] employed a denoising convolutional neural network (DnCNN) [27] that was specifically designed for magnetic resonance image denoising. Their main aim was to enhance image quality by effectively removing noise while preserving important image features. Based on this study, Jifara et al. and Walid et al. [3,10] further improved the performance of a DnCNN by modifying the network architecture such that it can manage the complex noise patterns inherent in medical images more effectively. Similarly, Kokil et al. [28] proposed the use of a residual learning network to address speckle noise in medical images. Their method effectively mitigated noise artifacts by leveraging residual connections while maintaining the image details. More recently, Sharif et al. [12] introduced a dynamic residual attention network (DRAN) designed to learn residual noise patterns from multimodal medical images. This approach adapts attention mechanisms to focus on the relevant image regions, thereby enabling accurate noise removal across different imaging modalities.
Although these residual denoising methods are promising for generating sharper and cleaner images, their potential limitations must be acknowledged, particularly in extreme cases where they may inadvertently introduce visual artifacts. Thus, studies are being actively conducted to further refine these algorithms so that noise reduction can be achieved simultaneously with the preservation of crucial image features in medical imaging applications.
The proposed method further extends the possibility of a generic denoising method that can perform multipattern denoising on multiple imaging modalities. Notably, a generic denoising method can offer many advantages, such as the sharing of domain knowledge among numerous imaging modalities. Table 1 shows a comparison between existing methods and the proposed method.

3. Method

This section describes the process of preparing the data for learning MID. Additionally, insights into the proposed novel deep model and its components are presented.

3.1. Data Preparation

The preparation of large-scale data samples to learn MID is challenging. Only a few real-world data samples are available for open MID research. Therefore, in this study, we obtained large-scale MID data samples and simulated Gaussian noise to learn generic MID.

3.1.1. Data Acquisition

One of the main motivations of this study was to generalize and illustrate the practicability of deep denoising in diverse medical imaging modalities. Therefore, we investigated the following modalities to learn MID efficiently:
  • X-ray imaging is widely used for diagnosing bone fractures, joint problems, lung conditions, dental issues, etc. This study leverages the well-known Chexpert [29] dataset to represent X-ray images.
  • Magnetic resonance imaging (MRI) is an effective medical imaging technique that uses magnetic fields and radio waves to generate detailed images of the body’s internal structures. It is crucial for diagnosing various conditions from brain tumors to joint injuries. This study leverages the dataset presented in [30] to learn MID for MRI.
  • CT is a diagnostic imaging method that uses X-rays to create cross-sectional images of the body, thus providing detailed views of internal structures and aiding in the detection and diagnosis of various medical conditions such as fractures, tumors, and internal bleeding. The scan dataset presented in [31] was used to learn MID in CT images.
  • Microscopy provides high-resolution images that reveal the intricate details of minute biological structures, cells, tissues, and microorganisms, and it is essential for advancing our understanding of biology, medicine, and various scientific disciplines. Furthermore, microscopic images typically contain Gaussian noise, which exhibits various pixel intensities, thus complicating accurate analyses and interpretations in fields such as biology and materials science. Thus, protein atlas scans [32] were used to investigate MID.
We obtained 20,000 random images for training and 1000 images for validation while learning MID. An additional 4000 samples from the obtained data (1000 images from each modality) were used for an extensive evaluation with various noise factors. Figure 2 shows representative images of the imaging modalities used in this study. We simulated Gaussian noise in these samples to learn MID and to analyze the performance of the deep networks. Notably, a fixed noise deviation was used in the testing phase to enable an unbiased comparison among the deep models, whereas noise was generated randomly during training to diversify the data and avoid overfitting. The proposed study leveraged only imaging modalities that are commonly used for diagnosis and that incorporate Gaussian noise.
In addition to simulating noise for extensive evaluation, we incorporated noisy medical images to illustrate its practicability in actual applications.

3.1.2. Noise Simulation

Noise in medical images is typically considered an additive factor and can be represented as
$n_s \sim \mathcal{N}(I_R \mid \mu, \sigma^2) \quad (1)$
Here, $\mu$ and $\sigma^2$ denote the mean and variance of the Gaussian distribution ($\mathcal{N}$), respectively.
Considering this basic principle, we added Gaussian noise to a clean image $I_R$. To learn MID efficiently, we simulated noise in the acquired data samples. Therefore, reference noisy image pairs must be formulated by contaminating them with artificial noise. In this study, a uniform noisy image $I_N$ was generated from a noise-free image $I_R$ as follows:
$I_N = I_R + n_s \quad (2)$
The illustration presented in Figure 3 shows an example of a noisy–clean image pair alongside the corresponding generated noise. Notably, the noise simulation process incorporates a crucial element: the random standard deviation of the noise distribution. We tuned $n_s$ such that the standard deviation varied randomly from 0 to 75. This wide range of noise deviations allowed us to extensively evaluate the capability of deep models for a diverse range of noise patterns and levels.
This method was deliberately designed to introduce variability in the intensity of the generated noise. It aims to mimic the diverse spectra of noise encountered in real-world scenarios. This variability is essential for creating a realistic representation of noise that reflects the nuances observed in practical settings.
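As a concrete illustration of Equations (1) and (2), the following NumPy sketch generates a noisy–clean training pair; the clipping to an assumed 8-bit pixel range is our addition, since the text does not specify it:

```python
import numpy as np

def simulate_noisy_pair(clean: np.ndarray, max_sigma: float = 75.0, rng=None):
    """Create a noisy-clean pair: I_N = I_R + n_s, with n_s ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.uniform(0.0, max_sigma)          # random noise level per sample
    noise = rng.normal(0.0, sigma, clean.shape)  # Gaussian noise n_s
    noisy = np.clip(clean + noise, 0.0, 255.0)   # assumed 8-bit pixel range
    return noisy, clean
```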

3.2. Learning from Data

The proposed method introduces a novel deep architecture to learn MID effectively. The proposed network aims to learn MID as $\mathcal{M}: I_N \rightarrow I_C$. Here, the mapping function ($\mathcal{M}$) learns to generate a clean medical image ($I_C$) from a noisy input ($I_N$), where $I_C \in [0, 1]^{H \times W \times 3}$. Meanwhile, $H$ and $W$ represent the height and width, respectively, of the input and output images.

3.2.1. Network Architecture

As shown in Figure 4, the proposed network regards MID as an image-to-image translation task. The proposed deep network is designed to leverage the advantages of a feature pyramid structure [33] with a DWR module and the MHA to obtain plausible images. The proposed DWR module allows the method to perceive long-distance pixel correlations to understand the spatial relations between neighboring pixels. Additionally, the proposed MHA enables the proposed method to exploit learned long-distance pixel dependencies while performing image reconstruction through decoding. Meanwhile, the features learned at different feature levels are propagated using a contextual gating mechanism to leverage spatial awareness and reduce the underlying noise of the encoder blocks. Notably, the early layer of the denoising networks encodes raw noise and salient features. Therefore, propagating such features to decode a clean image can result in noisy images, despite efficient feature encoding. This study leverages a feature gate to refine the encoded features and address this limitation.
Additionally, the proposed deep network presents a fully convolutional encoder–decoder architecture [34,35] that features convolutional skip connections. The initial layer of the network transforms the input image ($I_N$) into a 64-depth feature map. This input convolutional layer employs a kernel size of $3 \times 3$, padding of 1, and a stride of 1. Meanwhile, the encoder comprises four consecutive feature levels with alternating feature depths of $d = 64, 96, 128, 160$. Following each DWR block in the encoder, a convolutional downsampling layer is applied as follows:
$F = C_{\downarrow}(X) \quad (3)$
Here, $C_{\downarrow}$ represents a $3 \times 3$ convolution operation with a stride of 2.
In addition to the encoder, the proposed architecture includes a decoder that efficiently reconstructs noise-free images. The proposed decoder leverages a DWR block, followed by an MHA block and an upsampling block. The decoder section of the network mirrors the encoder in terms of the number of feature levels, with an upsampling layer following each residual block. The upsampling operation is implemented as follows:
$F = C_{\uparrow}(X) \quad (4)$
Here, $C_{\uparrow}$ denotes a transpose convolution operation [36], which renders the model fully convolutional and results in effective restoration.
When traversing the feature levels, the decoder has the same dimensions as the encoder. Additionally, to propagate features between blocks of the same dimensions for efficient denoising, we leverage a convolutional gate to refine the features while propagating them for reconstruction. The convolutional gating mechanism is perceived as follows:
$F_G = C_{1 \times 1}(X) \quad (5)$
Here, $C_{1 \times 1}$ represents a point-wise convolutional operation with a kernel size of $1 \times 1$. Finally, the decoder culminates in a final convolutional layer, which yields a three-channel enhanced image based on a convolutional kernel size of $3 \times 3$, padding of 1, and a stride of 1. This final output layer is activated using a tanh function to obtain the final images within the $[0, 1]$ range.
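The following PyTorch sketch outlines the encoder–decoder skeleton described above, with the DWR and MHA blocks left as placeholders (sketches of both follow in Sections 3.2.2 and 3.2.3). Only the layer configurations stated in the text (feature depths, kernel sizes, strides) are taken from the paper; the exact wiring is an assumption:

```python
import torch.nn as nn

depths = [64, 96, 128, 160]                      # encoder feature levels

stem = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)

# Equation (3): 3x3 strided convolutions for downsampling between levels.
down = nn.ModuleList(
    nn.Conv2d(depths[i], depths[i + 1], 3, stride=2, padding=1)
    for i in range(3))

# Equation (4): transpose convolutions for upsampling in the decoder.
up = nn.ModuleList(
    nn.ConvTranspose2d(depths[i + 1], depths[i], 3, stride=2,
                       padding=1, output_padding=1)
    for i in reversed(range(3)))

# Equation (5): point-wise convolutional gates on the skip connections.
gates = nn.ModuleList(nn.Conv2d(d, d, kernel_size=1) for d in depths[:-1])

# Final 3x3 convolution with tanh activation produces the output image.
head = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
```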

3.2.2. DWR Module

Residual blocks [37] have been proven to be efficient for learning image denoising. Typically, the residual block learns the input feature X using the following equation:
$R = D(X) + X \quad (6)$
Here, $D(\cdot)$ represents vanilla residual blocks with consecutive convolutional operations, which present a few notable limitations. For example, they cannot extract salient features using a shallow architecture. Therefore, recent denoising studies using residual blocks have stacked consecutive blocks to create a deeper architecture that learns denoising effectively. However, we discovered that such an approach renders the convolutional architecture computationally expensive. Despite their high complexity, conventional residual blocks cannot capture long-distance pixel dependencies because of their narrow receptive fields. Hence, we propose a novel residual block to address these problems. Figure 5 presents an overview of and a comparison between the proposed DWR and conventional residual blocks.
As shown in Figure 5, the proposed DWR module replaces the convolutional operation of Figure 5b with two consecutive dilated convolutions. Here, each dilated convolution leverages a dilation of size 4. This significant dilation enables the proposed DWR module to encompass a wider receptive area and capture long-distance pixel-wise dependencies. Additionally, consecutive dilated convolutions allow the proposed network to traverse deeper without stacking consecutive residual blocks. Apart from capturing long-distance dependencies, such an architecture avoids the gradient-diminishing problems inherent in consecutive residual blocks. The proposed DWR module thus allows the deep architecture to traverse deeper with a wider receptive field without exponentially increasing the computational complexity. Based on these architectural modifications, Equation (6) is extended as follows:
$R = D_w(X) + X \quad (7)$
Here, $D_w$ incorporates a $1 \times 1$ convolution followed by two convolutions with a kernel size of $3 \times 3$, a stride of 1, padding of 4, and a dilation of 4. The final layer of the proposed DWR module is a point-wise convolution. We used point-wise convolutions to reduce the computational complexity while introducing adaptive channel interactions, rendering the architecture more efficient.
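A minimal PyTorch sketch of the DWR block following the description above (Equation (7)); the preserved channel width and the ReLU activations are our assumptions:

```python
import torch
import torch.nn as nn

class DWRBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),    # entry 1x1 conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3,     # dilated conv 1
                      stride=1, padding=4, dilation=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3,     # dilated conv 2
                      stride=1, padding=4, dilation=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),    # point-wise output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x                              # R = D_w(X) + X
```

With dilation 4 and padding 4, each 3x3 convolution covers a 9x9 receptive area while preserving the spatial resolution, which is how the block widens its receptive field without additional depth.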

3.2.3. MHA

MHA is a pivotal mechanism in artificial intelligence that is prominently utilized across diverse domains such as natural language processing and computer vision [14,15]. It empowers models to focus concurrently on multiple segments of the input sequence, thereby facilitating the capture of intricate dependencies and correlations within the tensor. At its core, the MHA block processes input embeddings through query, key, and value matrices and computes attention scores to ascertain the relevance of each element to the others in the sequence. Through a multistep process involving attention score computation, softmax normalization, and weighted value aggregation, the MHA enables the model to simultaneously attend to various aspects of the input, thus enhancing its ability to discern complex patterns and nuances within the specified tensor.
Considering the widespread success of the MHA, we suggested its incorporation into the proposed network architecture to efficiently process long-range dependencies and capture complex patterns extracted by the proposed DWR module. Figure 6 provides an overview of the MHA block. We conceptualize the MHA as follows:
$\mathrm{MHA}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O \quad (8)$
$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \quad (9)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \quad (10)$
In our approach, we employ three crucial matrices, $Q$, $K$, and $V$, which represent the query, key, and value matrices, respectively. These matrices contribute significantly to capturing various aspects of the input data. Each attention head, denoted by $\mathrm{head}_i$, yields an output, thus enabling the model to focus on different input components simultaneously. To derive the final output, the outputs from all the attention heads are concatenated along the feature dimension and then multiplied by the output weight matrix $W^O$. Moreover, each attention head possesses its own set of learnable weight matrices, i.e., $W_i^Q$, $W_i^K$, and $W_i^V$, enabling the model to learn distinct representations for different attention heads.
The number of attention heads, denoted by $h$, and the dimensionality of the key vectors, $d_k$, are hyperparameters that affect the capacity of the model to capture intricate dependencies within the data. Finally, the softmax activation function is used to compute the attention scores, thereby facilitating the weighted aggregation of values based on their relevance to the queries.
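The sketch below shows one way to apply MHA (Equations (8)–(10)) to a convolutional feature map: spatial positions are flattened into a token sequence, self-attention is applied (so $Q$, $K$, and $V$ are all derived from the same features), and the result is reshaped back. The head count here is an assumption:

```python
import torch
import torch.nn as nn

class SpatialMHA(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        out, _ = self.attn(tokens, tokens, tokens)   # self-attention: Q = K = V
        return out.transpose(1, 2).reshape(b, c, h, w)
```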

3.2.4. Learning Objective

We applied a pixel-wise reconstruction loss to steer our deep model through a coarse-to-refined reconstruction process. The L1 or L2 distance typically serves as a pixel-wise loss function. Although both options are commonly used, the L2 loss is directly related to the peak signal-to-noise ratio (PSNR) and typically yields smoother images [38,39]. Because our noisy inputs contain significant sensor noise and we wish to avoid oversmoothing, we selected the L1 objective function as the reconstruction loss.
The reconstruction loss can be represented as follows:
$\mathcal{L}_D = \lVert I_R - I_C \rVert_1 \quad (11)$
Here, $I_C$ represents the output obtained via $\mathcal{M}$, and $I_R$ denotes the reference clean image. This loss function quantifies the absolute differences between corresponding pixels in the output and reference images, thereby allowing the network to minimize these differences during training.
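In PyTorch, this objective corresponds to the built-in L1 loss:

```python
import torch.nn as nn

criterion = nn.L1Loss()                 # Equation (11)
# loss = criterion(model(noisy), reference)
```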

3.3. Learning Details

The proposed deep network for effective MID was implemented with the PyTorch framework [40]. We optimized our method in the learning phase with an Adam optimizer [41], with hyperparameters $\beta_1 = 0.9$ and $\beta_2 = 0.99$. Initially, we set the learning rate of the model to $\eta_i = 1 \times 10^{-4}$. The proposed method was trained for 50,000 steps with a batch size of 24 on the synthesized data (for extensive comparison). We adjusted the learning rate with the ReduceLROnPlateau scheduler [40], reducing $\eta_i$ by a factor of 0.1. For the training phase, we utilized image patches with dimensions of $128 \times 128 \times 3$. All experiments were executed on low-end hardware featuring an AMD Ryzen 3200G central processing unit (CPU) operating at 3.6 GHz, complemented by 32 GB of random-access memory and an NVIDIA GeForce RTX 3060 (12 GB) graphical processing unit (GPU).
In addition to tuning the hyperparameters and setting up the hardware, we leveraged a sophisticated training strategy to ensure the proposed model's convergence with noisy data. As Algorithm 1 shows, the proposed method was trained by generating random noise with a standard deviation between 0 and 75 for each mini-batch from the training set $D_{train}$. Such random noise generation helps the proposed method avoid overfitting while learning denoising. The training process iterated over 50,000 training steps, adjusting the learning rate every 2500 steps to facilitate convergence. Additionally, the objective loss was computed for each mini-batch, and the model weights were updated using the Adam optimizer. Every 5000 steps, the model weights were saved as a checkpoint.
During training, we monitored the convergence of the proposed method by tracking the training loss and the PSNR score at each step. Figure 7 illustrates the training process of the proposed method. It can be seen that the proposed method learned to address the noise more precisely with each training step. Besides minimizing the objective loss, the proposed network improved its PSNR performance, which confirmed its convergence on the given MID data. In our experiments, we found that the proposed method had converged by 50,000 steps, and training beyond that point did not drastically improve its performance. Training the proposed method took less than 24 h on our hardware.
Algorithm 1 Training algorithm of the proposed method
  • Input: Training set $D_{train}$, validation set $D_{val}$
  • Output: Trained deep model $\mathcal{M}$
  • Initialize model $\mathcal{M}$ with random weights
  • Initialize learning rate $\eta_0$, batch size $B_0$, number of steps $N_{steps}$, learning rate decay factor $\alpha$
  • Set $\sigma_{max} = 75$ (maximum standard deviation of the Gaussian noise)
  • for $i = 1$ to $N_{steps}$ do
  •     if $i \bmod 2500 = 0$ then
  •         Update learning rate: $\eta_i = \alpha \cdot \eta_{i-1}$
  •     Sample a mini-batch $B_{train}$ from $D_{train}$ with augmentation
  •     $RN \sim \mathrm{uniform}(0, \sigma_{max})$
  •     $B_{train} \leftarrow \mathrm{noise}(B_{train}, RN)$
  •     Compute loss $\mathcal{L}$ on $B_{train}$ and update the weights of $\mathcal{M}$ with the Adam optimizer
  •     if $i \bmod 5000 = 0$ then
  •         Save the current weights of $\mathcal{M}$ as a checkpoint
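A hedged PyTorch sketch of Algorithm 1 is given below; `model`, `train_loader`, and `add_gaussian_noise` stand in for the components described above, and stepping the plateau scheduler on the running loss every 2500 steps is our reading of the ReduceLROnPlateau usage:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
scheduler = ReduceLROnPlateau(optimizer, factor=0.1)
criterion = torch.nn.L1Loss()

step = 0
while step < 50_000:
    for clean in train_loader:                   # 128x128x3 patches, batch 24
        step += 1
        sigma = torch.empty(1).uniform_(0, 75)   # random noise level per batch
        noisy = add_gaussian_noise(clean, sigma) # I_N = I_R + n_s
        loss = criterion(model(noisy), clean)    # Equation (11)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 2500 == 0:
            scheduler.step(loss.item())          # decay LR if loss plateaus
        if step % 5000 == 0:
            torch.save(model.state_dict(), f"checkpoint_{step}.pt")
        if step >= 50_000:
            break
```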

4. Experiments

The proposed method was evaluated and compared with existing MID methods to determine its practicability for diverse medical imaging modalities. The performance of the proposed method was qualitatively and quantitatively evaluated using noisy real-world medical images. Furthermore, we evaluated the practicability of the proposed components and analyzed the inference performance of the proposed method via sophisticated experiments.

4.1. Comparison with State-of-the-Art Methods

This section presents a comparison of existing MID methods with the proposed deep method. The proposed method was evaluated using noisy real-world medical images, and its parameters were analyzed to demonstrate its practicability for real-world usage.

4.1.1. Comparison Setup

The MID methods were evaluated using four imaging modalities: MRI, X-ray imaging, microscopy, and CT. We incorporated Gaussian and speckle noise with distinct noise levels (i.e., 20, 25, 50, and 75) into each image sample and summarized the performance using the following evaluation metrics:
  • PSNR: This metric is commonly used in image denoising to measure the quality of denoised images against the reference clean image. Higher PSNR scores represent better visual quality of the generated images. Equation (12) presents the derivation of the PSNR.
    $\mathrm{PSNR}(I_G, I_C) = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(I_G, I_C)}\right) \quad (12)$
    where $\mathrm{MAX}$ denotes the maximum possible pixel value, and $\mathrm{MSE}(I_G, I_C)$ is the mean squared error between the ground truth image $I_G$ and the reconstructed image $I_C$, computed over all $H \times W$ pixels and image channels $c$.
  • SSIM: This is a widely used metric for image quality assessment. This study utilized the SSIM to compare the structural information of generated and ground truth images. A higher SSIM score represents better structural reconstruction. We calculated the SSIM score as follows:
    $\mathrm{SSIM}(I_G, I_C) = \frac{(2 \mu_{I_G} \mu_{I_C} + c_1)(2 \sigma_{I_G I_C} + c_2)}{(\mu_{I_G}^2 + \mu_{I_C}^2 + c_1)(\sigma_{I_G}^2 + \sigma_{I_C}^2 + c_2)} \quad (13)$
    where $I_G$ and $I_C$ represent the ground truth and denoised images, respectively; $\mu_{I_G}$ and $\mu_{I_C}$ are their mean values; $\sigma_{I_G}^2$ and $\sigma_{I_C}^2$ are their variances; and $\sigma_{I_G I_C}$ is the covariance of $I_G$ and $I_C$.
  • LPIPS: In addition to the standard quantitative metrics, we used a well-known perceptual metric, the learned perceptual image patch similarity (LPIPS), to summarize the performance of the deep models from a perceptual perspective. Specifically, we leveraged the LPIPS with AlexNet pretrained weights. The reference and denoised images were compared quantitatively by calculating the LPIPS as follows (a computation sketch for all three metrics is given after this list):
    $\mathcal{L}_{\mathrm{LPIPS}} = \lVert \phi(I_G) - \phi(I_C) \rVert_1 \quad (14)$
    where $\phi(\cdot)$ denotes the pretrained AlexNet feature extractor.
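The three metrics can be computed with standard open-source implementations, as in the sketch below; `gt` and `pred` are placeholder HxWx3 uint8 arrays, and the `lpips` package (with its AlexNet backbone) expects inputs scaled to [-1, 1]:

```python
import torch
import lpips                                   # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)           # Eq. (12)
ssim = structural_similarity(gt, pred, channel_axis=-1,
                             data_range=255)                        # Eq. (13)

lpips_model = lpips.LPIPS(net='alex')          # pretrained AlexNet features

def to_tensor(a):
    # HxWx3 uint8 -> 1x3xHxW float in [-1, 1]
    t = torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0).float()
    return t / 127.5 - 1.0

perceptual = lpips_model(to_tensor(gt), to_tensor(pred)).item()     # Eq. (14)
```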
We compared the performance of state-of-the-art residual denoising methods (i.e., ResCNN [10], DnCNN [11], MMD [3], MID-DRAN [12]) and image-to-image translation medical image denoising methods (i.e., CAE [9], DAE [13]). It is worth noting that most existing medical image denoising methods are not publicly available. Therefore, we implemented the existing denoising methods based on their available implementation information. We first trained all these methods using their suggested hyperparameters and datasets to cross-check our implementations, and we included only those methods whose reported results we could reproduce, ensuring a fair comparison. We then retrained all these methods on the same data samples with their suggested hyperparameters. We tested the performance of these methods and compared them with the proposed network for numerous medical image modalities and noise levels, summarizing the results using the standard metrics. In addition to the quantitative comparison, we compared the MID methods visually, so that the strengths and weaknesses of each method can be assessed through visual observation.

4.1.2. Quantitative Evaluation

Table 2 presents a quantitative comparison between the state-of-the-art MID models and the proposed method. As shown, the proposed method substantially outperformed existing techniques across numerous medical imaging modalities. Notably, the proposed method demonstrated consistency at all noise levels. Beyond the conventional evaluation metrics, such as the PSNR and SSIM, the proposed method is superior in terms of the perceptual evaluation metric, achieving a higher fidelity score than existing methods. Compared with its counterpart MID models, it improved the PSNR and SSIM by 8.79 dB and 0.07, respectively, and reduced the LPIPS by 0.09. This significant improvement over existing methods across numerous modalities and noise levels confirms the practicability of the proposed method for generic cases.
In contrast to the proposed method, existing MID models are inconsistent when confronted with diverse noise levels. Furthermore, their performance can vary depending on the imaging modality. For instance, DRAN demonstrates promising performance at low noise levels (i.e., σ = 10 , 25 ). It dominates the existing MID models for such noise levels. However, AED outperformed DRAN at extreme noise levels (i.e., σ = 50 , 75 ). These experimental results further confirm the limitations of existing MID methods under diverse noise patterns. By contrast, our proposed method can manage realistic diverse noise patterns, thus outperforming its counterparts.

4.1.3. Qualitative Evaluation

We extensively evaluated the MID models to quantify their strengths and weaknesses for numerous medical imaging modalities. Figure 8 shows a subjective comparison among the MID models. As shown, existing image-to-image translation models tend to yield blurry images in Gaussian denoising. By contrast, the residual models demonstrate color distortion. In general, the existing models yield implausible images with visually disturbing artifacts. The proposed method addresses both limitations via an effective denoising network. Notably, the proposed DWR module and MHA-guided reconstruction enable the proposed model to yield sharp, clear, and plausible medical images. The proposed model is superior for all compared modalities. It can denoise medical images without generating visual artifacts, even under a high noise proportion (i.e., $\sigma = 50$). This qualitative comparison confirms the practicability of the proposed method for generic MID in real-world applications. Notably, the performance of the proposed method was consistent across numerous imaging modalities, indicating that it can be leveraged for any imaging modality, particularly those affected by noise patterns such as Gaussian noise.

4.2. Real-World MID

We evaluated our method using real-world noisy CT images [33,42,43] in addition to the synthesized noisy images. To this end, the model was tuned on noisy real-world medical images. The proposed method was retrained by leveraging transfer learning with low-dose sharp-kernel CT images. We used 15,824 images from the dataset presented in [42] to tune the proposed method for real-world denoising. Notably, the training samples comprised sharp- and soft-kernel images with 1 and 3 mm capture settings. Following previous studies, we regarded the full-dose images as the reference clean images and the quarter-dose images as the noisy inputs. Additionally, we used 500 samples from each kernel (sharp and soft) and their subcategories (1 and 3 mm) to perform quantitative and qualitative comparisons.
Table 3 presents a quantitative comparison between the images yielded by the proposed method and the input low-dose images. The proposed method effectively denoised the noisy images and significantly improved their quality across all subcategories. In particular, it improved the low-dose noisy images in terms of the PSNR and SSIM by 5.88 dB and 0.15, respectively, and reduced the LPIPS by 0.02. The proposed method not only performed denoising but also significantly improved the structural quality, as indicated by the metrics. Beyond these structural improvements, the proposed method can improve the perceptual quality of noisy real-world images. Its notable performance confirms its practical usage in widespread medical applications and diagnostic processes.
In addition to a quantitative comparison, we performed a visual comparison, as shown in Figure 9. The proposed method proved to be superior for real-world MID. In particular, it yielded clearer and more visually plausible images than the inputs for all subcategories. In complex spatial regions, it maintained the salient information. Additionally, it generated cleaner images and ensured perceptual quality. The performance demonstrated by the proposed method in real-world MID confirms its applicability beyond synthetic datasets. The proposed method can be leveraged in real-world applications, including computer-aided diagnosis (CAD) applications [12], to shift medical imaging to a new paradigm.

4.3. Real-World Application

Medical image denoising has many real-world implications. A sophisticated denoising method can substantially improve the diagnosis process for medical experts by enhancing medical images. In addition, the noise common in medical images can deteriorate the performance of computer-aided diagnosis systems such as segmentation and detection. To further confirm the utility of denoising in CAD applications, we applied our proposed method to improve real-world noisy medical images. We then evaluated a state-of-the-art detection method (i.e., YOLOv8 [44]) on red blood cells (RBCs), white blood cells (WBCs), and platelets in blood cell images. Table 4 illustrates the performance of YOLOv8 on the RBC and WBC blood cell detection dataset [45]. It can be seen that the proposed method significantly improves detection performance by reducing noise in the original images. Overall, the proposed method improves the mAP@50 of the YOLOv8 (small) model on the RBC and WBC blood cell detection dataset by 0.06.
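A hedged sketch of this detection experiment using the Ultralytics API is shown below; the dataset paths and the prior export of denoised images are placeholders for our setup:

```python
from ultralytics import YOLO

detector = YOLO("yolov8s.pt")                  # YOLOv8-small detector

# Evaluate detection on the original noisy images and on images denoised
# by the proposed method (both exported to image folders beforehand).
noisy_results = detector.predict("blood_cells/noisy/", conf=0.25)
denoised_results = detector.predict("blood_cells/denoised/", conf=0.25)
# mAP@50 for each set can then be computed with detector.val() against the
# dataset's ground truth annotations.
```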

4.4. Inference Analysis

The results presented in Table 5 provide a comprehensive overview of the computational efficiency and performance of the proposed denoising method, showcasing its superiority over existing denoising techniques while remaining highly computationally efficient. Featuring only 12.54 million trainable parameters, the proposed method balances model complexity and computational overhead, rendering it a promising solution for practical deployment.
With 12.54 million trainable parameters, the proposed method can process data efficiently without imposing an excessive computational burden. This efficiency translates into real-time performance, as evidenced by the mere 9.56 ms required to denoise an input image measuring $128 \times 128 \times 3$ pixels. Notably, the proposed method is fully convolutional and therefore does not require pre- or post-processing. Consequently, the inference time is expected to remain constant on similar hardware (such as ours), whereas network optimization techniques and more efficient hardware would allow the proposed network to operate faster and more efficiently for specific use cases.
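The reported per-image latency can be reproduced with a simple GPU timing loop such as the one below (assumptions: a CUDA device and a trained `model`); warm-up iterations and explicit synchronization are needed for a fair measurement:

```python
import time
import torch

model.eval().cuda()
x = torch.randn(1, 3, 128, 128, device="cuda")   # 128x128x3 input image

with torch.no_grad():
    for _ in range(10):                          # warm-up passes
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    ms = (time.perf_counter() - start) / 100 * 1000

print(f"average inference time: {ms:.2f} ms")
```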
These results underscore the feasibility of integrating the proposed method into real-world applications, where performance and computational efficiency are paramount. By offering a significant performance gain over existing denoising methods while maintaining computational frugality, the proposed approach is promising for various applications ranging from image processing in consumer electronics to medical imaging [12]. This balance between efficacy and efficiency renders the method a compelling option for addressing real-world denoising challenges.

4.5. Ablation Study

An ablation study was performed to evaluate the practicability of the novel component and the proposed MHA-guided reconstruction mechanism. Therefore, the proposed DWR block was replaced with a vanilla residual block, and the MHA was removed from the proposed architecture. Table 6 presents the performance of the proposed network with and without its novel components. The base model incorporates the vanilla residual block without MHA in the decoder. The DWR variants of the proposed network incorporate a DWR block into the encoder and decoder. As shown, the proposed DWR block significantly improved the performance of deep networks with residual blocks. Additionally, the proposed MHA block in the decoder further enhanced the performance of the proposed model. Notably, the contributions of the proposed modules and mechanisms were independent of the imaging modalities. Consequently, the practicability of these modules and mechanisms, independent of the imaging type, can improve MID performance for any imaging modality.
In addition to a quantitative evaluation, we visually compared the effects of the proposed components. Figure 10 shows the denoising comparison between the proposed network and its variants. As shown, the proposed DWR block facilitated the proposed method to mitigate imaging noise as compared with its vanilla counterpart. Additionally, the proposed MHA-guided reconstruction enabled the proposed method to leverage the salient features of the DWR module for reconstructing visually plausible images. In general, the proposed modules substantially improved the performance of the proposed network and addressed the limitations of conventional MID methods.

4.6. Discussion

This study reveals the limitations of existing MID models and proposes a novel DWR module and a transformer attention module to achieve effective multimodal MID. Additionally, it demonstrates that a transformer-based attention module combined with a vanilla CNN can be extremely effective for multimodal MID. Beyond its significant improvements over conventional deep models, the proposed model is computationally efficient: it comprises only 12.54 million trainable parameters and requires only 31 ms to denoise a medical image on mid-level hardware. Notably, the proposed model was used without any model optimization, such as quantization or pruning. Therefore, the inference speed of the proposed method can be substantially improved by leveraging optimization techniques.
In addition to learning and testing the proposed method on x64 architectures, it can be deployed on edge devices [46]. Such deployment and the leveraging of edge devices for MID can further advance MID research. An efficient MID method for edge platforms can significantly facilitate CAD applications [47]. Additionally, such an optimized network should allow edge vision developers to develop optimized and portable medical image enhancement devices [48]. The evaluation and optimization of the proposed method for edge devices would be an interesting research direction.
In addition to edge optimization, the proposed method focuses on the most common noise patterns in two-dimensional images. However, the proposed model can be adjusted to manage three-dimensional (3D) medical images, thus resulting in more sophisticated MID. Future studies are planned to apply the proposed method to 3D MID.

5. Conclusions

In this study, a novel MID method that leverages end-to-end deep learning to perform denoising in diverse medical imaging modalities was proposed. The proposed method incorporates an efficient residual block with dilation to capture long-distance pixel-wise dependencies and mitigate extreme noise from medical images. Additionally, it proposes utilizing MHA to leverage the salient features extracted by the proposed DWR module to obtain plausible images. The proposed method can denoise medical images without generating visual artifacts and can yield clean images that are similar to the reference images. The practicability of the proposed method was extensively evaluated using synthesized and noisy real-world medical images. The proposed method outperformed existing methods based on both quantitative and qualitative comparisons. Studies have been planned to investigate the practicability of the proposed method for 3D medical images and edge devices.

Author Contributions

Conceptualization, R.A.N. and A.H.; methodology, R.A.N., D.J. and S.-W.L.; validation, A.H. and H.S.K.; investigation, A.H. and H.S.K.; resources, D.J. and S.-W.L.; writing—original draft preparation, R.A.N., A.H. and S.-W.L.; writing—review and editing, H.S.K. and D.J.; supervision, D.J.; project administration, D.J. and S.-W.L.; funding acquisition, R.A.N. and H.S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a National Research Foundation (NRF) grant funded by the Ministry of Science and ICT (MSIT), Republic of Korea, through the Development Research Program (NRF2022R1G1A1010226 and NRF2021R1I1A2059735).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lee, G.; Fujita, H. Deep Learning in Medical Image Analysis: Challenges and Applications; Springer: Cham, Switzerland, 2020; Volume 1213. [Google Scholar]
  2. Kulathilake, K.S.H.; Abdullah, N.A.; Sabri, A.Q.M.; Bandara, A.R.; Lai, K.W. A review on self-adaptation approaches and techniques in medical image denoising algorithms. Multimed. Tools Appl. 2022, 81, 37591–37626. [Google Scholar] [CrossRef]
  3. El-Shafai, W.; Mahmoud, A.; Ali, A.; El-Rabaie, E.; Taha, T.; Zahran, O.; El-Fishawy, A.; Soliman, N.; Alhussan, A.; Abd El-Samie, F. Deep cnn model for multimodal medical image denoising. Comput. Mater. Contin. 2022, 73, 3795–3814. [Google Scholar] [CrossRef]
  4. Wang, J.; Guo, Y.; Ying, Y.; Liu, Y.; Peng, Q. Fast non-local algorithm for image denoising. In Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 1429–1432. [Google Scholar]
  5. Elad, M.; Aharon, M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 2006, 15, 3736–3745. [Google Scholar] [CrossRef] [PubMed]
  6. Arif, A.S.; Mansor, S.; Logeswaran, R. Combined bilateral and anisotropic-diffusion filters for medical image de-noising. In Proceedings of the 2011 IEEE Student Conference on Research and Development, Cyberjaya, Malaysia, 19–20 December 2011; IEEE: New York, NY, USA, 2011; pp. 420–424. [Google Scholar]
  7. Bhonsle, D.; Chandra, V.; Sinha, G. Medical image denoising using bilateral filter. Int. J. Image Graph. Signal Process. 2012, 4, 36. [Google Scholar] [CrossRef]
  8. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef]
  9. Gondara, L. Medical image denoising using convolutional denoising autoencoders. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 241–246. [Google Scholar]
  10. Jifara, W.; Jiang, F.; Rho, S.; Cheng, M.; Liu, S. Medical image denoising using convolutional neural network: A residual learning approach. J. Supercomput. 2019, 75, 704–718. [Google Scholar] [CrossRef]
  11. Jiang, D.; Dou, W.; Vosters, L.; Xu, X.; Sun, Y.; Tan, T. Denoising of 3D magnetic resonance images with multi-channel residual learning of convolutional neural network. Jpn. J. Radiol. 2018, 36, 566–574. [Google Scholar] [CrossRef] [PubMed]
  12. Sharif, S.; Naqvi, R.A.; Biswas, M. Learning medical image denoising with deep dynamic residual attention network. Mathematics 2020, 8, 2192. [Google Scholar] [CrossRef]
  13. El-Shafai, W.; El-Nabi, S.A.; El-Rabaie, E.S.M.; Ali, A.M.; Soliman, N.F.; Algarni, A.D.; El-Samie, A.; Fathi, E. Efficient Deep-Learning-Based Autoencoder Denoising Approach for Medical Image Diagnosis. Comput. Mater. Contin. 2022, 70, 6107–6125. [Google Scholar] [CrossRef]
  14. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693. [Google Scholar]
  15. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
  16. Suganyadevi, S.; Seethalakshmi, V.; Balasamy, K. A review on deep learning in medical image analysis. Int. J. Multimed. Inf. Retr. 2022, 11, 19–38. [Google Scholar] [CrossRef]
  17. Patil, R.; Bhosale, S. Medical image denoising techniques: A review. Int. J. Eng. Sci. Technol. (IJonEST) 2022, 4, 21–33. [Google Scholar] [CrossRef]
  18. Chen, H.; Zhang, Y.; Kalra, M.K.; Lin, F.; Chen, Y.; Liao, P.; Zhou, J.; Wang, G. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans. Med. Imaging 2017, 36, 2524–2535. [Google Scholar] [CrossRef]
  19. Fan, F.; Shan, H.; Kalra, M.K.; Singh, R.; Qian, G.; Getzin, M.; Teng, Y.; Hahn, J.; Wang, G. Quadratic autoencoder (Q-AE) for low-dose CT denoising. IEEE Trans. Med. Imaging 2019, 39, 2035–2050. [Google Scholar] [CrossRef] [PubMed]
  20. Hyun, C.M.; Kim, H.P.; Lee, S.M.; Lee, S.; Seo, J.K. Deep learning for undersampled MRI reconstruction. Phys. Med. Biol. 2018, 63, 135007. [Google Scholar] [CrossRef]
  21. Kidoh, M.; Shinoda, K.; Kitajima, M.; Isogawa, K.; Nambu, M.; Uetani, H.; Morita, K.; Nakaura, T.; Tateishi, M.; Yamashita, Y.; et al. Deep learning based noise reduction for brain MR imaging: Tests on phantoms and healthy volunteers. Magn. Reson. Med. Sci. 2020, 19, 195. [Google Scholar] [CrossRef]
  22. Rawat, S.; Rana, K.; Kumar, V. A novel complex-valued convolutional neural network for medical image denoising. Biomed. Signal Process. Control 2021, 69, 102859. [Google Scholar] [CrossRef]
  23. Ghahremani, M.; Khateri, M.; Sierra, A.; Tohka, J. Adversarial distortion learning for medical image denoising. arXiv 2022, arXiv:2204.14100. [Google Scholar]
  24. Zhou, B.; Tsai, Y.J.; Chen, X.; Duncan, J.S.; Liu, C. MDPET: A unified motion correction and denoising adversarial network for low-dose gated PET. IEEE Trans. Med. Imaging 2021, 40, 3154–3164. [Google Scholar] [CrossRef] [PubMed]
  25. Li, Y.; Zhang, K.; Shi, W.; Miao, Y.; Jiang, Z. A Novel Medical Image Denoising Method Based on Conditional Generative Adversarial Network. Comput. Math. Methods Med. 2021, 2021, 9974017. [Google Scholar] [CrossRef] [PubMed]
  26. Chi, J.; Wu, C.; Yu, X.; Ji, P.; Chu, H. Single low-dose CT image denoising using a generative adversarial network with modified U-Net generator and multi-level discriminator. IEEE Access 2020, 8, 133470–133487. [Google Scholar] [CrossRef]
  27. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  28. Kokil, P.; Sudharson, S. Despeckling of clinical ultrasound images using deep residual learning. Comput. Methods Programs Biomed. 2020, 194, 105477. [Google Scholar] [CrossRef]
  29. Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 590–597. [Google Scholar]
  30. Buda, M.; Saha, A.; Mazurowski, M.A. Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm. Comput. Biol. Med. 2019, 109, 218–225. [Google Scholar] [CrossRef]
  31. Yang, X.; He, X.; Zhao, J.; Zhang, Y.; Zhang, S.; Xie, P. Covid-ct-dataset: A ct scan dataset about covid-19. arXiv 2020, arXiv:2003.13865. [Google Scholar]
  32. Uhlen, M.; Oksvold, P.; Fagerberg, L.; Lundberg, E.; Jonasson, K.; Forsberg, M.; Zwahlen, M.; Kampf, C.; Wester, K.; Hober, S.; et al. Towards a knowledge-based human protein atlas. Nat. Biotechnol. 2010, 28, 1248–1250. [Google Scholar] [CrossRef] [PubMed]
  33. Sun, H.; Peng, L.; Zhang, H.; He, Y.; Cao, S.; Lu, L. Dynamic PET image denoising using deep image prior combined with regularization by denoising. IEEE Access 2021, 9, 52378–52392. [Google Scholar] [CrossRef]
  34. Gao, F.; Wu, T.; Chu, X.; Yoon, H.; Xu, Y.; Patel, B. Deep Residual Inception Encoder–Decoder Network for Medical Imaging Synthesis. IEEE J. Biomed. Health Inform. 2019, 24, 39–49. [Google Scholar] [CrossRef] [PubMed]
  35. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  36. Sharif, S.; Naqvi, R.A.; Ali, F.; Biswas, M. DarkDeblur: Learning single-shot image deblurring in low-light condition. Expert Syst. Appl. 2023, 222, 119739. [Google Scholar] [CrossRef]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  38. Kınlı, F.; Menteş, S.; Özcan, B.; Kıraç, F.; Timofte, R.; Zuo, Y.; Wang, Z.; Zhang, X.; Zhu, Y.; Li, C.; et al. AIM 2022 challenge on Instagram filter removal: Methods and results. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 27–43. [Google Scholar]
  39. Sharif, S.; Naqvi, R.A.; Loh, W.K. Two-Stage Deep Denoising With Self-guided Noise Attention for Multimodal Medical Images. IEEE Trans. Radiat. Plasma Med. Sci. 2024, 8, 521–531. [Google Scholar] [CrossRef]
  40. Pytorch. PyTorch Framework Code. 2016. Available online: https://pytorch.org/ (accessed on 24 April 2024).
  41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  42. McCollough, C. TU-FG-207A-04: Overview of the low dose CT grand challenge. Med. Phys. 2016, 43, 3759–3760. [Google Scholar] [CrossRef]
  43. Ma, Y.; Wei, B.; Feng, P.; He, P.; Guo, X.; Wang, G. Low-dose CT image denoising using a generative adversarial network with a hybrid loss function for noise learning. IEEE Access 2020, 8, 67519–67529. [Google Scholar] [CrossRef]
  44. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 16 July 2024).
  45. TFG. YOLO Dataset. 2022. Available online: https://universe.roboflow.com/tfg-2nmge/yolo-yejbs (accessed on 14 July 2024).
  46. Sharif, S.; Mobin, I.; Mohammed, N. Augmented quick health. Int. J. Comput. Appl. 2016, 134, 1–6. [Google Scholar] [CrossRef]
  47. Dong, G.; Ma, Y.; Basu, A. Feature-guided CNN for denoising images from portable ultrasound devices. IEEE Access 2021, 9, 28272–28281. [Google Scholar] [CrossRef]
  48. Sakib, S.; Fouda, M.M.; Al-Mahdawi, M.; Mohsen, A.; Oogane, M.; Ando, Y.; Fadlullah, Z.M. Deep learning models for magnetic cardiography edge sensors implementing noise processing and diagnostics. IEEE Access 2021, 10, 2656–2668. [Google Scholar] [CrossRef]
Figure 1. Comparison between existing MID and the proposed method. Existing denoising methods typically yield smooth denoising results with visual artifacts. The proposed method can clean noisy medical images and address the limitations of existing methods. Left to right: noisy input, AED [9], ResCNN [10], DnCNN [11], MIDDRAN [12], DAE [13], MMD [3], the proposed method, and the reference image.
Figure 2. Representative images obtained via each imaging modality: (a) X-ray; (b) MRI; (c) CT; (d) microscopy.
Figure 3. Gaussian noise simulation for learning medical image denoising. This study incorporated noise simulation to learn and evaluate MID methods using numerous medical imaging modalities. From left to right: clean image, random noise (simulated), and noisy image (clean image + generated noise).
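The simulation in Figure 3 follows the standard additive model: a noisy observation is the clean image plus zero-mean Gaussian noise at a chosen sigma. A minimal PyTorch [40] sketch of this corruption step is given below; the helper name and the [0, 1] intensity convention are illustrative assumptions rather than the authors' released code.

```python
import torch

def add_gaussian_noise(clean: torch.Tensor, sigma: float) -> torch.Tensor:
    """Simulate a noisy observation: noisy = clean + N(0, sigma^2).

    `clean` is assumed to be normalized to [0, 1]; `sigma` is given on
    the 0-255 scale (e.g., 10, 25, 50, 75), matching the noise levels
    reported in Table 2.
    """
    noise = torch.randn_like(clean) * (sigma / 255.0)
    return (clean + noise).clamp(0.0, 1.0)

# Example: corrupt a normalized 3-channel image at sigma = 50.
clean = torch.rand(1, 3, 256, 256)
noisy = add_gaussian_noise(clean, sigma=50)
```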
Figure 4. Overview of the proposed novel MID network. The proposed method allows the network to encode salient features in high-dimensional space and to learn to reconstruct clean images by decoding the encoded features. The proposed network incorporates a novel DWR module to capture long-distance pixel dependencies and an MHA block to perform effective reconstruction.
Figure 5. Comparison between vanilla residual blocks and the proposed DWR block. The DWR design captures long-distance pixel dependencies to learn efficient denoising. (a) Residual block; (b) bottleneck residual block; (c) proposed deep–wider residual block.
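To make the design idea of Figure 5c concrete — a residual block that is both deeper (stacked convolutions per branch) and wider (parallel branches with growing dilation) than a vanilla block [37], so its receptive field spans distant pixels — a rough sketch follows. The branch count, dilation rates, and activation are assumptions for illustration only, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DeepWiderResidualBlock(nn.Module):
    """Sketch of a deep-wider residual block in the spirit of Figure 5c.

    Parallel dilated branches widen the receptive field to capture
    long-distance pixel dependencies; a 1x1 convolution fuses the
    branches before the residual (identity) connection.
    """

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.PReLU(),
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        wide = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(wide)  # residual connection preserves input
```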
Figure 6. Overview of the proposed MHA block, which enables the network to reconstruct clean, artifact-free medical images while performing denoising.
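A minimal sketch of attention-guided reconstruction in the spirit of Figure 6: feature maps are flattened into a sequence of spatial tokens, refined with multi-head self-attention, and reshaped back into a feature map. The single-layer design, head count, and token layout are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MHAReconstruction(nn.Module):
    """Sketch of multi-head attention-guided reconstruction (cf. Figure 6).

    `channels` must be divisible by `num_heads`.
    """

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)    # self-attention
        tokens = self.norm(tokens + attended)              # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)
```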
Figure 7. Learning process of the proposed network. The model was trained for 50,000 steps, and convergence was determined from the training loss and PSNR scores. (a) Training loss vs. steps; (b) PSNR vs. steps.
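Figure 7 tracks the training loss and PSNR over 50,000 steps. A monitoring loop in this spirit can be sketched as follows; the model, data loader, learning rate, and L1 objective are placeholders rather than the authors' exact training recipe, although the Adam optimizer [41] is cited by the paper.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

# Illustrative monitoring loop; `model` and `loader` are hypothetical
# placeholders for the denoising network and the paired (noisy, clean) data.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# for step, (noisy, clean) in zip(range(50_000), loader):
#     restored = model(noisy)
#     loss = torch.nn.functional.l1_loss(restored, clean)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
#     if step % 1000 == 0:  # log the two curves plotted in Figure 7
#         print(step, loss.item(), psnr(restored.detach(), clean))
```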
Figure 8. Comparison between deep medical image denoising methods. Existing denoising methods tend to yield smooth denoising results with visual artifacts. The proposed method can clean noisy medical images and address the limitations of existing methods. Left to right: noisy input, AED [9], ResCNN [10], DnCNN [11], MIDDRAN [12], DAE [13], MMD [3], the proposed method, and the reference image.
Figure 9. Performance of the proposed method on real-world noisy MID. The proposed method can manage real-world noise. In each pair, the left image is the noisy input and the right image is the output of the proposed method.
Figure 10. Ablation study of the proposed network. The proposed DWR module enables the deep network to mitigate noise by leveraging long-distance pixel dependencies, while the proposed MHA block reconstructs plausible, clean images by exploiting the salient features extracted by the DWR module. From left to right: input image, base network (without DWR + MHA), DWR network (without MHA block), the proposed deep network (DWR + MHA), and the reference image.
Table 1. Comparison between existing denoising methods and the proposed two-stage network.

| Method | Learning Strategy | Strengths | Weaknesses |
|---|---|---|---|
| Image-to-image translation | Translates a noisy image into a clean image | Can outperform conventional (non-deep-learning) approaches; easy to train and infer | Tends to yield smooth images with fewer details; limited to specific noise types/modalities |
| Residual denoising | Learns the underlying noise from a noisy image | Can achieve sharper images; well known for Gaussian denoising | Yields visual artifacts in extreme cases (high noise); can estimate only a specific noise pattern |
| Proposed method | Denoises medical images utilizing DWR and MHA | Outperforms existing MID methods in visual and quantitative comparisons; modality-independent deep denoiser that can manage real and synthetic data; computationally lightweight | Optimized for desktop-class hardware |
Table 2. Quantitative comparison between existing MID models and the proposed deep network. The proposed method outperforms the existing models by a large margin, and its performance is consistent across noise levels and imaging modalities. Each cell reports PSNR↑ / SSIM↑ / LPIPS↓.
| Model | σ | CheXpert | CT | MRI | Microscopy | Combined |
|---|---|---|---|---|---|---|
| AED | 10 | 30.43 / 0.9178 / 0.1078 | 27.39 / 0.8882 / 0.1361 | 33.93 / 0.9375 / 0.0680 | 32.07 / 0.9094 / 0.0784 | 30.95 / 0.9132 / 0.0976 |
| DnCNN | 10 | 26.19 / 0.7812 / 0.2786 | 23.29 / 0.6763 / 0.2260 | 26.53 / 0.7131 / 0.1697 | 30.34 / 0.8660 / 0.0933 | 26.59 / 0.7592 / 0.1919 |
| ResCNN | 10 | 24.77 / 0.7455 / 0.3324 | 23.92 / 0.7214 / 0.1859 | 26.68 / 0.7517 / 0.1610 | 30.64 / 0.8710 / 0.1012 | 26.50 / 0.7724 / 0.1951 |
| DRAN | 10 | 33.35 / 0.9236 / 0.0622 | 36.72 / 0.9624 / 0.0162 | 35.15 / 0.9442 / 0.0409 | 37.11 / 0.9693 / 0.0330 | 35.58 / 0.9499 / 0.0381 |
| MMD | 10 | 27.63 / 0.8537 / 0.1771 | 25.05 / 0.7498 / 0.1582 | 24.88 / 0.6848 / 0.2211 | 29.55 / 0.8514 / 0.1224 | 26.78 / 0.7849 / 0.1697 |
| DAE | 10 | 24.10 / 0.8383 / 0.1968 | 19.00 / 0.8008 / 0.1752 | 29.72 / 0.8028 / 0.1475 | 27.61 / 0.6417 / 0.2169 | 25.11 / 0.7709 / 0.1841 |
| Proposed | 10 | 37.19 / 0.9685 / 0.0130 | 41.55 / 0.9856 / 0.0037 | 42.55 / 0.9819 / 0.0062 | 43.13 / 0.9892 / 0.0053 | 41.11 / 0.9813 / 0.0070 |
| AED | 25 | 30.51 / 0.9150 / 0.1046 | 27.09 / 0.8709 / 0.1488 | 33.72 / 0.9318 / 0.0710 | 31.95 / 0.9053 / 0.0795 | 30.82 / 0.9057 / 0.1010 |
| DnCNN | 25 | 28.01 / 0.8299 / 0.2074 | 25.22 / 0.7448 / 0.1743 | 27.82 / 0.7689 / 0.1757 | 29.95 / 0.8541 / 0.1246 | 27.75 / 0.7994 / 0.1705 |
| ResCNN | 25 | 29.04 / 0.8603 / 0.1676 | 26.50 / 0.8010 / 0.1261 | 28.84 / 0.7704 / 0.1993 | 30.63 / 0.8449 / 0.1379 | 28.75 / 0.8192 / 0.1578 |
| DRAN | 25 | 30.84 / 0.8828 / 0.1110 | 32.01 / 0.8950 / 0.0657 | 34.38 / 0.9270 / 0.0728 | 34.91 / 0.9408 / 0.0509 | 33.03 / 0.9114 / 0.0751 |
| MMD | 25 | 30.28 / 0.8749 / 0.1336 | 26.84 / 0.7929 / 0.1367 | 27.98 / 0.7709 / 0.1780 | 31.14 / 0.8759 / 0.1080 | 29.06 / 0.8287 / 0.1391 |
| DAE | 25 | 24.67 / 0.8389 / 0.1806 | 19.17 / 0.7938 / 0.1727 | 29.01 / 0.7954 / 0.1857 | 27.60 / 0.6542 / 0.2008 | 25.11 / 0.7706 / 0.1849 |
| Proposed | 25 | 36.94 / 0.9670 / 0.0145 | 40.07 / 0.9825 / 0.0046 | 40.83 / 0.9787 / 0.0082 | 41.25 / 0.9841 / 0.0076 | 39.77 / 0.9781 / 0.0087 |
| AED | 50 | 30.22 / 0.9071 / 0.1108 | 27.27 / 0.8778 / 0.1431 | 33.28 / 0.9211 / 0.0802 | 31.73 / 0.8993 / 0.0860 | 30.63 / 0.9013 / 0.1051 |
| DnCNN | 50 | 28.10 / 0.8335 / 0.2166 | 26.55 / 0.8134 / 0.1330 | 26.83 / 0.7171 / 0.2805 | 27.20 / 0.7543 / 0.2151 | 27.17 / 0.7796 / 0.2113 |
| ResCNN | 50 | 29.27 / 0.8781 / 0.1589 | 27.65 / 0.8506 / 0.1111 | 26.75 / 0.6155 / 0.3075 | 27.06 / 0.6417 / 0.2501 | 27.68 / 0.7465 / 0.2069 |
| DRAN | 50 | 26.27 / 0.7800 / 0.2542 | 27.94 / 0.8229 / 0.1366 | 32.80 / 0.8756 / 0.1512 | 33.00 / 0.8785 / 0.0876 | 30.00 / 0.8393 / 0.1574 |
| MMD | 50 | 27.94 / 0.8589 / 0.1907 | 25.67 / 0.7745 / 0.1628 | 26.33 / 0.6137 / 0.2789 | 26.80 / 0.6347 / 0.2208 | 26.68 / 0.7205 / 0.2133 |
| DAE | 50 | 24.03 / 0.8055 / 0.2080 | 18.89 / 0.7720 / 0.1897 | 28.00 / 0.7492 / 0.2720 | 27.54 / 0.6510 / 0.2029 | 24.62 / 0.7444 / 0.2182 |
| Proposed | 50 | 36.75 / 0.9659 / 0.0160 | 39.20 / 0.9804 / 0.0056 | 39.83 / 0.9757 / 0.0110 | 39.64 / 0.9793 / 0.0104 | 38.85 / 0.9753 / 0.0107 |
| AED | 75 | 29.95 / 0.9000 / 0.1195 | 27.42 / 0.8853 / 0.1376 | 32.85 / 0.9105 / 0.0912 | 31.52 / 0.8938 / 0.0940 | 30.44 / 0.8974 / 0.1106 |
| DnCNN | 75 | 26.53 / 0.8146 / 0.2441 | 25.61 / 0.8176 / 0.1458 | 21.17 / 0.3179 / 0.4531 | 21.28 / 0.3096 / 0.4473 | 23.65 / 0.5649 / 0.3226 |
| ResCNN | 75 | 27.05 / 0.8392 / 0.2312 | 26.57 / 0.8261 / 0.1395 | 20.53 / 0.2831 / 0.4906 | 20.70 / 0.2860 / 0.4903 | 23.71 / 0.5586 / 0.3379 |
| DRAN | 75 | 23.89 / 0.6918 / 0.3738 | 27.32 / 0.8158 / 0.1595 | 31.03 / 0.7946 / 0.2494 | 31.75 / 0.8151 / 0.1388 | 28.50 / 0.7794 / 0.2304 |
| MMD | 75 | 26.09 / 0.8039 / 0.2760 | 25.21 / 0.7915 / 0.1700 | 21.13 / 0.3085 / 0.4490 | 21.29 / 0.3065 / 0.4303 | 23.43 / 0.5526 / 0.3313 |
| DAE | 75 | 22.92 / 0.7681 / 0.2595 | 18.49 / 0.7507 / 0.2143 | 27.55 / 0.7130 / 0.3309 | 27.09 / 0.6142 / 0.2350 | 24.01 / 0.7115 / 0.2599 |
| Proposed | 75 | 36.57 / 0.9653 / 0.0164 | 38.66 / 0.9789 / 0.0063 | 38.97 / 0.9703 / 0.0139 | 38.79 / 0.9763 / 0.0124 | 38.25 / 0.9727 / 0.0122 |
| AED | Avg. | 30.28 / 0.9100 / 0.1107 | 27.29 / 0.8806 / 0.1414 | 33.45 / 0.9252 / 0.0776 | 31.82 / 0.9020 / 0.0845 | 30.71 / 0.9044 / 0.1035 |
| DnCNN | Avg. | 27.21 / 0.8148 / 0.2366 | 25.17 / 0.7630 / 0.1698 | 25.58 / 0.6292 / 0.2698 | 27.19 / 0.6960 / 0.2201 | 26.29 / 0.7258 / 0.2241 |
| ResCNN | Avg. | 27.53 / 0.8308 / 0.2225 | 26.16 / 0.7998 / 0.1407 | 25.70 / 0.6052 / 0.2896 | 27.26 / 0.6609 / 0.2449 | 26.66 / 0.7242 / 0.2244 |
| DRAN | Avg. | 28.59 / 0.8196 / 0.2003 | 31.00 / 0.8740 / 0.0945 | 33.34 / 0.8854 / 0.1286 | 34.19 / 0.9009 / 0.0776 | 31.78 / 0.8700 / 0.1252 |
| MMD | Avg. | 27.98 / 0.8478 / 0.1943 | 25.69 / 0.7771 / 0.1569 | 25.08 / 0.5945 / 0.2818 | 27.20 / 0.6671 / 0.2204 | 26.49 / 0.7216 / 0.2133 |
| DAE | Avg. | 23.93 / 0.8127 / 0.2112 | 18.89 / 0.7793 / 0.1880 | 28.57 / 0.7651 / 0.2340 | 27.46 / 0.6403 / 0.2139 | 24.71 / 0.7493 / 0.2118 |
| Proposed | Avg. | 36.87 / 0.9667 / 0.0150 | 39.87 / 0.9819 / 0.0050 | 40.54 / 0.9767 / 0.0098 | 40.70 / 0.9822 / 0.0089 | 39.50 / 0.9769 / 0.0097 |
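The PSNR and SSIM scores in Table 2 can be reproduced with standard implementations; the snippet below is one possible evaluation routine using scikit-image, with LPIPS available through the separate lpips package. The [0, 1] image range and channels-last layout are assumptions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(denoised: np.ndarray, reference: np.ndarray):
    """Compute the PSNR/SSIM scores of the kind reported in Table 2.

    Both images are float arrays in [0, 1] with channels last.
    """
    psnr = peak_signal_noise_ratio(reference, denoised, data_range=1.0)
    ssim = structural_similarity(reference, denoised, data_range=1.0,
                                 channel_axis=-1)
    return psnr, ssim

# LPIPS requires a learned network; one option is the `lpips` package:
#   import lpips
#   lpips_fn = lpips.LPIPS(net="alex")  # expects NCHW tensors in [-1, 1]
```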
Table 3. Quantitative performance of the proposed model for real-world noisy MID. The proposed method substantially improved the quality of noisy real-world images.

| Kernel | Slice Thickness | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|
| Soft | 1 mm | Input | 36.31 | 0.8799 | 0.0802 |
| Soft | 1 mm | Proposed | 40.71 | 0.9543 | 0.0431 |
| Soft | 3 mm | Input | 36.29 | 0.8832 | 0.0777 |
| Soft | 3 mm | Proposed | 40.80 | 0.9556 | 0.0414 |
| Sharp | 1 mm | Input | 28.53 | 0.6768 | 0.1342 |
| Sharp | 1 mm | Proposed | 34.90 | 0.8462 | 0.1180 |
| Sharp | 3 mm | Input | 28.55 | 0.6751 | 0.1354 |
| Sharp | 3 mm | Proposed | 34.77 | 0.8459 | 0.1175 |
| Combined | 1 mm | Input | 32.42 | 0.7783 | 0.1072 |
| Combined | 1 mm | Proposed | 37.81 | 0.9003 | 0.0806 |
| Combined | 3 mm | Input | 32.42 | 0.7791 | 0.1066 |
| Combined | 3 mm | Proposed | 37.79 | 0.9007 | 0.0795 |
| Average | 1 mm/3 mm | Input | 30.47 | 0.7275 | 0.1207 |
| Average | 1 mm/3 mm | Proposed | 36.35 | 0.8732 | 0.0993 |
Table 4. Performance of the YOLOv8 (small) model [44] on the RBC and WBC blood cell detection dataset [45], evaluated on real-world noisy medical images (original) and on images denoised by the proposed method (enhanced). The proposed method significantly improves the detection accuracy of the YOLOv8 model by reducing the noise that commonly contaminates medical images.

| Input | Class | Box Precision | Recall | mAP(50) | mAP(50–95) |
|---|---|---|---|---|---|
| Original [45] | Platelets | 0.8240 | 0.8150 | 0.8550 | 0.4620 |
| Original [45] | RBC | 0.7480 | 0.7440 | 0.7870 | 0.5770 |
| Original [45] | WBC | 0.9830 | 0.8840 | 0.9140 | 0.7880 |
| Original [45] | All | 0.8510 | 0.8140 | 0.8520 | 0.6090 |
| Enhanced | Platelets | 0.8730 | 0.8380 | 0.9120 | 0.4720 |
| Enhanced | RBC | 0.7490 | 0.8260 | 0.8590 | 0.6200 |
| Enhanced | WBC | 0.9800 | 0.9830 | 0.9840 | 0.8290 |
| Enhanced | All | 0.8680 | 0.8820 | 0.9180 | 0.6400 |
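The detection experiment behind Table 4 can be approximated with the Ultralytics API [44]: train and validate once on the original noisy dataset [45], then repeat on a copy denoised by the proposed method. The sketch below shows one such round; the dataset YAML name and hyperparameters are hypothetical placeholders, not the paper's exact recipe.

```python
from ultralytics import YOLO  # Ultralytics YOLOv8 [44]

# Hypothetical dataset YAML pointing at either the original noisy images
# or the denoised copies; swap the file to obtain the two rows of Table 4.
model = YOLO("yolov8s.pt")
model.train(data="bloodcells_original.yaml", epochs=100, imgsz=640)
metrics = model.val()
print(metrics.box.map50, metrics.box.map)  # mAP(50) and mAP(50-95)
```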
Table 5. Inference and parameter analysis of the proposed network. In addition to illustrating a significant performance improvement over existing methods, the proposed method is computationally efficient: it comprises only 12.54 million trainable parameters and takes less than 10 ms to denoise a 128 × 128 medical image on mid-level hardware.

| Dimension | 128 × 128 × 3 | 256 × 256 × 3 | 512 × 512 × 3 |
|---|---|---|---|
| FLOPs (G) | 17.42 | 69.68 | 278.74 |
| GMACs | 16.22 | 64.90 | 259.59 |
| Parameters (M) | 12.54 | 12.54 | 12.54 |
| Inference Time (ms) | 9.56 | 31.80 | 119.99 |
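Numbers such as those in Table 5 can be gathered with a short profiling helper; the sketch below counts trainable parameters and times a single GPU forward pass in PyTorch [40]. FLOPs/GMACs are usually obtained with external tools (e.g., ptflops or thop), not shown here; the function name and defaults are illustrative assumptions.

```python
import time
import torch

def profile(model: torch.nn.Module, size=(1, 3, 512, 512), device="cuda"):
    """Count trainable parameters (in millions) and time one forward pass (ms)."""
    params_m = sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
    x = torch.rand(*size, device=device)
    model = model.to(device).eval()
    with torch.no_grad():
        model(x)                      # warm-up pass before timing
        torch.cuda.synchronize()      # requires a CUDA device
        start = time.perf_counter()
        model(x)
        torch.cuda.synchronize()
    return params_m, (time.perf_counter() - start) * 1e3
```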
Table 6. Quantitative evaluation of the proposed components for different medical imaging modalities. The proposed modules substantially improved the performance of the deep network, allowing it to perform consistently across numerous imaging modalities. Each cell reports PSNR↑ / SSIM↑ / LPIPS↓.

| Model | σ | CheXpert | CT | MRI | Microscopy | Combined |
|---|---|---|---|---|---|---|
| Base | 10 | 20.61 / 0.9125 / 0.0591 | 18.42 / 0.8761 / 0.0614 | 31.21 / 0.6852 / 0.1111 | 30.45 / 0.6284 / 0.1352 | 25.17 / 0.7756 / 0.0917 |
| DWR | 10 | 35.90 / 0.9588 / 0.0312 | 36.75 / 0.9643 / 0.0183 | 38.33 / 0.9433 / 0.0182 | 40.21 / 0.9416 / 0.0116 | 37.80 / 0.9520 / 0.0198 |
| Proposed | 10 | 37.19 / 0.9685 / 0.0130 | 41.55 / 0.9856 / 0.0037 | 42.55 / 0.9819 / 0.0062 | 43.13 / 0.9892 / 0.0053 | 41.11 / 0.9813 / 0.0070 |
| Base | 25 | 20.64 / 0.8230 / 0.2419 | 18.18 / 0.7610 / 0.1697 | 36.02 / 0.9168 / 0.0338 | 23.28 / 0.3492 / 0.4364 | 24.53 / 0.7125 / 0.2204 |
| DWR | 25 | 35.37 / 0.9524 / 0.0346 | 35.69 / 0.9544 / 0.0202 | 24.28 / 0.3925 / 0.3393 | 37.80 / 0.9036 / 0.0122 | 33.28 / 0.8007 / 0.1016 |
| Proposed | 25 | 36.94 / 0.9670 / 0.0145 | 40.07 / 0.9825 / 0.0046 | 40.83 / 0.9787 / 0.0082 | 41.25 / 0.9841 / 0.0076 | 39.77 / 0.9781 / 0.0087 |
| Base | 50 | 19.38 / 0.6432 / 0.5316 | 17.22 / 0.6318 / 0.3511 | 17.84 / 0.2048 / 0.6275 | 17.64 / 0.1963 / 0.7461 | 18.02 / 0.4190 / 0.5641 |
| DWR | 50 | 34.11 / 0.9431 / 0.0476 | 33.82 / 0.9399 / 0.0306 | 33.77 / 0.8714 / 0.0698 | 34.92 / 0.8505 / 0.0244 | 34.15 / 0.9012 / 0.0431 |
| Proposed | 50 | 36.75 / 0.9659 / 0.0160 | 39.20 / 0.9804 / 0.0056 | 39.83 / 0.9757 / 0.0110 | 39.64 / 0.9793 / 0.0104 | 38.85 / 0.9753 / 0.0107 |
| Base | 75 | 17.93 / 0.5380 / 0.6938 | 16.22 / 0.5622 / 0.4754 | 14.71 / 0.1395 / 0.7954 | 14.51 / 0.1344 / 0.8834 | 15.84 / 0.3435 / 0.7120 |
| DWR | 75 | 20.59 / 0.6200 / 0.5963 | 20.94 / 0.6804 / 0.3626 | 15.40 / 0.1494 / 0.7511 | 15.21 / 0.1455 / 0.8582 | 18.04 / 0.3988 / 0.6420 |
| Proposed | 75 | 36.57 / 0.9653 / 0.0164 | 38.66 / 0.9789 / 0.0063 | 38.97 / 0.9703 / 0.0139 | 38.79 / 0.9763 / 0.0124 | 38.25 / 0.9727 / 0.0122 |
| Base | Avg. | 19.64 / 0.7292 / 0.3816 | 17.51 / 0.7078 / 0.2644 | 24.95 / 0.4866 / 0.3919 | 21.47 / 0.3271 / 0.5503 | 20.89 / 0.5626 / 0.3971 |
| DWR | Avg. | 31.49 / 0.8685 / 0.1774 | 31.80 / 0.8847 / 0.1079 | 27.94 / 0.5892 / 0.2946 | 32.03 / 0.7103 / 0.2266 | 30.82 / 0.7632 / 0.2016 |
| Proposed | Avg. | 36.87 / 0.9667 / 0.0150 | 39.87 / 0.9819 / 0.0050 | 40.54 / 0.9767 / 0.0098 | 40.70 / 0.9822 / 0.0089 | 39.50 / 0.9769 / 0.0097 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
