1. Introduction
Breast cancer can affect both men and women, but it affects women in particular. Recently, with the spread of this dreaded disease and the diagnosis of a large number of women around the world, researchers have had to find ways to detect and segment it at an early stage, helping doctors diagnose it and begin the treatment journey early, whether with chemotherapy or surgical intervention [1,2,3,4,5]. Many methods and models have been introduced to detect and segment the tumor, including machine learning and deep learning methods [6,7]. In particular, deep learning has provided many models for the detection and segmentation of the disease, such as U-Net, the convolutional neural network (CNN), and the fully convolutional network (FCN). Each of them has proven its competence in detecting, segmenting, and diagnosing the disease quickly and with high accuracy.
Preprocessing before segmentation can increase the accuracy of the segmentation process. Different types of preprocessing exist, such as removing noise from the ultrasound images and augmenting the dataset for use in training. Augmentation increases the sample size of the dataset, and incorporating more samples improves both training efficiency and segmentation precision [8,9].
Recently, owing to its improved performance on ultrasound tasks compared with older, classical approaches, the CNN has grown into one of the major deep learning models [10,11]. To succeed, CNNs need large datasets or large numbers of training images to produce highly accurate segmentation results, and such data are particularly scarce in the medical field due to the high cost of medical imaging. The augmentation process is used to produce a large number of training samples from the available ultrasounds without acquiring new scans. GAN provide a highly efficient method for synthesizing new ultrasound images from existing ones. Many versions of GAN have been introduced for augmentation, but the main challenges are the mode collapse problem and the smoothing of the loss function. Mode collapse occurs when the generator produces only a few distinct samples that deceive the discriminator instead of creating a varied collection of samples spanning the whole target distribution. This can occur when the discriminator becomes too strong and learns to recognize and exploit patterns in the generator's output, causing the generator to focus solely on reproducing those patterns instead of exploring the full range of possible outputs. This paper presents modifications that reduce mode collapse during augmentation. Enlarging the ultrasound dataset efficiently helps the segmentation model increase the accuracy of the segmentation process [12,13,14].
Deep learning has introduced many architectures for segmenting ultrasounds efficiently after the augmentation process, such as the fully convolutional network (FCN), the CNN, the artificial neural network (ANN), and U-Net [15]. CNNs are geared toward classifying ultrasounds, where the image is treated as input and the output is a single label. However, they are ineffective in biomedical scenarios, particularly when segmenting medical images such as ultrasounds; U-Net was introduced to address this issue. Many U-Net architectures have been introduced for ultrasound segmentation, such as attention U-Net, residual U-Net, and attention residual U-Net [16,17,18]. The U-Net 3+ architecture is a modified version of U-Net built on full-scale skip connections [19]. Full-scale feature fusion, on the other hand, can result in unduly redundant calculations. The main goal of U-Net 3+ is therefore to shrink the network while boosting its feature extraction capabilities. To this end, the model prunes the full-scale skip connections to eliminate duplication and increase computational efficiency, while redesigning the remaining skip connections to incorporate full-scale semantic information from the input images. U-Net 3+ has proven not only accurate but also quicker and more effective than several common image segmentation architectures. This paper proposes the following items:
1. The modified GAN with identity blocks solves the mode collapse problem found in other versions of GAN for augmentation with ultrasounds.
2. The framework uses the identity block with GAN and a modified loss function for expanding the ultrasound dataset.
3. The augmentation process for the ultrasounds helps increase the segmentation accuracy of different U-Net architectures.
4. The combination of the GAN with identity blocks and U-Net 3+ achieves higher accuracy than other models in both the augmentation and segmentation processes.
The paper provides an introduction and an overview in Section 1 and Section 2 about the segmentation and augmentation of ultrasound images for detecting breast tumors. Section 3 introduces the identity GAN in the augmentation step and the U-Net 3+ in the segmentation process. The experimental findings are presented in Section 4, while Section 5 provides the conclusion and the recommended future work.
2. Related Work
The segmentation process using deep models greatly impacts the medical field, especially breast tumor segmentation using U-Net. Many U-Net architectures have been introduced for the segmentation process, such as U-Net [20], U-Net++ [21], Attention U-Net [21,22], Vanilla U-Net [22,23], and Stack U-Net [24]. Although all the previously mentioned methods are useful in detecting cancerous breast tumors and other tumors, they become restricted and insufficient when the dataset is small, which leads to poor accuracy. To solve the problem of small datasets, the GAN model offers many solutions for the general augmentation of medical images and breast ultrasounds [25].
Maria et al. [26] introduced a modified version of the GAN called UltraGAN, which increases the number of ultrasound images through quality transfer while preserving structural information. UltraGAN employs frequency loss functions and an anatomical coherence constraint to improve quality. The authors demonstrated image quality enhancement without losing anatomical consistency, verified UltraGAN for echocardiogram segmentation using a publicly available dataset, showed that the quality-enhanced images benefit downstream tasks, and shared their source code and trained models to ensure repeatability. Miomoria et al. [27] used a deep convolutional generative adversarial network (DCGAN) for ultrasound generation to enhance the segmentation process and increase the number of ultrasound images. To generate novel synthetic images comparable to the real ones, the DCGAN uses CNNs to learn a hierarchy of features from the input images: the generator network creates an image from a random noise vector, while the discriminator network differentiates between authentic and generated images. The authors used 528 ultrasound images of 144 benign masses and 529 ultrasound images of 216 malignant masses, and they generated new images using a DCGAN trained for 50, 100, 200, 500, and 1000 epochs. The robust data augmentation GAN (RDAGAN) [28] is a modified GAN model, also introduced for augmenting limited datasets before medical segmentation or object detection tasks such as tumor detection. The authors combined several small datasets from multiple sources and split image creation between two networks: one for object generation and the other for image translation. The image translation network combines the images produced by the object generation network with the images contained within the bounding boxes of the input dataset. A quantitative investigation revealed that the produced images increased the detection ability of the YOLO v5 model. A modified version of GAN called the stacked generative adversarial network (StackGAN) [29] can be utilized to synthesize new image samples from a smaller dataset. Despite its ability to generate high-resolution images, particularly ultrasound and brain images, it has several drawbacks, including the tendency for some images to be incoherent and the possibility of mode collapse.
Lennart et al. [30] established a novel architecture for generating synthetic ultrasounds with speckle noise using generative adversarial networks (GAN). The key component is a speckle layer, which can be added to a neural network to produce realistic, domain-specific speckle. The resulting GAN architecture is known as the speckle GAN. Speckle GAN's discriminator network is trained to discriminate between real and synthetic speckle patterns. By tuning the generator and discriminator networks adversarially, speckle GAN can produce synthetic speckle patterns that are highly realistic and visually indistinguishable from real ones.
Although many models and methods have been used for both segmentation and augmentation, all of them have limitations, such as the accuracy of the segmentation process, mode collapse in the generation of ultrasound samples, and the loss function.
3. Proposed Model
This section introduces the methodology of the hybrid approach for segmenting and detecting tumors in ultrasound images. The hybrid approach starts by importing an ultrasound dataset containing scans of 600 female patients aged between 25 and 75. The dataset consists of 780 images, each measuring 500 by 500 pixels, categorized as normal, benign, and malignant, with 437 benign, 210 malignant, and 133 normal images, respectively. The second step augments the dataset using the modified GAN, and the last step segments and detects the breast tumor using the U-Net 3+.
The U-Net 3+ architecture is an advanced modification of the U-Net model, specifically designed for image segmentation tasks. It comprises three main components: the encoder, bottleneck, and decoder, each playing a crucial role in the overall architecture. The encoder is responsible for capturing high-level semantic information from the input image. It consists of a series of convolutional layers that progressively reduce the spatial dimensions while increasing the number of feature channels. These convolutional layers typically incorporate operations such as convolution, batch normalization, and non-linear activation functions like ReLU. By applying these operations, the encoder extracts abstract features that represent the underlying structures and patterns in the input image.
The bottleneck component serves as the central part of the U-Net 3+ architecture. It typically consists of multiple convolutional layers with a smaller number of filters compared to the encoder layers. The purpose of the bottleneck is to compress the encoded features into a compact representation, also known as a latent space. This compression helps retain the most relevant information necessary for accurate segmentation while reducing the computational complexity of the model. The decoder is responsible for upsampling the compact representation from the bottleneck and recovering the spatial dimensions to match the original input size. Similar to the encoder, the decoder consists of convolutional layers, batch normalization, and activation functions. However, what sets the U-Net 3+ decoder apart is the incorporation of skip connections. The skip connections establish connections between corresponding layers in the encoder and decoder paths. These connections allow the network to preserve and utilize fine-grained details from the encoder during the upsampling process. Specifically, the skip connections concatenate feature maps from the encoder to the corresponding decoder layers, enabling the decoder to access both low-level and high-level features. This integration of fine-grained details helps in the precise localization and segmentation of objects in the image.
In the U-Net 3+ architecture, multiple decoder paths with different scales or resolutions of skip connections are employed. This multi-scale design allows the network to capture features at various levels of abstraction and combine information from different scales effectively. By considering features at multiple scales, the U-Net 3+ architecture enhances the segmentation process, enabling more accurate and detailed segmentation results.
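To make the full-scale skip connections concrete, the following is a minimal PyTorch sketch of a single decoder stage in this style. It is an illustration rather than the exact configuration used here: the channel count (`cat_channels`) is assumed, and bilinear resizing is used in both directions, whereas U-Net 3+ down-samples larger encoder maps with max-pooling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleSkipBlock(nn.Module):
    """One U-Net 3+-style decoder stage: fuse feature maps from every
    encoder scale (and deeper decoder stages) at a common resolution.
    Channel counts and depth are illustrative, not the paper's values."""
    def __init__(self, in_channels_list, cat_channels=64):
        super().__init__()
        # One 3x3 conv per incoming scale, mapping it to cat_channels.
        self.convs = nn.ModuleList(
            nn.Conv2d(c, cat_channels, kernel_size=3, padding=1)
            for c in in_channels_list
        )
        n = len(in_channels_list)
        self.fuse = nn.Sequential(
            nn.Conv2d(n * cat_channels, n * cat_channels, 3, padding=1),
            nn.BatchNorm2d(n * cat_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, features, target_size):
        # Resize every incoming map to the decoder stage's resolution
        # (the original work uses max-pooling when shrinking maps).
        resized = [
            conv(F.interpolate(f, size=target_size, mode="bilinear",
                               align_corners=False))
            for conv, f in zip(self.convs, features)
        ]
        return self.fuse(torch.cat(resized, dim=1))
```

Each decoder stage therefore sees low-level detail and high-level semantics at once, which is what enables the precise localization described above.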
Figure 1 shows the block diagram of the hybrid approach.
3.1. GAN with Identity Block
Generative adversarial networks (GAN) exhibit a set of distinctive qualities that make them exceptionally well-suited for enhancing image processing tasks. Firstly, their generative capability allows them to generate new images that closely resemble real ones, enabling improvements in various aspects of image quality. GAN can capture intricate details, textures, and structures present in the training data, resulting in visually appealing and high-quality images. Secondly, GAN employ an adversarial training framework in which a generator network and a discriminator network compete against each other. The generator learns to produce images that are indistinguishable from real images, while the discriminator aims to correctly classify real and generated images. This adversarial process drives the generator to continually improve its image-generation abilities, leading to more realistic and visually pleasing results. Furthermore, GAN excel at learning complex non-linear transformations between input and output images.
Traditional linear processing methods often struggle to capture intricate relationships and patterns in complex image data. However, GAN can learn and represent these non-linear mappings, enabling them to enhance various image attributes effectively. This includes improving resolution, enhancing sharpness, adjusting color balance, and refining fine details that are challenging to achieve with traditional linear techniques. Another key advantage of GAN is their ability to learn from unannotated or partially annotated data. This unsupervised or self-supervised learning capability is particularly valuable in image processing tasks where obtaining large-scale annotated datasets can be time-consuming and expensive. GAN can leverage the inherent patterns and structures in the training data to enhance images without relying on explicit labels or annotations.
Lastly, GAN offer flexibility and adaptability to cater to specific image processing goals. They can be customized by modifying the network architecture, loss functions, or training strategies to suit the desired enhancements. This adaptability allows GAN to tackle a wide range of image processing tasks, including but not limited to super-resolution, denoising, deblurring, style transfer, and inpainting.
The architecture of the GAN with identity block (IGAN) contains two main parts, as in a traditional GAN: the first part is the generator, while the second is the discriminator. The GAN with identity block model uses the proximity of every pair of samples in a single mini-batch. The overall summary of a single data point is then computed by adding its proximities to the other samples in the same batch, and this summary is explicitly added to the model. The discriminator is still required to output a single number for each example, indicating the likelihood of the example originating from the training data; with mini-batch training, this single output is allowed to use the other examples in the mini-batch as side information. The model addresses the problem of mode collapse in the discriminator with techniques such as the Softmax or Gumbel-Softmax function, which introduce randomness into the generator's output, making it more difficult for the discriminator to identify patterns and forcing the generator to explore a wider range of possible outputs. The GAN with identity blocks also changes the loss function of Equation (1), adding smoothing absent from other versions of the GAN, to avoid mode collapse during image generation in the augmentation process.
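The mini-batch proximity idea can be sketched as follows, in the spirit of minibatch discrimination (Salimans et al., 2016); the projection sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MinibatchProximity(nn.Module):
    """Append per-sample proximity statistics to discriminator features,
    so the single-output discriminator sees batch-level side information."""
    def __init__(self, in_features, num_kernels=50, kernel_dim=5):
        super().__init__()
        self.proj = nn.Linear(in_features, num_kernels * kernel_dim)
        self.num_kernels, self.kernel_dim = num_kernels, kernel_dim

    def forward(self, x):                        # x: (batch, in_features)
        m = self.proj(x).view(-1, self.num_kernels, self.kernel_dim)
        # Pairwise L1 distances between every two samples in the batch.
        diff = m.unsqueeze(0) - m.unsqueeze(1)   # (B, B, K, D)
        dist = diff.abs().sum(dim=3)             # (B, B, K)
        # Each sample's summary: summed similarity to the other samples.
        proximity = torch.exp(-dist).sum(dim=1)  # (B, K)
        return torch.cat([x, proximity], dim=1)  # side information appended
```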
Mode collapse occurs when the discriminator returns exactly one or zero as the classification result for real and fake images. Gradients are then close to zero at both ends, and the discriminator cannot provide useful feedback, resulting in a vanishing gradient problem. Accordingly, this study alters the GAN loss function stated in Equation (1) by including label smoothing: instead of supplying 1 and 0 labels for real and fake data while training the discriminator, we used 0.9 and 0.1. Equations (2)–(4) express the loss function employed in this study.
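As a minimal sketch of this label-smoothed discriminator objective (the paper's exact Equations (2)–(4) are not reproduced here), the hard 1/0 targets are simply replaced with 0.9/0.1:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake, real_label=0.9, fake_label=0.1):
    """Binary cross-entropy on the discriminator's logits with label
    smoothing: real targets become 0.9 and fake targets 0.1."""
    real_targets = torch.full_like(d_real, real_label)
    fake_targets = torch.full_like(d_fake, fake_label)
    loss_real = F.binary_cross_entropy_with_logits(d_real, real_targets)
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, fake_targets)
    return loss_real + loss_fake
```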
Figure 2 shows the architecture of the generator, and Figure 3 shows the architecture of the discriminator.
The GAN with the identity block begins by importing the limited ultrasound dataset, loading the patients' ultrasounds, extracting the semantic label from the lesion area and contour, and synthesizing the semantic label. The vital stage of the augmentation is feeding the tumor region to the generator to obtain the synthesized image and computing the regional loss, as mentioned in the previous subsection. The architecture was trained using the Adamax optimizer with gradient clipping, which limits the size of the gradients to prevent noisy updates. The learning rate was set to 0.0005 with a batch size of 256.
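A sketch of one discriminator update under this training recipe is shown below; the `generator` and `discriminator` modules are hypothetical, `discriminator_loss` is the label-smoothed sketch above, and the latent dimension and clipping threshold are assumptions.

```python
import torch

# Hypothetical generator/discriminator modules are assumed to exist,
# along with the label-smoothed discriminator_loss sketched earlier.
g_optim = torch.optim.Adamax(generator.parameters(), lr=5e-4)
d_optim = torch.optim.Adamax(discriminator.parameters(), lr=5e-4)

def train_discriminator_step(real_images, latent_dim=100, max_norm=1.0):
    d_optim.zero_grad()
    noise = torch.randn(real_images.size(0), latent_dim)
    fake_images = generator(noise).detach()  # freeze G during the D update
    loss = discriminator_loss(discriminator(real_images),
                              discriminator(fake_images))
    loss.backward()
    # Gradient clipping bounds the update size to prevent noisy updates.
    torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm)
    d_optim.step()
    return loss.item()
```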
3.2. Evaluation Metrics
The framework uses two evaluation metrics to compare the results of the identity GAN against other versions of GAN: the inception score (IS) and the Fréchet inception distance (FID), given by Equations (5) and (6), respectively. In Equation (5), x represents a single generated sample drawn from the generator, while y corresponds to the label prediction obtained through the Inception model. The IS and FID are metrics for evaluating the quality and similarity of images generated by generative adversarial networks (GAN).
The inception score is computed using the Inception v3 network, which was trained for image classification on the ImageNet dataset. The basic idea is to use the Inception network to obtain label predictions for the generated images and to compare the resulting distributions using the Kullback–Leibler (KL) divergence. The FID instead compares the feature distributions of the real and generated images using the Fréchet distance, which measures the distance between two multivariate Gaussian distributions. The r in Equation (6) refers to the real images, while g denotes the generated images. The paper also uses Precision, Recall, Accuracy, and Dice Score for model evaluation, given by Equations (7)–(10), respectively.
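For reference, the standard pixel-wise forms of these four metrics (Equations (7)–(10) are not reproduced here) can be computed from the confusion counts of a binary mask, as in the following sketch:

```python
import numpy as np

def segmentation_metrics(pred_mask, true_mask):
    """Pixel-wise Precision, Recall, Accuracy, and Dice Score for binary
    NumPy masks, following the standard confusion-count definitions."""
    pred = pred_mask.astype(bool).ravel()
    true = true_mask.astype(bool).ravel()
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    tn = np.logical_and(~pred, ~true).sum()
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return precision, recall, accuracy, dice
```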
The FID is computed on a large number of samples from both the real and generated datasets, typically 10,000 or more, in order to obtain an accurate estimate of the distance between their distributions. In general, a higher IS and a lower FID indicate better quality of the generated images, as they imply a closer match between the feature distributions of the real and generated images.
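A minimal sketch of the FID computation from Inception features follows; it assumes the feature matrices (e.g., from the Inception v3 pooling layer) have already been extracted for the real and generated sets.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_gen):
    """FID between two feature sets of shape (num_samples, feature_dim):
    ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))."""
    mu_r, sigma_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_g, sigma_g = feats_gen.mean(axis=0), np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```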
3.3. The Modified U-Net 3+
In order to detect the breast tumor, the framework segments the ultrasounds using the U-Net 3+ architecture. U-Net 3+ applies the same overall architecture as U-Net but with redesigned skip connections. The new skip connections account for the beneficial effect of particular multi-scale feature maps on segmentation; without this redesign, each decoder layer would receive neighboring multi-scale feature maps that contribute similarly to segmentation, resulting in unnecessary duplicate calculations. The model has three modifications compared with the previous versions, U-Net and U-Net++.
The first modification adopts and updates the deep supervision approach to accept full-scale semantic information. The second modification adjusts the dense connection architecture to handle both low and high levels of information in the feature maps more efficiently for segmentation. The last modification lowers false-positive rates by employing a classification-guided module that prevents the model from predicting objects where none exist. The model uses a loss function comprising the sum of the focal loss ($\mathcal{L}_{fl}$), the multi-scale structural similarity index loss ($\mathcal{L}_{ms\text{-}ssim}$), and the intersection over union loss ($\mathcal{L}_{iou}$). Equation (11) shows the loss function of the modified U-Net 3+.
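A plausible reconstruction of Equation (11), following the hybrid loss of the original U-Net 3+ formulation (the unweighted sum is an assumption):

```latex
\mathcal{L}_{seg} = \mathcal{L}_{fl} + \mathcal{L}_{ms\text{-}ssim} + \mathcal{L}_{iou}
```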
The model uses the convolutional block attention module (CBAM) to enhance the architecture's capacity for feature extraction. CBAM is a lightweight feed-forward attention module that can be integrated into any CNN architecture for end-to-end training, and it is one of several attention models. The CBAM module contains two main blocks: the channel attention module and the spatial attention module, which together enhance the feature extraction process. The model was trained for 200 epochs with a learning rate of 0.0001 and the modified loss function, using Adamax as the optimizer.
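A compact PyTorch sketch of CBAM is given below; the reduction ratio of 16 and the 7×7 spatial kernel are the module's commonly used defaults, not necessarily the values adopted in this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al., 2018):
    channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                  # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))        # spatial attention
```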
3.4. The Dataset
Women between the ages of 25 and 75 provided the breast ultrasound scans in the study's initial collection. The data were compiled in 2018 from 600 female patients in total. Each of the 780 images in the dataset measures 500 by 500 pixels and is stored as a PNG file. The original images are provided together with ground-truth images, and they are divided into three categories: normal, benign, and malignant, as shown in Figure 4. The dataset remains limited due to the small number of patients. The breast ultrasound images dataset [31] is publicly available and was obtained from Kaggle's data repository [32].
3.5. Hardware and Software Specifications
All experiments in this paper, including the augmentation and segmentation processes, were run on an Intel(R) Core i7-11700 processor and an NVIDIA GeForce RTX 3080 GPU, using the Python programming language.
4. Results and Discussion
The experiments on ultrasound augmentation using the GAN with identity blocks (IGAN) and the findings of ultrasound segmentation using the U-Net 3+ are presented in this section. The section compares IGAN, DCGAN, UltraGAN, and speckle GAN based on FID and IS, and it also compares segmentation using different U-Net architectures combined with the four augmentation models.
Table 1 provides a comparison of IGAN with DCGAN [33], speckle GAN [30], and UltraGAN [26] regarding IS and FID. These comparisons were made after 200 epochs of training, using a learning rate of 0.0001 and the Adam optimizer. The results in Table 1 show the efficiency of IGAN compared with the other available architectures in the augmentation process, reflected in its larger IS and smaller FID. The evaluation in Table 1 was conducted using the inception score and Fréchet inception distance of Equations (5) and (6).
Figure 5 additionally presents a comparison of IGAN with other GAN architectures in terms of the augmentation process. The IGAN model yields a 7.75% improvement in IS compared to the best results achieved with DCGAN, while it provides a 10.77% enhancement in FID compared to the top-performing results obtained by UltraGAN.
Enhancing the accuracy of ultrasound breast segmentation can be achieved by leveraging different versions of GAN in the augmentation process. The choice of GAN architecture, loss functions, and training parameters significantly impacts the quality and diversity of the generated ultrasound images and thereby contributes to the accuracy of the segmentation process. The following tables present the segmentation results after augmentation with GAN and compare the segmentation accuracy achieved with the different versions of GAN.
Table 2 shows the results of the ultrasound segmentation before using any augmentation models. The table shows the efficiency of U-Net 3+ compared with the other U-Net architectures. All the U-Net architectures in the table use Adamax as the optimizer, a learning rate of 0.0001, and a batch size of 128. The experiments were conducted using the same hardware specification, and the last column of the table reports the time per epoch in minutes. Without any augmentation preprocessing, the U-Net 3+ model achieves scores of 92.68, 92.68, 92.67, and 92.36 for Dice Score, Accuracy, Precision, and Recall, respectively, at 24.65 min per epoch.
Hyperparameter optimization is a crucial step in developing effective U-Net architectures for tumor segmentation. The choice of optimizer and learning rate are key hyperparameters that significantly impact the model's performance. For the U-Net architectures considered here, Adamax is selected as the optimizer, providing adaptive learning rates and momentum to enhance convergence speed and generalization. The learning rate of 0.0001 strikes a balance between convergence speed and stability, allowing the model to adjust its parameters effectively during training. However, these hyperparameter values should be fine-tuned through systematic experimentation and evaluation to identify the optimal combination for the specific tumor segmentation task.
Techniques such as grid search, random search, Bayesian optimization, or automated hyperparameter tuning libraries can be employed to explore the hyperparameter space and uncover the best configuration. Cross-validation is commonly used to assess the model’s performance across different hyperparameter settings. By carefully optimizing these hyperparameters, researchers and practitioners can ensure that the U-Net architectures perform optimally and deliver accurate tumor segmentation results.
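As an illustration, a minimal random-search loop over such a space might look like the following sketch; the search space and the `train_and_validate` callable (returning, e.g., a cross-validated Dice Score) are hypothetical.

```python
import random

# Hypothetical search space; train_and_validate() is assumed to train a
# U-Net variant with the given configuration and return a validation score.
search_space = {
    "learning_rate": [1e-3, 5e-4, 1e-4, 5e-5],
    "batch_size": [32, 64, 128],
    "optimizer": ["adamax", "adam"],
}

def random_search(train_and_validate, n_trials=20, seed=0):
    rng = random.Random(seed)
    best_score, best_cfg = -1.0, None
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in search_space.items()}
        score = train_and_validate(**cfg)  # e.g. mean cross-validated Dice
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```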
Table 3 showcases the results of ultrasound breast image segmentation using DCGAN exclusively as a pre-augmentation method. The information in Table 3 highlights the improvements in segmentation performance, encompassing Dice Score, Accuracy, Precision, and Recall, achieved through the integration of the DCGAN architecture. Notably, these enhancements occur with a limited scaling factor: DCGAN yields increases of 0.84%, 0.98%, 0.85%, and 1.29% for Dice Score, Accuracy, Precision, and Recall, respectively.
Table 4 shows the segmentation results after augmentation using StackGAN. The table shows the enhancement of the segmentation results in terms of Dice Score, Accuracy, Precision, and Recall after augmentation using only the StackGAN architecture. The effect of StackGAN is evident in increases of 0.84%, 1.08%, 1.32%, and 1.28% in Dice Score, Accuracy, Precision, and Recall, respectively. The enhancement again involves a small scaling factor, a slight improvement over DCGAN.
Table 5 shows the segmentation of ultrasound breast images after using UltraGAN as a pre-augmentation process. The data presented in Table 5 signify increases of 0.84%, 1.07%, 1.29%, and 1.76% for the selected evaluation metrics. Similar to DCGAN and StackGAN, the UltraGAN model also introduces a limited scaling factor.
Table 6 shows the outcomes of ultrasound image segmentation using the GAN with an identity block (IGAN) model. The data in Table 6 emphasize the efficiency of IGAN compared to the other GAN architectures, specifically DCGAN, StackGAN, and UltraGAN. This efficiency is represented by increases of 3.03% for Dice Score, 3.23% for Accuracy, 3.15% for Precision, and 3.59% for Recall. The results illustrate the significant impact of using IGAN as a preprocessing step for the augmentation process.
Figure 6 shows the comparison between the different U-Net architectures before and after augmentation. Figure 6 is divided into five sections, each comparing the various U-Net architectures in terms of Dice Score, Accuracy, Precision, and Recall. The first section shows the comparison before the augmentation process. The second section, or row, compares the architectures after using DCGAN in the augmentation. The third section shows the comparison after using the speckle GAN, the fourth after using UltraGAN, and the last after using the GAN with identity block. The results in the last section show the efficiency of the U-Net 3+ with the GAN with identity blocks compared to all other sections in the figure.
Figure 7 shows the segmentation results after using the GAN with an identity block and U-Net 3+. The figure is divided into three columns: the first shows the input ultrasound image, the second shows the predicted mask for the breast tumor, and the last uses the Grad-CAM algorithm to provide a clearer understanding of the proposed model's decision-making process.
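For context, a minimal Grad-CAM sketch for a segmentation network is shown below; the `model` and `target_layer` (typically the last decoder convolution) are hypothetical, and summing the output logits as the backward target is one common choice for dense predictions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Return a [0, 1] heatmap over the input, weighting the target
    layer's activations by the pooled gradients of the output score."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    out = model(image)            # e.g. per-pixel tumor logits
    out.sum().backward()          # gradient of the aggregate mask score
    h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((w * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return cam / (cam.max() + 1e-8)                # normalized heatmap
```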
The results in the previous tables show the efficient enhancement obtained by combining different U-Net architectures with different versions of GAN for augmentation; it is important to note that the success of this combined approach depends on the quality and diversity of the augmented data generated by the GAN. The U-Net 3+ architecture, with its encoder–decoder structure and skip connections, is well-suited for medical image segmentation tasks. It effectively captures both local and global contextual information in the images, enabling accurate segmentation of the structures of interest. The combination of identity GAN augmentation and U-Net 3+ leverages the strengths of both approaches, yielding a more robust and accurate segmentation model than the other combinations of U-Net architectures and GAN versions for the augmentation process, as shown in Table 3, Table 4, Table 5 and Table 6. After these experiments with different versions of modified GAN for augmentation, the results demonstrate the efficiency of GAN in enhancing the accuracy of the segmentation process, and in particular the efficiency of the GAN with identity blocks in enhancing the different U-Net architectures. The results also show that the combination of U-Net 3+ and the GAN with identity blocks is the best-suited combination among those evaluated.
The proposed combination model, which incorporates both a GAN with an identity block and a U-Net 3+ architecture, is not limited to the detection of breast cancer from ultrasound images. Given that ultrasound images can be used to visualize blood flow and detect clots within blood vessels, we could apply the proposed combination model for blood coagulation detection. This combination allows us to leverage the generative capabilities of the GAN to generate synthetic coagulation images and then use the U-Net 3+ model for precise segmentation and detection.
5. Conclusions and Future Work
This paper provided a hybrid method for segmenting breast tumors in an ultrasound dataset. The model comprises two main stages: the first uses the GAN with identity blocks for the augmentation process, and the second segments the ultrasound images to detect the tumor. The combination of the GAN with identity blocks for augmentation and the U-Net 3+ architecture has proven to effectively enhance the segmentation accuracy of breast ultrasound images. The modified GAN with nonlinear identity blocks outperformed other modified GAN variants in the ultrasound augmentation process, such as speckle GAN, UltraGAN, and the deep convolutional GAN. The modified U-Net 3+ likewise outperformed other U-Net architectures in the segmentation process. The proposed combination achieved a Dice Score of 95.49, an Accuracy of 95.67, a Precision of 95.59, and a Recall of 95.68.
The GAN model with identity blocks provides a powerful framework for data augmentation, generating synthetic images that closely resemble real data while introducing variation and diversity and mitigating the mode collapse and vanishing gradient problems. The experiments also demonstrate the efficient enhancement of different U-Net architectures and the advantage of the GAN with identity blocks over other versions of GAN in the augmentation step, as well as the efficiency of U-Net 3+ with the modified GAN compared with other U-Net architectures. Finally, the GAN with identity blocks can effectively enlarge medical datasets, which in turn helps increase the accuracy of different types of medical segmentation, regardless of the type of medical image or the characteristics of the U-Net used. Despite these positive results, some challenges remain, such as generating high-quality medical images. In order to bridge the gap between research and real-world applications, we intend to collaborate with clinical experts on clinical validation studies, ensuring that our architecture meets the standards and requirements of the medical field.