1. Introduction
Cardiovascular disease (CVD) is a leading cause of death worldwide, accounting for roughly 17.3 million deaths every year, a figure projected to exceed 23.6 million by 2030 [1]. It affects developed and developing countries alike, regardless of income level. CVD comprises heart failure, ischemic heart disease, peripheral arterial disease, stroke, and many other vascular conditions, which are key contributors to reduced life expectancy [2,3]. In 2019, approximately 9.6 million males and 8.9 million females died from CVD, an estimated one-third of all deaths worldwide.
Diagnosing cardiac disease requires quantifying left ventricle (LV) function. Precise LV contours [4] convey detailed information about the ventricle's size and shape, so LV segmentation is critical for estimating key diagnostic indicators [5]. Numerous segmentation methods have been developed to improve the robustness and correctness of these estimates. LV segmentation from cardiac magnetic resonance (MR) images provides valuable supplementary data for the analysis and treatment follow-up of CVD, the leading cause of death worldwide [6].
Traditionally, LV segmentation is performed manually, which is monotonous and tedious and requires considerable human involvement and expertise [7]. Moreover, clinicians cannot always cope with the enormous volume of image data that must be delineated. Semi-automatic frameworks were therefore introduced to partly automate the procedure, but they remain time-consuming and still require the full attention of an expert. Achieving both accuracy and speed ultimately demands automation; semi-automatic systems were a first step, and recent technological advances now make fully automatic systems feasible.
Automatic algorithms can handle the tedious searches and the enormous amount of data involved in assessing local and global LV function from cardiac MRI by computing clinical parameters such as ejection fraction (EF), end-systolic volume (ESV), end-diastolic volume (EDV) and myocardial mass [8].
Automatic LV segmentation faces several difficulties, including the presence of papillary muscles and the weak edges around the epicardium, which must be handled properly without manual adjustment of the final results. The variability in the shape of the endocardial and epicardial contours across slices and cardiac phases, which makes delineating the myocardium even more complex, is also addressed in the proposed work. In short, the proposed framework reduces the time required for the analysis of cardiac function and segmentation. The main steps of the proposed method are data preparation, preprocessing, and the light U-Net model. The major contributions of this work correspond to the following steps.
Contributions
An image normalization step is performed prior to processing. The performance of the proposed algorithm is strongly influenced by contrast enhancement, which is an essential prerequisite for the subsequent processing.
Intelligent contrast stretching and histogram equalization further improve images that have already been preprocessed with unsharp masking.
In the encoder path, layers are built from convolutional blocks in which the batch normalization layer is progressively removed and, in the next block, the activation layer is removed as well.
Afterward, up-sampling enlarges the feature maps back to their original size. A significant outcome of the proposed technique is epicardium segmentation with up to 99% precision.
The test dataset includes LVOT (Left Ventricle Outflow Tract) images of both endocardium and epicardium.
The remainder of the paper first reviews previous research. The novel contribution is then introduced: an intelligent image enhancement technique that is applied not to all images in the dataset but only to the low-contrast images that require preprocessing. Afterward, a novel deep learning model that uses the stock U-Net as its backbone, optimized into a lightweight variant, is presented together with its results and an analytical discussion.
2. Related Work
Heart disease is the leading cause of death today, and advances are continuously being made to detect cardiac illness in time. Automatically localizing cardiac MR images to the LV region is an essential step for automated post-imaging processing and analysis planning, for example, in function assessment and segmentation tasks. Cardiac MRI is a useful tool for analyzing heart function and imaging heart structure [9]. Disease can be assessed through either manual or automated segmentation. Manual delineation is slow and tedious because of the huge amount of acquired data, so automated segmentation, which is quick and precise, is in demand to support the analysis of cardiac problems [10].
Basic image-based methodologies [11,12] involving thresholding, shape models, active contours, region-based techniques, level sets, and graph-based techniques have been used to develop automatic segmentation systems and have attained segmentation results above 90 percent. Threshold-based segmentation is simple, but it is effective mainly for images with lighter objects on darker backgrounds [13,14]. Medical image segmentation remains a challenging problem due to the lack of contrast among tissues, substantial background noise, and indistinct object boundaries. In recent years, profile-based segmentation methods such as the active shape model (ASM) [15,16] and the active appearance model (AAM) [17,18] have drawn much attention. ASM in particular has shown its potential in object detection and feature extraction.
Atlas-based methodologies convert the segmentation problem into a registration problem [19,20]. Among more recent research, an active contour model (ACM) was applied to the MICCAI 2009 cardiac dataset [21], and a global-local region-based ACM was combined with the watershed transform [22]. Similarly, a combination of an extended random walk and a high-speed random walk [23], and an ACM-based technique combined with region-based segmentation [24], obtained a good dice value of 0.986. A level-set-based technique [25] also showed good segmentation results, and a two-layered level set with a circular shape constraint [26] achieved a Dice Metric (DM) of 0.92 (endocardium) and 0.94 (epicardium) with an APD of 1.77 and 1.85, respectively. Multi-atlas-based segmentation [27] has been used for the myocardium. Many efforts have been made in the broad field of cardiology, where artificial intelligence (AI), machine learning (ML), computer vision (CV) and deep learning (DL) constitute a set of tools to increase the effectiveness of the cardiologist [28]. The most commonly used AI methodologies are neural networks [29]. These DL methods are being applied to automatic cardiac image segmentation for the evaluation of cardiac function and mass parameters on a variety of datasets [30,31], with effective results.
DL has recently demonstrated strong performance and enormous potential in a number of sectors, and the convolutional neural network (CNN) is among the most widely used DL techniques [32,33,34,35,36,37]. With the rapid growth of AI, particularly DL, image segmentation techniques have achieved solid results [38]. DL offers several benefits over conventional ML and CV techniques in terms of segmentation accuracy and speed [39]. Applying DL to medical image segmentation can therefore efficiently assist clinicians in confirming tumor size, quantitatively evaluating treatment effects before and after therapy, and significantly decreasing their workload. DL also plays a vital role in image classification and has become popular because of its superior accuracy when trained on large amounts of data. Encoder-decoder DL architectures consist of two phases: the encoding phase represents the cardiac MR image and classifies the pixel-level data, while the decoding phase restores the original image resolution, with the LV region highlighted (e.g., in red) to visualize the segmentation [40].
As noted, DL has shown extraordinary potential across different fields, and among DL techniques the CNN is the most broadly used [32,37,41,42,43,44,45,46,47,48,49,50]. CNN architectures were originally developed for whole-image classification, but they can likewise be used for image segmentation. CNN-based segmentation algorithms can solve numerous problems, particularly in medical image analysis, where they have shown remarkable accuracy [51] and robustness in recent years [52,53]. With advances in CNNs [54,55,56,57,58,59,60], most fields of pattern recognition and CV, including image classification, object recognition and image segmentation, have experienced tremendous improvement.
Most DL approaches for LV segmentation are CNN-based [61], including variants and hybrids such as Dense R-CNN with dual attention, up-sampling, dilated convolution and bilinear interpolation [62], multi-channel DL [63], and CNNs combining multi-scale features with a dynamic pixel-wise weighting model [55]. Fully convolutional neural networks (FCNNs) [64] achieve good dice results (Epi: 0.94 ± 0.02, Endo: 0.96 ± 0.02). A YOLO-based network has also been developed for LV segmentation [65], and the anatomically constrained CNN (ACNN) [37] and FCNN [45] likewise produce good segmentation outputs.
3. Data and Method
The processing flow of the novel deep learning-based technique is shown in
Figure 1. The major starting procedure includes the preparation of data, i.e., resizing of input images and pixel-wise normalization of resized images.
Contrast adjustment is then applied to these images using the novel Intelligent Histogram-Based Decision (IHBD) method. The next step trains the network with the novel Light U-Net model, an optimized architecture that applies 2-D convolutions in a distinctive hierarchical fashion, with normalization and activation performed within the layers. The trained model produces delineations of the endocardium and epicardium. Finally, the trained model is tested to extract the segmented LV contours from the dataset images.
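As a simple illustration of the data-preparation step, the sketch below resizes each image to the 256 × 256 input size mentioned in Section 3.2 and normalizes its pixels to the [0, 1] range. The OpenCV/NumPy calls and the min-max normalization are assumptions based on the description, not the authors' exact code.

```python
# Hedged sketch of the data-preparation step: resize to the model's
# 256 x 256 input size and apply pixel-wise min-max normalization.
import cv2
import numpy as np

def prepare_image(img, size=(256, 256)):
    resized = cv2.resize(img, size, interpolation=cv2.INTER_LINEAR)
    resized = resized.astype(np.float32)
    # Pixel-wise min-max normalization to [0, 1]
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-8)
```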
For evaluation purposes, the dataset of the MICCAI 2009 challenge was used. It consists of DICOM short-axis 1.5 T MR images of 45 different patient cases, including heart failure, ischemic heart failure and non-ischemic heart failure patients as well as some normal subjects, together with the corresponding ground truths. The dataset contains 7365 images in total, covering all diastolic and systolic frames across the complete slices of the heart.
3.1. Preprocessing: Intelligent Contrast Stretching and Histogram Equalization
Contrast enhancement is an essential prerequisite of image processing. Improving the visual parameters of the LV in a grayscale image and reducing illumination problems are important for achieving promising results. Since the data were raw and the images often appeared blurry, the first step was to normalize the image pixels. Most images had good contrast, but in a few the contrast was either very high or very low, which could reduce segmentation accuracy. To address this, the Intelligent Histogram-Based Decision (IHBD) method is proposed, which decides whether an image has low contrast; images classified as low-contrast are passed to the image enhancement module [66].
Figure 2a shows sample images with their ground truth drawn from the dataset, illustrating different contrast levels, and Figure 2b shows the corresponding histograms. The horizontal axis represents the gray-level scale, while the vertical axis gives the frequency of image pixels at each shade; most of the information is concentrated in the dark regions of the histograms. The first step is to generate a histogram for every input image. From the histogram, a low bin limit and a high (peak) bin limit are calculated for the image. A decision function then classifies images into low-contrast ROIs and normal ROIs by comparing these bin values: if the low bin limit is higher than the heuristic value of 10 (or the high limit is below the heuristic constant of 240), the image is treated as a normal image; otherwise, it is treated as a low-contrast image. Once this decision is made, the problem of poorly separated shades is handled through a contrast stretching approach, while ROIs with normal contrast are forwarded directly to the segmentation stage without contrast stretching. The high and low bin limits are thus the histogram features most relevant to this decision. Contrast stretching is implemented in MATLAB using the stretchlim function. The histograms comparing low-contrast and normal-contrast images are shown in Figure 2b.
Table 1 shows that normal sample images from the dataset occupy a smaller number of bins, whereas low-contrast images occupy a larger number of bins; low-contrast images show less separation between the low and high limits, while normal images show more. The sequence of decision-making steps in IHBD is summarized in Algorithm 1.
Algorithm 1. Algorithm for IHBD.
Input: Image (I)
Output: Segmented endocardium and epicardium contours
Begin
1. Compute the histogram of the input image I.
2. From the histogram, determine the low bin limit and the high bin limit of I.
3. If the low bin limit is above the heuristic value of 10 (or the high limit is below the heuristic constant of 240), classify I as a normal image; otherwise, classify I as a low-contrast image.
4. If I is low-contrast, apply contrast stretching and histogram equalization; otherwise, leave I unchanged.
5. Pass the (enhanced) image to the Light U-Net model to obtain the endocardium and epicardium contours.
end
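A minimal sketch of the IHBD decision rule is given below. The threshold constants (10 and 240) follow the text, while the NumPy-based bin-limit computation and the percentile stretch (approximating MATLAB's stretchlim/imadjust) are assumptions rather than the authors' implementation.

```python
# Hedged sketch of the IHBD decision and contrast-stretching step.
import numpy as np

def ihbd_classify(gray):
    """Return 'normal' or 'low_contrast' from the image histogram limits."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    occupied = np.nonzero(hist)[0]
    low_limit, high_limit = occupied[0], occupied[-1]
    # Decision rule following the text: 'normal' if the low bin limit exceeds
    # the heuristic value 10 (or the high limit stays below the heuristic
    # constant 240). How the authors combine these conditions is not fully
    # clear, so this is an interpretation.
    if low_limit > 10 or high_limit < 240:
        return "normal"
    return "low_contrast"

def stretch_contrast(gray, low_pct=1, high_pct=99):
    """Percentile-based contrast stretch approximating stretchlim/imadjust."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    out = np.clip((gray - lo) / (hi - lo + 1e-8), 0, 1)
    return (out * 255).astype(np.uint8)

def ihbd(gray):
    if ihbd_classify(gray) == "low_contrast":
        return stretch_contrast(gray)
    return gray  # normal images go straight to segmentation
```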
The IHBD process is applied to every image i in the dataset, so that all images are separated into low-contrast and normal images.
Table 2 shows that for the endocardial contour, 2495 images are low-contrast and 4870 are normal, i.e., 33.87% of the endocardial data is low-contrast. For the epicardial contour, 2247 images are low-contrast and 5118 are normal, i.e., 30.5% of the images are low-contrast. These low-contrast images therefore need to be normalized.
The histogram equalization step uses pixel edge information. A variational map V is generated according to the differences between the intensity of each ROI pixel and the shades of its neighboring pixels, as expressed in Equation (1), where V(i, j) is the variational map, I(i, j) is the input image sample at position (i, j) drawn from the set of all images, and L is the number of distinct intensity levels considered for the low-contrast image, ranging from 0 to L-1, over which the frequency of the different intensities is counted. After detailed analysis, L was set to 15 for this case. A histogram is then computed from the variational map as in Equation (2), where h denotes the histogram of the variational map and M and N are the numbers of rows and columns of pixels in the input image. Next, a probability density function (pdf) is obtained from the histogram by normalizing the count values, as in Equation (3). The cumulative distribution function (CDF) is then computed from the pdf over the variational map using the discrete summation in Equation (4), where the pdf is accumulated over samples ranging up to L-1. Finally, a mapping function T is calculated from the CDF, as given in Equation (5): the intensity levels are iterated with the probability density function and weighted by the cumulative distribution function. The structure of the enhanced image is then preserved using a guided filtering process.
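The sketch below illustrates the standard histogram/pdf/CDF/mapping sequence that the description above follows, with guided filtering applied at the end to preserve structure. The construction of the variational map in Equation (1) is method-specific and is not reproduced here (the plain image histogram is used instead), so this block should be read as an assumption-laden illustration rather than the authors' implementation.

```python
# Hedged sketch of the equalization stage: histogram, pdf, CDF and intensity
# mapping (analogues of Eqs. (2)-(5)), followed by guided filtering to
# preserve structure (requires opencv-contrib-python for cv2.ximgproc).
import cv2
import numpy as np

def equalize_with_guided_filter(gray, radius=8, eps=1e-2):
    gray = gray.astype(np.uint8)

    hist = np.bincount(gray.ravel(), minlength=256)   # histogram counts
    pdf = hist / hist.sum()                           # normalized counts (pdf)
    cdf = np.cumsum(pdf)                              # cumulative distribution
    mapping = np.round(255 * cdf).astype(np.uint8)    # CDF-based mapping

    equalized = mapping[gray]                         # apply the mapping pixel-wise

    # Guided filtering with the input image as guide to preserve structure.
    return cv2.ximgproc.guidedFilter(gray, equalized, radius, eps)
```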
The proposed intelligent contrast stretching is demonstrated in Figure 3. Low-contrast images are shown in Figure 3a; all are clearly dark, with pixel values close to 0. These images are processed through the image adjustment function, which remaps the pixel values so that their weights fall within an acceptable contrast range, as shown in Figure 3b. The final mapping of the adjusted pixels onto the image is shown in Figure 3c, and the results show a prominent enhancement in contrast. The performance of the enhancement algorithm is evaluated quantitatively through PSNR, as shown in Figure 3d; the PSNR values for the images in Figure 3 are 54.95, 54.43, 54.2698 and 54.6299, indicating a remarkable improvement.
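The PSNR values reported here can be computed with the standard formula below; using the pre-enhancement image as the reference is an assumption, since the text does not state which image pair the PSNR is measured between.

```python
# Standard PSNR between a reference image and its enhanced version,
# assuming 8-bit intensities (MAX = 255).
import numpy as np

def psnr(reference, enhanced):
    mse = np.mean((reference.astype(np.float64) -
                   enhanced.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(255.0 ** 2 / mse)
```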
3.2. Light Deep Learning U-Net Model
In a generalized view of the proposed Light U-Net, the encoder uses weights pre-trained on an existing database [67]. The pre-trained encoder combines spatial convolutions with a 3 × 3 kernel size, ReLU activation and batch normalization layers, and takes 256 × 256 pixel input images into the proposed model. The encoder filters and learns the properties of the images, feeding the network through compact depth-to-depth convolutions; by reducing the number of parameters, these inverted blocks make the model simpler and quicker to train. A pre-trained network has the added benefit of improving the model's performance and accelerating convergence. In the decoding path, up-sampling operations expand the feature maps back to their original size. Features from the encoder and decoder blocks are concatenated and passed through a 3 × 3 convolution layer, batch normalization and sigmoid activation. To construct the segmented map, the last block of the network contains a 1 × 1 convolutional layer and a sigmoid activation. A stepwise description of the proposed model is depicted in Figure 4. The Light U-Net is built on the CNN architecture with suitable adaptations to the nature of grayscale MR images; the root elements used as building blocks of the proposed model are also shown in Figure 4.
The encoder and decoder paths are fundamental to the U-Net architecture, and both are used in the Light U-Net to achieve the desired training. The layers employed are convolutional layers, max pooling, ReLU and sigmoid activation functions, concatenation, and up-sampling, through which the data propagate along the encoding and decoding paths of the network. For training, the original ROI is provided as input to the model. The network is composed of encoder blocks (Eb) and decoder blocks (Db), each built around two repeated 3 × 3 convolutional layers (double-conv). The encoder block design is what makes the model unique: three convolutional branches are used in parallel, each with one fewer batch normalization and activation layer than the previous branch. Comparing Eb with Db, Db attaches a transposed (reverse) convolution layer after the double convolution, whereas Eb attaches a max-pooling layer. After the final double convolution, a dense block with a 1 × 1 convolutional layer produces the final segmentation output.
The design of the model is depicted layer by layer in Figure 4. A multiplexed convolutional operation starts from the input: the first convolutional block contains the complete set of convolution, batch normalization and activation; the second, in parallel, skips batch normalization; and, following this reducing pattern, the third branch consists of only one convolutional layer. The outputs of these branches are combined, passed to the next layer, and also concatenated to the corresponding decoder block. The decoder side follows a conventional stacked design of convolutional blocks with batch normalization, activation and up-sampling, as shown in Figure 4.
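The following Keras sketch shows one way such a parallel encoder block could be wired, with three branches that progressively drop the batch normalization and activation layers. The filter counts, the fusion of branches by addition, and the max-pooling placement are assumptions based on the description and Figure 4, not the authors' exact implementation.

```python
# Hypothetical sketch of a Light U-Net encoder block with three parallel
# convolutional branches, each dropping one normalization/activation layer.
from tensorflow.keras import layers

def light_encoder_block(x, filters):
    # Branch 1: full block (conv -> batch norm -> ReLU)
    b1 = layers.Conv2D(filters, 3, padding="same")(x)
    b1 = layers.BatchNormalization()(b1)
    b1 = layers.Activation("relu")(b1)

    # Branch 2: skips batch normalization (conv -> ReLU)
    b2 = layers.Conv2D(filters, 3, padding="same")(x)
    b2 = layers.Activation("relu")(b2)

    # Branch 3: convolution only
    b3 = layers.Conv2D(filters, 3, padding="same")(x)

    # Fuse the branches (addition assumed) and keep the result as the
    # skip connection that is concatenated to the decoder block.
    skip = layers.Add()([b1, b2, b3])
    down = layers.MaxPooling2D(2)(skip)  # Eb ends with max pooling
    return down, skip
```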
Finally, a 1 × 1 convolution layer and the softmax function are used to project the expanded feature set onto two channels so that the predicted segmentation can be produced, as expressed in Equation (6), where a_x(i, j) denotes the value generated by channel x at coordinates (i, j) of the output map, p_x(i, j) denotes the softmax value of channel x at coordinates (i, j), and x = 1, 2. The cross-entropy function then yields the cost function in Equation (7), where the ground-truth channel at the corresponding level is indicated at the given locations. The cost function is minimized using the gradient descent algorithm, as given in Equation (8), where θ denotes the set of all U-Net parameters; after training, an almost optimal model is obtained. To obtain the dual-channel output, the unprocessed ROI is fed as raw input into the trained U-Net, and the final binary mask g is produced as in Equation (9).
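Since the original equations are not reproduced here, the standard forms consistent with the description above (two-channel softmax, pixel-wise cross-entropy, gradient-descent update and argmax mask) are sketched below; the notation is ours and should be treated as a reconstruction, not the paper's exact Equations (6)-(9).

```latex
% Reconstructed standard forms (notation ours): softmax, cross-entropy,
% gradient update and final mask, consistent with the text's description.
p_x(i,j) = \frac{e^{a_x(i,j)}}{\sum_{k=1}^{2} e^{a_k(i,j)}}, \quad x = 1,2
\qquad \text{(cf. Eq. (6))}
\\[4pt]
J(\theta) = -\sum_{i,j} \sum_{x=1}^{2} y_x(i,j)\, \log p_x(i,j)
\qquad \text{(cf. Eq. (7))}
\\[4pt]
\theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta)
\qquad \text{(cf. Eq. (8))}
\\[4pt]
g(i,j) = \arg\max_{x \in \{1,2\}} p_x(i,j)
\qquad \text{(cf. Eq. (9))}
```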
4. Results
The framework is implemented in Spyder (Python 3.9) with a deep learning backend for the training phase, using the Adam optimizer. Each training step takes about 2 s. The initial learning rate of 0.001 was decayed by a factor of 0.95 at each epoch, and the weight decay (L2 regularization) was set to 1 × 10−4. The experiments were executed on an NVIDIA RTX 3070 GPU.
The TensorFlow [68] framework was used as the backend and the Keras [69] library as the frontend to construct the proposed model. The MICCAI 2009 dataset was split into training and testing subsets in proportions of 70% and 30%, respectively. Training was conducted with a learning rate of 0.0001, a batch size of 16 and the Adam optimizer; the model was set to train for up to 100 epochs and converged after 76 epochs. To further prevent overfitting, early stopping regularization was applied on the validation subset, monitoring the validation loss. Training the network took 4 h.
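A hedged sketch of this training setup is given below using the Keras API: Adam optimizer, a learning rate decayed by 0.95 per epoch (the 0.001 initial value from the previous paragraph is used), L2 weight decay of 1e-4, batch size 16, and early stopping on the validation loss (the patience value is an assumption). The tiny placeholder model and dummy arrays stand in for the Light U-Net of Figure 4 and the preprocessed dataset; this is not the authors' code.

```python
# Hedged sketch of the training configuration described in the text.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_light_unet(input_shape=(256, 256, 1)):
    # Tiny placeholder, NOT the Figure 4 architecture.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, padding="same", activation="relu",
                      kernel_regularizer=tf.keras.regularizers.l2(1e-4))(inp)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return models.Model(inp, out)

# Dummy arrays standing in for the preprocessed images and masks.
train_x = np.random.rand(32, 256, 256, 1).astype("float32")
train_y = (np.random.rand(32, 256, 256, 1) > 0.5).astype("float32")

steps_per_epoch = max(1, int(len(train_x) * 0.7) // 16)
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=steps_per_epoch, decay_rate=0.95)

model = build_light_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)
model.fit(train_x, train_y, validation_split=0.3,
          batch_size=16, epochs=100, callbacks=[early_stop])
```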
Standardized measures help to gauge the performance of newly developed techniques. Among these, the most commonly used are the dice metric (DM), accuracy, average perpendicular distance (APD), intersection over union (IoU), recall and precision. The dice metric reports reliability with respect to the manual segmentation result as the pixel-wise overlap between the generated map and the ground truth; in its equation, A denotes the set of points segmented by the proposed approach and M the manually segmented points. The Hausdorff distance (HD) measures the distance between the ground-truth contour and the segmented region. Intersection over union (IoU) is computed image by image between the ground truth (GT) and the predicted segmentation (IP). Recall measures the proportion of the ground-truth boundary recovered by the algorithm-computed boundary, while precision measures how much of the proposed result agrees with the ground truth, i.e., the probability that a predicted point is valid. In each case, the proposed result is compared with the gold-standard manual contour; the corresponding overlap formulas are summarized below.
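For clarity, the standard definitions of these overlap measures (dice, IoU, precision, recall) can be computed as in the following sketch; this is a generic reconstruction consistent with the descriptions above, not necessarily the authors' exact evaluation code.

```python
import numpy as np

def overlap_metrics(pred, gt):
    """Standard pixel-wise overlap measures between a predicted binary mask
    (pred, the proposed result A) and the manual ground-truth mask (gt, M)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()            # |A intersect M|
    dice = 2 * tp / (pred.sum() + gt.sum() + 1e-8)
    iou = tp / (np.logical_or(pred, gt).sum() + 1e-8)
    precision = tp / (pred.sum() + 1e-8)           # fraction of valid predicted pixels
    recall = tp / (gt.sum() + 1e-8)                # fraction of ground truth recovered
    return dice, iou, precision, recall
```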
4.1. Experiment 01: LV Endocardium Segmentation Results
The number of epochs is a hyperparameter that controls how many complete passes of gradient descent are made over the full training dataset. When the next pass over the same dataset begins, i.e., the next epoch, the previously established weights are updated again. Underfitting and overfitting are the two fundamental issues that plague epoch selection. Gradient descent iteratively adjusts the weights of the neural network: if the network is trained for only a few epochs, it underfits the data, meaning it cannot capture the underlying trend. Increasing the number of epochs eventually reaches an optimal point at which the greatest accuracy on the training set is obtained, which in turn yields improved results on the testing set; continuing to increase the number of epochs beyond this point causes overfitting. This hyperparameter requires tuning, and the number of epochs needed to reach the optimum level of training cannot be predicted in advance; one can only employ a few heuristics and a fixed epoch budget while monitoring the accuracy of the testing results.
Table 3 shows that with 20 epochs, the initial values of accuracy, dice coefficient, IoU, loss and precision were low. As the number of epochs increased, the values improved continuously, gaining precision and yielding good results. Increasing the number of epochs initially produced large improvements; however, as the number of epochs approached 60, the learning curve flattened and the maximum accuracy was reached. For the endocardium, accuracy reached 92.7% and precision up to 98.17%, a good result, while the dice coefficient reached up to 97.1%.
The novel technique of deep learning shows promising results for all slices of the heart and for all phases of the heartbeat, i.e., both systolic and diastolic. In addition, this novel approach shows good results for the LVOT images as well, which are extremely difficult images to segment. Randomly, a few images from different cases are selected to show the visual results of the proposed novel deep learning model, which are shown in
Figure 5.
Figure 5 is divided into three columns. The first column shows the input image of the LV with the ground truth, and the second column shows the mask applied to the ground-truth images. The third column shows the results of the novel Light U-Net algorithm obtained after the mask is applied. The results clearly show the remarkable performance of the novel deep learning algorithm. Images from the basal slice, apical slice and LVOT are included in the figure. The purpose of this comparison is to show the difference and improvement achieved by the novel Light U-Net model over the U-Net model on the endocardial images of the dataset.
Figure 6 graphically presents the performance of the endocardium segmentation in terms of accuracy, dice measure, IoU, loss, precision and recall against the number of epochs. The blue and red lines in Figure 6 show the training and testing results of the novel Light U-Net model, respectively, while the U-Net model's training and testing curves are shown in green and purple, respectively. In Figure 6a, the accuracy of the proposed model can clearly be seen to improve upon that of the U-Net model.
Figure 6b shows the dice coefficient curves. In the initial epochs, the dice coefficient increases rapidly, but after about 15 epochs the improvement slows, and the maximum value is reached at 60 epochs.
Figure 6c provides the IoU results of the endocardium, which increases swiftly at the start and provides the best results at 60 epochs. The loss values are lower over the training data at the end of each epoch. In
Figure 6d, our Light U-Net model provides the minimum loss at 60 epochs, and initially, until 30 epochs, the value of loss decreases gradually. The proposed algorithm results of precision, as shown in
Figure 6e, jump from low values to very high values within a single epoch, after which the change is small and the curve is nearly flat. The recall values of the endocardium are given in Figure 6f; these curves show an unusual pattern, as the recall curve of the U-Net model first decreases and then suddenly increases, while the Light U-Net model increases steadily up to 60 epochs.
4.2. Experiment 02: LV Epicardium Segmentation Results
The epicardium results are just as important as the endocardium results. Initially, 20 epochs gave an accuracy of 92.6% and a precision of up to 98.6%. As the number of epochs increases, Table 4 shows that the epicardium results gain precision and improve steadily. At 60 epochs, the values are fully accurate and precise: accuracy reaches 93.7%, precision 99.9% and the dice coefficient 97.1%. The improvement in the results thus follows from increasing the number of epochs.
The results of the novel Light U-Net are obtained on the ground-truth images after the mask has been applied. The novel deep learning technique produces promising outcomes for all heart slices and for both the systolic and diastolic stages of the heartbeat. To demonstrate the visual results of the proposed model, a few randomly selected images from various cases are provided in
Figure 7. The results clearly demonstrate the novel deep learning algorithm’s outstanding performance.
Three columns make up
Figure 7, following the same layout as the endocardium results. The first column displays the LV input image with ground truth, and the second column shows the mask applied to the ground-truth images. The LVOT images, which are difficult to segment, also respond well to the novel approach. This comparison highlights the difference between the Light U-Net model and the stock U-Net model for epicardial contours.
Figure 8 provides the graphical trends of the epicardium against the number of epochs using the statistical performance measures described previously. As for the endocardium, the blue and red lines show the results of the novel Light U-Net model, while the results of the U-Net model are indicated by green and purple lines. The overall trends show that the proposed model yields better results.
4.3. Experiment 03: Comparison of Contour Results between the Proposed Model and the U-Net Model
A comprehensive comparison of the model's workflow and its end results is necessary. The plain U-Net architecture was implemented on dataset mask images with ground truth marked on them, obtained from the data provided through the Sunnybrook evaluation code in MATLAB. The segmentation measures produced by both the U-Net implementation and the proposed Light U-Net model are shown in Table 5. The same hyperparameter values, dataset and evaluation metrics were used for both models so that their performance can be compared. Overall, the novel Light U-Net performs well on the LV segmentation challenge, and Table 5 further shows that its segmentation results are more accurate than those of the plain U-Net. This indicates that using the Light U-Net to confine the actual MRI is an essential step that lowers the problem complexity and, consequently, the likelihood of segmentation errors.
4.4. Left Ventricle Segmentation Results of MICCAI Dataset
The LV results of the deep learning methodology are also evaluated on the publicly available MICCAI 2009 dataset, which contains short-axis MRI images covering everything from the outflow tract to the apex of the heart across the ED to ES phases. Each patient's data is in a separate folder and includes the images of a full heart scan with the heartbeat phases. This dataset is used for training the model and then for testing. As shown in Table 6 below, selecting 60 epochs gave an accuracy of 0.932, precision of 0.9985, dice metric of 0.988, IoU of 0.9695 and loss of 0.0135. These results indicate that increasing the number of epochs improves the results.
4.5. Proposed Segmentation Results Comparison with Existing Techniques
To benchmark the performance of the proposed technique, the standard, publicly available MICCAI 2009 dataset is used. For comparison, relevant state-of-the-art methods are considered, such as ResU-Net, Half-U-Net, multi-scale, multi-skip and dilated-dense networks, and a combination of CNN and U-Net.
Table 7 below illustrates that the proposed method obtained improved performance compared to existing methods.
5. Discussion
U-Net is well suited to segmentation problems. U-Net and CNN architectures can preserve pixel-level information, which is the main reason for choosing them for segmentation. The MICCAI 2009 dataset covers four different patient types with whole heart scans from apex to base, and all systolic and diastolic phase data are present along with their ground truths. The basic U-Net model does not provide the best accuracy, so modifications were made to enhance the results: new layers were added to the convolutional base in a unique minimizing hierarchy, which improved the results, and the data enhancement step also contributed. The comparison between the U-Net and the proposed Light U-Net segmentation models is depicted in the figure below. The input image, along with its ground truth, is used by both models as shown in
Figure 9a. Then, a mask is applied in the second step as shown in
Figure 9b. The difference in the results of both models is depicted in the figure.
In
Figure 9c, the U-Net segmentation produces rough results with visible distortion, while the proposed Light U-Net model, as shown in Figure 9d, shows a marked difference and provides enhanced results. Image number 3 in the figure shows the LVOT: with the U-Net model, the LVOT is not completely segmented and the result is unclear, whereas the Light U-Net segments the LVOT image very well, a clear improvement over the U-Net.
6. Conclusions and Future Work
The proposed method contributes to image processing and normalization through intelligent, histogram-based selection of images for the novel enhancement technique. The novel Light U-Net model, with its unique parallel structure of convolutional layers on the encoder side, forms the core of the architecture. On an authentic and well-known dataset, the method demonstrated excellent performance, outperforming recent approaches with a dice coefficient of 97.1%, a testing accuracy of 92% and precision of up to 98.17%. The proposed method is intended to replace manual and semi-automatic LV segmentation. The high levels of accuracy and precision now attainable by deep learning models motivated the use of neural network methods, and, as expected, the novel U-Net variant combined with the novel enhancement technique outperformed recently developed algorithms.
The shape and size of the endocardium and epicardium in the apical slices and at the LVOT are not clearly predictable, which poses significant challenges to segmentation; our model also struggles at these stages, and future techniques will need to address them by adding or tuning training parameters. The use of a vision transformer could further enhance the model, and larger training and testing datasets may also lead to better training configurations.