This paper presents the design of a multiscale dilated parallel deep CNN (dilated PDCNN) technique with two simultaneous paths to extract multiscale detail characteristics from MRI images. Dilated convolution is used to increase the receptive field without adding more parameters to the network. Additionally, batch normalization is used to ensure that the model’s accuracy does not drop as the network depth increases.
Both local and global characteristics are acquired in the dilated PDCNN framework through the corresponding local and global routes, whereas most DCNN-based methods cannot effectively capture both local and global information because of their small receptive fields. Although dilated convolution maintains data resolution at the output layer and expands the receptive field without additional computation, stacking multiple dilated convolutions has the disadvantage of creating a gridding effect. With too small a dilation factor (DF), the receptive field remains small and the model may still miss the coarse features; with an excessive DF, the model is unable to pick up the finer details. Suitable DFs are therefore chosen for both the local and global feature paths by comparing various DFs. In each path, every convolutional layer uses the ReLU activation function and is followed by a max-pooling layer that down-samples the convolutional layer’s output. Finally, after four ML classifiers have been trained on the images, an average ensemble method is employed to carry out the brain tumor categorization.
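As a minimal illustration of this property (our own Keras sketch, not the authors' code; the layer sizes are assumptions), the snippet below compares a standard and a dilated 5 × 5 convolution on a 32 × 32 × 1 input: both hold exactly the same number of parameters, while the dilated kernel covers a much wider receptive field.

```python
# Minimal sketch (not the authors' implementation): a dilated convolution enlarges the
# receptive field while keeping the parameter count of a standard convolution.
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 1))            # MRI patch size used in this work
standard = tf.keras.layers.Conv2D(128, 5, padding="same",
                                  dilation_rate=1, activation="relu")(inputs)
dilated = tf.keras.layers.Conv2D(128, 5, padding="same",
                                 dilation_rate=4, activation="relu")(inputs)
model = tf.keras.Model(inputs, [standard, dilated])

# Both layers hold 5*5*1*128 weights + 128 biases = 3328 parameters each,
# but the dilated kernel spans (5 - 1) * 4 + 1 = 17 pixels instead of 5.
for layer in model.layers[1:]:
    print(layer.name, layer.count_params())
```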
3.4.1. Multiscale Feature Selection Path
CNNs have been used extensively in the field of medicine and have demonstrated good results in the segmentation and classification of medical images [24]. CNN architectures are built from a variety of building blocks, such as fully connected (FC) layers, pooling layers, and convolution layers. Convolution layers, which combine linear and nonlinear operations, that is, convolution operations and activation functions, are used for feature extraction [25,26]. Convolution layers are characterized by their kernels and the associated hyperparameters, such as the size, number, stride, padding, and activation function of each kernel [27]. Six convolution layers are used in the two simultaneous paths, and the convolution operation is performed using Equation (1).
$$x_{i,j}^{\,k,n} = f\!\left(\sum_{u}\sum_{v} w_{u,v}^{\,k,n}\, x_{i+u,\,j+v}^{\,n-1} + b^{\,k,n}\right) \qquad (1)$$
where, for the $k$th kernel in layer $n$, $x_{i,j}^{\,k,n}$ expresses the resultant feature map at position $(i,j)$, $w_{u,v}^{\,k,n}$ represents the weight vector’s values, $x_{i+u,\,j+v}^{\,n-1}$ indicates the input vector at position $(i+u, j+v)$ in layer $n-1$, and $b^{\,k,n}$ is the bias. In addition, $f(\cdot)$ is the activation function [28].
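The NumPy sketch below (our own illustration with assumed array shapes, not the authors' code) spells out the computation of Equation (1) for a single kernel, including the bias term and the activation function.

```python
# Hedged NumPy sketch of Equation (1) for one kernel: a valid 2-D convolution (implemented
# as cross-correlation, as in most CNN frameworks), a bias term, and an activation f.
import numpy as np

def conv2d_single_kernel(x, w, b, f=lambda v: np.maximum(0.0, v)):
    """x: (H, W) input map, w: (m, n) kernel weights, b: scalar bias, f: activation."""
    H, W = x.shape
    m, n = w.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Equation (1): weighted sum over the kernel window plus bias
            out[i, j] = np.sum(w * x[i:i + m, j:j + n]) + b
    return f(out)

x = np.random.rand(32, 32)      # one 32 x 32 input channel
w = np.random.randn(5, 5)       # one 5 x 5 kernel
print(conv2d_single_kernel(x, w, b=0.1).shape)   # -> (28, 28)
```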
By down-sampling, pooling layers reduce the dimensionality of the feature maps. The stride, padding, and filter size are among the hyperparameters of pooling layers, although pooling layers contain no learnable parameters. Two common varieties of pooling layers are max pooling and global average pooling; max-pooling layers are used in this structure. The output size of the pooling operation in the CNN is calculated using Equation (2).
$$O = \frac{W - K + 2P}{S} + 1 \qquad (2)$$
where $W$ stands for the dimension of the input, $K$ is the kernel size, the padding size is denoted by $P$, and $S$ is the stride size [28].
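A small sketch of Equation (2), using the 2 × 2 max pooling employed in this structure as an assumed example:

```python
# Hedged sketch of Equation (2): spatial output size of a convolution/pooling operation
# given input size W, kernel size K, padding P, and stride S.
def output_size(W, K, P, S):
    return (W - K + 2 * P) // S + 1

# Example: a 2 x 2 max pooling with stride 2 and no padding halves a 32 x 32 map to 16 x 16.
print(output_size(W=32, K=2, P=0, S=2))   # -> 16
```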
The pooling layers’ feature maps are flattened into one-dimensional (1D) vectors and passed to the FC layers. The most popular activation function for FC layers is the rectified linear unit (ReLU), which is illustrated in Equation (3).
$$f(x) = \max(0, x) \qquad (3)$$
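The following sketch (an assumption-laden illustration, not the authors' code) shows the flattening step and the ReLU of Equation (3) applied to an FC layer:

```python
# Hedged NumPy sketch: flattening pooled feature maps into a 1-D vector and applying a
# fully connected layer with the ReLU activation of Equation (3). Sizes are assumed.
import numpy as np

def relu(v):                      # Equation (3): f(x) = max(0, x)
    return np.maximum(0.0, v)

feature_maps = np.random.rand(4, 4, 96)          # pooled maps from one path (assumed shape)
flat = feature_maps.reshape(-1)                  # 1-D vector of length 1536
W = np.random.randn(64, flat.size) * 0.01        # FC weights (64 hidden nodes, assumed)
b = np.zeros(64)
fc_out = relu(W @ flat + b)                      # FC layer followed by ReLU
print(fc_out.shape)                              # -> (64,)
```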
The final FC layer’s activation function is usually SoftMax for multiclass categorization and Sigmoid for binary classification. The node values in the final FC layer of the proposed model are computed using Equation (4), and the sigmoid activation function for the binary categorization dataset-I is calculated using Equation (5) [25].
$$z = \sum_{i} w_i x_i + b \qquad (4)$$
$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \qquad (5)$$
where $z$ stands for the neural network layer’s internal calculation, $b$ denotes the bias, and $w_i$ stands for the weights used to determine an output node’s value. Furthermore, the input vector and the output class are denoted by $x$ and $y$, respectively.
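The sketch below illustrates Equations (4) and (5) directly in NumPy; the vector length and weight values are placeholder assumptions:

```python
# Hedged NumPy sketch of Equations (4) and (5): the output-node value z as a weighted sum
# of inputs plus a bias, followed by the sigmoid used for the binary dataset-I.
import numpy as np

def node_value(w, x, b):          # Equation (4): z = sum_i w_i * x_i + b
    return np.dot(w, x) + b

def sigmoid(z):                   # Equation (5): 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.rand(64)            # FC-layer input vector (assumed length)
w = np.random.randn(64) * 0.01
z = node_value(w, x, b=0.0)
p_tumor = sigmoid(z)              # probability of the positive (tumor) class
print(float(p_tumor))
```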
The SoftMax activation function, calculated using Equation (6), is used for the multiclass categorization of the Figshare dataset-II and the Kaggle dataset-III in this proposed structure.
$$P(y = c \mid x) = \frac{e^{z_c}}{\sum_{j=1}^{C} e^{z_j}} \qquad (6)$$
where $x$ stands for the input vector and $c$ for the class in the case of a multiclass categorization problem. Additionally, the $j$th component of the class rating vector in the final FC layer is denoted by $z_j$. The category $c$ with the highest SoftMax coefficient is chosen as the output class of the SoftMax activation function [25]. A backpropagation algorithm is used during CNN training to adjust the weights of the FC and convolution layers. The two main elements of backpropagation are the loss function and gradient descent (GD), of which GD is used to minimize the loss function. Among the loss functions most frequently employed by CNNs is the cross-entropy (CE) loss function. For the binary categorization dataset-I with the sigmoid activation function, the CE loss function is computed using Equation (7).
$$L_{CE} = -\big(y \log \sigma(z) + (1 - y)\log(1 - \sigma(z))\big) \qquad (7)$$
where $z$ is computed using Equation (4). For the multiclass categorization Figshare dataset-II and Kaggle dataset-III with the SoftMax activation function, the CE loss function is calculated using Equation (8) [28,29].
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{e^{z_{y_i}}}{\sum_{j=1}^{C} e^{z_j}}\right) \qquad (8)$$
where $N$ denotes the quantity of training elements, the class of input image $i$ is indicated by $y_i$, and the $j$th component of the category scores vector in the final FC layer is represented by $z_j$ [28].
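For reference, the following NumPy sketch implements Equations (6)-(8) as described above; the class counts and score values are illustrative assumptions, not values from the paper:

```python
# Hedged NumPy sketch of Equations (6)-(8): the SoftMax activation and the binary and
# multiclass cross-entropy losses described above.
import numpy as np

def softmax(z):                                   # Equation (6)
    e = np.exp(z - np.max(z))                     # shifted for numerical stability
    return e / np.sum(e)

def binary_ce(y, z):                              # Equation (7), with z from Equation (4)
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def multiclass_ce(class_scores, labels):          # Equation (8)
    # class_scores: (N, C) final-FC scores z; labels: (N,) true class indices y_i
    losses = [-np.log(softmax(z)[y]) for z, y in zip(class_scores, labels)]
    return np.mean(losses)

z_scores = np.array([[2.0, 0.5, -1.0, 0.1]])      # one sample, four classes (assumed)
print(softmax(z_scores[0]))                       # class probabilities summing to 1
print(multiclass_ce(z_scores, labels=np.array([0])))
print(binary_ce(y=1, z=1.3))
```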
Expanding the receptive field in deep learning typically involves increasing the size and depth of the convolution kernels, which in turn increases the number of parameters in the network. By inserting zero weights into the conventional convolution kernel, dilated convolution can enlarge the receptive field without adding network parameters.
Equation (9) defines the 1-D convolution operator $*$, which connects the input image $F$ with the kernel $k$:
$$(F * k)(p) = \sum_{s + t = p} F(s)\,k(t) \qquad (9)$$
This 1-D convolution corresponds to the standard CNN. Upon the introduction of a DF denoted as $l$ and through its expansion, $*_l$ is referred to as an $l$-dilated convolution, and the network is identified as a dilated CNN as $l$ rises:
$$(F *_l k)(p) = \sum_{s + l\,t = p} F(s)\,k(t) \qquad (10)$$
Using Equation (10), the dilated convolution operation is calculated in this proposed structure; the fundamental CNN corresponds to $l = 1$ [29,30].
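A hedged NumPy sketch of the $l$-dilated 1-D convolution of Equation (10) is given below (our own illustration, with the indexing following the cross-correlation convention of common frameworks); setting $l$ = 1 recovers the standard convolution of Equation (9):

```python
# Hedged NumPy sketch of the l-dilated 1-D convolution of Equation (10).
import numpy as np

def dilated_conv1d(F, k, l=1):
    """Valid 1-D dilated convolution: out[p] = sum_t F[p + l*t] * k[t]."""
    n, m = len(F), len(k)
    span = (m - 1) * l + 1                      # receptive field covered by the kernel
    return np.array([np.sum(F[p:p + span:l] * k) for p in range(n - span + 1)])

F = np.arange(16, dtype=float)                  # 1-D input signal
k = np.array([1.0, 0.0, -1.0])                  # 3-tap kernel
print(dilated_conv1d(F, k, l=1))                # standard CNN case (l = 1)
print(dilated_conv1d(F, k, l=4))                # dilated case: 9-sample receptive field
```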
The main function of the dilated convolution layer is to extract features. In addition to fine, high-level feature details, MRI images also contain rough, low-level information, so image data must be extracted at several scales. Specifically, the local and global routes are employed to obtain the local and global features. Within the local route, the convolutional layers use a small 5 × 5 pixel window to capture low-level details of the images, whereas the convolutional stages of the global path use larger 12 × 12 pixel filters. The same 5 × 5 filters are used by three different convolution layers throughout the local path, and the coarse feature maps are produced solely by the decreasing sequence of high DFs (4, 2, 1) applied in successive layers. Similarly, three distinct convolution layers in the global path employ identical 12 × 12 filters, and the generation of the finer feature maps depends exclusively on the small DFs (2, 1, 1) of each layer. As illustrated in
Figure 4, three convolution layers with different numbers of filters (128, 96, 96) are applied in each feature extraction path to extract image data at various scales.
Conv1, Conv3, and Conv4 provide the local, coarse features, while Conv2, Conv5, and Conv6 supply the global, fine features. A max-pooling layer is employed after each convolutional layer in each path to down-sample the output of the convolutional layer. By employing a 2 × 2 kernel, the max-pooling layers reduce the dimensions of the resulting feature maps.
A dimension of (32, 32, 1) is assigned to each input tensor in the proposed model’s structure. To test the impact of the DF on the model’s efficiency and to understand the gridding effect introduced by the dilation approach, the interior design is kept as simple as possible. In the local path, layer Conv1 applies a 5 × 5 filter with a dilation factor of $l$ = 4 to generate coarse feature maps (such as shapes and contours), layer Conv3 applies the same filter with a dilation factor of $l$ = 2, and the final convolution, layer Conv4, applies a 5 × 5 filter with a dilation factor of $l$ = 1 to generate coarse feature maps once more. In the global route, layer Conv2 applies a 12 × 12 filter with a dilation factor of $l$ = 2, layer Conv5 applies the same filter with a dilation factor of $l$ = 1, and the final convolution, layer Conv6, applies a 12 × 12 filter with a dilation factor of $l$ = 1 to generate fine feature maps. The ReLU activation function is utilized by all six convolutional stages.
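The following Keras sketch is our own reconstruction of the two-path structure described above, not the authors' released code; the 'same' padding, the batch-normalization placement, and the FC width are assumptions, and the downstream ML-classifier ensemble is omitted:

```python
# Hedged Keras sketch of the two simultaneous feature-extraction paths: a local path
# (Conv1, Conv3, Conv4) with 5 x 5 kernels and DFs (4, 2, 1), and a global path
# (Conv2, Conv5, Conv6) with 12 x 12 kernels and DFs (2, 1, 1). Each path uses
# (128, 96, 96) filters, ReLU, and 2 x 2 max pooling, followed by feature fusion,
# a dropout-regularized FC layer, and a task-dependent output layer.
import tensorflow as tf
from tensorflow.keras import layers

def conv_path(x, kernel_size, dilation_rates, layer_names):
    for filters, d, name in zip((128, 96, 96), dilation_rates, layer_names):
        x = layers.Conv2D(filters, kernel_size, dilation_rate=d, padding="same",
                          activation="relu", name=name)(x)
        x = layers.BatchNormalization(name=name + "_bn")(x)       # placement assumed
        x = layers.MaxPooling2D(pool_size=(2, 2), name=name + "_pool")(x)
    return x

def build_dilated_pdcnn(num_classes=2):
    inputs = tf.keras.Input(shape=(32, 32, 1))                    # (32, 32, 1) MRI input tensor
    local = conv_path(inputs, 5, (4, 2, 1), ("conv_1", "conv_3", "conv_4"))    # coarse features
    glob = conv_path(inputs, 12, (2, 1, 1), ("conv_2", "conv_5", "conv_6"))    # fine features
    fused = layers.Concatenate(name="feature_fusion")(
        [layers.Flatten()(local), layers.Flatten()(glob)])
    fc = layers.Dropout(0.5)(layers.Dense(256, activation="relu")(fused))      # FC + dropout
    if num_classes == 2:                                          # dataset-I: sigmoid output
        outputs = layers.Dense(1, activation="sigmoid")(fc)
        loss = "binary_crossentropy"
    else:                                                         # datasets II/III: SoftMax output
        outputs = layers.Dense(num_classes, activation="softmax")(fc)
        loss = "sparse_categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs, name="dilated_pdcnn_sketch")
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

build_dilated_pdcnn(num_classes=4).summary()
```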
3.4.4. Feature Map of Dilated Convolutional Layers
A CNN feature map represents specific attributes of the input image as the result of a convolutional layer. It is produced by filtering the input images or the previous layer’s feature maps. The feature maps acquired from every convolutional layer are presented in
Figure 5 and
Figure 6. In
Figure 5, the low-level and coarse features of the three convolutional layers conv_1, conv_3, and conv_4, which have 128, 96, and 96 filters, respectively, are displayed. The feature maps in this figure are primarily composed of coarse and local features, which represent the texture in an image. In this local path, a dilated CNN with DFs of ($l_1$ = 4, $l_2$ = 2, $l_3$ = 1) is referred to as a dilated PDCNN (4,2,1). In
Figure 6, the high-level feature maps, including contour representations, shape descriptors, and fine features, of the three deeper convolutional layers conv_2, conv_5, and conv_6, which have the same numbers of filters, are shown. DFs of ($l_1$ = 2, $l_2$ = 1, $l_3$ = 1) are used in this global path. The multiscale feature maps, which are displayed in
Figure 7, are greatly improved when these features are combined using a feature fusion technique.
Figure 8 displays the final extracted multiscale features, along with a fully connected layer in which the dropout technique is employed to prevent overfitting.
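Feature maps such as those in Figures 5-7 can be inspected by querying the convolutional layers of a trained model. The sketch below reuses the build_dilated_pdcnn function and the assumed layer names from the earlier sketch; it is an illustration, not the authors' visualization code:

```python
# Hedged sketch of extracting intermediate feature maps: a Keras sub-model returning the
# activations of the six dilated convolution layers for one input MRI slice.
import numpy as np
import tensorflow as tf

model = build_dilated_pdcnn(num_classes=4)            # from the earlier sketch (assumed)
conv_names = ["conv_1", "conv_3", "conv_4",           # local path (coarse features)
              "conv_2", "conv_5", "conv_6"]           # global path (fine features)
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=[model.get_layer(n).output for n in conv_names])

mri_slice = np.random.rand(1, 32, 32, 1).astype("float32")   # placeholder input image
for name, maps in zip(conv_names, feature_extractor(mri_slice)):
    print(name, maps.shape)      # e.g. conv_1 -> (1, 32, 32, 128)
```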