This paper presents the design of a multiscale dilated parallel deep CNN (dilated PDCNN) technique with two simultaneous paths to extract multiscale detail characteristics from MRI images. Dilated convolution is used to increase the receptive field without adding more parameters to the network. Additionally, batch normalization is used to ensure that the model’s accuracy does not drop as the network depth increases.
Both local and global characteristics are acquired in the dilated PDCNN framework through the corresponding local and global routes, whereas most DCNN-based methods cannot effectively capture both local and global information because of their small receptive fields. Although dilated convolution maintains data resolution at the output layer and expands the receptive field without additional computation, stacking multiple dilated convolutions has the disadvantage of creating a gridding effect. With too small a dilation factor (DF), the receptive field remains small and the model may still miss the coarse features; with an excessive DF, the model is unable to pick up the finer details. Suitable DFs are therefore chosen for both the local and global feature paths by comparing various DFs. In each path, every convolutional layer uses the ReLU activation function and is followed by a max-pooling layer that down-samples the convolutional layer’s output. Finally, after four ML classifiers have been trained on the images, an average ensemble method is employed to carry out the brain tumor categorization.
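As a minimal illustration of this property (our own Keras sketch, not the authors' code; the layer sizes are assumptions), the snippet below compares a standard and a dilated 5 × 5 convolution on a 32 × 32 × 1 input: both hold exactly the same number of parameters, while the dilated kernel covers a much wider receptive field.

```python
# Minimal sketch (not the authors' implementation): a dilated convolution enlarges the
# receptive field while keeping the parameter count of a standard convolution.
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 1))            # MRI patch size used in this work
standard = tf.keras.layers.Conv2D(128, 5, padding="same",
                                  dilation_rate=1, activation="relu")(inputs)
dilated = tf.keras.layers.Conv2D(128, 5, padding="same",
                                 dilation_rate=4, activation="relu")(inputs)
model = tf.keras.Model(inputs, [standard, dilated])

# Both layers hold 5*5*1*128 weights + 128 biases = 3328 parameters each,
# but the dilated kernel spans (5 - 1) * 4 + 1 = 17 pixels instead of 5.
for layer in model.layers[1:]:
    print(layer.name, layer.count_params())
```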
3.4.1. Multiscale Feature Selection Path
CNNs have been used extensively in the field of medicine and have demonstrated good results in the segmentation and classification of medical images [24]. CNN architectures are built from a variety of building blocks, such as fully connected (FC) layers, pooling layers, and convolution layers. Convolution layers, which combine linear and nonlinear operations, that is, convolution operations and activation functions, are used for feature extraction [25,26]. Convolution layers are characterized by their kernels and the associated hyperparameters, such as the size, number, stride, padding, and activation function of each kernel [27]. Six convolution layers are used in the two simultaneous paths, and the convolution operation is performed using Equation (1).
$$x_{i,j}^{\,k,n} = f\!\left(\sum_{u}\sum_{v} w_{u,v}^{\,k,n}\, x_{i+u,\,j+v}^{\,n-1} + b^{\,k,n}\right) \qquad (1)$$
where, for the $k$th kernel in layer $n$, $x_{i,j}^{\,k,n}$ expresses the resultant feature map at position $(i,j)$, $w_{u,v}^{\,k,n}$ represents the weight vector’s values, $x_{i+u,\,j+v}^{\,n-1}$ indicates the input vector at position $(i+u, j+v)$ in layer $n-1$, and $b^{\,k,n}$ is the bias. In addition, $f(\cdot)$ is the activation function [28].
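The NumPy sketch below (our own illustration with assumed array shapes, not the authors' code) spells out the computation of Equation (1) for a single kernel, including the bias term and the activation function.

```python
# Hedged NumPy sketch of Equation (1) for one kernel: a valid 2-D convolution (implemented
# as cross-correlation, as in most CNN frameworks), a bias term, and an activation f.
import numpy as np

def conv2d_single_kernel(x, w, b, f=lambda v: np.maximum(0.0, v)):
    """x: (H, W) input map, w: (m, n) kernel weights, b: scalar bias, f: activation."""
    H, W = x.shape
    m, n = w.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Equation (1): weighted sum over the kernel window plus bias
            out[i, j] = np.sum(w * x[i:i + m, j:j + n]) + b
    return f(out)

x = np.random.rand(32, 32)      # one 32 x 32 input channel
w = np.random.randn(5, 5)       # one 5 x 5 kernel
print(conv2d_single_kernel(x, w, b=0.1).shape)   # -> (28, 28)
```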
By down-sampling, pooling layers reduce the dimensionality of the feature maps. The stride, padding, and filter size are among the hyperparameters of pooling layers, although pooling layers contain no learnable parameters. Two common varieties of pooling layers are max pooling and global average pooling; max-pooling layers are used in this structure. The output size of the pooling operation in the CNN is calculated using Equation (2).
$$O = \frac{W - K + 2P}{S} + 1 \qquad (2)$$
where $W$ stands for the dimension of the input, $K$ is the kernel size, the padding size is denoted by $P$, and $S$ is the stride size [28].
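A small sketch of Equation (2), using the 2 × 2 max pooling employed in this structure as an assumed example:

```python
# Hedged sketch of Equation (2): spatial output size of a convolution/pooling operation
# given input size W, kernel size K, padding P, and stride S.
def output_size(W, K, P, S):
    return (W - K + 2 * P) // S + 1

# Example: a 2 x 2 max pooling with stride 2 and no padding halves a 32 x 32 map to 16 x 16.
print(output_size(W=32, K=2, P=0, S=2))   # -> 16
```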
The pooling layers’ feature maps are flattened into one-dimensional (1D) vectors and passed to the FC layers. The most popular activation function for FC layers is the rectified linear unit (ReLU), which is illustrated in Equation (3).
$$f(x) = \max(0, x) \qquad (3)$$
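The following sketch (an assumption-laden illustration, not the authors' code) shows the flattening step and the ReLU of Equation (3) applied to an FC layer:

```python
# Hedged NumPy sketch: flattening pooled feature maps into a 1-D vector and applying a
# fully connected layer with the ReLU activation of Equation (3). Sizes are assumed.
import numpy as np

def relu(v):                      # Equation (3): f(x) = max(0, x)
    return np.maximum(0.0, v)

feature_maps = np.random.rand(4, 4, 96)          # pooled maps from one path (assumed shape)
flat = feature_maps.reshape(-1)                  # 1-D vector of length 1536
W = np.random.randn(64, flat.size) * 0.01        # FC weights (64 hidden nodes, assumed)
b = np.zeros(64)
fc_out = relu(W @ flat + b)                      # FC layer followed by ReLU
print(fc_out.shape)                              # -> (64,)
```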
The final FC layer’s activation function is usually SoftMax for multiclass categorization and Sigmoid for binary classification. The node values in the final FC layer of the proposed model are computed using Equation (4), and the sigmoid activation function for the binary categorization dataset-I is calculated using Equation (5) [25].
$$z = \sum_{i} w_i x_i + b \qquad (4)$$
$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \qquad (5)$$
where $z$ stands for the neural network layer’s internal calculation, $b$ denotes the bias, and $w_i$ stands for the weights used to determine an output node’s value. Furthermore, the input vector and the output class are denoted by $x$ and $y$, respectively.
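The sketch below illustrates Equations (4) and (5) directly in NumPy; the vector length and weight values are placeholder assumptions:

```python
# Hedged NumPy sketch of Equations (4) and (5): the output-node value z as a weighted sum
# of inputs plus a bias, followed by the sigmoid used for the binary dataset-I.
import numpy as np

def node_value(w, x, b):          # Equation (4): z = sum_i w_i * x_i + b
    return np.dot(w, x) + b

def sigmoid(z):                   # Equation (5): 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.rand(64)            # FC-layer input vector (assumed length)
w = np.random.randn(64) * 0.01
z = node_value(w, x, b=0.0)
p_tumor = sigmoid(z)              # probability of the positive (tumor) class
print(float(p_tumor))
```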
The SoftMax activation function, calculated using Equation (6), is used for the multiclass categorization of the Figshare dataset-II and the Kaggle dataset-III in this proposed structure.
$$P(y = c \mid x) = \frac{e^{z_c}}{\sum_{j=1}^{C} e^{z_j}} \qquad (6)$$
where $x$ stands for the input vector and $c$ for the class in the case of a multiclass categorization problem. Additionally, the $j$th component of the class rating vector in the final FC layer is denoted by $z_j$. The category $c$ with the highest SoftMax coefficient is chosen as the output class of the SoftMax activation function [25]. A backpropagation algorithm is used during CNN training to adjust the weights of the FC and convolution layers. The two main elements of backpropagation are the loss function and gradient descent (GD), of which GD is used to minimize the loss function. Among the loss functions most frequently employed by CNNs is the cross-entropy (CE) loss function. For the binary categorization dataset-I with the sigmoid activation function, the CE loss function is computed using Equation (7).
$$L_{CE} = -\big(y \log \sigma(z) + (1 - y)\log(1 - \sigma(z))\big) \qquad (7)$$
where $z$ is computed using Equation (4). For the multiclass categorization Figshare dataset-II and Kaggle dataset-III with the SoftMax activation function, the CE loss function is calculated using Equation (8) [28,29].
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{e^{z_{y_i}}}{\sum_{j=1}^{C} e^{z_j}}\right) \qquad (8)$$
where $N$ denotes the quantity of training elements, the class of input image $i$ is indicated by $y_i$, and the $j$th component of the category scores vector in the final FC layer is represented by $z_j$ [28].
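For reference, the following NumPy sketch implements Equations (6)-(8) as described above; the class counts and score values are illustrative assumptions, not values from the paper:

```python
# Hedged NumPy sketch of Equations (6)-(8): the SoftMax activation and the binary and
# multiclass cross-entropy losses described above.
import numpy as np

def softmax(z):                                   # Equation (6)
    e = np.exp(z - np.max(z))                     # shifted for numerical stability
    return e / np.sum(e)

def binary_ce(y, z):                              # Equation (7), with z from Equation (4)
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def multiclass_ce(class_scores, labels):          # Equation (8)
    # class_scores: (N, C) final-FC scores z; labels: (N,) true class indices y_i
    losses = [-np.log(softmax(z)[y]) for z, y in zip(class_scores, labels)]
    return np.mean(losses)

z_scores = np.array([[2.0, 0.5, -1.0, 0.1]])      # one sample, four classes (assumed)
print(softmax(z_scores[0]))                       # class probabilities summing to 1
print(multiclass_ce(z_scores, labels=np.array([0])))
print(binary_ce(y=1, z=1.3))
```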
Expanding the receptive field in deep learning typically involves increasing the size and depth of the convolution kernels, which in turn increases the number of parameters in the network. By inserting zero weights into the conventional convolution kernel, dilated convolution can enlarge the receptive field without adding network parameters.
Equation (9) defines the 1-D convolution operator $*$, which connects the input image $F$ with the kernel $k$:
$$(F * k)(p) = \sum_{s + t = p} F(s)\,k(t) \qquad (9)$$
This 1-D convolution corresponds to the standard CNN. Upon the introduction of a DF denoted as $l$ and through its expansion, $*_l$ is referred to as an $l$-dilated convolution, and the network is identified as a dilated CNN as $l$ rises:
$$(F *_l k)(p) = \sum_{s + l\,t = p} F(s)\,k(t) \qquad (10)$$
Using Equation (10), the dilated convolution operation is calculated in this proposed structure; the fundamental CNN corresponds to $l = 1$ [29,30].
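A hedged NumPy sketch of the $l$-dilated 1-D convolution of Equation (10) is given below (our own illustration, with the indexing following the cross-correlation convention of common frameworks); setting $l$ = 1 recovers the standard convolution of Equation (9):

```python
# Hedged NumPy sketch of the l-dilated 1-D convolution of Equation (10).
import numpy as np

def dilated_conv1d(F, k, l=1):
    """Valid 1-D dilated convolution: out[p] = sum_t F[p + l*t] * k[t]."""
    n, m = len(F), len(k)
    span = (m - 1) * l + 1                      # receptive field covered by the kernel
    return np.array([np.sum(F[p:p + span:l] * k) for p in range(n - span + 1)])

F = np.arange(16, dtype=float)                  # 1-D input signal
k = np.array([1.0, 0.0, -1.0])                  # 3-tap kernel
print(dilated_conv1d(F, k, l=1))                # standard CNN case (l = 1)
print(dilated_conv1d(F, k, l=4))                # dilated case: 9-sample receptive field
```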
The main function of the dilated convolution layer is to extract features. In addition to fine, high-level feature details, MRI images also contain rough, low-level information, so image data must be extracted at several scales. Specifically, the local and global routes are employed to obtain the local and global features. Within the local route, the convolutional layers use a small 5 × 5 pixel window to capture low-level details of the images, whereas the convolutional stages of the global path use larger 12 × 12 pixel filters. The same 5 × 5 filters are used by three different convolution layers throughout the local path, and the coarse feature maps are produced solely by the decreasing sequence of high DFs (4, 2, 1) applied in successive layers. Similarly, three distinct convolution layers in the global path employ identical 12 × 12 filters, and the generation of the finer feature maps depends exclusively on the small DFs (2, 1, 1) of each layer. As illustrated in
Figure 4, three convolution layers with different numbers of filters (128, 96, 96) are applied in each feature extraction path to extract image data at various scales.
Conv1, Conv3, and Conv4 provide the local, coarse features, while Conv2, Conv5, and Conv6 supply the global, fine features. A max-pooling layer is employed after each convolutional layer in each path to down-sample the output of the convolutional layer. By employing a 2 × 2 kernel, the max-pooling layers reduce the dimensions of the resulting feature maps.
A dimension of (32, 32, 1) is assigned to each input tensor in the proposed model’s structure. To test the impact of the DF on the model’s efficiency and to understand the gridding effect introduced by the dilation approach, the interior design is kept as simple as possible. In the local path, layer Conv1 applies a 5 × 5 filter with a dilation factor of $l$ = 4 to generate coarse feature maps (such as shapes and contours), layer Conv3 applies the same filter with a dilation factor of $l$ = 2, and the final convolution, layer Conv4, applies a 5 × 5 filter with a dilation factor of $l$ = 1 to generate coarse feature maps once more. In the global route, layer Conv2 applies a 12 × 12 filter with a dilation factor of $l$ = 2, layer Conv5 applies the same filter with a dilation factor of $l$ = 1, and the final convolution, layer Conv6, applies a 12 × 12 filter with a dilation factor of $l$ = 1 to generate fine feature maps. The ReLU activation function is utilized by all six convolutional stages.
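The following Keras sketch is our own reconstruction of the two-path structure described above, not the authors' released code; the 'same' padding, the batch-normalization placement, and the FC width are assumptions, and the downstream ML-classifier ensemble is omitted:

```python
# Hedged Keras sketch of the two simultaneous feature-extraction paths: a local path
# (Conv1, Conv3, Conv4) with 5 x 5 kernels and DFs (4, 2, 1), and a global path
# (Conv2, Conv5, Conv6) with 12 x 12 kernels and DFs (2, 1, 1). Each path uses
# (128, 96, 96) filters, ReLU, and 2 x 2 max pooling, followed by feature fusion,
# a dropout-regularized FC layer, and a task-dependent output layer.
import tensorflow as tf
from tensorflow.keras import layers

def conv_path(x, kernel_size, dilation_rates, layer_names):
    for filters, d, name in zip((128, 96, 96), dilation_rates, layer_names):
        x = layers.Conv2D(filters, kernel_size, dilation_rate=d, padding="same",
                          activation="relu", name=name)(x)
        x = layers.BatchNormalization(name=name + "_bn")(x)       # placement assumed
        x = layers.MaxPooling2D(pool_size=(2, 2), name=name + "_pool")(x)
    return x

def build_dilated_pdcnn(num_classes=2):
    inputs = tf.keras.Input(shape=(32, 32, 1))                    # (32, 32, 1) MRI input tensor
    local = conv_path(inputs, 5, (4, 2, 1), ("conv_1", "conv_3", "conv_4"))    # coarse features
    glob = conv_path(inputs, 12, (2, 1, 1), ("conv_2", "conv_5", "conv_6"))    # fine features
    fused = layers.Concatenate(name="feature_fusion")(
        [layers.Flatten()(local), layers.Flatten()(glob)])
    fc = layers.Dropout(0.5)(layers.Dense(256, activation="relu")(fused))      # FC + dropout
    if num_classes == 2:                                          # dataset-I: sigmoid output
        outputs = layers.Dense(1, activation="sigmoid")(fc)
        loss = "binary_crossentropy"
    else:                                                         # datasets II/III: SoftMax output
        outputs = layers.Dense(num_classes, activation="softmax")(fc)
        loss = "sparse_categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs, name="dilated_pdcnn_sketch")
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

build_dilated_pdcnn(num_classes=4).summary()
```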
3.4.4. Feature Map of Dilated Convolutional Layers
A CNN feature map represents specific attributes of the input image as the result of a convolutional layer. It is produced by filtering the input images or the previous layer’s feature maps. The feature maps acquired from every convolutional layer are presented in
Figure 5 and
Figure 6. In
Figure 5, the low-level and coarse features of the three convolutional layers conv_1, conv_3, and conv_4, which have 128, 96, and 96 filters, respectively, are displayed. The feature maps in this figure are primarily composed of coarse and local features, which represent the texture in an image. In this local path, a dilated CNN with DFs of ($l_1$ = 4, $l_2$ = 2, $l_3$ = 1) is referred to as a dilated PDCNN (4,2,1). In
Figure 6, the high-level feature maps, including contour representations, shape descriptors, and fine features, of the three deeper convolutional layers conv_2, conv_5, and conv_6, which have the same numbers of filters, are shown. DFs of ($l_1$ = 2, $l_2$ = 1, $l_3$ = 1) are used in this global path. The multiscale feature maps, which are displayed in
Figure 7, are greatly improved when these features are combined using a feature fusion technique.
Figure 8 displays the final extracted multiscale features, along with a fully connected layer in which the dropout technique is employed to prevent overfitting.
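Feature maps such as those in Figures 5-7 can be inspected by querying the convolutional layers of a trained model. The sketch below reuses the build_dilated_pdcnn function and the assumed layer names from the earlier sketch; it is an illustration, not the authors' visualization code:

```python
# Hedged sketch of extracting intermediate feature maps: a Keras sub-model returning the
# activations of the six dilated convolution layers for one input MRI slice.
import numpy as np
import tensorflow as tf

model = build_dilated_pdcnn(num_classes=4)            # from the earlier sketch (assumed)
conv_names = ["conv_1", "conv_3", "conv_4",           # local path (coarse features)
              "conv_2", "conv_5", "conv_6"]           # global path (fine features)
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=[model.get_layer(n).output for n in conv_names])

mri_slice = np.random.rand(1, 32, 32, 1).astype("float32")   # placeholder input image
for name, maps in zip(conv_names, feature_extractor(mri_slice)):
    print(name, maps.shape)      # e.g. conv_1 -> (1, 32, 32, 128)
```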