1. Introduction
Recently, computer vision-based medical imaging techniques have been helping medical experts achieve better diagnosis and treatment [1]. A number of medical imaging modalities, for example X-ray, computed tomography (CT), MRI, and ultrasound, have shown remarkable achievements in the healthcare system [2]. These medical imaging techniques have been utilized for brain imaging analysis, diagnosis, and treatment. The detection and classification of brain tumors have emerged as a hot research topic for researchers, radiologists, and medical experts [3].
Brain tumors arise due to unusual growths and unrestrained cell division in the brain. They can deteriorate the health condition of a patient and expedite casualty if not detected precisely [4]. Generally, brain tumors are grouped into two varieties: malignant and benign. A malignant tumor is regarded as cancerous, whereas a benign tumor is considered noncancerous. The objective of tumor detection is to identify the position and extent of the tumor area. This detection task can be accomplished by comparing the abnormal areas with normal tissue [5]. Accurate analysis of brain tumor images can determine a patient's condition. MRI is an extensively used imaging technique for the study of brain tumors. Brain MR images provide a clear representation of the brain structure and its abnormalities [6]. For brain tumor detection, two important imaging modalities, CT and MRI, are commonly used. However, MRI is preferred over CT because of its non-invasive nature and the high-resolution images of brain tumors it produces. Usually, brain MRI is acquired in four modes: T1-weighted, T1-weighted contrast-enhanced, T2-weighted, and T2-weighted FLAIR. Each mode illustrates different features of a brain tumor [7]. In the literature, various automated approaches have been introduced for brain tumor classification utilizing brain MRI. Over the years, support vector machine (SVM) and neural network (NN) based approaches have been extensively utilized for brain tumor classification [8,9,10,11]. Konur et al. [12] proposed an SVM-based approach where the SVM model is first trained with known samples, and the trained model is then used to process other brain tumor images. Xiao et al. [13] developed a segmentation technique by merging the Fuzzy C-Means and SVM algorithms.
Earlier, machine learning (ML) based tumor detection approaches were considered state-of-the-art techniques. More recently, these ML-based approaches have been unable to provide highly accurate results due to inefficient prediction models and the intricate features of medical data. Therefore, most researchers have tried to find an alternative learning-based approach to improve classification accuracy [14,15]. Deep learning (DL) has set a sensational development in the machine learning domain, since DL architectures can efficiently fit a predictive model using a large dataset. Unlike SVM and KNN, deep learning models are able to represent complex relationships without using a large number of nodes. Therefore, these approaches have achieved excellent performance in medical imaging applications [16]. Recently, many researchers have developed computer-aided frameworks for medical image classification tasks that produce outstanding results. Yu et al. [17] introduced a computer-aided electroencephalogram (EEG) classification framework named CABLES that classifies six different EEG domains under a unified sequential frame. The authors conducted comprehensive experiments on seven different types of datasets using a 10-fold cross-validation scheme. The proposed EEG signal classification framework showed significant improvements over domain-specific approaches in terms of classification accuracy. Sadiq et al. [18] developed an innovative pre-trained CNN based automated brain-computer interface (BCI) framework for EEG signal identification, which essentially investigates the consequences of various limiting factors. The approach was assessed using three public datasets, and the experimental results witnessed the robustness of the proposed BCI framework in identifying EEG signals. Huang et al. [19] further developed a deep learning based EEG segregation pipeline to overcome the previous limitations of the BCI framework. In this approach, the authors merged the concepts of multiscale principal component analysis, a Hilbert transform based signal resolution approach, and pre-trained CNNs for automatic feature estimation and segregation. The proposed BCI framework was evaluated using three binary-class datasets; it was found to be reliable in identifying EEG signals and showed outstanding performance in terms of classification accuracy.
A traditional diagnostic approach such as histopathology detects disease through microscopic investigation of a biopsy mounted on a glass slide. These traditional diagnostic approaches are performed manually on tissue samples by pathologists, and they are time consuming and difficult. On the other hand, a transfer learning-based DCNN framework reduces the workload of pathologists and supports them in concentrating on vulnerable cases. Moreover, the use of transfer learning can help process brain MRI images faster and more accurately. Further, automatic detection and classification lead to a quicker, less labor-intensive diagnosis procedure.
DCNN architectures have shown outstanding performance in detecting and classifying brain tumors because they generalize across different levels of features. Also, pre-processing steps such as data augmentation and stain normalization used with DCNNs are beneficial for obtaining robust and accurate performance. Therefore, we are motivated to use a DCNN architecture to detect and classify brain tumors. However, the accuracy of DCNN architectures depends on the data samples and the training process, since these architectures require large amounts of precise data for better output. To overcome this limitation, transfer learning can be employed for improved performance. Transfer learning has two main aspects: fine-tuning the convolutional network and freezing the layers of the convolutional network. Instead of building a CNN model from scratch, fine-tuning a pre-trained model is sufficient for the classification task.
Therefore, we use a pre-trained DCNN architecture called VGGNet, based on transfer learning, to classify brain tumors, namely meningioma, glioma, and pituitary tumors. The pre-trained architecture has previously been trained on a large dataset and transfers its learned parameters to the target dataset. Therefore, the pre-trained model consumes less time, since it does not require a large dataset to obtain results. The early layers of the VGGNet model extract low-level features, for example colors and edges, whereas the later layers extract high-level features, for example objects and contours. The objective is to transfer the knowledge learned by VGGNet to the different target task of classifying brain tumor MRI images. The main reason for using VGGNet over other pre-trained networks is its use of small receptive fields rather than massive ones. Due to its smaller convolutional filters, VGGNet contains a significant number of weight layers, which in turn provides better performance. Further, the proposed approach uses a Global Average Pooling (GAP) layer at the output to avoid overfitting and vanishing gradient problems. The GAP layer transforms the multidimensional feature map into a one-dimensional feature vector [20,21,22]. Since the GAP layer does not need parameter optimization, the overfitting issue is avoided at this layer. The major contributions of the proposed research work are listed as follows:
To develop an approach for automatic detection and classification of brain tumors using a transfer learning-based DCNN architecture. The proposed approach is proficient in extracting important features from the Figshare dataset.
A data augmentation technique is used to artificially increase the size of the training image data by rotating and flipping the original dataset. More training data helps the CNN architecture boost performance and produce more skillful models.
A GAP layer is used at the output to avoid overfitting and vanishing gradient problems.
To compare the proposed framework with competing brain tumor classification approaches with reference to accuracy on the Figshare dataset.
The remainder of the research article is structured as follows: Section 2 presents a comparative analysis of related work. Section 3 provides a brief outline of DCNNs, transfer learning, and the pre-trained VGGNet model. Section 4 illustrates the proposed brain tumor detection approach. Section 5 presents a thorough description of the dataset used in the experiments, evaluation metrics, the training process, and the results and discussion. Finally, Section 6 provides the conclusion and possible future prospects of this work.
2. Related Work
The brain tumor detection and classification problem has evolved into a hot research topic over the last two decades because of its high medical relevance. Timely detection, diagnosis, and classification of brain tumors are instrumental in effective treatment planning for the recovery and life extension of the patient. Brain tumor detection is the procedure of differentiating abnormal tissues, for example active tumor tissue and edema, from normal tissues such as gray matter and white matter. Generally, brain tumor detection processes are grouped into three types: manual detection, semi-automatic detection, and fully automatic detection. Currently, medical experts are giving more importance to fully automatic detection methods, where the tumor location and area are detected automatically, without human intervention, by setting appropriate parameters.
Deep learning models extend conventional neural networks by adding more hidden layers between the input layer and output layer of the network in order to model more complex and nonlinear relations. A number of deep learning models, for instance the convolutional neural network (CNN), deep neural network (DNN), and recurrent neural network (RNN), are extensively employed for medical imaging applications. Here, we summarize existing deep learning-based work on the brain tumor classification task.
Havaei et al. [23] introduced an automated DNN based brain tumor segmentation technique. This method exploits both local and global contextual features of the brain simultaneously, and the fully connected (FC) layer used as the last layer of the network improves the network speed by 40-fold. This model is applied specifically to the segmentation of glioblastoma tumors depicted in brain MRI. Rehman et al. [24] introduced three different CNN-based architectures, namely AlexNet, GoogLeNet, and VGGNet, for the classification of brain tumors. This framework attained an accuracy of 98.69% using the VGG16 network.
Instead of extracting features only from the bottom layers of a pre-trained network, Noreen et al. [25] introduced an efficient framework in which features are extracted from multiple levels and then concatenated to diagnose brain tumors. First, features are extracted from DensNet201 and concatenated; the concatenated features are then provided as input to a softmax classifier for classification. Similar steps are applied to the pre-trained Inception-v3 model. The performance of both models was assessed and validated using a three-class brain tumor dataset, with the Inception-v3 based framework achieving a classification accuracy of 99.34%.
Li et al. [26] developed a multi-CNN structure by combining multimodal information fusion and CNNs to detect brain tumors. The authors extended 2D-CNNs to multimodal 3D-CNNs in order to capture complementary information among the modalities. Also, an improved weighted loss function is introduced to minimize the interference of the non-focal area, which in turn increases detection accuracy. Sajjad et al. [27] developed a CNN-based multi-grade classification system that enables clear segmentation of the tumor region from the dataset. In this system, a deep learning technique is first utilized to segment tumor areas from brain MRI. Subsequently, the proposed model is trained so as to avoid data-deficiency problems when dealing with MR images. Finally, the trained network is fine-tuned using augmented data to classify brain tumors. This method achieves an accuracy of 90.67% for classifying tumors into different grades.
Anaraki et al. [1] proposed a tumor classification approach that takes advantage of both CNNs and the genetic algorithm (GA). Instead of adopting a pre-defined deep neural network model, this approach uses a GA to evolve the CNN architecture. The proposed approach attained an accuracy of 90.9% for classifying three glioma grades. Zhou et al. [28] presented a universal methodology based on DenseNet and an RNN to detect numerous types of brain tumors in brain MRI. The proposed methodology can successfully handle variations in the location, shape, and size of tumors. In this method, DenseNet is first applied to extract features from the 2D slices; the RNN is then used to classify the resulting sequential features. The effectiveness of this approach was evaluated on public and proprietary datasets, attaining an accuracy of 92.13%.
Afshar et al. [29] proposed a learning-based architecture called capsule networks (CapsNets) for the detection of brain tumors. It is well known that CNNs need enormous amounts of data for training; CapsNets can overcome this training complexity, as they require less data for training. This approach incorporates CapsNets for brain tumor classification and achieves better performance than CNNs. Deepak et al. [30] proposed a transfer learning-based tumor classification system utilizing a pre-trained GoogLeNet to categorize the three prominent tumors seen in the brain: glioma, meningioma, and pituitary. This method effectively classifies tumors into different grades with a classification accuracy of 98%. Frid-Adar et al. [31] introduced a generative adversarial network for medical image classification to address the challenge of limited medical image datasets. The proposed approach generates synthetic medical images, which are utilized to improve performance. This approach was validated on a limited liver-lesion CT image dataset and showed superior classification performance of 85.7% sensitivity and 92.4% specificity. Abdullah et al. [32] introduced a robust transfer learning enabled lightweight network for the segmentation of brain tumors based on the pre-trained VGG network. The efficacy of the approach was evaluated on the BRATS2015 dataset, attaining a global accuracy of 98.11%.
3. Deep Convolutional Neural Network
CNNs are feed-forward neural networks that rely on concepts such as the receptive field, weight sharing, and pooling operations in order to characterize image data [33]. Generally, a CNN involves three key layers: the convolutional layer, the pooling layer, and the fully connected (FC) layer. In convolutional layers, a convolution operation between various kernels and the input image is performed to obtain feature maps. The convolutional layers drastically reduce the total number of network parameters through weight sharing and identify various patterns via the receptive field. The pooling layer pools features through a window of a particular size; it minimizes the size of the feature maps and the number of parameters used in the CNN, which helps reduce the computational cost. Max pooling is the most widely used pooling technique in CNN design; it takes the maximum value of the input within the pooling window. The FC layer acts as a classifier. Features are propagated forward through the network to the FC layer. Finally, a back-propagation procedure is utilized to update the network parameters using gradient descent.
Each convolutional layer extracts features from the previous layer. The feature map can be calculated as

$$O_s = b_s + \sum_{r=1}^{R} W_{sr} * X_r$$

where $X_r$ represents the $r$th input channel, $W_{sr}$ represents the corresponding sub-kernel, $b_s$ is a bias term, and $*$ signifies the convolution operation. Thus, the feature map is computed by summing the applications of $R$ dissimilar $N \times N$ convolution filters and a bias term. To obtain non-linear features, non-linear functions, for example the sigmoid or the rectified linear unit, are applied to the convolution result. Recently, a max-out non-linearity has been used effectively to extract valuable features. The max-out feature associated with $K$ feature maps $O_{s,1}, \dots, O_{s,K}$ at spatial position $(i, j)$ is given by

$$Z_{s,i,j} = \max_{k \in \{1, \dots, K\}} O_{s,k,i,j}$$

The max pooling operation determines the maximum feature value in each feature map. It reduces the size of the feature map according to the pooling size $\phi$. It is characterized as

$$P_{s,i,j} = \max_{0 \le p,\, q < \phi} Z_{s,\, \phi i + p,\, \phi j + q}$$
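As an illustration, the following minimal NumPy sketch mirrors the three operations above: a convolutional feature map with a bias term, a max-out over $K$ feature maps, and max pooling. The shapes, random values, and the stand-in stack of $K$ maps are our illustrative choices, not taken from the original work.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative shapes: R input channels of size H x W, N x N kernels.
R, H, W, N = 3, 8, 8, 3
X = np.random.rand(R, H, W)           # R input channels
Wk = np.random.rand(R, N, N)          # one N x N sub-kernel per channel
b = 0.1                               # bias term

# Feature map: O_s = b_s + sum_r W_sr * X_r ("same" padding for simplicity)
O = b + sum(convolve2d(X[r], Wk[r], mode="same") for r in range(R))

# Max-out over K feature maps at each spatial position (i, j);
# the K shifted copies of O are only a stand-in for K distinct maps.
K = 4
O_stack = np.stack([O + 0.01 * k for k in range(K)])
Z = O_stack.max(axis=0)

# Max pooling with a (phi x phi) window and stride phi
phi = 2
P = Z[:H - H % phi, :W - W % phi].reshape(H // phi, phi, W // phi, phi).max(axis=(1, 3))
print(O.shape, Z.shape, P.shape)      # (8, 8) (8, 8) (4, 4)
```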
In medical image processing, the data available to train a DCNN model is limited. This results in over-fitting, which lowers the performance of the DCNN model. To solve this issue, the concept of transfer learning is introduced [34]. Transfer learning is a part of deep learning based on the idea that a model pre-trained on a large dataset can transfer its learned parameters to a small dataset, usually the target dataset. In order to use the pre-trained model for a different task, the last FC layers are trained with initial arbitrary weights on the target dataset. Therefore, transferring the pre-trained network parameters can deliver a new target model with effective feature extraction proficiency and less computational cost. Transfer learning has shown superior performance in medical imaging applications with reference to accuracy, training time, and error rates. Here, we use a popular pre-trained network, VGG16, for the brain tumor classification task.
VGG16 is a pre-trained DCNN introduced by Simonyan et al. in 2014 [35]. It involves 16 weight layers, comprising 13 convolutional layers and 3 FC layers, along with 5 max-pooling layers. The input image fed to the first convolutional layer has a size of 224 × 224. The rectified linear unit (ReLU) is utilized as the activation function after every convolutional layer. In addition, max-pooling layers are used in the VGG16 network to reduce the size of the feature maps. To prevent over-fitting, dropout regularization is used in the FC layers. Finally, a softmax layer is used after the last FC layer to classify the given image. The VGG16 network replaces large convolution filters with multiple 3 × 3 filters applied sequentially. The use of multiple stacked small kernels is more effective than a single large kernel and increases the depth of the ConvNet. The ability of the network to learn hidden features therefore increases with the depth of the ConvNet.
5. Experiments and Discussion
We assess the efficiency of the proposed transfer learning-based framework on the widely used Figshare dataset. Here, we provide a detailed description of the dataset, evaluation metrics, network training, and performance evaluation.
5.1. Datasets
We use a publicly available brain MRI dataset called Figshare, developed by Cheng et al. [38]. The dataset contains 3064 T1-weighted brain MRI slices of three different categories of tumor, namely meningioma, glioma, and pituitary, obtained from 233 patients. The dataset comprises 708 meningioma slices, 1426 glioma slices, and 930 pituitary slices. All images are stored in .mat format. The MRI slices are normalized to an intensity range between 0 and 1.
Figure 2 shows three categories of brain tumors from the Figshare dataset.
Figure 3 demonstrates the images obtained after the application of the data augmentation technique.
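Since the slices ship as MATLAB files, a small loading sketch may be helpful. The snippet below assumes the v7.3 (HDF5) layout commonly reported for this dataset, with a cjdata struct holding the image and a numeric label (1 = meningioma, 2 = glioma, 3 = pituitary); this layout and the file path are assumptions, not confirmed by the paper.

```python
# Hedged sketch for reading one Figshare slice; assumes the commonly
# reported cjdata layout (v7.3 .mat files are HDF5 under the hood).
import h5py
import numpy as np

def load_slice(path):
    with h5py.File(path, "r") as f:
        cjdata = f["cjdata"]
        image = np.array(cjdata["image"], dtype=np.float32)
        label = int(np.array(cjdata["label"]).squeeze())  # 1, 2, or 3
    # normalize intensities to [0, 1], as described in Section 5.1
    image -= image.min()
    image /= max(image.max(), 1e-8)
    return image, label

image, label = load_slice("figshare/1.mat")  # hypothetical path
print(image.shape, label)
```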
5.2. Evaluation Metrics
To assess the effectiveness of the proposed model for the brain tumor detection and classification task, we use four performance measures: accuracy, sensitivity, precision, and specificity.
Accuracy: A performance metric that determines the percentage of correctly classified image samples among the total number of image samples, without considering the image class labels. It can be calculated using the following relation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity: A performance metric that determines the ability of the model to correctly classify brain tumors (positive samples). It can be calculated using the following relation:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity: A performance metric that determines the ability of the model to accurately classify negative samples. It can be calculated using the following relation:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Precision: A performance metric that determines the proportion of true positives among all positive predictions. It can be calculated using the following relation:

$$\text{Precision} = \frac{TP}{TP + FP}$$
The notations used in the above equations are as follows: TP: true positive; TN: true negative; FP: false positive; FN: false negative.
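For completeness, these four measures can be computed directly from confusion-matrix counts; the sketch below uses hypothetical counts, and the helper function is ours, not from the paper.

```python
# Computes the four metrics above from raw confusion counts.
def metrics(tp, tn, fp, fn):
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }

print(metrics(tp=95, tn=90, fp=5, fn=10))  # hypothetical counts
```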
5.3. Network Training
Here, we explain the detailed architecture and training procedure of the proposed transfer learning-based framework. The architecture of our proposed DCNN framework is illustrated in Figure 4. The convolution operation is accomplished using 3 × 3 convolution filters with zero padding, and the pooling operation is performed using max pooling of size 2 × 2. The MRI slices are resized to 224 × 224 pixels using interpolation. The augmentation parameters used in the data augmentation process are set as vertical flip = 0.5, horizontal flip = 0.5, random brightness contrast = 0.3, and shift-scale-rotate = 0.5. Each MRI slice is normalized by subtracting the mean image calculated from the training set. The FC layers are trained with the ReLU activation function, and a dropout rate of 0.2 is used to avoid overfitting. The Adam optimizer is utilized for network optimization with a learning rate of 0.001; its parameters $\beta_1$ and $\beta_2$ are set to 0.6 and 0.8, respectively. We train the DCNN model for 100 epochs with a batch size of 20. We use 70% of the MRI dataset for training, 15% for validation, and the remaining 15% for testing. The hyperparameters used in our proposed methodology are listed in Table 1.
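The stated flip, brightness-contrast, and shift-scale-rotate probabilities match the parameter names of the Albumentations library, so a plausible reconstruction of the augmentation pipeline and training configuration is sketched below. The use of Albumentations and Keras here is our assumption (as is the Keras `model` built in Section 3), not confirmed by the paper; the learning rate, betas, epochs, and batch size follow the text.

```python
# Hedged reconstruction of the Section 5.3 augmentation and training setup.
import albumentations as A
import tensorflow as tf

augment = A.Compose([
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.ShiftScaleRotate(p=0.5),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.6, beta_2=0.8),  # betas as reported
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=20)   # data arrays are hypothetical
```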
5.4. Results and Discussion
In the first part of the experiment, we assess the performance of the proposed transfer learning-based DCNN model on the Figshare dataset with regard to accuracy, sensitivity, precision, and specificity. Table 2 shows the performance measures of the proposed transfer learning-based classification model. The proposed model achieves the highest percentages for the four important classification measures when evaluated on the Figshare dataset: an accuracy, sensitivity, specificity, and precision of 98.93%, 98.68%, 99.13%, and 99.11%, respectively. We also present the accuracy of each tumor class as well as the mean accuracy in Table 3. In our experiment, we used 106 brain MRI slices with meningioma tumors, 214 slices with glioma tumors, and 139 slices with pituitary tumors for testing the proposed classification framework. As observed from Table 3, we obtain a classification accuracy of 97.88% for meningioma, 99.29% for glioma, and 98.38% for pituitary tumors. The mean accuracy over the three tumor classes is 98.51%.
Table 4 illustrates the MSE loss after each convolutional layer of the proposed transfer learning-based DCNN model during training. In addition, we present sample experimental results obtained by the proposed framework in Figure 5, which shows the original brain MR image with the tumor, white matter, segmented image, gray matter, skull-stripped image, and extracted tumor, respectively. The confusion matrix of the proposed transfer learning-based DCNN model is illustrated in Figure 6.
Usually, feature maps assist in determining the active areas of brain MR images, which play a significant role in the tumor classification procedure. The outer layers of the network emphasize coarse features such as the shape of the brain and the locations of tumors. The granularity of the features decreases as we proceed through the layers; the last layer produces fine-grained features that can locate small tumors. Finally, the generated feature maps are employed to classify the brain MRI using a sigmoid activation function. The receiver operating characteristic (ROC) curve and precision-recall (PR) curve of the proposed classification methodology are shown in Figure 7. The training progress of the proposed transfer learning-based framework is presented in Figure 8, which shows the variation of accuracy versus the number of epochs. As the number of epochs increases, the model attains the minimum mean squared error and converges. Finally, our proposed transfer learning-based DCNN model attains an accuracy of 98.9% on the Figshare dataset.
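Curves of the kind shown in Figure 7 can be computed from per-class scores with scikit-learn. The sketch below treats one tumor class one-vs-rest; the label and score arrays are hypothetical stand-ins, not the paper's results.

```python
# One-vs-rest ROC and PR curves for a single tumor class via scikit-learn;
# y_true (0/1 class membership) and y_score (softmax probability) are fake.
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, auc

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                            # stand-in labels
y_score = np.clip(y_true * 0.7 + rng.random(200) * 0.5, 0, 1)    # stand-in scores

fpr, tpr, _ = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("ROC AUC:", auc(fpr, tpr))
```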
In the second part of the experiment, the competency of the proposed transfer learning-based DCNN framework is compared with existing brain tumor classification techniques on the Figshare dataset. We consider the existing methods introduced by Ismael et al. [3], Abiwinanda et al. [7], Swati et al. [16], Afshar et al. [29], and Pashaei et al. [39] for the comparison. Ismael et al. [3] combined statistical features and NN algorithms for the brain tumor classification task and attained an accuracy of 91.9% on the Figshare dataset. Abiwinanda et al. [7] developed a CNN-based architecture with max pooling for brain tumor classification and attained an accuracy of 84.19%. Afshar et al. [29] used a CapsNets-based model for the detection of brain tumors; it overcomes the training complexity of CNNs and achieves a classification accuracy of 86.56%. Pashaei et al. [39] used a CNN to extract hidden features from brain MRI and then employed extreme learning machines (ELMs) to classify the extracted features, obtaining a highest classification accuracy of 93.68%. Jun et al. [40] introduced a unique attention-based mechanism integrating a multipath network for the brain tumor classification task. The primary objective of the attention mechanism is to select only the critical information belonging to the target area while ignoring irrelevant details; in addition, the multipath network is utilized to reduce complexity. This scheme was evaluated on the Figshare dataset and attained an accuracy of 98.61%. Masood et al. [37] developed an efficient custom Mask Region-based CNN scheme for tumor classification and segmentation under adverse conditions, for example noisy input MR images, asymmetrical shapes, and indistinct boundaries. This scheme was evaluated on the Figshare dataset and attained an accuracy of 98.34%. The performance results with regard to the accuracy of the different competing approaches are shown in Table 5. The proposed transfer learning-based DCNN model shows superior accuracy compared to the other existing approaches.
It is observed from our research that different brain tumors have different shapes and positions. For example, a glioma is typically encircled by edema, a pituitary tumor exists near the sphenoidal sinus and optic chiasma, and a meningioma normally occurs close to the skull and cerebrospinal fluid. Thus, it is very challenging to identify the discriminative features of a particular brain tumor, and these discriminative features are correlated with the position of the tumor. Most deep learning-based approaches are unable to classify such features due to the unavailability of data. However, our proposed transfer learning-based tumor detection and classification approach requires less data for learning, since the pre-trained model has previously been trained on a large amount of data. Therefore, the proposed approach is capable of learning and classifying different types of features, and it provides accurate classification results for the three categories of tumors.
In addition, we have performed a comparative analysis of the results obtained by the proposed transfer learning-based DCNN model on the Figshare dataset with and without the data pre-processing step, as shown in Table 6. The objective is to show the effectiveness of the data pre-processing step used in our proposed approach. The proposed approach shows better results in terms of accuracy, sensitivity, specificity, and precision with the data pre-processing step than without it.