1. Introduction
A brain tumor is an abnormal growth of cells in the brain, which can be either malignant or benign [1]. As the brain is a vital organ responsible for cognitive function, the presence of tumors can have life-threatening consequences. Brain tumors account for 85–90% of all central nervous system tumors, according to a report [2]. Radiologists use imaging techniques such as CT scans and MRI to locate cancer in the brain, with MRI providing higher-resolution imaging than CT scans. However, manually analyzing the images and grading tumors is time-consuming, requires specialized expertise, and may still result in imprecise diagnoses and high costs [3]. These challenges arise from the asymmetrical shapes of tumors and the difficulty of distinguishing between different types of tumors that may look similar. Consequently, there is growing interest in developing computerized systems for recognizing brain tumors to improve diagnosis and treatment outcomes [4].
Researchers have built computerized systems based on traditional machine learning techniques. Such systems involve a variety of processes, including preprocessing, feature extraction, dimensionality reduction, and categorization [5,6,7]. Feature extraction is the most important step in this process because automated brain tumor detection depends on it. However, the efficacy of this technique relies heavily on the type and nature of the features used, and traditional methods may fail to identify small tumors in unseen samples or may run slowly when processing large numbers of brain MRIs. K-nearest neighbors (KNN), support vector machines (SVM), decision trees, and segmentation-based methods are some of the ML-based approaches used to address this problem. In the segmentation-based approach, algorithms segment the tumor as a region of interest (ROI), and features are then extracted from the ROI. This approach involves several steps: preprocessing, finding a region of interest, feature extraction, training, and finally classification. However, due to the complex structure of the brain, the current techniques lack accuracy. Therefore, it is vital to develop an efficient and precise model for the timely detection of brain tumors with minimal human intervention [8].
Various popular deep learning models have been developed, including GoogLeNet, InceptionNet, ResNet, VGGNet, DenseNet, and AlexNet. However, basic classification techniques used for detecting brain tumors only indicate the presence of a tumor and do not provide information about its location, leading to a high rate of false positives. To overcome this limitation, researchers have used different object detection methods for brain tumor identification. One study [9] used the deep learning-based CenterNet to localize brain tumors, with ResNet34 and an attention module as the base network. However, most object detection-based techniques for brain tumor identification require extensive hyperparameter tuning and entail high computational costs. To address these challenges, this study proposes a new approach for early brain tumor recognition. In the first phase, the tumor is located using segmentation. In the second phase, Histogram of Oriented Gradients (HOG) and the ResNet-V2 deep neural network (DNN) are used to extract features from the segmented image. Finally, the features are combined, and a BiLSTM is trained on three classes: glioma, pituitary, and meningioma. The suggested model accurately identifies tumor locations and enhances the detection accuracy by employing essential features extracted from the segmented area.
The aim of this study is to create a strong framework that uses mayfly optimization (MFO) and Kapur’s thresholding-based segmentation, along with feature fusion, to identify and categorize brain tumors in various MRI images. An additional goal is to propose an efficient system that can accurately detect and locate tumors in MRI scans that have not been seen before. The study also aims to develop a technique that can identify small tumors in MRI scans for better early detection. The proposed system was extensively tested, and the results show that it performs well in terms of accuracy and robustness for early recognition and categorization of brain tumors. The remaining sections of the study discuss existing methods in Section 2, present the proposed technique in Section 3, evaluate the experiments in Section 4, and conclude the study in Section 5.
2. Related Work
Several researchers have explored machine learning and deep learning-based methods for medical diagnosis [10,11]. Some of these approaches use segmentation techniques, such as ensemble deep networks, which require training from scratch. To address this issue, some researchers have introduced a drop-out layer during the testing phase to recognize uncertainties in lesion identification [12]. Additionally, in reference [13], a CNN-based approach was proposed, and data augmentation was performed to improve the classification accuracy. The study used three datasets and achieved an accuracy of 98.43%. DL-based methods are increasingly essential in several image-processing applications, including medical diagnosis [14]. In another study [15], data augmentation was performed using patch rotation and extraction techniques on 3064 images, and CapsuleNet was used to recognize and categorize brain cancer into three classes. In reference [16], VGG16 and AlexNet were used to extract features from brain scans, and a feature fusion method was employed for binary classification. Finally, an SVM was used to categorize the images, achieving an accuracy of up to 96%. In reference [17], an encoder-based technique was used for categorization, with an accuracy of 98.5%.
Reference [18] used ResNet50 with additional layers for binary categorization of brain tumor images and attained an accuracy of up to 97%. In reference [19], the BrainMRNet model was proposed, consisting of attention layers, residual stages, and the hyper-column technique, achieving an accuracy of 96.05%. Reference [20] used a transfer learning-based CNN and a multiple logistic regression method, outperforming the existing techniques on three benchmarks. Sachdeva et al. [21] proposed a brain tumor detection method using SVM and artificial neural networks, combined with a genetic algorithm, achieving accuracies of 91% and 94.9%, respectively. Tahir et al. [22] explored different approaches to increasing classification accuracy by employing edge detection, noise removal techniques, and contrast improvements, achieving an accuracy of 86%.
In recent studies, different techniques have been suggested for brain tumor detection. Sarah et al. [23] used Harris Hawks optimized neural networks with various types of layers to modify the architecture. They preprocessed the images for noise removal and candidate region recognition to identify tumor regions. The proposed method achieved 98% accuracy on the Kaggle dataset. Aruna et al. [24] developed an approach using pretrained CNNs such as InceptionV3, ResNet50, and VGG19. They concatenated deep features extracted through the CNNs using a two-stage strategy and further reduced the dimensions using PCA before categorization. The results showed improved classification accuracy, but their approach increased computational complexity. In another study, Bakary et al. [25] employed transfer learning to develop an automatic brain tumor classification technique using MR images of the brain. They used the AlexNet model for feature extraction and binary classification, achieving an overall accuracy of 99.62%. However, they did not classify the images into specific types of tumors.
The study by Sarmad et al. [26] proposes an automated system for brain tumor detection that employs several steps to achieve a high accuracy in classifying different types of brain tumors. The first phase involves using linear contrast stretching to identify edges in the sample. In the second phase, a DNN with 17 layers is designed for segmenting the tumor. This step aims to accurately identify the location and boundaries of the tumor within the brain image. In the third step, a modified version of the MobileNetV2 architecture is utilized for extracting features. Transfer learning is used to train the network, which involves adapting the pretrained model’s parameters to the specific task of brain tumor detection. Then, an entropy-based controlled mechanism is utilized with multiclass support vector machines (M-SVM) for feature selection. This step aims to identify the most relevant features for tumor classification. Finally, M-SVM is utilized for brain tumor categorization, which involves identifying glioma, meningioma, and pituitary images. The proposed system achieves a high accuracy of 97.47% and 98.92% for meningioma and pituitary images, respectively.
While several methods have been developed for brain tumor detection, early detection remains a significant challenge. Early detection is critical for effective treatment and improved patient outcomes. Details of the existing models are shown in Table 1.
3. Methodology
In this section, we introduce the working principles of the proposed model. The proposed system is a three-stage model, as shown in Figure 1. The brain images are already in grayscale; therefore, the preprocessing phase is skipped. First, segmentation is performed using mayfly optimization with a multilevel thresholding approach. Second, features are extracted from the segmented tumors. Third, the brain samples are classified by the proposed multilayer perceptron (MLP).
3.1. MFO with Multi-Level Thresholding
MFO is a population-based optimization method introduced in 2020 [28]. It consists of the following steps: (1) initializing an equal number of male and female agents; (2) allowing each male mayfly to find the best position, loc, for the chosen task; (3) allowing the female mayflies to find and mate with the male mayfly located at loc; (4) generating offspring; and (5) terminating the search and reporting the final output.
We combined a multilevel thresholding approach with the MFO technique. Kapur et al. [29] proposed a threshold-based approach that computes the optimal thresholds for segmentation. The computation depends on the probability distribution and the entropy of the image histogram, and the approach selects the threshold that maximizes the entropy. For bilevel thresholding, the objective function is given in Equation (1).

$$f(t) = H_0 + H_1 \tag{1}$$

Here, $H_0$ and $H_1$ are computed as below:

$$H_0 = -\sum_{i=0}^{t-1} \frac{p_i}{\omega_0} \ln \frac{p_i}{\omega_0}, \qquad \omega_0 = \sum_{i=0}^{t-1} p_i \tag{2}$$

$$H_1 = -\sum_{i=t}^{L-1} \frac{p_i}{\omega_1} \ln \frac{p_i}{\omega_1}, \qquad \omega_1 = \sum_{i=t}^{L-1} p_i \tag{3}$$

Here, $p_i$ refers to the distribution of probability (DP) of grayscale intensity level $i$, $L$ is the number of intensity levels, and $\omega_0$ and $\omega_1$ are the DPs of the classes $C_0$ and $C_1$, as described in Equations (2) and (3). This entropy-based approach extends naturally to multilevel thresholding, in which the image is split into n classes using n − 1 thresholds. The objective function then changes as shown in Equation (4).

$$f(T) = \sum_{k=0}^{n-1} H_k \tag{4}$$

Here, $T = [t_1, t_2, \ldots, t_{n-1}]$ is a vector of thresholds. The entropies are computed separately for each threshold t; therefore, Equation (5) gives the n entropies.

$$H_0 = -\sum_{i=0}^{t_1-1} \frac{p_i}{\omega_0} \ln \frac{p_i}{\omega_0}, \quad H_1 = -\sum_{i=t_1}^{t_2-1} \frac{p_i}{\omega_1} \ln \frac{p_i}{\omega_1}, \quad \ldots, \quad H_{n-1} = -\sum_{i=t_{n-1}}^{L-1} \frac{p_i}{\omega_{n-1}} \ln \frac{p_i}{\omega_{n-1}} \tag{5}$$

where $(\omega_0, \omega_1, \ldots, \omega_{n-1})$ are the probabilities of occurrence of the n classes. The MFO approach is used to find the optimal thresholds. The MFO technique is modeled on the mating behavior and flight of mayflies [28]. The mayflies in a swarm are identified as female and male individuals. The male mayflies perform more robustly, which improves the optimization process. The MFO approach updates the position of mayfly $i$ from its location $loc_i(t)$ and its velocity $velocity_i(t)$ at the current round:

$$loc_i(t+1) = loc_i(t) + velocity_i(t+1) \tag{6}$$

All female and male mayflies update their location using Equation (6) with respect to time. However, male and female mayflies use different velocity update rules.
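The Kapur objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `kapur_entropy` and `best_bilevel_threshold` are hypothetical, the histogram is assumed to be a list of per-level pixel counts, and an exhaustive search stands in for the MFO search so that the example stays self-contained.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of class entropies for a histogram split at the given thresholds.

    hist       -- pixel counts per gray level (assumed input format)
    thresholds -- n - 1 cut points; class k covers levels [t_k, t_{k+1})
    """
    total = sum(hist)
    probs = [count / total for count in hist]
    bounds = [0] + sorted(thresholds) + [len(hist)]
    objective = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        omega = sum(probs[lo:hi])  # probability mass of this class
        if omega <= 0:
            continue  # an empty class contributes no entropy
        for p in probs[lo:hi]:
            if p > 0:
                objective -= (p / omega) * math.log(p / omega)
    return objective


def best_bilevel_threshold(hist):
    """Exhaustive search over a single threshold (a stand-in for MFO)."""
    return max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))


if __name__ == "__main__":
    toy_hist = [5, 5, 5, 5]  # four equally likely gray levels
    print(best_bilevel_threshold(toy_hist))
```

For multilevel thresholding, the same `kapur_entropy` function accepts a list with several thresholds; an optimizer such as MFO would then search over that list instead of enumerating every combination.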
Mating
The top half of the female and male mayflies mate and produce offspring. The offspring are generated from the parents as follows:

$$offspring_1 = P \cdot male + (1 - P) \cdot female$$

$$offspring_2 = P \cdot female + (1 - P) \cdot male$$

Here, P is a random number drawn from a Gaussian distribution. Some segmented images are shown in Figure 2.
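As a rough sketch of the two MFO operators described in this subsection, the code below applies a position update and a crossover to candidate threshold vectors. It assumes the standard mayfly formulation from [28], in which a location is moved by its velocity and two offspring are blended from a male and a female parent. The function names, the zero-mean unit-variance Gaussian for P, and the rounding of continuous positions to integer gray levels are all illustrative assumptions, not details taken from the paper.

```python
import random


def move(location, velocity):
    """Position update: new location = current location + velocity."""
    return [x + v for x, v in zip(location, velocity)]


def crossover(male, female, rng):
    """Blend two parents into two offspring with a Gaussian random weight P.

    The mean and standard deviation of P are assumptions for illustration.
    """
    p = rng.gauss(0.0, 1.0)
    child1 = [p * m + (1 - p) * f for m, f in zip(male, female)]
    child2 = [p * f + (1 - p) * m for m, f in zip(male, female)]
    return child1, child2


def to_thresholds(location, levels):
    """Map a continuous position to sorted, valid integer thresholds."""
    return sorted(min(levels - 1, max(1, round(x))) for x in location)


if __name__ == "__main__":
    rng = random.Random(0)
    male, female = [60.0, 140.0], [80.0, 180.0]
    first, second = crossover(male, female, rng)
    print(to_thresholds(first, 256), to_thresholds(second, 256))
```

Whatever value P takes, the two offspring always sum to the sum of their parents, so the crossover redistributes the parents' positions rather than shifting the pair as a whole.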
3.2. Feature Extraction (FE)
The proposed approach uses two algorithms, ResNet-V2 and Histogram of Oriented Gradients (HOG), for feature extraction from brain images. ResNet-V2 is a deep neural network architecture that has been shown to be effective in image classification tasks, while HOG is a popular algorithm used for feature extraction in computer vision. After extracting features from the images, a classifier is trained using a Bidirectional Long Short-Term Memory (BiLSTM) network. BiLSTM is a form of recurrent neural network that can learn long-term dependencies in sequential data. In this case, it is used to classify brain images into non-tumorous and tumorous classes based on the features extracted by ResNet-V2 and HOG. Deep learning techniques such as ResNet-V2 and BiLSTM have shown promising results in the field of medical image analysis, including brain tumor detection. Combining different feature extraction algorithms can also improve the accuracy of classification, as different algorithms may capture different aspects of the image information.
3.2.1. Histogram of Oriented Gradients (HOG)
This step extracts low-level features from the tumor using the Histogram of Oriented Gradients (HOG) algorithm. HOG is a popular feature extraction algorithm in computer vision that captures the local gradient statistics of an image. In this approach, the segmented images are provided to a feature extractor block consisting of HOG and ResNet-V2, a deep neural network architecture. The HOG algorithm extracts a total of 1236 low-level features from the segmented images, using 9 bins to capture the gradient orientations. The results can be improved by normalizing the image intensities, although this is considered more valuable when the images are large.
To compute the features, the images are first resized to compatible blocks of size 6 × 6 or smaller, and a stride of 4 is used for each 2 × 2-sized block. The HOG algorithm then computes the gradient magnitude and direction for each pixel in the image, with the direction ranging from 0 to 180 degrees. Pixels with similar orientations are grouped into the same bin. The magnitude $m$ and the direction $\theta$ of the gradient at pixel $(i, j)$ are obtained as follows:

$$m(i, j) = \sqrt{G_x(i, j)^2 + G_y(i, j)^2}$$

$$\theta(i, j) = \arctan\frac{G_y(i, j)}{G_x(i, j)}$$

where $G_x$ and $G_y$ refer to the gradients in the $x$ and $y$ directions, and $\theta$ is the angle, which ranges from 0 to 180 degrees.
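The per-pixel gradient computation and the 9-bin orientation histogram at the core of HOG can be sketched as follows. This is a simplified illustration under stated assumptions: the image is a list of rows of gray values, gradients use central differences with clamped borders, each pixel votes with its full magnitude into a single bin, and block normalization is omitted. The function names are hypothetical and the sketch does not reproduce the 1236-dimensional descriptor used in the paper.

```python
import math


def gradient_magnitude_direction(image):
    """Return per-pixel gradient magnitude and unsigned direction (degrees)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Central differences; indices are clamped at the image border.
            gx = image[i][min(j + 1, cols - 1)] - image[i][max(j - 1, 0)]
            gy = image[min(i + 1, rows - 1)][j] - image[max(i - 1, 0)][j]
            magnitude[i][j] = math.hypot(gx, gy)
            # Fold the signed angle into the unsigned range [0, 180).
            direction[i][j] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, direction


def orientation_histogram(magnitude, direction, bins=9):
    """Accumulate gradient magnitudes into equally wide orientation bins."""
    width = 180.0 / bins
    hist = [0.0] * bins
    for mag_row, dir_row in zip(magnitude, direction):
        for m, d in zip(mag_row, dir_row):
            hist[int(d // width) % bins] += m
    return hist


if __name__ == "__main__":
    ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # intensity grows left to right
    mag, ang = gradient_magnitude_direction(ramp)
    print(orientation_histogram(mag, ang))
```

A full HOG descriptor would compute one such histogram per cell, group cells into overlapping blocks, and normalize each block before concatenating the results.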
3.2.2. ResNet-V2
He et al. [30] proposed ResNet and the residual block, which comprises two convolutional layers and a parameter-free shortcut connection that carries the block's input forward to be added to its output. The modification outperformed the existing unmodified models on the ILSVRC-2012 classification dataset using a 152-layer network, and the authors concluded that increasing the depth of the network improves classification accuracy. After ResNet-V1, the authors improved the residual block so that no ReLU function follows the addition, leaving the shortcut connection as a pure identity mapping and consequently increasing the detection accuracy. The main change in version 2 is the ordering within the stack: batch normalization and ReLU are applied before each of the 1 × 1, 3 × 3, and 1 × 1 2D convolutional layers. The architectures of ResNet-V1 and ResNet-V2 are shown in Figure 3.
3.3. Fusion Process
Feature fusion has been widely applied in various machine learning applications, including medical imaging [
8]. It offers a dynamic approach to combining multiple feature maps, maximizing their integration. The model used for false positive detection relies on entropy. After obtaining the features, they are merged to form a single vector. Three vectors were computed as shown below:
The fusion of features was utilized as presented below:
Here,
presents the feature vectors by ResNetV2, and
presents the feature vectors by HOG. In this context, the feature vector f has undergone fusion. Then, an entropy value is calculated for the chosen features based on the specified value below.
The probability of the features is denoted by and their entropy is presented by . The merged features are ultimately fed into the classifier to distinguish the samples with tumors.
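As a minimal sketch of this stage, the code below fuses two feature vectors and computes an entropy value over the result. It rests on assumptions that are not confirmed by the text: fusion is taken to be serial concatenation of the ResNet-V2 and HOG vectors, feature probabilities are taken to be the normalized absolute feature values, and the entropy is the Shannon entropy in bits. All function names are hypothetical.

```python
import math


def fuse(resnet_features, hog_features):
    """Serial fusion: concatenate the two feature vectors (assumed scheme)."""
    return list(resnet_features) + list(hog_features)


def feature_probabilities(features):
    """Normalize absolute feature values so they sum to 1 (assumed scheme)."""
    magnitudes = [abs(x) for x in features]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("cannot normalize an all-zero feature vector")
    return [m / total for m in magnitudes]


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    fused = fuse([0.2, 0.4], [0.1, 0.3])
    print(len(fused), shannon_entropy(feature_probabilities(fused)))
```

Under these assumptions, a fused vector whose mass is spread evenly across features has high entropy, while one dominated by a few features has low entropy, which is the kind of signal an entropy-based selection step could use.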
3.4. Classification
At this stage, we present our classification model, which was trained on the merged features to obtain optimal performance for detecting brain tumors. We used support vector machines (SVM), a decision tree (DT), and our novel Bidirectional Long Short-Term Memory (BiLSTM) network to categorize the three categories of brain tumors. BiLSTM networks have been shown to provide better predictions than traditional LSTM networks, as they process the input in both the forward and backward directions during training. The input features and weights are passed through multiple layers to generate the output, which is then used to compute the error. The parameters are adjusted during the backpropagation (BP) step to minimize the estimation errors. Furthermore, the BiLSTM layers process the input features sequentially. We set the hyperparameters as follows: the Adam optimizer, a learning rate of 0.001, and a batch size of 32. Our proposed network achieved the best detection results, followed by SVM, while the lowest detection accuracy was obtained using DT. The layer details are presented in Table 2.
5. Conclusions
This study proposes a robust brain tumor detection method based on feature fusion that performs efficiently without requiring the preprocessing of brain samples. The proposed method involves segmentation using the mayfly optimization technique with multilevel thresholding for localizing tumors in brain MRI images. Features are extracted using HOG for local feature mining and ResNet-V2 for high-level feature mining. The merged features are then classified into three categories (pituitary, glioma, and meningioma) using a BiLSTM classifier. We trained and tested our model using the Figshare dataset and evaluated its robustness using the Harvard dataset. The proposed method achieved an accuracy of 99.3%, a recall of 99.1%, a precision of 98.3%, an F1 score of 99.1%, and an AUC of 0.989, outperforming state-of-the-art segmentation and DL-based brain tumor detectors.
This automated method can be used directly by radiologists and oncologists to assist in the early detection of brain tumors. Additionally, the system provides precise tumor locations that can aid physicians in making surgical decisions.
Our approach has a limitation in terms of training time, which could be addressed using high-performance computing systems. We also found that our system may have difficulty predicting the type of brain tumor when the MR images are blurry. To address these issues in the future, we aim to reduce the required training time while maintaining the same level of performance. We also plan to add a preprocessing step to enhance images in situations where high-resolution imaging tools are unavailable. Furthermore, we intend to apply our proposed method to detecting other cancers, such as lung, skin, and bone cancer.