1. Introduction
In 2022, brain and spinal cord primary cancerous tumors will be diagnosed in an estimated 25,050 adults in the United States (14,170 male and 10,880 female). Despite the odds of developing these tumors in one’s lifetime being less than one percent, they represent 85% to 90% of all primary central nervous system (CNS) tumors. With a high mortality rate among adults in the USA (18,280 deaths estimated in 2022), cancer of the CNS is the 10th leading cause of death [
1]. Glioma is the most common primary malignant tumor of the CNS in adults, demonstrating distinct characteristics. It represents the highest prevalent cerebral tumor, which takes place in the cerebral glial tissues comprising nearly 30% of brain tumors. Its incidence is about 5–10 per 100,000 people every year with serious morbidity and high mortality rates [
2]. Despite improvements in medical treatment targeting glioma-specific molecular pathways, the prognosis remains poor. The median survival is less than 15 months following primary diagnosis, and the five-year survival rate is less than 10% [
3,
4,
5]. The tendency of malignant gliomas to infiltrate the brain parenchyma in a diffuse manner and the particular tumor microenvironment, that may promote glioma development, contribute to their resistance to conventional surgical therapy and other treatment strategies.
According to the classification of the World Health Organization (WHO) [
6,
7], gliomas are categorized into four grades based on cellular morphology and malignant tumoral behavior. Grade I gliomas are biologically benign with low risk, while WHO Grade II gliomas are considered low-grade gliomas, with benign tendency, however, they have a considerable recurrence rate. WHO Grade III glioma (anaplastic glioma) and Grade IV glioma (glioblastoma) are considered high grade undifferentiated malignant gliomas, with poor prognosis [
2,
8]. The most aggressive, glioblastoma, has a survival rate of only 6.8% [
9]. Gliomas are graded primarily on the histopathology of tissue obtained through surgical biopsy or resection, which shows the malignancy scale of the tumor. Revised criteria for glioma grading were published with the 2016 revision of the WHO classification, taking into account molecular and genetic information along with the histological features of the tumors, aiming at better prediction of tumor behavior, treatment response, and prognosis [
10,
11,
12]. Precise tumor grading is of immense importance for guiding neuro-oncologists to the proper decisions for tumor treatment planning and consequently better patient prognosis [
13,
14]. Despite the fact that the biopsy is the reference standard for identifying the grade of gliomas, it is not favorable because of high invasiveness, expense, and its adverse effects such as bleeding and infection. Therefore, many researchers were motivated to investigate imaging techniques for a non-invasive, early, and precise grading of gliomas for a timely management plan [
15,
16,
17,
18,
19,
20,
21,
22,
23,
24]. In particular, magnetic resonance imaging (MRI) is the most common imaging modality for the diagnosis and assessment of cerebral neoplasms, including gliomas. Conventional MRI can evaluate the anatomy of the tumor regarding location, morphology, multiplicity, and mass-related effects. Advanced MRI submodalities such as diffusion- and perfusion-weighted imaging, as well as magnetic resonance spectroscopy (MRS), have the ability to provide physiological information of brain tumors. They can provide quantitative metrics of tumor cellularity, vascularization, and metabolism before and after treatment [
12,
25,
26,
27]. Particularly, diffusion-weighted MR (DW-MR) has proven its ability to differentiate gliomas by grade, and in determining the prognosis. In general, higher grades are related to reduced apparent diffusion coefficient (ADC), as water diffusion is substantially decreased in the setting of increased tumor cellularity [
14]. Perfusion MR imaging techniques including arterial spin labeling (ASL) assess tumor vascularity associated with neoangiogenesis [
27]. MRS gives information about the tumor internal biochemical milieu, by assessment of the concentration of different tumoral cell metabolites. In glial brain tumor grading; N-Acetyl Aspartate (NAA) is a neuronal integrity marker, choline (Cho) a cell membrane breakdown and turnover marker while creatinine (Cr) is a metabolism marker. Increased glioma grade is related to increased aggression of tumor cells that is associated with rising in Cho and reduction in both NAA and Cr levels as well as increased Cho/NAA and Cho/Cr ratios [
28].
With the recent advances in artificial intelligence, specifically, machine learning algorithms, and the important role they are playing in the early diagnosis, detection, and prediction of different medical conditions [
29,
30], they have been incorporated in different studies to help with the clinical diagnosis of gliomas [
31]. For example, Zhang et al. [
16] explored the abilities of texture analysis along with machine learning on multimodal MR images to differentiate between glioma grades. Their study included a total of 120 subjects with gliomas, 28 low-grade gliomas (LGGs) and 92 high-grade gliomas (HGGs) (Grade I = 3, Grade II = 25, Grade III = 29, and Grade IV = 63). They extracted texture features from the aforementioned data and then performed an oversampling technique to balance their data. In their study, a total of 25 machine learning classifiers were incorporated for classification purposes. The optimal classification model achieved the highest accuracy of 96.1% in grading gliomas (Grades: II, III, and IV) using oversampling, while achieving 78.6% accuracy on the original data. Despite their promising results, their model did not investigate any functional or appearance features which could have led to higher performance especially with the unbalanced data. Another recent study by Cho and Park [
17] assessed the accuracy for classification of gliomas by characterization of multimodal MRI data from the MICCAI Brain Tumor Segmentation Challenge (BRATs 2015) [
32]. They utilized the logistic regression using shape and texture features to classify gliomas (LGG = 54 and HGG = 54). They reported an overall accuracy of 89.8%. Suarez-Garcia et al. [
18] investigated the role of multimodal-MRI along with texture features to identify HGG from LGG. A total of 285 subjects were obtained (HGG = 210 and LGG = 75) from the BRATs 2018 challenge. They extracted texture features from the gray level size zone matrix. These features are then fed to multiple linear regression models toward finding the best model to classify gliomas. Their best model was reported to have an accuracy of 91.8% distinguishing HGG from LGG. Banerjee et al. [
19] evaluated the abilities of CNN along with multi-sequence MR images to differentiate HGG from LGG. The authors included 746 patients (HGG = 472 and LGG = 274) from The Cancer Imaging Archive (TCIA) [
33] and BRATs. The study proposed two fine-tuned ConvNets and evaluated these models using leave-one-subject-out. They reported their highest accuracy of 95% for classification of gliomas. In another recent study, Alis et al. [
20] utilized the artificial neural networks (ANN) to differentiate HGG from LGG. The authors assessed the diagnostic accuracy of conventional MR images texture analysis. They extracted first and second order texture features from manually-placed ROIs of 181 subjects (HGG = 97 and LGG = 84). Their proposed pre-trained ANN achieved an accuracy of 88.3% discriminating HGG from LGG. Hsieh et al. [
21] evaluated intensity-invariant MR imaging obtained from 107 patients diagnosed with glioma (HGG = 34 and LGG = 73). They extracted texture features and histogram moments from local binary patterns of manually-delineated tumors. These features were then combined in a binary logistic regression classifier to distinguish HGG from LGG. They reported an accuracy of 93%. Six CNN models were developed by Kalaiselvi et al. [
22] for the classification of glioma lesions from volumetric MR scans. Their study included 4500 images extracted randomly from BRATs 2013 (30 volumes) and Whole Brain Atlas (WBA) (8 volumes). Their best model successfully discriminated HGG from LGG with an accuracy of 88.91%. Zhuge et al. [
23] gathered conventional MR images of 315 patients (HGG = 210 and LGG = 105) from TCIA and BRATs 2018 to develop two deep learning models for distinguishing HGG from LGG. With an accuracy of 97.1%, their proposed 3D ConvNet model achieved the best diagnostic performance. Using multimodal MR images of 285 cases (HGG = 210 and LGG = 75) from the BRATs 2017, Cho et al. [
24] assessed the diagnostic performance of radiomic features for discriminating between LGG and HGG. The authors extracted 468 radiomic features and then they selected their optimal features to build three different machine learning classification models. They reported an average accuracy of 88.54% using three different classifiers: logistic regression, SVM, and random forest classifiers.
Despite the fact that these papers represent the state-of-the-art research studies for grading gliomas using MRIs along with computer-aided diagnostic (CAD) systems, they had some limitations that need to be addressed. Most of these studies [
17,
18,
19,
20,
21,
22,
23,
24] only focused on differentiating between LGG and HGG instead of grading these tumors into Grades I, II, III, or IV for a proper management plan. In addition, none of the aforementioned studies incorporated any functional features for a better diagnostic performance. Some of the authors [
16,
18,
20,
21] incorporated texture features in their work, but they did not investigate any shape or appearance features which could have led to higher performance as well.
To circumvent for these drawbacks, we introduce a novel comprehensive glioma grading CAD (GG-CAD) system, shown in
Figure 1. To the best of our knowledge, the GG-CAD system is the first of its kind to integrate novel 3D appearance features, volumetric features, 3D first and second order textural features, with functional features extracted from multimodal MR images to precisely identify the glioma grade as Grade I, II, III, or IV for a proper medical management plan.
2. Materials
Input Data Description: A total of 99 biopsy confirmed glioma tumors (M = 49 and F = 50) with age range between 1 and 79 years and average of 40.15 ± 19.94 years were included (Grade I = 13, Grade II = 22, Grade III = 22, and Grade IV = 42). All patients, themselves or their parents/legal guardians (for minors with age < 18 years), provided their informed consent to participate in this study. Multimodal MR images, namely; T1-MR with pre- and post-contrast phases, T2-MR (FLAIR), and DW-MR, were acquired at Mansoura University Hospital, Egypt.
Imaging Protocols: The MRI examinations were performed using a 1.5 Tesla scanner (Ingena, Philips medical system, Best, The Netherlands). Examination was done with the participants in the supine position, using a standard eight-channel head coil. With a slice thickness = 3 mm and a matrix = 256 × 256, the following sequences were performed: axial T1 (TR/TE = 475/15 ms), axial T2 (TR/TE = 1250/100 ms), and axial FLAIR (TR/TE/TI = 8000/140/2000 ms). For the contrast-enhanced T1-MR, post-contrast axial images were acquired after administration of gadolinium-based contrast agents with a dose of 0.1 mmol/kg. The axial DW-MR were performed with multi-section single-shot spin-echo planar imaging (EPI) sequence (TR/TE/NEX = 3000/88/1) using b-values of 0 and 1000 . It is worth-mentioning that for all modalities, multiple axial cross-sections were obtained to cover the whole volume and were stored in DICOM format.
Reference Standard Diagnosis: Biopsy remains the reference standard to definitively diagnose gliomas and give a prognosis to determine and guide a treatment/management plan by pathologically testing the abnormal tissues within the brain. In most of the included subjects, excisional biopsy was performed. Multiple cores from the solid enhancing portion of the excised tumor were obtained. This solid portion differs from one tumor to another, and hence the number and locations of the cores differ according to the anatomical position of the lesion which correlates with tumor type and prognosis. For subjects with gliomas in hard-to-reach and/or sensitive areas within the brain that might be damaged by a more extensive procedure, a stereotactic needle biopsy was done. During this, a neurosurgeon drilled a small hole into the skull to allow the insertion of the thin needle. Small pieces from the abnormal tissues were then removed through the needle guided by radiological scanning. In both cases, the extracted specimens are sent to be pathologically tested, then the highest pathological glioma grade was assumed for the whole tumor.
It is worth mentioning that this study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Mansoura University (MD.20.01.278).
Glioma Tumor Preprocessing: The more accurate the segmentation is, the more precise feature extraction will be. The initial data for each subject are multiple gray-level images at different modalities stored in DICOM format. These DICOM images were transferred to a workstation (extended MR Workspace release 2.6, Philips Medical Systems, B.V., Eindhoven, The Netherlands). For each subject, regions of interest (ROIs) were manually segmented as binary masks. In the case of the T1-MR and DW-MR; the ROIs identify the tumor itself, while for the T2-MR (FLAIR), the ROIs identify the tumor with its surrounding edema. The segmentation process was performed using an in-house software by two radiologists (blinded from each other while performing the segmentation) with more than 10-years of hands-on experience in analyzing medical images. To generate the ground truth segmentation, the common area between the two observers were extracted, and then an expert radiologist with more than 25-years of hands-on experience in medical image analysis decided whether the difference should be considered as a part of the tumor or normal tissues. Meanwhile, to assess and quantify the agreement between the two segmentations, we performed Bland-Altman analysis [
34]. The analysis showed a high agreement between the two segmentations with a mean absolute error of 40.556 and (−141.705, 131.867) ± 95% confidence interval. The mean-difference plot for the analysis is shown in
Figure 2. Finally, 3D glioma objects were constructed and used for extracting distinguishing features.
3. Methods
The proposed GG-CAD system (
Figure 1) can discriminate between different grades of gliomas by performing the following steps: (i) extracting higher order 3D-appearance features from the segmented glioma tumor by constructing the histogram of oriented gradients (HOG) and estimating the tumor volume from both contrast-enhanced T1-MR and from T2-MR (FLAIR), (ii) calculating first and second order textural features from the segmented tumor based on constructing histogram, gray-level co-occurrence matrix (GLCM), and gray-level run length matrix (GLRLM) by utilizing contrast enhanced T1-MR and T2-MR (FLAIR) imaging, (iii) estimating functional features by estimating 3D ADC maps for the segmented tumor from DW-MR imaging acquired at the
b-value of 1000
and calculating the amount of enhancement between pre- and post-contrast phases of T1-MR imaging, (iv) performing features selection using Gini impurity approach over the integration of all the aforementioned extracted features to get the most significant set of features, and (v) feeding the optimal set of features to a multi-layer perceptron artificial neural networks (MLP-ANN) classification model toward getting the final diagnosis of the tumor as Grade I, II, III, or IV.
3.1. Engineering Features
After preprocessing glioma tumors, the structured objects which represent the different subjects should be expressed as distinguishing, standardized, and machine-understandable features. These features have the ability to discriminate between different subjects through showing the learning algorithm how to interpret the characteristics of each object. The quality of these characteristics defines and enhances our machine learning model predictive abilities. Hence, after consulting the medical team, we had agreed on multiple categories of distinctive features that suit the nature of our problem. Below, we are going in depth with the extracted imaging features.
Higher order 3D-Appearance Features: To obtain a sensitive and specific enough GG-CAD system with the ability to differentiate between different grades of glioma tumors, distinctive parametric higher order appearance features were identified. The motivation for using these 3D-appearance features relies on the hypothesis that gliomas with higher grades have more aggressive growth rates, more complex, rough, and irregular shapes than those with lower grades. Therefore, accurate identification, modeling, and extraction of such descriptors is essential towards a precise diagnosis. In the proposed framework, we identified gliomas by HOG as well as the total tumor volume.
3D-HOG: The HOG descriptor is concerned with an object’s morphological structure/appearance finding a simplified image representation that only includes the most significant details about the image [
35,
36]. The HOG descriptor measures how frequent a gradient orientation occurs in a specific area of an object. In the proposed work, we applied a 3D-HOG approach over the volumes from the T2-MR (FLAIR) and pre- and post-contrast T1-MR. The extracted ROIs are resized to a new shape of (32 × 32 × Number of slices per volume). So, all the volumes have the same length (X) and width (Y), while the depth (Z) is the different dimension depending on the number of 2D slices of each volume. The resized ROI is passed to Algorithm 1 with the number of bins set to 9 and the number of cells set to 4. Finally, the HOG descriptor is formed. We have 4 cells, each cell has 8 histograms, and each histogram is generated over 9 bins. Hence, the total number of features are
features for each volume.
Figure 3 shows the different steps to applying the 3D-HOG approach.
Algorithm 1 Histogram of oriented gradients |
| Input: ROI V of size and voxel size , cell size C and
number of bins B |
| Output: The HOG |
1 | foreach voxel do |
2 | |
3 | |
4 | foreach non-overlapping cell i of size do |
5 | foreach bin j do |
6 | |
7 | |
8 | foreach k in the 8-neighborhood of i do |
9 | |
Textural Features: To enhance the performance of early differentiation between different grades of glioma tumors, we extracted comprehensive first and second order texture features that give precise description for the heterogeneity/homogeneity of the detected gliomas. We were motivated by the hypothesis that high grade glioma tumors are more heterogeneous than low grade gliomas in terms of textural appearance [
37,
38]. An illustrative Figure (see
Figure 4) showing the texture differences between 4 subjects with different glioma grades demonstrates the feasibility of our hypothesis.
All first order textural features were estimated using a normalized empirical histogram. While first-order textural qualities are useful descriptors, they are susceptible to noise. To grant a better quantification of heterogeneity between different grades of tumors, second order textural features (GLCM and GLRLM) were used [
39,
40].
GLCM: The GLCM is used to evaluate the spatial interactions between voxels in a neighborhood block (reference and nearby voxels). The GLCM is a bivariate histogram encoding the frequency of neighboring voxel pairs having specific intensities. To construct the GLCM it is first necessary to identify the intensity range of the region of interest and define the level of quantization. Next, one defines the neighborhood system by specifying the spatial relationship between pairs of voxels that are considered neighbors. Each element
of the GLCM is proportional to the instance of a voxel with intensity
i having a neighboring voxel with intensity
j, and the matrix is normalized such that all elements sum to unity [
39,
41,
42]. For our application, voxel intensities were 8-bit quantized, producing a
GLCM. Each voxel’s neighbors were all those within
mm distance, so the resulting GLCM was approximately symmetric, except for the effect of boundary voxels.
GLRLM: Aside from determining the occurrence frequency of GLCM-represented voxel pairs, GLRLM examines voxel runs to determine the connectivity of voxels. GLRM encodes the number of times a run of
n consecutive voxels occurs with the same intensity. As with the GLCM, we first define the gray level range and its quantization (
in 8 bits), which fixes the number of rows in the GLRLM. The column dimension is the maximum extent of the region of interest in the dimension where runs are being observed. Then element
of the GLRLM is the relative instance of a run of
j consecutive voxels all having intensity
i [
40,
41,
42]. Because of the difference between slice spacing and pixel spacing in the MRI data, we constructed one GLRLM for voxel runs in the
x and
y directions, within the same MRI slice, and another for runs in the
z direction, i.e., across slices at the same
location in each slice. See
Table A1,
Table A2,
Table A3,
Table A4 and
Table A5 for a detailed description and formulas of the extracted textural features.
Functional Features: Functionality of gliomas has direct effects on the imaging which can lead to enhancing the abilities of the model in identifying the grade of glioma tumors. Therefore, we investigated two different functional features from T1-MR and DW-MR to capture the functionality aspects of different glioma tumors.
Contrast-enhancement slopes (T1-MR): Hyperenhancement (contrast-enhancement) can characterize the functionality of a given glioma tumor. The contrast-enhancement slope can be estimated between pre- and post-contrast phases of T1-MR. The enhancement leads to remarkable changes in the gray values. These temporal changes enable the construction of contrast-enhancement slope. This slope is calculated by getting the gray-level intensity change rate over the time period between the two phases [
43,
44]. HGG tumors might demonstrate higher and faster slopes than those of LGG tumors.
ADCs (DW-MR): The diffusion-weighted MR signal is affected by diffusion of water through the tissue and also by capillary perfusion. Pathological tissues, such as gliomas, are likely to have unusual diffusivity properties relative to surrounding healthy tissue. Hence, well-known functional parameters called apparent diffusion coefficients (ADCs) can be estimated from different gradient field strength and duration (
b-value). Higher
b-values produce greater attenuation of the DW-MR signal intensity. The voxel-wise ADC values [
45] at a certain
b-value can be calculated as shown in
Figure 5. Subsequently, we set the
b-value to 1000
and calculated ADC accordingly for each voxel within the segmented tumor. Using this set of ADCs directly as a descriptive feature is problematic, since every segmented region includes a different number of voxels. Therefore, we standardized the length of the descriptor by binning the ADC measurements and calculating their cumulative distribution function, or CDF (
Figure 5).
3.2. Classification and Hyper-Parameters Tuning
After deciding on the extracted discriminating features: appearance, textural, and functional for all glioma tumors from three different MR modalities, namely; contrast-enhanced T1-MR, T2-MR (FLAIR), and DW-MR, we proceeded with the classification process to differentiate between the different glioma grades by utilizing MLP-ANN classification models.
MLP-ANNs are well known machine learning classifiers that mainly include three types of layers: an input layer, one or more hidden layers, and an output layer, each with a large number of activation/processing units called nodes/neurons. The network is structured assuring that a full connection occurs between adjacent layers. The MLP-ANN can partition the input space of a feature into arbitrarily complex regions using non-linear activation functions used by neurons. In order to minimize the loss function, the MLP-ANN primarily employs a supervised backpropagation learning technique, where the connection weights and additive biases can be updated in the training phase using gradient descent methods [
46].
Hyper-Parameters Tuning: To obtain the optimal set of MLP-ANN hyper-parameters, an in-depth grid search algorithm was employed. In the grid search, a grid containing different combinations of hyper-parameters is set up and then the MLP-ANN is trained/tested on each of the possible combinations searching for the best diagnostic performance. During the search process the accuracy of diagnosis was used as a metric for optimization purposes. In order to avoid using some hyper-parameters leading to a good performance on the training data but not so good with the test data, the hyper-parameters optimization is implemented using leave-one-subject-out (LOSO) cross-validation on all of the extracted features from all subjects within the dataset. Finally, the following parameters were approved for the proposed GG-CAD system using MLP-ANN (trainfcn: trainlm, max epochs = 500, hidden layers: hl = 200, hl = 100, hl = 50, goal = 0, max validation failure = 6, min gradient = 10, training gain (): initial = 0.001, decrease factor = 0.1, increase factor = 10, ).
3.3. Engineering Features Selection
Typically, a features selection approach is utilized to select the optimal relevant features from a wide cohort of prospective features. This technique yields
m features from a set of
n options, where
, and
m is the smallest set of significant and important features. Here, we applied a Gini impurity-based selection [
41].
Gini Impurity-based Selection: Besides their reputation as one of the robust machine learning classifiers, random forests can be used as feature selectors. This is due to the fact that random forests’ tree-based techniques naturally rely on how effectively the purity of the node is strengthened. This indicates an impurity deterioration across all trees, which is called Gini impurity. The nodes with the highest drop in impurity are found at the beginning of the trees, while the nodes with the lowest impurity drop are found at the end. By trimming trees below a certain node or a certain impurity threshold, an optimal subset of the most significant features can be selected. The process of adopting this selection strategy are shown in Algorithm 2. To discover the best set of features to employ for the learning process, we used the Gini impurity-based technique on the entire set of the integrated features. Here, searching for the optimal impurity threshold value to be used, we applied three alternative scenarios for the selection process through using three different values of impurity threshold. Using LOSO cross-validation approach, each reduced features set is utilized (trained/tested) on the MLP classification model. Then, we compared the diagnostic performance of the developed GG-CAD system using all the extracted features with using the three reduced sets of features. From
Table 1, we can find that using an impurity threshold of 0.001 gave the best performance and that a set of 332 selected features is the optimal set to be used for the proposed model.
Table 2 shows the number of selected components from each group of features. Details of the selected features for the final model are shown in
Table A6.
Algorithm 2 Gini-Impurity-based Selection Process |
1 Prepare the integrated features set. |
2 Model and train a Random Forests classifier. |
3 Set the impurity threshold. |
4 Determine the features with the highest significance. |
5 Build the final set that includes only these selected features. |
3.4. Experiments
To validate the performance of the proposed GG-CAD system, the following experiments were performed:
Experiment I: To assess the final diagnostic performance of the proposed GG-CAD system, the optimal subset of features that was selected using the Gini impurity approach, were fed to a hyper-tuned MLP-ANN classification model.
It is worth mentioning that the diagnostic performance of the GG-CAD system was evaluated using a LOSO and k-fold (with k = 5 and 10) stratified cross-validation approach on the aforementioned data. For the LOSO, all data except one subject are used to train the classification model. Before the next iteration, the classification model is reinitialized, and the observation that was previously left out of the training data is included in the training data, leaving the next subject out for testing purposes. This method is performed 99 times (the included subjects in the study), with the training and testing samples being 98 and 1, respectively, at each iteration. For the k-fold stratified cross-validation, a subset of the data is chosen at random and saved for testing, while the rest () is utilized for training. In the next iteration, the classification model is reinitialized, and the subjects from the previous iteration are included in the training, leaving the next group of subjects for testing. This procedure is performed k times.
Stratification was important to eliminate any opportunity for bias and variance during the execution of k-fold cross-validation. The stratification technique not only achieves randomization, but it also assures that original distribution of subjects over different classes in the overall data set is maintained in the training/testing sets. To achieve stratification in our case, 13% of the training/testing sets were derived from Grade I, 22% from Grade II, 22% from Grade III, and 43% from Grade IV. These randomly stratified k-fold cross-validation approaches (with k = 5 and 10) were utilized to ensure the that the proposed model is robust and not prone to overfitting.
Experiment II: To show the added value of each group of extracted features and highlight the importance of the features integration process, we measured monothetic classifier performance using each individual group of features in turn.
Experiment III: Finally, to appreciate the diagnostic performance obtained by the developed GG-CAD system, we applied two different approaches from the literature [
20,
22] on our dataset (N = 99) and with the intended classification problem of glioma grading (GG-I vs. GG-II vs. GG-III vs. GG-IV) for a fair comparison. Then, we compared the final diagnostic results obtained by the developed GG-CAD system with those obtained by the two different approaches.
Evaluation Metrics: Using quadratic-weighted Cohen’s kappa [
47] and accuracy as the evaluation metrics during the classification, all experimental results (
Section 4) were documented in terms of mean ± standard deviation over 15 times of repetition.
5. Discussion
In adults, gliomas have the highest rates of prevalence and incidence among brain tumors and cause significant mortality and morbidity. They can occur anywhere in the central nervous system but primarily occur in the brain and arise in the glial tissue namely astrocytes, oligodendrocytes, and ependymal cells. Based on histological characteristics, the WHO traditionally classifies gliomas into four different grades. Grade I are solid and non-infiltrative tumors (pilocytic astrocytomas), while grades II–IV are diffuse infiltrating gliomas. Each grade of the glioma has a different treatment plan, which increases the importance of accurate grading of gliomas. Despite its demerits, surgical biopsy remains the gold standard for grading gliomas. Therefore, studying the potential for MRI-based grading of gliomas using CAD systems as a replacement for biopsy has been largely investigated with an ongoing interest [
16,
17,
18,
19,
20,
21,
22,
23,
24].
In this study, the proposed GG-CAD system showed high diagnostic abilities in differentiation between different grades of glioma tumors. This precise tumor grading can guide neuro-oncologists to a proper decision for tumor treatment planning and better prognosis. Conventional MRI can assess the anatomy of tumor and advanced MRI modalities can provide quantitative assessment of tumor metabolic features, tumor cellularity, and vascularity. Hence, three different MR imaging modalities, namely; contrast-enhanced T1-MR, T2-MR (FLAIR), and DW-MR were utilized to provide different aspects of features. Using the most significant sets of the extracted discriminating features along with one of the most powerful machine learning classification techniques like MLP-ANN is proven to be highly efficient in determining the malignancy grade of a given glioma tumor.
In this work, the extracted ROIs from the images of different modalities were stacked and rendered in 3D objects representing the subjects. These 3D objects comprise multiple voxels representing the tumor itself (in contrast-enhanced T1-MR and DW-MR) or the tumor and its surrounding edema (in T2-MR (FLAIR)). Numerous histopathological parameters control the signal strength of each voxel displaying a specific gray-scale value. As a result, 3D arrays of gray-scale values in ROIs may reveal sophisticated geometric patterns that are specific to tumor grades, even if they are visually unrecognizable.
Hence, texture analysis was performed in our research. Texture analysis gives an effective description of how the gray-level of each voxel in a particular area affects the overall distribution of the values of voxels. These texture data have shown a significant impact on the performance of classification systems in a variety of studies [
16,
17,
18,
20,
21,
24].
The developed model utilized first and second order texture features using different methods and algorithms of texture analysis. First-order features or histogram-based features express the histogram representing the distribution of image intensity, which describe how the intensity signals of voxels are distributed over the tumor. So, these features neglect the spatial orientation and relationship between voxels. Second-order features are statistical relationships between adjacent voxels, or groups of voxels, in terms of intensity levels. Intratumoral heterogeneity is quantified by these features. These features are derived from quantitatively describing matrices encoding precise spatial relationships between voxels in a specific area in the tumor. We’ve incorporated both GLCM and GLRLM in our research [
39,
40]. While the GLCM shows how frequently two intensity levels appear in adjacent voxels within the tumor’s object, the GLRLM encodes the size of homogeneous runs in the same object.
In addition to the role of texture analysis in specifying the grade of gliomas, the severity of a glioma determines the 3D-appearance and shape of both the tumor and the surrounding edema. Higher grade gliomas appear to be much more complex and have sharper edges than these lower grade gliomas. This motivated us to measure 3D-appearance features to capture the potential shape differences between different grades of gliomas. Additionally, the volumes of the tumor itself and the tumor with the surrounding edema are also calculated.
One of the most beneficial aspects was to capture the functionality of different grades of gliomas that might be a key point towards achieving our goal. Functionality of gliomas has an effect on the imaging which can lead to enhancing the abilities of the model towards identifying the grade of glioma tumors. 3D DW-MR of each glioma was acquired using and s/mm). ADCs at the nonzero b-value were then calculated to capture the functionality differences between subjects with different grades. In contrast-enhanced T1-MR, the contrast-enhancement changes which differ according to the severity of the tumor and this difference is also captured. Contrast-enhancement slopes detecting the changes in gray-level distribution between pre- and post-contrast phases are estimated to quantify the enhancement variations between the different grades.
In the classification process, we incorporated all of the features with machine learning models. Most of these characteristics differed significantly amongst glioma grades, however there was still a significant overlap. Even when the most appropriate MR sequence has been utilized, such diversity negatively affects the ability of using a single feature class to accurately identify the glioma grade. On the other hand, using an optimal set of significant features selected from the combination of all features provided a better feature representation to detect the grades of different gliomas. With high classification performance, the developed GG-CAD system discriminated between the different grades of gliomas using the optimal set of features selected from the integration of all extracted features. The obtained results demonstrate the diagnostic abilities of the proposed model as well as the clinical utility of our methods when combined with MR imaging in the computer-aided diagnosis of brain malignancies. These results are reported in
Table 1,
Table 3,
Table 4 and
Table 5. Moreover,
Figure 6 shows the confusion matrix of the final proposed model.