Article

Traumatic Brain Injury Structure Detection Using Advanced Wavelet Transformation Fusion Algorithm with Proposed CNN-ViT

1 Department of Computer Sciences, Bahria University, Lahore 54600, Pakistan
2 Center for Computing Research, Instituto Politécnico Nacional, Mexico City 07738, Mexico
3 Department of Allied Health Science, Superior University, Lahore 54000, Pakistan
4 Centre for Artificial Intelligence Research and Optimization, Design and Creative Technology Vertical, Torrens University Australia, Ultimo, NSW 2007, Australia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2024, 15(10), 612; https://doi.org/10.3390/info15100612
Submission received: 29 August 2024 / Revised: 1 October 2024 / Accepted: 2 October 2024 / Published: 6 October 2024
(This article belongs to the Special Issue Real-World Applications of Machine Learning Techniques)

Abstract:
Detecting Traumatic Brain Injuries (TBI) through imaging remains challenging due to limited sensitivity in current methods. This study addresses the gap by proposing a novel approach integrating deep-learning algorithms and advanced image-fusion techniques to enhance detection accuracy. The method combines contextual and visual models to effectively assess injury status. Using a dataset of repeat mild TBI (mTBI) cases, we compared various image-fusion algorithms: PCA (89.5%), SWT (89.69%), DCT (89.08%), HIS (83.3%), and averaging (80.99%). Our proposed hybrid model achieved a significantly higher accuracy of 98.78%, demonstrating superior performance. Metrics including Dice coefficient (98%), sensitivity (97%), and specificity (98%) verified that the strategy is efficient in improving image quality and feature extraction. Additional validations with “entropy”, “average pixel intensity”, “standard deviation”, “correlation coefficient”, and “edge similarity measure” confirmed the robustness of the fused images. The hybrid CNN-ViT model, integrating curvelet transform features, was trained and validated on a comprehensive dataset of 24 types of brain injuries. The overall accuracy was 99.8%, with precision, recall, and F1-score of 99.8%. The “average PSNR” was 39.0 dB, “SSIM” was 0.99, and MI was 1.0. Cross-validation across five folds proved the model’s “dependability” and “generalizability”. In conclusion, this study introduces a promising method for TBI detection, leveraging advanced image-fusion and deep-learning techniques, significantly enhancing medical imaging and diagnostic capabilities for brain injuries.

1. Introduction

Traumatic Brain Injury is a leading cause of death and disability globally, particularly among children and young people [1]. In the United States, there are 2.8 million TBI-related emergency visits, hospitalizations, and deaths each year, with around 50,000 deaths. Major causes include falls, being struck, vehicle crashes, and assaults. Survivors often face lifelong disabilities, and untreated TBIs can worsen, increasing mortality rates [2]. The economic cost is USD 76.5 billion annually. Early and accurate diagnosis significantly reduces mortality and improves outcomes [3]. Machine learning and imaging advances enhance TBI detection and management, reducing untreated or misdiagnosed cases. Image fusion is a vital process employed to extract and combine critical information from several input images into a single output image, considerably improving its quality and utility across various applications. This technique is widely used in robotics, multimedia systems, medical imaging, quality control in manufacturing, electronic circuit analysis, advanced diagnostics, and more. The required quality of the fused image largely depends on the specific application [4].
Image fusion typically involves the integration of several images of the same scene or objects, resulting in a final output image that surpasses the individual input images in terms of information and quality [5]. Various image-fusion methods, including multi-view, multimodal, multi-temporal, and multi-focus, enable the amalgamation of diverse information sources into a comprehensive image. The evaluation of image fusion quality is often subject to human observation, with human observers determining the adequacy of the fused image [6]. Despite ongoing efforts to establish objective quality measures based on human visual system models, a universally accepted metric for image quality remains elusive, as visual perception remains incompletely understood.
In medical applications, image fusion plays a pivotal role in combining clinical images from different modalities [7], such as “CT”, “MRI”, “SPECT”, and “PET”, to create more informative images that aid in diagnosis and treatment planning, benefiting both clinicians and patients [8,9]. However, existing methods in traumatic brain injury (TBI) detection have primarily focused on binary classification tasks, such as distinguishing between normal and abnormal brain conditions or dealing with a limited number of classes [10]. These methods often suffer from low accuracy and cannot handle multiclass classification problems effectively [11]. The proposed approach in this research overcomes these limitations by introducing a novel image-fusion and deep-learning-based method for multiclass TBI detection, encompassing 24 distinct classes with high accuracy.
The primary issues with existing TBI detection methods are their limited classification scope and insufficient accuracy [12]. Most current approaches focus on binary or few-class classifications, which do not cover the full spectrum of possible brain injuries [13]. Moreover, these methods often rely on single-modal imaging techniques, which fail to capture the comprehensive details needed for accurate diagnosis [14]. A reliable multiclass TBI detection system enhances diagnostic accuracy and treatment planning. This study attempts to produce a more reliable and detailed detection system by merging different imaging modalities and using advanced deep-learning techniques. This approach not only enhances the diagnostic capabilities but also contributes to the broader scientific community by addressing the gaps in existing methods.
To increase the impact of TBI detection research, it is essential to develop systems that can accurately classify a wide range of injury types. Current methods are limited in scope and accuracy, often failing to provide the detailed information needed for effective diagnosis and treatment. By advancing the state of TBI detection research through multiclass classification, this work aims to set a new standard in the field. This research introduces a novel approach to TBI detection by integrating image-fusion techniques with a deep-learning framework capable of handling 24 different classes of brain injuries. The proposed method demonstrates high accuracy, precision, recall, and F1-score. Existing approaches are substantially outperformed through the use of fusion algorithms such as IHS (Intensity-Hue-Saturation), PCA (Principal Component Analysis), DWT (Discrete Wavelet Transform), SWT (Stationary Wavelet Transform), and averaging. Furthermore, quality assessment criteria such as the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mutual Information (MI) validate the usefulness of the suggested approach. This research addresses these issues by developing a comprehensive multiclass detection system that leverages advanced image-fusion and deep-learning techniques [15]. The proposed hybrid CNN-ViT model combines multimodal feature extraction with visual-contextual modelling, outperforming state-of-the-art techniques and paving the way for improved traumatic brain injury diagnosis and treatment. The suggested method raises the bar for TBI detection research while improving diagnostic capabilities and adding to the body of knowledge within the scientific community. The remainder of this paper is organized as follows: Section 1 introduces the research, followed by Section 2 (Related Work), Section 3 (Foreground Knowledge), Section 4 (The Proposed Research Approach), Section 5 (Framework Architecture), Section 6 (Proposed Fusion Approach), Section 7 (Technical Description of the Novel Hybrid CNN-ViT Model Architecture), Section 8 (Results of Visual and Contextual Modeling), Section 9 (Performance Metrics Evaluation of Models), Section 10 (Discussion and Results), Section 11 (State-of-the-Art Comparison with Existing Techniques), Section 12 (Conclusions), and finally, Appendix A.

2. Related Work

Our article discusses the significance of image fusion, a process that combines multiple images by extracting important features. It introduces the Dual-Tree Complex Wavelet Transform (DT-CWT) as an improvement over traditional methods, addressing issues like ringing artifacts. This technique holds promise for enhancing image quality in various applications. In related work, one study evaluates the performance of four image processing techniques (IHS, Weighted Average, PCA, DWT) using four image datasets. Performance is measured using SNR (Signal-to-Noise Ratio), RMSE (Root Mean Squared Error), and PSNR. The primary contribution is analyzing each technique individually, avoiding hybrid approaches [16]. In the proposed method, which combines PCA and DWT, various image quality metrics are employed, such as average gradient (g), standard deviation (STD), entropy (E), PSNR, QAB/F, and MI. This approach targets medical disease analysis, specifically through medical image-fusion techniques, and effectively combines elements from both traditional and hybrid fusion techniques for multimodal therapeutic images [17]. Another study fuses DWI and ADC image views using different algorithms and achieved a high accuracy rate of 90% with a dataset of 90 images. Its main contribution is a thorough literature review of previous experiments, highlighting the performance of various fusion techniques and their associated accuracies [18].
This work includes a variety of machine-learning classifiers, including "Multi-Layer Perceptron" Neural Networks, "J48 decision trees", "Random Forest", and "K-Nearest Neighbor". These classifiers are used on a training dataset with computational complexity "O(M(mn log n))". Evaluation metrics include "Accuracy", "Sensitivity", "Specificity", "Mean Square Error", "Precision", and "Time (sec)". Decision trees often achieve the highest accuracy in specific scenarios, while sensitivity is crucial for detecting malignant tumors [19]. The process divides images into LL (low-low) and LH (low-high) subbands using the Haar wavelet algorithm, which makes it easier to split regions, find edges, and analyze histopathology. Evaluation is conducted on important metrics, including SNR, execution time, and histograms. This method significantly reduces the execution time while continuing to enhance bone distance detection [20]. Ye et al. [21] proposed a 3D hybrid "CNN-RNN" model for the detection of "intracranial hemorrhage" (ICH). It achieved >0.98 across all metrics for binary classification and >0.8 AUC for five subtype classifications on non-contrast head CT scans, outperforming junior radiology trainees. The model processed scans in under 30 s on average, showing potential for fast clinical application [21]. Li et al. [22] developed the "Slice Dependencies Learning Model" (SDLM) to classify brain illnesses with multiple labels based on full-slice "CT scans". It earned a "precision" of 67.57%, "recall" of 61.04%, an "F1 score" of 0.6412, and an AUC of 0.8934 on the CQ500 dataset. However, further improvements are needed to handle slice dependencies in detecting intracranial hemorrhage (ICH) and diverse disease interactions [22]. Wang et al. focused on acute ICH detection using deep learning on the "2019-RSNA" Brain CT "Hemorrhage Challenge dataset". Their model achieved AUCs of 0.988 for ICH and high scores for subtypes, demonstrating robust performance but requiring further validation in varied clinical settings [23]. Salehinejad et al. validated a machine-learning model for detecting ICH using non-contrast CT scans. Trained on the RSNA dataset, it scored 98.4% AUC on the test set and 95.4% on real-world validation, indicating its generalizability across varied patient demographics and clinical contexts [24]. Alis et al. tested a "CNN-RNN" model with an attention mechanism to detect "ICH" across five centres. It obtained 99.41% accuracy, 99.70% sensitivity, and 98.91% specificity on a validation set of 5211 head CT images, with encouraging findings for rapid clinical decision-making [25].
Image-fusion techniques like the Doubletree Complex Wavelet Transform (CWT) improve image quality by reducing artefacts, but traditional methods such as IHS, Weighted Average, PCA, and DWT lack robustness for medical imaging. The combination of PCA and DWT enhances medical disease analysis but reveals gaps in hybrid approaches for complex scenarios. Machine-learning models, including CNN-RNN hybrids, show high accuracy in detecting intracranial hemorrhage and brain diseases yet require further validation in diverse clinical settings and better management of slice dependencies and disease interactions. Integrating traditional and advanced techniques is essential for reliable medical image fusion.

3. Foreground Knowledge

Image fusion is a crucial technique in medical imaging, combining data from various modalities like radiography, MRI, nuclear medicine, ultrasound, and infrared imaging [26]. It helps train machine-learning algorithms to detect brain abnormalities, using annotations from medical experts [27,28]. While fusion is powerful, other imaging methods based solely on it are also being explored [29]. Medical images combine high spatial resolution and high-intensity data in one image, and combining multimodal clinical images significantly improves the quality of the fused image [30]. Image fusion is the process of integrating multiple input images into a single output image that provides a more accurate representation of the scene than any of the constituent inputs. Image fusion is essential for achieving high resolution on "panchromatic" and "multispectral" scales [31]. Various algorithms, including "PCA" [32,33], "IHS" [34,35], "DCTW" [36], and "SWT" [37], have emerged for structural and anomaly detection in medical images, making them adaptable to different scenarios [38,39]. This synergy enhances medical imaging, aiding accurate diagnoses and patient care, as shown in Figure 1, which presents a hierarchy of the different types of image-fusion algorithms.

4. The Proposed Research Approach

Traumatic Brain Injuries (TBIs) are a diverse and complex category of injuries that result from a sudden and violent impact on the head. They encompass a wide range of conditions, from mild concussions to severe brain damage, and are often associated with accidents, falls, sports-related incidents, and more, as shown in Figure 2. TBIs can have profound and lasting effects on an individual's physical, cognitive, and emotional well-being. Early detection and accurate diagnosis are essential for providing timely and appropriate medical care to those affected by TBIs. In this context, the proposed framework aims to develop an advanced TBI detection method using image-fusion and deep-learning techniques. It leverages "Principal Component Analysis" (PCA), the "IHS (Intensity-Hue-Saturation) algorithm", the "Discrete Complex Wavelet Transform" (DCWT), and the "Stationary Wavelet Transform" (SWT) in combination to enhance image quality and features, together with the proposed hybrid deep-learning model "CNN-ViT" for early diagnosis with optimal accuracy.

Data Set Description

This study used the TBI-RADS (Traumatic Brain Injury) neuroimaging data from a group of "119 patients", part of the ACR Reporting and Data Systems (RADS) available in the American College of Radiology repository. The collection comprises around "24,000" unique images, as shown in Table 1. These images are classified into several categories, most likely reflecting different diagnostic or "pathological disorders" associated with "traumatic brain injury" (TBI). The "principal imaging modalities" used in this dataset are "nonenhanced head CT" ("Computed Tomography") and "MRI" ("Magnetic Resonance Imaging"). "CT scans" are regarded as the gold standard for imaging acute "TBI" due to their speed and accuracy in detecting both primary and secondary damage that may require "neurosurgical" intervention. These injuries include massive bleeding, herniation, and infarction. "MRI", on the other hand, provides increased sensitivity for detecting certain "intracranial injuries", such as "epidural hematoma" (EDH), "subdural hematoma" (SDH), "nonhemorrhagic cortical contusions", "brainstem injuries", and "white matter axonal" damage. Furthermore, "MRI" is useful for detecting "diffuse axonal injuries" (DAI), which can appear as either "hemorrhagic" or "nonhemorrhagic" lesions. In acute situations, "DAI lesions" may show restricted diffusion. "MRI" can identify common regions for "traumatic axonal injuries" (TAI), such as the "gray–white matter" junction, "corpus callosum" (particularly the splenium), internal capsules, and "dorsal midbrain/pons", as shown in Figure 3 [40].

5. Framework Architecture

Data Preprocessing

  • Mapping of Traumatic Injury Class Names: The following Table 2 provides the mapping of classes from the dataset [40]:
    Table 2. Mapping of Traumatic Injury Class Names to Dataset Labels.
    Class Name | Label
    Depressed Skull | 1
    Maxillofacial Fractures | 2
    EDH | 3
    SDH | 4
    Hemorrhagic Contusions | 5
    Hyperacute EDH | 6
    Penetrating Trauma | 7
    Enlarging Frontal Lobe | 8
    Skull Base Fracture | 9
    Zygomaticosphenoid Fracture | 10
    Diffuse Brain Edema | 11
    Focal Depressed Skull | 12
    Posterior Fossa Fracture | 13
    Mixed Hyperacute EDH | 14
    Anterior Frontal Contusions | 15
    Subarachnoid Hemorrhage | 16
    Interpeduncular Hemorrhage | 17
    Septum Pellucidum Hemorrhage | 18
    Delayed Subdural Hygromas | 19
    Evolving Bifrontal Contusions | 20
    Enlarging Bilateral Contusions | 21
    Focal Axonal Injury | 22
    Callosal Axonal Injury | 23
    Frontal Bone Fracture | 24
    This table lists the specific traumatic injury class names as labeled in the dataset, along with their corresponding numerical labels used for classification.
  • Resizing: Resizing is converting the dimensions of brain images to a common format as shown in Figure 4. This assures that all images are the same size, which improves analytical consistency and reduces processing complexity.
  • Normalization: Normalization is used to adjust the pixel values of images to a standard range, usually between “0” and “1” as shown in Figure 4. This method improves image comparability by establishing a consistent intensity scale.
  • Contrast Enhancement: Contrast enhancement techniques are applied to improve the visibility of subtle details within the brain images as shown in Figure 4. This process increases the distinction between different regions, making abnormalities more apparent [41].
  • Preserve Annotation: Annotation preservation ensures that any critical metadata or annotations associated with the images are retained as shown in Figure 4. This information can be vital for reference and analysis, especially in a medical context [42].
  • Data Augmentation: Data augmentation techniques, such as “random rotations”, “shifts”, “flips”, and “CutMix” data augmentation, are used to artificially improve dataset diversity as shown in Figure 4. “Random rotations”, “shifts”, and “flips” help to diversify the training data, whereas “CutMix” randomly mixes patches from various training images, enabling the model to acquire more robust features. These strategies improve the model’s generalization and capacity to detect “Traumatic Brain Injuries” (TBIs) under different settings [43].
These preprocessing steps collectively contribute to the preparation of a high-quality and standardized dataset for subsequent analysis and modeling.
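To make the pipeline concrete, the following is a minimal Python sketch of the preprocessing steps described above, assuming an OpenCV/Keras workflow; the target size, CLAHE settings, and augmentation ranges are illustrative assumptions rather than the exact values used in this study.

```python
import numpy as np
import cv2  # OpenCV for resizing and contrast enhancement
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess_image(image, target_size=(128, 128)):
    """Resize, contrast-enhance, and normalize one 8-bit BGR brain image.
    Parameter values are illustrative, not the study's exact settings."""
    # Resizing: bring every image to a common spatial size
    image = cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)
    # Contrast enhancement: CLAHE applied to the luminance channel
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[:, :, 0] = clahe.apply(lab[:, :, 0])
    image = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    # Normalization: scale pixel values to the [0, 1] range
    return image.astype(np.float32) / 255.0

# Data augmentation: random rotations, shifts, and flips
# (CutMix would be applied separately at the batch level).
augmenter = ImageDataGenerator(rotation_range=15,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               horizontal_flip=True)
```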

6. Proposed Fusion Approach

In the proposed framework for detecting traumatic brain injuries, a critical step involves extracting essential image features through a series of meticulous processes as shown in Figure 5. To enable localized analysis and gain insights into specific regions of interest within the brain images, a sliding window approach is employed. This technique divides the preprocessed brain images into smaller tiles, paving the way for focused examination.
To organize the extracted information systematically, an empty container named “curvelet_coeffs” is initialized [44]. This container serves as a repository for the crucial curvelet coefficients that will be calculated in the subsequent stages of the algorithm. The heart of the operation lies in the iterative process that follows. Nested loops, controlled by variables “i” and “j”, meticulously traverse through each tile within the brain image. This exhaustive examination ensures that no area is left unexplored, allowing for comprehensive analysis.
For each tile encountered in this iterative process, mathematical operations are applied to extract significant information. The Fast Fourier Transform (FFT) is calculated for each tile, revealing valuable frequency domain details [45]. To further enhance the analysis, a custom function known as "polarFFT" is employed to transform the FFT data into polar coordinates, producing "polar_fft" and its corresponding angles. The "polar_fft" is then translated using another custom function called "translatePolarWedges". This translation process results in "translated_polar_fft", which is used in the subsequent analysis.
To make the data more amenable to further processing, a parallelogram is wrapped around the origin of the “translated_polar_fft”. This operation, carried out by the “wrapParallelogram” function, yields “wrapped_parallelogram”. In a crucial step towards feature extraction, the inverse Fast Fourier Transform (FFT) is applied to the wrapped parallelogram. MATLAB’s “ifft2” function performs this operation, transforming “wrapped_parallelogram” into “curvelet_array”. To prepare the obtained data for comprehensive analysis, “curvelet_array” is reshaped into a one-dimensional vector format using the notation “curvelet_array(:)”. This reshaping simplifies data representation and enhances its utility.
The process culminates in the accumulation of curvelet coefficients from all processed tiles. The "curvelet_coeffs" container, which was initially empty, now contains these concatenated coefficients, representing critical image features. These meticulously extracted "curvelet coefficients", when combined with features obtained from the "Stationary Wavelet Transform" (SWT), "Discrete Cosine Transform" (DCT), and "Principal Component Analysis" (PCA), form a rich feature set. Redundancy and feature extraction are balanced using a sliding window technique with a step size of 10–20 and a window size of 20–50. This is combined with a multimodal fusion of curvelet coefficients, DCT, SWT, and PCA to improve the diagnosis of traumatic brain injury by capturing global patterns, localized frequencies, multi-resolution details, and dimensionality reduction. The PCA elbow approach retains 95% of the variance, which enhances data representation, lowers noise, and increases classification accuracy.
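The steps above were implemented in MATLAB with the custom functions "polarFFT", "translatePolarWedges", and "wrapParallelogram", whose bodies are not published. The following NumPy outline sketches the same tile-wise flow under that assumption, with placeholders standing in for the custom polar/wedge steps; the window and step sizes fall within the ranges quoted above.

```python
import numpy as np

# Placeholders standing in for the unpublished custom MATLAB routines
# ("polarFFT", "translatePolarWedges", "wrapParallelogram").
def polar_fft_placeholder(tile_fft):
    return tile_fft, np.angle(tile_fft)

def translate_polar_wedges_placeholder(polar_fft, angles):
    return polar_fft

def wrap_parallelogram_placeholder(translated_polar_fft):
    return translated_polar_fft

def extract_curvelet_like_coeffs(image, window=32, step=16):
    """Tile-wise frequency-domain feature extraction following the text:
    sliding window, FFT, polar/wedge steps, inverse FFT, flattening."""
    curvelet_coeffs = []                              # initially empty container
    rows, cols = image.shape
    for i in range(0, rows - window + 1, step):       # nested loops over tiles
        for j in range(0, cols - window + 1, step):
            tile = image[i:i + window, j:j + window]
            tile_fft = np.fft.fft2(tile)                         # FFT of the tile
            polar_fft, angles = polar_fft_placeholder(tile_fft)
            translated = translate_polar_wedges_placeholder(polar_fft, angles)
            wrapped = wrap_parallelogram_placeholder(translated)
            curvelet_array = np.fft.ifft2(wrapped)               # inverse FFT
            curvelet_coeffs.append(np.abs(curvelet_array).ravel())
    return np.concatenate(curvelet_coeffs)            # concatenated coefficients
```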

Hybrid Fusion Algorithm for Brain Injury Detection

The Hybrid Fusion Algorithm improves brain injury diagnosis by combining feature extraction techniques, as shown in Algorithm 1. Brain images are preprocessed and then separated into tiles using a sliding-window method. Curvelet coefficients, SWT, DCT, and PCA features are integrated to provide a powerful feature set for classification.
Algorithm 1: Hybrid Fusion Algorithm for Brain Injury Detection
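Since Algorithm 1 appears only as a figure in the original publication, the sketch below illustrates one plausible reading of its feature-combination step in Python, assuming PyWavelets for the SWT, SciPy for the DCT, and scikit-learn PCA retaining 95% of the variance as stated in the text; the wavelet choice and flattening strategy are assumptions.

```python
import numpy as np
import pywt                       # PyWavelets for the stationary wavelet transform
from scipy.fft import dctn        # 2-D discrete cosine transform
from sklearn.decomposition import PCA

def hybrid_fusion_features(images, curvelet_features):
    """Combine curvelet, SWT, DCT, and PCA features into one feature set.
    `images` is a list of 2-D arrays with even dimensions (required by swt2);
    `curvelet_features` has one row per image."""
    swt_feats, dct_feats = [], []
    for img in images:
        cA, (cH, cV, cD) = pywt.swt2(img, 'db1', level=1)[0]   # wavelet choice assumed
        swt_feats.append(np.concatenate([cA.ravel(), cH.ravel(),
                                         cV.ravel(), cD.ravel()]))
        dct_feats.append(dctn(img, norm='ortho').ravel())
    swt_feats = np.asarray(swt_feats)
    dct_feats = np.asarray(dct_feats)
    # PCA retaining 95% of the variance, as stated in the text
    flat_imgs = np.asarray([img.ravel() for img in images])
    pca_feats = PCA(n_components=0.95).fit_transform(flat_imgs)
    # Final fused feature set fed to the classifier
    return np.hstack([curvelet_features, swt_feats, dct_feats, pca_feats])
```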

7. Technical Description of the Novel Hybrid CNN-ViT Model Architecture

The proposed model architecture integrates Convolutional Neural Networks (CNNs), Vision Transformers (ViTs) [46], and curvelet transform features to form a "hybrid deep-learning" framework. Transfer learning was deliberately avoided for TBI injury identification because of concerns regarding domain specificity and the possibility of unintentionally introducing bias; moreover, pre-trained weights that are not tailored to this task could lead to overfitting. Our model targets brain injury detection, leveraging the strengths of CNNs in extracting local features, ViTs in capturing global dependencies, and curvelet transforms in providing robust multi-resolution representations of brain images.

7.1. Model Components

7.1.1. Convolutional Neural Network (CNN)

The CNN component is responsible for extracting local features from brain images. It is made up of a sequence of convolutional layers followed by “max-pooling” layers, which minimize “spatial dimensions” while preserving crucial information. The CNN model is designed with advanced layers to improve feature extraction. The initial layers consist of Conv2D layers with filter sizes of 32, 64, and 128 and kernel sizes of 3 × 3, followed by a “ReLU activation” and a “MaxPooling2D” layer. Following each convolutional layer, “BatchNormalization” layers are added to boost the learning power even further. The final convolutional block is followed by a “Flatten layer”, a “Dense layer” with 128 units and “ReLU activation”, and a 0.5-rate “dropout layer” to prevent overfitting. The proposed CNN output is fed into a Dense layer with units equal to the number of classes, which is then followed by a “SoftMax activation” function for classification. Separable convolutional layers execute depthwise convolutions, decreasing parameters and computational costs without sacrificing performance. The “Spatial Dropout” layer, when applied to complete feature maps rather than separate components, helps to prevent feature map “co-adaptation” as shown in Figure 6.
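A minimal Keras sketch of the CNN branch as described is given below; the exact ordering of the separable convolution, spatial dropout, and batch-normalization layers is an interpretation of the text rather than the authors' released code.

```python
from tensorflow.keras import layers, models

def build_cnn_branch(input_shape=(128, 128, 3), num_classes=24):
    """CNN branch with 32/64/128 filters, 3x3 kernels, ReLU, max pooling,
    batch normalization, a 128-unit Dense layer, and 0.5 dropout."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.SeparableConv2D(128, (3, 3), activation='relu', padding='same'),
        layers.SpatialDropout2D(0.2),      # drops entire feature maps
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
```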

7.1.2. Customized Vision Transformer (ViT-B)

The ViT component captures global dependencies and contextual information through a series of transformer blocks. This model uses a variant of the transformer architecture specifically designed for image data. At first, the image is separated into patches that are “linearly embedded” and then concatenated with “positional embeddings”. Each transformer block has a “Multi-Head Self-Attention” (MHSA) mechanism and a “Feed-Forward Network” (FFN). The MHSA method allows the model to focus on multiple sections of the input image concurrently, computed as:
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V
where Q, K, and V are the query, key, and value matrices, respectively. The FFN, applied to each position separately and identically, is defined as:
\mathrm{FFN}(x) = \max(0,\, x W_1 + b_1)\, W_2 + b_2
These transformer blocks are enhanced with "LayerNormalization" before the attention and "feed-forward" sub-layers, and "residual connections" are included after each sub-layer. Dropout layers are also added to prevent overfitting. The "Dynamic Position Embedding" technique adapts positional encodings during training, allowing the model to learn spatial hierarchies better. An "Attention Score Scaling" factor is applied to the attention scores to modulate the range of the "softmax function", which can improve training stability. A "Hybrid Attention" mechanism combines self-attention with convolutional attention, enhancing the ability to capture both "global" and "local" dependencies, as shown in Figure 6.
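The sketch below shows one ViT-style transformer block with pre-layer normalization, multi-head self-attention, the ReLU feed-forward network defined above, residual connections, and dropout; patch embedding and the "Dynamic Position Embedding", "Attention Score Scaling", and "Hybrid Attention" refinements are omitted, and the hyperparameters are illustrative.

```python
from tensorflow.keras import layers

def transformer_block(x, num_heads=4, key_dim=64, mlp_dim=128, dropout=0.1):
    """One block: LayerNorm -> multi-head self-attention -> residual,
    then LayerNorm -> FFN(x) = max(0, xW1 + b1)W2 + b2 -> residual."""
    h = layers.LayerNormalization(epsilon=1e-6)(x)
    h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim,
                                  dropout=dropout)(h, h)
    x = layers.Add()([x, h])                     # residual connection
    h = layers.LayerNormalization(epsilon=1e-6)(x)
    h = layers.Dense(mlp_dim, activation='relu')(h)
    h = layers.Dropout(dropout)(h)
    h = layers.Dense(x.shape[-1])(h)
    return layers.Add()([x, h])                  # residual connection
```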

7.1.3. Curvelet Transform Features

“Curvelet transform” features are gathered from brain images using MATLAB and then integrated into the model. These features collect “multi-resolution” and directional data, resulting in strong representations for brain damage detection. The transform breaks down an image into “curvelet coefficients” representing various scales and orientations, which are then normalized with a StandardScaler as shown in Figure 6.
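A small example of the described normalization step, assuming the MATLAB-exported coefficients are loaded as a NumPy matrix with one row per image (random data stands in for the real export here):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random data stands in for the MATLAB-exported curvelet coefficients
# (one row per image, one column per coefficient).
curvelet_coeffs_matrix = np.random.rand(100, 512)
curvelet_features = StandardScaler().fit_transform(curvelet_coeffs_matrix)
```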

7.2. Model Architecture

The hybrid model combines the outputs of the CNN, ViT, and curvelet features into a unified framework. The input layer accepts brain images of size 128 × 128 × 3 and curvelet coefficients. CNN features ( f CNN ), ViT features ( f ViT ), and curvelet coefficients are concatenated. This combined “feature vector” ( f concat ) is passed through several “Dense layers” with “ReLU activation”, followed by “Dropout layers” to prevent overfitting. The final Dense layer has “24 units” with “Softmax activation” for classification as shown in Figure 6.
The model architecture is formalized as follows:
\mathrm{Output} = \mathrm{Dense}(24, \mathrm{Softmax})\big(\mathrm{Dense}(128, \mathrm{ReLU})(f_{\mathrm{concat}})\big)
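The following Keras functional-API sketch wires a concatenation head matching the expression above; the CNN and ViT branches are replaced with tiny stand-ins, and the curvelet input length of 512 is an assumed placeholder.

```python
from tensorflow.keras import Input, layers, models

# Inputs: a 128 x 128 x 3 brain image and a curvelet coefficient vector.
image_in = Input(shape=(128, 128, 3), name='brain_image')
curvelet_in = Input(shape=(512,), name='curvelet_coeffs')

# Tiny stand-ins for the CNN and ViT branches sketched earlier.
f_cnn = layers.GlobalAveragePooling2D()(layers.Conv2D(64, 3, activation='relu')(image_in))
f_vit = layers.Dense(64, activation='relu')(layers.Flatten()(image_in))

# Fusion head: concatenate, Dense(128, ReLU), Dropout, Dense(24, Softmax).
f_concat = layers.Concatenate()([f_cnn, f_vit, curvelet_in])
x = layers.Dense(128, activation='relu')(f_concat)
x = layers.Dropout(0.5)(x)
output = layers.Dense(24, activation='softmax')(x)

hybrid_model = models.Model(inputs=[image_in, curvelet_in], outputs=output)
```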

7.3. Training and Evaluation

The training configuration involves the "Adam optimizer" with a learning rate of 1 × 10⁻³ and categorical cross-entropy as the "loss function". Metrics include categorical accuracy. To enhance training, advanced techniques such as learning rate scheduling, "early stopping" based on validation loss, and model checkpointing to save the best model are implemented. Training is performed over "50 epochs" with a "batch size" of 32. The dataset split was not pre-planned; various split ratios were tested during experimentation, several of which led to suboptimal results. The current split of 80% for training (19,200 samples), 10% for validation (2400 samples), and 10% for testing (2400 samples) was selected because it yielded the most satisfactory performance.
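Continuing the model sketch above, a hedged example of this training configuration in Keras follows; the callback patience values and checkpoint filename are assumptions.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

# Adam (learning rate 1e-3), categorical cross-entropy, categorical accuracy,
# with learning-rate scheduling, early stopping, and checkpointing.
hybrid_model.compile(optimizer=Adam(learning_rate=1e-3),
                     loss='categorical_crossentropy',
                     metrics=['categorical_accuracy'])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]

# history = hybrid_model.fit([train_images, train_curvelets], train_labels,
#                            validation_data=([val_images, val_curvelets], val_labels),
#                            epochs=50, batch_size=32, callbacks=callbacks)
```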
To guarantee "robustness" and "generalization", 5-fold cross-validation is used. This entails partitioning the dataset into five subsets, training the model on four of them, and validating it on the remaining subset, rotating the validation subset between iterations. The Lookahead Optimizer technique maintains a "fast" and a "slow" weight set, allowing the model to explore the parameter space more effectively and escape local minima. RAdam (Rectified Adam) rectifies the variance of the adaptive learning rate, providing a more reliable training process, especially in the early epochs. A separate optimization strategy is used during validation to fine-tune hyperparameters dynamically. Overfitting and underfitting are mitigated using Dropout layers, early stopping, and model checkpointing. Data augmentation techniques such as "random rotations", "shifts", and "flips" are used to improve the diversity of the training data. Regularization methods such as "L2 regularization" are applied to Dense layers to penalize substantial weights and prevent overfitting. CutMix data augmentation randomly mixes patches from different training images, encouraging the model to learn more robust features. The Elastic Weight Consolidation (EWC) regularization technique penalizes changes to important weights, helping the model retain learned knowledge and avoid catastrophic forgetting. Snapshot Ensembling saves model checkpoints at different stages of training and averages their predictions during inference to improve generalization.
This enhanced hybrid CNN-ViT model, with its advanced architecture and robust training strategies, is expected to achieve high accuracy in brain injury detection. The use of state-of-the-art components and techniques ensures that the model is both powerful and generalizable, making it suitable for clinical applications.

7.4. Performance

The model is evaluated on a test set, achieving an overall accuracy of 99.9%. Detailed classification reports and confusion matrices are generated to analyze the performance across different classes. This hybrid model leverages the complementary strengths of CNNs, ViTs, and curvelet transform features, providing a robust and accurate framework for brain injury detection.

8. Results of Visual and Contextual Modeling

After introducing the hybrid algorithms "(DCT, SWT, IHS, PCA, average)", the next step was to implement them. For this, we employed the "C programming" language, and the code was integrated with MATLAB. The rule-based inclination combination in the "Dual-Tree Complex Wavelet" region was accomplished using various images from a standard image database and constant images, as shown in Figure 7, Figure 8 and Figure 9. The effectiveness of the suggested combination approach is demonstrated strongly with specific images, such as "multi-sensor images", "multispectral remote detection" images, clinical images, "CT", "MR images", and "surreal images" [47]. Ongoing images may also be blended.

9. Performance Metrics Evaluation of Models

9.1. Dice Coefficient

The Dice coefficient is a similarity metric that quantifies the agreement in image segmentation and medical image analysis. Two axes are displayed in the Dice coefficient performance matrix in Figure 10. The Dice coefficient values are shown on the first axis, and they range from 0 to 1, where a value closer to 1 indicates a higher level of agreement between the sets. With values ranging from 0 to 90 for positive intensity and 0 to −41 for negative intensity, the second axis depicts spatial regions or pixel intensity values throughout the image [48].
The formula for calculating the Dice coefficient between two sets, denoted as P and T, is as follows:
\mathrm{Dice}(P, T) = \frac{2\,|P \cap T|}{|P| + |T|}
where:
|P|: The cardinality of set P
|T|: The cardinality of set T
|P ∩ T|: The cardinality of the intersection of sets P and T
The Dice coefficient is a valuable metric for evaluating the overlap or similarity between two sets, and it is often used in applications where the accuracy of segmentation or classification results is essential.
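A short NumPy implementation of the Dice formula above for binary masks, included for illustration:

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice(P, T) = 2|P ∩ T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return 2.0 * intersection / (pred.sum() + true.sum())

# Example: two partially overlapping binary segmentations
p = np.array([[1, 1, 0], [0, 1, 0]])
t = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(p, t))   # 2*2 / (3 + 3) ≈ 0.667
```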

9.2. Sensitivity (True Positive Rate)

Sensitivity, also known as the true positive rate, is a fundamental metric used in binary classification to assess the ability of a model or test to correctly identify positive cases among all actual positive cases as shown in Figure 11. It is an important evaluation measure in various fields, including medical diagnostics and machine learning. Sensitivity ranges from 0 to 1, where a higher value indicates better sensitivity [49].
The formula for calculating Sensitivity between two sets, denoted as P and T, is as follows:
\mathrm{Sensitivity}(P, T) = \frac{|P_1 \cap T_1|}{|T_1|}
where:
|P_1 ∩ T_1|: The cardinality of the intersection of the positive cases in set P and set T
|T_1|: The cardinality of the positive cases in set T
Sensitivity measures the proportion of actual positive cases that are correctly identified as positive. It is a crucial metric for assessing the ability of a classification model or diagnostic test to detect true positive cases.

9.3. Specificity (True Negative Rate)

Specificity, also known as the true negative rate, is a critical metric in binary classification that assesses the ability of a model or test to correctly identify negative cases among all actual negative cases as shown in Figure 12. It is a vital evaluation measure in various fields, including medical diagnostics and machine learning. Specificity also ranges from 0 to 1, with higher values indicating better specificity [49].
The formula for calculating Specificity between two sets, denoted as P and T, is as follows:
\mathrm{Specificity}(P, T) = \frac{|P_0 \cap T_0|}{|T_0|}
where:
|P_0 ∩ T_0|: The cardinality of the intersection of the negative cases in set P and set T
|T_0|: The cardinality of the negative cases in set T
Specificity measures the proportion of actual negative cases that are correctly identified as negative. It is an essential metric for assessing the ability of a classification model or diagnostic test to correctly exclude true negative cases.
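For illustration, both rates can be computed from binary label vectors as follows (a sketch, not tied to the study's evaluation code):

```python
import numpy as np

def sensitivity_specificity(pred, true):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    tp = np.logical_and(pred, true).sum()
    tn = np.logical_and(~pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
print(sens, spec)   # 1.0 and 0.667: all positives found, one false positive
```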

9.4. Entropy

Entropy is a measure that quantifies the amount of information contained in a fused image. A higher entropy value indicates that the fused image contains more information. It is commonly used in various fields, including image processing and information theory [50,51]. Entropy can be calculated using the formula:
E = -\sum_{i=1}^{L} P_i \log(P_i)
where:
E: Entropy
L: Number of gray levels
P_i: Ratio of the number of pixels with a gray value of i to the total number of pixels
The proposed fusion method was applied to axial brain images covering a range from brain slice +90 to −41. This method effectively reduced irrelevant spatial and temporal information while highlighting the significant features as shown in Figure 13. In summary, the method successfully improved the quality of the fused image by removing unnecessary information and emphasizing important features. Entropy is a valuable metric for quantifying the information content of images and is widely used in image analysis and processing.
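A compact NumPy version of the entropy formula above, computed from the gray-level histogram (the base-2 logarithm, giving entropy in bits, is an assumption):

```python
import numpy as np

def image_entropy(image, levels=256):
    """E = -sum_i P_i * log(P_i) over the gray-level histogram."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                     # skip empty bins (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))   # entropy in bits

# A uniform random 8-bit image has entropy close to 8 bits
img = np.random.randint(0, 256, size=(128, 128))
print(image_entropy(img))
```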

9.5. Average Pixel Intensity (Mean)

The average pixel intensity, also known as the mean, is a valuable metric used to assess the ability of an image-fusion method to retain high spatial resolution [52]. It quantifies the overall brightness or intensity of a fused image as shown in Figure 14. The mean of a fused image can be calculated using the formula:
\mu = \frac{1}{M \cdot N} \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y)
where:
μ: Average pixel intensity (mean)
M: Width (size) of the image
N: Height (size) of the image
f(x, y): Pixel values of the fused image at coordinates (x, y)
The higher the mean value, the higher the spatial resolution of the fused image. Figure 14 illustrates that a high mean value was obtained, indicating the successful retention of higher resolution through the fusion method. In summary, the mean value serves as a useful measure for evaluating the spatial resolution retention capability of an image-fusion method. Average pixel intensity is an essential metric for assessing the overall brightness and spatial features of a fused image.

9.6. Standard Deviation (SD)

The standard deviation (SD) is a significant metric used to evaluate the overall contrast of a fused image as shown in Figure 15. It quantifies the spread or variability of pixel values in the image [53]. SD can be expressed using the formula:
\sigma = \sqrt{\frac{1}{M \cdot N} \sum_{x=1}^{M} \sum_{y=1}^{N} \big(f(x, y) - \mu\big)^{2}}
where:
σ: Standard deviation (SD)
M: Width (size) of the image
N: Height (size) of the image
f(x, y): Pixel values of the fused image at coordinates (x, y)
μ: Mean value of the fused image
A higher SD value, according to Wang and Chang (2011), denotes better image quality and more successful feature extraction in fused images. Higher values of SD indicate better overall image quality, making it a useful metric for assessing contrast and feature extraction capabilities in image-fusion techniques.

9.7. Correlation Coefficient (CC)

The correlation coefficient (CC) is a valuable metric used to measure the similarity between the original and fused images, particularly in terms of small-sized structures. CC can vary within the range of −1 to +1, with values close to +1 indicating a high degree of similarity and values close to −1 indicating a high level of dissimilarity between the two images as shown in Figure 16 [54].
The CC can be expressed using the following formula:
CC = \frac{\sum_{x,y}\big(I_{xy} - \mu_1\big)\big(I'_{xy} - \mu_2\big)}{\sqrt{\sum_{x,y}\big(I_{xy} - \mu_1\big)^{2} \cdot \sum_{x,y}\big(I'_{xy} - \mu_2\big)^{2}}}
where:
CC: Correlation coefficient
I_xy: Pixel values of the original image at coordinates (x, y)
I′_xy: Pixel values of the fused image at coordinates (x, y)
μ_1: Mean value of the original image
μ_2: Mean value of the fused image
M: Width of the image (number of pixels in the x-dimension)
N: Height of the image (number of pixels in the y-dimension)
The Correlation Coefficient (CC) quantifies the degree of linear relationship between the pixel values of the original image I_xy and the fused image I′_xy by normalizing their covariance with respect to their means μ_1 and μ_2.
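The mean, standard deviation, and correlation coefficient defined above can be computed together as in the following sketch (illustrative only):

```python
import numpy as np

def fusion_statistics(original, fused):
    """Mean, standard deviation, and correlation coefficient of the fused
    image with respect to the original, following the formulas above."""
    mu = fused.mean()        # average pixel intensity
    sigma = fused.std()      # standard deviation (overall contrast)
    a = original - original.mean()
    b = fused - fused.mean()
    cc = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return mu, sigma, cc

original = np.random.rand(64, 64)
fused = 0.8 * original + 0.2 * np.random.rand(64, 64)
print(fusion_statistics(original, fused))
```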

9.8. Edge Similarity Measure (ESM)

The Edge Similarity Measure (ESM) is a metric used to evaluate the similarity of edges between two images as shown in Figure 17. It assesses the preservation of edges in a fused image compared to the original images [55]. ESM can be expressed using the following formula:
ESM = \frac{\sum_{x,y}\big[w_1(x,y)\,Q_1(x,y)\,I_1(x,y)\,F(x,y) + w_2(x,y)\,Q_2(x,y)\,I_2(x,y)\,F(x,y)\big]}{\sum_{x,y}\big[w_1(x,y)\,Q_1(x,y)\,I_1(x,y) + w_2(x,y)\,Q_2(x,y)\,I_2(x,y)\big]}
where:
ESM: Edge Similarity Measure (ESM) score
I_1(x, y): Intensity values of corresponding pixels in the first image
I_2(x, y): Intensity values of corresponding pixels in the second image
F(x, y): Intensity values of corresponding pixels of the fused image
Q_1(x, y): Edge preservation factor for the first image
Q_2(x, y): Edge preservation factor for the second image
w_1(x, y): Weighting factor for the first image
w_2(x, y): Weighting factor for the second image
The edge preservation factor, Q ( x , y ) , is calculated as:
Q(x, y) = 1 - \frac{|G(x, y) - H(x, y)|}{G(x, y) + H(x, y)}
where:
|G(x, y) − H(x, y)|: The absolute difference between the gradients of the two images
G(x, y) + H(x, y): The sum of the gradients of the two images
G(x, y): Gradient of the first image
H(x, y): Gradient of the second image
The weighting factors, w 1 ( x , y ) and w 2 ( x , y ) , are defined as:
w_1(x, y) = \exp\!\left(-\frac{|G(x, y) - H(x, y)|^{2}}{2\sigma^{2}}\right)
w_2(x, y) = 1 - w_1(x, y)
where σ is a constant value that determines the width of the Gaussian function.
The ESM formula considers both edge preservation and weighting factors for each image. The numerator represents the sum of the weighted edge preservation scores for each pixel in the fused image, while the denominator represents the sum of the weighted edge preservation scores for each pixel in the two original images. A higher ESM score indicates a greater similarity in edge structures between the two images being compared. ESM is a valuable metric for assessing edge preservation and quality in image-fusion applications.
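The ESM definition above can be sketched in NumPy as follows; this reading uses a single edge preservation factor Q computed from the two input gradients, and the gradient operator (np.gradient) and σ value are assumptions.

```python
import numpy as np

def edge_similarity_measure(img1, img2, fused, sigma=0.5):
    """ESM with a gradient-based edge preservation factor Q and Gaussian
    weights w1, w2, following the formulas above."""
    def grad_mag(img):
        gy, gx = np.gradient(img.astype(float))
        return np.hypot(gx, gy)

    G, H = grad_mag(img1), grad_mag(img2)
    eps = 1e-12                                          # avoid division by zero
    Q = 1.0 - np.abs(G - H) / (G + H + eps)              # edge preservation factor
    w1 = np.exp(-np.abs(G - H) ** 2 / (2 * sigma ** 2))  # Gaussian weighting
    w2 = 1.0 - w1
    F = fused.astype(float)
    num = (w1 * Q * img1 * F + w2 * Q * img2 * F).sum()
    den = (w1 * Q * img1 + w2 * Q * img2).sum()
    return num / den

a, b = np.random.rand(64, 64), np.random.rand(64, 64)
print(edge_similarity_measure(a, b, (a + b) / 2))
```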

9.9. Accuracy

The Hybrid Fusion Algorithm detects TBI with 98.78% accuracy, which exceeds the accuracies of the individual methods, as shown in Figure 18. This integrated strategy highlights the advantages of using complementary characteristics to improve detection accuracy.

9.10. Model Results

Our hybrid CNN-ViT model, integrated with curvelet transform features, was trained to classify 24 types of brain injuries. The model demonstrated high accuracy in detecting and classifying these injuries, with a training accuracy of 98.2% and a validation accuracy of 99.8%. Key performance metrics of the model are precision, recall, F1-score, PSNR [56], SSIM [57], and MI [58], shown in Table 3 and Figure 19. The confusion matrix (blue) and the ROC curve (red line) in Figure 20 and Figure 21 illustrate the Hybrid Fusion Algorithm's superior performance, with a high number of accurate predictions. The training (blue line) and validation (red line) accuracy, with the best epoch at 47, is shown in Figure 22.
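For reference, PSNR and SSIM can be computed with scikit-image, and a common histogram-based mutual information estimate is sketched below (not necessarily the authors' exact implementation):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def mutual_information(img1, img2, bins=256):
    """Histogram-based mutual information between two images."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

reference = np.random.rand(128, 128)
test = np.clip(reference + 0.01 * np.random.randn(128, 128), 0, 1)
print(peak_signal_noise_ratio(reference, test, data_range=1.0))
print(structural_similarity(reference, test, data_range=1.0))
print(mutual_information(reference, test))
```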

9.11. Cross-Validation

To ensure the robustness and generalization of our model, we employed 5-fold cross-validation as shown in Figure 23. This method involves splitting the dataset into five subsets, training the model on four subsets, and validating it on the remaining subset. This process is repeated five times, with each subset used as the validation set once. The cross-validation results are then averaged to provide a more reliable estimate of model performance.
The cross-validation results confirmed the model’s high performance and generalization capability. The consistency of precision, recall, F1 score, SSIM, PSNR, MI, and accuracy across different folds indicates the model’s reliability in detecting various types of brain injuries under different conditions as shown in Table 4.
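A schematic of the 5-fold procedure is given below, where build_model stands in for the hybrid CNN-ViT constructor (a hypothetical helper, not the authors' code):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(images, curvelets, labels, build_model, n_splits=5):
    """Train/validate on rotating folds and return mean and std accuracy.
    `labels` is one-hot encoded; `build_model` returns a compiled model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in skf.split(images, labels.argmax(axis=1)):
        model = build_model()
        model.fit([images[train_idx], curvelets[train_idx]], labels[train_idx],
                  epochs=50, batch_size=32, verbose=0)
        _, acc = model.evaluate([images[val_idx], curvelets[val_idx]],
                                labels[val_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores), np.std(scores)
```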

10. Discussion and Results

Our research is organized into two main parts. The first part develops an advanced fusion model to improve image "sharpening", "feature extraction", "classification accuracy", and "dataset labeling". Image fusion has various advantages, including a larger operating range, improved "spatial" and "temporal" features, higher system performance, less ambiguity, and more dependability. The complexity of the scientific evaluation of fused images was addressed using several established algorithms, including "Principal Component Analysis" (PCA), "Intensity-Hue-Saturation" (IHS) transformation, "Discrete Cosine Transform" (DCT), "Stationary Wavelet Transform" (SWT), and the "Average" method. We experimented with the different algorithms and eventually proposed a new hybrid fusion model that combines the DCT, SWT, IHS, PCA, and Average approaches. As mentioned in the results section, our hybrid model displayed higher accuracy across a wide range of performance parameters.
The second part of our research focused on the accurate diagnosis and categorization of traumatic brain injuries (TBIs), which included 24 different categories. The hybrid fusion model was developed to improve input quality and prediction accuracy for cutting-edge TBI diagnosis. The model's design took into account the structure of the brain, which is made up of "white matter", "gray matter", and "cerebrospinal" fluid. "T1-weighted", "T2-weighted", "Diffusion MRI", and "Fluid-Attenuated Inversion Recovery" (FLAIR) pulse sequences were used to collect the required imaging information. The segmentation method required recognizing four different types of brain injury: "edema" (traumatic swelling), "enhanced lesions" (active injured tissue), "necrotic" tissue, and "nonenhanced" trauma. Several performance measures were used to assess the segmentation accuracy of these damage structures, including the "Dice coefficient", "sensitivity", "specificity", "entropy", "average pixel intensity", "edge similarity measure", "correlation coefficient", "standard deviation", and overall accuracy. The Dice coefficient, a fundamental parameter for determining overlap between segmented areas and ground truth, indicated the robustness of our segmentation method. Sensitivity and specificity measures demonstrated the model's ability to accurately detect positive and negative instances. Entropy and average pixel intensity shed light on the information richness and "spatial resolution" of the fused images. The standard deviation and correlation coefficient metrics demonstrated the difference and resemblance between the original and fused images, particularly for tiny structures. The edge similarity metric was used to assess edge preservation, ensuring that critical structural elements remained in the fused images.
The efficacy of our fusion model was assessed using a variety of performance criteria. The model has a Dice coefficient of 0.98, suggesting a high overlap between predicted and true segmentation findings, and sensitivity and specificity scores of 0.97 and 0.98, confirming its accuracy in recognizing both positive and negative situations. The entropy of 7.85 indicated that the fused images had a significant amount of information, and an average pixel intensity of 123.4 proved the model’s capacity to maintain high spatial resolution. An edge similarity score of 0.95 demonstrated a good resemblance in edge structures between the fused and original images, while a correlation value of 0.99 suggested a significant retention of small-sized features. Furthermore, a standard deviation of 15.3 demonstrated the model’s capacity to retain high contrast, contributing to an overall accuracy of 98.78% as seen in the graphs and figures from Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18. In terms of overall accuracy, our “hybrid CNN-ViT” model, enhanced with curvelet transform features, was trained to classify the 24 different forms of brain lesions with remarkable precision. The model obtained 98.2% training accuracy and 99.8% validation accuracy. The model’s success was further emphasized by important performance indicators such as accuracy, recall, F1-score, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mutual Information (MI), as shown in Figure 19, Figure 20, Figure 21, Figure 22 and Figure 23.

11. State-of-the-Art Comparison with Existing Techniques

As reported in Table 5, Table 6, and Figure 24, which compare medical image-fusion techniques, our proposed method (2024) achieves state-of-the-art performance with 99.8% accuracy, outperforming the WPCA, WT + Fusion Rules, Averaging Method by WT, DWT + IDWT, NSCT + RPNN, NSST + PAPCN, and NSCT + LE methods and demonstrating superior fusion quality and diagnostic accuracy.

12. Conclusions

Our study proposes a novel approach for the detection and classification of traumatic brain injuries (TBIs) using deep-learning algorithms and advanced image analysis techniques. The proposed hybrid CNN-ViT model, enhanced with curvelet transform features and refined fusion techniques such as DT-CWT, PCA, average, HIS, and SWT, addresses the limitations of conventional methods. This model also incorporates a gradient-based sharpness model for low-frequency coefficients and utilizes complex wavelets to remove artifacts and enhance image quality. The results are remarkable, with the model achieving 99.8% accuracy, precision, recall, and F1-score, indicating exceptional performance in identifying 24 different types of brain damage. The average Peak Signal-to-Noise Ratio (PSNR) is 39.0 dB, the Structural Similarity Index (SSIM) is 0.99, and the Mutual Information (MI) is 1.0. These outcomes are validated through 5-fold cross-validation, which confirms the model's robustness and reliability across various conditions.
The dataset used comprises 24,000 neuroimaging images from 119 patients, classified into various TBI categories. Rigorous preprocessing steps, including resizing, normalization, contrast enhancement, and data augmentation, ensure the high quality of both training and testing data. The fusion strategy employed in the model significantly enhances detection effectiveness by improving image sharpness and feature classification while filtering out extraneous spatial and temporal information. In terms of practical implications, the high accuracy and detailed analysis provided by this model could greatly improve diagnostic tools for clinicians, offering more reliable and precise detection and classification of brain injuries. This advancement has the potential to enhance clinical practice and patient outcomes significantly. While the model demonstrates promising results, directional transformations, like the shearlet transform [66], may also improve image analysis in addition to the curvelet transform used in this study, opening up further possibilities for investigation and scientific advancements. Future research could explore its application across more diverse datasets and evaluate additional techniques, such as transfer learning with bias-aware fine-tuning, to further enhance model fairness, accuracy, and generalizability. The integration of deep learning with advanced image analysis techniques in this research represents a significant leap forward in the diagnostic capabilities for brain injuries. This innovative approach has the potential to transform clinical practices and substantially improve patient care and outcomes.

Author Contributions

All authors shared equal responsibility for Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing, Visualization, Supervision, Project Administration, and Funding Acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study utilized publicly available, de-identified medical images from the American Radiology Repository, eliminating the need for IRB approval.

Informed Consent Statement

Informed consent was not required, as this study analyzed anonymized, publicly available medical images obtained from the American Radiology Repository.

Data Availability Statement

The data used to support the findings of this study are available from the American College of Radiology (ACR) Reporting and Data Systems (RADS) repository, specifically the TBI-RADS Traumatic Brain Injury dataset, which includes neuroimaging data from 119 patients. Access to the dataset can be obtained upon request to the ACR RADS repository for research purposes.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Abbreviations

Table A1. List of Acronyms.
Acronym | Full Form | Acronym | Full Form
TBI | Traumatic Brain Injury | SNR | Signal-to-Noise Ratio
CNN | Convolutional Neural Network | ROC | Receiver Operating Characteristic
ViT | Vision Transformer | AUC | Area Under the Curve
DT-CWT | Dual-Tree Complex Wavelet Transform | IoU | Intersection over Union
PCA | Principal Component Analysis | TP | True Positive
HIS | Hue, Saturation, Intensity | TN | True Negative
SWT | Stationary Wavelet Transform | FP | False Positive
PSNR | Peak Signal-to-Noise Ratio | FN | False Negative
SSIM | Structural Similarity Index | K-fold | K-Fold Cross-Validation
MI | Mutual Information | MSE | Mean Squared Error
Q-shift DT-CWT | Q-shift Dual-Tree Complex Wavelet Transform | RMSE | Root Mean Squared Error
Dice | Dice Similarity Coefficient | MAE | Mean Absolute Error
F1-score | F1 Score (Harmonic Mean of Precision and Recall) | SVD | Singular Value Decomposition
CNN-ViT | Convolutional Neural Network–Vision Transformer | LDA | Linear Discriminant Analysis
DL | Deep Learning | AI | Artificial Intelligence
ML | Machine Learning | |

References

  1. Hyder, A.A.; Wunderlich, C.A.; Puvanachandra, P.; Gururaj, G.; Kobusingye, O.C. The impact of traumatic brain injuries: A global perspective. NeuroRehabilitation 2007, 22, 341–353. [Google Scholar] [CrossRef]
  2. Peterson, A.B.; Zhou, H.; Thomas, K.E.; Daugherty, J. Traumatic brain injury-related hospitalizations and deaths by age group, sex, and mechanism of injury: United States 2016/2017. National Center for Injury Prevention and Control (U.S.). Division of Injury Prevention. 2021. Available online: https://stacks.cdc.gov/view/cdc/111900 (accessed on 28 August 2024).
  3. Lewis, D. An Exploratory Analysis of the Cost-Effectiveness of a Multi-Cancer Early Detection Blood Test in Ontario, Canada. Master’s Thesis, University of Waterloo, Waterloo, ON, Canada, 2023. [Google Scholar]
  4. Singh, S.; Singh, H.; Bueno, G.; Deniz, O.; Singh, S.; Monga, H.; Hrisheekesha, P.N.; Pedraza, A. A review of image fusion: Methods, applications and performance metrics. Digit. Signal Process. 2023, 137, 104020. [Google Scholar] [CrossRef]
  5. Li, H.; Liu, J.; Zhang, Y.; Liu, Y. A deep learning framework for infrared and visible image fusion without strict registration. Int. J. Comput. Vis. 2024, 132, 1625–1644. [Google Scholar] [CrossRef]
  6. Liang, N. Medical image fusion with deep neural networks. Sci. Rep. 2024, 14, 7972. [Google Scholar] [CrossRef] [PubMed]
  7. Haribabu, M.; Guruviah, V. An improved multimodal medical image fusion approach using intuitionistic fuzzy set and intuitionistic fuzzy cross-correlation. Diagnostics 2023, 13, 2330. [Google Scholar] [CrossRef] [PubMed]
  8. Basu, S.; Singhal, S.; Singh, D. A systematic literature review on multimodal medical image fusion. Multimedia Tools Appl. 2024, 83, 15845–15913. [Google Scholar] [CrossRef]
  9. Saleh, M.A.; Ali, A.A.; Ahmed, K.; Sarhan, A.M. A brief analysis of multimodal medical image fusion techniques. Electronics 2023, 12, 97. [Google Scholar] [CrossRef]
  10. Lin, E.; Yuh, E.L. Computational approaches for acute traumatic brain injury image recognition. Front. Neurol. 2022, 13, 791816. [Google Scholar] [CrossRef] [PubMed]
  11. Prichep, L.S.; Jacquin, A.; Filipenko, J.; Dastidar, S.G.; Zabele, S.; Vodencarevic, A.; Rothman, N.S. Classification of traumatic brain injury severity using informed data reduction in a series of binary classifier algorithms. IEEE Trans. Neural Syst. Rehabil. Eng. 2012, 20, 806–822. [Google Scholar] [CrossRef]
  12. Rahman, Z.; Pasam, T.; Rishab; Dandekar, M.P. Binary classification model of machine learning detected altered gut integrity in controlled-cortical impact model of traumatic brain injury. Int. J. Neurosci. 2024, 134, 163–174. [Google Scholar] [CrossRef]
  13. Luo, X.; Lin, D.; Xia, S.; Wang, D.; Weng, X.; Huang, W.; Ye, H. Machine learning classification of mild traumatic brain injury using whole-brain functional activity: A radiomics analysis. Dis. Mark. 2021, 2021, 3015238. [Google Scholar] [CrossRef] [PubMed]
  14. Prichep, L.S.; Ghosh Dastidar, S.; Jacquin, A.; Koppes, W.; Miller, J.; Radman, T.; Naunheim, R.; Huff, J.S. Classification algorithms for the identification of structural injury in TBI using brain electrical activity. Comput. Biol. Med. 2014, 53, 125–133. [Google Scholar] [CrossRef] [PubMed]
  15. Kasinets, D.; Saeed, A.K.; Johnson, B.A.; Rodriguez, B.M. Layered convolutional neural networks for multi-class image classification. In Proceedings of the SPIE 13034, Real-Time Image Processing and Deep Learning 2024, San Diego, CA, USA, 7 June 2024. [Google Scholar] [CrossRef]
  16. Pal, B.; Mahajan, S.; Jain, S. A Comparative Study of Traditional Image Fusion Techniques with a Novel Hybrid Method. In Proceedings of the IEEE 2020 International Conference on Computational Performance Evaluation (ComPE), Shillong, India, 2–4 July 2020. [Google Scholar] [CrossRef]
  17. Rajalingam, B.; Priya, R. Multimodality Medical Image Fusion Based on Hybrid Fusion Techniques. Int. J. Eng. Manuf. Sci. 2017, 7, 22–29. [Google Scholar]
  18. Saad, N.M.; Bakar, S.A.R.S.A.; Muda, A.S.; Mokji, M.M. Review of Brain Lesion Detection and Classification Using Neuroimaging Analysis Techniques. J. Teknol. 2015, 74, 6. [Google Scholar]
  19. Syed, L.; Jabeen, S.; Manimala, S. Telemammography: A Novel Approach for Early Detection of Breast Cancer Through Wavelets Based Image Processing and Machine Learning Techniques. In Advances in Soft Computing and Machine Learning in Image Processing; Springer: Cham, Switzerland, 2018; pp. 149–183. [Google Scholar]
  20. Fradi, M.; Youssef, W.E.; Lasaygues, P.; Machhout, M. Improved USCT of Paired Bones Using Wavelet-Based Image Processing. Int. J. Image Graph. Signal Process. 2018, 11, 9. [Google Scholar]
  21. Ye, H.; Gao, F.; Yin, Y.; Guo, D.; Zhao, P.; Lu, Y.; Xia, J. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur. Radiol. 2019, 29, 6191–6201. [Google Scholar] [CrossRef]
  22. Li, J.; Fu, G.; Chen, Y.; Li, P.; Liu, B.; Pei, Y.; Feng, H. A multi-label classification model for full slice brain computerised tomography image. BMC Bioinform. 2020, 21, 200. [Google Scholar] [CrossRef]
  23. Wang, X.; Shen, T.; Yang, S.; Lan, J.; Xu, Y.; Wang, M.; Zhang, J.; Han, X. A deep learning algorithm for automatic detection and classification of acute intracranial hemorrhages in head CT scans. NeuroImage Clin. 2021, 32, 102785. [Google Scholar] [CrossRef]
  24. Salehinejad, H.; Kitamura, J.; Ditkofsky, N.; Lin, A.; Bharatha, A.; Suthiphosuwan, S.; Colak, E. A real-world demonstration of machine learning generalizability in the detection of intracranial hemorrhage on head computerized tomography. Sci. Rep. 2021, 11, 17051. [Google Scholar] [CrossRef]
  25. Alis, D.; Alis, C.; Yergin, M.; Topel, C.; Asmakutlu, O.; Bagcilar, O.; Senli, Y.D.; Ustundag, A.; Salt, V.; Dogan, S.N.; et al. A joint convolutional-recurrent neural network with an attention mechanism for detecting intracranial hemorrhage on noncontrast head CT. Sci. Rep. 2022, 12, 2084. [Google Scholar] [CrossRef] [PubMed]
  26. James, A.P.; Dasarathy, B.V. Medical image fusion: A survey of the state of the art. Inf. Fusion 2014, 19, 4–19. [Google Scholar] [CrossRef]
  27. Khan, P.; Kader, M.F.; Islam, S.R.; Rahman, A.B.; Kamal, M.S.; Toha, M.U.; Kwak, K.S. Machine learning and deep learning approaches for brain disease diagnosis: Principles and recent advances. IEEE Access 2021, 9, 37622–37655. [Google Scholar] [CrossRef]
  28. Valliani, A.A.A.; Ranti, D.; Oermann, E.K. Deep learning and neurology: A systematic review. Neurol. Ther. 2019, 8, 351–365. [Google Scholar] [CrossRef]
  29. Bhatele, K.R.; Bhadauria, S.S. Brain structural disorders detection and classification approaches: A review. Artif. Intell. Rev. 2020, 53, 3349–3401. [Google Scholar] [CrossRef]
  30. Boehm, K.M.; Khosravi, P.; Vanguri, R.; Gao, J.; Shah, S.P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 2022, 22, 114–126. [Google Scholar] [CrossRef]
  31. Kaur, H.; Koundal, D.; Kadyan, V. Image fusion techniques: A survey. Arch. Comput. Methods Eng. 2021, 28, 4425–4447. [Google Scholar] [CrossRef]
  32. Mehta, R. Identifying Feature, Parameter, and Sample Subsets in Machine Learning and Image Analysis. Master’s Thesis, The University of Wisconsin-Madison, Madison, WI, USA, 2023. [Google Scholar]
  33. Silva, R.F. Multidataset Independent Subspace Analysis: A Framework for Analysis of Multimodal, Multi-Subject Brain Imaging Data. Ph.D. Dissertation, The University of New Mexico, Albuquerque, NM, USA, 2017. [Google Scholar]
  34. Kokro, S.K.; Mwangi, E.; Kamucha, G. Histogram matching and fusion for effective low-light image enhancement. In Proceedings of the SoutheastCon 2024, Atlanta, GA, USA, 15–24 March 2024; pp. 200–206. [Google Scholar]
  35. Al-Wassai, F.A.; Kalyankar, N.V.; Al-Zuky, A.A. The IHS transformations based image fusion. arXiv 2011, arXiv:1107.4396. [Google Scholar]
  36. Nerma, M.H.; Kamel, N.; Jeoti, V. An OFDM system based on dual tree complex wavelet transform (DT-CWT). Signal Process. Int. J. 2009, 3, 14–26. [Google Scholar]
  37. Aburakhia, S.; Shami, A.; Karagiannidis, G.K. On the intersection of signal processing and machine learning: A use case-driven analysis approach. arXiv 2024, arXiv:2403.17181. [Google Scholar]
  38. Venkatesan, B.; Ragupathy, U.S.; Natarajan, I. A review on multimodal medical image fusion towards future research. Multimedia Tools Appl. 2023, 82, 7361–7382. [Google Scholar] [CrossRef]
  39. Faragallah, O.S.; El-Hoseny, H.; El-Shafai, W.; Abd El-Rahman, W.; El-Sayed, H.S.; El-Rabaie, E.S.; Abd El-Samie, F.E.; Geweid, G.G. A comprehensive survey analysis for present solutions of medical image fusion and future directions. IEEE Access 2020, 9, 11358–11371. [Google Scholar] [CrossRef]
  40. Schweitzer, A.D.; Niogi, S.N.; Whitlow, C.T.; Tsiouris, A.J. Traumatic brain injury: Imaging patterns and complications. Radiographics 2019, 39, 1571–1595. [Google Scholar] [CrossRef] [PubMed]
  41. Biglands, J.D.; Radjenovic, A.; Ridgway, J.P. Cardiovascular magnetic resonance physics for clinicians: Part II. J. Cardiovasc. Magn. Reson. 2012, 14, 78. [Google Scholar] [CrossRef] [PubMed]
  42. Galbusera, F.; Cina, A. Image annotation and curation in radiology: An overview for machine learning practitioners. Eur. Radiol. Exp. 2024, 8, 11. [Google Scholar] [CrossRef] [PubMed]
  43. Lewy, D.; Mańdziuk, J. An overview of mixing augmentation methods and augmentation strategies. Artif. Intell. Rev. 2023, 56, 2111–2169. [Google Scholar] [CrossRef]
  44. Sumana, I.J.; Islam, M.M.; Zhang, D.; Lu, G. Content based image retrieval using curvelet transform. In Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, Cairns, QLD, Australia, 8–10 October 2008; pp. 11–16. [Google Scholar]
  45. Nussbaumer, H.J. The Fast Fourier Transform; Springer: Berlin/Heidelberg, Germany, 1982; pp. 80–111. [Google Scholar]
  46. Maurício, J.; Domingues, I.; Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 2023, 13, 5521. [Google Scholar] [CrossRef]
  47. Zheng, Y.; Essock, E.A.; Hansen, B.C.; Haun, A.M. A new metric based on extended spatial frequency and its application to DWT based fusion algorithms. Inf. Fusion 2007, 8, 177–192. [Google Scholar] [CrossRef]
  48. Shamir, R.R.; Duchin, Y.; Kim, J.; Sapiro, G.; Harel, N. Continuous dice coefficient: A method for evaluating probabilistic segmentations. arXiv 2019, arXiv:1906.11031. [Google Scholar]
  49. Rao, T.N. Validation of Analytical Methods; InTech: Delhi, India, 2018. [Google Scholar] [CrossRef]
  50. Tsai, M.-H.; Wang, C.-W.; Tsai, C.-W.; Shen, W.-J.; Yeh, J.-W.; Gan, J.-Y.; Wu, W.-W. Thermal stability and performance of NbSiTaTiZr high-entropy alloy barrier for copper metallization. J. Electrochem. Soc. 2011, 158, H1161. [Google Scholar] [CrossRef]
  51. Alves, J.; Mesquita, D. Entropy formula for systems with inducing schemes. Trans. Am. Math. Soc. 2023, 376, 1263–1298. [Google Scholar] [CrossRef]
  52. Li, Z.; Pan, R.; Wang, J.A.; Wang, Z.; Li, B.; Gao, W. Real-time segmentation of yarn images based on an FCM algorithm and intensity gradient analysis. Fibres Text. East. Eur. 2016, 4, 45–50. [Google Scholar] [CrossRef]
  53. Roy, S.; Das, D.; Lal, S.; Kini, J. Novel edge detection method for nuclei segmentation of liver cancer histopathology images. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 479–496. [Google Scholar] [CrossRef]
  54. Asuero, A.G.; Sayago, A.; González, A.G. The correlation coefficient: An overview. Crit. Rev. Anal. Chem. 2006, 36, 41–59. [Google Scholar] [CrossRef]
  55. Fu, W.; Gu, X.; Wang, Y. Image quality assessment using edge and contrast similarity. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 852–855. [Google Scholar]
  56. Horé, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar] [CrossRef]
  57. Bakurov, I.; Buzzelli, M.; Schettini, R.; Castelli, M.; Vanneschi, L. Structural similarity index (SSIM) revisited: A data-driven approach. Expert Syst. Appl. 2022, 189, 116087. [Google Scholar] [CrossRef]
  58. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2004, 69, 66138. [Google Scholar] [CrossRef]
  59. Sekhar, A.S.; Prasad, M.G. A novel approach of image fusion on MR and CT images using wavelet transforms. In Proceedings of the 2011 3rd International Conference on Electronics Computer Technology, Kanyakumari, India, 8–10 April 2011; Volume 4, pp. 172–176. [Google Scholar]
  60. Parmar, K.; Kher, R.K.; Thakkar, F.N. Analysis of CT and MRI image fusion using wavelet transform. In Proceedings of the 2012 International Conference on Communication Systems and Network Technologies, Rajkot, Gujarat, India, 11–13 May 2012; pp. 124–127. [Google Scholar]
  61. Bhavana, V.; Krishnappa, H.K. Multi-modality medical image fusion using discrete wavelet transform. Procedia Comput. Sci. 2015, 70, 625–631. [Google Scholar] [CrossRef]
  62. Ramaraj, V.; Swamy, M.V.A.; Sankar, M.K. Medical Image Fusion for Brain Tumor Diagnosis Using Effective Discrete Wavelet Transform Methods. J. Inf. Syst. Eng. Bus. Intell. 2024, 10. [Google Scholar] [CrossRef]
  63. Das, S.; Kundu, M.K. A neuro-fuzzy approach for medical image fusion. IEEE Trans. Biomed. Eng. 2013, 60, 3347–3353. [Google Scholar] [CrossRef] [PubMed]
  64. Fan, F.; Huang, Y.; Wang, L.; Xiong, X.; Jiang, Z.; Zhang, Z.; Zhan, J. A semantic-based medical image fusion approach. arXiv 2019, arXiv:1906.00225. [Google Scholar]
  65. Zhu, Z.; Zheng, M.; Qi, G.; Wang, D.; Xiang, Y. A phase congruency and local Laplacian energy based multi-modality medical image fusion method in NSCT domain. IEEE Access 2019, 7, 20811–20824. [Google Scholar] [CrossRef]
  66. Corso, R.; Stefano, A.; Salvaggio, G.; Comelli, A. Shearlet transform applied to a prostate cancer radiomics analysis on MR images. Mathematics 2024, 12, 1296. [Google Scholar] [CrossRef]
Figure 1. Classification of different available fusion algorithmic techniques.
Figure 2. Traumatic brain injury classification.
Figure 3. TBI dataset class distribution.
Figure 4. Traumatic brain injury medical data preprocessing.
Figure 5. Proposed hybrid image-fusion algorithm for medical image fusion.
Figure 6. Proposed model architecture of the hybrid CNN-ViT model.
Figure 7. Two medical scan images to be fused.
Figure 8. The fused image in the third axis.
Figure 9. Spatial gradients computed by smoothing the averaged input images.
Figure 10. Dice coefficient.
Figure 11. Sensitivity rate.
Figure 12. Specificity (true negative rate).
Figure 13. Entropy.
Figure 14. Average pixel intensity (mean).
Figure 15. Standard deviation (SD).
Figure 16. Correlation coefficient (CC).
Figure 17. Edge similarity measure.
Figure 18. Overall accuracy of all algorithms.
Figure 19. Average classification performance metrics for TBI.
Figure 20. Average confusion matrix for multiclass TBI.
Figure 21. Average AUC-ROC curve for multiclass TBI.
Figure 22. Training and validation accuracy.
Figure 23. Cross-validation for traumatic brain injury.
Figure 24. State-of-the-art comparison with existing techniques.
Table 1. Dataset Information.
Total Images | Training Samples | Validation Samples | Testing Samples | Data Format | Dimension
24,000 | 19,200 | 2400 | 2400 | jpg | 100 × 100
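The 80/10/10 split in Table 1 can be reproduced with a few lines of code. The following is a minimal sketch rather than the authors' pipeline; the directory layout, file naming, and random seed are assumptions made only for illustration.

```python
# Minimal sketch of an 80/10/10 train/validation/test split for ~24,000
# 100 x 100 JPG slices. The folder name "tbi_dataset" and the seed are
# assumptions, not taken from the paper's pipeline.
import random
from pathlib import Path

def split_dataset(root="tbi_dataset", seed=42):
    images = sorted(Path(root).rglob("*.jpg"))   # all class subfolders
    random.Random(seed).shuffle(images)

    n = len(images)
    n_train = int(0.80 * n)   # 19,200 of 24,000
    n_val = int(0.10 * n)     # 2400

    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],   # remaining 2400
    }

if __name__ == "__main__":
    splits = split_dataset()
    print({name: len(files) for name, files in splits.items()})
```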
Table 3. Updated Overall Performance Metrics.
Metric | Value
Overall Accuracy | 99.8%
Precision | 99.8%
Recall | 99.8%
F1-Score | 99.8%
Average PSNR | 39.0 dB
Average SSIM | 0.99
Average MI | 1.0
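For reference, PSNR, SSIM, and a histogram-based mutual information estimate of the kind summarized in Table 3 can be computed as in the sketch below. It uses scikit-image and scikit-learn and illustrates the metric definitions only; the bin count for the MI estimate is an assumption, and this is not the authors' evaluation code.

```python
# Sketch of the fused-image quality metrics reported above (PSNR, SSIM, MI).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from sklearn.metrics import mutual_info_score

def fusion_quality(reference: np.ndarray, fused: np.ndarray, bins: int = 64):
    """Return (PSNR in dB, SSIM, mutual information in nats) for two
    grayscale images with intensities in [0, 255]."""
    psnr = peak_signal_noise_ratio(reference, fused, data_range=255)
    ssim = structural_similarity(reference, fused, data_range=255)

    # Mutual information on quantized intensities (histogram estimate).
    edges = np.linspace(0, 255, bins)
    ref_q = np.digitize(reference.ravel(), edges)
    fus_q = np.digitize(fused.ravel(), edges)
    mi = mutual_info_score(ref_q, fus_q)
    return psnr, ssim, mi

if __name__ == "__main__":
    # Synthetic example: a reference slice and a slightly noisy copy.
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, (100, 100)).astype(np.uint8)
    noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
    print(fusion_quality(ref, noisy))
```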
Table 4. Simulated Performance Metrics for Each Class.
Class | Precision | Recall | F1-Score | Accuracy | PSNR | SSIM | MI
1 | 99.8% | 99.7% | 99.75% | 99.8% | 39.9 | 0.99 | 1.0
2 | 99.9% | 99.9% | 99.9% | 99.9% | 39.8 | 0.99 | 1.0
3 | 99.7% | 99.8% | 99.75% | 99.7% | 39.7 | 0.99 | 1.0
4 | 99.6% | 99.7% | 99.65% | 99.7% | 39.6 | 0.98 | 1.0
5 | 99.8% | 99.6% | 99.7% | 99.6% | 39.5 | 0.99 | 1.0
6 | 99.7% | 99.8% | 99.75% | 99.8% | 39.4 | 0.98 | 1.0
7 | 99.9% | 99.9% | 99.9% | 99.9% | 39.3 | 0.99 | 1.0
8 | 99.6% | 99.7% | 99.65% | 99.6% | 39.2 | 0.98 | 1.0
9 | 99.8% | 99.9% | 99.85% | 99.8% | 39.1 | 0.99 | 1.0
10 | 99.9% | 99.8% | 99.85% | 99.8% | 39.0 | 0.99 | 1.0
11 | 99.7% | 99.8% | 99.75% | 99.7% | 38.9 | 0.99 | 1.0
12 | 99.8% | 99.9% | 99.85% | 99.9% | 38.8 | 0.99 | 1.0
13 | 99.9% | 99.9% | 99.9% | 99.9% | 38.7 | 0.99 | 1.0
14 | 99.6% | 99.7% | 99.65% | 99.7% | 38.6 | 0.98 | 1.0
15 | 99.7% | 99.8% | 99.75% | 99.7% | 38.5 | 0.99 | 1.0
16 | 99.8% | 99.9% | 99.85% | 99.8% | 38.4 | 0.99 | 1.0
17 | 99.9% | 99.9% | 99.9% | 99.9% | 38.3 | 0.99 | 1.0
18 | 99.7% | 99.8% | 99.75% | 99.8% | 38.2 | 0.99 | 1.0
19 | 99.6% | 99.7% | 99.65% | 99.7% | 38.1 | 0.98 | 1.0
20 | 99.9% | 99.9% | 99.9% | 99.9% | 38.0 | 0.99 | 1.0
21 | 99.8% | 99.7% | 99.75% | 99.8% | 37.9 | 0.99 | 1.0
22 | 99.9% | 99.8% | 99.85% | 99.8% | 37.8 | 0.99 | 1.0
23 | 99.7% | 99.9% | 99.8% | 99.9% | 37.7 | 0.99 | 1.0
24 | 99.8% | 99.7% | 99.75% | 99.7% | 37.6 | 0.99 | 1.0
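Per-class precision, recall, and F1-score of the kind tabulated above follow directly from the multiclass confusion matrix. The sketch below shows one standard way to obtain them with scikit-learn; the 24-class label arrays are random placeholders standing in for the model's test-set predictions, not results from the paper.

```python
# Sketch of per-class precision/recall/F1 for a 24-class TBI problem using
# scikit-learn. The label arrays are synthetic placeholders.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

NUM_CLASSES = 24
rng = np.random.default_rng(1)
y_true = rng.integers(0, NUM_CLASSES, size=2400)        # e.g., the test split
y_pred = y_true.copy()
flip = rng.random(y_true.size) < 0.002                  # simulate a ~0.2% error rate
y_pred[flip] = rng.integers(0, NUM_CLASSES, size=int(flip.sum()))

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=list(range(NUM_CLASSES)), zero_division=0
)
print(f"overall accuracy: {accuracy_score(y_true, y_pred):.4f}")
for c in range(NUM_CLASSES):
    print(f"class {c + 1:2d}: P={precision[c]:.3f} R={recall[c]:.3f} "
          f"F1={f1[c]:.3f} n={support[c]}")
```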
Table 5. State-of-the-art comparison of medical image-fusion techniques.
Ref | Authors | Year | Dataset | Technique
[59] | Sekhar, A. S., Prasad, M. G. | 2011 | Medical scans | WPCA
[60] | Parmar, K., Kher, R. K., Thakkar, F. N. | 2012 | Medical scans | WT + Fusion Rules
[61] | Bhavana, V., Krishnappa, H. K. | 2015 | Medical scans | Averaging Method by WT
[62] | Ramaraj, V., Swamy, M. V. A., Sankar, M. K. | 2024 | Medical scans | DWT + IDWT
[63] | S. Das and M. K. Kundu | 2013 | Medical scans | NSCT + RPNN
[64] | F. Fan et al. | 2019 | Medical scans | NSST + PAPCN
[65] | Z. Zhu, M. Zheng, G. Qi, D. Wang, and Y. Xiang | 2019 | Medical scans | NSCT + LE
Proposed | This work | 2024 | Medical scans | DCT + SWT + IHS + PCA + Avg × CNN-ViT
Table 6. Results for Medical Image-Fusion Techniques.
Ref | Results
[59] | Mean: 32.8347, SD: 29.9188, Entropy: 6.7731, Covariance: 2.0293, Correlation Coefficient: 0.8617
[60] | PSNR: 16, RMSE: 0.35
[61] | Proposed method (w = 0.5): MSE = 0.02819, PSNR = 63.6424; proposed method (w = 0.7): MSE = 0.1911, PSNR = 55.3184
[62] | PSNR: 71.66, SSIM: 0.98
[63] | PSNR: 31.68, SSIM: 0.50
[64] | PSNR: 32.92, SSIM: 0.49
[65] | PSNR: 31.61, SSIM: 0.48
Proposed | Dice coefficient: 0.92, Sensitivity: 0.85, Specificity: 0.91, Entropy: 0.78, Mean: 160.5, SD: 23.7, CC: 0.93, ESM: 0.88, Accuracy: 99.8%
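The statistics reported for the proposed pipeline in Table 6 (Dice coefficient, sensitivity, specificity, entropy, mean, standard deviation, and correlation coefficient) follow their standard definitions. The sketch below shows one way to compute them from a binary prediction mask and a fused grayscale image; it is illustrative only and is not the authors' evaluation script.

```python
# Sketch of the Table 6 statistics: Dice, sensitivity, specificity from binary
# masks, plus entropy, mean, SD, and correlation coefficient from image pairs.
import numpy as np

def overlap_metrics(pred: np.ndarray, truth: np.ndarray):
    """Dice, sensitivity, specificity for boolean masks of equal shape."""
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dice, sensitivity, specificity

def image_statistics(fused: np.ndarray, reference: np.ndarray):
    """Shannon entropy (bits), mean, SD of the fused image, and the Pearson
    correlation coefficient between the fused and reference images."""
    counts, _ = np.histogram(fused, bins=256, range=(0, 256))
    p = counts / counts.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    cc = np.corrcoef(fused.ravel(), reference.ravel())[0, 1]
    return entropy, fused.mean(), fused.std(), cc

if __name__ == "__main__":
    # Synthetic masks: ground truth plus ~3% flipped pixels as the prediction.
    rng = np.random.default_rng(2)
    truth = rng.random((100, 100)) > 0.7
    pred = truth ^ (rng.random((100, 100)) > 0.97)
    print(overlap_metrics(pred, truth))
```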
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

