1. Introduction
Magnetic resonance imaging (MRI) has evolved into a powerful diagnostic tool, offering non-invasive structural and functional visualization without radiation exposure [1]. However, the physics of magnetic signal acquisition precludes directly acquiring spatial-domain images: the imaging process sequentially fills sample points in the frequency domain (k-space), after which a Fourier transform yields the final image. This sequential acquisition leads to prolonged scan times and the potential for motion artifacts, which limit broader application and introduce risk.
To tackle this issue, under-sampling of k-space has been adopted, yielding significant reductions in acquisition time [2]. Various methodologies optimize the under-sampling pattern based on signal and noise characteristics, preserving image detail while minimizing reconstruction artifacts, and advanced reconstruction algorithms further improve image quality. Compressed Sensing (CS) theory [3] provides the theoretical foundation for such approaches and enables more aggressive acceleration. Despite these gains in acquisition time and MRI efficiency, ongoing innovation remains crucial to balancing efficiency and diagnostic accuracy [4].
Traditional CS methods iteratively infer unknown k-space signals from the acquired ones under sparsity constraints. Recent advances in deep learning have spurred the use of artificial neural networks to represent the reconstruction process, with parameters learned through backpropagation. Combining k-space under-sampling with advanced neural network reconstruction algorithms generates high-quality images from limited data, enhancing temporal and spatial resolution and paving the way for improved clinical diagnosis and research.
However, current deep learning methods, particularly at high acceleration rates, under-utilize the diverse frequency information carried by different feature hierarchies, often producing overly smoothed images. To address this limitation, we present a novel strategy termed “progressive feature reconstruction”. This strategy selectively unlocks distinct channels for frequency capture at each feature order, and the allocation of high- and low-frequency channels is dynamically optimized across the reconstruction stages. Our approach thereby enables a more nuanced and accurate depiction of frequency characteristics within reconstructed images.
In addition, fusing multiple reconstruction outcomes demands an effective strategy. Existing approaches, such as direct concatenation followed by 1 × 1 convolutional dimension reduction, or concatenation followed by a refinement module, may constrain feature relationships and lose information. To overcome this issue, we introduce a Transformer-based fusion algorithm that adaptively leverages information from the diverse reconstruction outcomes. Integrating this fusion algorithm with the results of progressive feature reconstruction, our approach shows exceptional performance against eight existing algorithms on two publicly available datasets and one proprietary dataset.
In summary, the proposed innovation in MRI reconstruction can be highlighted as follows:
- Progressive feature reconstruction: an innovative technique for MRI reconstruction that efficiently captures diverse frequency information across feature orders, effectively mitigating over-smoothing.
- Transformer-based fusion: an efficient Transformer-based fusion method that optimizes the integration of reconstruction outcomes from different feature orders.
- Enhanced performance: by combining progressive feature reconstruction with Transformer-based fusion, our method consistently achieves outstanding results across diverse datasets, even with limited data samples.
2. Related Works
In the realm of deep learning, some models were designed by drawing on the optimization algorithms of compressed sensing (CS), replacing components of those algorithms with neural networks. Notably, ADMM-Net [5] fine-tuned the learnable parameters of the Alternating Direction Method of Multipliers (ADMM) framework during training, whereas ISTA-Net [6] substituted neural networks for the designed components of the Iterative Soft Thresholding Algorithm (ISTA). IFR-Net [7] achieved trainable regularization and feature refinement by unfolding the iterative feature refinement process.
Convolutional neural networks based on deep learning extract prior knowledge from training data and combine it with sparse regularization for MRI reconstruction. Prominent instances include variational autoencoders [8], generative adversarial networks (GANs) [9,10,11,12,13,14], and various other generative models [15,16]. Iterative models constitute another category, in which deep learning methods embrace iterative updates for MRI reconstruction; notable examples include unfolded networks [17], recurrent neural networks (RNNs) [18], and iterative inversion models [19]. For instance, CRNN [18] and CSDL-Net [20] combined traditional iterative algorithms with recurrent hidden connections to cyclically enhance the reconstruction stages, capturing spatiotemporal dependencies and bolstering reconstruction accuracy. Similarly, PC-RNN [21] employed multiple convolutional recurrent neural network (ConvRNN) modules to iteratively learn features at different scales, culminating in a convolutional neural network (CNN) module that performed pyramid-style image reconstruction. MEDL-Net [22] and ReconResNet [23] utilized regularization learning for MRI reconstruction.
Moreover, the Transformer architecture, renowned in natural language processing and computer vision, has demonstrated its strength in MRI reconstruction [24,25,26,27,28]. Its design facilitates modeling long-range feature dependencies and the parallel processing of spatiotemporal correlations. DSME-Net [24] employed bidirectional alternating connections for enhanced information exchange, while T2-Net [29] and MHAN [30] addressed joint MRI reconstruction and super-resolution. KTMR [31] used SwinIR [27] as its core architecture, and SwinGAN [28] creatively utilized a dual-domain GAN to accelerate MRI reconstruction and overcome the limited structural-detail preservation of traditional methods. RNLFNet [26] combined a self-attention mechanism with the Fourier transform to capture long-range spatial correlations in the frequency domain.
Our approach introduced a novel strategy of progressive feature reconstruction combined with a Transformer-based fusion approach, achieving enhanced MRI reconstruction performance by effectively capturing diverse frequency information and optimally integrating reconstruction outcomes.
3. Materials and Methods
We present our method for MRI reconstruction in this section. We first formalize the inverse problem of MRI reconstruction, providing a mathematical representation of the problem. Building upon this formalization, we then introduce the overall architecture of our method.
3.1. Problem Formulation
The fundamental problem of MRI reconstruction is to recover clear MR images from under-sampled k-space data. The under-sampled acquisition can be formulated as follows:

y = Ax + ε = MFx + ε,  (1)

where A represents the combination of the under-sampling matrix M and the Fourier transform matrix F. The matrix M represents the under-sampling pattern applied during the data acquisition process, and ε represents the noise present during the acquisition. Finally, x represents the desired clear MR image that we aim to reconstruct, and y represents the under-sampled k-space data obtained from the acquisition process.
However, the inverse of this formulation is ill-posed. To address this underdetermined inverse problem, deep learning-based MRI reconstruction typically incorporates a deep convolutional neural network with learnable parameters θ to learn the mapping between zero-filled images and fully sampled images. The formulation can be expressed as follows:

x̂ = f_θ(x_u), with x_u = F−1(y),  (2)

where x_u denotes the zero-filled image. In Equation (2), our network serves as the function f_θ that performs the end-to-end mapping. Further details are elaborated in subsequent sections.
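To make the formulation concrete, the following is a minimal PyTorch sketch of the acquisition model in Equation (1) and the zero-filled input of Equation (2); the function names, tensor shapes, and noise handling are our illustrative assumptions, not the authors' released code.

```python
import torch

def forward_model(x: torch.Tensor, mask: torch.Tensor, noise_std: float = 0.0) -> torch.Tensor:
    """Under-sampled acquisition y = MFx + eps for a complex image x of shape (H, W)."""
    k = torch.fft.fftshift(torch.fft.fft2(x))  # F x (centered k-space)
    if noise_std > 0:  # optional acquisition noise eps
        k = k + noise_std * (torch.randn_like(k.real) + 1j * torch.randn_like(k.real))
    return mask * k  # apply the under-sampling pattern M

def zero_filled(y: torch.Tensor) -> torch.Tensor:
    """Zero-filled image x_u = F^{-1}(y), the network input in Equation (2)."""
    return torch.fft.ifft2(torch.fft.ifftshift(y))
```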
3.2. Progressive Feature Reconstruction and Fusion Network
Our network comprises two major components: the Progressive Feature Reconstruction Module and the Fusion Module. The design of our Progressive Feature Reconstruction Module is inspired by the successful UNet architecture, aiming to progressively extract and reconstruct features at different scales for high-fidelity MRI restoration. Upon the introduction of zero-filled images into the network, an initial convolutional layer extracts shallow-level features. Subsequently, the Reconstruction Module operates on these shallow-level features to effectuate the reconstruction process.
Within the Reconstruction Module, the shallow-level features sequentially traverse through five distinct reconstruction blocks. These blocks correspond, respectively, to low-order reconstruction, intermediate-order reconstruction, high-order reconstruction, another intermediate-order reconstruction, and finally low-order reconstruction. At the end of each reconstruction block, we add a data consistency layer to correct the data reconstruction. This sequence yields five sets of distinct reconstruction features. After concatenating these features along the channel dimension, they are channeled into the Multi-Order Fusion Transformer for adaptive integration.
The H-Recon block acts as a bottleneck, chosen to balance computational efficiency, parameter count, and performance. Repeating the intermediate- and low-order feature reconstructions enhances the integration of high-level semantic information.
The Fusion Transformer orchestrates a dynamic fusion process, harmonizing the amalgamated channel-enriched reconstruction features. This transformation culminates in the generation of the final high-definition MRI reconstruction.
Our network architecture embodies a coherent sequence wherein initial feature extraction, hierarchical reconstruction, and adaptive fusion sequentially coalesce to yield high-quality reconstructed MRI images. Its computation follows Algorithm 1.
Algorithm 1 MRI reconstruction using our algorithm
Input: under-sampled k-space k0, under-sampling mask M.
Compute X from k0 using F−1;
Compute fea from X using Equation (3);
// Progressive feature reconstruction
for block in [L1-Recon, M1-Recon, H-Recon, M2-Recon, L2-Recon] do
    Compute fea from fea using Equation (6);
    Apply DC to fea using Equation (14);
    Save fea as L1/M1/H/M2/L2;
end
// Multi-Order Fusion Transformer
Compute Q1, K1, and V1 using Equation (8);
Compute x using Equations (9) and (10);
Apply DC to x using Equation (14);
Compute Q2, K2, and V2 using Equation (11);
Compute Y using Equations (12) and (13);
Apply DC to Y using Equation (14);
Output: reconstructed image Y.
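For orientation, the following is a schematic PyTorch rendering of Algorithm 1. The shallow convolution, the five reconstruction blocks, the DC layer, and the MOFT module are passed in as placeholders for the components of Sections 3.3, 3.4, and 3.5; all names and interfaces here are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class PFRFNet(nn.Module):
    """Schematic forward pass mirroring Algorithm 1."""
    def __init__(self, shallow_conv: nn.Module, recon_blocks: list, dc: nn.Module, moft: nn.Module):
        super().__init__()
        self.shallow = shallow_conv                 # Eq. (3): initial feature extraction
        self.blocks = nn.ModuleList(recon_blocks)   # [L1, M1, H, M2, L2]-Recon
        self.dc = dc                                # data consistency, Eq. (14)
        self.moft = moft                            # Multi-Order Fusion Transformer

    def forward(self, k0: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        x = torch.fft.ifft2(k0)                     # zero-filled image
        fea = self.shallow(torch.view_as_real(x).permute(0, 3, 1, 2))  # real/imag as channels
        outputs = []
        for block in self.blocks:                   # progressive reconstruction
            fea = self.dc(block(fea), k0, mask)     # Eq. (6), then DC per Eq. (14)
            outputs.append(fea)                     # saved as L1/M1/H/M2/L2
        return self.moft(outputs, k0, mask)         # two-stage fusion, Eqs. (8)-(13)
```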
3.3. L-Recon (M-Recon/H-Recon)
The entire reconstruction process is partitioned into five distinct reconstruction blocks; across these reconstruction blocks, a consistent core architecture is employed with the key distinction lying in the utilization of varying up and down-sampling ratios for different-order reconstruction blocks.
In the low-order reconstruction block, an initial convolution facilitates a two-fold down-sampling, which is followed by channel expansion. Subsequently, attention operations are executed, culminating in the reconstruction process. The final output is then restored to the original dimensions through transpose convolution. Similarly, within the intermediate-order reconstruction block, the initial convolution enacts a four-fold down-sampling, and the high-order reconstruction block commences with an initial convolution inducing an eight-fold down-sampling, which is again followed by transpose convolution for dimension restoration after the reconstruction phase.
Owing to the necessity of addressing the specificities of different orders, attention operations in each reconstruction block comprise two integral components: a high-frequency feature extractor (HFE) and a low-frequency feature extractor (LFE). Upon entering the reconstruction block, features that have undergone down-sampling and channel dimension augmentation are partitioned into distinct channel subsets. This process is guided by the characteristics of each channel, emphasizing an automated extraction process rather than manual segmentation. One subset of channel features is routed to the high-frequency feature extractor, while the other subset enters the low-frequency feature extractor. Following attention operations and channel-wise integration, feature optimization is achieved via a depthwise separable convolution layer. Subsequently, the reconstructed outcome is refined through normalization layers and Multi-Layer Perceptron (MLP) operations.
Considering the intrinsic characteristics of high-frequency information within images, the high-frequency feature extractor is composed of both max-pooling and parallel convolution layers. Similarly, for the low-frequency information that pertains to regions with gradual variations in grayscale values across a broader range, the low-frequency feature extractor integrates self-attention mechanisms. Recognizing that features from different orders contain varying degrees of high and low-frequency content, we have devised a dynamic channel segmentation strategy, channeling subsets of features into distinct-frequency feature extractors.
In the context of the low-order feature reconstruction block, where high-frequency information is prominent, the channel count directed to the high-frequency feature extractor surpasses that allocated to the low-frequency counterpart. Conversely, in the high-order feature reconstruction block, the number of channels directed to the high-frequency feature extractor is lower compared to the channels routed to the low-frequency feature extractor. In the intermediate-order feature reconstruction block, the allocation is balanced between the two.
In essence, this modular approach provides adaptability across diverse reconstruction stages, integrating high and low-frequency information effectively. The tailored architecture and distinct sampling strategies for varying orders contribute to a comprehensive hierarchical feature extraction, ultimately enhancing the quality and accuracy of the reconstruction process.
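As a sketch of the frequency-split attention described above, the following assumes a split into c1 high-frequency and c2 low-frequency channels (cf. Table 1); the internals of the two extractors are abstracted away, and all module and parameter names are our illustrative assumptions.

```python
import torch
import torch.nn as nn

class FreqSplitAttention(nn.Module):
    def __init__(self, channels: int, c_high: int, hfe: nn.Module, lfe: nn.Module):
        super().__init__()
        assert 0 < c_high < channels
        self.c_high = c_high
        self.hfe, self.lfe = hfe, lfe    # max-pool+conv branch / self-attention branch
        # depthwise separable convolution to merge the two branches
        self.merge = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise
            nn.Conv2d(channels, channels, 1),                              # pointwise
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # split channels between the high- and low-frequency extractors
        x_high, x_low = torch.split(x, [self.c_high, x.shape[1] - self.c_high], dim=1)
        out = torch.cat([self.hfe(x_high), self.lfe(x_low)], dim=1)
        return self.merge(out)
```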
3.4. Multi-Order Fusion Transformer (MOFT)
In the context of multi-order MRI image reconstruction, our approach aggregates the reconstruction features from the distinct stages, namely L1, M1, H, M2, and L2, and feeds them into the Multi-Order Fusion Transformer (MOFT). Notably, the features originating from the low-order reconstructions, L1 and L2, undergo channel dimension reduction via 1 × 1 convolutions and are then down-sampled through average pooling, generating the constituents of the attention mechanism's V1 component. Correspondingly, the intermediate-order reconstruction features M1 and M2 also undergo 1 × 1 convolution-based channel compression followed by average-pooling down-sampling, serving as the sources of the Q1 component. The high-order reconstruction feature H, on the other hand, is directly down-sampled through average pooling, contributing the K1 component. Upon completion of the attention mechanism, the resultant attention outcomes are up-sampled and added to the concatenated features L1, M1, H, M2, and L2, which have undergone channel reduction via 1 × 1 convolutions. This summation is then processed through normalization and an MLP, marking the first phase of fusion (Equations (9) and (10)), where MCA denotes multi-head cross-attention.
Subsequently, this outcome undergoes further down-sampling to serve as the foundation for generating the V2 component in the second fusion step. The Q2 and K2 components are generated from the features L1, M1, H, M2, and L2: L1, M1, and H contribute to Q2, while H, M2, and L2 contribute to K2.
Following the attention mechanism in the second fusion step, the resultant attention outcomes are up-sampled and combined with the output of the initial fusion stage. Subsequent processing includes normalization and MLP operations in a sequential manner, ultimately yielding the fused output as the final reconstructed result.
The advantages of this fusion approach stem from its adaptive amalgamation of reconstruction outcomes across different orders, which is enacted through a two-fold fusion strategy. The two-phase fusion framework introduces a deliberative and graduated approach to feature fusion. This method not only capitalizes on the diversity of reconstruction features derived from disparate stages but also affords a flexible mechanism for the sequential selection of fusion tiers.
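A condensed sketch of the first fusion stage may clarify the routing described above: Q1 from (M1, M2), K1 from H, and V1 from (L1, L2). The channel counts, pooling factor, and the use of PyTorch's built-in multi-head attention are our simplifying assumptions.

```python
import torch
import torch.nn as nn

class FusionStage1(nn.Module):
    def __init__(self, c: int, heads: int = 4):  # c must be divisible by heads
        super().__init__()
        self.reduce_l = nn.Conv2d(2 * c, c, 1)   # L1 ++ L2 -> V1 source (1x1 reduction)
        self.reduce_m = nn.Conv2d(2 * c, c, 1)   # M1 ++ M2 -> Q1 source (1x1 reduction)
        self.pool = nn.AvgPool2d(2)              # average-pooling down-sampling
        self.mca = nn.MultiheadAttention(c, heads, batch_first=True)  # multi-head cross-attention

    def forward(self, l1, m1, h, m2, l2):
        def tokens(t):                           # (B, C, H, W) -> (B, H*W/4, C)
            t = self.pool(t)
            return t.flatten(2).transpose(1, 2)
        q = tokens(self.reduce_m(torch.cat([m1, m2], dim=1)))
        k = tokens(h)                            # H is pooled directly, no reduction
        v = tokens(self.reduce_l(torch.cat([l1, l2], dim=1)))
        out, _ = self.mca(q, k, v)
        # per the text, `out` would then be up-sampled and added to the reduced
        # concatenation of L1/M1/H/M2/L2 before normalization and the MLP
        return out
```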
3.5. Data Consistency Layers (DC Layers)
In the context of MRI image reconstruction, the Data Consistency (DC) layer serves as a pivotal technique to ensure alignment between the reconstructed image and the observed data. This layer is integrated into both the reconstruction modules and the fusion module of our architecture. Leveraging the Fourier transform (F) and its inverse (F−1), the DC layer operates in the frequency domain. More precisely, the computation of the DC layer is defined as follows:

x_dc = F−1((1 − M) ⊙ F(x) + M ⊙ k0),  (14)

where x denotes the current reconstruction, k0 represents the acquired under-sampled data, and M represents the under-sampling pattern applied during the data acquisition process.
By incorporating the DC layer, our algorithm can systematically compare the reconstructed image against the observed data within each reconstruction module. Based on this comparison, adjustments are made to the reconstructed image, thereby enhancing both its accuracy and consistency.
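The following is a minimal PyTorch sketch of this DC operation as given in Equation (14); the module and argument names are ours.

```python
import torch
import torch.nn as nn

class DataConsistency(nn.Module):
    """Hard data consistency: measured k-space samples replace predicted ones."""
    def forward(self, x_rec: torch.Tensor, k0: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        k_rec = torch.fft.fft2(x_rec)             # F(x): prediction in k-space
        k_dc = (1 - mask) * k_rec + mask * k0     # keep measured samples from k0
        return torch.fft.ifft2(k_dc)              # back to the image domain
```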
4. Results
In this section, we present the experimental details, reconstruction results and metric comparisons across two datasets as well as four ablation studies on our architecture.
4.1. Details of the Experiments
4.1.1. Architecture Specification
Some parameters of the architecture were not explicitly provided in Figure 1, and we clarify them in this section. Firstly, in Figure 1a, we set the number of reconstruction blocks n to 6; this choice is explained in the subsequent ablation study. In Figure 1b, the channel expansion during the initial down-sampling and the subsequent allocation of channels to the different frequency feature extractors varied across the stages of the reconstruction process. In the Low-Order Feature Reconstruction Block (L-Recon), channel expansion produced 24 channels. Due to the higher content of high-frequency information in low-order features, 16 channels were allocated to the high-frequency feature extractor, while 8 channels were allocated to the low-frequency feature extractor. In the Middle-Order Feature Reconstruction Block (M-Recon), we used 48 channels for channel expansion; since the high- and low-frequency information content was balanced at this stage, both feature extractors received 24 channels each. In the High-Order Feature Reconstruction Block (H-Recon), we employed 96 channels for channel expansion. Because high-order features contained more low-frequency information, 64 channels were allocated to the low-frequency feature extractor, while 32 channels were allocated to the high-frequency feature extractor. This configuration is summarized in Table 1.
4.1.2. Datasets
To evaluate the performance of the proposed method, we tested it on two publicly available datasets: the FastMRI knee dataset [32,33] and the Calgary-Campinas brain dataset [34].
The FastMRI dataset, a product of a collaboration between the Department of Radiology at the NYU School of Medicine, NYU Langone Health, and Facebook AI Research (FAIR), was created with the goal of advancing imaging technologies in clinical practice to enhance human health. It aimed to accelerate magnetic resonance imaging (MRI) scans by up to ten-fold using artificial intelligence. This dataset included fully sampled knee joint MRI data acquired on 3 and 1.5 Tesla magnets. The training set of the FastMRI knee dataset comprised 973 volumes (34,742 slices), with a validation set containing 199 volumes (7135 slices), all with an acquisition matrix size of 320 × 320.
The Calgary-Campinas public brain magnetic resonance (MR) images dataset resulted from a collaborative effort between the Vascular Imaging Lab at the University of Calgary and the Medical Image Computing Lab at the University of Campinas (UNICAMP). It provided 3D brain data, specifically 167 three-dimensional (3D), T1-weighted, gradient-recalled echo, 1 mm isotropic sagittal acquisitions obtained from a clinical MR scanner (Discovery MR750; General Electric Healthcare, Waukesha, WI, USA). The training set of the Calgary-Campinas clinical brain dataset comprised 25 volumes (4524 slices), and the validation set consisted of 10 volumes (1700 slices) with an acquisition matrix size of 256 × 256.
We employed under-sampled k-space data as the input to our algorithm. Given that k-space data are represented in a complex form, we handled the real and imaginary components as distinct channels within our model.
4.1.3. Details of the Training
In our training setup, we implemented acceleration factors of 8× and 4× along the phase-encoding direction in k-space, retaining 4% and 8% of the central lines, respectively, while randomly sampling the peripheral region of k-space. The under-sampling was applied directly to the raw k-space data, followed by the inverse Fourier transform to convert the k-space signals into the image domain.
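A sketch of this 1D Cartesian mask generation (a fixed central fraction plus uniformly random peripheral lines) is given below; the parameter names and random generator are our assumptions, not the paper's code.

```python
import numpy as np

def cartesian_mask(n_lines: int, accel: int, center_frac: float, seed: int = 0) -> np.ndarray:
    """1D phase-encoding mask: e.g., center_frac=0.08 at 4x, 0.04 at 8x."""
    rng = np.random.default_rng(seed)
    n_center = int(round(n_lines * center_frac))   # fully sampled central lines
    n_total = n_lines // accel                     # total lines retained
    mask = np.zeros(n_lines, dtype=bool)
    c = n_lines // 2
    mask[c - n_center // 2 : c + (n_center + 1) // 2] = True
    periphery = np.flatnonzero(~mask)              # candidate peripheral lines
    extra = max(n_total - n_center, 0)
    mask[rng.choice(periphery, size=extra, replace=False)] = True
    return mask                                    # broadcast along the frequency axis
```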
For optimization, we employed the Adam optimizer with the default momentum parameters β1 = 0.9 and β2 = 0.999. The initial learning rate was set to 0.0004, and the total number of training epochs was 50. To ensure training stability, a learning rate decay strategy reduced the learning rate by a factor of 0.9 every 5 epochs. Batch sizes varied with the dataset, with 8 as the standard choice. During training, the L1 loss function was employed, providing robustness to outliers. The experiments were conducted on a computational cluster featuring four NVIDIA Tesla V100 GPUs.
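These optimizer and scheduler settings translate directly into PyTorch as follows; the one-layer `model` is only a stand-in for the full network.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(2, 2, 3, padding=1)  # stand-in for the full reconstruction network
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)  # x0.9 every 5 epochs
criterion = nn.L1Loss()

for epoch in range(50):
    # ... iterate over batches, compute criterion(model(x), target), backpropagate ...
    scheduler.step()
```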
4.1.4. Comparative Algorithms
We conducted a comprehensive comparative analysis involving eight advanced deep learning algorithms: UNet (2017) [35], based on convolutional neural networks; KIKI-Net (2018) [36], which employs cross-domain reconstruction strategies; PD-Net (2018) [11], the Learned Primal-Dual algorithm for tomographic reconstruction; the state-of-the-art iterative reconstruction methods Cascade-net (2017) [37], CRNN-MRI (2018) [18], and CSDL-Net (2022) [20]; and the Transformer-based models DuDReTLU-net (2023) [38] and KTMR (2023) [31]. All code implementations for the compared methods were either obtained from the authors' websites or carefully recreated following the original papers. Notably, DuDReTLU-net was purposefully designed for dynamic MRI reconstruction and is applicable to multi-channel-input MRI reconstruction, including 3D and dynamic reconstruction; as a multi-channel comparative algorithm, it required only a change of input dimensions to adapt to our task, with its convolutional modules tailored accordingly. To achieve architectural versatility, its final fusion module was switched to a refinement module composed of convolutional layers and activation functions.
4.1.5. Evaluation Metrics
We assessed performance using three key metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Normalized Mean Square Error (NMSE). Together, these metrics provide a comprehensive evaluation of our method and its competitors.
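For reproducibility, these metrics can be computed as follows, assuming magnitude images as NumPy arrays; SSIM is delegated to scikit-image rather than reimplemented, and the peak/data-range conventions are our assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def nmse(gt: np.ndarray, pred: np.ndarray) -> float:
    """Normalized mean square error: ||gt - pred||^2 / ||gt||^2."""
    return float(np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2)

def psnr(gt: np.ndarray, pred: np.ndarray) -> float:
    """Peak signal-to-noise ratio, using the ground-truth maximum as the peak."""
    mse = np.mean((gt - pred) ** 2)
    return float(10 * np.log10(gt.max() ** 2 / mse))

def ssim(gt: np.ndarray, pred: np.ndarray) -> float:
    """Structural similarity, computed on the ground-truth data range."""
    return float(structural_similarity(gt, pred, data_range=gt.max() - gt.min()))
```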
4.2. MRI Reconstruction Based on Two Datasets
4.2.1. FastMRI Knee MRI Reconstruction
During the reconstruction process on the FastMRI knee dataset, we employed the official data preprocessing pipeline and conducted MRI reconstruction at acceleration factors of 4× and 8×. In a comprehensive comparison with eight advanced algorithms, our approach achieved superior metrics, with higher PSNR and SSIM values and lower NMSE values, as detailed in Table 2.
Several observations emerge from the quantitative results on the FastMRI dataset. PD-Net and CSDL-Net perform strongly at the 4× subsampling rate, with PSNR values of 32.15 and 32.29, SSIM scores of 0.804 and 0.806, and NMSE values of 0.0331 and 0.0327, respectively. Our proposed algorithm ("Ours") leads at both subsampling rates; at the 8× rate in particular, it achieves the highest PSNR (31.02) and SSIM (0.771) among all algorithms, coupled with the lowest NMSE (0.0413), underscoring its strength in high-resolution image reconstruction. CRNN-MRI performs well but falls slightly short of CSDL-Net and "Ours" in PSNR and SSIM, with slightly higher NMSE values. KTMR is competitive at the 4× rate but declines marginally at 8×, with relatively elevated NMSE at both rates, indicating room for accuracy improvement.
To visually illustrate the effectiveness of our approach, we present comparison results at 4× and 8× acceleration in Figure 2 and Figure 3. These figures show that our framework excels at preserving intricate structures, produces more natural and detailed textures, and yields clearer, more realistic reconstructions than the other methods, achieving high-fidelity results even from highly under-sampled measurements. Figure 2 and Figure 3 also display error maps for the comparative methods, highlighting that our approach yields smaller errors at both under-sampling rates.
4.2.2. Calgary-Campinas Brain MRI Reconstruction
The Calgary-Campinas dataset is a publicly available raw dataset of brain MRI. It differs from the FastMRI dataset in that it contains a smaller number of data samples. The purpose of using the Calgary-Campinas dataset was to validate the reconstruction capability of models on different datasets and assess their fitting ability with fewer data samples. To ensure consistency and reliability in the comparison results, we employed the same training methodology, metric calculation, and error comparison as in the FastMRI dataset. During the training process, we applied the same data preprocessing and model training parameter settings as in the FastMRI dataset. By using the Calgary-Campinas dataset for validation, we were able to comprehensively evaluate and compare the performance of our algorithm on different datasets and verify its generalization ability with limited data samples.
As shown in Table 3, our proposed algorithm leads at both 4× and 8× acceleration, consistently achieving the highest PSNR and SSIM and the lowest NMSE, demonstrating remarkable image quality and accuracy. CRNN-MRI also impresses, with competitive PSNR and SSIM values at both acceleration rates and relatively low NMSE, signifying its competence in preserving image fidelity and structural features. Cascade-net and CSDL-Net perform exceptionally well at the 4× rate, with high PSNR and SSIM values; although their performance diminishes slightly at 8×, they maintain relatively low NMSE values, suggesting robust accuracy. UNet performs comparatively better at the 8× rate, exhibiting slightly lower PSNR and SSIM and a slightly higher NMSE at 4×.
As shown in Figure 4 and Figure 5, both the error maps and the reconstructed images show that our algorithm produces smaller errors and more accurate texture structures, mirroring the results observed on the FastMRI dataset.
4.3. Ablation Studies on Model Components
To mitigate potential experimental variability, we conducted our MRI reconstruction ablation studies at a 4× acceleration factor on the FastMRI dataset, primarily because it contains the largest number of samples in both the training and validation sets.
4.3.1. Effectiveness of MOFT
To affirm the effectiveness of the MOFT module, we conducted an ablation study in which MOFT was replaced with a Refinement Module (RM), and we also compared against the concatenation + 1 × 1 convolution used in previous algorithms. Table 4 shows that MOFT achieves the best fusion results.
4.3.2. Progressive Feature Reconstruction
We introduced the strategy termed "Progressive Feature Reconstruction" to address a prominent issue in current deep learning methods, particularly at high acceleration rates: the underutilization of varying frequency information across different feature hierarchies. To comprehensively validate the effectiveness of the Progressive Feature Reconstruction Module, this experiment sequentially removes the middle- and high-order reconstruction blocks to showcase the advantages of progressive reconstruction. As our MOFT is designed for fusion across three orders of features, it is not applicable once certain reconstruction blocks are removed; therefore, in this ablation study, we replaced MOFT with the Refinement Module (RM) and compared the outputs produced from the concatenated multi-order feature reconstructions.
Table 5 presents the quality evaluation results, demonstrating that employing features from all hierarchy levels for progressive feature reconstruction yielded optimal results.
4.3.3. Quantity of L-Recon (M-Recon/H-Recon)
To ensure our proposed architecture achieves optimal performance, we conducted an ablation study varying the number of L-Recon (M-Recon/H-Recon) blocks, experimenting with n values from 1 to 8. The results revealed that optimal performance was achieved at n = 6 (as shown in Figure 6); consequently, we set n to 6 in our experiments.
4.3.4. Channel Allocation Strategies for Frequency-Specific Feature Extraction
In this ablation study, we devised three distinct allocation strategies. The first strategy dynamically optimized the channel allocation between high- and low-frequency components across the reconstruction modules: in the lower-order reconstruction modules, the channels allocated to high-frequency components were expanded and those for low-frequency components reduced, with the pattern reversed in the higher-order modules. As shown in Figure 7, this design comprehensively captured the various frequency components within the images, leading to outstanding reconstruction results.
The second strategy focused on swapping the channel counts allocated to high-frequency and low-frequency components between different reconstruction modules, serving as a comprehensive validation of the correctness of the first strategy. However, this strategy produced the least favorable reconstruction results among the three methods.
The third strategy aimed to achieve a balance between high-frequency and low-frequency components across different reconstruction modules. Its results, while not as favorable as those of the first strategy, further underscored the effectiveness of our dynamically designed approach.
5. Discussion
Regarding the future of MRI reconstruction, we envision significant advancements in enhancing both temporal and spatial resolution. This will enable a clearer visualization of biological tissues and dynamic processes. Our next steps involve the development of personalized reconstruction methods tailored to individual physiological and pathological characteristics, aiming for more precise and personalized medical imaging applications. Additionally, we plan to integrate different modalities of MR images, including structural and functional images, to offer a more comprehensive understanding of disease information. The potential of multimodal fusion is substantial, providing clinicians with a more thorough assessment of a patient’s health.
In our paper, we systematically compared various methods using a standardized evaluation approach sourced from the official websites of the public datasets we utilized. The evaluation metrics, including PSNR, SSIM, and NMSE, were selected to rigorously assess the reconstructed images against ground truth images. The implementation was consistent across evaluations using PyCharm (PyTorch 1.11.0).
Furthermore, our method extends beyond MRI reconstruction, demonstrating applicability to tasks such as reconstructing spectroscopic images and addressing real-world challenges like image denoising, dehazing, and deraining. The versatility of our framework allows seamless integration into a unified application, promoting convenient and widespread usage across different domains.
6. Conclusions
In summary, our research effectively addresses the persistent challenges encountered in magnetic resonance imaging (MRI), such as prolonged acquisition times and susceptibility to motion artifacts. We have introduced an innovative deep learning approach tailored to the restoration of high-fidelity MRI images from under-sampled k-space data, achieving remarkable results.
Our cascaded reconstruction strategy, progressively reinstating hierarchical features and employing a novel fusion algorithm, has demonstrated exceptional performance. The dynamic optimization of channel allocation between high-frequency and low-frequency components across diverse reconstruction modules has emerged as the most effective approach, comprehensively capturing a wide spectrum of frequency components and yielding outstanding reconstruction results.
Significantly, our algorithm surpasses state-of-the-art methods, delivering superior PSNR values and substantial SSIM scores. Its application to both the FastMRI and Calgary-Campinas datasets underscores its versatility and efficacy. Importantly, as shown in Figure 8, our method achieves these results with a relatively low parameter count, highlighting its efficiency. Comparative analyses against similar approaches furnish robust evidence of our method's superior performance.
Author Contributions
Conceptualization, B.W.; Methodology, B.W., H.Z. and Z.L.; Validation, B.W. and Y.L.; Formal analysis, X.X.; Investigation, H.Z.; Resources, X.X. and Z.L.; Data curation, B.W. and Z.L.; Writing—original draft, B.W.; Writing—review & editing, Y.L. and Z.L.; Supervision, Y.L., X.X. and Z.L.; Project administration, Y.L. and Z.L.; Funding acquisition, Y.L. and Z.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key Research and Development Program of China (Grant no. 2022YFF0604704), National Natural Science Foundation of China (Grant No. 62275025), National Science and Technology Infrastructure Program (Grant no. APT2301-6) and Open Fund of State Key Laboratory of Infrared Physics (Grant No. SITP-NLIST-ZD-2023-06).
Data Availability Statement
The two publicly available datasets used in this article are from fastMRI Dataset (nyu.edu) and Calgary-Campinas Public Dataset (ccdataset.com).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Singh, D.; Monga, A.; de Moura, H.L.; Zhang, X.; Zibetti, M.V.; Regatte, R.R. Emerging Trends in Fast MRI Using Deep-Learning Reconstruction on Undersampled k-Space Data: A Systematic Review. Bioengineering 2023, 10, 1012.
- Ramzi, Z.; Ciuciu, P.; Starck, J.-L. Benchmarking MRI reconstruction neural networks on large public datasets. Appl. Sci. 2020, 10, 1816.
- Donoho, D.L. Compressed sensing. IEEE Trans. Inform. Theory 2006, 52, 1289–1306.
- Lustig, M.; Donoho, D.L.; Santos, J.M.; Pauly, J.M. Compressed sensing MRI. IEEE Signal Proc. Mag. 2008, 25, 72–82.
- Sun, J.; Li, H.; Xu, Z. Deep ADMM-Net for compressive sensing MRI. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
- Zhang, J.; Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1828–1837.
- Liu, Y.; Liu, Q.; Zhang, M.; Yang, Q.; Wang, S.; Liang, D. IFR-Net: Iterative feature refinement network for compressed sensing MRI. IEEE Trans. Comput. Imaging 2019, 6, 434–446.
- Tezcan, K.C.; Baumgartner, C.F.; Luechinger, R.; Pruessmann, K.P.; Konukoglu, E. MR image reconstruction using deep density priors. IEEE Trans. Med. Imaging 2018, 38, 1633–1642.
- Seitzer, M.; Yang, G.; Schlemper, J.; Oktay, O.; Würfl, T.; Christlein, V.; Wong, T.; Mohiaddin, R.; Firmin, D.; Keegan, J. Adversarial and perceptual refinement for compressed sensing MRI reconstruction. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Part I, pp. 232–240.
- Deora, P.; Vasudeva, B.; Bhattacharya, S.; Pradhan, P.M. Structure preserving compressive sensing MRI reconstruction using generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 522–523.
- Adler, J.; Öktem, O. Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 2018, 37, 1322–1332.
- Cheng, J.; Wang, H.; Ying, L.; Liang, D. Model learning: Primal dual networks for fast MR imaging. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, 13–17 October 2019; Part III, pp. 21–29.
- Dar, S.U.; Yurt, M.; Shahdloo, M.; Ildız, M.E.; Tınaz, B.; Cukur, T. Prior-guided image reconstruction for accelerated multi-contrast MRI via generative adversarial networks. IEEE J-STSP 2020, 14, 1072–1087.
- Shende, P.; Pawar, M.; Kakde, S. A brief review on: MRI images reconstruction using GAN. In Proceedings of the 2019 International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 20–23 September 2019; pp. 0139–0142.
- Luo, G.; Zhao, N.; Jiang, W.; Hui, E.S.; Cao, P. MRI reconstruction using deep Bayesian estimation. Magn. Reson. Med. 2020, 84, 2246–2261.
- Liu, Q.; Yang, Q.; Cheng, H.; Wang, S.; Zhang, M.; Liang, D. Highly undersampled magnetic resonance imaging reconstruction using autoencoding priors. Magn. Reson. Med. 2020, 83, 322–336.
- Huang, Q.; Yang, D.; Wu, P.; Qu, H.; Yi, J.; Metaxas, D. MRI reconstruction via cascaded channel-wise attention network. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 1622–1626.
- Qin, C.; Schlemper, J.; Caballero, J.; Price, A.N.; Hajnal, J.V.; Rueckert, D. Convolutional recurrent neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging 2018, 38, 280–290.
- Putzky, P.; Welling, M. Invert to learn to invert. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
- Bian, C.C.; Cao, N.; Mao, M.H. CSDL-Net: An iterative network based on compressed sensing and deep learning. Int. J. Imaging Syst. Technol. 2022, 32, 1511–1520.
- Chen, E.Z.; Wang, P.; Chen, X.; Chen, T.; Sun, S. Pyramid convolutional RNN for MRI image reconstruction. IEEE Trans. Med. Imaging 2022, 41, 2033–2047.
- Qiao, X.; Huang, Y.; Li, W. MEDL-Net: A model-based neural network for MRI reconstruction with enhanced deep learned regularizers. Magn. Reson. Med. 2023, 89, 2062–2075.
- Chatterjee, S.; Breitkopf, M.; Sarasaen, C.; Yassin, H.; Rose, G.; Nürnberger, A.; Speck, O. ReconResNet: Regularised residual learning for MR image reconstruction of undersampled Cartesian and radial data. Comput. Biol. Med. 2022, 143, 105321.
- Wang, Y.; Pang, Y.; Tong, C. DSMENet: Detail and structure mutually enhancing network for under-sampled MRI reconstruction. Comput. Biol. Med. 2023, 154, 106204.
- Huang, J.; Fang, Y.; Wu, Y.; Wu, H.; Gao, Z.; Li, Y.; Del Ser, J.; Xia, J.; Yang, G. Swin transformer for fast MRI. Neurocomputing 2022, 493, 281–304.
- Zhou, L.; Zhu, M.; Xiong, D.; Ouyang, L.; Ouyang, Y.; Chen, Z.; Zhang, X. RNLFNet: Residual non-local Fourier network for undersampled MRI reconstruction. Biomed. Signal Process. Control 2023, 83, 104632.
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1833–1844.
- Zhao, X.; Yang, T.; Li, B.; Zhang, X. SwinGAN: A dual-domain Swin Transformer-based generative adversarial network for MRI reconstruction. Comput. Biol. Med. 2023, 153, 106513.
- Feng, C.-M.; Yan, Y.; Fu, H.; Chen, L.; Xu, Y. Task transformer network for joint MRI reconstruction and super-resolution. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Part VI, pp. 307–317.
- Wang, W.; Shen, H.; Chen, J.; Xing, F. MHAN: Multi-Stage Hybrid Attention Network for MRI reconstruction and super-resolution. Comput. Biol. Med. 2023, 163, 107181.
- Wu, Z.; Liao, W.; Yan, C.; Zhao, M.; Liu, G.; Ma, N.; Li, X. Deep learning based MRI reconstruction with transformer. Comput. Methods Programs Biomed. 2023, 233, 107452.
- Zbontar, J.; Knoll, F.; Sriram, A.; Murrell, T.; Huang, Z.; Muckley, M.J.; Defazio, A.; Stern, R.; Johnson, P.; Bruno, M. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv 2018, arXiv:1811.08839.
- Knoll, F.; Zbontar, J.; Sriram, A.; Muckley, M.J.; Bruno, M.; Defazio, A.; Parente, M.; Geras, K.J.; Katsnelson, J.; Chandarana, H. fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning. Radiol. Artif. Intell. 2020, 2, e190007.
- Souza, R.; Lucena, O.; Garrafa, J.; Gobbi, D.; Saluzzi, M.; Appenzeller, S.; Rittner, L.; Frayne, R.; Lotufo, R. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement. NeuroImage 2018, 170, 482–494.
- Jin, K.H.; McCann, M.T.; Froustey, E.; Unser, M. Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 2017, 26, 4509–4522.
- Eo, T.; Jun, Y.; Kim, T.; Jang, J.; Lee, H.J.; Hwang, D. KIKI-net: Cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magn. Reson. Med. 2018, 80, 2188–2201.
- Schlemper, J.; Caballero, J.; Hajnal, J.V.; Price, A.N.; Rueckert, D. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging 2017, 37, 491–503.
- Hong, G.Q.; Wei, Y.T.; Morley, W.A.; Wan, M.; Mertens, A.J.; Su, Y.; Cheng, H.-L.M. Dual-domain accelerated MRI reconstruction using transformers with learning-based undersampling. Comput. Med. Imaging Graph. 2023, 106, 102206.
Figure 1. The overall architecture. (a) The Progressive Feature Reconstruction and Fusion Network consists of two main components: the Progressive Feature Reconstruction Module (L-Recon/M-Recon/H-Recon) and the Fusion Module (MOFT). The Progressive Feature Reconstruction Module generates five sets of different reconstruction outputs in sequence, which are then fed into the Fusion Module for adaptive fusion to produce high-quality MRI images. (b) Architecture of the Reconstruction Module (L-Recon/M-Recon/H-Recon). Depending on the order of the features being reconstructed, initial down-sampling is performed to varying dimensions, dynamically distributing different channel subsets to feature extractors of distinct frequencies (“↓” denotes downsampling, “↑” upsampling). (c) Architecture of the Fusion Module (MOFT). The reconstruction outputs from the Reconstruction Module are fused using a two-stage fusion framework, emphasizing and coordinating the salient characteristics of each reconstruction stage to achieve collaborative fusion of image features.
Figure 2. Reconstruction results and error maps compared among 8 algorithms under 4× acceleration on the FastMRI dataset.
Figure 3. Reconstruction results and error maps compared among 8 algorithms under 8× acceleration on the FastMRI dataset.
Figure 4. Reconstruction results and error maps compared among 8 algorithms under 4× acceleration on the Calgary-Campinas dataset.
Figure 5. Reconstruction results and error maps compared among 8 algorithms under 8× acceleration on the Calgary-Campinas dataset.
Figure 6. Effect of L-Recon (M-Recon/H-Recon) quantity on reconstruction quality.
Figure 7. Comparison of reconstruction metrics for different frequency-component allocation strategies.
Figure 8. Comparison of PSNR and model parameters on the FastMRI dataset (left) and the Calgary-Campinas dataset (right) with 8× accelerated acquisition for our algorithm and the comparative methods. Our algorithm achieves optimal reconstruction results with lower model complexity.
Table 1. Channel expansion and allocation in the different reconstruction stages during the initial down-sampling process.

| | L-Recon | M-Recon | H-Recon |
|---|---|---|---|
| Total number of channels (c) | 24 | 48 | 96 |
| High-freq components (c1) | 16 | 24 | 32 |
| Low-freq components (c2) | 8 | 24 | 64 |
Table 2. Quantitative performance of the tested methods on the FastMRI dataset.

| Algorithm | PSNR (4×) | SSIM (4×) | NMSE (4×) | PSNR (8×) | SSIM (8×) | NMSE (8×) |
|---|---|---|---|---|---|---|
| Unet | 31.91 | 0.800 | 0.0347 | 29.73 | 0.742 | 0.0502 |
| KIKI-Net | 31.89 | 0.796 | 0.0348 | 29.27 | 0.722 | 0.0542 |
| Cascade-net | 31.97 | 0.801 | 0.0336 | 29.98 | 0.744 | 0.0480 |
| CRNN-MRI | 32.27 | 0.805 | 0.0328 | 29.86 | 0.743 | 0.0492 |
| PD-Net | 32.15 | 0.804 | 0.0331 | 30.18 | 0.745 | 0.0467 |
| CSDL-Net | 32.29 | 0.806 | 0.0327 | 30.40 | 0.753 | 0.0448 |
| DuDReTLU-net | 31.97 | 0.803 | 0.0337 | 29.97 | 0.744 | 0.0480 |
| KTMR | 32.13 | 0.804 | 0.0331 | 30.25 | 0.750 | 0.0465 |
| Ours | 32.60 | 0.818 | 0.0313 | 31.02 | 0.771 | 0.0413 |
Table 3. Quantitative performance of the tested methods on the Calgary-Campinas dataset.

| Algorithm | PSNR (4×) | SSIM (4×) | NMSE (4×) | PSNR (8×) | SSIM (8×) | NMSE (8×) |
|---|---|---|---|---|---|---|
| Unet | 34.76 | 0.925 | 0.0142 | 31.90 | 0.876 | 0.0267 |
| KIKI-Net | 35.84 | 0.934 | 0.0129 | 31.69 | 0.869 | 0.0283 |
| Cascade-net | 36.23 | 0.939 | 0.0105 | 31.86 | 0.875 | 0.0265 |
| CRNN-MRI | 36.87 | 0.945 | 0.0090 | 32.55 | 0.881 | 0.0234 |
| PD-Net | 36.28 | 0.939 | 0.0105 | 32.03 | 0.870 | 0.0262 |
| CSDL-Net | 36.49 | 0.940 | 0.0098 | 32.65 | 0.883 | 0.0229 |
| DuDReTLU-net | 34.74 | 0.923 | 0.0144 | 31.54 | 0.867 | 0.0292 |
| KTMR | 35.96 | 0.936 | 0.0108 | 31.93 | 0.867 | 0.0268 |
| Ours | 37.68 | 0.954 | 0.0077 | 33.44 | 0.901 | 0.0195 |
Table 4. Performance analysis of the MOFT module.

| Fusion method | PSNR | SSIM | NMSE |
|---|---|---|---|
| Cat + 1 × 1 | 32.40 | 0.809 | 0.0328 |
| Refine Module | 32.52 | 0.815 | 0.0319 |
| MOFT | 32.60 | 0.818 | 0.0313 |
Table 5. Ablation study on progressive feature reconstruction (“√” indicates the block is retained; “×” indicates the block is removed).

| L-Recon | M-Recon | H-Recon | PSNR | SSIM | NMSE |
|---|---|---|---|---|---|
| √ | × | × | 32.10 | 0.808 | 0.0330 |
| √ | √ | × | 32.51 | 0.816 | 0.0318 |
| √ | × | √ | 32.46 | 0.815 | 0.0320 |
| √ | √ | √ | 32.60 | 0.818 | 0.0313 |