Article

A Multi-Branch Feature Extraction Residual Network for Lightweight Image Super-Resolution

Chunying Liu, Xujie Wan and Guangwei Gao
1 Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
2 Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2736; https://doi.org/10.3390/math12172736
Submission received: 6 July 2024 / Revised: 14 August 2024 / Accepted: 31 August 2024 / Published: 1 September 2024

Abstract

Single-image super-resolution (SISR) seeks to learn the mapping between low-resolution and high-resolution images. However, high-performance network models often entail a significant number of parameters and computations, which limits their practical applications. Therefore, prioritizing lightweight design and efficiency becomes crucial when applying image super-resolution (SR) to real-world scenarios. We propose a straightforward and efficient method, the Multi-Branch Feature Extraction Residual Network (MFERN), to tackle lightweight image SR through the fusion of multi-information self-calibration and multi-attention information. Specifically, we devise a Multi-Branch Residual Feature Fusion Module (MRFFM) that leverages a multi-branch residual structure to succinctly and effectively fuse multiple pieces of feature information. Within the MRFFM, we design the Multi-Scale Attention Feature Fusion Block (MAFFB) to adeptly extract features via convolution and self-calibration attention operations. Furthermore, we introduce a Dual Feature Calibration Block (DFCB) to dynamically fuse feature information using dynamic weight values derived from the upper and lower branches. Additionally, to overcome the limitation of convolution in extracting only local information, we incorporate a Transformer module to effectively integrate global information. The experimental results demonstrate that our MFERN achieves outstanding performance with a modest number of model parameters.

1. Introduction

Single-image super-resolution (SISR) poses a fundamental challenge in computer vision, seeking to enhance low-resolution (LR) input images to high-resolution (HR) ones. With the rise of deep neural networks, SISR has achieved remarkable success in generating high-quality outputs [1,2]. However, attaining such quality often entails the use of computationally intensive models, demanding substantial computing power, storage resources, and memory, consequently impeding both training and testing processes [3]. For instance, the RCAN [4] model, renowned for its performance in image SR tasks, comprises over 800 convolutional layers with approximately 15 million parameters, rendering it unsuitable for resource-constrained devices. Consequently, recent research has shifted towards developing lightweight SR network models to facilitate deployment on mobile devices.
Numerous methods utilize recursion techniques or parameter-sharing strategies to minimize the parameter count. For instance, the DRCN [5] and DRRN [6] utilized recursion mechanisms to explore lightweight model architectures. The CARN [39] employed weight sharing and group convolution to diminish the network parameters. The IMDN [7] leveraged residual feature extraction to enhance model efficiency. LCRCA [8] introduced a lightweight and efficient residual block that enhanced residual information within the same computational budget. The SFFN [9] advocated for the adoption of generic, lightweight, and efficient feature fusion blocks to replace conventional 1 × 1 convolutions.
Traditional convolutional neural networks (CNNs) excel in extracting local features, but they often struggle in capturing the global context. For high-quality image reproduction, it is crucial to understand the overall structure and relationships within an image to generate detailed results. CNNs are typically less effective than self-attention mechanisms, like Transformers, in handling long-range dependencies. This limitation can affect the detail and coherence of image reconstruction. In contrast, Transformers excel in integrating global information, which enhances the image clarity and consistency. As a result, integrating Transformers into computer vision tasks has gained significant attention [10,11]. For instance, SwinIR [12] leveraged global information extraction and sliding windows to address the edge uncorrelation issue in SISR. Consequently, the synergistic combination of CNNs and Transformers, along with the fusion of local feature extraction and detailed texture recovery, plays a pivotal role in image reconstruction endeavors [13,14]. Many current SR model architectures adopt a serial structure, facilitating the gradual extraction of more comprehensive feature information to enhance the image quality. In models that integrate CNNs and Transformers, like LBNet [14], the backbone is bifurcated into two components, the CNN and Transformer, which are then connected serially to harness the strengths of both modalities, resulting in notable performance gains. However, Transformers can be resource-intensive, requiring significant computing power and memory, which can lead to inefficiencies in lightweight SISR tasks.
A pure CNN lacks global context modeling ability, while simply adding a Transformer can introduce many parameters and computational challenges. To address this, we propose the Multi-Branch Feature Extraction Residual Network (MFERN), which combines CNNs and Transformers for effective yet lightweight image reconstruction. CNNs are strong at extracting local features, while Transformers capture the global context and long-term dependencies. By integrating the two, the MFERN extracts more comprehensive features. Additionally, our serial connection approach increases the flexibility in processing features across different scales, adapting to various visual tasks while keeping the model lightweight, without heavy memory and computational demands. We establish a sequential connection between the CNN module and the Transformer module using a tight coupling method to integrate both local and global features, thereby enhancing the model’s performance and generalization capabilities. Within the CNN component, we present the Multi-Branch Residual Feature Fusion Module (MRFFM), which mainly incorporates the Multi-Scale Attention Feature Fusion Block (MAFFB) and the Attention Feature Fusion Block (AFFB). These are integrated with a multi-residual-branch structure following multiple channel splitting operations. In the MAFFB, the Dual Feature Calibration Block (DFCB) dynamically combines feature information using combination coefficients and dynamic weights. Additionally, a Spatial Attention Calibration Block (SACB) and a Channel Attention Calibration Block (CACB) are included to effectively extract spatial and channel information, respectively. The MRFFM adeptly extracts and blends features of various scales. Furthermore, by reducing the number of channels and employing channel splitting, we ensure a lightweight model architecture without compromising performance. Within the AFFB, we merge the SACB and CACB to fuse features across different scales, introducing a multi-attention feature fusion approach.
The contributions of this paper are summarized as follows.
  • We introduce the Multi-Branch Residual Feature Fusion Module (MRFFM). This module uses distillation operations within a multi-residual branch structure to efficiently extract features while keeping the design lightweight. Two feature extraction modules are included to further enhance the feature extraction capabilities.
  • We design the Spatial Attention Calibration Block (SACB) and Channel Attention Calibration Block (CACB) to incorporate the attention mechanism into self-calibrated convolution. These blocks, combined with multi-information weighted fusion, enhance the model’s performance and generalization abilities.
  • We integrate a CNN with a Transformer to ensure the effective integration of local details and global information in the Multi-Branch Feature Extraction Residual Network (MFERN). Additionally, a dense connection structure is incorporated to enhance the reuse rate of feature information.

2. Related Work

2.1. SR Models

Given that the SRCNN [1], employing convolutional operations, outperforms earlier SR methods, research on image SR models using CNNs has gained significant attention. However, simple CNN networks often struggle to achieve high-quality reconstruction of image details [3,4,15]. Consequently, exploring the integration of Transformers into image vision tasks has emerged as a primary focus. Transformers excel in handling global information and can offer superior performance compared to CNNs [16] with lower FLOPs and parameter counts. Moreover, researchers have enhanced Transformers by expanding the receptive field [17], extracting diverse features [18], and accelerating inference through approaches such as eliminating layer normalization and simplifying the MHSA [19].

2.2. Lightweight SR Models

The research on lightweight SISR models aims to enable efficient image SR on mobile devices [20,21]. Existing lightweight methods include efficient model structure designs [7,22], pruning or quantization techniques [23], and knowledge distillation [24]. Additionally, researchers have reduced the model size through methods such as weight sharing and channel grouping. For instance, strategies like channel splitting and layered distillation in models such as the IDN [25] and IMDN [7] enhance feature extraction. Recursive cascade learning aids in learning cross-layer feature representations [26], while some models reuse middle-layer features through recursive learning. Despite significant exploration, several open issues in lightweight SISR models still require further investigation.

2.3. Attention Feature Fusion

Certain efforts aim to aggregate features of various dimensions across multiple visual tasks [27] to enhance the performance. In CNNs, researchers have integrated attentional mechanisms in the spatial and channel dimensions to enrich the feature expression, as seen in the SCA-CNN [28] and DANet [29]. In Transformers [30], spatial self-attention models the long-range dependencies between pixels. Additionally, some researchers have investigated the incorporation of channel attention into Transformers [31] to fuse spatial and channel information, thereby enhancing the modeling capabilities of Transformers.

3. Proposed Method

This section outlines the structure of our proposed Multi-Branch Feature Extraction Residual Network (MFERN). Initially, we introduce the sequential structure and the dense connection operation between the CNN components and the Transformer backbone network. Subsequently, we describe the Multi-Branch Residual Feature Fusion Module (MRFFM) within the CNN block, encompassing the Multi-Scale Attention Feature Fusion Block (MAFFB) and the Attention Feature Fusion Block (AFFB). Then, we introduce the Dual Feature Calibration Block (DFCB), Spatial Attention Calibration Block (SACB), and Channel Attention Calibration Block (CACB). Lastly, we present the specific characteristics of the Transformer.

3.1. Network Framework

As depicted in Figure 1, the Multi-Branch Feature Extraction Residual Network (MFERN) comprises a sequence of Multi-Branch Residual Feature Fusion Modules (MRFFM) and Transformer modules. We integrate the CNN with the Transformer module to effectively combine local and global feature information. This allows the model to better recover image texture details and reconstruct high-quality images. We denote the input LR image as $I_{LR}$, the model output as $I_{SR}$, and the HR image as $I_{HR}$. At the beginning of the model, a 3 × 3 convolutional layer is utilized to extract shallow information:
$$ F_{sf} = H_{sf}(I_{LR}), $$
where $H_{sf}$ represents a 3 × 3 convolutional layer and $F_{sf}$ denotes the extracted shallow features. Subsequently, $F_{sf}$ is forwarded to the CNN for local feature extraction. The network comprises four MRFFM modules, each consisting of three MAFFBs and one AFFB, enabling the extraction of additional local feature information through the multi-branch residual structure. The following formula expresses this portion of the CNN model operation:
$$ F_{CNN} = H_{CNN}(F_{sf}), $$
where $H_{CNN}$ represents the CNN-based local feature extraction and $F_{CNN}$ denotes its output. Once the local features of the image are obtained, they are sent to the Transformer module to extract global information:
$$ F_{Trans} = H_{Trans}(F_{CNN}), $$
where $H_{Trans}$ signifies the Transformer module, and $F_{Trans}$ denotes the features enhanced with global information. The reconstruction process can be expressed as
$$ I_{SR} = H_{build1}(F_{sf}) + H_{build2}(F_{Trans}), $$
where $I_{SR}$ denotes the ultimate output of the network, $H_{build1}$ represents the reconstruction module for $F_{sf}$, and $H_{build2}$ represents the reconstruction module for $F_{Trans}$. Each reconstruction module comprises a 3 × 3 convolutional layer and a pixel shuffle layer.
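To make the overall data flow concrete, the following PyTorch sketch wires together a shallow 3 × 3 convolution, a CNN body, a Transformer body, and the two reconstruction branches described above. It is a minimal skeleton rather than the authors' released implementation: the placeholder bodies (plain identities here), the channel width of 32, and the RGB output are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Reconstruction(nn.Module):
    """Reconstruction branch: a 3x3 convolution followed by pixel shuffle."""
    def __init__(self, channels, scale, out_channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, out_channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

class MFERNSketch(nn.Module):
    """Skeleton of the pipeline: shallow conv -> CNN body -> Transformer body -> dual reconstruction."""
    def __init__(self, channels=32, scale=4, cnn_body=None, trans_body=None):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)   # H_sf
        self.cnn_body = cnn_body or nn.Identity()             # stands in for the four MRFFMs (H_CNN)
        self.trans_body = trans_body or nn.Identity()         # stands in for the Transformer block (H_Trans)
        self.build1 = Reconstruction(channels, scale)         # H_build1, fed by the shallow features
        self.build2 = Reconstruction(channels, scale)         # H_build2, fed by the deep features

    def forward(self, lr):
        f_sf = self.shallow(lr)
        f_cnn = self.cnn_body(f_sf)
        f_trans = self.trans_body(f_cnn)
        return self.build1(f_sf) + self.build2(f_trans)       # I_SR

# Example: upscale a 48x48 LR patch by a factor of 4.
model = MFERNSketch(channels=32, scale=4)
print(model(torch.randn(1, 3, 48, 48)).shape)  # torch.Size([1, 3, 192, 192])
```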
To ensure a fair comparison of the experimental results, we adopt the $L_1$ loss function to optimize our model. For a training set $\{ I_{LR}^{i}, I_{HR}^{i} \}_{i=1}^{N}$ consisting of N images, the objective of the MFERN model is to minimize the following loss function:
$$ L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{MFERN}(I_{LR}^{i}, \Theta) - I_{HR}^{i} \right\|_{1}, $$
where $H_{MFERN}$ denotes the proposed MFERN network, $\| \cdot \|_{1}$ is the $L_1$ norm, and $\Theta$ indicates the parameter set of the MFERN.
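For illustration, a single training step under this $L_1$ objective might look like the following sketch; the function and variable names are ours, and the model is assumed to map an LR batch to an SR output of the same size as the HR batch.

```python
import torch.nn as nn

l1_loss = nn.L1Loss()  # mean absolute error, i.e. the per-pixel L1 norm averaged over the batch

def training_step(model, optimizer, lr_batch, hr_batch):
    """One optimization step minimizing || model(I_LR) - I_HR ||_1."""
    optimizer.zero_grad()
    sr_batch = model(lr_batch)
    loss = l1_loss(sr_batch, hr_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```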

3.2. CNN Backbone

In the CNN segment, we introduce the Multi-Branch Residual Feature Fusion Module (MRFFM) to extract local information. All four modules utilize parameter-sharing technology to maintain the model’s lightweight nature. As depicted in Figure 1, the shallow feature $F_{sf}$ sequentially traverses the four MRFFMs, and each layer’s features can be reused via skip connections. Deep neural networks with many layers may face challenges during training due to small gradients in backpropagation. Skip connections facilitate gradient flow by directly transmitting input information to subsequent layers, stabilizing the gradient and simplifying network training. The above processes can be expressed as
$$ F_{M\_1} = F_{C\_1}(H_{M\_1}(F_{sf})) + F_{sf}, $$
$$ F_{M\_2} = F_{C\_2}(H_{M\_2}(F_{M\_1})) + F_{M\_1} + F_{sf}, $$
$$ F_{M\_3} = F_{C\_3}(H_{M\_3}(F_{M\_2})) + F_{M\_2} + F_{M\_1} + F_{sf}, $$
$$ F_{CNN} = H_{M\_4}(F_{M\_3}) + F_{M\_3} + F_{M\_2} + F_{M\_1} + F_{sf}, $$
where $H_{M\_n}$ represents the n-th MRFFM, $F_{M\_n}$ denotes the feature information extracted by the n-th MRFFM, and $F_{C\_n}$ signifies the n-th convolution operation. $F_{CNN}$ represents the feature information extracted by the CNN framework.
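The dense skip pattern above can be written compactly as in the sketch below; the MRFFM blocks are injected placeholders, and the 1 × 1 fusion convolutions and channel width of 32 are assumptions rather than the exact implementation.

```python
import torch.nn as nn

class DenseCNNBackboneSketch(nn.Module):
    """Four MRFFM stages with dense skip connections, mirroring the formulas above.
    The MRFFM itself is injected as a placeholder module."""
    def __init__(self, channels=32, make_block=None):
        super().__init__()
        make_block = make_block or (lambda: nn.Identity())
        self.mrffm = nn.ModuleList([make_block() for _ in range(4)])
        # F_C_1..F_C_3: fusion convolutions applied after the first three stages (1x1 assumed).
        self.fuse = nn.ModuleList([nn.Conv2d(channels, channels, 1) for _ in range(3)])

    def forward(self, f_sf):
        f1 = self.fuse[0](self.mrffm[0](f_sf)) + f_sf
        f2 = self.fuse[1](self.mrffm[1](f1)) + f1 + f_sf
        f3 = self.fuse[2](self.mrffm[2](f2)) + f2 + f1 + f_sf
        return self.mrffm[3](f3) + f3 + f2 + f1 + f_sf   # F_CNN
```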
The MRFFM comprises two additional modules, as shown in Figure 2: the Multi-Scale Attention Feature Fusion Block (MAFFB) and the Attentional Feature Fusion Block (AFFB). The method involves distilling multiple pieces of feature information to extract and fuse residual branch features multiple times, enhancing its feature expression capabilities. Within the MRFFM branch, we introduce the MAFFB, illustrated in the figure. The MAFFB includes the Dual Feature Calibration Block (DFCB), the Spatial Attention Calibration Module (SACB), and the Channel Attention Calibration Module (CACB).
Dual Feature Calibration Block (DFCB): As illustrated in Figure 3, within the DFCB, the features produced by the upper and lower branches are initially weighted and merged using combination coefficients (CC) [32]. The CC structure, as depicted in Figure 4, employs an attention mechanism to generate weight coefficients for adaptive information selection. Subsequently, the combined features pass through the Enhanced Spatial Attention (ESA) module. This extracts spatial information again, after which the result is fed to two pooling layers simultaneously. Dynamic weight values are then derived after convolution and activation and multiplied with the branch features. Finally, the output is added back to the initial input. Through adaptive weights, dual pooling layers, and dynamic adaptive weight integration, the DFCB module excels in extracting valuable feature information efficiently. We denote the input of the DFCB as $X_{in}^{d}$, and the aforementioned process can be expressed as
$$ F_{out\_up\_1} = H_{conv}(X_{in}^{d}), $$
$$ F_{out\_down\_1} = H_{conv}(F_{out\_up\_1}), $$
$$ F_{out\_up\_2} = F_{out\_up\_1} + C_{i}^{u}(F_{out\_down\_1}), $$
$$ F_{out\_down\_2} = F_{out\_down\_1} + C_{i}^{d}(F_{out\_up\_1}), $$
$$ F_{ESA} = H_{ESA}(F_{out\_up\_2} + F_{out\_down\_2}), $$
$$ F_{mid\_concat} = \mathrm{Concat}(H_{avg}(F_{ESA}), H_{max}(F_{ESA})), $$
$$ w_1, w_2 = H_{split}(H_{sigmoid}(H_{conv}(F_{mid\_concat}))), $$
$$ F_{out}^{d} = w_1 \cdot F_{out\_up\_2} + w_2 \cdot F_{out\_down\_2} + X_{in}^{d}, $$
where $H_{conv}$ represents the channel down-dimensioning convolution and $H_{sigmoid}$ signifies the sigmoid activation function used for nonlinear processing. $F_{out\_up\_i}$ and $F_{out\_down\_i}$ (i = 1, 2) represent the output of layer i of the upper and lower branches, respectively. $H_{ESA}$ denotes the Enhanced Spatial Attention (ESA) operation, and $F_{ESA}$ represents its output. $C_{i}^{u}$ and $C_{i}^{d}$ represent the two combination-coefficient learning mechanisms connecting the upper and lower branches. $H_{avg}$ and $H_{max}$ represent the average pooling and max pooling functions. $F_{mid\_concat}$ represents the fused output of the upper- and lower-branch features that have passed through the two pooling layers. $w_1$ and $w_2$ stand for the dynamic weights of the two branches. $H_{split}$ expresses the channel split function, and $F_{out}^{d}$ represents the output of the DFCB unit.
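A rough PyTorch sketch of the DFCB is given below. The CC and ESA components are passed in as generic modules, and the use of global average/max pooling with a 1 × 1 convolution and a sigmoid to produce per-channel dynamic weights reflects our reading of the description above, so the details should be treated as assumptions.

```python
import torch
import torch.nn as nn

class DFCBSketch(nn.Module):
    """Sketch of the Dual Feature Calibration Block following the formulas above."""
    def __init__(self, channels, cc_up=None, cc_down=None, esa=None):
        super().__init__()
        self.conv_up = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_down = nn.Conv2d(channels, channels, 3, padding=1)
        self.cc_up = cc_up or nn.Identity()      # C^u: combination-coefficient learning
        self.cc_down = cc_down or nn.Identity()  # C^d
        self.esa = esa or nn.Identity()          # Enhanced Spatial Attention
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.weight_conv = nn.Conv2d(2 * channels, 2 * channels, 1)

    def forward(self, x):
        up1 = self.conv_up(x)
        down1 = self.conv_down(up1)
        up2 = up1 + self.cc_up(down1)            # cross-branch calibration
        down2 = down1 + self.cc_down(up1)
        f_esa = self.esa(up2 + down2)
        pooled = torch.cat([self.avgpool(f_esa), self.maxpool(f_esa)], dim=1)
        w = torch.sigmoid(self.weight_conv(pooled))
        w1, w2 = torch.chunk(w, 2, dim=1)        # dynamic weights for the two branches
        return w1 * up2 + w2 * down2 + x
```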
Multi-Scale Attention Feature Fusion Block (MAFFB): The MAFFB initially utilizes a 1 × 1 convolution to reduce the number of channels, followed by feature extraction through a Dual Feature Calibration Block (DFCB), as shown in Figure 2. The 1 × 1 convolution performs a linear combination of the pixels across the channels, maintaining the image’s spatial structure while modifying its depth, which is useful for dimensionality adjustment, whether reducing or expanding it. After the number of channels is restored by a second 1 × 1 convolution, two further DFCB operations are sequentially employed for feature extraction. The features from the second 1 × 1 convolution are processed through the upper and lower branches in a series of modules. Feature extraction first occurs in the upper branch’s first DFCB module, followed by output generation in the lower branch through convolution and the Spatial Attention Calibration Block (SACB). The fusion of the upper- and lower-branch features produces a spatial-level output. This output then undergoes processing via convolution and the Channel Attention Calibration Block (CACB) for channel-level feature information. The intermediate feature output is combined with the adaptive weights from the upper branch’s first DFCB module. The MAFFB module employs spatial and channel self-calibrating attention to merge information effectively at both levels, utilizing dynamic weights to boost the feature extraction efficiency and effectiveness. We denote the input of this module as $X_{in}^{m}$. This process can be represented as
$$ F_{out\_1} = H_{conv\_2}(H_{DFCB\_1}(H_{conv\_1}(X_{in}^{m}))), $$
$$ F_{out\_n} = H_{DFCB\_n}(F_{out\_{n-1}}), \quad n = 2, 3, $$
$$ F_{mid} = \eta_1 \cdot H_{SACB}(H_{conv\_3}(F_{out\_1})) + \eta_2 \cdot F_{out\_2}, $$
$$ F_{out}^{m} = \eta_3 \cdot H_{CACB}(H_{conv\_4}(F_{mid})) + \eta_4 \cdot F_{out\_3}, $$
where $H_{conv\_1}$ represents the channel down-dimensioning operation of the first 1 × 1 convolution, and $H_{conv\_2}$ represents the channel up-dimensioning operation of the second 1 × 1 convolution. $F_{out\_n}$ represents the output of the n-th DFCB, and $H_{DFCB\_n}$ denotes the operation of the n-th DFCB module. $F_{mid}$ represents the intermediate output of the MAFFB unit. $\eta_i$ (i = 1, 2, 3, 4) denotes the adaptive weighting multipliers applied to the branch outputs, and $F_{out}^{m}$ represents the output of the MAFFB unit.
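The branch wiring of the MAFFB can be sketched as follows; the DFCB, SACB, and CACB are injected as generic modules, and the channel reduction ratio of 2 and the learnable scalars standing in for $\eta_1$–$\eta_4$ are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MAFFBSketch(nn.Module):
    """Sketch of the Multi-Scale Attention Feature Fusion Block following the formulas above."""
    def __init__(self, channels, make_dfcb=None, make_sacb=None, make_cacb=None, reduction=2):
        super().__init__()
        identity = lambda c: nn.Identity()
        make_dfcb = make_dfcb or identity
        make_sacb = make_sacb or identity
        make_cacb = make_cacb or identity
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, 1)     # H_conv_1: channel down-dimensioning
        self.dfcb1 = make_dfcb(mid)
        self.expand = nn.Conv2d(mid, channels, 1)     # H_conv_2: channel up-dimensioning
        self.dfcb2 = make_dfcb(channels)
        self.dfcb3 = make_dfcb(channels)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.sacb = make_sacb(channels)
        self.conv4 = nn.Conv2d(channels, channels, 3, padding=1)
        self.cacb = make_cacb(channels)
        self.eta = nn.Parameter(torch.ones(4))        # adaptive weighting multipliers

    def forward(self, x):
        f1 = self.expand(self.dfcb1(self.reduce(x)))
        f2 = self.dfcb2(f1)
        f3 = self.dfcb3(f2)
        f_mid = self.eta[0] * self.sacb(self.conv3(f1)) + self.eta[1] * f2
        return self.eta[2] * self.cacb(self.conv4(f_mid)) + self.eta[3] * f3
```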
Channel (Spatial) Attention Calibration Block (CACB/SACB): In the CACB module, as shown in Figure 3, the raw input is processed through three branches. The first branch employs basic convolution to reduce the number of channels while preserving the original features. The second branch focuses on extracting spatial information through a combination of convolutional and activation functions. The third branch enhances the feature extraction by incorporating channel attention. Subsequently, the outputs from the three branches are weighted to achieve multi-scale feature fusion. The Spatial Attention Calibration Block (SACB) operates akin to the CACB, but with the channel attention (CA) module replaced by spatial attention (SA), enabling the extraction of more valuable spatial features. Spatial and channel attention mechanisms are integrated into self-calibrating convolution to dynamically establish relationships between each spatial position and its neighboring features. This improves the standard convolution layer’s performance by effectively broadening the receptive field of each spatial position without introducing additional parameters or escalating the model complexity. We denote the input of the unit as $X_{in}^{c}$. The process can be expressed as
$$ F_{out\_left} = H_{CA}(H_{relu}(H_{conv}(H_{split}(X_{in}^{c})))), $$
$$ F_{out\_right} = H_{conv}(H_{relu}(H_{conv}(H_{split}(X_{in}^{c})))), $$
$$ F_{out\_mid} = H_{conv}(X_{in}^{c}), $$
$$ F_{add} = \beta_1 \cdot F_{out\_left} + \beta_2 \cdot F_{out\_mid} + \beta_3 \cdot F_{out\_right}, $$
$$ F_{out}^{c} = H_{conv}(F_{add}) + X_{in}^{c}, $$
where $F_{out\_left}$, $F_{out\_right}$, and $F_{out\_mid}$ represent the outputs of the left, right, and middle branches, respectively. $H_{CA}$ refers to the channel attention operation. $H_{conv}$ within the branches represents channel down-dimensioning using a 1 × 1 convolution, while the final $H_{conv}$ denotes the channel up-dimensioning operation of the last 1 × 1 convolution layer. $H_{relu}$ signifies the Rectified Linear Unit (ReLU) activation function utilized for nonlinear processing. $F_{out}^{c}$ represents the ultimate output of the CACB unit. The symbols $\beta_i$ (i = 1, 2, 3) express the adaptive weighted multipliers applied to the outputs of the three branches within the CACB unit. The SACB operates similarly to the CACB, with the distinction that it replaces the channel attention mechanism with spatial attention to extract more beneficial spatial features.
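A simplified sketch of the CACB is shown below. The squeeze-and-excitation-style channel attention, the half-channel split (assuming an even channel count), and the kernel sizes are assumptions made to keep the three branch outputs shape-compatible; the SACB would replace the channel attention with a spatial attention module.

```python
import torch
import torch.nn as nn

class CACBSketch(nn.Module):
    """Sketch of the Channel Attention Calibration Block following the formulas above."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv_left = nn.Conv2d(half, half, 3, padding=1)
        self.ca = nn.Sequential(                       # H_CA: simple channel attention (SE-style, assumed)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(half, half, 1), nn.Sigmoid())
        self.conv_right1 = nn.Conv2d(half, half, 3, padding=1)
        self.conv_right2 = nn.Conv2d(half, half, 3, padding=1)
        self.conv_mid = nn.Conv2d(channels, half, 1)   # keeps the original features at reduced width
        self.relu = nn.ReLU(inplace=True)
        self.beta = nn.Parameter(torch.ones(3))        # adaptive branch weights
        self.conv_out = nn.Conv2d(half, channels, 1)   # channel up-dimensioning

    def forward(self, x):
        x_half, _ = torch.chunk(x, 2, dim=1)           # H_split
        left = self.relu(self.conv_left(x_half))
        left = left * self.ca(left)                    # calibrate with channel attention
        right = self.conv_right2(self.relu(self.conv_right1(x_half)))
        mid = self.conv_mid(x)
        f_add = self.beta[0] * left + self.beta[1] * mid + self.beta[2] * right
        return self.conv_out(f_add) + x
```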
Multi-Branch Residual Feature Fusion Module (MRFFM): In the MRFFM, as shown in Figure 2, the features undergo a sequence of three channel splitting operations, each resulting in two branches. One branch retains its features, while the other is forwarded to the subsequent layer for additional feature extraction via convolution and MAFFB operations. A convolutional layer is integrated into the distillation connection segment to augment the dimensionality of the split channels. Subsequently, the features preserved after each split are concatenated, fused, and then passed into the AFFB module. The AFFB combines elements from both the SACB and CACB, as depicted in the figure. Initially, the original input undergoes a channel splitting operation, after which the features extracted by the SACB and CACB modules are concatenated and fused with the original post-split features. Finally, the weighted output is added. In the MRFFM, layered features from various residual branches are combined to integrate shallow and deep image features comprehensively. This allows the model to concentrate effectively on important image features, increases the utilization of feature information, and enhances the restoration of intricate image details. We denote the input of the module as $F_{in}^{mr}$. The aforementioned operations can be expressed as
$$ F_{distilled\_1}, F_{remaining\_1} = H_{split\_1}(H_{MAFFB\_1}(H_{conv\_1}(F_{in}^{mr}))), $$
$$ F_{distilled\_2}, F_{remaining\_2} = H_{split\_2}(H_{MAFFB\_2}(H_{conv\_2}(F_{remaining\_1}))), $$
$$ F_{distilled\_3}, F_{remaining\_3} = H_{split\_3}(H_{MAFFB\_3}(H_{conv\_3}(F_{remaining\_2}))), $$
$$ F_{distilled\_4} = H_{conv\_4}(F_{remaining\_3}), $$
$$ F_{concat} = \mathrm{Concat}(F_{distilled\_1}, F_{distilled\_2}, F_{distilled\_3}, F_{distilled\_4}), $$
$$ F_{out}^{mr} = \lambda_1 \cdot H_{AFFB}(H_{conv\_5}(F_{concat})) + \lambda_2 \cdot F_{in}^{mr}. $$
Here, $F_{remaining\_n}$ represents the n-th remaining features, $F_{distilled\_n}$ denotes the n-th distilled features, $H_{MAFFB\_n}$ represents the n-th MAFFB unit, $H_{split\_n}$ expresses the n-th channel split function, and $F_{out}^{mr}$ represents the output of the MRFFM. “Concat” denotes the fusion of features along the channel dimension. $\lambda_i$ (i = 1, 2) indicates the adaptive weights applied when adding the output of the AFFB module and the input features. $H_{AFFB}$ represents the operation of the AFFB, and $F_{concat}$ represents the output of the Concat operation.
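The distillation pipeline of the MRFFM can be sketched as follows; the MAFFB and AFFB are injected placeholders, and the 1:1 split ratio and the channel widths of the fusion convolutions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MRFFMSketch(nn.Module):
    """Sketch of the Multi-Branch Residual Feature Fusion Module following the formulas above."""
    def __init__(self, channels, make_maffb=None, make_affb=None):
        super().__init__()
        make_maffb = make_maffb or (lambda c: nn.Identity())
        make_affb = make_affb or (lambda c: nn.Identity())
        half = channels // 2
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(half, channels, 3, padding=1)   # restores width after the split
        self.conv3 = nn.Conv2d(half, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(half, half, 3, padding=1)
        self.maffb = nn.ModuleList([make_maffb(channels) for _ in range(3)])
        self.fuse = nn.Conv2d(4 * half, channels, 1)            # H_conv_5 after concatenation
        self.affb = make_affb(channels)
        self.lam = nn.Parameter(torch.ones(2))                  # adaptive residual weights

    def _split(self, x):
        return torch.chunk(x, 2, dim=1)                         # (distilled, remaining)

    def forward(self, x):
        d1, r1 = self._split(self.maffb[0](self.conv1(x)))
        d2, r2 = self._split(self.maffb[1](self.conv2(r1)))
        d3, r3 = self._split(self.maffb[2](self.conv3(r2)))
        d4 = self.conv4(r3)
        fused = self.fuse(torch.cat([d1, d2, d3, d4], dim=1))
        return self.lam[0] * self.affb(fused) + self.lam[1] * x
```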

3.3. Transformer Backbone

The CNN component alone is insufficient in reconstructing high-quality images. Integrating local and global information is essential. Thus, a Transformer structure is incorporated to capture long-term image dependencies, and a recursive mechanism is introduced to leverage the Transformer’s performance benefits while ensuring minimal computational costs. Similar to ESRT, we initially pass the features through a linear layer, as shown in Figure 5, resulting in Q, K, and V values, which are subsequently split along the width and height dimensions. These are expressed as follows:
$$ Q_1 \dots Q_n,\; K_1 \dots K_n,\; V_1 \dots V_n = F_{Split}(Q, K, V). $$
Simultaneously, a feature reduction strategy is employed to further decrease the memory consumption. Each head of the multi-head attention (MHA) mechanism must execute a scaled dot product attention operation, followed by concatenating all outputs and applying a linear transformation to obtain the final output. The scaled dot product attention can be defined as
$$ O_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{Softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i, $$
$$ \mathrm{Attention}(Q, K, V) = \mathrm{Concat}(O_1, \dots, O_n), $$
where Q, K, and V refer to the query, key, and value matrices, respectively, $d_k$ is the key dimension, and $\mathrm{Softmax}$ represents the softmax function.
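A generic PyTorch sketch of this attention is shown below. Note that the paper splits Q, K, and V along the width and height dimensions, whereas this simplified version splits them into head groups along the channel dimension; the head count and embedding width are likewise assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitMHASketch(nn.Module):
    """Sketch of split multi-head attention: one linear layer yields Q, K, V, which are
    split into heads, run through scaled dot-product attention, and concatenated."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)   # final linear transformation after concatenation

    def forward(self, tokens):            # tokens: (B, N, dim)
        b, n, d = tokens.shape
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        # Split each of Q, K, V into `heads` groups (F_Split in the formulation above).
        split = lambda t: t.view(b, n, self.heads, d // self.heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        scale = (d // self.heads) ** -0.5
        attn = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)   # Softmax(Q K^T / sqrt(d_k))
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)           # concatenate the heads
        return self.proj(out)
```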

4. Experiments

4.1. Datasets and Evaluation Metrics

In this experiment, we utilized DIV2K as the training dataset. To thoroughly assess the performance, we employed five benchmark datasets to validate the effectiveness of the MFERN: Set5 [33], Set14 [34], Urban100 [35], BSDS100 [36], and Manga109 [37]. Furthermore, the PSNR and SSIM [38] served as evaluation metrics, computed on the Y channel of the YCbCr color space.
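For reference, a common way to compute the PSNR on the Y channel is sketched below; the BT.601 conversion coefficients and the border cropping (often set equal to the scale factor) follow the usual SISR evaluation convention rather than any code released with this paper.

```python
import numpy as np

def rgb_to_y(img):
    """Convert an RGB image in [0, 255] to the luminance (Y) channel (ITU-R BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr, border=4):
    """PSNR between SR and HR images on the Y channel, cropping `border` pixels on each side."""
    sr_y = rgb_to_y(sr.astype(np.float64))[border:-border, border:-border]
    hr_y = rgb_to_y(hr.astype(np.float64))[border:-border, border:-border]
    mse = np.mean((sr_y - hr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```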

4.2. Implementation Details

This study augmented the training dataset with random rotations at different angles and horizontal flips to improve the data diversity. The model underwent training for 1000 epochs, utilizing the PyTorch framework and the Adam optimizer for updates. The initial learning rate was configured as 5 × 10−4 and decayed to 6.25 × 10−6 following the cosine annealing strategy. Within the model architecture, both the CNN and Transformer modules use an input channel size of 32. All experiments were conducted on an NVIDIA RTX 2080Ti GPU and took about three days.
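In PyTorch, these optimizer and schedule settings could be set up roughly as follows; the stand-in model and the placement of scheduler.step() at the end of each epoch are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the MFERN model defined elsewhere

# Adam with an initial learning rate of 5e-4, decayed to 6.25e-6 by cosine annealing over 1000 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=6.25e-6)

for epoch in range(1000):
    # ... iterate over augmented DIV2K patches and call the L1 training step here ...
    scheduler.step()
```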

4.3. Comparison with State-of-the-Art

We conducted a comparative analysis of the MFERN against prominent lightweight image SR models across established benchmark datasets. These included VDSR [15], IDN [25], CARN [39], IMDN [7], MADNet [40], DCDN [20], SMSR [41], ECBSR [21], LAPAR-A [42], HPUN-M [43], GLADSR [44], LCRCA [8], DRSAN-48s [45], LatticeNet [32], AFAN-M [46], and FDSCSR-S [47]. The quantitative assessment for ×2, ×3, and ×4 image SR is presented in Table 1. From the table, it is evident that our MFERN model demonstrates exceptional performance while maintaining a modest number of model parameters. Notably, significant improvements are observed across the Set14, B100, and Manga109 datasets. It is also noteworthy that our model was trained on a single RTX 2080Ti GPU and has a minimal parameter count, which represents a key advantage.
Furthermore, we compared the MFERN’s visual effects with those of other models. In Figure 6, we select three images from the Set14 and Urban100 datasets, varying in resolution. These include both large and small images, focusing on areas with detailed textures. We compared our model’s reconstruction visually with the outputs from other advanced models, listing and comparing the PSNR and SSIM values for each image. Our findings show that the MFERN not only outperformed the other models in terms of the image quality per image but also excelled in restoring intricate details.
Examining the ×2 scale factor more closely, we selected Urban100 (×2): img_062, whose reconstruction contains detailed textures. Despite minor blurring, the MFERN significantly outperformed the other methods in restoring the image textures. Similarly, at the ×3 scale factor, we chose Urban100 (×3): img_048; the MFERN accurately restored the building’s exterior lines, enhancing the clarity. Notably, in Set14 (×4): barbara, focusing on prominent texture details such as the chair lines, the MFERN’s result closely resembled the HR image, while the other methods produced blurred results.

4.4. Ablation Studies

4.4.1. Effectiveness of MRFFM

To assess the efficacy of the MRFFM, we replaced the MRFFM module with the IMDB [7] and RFDB [49] modules, respectively. To ensure experimental integrity, all model parameters were adjusted to approximately 700K, and each model was trained at the ×4 scale and evaluated on the Manga109 test dataset. The results, presented in Table 2, indicate that our designed MRFFM module outperformed the other two modules within the same framework and with similar parameters. While the MRFFM has slightly higher computational demands compared to the other modules, the performance gain far outweighs this increase. Thus, the effectiveness of the MRFFM module is convincingly demonstrated.

4.4.2. AFFB Validity

To assess the validity of the AFFB module, we excluded it from the MRFFM module, trained the model for ×4 image SR, and evaluated it on the BSD100 dataset. The test results are presented in Table 3, demonstrating that the AFFB module that we designed consistently enhances the network model’s performance.

4.4.3. Effectiveness of MAFFB

Subsequently, we assessed the effectiveness of each module within the MAFFB through ablation experiments. Initially, we conducted an ablation experiment to evaluate the efficacy of the MAFFB itself. The data in Table 3, obtained after directly removing the MAFFB from the MRFFM module, illustrate a significant decrease in model performance, affirming its importance.
Additionally, we individually evaluated the effectiveness of the DFCB, SACB, and CACB modules within the MAFFB. These modules were removed one by one, and the model was trained at the ×4 scale factor and evaluated on the Urban100 dataset. The results are summarized in Table 4, indicating a notable decline in model performance upon the removal of these modules. This underscores the effectiveness of our designed modules.

4.4.4. Effectiveness of DFCB

To validate the effectiveness of the DFCB module structure that we designed, we conducted experiments by removing the CC structure and the ESA module from the original DFCB module individually. These experiments were performed for ×4 image SR and verified on the Set14 dataset. The results are presented in Table 5, demonstrating the rationality of our DFCB module design.

4.4.5. Dense Connection (DC)

We conducted an ablation experiment on the dense connection (DC) part of the overall structure, i.e., we removed the DC structure from the CNN part, from the Transformer part, and from the entire network architecture, respectively, and verified its validity on the Urban100 dataset. As shown in Table 6, in the three experiments, neither the number of model parameters nor the amount of computation changed significantly with or without the DC structure. However, after the DC structure was added to both parts, the model performance improved significantly, which verifies the validity of our model structure.

4.4.6. Comparison with Some Transformer-Based Methods

We benchmarked the MFERN against several existing Transformer-based approaches, namely SwinIR [12], ESRT [13], and LBNet [14]. As shown in Table 7, the proposed MFERN does not exceed SwinIR [12] in performance. However, whereas SwinIR additionally used the Flickr2K dataset for training, we trained our model only on DIV2K, and our model is much lighter, achieving a good balance between performance and efficiency.

4.5. Model Complexity Studies

We comprehensively compared the model’s complexity, including the number of parameters and the execution time, with that of existing methods on the Set5 test set. Figure 7 shows that our MFERN achieves a good balance, with a small number of parameters and a fast execution time, demonstrating the efficiency and effectiveness of our model.

5. Conclusions

In this study, we introduce the Multi-Branch Feature Extraction Residual Network (MFERN) tailored to efficient image SR tasks. The CNN component of the MFERN comprises four Multi-Branch Residual Feature Fusion Modules (MRFFM), which optimize the model parameters through parameter sharing. Additionally, we employ a dense connection methodology to amalgamate information across all layers, enhancing the effective utilization of the feature data. The MRFFM integrates a multi-branch residual structure to maintain model efficiency while ensuring robust feature extraction. Moreover, we introduce the Multi-Scale Attention Feature Fusion Block (MAFFB) and Attentional Feature Fusion Block (AFFB), leveraging attention mechanisms and dynamic weight operations to extract valuable information and enhance the model performance through output fusion. Concurrently, by serially integrating CNN and Transformer modules, the MFERN adeptly addresses both local details and global features, thereby boosting the model performance. By effectively balancing the model size and performance, the MFERN efficiently tackles image SR tasks. Meanwhile, it is necessary to explore more effective ways to combine the strengths of CNNs and Transformers. Additionally, our future goal will be to develop even lighter models without compromising the performance.

Author Contributions

Methodology, C.L.; Writing—original draft, C.L. and X.W.; Writing—review & editing, G.G.; Supervision, G.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Fund Project of Provincial Key Laboratory for Computer Information Processing Technology (Soochow University) under Grant KJS2274.

Data Availability Statement

The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher. The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  2. Li, W.; Wang, M.; Zhang, K.; Li, J.; Li, X.; Zhang, Y.; Gao, G.; Deng, W.; Lin, C.W. Survey on Deep Face Restoration: From Non-blind to Blind and Beyond. arXiv 2023, arXiv:2309.15490. [Google Scholar]
  3. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  4. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  5. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  6. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  7. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
  8. Peng, C.; Shu, P.; Huang, X.; Fu, Z.; Li, X. LCRCA: Image super-resolution using lightweight concatenated residual channel attention networks. Appl. Intell. 2022, 52, 10045–10059. [Google Scholar] [CrossRef]
  9. Wang, Z.; Liu, Y.; Zhu, R.; Yang, W.; Liao, Q. Lightweight single image super-resolution with similar feature fusion block. IEEE Access 2022, 10, 30974–30981. [Google Scholar] [CrossRef]
  10. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
  11. Yu, R.; Du, D.; LaLonde, R.; Davila, D.; Funk, C.; Hoogs, A.; Clipp, B. Cascade transformers for end-to-end person search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7267–7276. [Google Scholar]
  12. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  13. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 457–466. [Google Scholar]
  14. Gao, G.; Wang, Z.; Li, J.; Li, W.; Yu, Y.; Zeng, T. Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer. In Proceedings of the International Joint Conference on Artificial Intelligence, Messe Wien, Vienna, 23–29 July 2022; pp. 661–669. [Google Scholar]
  15. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  16. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  17. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
  18. Li, A.; Zhang, L.; Liu, Y.; Zhu, C. Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 12514–12524. [Google Scholar]
  19. Liu, Y.; Dong, H.; Liang, B.; Liu, S.; Dong, Q.; Chen, K.; Chen, F.; Fu, L.; Wang, F. Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 7952–7960. [Google Scholar]
  20. Li, Y.; Cao, J.; Li, Z.; Oh, S.; Komuro, N. Lightweight single image super-resolution with dense connection distillation network. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–17. [Google Scholar] [CrossRef]
  21. Zhang, X.; Zeng, H.; Zhang, L. Edge-oriented convolution block for real-time super resolution on mobile devices. In Proceedings of the ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4034–4043. [Google Scholar]
  22. Gao, G.; Li, W.; Li, J.; Wu, F.; Lu, H.; Yu, Y. Feature distillation interaction weighting network for lightweight image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2022; Volume 36, pp. 661–669. [Google Scholar]
  23. Li, H.; Yan, C.; Lin, S.; Zheng, X.; Zhang, B.; Yang, F.; Ji, R. Pams: Quantized super-resolution via parameterized max scale. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 564–580. [Google Scholar]
  24. Lee, W.; Lee, J.; Kim, D.; Ham, B. Learning with privileged information for efficient image super-resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 465–482. [Google Scholar]
  25. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 723–731. [Google Scholar]
  26. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155. [Google Scholar]
  27. Li, W.; Li, J.; Gao, G.; Deng, W.; Yang, J.; Qi, G.J.; Lin, C.W. Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network. arXiv 2022, arXiv:2212.14181. [Google Scholar]
  28. Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.S. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5659–5667. [Google Scholar]
  29. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  31. Li, W.; Guo, H.; Liu, X.; Liang, K.; Hu, J.; Ma, Z.; Guo, J. Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network. In Proceedings of the ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024. [Google Scholar]
  32. Luo, X.; Qu, Y.; Xie, Y.; Zhang, Y.; Li, C.; Fu, Y. Lattice network for lightweight image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4826–4842. [Google Scholar] [CrossRef] [PubMed]
  33. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012; pp. 135.1–135.10. [Google Scholar]
  34. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Oslo, Norway, 3–28 June 2012; pp. 711–730. [Google Scholar]
  35. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  36. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
  37. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
  38. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  39. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268. [Google Scholar]
  40. Lan, R.; Sun, L.; Liu, Z.; Lu, H.; Pang, C.; Luo, X. MADNet: A fast and lightweight network for single-image super resolution. IEEE Trans. Cybern. 2020, 51, 1443–1453. [Google Scholar] [CrossRef] [PubMed]
  41. Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Lin, Z.; An, W.; Guo, Y. Exploring sparsity in image super-resolution for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4917–4926. [Google Scholar]
  42. Li, W.; Zhou, K.; Qi, L.; Jiang, N.; Lu, J.; Jia, J. Lapar: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Adv. Neural Inf. Process. Syst. 2020, 33, 20343–20355. [Google Scholar]
  43. Sun, B.; Zhang, Y.; Jiang, S.; Fu, Y. Hybrid pixel-unshuffled network for lightweight image super-resolution. arXiv 2022, arXiv:2203.08921. [Google Scholar] [CrossRef]
  44. Zhang, X.; Gao, P.; Liu, S.; Zhao, K.; Li, G.; Yin, L.; Chen, C.W. Accurate and efficient image super-resolution via global-local adjusting dense network. IEEE Trans. Multimed. 2021, 23, 1924–1937. [Google Scholar] [CrossRef]
  45. Park, K.; Soh, J.W.; Cho, N.I. A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution. IEEE Trans. Multimed. 2023, 25, 907–918. [Google Scholar] [CrossRef]
  46. Wang, L.; Li, K.; Tang, J.; Liang, Y. Image super-resolution via lightweight attention-directed feature aggregation network. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–23. [Google Scholar] [CrossRef]
  47. Wang, Z.; Gao, G.; Li, J.; Yan, H.; Zheng, H.; Lu, H. Lightweight feature de-redundancy and self-calibration network for efficient image super-resolution. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–15. [Google Scholar] [CrossRef]
  48. Wang, C.; Li, Z.; Shi, J. Lightweight image super-resolution with adaptive weighted learning network. arXiv 2019, arXiv:1904.02358. [Google Scholar]
  49. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the European Conference on Computer Vision Workshops, Glasgow, UK, 23–28 August 2020; pp. 41–55. [Google Scholar]
Figure 1. The architecture of the Multi-Branch Feature Extraction Residual Network (MFERN).
Figure 2. The architecture of the proposed Multi-Branch Residual Feature Fusion Module (MRFFM) and its components: the Multi-Scale Attention Feature Fusion Block (MAFFB) and the Attention Feature Fusion Block (AFFB).
Figure 3. The architecture of the Spatial Attention Calibration Block (SACB), Channel Attention Calibration Block (CACB), and Dual Feature Calibration Block (DFCB) units. In the DFCB, $C_{i}^{u}$ and $C_{i}^{d}$ denote the combination coefficient (CC) learning, which is elaborated in Figure 4.
Figure 4. The processes of combination coefficient (CC) learning and enhanced spatial attention (ESA).
Figure 5. The architecture of the Transformer block (Trans).
Figure 6. Visual comparisons of MFERN with AWSRN-M [48], CARN [39], CARN-M [39], FDIWN [22], MADNET [40], IMDN [7], and VDSR [15] on Set14 and Urban100 datasets.
Figure 7. Model execution time study on the Set5 dataset (×4).
Table 1. Average PSNR/SSIM values for scales ×2, ×3, and ×4 on the Set5, Set14, BSD100, Urban100, and Manga109 datasets. The best and second-best indexes are highlighted in bold and underlined.

| Method | Scale | Params | Multi-Adds | Set5 PSNR/SSIM | Set14 PSNR/SSIM | BSD100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IDN [25] | ×2 | 553K | 124.6G | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196 | 38.01/0.9749 |
| CARN [39] | ×2 | 1592K | 222.8G | 37.76/0.9590 | 33.52/0.9166 | 32.09/0.8978 | 31.92/0.9256 | 38.32/0.9765 |
| IMDN [7] | ×2 | 694K | 158.8G | 38.00/0.9605 | 33.63/0.9177 | 32.19/0.8996 | 32.17/0.9283 | 38.88/0.9774 |
| MADNet [40] | ×2 | 878K | 187.1G | 37.85/0.9600 | 33.38/0.9161 | 32.04/0.8979 | 31.62/0.9233 | - |
| DCDN [20] | ×2 | 756K | - | 38.01/0.9606 | 33.52/0.9166 | 32.17/0.8996 | 32.16/0.9283 | 38.70/0.9773 |
| SMSR [41] | ×2 | 985K | 351.5G | 38.00/0.9601 | 33.64/0.9179 | 32.17/0.8990 | 32.19/0.9284 | 38.76/0.9771 |
| ECBSR [21] | ×2 | 596K | 137.3G | 37.90/0.9615 | 33.34/0.9178 | 32.10/0.9018 | 31.71/0.9250 | - |
| LAPAR-A [42] | ×2 | 548K | 171.0G | 38.01/0.9605 | 33.62/0.9183 | 32.19/0.8999 | 32.10/0.9283 | 38.67/0.9772 |
| HPUN-M [43] | ×2 | 492K | 106.2G | 38.03/0.9604 | 33.60/0.9185 | 32.20/0.9000 | 32.09/0.9282 | 38.83/0.9775 |
| GLADSR [44] | ×2 | 812K | 187.2G | 37.99/0.9608 | 33.63/0.9179 | 32.16/0.8996 | 32.16/0.9283 | - |
| LCRCA [8] | ×2 | 813K | 186.0G | 38.05/0.9607 | 33.65/0.9181 | 32.17/0.8994 | 32.19/0.9285 | - |
| DRSAN-48s [45] | ×2 | 650K | 150.0G | 38.08/0.9609 | 33.62/0.9175 | 32.19/0.9002 | 32.16/0.9286 | - |
| LatticeNet [32] | ×2 | 756K | 169.5G | 38.06/0.9607 | 33.70/0.9187 | 32.20/0.8999 | 32.25/0.9288 | - |
| AFAN-M [46] | ×2 | 682K | 163.4G | 37.99/0.9605 | 33.57/0.9175 | 32.14/0.8994 | 32.08/0.9277 | 38.58/0.9769 |
| FDSCSR-S [47] | ×2 | 466K | 121.8G | 38.02/0.9606 | 33.51/0.9174 | 32.18/0.8996 | 32.24/0.9288 | 38.67/0.9771 |
| MFERN (Ours) | ×2 | 691K | 173.2G | 38.05/0.9609 | 33.67/0.9181 | 32.19/0.8998 | 32.27/0.9295 | 38.86/0.9777 |
| IDN [25] | ×3 | 553K | 56.3G | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | 32.71/0.9381 |
| CARN [39] | ×3 | 1592K | 118.8G | 34.29/0.9255 | 30.29/0.8407 | 29.06/0.8493 | 28.06/0.8493 | 33.43/0.9427 |
| IMDN [7] | ×3 | 703K | 71.5G | 34.36/0.9270 | 30.32/0.8417 | 29.09/0.8046 | 28.17/0.8519 | 33.61/0.9445 |
| MADNet [40] | ×3 | 930K | 88.4G | 34.16/0.9253 | 30.21/0.8398 | 28.98/0.8023 | 27.77/0.8439 | - |
| DCDN [20] | ×3 | 765K | - | 34.41/0.9273 | 30.31/0.8417 | 29.08/0.8045 | 28.17/0.8520 | 33.54/0.9441 |
| SMSR [41] | ×3 | 993K | 156.8G | 34.40/0.9270 | 30.33/0.8412 | 29.10/0.8050 | 28.25/0.8536 | 33.68/0.9445 |
| LAPAR-A [42] | ×3 | 594K | 114.0G | 34.36/0.9267 | 30.34/0.8421 | 29.11/0.8054 | 28.15/0.8523 | 33.51/0.9441 |
| HPUN-M [43] | ×3 | 500K | 48.1G | 34.39/0.9269 | 30.33/0.8420 | 29.11/0.8052 | 28.06/0.8508 | 33.54/0.9441 |
| GLADSR [44] | ×3 | 821K | 88.2G | 34.41/0.9272 | 30.37/0.8418 | 29.08/0.8050 | 28.24/0.8537 | - |
| LCRCA [8] | ×3 | 822K | 83.6G | 34.40/0.9269 | 30.36/0.8422 | 29.09/0.8049 | 28.21/0.8532 | - |
| DRSAN-48s [45] | ×3 | 750K | 78.0G | 34.47/0.9274 | 30.35/0.8422 | 29.11/0.8060 | 28.26/0.8542 | - |
| LatticeNet [32] | ×3 | 765K | 76.3G | 34.40/0.9272 | 30.32/0.8416 | 29.10/0.8049 | 28.19/0.8513 | - |
| AFAN-M [46] | ×3 | 681K | 80.8G | 34.35/0.9263 | 30.31/0.8423 | 29.06/0.8053 | 28.11/0.8522 | 33.44/0.9440 |
| FDSCSR-S [47] | ×3 | 471K | 54.6G | 34.42/0.9274 | 33.37/0.8429 | 29.10/0.8052 | 28.20/0.8532 | 33.55/0.9443 |
| MFERN (Ours) | ×3 | 691K | 76.8G | 34.43/0.9276 | 30.36/0.8422 | 29.11/0.8058 | 28.27/0.8542 | 33.84/0.9460 |
| IDN [25] | ×4 | 553K | 32.3G | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | 29.41/0.8942 |
| CARN [39] | ×4 | 1592K | 90.9G | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | 30.42/0.9070 |
| IMDN [7] | ×4 | 715K | 40.9G | 32.20/0.8948 | 28.58/0.7811 | 27.56/0.7353 | 26.04/0.7838 | 30.45/0.9075 |
| MADNet [40] | ×4 | 1002K | 54.1G | 31.95/0.8917 | 28.44/0.7780 | 27.47/0.7327 | 25.76/0.7746 | - |
| DCDN [20] | ×4 | 777K | - | 32.21/0.8949 | 28.57/0.7807 | 27.55/0.7356 | 26.09/0.7855 | 30.41/0.9072 |
| SMSR [41] | ×4 | 1006K | 89.1G | 32.12/0.8932 | 28.55/0.7808 | 27.55/0.7351 | 26.11/0.7868 | 30.54/0.9085 |
| ECBSR [21] | ×4 | 603K | 34.7G | 31.92/0.8946 | 28.34/0.7817 | 27.48/0.7393 | 25.81/0.7773 | - |
| LAPAR-A [42] | ×4 | 659K | 94.0G | 32.15/0.8944 | 28.61/0.7818 | 27.61/0.7366 | 26.14/0.7871 | 30.42/0.9074 |
| HPUN-M [43] | ×4 | 511K | 27.7G | 32.19/0.8946 | 28.61/0.7818 | 27.58/0.7364 | 26.04/0.7851 | 30.49/0.9078 |
| GLADSR [44] | ×4 | 826K | 52.6G | 32.14/0.8940 | 28.62/0.7813 | 27.59/0.7361 | 26.12/0.7851 | - |
| LCRCA [8] | ×4 | 834K | 47.7G | 32.20/0.8948 | 28.60/0.7807 | 27.57/0.7653 | 26.10/0.7851 | - |
| DRSAN-48s [45] | ×4 | 730K | 57.6G | 32.25/0.8945 | 28.55/0.7817 | 27.59/0.7374 | 26.14/0.7875 | - |
| LatticeNet [32] | ×4 | 777K | 43.6G | 32.18/0.8943 | 28.61/0.7812 | 27.57/0.7355 | 26.14/0.7844 | - |
| AFAN-M [46] | ×4 | 692K | 50.9G | 32.18/0.8939 | 28.62/0.7826 | 27.58/0.7373 | 26.13/0.7876 | 30.45/0.9085 |
| FDSCSR-S [47] | ×4 | 478K | 31.1G | 32.25/0.8959 | 28.61/0.7821 | 27.58/0.7367 | 26.12/0.7866 | 30.51/0.9087 |
| MFERN (Ours) | ×4 | 691K | 43.3G | 32.25/0.8958 | 28.70/0.7837 | 27.63/0.7385 | 26.31/0.7921 | 30.77/0.9113 |
Table 2. Performance comparison of MRFFM with other basic modules on Manga109 dataset.

| Scale | Method | Params | Multi-Adds | PSNR/SSIM |
| --- | --- | --- | --- | --- |
| ×4 | MFERN + IMDB [7] | 700.7K | 12.7G | 30.27/0.9054 |
| ×4 | MFERN + RFDB [49] | 679.6K | 12G | 30.52/0.9083 |
| ×4 | MFERN + MRFFM (Ours) | 691.4K | 43.3G | 30.77/0.9113 |
Table 3. Study of different units in MRFFM on BSD100 dataset.

| Scale | MAFFB | AFFB | Params | Multi-Adds | PSNR/SSIM |
| --- | --- | --- | --- | --- | --- |
| ×4 |  |  | 503.4K | 4.8G | 27.50/0.7343 |
| ×4 |  |  | 681.8K | 41.1G | 27.55/0.7363 |
| ×4 |  |  | 691.4K | 43.3G | 27.63/0.7385 |
Table 4. Study of different units in MAFFB on Urban100 dataset.

| Scale | DFCB | SACB | CACB | Params | Multi-Adds | PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- |
| ×4 |  |  |  | 540.2K | 13.1G | 25.99/0.7815 |
| ×4 |  |  |  | 678.1K | 40.2G | 26.27/0.7911 |
| ×4 |  |  |  | 677.5K | 40.3G | 26.17/0.7875 |
| ×4 |  |  |  | 691.4K | 43.3G | 26.31/0.7921 |
Table 5. Study of different units in DFCB on Set14 dataset.

| Scale | CC | ESA | Params | Multi-Adds | PSNR/SSIM |
| --- | --- | --- | --- | --- | --- |
| ×4 |  |  | 686.9K | 43.3G | 28.67/0.7834 |
| ×4 |  |  | 671.4K | 42.1G | 28.60/0.7814 |
| ×4 |  |  | 691.4K | 43.3G | 28.70/0.7939 |
Table 6. Study of DC on Urban100 dataset.

| Scale | CNN-DC | Transformer-DC | Params | Multi-Adds | PSNR/SSIM |
| --- | --- | --- | --- | --- | --- |
| ×4 |  |  | 691.4K | 43.3G | 26.18/0.7872 |
| ×4 |  |  | 691.4K | 43.3G | 26.29/0.7911 |
| ×4 |  |  | 691.4K | 43.3G | 26.16/0.7869 |
| ×4 |  |  | 691.4K | 43.3G | 26.37/0.8964 |
Table 7. Comparison with some Transformer-based methods (×4).

| Method | Params | Multi-Adds | Set5 PSNR/SSIM | Set14 PSNR/SSIM | BSD100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM | Average PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SwinIR [12] | 897K | 49.6G | 32.44/0.8976 | 28.77/0.7858 | 27.69/0.7406 | 26.47/0.7980 | 30.92/0.9151 | 29.26/0.8274 |
| ESRT [13] | 751K | 67.7G | 32.19/0.8947 | 28.69/0.7833 | 27.69/0.7379 | 26.39/0.7962 | 30.75/0.9100 | 29.14/0.8244 |
| LBNet [14] | 742K | 38.9G | 32.29/0.8960 | 28.68/0.7832 | 27.62/0.7382 | 26.27/0.7906 | 30.76/0.9111 | 29.12/0.8238 |
| MFERN (Ours) | 691K | 43.3G | 32.25/0.8958 | 28.70/0.7837 | 27.63/0.7385 | 26.31/0.7921 | 30.77/0.9113 | 29.15/0.8243 |