MSNet: A Multi-Stream Fusion Network for Remote Sensing Spatiotemporal Fusion Based on Transformer and Convolution
Abstract
1. Introduction
2. Related Works
- We use the convolutional neural network Extract Net to establish the mapping between input and output, extracting the abundant temporal information and spatial details contained in the coarse and fine images, and employing receptive fields of different sizes to learn features from inputs of different sizes.
- Because the time-varying information is extracted twice, we first adopt a weighting strategy to combine the features extracted by the Transformer encoder and by Extract Net, avoiding the noise that direct addition would introduce, and then perform the subsequent fusion.
- The two intermediate predictions are already very similar to the final reconstruction, and an overly complex fusion of them would add new noise to regions that are already noisy. We therefore use an average weighting strategy for the final reconstruction (a minimal sketch of this averaging is given after this list).
- To verify the capability of our model, we tested it on three datasets and achieved the best overall results; it is also more robust than the methods compared in the experiments.
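To make the final averaging step explicit, a minimal sketch is given below. It assumes the two intermediate predictions of the fine image (one reconstructed from each prior date) are already available as NumPy arrays; the function and variable names are illustrative placeholders, not names from the paper.

```python
import numpy as np

def average_weighting(pred_from_prior_1, pred_from_prior_2):
    """Fuse the two intermediate predictions of the target fine image by simple
    averaging; a more elaborate fusion would risk amplifying noise in regions
    that are already noisy."""
    return 0.5 * (pred_from_prior_1 + pred_from_prior_2)
```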
3. Methods
3.1. MSNet Architecture
- The Transformer encoder module, which extracts time-varying features and learns global temporal correlation information.
- Extract Net, which establishes a non-linear mapping between input and output and simultaneously extracts temporal information and spatial details from the MODIS and Landsat images.
- Average weighting, which fuses the two intermediate prediction maps obtained from different prior dates into the final prediction map.
- First, we subtract the MODIS image at the prior date from the MODIS image at the prediction date to obtain a difference image, which represents the areas that changed over that period and provides temporal change information, while the Landsat image at the prior date provides spatial detail information. We then feed the difference image into the Transformer encoder module and the Extract Net module, respectively, to extract the temporal change information, learn global temporal correlation information, and extract temporal and spatial features of the MODIS images.
- Secondly, because the input MODIS image is smaller than the Landsat image, we up-sample the extracted feature layer sixteen-fold with bilinear interpolation so that it matches the size of the Landsat feature layer and can be fused with it. Because some of the information extracted and learned by the two modules overlaps, we use a weighting strategy during fusion: the features extracted by the Transformer encoder are assigned the weight W and those extracted by Extract Net the complementary weight (1 − W), yielding a fused feature layer.
- At the same time, we feed the Landsat image at the prior date into Extract Net to extract spatial detail information, and the resulting feature layer is added to the result of the second step to obtain a fused feature layer.
- As the network deepens, the temporal change information and spatial details of the input images are gradually lost. Inspired by the residual connections of ResNet [26], DenseNet [27], and STTFN [23], we add global residual learning to supplement this potentially lost information: the difference image obtained in the first step is up-sampled with bilinear interpolation and added to the prior-date Landsat image to form a residual learning block. Finally, we add this residual block to the result of the third step to obtain the intermediate prediction of the fine image at the target date from the given prior date. A minimal sketch of this workflow is given after this list.
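The following PyTorch-style sketch summarizes one pass of this workflow, producing the intermediate prediction associated with a single prior date. The module and variable names (transformer_enc, extract_net_small, extract_net_large, w) are our own placeholders and, for simplicity, every module is assumed to output a map with the same number of bands as the input images; the actual layer configurations are given in Sections 3.2 and 3.3.

```python
import torch
import torch.nn.functional as F

def up16(x):
    # 16-fold bilinear up-sampling from the MODIS grid to the Landsat grid.
    return F.interpolate(x, scale_factor=16, mode="bilinear", align_corners=False)

def predict_from_prior(m_prior, m_target, l_prior,
                       transformer_enc, extract_net_small, extract_net_large, w):
    """One intermediate prediction of the fine image at the target date."""
    # Step 1: difference of the two coarse (MODIS) images = temporal-change information.
    diff = m_target - m_prior

    # Step 2: extract temporal-change features with both modules, up-sample them to the
    # Landsat size, and fuse them with the weighting strategy (weight w for the
    # Transformer branch, 1 - w for the convolutional branch).
    f_time = w * up16(transformer_enc(diff)) + (1.0 - w) * up16(extract_net_small(diff))

    # Step 3: add spatial-detail features extracted from the fine (Landsat) image.
    fused = f_time + extract_net_large(l_prior)

    # Step 4: global residual learning - up-sampled difference image plus the prior Landsat image.
    return fused + (up16(diff) + l_prior)
```

The final reconstruction then averages two such intermediate predictions, one per available prior date, as sketched earlier.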
3.2. Transformer Encoder
3.3. Extract Net
3.4. Average Weighting
3.5. Loss Function
4. Experiment
4.1. Datasets
4.2. Evaluation
4.3. Parameter Setting
4.4. Results and Analysis
4.4.1. Subjective Evaluation
4.4.2. Objective Evaluation
5. Discussion
6. Conclusions
- The Transformer encoder module learns global temporal change information. While local features are being extracted, its self-attention mechanism and position embeddings capture the relationship between local and global information, which convolution alone cannot provide, and this contributes to the final accuracy of our method.
- We configure Extract Net with different convolution kernel sizes to extract features from inputs of different sizes. The larger the kernel, the larger the receptive field: when features are extracted from the larger Landsat image, a large receptive field captures more context and learns better, while a small receptive field better matches the size of the time-varying information (a minimal sketch of this idea follows this list).
- Because some information is extracted more than once, we add a weighting strategy both when fusing the extracted feature layers and when reconstructing the final result from the intermediate predictions, eliminating the noise that the repeated information would otherwise introduce during fusion.
- When establishing the complex non-linear mapping between the input and the final fusion result, we add a global residual connection, supplementing details that would otherwise be lost during training.
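To make the receptive-field argument concrete, the sketch below defines a small convolutional extractor whose kernel size is a constructor argument, so a 3 × 3 variant can process the small MODIS difference image and a 5 × 5 variant the larger Landsat image (the pairing that performed best in the kernel-size ablation). The depth, channel width, and ReLU activations are illustrative assumptions, not the exact Extract Net configuration.

```python
import torch.nn as nn

class ConvExtractor(nn.Sequential):
    """Illustrative stand-in for an Extract Net branch: stacked Conv2d + ReLU layers
    with a configurable kernel size, i.e., a configurable receptive field."""
    def __init__(self, in_ch=6, kernel_size=3, width=32, depth=3):
        pad = kernel_size // 2          # keep the spatial size unchanged
        layers, ch = [], in_ch
        for _ in range(depth):
            layers += [nn.Conv2d(ch, width, kernel_size, padding=pad), nn.ReLU(inplace=True)]
            ch = width
        layers.append(nn.Conv2d(ch, in_ch, kernel_size, padding=pad))  # project back to the band count
        super().__init__(*layers)

# Small receptive field for the coarse difference image, large one for the Landsat image.
extract_net_small = ConvExtractor(in_ch=6, kernel_size=3)
extract_net_large = ConvExtractor(in_ch=6, kernel_size=5)
```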
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Justice, C.O.; Vermote, E.; Townshend, J.R.; Defries, R.; Roy, D.P.; Hall, D.K.; Salomonson, V.V.; Privette, J.L.; Riggs, G.; Strahler, A.; et al. The Moderate Resolution Imaging Spectroradiometer (MODIS): Land remote sensing for global change research. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1228–1249.
- Lin, C.; Li, Y.; Yuan, Z.; Lau, A.K.; Li, C.; Fung, J.C. Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5. Remote Sens. Environ. 2015, 156, 117–128.
- Zhang, L.; Zhang, Q.; Du, B.; Huang, X.; Tang, Y.Y.; Tao, D. Simultaneous spectral-spatial feature selection and extraction for hyperspectral images. IEEE Trans. Cybern. 2016, 48, 16–28.
- Yu, Q.; Gong, P.; Clinton, N.; Biging, G.; Kelly, M.; Schirokauer, D. Object-based detailed vegetation classification with airborne high spatial resolution remote sensing imagery. Photogramm. Eng. Remote Sens. 2006, 72, 799–811.
- White, M.A.; Nemani, R.R. Real-time monitoring and short-term forecasting of land surface phenology. Remote Sens. Environ. 2006, 104, 43–49.
- Hansen, M.C.; Loveland, T.R. A review of large area monitoring of land cover change using Landsat data. Remote Sens. Environ. 2012, 122, 66–74.
- Gao, F.; Masek, J.; Schwaller, M.; Hall, F. On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2207–2218.
- Hilker, T.; Wulder, M.A.; Coops, N.C.; Seitz, N.; White, J.C.; Gao, F.; Masek, J.G.; Stenhouse, G. Generation of dense time series synthetic Landsat data through data blending with MODIS using a spatial and temporal adaptive reflectance fusion model. Remote Sens. Environ. 2009, 113, 1988–1999.
- Zhu, X.; Chen, J.; Gao, F.; Chen, X.; Masek, J.G. An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions. Remote Sens. Environ. 2010, 114, 2610–2623.
- Hilker, T.; Wulder, M.A.; Coops, N.C.; Linke, J.; McDermid, G.; Masek, J.G.; Gao, F.; White, J.C. A new data fusion model for high spatial- and temporal-resolution mapping of forest disturbance based on Landsat and MODIS. Remote Sens. Environ. 2009, 113, 1613–1627.
- Zhukov, B.; Oertel, D.; Lanzl, F.; Reinhackel, G. Unmixing-based multisensor multiresolution image fusion. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1212–1226.
- Wu, M.; Niu, Z.; Wang, C.; Wu, C.; Wang, L. Use of MODIS and Landsat time series data to generate high-resolution temporal synthetic Landsat data using a spatial and temporal reflectance fusion model. J. Appl. Remote Sens. 2012, 6, 063507.
- Zhu, X.; Helmer, E.H.; Gao, F.; Liu, D.; Chen, J.; Lefsky, M.A. A flexible spatiotemporal method for fusing satellite images with different resolutions. Remote Sens. Environ. 2016, 172, 165–177.
- Huang, B.; Song, H. Spatiotemporal reflectance fusion via sparse representation. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3707–3716.
- Belgiu, M.; Stein, A. Spatiotemporal image fusion in remote sensing. Remote Sens. 2019, 11, 818.
- Wei, J.; Wang, L.; Liu, P.; Chen, X.; Li, W.; Zomaya, A.Y. Spatiotemporal fusion of MODIS and Landsat-7 reflectance images via compressed sensing. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7126–7139.
- Liu, X.; Deng, C.; Wang, S.; Huang, G.-B.; Zhao, B.; Lauren, P. Fast and accurate spatiotemporal fusion based upon extreme learning machine. IEEE Geosci. Remote Sens. Lett. 2016, 13, 2039–2043.
- Song, H.; Liu, Q.; Wang, G.; Hang, R.; Huang, B. Spatiotemporal satellite image fusion using deep convolutional neural networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 821–829.
- Liu, X.; Deng, C.; Chanussot, J.; Hong, D.; Zhao, B. StfNet: A two-stream convolutional neural network for spatiotemporal image fusion. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6552–6564.
- Tan, Z.; Yue, P.; Di, L.; Tang, J. Deriving high spatiotemporal remote sensing images using deep convolutional network. Remote Sens. 2018, 10, 1066.
- Tan, Z.; Di, L.; Zhang, M.; Guo, L.; Gao, M. An enhanced deep convolutional model for spatiotemporal image fusion. Remote Sens. 2019, 11, 2898.
- Chen, J.; Wang, L.; Feng, R.; Liu, P.; Han, W.; Chen, X. CycleGAN-STF: Spatiotemporal fusion via CycleGAN-based image generation. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5851–5865.
- Yin, Z.; Wu, P.; Foody, G.M.; Wu, Y.; Liu, Z.; Du, Y.; Ling, F. Spatiotemporal fusion of land surface temperature based on a convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1808–1822.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the ICLR 2021, Virtual Conference (formerly Vienna, Austria), 3–7 May 2021.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518.
- Emelyanova, I.V.; McVicar, T.R.; Van Niel, T.G.; Li, L.T.; Van Dijk, A.I. Assessing the accuracy of blending Landsat–MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection. Remote Sens. Environ. 2013, 133, 193–209.
- Li, Y.; Li, J.; He, L.; Chen, J.; Plaza, A. A new sensor bias-driven spatio-temporal fusion model based on convolutional neural networks. Sci. China Inf. Sci. 2020, 63, 140302.
- Li, J.; Li, Y.; He, L.; Chen, J.; Plaza, A. Spatio-temporal fusion for remote sensing data: An overview and new benchmark. Sci. China Inf. Sci. 2020, 63, 140301.
- Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the Summaries 3rd Annual JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 1–5 June 1992; pp. 147–149.
- Khan, M.M.; Alparone, L.; Chanussot, J. Pansharpening quality assessment using the modulation transfer functions of instruments. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3880–3891.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Carli, M. Modified image visual quality metrics for contrast change and mean shift accounting. In Proceedings of the 2011 11th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Polyana-Svalyava, Ukraine, 23–25 February 2011; pp. 305–311.
| Category | Method | Strengths | Weaknesses |
|---|---|---|---|
| Reconstruction-based | STARFM [8] | small amount of input data | large amount of calculation; poor reconstruction of some heterogeneous regions |
| | ESTARFM [9] | different coefficients to process the weights | large amount of calculation |
| | STAARCH [10] | maps reflectance changes | insufficient feature extraction |
| Unmixing-based | UMMF [11] | spectral unmixing and spectral resetting | large amount of calculation |
| | STDFA [12] | similarity of non-linear temporal and spatial changes in spectral unmixing | tedious training process |
| | FSDAF [13] | small amount of calculation; fast; high accuracy | insufficient feature extraction |
| Dictionary-based learning | SPSTFM [14] | sparse representation; introduces the idea of super-resolution | limited by assuming the same coefficients for the low- and high-resolution images |
| | CSSF [15] | explicit mapping between low- and high-resolution images; high accuracy; compressed sensing theory | huge amount of calculation |
| | ELM-FM [17] | less time; high efficiency | insufficient feature extraction |
| Deep learning-based | STFDCNN [18] | super-resolution; non-linear mapping | tedious training process |
| | StfNet [19] | spatial consistency; temporal dependence | loss of spatial detail |
| | DCSTFN [20] | small amount of input data | information loss from deconvolution |
| | EDCSTFN [21] | residual coding block; composite loss function | large amount of calculation; tedious training process |
| | CycleGAN-STF [22] | introduction of CycleGAN; combination with FSDAF | large amount of calculation; tedious training process |
| Proposed | MSNet | concise training process; extracts features multiple times; receptive fields of different sizes; global temporal correlation; high accuracy | large amount of calculation |
| Evaluation | Band | FSDAF [13] | STARFM [8] | STFDCNN [18] | StfNet [19] | Proposed |
|---|---|---|---|---|---|---|
| SAM | all | 0.23875 | 0.23556 | 0.21402 | 0.21614 | 0.19209 |
| ERGAS | all | 3.35044 | 3.31676 | 3.14461 | 3.00404 | 2.94471 |
| RMSE | band1 | 0.01365 | 0.01306 | 0.01076 | 0.00956 | 0.01009 |
| | band2 | 0.01415 | 0.01366 | 0.01236 | 0.01271 | 0.01132 |
| | band3 | 0.02075 | 0.02055 | 0.01792 | 0.02121 | 0.01724 |
| | band4 | 0.04619 | 0.04899 | 0.04100 | 0.05001 | 0.03669 |
| | band5 | 0.06031 | 0.06153 | 0.05900 | 0.05302 | 0.04898 |
| | band6 | 0.05322 | 0.05278 | 0.05389 | 0.04500 | 0.04325 |
| | avg | 0.03471 | 0.03509 | 0.03249 | 0.03192 | 0.02793 |
| SSIM | band1 | 0.90147 | 0.91699 | 0.95517 | 0.94190 | 0.95050 |
| | band2 | 0.91899 | 0.92325 | 0.93812 | 0.94340 | 0.95149 |
| | band3 | 0.85786 | 0.86290 | 0.87329 | 0.89950 | 0.91156 |
| | band4 | 0.76070 | 0.74636 | 0.78318 | 0.84868 | 0.86248 |
| | band5 | 0.66598 | 0.66011 | 0.72789 | 0.74118 | 0.76460 |
| | band6 | 0.66168 | 0.66323 | 0.73555 | 0.74068 | 0.76257 |
| | avg | 0.79445 | 0.79548 | 0.83553 | 0.85256 | 0.86720 |
| PSNR | band1 | 37.29537 | 37.68327 | 39.36680 | 40.38939 | 39.92510 |
| | band2 | 36.98507 | 37.29114 | 38.16128 | 37.91972 | 38.92643 |
| | band3 | 33.65821 | 33.74247 | 34.93560 | 33.46842 | 35.27141 |
| | band4 | 26.70854 | 26.19858 | 27.74355 | 26.01829 | 28.70879 |
| | band5 | 24.39249 | 24.21822 | 24.58366 | 25.51175 | 26.19920 |
| | band6 | 25.47784 | 25.55050 | 25.37055 | 26.93525 | 27.28095 |
| | avg | 30.75292 | 30.78070 | 31.69357 | 31.70714 | 32.71865 |
| CC | band1 | 0.80138 | 0.79845 | 0.84521 | 0.83428 | 0.84448 |
| | band2 | 0.79873 | 0.79319 | 0.83720 | 0.83156 | 0.84929 |
| | band3 | 0.83290 | 0.82554 | 0.87373 | 0.87264 | 0.87787 |
| | band4 | 0.88511 | 0.86697 | 0.91181 | 0.90546 | 0.92743 |
| | band5 | 0.76395 | 0.74894 | 0.78783 | 0.84732 | 0.84784 |
| | band6 | 0.76036 | 0.75144 | 0.76502 | 0.84588 | 0.83826 |
| | avg | 0.80707 | 0.79742 | 0.83680 | 0.85619 | 0.86420 |
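The metrics reported in these tables follow their standard definitions. As a reference point only, the sketch below computes RMSE, PSNR, CC, and SAM for reflectance images assumed to lie in [0, 1]; the function names and the data-range assumption are ours (SSIM can be obtained from skimage.metrics.structural_similarity, and ERGAS aggregates the per-band relative RMSE values).

```python
import numpy as np

def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def psnr(pred, ref, data_range=1.0):
    # Peak signal-to-noise ratio, assuming reflectance scaled to [0, data_range].
    return 20.0 * np.log10(data_range / rmse(pred, ref))

def cc(pred, ref):
    # Pearson correlation coefficient between a predicted and a reference band.
    return float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1])

def sam(pred, ref):
    # Spectral angle mapper in radians, averaged over pixels; inputs shaped (bands, H, W).
    p = pred.reshape(pred.shape[0], -1)
    r = ref.reshape(ref.shape[0], -1)
    cos = (p * r).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(r, axis=0) + 1e-12)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```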
| Evaluation | Band | FSDAF [13] | STARFM [8] | STFDCNN [18] | StfNet [19] | Proposed |
|---|---|---|---|---|---|---|
| SAM | all | 0.08411 | 0.08601 | 0.06792 | 0.09284 | 0.06335 |
| ERGAS | all | 1.93861 | 1.92273 | 1.80392 | 2.03970 | 1.68639 |
| RMSE | band1 | 0.00763 | 0.00729 | 0.00719 | 0.00824 | 0.00585 |
| | band2 | 0.00913 | 0.00907 | 0.00843 | 0.01167 | 0.00712 |
| | band3 | 0.01279 | 0.01256 | 0.01151 | 0.01353 | 0.00969 |
| | band4 | 0.02383 | 0.02295 | 0.02102 | 0.02971 | 0.01864 |
| | band5 | 0.02830 | 0.02607 | 0.02251 | 0.02284 | 0.02159 |
| | band6 | 0.02197 | 0.02181 | 0.01673 | 0.02054 | 0.01425 |
| | avg | 0.01727 | 0.01662 | 0.01457 | 0.01775 | 0.01286 |
| SSIM | band1 | 0.97422 | 0.97355 | 0.98460 | 0.97464 | 0.98558 |
| | band2 | 0.96698 | 0.96495 | 0.98209 | 0.96062 | 0.98031 |
| | band3 | 0.94456 | 0.94152 | 0.97475 | 0.94162 | 0.96954 |
| | band4 | 0.92411 | 0.91759 | 0.96417 | 0.91455 | 0.96393 |
| | band5 | 0.89418 | 0.88558 | 0.95539 | 0.91215 | 0.95239 |
| | band6 | 0.88485 | 0.87789 | 0.95259 | 0.90154 | 0.95087 |
| | avg | 0.93148 | 0.92684 | 0.96893 | 0.93419 | 0.96710 |
| PSNR | band1 | 42.35483 | 42.73997 | 42.86245 | 41.68016 | 44.65345 |
| | band2 | 40.79034 | 40.85222 | 41.48586 | 38.65611 | 42.95050 |
| | band3 | 37.86428 | 38.02099 | 38.77733 | 37.37629 | 40.27059 |
| | band4 | 32.45760 | 32.78532 | 33.54859 | 30.54336 | 34.59058 |
| | band5 | 30.96416 | 31.67671 | 32.95179 | 32.82613 | 33.31671 |
| | band6 | 33.16535 | 33.22812 | 35.52920 | 33.74927 | 36.92082 |
| | avg | 36.26610 | 36.55056 | 37.52587 | 35.80522 | 38.78378 |
| CC | band1 | 0.93627 | 0.92935 | 0.94611 | 0.94664 | 0.96138 |
| | band2 | 0.93186 | 0.92880 | 0.94530 | 0.93566 | 0.95800 |
| | band3 | 0.93549 | 0.93516 | 0.95262 | 0.95539 | 0.96499 |
| | band4 | 0.96360 | 0.96287 | 0.97181 | 0.96125 | 0.97591 |
| | band5 | 0.95527 | 0.95222 | 0.97545 | 0.97048 | 0.97890 |
| | band6 | 0.95313 | 0.95214 | 0.97285 | 0.97164 | 0.97924 |
| | avg | 0.94594 | 0.94342 | 0.96069 | 0.95684 | 0.96974 |
| Evaluation | Band | FSDAF [13] | STARFM [8] | STFDCNN [18] | StfNet [19] | Proposed |
|---|---|---|---|---|---|---|
| SAM | all | 0.16991 | 0.29277 | 0.18583 | 0.25117 | 0.14677 |
| ERGAS | all | 2.80156 | 4.46147 | 4.25224 | 3.86535 | 2.90661 |
| RMSE | band1 | 0.00039 | 0.00251 | 0.00096 | 0.00112 | 0.00047 |
| | band2 | 0.00044 | 0.00235 | 0.00092 | 0.00081 | 0.00051 |
| | band3 | 0.00067 | 0.00358 | 0.00117 | 0.00118 | 0.00064 |
| | band4 | 0.00109 | 0.00590 | 0.00124 | 0.00201 | 0.00103 |
| | band5 | 0.00126 | 0.00408 | 0.00183 | 0.00177 | 0.00122 |
| | band6 | 0.00136 | 0.00263 | 0.00200 | 0.00198 | 0.00126 |
| | avg | 0.00087 | 0.00351 | 0.00135 | 0.00148 | 0.00085 |
| SSIM | band1 | 0.99895 | 0.96538 | 0.99205 | 0.98927 | 0.99822 |
| | band2 | 0.99877 | 0.96977 | 0.99293 | 0.99500 | 0.99805 |
| | band3 | 0.99741 | 0.93438 | 0.98947 | 0.98965 | 0.99740 |
| | band4 | 0.99616 | 0.92038 | 0.99419 | 0.98248 | 0.99631 |
| | band5 | 0.99382 | 0.94190 | 0.98371 | 0.98464 | 0.99388 |
| | band6 | 0.99129 | 0.96825 | 0.97625 | 0.97636 | 0.99226 |
| | avg | 0.99607 | 0.95001 | 0.98810 | 0.98623 | 0.99602 |
| PSNR | band1 | 68.18177 | 52.01008 | 60.34502 | 59.00582 | 66.48249 |
| | band2 | 67.04371 | 52.56484 | 60.68929 | 61.83160 | 65.80339 |
| | band3 | 63.49068 | 48.93197 | 58.63694 | 58.55977 | 63.88021 |
| | band4 | 59.22553 | 44.58211 | 58.13169 | 53.95486 | 59.77506 |
| | band5 | 58.02282 | 47.79106 | 54.74701 | 55.05539 | 58.28599 |
| | band6 | 57.35352 | 51.60634 | 53.96601 | 54.06602 | 58.02322 |
| | avg | 62.21967 | 49.58107 | 57.75266 | 57.07891 | 62.04173 |
| CC | band1 | 0.84000 | 0.71181 | 0.80368 | 0.49726 | 0.86845 |
| | band2 | 0.85657 | 0.74545 | 0.86845 | 0.38062 | 0.89114 |
| | band3 | 0.84979 | 0.81230 | 0.83576 | 0.27147 | 0.88345 |
| | band4 | 0.53986 | 0.34009 | 0.58944 | 0.37556 | 0.60303 |
| | band5 | 0.79576 | 0.76553 | 0.83580 | 0.62926 | 0.85320 |
| | band6 | 0.80288 | 0.76492 | 0.80338 | 0.61085 | 0.85154 |
| | avg | 0.78081 | 0.69002 | 0.78942 | 0.46083 | 0.82514 |
| Datasets | Depth | SAM | ERGAS | RMSE | SSIM | PSNR | CC |
|---|---|---|---|---|---|---|---|
| CIA | 0 | 0.19561 | 3.04052 | 0.02869 | 0.86004 | 32.47545 | 0.86223 |
| | 5 | 0.19209 | 2.94471 | 0.02793 | 0.86720 | 32.71865 | 0.86420 |
| | 10 | 0.19799 | 3.02700 | 0.02879 | 0.86082 | 32.42599 | 0.85658 |
| | 15 | 0.19881 | 3.00321 | 0.02875 | 0.86301 | 32.41008 | 0.85536 |
| | 20 | 0.20009 | 2.96461 | 0.02883 | 0.85894 | 32.49450 | 0.85069 |
| LGC | 0 | 0.06397 | 1.70489 | 0.01282 | 0.96487 | 38.73680 | 0.96823 |
| | 5 | 0.06335 | 1.68639 | 0.01286 | 0.96710 | 38.78378 | 0.96974 |
| | 10 | 0.06706 | 1.72469 | 0.01333 | 0.96557 | 38.47550 | 0.96499 |
| | 15 | 0.06797 | 1.76038 | 0.01350 | 0.96639 | 38.11193 | 0.96243 |
| | 20 | 0.06675 | 1.74121 | 0.01340 | 0.96665 | 38.21371 | 0.96467 |
| AHB | 0 | 0.14679 | 3.10283 | 0.00089 | 0.99558 | 61.37776 | 0.82436 |
| | 5 | 0.14836 | 3.13212 | 0.00094 | 0.99506 | 61.07734 | 0.82328 |
| | 10 | 0.15712 | 3.21723 | 0.00097 | 0.99472 | 60.71349 | 0.80382 |
| | 15 | 0.15101 | 3.12097 | 0.00093 | 0.99522 | 61.01998 | 0.81928 |
| | 20 | 0.14677 | 2.90661 | 0.00085 | 0.99602 | 62.04173 | 0.82514 |
| Datasets | Kernel Sizes | SAM | ERGAS | RMSE | SSIM | PSNR | CC |
|---|---|---|---|---|---|---|---|
| CIA | 3 × 3, 3 × 3 | 0.20058 | 2.97356 | 0.02900 | 0.85794 | 32.41488 | 0.84958 |
| | 5 × 5, 5 × 5 | 0.20175 | 2.96925 | 0.02894 | 0.86145 | 32.35826 | 0.84900 |
| | 3 × 3, 5 × 5 | 0.19209 | 2.94471 | 0.02793 | 0.86720 | 32.71865 | 0.86420 |
| LGC | 3 × 3, 3 × 3 | 0.07320 | 1.82552 | 0.01497 | 0.96084 | 37.44287 | 0.95675 |
| | 5 × 5, 5 × 5 | 0.07127 | 1.83538 | 0.01443 | 0.96422 | 37.36025 | 0.95671 |
| | 3 × 3, 5 × 5 | 0.06335 | 1.68639 | 0.01286 | 0.96710 | 38.78378 | 0.96974 |
| AHB | 3 × 3, 3 × 3 | 0.19604 | 3.55414 | 0.00113 | 0.9934 | 59.58018 | 0.73684 |
| | 5 × 5, 5 × 5 | 0.21244 | 3.57996 | 0.00125 | 0.99244 | 59.00303 | 0.67548 |
| | 3 × 3, 5 × 5 | 0.14677 | 2.90661 | 0.00085 | 0.99602 | 62.04173 | 0.82514 |
| Datasets | Weighting | SAM | ERGAS | RMSE | SSIM | PSNR | CC |
|---|---|---|---|---|---|---|---|
| CIA | TC | 0.20046 | 3.02157 | 0.02917 | 0.85953 | 32.29681 | 0.85509 |
| | Avg | 0.19209 | 2.94471 | 0.02793 | 0.86720 | 32.71865 | 0.86420 |
| LGC | TC | 0.07129 | 1.79503 | 0.01444 | 0.96404 | 37.73562 | 0.95589 |
| | Avg | 0.06335 | 1.68639 | 0.01286 | 0.96710 | 38.78378 | 0.96974 |
| AHB | TC | 0.16715 | 3.20845 | 0.00097 | 0.99464 | 60.64426 | 0.77796 |
| | Avg | 0.14677 | 2.90661 | 0.00085 | 0.99602 | 62.04173 | 0.82514 |