Article
Peer-Review Record

Time-Series FY4A Datasets for Super-Resolution Benchmarking of Meteorological Satellite Images

Remote Sens. 2022, 14(21), 5594; https://doi.org/10.3390/rs14215594
by Jingbo Wei 1,2, Chenghao Zhou 1, Jingsong Wang 3 and Zhou Chen 1,2,*
Submission received: 27 August 2022 / Revised: 26 October 2022 / Accepted: 31 October 2022 / Published: 6 November 2022

Round 1

Reviewer 1 Report

1. The main contribution of this paper is the production of a super-resolution benchmarking dataset for meteorological satellite images. However, this dataset is not publicly available, which would reduce the impact of the research results and prevent other researchers from verifying its validity.

2. The article emphasizes that this dataset is characterized by temporal continuity, yet the experimental results fail to show how the temporal continuity of the dataset affects the accuracy of the algorithm.

 

3. In the experiments, did the different algorithms use the same parameters when training on both datasets? Does this have an impact on the experimental results?

Author Response

Response

  1. Thank you for your concern. The new datasets are posted at https://github.com/isstncu/fy4a for free download. In the revised version, the URL is given at the end of the abstract and at the beginning of the experimental scheme section.
  2. Thank you for your concern. A discussion section has been added to address the temporal continuity, in which two experiments are explored, namely sequence super-resolution and spatiotemporal fusion. Their outcomes may reveal the potential of the proposed datasets for time-correlated reconstruction.

5.1. Sequence Super-resolution

We constructed a training set and a test set by treating the sequence images as a video and performed a test of video super-resolution. To construct the training set, 40 different locations were selected. Eighty-four pairs of temporally consecutive patches were extracted from each location and divided into 12 groups in time order, so that each group contains 7 pairs of temporally contiguous patches serving as a video clip for reconstruction. In each pair, the two patches were cropped from the 1 km and 4 km images, respectively. After removing the groups with excessive darkness, 347 valid groups of video clips were retained out of the 480 groups of sequential images for training.

Similar to the training set, the test set contains 10 groups of sequential patches at 10 different locations. These 10 locations are among the 40 locations of the training set. Each group contains 10 pairs of temporally consecutive patches, cropped from the 1 km and 4 km images, respectively. The 100 images used in the test set do not appear in the training set.
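For concreteness, the grouping logic described above can be sketched as follows. This is a minimal illustration, assuming the co-registered 1 km and 4 km patches are available as 16-bit NumPy arrays in temporal order; the function name and the darkness threshold are hypothetical stand-ins, not the authors' released code.

```python
import numpy as np

def build_clips(hr_patches, lr_patches, clip_len=7, dark_thresh=0.05):
    """Group temporally consecutive HR/LR patch pairs into fixed-length
    video clips, dropping clips that are excessively dark.

    hr_patches, lr_patches: lists of 2D uint16 arrays in temporal order,
    cropped from the 1 km and 4 km images, respectively.
    dark_thresh: hypothetical cutoff on mean normalized brightness.
    """
    clips = []
    for start in range(0, len(hr_patches) - clip_len + 1, clip_len):
        hr_clip = hr_patches[start:start + clip_len]
        lr_clip = lr_patches[start:start + clip_len]
        # Normalize by the 16-bit peak to judge overall darkness.
        mean_level = np.mean([p.mean() for p in hr_clip]) / 65535.0
        if mean_level >= dark_thresh:
            clips.append((lr_clip, hr_clip))
    return clips

# 84 pairs per location with clip_len=7 yield 12 clips, matching the text.
```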

The Zooming Slow-Mo algorithm was chosen to perform the sequence super-resolution, and its results were evaluated on the 100 test images. Using only the pre-trained model, the average PSNR is 28.2371 dB. Using the pre-trained model as the initial value and re-training on the constructed dataset, the average PSNR is 29.2174 dB. When trained from scratch on the constructed training set, without the pre-trained model, the average PSNR is 30.1253 dB. Comparing these scores with those in Tables 3 and 5 shows that the gap between our FY4ASRcolor dataset and commonly used video sequence datasets is large. To reconstruct sequences of remote sensing images, both the a priori image structure and the pattern of sequential change have to be learned, which makes it difficult to find matching datasets.
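The PSNR values above follow the standard definition; a minimal sketch is given below, assuming 16-bit imagery (peak value 65535), in line with the 16-bit data mentioned elsewhere in the paper.

```python
import numpy as np

def psnr(reference, estimate, peak=65535.0):
    """Peak signal-to-noise ratio in dB for 16-bit imagery."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# The reported averages would then be np.mean([psnr(gt, sr) for gt, sr in pairs])
# computed over the 100 test images.
```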

5.2. Spatiotemporal Fusion

Spatiotemporal fusion enhances the temporal resolution of high-spatial-resolution satellites by exploiting the complementary spatial and temporal resolutions of satellite images from different sources. Typical studies are carried out between the MODIS and Landsat satellites, whose revisit periods are 1 day and 16 days, respectively. A typical spatiotemporal fusion needs three reference images. Assuming that MODIS captures images at moments t1 and t2 while Landsat captures an image only at moment t1, spatiotemporal fusion algorithms try to predict the Landsat image at moment t2 from the three known images.
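In code form, the protocol amounts to predicting the fine image at t2 from the triplet (coarse at t1, coarse at t2, fine at t1). The sketch below is a deliberately naive additive baseline for illustration only; it is neither FSDAF nor SSTSTF, and all names are hypothetical.

```python
import numpy as np

def upsample(img, shape):
    """Nearest-neighbour resize to `shape` (illustrative only)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def naive_fusion(coarse_t1, coarse_t2, fine_t1):
    """Predict the fine-resolution image at t2 from the three known images
    by adding the upsampled coarse temporal change to the fine image at t1.
    Real algorithms such as FSDAF and SSTSTF model this change far more
    carefully."""
    change = coarse_t2.astype(np.float64) - coarse_t1.astype(np.float64)
    return fine_t1.astype(np.float64) + upsample(change, fine_t1.shape)
```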

The FY4ASRcolor dataset is well suited to spatiotemporal fusion studies. Unlike MODIS and Landsat, the two known images in the FY4ASRcolor dataset were taken at exactly the same time. They also have the same sensor response, which eliminates the sensor discrepancy issue that plagues the fusion of MODIS and Landsat. We previously carried out similar work on the spatiotemporal-spectral fusion of Gaofen-1 images, but with only a 2-fold difference in spatial resolution. Using the FY4ASR dataset for spatiotemporal fusion may provide new fundamental data support for this research topic.

We tried two methods for spatiotemporal fusion, namely FSDAF and SSTSTF. FSDAF is a classical algorithm, while SSTSTF is one of the latest algorithms based on neural networks. SSTSTF requires a large amount of training data, otherwise its performance is not as good as that of FSDAF. However, FSDAF fails in our test because it cannot produce legible images. The changing sunshine intensities lead to huge variations in the reflection of ground features, which may exceed the temporal difference that FSDAF can tolerate when reconstructing surface reflectance. In contrast, SSTSTF accomplishes the reconstruction successfully.

For SSTSTF, paired images from 12 moments were used to construct the dataset. Each high-resolution image has the same size. Images from 9 moments were used for training and formed 8 groups. The test used images from the 3 remaining moments, two of which were set as prediction times. After removing the dark areas, which would otherwise inflate the scores unfairly, the reconstruction PSNRs are 32.9605 dB for 6:30 and 36.8904 dB for 11:30. The results show that the reconstruction quality of spatiotemporal fusion is slightly lower than that of single-image super-resolution. Considering that the amount of training data is far smaller than that used for single-image super-resolution, spatiotemporal fusion algorithms need to be carefully designed to adapt to this new dataset.
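The dark-area exclusion can be expressed as a masked PSNR. The sketch below assumes "dark" means reference pixels at or below a digital-number cutoff; the threshold value is a hypothetical placeholder.

```python
import numpy as np

def masked_psnr(reference, estimate, dark_thresh=1000, peak=65535.0):
    """PSNR computed only over non-dark pixels of the reference image,
    so that large dark regions cannot inflate the score."""
    mask = reference > dark_thresh  # hypothetical darkness cutoff
    diff = reference[mask].astype(np.float64) - estimate[mask].astype(np.float64)
    return 10.0 * np.log10(peak ** 2 / np.mean(diff ** 2))
```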

  3. Thank you for your concern. We used the originally suggested parameters for each model whenever possible.

Two sets of tests were performed on each dataset: one using the pre-trained model and one using the re-trained model. The pre-trained models were trained on the DIV2K dataset. The re-trained parameters were obtained using our FY4A dataset, with the initial values of the models taken from the results of pre-training on DIV2K. The parameters of each model were kept consistent across the training process, except that the batch size was reduced in the re-training phase. The batch size was reduced because the GPU used for training is an NVIDIA 2080Ti graphics card, which has 11 GB of memory. A small batch size may cause deviation from the optimal model parameters, but the effect is weak because the models had already been trained close to their optimal values.
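The re-training setup can be summarized in a short PyTorch sketch. The tiny stand-in network, file name, and batch size below are illustrative assumptions; each compared algorithm ships its own training script.

```python
import torch
import torch.nn as nn

# Minimal stand-in 4x super-resolution network (not one of the compared models).
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 3 * 16, 3, padding=1), nn.PixelShuffle(4))

# 1. Initialize from the DIV2K pre-trained weights (file name is hypothetical).
# model.load_state_dict(torch.load("div2k_pretrained.pth"))

# 2. Re-train on FY4A patches with the originally suggested optimizer settings,
#    but a smaller batch so training fits in the 2080Ti's 11 GB of memory.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batch_size = 4  # reduced relative to the original setting

lr_patch = torch.rand(batch_size, 3, 64, 64)    # stand-in low-resolution input
hr_patch = torch.rand(batch_size, 3, 256, 256)  # stand-in high-resolution target
loss = nn.functional.l1_loss(model(lr_patch), hr_patch)
loss.backward()
optimizer.step()
```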

Author Response File: Author Response.pdf

Reviewer 2 Report

Time-Series FY4A Datasets for Super-Resolution Benchmarking of Meteorological Satellite Images

 

1. The abstract should give short information about the article, its purpose, and its results; it is intended to help people decide whether or not to read the entire paper. I recommend deleting the information in lines 5-8; it is an application of the FY4ASRgray and FY4ASRcolor images, and this information should be in the data chapter.

2. The structure of the article is appropriate and clear. All information is presented logically and comprehensibly.

3. I have only one recommendation: to add more references if possible.

Author Response

Response

  1. Thank you for your advice. The abstract has been rewritten accordingly.

Meteorological satellites usually operate at high temporal resolutions, but their spatial resolutions are too poor to identify ground content. Super-resolution is an economical way to enhance spatial details, but its feasibility has not been validated for meteorological images due to the absence of benchmarking data. In this work, we propose the FY4ASRgray and FY4ASRcolor datasets to assess super-resolution algorithms in meteorological applications. Cloud sensitivity and temporal continuity are the defining features of the proposed datasets. To test the usability of the new datasets, five state-of-the-art super-resolution algorithms are gathered for comparison. Transfer learning is used to shorten the training time and improve the parameters. The methods are modified to deal with the 16-bit challenge. The reconstruction results are demonstrated and evaluated with respect to radiometric, structural, and spectral loss, which gives the baseline performance for detail enhancement of FY4A satellite images. Additional experiments are made on FY4ASRcolor for sequence super-resolution, spatiotemporal fusion, and generalization testing. The FY4A datasets can be downloaded from github.com/isstncu/fy4a.

  2. Thank you for your approval.
  3. Thank you for your advice. Five new references associated with sequence reconstruction and deep learning for remote sensing have been added.

Author Response File: Author Response.pdf

Reviewer 3 Report

 

This paper provides an FY4A dataset, a fascinating topic that deserves more exposure. I have a few comments and suggestions which hopefully will help perfect the manuscript (see comments to authors).

1. Please explain the difference between Fengyun-4A and FY-4A in the keywords.

2. Section 2 should add a flow chart to explain the process of making the dataset.

3. Whether the weights obtained from this dataset can be directly used on other datasets should be verified and discussed.

4. In the introduction, the development of deep learning should be carefully introduced, especially some of the latest remote sensing deep learning models:

https://www.sciencedirect.com/science/article/abs/pii/S0924271622001927
https://www.sciencedirect.com/science/article/pii/S1569843222001376
10.1109/TGRS.2022.3200872
10.1109/TGRS.2020.3001401
10.1109/TGRS.2022.3151901
https://doi.org/10.1016/j.isprsjprs.2022.09.004
https://www.sciencedirect.com/science/article/pii/S1569843222001789
10.1109/JSTARS.2020.2991391

5. The differing effects of the methods should be highlighted in Figures 12 to 15.

6. The quality of the dataset should be discussed in detail.

Author Response

Response

  1. We are sorry for the mistake. Fengyun-4A is the full name, and FY4A is its abbreviation. Both FY4A and FY-4A were used in the previous manuscript, which was our mistake; in the revised version, FY-4A has been replaced with FY4A.
  2. Thank you for your advice. Figure 12 has been added to illustrate the processing steps of the datasets, including radiometric correction, geometric correction, band selection, saturation stretch, and so on.
  3. Thank you for your advice. A test of the generalization of trained models across datasets has been added as the last part of the discussion section (Section 5.3).

To evaluate the generalization to remote sensing of models trained on the FY4ASRcolor dataset, we applied the trained models to other datasets. Unfortunately, existing studies use images with ultra-high resolutions, which prevented us from finding matching application scenarios. We finally tested the datasets used in [33]. Two datasets were involved in the experiment: the 0.3 m UC Merced dataset and the 30 m to 0.2 m NWPU-RESISC45 dataset. The tested images are denseresident191 in the UC Merced dataset and railwaystation565 in the NWPU-RESISC45 dataset. The PSNR scores are listed in Table 8. The results from the pre-trained models are close to the values in [33]. However, re-training on FY4ASRcolor leads to a substantial decrease in performance when reconstructing high-resolution remote sensing images. This convinces us again that the characteristics of our data are quite different from those of other datasets. The conclusion is readily explained: meteorological satellite images have to sacrifice spatial resolution to ensure high temporal resolution, so knowledge of structural details cannot be learned from the low-resolution images to reconstruct the complex structures of high-resolution images. Instead, temporal repetition and spectral features play much greater roles in the reconstruction process.

  4. Thank you for your advice. Three references are added at the end of the third paragraph.

[4] An omni-scale global-local aware network for shadow extraction in remote sensing imagery.

[5] Damaged Building Detection From Post-Earthquake Remote Sensing Imagery Considering Heterogeneity Characteristics.

[6] Clustering Feature Constraint Multiscale Attention Network for Shadow Extraction From Remote Sensing Images.

  5. Thank you for your advice. The best scores in all evaluation tables are now bolded.
  6. Thank you for your advice. We understand that the quality of a dataset is reflected in its signal-to-noise ratio, detail richness, geographic accuracy, and so on. These are difficult to assess quantitatively but can be revealed indirectly through experimental comparisons. In the discussion section, three new experiments are added, namely sequence super-resolution, spatiotemporal fusion, and a generalizability test. These experiments may reveal more about the quality and versatility of the data.

5. Discussions

The images in the FY4ASRcolor dataset were acquired on September 16, 2021, starting at 00:30 and ending at 23:43. The acquisition duration of each image is 258 seconds. Most time intervals between two adjacent images are 258 seconds, and the maximum is 3084 seconds. This strong temporal continuity allows the data to be used for time-related studies. Therefore, two new experimental schemes were explored, namely sequence super-resolution and spatiotemporal fusion. The results are expected to reveal the feasibility of the new datasets for temporally correlated prediction. An additional test was made to assess generalization ability. In these studies, only the FY4ASRcolor dataset is used, as it better suits the purpose of remote sensing.
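The interval statistics can be verified directly from the acquisition timestamps; a minimal sketch with two consecutive (illustrative) timestamps follows.

```python
from datetime import datetime

# Hypothetical consecutive acquisition times parsed from FY4A file names.
times = [datetime(2021, 9, 16, 0, 30, 0), datetime(2021, 9, 16, 0, 34, 18)]
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(gaps)  # [258.0]; most gaps are 258 s, the largest in the dataset is 3084 s
```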

Sections 5.1 (Sequence Super-resolution) and 5.2 (Spatiotemporal Fusion) are reproduced in full in the response to Reviewer 1 above, and Section 5.3 (Generalization of Trained Models Across Datasets) in the response to Point 3 above.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The manuscript has been modified as suggested, and I agree to accept it.
