FEPVNet: A Network with Adaptive Strategies for Cross-Scale Mapping of Photovoltaic Panels from Multi-Source Images
Round 1
Reviewer 1 Report
This manuscript proposed a network named FEPVNet, which embeds high-pass and low-pass filters and Polarized Self-Attention into a High-Resolution Network(HRNet) to improve its capabilities in noise resistance and adaptive feature extraction, ultimately enhancing photovoltaic extraction accuracy.
The paper is well written. The relevant background of current photovoltaic extraction methods is clearly introduced, and the innovation of the proposed method is described in detail. By incorporating high-pass and low-pass filters, Polarized Self-Attention, and a data migration strategy into HRNet, FEPVNet significantly improves the accuracy and adaptive capability of photovoltaic extraction. I recommend acceptance after minor revision.
Some minor issues:
1. Abbreviations in the abstract cannot be used directly in the main text. For example, the PV in the second paragraph of the first page. The rules on abbreviations in academic papers need to be observed separately in the abstract and in the main text.
2. There are some minor problems with the writing, especially the use of prepositions. Please check the whole text carefully.
Author Response
Reviewer #1:
This manuscript proposed a network named FEPVNet, which embeds high-pass and low-pass filters and Polarized Self-Attention into a High-Resolution Network (HRNet) to improve its capabilities in noise resistance and adaptive feature extraction, ultimately enhancing photovoltaic extraction accuracy.
The paper is well written. The relevant background of current photovoltaic extraction methods is clearly introduced, and the innovation of the proposed method is described in detail. By incorporating high-pass and low-pass filters, Polarized Self-Attention, and a data migration strategy into HRNet, FEPVNet significantly improves the accuracy and adaptive capability of photovoltaic extraction. I recommend acceptance after minor revision.
Response: We thank the reviewer for the precious time spent reviewing our manuscript, appreciate the positive comments, and have improved our manuscript accordingly.
# Comment 1-1:
Abbreviations in the abstract cannot be used directly in the main text. For example, the PV in the second paragraph of the first page. The rules on abbreviations in academic papers need to be observed separately in the abstract and in the main text.
Response: Thank you very much for your suggestion! We have made corrections to all abbreviations in the main text as follows:
“According to the International Energy Agency's (IEA) sustainability program, the number of photovoltaic (PV) plants will increase rapidly.” (page 1, line 41-42)
“Therefore, we selected HRNet as the base model and embedded Canny, Median filter, and Polarized Self-Attention (PSA) to design an adaptive FEPVNet.” (page 2, line 85-87)
“In addition, the Polarized Self-Attention-Residual (PAR), Single Depthwise Separable (SDS) Residual, and Double Depthwise Separable (DDS) Residual blocks were constructed to replace the standard residual blocks at different stages of the HRNet main network.” (page 4-5, line 152-155)
“1. An SDS residual block, as shown in Figure 7(c), where the two normal convolutions are replaced by a depthwise convolution and a pointwise convolution; (page 8, line 275-276)
2. A DDS residual block, as shown in Figure 7(d), where two depthwise separable convolutions replace the two normal convolutions.” (page 8, line 277-278)
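For readers comparing the SDS/DDS blocks with standard residual blocks, the parameter saving of depthwise separable convolution can be sketched with simple counting. This is a generic illustration of the technique, not code from the manuscript:

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """A depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Example: 64 -> 64 channels with 3 x 3 kernels
std = standard_conv_params(64, 64, 3)        # 36864 weights
sep = depthwise_separable_params(64, 64, 3)  # 4672 weights
print(std, sep, round(std / sep, 1))         # roughly an 8x reduction
```

This counting is why replacing standard residual convolutions with SDS/DDS blocks shrinks the Params and Flops columns reported in the ablation tables.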
# Comment 1-2:
There are some minor problems with the writing, especially the use of prepositions. Please check the whole text carefully.
Response: Thank you very much for your suggestion! We carefully reviewed the entire manuscript to check and correct both the prepositions and other grammar problems.
We provide some examples of sentences we have modified as follows:
“The global demand for energy is facing significant challenges and uncertainties, manifested by the decrease in fossil energy reserves and rising prices [1].” (page 1, line 35-36)
“(c)comparing the PV extraction ability of different models on the Sentinel-2 dataset.” (page 3, line 130-131)
Author Response File: Author Response.docx
Reviewer 2 Report
1. It is necessary to clarify what "zoom 14 and 16" means in Google images, it is understood to be a scaling treatment on the spatial resolution of the image. Why such images are used and how they are used.
2. It is not indicated which bands are used in Sentinel-2 images, and whether they are combined.
3. It would be appropriate to indicate on which server it is possible to download the satellite images used Geofen-2, and in general for all images.
4. Some cited references are not related to the study.
Author Response
Reviewer #2:
# Comment 2-1:
It is necessary to clarify what "zoom 14 and 16" means in Google images, it is understood to be a scaling treatment on the spatial resolution of the image. Why such images are used and how they are used.
Response: Thank you very much for your suggestion and question! The description of the Google images used in the manuscript was based on the official documentation of the Google Maps Platform (https://developers.google.com/maps/documentation/maps-static/start?hl=zh-cn), which describes different resolution levels of the images. For example, zoom 14 corresponds to Google images with a resolution of 10 meters, while zoom 16 corresponds to Google images with a resolution of 2 meters. Using these data, we developed a data migration strategy: we first trained the FEPVNet model using Sentinel-2 images and then trained the model with Google images to obtain photovoltaic features at different resolutions.
We have modified the description of these images in Section 2 (Dataset).
“To construct the cross-scale network model, four types of images are required: Sentinel-2 images at 10 m resolution, which are available for download via Google Earth Engine (GEE); Google-14 (i.e., zoom level 14) images at 10 m resolution and Google-16 (i.e., zoom level 16) images at 2 m resolution, both of which can be downloaded through the Google Images API; and Gaofen-2 images at 2 m resolution, which can be downloaded from the Data Sharing Website of the Aerospace Information Research Institute, Chinese Academy of Sciences.” (page 3, line 98-103)
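The zoom-to-resolution correspondence quoted above follows from the standard Web Mercator tiling scheme. A small sketch, assuming 256-pixel tiles and the usual ground-resolution formula, reproduces the approximate 10 m and 2 m figures:

```python
import math

def ground_resolution_m(zoom, lat_deg=0.0):
    """Approximate meters per pixel for a Web Mercator tile map
    with 256 px tiles: 156543.03392 * cos(lat) / 2**zoom."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

print(round(ground_resolution_m(14), 2))  # ~9.55 m, close to the stated 10 m for Google-14
print(round(ground_resolution_m(16), 2))  # ~2.39 m, close to the stated 2 m for Google-16
```

The resolution also shrinks with latitude (the cosine factor), so the 10 m / 2 m values should be read as equator-level approximations.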
# Comment 2-2:
It is not indicated which bands are used in Sentinel-2 images, and whether they are combined.
Response: We apologize for not giving sufficient information about the images used in this manuscript. In this study, we used an RGB composite of Sentinel-2 images with bands 4, 3, and 2.
We modified the band information as follows:
“The sample images of Sentinel-2 we used consist of three bands: red (B4), green (B3), and blue (B2), while the sample label images are grayscale images. These data were cut into 1024 × 1024 pixels, forming four datasets with properties shown in Table 1.” (page 3, line 111-114)
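The band compositing described above can be sketched as follows. This is an illustrative helper (the function name is ours), assuming the band arrays have already been read from file and using the common 1e4 Sentinel-2 reflectance scale:

```python
import numpy as np

def to_rgb_composite(b4, b3, b2, reflectance_scale=10000.0):
    """Stack Sentinel-2 bands B4/B3/B2 (red, green, blue) into an
    H x W x 3 image and clip to [0, 1]; the 1e4 scale is the usual
    Sentinel-2 surface-reflectance convention."""
    rgb = np.stack([b4, b3, b2], axis=-1).astype(np.float32)
    return np.clip(rgb / reflectance_scale, 0.0, 1.0)

# Toy 2x2 example with synthetic digital numbers
b4 = np.array([[1000, 2000], [3000, 12000]])
b3 = np.zeros((2, 2))
b2 = np.ones((2, 2)) * 500
rgb = to_rgb_composite(b4, b3, b2)
print(rgb.shape)  # (2, 2, 3)
```

In practice, such composites would then be tiled into the 1024 × 1024 pixel samples described in the dataset section.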
# Comment 2-3:
It would be appropriate to indicate on which server it is possible to download the satellite images used Geofen-2, and in general for all images.
Response: The Gaofen-2 image used in this research was obtained from the data-sharing website of the Aerospace Information Research Institute of the Chinese Academy of Sciences (http://ids.ceode.ac.cn/gfds/query), which requires registration and approval. Sentinel-2 data was downloaded from Google Earth Engine, and Google data was downloaded through Google Images API.
# Comment 2-4:
Some cited references are not related to the study.
Response: We are sorry for our carelessness; we have carefully proofread the full reference list. The references were updated and some of them were corrected.
Reviewer 3 Report
The authors proposed a composite strategy for PV panel segmentation that addresses multi-scale issues and focuses on detail features. The results of the study demonstrate that they selected the best-performing HRNet-based framework.
However, I have the following main concerns:
- Introduction: Further elaboration is needed on the rationale behind the authors' proposal of a strategy based on HRNet. Are there any research studies or experimental results that demonstrate the superiority of HRNet-based methods over other deep learning-based approaches?
- Dataset: Did the authors take into account very high-resolution (VHR) satellite images at a half-meter level, such as those from the WorldView series? If so, why were they not included in the study?
- Data for DL: Could the authors please clarify why there are no validation and test sets available for Google-14/16, and why there are no training sets for Gaofen-2 in the DL data used in the study?
- Methodology: Could the authors provide more information on the process used to select the best-performing FEPVNet in section 3.1 of the methodology?
- Results: In the introduction section, the authors assert that DL-based methods have achieved success in object detection and segmentation, which I concur with. However, why wasn't the proposed method compared with any SOTAs?
Author Response
Reviewer #3:
The authors proposed a composite strategy for PV panel segmentation that addresses multi-scale issues and focuses on detail features. The results of the study demonstrate that they selected the best-performing HRNet-based framework.
Response: Thank you for taking the time to read and review our manuscript and thank you for your positive comments!
# Comment 3-1:
Introduction: Further elaboration is needed on the rationale behind the authors' proposal of a strategy based on HRNet. Are there any research studies or experimental results that demonstrate the superiority of HRNet-based methods over other deep learning-based approaches?
Response: Thank you for your suggestion. We have further improved the introduction section of the manuscript to explain why we chose the HRNet-based strategy in detail.
To illustrate the superiority of HRNet, we added more references to the introduction section:
“This study examined the current mainstream CNN models. Many researchers have compared the U-Net, DeepLabv3+, PSPNet, and HRNet models on the PASCAL VOC 2012 dataset, and HRNet achieved the best performance [38,39]. Therefore, we selected HRNet as the base model and embedded Canny, Median filter, and Polarized Self-Attention (PSA) to design an adaptive FEPVNet.” (page 2, line 82-87)
# Comment 3-2:
Dataset: Did the authors take into account very high-resolution (VHR) satellite images at a half-meter level, such as those from the WorldView series? If so, why were they not included in the study?
Response: Thank you for your questions! In our study, we considered remote sensing images of different resolutions but did not include Very High-Resolution (VHR) satellite images such as the WorldView series, because we did not have access to WorldView images, especially at a very large scale.
We introduced the data and its usage in Section 2 (Dataset), as follows:
“To construct the cross-scale network model, four types of images are required: Sentinel-2 images at 10 m resolution, which are available for download via Google Earth Engine (GEE); Google-14 (i.e., zoom level 14) images at 10 m resolution and Google-16 (i.e., zoom level 16) images at 2 m resolution, both of which can be downloaded through the Google Images API; and Gaofen-2 images at 2 m resolution, which can be downloaded from the Data Sharing Website of the Aerospace Information Research Institute, Chinese Academy of Sciences. Therefore, we first validated the FEPVNet performance using the Sentinel-2 images, then constructed three data migration strategies using the Sentinel-2 and Google images, and finally completed the PV extraction from the Gaofen-2 images.” (page 3, line 98-106)
# Comment 3-3:
Data for DL: Could the authors please clarify why there are no validation and test sets available for Google-14/16, and why there are no training sets for Gaofen-2 in the DL data used in the study?
Response: Thank you for your question. We apologize for the unclear description of the images used in this manuscript. Extracting photovoltaic panels from Gaofen-2 images with a model trained directly on Sentinel-2 images results in poor performance. To address this issue, we proposed a data migration strategy using Google images to transfer the Sentinel-2 based model to Gaofen-2 images. We used Google images as training data, while Gaofen-2 images were used as the validation and testing datasets. Therefore, no validation or testing datasets were required for the Google images, and no training dataset was required for the Gaofen-2 images.
We have provided detailed description in Section 2 (Dataset), as follows:
“The dataset was divided into three parts: a training set, a validation set, and a test set. The results were poor when training the model on Sentinel-2 images and directly extracting PV from Gaofen-2 images. Therefore, we considered combining multiple PV features to complete the transfer of the Sentinel-2 model. We aimed to utilize images from Sentinel-2 and Google at different resolutions to perform cross-scale PV extraction on Gaofen-2 imagery without using Gaofen-2 imagery to train the model. As a result, only the training set of Google images was needed. For model training, we used Gaofen-2 imagery as the validation set and test set. Thus, Google images did not require validation and test sets, and Gaofen-2 imagery did not need a training set.” (page 3, line 114-123)
And we explained the construction of our model using data migration strategies in section 3.2, as follows:
“To enhance the generalizability of the Sentinel-2 image model on multi-source images and facilitate its migration to high-resolution image models, we compared four methods illustrated in Figure 1(d), which include three image migration strategies. These methods consist of training the model with the Sentinel-2 dataset alone, mixing Sentinel-2 images with Google-14 images in a 1:1 ratio to form the training dataset, mixing Sentinel-2 images with Google-16 images in a 1:1 ratio to form the training dataset, and mixing Sentinel-2 images with Google-14 and Google-16 images in a 1:1:2 ratio to form the training dataset. We then used these methods to extract PV from the Gaofen-2 image.” (page 9, line 295-303)
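The mixing ratios described above amount to a simple list-assembly step before training. A minimal sketch, with hypothetical helper and file names:

```python
from itertools import cycle, islice
import random

def mix_datasets(sentinel2, google14=None, google16=None,
                 ratio=(1, 1, 2), seed=0):
    """Assemble a mixed training list in the given source ratio
    (hypothetical helper). The ratio is counted in units of the
    Sentinel-2 pool size; a pool is cycled if it is too small,
    and skipped when absent or weighted zero."""
    mixed = []
    for pool, r in zip((sentinel2, google14, google16), ratio):
        if pool and r:
            mixed.extend(islice(cycle(pool), r * len(sentinel2)))
    random.Random(seed).shuffle(mixed)
    return mixed

s2 = [f"s2_{i}.tif" for i in range(4)]
g14 = [f"g14_{i}.tif" for i in range(4)]
g16 = [f"g16_{i}.tif" for i in range(4)]
train = mix_datasets(s2, g14, g16, ratio=(1, 1, 2))
print(len(train))  # 16 tiles: 4 Sentinel-2 + 4 Google-14 + 8 Google-16
```

The 1:1 strategies correspond to ratios (1, 1, 0) and (1, 0, 1) under the same scheme.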
# Comment 3-4:
Methodology: Could the authors provide more information on the process used to select the best-performing FEPVNet in section 3.1 of the methodology?
Response: Thank you for your suggestion. In selecting the best-performing FEPVNet, we used multiple metrics, including accuracy, recall, F1-score, among others. We conducted extensive experiments on different improved models to determine the optimal position of each improvement component and combined them to construct FEPVNet.
We provided a detailed description in section 3.1 on how we determined the structure of FEPVNet to achieve better accuracy.
“Several modifications were made to improve the HRNet model, including adding high-low pass filtering, polarized parallel attention, and deep separable convolution. Four different stem networks were constructed: LG_stem, which combines Laplacian and Gaussian filters, SG_stem, which combines Sobel and Gaussian filters, CG_stem, which combines Canny and Gaussian filters, and CM_stem, which combines Canny and Median filters. In addition, the Polarized Self-Attention-Residual (PAR), single depthwise separable (SDS) residual and Double Depthwise Separable (DDS) Residual blocks were constructed to replace the standard residual blocks at different stages of the HRNet main network. The performance of these modules was evaluated on Sentinel-2 images in terms of efficiency, Precision, Recall, F1-score, and Intersection over Union (IoU) to determine the best configuration for our model.” (page 4-5, line 147-157)
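The selection metrics named above are the standard pixel-level segmentation measures. A minimal sketch from confusion counts (the counts below are illustrative values, not results from the paper):

```python
def segmentation_metrics(tp, fp, fn):
    """Precision, Recall, F1-score, and IoU from pixel-level
    true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou

p, r, f1, iou = segmentation_metrics(tp=900, fp=50, fn=100)
print(round(p, 3), round(r, 3), round(f1, 3), round(iou, 3))
# 0.947 0.9 0.923 0.857
```

Note that IoU and F1 are monotonically related (IoU = F1 / (2 − F1)), so rankings by the two metrics agree; reporting both mainly aids comparison with other papers.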
# Comment 3-5:
Results: In the introduction section, the authors assert that DL-based methods have achieved success in object detection and segmentation, which I concur with. However, why wasn't the proposed method compared with any SOTAs?
Response: We did consider the SOTA methods that are widely used in the field of object detection and segmentation, and compared them with the U-Net and HRNet models in our research. However, our focus was on proposing a new composite strategy to better address the issue of multi-scale and detail features. Therefore, we placed emphasis on validating the performance of our proposed method. We will consider comprehensive comparison with SOTAs in the future.
We conducted experiments on each improvement component in the ablation experiment section (Section 4.1) to determine the optimal structure of FEPVNet. In Section 4.2, we compared the performance of U-Net, HRNet, FEPVNet, and FESPVNet. Both of our proposed models, FEPVNet and its lightweight version FESPVNet, outperformed U-Net and HRNet. To verify the performance of FEPVNet in cross-regional PV extraction, we completed cross-validation experiments between HRNet and FEPVNet in Section 4.3, further demonstrating the performance of FEPVNet.
Furthermore, considering that the performance of extracting PV from Gaofen-2 images at different scales was poor when the Sentinel-2 based models were used directly, we evaluated the PV extraction results of HRNet and FEPVNet on Gaofen-2 images using our proposed data migration strategy. This experiment demonstrates that the combination of FEPVNet and the data migration strategy enables cross-scale extraction of PV panels from multi-source images.
Round 2
Reviewer 3 Report
Thanks for the authors’ responses to my concerns. In my opinion, extensive effort is required for a qualified publication. I understand that the authors try to propose a strategy combination for solving multi-scale and detailed feature problems.
- Response 3-1: Gaps in proposing the composite method
The authors insist that they focus on settling the "issue of multi-scale and detail features" (Response 3-5), and claim that there is no SOTA on this issue. Then how do they get to know that the baseline HRNet could potentially solve the "new" situation? To my understanding, the extended contents "HRNet achieved the best performance" (line 89) are too limited to fill the gap. Besides, the logical deduction behind the proposal of the adaptive FEPVNet that embeds Canny, the median filter, and PSA is unconvincing.
- Response 3-2. VHR data
It is acceptable to not include half-meter level satellite imagery in this research. I still recommend them to explore further in their future work. To my knowledge, several satellite companies (i.e., MAXAR) encourage connections for academic research purposes.
- Response 3-3: clear enough
- Response 3-4: clear enough
- Response 3-5: Comparison with SOTAs is lacking
To my understanding, the authors try to propose a scheme to map PV panels. Then without comparing it with SOTAs, it is not acceptable to prove that this method is outstanding.
Author Response
Reviewer #3:
Thanks for the authors’ responses to my concerns. In my opinion, extensive effort is required for a qualified publication. I understand that the authors try to propose a strategy combination for solving multi-scale and detailed feature problems.
Response: Thank you for your positive comments and valuable suggestions on our work. We aim to propose an effective solution to the cross-scale mapping problem of PV panels. We have demonstrated through extensive experimentation that the proposed method improves the accuracy of PV panel extraction by introducing three modules (i.e., Canny, the Median filter, and PSA) into HRNet, and achieves better accuracy across different regions (i.e., China, the US) and scales (i.e., 10 m, 2 m). We have carefully considered your suggestions and improved the manuscript according to your inputs.
# Response 3-1-1: Gaps in proposing the composite method
The authors insist that they focus on settling "issue of multi-scale and detail features" (Response 3-5), and declaim that there is no SOTA on this issue. Then how do they get to know that the baseline HRNet could potentially solve the "new" situation? To my understanding, the extended contents "HRNet achieved the best performance" (line 89) are limited to filling the gap.
Response: Thank you for pointing this out. We are very sorry that we did not explain Response 3-5 well in the previous letter. In fact, HRNet is a SOTA method for semantic segmentation using convolutional neural networks according to the literature (Wang et al. 2021; Sun et al. 2019), and we already conducted a comparative analysis between HRNet and our model, FEPVNet, in the manuscript.
The experiments showed that the adaptive FEPVNet outperformed benchmark methods such as HRNet due to its superior ability to extract boundaries between adjacent PV panels, as demonstrated in Figure 10. As a result, FEPVNet achieved higher evaluation metrics (shown in Table 4), surpassing the SOTA model HRNet in both study regions.
“Table 4. Evaluation metrics for different main body networks.

| Region | Model | Recall | Precision | F1-score | IoU | Params | Flops |
|--------|-------|--------|-----------|----------|-----|--------|-------|
| China | U-Net | 0.4174 | 0.5316 | 0.4676 | 0.3052 | 31054344 | 64914029 |
| | HRNet | 0.9052 | 0.9489 | 0.9265 | 0.8631 | 65847122 | 374.51G |
| | FEPVNet | 0.9309 | 0.9493 | 0.9400 | 0.8868 | 65939858 | 376.34G |
| | SwinTransformer | 0.9309 | 0.9460 | 0.9384 | 0.8840 | 59,830,000 | 936.71G |
| | FESPVNet | 0.9246 | 0.9503 | 0.9373 | 0.8820 | 26066258 | 253.77G |
| US | U-Net | 0.8717 | 0.6224 | 0.7262 | 0.5702 | 31054344 | 64914029 |
| | HRNet | 0.9521 | 0.9595 | 0.9558 | 0.9153 | 65847122 | 374.51G |
| | FEPVNet | 0.9641 | 0.9695 | 0.9668 | 0.9358 | 65939858 | 376.34G |
| | SwinTransformer | 0.9591 | 0.9726 | 0.9658 | 0.9339 | 59,830,000 | 936.71G |
| | FESPVNet | 0.9567 | 0.9679 | 0.9623 | 0.9273 | 26066258 | 253.77G |

Figure 10. Prediction results for China and the US in different network models. Note: The prediction results of U-Net, HRNet, SwinTransformer, FESPVNet, and FEPVNet in the China and US regions are shown.” (page 14-15, line 396-400)
To further demonstrate the performance of FEPVNet, we conducted cross-validation using FEPVNet and HRNet on Sentinel-2 datasets from two different regions. According to the evaluation metrics in Table 5 and the prediction results in Figure 11, FEPVNet outperformed HRNet in extracting PV panels from different regions.
“Table 5. Area comparison predictive evaluation metrics.

| Region | Model | Recall | Precision | F1-score | IoU |
|--------|-------|--------|-----------|----------|-----|
| China | HRNet_US | 0.3755 | 0.9372 | 0.5362 | 0.3663 |
| | FEPVNet_US | 0.4645 | 0.9539 | 0.6248 | 0.4544 |
| US | HRNet_China | 0.8288 | 0.4869 | 0.6134 | 0.4424 |
| | FEPVNet_China | 0.6872 | 0.6221 | 0.6530 | 0.4848 |

Figure 11. Cross-validation results. Note: The prediction results of HRNet and FEPVNet for the China and US regions with different weight parameters are shown, respectively.” (page 16, line 421-424)
The cross-scale extraction of PV panels was compared among four methods, using the three data migration strategies proposed in this study. The results in both Table 6 and Figure 12 showed that FEPVNet with data migration strategies was more efficient for cross-scale PV panel extraction than the SOTA model.
“Table 6. Evaluation metrics for model migration prediction results.

| Model | Strategy | Recall | Precision | F1-score | IoU |
|-------|----------|--------|-----------|----------|-----|
| HRNet | Sentinel-2 | 0.2620 | 0.9216 | 0.4084 | 0.2563 |
| | Sentinel-2 + Google-14 | 0.3346 | 0.9036 | 0.4884 | 0.3231 |
| | Sentinel-2 + Google-16 | 0.8940 | 0.9162 | 0.9050 | 0.8265 |
| | Sentinel-2 + Google-14 + Google-16 | 0.8889 | 0.9269 | 0.9075 | 0.8308 |
| FEPVNet | Sentinel-2 | 0.2883 | 0.9083 | 0.4377 | 0.2801 |
| | Sentinel-2 + Google-14 | 0.6681 | 0.8724 | 0.7567 | 0.6086 |
| | Sentinel-2 + Google-16 | 0.8864 | 0.9437 | 0.9142 | 0.8419 |
| | Sentinel-2 + Google-14 + Google-16 | 0.9084 | 0.9192 | 0.9138 | 0.8413 |

Figure 12. Prediction results of two models with different data migration strategies. Note: The prediction results of the HRNet and FEPVNet models are shown for the four migration strategies on the Gaofen-2 data, respectively.” (page 17-18, line 446-450)
# Response 3-1-2: Besides, the logical deduction behind the proposal of the adaptive FEPVNet that embeds Canny, the median filter, and PSA is unconvincing.
Response: The Canny operator, Median filter, and PSA embedded in the adaptive FEPVNet greatly improve the performance of PV extraction because they capture more details of PV panels in remote sensing images. Moreover, PSA makes the network focus more on photovoltaic panel features and reduces the influence of background features, so we used it to construct the PAR module and achieved ideal performance.
The experimental results in Table 2, Table 3, and Figure 8 show that this logical deduction is reliable.
“Table 2. Evaluation metrics for different stem networks in China and the US.

| Region | Model | Recall | Precision | F1-score | IoU |
|--------|-------|--------|-----------|----------|-----|
| China | stem | 0.9052 | 0.9489 | 0.9265 | 0.8631 |
| | LG_stem | 0.8965 | 0.9336 | 0.9147 | 0.8428 |
| | SG_stem | 0.8830 | 0.9057 | 0.8942 | 0.8087 |
| | CG_stem | 0.9065 | 0.9226 | 0.9145 | 0.8425 |
| | CM_stem | 0.9315 | 0.9472 | 0.9393 | 0.8856 |
| US | stem | 0.9521 | 0.9595 | 0.9558 | 0.9153 |
| | LG_stem | 0.9498 | 0.9564 | 0.9531 | 0.9105 |
| | SG_stem | 0.9444 | 0.9443 | 0.9444 | 0.8946 |
| | CG_stem | 0.9541 | 0.9700 | 0.9620 | 0.9268 |
| | CM_stem | 0.9619 | 0.9691 | 0.9655 | 0.9333 |

Figure 8. Predicted results for different stem networks in China and the US. Note: For the adaptive improvement of the stem network, we completed five stem network comparison experiments: the original stem, LG_stem combining Laplacian and Gaussian, SG_stem combining Sobel and Gaussian, CG_stem combining Canny and Gaussian, and CM_stem combining Canny and Median.” (page 10-11, line 343-349)
“Table 3. Evaluation metrics for different main body networks.

| Region | Model | Recall | Precision | F1-score | IoU | Params | Flops |
|--------|-------|--------|-----------|----------|-----|--------|-------|
| China | PAR_stage2 | 0.9306 | 0.9423 | 0.9364 | 0.8805 | 65939858 | 376.34G |
| | PAR_stage3 | 0.9241 | 0.9372 | 0.9306 | 0.8702 | 67400786 | 385.47G |
| | PAR_stage4 | 0.9281 | 0.9481 | 0.9380 | 0.8833 | 70555922 | 385.45G |
| | DDS_stage2 | 0.9202 | 0.9369 | 0.9285 | 0.8665 | 65120210 | 355.52G |
| | DDS_stage3 | 0.9247 | 0.9149 | 0.9198 | 0.8515 | 53557586 | 260.13G |
| | DDS_stage4 | 0.9217 | 0.9265 | 0.9241 | 0.8590 | 28401362 | 259.82G |
| | SDS_stage2 | 0.9215 | 0.9366 | 0.9290 | 0.8675 | 65068946 | 354.14G |
| | SDS_stage3 | 0.9147 | 0.9388 | 0.9266 | 0.8633 | 52735058 | 252.09G |
| | SDS_stage4 | 0.9277 | 0.9380 | 0.9328 | 0.8742 | 25973522 | 251.93G |
| US | PAR_stage2 | 0.9422 | 0.9655 | 0.9537 | 0.9116 | 65939858 | 376.34G |
| | PAR_stage3 | 0.9318 | 0.9605 | 0.9459 | 0.8975 | 67400786 | 385.47G |
| | PAR_stage4 | 0.9467 | 0.9615 | 0.9540 | 0.9122 | 70555922 | 385.45G |
| | DDS_stage2 | 0.9409 | 0.9650 | 0.9528 | 0.9099 | 65120210 | 355.52G |
| | DDS_stage3 | 0.9361 | 0.9617 | 0.9487 | 0.9025 | 53557586 | 260.13G |
| | DDS_stage4 | 0.9410 | 0.9582 | 0.9495 | 0.9039 | 28401362 | 259.82G |
| | SDS_stage2 | 0.9439 | 0.9609 | 0.9523 | 0.9090 | 65068946 | 354.14G |
| | SDS_stage3 | 0.9417 | 0.9690 | 0.9552 | 0.9142 | 52735058 | 252.09G |
| | SDS_stage4 | 0.9412 | 0.9633 | 0.9521 | 0.9087 | 25973522 | 251.93G |
” (page 12-13, line 372-373)
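The high/low-pass stem idea defended above can be illustrated with a toy sketch. Note this uses a Laplacian as the high-pass stand-in for full Canny (which additionally involves gradient thresholding, non-maximum suppression, and hysteresis), while the median low-pass matches the CM_stem description; the function name is ours:

```python
import numpy as np
from scipy import ndimage

def stem_features(img):
    """Toy high/low-pass stem: a Laplacian high-pass (edge
    response) plus a 3x3 median low-pass (noise suppression),
    stacked as extra channels alongside the raw image."""
    img = img.astype(np.float32)
    high = ndimage.laplace(img)            # high-pass: responds at panel edges
    low = ndimage.median_filter(img, 3)    # low-pass: suppresses speckle noise
    return np.stack([img, high, low], axis=0)

img = np.zeros((8, 8), dtype=np.float32)
img[2:6, 2:6] = 1.0  # a bright square, standing in for a PV panel
feats = stem_features(img)
print(feats.shape)  # (3, 8, 8)
```

The high-pass channel is zero on flat regions and nonzero only at the square's boundary, which is the "boundary between adjacent PV panels" behavior the response appeals to.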
# Response 3-2. VHR data
It is acceptable to not include half-meter level satellite imagery in this research. I still recommend them to explore further in their future work. To my knowledge, several satellite companies (i.e., MAXAR) encourage connections for academic research purposes.
Response: Thank you very much for your suggestions; we quite agree with you! It is very important to use high-resolution images in future research, and more and more data resources are becoming available for academic purposes.
# Response 3-5: Comparison with SOTAs is lacking
To my understanding, the authors try to propose a scheme to map PV panels. Then without comparing it with SOTAs, it is not acceptable to prove that this method is outstanding.
Response: Thank you for your comment! We are very sorry that we did not explain this well in the previous letter. In Response 3-1, we explained that HRNet is one of the SOTA models and that we compared it with our model. Furthermore, following your comment, we have added one more SOTA model, SwinTransformer, to our manuscript for comparative analysis, as shown in Table 4 and Figure 10.
“Table 4. Evaluation metrics for different main body networks.

| Region | Model | Recall | Precision | F1-score | IoU | Params | Flops |
|--------|-------|--------|-----------|----------|-----|--------|-------|
| China | U-Net | 0.4174 | 0.5316 | 0.4676 | 0.3052 | 31054344 | 64914029 |
| | HRNet | 0.9052 | 0.9489 | 0.9265 | 0.8631 | 65847122 | 374.51G |
| | FEPVNet | 0.9309 | 0.9493 | 0.9400 | 0.8868 | 65939858 | 376.34G |
| | SwinTransformer | 0.9309 | 0.9460 | 0.9384 | 0.8840 | 59,830,000 | 936.71G |
| | FESPVNet | 0.9246 | 0.9503 | 0.9373 | 0.8820 | 26066258 | 253.77G |
| US | U-Net | 0.8717 | 0.6224 | 0.7262 | 0.5702 | 31054344 | 64914029 |
| | HRNet | 0.9521 | 0.9595 | 0.9558 | 0.9153 | 65847122 | 374.51G |
| | FEPVNet | 0.9641 | 0.9695 | 0.9668 | 0.9358 | 65939858 | 376.34G |
| | SwinTransformer | 0.9591 | 0.9726 | 0.9658 | 0.9339 | 59,830,000 | 936.71G |
| | FESPVNet | 0.9567 | 0.9679 | 0.9623 | 0.9273 | 26066258 | 253.77G |

Figure 10. Prediction results for China and the US in different network models. Note: The prediction results of U-Net, HRNet, SwinTransformer, FESPVNet, and FEPVNet in the China and US regions are shown.” (page 14-15, line 396-400)