Article
Peer-Review Record

Improving Seismic Fault Recognition with Self-Supervised Pre-Training: A Study of 3D Transformer-Based with Multi-Scale Decoding and Fusion

Remote Sens. 2024, 16(5), 922; https://doi.org/10.3390/rs16050922
by Zeren Zhang †, Ran Chen † and Jinwen Ma *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 February 2024 / Revised: 28 February 2024 / Accepted: 4 March 2024 / Published: 6 March 2024
(This article belongs to the Special Issue Remote Sensing for Geology and Mapping)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The paper has been improved from the first submission. However, I think that there are still a couple of points to be considered:

The applicability of the method to real case studies and real applications.

The use of the method proposed herein to improve the seismic resilience of systems such as infrastructure and buildings. Please refer to the relevant literature (see attached).


Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors


1. This paper studies seismic fault recognition with self-supervised pre-training. The authors use both synthetic and actual unlabelled seismic datasets and apply an innovative pre-training strategy to improve recognition accuracy. The results show that the approach is effective and that accuracy is improved compared to existing methods.

2. The paper is generally well written and easy to read. Some clarifications, though, need to be made, as mentioned below.

3. Please provide more detailed information on the field data used and summarize their features.

4. What features are used to recognize and identify the seismic faults?

5. Although the authors try to improve the accuracy with various strategies, what is the minimum amount of data required to ensure the desired training outcome under different field conditions?

6. Define P in Equation 9 and explain why the value 0.7 is used. Is it based on experience or on analysis?

7. There are some grammatical errors in the paper; for example, in Line 287, "...intricately.Swin-UNETR builds..." should have a space between the two sentences. The authors are recommended to carefully proofread the paper before the next submission.
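For context on point 6: in SimMIM-style masked-modeling pre-training, the parameter queried is most plausibly the masking ratio, i.e. the fraction of patches hidden during pre-training. The following minimal NumPy sketch (a hypothetical helper, not the authors' code) illustrates what a ratio of 0.7 means in practice:

```python
import numpy as np

def random_mask(volume_shape, mask_ratio=0.7, patch=8, seed=0):
    """Randomly mask a fraction of non-overlapping patches of a 3D volume,
    as in SimMIM-style pre-training.

    `mask_ratio` plays the role the review attributes to P in Eq. (9):
    0.7 means roughly 70% of patches are hidden and must be reconstructed.
    """
    rng = np.random.default_rng(seed)
    # Number of patches along each axis of the 3D volume.
    grid = tuple(s // patch for s in volume_shape)
    n = int(np.prod(grid))
    k = int(round(mask_ratio * n))
    # Boolean mask: True marks a patch that will be hidden.
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask.reshape(grid)

m = random_mask((64, 64, 64), mask_ratio=0.7)
print(m.mean())  # close to 0.7: about 70% of the 8x8x8 patch grid is masked
```

Whether 0.7 is an empirical choice (as in the original SimMIM ablations) or derived analytically is exactly what the reviewer asks the authors to clarify.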

Comments on the Quality of English Language

Minor editing is needed.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

The paper proposes a very efficient and novel approach to fault recognition in a 3D framework. Though interesting, in the reviewer's opinion some amendments are necessary, according to the following:

1. The grammar must be checked, and some remaining typos must be corrected.

2. It is suggested to avoid the use of "etc.", especially in the abstract. Provide complete and clear statements.

3. The introduction is very long and contains some repetitions across the three subsections. It is suggested to shorten it and get to the real point of the discussion.

4. Section 2.2: in Eq. (1) it is not clear at all what X is. This carries over to Fig. 6, where the values reported along the vertical axis cannot be understood. For instance, what does 12.5 mean? What do negative values mean?

5. Fig. 5a must be described better, given the claimed 0-1 distribution.

6. Line 240: concerning the reported data augmentation, can the authors show its positive impact? According to the reviewer's expertise, this impact is often marginal when very tiny objects have to be detected, even in 2D images. Hence, all in all, the augmented dataset does not provide any real improvement in terms of feature selection/detection.

7. Section 3: Figs. 7 and 8 do not seem enough to let readers understand the details of the offered solution. It is suggested to add a more detailed mathematical description, possibly in an Appendix, to avoid making the text too fragmented here.

8. Lines 369-374: regarding the "degree of universality", can it be shown concretely?

9. Line 400: the reference here should probably be to Eq. (5), not Eq. (4). Eqs. (5)-(7) should also be described in more detail.

10. Eq. (8): can it be shown how the indices vary with the mentioned threshold values?
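To illustrate the reviewer's point 10: segmentation indices such as precision and recall depend on the threshold used to binarize the predicted fault-probability map. A minimal sketch with hypothetical toy values (unrelated to the paper's data) shows how sweeping the threshold moves the metrics:

```python
import numpy as np

# Toy fault-probability map and ground truth (hypothetical values,
# only to illustrate the threshold dependence of the indices).
prob = np.array([0.1, 0.4, 0.6, 0.8, 0.9, 0.3])
gt   = np.array([False, False, True, True, True, False])

def precision_recall(prob, gt, thr):
    """Binarize `prob` at `thr` and compute precision/recall against `gt`."""
    pred = prob >= thr
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

for thr in (0.3, 0.5, 0.7):
    p, r = precision_recall(prob, gt, thr)
    print(f"thr={thr}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold typically trades recall for precision, which is why the reviewer asks the authors to report the indices across the mentioned threshold values rather than at a single operating point.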

Comments on the Quality of English Language

The grammar must be checked, and some remaining typos must be corrected.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The paper is now ready for acceptance

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

Done a good job, and the paper can now be accepted.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors conducted SimMIM pre-training on a large amount of field 3D seismic data using the 3D Swin-Transformer backbone. For fine-tuning, they improved the structure of the Swin-UNETR model by fusing three different decoders. Self-supervised pre-training was performed on the Thebe public dataset and fine-tuning on the synthetic dataset. However, the reviewer thinks that the proposed method mainly applies previously published work and offers limited innovation, and that the dataset used for validation is insufficient. Improvements can be made as follows:

1. The topic is too broad, and it is recommended to revise it. For example, most current self-supervised pre-training methods are Transformer-based self-supervised pre-training models, such as MAE and SimMIM. Swin-UNETR, on the other hand, is a published network structure.

2. The references related to self-supervision still need to be improved; they are currently insufficient, and many of the currently cited works are not closely related to the self-supervision subject of this paper.

3. There needs to be some innovation in the methodology, not just the application of existing methods. What is the contribution of this paper compared to Ref. 43?

4. A lot of the introduction is devoted to work that has already been published, such as Sections 2.1.1 and 2.1.2; it is recommended to reduce it and highlight your own work.

5. It is recommended to use more datasets for validation. Where appropriate, you can contribute to the dataset.

6. Which of the methods in Table 2 are self-supervised? It is recommended that the comparison be made only with self-supervised approaches and not with non-self-supervised methods. The comparison also needs to include advanced self-supervised methods other than SimMIM.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper investigates an interesting topic, namely FaultSeg Swin-UNETR, a Transformer-based self-supervised pretraining model for fault recognition. The methodology is pertinent and the English is fine. However, there are several issues to be considered.

Introduction
The novelties need to be defined in order to support the originality of the paper.

This sentence needs to be explained:

"Additionally, 2D or 2.5D approaches cannot fully capture faults’ crucial three-dimensional spatial features as 3D methods can"

A discussion on the importance of the described technologies for the seismic resilience is needed.

Section 2

The authors just wrote that they observe a significant difference in parameter size between the backbone and the decoding head of Swin-UNETR, and further discover that directly using the entire segmentation network for SimMIM self-supervised pre-training can achieve better results. However, they need to expand this part.

Figure 1 needs to be more clear.

Section 3

The authors just wrote: "Considering the practical applicability of seismic fault detection models, our approach primarily relies on seismic data obtained from actual working areas." Please explain.

It is not clear why the kurtosis of the Thebe dataset is notably higher than that of the synthetic and field data used for pre-training. Please explain.

Figure 9 needs to be more clear.

Conclusion
This part reads more like a discussion. Please reorganize it into limitations, applications, and future work.
