Landslide Susceptibility Prediction Based on High-Trust Non-Landslide Point Selection
Round 1
Reviewer 1 Report
Thank you for allowing me to review your paper “Landslide susceptibility prediction based on high trust non-landslide point selection”.
The topic is interesting but not fully clear, and major revisions must be made before publication.
The main issue with this study is that it uses "high trust non-landslide" points, which raises two main questions: 1. Why not use all non-landslide data, since any selection of non-landslide points will introduce bias? 2. If your landslide inventory data is correct, then all other points are high trust non-landslide points. So why is the study looking for high trust non-landslide points?
I should mention the manuscript is well structured. However, some other major revisions are required.
Abstract
"Landslide susceptibility prediction has the disadvantages of challenging in expanding landslide samples and low accuracy of a subjective random selection of non-landslide samples..." I have only seen this claim in a few studies (whose approach I personally do not really agree with). Several other works have used all the data instead of a sample.
Introduction
"Landslide is a complex geological phenomenon. It talks about how the rock mass on the slope is affected by rainwater soaking and artificial factors and slides down into the action of gravity. It is likewise the most common geological disaster in the world. Landslides cause severe casualties and economic losses every year, seriously restricting the economic development of some regions. In many areas, disasters have hindered the development of cities and become a barrier to poverty alleviation in various countries" All these sentences have only one reference, which is unusual for this section. Moreover, the reference (Liu,T., Tan,J.M., Guo,F., et al., 2021) is not fully appropriate either; these definitions were not first provided in 2021 or by these authors. I recommend that the introduction be rewritten by one of the authors who is more familiar with the landslide literature, including the current important works. This is up to the authors, but for such strong definitions, the works of Guzzetti are usually referenced. For any susceptibility mapping study, inventory generation (in this case, landslide detection) must be discussed first. For machine learning models and their application to landslides, "Evaluation of different machine learning methods and deep-learning convolutional neural networks for landslide detection" is one of the most important works, and among current deep learning-based studies, the comprehensive work "Landslide4Sense: reference benchmark data and deep learning models for landslide detection" should be considered.
Line 78, "Semi-supervised learning has been widely used in sample data analysis and evaluation [16-18]." What about the application of semi-supervised learning in the landslide domain? I am not sure the authors have checked the relevant literature. There are even unsupervised deep learning studies in the landslide domain.
The text in Figures 1 and 3 is not readable.
Modeling of landslide susceptibility assessment
Semi-supervised learning framework construction
Add some explanation of the semi-supervised learning framework you applied and reduce the general information provided.
Susceptibility index distribution
Explain why this information is important in this work.
Author Response
Dear reviewer:
Thank you for reviewing my manuscript. This is my first time writing an article, and it has many shortcomings. Thank you for your valuable comments and suggestions, which have greatly benefited me. This will be a valuable experience for my future academic career. Below are my responses to your questions and requests.
- Why not use all non-landslide data? As any selection of non-landslide points will be associated with bias.
In machine learning prediction, training data is needed to inform the forecast. If all non-landslide points were selected, the following question would arise: must points without landslides be points of low landslide susceptibility? For example, at an intersection, a man crosses the road against a red light. Luckily, he is not hit by a car. But can we conclude from this that running a red light has a low probability of causing an accident? The same is true for non-landslide points. Although no landslide is recorded at such a point in the inventory, we cannot confirm that it is a low-risk point for landslides. What machine learning training data needs are points of low landslide susceptibility. Therefore, we cannot select all non-landslide points as training data.
- If your landslide inventory data is correct, then all others will be high trust non-landslide points. So why the study is looking for high trust non-landslide points?
The landslide inventory data is correct, but it only confirms that landslides occurred at certain places; it cannot ensure that the remaining sites without landslides are low-susceptibility points. Therefore, I need to select high-confidence non-landslide points as training data. Because landslide samples in the study area were too rare, the same method was used to expand the set of high-confidence landslide points, so that the ratio of landslide to non-landslide points in the training data of the machine learning model remains 1:1 and the model's prediction accuracy is preserved.
- Several other works used all data instead of some samples. I think this is the innovation of this paper. The comparison model PSO-ELM used in the conclusions does not use the selected high-trust data as training data. The results show that areas that should have been high-risk for landslides were predicted to be low-risk. This is precisely because areas that are highly prone to landslides were included in training as non-landslide points, since no landslides had occurred there yet, resulting in prediction errors.
- The bibliographic citations in the introduction were inappropriate; following your suggestion, the opening has been rewritten and references have been added. Content related to landslide detection has also been added to the introduction, citing newer research and literature on landslide detection to broaden its academic scope. Existing research and results on semi-supervised learning frameworks for landslides have also been added.
- The text in Figures 1 and 3 is not readable.
I am not sure whether the images were blurred; my local copies are clear, so the problem may have been caused by the upload. I have converted the images to PNG format and uploaded them again; please check whether this solves the problem.
- Add some explanation about your applied Semi-supervised learning framework and reduce the general information provided.
In the semi-supervised learning framework module, I added an explanation of how the framework operates and of some of the figures.
- Susceptibility index distribution: explain why this information is important in this work.
I have added the following corresponding explanation to the text.
The distribution of the susceptibility index shows directly how many points fall within each susceptibility index interval in the study area. In practice, landslide sites occupy a much smaller area than non-landslide sites. Therefore, when judging a model's performance in predicting landslide susceptibility, attention focuses on the proportion of very-low-prone and low-prone areas: the larger this proportion, the better the model's ability to identify landslides. The distribution of the susceptibility index makes the proportion of very-low-prone and low-prone areas for each model directly visible, and thus better reflects the model's prediction performance.
Thank you again for reviewing my manuscript and for your comments and suggestions. They have been very helpful to me.
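As a rough illustration of the zone proportions described above, the following minimal Python sketch bins susceptibility indices into five zones and reports the share of points in each. The equal-interval thresholds, the zone names, and the function name `zone_proportions` are hypothetical assumptions, not the manuscript's classification.

```python
import numpy as np

# Hypothetical equal-interval zone boundaries; the manuscript's actual
# thresholds are not stated in this response.
ZONE_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ZONE_NAMES = ["very low", "low", "moderate", "high", "very high"]


def zone_proportions(index, edges=ZONE_EDGES, names=ZONE_NAMES):
    """Return the share of points whose susceptibility index falls in each zone."""
    index = np.asarray(index, dtype=float)
    # np.histogram uses half-open bins, except the last bin, which includes 1.0.
    counts, _ = np.histogram(index, bins=edges)
    return dict(zip(names, counts / index.size))


props = zone_proportions([0.05, 0.1, 0.15, 0.3, 0.9])
low_share = props["very low"] + props["low"]  # the share the response focuses on
```

A larger `low_share` for a model corresponds to the "larger proportion of very-low-prone and low-prone areas" that the response uses as a sign of better discrimination.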
Kind regards,
Yizhun Zhang
Author Response File: Author Response.doc
Reviewer 2 Report
Dear authors,
Please revise some minor mistakes:
Line 65: please delete the phrase "is challenging to"; it is wordy. Change it to "changes".
Line 69: The phrase "particle swarm optimization extreme learning machine algorithm" appears to be a confusing noun string. Consider rewriting the sentence for clarity.
Line 86: please delete "which is"
Line 147: all equations need to be referred to in the relevant text.
Line 215: Figure, not Fig (please check the format)
Line 392-396: This sentence is unclear. Please split it into two sentences or delete some unnecessary words.
Line 429-433: This sentence is also unclear; it combines too many words. Please split it into two sentences.
The other parts are well written.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.doc
Reviewer 3 Report
1. The English contains some improper statements and illustrations. The introduction needs to refer to some papers on PSO geo-analysis.
2. Figure quality is poor. For instance, Figure 2 and Figure 3 are very blurry.
3. Figure 4 is useless.
4. ROC curve precision analysis is too basic to present.
5. Figure 6 needs more information
6. I cannot understand what Figure 8 means.
Author Response
Dear reviewer:
Thank you for reviewing my manuscript. This is my first time writing an article, and it has many shortcomings. Thank you for your valuable comments and suggestions, which have greatly benefited me. This will be a valuable experience for my future academic career. Below are my responses to your questions and requests.
- The introduction part needs to refer to some papers for PSO geo-analysis.
I have rewritten the introduction, although I was not sure what "PSO geo-analysis" refers to; I hope the revision meets your criteria.
- Figure quality is poor. For instance, Figure 2 and Figure 3 are very blurry.
My local copies of the images are clear, so the problem may have been caused by the upload. I have converted the images to PNG format and uploaded them again; please check whether this solves the problem.
- Figure 4 is useless.
I intended Figure 4 to give readers who are unfamiliar with ELM an impression of the model. If you still consider it unnecessary, I will delete it in subsequent versions.
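For readers unfamiliar with ELM, the kind of model Figure 4 illustrates can also be summarized in a few lines. The following is a minimal Python sketch of a basic single-hidden-layer ELM, assuming a sigmoid activation and randomly drawn input weights; the function names and the hidden-layer size are hypothetical and this is not the authors' implementation. In PSO-ELM variants, PSO is commonly used to search for the input weights and biases instead of drawing them at random; that step is omitted here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, y, n_hidden=20, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases (never trained)
    H = sigmoid(X @ W + b)                           # hidden-layer output matrix
    # Output weights: Moore-Penrose pseudo-inverse solution of H @ beta = y.
    beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
    return W, b, beta


def elm_predict(model, X):
    """Return continuous outputs; for susceptibility these act as scores."""
    W, b, beta = model
    return sigmoid(np.asarray(X, dtype=float) @ W + b) @ beta
```

The only trained quantity is `beta`, obtained in closed form, which is why ELM is fast and why an optimizer such as PSO has room to improve on the random choice of `W` and `b`.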
- ROC curve precision analysis is too basic to present.
I have added more ROC-related analysis to the text.
As shown in Figure 12, the model's prediction accuracy is evaluated by the area under the ROC curve (AUC). The AUCs of SS-PSO-ELM, SS-ELM, PSO-ELM, and ELM were 0.893, 0.867, 0.788, and 0.710, respectively. Judging from the curves, the SS-PSO-ELM and SS-ELM models give better results and show good landslide susceptibility prediction performance. However, the SS-ELM model is less stable, and its curve rises slowly in the later part. Its prediction performance is less stable than that of the SS-PSO-ELM model, which shows that the extreme learning machine optimized by the particle swarm optimization algorithm predicts landslide susceptibility with higher accuracy and stability. The AUC of the SS-PSO-ELM model is 0.105 higher than that of the PSO-ELM model without the semi-supervised learning framework (0.893 versus 0.788). This shows that using a semi-supervised learning framework to screen high-confidence non-landslide points can significantly improve landslide susceptibility prediction.
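The AUC values quoted above can be read as the probability that a randomly chosen landslide point receives a higher predicted susceptibility than a randomly chosen non-landslide point. The minimal Python sketch below computes AUC in that rank-based (Mann-Whitney) form, assuming binary labels (1 = landslide) and continuous scores; the function name `roc_auc` is hypothetical, and this is illustrative rather than the authors' evaluation code.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC = P(score of a landslide point > score of a non-landslide point),
    with ties counted as one half (Mann-Whitney formulation)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of landslide points
    neg = scores[~labels]   # scores of non-landslide points
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both landslide and non-landslide samples")
    # Compare every positive with every negative; fine for small validation sets.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise comparison is quadratic in the number of points; for full study-area rasters, a sort-based implementation or a library routine would be preferable.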
- Figure 6 needs more information
I have added more information about Figure 6 in the text.
Figure 6(a) is the cluster center selection diagram (decision graph): the abscissa is the density of each data point, and the ordinate is the distance from the point to the nearest point of higher density. The density peak clustering algorithm selects points that have a high density and no higher-density point nearby as cluster centers. Therefore, according to Figure 6(a), points 489, 324, 367, 455, and 388 were selected as the cluster centers. As shown in Figure 6(b), the remaining points are assigned to these five cluster centers, so that all the data are divided into five categories. The cross symbols in Figure 6(b) indicate the positions of the five cluster centers.
- I cannot understand what Figure 8 means.
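The decision-graph procedure described for Figure 6 follows the general density peak clustering idea: each point gets a local density `rho` and a distance `delta` to its nearest denser point, points with both large `rho` and large `delta` become centers, and every other point joins the cluster of its nearest denser neighbour. The minimal Python sketch below assumes a cutoff-kernel density with a hypothetical cutoff distance `dc` and picks centers automatically by the largest `rho * delta`; the manuscript instead selected the five centers from the decision graph, so this selection rule is an assumption.

```python
import numpy as np


def density_peaks(X, dc, n_centers):
    """Density peak clustering sketch; returns rho, delta, centers, labels."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (dist < dc).sum(axis=1) - 1          # neighbours within dc (self excluded)
    order = np.argsort(-rho, kind="stable")    # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:                          # densest point: use its largest distance
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]                  # all points ranked as denser
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Hypothetical automatic rule: take the points with the largest rho * delta.
    centers = list(np.argsort(-(rho * delta), kind="stable")[:n_centers])
    if order[0] not in centers:                # densest point has no denser neighbour,
        centers[-1] = order[0]                 # so it must be a center
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:                            # denser points are labelled first
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, centers, labels


pts = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1],
       [5, 5], [5, 5.1], [5.1, 5], [5.1, 5.1]]
rho, delta, centers, labels = density_peaks(pts, dc=0.5, n_centers=2)
```

Plotting `rho` against `delta` reproduces the kind of decision graph shown in Figure 6(a); the two-blob example yields one center per blob.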
I have supplemented the text with more information and explanations about Figure 8.
Mutual information measures the amount of information one random variable contains about another. Therefore, the higher the mutual information, the closer the relationship between the two variables. Figure 8 shows the mutual information between the environmental factors. The mutual information between each environmental factor and landslides in Figure 8 shows that slope aspect has the largest mutual information with landslides, with a value of 0.86. However, in the final weights shown in Figure 9, the influence of slope aspect on landslides ranks second. This is because slope aspect has high mutual information not only with landslides but also with other environmental factors; if slope aspect were given too high a weight as an input for landslide prediction, it would introduce more redundancy and larger prediction errors. Therefore, in the final weights calculated by the maximum relevance minimum redundancy (mRMR) algorithm, the weight of slope aspect is lower than that of elevation. This shows that it is feasible to calculate the importance of landslide factors with the mRMR algorithm.
Thank you again for reviewing my manuscript and for your comments and suggestions. They have been very helpful to me.
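The trade-off described above, where the factor with the highest relevance can still receive a lower weight because it is redundant with other factors, can be illustrated with a simplified mRMR-style score: mutual information with landslides minus the mean mutual information with the other factors. This Python sketch is an assumption, not the manuscript's weighting formula (which is not given here), and apart from the 0.86 value for slope aspect, the mutual information values in the example are made up for illustration.

```python
import numpy as np


def mrmr_scores(relevance, redundancy):
    """Simplified (non-greedy) mRMR-style score for each factor:
    relevance to the target minus mean redundancy with the other factors."""
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    n = relevance.size
    off_diag = ~np.eye(n, dtype=bool)          # ignore each factor's MI with itself
    mean_red = (redundancy * off_diag).sum(axis=1) / (n - 1)
    return relevance - mean_red


factors = ["slope aspect", "elevation", "slope"]
relevance = [0.86, 0.80, 0.50]                 # only 0.86 comes from the response
redundancy = [[0.00, 0.30, 0.40],              # hypothetical pairwise MI values
              [0.30, 0.00, 0.10],
              [0.40, 0.10, 0.00]]
scores = mrmr_scores(relevance, redundancy)
ranking = [factors[i] for i in np.argsort(-scores)]
```

With these illustrative numbers, slope aspect has the highest relevance but ranks second behind elevation because of its redundancy with the other factors, which mirrors the qualitative argument in the response. The classic greedy mRMR selection differs in that it always picks the most relevant factor first.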
Kind regards,
Yizhun Zhang
Author Response File: Author Response.doc
Round 2
Reviewer 1 Report
The manuscript is accepted now.
One minor comment: reference 14 must be replaced with "landslide detection using deep learning and object-based image analysis", which fully reflects the text "But from the current research results, almost all machine learning methods for analyzing the potential risk of landslides rely heavily on inventory datasets of the known spatial extent of landslides. Or at least one characteristic GPS location for each known landslide in the target study area[14]."
Author Response
Dear reviewer
Thank you for reviewing my manuscript. Reference 14 has been revised as you suggested. Thanks again for your review.
Reviewer 3 Report
The English is not of publishable quality. For instance:
With the development of machine learning, compared with previous empirical and statistical models, machine learning is regarded as having a better nonlinear predictive ability in landslide susceptibility prediction[13].
There are many redundancies; for example, "machine learning" is repeated.
Fig.13 Distribution map of the landslide susceptibility index is not well-explained.
Figure 11 should be compared to ground truth data. It is hard to see the differences.
Author Response
Dear reviewer
Thank you for reviewing my manuscript. I will reply to your suggestions below.
- Figure 13, the distribution map of the landslide susceptibility index, is not well explained.
I added the following paragraph at the beginning of Section 5.2 to make Figure 13 clearer.
Figure 13 shows the distribution of the susceptibility index, that is, the number of points in each landslide probability interval. The mean and standard deviation shown in Figure 13 reflect the prediction level of the four models and the dispersion of their predicted landslide probabilities. Figure 14 shows the proportion of each landslide-prone zone in the study area. Together, the two figures show how stable each model's landslide prediction is and whether its predictions agree with the actual situation.
Furthermore, the following paragraph analyzing Figure 13 has been added to the corresponding explanation.
As Figure 13 shows, the standard deviations of the four models, from largest to smallest, are SS-PSO-ELM, SS-ELM, PSO-ELM, and ELM. The SS-PSO-ELM standard deviation is the largest, indicating that the SS-PSO-ELM model distinguishes landslides best and better reflects differences in landslide susceptibility across the study area. The SS-PSO-ELM model also has the most data points with a landslide probability below 0.3, which is consistent with the fact that non-landslide areas far exceed landslide areas. In contrast, because the PSO-ELM and ELM models do not use high-confidence non-landslide points as training data, the landslide probability in most places is between 0.4 and 0.6, and they do not discriminate landslides well. Furthermore, most of their predicted areas fall in high-susceptibility zones, which is inconsistent with the actual situation.
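The comparison above relies on three simple statistics of each model's predicted index: its standard deviation, the share of points below 0.3, and the share between 0.4 and 0.6. A minimal Python sketch of those summaries follows; the function name `summarize_index` and the synthetic arrays are hypothetical, chosen only to mimic a well-separated and a poorly separated prediction, and are not the manuscript's data.

```python
import numpy as np


def summarize_index(index):
    """Summary statistics used to compare susceptibility index distributions."""
    index = np.asarray(index, dtype=float)
    return {
        "mean": float(index.mean()),
        "std": float(index.std()),  # population standard deviation
        "share_below_0.3": float(np.mean(index < 0.3)),
        "share_0.4_to_0.6": float(np.mean((index >= 0.4) & (index <= 0.6))),
    }


# Synthetic examples only:
separated = summarize_index([0.05, 0.1, 0.1, 0.2, 0.9, 0.95])  # spreads points apart
flat = summarize_index([0.45, 0.5, 0.55, 0.5, 0.45, 0.55])     # clusters near 0.5
```

A model like the "flat" example has a small standard deviation and almost all its points in the 0.4 to 0.6 band, which is the pattern the response attributes to PSO-ELM and ELM.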
- Figure 11 should be compared to ground truth data. It is hard to see the differences.
Your suggestion is very reasonable, but our field work could not be carried out because of the epidemic. Therefore, we used the landslide points that have already occurred to verify whether the predictions are reasonable. In follow-up work, we will follow your suggestion and conduct a field inspection in the study area to verify the landslide susceptibility prediction.
As for the English, I have asked someone to review it. The grammar and sentence structure have been polished, and the revisions are shown in the article.
Thanks again for your review.
Kind regards,
Yizhun Zhang
Author Response File: Author Response.doc