Analysis of Factors Related to Forest Fires in Different Forest Ecosystems in China
Round 1
Reviewer 1 Report
Minor comments:
· Section 2.2.2.4: Climatic factors -> Human driving factors
· Line 262: GBDT -> Gradient Boosted Decision Trees (GBDT)
· Line 329: factors used were used
· Line 347: factors for to train
Major comments:
· Section 2.3.1.: The authors should consider specifying what distance or dij was applied to the study (great-circle distance?). Also, explaining better what the meaning and role of the distance scale (d) is.
· Equation 5: The equation is not labeled, and the elements of the equation are not explained.
· Line 270: What did the authors use to detect and remove outliers?
· Line 278: The authors should include the number of nodes in the hidden layers used. Also, do double hidden layers mean two hidden layers?
· Section 2.3.3.: The authors should include the hyperparameters used to train the LightGBM and how they were tuned.
· Section 2.3.4.: The authors should consider adding some justification for the ANN structure. Did the authors try other options? Also, include a description of the
· Line 334: The percentage in the text does not match Table 2.
· Section 3.2: The authors should consider explaining the results from Table 2 and Figure 5 for all the 4 regions. The text describes the results in Table 2 only for Jilin Province and in Figure 5 for Heilongjiang Province.
· Line 349: The percentage in the text does not match Table 3.
· Section 3.3: Similar to Section 3.2, the results from Table 3 are only described in the text for 2 regions.
· Line 366: The percentage in the text and the explanation do not match Table 4.
· General: The authors should consider adding other performance measures since the dataset they are using is clearly imbalanced. That would ensure that those large accuracies are reliable.
Author Response
Response to Reviewer
Thanks to the editor for arranging the review and to the reviewers for their valuable comments. We have carefully answered the questions one by one, as the reviewers requested, and made careful revisions to the article.
Point 1: Section 2.2.2.4: Climatic factors -> Human driving factors 

Response 1: Thank you for your suggestion. We have revised this title in the article. (Line 227)
Point 2: Line 262: GBDT -> Gradient Boosted Decision Trees (GBDT)
Response 2: Thank you for your suggestion, we agree with your opinion. We have revised this sentence. (Line 288)
Point 3: Line 329: factors used were used
Response 3: Thank you for your suggestion, we agree with your opinion. We have revised this sentence. (Line 384)
Point 4: Line 347: factors for to train
Response 4: Thank you for your suggestion, we agree with your opinion. We have corrected this sentence. (Line 409)
Point 5: Section 2.3.1.: The authors should consider specifying what distance or dij was applied to the study (great-circle distance?). Also, explaining better what the meaning and role of the distance scale (d) is.
Response 5: Thank you for your suggestion, we agree with your opinion. We have added an explanation of the method and stated the distance used in the text. (Line 252-255)
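For readers unfamiliar with the reviewer's suggestion, the following is a minimal sketch of a great-circle (haversine) distance d_ij between two points, and of how a distance scale d selects the neighbours of a point. It is an illustration only: the response does not state which distance the authors adopted, and the function names, the Earth radius, and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed for this sketch)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbours_within(points, i, d_km):
    """Indices j != i whose distance d_ij to point i is at most d_km."""
    lat_i, lon_i = points[i]
    return [j for j, (lat, lon) in enumerate(points)
            if j != i and great_circle_km(lat_i, lon_i, lat, lon) <= d_km]


# Hypothetical fire points (latitude, longitude), not taken from the study.
fire_points = [(45.00, 127.00), (45.05, 127.00), (46.50, 129.00)]
print(neighbours_within(fire_points, 0, d_km=10.0))
```

In a sketch like this, the distance scale d acts as a search radius: only points with d_ij no larger than d count as neighbours of point i, so a larger d pools fires over a wider area.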
Point 6: Equation 5: The equation is not labeled, and the elements of the equation are not explained.
Response 6: Thank you for your suggestion, we agree with your opinion. We have added a label and a description of the formula to the text. (Line 277-278)
Point 7: Line 270: What did the authors use to detect and remove outliers?
Response 7: Thank you for your suggestion, we agree with your opinion. We have added a description of outlier screening. (Line 305-307)
Point 8: Line 278: The authors should include the number of nodes in the hidden layers used. Also, do double hidden layers mean two hidden layers?
Response 8: Thank you for your suggestion, we agree with your opinion. We have added descriptions of the created ANN model. The number of hidden layers and the number of nodes is introduced in detail. (Line 315-317)
Point 9: Section 2.3.3.: The authors should include the hyperparameters used to train the LightGBM and how they were tuned.
Response 9: Thank you for your suggestion, we agree with your opinion. We have added Table 2, showing the hyperparameter settings of the LightGBM algorithm. On this basis, we also describe the hyperparameter tuning method in this study. (Line 290-297)
Point 10: Section 2.3.4.: The authors should consider adding some justification for the ANN structure. Did the authors try other options?
Response 10: Thank you for your suggestion, we agree with your opinion. More and more studies have shown that non-parametric models are more suitable for predicting the occurrence of forest fires. We have included methods using ANNs in the text and incorporated references. (Line 299-302)
Point 11: Line 334: The percentage in the text does not match Table 2.
Response 11: Thank you for your suggestion, we agree with your opinion. This was a writing error; we have carefully checked the original data and results and revised the text. (Line 393)
Point 12: Section 3.2: The authors should consider explaining the results from Table 2 and Figure 5 for all the 4 regions. The text describes the results in Table 2 only for Jilin Province and in Figure 5 for Heilongjiang Province.
Response 12: Thank you for your suggestion, we agree with your opinion. We have fixed this issue. (Line 390-392)
Point 13: Line 349: The percentage in the text does not match Table 3.
Response 13: Thank you for your suggestion, we agree with your opinion. This was a writing error; we have carefully checked the original data and results and revised the text. (Line 411)
Point 14: Section 3.3: Similar to Section 3.2, the results from Table 3 are only described in the text for 2 regions.
Response 14: Thank you for your suggestion, we agree with your opinion. We have fixed this issue. (Line 412-413)
Point 15: Line 366: The percentage in the text and the explanation do not match Table 4.
Response 15: Thank you for your suggestion, we agree with your opinion. This was a writing error; we have carefully checked the original data and results and revised the text. (Line 430-431)
Point 16: General: The authors should consider adding other performance measures since the dataset they are using is clearly imbalanced. That would ensure that those large accuracies are reliable.
Response 16: Thank you for your suggestion, we agree with your opinion. To demonstrate the stability and feasibility of the model, we have added a new accuracy measure (confusion matrix). (Line 330-344, 461-466)
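The reviewer's concern about imbalance can be illustrated with a minimal sketch that derives precision, recall, and F-measure from confusion-matrix counts. The labels below are hypothetical (5 fire pixels among 100) and the function names are illustrative; this is not the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; zero when a ratio is undefined."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 5 fire pixels (1) among 100 pixels.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a classifier that never predicts fire
result = scores(*confusion_counts(y_true, y_pred))
print(result)
```

In this example the classifier never predicts a fire yet reaches an accuracy of 0.95, while its recall is 0. That gap is why accuracy alone is not a reliable measure on imbalanced data.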
Reviewer 2 Report
The manuscript presents an interesting and useful topic, analysing forest fire occurrence in China and the implications of the spatial patterns and drivers to its management. It is generally well-written and well structured. However, it presents relevant flaws regarding the conceptual and methodological procedure, and requires further analysis and a more detailed and clear description of the options made, in order to be shared. I describe further below the main points.
- a general comment is that the references used and the studies mentioned, both in the introduction and the discussion, should cover more of research done for China
- different ecosystems are mentioned several times; however the analysis was done for the 4 provinces selected, and the link between the provinces and the ecosystems is not clear. What is the proportion of each ecosystem covering each of the provinces? What are their environmental conditions? L79 mentions 3 ecosystems but these are not presented. Fig.1 is not informative enough, lacks boundaries of provinces and colours must be distinctive
- Why were only 3 years of fire data used (2019-2021)? It is rather short to study spatial patterns with the purpose of obtaining a prediction. There is no figure showing the location of the fires, or the pixels considered. It is also not clear what the disturbance text information (L130) means with regards to retrieving fire data info from satellite images. This whole section of data acquisition and fire data needs to be clarified and better described.
- P5, vegetation factors. It is not explained how the 16 types of combustibles were converted into one single variable, and which units of measurement were used (missing in table 1). This is crucial, since part of the results presented depend on this variable (also in combination with topography). Results obtained are not fully supported and understandable in light of this flaw, and it is also not replicable.
- the ANN models presented in the results (with intermediate models and subsamples) are not clearly described (what is the size of the subsamples? Eg. L332)
- Fig3 (P10) - How were the classes built? In order to compare between provinces, intervals should be the same
- The combination of different factors (topography and vegetation) is not explained. This is essential to understand the results obtained
- What about validation of the models? How many pixels (points?) were included in training (70% of dataset) and testing? This base information is crucial
Author Response
Response to Reviewer
Thanks to the editor for arranging the review and to the reviewers for their valuable comments. We have carefully answered the questions one by one, as the reviewers requested, and made careful revisions to the article.
Point 1: a general comment is that the references used and the studies mentioned, both in the introduction and the discussion, should cover more of research done for China 

Response 1: Thank you for your suggestion, we agree with your opinion. Covering other research on forest fires in China in more depth is very important for this study. We have included more studies on forest fires in China in the Introduction and Conclusions of the paper. (Line 59, 67, 83, 493, 513, 537, 540, 577)
Point 2: different ecosystems are mentioned several times; however the analysis was done for the 4 provinces selected, and the link between the provinces and the ecosystems is not clear. What is the proportion of each ecosystem covering each of the provinces? What are their environmental conditions? L79 mentions 3 ecosystems but these are not presented. Fig.1 is not informative enough, lacks boundaries of provinces and colours must be distinctive
Response 2: Thank you for your suggestion, we agree with your opinion. The description of the study area was indeed lacking in this paper. We have included the ecosystem proportions and profiles of the four provinces in the article (Line 77-82). The ecological environment is described in Section 2.1. We also modified Figure 1 to add provincial boundaries and changed its classes to forest land and non-forest land. (Line 129-130)
Point 3: Why were only 3 years of fire data used (2019-2021)? It is rather short to study spatial patterns with the purpose of obtaining a prediction. There is no figure showing the location of the fires, or the pixels considered. It is also not clear what the disturbance text information (L130) means with regards to retrieving fire data info from satellite images. This whole section of data acquisition and fire data needs to be clarified and better described.
Response 3: Thank you for your suggestion, we agree with your opinion. Because of limited data availability, we have so far collected forest fire disturbance data for the four provinces only for 2019 to 2021, but after repeated screening the total number of fire points is sufficient for statistical analysis. In the future, we will collect further fire disturbance data. We have added a map of the fire points (Figure 2) to the paper and expanded the descriptions in the data acquisition and fire data sections. (Line 135-137, 140-144, 162-163, 169-170)
Point 4: P5, vegetation factors. It is not explained how the 16 types of combustibles were converted into one single variable, and which units of measurement were used (missing in table 1). This is crucial, since part of the results presented depend on this variable (also in combination with topography). Results obtained are not fully supported and understandable in light of this flaw, and it is also not replicable.
Response 4: Thank you for your suggestion, we agree with your opinion. Our lack of a description of how the 16 combustible types translate into one variable caused confusion for the reader. We have corrected this omission and added a description. (Line 218-221)
Point 5: the ANN models presented in the results (with intermediate models and subsamples) are not clearly described (what is the size of the subsamples? Eg. L332)
Response 5: Thank you for your suggestion, we agree with your opinion. We have described the subsample size used by the intermediate model in the text. (Line 387-389)
Point 6: Fig3 (P10) - How were the classes built? In order to compare between provinces, intervals should be the same
Response 6: Thank you for your suggestion, we agree with your opinion. The manuscript lacked a description of the formulas and an explanation of the fixed distance chosen. We have revised both aspects in the article. (Line 252-255)
Point 7: The combination of different factors (topography and vegetation) is not explained. This is essential to understand the results obtained
Response 7: Thank you for your suggestion, we agree with your opinion. We lack an explanation of the combined variables in the article, which can lead to confusion for the reader. We have made changes in the article. (Line 403-405)
Point 8: What about validation of the models? How many pixels (points?) were included in training (70% of dataset) and testing? This base information is crucial
Response 8: Thank you for your suggestion, we agree with your opinion. We have revised the part of the article that describes splitting the dataset, adding the number of pixels in the training and test sets. (Line 313-314)
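The base information the reviewer asks for (how many samples fall in the training and test sets) follows directly from the split itself. The sketch below assumes a simple shuffled 70/30 split; the sample count of 1000, the seed, and the function name are hypothetical and do not come from the manuscript.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical pixel identifiers standing in for fire and non-fire samples.
pixels = list(range(1000))
train, test = split_dataset(pixels)
print(len(train), len(test))
```

Reporting both counts, together with the seed or the splitting procedure, lets readers reproduce the validation.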
Round 2
Reviewer 2 Report
Thank you for the changes made to the manuscript, following the comments and suggestions provided. There was an evident effort to clarify the questions raised. I still have some concerns regarding the robustness of the study, and the textual changes made are not sufficient to ensure the quality of the results and conclusions drawn, as follows:
- according to the study described, the link with the ecosystems goes beyond the proportion of forest types, as added in L219 and following. The main issue is that conclusions are drawn regarding fire factors in different ecosystems and the analysis does not reflect this option. As such, the analysis does not support the statement in the abstract "The results of this study indicated that there were differences in the driving factors of fire in different forest ecosystems" - the three forest types? The 16 fuel/vegetation types? The implications of this information for fire management are huge, since measures, strategies and costs change accordingly. I suggest this vegetation variable to be presented as a map with the distribution of each type/code within the 4 provinces.
These data are also needed to understand the combination with topography; for example, if code 15 is broadleaved together with 10º slope class may increase fire hazard, but the same broadleaved forest with a different slope class does not... the manuscript does not provide enough information in this regard.
- Precision, Recall, and F-measure are mentioned several times, and are called Kyoto indicators (L509) - please clarify and avoid repetition (L630 again). The same happens with the division of the dataset into two samples (L407 for example).
- Fig. 2 scale presented in miles, harmonize with other figures (km)
- The explanation given in L421-424 regarding the combination of topography and vegetation must be improved. There are 2 vegetation variables, but one of them corresponds to 16 different classes, which should allow for different importance measure; please revise
Author Response
Response to Reviewer
Thanks to the reviewers for their valuable opinions. We have answered the questions carefully and revised the article as the reviewers requested.
Point 1: According to the study described, the link with the ecosystems goes beyond the proportion of forest types, as added in L219 and following. The main issue is that conclusions are drawn regarding fire factors in different ecosystems and the analysis does not reflect this option. As such, the analysis does not support the statement in the abstract "The results of this study indicated that there were differences in the driving factors of fire in different forest ecosystems" - the three forest types? The 16 fuel/vegetation types? The implications of this information for fire management are huge, since measures, strategies and costs change accordingly. I suggest this vegetation variable to be presented as a map with the distribution of each type/code within the 4 provinces. 

Response 1: Thank you for your suggestion, we agree with your opinion. In previous versions, we omitted the description of the types of combustibles, which caused confusion for the reader. In this study, we extracted 16 combustible types from three different forest ecosystems. We have added code descriptions for the types of combustibles and distribution maps of the 16 types in the four provinces. A description of the combustibles has also been added to the article. (Line 72-75, 84-86, 87-91, 199-219, Figure 3)
Point 2: Different ecosystems are mentioned several times; however the analysis was done for the 4 provinces selected, and the link between the provinces and the ecosystems is not clear. What is the proportion of each ecosystem covering each of the provinces? What are their environmental conditions? L79 mentions 3 ecosystems but these are not presented. Fig.1 is not informative enough, lacks boundaries of provinces and colours must be distinctive.
Response 2: Thank you for your suggestion, we agree with your opinion. Previous versions did lack an analysis of combinations of slope and combustible type. In analysing and processing the historical data, we found that, for the same type of combustible, forest fires are more likely on steeper slopes. We have added analysis and discussion to the text. (Line 551-552, 558-566)
Point 3: Precision, Recall, and F-measure are mentioned several times, and are called Kyoto indicators (L509) - please clarify and avoid repetition (L630 again). The same happens with the division of the dataset into two samples (L407 for example).
Response 3: Thank you for your suggestion, we agree with your opinion. The repeated appearance of the names of the three indicators and the segmentation of the dataset will affect the fluency and readability of the article. We have abbreviated these parts. (Line 385-386, 466-469, 596-600)
Point 4: Fig. 2 scale presented in miles, harmonize with other figures (km).
Response 4: Thank you for your suggestion, we agree with your opinion. We have changed the scale of the pictures to ensure that the units appearing in the text are consistent. (Figure 2)
Point 5: The explanation given in L421-424 regarding the combination of topography and vegetation must be improved. There are 2 vegetation variables, but one of them corresponds to 16 different classes, which should allow for different importance measure; please revise.
Response 5: Thank you for your suggestion, we agree with your opinion. Previous versions did lack results and discussion on the importance of different combustible types to forest fire occurrence. Therefore, this study analyses the number of historical forest fires in each combustible category and gives the proportion of historical forest fires for each of the 16 combustibles. The aim is to rank the importance of the different types of combustibles in affecting the occurrence of forest fires. (Line 419-427, Figure 9)
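The ranking described in this response can be sketched as a count of historical fires per combustible code, converted to proportions and sorted. The codes and fire records below are hypothetical, and the function name is illustrative rather than the authors' own.

```python
from collections import Counter


def rank_fuel_classes(fire_fuel_codes):
    """Return (code, proportion of fires) pairs, most frequent code first."""
    counts = Counter(fire_fuel_codes)
    total = sum(counts.values())
    # Sort by descending count, then by code so ties are ordered stably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(code, n / total) for code, n in ranked]


# Hypothetical combustible code recorded for each historical fire point.
fires = [15, 15, 3, 15, 7, 3, 15, 7, 15, 15]
for code, share in rank_fuel_classes(fires):
    print(f"code {code}: {share:.0%} of fires")
```

A ranking of this kind reflects raw fire counts only. Dividing each count by the area covered by that combustible type would be one way to separate fire-proneness from abundance, though the response does not say whether the authors did so.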