Reconstructing a Fine Resolution Landscape of Annual Gross Primary Product (1895–2013) with Tree-Ring Indices
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript identified a close relationship between tree-ring index and vegetation GPP, then developed a TRI-GPP model to reconstruct spatially explicit GPP values since 1895 from seven tree ring chronologies. The topic is suitable for publication in this journal. This is a meaningful work. The authors reconstructed and mapped the spatial distribution of GPP from 1895 to 1985 in eastern Illinois. Three main climate drivers to influence vegetation photosynthesis and GPP absorption were analyzed. However, there are still some flaws. I advised accepting this paper after a major revision. The specific revision suggestions can be found as follows:
1) The biggest problem in this manuscript is the lack of necessary explanations for the experimental design. In modeling phase (section 2.2 Pixel by pixel regression), what are the independent variables? How to do GPP mapping pixel by pixel in Phase 1(1895-1937)? After reading this part, I still cannot understand how four plots are used to simulate GPP in the whole study area in 1985. Moreover, how much the example data is for modeling?
2) Line 75, assume that the forest and grassland in 2013 kept stable for the whole study period (1895-2013). Can this hypothesis be supported by long time series LULCC data? And what about some algorithm like LandTrendr?
3) what is the “Combined data” actually? In my opinion, this should only be simulated data. Meanwhile, what is the meaning of simulated+real GPP in figure 6? The blue columns represent data after 1986, and the green columns represent data before 1986. I'm confused.
4) The authors conduct an analysis of the relationship between GPP and climate factors. I suggest that a table be represented for all climate factors to describe the data in detail.
4) In figure 5, Why the GPP have a marked decline in 2012?
5) there are two sections numbered 2.2, 2.2 Date source and data preprocessing and 2.2 Pixel by pixel regression.
6) Why is plot 3,4,6 not shown in Figure 2C? Or why do you need to use a uniform plot number for both time periods?
7) Why do Figures 4c and f have negative predicted values?
Author Response
Reviewer 1
1) The biggest problem in this manuscript is the lack of necessary explanations for the experimental design. In modeling phase (section 2.2 Pixel by pixel regression), what are the independent variables? How to do GPP mapping pixel by pixel in Phase 1(1895-1937)? After reading this part, I still cannot understand how four plots are used to simulate GPP in the whole study area in 1985. Moreover, how much the example data is for modeling?
Thank you for your comments. We realize that the methods section needs more detail. Here is the general process of our paper. The purpose of our study is to reconstruct GPP values from 100 years ago. The dependent variable is GPP and the independent variables are multiple tree-ring indices. The available GPP data are from 1986 to 2013. We ran two regressions for each pixel (1895-2013 and 1937-2013), both trained on the data from 1986 to 2013. To obtain the data for 1895-1936, we have to turn to the 1895-2013 regression instead of the 1937-2013 regression. The availability of tree-ring indices is a little complex (Figure 1).
From 1937 to 2013, all seven plots had available tree-ring indices (the number of independent variables is seven), while from 1895 to 2013, only four plots had available tree-ring indices (the number of independent variables is four).
We started with the 1937-2013 period. This is a pixel-by-pixel regression, and our study area consists of millions of pixels. If we can build a regression for Pixel A (Figure 2), we can repeat the same process for the rest of the pixels. The training set of Pixel A contained the annual GPP values from 1986 to 2013 (we used five-fold validation, and in each fold we had 22 training examples) and the tree-ring indices of the seven plots from 1986 to 2013 (Table 1). The seven tree-ring indices are the same for every pixel regression. We tried three models (SVM, GRNN and RF) to build the regression. We repeated the same process for all vegetation pixels and calculated the average RMSE, MAE, MAPE and adjusted R2 for each fold. We then chose the best of the three models. Each pixel has its own model, to which we input the tree-ring indices of the seven plots from 1937 to 1985. Those models reconstruct the GPP values from 1937 to 1985 for each pixel.
We then come to the 1895-2013 period (pixel-by-pixel regression). Everything was the same except that fewer plots (fewer tree-ring indices as our independent variables) were involved. In this phase, we have only four plots.
In the updated version, I added a more detailed explanation in Part 2.3, as follows: In Phase 2 (1938-1985), we developed a model for each pixel in our study area, using the GPP values in that pixel from 1986 to 2013 as the dependent variable and the TRIs of the seven plots from 1986 to 2013 as the independent variables in model training. We used 5-fold validation to evaluate model performance (Table 2); for each pixel, there were 22 pairs of training samples (GPP and multiple TRIs) in each training set.
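The per-pixel procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the pixel count is tiny, and ordinary least squares stands in for the SVM, GRNN and RF models that the response names. The function names `kfold_rmse` and `fit_and_reconstruct` are hypothetical.

```python
import numpy as np

def kfold_rmse(tri, gpp, n_folds=5, seed=0):
    """Cross-validated RMSE of a least-squares TRI -> GPP fit for one pixel."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(gpp))
    folds = np.array_split(idx, n_folds)  # 28 years -> folds of 6, 6, 6, 5, 5
    X = np.column_stack([np.ones(len(gpp)), tri])  # intercept + one column per plot
    sq_err = []
    for test in folds:
        train = np.setdiff1d(idx, test)  # 22 or 23 training years per fold
        coef, *_ = np.linalg.lstsq(X[train], gpp[train], rcond=None)
        sq_err.append((X[test] @ coef - gpp[test]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))

def fit_and_reconstruct(tri_train, gpp_train, tri_past):
    """Fit one pixel's model on the training years and predict GPP for past years."""
    X = np.column_stack([np.ones(len(gpp_train)), tri_train])
    coef, *_ = np.linalg.lstsq(X, gpp_train, rcond=None)
    return np.column_stack([np.ones(len(tri_past)), tri_past]) @ coef

# Synthetic stand-ins (assumptions): 28 training years (1986-2013),
# 49 past years (1937-1985), 7 plots, 4 pixels instead of millions.
rng = np.random.default_rng(42)
n_train_years, n_past_years, n_plots, n_pixels = 28, 49, 7, 4
tri_train = rng.normal(1.0, 0.2, (n_train_years, n_plots))
tri_past = rng.normal(1.0, 0.2, (n_past_years, n_plots))
true_w = rng.uniform(100, 400, (n_plots, n_pixels))
gpp_train = tri_train @ true_w + rng.normal(0, 20, (n_train_years, n_pixels))

# The same TRI predictors are reused for every pixel; only the GPP target changes.
reconstructed = np.column_stack(
    [fit_and_reconstruct(tri_train, gpp_train[:, p], tri_past) for p in range(n_pixels)]
)
cv_rmse = [kfold_rmse(tri_train, gpp_train[:, p]) for p in range(n_pixels)]
```

Each column of `reconstructed` is one pixel's GPP series for the past years, and `cv_rmse` holds one validation score per pixel, which could then be averaged over all pixels.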
2) Line 75, assume that the forest and grassland in 2013 kept stable for the whole study period (1895-2013). Can this hypothesis be supported by long time series LULCC data? And what about some algorithm like LandTrendr?
Thank you for your comments. Change detection models such as LandTrendr or Continuous Change Detection and Classification (CCDC) are well suited to detecting land cover or land use change when vegetation index bands or raw satellite bands are available as inputs. But our study period runs from 1895 to 2013. We have no satellite data from 100 years ago to use as inputs, so we cannot guarantee that there was no subtle land cover change at our site. However, two points support our conclusion that those areas are stable.
- Many of our team members are local and are familiar with the historical land cover changes in our study area. No historical records indicate major land cover changes there.
- In most cases, urbanization is irreversible (Gao and O’Neill, 2020). We assume that if a pixel was non-urban in 2013 according to the National Land Cover Database (NLCD), it was very likely non-urban in the past as well. If a pixel was urban in the 2013 NLCD map, we excluded it to make sure that all pixels involved were grass or forest pixels throughout our study period.
Gao, J., & O’Neill, B. C. (2020). Mapping global urban land for the 21st century with data-driven simulations and Shared Socioeconomic Pathways. Nature communications, 11(1), 2302.
3) what is the “Combined data” actually? In my opinion, this should only be simulated data. Meanwhile, what is the meaning of simulated+real GPP in figure 6? The blue columns represent data after 1986, and the green columns represent data before 1986. I'm confused.
Thank you for your insightful comments. The combined data is the combination of the simulated GPP data and real GPP data.
After modeling, we have simulated GPP data from 1895 to 1985. The next question is how to make full use of those data. In our manuscript, we try to correct possible bias in, or confirm, results on the relationship between GPP and climate. Many previous conclusions on climate were limited (for example, in Figure 6B, the correlation between GPP and March temperature was very high for the blue bar) because only a short record (less than 40 years) was available. We assume that our conclusions can be more reasonable with a longer record. We have verified that our simulated data are plausible, so we extend the record from 28 years (1986-2013) to 119 years (1895-2013). This extension is the combined data. Any conclusion on climate (such as the correlation between GPP and temperature) should rest on long-term data (https://www.ncei.noaa.gov/news/weather-vs-climate), but previous studies had GPP data for only a few decades and truncated the long-term weather data to match the GPP data when they calculated the correlations between climate factors and GPP. Now that we have more than 100 years of both GPP data and climate data, the conclusions can be more credible (for example, in Figure 6B, the correlation between GPP and March temperature is low for the green bar). Because we want the longest available record, we did not display the simulated data alone.
In this example, we found a mismatch in the relationship between GPP and March temperature between the long-term data and the short-term data. Two possible reasons might explain it. One is that the climate did change greatly within three decades; the other is that conclusions based on the short-term data could be biased. At present, we cannot judge which reason is correct, but the mismatch at least raises an alert. If we switched to simulated data only, this purpose could not be achieved. To state our idea clearly, we will add the following to Part 2.5.
We calculated the correlations in two datasets (the long-term combined dataset and the short-term dataset). If the correlation values in the two datasets differ substantially, we assume the difference has two possible causes. One is that the short-term dataset cannot fully reflect the real situation; the other is that extreme climate change has reshaped the current situation. Whichever the cause, we will investigate the differences further.
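The comparison of the two datasets can be illustrated with a minimal sketch. The series below are synthetic placeholders, not the study's GPP or temperature data, and the use of the Pearson coefficient is an assumption, since the response does not name the correlation measure.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic annual series for 1895-2013 (assumed values for illustration only).
years = np.arange(1895, 2014)
rng = np.random.default_rng(0)
march_temp = rng.normal(5.0, 2.0, years.size)
gpp = 1500 + 10 * march_temp + rng.normal(0, 100, years.size)

# Short-term dataset: the observed period only (1986-2013).
short = years >= 1986

r_long = pearson_r(gpp, march_temp)                 # 119-year combined record
r_short = pearson_r(gpp[short], march_temp[short])  # 28-year record
difference = r_short - r_long
```

A large `difference` for a given climate factor would flag it for the further investigation that the added text describes.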
4) The authors conduct an analysis of the relationship between GPP and climate factors. I suggest that a table be represented for all climate factors to describe the data in detail.
Thank you for your comments. I totally agree with you. Tables can convey more detail than figures alone. The main body of the manuscript cannot hold such large tables, so we will put them in the auxiliary material, where the many correlation values can be displayed.
4) In figure 5, Why the GPP have a marked decline in 2012?
Thank you for your comment. A severe drought occurred in both states (Illinois and Indiana) in 2012 and significantly affected local vegetation (https://stateclimatologist.web.illinois.edu/drought-in-illinois/ and https://www.weather.gov/ind/summer2012).
5) there are two sections numbered 2.2, 2.2 Date source and data preprocessing and 2.2 Pixel by pixel regression.
Thank you for your reminder. We corrected them in the updated version.
6) Why is plot 3,4,6 not shown in Figure 2C? Or why do you need to use a uniform plot number for both time periods?
Thank you for your insightful observation. The answer can be found in Answer 1. Those three plots did not have available tree-ring indices earlier than 1937. We ran two rounds of regression, but those three plots could join only one round (1937-2013). I will add the following to Part 2.3.
The reconstruction processes in Phase 1 and Phase 2 were similar, except that there were four available plots in Phase 1 and seven available plots in Phase 2.
7) Why do Figures 4c and f have negative predicted values?
Thank you for your comments. The SVM model performed poorly in our research area, and some of its predictions were negative. We ran the regression using three models (SVM, GRNN and RF). The training set contained fewer than 30 samples (1986-2013). With such a small training set, a model like SVM can perform very poorly: SVM had the highest RMSEs (1203.08 g C m-2 year-1 and 580.23 g C m-2 year-1) in the two phases, so we did not choose SVM as our final model to reconstruct GPP in either phase.
Reviewer 2 Report
Comments and Suggestions for Authors
At present, the remote sensing impact and meteorological data records are only complete records of the past few decades, which does affect scholars' research on long-term and large-scale ecological evolution in the past (such as GPP, carbon sink, etc.). The idea of using data simulation proposed by the author is worthy of appreciation. If this method can be widely recognized, it can bring new technical tools to traditional forestry research. However, the main innovation of this paper is also the biggest contradiction (usually innovation also means contradiction). I think the author needs to carefully consider the following two points.
1. The numerical simulation technology used is still a relatively classic machine learning method. At present, generative deep learning is the most advanced method in the field of pixel-level generation. Although this is a small sample learning problem, I think if the author proposes innovations from the perspective of methodological innovation, it is inevitable to include it in the scope of investigation.
2. The author spent a lot of space to discuss the forestry conclusions or data analysis obtained by this method, but neglected the discussion of long-term simulation using this method. I think this is the greatest value that this article should reflect. Once this method is widely recognized, it has the potential to change the way traditional forestry research is done.
Author Response
The numerical simulation technology used is still a relatively classic machine learning method. At present, generative deep learning is the most advanced method in the field of pixel-level generation. Although this is a small sample learning problem, I think if the author proposes innovations from the perspective of methodological innovation, it is inevitable to include it in the scope of investigation.
Thank you for your insightful comments. We agree with you. In many cases, deep-learning approaches outperform classical machine learning approaches. At the beginning, we planned to use a deep-learning method, but two reasons kept us from trying it.
The objective of the manuscript is to construct the TRI-GPP model with a valid machine learning approach (which might not be the best one). We spent most of our effort on describing how to collect tree rings and how to run pixel-by-pixel regressions with machine learning approaches, in order to make the model reliable and credible. Once readers accept the model, we may go further, for example by enlarging our study area or comparing our current approaches with deep-learning approaches. Introducing an advanced approach is not our top priority.
The other reason is that a deep-learning method might not perform better than classical approaches with a small sample size. Running a deep-learning approach over the whole study area with five-fold validation needs a lot of time and advanced hardware, which is beyond our resources.
But your helpful suggestion encouraged us to run a deep-learning approach in two subset areas (Easy Areas and Difficult Areas) with seven plots (1937-2013). Each of the two test areas was 3 km × 3 km, with a spatial resolution of 30 meters. The training set included the GPP values and TRI values from 1986 to 2007. The validation year was 2010. We chose random forest (RF), the best model in the 1937-2013 period, and a Convolutional Neural Network (CNN), a classical deep-learning approach. When we ran the RF for the whole study area, we found that it performed poorly in Difficult Areas and well in Easy Areas (Figure 3).
We wanted to compare the performance of the two models in the Easy Areas and the Difficult Areas. The CNN settings were the following:
MaxEpochs: 100
InitialLearnRate: 0.03
ValidationFrequency: 10
MiniBatchSize: 128
LearnRateSchedule: piecewise
LearnRateDropFactor: 0.9
LearnRateDropPeriod: 10
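To make the learning-rate settings concrete, the sketch below computes the rate at each epoch under a step schedule. It assumes that "piecewise" means the rate is multiplied by the drop factor once every drop period, counted in epochs; that interpretation and the function name `piecewise_learning_rate` are assumptions, since the response does not state which software produced these settings.

```python
def piecewise_learning_rate(epoch, initial_rate=0.03, drop_factor=0.9, drop_period=10):
    """Learning rate for a 1-indexed epoch under a step ("piecewise") schedule."""
    n_drops = (epoch - 1) // drop_period  # number of drops applied before this epoch
    return initial_rate * drop_factor ** n_drops

# Rates for all 100 epochs (MaxEpochs: 100).
rates = [piecewise_learning_rate(e) for e in range(1, 101)]
```

Under this reading, the rate stays at 0.03 for epochs 1 to 10 and has been reduced nine times by epoch 100, ending at 0.03 × 0.9^9.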
No matter in which area (Easy Areas or Difficult Areas), RF performed better than CNN (Figure 4). We will add the following to Section 4.4, Limitation and future research.
We tested a Convolutional Neural Network (CNN) in our study area; its performance was worse than that of our chosen model because of the small sample size. We assume that if a deep-learning approach can adjust to the size of our sample, the model may improve substantially in future regressions.
- The author spent a lot of space to discuss the forestry conclusions or data analysis obtained by this method, but neglected the discussion of long-term simulation using this method. I think this is the greatest value that this article should reflect. Once this method is widely recognized, it has the potential to change the way traditional forestry research is done.
Thank you for your comments. Once the TRI-GPP model is accepted by the community, there are two directions in which we can use it. First, we will extend our research to a larger area or introduce certain advanced methods such as Long Short-Term Memory (LSTM). Trees grow all over the world, so in theory the model can be applied to vegetated areas worldwide except tropical areas, where the growing season never ends and many trees do not develop tree rings. Second, with the extended time series, we try to challenge some obsolete conclusions based only on real satellite images. For example, conclusions on climate factors, such as the correlations between GPP and temperature, need long-term data, so some results from short-term data were inconsistent with reality. This is why we compared the conclusions based on long-term data and short-term data in Part 3.3.
In the updated version, we will highlight the application of long-term simulation in Part 4.4: Once our model (TRI-GPP) is constructed successfully, there are two directions in which we can make full use of it. First, we will extend our research to a larger area or introduce certain advanced methods. Trees grow all over the world, so in theory we could reconstruct long-term simulated GPP for every vegetated area and observe long-term ecological processes, including forest succession, vegetation dynamics and others. Second, we might challenge some obsolete conclusions based only on 40 years of satellite images (1986-present). We found some contradictions between the short-term dataset and the long-term dataset. It is worthwhile to state the scope of both datasets so that researchers know which dataset is more appropriate for their research. But this is not the end. With the simulated GPP, we expect more updates to outdated findings established in an earlier era.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The dependent variables and independent variables are GPP and multiple tree ring indices, respectively. The authors ran two regressions for each pixel of 1895-2013 and 1937-2013. That is ok.
But I am really confused with these issues,
(1) How to derive GPP map (surface data) covering the entire study area with millions of pixels based on 7 plots of tree ring (points data). That is to say, how can GPP regression be achieved in regions where there is no tree ring data?
(2) As this experiment plots only cover a small part of the whole area, how to ensure the representativeness of these plots?
(3) Building model with remote sensing data and available GPP data from 1986 to 2013 is reasonable. But the paper does not describe how the GPP map is made in 1895 (Figure 5 and Auxiliary Figure 1), there was no remote sensing data at that time, how to get the GPP raster data at that time?
Author Response
Point by point responses
Thank you for managing our submission and the relatively quick decision. We appreciate the comments from the two reviewers and the time they invested in improving our work. We implemented the majority of the suggested revisions. The italic texts are the reviewers’ comment and the bold texts are our responses. Below are responses to all 3 comments.
Reviewer 1
(1) How to derive GPP map (surface data) covering the entire study area with millions of pixels based on 7 plots of tree ring (points data). That is to say, how can GPP regression be achieved in regions where there is no tree ring data?
Answer 1: Thank you for your comment. From the very beginning, we assumed that the model can reconstruct GPP values within an area of 117 km * 30 km (our study area is smaller than a large county in China) because the climate is likely to be the same within a comparatively small area. It is unlikely that one corner experiences a severe drought while another corner has heavy rain. Under the same climate, we assume that GPP values can be reconstructed from the surrounding plots, even though the distance between a vegetation pixel and a plot can be several kilometers.
This was our assumption, but we were not sure whether it was correct, so we used five-fold validation to test it. In Table 2 and Figure 4, we built one model for each pixel (more than 3 million models in total) and reconstructed the GPP values for five specific years. The RMSE, MAPE, MAE and adjusted R2 values, averaged over all pixels, are also displayed; each dot in Figure 4 shows the metrics of one vegetation pixel in our study area. The results were acceptable (Figure 4B: RMSE = 559.45 g C m-2 year-1, MAPE = 13.91%, MAE = 428.45 g C m-2 year-1, Adjusted R2 = 0.53). Our assumption was supported by the validation, so we ran our millions of models to reconstruct values from 1895 to 1985. In short, the good validation results support our models.
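For readers unfamiliar with the four validation metrics, the sketch below computes them on a small made-up example. The numbers are illustrative only, and the adjusted R2 formula shown is the common textbook definition, which is an assumption about how the authors computed theirs.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_predictors):
    """Return RMSE, MAE, MAPE (in percent) and adjusted R2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n = y_true.size
    # Textbook adjustment for the number of predictors (assumed definition).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return rmse, mae, mape, adj_r2

# Made-up observed and predicted values (not from the study).
rmse, mae, mape, adj_r2 = regression_metrics(
    y_true=[100, 200, 300, 400], y_pred=[110, 190, 310, 390], n_predictors=1
)
```

In the pixel-wise setting, such a function would be called once per pixel and the four values averaged over all vegetation pixels.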
(2) As this experiment plots only cover a small part of the whole area, how to ensure the representativeness of these plots?
Answer 2: Thank you for your insightful comments. The small area covered by the plots, or other factors, might influence our models, so we ran our models for all pixels with five-fold validation. The validation metrics are satisfactory, so we assume our models can achieve their goals.
We have a strong transect from east to west across the study area, and there is little change in latitude across the area. Our plots cover any potential east-west moisture gradient, and the topography is relatively flat across the area. The tree-ring sites therefore represent the area well.
(3) Building model with remote sensing data and available GPP data from 1986 to 2013 is reasonable. But the paper does not describe how the GPP map is made in 1895 (Figure 5 and Auxiliary Figure 1), there was no remote sensing data at that time, how to get the GPP raster data at that time?
Answer 3: Thank you for your comments. This is a pixel-level regression. For pixel A, we have a GRNN model like the following:
GPP = GRNN(x1, x2, x3, x4)
Here x1, x2, x3 and x4 are the TRIs from the four plots whose available years run from 1895 to 2013. For illustration, we can use a linear regression instead (a fitted linear regression is applied to new inputs in the same way as the GRNN model):
GPP = a*x1 + b*x2 + c*x3 + d*x4 (the GRNN model can have more parameters)
All of a, b, c, d and any other parameters are obtained from our regression on the 1986-2013 data. The fitted equation might look like the following:
GPP = 2*x1 + 3*x2 + 4*x3 + 5*x4 (a = 2, b = 3, c = 4, d = 5)
If we want to know the GPP in 1895 for pixel A, we take the TRIs of the four plots in 1895 and input them into the equation, which gives the GPP value. We do this for each vegetation pixel in our study area and obtain the 1895 GPP map in Figure 5.
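The linear example can be worked through numerically. In the sketch below, the coefficients for pixel A are the illustrative values from the response (a = 2, b = 3, c = 4, d = 5); the second pixel and the 1895 TRI values are hypothetical and were chosen only to show how one TRI vector yields a value for every pixel.

```python
import numpy as np

# One row of fitted coefficients per pixel (a, b, c, d).
coefficients = np.array([
    [2.0, 3.0, 4.0, 5.0],  # pixel A, values taken from the example equation
    [1.0, 1.0, 1.0, 1.0],  # hypothetical second pixel
])

# Hypothetical TRIs of the four plots in 1895 (x1, x2, x3, x4).
tri_1895 = np.array([1.0, 0.9, 1.1, 1.2])

# Each pixel's GPP is its own coefficients applied to the same four TRIs.
gpp_1895 = coefficients @ tri_1895
```

For pixel A this gives 2(1.0) + 3(0.9) + 4(1.1) + 5(1.2) = 15.1. Repeating the product over all vegetation pixels fills in the map for that year.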
These reconstruction techniques are similar to Ed Cook’s point-by-point regression for the reconstruction of PDSI (Palmer Drought Severity Index) across all of North America through the North American Drought Atlas and other climate atlases that he has created around the world (Cook et al. 2007, Cook et al. 2010, Cook et al. 2015). Our innovation is that we have applied these techniques in combination with remote sensing data to create a new output of GPP on the fine scale across the study sites. As we have mentioned, the chronology and regression statistics strongly support our capacity to conduct these reconstructions.
Cook, E.R., Seager, R., Cane, M.A. and Stahle, D.W., 2007. North American drought: Reconstructions, causes, and consequences. Earth-Science Reviews, 81(1-2), pp.93-134.
Cook, E.R., Anchukaitis, K.J., Buckley, B.M., D’Arrigo, R.D., Jacoby, G.C. and Wright, W.E., 2010. Asian monsoon failure and megadrought during the last millennium. Science, 328(5977), pp.486-489.
Cook, E.R., Seager, R., Kushnir, Y., Briffa, K.R., Büntgen, U., Frank, D., Krusic, P.J., Tegel, W., van der Schrier, G., Andreu-Hayles, L. and Baillie, M., 2015. Old World megadroughts and pluvials during the Common Era. Science advances, 1(10), p.e1500561.
Reviewer 2 Report
Comments and Suggestions for Authors
No further comments.
Author Response
Thank you for your past comments, which have improved our paper.