1. Introduction
Informed decision-making is of great importance in forestry [
1,
2]. Forest managers usually make effective plans using quantitative information of past and present stand conditions, such as inventory data, growth and yield models, for prediction of the future stand conditions [
3]. Individual tree size, such as diameter at breast height (
D), total tree height (
H), height to crown base (
HCB), crown length (
CL), ratio of
D to quadratic mean
D (
Dq), basal area (
BA), as well as crown width (
CW), play an indispensable role for constructing tree growth and yield models.
D is usually used as a predictor of the
D-H model because of its measurement convenience and comparable lower measurement error (degree of accuracy degree 0.1 cm) [
4,
5,
6,
7].
D is also used as one of the main predictors in many other tree-based models, including the
HCB model [
8], the
CW model [
9], the
CL model, the crown ratio model, the mortality model [
10,
11], the recruitment model, the site index and the dominant height model [
5,
12], the biomass model [
13,
14], and so on.
As mentioned above,
D is one of the crucial tree variables for growth and yield models. Besides this,
HCB,
H, and
CL are equally important for these models. Tree height (
H), especially dominant height, is the reflection of site quality [
5,
12]; the position of
HCB is the indicator of the crown ratio, while the crown ratio is the indicator of tree vigor and photosynthesis [
15], competition, individual tree foliage and branch biomass [
16,
17,
18], wood quality [
19,
20], and wind firmness [
21].
H and
HCB are inherently correlated in a tree, and thus height growth would always be a factor to influence
HCB. HCB can be estimated from
H, and
H can also be estimated from
HCB [
22]; the latter may be more acceptable due to less cost and time requirement. Moreover,
HCB is a member of the crown, attributed with
CW, crown profile, crown biomass, crown volume, crown surface area,
CL, light intercepting surface, site of photosynthesis, and so on [
2,
9,
23,
24,
25,
26,
27,
28,
29,
30]. Furthermore,
H is simply a sum of
CL and
HCB, and
CL is as essential as
HCB, even more essential in estimating crown measures and tree vigor [
31,
32], crown volume, biomass, and taper [
33,
34]. The correlation between
H,
CL, and
HCB is very high and inherent. All in all,
H,
HCB, and
CL are key measures to describe growth, yield, mortality, and recruitment dynamics of forest stands. However, so far, there is no research on the compatibility and additivity of
H,
CL, and
HCB.
Most of the existed studies with regard to compatibility are concentrated on the compatibility and additivity of components and the total parts. For example, the compatibility and additivity of biomass for slash pine (
Pinus elliottii Engelm. var.
elliottii) trees were carried out [
35] in the southeastern United States with three modeling approaches, and they found no single system to predict biomass that is the most accurate for all the components and total tree biomass. These problems in terms of compatibility and additivity almost always concentrate on regression models rather than classification models. When it comes to classification problems, the machine learning methods would be the best choice rather than other regression methods [
36,
37,
38,
39,
40,
41] in forest fire supervision, forest biomass mapping, parameters extraction of remote sensing figures, and other aspects. However, when it comes to regression problems, traditional methods would usually be chosen first, such as NSUR (nonlinear seemingly unrelated regression method), OLS (ordinary least square method), and QR (quantile regression method), et al.
The efficiency is ordered by SUR2 (jointly fitting component biomass equations using weighted NSUR) > SUR1 (weighted nonlinear seemingly unrelated regression (NSUR)) > DRM (an alternative disaggregative approach). In addition, all these three model systems provided more accurate biomass predictions than the previously published biomass equations. Similarly, Dong et al. [
42] compared the nonadditive against the additive biomass models for larch (
Larix olgensis Henry) trees in Northeast China. Jiang et al. [
43] proposed a simultaneous compatible system of taper and volume equations. Jose et al. [
44] constructed compatible height and site index models for five pine species to ensure the simultaneous estimation of height and site index. The nonlinear seemingly unrelated regression (NSUR) is a commonly used method to estimate parameters of these compatible-simultaneous systems of the models of biomass, taper, volume, and site index. With the model system estimated, the inherent correlations of the components are effectively accounted for and guaranteed as compatible, which is not possible with the ordinary least square regression. The accuracy produced using the NSUR is relatively higher, as errors of each individual model involved in a system are simultaneously reduced. There exist other methods of estimating simultaneous systems of equations besides NSUR; for example, Yang et al. [
45] utilized full information maximum likelihood (FIML), two-stage least square (2SLS), and three-stage least squares (3SLS) methods to construct simultaneous individual tree diameter and height-to-crown base, which sound great for data coming from remote sensors of LiDAR; these methods provide a convenient way to estimate these indexes, which can be applied to remote sensing and can shrink the cost to acquire them. However, the application conditions are constrained by some prerequisites. Nevertheless, no studies have been done on the compatibility and additivity of
H,
CL, and
HCB with these methods.
Thus, based on the knowledge of the previous modeling studies, this study attempts to develop a compatible system and an additive system of
H,
CL, and
HCB models to simultaneously estimate
H,
CL, and
HCB with the data acquired from permanent sample plots from the northeast Jilin province of China. Then, in addition to NSUR, FIML [
46], two-stage least squares (2SLS), and three-stage least squares (3SLS) will also be used to try to estimate the parameters of models with different forms of models. The study develops the compatible and additive simultaneous systems of
H,
CL, and
HCB models and compares the discrepancy of NSUR and other methods using the statistical indices. The presented system of the models will be useful for the effective management of natural secondary forests.
2. Materials and Methods
2.1. Study Area
The study was carried out in the Jingouling experimental forest farm (129°97′–130°22′ E, 43°32′–43°49′ N) and the Tazigou experimental forest farm (130°5′–130°20′ E, 43°17′–43°25′ N) of the Wang Qing forest bureau in Jilin province, China. The area of the Wang Qing forest bureau belongs to the Xueling branch of the Laoyeling mountain range of the Changbai Mountain system, which has a temperate continental monsoon climate, with four distinct seasons, long winter and short summer, and a cold climate. The first frost period is generally in the middle of September, the last frost period is generally in the middle of May, the frost-free period is 110–130 days, and the accumulated temperature is 2110 °C. The altitude is about 360–1477 m. The soil is brown forest soil, with neutral pH and 45 cm thickness.
The stand type is spruce-fir coniferous broad-leaved mixed natural secondary forest and Mongolian oak broad-leaved mixed natural secondary forest. Spruce-fir coniferous broad-leaved mixed natural secondary forest is a forest vegetation type left over from the primitive zonal forest vegetation (spruce-fir coniferous and broad-leaved mixed forest and broad-leaved Korean pine forest) of Chang bai Mountain after many “plucking” cuttings. Mongolian oak broad-leaved mixed natural secondary forest is widely distributed in northeast China, mostly in the secondary succession stage of forest communities formed after the over-cutting of broad-leaved Korean pine forest, with low forest productivity and degraded ecological function.
2.2. Sampling and Measurements
The data used in this study were collected from 384 permanent sample plots (PSPs) with the same area of 625 m
2 established across the Tazigou forest farm and the Jingouling forest farm out of 28 forest classes between July and August in 2013. Two forest types were included, where 192 PSPs were located in Mongolian oak forest (Mongolian oak is the main tree species) and another 192 PSPs were located in
spruce-fir forest (
spruce-fir is the main tree species). Sample plots are representative of the various conditions, stand density, and stand stocking. All standing living trees with diameter at breast height (
D) ≥ 1 cm and total tree height ≥ 1.3 m were measured for
D, total height (
H), height-to-crown base (
HCB), and four crown radii. Two azimuths were used to determine the accurate four-crown radii positions; one of the azimuths is perpendicular to the other one, and their direction was from south to north and west to east. In each quadrant, the crown radii were measured from the center of the target tree bole to the great extent of the crown from the target tree bole. Crown width was computed as the arithmetic mean of two-crown widths, and stand density was calculated by the sum of trees in every plot; then, the stand density index would be calculated:
SDI =
N(Dg/D0)
1.605, where
Dg and
D0 are the stand quadratic diameter and standard average diameter; in China,
D0 is 20 cm,
N is the number of trees in the forest stand, and 1.605 represents the self-thinning slope [
47]. Dominant diameter at breast height (
DT) was calculated according to the average value of the top five sample trees in terms of
D in every plot. Other stand variable calculation methods were used commonly. Out of the total measured trees (27,574), trees with DBH < 5 cm and height < 1.3 m were excluded, which retained 15,301 trees to be used in the modeling. This dataset was randomly partitioned into model fitting and model validation datasets, consisting of 70% and 30% of the total dataset, respectively. A data summary is presented in
Table 1. Relationships of
H,
CL, and
HCB with
D,
CW,
BAS,
Dg,
DT, and
SDI are shown in
Figure 1 and
Figure 2.
2.3. Selecting the Base Model
In this study, the monomolecular model for
H and
HCB and the logistics model for
CL were chosen as base models to develop the systems of
H,
CL, and
HCB models. We also evaluated other model forms in our preliminary analysis; however, the monomolecular model for
H and
HCB and the logistics model for
CL performed better in terms of the convergence and reduction of the sum of squared errors of both the model system and individual models of
H,
CL, and
HCB. Our other evaluated models are the Richards model, the Korf model, and the Weibull model. Bronisz and Mehtätalo [
48] chose a monomolecular model and a logistics model as the best models to fit
H and
CW for young silver birch stands on post-agricultural lands in central Poland and individual trees of Chinese fir (Cunninghamia lanceolata) in south-central China, as well as a height-to-crown base model (
HCB) for Mongolian oak in northeast China, based on covariates including
D,
H,
HCB. The following are the forms of base models that showed better fitting performance than many of the other models evaluated.
Base models:
where
a0,
a1,
b0,
b1,
b2,
c0, and
c1 are parameters to be estimated, and
,
, and
are residual errors of the models
H,
CL and
HCB, respectively.
2.4. Selecting Additional Independent Variables
D was chosen as the main independent variable for all three individual models because its correlations with dependent variables (H, CL, HCB) were highly significant. As mentioned earlier, H, CL, and HCB are inherently correlated to each other, and H is the sum of the other two. We consider the tree variables and stand variables that could improve the fitting accuracy of the base model (Equations (1)–(3)). They were included in the base models as covariate predictors; the final model forms with the added covariate predictors were determined through some statistical measures to be defined later, and in order to avoid multicollinearity among predictor variables, only those predictors that have insignificant correlations or correlation less than 0.3 were chosen as the predictors of models. Meanwhile, the constraint of the simultaneous system was defined by the matrix of variance-covariance of H, CL, and HCB. The individual tree-level variables and stand-level variables were finally chosen after many experiments. The variables that negatively affect the fitting accuracy and convergence of the models were eliminated.
2.5. Introducing a Dummy Variable
Since our data represent two forest types (spruce-fir forest, Mongolian oak forest), the effect of species-specific difference was described by one dummy variable that was included in the simultaneous model system (i.e., if species is spruce-fir, then sp = 1, otherwise 0). The error and trail approach was applied to locate the dummy variable to the right positions of the parameters in each base model.
2.6. Structure of Simultaneous Compatible and Additive Model Systems
A model system (MS) is composed of
H,
CL, and
HCB base models (Equation (4)), which is also known as simultaneous MS. The method had been identified by several studies.
where the
H,
CL, and
HCB denote the observed
H,
CL, and
HCB values of measured trees, respectively;
,
, and
are error terms of each model in the simultaneous equation system,
D is the diameter at breast height.
a0,
a1,
b0,
b1,
b2,
c0, and
c1 are the parameters to be estimated.
is the three-dimensional error vector that is assumed to be normally distributed, with zero means and the variance–covariance matrix Σ. The matrix Σ contains variances and covariances to interpret the inherent correlations of
H,
CL, and
HCB.
We assumed that the error terms (
,
, and
commonly compose the error vector
) would be unrelated among the observations; however, they would be contemporaneously correlated between the sub-models. Observations of
H,
CL, and
HCB would have a variance and covariance matrix (Equation (5)):
As is known,
H,
CL, and
HCB exist with the inherent correlations, and the sum of
CL and
HCB results in
H. The form of an additive model system is shown in Equation (6):
Our assumed error terms and structure of the variance-covariance matrix would be the same as defined in Equations (4) and (5).
2.7. Methods of Estimating Models
2.7.1. Ordinary Least Squares with Separating Regression
Base models (Equations (1)–(3)) and their modified variants were separately fitted by nonlinear ordinary least squares with separating regression (OLSSR).
2.7.2. Nonlinear Seemingly Unrelated Regression (NSUR)
Three base models (Equations (1)–(3)) were linked to form a single equation system through the inclusion of the noise terms. The hypothesis was that the noises or disturbances between the three models would be uncorrelated among all the observations. The covariance matrix of the disturbance terms, however, is generally unknown. A feasible generalized least squares (FGLS) algorithm is applied to estimate the matrix, and it is asymptotically efficient. The algorithm used for this propose is generally known as a “seemingly unrelated regression” (SUR) estimator. Parameters of the simultaneous equation system could be estimated by NSUR, with FGLS associated. The calculation steps we employed are below:
Step 1: The OLS method was used to fit the three base models to obtain residuals . Then, model residuals were used to estimate the variance and covariance of the matrix Σ with a 3 × 3 dimension (Equation (5)). This matrix was used to interpret the inherent correlations of H, CL, and HCB.
Step 2: The estimate of covariance matrix
was used to estimate the covariance matrix R in the form of covariance matrix
. The FGLS is used to estimate the parameter vector (Equation (7) [
49].
2.7.3. Attempting Other Estimation Methods
We also tried fitting simultaneous and additive MS using FIML (full information maximum likelihood), 2SLS (two-stage least square), and 3SLS (three-stage least square) with various start values from OLSSR but failed as none of the MS converged.
In the FIML, 2SLS, and 3SLS model systems, the endogenous variables (error-in variable), exogenous variable (error-out variable), and instrument variables (usually exogenous variables) are required to be set in advance. The instrument variables, endogenous variables, and exogenous variables were all needed in the 2SLS and 3SLS methods; the endogenous variables and exogenous variables were needed in the FIML method, but instrument variables were not required. Out of the prerequisites of these methods, the difficulty of model system convergence would be enlarged. When these methods were utilized to estimate the model system, their complexity will hinder the iteration and convergence.
2.8. Model Evaluation
Many statistical indices can be used to evaluate individual models and model systems constructed from them [
6]. We applied the most common statistical measures for this purpose, such as coefficient of determination (
R2), bias (
), bias variance (
), root mean square error (RMSE), and total relative error (TRE) [
50].
where
and
are the observed and predicted values of
H,
CL, or
HCB for the
ith observation value, respectively, for the
ith observation;
is mean value of samples;
N is the number of samples.
Both the model systems (compatible and additive model systems) were tested against the 30% data set allocated for validation. The same statistical indices (Equations (8)–(12)) that were used to evaluate the fitting performance of the individual models and model systems were also used to evaluate the prediction performance of the simultaneous and additive model systems. Unless otherwise stated, we used a 5% confidence level to evaluate the individual models and model systems.
The basic data processing and calculation of individual tree variables and stand variables were carried on with R-3.6.1 software. SAS 9.4 software was used for estimating base models separately and two model systems with the NLIN and MODEL procedure in the SAS/ETS modular system [
51].
The flowchart of the paper (
Figure 3) is shown below:
4. Discussion
In the previous studies [
14,
24,
42,
50], the simultaneous MS and additive MS almost always revolved around individual tree and stand biomass (stem, branch, foliage, bark, root, and total individual tree biomass), where the sum of the components’ biomass equals to the total biomass or the sum of four direction radii equals to twice the crown width. Biomass additivity and crown additivity are all about the additivity; their compatibility has been seldom mentioned. The NSUR is commonly used to solve the compatibility and additivity of these problems. Moreover, AP (adjustment in proportion) is also a common method to estimate biomass systems and crown radii systems. These methods [
31,
52,
53,
54,
55,
56] all ensure the additivity and compatibility of models. i.e., Fu et al. [
8] developed a system of nonlinear additive crown width models utilizing seemingly unrelated regression for Prince Rupprecht larch in northern China. The compatibility and additivity of
H,
CL, and
HCB is a desirable aspect for a system of models used for predicting
H,
CL, and
HCB with these predictors, including individual tree-level and stand-level variables. So far, however, no study has analyzed the compatibility and additivity properties of
H,
CL, and
HCB. In this study, the simultaneous MS and additive MS of
H,
CL, and
HCB were developed with the most widely applied estimation method, NSUR, due to its superiority to other methods (FIML, 2SLS, 3SLS, and OLS) as NSUR accounts for the inherent correlations between
H,
CL, and
HCB. The additive MS was superior to the simultaneous MS. It may be because the parameters in additive MS are shared by
H with
CL, and
HCB that the parameter estimation by the variance-covariance matrix
R (Equation (5)) is simpler than that of the simultaneous MS.
The base models of
H,
CL, and
HCB are of great importance for the simultaneous MS of
H,
CL, and
HCB model development. Albert [
57] developed a generalized nonlinear mixed-effects height–diameter model for Norway spruce in mixed-uneven aged stands, and 21 (
H-D) functions were evaluated for their fit performance; (Fu et al.) [
25,
32] developed a crown additive model fitted by NSUR and a generalized nonlinear mixed-effects height-to-crown base model with a logistics model; Yang [
58] developed a generalized nonlinear mixed-effects height-to-crown base model for Picea crassifolia trees in northwest China. In this study, due to the data attribute and the model attribute, when several forms of models would be simultaneously estimated, the convergence would be difficult to meet. The monomolecular model has two parameters; in addition, the fitting accuracy is relatively high and easily estimated when estimated together with
CL in the logistics model with many experiments; finally, the monomolecular model was the best base model for
H and
HCB, and the logistic model for
CL was selected, finally, as the base model. The accuracy of these models was definitely enhanced, and the errors were reduced to a great extent when the tree-level and stand-level variables were introduced as such variables significantly affect the growth of
H,
CL, and
HCB. The innovation of this study differs from previous studies: first, the developed simultaneous MS and developed additive MS of
H,
CL, and
HCB; second, the model forms change—the allometric growth biomass models are the best choice for biomass, but the base models of this study need to be tested by many model forms because the variation discrepancy between
H,
CL,
HCB and biomass is enormous.
Our study includes two types of forest stands; the growth differences between the two types of forest stands are native, so the introduction of dummy variables will be reasonable. The accuracy was improved for every sub-model when a dummy variable was introduced. The dummy variable model parameters were all significant (
p < 0.05) except for
b3 (
p = 0.2289) for
CL sub-model of Equation (20), indicating the discrepancy of forest two-kinds stand types is significant. The reason why only several individual tree- and stand-level variables were introduced into our base model was that introducing too many predictors into the systems could impede the computation convergence and result in over-parameterization that causes biased parameter estimates, as reported by Tang [
59]. Furthermore, including too many predictor variables will increase forest inventory costs. Therefore,
D,
CW,
HCB,
Dg, and
SDI are appropriate as covariate predictors in the system of the
H,
CL, and
HCB models.
The simultaneous MS of H, CL, and HCB and additive MS could be used for estimating part of the values of H, CL, and HCB of individual trees for Mongolian oak and spruce-fir in northeast China, where H, CL, and HCB are not simultaneously known because of some inconvenient reasons of measuring these predictors in regular forest inventories. D and CW can be measured to predict H by calculating the DT, Dg, and BAS of the stand. Then, D, DT, Dg, BAS, and SDI can be used to predict HCB, and, finally, CL will be calculated. The simultaneous MS of H, CL, and HCB models will greatly reduce the cost and time for the collection of required input variables in the field. The shortcoming of our model lies in the dearth of H as a predictor of HCB, according to many studies, because H is admitted as a robust variable to predict HCB. Therefore, the fitting accuracy of our HCB sub-model was comparatively lower than the other two models of H and CL. When it comes to the additive MS, H cannot be a covariate in the model. Hence, H would be excluded.
NSUR and OLSSR were utilized to estimate the parameters of the simultaneous MS of H, CL, and HCB, respectively, for simultaneously compatible model MS and additive MS. When NSUR was used, the correlation between H, CL, and HCB was taken into consideration by the variance-covariance matrix; the variance-covariance matrix is an excellent interpretation of their inner relationships. When OLSSR was used, the fitting accuracy was higher than NSUR, but it could not estimate H, CL, and HCB simultaneously. Compared with NSUR, the OLSSR never considers the correlation of sub-models, so the results produced from OLSSR would usually be biased, while NSUR considers the deeper relations of H, CL, and HCB and interprets the relations by R matrix, so the results from NSUR are unbiased. There will be more studies on applying the TSEM (two-stage error-in-variable method) and other methods to the simultaneous system of the H, CL, and HCB models in order to offer more efficient methods to solve these problems. We also tried fitting simultaneous and additive model systems using FIML, 2SLS, and 3SLS; however, none of the model systems converged. The methods of using FIML, 2SLS, and 3SLS to make the simultaneous and additive model systems converge will be our next study point.