1. Introduction
Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of algorithms and models that enable a computer to learn from a specific dataset and make predictions or take actions without being explicitly programmed [1,2]. One of the most widely used ML techniques is the artificial neural network (ANN), and the resilient backpropagation artificial neural network (RBPANN) performs supervised ML in multi-layer perceptrons. The main principle of resilient backpropagation is to eliminate the harmful influence of the size of the partial derivative on the weight step: only the sign of the derivative determines the direction of each weight update [3,4,5].
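As a minimal sketch of that principle, the following Python function applies one resilient backpropagation update to a list of weights. The step-size factors (1.2 and 0.5) and the step bounds are commonly cited defaults and are assumptions here, as are all function and variable names. The sketch is a simplified variant without weight backtracking, in which the stored derivative is reset to zero after a sign change; it is not necessarily the variant used in this study.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One simplified resilient backpropagation update.

    Only the sign of each partial derivative is used; its magnitude
    never enters the weight change. All defaults are assumptions.
    """
    new_weights, new_steps, kept_grads = [], [], []
    for w, g, g_prev, s in zip(weights, grads, prev_grads, steps):
        if g * g_prev > 0:
            # Same sign as in the previous iteration: enlarge the step.
            s = min(s * eta_plus, step_max)
        elif g * g_prev < 0:
            # Sign flipped: a minimum was jumped over, so shrink the
            # step and skip the update for this weight.
            s = max(s * eta_minus, step_min)
            g = 0.0
        sign = (g > 0) - (g < 0)
        new_weights.append(w - sign * s)
        new_steps.append(s)
        kept_grads.append(g)
    return new_weights, new_steps, kept_grads


# A very large and a very small derivative produce the same weight change.
big, _, _ = rprop_step([1.0], [1000.0], [0.0], [0.1])
small, _, _ = rprop_step([1.0], [0.001], [0.0], [0.1])
```

In this example `big` and `small` hold the same updated weight, because the step length depends only on the adaptive step size and not on the gradient magnitude.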
ANNs are computational models inspired by natural neurons, and they represent a generalization of mathematical models of human cognition or neural biology [1,6,7]. In ANNs, the training dataset is used to fit the network, whereas the testing dataset is a separate dataset used to evaluate the performance of the trained network [1,8,9].
One of the most important relationships in forest modeling is that between total tree height and diameter at breast height (h-dbh), and this relationship is usually applied in forest inventory or for height estimation in forest management and planning [10]. Knowledge of the h-dbh relationship is fundamental in both developing and applying many growth and yield models [11,12]. This relationship has mainly been studied with nonlinear mixed effect modeling (NLMEM) with fixed and random parameters for several species at the group level or under different ecological conditions [10,11,13,14,15,16]. Lately, this relationship has also been studied with AI, specifically with ML through ANNs [7,17]. Other variables such as crown width [18], biomass [19], volume [20], forest fire [21], and annual radial growth with competition indices [22] have also been studied with different ML algorithms. Occasionally, a clustering analysis based on unsupervised ML has been included to group similar data points together based on their inherent characteristics or similarities [1,23,24,25]. An unsupervised clustering analysis can identify patterns or structures in datasets to improve the fitted models in forest modeling. For AI algorithms, prediction is more important than inference. In this context, models or algorithms based on ANNs could give better estimations than the NLMEM approach, and this is worth reporting on.
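To illustrate how an unsupervised clustering step groups similar observations, the sketch below implements k-means (Lloyd's algorithm) with the Python standard library. k-means is used here as one common example of unsupervised clustering. The choice of clustering variables, the tree measurements, and the initial centroids are all hypothetical and invented for illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical (dbh in cm, height in m) pairs, invented for illustration.
trees = [(10.0, 8.0), (12.0, 9.0), (11.0, 8.5),
         (40.0, 22.0), (42.0, 23.0), (41.0, 22.5)]
labels, centers = kmeans(trees, centroids=[trees[0], trees[3]])
```

The resulting `labels` can then serve as the cluster-group identifier of each tree in a subsequent model. In practice the variables are usually standardized first and several random initializations are compared; both steps are omitted here for brevity.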
Specifically, in Mexican forestry, the h-dbh relationship has been extensively studied with NLMEM for local and generalized models, and occasionally an unsupervised cluster analysis was included in the modeling [12,26,27]. NLMEM is better than models fitted by the ordinary least squares method, and it uses random parameters to explain the variability between groups, sites, or ecological regions. Lately, ML algorithms have been used in forestry research, and the results have outperformed the NLMEM approach for the h-dbh relationship [7]. However, in Mexican forestry, ANN algorithms have not been applied to model the h-dbh relationship, and it is necessary to evaluate and compare these approaches. The main model used with NLMEM has been the Chapman–Richards model [28], which describes sigmoid growth as a function of age [29]. The significance of Durango pine (Pinus durangensis Martínez) extends beyond its ecological value; it also plays a pivotal role in wood production in the region of Durango, Mexico. Moreover, the state of Durango is the most important hub for timber production in Mexico. In the study area, Durango pine is the most frequent species in mixed-species forests.
Considering the above and the context of AI in forestry research, this study aims to model the h-dbh relationship for Durango pine by NLMEM and ANNs, with an unsupervised clustered dataset used for training and testing. The approaches were compared in both the training and testing phases, and conventional statistics such as the root mean square error, coefficient of determination, Akaike’s information criterion, Bayesian information criterion, and log-likelihood were used to evaluate them. The resilient backpropagation ANN (RBPANN) was employed, and three activation functions were computed and evaluated: hyperbolic tangent (RBPANN-tanh), softplus (RBPANN-softplus), and logistic (RBPANN-logistic). These networks were trained by resilient backpropagation, and maximum likelihood was also used. In summary, the primary objective of this research was to assess and contrast the efficacy of ANNs and NLMEM in modeling the h-dbh relationship for Durango pine. Additionally, a novel algorithm is proposed for accurately estimating total tree height using the diameter at breast height and cluster-groups as predictor variables.
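The comparison statistics named above can be sketched in a few lines of Python. The exact formulas used in this study are not given in this section, so the sketch assumes normally distributed residuals, a root mean square error with n in the denominator, and a log-likelihood evaluated at the maximum likelihood variance estimate. The function name, the parameter count, and the height values are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """RMSE, R2, Gaussian log-likelihood, AIC, and BIC (assumed forms)."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst
    # Gaussian log-likelihood with the variance set to sse / n
    # (requires sse > 0, i.e. a fit that is not perfect).
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n) - 2 * loglik
    return {"RMSE": rmse, "R2": r2, "logLik": loglik, "AIC": aic, "BIC": bic}


# Hypothetical observed and predicted heights (m), invented for illustration.
stats = fit_statistics([10.0, 12.0, 14.0, 16.0],
                       [11.0, 12.0, 13.0, 16.0], n_params=3)
```

Lower RMSE, AIC, and BIC and higher R2 and log-likelihood indicate a better fit. AIC and BIC differ only in how strongly they penalize the number of parameters.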
4. Discussion
Knowledge of the relationship between total tree height and diameter at breast height is essential for both the development and application of many growth and yield models. Models focusing on the h-dbh relationship serve as valuable tools for accurately predicting tree height based on dbh measurements. The dbh can be measured quickly, easily, and accurately, but the measurement of total tree height is comparatively complex, time consuming, and expensive [11].
NLMEM has been a capable approach for generating models of the h-dbh relationship for different species; it assumes fixed and random parameters for specific groups or covariates to study the variability within and between plots, ecological regions, or cluster-groups [10,16,37]. These models have also been studied in local and generalized formulations with the NLMEM approach [12,13,16,26]. In this case study, NLMEM performed strongly in modeling the h-dbh relationship for Durango pine, and the inclusion of an unsupervised clustering analysis improved the estimated parameters and their statistical properties [34,45]. The approach involves fixed parameters for the overall dataset in the training phase and random parameters for each cluster-group, in addition to parameters that give information about general variability and variability within cluster-groups.
The NLMEM demonstrated outstanding performance during the training phase, with the fitting process converging quickly and without difficulty. Additionally, the maximum likelihood approach yielded suitable results, particularly when expressing the asymptote parameter with mixed effects (Table 3 and Table 4). All parameters in the fitting process were significantly different from zero at a 5% significance level; the random parameters allowed suitable estimations in the training phase and were then used for the cluster-groups in the testing phase. The application of the NLMEM approach to the testing dataset gave results in line with expectations according to the fit statistics (Table 8). As an illustration, the root mean square error (RMSE) for the overall dataset during the testing phase was 3.1438 m, with an average value of 3.3773 m within the cluster-groups (Table 8). With mixed effects and the inclusion of cluster-groups, the Chapman–Richards growth equation [28] (Equation (2)) proved to be a highly effective model for predicting the height of Durango pine trees. Similar results have been found for several species and different conditions [11,16,26]. Even though the NLMEM method is accurate for height prediction based on diameter measurements, ANNs could be a suitable alternative for modeling the h-dbh relationship under several dataset conditions and with the incorporation of grouping strategies [7,14,46]. In recent times, there has been a growing application of AI and ML techniques in the fields of biology and forestry. These approaches have proven valuable in addressing challenges that require substantial computational resources and unsupervised learning methods [1,39]. Several of them have been employed in studying the h-dbh relationship, with successes reported for various species and under diverse forest management conditions, demonstrating their versatility and effectiveness [7,14,15,46]. In this context, the ANN model outperformed the NLMEM approach.
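Equation (2) is not reproduced in this section, so the sketch below assumes a common height-diameter form of the Chapman–Richards function, h = 1.3 + (a + u_j)(1 - exp(-b · dbh))^c, where u_j is a random effect of cluster-group j on the asymptote. The 1.3 m breast-height offset, the parameter names, and every numeric value are hypothetical and serve only to show how a mixed-effect asymptote shifts the predicted curve per group.

```python
import math


def chapman_richards_height(dbh, a, b, c, u=0.0):
    """Predicted total height (m) from dbh (cm); assumed functional form.

    a: fixed asymptote, b: rate, c: shape,
    u: random effect of the cluster-group on the asymptote.
    """
    return 1.3 + (a + u) * (1.0 - math.exp(-b * dbh)) ** c


# Hypothetical fixed parameters and cluster-group random effects.
fixed = {"a": 25.0, "b": 0.05, "c": 1.2}
random_effects = {1: -1.5, 2: 0.4, 3: 1.1}

# Predicted height of a 30 cm tree in each cluster-group.
predictions = {group: chapman_richards_height(30.0, u=u, **fixed)
               for group, u in random_effects.items()}
```

Setting `u=0.0` gives the population-level (fixed-effects only) prediction, which is what would be used for a tree whose cluster-group is unknown.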
In this study, the ANNs were evaluated and compared with the traditional NLMEM method. The ANNs used the resilient backpropagation (RBP) learning algorithm along with three activation functions. In most cases, the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic networks (Equations (7)–(9), respectively) exhibited superior performance compared to the results obtained by the NLMEM, during both the training and testing phases. The training statistics for the three ANNs (Table 6) showed better fitting performance than those obtained by the NLMEM (Table 4). This improvement was observed in both the overall dataset and the cluster-group analyses. These findings provide evidence that the clustering analysis using the k-means algorithm effectively grouped the dataset used in this study [34,45]. The RBPANN-tanh model, which employs a hyperbolic tangent activation function, demonstrated the highest performance in predicting height during both the training and testing phases (Table 6 and Table 9). Furthermore, the ranks and sums of ranks, based on the ranking system proposed by Kozak and Smith [44], provided evidence of the advantages of the ANN models over the NLMEM approach. Özçelik, Diamantopoulou, Crecente-Campo, and Eler [7] reported that models such as the RBPANN-logistic exhibited advantages over NLMEM when predicting the growth of Crimean juniper in the southern and southwestern regions of Turkey. Similar results have been reported regarding the advantages of ANNs or deep learning algorithms over the ordinary least squares model and NLMEM in both training and testing or validation phases [14,15,46,47]. In all cases, the implementation of ANNs exhibited significant advantages over traditional approaches when modeling the h-dbh relationship.
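For reference, the three activation functions compared here have the following standard definitions, sketched in Python. Equations (7)–(9) are not reproduced in this section, so these textbook forms are assumed to correspond to them; the overflow-safe rewrites are an implementation choice of this sketch.

```python
import math


def tanh(x):
    """Hyperbolic tangent: output in (-1, 1), centered at zero."""
    return math.tanh(x)


def softplus(x):
    """ln(1 + e^x): a smooth, strictly positive ramp.

    Rewritten as max(x, 0) + ln(1 + e^-|x|) to avoid overflow.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x):
    """1 / (1 + e^-x): output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of e^-x for very negative x
    return z / (1.0 + z)


# Evaluate each function on a few inputs to compare their output ranges.
inputs = [-2.0, 0.0, 2.0]
table = {f.__name__: [f(x) for x in inputs]
         for f in (tanh, softplus, logistic)}
```

The functions differ mainly in output range: tanh is bounded and symmetric about zero, the logistic function is bounded in (0, 1), and softplus is unbounded above.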
In this study, based on the implemented ranking system, the RBPANN-tanh model emerged as the top performer (residual and predicted values are shown in Figure 5 and Figure 6). It achieved a sum of ranks of 176 for the training phase and 81 for the testing phase. These sums of ranks account for both the overall dataset and cluster-groups, as shown in Table 7 and Table 10, respectively. In training, the RBPANN-softplus model ranked second, whereas in testing, the RBPANN-logistic model exhibited the second-best performance. On the other hand, the RBPANN-logistic model performed least effectively in the training phase, while the NLMEM demonstrated comparatively lower performance during the testing phase. The ANNs developed in this study (Figure 4) were trained using the RBP algorithm and evaluated with three different activation functions, giving the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic models. These models comprised a total of five layers, including three hidden layers. The training process involved ten repetitions to ensure robustness and accuracy. Even though the RBPANN-logistic converged in 88 steps, it exhibited relatively poorer performance compared to the RBPANN-tanh, which achieved better results within 301 steps. Interestingly, the RBPANN-logistic required a longer convergence time of 1885 steps, indicating its comparatively poorer performance in this aspect. As a result, the developed ANN model showed a high capability for predicting total tree height. This highlights the potential application of AI in modeling the h-dbh relationship, not only for Durango pine but also for general forest modeling purposes or other variables [6,19,48]. The ANNs could be used to improve estimations in forest inventory and in forest management and planning in mixed-species forests in Durango, Mexico. The findings from this study offer substantial evidence that ANNs can be effectively applied to predict the total tree height of Durango pine trees, using both the diameter at breast height and cluster-groups as predictive variables. The ANNs demonstrated robust performance for Durango pine trees and hold potential for use across diverse species and ecological regions, not only in Mexico but also worldwide. Furthermore, ANNs could find valuable applications in forest management and planning scenarios.
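The idea behind a sum-of-ranks comparison can be sketched as follows. This is a simple ordinal scheme in which each model is ranked per statistic (1 = best) and the ranks are summed, so that a lower total is better. It is an assumption of this sketch and may differ in detail from the procedure of Kozak and Smith [44]; the model names and statistic values are hypothetical and are not results of this study.

```python
def sum_of_ranks(stats, higher_is_better=()):
    """Rank models per criterion (1 = best) and sum the ranks.

    stats: {model: {criterion: value}}
    higher_is_better: criteria where larger values are better (e.g. R2).
    Ties are not given special treatment in this sketch.
    """
    models = list(stats)
    criteria = list(next(iter(stats.values())))
    totals = {m: 0 for m in models}
    for crit in criteria:
        reverse = crit in higher_is_better
        ordered = sorted(models, key=lambda m: stats[m][crit],
                         reverse=reverse)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return totals


# Hypothetical values, invented for illustration only.
example = {
    "model_A": {"RMSE": 3.0, "R2": 0.80, "AIC": 510.0},
    "model_B": {"RMSE": 2.8, "R2": 0.83, "AIC": 495.0},
    "model_C": {"RMSE": 3.2, "R2": 0.78, "AIC": 520.0},
}
totals = sum_of_ranks(example, higher_is_better={"R2"})
best = min(totals, key=totals.get)
```

Applying the same function separately to the overall dataset and to each cluster-group, and then adding the totals, would give one combined score per model and phase.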