3.4. Spectral Modeling: PLSR
The statistical parameters of the PLSR models developed in the first soil assessment are shown in Table 3. The models for Ca²⁺, K⁺, P, SB, CECe, CEC, and BS had R² values ranging from 0.70 to 0.86 in the calibration and validation steps, and were therefore classified as good or very good prediction models. In the prediction step, the R² values ranged from 0.63 to 0.82, giving models with moderate to very good quantitative prediction capabilities. Our values were similar to those found by Rodrigues et al. [28] for Ca²⁺, K⁺, and P (with R² above 0.76 in the calibration and validation steps), and were higher than those found by Veum et al. [66] for Ca²⁺, K⁺, and P.
Of the PLSR models developed in the first soil assessment (Table 3), those for K⁺, P, and CECe stood out, with R² values above 0.81. For P, the prediction R² was 0.82, while for K⁺ it was 0.81, both of which were higher than those reported by Zornoza et al. [22]. With R² values above 0.80, these models provide a very good estimate of the characteristics. For Mg²⁺, it was not possible to perform the prediction step, possibly due to the low R² in both the calibration and the validation steps. For S-SO₄²⁻, Al³⁺, and m%, it was not possible to develop good models, as they could not explain the variability of the data.
For the RMSE in the validation step (Table 3), there was only a slight variation compared to the calibration step, indicating models with good predictive capacity. These low RMSE values agree with the coefficients of determination, which were classified as good and very good. The same results were found in the other evaluations (Table 4, Table 5 and Table 6).
Based on the RPD results (Table 3), the PLSR models in the calibration phase were classified as highly reliable for Ca²⁺, K⁺, SB, CECe, and CEC, reliable for Mg²⁺, P, pH, and BS, and unreliable for S-SO₄²⁻, Al³⁺, H+Al, and m%. Our RPD values were higher than those reported by Veum et al. [66] for Ca²⁺, Mg²⁺, K⁺, P, and pH.
Based on the RPIQ values from the first evaluation (Table 3), the PLSR models were classified as poor for Ca²⁺, pH, and BS (in which only high and low values are distinguishable), regular for P, SB, and CECe (in which quantitative predictions are possible), and good for CEC (representing a good quantitative model). These results show that the RPIQ was a more rigorous criterion than the other parameters for classifying the developed models; even so, it was still possible to obtain regular and good models. In particular, the PLSR model for CEC was classified as good based on the RPIQ, and both the R² and the RPD indicated that it was a good and highly reliable model.
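The statistics used above to classify the models can be computed from paired observed and predicted values. The following Python sketch assumes the usual definitions (RPD as the standard deviation of the observed values divided by the RMSE, RPIQ as the interquartile range of the observed values divided by the RMSE, and bias as the mean of predicted minus observed). The function name and the quartile method are illustrative assumptions rather than details taken from this study, and the exact formulas used to build the tables may differ.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE, bias, RPD and RPIQ for paired observed/predicted values.

    Hypothetical helper: the definitions below are the common ones and are
    assumed here, not confirmed by the source text.
    """
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = statistics.fmean(observed)
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observed values (statistics default: 'exclusive' method).
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "bias": sum(residuals) / n,
        "rpd": statistics.stdev(observed) / rmse,
        "rpiq": (q3 - q1) / rmse,
    }
```

Because RPD and RPIQ both divide a measure of spread by the RMSE, they can rank the same model differently when the observed values are skewed, which is one reason the two criteria need not agree.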
In general, an improvement was seen in most PLSR models with iPLS selection (Table 3) compared to PLSR models without iPLS selection (Table A1), although no improvement was observed for S-SO₄²⁻, pH, Al³⁺, H+Al, and m%. The R² values in the calibration step were higher by about 16% for Mg²⁺ and P, 9% for CECe and CEC, and 4.5% for SB and BS (Table A1). For Ca²⁺ and K⁺, no increase in R² was observed. There was an average reduction of 16% in the calibration RMSE with iPLS for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS compared to the PLSR models without iPLS selection; in particular, the RMSE values for K⁺, P, and CEC showed reductions of 20%, 33%, and 24%, respectively. In the validation phase, there was an average increase of 10% in the R² values for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS, where the values for Mg²⁺, P, CECe, and CEC showed increases of 16%, 16%, 11%, and 11%, respectively. As in the calibration step, there was an average reduction of 16% in the RMSE for the validation step with iPLS, and in particular, the RMSE values for P, SB, CECe, and CEC showed reductions of 31%, 22%, 18%, and 27%, respectively. K⁺ and P stood out in the prediction phase, as they showed increases in R² of 15% and 27% and reductions in the RMSE of 17% and 53%, respectively. The prediction RMSE for CEC was also noteworthy, with a reduction of 33%.
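The percentage gains and reductions quoted above compare each statistic with and without iPLS selection. A minimal sketch of that arithmetic is given below, assuming the percentages are relative changes with respect to the model without iPLS; the function name and the numeric inputs are hypothetical and are not values from Table 3 or Table A1.

```python
def percent_change(without_ipls, with_ipls):
    """Relative change (%) of a statistic after iPLS selection.

    Positive values mean an increase (desirable for R2); negative values
    mean a reduction (desirable for RMSE). Assumes a relative, not
    percentage-point, comparison.
    """
    return 100.0 * (with_ipls - without_ipls) / without_ipls


# Hypothetical inputs chosen only to illustrate the calculation.
r2_gain = percent_change(0.70, 0.805)   # R2 rises from 0.70 to 0.805
rmse_drop = percent_change(3.00, 2.01)  # RMSE falls from 3.00 to 2.01
print(f"R2 change: {r2_gain:+.1f}%  RMSE change: {rmse_drop:+.1f}%")
```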
In the second soil assessment (Table 4), the PLSR models of Ca²⁺, K⁺, P, SB, CECe, CEC, and BS achieved the highest values of R², ranging from 0.73 to 0.90 in the calibration and validation steps; in the prediction step, the values of R² ranged from 0.67 to 0.91. This range of R² values indicates that the models have good to excellent quantitative prediction capabilities. These values were higher than those reported by Pinheiro et al. [26] and Veum et al. [66]. The results from the second soil assessment were also superior to those of the first assessment (Table 3).
In addition, the PLSR models developed for the prediction of SB, CECe, and CEC achieved R² values above 0.85; the value for CEC reached 0.91 (Table 4), and this model used fewer variables (93) and factors (three) than those for SB and CECe. For S-SO₄²⁻, Al³⁺, H+Al, and m%, it was not possible to develop good models in the second soil assessment (Table 4), as they could not explain the data variability.
Based on the RPD values (Table 4), the PLSR models of the calibration phase were classified as highly reliable for Ca²⁺, P, SB, CECe, CEC, and BS, reliable for Mg²⁺, K⁺, and pH, and unreliable for S-SO₄²⁻, Al³⁺, H+Al, and m%. Note that there was an improvement in the RPD for BS compared to the first evaluation, which was accompanied by an increase in R² values in all phases.
Based on the RPIQ values for the second evaluation (Table 4), the PLSR models were classified as poor for pH (meaning that only high and low values were distinguishable), regular for Ca²⁺, P, SB, and BS (meaning that quantitative predictions were possible), good for CECe, and excellent for CEC. These RPIQ values show an improvement in the predictive capacity of the models compared to the first evaluation, mainly for Ca²⁺, CECe, CEC, and BS. The PLSR model for CEC, in particular, was classified as excellent based on the RPIQ, and was shown to be a very good and highly reliable model based on both the R² and the RPD.
The better accuracy in the second evaluation can be explained by the longer reaction time for the limestone and phosphogypsum, which allowed for higher values of SB, CECe, CEC, and BS (Table 2). As a result of this greater amplitude, more robust and better-fitting models were obtained.
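One way to see why a wider range of reference values can raise range-normalised statistics is to hold the prediction error fixed and vary only the spread of the observed data. The sketch below does this for the RPD, assuming it is the standard deviation of the observed values divided by the RMSE; the two data sets and the RMSE are hypothetical and are not taken from Table 2.

```python
import statistics


def rpd(observed, rmse):
    """RPD as sample standard deviation of the observations over the RMSE (assumed definition)."""
    return statistics.stdev(observed) / rmse


# Hypothetical BS values (%): same mean, different amplitude.
narrow_range = [40.0, 42.0, 44.0, 46.0, 48.0]
wide_range = [20.0, 32.0, 44.0, 56.0, 68.0]
same_rmse = 3.0  # identical absolute prediction error in both cases

print(round(rpd(narrow_range, same_rmse), 2))
print(round(rpd(wide_range, same_rmse), 2))
```

With the error held constant, the wider data set yields the larger RPD, so part of the improvement between assessments may reflect the larger amplitude of the soil characteristics rather than a smaller absolute error alone.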
In general, there was an improvement in most PLSR models with iPLS selection (Table 4) compared to models without iPLS selection (Table A1). However, again, no improvements were observed for S-SO₄²⁻, pH, Al³⁺, or H+Al, and these results were repeated in the other evaluations. After iPLS selection, the R² values in the calibration step increased by 22% for Mg²⁺, 18% for K⁺, and 13% for SB and BS. Regarding the calibration RMSE, there was an average reduction of 28% for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS compared to PLSR models without iPLS selection, where the results for K⁺, SB, CEC, and BS showed reductions of 30%, 43%, 36%, and 29%, respectively. In the validation phase, there was an average increase of 14% in the R² values for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS; the R² values for Mg²⁺, K⁺, SB, and BS showed increases of 26%, 19%, 15%, and 18%, respectively. In the validation stage, there was also an average reduction of 30% in the RMSE with iPLS, with the RMSE for SB, CECe, and CEC showing reductions of 46%, 35%, and 30%, respectively. In the prediction phase, the results for SB, CECe, and BS showed increases of 24%, 19%, and 35% in R², and reductions in the RMSE of 38%, 36%, and 26%, respectively. Another noteworthy result was the prediction RMSE for CEC, with a reduction of 22%.
PLSR models with fits similar to those of the second soil assessment were also obtained for the third (Table 5) and fourth soil assessments (Table 6), two and three years after the treatment application (2016), respectively. In the fourth soil assessment, the PLSR models of Ca²⁺, K⁺, SB, CECe, CEC, and BS achieved the highest values of R², ranging from 0.83 to 0.92 in the calibration and validation steps and from 0.70 to 0.90 for the prediction step. This range of R² values indicates that these models have a good or very good capacity for quantitative prediction. These values were higher than those reported by Pinheiro et al. [26] and Terra et al. [25], but lower than those found by Rodrigues et al. [28] for Ca²⁺ and K⁺, for all three steps (calibration, validation, and prediction). However, it was not possible to develop good models for S-SO₄²⁻, pH, Al³⁺, H+Al, and m% for the calibration, validation, and prediction steps of the third and fourth soil assessments (Table 5 and Table 6). The prediction models for CEC and SB for the fourth soil assessment (Table 6) were notable, with R² values of 0.89 and 0.90, respectively, indicating a very good or excellent capacity for quantitative prediction.
For the RPD values, the results for the third assessment were similar to those of the second assessment (Table 5). The PLSR models for the calibration phase were classified as highly reliable for Ca²⁺, K⁺, P, SB, CECe, and CEC, reliable for Mg²⁺, pH, and BS, and unreliable for S-SO₄²⁻, Al³⁺, H+Al, and m%.
Based on the RPIQ values of the third assessment (Table 5), the PLSR models were classified as poor for Mg²⁺ (meaning that only high and low values were distinguishable), regular for Ca²⁺, K⁺, P, pH, and BS (meaning that quantitative predictions were possible), good for SB and CECe (meaning that these were good quantitative models), and excellent for CEC. There was an improvement in the predictive capacity of the SB model compared with the second evaluation, and this was classified as a good quantitative model. The classification of the PLSR model for CEC remained excellent, based on the RPIQ.
A comparison between PLSR models with iPLS selection (Table 5) and PLSR models without iPLS selection (Table A1) indicates that the R² values for the calibration stage increased by 7% for Ca²⁺ and Mg²⁺, and 8% for SB and CEC. In terms of the calibration RMSE, there was an average reduction of 14% for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS compared to the PLSR models without iPLS selection; in particular, the SB and CEC models showed reductions of 21% and 38%, respectively. In the validation phase, there was an average increase of 8% in the R² values for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS. The R² values for Mg²⁺ and SB were notable, with increases of 16% and 10%, respectively. Regarding the validation RMSE, there was an average reduction of 18% with iPLS, and in particular, the RMSE values for Ca²⁺, SB, and CEC showed reductions of 21%, 26%, and 43%, respectively. In the prediction phase, the results for K⁺ were notable, and showed an increase of 16% in R² and a reduction in RMSE of 29%. However, in the prediction phase, the PLSR models with iPLS developed in the third evaluation were not improved for most features, compared to the PLSR models without iPLS.
The RPD values found in the fourth evaluation (Table 6) indicate that the PLSR models of the calibration phase can be classified as highly reliable for Ca²⁺, K⁺, P, SB, CECe, CEC, and BS, reliable for Mg²⁺ and pH, and unreliable for S-SO₄²⁻, Al³⁺, H+Al, and m%. The classification for the BS model was elevated to highly reliable, demonstrating the evolution of the BS parameters throughout the evaluations.
Based on the RPIQ values for the fourth evaluation (Table 6), the PLSR models were classified as poor for Mg²⁺, pH, Al³⁺, and H+Al (meaning that only high and low values are distinguishable), regular for K⁺ (meaning that quantitative predictions are possible), good for Ca²⁺ and P, and excellent for SB, CECe, CEC, and BS. The predictive capacity of the SB, CECe, and BS models improved compared to the other assessments, and these were classified as excellent quantitative models.
The PLSR model for CEC remained excellent based on the RPIQ values over the last three evaluations, a result that can be explained by the association of CEC with mineralogy and soil granulometry [2]. Numerous studies have been performed using Vis-NIR-SWIR reflectance spectroscopy to predict the physical characteristics of soil, such as granulometry [50, 67]. The relationship of Vis-NIR-SWIR spectroscopy with the characteristics and physics of soil is already well established; however, works that include exchangeable ions, SB, CECe, CEC, and BS are still scarce, and mainly involve variable selection.
Since both the CEC and BS are necessary to calculate liming recommendations using the BS method [5], the results for R², RPD, and RPIQ show that it is possible to calculate liming recommendations based on the values estimated by the CEC and BS PLSR models.
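To illustrate how predicted CEC and BS values could feed a liming recommendation, the sketch below uses a commonly cited form of the base saturation method, NC (t/ha) = CEC × (V2 − V1) / 100 × (100 / PRNT), with CEC in cmolc/dm³, V1 the current BS (%), V2 the target BS (%), and PRNT the relative neutralising power of the limestone (%). That this is the exact formula of reference [5] is an assumption, and the function name, units, and input values are hypothetical.

```python
def lime_requirement_t_ha(cec_cmolc_dm3, bs_current_pct, bs_target_pct, prnt_pct=100.0):
    """Lime requirement (t/ha) by the base saturation method (assumed form).

    cec_cmolc_dm3  : CEC at pH 7, e.g. as estimated by the CEC PLSR model
    bs_current_pct : current base saturation (V1), e.g. from the BS PLSR model
    bs_target_pct  : target base saturation for the crop (V2)
    prnt_pct       : relative total neutralising power of the limestone
    """
    # No lime is recommended when the soil is already at or above the target.
    deficit = max(bs_target_pct - bs_current_pct, 0.0)
    return cec_cmolc_dm3 * deficit / 100.0 * (100.0 / prnt_pct)


# Hypothetical inputs: CEC = 8.0 cmolc/dm3, BS = 40%, target BS = 70%, PRNT = 80%.
print(lime_requirement_t_ha(8.0, 40.0, 70.0, prnt_pct=80.0))
```

Because CEC and BS enter the formula as a product, errors in the two predicted values propagate to the recommended rate, which is why the reliability of both models matters for this application.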
Once again, an improvement was noted in most PLSR models with iPLS selection (Table 6) compared to models without iPLS selection (Table A1). After iPLS selection, the R² values at the calibration stage were on average 4% higher, and in particular, the results for Ca²⁺ and BS showed increases of 7% and 9%, respectively. There was an average reduction of 18% in the calibration RMSE for models with iPLS for Ca²⁺, K⁺, SB, CECe, CEC, and BS compared to the PLSR models without iPLS selection; notable results were seen for Ca²⁺ and BS, which showed reductions of 39% and 27%, respectively. In the validation phase, there was an average increase of 4% in the R² values for Ca²⁺, Mg²⁺, K⁺, P, SB, CECe, CEC, and BS, and the values of R² for Mg²⁺ and BS increased by 7% and 8%, respectively. Again, in the validation phase, there was an average reduction of 13% in the RMSE for models with iPLS, where the values for Ca²⁺, K⁺, and BS showed reductions of 31%, 14%, and 17%, respectively. The results for K⁺ and BS were notable in the prediction phase, as these showed increases of 13% and 12% in R² and reductions of 11% and 48% in RMSE, respectively. The prediction RMSE for CEC was also notable, with a reduction of 50%. For Ca²⁺ and Mg²⁺, there were increases in RMSE of 15% and 4%, respectively, in the models with iPLS. These results indicate a greater reduction in the RMSE in the models with iPLS, particularly for the CEC and BS, which are characteristics used for liming recommendation in the BS method [5].
In most cases, the reduction in the RMSE values obtained by the PLSR models with iPLS (in the four evaluations) improved the predictive capacity of these models, since the RMSE measures the differences between the estimated values and the real values, and hence quantifies the accuracy of the model. Reductions in the RMSE values therefore gave rise to better accuracy for these models.
Furthermore, for all the characteristics studied in the four soil assessments (Table 3, Table 4, Table 5 and Table 6), the systematic error values (BIAS) in the calibration and validation phases were close to zero; these results demonstrate the very low bias of the models with regard to the estimated characteristics [68]. Willmott [69] states that PLSR models are considered to be good when the BIAS approaches zero, as was also the case in the prediction phase for characteristics with R² above 0.6 (Table 3, Table 4, Table 5 and Table 6).
Thus, the statistical parameters improved gradually from the first to the fourth assessment. This may have been because, as the treatments (limestone and phosphogypsum) reacted with the soil over the years, they generated greater amplitudes in the chemical characteristics of the soil, allowing better models to be obtained.