Non-Linear Visualization and Importance Ratio Analysis of Multivariate Polynomial Regression Ecological Models Based on River Hydromorphology and Water Quality

Shah, Vishwa; Jagupilla, Sarath Chandra K.; Vaccari, David A.; Gebler, Daniel

doi:10.3390/w13192708

¹ Occoquan Watershed Monitoring Laboratory, Department of Civil and Environmental Engineering, Virginia Tech, 9408 Prince William Street, Manassas, VA 20110, USA

² Department of Civil, Ocean and Environmental Engineering, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030, USA

³ Department of Ecology and Environmental Protection, Poznan University of Life Sciences, Wojska Polskiego 28, 60-637 Poznan, Poland

* Author to whom correspondence should be addressed.

Water 2021, 13(19), 2708; https://doi.org/10.3390/w13192708

Submission received: 2 September 2021 / Revised: 26 September 2021 / Accepted: 27 September 2021 / Published: 30 September 2021

(This article belongs to the Section Ecohydrology)

Abstract

Multivariate polynomial regression (MPR) models were developed for five macrophyte indices. MPR models can capture complex interactions in the data while remaining tractable and transparent for further analysis. The performance of the MPR modeling approach was compared to previous work using artificial neural networks. The data were obtained from hydromorphologically modified Polish rivers with widely varying water quality. The modeled indices were the Macrophyte Index for Rivers (MIR), the Macrophyte Biological Index for Rivers (IBMR), and the River Macrophyte Nutrient Index (RMNI), which measure the trophic and ecological status of rivers. Additionally, two biological diversity indices, species richness (N) and the Simpson index (D), were modeled. The explanatory variables were physicochemical properties describing water quality and river hydromorphological status indices. The MPR models performed similarly to the artificial neural networks in terms of goodness of fit. However, the MPR models had advantages such as model simplicity, effective visualization of complex nonlinear input–output relationships, and support for sensitivity analysis using importance ratios to identify the effects of individual input variables.

Keywords:

multivariate polynomial regression; artificial neural networks; macrophyte indices; ecological status; rivers

1. Introduction

Several studies acknowledge the significance of accurately accounting for multiple sources of ambiguity when modeling ecological phenomena [1]. Additionally, the assessment of large waterbodies is a challenging task [2]. Many surveys use the response of aquatic communities to varying environmental conditions to identify river degradation, mainly eutrophication [3,4,5,6]. Several studies also focus on the quality of groundwater and of water bodies such as rivers, watersheds, and coastal environments, which plays an important role in public health and the environment [7,8]. Many countries in Europe have started developing river monitoring systems based on macrophytes in their national monitoring programs [9,10]. According to the Water Framework Directive (WFD) [11], the assessment of freshwater is based on ecological status, which consists of biological indicators (fish, macroinvertebrates, phytoplankton, phytobenthos, and macrophytes) supported by water quality and the physical conditions of ecosystems. The significance of macrophytes in biological river assessment is formally acknowledged under the WFD, and macrophytes are an essential element in the monitoring of ecological status and surface water quality. Macrophyte development depends on several abiotic and biotic factors, including nutrient concentrations, flow velocity, hydraulic conditions, pH, carbonate hardness, shading, and anthropogenic impacts [12]. There are several advantages in using macrophytes for biological monitoring. Macrophytes are immobile and therefore reflect local environmental changes [13]. Macrophytes are also comparatively extensive and considered easy to recognize. Additionally, degrading conditions due to human disturbance (hydromorphological alteration) have severely affected macrophytes in various waterbodies [14,15].
At the same time, the increasing amount of monitoring data creates opportunities for even better understanding of the relationships between different elements of the aquatic ecosystems [16,17].

Several models have been developed to predict the degradation of rivers, lakes, and other waterbodies. For example, Gebler et al. [18] and Krtolica et al. [19] used artificial neural networks [20] to relate macrophyte indices and biological diversity indices to water quality parameters and hydromorphological indices as explanatory variables. Further, Najafzadeh et al. [7] evaluated groundwater quality in the Rafsanjan basin, Iran, using artificial intelligence models such as M5 Model Tree (MT), Evolutionary Polynomial Regression (EPR), Gene-Expression Programming (GEP), and Multivariate Adaptive Regression Spline (MARS). These data-driven techniques have been widely used to study groundwater resources, artificial aquifer recharge, and the effects of global warming on groundwater quality. Artificial neural networks (ANNs) are advanced algorithms that can extract complex nonlinear relationships from such ecological datasets. However, a neural network can be considered a “black box” due to its low transparency: ANNs are difficult to communicate and to analyze. Additionally, the most often used back propagation (BP) algorithm has a very slow convergence rate [21], which is frequently made worse by adding nodes to the hidden layers or adding more hidden layers. ANNs are also more prone to overfitting, also referred to as “memorizing the noise”. ANN models are also limited in their ability to provide visual representations of the relationships between individual dependent and predictor variables.

The aim of this study was to use a nonlinear statistical approach based on multivariate polynomial regression (MPR) [22]. Multivariate statistical methods are valuable tools for understanding and interpreting complicated datasets [23]. MPR and other nonlinear modeling techniques have previously been used to analyze and interpret complex relationships involving water quality and the ecological status of surface water streams, as well as to identify possible factors that influence water systems [18,24,25]. MPR modeling using TaylorFit software [22] uses a stepwise fitting procedure in which candidate terms are tested and selected for addition to the model according to their statistical significance. MPR has previously been used to model biological sludge and chaotic time series [26]. TaylorFit can determine linear and nonlinear relationships between variables, including linear and nonlinear interactions among the predictors [24,27]. Such regression models have advantages over ANNs because they are representational models, which makes them easier to analyze using techniques such as sensitivity analysis with importance ratios and graphical approaches. Ecological data are immense, nonlinear, and complex, and include noise, repetition, and numerous outliers [28]. The relationships between dependent variables and predictors are also complex. Traditionally, many such biological models have been analyzed using univariate polynomial regression [29], but such models cannot reveal interactions between variables. The approach presented in this paper differs from previous approaches in handling a large amount of data, with 14 candidate predictors and 5 dependent variables. Additionally, sensitivity analysis using importance ratios is used to interpret the possible effects of the explanatory variables on the indices.

In this study, multivariate polynomial regression (MPR), a nonlinear statistical modeling technique, was used to model three ecological status indices and two biodiversity indices as functions of several physicochemical water quality properties and hydromorphological indices. The data come from a 2018 study [18] that describes the dataset and the artificial neural network models. The modeling was repeated using MPR to better understand and visualize the relationships between the predictors and the dependent variables, and to quantify the importance of each individual predictor for the dependent variables, which would be very cumbersome with ANN models. Furthermore, the MPR models themselves can easily be presented in this work for use by other investigators. The objectives of this study are to (i) develop MPR models to predict macrophyte indices in Polish rivers; (ii) perform importance ratio analysis for individual predictor variables; (iii) visualize the nonlinear relationships between the dependent and predictor variables graphically; and (iv) compare the goodness-of-fit statistics and model interpretations with earlier ANN models based on the same data.

2. Materials and Methods

2.1. Data Management

The data were obtained from over 200 survey sites in Poland representing hydromorphologically altered rivers with a wide range of water quality conditions, and comprised 14 explanatory variables. The macrophyte indices of ecological status modeled in this study were the Macrophyte Index for Rivers (MIR) [30], the River Macrophyte Nutrient Index (RMNI) [31], and the Macrophyte Biological Index for Rivers (IBMR) [9]. The Species Richness Index (N) and the Simpson Index (D) [32], which measure biological diversity, were also modeled. The data were previously modeled using artificial neural networks (ANNs) [18]. The three ecological status indices express the trophic degradation of river ecosystems. These indices are based on the coverage of macrophytes, rated on a five-grade (IBMR) or nine-grade (MIR, RMNI) scale, and on the indicator values of the species. These values define the average level of water trophy of each species and its tolerance to this factor. The Simpson index (D) is one of the most commonly used alpha diversity indices; it accounts for both the number of species present at the site and their relative abundance. The species richness index (N) is the number of species that occur at the site. The basic equations used to calculate each of these macrophyte indices are given in the Supporting Information (Table S1).

Table 1 shows the fourteen candidate predictor variables and the five indices (MIR, RMNI, IBMR, D, and N) to be predicted. The predictor variables were chosen to adequately represent aquatic vegetation, hydromorphological condition, and physicochemical properties of the water body. Hydromorphological assessment was carried out following the River Habitat Survey (RHS) method [33], which characterizes the physical state of a river through a set of indices. The Habitat Modification Score (HMS) is derived from data on human modification of the river channel (e.g., bank reinforcement, channel re-sectioning, culverting, number of weirs). The Habitat Quality Assessment (HQA) index estimates the habitat quality of the river through the diversity of features evaluated (e.g., number of different flow types, different substrates, and naturalness of land use). All fourteen available independent variables were tested as predictors of the five macrophyte indices.

2.2. Model Development

When the data were modeled using ANN [18], the dataset was randomly divided into three parts. To make comparisons meaningful, the same fit, cross, and validation datasets were retained for MPR modeling. The fit dataset (n = 140) was used to determine the regression coefficients. The cross dataset (n = 30) was used to compute global goodness-of-fit statistics for cross-validation: the coefficients were computed using the fit dataset, but a term was added to the model only if it improved the prediction of the cross dataset and had a p-value < 0.05. The validation dataset (n = 30) was used to compute the same global goodness-of-fit statistics for final validation, after the model had been finalized based on the fit and cross datasets. This approach minimizes the possibility of overfitting and helps to ensure that the resulting model is generalizable, i.e., applicable to new data.
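For readers who want to reproduce this kind of three-way partition on their own data, a minimal sketch follows. It assumes a table of 200 rows and uses NumPy's random generator; the function name and seed are hypothetical, and this study did not draw a new partition but reused the one from the earlier ANN work [18].

```python
import numpy as np


def split_fit_cross_validation(n_rows, n_fit=140, n_cross=30, n_valid=30, seed=0):
    """Randomly partition row indices into disjoint fit, cross, and validation sets.

    Sizes default to the 140/30/30 split described in the text; the seed and
    the use of NumPy's default_rng are illustrative assumptions.
    """
    if n_fit + n_cross + n_valid > n_rows:
        raise ValueError("requested subsets exceed the number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)  # shuffled row indices
    fit = order[:n_fit]
    cross = order[n_fit:n_fit + n_cross]
    valid = order[n_fit + n_cross:n_fit + n_cross + n_valid]
    return fit, cross, valid


fit_idx, cross_idx, valid_idx = split_fit_cross_validation(200)
print(len(fit_idx), len(cross_idx), len(valid_idx))
```

Fixing the seed makes the partition reproducible, which is what allows two modeling approaches to be compared on identical subsets.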

We developed five regression models, one for each index, using the 14 candidate predictors, each of the general form

Y = \sum_{i = 1}^{n} a_{i} \cdot X_{1}^{b_{i}} \cdot X_{2}^{c_{i}} \cdot X_{3}^{d_{i}} \cdot X_{4}^{e_{i}}, 1 \leq i \leq n

(1)

The exponents could be either integers or fractions. In this study, only integer exponents between −3 and +3 were considered. Preliminary examination of fractional exponents and of integer exponents outside this range did not yield significantly better models. Given this, and following the principle of Occam’s razor, higher absolute exponents, multipliers, and fractional exponents were excluded from detailed consideration. Equation (1) gives the general mathematical form of the MPR models produced by TaylorFit, in which each term is a coefficient a_i multiplied by up to four predictors X_1, …, X_4 (selected from the candidate predictors) raised to the exponents b_i, …, e_i. MPR is essentially an extension of multilinear regression (MLR) that includes interactions and other nonlinearities. Based on Taylor’s theorem, each index is described by a polynomial series with a finite number of terms, chosen to yield a given degree of accuracy.
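To illustrate how quickly the pool of possible terms grows with the allowable exponents and the number of multiplicands, the sketch below counts product terms under a simplifying assumption: each term multiplies distinct predictors, each raised to one nonzero integer exponent from −3 to +3, with at most four multiplicands per term (the limit used in this study). TaylorFit's own candidate-generation rules may differ, so the count is only illustrative, and the function name is hypothetical.

```python
from math import comb


def count_candidate_terms(n_predictors, max_multiplicands, exponents):
    """Count distinct product terms of up to `max_multiplicands` predictors.

    Assumes each term uses distinct predictors, each raised to one nonzero
    exponent from `exponents` (repeating a predictor would just merge powers).
    """
    k_exp = len(exponents)
    return sum(
        comb(n_predictors, k) * k_exp ** k  # choose k predictors, then k exponents
        for k in range(1, max_multiplicands + 1)
    )


# 14 predictors, up to 4 multiplicands, integer exponents -3..3 excluding 0
exps = [e for e in range(-3, 4) if e != 0]
print(count_candidate_terms(14, 4, exps))  # 1379280
```

Under these assumptions there are about 1.4 million possible terms, which is why a stepwise search rather than an exhaustive fit of all terms is used.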

Further, the number of candidate terms increases with the number of allowable exponents and the number of multiplicands. The maximum number of multiplicands for each candidate term used in this study was 4. The following stepwise algorithm was used to build the model:

The model always started with an intercept (the average of the dependent variable values). The software generated candidate terms that best combined with the existing terms of the model, subject to the allowable exponents and multiplicands set by the user. The candidate terms were ranked by their t-statistics on the fit data.
A candidate term was chosen from among the terms that were statistically significant on the fit data and that also improved the R² of the separate cross dataset. Both criteria had to be met for a term to be added to the model. This procedure reduced the possibility of overfitting and improved the generalizability of the model.
After any term was added to the model, the previously added terms were tested for statistical significance, and any that were no longer significant were removed.
The above process was repeated iteratively for additional candidate terms.
The model was thus built iteratively by adding and removing candidate terms, selected from among terms that were statistically significant on the fit dataset and that also improved the R² of the cross dataset, until the model could not be improved by the addition or removal of any single term.
After the model was complete, it was tested against a third, independent validation dataset. The performance on the validation dataset was used for comparison with the ANN models.
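The stepwise procedure above can be sketched in Python with NumPy. This is a minimal illustration, not TaylorFit itself: the helper names are hypothetical, the fixed threshold |t| ≥ 2.0 stands in for the p < 0.05 test (it approximates the two-sided 5% critical value of Student's t, about 1.98 for 120 residual degrees of freedom), and the candidate terms are supplied as an explicit list rather than generated from the existing model terms. The demo uses synthetic data, not the Polish river dataset.

```python
import itertools

import numpy as np


def term_column(X, term):
    """Evaluate one product term: the product of X[:, j] ** p over (j, p)."""
    col = np.ones(X.shape[0])
    for j, p in term:
        col = col * X[:, j] ** float(p)
    return col


def design(X, terms):
    """Design matrix: an intercept column followed by one column per term."""
    cols = [np.ones(X.shape[0])] + [term_column(X, t) for t in terms]
    return np.column_stack(cols)


def fit_with_t(A, y):
    """Least-squares coefficients and their t-statistics."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    mse = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(mse * np.linalg.pinv(A.T @ A)))
    return coef, coef / se


def cross_r2(terms, Xf, yf, Xc, yc):
    """Fit coefficients on the fit set, then score R² on the cross set."""
    coef, _ = fit_with_t(design(Xf, terms), yf)
    resid = yc - design(Xc, terms) @ coef
    return 1.0 - resid @ resid / np.sum((yc - yc.mean()) ** 2)


def stepwise_mpr(Xf, yf, Xc, yc, candidates, t_crit=2.0, max_iter=20):
    terms = []  # start from the intercept-only model
    best = cross_r2(terms, Xf, yf, Xc, yc)
    for _ in range(max_iter):
        # Forward step: candidates significant on the fit data, ranked by |t|.
        ranked = []
        for c in candidates:
            if c in terms:
                continue
            _, t = fit_with_t(design(Xf, terms + [c]), yf)
            if abs(t[-1]) >= t_crit:
                ranked.append((abs(t[-1]), c))
        ranked.sort(key=lambda item: item[0], reverse=True)
        added = False
        for _, c in ranked:
            score = cross_r2(terms + [c], Xf, yf, Xc, yc)
            if score > best:  # must also improve R² on the cross set
                terms, best, added = terms + [c], score, True
                break
        if not added:
            break
        # Backward step: drop the weakest term while any is not significant.
        while terms:
            _, t = fit_with_t(design(Xf, terms), yf)
            weakest = int(np.argmin(np.abs(t[1:])))  # t[0] is the intercept
            if abs(t[1 + weakest]) >= t_crit:
                break
            terms = terms[:weakest] + terms[weakest + 1:]
        best = cross_r2(terms, Xf, yf, Xc, yc)
    return terms, best


# Synthetic demo (not the study data): y depends on X0*X1 and 1/X2.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 2.0, size=(200, 4))
y = 2.0 + 3.0 * X[:, 0] * X[:, 1] - 1.5 / X[:, 2] + rng.normal(0.0, 0.1, 200)
Xf, yf, Xc, yc = X[:140], y[:140], X[140:170], y[140:170]
exps = (-1, 1, 2)
candidates = [((j, p),) for j in range(4) for p in exps]
candidates += [((j, p), (k, q))
               for j, k in itertools.combinations(range(4), 2)
               for p in exps for q in exps]
terms, score = stepwise_mpr(Xf, yf, Xc, yc, candidates)
print(terms, round(score, 3))
```

Each term is a tuple of (predictor index, exponent) pairs, so ((0, 1), (1, 1)) denotes X0·X1. The remaining 30 rows (X[170:]) would play the role of the independent validation set.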

2.3. Sensitivity Analysis

Sensitivity analysis was conducted for the predictor variables that appear in each MPR model. The local sensitivity index, δ, is given by [24,34]

δ = \frac{\partial f}{\partial X}

(2)

where f is the model prediction of the dependent variable (MIR, RMNI, IBMR, D, or N in our case) and X is the independent variable whose effect on the dependent variable is being examined (e.g., HMS, HQA, P_tot). However, local sensitivity indices of different variables cannot be compared with each other because the predictor variables have different units and ranges; therefore, they cannot be used to determine the relative importance of the predictors for the output. A more useful indicator of sensitivity accounts for the spread of each variable and normalizes the units. The importance ratio [34], defined in Equation (3), accomplishes both goals.

φ_{X} = \frac{\frac{\partial f}{s_{f}}}{\frac{\partial X}{s_{X}}} = δ \cdot \frac{s_{X}}{s_{f}}

(3)

where s_{X} and s_{f} are the standard deviations of the independent and dependent variables in the fit dataset, respectively. The importance ratio therefore indicates sensitivity relative to the spread of the variables.

For example, if the importance ratio for y versus x is 0.30, then increasing x by, say, 1% of its standard deviation would increase y by about 0.3% of its standard deviation. For nonlinear models, the importance ratio (and the sensitivity) is not a single value, as it is for linear models; it differs for each combination of values of the independent variables. It is therefore better to examine it as a distribution, so we reported the tenth, fiftieth, and ninetieth percentiles as well as a graphical display of the cumulative distribution. For a single measure of central tendency, an appropriate choice is the root-mean-square (RMS) value, and we used the RMS importance ratio as a single summary for comparing the impact of different independent variables on the prediction. These are examples of information that representational models such as MPR can provide but that is difficult to obtain with ANN models.
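A minimal sketch of Equations (2) and (3) for a fitted MPR model follows. Because every MPR term is a product of powers, ∂f/∂X_k can be taken analytically term by term. The model representation (an intercept plus a list of (coefficient, factors) pairs), the function names, the use of sample standard deviations (ddof=1), and the choice to evaluate φ at each observed row of the fit data are illustrative assumptions, not TaylorFit's internal implementation.

```python
import numpy as np


def mpr_predict(X, intercept, terms):
    """Evaluate an MPR model; each term is (coef, ((j, p), ...))."""
    y = np.full(X.shape[0], float(intercept))
    for coef, factors in terms:
        col = np.full(X.shape[0], float(coef))
        for j, p in factors:
            col = col * X[:, j] ** float(p)
        y = y + col
    return y


def local_sensitivity(X, terms, k):
    """Equation (2): analytic df/dX_k evaluated at every row of X."""
    delta = np.zeros(X.shape[0])
    for coef, factors in terms:
        powers = dict(factors)
        if k not in powers:
            continue  # this term does not involve X_k
        col = np.full(X.shape[0], float(coef) * powers[k])
        for j, p in powers.items():
            col = col * X[:, j] ** float(p - 1 if j == k else p)
        delta = delta + col
    return delta


def importance_ratios(X, y, terms, k):
    """Equation (3): phi = delta * s_X / s_f, one value per row of X."""
    delta = local_sensitivity(X, terms, k)
    return delta * X[:, k].std(ddof=1) / y.std(ddof=1)


def summarize(phi):
    """10th/50th/90th percentiles, RMS, and the empirical CDF of phi."""
    p10, p50, p90 = np.percentile(phi, [10, 50, 90])
    cdf_x = np.sort(phi)  # rank-ordered importance ratios
    cdf_y = np.arange(1, phi.size + 1) / phi.size
    return {"p10": p10, "p50": p50, "p90": p90,
            "rms": float(np.sqrt(np.mean(phi ** 2))),
            "cdf": (cdf_x, cdf_y)}


# Sanity check on a linear model f = 1 + 2*x: phi should be 1 everywhere.
X = np.arange(1.0, 11.0).reshape(-1, 1)
lin = [(2.0, ((0, 1),))]
y = mpr_predict(X, 1.0, lin)
phi = importance_ratios(X, y, lin, 0)
print(summarize(phi)["rms"])
```

For a linear model the importance ratio collapses to a single value, which is why the linear case makes a convenient check; for a real MPR model, s_f would be the standard deviation of the observed dependent variable in the fit data.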

MPR modeling using TaylorFit can produce models in the same form as the evolutionary polynomial regression (EPR) approach [35]. The difference lies in the optimization approach used to select model terms, which results in different constraints on the model. In addition, TaylorFit has built-in tools for sensitivity and importance ratio analysis, as described above.

3. Results

The MPR models obtained for the five indices are shown in Table 2. The t-statistics and the associated p-values, p(t), for the test of whether each coefficient differs from zero are shown in Table 3. Differences between macrophyte communities in different river systems make comparisons between river types difficult [36]. None of the fitted models can be applied as-is to other rivers, because those locations will have different physicochemical properties and hydromorphological status. However, a similar approach can work in other geomorphological regions and river types by generating new models following the procedure presented in this paper.

The MPR model for MIR had validation R² and MSE values of 0.580 and 10.86, respectively (Figure S1, Table 4), compared with 0.702 and 9.790 for the ANN model. Figure 1a shows the nonlinear response of MIR to HQA and conductivity. In the nonlinear plots, all model variables other than those shown on the plot were held at their average values.
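A nonlinear surface of this kind can be generated with a short helper that sweeps two predictors over their observed ranges while fixing every other predictor at its mean. This sketch assumes a generic `predict` callable that maps an (m, p) array to m predictions; the function name and grid size are hypothetical, and the plotting step itself (for example, a contour or surface plot) is omitted because the plotting tool used for the figures is not stated.

```python
import numpy as np


def response_grid(predict, X, i, j, n=25):
    """Evaluate predict() on an n-by-n grid over columns i and j.

    All other columns are held at their means, as in the nonlinear plots.
    Returns the two axis vectors and the (n, n) prediction surface Z,
    where Z[a, b] corresponds to (gi[a], gj[b]).
    """
    gi = np.linspace(X[:, i].min(), X[:, i].max(), n)
    gj = np.linspace(X[:, j].min(), X[:, j].max(), n)
    GI, GJ = np.meshgrid(gi, gj, indexing="ij")
    grid = np.tile(X.mean(axis=0), (GI.size, 1))  # every column at its average
    grid[:, i] = GI.ravel()
    grid[:, j] = GJ.ravel()
    Z = predict(grid).reshape(GI.shape)
    return gi, gj, Z


# Illustrative use with a toy predictor (not one of the study's models).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(50, 3))
toy = lambda A: A[:, 0] + 10.0 * A[:, 1] + 100.0 * A[:, 2]
gi, gj, Z = response_grid(toy, X_demo, 0, 1, n=5)
```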

Compared to ANN, MPR enabled a much better understanding of the predictors’ effects on the different macrophyte indices. Sensitivity analyses and other explanatory analyses used with ANNs can show relationships between variables [37], but they do not give precise information about the form of those relationships. A previous study [18] showed the general importance of the parameters for MIR and other macrophyte indices, whereas MPR revealed the specific relationships between the modeled indices and individual predictor variables. For MIR, as conductivity increased, MIR decreased at a decreasing rate, and as HQA increased, MIR increased. These results are consistent with other studies showing the importance of conductivity and hydromorphological conditions for macrophytes [38]. The cumulative frequency distributions of the importance ratios of the predictors for MIR (Figure 1b) showed that conductivity had a negative importance ratio for all observed values, indicating a negative relationship with MIR. The importance ratios were calculated using Equation (3) for all combinations of observed values, and a cumulative distribution was built by rank-ordering all calculated importance ratios. For MIR, the variable with the most impact on its variability was conductivity, as shown by its relatively high root-mean-square importance ratio (1.358; Table 5). HQA had mostly positive importance ratios, indicating a generally positive relationship with MIR across the observed conditions. pH, HMS, and BOD had lower importance ratios than HQA and conductivity. MIR increased with the HQA index, which is a broad measure of the diversity and “naturalness” of the river. The HQA score is determined by the presence and extent of habitat features of known wildlife value recorded in the survey. Conversely, as the HMS index, which assesses human modification of the river, increased, MIR decreased.

The RMNI model had validation R² and MSE values of 0.650 and 0.050, respectively (Figure S2, Table 4), compared with 0.715 and 0.050 for the ANN model. RMNI is highly sensitive to nutrient composition, as is evident in the nonlinear plot (Figure 2a): as total phosphorus and nitrogen increase, the RMNI index increases gradually [39] until it levels off. The importance ratio plots (Figure 2b) show positive effects on RMNI for all predictors except nitrate nitrogen, which had a negative effect. In contrast to MIR, RMNI did not show any relationship to HMS and did not seem to depend significantly on HQA either. These findings contrast with the results of modeling using neural networks but are consistent with the assumptions of the index [31]. For RMNI, the variable with the most influence on its variance was ammonia nitrogen, as shown by its relatively high root-mean-square importance ratio (14.640; Table 5).

The MPR model for IBMR had validation R² and MSE values of 0.470 and 0.360, respectively (Figure S3, Table 4), compared with an R² of 0.532 and an MSE of 0.410 for the ANN model. Figure 3a shows the nonlinear response of IBMR to HMS and organic nitrogen. Increases in HMS and organic nitrogen had a negative effect on IBMR, reflecting the human activities captured by the HMS index. The clear relationship with these two environmental variables is consistent with the IBMR assumptions [9], but has so far rarely been reported in other studies. The importance ratios of the independent variables (Figure 3b) showed negative impacts on IBMR, as observed for MIR. Nitrite nitrogen HMS seemed to have a positive correlation with IBMR. Organic nitrogen seemed to have both negative and positive effects on the index. However, biochemical oxygen demand did not seem to have a substantial importance ratio for IBMR. The highest influence on IBMR was shown by nitrite nitrogen, which had an RMS importance ratio of 14.607 (Table 5). This index gives weight to trophic level and heavy organic pollution [40], the effects of which were effectively captured in the MPR model through organic nitrogen and nitrite nitrogen. Unlike the other indices, IBMR has not previously been shown to respond to the HMS and HQA indices.

The models for the biological diversity indices D and N performed worst in terms of R². The MPR model for D had R² and MSE values of 0.237 and 0.010, respectively (Figure S4, Table 4), compared with 0.284 and 0.010 for the ANN-derived model.

The importance ratio plot for D (Figure 4b) showed that nitrate nitrogen had a negative effect on the index, whereas HQA had a significant positive relationship with D. Biochemical oxygen demand (BOD₅) had the highest RMS importance ratio, 7.324 (Table 5), indicating the dominance of this variable for D. The MPR model for species richness (N) had an R² of 0.330 and an MSE of 8.39 (Figure S5), whereas the ANN-derived model had a validation R² of 0.415 and an MSE of 9.29. The importance ratio plots for HMS, HQA, and conductivity (Figure 5b) showed a slight positive relationship with N for all observed values. Although nitrite nitrogen seemed to have a significant positive effect, it also seemed to have a negative correlation with N. The nonlinear plots for D and N (Figures 4a and 5a) both depend on HMS and HQA. These plots effectively captured the effect of human activities in the area: increasing HMS decreased both D and N. Conversely, increasing HQA, which reflects natural habitat in the area, appeared to produce marked increases in both indices. The modeling results for both biodiversity indices, D and N, are consistent with previous work on the relationship between biological diversity patterns and environmental factors in various types of ecosystems [41].

As can be seen, the same data can be modeled with different approaches, each producing its own performance results; but how can one determine whether the differences in performance are significant? The following approach works not only for comparing two MPR models, but also for comparing MPR with MLR, MPR with ANN, and so on. It can be carried out using the output from TaylorFit, but the calculation itself must be done in external software such as Excel.

Disparate models can be compared using the ratio of their MSE values. Note that MSE = SSE/df. The df is unambiguous for MLR or MPR models. For ANN models, however, there is disagreement in the literature about what the df should be. Some references indicate that df is much less than the number of weights in the ANN (which can be quite high); others indicate it should be higher. A conservative approach would be to assume that df equals the number of weights in the ANN.

In either case, once the MSE is computed for each of the two models, the ratio of the MSE of the model with the larger MSE (call it “model 1”) to that of the model with the smaller MSE (model 2) is the F-statistic for the comparison:

F = \frac{{MSE}_{model 1}}{{MSE}_{model 2}}

(4)

Then, the probability that a value of F this large or larger could occur by chance can be computed from the F-statistic and the degrees of freedom of the two models:

p (F) = f (F, {df}_{1}, {df}_{2})

(5)

In Excel, for example, p(F) can be computed using the built-in function “=FDIST(F, df1, df2)”. If p(F) ≤ α, one may conclude that model 2 has a significantly smaller (better) MSE.
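Outside Excel, the same right-tail probability can be computed directly. The sketch below is a dependency-free Python version of Equations (4) and (5): it evaluates the F survival function through the regularized incomplete beta function using the standard continued-fraction (Lentz) evaluation, as popularized by Numerical Recipes. In practice, `scipy.stats.f.sf(F, df1, df2)` gives the same quantity. The function names are hypothetical, and the df for each model must be supplied by the analyst, subject to the caveat above about df for ANN models.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(ln_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(F, df1, df2):
    """Right-tail probability P(F' >= F), like Excel's FDIST(F, df1, df2)."""
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * F))


def compare_mse(mse_a, df_a, mse_b, df_b):
    """Equations (4)-(5): larger MSE over smaller MSE, then its p(F)."""
    if mse_a >= mse_b:
        F, df1, df2 = mse_a / mse_b, df_a, df_b
    else:
        F, df1, df2 = mse_b / mse_a, df_b, df_a
    return F, f_sf(F, df1, df2)


# By symmetry, F = 1 with equal df gives p(F) = 0.5.
print(f_sf(1.0, 10, 10))
```

Ordering the two models inside `compare_mse` keeps F ≥ 1, so the right-tail probability matches the "larger MSE over smaller MSE" convention of Equation (4).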

Table 4 shows the results of comparing each MPR model with the corresponding ANN model. The F-statistic shown is the ratio of the MSE of the MPR model to that of the ANN model. The lowest value of p(F) is 0.377 (for MIR), indicating a 37.7% probability that a difference in MSE this large could occur by random error. Since this is much larger than the typical threshold α = 0.05, we cannot reject the null hypothesis that the MSE of the MPR models is equal to that of the ANN models.

In other words, although the ANN models produced higher R² values, their MSE values were not significantly lower, and we may conclude that the MPR models fit these data as well as the ANN models.

The Pearson correlations [42] between the independent variables are shown in Figure S6 (Supporting Information). Most independent variables are not strongly correlated; in any case, collinearity does not seem to be a concern, as MPR eliminates terms with high collinearity from the final model.

4. Conclusions

In this study, the macrophyte indices were modeled using the simple technique of multivariate polynomial regression, and sensitivity analysis was then performed to demonstrate the influence of the independent variables on the indices. The models for the ecological indices (MIR, RMNI, and IBMR) performed better than those for the diversity indices (N and D). MPR modeling can facilitate the interpretation of the ecological indices (MIR, RMNI, and IBMR) as well as the biological diversity indices (D and N). When the same data were modeled using artificial neural networks (ANNs), the models had higher R² values but were more complex, making it difficult to understand how the explanatory variables relate to the indices. Moreover, based on the F-statistic for comparing the MSE values of the two modeling approaches, none of the ANN models fit the data significantly better than the corresponding MPR models. In addition, importance ratio analysis improved the overall interpretation of river water quality status. Further, the MSE (mean square error) values of the MPR models were comparable to, and in some cases lower than, those of the ANNs, and their df (degrees of freedom) were much lower. This demonstrates that MPR offers additional advantages in visualization and sensitivity analysis.

Similar models can be developed to predict the eutrophication of other rivers where macrophytes play a major role. Because of their simplicity, MPR models could easily be adopted in the analysis of “big” ecological data, and nonlinear sensitivity analysis could be used to identify relationships between variables that are difficult to discover using black-box methods.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/w13192708/s1, Figure S1: Comparison of predicted and observed MIR, Figure S2: Comparison of predicted and observed RMNI, Figure S3: Comparison of predicted and observed IBMR, Figure S4: Comparison of predicted and observed D, Figure S5: Comparison of predicted and observed N, Figure S6: Correlation between Independent Variables, Table S1: Equations for Macrophyte indices calculations.

Author Contributions

Conceptualization, S.C.K.J., D.A.V. and D.G.; methodology, D.A.V., S.C.K.J. and V.S.; software, D.A.V.; validation, S.C.K.J., D.A.V. and V.S.; formal analysis, V.S. and S.C.K.J.; investigation, V.S. and S.C.K.J.; resources, D.A.V.; data curation, V.S. and S.C.K.J.; writing—original draft preparation, V.S.; writing—review and editing, D.A.V., S.C.K.J. and D.G.; visualization, V.S. and S.C.K.J.; supervision, S.C.K.J.; project administration, S.C.K.J. and D.A.V.; funding acquisition, D.A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article or supplementary material. The data presented in this study are available in “Non-Linear Visualization and Importance Ratio Analysis of Multivariate Polynomial Regression Ecological Models based on River Hydromorphology and Water Quality”.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hefley, T.J.; Broms, K.M.; Brost, B.M.; Buderman, F.E.; Kay, S.L.; Scharf, H.R.; Tipton, J.R.; Williams, P.J.; Hooten, M.B. The basis function approach for modeling autocorrelation in ecological data. Ecology 2017, 98, 632–646.
2. Milošević, D.; Mančev, D.; Čerba, D.; Piperac, M.S.; Popović, N.; Atanacković, A.; Đuknić, J.; Simić, V.; Paunović, M. The potential of chironomid larvae-based metrics in the bioassessment of non-wadeable rivers. Sci. Total Environ. 2018, 616, 472–479.
3. Gebler, D.; Szoszkiewicz, K.; Pietruczuk, K. Modeling of the river ecological status with macrophytes using artificial neural networks. Limnologica 2017, 65, 46–54.
4. Szoszkiewicz, K.; Jusik, S.; Lawniczak, A.E.; Zgola, T. Macrophyte development in unimpacted lowland rivers in Poland. Hydrobiologia 2010, 656, 117–131.
5. Meena, D.K.; Lianthuamluaia, L.; Mishal, P.; Swain, H.S.; Naskar, B.K.; Saha, S.; Sandhya, K.M.; Kumari, S.; Tayung, T.; Sarkar, U.K.; et al. Assemblage patterns and community structure of macro-zoobenthos and temporal dynamics of eco-physiological indices of two wetlands, in lower gangetic plains under varying ecological regimes: A tool for wetland management. Ecol. Eng. 2019, 130, 1–10.
6. Zuo, K.; Wu, Y.; Li, C.; Xu, J.; Zhang, M. Ecosystem-Based Restoration to Mitigate Eutrophication: A Case Study in a Shallow Lake. Water 2020, 12, 2141.
7. Najafzadeh, M.; Homaei, F.; Mohamadi, S. Reliability evaluation of groundwater quality index using data-driven models. Environ. Sci. Pollut. Res. 2021.
8. Najafzadeh, M.; Homaei, F.; Farhadi, H. Reliability assessment of water quality index based on guidelines of national sanitation foundation in natural streams: Integration of remote sensing and data-driven models. Artif. Intell. Rev. 2021, 54, 4619–4651.
9. Haury, J.; Peltre, M.C.; Tremolieres, M.; Barbe, J.; Thiebaut, G.; Bernez, I.; Daniel, H.; Chatenet, P.; Haan-Archipof, G.; Muller, S.; et al. A new method to assess water trophy and organic pollution – the Macrophyte Biological Index for Rivers (IBMR): Its application to different types of rivers and pollution. Hydrobiologia 2006, 570, 153–158.
10. Mikulyuk, A.; Martha, B.; Jennifer, H.; Catherine, H.; Ellen, K.; Kristi, M.; Michelle, N.E.; Daniel, O.L.; Kelly, W.I. A macrophyte bioassessment approach linking taxon-specific tolerance and abundance in north temperate lakes. J. Environ. Manag. 2017, 199, 172–180.
11. European Commission. Establishing a Framework for Community Action in the Field of Water Policy; Directive 2000/60/EC of the European Parliament and of the Council; European Commission: Brussels, Belgium, 2000.
12. Bytyqi, P.; Czikkely, M.; Shala-Abazi, A.; Osman, F.; Ismaili, M.; Hyseni-Spahiu, M.; Ymeri, P.; Kabashi-Kastrati, E.; Millaku, F. Macrophytes as biological indicators of organic pollution in the Lepenci River Basin in Kosovo. J. Freshw. Ecol. 2020, 35, 105–121.
13. Denny, P. Sites of Nutrient Absorption in Aquatic Macrophytes. J. Ecol. 1972, 60, 819–829.
14. Zhang, X.; Zhang, J.; Li, Z.; Wang, G.; Liu, Y.; Wang, H.; Xie, J. Optimal submerged macrophyte coverage for improving water quality in a temperate lake in China. Ecol. Eng. 2021, 162, 106177.
15. Damanik-Ambarita, M.N.; Everaert, G.; Forio, M.A.E.; Nguyen, T.H.T.; Lock, K.; Musonge, P.L.S.; Suhareva, N.; Dominguez-Granda, L.; Bennetsen, E.; Boets, P.; et al. Generalized Linear Models to Identify Key Hydromorphological and Chemical Variables Determining the Occurrence of Macroinvertebrates in the Guayas River Basin (Ecuador). Water 2016, 8, 297.
16. Carvalho, L.; Poikane, S.; Lyche-Solheim, A.; Phillips, G.; Borics, G.; Catalan, J.; Hoyos, C.D.; Drakare, S.; Dudley, B.J.; Järvinen, M.; et al. Strength and uncertainty of phytoplankton metrics for assessing eutrophication impacts in lakes. Hydrobiologia 2013, 704, 127–140.
17. Hering, D.; Borja, A.; Carstensen, J.; Carvalho, L.; Elliott, M.; Feld, C.K.; Heiskanen, A.-S.; Johnson, R.K.; Moe, J.; Pont, D.; et al. The European Water Framework Directive at the age of 10: A critical review of the achievements with recommendations for the future. Sci. Total Environ. 2010, 408, 4007–4019.
18. Gebler, D.; Wiegleb, G.; Szoszkiewicz, K. Integrating river hydromorphology and water quality into ecological status modeling by artificial neural networks. Water Res. 2018, 139, 395–405.
19. Krtolica, I.; Cvijanović, D.; Obradović, Đ.; Novković, M.; Milošević, D.; Savić, D.; Vojinović-Miloradov, M.; Radulović, S. Water quality and macrophytes in the Danube River: Artificial neural network modelling. Ecol. Indic. 2021, 121, 107076.
20. Tu, J.V. Advantages and Disadvantages of Using Artificial Neural Networks versus Logistic Regression for Predicting Medical Outcomes. J. Clin. Epidemiol. 1996, 49, 1225–1231.
21. Li, Y.; Rad, A.B.; Peng, W. An Enhanced Training Algorithm for Multilayer Neural Networks Based on Reference Output of Hidden Layer. Neural Comput. Appl. 1999, 8, 218–225.
22. Vaccari, D.A. TaylorFit Response Surface Analysis with Stepwise Multivariate Polynomial Regression. Available online: http://www.taylorfit-rsa.com/ (accessed on 15 August 2018).
23. Su, S.; Zhi, J.; Lou, L.; Huang, F.; Chen, X.; Wu, J. Spatio-temporal patterns and source apportionment of pollution in Qiantang River (China) using neural-based modeling and multivariate statistical techniques. Phys. Chem. Earth 2011, 36, 379–386.
24. Jagupilla, S.C.K.; Vaccari, D.A.; Hires, R.I. Multivariate Polynomial Time-Series Models and Importance Ratios to Qualify Fecal Coliform Sources. J. Environ. Eng. 2010, 136, 657–665.
25. Kazi, T.; Arain, M.; Jamali, M.; Jalbani, N.; Afridi, H.; Sarfraz, R.; Baig, J.A.; Shah, A.Q. Assessment of water quality of polluted lake using multivariate statistical techniques: A case study. Ecotoxicol. Environ. Saf. 2009, 72, 301–309.
26. Vaccari, D.A.; Wang, H.-K. Multivariate polynomial regression for identification of chaotic time series. Math. Comput. Model. Dyn. Syst. 2007, 13, 395–412.
27. Jagupilla, S.C.K.; Shah, V.; Ramaswamy, V.; Gurumurthy, P.; Vaccari, D.A. Prediction of Boundary and Stormwater E. Coli Concentrations Using River Flows and Baseflow Index. J. Environ. Eng. 2020, 146, 04020017.
28. Jongman, R.H.G.; ter Braak, C.J.F.; van Tongeren, O.F.R. Data Analysis in Community and Landscape Ecology; Cambridge University Press: Cambridge, UK, 1995.
29. Mead, R. A Note on the Use and Misuse of Regression Models in Ecology. J. Ecol. 1971, 59, 215–219.
30. Szoszkiewicz, K.; Jusik, S.; Pietruczuk, K.; Gebler, D. The Macrophyte Index for Rivers (MIR) as an Advantageous Approach to Running Water Assessment in Local Geographical Conditions. Water 2019, 12, 108.
31. Willby, N.; Pitt, J.A.; Phillips, G. The Ecological Classification of UK Rivers Using Aquatic Macrophytes; Environmental Agency Science Report: Bristol, UK, 2012.
32. Simpson, E.H. Measurement of diversity. Nature 1949, 163, 688.
33. Raven, P.J.; Holmes, N.T.H.; Dawson, F.H.; Everard, M. Quality assessment using river habitat survey data. Aquat. Conserv. Mar. Freshw. Ecosyst. 1998, 8, 477–499.
34. Schoefs, F. Sensitivity approach for modelling the environmental loading of marine structures through a matrix response surface. Reliab. Eng. Syst. Saf. 2008, 93, 1004–1017.
35. Giustolisi, O.; Savic, D.A. A symbolic data-driven technique based on evolutionary polynomial regression. J. Hydroinform. 2006, 8, 207–222.
36. Jusik, S.; Szoszkiewicz, K.; Kupiec, J.M.; Lewin, I.; Samecka-Cymerman, A. Development of comprehensive river typology based on macrophytes in the mountain-lowland gradient of different Central European ecoregions. Hydrobiologia 2014, 745, 241–262.
37. Gevrey, M.; Dimopoulos, I.; Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 2003, 160, 249–264.
38. Hachoł, J.; Bondar-Nowakowska, E.; Nowakowska, E. Factors Influencing Macrophyte Species Richness in Unmodified and Altered Watercourses. Pol. J. Environ. Stud. 2018, 28, 609–622.
39. Birk, S.; Willby, N. Towards harmonization of ecological quality classification: Establishing common grounds in European macrophyte assessment for rivers. Hydrobiologia 2010, 652, 149–163.
40. Saloua, B.; Abdelilah, R.; Lahsen, C.; Soumaya, H.; Elhassan, A. Evaluation of Biological Water Quality by Biological Macrophytic Index in River: Application on the Watershed of Beht River. Eur. Sci. J. ESJ 2017, 13, 217–244.
41. Thiebaut, G.; Guerold, F.; Muller, S. Are trophic and diversity indices based on macrophyte communities pertinent tools to monitor water quality? Water Res. 2002, 36, 3602–3610.
42. Rameshkumar, S.; Radhakrishnan, K.; Aanand, S.; Rajaram, R. Influence of physicochemical water quality on aquatic macrophyte diversity in seasonal wetlands. Appl. Water Sci. 2019, 9, 12.

Figure 1. (a) HQA–conductivity nonlinear relationship with MIR; (b) importance ratios of the independent variables for MIR.

Figure 2. (a) Total nitrogen–total phosphorus nonlinear relationship with RMNI; (b) importance ratios of the independent variables for RMNI.

Figure 3. (a) HMS–organic nitrogen nonlinear relationship with IBMR; (b) importance ratios of the independent variables for IBMR.

Figure 4. (a) HMS–HQA nonlinear relationship with D; (b) importance ratios of the independent variables for D.

Figure 5. (a) HMS–HQA nonlinear relationship with N; (b) importance ratios of the independent variables for N.
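Panel (a) of each figure shows how a model's prediction changes as two inputs vary together. A minimal sketch of how such a response surface can be tabulated: sweep two inputs over a grid and evaluate the fitted equation at every node while the remaining inputs are held fixed. It uses the N (species richness) equation from Table 2 and sweeps HMS and HQA over their Table 1 ranges, as in Figure 5a. The values in `FIXED_INPUTS` are hypothetical, holding the other inputs constant is an assumption (the article's exact procedure may differ), and the helper names are illustrative.

```python
"""Tabulate a 2-D response surface of the species-richness (N) model.

Sketch only: the N equation is transcribed from Table 2, while the fixed
values of the non-swept inputs are hypothetical.
"""

# Hypothetical fixed inputs, chosen inside the Table 1 ranges.
FIXED_INPUTS = {
    "pH": 7.5, "alkal": 200.0, "cond": 500.0, "BOD5": 3.0, "NNO3": 1.0,
    "NNO2": 0.05, "NNH4": 0.3, "Ntot": 3.0, "O2": 9.0,
}


def species_richness(v):
    """Evaluate the N model of Table 2; v maps variable names to values."""
    return (
        3.2233e1
        - 9.30e-2 * v["HMS"]
        - 9.4650e1 * v["pH"] / v["alkal"]
        - 3.438e-1 * v["alkal"] / v["HQA"]
        - 3.1979e2 * v["BOD5"] / v["cond"]
        - 1.9307e0 * v["NNO3"] / (v["HMS"] * v["NNO2"])
        + 1.2670e-4 * v["Ntot"] / (v["NNO2"] * v["NNH4"])
        - 9.64e-2 * v["cond"] * v["NNO2"] / v["O2"]
    )


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def response_surface(model, fixed, x_name, x_range, y_name, y_range, n=25):
    """Return xs, ys and grid, where grid[i][j] = model at (xs[j], ys[i])."""
    xs = linspace(x_range[0], x_range[1], n)
    ys = linspace(y_range[0], y_range[1], n)
    grid = []
    for y in ys:
        row = []
        for x in xs:
            inputs = dict(fixed)  # copy so the fixed values are never mutated
            inputs[x_name] = x
            inputs[y_name] = y
            row.append(model(inputs))
        grid.append(row)
    return xs, ys, grid


# Sweep HMS and HQA over their observed ranges (Table 1).
xs, ys, grid = response_surface(
    species_richness, FIXED_INPUTS, "HMS", (11.0, 108.0), "HQA", (6.0, 53.0)
)
```

Under these assumptions N rises with HQA at every HMS value, because HQA enters the N equation only through the negative alkal/HQA term. The grid (rows indexed by HQA, columns by HMS) has the layout expected by a contour routine such as matplotlib's `contourf(xs, ys, grid)`; the result illustrates the technique rather than reproducing the published figure.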

Table 1. The physicochemical parameters, hydromorphological indices, and macrophyte indices associated with their abbreviations, units, and ranges. Data from previous work [18].

Parameter	Abbreviation	Unit	Range
pH	pH	-	5.84–8.82
Conductivity	Cond.	µS·cm⁻¹	101–2250
Alkalinity	Alkal.	mg CaCO₃·dm⁻³	40–564
Total Phosphorus	P_Tot.	mg P·dm⁻³	0.03–2.56
Reactive Phosphorus	P_PO4	mg PO₄³⁻·dm⁻³	0.01–1.99
Nitrate Nitrogen	N_NO3	mg N-NO₃⁻·dm⁻³	0.02–5.74
Nitrite Nitrogen	N_NO2	mg N-NO₂⁻·dm⁻³	0.002–0.543
Ammonia Nitrogen	N_NH4	mg N-NH₄⁺·dm⁻³	0.01–7.75
Organic Nitrogen	N_Org.	mg N_org·dm⁻³	0.34–15.09
Total Nitrogen	N_Tot.	mg N·dm⁻³	0.19–24.82
Biochemical Oxygen Demand	BOD₅	mg O₂·dm⁻³	0.04–10.88
Dissolved Oxygen	O₂	mg O₂·dm⁻³	0.42–22.32
Habitat Quality Assessment	HQA	-	6–53
Habitat Modification Score	HMS	-	11–108
Macrophyte Index for Rivers	MIR	-	10.00–80.00
Macrophyte Biological Index for Rivers	IBMR	-	4.44–16.18
River Macrophyte Nutrient Index	RMNI	-	3.56–8.98
Species Richness	N	-	2–35
Simpson Diversity Index	D	-	0.01–0.92

Table 2. Polynomial equation for each MPR model (the notations follow Table 1).

Index	Equation
MIR	$3.34780 \times 10^{1} + 1.7305 \times 10^{0}\,(P_{\mathrm{tot}})^{-1}(N_{\mathrm{org}})^{-1} - 3.41020 \times 10^{1}\,(\mathrm{HQA})^{-1}(P_{\mathrm{tot}})^{-1} - 0.1163\,(\mathrm{HMS})(N_{\mathrm{org}})^{-1} + 1.7372 \times 10^{4}\,(\mathrm{pH})^{-1}(\mathrm{cond})^{-1} - 1.6 \times 10^{-3}\,(N_{\mathrm{NO3}})^{-2}(\mathrm{BOD}_{5})^{2} + 6.9444 \times 10^{-4}\,(P_{\mathrm{tot}})^{-2}(N_{\mathrm{NO2}})^{-1} + 7.5 \times 10^{-2}\,(N_{\mathrm{NO2}})(N_{\mathrm{NH4}})^{-2} - 9.92 \times 10^{-2}\,(P_{\mathrm{PO4}})(N_{\mathrm{tot}})^{2} - 3.3 \times 10^{-3}\,(P_{\mathrm{tot}})^{-2}(N_{\mathrm{NH4}})^{-1}$
RMNI	$8.2576 \times 10^{0} - 6.2496 \times 10^{-4}\,(\mathrm{HQA})^{2}(N_{\mathrm{NO3}})(N_{\mathrm{tot}})^{-1} - 1.416 \times 10^{-1}\,(\mathrm{alkal})^{-1}(P_{\mathrm{tot}})^{-1}(N_{\mathrm{NO2}})^{-1} + 1.1 \times 10^{-3}\,(\mathrm{cond})(\mathrm{alkal})^{-1}(N_{\mathrm{NO2}})^{-1} - 1.5393 \times 10^{-4}\,(P_{\mathrm{tot}})(N_{\mathrm{NH4}})^{-1}(\mathrm{O}_{2})^{2} + 2.2185 \times 10^{-5}\,(P_{\mathrm{tot}})^{-1}(N_{\mathrm{NO3}})^{-1}(N_{\mathrm{NO2}})^{-1}$
IBMR	$9.5335 \times 10^{0} - 2.30 \times 10^{-2}\,(P_{\mathrm{tot}})^{-1}(N_{\mathrm{NO2}})(\mathrm{BOD}_{5}) + 3.226 \times 10^{-1}\,(P_{\mathrm{tot}})^{-1}(N_{\mathrm{org}})^{-1} - 1.4 \times 10^{-3}\,(\mathrm{alkal})(P_{\mathrm{tot}})^{-1}(N_{\mathrm{tot}})^{-1} + 1.7618 \times 10^{6}\,(\mathrm{HMS})^{-2}(\mathrm{cond})^{-2}(N_{\mathrm{NH4}})^{-1} - 7.72 \times 10^{-2}\,(\mathrm{HMS})^{-2}(N_{\mathrm{NO2}})^{-2}(N_{\mathrm{org}})^{-3}$
D	$6.806 \times 10^{-1} - 3.5578 \times 10^{-6}\,(\mathrm{HQA})^{-1}(\mathrm{HMS})^{2}(\mathrm{O}_{2})^{2} - 8.8749 \times 10^{-6}\,(P_{\mathrm{PO4}})^{-2}(N_{\mathrm{NH4}})^{-1}(\mathrm{BOD}_{5})^{-2} - 3.88 \times 10^{-2}\,(\mathrm{HMS})^{-1}(N_{\mathrm{NO3}})^{-1}(\mathrm{BOD}_{5})^{2} + 1.0894 \times 10^{-5}\,(\mathrm{HQA})^{-2}(\mathrm{cond})^{2}(N_{\mathrm{NO3}})^{-1}$
N	$3.2233 \times 10^{1} - 9.30 \times 10^{-2}\,(\mathrm{HMS}) - 9.4650 \times 10^{1}\,(\mathrm{pH})(\mathrm{alkal})^{-1} - 3.438 \times 10^{-1}\,(\mathrm{HQA})^{-1}(\mathrm{alkal}) - 3.1979 \times 10^{2}\,(\mathrm{cond})^{-1}(\mathrm{BOD}_{5}) - 1.9307 \times 10^{0}\,(\mathrm{HMS})^{-1}(N_{\mathrm{NO3}})(N_{\mathrm{NO2}})^{-1} + 1.2670 \times 10^{-4}\,(N_{\mathrm{NO2}})^{-1}(N_{\mathrm{NH4}})^{-1}(N_{\mathrm{tot}}) - 9.64 \times 10^{-2}\,(\mathrm{cond})(N_{\mathrm{NO2}})(\mathrm{O}_{2})^{-1}$
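Each equation in Table 2 is an intercept plus a sum of terms, where each term is a coefficient times a product of inputs raised to integer powers (negative powers act as divisors). A minimal sketch of evaluating such an equation, assuming a hypothetical term-list representation that is not necessarily how the authors' software stores models; the coefficients and exponents are transcribed from the D (Simpson index) row. Negative exponents require nonzero inputs, which holds for every minimum in Table 1.

```python
"""Represent an MPR equation from Table 2 as a list of terms and evaluate it."""
from math import prod

# Each term is (coefficient, {variable: exponent}); {} marks the intercept.
# Values transcribed from the D (Simpson index) row of Table 2.
D_MODEL = [
    (6.806e-1, {}),
    (-3.5578e-6, {"HQA": -1, "HMS": 2, "O2": 2}),
    (-8.8749e-6, {"PPO4": -2, "NNH4": -1, "BOD5": -2}),
    (-3.88e-2, {"HMS": -1, "NNO3": -1, "BOD5": 2}),
    (1.0894e-5, {"HQA": -2, "cond": 2, "NNO3": -1}),
]


def term_values(model, inputs):
    """Return the value of each term (intercept first) for one observation."""
    values = []
    for coef, powers in model:
        for name, power in powers.items():
            if power < 0 and inputs[name] == 0:
                raise ValueError(f"{name} must be nonzero for exponent {power}")
        # prod() of an empty iterable is 1, so the intercept is just coef.
        values.append(coef * prod(inputs[name] ** power for name, power in powers.items()))
    return values


def predict(model, inputs):
    """Model prediction: the sum of all term values."""
    return sum(term_values(model, inputs))
```

With every input set to 1 the prediction reduces to the sum of the coefficients (about 0.6418), a quick check that a transcription is complete; `term_values` also shows how much each term contributes for a given observation.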

Table 3. t-statistics and p-values for the terms of the MPR models.

Index	Statistic	Term 1	Term 2	Term 3	Term 4	Term 5	Term 6	Term 7	Term 8	Term 9
MIR	t-stat	5.53	−7.40	−4.91	5.40	−3.50	4.28	3.79	−2.61	−1.98
MIR	p(t)	1.71 × 10⁻⁷	1.51 × 10⁻¹¹	2.71 × 10⁻⁶	3.08 × 10⁻⁷	6.42 × 10⁻⁴	3.60 × 10⁻⁵	2.30 × 10⁻⁴	1.00 × 10⁻²	5.00 × 10⁻²
RMNI	t-stat	4.77	9.05	6.04	3.28	2.94	-	-	-	-
RMNI	p(t)	4.72 × 10⁻⁶	1.55 × 10⁻¹⁵	1.49 × 10⁻⁸	1.30 × 10⁻³	3.90 × 10⁻³	-	-	-	-
IBMR	t-stat	−5.47	−2.76	8.79	−5.13	2.91	−4.08	-	-	-
IBMR	p(t)	2.14 × 10⁻⁷	6.00 × 10⁻³	6.33 × 10⁻¹⁵	1.00 × 10⁻⁶	4.00 × 10⁻³	7.76 × 10⁻⁵	-	-	-
D	t-stat	−5.73	−3.80	−2.25	2.58	-	-	-	-	-
D	p(t)	6.13 × 10⁻⁸	2.16 × 10⁻⁴	2.50 × 10⁻²	1.10 × 10⁻²	-	-	-	-	-
N	t-stat	−4.01	−3.70	−2.66	−2.77	−3.93	2.99	−2.26	-	-
N	p(t)	1.00 × 10⁻⁴	3.09 × 10⁻⁴	8.00 × 10⁻³	6.00 × 10⁻³	1.39 × 10⁻⁴	3.00 × 10⁻³	2.50 × 10⁻²	-	-

Table 4. Comparison of the mean square error (MSE) of the MPR and ANN models using the F-statistic (ANN data from previous work [18]).

Index	Model Type	Validation R²	Error Degrees of Freedom (df)	Mean Square Error (MSE)	F-Statistic	p(F)
MIR	MPR	0.58	12	10.856	1.11	0.373
MIR	ANN	0.702	52	9.790	1.11	0.373
RMNI	MPR	0.65	9	0.048	1.00	0.452
RMNI	ANN	0.715	52	0.050	1.00	0.452
IBMR	MPR	0.47	9	0.364	0.88	0.551
IBMR	ANN	0.532	52	0.411	0.88	0.551
D	MPR	0.237	8	0.008	1.00	0.447
D	ANN	0.284	52	0.009	1.00	0.447
N	MPR	0.33	11	8.392	0.90	0.544
N	ANN	0.415	52	9.288	0.90	0.544
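The F-statistics in Table 4 appear to be the ratio of the MPR to the ANN mean square error (for MIR, 10.856/9.790 ≈ 1.11), and the p(F) column appears consistent with the upper-tail probability of an F distribution whose degrees of freedom are the two error df values. A minimal stdlib-only sketch under that interpretation; the regularized incomplete beta function uses the standard continued-fraction (modified Lentz) scheme described in Numerical Recipes, and the function names are illustrative. Because the tabulated MSE values are rounded, recomputing F from them will not reproduce every entry exactly (for D, 0.008/0.009 ≈ 0.89 versus the reported 1.00).

```python
"""Compare two models' mean square errors with a one-sided F-test (stdlib only)."""
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("incomplete beta continued fraction did not converge")


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df1, df2):
    """P(F > f_stat) for F ~ F(df1, df2)."""
    if f_stat <= 0.0:
        return 1.0
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def compare_mse(mse_1, df_1, mse_2, df_2):
    """F = mse_1 / mse_2 with (df_1, df_2) degrees of freedom, and its p-value."""
    f_stat = mse_1 / mse_2
    return f_stat, f_upper_tail(f_stat, df_1, df_2)


if __name__ == "__main__":
    # MIR row of Table 4: MPR (MSE 10.856, df 12) versus ANN (MSE 9.790, df 52).
    f_stat, p_value = compare_mse(10.856, 12, 9.790, 52)
    print(f"F = {f_stat:.2f}, p(F) = {p_value:.3f}")
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` computes the same upper-tail probability. A large p(F), as in every row of Table 4, indicates that the difference between the two models' MSEs is not statistically significant.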

Table 5. Independent variable importance ratio statistics.

Index	Independent Variable	10th Percentile	Median	90th Percentile	Root Mean Square (RMS)
MIR	HMS	−0.437	−0.213	−0.108	0.295
	HQA	0.097	0.250	0.666	0.659
	pH	−0.057	−0.035	−0.018	0.044
	BOD5	−0.098	−0.005	0.000	0.424
	Conductivity	−1.410	−0.467	−0.112	1.358
RMNI	HQA	0.002	0.006	0.019	0.020
	Ptot	−0.001	0.475	4.122	8.350
	NNO3	−0.285	0.080	0.290	8.737
	NNH4	0.002	0.074	1.273	14.640
	Conductivity	0.000	0.000	0.002	0.000
IBMR	HMS	−0.561	−0.482	−0.396	0.892
	Norg	−2.107	−0.295	−0.038	3.993
	NNO2	−0.278	−0.024	2.907	14.607
	BOD5	−0.169	−0.031	−0.009	0.173
	Conductivity	−0.245	−0.013	−0.001	0.797
D	HMS	−0.427	−0.127	0.074	0.580
	HQA	−0.115	0.023	0.264	0.589
	NNO3	−0.654	0.000	0.755	4.638
	BOD5	−0.370	−0.038	0.672	7.324
	Conductivity	0.010	0.057	0.279	0.262
N	HMS	−0.602	−0.513	0.128	0.803
	HQA	0.067	0.201	0.747	0.647
	NNO2	−0.291	0.516	8.040	6.463
	BOD5	−0.619	−0.367	−0.182	0.466
	Conductivity	−0.024	0.208	0.879	0.798
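Table 5 summarizes, for each input, the distribution of its importance ratio across the observations: two tail percentiles, the median, and the root mean square. The importance ratios themselves are defined in the article's sensitivity analysis and are not recomputed here; the sketch below only shows how the four summary columns could be obtained from a list of per-observation ratios. The percentile definition (linear interpolation via `statistics.quantiles` with `method="inclusive"`) and the example values are assumptions.

```python
"""Summarize per-observation importance ratios as in the columns of Table 5."""
from math import sqrt
from statistics import median, quantiles


def summarize_importance_ratios(ratios):
    """Return the 10th percentile, median, 90th percentile and RMS of ratios.

    Percentiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive"); this definition is an
    assumption and may differ from the one used for Table 5.
    """
    if len(ratios) < 2:
        raise ValueError("at least two ratios are required")
    deciles = quantiles(ratios, n=10, method="inclusive")  # 9 cut points
    rms = sqrt(sum(r * r for r in ratios) / len(ratios))
    return {"p10": deciles[0], "median": median(ratios), "p90": deciles[-1], "rms": rms}


# Hypothetical ratios for one input variable: mostly small, one extreme value.
example = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.0]
summary = summarize_importance_ratios(example)
```

Because RMS squares every value, a few large ratios can push it well above the 90th percentile; this is the pattern in Table 5 for, e.g., NNH4 in the RMNI model (90th percentile 1.273, RMS 14.640).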

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shah, V.; Jagupilla, S.C.K.; Vaccari, D.A.; Gebler, D. Non-Linear Visualization and Importance Ratio Analysis of Multivariate Polynomial Regression Ecological Models Based on River Hydromorphology and Water Quality. Water 2021, 13, 2708. https://doi.org/10.3390/w13192708

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
