Artificial Neural Networks (ANNs) are widely applied in aquatic ecology, particularly for modelling water quality and biotic responses to environmental predictors. However, data scarcity is a common problem, which makes it necessary to optimize modelling approaches to overcome data limitations. In this paper, we investigate the optimal choice of k for k-fold cross-validation when building an ANN from a small water-quality data set. The ANN was created to model the chlorophyll-a levels of a shallow eutrophic lake (Mikri Prespa) located in northern Greece. The ANN’s inputs are typical water quality parameters: pH, dissolved oxygen, water temperature, phosphorus, nitrogen, electrical conductivity, and Secchi disk depth. The available data set was small, containing only 89 samples; for that reason, k-fold cross-validation was used to train the ANN. To find the optimal k, values ranging from 3 to 30 were tested. Additionally, leave-one-out (LOO) cross-validation was applied; LOO is the extreme case of k-fold cross-validation in which k equals the number of samples (here 89). The ANN’s performance indices showed a clear trend of improvement as k increased, and, as expected, the best results were obtained with LOO cross-validation. Computational times were recorded for each k value; even for LOO, the most computationally expensive option, the computational time remained relatively low, so LOO is recommended. Finally, a sensitivity analysis was performed with the ANN to investigate the relationships between the input parameters and chlorophyll-a, thereby examining the potential use of the ANN as a water management tool for nutrient control.
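The k-fold procedure summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ implementation. `kfold_indices`, `cross_validate`, `fit_mean` and `predict_mean` are hypothetical names. The `fit`/`predict` callables stand in for ANN training and inference, since the network architecture is not given here. Mean squared error serves as an example performance index. Setting k equal to the number of samples yields LOO.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Split]:
    """Partition sample indices 0..n-1 into k folds and return one
    (train_indices, test_indices) pair per fold. k == n gives LOO."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    base, extra = divmod(n, k)  # the first `extra` folds get one more sample
    folds: List[List[int]] = []
    start = 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    splits: List[Split] = []
    for f in range(k):
        # Train on every fold except f, test on fold f
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits


def cross_validate(X: Sequence[Sequence[float]], y: Sequence[float], k: int,
                   fit: Callable, predict: Callable, seed: int = 0) -> float:
    """Mean squared error of out-of-fold predictions; `fit` is called k times."""
    sq_err = 0.0
    for train, test in kfold_indices(len(y), k, seed):
        # A fresh model is trained on each fold's training subset
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            sq_err += (predict(model, X[i]) - y[i]) ** 2
    return sq_err / len(y)


# Hypothetical stand-in for the ANN: always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model
```

With n = 89 samples, k = 5 gives folds of 18, 18, 18, 18 and 17 samples. LOO (k = 89) trains 89 models on 88 samples each. Because `fit` is called once per fold, training cost grows roughly linearly with k.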
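The abstract does not state which sensitivity method was used. One common option is a one-at-a-time (OAT) perturbation, sketched below as an assumption rather than the authors’ procedure. Each input is raised by a relative step `delta` (10% here, an arbitrary choice) while the others are held at a baseline, and the resulting change in predicted chlorophyll-a is recorded. `oat_sensitivity` and `INPUT_NAMES` are hypothetical names, and `predict` is any trained model’s prediction function.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical input ordering, following the parameters listed in the abstract.
INPUT_NAMES: List[str] = [
    "pH", "dissolved_oxygen", "water_temperature", "phosphorus",
    "nitrogen", "electrical_conductivity", "secchi_depth",
]


def oat_sensitivity(predict: Callable[[Sequence[float]], float],
                    baseline: Sequence[float],
                    names: Sequence[str] = INPUT_NAMES,
                    delta: float = 0.10) -> Dict[str, float]:
    """One-at-a-time sensitivity: for each input, the change in the
    prediction when that input alone is scaled by (1 + delta)."""
    if len(baseline) != len(names):
        raise ValueError("baseline and names must have the same length")
    y0 = predict(baseline)
    effects: Dict[str, float] = {}
    for j, name in enumerate(names):
        x = list(baseline)
        x[j] *= 1.0 + delta  # perturb only input j; the rest stay at baseline
        effects[name] = predict(x) - y0
    return effects
```

Ranking the absolute values of the returned effects indicates which inputs the model’s prediction responds to most strongly near the chosen baseline. Because the perturbation is local, the ranking can differ at other baselines.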