Article

Evaluation of Seasonal and Spatial Variations in Water Quality and Identification of Potential Sources of Pollution Using Multivariate Statistical Techniques for Lake Hawassa Watershed, Ethiopia

by
Semaria Moga Lencha
1,2,*,
Mihret Dananto Ulsido
2,3 and
Alemayehu Muluneh
2
1
Faculty of Agriculture and Environmental Sciences, University of Rostock, 18051 Rostock, Germany
2
Faculty of Biosystems and Water Resource Engineering, Institute of Technology, Hawassa University, Hawassa P.O. Box 05, Ethiopia
3
Center for Ethiopian Rift Valley studies, Hawassa University, Hawassa P.O. Box 05, Ethiopia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(19), 8991; https://doi.org/10.3390/app11198991
Submission received: 2 August 2021 / Revised: 20 September 2021 / Accepted: 23 September 2021 / Published: 27 September 2021
(This article belongs to the Special Issue Water Quality Modelling, Monitoring and Mitigation)

Abstract

The magnitude of pollution in Lake Hawassa, which is hydrologically closed and retains the pollutants entering it, has been exacerbated by population growth and economic development in the city of Hawassa. This study therefore aimed to examine seasonal and spatial variations in the water quality of Lake Hawassa Watershed (LHW) and to identify possible sources of pollution using multivariate statistical techniques. Water and effluent samples from LHW were collected monthly for analysis of 19 physicochemical parameters during dry and wet seasons at 19 monitoring stations. Multivariate statistical techniques (MVST) were used to investigate the influences of anthropogenic intervention on the physicochemical characteristics of water quality at monitoring stations. Through cluster analysis (CA), all 19 monitoring stations were spatially grouped into two statistically significant clusters for the dry and wet seasons based on pollution index, which were designated as moderately polluted (MP) and highly polluted (HP). According to the study results, rivers and Lake Hawassa were moderately polluted (MP), while point sources (industry, hospitals and hotels) were found to be highly polluted (HP). Discriminant analysis (DA) was used to identify the most critical parameters to study the spatial variations, and seven significant parameters were extracted (electrical conductivity (EC), dissolved oxygen (DO), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), sodium ion (Na+), and potassium ion (K+)) with the spatial variance to distinguish the pollution condition of the groups obtained using CA. Principal component analysis (PCA) was used to qualitatively determine the potential sources contributing to LHW pollution. In addition, three factors determining pollution levels during the dry and wet season were identified to explain 70.5% and 72.5% of the total variance, respectively.
Various sources of pollution are prevalent in the LHW, including urban runoff, industrial discharges, diffused sources from agricultural land use, and livestock. A correlation matrix with seasonal variations was prepared for both seasons using physicochemical parameters. In conclusion, effective management of point and non-point source pollution is imperative to improve domestic, industrial, livestock, and agricultural runoff to reduce pollutants entering the Lake. In this regard, proper municipal and industrial wastewater treatment should be complemented, especially, by stringent management that requires a comprehensive application of technologies such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips. Furthermore, application of indigenous aeration practices such as the use of drop structures at critical locations would help improve water quality in the lake watershed.

1. Introduction

Studies have shown that urban, agricultural, and industrial discharges have a direct effect on surface water quality. Similarly, urban wastewaters cause fecal contamination of surface waters, and urban stormwater runoff, which contains large amounts of fecal microbes, also affects surface water quality [1]. Surface water bodies are vital natural resources that are vulnerable to pollution. The contaminants are chemical, physical, and biological constituents resulting from anthropogenic activities and are of greater environmental consideration [2]. Surface water bodies are extensively used as the major sources for domestic, non-domestic, industrial, and irrigation purposes. Therefore, monitoring and assessment of water bodies is imperative to obtain reliable information on water quality for effective management [3]. Anthropogenic uses of the waterbodies in the study basin can degrade the quality of surface water and impair its usability as potable water supply or for industry, agriculture, recreation, or other purposes. Hence, regular monitoring of water quality of rivers and lake is indispensable [4,5]. The most affected river stretches are those that flow through urbanized and exceedingly populated urban areas where there is no adequate sanitation. Upstream rural areas are mainly affected by pollutants from non-point sources such as agricultural runoff, whereas urban areas are polluted by point sources, sewage discharges, urban runoff, and pollutants from upstream areas [6,7].
Studies have shown that some lakes and wetlands around the world have disappeared or are showing changes in their ecosystem. Furthermore, factors such as intensive land use for urbanization and agriculture have had significant impact on the hydrology, ecology, and ecosystem services of lakes, which has eventually led to a decline in lake levels [8]. In addition, pollutants have long been a concern, as their accumulation can have serious effects on fauna, flora, and human health when the huge amount of urban and industrial wastewater reaches the shores [9].
Lake Hawassa is located near the city of Hawassa and is surrounded by agricultural land, industries and residential areas. Therefore, it is susceptible to a variety of pollutants that enter the lake directly or indirectly. On the other hand, the Lake Hawassa Watershed is experiencing rapid land cover change, and natural resources have overwhelmingly diminished. The lake is hydrologically closed and has no apparent outlet, so all pollutants entering the lake are retained. As a result, the lake faces numerous problems, and the water quality deteriorates over time, threatening biodiversity [10].
Significant industrialization, augmented with rapid urbanization and increasing economic development, has increased the extent of pollution [11]. The pollution is mainly from non-point sources caused by urban and agricultural runoff, overgrazing, deforestation, soil erosion, land development, and industrial effluents. This leads to numerous environmental concerns that have resulted in substantial hydrological disturbances. The main factories in the study area are a ceramics factory, a flourmill, a cement products factory, a Moha soft drink factory, a BGI (St. George Brewery factory), an Etabs soap factory, an industrial park in Hawassa, and other small-scale industries. They are virtually all concentrated along the main road, which is close to the shallow swamp, and discharge their effluents into the lake through streams. On the other hand, deforestation and irrigation of the land have caused the drying up of Lake Cheleleka by reducing the streamflow [12].
Various studies have been conducted to examine water quality in the LHW catchment and identify sources of pollution. Teshome [11] investigated the eastern catchment of Lake Hawassa Watershed to assess the seasonal water quality and its suitability for the designated uses. The findings revealed that the rivers in the eastern part of Lake Hawassa Watershed are suitable for agriculture and livestock but unsuitable for aquatic life, and that the lake is hypereutrophic.
Amare [13] investigated the primary sources of non-point source pollution and their relative contribution in Lake Hawassa Watershed using the Annualized Agricultural Non-Point Source (AnnAGNPS) model. The pollutant-loading model revealed non-point source pollutants originating from agricultural lands and associated with deleterious anthropogenic activities responsible for the water quality impairment of Lake Hawassa. These point sources have been determined to be the source of numerous pollutants in the lake ecosystem if the effluent control system put in place is unsuitable [14].
Kebede [15] studied the impact of land cover changes on water quality and streamflow in Lake Hawassa Watershed and concluded that water quality in the upper watershed of the three rivers was better than the lower sections of the catchment with respect to the parameters studied, which might be correlated to the observed land use.
A study conducted by Lencha et al. [16] at Lake Hawassa revealed that most of the population, including the inner part of the city, are using latrines. Larger buildings have conventional flushing systems but without any wastewater treatment. Furthermore, industrial and commercial point sources are known to discharge their effluents into streams or rivers that end up in the Lake. In addition, Hawassa Industrial Park and Referral Hospital discharge their effluents directly into the lake. This is a threat to the people who rely on rivers, streams, and the lake for domestic and other purposes and to the survival of aquatic life as well.
To sum up, some studies regarding the water quality have been conducted in either the eastern or the western catchment of Lake Hawassa, while others have been carried out only at Lake Hawassa. Nonetheless, there is no sufficient water quality study to connect agricultural and urban land use with the watershed pollution level to identify the sources of pollution. The previous studies mainly relied on random monitoring and data from literature and focused only on a few water quality parameters, which cannot reflect the whole picture of water quality in the watershed. Additionally, some previous studies also obtained contradictory findings. On the other hand, urbanization, industrialization, commercial activities, and population growth are increasing rapidly, which could increase sewage and effluents production. Through monitoring data, consistent data analysis, and homogenization of parameters, this study aimed to (1) statistically analyze multiple-parameter data by using principal component analysis (PCA), cluster analysis (CA), and discriminant analysis (DA); (2) investigate the broad-spectrum variation in the parameters of LHW; and (3) cluster monitoring stations with similar characteristics and identify potential sources of pollution in LHW.

2. Materials and Methods

2.1. Study Area

Lake Hawassa Watershed (LHW) is located 275 km from the capital Addis Ababa, in the capital of Sidama regional state, on the main road leading to Nairobi, Kenya via Moyale. LHW has a total area of 1431 km2 and lies between 6°45′ to 7°15′ N latitude and 38°15′ to 38°45′ E longitude (Figure 1). LHW comprises five sub-watersheds [17].
The watershed is known for its flat plains and dissected undulating landscape, with elevation ranging from 1571 to 2962 m above sea level [18]. The area comprises mountains and low-lying areas, with a wide flat wetland called Cheleleka. Perennial rivers and streams on the north and northeast sides of the catchment and runoff on the east wall feed Cheleleka. The sub-basin of Tikur-Wuha consists of only one tributary, called Tikur-Wuha, that flows into Lake Hawassa. In this lake system, there is no surface outflow and water leaves the lake only by evaporation and abstraction, so the catchment can be considered hydrologically closed [15]. The climate of the Hawassa sub-basin is sub-humid and distinctly seasonal. The months from April to October are wet and humid, and the main rainy season is between July and September, with a mean annual precipitation of about 955 mm. The mean minimum precipitation is 17.8 mm in December (dry season) and the mean maximum precipitation is 119.8 mm in August (rainy season) [19].

2.2. Sampling and Monitoring Parameters

The monitoring sites and sampling strategy were planned to cover a wide range of factors contributing to the water quality of the river, taking into account tributaries and point sources whose effluents end up in the lake and have a substantial impact on the water quality of the lake. The criteria for selecting monitoring points were hydrological, with confluence of sub-basins having distinct characteristics and land use types, with the intention of transferring parameters to unmonitored sub-basins. Furthermore, factors such as availability of point and non-point sources, land use type, and urban and wastewater drains were considered in the selection of monitoring sites.
Hence, a total of nineteen (19) monitoring stations were selected (Table 1 and Figure 1). Four (4) monitoring sites were selected purposively at the Wesha, Hallow, Wedessa, and Tikur-Wuha river mouths of the respective sub-watersheds.
Eleven (11) monitoring sites were distributed evenly along the entire course of Lake Hawassa for water quality monitoring. Three (3) monitoring sites were selected near the industrial disposal site, and one (1) was at the health care center.
The monitoring sites in the Tikur-Wuha catchment were Wesha River (MS1), Hallo River (MS2), and Wedessa River (MS3), which are located in the upstream part of Lake Hawassa, where agricultural runoff from the catchment flows directly or through its tributaries into the Cheleleka wetland. The three rivers were purposively selected based on their size and spatial location to represent their respective sub-basins. Monitoring station 6 (MS6) is a critical area with mostly fresh water where factories discharge their effluent into the Tikur-Wuha River, and the river eventually flows into Lake Hawassa. This is an area where river inputs to the lake are high.
Monitoring sites for point sources were selected from available industries in the catchment that directly or indirectly feed Lake Hawassa. The selected sites were the St. George Brewery factory, BGI (MS4), and the Moha soft drink factory (MS5), whose effluents discharge into the Cheleleka wetland and eventually enter Lake Hawassa via Tikur-Wuha River, as well as the Referral Hospital (MS15) and Hawassa Industrial Park (MS19), which discharge their effluents directly into Lake Hawassa.
The monitoring stations for Lake Hawassa were selected based on the presence of major pollution sources in the lake, existence of point sources, health facilities, industrial effluent emission sites, availability of boating and recreational activities, presence of service rendering facilities such as Haile and Lewi resorts, fish market (Amora-Gedel and Gudumale), and also the central part of the lake where the disturbance is minimum.
For this purpose, eight (8) monitoring sites were selected in the eastern part (northeast to southeast) of the lake and designated as MS7, MS8, MS9, MS10, MS11, MS12, MS13, and MS14.
The other three (3) monitoring sites were located on the western (northwest to southwest) side of the lake and were designated as MS16 for the local village Ali-Girma site (opposite Haile Resort), MS17 for the Sima site, which is opposite Mount Tabor, and MS18 for the Dore-Bafana Betemengist site. In this part of the lake, although there is no point source pollution, there is enormous anthropogenic activity in the form of non-point source pollution from recreational activities, agricultural runoff, and animal waste.
The physicochemical water quality parameters at the selected sites were analyzed from May 2020 to January 2021 to capture seasonal variation. Sample collection for the wet season was event-based, i.e., samples were collected after rainfall events. The coordinates of each sampling station were determined using GNSS.
Composite samples were collected in pre-cleaned 2 L polyethylene plastic bottles (sterilized glass bottles were used for biochemical oxygen demand (BOD) and chemical oxygen demand (COD) analyses) for different parameters. The bottles were washed with concentrated nitric acid and distilled water before sample collection and thoroughly rinsed with sample water during collection to avoid possible contamination. The water samples were aseptically handled, labelled, preserved in sterile glass bottles, stored in a cooler (Mobicool v30 AC/DC, Germany) and ice box, and transported for analysis to the laboratories of Hawassa University Environmental Engineering, the Addis Ababa City Government Environmental Protection and Green Development Commission, and the Engineering Corporation of Oromiya.
The collection, handling, preservation, and treatment of the water samples followed the standard methods outlined for the examination of water and wastewater by the American Public Health Association guidelines [20] and all the parameters were presented with their respective analytical methods and instruments used for analysis in Table 2 below.

Un-Ionized Ammonia Determination from Total Ammonium Nitrogen (TAN)

The un-ionized free ammonia was calculated by the mass action law in its logarithmic form (1). The pKa as function of temperature was taken from Emerson et al. [21]:
$$\%\,\text{Un-ionized NH}_3\text{-N} = \frac{100}{1 + 10^{(pK_a - pH)}}$$
$$pK_a = 0.09108 + \frac{2729.92}{T_k}$$
where T_k is the temperature in kelvin (273 + °C).
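As a minimal sketch of this calculation, the Python functions below evaluate the temperature-dependent pKa and the un-ionized share of total ammonium nitrogen. The constants are the ones quoted in the text; the function names and the example inputs (pH 8.2, 25 °C, 1.0 mg/L TAN) are illustrative assumptions, not measured values from this study.

```python
def pka_ammonia(temp_c, a=0.09108, b=2729.92):
    """pKa of ammonium at temp_c (deg C), using the constants quoted in the text."""
    t_k = 273.0 + temp_c  # temperature in kelvin, as defined in the text
    return a + b / t_k


def percent_unionized(ph, temp_c):
    """Percentage of total ammonium nitrogen present as un-ionized NH3-N."""
    return 100.0 / (1.0 + 10.0 ** (pka_ammonia(temp_c) - ph))


def unionized_nh3_n(tan_mg_l, ph, temp_c):
    """Un-ionized NH3-N concentration (mg/L) from TAN (mg/L)."""
    return tan_mg_l * percent_unionized(ph, temp_c) / 100.0


# Hypothetical inputs, for illustration only.
example = unionized_nh3_n(tan_mg_l=1.0, ph=8.2, temp_c=25.0)
```

Because the un-ionized share rises with both pH and temperature, the same TAN concentration can correspond to quite different NH3-N levels at different stations or seasons.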

3. Multivariate Statistical Techniques and Data Treatment

3.1. Multivariate Statistical Techniques

Multivariate statistical techniques (MVST) are a valuable tool to estimate efficiently the spatio-temporal variability in a watershed and the influences of human intervention on the characteristics of physicochemical parameters at monitoring stations [22]. In addition, MVST like cluster analysis (CA), discriminant analysis (DA), and PCA/factor analysis can be implemented to interpret complex databases to offer better visualization of water quality in the studied watershed [23]. The statistical techniques PCA, CA, and DA are vital to determine the primary relationships among the physicochemical parameters measured in experimental data standardized to the Z-scale to avoid inaccurate grouping because of the huge variability in the data dimensionality [5,24,25,26].
Principal component analysis (PCA), cluster analysis (CA), and discriminant analysis (DA) have been used to examine seasonal variations, identify possible pollution sources, and analyze and interpret surface water quality data in China [2,7,27,28,29,30], Bangladesh [31], Iran [3], India [23,32,33], South Africa [34], Ethiopia [22,35], Malaysia [36], Lebanon [6,37], Spain [38], and Serbia [39].
XLSTAT 2016 (Addinsoft, New York, USA), Microsoft Excel 2016, and “Statistical Package for the Social Sciences Software, IBM SPSS 25 for Windows” were employed to perform statistical analysis integrally.

3.2. Data Treatment and Multivariate Statistical Methods

PCA is sensitive to outliers, missing data, and poor linear correlation among variables due to insufficient assigned variables. Thus, missing data and outliers in the monitored water quality data need to be treated before executing multivariate statistical analysis. An outlier may also reflect a real shift in the value of an observation that arises from non-random causes. In this study, outliers were detected with the Grubbs test [40] using XLSTAT 2016. Data collection and analysis were conducted with great care to minimize the amount of missing data. However, some missing data are inevitable, and these were handled by multiple imputation of missing values using Markov Chain Monte Carlo (MCMC) [41].
The raw water quality parameters were standardized to a mean of 0 and a variance of 1 using Z-scale transformation to ensure that the different variables were equally weighted in the statistical analyses [36]. The data were further checked for suitability for factor analysis using the Kaiser–Meyer–Olkin (KMO) and Bartlett’s sphericity tests. KMO is a measure of sampling adequacy, which shows the proportion of variance that is likely attributable to the underlying factors. Generally, the KMO index ought to be greater than 0.5 for satisfactory factor analysis. When the KMO index is close to 1, PCA of the variables is suitable; when it is close to 0, PCA is not relevant. In this study, the KMO had a value of 0.68. Bartlett’s test of sphericity tests whether the correlation matrix is an identity matrix, i.e., whether the variables are unrelated. The significance level, which is 0 in this study (less than 0.05), indicates that there are significant relationships among the variables.
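The Z-scale transformation can be sketched in a few lines of Python. This is an illustration rather than the XLSTAT or SPSS procedure used in the study; it assumes the sample standard deviation (n − 1 denominator), and the example values are the dry-season TDS means for rivers, the lake, and point sources quoted in the results.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable to mean 0 and unit sample variance."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mu) / sd for v in values]


# Dry-season TDS (mg/L) for rivers, lake, and point sources, from the text.
tds_z = z_scores([148.0, 453.0, 1655.0])
```

After this step a parameter measured in µS/cm and one measured in mg/L contribute on the same scale, which is the reason the transformation precedes PCA, CA, and DA.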

3.2.1. Principal Component (PCs)/Factor Analysis (FA)

PCA reduces the dimensionality of the data set by explaining the correlations amongst a large number of variables in terms of a smaller number of underlying factors without losing much information [42,43]. The loadings are the correlation coefficients between the original variables and the PCs. The PCs’ formula was taken from [33,36]:
$$Y_{mn} = Z_{m1}X_{1n} + Z_{m2}X_{2n} + Z_{m3}X_{3n} + \cdots + Z_{mi}X_{in}$$
where Z is the component loading, Y is the component score, X is the measured value of a variable, m is the component number, n is the sample number, and i is the total number of variables.
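To make the component-score sum concrete, the sketch below extracts only the first principal component from the correlation matrix of standardized variables, using plain power iteration. It is a simplified stand-in for the XLSTAT/SPSS routines named in the text: the function names are hypothetical, only one component is computed, and the two input variables are the dry-season TDS and EC group means quoted in the results.

```python
import math


def standardize(columns):
    """Z-scale each variable: mean 0 and sample standard deviation 1."""
    scaled = []
    for col in columns:
        n = len(col)
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mu) / sd for v in col])
    return scaled


def correlation_matrix(z_cols):
    """Correlation matrix of already standardized variables."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_cols]
            for ci in z_cols]


def first_component(corr, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    k = len(corr)
    # Uneven start vector, so it is not orthogonal to the leading
    # eigenvector in simple symmetric cases.
    vec = [1.0 / (i + 1) for i in range(k)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(corr[r][c] * vec[c] for c in range(k)) for r in range(k)]
        eigenvalue = math.sqrt(sum(v * v for v in nxt))
        vec = [v / eigenvalue for v in nxt]
    return eigenvalue, vec


def component_scores(weights, z_cols):
    """Score of each sample: weighted sum of its standardized values."""
    n = len(z_cols[0])
    return [sum(w * col[s] for w, col in zip(weights, z_cols))
            for s in range(n)]


# Dry-season group means from the text (rivers, lake, point sources).
tds = [148.0, 453.0, 1655.0]
ec = [297.0, 877.0, 3509.0]
z = standardize([tds, ec])
eigenvalue, weights = first_component(correlation_matrix(z))
loadings = [w * math.sqrt(eigenvalue) for w in weights]  # variable-PC correlations
explained = eigenvalue / len(z)  # share of total variance on this component
scores = component_scores(weights, z)
```

Here the loading of a variable is its eigenvector weight multiplied by the square root of the eigenvalue, which equals its correlation with the component when the analysis is run on the correlation matrix.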
Meanwhile, FA attempts to extract a lower-dimensional linear structure from the data set and extracts the new group of variables known as varifactors (VFs) via rotation along the PCA axis. In FA, the basic concept is borrowed from [33,36]:
$$Y_{mn} = Z_{p1}P_{1m} + Z_{p2}P_{2m} + Z_{p3}P_{3m} + \cdots + Z_{pr}P_{rm} + e_{pm}$$
where y is the measured value of the variable, z refers to the factor loading, p is the factor score, m is the sample number, n is the variable number, r is the total number of factors, and e is the residual term accounting for errors or other sources of variation.
In this study, PCA was employed for qualitative determination of pollution sources.

3.2.2. Discriminant Analysis

DA was used for discriminating between and among groups by applying discriminating variables. These variables measure characteristics regarding which the groups are expected to differ [44]. DA applies a linear equation of a regression analysis on raw data with prior knowledge of membership of objects to particular clusters and provides statistical classification of samples, expressed in the following equation [43,45]:
$$f(G_i) = K_i + \sum_{j=1}^{n} W_{ij} P_{ij}$$
where Ki is a constant specific to each particular group, i is the number of groups (G), n is the number of parameters used in group classification, and Wij is the weight coefficient designated by DA for the specific parameter (Pij).
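The classification rule implied by this function (a group constant plus a weighted sum of the parameters, with a sample assigned to the group whose function scores highest) can be sketched as follows. The constants and weights below are hypothetical placeholders chosen only to make the example run; they are not the coefficients obtained in this study.

```python
def discriminant_score(constant, weights, params):
    """f(G) = K + sum of W_j * P_j for one group."""
    if len(weights) != len(params):
        raise ValueError("one weight is needed per parameter")
    return constant + sum(w * p for w, p in zip(weights, params))


def classify(sample, functions):
    """Assign a sample to the group with the highest discriminant score."""
    scores = {group: discriminant_score(k, w, sample)
              for group, (k, w) in functions.items()}
    return max(scores, key=scores.get), scores


# Hypothetical functions for the two clusters; parameters are (EC, DO).
functions = {
    "MP": (0.0, [0.0, 1.0]),
    "HP": (-5.0, [0.01, 0.0]),
}
group_a, _ = classify([3509.0, 1.0], functions)  # high EC, low DO
group_b, _ = classify([297.0, 6.0], functions)   # low EC, high DO
```

In a stepwise analysis the same rule applies; only the set of parameters that enter each function changes.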
Independent variables are entered into DA either all together or stepwise, using both backward and forward approaches. In the first approach, the discriminant function is calculated by entering all the independent variables at once. This approach is used when there are a limited number of independent variables and the interest lies in discovering how well certain variables perform as discriminants in the absence of others. The stepwise method, on the other hand, involves entering the independent variables into the discriminant function (DF) one at a time, in order of their relative importance, so that variables with greater discriminant weights are entered first [46].
In this study, standard, forward, and backward stepwise approaches of DA were applied to each matrix of the primary data. In the forward stepwise mode, discriminant function analysis (DFA) variables were added stepwise until no significant change occurred, while in the backward stepwise mode, variables were removed starting from least significant until a significant change occurred. For this purpose, two groups obtained from CA were selected for spatial evaluations [35].

3.2.3. Pollution Index (PI)

Pollution index (PI) is a simple technique to examine surface water quality and was applied by the Taiwan EPA. The parameters employed to determine PI (DO, BOD, SS, and NH3−N) were each classified into four index scores (Table 3), and PI was computed using the equation formulated by [47,48]. In particular, PI is the arithmetic mean of the index scores of the four water quality parameters:
$$PI = \frac{1}{4}\sum_{i=1}^{4} S_i$$
where S_i is the index score of the ith parameter.
PI classifies water quality into four categories: (0–2) for good or non-polluted, (2–3) for slightly polluted, (3–6) for moderately polluted, and (>6) for highly polluted. Anthropogenic activities have been associated with water quality degradation [47,49].
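A short sketch of the PI calculation and classification is given below. The index scores for DO, BOD, SS, and NH3−N are taken as inputs because the scoring thresholds of Table 3 are not reproduced here; the example scores are hypothetical, and assigning each boundary value (2, 3, 6) to the lower class is an assumption.

```python
def pollution_index(scores):
    """Arithmetic mean of the four index scores (DO, BOD, SS, NH3-N)."""
    if len(scores) != 4:
        raise ValueError("PI uses exactly four index scores")
    return sum(scores) / 4.0


def classify_pi(pi):
    """Map a PI value to the four classes given in the text."""
    if pi <= 2:  # assumption: boundary values fall in the lower class
        return "good or non-polluted"
    if pi <= 3:
        return "slightly polluted"
    if pi <= 6:
        return "moderately polluted"
    return "highly polluted"


# Hypothetical index scores for one station.
pi_value = pollution_index([1, 3, 6, 10])
pi_class = classify_pi(pi_value)
```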

3.3. Cluster Analysis

Hierarchical agglomerative CA was carried out on the normalized data set using Ward’s approach, where Euclidean distances were used as the measure of similarity among samples, and a distance was represented by the difference between analytical values. In hierarchical clustering, clusters are formed sequentially, with successively higher-level clusters built from lower-level ones [23,45,50,51,52]. In cluster analysis, cases are grouped into classes based on the similarity between two samples, which is usually given by the Euclidean distance between the analytical values of the two samples. The squared Euclidean distance can be calculated by [53]:
$$\text{Distance}(Q_i, Q_j) = \sum_{k=1}^{n} (X_{ik} - X_{jk})^2$$
where Q_i is the ith object, X_{ik} is the value of the kth variable of the ith object, and n is the number of variables.
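The distance measure can be illustrated with a small Python sketch that computes the squared Euclidean distance between standardized station profiles and finds the closest pair, which is the first pair an agglomerative procedure would merge. The station names and values are hypothetical, and the full Ward linkage used in the study is not implemented here.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Sum of squared differences over all variables of two objects."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pair(objects):
    """Names of the two objects with the smallest squared distance."""
    return min(
        combinations(sorted(objects), 2),
        key=lambda pair: squared_euclidean(objects[pair[0]], objects[pair[1]]),
    )


# Hypothetical standardized profiles (two variables per station).
stations = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [5.0, 5.0]}
first_merge = closest_pair(stations)
```

Repeating this step on the merged groups, with Ward's criterion deciding which groups to join next, yields the hierarchy that the dendrogram displays.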
The dendrogram provides a visual summary of the clustering process to classify a sample of entities into a smaller number of mutually exclusive groups on the basis of multivariate similarities among entities [33].
Therefore, CA, DA, PCA, and pollution index were applied in this study to identify the underlying interrelationship among the parameters and monitoring stations. CA was applied based on prior knowledge of monitoring stations and the results of DA and pollution index to accurately cluster monitoring stations. PCA was employed to qualitatively identify pollution sources and the type of contaminants contributing to pollution.

4. Results and Discussion

4.1. Correlation Matrix Evaluation and Seasonal Variation

Correlation coefficients are used to describe the relationship among variables and to measure the statistical significance of the association between pairs of water quality variables [54,55]. Correlation analysis measures the closeness of the relationship between the identified dependent and independent variables. Correlation coefficients close to −1 or +1 demonstrate a strong linear correlation between x and y. The correlation between parameters is referred to as strong from (+0.8 to 1.0) or (−0.8 to −1.0), moderate from (+0.5 to 0.8) or (−0.5 to −0.8), and weak from (+0.0 to 0.5) or (−0.0 to −0.5) [56]. Where the correlation coefficient between two variables is zero, there is no linear correlation between them at a significance level of p < 0.05 [57]. In this study, a correlation matrix was constructed for each of the dry and wet seasons using the physicochemical parameters. Pearson’s correlation coefficient (r) was determined from the correlation matrix to identify the highly correlated and interrelated water quality parameters, and the p-value was determined to test the significance of each pair of parameters.
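As an illustration of how these classes are applied, the sketch below computes Pearson's r for two series and labels its strength. Because the ranges quoted above share their end points, assigning exactly 0.8 to "strong" and exactly 0.5 to "moderate" is an assumption; the p-value calculation is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label a coefficient as strong, moderate, or weak by its magnitude."""
    size = abs(r)
    if size >= 0.8:  # assumption: boundary values go to the stronger class
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"
```

For example, `strength(-0.825)` returns "strong" and `strength(0.26)` returns "weak", matching how the TDS and DO and the TDS and pH coefficients are described for the wet season.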
In the wet season, strong positive correlations were observed between TDS values and EC, temperature, TP, TN, and Na+ values (r = 0.992, r = 0.874, r = 0.850, r = 0.836; p < 0.05), and strong negative correlations between TDS and DO with −0.825 at p < 0.05. Moderate positive correlations were found between TDS and PO4P, BOD, COD, and K+ values (r = 0.797, r = 0.698, r = 0.695, r = 0.523; p < 0.05), and low positive correlation between TDS and pH with r = 0.26; p < 0.05 (Table 4). Strong negative correlations were found between DO and EC, TDS, TP, and TN (r = −0.825, r = −0.850, r = −0.851, r =−0.806; p < 0.05), and moderate negative correlations were observed between DO and temperature, BOD, COD, and Na+ values (r = −0.526, r = −0.544, r = −0.692, r = −0.599; p < 0.05).
Strong positive correlations were observed between temperature and the values of EC, TDS, Na+ and TP (r = 0.86, r = 0.864, r = 0.849, r = 0.821; p < 0.05), and a moderate positive correlation was observed between temperature and the values of TN and PO4P (r = 0.525, r = 0.669, r = 0.594; p < 0.05). There was also a moderate negative correlation between temperature and DO, with r = −0.692 at p < 0.005. There was a weak correlation between temperature and the values of COD and BOD (r = 0.447, r = 0.454; p < 0.05).
NH3−N had a moderate positive correlation with K+, with r = 0.531 at p < 0.005, and weak positive correlations with TN and temperature (r = 0.331, r = 0.481 at p < 0.05). NO2−N correlated moderately positively with BOD and COD (r = 0.721, r = 0.664 at p < 0.05) and weakly positively with PO4P and Ca+2 (r = 0.449, r= 0.404 at p < 0.05).
A strong positive correlation was found between PO4P and TN, with r = 0.825 at p < 0.005, moderate positive correlations were found between PO4P and COD, BOD, TP, and temperature (r = 0.712, r = 0.709, r = 0.730, r = 0.602, r = 0.594; p < 0.05), and a moderate negative correlation was observed between PO4P and DO values (r = −0.793; p < 0.05). No statistically significant correlation was found between pH or NO3−N and the rest of the parameters of LHW (p > 0.05).
In the dry season, strong positive correlations were observed between TDS values and EC, TP, Na+, PO4P, and temperature values (r = 0.999, r = 0.814, r = 0.899, r =0.839, r = 0.933; p < 0.05), moderate positive correlations were observed between TDS and BOD, COD, K+, and TN values (r = 0.686, r = 0.561, r = 0.645, r = 0.534; p < 0.05), and a strong negative correlation was found between TDS values and DO (r = −0.819 at p < 0.05).
Strong negative correlations were observed between the values of DO and TDS, EC, and Na+ (r = −0.819, r = −0.817, r = −0.826; p < 0.05), moderate negative correlations were observed between DO and TN, TP, BOD, K+, and temperature values (r = −0.577, r = −0.568, r = −0.687, r = −0.639, r = −0.729; p < 0.05), and a weak negative correlation was observed between DO and NO3N, with r = −0.464 at p < 0.005.
Strong positive correlations were found between temperature and EC and TDS (r = 0.839, r = 0.842; p < 0.05), and moderate positive correlations were found for temperature with TP and PO4P (r = 0.730, r = 0.532; p < 0.05). There was also a moderate negative correlation between temperature and DO, with r = −0.729 at p < 0.005. NH3−N had weak to moderate positive correlations with COD, TP, temperature, and Na+ (r = 0.476, r = 0.484, r = 0.550, r = 0.343; p < 0.005).
A strong positive correlation was found between PO4P and TP, with r = 0.921 at p < 0.005, moderate positive correlations were found for PO4−P with BOD, COD, TP, Na+, and temperature (r = 0.749, r = 0.647, r = 0.680, r = 0.76; p < 0.05), and a moderate negative correlation was found between PO4P and DO values (r = −0.626; p < 0.05) (Table 5).
The pH of rivers was 7.4 (7.1 to 7.6) in the dry season and 8.2 (7.5 to 8.7) in the wet season, and the pH of the lake was 8.2 (7.3 to 8.9) in the dry season and 8.5 (7.5 to 9) in the wet season. The pH of point sources was 8.3 (7.1 to 9) in the dry season and 8.3 (8.1 to 8.7) in the wet season. The recommended pH as per the standard for drinking, irrigation, and aquatic life is 6.5–8.6, and the pH of LHW was within the accepted limit (Table 6).
The TDS (EC) of rivers was 148 mg/L (297 µS/cm) in the dry season and 89 mg/L (179 µS/cm) in the wet season, and the TDS (EC) of the lake was 453 mg/L (877 µS/cm) in the dry season and 421 mg/L (829 µS/cm) in the wet season. The TDS (EC) of point sources was 1655 mg/L (3509 µS/cm) in the dry season and 1395 mg/L (2809 µS/cm) in the wet season. This shows that the TDS (EC) of rivers, the lake, and point sources increases significantly with increasing temperature (Table 6). Similarly, Gebre-Mariam [58] reported that Ethiopian Rift Valley lakes generally have lower EC values in the rainy season than in the dry season, due to dilution by rain coupled with minimal evaporation rates during the rainy season.
In the dry season, the NO3−N concentration was 0.5 mg/L for rivers, 1.4 mg/L for Lake Hawassa, and 1.5 mg/L for point sources. In the wet season, the NO3−N concentration was 0.7, 1.9, and 1.9 mg/L for rivers, Lake Hawassa, and point sources, respectively. The value of NO3−N increases in the rainy season due to the contribution of agricultural runoff and use of fertilizers. In the dry season, the PO4P concentration was 6.5 mg/L for rivers, 3.3 mg/L for Lake Hawassa, and 43.8 mg/L for point sources. In the wet season, the PO4P concentration was 7.4, 2.9, and 25.7 mg/L for rivers, Lake Hawassa, and point sources, respectively (Table 6).
The TN (TP) of rivers was 8 (0.12) mg/L in dry seasons and 5(0.26) mg/L in wet season, and TN (TP) of lakes was 5.3 (0.2) mg/L) in dry season and 5.2 (0.6) mg/L in wet season. Hence, there is an obvious increase of TN in rivers and Lake Hawassa when temperature increases due to lower dilution and greater agricultural contribution from the upper stream by irrigation, whereas TP in rivers and Lake Hawassa increases in wet seasons due to greater agricultural, rural, and urban runoff. The TN (TP) from point sources was 31.8 (7.2) mg/L in dry season and 13.9 (5.4) mg/L in wet season. This shows that TN (TP) of point sources increases significantly with increasing temperature due to lower dilution. The NH3−N of rivers was 0.2 mg/L, NH3−N of Lake Hawassa was 0.83 mg/L, and that of point sources was 4.72 mg/L in dry season. In the wet season, the NH3−N values were 0.03, 0.71, and 3.6 for rivers, Lake Hawassa, and point sources, respectively. The decreases in NH3−N level in the rainy season might be due to dilution effect (Table 6).
The positive correlation between temperature and TN, TP, EC, TDS, NH3−N, and PO4−P indicates the increase in the concentration of nutrients as the temperature increases (dry period). It also confirms the major contributors of nutrients were the point sources that are releasing a relatively higher amount of pollutants than the agricultural and other sources, as this value lowers during the wet season due to dilution effect. However, the increase in nutrient (NO3−N) concentration in rivers and Lake Hawassa in the wet season might be due to the increased contribution of agricultural runoff and use of fertilizers.
Sodium, calcium, magnesium, and potassium concentrations of the rivers were 49.1, 13.06, 55.1, and 7.74 mg/L in the dry season and 28.9, 32.7, 10.1, and 5.7 mg/L in the wet season. Sodium, calcium, magnesium, and potassium concentrations of the lake were 214, 23.8, 8.7, and 19.7 mg/L in the dry season and 178.9, 25.1, 7.3, and 17.2 mg/L in the wet season. The sodium, calcium, magnesium, and potassium concentrations of the point sources were 575.2, 38.2, 11.5, and 26.2 mg/L, respectively, in the dry season and 375.2, 38.2, 9.5, and 50.1 mg/L, respectively, in the wet season (Table 6). There was an observed decrease in ions when the temperature decreased in the study area. This can be ascribed to the discharge of industrial and domestic effluents, which contribute large amounts of alkaline ions to the river system, as the conductivity depends mainly on the ion concentration in surface water [52]. The natural range of sodium ions in water and soil is so low that their presence can indicate river pollution caused by human activities. Calcium is added to water from soil, industrial wastes, and natural resources. Magnesium is an essential nutrient required for numerous biochemical and physiological functions [59].
The TDS of water generally increases with the level of dissolved pollutants (such as nitrate, ammonium, and phosphate). The conductivity of water depends on its temperature, because ions move faster when water is warm. Hence, conductivity increases when water has a higher temperature [60]. In addition, Taylor et al. [61] pointed out a strong relationship between EC and ions such as nitrate, ammonium, and phosphate, and stated that high EC values indicate high concentrations of soluble salts. There is a strong correlation between EC and TDS, as evidenced by an increase in conductivity as the concentration of all dissolved constituents increases [62] (Table 6).
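The EC and TDS relationship described above can be checked with simple arithmetic on the seasonal means quoted in the text. The following is a minimal sketch, assuming the mg/L figures are TDS and the µS/cm figures are EC; the function and variable names are illustrative, and a correlation over six group means is only a rough indication, not the station-level coefficient reported in Table 5.

```python
from math import sqrt

# Seasonal means quoted in the text: (TDS in mg/L, EC in µS/cm)
reported = {
    ("rivers", "dry"): (148, 297),
    ("rivers", "wet"): (89, 179),
    ("lake", "dry"): (453, 877),
    ("lake", "wet"): (421, 829),
    ("point sources", "dry"): (1655, 3509),
    ("point sources", "wet"): (1395, 2809),
}


def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


tds = [pair[0] for pair in reported.values()]
ec = [pair[1] for pair in reported.values()]

# TDS/EC conversion factor for each source category and season
ratios = {key: t / e for key, (t, e) in reported.items()}
r_ec_tds = pearson_r(ec, tds)

for key, ratio in ratios.items():
    print(key, round(ratio, 2))
print("Pearson r (EC vs TDS):", round(r_ec_tds, 3))
```

The TDS/EC factor stays close to 0.5 across all six groups, which is consistent with the near-linear relationship between the two parameters.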
The BOD (COD) of the rivers was 19.7 (96.5) mg/L in the dry season and 6.9 (89.4) mg/L in the wet season, and the BOD (COD) of the lake was 28.1 (133.3) mg/L in the dry season and 19.1 (112.9) mg/L in the wet season. The BOD (COD) of point sources was 116.2 (398.6) mg/L in the dry season and 111.6 (353.7) mg/L in the wet season (Table 6). The DO of the rivers was 3.5 mg/L in the dry season and 6 mg/L in the wet season, and the DO of the lake was 4.2 mg/L in the dry season and 4.4 mg/L in the wet season. The DO of point sources was 2 mg/L in the dry season and 2.3 mg/L in the wet season (Table 6).
The DO of the rivers in the dry seasons and Lake Hawassa were well below the standard value. This indicates that the discharge of industrial and domestic effluents has resulted in serious organic pollution of these rivers, as the decrease of DO was mainly caused by the decomposition of organic compounds. Moreover, an extremely low DO content usually indicates the degradation of an aquatic system [63].
DO showed a negative correlation with most parameters in both dry and rainy seasons, revealing that DO decreases as other water quality parameters increase. This could explain the temporal variations, as more oxygen was available for reaction with the pollutants, especially metals and organic pollutants, during dry seasons. Additionally, the characteristics of temporal variation in water quality of LHW were affected by DO. DO was strongly correlated with organic matter, nutrients, and metals, and thus seasonal variation should be considered when DO is used as an indicator to evaluate surface water quality. Low DO is primarily the result of excessive algal growth caused by nutrients. As the algae die and decompose, the process consumes dissolved oxygen, which may leave insufficient dissolved oxygen for fish and other aquatic life. Temperature was significantly correlated with water quality parameters such as EC, TDS, TP, PO4−P, and DO in both seasons. Temperature had a significant negative correlation with DO in the dry and wet seasons, indicating that when water temperature increases, the metabolic rate of microorganisms also increases, and the amount of DO in the water decreases. This might be because faster biodegradation of organic matter during dry seasons can effectively improve water quality. The solubility of oxygen is inversely related to temperature: warmer water becomes saturated at a lower oxygen concentration and hence holds less DO during the dry season. Singh et al. [32] observed the inverse relationship between temperature and DO in natural processes, as water can hold less DO with increasing temperature.

4.2. Pollution Index (PI)

The mean pollution index of the rivers in the lake watershed was 4.5 in the dry season and 3.3 in the wet season, indicating a moderately polluted condition of the rivers. The PI of Lake Hawassa was 5 in both the dry and rainy seasons, indicating that the lake was moderately polluted. Anthropogenic activities were causing deterioration of the water quality of the rivers and Lake Hawassa, and the overall status of the water quality is moderately polluted. The PI for the point sources was calculated for comparison purposes, and they were found to be highly polluted, with PI values of 6.8 and 7.3 for the wet and dry seasons, respectively (Table 7).

4.3. Cluster Analysis

Spatial and Temporal Similarities

Cluster analysis was applied to find out if the monitoring stations had similar characteristics in terms of water quality parameters. It was implemented with the water quality data set to group comparable monitoring sites (spatial variability) spread over the watershed. Results from CA display high homogeneity within clusters and high heterogeneity between clusters [64]. Hierarchical agglomerative CA was carried out with the normalized data set employing Ward’s method, using Euclidean distances as a measure of similarity. In this approach, the analysis of variance method is used to evaluate the distances between clusters, attempting to reduce the sum of squares of all clusters that can be made at each step. In this method, the clusters are grouped sequentially, beginning with the most comparable pair of objects and establishing better clusters one after the other, demonstrated through a dendrogram [2,65].
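The agglomeration rule described above can be sketched in a few lines. This is a minimal illustration of Ward's criterion (merge the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares) on z-scored data, not the software workflow used in the study. The six site names and their EC, BOD, and Na+ values are hypothetical round numbers loosely patterned on the seasonal means quoted earlier.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    ca = [mean(col) for col in zip(*a)]
    cb = [mean(col) for col in zip(*b)]
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return na * nb / (na + nb) * d2


def ward_clusters(points, labels, k):
    """Merge clusters step by step until only k clusters remain."""
    clusters = [([p], [lab]) for p, lab in zip(points, labels)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(labs) for _, labs in clusters]


# Hypothetical sites: (EC in µS/cm, BOD in mg/L, Na+ in mg/L)
sites = {
    "river_a": (300, 20, 49),
    "river_b": (180, 7, 29),
    "lake_a": (880, 28, 214),
    "lake_b": (830, 19, 179),
    "point_a": (3500, 116, 575),
    "point_b": (2800, 112, 375),
}

scaled = zscore_columns(list(sites.values()))
clusters = ward_clusters(scaled, list(sites.keys()), k=2)
print(sorted(clusters))
```

With these illustrative values, the river and lake sites fall into one cluster and the two point sources into the other, mirroring the MP/HP split discussed in this section.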
The dendrogram presents a visual summary of the clustering processes and provides the map of the groups with a dramatic reduction in the dimensionality of the original records [2,5,32,43,44]. The CA grouped all 19 monitoring stations into two statistically significant clusters for the dry and wet seasons in LHW, and the dendrogram displays the grouping of stations for the wet and dry seasons, as demonstrated in Figure 2. For both the dry and wet seasons, monitoring stations from most of the upstream watershed, from the eastern and western sides of the lake, and from the center of Lake Hawassa were grouped in Cluster 1. Stations in this cluster consist of rivers and Lake Hawassa. The monitoring stations in this cluster are MS1-MS3, MS6-MS14, and MS16-MS18. This cluster received pollution from point sources and non-point sources, consisting of animal waste and runoff. It is characterized by moderate anthropogenic impact and is labelled as moderately polluted.
The pollution sources for monitoring stations MS1-MS3 were mainly anthropogenic activities from non-point pollution sources such as agricultural and sewage pollution, whereas pollution sources for monitoring stations MS6 (Tikur-Wuha river) and Lake Hawassa (MS7–MS14, MS16–MS18) were mainly industrial pollution, dispersed point sources, agricultural pollution, urban runoff, and sewage pollution.
All stations in this cluster were river or lake stations with comparable pollution sources, suggesting that the clustering is reasonable for both dry and wet seasons.
The spatial trend of water quality was generally driven by anthropogenic activities from point and non-point sources of pollution, especially with respect to pollutant loading and land use.
Cluster 2 includes four monitoring stations in the middle part of the LHW: MS4, MS5, MS15, and MS19. These are four point sources, specifically the BGI, Pepsi Factory, Referral Hospital, and Industrial Park monitoring stations. Consequently, this cluster is characterized by comparatively heavy pollution.

4.4. Discriminant Analysis

Discriminant analysis (DA) was used to evaluate the spatial variations in water quality and to distinguish the most critical parameters in relation to variations between clusters. Both the standard and stepwise modes were applied to the primary data by dividing them into wet and dry seasons, and the two spatial groups resulting from CA were used in DA. In this case, the WQ parameters were treated as independent variables, while the clusters were considered as dependent variables. The confusion matrices (CM) showed that 100% of the data points were correctly classified in each of the standard, forward stepwise, and backward stepwise modes for both dry and wet seasons (Table 8).
The standard DA mode builds the discriminant functions (DFs) using eighteen parameters, while only three and seven parameters were the critical parameters needed to distinguish the two pollution groups in the forward stepwise and backward stepwise modes, respectively, for both dry and wet seasons. In forward stepwise mode, most of the parameters, such as turbidity, TDS, pH, NH3−N, NO3−N, PO4−P, DO, COD, NO2−N, TN, TP, temperature, Mg2+, Ca2+, and K+, were insignificant variables leading to less variation, and they were deleted in the further process. The three significant variables that distinguished the two pollution groups with 100% correct assignation in forward stepwise mode were EC, BOD, and Na+. The backward stepwise mode deleted the least significant variables and identified seven significant variables: EC, DO, COD, TN, TP, Na+, and K+. These seven parameters, which gave 100% correct assignation, were the critical parameters for distinguishing the two pollution groups. This implies that the expected spatial variation in water quality can be explained sufficiently using the variables EC, DO, COD, TN, TP, Na+, and K+. Wilks’ lambda shows that the discriminant distribution is skewed towards high concentrations.
On the other hand, the standard DA functions were constructed using eighteen parameters, of which three and four parameters were used for forward stepwise mode and backward stepwise mode, respectively, for the wet season. In forward stepwise mode, the pollutants that were found to be insignificant variables and had less variation in terms of their spatial distribution were deleted in the further process. The three significant variables that distinguished the two pollution groups in forward stepwise mode, with 84.5% correct assignment, were EC, Na+, and COD. The backward stepwise mode deleted the least significant variables and identified two significant variables: EC and Ca2+. These two parameters were the critical parameters for distinguishing the two pollution groups, with 87.5% correct assignation (Table 8). This implies that the spatial water quality variation can be sufficiently explained using the variables EC, Na+, COD, and Ca2+, with the Wilks’ lambda value showing that the discriminatory distribution is skewed toward high concentration, as shown in Figure 3.
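Stepwise DA typically ranks candidate variables by how much they lower Wilks' lambda. For a single variable, Wilks' lambda reduces to the within-group sum of squares divided by the total sum of squares, so values near 0 indicate strong separation between groups and values near 1 indicate almost none. The sketch below illustrates only this screening idea, not the full stepwise procedure used in the study. As an assumption for illustration, the six seasonal means for EC and pH quoted earlier stand in for station-level observations (rivers and lake as the MP group, point sources as the HP group).

```python
from statistics import mean


def wilks_lambda_univariate(groups):
    """Within-group SS divided by total SS for a single variable."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return ss_within / ss_total


# Stand-in observations (assumption): seasonal means quoted in the text,
# split into the MP group (rivers, lake) and the HP group (point sources)
data = {
    "EC": {"MP": [297, 179, 877, 829], "HP": [3509, 2809]},
    "pH": {"MP": [7.4, 8.2, 8.2, 8.5], "HP": [8.3, 8.3]},
}

lambdas = {
    name: wilks_lambda_univariate([grp["MP"], grp["HP"]])
    for name, grp in data.items()
}

# Smaller lambda first: the better single discriminator comes first
ranking = sorted(lambdas, key=lambdas.get)
print(ranking, {k: round(v, 3) for k, v in lambdas.items()})
```

In this toy setting EC separates the two groups far better than pH, which is in line with EC being retained and pH being dropped in the stepwise modes reported above.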

4.5. Pollution Source Identification of Monitored Variables

Principal Component Analysis

PCA was applied to the normalized data and identified three principal components (PCs) using the Kaiser criterion [66], with loadings higher than 0.5 used for interpretation. Scree plots are widely used to identify the number of PCs to be retained to understand the underlying data structure [26]. Based on the scree plot and the eigenvalues >1 criterion, three factors were chosen as principal factors. The components with eigenvalues lower than 1 were removed due to their low significance [67].
In this study, the scree plot (Figure 4) shows the eigenvalues sorted from large to small as a function of the number of PCs. The figure shows a pronounced change in slope after the third eigenvalue, so three components were retained (Table 9). The components after the third PC (Figure 4a,b) were discarded. The scree plot was thus used to determine the number of PCs to be retained in order to understand the underlying data structure [25]. Consequently, a new data set with fewer variables is obtained that may explain the variation of the original data set.
Moreover, scree plots are used to visually evaluate which components or factors elucidate the maximum variability in the data.
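The retention rule discussed above (keep components whose eigenvalue exceeds 1) can be sketched with NumPy. This is an illustration on synthetic data, not the study's data set: the six variables below are hypothetical and are generated from two latent factors so that the expected outcome is easy to see. Because the eigenvalues come from a correlation matrix, they sum to the number of variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Two hypothetical latent factors, each driving three observed variables
latent = rng.normal(size=(n_samples, 2))
X = np.hstack([
    latent[:, [0]] + 0.3 * rng.normal(size=(n_samples, 3)),
    latent[:, [1]] + 0.3 * rng.normal(size=(n_samples, 3)),
])

# PCA on standardized data is an eigen-decomposition of the correlation matrix
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending, as in a scree plot

explained = eigenvalues / eigenvalues.sum()  # share of total variance per component
n_retained = int((eigenvalues > 1.0).sum())  # Kaiser criterion

print("eigenvalues:", np.round(eigenvalues, 2))
print("cumulative variance:", np.round(np.cumsum(explained), 3))
print("components retained:", n_retained)
```

Plotting `eigenvalues` against the component index gives the scree plot; the sharp drop after the retained components corresponds to the change in slope described in the text.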
The PCA results, which include the loadings (participation of the original variable in the new one), are summarized in Table 9. The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 72.5% of the total variance for the wet season. An eigenvalue gives a measure of the importance of the factor, and factors having the highest eigenvalues are the most significant. Eigenvalues of 1.0 or more are considered significant. Liu et al. [26] additionally categorized the factor loadings as ‘strong’, ‘moderate’, and ‘weak’, corresponding to absolute loading values of >0.75, 0.75–0.50, and 0.50–0.30, respectively.
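The loading categories attributed to Liu et al. [26] can be written as a small helper. This is a sketch: the function name is illustrative, the treatment of the boundary values (0.75 and 0.50 counted as moderate, 0.30 as weak) is an assumption because the quoted ranges share their endpoints, and the "negligible" label for loadings below 0.30 is added here for completeness. The example loadings are wet-season values quoted in the text.

```python
def classify_loading(loading):
    """Label a factor loading by its absolute value."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:  # assumption: boundary values count as moderate
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # below the ranges quoted in the text


# Wet-season loadings quoted in the text
wet_loadings = {
    "K+ (F1)": 0.477,
    "DO (F1)": -0.842,
    "Mg2+ (F2)": -0.654,
    "NH3-N (F2)": 0.516,
    "pH (F3)": -0.710,
    "turbidity (F3)": 0.452,
}

labels = {name: classify_loading(value) for name, value in wet_loadings.items()}
for name, label in labels.items():
    print(name, label)
```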
The first factor (F1), accounting for 46.8% of the total variance, showed strong positive loadings of TDS, EC, PO4−P, BOD, COD, TP, TN, Na+, and temperature with factor loadings of 0.974, 0.978, 0.871, 0.811, 0.784, 0.793, 0.898, 0.812, 0.825, and 0.832, respectively; a weak positive loading of K+ (0.477); and a strong negative loading of DO (−0.842) (Table 9). The high positive loading of temperature and high negative loading of DO might suggest the impact of seasonal variation, as temperature is inversely related to DO. The positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 clearly represents pollution by organic matter (BOD and COD) and nutrients, with oxygen depletion as a consequence. When the temperature of water bodies decreases, the biodegradation of organic matter decreases, and the solubility of oxygen in the water increases. Similar reports of high concentrations of BOD and COD exist elsewhere [42,44,45]. Similarly, the strong negative DO loading indicates the utilization of DO under anaerobic conditions in rivers and lakes for the degradation of organic matter. F1 showed strongly positive loadings for both COD and BOD, while the loading for DO was strongly negative. This indicates a group of purely organic pollution indicator parameters from industrial effluents, domestic discharges, and livestock affecting water bodies [23,27,51].
High nutrient loadings of factors such as TN and TP represent pollution from point and non-point sources from industrial setup, agriculture areas, domestic sewage, and urban runoff. The high loading of metals demonstrates the influences of industrial effluents and agriculture activities. Phosphorus and nitrogen can originate from point sources such as sewage pollution, industrial facilities and livestock, as well as from non-point sources, mainly from agricultural activities, runoff from rural and urban areas, soil erosion, and livestock. These results are consistent with findings of other reports elsewhere [27,68]. Consequently, the component is more likely to be explained by the combination of domestic pollution and industrial factors. These factors are characteristic of the monitoring stations in the upper catchment (MS1 and MS2), in the middle section including point sources (MS5 and MS15), along Tikur-Wuha River (MS6), and on the eastern side of Lake Hawassa (MS7, MS9, MS12, MS13, and MS14), where domestic and industrial effluents and agricultural runoff are predominant.
The strongly positive loadings of Na+ and weak positive loadings of K+ are likely due to industrial effluents discharged into the river Tikur-Wuha and Lake Hawassa. Reports also indicate that the sources of Na+ and K+ might be domestic sources, fertilizers, and residential waste in addition to industrial effluents [69]. During field observation, it was found that the major industries are discharging their treated and untreated effluents directly into the Tikur-Wuha River and the lake during the rainy period when the flow rate is high, resulting in high dilution, but during the dry period, the dilution effect is lower and consequent pollution is higher.
On the other hand, the strong loadings of TN and TP in F1 suggest higher contribution from point sources in industry and non-point sources such as agricultural land use, urban drainage, and residential areas during the rainy season. In general, these factors are symbolic of a blended source of contamination, encompassing industrial discharges, urban runoff, and agricultural land use. The results are in agreement with those of other studies [5,24,67,69]. Hence, they can be considered as the contamination index for surface water [44,45].
The second factor (F2) explained 13.4% of the total variance. It had moderately negative loadings of Mg2+ and Ca2+ (−0.654, −0.627) and a moderately positive loading of NH3−N (0.516). The moderately negative loadings of Mg2+ and Ca2+ are likely to originate from industrial wastewater discharged into the Tikur-Wuha River and Lake Hawassa, as well as from carbonate minerals, which are naturally present in the soils of the Lake Hawassa watershed. This factor is more pronounced at monitoring stations affected by point sources, agricultural lands, and rural and urban runoff, such as MS3 in the upper catchment, MS19 in the middle section (point source), and the MS8, MS11, MS16, and MS18 monitoring stations on both the eastern and western sides of Lake Hawassa.
The moderately positive loading of NH3−N (0.516) indicates biodegradation of organic matter. This variable is primarily from runoff, with high loading of solids and wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH3−N is produced by the decomposition of organic matter, indicating the discharge of domestic sewage to surface water. Studies elsewhere have shown comparable results [42,44,45,69,70].
The third factor (F3), explaining 12.3% of the total variance, had a moderately negative loading for pH (−0.710), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, possibly due to industrial impact from different sources [22]. It had a weak positive loading of turbidity (0.452), a moderate negative loading of NO2−N (−0.620), and a moderate positive loading of NO3−N (0.507). NO3−N may additionally have derived from agricultural areas in the region, where inorganic nitrogen fertilizers are in common use and the role of domestic waste is strong. Hence, this component can be best explained by a "nutrient" factor representing influences from non-point sources such as agricultural runoff and the domestic pollution factor. The reports of Yilma et al. [35] in Ethiopia and Zhang et al. [27] elsewhere were comparable with this result. This factor is typical of the monitoring stations in the middle section including point sources and on the eastern and western sides of Lake Hawassa (MS4, MS10, and MS17), where domestic sewage, industrial effluents, and agricultural runoff are predominant.
The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 70.5% of the total variance for the dry season. The first factor (F1), accounting for 45.7% of the total variance, showed strong positive loadings of TDS, EC, PO4−P, BOD, TP, Na+, and temperature, having factor loadings of 0.962, 0.961, 0.830, 0.796, 0.897, 0.783, and 0.973, respectively; moderate positive loadings of K+, COD, and TN (0.572, 0.721, 0.724); and a strong negative loading of DO (−0.847). The strong positive loading of temperature and strong negative loading of DO might suggest the impact of seasonal variations. The strong and moderate positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 clearly represents pollution by organic matter (BOD and COD) and nutrients, with oxygen depletion as a consequence. High temperature increases biodegradation and reduces the solubility of oxygen in the water. This PC was correlated with COD and BOD5, indicating a group of purely organic pollution indicator parameters from uncontrolled domestic discharges caused by rapid urbanization and industrial effluents. Biodegradation of organic matter affects the concentrations of BOD and dissolved oxygen in water [23,27,51].
A high loading of nutrients represents pollution from industrial setup and domestic wastewater. High loading of metals demonstrates the influences of industrial discharges. Phosphorus and nitrogen may originate from point sources such as sewage pollution, agricultural runoff in the upper stream due to irrigation, industrial facilities, and livestock. Consequently, this component is more likely to be explained by the combination of domestic pollution factors and industrial factors. Strongly positive loading of Na+ and moderate positive loadings of K+ are likely to originate from industrial effluents discharged directly into the Tikur-Wuha River and Lake Hawassa. These results are also supported by similar findings obtained elsewhere [27,69].
This factor is more pronounced at monitoring stations in the upper catchment (MS1 and MS3), monitoring stations in the middle section including point sources (MS4, MS5, MS15 and MS19), Tikur-Wuha River (MS6), and monitoring stations from both eastern and western sides of Lake Hawassa (MS9, MS10, MS14, MS16, and MS17), where domestic sewage, industrial effluents, and agricultural activities are predominant. The major industries discharge their treated and untreated effluents directly into Tikur-Wuha River and the lake during the dry period when the flow is low, which might lead to higher pollution. On the other hand, the strong loadings of TN and TP at F1 suggest a higher contribution of point sources from industrial facilities and agricultural runoff in the upper stream due to irrigation. Generally, these factors suggest a blended source of contamination encompassing municipal and industrial point source and livestock. This result is also confirmed by other studies [5,23,33,67,69]. Hence, it can be considered to be the contamination index for surface water [44,45].
The second factor (F2) explained 16% of the total variance and had a strong negative loading of turbidity (−0.781), moderate negative loadings of NO2−N and Mg2+ (−0.567, −0.531), and moderate positive loadings of NO3−N and Ca2+ (0.599, 0.524). NO3−N could be mainly from point sources, and the role of domestic waste is also strong. Hence, this component can be explained by the "nutrient" factor, which represents influences from non-point sources such as the domestic pollution factor [24,27,32,35,66,69]. A moderately positive loading of K+ and a moderately negative loading of Mg2+ in this factor likely originate from industrial discharges into the Tikur-Wuha River and Lake Hawassa. This PC is influenced mainly by industrial discharges and is more pronounced at monitoring stations in the LHW where industry is predominant. It is more pronounced at the monitoring station in the upper catchment (MS2) and at the monitoring stations on the eastern and western sides of Lake Hawassa (MS11, MS12, MS13, and MS18), where domestic, industrial, and agricultural activities are predominant in the upper stream due to irrigation.
The third factor (F3), explaining 8.8% of the total variance, had a strong positive loading of pH (0.775), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, and attributed to industrial impact from different sources [22]. A moderate positive loading of NH3−N (0.7) indicates the biodegradation of organic matter causing concentrations of waterborne factors such as NH3−N. This variable originated primarily from wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH3−N is triggered by organic matter decomposition, indicating the discharge of domestic sewage to surface water. Reports elsewhere support the findings of this study [42,44,45,70]. This factor is more pronounced in monitoring stations on the eastern side of Lake Hawassa (MS7 and MS8), where domestic sewage, industrial effluents, and agricultural activities are prevalent.
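As a quick consistency check, the per-factor shares of variance quoted in the text should add up to the totals reported for each season. The sketch below simply sums the quoted percentages; the variable names are illustrative.

```python
# Percent of total variance explained by each rotated factor, as quoted in the text
variance_explained = {
    "wet": {"F1": 46.8, "F2": 13.4, "F3": 12.3},
    "dry": {"F1": 45.7, "F2": 16.0, "F3": 8.8},
}

# Round to one decimal to avoid floating-point noise in the sums
totals = {
    season: round(sum(factors.values()), 1)
    for season, factors in variance_explained.items()
}
print(totals)
```

The sums reproduce the reported totals of 72.5% for the wet season and 70.5% for the dry season.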
The bi-plots of PCs on the key parameters TDS, EC, PO4−P, DO, BOD, COD, TN, TP, temperature, Na+, K+, turbidity, NO2−N, NO3−N, Mg2+, and Ca2+ that characterize monitoring stations from rivers in the upper and middle catchment, point sources in the middle catchment, and the eastern and western sides of Lake Hawassa are presented in Figure 5a,b for the dry and wet seasons. The average values of EC, TDS, BOD, COD, Na+, K+, Mg2+, Ca2+, and NH3−N of point sources were considerably higher than those of rivers in the upper and middle catchment (MS1–MS3 and MS6) and Lake Hawassa (MS7–MS14, MS16, and MS18) (Table 6). In addition, NO3−N, NO2−N, TN, TP, and PO4−P were the main parameters characterizing the stated monitoring sites in both seasons. These stations predominantly include rural areas, urban and peri-urban areas, and industrial sites from which domestic sewage, urban runoff, and effluents are discharged into the lake. Furthermore, the influence of agricultural activities in the upper catchment and along the Tikur-Wuha River feeding the lake was evident. The results of this investigation were comparable to the findings of the studies conducted by Tibebe et al. [71] and Meshesha et al. [72] on Lake Ziway. In particular, higher EC and TDS values were recorded for similar monitoring stations in both seasons (Table 6). In an aquatic environment, EC is used to categorize the pollution status of surface waters, and an increase in conductivity indicates the presence of dissolved ions that can affect aquatic life and water quality [73].

4.6. Total Nitrogen to Total Phosphorus (TN:TP) Ratio

The TN:TP ratio in lakes and reservoirs is a key indicator, as it shows which of these nutrients is either in excess or limiting to growth, and it was used to estimate the nutrient limitation in the lake. According to Smith [74], blue-green algae (cyanobacteria) tend to dominate in a lake when the TN:TP ratio is less than 29 and tend to be rare when TN:TP > 29. On the other hand, Fisher et al. [75] used a more conservative ratio of TN:TP. According to them, a ratio > 20 indicates phosphorus limitation and a ratio < 10 indicates nitrogen limitation, while a TN:TP ratio of 10 to 16 indicates that either phosphorus or nitrogen (or both) is limiting for growth. The estimated ratio for Lake Hawassa was 31, which exceeds both thresholds (20 and 29), suggesting that cyanobacteria dominance in the lake is rare. The TN:TP ratio > 20 in Lake Hawassa indicated that phytoplankton growth in the lake might be phosphorus deficient.
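The two sets of thresholds quoted above can be expressed as simple decision rules. This is a sketch with illustrative function names. The text does not say how ratios between 16 and 20, or a ratio of exactly 29, should be treated, so the handling of those cases below is an assumption.

```python
def fisher_limitation(tn_tp):
    """Nutrient limitation using the thresholds attributed to Fisher et al. [75]."""
    if tn_tp > 20:
        return "phosphorus-limited"
    if tn_tp < 10:
        return "nitrogen-limited"
    if tn_tp <= 16:
        return "phosphorus- and/or nitrogen-limited"
    return "not classified in the text"  # 16 < ratio <= 20 is not covered


def smith_cyanobacteria(tn_tp, threshold=29):
    """Cyanobacteria tendency using the threshold attributed to Smith [74]."""
    # Assumption: a ratio exactly equal to the threshold is treated as "rare"
    return "may dominate" if tn_tp < threshold else "tend to be rare"


lake_hawassa_ratio = 31  # estimated TN:TP ratio reported in the text
limitation = fisher_limitation(lake_hawassa_ratio)
cyanobacteria = smith_cyanobacteria(lake_hawassa_ratio)
print(limitation, "|", cyanobacteria)
```

For the reported ratio of 31, both rules point the same way: phosphorus is the likely limiting nutrient, and cyanobacteria are not expected to dominate.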

5. Conclusions

Multivariate statistical techniques help researchers to scrutinize the relationships between parameters in a broader fashion by applying different approaches such as cluster analysis, correlation, factor analysis, discriminant analysis, and multiple regressions to determine the association between dependent and independent variables. They reduce the dimensionality of data so that the whole picture can be visualized more easily than looking at specific cases allows. Furthermore, multivariate techniques provide powerful significance testing compared to univariate techniques. Despite their various merits, the results of multivariate statistical modeling are not easy to interpret and require a large data set to get meaningful results due to the high standard errors. In particular, PCA/FA is likely to lose information if PCs or factors are not chosen judiciously.
This study was conducted to evaluate seasonal and spatial variations in water quality and to identify potential sources of pollution using multivariate statistical techniques for the Lake Hawassa Watershed. The results of this study show that the condition of Lake Hawassa Watershed was classified into moderately and highly polluted categories in both dry and wet seasons. In data-limited developing countries such as Ethiopia, it is especially difficult to identify possible sources of pollution for particular contaminants, as this requires frequently monitored water quality data, which are often not available. To address this serious problem, this study applied MVST. Multivariate statistics were used to perform temporal and spatial assessment of surface water quality to reduce the number of monitoring stations and chemical parameters in LHW. In this study, we used Pearson correlation, PCA/FA, CA, and DA to evaluate spatial and temporal variance in surface water quality.
CA grouped the monitoring stations into two statistically significant clusters for the dry and wet seasons, labelled MP and HP, using PI. Accordingly, this resulted in a dendrogram with two clusters for the dry and wet seasons. The findings of the study revealed that rivers in the upstream and middle portion of the lake watershed and Lake Hawassa were moderately polluted (MP), while point sources (industries, hospitals, and hotels) in the middle of the LHW were found to be highly polluted (HP).
DA was used to identify the most critical parameters to investigate the spatial variations and extracted seven significant parameters: EC, DO, COD, TN, TP, Na+, and K+, with spatial variance to distinguish the pollution statuses of the groups obtained using CA.
PCA/FA techniques helped to identify the potential sources of water quality degradation. This study comprehensively analyzed the water quality of LHW and identified three significant sources responsible for pollution of Lake Hawassa Watershed in dry and wet seasons affecting the water quality. Accordingly, the pollution is due to mixed sources including point sources such as municipal and industrial effluents, natural processes, livestock, urban runoff, and non-point sources from agricultural activities.
Poor industrial effluent management, combined with non-point sources from agriculture and urban runoff, contributes significantly to the pollution of Lake Hawassa. Discharge of industrial effluents into the surface water system is the largest point source of anthropogenic pollution. The diffuse sources that contribute most to pollution in LHW are agricultural activities, i.e., intensive farming and livestock (F1, F2, and F3).
We conclude that effective management of point and non-point source pollution is imperative to reduce pollutant inputs into the lake from domestic and industrial wastewater, livestock, and agricultural runoff. Stringent management that comprehensively applies measures such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips should complement a properly set-up municipal and industrial wastewater treatment system.
Furthermore, application of indigenous aeration practices such as the use of drop structures at critical locations would help improve water quality in the lake watershed.

Author Contributions

Conceptualization and improvement of the methodology, S.M.L., M.D.U. and A.M.; Data collection, analysis, and interpretation, S.M.L.; Writing of the original manuscript, S.M.L.; Supervision, follow-up of the work, reviewing and modifying of the manuscript, A.M. and M.D.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research was part of the DAAD-EECBP Home Grown PhD Scholarship Program, 2019 (57472170). The Open Access Department, University of Rostock, funded the APC.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the first author. The data are not publicly available, as they are experimental.

Acknowledgments

We are grateful to the German Academic Exchange Service (DAAD) for offering a stipend for the first author in the course of the study.

Conflicts of Interest

The authors declare no conflict of interest in connection with the submitted work.

References

  1. Cherif, E.K.; Salmoun, F.; Mesas-Carrascosa, F.J. Determination of Bathing Water Quality Using Thermal Images Landsat 8 on the West Coast of Tangier: Preliminary Results. Remote Sens. 2019, 11, 972. [Google Scholar] [CrossRef] [Green Version]
  2. Fan, X.; Cui, B.; Zhao, H.; Zhang, Z.; Zhang, H. Assessment of river water quality in Pearl River Delta using multivariate statistical techniques. Procedia Environ. Sci. 2010, 2, 1220–1234. [Google Scholar] [CrossRef] [Green Version]
  3. Noori, R.; Sabahi, M.; Karbassi, A.; Baghvand, A.; Zadeh, H.T. Multivariate statistical analysis of surface water quality based on correlations and variations in the data set. Desalination 2010, 260, 129–136. [Google Scholar] [CrossRef]
  4. Wang, Q.; Li, S.; Jia, P.; Qi, C.; Ding, F. A Review of Surface Water Quality Models. Sci. World J. 2013, 2013, 1–7. [Google Scholar] [CrossRef] [Green Version]
  5. Simeonov, V.; Stratis, J.; Samara, C.; Zachariadis, G.; Voutsa, D.; Anthemidis, A.; Sofoniou, M.; Kouimtzis, T. Assessment of the surface water quality in Northern Greece. Water Res. 2003, 37, 4119–4124. [Google Scholar] [CrossRef]
  6. Massoud, M.A.; El-Fadel, M.; Scrimshaw, M.D.; Lester, J.N. Factors influencing development of management strategies for the Abou Ali River in Lebanon. I: Spatial variation and land use. Sci. Total Environ. 2006, 362, 15–30. [Google Scholar] [CrossRef]
  7. Wang, X.-L.; Lu, Y.-L.; Han, J.-Y.; He, G.-Z.; Wang, T.-Y. Identification of anthropogenic influences on water quality of rivers in Taihu watershed. J. Environ. Sci. 2007, 19, 475–481. [Google Scholar] [CrossRef]
  8. Abebe, Y.D.; Geheb, K. Wetlands of Ethiopia. In Proceedings of a Seminar on the Resources and Status of Ethiopia’s Wetlands; IUCN, The World Conservation Union: Gland, Switzerland, 2003. [Google Scholar]
  9. Cherif, E.K.; Salmoun, F.; El Yemlahi, A.; Magalhaes, J.M. Monitoring Tangier (Morocco) coastal waters for As, Fe and P concentrations using ESA Sentinels-2 and 3 data: An exploratory study. Reg. Stud. Mar. Sci. 2019, 32, 100882. [Google Scholar] [CrossRef]
  10. Amare, T.A.; Yimer, G.T.; Workagegn, K.B. Assessment of Metal concentration in water, sediment and macrophyte plant collected from Lake Hawassa, Ethiopia. Environ. Anal. Toxicol. 2014, 05, 1–7. [Google Scholar] [CrossRef] [Green Version]
  11. Teshome, F.B. Seasonal water quality index and suitability of the water body to designated uses at the eastern catchment of Lake Hawassa. Environ. Sci. Pollut. Res. 2019, 27, 279–290. [Google Scholar] [CrossRef]
  12. Zigde, H.; Tsegaye, M.E. Evaluation of the current water quality of Lake Hawassa, Ethiopia. Int. J. Water Resour. Environ. Eng. 2019, 11, 120–128. [Google Scholar]
  13. Amare, D. Assessment of Non-Point Source Pollution in Lake Awassa Watershed Using the Annualized Agricultural Non-Point Source (AnnAGNPS) Model; Addis Ababa University: Addis Ababa, Ethiopia, 2008. [Google Scholar]
  14. Drevnick, P.E.; Engstrom, D.; Driscoll, C.T.; Swain, E.; Balogh, S.J.; Kamman, N.C.; Long, D.T.; Muir, D.; Parsons, M.J.; Rolfhus, K.R.; et al. Spatial and temporal patterns of mercury accumulation in lacustrine sediments across the Laurentian Great Lakes region. Environ. Pollut. 2012, 161, 252–260. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Kebede, W.; Tefera, M.; Habitamu, T.; Alemayehu, T. Impact of Land Cover Change on Water Quality and Stream Flow in Lake Hawassa Watershed of Ethiopia. Agric. Sci. 2014, 5, 647–659. [Google Scholar] [CrossRef] [Green Version]
  16. Lencha, S.M.; Tränckner, J.; Dananto, M. Assessing the Water Quality of Lake Hawassa Ethiopia—Trophic State and Suitability for Anthropogenic Uses—Applying Common Water Quality Indices. Int. J. Environ. Res. Public Health 2021, 18, 8904. [Google Scholar] [CrossRef]
  17. Nigussie, K.; Chandravanshi, B.S.; Wondimu, T. Correlation among trace metals in Tilapia (Oreochromis niloticus), sediment and water samples of lakes Awassa and Ziway, Ethiopia. Int. J. Biol. Chem. Sci. 2011, 4. [Google Scholar] [CrossRef] [Green Version]
  18. Wondrade, N.; Dick, Ø.B.; Tveite, H. GIS based mapping of land cover changes utilizing multi-temporal remotely sensed image data in Lake Hawassa Watershed, Ethiopia. Environ. Monit. Assess. 2014, 186, 1765–1780. [Google Scholar] [CrossRef]
  19. Abiye, T.A. Environmental resources and recent impacts in the Awassa collapsed caldera, Main Ethiopian Rift. Quat. Int. 2008, 189, 152–162. [Google Scholar] [CrossRef]
  20. APHA Standard Methods for the Examination of Water and Wastewater, 23rd ed.; Baird, R.B.; Eaton, A.D.; Rice, E.W.; Brigewater, L.L. (Eds.) American Public Health Association (APHA): Washington, DC, USA; American Water Works Association (AWWA): Washington, DC, USA; Water Environment Federati, Water Environment Federation: Washington, DC, USA, 2017; ISBN 9780123821652. [Google Scholar]
  21. Emerson, K.; Russo, R.C.; Lund, R.E.; Thurston, R.V. Aqueous Ammonia Equilibrium Calculations: Effect of pH and Temperature. J. Fish. Res. Board Can. 1975, 32, 2379–2383. [Google Scholar] [CrossRef]
  22. Angello, Z.A.; Tränckner, J.; Behailu, B.M. Spatio-Temporal Evaluation and Quantification of Pollutant Source Contribution in Little Akaki River, Ethiopia: Conjunctive Application of Factor Analysis and Multivariate Receptor Model. Pol. J. Environ. Stud. 2021, 30, 23–34. [Google Scholar] [CrossRef]
  23. Singh, K.P.; Malik, A.; Sinha, S. Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques—A case study. Anal. Chim. Acta 2005, 538, 355–374. [Google Scholar] [CrossRef]
  24. Sharma, S.; Reddy, A.S.; Dalwani, R.R. Ecological water quality index development and evaluation of water quality of the Satluj river. Indian J. Environ. Prot. 2015, 35, 477–489. [Google Scholar]
  25. Palma, P.; Alvarenga, P.; Palma, V.L.; Fernandes, R.M.; Soares, V.M.; Amadeu, M.A.; Barbosa, I.R. Assessment of anthropogenic sources of water pollution using multivariate statistical techniques: A case study of the Alqueva’s reservoir, Portugal. Environ. Monit. Assess. 2010, 165, 539–552. [Google Scholar] [CrossRef]
  26. Liu, C.-W.; Lin, K.-H.; Kuo, Y.-M. Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci. Total. Environ. 2003, 313, 77–89. [Google Scholar] [CrossRef]
  27. Zhang, Q.; Li, Z.; Zeng, G.; Li, J.; Fang, Y.; Yuan, Q.; Wang, Y.; Ye, F. Assessment of surface water quality using multivariate statistical techniques in red soil hilly region: A case study of Xiangjiang watershed, China. Environ. Monit. Assess. 2009, 152, 123–131. [Google Scholar] [CrossRef] [PubMed]
  28. Zhao, J.; Fu, G.; Lei, K.; Li, Y. Multivariate analysis of surface water quality in the Three Gorges area of China and implications for water management. J. Environ. Sci. 2011, 23, 1460–1471. [Google Scholar] [CrossRef]
  29. Chen, P.; Li, L.; Zhang, H. Spatio-Temporal Variations and Source Apportionment of Water Pollution in Danjiangkou Reservoir Basin, Central China. Water 2015, 7, 2591–2611. [Google Scholar] [CrossRef] [Green Version]
  30. Ma, X.; Wang, L.; Yang, H.; Li, N.; Gong, C. Spatiotemporal Analysis of Water Quality Using Multivariate Statistical Techniques and the Water Quality Identification Index for the Qinhuai River Basin, East China. Water 2020, 12, 2764. [Google Scholar] [CrossRef]
  31. Mahmud, R.; Inoue, N.; Sen, R. Assessment of Irrigation Water Quality by Using Principal Component Analysis in an Arsenic Affected Area of Bangladesh. J. Soil Nat. 2007, 1, 8–17. [Google Scholar]
  32. Singh, K.P.; Malik, A.; Mohan, D.; Sinha, S. Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)—a case study. Water Res. 2004, 38, 3980–3992. [Google Scholar] [CrossRef]
  33. Sharma, M.; Kansal, A.; Jain, S.; Sharma, P. Application of Multivariate Statistical Techniques in Determining the Spatial Temporal Water Quality Variation of Ganga and Yamuna Rivers Present in Uttarakhand State, India. Water Qual. Expos. Health 2015, 7, 567–581. [Google Scholar] [CrossRef]
  34. Banda, T.D.; Kumarasamy, M. Application of Multivariate Statistical Analysis in the Development of a Surrogate Water Quality Index (WQI) for South African Watersheds. Water 2020, 12, 1584. [Google Scholar] [CrossRef]
  35. Yilma, M.; Kiflie, Z.; Windsperger, A.; Gessese, N. Assessment and interpretation of river water quality in Little Akaki River using multivariate statistical techniques. Int. J. Environ. Sci. Technol. 2018, 16, 3707–3720. [Google Scholar] [CrossRef]
  36. Mohd, N.F.; Samsudin, M.S.; Mohamad, I.; Awaluddin, M.R.; Mansor, A.; Hafizan, J.; Ramli, N. River water quality modeling using combined principle component analysis (PCA) and multiple linear regressions (MLR): A case study at Klang River, Malaysia. World Appl. Sci. J. 2011, 14, 73–82. [Google Scholar]
  37. Kazi, T.; Arain, M.; Jamali, M.; Jalbani, N.; Afridi, H.; Sarfraz, R.; Baig, J.A.; Shah, A.Q. Assessment of water quality of polluted lake using multivariate statistical techniques: A case study. Ecotoxicol. Environ. Saf. 2009, 72, 301–309. [Google Scholar] [CrossRef] [PubMed]
  38. Helena, B.; Pardo, R.; Vega, M.; Barrado, E.; Fernandez, J.M.; Fernand, L. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Res. 2000, 34, 807–816. [Google Scholar] [CrossRef]
  39. Sojka, M.; Siepak, M.; Ziola-frankowska, A.; Frankowski, M. Application of multivariate statistical techniques to evaluation of water quality in the Mala Welna River (Western Poland). Environ. Moni. Assess. 2008, 147, 159–170. [Google Scholar] [CrossRef]
  40. Grubbs, F.E.; Beck, G. Extension of Sample Sizes and Percentage Points for Significance Tests of Outlying Observations. Technometrics 1972, 14, 847–854. [Google Scholar] [CrossRef]
  41. Suhaimi, N.; Ghazali, N.A.; Nasir, M.Y.; Mokhtar, M.I.; Ramli, N.A. Markov Chain Monte Carlo Method for Handling Missing Data in Air Quality Datasets. Malays. J. Anal. Sci. 2017, 21, 552–559. [Google Scholar]
  42. Vega, M.; Pardo, R.; Barrado, E.; Debán, L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res. 1998, 32, 3581–3592. [Google Scholar] [CrossRef]
  43. Alberto, W.D.; del Pilar, D.M.; Valeria, A.M.; Fabiana, P.S.; Cecilia, H.A.; Ángeles, B.M.d.l. Pattern Recognition Techniques for the Evaluation of Spatial and Temporal Variations in Water Quality. A Case Study: Suquía River Basin (Córdoba–Argentina). Water Res. 2001, 35, 2881–2894. [Google Scholar] [CrossRef]
  44. Shrestha, S.; Kazama, F. Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan. Environ. Model. Softw. 2007, 22, 464–475. [Google Scholar] [CrossRef]
  45. Wu, E.M.-Y.; Kuo, S.-L. Applying a Multivariate Statistical Analysis Model to Evaluate the Water Quality of a Watershed. Water Environ. Res. 2012, 84, 2075–2085. [Google Scholar] [CrossRef] [Green Version]
  46. Swanson, R.A.; Holton, E.F. Research in Organizations Foundations and Methods of Inquiry; Holton, E.F., Ed.; Berrett-Koehler Organization: San Francisco, CA, USA, 2005; pp. 115–142. [Google Scholar]
  47. Liou, S.-M.; Lo, S.-L.; Wang, S.-H. A Generalized Water Quality Index for Taiwan. Environ. Monit. Assess. 2004, 96, 35–52. [Google Scholar] [CrossRef]
  48. Chen, Y.-C.; Yeh, H.-C.; Wei, C. Estimation of River Pollution Index in a Tidal Stream Using Kriging Analysis. Int. J. Environ. Res. Public Health 2012, 9, 3085–3100. [Google Scholar] [CrossRef] [PubMed]
  49. Mahesh Kumar, M.K.; Mahesh, M.K.; Sushmitha, B.R. CCME Water Quality Index and Assessment of Physico- Chemical Parameters of Chikkakere, Periyapatna, Mysore District, Karnataka State, India. Int. J. Innov. Res. Sci. Eng. Technol. 2014, 3. [Google Scholar] [CrossRef]
  50. Varol, M.; Şen, B. Assessment of surface water quality using multivariate statistical techniques: A case study of Behrimaz Stream, Turkey. Environ. Monit. Assess. 2008, 159, 543–553. [Google Scholar] [CrossRef]
  51. Zhou, F.; Liu, Y.; Guo, H. Application of Multivariate Statistical Methods to Water Quality Assessment of the Watercourses in Northwestern New Territories, Hong Kong. Environ. Monit. Assess. 2006, 132, 1–13. [Google Scholar] [CrossRef] [PubMed]
  52. Parinet, B.; Lhote, A.; Legube, B. Principal component analysis: An appropriate tool for water quality evaluation and management—Application to a tropical lake system. Ecol. Model. 2004, 178, 295–311. [Google Scholar] [CrossRef]
  53. Rencher, A. Methods of Multivariate Analysis Second Edition, 2nd ed.; A Wiley-Interscience Publication: New York, NY, USA, 2002; pp. 156–504. ISBN 0471418897. [Google Scholar]
  54. Gummadi, S.; Swarnalatha, G.; Vishnuvardhan, Z.; Harika, D. Statistical Analysis of the Groundwater Samples from Bapatla Mandal, Guntur District, Andhra Pradesh, India. IOSR J. Environ. Sci. Toxicol. Food Technol. 2014, 8, 27–32. [Google Scholar] [CrossRef]
  55. Karakuş, C.B. Evaluation of water quality of Kızılırmak River (Sivas/Turkey) using geo-statistical and multivariable statistical approaches [Internet]. Environ. Dev. Sustain. 2020, 22, 4735–4769. [Google Scholar] [CrossRef]
  56. Shroff, P.; Vashi, R.T.; Champa neri, V.A.; Patel, K.K. Correlation Study among water quality parameters of groundwater of Valsad District of south Gujarat (India). J. Fundam. Appl. Sci. 2015, 151, 1–10. [Google Scholar] [CrossRef] [Green Version]
  57. Kumar, M.; Ramanathan, A.; Rao, M.S.; Kumar, B. Identification and evaluation of hydrogeochemical processes in the groundwater environment of Delhi, India. Environ. Earth Sci. 2006, 50, 1025–1039. [Google Scholar] [CrossRef]
  58. Gebre-mariam, Z. The effects of wet and dry seasons on concentrations of solutes and phytoplankton biomass in seven Ethiopian rift-valley lakes. Limnologica 2002, 179, 169–179. Available online: http://www.sciencedirect.com/science/article/B7GX1-4GWP957-6/2/4d9fcc9b986796cd8cfb4a1b3193ad03 (accessed on 1 December 2002).
  59. Yilmaz, E.; Koç, C. Physically and Chemically Evaluation for the Water Quality Criteria in a Farm on Akcay. J. Water Resour. Prot. 2014, 06, 63–67. [Google Scholar] [CrossRef] [Green Version]
  60. Poisson, A. Conductivity/salinity/temperature relationships of diluted and concentrated standard seawater. Mar. Geodesy 1982, 5, 359–361. [Google Scholar] [CrossRef]
  61. Taylor, M.; Elliott, H.A.; Navitsky, L.O. Relationship between total dissolved solids and electrical conductivity. Water Sci. Technol. 2018, 1–7. [Google Scholar] [CrossRef]
  62. Dhanasekarapandian, M.; Chandran, S.; Devi, D.S.; Kumar, V. Spatial and temporal variation of groundwater quality and its suitability for irrigation and drinking purpose using GIS and WQI in an urban fringe. J. Afr. Earth Sci. 2016, 124, 270–288. [Google Scholar] [CrossRef]
  63. Boyle, T.P.; Fraleigh, H.D. Natural and anthropogenic factors affecting the structure of the benthic macroinvertebrate community in an effluent-dominated reach of the Santa Cruz River, AZ. Ecol. Indic. 2003, 3, 93–117. [Google Scholar] [CrossRef]
  64. Johnson, R.A. Applied Multivariate Statistical Analysis, 6th ed.; Recter, P., Ryan, D., Behrens, L.M., Eds.; Pearson Education, Inc.: San Antonio, TX, USA, 2007; ISBN 0131877151. [Google Scholar]
  65. McKenna, J. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis. Environ. Model. Softw. 2003, 18, 205–220. [Google Scholar] [CrossRef]
  66. Kasier, H.F. The application of electronic computers to factor analysis. Educ. Psychol. Meas. 1960, 20, 141–151. [Google Scholar] [CrossRef]
  67. Barakat, A.; El Baghdadi, M.; Rais, J.; Aghezzaf, B.; Slassi, M. Assessment of spatial and seasonal water quality variation of Oum Er Rbia River (Morocco) using multivariate statistical techniques. Int. Soil Water Conserv. Res. 2016, 4, 284–292. [Google Scholar] [CrossRef]
  68. Boyacioglu, H.; Boyacioglu, H. Water pollution sources assessment by multivariate statistical methods in the Tahtali Basin, Turkey. Environ. Geol. 2008, 54, 275–282. [Google Scholar] [CrossRef]
  69. Su, S.; Zhi, J.; Lou, L.; Huang, F.; Chen, X.; Wu, J. Spatio-temporal patterns and source apportionment of pollution in Qiantang River (China) using neural-based modeling and multivariate statistical techniques. Phys. Chem. Earth Parts A/B/C 2011, 36, 379–386. [Google Scholar] [CrossRef]
  70. Gazzaz, N.M.; Yusoff, M.K.; Ramli, M.F.; Aris, A.Z.; Juahir, H. Characterization of spatial patterns in river water quality using chemometric pattern recognition techniques. Mar. Pollut. Bull. 2012, 64, 688–698. [Google Scholar] [CrossRef] [PubMed]
  71. Tibebe, D.; Beshah, F.Z.; Lemma, B.; Kassa, Y.; Bhaskarwar, A.N. External Nutrient Load and Determination of the Trophic Status of Lake Ziway. CSVTU Int. J. Biotechnol. Bioinform. Biomed. 2018, 3, 1–16. [Google Scholar] [CrossRef]
  72. Meshesha, D.T.; Tsunekawa, A.; Tsubo, M. Continuing land degradation: Cause-effect in Ethiopia’s Central Rift Valley. Land Degrad. Dev. 2012, 23, 130–143. [Google Scholar] [CrossRef]
  73. Badillo-Camacho, J.; Reynaga-Delgado, I.; Barcelo-Quintal, I.; Valle, P.F. Water Quality Assessment of a Tropical Mexican Lake Using Multivariate Statistical Techniques. J. Environ. Prot. 2015, 6, 215–224. [Google Scholar] [CrossRef] [Green Version]
  74. Smith, V.H. Low Nitrogen to Phosphorus Ratios Favor Dominance by Blue-Green Algae in Lake Phytoplankton. Science 1983, 221, 669–672. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  75. Fisher, M.M.; Reddy, K.R.; James, R.T. Internal Nutrient Loads from Sediments in a Shallow, Subtropical Lake. Lake Reserv. Manag. 2005, 21, 338–349. [Google Scholar] [CrossRef]
Figure 1. Study area map and monitoring station locations: (a) countries sharing boundaries with Ethiopia, (b) major river basins in Ethiopia, (c) Rift Valley lake basin, and (d) Lake Hawassa sub-basin and monitoring stations.
Figure 2. Dendrogram for LHW based on Ward’s method showing the clustering of 19 monitoring stations into two significant clusters for both dry (a) and wet (b) seasons.
Figure 3. Box plots of the most discriminating parameters, BOD (mg/L), EC (μS/cm), and Na+ (mg/L), and of Wilks’ lambda, showing skewness of the discriminatory distribution toward high concentrations.
Figure 4. Factor loadings derived from the scree plot and eigenvalues for LHW; three factors are retained for the dry (a) and wet (b) seasons.
Figure 5. PCA biplots (a,b) show the projection of the monitoring sites (blue dots) and the variable loadings on the primary components (F1 and F2). The biplots additionally display the relationship between highly correlated variables and monitoring stations for the dry (a) and wet (b) seasons. High and low values indicate strong positive and negative correlation, respectively, while values close to 0 imply weak correlation between F1 and F2 and the respective parameter.
Table 1. Monitoring stations in Lake Hawassa Watershed.
| No | Monitoring Stations | Site Code | Location |
|---|---|---|---|
| 1 | Wesha River | MS1 | LHW upstream |
| 2 | Hallo River | MS2 | LHW upstream |
| 3 | Wedessa River | MS3 | LHW upstream |
| 4 | BGI effluent discharge site | MS4 | LHW middle |
| 5 | Pepsi factory oxidation pond | MS5 | LHW middle |
| 6 | Tikur-Wuha River | MS6 | LHW middle |
| 7 | Amora-Gedel (fish market) | MS7 | Eastern side of LH |
| 8 | Amora-Gedel (Gudumale) | MS8 | Eastern side of LH |
| 9 | Nearby Lewi resort | MS9 | Eastern side of LH |
| 10 | Fikerhayk center (FH) | MS10 | Center of LH |
| 11 | Fikerhayk (meznegna) | MS11 | Eastern side of LH |
| 12 | Center of LH (Towards HR) | MS12 | Center of LH |
| 13 | Nearby Haile resort | MS13 | Eastern side of LH |
| 14 | Tikur-Wuha site | MS14 | Eastern side of LH |
| 15 | Referral Hospital | MS15 | Eastern side of LH |
| 16 | Ali-Girma site (opposite to HR) | MS16 | Western side of LH |
| 17 | Sima Site (opposite to Mount Tabor) | MS17 | Western side of LH |
| 18 | Dore-Bafana Betemengist | MS18 | Southern part of LH |
| 19 | Hawassa Industrial Park | MS19 | LHW middle |
The site codes are indicated in Figure 1. FH designates Fikerhayk, HR labels Haile Resort, LHW designates Lake Hawassa Watershed, LH designates Lake Hawassa.
Table 2. Analytical methods and instruments used for analysis.
| Parameter | Analytical Method and Instrument |
|---|---|
| pH, EC, TDS, and Temperature | Portable multi-parameter analyzer (Zoto, Germany) |
| Turbidity | Nephelometric (Hach, model 2100A) |
| DO | Modified Winkler |
| BOD | Manometric, BOD sensor |
| COD | Closed reflux, colorimetric |
| SRP and TP | Spectrophotometrically by molybdovanadate (Hach, model DR 3900) |
| TN | Spectrophotometrically by TNT persulfate digestion (Hach, model DR 3900) |
| NO2 and TAN (NH3−N + NH4−N) | Spectrophotometrically by salicylate (Hach, model DR 3900) |
| NO3 | Photometric measurements, Wagtech Photometer 7100 at 520 nm wavelength |
| SS | Filtration by standard glass fiber filter |
| Mg2+, Na+, Ca2+, and K+ | Atomic Absorption Spectrophotometer (AAS), model NOVAA400 |

Total ammonium nitrogen (TAN), electrical conductivity (EC), total dissolved solids (TDS), dissolved oxygen (DO), biochemical oxygen demand (BOD5), chemical oxygen demand (COD), soluble reactive phosphorous (SRP), total phosphorous (TP), nitrate (NO3−), nitrite (NO2−), magnesium ion (Mg2+), sodium ion (Na+), potassium ion (K+), calcium ion (Ca2+), and suspended solids (SS).
Table 3. Classification system for pollution index.
| Item / Rank | Non-Polluted (Good) | Slightly Polluted (LP) | Moderately Polluted (MP) | Highly Polluted (HP) |
|---|---|---|---|---|
| DO (mg/L) | >6.5 | 4.6–6.5 | 2.0–4.5 | <2.0 |
| BOD5 (mg/L) | <3 | 3.0–4.9 | 5.0–15.0 | >15 |
| SS (mg/L) | <20 | 20–49 | 50–100 | >100 |
| NH3−N (mg/L) | <0.5 | 0.5–0.9 | 1.0–3.0 | >3.0 |
| Index score | 1 | 3 | 6 | 10 |
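The classification in Table 3 can be turned into a small scoring routine. The sketch below is illustrative only: it assumes that the pollution index (PI) is the arithmetic mean of the four index scores, as in the river pollution index this type of classification is usually associated with, and it assumes a particular handling of values that fall in the small gaps between the printed class limits (for example, DO between 4.5 and 4.6 mg/L). The function names are hypothetical and are not taken from the study.

```python
def score_do(do_mg_l):
    """Index score for dissolved oxygen (higher DO means cleaner water)."""
    if do_mg_l > 6.5:
        return 1
    if do_mg_l >= 4.6:
        return 3
    if do_mg_l >= 2.0:  # assumption: values in the 4.5-4.6 gap count as MP
        return 6
    return 10


def score_rising(value, good_below, slight_below, moderate_up_to):
    """Index score for parameters where higher values mean more pollution."""
    if value < good_below:
        return 1
    if value < slight_below:  # assumption: gap values (e.g. 4.9-5.0) count as LP
        return 3
    if value <= moderate_up_to:
        return 6
    return 10


def pollution_index(do, bod5, ss, nh3_n):
    """Mean of the four index scores (assumed definition of PI)."""
    scores = [
        score_do(do),
        score_rising(bod5, 3.0, 5.0, 15.0),
        score_rising(ss, 20.0, 50.0, 100.0),
        score_rising(nh3_n, 0.5, 1.0, 3.0),
    ]
    return sum(scores) / len(scores)


# Dry-season averages taken from Table 7 (DO, BOD5, SS, NH3-N in mg/L).
lake_dry = pollution_index(4.2, 28.1, 19.7, 0.8)
point_sources_dry = pollution_index(1.7, 116.2, 29.3, 1.2)
```

With the dry-season averages from Table 7, this sketch gives 5.0 for Lake Hawassa, matching the reported PI of 5, and 7.25 for the point sources, close to the reported 7.3. Differences of this size are plausible if the study averaged station-level indices rather than scoring group averages.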
Table 4. Correlation matrix Pearson (r) and alpha (p) values for the wet season.
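| Parameters | TDS | EC | NH3−N | NO3−N | PO4−P | DO | BOD | COD | TN | TP | Temp | Mg2+ | Ca2+ | Na+ | K+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TDS | 1 | | | | | | | | | | | | | | |
| EC | 0.992 | 1 | | | | | | | | | | | | | |
| NH3−N | 0.446 | 0.379 | 1 | | | | | | | | | | | | |
| NO3−N | 0.183 | 0.172 | −0.030 | 1 | | | | | | | | | | | |
| PO4−P | 0.797 | 0.824 | 0.416 | −0.116 | 1 | | | | | | | | | | |
| DO | −0.825 | −0.850 | −0.216 | −0.275 | −0.793 | 1 | | | | | | | | | |
| BOD | 0.698 | 0.719 | 0.106 | −0.173 | 0.712 | −0.526 | 1 | | | | | | | | |
| COD | 0.695 | 0.714 | 0.204 | −0.111 | 0.730 | −0.544 | 0.965 | 1 | | | | | | | |
| TN | 0.874 | 0.855 | 0.481 | 0.059 | 0.825 | −0.851 | 0.587 | 0.602 | 1 | | | | | | |
| TP | 0.850 | 0.871 | 0.249 | 0.255 | 0.602 | −0.806 | 0.485 | 0.482 | 0.736 | 1 | | | | | |
| Temperature | 0.860 | 0.864 | 0.331 | 0.410 | 0.594 | −0.692 | 0.454 | 0.447 | 0.669 | 0.82 | 1 | | | | |
| Mg2+ | −0.005 | 0.029 | −0.317 | 0.070 | −0.013 | −0.085 | 0.224 | 0.159 | 0.046 | 0.09 | −0.020 | 1 | | | |
| Ca2+ | 0.375 | 0.397 | −0.085 | −0.080 | 0.350 | −0.394 | 0.523 | 0.528 | 0.429 | 0.24 | 0.137 | 0.401 | 1 | | |
| Na+ | 0.836 | 0.853 | 0.314 | 0.268 | 0.709 | −0.599 | 0.619 | 0.632 | 0.572 | 0.68 | 0.849 | −0.062 | 0.19 | 1 | |
| K+ | 0.523 | 0.431 | 0.531 | 0.155 | 0.290 | −0.429 | 0.149 | 0.190 | 0.700 | 0.34 | 0.320 | −0.080 | 0.20 | 0.19 | 1 |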
Values in bold are different from 0 with a significance level alpha = 0.05.
Table 5. Correlation matrix Pearson (r) and alpha (p) values for dry season.
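| Parameters | TDS | EC | NH3−N | NO3−N | PO4−P | DO | BOD | COD | TN | TP | Temp | Mg2+ | Ca2+ | Na+ | K+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TDS | 1 | | | | | | | | | | | | | | |
| EC | 0.999 | 1 | | | | | | | | | | | | | |
| NH3−N | 0.433 | 0.419 | 1 | | | | | | | | | | | | |
| NO3−N | 0.208 | 0.212 | −0.10 | 1 | | | | | | | | | | | |
| PO4−P | 0.814 | 0.815 | 0.383 | −0.04 | 1 | | | | | | | | | | |
| DO | −0.82 | −0.82 | −0.31 | −0.46 | −0.63 | 1 | | | | | | | | | |
| BOD | 0.686 | 0.686 | 0.450 | −0.12 | 0.749 | −0.58 | 1 | | | | | | | | |
| COD | 0.561 | 0.564 | 0.476 | −0.19 | 0.647 | −0.41 | 0.871 | 1 | | | | | | | |
| TN | 0.645 | 0.642 | 0.410 | 0.184 | 0.680 | −0.57 | 0.520 | 0.619 | 1 | | | | | | |
| TP | 0.899 | 0.899 | 0.484 | −0.03 | 0.921 | −0.69 | 0.804 | 0.683 | 0.535 | 1 | | | | | |
| Temperature | 0.839 | 0.842 | 0.343 | 0.237 | 0.532 | −0.73 | 0.436 | 0.344 | 0.291 | 0.730 | 1 | | | | |
| Mg2+ | −0.27 | −0.27 | −0.25 | −0.13 | −0.04 | 0.305 | −0.13 | −0.20 | −0.16 | −0.13 | −0.42 | 1 | | | |
| Ca2+ | 0.385 | 0.392 | −0.19 | 0.398 | 0.091 | −0.33 | 0.235 | 0.324 | 0.208 | 0.17 | 0.455 | −0.33 | 1 | | |
| Na+ | 0.933 | 0.931 | 0.550 | 0.173 | 0.760 | −0.83 | 0.813 | 0.694 | 0.601 | 0.881 | 0.788 | −0.38 | 0.37 | 1 | |
| K+ | 0.534 | 0.531 | 0.182 | 0.419 | 0.261 | −0.64 | 0.237 | 0.240 | 0.701 | 0.197 | 0.335 | −0.39 | 0.42 | 0.53 | 1 |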
Values in bold are different from 0 with a significance level alpha = 0.05.
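As a reminder of how each entry of such a correlation matrix is obtained, the following minimal sketch computes a Pearson correlation coefficient with the standard product-moment formula. It is not the software used in the study. The five TDS and EC pairs are dry-season station means read from Table 6 (MS1, MS2, MS3, MS6, MS7), chosen only for illustration, so the result is not expected to reproduce the tabulated coefficient, which is based on the full data set.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative subset of dry-season means (mg/L and uS/cm) from Table 6.
tds = [89, 100, 87, 317, 388]
ec = [178, 200, 175, 635, 765]
r_tds_ec = pearson_r(tds, ec)
```

Because EC in this subset is roughly twice TDS at every station, the coefficient is very close to 1, in line with the strong TDS and EC correlation reported for both seasons.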
Table 6. Descriptive statistics (mean and standard deviation) of the physicochemical characteristics of LHW collected during dry season.
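| Codes | SS | TDS | EC | pH | NH3−N | NO3−N | PO4−P | DO | BOD | COD | TN | TP | Mg2+ | Ca2+ | Na+ | K+ | Temperature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MS1 | 17.3 | 89 | 178 | 7.1 | 0.04 | 0.6 | 3.6 | 4.1 | 13.8 | 88.3 | 5.8 | 0.001 | 7.2 | 20 | 32.5 | 6.7 | 19.2 |
| | (1.6) | (4) | (7) | (0.2) | (0.01) | (0.01) | (2) | (0.7) | (1.5) | (26.9) | (1.5) | (0) | (2.1) | (7.4) | (1.5) | (0.6) | (0.8) |
| MS2 | 27.3 | 100 | 200 | 7.6 | 0.16 | 0.4 | 10.2 | 3.5 | 23.7 | 107.5 | 7.2 | 0.5 | 54.0 | 9 | 26.2 | 8.1 | 17.7 |
| | (5.8) | (15) | (30) | (0.5) | (0.07) | (0.04) | (6.7) | (1) | (7.2) | (32.5) | (1.8) | (0.5) | (16.4) | (8.4) | (3) | (0.6) | (1) |
| MS3 | 54.5 | 87 | 175 | 7.7 | 0.10 | 0.6 | 5.9 | 4.4 | 69.0 | 313.8 | 7.5 | 0.001 | 153.4 | 4.6 | 25.8 | 6.8 | 18.1 |
| | (3) | (6) | (12) | (0.3) | (0.01) | (0.1) | (0.1) | (0.4) | (20.5) | (93.3) | (2.5) | (0) | (50.2) | (4.2) | (3) | (0.6) | (0.7) |
| MS4 | 58.0 | 1575 | 3825 | 7.1 | 7.60 | 2.8 | 18.7 | 1.5 | 63.3 | 263.7 | 23.8 | 15 | 11.4 | 50.4 | 501.1 | 19.8 | 33.8 |
| | (10.4) | (59) | (108) | (0.4) | (1.49) | (0.5) | (2.9) | (0.1) | (10.8) | (84.9) | (5.9) | (3) | (0.1) | (5.6) | (83) | (0.3) | (0.3) |
| MS5 | 27.7 | 2349 | 4698 | 9.5 | 12.35 | 0.6 | 118.3 | 0.9 | 190 | 600 | 41.3 | 6.5 | 2.9 | 15.0 | 1078.1 | 19.3 | 29 |
| | (1.5) | (193) | (385) | (0.6) | (5.15) | (0.05) | (40) | (0.1) | (1.3) | (241) | (16.7) | (1.6) | (1.8) | (10.6) | (178) | (0.8) | (0.6) |
| MS6 | 23.3 | 317 | 635 | 7.6 | 0.06 | 1 | 6.3 | 4 | 5.3 | 26.3 | 11.3 | 0.001 | 5.7 | 18.7 | 111.8 | 9.4 | 24.5 |
| | (0.1) | (63) | (126) | (0.1) | (0.03) | (0.2) | (0.6) | (0.5) | (1.8) | (5.8) | (3.7) | (0) | (1) | (0.7) | (24) | (0.9) | (0.4) |
| MS7 | 10.6 | 388 | 765 | 8.8 | 0.37 | 0.9 | 2.5 | 4.5 | 5.9 | 116 | 6.8 | 0.8 | 5.1 | 16.2 | 221.9 | 20.1 | 22.8 |
| | (1.4) | (7) | (25) | (0.003) | (0.08) | (0.02) | (0.5) | (0.5) | (0.3) | (88) | (1.2) | (0.2) | (0.7) | (1.1) | (15.9) | (0.3) | (1) |
| MS8 | 13.6 | 518 | 851 | 8.9 | 11.75 | 1 | 4.3 | 5.3 | 9.5 | 135 | 4.5 | 0.4 | 3.9 | 24.2 | 255.0 | 22.2 | 22.8 |
| | (0.1) | (26) | (32) | (0.02) | (3.9) | (0.02) | (0.1) | (0.3) | (1.5) | (5) | (1.5) | (0.1) | (0.1) | (1.8) | (30.1) | (0.8) | (1) |
| MS9 | 9.0 | 392 | 748 | 8.7 | 0.38 | 2 | 3.0 | 4 | 9.9 | 45 | 3 | 0.001 | 12.8 | 22.2 | 191.9 | 20.0 | 22.7 |
| | (1.2) | (3) | (24) | (0.1) | (0.11) | (1) | (0.1) | (0) | (0) | (0) | (0) | (0) | (1) | (1.9) | (5.4) | (0.3) | (1.8) |
| MS10 | 10.8 | 473 | 955 | 8.5 | 0.12 | 0.6 | 2.5 | 4.3 | 71.8 | 326 | 1.1 | 0.1 | 5.2 | 18.4 | 224.2 | 20.0 | 21.3 |
| | (0.4) | (8) | (5) | (0.04) | (0.04) | (0.1) | (0.7) | (0.3) | (22.8) | (104) | (0.1) | (0.1) | (0.5) | (1.4) | (7.9) | (0.1) | (1.4) |
| MS11 | 13.5 | 463 | 880 | 8.6 | 3.71 | 3.1 | 2.0 | 3.3 | 9.0 | 96 | 4.5 | 0.001 | 5.4 | 20.9 | 205.1 | 20.7 | 23.1 |
| | (0.2) | (3) | (20) | (0.04) | (1.23) | (1.8) | (0.3) | (0.1) | (1) | (20.5) | (1.5) | (0) | (0.1) | (0.1) | (4.9) | (0.3) | (1.3) |
| MS12 | 10.3 | 460 | 921 | 8.6 | 1.34 | 1 | 2.3 | 4.5 | 10.1 | 46 | 4.0 | 0.001 | 10.1 | 26 | 225.0 | 23.4 | 22.6 |
| | (1.8) | (18) | (35) | (0.2) | (0.56) | (0) | (0.4) | (0) | (0.4) | (1.8) | (1.8) | (0) | (1.9) | (1.9) | (10.9) | (2.7) | (1.1) |
| MS13 | 12.5 | 411 | 807 | 8.5 | 0.15 | 3.1 | 3.1 | 4.0 | 47.3 | 255 | 6.9 | 0.5 | 13.5 | 40.9 | 280.8 | 19.0 | 23.2 |
| | (3) | (9) | (33) | (0.2) | (0.03) | (2.1) | (0.6) | (0.5) | (8.8) | (55) | (2.1) | (0.1) | (2.5) | (7.1) | (29.2) | (0.7) | (1.3) |
| MS14 | 9.3 | 358 | 714 | 7.3 | 1.19 | 1.3 | 3.8 | 3.5 | 20.2 | 134 | 3.8 | 0.001 | 6.3 | 16.9 | 150.8 | 16.7 | 20.8 |
| | (1.3) | (82) | (166) | (0.1) | (0.38) | (0.1) | (0.8) | (0.5) | (4) | (23.8) | (1.2) | (0) | (0.3) | (0.2) | (37.3) | (3.3) | (0.9) |
| MS15 | 24.2 | 1632 | 3266 | 8.3 | 24.97 | 1.6 | 36.7 | 1.5 | 63.5 | 290 | 49.5 | 5.6 | 13.7 | 33.7 | 420.2 | 44.7 | 23.9 |
| | (0.9) | (39) | (78) | (0.005) | (7.06) | (0.8) | (6.8) | (0.03) | (9.1) | (40) | (15.5) | (1.9) | (2.1) | (2.9) | (41.3) | (3.3) | (0.8) |
| MS16 | 16.6 | 483 | 935 | 8.6 | 0.96 | 1.0 | 3 | 4.2 | 22.6 | 75.5 | 6.3 | 3.8 | 3.2 | 8.8 | 197.2 | 17.8 | 21.5 |
| | (0.8) | (8) | (45) | (0.1) | (0.78) | (0.1) | (0.3) | (0.1) | (3.1) | (10.5) | (0.8) | (1.2) | (0.3) | (1.3) | (13.7) | (2.3) | (0.3) |
| MS17 | 14.3 | 479 | 935 | 8.6 | 3.17 | 1 | 2.7 | 4.2 | 48 | 160 | 5.3 | 0.001 | 14.1 | 33.8 | 159.0 | 18.0 | 22.0 |
| | (0.2) | (1) | (25) | (0.01) | (0.04) | (0.01) | (0.1) | (0.1) | (3) | (10) | (0.2) | (0) | (1.7) | (1.3) | (16.3) | (2.3) | (0.5) |
| MS18 | 96.4 | 561 | 1133 | 8.7 | 0.86 | 1 | 7.8 | 4.3 | 55.5 | 185 | 12.3 | 0.8 | 16 | 34 | 243.7 | 19.1 | 23 |
| | (3.0) | (34) | (48) | (0.1) | (0.21) | (0.01) | (0.7) | (0.3) | (1.5) | (5) | (2.7) | (0.2) | (1.7) | (2) | (11.2) | (1.2) | (0.3) |
| MS19 | 7.2 | 1065 | 2246 | 8.4 | 0.05 | 0.8 | 9.8 | 4.2 | 126 | 420 | 12.8 | 2 | 18.3 | 53.7 | 301.4 | 20.9 | 21.2 |
| | (0.8) | (215) | (469) | (0.1) | (0.02) | (0.04) | (1.4) | (0.1) | (3) | (10) | (2.3) | (0.3) | (3.8) | (4.9) | (25.5) | (0.05) | (0.4) |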
All units in mg/L except pH (Dimensionless), Temperature (°C), EC (µS/cm) and Turbidity (NTU).
Table 7. Average concentrations of monitoring stations for rivers, Lake Hawassa, and point sources (PS) observed in both dry and wet seasons.
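| Parameters | Seasons | Rivers | Lake Hawassa | PS |
|---|---|---|---|---|
| DO (mg/L) | dry season | 4.2 | 4.2 | 1.7 |
| | wet season | 6 | 4.3 | 2.1 |
| BOD5 (mg/L) | dry season | 19.7 | 28.1 | 116.2 |
| | wet season | 6.9 | 19.1 | 111.6 |
| SS (mg/L) | dry season | 30.6 | 19.7 | 29.3 |
| | wet season | 51.1 | 20.9 | 28.1 |
| NH3−N (mg/L) | dry season | 0.2 | 0.8 | 1.2 |
| | wet season | 0.002 | 0.71 | 14.4 |
| PI | dry season | 4.5 | 5 | 7.3 |
| | wet season | 3.3 | 5 | 6.8 |
| Rank | dry season | MP | MP | HP |
| | wet season | MP | MP | HP |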
Table 8. Classification matrix for standard, forward stepwise, and backward stepwise DA of spatial variation in LHW for both dry and wet seasons, showing percentage of correct assignation for discriminating parameters.
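| Monitoring Stations | % Correct | Stations Assigned by DA: C1 | Stations Assigned by DA: C2 |
|---|---|---|---|
| Standard DA mode for dry season | | | |
| C1 | 100 | 15 | 0 |
| C2 | 100 | 0 | 4 |
| Total | 100 | 15 | 4 |
| Standard DA mode for wet season | | | |
| C1 | 100 | 15 | 0 |
| C2 | 100 | 0 | 4 |
| Total | 100 | 15 | 4 |
| Forward stepwise DA mode for dry season | | | |
| C1 | 100 | 15 | 0 |
| C2 | 100 | 0 | 4 |
| Total | 100 | 15 | 4 |
| Forward stepwise DA mode for wet season | | | |
| C1 | 94 | 15 | 0 |
| C2 | 85 | 0 | 4 |
| Total | 84.5 | 15 | 4 |
| Backward stepwise DA mode for dry season | | | |
| C1 | 100 | 15 | 0 |
| C2 | 100 | 0 | 4 |
| Total | 100 | 15 | 4 |
| Backward stepwise DA mode for wet season | | | |
| C1 | 100 | 15 | 0 |
| C2 | 75 | 0 | 4 |
| Total | 87.5 | 15 | 4 |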
C1: Includes stations (MS1-MS3, MS6-MS14, and MS16-MS18). C2: Includes stations (MS4, MS5, MS15, and MS19).
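The "% correct" column of a DA classification matrix is the share of stations in each CA cluster that DA assigns back to the same cluster. The sketch below shows that arithmetic for a square count matrix; the function name is hypothetical, and the second matrix is an invented example with one misassigned station, included only to show a case below 100%.

```python
def percent_correct(matrix):
    """Per-class and overall percentage of correct assignments.

    matrix[i][j] is the number of stations from cluster i that DA
    assigned to cluster j, so the diagonal holds the correct assignments.
    """
    per_class = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    n_correct = sum(matrix[i][i] for i in range(len(matrix)))
    n_total = sum(sum(row) for row in matrix)
    return per_class, 100.0 * n_correct / n_total


# Counts from the standard DA mode for the dry season in Table 8.
standard_dry = [[15, 0], [0, 4]]
per_class_dry, total_dry = percent_correct(standard_dry)

# Hypothetical matrix (not from the study): one C1 station assigned to C2.
hypothetical = [[14, 1], [0, 4]]
per_class_hyp, total_hyp = percent_correct(hypothetical)
```

For the standard dry-season counts, both clusters and the total come out at 100%, as reported. In the hypothetical case, C1 drops to about 93.3% and the total to about 94.7% (18 of 19 stations).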
Table 9. Matrix of factor loadings calculated based on water quality parameters measured in the period from May to January in the Lake Hawassa Watershed and factor loadings of variables on the first three PCs extracted by using eigenvalue for both wet (a) and dry (b) seasons.
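| Parameters | F1 (a) | F2 (a) | F3 (a) | F1 (b) | F2 (b) | F3 (b) |
|---|---|---|---|---|---|---|
| Turbidity | 0.282 | −0.420 c | 0.452 c | −0.032 | −0.781 a | −0.320 c |
| TDS | 0.974 a | 0.136 | 0.044 | 0.962 a | 0.020 | −0.084 |
| EC | 0.978 a | 0.078 | 0.079 | 0.961 a | 0.018 | −0.098 |
| pH | 0.285 | 0.324 c | −0.710 b | 0.056 | −0.178 | 0.775 a |
| NH3−N | 0.416 c | 0.516 b | −0.313 c | 0.521 b | −0.244 | 0.700 c |
| NO2−N | 0.428 c | −0.475 c | −0.620 b | −0.088 | −0.531 b | −0.064 |
| NO3−N | 0.131 | 0.398 c | 0.507 b | 0.195 | 0.599 b | −0.168 |
| PO4−P | 0.871 a | −0.035 | −0.174 | 0.830 a | −0.414 c | −0.200 |
| DO | −0.842 a | −0.055 | −0.365 c | −0.847 a | −0.246 | 0.186 |
| BOD | 0.784 a | −0.461 c | −0.297 | 0.796 a | −0.394 c | 0.015 |
| COD | 0.793 a | −0.388 c | −0.302 c | 0.721 b | −0.320 c | 0.135 |
| TN | 0.898 a | 0.064 | 0.101 | 0.724 b | −0.015 | 0.047 |
| TP | 0.812 a | 0.139 | 0.436 c | 0.897 a | −0.333 c | −0.105 |
| Temp | 0.825 a | 0.290 | 0.194 | 0.783 a | 0.246 | −0.143 |
| Mg2+ | 0.077 | −0.654 b | 0.389 c | −0.350 c | −0.567 b | −0.380 c |
| Ca2+ | 0.449 c | −0.627 b | 0.103 | 0.401 c | 0.524 b | −0.246 |
| Na+ | 0.832 a | 0.205 | −0.116 | 0.973 a | 0.001 | 0.076 |
| K+ | 0.477 c | 0.335 c | 0.035 | 0.572 b | 0.522 b | 0.106 |
| Eigenvalue | 8.4 | 2.4 | 2.2 | 8.2 | 2.9 | 1.6 |
| Variability (%) | 46.8 | 13.4 | 12.3 | 45.7 | 16 | 8.8 |
| Cumulative % | 46.8 | 60.2 | 72.5 | 45.7 | 61.7 | 70.5 |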
a: strongly correlated factor loadings; b: moderately correlated factor loadings; c: weakly correlated factor loadings.
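The eigenvalue, variability and cumulative rows of Table 9 are linked by a simple relation when PCA is run on standardized variables: each component's share of variance is its eigenvalue divided by the number of variables. The sketch below assumes that setting (a correlation-matrix PCA on the 18 parameters listed in the table); the function name `explained_variance` is hypothetical.

```python
def explained_variance(eigenvalues, n_variables):
    """Return (% variance per component, cumulative %) for a correlation-matrix PCA."""
    percent = [100.0 * ev / n_variables for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


N_PARAMETERS = 18  # number of parameters listed in Table 9

# Eigenvalues of the first three PCs as tabulated (rounded to one decimal).
wet_pct, wet_cum = explained_variance([8.4, 2.4, 2.2], N_PARAMETERS)
dry_pct, dry_cum = explained_variance([8.2, 2.9, 1.6], N_PARAMETERS)

print([round(p, 1) for p in wet_pct], round(wet_cum[-1], 1))
print([round(p, 1) for p in dry_pct], round(dry_cum[-1], 1))
```

Because the tabulated eigenvalues are rounded, the recomputed percentages agree with the variability and cumulative rows only to within a few tenths of a percent.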
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lencha, S.M.; Ulsido, M.D.; Muluneh, A. Evaluation of Seasonal and Spatial Variations in Water Quality and Identification of Potential Sources of Pollution Using Multivariate Statistical Techniques for Lake Hawassa Watershed, Ethiopia. Appl. Sci. 2021, 11, 8991. https://doi.org/10.3390/app11198991
