1. Introduction
Inland waters, as a source of good-quality water, are essential to human health. Between 70 and 85% of the world's population relies on surface water for drinking [1]. Surface waters also provide services such as irrigation, fisheries for food, hydropower, purification of wastewaters, flood protection, wetland plants for fuel and construction, and water and nutrient cycling [2]. Anthropogenic activities, such as the discharge of waste products and increased loads of nutrients and sediments from agricultural and urban areas, accelerate the eutrophication of inland waters worldwide. This situation raises concerns about how to protect inland water resources and ensure their adequate environmental quality. Continuous monitoring of water quality is fundamental to understanding and preventing environmental threats. The information gathered during monitoring is used to warn of current and emerging risks and to assess compliance with applicable regulations by pointing to changes in the trends of quality parameters. Monitoring also provides empirical data to aid decision-making on health issues and evidence for long-term water quality management. Monitoring water quality is a growing challenge because sampling is costly and time-consuming, and because a large number of chemicals from industrial and domestic uses make their way into inland waters and must be identified. Every country is responsible for the state of its waters. In developing countries, the priority has been to supply drinking water and control wastewater. In these cases, water quality monitoring programs are designed around conventional boat-based or buoy-based measurements at specific times and locations, followed by laboratory analysis. Some national monitoring programs for inland waters are already under continuous development and operation.
In Latin America, Mexico has operated a national water monitoring network (RNMCA) since 1996. Initially comprising 200 stations and a sampling frequency of 2 to 3 campaigns a year for lakes, it was gradually expanded and has operated more regularly since a major renovation in 2012. Today, 2700 stations contribute to a surface water dataset that includes the location of the stations and the measurement frequency. In Brazil, a network of similar size (4500 stations) was planned to be reached by 2020 [3], but other cases still need improvement, such as Argentina, with 617 stations [4], or Chile, which lacked a nationally coordinated monitoring system until 2009 [5]. However, even with such improvements, the spatial and temporal coverage of water monitoring programs is limited by the cost of each sampling station and by the frequency of measurement. Remote sensing offers strong potential for monitoring water quality in inland waters because it greatly increases data availability by providing radiometric measurements that can be related to water quality parameters. Mainly the visible (VIS) and near-infrared (NIR) bands of the electromagnetic spectrum have been used in several studies to correlate radiometric data acquired by satellite sensors with physical and biochemical constituents in water [6,7,8,9,10,11,12,13]. As a result of many years of research, the UN Environment Programme recognizes the need to integrate remote sensing in water quality monitoring tasks [2].
To establish such relationships reliably through modeling, radiometric values and in-situ water quality measurements should be acquired on coincident dates. Models that relate radiometric sensor data to water quality constituents can be classified as empirical, semi-analytical, or machine learning-based [14]. Empirical models fit a standard linear regression between spectral radiometric values from the sensor, in the form of bands or band ratios, and in-situ water quality measurements. These models are simple and transparent and have minimal computational requirements. However, they are limited to the range and temporal scale of the input data, because weather and water conditions significantly alter the observed radiometric data, which restricts their regional generalization. Semi-analytical models are based on optical properties of the water and the atmosphere that are independent of the ambient light field and are therefore called inherent optical properties (IOPs). These IOPs are used to calculate absorption and backscattering coefficients from which water quality parameters can be retrieved. Because they are grounded in the physics of water and atmosphere, these models are generalizable on a regional scale. However, they need extensive in-situ data for validation, and the information they require about atmospheric composition and bottom reflectance makes them difficult to apply where these data are missing [15]. Machine learning (ML) incorporates the advantages of empirical modeling with an increased computational capacity to handle complex nonlinear relationships. Like empirical methods, ML algorithms are limited by the range and settings of the input data used to train the models. However, they offer several advantages, such as iterative learning to reduce the overall error and maximize fit [16]. Because of its novelty, the use of ML in water quality retrievals is still not well understood, and further applications are needed to understand its behavior in remote sensing of inland waters [17].
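As an illustration of the empirical approach described above, the following minimal sketch fits an ordinary least squares line between a band ratio and in-situ measurements. It is not the modeling procedure of this study: the NIR/red ratio, the function names, and all numeric values are hypothetical and chosen only to show the mechanics.

```python
from statistics import mean


def band_ratio(numerator, denominator):
    """Element-wise ratio of two lists of band reflectances."""
    return [n / d for n, d in zip(numerator, denominator)]


def fit_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination of the fitted line."""
    y_mean = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical matched observations (NOT data from this study):
# surface reflectance in two bands and in-situ Chl-a in mg/m^3.
red = [0.040, 0.050, 0.045, 0.060, 0.055]
nir = [0.020, 0.035, 0.027, 0.054, 0.044]
chl_a = [12.0, 25.0, 18.0, 41.0, 33.0]

ratio = band_ratio(nir, red)
intercept, slope = fit_linear(ratio, chl_a)
print(f"Chl-a ~ {intercept:.2f} + {slope:.2f} * (NIR/red), "
      f"R^2 = {r_squared(ratio, chl_a, intercept, slope):.3f}")
```

A model fitted this way is only valid within the range of ratios and concentrations it was trained on, which is the generalization limit noted above for empirical models.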
Several sensors can supply these models with input data for water quality retrievals. The Operational Land Imager (OLI) on board NASA’s Landsat-8 satellite (launched in 2013) builds on a broad background of inland water applications from earlier Landsat missions [11,18,19,20,21,22,23]. Although originally designed for terrestrial applications, it is suited to inland waters because of its spatial and spectral resolution (11 spectral bands, up to 30 m spatial resolution), with the drawback of a temporal resolution (16 days) that is sparse for regular monitoring [24]. The Medium Resolution Imaging Spectrometer (MERIS) (15 bands, 300 m resolution) on board the European Space Agency (ESA) ENVISAT satellite contributed to monitoring inland waters from 2002 to 2012 [8,17,25,26], and its archives still offer a potential data mine for further applications. The Ocean and Land Colour Instrument (OLCI), designed by ESA and carried on board Sentinel-3, has similar and improved characteristics (21 spectral bands, up to 300 m spatial resolution) and is expected to continue the legacy of MERIS in monitoring inland waters. The MultiSpectral Instrument (MSI) on board Sentinel-2 has suitable characteristics for water quality monitoring (13 spectral bands, up to 10 m spatial resolution) and a suitable temporal resolution (a revisit frequency of 10 days for a single satellite and 5 days for the combined Sentinel-2A and Sentinel-2B constellation). Chlorophyll-a (Chl-a) concentrations have recently been investigated with MSI in different locations worldwide, such as Estonia [27] and Africa [28]. Geographic information systems (GIS) are a key resource for gathering and managing field and remote sensing data. GIS merges different types of data into a common framework in which layers of information are displayed to detect patterns and relationships. These observations are useful for communicating, analyzing, and making decisions to solve complex problems. GIS plays a key role in monitoring because changes can be detected clearly using a variety of data [29]. When inland waters are monitored by remote sensing, the patterns of water parameters are retrieved from models that use sensor data, and they are commonly displayed at spatial and temporal scales as maps of spatial distribution [30].
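To make the trade-off between spatial and temporal resolution concrete, the sensor characteristics quoted above can be collected in a small table and turned into two rough figures: an upper bound on overpasses per year and the number of pixels spanning a lake of a given width. This is only an illustrative sketch. The class and function names are hypothetical, revisit periods are left empty where the text does not state them, and the overpass count ignores cloud cover and overlapping orbits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sensor:
    name: str
    spectral_bands: int
    best_resolution_m: int
    revisit_days: Optional[int]  # None where the text gives no revisit period


# Values as quoted in the text above.
SENSORS = [
    Sensor("OLI (Landsat-8)", 11, 30, 16),
    Sensor("MERIS (ENVISAT)", 15, 300, None),
    Sensor("OLCI (Sentinel-3)", 21, 300, None),
    Sensor("MSI (Sentinel-2A/2B combined)", 13, 10, 5),
]


def max_overpasses_per_year(revisit_days, days_per_year=365):
    """Upper bound on acquisitions per year (assumes every pass is usable)."""
    return days_per_year // revisit_days


def pixels_across(width_m, resolution_m):
    """Number of whole pixels that fit across a water body of the given width."""
    return width_m // resolution_m


# Example width: 5880 m, the width reported for Yuriria Lake in Section 2.
for sensor in SENSORS:
    if sensor.revisit_days is None:
        passes = "n/a"
    else:
        passes = max_overpasses_per_year(sensor.revisit_days)
    print(sensor.name, passes, pixels_across(5880, sensor.best_resolution_m))
```

In practice the number of usable images is lower than this bound, since cloud-covered scenes have to be discarded.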
Despite the available approaches in computational modeling and remote sensing data, these techniques are rarely considered when planning and executing water quality monitoring tasks. Consequently, remote sensing may not be recognized as a main driver in the design of water quality monitoring programs or in the decisions of water managers. This may be because local managers do not draw on technical expertise in remote sensing and because research that integrates data from entire water monitoring programs for modeling purposes is scarce [31]. Therefore, an evaluation of remote sensing techniques using data from water quality monitoring programs is a necessary first step to foster the integration of remote sensing data into monitoring routines. This work addresses this situation using the RNMCA in Mexico as a case study, acquiring entire time series of water quality parameters relevant to remote sensing. These data are matched with available remote sensors and modeled through machine learning approaches to evaluate the feasibility of integrating existing monitoring data into predictive models. Additionally, we provide suggestions to improve monitoring programs through the progressive integration of remote sensing.
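The matching of in-situ records with satellite acquisitions mentioned above can be sketched as a nearest-date search within a tolerance window. This is a simplified illustration rather than the matching procedure used in this work: the function name, the one-day default window, and the example dates are all assumptions.

```python
from datetime import date


def match_acquisitions(sample_dates, image_dates, max_gap_days=1):
    """Pair each in-situ sampling date with the nearest image date.

    Returns (sample_date, image_date, gap_in_days) tuples and drops any
    sample whose nearest image is more than max_gap_days away.
    """
    if not image_dates:
        return []
    matches = []
    for sample in sample_dates:
        nearest = min(image_dates, key=lambda img: abs((img - sample).days))
        gap = abs((nearest - sample).days)
        if gap <= max_gap_days:
            matches.append((sample, nearest, gap))
    return matches


# Hypothetical dates (NOT taken from the RNMCA records).
samples = [date(2019, 3, 10), date(2019, 6, 1)]
images = [date(2019, 3, 11), date(2019, 3, 27)]

for sample, image, gap in match_acquisitions(samples, images):
    print(sample.isoformat(), image.isoformat(), gap)
```

A wider window yields more matched pairs but increases the chance that water conditions changed between sampling and image acquisition, so the tolerance is a design choice that depends on how quickly each lake varies.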
The specific objectives of this study are: (1) to verify the feasibility of using existing data from monitoring programs (gathered with no consideration of remote sensing) in a routine for retrieving water quality parameters by remote sensing; (2) to evaluate ready-to-use (Level 2) water quality remote sensing products against historical water quality measurements; (3) to use radiometric data from available sensors and machine learning techniques to estimate water quality parameters; and (4) to identify the water quality parameters and inland waterbodies for which such a monitoring routine is feasible. Additionally, we provide a critical assessment of the main limitations and challenges of integrating these two independent sources of data. This work highlights the need to upscale this research field by using nationwide monitoring data, evaluating different available sensors, and applying multitemporal analysis based on the sensors’ archives.
2. Study Areas
We study five Mexican lakes identified by the Mexican water authority as the most relevant in terms of size and regional use, and we therefore consider them priority targets for integrating monitoring systems with remote sensing: Chapala, Cuitzeo, Pátzcuaro, Yuriria, and Catemaco [32]. All are located in the Trans-Mexican Volcanic Belt (TMVB) and have a volcanic origin, with the exception of Lake Yuriria, which is artificial. Catemaco belongs to the Gulf-Center hydrological-administrative region, and the other four lakes are within the Lerma-Santiago-Pacific region (Figure 1). The sampling stations of the RNMCA are displayed in Figure 2.
Chapala Lake is the largest inland lake in Mexico. It covers approximately 3% of its territory with an area of 1116 km², and it is considered one of the largest and shallowest tropical lakes in the world [32]. It is located at 1523.8 m.a.s.l. at 19°05′–21°03′ N and 99°22′–103°31′ W. It has a mean depth between 4 and 6 m and a maximum depth of 8 m. It is 75 km long and 5.5–20 km wide [33,34]. The lake’s primary input is precipitation, but it also receives groundwater and water from several streams, the Lerma River being its main tributary. Evaporation, pumping, and the Santiago River are the main outflows [35]. The lake’s catchment area is a mixture of lacustrine sediments with volcanic rocks and basaltic and andesitic lavas accumulated since the Miocene. Thermal springs, outcrops, and calcareous sinter are also present in the basin [36]. The climate in the catchment is mainly humid subtropical, with a mean annual precipitation of 730 mm and a uniform temperature around 24 °C [34]. Chapala Lake has high sediment loads and turbidity, partly because the geology and topography of the area facilitate the transport of clay particles to the lake. In particular, the Lerma River can carry large amounts of sediment from areas affected by erosion [33,36]. Due to intense water extraction, dry periods, and land-use change, the lake’s volume has decreased by up to 42% [35]. In addition, the rivers and streams can transport contaminants from industrial, agricultural, and livestock activities in the catchment area [33,34].
Located at 1820 m.a.s.l. at 20°05′–19°52′ N and 100°50′–101°19′ W, Lake Cuitzeo is the second largest lake in the country by surface area [37,38]. It has a maximum potential area of 420 km²; currently, it consists of brackish waters 1–2 m deep over an area fluctuating around 300 km² [39,40]. The lake is highly susceptible to weather variations and has been close to desiccation during at least three severe drought periods in the last century [38]. The approximately 4000 km² watershed has several low and high hills formed by volcanic activity during the Miocene and Pliocene, including pyroclastic-fall deposits and fluviolacustrine plains [39]. The Grande and Queréndaro Rivers are the main tributaries [37]. The lake has no natural outlet, although according to Soto-Galera [41], it could have been connected to the Lerma River during the Holocene. The climate in the catchment is moderate, with temperatures ranging from 10 to 28 °C. Annual precipitation varies from 765 to 1200 mm and is concentrated in the summer, from May to October [37,39,41]. As the quality and quantity of the water feeding the lake have decreased (e.g., through inflows from municipal and industrial activities or agricultural runoff), the lake is in a hypertrophic state. Furthermore, it has detectable arsenic levels originating from geothermal boreholes around the lake and a thermal spring located over a magma chamber [37].
Pátzcuaro Lake is located at 19°32′–19°42′ N and 101°32′–101°42′ W, at 3035 m.a.s.l. It has a maximum surface area of 116 km² and an average depth of 5 m, although certain zones are up to 12 m deep [42]. The lake and its four islands originated from volcanic activity during the Pleistocene, about 1 million years ago [43]. The lake is well mixed, not stratified, and is maintained mainly by small springs of shallow groundwater and by local runoff [44,45]. The drainage basin covers 929 km² and, while the system today is endorheic, it could have drained to the Lerma River 25,000 years ago. Two seasons dominate the climate: a rainy summer and a stable, dry winter, with a mean annual precipitation of 950 mm [45]. Pátzcuaro Lake has been the subject of several paleoenvironmental studies in which the extracted cores contain lacustrine sediments that record climate change, human impact, volcanic activity, and earthquakes over periods of up to 48,000 years [43,44,45,46]. In recent years, fish biodiversity in the lake has decreased due to anthropogenic activities [42,44,45].
Yuriria Lake is located at 20°13′–20°17′ N and 101°12′–101°03′ W at 1740 m.a.s.l. [47]. It is 13.79 km long and 5.88 km wide, with a surface area of 66 km² and a maximum depth of 3.2 m [48]. It is an artificial lake considered the first post-Columbian hydraulic work, as it was formed in 1548 by building a channel to divert water from the Lerma River. The silty clay on the surface prevents water leakage to the aquifer [47]. The channel from the Lerma River is still the main tributary of the lake [48], although precipitation and runoff also contribute to it. The mean annual temperature in the area is 18 °C, and the rainy season lasts from May to September, with annual precipitation varying from 669 to 797 mm. The lake supports migratory and resident birds, and the area has been designated a Wetland of International Importance (Ramsar site) since 2004 [47]. Espinal Carreón et al. [48] identified eutrophication and contamination levels that may be dangerous for fish biodiversity and recreation.
Catemaco Lake is located at 322 m.a.s.l. at 18°21′–18°27′ N and 95°01′–95°07′ W, between San Martín Tuxtla Volcano and the Sierra de Santa Marta. It is part of the subcatchment of the San Juan River, a tributary of the Papaloapan River, the river with the second-largest flow in Mexico [49]. With an approximately square outline, Catemaco Lake has an area of about 75 km². The mean depth is 7.6 m; the lake basin is mainly a plateau with a maximum depth of 11 m, but three pits reach depths of up to 22 m [50]. The lake receives water from at least 10 tributaries and is also fed by groundwater and precipitation, which can reach 5000 mm per year. Its main outflow is the Grande de Catemaco River, a tributary of the San Juan River [51,52]. Catemaco Lake is considered a warm polymictic lake: there is no stratification, and the concentration of dissolved oxygen is constant across the water column. Light penetrates to between 0.53 and 2 m depth, and the water temperature ranges from 23 to 28 °C [53]. The catchment area of Catemaco covers 322.2 km². It has escarpments, cinder cones, and maars resulting from volcanic activity that began in the late Miocene (~7 million years ago), with the latest eruptions in the 18th century. In fact, the lake formed when several cinder cones blocked the drainage to the north, and it contains many islands formed by subaquatic volcanism [54]. Catemaco Lake lies in tropical rain forest and has high biodiversity. Divided along the NW–SE axis, approximately half of the lake borders the Natural Reserve of Los Tuxtlas [51]. However, the area is affected by deforestation, water abstraction, and water pollution due to agriculture and livestock farming [51]. Because of the presence of coliforms, organic matter, hydrogen sulfide, water lilies, and phosphorus, the lake has been classified as eutrophic [52].
In general, the lakes are affected by well-known stressors caused by anthropogenic activities. They are exposed to some degree to diversions and withdrawals of water for agricultural, livestock, and industrial activities [33], numerous discharges of untreated industrial and municipal wastes, and a growing urban population [41]. These stressors have disruptive effects, such as drying up and infilling with sediments from erosion and runoff from deforested uplands due to poor management of soil resources [48], loss of surface area, reduction of the water column, lower water transparency and hyper-eutrophication, erosion, and nutrient loading [53,55]. As the lakes are surrounded by large urban areas or are close to industrially developed regions, the spectral signature is contaminated to some degree by atmospheric effects caused by aerosols and gases. Hence, characterizing the optical properties and identifying the various optical water types is challenging.