1. Introduction
Eutrophication and, subsequently, harmful algal blooms (HABs) have increased globally, drawing heightened attention because of their adverse impacts on ecosystems, human and wildlife health, and the economy [5]. Commonly, HABs are defined as any algal biomass that degrades water quality, leading to decreased oxygen levels, fish kills, foul odor and taste, and the closure of recreational and drinking water areas [6]. Worse, some species of algae and cyanobacteria can produce toxins that harm both wildlife and humans [8]. Given the current global state of HABs and cyanobacterial HABs (CyanoHABs), regional to national monitoring has become more critical than ever for managing these events. However, the vast majority of water quality monitoring is conducted via point sampling, either using in situ or in vivo methods or by collecting a sample for lab analysis. While highly accurate, these methods tend to be labor- and cost-intensive, limiting the spatial and temporal coverage of sampled locations [10]. Additionally, there is a lag between sample acquisition and processing, which may delay decision-making by hours to days [11]. Furthermore, this approach can lead to misrepresentation of an entire waterbody based on the results of a handful of measurements or even a single one.
Satellite remote sensing offers a complementary, cost-effective, and scalable approach to traditional field sampling by providing a more comprehensive depiction of the surface water quality of a waterbody or of a region encompassing numerous waterbodies [17]. Algal blooms often form in complex waterbodies containing numerous optically active constituents and pigments, making the detection of a single pigment difficult. However, algorithms that optically estimate chlorophyll a and phycocyanin (PC) pigments have been used frequently as proxies for algal biomass at various spatial scales using remote sensing platforms [19]. Currently, there are a few established satellite imagers with sensor configurations capable of detecting the critical spectral features required for monitoring HABs in inland and near-coastal waters, most notably green reflectance (550 nanometers [nm]), phycocyanin absorption (620 nm), chlorophyll a reflectance (665 nm–680 nm), and cell backscattering (709 nm) [23]. Given the emphasis on resolvable inland waterbodies, Sentinel-2 MSI is preferred because the transition to a finer spatial resolution (10–60 m) improves nationwide coverage of inland waterbodies up to ~95% [
Even though Sentinel-2 MSI lacks the phycocyanin-specific spectral band centered near 620 nm, the sensor is equipped with multiple well-placed spectral bands and has been routinely used for the detection and monitoring of HABs and HAB proxies [30]. Sentinel-2 is a constellation of satellites, which improves the revisit time to five days for most locations, critical for the practical integration of remote sensing into water quality monitoring and, subsequently, HAB monitoring. This is especially pertinent because heavy cloud cover is common during the peak bloom season (May–October).
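As a rough illustration of the band placement discussed above, the following sketch checks which of the listed spectral features fall inside a Sentinel-2 MSI band. The band centers and widths are approximate nominal values assumed here for illustration (they differ slightly between satellites in the constellation), and the helper name `covering_band` is hypothetical.

```python
# Approximate Sentinel-2 MSI band centers and full widths in nm.
# These are assumed nominal values for illustration only.
S2_BANDS = {
    "B3": (560.0, 36.0),  # green
    "B4": (665.0, 31.0),  # red
    "B5": (704.0, 15.0),  # red edge 1
    "B6": (740.0, 15.0),  # red edge 2
}


def covering_band(wavelength_nm):
    """Return the name of the band whose width covers the wavelength, or None."""
    for name, (center, width) in S2_BANDS.items():
        if abs(wavelength_nm - center) <= width / 2:
            return name
    return None


# Spectral features named in the text, in nm.
FEATURES = {
    "green reflectance": 550,
    "phycocyanin absorption": 620,
    "chlorophyll a feature": 665,
    "cell backscattering": 709,
}

for feature, wavelength in FEATURES.items():
    print(feature, wavelength, covering_band(wavelength))
```

Under these assumed band limits, the 620 nm phycocyanin feature is the only one that falls outside every band, which is consistent with the limitation noted in the text.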
Leveraging freely available datasets and cloud computing platforms, such as the Google Earth Engine (GEE) JavaScript API v0.1.388, is integral to developing and producing large-scale spatiotemporal research. The GEE is a freely available platform that integrates numerous remote sensing platforms within a high-performance cloud computing environment, opening the door for large-scale remote sensing research [31]. The GEE has been successfully utilized in numerous studies for the detection of water quality parameters, including chlorophyll a [28], turbidity [32], and colored dissolved organic matter [33], and for monitoring HABs [35]. Despite these successes, limitations of the GEE remain, especially with the integration of Sentinel-2 imagery: (1) the Sentinel-2 collection only includes Sen2Cor atmospherically corrected products, which are known to underperform compared with other atmospheric correction methods for aquatic environments [36], and (2) the availability of atmospherically corrected Sentinel-2 imagery is limited to collection dates after December 2018. Nevertheless, the GEE offers a unique opportunity, with sufficient imagery to examine nationwide, multi-year remote sensing data, an analysis that would be impractical on a traditional computing system.
To fully integrate remote sensing approaches into water quality monitoring beyond the localized level, imagery must be calibrated and validated against field data. However, potential biases and challenges arise when aggregating field data at the regional and national levels, a rarely discussed issue in the application of remote sensing for HAB monitoring and detection. For example, according to the USGS’s National Water Information System (NWIS), there are 34 unique methods for extracting or collecting chlorophyll a [37]. Furthermore, no single acquisition or extraction method is preferred across all federal, state, and local entities, so a direct comparison between studies can be challenging or even impossible. Additionally, there are numerous remote sensing algorithms for the detection of HABs, most with slight band or coefficient variations. Johansen et al. (2022) [38] demonstrated that most empirically based remote sensing algorithms fall into only a few general algorithm formulas, including the following: Normalized Difference Chlorophyll Index (NDCI), Two-Band Difference Algorithm (2BDA), Three-Band Difference Algorithm (3BDA), and the Chlorophyll Index/Maximum Chlorophyll Index (CI/MCI). These empirical algorithms have been used for decades to detect water quality pigments and HABs across numerous satellite imagers, but their robustness and efficacy across space and time are understudied, limiting algorithm performance spatially and temporally (see reviews [39]). Instead, this research evaluates the broadscale applicability of these algorithms irrespective of localized conditions to accomplish the following:
- (1) Aggregate field data across the entire continental United States for each chlorophyll a method.
- (2) Acquire coincident multi-year nationwide Sentinel-2 reflectance imagery using the Google Earth Engine cloud computing platform.
- (3) Calibrate and validate four well-established empirical Sentinel-2 algorithms using coincident field data.
- (4) Explore the broadscale usability of these algorithms for the detection of HABs across space, time, and field method.
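For orientation, the four algorithm families named above can be sketched as simple functions of surface reflectance. The formulas below are commonly published forms and are an assumption here, since this section does not state the exact band combinations or coefficients used in the study. The function names are hypothetical, and the 665, 704, and 740 nm band centers are taken as illustrative inputs.

```python
def ndci(r665, r704):
    """Normalized difference of red-edge and red reflectance."""
    return (r704 - r665) / (r704 + r665)


def two_band(r665, r704):
    """Two-band form: red-edge to red reflectance ratio (assumed form)."""
    return r704 / r665


def three_band(r665, r704, r740):
    """Three-band form: reciprocal-reflectance difference scaled by a
    longer-wavelength band (assumed form)."""
    return (1.0 / r665 - 1.0 / r704) * r740


def mci(r665, r704, r740):
    """Peak height at 704 nm above a linear baseline drawn between
    665 nm and 740 nm (assumed form)."""
    baseline_fraction = (704.0 - 665.0) / (740.0 - 665.0)
    return r704 - r665 - (r740 - r665) * baseline_fraction


# Hypothetical reflectance values for a single pixel.
red, red_edge, nir = 0.02, 0.03, 0.01
print(ndci(red, red_edge))
print(two_band(red, red_edge))
print(three_band(red, red_edge, nir))
print(mci(red, red_edge, nir))
```

In a calibration step, index values like these would typically be regressed against coincident field chlorophyll a measurements. That step is omitted from this sketch.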
4. Discussion
Overall, the results aligned well with other studies utilizing Sentinel-2-derived algorithms for the detection of HABs using chlorophyll a as a proxy [56]. For example, Xu et al. (2019) [56] demonstrated a strong correlation and low error when comparing Sentinel-2 imagery to in situ chlorophyll a at the regional scale, with a Pearson’s correlation coefficient (r) of 0.831 and an RMSE of 3.17 μg/L. However, these results were derived using a very small sample size, with a training set of only 27 points and a validation set of 8 points. Additionally, the range of chlorophyll values was limited to ~5–22 μg/L. Moreover, Wang et al. (2020) [28] demonstrated the value of utilizing the GEE for a regional spatiotemporal study to derive chlorophyll from Sentinel-2 imagery by deploying a support vector machine-based approach. Their results show strong correlations coupled with very low errors, but the study used a multi-sensor approach combined with limited sample sizes of between 32 and 97 observations and large temporal windows (10 days). Many of the previous large spatiotemporal studies have been limited by small coincident datasets with large discrepancies between imagery acquisition and field data collection. While the error statistics presented here are higher than those in the studies by Xu et al. (2019) [56] and Wang et al. (2020) [28], the authors believe they represent a more accurate view of the dynamic nature of chlorophyll a at the national scale. More research on improved statistical methods, such as artificial intelligence (AI) and machine learning (ML), is needed to reduce errors in the models.
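The two statistics used in these comparisons, Pearson's r and RMSE, can be computed as in the minimal sketch below. The sample values are hypothetical and are not data from this or the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g., ug/L)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical field and satellite-derived chlorophyll a values (ug/L).
field = [5.0, 12.0, 22.0, 40.0]
estimated = [7.0, 10.0, 25.0, 33.0]
print(pearson_r(field, estimated))
print(rmse(field, estimated))
```

Because RMSE is expressed in concentration units, it grows with the range of chlorophyll a values sampled. This is one reason error statistics from a narrow regional range and a national dataset are not directly comparable.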
The novelty of this research is its extensive spatial and temporal scope coupled with the use of large, complex field data, which confirmed the validity and robustness of band-ratio algorithms for broadscale HAB detection and mapping, focusing on chlorophyll a as the primary proxy. While the authors acknowledge that both the field data and imagery may contain uncertainty, as indicated by the relatively higher error statistics noted in this study, the advantage of leveraging freely available large-scale databases and open-access cloud computing resources (e.g., Google Earth Engine) outweighs the disadvantages, especially for this specific application, as it focuses on high-level monitoring of algal blooms at scale. These relatively simple-to-deploy approaches can be directly used by researchers and water managers to gain insight into the near-current surface conditions of a waterbody. Furthermore, the addition of Sentinel-2C and Sentinel-2D will reduce the revisit time to only 2–3 days, improving situational awareness and insight into the dynamic nature of their waterbodies.
By presenting a practical “real-world” scenario, this study demonstrates the value of band-ratio algorithms for near real-time mapping of algal blooms across heterogeneous waterbodies within the continental United States. In particular, this research demonstrated the strength of the 2BDA and NDCI algorithms, which leverage both the red band (665 nm) and the near-infrared band (704 nm), as appropriate algorithms for large-scale mapping where in situ field data are limited or unavailable. In addition to broad-scale mapping, the NDCI and 2BDA performed well across the variations evaluated in this study, including data acquisition methods, temporal windows, and annual consistency. Field data acquisition methods varied substantially across sites and time, but this research converged on only a few highly used methods: (1) Chlorophyll a, phytoplankton, chromatographic-fluorometric method (Chl1); (2) Chlorophylls, water, in situ, fluorometric method, excitation at 470 ± 15 nm, emission at 685 ± 20 nm (Chl2); and (3) Chlorophyll a, water, trichromatic method, uncorrected (Chl3). Generally, Chl1 and Chl3 show higher alignment with the remote sensing algorithms, which is likely due to the emphasis on chlorophyll a specifically in those two methods, as opposed to the more general chlorophyll approach of Chl2. In addition to chlorophyll, phycocyanin is rapidly being adopted as a primary metric for water quality and is routinely compared to remote sensing-based algorithms for HAB detection. Sentinel-2 imagery lacks the phycocyanin-specific spectral band centered near 620 nm, which is helpful for discriminating between cyanobacteria and other algal species. While other spectral imagers such as OLCI and MODIS offer higher spectral resolutions, their coarser spatial resolutions make them less advantageous for inland waterbodies [44]. Fortunately, there is a strong correlation between phycocyanin and chlorophyll a when cyanobacteria are the dominant contributor to algal biomass, and, subsequently, algorithm performances are similar [
Furthermore, the impact of temporal windows on algorithm performance was as expected, with the highest algorithm performances obtained for field samples collected on the same day as the satellite overpass, which is especially important for the Chl1 and Chl3 methods. However, for visual or qualitative purposes, this study indicated that a temporal window of up to five days (±48 h from the collection date) might still be appropriate, which aligns with the findings of Wang et al. (2020) [28]. Rarely documented in multi-year remote sensing studies is temporal robustness, or the annual consistency, of algorithm performance. This study explored the annual performance of each of the four algorithms to evaluate their temporal variation. The NDCI and 2BDA algorithms performed consistently across the three years, with RMSE ranges of 19.93–30.59 μg/L and 21.12–30.43 μg/L, respectively. MCI performed well in two of the three years but showed a dramatic deterioration in R² and RMSE for 2020 compared to NDCI and 2BDA. Annual consistency is important because it provides a baseline for long-term monitoring using consistent methods. The MCI algorithm utilizes a third spectral band, which might explain its relatively poor performance for the year 2020. It is possible that specific water conditions (e.g., sediment, CDOM) represented in the 2020 subset data have more influence on the 740 nm band and less on the 704 nm and 665 nm bands. Furthermore, this might explain the overall poor performance of the 3BDA algorithm across this entire study. However, additional investigation is required to determine the exact mechanisms or conditions that influence these algorithms’ performance.
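The temporal-window comparison discussed above amounts to pairing each field sample with the nearest satellite overpass and discarding pairs that fall outside the window. The sketch below is one way to express that pairing, not the study's implementation. The function name `match_within_window` and the dates are hypothetical, and a five-day window is interpreted here as two days either side of the collection date.

```python
from datetime import date


def match_within_window(sample_dates, overpass_dates, max_days):
    """Pair each field sample with its nearest overpass if that overpass
    falls within +/- max_days. Assumes overpass_dates is non-empty."""
    matches = []
    for sample in sample_dates:
        nearest = min(overpass_dates, key=lambda d: abs((d - sample).days))
        offset_days = (nearest - sample).days
        if abs(offset_days) <= max_days:
            matches.append((sample, nearest, offset_days))
    return matches


# Hypothetical field collection and overpass dates.
samples = [date(2020, 7, 1), date(2020, 7, 8), date(2020, 7, 20)]
overpasses = [date(2020, 7, 3), date(2020, 7, 8), date(2020, 7, 13)]

same_day = match_within_window(samples, overpasses, max_days=0)
five_day = match_within_window(samples, overpasses, max_days=2)
print(len(same_day), len(five_day))
```

Widening the window increases the number of coincident pairs at the cost of a larger possible mismatch between the water sampled and the water imaged. This is the trade-off behind the performance differences described above.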
There is also a need for further research to determine the most appropriate atmospheric correction method, which might also improve model accuracy, although this is not explored here given the focus on open-access imagery from the Google Earth Engine (GEE).