
Detection of Floating Algae Blooms on Water Bodies Using PlanetScope Images and Shifted Windows Transformer Model

1 Research Institute for Geomatics, Pukyong National University, 45 Yongso-ro, Busan 48513, Republic of Korea
2 Nara Space Technology Incorporation, 632 Gukhoe-daero, Seoul 07245, Republic of Korea
3 Satellite Application Department, Korea Aerospace Research Institute, 169-84 Gwahak-ro, Daejeon 34133, Republic of Korea
4 Major of Geomatics Engineering, Division of Earth Environmental System Science, Pukyong National University, 45 Yongso-ro, Busan 48513, Republic of Korea
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(20), 3791; https://doi.org/10.3390/rs16203791
Submission received: 8 August 2024 / Revised: 7 October 2024 / Accepted: 10 October 2024 / Published: 12 October 2024

Abstract

The increasing water temperature due to climate change has led to more frequent algae blooms and deteriorating water quality in coastal areas and rivers worldwide. To address this, we developed a deep learning-based model for identifying floating algae blooms using PlanetScope optical images and the Shifted Windows (Swin) Transformer architecture. We created 1,998 labeled image patches from 105 scenes of PlanetScope imagery collected between 2018 and 2023, covering 14 water bodies known for frequent algae blooms. The methodology included data pre-processing, dataset generation, deep learning modeling, and inference result generation. The input images contained six bands, including vegetation indices such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), enhancing the model’s responsiveness to algae blooms. Evaluations were conducted using both single-period and multi-period datasets. The single-period model achieved a mean Intersection over Union (mIoU) between 72.18% and 76.47%, while the multi-period model significantly improved performance, with an mIoU of 91.72%. This demonstrates the potential of our model and highlights the importance of change detection in multi-temporal images for algae bloom monitoring. Additionally, the padding technique proposed in this study resolved the border issue that arises when mosaicking inference results from individual patches, providing a seamless view of the satellite scene.

1. Introduction

An algae bloom refers to the rapid growth of floating algae in eutrophic lakes, slow-moving rivers, or stagnant seas, turning the water green. Algae blooms are a global concern, and understanding their causes and impacts requires significant effort. In recent decades, there has been an increasing trend in both the area and frequency of algae blooms in the world’s oceans [1]. Over the past 20 years, the area affected by algae blooms has expanded by 13.2%, the annual frequency of algae blooms has increased by 59.2%, and the affected area has grown by an average of 140,000 km² per year. Algae blooms negatively impact aquatic ecosystems, crops irrigated with affected water, and aquaculture. Cyanobacteria can also produce toxins that pose health risks to both humans and animals [2]. When pollutants flow into rivers or lakes, the water becomes enriched with nutrients such as nitrogen and phosphorus, leading to eutrophication. This nutrient enrichment causes algae to grow in large quantities, resulting in algae blooms [3]. In addition to nutrient inputs, key contributors to algae blooms include rising water temperatures, increased sunlight, and stagnant water flows [4]. Algae blooms are most common in the summer when water temperatures are high, as these higher temperatures accelerate the growth rate of cyanobacteria [5]. Cyanobacteria use light energy for photosynthesis, and typically, the more abundant the light, the faster their growth rate. Different species of cyanobacteria can thrive in a range of light conditions, from early spring to late autumn. Thus, light and temperature are crucial factors in the development of algae blooms. Recent increases in water temperatures due to climate change have exacerbated algae blooms and contributed to deteriorating water quality [6].
It is crucial to prevent algae blooms and mitigate their impacts. Monitoring algae blooms can involve visual observation or periodic water quality sampling and testing. Physical and chemical analysis of these samples can directly indicate the intensity and trends of algae blooms. However, traditional methods are limited in determining the spatial distribution of algae blooms over wide areas and require significant costs and effort to implement [7]. As a result, remote monitoring methods that do not require direct water sample collection and analysis have advanced significantly. Using Unmanned Aerial Vehicles (UAVs) or satellites for remote monitoring allows for cost-effective surveillance of large areas in a short time. Traditional satellite-based methods for detecting algae blooms typically rely on indices developed from the reflectance of satellite channels. For example, Hu [8] developed the Floating Algae Index (FAI) algorithm, which detects floating macroalgae from Moderate Resolution Imaging Spectroradiometer (MODIS) images more effectively than the Normalized Difference Vegetation Index (NDVI). Similarly, Siddiqui et al. [9] developed the Seaweed Enhancing Index (SEI), which uses the Near-Infrared (NIR) and Shortwave-Infrared (SWIR) bands of the Landsat-8 satellite, comparing its results with NDVI and FAI. Son et al. [10] created the Index of Green Algae for GOCI (IGAG) algorithm, suitable for Geostationary Ocean Color Imager (GOCI) images, which can clearly distinguish algae bloom distributions from surrounding seawater. Ma et al. [11] tracked the movement and spread of Ulva prolifera in the Yellow Sea using MODIS and Sentinel-1A/1B (C-band) Synthetic Aperture Radar (SAR) images over 80 days, calculating the Difference Vegetation Index (DVI) due to the similar characteristics of Ulva prolifera to vegetation.
These methods fundamentally rely on the principle that chlorophyll-a, an indicator of algae growth, exhibits strong reflectance in the NIR band and weak reflectance in the red band [12]. By assigning specific thresholds for each index, water and algae bloom areas can be automatically classified. However, the spectral reflectance characteristics can vary depending on the concentration or type of algae, making it challenging to apply a universal threshold. To address the limitations of traditional remote sensing algorithms, recent studies have explored the use of Artificial Intelligence (AI) techniques. Wu et al. [13] applied a Support Vector Machine (SVM) to Sentinel-1A (C-band) SAR images to segment dark areas and identify algae blooms. Wang et al. [14] and Cui et al. [15] employed deep learning-based super-resolution techniques on remote sensing data to improve the estimation of algae bloom distribution. Liang et al. [16] used the Extreme Learning Machine (ELM) model on Gaofen-1 Wide Field of View (WFV) images and successfully distinguished floating macroalgae with high accuracy. Macroalgae features were also extracted from Sentinel-2A/2B Multi-spectral Instrument (MSI) Alternative Floating Algae Index (AFAI) images using a Residual U-Net-based deep learning model [17]. Further advancements include those of Guo et al. [18], who applied the GA-Net, an enhanced U-Net model, to Sentinel-1 SAR images, achieving a mean Intersection over Union (mIoU) of 86.31%. Similarly, Zhu et al. [19] applied a Dual Position-Channel Attention U-Net (DPCAU-Net) model to Sentinel-2 images, comparing and analyzing the results with SVM, Back Propagation Neural Network (BPNN), and Single PCU-Net (SPCU-Net).
AI techniques, such as deep learning, can integrate larger datasets and allow for the learning of complex patterns, often enabling more accurate detection of algae blooms than traditional methods. Studies have demonstrated that these techniques excel in adaptively analyzing the spectral and spatial structure information of satellite images to accurately identify key features of interest [20,21]. Moreover, once a deep learning model is well trained, it can quickly process large images, providing a significant advantage. Previous research has applied traditional deep learning methods, such as Convolutional Neural Networks (CNNs), to detect algae blooms in satellite images with notable success. However, these methods still face challenges, particularly regarding sensitivity to spatial resolution and spectral variability. To overcome these limitations, we developed a model for identifying floating algae blooms using the Shifted Windows (Swin) Transformer. The Swin Transformer is based on the transformer architecture, which is more efficient than traditional CNN-based models. While CNNs primarily rely on local filters to learn relationships between neighboring pixels, transformer models can capture more complex patterns by considering the entire image through a self-attention mechanism. Accurate detection of algae blooms across large river systems requires integrating information from multiple river sections, and transformer models excel in capturing interactions between pixels through their self-attention mechanism [22].
This study aims to develop a more effective AI model for identifying floating algae blooms using the Swin Transformer and high-resolution PlanetScope satellite images. To accurately determine the distribution of algae blooms and maintain high accuracy across various environmental conditions, we selected an optimal combination of input channels and conducted both single- and multi-temporal experiments using 1,998 image patches from 105 scenes across 14 water bodies worldwide. We also adopted a padding technique to resolve the border issue that arises when mosaicking inference results from individual patches.

2. Study Area

A total of 14 water bodies worldwide were selected for algae bloom cases, as shown in Figure 1 and Figure 2. We focused on regions where cyanobacteria (blue-green algae) can cause harmful algae blooms (HABs). These cases were identified by consulting news articles, websites, and research papers to ensure a diverse selection. Using images from multiple regions provides the advantage of reflecting diverse environmental conditions, which increases the robustness of the model. Since the conditions for algae bloom occurrences can vary by region, incorporating images from various areas helps us better understand the patterns of algae bloom occurrences under different environmental conditions. This approach enhances the model’s generalization ability [20] and prevents it from overfitting to specific areas or conditions. As a result, the model maintains high accuracy in new environments and ensures consistent performance under varying conditions [23].
Lake Sevan, the largest freshwater lake in Armenia, experiences algae blooms year-round, with higher frequency during the summer and early autumn. The primary causes of cyanobacteria blooms in the lake are high water temperatures, a contained environment, and an abundance of nutrients such as phosphorus and nitrogen [24]. Lake Chagan in China has also experienced significant algae blooms, primarily due to the continuous development of tourism in recent years [25]. Since 1993, Lake Xingyun in China has become eutrophic due to economic growth, leading to frequent floating algae blooms, raising concerns about the monitoring of the spatial and temporal variations of floating algae [26]. Lake Clear in the USA and Lake Turawskie in Poland are prone to algae blooms, primarily due to nutrient pollution from agricultural runoff and other human activities. The FAI has been used to monitor and identify these blooms [26]. The National Oceanic and Atmospheric Administration’s (NOAA) National Centers for Coastal Ocean Science (NCCOS) has developed an advanced monitoring system to detect and track HABs in various coastal and lake regions across the United States, selecting areas such as Saginaw Bay, Lake Okeechobee, Big Sarasota Pass, and Lake St. Clair as study locations [27]. Lake Valencia, the largest freshwater lake in Venezuela, has been severely affected by algae blooms due to the continuous inflow of untreated wastewater from surrounding urban, agricultural, and industrial areas [28]. The Nakdong River, the longest river in South Korea, is highly prone to algae blooms due to its gentle slope, slow flow rate, and proximity to urban and industrial areas. These blooms are particularly severe from spring to autumn, driven by rising water temperatures and nutrient runoff [29]. Similarly, the Geum, Miho, and Yeongsan Rivers in South Korea experience increased concentrations of cyanobacteria, especially during the summer months. These blooms are intensified by droughts and heat waves, resulting in longer durations and higher densities of harmful algae. The information on the 14 water bodies and the number of satellite images used is shown in Table 1.
Figure 1. Google Maps image showing the locations of 14 water bodies around the world where algae blooms occur [30].
Figure 2. Satellite images of the study areas using Bing Maps images [31].

3. Data

The PlanetScope satellite constellation, operated by Planet Labs, consists of over 430 nanosatellites that provide daily images of the entirety of Earth [32]. This system allows for the acquisition of satellite images just before and after algae bloom events, enhancing the ability to monitor the development and progression of algae blooms over time. The high revisit frequency enables timely responses and more effective management of these events. With a spatial resolution of approximately 3 m, PlanetScope images offer excellent visibility for detecting and monitoring algae blooms, even those occurring on a small scale. This high resolution helps in accurately identifying the boundaries and extent of algae blooms.
Table 2 provides the specifications of PlanetScope. PlanetScope is a 3U form factor (10 × 10 × 30 cm) CubeSat. As of 2020, the constellation includes three generations of satellites: Dove-Classic (PS2), Dove-R (PS2.SD), and SuperDove (PSB.SD). Each successive generation has enhanced the sensing capabilities, with Dove-R improving spectral resolution and SuperDove introducing additional spectral bands. All generations include blue, green, red, and NIR spectral bands, and the PSB.SD satellites additionally feature a red-edge band, which is particularly sensitive to algae bloom detection [33].
The PlanetScope product is available in various processing levels. For this study, we utilized Level 3B surface reflectance images, which include both geometric and atmospheric corrections [32]. Geometric corrections for PlanetScope images are achieved using a combination of Digital Elevation Models (DEMs) with spatial resolutions of 30 to 90 m, along with feature points and Ground Control Points (GCPs) extracted through computer vision algorithms. This process results in a Root Mean Square Error (RMSE) of approximately 10 m across the entirety of Earth [32]. Additionally, PlanetScope images maintain a relative error of less than 0.5 pixels between time-series images, making them ideal for time-series studies involving satellite data [34]. Multiple studies have shown that PlanetScope data are effective for applications in turbid water conditions and algae blooms [35,36,37]. A total of 105 PlanetScope scenes were acquired from 2018 to 2023 from the 14 water bodies. Table 3 presents the number of PlanetScope images obtained per year and season.

4. Methods

4.1. Overview

Figure 3 illustrates the overall workflow of the proposed method for floating algae bloom identification, including data pre-processing, dataset generation, deep learning modeling, and inference result generation. The pre-processing of PlanetScope images involves Coordinate Reference System (CRS) transformation, geometric resampling, histogram stretching, and data normalization. During the dataset generation step, labeled datasets were created through image annotation, which is essential for training deep learning models. The input dataset was divided into single-period and multi-period images to analyze differences in model performance when using single versus time-series images. In the deep learning modeling step, the data were split into training, validation, and test sets. The model structure and hyperparameters were iteratively optimized to develop an effective algae bloom detection model. Model accuracy was validated using a confusion matrix to ensure reliable identification of algae blooms. For the inference result generation step, padding was applied to the input datasets to maintain the benefits of using image patches while ensuring continuity at patch boundaries, leading to more accurate inference results. A water mask was also applied to extract actual water regions, which helped generate the final algae bloom detection results. The water mask for South Korea was created using land cover data provided by the Ministry of Environment [38], although water masks for regions outside of South Korea were not available.

4.2. Data Pre-Processing

In the first step of data pre-processing, CRS transformation and geometric resampling were performed to standardize all images. The CRS was converted from the World Geodetic System 1984 (WGS84) to the Universal Transverse Mercator (UTM), and the images were resampled to a spatial resolution of 3.2 m. This ensures consistent location information across images collected from various regions. Additionally, histogram stretching was applied to enhance image contrast. The pixel values were adjusted between the 2nd and 98th percentiles to avoid outliers or extreme values. Finally, min-max normalization was conducted to scale the pixel values between 0 and 255 as follows:
$$\text{Normalized Value} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \times 255$$
where $X$, $X_{\min}$, and $X_{\max}$ represent the pixel value, the minimum value, and the maximum value in the original image, respectively. This normalization process, a standard practice in image processing, helps standardize pixel values across images, making them suitable for efficient storage and processing. This comprehensive pre-processing ensures the images are consistent and appropriate for model training and inference, ultimately improving the efficiency of the overall process.
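In code, the stretch-and-normalization step might look like the following minimal sketch, assuming each band is held as a float NumPy array (the function name and the zero-range guard are ours, not from the paper):

```python
import numpy as np

def stretch_and_normalize(band: np.ndarray) -> np.ndarray:
    """Clip a band to its 2nd-98th percentile range, then rescale to 0-255."""
    lo, hi = np.nanpercentile(band, (2, 98))
    if hi == lo:                          # degenerate band with constant values
        return np.zeros_like(band, dtype=np.uint8)
    clipped = np.clip(band, lo, hi)
    return ((clipped - lo) / (hi - lo) * 255).astype(np.uint8)
```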

4.3. Dataset Generation

4.3.1. Image Labeling

Image labeling is critical to ensuring the performance of a deep learning model, as accurate labels are essential for evaluating the model’s accuracy. Consistent labeling ensures uniformity during model training and helps prevent incorrect predictions. Chlorophyll-a, a pigment responsible for absorbing light necessary for photosynthesis in plants and widely distributed in algae, can be used to estimate algae blooms indirectly. As shown in Figure 4, chlorophyll-a typically exhibits low reflectance in the blue band (center wavelength around 490 nm), high reflectance in the green band (center wavelength around 560 nm), low reflectance in the red band (center wavelength around 665 nm), and very high reflectance in the red-edge band (center wavelength around 705 nm) [12]. PlanetScope’s Dove-Classic, Dove-R, and SuperDove satellites include a green band, and notably, the SuperDove also includes a red-edge band. Labeling was conducted based on the spectral characteristics sensitive to chlorophyll-a in the green and red-edge bands. News articles and academic papers reporting algae blooms in specific regions were also referenced [24,25,26,27,28,29]. For the Geum River, Miho River, Nakdong River, and Yeongsan River in South Korea, chlorophyll-a observations provided by the Ministry of Environment’s Water Environment Information System were used [39].
Figure 5 sequentially presents the true color composite, false color composite (red-edge–green–NIR), and corresponding labels. The false color composite enhances the contrast between water and algae blooms, making the bloom areas more distinct and facilitating the labeling process. Digitization was performed by referencing the pixel values and distribution in the false color composite. To ensure consistency in labeling, a separate test set was created for each image to verify the labeling accuracy. The Swin Transformer model was then applied to each test set, and images with a mean Intersection over Union (mIoU) below 40% were excluded, resulting in the removal of 16 images. Ultimately, 105 PlanetScope scenes from 2018 to 2023, covering 14 water bodies, were generated as input data for the floating algae bloom identification model. This method ensures the dataset’s consistency and accuracy, optimizing the model’s performance.

4.3.2. Input Image Preparation for Single-Period Model

Recently, indices have been developed to simulate chlorophyll-a concentrations, such as the Normalized Difference Chlorophyll Index (NDCI), which is calculated using the reflectance differences between the red-edge and the red or green bands [40,41,42]. However, of the PlanetScope constellation satellites, only the SuperDove (available since March 2020) includes the red-edge band. This limitation means that chlorophyll-a indices cannot be calculated for all images. Therefore, we calculated vegetation indices using the red, blue, and NIR bands, which are available in all PlanetScope satellites, and included these indices as input channels in the deep learning model.
First, using satellite image analysis, we calculated the NDVI, the most widely used vegetation index in algae bloom detection research. Floating algae blooms exhibit similar spectral characteristics to terrestrial vegetation in the red and NIR bands, making them distinguishable from the surrounding water [43]. The NDVI formula is as follows:
$$\mathrm{NDVI} = \frac{R_{\mathrm{NIR}} - R_{\mathrm{RED}}}{R_{\mathrm{NIR}} + R_{\mathrm{RED}}}$$
where $R_{\mathrm{NIR}}$ and $R_{\mathrm{RED}}$ represent the reflectance in the NIR and red spectral bands, respectively. NDVI increases with vegetation density or biomass [44]. Next, we calculated the Enhanced Vegetation Index (EVI), which offers improved sensitivity in high biomass regions and includes corrections for soil and atmospheric influences. The EVI formula is as follows:
$$\mathrm{EVI} = G \times \frac{R_{\mathrm{NIR}} - R_{\mathrm{RED}}}{R_{\mathrm{NIR}} + C_1 \times R_{\mathrm{RED}} - C_2 \times R_{\mathrm{BLUE}} + L}$$
where $R_{\mathrm{NIR}}$, $R_{\mathrm{RED}}$, and $R_{\mathrm{BLUE}}$ represent the reflectance in the NIR, red, and blue spectral bands, respectively. The parameters $G$, $C_1$, $C_2$, and $L$ are the gain factor, the coefficients for the aerosol resistance term, and the canopy background adjustment factor, respectively. The most commonly used values for these parameters were applied: $G = 2.5$, $C_1 = 6$, $C_2 = 7.5$, and $L = 1$ [45]. These adjustments make EVI more responsive in high biomass regions and correct for atmospheric conditions and canopy background noise, providing an improved vegetation signal compared to NDVI [45].
Histogram stretching was applied to improve the visual quality of the images by excluding extreme values of the vegetation indices and emphasizing critical information. Next, normalization was performed to scale the values of each vegetation index to integer values between 0 and 255. This process created input images consisting of six bands (R, G, B, NIR, NDVI, and EVI). In a preliminary experiment, we found that using six bands as input channels produced more reliable results than using only the three RGB bands or the four RGB–NIR bands. This is because floating algae blooms can be identified by visible wavelengths and vegetation indices such as NDVI and EVI. To accommodate all six input channels, we modified the structure of the Swin Transformer model.
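Putting the pieces together, the six-band input could be assembled as in the sketch below, which reuses the stretch_and_normalize helper from Section 4.2; the small epsilon guard and the exact channel ordering are our assumptions:

```python
import numpy as np

def build_six_band_input(blue, green, red, nir):
    """Assemble the six-band model input (R, G, B, NIR, NDVI, EVI)
    from surface reflectance bands given as float NumPy arrays."""
    eps = 1e-6                            # guards against division by zero
    ndvi = (nir - red) / (nir + red + eps)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0 + eps)
    channels = (red, green, blue, nir, ndvi, evi)
    return np.stack([stretch_and_normalize(c) for c in channels], axis=0)
```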
By dividing the final images into 512 × 512 patches, the input dataset for the deep learning model was constructed. Using smaller patches instead of whole images offers several advantages [46]. These include reducing memory usage and improving training speed, as the model processes smaller data chunks. Additionally, training the model with multiple patches from different parts of the image allows it to learn a wider variety of visual features, enhancing its generalization capability. The model can also focus on local features within patches, which is particularly beneficial for detecting detailed patterns like algae blooms. Moreover, smaller patches help address issues related to class imbalance by creating more balanced samples. Finally, generating more training data from patches helps prevent the model from overfitting to specific features in the training set. Figure 6 summarizes the procedure for creating the input dataset for the single-period algae bloom identification model. A total of 1,998 patches were generated from 105 scenes from 14 water bodies, covering the years 2018 to 2023. To enhance the model’s accuracy in identifying algae bloom areas, patches were generated only if they contained at least one pixel from an algae bloom region. Table 4 shows the ratios of algae bloom pixels in the 1,998 patches. These diverse ratios ensure that the model is tested under various conditions, including cases with very few or no algae bloom pixels.
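The patch-generation rule (keep a patch only if it contains at least one labeled algae-bloom pixel) can be sketched as follows; the class encoding (1 = algae bloom) is an assumption on our part:

```python
def extract_patches(image, label, patch=512):
    """Tile a (C, H, W) scene into non-overlapping 512 x 512 patches, keeping
    only patches whose label contains at least one algae-bloom pixel."""
    _, H, W = image.shape
    kept = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            lab = label[y:y + patch, x:x + patch]
            if (lab == 1).any():          # 1 = algae bloom (assumed encoding)
                kept.append((image[:, y:y + patch, x:x + patch], lab))
    return kept
```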

4.3.3. Input Image Preparation for Multi-Period Model

To analyze the differences in model performance when using single-period versus time-series images, input datasets for multi-period analysis were created. This dataset was based on pre-bloom and during-bloom images from three rivers in South Korea: the Geum River, Miho River, and Yeongsan River, comprising 34 image pairs (Table A1, Table A2 and Table A3). Pre-bloom images refer to cases where the proportion of algae bloom pixels is less than 5% before the algae blooms spread. Figure 7 provides examples of pre-bloom and during-bloom images for the Yeongsan River. We then generated difference images for six bands (R, G, B, NIR, NDVI, and EVI) by subtracting the pre-bloom image from the during-bloom image. These difference images were divided into 512 × 512 patches, resulting in six difference bands per patch (dR, dG, dB, dNIR, dNDVI, and dEVI). To improve model accuracy, patches were created only if the during-bloom image contained at least one pixel of algae bloom. Figure 6 summarizes the input dataset generation process for the multi-period algae bloom identification model. Ultimately, 567 patches were generated from 34 pairs of multi-period images for the Geum River, Miho River, and Yeongsan River in South Korea.
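The paper states only that the pre-bloom image is subtracted from the during-bloom image; the sketch below shows one plausible realization, where the signed differences are shifted and rescaled back into the 8-bit range used by the other inputs (the rescaling scheme is our assumption):

```python
import numpy as np

def difference_bands(pre, during):
    """dR, dG, dB, dNIR, dNDVI, dEVI: during-bloom minus pre-bloom patches,
    mapped from the signed range [-255, 255] back into 0-255."""
    diff = during.astype(np.int16) - pre.astype(np.int16)
    return ((diff + 255) / 2).astype(np.uint8)
```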

4.4. Deep Learning Modeling

4.4.1. Dataset Split

The dataset of 1,998 single-period images was split into training, validation, and test sets. To avoid overfitting, which can occur when a model becomes too familiar with specific regions used in the training process [47], we ensured that the datasets for training, validation, and testing did not overlap. Specifically, patches from the Geum River (172 patches), Miho River (306 patches), and Yeongsan River (296 patches) were entirely excluded from the training and validation datasets for the single-period experiment (Table 5). This means that the test datasets only included images from these regions, and the training and validation datasets contained no images from them. Similarly, for the multi-period experiment, datasets from the Geum River (130 patches), Miho River (247 patches), and Yeongsan River (190 patches) were excluded from the training and validation datasets, as outlined in Table 6. This approach ensures the robustness and generalization capability of the deep learning model by preventing overfitting.
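In code, the region-disjoint split reduces to holding out every patch from the three Korean rivers; this sketch assumes each sample carries a region tag, and the 80/20 train/validation ratio is illustrative rather than the paper's exact figure:

```python
HELD_OUT = {"Geum River", "Miho River", "Yeongsan River"}

def split_by_region(samples):
    """Reserve all patches from the held-out rivers for testing and split the
    remainder into training and validation sets with no regional overlap."""
    test = [s for s in samples if s["region"] in HELD_OUT]
    rest = [s for s in samples if s["region"] not in HELD_OUT]
    n_val = int(0.2 * len(rest))          # illustrative 80/20 split
    return rest[n_val:], rest[:n_val], test
```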

4.4.2. Swin Transformer

The Swin Transformer is a deep learning image recognition technology introduced by Microsoft in 2021, which adapts the Transformer model—originally designed for sequence or natural language processing—to image recognition. It enhances image recognition accuracy by using a self-attention mechanism to efficiently focus on the most relevant parts of the input data [48,49]. Traditional Vision Transformer models generate low-resolution feature maps and perform global self-attention operations across a single image. In contrast, the Swin Transformer improves upon this by generating hierarchical feature maps from image patches and performing local self-attention operations within individual windows [22].
The self-attention mechanism enables the model to focus on the most relevant areas and channels of the input data, improving the accuracy of the output. The Swin Transformer addresses a common issue of window-based attention, where pixels near window boundaries are not adequately considered in self-attention calculations, by using a shifted window approach. In alternating blocks, the windows are displaced by half the window size, ensuring accurate recognition even near window boundaries. The hierarchical shifted window structure allows the Swin Transformer to apply local self-attention effectively, leading to performance that is superior to traditional CNN-based models and Vision Transformers.
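The window mechanics can be written compactly in PyTorch; the sketch below follows the original Swin formulation (a cyclic shift by half the window size via torch.roll) rather than the authors' exact implementation:

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping
    (window_size x window_size) windows for local self-attention."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def shifted_window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Cyclically shift the map by half a window before partitioning, so the
    new windows straddle the boundaries of the previous, unshifted windows."""
    shift = window_size // 2
    x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    return window_partition(x, window_size)
```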
This approach enables more accurate identification of the distribution and characteristics of algae blooms, maintaining high accuracy even in complex environments. The Swin Transformer can effectively recognize and learn relationships and interactions between elements far apart within the data, which is crucial for high-resolution remote sensing images. Satellite images often contain complex patterns and structures that require an understanding of relationships between distant parts of the image. For example, accurately detecting algae blooms across an entire river requires integrating information spanning multiple river sections. The Swin Transformer has demonstrated significant success over previous state-of-the-art models in image classification (87.3% top-1 accuracy on ImageNet-1K), semantic segmentation (53.5% mIoU on ADE20K val), and object detection (58.7% box AP and 51.1% mask AP on COCO test-dev) [22].
An overview of the Swin Transformer is presented in Figure 8. The Swin Transformer reduces quadratic computational complexity by creating hierarchical feature maps (Figure 8a). The shifted window approach, shown in Figure 8b, significantly decreases latency compared to previous sliding window-based methods for calculating self-attention. For efficient computation with non-overlapping windows, shifted window partitioning configurations are utilized in consecutive Swin Transformer blocks (Figure 8c). According to the architecture of the Swin Transformer (Figure 8d), the process begins by dividing the input image into a sequence of non-overlapping patches using the patch partitioning module. These patches are then passed through a linear embedding layer, projecting them into an arbitrary dimension. Several Swin Transformer blocks follow, applying self-attention. The primary role of the patch merging module is to reduce the number of tokens in the deeper layers. Notably, the feature map resolutions in the hierarchical stages are similar to those in traditional convolutional architectures like ResNet [50]. The characteristics of the Swin Transformer enable the algae bloom identification model to maintain high accuracy even in complex environments, where it can effectively recognize various patterns and structures.
The Swin Transformer is categorized into Swin-T (Tiny), Swin-S (Small), Swin-B (Base), and Swin-L (Large), based on the number of channels in the hidden layers of the first stage and the number of layers (Table 7). This study utilized the Swin-T model, which has demonstrated excellent performance in algae bloom detection. The detailed architectural specifications of the Swin-T model are presented in Table 8.

4.4.3. Model Training and Validation

Transfer learning was used to enhance the floating algae bloom identification model by initializing it from a model pre-trained on the ADE20K dataset. ADE20K is a widely used semantic segmentation dataset that covers 150 diverse semantic categories [51]. We used UperNet from the MMSegmentation library due to its high efficiency and performance [52]. Subsequently, the Swin Transformer architecture and its hyperparameters were optimized to better fit the specific requirements of the floating algae bloom identification model (Table 9).
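A configuration in this spirit is sketched below in MMSegmentation's config style; the base-config paths, checkpoint filename, and field values are illustrative assumptions, not the authors' published configuration:

```python
# Illustrative MMSegmentation-style config: UperNet decoder on a Swin-T
# backbone, widened to six input channels and initialized from ADE20K weights.
_base_ = [
    '../_base_/models/upernet_swin.py',   # placeholder base-config path
    '../_base_/default_runtime.py',
]
model = dict(
    backbone=dict(
        in_channels=6,                    # R, G, B, NIR, NDVI, EVI
        embed_dims=96,                    # Swin-T: 96 first-stage channels
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
    ),
    decode_head=dict(num_classes=2),      # algae bloom vs. background
    auxiliary_head=dict(num_classes=2),
)
load_from = 'checkpoints/upernet_swin_tiny_ade20k.pth'   # placeholder path
```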
The model was trained iteratively through the deep learning process, with accuracy calculated using the classified validation data for each model. To evaluate the performance of the floating algae bloom identification model, a confusion matrix was generated using labeled and predicted images from the test dataset. In the binary classification of algae bloom versus background, the correlation between the model’s predictions and the actual labels is represented through a confusion matrix, comprising true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) values. Based on the confusion matrix, accuracy metrics such as accuracy, precision, recall, F1-score, and mIoU were calculated. High values for these metrics indicate strong model performance [53].
Figure 9 illustrates the confusion matrix and formulas for calculating accuracy metrics. Accuracy is the proportion of all pixels in the predicted image that match the labeled image. This metric focuses on the overall effectiveness of a classifier. Precision is the proportion of true-positive predictions among all positive predictions made by the classifier. It indicates the classifier’s accuracy in identifying the positive class. Recall is the proportion of true-positive predictions among all actual positive instances in the data. It reflects the classifier’s ability to capture all the true-positive cases. A high FP leads to lower precision, which means a tendency to overestimate, while a high FN leads to lower recall, which indicates a tendency to underestimate. The F1-score is calculated as the harmonic mean of precision and recall. The Intersection over Union (IoU) is the ratio of the intersection area to the union area of the labeled and predicted images, serving as a crucial metric in deep learning image recognition. The IoU effectively combines the characteristics of accuracy, precision, and recall, making it a standard measure of image recognition accuracy in computer vision. The mIoU represents the average IoU across all classes, providing a comprehensive assessment of the model’s performance [54]. The final model was generated through the performance evaluation procedure of this floating algae bloom identification model.
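For the binary algae/background case, the metrics in Figure 9 reduce to a few lines of NumPy; this sketch assumes 1 encodes algae bloom and 0 background, and that both classes occur in the evaluated maps:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, label: np.ndarray) -> dict:
    """Confusion-matrix metrics for binary algae (1) vs. background (0) maps."""
    tp = int(np.sum((pred == 1) & (label == 1)))
    tn = int(np.sum((pred == 0) & (label == 0)))
    fp = int(np.sum((pred == 1) & (label == 0)))
    fn = int(np.sum((pred == 0) & (label == 1)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # mIoU: average of the per-class IoUs for algae and background
        "mIoU": (tp / (tp + fp + fn) + tn / (tn + fp + fn)) / 2,
    }
```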

4.5. Inference Result Generation

To complete the deep learning model’s input dataset, the images were divided into 512 × 512 patches. This approach offers several advantages, including reduced memory usage, faster model training, and improved learning of local features. However, simply arranging these patches consecutively can lead to discontinuities in the inference results at the boundaries between patches. To address this, we applied overlapping patches with a fixed margin, retained only the central portion of each patch’s inference result, and combined these portions to match the size of the original image (Figure 10). Through various experiments, a padding size of 212 pixels proved to be the most effective for 512 × 512 patches. However, this padding size may vary depending on the study area and data used. By constructing the input dataset with padding during inference, we maintained the advantages of using patches while ensuring continuous values at patch boundaries, leading to more accurate inference results.
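The overlap-and-crop mosaicking can be sketched as below; with patch = 512 and pad = 212, only the central 88 × 88 core of each prediction is retained. The reflection padding at scene edges and the `predict` callable are our assumptions about details the paper leaves open:

```python
import numpy as np

def seamless_inference(image, predict, patch=512, pad=212):
    """Run `predict` on overlapping patches and mosaic only each patch's
    central core, keeping class maps continuous across patch boundaries.
    `image` is (C, H, W); `predict` maps (C, patch, patch) -> (patch, patch)."""
    _, H, W = image.shape
    core = patch - 2 * pad                     # 88 px retained per patch here
    padded = np.pad(image, ((0, 0), (pad, pad + patch), (pad, pad + patch)),
                    mode="reflect")            # assumes scene >> patch size
    out = np.zeros((H + patch, W + patch), dtype=np.uint8)
    for y in range(0, H, core):
        for x in range(0, W, core):
            tile = padded[:, y:y + patch, x:x + patch]
            pred = predict(tile)
            out[y:y + core, x:x + core] = pred[pad:pad + core, pad:pad + core]
    return out[:H, :W]
```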
Figure 11 presents an example of the inference results for the Yeongsan River. Before applying padding, discontinuities were noticeable at the boundaries between patches in the algae bloom areas. However, after padding was applied, the results appeared more seamless and natural. Water masking was then performed using land cover data to generate the final algae bloom identification results. These steps effectively resolved boundary issues during inference, leading to more accurate and natural algae bloom detection outcomes.

5. Results and Discussion

5.1. Comparisons with In Situ Measurements and Vegetation Indices

To ensure the reliability of the labeled images, comparisons were made with in situ chlorophyll-a measurements, as well as NDVI and EVI, which can indirectly estimate the occurrence of algae blooms. Figure 12 compares 59 labeled images with in situ chlorophyll-a measurements collected between 2018 and 2023 from the Geum River, Miho River, Nakdong River, and Yeongsan River in South Korea [39]. Figure 12a,b present histograms of chlorophyll-a concentrations for pixels labeled as algae bloom or non-algae bloom, respectively, while Figure 12c shows a box plot of chlorophyll-a concentrations. Pixels labeled as algae bloom more frequently show chlorophyll-a concentrations exceeding 25 mg/m3, while most chlorophyll-a concentrations in non-algae bloom pixels remain within the 0–25 mg/m3 range. In algae bloom areas, the 25th and 75th percentiles of the box plot were 12.5 mg/m3 and 107.2 mg/m3, respectively, indicating significantly higher concentrations. In contrast, in non-algae bloom areas, the 25th and 75th percentiles were 6.2 mg/m3 and 23.9 mg/m3, reflecting generally lower concentrations. The extreme outliers distant from the box plots are noise generated during the labeling process, which is an unavoidable limitation in any labeling work. Thus, pixels labeled as algae bloom exhibit higher and more variable chlorophyll-a concentrations, illustrating the ecological phenomenon of elevated chlorophyll levels during algae bloom events. Table A4 provides statistical information on in situ chlorophyll-a measurements from the Geum River, Miho River, Nakdong River, and Yeongsan River in South Korea between 2018 and 2023.
Approximately 100 patches were sampled from the 1,998 patches used as input images for the floating algae bloom identification model, and the distribution of NDVI and EVI for pixels labeled as algae bloom or non-algae bloom within the water area was analyzed. Figure 13a,b show histograms of NDVI and EVI for pixels labeled as algae bloom, while Figure 13c,d display histograms of NDVI and EVI for pixels labeled as non-algae bloom. Figure 14 presents box plots of NDVI and EVI values. The NDVI values of pixels labeled as algae bloom frequently range from 0.166 to 0.273, with a mean of 0.204, reflecting vegetation activity in the algae-affected areas. In contrast, the NDVI values of pixels labeled as non-algae bloom cluster around 0, with a mean of 0.086, indicating the typical characteristics of water bodies. The EVI values of pixels labeled as algae bloom mostly fall between 0.364 and 2.456, with a median of 1.311. In contrast, the EVI values of pixels labeled as non-algae bloom tend to cluster between 0.235 and 1.157, with a median of 0.534, indicating limited vegetation activity in non-algae areas. Based on these analyses, the labeling results accurately reflect the phenomenon in which areas with algae blooms exhibit higher chlorophyll-a concentrations or vegetation index values, indicating active algae proliferation.

5.2. Accuracy Evaluation for Single-Period Experiment

Table A5 presents the confusion matrices for each test set in the single-period evaluation, while Table 10 shows the overall accuracy metrics. Generally, the accuracy of the 100-random-patch test set (mIoU of 84.27%) was higher than that of the test sets constructed for specific regions (mIoU ranging from 72.18% to 76.47%). This higher accuracy is likely due to the overlap between training and test regions, allowing the model to learn specific regional characteristics. This suggests that diverse training data contribute to improved model performance. Precision and recall also demonstrated balanced performance. When separate test sets were constructed for each region, the mIoU ranged from 72.18% to 76.47%. This result is considered the most rigorous validation of the deep learning model’s performance, as the areas and patches used in the training set were entirely excluded from the test set. In the three regions—the Geum River, Miho River, and Yeongsan River—the algae IoU ranged from 47.36% to 57.61%, indicating lower sensitivity in detecting algae blooms. This is likely due to the model’s limitation in distinguishing between algae and non-algae classes when regional characteristics were not learned.
In particular, the Geum River exhibited lower recall than precision, suggesting that many actual algae bloom pixels were missed. This can be attributed to the relatively lower proportion of algae bloom pixels in the training dataset for this region. Specifically, the Miho River showed lower precision and recall values compared to other regions. With an algae IoU of 47.36%, distinguishing between algae and non-algae areas proved challenging. As shown in Figure 15, the Miho River sometimes had actual algae blooms spreading along the river edges. In reality, algae blooms tend to occur more frequently along the edges of rivers than in the center. Compared to the central parts of the river, the water at the edges tends to flow more slowly and stagnate, providing a favorable environment for algae blooms. River edges are also more likely to accumulate nutrients from land-based sources. The relatively shallow water at the edges can absorb more solar heat, resulting in warmer temperatures conducive to algae growth. Furthermore, river edges often have more vegetation and structures, slowing water flow and further promoting nutrient accumulation, which leads to algae blooms. However, in satellite images, river edge areas often contain mixed pixels of water, land, and various elements such as trees, grass, and artificial structures, which can hinder the model’s ability to detect algae blooms. These noise factors in the Miho River are considered to have contributed to the model’s decreased detection capability.
Overall, the floating algae bloom detection algorithm achieved its best performance with an mIoU of 84.27% when the test set was randomly constructed, highlighting the benefits of diverse training data. The lower performance on region-specific test sets, with mIoU values ranging from 72.18% to 76.47%, underscores the importance of including comprehensive regional data to improve the model’s generalization ability. Enhancing the dataset with more diverse and extensive data from various regions will likely further improve the model’s performance in algae bloom detection.
Figure 16 shows the qualitative accuracy obtained using 100 random patches: (a) RGB true color composite, (b) false color composite using red–green–NIR bands, (c) label image, and (d) predicted results of the floating algae bloom identification model. The green and NIR bands exhibit relatively high reflectance in algae bloom areas. Specifically, NIR is strongly absorbed by water but highly reflective in biological materials such as algae, enhancing the contrast between water and algae, making detection easier. Using these bands in the false-color composite image clearly shows the distribution of algae blooms. The labels and predictions generally match in the cases of the Geum River and Miho River. However, in the case of the Yeongsan River, some actual algae bloom pixels were not included. This discrepancy is attributed to slight spatial variations in the reflectance values of the green band, which affected the model’s predictions but not the stable label creation process. The labels were designed to be stable and cover a broad range of environmental conditions, therefore potentially overlooking minor reflectance variations in specific areas.
Figure 17 shows the qualitative accuracy of a test set for the 296 patches from the Yeongsan River. The model was trained on images excluding this area, making this a rigorous validation. While the results show a slight underestimation compared to the 100 random patch results, the model still effectively identified structures like bridges and clouds and accurately estimated the overall distribution of algae blooms.
Figure 18 illustrates images that showed significant differences in detection rates based on a test set constructed exclusively for 172 patches of the Geum River, having excluded images from this region during model training. ROI_A represents an area where the model accurately predicted actual algae bloom pixels, while ROI_B indicates an area where the model incorrectly classified actual algae bloom pixels as non-algae. The histogram distribution of the six bands used in the deep learning input dataset for each case was examined. Figure 19 shows the distribution of NDVI and EVI bands, revealing distinct differences based on detection rates. In the ROI_A region, NDVI and EVI values were relatively high, with many values above 0, positively influencing the detection of algae blooms. NDVI and EVI typically show higher values in vegetation-rich areas, including algae blooms. In contrast, in the ROI_B region, where the detection rate was lower, NDVI and EVI values were distributed closer to 0 compared to cases with higher detection rates, negatively impacting the detection of algae blooms.
These examples suggest that the detection rate decreases when the values of specific bands in the test images differ from those in the algae bloom regions of the training images. Therefore, collecting data under various environmental conditions can improve generalization and help develop a more robust algae bloom identification model that is not limited to specific regions.

5.3. Accuracy Evaluation for Multi-Period Experiment

To analyze the differences in model performance when using single-period images versus time-series images, the performance of the multi-period deep learning model was evaluated. For this analysis, 34 pairs of pre- and during-bloom images were used to generate 567 patches, with 57 patches randomly selected for the test set. Additionally, as in the single-period evaluation, separate test sets were constructed to ensure no overlap with the training data, specifically from the Geum River (130 patches), Miho River (247 patches), and Yeongsan River (190 patches). Table A6 presents the confusion matrices for each test set in the multi-period evaluation. Table 11 shows the overall accuracy metrics, with mIoU values ranging from 86.74% to 94.70%, indicating excellent performance. The Yeongsan River showed the highest mIoU, which can be attributed to the extensive coverage of algae blooms across the water area, making it easier for the model to learn and predict accurately.
The model performed exceptionally well despite excluding the training regions from the test sets. This indicates that using time-series images, therefore capturing both pre- and during-bloom conditions, allows the model to effectively learn the temporal dynamics and characteristics of algae and non-algae bloom areas. As a result, this temporal context enhances the model’s ability to generalize and accurately detect algae blooms, even in regions it has not been directly exposed to during training. This highlights the advantage of utilizing time-series data to improve the robustness and adaptability of the deep learning model. To further enhance the multi-period model, collecting diverse datasets that reflect various regional conditions and algae bloom distributions is essential. This will ensure the model can generalize well across different environments and scenarios.
Table 12 compares the accuracy evaluation results from randomly selected test sets in both single-period and multi-period scenarios. The multi-period deep learning model demonstrated a significant performance improvement, with an mIoU of 91.72% compared to 84.27% for the single-period model. Recall improved from 81.15% to 94.40%, indicating a substantial reduction in missed actual algae bloom pixels and highlighting the effectiveness of incorporating temporal context in detecting blooms. The multi-period model used datasets from three South Korean rivers, integrating their regional characteristics into the temporal analysis, which led to superior accuracy. These findings suggest that securing a diverse range of time-series images from various regions will further enhance the model’s performance, making it superior to single-period models. Therefore, developing a more extensive and varied dataset will be crucial for the future development of highly accurate algae bloom identification models.
Figure 20 shows the qualitative accuracy obtained using 57 random patches: (a) RGB true color composite of the during-bloom image, (b) false color composite using red–green–NIR bands of the during-bloom image, (c) label image, and (d) predicted results of the floating algae bloom identification model. In the cases of the Geum River and Yeongsan River, the model accurately predicted the extensive algae bloom coverage. In the case of the Miho River, the model effectively distinguished non-algae regions, such as clouds and sandbanks. Compared to the single-period results, the IoU for algae bloom improved from 71.35% to 84.56%, indicating a significant enhancement in the correct identification of algae bloom pixels. Figure 21 presents the qualitative accuracy obtained by creating a separate test set for 130 patches from the Geum River, having excluded this area during model training. With a precision of 81.87% and a recall of 90.22%, the model tended to overestimate algae blooms in areas such as bridges and sandbanks. Despite the slightly lower accuracy compared to the random patches, the IoU for algae bloom was 75.20%, showing that the model could still accurately estimate the overall distribution of algae blooms.

5.4. Time-Series Simulation Using Sentinel-2 Images

Time-series comparisons between in situ measurements and AI inference results can provide insights into how applicable the AI model is for the daily monitoring of algae blooms in an operational system. However, commercial PlanetScope images were not sufficiently available for the specific location needed for the time-series simulation. Therefore, we used publicly available Sentinel-2 images, which have a 3-day revisit frequency. Due to the spatial, temporal, and spectral differences between the two satellites, this simulation cannot fully evaluate the feasibility of daily monitoring with PlanetScope. Nevertheless, we applied the Swin Transformer model, which was trained on PlanetScope data, to Sentinel-2 images, as it is currently the only viable alternative. This also serves as a preparatory step for future PlanetScope time-series simulations. The in situ values were chlorophyll-a concentrations, an indicator of algae blooms, provided by the Ministry of Environment’s Water Environment Information System at Baekjebo (Buyeo), South Korea (Table 13) [39]. The Sentinel-2 images were downloaded from Google Earth Engine as surface reflectance products, with atmospheric correction already applied [55]. Only images with cloud coverage below 20% were selected. Sentinel-2 provides 13 multispectral bands, including R, G, B, and NIR bands at a spatial resolution of 10 m, enabling the calculation of NDVI and EVI. An input image consisting of six bands (R, G, B, NIR, NDVI, and EVI) was generated, and the floating algae bloom identification model developed in this study was applied.
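The Sentinel-2 retrieval described above can be reproduced with the Earth Engine Python API roughly as follows; the station coordinates and the date range are approximate examples, not the exact values used in the study:

```python
import ee

ee.Initialize()

aoi = ee.Geometry.Point([126.94, 36.31])      # near Baekjebo, Buyeo (approx.)
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')  # surface reflectance product
      .filterBounds(aoi)
      .filterDate('2020-01-01', '2020-12-31')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

def add_indices(img):
    """Append NDVI and EVI to the 10 m R, G, B, and NIR bands."""
    refl = img.divide(10000)                  # S2_SR reflectance scale factor
    ndvi = refl.normalizedDifference(['B8', 'B4']).rename('NDVI')
    evi = refl.expression(
        '2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1)',
        {'NIR': refl.select('B8'), 'RED': refl.select('B4'),
         'BLUE': refl.select('B2')}).rename('EVI')
    return img.select(['B4', 'B3', 'B2', 'B8']).addBands(ndvi).addBands(evi)

six_band = s2.map(add_indices)                # R, G, B, NIR, NDVI, EVI stack
```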
Figure 22 shows the location of the in situ station and the monthly time-series changes in chlorophyll-a concentration throughout 2020. From January to March, chlorophyll-a concentration remained relatively low, fluctuating between 20 and 40 mg/m3. From April to June, the concentration gradually increased, peaking at the end of June and early July, with the highest concentration reaching around 160 mg/m3, followed by a sharp decline. Another peak occurred at the end of August, reaching around 100 mg/m3, after which the concentration gradually decreased until December. This pattern suggests that algae blooms likely occurred due to phytoplankton proliferation during the summer months (June to September). Figure 23 and Figure 24 show 12 Sentinel-2 RGB images and model inference results acquired in March, June, August, October, and November 2020. In March, algae blooms were sparse, appearing primarily as small patches in certain areas. By early June, algae distribution expanded significantly, particularly in the central part of the water body and upstream, where the extent of algae increased notably. By mid-June, algae had proliferated extensively, nearly covering the entire water surface, with this time point representing the peak of algae distribution. In August, algae blooms remained active, with extensive coverage upstream. However, by October, the extent of the blooms had noticeably decreased, appearing only in small areas. By early November, algae blooms had nearly disappeared, with only small patches remaining.
In summary, algae blooms were most active between June and August, reaching their maximum extent in mid-June. Starting in October, the blooms gradually decreased, nearly disappearing by November. Algae blooms were rare from January to March and after November. Both the in situ chlorophyll-a measurements and the model-derived results reflect this time-series pattern of algae blooms. However, since the area used in this test was not included among the scenes used for model training, the precise trends in algae changes did not fully align. In the future, acquiring multiple time-series images of the same scenes for model training could further maximize the advantages of using satellite imagery and models.

6. Conclusions

In this study, we developed a deep learning-based model for identifying floating algae blooms using PlanetScope high-resolution images and the Swin Transformer architecture. We found that using six bands (R, G, B, NIR, NDVI, and EVI) as input channels produced more reliable results, prompting us to modify the Swin Transformer model to accommodate all six channels. Using a total of 1,998 patches from 105 scenes acquired between 2018 and 2023 from 14 water bodies worldwide, we conducted training, validation, and testing with no region overlap to build a robust model. Additionally, two types of performance tests were conducted using single- and multi-temporal datasets. The single-period model achieved an mIoU ranging from 72.18% to 76.47% in the performance test without region overlap. In contrast, the multi-period model demonstrated a significant improvement in performance, with an mIoU between 86.74% and 94.70%. The precision, ranging from 81.87% to 93.45%, and recall, ranging from 90.22% to 97.27%, indicate that the multi-period model had a slight overestimation tendency but did not show signs of underestimation. This suggests that our model has a low likelihood of missing floating algae blooms. Additionally, the padding technique proposed in the study resolved the border issue that arises when mosaicking inference results from individual patches, providing a seamless view of the satellite scene. However, our model has limitations in distinguishing between algae and similar objects such as aquatic vegetation, pollen, optically shallow waters, and turbid waters. Future studies should explore the use of multiple satellites with red-edge bands to distinguish between algae and aquatic vegetation. Integrating additional variables such as water depth and surface temperature could also enhance accuracy. Collecting more diverse datasets from various regions and environmental conditions will be essential for further analyzing the characteristics of algae blooms.

Author Contributions

Conceptualization, J.A. and Y.L.; methodology, J.A. and Y.L.; data curation, K.K. and Y.K.; writing—original draft preparation, J.A.; writing—review and editing, J.A., K.K., Y.K., H.K. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “Development of Application Technologies and Supporting System for Microsatellite Constellation” project through the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021M1A3A4A11032019). This research was supported by a grant (2021-MOIS37-002) of the Intelligent Technology Development Program on Disaster Response and Emergency Management funded by the Ministry of Interior and Safety (MOIS, Korea).

Data Availability Statement

The datasets are available from the authors on reasonable request.

Acknowledgments

The authors express sincere gratitude to the anonymous reviewers for dedicating their valuable time to provide constructive feedback and valuable recommendations.

Conflicts of Interest

Author Kwangjin Kim was employed by the company Nara Space Technology Incorporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Additional Details on Input Image Generation for the Multi-Period Model

Table A1. Pairing of pre- and during-bloom images for the Geum River in South Korea.

| Country | Name | Latitude | Longitude | Pair No. | Date (Pre-Bloom, During-Bloom) |
|---|---|---|---|---|---|
| South Korea | Geum River | 36.432 | 127.061 | 01 | 6 May 2020, 21 May 2020 |
| | | | | 02 | 6 May 2020, 18 Aug. 2020 |
| | | | | 03 | 6 May 2020, 25 Aug. 2020 |
| | | | | 04 | 6 May 2020, 18 Oct. 2020 |
| | | | | 05 | 6 May 2020, 30 Oct. 2020 |
| | | | | 06 | 3 Jan. 2021, 27 Feb. 2021 |
| | | | | 07 | 3 Jan. 2021, 3 Sep. 2021 |
| | | | | 08 | 3 Jan. 2021, 18 Sep. 2021 |
| | | | | 09 | 3 Jan. 2021, 3 Oct. 2021 |
| | | | | 10 | 3 Jan. 2021, 17 Oct. 2021 |
Table A2. Pairing of pre- and during-bloom images for the Miho River in South Korea.

| Country | Name | Latitude | Longitude | Pair No. | Date (Pre-Bloom, During-Bloom) |
|---|---|---|---|---|---|
| South Korea | Miho River | 36.516 | 127.322 | 11 | 3 May 2019, 11 May 2019 |
| | | | | 12 | 3 May 2019, 4 Aug. 2019 |
| | | | | 13 | 3 May 2019, 20 Aug. 2019 |
| | | | | 14 | 3 May 2019, 13 Oct. 2019 |
| | | | | 15 | 3 May 2019, 31 Oct. 2019 |
| | | | | 16 | 6 May 2020, 21 May 2020 |
| | | | | 17 | 6 May 2020, 26 Sep. 2020 |
| | | | | 18 | 6 May 2020, 30 Sep. 2020 |
| | | | | 19 | 6 May 2020, 4 Nov. 2020 |
| | | | | 20 | 6 May 2020, 28 Nov. 2020 |
| | | | | 21 | 6 May 2020, 24 Sep. 2021 |
| | | | | 22 | 6 May 2020, 27 Sep. 2021 |
| | | | | 23 | 6 May 2020, 6 Nov. 2021 |
| | | | | 24 | 6 May 2020, 26 Nov. 2021 |
Table A3. Pairing of pre- and during-bloom images for the Yeongsan River in South Korea.

| Country | Name | Latitude | Longitude | Pair No. | Date (Pre-Bloom, During-Bloom) |
|---|---|---|---|---|---|
| South Korea | Yeongsan River | 34.930 | 126.540 | 25 | 29 Jan. 2019, 12 Feb. 2019 |
| | | | | 26 | 29 Jan. 2019, 16 Mar. 2019 |
| | | | | 27 | 29 Jan. 2019, 11 May 2019 |
| | | | | 28 | 15 Apr. 2022, 14 May 2022 |
| | | | | 29 | 15 Apr. 2022, 22 Jun. 2022 |
| | | | | 30 | 15 Apr. 2022, 10 Jul. 2022 |
| | | | | 31 | 15 Apr. 2022, 26 Aug. 2022 |
| | | | | 32 | 15 Apr. 2022, 27 Sep. 2022 |
| | | | | 33 | 31 Jan. 2023, 3 Mar. 2023 |
| | | | | 34 | 31 Jan. 2023, 2 Apr. 2023 |
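For readers reproducing the multi-period experiments, the pairings in Tables A1–A3 reduce to (pre-bloom, during-bloom) date tuples per site. The sketch below encodes the first five Geum River pairs and shows one plausible way to combine a pair for change-aware inference, namely channel-wise concatenation of the two six-band patches; this combination strategy and all names are assumptions for illustration, not the authors' pipeline (Figure 6 describes the actual procedure).

```python
import numpy as np

# (pre-bloom, during-bloom) acquisition dates for the Geum River
# (pairs 01-05 of Table A1), encoded as ISO date strings.
GEUM_PAIRS = [
    ("2020-05-06", "2020-05-21"),
    ("2020-05-06", "2020-08-18"),
    ("2020-05-06", "2020-08-25"),
    ("2020-05-06", "2020-10-18"),
    ("2020-05-06", "2020-10-30"),
]

def combine_pair(pre_patch: np.ndarray, during_patch: np.ndarray) -> np.ndarray:
    """Concatenate a pre-bloom and a during-bloom six-band patch along the
    channel axis (an assumed combination strategy for illustration only)."""
    assert pre_patch.shape == during_patch.shape  # both (6, H, W)
    return np.concatenate([pre_patch, during_patch], axis=0)  # (12, H, W)
```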

Appendix B. Additional Details on In Situ Chlorophyll-a Measurements

Table A4. Statistics on in situ temperature and chlorophyll-a measurements from the Geum River, Miho River, Nakdong River, and Yeongsan River in South Korea from 2018 to 2023 [39].

| ID | River (Point) | Latitude | Longitude | Year | Temp. Min (°C) | Temp. Max (°C) | Temp. Mean (°C) | Chl-a Min (mg/m³) | Chl-a Max (mg/m³) | Chl-a Mean (mg/m³) |
|---|---|---|---|---|---|---|---|---|---|---|
| 3012A32 | Geum River (Gongjubo) | 36.465 | 127.100 | 2018 | 2.5 | 31.4 | 17.0 | 2.3 | 209.9 | 57.3 |
| | | | | 2019 | 1.4 | 28.7 | 16.4 | 5.1 | 265.1 | 69.4 |
| | | | | 2020 | 2.9 | 26.6 | 14.7 | 2.6 | 125.4 | 24.1 |
| | | | | 2021 | 1.2 | 29.9 | 16.3 | 2.0 | 164.4 | 46.1 |
| | | | | 2022 | 0.8 | 29.3 | 15.8 | 1.8 | 113.0 | 35.6 |
| | | | | 2023 | 1.7 | 30.7 | 16.1 | 1.5 | 197.9 | 42.5 |
| 3012A07 | Miho River (Sejongbo, Yeongi) | 36.475 | 127.264 | 2018 | 0.5 | 30.2 | 15.4 | 2.5 | 160.0 | 35.9 |
| | | | | 2019 | 2.6 | 26.7 | 15.2 | 3.9 | 155.6 | 38.6 |
| | | | | 2020 | 3.9 | 26.8 | 14.5 | 3.1 | 84.8 | 17.8 |
| | | | | 2021 | 1.1 | 28.5 | 15.6 | 2.3 | 116.9 | 28.1 |
| | | | | 2022 | 3.1 | 29.9 | 15.6 | 2.0 | 85.7 | 27.9 |
| | | | | 2023 | 1.6 | 27.7 | 15.2 | 2.7 | 57.3 | 24.3 |
| 2011A55 | Nakdong River (Gangjeong Goryeong, Dasa) | 35.843 | 128.457 | 2018 | 2.8 | 28.9 | 16.3 | 4.5 | 47.2 | 18.4 |
| | | | | 2019 | 2.8 | 29.0 | 16.3 | 2.4 | 41.9 | 16.4 |
| | | | | 2020 | 3.9 | 27.9 | 15.6 | 2.0 | 58.8 | 18.6 |
| | | | | 2021 | 2.8 | 30.1 | 17.7 | 2.6 | 49.0 | 19.5 |
| | | | | 2022 | 2.6 | 30.9 | 16.9 | 2.2 | 65.0 | 17.4 |
| | | | | 2023 | 2.8 | 31.4 | 16.7 | 2.3 | 73.0 | 14.2 |
| 5004A35 | Yeongsan River (Juksanbo, Juksan) | 34.977 | 126.633 | 2018 | 2.5 | 32.1 | 17.2 | 4.5 | 174.2 | 56.2 |
| | | | | 2019 | 3.9 | 31.1 | 16.1 | 6.8 | 205.0 | 60.9 |
| | | | | 2020 | 4.8 | 30.8 | 16.0 | 4.3 | 238.5 | 64.1 |
| | | | | 2021 | 3.6 | 30.9 | 17.8 | 15.6 | 174.3 | 71.3 |
| | | | | 2022 | 4.0 | 30.9 | 18.1 | 7.3 | 108.5 | 41.9 |
| | | | | 2023 | 3.2 | 33.7 | 17.6 | 2.0 | 123.2 | 28.6 |

Appendix C. Additional Results for Single-Period and Multi-Period Models

Table A5. Confusion matrix supplementing Table 10 for the single-period deep learning model. Values are pixel counts.

| Test Set | Prediction | (Actual) Algae Bloom | (Actual) Non-Algae Bloom |
|---|---|---|---|
| Randomly selected 100 patches (N = 26,214,400) | Algae bloom | 1,714,600 | 290,282 |
| | Non-algae bloom | 398,187 | 23,811,331 |
| Geum River (172 patches, N = 45,088,768) | Algae bloom | 1,296,032 | 188,543 |
| | Non-algae bloom | 858,183 | 42,746,010 |
| Miho River (306 patches, N = 80,216,064) | Algae bloom | 2,104,250 | 1,177,266 |
| | Non-algae bloom | 1,161,310 | 75,773,238 |
| Yeongsan River (296 patches, N = 77,594,624) | Algae bloom | 6,796,896 | 405,974 |
| | Non-algae bloom | 4,595,979 | 65,795,775 |
Table A6. Confusion matrix supplementing Table 11 for the multi-period deep learning model. Values are pixel counts.

| Test Set | Prediction | (Actual) Algae Bloom | (Actual) Non-Algae Bloom |
|---|---|---|---|
| Randomly selected 57 patches (N = 14,942,208) | Algae bloom | 863,782 | 106,433 |
| | Non-algae bloom | 51,271 | 13,920,722 |
| Geum River (130 patches, N = 34,078,720) | Algae bloom | 1,679,115 | 371,836 |
| | Non-algae bloom | 181,996 | 31,845,773 |
| Miho River (247 patches, N = 64,749,568) | Algae bloom | 2,617,991 | 559,589 |
| | Non-algae bloom | 263,847 | 61,308,141 |
| Yeongsan River (190 patches, N = 49,807,360) | Algae bloom | 7,198,554 | 504,610 |
| | Non-algae bloom | 201,723 | 41,902,473 |
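All metrics reported in Tables 10 and 11 follow directly from these pixel counts. As a consistency check, the short script below recomputes accuracy, precision, recall, F1, class-wise IoU, and mIoU from the 57-patch multi-period confusion matrix in Table A6; it reproduces the corresponding row of Table 11. Only the function and variable names are ours.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Standard binary-segmentation metrics from confusion-matrix pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_pos = tp / (tp + fp + fn)   # IoU for the algae-bloom class
    iou_neg = tn / (tn + fp + fn)   # IoU for the non-algae-bloom class
    miou = (iou_pos + iou_neg) / 2
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1, iou_pos, iou_neg, miou

# Multi-period model, 57 randomly selected patches (Table A6).
print(segmentation_metrics(tp=863_782, fp=106_433, fn=51_271, tn=13_920_722))
# -> approx. (0.9894, 0.8903, 0.9440, 0.9163, 0.8456, 0.9888, 0.9172),
#    matching the first row of Table 11.
```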

References

  1. Dai, Y.; Yang, S.; Zhao, D.; Hu, C.; Xu, W.; Anderson, D.M.; Li, Y.; Song, X.; Boyce, D.G.; Gibson, L.; et al. Coastal phytoplankton blooms expand and intensify in the 21st century. Nature 2023, 615, 280–284.
  2. Kudela, R.M.; Palacios, S.L.; Austerberry, D.C.; Accorsi, E.K.; Guild, L.S.; Torres-Perez, J. Application of hyperspectral remote sensing to cyanobacterial blooms in inland waters. Remote Sens. Environ. 2015, 167, 196–205.
  3. Heisler, J.; Glibert, P.M.; Burkholder, J.M.; Anderson, D.M.; Cochlan, W.; Dennison, W.C.; Dortch, Q.; Gobler, C.J.; Heil, C.A.; Humphries, E.; et al. Eutrophication and harmful algal blooms: A scientific consensus. Harmful Algae 2008, 8, 3–13.
  4. Igwaran, A.; Kayode, A.J.; Moloantoa, K.M.; Khetsha, Z.P.; Unuofin, J.O. Cyanobacteria harmful algae blooms: Causes, impacts, and risk management. Water Air Soil Pollut. 2024, 235, 71.
  5. Schwark, M.; Martínez Yerena, J.A.; Röhrborn, K.; Hrouzek, P.; Divoká, P.; Štenclová, L.; Delawská, K.; Enke, H.; Vorreiter, C.; Wiley, F.; et al. More than just an eagle killer: The freshwater cyanobacterium Aetokthonos hydrillicola produces highly toxic dolastatin derivatives. Proc. Natl. Acad. Sci. USA 2023, 120, e2219230120.
  6. Zhang, J.; Shi, K.; Paerl, H.W.; Rühland, K.M.; Yuan, Y.; Wang, R.; Chen, J.; Ge, M.; Zheng, L.; Zhang, Z.; et al. Ancient DNA reveals potentially toxic cyanobacteria increasing with climate change. Water Res. 2023, 229, 119435.
  7. Oyama, Y.; Fukushima, T.; Matsushita, B.; Matsuzaki, H.; Kamiya, K.; Kobinata, H. Monitoring levels of cyanobacterial blooms using the visual cyanobacteria index (VCI) and floating algae index (FAI). Int. J. Appl. Earth Obs. Geoinf. 2015, 38, 335–348.
  8. Hu, C. A novel ocean color index to detect floating algae in the global oceans. Remote Sens. Environ. 2009, 113, 2118–2129.
  9. Siddiqui, M.D.; Zaidi, A.Z.; Abdullah, M. Performance evaluation of newly proposed seaweed enhancing index (SEI). Remote Sens. 2019, 11, 1434.
  10. Son, Y.B.; Min, J.E.; Ryu, J.H. Detecting massive green algae (Ulva prolifera) blooms in the Yellow Sea and East China Sea using Geostationary Ocean Color Imager (GOCI) data. Ocean Sci. J. 2012, 47, 359–375.
  11. Ma, Y.; Wong, K.; Tsou, J.Y.; Zhang, Y. Investigating spatial distribution of green-tide in the Yellow Sea in 2021 using combined optical and SAR images. J. Mar. Sci. Eng. 2022, 10, 127.
  12. Bielski, A.; Toś, C. Remote sensing of the water quality parameters for a shallow dam reservoir. Appl. Sci. 2022, 12, 6734.
  13. Wu, L.; Wang, L.; Min, L.; Hou, W.; Guo, Z.; Zhao, J.; Li, N. Discrimination of algal-bloom using spaceborne SAR observations of Great Lakes in China. Remote Sens. 2018, 10, 767.
  14. Wang, S.; Liu, L.; Qu, L.; Yu, C.; Sun, Y.; Gao, F.; Dong, J. Accurate Ulva prolifera regions extraction of UAV images with superpixel and CNNs for ocean environment monitoring. Neurocomputing 2019, 348, 158–168.
  15. Cui, T.; Li, F.; Wei, Y.; Yang, X.; Xiao, Y.; Chen, X.; Liu, R.; Ma, Y.; Zhang, J. Super-resolution optical mapping of floating macroalgae from geostationary orbit. Appl. Opt. 2020, 59, C70–C77.
  16. Liang, X.J.; Qin, P.; Xiao, Y.F.; Kim, K.Y.; Liu, R.J.; Chen, X.Y.; Wang, Q.B. Automatic remote sensing detection of floating macroalgae in the Yellow and East China Seas using extreme learning machine. J. Coast. Res. 2019, 90, 272–281.
  17. Qi, L.; Wang, M.; Hu, C.; Holt, B. On the capacity of Sentinel-1 synthetic aperture radar in detecting floating macroalgae and other floating matters. Remote Sens. Environ. 2022, 280, 113188.
  18. Guo, Y.; Gao, L.; Li, X. A deep learning model for green algae detection on SAR images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4210914.
  19. Zhu, S.; Wu, Y.; Ma, X. Deep learning-based algal bloom identification method from remote sensing images—Take China’s Chaohu Lake as an example. Sustainability 2023, 15, 4545.
  20. Adegun, A.A.; Viriri, S.; Tapamo, J.R. Review of deep learning methods for remote sensing satellite images classification: Experimental survey and comparative analysis. J. Big Data 2023, 10, 93.
  21. Zhao, Y.; Zhang, X.; Feng, W.; Xu, J. Deep learning classification by ResNet-18 based on the real spectral dataset from multispectral remote sensing images. Remote Sens. 2022, 14, 4883.
  22. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
  23. Wang, L.; Zhang, J.; Liu, P.; Choo, K.R.; Huang, F. Spectral–spatial multi-feature-based deep learning for hyperspectral remote sensing image classification. Soft Comput. 2017, 21, 213–221.
  24. Asatryan, V.; Stepanyan, L.; Hovsepyan, A.; Khachikyan, T.; Mamyan, A.; Hambaryan, L. The dynamics of phytoplankton seasonal development and its horizontal distribution in Lake Sevan (Armenia). Environ. Monit. Assess. 2022, 194, 757.
  25. Shi, X.; Gu, L.; Jiang, T.; Zheng, X.; Dong, W.; Tao, Z. Retrieval of chlorophyll-a concentrations using Sentinel-2 MSI images in Lake Chagan based on assessments with machine learning models. Remote Sens. 2022, 14, 4924.
  26. Liu, M.; Ling, H.; Wu, D.; Su, X.; Cao, Z. Sentinel-2 and Landsat-8 observations for harmful algae blooms in a small eutrophic lake. Remote Sens. 2021, 13, 4479.
  27. Harmful Algal Bloom Monitoring System. Available online: https://coastalscience.noaa.gov/science-areas/habs/hab-monitoring-system (accessed on 2 July 2024).
  28. Issue 42: Algal Blooms. Available online: https://medium.com/@planetsnapshots/issue-42-algal-blooms-6d7385b53a50 (accessed on 2 July 2024).
  29. Hong, D.G.; Jeong, K.S.; Kim, D.K.; Joo, G.J. Long-term ecological research in the Nakdong River: Application of ecological informatics to harmful algal blooms. In Ecological Informatics: Data Management and Knowledge Discovery; Springer: Cham, Switzerland, 2018; pp. 435–453.
  30. Google Maps. Available online: https://www.google.com/maps (accessed on 2 July 2024).
  31. Bing Maps. Available online: https://www.bing.com/maps (accessed on 2 July 2024).
  32. TN on Quality Assessment for PlanetScope (Dove). Available online: https://earth.esa.int/eogateway/documents/20142/37627/Technical+Note+on+Quality+Assessment+for+PlanetScope+%28DOVE%29.pdf/518ec6d2-d0bd-87ae-5a59-39e9dd7cc25f (accessed on 2 July 2024).
  33. Planet Imagery Product Specifications. Available online: https://assets.planet.com/docs/Planet_Combined_Imagery_Product_Specs_letter_screen.pdf (accessed on 2 July 2024).
  34. Ghuffar, S. DEM generation from multi satellite PlanetScope images. Remote Sens. 2018, 10, 1462.
  35. Vanhellemont, Q. Daily metre-scale mapping of water turbidity using CubeSat imagery. Opt. Express 2019, 27, A1372–A1399.
  36. Niroumand-Jadidi, M.; Bovolo, F.; Bruzzone, L.; Gege, P. Physics-based bathymetry and water quality retrieval using PlanetScope imagery: Impacts of 2020 COVID-19 lockdown and 2019 extreme flood in the Venice Lagoon. Remote Sens. 2020, 12, 2381.
  37. Wicaksono, P.; Lazuardi, W. Assessment of PlanetScope images for benthic habitat and seagrass species mapping in a complex optically shallow water environment. Int. J. Remote Sens. 2018, 39, 5739–5765.
  38. Environmental Spatial Information Service. Available online: https://egis.me.go.kr/intro/land.do (accessed on 2 July 2024).
  39. Water Environment Information System. Available online: https://water.nier.go.kr/web (accessed on 2 July 2024).
  40. Dall’Olmo, G.; Gitelson, A.A. Effect of bio-optical parameter variability and uncertainties in reflectance measurements on the remote estimation of chlorophyll-a concentration in turbid productive waters: Modeling results. Appl. Opt. 2006, 45, 3577–3592.
  41. Moses, W.J.; Gitelson, A.A.; Berdnikov, S.; Povazhnyy, V. Satellite estimation of chlorophyll-a concentration using the red and NIR bands of MERIS—The Azov Sea case study. IEEE Geosci. Remote Sens. Lett. 2009, 6, 845–849.
  42. Mishra, S.; Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters. Remote Sens. Environ. 2012, 117, 394–406.
  43. Cui, T.W.; Zhang, J.; Sun, L.E.; Jia, Y.J.; Zhao, W.; Wang, Z.L.; Meng, J.M. Satellite monitoring of massive green macroalgae bloom (GMB): Imaging ability comparison of multi-source data and drifting velocity estimation. Int. J. Remote Sens. 2012, 33, 5513–5527.
  44. Hu, L.; Hu, C.; He, M.-X. Remote estimation of biomass of Ulva prolifera macroalgae in the Yellow Sea. Remote Sens. Environ. 2017, 192, 217–227.
  45. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
  46. Zagoruyko, S.; Komodakis, N. Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
  47. Salman, S.; Liu, X. Overfitting mechanism and avoidance in deep neural networks. arXiv 2019, arXiv:1901.06566.
  48. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  49. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  51. Zhou, B.; Zhao, H.; Puig, X.; Xiao, T.; Fidler, S.; Barriuso, A.; Torralba, A. Semantic understanding of scenes through the ADE20K dataset. Int. J. Comput. Vis. 2019, 127, 302–321.
  52. Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
  53. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
  54. Sulistiyo, M.D.; Kawanishi, Y.; Deguchi, D.; Hirayama, T.; Ide, I.; Zheng, J.Y.; Murase, H. Attribute-aware semantic segmentation of road scenes for understanding pedestrian orientations. In Proceedings of the 21st IEEE International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018.
  55. Google Earth Engine. Available online: https://earthengine.google.com (accessed on 2 July 2024).
Figure 3. Flow chart of the proposed method for identification of floating algae blooms in this study.
Figure 4. Surface remote sensing reflectance spectrum for waters with different concentrations of chlorophyll-a [12].
Figure 5. A dataset example from PlanetScope SuperDove for the Yeongsan River on 15 April 2022: (a) true color (R–G–B), (b) false color (RE–G–NIR), and (c) label.
Figure 6. The procedure for creating the input dataset for the single-period and multi-period models.
Figure 7. Examples of pre-bloom (1 January 2023) and during-bloom (2 April 2023) images for the Yeongsan River: (a) true color (R–G–B), (b) false color (R–G–NIR), and (c) label.
Figure 8. An overview of the Swin Transformer: (a) hierarchical feature maps for reducing computational complexity, (b) the shifted window approach used when calculating self-attention, (c) two successive Swin Transformer blocks, and (d) the core architecture of the Swin Transformer [22].
Figure 9. Confusion matrix and IoU/mIoU calculation for image segmentation.
Figure 10. Conceptual diagram illustrating the inference result generation using the padding technique.
Figure 11. Example of inference results before and after applying padding for the Yeongsan River.
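As a rough illustration of the padding idea in Figures 10 and 11, the sketch below runs inference on windows padded with extra image context and keeps only each window's central core when mosaicking the full scene, which suppresses seams at patch borders. The window size, pad width, and the `model` callable are assumptions for illustration; the authors' exact scheme is described in the main text.

```python
import numpy as np

def mosaic_with_padding(image, model, patch=512, pad=64):
    """Tile `image` (C, H, W) into patch x patch windows, infer on windows
    enlarged by `pad` pixels of context on every side, and keep only each
    window's central core when assembling the full-scene prediction.
    `model` is assumed to map a (C, h, w) window to an (h, w) label array."""
    c, h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # Reflect-pad the scene so border windows also receive context.
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            win = padded[:, y : y + patch + 2 * pad, x : x + patch + 2 * pad]
            pred = model(win)
            # Discard the padded margin; keep only the central core.
            core = pred[pad : pad + patch, pad : pad + patch]
            out[y : y + patch, x : x + patch] = core[: h - y, : w - x]
    return out
```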
Figure 12. Comparison of chlorophyll-a concentrations between pixels labeled as algae bloom and non-algae bloom. Box min represents the minimum value of the box (25th percentile), and box max represents the maximum value of the box (75th percentile).
Figure 13. Histograms of NDVI and EVI values for pixels labeled as algae bloom or non-algae bloom. Approximately 100 patches were sampled from the 1998 patches used as input images for the floating algae bloom identification model.
Figure 14. Box plots of NDVI and EVI values for pixels labeled as algae bloom or non-algae bloom. Approximately 100 patches were sampled from the 1998 patches used as input images for the floating algae bloom identification model. Box min represents the minimum value of the box (25th percentile), and box max represents the maximum value of the box (75th percentile).
Figure 15. Example of low prediction accuracy at the river edges of the Miho River.
Figure 16. Qualitative results for 100 randomly selected patches: (a) RGB true color composite, (b) false color composite using red–green–NIR bands, (c) label image, and (d) predicted results of the floating algae bloom identification model.
Figure 17. Qualitative results for the test set of 296 patches from the Yeongsan River: (a) RGB true color composite, (b) false color composite using red–green–NIR bands, (c) label image, and (d) predicted results of the floating algae bloom identification model.
Figure 18. Example with significant differences in detection rates in the Geum River (ROI_A: well-predicted region; ROI_B: poorly predicted region).
Figure 19. NDVI and EVI histogram distributions in ROI_A and ROI_B (ROI_A: well-predicted region; ROI_B: poorly predicted region).
Figure 20. Qualitative results for 57 randomly selected patches: (a) RGB true color composite of the during-bloom image, (b) false color composite using red–green–NIR bands of the during-bloom image, (c) label image, and (d) predicted results of the floating algae bloom identification model.
Figure 21. Qualitative results for the test set of 130 patches from the Geum River: (a) RGB true color composite of the during-bloom image, (b) false color composite using red–green–NIR bands of the during-bloom image, (c) label image, and (d) predicted results of the floating algae bloom identification model.
Figure 22. The in situ station at Baekjebo (Buyeo) in South Korea: (a) the location, and (b) time-series changes in chlorophyll-a measurements in 2020.
Figure 23. A total of 12 Sentinel-2 RGB time-series images obtained in March, June, August, October, and November of 2020.
Figure 24. Inference results using a total of 12 Sentinel-2 images acquired in March, June, August, October, and November 2020.
Table 1. Information on the water bodies and the number of satellite images used in this study.

| Country | Water Body Name | Latitude | Longitude | Max Length | Max Width | Surface Area | Average Depth | No. of Scenes |
|---|---|---|---|---|---|---|---|---|
| Armenia | Lake Sevan | 40.533 | 44.990 | 74 km | 32 km | 1242 km² | 26.8 m | 1 |
| China | Lake Chagan | 45.250 | 124.350 | 37 km | 17 km | 307 km² | 4.0 m | 16 |
| China | Lake Xingyun | 24.335 | 102.784 | 11 km | 6 km | 35 km² | 5.3 m | 10 |
| Poland | Lake Turawskie | 50.720 | 18.129 | 7.5 km | 4 km | 24 km² | 5.0 m | 4 |
| South Korea | Geum River | 36.432 | 127.061 | 398 km | 1.7 km | 9912 km² | 5.3 m | 19 |
| South Korea | Miho River | 36.516 | 127.322 | 89 km | 0.6 km | 1861 km² | 0.3–0.7 m | 21 |
| South Korea | Nakdong River | 35.808 | 128.437 | 510 km | 2.3 km | 23,384 km² | 7.4 m | 2 |
| South Korea | Yeongsan River | 34.930 | 126.540 | 137 km | 1.3 km | 3468 km² | 5.8 m | 18 |
| USA | Big Sarasota Pass | 27.295 | −82.564 | - | - | - | - | 2 |
| USA | Lake Clear | 39.068 | −122.84 | 31 km | 13 km | 180 km² | 8.2 m | 2 |
| USA | Lake Okeechobee | 27.040 | −80.748 | 58 km | 47 km | 1900 km² | 2.7 m | 3 |
| USA | Saginaw Bay | 43.779 | −83.520 | - | - | 2960 km² | - | 2 |
| USA/Canada | Lake St. Clair | 42.412 | −82.684 | 42 km | 39 km | 1114 km² | 3.4 m | 3 |
| Venezuela | Lake Valencia | 10.175 | −67.790 | 30 km | 20 km | 350 km² | 18.0 m | 2 |
Table 2. The specifications of PlanetScope satellites [33].

| Characteristic | Dove-Classic (PS2) | Dove-R (PS2.SD) | SuperDove (PSB.SD) |
|---|---|---|---|
| Sensor Description | Four-band frame imager; split-frame visible + NIR filter | Four-band frame imager; butcher-block filter providing blue, green, red, and NIR stripes | Eight-band frame imager; butcher-block filter providing blue, green, red, red-edge, and NIR stripes |
| Spectral Bands | Blue: 455–515 nm; Green: 500–590 nm; Red: 590–670 nm; NIR: 780–860 nm | Blue: 464–517 nm; Green: 547–585 nm; Red: 650–682 nm; NIR: 846–888 nm | Coastal Blue: 431–452 nm; Blue: 465–515 nm; Green I: 513–549 nm; Green II: 547–583 nm; Yellow: 600–620 nm; Red: 650–680 nm; Red-Edge: 697–713 nm; NIR: 845–885 nm |
| Ground Sample Distance (nadir) | 3.0–4.1 m (approximate, altitude-dependent) | 3.7–4.2 m (approximate, altitude-dependent) | 3.7–4.2 m (approximate, altitude-dependent) |
| Frame Size | 24 × 8 km | 24 × 16 km | 32.5 × 19.6 km |
| Revisit Time | Daily at nadir | Daily at nadir | Daily at nadir |
| Availability Date | July 2014–April 2022 | March 2019–April 2022 | March 2020–present |
Table 3. The number of PlanetScope scenes obtained per year and season (DJF: December–January–February, MAM: March–April–May, JJA: June–July–August, SON: September–October–November).

| Year | DJF | MAM | JJA | SON |
|---|---|---|---|---|
| 2018 | - | - | - | 2 |
| 2019 | 3 | 4 | 4 | 4 |
| 2020 | 2 | 5 | 21 | 14 |
| 2021 | 4 | 1 | 9 | 10 |
| 2022 | 1 | 5 | 8 | 3 |
| 2023 | 3 | 2 | - | - |
| Sum | 105 scenes | | | |
Table 4. The ratio of algae bloom pixels in the 1998 patches used in this study.

| Algae Bloom Ratio (%) | 0–10 | 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70 | 70–80 | 80–90 | 90–100 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of patches | 940 | 260 | 164 | 90 | 95 | 60 | 57 | 59 | 60 | 213 | 1998 |
| Ratio (%) | 47.05 | 13.01 | 8.21 | 4.51 | 4.76 | 3.00 | 2.85 | 2.95 | 3.00 | 10.66 | 100 |
Table 5. Composition of the test for single-period datasets (numbers of patches).

| Experiment | Training | Validation | Test | Total |
|---|---|---|---|---|
| 1 | 1654 | 172 | 172 (Geum River) | 1998 |
| 2 | 1386 | 306 | 306 (Miho River) | 1998 |
| 3 | 1406 | 296 | 296 (Yeongsan River) | 1998 |
Table 6. Composition of the test for multi-period datasets (numbers of patches).

| Experiment | Training | Validation | Test | Total |
|---|---|---|---|---|
| 1 | 350 | 87 | 130 (Geum River) | 567 |
| 2 | 256 | 64 | 247 (Miho River) | 567 |
| 3 | 302 | 75 | 190 (Yeongsan River) | 567 |
Table 7. A series of Swin Transformer models [22].

| Model | Channel Number in the Hidden Layers of the First Stage | Layer Numbers |
|---|---|---|
| Swin-T | 96 | {2, 2, 6, 2} |
| Swin-S | 96 | {2, 2, 18, 2} |
| Swin-B | 128 | {2, 2, 18, 2} |
| Swin-L | 192 | {2, 2, 18, 2} |
Table 8. Detailed architecture specifications of the Swin-T model. “Concat n × n” indicates a concatenation of n × n neighboring features in a patch; this operation downsamples the feature map by a rate of n. A linear layer with an output dimension of 96 is denoted by “96-d”. “Window size 7 × 7” indicates a multi-head self-attention module with a window size of 7 × 7 [22].

| Stage | Downsampling Rate (Output Size) | Swin-T |
|---|---|---|
| Stage 1 | 4× (56 × 56) | Concat 4 × 4, 96-d, LN; [window size 7 × 7, dim 96, head 3] × 2 |
| Stage 2 | 8× (28 × 28) | Concat 2 × 2, 192-d, LN; [window size 7 × 7, dim 192, head 6] × 2 |
| Stage 3 | 16× (14 × 14) | Concat 2 × 2, 384-d, LN; [window size 7 × 7, dim 384, head 12] × 6 |
| Stage 4 | 32× (7 × 7) | Concat 2 × 2, 768-d, LN; [window size 7 × 7, dim 768, head 24] × 2 |
Table 9. Structure and hyperparameters of the Swin Transformer adopted in the floating algae bloom identification model.

| Hyperparameter (Optimized) | Description | Value |
|---|---|---|
| Input patch size | The dimensions (width and height) of the input patches | (512, 512) |
| Input channels | The number of channels in the input data | 6 |
| Batch size | The number of patches processed together in one forward and backward pass through the model during training | 4 |
| Window size | The size of the window used in each Transformer block for self-attention | 7 |
| Activation function | A mathematical function applied to the output of each layer in the model | Gaussian Error Linear Unit (GELU) |
| Optimizer | An algorithm used to adjust the parameters (weights and biases) of the model during training to minimize the loss function | AdamW |
| Dropout ratio | The proportion of neurons in the model that are randomly dropped out, or ignored, during training | 0.1 |
| Learning rate | A hyperparameter that determines the size of the steps taken during parameter updates by the optimizer | 0.0001 |
| Loss function | A function that measures the difference between the predicted output of the model and the actual target output during training | Cross-Entropy Loss |
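To make the settings in Table 9 concrete, here is a minimal PyTorch sketch of the listed optimization setup (AdamW at a learning rate of 1e-4, cross-entropy loss, and batches of four 6-channel 512 × 512 patches). A 1 × 1 convolution stands in for the Swin-T segmentation network so that the sketch stays self-contained and runnable; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-in for the Swin-T segmentation network: six input channels
# (R, G, B, NIR, NDVI, EVI) and two output classes, as in Table 9.
# In the real model, the 0.1 dropout ratio applies inside the Swin blocks.
model = nn.Conv2d(in_channels=6, out_channels=2, kernel_size=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # Table 9: AdamW, lr 0.0001
criterion = nn.CrossEntropyLoss()                           # Table 9: cross-entropy loss

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One forward/backward pass; images (4, 6, 512, 512), labels (4, 512, 512)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random data shaped like one training batch.
loss = train_step(torch.randn(4, 6, 512, 512), torch.randint(0, 2, (4, 512, 512)))
```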
Table 10. Single-period accuracy evaluation results. All values are percentages.

| Composition of Test Set | Accuracy | Precision | Recall | F1-Score | IoU (Algae Bloom) | IoU (Non-Algae Bloom) | mIoU |
|---|---|---|---|---|---|---|---|
| Randomly selected 100 patches | 97.37 | 85.52 | 81.15 | 83.28 | 71.35 | 97.19 | 84.27 |
| Geum River (172 patches) | 97.68 | 87.30 | 60.16 | 71.23 | 55.32 | 97.61 | 76.47 |
| Miho River (306 patches) | 97.08 | 64.12 | 64.44 | 64.28 | 47.36 | 97.01 | 72.18 |
| Yeongsan River (296 patches) | 92.87 | 91.38 | 85.92 | 88.56 | 57.61 | 92.93 | 75.27 |
Table 11. Multi-period accuracy evaluation results. All values are percentages.

| Composition of Test Set | Accuracy | Precision | Recall | F1-Score | IoU (Algae Bloom) | IoU (Non-Algae Bloom) | mIoU |
|---|---|---|---|---|---|---|---|
| Randomly selected 57 patches | 98.94 | 89.03 | 94.40 | 91.63 | 84.56 | 98.88 | 91.72 |
| Geum River (130 patches) | 98.37 | 81.87 | 90.22 | 85.84 | 75.20 | 98.29 | 86.74 |
| Miho River (247 patches) | 98.73 | 82.39 | 90.84 | 86.41 | 76.07 | 98.67 | 87.37 |
| Yeongsan River (190 patches) | 98.58 | 93.45 | 97.27 | 95.32 | 91.06 | 98.34 | 94.70 |
Table 12. Comparison of accuracy evaluation results from randomly selected test sets in single-period and multi-period scenarios. All values are percentages.

| Composition of Test Set | Accuracy | Precision | Recall | F1-Score | IoU (Algae Bloom) | IoU (Non-Algae Bloom) | mIoU |
|---|---|---|---|---|---|---|---|
| Single-period | 97.37 | 85.52 | 81.15 | 83.28 | 71.35 | 97.19 | 84.27 |
| Multi-period | 98.94 | 89.03 | 94.40 | 91.63 | 84.56 | 98.88 | 91.72 |
Table 13. Information on the in situ station measuring chlorophyll-a concentration [39].

| ID | River | Point | Address | Latitude | Longitude |
|---|---|---|---|---|---|
| 3012A42 | Geum River | Baekjebo (Buyeo) | Jawang-ri, Buyeo-eup, Buyeo-gun, Chungcheongnam-do, South Korea (500 m upstream of Baekjebo) | 36.322 | 126.944 |