Article

FE-YOLO: An Efficient Deep Learning Model Based on Feature-Enhanced YOLOv7 for Microalgae Identification and Detection

1 China Waterborne Transport Research Institute, Beijing 100088, China
2 School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, China
* Author to whom correspondence should be addressed.
Biomimetics 2025, 10(1), 62; https://doi.org/10.3390/biomimetics10010062
Submission received: 13 October 2024 / Revised: 20 December 2024 / Accepted: 14 January 2025 / Published: 16 January 2025
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)

Abstract

The identification and detection of microalgae are essential for the development and utilization of microalgae resources, yet traditional identification and detection methods have many limitations. Herein, a Feature-Enhanced YOLOv7 (FE-YOLO) model for microalgae cell identification and detection is proposed. Firstly, the feature extraction capability was enhanced by integrating the CAGS (Coordinate Attention Group Shuffle Convolution) attention module into the Neck section. Secondly, the SIoU (SCYLLA-IoU) algorithm was employed to replace the CIoU (Complete IoU) loss function in the original model, addressing the issue of unstable convergence. Finally, we captured and constructed a microalgae dataset containing 6300 images of seven microalgae species, addressing the lack of microalgae cell datasets. Compared to the YOLOv7 model, the proposed method achieves increases of 9.6%, 1.9%, 9.7%, and 6.9% in average Precision, Recall, [email protected], and [email protected], respectively. In addition, the average detection time for a single image is 0.0455 s, a 9.2% improvement.

1. Introduction

Microalgae, with over 20,000 known species, are widely distributed on Earth and applied in fields such as biomedicine, new energy, food, and healthcare. Before their development and utilization, accurate species identification is essential. The detection and identification of microalgae are crucial for unlocking their full potential in biofuel production, pharmaceutical development, and environmental protection. Moreover, microalgae have inspired innovations in biomimicry: their efficient photosynthesis mechanisms serve as models for advanced solar energy systems, while their unique structural properties have led to biomimetic materials used in sensors, coatings, and drug delivery systems [1]. For example, F. Zhang et al. [2] explored the use of microalgae in biohybrid microrobots, drawing inspiration from the propulsion and phototaxis behaviors of natural microalgae; the study also discusses methods for functionalizing the microalgae surface to enhance performance, with potential applications in drug delivery, imaging, and water purification, and highlights the future potential and challenges of this biomimetic technology. Thus, microalgae represent a promising resource not only for practical applications in multiple industries but also for advancing biomimetic technologies, unlocking new possibilities for sustainable solutions and innovative designs.
Since its introduction in 2006, deep learning has become one of the most active research directions in artificial intelligence and a focal point of global research [3]. It has also brought new approaches to the identification and detection of microalgae. In algae research, Qian et al. [4] proposed a multi-object deep learning framework for algae analysis based on Faster R-CNN. The framework simultaneously addresses tasks such as genus classification, algae detection, and organism identification, but its accuracy remains insufficient. Samantaray et al. [5] proposed a deep learning-based computer vision system for algae monitoring that runs on a wide range of platforms but reaches only 82% accuracy. Cho et al. [6] and Deglint et al. [7] explored applying deep learning to algae in conjunction with 3D printing and other devices, achieving sufficient accuracy but at a high cost. Wang et al. [8] introduced an improved Faster R-CNN model using Residual Network 50 (ResNet-50) and the Feature Pyramid Network (FPN) module to enhance feature extraction and address multi-scale target detection, effectively reducing missed detections. To enhance identification accuracy, Cao [9] proposed a ballast water microalgae identification method based on an improved YOLOv3 model, which employs a lightweight MobileNet network instead of the original Darknet-53, introduces an enhanced spatial pyramid pooling (SPP) module, and optimizes the YOLOv3 loss function with the Complete IoU (CIoU) algorithm. Although these improvements increased detection accuracy, missed detections remained. Pant [10] enhanced the original ResNeXt CNN model by reducing the size of convolutional kernels and filters, achieving high accuracy in differentiating discoidal algal genera; yet the model's generalization was limited by the homogeneous algal species in the dataset. Krause et al. [11] used a fully convolutional neural network to predict bounding boxes for detecting diatoms in microscope images, achieving good accuracy and speed with fewer missed detections, although the small dataset constrained the results. Dong et al. [12] developed the M-YOLO v8s model by replacing the C2F module and introducing a Focal SIoU loss, which optimized the network structure, raised accuracy to 98.9%, improved detection speed, and markedly reduced parameters and FLOPs. However, the model may be unstable under extreme microalgae imaging conditions, and further optimization of image preprocessing is needed to improve adaptability.
Given that the YOLOv7 algorithm offers efficient real-time object detection, enabling the rapid identification of small microalgae cells while maintaining high accuracy, and is particularly well suited to complex backgrounds and densely packed targets, it is an ideal starting point for microalgae detection. We therefore propose improvements to YOLOv7 to further enhance its performance in this specific application [13]. To address the existing constraints of microalgae detection methods, a novel approach for microalgae identification and detection is proposed, based on global information and feature fusion. Firstly, we add the CAGS module to the network structure. CAGS processes feature maps along both width and height, utilizing depth-wise separable convolution (DWConv) [14] and channel shuffle mechanisms [15]; this boosts the network's feature extraction capability without increasing the computational load. Secondly, we adopt SIoU as the loss function for our method, addressing the unstable network convergence that arises when the aspect ratio of the predicted box matches that of the ground truth box. Furthermore, we acquired seven common microalgal samples, including Chaetoceros, Chlorella, Chrysophyta, Prorocentrum lima, Karenia, Dunaliella, and Phaeodactylum, and created a dataset of 6300 images using microscopic photography. The proposed method was tested and compared with the latest classical algorithms on this dataset. The experiments indicate that the proposed FE-YOLO shows significant improvements over other state-of-the-art methods in accuracy, Recall, [email protected], [email protected], and other metrics.
The research conducted in this paper is of significant importance for the rapid identification and detection of microalgal cells, the advancement and utilization of microalgal resources, the protection of marine ecological environments, and the mitigation of harmful algal bloom disasters. This study provides a crucial foundation for enhancing the efficiency and accuracy of microalgal detection methods, which can lead to better resource management and environmental protection strategies. Furthermore, the findings of this research align with the growing field of biomimicry, as microalgae’s natural processes serve as a model for developing more efficient detection systems. By mimicking biological mechanisms, such as the way organisms recognize and respond to environmental cues, these advanced detection methods can improve not only the speed and accuracy of microalgal identification but also their application in sustainable technologies [16]. This research, therefore, plays a key role in advancing both the utilization of microalgae and the protection of marine ecosystems, supporting the development of smarter, more adaptive systems for monitoring water quality and mitigating environmental hazards.

2. Materials and Methods

2.1. Dataset Construction and Processing

The dataset forms the foundation of object detection, with the quality of the data determining the upper limit of detection accuracy. Currently, one of the primary challenges in the field of microalgae detection is the lack of high-quality public datasets, which significantly hinders the development of intelligent microalgae detection technologies.
In this study, seven common microalgae species—Chaetoceros, Chlorella, Chrysophyta, Prorocentrum lima, Karenia, Dunaliella, and Phaeodactylum (all sourced from the Liaoning Marine Technology Research Institute)—were selected as experimental samples. A microalgae dataset was established using a microscope (Olympus CKX53, Microscope Central, Feasterville, PA, USA) for image acquisition. Images were captured under a 40× objective lens and annotated with the MakeSense online tool, with the annotation files stored in YOLO format, as shown in Figure 1. Initially, 2100 microalgae images were captured. To prevent overfitting due to insufficient data, data augmentation techniques, including the addition of salt-and-pepper noise and random scaling, were applied to the images. This process expanded the dataset to a total of 6300 images, ensuring an equal number of images for each algae species.
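For illustration, the two named augmentation operations can be sketched as follows. This is a minimal example, not the authors' exact pipeline; parameter values such as the noise amount and scale range are assumptions.

```python
import cv2
import numpy as np

def salt_and_pepper(img, amount=0.01):
    """Corrupt a fraction `amount` of pixels with salt (white) and pepper (black) noise."""
    noisy = img.copy()
    n = int(amount * img.shape[0] * img.shape[1])
    ys = np.random.randint(0, img.shape[0], n)
    xs = np.random.randint(0, img.shape[1], n)
    noisy[ys[: n // 2], xs[: n // 2]] = 255  # salt
    noisy[ys[n // 2:], xs[n // 2:]] = 0      # pepper
    return noisy

def random_scale(img, lo=0.8, hi=1.2):
    """Rescale the whole image by a random factor. Because YOLO-format labels are
    normalized to image size, a uniform rescale leaves the label files unchanged."""
    s = np.random.uniform(lo, hi)
    return cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
```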

2.2. FE-YOLO Algorithm

YOLOv7 faces challenges in detecting small and densely packed microalgae cells in the dataset, as well as issues with background interference and instability in model convergence, necessitating algorithmic improvements for enhanced accuracy and robustness. To address these challenges, this paper proposes the FE-YOLO algorithm. First, the CAGS module is integrated into the Neck section of YOLOv7 to enhance detection accuracy without sacrificing training or inference speed. Second, the SIoU loss function is adopted in place of CIoU to improve model convergence stability and further enhance detection performance.

2.2.1. Optimized Network Architecture

The FE-YOLO model structure proposed in this paper is illustrated in Figure 2. The CAGS module is incorporated into the Neck section, allowing features to be fused while considering both channel and directional correlations without additional computational overhead. By optimizing the network structure and introducing the CAGS module, we not only improved training and detection speeds but also enhanced feature extraction capabilities, leading to superior performance in microalgae detection tasks.

2.2.2. CAGS Module

The Coordinate Attention (CA) mechanism [17] effectively understands and utilizes information from different channels and positions in input feature maps. However, this mechanism struggles to capture relationships between specific positions when handling long-range dependencies. As a result, the CA attention mechanism performs poorly in capturing correlations between spatially distant pixels.
To address these issues, this paper integrates GSConv (Group Shuffle Convolution) [18] into the CA attention mechanism, forming the CAGS attention module; the network structure is depicted in Figure 3. First, the input feature map is represented as [C, H, W], where C denotes the number of channels, H the height, and W the width. Global average pooling is performed separately along the width and height dimensions to obtain feature maps of sizes [C, 1, W] and [C, H, 1], respectively. These two feature maps are then transposed to align their dimensions and concatenated into a merged feature map of size [C, H + W, 1], which is subsequently processed by GSConv operations.
The GSConv operation first applies standard convolution to the input feature map, resulting in a feature map with half the original number of channels, denoted as C/2. Subsequently, it applies Depthwise Separable Convolution (DWConv) [19] to reduce computational load by performing convolution operations independently across individual channels. After DWConv, another feature map with C/2 channels is obtained. These two feature maps are then concatenated to form a feature map with C channels. Finally, shuffle operations [20] are used to facilitate better feature fusion by promoting information interaction between different convolutional layers, resulting in the target number of output feature maps.
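As a reading aid, the following is a minimal PyTorch sketch of a GSConv block consistent with this description; kernel sizes, strides, and the SiLU activation are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Group Shuffle Convolution as described above: a standard convolution produces
    half the output channels, a depthwise convolution produces the other half, and a
    channel shuffle interleaves the two halves to mix information between them."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.conv(x)               # standard conv -> C/2 channels
        x2 = self.dwconv(x1)            # depthwise conv -> another C/2 channels
        y = torch.cat((x1, x2), dim=1)  # concat -> C channels
        b, c, h, w = y.shape            # channel shuffle: interleave the two halves
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```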
The feature map produced by the GSConv operations is then processed with convolution, normalization, and a non-linear activation function. It is next split into two parallel branches carrying the height and width information, of sizes [C, 1, H] and [C, 1, W]; the height branch is transposed to [C, H, 1]. A 1 × 1 convolution layer adjusts the channel numbers in each branch, and the sigmoid activation function yields attention weights along the height and width dimensions. Finally, these attention weights are fused with the original input feature map to obtain the adjusted feature representation under the CAGS attention mechanism. This representation is used for subsequent tasks in the model, enabling more effective capture of key information and enhancing overall performance.
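Combining the steps above with the GSConv sketch, a hedged PyTorch sketch of the CAGS module might look as follows; the reduction ratio and the intermediate channel width are our assumptions, not values given in the paper.

```python
class CAGS(nn.Module):
    """Coordinate Attention with GSConv, following the description above."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)            # assumed bottleneck width
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> [C, H, 1]
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> [C, 1, W]
        self.gsconv = GSConv(channels, mid, k=1)       # GSConv on the merged map
        self.bn, self.act = nn.BatchNorm2d(mid), nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, 1)      # 1x1 convs restore channel count
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # [B, C, H, 1]
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # transpose to [B, C, W, 1]
        y = torch.cat((x_h, x_w), dim=2)          # merged map [B, C, H+W, 1]
        y = self.act(self.bn(self.gsconv(y)))     # GSConv + norm + activation
        y_h, y_w = torch.split(y, [h, w], dim=2)  # split back into the two branches
        a_h = torch.sigmoid(self.conv_h(y_h))                       # [B, C, H, 1]
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # [B, C, 1, W]
        return x * a_h * a_w                      # reweight the input feature map
```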

2.2.3. Optimized Loss Function

The loss function used in the original YOLOv7 model is the CIoU loss, with the formula as follows:
$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2\left(b, b^{gt}\right)}{c^2} - \alpha v \quad (1)$$
$$\alpha = \frac{v}{\left(1 - \mathrm{IoU}\right) + v} \quad (2)$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \quad (3)$$
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \quad (4)$$
where A is the predicted box and B is the ground truth box; $w^{gt}$ and $h^{gt}$ denote the width and height of the ground truth box, while $w$ and $h$ denote the width and height of the predicted box. The term $v$ measures the consistency of the aspect ratios, $\alpha$ is a balancing parameter, $b$ and $b^{gt}$ are the center points of the predicted and ground truth boxes, $\rho$ is the Euclidean distance between these two center points, and $c$ is the diagonal length of the smallest enclosing box containing both boxes.
Although the CIoU loss considers three significant factors—aspect ratio, center point distance, and overlap area—when $w^{gt}/h^{gt}$ equals $w/h$, the term $v$ becomes 0, as shown in Equation (3). This condition can lead to instability in the convergence of the CIoU loss function.
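This vanishing can be checked numerically with Equation (3); in the following minimal sketch, identical aspect ratios drive the term to zero even for boxes of very different sizes.

```python
import math

def aspect_ratio_term(w, h, w_gt, h_gt):
    # v from Eq. (3): consistency of the predicted and ground truth aspect ratios
    return (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2

print(aspect_ratio_term(20, 10, 40, 20))  # 0.0: same 2:1 ratio, term contributes nothing
print(aspect_ratio_term(20, 10, 30, 30))  # > 0: differing ratios are penalized
```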
To address this issue, this paper introduces SIoU to replace CIoU. SIoU primarily consists of three components: angle loss, distance loss, and shape loss, as illustrated in Figure 4.
Angle cost is defined as follows:
$$\Lambda = 1 - 2\sin^2\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right) \quad (5)$$
where $c_h$ represents the height difference between the center points of the ground truth box $B^{GT}$ and the predicted box $B$, and $\sigma$ denotes the distance between their center points.
Distance cost is defined as follows:
$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma \rho_t}\right) \quad (6)$$
$$\rho_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\right)^2, \quad \gamma = 2 - \Lambda \quad (7)$$
where $(b_{cx}^{gt}, b_{cy}^{gt})$ denotes the center coordinates of the ground truth box $B^{GT}$, and $(b_{cx}, b_{cy})$ denotes the center coordinates of the predicted box $B$.
Shape cost is defined as follows:
$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta} \quad (8)$$
$$\omega_w = \frac{|w - w^{gt}|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{|h - h^{gt}|}{\max\left(h, h^{gt}\right)} \quad (9)$$
where $(w, h)$ and $(w^{gt}, h^{gt})$ represent the width and height of the predicted box $B$ and the ground truth box $B^{GT}$, respectively. The parameter $\theta$ controls the emphasis placed on the shape cost; to prevent excessive focus on the shape cost, which would restrict the movement of the predicted boxes, $\theta$ is constrained to the range [3, 7].
In conclusion, the final definition of the SIoU loss function is as follows:
$$\mathrm{Loss}_{SIoU} = 1 - \mathrm{IoU} + \frac{\Delta + \Omega}{2} \quad (10)$$
Compared to the CIoU loss function, SIoU incorporates angle cost, redefines the penalty metric, and avoids instability in model convergence when the aspect ratios of the ground truth and predicted boxes are equal.
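For concreteness, the following is a minimal PyTorch sketch of the SIoU loss following Equations (5)-(10). The (x1, y1, x2, y2) box format, the epsilon handling, and the fixed θ = 4 (a value within the stated [3, 7] range) are assumptions of this sketch, not details given in the paper.

```python
import math
import torch

def siou_loss(pred, target, theta=4, eps=1e-7):
    """SIoU loss for boxes of shape (..., 4) in (x1, y1, x2, y2) format."""
    # IoU (Eq. 4)
    ix1, iy1 = torch.max(pred[..., 0], target[..., 0]), torch.max(pred[..., 1], target[..., 1])
    ix2, iy2 = torch.min(pred[..., 2], target[..., 2]), torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # width/height of the minimum enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # center offsets and center distance sigma
    s_cw = (target[..., 0] + target[..., 2] - pred[..., 0] - pred[..., 2]) / 2
    s_ch = (target[..., 1] + target[..., 3] - pred[..., 1] - pred[..., 3]) / 2
    sigma = torch.sqrt(s_cw ** 2 + s_ch ** 2) + eps
    # angle cost (Eq. 5)
    angle = 1 - 2 * torch.sin(torch.arcsin((s_ch.abs() / sigma).clamp(max=1.0)) - math.pi / 4) ** 2
    # distance cost (Eqs. 6-7)
    gamma = 2 - angle
    dist = (1 - torch.exp(-gamma * (s_cw / (cw + eps)) ** 2)) \
         + (1 - torch.exp(-gamma * (s_ch / (ch + eps)) ** 2))
    # shape cost (Eqs. 8-9)
    omega_w = (w1 - w2).abs() / torch.max(w1, w2).clamp(min=eps)
    omega_h = (h1 - h2).abs() / torch.max(h1, h2).clamp(min=eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta
    # final loss (Eq. 10)
    return 1 - iou + (dist + shape) / 2
```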

3. Results and Discussion

3.1. Evaluation Metrics

This study adopts Precision (P), Recall (R), [email protected], [email protected], and single-image detection time as evaluation metrics. Precision (P) refers to the ratio of correctly predicted positive samples to all samples predicted as positive. Recall (R) refers to the ratio of correctly predicted positive samples to all actual positive samples, calculated as follows:
$$P = \frac{TP}{TP + FP} \times 100\% \quad (11)$$
$$R = \frac{TP}{TP + FN} \times 100\% \quad (12)$$
True Positive (TP) represents the number of samples that are actually positive and predicted as positive. False Positive (FP) refers to the number of samples that are actually negative but predicted as positive. False Negative (FN) represents the number of samples that are actually positive but predicted as negative.
The Precision–Recall (PR) curve is a common metric for assessing model performance, with Precision plotted on the vertical axis and Recall on the horizontal axis. Average Precision (AP) is a scalar representation of the area under the PR curve, with higher values indicating better classifier performance. Mean Average Precision (mAP) represents the average of AP across all detected classes.
$$AP = \int_0^1 P(R)\, dR \quad (13)$$
$$mAP = \frac{\sum AP}{N(\mathrm{class})} \quad (14)$$
[email protected] represents the mAP value at an IoU threshold of 0.5. [email protected] represents the average mAP across all 10 IoU thresholds, ranging from 0.5 to 0.95 with a step size of 0.05.
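As an illustration of Equations (13) and (14), the following NumPy sketch computes AP with the all-point interpolation commonly used in detection benchmarks; the interpolation scheme is an assumption, since the paper does not specify one.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the PR curve (Eq. 13) via all-point interpolation."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # enforce a non-increasing precision envelope
    idx = np.where(r[1:] != r[:-1])[0]        # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_ap(ap_per_class):
    """Eq. (14): mAP is the mean of the per-class APs."""
    return float(np.mean(ap_per_class))
```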
The ROC curve (Receiver Operating Characteristic Curve) is a tool used to evaluate the performance of binary classification models, widely applied in machine learning and statistics. The ROC curve illustrates the relationship between the True Positive Rate (TPR) and the False Positive Rate (FPR) to describe the model’s classification ability. The closer the ROC curve is to the upper-left corner, the better the model’s performance. The AUC (Area Under the Curve) is a numerical measure of the ROC curve, and a value closer to 1 indicates better overall predictive performance.
T P R = T P T P + F N
F P R = F P F P + T N
As a numerical metric based on the ROC curve, the AUC reflects the model's ability to distinguish between positive and negative classes; its value ranges from 0 to 1, and higher values indicate better performance. It can be written as follows:
$$AUC = \frac{\sum_{(p_i,\, n_j)} \mathbb{1}\left[p_i > n_j\right]}{P \times N} \quad (15)$$
where $P$ is the number of positive samples and $N$ is the number of negative samples; $p_i$ is the prediction score of a positive sample, i.e., the probability of predicting a positive sample as positive, and $n_j$ is the prediction score of a negative sample, i.e., the probability of predicting a negative sample as positive. An indicator function $I$ can be used to score the sample pairs according to whether the predicted positive score $p_{pos}$ exceeds the predicted negative score $p_{neg}$:
$$AUC = \frac{\sum I\left(p_{pos}, p_{neg}\right)}{P \times N} \quad (16)$$
$$I\left(p_{pos}, p_{neg}\right) = \begin{cases} 1, & p_{pos} > p_{neg} \\ 0.5, & p_{pos} = p_{neg} \\ 0, & p_{pos} < p_{neg} \end{cases} \quad (17)$$
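The pairwise definition in Equations (15)-(17) can be computed directly, as in this didactic O(P × N) sketch (not an optimized implementation):

```python
import numpy as np

def auc_by_pairs(pos_scores, neg_scores):
    """Fraction of (positive, negative) score pairs ranked correctly, ties as 0.5."""
    diff = np.asarray(pos_scores, float)[:, None] - np.asarray(neg_scores, float)[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / diff.size  # diff.size equals P * N

# Example: 8 of the 9 pairs are ordered correctly -> AUC ~= 0.89
print(auc_by_pairs([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```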

3.2. Model Training

The CPU used in this study is an Intel (R) Xeon (R) Gold 6330 CPU @ 2.00 GHz, with 256 GB of memory and a 2 TB hard drive. The GPU used is an NVIDIA A100 with 80 GB of memory. The programming language employed was Python version 3.8, with PyTorch version 1.10, and CUDA version 11.4. The experimental parameter settings are shown in Table 1.
In our study, the dataset was divided into training, validation, and testing sets at an 8:1:1 ratio. To evaluate the effectiveness of FE-YOLO against YOLOv7, we conducted experiments on the training set. The curves of Recall, Precision, [email protected], and [email protected] during training, together with comparisons to the original YOLOv7 model, are illustrated in Figure 5. Overall, FE-YOLO improves on YOLOv7 across all metrics. As shown in Figure 5a,b, the Recall and Precision of FE-YOLO converge around the tenth epoch, whereas those of YOLOv7 converge around the twentieth epoch; FE-YOLO also reaches higher Recall and Precision. Figure 5c shows that FE-YOLO converges at a similar speed in [email protected] but achieves a value approximately 2% higher. Figure 5d shows that the [email protected] of FE-YOLO improves on YOLOv7 by approximately 8%, indicating a substantial enhancement in detection accuracy. These experiments demonstrate that the proposed FE-YOLO model delivers significant performance improvements over the original YOLOv7 model.
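For reference, a minimal sketch of such an 8:1:1 split is shown below; the function name and the fixed random seed are illustrative assumptions.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle a list of image paths and split it 8:1:1 into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(ratios[0] * len(items))
    n_val = int(ratios[1] * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```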
The PR curves during the training process are shown in Figure 6, with the horizontal axis representing Recall and the vertical axis representing Precision. When the PR curve is closer to the top right corner, it indicates that the model can achieve both high Precision and high Recall in its predictions, implying more accurate results. In the figure, the seven differently colored curves represent the PR curves for seven different types of algae, while the thicker blue curve represents the average PR curve across all categories. From the figure, it is evident that the PR curves of the FE-YOLO are closer to the upper right corner, indicating that the FE-YOLO demonstrates superior performance.
The ROC curve during the training process is presented in Figure 7, where the horizontal axis represents the False Positive Rate (FPR) and the vertical axis represents the True Positive Rate (TPR). A curve that is closer to the top-left corner indicates better predictive performance of the model [21]. In the figure, the seven differently colored curves correspond to the ROC curves for the seven types of algae. It is evident from the figure that the ROC curve for FE-YOLO is closer to the upper-left corner, signifying its superior performance. Additionally, the AUC (Area Under the Curve) values are calculated and displayed in the figure, where a higher AUC value, approaching 1, indicates better model performance. As shown, FE-YOLO consistently demonstrates superior performance compared to the other models.

3.3. Ablation Experiments

This paper introduces two improvements to the original YOLOv7 model. To further investigate the impact of different modules on recognition results, ablation experiments were conducted, with each experiment incorporating only one improvement method. The ablation experiment results for different microalgae are shown in Table 2.
The experimental results indicate that incorporating the CAGS module into the YOLOv7 model and replacing the CIoU loss function with SIoU both lead to significant improvements in detection accuracy, Recall, and mAP metrics. Based on the results of these ablation experiments, the FE-YOLO model proposed in this paper shows significant enhancements over the original YOLOv7 model: Precision increased from 85.2% to 94.8%, an improvement of 9.6%; Recall increased from 94.2% to 96.1%, an improvement of 1.9%; [email protected] increased from 85.1% to 94.8%, an improvement of 9.7%; and [email protected] increased from 62.4% to 69.3%, an improvement of 6.9%.
Furthermore, the detection time per single image decreased from 0.0501 s to 0.0455 s, indicating a speed improvement of 9.2%. Overall, under the same experimental conditions, FE-YOLO has shown significant improvements over the original YOLOv7 model in both detection accuracy and speed.
Figure 8 illustrates the challenges involved in identifying microalgae targets, especially in regions where the contours of the microalgae are indistinct or blurred and difficult to distinguish from the background. This impedes the model's ability to accurately capture and identify targets, significantly affecting recognition performance; such limitations can lead to misjudgments during detection, reducing overall accuracy and reliability. In contrast, our proposed improvements optimize the algorithmic architecture, enabling the differentiation of microalgae from the background and significantly improving localization accuracy and detection Precision.

3.4. Comparative Experiment

To validate the performance of the proposed method, we compared the FE-YOLO algorithm with the latest classical algorithms, including Faster RCNN [22], YOLOv6 [23], DETR [24], YOLOv5 [25] and YOLOv8 [26]. Table 3 presents the detection results of all methods on the microalgae dataset, and Figure 9 displays the corresponding [email protected] curves during the training process for a more intuitive comparison of the performance of various methods. The results indicate that the proposed FE-YOLO algorithm demonstrates significant improvements in Recall, [email protected], and [email protected] compared to benchmark methods. As seen in Table 3, FE-YOLO has a significant advantage in both detection Precision and Recall, indicating that it can obtain more accurate localization and higher quality prediction frames, mainly due to the more adequate feature fusion and robust semantic information of the CAGS module. Moreover, while maintaining high detection accuracy, the number of parameters and GFLOPS in our model remains small, which is mainly attributed to the two lightweight detection heads we propose. In summary, the FE-YOLO model proposed in this paper demonstrates superior performance in microalgae object detection compared to the latest mainstream object detection algorithms.

3.5. Microalgae Detection Results

To visually validate the effectiveness of the proposed method, we compared the detection results of the FE-YOLO algorithm with the latest classical algorithms—YOLOv5l, YOLOv6l and YOLOv8l—on images of different microalgae species. Some of the detection results are shown in Figure 10. The experiment demonstrates that the three classic algorithms exhibit low detection accuracy and a high rate of missed detections. For instance, YOLOv8l fails to detect Phaeodactylum, while YOLOv5l and YOLOv6l show poor accuracy in detecting Karenia, Prorocentrum lima, and Dunaliella. These results highlight that, in comparison to classic algorithms such as YOLOv5l, YOLOv6l, and YOLOv8l, the FE-YOLO method achieves superior detection performance across different algae species.

4. Conclusions

Addressing the current challenges of slow detection speed and low accuracy in microalgae detection, this paper proposes the FE-YOLO algorithm based on the YOLOv7 model. To enhance detection accuracy, the CAGS module was integrated into the Neck section of YOLOv7; to address unstable convergence, SIoU was adopted as the loss function, yielding further accuracy gains. To validate the effectiveness of the proposed algorithm, a dataset of 6300 microscope images of seven common microalgae species (Chaetoceros, Chlorella, Chrysophyta, Prorocentrum lima, Karenia, Dunaliella, and Phaeodactylum) was established through microscopy, and FE-YOLO was compared against recent classical algorithms, including DETR, Faster RCNN, YOLOv5, YOLOv6, YOLOv7, and YOLOv8. The experimental results demonstrate that the FE-YOLO model offers significant improvements over these algorithms in average Precision, Recall, [email protected], and [email protected].

Author Contributions

Conceptualization, Y.S. and Z.L.; methodology, G.D.; software, Y.S.; validation, Y.W., Z.Y. and D.Z.; formal analysis, X.Z.; investigation, Z.L.; resources, Y.W.; data curation, Z.Y.; writing—original draft preparation, G.D.; writing—review and editing, Y.L.; visualization, D.Z.; supervision, X.Z.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (2022YFB4301401, 2022YFB4300401), the Young Elite Scientist Sponsorship Program by CAST (No. YESS20230004, 2023021K), the Science and Technology Innovation Project of the China Waterborne Transport Research Institute (182408, 182410, 182418), the Liaoning Provincial Natural Science Foundation (No. 2024-MS-168), and the Fundamental Research Funds for the Provincial Universities of Liaoning (No. LJ212410150030).

Data Availability Statement

The authors will supply the relevant data in response to reasonable requests.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wondraczek, L.; Gründler, A.; Reupert, A.; Wondraczek, K.; Schmidt, M.A.; Pohnert, G.; Nolte, S. Biomimetic light dilution using side-emitting optical fiber for enhancing the productivity of microalgae reactors. Sci. Rep. 2019, 9, 9600. [Google Scholar] [CrossRef]
  2. Zhang, F.; Li, Z.; Chen, C.; Luan, H.; Fang, R.H.; Zhang, L.; Wang, J. Biohybrid microalgae robots: Design, fabrication, materials, and applications. Adv. Mater. 2024, 36, 2303714. [Google Scholar] [CrossRef]
  3. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
  4. Qian, P.; Zhao, Z.; Liu, H.; Wang, Y.; Peng, Y.; Hu, S.; Zhang, J.; Deng, Y.; Zeng, Z. Multi-target deep learning for algal detection and classification. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 1954–1957. [Google Scholar]
  5. Samantaray, A.; Yang, B.; Dietz, J.E.; Min, B.C. Algae detection using computer vision and deep learning. arXiv 2018, arXiv:1811.10847. [Google Scholar]
  6. Cho, S.; Shin, S.; Sim, J.; Lee, J. Development of microfluidic green algae cell counter based on deep learning. J. Korean Soc. Vis. 2021, 19, 41–47. [Google Scholar]
  7. Deglint, J.L.; Jin, C.; Wong, A. Investigating the automatic classification of algae using the spectral and morphological characteristics via deep residual learning. In Image Analysis and Recognition: Proceedings of the 16th International Conference, ICIAR 2019, Waterloo, ON, Canada, 27–29 August 2019; Proceedings, Part II 16; Springer International Publishing: Cham, Switzerland, 2019; pp. 269–280. [Google Scholar]
  8. Wang, J.; Dong, J.; Tang, M.; Yao, J.; Li, X.; Kong, D.; Zhao, K. Identification and detection of microplastic particles in marine environment by using improved faster R–CNN model. J. Environ. Manag. 2023, 345, 118802. [Google Scholar] [CrossRef] [PubMed]
  9. Cao, M.; Wang, J.; Chen, Y.; Wang, Y. Detection of microalgae objects based on the Improved YOLOv3 model. Environ. Sci. Process. Impacts 2021, 23, 1516–1530. [Google Scholar] [CrossRef] [PubMed]
  10. Pant, G.; Yadav, D.P.; Gaur, A. ResNeXt convolution neural network topology-based deep learning model for identification and classification of Pediastrum. Algal Res. 2020, 48, 101932. [Google Scholar] [CrossRef]
  11. Krause, L.M.; Koc, J.; Rosenhahn, B.; Rosenhahn, A. Fully convolutional neural network for detection and counting of diatoms on coatings after short-term field exposure. Environ. Sci. Technol. 2020, 54, 10022–10030. [Google Scholar] [CrossRef] [PubMed]
  12. Dong, J.; Wang, J.; Lin, H.; Liu, W. M-YOLO v8s: Classification and Identification of Different Microalgae Species Based on the Improved YOLO v8s Model for Prevention of Harmful Algal Blooms. ACS EST Water 2025, 5, 329–340. [Google Scholar] [CrossRef]
  13. Zeng, J.; Xyu, T.; Yuan, Q.; Qi, Y. Improved YOLOv7 object detection algorithm. Artif. Intell. Internet Things Smart Agric. 2024, 14, 1330141. [Google Scholar]
  14. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  15. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  16. Liu, Y.; Qu, Y.; Yang, Y.; Deng, Q.; Luo, Y.; Zhang, X. Algae-based flexible localized oxygen control around Cells: An approach leading to more Biomimetic microphysiological systems. Chem. Eng. J. 2024, 502, 158040. [Google Scholar] [CrossRef]
  17. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424. [Google Scholar]
  18. Hayashi, K.; Kato, S.; Matsunaga, S. Convolutional neural network-based automatic classification for algal morphogenesis. Cytologia 2018, 83, 301–305. [Google Scholar] [CrossRef]
  19. Tian, Q.; Wang, Z.; Cui, X. Improved Unet brain tumor image segmentation based on GSConv module and ECA attention mechanism. arXiv 2024, arXiv:2409.13626. [Google Scholar] [CrossRef]
  20. Yin, X.; Goudriaan, J.A.N.; Lantinga, E.A.; Vos, J.A.N.; Spiertz, H.J. A flexible sigmoid function of determinate growth. Ann. Bot. 2003, 91, 361–371. [Google Scholar] [CrossRef]
  21. Hoo, Z.H.; Candlish, J.; Teare, D. What is an ROC curve? Emerg. Med. J. 2017, 34, 357–359. [Google Scholar] [CrossRef]
  22. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  23. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  24. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  25. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. ultralytics/yolov5: v6.2-YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.AI Integrations; Zenodo: Geneva, Switzerland, 2022. [Google Scholar]
  26. Jocher, G. Yolov8. GitHub. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 November 2024).
Figure 1. Microalgae image acquisition device.
Figure 2. FE-YOLO model structure.
Figure 3. CAGS network structure.
Figure 4. The scheme for calculating the angle cost and distance cost contributions to the loss function. $B$ represents the predicted bounding box and $B^{GT}$ the ground truth bounding box; $c_w$ and $c_h$ represent the width and height of their minimum enclosing rectangle, and $\alpha$ indicates the angular difference between the bounding boxes.
Figure 5. The progress of training performance of different models. (a) Recall, (b) Precision, (c) [email protected], (d) [email protected].
Figure 6. The comparison of P-R curves during training. (a) YOLOv7, (b) FE-YOLO.
Figure 7. The comparison of ROC curves during training. (a) YOLOv7, (b) FE-YOLO. Dashed line: random-guess baseline.
Figure 8. Visualization of attention maps for the YOLOv7, Ablation1, and FE-YOLO models on the microalgae dataset.
Figure 9. [email protected] curves for different methods on the microalgae dataset.
Figure 10. The detection results of different methods.
Table 1. Parameter settings during training.

| Parameter | Configuration |
| --- | --- |
| Learning rate | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Batch size | 16 |
| Workers | 8 |
| Epochs | 120 |
| Image size | 640 × 640 |
Table 2. Comparison results of ablation experiments on the microalgae dataset.

| Method | CAGS | SIoU | AP (Chl) | AP (Chr) | AP (Kar) | AP (Pha) | AP (Pro) | AP (Cha) | AP (Dun) | P | R | [email protected] | [email protected] | Ds |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv7 | × | × | 87.9% | 82.8% | 94.9% | 69.4% | 99.4% | 62.3% | 98.8% | 85.2% | 94.2% | 85.1% | 62.4% | 0.0501 s |
| Ablation1 | ✓ | × | 88.2% | 83.6% | 95.2% | 73.5% | 99.4% | 66.7% | 98.7% | 86.4% | 94.6% | 88.9% | 67.8% | 0.0500 s |
| FE-YOLO | ✓ | ✓ | 92.1% | 98.9% | 98.5% | 99.2% | 90.7% | 84.6% | 99.6% | 94.8% | 96.1% | 94.8% | 69.3% | 0.0455 s |

Chl: Chlorella, Chr: Chrysophyta, Kar: Karenia, Pha: Phaeodactylum, Pro: Prorocentrum lima, Cha: Chaetoceros, Dun: Dunaliella, Ds: Detection speed.
Table 3. Comparison results with other methods.

| Method | R | [email protected] | [email protected] | GFLOPS | Parameters |
| --- | --- | --- | --- | --- | --- |
| Faster RCNN | 85.0% | 84.9% | 41.6% | 207 | 40 M |
| DETR | 90.5% | 92.2% | 51.3% | 225 | 41 M |
| YOLOv5l | 94.7% | 95.0% | 65.0% | 107.7 | 46.5 M |
| YOLOv5x | 93.3% | 93.9% | 65.4% | 203.9 | 86.7 M |
| YOLOv6m | 94.0% | 95.6% | 64.8% | 85.8 | 34.9 M |
| YOLOv6l | 94.5% | 94.9% | 63.8% | 150.7 | 59.6 M |
| YOLOv7 | 94.2% | 93.2% | 56.3% | 103.2 | 36.5 M |
| YOLOv8n | 72.7% | 84.8% | 62.4% | 8.1 | 30.0 M |
| YOLOv8l | 72.8% | 84.7% | 64.2% | 164.8 | 43.6 M |
| FE-YOLO | 96.1% | 94.8% | 69.3% | 98.7 | 26.3 M |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
