1. Introduction
Agaricus bisporus is one of the most produced and consumed edible mushrooms in the world [1]. Factory cultivation can greatly increase the yield of A. bisporus compared with traditional cultivation methods. However, current industrialized cultivation still relies heavily on human labor, especially in the monitoring and harvesting of A. bisporus. This leads to three problems:
- (1) Labor shortages inflate labor costs, which usually account for 15% to 30% of the total production costs [2].
- (2) Manual observation of A. bisporus growth status is inefficient, and its accuracy depends heavily on the observer's knowledge.
- (3) Manual harvesting of A. bisporus is labor-intensive and inefficient, and the sizes of the harvested mushrooms are inconsistent, so manually harvested mushrooms usually require subsequent sorting.
Compared with traditional manual methods, machine vision algorithms based on image processing and deep learning offer significant advantages in the speed and accuracy of target detection and production process monitoring. Therefore, the objective of this paper is to develop an A. bisporus detection algorithm using machine vision techniques that achieves mushroom recognition, precise positioning, and diameter measurement.
Many methods based on traditional image processing algorithms have been proposed for A. bisporus detection. For example, based on the gray-level distribution properties of A. bisporus images, Yu et al. [3] scanned the thresholded images twice using the region labeling technique of a sequence scanning algorithm to segment A. bisporus. Ji et al. [4] proposed a "submersion method" that incorporates depth information to effectively segment adherent mushroom clusters and used the circle Hough Transform to measure the diameter of A. bisporus, achieving a recognition success rate of 92.37% and a diameter measurement error of 4.94%. Chen et al. [5] proposed an A. bisporus segmentation and recognition algorithm combining morphology and iterative marker-controlled watershed transformation, which achieved a high recognition success rate of 95.7% and a diameter measurement error of only 1.43%. Although the above algorithms have solved the A. bisporus detection problem to a certain extent, they rely on manual feature extraction and scene-specific information. In addition, these algorithms are time-consuming, and their robustness, real-time performance, and generalization capability are not ideal. Therefore, these methods still cannot meet the needs of practical growth monitoring and automated harvesting.
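As a point of reference for the Hough-based measurement described above, the following minimal sketch shows how a circle Hough Transform can be used to estimate cap diameters from a top-view image with OpenCV. The image file name, preprocessing parameters, and pixel-to-millimeter scale factor are illustrative assumptions, not values from the cited works.

```python
import cv2
import numpy as np

# Illustrative assumptions: the image path, preprocessing parameters, and
# pixel-to-millimeter scale factor are placeholders, not values from [4].
MM_PER_PIXEL = 0.5  # assumed calibration of the top-view camera

img = cv2.imread("mushroom_bed.jpg")          # top-view image of the mushroom bed
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Hough circle detection works on grayscale
gray = cv2.medianBlur(gray, 5)                # suppress noise before edge detection

# Detect roughly circular caps; param2 controls accumulator strictness,
# minRadius/maxRadius bound the expected cap size in pixels.
circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
    param1=100, param2=40, minRadius=15, maxRadius=80
)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        diameter_mm = 2 * r * MM_PER_PIXEL
        print(f"cap center=({x}, {y}) px, diameter≈{diameter_mm:.1f} mm")
```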
Compared with traditional image processing algorithms, deep-learning-based target detection algorithms can extract multilevel features from images through unsupervised or weakly supervised learning, and their powerful learning capability and highly hierarchical structure significantly improve target detection performance [6]. Deep-learning-based target detection methods fall into two main categories. The first category comprises methods based on region proposal networks, including the region-based Convolutional Neural Network (R-CNN) and Faster R-CNN, which first generate region proposals and then classify these regions. For example, Lee et al. [7] used a depth camera to obtain the 3D point cloud of A. bisporus and used the Faster R-CNN model to segment overlapping and adhering mushroom clusters and identify individual mushrooms, reaching an accuracy of 70.93%. Yang et al. [8] proposed an A. bisporus identification and localization algorithm based on Mask R-CNN, with an accuracy of 95.61%. Despite their higher detection accuracy, algorithms in this category are computationally intensive and slower in detection. The second category comprises regression-based target detection methods, including You Only Look Once (YOLO) [9] and the Single Shot MultiBox Detector (SSD) [10], which normalize the images and feed them directly into a convolutional neural network for detection. Among them, YOLO has a fast detection speed, a low false detection rate, good generalization performance, and many improved versions, earning it wide application in agriculture [11,12,13,14]. Cong et al. [15] proposed Mushroom-YOLO, a lightweight Lentinula edodes detection model based on YOLOv3, which achieved a mean average precision (mAP) of 97.03% and a detection time of 19.78 ms, demonstrating excellent timeliness and detection ability. Yin et al. [16] proposed an algorithm for high-precision estimation of Oudemansiella raphanipes cap diameter based on YOLOv4 and a Distance Filter, with a mean absolute error (MAE) of 0.77 mm and a root mean square error of 0.96 mm.
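To make the regression-based (single-stage) workflow concrete, the minimal sketch below runs a pretrained YOLOv5s model through the public Ultralytics torch.hub interface and reads out the predicted boxes. The image path is an assumption, and the COCO-pretrained weights shown here would need to be fine-tuned on mushroom images before being useful in this setting.

```python
import torch

# Minimal single-stage detection sketch using the public Ultralytics YOLOv5
# hub entry point; the image path is an illustrative placeholder.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("mushroom_bed.jpg")   # the whole image is fed through the CNN once
detections = results.xyxy[0]          # tensor rows: [x1, y1, x2, y2, confidence, class]

for x1, y1, x2, y2, conf, cls in detections.tolist():
    print(f"class={int(cls)} conf={conf:.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```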
Among the YOLO series models, YOLOv5 has shown significant advantages in both detection accuracy and speed, and its adaptive anchor box calculation enables multiscale detection [17,18]. Its simplified version, YOLOv5s, is suitable for deployment on mobile devices due to its low model complexity [19]. For example, Chen et al. [20] introduced the K-means++ clustering algorithm into the YOLOv5s framework to detect Camellia oleifera fruits concealed by leaves, reaching an mAP of 94.1% with a model size of 27.1 M. Wang et al. [21] proposed a channel-pruned YOLOv5s method for the accurate detection of small apple fruits, which achieved 95.8% accuracy with a model size of only 1.4 M. Li et al. [22] proposed an improved lightweight algorithm based on YOLOv5s to detect flat jujubes in a complex natural environment; they used a bidirectional feature pyramid network to enhance multiscale feature fusion and introduced dual coordinate attention modules to improve the feature extraction capability. The Convolutional Block Attention Module (CBAM) [23], an approach for enhancing the representational power of convolutional neural networks, is often used to improve the recognition rate of detection networks. Li et al. [24] presented a fast and lightweight detection algorithm for passion fruit pests, in which a CBAM module was added to the YOLOv5s neck network to make the network focus on the target adaptively. Sun et al. [25] proposed an upgraded lightweight apple detection method, YOLOv5-Prediction, for the rapid detection of apple yield in an orchard environment, using the CBAM attention mechanism to improve the detection accuracy of the algorithm.
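For reference, the following PyTorch sketch implements the channel and spatial attention that CBAM [23] applies to a feature map. The reduction ratio, kernel size, and feature map shape are illustrative defaults rather than the settings used in the works cited above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global average- and max-pooled features."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over concatenated channel-wise mean and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequentially refines a feature map with channel and then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

# Example: refine a 256-channel feature map from one backbone stage (assumed shape).
feat = torch.randn(1, 256, 40, 40)
print(CBAM(256)(feat).shape)  # torch.Size([1, 256, 40, 40])
```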
In the growing environment of A. bisporus, the mushrooms may be overlapping, poorly lit, and adhering to each other, which poses challenges for improving A. bisporus detection accuracy and identification speed. To address these challenges, the YOLOv5s model was selected in this study for the detection and diameter measurement of A. bisporus targets. However, the original algorithm tends to lose information about small targets during training, and its detection performance is less satisfactory in environments with complex backgrounds. Therefore, we propose a target detection method based on an improved YOLOv5s. The proposed method improves the detection accuracy and robustness of the model by introducing the CBAM attention module into the YOLOv5s backbone network and by employing the Mosaic image augmentation technique during training. In addition, we use the predicted bounding boxes to calculate the center coordinates and measure the sizes of the mushrooms; the obtained results can provide target information for automatic harvesting equipment, help monitor mushroom growth, predict the maturation time, and serve as a basis for optimal adjustment of the mushroom growing house control system.
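To illustrate the bounding-box-based measurement step in principle, the sketch below derives center coordinates and an approximate cap diameter from detected boxes. The box values and the millimeter-per-pixel scale factor are assumed for illustration only and are not parameters of the proposed system.

```python
# Illustrative post-processing of detected boxes (x1, y1, x2, y2) in pixels.
# The example boxes and the camera calibration factor below are assumed values.
MM_PER_PIXEL = 0.5  # assumed scale from camera calibration

boxes = [(120.0, 85.0, 190.0, 150.0), (310.0, 200.0, 365.0, 258.0)]

for x1, y1, x2, y2 in boxes:
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # candidate picking point for the harvester
    diameter_px = (x2 - x1 + y2 - y1) / 2.0     # average of box width and height
    diameter_mm = diameter_px * MM_PER_PIXEL    # convert to physical cap size
    print(f"center=({cx:.1f}, {cy:.1f}) px, cap diameter≈{diameter_mm:.1f} mm")
```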