1. Introduction
Blueberries are rich in nutrients and have high economic value. Blueberry cultivation has spread all over the world [1], and more than 30 countries and regions are developing a blueberry industry. China’s blueberry industry has developed rapidly over the past 20 years, and China is a major contributor to the blueberry industry in the Asia-Pacific region [2]. The rapid development of deep learning has resulted in many new types of agricultural equipment. As a large agricultural country, China has a particular need to improve its modern agricultural technology so that sectors such as the blueberry industry keep pace with modernization. It is therefore important to use deep learning to develop an automated blueberry picking system, which can reduce both the human and material resources consumed by traditional picking methods and the waste caused by untimely picking. As an integral part of a fruit and vegetable picking robot system, the visual recognition system plays a vital role in target recognition and positioning, automatic picking, and yield estimation [3]. In particular, accurate object detection is essential for locating blueberries of different maturity levels within a cluster [4]. It is therefore necessary to design a detection model suited to the specific crop being picked. For a fruit detection model, detection accuracy and a lightweight design are the key aspects. This paper studies the problem from these two aspects, and the specific contributions are as follows:
A blueberry dataset is constructed. Images of blueberries growing in the natural environment were collected, and blueberries at three degrees of maturity were labeled with the LabelImg software. The images were then augmented to improve the generalization of the model and to help avoid overfitting during training.
A lightweight blueberry recognition model based on multi-scale and attention fusion is proposed. Firstly, we design a new attention module, NCBAM, which is added to the backbone network to improve the feature extraction ability of the model. Secondly, a small target detection layer is added to improve the multi-scale recognition of blueberries. Finally, the C3Ghost module is introduced into the backbone network to reduce the number of model parameters.
The proposed blueberry recognition model based on improved YOLOv5 is validated. Experiments show that it effectively improves the recognition accuracy of blueberries, which benefits the development of automatic orchard picking.
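The dataset contribution above mentions data augmentation without naming the specific transforms. As an illustration only, the following minimal sketch shows one common transform, a horizontal flip, applied consistently to an image and its labels. It assumes the annotations are stored in the YOLO text format that LabelImg can export (class id followed by a normalized box center, width, and height); the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One YOLO-format label: (class_id, x_center, y_center, width, height).
# Coordinates are normalized to [0, 1] by the image width and height.
YoloBox = Tuple[int, float, float, float, float]


def hflip_image(image: Sequence[Sequence[int]]) -> List[List[int]]:
    """Mirror an image stored as a list of pixel rows (left <-> right)."""
    return [list(reversed(row)) for row in image]


def hflip_boxes(boxes: Sequence[YoloBox]) -> List[YoloBox]:
    """Mirror YOLO boxes to match a horizontally flipped image.

    Only the x coordinate of the box center changes; the class id,
    y coordinate, width, and height are unaffected by a horizontal flip.
    """
    return [(cls, 1.0 - x, y, w, h) for cls, x, y, w, h in boxes]


if __name__ == "__main__":
    # Toy 2x3 "image" and one labeled berry (class id 2 is an arbitrary example).
    image = [[1, 2, 3], [4, 5, 6]]
    boxes = [(2, 0.25, 0.5, 0.1, 0.2)]
    print(hflip_image(image))
    print(hflip_boxes(boxes))
```

The point of the sketch is that geometric augmentation must transform the labels together with the pixels; otherwise the boxes no longer cover the fruit they were drawn around.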
2. Related Work
Blueberries are widely planted because of their rich nutrition and high value [3], but blueberries growing in the natural environment are usually densely clustered and touch one another; moreover, they often appear against complex backgrounds and are shaded by branches and leaves. Rapid and accurate identification of blueberries is therefore very challenging. Designing a high-performance blueberry recognition model with deep learning is one of the keys to realizing an automatic picking system, and it warrants in-depth research.
YOLOv5 detects faster than YOLOv3 [5] and YOLOv4 [6], and it detects targets more accurately against complex backgrounds and under occlusion. Many current target detection methods are therefore built on the YOLOv5 model [7,8,9,10,11,12,13,14,15,16,17,18,19,20]. To improve the detection performance of YOLOv5, the network is generally modified in three places: the backbone network, the neck network, and the prediction network. More details are as follows:
There are several ways to improve the backbone network. Yan et al. [7] replaced the BottleneckCSP module in the backbone network of the original YOLOv5s with a BottleneckCSP-2 module to reduce the number of model parameters, and added the SE (Squeeze-and-Excitation) visual attention module to the backbone network to improve the expression ability of the model. Similarly, Chen et al. [8] added an SE module to the backbone network to improve the sensitivity of the model to channel features. The improved network model can effectively identify graspable apples that are not occluded or are occluded only by leaves, and ungraspable apples that are occluded by branches or other fruits. To detect objects in images with complex backgrounds, Hu et al. [9] improved the C3 module in the backbone network, using a convolution kernel group to enhance feature extraction for the detected object and an attention module to focus on the whole object. Li et al. [10] replaced the ordinary convolution in the network model with depthwise separable convolution, which reduced the number of network parameters and improved the detection accuracy of apple fruits. Luo et al. [11] proposed a new detection method named YOLOv5-Aircraft, which addressed the problems of insufficient detection accuracy and slow detection speed for aircraft targets in remote sensing images with complex backgrounds. In this method, an hourglass-shaped module, CSAndGlass, is designed for the backbone feature extraction network of YOLOv5 and replaces the original residual module, which reduces semantic loss. From the above research, two conclusions can be drawn about backbone network improvement. Firstly, adding an attention module can enhance the feature extraction ability for detected objects, thereby improving the overall detection performance of the model. Secondly, using a lightweight module can reduce the size of the model and thus improve its speed. Therefore, in this study, the NCBAM attention module we designed was added to the backbone network to improve the feature extraction ability for blueberries, and the C3 module was replaced with C3Ghost to reduce the model size.
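To make the parameter savings of the lightweight modules discussed above concrete, the following sketch counts the weights of a single convolution layer in three variants: an ordinary convolution, a depthwise separable convolution, and a Ghost-style convolution. It is a back-of-the-envelope illustration under stated assumptions rather than a description of any cited model: biases and normalization layers are ignored, the Ghost variant is assumed to produce half of the output channels with an ordinary convolution and the other half with a cheap depthwise convolution, and the cheap kernel size of 5 is an assumed default.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def ghost_conv_params(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weights of a Ghost-style convolution (assumed ratio of 2, c_out even).

    Half of the output channels come from an ordinary k x k convolution;
    the other half come from a cheap depthwise cheap_k x cheap_k convolution
    applied to those primary feature maps.
    """
    primary_channels = c_out // 2
    primary = conv_params(c_in, primary_channels, k)
    cheap = primary_channels * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    # Example layer sizes chosen for illustration only.
    c_in, c_out, k = 128, 128, 3
    print("ordinary:           ", conv_params(c_in, c_out, k))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("ghost-style:        ", ghost_conv_params(c_in, c_out, k))
```

For this example layer, the Ghost-style variant needs roughly half the weights of the ordinary convolution, and the depthwise separable variant needs about an eighth. Both lightweight designs are used in the works cited above to shrink the backbone.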
In terms of improving the neck network, Zhao et al. [12] proposed an improved network structure by adding a micro-scale detection layer, setting prior anchor boxes, and adjusting the confidence loss function of the detection layer based on IoU. The improved YOLOv5 method can accurately detect wheat ears in UAV images, solve the problem of false and missed detections caused by occlusion, enhance the feature extraction ability for wheat ears, and improve detection accuracy. Zhu et al. [13] designed a new feature fusion layer to capture shallow features of small boulders and combined the Convolutional Block Attention Module (CBAM) with the Efficient Channel Attention Network (ECA-Net) into a new attention module, which is added to the neck network to highlight information helpful for boulder detection. From [12,13], we conclude that, in small object detection, adding a small-scale detection layer can improve the detection accuracy of small objects. Therefore, a small-scale detection layer is added in this study to improve the detection accuracy of blueberries, because the blueberry targets are small in some images.
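The benefit of an additional small-scale detection layer can be illustrated with simple grid arithmetic. The sketch below assumes a 640 x 640 input and the three default YOLOv5 detection strides of 8, 16, and 32; the extra stride of 4 is an assumption made for illustration, since the stride of the added layer is not stated in this section.

```python
from typing import Dict, Sequence


def grid_sizes(image_size: int, strides: Sequence[int]) -> Dict[int, int]:
    """Map each detection stride to the side length of its prediction grid."""
    return {stride: image_size // stride for stride in strides}


def cells_spanned(object_px: float, stride: int) -> float:
    """Number of grid cells an object of the given pixel width spans."""
    return object_px / stride


if __name__ == "__main__":
    image_size = 640              # assumed input resolution
    default_strides = (8, 16, 32)
    extended_strides = (4, 8, 16, 32)  # stride 4 is an assumed extra head

    print(grid_sizes(image_size, default_strides))
    print(grid_sizes(image_size, extended_strides))

    # A berry about 12 pixels wide covers little more than one cell at
    # stride 8, but three cells at stride 4.
    print(cells_spanned(12, 8), cells_spanned(12, 4))
```

A finer grid preserves more spatial detail for small targets, at the cost of more prediction cells and additional layers in the network.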
6. Conclusions
This study improves the YOLOv5 network by adding our newly designed NCBAM to the backbone network to improve the model’s ability to extract blueberry features. The C3 module in the backbone network is then replaced with the C3Ghost module to reduce the model parameters. Finally, a small target detection layer is added to detect blueberries at multiple scales, which improves the ability to identify them. The experimental results show that the improved network achieves a 2.4% increase in mAP compared with the original YOLOv5 network, which indicates that the improved model can effectively improve the recognition accuracy of blueberries. In addition, it can be used to detect blueberries at three different degrees of maturity, providing accurate blueberry positioning for an automatic blueberry picking system, thereby reducing the economic losses caused by untimely manual picking, improving the economic benefits of the blueberry industry, and promoting the development of fruit and vegetable picking robot systems. Compared with the original YOLOv5 network, however, this model has more network parameters. In future work, we will continue to study how to reduce the network parameters and improve the detection ability.
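For readers unfamiliar with the mAP metric reported above, the following minimal sketch shows how average precision (AP) can be computed for one class from ranked detections and then averaged over classes. It is an assumption-laden illustration: it uses a common all-point interpolation of the precision-recall curve, it takes the matching of detections to ground truth (for example, by an IoU threshold) as already done, and it is not claimed to be the exact evaluation protocol used in this study.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float],
                      is_true_positive: Sequence[bool],
                      num_ground_truth: int) -> float:
    """Area under the interpolated precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box.
    num_ground_truth: number of ground-truth boxes (must be positive).
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the best precision at any higher recall.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])

    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Mean of the per-class AP values, e.g. one per maturity level."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Three detections for one class, two ground-truth berries (toy data).
    ap = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
    print(round(ap, 4))
    print(mean_average_precision([0.5, 1.0, 0.75]))
```

With three maturity classes, the mAP is simply the mean of the three per-class AP values, so an improvement in mAP can come from any one class or from all of them.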