Article

Plucking Point and Posture Determination of Tea Buds Based on Deep Learning

1 College of Engineering, South China Agricultural University, Guangzhou 510642, China
2 The Fifth Electronics Research Institute of Ministry of Industry and Information Technology, Guangzhou 511370, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(2), 144; https://doi.org/10.3390/agriculture15020144
Submission received: 9 December 2024 / Revised: 2 January 2025 / Accepted: 8 January 2025 / Published: 10 January 2025

Abstract

Tea is a significant cash crop grown widely around the world. Currently, tea plucking predominantly relies on manual work. However, due to the aging population and increasing labor costs, machine plucking has become an important trend in the tea industry. The determination of the plucking position and plucking posture is a critical prerequisite for machine plucking of tea leaves. To improve the accuracy and efficiency of machine plucking, this paper presents a method for determining the plucking point and plucking posture based on an instance segmentation deep learning network. In this study, tea images in the dataset were first labeled using the Labelme software (version 4.5.13), and then the LDS-YOLOv8-seg model was proposed to identify the tea bud region and plucking area. The plucking points and the central points of the tea buds’ bounding boxes were calculated and matched as pairs using the nearest point method (NPM) and the point in range method (PIRM) proposed in this study. Finally, the plucking posture was obtained from the results of feature point matching. The matching results on the test dataset show that the PIRM has superior performance, with a matching accuracy of 99.229% and an average matching time of 2.363 milliseconds. In addition, failure cases of feature point matching in the plucking posture determination process were analyzed. The test results show that the plucking position and posture determination method proposed in this paper is feasible for machine plucking of tea.

1. Introduction

Tea occupies a significant place in people’s lives due to its high drinking value and economic value. It is widely cultivated in various countries, including China, Japan, England, and India [1]. The growing demand for tea has led to higher requirements for tea production and processing. There are four main procedures in tea processing: plucking, green removing, rolling, and drying. Of these, plucking is the first process, and its quality has an important impact on the subsequent processing. Tea plucking is time-consuming, tedious, and poorly mechanized, and its problems have become more and more prominent in recent years. On the one hand, tea plucking is a laborious job, but with the development of urbanization, the rural labor force is shrinking and labor costs are rising rapidly. On the other hand, the main laborers in tea gardens are currently elderly men and women, which is not sustainable as the population ages [2]. Moreover, tea plucking, especially for famous teas, demands considerable experience; without training, plucking quality is not uniform.
To deal with the aforementioned issues and improve the efficiency of tea plucking, mechanized plucking of tea has been proposed. Japan was the first country to study the mechanization of tea plucking and developed different types of tea plucking machines, including self-propelled and ride-on tea plucking machines [3,4]. The mechanization of tea plucking has also been studied in England, India, and Australia [1]. However, these early tea plucking machines mainly cut the top of the tea trees at the same height or angle without selection, which is imprecise and may damage the leaves. This approach is feasible for ordinary tea, but not suitable for vintage tea. In recent years, some scholars have initiated research and development of precision-mechanized tea bud plucking equipment to achieve accurate harvesting of vintage tea [5,6,7].
Early tea plucking machines cut the tea leaves without determining the position and posture of the leaves, resulting in low plucking quality. Identification of tea leaves is a crucial step in tea plucking, and many scholars have conducted extensive research on this topic. There are two main stages in the study of tea identification. The color and shape features of the tea were used in the first stage, in which tea images were manipulated with various transformations using image processing techniques to obtain the target features [8,9,10]. Thangavel and Murthi proposed a semi-automated system for tea leaf harvesting which identified the different grades of tea leaves using key frame extraction, rice counting, optical flow, and a combination of Prewitt and auto thresholding [11]. Chen et al. analyzed the color properties of tea images using a luminance and color difference model (YUV model), and then they transformed the gray-level images to binary images using the Otsu method, eliminating the noise from the binary image to obtain the tea region [12]. Wu et al. proposed a tea bud identification algorithm using the K-means clustering method based on the a and b components of the Lab color model, and images taken at three different distances were used to compare the recognition effect and efficiency of the Otsu method and the K-means method [13]. Karunasena et al. presented a tea bud detection method using a cascade classifier that integrates Histogram of Oriented Gradients (HOG) features with Support Vector Machine (SVM) classification, and four sets of tea leaf sample images were taken to conduct the tea bud identification test [14].
Although the image processing method is simple to operate for image feature recognition, its precision is limited and it places high requirements on image quality, making it difficult to meet actual demand. In particular, for tea bud recognition, the color and shape features of the foreground and the background are very close to each other, so the recognition results are barely satisfactory [15]. In recent years, deep learning has demonstrated its transcendent ability to recognize objects under complex background conditions, providing a new method for crop recognition and localization. Many experts have studied the detection and recognition of various fruits, such as kiwifruit [16], apple [17,18], Camellia oleifera fruit [19], tomato [20], and waxberry [21], using deep learning [22,23]. Therefore, the second stage of tea leaf feature recognition research is mainly based on deep learning methods [24,25,26,27,28,29]. Xu et al. proposed a two-level fusion network which combines the fast detection capability of YOLOv3 and the high-accuracy classification capability of DenseNet201 to detect tea buds [30]. They also investigated the influence of the shooting angle of the camera on the detection effect. The results show that detection is better on side-shot tea buds than on top-shot tea buds. Yang et al. proposed an improved YOLOv3 deep convolutional neural network algorithm to identify tender shoots for high-quality tea, which achieved end-to-end object detection with an image pyramid and residual network, and the K-means method was used to cluster the dimensions of the target boxes [28].
Plucking position and posture determination are the crucial steps after tea bud identification. However, few scholars have studied the plucking point determination method, and even fewer have studied plucking posture determination. Li et al. detected the tea shoot regions on RGB images using the YOLO network and obtained the tea shoot point clouds using Euclidean clustering processing and the target point cloud extraction algorithm, and then determined the plucking point by setting a minimum cylinder with the radius enclosing the 3D point cloud and the height determined by the growth characteristics of the tea [7]. Chen et al. presented a plucking point locating method by combining the YOLOv3 algorithm, semantic segmentation algorithm, skeleton extraction, and minimum bounding rectangle. They segmented the main veins from the target regions using the Fast-SCNN and skeleton extraction algorithms, and finally took the final endpoint of the traversal as the plucking point [25]. Yang et al. identified the tea shoot regions with an improved YOLOv3 network and obtained the bounding box of the target; the contact position of the tea stem and the identification box was set as the plucking point. However, this method is highly demanding on the camera angle [28]. Chen et al. trained the Faster R-CNN to detect the one tip with two leaves regions (OTTL regions), and then utilized the FCN to detect the plucking point in the OTTL regions [31].
In fact, unlike large fruits such as apples and oranges, tea buds are small and delicate. Tea buds are mechanically plucked by shearing the tea stem at a specific location. If the plucking point is approached from the wrong direction, plucking will fail. It is therefore necessary to determine the plucking position and posture accurately according to the growth characteristics of the tea bud.
In this study, we aimed to address the problem of plucking posture and position determination. The main innovations and research ideas are as follows:
  • An approach for plucking posture and point determination of tea buds based on a deep learning network and an image processing method was proposed. This technique can enhance the accuracy of mechanized tea plucking, thereby contributing to its successful adoption in practice.
  • Two approaches of feature point matching were put forward, enabling accurate alignment of two sets of tea bud feature points and the acquisition of the feature line for tea plucking posture. The accuracy and efficiency of these two methods were analyzed in detail.
  • The accurate matching of feature points plays a crucial role in determining the plucking posture of tea buds. Therefore, the failure cases of feature point matching were also analyzed. Additionally, future research prospects were presented.
The rest of this paper is arranged as follows. Section 2 presents the method of detection and segmentation of the tea buds and plucking area, as well as the method used for the determination of the plucking posture and position. The effect evaluation of the algorithm and some discussions are presented in Section 3. Conclusions are drawn in Section 4.

2. Materials and Methods

2.1. Dataset Building

2.1.1. Image Acquisition

The image dataset was built using tea bud pictures taken in Guangzhou National Agricultural Science and Technology Park (113.449166° E, 23.394961° N) in Guangdong, China, in June. Yinghong No. 9, which is one of the vintage tea varieties in South China, was used as the study object. To ensure that the proposed algorithm improves actual plucking accuracy and efficiency, the pictures were captured 3–5 days before the tea buds were harvested, under different weather conditions including sunny, cloudy, and drizzly weather. To obtain a higher recognition rate and effective plucking, images were taken at a shooting angle of 60° and a photographic distance of 0.5–1 m, as shown in Figure 1. In total, 4024 images were taken at a resolution of 4032 × 3024 and stored in JPEG format.

2.1.2. Data Processing

After removing invalid images, the resolution of all the remaining pictures was scaled to 1008 × 756 to reduce the training time. In addition, to improve the generalization capability of the deep learning results, the images were augmented using horizontal mirroring. Finally, there were 6450 images in the dataset. They were divided into three subsets: 80% (5000) of the images formed the training set and 20% (1250) formed the validation set; in addition, 200 images were used as the test set to evaluate the performance of the results.
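For illustration, a minimal OpenCV sketch of the scaling and horizontal-mirror augmentation described above is given below; the folder names are hypothetical, and the corresponding flipping of the annotations is not shown.

```python
import cv2
from pathlib import Path

SRC_DIR = Path("raw_images")        # hypothetical input folder
DST_DIR = Path("processed_images")  # hypothetical output folder
DST_DIR.mkdir(exist_ok=True)

for img_path in SRC_DIR.glob("*.jpg"):
    img = cv2.imread(str(img_path))
    if img is None:          # skip unreadable/invalid images
        continue
    # Scale from 4032 x 3024 down to 1008 x 756 to reduce training time
    small = cv2.resize(img, (1008, 756), interpolation=cv2.INTER_AREA)
    cv2.imwrite(str(DST_DIR / img_path.name), small)
    # Horizontal-mirror augmentation
    mirrored = cv2.flip(small, 1)
    cv2.imwrite(str(DST_DIR / f"{img_path.stem}_mirror.jpg"), mirrored)
```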
Image annotation is crucial for dataset processing in deep learning, which can produce feature information to train the deep learning network. The image annotation software Labelme was used as the annotation tool in this study to generate tea bud and plucking area regions [32]. One bud with one leaf (OBOL) is usually considered to be high-quality tea, so the OBOL regions were annotated as the tea buds in the tea image dataset. Meanwhile, in order to localize the plucking point and posture of the tea bud, the plucking area was also annotated. The plucking area was located on the tea stem between the first leaf and the second leaf, and was annotated by 4 lines. The overall annotation result is shown in Figure 2.

2.2. Instance Segmentation of Tea Bud and Plucking Area Based on Improved YOLOv8-Seg

YOLOv8 is an exceptional deep learning network model released by Ultralytics in 2023, which exhibits remarkable capabilities in classification, detection, segmentation, and pose estimation tasks. Compared to the previous versions, YOLOv8 adopts an anchor-free structure; instead of predicting the offset between the target and a predefined anchor, it directly predicts the target center, thereby enhancing both the efficiency and accuracy of the predictions [33]. The YOLOv8-seg model incorporates a mask coefficient and prototype branches, enabling the realization of instance segmentation. To accommodate diverse tasks, YOLOv8-seg offers different models based on the scale factors of depth and width, namely YOLOv8n-seg, YOLOv8s-seg, YOLOv8m-seg, YOLOv8l-seg, and YOLOv8x-seg.
The YOLOv8-seg network structure consists of input, backbone, neck, and head modules, as illustrated in Figure 3. The backbone is composed of a CBS module, C2f module, and SPPF module. The kernel size of the first convolution layer has been modified from the original 6 × 6 to 3 × 3 based on the YOLOv5 network. Moreover, all C3 modules have been replaced with C2f modules, resulting in further enhancements in computational efficiency. The neck module is utilized for feature fusion on the feature maps generated by the backbone network. The decoupled structure is employed in the head component, where multiple parallel branches are utilized to extract category features, position features, and mask features, respectively.
To enhance the segmentation efficiency and accuracy of the YOLOv8-seg network and accommodate the varying shapes of tea buds and plucking areas, an improved algorithm called LDS-YOLOv8-seg was proposed in this study. Firstly, to reduce the excessive parameters in the head module of the YOLOv8-seg model, shared convolution was employed to effectively reduce the parameter count and enhance segmentation efficiency. Additionally, to compensate for the potential accuracy degradation resulting from parameter reduction, Group Normalization (GN) was adopted as a replacement for Batch Normalization (BN). Secondly, an attention mechanism was added to the SPPF module to improve the model’s feature extraction ability, enabling the backbone to prioritize the most relevant parts of the input. Finally, the Deformable Convolution network version 2 (DCNv2) was introduced to adapt effectively to the irregular and variable shape characteristics of the segmentation objects, thereby acquiring more intricate spatial information. The structure of the LDS-YOLOv8-seg network is shown in Figure 4.

2.2.1. The Improved Segmentation Head

The head module is a crucial component of the YOLOv8-seg model, responsible for the final object detection, classification, and segmentation. It comprises four types of branches: the first branch is dedicated to prototype prediction and consists primarily of an up-sampling layer. In addition, there are three branches for the different sizes of feature maps: one for bounding box regression prediction, one for classification prediction, and the last for mask coefficient prediction. The regression prediction branch and the classification prediction branch both consist of two CBS modules and a separate Conv2d, contributing to the calculation of the regression loss and the classification loss, respectively. The CBS module comprises a Conv2d layer, a BatchNorm2d (BN) layer, and the SiLU activation function.
The original YOLOv8-seg head structure, as shown in Figure 5, contains a large number of parameters, resulting in high computational cost. In this study, the model was made lightweight by introducing shared convolution and 1 × 1 convolution to reduce the amount of calculation. The improved YOLOv8-seg head structure is shown in Figure 6, wherein all the white modules employ shared convolutions, including the CGS modules and Conv2d modules for classification and regression. In the regression and classification branches, the output feature maps of the neck component first undergo convolutional operations using 1 × 1 shared convolutional blocks (CGS), followed by processing through two 3 × 3 shared convolutional blocks (CGS) for classification and detection. The bounding box regression results of different scales are scaled using a scale factor after the regression calculation in order to address the issue of inconsistent target scales detected by different regression branches due to the shared convolution. The segmentation task is performed at the pixel level, with a focus on calculating more refined spatial features; to ensure segmentation accuracy, this branch abstains from utilizing shared convolutions. The Group Normalization (GN) module, proposed by Wu et al., has been demonstrated to improve calculation accuracy in detection and segmentation tasks compared to Batch Normalization (BN) [34]. The utilization of shared convolution inevitably leads to a certain degree of accuracy degradation. Therefore, in this study, Group Normalization (GN) was employed as a substitute for Batch Normalization (BN) within the CBS, forming the CGS module to enhance the accuracy of the final calculation. Moreover, the computational burden of the model can be further reduced by a 1 × 1 convolution.
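For illustration, a minimal PyTorch sketch of the CGS block (GN in place of BN) and a shared-convolution head branch with per-scale scale factors is given below. The channel widths, the number of GN groups, the class count, and the per-scale 1 × 1 reduction to a common width are assumptions made for the sketch, not the exact LDS-YOLOv8-seg configuration.

```python
import torch
import torch.nn as nn

class CGS(nn.Module):
    """Conv2d + GroupNorm + SiLU block (BN of the CBS block replaced by GN)."""
    def __init__(self, c_in, c_out, k=3, s=1, num_groups=16):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.gn = nn.GroupNorm(num_groups, c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.gn(self.conv(x)))

class SharedHeadBranch(nn.Module):
    """Two 3x3 CGS blocks shared across all feature-map scales; a per-scale 1x1
    CGS first maps each scale to a common width (an assumption that makes sharing
    possible), and a learnable per-scale factor rescales the box regression to
    compensate for the inconsistent target scales introduced by sharing."""
    def __init__(self, channels=(256, 512, 1024), hidden=256, num_classes=2):
        super().__init__()
        self.reduce = nn.ModuleList(CGS(c, hidden, k=1) for c in channels)
        self.shared = nn.Sequential(CGS(hidden, hidden), CGS(hidden, hidden))
        self.reg = nn.Conv2d(hidden, 4, 1)            # shared box-regression conv
        self.cls = nn.Conv2d(hidden, num_classes, 1)  # shared classification conv
        self.scales = nn.Parameter(torch.ones(len(channels)))

    def forward(self, feats):
        outs = []
        for i, f in enumerate(feats):
            h = self.shared(self.reduce[i](f))
            outs.append((self.reg(h) * self.scales[i], self.cls(h)))
        return outs
```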

2.2.2. SPPF-LSKA Module

The attention mechanism simulates the cognitive process of humans in allocating attention when processing information. By assigning different weights to each element of the input data, it enables the model to focus on the most relevant features, thereby enhancing the expressiveness and generalization ability of the model while reducing unnecessary computations. Large separable kernel attention (LSKA) is a spatial attention mechanism improved on the basis of large kernel attention (LKA). The structures of LKA and LSKA are shown in Figure 7, where the product symbol represents the Hadamard product; the LSKA module is obtained by decomposing the two-dimensional convolution kernels of the depth-wise convolution and depth-wise dilated convolution in the LKA module into two cascaded horizontal and vertical one-dimensional separable convolution kernels, thereby further reducing both memory usage and computational complexity. The LSKA exhibits robust long-range dependence, spatial and channel adaptability, and exceptional scalability to extremely large kernels.
The LSKA initially employs convolution to extract the horizontal and vertical features from the input feature map. Subsequently, dilated convolutions with different dilation rates are utilized to further enhance the feature extraction, which enables a broader receptive field coverage and obtains additional feature information without incurring additional computational costs. The obtained features are ultimately combined through a 1 × 1 convolution layer to generate the final attention map, which is subsequently subjected to the Hadamard product with the original input feature map.
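For illustration, a simplified PyTorch sketch of the LSKA computation described above is given below; the kernel size and dilation rate are illustrative assumptions rather than the exact values used in this paper.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Large Separable Kernel Attention sketch: the 2-D depth-wise and depth-wise
    dilated convolutions of LKA are decomposed into cascaded horizontal/vertical
    1-D depth-wise convolutions, followed by a 1x1 conv producing the attention
    map, which is applied to the input via the Hadamard product."""
    def __init__(self, dim, k=7, dilation=3):
        super().__init__()
        # depth-wise 1-D convolutions (horizontal then vertical)
        self.dw_h = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.dw_v = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        # depth-wise dilated 1-D convolutions for a larger receptive field
        pad = dilation * (k // 2)
        self.dwd_h = nn.Conv2d(dim, dim, (1, k), padding=(0, pad),
                               dilation=(1, dilation), groups=dim)
        self.dwd_v = nn.Conv2d(dim, dim, (k, 1), padding=(pad, 0),
                               dilation=(dilation, 1), groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)  # 1x1 conv producing the attention map

    def forward(self, x):
        attn = self.dw_v(self.dw_h(x))
        attn = self.dwd_v(self.dwd_h(attn))
        attn = self.pw(attn)
        return x * attn  # Hadamard product with the original input
```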
The LSKA module was incorporated after the Concat layer of the SPPF module in this study. Moreover, the BN layer was replaced by a GN layer in the CBS module to generate the CGS module. The SPPF-LSKA module was formed through these aforementioned enhancements. The enhanced structure is shown in Figure 8. The feature fusion ability of the SPPF module can be enhanced through these improvements, reducing the influence of background features on the tea bud and plucking area detection and improving the model’s accuracy in complex environments.

2.2.3. C2f-DCNv2 Module

Traditional convolution performs well in handling static or regular targets; however, it may encounter challenges when dealing with the irregular and variable shapes of tea buds, as well as their significant differences in size. Compared to traditional convolution networks, the deformable convolution network (DCN) exhibits better recognition capabilities for diverse shapes and sizes of tea buds by dynamically adjusting to the spatial and structural characteristics of the target. The deformable convolution network version 1 (DCNv1) introduces learnable offsets to accommodate the deformation of objects in the image; however, the new position of a sample point after offsetting may move beyond the ideal offset range. To address this limitation, in DCNv2, each sample not only learns the offset as in DCNv1, but also modulates it using learned modulation parameters. To be specific, DCNv2 introduces a modulation scalar $\Delta m_k$ in the convolution process, which represents the feature amplitude change at the $k$-th sampling position. This enables the network to prioritize critical features, reduce the interference of irrelevant information, and consequently enhance the model’s performance. The structure of DCNv2 is shown in Figure 9.
The deformation characteristics of the target are more intricate in the deep layers of deep learning networks. Therefore, in this study, the convolution operation in the final C2f layer of the backbone network was replaced by DCNv2, and the C2f-DCNv2 module was proposed. To be specific, the convolution in the bottleneck structure of C2f was replaced by DCNv2, enabling more flexible sampling of the input feature maps and better learning of the target scale and deformations. After the convolution layer, a residual connection was employed to fuse the output feature map with the input feature map. The structure of the improved C2f-DCNv2 is shown in Figure 10, where the addition symbol represents the residual connection.
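For illustration, a minimal PyTorch sketch of a modulated deformable convolution block in the spirit of DCNv2 is given below, assuming a torchvision version whose DeformConv2d accepts a modulation mask; it is a sketch of the underlying operation, not the exact C2f-DCNv2 implementation used in this paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Block(nn.Module):
    """Modulated deformable convolution: per-location offsets and a modulation
    scalar (the delta m_k of the text) are predicted from the input and passed
    to the deformable convolution."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # 2 offsets (x, y) and 1 modulation scalar per kernel position
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, padding=k // 2)
        self.dcn = DeformConv2d(c_in, c_out, k, padding=k // 2)
        nn.init.zeros_(self.offset_mask.weight)  # start as an ordinary convolution
        nn.init.zeros_(self.offset_mask.bias)

    def forward(self, x):
        out = self.offset_mask(x)
        o1, o2, mask = torch.chunk(out, 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)
        mask = torch.sigmoid(mask)  # modulation scalar constrained to [0, 1]
        return self.dcn(x, offset, mask)
```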

2.3. Matching the Tea Bud with Its Plucking Point

After the instance segmentation of the tea images, the tea bud masks and plucking area masks were acquired. However, each tea bud mask and plucking area mask obtained from the instance segmentation is an independent mask; they are not associated with each other. The plucking point and the tea bud posture are what we want in the tea plucking activity. To obtain this information, the plucking area and the tea bud belonging to the same tea shoot should be matched first. The accurate matching of the plucking area and the tea bud is a crucial prerequisite for obtaining the optimal plucking posture. Enhancing the matching accuracy not only improves the overall matching effectiveness, but also contributes significantly to the precise calculation of the tea bud plucking posture. Furthermore, it facilitates high-precision mechanized tea bud harvesting in practice. Two algorithms were proposed to perform the matching: the nearest point matching method and the point in range matching method.

2.3.1. Plucking Point Localization

To achieve the matching of the plucking area and the tea bud, the plucking point should be calculated first. The masks of the plucking areas obtained from the instance segmentation are a series of mask matrices. Every element of a matrix is the probability that the corresponding position in the original image belongs to the plucking area. The coordinates of the plucking point can therefore be obtained by calculating the centroid of the mask matrix of the plucking area. The centroid of the mask matrix can be calculated with Equations (1) and (2).
$x_c = \dfrac{\sum_i p_i x_i}{\sum_i p_i}$  (1)
$y_c = \dfrac{\sum_i p_i y_i}{\sum_i p_i}$  (2)
where $x_c$ and $y_c$ represent the width and height coordinates of the centroid, $x_i$ and $y_i$ represent the width and height coordinates of every element in the mask matrix, and $p_i$ represents the probability that the position of the element belongs to the plucking area.
The plucking point can be obtained from the above calculation, and it is definitely located in the plucking area. To match the plucking area with the tea bud, a feature point of the tea bud area should also be calculated. For the sake of convenience, the central point of the tea bud’s bounding box was chosen as the feature point, which can be calculated with Equations (3) and (4).
$x_t = \dfrac{x_{left\text{-}top} + x_{right\text{-}bottom}}{2}$  (3)
$y_t = \dfrac{y_{left\text{-}top} + y_{right\text{-}bottom}}{2}$  (4)
where $x_{left\text{-}top}$ and $y_{left\text{-}top}$ represent the width and height coordinates of the left-top point of the tea bud bounding box, and $x_{right\text{-}bottom}$ and $y_{right\text{-}bottom}$ represent the width and height coordinates of the right-bottom point of the tea bud bounding box.
The central point of the tea bud bounding box and the centroid of the plucking area are shown in Figure 11.
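For illustration, a minimal NumPy sketch of the centroid calculation in Equations (1) and (2) and the bounding-box center calculation in Equations (3) and (4) is given below; the function names are hypothetical.

```python
import numpy as np

def plucking_point(prob_mask):
    """Centroid of a plucking-area probability mask (Equations (1) and (2)):
    pixel coordinates weighted by the probability of belonging to the area."""
    ys, xs = np.indices(prob_mask.shape)  # row (height) and column (width) indices
    total = prob_mask.sum()
    x_c = (prob_mask * xs).sum() / total
    y_c = (prob_mask * ys).sum() / total
    return x_c, y_c

def bbox_center(box):
    """Central point of a tea bud bounding box (Equations (3) and (4));
    box = (x_left_top, y_left_top, x_right_bottom, y_right_bottom)."""
    x_lt, y_lt, x_rb, y_rb = box
    return (x_lt + x_rb) / 2.0, (y_lt + y_rb) / 2.0
```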

2.3.2. Nearest Point Matching Method

The nearest point matching method (NPM) was proposed in this study to realize the matching of the plucking point and the tea bud. Two point sets were obtained through the above calculation. Point set A is the collection of the plucking points of the tea buds. Point set B is the collection of the central points of the tea buds’ bounding boxes. As shown in Figure 12, the blue points belong to point set A and the red points belong to point set B. According to the analysis of the tea image dataset, the plucking point of a tea bud is closer to the central point of its own bounding box than to the central point of any other tea bud’s bounding box. Therefore, the core idea of the nearest point matching method is to match every member of point set A to the nearest member of point set B. More detailed procedures for the nearest point matching method can be seen in Algorithm 1.
Algorithm 1: Nearest point matching method
Input: Point set A: {(x_a, y_a) | (x_a, y_a) is the centroid of a plucking area};
            Point set B: {(x_b, y_b) | (x_b, y_b) is the central point of a tea bud bounding box}
Output: Result ← List: (((x_i, y_i), (x_j, y_j)) | (x_i, y_i) in A matched with the nearest (x_j, y_j) in B)
1: Result ← zeros (row ← min (rows of A, rows of B), column ← 4)
2: for i = 1, …, n (n ← rows of A)
3:       D ← zeros (row ← rows of B, column ← 1)
4:       for j = 1, …, m (m ← rows of B)
5:             d_j ← distance between (x_a[i], y_a[i]) and (x_b[j], y_b[j])
6:       k ← index of the minimum of D
7:       Result[i] ← ((x_a[i], y_a[i]), (x_b[k], y_b[k]))
8:       B ← B without (x_b[k], y_b[k])
9: Return Result
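For illustration, a Python sketch of Algorithm 1 is given below; it implements the greedy nearest-point pairing described above with hypothetical function and variable names.

```python
import numpy as np

def nearest_point_matching(pluck_pts, bud_centers):
    """Greedy nearest point matching (Algorithm 1): each plucking point in A is
    paired with the nearest still-unmatched tea bud bounding-box center in B."""
    pluck_pts = np.asarray(pluck_pts, dtype=float)      # point set A, shape (n, 2)
    bud_centers = np.asarray(bud_centers, dtype=float)  # point set B, shape (m, 2)
    available = list(range(len(bud_centers)))
    result = []
    for a in pluck_pts:
        if not available:
            break
        d = np.linalg.norm(bud_centers[available] - a, axis=1)
        k = available[int(np.argmin(d))]
        result.append((tuple(a), tuple(bud_centers[k])))
        available.remove(k)  # a matched center cannot be reused
    return result
```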

2.3.3. Point in Range Matching Method

Generally speaking, the centroid of the plucking area is located within the bounding box of the tea bud because the plucking area is part of the tea bud. Therefore, the point in range matching method (PIRM) can be used to match the plucking point with the central point of the tea bud’s bounding box. As shown in Figure 3, the width coordinate x of the plucking point lies between the left and right edges of the tea bud bounding box. Similarly, the height coordinate y of the plucking point lies between the top and bottom edges of the tea bud bounding box. The core idea of the PIRM is to find the tea bud bounding box that satisfies the conditions of Equations (5) and (6).
$x_{b\text{-}left} \leq x_a \leq x_{b\text{-}right}$  (5)
$y_{b\text{-}bottom} \leq y_a \leq y_{b\text{-}top}$  (6)
where $x_a$ and $y_a$ denote the width and height coordinates of the plucking point, $x_{b\text{-}left}$ and $x_{b\text{-}right}$ denote the left and right coordinates of the bounding box, and $y_{b\text{-}bottom}$ and $y_{b\text{-}top}$ denote the bottom and top coordinates of the bounding box. More detailed procedures for the PIRM can be seen in Algorithm 2.
Algorithm 2: Point in range matching method
Input: Point set A: {(x_a, y_a) | (x_a, y_a) is the centroid of a plucking area};
            Point set B: {(x_b-left, x_b-right, y_b-bottom, y_b-top) | the limits of a tea bud bounding box}
Output: Result ← List: (((x_i, y_i), (x_j, y_j)) | (x_i, y_i) in A matched with (x_j, y_j) calculated from B)
1: Result ← zeros (row ← min (rows of A, rows of B), column ← 4)
2: for i = 1, …, n (n ← rows of A)
3:       for j = 1, …, m (m ← rows of B)
4:             if x_b-left[j] < x_a[i] < x_b-right[j] and y_b-bottom[j] < y_a[i] < y_b-top[j]
5:                   Result[i] ← ((x_a[i], y_a[i]), ((x_b-left[j] + x_b-right[j])/2, (y_b-bottom[j] + y_b-top[j])/2))
6:                   B ← B without (x_b-left[j], x_b-right[j], y_b-bottom[j], y_b-top[j])
7:                   break
8:             else
9:                   continue
10: Return Result
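For illustration, a Python sketch of Algorithm 2 is given below, again with hypothetical function and variable names; a matched bounding box is removed from the candidate set so that it cannot be paired twice.

```python
def point_in_range_matching(pluck_pts, bud_boxes):
    """Point in range matching (Algorithm 2): each plucking point is paired with
    the first still-unmatched bounding box whose x- and y-ranges contain it.
    Boxes are given as (x_left, x_right, y_low, y_high) in image coordinates."""
    available = list(range(len(bud_boxes)))
    result = []
    for (x_a, y_a) in pluck_pts:
        for j in list(available):
            x_l, x_r, y_lo, y_hi = bud_boxes[j]
            if x_l < x_a < x_r and y_lo < y_a < y_hi:
                center = ((x_l + x_r) / 2.0, (y_lo + y_hi) / 2.0)
                result.append(((x_a, y_a), center))
                available.remove(j)  # a matched box cannot be reused
                break
    return result
```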

2.4. Plucking Posture Determination

Typically, tea buds grow upwards due to the characteristics of apical dominance and heliotropism. In many studies, the mechanized harvesting of tea leaves was accomplished by vertically descending the plucking end-effector to reach the designated plucking position [35,36,37]. In practice, however, the growth direction of the tea buds is not completely straight up, owing to the influence of the planting position of the tea plant, the growth state of the stem on which the tea bud is situated, and so on. The tea buds located at the edges of the tea bushes, especially, may even grow horizontally. Therefore, if the plucking manipulator simply moves straight down to the plucking point to pick the tea bud, it may fail or even damage the tea plant.
In view of the above problems, a method was proposed to determine the plucking direction of the tea bud. The plucking direction is opposite to the direction of the growth of the tea bud. Hence, the plucking manipulator can move along the plucking direction obtained in this study to achieve better plucking, avoiding errors or even failures caused by a direct downward motion.
Given the above studies, once the plucking point is matched to the central point of the tea bud’s bounding box, the plucking direction of the tea bud can be determined.

2.4.1. Two-Dimensional Plucking Posture Determination

The two-dimensional plucking direction, i.e., the plucking direction in the image plane, was determined first. As shown in Figure 13, the direction from the plucking point to the central point of the tea bud bounding box is the growth direction; inversely, the direction from the central point of the tea bud bounding box to the plucking point is the plucking direction. Figure 13b is a partial enlarged view of the framed region in Figure 13a. Equation (7) gives the formula for the growth direction.
$\overrightarrow{BA} = A - B$  (7)
where A is the coordinate of the central point of the tea bud bounding box and B is the coordinate of the plucking point.

2.4.2. Three-Dimensional Plucking Posture Determination

The two-dimensional plucking direction cannot be used directly in the plucking operation of the tea bud. It is necessary to compute the three-dimensional plucking direction based on the two-dimensional direction obtained from the above study. A depth camera (such as Intel Realsense, Microsoft Kinect, and so on) can be used to obtain the depth information of every point on the tea bud, which can help us to determine the three-dimensional location of every tea bud. The three-dimensional plucking direction can be obtained when the two-dimensional direction line is matched with the spatial coordinates of the corresponding points.
As shown in Figure 14, the central point of the tea bud bounding box is not necessarily located on the tea bud region, and the spatial coordinates of the points on the tea bud region vary considerably. However, the plucking point is definitely located within the plucking area because it is the centroid of this region and the plucking areas are generally approximately rectangular. Meanwhile, the two-dimensional plucking direction line of the tea bud definitely passes through the plucking area. Therefore, the points of intersection of the two-dimensional plucking direction line and the plucking area are first calculated, and then the straight line fitted to these points is used as the three-dimensional plucking direction line.
Equation (8) represents the three-dimensional plucking direction line; its transformed forms are given in Equations (9) and (10).
$\dfrac{x - x_0}{m} = \dfrac{y - y_0}{n} = \dfrac{z - z_0}{p}$  (8)
$x = \dfrac{m}{p}(z - z_0) + x_0 = k_1 z + b_1$  (9)
$y = \dfrac{n}{p}(z - z_0) + y_0 = k_2 z + b_2$  (10)
where the parameters $k_1$, $b_1$, $k_2$, and $b_2$ are given by Equations (11)–(14).
$k_1 = \dfrac{m}{p}$  (11)
$b_1 = x_0 - \dfrac{m}{p} z_0$  (12)
$k_2 = \dfrac{n}{p}$  (13)
$b_2 = y_0 - \dfrac{n}{p} z_0$  (14)
Hence, the spatial line can be regarded as the intersection of the two planes defined by Equations (9) and (10). The least-squares method was utilized to fit the spatial line; the residual sums of squares were first calculated as in Equations (15) and (16).
$R_1 = \sum_{i=1}^{n}\left(x_i - k_1 z_i - b_1\right)^2$  (15)
$R_2 = \sum_{i=1}^{n}\left(y_i - k_2 z_i - b_2\right)^2$  (16)
where $n$ represents the number of points. To minimize the residuals, the partial derivatives of Equations (15) and (16) with respect to the parameters were set to zero, as in Equations (17)–(20).
$\sum_{i=1}^{n} 2\left(x_i - k_1 z_i - b_1\right)\left(-z_i\right) = 0$  (17)
$\sum_{i=1}^{n} 2\left(x_i - k_1 z_i - b_1\right)\left(-1\right) = 0$  (18)
$\sum_{i=1}^{n} 2\left(y_i - k_2 z_i - b_2\right)\left(-z_i\right) = 0$  (19)
$\sum_{i=1}^{n} 2\left(y_i - k_2 z_i - b_2\right)\left(-1\right) = 0$  (20)
From Equations (17)–(20), the fitted values of $k_1$, $b_1$, $k_2$, and $b_2$ can be calculated as in Equations (21)–(24).
$k_1 = \dfrac{n\sum_{i=1}^{n} x_i z_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} z_i}{n\sum_{i=1}^{n} z_i^2 - \left(\sum_{i=1}^{n} z_i\right)^2}$  (21)
$b_1 = \dfrac{\sum_{i=1}^{n} x_i - k_1 \sum_{i=1}^{n} z_i}{n}$  (22)
$k_2 = \dfrac{n\sum_{i=1}^{n} y_i z_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} z_i}{n\sum_{i=1}^{n} z_i^2 - \left(\sum_{i=1}^{n} z_i\right)^2}$  (23)
$b_2 = \dfrac{\sum_{i=1}^{n} y_i - k_2 \sum_{i=1}^{n} z_i}{n}$  (24)
From the above equations, the fitted line of the three-dimensional plucking direction can be acquired.
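For illustration, a NumPy sketch of the least-squares fit in Equations (21)–(24) is given below; it takes the three-dimensional coordinates of the intersection points and returns the fitted parameters together with a normalized direction vector. The function name is hypothetical.

```python
import numpy as np

def fit_plucking_direction(points_3d):
    """Least-squares fit of the 3-D plucking direction line x = k1*z + b1,
    y = k2*z + b2 (Equations (21)-(24)) from the 3-D coordinates of the
    intersection points of the 2-D direction line and the plucking area."""
    pts = np.asarray(points_3d, dtype=float)  # shape (n, 3): columns x, y, z
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    n = len(pts)
    denom = n * np.sum(z * z) - np.sum(z) ** 2
    k1 = (n * np.sum(x * z) - np.sum(x) * np.sum(z)) / denom
    b1 = (np.sum(x) - k1 * np.sum(z)) / n
    k2 = (n * np.sum(y * z) - np.sum(y) * np.sum(z)) / denom
    b2 = (np.sum(y) - k2 * np.sum(z)) / n
    # direction vector of the fitted line, proportional to (k1, k2, 1)
    direction = np.array([k1, k2, 1.0])
    return (k1, b1, k2, b2), direction / np.linalg.norm(direction)
```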
Once the three-dimensional plucking direction is obtained, the movement direction of the plucking manipulator can be calculated using a coordinate transformation. The manipulator can be directed to move to the plucking point along the plucking direction to pluck the tea bud.

2.5. General Summary of the Method

To summarize, this study focuses on determining the tea bud plucking point and plucking posture, which are the crucial aspects of mechanized tea plucking implementation. The workflow of the proposed method is shown in Figure 15. Firstly, a dataset of tea bud images was established by capturing photographs in the tea garden and conducting pre-processing procedures. In this stage, the particular focus lies in annotating both the tea bud and the plucking area together. Secondly, the improved YOLOv8-seg network was employed to accurately segment the tea bud and plucking area. However, it is important to note that while achieving the precise recognition and segmentation of the tea bud and plucking area is desirable, it is not the primary focus of this study. Instead, the main objective in this step was to extract feature points and plucking points through a series of image operations. Finally, the unmatched point sets obtained from the previous step were paired using NPM or PIRM, which were proposed in this study. The connected lines of these pairs represent the two-dimensional plucking directions of the tea buds. Subsequently, the fitted line of the intersection points of the two-dimensional plucking direction and plucking area was calculated as the three-dimensional plucking direction.

3. Results and Discussion

3.1. Evaluation Metrics

Two aspects of performance evaluation were included in this study: one is the evaluation of the detection and segmentation of the tea buds and plucking area, and the other is the evaluation of the feature point matching.
The evaluation of the detection and segmentation was mainly performed by evaluating the results of the YOLO network used in this study. Precision, recall, F1 score, and GFLOPs were included in the evaluation metrics. Precision, calculated with Formula (25), is the ratio of actual true samples to all samples predicted as true, and represents the accuracy of the prediction of true samples. Recall, calculated with Formula (26), is the ratio of samples correctly predicted as true among all actual true samples. The F1 score, calculated with Formula (27), is a balanced indicator of precision and recall.
$\text{precision} = \dfrac{TP}{TP + FP}$  (25)
$\text{recall} = \dfrac{TP}{TP + FN}$  (26)
$F1\ \text{score} = \dfrac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$  (27)
where TP is the number of actual true samples which were predicted as true samples, FP is the number of false samples which were predicted as true samples, and FN is the number of actual true samples which were predicted as false samples.
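For illustration, a minimal sketch of Formulas (25)–(27) is given below; the counts in the example are made up.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 score from Formulas (25)-(27)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example with made-up counts: 80 true positives, 20 false positives and
# 20 false negatives give precision = recall = F1 = 0.8.
print(detection_metrics(80, 20, 20))
```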
GFLOPs represent the total number of floating-point operations executed by a model during inference or training, serving as a metric of model complexity. A higher GFLOPs value generally indicates that the model possesses greater computational complexity and requires more computational resources for processing.
The accuracy and influence of feature point matching on plucking efficiency was the main factor considered in the evaluation of the feature point matching. Therefore, matching accuracy and matching time were proposed as the evaluation metrics, which were all obtained using the statistical analysis of the matching test data.

3.2. Model Training of LDS-YOLOv8

The LDS-YOLOv8 deep learning network was built on the PyTorch framework, and Windows 10 with 128 GB of RAM, an Intel(R) Xeon(R) E5-2680 v4 @ 2.40 GHz CPU, and an NVIDIA RTX A6000 48 GB GPU was used as the training platform in this study. The software used in the training and testing processes included Python 3.8, CUDA 11.4, cuDNN 8.2.2, etc. Considering the balance between computer performance and training efficiency, the batch size was set to 24. The initial learning rate and final learning rate were both set to 0.01.
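For illustration, a hedged sketch of a comparable training call using the Ultralytics API is given below; the dataset configuration file, epoch count, and image size are assumptions not stated in the text, and the customized LDS-YOLOv8-seg modules would in practice be registered through a custom model definition rather than the stock yolov8x-seg checkpoint.

```python
from ultralytics import YOLO

# Hypothetical starting point; LDS-YOLOv8-seg itself is not in the stock release.
model = YOLO("yolov8x-seg.pt")
model.train(
    data="tea_buds.yaml",  # hypothetical dataset config (train/val paths, 2 classes)
    epochs=300,            # epoch count is an assumption; not stated in the text
    batch=24,              # batch size from the paper
    imgsz=1024,            # assumed input size close to the rescaled image width
    lr0=0.01,              # initial learning rate from the paper
    lrf=1.0,               # lrf is a factor in Ultralytics; 1.0 keeps the final rate at 0.01
)
```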

3.3. Evaluation of Tea Bud Recognition and Plucking Area Detection

3.3.1. Detection and Segmentation Results Using the Improved YOLOv8 Model

The YOLOv8 network architecture provides five different sizes of network models. As the model size increases, there is a corresponding improvement in detection and segmentation accuracy. However, this enhancement comes at the cost of increased computation time. For tea plucking, the calculation time of models of different sizes can all meet their actual needs; therefore, the improvements and tests in this study were all conducted based on the YOLOv8x-seg model for the best accuracy. The overall situations of the training process of the original and improved YOLOv8-seg model were analyzed, and Figure 16 shows the trend in the change in the loss value during training, including box loss, classification loss, distribution focal loss, and segmentation loss. It can be seen that both the original and improved models exhibited a rapid decrease in loss value with the increase in the epoch, and the loss value became stable eventually, which indicated that the training progressed well and that the results were convergent. Additionally, the improved model achieved convergence earlier than the original model, which demonstrated the improved efficiency of its training process.
Table 1, Table 2, Table 3 present the results of object detection and instance segmentation acquired using the YOLOv8x-seg and LDS-YOLOv8x-seg models for tea buds, plucking area, and overall situation, respectively. As shown in Table 1, the precision and mAP of the LDS-YOLOv8x-seg model outperformed those of the YOLOv8x-seg model in terms of tea bud detection. As observed in Table 2, the precision, recall and mAP of the LDS-YOLOv8x-seg model all outperformed those of the YOLOv8x-seg model in terms of plucking area detection. From a comprehensive perspective, as shown in Table 3, the precision, recall, and mAP of the object detection achieved using the LDS-YOLOv8x-seg model surpassed those of the YOLOv8x-seg model. In respect to the instance segmentation of the tea buds, as shown in Table 1, both the precision and mAP of the LDS-YOLOv8x-seg model outperformed those of the YOLOv8x-seg model, with a 3.6% increase in precision. Regarding the instance segmentation of the plucking area, as observed in Table 2, the precision of the LDS-YOLOv8x-seg model was marginally lower than that of the original YOLOv8x-seg model; however, both the recall and mAP witnessed significant enhancements. Comprehensively, as depicted in Table 3, in terms of instance segmentation, the LDS-YOLOv8x-seg model manifested notable improvements in both precision and mAP while maintaining a substantially unchanged recall compared to the original YOLOv8x-seg. The results presented above demonstrate that the improved model of this paper exhibited superior performance in detection and segmentation on the tea buds and plucking area compared to the original model.
It can also be seen from Table 1 and Table 2 that both the original and improved models performed better on the detection and segmentation of the tea bud than on those of the plucking area, with the precision, recall, and mAP of tea bud detection and segmentation outperforming those of plucking area detection and segmentation. The likely reason is that the shape characteristics of the tea bud are well defined, while the plucking area is smaller than the tea bud and its color and shape are less distinct from the surrounding area. Moreover, it is undeniable that the tea bud and plucking area of every tea shoot were labeled together during image labeling, which is beneficial for the detection of the plucking area.
In addition, the precision, recall, and mAP results of tea bud detection and segmentation exhibited smaller variance; however, for the plucking area detection and segmentation, the results were notably different. Analyzing the reason for this, the plucking area is small, which makes it difficult to detect and segment. Meanwhile, even a small error can lead to a large scaling deviation because of its small dimension.

3.3.2. Performance Comparison Between Different Networks

To further illustrate the effectiveness of this study, the improved algorithm in this paper was compared to other algorithms. Compared to the existing research on tea bud target recognition, there is a scarcity of studies focusing on the instance segmentation of both the tea buds and plucking area. This aspect holds significant importance, as it serves as a crucial prerequisite for determining the optimal plucking posture and identifying the precise plucking points.
Comparative analyses with the algorithm from a relevant study and with the typical instance segmentation network Mask R-CNN were conducted. Chen et al. [31] achieved the detection and segmentation of tea buds using a combination of the Faster R-CNN and FCN-16s models. They employed the two-stage target detection network Faster R-CNN for tea bud recognition, followed by the application of the trained FCN-16s network to accomplish precise tea bud segmentation. Mask R-CNN networks with different backbones were also evaluated in this comparison. As presented in Table 4, comparing these methods with the method introduced in this paper, it is observed that the precision of tea bud detection achieved using our method notably surpasses theirs.
In terms of the overall recognition and segmentation effect, the average precision of the method proposed in this paper was also notably higher than others. The average recall of this study did not surpass that of others; however, in the actual mechanized tea bud plucking operation, precision holds greater significance than recall due to the fact that the mechanized plucking operations tend to be repeated in round trips.
The method in this paper simultaneously segmented the tea bud and plucking area, while their method only segmented the plucking area in terms of instance segmentation [31]. In addition, the one-stage YOLO algorithm exhibits a notable efficiency advantage when compared to the two-stage instance segmentation algorithm Mask RCNN.

3.3.3. Ablation Experiments

To investigate the effect of each enhancement strategy, including improved head, SPPF-LSKA, and C2f-DCNv2, on improving the model’s object detection and instance segmentation performance, ablation experiments were performed using the validation dataset in this study. The results are presented in Table 5. It is apparent from the table that, in comparison to the original YOLOv8-seg model, the improved model demonstrates a significantly enhanced performance effect. Analysis of the three individual enhancements indicates that the network incorporating the improved head module exhibited superior performance in both object detection and instance segmentation tasks. Specifically, in object detection, precision improved by 1.3%, recall increased by 1.2%, and mAP improved by 1.6% relative to the original network model. In terms of instance segmentation, both precision and mAP showed significant improvements, while the model’s computational cost was markedly reduced, achieving a 12.54% decrease compared to the original YOLOv8-seg. This further demonstrated the substantial impact of the improved head module on model lightweighting. For the improved model with the SPPF-LSKA module, it can be seen that the module has a significant improvement in detection and instance segmentation performance; all the indicators in the table were increased compared to the original model. The network enhanced with C2f-DCNv2 also exhibited superior performance in both instance segmentation and object detection, with the recall and mAP increasing by 1.2% and 0.8%, respectively, in detection, and all of the indicators improved during instance segmentation.
Finally, the enhanced model incorporating all three improvement modules exhibited a significantly superior enhancement effect compared to the original model and the improved model with the respective module. Among the seven evaluation indicators shown in the table, the improved model incorporating all three improvement modules outperformed the others in six of them. The overall detection and segmentation performance of the improved model for the tea buds and plucking area was effectively enhanced, with the precision of detection and segmentation improving by 2.1% and 1.3%, respectively. Furthermore, the computational cost of the entire model decreased by 12.77% compared to the original model.
The aforementioned findings demonstrate the efficacy of the YOLOv8-seg model’s enhancements, further proving that the LDS-YOLOv8-seg model is better suited for the detection and segmentation of the tea buds and plucking area.
Figure 17 shows the detection and segmentation results of the tea buds and plucking areas achieved using the original and improved YOLOv8-seg model. The figure demonstrates that the improved method exhibited superior recognition and segmentation capabilities.

3.4. Evaluation of the Matching and Plucking Posture Determination

Matching accuracy and matching time are the key factors that affect the success and efficiency of tea bud plucking. The matching of the central points of the tea buds’ bounding boxes and the centroids of the plucking areas using the NPM and the PIRM was carried out on 200 images of the test dataset using the LDS-YOLOv8-seg model in this study. The matching accuracy was calculated by counting the number of identified feature points and the number of points matched correctly. The results of matching accuracy and matching time are listed in Table 6. According to the statistical analysis, a total of 788 pairs of feature points were matched using the NPM, of which 712 pairs were correctly matched, giving an accuracy of 90.355%, while a total of 779 pairs of feature points were matched using the PIRM, of which 773 pairs were correctly matched, giving an accuracy of 99.229%. The NPM produced more feature point matching pairs; however, it also produced more false matches. In terms of matching time, the average consumption time and standard deviation for each image with the NPM were 0.175 ms and 0.380 ms, respectively, while the corresponding results for the PIRM were 2.363 ms and 0.995 ms.
As can be seen from the above results, the average matching time using the NPM is shorter than that using the PIRM, but its matching accuracy is lower. In fact, relative to the plucking movement time of the tea bud plucking apparatus, the average matching times of both methods fully meet the requirements for tea plucking. From a comprehensive point of view, the PIRM has a better effect on the matching of the tea buds and plucking areas.
Figure 18 shows the matching results of the tea buds and plucking area.

3.5. Analysis of Failure Cases

Failure cases were encountered in the instance segmentation and feature point matching. The difficulty of detecting small targets is the main reason for the failure of the tea bud and plucking area identification. In addition, the recognition accuracy of the network model is also affected by the large difference between the size of the tea buds and the plucking area. According to the statistics of the pixel areas of the tea buds and plucking area in the images of the training dataset used in this study, as shown in Table 7, there is a large gap between the two kinds of pixel areas.
Analysis of the causes of failure in the feature point matching process was the main work in this study. In the failure cases of the PIRM, the main reason was that the centroid of the plucking area and the central point of the tea bud bounding box were located, together or separately, in the bounding box of another tea bud. As shown in Figure 19, the plucking point of the NO.1 tea bud was located in the bounding box of the NO.2 tea bud and was closer to the central point of the bounding box of the NO.2 tea bud. In the failure cases of the NPM, the main reason was that the plucking point of one tea bud was actually closer to the central point of another tea bud’s bounding box than to that of its own. As can be seen in Figure 20a, the plucking point of the NO.1 tea bud was closer to the central point of the bounding box of the NO.2 tea bud than to that of the NO.1 tea bud.
All in all, the reason for the mistakes in feature point matching was that the tea buds to be plucked were growing too close together. In addition, after a pair of points was successfully matched in the algorithm, they were removed from their point sets, so once a pair of points was incorrectly matched, a second matching error followed, as shown in Figure 20a, and three or more pairs of points could even be matched incorrectly at the same time, as shown in Figure 20b. Furthermore, the matching sequence also affects the matching result. As shown in Figure 20a, if the matching had started from the central point of the No.1 tea bud’s bounding box, there would have been no error; conversely, because the matching started from the central point of the No.2 tea bud’s bounding box, both matches were wrong. In particular, as shown in Figure 20b, all three matches were incorrect because the matching started from the central point of the NO.3 tea bud’s bounding box. However, owing to the random positions of the tea buds in an image, a fixed matching sequence could not be set in advance.

4. Conclusions

(1) An improved YOLOv8-seg model was proposed for the detection and instance segmentation of tea buds and their plucking area, aiming at the automatic plucking of the tea buds. Incorporating the improved head, SPPF-LSKA, and C2f-DCNv2 modules, the LDS-YOLOv8-seg model demonstrated superior performance in detecting and segmenting tea buds and the plucking area with detection precision, recall, and mAP values of 0.818, 0.761, and 0.835, as well as segmentation precision, recall, and mAP values of 0.739, 0.68, and 0.719. Furthermore, the computational cost decreased by 12.77% compared to the original model.
(2) It is a critical prerequisite to locate the plucking point of tea for automatic plucking. The centroid of the plucking area was computed by mapping the probability of each pixel in the mask image to the coordinates of its corresponding pixel in the original image. The proposed method is more accurate and versatile compared to traditional plucking point localization methods.
(3) In order to carry out automatic tea plucking more accurately, a plucking posture determination method was developed in this study. The plucking point and the central point of the tea bud bounding box were identified first, and then the matching of these points was achieved using either the PIRM or the NPM proposed in this study. Two-dimensional plucking posture can be determined using the matched points. Combined with three-dimensional information, three-dimensional plucking posture can be obtained. The matching effects of the PIRM and the NPM were also compared, and it is shown that the PIRM performed better, with 99.229% matching accuracy and an average consumption time of 2.363 milliseconds.
(4) The failure cases in the feature points matching process were analyzed in this study. The main reason for the mismatch was that the tea buds that needed to be plucked were growing too close together. In addition, the matching order of different pairs of points on the same image also affected the matching accuracy.
As discussed previously, for tea leaves that grow close together, the method proposed in this paper is susceptible to incorrect matching, which consequently hinders the accurate determination of the plucking posture of the tea bud. Therefore, our future research will investigate a more refined algorithm that integrates color and shape features for matching tea bud feature points, aiming to address the failures in feature point matching caused by the matching sequence. Another important research task is to explore the integration of the tea plucking posture determination method proposed in this paper into a tea bud plucking apparatus for field experiments.

Author Contributions

Conceptualization, C.D.; methodology, C.D.; software, C.D. and Z.Z.; validation, C.H., T.T. and Z.Z.; formal analysis, W.W.; investigation, C.D. and T.T.; data curation, C.H.; writing—original draft preparation, C.D. and C.H.; writing—review and editing, T.T.; visualization, W.L.; supervision, W.L.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Province Modern Agricultural Industry Technology System Innovation Team Construction Project (Tea) (grant No. 2023KJ120), the “14th Five-Year Plan” Guangdong Province Agricultural Science and Technology Innovation in the Ten main Directions “Unveiling the List of Hanging” Project—Lingnan Characteristic Fruit Intelligent Harvesting Technology (grant No. 2022SDZG03), and the 2023 Guangzhou Key Research and Development Project (grant No. 2023B01J2002).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Han, Y.; Xiao, H.; Qin, G.; Song, Z.; Ding, W.; Mei, S. Developing Situations of Tea Plucking Machine. Engineering 2014, 6, 268–273. [Google Scholar] [CrossRef]
  2. Motokura, K.; Takahashi, M.; Ewerton, M.; Peters, J. Plucking Motions for Tea Harvesting Robots Using Probabilistic Movement Primitives. IEEE Robot. Autom. Lett. 2020, 5, 3275–3282. [Google Scholar] [CrossRef]
  3. Han, Y.; Xiao, H.; Song, Y.; Ding, Q. Design and Evaluation of Tea-Plucking Machine for Improving Quality of Tea. Appl. Eng. Agric. 2019, 35, 979–986. [Google Scholar] [CrossRef]
  4. Wang, X.; Han, C.; Wu, W.; Xu, J.; Zhang, Q.; Chen, M.; Hu, Z.; Zheng, Z. Fundamental Understanding of Tea Growth and Modeling of Precise Tea Shoot Picking Based on 3-D Coordinate Instrument. Processes 2021, 9, 1059. [Google Scholar] [CrossRef]
  5. Li, Y.; Wu, S.; He, L.; Tong, J.; Zhao, R.; Jia, J.; Chen, J.; Wu, C. Development and field evaluation of a robotic harvesting system for plucking high-quality tea. Comput. Electron. Agric. 2023, 206, 107659. [Google Scholar] [CrossRef]
  6. Zhu, Y.; Wu, C.; Tong, J.; Chen, J.; He, L.; Wang, R.; Jia, J. Deviation Tolerance Performance Evaluation and Experiment of Picking End Effector for Famous Tea. Agriculture 2021, 11, 128. [Google Scholar] [CrossRef]
  7. Li, Y.; He, L.; Jia, J.; Lv, J.; Chen, J.; Qiao, X.; Wu, C. In-field tea shoot detection and 3D localization using an RGB-D camera. Comput. Electron. Agric. 2021, 185, 106149. [Google Scholar] [CrossRef]
  8. Liu, Z.; Zhou, Z. Research progress of tea image feature extraction and its application. J. Green Sci. Technol. 2021, 23, 207–209. [Google Scholar] [CrossRef]
  9. Tian, J.; Zhu, H.; Liang, W.; Chen, J.; Wen, F.; Long, Z. Research on the Application of Machine Vision in Tea Autonomous Picking. J. Phys. Conf. Ser. 2021, 1952, 022063. [Google Scholar] [CrossRef]
  10. Wei, B. Research on tea leaf recognition based on the color and shape features. Fujian Tea 2016, 38, 16–17. [Google Scholar] [CrossRef]
  11. Thangavel, S.K.; Murthi, M. A semi automated system for smart harvesting of tea leaves. In Proceedings of the 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 January 2017; pp. 1–10. [Google Scholar]
  12. Chen, J.; Chen, Y.; Jin, X.; Che, J.; Gao, F.; Li, N. Research on a Parallel Robot for Tea Flushes Plucking. In Proceedings of the 2015 International Conference on Education, Management, Information and Medicine, Shenyang, China, 24–26 April 2015; pp. 22–26. [Google Scholar]
  13. Wu, X.; Tang, X.; Zhang, F.; Gu, J. Tea buds image identification based on lab color model and K-means clustering. J. Chin. Agric. Mech. 2015, 36, 161–164, 179. [Google Scholar] [CrossRef]
  14. Karunasena, G.M.K.B.; Priyankara, H. Tea Bud Leaf Identification by Using Machine Learning and Image Processing Techniques. Int. J. Sci. Eng. Res. 2020, 11, 624–628. [Google Scholar] [CrossRef]
  15. Wang, C.; Tang, Y.; Zou, X.; Luo, L.; Chen, X. Recognition and Matching of Clustered Mature Litchi Fruits Using Binocular Charge-Coupled Device (CCD) Color Cameras. Sensors 2017, 17, 2564. [Google Scholar] [CrossRef] [PubMed]
  16. Fu, L.; Feng, Y.; Wu, J.; Liu, Z.; Gao, F.; Majeed, Y.; Al-Mallahi, A.; Zhang, Q.; Li, R.; Cui, Y. Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model. Precis. Agric. 2021, 22, 754–776. [Google Scholar] [CrossRef]
  17. Gené-Mola, J.; Vilaplana, V.; Rosell-Polo, J.R.; Morros, J.-R.; Ruiz-Hidalgo, J.; Gregorio, E. Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities. Comput. Electron. Agric. 2019, 162, 689–698. [Google Scholar] [CrossRef]
  18. Mirbod, O.; Choi, D.; Heinemann, P.H.; Marini, R.P.; He, L. On-tree apple fruit size estimation using stereo vision with deep learning-based occlusion handling. Biosyst. Eng. 2023, 226, 27–42. [Google Scholar] [CrossRef]
  19. Tang, Y.; Zhou, H.; Wang, H.; Zhang, Y. Fruit detection and positioning technology for a Camellia oleifera C. Abel orchard based on improved YOLOv4-tiny model and binocular stereo vision. Expert Syst. Appl. 2023, 211, 118573. [Google Scholar] [CrossRef]
  20. Rong, J.; Wang, P.; Wang, T.; Hu, L.; Yuan, T. Fruit pose recognition and directional orderly grasping strategies for tomato harvesting robots. Comput. Electron. Agric. 2022, 202, 107430. [Google Scholar] [CrossRef]
  21. Wang, Y.; Lv, J.; Xu, L.; Gu, Y.; Zou, L.; Ma, Z. A segmentation method for waxberry image under orchard environment. Sci. Hortic. 2020, 266, 109309. [Google Scholar] [CrossRef]
  22. Tang, Y.; Qiu, J.; Zhang, Y.; Wu, D.; Cao, Y.; Zhao, K.; Zhu, L. Optimization strategies of fruit detection to overcome the challenge of unstructured background in field orchard environment: A review. Precis. Agric. 2023, 24, 1183–1219. [Google Scholar] [CrossRef]
  23. Li, X.; Liu, B.; Shi, Y.; Xiong, M.; Ren, D.; Wu, L.; Zou, X. Efficient three-dimensional reconstruction and skeleton extraction for intelligent pruning of fruit trees. Comput. Electron. Agric. 2024, 227, 109554. [Google Scholar] [CrossRef]
  24. Chen, B.; Yan, J.; Wang, K. Fresh Tea Sprouts Detection via Image Enhancement and Fusion SSD. J. Control Sci. Eng. 2021, 2021, 6614672. [Google Scholar] [CrossRef]
  25. Chen, C.; Lu, J.; Zhou, M.; Yi, J.; Liao, M.; Gao, Z. A YOLOv3-based computer vision system for identification of tea buds and the picking point. Comput. Electron. Agric. 2022, 198, 107116. [Google Scholar] [CrossRef]
  26. Gui, Z.; Chen, J.; Li, Y.; Chen, Z.; Wu, C.; Dong, C. A lightweight tea bud detection model based on Yolov5. Comput. Electron. Agric. 2023, 205, 107636. [Google Scholar] [CrossRef]
  27. Wang, T.; Zhang, K.; Zhang, W.; Wang, R.; Wan, S.; Rao, Y.; Jiang, Z.; Gu, L. Tea picking point detection and location based on Mask-RCNN. Inf. Process. Agric. 2021, 10, 267–275. [Google Scholar] [CrossRef]
  28. Yang, H.; Chen, L.; Chen, M.; Ma, Z.; Deng, F.; Li, M.; Li, X. Tender Tea Shoots Recognition and Positioning for Picking Robot Using Improved YOLO-V3 Model. IEEE Access 2019, 7, 180998–181011. [Google Scholar] [CrossRef]
  29. Yang, H.; Chen, L.; Ma, Z.; Chen, M.; Zhong, Y.; Deng, F.; Li, M. Computer vision-based high-quality tea automatic plucking robot using Delta parallel manipulator. Comput. Electron. Agric. 2021, 181, 105946. [Google Scholar] [CrossRef]
  30. Xu, W.; Zhao, L.; Li, J.; Shang, S.; Ding, X.; Wang, T. Detection and classification of tea buds based on deep learning. Comput. Electron. Agric. 2022, 192, 106547. [Google Scholar] [CrossRef]
  31. Chen, Y.-T.; Chen, S.-F. Localizing plucking points of tea leaves using deep convolutional neural networks. Comput. Electron. Agric. 2020, 171, 105298. [Google Scholar] [CrossRef]
  32. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online Image Annotation and Applications. Proc. IEEE 2010, 98, 1467–1484. [Google Scholar] [CrossRef]
  33. Sapkota, R.; Ahmed, D.; Karkee, M. Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments. Artif. Intell. Agric. 2024, 13, 84–99. [Google Scholar] [CrossRef]
  34. Wu, Y.; He, K. Group Normalization. Int. J. Comput. Vis. 2020, 128, 742–755. [Google Scholar] [CrossRef]
  35. Lin, G.; Chen, D.; Chen, J.; Zhong, K.; Cai, L.; Lin, F.; Zheng, X.; Li, W. Design and testing of a Cutting-collecting integrated end-effector for tea picking. Mech. Electr. Eng. Technol. 2023, 52, 42–45+172. [Google Scholar] [CrossRef]
  36. Zhang, Z.; Zhu, L.; Lin, G.; Zhang, S.; Guan, J. Research progress on key technologies of the famous tea picking end effectors. Modern Agric. Equip. 2022, 43, 7–12. [Google Scholar] [CrossRef]
  37. Chen, G.; Mao, B.; Zhang, Y. Research on key technologies of famous tea picking robot. J. Chin. Agric. Mech. 2023, 44, 174–179. [Google Scholar] [CrossRef]
Figure 1. Situation of the image acquisition.
Figure 2. Annotation of the tea image: (a) original image; (b) region of tea bud; (c) region of plucking area; (d) annotated image.
Figure 3. The structure of the original YOLOv8-seg network.
Figure 4. The structure of the LDS-YOLOv8-seg network.
Figure 5. The original YOLOv8-seg head structure.
Figure 6. The improved YOLOv8-seg head structure.
Figure 7. The structure of the LKA module and LSKA module: (a) LKA module; (b) LSKA module.
Figure 8. The structure of the SPPF-LSKA module.
Figure 9. The structure of the DCNv2 module.
Figure 10. The structure of the improved C2f-DCNv2.
Figure 11. Schematic diagram of the feature points determination.
Figure 12. Nearest point matching method (NPM): (a) the original image; (b) two point sets of feature points; (c) the matched point sets of the image.
Figure 13. Determination of two-dimensional plucking posture: (a) plucking posture; (b) the growth direction and plucking direction.
Figure 14. Feature points of target regions: (a) result of the two-dimensional image; (b) result of the depth image (meters).
Figure 15. Flowchart of the determining method of the tea bud's plucking point and plucking posture.
Figure 16. Curves of loss value changing for original and improved models: (a) box loss; (b) classification loss; (c) distribution focal loss; (d) segmentation loss.
Figure 17. Detection and segmentation results of the tea bud and the plucking area: (a) original image; (b) result of the YOLOv8-seg; (c) result of the LDS-YOLOv8-seg.
Figure 18. Matching results of the tea bud and plucking area.
Figure 19. Failure case of point in range matching method.
Figure 20. Failure case of nearest point matching method.
Table 1. Performance of the detection and segmentation of the tea buds.

Network Model | Precision (B) | Recall (B) | mAP (B) | Precision (M) | Recall (M) | mAP (M)
YOLOv8x-seg | 0.799 | 0.875 | 0.922 | 0.806 | 0.870 | 0.922
LDS-YOLOv8x-seg | 0.835 | 0.859 | 0.925 | 0.842 | 0.852 | 0.924
Table 2. Performance of the detection and segmentation of the plucking area.

Network Model | Precision (B) | Recall (B) | mAP (B) | Precision (M) | Recall (M) | mAP (M)
YOLOv8x-seg | 0.794 | 0.618 | 0.707 | 0.645 | 0.492 | 0.499
LDS-YOLOv8x-seg | 0.800 | 0.664 | 0.745 | 0.635 | 0.508 | 0.514
Table 3. Performance of the overall detection and segmentation situation.

Network Model | Precision (B) | Recall (B) | mAP (B) | Precision (M) | Recall (M) | mAP (M)
YOLOv8x-seg | 0.797 | 0.747 | 0.815 | 0.726 | 0.681 | 0.710
LDS-YOLOv8x-seg | 0.818 | 0.761 | 0.835 | 0.739 | 0.680 | 0.719
Table 4. Results of different networks.

Method | Precision of Tea Bud Detection | Average Precision of Overall Performance | Average Recall of Overall Performance | F1 Score of Overall Performance
Chen et al. [31] | 0.727 | 0.668 | 0.777 | /
Mask R-CNN with Resnet 50 | 0.734 | 0.666 | 0.724 | 0.694
Mask R-CNN with Resnet 101 | 0.693 | 0.617 | 0.713 | 0.662
Mask R-CNN with Resnet 152 | 0.724 | 0.662 | 0.720 | 0.689
This paper | 0.835 | 0.739 | 0.680 | 0.708
Table 5. Results of the ablation experiments.

Improved Head | SPPF-LSKA | C2f-DCNv2 | Precision (B) | Recall (B) | mAP (B) | Precision (M) | Recall (M) | mAP (M) | GFLOPs
- | - | - | 0.797 | 0.747 | 0.815 | 0.726 | 0.681 | 0.710 | 344.5
✓ | - | - | 0.810 | 0.759 | 0.831 | 0.735 | 0.674 | 0.714 | 301.2
- | ✓ | - | 0.798 | 0.747 | 0.817 | 0.730 | 0.681 | 0.715 | 345.8
- | - | ✓ | 0.793 | 0.759 | 0.823 | 0.731 | 0.688 | 0.718 | 342.5
✓ | ✓ | ✓ | 0.818 | 0.761 | 0.835 | 0.739 | 0.680 | 0.719 | 300.5
Table 6. Accuracy and time of the feature points matching.

Matching Algorithm | Matching Accuracy | Average Matching Time (ms) | Standard Deviation (ms)
NPM | 90.355% | 0.175 | 0.380
PIRM | 99.229% | 2.363 | 0.995
Table 7. Comparison of the pixel area of the tea bud and the plucking area.

Region | Maximum Pixel Area | Minimum Pixel Area | Average Pixel Area
Tea bud | 171,662.354 | 974.700 | 16,687.086
Plucking area | 5065.741 | 35.070 | 475.799