1. Introduction
China is a major producer and consumer of vegetables. Its vegetable sown area and production account for 52.25% and 58.31% of the world’s total planted area and production, respectively, ranking first in the world [1]. With the development of modern facility agriculture, the scale of intensive vegetable production has expanded. Centralized factory nurseries have become an inevitable trend and are widely used in agricultural production around the world [2]. However, in factory nursery production, the seedling success rate ranges from 80% to 95%, and the main causes of failure include missing seedlings and diseased seedlings [3]. The transplanting, culling, and replenishment of seedlings before they leave the factory are key steps in determining the quality and yield of vegetable seedlings. There is still relatively little research on how to pick and replenish weak seedlings, and most of the related work is performed manually. However, the high temperature, high humidity, and high degree of confinement in the greenhouse make it extremely difficult for workers to pick and replenish seedlings, and manual work also suffers from high subjectivity, low efficiency, and high cost [4]. Experience alone cannot accurately predict the growth of seedlings or target replenishment. To keep the seedlings uniform when they leave the factory, seedling factories often compensate for missing and diseased seedlings by sowing more seeds, which causes considerable economic loss. A reliable early identification system for weak seedlings can help nurseries quickly locate weak and dead seedlings and target transplanting and replanting operations, greatly improving the efficiency and economy of plant nurseries.
The gradual integration of computer technology and agricultural knowledge has moved the study of crop morphological structures and physiological functions into a stage of digitalization and visualization [5]. Researchers have applied machine vision and spectroscopy to high-throughput crop phenotyping to achieve the autonomous monitoring, analysis, and application of crop physiological and ecological information [6,7,8]. Crop phenotype detection technology is the basis for growth modeling. Three-dimensional vision technology can store the 3D information of plant shapes and organs in the computer to reproduce the morphological structure of crops. It can analyze and detect the dynamic process of plant growth and plant–environment interactions, which accelerates quantitative research on the process and laws of crop growth and development [9,10]. Three-dimensional vision generally uses 3D imaging techniques such as depth cameras, binocular vision, and depth estimation for phenotypic studies of crops, and a large number of relevant studies have appeared in recent years. For example, Jin proposed a low-damage transplanting method for leafy vegetable seedlings based on machine vision and image processing to solve the problem of high damage rates in seedling transplanting in horticultural facilities. An Intel D415 camera was used to obtain the height and extreme edge points of the seedlings, and the path of the end-effector was planned from this coordinate information to achieve low-damage transplanting and improve the transplanting success rate [11].
Three-dimensional vision technology can make up for the shortcomings of two-dimensional machine vision because it captures the actual phenotype data of the research object, which makes it well suited to crop growth quality monitoring. For example, Yang et al. proposed an RGB-D camera-based method for in situ measurements of vegetable seedling height parameters in greenhouse nursery trays. They combined 3D point cloud filtering with clustering, which effectively filters out the soil background points and realizes in situ point cloud segmentation; the average relative error of the plant height measurement was 7.69%, which meets the requirements of practical production and scientific research [12]. Teng et al. used Azure Kinect for 3D reconstruction of the seedling moss stage and proposed an improved ICP-based point cloud registration method, which registers the point cloud of each viewpoint three times in succession while progressively decreasing the grid size and the distance threshold between corresponding points until the complete color point cloud is obtained. This method increases the accuracy to 92.5% and has the potential to be widely used for low-cost, high-accuracy, non-destructive testing of oilseed rape phenotypes [13]. Otoya et al. used the RealSense D435 depth camera to grade artichoke seedlings. Their leaf area estimation method, based on point cloud segmentation and a triangulation algorithm, classified the seedlings into four grades (high-quality seedlings, medium-quality seedlings, poor-quality seedlings, and no seedlings) and enabled the non-destructive assessment of seedling quality [14]. Nguyen et al. performed precise 3D reconstruction of cabbage, cucumber, and tomato seedlings with a structured light-based method and accurately estimated plant phenotypic characteristics such as leaf number, plant height, and leaf size without destroying any part of the plant [15]. Chen et al. used the structure-from-motion method to obtain plant point clouds and proposed a fuzzy C-means clustering-based point cloud segmentation method for individual plants, after which the leaf area was calculated with a grid method. This method improves, to a certain extent, the accuracy of leaf area calculation for overlapping leaves and complex shooting angles [16]. Wang et al. proposed a KinectV2 camera-based method for the nondestructive monitoring of the growth of factory plug seedlings. They obtained the germination rate of seedling trays by threshold segmentation and morphological processing of color images and analyzed the plant height and leaf area of the seedlings by converting depth images into point clouds, realizing nondestructive monitoring of the germination rate, plant height, leaf area, and seedling index of plug trays [17]. Zhang et al. took cucumber plug tray seedlings as the research object and proposed a point cloud processing-based automatic detection method for late seedling emergence in plug trays. The leaf area and plant height were obtained using the α-shape algorithm and a method for locating the top of the seedling stem based on the principal curvature, and the product of leaf area and plant height was used as the grading factor to achieve the automatic detection of late seedling emergence [18].
Crop phenotype data based on 3D vision technology describe the current crop growth condition well and, combined with machine learning or deep learning techniques, can further predict the crop growth trend. For example, Zhang et al. proposed a method to measure the 3D morphological characteristics of plants and established a time-series growth equation and visualization model that presents the growth process of Arabidopsis dynamically, which facilitates the phenotype detection of Arabidopsis. However, owing to its point cloud generation method and its reliance on L-studio software to fit the mathematical growth equations, the modeling is slow and cannot achieve the speed and portability required for practical production [19]. An et al. designed an automated high-throughput plant phenotype detection pipeline for monitoring the growth of rosettes. This pipeline is topped with 18 cameras and is capable of holding 4 × 4 seedling trays for a total of 16 trays. With this device, images of rosettes can be taken continuously, and the power-law relationship between the total leaf growth area and rosette area can be analyzed from the time series. However, this device is complex, costly, and not very portable [20].
In summary, phenotypic characteristics such as leaf area and plant height are the main parameters for evaluating and predicting plant growth [21,22]. Plant height indicates whether seedlings are spindling, while leaf area is a determinant of whether seedlings grow strong or weak. Jointly predicting the growth of these two characteristics is therefore expected to discriminate strong seedlings from weak ones. Since the growth of seedlings carries time-series information, the growth status of one day is necessarily highly correlated with the growth status of the next. The long short-term memory (LSTM) network has performed well in the analysis of time-series dynamical systems in several fields [23]. LSTM mitigates the vanishing and exploding gradients of traditional recurrent neural networks (RNN) and can trace back further through the time series, which makes the model’s predictions easier to interpret. Meanwhile, traditional machine learning binary classifiers such as SVM, random forest, and XGBoost can jointly model the two predicted features with strong classification ability and little sensitivity to outliers. To solve the problem of the early identification and location of weak seedlings, this paper proposes a phenotype-based growth prediction and strong seedling discrimination model. The model has high detection and prediction accuracy and can not only discriminate weak seedlings but also locate them, which provides the number and positions of seedlings to the dividing and combining robot and gives the model good practical value and application prospects.
2. Materials and Methods
2.1. Experimental Materials and Data Acquisition
The experiment was conducted in July 2022 in a small daylight greenhouse with a north–south layout at the vegetable improvement base of Huazhong Agricultural University; freely adjustable shade curtains were used to control the temperature, humidity, and light in the greenhouse. The watermelon variety tested was the common variety “Zaojia (84-24)”, with a total of 16 trays and 788 emerged seedlings. The growth cycle of the seedlings was 8–10 days. The cultivation substrate was peat, vermiculite, and perlite, uniformly mixed at a volume ratio of 3:1:1, and drip irrigation was used.
Real-time acquisition with Kinect 3D sensors can meet the requirements of fast, accurate image acquisition of crop growth patterns and has become a development trend and a necessary means of digital agricultural production management [24]. The data acquisition device in this paper is the Azure Kinect DK from Microsoft. The data acquisition platform is shown in Figure 1 and consists mainly of the Azure Kinect sensor, a computer, and a shaded photo booth. The Kinect was mounted on a steel mount, looking vertically downward from a distance of about 0.45 m, with the camera plane parallel to the shooting platform. The computer was used to acquire and process the images captured by the Kinect. The data were collected from the time the seedlings sprouted to the time they developed their true leaves, using the Azure Kinect to take top views of the entire tray of watermelon seedlings three times a day at 9:00, 14:00, and 19:00. Since the color camera of the Azure Kinect sensor is highly prone to overexposure, the data acquisition took place in a dark room.
The color image contains the color information of the plug seedlings, and its rich RGB features are well suited to seedling positioning and image segmentation. The depth image contains the actual distance from the camera lens to the seedlings in the plug tray, offers high accuracy in phenotype detection, and can be used for the non-destructive detection of 3D phenotype data from the seedlings. The joint analysis of color and depth images requires the two images to be aligned. During data acquisition, the depth image is aligned to the color image using the depth-image-to-color-camera transformation function in the Kinect SDK; the aligned depth image has the same pixel grid as the color image, so the depth information can be segmented and recognized directly from the color information.
Figure 2 shows the continuously acquired color image with the aligned depth image.
The robustness of each seedling was assessed manually on the sixth day of data acquisition. The assessment results were divided into two categories: normal seedlings and abnormal seedlings. Abnormal seedlings were weak seedlings with dwarf plants and smaller wilted leaves or spindling seedlings with thin stems and tall plants, while the rest were normal seedlings.
Table 1 shows the statistics of all the sample data.
2.2. Overall Flow Chart
The flow chart of the technical approach in this paper is shown in
Figure 3. It includes four parts: data acquisition, seedling location, phenotype detection, and weak seedling identification. Data acquisition includes image data acquisition by the RGB-D camera and manual acquisition of plant height and leaf area. Seedling location and phenotype detection were performed by image processing and point cloud processing using the collected data, and validation experiments were conducted simultaneously. The weak seedling discrimination system uses LSTM and a random forest classification model to jointly predict the dual features of plant height and leaf area to obtain the final weak seedling discrimination model.
2.3. Seedling Positioning and Indexing Methods in Cavity Trays
2.3.1. Plug-Hole Location and Indexing
The first step is to detect the plug holes of the plug seedlings, and the most critical part is to determine the location of the plug boundaries. Since the seedlings in this experiment were still at the seedling stage, they did not shade the plug boundaries, so the boundary information was complete. To obtain accurate plug-hole boundary locations, the seedling and soil information must be segmented away precisely so that only the required plug-hole boundaries remain.
As shown in Figure 4, the seedling information can be removed by first applying the excess green (Extra Green) operator to the color image of the watermelon seedling tray and then inverting the result. Threshold segmentation is a typical image processing algorithm that segments an image based on its gray value features. Since the plug boundary and the soil have different gray level ranges, applying OTSU threshold segmentation to the color image with the seedling information removed yields a binarized image containing only the plug boundary information.
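The segmentation idea above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the excess green index is taken as the commonly used ExG = 2G − R − B, seedling pixels are simply blanked rather than inverted, the gray image is the channel mean, and the names `excess_green`, `otsu_threshold`, `plug_boundary_mask`, and the cut-off `exg_thresh` are hypothetical.

```python
import numpy as np

def excess_green(rgb):
    """Excess green index ExG = 2G - R - B per pixel (assumed definition)."""
    rgb = rgb.astype(np.float64)
    return 2.0 * rgb[..., 1] - rgb[..., 0] - rgb[..., 2]

def otsu_threshold(gray):
    """Otsu threshold of an 8-bit image: maximise the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                 # pixel count of the class <= t
    w1 = hist.sum() - w0                 # pixel count of the class > t
    s0 = np.cumsum(hist * levels)        # gray-level sum of the class <= t
    s1 = s0[-1] - s0                     # gray-level sum of the class > t
    valid = (w0 > 0) & (w1 > 0)
    sigma = np.zeros(256)
    sigma[valid] = w0[valid] * w1[valid] * (
        s0[valid] / w0[valid] - s1[valid] / w1[valid]) ** 2
    return int(np.argmax(sigma))

def plug_boundary_mask(rgb, exg_thresh=20.0):
    """Blank seedling pixels, then keep the bright plug boundary via Otsu."""
    seedling = excess_green(rgb) > exg_thresh    # hypothetical ExG cut-off
    gray = rgb.astype(np.float64).mean(axis=2)   # assumed gray conversion
    gray[seedling] = 0.0                         # remove seedling information
    gray = gray.astype(np.uint8)
    return gray > otsu_threshold(gray)
```

On a real tray image the ExG cut-off and the gray conversion would need tuning to the lighting of the photo booth.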
The binary image of the plug seedlings after segmentation still contains noise; if the noise is not removed accurately, it interferes with the subsequent processing and may even affect the correctness of the results, so denoising is a necessary step after binarization. The noise consists of fine soil pixels, whereas the plug boundary must be preserved, so a morphological opening with a 3 × 3 kernel is applied so that the boundary of each plug hole is shown more clearly and is separated from the seedling and soil substrate information. The remaining noise is mostly scattered in small, singly connected regions. To remove it, the area of every connected region is calculated, a threshold is set, and the pixel values of all regions whose area is below this threshold are set to 0. At this point, the unwanted seedling and soil information has been split off precisely and the required plug boundary information is retained.
Since the plug holes have a standard structure and are arranged in a square matrix, the boundaries are continuous in the horizontal and vertical directions, so the plug hole boundaries can be determined from how the pixel statistics of the seedling plug image change along the horizontal and vertical coordinates. The pixel values are summed separately for each column and each row of the image, and the trends of these sums are analyzed.
The horizontal and vertical wave peaks correspond to the hole boundaries, and the coordinates of the wave peaks are the coordinates of the pixel points corresponding to the hole boundaries, which can accurately determine the location information of each hole boundary to achieve the plug hole location. The red points in the boundary identification part of
Figure 4 are the results of peak point detection.
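The row and column statistics described above amount to projection profiles whose peaks mark the grid lines. The sketch below is an assumption-laden illustration rather than the paper's code: the peak rule (take the centre of every run above a fraction `min_frac` of the maximum) and the helper names `projection_peaks` and `cells_from_peaks` are hypothetical.

```python
import numpy as np

def projection_peaks(binary, axis, min_frac=0.5):
    """Centres of the peaks in the projection profile of a binary mask.

    axis=0 sums over rows (one value per column); axis=1 gives one per row.
    """
    profile = binary.astype(np.int64).sum(axis=axis)
    if profile.max() == 0:
        return []
    above = profile >= min_frac * profile.max()
    peaks, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                          # a run of high values begins
        elif not flag and start is not None:
            peaks.append((start + i - 1) // 2)  # centre of the finished run
            start = None
    if start is not None:                      # run touching the image edge
        peaks.append((start + len(above) - 1) // 2)
    return peaks

def cells_from_peaks(row_peaks, col_peaks):
    """Index every plug hole as (row, col, y0, y1, x0, x1) between peaks."""
    cells = []
    for r in range(len(row_peaks) - 1):
        for c in range(len(col_peaks) - 1):
            cells.append((r, c, row_peaks[r], row_peaks[r + 1],
                          col_peaks[c], col_peaks[c + 1]))
    return cells
```

Pairing consecutive peaks gives each hole both a pixel bounding box and a (row, column) index, which is the information a replenishment robot would need to address a hole.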
2.3.2. Seedling Index
Watermelon seedlings may grow skewed during the germination period due to phototropism and water control, which may cause the seedlings to be photographed outside of the center of the hole. Even if the hole location is correctly located, accurate seedling information is not obtained due to skewing. To address this problem, a seedling image skew correction algorithm is proposed.
First, the excess green (Extra Green) operator is applied so that only the seedling information is retained, and the localization range of each located plug-hole image is expanded, as in Figure 5b. The Moments function in OpenCV is then used to obtain the centroid coordinates of each connected domain, as in Figure 5c, and the Euclidean distance between each centroid and the center of the image is calculated; the connected domain whose centroid is closest to the image center is taken as the seedling belonging to this hole.
Figure 5e shows the effect of seedlings’ plug holes after correction.
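The selection rule for the skew correction (choose the connected domain whose centroid lies closest to the hole center) can be illustrated as follows. This is a sketch that assumes a labelled image is already available from a connected-component step; it computes centroids directly with NumPy instead of OpenCV moments, and the function names are hypothetical.

```python
import numpy as np

def centroids_from_labels(labels):
    """Centroid (x, y) of every labelled region; label 0 is background."""
    centroids = {}
    for lab in np.unique(labels):
        if lab == 0:
            continue
        ys, xs = np.nonzero(labels == lab)
        centroids[int(lab)] = (float(xs.mean()), float(ys.mean()))
    return centroids

def pick_seedling(labels):
    """Label of the region whose centroid is nearest to the image centre."""
    h, w = labels.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    centroids = centroids_from_labels(labels)
    if not centroids:
        return None                      # empty hole: no seedling found
    return min(centroids,
               key=lambda k: np.hypot(centroids[k][0] - cx,
                                      centroids[k][1] - cy))
```

Returning `None` for an empty crop is a design choice of this sketch; it would let a caller count missing seedlings separately from weak ones.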
2.4. Seedling Phenotype Detection Algorithm
2.4.1. Seedling Height Measurement
The seedling height H was defined as the vertical distance from the root of the seedling stalk to the top of the leaf. As shown in Figure 6, the field of view of the camera is parallelogram ABCD, h1 is the distance from the camera to the root of the main stalk of the seedling, h2 is the distance from the camera to the top of the leaf, and the seedling height H is calculated by Equation (1):

$$H = h_1 - h_2 \tag{1}$$
Since the soil surface is not flat and the soil height varies from seedling to seedling, the seedling height cannot be measured against a uniform reference height, and each seedling needs to be analyzed individually. To measure the height of each seedling, proceed as follows:
Step 1: The raw depth image contains a large amount of irrelevant depth information, so the excess green (Extra Green) operator is first applied to the color image to remove the seedling information, leaving only the soil pixels non-zero.
Step 2: The depth information of the soil is removed by clearing the depth values at those non-zero pixel locations in the corresponding depth image.
Step 3: Only the depth information of the seedling is left in the depth image after the soil has been removed, and the difference between the maximum and minimum remaining depth values is the seedling height at this point.
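The three steps can be condensed into a short NumPy sketch. It is illustrative only and rests on assumptions not stated in the paper: the excess green index is taken as ExG = 2G − R − B with a hypothetical cut-off `exg_thresh`, the depth image is assumed to be in millimetres and already aligned to the color image, and zero depth is treated as an invalid reading.

```python
import numpy as np

def seedling_height(rgb, depth_mm, exg_thresh=20.0):
    """Seedling height as max - min depth over the seedling pixels."""
    rgb = rgb.astype(np.float64)
    exg = 2.0 * rgb[..., 1] - rgb[..., 0] - rgb[..., 2]
    # Keep green (seedling) pixels that also carry a valid depth reading;
    # everything else, i.e. the soil, is discarded.
    mask = (exg > exg_thresh) & (depth_mm > 0)
    if not mask.any():
        return 0.0
    values = depth_mm[mask].astype(np.float64)
    # The farthest seedling pixel approximates h1 and the nearest one h2.
    return float(values.max() - values.min())
```

Because the camera looks straight down, the largest depth among the seedling pixels plays the role of h1 and the smallest the role of h2, so the result matches H = h1 − h2 for a single hole crop.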
2.4.2. Leaf Area Measurement
Seedling leaves are often not fully spread, so a 2D image can no longer estimate the leaf area accurately; the leaf area must instead be measured by converting the depth image pixels into 3D spatial coordinates (a 3D point cloud). According to the Kinect imaging principle, the conversion between the depth image and 3D spatial coordinates is given by Equation (2):

$$x = \frac{(u - c_x)\,d}{f_x}, \qquad y = \frac{(v - c_y)\,d}{f_y}, \qquad z = d \tag{2}$$

In Equation (2), $(x, y, z)$ is the 3D spatial coordinate corresponding to the point $(u, v)$, $(u, v)$ is the pixel coordinate of any point of the depth image, $d$ is the depth value corresponding to the point $(u, v)$, $f_x$ and $f_y$ are the focal lengths of the IR camera along the two image axes, and $(c_x, c_y)$ is the optical center coordinate of the IR camera. An empty point cloud is generated with PCL, and the points converted to 3D spatial coordinates are added to it; each point takes the RGB values at the corresponding coordinates of the color image, which yields a spatial point cloud containing color information, as shown in Figure 7a. The neighborhood extreme filtering method [12] eliminates the trailing artifacts caused by the depth camera and yields a clean leaf point cloud.
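The conversion from depth pixels to 3D coordinates follows the standard pinhole back-projection and can be sketched with NumPy as below. The function name `depth_to_points` and the parameter names `fx`, `fy`, `cx`, `cy` are assumptions for illustration; in practice the intrinsics would come from the camera calibration, and the paper builds the cloud with PCL rather than NumPy.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into an (N, 3) array of x, y, z points.

    Pixels with zero depth are treated as invalid and skipped.
    """
    v, u = np.indices(depth.shape)       # v = row index, u = column index
    z = depth.astype(np.float64)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

The returned points are in the same unit as the depth image, so a millimetre depth map yields a millimetre point cloud.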
Greedy projection triangulation takes the denoised leaf point cloud as the basis for constructing a triangular mesh, yielding a model composed of triangular facets, as shown in Figure 7b. Each triangular facet carries real position information: the original position and depth values of its vertices are looked up in the 3D point cloud, from which the lengths of the three sides of each triangle are calculated.
The area of each triangle is found from its three side lengths using Heron's formula, and the total leaf area is obtained by adding up the areas of all the triangular facets. The specific formula is as follows.

$$p_i = \frac{a_i + b_i + c_i}{2}, \qquad S_i = \sqrt{p_i\,(p_i - a_i)(p_i - b_i)(p_i - c_i)}, \qquad S = \sum_{i=1}^{n} S_i$$

In the above formula, $a_i$, $b_i$, and $c_i$ denote the side lengths of the $i$th triangle after greedy triangulation; $p_i$ is its semi-perimeter; $S_i$ denotes the area of the $i$th triangle; $S$ denotes the sum of the areas of all the triangles, i.e., the total leaf area of the whole tray of watermelon seedlings; and $n$ denotes the number of triangles.
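As a small worked illustration of the area summation, the sketch below applies Heron's formula to every facet of a triangle mesh. It assumes the mesh is already available as a vertex array and a triangle index array (for example, exported from a triangulation step); the function name `mesh_area` is hypothetical.

```python
import numpy as np

def mesh_area(vertices, triangles):
    """Total surface area of a triangle mesh via Heron's formula."""
    v = np.asarray(vertices, dtype=np.float64)
    t = np.asarray(triangles, dtype=np.int64)
    p0, p1, p2 = v[t[:, 0]], v[t[:, 1]], v[t[:, 2]]
    a = np.linalg.norm(p1 - p2, axis=1)          # side lengths per facet
    b = np.linalg.norm(p0 - p2, axis=1)
    c = np.linalg.norm(p0 - p1, axis=1)
    p = (a + b + c) / 2.0                        # semi-perimeter
    # Clip guards against tiny negative values for degenerate facets.
    s = np.sqrt(np.clip(p * (p - a) * (p - b) * (p - c), 0.0, None))
    return float(s.sum())
```

For example, a unit square split into two triangles gives an area of 1, and a 3-4-5 right triangle gives 6.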
2.5. LSTM-Based Phenotype Prediction Model for Seedling
Long short-term memory (LSTM) is a special kind of recurrent neural network (RNN) designed mainly to solve the vanishing and exploding gradient problems that arise when training on long sequences. LSTM handles sequential data and performs better on longer sequences than general neural networks. It is therefore widely used in time-series problems such as stock prediction, speech recognition, and signal analysis. In the continuous time series of watermelon seedling growth, the growth condition of one day is inevitably closely related to, and influences, the growth condition of the following day. Using phenotypic features alone, without considering the association between different time points, can lead to misclassification. The propagation of the cell state in the LSTM structure captures exactly this dependency. For phenotypic information measured from continuously acquired images, an LSTM network can make full use of the continuity between features, exploit the temporal information carried across images, and maximize the accuracy of discrimination. Therefore, this paper selects the LSTM neural network architecture to build a growth prediction model for watermelon seedlings at the seedling stage.
The structure of the LSTM network is shown in Figure 8. It consists of multiple units chained end to end, and each unit contains gating structures and a cell memory unit, allowing it to handle prediction tasks on long time series comfortably. The gating structure comprises the forget gate, input gate, and output gate, which work together to determine which information is discarded and which is retained.
The forget gate determines how much of the information from the previous moment is forgotten. The input gate determines how much information is written to the cell memory unit at the current moment; it has two components, one that sets the degree to which the cell memorizes the current moment and one that controls the amount of candidate information flowing into the cell memory unit. The cell memory unit stores the information of the cell at the current moment and can be updated at any time. The output gate determines the amount of information flowing out at the current moment.
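For reference, the gates described above are commonly written in the following standard form. The notation is the conventional textbook one and is an assumption here, since it may differ from the symbols used in Figure 8: $x_t$ is the input, $h_t$ the hidden state, $C_t$ the cell state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate memory)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive update of $C_t$ is what lets gradients pass across many time steps without vanishing as quickly as in a plain RNN.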
All emerged seedlings were phenotypically examined, giving a total of 788 sets of data, and seedling height and leaf area were each used separately as the input to the LSTM. The structure of the LSTM prediction model is shown in Figure 9, and the data need to be prepared before training the LSTM network. The observed data set is first converted into a supervised learning set, i.e., from a time series into samples with inputs and outputs. In this experiment, the time step is three, and each sample consists of six consecutive observations (three inputs and three outputs), giving a total of 13 samples per series. All the samples are then divided into a training set (70%) and a testing set (30%). Finally, the data are normalized and standardized so that gradient descent runs faster and converges more accurately. After training, prediction and inverse normalization are performed to forecast the future time-series data and obtain the predicted seedling height and leaf area.
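The conversion of a time series into supervised samples can be illustrated with a sliding window. The sketch below is a hedged example, not the authors' preprocessing code: the function names are hypothetical, and min-max scaling is assumed as the normalization. With 18 observations, three inputs, and three outputs, it produces 18 − 6 + 1 = 13 samples, matching the count given in the text.

```python
import numpy as np

def series_to_supervised(series, n_in=3, n_out=3):
    """Slide a window over a 1-D series to build (inputs, outputs) pairs."""
    s = np.asarray(series, dtype=np.float64)
    n = len(s) - n_in - n_out + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window")
    x = np.stack([s[i:i + n_in] for i in range(n)])
    y = np.stack([s[i + n_in:i + n_in + n_out] for i in range(n)])
    return x, y

def minmax_scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1] (assumed normalization)."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)

def minmax_invert(scaled, lo, hi):
    """Inverse of minmax_scale, used on the network's predictions."""
    return np.asarray(scaled, dtype=np.float64) * (hi - lo) + lo
```

In use, `lo` and `hi` would be taken from the training portion only, so that the test set does not leak into the scaling.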
The data acquisition cycle was 6 days, with three sets of data collected per day, for a total of 18 sets of data. Growth prediction with an LSTM neural network needs sufficient antecedent data to be accurate, while agronomic production needs the prediction as early as possible. So that replenishment decisions are available about three days before the end of the seedling stage, leaving time for replenishment measures that keep the factory seedlings growing as evenly and economically as possible, the first three days of seedling growth data were used for prediction. The parameters of the LSTM network structure are shown in Table 2. The data on days t, t − 1, and t − 2 (t = 3) were used as the input to predict the data on days t + 1, t + 2, and t + 3 as the output, and the strength or weakness of the seedlings was then discriminated by the subsequent discriminant method.
2.6. Machine Learning-Based Weak Seedling Discrimination Model
In the seedling period, only two categories describe the strength of seedlings, so the strong seedling model in this study is a supervised binary classification problem. To make the discrimination of strong and weak seedlings predictive, the data predicted in the previous step were used as the input phenotypic features. The predicted phenotype data of all emerged seedlings were cleaned, giving a total of 788 sets of data, which were divided into a training set (70%) and a testing set (30%).
In this study, logistic regression, support vector machine (SVM), random forest, and the boosting algorithms GBDT, XGBoost, and LightGBM were used to build classification prediction models, and the optimal prediction model was selected based on accuracy, recall, precision, and F1 Score to achieve the strong and weak seedling discrimination of watermelon seedlings.
For a binary classification problem, the predicted results of the classification model can be summarized in a contingency table called the confusion matrix, as in Table 3.
The entries of the confusion matrix are TP (True Positive): positive samples predicted as positive; FN (False Negative): positive samples predicted as negative; FP (False Positive): negative samples predicted as positive; and TN (True Negative): negative samples predicted as negative. The evaluation indicators are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
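The four evaluation indicators named in this section (accuracy, precision, recall, and F1 score) follow directly from the confusion matrix entries. The sketch below uses their standard definitions; the function name and the zero-division fallbacks are choices of this illustration, not taken from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For instance, with the made-up counts TP = 40, FN = 10, FP = 5, and TN = 45, the accuracy is 0.85 and the recall is 0.8.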
4. Conclusions
To address the problem of the poor early identification of weak seedlings in factory nurseries, this paper proposes a visual system for the early discrimination of weak watermelon seedlings based on phenotype detection and machine learning, which uses two early characteristics of the seedling height and leaf area to assess their growth status, mainly including the following aspects:
(1) The color and depth information of the seedlings were obtained using an RGB-D camera, and the seedling height and leaf area were extracted with a traditional image segmentation algorithm and a 3D point cloud processing method. Their MPAE were 2.59% and 7.23%, respectively, indicating that the method is highly reliable; in addition, the consumer-grade Azure Kinect DK camera is low-cost, simple to operate, stable, and reliable.
(2) The two features are fed into the LSTM for prediction, and the predicted information is then fed into the random forest classifier to build an early discrimination model for weak seedlings. The model achieved 84% discrimination accuracy on the test set for the early discrimination of weak watermelon seedlings. It can therefore predict weak seedlings early, provide visual support for tray dividing and combining and for seedling replenishment robots, support regulation and early warning in seedling factories, and greatly improve their seedling economy, and it has good development potential.
This study can serve as the vision system of a seedling dividing and combining robot. In seedling production, the Kinect is installed on the robot, and the growth condition of the whole seedling plug tray is obtained by monitoring for three consecutive days. The position of each seedling is matched with its growth condition and passed to the robot, which uses its arm to transplant and replenish seedlings at the positions of the weak seedlings, realizing the regulation and early warning of seedling plant production.
In conclusion, this paper presents a creative solution for seedling monitoring and weak seedling prediction while combining traditional image processing and AI machine learning methods, which is a useful example of digital research in a factory nursery and can effectively promote the degree of automation and intelligence in factory nurseries.