1. Introduction
Agriculture is a fundamental economic activity for human subsistence, and it has left its mark on history, shaping the development of civilizations across geographical locations and eras [1,2]. Agricultural progress has been a cornerstone of the growth of human populations, because the production of food of sufficient quantity and quality depends on it. According to United Nations data [3], the human population will have grown to about 10 billion people by 2050, which poses great challenges for the different food production activities.
In the last two decades, technological advances applied to the tillage of the land have given rise to what is called precision agriculture (PA). It encompasses a set of technologies that combine sensors, statistics, classical algorithms, and artificial intelligence (AI) algorithms, especially those of computer vision (CV) [1]. Increasing the amount of food obtained from cultivated fields and optimizing the resources used to produce it are among the main goals of PA. These goals can be achieved by monitoring certain characteristics of the crops, such as growth, irrigation, fertilization, and the presence of pests and diseases [4,5,6].
Currently, several CV methods have been developed to support agricultural activity in tasks such as the estimation of fruit quality [7,8,9], recognition of pests [10,11,12], improvement of irrigation systems [13], and nutrient deficiency detection [14].
Color attributes in digital images are used to segment different crop elements, such as leaves, fruits, and weeds, from the rest of the elements present in the image [15]. Rasmussen [16] evaluated the level of leaf development in weed-free fields, highlighting the importance of the conditions under which the images are acquired, such as camera angle and lighting, for obtaining successful results. Kirk [17] estimated the amount of vegetation or foliage present in images of cereal crops at early stages of phenological development. In [18], Story presented a method to determine overall plant growth and health status in which RGB (red, green, blue) and HSL (hue, saturation, lightness) values are used as color features. Wang [19] proposed a method to segment rice plants from the image background by subtracting the value of the green channel from that of the red channel for each pixel; the segmentation results are then used to estimate the amount of nitrogen present in the leaves. Quemada [20] carried out a segmentation process on hyperspectral images to estimate the nitrogen present in corn crops. Additionally, computational methods have been implemented to detect weeds under uncontrolled light conditions, as reported by Jeon [21], in which the images were acquired by an autonomous robot. Yadav [22] measured the amount of chlorophyll found in potato crops using CV algorithms. Philipp [23] performed a series of segmentation comparisons using different color representation models. In [24], Menesatti proposed a rapid, non-destructive, cost-effective technique to predict the nutritional status of orange leaves using a Vis–NIR (visible–near infrared) portable spectrophotometer. Fan [25] developed a method for segmenting apples that combines local image features and color information through a pixel-patch segmentation approach based on a gray-centered color space.
At a more specific level, there are papers describing methods for the segmentation and analysis of the different elements that make up the plants in crops, using various CV techniques. For example, Xu [26] reported a method for extracting color and texture characteristics of tomato plant leaves based on histograms and Fourier transforms. Wan [27] proposed a procedure to measure the maturity of fresh supermarket tomatoes at three different levels through a threshold segmentation algorithm based on a color model, with the classification performed by a backpropagation neural network. Tian [28] used an improved k-means algorithm based on an adaptive clustering number for the segmentation of tomato leaf images. Castillo-Martínez [29] reported a color index-based thresholding method for background and foreground segmentation of plant images using two color indexes that are modified to provide better information about the green color of the plants. Lin [30] proposed a detection algorithm based on color, depth, and shape information for detecting spherical or cylindrical fruits on plants. Lu [31] presented a method for the automatic segmentation of plants from the background in color images, which consists of the unconstrained optimization of a linear combination of color model component images to enhance the contrast between plant and background regions.
In recent years, a new image processing technique called deep learning (DL) has been developed. It comprises several types of convolutional neural network (CNN) models [32], for example, LeNet [33], AlexNet [34], VGG-16 [35], and Inception [36]. The capabilities and applications of CNN models have increased along with the number of trainable parameters they contain, which depends on the number of layers used; consequently, highly specialized hardware has been required for their training. In PA, DL has been used successfully in different contexts, for example, pest and disease detection [37,38,39], leaf identification [11,40], and estimation of nutrients present in plant leaves [41]. The development of CNN models for separating items of interest from items of non-interest has been explored in several papers. In [42], Milioto used an RCNN model to segment sugar beet plants, weeds, and background in crop images. Majeed [43] trained different CNN models based on SegNet and FCN to segment grapevine cordons and determine their trajectories. Kang [44] proposed several CNN models based on DaSNet with a ResNet50 backbone for real-time semantic apple detection and segmentation.
This paper details a segmentation method applied to images of tomato crops taken in greenhouses, classifying the pixels into three classes: leaves, fruits, and background. The method first segments pixels according to the dominance of one color channel over the others and, in a following stage, applies thresholds determined from the same color channel information. It has the advantages of ease of implementation and low computational cost.
The remainder of this paper is organized as follows. The segmentation method developed to separate the leaves and fruits of the tomato plants is described in detail in Section 2. In Section 3, different images generated during the segmentation of the leaves and fruits are shown, together with tables of the metrics selected to measure performance. A comparison of the results of the developed segmentation method against those of a CNN model is made in Section 4. In Section 5, the performance of our method is discussed. Lastly, the conclusions are presented in Section 6.
2. Method
The different methods of separating the elements present in images into portions that are easier to analyze are called segmentation methods [45,46,47]. These can be classified into the following categories:
Region-based methods. These methods are based on separating a group of pixels that are connected and share properties. This technique performs well on noisy images.
Edge-based methods. These algorithms are generally based on the discontinuity of the pixel intensities of the images to be segmented, which are manifested at the edges of the objects.
Feature-based clustering methods. These methods are based on looking for similarities between the objects present in the images; this allows for the creation of categories of interest for a particular objective.
Threshold methods. These methods compare the intensity value of each pixel against a threshold value T. There are two types of threshold segmentation depending on how T is chosen: if it is constant over the whole image, it is called global threshold segmentation; otherwise, it is called local threshold segmentation (a minimal sketch of both variants is given after this list).
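To make the distinction concrete, the following minimal sketch (in Python with NumPy and SciPy, not part of the proposed method) contrasts a global threshold with a simple local threshold based on the mean of a sliding window; the window size and offset are arbitrary illustrative values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def global_threshold(gray, T=128):
    """Global thresholding: a single constant T is compared against every pixel."""
    return (gray > T).astype(np.uint8)

def local_mean_threshold(gray, window=15, offset=5):
    """Local thresholding: T varies per pixel, here the mean of a sliding window minus an offset."""
    T = uniform_filter(gray.astype(float), size=window) - offset
    return (gray > T).astype(np.uint8)

# Example on a synthetic 8-bit grayscale image
gray = np.random.randint(0, 256, size=(120, 160), dtype=np.uint8)
mask_global = global_threshold(gray, T=128)
mask_local = local_mean_threshold(gray, window=15, offset=5)
```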
The following sections detail the proposed method of using color dominance to segment all pixels of digital images acquired inside greenhouses into three different classes: leaves, tomato plant fruits, and background, which is based on the calculation of local thresholds for each pixel.
RGB Thresholding
The RGB color model is made up of three components, one for each primary color. Each component can take values from 0 to 255, allowing a wide range of visible colors to be represented.
The method uses a two-stage algorithm to classify each pixel of the image into one of three classes: leaves, fruits, and background. The first stage is based on the dominance of one of the color channels of the RGB model over the other two: the green channel is used for the leaves, and the red channel is used for the fruits. The second stage aims to eliminate the false positives generated by the first stage. The segmentation of the leaves and fruits of the tomato plant is based on the calculation of four thresholds for the differences between the dominant color channel and the other two, two to label the leaves and the remaining two for the fruits. In the calculation of the thresholds, statistical quantities such as the standard deviations of these differences and the maximum values of the dominant color channels are used.
An image can be mathematically represented as a two-dimensional function $f(x, y)$. When handled with the RGB color model, it is made up of three components, $f_R(x, y)$, $f_G(x, y)$, and $f_B(x, y)$, where the subscripts refer to the primary colors red, green, and blue, respectively. $x$ and $y$ represent the spatial coordinates of a particular pixel within an image of dimension $M \times N$, where $M$ is the number of rows and $N$ the number of columns.
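As a minimal illustration of this representation, the sketch below loads an RGB image into an M x N x 3 array and separates it into the three components described above; the file name is hypothetical, and a signed integer type is used so that later channel differences do not overflow.

```python
import numpy as np
from PIL import Image

# Hypothetical input file; any RGB image of a tomato crop would do.
f = np.asarray(Image.open("tomato_greenhouse.jpg").convert("RGB"), dtype=np.int16)
M, N, _ = f.shape                                  # image dimension M x N
f_R, f_G, f_B = f[..., 0], f[..., 1], f[..., 2]    # f_R(x, y), f_G(x, y), f_B(x, y)
```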
The first segmentation stage is based on the dominance of the green color channel over the other two channels. This is performed by applying:

$$G(x, y) = \begin{cases} f(x, y), & \text{if } f_G(x, y) > f_R(x, y) \text{ and } f_G(x, y) > f_B(x, y) \\ 0, & \text{otherwise,} \end{cases}$$

where $G(x, y)$ contains the pixels filtered from $f(x, y)$ with dominance of the green color channel over the other two, that is, those for which $f_G(x, y) > f_R(x, y)$ and $f_G(x, y) > f_B(x, y)$.
The other group of pixels of interest are those belonging to the fruits of the tomato plants. The segmentation of the fruits is based on the dominance of the red color channel over the other two; it is performed by applying:

$$R(x, y) = \begin{cases} f(x, y), & \text{if } f_R(x, y) > f_G(x, y) \text{ and } f_R(x, y) > f_B(x, y) \\ 0, & \text{otherwise,} \end{cases}$$

where $R(x, y)$ contains the pixels filtered from $f(x, y)$ with dominance of the red color channel over the other two, that is, those for which $f_R(x, y) > f_G(x, y)$ and $f_R(x, y) > f_B(x, y)$.
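A possible NumPy sketch of this first stage is given below; it assumes strict inequalities for the dominance tests and sets non-dominant pixels to zero, in line with the description above.

```python
import numpy as np

def dominance_stage(f):
    """First stage: filter pixels by color channel dominance.

    f is an M x N x 3 RGB array. Returns G (pixels where green dominates,
    leaf candidates) and R (pixels where red dominates, fruit candidates);
    all other pixels are set to zero.
    """
    f = f.astype(np.int16)
    f_R, f_G, f_B = f[..., 0], f[..., 1], f[..., 2]
    green_dom = (f_G > f_R) & (f_G > f_B)   # leaf candidates
    red_dom = (f_R > f_G) & (f_R > f_B)     # fruit candidates
    G = np.where(green_dom[..., None], f, 0)
    R = np.where(red_dom[..., None], f, 0)
    return G, R
```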
The second stage of the segmentation process begins with the calculation of the differences between the dominant color channel and the non-dominant channels in $G(x, y)$ and $R(x, y)$, which are obtained by applying:

$$D_G^R(x, y) = G_G(x, y) - G_R(x, y), \qquad D_G^B(x, y) = G_G(x, y) - G_B(x, y),$$
$$D_R^G(x, y) = R_R(x, y) - R_G(x, y), \qquad D_R^B(x, y) = R_R(x, y) - R_B(x, y),$$

where $D_G^R(x, y)$ and $D_G^B(x, y)$ are used to determine the dominance of the green color channel for the pixels that form the leaves in $G(x, y)$, and $D_R^G(x, y)$ and $D_R^B(x, y)$ are used to determine the dominance of the red color channel for the pixels that form the fruits in $R(x, y)$. The subscript refers to the dominant color channel, whereas the superscript refers to one of the other two color channels.
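The difference images can be sketched as follows, using the arrays G and R produced by the first stage; the subscript/superscript notation of the text is flattened into plain variable names.

```python
import numpy as np

def channel_differences(G, R):
    """Differences between the dominant channel and the two non-dominant channels."""
    G = G.astype(np.int16)
    R = R.astype(np.int16)
    D_G_R = G[..., 1] - G[..., 0]   # green minus red   (leaf candidates)
    D_G_B = G[..., 1] - G[..., 2]   # green minus blue  (leaf candidates)
    D_R_G = R[..., 0] - R[..., 1]   # red minus green   (fruit candidates)
    D_R_B = R[..., 0] - R[..., 2]   # red minus blue    (fruit candidates)
    return D_G_R, D_G_B, D_R_G, D_R_B
```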
Then, four thresholds are computed to determine which pixels belong to the leaves and fruits. These are calculated from the maximum values of the dominant color channels, the standard deviations of the difference images, and a control factor $k$: $T_G^R$ and $T_G^B$ are the thresholds used to detect leaves, and $T_R^G$ and $T_R^B$ are used to find the fruit regions. $G_{\max}$ and $R_{\max}$ are the highest values of the green and red color channels in $G(x, y)$ and $R(x, y)$, respectively, and $\sigma_G^R$, $\sigma_G^B$, $\sigma_R^G$, and $\sigma_R^B$ correspond to the standard deviations of the $D_G^R(x, y)$, $D_G^B(x, y)$, $D_R^G(x, y)$, and $D_R^B(x, y)$ values, respectively. Finally, $k$ is a factor used to control the thresholds, with the objective of maximizing the result of a particular metric by experimenting with different values of $k$.
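The exact threshold expressions are given by the original equations and are not reproduced here; the sketch below assumes, purely for illustration, that each threshold is the standard deviation of the corresponding difference image over the candidate pixels scaled by the factor $k$. The channel maxima $G_{\max}$ and $R_{\max}$ also enter the original formulation, but their exact role is not modeled in this simplified form.

```python
import numpy as np

def thresholds(G, R, D_G_R, D_G_B, D_R_G, D_R_B, k=1.0):
    """Hypothetical threshold computation: T = k * std of each difference image
    over the candidate pixels. The original formulation also involves the maxima
    of the dominant channels (G_max, R_max), which are omitted in this sketch.
    """
    leaf_px = G[..., 1] > 0      # pixels kept by the first stage (leaf candidates)
    fruit_px = R[..., 0] > 0     # pixels kept by the first stage (fruit candidates)
    T_G_R = k * D_G_R[leaf_px].std()
    T_G_B = k * D_G_B[leaf_px].std()
    T_R_G = k * D_R_G[fruit_px].std()
    T_R_B = k * D_R_B[fruit_px].std()
    return T_G_R, T_G_B, T_R_G, T_R_B
```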
The image $L(x, y)$ with the pixels that make up the leaves, filtered from $G(x, y)$ using $D_G^R(x, y)$ and $D_G^B(x, y)$, is obtained with:

$$L(x, y) = \begin{cases} G(x, y), & \text{if } D_G^R(x, y) > T_G^R \text{ and } D_G^B(x, y) > T_G^B \\ 0, & \text{otherwise.} \end{cases}$$

The image $F(x, y)$ with the pixels that make up the fruits, filtered from $R(x, y)$ using $D_R^G(x, y)$ and $D_R^B(x, y)$, is obtained with:

$$F(x, y) = \begin{cases} R(x, y), & \text{if } D_R^G(x, y) > T_R^G \text{ and } D_R^B(x, y) > T_R^B \\ 0, & \text{otherwise.} \end{cases}$$
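Putting the second stage together, a sketch of the final filtering step follows; the comparison direction (the differences must exceed both thresholds for a pixel to be kept) is assumed from the stated goal of removing weakly dominant false positives.

```python
import numpy as np

def leaf_and_fruit_images(G, R, D_G_R, D_G_B, D_R_G, D_R_B,
                          T_G_R, T_G_B, T_R_G, T_R_B):
    """Second stage: keep candidate pixels whose differences exceed both thresholds."""
    leaf_keep = (D_G_R > T_G_R) & (D_G_B > T_G_B)
    fruit_keep = (D_R_G > T_R_G) & (D_R_B > T_R_B)
    L = np.where(leaf_keep[..., None], G, 0)    # leaf image
    F = np.where(fruit_keep[..., None], R, 0)   # fruit image
    # Every pixel not kept in L or F is treated as background.
    return L, F
```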
Figure 1 describes the segmentation process of tomato leaves and fruits in a general way.
4. Comparison of Results against PSPNet Model
A CNN PSPNet [50] model with a ResNet50 [51] backbone was trained to perform semantic segmentation of the leaves and fruits of tomato plants.
Figure 10 shows the architecture of the implemented CNN PSPNet model.
The CNN model was created using two deep learning libraries. For the training process, two sets of images were created from the initial dataset: a training set with 180 items and a validation set with 80, and it was necessary to perform the same labeling described in the corresponding section for both sets of images. The learning process was carried out for 70 epochs with a fixed learning rate, lasting eight hours using the equipment described in the corresponding section; accuracy was then computed for both the training and validation sets.
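As a rough, hypothetical reconstruction (the paper's own implementation may differ), a PSPNet with a ResNet50 backbone and three output classes (leaves, fruits, background) can be instantiated with the third-party segmentation_models_pytorch package as follows; the input size and pre-trained weights are assumptions.

```python
import torch
import segmentation_models_pytorch as smp

# Hypothetical reconstruction of the CNN used for comparison.
model = smp.PSPNet(
    encoder_name="resnet50",      # ResNet50 backbone, as stated in the paper
    encoder_weights="imagenet",   # assumption: ImageNet pre-trained encoder
    in_channels=3,                # RGB input
    classes=3,                    # leaves, fruits, background
)

x = torch.randn(1, 3, 480, 480)   # dummy RGB batch (arbitrary input size)
with torch.no_grad():
    logits = model(x)             # per-pixel class scores
```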
Five images were processed with the color dominance segmentation method and compared with the segmentation obtained using the CNN model. The qualitative and quantitative results are presented in Section 3.
A qualitative comparison of the segmentation of tomato leaves and fruits by the CNN model and the color dominance method is shown in Figure 11 and Figure 12. In both cases, the differences in the segmentation results are observed near the contours of the leaves and fruits.
Table 7 and Table 8 show the quantitative results of the five metrics used to measure the segmentation performance for the leaves and fruits of the tomato plants. In most cases, the performance of the color dominance segmentation method is superior to that of the CNN model.
To obtain an overall quantitative comparison of both methods on the same dataset, the PSPNet model was used to segment the 100 images on which the color dominance segmentation method was tested. The comparative averages of the performance metrics are shown in Table 9.
The averages of the color dominance segmentation method are higher than those of the CNN PSPNet model in all cases, both for leaves and fruits, and the same metric shows the greatest difference in both cases. Averaging the results of the five metrics, the color dominance segmentation method outperforms the CNN model for both the leaves and the fruits.
5. Discussion
The images resulting from applying the two-stage segmentation method are shown in Figure 6 and Figure 7, which show a successful segmentation of the leaves and fruits of the tomato plants.
As for the quantitative measurement, the results obtained with the selected performance metrics in Table 5 and Table 6 show adequate performance of the color dominance segmentation method. Another aspect to highlight is that the processed images were taken in real growing environments without lighting control.
When comparing the results of the color dominance segmentation method with the semantic segmentation performed by the CNN PSPNet model in Table 9, the performance of the proposed method is superior, with the great advantage of not requiring a manual image labeling process or a prior training process that is costly in time and computational power.
The performance of the color dominance segmentation method can be increased by adjusting the value of the factor $k$ to maximize the results or to pursue a particular segmentation objective. For example, the adjustment of $k$ can be performed by applying heuristic methods such as simulated annealing or genetic algorithms. Another alternative is to use different values of $k$, one for the leaves and a different one for the fruits, which allows the results of the segmentation of the leaves and fruits of the tomato plant to be improved.
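As a simple baseline for this adjustment, the sketch below performs an exhaustive sweep over candidate $k$ values and keeps the one that maximizes the mean intersection over union on a labeled set; segment_fn, the images, and the ground-truth masks are placeholders, and the heuristic methods mentioned above could replace the sweep.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union between two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def tune_k(segment_fn, images, gt_masks, k_values=np.linspace(0.5, 3.0, 26)):
    """Return the k (and its score) that maximizes mean IoU over the labeled set.

    segment_fn(image, k) -> boolean mask stands in for the color dominance
    segmentation with control factor k.
    """
    scores = [np.mean([iou(segment_fn(img, k), gt)
                       for img, gt in zip(images, gt_masks)])
              for k in k_values]
    best = int(np.argmax(scores))
    return k_values[best], scores[best]
```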
Challenges for the proposed color dominance segmentation method include testing it on images of tomato crops in the field (outside greenhouses), where brown elements such as soil can affect segmentation performance, particularly for the fruits; adapting and testing it on crops with similar colors, such as strawberries and raspberries; and exploiting other color dominances that occur naturally in other crops, making the necessary adaptations to take advantage of them when segmenting the elements of interest.
A further line of research involves looking for a color dominance segmentation process that can be applied in color models other than RGB, which would allow other conditions to be established based on the dominance of one of the characteristics of the color model used.
A clear disadvantage of the proposed method is that it cannot be applied to crops in which the fruits are green, such as cucumber, because the fruits would be classified directly as leaves. In these cases, it would be necessary to add a method or algorithm that discriminates the fruits from the leaves by their shape.
The results of the segmentation performed by the method facilitate the search for pests, diseases, and nutritional deficiencies that may manifest themselves in the leaves, fruits, and background of the segmented images. As future work, it is intended to use the implemented segmentation method to develop a system that supports the diagnosis of crop status using intelligent algorithms such as CNNs, pattern recognition, and heuristic methods for the generation of plant-saving fertilization alternatives.
In addition, it would be possible to measure the results of the segmentation performed by the presented algorithm more accurately by reducing manual labeling errors.