1. Introduction
The world population has increased and is expected to continue to grow [1]. In recent decades, this growth has driven the demand for agricultural goods, resulting in an increase in crop areas [2]. However, traditional agricultural production is not economically or environmentally sustainable; hence, it is critical to make optimal use of resources to enable high-yield crops [2].
Furthermore, crop productivity is constantly threatened by insect pests. It is predicted that worldwide food supplies decline by 40% on average every year owing to plant diseases and insect outbreaks [3]. Each year, invasive insects cost the global economy around USD 70 billion [4].

Temperature influences the rate of population expansion in several insect species. In addition, the rise in global temperature caused by climate change influences insect damage and development. The metabolic rates of insects increase when the temperature rises, causing them to consume more food and inflict more harm. Crop losses due to insect pests are expected to increase by 10% to 25% for each degree of average warming of the Earth's surface [5].
Tomato is a fruit–vegetable with great potential for cultivation since it is a source of vitamins and minerals. In terms of improving yields and fruit quality, tomatoes rank among the horticultural commodities with high economic value that still require careful handling [6]. It is critical to protect these kinds of plantations against diseases and pests in order to improve the quality and quantity of the crop [7]. According to data from the Food and Agriculture Organization of the United Nations, tomato production in Western Europe increased considerably between 2000 and 2019 [8].
Numerous fungal, bacterial, and viral diseases have severely afflicted this plant, with symptoms appearing in various parts of the plant, such as the leaves, stem, and fruit. Wilt, rot, stains on fruits, browning of foliage, and stunted development are some of the symptoms [9].
The advancements in information technology have allowed for the development of more precise farm management systems to counter these invaders. Insect traps (ITs) are essential for keeping track of insect activity and are frequently used in pest detection and control programs, such as in [10], where trapping techniques for the emerald ash borer and its introduced parasitoids were addressed. In [11], the authors address the trapping, detection, control, and regulation of tephritid fruit flies, lures, area-wide programs, and the trade implications associated with them. In [12], the authors address the use of pheromone traps to monitor the distribution and population trends of the gypsy moth; for further references, please also see [13,14,15]. ITs are also used to assess biodiversity, plan conservation [16,17,18], and evaluate pest activity and research initiatives, such as in [19], where, over a two-year period, the association between female mating success and background male moth densities along the gypsy moth western front in northern Wisconsin, USA, was measured. In [20], the authors describe the usage of automated pheromone-baited traps, utilizing recording sensors and data loggers to collect unique date–time stamps of males as they entered the trap; for further references, please also see [21,22,23].
As a result of the use of ITs, a lot of research has been conducted to determine the effectiveness of traps, such as in [24], where the attraction and trapping capabilities of bucket- and delta-style traps with different pheromone emission rates for gypsy moths were compared. In [25], the performance of pheromone-baited traps to monitor the seasonal abundance of tortrix moths in chestnut groves was analyzed. In [26], the authors evaluated gravid traps for the collection of Culex quinquefasciatus; for further references, please also see [27,28,29,30]. Research was also carried out to estimate the range of attraction, such as in [31], where the authors presented a novel method for estimating the attraction range of a pheromone trap for the pine sawyer beetle Monochamus galloprovincialis. In [32], the range of attraction of pheromone traps to Agriotes lineatus and Agriotes obscurus was assessed. In [33], the authors assessed the attraction range of sex pheromone traps to male Agriotes click beetles in South-Eastern Europe. In [34], the authors addressed the space of the pheromone plume and its relationship with the effective attraction radius in applied models; for further references, please also see [35,36,37,38].

Work has also been conducted on the probabilities associated with insects, such as in [39,40]. Regarding the work in [39], the probability of detecting Caribbean fruit flies was addressed. Concerning the work in [40], regional gypsy moth population trends in an expanding population were predicted using pheromone trap catches and spatial analysis. This work on the probabilities associated with insects was conducted to better understand trap catches and to relate them to the absolute population density [41,42,43,44,45,46,47]. Regarding reference [41], the gypsy moth was used as the simulation model to interpret the capture of moths in pheromone-baited traps used for the surveillance of invasive species. Regarding the work in [44], the European pine sawfly was monitored with pheromone traps in maturing Scots pine stands. As for the work in [45], the autumn gum moth was monitored regarding the relationships between pheromone and light trap catches and oviposition in eucalypt plantations.
For several insect trap systems, a relationship was found between trap catches and subsequent egg mass [44,45,48,49] and larval density [50,51,52]. However, translating trap catches into absolute population density and, in particular, interpreting zero catches remain challenging at the quantitative level [12,24,41,53].
By gathering data on the target pest's existence, abundance, and dispersion, insect pest monitoring is often carried out in agriculture and forestry to evaluate the pest status in specific sites (such as a greenhouse, field, orchard/vineyard, or forest). The ultimate objective of insect pest monitoring within integrated pest management programs in agriculture is to give growers a useful decision-making tool. For instance, intervention thresholds are crucial for optimizing the control method and grower inputs for a given insect pest infestation in a particular field at the ideal time. Insect population outbreaks can be predicted using monitoring data to develop predictive phenological models, providing extra knowledge to enhance control methods and optimize the use of insecticides [54]. Similarly, forestry relies heavily on the detection and monitoring of both native insect pests and invasive species to set up effective management programs, as forest insect species can have a serious negative influence on the biodiversity, ecology, and economy of the afflicted area [55].
The impetus for this work stemmed from the necessity to monitor insects that invade crops. Monitoring insect populations can increase crop yield, as pesticides can be applied more efficiently; therefore, this work can contribute to precision agriculture [56]. On the other hand, the proposed technique for the detection and subsequent counting of insects, where the count corresponds to the number of retrieved bounding boxes, contributes to smart farming. To this end, YOLOv5 was used together with a tiled image-splitting technique in order to optimize the model's performance.
Images from insect traps acquired in open fields are subject to a wide variety of illumination conditions due to weather, day-cycle light, landscape elements that cast shadows (e.g., trees, buildings, mountains), etc. The camera trap setup is also subject to oscillations due to the wind, which may result in lower image quality due to motion blur. Trap imagery acquired in open fields may also contain objects other than insects, such as leaves that stick to the traps. Machine learning models that use images acquired under these conditions tend to achieve worse results since they need to deal with such variability. On the other hand, images acquired in the laboratory are usually captured under fully controlled conditions (constant illumination, no wind, etc.), while images captured in greenhouses may also be subject to some uncontrolled environmental conditions (e.g., illumination variability), though not as adverse as those affecting images captured in the fields.

This paper considers the much less controlled scenario of images acquired in tomato crop fields, aiming to evaluate the applicability of YOLOv5 for the detection of insects in yellow sticky traps.
2. State-of-the-Art
Insect populations that exceed the economic threshold can cause significant harm to plants and, hence, diminish yields. The quantity of pests at an observed location is frequently determined by visually inspecting the sticky surfaces of ITs and counting the captured insects, which is a time-consuming job [57]. To overcome this problem, many Internet of Things (IoT) systems supported by machine learning have been developed for monitoring ITs. This paper was developed in this direction, using images of ITs captured by an IoT system to detect, through machine learning, the number of insects present in the traps in the agricultural field. This section discusses some of the work that has been done in this area.
Deep learning was used to detect, identify, and count specific pest species in ITs in [58]. To reduce the impact of illumination variations on detection performance, a color correction variation [59] of the "gray-world" technique [60] was adopted. The authors suggested a sliding window-based detection pipeline that applies a convolutional neural network (CNN) to image patches at various locations to calculate the probability that they contain certain pests. Their work was inspired by algorithms proposed for pedestrian detection, analyzed in [61]. The final detections were produced via non-maximum suppression (NMS) [62] and thresholding of image patches based on their positions and related confidences. To evaluate the precision of the bounding boxes, the intersection-over-minimum (IoM) was computed. It was concluded that many of the errors occurred because the same moth could have various wing positions, occlusion levels, lighting circumstances, and decay patterns throughout time, indicating that the algorithm would improve in well-managed sites.
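For illustration, the following is a minimal sketch of how such an IoM score can be computed for two axis-aligned boxes; the (x1, y1, x2, y2) corner format and the function name are assumptions for this example, not details taken from [58].

```python
def iom(box_a, box_b):
    """Intersection-over-minimum of two boxes in (x1, y1, x2, y2) format.

    Unlike IoU, the intersection area is divided by the area of the smaller
    box, which is more forgiving when one box fully encloses the other.
    """
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    smaller = min(area_a, area_b)
    return inter / smaller if smaller > 0 else 0.0
```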
In [63], the authors' main objective was to create a model that detects whiteflies and thrips in sticky trap images under greenhouse conditions. They developed a model based on the faster region-based convolutional neural network (R-CNN), the "TPest-RCNN", and trained it using transfer learning with a public data set in the first phase. In the second phase, they fine-tuned the model on their own data set, starting from the weights obtained in the first phase. The model was proven to be accurate in detecting microscopic pests in images with varied pest concentrations and light reflections. It was also concluded that, for recognizing insect species in images captured on yellow sticky traps, the best results were achieved by the proposed model, beating the faster R-CNN architecture and techniques employing manual feature extraction (color, shape, texture).
The research in [64] focuses on a four-layer deep neural network based on light traps with a search and rescue optimization strategy for identifying leaf folders and yellow stem borers. The search and rescue optimization approach was employed in the deep neural network to find the ideal weights to enhance the convergence rate, reduce the complexity of learning, and increase detection accuracy. The proposed method achieved 98.29% pest detection accuracy.
The work proposed in [65] studies the monitoring of spotted wing drosophila ITs using image-based object detection with deep learning. The authors trained ResNet-18 deep CNNs to detect and count the insect in question. From an image captured from a static position, an area under the precision–recall curve (AUC) of 0.506 was obtained for females and 0.603 for males. From the observed results, it was concluded that it is possible to use deep learning and object detection to monitor these insects.
In [66], the authors performed automatic insect detection where they first used a spectral residual model; different color features were then extracted. In the end, whiteflies and thrips were identified using a support vector machine classifier. The classification accuracies for the whiteflies and thrips were 93.9% and 89.8%, respectively. As for the detection of the trap, a precision of 93.3% was obtained.
To identify whiteflies and thrips, the researchers in [67] presented an image-processing approach that included object segmentation and morphological processing of color features combined with classical neural networks. The images were acquired under controlled conditions, in a laboratory environment, from sticky traps brought in from greenhouses. The proposed algorithms achieved 96% and 92% precision, respectively.
In [68], a pheromone-trapping device was developed. In this work, the original image was cropped into several sub-images with 30% overlap. These sub-images were then used to train the tested models, and the original images were afterwards reconstructed with the detections performed. The results showed a mean average precision (mAP) of 94.7%.
Using IoT and deep learning frameworks, the work in [69] provided a real-time remote IT monitoring system and insect identification algorithm. The authors used the faster R-CNN ResNet 50, and an average accuracy (using different databases) of 94% was obtained.
The study in [70] used machine vision and deep learning to detect and count Aphis glycines automatically. To detect the insect, the authors used a sliding window approach with a size of 400 × 400 pixels to slide over the acquired images with a stride of 400 pixels. The image patch framed by the sliding window at each step was fed into the faster R-CNN developed by the authors. The results demonstrate the high potential of the proposed method.
In [71], the authors proposed using low-cost cameras to capture and upload images of insect traps to the cloud. The authors used R-CNN and YOLO models to detect the insects, whiteflies in this case, in yellow sticky traps. They used a public data set [72] for training the models. However, the images used for training were acquired under controlled illumination conditions. The authors do not explicitly state whether the images were split or used as a whole. The model with the best mAP was YOLOv5x, with a mAP of 89.70%.
The technique proposed in [73] combines high-tech deep learning with low-tech sticky insect traps. The authors propose a high-throughput, cost-effective approach for monitoring flying insects as an enabling step towards "big data" entomology. In this work, the traps were collected a few days after deployment, by which time they held a high number of insects, and images of them were only obtained after that collection, under laboratory and field conditions. The images were split into segments of 500 × 500 pixels. The authors concluded that the model was more likely to miss important images than it was to incorporate irrelevant ones.
Regarding the work in [74], the authors used yellow insect traps for the detection of Trioza erytreae and Scaphoideus titanus Ball using image-processing techniques and the faster R-CNN and YOLOv4 models. In order to promote the robustness of the models, images of the traps were taken by a 12-megapixel camera under different light conditions, backgrounds, and distortions. The authors did not split the images; that is, the models were trained with the images as wholes instead of tiles. The authors concluded that the models performed poorly both with and without image processing.
Considering the methodologies stated, open-source solutions may be employed to aid in the implementation of the detection process. In [75], this approach is followed, using the Computer Vision Annotation Tool (CVAT) (https://github.com/openvinotoolkit/cvat, accessed on 9 December 2021), which contains a feature for automatic annotation/labeling. This software can also be powered by Nuclio (https://nuclio.io/, accessed on 9 December 2021), a serverless technology that allows deploying trained models to CVAT. This tool was analyzed, and it was concluded that it could be interesting to use given the infrastructure of the project, as CVAT allows the creation and execution of annotation tasks and, with Nuclio, the deployment of trained models [76].
From the state-of-the-art, it is not always clear whether the images that feed the trained model were split into tiles. This is important because, when splitting the image in order to optimize the model performance, duplicated detections can arise. This problem is addressed in this paper and an approach to solve it is demonstrated. Furthermore, the main contribution of this paper is to test the application of YOLOv5 in detecting insects in traps (on tomato plantations in this case). The reviewed works using YOLOv5 usually relied on images acquired under controlled conditions (laboratory or greenhouse). Thus, this paper contributes to future developments of insect detection on split images using YOLOv5, with an approach that optimizes the performance of the trained model while avoiding duplicate detections. Furthermore, this paper contributes to the monitoring and detection of insects in crop traps and, consequently, to the prediction of events in the agricultural field, by providing a new metric to be analyzed and correlated with other data from the crop.
3. Materials and Methods
In this article, a method was developed to detect insects in ITs, namely yellow sticky cards, placed in agricultural fields. The work carried out in this article arose in the context of the AI for new devices and technologies at the edge (ANDANTE) project [77] and, consequently, the data used in this work were provided by project partners. To carry out this work, first, the images were prepared to feed the artificial intelligence (AI) model; then, the model was trained, and the results were evaluated and analyzed. This section presents the data set used and the pipeline of the developed method.
Given that there were no manual annotations on the provided images, the first stage of development was to manually annotate some yellow sticky cards and insects in the images. The open-source software CVAT, its application programming interface (API), and Nuclio (an open-source, managed serverless framework) were used in the developments described, making model training, manual and automatic detection, data management, and selection easier.
CVAT and its API allowed the creation of a website where all images were available and could be annotated manually and automatically. It was through CVAT that the bounding boxes of the yellow sticky cards and insects were manually created in the first phase. Through its API, it was possible to select images and access those same bounding boxes in the desired formats. With this access, everything was ready to start the development and training of the models with manual annotations. After the training, Nuclio was used to put the developed models into practice in CVAT; that is, it became possible to select a set of images in CVAT and apply the developed models to them, with immediate output of the results, in this case, the automatic bounding boxes of the yellow sticky cards and insects. Nuclio made it possible to incorporate the developed models together with the extra processing performed, such as the splitting of the images into tiles and their subsequent reconstruction, already with the respective automatic bounding boxes resulting from the annotations made by the model, thus providing CVAT with the coordinates of the bounding boxes to be placed on the image concerned. From the CVAT API, it is thus possible to obtain the bounding boxes present in each image and, consequently, the number of insects in the image in question.
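As an illustration of this workflow, the sketch below pulls the bounding boxes of a task through the CVAT REST API; the endpoint paths, which vary between CVAT versions, and the deployment address are assumptions, not the exact setup used in the project.

```python
import requests

CVAT_URL = "http://localhost:8080"  # hypothetical deployment address

def get_task_annotations(task_id, username, password):
    """Fetch all annotated shapes (bounding boxes) of a CVAT task."""
    session = requests.Session()
    # Token login; the returned key authorizes subsequent calls.
    resp = session.post(f"{CVAT_URL}/api/auth/login",
                        json={"username": username, "password": password})
    resp.raise_for_status()
    session.headers["Authorization"] = f"Token {resp.json()['key']}"
    # Rectangle shapes carry their corner coordinates in the "points" field.
    resp = session.get(f"{CVAT_URL}/api/tasks/{task_id}/annotations")
    resp.raise_for_status()
    return resp.json()["shapes"]

# The insect count of a task is simply the number of boxes, e.g.:
# n_insects = len(get_task_annotations(1, "user", "pass"))
```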
3.1. Data Set
The data set used was related to Portuguese tomato plantations in the Ribatejo region, namely Valada, Castanheira, and Lezíria, where the ANDANTE Portuguese partners collected the data. Information about the tomato crop fields can be found in Table 1.
The tomato cultivation fields where the data were collected were fully mechanized, from planting to harvesting. The crop consists of natural tomato varieties, obtained from cross-pollination, without any kind of genetic modification. Sowing took place in a greenhouse, starting at the end of January, and seedling production lasted about one-and-a-half to two months. The crop was staggered with a cycle of about 120 days, depending on the tomato varieties, and planting took place between the end of March and the beginning of June. Planting was done in 1.52 m wide ridges, with a density of about 33,000 plants per hectare and drip irrigation.
The data set used contains 5646 images of ITs captured by 12-megapixel webcams placed in front of the traps. The traps were composed of chromotropic cards with glue, yellow in this case in order to attract insects such as Bemisia tabaci. In addition, pheromones were placed in delta-type traps in order to attract the male insects so that they did not create offspring, such as Helicoverpa armigera. The chromotropic cards and pheromones were used in the biotechnical fight against pests. In the whole data set, only 4637 images were considered legitimate, since several did not correspond to ITs or were not adequate to improve the model's performance; these images were considered invalid. This filtering is shown in Table 2.
The images were captured every day between the dates shown in Table 2. The acquisition was mostly done between 11 a.m. and 8 p.m., at different times of the day (11 a.m., 11.30 a.m., 12 midday, 4 p.m., 4.30 p.m., 5 p.m., 7 p.m., 7.30 p.m., and 8 p.m.); usually, nine images were captured per day. The ANDANTE partners defined this configuration based on their understanding of the insects' behavior. Figure 1 presents an example image for each of the six traps utilized.
3.2. Method Pipeline
After an analysis of the images from the data set, a method was chosen in which the trap, i.e., the yellow sticky card, was first detected, and the insects present in that trap were then detected within the bounding box resulting from the trap detection.
Since ITs differ physically and are exposed to varied lighting circumstances during image acquisition, and because the colors of the insects were generally the same as the colors of the lines on the yellow sticky cards, we exclusively employed AI models for object detection; manual image-processing procedures were discarded despite being considered at an early stage. This choice is also supported by the literature review [63,78,79,80,81,82,83], from which it was observed that AI models are increasingly being used, performing better than and replacing more traditional methods that involve manual image processing. Regarding the work in [79], it was verified that a YOLO model could perform better than the model used in that research for segmenting blueberries from an input image. In [63], the authors concluded that the proposed faster R-CNN had better results than techniques employing manual feature extraction for detecting whiteflies and thrips from sticky trap images in greenhouse conditions.
The insect detection process went as follows: the yellow sticky card in the original image was detected; the resulting bounding box was divided into tiles; the insects on each tile were detected; and the original image was rebuilt with all bounding boxes. To improve the performance and results, cropping techniques were adopted [84]; the bounding box corresponding to the yellow sticky card, i.e., the result of the yellow sticky card detection model, was split into tiles, and these tiles were used to train the tested insect models. From the performed detection, the number of insects present in each image can be directly inferred.
Figure 2 depicts this pipeline split into two phases, A and B.
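To make the tile bookkeeping of phase B concrete, the following is a minimal sketch, under assumed tile size, overlap ratio, and detector interface (none of which are the exact values used in this work): the card crop is cut into same-sized overlapping tiles, the insect detector runs on each tile, and the tile-local boxes are shifted back into the coordinates of the card image.

```python
def tile_offsets(card_w, card_h, tile=640, overlap=0.2):
    """Top-left offsets of same-sized, overlapping tiles covering the card.

    The stride leaves a margin shared by neighboring tiles so that insects
    cut at one tile border appear whole in the adjacent tile.
    """
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(card_w - tile, 0) + 1, stride))
    ys = list(range(0, max(card_h - tile, 0) + 1, stride))
    if xs[-1] + tile < card_w:          # ensure the right edge is covered
        xs.append(card_w - tile)
    if ys[-1] + tile < card_h:          # ensure the bottom edge is covered
        ys.append(card_h - tile)
    return [(x, y) for y in ys for x in xs]

def detect_on_card(card_img, detector, tile=640, overlap=0.2):
    """Run the insect detector tile by tile; remap boxes to card coordinates.

    card_img is an (H, W, C) image array; detector maps a crop to a list of
    (x1, y1, x2, y2, confidence) tuples in tile-local coordinates.
    """
    h, w = card_img.shape[:2]
    boxes = []
    for x0, y0 in tile_offsets(w, h, tile, overlap):
        crop = card_img[y0:y0 + tile, x0:x0 + tile]
        for x1, y1, x2, y2, conf in detector(crop):
            boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, conf))
    # Overlapping tiles can report the same insect twice; a global
    # non-maximum suppression pass over `boxes` removes such duplicates.
    return boxes
```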
The YOLOv5 object detection model was used to perform the insect detection task. This choice is justified since YOLO is a widely used model that has been proposed for numerous object detection-based tasks, and its most recent version, the one used in this work, shows an increasing usage trend [81]. Considering this trend and other works already mentioned in Section 2, it was decided to use YOLOv5 due to its potential performance in object detection tasks. Transfer learning was applied to train the model for insect and yellow sticky card detection.
The YOLOv5 model has different versions (YOLOv5s with a small size, YOLOv5m with a medium size, YOLOv5l with a large size, and YOLOv5x with an extra large size), and the basic structures of all these versions are the same. Their differences lie in the size of the model, with a multiplier that influences the width and the depth of the network. Generally, the larger the model, the better the performance, at the expense of more processing time and required memory [85].
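For reference, the four variants can be instantiated through the published torch.hub entry points of the YOLOv5 repository; the snippet below is a generic illustration of loading them, not the training setup used in this work.

```python
import torch

# The four variants share one architecture; width/depth multipliers scale
# the network from "s" (small) up to "x" (extra large).
for variant in ("yolov5s", "yolov5m", "yolov5l", "yolov5x"):
    model = torch.hub.load("ultralytics/yolov5", variant, pretrained=True)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{variant}: {n_params / 1e6:.1f}M parameters")
```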
The parameters presented in Table 3 were used in all developments involving the use of YOLOv5.
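As a hedged illustration, transfer learning with YOLOv5 is typically launched through the repository's train.py, starting from pretrained weights; all hyperparameter values and file names below are placeholders, not the parameters of Table 3.

```python
import subprocess

# Fine-tune from a COCO-pretrained checkpoint on a custom insect data set.
# All values below are placeholders; the actual settings are those of Table 3.
subprocess.run(
    [
        "python", "train.py",
        "--weights", "yolov5s.pt",   # pretrained weights enable transfer learning
        "--data", "insects.yaml",    # hypothetical data set description file
        "--img", "640",              # training image (tile) size
        "--batch", "16",
        "--epochs", "100",
    ],
    check=True,
    cwd="yolov5",                    # assumes a local clone of the YOLOv5 repository
)
```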
The results of YOLOv5 were obtained and analyzed through the MLflow [86] integration. This integration made it possible to visualize the mAP_0.5, mAP_0.5–0.95, precision, recall, and loss during each training epoch. At the end of the training process, it was also possible to observe the F1 curve, as well as precision/recall curves. Of all the metrics obtained, due to the nature of the problem, the evaluation of the results was based on the mAP_0.5, mAP_0.5–0.95, precision, recall, F1 score, and F1 score curve.
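A minimal sketch of such an integration is shown below, assuming a hypothetical per-epoch history list; the metric names mirror those above, but the exact logging code used in this work is not reproduced here.

```python
import mlflow

# Hypothetical per-epoch validation results; in practice these values
# come from the YOLOv5 training loop.
training_history = [
    {"map50": 0.90, "map": 0.54, "precision": 0.93, "recall": 0.88, "loss": 0.05},
    {"map50": 0.92, "map": 0.57, "precision": 0.94, "recall": 0.89, "loss": 0.04},
]

with mlflow.start_run(run_name="yolov5-insects"):
    mlflow.log_params({"model": "yolov5s", "img_size": 640})  # placeholder parameters
    for epoch, m in enumerate(training_history):
        # One curve per metric, indexed by training epoch
        mlflow.log_metric("mAP_0.5", m["map50"], step=epoch)
        mlflow.log_metric("mAP_0.5-0.95", m["map"], step=epoch)
        mlflow.log_metric("precision", m["precision"], step=epoch)
        mlflow.log_metric("recall", m["recall"], step=epoch)
        mlflow.log_metric("loss", m["loss"], step=epoch)
```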
The mAP corresponds to the mean, over the N classes, of the interpolated average precision of each class (AP_i), given by the area under the precision/recall curve [87], and is calculated as follows:

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i$$
The precision measures the model's accuracy in classifying a sample as positive. It is calculated as the ratio between the number of positive samples correctly classified and the total number of samples classified as positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$
The recall of the model assesses its ability to recognize positive samples. The more positive samples identified, the larger the recall. The recall is computed as the ratio of positive samples that are properly categorized as positive to the total number of positive samples:

$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1 score combines the precision and recall of a classifier into a single metric by taking their harmonic mean:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
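To make these definitions concrete, here is a small worked example with assumed detection counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed counts: 47 insects detected correctly, 3 spurious boxes, 5 missed.
p, r, f1 = detection_metrics(tp=47, fp=3, fn=5)
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")
# precision=0.940, recall=0.904, F1=0.922
```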
5. Conclusions
This paper presents the use and performance of YOLOv5 object detection models for insect detection in yellow sticky traps, using images acquired in tomato crop fields. The insect detection process uses a sliding window approach that minimizes the appearance of duplicate detections in yellow sticky card IT images. The presented YOLOv5 model demonstrated robustness and resilience, performing well under various illumination and adverse exposure conditions. This work contributes to raising the bar for insect detection and monitoring. Furthermore, by creating another metric related to crop fields, this paper contributes to developments associated with forecasting events in the agricultural field, such as the appearance of diseases and pests.
There were limitations due to the absence of manual annotations of insects, which made it impossible to develop models for the detection and classification of insects trained with all available images.
The detection of the yellow sticky card and the corresponding training of AI models were performed in the first phase. In this phase, the best results were obtained using YOLOv5s, and it was possible to perform the detection of yellow sticky cards in all data sets.
The second phase was dependent on the first, as it used the bounding box associated with the detection of the yellow sticky card in order to improve the accuracy of the detections of the insects in the traps. At this stage, a problem that this paper contributes to solving was faced: how does one split the yellow sticky card bounding box image in a way that maximizes the quality of the model while not causing insects to be lost during the process of splitting and reconstructing the bounding boxes on the original image? The approach that generally yielded the best results was OSS, where the tiles were of the same size and overlapped, achieving 94.2% precision on the test set with the YOLOv5x model. It can be concluded that the presented approach and the YOLOv5 models have potential for the detection of insects in insect traps scattered in an agricultural field.
It is possible to develop an insect detection model that requires human supervision only occasionally, since the number and location of the bounding boxes may be inaccurate. However, these errors never occur in substantial quantities and can be mostly suppressed when associating the number of detections performed in an image with a group. This association has advantages at the time of data treatment and analysis.