1. Introduction
Insect infestations are a critical issue in agriculture, with significant impacts on crop productivity, economic stability, and environmental sustainability. The Food and Agriculture Organization of the United Nations (FAO) reports that each year, pests are responsible for up to 40% of global crop losses, leading to economic damages exceeding USD 70 billion [1]. These challenges are further intensified by climate change, which contributes to increased outbreaks of diseases, pests, and viruses, as well as the development of pesticide resistance in insect populations [2,3,4,5]. To mitigate these losses, farmers have historically relied heavily on chemical pesticides. However, this approach has raised concerns due to its environmental repercussions, potential health risks, and challenges in maintaining long-term agricultural sustainability [6]. Therefore, early and precise detection of pests using technological tools is essential for deploying timely management strategies that help reduce both economic and environmental costs [7].
Conventional pest monitoring methods, such as manual counting on traps and visual inspections, are still widely practiced but are often labor-intensive, prone to human error, and may delay crucial interventions [8]. These limitations can result in unchecked infestations, which increase the need for repeated pesticide applications, compounding costs, and ecological impact [9]. Recent advancements in Artificial Intelligence (AI) and the Internet of Things (IoT) offer promising solutions through automated pest detection systems that enhance efficiency and accuracy, providing scalable alternatives for agricultural monitoring [10,11].
In particular, Deep Learning (DL) models have shown considerable promise in identifying specific insect species from images captured by cameras [12]. In recent years, Convolutional Neural Networks (CNNs) in DL have driven significant advancements in computer vision, particularly in the area of general object detection [13]. Various DL architectures, including YOLO (You Only Look Once) [14], have proven effective in detecting small, challenging targets like insects.
Several studies have employed these methods for pest detection in agricultural settings. For example, Kumar et al. (2023) [15] developed a YOLOv5-based insect detection system, enhanced with attention modules to improve recognition accuracy, achieving a mean Average Precision (mAP) of 93% on a custom pest dataset. Similarly, Verma et al. (2021) [16] applied YOLO v3, v4, and v5 algorithms for insect detection in soybean crops, finding YOLO v5 to be the most accurate, with an mAP of 99.5%. Liu et al. (2019) [17] proposed PestNet, a DL framework that achieved an mAP of 75.46% for multi-class pest detection on 80,000 labeled pest images, using a region-based end-to-end approach. Zhong et al. (2018) [18] developed a DL system using a multi-class classifier to identify and count six species of flying insects, utilizing a modified YOLO framework along with image augmentations to enhance the dataset.
In another study, Giakoumoglou et al. (2022) [19] evaluated YOLO-based models for identifying black aphids and whiteflies on adhesive traps, achieving an mAP of 75%, while in a more recent study [20] they introduced a synthetic data generation method, “Generate-Paste-Blend-Detect”, for agricultural object detection, achieving an mAP of 66% for whiteflies using YOLOv8 and effectively reducing the need for extensive annotated datasets. Xie et al. (2015) [21] introduced a crop insect recognition method using sparse representations and multiple-kernel learning, achieving 97% accuracy across 24 insect classes. In another relevant study, Liu and Wang (2020) [22] created a dataset featuring tomato pests and diseases for detection in natural settings, comprising 15,000 images across 12 distinct classes. Among various models tested, an improved YOLOv3 with Darknet53 yielded the best performance, achieving an mAP of 92.39%. Lastly, Gutierrez et al. (2019) [23] assessed computer vision, ML, and DL techniques for pest detection in tomato farms, finding DL to be the most effective approach.
Forecasting insect population growth has also become an important aspect of pest management, as it allows for proactive intervention strategies. Machine Learning (ML) and time-series models, which incorporate environmental factors like temperature and humidity, have proven valuable in enhancing prediction accuracy [24]. Recent research emphasizes various approaches to pest population prediction, exploring both statistical and ML models in diverse agricultural contexts. Bahlai (2023) [25] highlighted the complexities in forecasting insect dynamics due to species diversity, environmental variability, and the limitations of current modeling techniques. Integrated single-species monitoring and near-term iterative approaches show promise for improving accuracy while balancing generalization. Marković et al. (2021) [26] demonstrated that incorporating extended weather data improved pest occurrence predictions, achieving an accuracy of 86.3% for insect detection over five-day periods. Skawsang et al. (2019) [27] applied Artificial Neural Networks (ANNs), random forests, and multiple linear regression to predict brown planthopper populations in rice fields using meteorological and satellite-derived crop phenology data. The ANN model proved most accurate, achieving a Root Mean Square Error (RMSE) of 1.686, outperforming both random forests and linear regression. Similarly, Rathod et al. (2021) [28] developed a climate-based prediction model for Asian rice gall midge populations, with the ANN model achieving the best results.
For greenhouse pest management, Chiu et al. (2019) [29] used ARIMA and ARIMAX models to forecast greenhouse whitefly populations, incorporating environmental data. The ARIMAX model proved the most effective, achieving an RMSE of approximately 1.30 for 7-day forecasts and providing valuable insights for pesticide scheduling. Kawakita and Takahashi (2022) [30] further demonstrated the potential of seasonal ARIMAX in predicting common cutworm population dynamics. By incorporating past temperature data, especially during key developmental stages, the ARIMAX model provided reliable forecasts, making it a powerful tool for proactive pest management.
Building on these developments, this study presents an integrated approach for monitoring and forecasting black aphid populations in cucumber cultivations, leveraging DL, ML, and time-series models. The methodology combines the YOLO object detection framework, optimized for real-time identification of black aphids on adhesive traps, with the ARIMAX model to predict population trends based on environmental data. By enabling both immediate detection and forecasting, this system facilitates timely and informed pest management interventions, aiming to reduce reliance on chemical pesticides and improve crop protection. Additionally, this study explores the potential for integrating these models into mobile applications, promoting accessible, scalable solutions within Agriculture 4.0 and advancing sustainable pest management practices.
The use case of this study focuses on detecting black aphids in cucumber cultivations, a pest that poses serious risks to crop health and yield [31]. Black aphid infestations can lead to significant yield reductions, causing both economic losses and increased vulnerability to secondary infections [32,33]. Although there are aphid-resistant cucumber varieties, biological controls, and chemical pesticides, these measures alone often fall short of achieving effective pest management [34,35]. Therefore, digital technologies that enable early detection and timely alerts for growers are essential to prevent crop damage and support sustainable agricultural practices.
2. Materials and Methods
The methodology for this study involved the systematic collection of images using mobile phone cameras from late October to early December 2023. Five pheromone sticky paper traps were consistently maintained in greenhouse facilities of the Laboratory of Agricultural Constructions and Environmental Control, at the University of Thessaly (UTH), in the area of Velestino (latitude 39°44′, longitude 22°79′, altitude 85 m), Greece. Throughout the study period, the number of sticky paper traps remained the same, with one replacement made due to a high number of captured insects, ensuring continued data quality. In parallel, sensors inside the greenhouse continuously captured environmental conditions, including ambient temperature, relative humidity, and barometric pressure. The imagery data were annotated, and the environmental data were transmitted to the Green Deal Decision Support System (DSS) of the H2020 PestNu project [36] at the Centre for Research and Technology Hellas (CERTH), forming the basis for developing AI models for insect detection and population forecasting. The insect detection model was integrated into a mobile application [37] to provide real-time monitoring for end users.
Figure 1 presents a high-level overview of the approach, illustrating the process from data acquisition in the greenhouse to the deployment of AI models for insect detection in a mobile application.
2.1. Image and Environmental Data Acquisition
During the data acquisition period, five pheromone-based sticky paper traps were consistently maintained within the greenhouse at the UTH facilities in Velestino, Greece. The number of traps remained constant throughout the study, with a single replacement made due to a high accumulation of insects on the sticky surface, ensuring consistent and reliable data collection. The traps were positioned systematically, and images were captured almost daily using mobile phone cameras from a distance of 30–40 cm. Image capturing occurred primarily on weekdays, with no data recorded during non-working days. This process resulted in a total of 220 images over the course of the 44-day study period (late October to early December) in a cucumber cultivation.
Figure 2a illustrates the deployment of the sticky paper traps in the greenhouse, while Figure 2b shows an image captured with black aphids stuck on the pheromone-based sticky paper.
The mobile phone cameras were positioned to capture the entire sticky paper surface, providing a comprehensive view of the captured insects for correct annotation. Alongside the imagery data, the environmental conditions inside the greenhouse were continuously monitored by deployed sensors. These conditions were automatically controlled and recorded every five minutes using a climate control computer (SERCOM, Automation SL, Lisse, The Netherlands). The sensors collected key data, including temperature, humidity, solar radiation, and atmospheric pressure. The environmental measurements were transmitted in real time to the DSS [36]. The integration of both imagery and environmental data into the DSS allowed for the subsequent development of AI models to detect black aphids and predict population dynamics. The dataset developed in the frame of this work is available for download at https://zenodo.org/records/14097660 (accessed on 12 November 2024).
2.2. Image Annotation and Augmentation Techniques
In any DL model, the accuracy and reliability of the results depend heavily on the quantity and quality of the data used for training. Given that the original dataset consisted of only 220 images of black aphids captured on pheromone-based sticky paper traps, it was essential to annotate and expand the dataset to provide sufficient data for model training. The annotation process was performed using Roboflow [38], resulting in 13,357 total annotations of black aphids, averaging 60.71 annotations per image.
To ensure the robustness of the DL model, it was crucial to augment the original dataset, which contained a relatively small number of images. Data augmentation is a vital step before training, particularly when dealing with limited datasets, as it helps prevent overfitting and enables the model to generalize better to unseen data. In this study, augmentations were designed to simulate real-world conditions observed within greenhouse environments. By artificially expanding the dataset, the diversity of the training data was enhanced without the need for additional image collection. Various augmentation techniques were applied, with a particular focus on changes that reflect typical variations in a controlled greenhouse setting. Adjustments in brightness, saturation, and exposure were implemented to account for fluctuations in lighting throughout the day. These augmentations help the model become more resilient to different light intensities and shadows that may affect the visual appearance of the insects.
The augmentations were applied specifically to the training set to artificially triple its size, ensuring that the model could handle a variety of environmental factors and conditions it might encounter in practice. The original dataset was therefore split randomly into training and validation subsets following an 80–20% ratio, resulting in 175 images for training and 45 images for validation. Applying the aforementioned augmentation techniques to the training set yielded 534 training images, bringing the total dataset size to 579 images after augmentation.
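Although the augmentations were generated in Roboflow, an equivalent photometric pipeline can be scripted; the sketch below uses the albumentations library, with transform ranges that are illustrative assumptions rather than the exact Roboflow settings.

```python
# Sketch: brightness/saturation/exposure-style augmentations comparable to
# those applied in Roboflow; the ranges are assumptions, and RandomGamma
# stands in for an exposure adjustment. Bounding boxes stay in YOLO format.
import albumentations as A

augment = A.Compose(
    [
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.0, p=0.7),
        A.HueSaturationValue(hue_shift_limit=0, sat_shift_limit=25,
                             val_shift_limit=0, p=0.7),
        A.RandomGamma(gamma_limit=(80, 120), p=0.5),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

def triple(image, bboxes, class_labels):
    """Return the original plus two augmented copies, tripling the set."""
    out = [(image, bboxes)]
    for _ in range(2):
        aug = augment(image=image, bboxes=bboxes, class_labels=class_labels)
        out.append((aug["image"], aug["bboxes"]))
    return out
```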
Table 1 provides a detailed overview of the dataset split after augmentation.
Figure 3 presents a histogram of the object count per image, illustrating the distribution of black aphid annotations across the dataset and helping to understand the varying insect densities in the captured images. The combination of annotation, augmentation, and preprocessing allowed for the creation of a robust dataset that is suitable for training DL models to detect black aphids effectively.
2.3. Deep Learning-Based Insect Detection
In this study, the task of detecting black aphids was addressed using several versions of the YOLO object detection algorithm [14]. Specifically, three versions of the YOLO algorithm were utilized: YOLOv5 [39], YOLOv8 [40], and YOLOv10 [41]. For each of them, the largest variants (’large’ and ’xlarge’) were chosen for training in order to achieve the best possible detection precision and processing speed, making them suitable for different operational environments. The YOLO framework has become well-established in the field of object detection due to its ability to handle this task with a single pass through the network, significantly improving inference times when compared to two-stage models, which first generate regions of interest and then classify them. By treating detection as a unified task, YOLO has been widely adopted for scenarios that demand fast and efficient detection, including real-time pest monitoring in agriculture.
The YOLO algorithm has undergone continuous development through multiple versions, with each version enhancing its ability to detect smaller objects, such as insects, which pose unique challenges in computer vision. All YOLO models—YOLOv5, YOLOv8, and YOLOv10—are well-suited for small object detection, making them effective for insect detection tasks. Each version brings its own set of improvements and advantages: YOLOv5 is known for its efficiency and speed, making it ideal for resource-limited environments; YOLOv8 and YOLOv10 introduce architectural refinements that boost detection accuracy and performance, especially for complex backgrounds. This progressive evolution of the YOLO architecture makes it an excellent fit for this study, where the precise and efficient detection of small insect targets is essential.
To prepare the YOLO models for the specific task of identifying black aphids, transfer learning from models pre-trained on the widely used COCO dataset [42] was employed. Transfer learning allowed the models to benefit from a foundation of general object detection knowledge, requiring less data to adapt to the specific task of pest detection. The models underwent a training process of 150 epochs, with early stopping triggered if no further improvements were observed after 20 epochs, thus preventing the models from overfitting and wasting computational resources. Input images were resized to 640 × 640, 1024 × 1024, and 1600 × 1600 pixels during training to test different configurations. The different resolutions allowed for a more comprehensive evaluation of the DL models’ performance at varying levels of detail and computational complexity. Finally, a batch size varying from 2 to 8 was used to maintain a balance between efficiency and accuracy.
Optimization was handled using the Stochastic Gradient Descent (SGD) algorithm, with a learning rate set at 0.01. The initial experiments also evaluated alternative optimizers, such as Adam and AdamW, but their performance metrics, particularly in terms of mAP50, were consistently lower compared to SGD. Consequently, SGD was selected as the most effective optimization algorithm for the final training process. The momentum was configured at 0.85 to facilitate smoother convergence, while a small weight decay factor of 0.0005 was introduced to help the models generalize better by reducing overfitting.
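For clarity, the training configuration described in this subsection can be expressed compactly with the ultralytics training API; the weight file and dataset configuration names below are placeholders, since the exact scripts are not reproduced here.

```python
# Sketch of the reported training setup using the ultralytics API; one run
# per model variant and input resolution. File names are placeholders.
from ultralytics import YOLO

model = YOLO("yolov10l.pt")  # COCO-pretrained weights (transfer learning)
model.train(
    data="black_aphids.yaml",  # hypothetical dataset configuration file
    epochs=150,
    patience=20,          # early stopping after 20 epochs without improvement
    imgsz=1600,           # repeated at 640 and 1024 in separate experiments
    batch=2,              # 2-8, depending on resolution and available VRAM
    optimizer="SGD",
    lr0=0.01,
    momentum=0.85,
    weight_decay=0.0005,
)
```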
The models were trained and evaluated on a high-performance computing setup, powered by an Intel Core i9-14900F 2.00 GHz processor, an RTX 4090 with 24 GB of VRAM, and 128 GB of RAM (Techniki AE, Thessaloniki, Greece).
2.4. Machine Learning Insect Population Prediction
The task of predicting black aphid population growth was addressed using standardized ML models and time-series models trained on environmental data collected throughout the crop cycle using the sensors inside the greenhouse. These data included daily measurements of temperature, humidity, barometric pressure, and black aphid counts obtained from the detection phase. The aim was to build a predictive system capable of forecasting black aphid population growth over the course of seven days, enabling proactive pest management. To obtain reliable inputs for environmental conditions on future days (i.e., the seven days following the prediction start date), weather forecast data from the Open Meteo API [43] were incorporated, allowing the models to make accurate predictions. Since the goal was to predict future values, the dataset was divided sequentially, with the initial 80% used for training and the subsequent 20% reserved for testing.
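As an illustration of this step, the sketch below pulls a seven-day forecast of the relevant variables from the Open-Meteo forecast endpoint and applies the sequential 80–20% split; the daily-mean aggregation and the column handling are assumptions about the data preparation rather than the exact implementation.

```python
# Sketch: fetch 7-day forecasts of the exogenous variables from Open-Meteo
# and split the historical series sequentially. Daily-mean aggregation is an
# assumption; coordinates are those reported for the Velestino site.
import requests
import pandas as pd

resp = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 39.44,
        "longitude": 22.79,
        "hourly": "temperature_2m,relative_humidity_2m,surface_pressure",
        "forecast_days": 7,
    },
    timeout=30,
)
hourly = pd.DataFrame(resp.json()["hourly"])
hourly["time"] = pd.to_datetime(hourly["time"])
future_exog = hourly.set_index("time").resample("D").mean()  # daily features

# Sequential (non-shuffled) 80-20 split of the historical daily records,
# where `history` is assumed to hold counts plus environmental columns.
n_train = int(len(history) * 0.8)
train, test = history.iloc[:n_train], history.iloc[n_train:]
```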
A combination of ML techniques and time-series forecasting models was employed to capture the complex relationships between environmental variables and population growth. Random forest models were utilized for both classification and regression tasks, alongside gradient boosting and Long Short-Term Memory (LSTM) networks. These models were selected due to their well-established performance in handling structured data and their ability to model complex interactions among features. The Leave-One-Out Cross-Validation (LOOCV) technique was used for model evaluation, which ensured a robust and unbiased performance assessment by iteratively training the models on all data points except one.
For the random forest models, both classifier and regressor variants were used. Each model was configured with 1000 estimators, a maximum tree depth of 7, and a minimum of 5 samples required for node splitting. These hyperparameters were selected to balance the model complexity and computational efficiency while capturing the underlying relationships in the dataset. Gradient boosting models were also employed using similar hyperparameters but with the addition of a learning rate set at 0.001, which allowed the models to improve their accuracy incrementally over successive iterations.
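Expressed with scikit-learn estimators (used here purely for illustration; the paper does not name the implementation library), these settings correspond to the following:

```python
# The reported hyperparameters for the ensemble models, as scikit-learn
# estimators; the feature matrix X and daily-count target y are assumed
# to have been prepared as described above.
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestClassifier,
                              RandomForestRegressor)

rf_reg = RandomForestRegressor(n_estimators=1000, max_depth=7,
                               min_samples_split=5, random_state=0)
rf_clf = RandomForestClassifier(n_estimators=1000, max_depth=7,
                                min_samples_split=5, random_state=0)
gb_reg = GradientBoostingRegressor(n_estimators=1000, max_depth=7,
                                   min_samples_split=5, learning_rate=0.001,
                                   random_state=0)
```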
LSTM networks, designed to handle sequential data, were particularly suited to this task, as they capture long-term dependencies in time-series data. The LSTM architecture consisted of two LSTM layers, followed by batch normalization layers to stabilize training, and a final dense layer for output. This model was optimized using a learning rate scheduler that adjusted the learning rate when no improvement was observed, preventing overfitting during the training process. The LSTM was trained over 200 epochs with a batch size of 64 and an initial learning rate of 0.001, ensuring it had enough capacity to learn the temporal patterns in the dataset.
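A minimal Keras sketch of this architecture is shown below; the layer widths and the input window length are illustrative assumptions, as the paper specifies only the layer types and the training schedule.

```python
# Sketch of the described LSTM: two LSTM layers with batch normalization,
# a dense output, and a plateau-based learning-rate scheduler. Layer widths
# and the 7-day input window are assumptions.
import tensorflow as tf

n_steps, n_features = 7, 4  # hypothetical window length and feature count
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_steps, n_features)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(monitor="loss",
                                                   factor=0.5, patience=10)
# model.fit(X_seq, y_seq, epochs=200, batch_size=64, callbacks=[lr_schedule])
```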
LOOCV was applied to evaluate the performance of all models. This cross-validation method is known for providing robust performance estimates, as it ensures that every data point is used for both training and validation. By iterating through the entire dataset, this method avoids overfitting and provides a comprehensive evaluation of model accuracy and reliability.
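With scikit-learn, this evaluation can be written compactly (reusing the rf_reg estimator and the assumed X and y from the sketches above):

```python
# Leave-One-Out Cross-Validation: each daily record serves once as the
# held-out sample while the model trains on all remaining records.
from sklearn.model_selection import LeaveOneOut, cross_val_score

scores = cross_val_score(rf_reg, X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print("LOOCV MSE:", -scores.mean())
```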
In addition to the ML models, time-series forecasting techniques were also used, including ARIMA (AutoRegressive Integrated Moving Average), ARIMAX (ARIMA with exogenous variables), and SARIMAX (seasonal ARIMA with exogenous variables). These models are designed to handle time-dependent data and were chosen for their ability to capture trends and seasonality in insect population dynamics. ARIMA models rely solely on past values of the target variable to make predictions, while ARIMAX incorporates external factors, such as environmental data, to improve the accuracy of forecasts. SARIMAX adds a seasonal component, which is particularly useful in agricultural studies where population patterns often follow seasonal trends.
For the ARIMA and ARIMAX models, the key parameters include p (the auto-regressive order), d (the degree of differencing), and q (the moving average order). These values were tuned using an exhaustive search across a predefined range to find the optimal combination for the dataset. The SARIMAX model required additional tuning of the seasonal order (s) to capture periodic fluctuations in the aphid populations. The same brute-force approach was used to fine-tune this parameter, ensuring that the model’s seasonal adjustments were well-calibrated.
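The sketch below illustrates such an exhaustive search using statsmodels’ SARIMAX class, which also covers ARIMA (no exogenous input or seasonality) and ARIMAX as special cases; the search ranges, column names, and candidate seasonal period are assumptions made for illustration.

```python
# Exhaustive (p, d, q) / seasonal-period search with statsmodels' SARIMAX,
# ranked by AIC. `train` holds the daily counts plus environmental columns
# and `future_exog` the forecast inputs (both assumed from earlier sketches).
import itertools
from statsmodels.tsa.statespace.sarimax import SARIMAX

exog_cols = ["temperature", "humidity", "pressure"]  # hypothetical names
best_aic, best_fit = float("inf"), None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    for s in (0, 7):  # non-seasonal, or a hypothetical weekly period
        seasonal = (1, 0, 1, s) if s else (0, 0, 0, 0)
        try:
            fit = SARIMAX(train["count"], exog=train[exog_cols],
                          order=(p, d, q),
                          seasonal_order=seasonal).fit(disp=False)
        except Exception:
            continue  # skip non-convergent or invalid combinations
        if fit.aic < best_aic:
            best_aic, best_fit = fit.aic, fit

# Seven-day forecast driven by the forecast weather as exogenous input.
pred = best_fit.forecast(steps=7, exog=future_exog[exog_cols])
```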
2.5. Evaluation Metrics
To assess the performance of the models employed in this study, a variety of evaluation metrics was utilized, each selected to provide insights into different aspects of model accuracy. For the DL models tasked with detecting black aphids, the key metrics used were precision, recall, and the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 50%. Precision quantifies the proportion of correctly identified instances among all the positive predictions made by the model, essentially measuring how often the model’s detections were accurate. Meanwhile, recall indicates the model’s ability to find all relevant instances, reflecting its capacity to avoid missing detections. The mAP is an important aggregate metric that provides a comprehensive view of the model’s overall performance. The mAP50 is a common benchmark for evaluating object detection tasks. Also, the detection speed (in seconds) was measured to assess the balance between model accuracy and inference speed.
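For reference, with TP, FP, and FN denoting true positives, false positives, and false negatives at the 50% IoU threshold, these metrics are defined as

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{c=1}^{N} \mathrm{AP}_c,
\]

where \(\mathrm{AP}_c\) is the area under the precision-recall curve for class \(c\) and \(N\) is the number of classes (here \(N = 1\), the single black aphid class).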
For the task of predicting insect population growth, different metrics were necessary, as this problem involves regression rather than classification. The primary metric used for evaluating the population prediction models was the Mean Squared Error (MSE). MSE is widely used in regression tasks to measure the average of the squared differences between predicted and actual values. It reflects how closely the model’s predictions align with the real-world data. A lower MSE value corresponds to higher prediction accuracy, indicating that the predicted values are closer to the observed values. This metric is particularly useful in understanding the variance between predicted and actual insect counts, making it a reliable choice for assessing the model’s performance in forecasting tasks.
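Formally, for \(n\) test observations with actual counts \(y_i\) and predicted counts \(\hat{y}_i\),

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 .
\]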
By applying these metrics, this study ensured that the evaluation was both comprehensive and targeted, providing a detailed picture of how well the models performed in both detection and forecasting tasks. These metrics allowed for a rigorous comparison of model effectiveness and were instrumental in determining their real-world applicability.
4. Conclusions
This study explored the detection and prediction of black aphid populations within a greenhouse environment with cucumber cultivation using DL and ML models integrated with real-time monitoring capabilities. By employing YOLO-based object detection models, the research demonstrated effective insect detection with varying model complexities and input sizes. Specifically, three different input image sizes—640 × 640, 1024 × 1024, and 1600 × 1600 pixels—were used, from which YOLOv10l emerged as the best-performing model, achieving an mAP50 of 89.1% with an inference speed of 0.134 s at an input size of 1600 × 1600. This model was particularly advantageous for accurate insect detection, balancing high detection accuracy and computational efficiency. Notably, YOLOv8, despite its newer architecture, underperformed compared to YOLOv5, highlighting that increased complexity does not always yield better results.
The environmental dataset used for population prediction spanned 44 days and included environmental variables, such as temperature, humidity, and barometric pressure, alongside daily insect counts. This limited timeframe presents a challenge; a longer data collection period could improve model robustness and accuracy. Nevertheless, the time-series ARIMAX model effectively captured population trends, outperforming more generic ML models, and demonstrated the importance of incorporating environmental data for predictive accuracy. The ARIMAX model achieved an MSE of 75.61, corresponding to an average deviation of 8.61 insects per day. Expanding the dataset could further refine this model for improved pest management.
By integrating the detection model into a mobile application, real-time monitoring of pest populations is made accessible to users in agricultural settings. However, future work should focus on incorporating the population prediction models directly into the mobile application. This integration would enable end users to receive both immediate detection data and short-term population forecasts, facilitating timely and informed pest control decisions.
Future work could involve developing models tailored to specific environmental and pest conditions. Moreover, extended deployments in open fields and greenhouses could periodically update the models, progressively enhancing their accuracy for early intervention in pest management. Regular and consistent cleaning or replacement of the pheromone sticky paper traps could also enhance the reliability of predictions by reducing noise from excessive insect accumulation. This study establishes a foundation for scalable, AI-driven pest monitoring solutions that support precision agriculture and sustainable pest management practices.