1. Introduction
Recent advances in artificial intelligence have enabled the integration of biological and environmental information within greenhouses, prompting extensive research on disease diagnosis and crop management aimed at improving the efficiency of greenhouse operations and crop production. In greenhouse crop production, machine learning techniques are applied to tasks such as disease detection and classification [1], analysis of crop phenotypes to identify optimal environmental conditions [2], and generation of environmental-information metadata for cultivation status analysis [3]. These tasks pursue increased productivity and profit through real-time feedback and compensation mechanisms based on the interaction between the environment and the crops. Automating operations that minimize disease-induced crop damage requires a diagnostic model capable of automatically acquiring crop images and environmental information and performing classification. Convolutional Neural Network (CNN) models currently demonstrate excellent performance and are widely used for image classification tasks [4]. Just as humans classify crop diseases more reliably when they know where symptoms appear and under which environmental conditions, a model can classify more accurately when it exploits the location of disease symptoms and the environmental information associated with where those symptoms occur. For example, when a CNN applies Global Average Pooling (GAP) to compute the average values of its feature maps, the detection performance for disease symptoms depends strongly on how clearly the symptom boundaries are distinguished in the original (RAW) image, because GAP averages over spatial information [5]. To address this, a diverse set of RAW images of the target disease symptoms is acquired from the outset for training, and data augmentation techniques are employed so that the model learns from varied forms of the data, improving its detection and classification accuracy.
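As a minimal illustration (not the paper's implementation), the following PyTorch sketch shows how GAP collapses each feature map to its spatial mean, which is why weak or blurred symptom boundaries dilute the pooled signal:

```python
# Minimal sketch of Global Average Pooling (GAP) after a CNN backbone:
# each feature map is reduced to its spatial mean, so the classifier
# sees one averaged activation per channel. Shapes are illustrative.
import torch
import torch.nn as nn

feature_maps = torch.randn(1, 512, 14, 14)  # (batch, channels, H, W)

gap = nn.AdaptiveAvgPool2d(output_size=1)   # average over the 14x14 grid
pooled = gap(feature_maps).flatten(1)       # -> shape (1, 512)

# Because GAP averages all spatial positions, a faint symptom boundary
# contributes little to the channel mean, degrading detection.
print(pooled.shape)  # torch.Size([1, 512])
```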
In particular, recent classification studies focus on designing and optimizing network structures to enhance feature extraction, under the assumption that dataset quality is fixed and immutable. By placing their emphasis on network optimization, they overlook how data-quality improvements through preprocessing and augmentation affect the detection model.
However, empirical observations reveal significant variations in detection accuracy for the same disease, depending on whether the model is deployed in the greenhouse environment in which it was trained or in a different one. The study in [6] illustrates that variations in data quality lead to differences in accuracy when classifying plant diseases and pests. The automation techniques in this paper follow a data-centric machine learning approach and seek efficient methods to automatically generate suitable datasets that enhance the performance of artificial intelligence models. Specifically, this research aims to establish an industrial foundation for collaborative and sustainable agriculture, incorporating pest control robots and improving analytical performance through the automation of data collection and preprocessing in smart agriculture. Inspired by these challenges, we designed and implemented a system that enhances the classification accuracy of crop disease images in real time by using an AI-based integrated environmental control system that ingests images acquired at regular intervals through a portable imaging device. The system performs real-time transformation, augmentation, and feedback of crop images, overcoming image differences caused by varying environmental conditions.
The structure of this paper is as follows: Section 2 discusses detection techniques of deep-learning models for disease data recognition and classification. Section 3 describes the plant disease and pest monitoring devices used to secure experimental and validation data, the collection of the experimental data, and the tomato disease diagnosis model built with these devices. Section 4 tests the performance of the disease classification system on the collected empirical data and reports the results. Finally, Section 5 summarizes the achievements of this paper and provides insights into expected outcomes and future research directions.
2. Related Work
In this study, the data classification technique for the five diseases affecting tomatoes (blight, powdery mildew, gray mold, leaf mold, and tomato yellow leaf curl virus) is divided into two modes: object detection mode and Region of Interest (ROI) mode. The object detection mode detects both the disease class and bounding-box information from the presented images, based on research in plant disease recognition [7,8,9,10,11]. In this mode, the system can detect multiple categories corresponding to different diseases within the same sample image.
As depicted in Figure 1, the Control Class is not a priority for the system in terms of object detection; during training, however, it provides features and information about potential anomalies. The Target Class, in contrast, comprises the classes that are the object detection objective. The key approach of this study is to progressively improve performance on the main (target) categories by training the model on the entire set of categories (control plus target) in the training dataset [12].
Furthermore, an imbalance in data quantities (imbalanced data) generally has a detrimental effect on training [13]. Hence, when composing the dataset, data augmentation should be applied at a different rate for each class, so that every class approaches the sample count of the largest class and the dataset becomes approximately balanced.
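A minimal sketch of this per-class balancing rule, with invented class names and counts (not the paper's actual dataset statistics), might compute an augmentation factor as follows:

```python
# Hedged sketch: choose a per-class augmentation factor so every class
# roughly matches the largest class. Counts below are illustrative.
class_counts = {"blight": 420, "powdery_mildew": 150,
                "gray_mold": 300, "leaf_mold": 90, "tylcv": 600}

max_count = max(class_counts.values())
aug_factor = {cls: max(1, round(max_count / n))
              for cls, n in class_counts.items()}
# e.g. leaf_mold (90 images) gets ~7 augmented variants per image,
# while tylcv (600 images) is left unaugmented.
print(aug_factor)
```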
As shown in Table 1, machine learning has been utilized to detect anomalies in various types of datasets. Logistic Regression (LR) is primarily used for binary classification, sorting data into 'normal' or 'anomalous' categories. It models the relationship between the data and the outcome by feeding a dataset pre-labelled as normal or anomalous into the logistic function; if the output surpasses a specific threshold, the sample is classified as anomalous, otherwise as normal [13]. However, LR assumes that the boundary separating the classes is linear, so its performance degrades when classifying high-dimensional data such as images [14].
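For illustration only, the following sketch applies this thresholded LR scheme to synthetic feature vectors (not data from the cited studies); the threshold value is an assumption to be tuned on validation data:

```python
# Minimal sketch of logistic-regression anomaly classification with an
# explicit decision threshold, on synthetic pre-labelled data (y=1 anomalous).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)),      # normal samples
               rng.normal(3, 1, (20, 4))])      # anomalous samples
y = np.array([0] * 200 + [1] * 20)

clf = LogisticRegression().fit(X, y)
scores = clf.predict_proba(X)[:, 1]             # P(anomalous | x)

threshold = 0.5                                  # illustrative; tune on validation data
pred = (scores > threshold).astype(int)
print((pred == y).mean())
```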
Random Forest (RF) is widely used for classification and regression problems; it independently trains multiple decision trees and combines their outputs to make the final prediction [15,16]. Although it achieves high accuracy across diverse data types and mitigates overfitting, its extended training time on very large datasets tends to make its predictions slower than those of other models [17].
Support Vector Machine (SVM) is a supervised learning algorithm for data classification. It uses a kernel function to map the data into a higher-dimensional space and finds the optimal decision boundary separating 'normal' from 'anomalous' data [18]. SVM performs well on both linear and non-linear data, preventing overfitting and improving generalization; however, it is time-consuming on large datasets and faces challenges in multi-class classification [18,19,20].
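As a toy illustration of the kernel mapping (synthetic data, not the cited experiments), the following sketch fits an RBF-kernel SVM to data that no linear boundary can separate:

```python
# Sketch of a kernel SVM separating two classes that are not linearly
# separable in the input space; the RBF kernel implicitly maps them to
# a higher-dimensional space where a separating boundary exists.
from sklearn.svm import SVC
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.score(X, y))  # non-linear boundary a linear model cannot fit
```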
Variational AutoEncoder (VAE) is a generative model used both for detecting anomalies in image data and for generating data; it learns the probability distribution of the data in order to generate new samples. The encoder takes image data (from the UCSD dataset in [21]) as input and maps it to a probability distribution over the latent space; a mean and variance are learned in this process, and data are generated by sampling from the latent space. The decoder reconstructs the input from the sample, producing data similar in form to the input [21]. However, VAE assumes a simple parametric distribution in the latent space and may struggle to model highly complex, multidimensional data distributions [22,23].
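The following compact PyTorch sketch, with illustrative dimensions rather than the UCSD setup of [21], shows the encoder/reparameterization/decoder flow and a reconstruction-error anomaly score:

```python
# Compact VAE sketch: the encoder maps an image to the mean/log-variance
# of a latent Gaussian, a sample is drawn with the reparameterisation
# trick, and the decoder reconstructs the input. 28x28 grayscale assumed.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 784), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterise
        return self.dec(z).view_as(x), mu, logvar

x = torch.rand(8, 1, 28, 28)
recon, mu, logvar = VAE()(x)
# Anomaly score: per-image reconstruction error; inputs unlike the
# training distribution reconstruct poorly and score high.
rec_err = ((recon - x) ** 2).mean(dim=(1, 2, 3))
print(rec_err.shape)  # torch.Size([8])
```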
Generative Adversarial Networks (GANs) are models capable of generating data highly similar to the input data, even though the generated samples do not actually exist. A GAN takes in data and learns its distribution; once this is achieved, its generative model creates data that closely matches the distribution of the input data [24]. However, the generative model can struggle to create diverse data and often encounters the mode collapse problem, generating only similar samples rather than a variety of data [25,26].
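A minimal single-step training sketch (toy data, not the cited experiments) shows the adversarial objectives that drive this distribution matching:

```python
# One GAN training step: the discriminator D learns to separate real
# from generated samples; the generator G learns to fool D.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0                  # stand-in "real" data
z = torch.randn(64, 8)                           # latent noise

# Discriminator step: real -> 1, fake -> 0
d_loss = (bce(D(real), torch.ones(64, 1))
          + bce(D(G(z).detach()), torch.zeros(64, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make D label fakes as real
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
# Mode collapse appears when G maps many z to nearly identical outputs.
```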
Faster Regions with Convolutional Neural Networks (Faster R-CNN) is a deep-learning model known for accurately and swiftly performing object localization and classification. Faster R-CNN sequentially trains a Region Proposal Network (RPN) and Region of Interest (RoI) heads. The RPN is trained on the images and corresponding objects in the training dataset: it generates candidate regions using anchor boxes, classifies each candidate as object or background, and refines the object's precise position. The RoIs extracted by the RPN are transformed into fixed-size feature maps through RoI Pooling, after which fully connected layers carry out the classification of normal and anomalous data [27]. However, when the data are imbalanced, training becomes difficult and the model learns a bias, lowering its accuracy in detecting anomalous data.
Studies on anomaly detection in various types of datasets are listed in Table 1.
Table 1. Studies on Anomaly Detection in Various Types of Datasets.
| Detection Technique | Study | Dataset | Performance |
|---|---|---|---|
| LR | Wright, R.E., et al. [13]; Wang, X., et al. [14] | Not defined | Not defined |
| RF | Breiman, L., et al. [15]; Kim, K., et al. [16]; Park, H., et al. [17] | BGP [6,16]; Sensor [17] | Acc: 0.9959 [16]; Acc: 0.99 [17] |
| SVM | Noble, W.S., et al. [18]; Wei, D., et al. [19]; García, S., et al. [20] | UCI [19]; CUT-13 [20] | Acc: 0.9937 [19]; Acc: 0.5 [20] |
| VAE | An, J., et al. [21]; Ghosh, P., et al. [22]; Xu, J., et al. [23] | UCSD [21]; MNIST [22,28]; CIFAR-10 [22,29]; CelebA [22,30]; PTB [23,31] | Acc: 0.99 [22] |
| GAN | Goodfellow, I.J., et al. [24]; Park, S.W., et al. [25]; Pei, S., et al. [26] | MNIST [24,28]; KDD 99 [24]; CIFAR-10 [26,29] | Acc: 0.9975 [26] |
| Faster R-CNN | Benjdira, B., et al. [27] | Not defined | Not defined |
This paper not only emphasizes the data-centric approaches discussed above but also describes devices that operate as part of various useful solutions, such as real-time monitoring and production-system support, farm management systems, farm monitoring systems, geographic information systems, and decision support systems for weather conditions. As discussed in Latino's research [28], drones, robots, and UAVs can be used not only for data collection but also for automating activities such as material management, with associated cost savings; they are also employed to discover crop diseases and evaluate food quality through image recognition. Farmers can achieve more efficient production and improve environmental monitoring through digital technology, big data, and analytical applications. Radogna's research [29] develops low-cost embedded devices that automatically detect food contamination, using Molecularly Imprinted Polymer (MIP) detection technology to continuously monitor the environment and identify problems in the pesticide-treatment stage. Although its targets and detection methods differ from those covered in this paper, that study likewise addresses early response through monitoring, cost reduction, and damage prevention, contributing to sustainable agriculture by increasing production efficiency. The analytical technology covered in this paper is built into a system that supports manual capture and input using general cameras, enabling low-cost use of artificial intelligence models in conjunction with robot automation. According to Ghobakhloo's research [30] and Ejsmont's research [31], the adoption of Fourth Industrial Revolution technologies for efficiency and sustainability, including the artificial intelligence technology discussed in this paper, is predicted to become possible in all fields in the future.
3. Design and Implementation of Tomato Disease Classification Using Real-Time Augmented Data
The two main methods proposed in this paper are an automation technology for plant disease and pest monitoring and a system of preprocessing and augmentation transformations, combined with that automation technology, for improving the analysis performance on plant disease and pest data, as depicted in Figure 2. The integrated platform for plant disease and pest diagnosis discussed in this study has the following structure. In a greenhouse with a favorable image-capturing environment, image-based analysis is performed in real time in normal mode. If data suspected to be disease symptoms (not identifiable as normal leaves) are persistently detected at specific locations in normal mode, augmented data are generated in conjunction with environmental information, and analysis is conducted at the locations showing abnormal signs. As depicted in Figure 3, analysis is thus conducted whenever abnormal signs are detected, even for diseases not well known to the user.
The disease image collection system used in this study is a crop image acquisition device developed by the Rural Development Administration's Smart Farm Development Division; its basic structure is depicted in Figure 4. As shown in Figure 5, the device consists of a robot-arm-mounted, PTZ-supported RGB camera used for disease recognition in crops, an adjustable lift, a light measurement sensor, temperature and humidity sensors, RTK-GPS for autonomous movement within the greenhouse, a linear motor, and a line-scan barcode scanner integrated into a mobile platform.
The deep-learning architecture of the model used in this study is the Faster R-CNN structure with a VGG-16 feature extractor, as illustrated in Figure 6. This Faster R-CNN consists of a CNN backbone, an RoI Pooling layer, and fully connected layers with two branches for classification and bounding-box regression. The RPN runs on the image fed into the backbone convolutional network: for every point on the feature map output by the backbone, the network learns whether an object is present at that location in the input image and estimates its size. The bounding-box proposals of the Region Proposal Network (RPN) are used by the RoI (Region of Interest) Pooling layer to pool features from the backbone feature map. The RoI Pooling layer operates as follows (a minimal sketch is given after this paragraph):
(a) selecting the regions of the backbone feature map corresponding to the proposals;
(b) dividing each selected region into a fixed number of sub-windows;
(c) performing max pooling over the sub-windows to produce a fixed-size output.
The currently implemented model can detect various tomato diseases, including blight, leaf mold, gray mold, white powdery mildew, and yellow leaf curl virus [32].
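Steps (a)-(c) correspond to what torchvision's roi_pool operator performs; the following sketch, with illustrative feature-map and box values, shows the fixed-size output fed to the fully connected layers:

```python
# Sketch of RoI pooling following steps (a)-(c); values are illustrative.
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 512, 50, 50)          # backbone feature map
# One proposal per row: (batch_index, x1, y1, x2, y2) in feature-map scale
boxes = torch.tensor([[0, 4.0, 4.0, 28.0, 20.0],
                      [0, 10.0, 12.0, 40.0, 44.0]])

pooled = roi_pool(feat, boxes, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 512, 7, 7]) -> fixed size for FC layers
```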
Faster R-CNN extracts features that can be analyzed independently of camera type and image size, and it enhances object recognition performance by proposing regions where objects are likely to be. The proposed fully convolutional network has the structure shown in Figure 7.
The parameters used are summarized as follows (a configuration sketch is given after the list):
- Approximately 100,000 iterations over 50 h
- VGG16 network architecture
- Training/testing/validation split of 80%/10%/10% (excluding the 308 unseen images used in the final experiment)
- Fine-tuning of a model pre-trained on the ImageNet dataset
- Data augmentation
- Batch normalization
- ReLU as the activation function
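A hedged sketch of such a configuration, using torchvision's ImageNet-pretrained VGG16 with batch normalization; the dataset path, class count, and optimizer settings are assumptions for illustration, not the paper's actual values:

```python
# Illustrative training setup matching the listed parameters: pretrained
# VGG16-BN backbone, augmentation transforms, and an 80/10/10 split.
import torch
from torch.utils.data import random_split
from torchvision import models, transforms, datasets

train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),          # part of the augmentation
    transforms.RandomRotation(90),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.vgg16_bn(weights=models.VGG16_BN_Weights.IMAGENET1K_V1)
model.classifier[6] = torch.nn.Linear(4096, 6)  # assumed: 5 diseases + control

dataset = datasets.ImageFolder("tomato_disease_images", transform=train_tf)  # hypothetical path
n = len(dataset)
n_train, n_test = int(0.8 * n), int(0.1 * n)
train_set, test_set, val_set = random_split(
    dataset, [n_train, n_test, n - n_train - n_test])

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```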
In this study, we set Regions of Interest (ROI) and performed object detection within disease images through the process shown in Figure 8. One significant consideration in this paper was how to preserve detection performance as far as possible even when farms and environments change. To train a deep-learning model robust to diverse environments, we employed a method in which the user selects areas of interest, focusing on image regions less affected by environmental changes. These selected areas were designated as unknown regions, and an ensemble technique was applied through iterative processes. Consequently, for the five disease areas, we achieved a classification accuracy of at least 88% per class and an average of 94%.
In this study, experiments on object detection mode were conducted as follows. Initially, the bounding boxes and labels of the existing dataset were modified and used as a baseline dataset. The entire model was trained on this dataset and its performance evaluated. Figure 9 illustrates the detection results on the baseline dataset.
The performance of a learning model is directly tied to the qualitative and quantitative quality of its datasets. However, the characteristics of the target diseases make it difficult to collect large amounts of real data from diverse environments for learning within-class and between-class variation. Consequently, in real-world deployments the system routinely encounters data it was never trained on, and constructing a dataset that covers all possible scenarios is infeasible. To enable the system to adapt, the learning model must be trained on new information. Furthermore, to handle situations where the system encounters new diseases or patterns it has not learned, techniques such as generating augmented data with methods like CycleGAN, or minimizing image distortion through physical (reflector-based) or software-based optical reflectance corrections, should be employed. This is crucial for handling suspected regions representing new (Unknown) diseases in real-world scenarios.
The strategies for handling these diseases are as follows:
(1) First, create new classes to hold the information on the novel diseases.
(2) Separate classes for the background area around the crops, for physically undamaged healthy leaves, and for cases showing distinct forms of a specific disease.
(3) Design units that learn feature responses before the final classifier; this helps prevent misclassification by the system until responses for the new diseases are obtained.
In this paper, we propose a system that collects disease data together with greenhouse information. Based on the greenhouse information, the disease data in the images are transformed and augmented to aid classification. The proposed design uses a standards-based integrated environmental control system with an AI model coupled to an automatic disease-image acquisition device; connected to this device, the system acquires and analyzes images and environmental data in real time. To achieve this, we constructed a pilot device incorporating a Faster R-CNN-based disease image classifier, the crop image acquisition device, and a JETSON NX board, including sensor nodes and the AI-based integrated environmental control system. Furthermore, we modified the entity-relationship model of the Smart Farm system's database to integrate a cloud-based Smart Farm system for image preprocessing with an integrated database for disease diagnosis services. This modification enables seamless integration of Smart Farm system data, preprocessing data for disease diagnosis, and new disease classes for future Smart Farm and disease-preprocessing data, allowing an organic service configuration. Integrating disease-diagnosis information into the database involved modifying and developing the entity-relationship model around the information most closely related to disease occurrence, such as cultivation, environmental, management, and facility-related data in the Smart Farm. The entity-relationship diagram (ERD) for storing disease-diagnosis information and results was designed with farm facilities and the environment in mind, as shown in Figure 10.
To recognize disease-imaging devices, we devised a system integrating current artificial intelligence models into a standards-based compound environmental control system. This system processes greenhouse environmental information and disease image data, improving the detection of previously undetected data and the classification accuracy of disease diagnosis through data transformation, augmentation, and feedback. The compound environmental control system considered in this research adheres to the KS X 3267 [33] and TTAK.KO-10.1172 [34] standards-based interfaces for device compatibility. It employs a Plug and Play (PnP) approach to recognize imaging devices and integrates a CNN model implemented in Python, enabling environmental data collection and image analysis. The system is based on open-source technologies and incorporates a 4-channel relay module and sensor nodes in an Arduino environment. For image processing, a classification scheme was established to handle unknown new diseases: images falling into the unknown class, i.e., unclassified image data, were augmented to create an enhanced dataset.
As depicted in Figure 11, the detection and classification stages of this study can be described as follows [35] (a pseudocode sketch is given after the list):
- This study focuses on recognizing 5 diseases (the baseline dataset).
- The deep-learning model is designed to recognize these 5 diseases. During testing, if the input image does not match the features of the developed model, it is recognized as "unknown".
- When data are identified as "unknown" by the model, the system can adapt to new diseases with the support of domain experts; more data corresponding to the new disease then need to be collected.
- To build an expanded dataset, new classes are added to the baseline dataset, and the deep-learning model is trained on this expanded dataset.
- This process fine-tunes the hyperparameters of the existing deep-learning model to learn the new parameters.
- Whenever new unknown data are input, the above procedure is repeated to extend the baseline model: the system recognizes diseases from the baseline dataset, and new diseases are incrementally added.
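The loop can be summarized with the following hedged pseudocode; every name and threshold is an illustrative placeholder, and the Faster R-CNN engine is stubbed out as a scoring function:

```python
# Hedged sketch of the "unknown" handling loop described above.
UNKNOWN_THRESHOLD = 0.5        # illustrative confidence cutoff
baseline_classes = {"blight", "leaf_mold", "gray_mold",
                    "powdery_mildew", "yellow_leaf_curl_virus"}
expanded_dataset = []          # (image, label) pairs for retraining

def classify(image):
    """Stub for the detector: returns (best_class, confidence)."""
    return ("blight", 0.3)     # low confidence -> treated as unknown

def handle_sample(image, expert_label):
    cls, conf = classify(image)
    if conf < UNKNOWN_THRESHOLD:                # no baseline class matches
        baseline_classes.add(expert_label)      # domain expert names the new class
        expanded_dataset.append((image, expert_label))
        # fine-tune the existing model on the expanded dataset here
    else:
        expanded_dataset.append((image, cls))

handle_sample(image="sample.jpg", expert_label="new_disease")
print(baseline_classes)
```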
The light data for correcting the images acquired with the image analysis device shown in Figure 12 were measured as solar radiation (W/m²), and an estimation formula for solar radiation was used to analyze the lighting conditions according to the weather. The solar radiation incident on the surface can be derived from the extraterrestrial solar radiation, denoted $I_0$. Equation (1) expresses $I_0$, the solar radiation before passing through the atmosphere, as a function of latitude, solar declination, and hour angle. Equation (2) expresses the coefficient $K_T$ related to cloud cover, i.e., the ratio of the solar radiation reaching the surface, $I$, to the extraterrestrial radiation $I_0$. These can be expressed by the following equations [32]:

$$I_0 = G_{sc}\left(1 + 0.033\cos\frac{360\,n}{365}\right)\left(\cos\varphi\cos\delta\cos\omega + \sin\varphi\sin\delta\right) \tag{1}$$

$$K_T = \frac{I}{I_0} \tag{2}$$

Equation (1) estimates the amount of insolation before passing through the atmosphere as a function of latitude $\varphi$, solar declination $\delta$, and hour angle $\omega$, where $G_{sc}$ is the solar constant and $n$ is the day of the year. Equation (2) expresses the clearness index $K_T$ as the ratio of $I$ to $I_0$.
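A small sketch of Equations (1) and (2), assuming the textbook extraterrestrial-radiation formula and solar constant (the paper's exact coefficients are not given):

```python
# Clearness-index computation from Equations (1)-(2); example inputs
# (day of year, latitude, declination, hour angle) are illustrative.
import math

G_SC = 1367.0  # solar constant, W/m^2

def extraterrestrial_radiation(n, lat_deg, decl_deg, hour_angle_deg):
    """Equation (1): I0 from day of year n, latitude, declination, hour angle."""
    phi, delta, omega = map(math.radians, (lat_deg, decl_deg, hour_angle_deg))
    eccentricity = 1 + 0.033 * math.cos(math.radians(360 * n / 365))
    return G_SC * eccentricity * (math.cos(phi) * math.cos(delta) * math.cos(omega)
                                  + math.sin(phi) * math.sin(delta))

def clearness_index(i_measured, i0):
    """Equation (2): K_T = I / I0, used to bin clear / intermediate / overcast sky."""
    return i_measured / i0

i0 = extraterrestrial_radiation(n=172, lat_deg=37.5, decl_deg=23.4, hour_angle_deg=0)
print(round(clearness_index(650.0, i0), 3))
```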
Figure 12. Image Analysis Device.
The image data are first analyzed in RAW format. If no detection occurs within 10 s, or if the similarity of the suspected region in the detection-result feedback is below 50%, the image brightness is adjusted in three stages (clear sky, partly cloudy, overcast) based on the solar-radiation estimation formula for the prevailing light conditions. Additionally, up to eight augmented images per original, including 90-, 180-, and 270-degree rotations as well as vertical and horizontal flips, are utilized. The system then verifies whether the detection or feedback data show an improvement in similarity of 10% or more [36].
The light intensity measured in the greenhouse, the ratio of direct to scattered light, and the light-saturation-point information for each season and crop were combined with the insolation estimate obtained from the above formula, as shown in Table 2. In addition, the system attempts to improve the recognition rate by recognizing environments in which disease occurs easily and deciding whether and how to apply image augmentation.
The light environment is classified into three stages (clear sky, intermediate sky, and overcast sky) based on the solar-radiation estimation formula and the environmental information. If image data are not detected within 10 s of analysis of the RAW data, or if the similarity in the detection-result feedback is below 50%, data augmentation is triggered using the light-environment information and the disease-occurrence probability information. Augmentation generates up to nine variants per image, such as brightness adjustment, 90-, 180-, and 270-degree rotations, and up/down and left/right inversion, as shown in Figure 13. The system was designed to verify whether similarity improves by 10% or more, based on the feedback data, after detection is attempted with the augmented data.
As shown in Figure 14, the AI-based integrated environmental control system automatically adjusts the environment (temperature, humidity, moisture, light, etc.) to the optimal settings for crop cultivation after the disease diagnosis is verified.
5. Conclusions
This study raised disease-diagnosis accuracy to 92.5% through incremental learning on the existing disease-diagnosis engine. In addition, the preprocessing technology that combines external light-environment information with environmental prediction information for disease occurrence in the greenhouse achieved 95.2% accuracy in the demonstration stage on disease symptoms occurring in an actual greenhouse environment. In other words, crop disease diagnosis technology that had remained at the laboratory level can achieve high discrimination accuracy in the field through real-time preprocessing. When the surrounding environment strongly affects identification, as with disease diagnosis in a greenhouse, incremental learning alone is limited in how far it can raise identification accuracy; moreover, training on a specific class may affect the existing disease-diagnosis engine and reduce identification accuracy.
In addition, this paper configures a compound environmental control system, including a reference sensor node, that can acquire images with a mobile imaging device and process image and environmental information together. An artificial intelligence classification model within the control system classifies augmented image data according to environmental changes and feeds the results back in real time, preventing diseases from going unclassified due to image loss caused by the light environment. By utilizing a standards-based compound environmental control system equipped with an artificial intelligence model linked to an automatic image acquisition device, the system acquires and analyzes images and environmental data in real time, which will contribute to improving the sustainability of the greenhouse.
From a scientific perspective, examining the plant disease diagnosis service reveals that by applying artificial intelligence in agricultural fields instead of human cognition, time and costs can be reduced. Additionally, the future integration of agricultural robot technology could lead to the development of intelligent robots that autonomously recognize crop diseases and perform pest control.
In terms of purely technological contributions, the limitations of existing research, which requires substantial resources and data, were overcome through performance improvement via preprocessing combined with environmental information.
From a societal standpoint, farmers with limited farming experience or those venturing into new crops due to climate change can make swift decisions for greenhouse pest management, enabling stable smart farming.
In future research, we plan to build a system that can analyze environmental information such as temperature and humidity, together with literature information on signs of disease, so that disease classification and prevention can be performed.