1. Introduction
Rapid climate change has had a major impact on natural ecosystems and human life, and in particular on crop productivity [1]. An increase in temperature generally shortens the life cycles of plant pathogens while increasing their density, which also raises the possibility of mutants with strong pathogenicity emerging [2]. Warming also causes crop diseases and pests that were rare in the past and expands the areas vulnerable to plant diseases that previously occurred only in limited regions. Currently, the decline in crop yields due to plant diseases is estimated at around 16% worldwide [3], but if climate change continues, it may become very difficult to maintain current crop productivity. The problem of reduced crop productivity may therefore intensify in the future, while the skilled workforce available to address it continues to shrink.
Deep learning has made breakthroughs in the field of computer vision in recent years, and much research has explored its use for the early diagnosis of plant diseases [4,5,6,7,8,9]. Deep learning is well suited to diagnosing visual disease symptoms on the leaves and stems of plants because it can extract and learn high-level features from images. Using this approach, Mohanty et al. [10] studied plant disease recognition with deep convolutional neural network models. They classified 26 disease classes across 14 crop species using GoogLeNet and AlexNet models and reported an average accuracy of 99.3%. Ferentinos [11] tested the AlexNet, AlexNetOWTBn, GoogLeNet, OverFeat, and VGG models and pointed out that models trained on datasets generated in controlled experimental environments achieved 99.5% accuracy but did not perform comparably in real-world conditions: accuracy dropped to around 33% when the learned models were applied to real-world images. Hassan et al. [12] studied image-based plant disease recognition using a Shallow VGG network, which is constructed from only a few layers of a pre-trained VGG network and serves as a feature extractor for the input data. The authors proposed classifying the extracted features with Random Forest and XGBoost models and reported that both combined models achieve better classification performance than the vanilla VGG network. Lee et al. [13] studied image-based plant leaf disease recognition with recurrent neural networks. They point out that CNN-based classification models do not clearly capture disease regions and, to address this, proposed a network combining Gated Recurrent Units (GRU) with attention modules. Their experiments verified that, unlike CNNs, the proposed method captures clear disease regions, and they reported improved classification performance.
Although many advanced vision-based plant disease recognition studies using deep learning are in progress, some diseases are uncommon in nature, making it difficult to collect as much data for them as for healthy plant samples. Datasets for plant disease diagnosis collected from natural environments therefore often lack samples of these rare disease classes, a problem known in machine learning as imbalanced data. When trained on such imbalanced data, most supervised learning models suffer from overfitting, with decision boundaries biased toward the majority classes [14,15]. Previous work has addressed this with manual augmentation performed by skilled workers, but this is very inefficient and costly.
In this paper, we propose a data augmentation method based on image-to-image translation that can increase the sample diversity of diseased leaf datasets with too few examples. The proposed method performs translation between healthy and diseased leaf images using cycle-consistent generative adversarial networks (CycleGAN) [16]. However, vanilla CycleGAN produces poor plant leaf translation results, and we identify and improve on the following two problems: (1) poor reflection of the evident texture of the target disease, and (2) poor preservation of the shape of the input leaf image due to indiscriminate transformation of the background region. To solve these problems, we additionally employ an attention mechanism and a binary mask that explicitly indicates the location of the leaf. Attention mechanisms in deep learning are inspired by the human perceptual system, which selectively focuses on key information within the available input [17,18]; many studies combining attention mechanisms have demonstrated that they can significantly improve a model's feature representation ability [17,18,19,20,21]. We use the Convolutional Block Attention Module (CBAM) [22], a lightweight attention module that learns spatial and channel attention maps from a given feature map, to build a generator network that decides which features should be prioritized to deceive the discriminator. Specifically, we configure the generator as a Pre-Activation Residual Attention Network with CBAM and perform adaptive feature refinement during training. We also train with a background loss function that regularizes the generator not to perform unnecessary transformations in background regions. The background loss suppresses changes between the input and the translated image through a binary mask that explicitly marks the background of the input image; this helps the attention module focus quickly on the leaf itself, excluding the background where no transformation is allowed. With these improvements, we generate more plausible diseased leaf images than existing methods, and we conduct experiments to verify whether this data augmentation can further improve the performance of classification models for the early diagnosis of plant diseases. Using apple, potato, and grape leaf data from the Plant Village [23] dataset, we verify whether the proposed method resolves the problems of images produced by vanilla CycleGAN. We then compare the results of various classification models to verify whether datasets extended with images generated by the proposed method improve performance. The experimental results show that the proposed augmentation method alleviates the lack of diversity in hard-to-collect disease samples by generating usefully translated images that contribute to improved classification performance.
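As an illustration, the background constraint described above can be expressed as an L1 penalty on pixel changes outside the leaf mask, i.e., L_bg = ||(1 − M) ⊙ (G(x) − x)||₁. The following is a minimal sketch in pure Python with hypothetical toy images standing in for real generator outputs; the loss actually used in the paper may differ in weighting and normalization:

```python
def background_loss(real, fake, mask):
    """L1 penalty on pixel changes outside the leaf region.

    real, fake: images as nested lists [H][W] of floats in [0, 1].
    mask: binary leaf mask [H][W]; 1 = leaf, 0 = background.
    Returns the mean absolute difference over background pixels only.
    """
    total, count = 0.0, 0
    for r_row, f_row, m_row in zip(real, fake, mask):
        for r, f, m in zip(r_row, f_row, m_row):
            if m == 0:  # background pixel: any change is penalized
                total += abs(f - r)
                count += 1
    return total / count if count else 0.0

# Hypothetical 2x2 example: the leaf occupies the left column.
real = [[0.2, 0.8], [0.2, 0.9]]
fake = [[0.6, 0.8], [0.7, 0.7]]  # generator altered leaf AND one background pixel
mask = [[1, 0], [1, 0]]
print(background_loss(real, fake, mask))  # only right-column (background) pixels count
```

Minimizing this term leaves the generator free to modify masked leaf pixels while penalizing any change to the background, which is the behavior the attention module is meant to converge toward.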
The main contributions of this study are as follows:
We focus on the overfitting problem caused by imbalanced data, which arises because some disease images are rarely observed in natural environments, and propose a novel CycleGAN-based plant image augmentation method to address it.
We identify two issues that vanilla CycleGAN must address for image augmentation: (1) poor reflection of the evident texture of the target disease, and (2) poor preservation of the shape of the input leaf image due to indiscriminate transformation of the background region.
To improve CycleGAN's poor image translation results, we utilize an attention mechanism and a binary mask, and we mitigate the bias of classification models by constructing an extended dataset.
We conduct extensive experiments to verify that the proposed method alleviates the overfitting caused by imbalanced datasets.
2. Related Studies
Various data augmentation studies using Generative Adversarial Networks (GAN) have recently been conducted for the early diagnosis of plant diseases. Most aim either to correct problems with existing models so that more realistic images can be generated, or to mitigate overfitting by supplementing insufficient data.
Wu et al. [24] augmented tomato disease data with DCGAN (Deep Convolutional GAN) [25]. Their work demonstrates that samples generated by DCGAN can improve the efficiency of classification models. However, the generated samples were found to be unclear and unlikely to occur in reality, and DCGAN offers no control over the class of the generated image.
Deng et al. [26] proposed RHAC_GAN, an improvement on ACGAN (Auxiliary Classifier GAN), for tomato disease data augmentation. RHAC_GAN addresses the lack of diversity among generated images by feeding hidden variables obtained from the discriminator to the generator along with its usual inputs. The authors attribute this lack of diversity to traditional ACGAN generators having difficulty learning the varied information within a class. In addition, they combined an attention module with residual blocks so that the generated images reflect distinct disease features. Plant leaf images generated by RHAC_GAN are sharper and more realistic than those from DCGAN and ACGAN.
Cap et al. [27] proposed LeafGAN, which addresses CycleGAN's problem of changing unwanted content such as the background when transforming plant leaf images. Their contribution is an LFLSeg module that segments leaf regions relatively simply and is used to control the transformation of background regions. The authors proposed a background loss function to restrain the generator from transforming the background area, and fed the discriminator only inputs with the background removed so that it does not respond to background regions. Their experiments, conducted on data collected in natural environments, reported partial improvements in classification performance when training on a dataset augmented by the proposed model.
Xu et al. [28] proposed a Style-Consistent Image Translation (SCIT) model that converts plant images to the target domain while preserving label-independent style information during translation. To this end, they proposed a style loss computed with a pre-trained VGG19 network and verified experimentally that the amount of change toward the target domain can be adjusted by varying the coefficient of the style loss.
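Style losses of the kind computed from a pre-trained VGG19 are typically based on Gram matrices of feature maps, which capture channel correlations independently of spatial layout. A minimal sketch in pure Python, with small hypothetical feature maps standing in for real VGG19 activations (SCIT's exact formulation may differ):

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as [C][N] (channels x flattened spatial)."""
    c, n = len(features), len(features[0])
    return [[sum(features[i][k] * features[j][k] for k in range(n)) / n
             for j in range(c)] for i in range(c)]

def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

# Two hypothetical 2-channel feature maps with 4 spatial positions each.
fa = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
fb = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(style_loss(fa, fa))  # identical styles -> 0.0
print(style_loss(fa, fb))  # differing channel statistics -> positive
```

Because the Gram matrix discards spatial positions, penalizing its difference constrains texture statistics rather than content, which is why such losses can preserve style while the content is translated.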
5. Discussion
Data generation research using GANs is being actively conducted across various domains. However, GAN models are very difficult to train well and usually require large amounts of data for successful results. In this study, we focused on improving the expressiveness of the model through an attention mechanism rather than by increasing the amount of data. With the amount of training data fixed, partially modifying the model in this way produced clearer images than the original CycleGAN. A particular strength of the proposed method is that it preserves the shape of the input image very well, yielding much clearer transformation results than the GAN-based plant leaf augmentation technique of Wu et al. [24] and the conditional GAN-based technique of Abbas et al. [40]. However, because the proposed method learns a 1:1 mapping, the number of augmented images is limited by the number of healthy leaf images. There is room for improvement here through follow-up research with models such as StyleGAN [41] and StarGANv2 [42]. In addition, some diseases significantly deform the shape of healthy leaves; since the proposed method is regularized to preserve the original input shape, it cannot reflect such disease-induced shape changes. Future research on generative models that account for the morphological transformation of diseased leaves is therefore needed. Beyond disease texture, transformations involving morphology could be explored with the UGATIT [43] model, which can perform translation that considers the shape of the target domain.
6. Conclusions
Climate change has had a major impact on crop productivity, and if it continues at its current pace, maintaining current crop production is expected to become very difficult. Deep learning-based studies on the early diagnosis of plant diseases are actively underway to address this, but some rare disease samples are difficult to collect and therefore increase the bias of learning models.
In this study, we therefore proposed a data augmentation technique based on cycle-consistent generative adversarial networks to increase the diversity of rare disease samples. The core of the study is to address the imbalance in plant disease data by translating commonly collectible healthy plant leaf images into diseased leaf images. We also proposed a Pre-Activation Residual Attention Block to improve the representation ability of the model, allowing the generator to select the features that should be considered most important during domain transformation. Furthermore, a background loss function was trained to suppress unnecessary transformations between the input leaf image and the fake leaf image generated from it. Experiments were conducted to verify whether plant leaf images generated by the proposed method could further improve the performance of classification models for the early diagnosis of plant diseases, evaluated with Precision, Recall, and F1 score. The models used in the experiments were ResNet-18, DenseNet-161, MobileNet-v2, and EfficientNet-b0, all fine-tuned from pre-trained weights. For all three plants, the highest F1 score was achieved when EfficientNet was combined with the proposed data augmentation method, and improved FID scores and classification performance were achieved overall compared to CycleGAN. These results confirm that the data augmentation method proposed in this study improves the generalization performance of various classification models.
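For reference, the evaluation metrics used above are defined as Precision = TP/(TP+FP), Recall = TP/(TP+FN), and F1 = 2·P·R/(P+R). A short self-contained example with hypothetical confusion-matrix counts (not taken from the paper's experiments):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts for one disease class: 90 true positives,
# 10 false positives, 30 false negatives.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(p, r, f1)  # 0.9 0.75 ~0.818
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that trades one for the other, which is why it is a more informative summary than accuracy on imbalanced disease datasets.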