Figure 1.
Model structure. The stride of the convolution operation is denoted as s.
Figure 2.
Structure of DSC.
Figure 3.
Structure of an inverted residual block. Expansion*6 denotes a channel expansion operation with the multiplier set to 6. Projection denotes the channel compression operation. The residual connection is used only when the stride is 1; there is no residual connection in the downsampling layers of each stage.
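As a reference for the block described above, the following is a minimal PyTorch sketch of a MobileNetV2-style inverted residual block consistent with this caption (1 × 1 expansion with a multiplier of 6, a depthwise 3 × 3 convolution, 1 × 1 projection, and a shortcut only when the stride is 1). The normalization and activation choices (BatchNorm, ReLU6) are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Illustrative MobileNetV2-style inverted residual block (not the paper's exact implementation)."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        # Shortcut only when stride == 1 and the shapes match (no shortcut in downsampling layers)
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # Expansion*6: 1x1 pointwise convolution widens the channels
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Depthwise 3x3 convolution: per-channel spatial filtering
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Projection: 1x1 pointwise convolution compresses the channels
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Example matching Stage 2 of Table 1 (56 x 56 x 24 -> 56 x 56 x 32, stride 1)
block = InvertedResidual(24, 32, stride=1)
print(block(torch.randn(1, 24, 56, 56)).shape)  # torch.Size([1, 32, 56, 56])
```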
Figure 4.
Patch Embedding. H, W, and C denote the height, width, and number of channels, respectively. * denotes multiplication.
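One plausible reading of this figure is sketched below, under the assumption that the patch embedding flattens the H × W × C feature map into H*W tokens of dimension C for the transformer branch; any additional linear projection or positional encoding is not specified in this caption.

```python
import torch

def patch_embed(x):
    """Flatten a (N, C, H, W) feature map into (N, H*W, C) tokens (illustrative sketch)."""
    return x.flatten(2).transpose(1, 2)  # (N, C, H*W) -> (N, H*W, C)

tokens = patch_embed(torch.randn(1, 168, 7, 7))  # e.g., the 7 x 7 x 168 map from Stage 5
print(tokens.shape)  # torch.Size([1, 49, 168])
```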
Figure 5.
Self-attention module.
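For readers unfamiliar with the operation, the sketch below shows standard single-head scaled dot-product self-attention. The embedding dimension, number of heads, and projection sizes used in MSCVT are not given in this caption, so the values below are placeholders.

```python
import torch
import torch.nn.functional as F

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence x of shape (N, L, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv                         # query/key/value projections
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (N, L, L) scaled similarity matrix
    return F.softmax(scores, dim=-1) @ v                     # attention-weighted sum of values

d = 168                                       # placeholder embedding dimension
x = torch.randn(1, 49, d)                     # e.g., 7*7 tokens
out = self_attention(x, *(torch.randn(d, d) for _ in range(3)))
print(out.shape)  # torch.Size([1, 49, 168])
```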
Figure 7.
Examples of diseases in the PlantVillage dataset: (a) Apple_Cedar_apple_rust, (b) Grape_Black_rot, (c) Peach_Bacterial_spot, (d) Potato_Early_blight, (e) Strawberry_Leaf_scorch, (f) Tomato_Bacterial_spot.
Figure 8.
Sample images from the Apple Leaf Pathology dataset: (a) Alternaria_Boltch, (b) Brown_Spot, (c) Grey_Spot, (d) Mosaic, (e) Rust.
Figure 9.
Examples of the augmented images: (a) original image, (b) decreased brightness, (c) increased brightness, (d) rotation, (e) horizontal flip.
Figure 10.
Loss variation curves of ResNet models on the PlantVillage training dataset.
Figure 11.
Accuracy variation curves of ResNet models on the PlantVillage testing dataset.
Figure 12.
Loss variation curves of other CNN models on the PlantVillage training dataset.
Figure 13.
Accuracy variation curves of other CNN models on the PlantVillage testing dataset.
Figure 14.
Confusion matrix on the PlantVillage dataset.
Figure 15.
Loss variation curves of ResNet models on the Apple Leaf Pathology training dataset.
Figure 16.
Accuracy variation curves of ResNet models on the Apple Leaf Pathology testing dataset.
Figure 17.
Loss variation curves of other CNN models on the Apple Leaf Pathology training dataset.
Figure 18.
Accuracy variation curves of other CNN models on the Apple Leaf Pathology testing dataset.
Figure 19.
Confusion matrix on the Apple Leaf Pathology testing dataset.
Figure 20.
Visualization results on Peach_Bacterial_spot. Darker red indicates regions to which the model pays more attention, and darker blue indicates regions the model largely ignores: (a) original image, (b) MSCVT, (c) VGG19, (d) ResNet101, (e) MobileNetV2.
Figure 21.
Visualization results on Strawberry_Leaf_scorch. Darker red indicates regions to which the model pays more attention, and darker blue indicates regions the model largely ignores: (a) original image, (b) MSCVT, (c) VGG19, (d) ResNet101, (e) MobileNetV2.
Figure 22.
Visualization results for the SA module. Darker red indicates regions to which the model pays more attention, and darker blue indicates regions the model largely ignores: (a) sample 1 (Tomato_Early_blight), (b) MSCVT on sample 1, (c) MSCVT without the SA module on sample 1, (d) sample 2 (Tomato_Septoria_leaf_spot), (e) MSCVT on sample 2, (f) MSCVT without the SA module on sample 2.
Table 1.
Input and output shapes (height × width × channels) of each stage. k denotes the number of disease categories.
| Stage | Input Shape | Output Shape | Stride |
|---|---|---|---|
| Stage 1 | 224 × 224 × 3 | 56 × 56 × 24 | 2 |
| Stage 2 | 56 × 56 × 24 | 56 × 56 × 32 | 1 |
| Stage 3 | 56 × 56 × 32 | 28 × 28 × 48 | 2 |
| Stage 4 | 28 × 28 × 48 | 14 × 14 × 88 | 2 |
| Stage 5 | 14 × 14 × 88 | 7 × 7 × 168 | 2 |
| IR | 7 × 7 × 168 | 7 × 7 × 320 | 1 |
| Average pooling | 7 × 7 × 320 | 1 × 1 × 320 | - |
| FC | 1 × 1 × 320 | k × 1 | - |
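As a small illustration of the classifier head implied by the last two rows (global average pooling of the 7 × 7 × 320 feature map followed by a fully connected layer with k outputs), here is a hedged PyTorch sketch; k = 38 corresponds to the 26 disease and 12 healthy categories of PlantVillage in Table 2, and the internals of Stages 1-5 and the IR layer are not reproduced.

```python
import torch
import torch.nn as nn

k = 38  # number of disease categories (26 disease + 12 healthy classes for PlantVillage)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # 7 x 7 x 320 -> 1 x 1 x 320 (average pooling row)
    nn.Flatten(),              # 1 x 1 x 320 -> 320
    nn.Linear(320, k),         # FC row: 320 -> k class scores
)
logits = head(torch.randn(1, 320, 7, 7))
print(logits.shape)  # torch.Size([1, 38])
```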
Table 2.
Distribution of the PlantVillage dataset.
| Crop Category | Number of Disease Types | Number of Healthy Categories | Number of Images |
|---|---|---|---|
| Tomato | 9 | 1 | 18,160 |
| Orange | 1 | 0 | 5507 |
| Soybean | 0 | 1 | 5090 |
| Grape | 3 | 1 | 4063 |
| Corn | 3 | 1 | 3852 |
| Apple | 3 | 1 | 3171 |
| Peach | 1 | 1 | 2657 |
| Pepper | 1 | 1 | 2475 |
| Potato | 2 | 1 | 2152 |
| Cherry | 1 | 1 | 1906 |
| Squash | 1 | 0 | 1835 |
| Strawberry | 1 | 1 | 1565 |
| Blueberry | 0 | 1 | 1502 |
| Raspberry | 0 | 1 | 371 |
| Total | 26 | 12 | 54,306 |
Table 3.
Distribution of the Apple Leaf Pathology dataset.
| Disease Category | Number of Images |
|---|---|
| Rust | 5694 |
| Brown_Spot | 5655 |
| Alternaria_Boltch | 5342 |
| Mosaic | 4875 |
| Gray_Spot | 4810 |
| Total | 26,376 |
Table 4.
Comparison of testing performance and training performance of MSCVT on the PlantVillage dataset.
| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| Train | 99.59 | 99.35 | 99.43 | 99.46 | 99.43 | 99.98 |
| Test | 99.86 | 99.82 | 99.72 | 99.77 | 99.82 | 99.99 |
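For clarity on how the reported columns relate to a multi-class confusion matrix (cf. Figure 14), the sketch below computes the metrics from a confusion matrix with macro averaging over classes; whether the paper uses macro or weighted averaging is not stated in this section, so macro averaging is an assumption.

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged metrics from a confusion matrix cm[i, j] = count(true class i, predicted class j)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    recall = tp / (tp + fn)              # per-class recall, identical to per-class sensitivity
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "recall": recall.mean(),
        "precision": precision.mean(),
        "f1": f1.mean(),
        "sensitivity": recall.mean(),
        "specificity": specificity.mean(),
    }

print(macro_metrics([[50, 2], [1, 47]]))  # toy 2-class example
```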
Table 5.
The performance of ResNet models on the PlantVillage testing dataset.
| Model | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| MSCVT | 99.86 | 99.82 | 99.72 | 99.77 | 99.82 | 99.99 |
| ResNet18 | 99.71 | 99.66 | 99.52 | 99.55 | 99.66 | 99.98 |
| ResNet34 | 99.75 | 99.72 | 99.62 | 99.63 | 99.72 | 99.98 |
| ResNet50 | 99.70 | 99.58 | 99.58 | 99.58 | 99.58 | 99.98 |
| ResNet101 | 99.79 | 99.65 | 99.74 | 99.69 | 99.65 | 99.99 |
Table 6.
The lightweight indicators of ResNet models on the PlantVillage testing dataset.
| Model | Param (M) | FLOPs (M) |
|---|---|---|
| MSCVT | 4.20 | 1035.78 |
| ResNet18 | 11.20 | 1823.52 |
| ResNet34 | 21.30 | 3678.22 |
| ResNet50 | 23.59 | 4131.69 |
| ResNet101 | 42.57 | 7864.39 |
Table 7.
The performance of other CNN models on the PlantVillage testing dataset.
| Model | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| MSCVT | 99.86 | 99.82 | 99.72 | 99.77 | 99.82 | 99.99 |
| VGG16 | 99.83 | 99.78 | 99.76 | 99.77 | 99.78 | 99.99 |
| VGG19 | 99.84 | 99.75 | 99.77 | 99.76 | 99.75 | 99.99 |
| MobileNetV1 | 99.71 | 99.60 | 99.61 | 99.60 | 99.60 | 99.98 |
| MobileNetV2 | 99.82 | 99.79 | 99.66 | 99.72 | 99.79 | 99.99 |
Table 8.
The lightweight indicators of other CNN models on the PlantVillage testing dataset.
| Model | Param (M) | FLOPs (M) |
|---|---|---|
| MSCVT | 4.20 | 1035.78 |
| VGG16 | 134.41 | 15,466.18 |
| VGG19 | 139.72 | 19,627.97 |
| MobileNetV1 | 3.26 | 587.93 |
| MobileNetV2 | 2.44 | 542.18 |
Table 9.
The performance of different studies on the PlantVillage dataset.
| Study | Year | Model | Accuracy (%) |
|---|---|---|---|
| Mohanty et al. [14] | 2016 | GoogleNet | 99.34 |
| Ferentinos [38] | 2018 | VGG | 99.53 |
| Kamal et al. [20] | 2019 | MobileNet | 98.65 |
| Kamal et al. [20] | 2019 | Reduced MobileNet | 98.34 |
| Gao et al. [39] | 2021 | DECA_ResNet18 | 99.74 |
| Sanida et al. [40] | 2021 | MobileNetV2 | 98.08 |
| Sutaji and Yıldız [22] | 2022 | LEMOXINET | 99.10 |
| This study | 2023 | MSCVT | 99.86 |
Table 10.
Comparison of testing performance and training performance of MSCVT on the Apple Leaf Pathology dataset.
| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| Train | 96.74 | 96.74 | 96.74 | 96.86 | 96.74 | 99.18 |
| Test | 97.50 | 97.52 | 97.51 | 97.51 | 97.52 | 99.36 |
Table 11.
The performance of ResNet models on the Apple Leaf Pathology testing dataset.
| Model | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| MSCVT | 97.50 | 97.52 | 97.51 | 97.51 | 97.52 | 99.36 |
| ResNet18 | 97.19 | 97.19 | 97.19 | 97.19 | 97.19 | 99.26 |
| ResNet34 | 97.19 | 97.25 | 97.19 | 97.21 | 97.25 | 99.28 |
| ResNet50 | 96.85 | 96.89 | 96.84 | 96.86 | 96.89 | 99.21 |
| ResNet101 | 97.02 | 97.06 | 97.01 | 97.04 | 97.01 | 99.25 |
Table 12.
The performance of other CNN models on the Apple Leaf Pathology testing dataset.
| Model | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) | Sens (%) | Spec (%) |
|---|---|---|---|---|---|---|
| MSCVT | 97.50 | 97.52 | 97.51 | 97.51 | 97.52 | 99.36 |
| VGG16 | 97.25 | 97.26 | 97.27 | 97.26 | 97.26 | 99.21 |
| VGG19 | 97.63 | 97.62 | 97.66 | 97.64 | 97.62 | 99.38 |
| MobileNetV1 | 95.60 | 95.56 | 95.53 | 95.53 | 95.26 | 99.02 |
| MobileNetV2 | 97.27 | 97.32 | 97.28 | 97.30 | 97.32 | 99.27 |
Table 13.
The lightweight indicators of ResNet models on the Apple Leaf Pathology testing dataset.
| Model | Param (M) | FLOPs (M) |
|---|---|---|
| MSCVT | 4.18 | 1035.76 |
| ResNet18 | 11.18 | 1823.52 |
| ResNet34 | 21.29 | 3678.22 |
| ResNet50 | 23.52 | 4131.69 |
| ResNet101 | 42.51 | 7864.38 |
Table 14.
The lightweight indicators of other CNN models on the Apple Leaf Pathology testing dataset.
| Model | Param (M) | FLOPs (M) |
|---|---|---|
| MSCVT | 4.20 | 1035.76 |
| VGG16 | 134.28 | 15,466.17 |
| VGG19 | 139.59 | 19,627.97 |
| MobileNetV1 | 3.22 | 587.89 |
| MobileNetV2 | 2.40 | 542.13 |
Table 15.
Ablation experiment results for SA module.
| Model | Accuracy on PlantVillage (%) | Accuracy on Apple Leaf Pathology (%) |
|---|---|---|
| MSCVT | 99.86 | 97.50 |
| MSCVT without SA module | 99.80 | 97.42 |