Semantic Segmentation Algorithm of Rice Small Target Based on Deep Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Multi-View Feature Extraction Module
2.2. Super-Resolution Feature Building Blocks
2.3. Semantic Segmentation Model
- (1)
- Spatial path construction via multi-branch convolution. In semantic segmentation tasks, existing networks usually adopt a fully convolutional structure consisting of a down-sampling and an up-sampling stage. However, down-sampling discards important spatial features of small objects, so accurate segmentation results cannot be recovered during up-sampling. We therefore design a multi-branch convolution module (MBC-module) to optimize the spatial branch network for extracting low-level spatial features. Internally, the module comprises a multi-branch convolutional layer and a connected dilated convolutional layer; the branch kernels are concatenated to obtain receptive fields of different scales, as shown in Figure 5. Bypass pruning is added to the multi-branch part to reduce the large number of convolution-kernel channels. First, a 1 × 1 convolution kernel realizes the interaction and integration of channel information and reduces the channel dimensionality. Larger kernels are then replaced with stacks of two smaller kernels, reducing the number of parameters while enhancing the nonlinear capacity of the model. In addition, 1 × 3 and 3 × 1 kernels replace the original 3 × 3 kernels to strengthen the width-wise and height-wise features, respectively. At the same time, bypass pruning reduces the number of kernel channels and the cost of parameter computation. The improved spatial path is shown in Figure 6.
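The parameter savings claimed for these kernel factorizations can be checked with a short arithmetic sketch. The channel count and the specific 5 × 5 → two-3 × 3 pairing here are illustrative assumptions, not values taken from the paper's figures.

```python
# Parameter-count sketch for the kernel factorizations described above.
# Channel count and kernel pairings are assumptions for illustration.

def conv_params(k_h, k_w, c_in, c_out):
    """Weights in a single convolution layer (bias ignored)."""
    return k_h * k_w * c_in * c_out

c = 64  # hypothetical channel count

# Two stacked 3x3 kernels cover the same receptive field as one 5x5 kernel
# but with fewer weights and one extra nonlinearity between them.
p_5x5 = conv_params(5, 5, c, c)
p_two_3x3 = 2 * conv_params(3, 3, c, c)
assert p_two_3x3 < p_5x5

# The asymmetric 1x3 + 3x1 pair is cheaper than a full 3x3 kernel.
p_3x3 = conv_params(3, 3, c, c)
p_asym = conv_params(1, 3, c, c) + conv_params(3, 1, c, c)
assert p_asym < p_3x3
```

With 64 channels the stacked pair uses 73,728 weights against 102,400 for the 5 × 5 kernel, and the asymmetric pair uses 24,576 against 36,864, which is the reduction the bypass-pruned branches rely on.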
- (2)
- Context path structure based on a feature pyramid. Unlike the spatial branch, which must preserve rich low-level spatial features, the context branch is designed to provide a larger receptive field. In the current investigation, balancing the demand for a large receptive field against the available computational power, we improve the context branch by combining a residual network with global average pooling. To merge the high-resolution shallow feature maps with the semantically rich deep feature maps in the context path, the lateral connections of a pyramid structure are added, as shown in Figure 7. A feature pyramid carrying rich semantic information at every scale can thus be built rapidly from a single image of a single size. The residual network in the context path quickly down-samples the feature maps to obtain a large receptive field, and these rapidly down-sampled maps encode rich semantic context. Furthermore, global average pooling is appended after the last residual module to provide a global receptive field. The improved context path is shown in Figure 8.
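The two operations this path relies on can be sketched in a few lines; the feature-map shapes and nearest-neighbour upsampling are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

# Minimal sketch of one pyramid lateral connection and global average pooling,
# on (C, H, W) feature maps. Shapes and the merge rule are assumed.

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_merge(fine, coarse):
    """Merge a fine map with the upsampled coarse map (element-wise sum)."""
    return fine + upsample2x(coarse)

def global_avg_pool(x):
    """Condense each channel to one value: a global receptive field."""
    return x.mean(axis=(1, 2))

fine = np.random.rand(8, 16, 16)   # high-resolution, low-level features
coarse = np.random.rand(8, 8, 8)   # down-sampled, semantically rich features
merged = lateral_merge(fine, coarse)
assert merged.shape == (8, 16, 16)
assert global_avg_pool(coarse).shape == (8,)
```

The merged map keeps the fine map's resolution while inheriting the coarse map's semantics, which is exactly what the lateral connections in Figure 7 are for.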
- (3)
- Design of the feature fusion module and attention refinement module. Since the two paths represent features at different levels, their outputs cannot simply be summed. Most of the information captured by the spatial path is spatial and encodes rich detail, while the output features of the context path mainly encode contextual information. In other words, the spatial path yields low-level features and the context path yields high-level features, so a dedicated feature fusion module is needed to merge them, as shown in Figure 9. We consider that fusing the two feature maps under a larger receptive field makes better use of the spatial information in the low-level features and the semantic information in the high-level features. Therefore, to avoid increasing the computational cost, we expand the receptive field with atrous convolution instead of conventional convolution. Atrous convolution enlarges the receptive field by inserting holes into the standard convolution kernel. Compared with ordinary convolution, atrous convolution has one additional parameter, the atrous rate, which specifies the number of spaces between adjacent elements of the filter, as shown in Figure 10. In the present study, a squeeze-and-excitation module is added to the attention refinement module to suppress channels carrying incorrect information and to increase the module's speed, as shown in Figure 11. The auxiliary module is connected to the backbone network in two different ways. In one branch, the output of the auxiliary module passes through a convolution before entering the main branch. The other branch adds an attention-mechanism structure to the deep auxiliary network between the connections of the two networks, in order to refine the output features of the auxiliary module.
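The effect of the atrous rate on the receptive field follows directly from its definition: a rate of r inserts r − 1 gaps between adjacent filter taps, so a k-tap kernel spans k + (k − 1)(r − 1) input positions at no extra parameter cost.

```python
# Effective span of an atrous (dilated) kernel as a function of the atrous
# rate: the rate inserts gaps between filter taps without adding weights.

def effective_kernel(k, rate):
    """Input span covered by a k-tap kernel with the given atrous rate."""
    return k + (k - 1) * (rate - 1)

assert effective_kernel(3, 1) == 3  # rate 1 is ordinary convolution
assert effective_kernel(3, 2) == 5  # a 3x3 kernel sees a 5x5 window
assert effective_kernel(3, 4) == 9  # larger rate, larger receptive field
```

This is why the fusion module can grow its receptive field without the parameter increase a larger dense kernel would require.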
The workflow can be divided into squeezing and excitation. First, the feature map is squeezed: average pooling condenses each two-dimensional channel into a single value, turning the stack of two-dimensional feature maps into a one-dimensional descriptor. This descriptor summarizes the distribution of feature responses across all channels and further improves the effect of feature learning.
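The squeeze-and-excitation step can be sketched as follows; the channel count, reduction ratio, and random weights are placeholders, not the paper's trained parameters.

```python
import numpy as np

def squeeze_excite(x, w_reduce, w_expand):
    """Squeeze-and-excitation sketch on a (C, H, W) feature map:
    pool each channel to one value (squeeze), pass it through a small
    bottleneck (excitation), and rescale the channels with sigmoid gates."""
    s = x.mean(axis=(1, 2))                        # squeeze: (C,H,W) -> (C,)
    z = np.maximum(s @ w_reduce, 0.0)              # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(z @ w_expand)))  # per-channel gate in (0,1)
    return x * gates[:, None, None]                # reweight each channel

c, r = 8, 2                      # assumed channels and reduction ratio
x = np.random.rand(c, 4, 4)
w_reduce = np.random.randn(c, c // r)
w_expand = np.random.randn(c // r, c)
y = squeeze_excite(x, w_reduce, w_expand)
assert y.shape == x.shape
```

Channels whose gate is driven toward zero are suppressed, which is how the module inhibits incorrect information channels as described above.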
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Rother, C.; Kolmogorov, V.; Blake, A. “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 2004, 23, 309–314. [Google Scholar] [CrossRef]
- Boykov, Y.; Jolly, M.P. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 1, pp. 105–112. [Google Scholar] [CrossRef]
- Tang, M.; Gorelick, L.; Veksler, O.; Boykov, Y. GrabCut in One Cut. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1769–1776. [Google Scholar] [CrossRef]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Pearlmutter, B. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Trans. Neural Netw. 1995, 6, 1212–1228. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [Google Scholar] [CrossRef]
- Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1911–1920. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149. [Google Scholar] [CrossRef]
- Niu, R.; Sun, X.; Tian, Y.; Diao, W.; Chen, K.; Fu, K. Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5603018. [Google Scholar] [CrossRef]
- Tian, Z.; He, T.; Shen, C.; Yan, Y. Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3121–3130. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
- Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large Kernel Matters—Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1743–1751. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
- Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar] [CrossRef]
- Roy, A.; Todorovic, S. A Multi-scale CNN for Affordance Segmentation in RGB Images. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2016; pp. 186–201. [Google Scholar] [CrossRef]
- Lu, Y.; Yaran, C.; Zhao, D.; Chen, J. Graph-FCN for Image Semantic Segmentation; Springer: Cham, Switzerland, 2020. [Google Scholar]
- Yuan, Y.; Chen, X.; Wang, J. Object-Contextual Representations for Semantic Segmentation; Springer: Cham, Switzerland, 2019. [Google Scholar]
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2018; pp. 334–349. [Google Scholar] [CrossRef]
- Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. Lednet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1860–1864. [Google Scholar] [CrossRef]
- Li, H.; Xiong, P.; Fan, H.; Sun, J. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9514–9523. [Google Scholar] [CrossRef]
- Wei, Y.; Xiao, H.; Shi, H.; Jie, Z.; Feng, J.; Huang, T.S. Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7268–7277. [Google Scholar] [CrossRef]
- Lee, J.; Kim, E.; Lee, S.; Lee, J.; Yoon, S. FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5262–5271. [Google Scholar] [CrossRef]
- Sun, G.; Wang, W.; Dai, J.; Van Gool, L. Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2020; pp. 347–365. [Google Scholar] [CrossRef]
- Fan, J.; Zhang, Z.; Tan, T. Employing Multi-estimations for Weakly-Supervised Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2020; pp. 332–348. [Google Scholar] [CrossRef]
- Chen, L.C.; Lopes, R.G.; Cheng, B.; Collins, M.D.; Cubuk, E.D.; Zoph, B.; Adam, H.; Shlens, J. Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2020; pp. 695–714. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Technical Report. arXiv 2015, arXiv:1505.04597. [Google Scholar] [CrossRef]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. Technical Report. arXiv 2016, arXiv:1511.00561. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision–ECCV, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar] [CrossRef]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. Technical Report. arXiv 2016, arXiv:1611.06612. [Google Scholar] [CrossRef]
- Islam, M.A.; Rochan, M.; Bruce, N.D.B.; Wang, Y. Gated Feedback Refinement Network for Dense Image Labeling. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4877–4885. [Google Scholar] [CrossRef]
| Model | MIOU (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|
| FCN-32s | 46.12 | 55.23 | 58.76 |
| FCN-8s | 53.86 | 68.71 | 70.54 |
| U-net [30] | 56.77 | 69.36 | 71.87 |
| SegNet [31] | 59.63 | 78.25 | 80.35 |
| CBAM [32] | 61.32 | 80.74 | 82.71 |
| RefineNet [33] | 63.94 | 82.52 | 83.65 |
| DeepLabv3+ | 63.45 | 83.97 | 85.93 |
| G-FRNet [34] | 64.91 | 85.19 | 86.42 |
| Network (ours) | 66.31 | 87.28 | 87.97 |
| Model | MIOU (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|
| FCN-32s | 47.43 | 56.54 | 61.07 |
| FCN-8s | 55.27 | 70.12 | 72.25 |
| U-net | 58.08 | 71.67 | 73.18 |
| SegNet | 61.74 | 79.76 | 81.66 |
| CBAM | 62.63 | 82.05 | 85.12 |
| RefineNet | 64.35 | 84.13 | 84.96 |
| DeepLabv3+ | 64.76 | 85.28 | 87.24 |
| G-FRNet | 66.32 | 86.62 | 87.23 |
| Network (ours) | 67.63 | 88.51 | 89.18 |
| Model | MIOU (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|
| FCN-32s | 44.97 | 54.08 | 57.61 |
| FCN-8s | 52.52 | 67.46 | 69.39 |
| U-net | 55.62 | 68.23 | 73.73 |
| SegNet | 59.08 | 77.19 | 79.27 |
| CBAM | 61.27 | 79.39 | 81.56 |
| RefineNet | 62.79 | 81.77 | 82.52 |
| DeepLabv3+ | 62.33 | 82.82 | 84.78 |
| G-FRNet | 63.86 | 84.24 | 85.37 |
| Network (ours) | 65.09 | 86.31 | 86.54 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, S.; Li, B.; Li, J.; Liu, B.; Li, X. Semantic Segmentation Algorithm of Rice Small Target Based on Deep Learning. Agriculture 2022, 12, 1232. https://doi.org/10.3390/agriculture12081232