FASSD-Net Model for Person Semantic Segmentation
Abstract
1. Introduction
- Adaptation of the FASSD-Net model for two-class semantic segmentation (“human silhouette” and “background”).
- Reduction of computational complexity from the 45.1 GFLOPS that the original FASSD-Net model requires to segment 19 classes to 11.25 GFLOPS for two-class segmentation, roughly a fourfold decrease.
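As an illustration of the two-class setting, the following minimal sketch collapses a multi-class label map into a binary "human silhouette" versus "background" mask. It assumes the labels use the Cityscapes trainId encoding (person = 11, rider = 12); the function name `to_person_mask` and the `include_rider` option are hypothetical, and whether riders count as human silhouettes is not stated here, so that choice is left to the caller.

```python
import numpy as np

# Assumption: labels follow the Cityscapes "trainId" encoding, person = 11.
# Whether the rider class (trainId 12) belongs to the human silhouette is
# not stated in the text, so it is exposed as an option.
PERSON_TRAIN_ID = 11
RIDER_TRAIN_ID = 12


def to_person_mask(label_map: np.ndarray, include_rider: bool = False) -> np.ndarray:
    """Collapse a multi-class label map into 1 (person) and 0 (background)."""
    person_ids = [PERSON_TRAIN_ID]
    if include_rider:
        person_ids.append(RIDER_TRAIN_ID)
    # Every other value, including the ignore label 255, becomes background.
    return np.isin(label_map, person_ids).astype(np.uint8)


# Toy 2x3 label map: road (0), person (11), rider (12), ignore (255).
labels = np.array([[0, 11, 12],
                   [11, 255, 0]], dtype=np.uint8)
mask = to_person_mask(labels)
```

With the defaults, only pixels labeled as person map to 1; passing `include_rider=True` also marks rider pixels as foreground.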
2. Methods and Materials
2.1. FASSD-Net
- Reduced computational complexity, allowing its use in real-time applications.
- State-of-the-art mean intersection over union (mIoU) on urban-scene validation data.
- Better learning by using two different stages of the network, simultaneously refining spatial and contextual information.
- Three versions of the model, FASSD-Net, FASSD-Net-L1 and FASSD-Net-L2, to maintain a better tradeoff between speed and accuracy.
2.2. Dataset
2.2.1. Cityscapes
2.2.2. Database Pre-Processing
2.3. Model Training
2.4. Implementation Details
2.5. Methodology of Experiments
3. Results and Discussions
Evaluation Methods
4. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Mabrouk, A.B.; Zagrouba, E. Abnormal behavior recognition for intelligent video surveillance systems: A review. Expert Syst. Appl. 2018, 91, 480–491.
- Han, H.; Ma, W.; Zhou, M.C.; Guo, Q.; Abusorrah, A. A Novel Semi-supervised Learning Approach to person Re-Identification. IEEE Internet Things J. 2020, 8, 3042–3052.
- Koshmak, G. Remote Monitoring and Automatic Fall Detection for Elderly People at Home. Ph.D. Thesis, Mälardalen University, Vasteras, Sweden, 2015.
- Zhang, H.B.; Zhang, Y.X.; Zhong, B.; Lei, Q.; Yang, L.; Du, J.X.; Chen, D.S. A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors 2019, 19, 1005.
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J. Recent advances in convolutional neural networks. Pattern Recogn. 2018, 77, 354–377.
- Sultana, F.; Sufian, A.; Dutta, P. Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey. In Knowledge-Based Systems; Jones & Bartlett Publishers: Burlington, MA, USA, 2020; pp. 201–202.
- Xia, Y.; Yu, H.; Wang, F.Y. Accurate and robust eye center localization via fully convolutional networks. IEEE/CAA J. Automat. Sin. 2019, 6, 1127–1138.
- Rosas-Arias, L.; Benitez-Garcia, G.; Portillo-Portillo, J.; Sanchez-Perez, G.; Yanai, K. Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions. ICPR 2021, 1–8.
- Han, H.; Zhou, M.; Shang, X.; Cao, W.; Abusorrah, A. KISS+ for rapid and accurate person re-identification. IEEE Transact. Intell. Transport. Syst. 2020, 99, 394–403.
- Chao, P.; Kao, C.Y.; Ruan, Y.S.; Huang, C.H.; Lin, Y.L. HarDNet: A Low Memory Traffic Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3552–3561.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Rosas-Arias, L.; Benitez-Garcia, G.; Portillo-Portillo, J.; Olivares-Mercado, J.; Sanchez-Perez, G.; Yanai, K. FaSSD-Net: Fast and Accurate Real-Time Semantic Segmentation for Embedded System. In Proceedings of the ITS World Congress, T-ITS 2021, Hamburg, Germany, 11–15 October 2021.
- Wu, Z.; Shen, C.; Hengel, A.v.d. High-performance semantic segmentation using very deep fully convolutional networks. arXiv 2016, arXiv:1604.04339.
- Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. Erfnet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE Transact. Intell. Transport. Syst. 2017, 19, 263–272.
- Poudel, R.P.; Bonde, U.; Liwicki, S.; Zach, C. Contextnet: Exploring context and detail for semantic segmentation in real-time. arXiv 2018, arXiv:1805.04554.
- Dong, G.; Yan, Y.; Shen, C.; Wang, H. Real-time high-performance semantic image segmentation of urban street scenes. IEEE Transact. Intell. Transport. Syst. 2020.
- Takikawa, T.; Acuna, D.; Jampani, V.; Fidler, S. Gated-scnn: Gated shape cnns for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 5229–5238.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Siam, M.; Gamal, M.; Abdel-Razek, M.; Yogamani, S.; Jagersand, M.; Zhang, H. A comparative study of real-time semantic segmentation for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 587–597.
- Han, H.Y.; Chen, Y.C.; Hsiao, P.Y.; Fu, L.C. Using Channel-Wise Attention for Deep CNN Based Real-Time Semantic Segmentation With Class-Aware Edge Information. IEEE Transact. Intell. Transport. Syst. 2020.
- Wang, Y.; Zhou, Q.; Xiong, J.; Wu, X.; Jin, X. Esnet: An efficient symmetric network for real-time semantic segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xian, China, 8–11 November 2019; pp. 41–52.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. Lednet: A lightweight encoder-decoder network for real-time semantic segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 2019; pp. 1860–1864.
Dataset | Person IoU | Background IoU | Mean IoU |
---|---|---|---|
Cityscapes | 79.86% | 99.74% | 89.80% |
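
The mean IoU in the table above is consistent with the arithmetic average of the two per-class values: (79.86 + 99.74) / 2 = 89.80. As a minimal sketch of how such figures are commonly computed (not necessarily the evaluation code used for these results), the snippet below derives per-class IoU from a confusion matrix on toy masks. The function name `per_class_iou` and the toy arrays are illustrative assumptions.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """IoU per class, TP / (TP + FP + FN), computed from a confusion matrix."""
    # Rows index the ground-truth class, columns the predicted class.
    index = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    confusion = np.bincount(index, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    true_pos = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - true_pos
    # Avoid division by zero for classes absent from both masks (IoU becomes 0).
    return true_pos / np.maximum(union, 1)


# Toy masks: 0 = background, 1 = person.
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 1, 1, 1]])
iou = per_class_iou(pred, target, num_classes=2)
mean_iou = iou.mean()

# Arithmetic check of the reported mean from the two per-class values.
table_mean = (79.86 + 99.74) / 2
```

In this toy case each class has three correctly labeled pixels and a union of five, so both IoU values and their mean equal 0.6.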
Model | Road | Sidewalk | Building | Wall | Fence | Pole | Traffic Light | Traffic Sign | Vegetation | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | Motorcycle | Bicycle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Scratch (VAL) [15] | 97.4 | 80.6 | 90.3 | 55.8 | 50.1 | 57.5 | 58.6 | 68.2 | 90.9 | 61.2 | 93.1 | 73 | 53.2 | 91.8 | 59.1 | 70.1 | 66.7 | 44.9 | 67.1 |
Pretrained (VAL) [15] | 97.5 | 81.4 | 90.9 | 54.6 | 54.1 | 59.8 | 62.5 | 71.6 | 91.3 | 62.9 | 93.1 | 75.2 | 55.3 | 92.9 | 67 | 77.4 | 59.8 | 41.9 | 68.4 |
FCRN [14] | 97.4 | 80.3 | 90.8 | 47.6 | 53.8 | 53.1 | 58.1 | 70.2 | 91.2 | 59.6 | 93.2 | 77.1 | 54.4 | 93 | 67.1 | 79.4 | 62.2 | 57.3 | 72.7 |
FCRN+Bs [14] | 97.6 | 82 | 91.7 | 52.3 | 56.2 | 57 | 65.7 | 74.4 | 91.7 | 62.5 | 93.8 | 79.8 | 59.6 | 94 | 66.2 | 83.7 | 70.3 | 64.2 | 75.5 |
ContextNet [16] | 97.4 | 79.6 | 89.5 | 44.1 | 49.8 | 45.5 | 50.6 | 64.6 | 90.2 | 59.4 | 93.4 | 70.9 | 43.1 | 91.8 | 65.2 | 71.9 | 64.5 | 41.95 | 66.1 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Garcia-Ortiz, L.B.; Portillo-Portillo, J.; Hernandez-Suarez, A.; Olivares-Mercado, J.; Sanchez-Perez, G.; Toscano-Medina, K.; Perez-Meana, H.; Benitez-Garcia, G. FASSD-Net Model for Person Semantic Segmentation. Electronics 2021, 10, 1393. https://doi.org/10.3390/electronics10121393