Deep-Learning-Based Complex Scene Text Detection Algorithm for Architectural Images
Abstract
1. Introduction
2. Materials and Methods
2.1. EAST Algorithm
2.1.1. Feature Extraction Network
2.1.2. Feature Fusion
2.1.3. Output Layer
2.2. NEAST Oblique Text Detection Method
2.2.1. Feature Extraction Module
2.2.2. Generalized Intersection over Union Algorithm
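As a point of reference for this subsection, the following is a minimal sketch of the standard generalized IoU for two axis-aligned boxes: GIoU = IoU − (|C| − |A ∪ B|) / |C|, where C is the smallest box enclosing both A and B. The function name `giou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, and the oblique-text variant used in NEAST may differ in how boxes are represented.

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes (illustrative layout).
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Penalize the part of the enclosing box not covered by the union.
    return iou - (hull - union) / hull


def giou_loss(a, b):
    """Regression loss derived from GIoU; ranges from 0 (identical) to 2."""
    return 1.0 - giou(a, b)
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, GIoU keeps decreasing as the boxes move apart, so the loss still provides a useful signal when the prediction does not overlap the ground truth.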
2.3. Class Imbalance
- An extreme excess of negative BBOXes makes their loss value very large. The loss value of the positive BBOXes is then overwhelmed, which hinders convergence on the target.
- When parameter updates during training are negligible, the model cannot be trained effectively, and vanishing gradients may occur. For an easy negative sample, the corresponding target score is small, so the loss value of a single such BBOX is small and the parameter changes during backpropagation are correspondingly small, which does not help training. For text detection, it is therefore necessary to find the BBOX samples with larger loss values and a greater impact on parameter convergence, namely hard BBOXes.
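The effect described in the two points above can be illustrated with a minimal sketch of the binary focal loss, using the α = 0.25 and γ = 2 settings listed in the training table. The function names and the example scores (0.02 and 0.70) are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for a predicted text score p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p if y == 1 else 1.0 - p)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical samples: (predicted text score, ground-truth label).
easy_negative = (0.02, 0)  # background confidently scored as background
hard_negative = (0.70, 0)  # background wrongly scored as text

for name, (p, y) in [("easy", easy_negative), ("hard", hard_negative)]:
    ce, fl = cross_entropy(p, y), focal_loss(p, y)
    print(f"{name} negative: CE={ce:.6f} FL={fl:.6f} FL/CE={fl / ce:.6f}")
```

The modulating factor (1 − p_t)^γ shrinks the loss of well-classified samples far more than that of misclassified ones, so the many easy negatives no longer dominate the total loss and the hard BBOXes contribute most of the gradient.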
2.4. Experimental Dataset
3. Results
3.1. Impact of Pyramid Network on Text Detection
3.2. Impact of Modules on Text Detection
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Geometry | Channels | Description |
---|---|---|
AABB | 4 | |
RBOX | 5 | |
QUAD | 8 | |
Dataset | Size | Number of Images (Train/Test) | Amount of Text |
---|---|---|---|
ICDAR 2013 | 250 M | 462 (229/233) | 1943 |
ICDAR 2015 | 131.8 M | 1500 (1000/500) | 17,548 |
Type | Setting |
---|---|
Batch size | 16 |
Learning rate | |
Focal loss | α = 0.25, γ = 2 |
Learning decay rate | 0.9/10,000 |
Iterations | 100,000 |
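One plausible reading of the "0.9/10,000" decay setting above is a staircase exponential schedule in which the learning rate is multiplied by 0.9 every 10,000 iterations. The sketch below assumes that reading. The base learning rate is not given in the table, so `base_lr` is a hypothetical placeholder, as is the function name.

```python
def decayed_learning_rate(base_lr, step, decay_rate=0.9, decay_steps=10_000):
    """Staircase exponential decay: multiply by decay_rate every decay_steps iterations."""
    return base_lr * decay_rate ** (step // decay_steps)


# Placeholder only: the table does not state the actual base learning rate.
base_lr = 1e-3

for step in (0, 10_000, 50_000, 100_000):
    print(step, decayed_learning_rate(base_lr, step))
```

Under this assumption, the learning rate at the end of the 100,000 iterations is the base value multiplied by 0.9 to the power of 10, or roughly 35% of where it started.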
Method | Training time Tt (h) | Detection time Dt (ms) | Recall (R) | Precision (P) | F-score (F) |
---|---|---|---|---|---|
EAST | 19.7 | 150.9 | 0.73 | 0.84 | 0.78 |
FL+GIOU | 33.4 | 198.0 | 0.79 | 0.85 | 0.82 |
NAS+GIOU | 139.3 | 307.8 | 0.76 | 0.87 | 0.81 |
NAS+FL | 144.9 | 323.5 | 0.82 | 0.85 | 0.83 |
Proposed | 145.6 | 337.4 | 0.84 | 0.89 | 0.87 |
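The F column in the table above is consistent with the harmonic mean of recall and precision, F = 2PR / (P + R). The sketch below recomputes it from the R and P values in the table, assuming that definition. The function and variable names are illustrative.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (F1)."""
    return 2.0 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the results table.
rows = {
    "EAST": (0.73, 0.84),
    "FL+GIOU": (0.79, 0.85),
    "NAS+GIOU": (0.76, 0.87),
    "NAS+FL": (0.82, 0.85),
    "Proposed": (0.84, 0.89),
}

for method, (r, p) in rows.items():
    print(f"{method}: F = {f_score(r, p):.3f}")
```

Because R and P are themselves rounded to two decimals in the table, a recomputed F can differ from the reported value by 0.01 in the last digit.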
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, W.; Wang, H.; Lu, Y.; Luo, J.; Liu, T.; Lin, J.; Pang, Y.; Zhang, G. Deep-Learning-Based Complex Scene Text Detection Algorithm for Architectural Images. Mathematics 2022, 10, 3914. https://doi.org/10.3390/math10203914