An Efficient Text Detection Model for Street Signs
Abstract
1. Introduction
2. Related Work
- (1) Street signs contain multiple types of text, such as Chinese characters, English letters, numbers, and punctuation marks.
- (2) Image brightness can vary significantly owing to backlit or nighttime shooting.
- (3) Because of varying shooting angles, the tilt of the street-sign text area may be large, producing perspective distortion.
3. Improving the East Model
3.1. Label Generation
3.2. Improving the Network Structure of the EAST Model
3.3. Loss Function
4. Experiment
4.1. Experimental Environment and Evaluation Metrics
4.2. Experimental Steps
4.3. Construction of SSTD Dataset
4.4. Experiment and Analysis
4.5. Discussion
5. Conclusions and Future Prospects
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mahajan, S.; Rani, R. Text detection and localization in scene images: A broad review. Artif. Intell. Rev. 2021.
- Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V.R.; Lu, S.; et al. ICDAR 2015 competition on Robust Reading. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1156–1160.
- Epshtein, B.; Ofek, E.; Wexler, Y. Detecting text in natural scenes with stroke width transform. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2963–2970.
- Matas, J.; Chum, O.; Urban, M.; Pajdla, T. Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 2004, 22, 761–767.
- Lee, J.; Lee, P.; Lee, S.; Yuille, A.; Koch, C. AdaBoost for Text Detection in Natural Scene. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 429–434.
- Wang, K.; Belongie, S.J. Word spotting in the wild. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 591–604.
- Tian, S.; Pan, Y.; Huang, C.; Lu, S.; Yu, K.; Tan, C.L. Text flow: A unified text detection system in natural scene images. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4651–4659.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Proceedings of the European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI 2017, 31. Available online: https://ojs.aaai.org/index.php/AAAI/article/view/11196 (accessed on 26 June 2021).
- Liao, M.; Shi, B.; Bai, X. TextBoxes++: A single-shot oriented scene text detector. IEEE Trans. Image Process. 2018, 27, 3676–3690.
- Tian, Z.; Huang, W.; He, T.; He, P.; Qiao, Y. Detecting text in natural image with connectionist text proposal network. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 56–72.
- Shi, B.; Bai, X.; Belongie, S. Detecting oriented text in natural images by linking segments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2550–2558.
- Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An efficient and accurate scene text detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5551–5560.
- He, W.; Zhang, X.; Yin, F.; Liu, C. Deep direct regression for multi-oriented scene text detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 745–753.
- Song, Y.; Cui, Y.; Han, H.; Shan, S.; Chen, X. Scene text detection via deep semantic feature fusion and attention-based refinement. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3747–3752.
- Xu, Y.; Wang, Y.; Zhou, W.; Wang, Y.; Yang, Z.; Bai, X. TextField: Learning a deep direction field for irregular scene text detection. IEEE Trans. Image Process. 2019, 28, 5566–5579.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Hu, H.; Zhang, C.; Luo, Y.; Wang, Y.; Han, J.; Ding, E. WordSup: Exploiting Word Annotations for Character-Based Text Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4950–4959.
- Cao, D.; Dang, J.; Zhong, Y. Towards Accurate Scene Text Detection with Bidirectional Feature Pyramid Network. Symmetry 2021, 13, 486.
- Ma, C.; Sun, L.; Zhong, Z.; Huo, Q. ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognit. 2021, 111, 107684.
- Nagaoka, Y.; Miyazaki, T.; Sugaya, Y.; Omachi, S. Text Detection Using Multi-Stage Region Proposal Network Sensitive to Text Scale. Sensors 2021, 21, 1232.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Yao, C.; Bai, X.; Sang, N.; Zhou, X.; Zhou, S.; Cao, Z. Scene Text Detection via Holistic, Multi-Channel Prediction. arXiv 2016, arXiv:1606.09002.
- Saif, H.K.; Abdul, R.G.; Abdullah, A.; Ahmad, W.; Aeshah, A.; Jafreezal, J. Deep Neural Networks Combined with STN for Multi-Oriented Text Detection and Recognition. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 178–185.
- Ch’ng, C.K.; Chan, C.S.; Liu, C.L. Total-Text: Toward Orientation Robustness in Scene Text Detection. Int. J. Doc. Anal. Recognit. 2020, 23, 31–52.
- Deng, G.Y.; Ming, Y.; Xue, J.H. RFRN: A Recurrent Feature Refinement Network for Accurate and Efficient Scene Text Detection. Neurocomputing 2021, 453, 465–481.
Item | Year | Model | Advantages | Disadvantages |
---|---|---|---|---|
1 | 2016 | CTPN [14] | Good detection for long text areas | Cannot detect text areas with large tilt angles; a large number of parameters and operations |
2 | 2017 | SegLink [15] | Can detect inclined text areas | Easily misidentifies dense text as a single whole area |
3 | 2018 | TextField [21] | Can detect text areas with curvature | The post-processing process is complicated |
4 | 2021 | Cao et al. [23] | Good detection of quadrilateral text areas | Bidirectional feature pyramid leads to a large number of model parameters and computations |
5 | 2021 | Ma et al. [24] | Good detection of arbitrarily shaped text areas | The introduction of graph convolutional networks leads to a large number of parameters and computations |
6 | 2021 | Nagaoka et al. [25] | Good detection of small text | Multiple RPN modules complicate the model structure and introduce more parameters; requires pre-designed anchors |
Model | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
EAST + VGG16 | 80.50 | 72.80 | 76.40 |
EAST + Focal loss | 82.28 | 73.56 | 77.68 |
EAST + Focal loss + improved shrinking algorithm | 87.61 | 73.52 | 79.95 |
EAST + Focal loss + improved shrinking algorithm + FEM | 85.59 | 76.94 | 81.03 |
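The F1 values reported in these tables are the harmonic mean of precision and recall, F1 = 2PR/(P + R). A minimal sketch to verify selected ablation rows against the table (the `f1_score` helper name is ours; values are the precision/recall percentages from the rows above):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) pairs from the ablation table
rows = {
    "EAST + Focal loss": (82.28, 73.56),
    "EAST + Focal loss + improved shrinking": (87.61, 73.52),
    "EAST + Focal loss + improved shrinking + FEM": (85.59, 76.94),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

These reproduce the tabulated 77.68, 79.95, and 81.03 to two decimal places.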
Model | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
CTPN + VGG16 | 74.20 | 51.60 | 60.90 |
Seglink + VGG16 | 73.10 | 76.80 | 75.00 |
WordSup | 77.03 | 79.33 | 78.16 |
Yao et al. [31] | 72.26 | 58.69 | 64.77 |
EAST + VGG16 | 80.50 | 72.80 | 76.40 |
EAST + ResNet50 | 77.32 | 81.66 | 79.43 |
EAST + PAVNET2x | 83.60 | 73.50 | 78.20 |
EAST + PAVNET2x MS | 84.64 | 77.23 | 80.77 |
STN-OCR [32] | 78.53 | 65.20 | 71.86 |
Poly-FRCNN-3 [33] | 80.00 | 66.00 | 73.00 |
RFPN-4s [34] | 85.10 | 76.80 | 80.80 |
Ours | 85.59 | 76.94 | 81.03 |
Model | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
EAST + VGG16 | 73.80 | 84.61 | 78.84 |
Ours | 78.30 | 84.94 | 81.48 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lu, M.; Mou, Y.; Chen, C.-L.; Tang, Q. An Efficient Text Detection Model for Street Signs. Appl. Sci. 2021, 11, 5962. https://doi.org/10.3390/app11135962