An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation
Abstract
:1. Introduction
2. Related Work
2.1. Multibox Text Detector
2.2. Semantic Segmentation
3. Proposed Algorithm
3.1. Overall Framework
3.2. Bounding Box Enhancement Module
Algorithm 1. Bounding box enhancement module (BEM) | |
Step 1. | Acquire a multibox result: Reci = ((x1, y1), (x2, y2))i. i refers to the i-th result. (x1, y1) is the coordinates of the upper left corner of the text bounding box. (x2, y2) is the coordinates of the right bottom corner of the text bounding box. |
Step 2. | Get the rectangular area AreaRec of Reci in the semantic segmentation result. |
Step 3. | Calculate the regional median probability: |
Step 4. | Compare to the threshold T: If : Delete the Reci in the multibox results. Else: continue. |
Step 5. | Repeat steps 1--4 until all multibox results have been calculated. |
3.3. Semantic Bounding Box Module
4. Experimental Results
4.1. Datasets
4.2. Exploration Study
4.3. Experimental Results
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Fletcher, L.A.; Kasturi, R. A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. Pattern Anal. Mach. Intell. 1988, 10, 910–918. [Google Scholar] [CrossRef] [Green Version]
- Ye, Q.; Doermann, D. Text detection and recognition in imagery: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1480–1500. [Google Scholar] [CrossRef] [PubMed]
- Wang, F.; Zhao, L.; Li, X.; Wang, X.; Tao, D. Geometry-aware scene text detection with instance transformation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018. [Google Scholar]
- Chen, H.; Tsai, S.S.; Schroth, G.; Chen, D.M.; Grzeszczuk, R.; Girod, B. Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In Proceedings of the 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, 11–14 September 2011; IEEE: Piscataway, NJ, USA, 2011. [Google Scholar]
- Neumann, L.; Matas, J. Real-time scene text localization and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012. [Google Scholar]
- Shi, C.; Wang, C.; Xiao, B.; Zhang, Y.; Gao, S. Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recognit. Lett. 2013, 34, 107–116. [Google Scholar] [CrossRef]
- Epshtein, B.; Ofek, E.; Wexler, Y. Detecting text in natural scenes with stroke width transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010. [Google Scholar] [Green Version]
- Mosleh, A.; Bouguila, N.; Hamza, A.B. Image text detection using a bandlet-based edge detector and stroke width transform. In BMVC; BMVC: Newcastle, UK, 2012. [Google Scholar]
- He, W.; Zhang, X.Y.; Yin, F.; Liu, C.L. Deep direct regression for multi-oriented scene text detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
- Liu, Y.; Jin, L. Deep matching prior network: Toward tighter multi-oriented text detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
- Xiang, D.; Guo, Q.; Xia, Y. Robust text detection with vertically-regressed proposal network. In European Conference on Computer Vision; Springer: Berlin, Germany, 2016. [Google Scholar]
- Shi, B.; Bai, X.; Belongie, S. Detecting oriented text in natural images by linking segments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
- Tian, Z.; Huang, W.; He, T.; He, P.; Qiao, Y. Detecting text in natural image with connectionist text proposal network. In European Conference on Computer Vision; Springer: Berlin, Germany, 2016. [Google Scholar]
- Tian, S.; Pan, Y.; Huang, C.; Lu, S.; Yu, K.; Lim Tan, C. Text flow: A unified text detection system in natural scene images. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
- Cho, H.; Sung, M.; Jun, B. Canny text detector: Fast and robust scene text localization algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Zhang, Z.; Zhang, C.; Shen, W.; Yao, C.; Liu, W.; Bai, X. Multi-oriented text detection with fully convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Gupta, A.; Vedaldi, A.; Zisserman, A. Synthetic data for text localisation in natural images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single Shot Multibox Detector. In European Conference on Computer Vision; Springer: Berlin, Germany, 2016. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A Fast Text Detector with a Single Deep Neural Network; AAAI: Menlo Park, CA, USA, 2017. [Google Scholar]
- Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An efficient and accurate scene text detector. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
- He, P.; Huang, W.; He, T.; Zhu, Q.; Qiao, Y.; Li, X. Single shot text detector with regional attention. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. ACM 2012, 60, 84–90. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Rother, C.; Kolmogorov, V.; Blake, A. Grabcut: Interactive foreground extraction using iterated graph cuts. In ACM Transactions on Graphics (TOG); ACM: New York, NY, USA, 2004. [Google Scholar]
- Kohli, P.; Torr, P.H. Robust higher order potentials for enforcing label consistency. Int. J. Comput.Vis. 2009, 82, 302–324. [Google Scholar] [CrossRef]
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 24, 109–117. [Google Scholar]
- Karatzas, D.; Shafait, F.; Uchida, S.; Iwamura, M.; Bigorda, L.G.; Mestre, S.R.; Mas, J.; Mota, D.F.; Almazan, J.A.; de Las Heras, L.P. ICDAR 2013 robust reading competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; IEEE: Piscataway, NJ, USA, 2013. [Google Scholar]
- Wang, K.; Belongie, S. Word spotting in the wild. In European Conference on Computer Vision; Springer: Berlin, Germany, 2010. [Google Scholar]
- Lucas, S.M.; Panaretos, A.; Sosa, L.; Tang, A.; Wong, S.; Young, R.; Ashida, K.; Nagai, H.; Okamoto, M.; Yamamoto, H. ICDAR 2003 robust reading competitions: entries, results, and future directions. Int. J. Doc. Anal Recognit. 2005, 7, 105–122. [Google Scholar] [CrossRef] [Green Version]
- Yin, X.-C.; Yin, X.; Huang, K.; Hao, H.W. Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 970–983. [Google Scholar] [PubMed]
- Neumann, L.; Matas, J. Efficient scene text localization and recognition with local character refinement. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
- Zhang, Z.; Shen, W.; Yao, C.; Bai, X. Symmetry-based text line detection in natural scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
Network | IC13 Standard | DetEval Standard | ||||
---|---|---|---|---|---|---|
R 1 | P 2 | F 3 | R | P | F | |
SSD | 52.20% | 87.76% | 65.18% | 53.06% | 87.24% | 65.98% |
SSD-IFE | 49.17% | 82.79% | 61.69% | 49.88% | 83.29% | 62.39% |
SSD-OMC | 69.30% | 81.21% | 74.78% | 70.15% | 85.10% | 76.91% |
SSTD-IFE | 74.54% | 83.65% | 78.83% | 75.39% | 84.07% | 79.50% |
SSTD-OMC | 80.51% | 82.27% | 81.38% | 80.43% | 87.60% | 83.86% |
Network | IC13 Standard | DetEval Standard | ||||
---|---|---|---|---|---|---|
R | P | F | R | P | F | |
SSD | 48.61% | 73.24% | 58.43% | 48.17% | 75.12% | 58.70% |
SSD-IFE | 50.34% | 79.91% | 61.77% | 50.34% | 79.91% | 61.77% |
SSD-OMC | 69.91% | 73.57% | 71.69% | 66.01% | 77.97% | 71.48% |
SSTD-IFE | 78.60% | 73.35% | 75.89% | 78.60% | 73.35% | 75.89% |
SSTD-OMC | 85.13% | 71.68% | 77.83% | 81.65% | 79.78% | 80.71% |
Network | IC13 Standard | DetEval Standard | ||||
---|---|---|---|---|---|---|
R 1 | P 2 | F 3 | R | P | F | |
FCN | 62.54% | 60.80% | 61.66% | 66.97 | 62.05% | 64.42% |
SSD | 52.20% | 87.76% | 65.18% | 53.06% | 87.24% | 65.98% |
SSD-OMC | 69.30% | 81.21% | 74.78% | 70.15% | 85.10% | 76.91% |
SSTD | 74.54% | 83.65% | 78.83% | 75.39% | 84.07% | 79.50% |
SSTD-OMC | 80.51% | 82.27% | 81.38% | 80.43% | 87.60% | 83.86% |
Network | IC13 Standard | DetEval Standard | ||||
---|---|---|---|---|---|---|
R | P | F | R | P | F | |
Yin [34] | 0.66 | 0.88 | 0.76 | 0.69 | 0.89 | 0.78 |
Neumann [35] | 0.72 | 0.82 | 0.77 | - | - | - |
Zhang [36] | 0.74 | 0.88 | 0.80 | 0.76 | 0.88 | 0.82 |
Textboxes [22] | 0.74 | 0.86 | 0.80 | 0.74 | 0.88 | 0.81 |
SSTD-OMC | 0.80 | 0.82 | 0.81 | 0.80 | 0.80 | 0.83 |
Network | IC13 Standard | DetEval Standard | ||||
---|---|---|---|---|---|---|
R | P | F | R | P | F | |
FCN | 50.78% | 54.94% | 52.78% | 55.13% | 54.94% | 55.03% |
SSD | 48.61% | 73.24% | 58.43% | 48.17% | 75.12% | 58.70% |
SSD-OMC | 69.91% | 73.57% | 71.69% | 66.01% | 77.97% | 71.48% |
SSTD | 78.60% | 73.35% | 75.89% | 78.60% | 73.35% | 75.89% |
SSTD-OMC | 85.13% | 71.68% | 77.83% | 81.65% | 79.78% | 80.71% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qin, H.; Zhang, H.; Wang, H.; Yan, Y.; Zhang, M.; Zhao, W. An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation. Appl. Sci. 2019, 9, 1054. https://doi.org/10.3390/app9061054
Qin H, Zhang H, Wang H, Yan Y, Zhang M, Zhao W. An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation. Applied Sciences. 2019; 9(6):1054. https://doi.org/10.3390/app9061054
Chicago/Turabian StyleQin, Hongbo, Haodi Zhang, Hai Wang, Yujin Yan, Min Zhang, and Wei Zhao. 2019. "An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation" Applied Sciences 9, no. 6: 1054. https://doi.org/10.3390/app9061054
APA StyleQin, H., Zhang, H., Wang, H., Yan, Y., Zhang, M., & Zhao, W. (2019). An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation. Applied Sciences, 9(6), 1054. https://doi.org/10.3390/app9061054