Arbitrary-Shaped Text Detection with B-Spline Curve Network
Abstract
1. Introduction
2. Related Works
3. Method
3.1. Datasets Resample
3.2. B-Spline Curve Modeling
3.3. Visual Feature Extraction
3.4. Text Region Reconstruction
3.5. Loss Function Design
4. Experiments
4.1. Implementation Details
4.2. Datasets
4.3. Ablation Study
4.4. Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DETR | Detection Transformer |
BSNet | B-Spline Curve Network |
OCR | Optical Character Recognition |
FPN | Feature Pyramid Networks |
NMS | Non-Maximum Suppression |
FFN | Feed-Forward Networks |
NLP | Natural Language Processing |
SOTA | State-Of-The-Art |
IoU | Intersection over Union |
GIoU | Generalized Intersection over Union |
ResNet | Residual Neural Network |
FLOPs | Floating-point Operations |
Params | Parameters |
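As a reminder of what IoU and GIoU measure, here is a minimal sketch using their standard definitions for axis-aligned boxes. The box format `(x1, y1, x2, y2)` and the function names are assumptions made for illustration; they are not taken from BSNet.

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of an axis-aligned box (zero if degenerate)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Overlap rectangle; width or height is clamped to 0 when boxes are disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    if union <= 0.0:
        raise ValueError("at least one box must have positive area")
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (hull - union) / hull
    return iou, giou
```

Unlike IoU, GIoU stays informative for disjoint boxes: it becomes negative and decreases as the boxes move apart, which is why it is often preferred as a regression loss term.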
References
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2020; Volume 12346.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. 2020, pp. 1–16. Available online: https://arxiv.org/abs/2010.04159 (accessed on 1 January 2023).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Houlsby, N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Raisi, Z.; Naiel, M.A.; Younes, G.; Wardell, S.; Zelek, J.S. Transformer-based text detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Virtual, 19–25 June 2021.
- Lin, J.; Jiang, J.; Yan, Y.; Guo, C.; Wang, H.; Liu, W.; Wang, H. DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection. 2022. Available online: https://arxiv.org/abs/2208.09878 (accessed on 1 January 2023).
- Raisi, Z.; Younes, G.; Zelek, J. Arbitrary Shape Text Detection using Transformers. In Proceedings of the International Conference on Pattern Recognition, Montreal, QC, Canada, 21–25 August 2022; pp. 3238–3245.
- Liu, Y.; Shen, C.; Jin, L.; He, T.; Chen, P.; Liu, C.; Chen, H. ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8048–8064.
- Liu, Y.; Chen, H.; Shen, C.; He, T.; Jin, L.; Wang, L. ABCNet: Real-Time Scene Text Spotting with Adaptive Bezier-Curve Network. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9806–9815.
- Zhu, Y.; Chen, J.; Liang, L.; Kuang, Z.; Jin, L.; Zhang, W. Fourier Contour Embedding for Arbitrary-Shaped Text Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 3123–3131.
- Zhang, Z.; Tong, M. Wavelet descriptor network for arbitrary-shaped text detection. J. Electron. Imaging 2022, 31, 43051.
- Long, S.; Ruan, J.; Zhang, W.; He, X.; Wu, W.; Yao, C. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In Proceedings of the Computer Vision—ECCV 2018, Lecture Notes in Computer Science, Munich, Germany, 8–14 September 2018; Volume 11206, pp. 19–35.
- Yang, C.; Chen, M.; Yuan, Y.; Wang, Q. Text Growing on Leaf. 2022, pp. 1–15. Available online: https://arxiv.org/abs/2209.03016 (accessed on 1 January 2023).
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545.
- Piegl, L.; Tiller, W. The NURBS Book; Springer: Berlin/Heidelberg, Germany, 1997.
- Bingol, O.R.; Krishnamurthy, A. NURBS-Python: An open-source object-oriented NURBS modeling framework in Python. SoftwareX 2019, 9, 85–94.
- Stewart, R.; Andriluka, M.; Ng, A.Y. End-to-End People Detection in Crowded Scenes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Yuan, T.L.; Zhu, Z.; Xu, K.; Li, C.J.; Mu, T.J.; Hu, S.M. A Large Chinese Text Dataset in the Wild. J. Comput. Sci. Technol. 2019, 34, 509–521.
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets V2: More deformable, better results. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Zhang, S.X.; Zhu, X.; Hou, J.B.; Liu, C.; Yang, C.; Wang, H.; Yin, X.C. Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9696–9705.
- Liao, M.; Zou, Z.; Wan, Z.; Yao, C.; Bai, X. Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 919–931.
- Wang, W. TPSNet: Thin-Plate-Spline Representation for Arbitrary Shape Scene Text Detection. 2021. Available online: https://arxiv.org/abs/2110.12826 (accessed on 1 January 2023).
- Wang, W.; Xie, E.; Li, X.; Hou, W.; Lu, T.; Yu, G.; Shao, S. Shape robust text detection with progressive scale expansion network. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9328–9337.
- Baek, Y.; Lee, B.; Han, D.; Yun, S.; Lee, H. Character region awareness for text detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
- Tang, J.; Yang, Z.; Wang, Y.; Zheng, Q.; Xu, Y.; Bai, X. SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping. Pattern Recognit. 2019, 96, 106954.
- Liao, M.; Wan, Z.; Yao, C.; Chen, K.; Bai, X. Real-time scene text detection with differentiable binarization. In Proceedings of the AAAI 2020—34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11474–11481.
- Wang, Y.; Xie, H.; Zha, Z.; Xing, M.; Fu, Z.; Zhang, Y. Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11750–11759.
- Wang, F.; Chen, Y.; Wu, F.; Li, X. TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection. In Proceedings of the MM 2020—28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020.
- Ma, C.; Sun, L.; Zhong, Z.; Huo, Q. ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognit. 2021, 111, 107684.
- Chen, Z.; Wang, W.; Xie, E.; Yang, Z.; Lu, T.; Luo, P. FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation. 2021. Available online: https://arxiv.org/abs/2111.02394 (accessed on 1 January 2023).
- Wang, Z.; Silamu, W.; Li, Y.; Xu, M. A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information. Sensors 2022, 22, 9982.
- Wu, X.; Qi, Y.; Song, J.; Yao, J.; Wang, Y.; Liu, Y.; Han, Y.; Qian, Q. CA-STD: Scene Text Detection in Arbitrary Shape Based on Conditional Attention. Information 2022, 13, 565.
B-Spline Degree | Control Points | R(%) | P(%) | F(%) |
---|---|---|---|---|
3 | 4 | 83.3 | 85.2 | 84.3 |
3 | 5 | 83.5 | 85.6 | 84.6 |
3 | 6 | 83.0 | 85.8 | 84.4 |
4 | 5 | 81.7 | 86.8 | 84.2 |
4 | 6 | 83.9 | 85.5 | 84.7 |
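To make the two table parameters concrete, the following minimal sketch shows how a degree and a set of control points determine a B-spline curve through the Cox-de Boor recursion. It assumes a clamped uniform knot vector on [0, 1], which is a common default and not necessarily the parameterization used by BSNet; all function names and the sample control points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def clamped_uniform_knots(num_ctrl: int, degree: int) -> List[float]:
    """Clamped uniform knot vector on [0, 1] (an assumed choice)."""
    if num_ctrl <= degree:
        raise ValueError("need more control points than the degree")
    interior = num_ctrl - degree - 1  # number of interior knots
    inner = [(i + 1) / (interior + 1) for i in range(interior)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i: int, p: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: i-th basis function of degree p at parameter t."""
    if p == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that t == 1 is covered.
        if t == knots[-1] and knots[i] < t and t == knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0.0:  # skip terms with a zero-length knot span
        left = (t - knots[i]) / d1 * basis(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0.0:
        right = (knots[i + p + 1] - t) / d2 * basis(i + 1, p - 1, t, knots)
    return left + right


def curve_point(ctrl: Sequence[Point], degree: int, t: float) -> Point:
    """Point on the curve at t in [0, 1]: basis-weighted sum of control points."""
    knots = clamped_uniform_knots(len(ctrl), degree)
    weights = [basis(i, degree, t, knots) for i in range(len(ctrl))]
    x = sum(w * pt[0] for w, pt in zip(weights, ctrl))
    y = sum(w * pt[1] for w, pt in zip(weights, ctrl))
    return (x, y)


# Degree 3 with 5 control points, matching one row of the table (points are made up).
ctrl_pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (4.0, 0.0)]
samples = [curve_point(ctrl_pts, 3, k / 10) for k in range(11)]
```

With a clamped knot vector the curve starts at the first control point and ends at the last one, so the end points of a text boundary can be pinned while the interior control points shape the bend. When the number of control points equals the degree plus one (for example degree 4 with 5 points), there are no interior knots and the curve reduces to a Bézier curve.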
| | | R(%) | P(%) | F(%) |
---|---|---|---|---|
✗ | ✓ | 83.5 | 86.0 | 84.8 |
✓ | ✗ | 83.9 | 85.5 | 84.7 |
✓ | ✓ | 83.5 | 87.3 | 85.4 |
Backbone | Params (M) | FLOPs (G) | CTW1500 R(%) | CTW1500 P(%) | CTW1500 F(%) | Total-Text R(%) | Total-Text P(%) | Total-Text F(%) |
---|---|---|---|---|---|---|---|---|
ResNet50 | 39.8 | 122.6 | 83.5 | 87.3 | 85.4 | 82.9 | 87.2 | 85.0 |
ConvNeXt | 40.9 | 125.0 | 86.2 | 87.3 | 86.8 | 86.4 | 88.8 | 87.6 |
Methods | Year | Ext | CTW1500 R(%) | CTW1500 P(%) | CTW1500 F(%) | Total-Text R(%) | Total-Text P(%) | Total-Text F(%) |
---|---|---|---|---|---|---|---|---|
TextSnake [16] | 2018 | ✓ | 85.3 | 67.9 | 75.6 | 74.5 | 82.7 | 78.4 |
PSENet [31] | 2019 | ✓ | 79.7 | 84.8 | 82.2 | 84.0 | 78.0 | 80.9 |
CRAFT [32] | 2019 | ✓ | 81.1 | 86.0 | 83.5 | 79.9 | 87.6 | 83.6 |
PAN [33] | 2019 | ✓ | 81.2 | 86.4 | 83.7 | 81.0 | 89.3 | 85.0 |
SegLink++ [34] | 2019 | ✓ | 79.8 | 82.8 | 81.3 | 80.9 | 82.1 | 81.5 |
DRRG [28] | 2020 | ✓ | 83.0 | 85.9 | 84.5 | 84.9 | 86.5 | 85.7 |
DB [35] | 2020 | ✓ | 80.2 | 86.9 | 83.4 | 82.5 | 87.1 | 84.7 |
ABCNetv1 [13] | 2020 | ✓ | 78.5 | 84.4 | 81.4 | 81.3 | 87.9 | 84.5 |
ContourNet [36] | 2020 | ✗ | 84.1 | 83.7 | 83.9 | 83.9 | 86.9 | 85.4 |
TextRay [37] | 2020 | ✗ | 80.4 | 82.8 | 81.6 | 77.9 | 83.5 | 80.6 |
ABCNetv2 [12] | 2021 | ✓ | 83.8 | 85.6 | 84.7 | 84.1 | 90.2 | 87.0 |
ReLaText [38] | 2021 | ✓ | 83.3 | 86.2 | 84.8 | 83.1 | 84.8 | 84.0 |
FAST [39] | 2021 | ✗ | 80.4 | 87.2 | 83.7 | 82.5 | 90.5 | 86.3 |
FCENet [14] | 2021 | ✓ | 83.4 | 87.6 | 85.5 | 82.5 | 89.3 | 85.8 |
TPSNet [30] | 2021 | ✓ | 85.1 | 87.7 | 86.4 | 84.6 | 90.8 | 87.6 |
DBNet++ [29] | 2022 | ✓ | 87.9 | 82.8 | 85.3 | 88.9 | 83.2 | 86.0 |
WDNet [15] | 2022 | ✓ | 84.0 | 87.6 | 85.8 | 82.9 | 87.9 | 85.3 |
LeafText [17] | 2022 | ✗ | 83.9 | 87.1 | 85.5 | 84.0 | 90.8 | 87.3 |
Wang et al. [40] | 2023 | ✗ | 80.5 | 86.1 | 83.6 | 83.4 | 89.6 | 86.4 |
CA-STD [41] | 2023 | ✓ | 84.5 | 83.0 | 83.8 | 82.1 | 82.9 | 82.5 |
Ours | 2023 | ✗ | 86.2 | 87.3 | 86.8 | 86.4 | 88.8 | 87.6 |
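The F(%) columns are consistent with the usual F-measure, the harmonic mean of precision and recall. The sketch below assumes that definition and recomputes F for the "Ours" row; the helper name is hypothetical. Because the table reports P and R rounded to one decimal, recomputed values can differ from the printed F by about 0.1.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if recall + precision == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# "Ours" row of the comparison table: (R, P) per dataset.
ours_ctw1500 = f_measure(86.2, 87.3)
ours_total_text = f_measure(86.4, 88.8)
print(round(ours_ctw1500, 1), round(ours_total_text, 1))
```

This gives about 86.7 for CTW1500 and 87.6 for Total-Text, against the tabulated 86.8 and 87.6; the small gap on CTW1500 is what one would expect if F was computed from unrounded P and R.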
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
You, Y.; Lei, Y.; Zhang, Z.; Tong, M. Arbitrary-Shaped Text Detection with B-Spline Curve Network. Sensors 2023, 23, 2418. https://doi.org/10.3390/s23052418