A Real-World Approach on the Problem of Chart Recognition Using Classification, Detection and Perspective Correction
Abstract
1. Introduction
2. Chart Recognition
2.1. Image Classification
2.2. Object Detection
2.3. Perspective Correction
3. Related Works
4. Methods
4.1. Datasets
4.2. Training and Evaluation
4.2.1. Classification
4.2.2. Detection
4.2.3. Perspective Correction
5. Results
5.1. Classification
5.2. Detection
5.3. Perspective Correction
5.4. Discussion
6. Use Case
Illustrative Example
7. Final Remarks and Future Works
Author Contributions
Funding
Conflicts of Interest
References
Chart Type | Train Instances | Test Instances | Train + Test
--- | --- | --- | ---
Arc | 129 | 26 | 155 |
Area | 494 | 87 | 581 |
Bar | 3883 | 761 | 4644 |
Force Directed Graph | 1137 | 228 | 1365 |
Line | 2618 | 529 | 3147 |
Parallel Coordinates | 702 | 168 | 870 |
Pie | 2415 | 481 | 2896 |
Reorderable Matrix | 242 | 42 | 284 |
Scatterplot | 1797 | 228 | 2025 |
Scatterplot Matrix | 837 | 158 | 995 |
Sunburst | 540 | 65 | 605 |
Treemap | 626 | 73 | 699 |
Wordcloud | 2557 | 276 | 2833 |
Total | 17,977 | 3122 | 21,099 |
Architecture | Learning Rate | Decay | Accuracy–13 Classes | Accuracy–4 Classes
---|---|---|---|---|
Xception | ||||
ResNet152 | ||||
VGG19 | ||||
MobileNet | ||||
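The architectures in the table above are standard ImageNet-pretrained CNN backbones fine-tuned for chart-type classification. As an illustration only, the minimal sketch below shows how such a fine-tuning setup could look in TensorFlow/Keras with an exponentially decaying learning rate; the concrete framework, hyperparameter values, and preprocessing used in the experiments are not reproduced here, so every value in the snippet is a placeholder assumption.

```python
import tensorflow as tf

NUM_CLASSES = 13          # chart types in the full taxonomy (placeholder)
IMG_SIZE = (299, 299)     # Xception's default input resolution

# ImageNet-pretrained backbone without its classification head.
backbone = tf.keras.applications.Xception(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg")

# New classification head for the chart types.
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(backbone.output)
model = tf.keras.Model(backbone.input, outputs)

# Learning rate with exponential decay (illustrative values, not the paper's).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

# `train_ds` / `val_ds` are assumed tf.data datasets of (image, label) pairs.
# model.fit(train_ds, validation_data=val_ds, epochs=30)
```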
Class | Arc | Area | Bar | Force Directed Graph | Line | Parallel Coordinates | Pie | Reorderable Matrix | Scatterplot | Scatterplot Matrix | Sunburst | Treemap | Wordcloud
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Arc | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Area | 0 | 87 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Bar | 0 | 0 | 728 | 0 | 28 | 1 | 0 | 0 | 1 | 2 | 0 | 1 | 0 |
Force Directed Graph | 0 | 0 | 0 | 222 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2 |
Line | 0 | 2 | 9 | 1 | 511 | 0 | 4 | 0 | 1 | 1 | 0 | 0 | 0 |
Parallel Coordinates | 0 | 0 | 1 | 0 | 0 | 151 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |
Pie | 0 | 0 | 1 | 0 | 1 | 1 | 164 | 0 | 0 | 1 | 0 | 0 | 0 |
Reorderable Matrix | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 477 | 0 | 1 | 0 | 0 | 0 |
Scatterplot | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 40 | 0 | 0 | 0 | 0 |
Scatterplot Matrix | 0 | 0 | 2 | 10 | 16 | 10 | 2 | 0 | 1 | 184 | 0 | 0 | 3 |
Sunburst | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 10 | 0 | 1 | 50 | 0 | 2 |
Treemap | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 66 | 2 |
Wordcloud | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 271 |
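Per-class performance can be read off a confusion matrix like the one above: diagonal counts are correct predictions, off-diagonal counts are misclassifications. As a small illustration, the NumPy snippet below computes overall accuracy and per-class precision and recall from such a matrix; the 3×3 counts are hypothetical and assume rows correspond to true classes.

```python
import numpy as np

# Toy confusion matrix C for three classes (hypothetical counts, not the table above):
# C[i, j] = number of samples whose true class is i and predicted class is j.
C = np.array([[26,  2,   0],
              [ 1, 84,   2],
              [ 0,  5, 723]])

accuracy = np.trace(C) / C.sum()          # fraction of correctly classified samples
recall = np.diag(C) / C.sum(axis=1)       # per-class recall: diagonal over row totals
precision = np.diag(C) / C.sum(axis=0)    # per-class precision: diagonal over column totals

print(f"accuracy  = {accuracy:.3f}")
print("recall    =", np.round(recall, 3))
print("precision =", np.round(precision, 3))
```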
Method | | | | Inference Evaluation Time (s/img)
---|---|---|---|---|
RetinaNet | 81.987 | 91.127 | 89.428 | 0.199285 |
Faster R-CNN | 69.68 | 79.101 | 77.428 | 0.210505 |
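Per-image inference times such as those reported above can be measured with a simple timing loop around the detector. The sketch below uses the torchvision reference implementations as a stand-in; the detection framework, backbone, class count, image size, and hardware used in the experiments are not assumed here, so all of those choices are placeholders.

```python
import time
import torch
import torchvision

# Stand-in detector; 13 chart classes + background is an assumption.
model = torchvision.models.detection.retinanet_resnet50_fpn(num_classes=14)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Dummy RGB image; replace with real test images.
images = [torch.rand(3, 600, 800, device=device)]

with torch.no_grad():
    # Warm-up runs so lazy initialization does not skew the measurement.
    for _ in range(3):
        model(images)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    runs = 20
    for _ in range(runs):
        model(images)
    if device == "cuda":
        torch.cuda.synchronize()

print(f"{(time.perf_counter() - start) / runs:.6f} s/img")
```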
Class | RetinaNet | Faster R-CNN |
---|---|---|
Arc | 86.513 | 88.52 |
Area | 78.004 | 76.447 |
Bar | 87.428 | 82.334 |
Force Directed Graph | 79.746 | 45.519 |
Line | 83.494 | 61.618 |
Scatterplot Matrix | 81.072 | 70.266 |
Parallel Coordinates | 81.669 | 61.582 |
Pie | 88.26 | 83.063 |
Reorderable Matrix | 67.69 | 61.392 |
Scatterplot | 76.751 | 66.804 |
Sunburst | 76.84 | 52.785 |
Treemap | 89.843 | 73.419 |
Wordcloud | 88.52 | 88.633 |
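Per-class average precision values like those above are commonly summarized by an unweighted mean over classes (mAP). The short snippet below shows that computation for the RetinaNet column as a generic illustration; it is not a claim about how the summary figures in the earlier table were produced.

```python
# Per-class AP values (%) for RetinaNet, taken from the table above.
ap_per_class = {
    "Arc": 86.513, "Area": 78.004, "Bar": 87.428, "Force Directed Graph": 79.746,
    "Line": 83.494, "Scatterplot Matrix": 81.072, "Parallel Coordinates": 81.669,
    "Pie": 88.26, "Reorderable Matrix": 67.69, "Scatterplot": 76.751,
    "Sunburst": 76.84, "Treemap": 89.843, "Wordcloud": 88.52,
}

# Unweighted mean over the 13 classes.
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {mean_ap:.3f}")  # ~81.987 for this column
```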
Mode | Image | Full | Partial
--- | --- | --- | ---
Camera | Chart (a) | – | –
Camera | Chart (b) | 9/16 | –
Camera | Chart (c) | 6/16 | 4/16
Camera | Chart (d) | – | –
Camera | Chart (e) | – | –
Rectified | Chart (a) | – | –
Rectified | Chart (b) | 12/16 | –
Rectified | Chart (c) | 12/16 | 1/16
Rectified | Chart (d) | – | –
Rectified | Chart (e) | – | 6/16
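The use-case comparison above contrasts OCR label recovery (fully or partially recognized labels out of 16) on raw camera captures versus perspective-rectified images. As a rough illustration of that rectification step, and not the paper's specific pipeline, the sketch below warps a chart photo to a fronto-parallel view with OpenCV given four corners of the chart region; the file paths, corner coordinates, output size, and the OCR call mentioned in the comments are placeholder assumptions.

```python
import cv2
import numpy as np

# Load a camera-captured chart photo (path is a placeholder).
image = cv2.imread("chart_photo.jpg")

# Four corners of the chart region in the photo, ordered
# top-left, top-right, bottom-right, bottom-left (placeholder values;
# in practice these would come from a detector or corner-estimation step).
src = np.float32([[112, 85], [981, 140], [1002, 730], [95, 690]])

# Target rectangle for the rectified chart.
width, height = 960, 640
dst = np.float32([[0, 0], [width, 0], [width, height], [0, height]])

# Homography for the perspective correction and the warped result.
H = cv2.getPerspectiveTransform(src, dst)
rectified = cv2.warpPerspective(image, H, (width, height))

cv2.imwrite("chart_rectified.jpg", rectified)
# The rectified image can then be passed to an OCR engine
# (e.g., pytesseract.image_to_string(rectified)) to read axis and data labels.
```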
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).